How Kafka Uses Zero-Copy to Improve Performance

<div>
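    One of the reasons Kafka writes and reads messages so quickly is its use of zero-copy. This article looks at why the technique is so efficient.
    Traditional file reads and writes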

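Traditional file I/O and network transfers usually copy data from kernel space into user space. The application works on the data in user-space memory, and before that data can be written to a file or a socket it has to be copied from user space back into kernel space, from where it reaches the disk or the network card.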

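    The data is first read from disk into a kernel buffer, which is the page cache (PageCache). It is then copied from the kernel buffer into the application buffer in user space. On output, the user-space data is copied back into a kernel buffer before it goes to the output device.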
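As a rough illustration of this path, the sketch below moves bytes from an input to an output through a buffer that lives in user space. The class and method names (`UserSpaceCopySketch`, `copy`) are invented for this example, and the in-memory streams in `main` only stand in for a file and a socket. With a real file as the source and a socket as the destination, each loop iteration corresponds to one copy out of the kernel and one copy back into it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Kafka or the JDK.
class UserSpaceCopySketch {

    // Copies everything from in to out through a user-space buffer
    // and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096]; // user-space memory (JVM heap)
        long total = 0;
        int n;
        // For a file-backed stream, read() copies data from the kernel into buffer.
        while ((n = in.read(buffer)) != -1) {
            // For a file or socket stream, write() copies buffer back into the kernel.
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for a file and a socket here.
        byte[] payload = "hello zero-copy".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(payload), sink);
        System.out.println("copied " + copied + " bytes");
    }
}
```

The zero-copy techniques discussed in this article aim to remove the two copies that pass through `buffer`.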

        <span>
            <strong>
                <span>DMA</span>
            </strong>
        </span>

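    Before introducing zero-copy, let's look at one term: DMA (Direct Memory Access). DMA is an important feature of modern computers: it lets hardware devices of different speeds exchange data with memory without the CPU having to move every byte or take a heavy interrupt load. A DMA transfer copies data from one address space to another. The CPU only initiates the transfer; the actual data movement is carried out by the DMA controller, which greatly reduces CPU usage. Most common hardware devices support DMA, as the figure below shows: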
    <p style="text-align:center">
        <img src="https://img2.tuicool.com/mmYVr2v.png!web" class="alignCenter" referrerpolicy="no-referrer"/>
    </p>
    Zero-copy

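Of the common zero-copy techniques, we mainly cover two: mmap and sendfile. The discussion below is based on copying files on disk.

        <span>
            <strong>
                <span>mmap</span>
            </strong>
        </span>

    mmap lets user space reference the file's pages directly: user space and kernel space share the same kernel buffer, so the data does not have to be copied into a separate user-space buffer. When the application writes to the mapped region, it is writing straight into that kernel buffer (the page cache). If the output device is a disk, the kernel writes the modified pages back asynchronously (on Linux, dirty pages are flushed after about 30 seconds by default).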

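This can be understood with an example. Suppose we need to copy data from src.data to dest.data. The contents of src.data do not need to change, but some data has to be appended to dest.data. The data from src.data can then be moved by DMA alone, while the application maps dest.data with mmap and modifies the kernel-space data directly to append what it needs.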
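A minimal sketch of the append half of that scenario is shown below. The names `MmapAppendSketch` and `append` are invented for this example, and a temporary file stands in for dest.data. It assumes that a READ_WRITE mapping which extends past the current end of the file grows the file, which is how the OpenJDK implementation behaves to our knowledge.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only.
class MmapAppendSketch {

    // Appends data to the end of the file by mapping the region just past its current end.
    static void append(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long oldSize = channel.size();
            // Assumption: mapping beyond the current end grows the file to oldSize + data.length.
            MappedByteBuffer region =
                    channel.map(FileChannel.MapMode.READ_WRITE, oldSize, data.length);
            region.put(data); // writes go to the mapped pages, with no write() call
            region.force();   // ask the OS to write the modified pages to storage now
        }
    }

    public static void main(String[] args) throws IOException {
        Path dest = Files.createTempFile("dest", ".data");
        Files.write(dest, "existing;".getBytes(StandardCharsets.UTF_8));
        append(dest, "appended".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(dest), StandardCharsets.UTF_8));
        // A file that is still mapped may not be deletable on Windows, so clean up at exit.
        dest.toFile().deleteOnExit();
    }
}
```

Calling `force()` is optional; without it the kernel writes the modified pages back on its own schedule.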

        <span>
            <strong>
                <span>sendfile</span>
            </strong>
        </span>

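    With sendfile, the application does no processing on the data; it is simply moved from one DMA device to another. The data only has to be copied into kernel space and never into user space, and unlike mmap, the application does not even need a reference to the kernel-space data (a mapping of the file). See the figure below: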
    <p style="text-align:center">
        <img src="https://img2.tuicool.com/i6nuArj.png!web" class="alignCenter" referrerpolicy="no-referrer"/>
    </p>
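    As the figure shows (the output device can be a network card or a disk driver), kernel space holds two copies of the data: one in the page cache and one in the socket buffer. sendfile was introduced in the Linux 2.1 development series and optimized further in Linux 2.4. With that optimization, and on hardware that supports gather operations, the data in the page cache no longer needs to be copied into the socket buffer; only descriptors holding the location and length of the data are stored there. The DMA engine then sends the actual data directly to the protocol engine, which removes one more copy.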
    <div>
        <span>
            <strong>
                <span>
                    Zero-copy in Java
                </span>
            </strong>
        </span>
    </div>
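    In the JDK, FileChannel provides methods for transferring data to and from other channels. transferTo() transfers bytes from the current FileChannel directly to the target channel, and transferFrom() transfers bytes from a readable channel directly into the current FileChannel. On Linux, transferTo() is implemented on top of the sendfile system call where the platform and the target channel support it, and map() creates an mmap mapping of a region of the channel's file.
    The relevant method summaries from Java NIO are: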
    <pre class="prettyprint"><span>// Transfers bytes from this FileChannel to the given writable channel</span>
public abstract long transferTo(long position, long count, WritableByteChannel target) throws IOException;

// Transfers bytes from the given readable channel into this FileChannel
public abstract long transferFrom(ReadableByteChannel src, long position, long count) throws IOException;

// Maps a region of this channel's file into memory (mmap)
public abstract MappedByteBuffer map(MapMode mode, long position, long size) throws IOException;</pre>
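The case that matters most for a message broker is file to socket. The sketch below sends a file over a loopback TCP connection with `transferTo()`. The class name `TransferToSocketSketch`, the helper `sendFile`, and the 1 MiB sample file are assumptions made for this example; it is not taken from Kafka's source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; names are not from Kafka or the JDK.
class TransferToSocketSketch {

    // Sends the whole file to the target channel and returns the number of bytes sent.
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            // transferTo() may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("segment", ".log");
        Files.write(file, new byte[1 << 20]); // 1 MiB of zeros as sample data
        long[] received = new long[1];

        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0 = any free port
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();

            // Receiving side: connect and count bytes until the sender closes the connection.
            Thread consumer = new Thread(() -> {
                try (SocketChannel in = SocketChannel.open(address)) {
                    ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        received[0] += n;
                        buf.clear();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            consumer.start();

            // Sending side: accept the connection and push the file through transferTo().
            try (SocketChannel out = server.accept()) {
                System.out.println("sent " + sendFile(file, out) + " bytes");
            }
            consumer.join();
            System.out.println("received " + received[0] + " bytes");
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Whether the transfer actually avoids user space depends on the operating system and the target channel type; for targets that do not support a direct transfer, the JDK falls back to an ordinary buffered copy.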

File copy benchmark

Below we run the following three programs and compare how long each takes for different sizes of src.log.

1. Traditional copy

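<pre class="prettyprint">import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class OldFileCopy {

    public static final String source = "C:/data/src.log";
    public static final String dest = "C:/data/dest.log";

    public static void main(String[] args) {
        // try-with-resources closes both streams even if the copy fails
        try (FileInputStream inputStream = new FileInputStream(source);
             FileOutputStream outputStream = new FileOutputStream(dest)) {
            long start = System.currentTimeMillis();
            byte[] buff = new byte[4096];
            int read;
            long total = 0;
            // read() returns -1 at end of file
            while ((read = inputStream.read(buff)) != -1) {
                total += read;
                // write only the bytes actually read, not the whole buffer
                outputStream.write(buff, 0, read);
            }
            outputStream.flush();
            System.out.println("Copied " + total + " bytes, elapsed (ms): " + (System.currentTimeMillis() - start));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}</pre>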

2. mmap copy

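<pre class="prettyprint">import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapFileCopy {

    public static final String source = "C:/data/src.log";
    public static final String dest = "C:/data/dest.log";

    public static void main(String[] args) {
        // The source is only read, so open it read-only; closing a channel also closes its file
        try (FileChannel sourceChannel = new RandomAccessFile(source, "r").getChannel();
             FileChannel destChannel = new RandomAccessFile(dest, "rw").getChannel()) {
            long start = System.currentTimeMillis();

            // Map the destination file; a single mapping is limited to Integer.MAX_VALUE bytes
            MappedByteBuffer map = destChannel.map(FileChannel.MapMode.READ_WRITE, 0, sourceChannel.size());
            // Read from the source channel into the mapped destination region
            // (the original called sourceChannel.write(map), which overwrites the source instead)
            while (map.hasRemaining() && sourceChannel.read(map) != -1) {
                // keep reading until the mapping is full or the source is exhausted
            }

            System.out.println("Elapsed (ms): " + (System.currentTimeMillis() - start));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}</pre>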

3. sendfile copy

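<pre class="prettyprint">import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

public class SendFileCopy {

    public static final String source = "C:/data/src.log";
    public static final String dest = "C:/data/dest.log";

    public static void main(String[] args) {
        // The source is only read, so open it read-only; closing a channel also closes its file
        try (FileChannel sourceChannel = new RandomAccessFile(source, "r").getChannel();
             FileChannel destChannel = new RandomAccessFile(dest, "rw").getChannel()) {
            long start = System.currentTimeMillis();

            // transferTo() may transfer fewer bytes than requested, so loop until done
            long position = 0;
            long remaining = sourceChannel.size();
            while (remaining > 0) {
                long sent = sourceChannel.transferTo(position, remaining, destChannel);
                position += sent;
                remaining -= sent;
            }

            System.out.println("Elapsed (ms): " + (System.currentTimeMillis() - start));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}</pre>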

Running the three programs against files of different sizes gave the following results.

    <p style="text-align:center">
        <img src="https://img2.tuicool.com/VFzMZzZ.png!web" class="alignCenter" referrerpolicy="no-referrer"/>
    </p>
    The results show that both mmap and sendfile are far faster than the traditional copy. Between the two, mmap takes less time for small files, while sendfile is the fastest for large files.
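Timings like these are sensitive to page-cache state and JVM warm-up, so it can help to repeat each copy several times and keep the fastest run. The harness below is one possible sketch under stated assumptions: `CopyTimingSketch`, `Copier`, and `bestNanos` are names invented for this example, the 32 MiB sample size is arbitrary, and only the stream and `transferTo()` strategies are included. It is not the program that produced the figure above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical timing harness for illustration only.
class CopyTimingSketch {

    interface Copier {
        void copy(Path source, Path dest) throws IOException;
    }

    // Runs the copier the given number of times and returns the fastest run in nanoseconds.
    static long bestNanos(Copier copier, Path source, int rounds) throws IOException {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < rounds; i++) {
            Path dest = Files.createTempFile("dest", ".log");
            try {
                long start = System.nanoTime();
                copier.copy(source, dest);
                best = Math.min(best, System.nanoTime() - start);
                // Sanity check: a fast but incomplete copy should not count.
                if (Files.size(dest) != Files.size(source)) {
                    throw new IllegalStateException("copy is incomplete");
                }
            } finally {
                Files.deleteIfExists(dest);
            }
        }
        return best;
    }

    // Traditional copy through a user-space buffer.
    static void streamCopy(Path source, Path dest) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(dest)) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    // Channel-to-channel copy with transferTo().
    static void transferToCopy(Path source, Path dest) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dest, StandardOpenOption.WRITE)) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("src", ".log");
        try {
            Files.write(source, new byte[32 * 1024 * 1024]); // 32 MiB sample file
            long stream = bestNanos(CopyTimingSketch::streamCopy, source, 5);
            long transfer = bestNanos(CopyTimingSketch::transferToCopy, source, 5);
            System.out.println("stream:     " + stream / 1_000_000 + " ms");
            System.out.println("transferTo: " + transfer / 1_000_000 + " ms");
        } finally {
            Files.deleteIfExists(source);
        }
    }
}
```

Because the source file has just been written, it is likely to be served from the page cache, so the numbers mostly reflect copy overhead rather than disk speed.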

        <span>Source:</span>

    http://moguhu.com/article/detail?articleId=146
</div>