CN105808497A - Data processing method - Google Patents
- Publication number: CN105808497A
- Application number: CN201410843391.3A
- Authority
- CN
- China
- Legal status: Granted
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The present invention provides a data processing method applied to a CC-NUMA system. The CC-NUMA system includes a first processor and a second processor connected through a bus. The first processor includes a first core, a first cache, and a first decoder; the second processor includes a second home agent and a second cache, and the second home agent is connected to a second memory. The method includes: the first core sends a first request carrying the address of the data to be read; the first cache receives the first request, determines from the first decoder that the address points to the second memory, and sends a second request to the second home agent; the data requested by the second request comprises at least two parts, the first part containing the data to be read. The second home agent preferentially provides the latest copy of the first part to the first core, and then provides the remaining parts. Applying the invention reduces the waiting time of the first core.
Description
Technical Field
The present invention relates to the field of processors, and in particular to cache-coherent non-uniform memory access.
Background Art
Multiple processors connected through a system bus form a Cache-Coherent Non-Uniform Memory Access (CC-NUMA) system, which is essentially a distributed shared-memory multiprocessor system. Each processor has its own private cache and has memory mounted to it. Because current processors and memory controllers are tightly coupled, a processor's access speed and bandwidth differ when it accesses memory in different address spaces: accessing its own memory is relatively fast, while accessing memory mounted on another processor is relatively slow.
Existing CC-NUMA systems use a uniform data granularity to solve the data-consistency problem; a common cache-line granularity, for example, is 64 bytes. In most cases the data a program actually needs is far smaller than 64 bytes. Take 8 bytes as an example: the 8 bytes are stored in memory, but according to the granularity, 64 bytes are read out of memory, and after a series of cache-coherence operations the CPU (cache and core) finally receives those 64 bytes. The 64 bytes contain the 8 bytes the program needs, but resolving cache coherence for 64 bytes of data and returning it to the requesting CPU takes far longer than the corresponding time for 8 bytes of data.
In the prior art, when a processor core runs a program, it waits a relatively long time for data.
Summary of the Invention
The present invention provides a data processing method that can reduce the time a processor core spends waiting for data.
In a first aspect, the present invention provides a data processing method applied to a cache-coherent non-uniform memory access (CC-NUMA) system. The CC-NUMA system includes a first processor and a second processor connected through a bus. The first processor includes a first core, a first cache, and a first decoder; the second processor includes a second home agent and a second cache, and the second home agent is connected to a second memory. The method includes:
The first core sends a first request carrying the address of the data to be read. The first cache receives the first request, determines from the first decoder that the address points to the second memory, and sends a second request to the second home agent. The data requested by the second request comprises at least two parts, the first part containing the data to be read, and the total size of the at least two parts equals the size of a cache line. After receiving the second request, the second home agent provides the latest copy of the first part to the first cache, and then provides the latest copies of the remaining parts to the first cache; the at least two parts have adjacent addresses in the second memory. The first cache extracts the data to be read from the first part, sends it to the first core, and caches the latest copies of all the parts.
In a first possible implementation of the first aspect, providing the latest first part, and then the latest remaining parts, to the first cache specifically includes: reading the directory information of the first part from the second memory; according to the data state recorded in the directory information, if the first part stored in the second memory is the latest copy, the second home agent sends it to the first cache; otherwise, it locates the cache holding the latest copy of the first part and has that copy sent to the first cache. Then, for each remaining part: the directory information is read from the second memory; according to the recorded data state, if the copy stored in the second memory is the latest, the second home agent sends it to the first cache; otherwise, it locates the cache holding the latest copy of that part and has that copy sent to the first cache.
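The directory-driven loop just described can be sketched as follows. This is a minimal illustrative model, not the patented implementation: the `directory`, `memory`, and `caches` structures and their field names (`state`, `owner`) are assumptions made for the sketch.

```python
# Hypothetical model of the per-part provisioning loop: for each part,
# consult the directory; serve from memory if its copy is latest,
# otherwise fetch from the cache that owns the latest copy.

def provide_parts(part_addrs, directory, memory, caches, first_cache):
    """Deliver parts to the requesting cache, first part first."""
    delivered = []
    for addr in part_addrs:          # part_addrs[0] is the first part
        entry = directory[addr]
        if entry["state"] in ("E", "S", "I"):  # memory copy is current
            data = memory[addr]
        else:                                   # "M": some cache holds the latest copy
            data = caches[entry["owner"]][addr]
        first_cache[addr] = data                # send to the first cache
        delivered.append(addr)
    return delivered
```

A usage example: if the first part (address 0) is in the M state and owned by a third cache, the loop forwards that cache's copy, while a clean second part (address 8) comes straight from memory.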
In a second possible implementation of the first aspect, providing the latest first part, and then the latest remaining parts, to the first cache specifically includes:
sending a query request for the first part to all caches other than the first cache, and sending the first part from the cache that holds its latest copy to the first cache;
then, for each remaining part: sending a query request for that part to all caches other than the first cache, and sending that part from the cache that holds its latest copy to the first cache.
In a third possible implementation of the first aspect, the first part of data is exactly the data to be read.
In a fourth possible implementation of the first aspect, the first part of data is greater than or equal to the data to be read and is the smallest integer multiple of 8 bytes that can contain it; the size of the cache line is 64 bytes.
In a second aspect, the present invention provides a data processing method applied to a cache-coherent non-uniform memory access (CC-NUMA) system. The CC-NUMA system includes a first processor and a second processor connected through a bus. The first processor includes a first core, a first cache, and a first decoder; the second processor includes a second home agent and a second cache, and the second home agent is connected to a second memory. The method includes: the first core sends a first request carrying the address of the data to be read; the first cache receives the first request, determines from the first decoder that the address points to the second memory, and sends a second request to the second home agent; the data requested by the second request comprises at least two parts, the first part containing the data to be read, and the total size of the at least two parts equals the size of a cache line; after receiving the second request, the second home agent provides the latest copy of the first part to the first cache, and then provides the latest copies of the remaining parts to the first cache, the at least two parts having adjacent addresses in the second memory; the first cache sends the latest first part to the first core and caches the latest copies of all the parts.
In a first possible implementation of the second aspect, providing the latest first part, and then the latest remaining parts, to the first cache specifically includes: reading the directory information of the first part from the second memory; according to the data state recorded in the directory information, if the first part stored in the second memory is the latest copy, the second home agent sends it to the first cache; otherwise, it locates the cache holding the latest copy of the first part and has that copy sent to the first cache. Then, for each remaining part: the directory information is read from the second memory; according to the recorded data state, if the copy stored in the second memory is the latest, the second home agent sends it to the first cache; otherwise, it locates the cache holding the latest copy of that part and has that copy sent to the first cache.
In a second possible implementation of the second aspect, providing the latest first part, and then the latest remaining parts, to the first cache specifically includes: sending a query request for the first part to all caches other than the first cache, and sending the first part from the cache that holds its latest copy to the first cache; then, for each remaining part: sending a query request for that part to all caches other than the first cache, and sending that part from the cache that holds its latest copy to the first cache.
In a third possible implementation of the second aspect, the first part of data is exactly the data to be read.
In a fourth possible implementation of the second aspect, the first part of data is greater than or equal to the data to be read and is the smallest integer multiple of 8 bytes that can contain it; the size of the cache line is 64 bytes.
In the embodiments of the present invention, the data requested by the second request comprises at least two parts, and the latest copy of the first part, which contains the data the first core needs to run, is returned to the first cache first. The waiting time of the first core can therefore be reduced.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and further drawings can be derived from them.
Fig. 1 is a schematic structural diagram of a CC-NUMA embodiment of the present invention;
Fig. 2 is a flowchart of an embodiment of the data processing method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the CC-NUMA system in this embodiment consists of multiple processors and memories. A processor is, for example, a CPU. Each processor internally includes a core, a cache, a home agent, and a decoder. The cache is connected to the core, the home agent, and the address decoder. The home agent, which may also be called a memory home agent, manages the memory directly connected to its CPU. If another CPU wants to access this CPU's memory, the access must be relayed through this CPU.
Each processor has memory mounted to it; the home agent is connected to the mounted memory, which stores data. The core is the central component of the processor: it performs computation, runs programs, and is the consumer of data. The cache holds temporary data for the core's use, and the core accesses the cache faster than it accesses memory. The home agent can read and write the mounted memory. The address decoder records the correspondence between addresses and home agents: given the address of a piece of data, the address decoder can be queried for the corresponding home agent, whose mounted memory stores the data at that address. A processor "mounting" memory means it is directly connected to the memory and can read data from it; optionally, it may also manage the mounted memory. The memory is, for example, random-access memory (RAM).
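The address decoder's role can be sketched as a range-to-home-agent lookup. The following is an illustrative model only; the address bases and agent names are invented for the example, and real decoders are hardware structures, not tables like this.

```python
from bisect import bisect_right

# Hypothetical address map: each (base, home_agent) pair says that addresses
# from `base` up to the next base belong to that home agent's mounted memory.
DECODER = [
    (0x0000_0000, "home_agent_1"),
    (0x4000_0000, "home_agent_2"),
    (0x8000_0000, "home_agent_3"),
]

def lookup_home_agent(addr):
    """Return the home agent whose mounted memory stores `addr`."""
    bases = [base for base, _ in DECODER]
    idx = bisect_right(bases, addr) - 1   # last base not exceeding addr
    if idx < 0:
        raise ValueError("address not mapped to any home agent")
    return DECODER[idx][1]
```

For instance, under this made-up map an address in the second range resolves to `home_agent_2`, which is the lookup the first cache performs in step 23 below.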
Multiple processors can be connected by a bus, for example the QuickPath Interconnect (QPI) bus used by Intel x86 processors. Through the bus, a processor can indirectly access memory mounted on other processors. A processor accesses its directly mounted memory relatively quickly, and memory mounted on other processors relatively slowly.
In the architecture of Fig. 1, the first processor includes a first core, a first cache, a first address decoder, and a first home agent, and the first home agent is connected to a first memory. The second processor includes a second core, a second cache, a second address decoder, and a second home agent, and the second home agent is connected to a second memory. A cache can access the home agents of other processors through the QPI bus. The third processor includes a third core, a third cache, a third address decoder, and a third home agent, and the third home agent is connected to a third memory.
Fig. 1 is only an example; other embodiments may have more processors, for example 4, 8, or even more. When the number of processors is large, relays may be placed between the processors.
A data processing method according to an embodiment of the present invention is described in detail below.
Step 21: the first core sends a data read request to the first cache. In this embodiment, the data the first core needs is called the data to be read. The first core's data demand comes from the program it is running. The read request carries the address of the data to be read, and the requested data is smaller than a cache line. This request is called the first request in this embodiment.
Usually the first core needs the data to be read in order to run a program. In this embodiment the data to be read is smaller than a cache line: for example, the data to be read is 8 bytes while the cache line is 64 bytes, so the amount of data requested by the first request is 8 bytes.
Step 22: the first cache uses the address of the data to be read to check whether it already stores that data. If it does, it returns the data to be read directly to the first core and the method ends. If not, it sends a query request to the first address decoder and proceeds to step 23.
Step 23: the first cache uses the first address decoder to look up the home agent corresponding to the address, and sends a second read request to that home agent over the QPI bus. In this embodiment, the home agent corresponding to the address is assumed to be the second home agent in Fig. 1, which manages the memory directly connected to the second CPU.
The data requested by the second request comprises several parts, and the sum of the sizes of these parts equals the size of a cache line. The first part contains the data to be read. Because the parts add up to a full cache line, the scheme remains compatible with existing designs.
In this embodiment, the first part contains the data requested by the first request and is a multiple of 8 bytes, for example the smallest such multiple. If the data requested by the first request does not exceed 8 bytes, the first part is 8 bytes; if it is larger than 8 bytes but no more than 16 bytes, the first part is 16 bytes; and so on.
Subtracting the length of the first part from the cache-line size gives the total length of the remaining parts. Example 1: the data is split into two parts, the first part is 8 bytes, and the cache line is 64 bytes, so the second part is 56 bytes. Example 2: each part is 8 bytes and the cache line is 64 bytes, so there are 8 parts in total, including the first.
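The size arithmetic above can be sketched with a small helper. The function name and the `two_parts` switch are assumptions introduced for illustration; the constants follow the 8-byte step and 64-byte cache line used in this embodiment.

```python
CACHE_LINE = 64  # bytes, cache-line size in this embodiment
STEP = 8         # bytes; the first part is the smallest multiple of 8
                 # that contains the requested data

def split_sizes(requested_bytes, two_parts=True):
    """Return the part sizes for a second request covering one cache line."""
    first = -(-requested_bytes // STEP) * STEP   # round up to a multiple of 8
    if first > CACHE_LINE:
        raise ValueError("request larger than a cache line")
    rest = CACHE_LINE - first
    if two_parts:
        # Example 1: first part plus one remainder part
        return [first] + ([rest] if rest else [])
    # Example 2: first part plus 8-byte parts for the remainder
    return [first] + [STEP] * (rest // STEP)
```

An 8-byte request thus yields parts of 8 and 56 bytes (Example 1), or eight 8-byte parts (Example 2), always summing to the 64-byte cache line.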
In other embodiments, the first part may also be kept the same length as the data to be read.
In the prior art, by contrast, the request the first cache sends to the second home agent asks for the data as a single whole, not split into parts, and its size equals a cache line, i.e. 64 bytes.
The first address decoder stores an address mapping table, which records the mapping between data addresses and data storage locations. Specifically, it stores the mapping between addresses and home agents; the memory directly mounted on a home agent stores the data corresponding to those addresses.
Step 24: after receiving the second request, the second home agent reads the directory information of the first part from the second memory mounted on it. According to the data state recorded in the directory information, it provides the latest copy of the first part to the first cache. Then, following the directory information of each remaining part, the second home agent provides the latest copies of the remaining parts to the first cache.
The parts have adjacent addresses in the second memory. The second home agent provides the latest first part preferentially, and then provides the latest remaining parts.
Each part corresponds to an address segment in memory, and these segments can be spliced into one contiguous address segment.
The second memory stores the first part, but its copy may be an earlier version rather than the latest. For example, if the first part in the second memory has been read and modified by another processor, the latest copy exists in that processor's cache. The second home agent can provide the latest first part to the first cache in the following two ways.
(1) If the latest first part is stored in the second memory, the second home agent reads it directly from the second memory and sends it to the first cache over the QPI bus.
(2) If the latest first part is not stored in the second memory, the latest copy resides in another processor's cache. The second home agent then looks up the ID of the cache holding the latest first part and issues an instruction to it; there are three optional instructions, described below. In this embodiment, assume the cache holding the latest first part is the third cache of the third processor. The schemes for sending the first part from the third cache to the first cache are as follows:
(2.1) Instruct the third cache to send the latest first part to the first cache.
(2.2) Instruct the third cache to send the latest first part to the second home agent; the second home agent writes the latest data back to the second memory and sends the latest first part to the first cache.
(2.3) A combination of the first two: instruct the third cache to send the latest first part both to the first cache and to the second home agent, and the second home agent writes the latest data back to the second memory.
In general, the home agent keeps a directory. The second home agent uses part of the address of the first part as a tag to look up the state bits and storage location of the first part in the directory. It can thus determine whether the second memory holds the latest first part and, if not, which cache holds it.
For example, the directory records whether the data is in the M (Modified), E (Exclusive), S (Shared), I (Invalid), or A (Any/Unknown) state.
For example, data in the M state is dirty: the data to be read has been modified and not written back to memory, so it is inconsistent with memory, and the second memory does not hold the latest copy. Data in the E state is clean: no other cache has modified it, so it is consistent with memory, and memory holds the latest copy. The other states are not detailed here.
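The state check in these two examples can be sketched as follows. The function is an illustrative assumption; it encodes only the state semantics described above (M dirty, E/S/I consistent with or absent from other caches, A unknown), not the full protocol.

```python
# Hypothetical predicate: may the home agent serve a part straight from
# its mounted memory, according to the directory state of that part?

def memory_copy_is_latest(state):
    """True if memory is current, False if a cache holds a dirty copy,
    None if the state is unknown and other caches must be queried."""
    if state == "M":
        return False          # dirty in some cache; memory copy is stale
    if state in ("E", "S", "I"):
        return True           # clean (or cached nowhere); memory is current
    return None               # "A": unknown, must snoop the other caches
```

The `None` case corresponds to the unknown state discussed below, where the home agent falls back to querying the other caches.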
Furthermore, in some cases the state recorded in the home agent's directory may be inaccurate. For example, the directory may record a segment of data as being in the E state in some processor's cache when it has in fact been downgraded to the S state without the home agent having been notified yet. Sending messages according to the directory to query the processors' actual state is therefore a common action in maintaining cache coherence. For the same reason, an optional scheme is: even when the directory indicates that memory holds the latest data to be read, way (2) can still be used to query the caches of other processors for the latest copy.
It is also possible that the home agent has no directory. In that scenario, a query can be issued to all other caches; equivalently, it is as if a directory existed in which every other cache is recorded in the unknown state. "Other caches" means caches other than the initiator, which in this embodiment is the first cache.
The above describes how to obtain the latest first part. The scheme for obtaining the latest copy of each remaining part is similar and is not detailed here.
"Preferentially providing the latest first part" does not mean that the search for the latest remaining parts may start only after the latest first part has been found and sent to the first core. It means that, at any point, when providing the latest first part conflicts with providing the latest other parts, the first part is handled first; in other words, whenever providing any of the other parts would reduce the efficiency of providing the latest first part, the latest first part takes priority.
As this step shows, in this embodiment the second home agent gives the first part priority in reaching the first cache and then provides the remaining parts, so the first part arrives at the first cache sooner. The remaining parts are not time-critical and may arrive later, to be held in the first cache for future use. Together, the parts equal the data the first cache ultimately obtains in the prior art; but in the prior art the data to be read reaches the first cache too slowly, whereas with the data split into parts, the first part containing the data to be read reaches the first cache first, reducing the waiting time of the first cache and hence of the first core.
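The latency argument above can be illustrated with a toy timing model. The per-part latency is a made-up constant purely to show the ordering effect, not a measurement from the patent.

```python
# Toy model: with prioritised delivery the core unblocks as soon as the
# first part arrives; without splitting, it waits for the whole line.

PER_PART_LATENCY = 10  # arbitrary time units to fetch and deliver one part

def core_wait_time(n_parts, split=True):
    """Time until the first core receives its data to be read."""
    if split:
        return PER_PART_LATENCY        # first part arrives ahead of the rest
    return PER_PART_LATENCY * n_parts  # whole cache line delivered as one unit
```

Under this model, with eight 8-byte parts the split scheme unblocks the core after one part's latency instead of eight, while the remaining parts trickle in afterwards for future use.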
Step 25: the first cache receives the latest first part of the data, extracts the latest data to be read from it, sends that data to the first core, and stores the latest first part. The first cache also receives the latest remaining parts and stores them.
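The extraction in step 25 amounts to plain offset arithmetic within the first part. A sketch with hypothetical function and parameter names:

```python
def extract_to_read(part_data, part_base, addr, nbytes):
    """Given the first part of the data (e.g. 8 bytes) starting at address
    `part_base`, return the `nbytes` that the first core requested at
    address `addr`."""
    off = addr - part_base
    # The requested bytes must lie entirely within the received part.
    assert 0 <= off and off + nbytes <= len(part_data)
    return part_data[off:off + nbytes]
```

For example, a 2-byte read at offset 3 of an 8-byte part returns just those two bytes, which is all the first core receives under this embodiment.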
Once the first core receives the latest data to be read, it can use it to continue running its program.
In the embodiments of the present invention, checking the state of the first part of the data (for example, 8 bytes) before checking the state of the remaining bytes helps avoid complex conflict scenarios. For example, if the first part is in the shared state while some of the remaining bytes are in the modified state, the entire 64-byte line is treated as modified; the first part can then be returned from the second processor directly to the requester (the first cache) without sending any data to other processors. Moreover, when the home agents of multiple processors request data at the same address simultaneously, conflicts easily arise, and resolving such conflicts is the most complex and time-consuming part of cache coherence protocol design. Reducing the data granularity from 64 bytes to 8 bytes lowers the probability of conflict. To preserve the cache hit rate, the invention remains compatible with the 64-byte design: the data ultimately held in the first cache is still 64 bytes, as in the prior art, so coherent access to the full 64 bytes of data is eventually completed.
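The rule that one modified part makes the whole 64-byte line modified can be expressed as taking the "strongest" state over the eight parts. A MESI-style ordering is assumed here purely for illustration; the paragraph above names only the shared and modified states.

```python
# Illustrative strength ordering: Invalid < Shared < Exclusive < Modified
STRENGTH = {"I": 0, "S": 1, "E": 2, "M": 3}

def line_state(part_states):
    """State of the whole 64-byte line: the strongest state found among
    the states of its individual 8-byte parts."""
    return max(part_states, key=STRENGTH.__getitem__)
```

So a line whose parts are seven shared and one modified is treated as modified, matching the example in the text.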
In step 25, in the embodiments of the present invention, the data that the first cache sends to the first core is exactly the data the first core needs. In the prior art, by contrast, even when the first core needs fewer than 64 bytes, the first cache always sends it 64 bytes of data. The first cache therefore sends more data to the first core, and the first core must select the data it actually needs from it, which increases the complexity of the system.
There is also an alternative to "the first cache receives the latest first part of the data and extracts the latest data to be read from it to send to the first core" in step 25: the first cache does not extract the data to be read but sends the whole first part directly to the first core, and the first core extracts the latest data to be read from it after receipt.
Finally, it should be noted that the above embodiments merely illustrate, rather than limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, without such modifications or replacements departing from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201410843391.3A (CN105808497B) | 2014-12-30 | 2014-12-30 | A kind of data processing method
Publications (2)

Publication Number | Publication Date
---|---
CN105808497A | 2016-07-27
CN105808497B | 2018-09-21

Family ID: 56421077
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201410843391.3A | A kind of data processing method | 2014-12-30 | 2014-12-30

Country Status (1)

Country | Link
---|---
CN | CN105808497B (en)
Cited By (3)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN108512700A | 2018-03-21 | 2018-09-07 | 常熟理工学院 | A kind of data Realization Method of Communication of software defined network
CN112100093A | 2020-08-18 | 2020-12-18 | 海光信息技术有限公司 | Method and multiprocessor system for maintaining data consistency in multiprocessor shared memory
WO2021114768A1 | 2019-12-11 | 2021-06-17 | 成都海光微电子技术有限公司 | Data processing device and method, chip, processor, apparatus, and storage medium
Citations (6)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN1180864A | 1996-08-19 | 1998-05-06 | 三星电子株式会社 | Single instruction multiple data processing method and device in multimedia signal processor
CN1786927A | 2004-12-07 | 2006-06-14 | 国际商业机器公司 | System and method for application-level cache-mapping awareness and reallocation
US7383168B2 | 2003-01-06 | 2008-06-03 | Fujitsu Limited | Method and system for design verification and debugging of a complex computing system
EP2187527A2 | 2008-11-18 | 2010-05-19 | Fujitsu Limited | Error judging circuit and shared memory system
CN102067090A | 2008-06-17 | 2011-05-18 | Nxp股份有限公司 | Processing circuit with cache circuit and detection of runs of updated addresses in cache lines
CN102841857A | 2012-07-25 | 2012-12-26 | 龙芯中科技术有限公司 | Processor, device and method for carrying out cache prediction
Legal Events

Date | Code | Title | Description
---|---|---|---
| C06 | Publication |
| PB01 | Publication |
| C10 | Entry into substantive examination |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |