WO2014206232A1 - Consistency processing method and device based on multi-core processor - Google Patents

Consistency processing method and device based on multi-core processor

Info

Publication number
WO2014206232A1
Authority
WO
WIPO (PCT)
Prior art keywords
core processor
data
threshold
directory
kernel
Prior art date
Application number
PCT/CN2014/080169
Other languages
French (fr)
Chinese (zh)
Inventor
张轮凯
范东睿
叶笑春
王达
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2014206232A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols

Definitions

  • Multi-core processor-based consistency processing method and apparatus. This application claims priority to Chinese Patent Application No. 201310260830.3, filed with the Chinese Patent Office on June 26, 2013 and entitled "Multi-core processor-based consistency processing method and apparatus", which is incorporated herein by reference in its entirety.
  • The present invention relates to data storage technologies, and more particularly to a consistency processing method and apparatus based on a multi-core processor.
  • A multi-core processor integrates multiple cores in a single processor.
  • The multi-core processor is connected to an off-chip memory that stores the shared data. The off-chip memory includes a plurality of data pages, each of which includes a plurality of data blocks.
  • The cores of the multi-core processor are connected to each other over a network that is used to pass messages between the cores.
  • Each core (also called a kernel in this text) in a multi-core processor includes a processor core, an on-chip cache, a Translation Lookaside Buffer (TLB), and a sparse directory. Before a kernel operates on shared data in the off-chip memory, the shared data must first be cached in that kernel's on-chip cache.
  • After one kernel writes to its cached copy of the shared data, that copy is no longer consistent with the copies cached in the other kernels. The multi-core processor therefore has to maintain the consistency of the shared data.
  • A multi-core processor maintains the consistency of shared data using either a directory protocol or a listening (snooping) protocol. The directory protocol stores directory entries in the sparse directory; each directory entry records the kernels that cache a piece of shared data, and consistency processing is performed on the shared data cached in those kernels according to the directory entries. A multi-core processor that uses the directory protocol therefore has a high storage overhead, which lowers the efficiency of consistency processing.
  • The listening protocol does not store directory entries in a sparse directory, but it has to use broadcast messages to learn which kernels cache the shared data before that data can be processed consistently. A multi-core processor that uses the listening protocol therefore has a high signaling overhead, which also lowers the efficiency of consistency processing.
  • Embodiments of the present invention provide a consistency processing method and apparatus based on a multi-core processor for improving consistency processing efficiency.
  • The first aspect provides a multi-core processor-based consistency processing method, including: receiving a consistency request message sent by a first kernel in a multi-core processor, where the consistency request message indicates the target shared data on which consistency processing is to be performed; and
  • selecting, according to the number of second kernels in the multi-core processor, one of a directory protocol or a listening protocol to perform consistency processing on the target shared data, where a second kernel is a kernel that shares the target shared data.
  • The target shared data is data in a target data page of the off-chip memory; the off-chip memory provides the target shared data to the multi-core processor.
  • Selecting, according to the number of second kernels in the multi-core processor, one of the directory protocol or the listening protocol to perform consistency processing on the target shared data includes: determining whether the number of second kernels is greater than a predetermined shared threshold, where the shared threshold is an integer greater than zero and less than the number of cores of the multi-core processor; if the number of second kernels is not greater than the shared threshold, performing consistency processing on the target shared data by using the listening protocol; and if the number of second kernels is greater than the shared threshold, performing consistency processing on the target shared data by using the directory protocol.
  • The method further includes: updating the shared threshold according to a network conflict rate of the multi-core processor and a sparse directory replacement rate of the multi-core processor, where the network conflict rate indicates the degree of congestion of the network used to transmit messages between the cores of the multi-core processor, and the sparse directory replacement rate indicates the storage space occupancy of the sparse directory in the multi-core processor; and, if data in a first data page of the off-chip memory is cached in a kernel of the multi-core processor, deleting the data in the first data page cached by that kernel, so that the sparse directory in the multi-core processor deletes the directory entry corresponding to the first data page. The directory entry corresponding to the first data page records the kernels that cache the data in each data block of the first data page; the first data page is a data page for which the number of kernels caching its data is greater than the updated shared threshold.
  • Updating the shared threshold according to the network conflict rate and the sparse directory replacement rate includes: if the network conflict rate is higher than a first threshold and the sparse directory replacement rate is lower than a third threshold, determining that the updated shared threshold is twice the shared threshold; and if the network conflict rate is lower than a second threshold and the sparse directory replacement rate is higher than a fourth threshold, determining that the updated shared threshold is half of the shared threshold.
  • The network conflict rate is the ratio of the difference between an actual delivery time and a theoretical delivery time to the theoretical delivery time.
  • The theoretical delivery time is the total time required for at least one test message to be delivered in the network when the network is unblocked.
  • The actual delivery time is the statistically obtained total time actually taken to deliver the at least one test message in the network.
  • The sparse directory replacement rate is the ratio between the number of read operations performed on the sparse directory within a specified time and the number of times the free storage space of the sparse directory is zero within that specified time.
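  • The definition of the network conflict rate above can be written as a single formula. The symbols below are introduced here for illustration only and do not appear in the source: $T_{\text{actual}}$ is the actual delivery time, $T_{\text{theory}}$ is the theoretical delivery time, and $R_{\text{net}}$ is the network conflict rate. The numbers in the second line are hypothetical.

```latex
R_{\text{net}} = \frac{T_{\text{actual}} - T_{\text{theory}}}{T_{\text{theory}}}
\qquad\text{e.g.}\qquad
T_{\text{theory}} = 100,\; T_{\text{actual}} = 190
\;\Rightarrow\;
R_{\text{net}} = \frac{190 - 100}{100} = 0.9 = 90\%
```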
  • The second aspect provides a multi-core processor-based consistency processing apparatus, including: a receiving module, configured to receive a consistency request message sent by a first kernel in a multi-core processor, where the consistency request message indicates the target shared data to be processed consistently; and a processing module, configured to select, according to the number of second kernels in the multi-core processor, one of a directory protocol or a listening protocol to perform consistency processing on the target shared data, where a second kernel is a kernel that shares the target shared data.
  • The target shared data is data in a target data page of the off-chip memory; the off-chip memory provides the target shared data to the multi-core processor.
  • the processing module includes:
  • a determining unit configured to determine whether the number of second kernels is greater than a predetermined shared threshold; the shared threshold is an integer greater than zero and less than the number of cores of the multi-core processor;
  • a first processing unit configured to: if the number of the second kernel is not greater than the shared threshold, use a listening protocol to perform consistency processing on the target shared data;
  • a second processing unit configured to perform consistency processing on the target shared data by using a directory protocol if the number of the second cores is greater than the shared threshold.
  • the consistency processing device of the multi-core processor further includes:
  • an update module configured to update the shared threshold according to a network conflict rate of the multi-core processor and a sparse directory replacement rate of the multi-core processor;
  • the network conflict rate indicates the degree of congestion of the network used to transmit messages between the cores of the multi-core processor;
  • the sparse directory replacement rate indicates the storage space occupancy of the sparse directory in the multi-core processor;
  • a deleting module configured to: if data in a first data page of the off-chip memory is cached in a kernel of the multi-core processor, delete the data in the first data page cached by that kernel, so that the sparse directory in the multi-core processor deletes the directory entry corresponding to the first data page. The directory entry corresponding to the first data page records the kernels that cache the data in each data block of the first data page; the first data page is a data page for which the number of kernels caching its data is greater than the updated shared threshold.
  • a first updating unit configured to determine that the updated shared threshold is twice the shared threshold if the network conflict rate is higher than a first threshold and the sparse directory replacement rate is lower than a third threshold;
  • a second updating unit configured to determine that the updated shared threshold is half of the shared threshold if the network conflict rate is lower than a second threshold and the sparse directory replacement rate is higher than a fourth threshold.
  • The network conflict rate is the ratio of the difference between an actual delivery time and a theoretical delivery time to the theoretical delivery time.
  • The theoretical delivery time is the total time required for at least one test message to be delivered in the network when the network is unblocked.
  • The actual delivery time is the statistically obtained total time actually taken to deliver the at least one test message in the network.
  • The sparse directory replacement rate is the ratio between the number of read operations performed on the sparse directory within a specified time and the number of times the free storage space of the sparse directory is zero within that specified time.
  • With the multi-core processor-based consistency processing method and apparatus provided by the embodiments of the present invention, the directory protocol or the listening protocol is selected according to the number of second kernels that share the target shared data, so the target shared data is processed with the protocol that suits it. Not all shared data in the multi-core processor is handled by the same protocol: part of the shared data uses the listening protocol and the rest uses the directory protocol. Compared with the prior art, the shared data that uses the listening protocol occupies no directory entries in the sparse directory, and the shared data that uses the directory protocol generates no broadcast messages during consistency processing, which improves the efficiency of consistency processing.
  • FIG. 1 is a schematic flowchart of a multi-core processor-based consistency processing method according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a multi-core processor-based consistency processing method according to another embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to another embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to still another embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a multi-core processor-based consistency processing method according to an embodiment of the present invention. As shown in FIG. 1, the embodiment may include the following steps:
  • 101. Receive a consistency request message sent by a first kernel in the multi-core processor. The consistency request message indicates the target shared data to be processed consistently.
  • 102. According to the number of second kernels in the multi-core processor, select one of a directory protocol or a listening protocol to perform consistency processing on the target shared data.
  • the second core is a kernel that shares the target shared data, and specifically may be a kernel that operates on the target shared data, and the operations include a read operation and a write operation.
  • It is determined whether the number of second kernels is greater than a predetermined shared threshold. If the number of second kernels is not greater than the shared threshold, the listening protocol is used to perform consistency processing on the target shared data; if the number of second kernels is greater than the shared threshold, the directory protocol is used to perform consistency processing on the target shared data.
  • the shared threshold is an integer greater than zero and less than the number of cores of the multi-core processor.
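  • The selection rule of step 102 can be sketched in a few lines. This is a minimal illustration, not the patented implementation; the names `Protocol` and `select_protocol` are hypothetical and do not come from the source.

```python
from enum import Enum


class Protocol(Enum):
    LISTENING = "listening"
    DIRECTORY = "directory"


def select_protocol(num_second_kernels: int, shared_threshold: int,
                    num_cores: int) -> Protocol:
    """Pick the consistency protocol for one piece of target shared data."""
    # The text requires the shared threshold to be an integer greater than
    # zero and less than the number of cores.
    if not 0 < shared_threshold < num_cores:
        raise ValueError("shared threshold must be > 0 and < number of cores")
    # Not greater than the threshold: use the listening protocol.
    if num_second_kernels <= shared_threshold:
        return Protocol.LISTENING
    # Greater than the threshold: use the directory protocol.
    return Protocol.DIRECTORY
```

  • With a shared threshold of 1, for example, data shared by a single kernel would go through the listening protocol and data shared by two or more kernels through the directory protocol.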
  • The shared threshold is determined according to a network conflict rate of the multi-core processor and a sparse directory replacement rate of the multi-core processor.
  • The network conflict rate indicates the degree of congestion of the network used to transmit messages between the cores of the multi-core processor; the sparse directory replacement rate indicates the storage space occupancy of the sparse directory in the multi-core processor.
  • In this embodiment, the directory protocol or the listening protocol is selected according to the number of second kernels that share the target shared data, so the target shared data is processed with the protocol that suits it. Not all shared data in the multi-core processor is handled by the same protocol: part of the shared data uses the listening protocol and the rest uses the directory protocol. Compared with the prior art, the shared data that uses the listening protocol occupies no directory entries in the sparse directory, and the shared data that uses the directory protocol generates no broadcast messages during consistency processing, which improves the efficiency of consistency processing.
  • FIG. 2 is a schematic flowchart of a multi-core processor-based consistency processing method according to another embodiment of the present invention.
  • The target shared data in this embodiment is data in a target data page of the off-chip memory. As shown in FIG. 2, this embodiment may include the following steps:
  • the consistency request message is used to indicate the target shared data to be processed consistently.
  • Before the first kernel reads or writes the target shared data, if the first kernel has not cached the target shared data, it sends a consistency request message in order to obtain the target shared data and to learn which kernels cache it. If the first kernel has already cached or acquired the target shared data, it sends a consistency request message to learn which kernels cache the target shared data, so that after the first kernel writes the target shared data, the copies of the target shared data cached in those kernels are deleted and the consistency processing is completed.
  • the shared list is used to record the identity of the kernel that shares data in each data page of the off-chip memory.
  • the second kernel is a kernel that shares the target shared data.
  • A kernel in the multi-core processor that performs a read or write operation on the target shared data is treated as a second kernel that shares the target shared data. The shared list stored in the TLB of the first kernel is queried by the storage address of the target shared data in the off-chip memory to obtain the kernel identifiers corresponding to that storage address, which gives the number of second kernels sharing the target shared data in the multi-core processor.
  • the shared list can be obtained by adding a page share list in the page table entry in the TLB.
  • The shared list includes, for each data page: the virtual address of the data page, the valid bit of the data page, the physical address of the data page, and the page share list of the data page. The page share list records the identifiers of the kernels that operate on the data in the data page.
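  • The shared list described above can be modelled as a small data structure. The following is a sketch under stated assumptions: the class names `TlbEntry` and `SharedList`, the method names, and the 4096-byte page size are all hypothetical and are not specified in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

PAGE_SIZE = 4096  # assumed page size in bytes; not given in the text


@dataclass
class TlbEntry:
    virtual_page: int    # virtual address of the data page (as a page number)
    valid: bool          # valid bit of the data page
    physical_page: int   # physical address of the data page (as a page number)
    # Page share list: identifiers of the kernels operating on the page.
    page_share_list: Set[int] = field(default_factory=set)


class SharedList:
    """Shared list kept in a kernel's TLB, indexed by physical page number."""

    def __init__(self) -> None:
        self._entries: Dict[int, TlbEntry] = {}

    def record_access(self, virtual_addr: int, physical_addr: int,
                      kernel_id: int) -> None:
        # Create the entry on first use, then add the accessing kernel.
        page = physical_addr // PAGE_SIZE
        entry = self._entries.setdefault(
            page, TlbEntry(virtual_addr // PAGE_SIZE, True, page))
        entry.page_share_list.add(kernel_id)

    def second_kernels(self, physical_addr: int) -> Set[int]:
        # Look up the page that holds the address and return its sharers.
        entry = self._entries.get(physical_addr // PAGE_SIZE)
        if entry is None or not entry.valid:
            return set()
        return set(entry.page_share_list)
```

  • The number of second kernels for an address is then `len(shared_list.second_kernels(addr))`, which is the value compared against the shared threshold.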
  • The shared threshold is an integer greater than zero and less than the number of cores of the multi-core processor. The initial shared threshold may be set to 1; in that case the listening protocol is used only when the target data page is in the exclusive state, that is, when the number of second kernels is 1, and the directory protocol is used otherwise.
  • the target shared data is consistently processed by using a listening protocol.
  • A message is multicast to each of the second kernels. Each second kernel that caches the target shared data sends the cached target shared data to the first kernel, and each second kernel that does not cache the target shared data sends the first kernel a response message indicating that the target shared data is not cached, so that the first kernel acquires the target shared data. If the operation that the first kernel performs on the acquired target shared data is a write operation, then after the first kernel performs the operation, the target shared data cached in the second kernels that cache it is deleted, and the second kernels recorded in the sparse directory as caching the target shared data are updated, which completes the consistency processing. Because the listening protocol is used for the target shared data, the target shared data does not occupy storage capacity in the sparse directory.
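  • The listening-protocol flow above can be illustrated with a toy model. This is a sketch only: the `Kernel` class, the `listening_access` function, the use of `None` as the "not cached" response, and the fallback to off-chip memory are assumptions made for illustration, and the sketch leaves out any sparse directory bookkeeping.

```python
from typing import Dict, Iterable, Optional


class Kernel:
    """Hypothetical model of one kernel with a private on-chip cache."""

    def __init__(self, kernel_id: int) -> None:
        self.kernel_id = kernel_id
        self.cache: Dict[int, bytes] = {}  # address -> cached copy

    def on_request(self, addr: int) -> Optional[bytes]:
        # Reply with the cached data, or None to signal "not cached".
        return self.cache.get(addr)


def listening_access(first: Kernel, second_kernels: Iterable[Kernel], addr: int,
                     off_chip: Dict[int, bytes],
                     new_value: Optional[bytes] = None) -> bytes:
    others = [k for k in second_kernels if k is not first]
    # Multicast to the second kernels only and collect their replies.
    replies = [k.on_request(addr) for k in others]
    data = next((r for r in replies if r is not None), None)
    if data is None:
        # Assumption: with no cached copy among the repliers, use the first
        # kernel's own copy if present, otherwise read off-chip memory.
        data = first.cache[addr] if addr in first.cache else off_chip[addr]
    first.cache[addr] = data
    if new_value is None:
        return data  # read operation: nothing to delete
    # Write operation: update the local copy and delete the stale copies.
    first.cache[addr] = new_value
    for k in others:
        k.cache.pop(addr, None)
    return new_value
```

  • The point of the sketch is that messages go only to the kernels named in the shared list rather than to every core.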
  • The second kernels that cache the target shared data are determined according to the sparse directory, which records the second kernels that cache the target shared data when the number of second kernels is greater than the shared threshold.
  • A data request message is sent to a second kernel that caches the target shared data, so that this second kernel sends the cached target shared data to the first kernel, which thereby acquires the target shared data.
  • If the operation performed by the first kernel on the acquired target shared data is a write operation, the target shared data cached in the second kernels that cache it is deleted, and the second kernels recorded in the sparse directory as caching the target shared data are updated, which completes the consistency processing.
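  • The directory-protocol flow above can be sketched in the same toy style. The function name `directory_access`, the dictionary layout of the caches and of the sparse directory, and the fallback to off-chip memory are assumptions made for illustration only.

```python
from typing import Dict, Optional, Set


def directory_access(first_id: int, addr: int,
                     caches: Dict[int, Dict[int, bytes]],
                     sparse_directory: Dict[int, Set[int]],
                     off_chip: Dict[int, bytes],
                     new_value: Optional[bytes] = None) -> bytes:
    # Look up the kernels that cache the data; no broadcast is needed.
    holders = sparse_directory.setdefault(addr, set())
    owner = next((k for k in sorted(holders) if k != first_id), None)
    if owner is not None:
        data = caches[owner][addr]      # data request message to one holder
    elif first_id in holders:
        data = caches[first_id][addr]   # the first kernel already has a copy
    else:
        data = off_chip[addr]           # assumption: fetch from off-chip memory
    caches[first_id][addr] = data
    holders.add(first_id)
    if new_value is None:
        return data                     # read operation
    # Write operation: delete the stale copies and update the directory entry.
    for k in list(holders):
        if k != first_id:
            del caches[k][addr]
    holders.intersection_update({first_id})
    caches[first_id][addr] = new_value
    return new_value
```

  • Compared with the listening sketch, the sharers are found by a directory lookup, at the cost of keeping one directory entry per tracked address.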
  • The network conflict rate indicates the degree of congestion of the network used to transmit messages between the cores of the multi-core processor.
  • The sparse directory replacement rate indicates the storage space occupancy of the sparse directory in the multi-core processor.
  • The network conflict rate is the ratio of the difference between the actual delivery time and the theoretical delivery time to the theoretical delivery time.
  • The theoretical delivery time is the total time required for at least one test message to be delivered in the network when the network is unblocked; the actual delivery time is the statistically obtained total time actually taken to deliver the at least one test message in the network.
  • The sparse directory replacement rate is the ratio between the number of read operations performed on the sparse directory within a specified time and the number of times the free storage space of the sparse directory is zero within that specified time.
  • After the target shared data has been consistently processed by using the directory protocol or the listening protocol, if the network conflict rate is higher than the first threshold (for example, 80%) and the sparse directory replacement rate is lower than the third threshold (for example, 1%), the updated shared threshold is determined to be twice the shared threshold. If the network conflict rate is lower than the second threshold (for example, 2%) and the sparse directory replacement rate is higher than the fourth threshold (for example, 15%), the updated shared threshold is determined to be half of the shared threshold. Otherwise, no update is performed.
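  • The update rule above can be sketched as a single function. The default values are the example thresholds quoted in the text (80%, 2%, 1%, 15%); the function and parameter names are hypothetical. The text does not say what happens when doubling or halving would leave the allowed range, so the clamping on the last line is an assumption.

```python
def update_shared_threshold(shared_threshold: int,
                            network_conflict_rate: float,
                            directory_replacement_rate: float,
                            num_cores: int,
                            first: float = 0.80, second: float = 0.02,
                            third: float = 0.01, fourth: float = 0.15) -> int:
    """Return the updated shared threshold according to the two measured rates."""
    if network_conflict_rate > first and directory_replacement_rate < third:
        updated = shared_threshold * 2    # rule 1 in the text: double
    elif network_conflict_rate < second and directory_replacement_rate > fourth:
        updated = shared_threshold // 2   # rule 2 in the text: halve
    else:
        return shared_threshold           # otherwise: no update
    # Assumption: clamp so the result stays an integer greater than zero
    # and less than the number of cores.
    return max(1, min(updated, num_cores - 1))
```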
  • the multi-core processor in this embodiment determines the sharing threshold according to the degree of network congestion and the spatial occupancy of the sparse directory, and dynamically adjusts the proportion of the directory protocol and the listening protocol to achieve an optimal balance point, thereby making the network Congestion levels and sparse directory space occupancy levels are dynamically balanced, improving consistency efficiency and improving multi-core processor performance.
  • The shared threshold may also be updated after a specified number of clock cycles of the multi-core processor.
  • If data in a first data page of the off-chip memory is cached in a kernel of the multi-core processor, the data in the first data page cached in that kernel is deleted, so that the sparse directory in the multi-core processor deletes the directory entry corresponding to the first data page. The first data page is a data page for which the number of kernels caching its data is greater than the updated shared threshold. The directory entry corresponding to the first data page records the kernels that cache the data in each data block of the first data page.
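  • The deletion step above can be sketched as follows. The layout of the sparse directory (page number, then block address, then set of kernel identifiers), the 4096-byte page size, and the function name are assumptions made for illustration; the sketch also assumes that only pages with a directory entry are candidates.

```python
from typing import Dict, List, Set


def flush_pages_over_threshold(caches: Dict[int, Dict[int, bytes]],
                               sparse_directory: Dict[int, Dict[int, Set[int]]],
                               updated_threshold: int,
                               page_size: int = 4096) -> List[int]:
    """Delete cached data and directory entries of pages with too many sharers."""
    flushed = []
    for page in list(sparse_directory):
        blocks = sparse_directory[page]
        # Kernels caching any data block of this page.
        sharers = set().union(*blocks.values())
        if len(sharers) > updated_threshold:
            for kernel_id in sharers:
                cache = caches[kernel_id]
                # Delete every cached address that falls inside the page.
                for addr in [a for a in cache if a // page_size == page]:
                    del cache[addr]
            # The directory entry for the page is deleted as well.
            del sparse_directory[page]
            flushed.append(page)
    return flushed
```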
  • In this embodiment, the directory protocol or the listening protocol is selected according to the number of second kernels that share the target shared data, so the target shared data is processed with the protocol that suits it. Not all shared data in the multi-core processor is handled by the same protocol: part of the shared data uses the listening protocol and the rest uses the directory protocol. Compared with the prior art, the shared data that uses the listening protocol occupies no directory entries in the sparse directory, and the shared data that uses the directory protocol generates no broadcast messages during consistency processing, which improves the efficiency of consistency processing.
  • FIG. 3 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus includes: a receiving module 31 and a processing module 32.
  • the receiving module 31 is configured to receive a consistency request message sent by the first core in the multi-core processor.
  • the consistency request message is used to indicate the target shared data to be processed consistently.
  • the processing module 32 is connected to the receiving module 31, and is configured to perform consistency processing on the target shared data by using one of a directory protocol or a listening protocol according to the number of the second cores in the multi-core processor.
  • the second kernel is a kernel that shares the target shared data.
  • It is determined whether the number of second kernels is greater than a predetermined shared threshold. If the number of second kernels is not greater than the shared threshold, the listening protocol is used to perform consistency processing on the target shared data; if the number of second kernels is greater than the shared threshold, the directory protocol is used to perform consistency processing on the target shared data.
  • the shared threshold is an integer greater than zero and less than the number of cores of the multi-core processor.
  • The functional modules of the multi-core processor-based consistency processing apparatus provided in this embodiment may be used to execute the flow of the multi-core processor-based consistency processing method shown in FIG. 1. The specific working principle is not described again here; see the description of the method embodiment.
  • In this embodiment, the directory protocol or the listening protocol is selected according to the number of second kernels that share the target shared data, so the target shared data is processed with the protocol that suits it. Not all shared data in the multi-core processor is handled by the same protocol: part of the shared data uses the listening protocol and the rest uses the directory protocol. Compared with the prior art, the shared data that uses the listening protocol occupies no directory entries in the sparse directory, and the shared data that uses the directory protocol generates no broadcast messages during consistency processing, which improves the efficiency of consistency processing.
  • FIG. 4 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to another embodiment of the present invention.
  • The processing module 32 in this embodiment includes: a determining unit 321, a first processing unit 322, and a second processing unit 323.
  • The determining unit 321 is configured to determine whether the number of second kernels is greater than a predetermined shared threshold; the shared threshold is an integer greater than zero and less than the number of cores of the multi-core processor.
  • The first processing unit 322 is connected to the determining unit 321 and is configured to perform consistency processing on the target shared data by using the listening protocol if the number of second kernels is not greater than the shared threshold.
  • The second processing unit 323 is connected to the determining unit 321 and is configured to perform consistency processing on the target shared data by using the directory protocol if the number of second kernels is greater than the shared threshold.
  • the target shared data is data in a target data page of off-chip memory.
  • the off-chip memory is connected to the multi-core processor and is used to store the target shared data.
  • the multi-core processor-based coherency processing device further includes: an update module 33 and a delete module 34.
  • The update module 33 is connected to the processing module 32 and is configured to update the shared threshold according to the network conflict rate of the multi-core processor and the sparse directory replacement rate of the multi-core processor.
  • The network conflict rate indicates the degree of congestion of the network used to transmit messages between the cores of the multi-core processor; the sparse directory replacement rate indicates the storage space occupancy of the sparse directory in the multi-core processor.
  • The deleting module 34 is connected to the update module 33 and is configured to: if data in a first data page of the off-chip memory is cached in a kernel of the multi-core processor, delete the data in the first data page cached by that kernel, so that the sparse directory in the multi-core processor deletes the directory entry corresponding to the first data page.
  • The directory entry corresponding to the first data page records the kernels that cache the data in each data block of the first data page; the first data page is a data page for which the number of kernels caching its data is greater than the updated shared threshold.
  • the update module 33 includes a first update unit 331 and a second update unit 332.
  • The first update unit 331 is configured to determine that the updated shared threshold is twice the shared threshold if the network conflict rate is higher than a first threshold and the sparse directory replacement rate is lower than a third threshold.
  • The network conflict rate is the ratio of the difference between the actual delivery time and the theoretical delivery time to the theoretical delivery time.
  • The theoretical delivery time is the total time required for at least one test message to be delivered in the network when the network is unblocked; the actual delivery time is the statistically obtained total time actually taken to deliver the at least one test message in the network.
  • The sparse directory replacement rate is the ratio between the number of read operations performed on the sparse directory within a specified time and the number of times the free storage space of the sparse directory is zero within that specified time.
  • The second update unit 332 is configured to determine that the updated shared threshold is half of the shared threshold if the network conflict rate is lower than a second threshold and the sparse directory replacement rate is higher than a fourth threshold.
  • The functional modules of the multi-core processor-based consistency processing apparatus provided in this embodiment may be used to execute the flow of the multi-core processor-based consistency processing method shown in FIG. 2. The specific working principle is not described again here; see the description of the method embodiment.
  • In this embodiment, the directory protocol or the listening protocol is selected according to the number of second kernels that share the target shared data, so the target shared data is processed with the protocol that suits it. Not all shared data in the multi-core processor is handled by the same protocol: part of the shared data uses the listening protocol and the rest uses the directory protocol. Compared with the prior art, the shared data that uses the listening protocol occupies no directory entries in the sparse directory, and the shared data that uses the directory protocol generates no broadcast messages during consistency processing, which improves the efficiency of consistency processing.
  • FIG. 5 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to another embodiment of the present invention. As shown in FIG. 5, the apparatus includes a communication interface 51, a processor 52, and a memory 53.
  • The communication interface 51 is configured to receive a consistency request message sent by the first core in the multi-core processor.
  • The consistency request message is used to indicate the target shared data on which consistency processing is to be performed.
  • The memory 53 is used to store a program.
  • The program may include program code, and the program code includes computer operating instructions.
  • The memory 53 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
  • The processor 52 is configured to execute the program stored in the memory 53, so as to: select, according to the number of second cores in the multi-core processor, one of the directory protocol and the snooping protocol to perform consistency processing on the target shared data.
  • A second core is a core that shares the target shared data.
  • This selection includes: determining whether the number of second cores is greater than a predetermined sharing threshold; if the number of second cores is not greater than the sharing threshold, performing consistency processing on the target shared data by using the snooping protocol; and if the number of second cores is greater than the sharing threshold, performing consistency processing on the target shared data by using the directory protocol.
  • The sharing threshold is an integer greater than zero and less than the number of cores of the multi-core processor.
  • The target shared data is data in a target data page of off-chip memory.
  • The processor 52 is further configured to: update the sharing threshold according to the network conflict rate of the multi-core processor and the sparse directory replacement rate of the multi-core processor; and, if the cores of the multi-core processor cache data in a first data page of the off-chip memory, delete the data in the first data page cached by the cores of the multi-core processor, so that the sparse directory in the multi-core processor deletes the directory entry corresponding to the first data page.
  • The directory entry corresponding to the first data page is used to record the cores that cache the data in each data block of the first data page; the first data page is a data page for which the number of cores caching its data is greater than the updated sharing threshold.
  • The network conflict rate is the ratio of the difference between the actual delivery time and the theoretical delivery time to the theoretical delivery time. The theoretical delivery time is the calculated total time required to deliver at least one test message in the network when the network is uncongested; the actual delivery time is the measured total time actually taken to deliver the at least one test message in the network.
  • The sparse directory replacement rate is the ratio of the number of read operations that the sparse directory performs within a specified time to the number of times the free storage space of the sparse directory is zero within that specified time.
  • The functional modules of the multi-core processor-based consistency processing apparatus provided in this embodiment may be used to execute the procedure of the multi-core processor-based consistency processing method shown in FIG. 2. The specific working principle is not described again; see the description of the method embodiment.
  • Consistency processing is performed on the target shared data by using the directory protocol or the snooping protocol, chosen according to the number of second cores that share the target shared data, so that a suitable protocol is applied to the target shared data. In addition, because not all shared data in the multi-core processor is processed with the same protocol (part of the shared data uses the snooping protocol and the rest uses the directory protocol), this approach, compared with the prior art, saves the directory entries that the shared data using the snooping protocol would otherwise occupy in the sparse directory and also saves the broadcast messages that the shared data using the directory protocol would otherwise generate during consistency processing, thereby improving consistency processing efficiency.
  • The aforementioned program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
  • The invention is not limited thereto. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications and replacements do not depart from the scope of the technical solutions of the embodiments of the present invention.

Abstract

A consistency processing method and device based on a multi-core processor. According to the number of second cores that share target shared data in a multi-core processor, consistency processing is performed on the target shared data by using a directory protocol or a snooping protocol, so that the consistency processing is performed on the target shared data with an appropriate protocol. In addition, because not all shared data in the multi-core processor is processed with the same protocol, one part of the shared data uses the snooping protocol and the other part uses the directory protocol. Compared with the prior art, this saves both the directory entries that the shared data using the snooping protocol would occupy in a sparse directory and the broadcast messages that the shared data using the directory protocol would produce during consistency processing, thus improving consistency processing efficiency.

Description

Multi-core processor-based consistency processing method and apparatus
This application claims priority to Chinese Patent Application No. 201310260830.3, filed with the Chinese Patent Office on June 26, 2013 and entitled "Multi-core processor-based consistency processing method and apparatus", which is incorporated herein by reference in its entirety.
Technical field
The present invention relates to data storage technologies, and in particular to a multi-core processor-based consistency processing method and apparatus.
Background
A multi-core processor is a processor in which multiple cores are integrated. The multi-core processor is connected to off-chip memory that stores shared data; the off-chip memory includes multiple data pages, and each data page includes multiple data blocks. The cores of the multi-core processor are interconnected by a network that is used to pass messages between the cores. Each core of the multi-core processor includes a processor core, an on-chip cache, a translation lookaside buffer (TLB), and a sparse directory. Before operating on shared data in the off-chip memory, a core must first cache the shared data in its on-chip cache. If the on-chip caches of at least two cores hold the same shared data from the off-chip memory and one of these cores writes to its cached copy, the written shared data becomes inconsistent with the copies cached in the other cores. The multi-core processor therefore needs to maintain the consistency of shared data.
In the prior art, a multi-core processor maintains the consistency of shared data by using either a directory protocol or a snooping protocol. The directory protocol needs to store directory entries in the sparse directory and uses the directory entries to record the cores that cache each piece of shared data, so that consistency processing is performed on the shared data stored in the cores according to the directory entries in the sparse directory. As a result, a multi-core processor that uses the directory protocol has a high storage overhead and low consistency processing efficiency. The snooping protocol does not need to store directory entries in the sparse directory, but it must use broadcast messages to find the cores that cache the shared data before performing consistency processing on the shared data. As a result, a multi-core processor that uses the snooping protocol has a high signaling overhead, which likewise leads to low consistency processing efficiency.
Summary
Embodiments of the present invention provide a multi-core processor-based consistency processing method and apparatus, which are used to improve consistency processing efficiency.
A first aspect provides a multi-core processor-based consistency processing method, including: receiving a consistency request message sent by a first core in a multi-core processor, where the consistency request message is used to indicate target shared data on which consistency processing is to be performed; and selecting, according to the number of second cores in the multi-core processor, one of a directory protocol and a snooping protocol to perform consistency processing on the target shared data, where a second core is a core that shares the target shared data.
In a first possible implementation of the first aspect, the target shared data is data in a target data page of off-chip memory, and the off-chip memory is used to provide the target shared data for the multi-core processor.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the selecting, according to the number of second cores in the multi-core processor, one of a directory protocol and a snooping protocol to perform consistency processing on the target shared data includes: determining whether the number of second cores is greater than a predetermined sharing threshold, where the sharing threshold is an integer greater than zero and less than the number of cores of the multi-core processor; if the number of second cores is not greater than the sharing threshold, performing consistency processing on the target shared data by using the snooping protocol; and if the number of second cores is greater than the sharing threshold, performing consistency processing on the target shared data by using the directory protocol.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, after the selecting, according to the number of second cores in the multi-core processor, one of a directory protocol and a snooping protocol to perform consistency processing on the target shared data, the method includes: updating the sharing threshold according to a network conflict rate of the multi-core processor and a sparse directory replacement rate of the multi-core processor, where the network conflict rate indicates the degree of congestion of the network used to pass messages between the cores of the multi-core processor, and the sparse directory replacement rate indicates the degree of storage space occupancy of the sparse directory in the multi-core processor; and, if the cores of the multi-core processor cache data in a first data page of the off-chip memory, deleting the data in the first data page cached by the cores of the multi-core processor, so that the sparse directory in the multi-core processor deletes the directory entry corresponding to the first data page. The directory entry corresponding to the first data page is used to record the cores that cache the data in each data block of the first data page, and the first data page is a data page for which the number of cores caching its data is greater than the updated sharing threshold.
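The deletion step described above can be sketched as follows. This is only an illustrative model, not the claimed implementation: the function name `flush_pages_over_threshold`, the dictionary-based caches, and the page-level (rather than data-block-level) bookkeeping are all assumptions made for the example.

```python
def flush_pages_over_threshold(page_sharers, core_caches, sparse_directory, new_threshold):
    """Drop cached data and directory entries for pages shared by too many cores.

    page_sharers:     hypothetical map, page id -> set of core ids caching that page
    core_caches:      hypothetical map, core id -> set of page ids held in its cache
    sparse_directory: hypothetical map, page id -> directory entry for that page
    """
    flushed = []
    for page, sharers in page_sharers.items():
        # A "first data page" is one cached by more cores than the updated threshold.
        if len(sharers) > new_threshold:
            for core in sharers:
                core_caches[core].discard(page)   # delete the cached page data
            sharers.clear()                       # no core caches the page any more
            sparse_directory.pop(page, None)      # its directory entry is removed
            flushed.append(page)
    return flushed
```

With three cores caching page `p0` and an updated sharing threshold of 2, only `p0` is flushed; a page cached by a single core is left alone.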
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the updating the sharing threshold according to the network conflict rate and the sparse directory replacement rate includes: if the network conflict rate is higher than a first threshold and the sparse directory replacement rate is lower than a third threshold, determining that the updated sharing threshold is twice the sharing threshold; and if the network conflict rate is lower than a second threshold and the sparse directory replacement rate is higher than a fourth threshold, determining that the updated sharing threshold is half of the sharing threshold.
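The doubling and halving rule above can be sketched as a small function. The function and parameter names are hypothetical, and two details are assumptions of this sketch rather than statements of the text: the threshold is left unchanged when neither condition holds, and the result is clamped so that it stays an integer greater than zero and less than the number of cores.

```python
def update_sharing_threshold(threshold, conflict_rate, replacement_rate,
                             first, second, third, fourth, core_count):
    """Return the updated sharing threshold.

    first/second bound the network conflict rate; third/fourth bound the
    sparse directory replacement rate (all four are tunable thresholds).
    """
    if conflict_rate > first and replacement_rate < third:
        new_threshold = threshold * 2        # twice the sharing threshold
    elif conflict_rate < second and replacement_rate > fourth:
        new_threshold = threshold // 2       # half of the sharing threshold
    else:
        new_threshold = threshold            # assumption: otherwise unchanged
    # Assumption: clamp into the allowed range 0 < threshold < core_count.
    return max(1, min(new_threshold, core_count - 1))
```

The integer division and the clamp are design choices of the sketch; for example, halving a threshold of 1 would give 0, which the text does not allow, so the sketch keeps it at 1.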
With reference to the third possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the network conflict rate is the ratio of the difference between an actual delivery time and a theoretical delivery time to the theoretical delivery time. The theoretical delivery time is the calculated total time required to deliver at least one test message in the network when the network is uncongested; the actual delivery time is the measured total time actually taken to deliver the at least one test message in the network.
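As a worked illustration of this definition, the following sketch computes the network conflict rate from per-message times. The function name and the use of lists of per-message times are assumptions for the example; the text only specifies the totals and their ratio.

```python
def network_conflict_rate(actual_times, theoretical_times):
    """(actual total - theoretical total) / theoretical total for the test messages."""
    actual_total = sum(actual_times)            # measured delivery time
    theoretical_total = sum(theoretical_times)  # computed for an uncongested network
    if theoretical_total <= 0:
        raise ValueError("theoretical delivery time must be positive")
    return (actual_total - theoretical_total) / theoretical_total
```

For two test messages that should each take 10 time units but actually take 12 and 18, the rate is (30 - 20) / 20 = 0.5; a rate of 0 means the messages were delivered as fast as the uncongested network allows.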
With reference to the third possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the sparse directory replacement rate is the ratio of the number of read operations that the sparse directory performs within a specified time to the number of times the free storage space of the sparse directory is zero within that specified time.
A second aspect provides a multi-core processor-based consistency processing apparatus, including:
a receiving module, configured to receive a consistency request message sent by a first core in a multi-core processor, where the consistency request message is used to indicate target shared data on which consistency processing is to be performed; and
a processing module, configured to select, according to the number of second cores in the multi-core processor, one of a directory protocol and a snooping protocol to perform consistency processing on the target shared data, where a second core is a core that shares the target shared data.
In a first possible implementation of the second aspect, the target shared data is data in a target data page of off-chip memory, and the off-chip memory is used to provide the target shared data for the multi-core processor.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the processing module includes:
a determining unit, configured to determine whether the number of second cores is greater than a predetermined sharing threshold, where the sharing threshold is an integer greater than zero and less than the number of cores of the multi-core processor;
a first processing unit, configured to: if the number of second cores is not greater than the sharing threshold, perform consistency processing on the target shared data by using the snooping protocol; and
a second processing unit, configured to: if the number of second cores is greater than the sharing threshold, perform consistency processing on the target shared data by using the directory protocol.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the multi-core processor-based consistency processing apparatus further includes:
an update module, configured to update the sharing threshold according to a network conflict rate of the multi-core processor and a sparse directory replacement rate of the multi-core processor, where the network conflict rate indicates the degree of congestion of the network used to pass messages between the cores of the multi-core processor, and the sparse directory replacement rate indicates the degree of storage space occupancy of the sparse directory in the multi-core processor; and
a deletion module, configured to: if the cores of the multi-core processor cache data in a first data page of the off-chip memory, delete the data in the first data page cached by the cores of the multi-core processor, so that the sparse directory in the multi-core processor deletes the directory entry corresponding to the first data page, where the directory entry corresponding to the first data page is used to record the cores that cache the data in each data block of the first data page, and the first data page is a data page for which the number of cores caching its data is greater than the updated sharing threshold.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the update module includes:
a first updating unit, configured to: if the network conflict rate is higher than a first threshold and the sparse directory replacement rate is lower than a third threshold, determine that the updated sharing threshold is twice the sharing threshold; and
a second updating unit, configured to: if the network conflict rate is lower than a second threshold and the sparse directory replacement rate is higher than a fourth threshold, determine that the updated sharing threshold is half of the sharing threshold.
With reference to the third possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the network conflict rate is the ratio of the difference between an actual delivery time and a theoretical delivery time to the theoretical delivery time. The theoretical delivery time is the calculated total time required to deliver at least one test message in the network when the network is uncongested; the actual delivery time is the measured total time actually taken to deliver the at least one test message in the network.
With reference to the third possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the sparse directory replacement rate is the ratio of the number of read operations that the sparse directory performs within a specified time to the number of times the free storage space of the sparse directory is zero within that specified time.
According to the multi-core processor-based consistency processing method and apparatus provided in the embodiments of the present invention, consistency processing is performed on the target shared data by using the directory protocol or the snooping protocol according to the number of second cores that share the target shared data in the multi-core processor, so that a suitable protocol is applied to the target shared data. In addition, because not all shared data in the multi-core processor is processed with the same protocol (part of the shared data uses the snooping protocol and the rest uses the directory protocol), this approach, compared with the prior art, saves the directory entries that the shared data using the snooping protocol would otherwise occupy in the sparse directory and also saves the broadcast messages that the shared data using the directory protocol would otherwise generate during consistency processing, thereby improving consistency processing efficiency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a multi-core processor-based consistency processing method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a multi-core processor-based consistency processing method according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to still another embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
FIG. 1 is a schematic flowchart of a multi-core processor-based consistency processing method according to an embodiment of the present invention. As shown in FIG. 1, this embodiment may include the following steps:
101. Receive a consistency request message sent by a first core in the multi-core processor.
The consistency request message is used to indicate target shared data on which consistency processing is to be performed.
102. Select, according to the number of second cores in the multi-core processor, one of a directory protocol and a snooping protocol to perform consistency processing on the target shared data.
A second core is a core that shares the target shared data, and may specifically be a core that operates on the target shared data, where the operations include read operations and write operations.
Optionally, it is determined whether the number of second cores is greater than a predetermined sharing threshold. If the number of second cores is not greater than the sharing threshold, consistency processing is performed on the target shared data by using the snooping protocol; if the number of second cores is greater than the sharing threshold, consistency processing is performed on the target shared data by using the directory protocol. The sharing threshold is an integer greater than zero and less than the number of cores of the multi-core processor, and is determined according to the network conflict rate of the multi-core processor and the sparse directory replacement rate of the multi-core processor. The network conflict rate indicates the degree of congestion of the network used to pass messages between the cores of the multi-core processor; the sparse directory replacement rate indicates the degree of storage space occupancy of the sparse directory in the multi-core processor.
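The threshold comparison in step 102 can be sketched as follows. The `Protocol` enum and the `select_protocol` function are hypothetical names introduced for illustration; only the comparison itself and the allowed range of the sharing threshold come from the description above.

```python
from enum import Enum


class Protocol(Enum):
    SNOOPING = "snooping"
    DIRECTORY = "directory"


def select_protocol(second_core_count: int, sharing_threshold: int, core_count: int) -> Protocol:
    """Choose the protocol for one piece of target shared data."""
    # The sharing threshold must be an integer in the open range (0, core_count).
    if not 0 < sharing_threshold < core_count:
        raise ValueError("sharing threshold must be greater than 0 and less than core_count")
    # Number of second cores not greater than the threshold: snooping protocol.
    if second_core_count <= sharing_threshold:
        return Protocol.SNOOPING
    # Number of second cores greater than the threshold: directory protocol.
    return Protocol.DIRECTORY
```

For a 16-core processor with a sharing threshold of 1, data shared by one core is handled with the snooping protocol and data shared by two or more cores with the directory protocol.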
In this embodiment, consistency processing is performed on the target shared data by using the directory protocol or the snooping protocol according to the number of second cores that share the target shared data in the multi-core processor, so that a suitable protocol is applied to the target shared data. In addition, because not all shared data in the multi-core processor is processed with the same protocol (part of the shared data uses the snooping protocol and the rest uses the directory protocol), this approach, compared with the prior art, saves the directory entries that the shared data using the snooping protocol would otherwise occupy in the sparse directory and also saves the broadcast messages that the shared data using the directory protocol would otherwise generate during consistency processing, thereby improving consistency processing efficiency.
FIG. 2 is a schematic flowchart of a multi-core processor-based consistency processing method according to another embodiment of the present invention. In this embodiment, the target shared data is data in a target data page of off-chip memory. As shown in FIG. 2, this embodiment may include the following steps:
201. Receive a consistency request message sent by a first core in the multi-core processor.
The consistency request message is used to indicate target shared data on which consistency processing is to be performed.
Before the first core performs a read or write operation on the target shared data, if the first core has not cached the target shared data, it sends a consistency request message to obtain the target shared data and to learn which cores cache the target shared data. If the first core has cached the target shared data or has obtained the target shared data, the first core sends a consistency request message to learn which cores cache the target shared data, so that after the first core writes to the target shared data, the target shared data is deleted from the cores that are known to cache it, which completes the consistency processing.
202. Query the sharing list stored in the TLB of the first core to obtain the number of second cores in the multi-core processor.
The sharing list is used to record the identifiers of the cores that share the data in each data page of the off-chip memory. A second core is a core that shares the target shared data.
Optionally, the cores in the multi-core processor that have performed a read or write operation on the target shared data are taken as the second cores that share the target shared data. The sharing list stored in the TLB of the first core is queried according to the storage address of the target shared data in the off-chip memory to obtain the core identifiers corresponding to that storage address, and thereby the number of second cores that share the target shared data in the multi-core processor. The sharing list may be obtained by adding a page sharing list to the page table entries in the TLB. The sharing list includes the virtual address of a data page, the valid bit of the data page, the physical address of the data page, and the page sharing list of the data page; the page sharing list is used to record the identifiers of the cores that operate on the data in the data page.
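One way to picture the extended TLB entry and the lookup in step 202 is the sketch below. The four fields follow the description above; the class and function names, the 4 KiB page size, and keying the table by physical page number are assumptions made only for this example.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size; the text does not specify one


@dataclass
class TlbEntry:
    virtual_page: int      # virtual address of the data page
    valid: bool            # valid bit of the data page
    physical_page: int     # physical address of the data page
    page_sharers: set = field(default_factory=set)  # page sharing list: core ids


def count_second_cores(tlb: dict, storage_address: int) -> int:
    """Return the number of cores recorded as sharing the page that holds the address."""
    page = storage_address // PAGE_SIZE   # page containing the target shared data
    entry = tlb.get(page)                 # tlb is keyed by physical page number (assumption)
    if entry is None or not entry.valid:
        return 0                          # no valid entry: no known sharers
    return len(entry.page_sharers)
```

For instance, if the entry for physical page 2 lists cores 0 and 3 in its page sharing list, any address inside that page yields a second-core count of 2.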
203. Determine whether the number of second cores is greater than a predetermined sharing threshold. If so, perform 205; otherwise, perform 204.
The sharing threshold is an integer greater than zero and less than the number of cores in the multi-core processor. The initial sharing threshold may be set to 1; that is, the snooping protocol is used only when the target data page is in the exclusive state, in other words when the number of second cores is 1, and the directory protocol is used otherwise.
204. If the number of second cores is not greater than the sharing threshold, perform consistency processing on the target shared data using the snooping protocol.
Optionally, if the number of second cores is not greater than the sharing threshold, a snoop message is multicast to each second core. Each second core that has cached the target shared data sends its cached copy to the first core, and each second core that has not cached it sends the first core a response message indicating that the target shared data is not cached; the first core thereby obtains the target shared data. If the operation that the first core performs on the obtained target shared data is a write operation, then after the first core performs the operation, the copies of the target shared data cached in the second cores are deleted and the record in the sparse directory of the second cores caching the target shared data is updated, completing the consistency processing. Because the snooping protocol is used for the consistency processing, the target shared data does not occupy storage capacity in the sparse directory.
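The snooping path of step 204 can be sketched as follows, under the simplifying assumption that each core's cache is a dictionary from address to data. The functions `multicast_snoop` and `invalidate_copies` are hypothetical names; real hardware would exchange messages over the on-chip network rather than call functions.

```python
from typing import Dict, List, Optional, Tuple

Cache = Dict[int, bytes]  # assumption: address -> cached data block


def multicast_snoop(caches: Dict[int, Cache], second_cores: List[int],
                    addr: int) -> Tuple[Optional[bytes], List[int]]:
    """Snoop every second core; return the data and the cores that hold a copy."""
    data: Optional[bytes] = None
    holders: List[int] = []
    for core in second_cores:
        block = caches[core].get(addr)
        if block is not None:
            data = block          # reply carrying the cached target shared data
            holders.append(core)
        # otherwise the core replies that the block is not cached
    return data, holders


def invalidate_copies(caches: Dict[int, Cache], holders: List[int], addr: int) -> None:
    """After the first core has written the block, delete the remote copies."""
    for core in holders:
        caches[core].pop(addr, None)
```

Only the cores named in the sharing list are contacted, so the message count grows with the number of second cores rather than with the total number of cores.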
205. If the number of second cores is greater than the sharing threshold, perform consistency processing on the target shared data using the directory protocol.
Optionally, if the number of second cores is greater than the sharing threshold, the second cores that have cached the target shared data are determined from the sparse directory, where the sparse directory records the second cores that have cached the target shared data when the number of second cores is greater than the sharing threshold. A data request message is sent to a second core that has cached the target shared data, so that this second core sends its cached copy to the first core, which thereby obtains the target shared data. If the operation that the first core performs on the obtained target shared data is a write operation, then after the first core performs the operation, the copies of the target shared data cached in the second cores are deleted and the record in the sparse directory of the second cores caching the target shared data is updated, completing the consistency processing.
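Steps 203 to 205 amount to a single comparison, sketched below. The function name `choose_protocol` and the string return values are assumptions for illustration; the range check simply restates the constraint that the sharing threshold is greater than zero and less than the number of cores.

```python
def choose_protocol(num_second_cores: int, sharing_threshold: int, num_cores: int) -> str:
    """Return 'directory' or 'snooping' for one consistency request (steps 203 to 205)."""
    if not 0 < sharing_threshold < num_cores:
        raise ValueError("sharing threshold must be greater than 0 and less than num_cores")
    if num_second_cores > sharing_threshold:
        return "directory"  # step 205: look the sharers up in the sparse directory
    return "snooping"       # step 204: multicast a snoop message to the sharers
```

With the initial threshold of 1, a page shared by a single core is handled by snooping and any page with two or more sharers is handled by the directory.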
206. Calculate the network collision rate of the multi-core processor and the sparse directory replacement rate of the multi-core processor.
The network collision rate indicates the degree of congestion of the network used to pass messages between the cores of the multi-core processor. The sparse directory replacement rate indicates the degree to which the storage space of the sparse directory in the multi-core processor is occupied.
Optionally, the two rates are calculated after 204 or 205 is performed. The network collision rate is the ratio of the difference between the actual delivery time and the theoretical delivery time to the theoretical delivery time. The theoretical delivery time is the calculated total time required for at least one test message to be delivered over the network when the network is uncongested; the actual delivery time is the measured total time actually taken to deliver the same at least one test message over the network. The sparse directory replacement rate is the ratio of the number of read operations that the sparse directory performs within a specified time to the number of times within that specified time that the free storage space of the sparse directory is zero.
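The two ratios defined above can be written out directly. The sketch below is illustrative only: the function and parameter names are hypothetical, and raising `ValueError` for a zero denominator is an assumption, since the embodiment does not say how that case is handled.

```python
def network_collision_rate(actual_time: float, theoretical_time: float) -> float:
    """(actual delivery time - theoretical delivery time) / theoretical delivery time."""
    if theoretical_time <= 0:
        raise ValueError("theoretical delivery time must be positive")
    return (actual_time - theoretical_time) / theoretical_time


def sparse_directory_replacement_rate(read_count: int, full_count: int) -> float:
    """Reads performed by the sparse directory / times its free space was zero,
    both counted over the same specified time."""
    if full_count == 0:
        raise ValueError("ratio is undefined when the directory was never full")
    return read_count / full_count
```

For example, test messages that should take 100 cycles on an uncongested network but actually take 180 cycles give a network collision rate of (180 - 100) / 100 = 0.8, that is, 80%.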
207. Update the sharing threshold according to the network collision rate of the multi-core processor and the sparse directory replacement rate of the multi-core processor.
Optionally, after the directory protocol or the snooping protocol has been used for consistency processing of the target shared data: if the network collision rate is higher than a first threshold, for example 80%, and the sparse directory replacement rate is lower than a third threshold, for example 1%, the updated sharing threshold is set to twice the sharing threshold; if the network collision rate is lower than a second threshold, for example 2%, and the sparse directory replacement rate is higher than a fourth threshold, for example 15%, the updated sharing threshold is set to half the sharing threshold; otherwise, no update is performed. The multi-core processor in this embodiment determines the sharing threshold according to the degree of network congestion and the degree of space occupancy of the sparse directory, and thereby dynamically adjusts the proportion in which the directory protocol and the snooping protocol are used until it reaches the best balance point. Network congestion and sparse directory occupancy are thus kept in dynamic balance, which improves the efficiency of consistency processing and the performance of the multi-core processor.
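The update rule of step 207 can be sketched as below, using the example percentages from the text as default parameters. The function name, the use of integer floor division for halving, and the clamping of the result to the range from 1 to `num_cores - 1` are assumptions; the clamping is inferred from the requirement that the sharing threshold be an integer greater than zero and less than the number of cores.

```python
def update_sharing_threshold(threshold: int, collision_rate: float, replacement_rate: float,
                             num_cores: int, first: float = 0.80, second: float = 0.02,
                             third: float = 0.01, fourth: float = 0.15) -> int:
    """Return the updated sharing threshold (step 207)."""
    if collision_rate > first and replacement_rate < third:
        updated = threshold * 2       # double the sharing threshold
    elif collision_rate < second and replacement_rate > fourth:
        updated = threshold // 2      # halve it (assumption: round down)
    else:
        return threshold              # neither condition holds: no update
    # Assumption: keep the result an integer in [1, num_cores - 1].
    return max(1, min(updated, num_cores - 1))
```

A larger threshold sends more pages down the snooping path and a smaller one sends more pages to the sparse directory, so this single integer controls the mix of the two protocols.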
It should be noted that the sharing threshold may also be updated once every specified number of clock cycles of the multi-core processor.
208. Delete the data of the first data page cached in the cores of the multi-core processor.
The first data page is a data page for which the number of cores caching its data is greater than the updated sharing threshold.
Optionally, if the cores of the multi-core processor have cached data from the first data page of the off-chip memory, the data of the first data page stored in those cores is deleted, so that the sparse directory in the multi-core processor correspondingly deletes the directory entry for the first data page, where the directory entry for the first data page records the cores that have cached the data in each data block of the first data page.
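Step 208 can be sketched as a flush over all pages whose number of caching cores exceeds the updated sharing threshold. The sketch assumes a page-granular model in which each core's cache and the sparse directory are dictionaries keyed by page number; the name `flush_pages_over_threshold` and these data structures are hypothetical.

```python
from typing import Dict, List, Set


def flush_pages_over_threshold(page_cachers: Dict[int, Set[int]],
                               caches: Dict[int, Dict[int, bytes]],
                               sparse_directory: Dict[int, Set[int]],
                               updated_threshold: int) -> List[int]:
    """Delete cached data of every page cached by more cores than the updated threshold."""
    flushed: List[int] = []
    for page, cores in page_cachers.items():
        if len(cores) > updated_threshold:
            for core in cores:
                caches[core].pop(page, None)   # delete the page's data from the core
            sparse_directory.pop(page, None)   # the directory drops the page's entry
            cores.clear()                      # no core caches the page any more
            flushed.append(page)
    return flushed
```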
In this embodiment, the directory protocol or the snooping protocol is selected for consistency processing of the target shared data according to the number of second cores in the multi-core processor that share the target shared data, so that the target shared data is processed with a suitable protocol. Because not all shared data in the multi-core processor is processed with the same protocol (the snooping protocol is used for some shared data and the directory protocol for the rest), this approach, compared with the prior art, saves the sparse directory entries that the shared data handled by the snooping protocol would otherwise occupy, and also avoids the broadcast messages that the shared data handled by the directory protocol would otherwise generate during consistency processing, thereby improving the efficiency of consistency processing.
FIG. 3 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus includes a receiving module 31 and a processing module 32.
The receiving module 31 is configured to receive a consistency request message sent by a first core in the multi-core processor.
The consistency request message indicates the target shared data on which consistency processing is to be performed.
The processing module 32 is connected to the receiving module 31 and is configured to select, according to the number of second cores in the multi-core processor, either the directory protocol or the snooping protocol for consistency processing of the target shared data.
A second core is a core that shares the target shared data.
Optionally, the processing module determines whether the number of second cores is greater than a predetermined sharing threshold. If the number of second cores is not greater than the sharing threshold, the snooping protocol is used for consistency processing of the target shared data; if the number of second cores is greater than the sharing threshold, the directory protocol is used. The sharing threshold is an integer greater than zero and less than the number of cores in the multi-core processor.
The functional modules of the multi-core processor-based consistency processing apparatus provided in this embodiment can be used to execute the flow of the multi-core processor-based consistency processing method shown in FIG. 1. Their specific working principle is not repeated here; see the description of the method embodiment.
In this embodiment, the directory protocol or the snooping protocol is selected for consistency processing of the target shared data according to the number of second cores in the multi-core processor that share the target shared data, so that the target shared data is processed with a suitable protocol. Because not all shared data in the multi-core processor is processed with the same protocol (the snooping protocol is used for some shared data and the directory protocol for the rest), this approach, compared with the prior art, saves the sparse directory entries that the shared data handled by the snooping protocol would otherwise occupy, and also avoids the broadcast messages that the shared data handled by the directory protocol would otherwise generate during consistency processing, thereby improving the efficiency of consistency processing.
FIG. 4 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to another embodiment of the present invention. As shown in FIG. 4, on the basis of the previous embodiment, the processing module 32 in this embodiment includes a judging unit 321, a first processing unit 322, and a second processing unit 323.
The judging unit 321 is configured to determine whether the number of second cores is greater than a predetermined sharing threshold; the sharing threshold is an integer greater than zero and less than the number of cores in the multi-core processor.
The first processing unit 322 is connected to the judging unit 321 and is configured to perform consistency processing on the target shared data using the snooping protocol if the number of second cores is not greater than the sharing threshold.
The second processing unit 323 is connected to the judging unit 321 and is configured to perform consistency processing on the target shared data using the directory protocol if the number of second cores is greater than the sharing threshold.
Further, the target shared data is data in a target data page of the off-chip memory.
The off-chip memory is connected to the multi-core processor and stores the target shared data.
On this basis, the multi-core processor-based consistency processing apparatus further includes an update module 33 and a deletion module 34.
The update module 33 is connected to the processing module 32 and is configured to update the sharing threshold according to the network collision rate of the multi-core processor and the sparse directory replacement rate of the multi-core processor.
The network collision rate indicates the degree of congestion of the network used to pass messages between the cores of the multi-core processor; the sparse directory replacement rate indicates the degree to which the storage space of the sparse directory in the multi-core processor is occupied.
The deletion module 34 is connected to the update module 33 and is configured to, if the cores of the multi-core processor have cached data from the first data page of the off-chip memory, delete the data of the first data page cached in those cores, so that the sparse directory in the multi-core processor deletes the directory entry for the first data page.
The directory entry for the first data page records the cores that have cached the data in each data block of the first data page; the first data page is a data page for which the number of cores caching its data is greater than the updated sharing threshold.
Further, the update module 33 includes a first update unit 331 and a second update unit 332.
The first update unit 331 is configured to set the updated sharing threshold to twice the sharing threshold if the network collision rate is higher than a first threshold and the sparse directory replacement rate is lower than a third threshold.
The network collision rate is the ratio of the difference between the actual delivery time and the theoretical delivery time to the theoretical delivery time. The theoretical delivery time is the calculated total time required for at least one test message to be delivered over the network when the network is uncongested; the actual delivery time is the measured total time actually taken to deliver the same at least one test message over the network. The sparse directory replacement rate is the ratio of the number of read operations that the sparse directory performs within a specified time to the number of times within that specified time that the free storage space of the sparse directory is zero.
The second update unit 332 is configured to set the updated sharing threshold to half the sharing threshold if the network collision rate is lower than a second threshold and the sparse directory replacement rate is higher than a fourth threshold.
The functional modules of the multi-core processor-based consistency processing apparatus provided in this embodiment can be used to execute the flow of the multi-core processor-based consistency processing method shown in FIG. 2. Their specific working principle is not repeated here; see the description of the method embodiment.
In this embodiment, the directory protocol or the snooping protocol is selected for consistency processing of the target shared data according to the number of second cores in the multi-core processor that share the target shared data, so that the target shared data is processed with a suitable protocol. Because not all shared data in the multi-core processor is processed with the same protocol (the snooping protocol is used for some shared data and the directory protocol for the rest), this approach, compared with the prior art, saves the sparse directory entries that the shared data handled by the snooping protocol would otherwise occupy, and also avoids the broadcast messages that the shared data handled by the directory protocol would otherwise generate during consistency processing, thereby improving the efficiency of consistency processing.
FIG. 5 is a schematic structural diagram of a multi-core processor-based consistency processing apparatus according to yet another embodiment of the present invention. As shown in FIG. 5, the apparatus includes a communication interface 51, a processor 52, and a memory 53.
The communication interface 51 is configured to receive a consistency request message sent by a first core in the multi-core processor.
The consistency request message indicates the target shared data on which consistency processing is to be performed.
The memory 53 is configured to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory 53 may include high-speed RAM and may further include non-volatile memory, for example at least one disk memory.
The processor 52 is configured to execute the program stored in the memory 53 so as to select, according to the number of second cores in the multi-core processor, either the directory protocol or the snooping protocol for consistency processing of the target shared data.
A second core is a core that shares the target shared data.
Optionally, the processor determines whether the number of second cores is greater than a predetermined sharing threshold. If the number of second cores is not greater than the sharing threshold, the snooping protocol is used for consistency processing of the target shared data; if the number of second cores is greater than the sharing threshold, the directory protocol is used. The sharing threshold is an integer greater than zero and less than the number of cores in the multi-core processor.
Further, the target shared data is data in a target data page of the off-chip memory.
On this basis, the processor 52 is further configured to update the sharing threshold according to the network collision rate of the multi-core processor and the sparse directory replacement rate of the multi-core processor and, if the cores of the multi-core processor have cached data from the first data page of the off-chip memory, to delete the data of the first data page cached in those cores, so that the sparse directory in the multi-core processor deletes the directory entry for the first data page.
The directory entry for the first data page records the cores that have cached the data in each data block of the first data page; the first data page is a data page for which the number of cores caching its data is greater than the updated sharing threshold. The network collision rate is the ratio of the difference between the actual delivery time and the theoretical delivery time to the theoretical delivery time; the theoretical delivery time is the calculated total time required for at least one test message to be delivered over the network when the network is uncongested; the actual delivery time is the measured total time actually taken to deliver the same at least one test message over the network. The sparse directory replacement rate is the ratio of the number of read operations that the sparse directory performs within a specified time to the number of times within that specified time that the free storage space of the sparse directory is zero.
The functional modules of the multi-core processor-based consistency processing apparatus provided in this embodiment can be used to execute the flow of the multi-core processor-based consistency processing method shown in FIG. 2. Their specific working principle is not repeated here; see the description of the method embodiment.
In this embodiment, the directory protocol or the snooping protocol is selected for consistency processing of the target shared data according to the number of second cores in the multi-core processor that share the target shared data, so that the target shared data is processed with a suitable protocol. Because not all shared data in the multi-core processor is processed with the same protocol (the snooping protocol is used for some shared data and the directory protocol for the rest), this approach, compared with the prior art, saves the sparse directory entries that the shared data handled by the snooping protocol would otherwise occupy, and also avoids the broadcast messages that the shared data handled by the directory protocol would otherwise generate during consistency processing, thereby improving the efficiency of consistency processing.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
The foregoing embodiments are intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, and that some or all of their technical features may be replaced by equivalents, without such modifications or replacements causing the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims
1. A multi-core processor-based consistency processing method, comprising:
receiving a consistency request message sent by a first core in a multi-core processor, wherein the consistency request message indicates target shared data on which consistency processing is to be performed; and
selecting, according to the number of second cores in the multi-core processor, either a directory protocol or a snooping protocol for consistency processing of the target shared data, wherein a second core is a core that shares the target shared data.
2. The multi-core processor-based consistency processing method according to claim 1, wherein the target shared data is data in a target data page of an off-chip memory, and the off-chip memory is configured to provide the target shared data to the multi-core processor.
3. The multi-core processor-based consistency processing method according to claim 1 or 2, wherein the selecting, according to the number of second cores in the multi-core processor, either a directory protocol or a snooping protocol for consistency processing of the target shared data comprises:
determining whether the number of second cores is greater than a predetermined sharing threshold, wherein the sharing threshold is an integer greater than zero and less than the number of cores in the multi-core processor;
if the number of second cores is not greater than the sharing threshold, performing consistency processing on the target shared data using the snooping protocol; and
if the number of second cores is greater than the sharing threshold, performing consistency processing on the target shared data using the directory protocol.
4、 根据权利要求 3所述的基于多内核处理器的一致性处理方法, 其 特征在于, 所述根据所述多内核处理器中第二内核的数量, 选择釆用目 录协议或侦听协议中的一种对所述目标共享数据进行一致性处理之后, 包括: The multi-core processor-based consistency processing method according to claim 3, wherein the selecting the directory protocol or the listening protocol according to the number of the second kernels in the multi-core processor After performing consistency processing on the target shared data, the method includes:
根据所述多内核处理器的网络冲突率和所述多内核处理器的稀疏目 录替换率, 更新所述共享阔值; 所述网络冲突率, 指示用于在所述多内 核处理器的内核之间传递消,包、的网络的拥塞程度; 所述稀疏目录替换率, 指示所述多内核处理器中的稀疏目录的存储空间占用程度; 若所述多内核处理器的内核中緩存有所述片外内存的第一数据页中 的数据, 删除所述多内核处理器的内核緩存的所述第一数据页中的数据, 以使所述多内核处理器中的稀疏目录删除所述第一数据页对应的目录 项; 所述第一数据页对应的目录项用于记录对所述第一数据页的各个数 据块中的数据进行緩存的内核; 所述第一数据页满足緩存所述第一数据 页中的数据的内核数量大于更新后的共享阔值。 Updating the shared threshold according to a network collision rate of the multi-core processor and a sparse directory replacement rate of the multi-core processor; the network collision rate is indicated for a kernel in the multi-core processor The degree of congestion of the packet, the network, the sparse directory replacement rate, indicating the storage space occupancy of the sparse directory in the multi-core processor; if the kernel of the multi-core processor is cached The first data page of off-chip memory Data, deleting data in the first data page of the kernel cache of the multi-core processor, such that a sparse directory in the multi-core processor deletes a directory entry corresponding to the first data page; a directory entry corresponding to the first data page is used to record a kernel that caches data in each data block of the first data page; the first data page satisfies a number of cores that cache data in the first data page Greater than the updated shared threshold.
5、 根据权利要求 4所述的基于多内核处理器的一致性处理方法, 其 特征在于, 所述根据所述网络冲突率和所述稀疏目录替换率, 更新所述 共享阔值, 包括:  The multi-core processor-based consistency processing method according to claim 4, wherein the updating the shared threshold according to the network conflict rate and the sparse directory replacement rate comprises:
若所述网络冲突率高于第一阔值, 并且所述稀疏目录替换率低于第 三阔值, 则确定所述更新后的共享阈值为所述共享阈值的二倍;  If the network conflict rate is higher than the first threshold, and the sparse directory replacement rate is lower than the third threshold, determining that the updated sharing threshold is twice the sharing threshold;
若所述网络冲突率低于第二阔值, 并且所述稀疏目录替换率高于第 四阔值, 则确定所述更新后的共享阔值为所述共享阔值的一半。  And if the network conflict rate is lower than the second threshold, and the sparse directory replacement rate is higher than the fourth threshold, determining that the updated shared threshold is half of the shared threshold.
6、 根据权利要求 4所述的基于多内核处理器的一致性处理方法, 其 特征在于, 所述网络冲突率为实际传递时间和理论传递时间之差, 与所 述理论传递时间之间的比值; 所述理论传递时间是计算获得的当所述网 络的状态为畅通时, 至少一个测试消息在所述网络中进行传递所需的总 时间; 所述实际传递时间是统计获得的所述至少一个测试消息在所述网 络中进行传递所实际使用的总时间。  The multi-core processor-based consistency processing method according to claim 4, wherein the network collision rate is a ratio between an actual delivery time and a theoretical delivery time, and a ratio between the theoretical transmission time and the theoretical transmission time. The theoretical delivery time is a calculated total time required for at least one test message to be delivered in the network when the state of the network is unblocked; the actual delivery time is the at least one obtained by statistics The total time that the test message was actually used for delivery in the network.
7. The multi-core processor-based consistency processing method according to claim 4, wherein the sparse directory replacement rate is the ratio between the number of read operations performed by the sparse directory within a specified time and the number of times within the specified time that the free storage space of the sparse directory is zero.
8. A multi-core processor-based consistency processing apparatus, comprising:
a receiving module, configured to receive a consistency request message sent by a first core in a multi-core processor, wherein the consistency request message indicates target shared data on which consistency processing is to be performed; and
a processing module, configured to select, according to the number of second cores in the multi-core processor, either a directory protocol or a snooping protocol to perform consistency processing on the target shared data, wherein a second core is a core that shares the target shared data.
9. The multi-core processor-based consistency processing apparatus according to claim 8, wherein the target shared data is data in a target data page of off-chip memory, and the off-chip memory is used to provide the target shared data to the multi-core processor.
10. The multi-core processor-based consistency processing apparatus according to claim 8 or 9, wherein the processing module comprises:
a determining unit, configured to determine whether the number of second cores is greater than a predetermined shared threshold, wherein the shared threshold is an integer greater than zero and less than the number of cores of the multi-core processor;
a first processing unit, configured to perform consistency processing on the target shared data by using the snooping protocol if the number of second cores is not greater than the shared threshold; and
a second processing unit, configured to perform consistency processing on the target shared data by using the directory protocol if the number of second cores is greater than the shared threshold.
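The decision made by the determining unit and the two processing units can be sketched in Python as follows. The sketch follows the wording of this claim as it appears here; the `Protocol` enum and the function name are hypothetical and serve only to illustrate the comparison against the shared threshold.

```python
from enum import Enum


class Protocol(Enum):
    SNOOP = "snoop"
    DIRECTORY = "directory"


def select_protocol(sharer_count, shared_threshold):
    """Pick the protocol for one request, per the claim wording.

    sharer_count is the number of second cores, i.e. cores that share the
    target shared data.
    """
    if sharer_count > shared_threshold:
        return Protocol.DIRECTORY   # second processing unit
    return Protocol.SNOOP           # first processing unit (count <= threshold)
```

A sharer count equal to the threshold falls in the "not greater than" branch, so `select_protocol(4, 4)` selects the snooping protocol in this sketch.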
11. The multi-core processor-based consistency processing apparatus according to claim 10, further comprising:
an update module, configured to update the shared threshold according to a network conflict rate of the multi-core processor and a sparse directory replacement rate of the multi-core processor, wherein the network conflict rate indicates the degree of congestion of the network used to transfer messages between the cores of the multi-core processor, and the sparse directory replacement rate indicates the degree of storage space occupancy of the sparse directory in the multi-core processor; and
a deleting module, configured to, if data in a first data page of the off-chip memory is cached in the cores of the multi-core processor, delete the data of the first data page cached by the cores of the multi-core processor, so that the sparse directory in the multi-core processor deletes the directory entry corresponding to the first data page, wherein the directory entry corresponding to the first data page records the cores that cache the data in each data block of the first data page, and the first data page is a page for which the number of cores caching its data is greater than the updated shared threshold.
12. The multi-core processor-based consistency processing apparatus according to claim 11, wherein the update module comprises:
a first updating unit, configured to determine that the updated shared threshold is twice the shared threshold if the network conflict rate is higher than a first threshold and the sparse directory replacement rate is lower than a third threshold; and
a second updating unit, configured to determine that the updated shared threshold is half of the shared threshold if the network conflict rate is lower than a second threshold and the sparse directory replacement rate is higher than a fourth threshold.
13. The multi-core processor-based consistency processing apparatus according to claim 11, wherein the network conflict rate is the ratio of the difference between an actual delivery time and a theoretical delivery time to the theoretical delivery time; the theoretical delivery time is the calculated total time required for at least one test message to be delivered in the network when the network is uncongested; and the actual delivery time is the measured total time actually taken by the at least one test message to be delivered in the network.
14. The multi-core processor-based consistency processing apparatus according to claim 11, wherein the sparse directory replacement rate is the ratio between the number of read operations performed by the sparse directory within a specified time and the number of times within the specified time that the free storage space of the sparse directory is zero.
PCT/CN2014/080169 2013-06-26 2014-06-18 Consistency processing method and device based on multi-core processor WO2014206232A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310260830.3 2013-06-26
CN201310260830.3A CN104252423B (en) 2013-06-26 2013-06-26 Consistency processing method and device based on multi-core processor

Publications (1)

Publication Number Publication Date
WO2014206232A1 (en) 2014-12-31

Family

ID=52141037

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/080169 WO2014206232A1 (en) 2013-06-26 2014-06-18 Consistency processing method and device based on multi-core processor

Country Status (2)

Country Link
CN (1) CN104252423B (en)
WO (1) WO2014206232A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183641B (en) * 2015-08-13 2017-12-12 浪潮(北京)电子信息产业有限公司 The data consistency verification method and system of a kind of kernel module
CN109684237B (en) * 2018-11-20 2021-06-01 华为技术有限公司 Data access method and device based on multi-core processor
WO2020132987A1 (en) * 2018-12-26 2020-07-02 华为技术有限公司 Data reading method, device, and multi-core processor
CN110008436B (en) * 2019-03-07 2021-03-26 中国科学院计算技术研究所 Fast Fourier transform method, system and storage medium based on data stream architecture

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050144395A1 (en) * 2001-03-14 2005-06-30 Wisconsin Alumni Research Foundation Bandwidth-adaptive, hybrid, cache-coherence protocol
CN101354682A (en) * 2008-09-12 2009-01-28 中国科学院计算技术研究所 Apparatus and method for settling access catalog conflict of multi-processor
US20090172294A1 (en) * 2007-12-28 2009-07-02 Fryman Joshua B Method and apparatus for supporting scalable coherence on many-core products through restricted exposure
CN102103568A (en) * 2011-01-30 2011-06-22 中国科学院计算技术研究所 Method for realizing cache coherence protocol of chip multiprocessor (CMP) system
CN102281332A (en) * 2011-08-31 2011-12-14 上海西本网络科技有限公司 Distributed cache array and data updating method thereof

Also Published As

Publication number Publication date
CN104252423A (en) 2014-12-31
CN104252423B (en) 2017-12-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14818428

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14818428

Country of ref document: EP

Kind code of ref document: A1