CN105808497A - Data processing method - Google Patents

Data processing method

Info

Publication number
CN105808497A
CN105808497A (application CN201410843391.3A)
Authority
CN
China
Prior art keywords
data
cache
memory
home agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410843391.3A
Other languages
Chinese (zh)
Other versions
CN105808497B (en)
Inventor
郑伟
陆斌
赵献明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201410843391.3A priority Critical patent/CN105808497B/en
Publication of CN105808497A publication Critical patent/CN105808497A/en
Application granted granted Critical
Publication of CN105808497B publication Critical patent/CN105808497B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a data processing method applied to a CC-NUMA (Cache Coherency Non-Uniform Memory Access) system, where the CC-NUMA system comprises a first processor and a second processor connected via a bus; the first processor includes a first core, a first cache and a first decoder; the second processor includes a second home agent and a second cache, and the second home agent is connected with a second memory. The data processing method comprises the following steps: the first core sends a first request carrying the address of to-be-read data; the first cache receives the first request, learns from the first decoder that the address points to the second memory, and sends a second request to the second home agent, where the data requested by the second request includes at least two parts and the first part includes the to-be-read data; the second home agent preferentially provides the latest first part of data and then provides the remaining parts. By applying the data processing method, the waiting time of the first core can be reduced.

Description

Data processing method
Technical field
The present invention relates to the field of processors, and in particular to cache-coherent non-uniform memory access (CC-NUMA).
Background technology
Multiple processors connected by a system bus form a Cache Coherency Non-Uniform Memory Access (CC-NUMA) system, which is essentially a distributed shared-memory multiprocessor system. Each processor has its own private cache and has memory mounted to it. Because of the tight coupling between a processor and its memory controller, access speed and bandwidth differ depending on the address space being accessed: accessing a processor's own memory is relatively fast, while accessing memory mounted to another processor is slower.
Existing CC-NUMA systems use a uniform data granularity to solve the data-consistency problem; for instance, a common cache-line granularity is 64 bytes. In most cases, the data a program actually needs is far smaller than 64 bytes. Take 8 bytes as an example: the 8 bytes are stored in memory, but according to the granularity, 64 bytes are read from memory, passed through a series of cache-coherence steps, and finally delivered to the requesting CPU (cache and core). Those 64 bytes contain the 8 bytes the program needs, but the time spent resolving cache coherence for 64 bytes and returning them to the requester is far longer than the corresponding time for 8 bytes.
In the prior art, a processor core therefore waits a relatively long time for data when running a program.
Summary of the invention
The present invention provides a data processing method that can reduce the time a processor core waits for data.
In a first aspect, the present invention provides a data processing method applied to a cache-coherent non-uniform memory access (CC-NUMA) system. The CC-NUMA system includes a first processor and a second processor connected by a bus; the first processor includes a first core, a first cache and a first decoder; the second processor includes a second home agent and a second cache, the second home agent being connected to a second memory. The method includes:
The first core sends a first request carrying the address of the to-be-read data. The first cache receives the first request, learns from the first decoder that the address points to the second memory, and sends a second request to the second home agent, where the data requested by the second request comprises at least two parts, the first part includes the to-be-read data, and the total size of the at least two parts equals the size of a cache line. After the second home agent receives the second request, it provides the latest first-part data to the first cache, and then provides the latest remaining parts to the first cache respectively; the at least two parts are address-adjacent in the second memory. The first cache obtains the to-be-read data from the first-part data, sends it to the first core, and caches the latest parts.
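To make the reply order concrete, here is a minimal C++ sketch, assuming for simplicity that all parts are equal-sized; the names (Part, fetchLatest, serveSecondRequest) are illustrative rather than the patent's, and fetchLatest merely stands in for the coherence resolution detailed in the implementations below.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One part of a cache line: its start address and its bytes.
struct Part { uint64_t addr; std::vector<uint8_t> bytes; };

// Hypothetical stand-in for the coherence resolution described below
// (directory lookup or a snoop of the other caches).
Part fetchLatest(uint64_t addr, size_t len) {
    return Part{addr, std::vector<uint8_t>(len, 0)};
}

// The second home agent answers the second request part by part: the first
// part (which contains the to-be-read data) is provided first, then the
// remaining address-adjacent parts of the same cache line.
std::vector<Part> serveSecondRequest(uint64_t lineAddr, size_t firstPartLen,
                                     size_t cacheLine = 64) {
    std::vector<Part> replies;
    replies.push_back(fetchLatest(lineAddr, firstPartLen));           // priority
    for (size_t off = firstPartLen; off < cacheLine; off += firstPartLen)
        replies.push_back(fetchLatest(lineAddr + off, firstPartLen)); // remainder
    return replies;
}
```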
In a first possible implementation of the first aspect, providing the latest first-part data to the first cache and then providing the latest remaining parts respectively specifically includes: reading the directory information of the first-part data from the second memory; according to the data state recorded in the directory information, if the first-part data stored in the second memory is the latest first-part data, the second home agent sends the latest first-part data to the first cache; otherwise, it finds the cache holding the latest first-part data and has the first-part data in that cache sent to the first cache. Then, for each of the remaining parts: directory information is read from the second memory; according to the data state recorded in the directory information, if the part stored in the second memory is the latest, the second home agent sends that latest part to the first cache; otherwise, it finds the cache holding the latest part and has the part in that cache sent to the first cache.
In a second possible implementation of the first aspect, providing the latest first-part data to the first cache and then providing the latest remaining parts respectively specifically includes:
sending a query request for the first-part data to all caches other than the first cache, and sending the first-part data in the cache holding the latest first-part data to the first cache;
then, for each of the remaining parts: sending a query request for that part to all caches other than the first cache, and sending that part in the cache holding the latest copy to the first cache.
In a third possible implementation of the first aspect, the first-part data is the to-be-read data.
In a fourth possible implementation of the first aspect, the first-part data is greater than or equal to the to-be-read data and is the smallest positive integer multiple of 8 bytes, and the size of the cache line is 64 bytes.
In a second aspect, the present invention provides a data processing method applied to a cache-coherent non-uniform memory access (CC-NUMA) system. The CC-NUMA system includes a first processor and a second processor connected by a bus; the first processor includes a first core, a first cache and a first decoder; the second processor includes a second home agent and a second cache, the second home agent being connected to a second memory. The method includes: the first core sends a first request carrying the address of the to-be-read data; the first cache receives the first request, learns from the first decoder that the address points to the second memory, and sends a second request to the second home agent, where the data requested by the second request comprises at least two parts, the first part includes the to-be-read data, and the total size of the at least two parts equals the size of a cache line; after the second home agent receives the second request, it provides the latest first-part data to the first cache and then provides the latest remaining parts respectively, the at least two parts being address-adjacent in the second memory; the first cache sends the latest first-part data to the first core and caches the latest parts.
In a first possible implementation of the second aspect, providing the latest first-part data to the first cache and then providing the latest remaining parts respectively specifically includes: reading the directory information of the first-part data from the second memory; according to the data state recorded in the directory information, if the first-part data stored in the second memory is the latest first-part data, the second home agent sends the latest first-part data to the first cache; otherwise, it finds the cache holding the latest first-part data and has the first-part data in that cache sent to the first cache. Then, for each of the remaining parts: directory information is read from the second memory; according to the data state recorded in the directory information, if the part stored in the second memory is the latest, the second home agent sends that latest part to the first cache; otherwise, it finds the cache holding the latest part and has the part in that cache sent to the first cache.
In a second possible implementation of the second aspect, providing the latest first-part data to the first cache and then providing the latest remaining parts respectively specifically includes: sending a query request for the first-part data to all caches other than the first cache, and sending the first-part data in the cache holding the latest first-part data to the first cache; then, for each of the remaining parts, sending a query request for that part to all caches other than the first cache, and sending that part in the cache holding the latest copy to the first cache.
In a third possible implementation of the second aspect, the first-part data is the to-be-read data.
In a fourth possible implementation of the second aspect, the first-part data is greater than or equal to the to-be-read data and is the smallest positive integer multiple of 8 bytes, and the size of the cache line is 64 bytes.
In the embodiments of the present invention, the data requested by the second request comprises at least two parts, and the latest first-part data, which contains the data the first core needs to run, is returned to the first cache preferentially. The waiting time of the first core can therefore be reduced.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the embodiments are briefly described below. The drawings described below are merely some embodiments of the present invention, and other drawings may be derived from them.
Fig. 1 is a schematic structural diagram of a CC-NUMA system according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method according to an embodiment of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
As shown in Fig. 1, the CC-NUMA system in this embodiment consists of multiple processors and memories. A processor is, for example, a CPU. Each processor contains a core, a cache, a home agent and an address decoder; the cache is connected to the core, the home agent and the address decoder. The home agent, which may also be called a memory home agent, manages the memory directly connected to its CPU; if another CPU wants to access this CPU's memory, the access must be relayed through this CPU.
Each processor has memory mounted to it, and its home agent is connected to the mounted memory; the memory stores data. The core is the computing component of the processor: it runs programs and is the requester of data. The cache holds temporary data for the core to use, and the core accesses the cache faster than it accesses memory. The home agent reads and writes the data in the mounted memory. The address decoder records the mapping between addresses and home agents: given the address of a piece of data, the address decoder can be used to find the corresponding home agent, and the memory mounted to that home agent stores the data at that address. Mounting memory means that the processor is directly connected to the memory and can read data from it; optionally, it may also manage the mounted memory. The memory is, for example, random-access memory (RAM).
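As an illustration of the address decoder's role, a minimal sketch follows; the class and method names (AddressDecoder, homeAgentFor) are hypothetical, and a real decoder is a hardware table rather than a std::map.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical model of the address decoder: it maps address ranges to the
// home agent whose mounted memory stores the data at those addresses.
class AddressDecoder {
    std::map<uint64_t, int> ranges_;  // range start address -> home agent id
public:
    void addRange(uint64_t start, int homeAgentId) { ranges_[start] = homeAgentId; }

    // The home agent for addr is the entry with the greatest start address
    // not exceeding addr (assumes addr is above the lowest registered start).
    int homeAgentFor(uint64_t addr) const {
        auto it = ranges_.upper_bound(addr);
        return std::prev(it)->second;
    }
};

// Usage: addRange(0x00000000, 1) and addRange(0x40000000, 2) make
// homeAgentFor(0x40001000) return 2, so a request for that address is
// directed to home agent 2.
```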
Multiple processors can be connected by a bus, for instance the QuickPath Interconnect (QPI) bus used by Intel x86 processors. Through the bus, a processor can indirectly access the memory mounted to other processors. A processor accesses its directly mounted memory quickly, and accesses the memory mounted to other processors more slowly.
In the architecture of Fig. 1, the first processor includes a first core, a first cache, a first address decoder and a first home agent, the first home agent being connected to a first memory. The second processor includes a second core, a second cache, a second address decoder and a second home agent, the second home agent being connected to a second memory. A cache can access the home agents of other processors over the QPI bus. The third processor includes a third core, a third cache, a third address decoder and a third home agent, the third home agent being connected to a third memory.
Fig. 1 is an example; other embodiments may have more processors, for instance 4 or 8 or even more. When there are many processors, repeaters may be placed between them.
A data processing method according to an embodiment of the present invention is described in detail below, following the flowchart of Fig. 2.
Step 21: the first core sends a data read request to the first cache. This embodiment calls the data the first core needs the to-be-read data; the first core's demand for data comes from the program it runs. The read request carries the address of the to-be-read data, whose size is smaller than a cache line. This embodiment calls this request the first request.
In general, the first core needs the to-be-read data in order to run a program. In this embodiment the to-be-read data is smaller than a cache line: for example, the to-be-read data is 8 bytes and the cache line is 64 bytes, so the amount of data requested by the first request is 8 bytes.
Step 22: the first cache uses the address of the to-be-read data to check whether it already stores the data. If so, it returns the to-be-read data to the first core directly and the method ends. If not, it sends a query to the first address decoder and the method proceeds to step 23.
Step 23: the first cache queries the first address decoder for the home agent corresponding to the address, and sends a second read request to the found home agent over the QPI bus. In this embodiment, assume the home agent corresponding to the address is the second home agent in Fig. 1, which manages the second memory directly connected to the second CPU.
The data requested by the second request consists of several parts, and the sizes of the parts sum to the size of a cache line; the first part includes the to-be-read data. Because the parts sum to exactly a cache line, the scheme stays compatible with existing designs.
In this embodiment, the first-part data contains the data requested by the first request and is a multiple of 8 bytes, for instance the smallest such multiple: if the first request asks for no more than 8 bytes, the first-part data is 8 bytes; if it asks for more than 8 bytes but no more than 16 bytes, the first-part data is 16 bytes; and so on.
Subtracting the length of the first-part data from the size of the cache line gives the length of the remaining data. Example one: two parts in total, the first part is 8 bytes and the cache line is 64 bytes, so the second part is 56 bytes. Example two: every part is 8 bytes and the cache line is 64 bytes, so there are 8 parts including the first.
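A short sketch of this arithmetic, assuming the 8-byte granularity and 64-byte cache line used in the examples; the function names are illustrative.

```cpp
#include <cstddef>

// Length of the first-part data: the smallest positive multiple of 8 bytes
// that covers the requested size (as in the fourth implementation).
size_t firstPartLen(size_t requested) {
    size_t len = ((requested + 7) / 8) * 8;  // round up to a multiple of 8
    return len == 0 ? 8 : len;               // at least one 8-byte part
}

// Length of the remaining data in a 64-byte cache line.
size_t remainderLen(size_t requested) {
    return 64 - firstPartLen(requested);     // e.g. 64 - 8 = 56 bytes
}
```

Here firstPartLen(8) returns 8 and remainderLen(8) returns 56, matching example one; splitting the remainder into further 8-byte parts yields the 8 parts of example two.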
In other embodiments, the length of the first-part data may also be kept identical to that of the to-be-read data.
In the prior art, by contrast, the request the first cache sends to the second home agent asks for the data as a single whole, not split into parts, and its size equals the cache line, that is, 64 bytes.
The first address decoder stores an address mapping table, which records the mapping between data addresses and the locations where the data is stored. Specifically, it stores the mapping between addresses and home agents, and the memory directly mounted to a home agent stores the data corresponding to those addresses.
Step 24: after the second home agent receives the second request, it reads the directory information of the first-part data from the second memory mounted to it. According to the data state recorded in the directory information, it provides the latest first-part data to the first cache. Then, according to the directory information of each remaining part, the second home agent provides the latest remaining parts to the first cache.
The parts are address-adjacent in the second memory. The second home agent provides the latest first-part data preferentially, and only then provides the latest remaining parts.
Each part corresponds to an address segment in the memory, and these address segments can be spliced into one continuous address segment.
The second memory stores the first-part data, but it may be an older version rather than the latest first-part data. For example, if the first-part data in the second memory has been read and modified by another processor, the latest first-part data resides in that processor's cache. The second home agent provides the latest first-part data to the first cache in one of the following two ways.
(1) If the latest first-part data is stored in the second memory, the second home agent reads it from the second memory directly and sends it to the first cache over the QPI bus.
(2) If the latest first-part data is not stored in the second memory, it must be stored in the cache of another processor. The second home agent then looks up the ID of the cache holding the latest first-part data and sends it an instruction; there are three optional instructions, described below. In this embodiment, assume the cache holding the latest first-part data is the third cache of the third processor. The schemes for sending the first-part data from the third cache to the first cache are as follows:
(2.1) Instruct the third cache to send the latest first-part data to the first cache.
(2.2) Instruct the third cache to send the latest first-part data to the second home agent; the second home agent stores the latest data back into the second memory and sends the latest first-part data to the first cache.
(2.3) Equivalently, a combination of the first two: instruct the third cache both to send the latest first-part data to the first cache and to send it to the second home agent, which stores it back into the second memory.
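The three instructions can be summarized in a small sketch; the enum and function names are illustrative, not the patent's wording.

```cpp
// Hypothetical encoding of instructions (2.1)-(2.3) that the second home
// agent can send to the cache holding the latest first-part data.
enum class ForwardMode {
    DirectToRequester,  // (2.1) third cache -> first cache
    ViaHomeAgent,       // (2.2) third cache -> second home agent, which writes
                        //       the data back to memory and forwards it
    Both                // (2.3) combination of the two
};

const char* describe(ForwardMode mode) {
    switch (mode) {
    case ForwardMode::DirectToRequester:
        return "holder sends the latest first-part data straight to the first cache";
    case ForwardMode::ViaHomeAgent:
        return "holder sends the data to the second home agent, which writes it "
               "back to the second memory and forwards it to the first cache";
    case ForwardMode::Both:
        return "holder sends the data both to the first cache and to the second "
               "home agent, which writes it back to the second memory";
    }
    return "";
}
```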
In general, the home agent keeps a directory. The second home agent uses part of the address of the first-part data as a tag to look up the state bits and storage location of the first-part data in the directory. This confirms whether the second memory holds the latest first-part data and, if not, which cache does.
For example, the directory records whether data is in the M (Modified), E (Exclusive), S (Shared), I (Invalid) or A (Any/Unknown) state.
For example, data in the M state is dirty: the to-be-read data has been modified and not yet written back to memory, so it is inconsistent with memory, and what the second memory stores is not the latest data. Data in the E state is clean: no other cache has modified it, so it is consistent with memory, and what the memory stores is the latest data. The other states are not described in detail.
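A sketch of how the second home agent might interpret these states follows. The patent spells out only the M and E cases; the S, I and A entries below follow standard MESI convention and are assumptions of this sketch.

```cpp
// Directory states as listed above; A means the state is unknown.
enum class DirState { M, E, S, I, A };

// Does the home memory hold the latest copy?
bool memoryHasLatest(DirState s) {
    switch (s) {
    case DirState::M: return false;  // dirty: modified in a cache, not written back
    case DirState::E: return true;   // clean: no cache has modified it
    case DirState::S: return true;   // shared copies match memory (assumption)
    case DirState::I: return true;   // no cached copy; memory is authoritative (assumption)
    case DirState::A: return false;  // unknown: the caches must be queried
    }
    return false;
}
```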
In addition, in some cases the state recorded in the home agent's directory may be inaccurate. For example, the directory may record a segment of data as being in the E state in some processor's cache when it has actually been downgraded to the S state and the home agent has not yet been notified. Sending messages to the processors indicated by the directory to query the actual state is therefore a common maintenance action in cache-coherence work. For the same reason, an optional scheme is: even when the directory confirms that what the memory stores is the latest to-be-read data, mode (2) can still be used, querying the caches of the other processors for the latest to-be-read data.
Furthermore, the home agent may have no directory at all. In that scenario, a query can be sent to every other cache, which is equivalent to having a directory in which every other cache is recorded as being in the unknown state. Other caches means caches other than the initiator, and in this embodiment the initiator is the first cache.
The above describes how to obtain the latest first-part data. The scheme for obtaining the latest data of each remaining part is similar and is not described again here.
Providing the latest first-part data preferentially does not mean that the search for the latest remaining parts can begin only after the latest first-part data has been found and sent to the first core. It means that, during processing, whenever providing the latest first-part data and providing another latest part conflict, the first-part data is handled first. In other words, whenever providing any of the other parts would reduce the efficiency of providing the latest first-part data, the latest first-part data is provided first.
As this step shows, in this embodiment the second home agent provides the first-part data to the first cache preferentially and only then provides the remaining parts, so the first-part data reaches the first cache faster. The remaining parts have no tight timeliness requirement and may arrive at the first cache later, to be stored there for future use. The parts add up to exactly the same data the first cache obtains in the prior art; but in the prior art the to-be-read data reaches the first cache too slowly, whereas after the split of the present invention the first part, which contains the to-be-read data, arrives at the first cache first, reducing the waiting time of the first cache and thus the waiting time of the first core.
Step 25: the first cache receives the latest first-part data, extracts the latest to-be-read data from it, sends the to-be-read data to the first core, and keeps the latest first-part data. The first cache also receives the latest remaining parts and keeps them.
After the first core receives the latest to-be-read data, it can use it to continue running the program.
In the embodiments of the present invention, checking the state of the first-part data (for example, 8 bytes) first and the states of the remaining bytes afterwards helps avoid complicated conflict scenarios. For example, suppose the first-part data is in the Shared state while some of the remaining bytes are in the Modified state, so the whole 64 bytes count as Modified. The first-part data can then be returned to the requester (the first cache) directly from the second processor, without sending anything to the other processors. On the other hand, when multiple processors request the data at the same address simultaneously, conflicts arise easily, and resolving conflict scenarios is the most complicated and time-consuming part of cache-coherence protocol design. Reducing the data granularity from 64 bytes to 8 bytes reduces the probability of conflicts. To preserve the cache hit rate, the design of the present invention remains compatible with 64 bytes: the data finally cached in the first cache is still 64 bytes, as in the prior art, so the coherent access of the 64-byte data is eventually completed.
In step 25, in the embodiments of the present invention, the data the first cache sends to the first core is exactly the data the first core really needs. In the prior art, even if the data the first core needs is less than 64 bytes, the first cache always sends the first core 64 bytes, and the first core has to select the data it actually needs from them, which adds complexity to the system.
In step 25, "the first cache receives the latest first-part data, extracts the latest to-be-read data from it and sends it to the first core" also has an alternative: the first cache does not extract the to-be-read data but sends the whole first-part data to the first core directly, and after receiving it, the first core extracts the latest to-be-read data itself.
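A minimal sketch of this extraction, which runs in the first cache in step 25 or in the first core in the alternative; all names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Extract the to-be-read bytes from the first-part data: they sit at some
// offset inside the 8-byte-aligned first part. The caller must ensure the
// requested range lies within the first part.
std::vector<uint8_t> extractToBeRead(const std::vector<uint8_t>& firstPart,
                                     uint64_t firstPartAddr,
                                     uint64_t reqAddr, size_t reqLen) {
    size_t offset = static_cast<size_t>(reqAddr - firstPartAddr);
    std::vector<uint8_t> out(reqLen);
    std::memcpy(out.data(), firstPart.data() + offset, reqLen);
    return out;
}
```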
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the foregoing embodiments, a person skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A data processing method, applied to a cache-coherent non-uniform memory access (CC-NUMA) system, the CC-NUMA system comprising a first processor and a second processor connected by a bus, the first processor comprising a first core, a first cache and a first decoder, and the second processor comprising a second home agent and a second cache, the second home agent being connected to a second memory, the method comprising:
sending, by the first core, a first request, the first request carrying an address of to-be-read data;
receiving, by the first cache, the first request, learning from the first decoder that the address points to the second memory, and sending a second request to the second home agent, the data requested by the second request comprising at least two parts, a first part including the to-be-read data, and the total size of the at least two parts being equal to the size of a cache line;
providing, by the second home agent after it receives the second request, latest first-part data to the first cache, and then providing latest remaining parts to the first cache respectively, the at least two parts being address-adjacent in the second memory; and
obtaining, by the first cache, the to-be-read data from the first-part data, sending it to the first core, and caching the latest at least two parts of data.
2. The method according to claim 1, wherein providing the latest first-part data to the first cache and then providing the latest remaining parts to the first cache respectively specifically comprises:
reading directory information of the first-part data from the second memory, and, according to a data state recorded in the directory information, if the first-part data stored in the second memory is the latest first-part data, sending, by the second home agent, the latest first-part data to the first cache, and otherwise finding the cache holding the latest first-part data and sending the first-part data in that cache to the first cache; and
then, for each remaining part: reading directory information from the second memory, and, according to the data state recorded in the directory information, if the part stored in the second memory is the latest, sending, by the second home agent, the latest part to the first cache, and otherwise finding the cache holding the latest part and sending the part in that cache to the first cache.
3. The method according to claim 1, wherein providing the latest first-part data to the first cache and then providing the latest remaining parts to the first cache respectively specifically comprises:
sending a query request for the first-part data to all caches other than the first cache, and sending the first-part data in the cache holding the latest first-part data to the first cache; and
then, for each remaining part: sending a query request for the part to all caches other than the first cache, and sending the part in the cache holding the latest copy to the first cache.
4. The method according to claim 1, 2 or 3, wherein:
the first-part data is the to-be-read data.
5. The method according to claim 1, 2 or 3, wherein:
the first-part data is greater than or equal to the to-be-read data and is the smallest positive integer multiple of 8 bytes; and the size of the cache line is 64 bytes.
6. A data processing method, applied to a cache-coherent non-uniform memory access (CC-NUMA) system, the CC-NUMA system comprising a first processor and a second processor connected by a bus, the first processor comprising a first core, a first cache and a first decoder, and the second processor comprising a second home agent and a second cache, the second home agent being connected to a second memory, the method comprising:
sending, by the first core, a first request, the first request carrying an address of to-be-read data;
receiving, by the first cache, the first request, learning from the first decoder that the address points to the second memory, and sending a second request to the second home agent, the data requested by the second request comprising at least two parts, a first part including the to-be-read data, and the total size of the at least two parts being equal to the size of a cache line;
providing, by the second home agent after it receives the second request, latest first-part data to the first cache, and then providing latest remaining parts to the first cache respectively, the at least two parts being address-adjacent in the second memory; and
sending, by the first cache, the latest first-part data to the first core, and caching the latest at least two parts of data.
7. The method according to claim 6, wherein providing the latest first-part data to the first cache and then providing the latest remaining parts to the first cache respectively specifically comprises:
reading directory information of the first-part data from the second memory, and, according to a data state recorded in the directory information, if the first-part data stored in the second memory is the latest first-part data, sending, by the second home agent, the latest first-part data to the first cache, and otherwise finding the cache holding the latest first-part data and sending the first-part data in that cache to the first cache; and
then, for each remaining part: reading directory information from the second memory, and, according to the data state recorded in the directory information, if the part stored in the second memory is the latest, sending, by the second home agent, the latest part to the first cache, and otherwise finding the cache holding the latest part and sending the part in that cache to the first cache.
8. The method according to claim 6, wherein providing the latest first-part data to the first cache and then providing the latest remaining parts to the first cache respectively specifically comprises:
sending a query request for the first-part data to all caches other than the first cache, and sending the first-part data in the cache holding the latest first-part data to the first cache; and
then, for each remaining part: sending a query request for the part to all caches other than the first cache, and sending the part in the cache holding the latest copy to the first cache.
9. The method according to claim 6, 7 or 8, wherein:
the first-part data is the to-be-read data.
10. The method according to claim 6, 7 or 8, wherein:
the first-part data is greater than or equal to the to-be-read data and is the smallest positive integer multiple of 8 bytes; and the size of the cache line is 64 bytes.
CN201410843391.3A 2014-12-30 2014-12-30 Data processing method Active CN105808497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410843391.3A CN105808497B (en) 2014-12-30 2014-12-30 Data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410843391.3A CN105808497B (en) 2014-12-30 2014-12-30 Data processing method

Publications (2)

Publication Number Publication Date
CN105808497A true CN105808497A (en) 2016-07-27
CN105808497B CN105808497B (en) 2018-09-21

Family

ID=56421077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410843391.3A Active CN105808497B (en) 2014-12-30 2014-12-30 Data processing method

Country Status (1)

Country Link
CN (1) CN105808497B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1180864A (en) * 1996-08-19 1998-05-06 三星电子株式会社 Single-instruction-multiple-data processing in multimedia signal processor and device thereof
US7383168B2 (en) * 2003-01-06 2008-06-03 Fujitsu Limited Method and system for design verification and debugging of a complex computing system
CN1786927A (en) * 2004-12-07 2006-06-14 国际商业机器公司 System and method for application-level cache-mapping awareness and reallocation
CN102067090A (en) * 2008-06-17 2011-05-18 Nxp股份有限公司 Processing circuit with cache circuit and detection of runs of updated addresses in cache lines
EP2187527A2 (en) * 2008-11-18 2010-05-19 Fujitsu Limited Error judging circuit and shared memory system
CN102841857A (en) * 2012-07-25 2012-12-26 龙芯中科技术有限公司 Processor, device and method for carrying out cache prediction

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108512700A (en) * 2018-03-21 2018-09-07 常熟理工学院 A kind of data Realization Method of Communication of software defined network
CN108512700B (en) * 2018-03-21 2020-10-23 常熟理工学院 Data communication implementation method of software defined network
WO2021114768A1 (en) * 2019-12-11 2021-06-17 成都海光微电子技术有限公司 Data processing device and method, chip, processor, apparatus, and storage medium
CN112100093A (en) * 2020-08-18 2020-12-18 海光信息技术有限公司 Method for keeping consistency of shared memory data of multiple processors and multiple processor system
CN112100093B (en) * 2020-08-18 2023-11-21 海光信息技术股份有限公司 Method for maintaining consistency of multiprocessor shared memory data and multiprocessor system

Also Published As

Publication number Publication date
CN105808497B (en) 2018-09-21

Similar Documents

Publication Publication Date Title
US10891228B2 (en) Cache line states identifying memory cache
US6976131B2 (en) Method and apparatus for shared cache coherency for a chip multiprocessor or multiprocessor system
US7814279B2 (en) Low-cost cache coherency for accelerators
US8818942B2 (en) Database system with multiple layer distribution
CN104346294B (en) Data read/write method, device and computer system based on multi-level buffer
US8762651B2 (en) Maintaining cache coherence in a multi-node, symmetric multiprocessing computer
US9563568B2 (en) Hierarchical cache structure and handling thereof
US9208088B2 (en) Shared virtual memory management apparatus for providing cache-coherence
US10133672B2 (en) System and method for efficient pointer chasing
CN104106061A (en) Forward progress mechanism for stores in the presence of load contention in a system favoring loads
US20110314228A1 (en) Maintaining Cache Coherence In A Multi-Node, Symmetric Multiprocessing Computer
CN111868699A (en) Coordination of cache memory operations
US20060294319A1 (en) Managing snoop operations in a data processing apparatus
US9086976B1 (en) Method and apparatus for associating requests and responses with identification information
US8904102B2 (en) Process identifier-based cache information transfer
CN105808497A (en) Data processing method
US20020178329A1 (en) Reverse directory for facilitating accesses involving a lower-level cache
US20090083496A1 (en) Method for Improved Performance With New Buffers on NUMA Systems
US8397029B2 (en) System and method for cache coherency in a multiprocessor system
US20160188470A1 (en) Promotion of a cache line sharer to cache line owner
US11126568B2 (en) Object coherence in distributed shared memory systems
US9842050B2 (en) Add-on memory coherence directory
US11599469B1 (en) System and methods for cache coherent system using ownership-based scheme
JP7238262B2 (en) Computer, semiconductor device, and control method
US9983995B2 (en) Delayed write through cache (DWTC) and method for operating the DWTC

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant