CN101958834B - On-chip network system supporting cache coherence and data request method - Google Patents

On-chip network system supporting cache coherence and data request method

Info

Publication number
CN101958834B
Authority
CN
China
Prior art date
Legal status
Expired - Fee Related
Application number
CN2010102940174A
Other languages
Chinese (zh)
Other versions
CN101958834A (en)
Inventor
王惊雷 (Wang Jinglei)
汪东升 (Wang Dongsheng)
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN2010102940174A priority Critical patent/CN101958834B/en
Publication of CN101958834A publication Critical patent/CN101958834A/en
Application granted granted Critical
Publication of CN101958834B publication Critical patent/CN101958834B/en

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses an on-chip network system supporting cache coherence. The system comprises a network interface unit and a router, wherein the network interface unit is connected to the router, to a multicore processor, and to a second-level (L2) cache. A coherence state cache connected to the multicore processor is added to the network interface unit; it stores and maintains the coherence states of the data blocks in the first-level (L1) cache of the multicore processor. An active directory cache connected to the L2 cache is also added to the network interface unit; it caches and maintains the directory information of the data blocks frequently accessed by the L1 cache. Coherence maintenance is thus separated from the processor, directory maintenance is separated from the L2 cache, and the directory structure in the L2 cache is eliminated, which simplifies the design and verification of the multicore processor, reduces the storage overhead of the chip, and improves the performance of the multicore processor. The invention also discloses a data request method for the system.

Description

On-chip network system supporting cache coherence and data request method
Technical field
The present invention relates to the field of computer system architecture, and in particular to an on-chip network system supporting cache coherence and a data request method.
Background technology
Now that a single chip has entered the era of one billion transistors, the focus of architecture research has gradually shifted from achieving required functionality with limited resources to making full use of the ever-growing transistor budget to design processors that meet requirements such as high performance and low power consumption. Multicore processors provide an efficient, scalable way to exploit these transistor resources and have received enthusiastic attention from both academia and industry. Integrating several (multicore) or many (manycore) processor cores on a single chip yields a large-scale multicore processor. The main challenges facing large-scale multicore processors are design complexity, scalability, and memory access latency.
As multicore processors scale up, the on-chip memory system must supply them with large amounts of data. To reduce memory access latency and programming complexity, multicore processors usually adopt an on-chip memory system with a shared cache. Since the processor cores usually contain private caches, a cache coherence protocol must be used in a multicore processor to maintain the consistency and integrity of the data in the private caches. As multicore processors scale, bus structures and bus-based snooping coherence protocols can no longer meet scalability requirements. To address this problem, on-chip networks and directory-based coherence protocols are used to replace buses and snooping protocols.
The directory coherence protocol is a basic communication mechanism of multicore processors; it guarantees the consistency and integrity of data in the multicore processor, and its implementation involves several components:
1. Processor core: maintains the coherence state of the private cache;
2. Shared cache: stores and maintains the directory information;
3. On-chip network: provides transport services for coherence operations.
Because the directory coherence protocol is closely coupled with the processor, the shared cache, and the on-chip network, it increases the design difficulty of multicore processors.
Heterogeneous multicore processors have been widely adopted in industry because of their performance and power advantages, for example IBM's Cell processor and AMD's Bulldozer processor. However, heterogeneous multicore processors suffer from incompatible coherence protocols between different processor cores: some cores support a coherence protocol and some do not. For example, IBM's PowerPC 755 processor supports the MEI (modified, exclusive, and invalid) protocol; Intel's IA-32 series processors support the MESI (modified, exclusive, shared, and invalid) protocol; Sun's UltraSPARC processors support a MOESI variant (exclusive modified, shared modified, exclusive clean, shared clean, and invalid); and the AMD64 series supports the MOESI (modified, owned, exclusive, shared, and invalid) protocol, which differs considerably from the MOESI protocol supported by the UltraSPARC processors. TI's DSPs provide only simple coherence between the processor and the L2 cache. Some embedded processors, such as the MIPS 4K series and the ARM7 series, contain private caches but do not support any coherence protocol. Most special-purpose processors and hardware accelerators do not support coherence protocols either. Incompatible coherence protocols make heterogeneous multicore processor design extremely difficult.
Because the coherence protocol is closely coupled with the processor, the shared cache, and the on-chip network, and especially because of the incompatible coherence protocols in heterogeneous multicore processors, every component must be redesigned to maintain coherence when a new multicore processor is designed; this reduces component reusability and increases design difficulty.
As multicore processors scale up, the directory coherence protocol faces a serious scalability problem. In a directory-based coherence protocol, directory storage consumes part of the on-chip resources. Taking the full-map directory protocol as an example, with a 64B data block the directory storage overhead is about 3% of the L2 cache storage in a 16-core multicore processor; at 64 cores the ratio of directory storage overhead rises to 12.5%; at 512 cores it rises to 50%. The directory storage overhead in the directory coherence protocol not only increases chip area and cost but also increases system power consumption, severely limiting the scalability of multicore processors.
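The overhead figures above can be reproduced with a quick calculation. The sketch below is my own illustration, assuming a full-map bit-vector directory with one sharer bit per core and a 64B (512-bit) data block, and ignoring tag and state bits; at 512 cores the vector is as large as the block itself, i.e. half of the combined data-plus-directory storage, which is how the 50% figure reads.

```python
# Full-map directory overhead: one sharer bit per core for every 64B block.
BLOCK_BITS = 64 * 8  # 512 data bits per 64-byte block

def directory_overhead(cores: int) -> float:
    """Sharer-vector bits as a fraction of the data bits they track."""
    return cores / BLOCK_BITS

for n in (16, 64, 512):
    # 16 -> 3.1%, 64 -> 12.5%, 512 -> 100% of the data bits
    # (100% of data bits == 50% of the combined data + directory storage).
    print(f"{n:4d} cores: {directory_overhead(n):.1%} of the data bits")
```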
In a directory coherence protocol, the directory is usually kept in the last-level cache (e.g., the L2 cache in a two-level cache hierarchy), and every data block in the L2 cache maintains a directory vector to track which processors cache that block. Every processor miss request must travel to the home node's L2 cache to look up the directory information and perform the corresponding coherence operation. As processor scale grows, directory access latency grows with it, severely affecting the performance of multicore processors. The directory coherence protocol is thus also a performance bottleneck of large-scale multicore processors.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is the set of problems brought by the coherence protocol in multicore processors: complex design, a directory that is difficult to scale, and high access latency.
(2) Technical scheme
To solve the above technical problems, the present invention provides an on-chip network system supporting cache coherence, comprising a network interface unit, wherein the network interface unit connects a router and also connects a multicore processor and an L2 cache. A coherence state cache connected to the multicore processor is added to the network interface unit; the coherence state cache stores and maintains the coherence states of the data blocks in the L1 cache of each core of the multicore processor.
Wherein the coherence state cache comprises:
a coherence state memory, which has the same storage lines as the L1 cache and stores the coherence states of the L1 cache data blocks;
a processor interface, which connects to the multicore processor, extracts the request signals needed by the coherence state cache from the bus requests of different types of processor cores, and converts the responses or requests of the coherence state cache into signals the multicore processor can recognize;
a coherence protocol controller, which, when an L1 cache miss request or response of the multicore processor passes through the network interface unit, obtains the miss request or response through the processor interface, extracts the address tag from it, and maintains the corresponding coherence state.
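As a rough illustration of the coherence state memory described above, the sketch below models it as a tag-indexed table that mirrors the L1 line structure and holds only a tag and a state, never data. The class and field names are my own, not the patent's.

```python
from dataclasses import dataclass

# Hypothetical model of the coherence state memory: one entry per L1 line,
# holding only an address tag and a coherence state (no data is stored here).
@dataclass
class CscEntry:
    tag: int    # address tag of the cached L1 line
    state: str  # "S", "M", or a transient state such as "IS" / "IM"

class CoherenceStateCache:
    def __init__(self):
        self.lines = {}  # line index -> CscEntry

    def lookup(self, index: int, tag: int):
        """Return the entry on a tag match, or None on a miss."""
        e = self.lines.get(index)
        return e if e is not None and e.tag == tag else None

    def allocate(self, index: int, tag: int, state: str):
        self.lines[index] = CscEntry(tag, state)
```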
Wherein an active directory cache connected to the L2 cache is also added to the network interface unit, and is used to cache and maintain the directory information of the L2 cache data blocks frequently accessed by the L1 cache.
Wherein the active directory cache comprises:
a directory memory, which caches the directory information of the L2 cache data blocks frequently accessed by the L1 cache, comprising the address tag, directory state, and directory vector of each data block;
an L2 cache interface, which connects to the L2 cache, sends the miss requests of the multicore processor to the L2 cache, and returns the response signals of the L2 cache to the active directory cache;
a directory controller, which obtains the multicore processor's access requests to the L2 cache through the L2 cache interface, looks up the directory information in the directory memory, and then decides according to the request type whether to send the access request to the local L2 cache.
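The directory controller's "whether to send the request to the local L2" decision can be sketched as a small predicate. This is a minimal sketch under my own naming; only the shared/modified directory states come from the text.

```python
# Hypothetical sketch of the directory controller's decision in the active
# directory cache: whether a captured request must be sent on to the local
# L2 cache, or must instead be satisfied via the owning L1 cache.
def forward_to_l2(hit: bool, dir_state: str, req: str) -> bool:
    if not hit:
        return True              # allocate a directory entry, read block from L2
    if req in ("read", "write"):
        # Shared: the L2 holds current data, so read it locally.
        # Modified: the owner's L1 holds the only up-to-date copy, so the
        # data arrives via a write-back rather than an immediate L2 read.
        return dir_state == "S"
    return False                 # replacement / write-back bookkeeping only
```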
The present invention also provides a data request method using the above system, comprising the following steps:
S1: the coherence protocol controller of the requesting node captures an L1 cache miss request from the requesting node's processor;
S2: the coherence protocol controller of the requesting node looks up the address tag of the miss request in the coherence state memory, and sends a data request to the L2 cache of the home node according to the corresponding coherence state;
S3: the directory controller of the active directory cache in the network interface unit of the home node's router captures the data request, looks up the directory information corresponding to the data request in the directory memory, and then decides according to the request type whether to send the data request to the L2 cache of the home node;
S4: the L2 cache of the home node returns a message to the requesting node's processor according to the data request; when the message passes through the router of the requesting node, the coherence protocol controller of that router captures the message, changes or keeps the coherence state stored for the address tag in the coherence state cache according to the message type, and returns the requested data contained in the message to the requesting node's processor.
Wherein step S2 specifically comprises the following cases, according to the type of the miss request:
Read request: allocate a cache line for the request address in the coherence state memory of the requesting node's coherence state cache, set the coherence state to the transient state IS, and forward the request to the L2 cache of the home node; the transient state IS indicates that the read request has not yet completed and the line is waiting for the data response from the L2 cache;
Write request: if the requesting node's coherence state cache misses, allocate a cache line for the address, set the coherence state to the transient state IM, and forward the write request to the L2 cache of the home node; the transient state IM indicates that the write request has not yet completed and the line is waiting for the write response from the L2 cache. If the coherence state cache hits in the shared state, set the coherence state to IM and send a write-update request to the L2 cache of the home node. If it hits in the modified state, return a write response signal directly to the requesting node's processor; the state in the coherence state cache does not change;
Update request: set the coherence state in the requesting node's coherence state cache to IM, then send the update request to the home node;
Replacement and write-back requests: the requesting node's coherence state cache forwards the request directly to the L2 cache of the home node;
Replacement of the coherence state cache itself: when a replacement occurs in the requesting node's coherence state cache because of a capacity conflict, it sends an invalidation signal to the requesting node's processor; the private L1 cache of the processor then sends an invalidation response or a write-back message according to its state; after the coherence state cache receives the invalidation response or write-back message from the processor, it sends a replacement or write-back request to the L2 cache of the home node, and deletes the cache line from the coherence state cache only after receiving the replacement response or write-back response from the home node.
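The requester-side cases of step S2 can be sketched as a small transition function. This is a minimal sketch; the state names S, M, IS, and IM follow the text, while the function and message names are my own assumptions.

```python
# Hypothetical sketch of the requester-side actions of step S2.
def on_l1_miss(csc_state, req):
    """Return (new coherence state, message sent to the home node)."""
    if req == "read":
        return "IS", "read"              # wait for the data response
    if req == "write":
        if csc_state is None:
            return "IM", "write"         # miss: allocate the line in IM
        if csc_state == "S":
            return "IM", "write_update"  # upgrade a shared copy
        if csc_state == "M":
            return "M", None             # satisfied locally, state unchanged
    if req == "update":
        return "IM", "update"
    if req in ("replace", "writeback"):
        return csc_state, req            # forwarded to the home node unchanged
    raise ValueError(req)
```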
Wherein step S3 specifically comprises the following cases, according to the type of the request:
Read request: if the home node's active directory cache hits, add the requesting node's bit to the directory vector of the entry. If the directory state is shared, send a read request to the home node's L2 cache and, after obtaining the data response, forward the data to the requesting node's processor. If the directory state is modified, send a downgrade write-back request to the sharing node that holds the data; when the directory controller receives the written-back data, it forwards the data to the requesting node's processor and writes the data back to the home node's L2 cache, and the directory state becomes shared. If the active directory cache misses, add a directory entry to the active directory cache and send a read request to the home node's L2 cache; after obtaining the data response, forward the requested data to the requesting node's processor, and the directory state becomes shared;
Write request: if the home node's active directory cache hits in the shared state, send invalidation signals to the processors of all sharing nodes and send a read request to the home node's L2 cache; after the directory controller has collected all invalidation responses, delete the bits of the corresponding sharing nodes from the directory vector, forward the data response returned by the L2 cache to the requesting node's processor, change the directory state in the home node's active directory cache to modified, and add the requesting node's bit to the directory vector. If the active directory cache hits in the modified state, send an invalidate-and-write-back request to the processor of the sharing node; when the directory controller receives the written-back data, delete the corresponding node's bit from the directory vector, forward the data to the requesting node, and add the requesting node's bit to the directory vector. If the active directory cache misses, add a directory entry to the active directory cache and send a read request to the home node's L2 cache; after obtaining the data response, forward the requested data to the requesting node, the directory state becomes modified, and the requesting node's bit is added to the directory vector;
Replacement request: delete the bit of the requesting node to be replaced from the home node's directory vector and return a replacement response signal to the requesting node; if the requesting node was the only sharing node, delete the directory entry from the home node's active directory cache;
Write-back request: delete the node's bit from the home node's directory vector, write the data back to the home node's L2 cache, return a write-back response signal to the requesting node, and delete the directory entry from the home node's active directory cache;
Replacement of the active directory cache itself: when the home node's active directory cache performs a replacement because of a capacity conflict, it sends invalidation requests to all sharing nodes; if the directory state of the entry is shared, the directory controller deletes the directory entry from the active directory cache after collecting all invalidation responses; if the directory state is modified, the directory controller, after receiving the written-back data, writes the data back to the home node's L2 cache and then deletes the corresponding directory entry;
When an invalidation request from the home node's L2 cache is received: if the node's active directory cache misses, return an invalidation response signal directly to the L2 cache; if the active directory cache hits, perform the active directory cache replacement operation; after the replacement completes, return an invalidation response signal or a write-back signal to the L2 cache and delete the directory entry from the active directory cache.
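The home-node handling of a read request in step S3 can be sketched as follows. The dictionary layout and action names are my own assumptions; only the shared/modified directory states and the overall flow come from the text.

```python
# Hypothetical sketch of the home node's read-request handling in step S3.
def home_read(dir_cache, tag, requester):
    """Return the list of actions the directory controller performs."""
    entry = dir_cache.get(tag)
    if entry is None:                       # active directory cache miss
        dir_cache[tag] = {"state": "S", "sharers": {requester}}
        return ["read_l2", "data_to_requester"]
    entry["sharers"].add(requester)         # record the new sharer
    if entry["state"] == "S":               # L2 already holds current data
        return ["read_l2", "data_to_requester"]
    entry["state"] = "S"                    # "M": owner must downgrade first
    return ["downgrade_writeback_to_owner", "data_to_requester",
            "writeback_to_l2"]
```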
Wherein step S4 specifically comprises the following cases, according to the type of the response corresponding to the miss request:
Read response: change the IS state in the requesting node's coherence state cache to the shared state, and return the data to the requesting node's processor;
Write response and update response: change the IM state in the requesting node's coherence state cache to the modified state, and return a write response or an update response to the requesting node's processor;
Replacement response and write-back response: delete the cache line for the address from the requesting node's coherence state cache, and forward the response signal to the requesting node's processor;
Invalidation request: when the requesting node's coherence state cache receives an invalidation request from the home node's L2 cache, forward it directly to the requesting node's processor;
Invalidation response: when the requesting node's coherence state cache receives an invalidation response signal from the requesting node's processor, delete the corresponding cache line and forward the invalidation response to the home node.
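The step S4 completion of the transient states can be sketched as a small handler. This is a minimal sketch under my own message names; only the IS/IM/S/M state transitions come from the text.

```python
# Hypothetical sketch of step S4: when the home node's response passes back
# through the requesting node's router, the transient state is completed.
def on_response(csc_state, resp):
    """Return the new coherence state; None means the line is deleted."""
    if resp == "read_resp":                      # IS -> S
        assert csc_state == "IS"
        return "S"
    if resp in ("write_resp", "update_resp"):    # IM -> M
        assert csc_state == "IM"
        return "M"
    if resp in ("replace_resp", "writeback_resp"):
        return None                  # delete the line, forward the signal
    if resp == "invalidate_req":
        return csc_state             # forwarded to the processor unchanged
    raise ValueError(resp)
```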
(3) Beneficial effects
By integrating a coherence state cache and an active directory cache into the network interface unit of the on-chip network, the cache-coherent on-chip network system proposed by the present invention separates coherence maintenance from the processor and directory maintenance from the L2 cache, and eliminates the directory structure in the L2 cache. This simplifies the design and verification of multicore processors, reduces the storage and latency overheads of the chip, and improves the performance of multicore processors.
Description of drawings
Fig. 1 is a structural diagram of the cache-coherent on-chip network system of an embodiment of the present invention;
Fig. 2 is a structural sketch of the coherence state cache in the cache-coherent on-chip network system of an embodiment of the present invention;
Fig. 3 is a structural diagram of the active directory cache in the cache-coherent on-chip network system of an embodiment of the present invention;
Fig. 4 is a flow chart of the data request method using the above system in an embodiment of the present invention.
Embodiment
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are used to illustrate the present invention, but not to limit its scope.
In a multicore processor, cache coherence protocol operations are transmitted through the on-chip network. A miss request from the private cache (the L1 cache) of the multicore processor is sent into the on-chip network through the network interface unit, and the response message is transmitted through the on-chip network to the router of the requesting node and returned to the multicore processor through the network interface unit. Accesses to the directory and data in the shared L2 cache also pass through the network interface unit. Responses and invalidation messages from the shared L2 cache to the private caches of the multicore processors are injected into the on-chip network through the network interface unit and transmitted to the corresponding multicore processor. The network interface unit can therefore observe all coherence protocol operations in the system and process them further.
To separate coherence state maintenance from the processor, the present invention adds a coherence state cache to the network interface unit, as shown in Fig. 1. The coherence state cache stores and maintains the coherence states of the data blocks in the local private L1 cache, while the L1 cache of the multicore processor works in its own way without having to care about coherence maintenance. Through the coherence state cache, the coherence protocol is decoupled from the processor, and multicore processors with different coherence protocols become compatible.
As shown in Fig. 2, the above coherence state cache comprises:
a coherence state memory, which has the same storage lines as the L1 cache; each storage line stores the address tag and coherence state of an L1 cache line, and is used to preserve the coherence states of the L1 cache data blocks;
a processor interface, which connects to the multicore processor, extracts the request signals needed by the coherence state cache from the bus requests of different types of processor cores, and converts the responses or requests of the coherence state cache into signals the multicore processor can recognize;
a coherence protocol controller, which, when an L1 cache miss request or response of the multicore processor passes through the network interface unit, obtains the access request or response through the processor interface, extracts the address tag from it, and maintains the corresponding coherence state.
To separate directory storage and maintenance from the L2 cache, an active directory cache connected to the L2 cache is also added to the network interface unit, as shown in Fig. 1. The active directory cache caches and maintains the directory information of the L2 cache data blocks frequently accessed by the L1 cache, while the directory storage space and directory maintenance work in the L2 cache are eliminated. The active directory cache reduces the directory storage overhead and decouples the coherence protocol from the L2 cache. It also reduces directory access latency, improving system performance.
As shown in Fig. 3, the above active directory cache comprises:
a directory memory, which caches the directory information of the L2 cache data blocks frequently accessed by the L1 cache; each storage line consists of the address tag, directory state, and directory vector of a data block. The purpose of the directory vector is to track the locations of the L1 caches that cache the address; one sharer bit is reserved in the directory vector for each processor core that contains a private L1 cache;
an L2 cache interface, which connects to the L2 cache; after a miss request of the multicore processor has accessed the active directory cache, the interface sends the miss request to the L2 cache, and returns the response signals of the L2 cache to the active directory cache;
a directory controller, which obtains the multicore processor's access requests to the L2 cache through the L2 cache interface, looks up the directory information in the directory memory, and then decides according to the request type whether to send the access request to the local L2 cache.
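The directory vector described above, with one sharer bit per core, lends itself to plain bit manipulation. The helper names below are my own illustration, not the patent's.

```python
# Hypothetical helpers for the directory vector: one bit per processor core
# that contains a private L1 cache, set when that core caches the block.
def add_sharer(vec: int, core: int) -> int:
    return vec | (1 << core)

def remove_sharer(vec: int, core: int) -> int:
    return vec & ~(1 << core)

def is_only_sharer(vec: int, core: int) -> bool:
    """True when `core` is the sole sharer, i.e. the directory entry can be
    deleted once that core replaces its copy."""
    return vec == (1 << core)
```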
Compared with a traditional on-chip network, the cache-coherent on-chip network structure adds the coherence state cache and the active directory cache, both implemented in the network interface unit of the on-chip network. The coherence state cache is the interface between the router and the processor; the active directory cache is the interface between the router and the L2 cache. Coherence state maintenance and directory maintenance are performed by the coherence state cache and the active directory cache respectively. The main function of the network interface unit is to pack outgoing data and unpack incoming data. The coherence state cache and the active directory cache work concurrently with the packing and unpacking components of the network interface, which hides their access latency. This design does not change the router structure and has no influence on the topology or routing algorithm, increasing the adaptability and flexibility of the cache-coherent on-chip network structure. Through this structure, the present invention allows multicore processors and L2 caches to be connected directly to the on-chip network, achieving seamless integration of multicore processors; the coherence protocol is transparent to both the multicore processor and the L2 cache.
The invention also discloses a data request method using the above system. The method enables the processor of a requesting node to request data from the L2 cache of a home node on the network; the requesting node is the node containing the processor that issues the data request, and the home node is the node on the network whose L2 cache holds the data. As shown in Fig. 4, the method comprises:
Step S401: the coherence protocol controller of the requesting node captures an L1 cache miss request from the requesting node's processor.
Step S402: the coherence protocol controller of the requesting node looks up the address tag of the miss request in the coherence state memory, and sends a data request to the L2 cache of the home node according to the corresponding coherence state.
Step S403: the directory controller of the active directory cache in the network interface unit of the home node's router captures the data request, looks up the directory information corresponding to the data request in the directory memory, and then decides according to the request type whether to send the data request to the L2 cache of the home node.
Step S404: the L2 cache of the home node returns a message to the requesting node's processor according to the data request; when the message passes through the router of the requesting node, the coherence protocol controller of that router captures the message, changes or keeps the coherence state stored for the address tag in the coherence state cache according to the message type, and returns the requested data contained in the message to the requesting node's processor.
The on-chip network system of the present invention operates as follows:
In the above cache-coherent on-chip network system, when a memory access miss request of the processor passes through the network interface unit, it enters the coherence state cache. The coherence protocol controller first looks up the address tag of the L1 request address in the coherence state cache and, according to the corresponding coherence state information, sends a request to the L2 cache of the home node. If the address tag is not in the coherence state cache, it is added. Responses returned by the L2 cache to the multicore processor are intercepted by the coherence protocol controller, which changes or keeps the coherence state in the coherence state cache according to the response type and, at the same time, forwards the requested data to the multicore processor, completing one processor miss access. Data block invalidation requests from the L2 cache to the multicore processor and the processor's invalidation responses are also intercepted by the coherence protocol controller, and the coherence state is changed accordingly. The specific coherence protocol operations are as follows:
Read request operation: a cache line is allocated for the request address in the coherence state memory of the requesting node's consistent state cache, and its coherence state is set to IS (IS is a transient state indicating that the read request has not yet completed and is waiting for the data response of the second-level cache); the request is then forwarded to the second-level cache of the home node.
Write request operation: if the requesting node's consistent state cache misses, a cache line is allocated for the address and its coherence state is set to IM (IM is also a transient state, indicating that the write request has not yet completed and is waiting for the write response of the second-level cache), and the write request is forwarded to the second-level cache of the home node. If the requesting node's consistent state cache hits and the line is in the shared (S) state, the coherence state is set to IM and a write-update (Update) request is sent to the home node. If the requesting node's consistent state cache hits and the line is in the modified (M) state, a write acknowledgment is returned directly to the requesting node's processor, and the state in the consistent state cache does not change.
Update request operation: the coherence state in the requesting node's consistent state cache is set to IM, and an update request is then sent to the home node. When an update request arrives from the multi-core processor, the consistent state cache should be in the shared (S) state.
Replacement and write-back request operations: the requesting node's consistent state cache forwards the request directly to the second-level cache of the home node. When a replacement request arrives from the multi-core processor, the consistent state cache should be in the shared (S) state; when a write-back request arrives, it should be in the modified (M) state.
Read response operation: the IS state in the requesting node's consistent state cache is changed to shared (S), and the data is returned to the requesting node's processor.
Write response and update response operations: the IM state in the requesting node's consistent state cache is changed to modified (M), and a write response or an update response is returned to the requesting node's processor.
Replacement response and write-back response operations: the cache line for the address is deleted from the requesting node's consistent state cache, and the acknowledgment is forwarded to the requesting node's processor.
Invalidation request operation: when the requesting node's consistent state cache receives an invalidation request from the second-level cache of the home node, it forwards the request directly to the requesting node's processor.
Invalidation response operation: when the requesting node's consistent state cache receives an invalidation acknowledgment from the multi-core processor, it deletes the corresponding cache line and forwards the invalidation acknowledgment to the home node.
Replacement operation: when a replacement occurs in the requesting node's consistent state cache because of a capacity conflict, an invalidation signal is sent to the requesting node's processor; according to its state, the private first-level cache of the requesting node's processor then sends an invalidation response or a write-back message. After the requesting node's consistent state cache receives the invalidation response or write-back message from the requesting node's processor, it sends a replacement or write-back request to the home node. The cache line is deleted from the requesting node's consistent state cache only after the replacement response or write-back response from the home node has been received.
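The per-operation state changes listed above can be collected into a single transition table. This is a hypothetical summary: the event names and the None-for-no-line convention are illustrative, not terminology from the patent.

```python
# Requesting-node coherence state transitions, summarized from the
# operations above. Keys are (current_state, event); values are
# (next_state, action). None means no cache line is allocated.

TRANSITIONS = {
    # outgoing requests from the processor
    (None, 'read_req'):       ('IS', 'allocate line, forward read to home L2'),
    (None, 'write_req'):      ('IM', 'allocate line, forward write to home L2'),
    ('S',  'write_req'):      ('IM', 'send update request to home'),
    ('M',  'write_req'):      ('M',  'acknowledge write locally, no message'),
    ('S',  'update_req'):     ('IM', 'send update request to home'),
    ('S',  'replace_req'):    ('S',  'forward replacement to home L2'),
    ('M',  'writeback_req'):  ('M',  'forward write-back to home L2'),
    # incoming responses and requests from the home node
    ('IS', 'read_resp'):      ('S',  'return data to processor'),
    ('IM', 'write_resp'):     ('M',  'return write response'),
    ('IM', 'update_resp'):    ('M',  'return update response'),
    ('S',  'replace_resp'):   (None, 'delete line, forward ack to processor'),
    ('M',  'writeback_resp'): (None, 'delete line, forward ack to processor'),
    ('S',  'inval_req'):      ('S',  'forward invalidation to processor'),
    ('S',  'inval_resp'):     (None, 'delete line, forward ack to home'),
}

def step(state, event):
    """Return (next_state, action) for a coherence event; KeyError otherwise."""
    return TRANSITIONS[(state, event)]
```

Note how IS and IM exist only between a request and its matching response, while S and M are the stable states mirrored from the first-level cache.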
In the above network-on-chip supporting cache coherence, the active directory cache holds the directory information of recently and frequently accessed data. All read and write miss requests of the multi-core processors cause accesses to the second-level cache of the home node, and these accesses are captured by the active directory cache on the network interface. The directory controller of the active directory cache first looks up the directory information in the directory memory of the active directory cache, and then decides, according to the request type, whether to send a read or write request to the local second-level cache. It works as follows:
Read request operation: if the home node's active directory cache hits, the requesting node's position is added to the directory vector. If the directory state is shared (S), a read-data request is sent to the second-level cache of the home node; after the data response of the second-level cache is obtained, the data is forwarded to the requesting node's processor, completing the read operation. If the directory state is modified (M), a downgrade-and-write-back request is sent to the sharing node that owns the data; when the directory controller receives the written-back data, it forwards the written-back data to the requesting node's processor and writes the data back to the second-level cache of the home node, and the directory state becomes shared (S). If the home node's active directory cache misses, a directory entry is added to the active directory cache, and a read request is then sent to the second-level cache of the home node; after the cached data response is obtained, the requested data is forwarded to the requesting node's processor, and the directory state becomes shared (S).
Write request operation: if the home node's active directory cache hits and is in the shared (S) state, invalidation signals are sent to the processors of all sharing nodes, and a read request is sent to the second-level cache of the home node. A write request is in fact also a read operation: a write instruction writes a single word, while read and write requests both operate on a whole cache line, so on a write the entire cache line must be read and delivered to the requesting node, where it is merged with the written content into a new cache line. After the directory controller has collected all invalidation acknowledgments, the positions of the corresponding sharing nodes are deleted from the directory vector, the data response returned from the second-level cache is forwarded to the requesting node's processor, the directory state of the home node's active directory cache is changed to modified (M), and the requesting node's position is added to the directory vector. If the home node's active directory cache is in the modified (M) state, an invalidate-and-write-back request is sent to the processor of the sharing node, because the copy held by that sharing node is then the only correct copy and must be written back. When the directory controller receives the written-back data, the corresponding node's position is deleted from the directory vector, the data is forwarded to the requesting node, and the requesting node's position is added to the directory vector. If the home node's active directory cache misses, a directory entry is added to the active directory cache and a read request is sent to the second-level cache of the home node; after the data response of the second-level cache is obtained, the requested data is forwarded to the requesting node, the directory state becomes modified (M), and the requesting node's position is added to the directory vector.
Replacement request operation: the position of the requesting node to be replaced is deleted from the directory vector, and a replacement acknowledgment is returned to the requesting node. If it was the only sharing node, the directory vector is deleted from the home node's active directory cache.
Write-back request operation: the node's position is deleted from the directory vector of the home node, the data is written back to the second-level cache of the home node, and a write-back acknowledgment is returned to the requesting node; the directory vector is then deleted from the home node's active directory cache.
Replacement operation: when the home node's active directory cache performs a replacement because of a capacity conflict, invalidation requests are sent to all sharing nodes. If the directory state in the directory memory is shared (S), the directory controller deletes the directory vector from the active directory cache after collecting all invalidation responses. If the directory state in the directory memory is modified (M), then after the directory controller receives the written-back data, it writes the data back into the second-level cache of the home node and then deletes the directory vector corresponding to the data.
When an invalidation request from the second-level cache of the home node is received, if this node's active directory cache misses, an invalidation acknowledgment is returned directly to the second-level cache. If the active directory cache hits, the replacement operation of the active directory cache is performed; after the replacement completes, an invalidation acknowledgment or a write-back signal is returned to the second-level cache, and the directory vector is deleted from the active directory cache.
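The directory-side behaviour described above can be sketched as follows. This is an illustrative model only: the entry layout, method names, and callbacks are assumptions, and network messaging is reduced to direct function calls.

```python
# Hypothetical sketch of the home-node active directory cache serving read
# and write misses. 'sharers' plays the role of the directory vector.

class ActiveDirectoryCache:
    def __init__(self):
        # address -> {'state': 'S' or 'M', 'sharers': set of node ids}
        self.entries = {}

    def on_read(self, addr, requester, read_l2, fetch_from_owner):
        e = self.entries.get(addr)
        if e is None:                       # directory miss: allocate entry
            self.entries[addr] = {'state': 'S', 'sharers': {requester}}
            return read_l2(addr)            # data served by the home L2
        if e['state'] == 'S':               # shared: serve from home L2
            e['sharers'].add(requester)
            return read_l2(addr)
        # modified: downgrade the single owner, data becomes shared again
        data = fetch_from_owner(addr, next(iter(e['sharers'])))
        e['state'] = 'S'
        e['sharers'].add(requester)
        return data

    def on_write(self, addr, requester, read_l2, invalidate, fetch_from_owner):
        e = self.entries.get(addr)
        if e is None:                       # directory miss: allocate as M
            self.entries[addr] = {'state': 'M', 'sharers': {requester}}
            return read_l2(addr)
        if e['state'] == 'S':               # invalidate every sharer first
            for node in e['sharers']:
                invalidate(addr, node)
            data = read_l2(addr)
        else:                               # M: recall the only valid copy
            data = fetch_from_owner(addr, next(iter(e['sharers'])))
        e['state'] = 'M'
        e['sharers'] = {requester}          # requester is the new sole owner
        return data
```

In the patent's design this logic lives in the network interface unit, so the second-level cache itself keeps no directory structure at all.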
The above embodiments are intended only to illustrate the present invention and not to limit it. Those of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the present invention; therefore, all equivalent technical solutions also belong to the scope of the present invention, whose scope of patent protection shall be defined by the claims.

Claims (7)

1. A network-on-chip supporting cache coherence, comprising: a network interface unit, said network interface unit being connected to a router, and to a multi-core processor and a second-level cache; characterized in that a consistent state cache connected to the multi-core processor is additionally provided in said network interface unit, said consistent state cache being used for storing and maintaining the coherence state of the data blocks in the first-level cache of each core of the multi-core processor, and said consistent state cache comprising:
a coherence state memory, having the same storage lines as said first-level cache, for storing the coherence state of the data blocks of said first-level cache;
a processor interface, connected to said multi-core processor, for separating the request signals needed by the consistent state cache from the bus requests of the different processor cores, and for converting the response or request signals of the consistent state cache into signals that the multi-core processor can identify;
a coherence protocol controller, for obtaining, through said processor interface, an access miss request or response of the first-level cache of the multi-core processor when it passes through the network interface unit, separating the address tag from it, and maintaining the corresponding coherence state.
2. The network-on-chip supporting cache coherence of claim 1, characterized in that an active directory cache connected to the second-level cache is additionally provided in said network interface unit, for caching and maintaining the directory information of the data blocks of the second-level cache that are frequently accessed by said first-level cache.
3. The network-on-chip supporting cache coherence of claim 2, characterized in that said active directory cache comprises:
a directory memory, for caching the directory information of the data blocks of the second-level cache frequently accessed by said first-level cache, comprising the address tag, directory state and directory vector of said data blocks;
a second-level cache interface, connected to the second-level cache, for sending the access miss requests of the multi-core processor to said second-level cache, or returning the acknowledgment signals of the second-level cache to the active directory cache;
a directory controller, for obtaining, through the second-level cache interface, the access requests of the multi-core processor to the second-level cache, looking up the directory information in the directory memory, and then deciding, according to the request type, whether to send the access request to the local second-level cache.
4. A data request method using the system of claim 3, characterized by comprising the following steps:
S1: the coherence protocol controller of the requesting node captures the first-level cache access miss request of the requesting node's processor;
S2: the coherence protocol controller of the requesting node looks up the address tag of said miss request in the coherence state memory, and sends a data request to the second-level cache of the home node according to the corresponding coherence state;
S3: the directory controller of the active directory cache in the network interface unit of the home node's router captures said data request, looks up the directory information corresponding to said data request in the directory memory, and then decides, according to the request type, whether to send said data request to the second-level cache of the home node;
S4: the second-level cache of said home node returns a message to the requesting node's processor according to said data request; when the message passes through the router of the requesting node, the coherence protocol controller of that router captures the message, changes or keeps, according to the message type, the coherence state of the data at the address tag stored in the consistent state cache, and returns the requested data contained in the message to the requesting node's processor.
5. The data request method of claim 4, characterized in that said step S2 specifically comprises: when said miss request is a:
read request operation: a cache line is allocated for the request address in the coherence state memory of the requesting node's consistent state cache, its coherence state is set to the transient state IS, and the request is forwarded to the second-level cache of the home node; said transient state IS indicates that the read request has not yet completed and is waiting for the data response of the second-level cache;
write request operation: if the requesting node's consistent state cache misses, a cache line is allocated for the address, its coherence state is set to the transient state IM, and the write request is forwarded to the second-level cache of the home node, said transient state IM indicating that the write request has not yet completed and is waiting for the write response of the second-level cache; if the requesting node's consistent state cache hits and is in the shared state, the coherence state is set to IM and a write-update request is sent to the second-level cache of the home node; if the requesting node's consistent state cache hits and is in the modified state, a write acknowledgment is returned directly to the requesting node's processor, and the state in the consistent state cache does not change;
update request operation: the coherence state in the requesting node's consistent state cache is set to IM, and an update request is then sent to the home node;
replacement and write-back request operations: the requesting node's consistent state cache forwards the request directly to the second-level cache of the home node;
replacement operation: when a replacement occurs in the requesting node's consistent state cache because of a capacity conflict, an invalidation signal is sent to the requesting node's processor; the private first-level cache of the requesting node's processor then sends an invalidation response or a write-back message according to its state; after the requesting node's consistent state cache receives the invalidation response or write-back message from the requesting node's processor, it sends a replacement or write-back request to the second-level cache of the home node, and deletes the cache line from the requesting node's consistent state cache only after receiving the replacement response or write-back response from the home node.
6. The data request method of claim 4, characterized in that said step S3 specifically comprises: when said miss request is a:
read request operation: if the home node's active directory cache hits, the requesting node's position is added to the directory vector of the directory information; if the directory state is shared, a read-data request is sent to the second-level cache of the home node, and after the data response of the second-level cache is obtained, the data is forwarded to the requesting node's processor; if the directory state is modified, a downgrade-and-write-back request is sent to the sharing node that owns the data, and when the directory controller receives the written-back data, it forwards the written-back data to the requesting node's processor and writes the data back to the second-level cache of the home node, the directory state becoming shared; if the home node's active directory cache misses, a directory entry is added to the active directory cache, a read request is then sent to the second-level cache of the home node, and after the cached data response is obtained, the requested data is forwarded to the requesting node's processor, the directory state becoming shared;
write request operation: if the home node's active directory cache hits and is in the shared state, invalidation signals are sent to the processors of all sharing nodes and a read request is sent to the second-level cache of the home node; after the directory controller has collected all invalidation acknowledgments, the positions of the corresponding sharing nodes are deleted from the directory vector, the data response returned from the second-level cache is forwarded to the requesting node's processor, the directory state of the home node's active directory cache is changed to modified, and the requesting node's position is added to the directory vector; if the home node's active directory cache is in the modified state, an invalidate-and-write-back request is sent to the processor of the sharing node, and when the directory controller receives the written-back data, the corresponding node's position is deleted from the directory vector, the data is forwarded to the requesting node, and the requesting node's position is added to the directory vector; if the home node's active directory cache misses, a directory entry is added to the active directory cache and a read request is sent to the second-level cache of the home node; after the data response of the second-level cache is obtained, the requested data is forwarded to the requesting node, the directory state becomes modified, and the requesting node's position is added to the directory vector;
replacement request operation: the position of the requesting node to be replaced is deleted from the directory vector of the home node, and a replacement acknowledgment is returned to the requesting node; if it was the only sharing node, the directory vector is deleted from the home node's active directory cache;
write-back request operation: the node's position is deleted from the directory vector of the home node, the data is written back to the second-level cache of the home node, a write-back acknowledgment is returned to the requesting node, and the directory vector is deleted from the home node's active directory cache;
replacement operation: when the home node's active directory cache performs a replacement because of a capacity conflict, invalidation requests are sent to all sharing nodes; if the directory state of the directory memory is shared, the directory controller deletes the directory vector from the active directory cache after collecting all invalidation responses; if the directory state of the directory memory is modified, then after the directory controller receives the written-back data, it writes the data back into the second-level cache of the home node and then deletes the directory vector corresponding to the data;
when an invalidation request from the second-level cache of the home node is received, if this node's active directory cache misses, an invalidation acknowledgment is returned directly to the second-level cache; if the active directory cache hits, the replacement operation of the active directory cache is performed, and after the replacement completes, an invalidation acknowledgment or a write-back signal is returned to the second-level cache and the directory vector is deleted from the active directory cache.
7. The data request method of claim 4, characterized in that said step S4 specifically comprises: when the response operation corresponding to said miss request is a:
read response operation: the IS state in the requesting node's consistent state cache is changed to the shared state, and the data is returned to the requesting node's processor;
write response and update response operations: the IM state in the requesting node's consistent state cache is changed to the modified state, and a write response or an update response is returned to the requesting node's processor;
replacement response and write-back response operations: the cache line for the address is deleted from the requesting node's consistent state cache, and the acknowledgment is forwarded to the requesting node's processor;
invalidation request operation: when the requesting node's consistent state cache receives an invalidation request from the second-level cache of the home node, it forwards the request directly to the requesting node's processor;
invalidation response operation: when the requesting node's consistent state cache receives an invalidation acknowledgment from the requesting node's processor, it deletes the corresponding cache line and forwards the invalidation acknowledgment to the home node.
CN2010102940174A 2010-09-27 2010-09-27 On-chip network system supporting cache coherence and data request method Expired - Fee Related CN101958834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102940174A CN101958834B (en) 2010-09-27 2010-09-27 On-chip network system supporting cache coherence and data request method


Publications (2)

Publication Number Publication Date
CN101958834A CN101958834A (en) 2011-01-26
CN101958834B true CN101958834B (en) 2012-09-05



Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1171159A (en) * 1994-12-23 1998-01-21 英特尔公司 Cache coherent multiprocessing computer system with reduced power operating features
CN101458665A (en) * 2007-12-14 2009-06-17 扬智科技股份有限公司 Second level cache and kinetic energy switch access method
CN101694639A (en) * 2009-10-15 2010-04-14 清华大学 Computer data caching method




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120905

Termination date: 20210927