CN110049104A - Hybrid caching method, system and storage medium based on a hierarchical on-chip interconnection network - Google Patents

Hybrid caching method, system and storage medium based on a hierarchical on-chip interconnection network

Info

Publication number
CN110049104A
CN110049104A (application CN201910198728.2A)
Authority
CN
China
Prior art keywords
data
core
controller
cluster
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910198728.2A
Other languages
Chinese (zh)
Inventor
张顺 (Zhang Shun)
黄奕烜 (Huang Yixuan)
虞志益 (Yu Zhiyi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
SYSU CMU Shunde International Joint Research Institute
Research Institute of Zhongshan University Shunde District Foshan
National Sun Yat Sen University
Original Assignee
SYSU CMU Shunde International Joint Research Institute
Research Institute of Zhongshan University Shunde District Foshan
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SYSU CMU Shunde International Joint Research Institute, Research Institute of Zhongshan University Shunde District Foshan, and National Sun Yat Sen University
Priority to CN201910198728.2A
Publication of CN110049104A
Legal status: Pending


Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04L — TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 — Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 — Network services
    • H04L 67/56 — Provisioning of proxy services
    • H04L 67/568 — Storing data temporarily at an intermediate stage, e.g. caching

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a hybrid caching method, system and storage medium based on a hierarchical on-chip interconnection network. The method includes: querying cached data in a first cache unit according to a data request sent by a first core; after confirming that the cached data is not found, sending the data request to the shared bus and querying the cached data in the other cores of the cluster; after confirming that the cached data is still not found, sending the data request to a directory controller through a first node controller; confirming a second core cluster through the directory controller; querying the cached data and sending it to the directory controller through a second node controller; feeding the queried cached data back to the first node controller through the directory controller; and feeding the queried cached data back to the first core through the first node controller. The invention reduces storage overhead and implementation difficulty, and can be widely applied in the technical field of integrated circuits.

Description

Hybrid caching method, system and storage medium based on a hierarchical on-chip interconnection network
Technical field
The present invention relates to the technical field of integrated circuits, and in particular to a hybrid caching method, system and storage medium based on a hierarchical on-chip interconnection network.
Background technique
A cache is a level of memory located between main memory and the processor, used to bridge the speed gap between the processor and memory. In a multi-core processor, several cores may read and write the same data block. When one core writes to a shared data block, the copies of that block in the caches of the other cores become stale, so the data held in the caches of the different cores becomes inconsistent. This is the well-known cache coherence problem. At present it is generally solved with either a bus snooping protocol or a directory-based cache coherence protocol:
1. A bus snooping protocol maintains cache coherence by monitoring the bus. Every processor in the system can observe all operations performed on memory. If an operation would break the coherence state of data in a local cache, the local core sends a control signal onto the bus, and the other cores take the corresponding coherence action when they snoop that signal. Each cache keeps its own status flags indicating whether the data in a cache line is currently valid. When a local core wants to read or write, it first checks whether the corresponding cache line is valid. If it is valid, the core performs the access and at the same time broadcasts an invalidation signal on the bus; when the other cores snoop this signal, they invalidate their copies of the cache line. Otherwise, if the data in the local cache is invalid, the core broadcasts a data request signal on the bus; when another core snoops this signal and its cache line for the corresponding address is valid, it sends the data to the requesting core.
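The snooping behaviour just described can be illustrated with a small simulation (hypothetical Python classes, not the patent's hardware): each core watches the shared bus, and a write broadcast invalidates the other cores' copies of the line.

```python
# Minimal sketch of bus-snooping invalidation. Class and method names are
# illustrative assumptions, not part of the patent.

class Core:
    def __init__(self, name):
        self.name = name
        self.cache = {}          # address -> (value, valid_flag)

    def snoop(self, op, addr):
        # On an "invalidate" broadcast, mark our copy of the line invalid.
        if op == "invalidate" and addr in self.cache:
            value, _ = self.cache[addr]
            self.cache[addr] = (value, False)

class SharedBus:
    def __init__(self, cores):
        self.cores = cores

    def broadcast(self, sender, op, addr):
        # Every core except the sender snoops the bus message.
        for core in self.cores:
            if core is not sender:
                core.snoop(op, addr)

def write(bus, core, addr, value):
    core.cache[addr] = (value, True)          # write locally
    bus.broadcast(core, "invalidate", addr)   # invalidate other copies

cores = [Core("c0"), Core("c1")]
bus = SharedBus(cores)
cores[1].cache[0x10] = (7, True)   # c1 holds a valid copy of line 0x10
write(bus, cores[0], 0x10, 42)     # c0 writes the same line
# cores[1]'s copy is now marked invalid: (7, False)
```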
2. A directory-based cache coherence protocol sends coherence maintenance commands only to those caches that hold a copy of the data block concerned. The directory in such a protocol records which cache lines are held by each core in the system, and guarantees that requests from different cores for the same data are serviced serially. When a local core reads or writes its local cache and no corresponding copy of the data is present, it sends a request signal to the directory. The directory controller looks up the directory to determine which cache in the system holds the copy, and forwards the request signal to that cache. When the cache holding the copy receives the request, it transfers the latest data over the data transmission network to the local cache of the requester. Directory-based cache coherence protocols are suitable for multi-core processors built on an on-chip interconnection network architecture.
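The directory mechanism can likewise be sketched in a few lines (hypothetical data layout; a sketch of the idea, not an implementation): the directory maps each block to the set of caches holding it, so a miss is forwarded to a holder rather than broadcast to every cache.

```python
# Sketch of a directory lookup. Names and data layout are illustrative
# assumptions, not the patent's hardware.

directory = {0x20: {"cache1"}}                  # block 0x20 held by cache1
caches = {"cache0": {}, "cache1": {0x20: 99}}

def read_miss(requester, addr):
    holders = directory.get(addr, set())
    if holders:
        owner = next(iter(holders))              # forward request to a holder
        value = caches[owner][addr]
        caches[requester][addr] = value          # fill the requester's cache
        directory[addr] = holders | {requester}  # record the new sharer
        return value
    return None                                  # otherwise go to main memory

value = read_miss("cache0", 0x20)   # cache0 misses; data comes from cache1
```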
In an on-chip interconnection network architecture and its corresponding directory-based cache coherence protocol, the nodes communicate with one another through packet routing, which gives the system good scalability and the ability to communicate in parallel. Bandwidth increases by several orders of magnitude, which solves the bandwidth limitation of a bus architecture. Moreover, an on-chip interconnection network replaces the long interconnect wires of a bus architecture with short wires between switches, resulting in lower power consumption.
At present, when the number of processor cores is small, a shared-bus architecture with a bus snooping protocol is generally used to maintain cache coherence; when the number of cores is larger, an on-chip interconnection network architecture with a directory-based cache coherence protocol is used instead.
However, as the number of cores grows further, the directory of a directory-based cache coherence protocol occupies a very large amount of memory, so both its storage overhead and its implementation difficulty become large.
Summary of the invention
In order to solve the above technical problems, the object of the invention is to provide a hybrid caching method, system and storage medium based on a hierarchical on-chip interconnection network, with small storage overhead and low implementation difficulty.
In one aspect, an embodiment of the invention provides a hybrid caching method based on a hierarchical on-chip interconnection network, comprising the following steps:
querying cached data in a first cache unit corresponding to a first core, according to a data request sent by the first core;
after confirming that the cached data is not found in the first cache unit, sending the data request to the shared bus of a first core cluster, and querying the cached data in the other cores of the first core cluster; wherein a core cluster is composed of several cores of the processor;
after confirming that the cached data is not found in the first core cluster, sending the data request to a directory controller through a first node controller corresponding to the first core cluster; wherein a node controller is configured for each core cluster, and the directory controller is configured at the main memory of the processor;
confirming a second core cluster through the directory controller, the second core cluster holding the cached data corresponding to the data request;
querying the cached data and sending it to the directory controller through a second node controller corresponding to the second core cluster;
feeding the queried cached data back to the first node controller through the directory controller;
feeding the queried cached data back to the first core through the first node controller.
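The steps above amount to a three-level lookup: the core's private cache, then the other caches of its cluster via the shared bus, then the directory, which forwards the request to the cluster holding the data. A minimal sketch in Python (hypothetical data layout and function names; not the claimed hardware):

```python
# Three-level lookup sketch: private cache -> cluster bus -> directory/NoC.
# The dict-based cores, clusters and directory are illustrative assumptions.

def lookup(core, clusters, directory, addr):
    # Level 1: the requesting core's private cache.
    if addr in core["cache"]:
        return core["cache"][addr], "local"
    # Level 2: the other cores of the same cluster, over the shared bus.
    for peer in clusters[core["cluster"]]:
        if peer is not core and addr in peer["cache"]:
            return peer["cache"][addr], "cluster-bus"
    # Level 3: the node controller asks the directory controller, which
    # identifies the cluster holding the block and fetches it over the NoC.
    owner_cluster = directory.get(addr)
    if owner_cluster is not None:
        for peer in clusters[owner_cluster]:
            if addr in peer["cache"]:
                return peer["cache"][addr], "directory"
    return None, "main-memory"

c0 = {"cluster": 0, "cache": {}}
c1 = {"cluster": 0, "cache": {}}
c2 = {"cluster": 1, "cache": {0x30: 5}}
clusters = {0: [c0, c1], 1: [c2]}
directory = {0x30: 1}              # block 0x30 resides in cluster 1
result = lookup(c0, clusters, directory, 0x30)   # == (5, "directory")
```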
Further, the method also includes a network configuration step, which comprises the following steps:
grouping the cores of the processor, and then connecting the cores within each group into a core cluster based on shared-bus technology; wherein the cores within each core cluster communicate data with one another over the shared bus;
forming an on-chip interconnection network from the core clusters obtained by the grouping;
configuring a node controller for each core cluster;
configuring a directory controller for the main memory of the processor.
Further, the network configuration step also comprises the following steps:
configuring a node memory for each core cluster;
storing, in the node memory, the data cached on each core of the core cluster.
Further, the network configuration step also comprises the following steps:
configuring a directory memory for the main memory of the processor;
storing the directory information of each core cluster in the directory memory.
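One way to see why clustering reduces the directory's storage overhead (the advantage stated later in the description): the directory memory can keep one presence bit per cluster instead of one per core. A sketch under assumed sizes (64 cores, 8 cores per cluster; these numbers are illustrative, not from the patent):

```python
# Comparing per-core and per-cluster directory entry sizes. All sizes are
# assumed for illustration.

NUM_CORES = 64
CLUSTER_SIZE = 8
NUM_CLUSTERS = NUM_CORES // CLUSTER_SIZE

bits_per_entry_flat = NUM_CORES          # 64 presence bits per block
bits_per_entry_clustered = NUM_CLUSTERS  # 8 presence bits per block

# A clustered entry tracks sharers as a small bit mask over clusters:
def add_sharer(entry, cluster_id):
    return entry | (1 << cluster_id)

entry = 0
entry = add_sharer(entry, 1)   # cluster 1 holds a copy
entry = add_sharer(entry, 3)   # cluster 3 holds a copy
# entry == 0b1010; the entry is 8x smaller than a per-core bit vector
```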
Further, the core clusters in the on-chip interconnection network communicate data with one another through routers.
Further, the step of querying cached data in the first cache unit corresponding to the first core according to the data request sent by the first core comprises the following steps:
the first core issues the data request to a first cache controller;
the first cache controller queries the cached data according to the data request.
Further, the step of confirming the second core cluster through the directory controller comprises the following steps:
the directory controller looks up, according to the data request, the second core cluster where the cached data resides;
the directory information in the directory memory is updated according to the query result;
the data request is sent to the second core cluster through the directory controller.
In another aspect, an embodiment of the invention provides a hybrid caching system based on a hierarchical on-chip interconnection network, comprising:
a data request module, for querying cached data in a first cache unit corresponding to a first core according to a data request sent by the first core;
a first sending module, for, after confirming that the cached data is not found in the first cache unit, sending the data request to the shared bus of a first core cluster and querying the cached data in the other cores of the first core cluster; wherein a core cluster is composed of several cores of the processor;
a second sending module, for, after confirming that the cached data is not found in the first core cluster, sending the data request to a directory controller through a first node controller corresponding to the first core cluster; wherein a node controller is configured for each core cluster, and the directory controller is configured at the main memory of the processor;
a confirmation module, for confirming a second core cluster through the directory controller, the second core cluster holding the cached data corresponding to the data request;
a third sending module, for querying the cached data and sending it to the directory controller through a second node controller corresponding to the second core cluster;
a first feedback module, for feeding the queried cached data back to the first node controller through the directory controller;
a second feedback module, for feeding the queried cached data back to the first core through the first node controller.
In another aspect, an embodiment of the invention provides a hybrid caching system based on a hierarchical on-chip interconnection network, comprising:
at least one processor; and
at least one memory, for storing at least one program;
wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the hybrid caching method based on a hierarchical on-chip interconnection network.
In another aspect, an embodiment of the invention provides a storage medium storing processor-executable instructions which, when executed by a processor, perform the hybrid caching method based on a hierarchical on-chip interconnection network.
The technical solutions in the embodiments of the invention have the following advantage: the embodiments query cached data among the cores of the same core cluster over a shared bus, and query cached data between different core clusters through the directory controller and the node controllers. By combining a bus snooping protocol with a directory-based cache coherence protocol, the invention suits both processors with few cores and processors with many cores, and reduces both storage overhead and implementation difficulty.
Description of the drawings
Fig. 1 is a flow chart of the steps of an embodiment of the invention;
Fig. 2 is a schematic diagram of the system structure of an embodiment of the invention;
Fig. 3 is a schematic diagram of the shared bus structure of an embodiment of the invention;
Fig. 4 is a schematic diagram of the hybrid cache structure of an embodiment of the invention;
Fig. 5 is the state transition diagram of the MOESI protocol of an embodiment of the invention.
Detailed description of the embodiments
The invention is further explained and illustrated below with reference to the accompanying drawings and specific embodiments. The step numbers in the embodiments are provided only for ease of description and impose no restriction on the order of the steps; the execution order of the steps in an embodiment can be adjusted adaptively as understood by those skilled in the art.
Referring to Fig. 1, an embodiment of the invention provides a hybrid caching method based on a hierarchical on-chip interconnection network, comprising the following steps:
querying cached data in a first cache unit corresponding to a first core, according to a data request sent by the first core;
after confirming that the cached data is not found in the first cache unit, sending the data request to the shared bus of a first core cluster, and querying the cached data in the other cores of the first core cluster; wherein a core cluster is composed of several cores of the processor;
after confirming that the cached data is not found in the first core cluster, sending the data request to a directory controller through a first node controller corresponding to the first core cluster; wherein a node controller is configured for each core cluster, and the directory controller is configured at the main memory of the processor;
confirming a second core cluster through the directory controller, the second core cluster holding the cached data corresponding to the data request;
querying the cached data and sending it to the directory controller through a second node controller corresponding to the second core cluster;
feeding the queried cached data back to the first node controller through the directory controller;
feeding the queried cached data back to the first core through the first node controller.
As a further preferred embodiment, the method also includes a network configuration step, which comprises the following steps:
grouping the cores of the processor, and then connecting the cores within each group into a core cluster based on shared-bus technology; wherein the cores within each core cluster communicate data with one another over the shared bus;
forming an on-chip interconnection network from the core clusters obtained by the grouping;
configuring a node controller for each core cluster;
configuring a directory controller for the main memory of the processor.
As a further preferred embodiment, the network configuration step also comprises the following steps:
configuring a node memory for each core cluster;
storing, in the node memory, the data cached on each core of the core cluster.
As a further preferred embodiment, the network configuration step also comprises the following steps:
configuring a directory memory for the main memory of the processor;
storing the directory information of each core cluster in the directory memory.
As a further preferred embodiment, the core clusters in the on-chip interconnection network communicate data with one another through routers.
As a further preferred embodiment, the step of querying cached data in the first cache unit corresponding to the first core according to the data request sent by the first core comprises the following steps:
the first core issues the data request to a first cache controller;
the first cache controller queries the cached data according to the data request.
As a further preferred embodiment, the step of confirming the second core cluster through the directory controller comprises the following steps:
the directory controller looks up, according to the data request, the second core cluster where the cached data resides;
the directory information in the directory memory is updated according to the query result;
the data request is sent to the second core cluster through the directory controller.
With the method for Fig. 1 to corresponding, the embodiment of the invention provides a kind of mixing based on layering on-chip interconnection network is slow Deposit system, comprising:
Data demand module, the request of data for being sent according to the first kernel, in corresponding first caching of the first kernel Query caching data in unit;
First sending module, for confirm the first cache unit in do not inquire it is data cached after, to the first kernel cluster Shared bus sends request of data, and the query caching data in other kernels of the first kernel cluster;Wherein, the kernel cluster by Multiple kernels composition in processor;
Second sending module, for confirm do not inquired in the first kernel cluster it is data cached after, by corresponding to described the The first node controller of one kernel cluster sends request of data to contents controller;Wherein, in the Node Controller is configured at Core cluster, the contents controller are configured at the main memory of processor;
Confirmation module, for confirming the second kernel cluster by contents controller;The second kernel cluster primary storage has described Request of data is corresponding data cached;
Third sending module, for the second node controller by corresponding to the second kernel cluster to contents controller What is sent a query to is data cached;
First feedback module, for by contents controller by it is described inquire data cached feed back to first node control Device processed;
Second feedback module, for by first node controller by it is described inquire data cached feed back in first Core.
With the method for Fig. 1 to corresponding, the embodiment of the invention provides a kind of mixing based on layering on-chip interconnection network is slow Deposit system, comprising:
At least one processor;
At least one processor, for storing at least one program;
When at least one described program is executed by least one described processor, so that at least one described processor is realized The hybrid cache method based on layering on-chip interconnection network.
With the method for Fig. 1 to corresponding, the embodiment of the invention provides a kind of storage mediums, wherein being stored with processor can hold Capable instruction, the executable instruction of the processor are described based on interconnecting on layergram for executing when executed by the processor The hybrid cache method of network.
The specific implementation steps of a hybrid caching method based on a hierarchical on-chip interconnection network according to the invention are described in detail below:
First, the invention divides all cores of the processor into several groups; within each group the cores are connected by a shared bus to form a cluster. Each core cluster serves as one node of the network-on-chip, and all of the nodes form the first layer of the system.
Then, the core clusters forward data packets to one another through routers, forming a NoC interconnection network (i.e. the on-chip network), which serves as the second layer of the system.
In the first layer, all cores within a core cluster are interconnected by a shared bus, and a bus snooping protocol maintains local cache coherence within the cluster.
In the second layer, the clusters are interconnected by the NoC, and a directory-based cache coherence protocol maintains global cache coherence among all core clusters.
By adding a controller to each node to serve as a communication bridge between the two layers of protocols, the invention combines the two protocols effectively; the result is simple to implement, offers good real-time behaviour, and adapts well to larger many-core processors.
The overall system of the embodiment of the invention is divided into a two-layer structure. The first layer uses a shared-bus structure with a bus snooping protocol; the second layer uses an on-chip network structure with a directory-based cache coherence protocol. A node controller is added to each node; one end of the node controller is connected to the shared bus, and the other end to the NoC router. The node controller receives and forwards coherence signals (i.e. data requests) into the node or into the NoC network, realizing data communication between the first and second layers.
Typically, each processor consists of one main memory, which stores all the data, and multiple cores, which process the data. The embodiment of the invention adds a directory controller at the home node of the processor; one end of the directory controller is connected to a NoC router, and the other end is connected to the main memory and the directory memory.
In the first layer, all cached data exchanges and coherence commands are transmitted over the bus. Each node controller constantly snoops the coherence commands on its bus and forwards them to the router connected to it; the router forwards the coherence commands to the directory controller of the home node, so that the two protocols interact. Specifically, this comprises the following steps:
1. Reading data between cores of the same core cluster:
Specifically, when a core issues a read request to its cache (i.e. the first core of the invention sends a data request to the first cache unit to query the corresponding cached data), a read miss occurs if the cache does not hold the required data. The cache controller then broadcasts a read-miss signal (i.e. the data request of the invention) on the bus to request the data. After the other cores snoop this signal on the bus, each searches its local cache to determine whether it holds the latest data; if it does, it sends the data to the cache of the requester, completing the read operation.
2. Reading data between different core clusters:
Specifically, as shown in Fig. 2, Node represents a node (i.e. a core cluster); R represents a router, which forwards data and realizes communication between the nodes; Memory represents the main memory. When a core reads from its cache but the local cache does not hold the required data, and the caches of the other cores in the current cluster (i.e. the first core cluster) do not hold it either, the first node controller generates a read-miss signal (i.e. the data request of the invention) and sends it through the routers R to the directory controller of the home node Memory. After the directory controller receives the miss signal, it looks up the directory memory for the node where the missing data resides (i.e. the second core cluster of the invention) and sends the read-miss signal (i.e. the data request of the invention) to that node. After the node controller of the node holding the data (i.e. the second node controller) receives the read-miss signal, it transmits the data into the NoC network, and the data is finally delivered through the routers to the node that initiated the data request (i.e. the first core of the invention), completing the read operation.
As shown in Fig. 3, preferably, in the shared bus structure of one core cluster, the present embodiment includes N cores CoreN, each of which owns its own private cache; peripherals (peripheral) are retained in each node and are connected to the shared bus Sharebus through the peripheral interface Interface. Node controller denotes the node controller, which connects to the directory controller: one end of the node controller is connected to the Sharebus (i.e. the shared bus), and the other end to a router. Local memory is the node memory, which stores the data cached by all cores inside the node.
As shown in Fig. 4, preferably, in the hybrid cache structure of the embodiment of the invention, the first layer is a shared bus structure using a bus snooping protocol, and the second layer is an on-chip network structure NoC using a directory protocol, in which the main memory (memory) is located at the center of the NoC. The core cluster nodes are connected to the memory node by the routers of the NoC. Directory controller denotes the directory controller and directory memory denotes the directory memory; the directory controller is the core of the second-layer directory protocol and maintains global coherence.
As shown in Fig. 5, preferably, the snooping protocol used by the first-layer bus structure of the embodiment of the invention is the MOESI protocol, where M, O, E, S and I denote the five states of a core's private cache: M (modified) is the modified state; O (owned) is the owned state; E (exclusive) is the exclusive state; S (shared) is the shared state; and I (invalid) is the invalid state.
The state transition operations in Fig. 5 are described as follows:
probe read: the current core obtains a data copy from another core in order to read the data;
probe write: the current core obtains a data copy from another core in order to write the data;
read hit, write hit: the current core obtains the data copy from its local cache;
read miss, write miss: the current core does not find the data copy in its local cache;
probe read hit, probe write hit: the current core obtains the data copy from the cache of another core.
When a core performs a write operation, it writes data into its private cache. If the local cache hits, the state may be E, S, M or O; specifically:
1. when the state is E or M, the data is written directly into the private cache, and the state of the cached data is changed to M;
2. when the state is S, the data is written directly into the private cache, and the state of the cached data is changed to M; at the same time, a query request is issued to the directory to find the other cores holding copies of the data, and the state of those copies is moved from S or O to I;
3. when the state is O, the data is written directly into the private cache, and the state of the cached data is changed to M; at the same time, a query request is issued to the directory to find the other cores holding copies of the data, and the state of those copies is moved from S to I.
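The three write-hit cases above can be sketched as a small function (hypothetical names; a sketch of the rules as stated, not the patent's hardware): the writer's line always ends in M, and other copies are invalidated when the line was S or O.

```python
# Write-hit transitions of the MOESI rules described above. Function name
# and return convention are illustrative assumptions.

def write_hit(state):
    """Return (new_state_of_writer, must_invalidate_other_copies)."""
    if state in ("M", "E"):
        return "M", False          # exclusive copy: write silently
    if state in ("S", "O"):
        return "M", True           # shared copy: other copies move to I
    raise ValueError("state I is a miss, handled by the request path")
```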
Preferably, the node controller (Node controller) is the interface between the two layers of protocols. Its functions include:
1. snooping the coherence signals on the bus and forwarding them into the NoC network;
2. receiving the coherence signals from the directory controller, broadcasting them on the bus, and maintaining the data in the node memory.
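These two functions can be sketched as a relay object (hypothetical interfaces; the real node controller is hardware sitting between the cluster bus and the NoC router):

```python
# Sketch of the node controller's bridging role: messages snooped on the
# cluster bus go to the NoC, and messages from the directory controller are
# broadcast on the bus. Class and field names are illustrative assumptions.

class NodeController:
    def __init__(self):
        self.to_noc = []    # messages forwarded onto the NoC router
        self.to_bus = []    # messages broadcast onto the cluster bus

    def on_bus_message(self, msg):
        # A coherence signal snooped on the local bus is relayed to the NoC.
        self.to_noc.append(msg)

    def on_noc_message(self, msg):
        # A signal from the directory controller is broadcast on the bus.
        self.to_bus.append(msg)

nc = NodeController()
nc.on_bus_message({"type": "read-miss", "addr": 0x40})
nc.on_noc_message({"type": "invalidate", "addr": 0x40})
# nc.to_noc and nc.to_bus each now hold one relayed message
```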
The function of the node memory (Node memory) is to store the cached data of every core inside the node; when the node controller receives a data request from the directory controller, the data is transmitted to the node that initiated the request.
The function of the directory controller (Directory controller) is to receive the coherence signals of the node controllers and to maintain the directory.
The directory memory (directory memory) stores the directory information of the cache coherence protocol.
The first layer of the invention uses the MOESI (M: modified, O: owned, E: exclusive, S: shared, I: invalid) snooping protocol. The specific state transitions are as follows:
Read hit in state M: state M indicates that this data block is the only correct copy in the system and the local cache holds it exclusively; the core reads the data directly from the local cache, and the state does not change.
Read hit in state O: state O indicates that this data block is the latest data in the system and other cores certainly hold copies of the block; the core reads the data directly from the local cache, and the state does not change.
Read hit in state E: state E indicates that the cache and main memory both hold the latest data; the core reads directly from the local cache, and the state does not change.
Read hit in state S: state S indicates that this data block is the latest data in the system; the core reads directly from the local cache, and the state does not change.
Read hit in state I: state I indicates that the data block is invalid. The local cache first broadcasts a read-hit-but-invalid signal on the bus; when the other cores of the system snoop this signal, the one found by comparison to hold the latest data sends it to the cache that initiated the request, and the state of the data is changed to S.
Write hit in the M state: the core writes the data directly into the cache, and the state is unchanged.
Write hit in the E state: the core writes the data directly into the cache, and the state of the data block is changed to M.
Write hit in the S state: the core writes the data directly into the cache and the state is changed to M, while the copies of the data held by other cores transition from S or O to I.
Write hit in the O state: the core writes the data directly into the cache and the state is changed to M, while the copies of the data held by other cores transition from S to I.
Write hit in the I state: the latest data must first be read from another cache. The local cache broadcasts a write-hit-but-invalid signal on the bus; when another core in the system snoops this signal and holds a valid copy of the block, it transmits the data to the requesting cache and changes its own copy to I. After the cache that requested the data receives it, its state is changed to M.
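The read- and write-hit transitions above can be condensed into a small state machine. The sketch below is illustrative only; the function names and the boolean "invalidate other copies" return value are our own shorthand, not part of the patent:

```python
# Sketch of the MOESI read/write-hit transitions described above.
# States: M (Modified), O (Owned), E (Exclusive), S (Shared), I (Invalid).

def read_hit(local_state: str) -> str:
    """New state of the local copy after a read hit."""
    if local_state in ("M", "O", "E", "S"):
        return local_state  # data read directly from the local cache, state unchanged
    # I: broadcast a read-hit-but-invalid signal; the core holding the
    # latest copy supplies the data and the local copy becomes Shared.
    return "S"

def write_hit(local_state: str):
    """(new local state, whether other cores' copies move to I) after a write hit."""
    if local_state == "M":
        return "M", False   # already the only correct copy in the system
    if local_state == "E":
        return "M", False   # exclusive and clean: upgrade to Modified
    if local_state in ("S", "O"):
        return "M", True    # copies held by other cores transition to I
    # I: broadcast write-hit-but-invalid; a core with a valid copy supplies
    # the data and invalidates itself; the requester's copy becomes M.
    return "M", True
```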
In summary, when a core in a node reads or writes data in its cache, the cache controller issues one of five types of request signal: an invalidation request signal, a read-miss request signal, a read-hit-but-data-invalid request signal, a write-miss request signal, and a write-hit-but-data-invalid request signal. The directory updates under the different request signals are shown in Table 1:
Table 1
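As the body of Table 1 is not reproduced here, the sketch below only classifies which of the five request signals the cache controller issues for a given access, derived from the transitions listed above; the signal names and function are our own shorthand, not from the patent:

```python
def request_signal(hit: bool, state: str, is_write: bool):
    """Which of the five bus request signals the cache controller issues;
    None when the access completes locally with no bus traffic."""
    if not hit:
        return "write-miss" if is_write else "read-miss"
    if state == "I":
        # hit in the tag array but the block is invalid
        return "write-hit-invalid" if is_write else "read-hit-invalid"
    if is_write and state in ("S", "O"):
        return "invalidate"  # copies in other cores must transition to I
    return None              # M/E write, or a read hit on any valid state
```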
In conclusion being provided the present invention is based on hybrid cache method, system and the storage medium of layering on-chip interconnection network A kind of realization multi-core processor cache coherence method, the kernel for being able to solve the support of bus monitoring agreement is few, to bus bar The problems such as width requires high and directory protocol catalogue storage overhead big;So that bus monitoring agreement and the high speed based on catalogue are delayed Depositing consistency protocol can be effectively combined, and both realize that simple, real-time is good, and preferably adapt to more massive more kernels Processor.
The above is a description of preferred embodiments of the present invention, but the present invention is not limited to the embodiments above; those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are all included within the scope defined by the claims of the present application.

Claims (10)

1. A hybrid cache method based on a hierarchical on-chip interconnection network, characterized by comprising the following steps:
querying, according to a data request sent by a first core, cached data in a first cache unit corresponding to the first core;
after confirming that the cached data is not found in the first cache unit, sending the data request to a shared bus of a first core cluster and querying the cached data in the other cores of the first core cluster; wherein a core cluster is composed of a plurality of cores in the processor;
after confirming that the cached data is not found in the first core cluster, sending the data request to a directory controller through a first node controller corresponding to the first core cluster; wherein the node controllers are configured at the core clusters, and the directory controller is configured at the main memory of the processor;
confirming a second core cluster through the directory controller, the second core cluster storing the cached data corresponding to the data request;
sending the queried cached data to the directory controller through a second node controller corresponding to the second core cluster;
feeding back the queried cached data to the first node controller through the directory controller;
feeding back the queried cached data to the first core through the first node controller.
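The steps of claim 1 describe a two-level lookup: first the core's own cache, then the cluster's shared bus, then the directory at main memory. A minimal sketch of that flow follows; all data-structure and field names are hypothetical, chosen only to illustrate the escalation order:

```python
# Sketch of the two-level lookup of claim 1. Each core is modeled as a
# dict with a "cache" mapping, each cluster as a dict with a "cores" list,
# and the directory as a mapping from address to the owning cluster.

def fetch(addr, core, cluster, directory):
    # Step 1: query the first core's own cache unit.
    data = core["cache"].get(addr)
    if data is not None:
        return data
    # Step 2: miss -> put the request on the first cluster's shared bus;
    # any other core in the cluster may supply the block (snooping level).
    for other in cluster["cores"]:
        data = other["cache"].get(addr)
        if data is not None:
            return data
    # Step 3: cluster-wide miss -> the first node controller asks the
    # directory controller, which identifies the second cluster holding
    # the block; its node controller returns the data, which is fed back
    # through the directory controller to the first node controller.
    owner_cluster = directory["owner"].get(addr)
    if owner_cluster is not None:
        for other in owner_cluster["cores"]:
            data = other["cache"].get(addr)
            if data is not None:
                return data
    return None  # not cached anywhere; would fall through to main memory
```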
2. The hybrid cache method based on a hierarchical on-chip interconnection network according to claim 1, characterized by further comprising network configuration steps, the network configuration steps comprising:
grouping the plurality of cores of the processor, and then connecting the cores in each group into a core cluster based on a shared-bus technique; wherein the cores within each core cluster communicate data with one another through the shared bus;
forming an on-chip interconnection network from the plurality of core clusters obtained by the grouping;
configuring a node controller for each core cluster;
configuring a directory controller for the main memory of the processor.
3. The hybrid cache method based on a hierarchical on-chip interconnection network according to claim 2, characterized in that the network configuration steps further comprise:
configuring a node memory for each core cluster;
storing, by the node memory, the cached data of each core in the core cluster.
4. The hybrid cache method based on a hierarchical on-chip interconnection network according to claim 2, characterized in that the network configuration steps further comprise:
configuring a directory memory for the main memory of the processor;
storing the directory information of each core cluster by the directory memory.
5. The hybrid cache method based on a hierarchical on-chip interconnection network according to claim 2, characterized in that the core clusters in the on-chip interconnection network communicate data with one another through routers.
6. The hybrid cache method based on a hierarchical on-chip interconnection network according to claim 1, characterized in that the step of querying, according to the data request sent by the first core, cached data in the first cache unit corresponding to the first core comprises the following steps:
the first core issuing the data request to a first cache controller;
the first cache controller querying the cached data according to the data request.
7. The hybrid cache method based on a hierarchical on-chip interconnection network according to claim 4, characterized in that the step of confirming the second core cluster through the directory controller comprises the following steps:
the directory controller querying, according to the data request, the second core cluster where the cached data resides;
updating the directory information of the directory memory according to the query result;
sending the data request to the second core cluster through the directory controller.
8. A hybrid cache system based on a hierarchical on-chip interconnection network, characterized by comprising:
a data request module, configured to query, according to a data request sent by a first core, cached data in a first cache unit corresponding to the first core;
a first sending module, configured to, after confirming that the cached data is not found in the first cache unit, send the data request to a shared bus of a first core cluster and query the cached data in the other cores of the first core cluster; wherein a core cluster is composed of a plurality of cores in the processor;
a second sending module, configured to, after confirming that the cached data is not found in the first core cluster, send the data request to a directory controller through a first node controller corresponding to the first core cluster; wherein the node controllers are configured at the core clusters, and the directory controller is configured at the main memory of the processor;
a confirmation module, configured to confirm a second core cluster through the directory controller, the second core cluster storing the cached data corresponding to the data request;
a third sending module, configured to send the queried cached data to the directory controller through a second node controller corresponding to the second core cluster;
a first feedback module, configured to feed back the queried cached data to the first node controller through the directory controller;
a second feedback module, configured to feed back the queried cached data to the first core through the first node controller.
9. A hybrid cache system based on a hierarchical on-chip interconnection network, characterized by comprising:
at least one processor;
at least one memory, configured to store at least one program;
wherein, when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the hybrid cache method based on a hierarchical on-chip interconnection network according to any one of claims 1-7.
10. A storage medium storing processor-executable instructions, characterized in that the processor-executable instructions, when executed by a processor, are used to perform the hybrid cache method based on a hierarchical on-chip interconnection network according to any one of claims 1-7.
CN201910198728.2A 2019-03-15 2019-03-15 Hybrid cache method, system and storage medium based on layering on-chip interconnection network Pending CN110049104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910198728.2A CN110049104A (en) 2019-03-15 2019-03-15 Hybrid cache method, system and storage medium based on layering on-chip interconnection network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910198728.2A CN110049104A (en) 2019-03-15 2019-03-15 Hybrid cache method, system and storage medium based on layering on-chip interconnection network

Publications (1)

Publication Number Publication Date
CN110049104A true CN110049104A (en) 2019-07-23

Family

ID=67273805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910198728.2A Pending CN110049104A (en) 2019-03-15 2019-03-15 Hybrid cache method, system and storage medium based on layering on-chip interconnection network

Country Status (1)

Country Link
CN (1) CN110049104A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858096A (en) * 2020-07-22 2020-10-30 中国人民解放军国防科技大学 Method and system for monitoring reading of nearest cache based on directory
CN113792006A (en) * 2020-08-14 2021-12-14 阿里巴巴集团控股有限公司 Inter-device processing system with cache coherency

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1506845A (en) * 2002-12-10 2004-06-23 英特尔公司 Isomeric proxy cache memory consistency and method and apparatus for limiting transmission of data
CN1545034A (en) * 2003-11-26 2004-11-10 中国人民解放军国防科学技术大学 Double ring method for monitoring partial cache consistency of on-chip multiprocessors
CN101042680A (en) * 2006-03-23 2007-09-26 国际商业机器公司 Data processing system, cache system and method for updating an invalid coherency state
CN101978659A (en) * 2008-04-02 2011-02-16 英特尔公司 Express virtual channels in a packet switched on-chip interconnection network
CN102929832A (en) * 2012-09-24 2013-02-13 杭州中天微系统有限公司 Cache-coherence multi-core processor data transmission system based on no-write allocation
US20130254488A1 (en) * 2012-03-20 2013-09-26 Stefanos Kaxiras System and method for simplifying cache coherence using multiple write policies
CN103440223A (en) * 2013-08-29 2013-12-11 西安电子科技大学 Layering system for achieving caching consistency protocol and method thereof

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1506845A (en) * 2002-12-10 2004-06-23 英特尔公司 Isomeric proxy cache memory consistency and method and apparatus for limiting transmission of data
CN1545034A (en) * 2003-11-26 2004-11-10 中国人民解放军国防科学技术大学 Double ring method for monitoring partial cache consistency of on-chip multiprocessors
CN101042680A (en) * 2006-03-23 2007-09-26 国际商业机器公司 Data processing system, cache system and method for updating an invalid coherency state
CN101978659A (en) * 2008-04-02 2011-02-16 英特尔公司 Express virtual channels in a packet switched on-chip interconnection network
US20130254488A1 (en) * 2012-03-20 2013-09-26 Stefanos Kaxiras System and method for simplifying cache coherence using multiple write policies
CN102929832A (en) * 2012-09-24 2013-02-13 杭州中天微系统有限公司 Cache-coherence multi-core processor data transmission system based on no-write allocation
CN103440223A (en) * 2013-08-29 2013-12-11 西安电子科技大学 Layering system for achieving caching consistency protocol and method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
汤伟 (Tang Wei) et al.: "基于总线监听的Cache一致性协议分析" ["Analysis of cache coherence protocols based on bus snooping"], 《福建电脑》 [Fujian Computer] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858096A (en) * 2020-07-22 2020-10-30 中国人民解放军国防科技大学 Method and system for monitoring reading of nearest cache based on directory
CN111858096B (en) * 2020-07-22 2022-09-23 中国人民解放军国防科技大学 Directory-based method and system for monitoring reading of cache at shortest distance
CN113792006A (en) * 2020-08-14 2021-12-14 阿里巴巴集团控股有限公司 Inter-device processing system with cache coherency

Similar Documents

Publication Publication Date Title
CN103440223B (en) A kind of hierarchical system and its method for realizing cache coherent protocol
US6631448B2 (en) Cache coherence unit for interconnecting multiprocessor nodes having pipelined snoopy protocol
US7814279B2 (en) Low-cost cache coherency for accelerators
CN104508637B (en) Method for reciprocity Buffer forwarding
JP5833282B2 (en) Multi-tier cache coherency domain system and method of configuring Share-F state in local domain of multi-tier cache coherency domain system
US7017011B2 (en) Coherence controller for a multiprocessor system, module, and multiprocessor system with a multimodule architecture incorporating such a controller
JP3644587B2 (en) Non-uniform memory access (NUMA) data processing system with shared intervention support
EP1316019B1 (en) Managing replacement of data in a cache on a node based on caches of other nodes
US20040236912A1 (en) Methods and apparatus for providing cache state information
US20070055826A1 (en) Reducing probe traffic in multiprocessor systems
JP3686221B2 (en) Computer system for maintaining cache coherency
US20030212741A1 (en) Methods and apparatus for responding to a request cluster
JPH11134312A (en) Decentralized shared memory multiprocessor system
CN103119568A (en) Extending a cache coherency snoop broadcast protocol with directory information
KR20000052493A (en) Non-uniform memory access (numa) data processing system having shared intervention support
AU2001289180A1 (en) Managing replacement of data in a cache on a node based on caches of other nodes
WO2014094374A1 (en) Method for constructing multiprocessor system with node having a plurality of cache uniformity domains
US6950913B2 (en) Methods and apparatus for multiple cluster locking
US20040088496A1 (en) Cache coherence directory eviction mechanisms in multiprocessor systems
US7308538B2 (en) Scope-based cache coherence
US6925536B2 (en) Cache coherence directory eviction mechanisms for unmodified copies of memory lines in multiprocessor systems
WO2014146424A1 (en) Method for server node data caching based on limited data coherence state
CN110049104A (en) Hybrid cache method, system and storage medium based on layering on-chip interconnection network
US7653790B2 (en) Methods and apparatus for responding to a request cluster
US20050033924A1 (en) Methods and apparatus for providing early responses from a remote data cache

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20190723