CN113392604B - Dynamic capacity expansion method and system for a cache under a multi-CPU co-packaging architecture based on advanced packaging technology - Google Patents

Dynamic capacity expansion method and system for a cache under a multi-CPU co-packaging architecture based on advanced packaging technology

Info

Publication number
CN113392604B
CN113392604B
Authority
CN
China
Prior art keywords
cache
data
slave
master
cpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110622895.2A
Other languages
Chinese (zh)
Other versions
CN113392604A (en)
Inventor
李晓霖
郝沁汾
叶笑春
范东睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202110622895.2A
Publication of CN113392604A
Application granted
Publication of CN113392604B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/30 - Circuit design
    • G06F30/32 - Circuit design at the digital level
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 - Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/084 - Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 - Cache access modes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2115/00 - Details relating to the type of the circuit
    • G06F2115/12 - Printed circuit boards [PCB] or multi-chip modules [MCM]
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a dynamic capacity expansion method and system for a cache under a multi-CPU co-packaging architecture based on advanced packaging technology, aiming to solve the problems of increased CPU tape-out cost and packaging difficulty caused by enlarging the cache. In this structure, by designing an interaction mechanism between the caches of different CPUs and relying on the packaging technology, the cache in a CPU chip can access the cache in another CPU chip of the same type, thereby dynamically expanding the cache capacity available to the CPU chip and realizing cache sharing among multiple CPUs.

Description

Dynamic capacity expansion method and system for a cache under a multi-CPU co-packaging architecture based on advanced packaging technology
Technical Field
The present invention relates to cache structure design in the field of CPU architecture design, and in particular to a method and system for dynamically expanding a cache under a multi-CPU co-packaging architecture based on advanced packaging technology.
Background
In the era of big data and cloud computing, owing to the sheer number of users and the diversity of data sources, more and more CPU workloads exhibit characteristics such as a low compute-to-memory-access ratio and weak data locality. For example, graph computing is a typical data-center application used to process rapidly growing graph data. Because graph workloads are irregular and unstructured, their execution behavior becomes highly irregular. Irregular fine-grained memory accesses lead to an extremely low cache hit rate and poor utilization of cache blocks, so the existing general-purpose CPU architecture requires larger memory-access bandwidth.
Since CPU memory bandwidth grows slowly, a CPU needs to integrate a larger cache to bridge the gap between ever-increasing CPU performance and memory performance. However, the SRAM that makes up the cache occupies a large area, and chip cost is almost proportional to chip area, so integrating a larger cache tends to raise chip cost. At the same time, larger chips pose significant challenges for single-chip packaging. This puts great pressure on CPU design and packaging.
Disclosure of Invention
The invention builds on advanced packaging technology, which makes interaction between multiple chips located in the same package considerably faster.
The invention aims to solve the problems of increased CPU tape-out cost and packaging difficulty caused by enlarging the cache, and provides a novel CPU cache structure whose capacity can be expanded dynamically. In this structure, by designing an interaction mechanism between the caches of different CPUs and relying on the packaging technology, the cache in one CPU chip can access the cache in another CPU chip of the same type, thereby dynamically expanding the cache capacity available to the CPU chip and realizing cache sharing among multiple CPUs.
The invention has the following key points:
1. By accessing the caches in other CPU chips, the cache capacity available to a CPU is expanded while the area of each individual chip stays minimal, which lowers tape-out cost and eases packaging.
2. The accessed CPU chips are of the same type, so only one kind of CPU chip needs to be designed, reducing both CPU design difficulty and tape-out cost.
3. Even if a tape-out yields some defective CPU chips due to yield problems, a defective chip's cache can still be fully utilized as long as its cache circuitry is intact, reducing the loss caused by failed CPU chips.
4. The invention designs an interaction mechanism between caches. Through this mechanism, the cache of one CPU chip can operate on the cache of another CPU chip of the same type.
5. The cache structure designed by the invention can expand its capacity dynamically: a CPU chip can either work independently, unaffected by other CPU chips, or be combined with them into a CPU with a large cache for running programs with heavy memory-access demands, such as graph computing applications. The structure is therefore flexible.
6. With advanced packaging techniques, multiple CPU chips are integrated into a single package while maintaining performance close to monolithic integration. Therefore, in the cache structure designed by the invention, a single package integrating N CPU chips achieves performance close to that of a single CPU chip with N times the cache capacity.
Specifically, to address the defects of the prior art, the invention provides a dynamic capacity expansion method for a cache under a multi-CPU co-packaging architecture based on advanced packaging technology, comprising:
Step 1: set a CPU that meets a preset condition as the master, and, according to the master's memory-access bandwidth requirement and the cache sizes of the remaining CPUs other than the master, select from the remaining CPUs those whose caches can satisfy that bandwidth requirement and set them as slaves of the master;
Step 2: when the master reads cache data, query whether the read request hits in the master's local cache; if so, read the data from the local cache and return it; otherwise send a request to the slaves and query whether the read request hits in a slave; if so, read the hit slave's cache and return the data; otherwise read the data from memory, return it, and write the read data into the local cache or a slave cache;
Step 3: when the master writes cache data, query whether the write request hits in the master's local cache; if so, read the data from the local cache, merge it with the write data, and write the result back to the local cache; otherwise send a request to the slaves and query whether the write request hits in a slave; if so, write the data into the hit slave's cache; otherwise write the data into the master's or a slave's cache after replacing a block in the local cache or a slave cache.
In the above method for dynamically expanding the cache under the multi-CPU co-packaging architecture based on advanced packaging technology, step 2 comprises:
Step 21: the slave queries whether the read request hits in its local cache; if so, it reads the data from the local cache and returns it to the master; otherwise it is determined whether the slave's local data is to be replaced; if so, the replaced block is sent to the master, and the data sent by the master is received and written into the slave's local cache; otherwise the cache of the master or another slave is chosen for replacement and the flow ends.
In the above method, step 3 comprises:
Step 31: the slave queries whether the write request hits in its local cache; if so, it receives the data sent by the master, merges it with the data in the local cache, and writes the result back to the slave's local cache; otherwise it is determined whether the slave's local data is to be replaced; if so, the replaced block is sent to the master, and the data sent by the master is received and written into the slave's local cache; otherwise the flow ends.
In the above method, the preset condition comprises: the memory-access bandwidth requirement of the CPU is greater than a threshold, or the compute-to-memory-access ratio of the CPU is below a threshold.
In the above method, the master and the slaves are located in the same packaged chip.
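As an illustration of step 1 and the preset condition, the following C sketch shows one way master and slave roles could be assigned. It is a minimal sketch under stated assumptions: the names (needs_expansion, assign_roles), the threshold parameters, and the greedy selection of slaves are illustrative choices, not details fixed by the invention.

    #include <stddef.h>
    #include <stdbool.h>

    enum role { ROLE_GENERAL, ROLE_MASTER, ROLE_SLAVE };

    struct cpu {
        double bw_demand;         /* memory-access bandwidth requirement   */
        double compute_mem_ratio; /* compute ops / memory ops when running */
        size_t cache_bytes;       /* capacity of this chip's local cache   */
        enum role role;           /* starts as ROLE_GENERAL                */
    };

    /* Preset condition from the text: bandwidth demand above a threshold,
     * or compute-to-memory-access ratio below a threshold. */
    bool needs_expansion(const struct cpu *c, double bw_thr, double ratio_thr)
    {
        return c->bw_demand > bw_thr || c->compute_mem_ratio < ratio_thr;
    }

    /* Step 1: pick the first qualifying CPU as master, then enlist other
     * CPUs as slaves until their combined cache covers the assumed need. */
    void assign_roles(struct cpu cpus[], size_t n, double bw_thr,
                      double ratio_thr, size_t needed_cache_bytes)
    {
        size_t master = n; /* n means "no master chosen yet" */
        for (size_t i = 0; i < n; i++) {
            if (needs_expansion(&cpus[i], bw_thr, ratio_thr)) {
                cpus[i].role = ROLE_MASTER;
                master = i;
                break;
            }
        }
        if (master == n) return; /* all CPUs stay in general mode */

        size_t gained = 0;
        for (size_t i = 0; i < n && gained < needed_cache_bytes; i++) {
            if (i == master) continue;
            cpus[i].role = ROLE_SLAVE;  /* its cache joins the pool */
            gained += cpus[i].cache_bytes;
        }
    }

For the four-chip package of FIG. 2, such a routine would mark CPU chip 1 as the master and enlist CPU chips 2-4 as slaves once chip 1's demand crosses the threshold.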
The invention also provides a dynamic capacity expansion system for a cache under a multi-CPU co-packaging architecture based on advanced packaging technology, comprising:
module 1, configured to set a CPU that meets a preset condition as the master and, according to the master's memory-access bandwidth requirement and the cache sizes of the remaining CPUs other than the master, select from the remaining CPUs those whose caches can satisfy that bandwidth requirement as slaves of the master;
module 2, configured to, when the master reads cache data, query whether the read request hits in the master's local cache; if so, read the data from the local cache and return it; otherwise send a request to the slaves and query whether the read request hits in a slave; if so, read the hit slave's cache and return the data; otherwise read the data from memory, return it, and write the read data into the local cache or a slave cache; and
module 3, configured to, when the master writes cache data, query whether the write request hits in the master's local cache; if so, read the data from the local cache, merge it with the write data, and write the result back to the local cache; otherwise send a request to the slaves and query whether the write request hits in a slave; if so, write the data into the hit slave's cache; otherwise write the data into the master's or a slave's cache after replacing a block in the local cache or a slave cache.
In the above system, module 2 comprises:
module 21, configured such that the slave queries whether the read request hits in its local cache; if so, it reads the data from the local cache and returns it to the master; otherwise it is determined whether the slave's local data is to be replaced; if so, the replaced block is sent to the master, and the data sent by the master is received and written into the slave's local cache; otherwise the cache of the master or another slave is chosen for replacement and the flow ends.
In the above system, module 3 comprises:
module 31, configured such that the slave queries whether the write request hits in its local cache; if so, it receives the data sent by the master, merges it with the data in the local cache, and writes the result back to the slave's local cache; otherwise it is determined whether the slave's local data is to be replaced; if so, the replaced block is sent to the master, and the data sent by the master is received and written into the slave's local cache; otherwise the flow ends.
In the above system, the preset condition comprises: the memory-access bandwidth requirement of the CPU is greater than a threshold, or the compute-to-memory-access ratio of the CPU is below a threshold.
In the above system, the master and the slaves are located in the same packaged chip.
The advantages of the invention are as follows: in the structure of the invention, by accessing the caches of other CPU chips and relying on advanced packaging technology, the cache capacity of a CPU chip can be expanded dynamically by a multiple at the cost of only a small area increase. On the one hand, overall chip development cost drops, since the added cost of advanced packaging is lower than the added cost of a larger chip area. On the other hand, within a single package integrating multiple CPU chips, each CPU chip can work independently, unaffected by the others, or the chips can be combined into a CPU with a large cache for running programs with heavy memory-access demands, such as graph computing applications. While gaining this flexibility, only one CPU needs to be designed, so CPU design difficulty and tape-out cost are both reduced.
Drawings
FIG. 1 is a structural diagram of a single packaged chip integrating 4 CPU chips, all 4 in general mode, suited to ordinary computation;
FIG. 2 is a structural diagram of a single packaged chip integrating 4 CPU chips, in which CPU chip 1 is in master mode and the remaining CPU chips are in slave mode, suited to large memory-access bandwidth requirements;
FIG. 3 is a flow chart of reading cache data in general mode;
FIG. 4 is a flow chart of writing cache data in general mode;
FIG. 5 is a flow chart of reading cache data in master mode;
FIG. 6 is a flow chart of writing cache data in master mode;
FIG. 7 is a flow chart of reading cache data in slave mode;
FIG. 8 is a flow chart of writing cache data in slave mode.
Detailed Description
In order to make the above features and effects of the present invention clearer, specific embodiments are described below with reference to the accompanying drawings.
In the cache structure design of the invention, the access mode of a CPU chip is one of the following three modes:
1. General mode. In this mode, the memory-access behavior of the CPU chip is identical to that of a CPU chip that does not use the cache structure of the invention: it neither accesses the caches of other CPU chips nor has its own cache accessed by other CPU chips.
2. Master mode. In this mode, the CPU chip may access the caches of other CPU chips, but its own on-chip cache is not accessed by other CPU chips.
3. Slave mode. In this mode, the CPU chip's own on-chip cache may be accessed by other CPU chips, but the chip does not access the caches of other CPU chips. Moreover, in this mode only the cache of the CPU chip keeps running; the remaining parts of the chip stop working.
The access mode of a CPU chip can be configured statically, for example through top-level pins, or dynamically, through registers inside the chip; a sketch of both paths is given below. Static configuration, as adopted by the invention, means the access mode is set through the CPU's top-level pins before the application program runs and does not change while the program runs. Dynamic configuration, as adopted by the invention, means that during execution the CPU chips configure one another according to the running application. For example, in FIG. 2, when running an application with a large memory-access bandwidth requirement, CPU 1 may, through the pins interconnecting the CPUs, configure the registers in CPU 2/3/4 to slave mode and configure itself as the master. If the running application's bandwidth requirement is small, the chips switch back to general mode and the other CPUs are not needed as slaves.
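The following C sketch illustrates the two configuration paths just described. It is only a sketch under stated assumptions: the mode encoding, the register offset MODE_REG_OFFSET, and remote_reg_write are illustrative names, not interfaces defined by the invention.

    #include <stdint.h>

    enum access_mode { MODE_GENERAL = 0, MODE_MASTER = 1, MODE_SLAVE = 2 };

    #define MODE_REG_OFFSET 0x10u  /* assumed offset of the mode register */

    /* Static path: the mode is latched from top-level pins before the
     * application runs and never changes while it runs. */
    enum access_mode mode_from_pins(uint32_t pin_state)
    {
        return (enum access_mode)(pin_state & 0x3u);
    }

    /* Dynamic path: as in FIG. 2, CPU 1 writes the mode register of
     * CPU 2/3/4 to slave and its own to master. remote_reg_write stands
     * in for whatever register-access transaction the inter-CPU pins
     * provide; it is an assumed primitive. */
    extern void remote_reg_write(int cpu_id, uint32_t offset, uint32_t value);

    void enter_big_cache_mode(int master_id, const int *slave_ids, int n_slaves)
    {
        for (int i = 0; i < n_slaves; i++)
            remote_reg_write(slave_ids[i], MODE_REG_OFFSET, MODE_SLAVE);
        remote_reg_write(master_id, MODE_REG_OFFSET, MODE_MASTER);
    }

Switching back to general mode when bandwidth demand drops would be the symmetric sequence of register writes.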
In the accompanying drawings, FIGS. 1 and 2 show 4 CPU chips based on the cache structure of the invention integrated into a single packaged chip. FIG. 1 shows the 4 CPU chips in general mode, working normally and independently without affecting one another. This mode suits ordinary computation, i.e., a small memory-access bandwidth requirement and a high compute-to-memory-access ratio; this ratio is the proportion of compute operations to memory operations while the CPU runs, and the lower it is, the more memory operations there are and the larger the bandwidth requirement. For computation with a large bandwidth requirement, for example when the CPU's memory-access bandwidth requirement exceeds a threshold or its compute-to-memory-access ratio falls below a threshold, one CPU chip (e.g. CPU chip 1) may be configured in master mode and the other CPU chips in slave mode, as shown in FIG. 2; CPU chip 1 can then quadruple its cache capacity by accessing the caches of the other 3 CPU chips, improving performance. The application is not limited to this: because dynamic configuration is adopted, according to actual demand only one other core may be set to slave mode for the master while the remaining two stay in general mode. In that case CPU chip 1 can still double its own cache capacity.
The CPU chips are interconnected using Intel's Advanced Interface Bus (AIB) protocol. Because the AIB protocol supports high data transfer rates and a compact CPU layout is adopted, the interconnect footprint is minimized. With advanced packaging, a single package integrating N CPU chips in the cache structure of the invention achieves performance close to that of a single CPU chip with N times the cache capacity.
Based on the cache structure design of the invention, the access mode of a CPU chip is one of general mode, master mode, and slave mode. The corresponding flows for reading and writing cache data are shown in FIGS. 3-8 and detailed below.
Reading cache data in general mode (FIG. 3):
1. Query whether the read request hits in the local cache. If it hits, go to step 2; otherwise go to step 3.
2. Local cache hit. Read the data from the local cache and return it. End the flow.
3. Local cache miss. Determine whether the replaced block must be written back to memory; if so, write it back.
4. Read the data from memory, return it, and write it into the local cache. End the flow.
Writing cache data in general mode (FIG. 4):
1. Query whether the write request hits in the local cache. If it hits, go to step 2; otherwise go to step 3.
2. Local cache hit. Read the data from the local cache, merge it with the write data, and write the result back to the local cache. End the flow.
3. Local cache miss. Determine whether the replaced block must be written back to memory; if so, write it back.
4. Read the data from memory, merge it with the write data, and write the result back to the local cache. End the flow.
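The general-mode flows of FIGS. 3 and 4 can be condensed into the following C sketch. The cache-model helpers (cache_lookup, cache_victim, merge, and the memory accessors) are assumed primitives, not interfaces defined by the invention.

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t addr_t;
    typedef struct { uint8_t bytes[64]; } line_t;

    extern bool    cache_lookup(addr_t a, line_t *out);  /* hit? copy line */
    extern line_t *cache_victim(addr_t a, addr_t *vaddr, bool *dirty);
    extern void    cache_fill(addr_t a, const line_t *l);
    extern void    mem_read(addr_t a, line_t *l);
    extern void    mem_write(addr_t a, const line_t *l);
    extern void    merge(line_t *dst, const void *wdata, int off, int len);

    /* FIG. 3: read in general mode. */
    line_t read_general(addr_t a)
    {
        line_t l;
        if (cache_lookup(a, &l))          /* steps 1-2: hit, read, return */
            return l;
        addr_t vaddr; bool dirty;
        line_t *victim = cache_victim(a, &vaddr, &dirty);
        if (dirty)                        /* step 3: write victim back    */
            mem_write(vaddr, victim);
        mem_read(a, &l);                  /* step 4: fetch, fill, return  */
        cache_fill(a, &l);
        return l;
    }

    /* FIG. 4: write in general mode. */
    void write_general(addr_t a, const void *wdata, int off, int len)
    {
        line_t l;
        if (!cache_lookup(a, &l)) {       /* steps 3-4 on a miss          */
            addr_t vaddr; bool dirty;
            line_t *victim = cache_victim(a, &vaddr, &dirty);
            if (dirty)
                mem_write(vaddr, victim);
            mem_read(a, &l);
        }
        merge(&l, wdata, off, len);       /* combine with the write data  */
        cache_fill(a, &l);                /* write the merged line back   */
    }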
Reading cache data in master mode (FIG. 5):
1. The master queries whether the read request hits in its local cache. If it hits, go to step 2; otherwise go to step 3.
2. Local cache hit. Read the data from the local cache and return it. End the flow.
3. Local cache miss. Send a request to the slaves, querying whether the read request hits in a slave, and wait for all slaves to answer.
4. All slaves have answered. If a slave hit, go to step 5; otherwise go to step 6.
5. A slave hit: read the data in that slave's reply and return it. End the flow.
6. No slave hit. Choose to replace a block either in the local cache or in a slave cache. If the local cache is chosen, go to step 7; otherwise go to step 9.
7. Replace in the local cache. Determine whether the replaced block must be written back to memory; if so, write it back.
8. Read the data from memory, return it, and write it into the local cache. End the flow.
9. Replace in a slave cache. Determine whether the slave's replaced block must be written back to memory; if so, read that data from the slave and write it back to memory.
10. Read the data from memory, return it, and write it into the slave cache. End the flow.
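A C sketch of the master-mode read flow above (FIG. 5). Slave queries are shown as sequential calls for brevity; in the described hardware, step 3 broadcasts the request and waits for all slave answers. The helper names are assumptions, and write-back of a dirty victim is folded into the *_evict_fill helpers.

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t addr_t;
    typedef struct { uint8_t bytes[64]; } line_t;

    extern int  n_slaves;
    extern bool local_lookup(addr_t a, line_t *out);
    extern bool slave_query_read(int s, addr_t a, line_t *out); /* hit?  */
    extern bool choose_local_victim(addr_t a); /* replace locally or not */
    extern int  pick_slave(addr_t a);          /* which slave to fill    */
    extern void local_evict_fill(addr_t a, const line_t *l);   /* + WB   */
    extern void slave_evict_fill(int s, addr_t a, const line_t *l);
    extern void mem_read(addr_t a, line_t *l);

    line_t master_read(addr_t a)
    {
        line_t l;
        if (local_lookup(a, &l))            /* steps 1-2: local hit        */
            return l;
        for (int s = 0; s < n_slaves; s++)  /* steps 3-5: query slaves     */
            if (slave_query_read(s, a, &l))
                return l;                   /* a slave hit: use its reply  */
        mem_read(a, &l);                    /* steps 6-10: miss everywhere */
        if (choose_local_victim(a))
            local_evict_fill(a, &l);        /* replace in the local cache  */
        else
            slave_evict_fill(pick_slave(a), a, &l); /* or in a slave cache */
        return l;
    }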
Writing cache data in master mode (FIG. 6):
1. The master queries whether the write request hits in its local cache. If it hits, go to step 2; otherwise go to step 3.
2. Local cache hit. Read the data from the local cache, merge it with the write data, and write the result back to the local cache. End the flow.
3. Local cache miss. Send a request to the slaves, querying whether the write request hits in a slave, and wait for all slaves to answer.
4. All slaves have answered. If a slave hit, go to step 5; otherwise go to step 6.
5. A slave hit: write the data into the hit slave's cache. End the flow.
6. No slave hit. Choose to replace a block either in the local cache or in a slave cache. If the local cache is chosen, go to step 7; otherwise go to step 9.
7. Replace in the local cache. Determine whether the replaced block must be written back to memory; if so, write it back.
8. Read the data from memory, merge it with the write data, and write the result back to the local cache. End the flow.
9. Replace in a slave cache. Determine whether the slave's replaced block must be written back to memory; if so, read that data from the slave and write it back to memory.
10. Read the data from memory, merge the read data with the write data, and write the result into the slave cache. End the flow.
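The matching sketch for the master-mode write flow (FIG. 6), with the same assumed helpers plus a merge step. On a slave hit the master forwards the write data and the slave merges it locally, as FIG. 8 describes.

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t addr_t;
    typedef struct { uint8_t bytes[64]; } line_t;

    extern int  n_slaves;
    extern bool local_lookup(addr_t a, line_t *out);
    extern void local_update(addr_t a, const line_t *l);
    extern bool slave_query_write(int s, addr_t a);      /* hit in s?    */
    extern void slave_write(int s, addr_t a,             /* slave merges */
                            const void *wdata, int off, int len);
    extern bool choose_local_victim(addr_t a);
    extern int  pick_slave(addr_t a);
    extern void local_evict_fill(addr_t a, const line_t *l);   /* + WB   */
    extern void slave_evict_fill(int s, addr_t a, const line_t *l);
    extern void mem_read(addr_t a, line_t *l);
    extern void merge(line_t *dst, const void *wdata, int off, int len);

    void master_write(addr_t a, const void *wdata, int off, int len)
    {
        line_t l;
        if (local_lookup(a, &l)) {          /* steps 1-2: local hit       */
            merge(&l, wdata, off, len);
            local_update(a, &l);
            return;
        }
        for (int s = 0; s < n_slaves; s++) {/* steps 3-5: query slaves    */
            if (slave_query_write(s, a)) {
                slave_write(s, a, wdata, off, len);
                return;
            }
        }
        mem_read(a, &l);                    /* steps 6-10: full miss      */
        merge(&l, wdata, off, len);         /* combine with write data    */
        if (choose_local_victim(a))
            local_evict_fill(a, &l);
        else
            slave_evict_fill(pick_slave(a), a, &l);
    }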
Reading cache data in slave mode (FIG. 7):
1. The slave queries whether the read request hits in its local cache. If it hits, go to step 2; otherwise go to step 3.
2. Local cache hit. Read the data from the local cache and return it to the master. End the flow.
3. Both the master and this slave missed. It is determined whether the block fetched from memory will replace a line in this slave's local cache. If the master's cache (or another slave's) is chosen for the replacement instead, end the flow; otherwise go to step 4.
4. Determine whether the replaced block must be written back to memory; if so, send the replaced block to the master.
5. Receive the data sent by the master and write it into the local cache. End the flow.
Writing cache data in slave mode (FIG. 8):
1. The slave queries whether the write request hits in its local cache. If it hits, go to step 2; otherwise go to step 3.
2. Local cache hit. Receive the data sent by the master, merge it with the data in the local cache, and write the result back to the local cache. End the flow.
3. Both the master and this slave missed. It is determined whether a line in this slave's local cache is to be replaced. If the master's cache is chosen for the replacement instead, end the flow; otherwise go to step 4.
4. Determine whether the replaced block must be written back to memory; if so, send the replaced block to the master.
5. Receive the data sent by the master and write it into the local cache. End the flow.
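Finally, a sketch of the slave-side handlers for FIGS. 7 and 8. Here replace_here stands for the master's replacement decision in step 3, fill is the line the master fetched from memory, and send_to_master carries both hit replies and write-back blocks; all of these names are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t addr_t;
    typedef struct { uint8_t bytes[64]; } line_t;

    extern bool local_lookup(addr_t a, line_t *out);
    extern void local_fill(addr_t a, const line_t *l);
    extern bool victim_dirty(addr_t a, line_t *victim, addr_t *vaddr);
    extern void send_to_master(addr_t a, const line_t *l);
    extern void merge(line_t *dst, const void *wdata, int off, int len);

    /* FIG. 7: read request from the master; returns true on a hit. */
    bool slave_handle_read(addr_t a, bool replace_here, const line_t *fill)
    {
        line_t l;
        if (local_lookup(a, &l)) {       /* steps 1-2: hit, reply to master */
            send_to_master(a, &l);
            return true;
        }
        if (!replace_here)               /* step 3: line placed elsewhere   */
            return false;
        line_t victim; addr_t vaddr;     /* steps 4-5: evict, then install  */
        if (victim_dirty(a, &victim, &vaddr))
            send_to_master(vaddr, &victim); /* master writes it to memory   */
        local_fill(a, fill);
        return false;
    }

    /* FIG. 8: write request from the master. */
    void slave_handle_write(addr_t a, bool replace_here,
                            const void *wdata, int off, int len,
                            const line_t *fill)
    {
        line_t l;
        if (local_lookup(a, &l)) {       /* steps 1-2: merge and write back */
            merge(&l, wdata, off, len);
            local_fill(a, &l);
            return;
        }
        if (!replace_here)               /* step 3 */
            return;
        line_t victim; addr_t vaddr;     /* steps 4-5 */
        if (victim_dirty(a, &victim, &vaddr))
            send_to_master(vaddr, &victim);
        local_fill(a, fill);
    }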
The following is a system embodiment corresponding to the above method embodiment, and the two embodiments can be implemented in cooperation with each other. The related technical details mentioned in the above embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the related technical details mentioned in this embodiment also apply to the above embodiment.
The invention also provides a dynamic capacity expansion system for a cache under a multi-CPU co-packaging architecture based on advanced packaging technology, comprising:
module 1, configured to set a CPU that meets a preset condition as the master and, according to the master's memory-access bandwidth requirement and the cache sizes of the remaining CPUs other than the master, select from the remaining CPUs those whose caches can satisfy that bandwidth requirement as slaves of the master;
module 2, configured to, when the master reads cache data, query whether the read request hits in the master's local cache; if so, read the data from the local cache and return it; otherwise send a request to the slaves and query whether the read request hits in a slave; if so, read the hit slave's cache and return the data; otherwise read the data from memory, return it, and write the read data into the local cache or a slave cache; and
module 3, configured to, when the master writes cache data, query whether the write request hits in the master's local cache; if so, read the data from the local cache, merge it with the write data, and write the result back to the local cache; otherwise send a request to the slaves and query whether the write request hits in a slave; if so, write the data into the hit slave's cache; otherwise write the data into the master's or a slave's cache after replacing a block in the local cache or a slave cache.
In the above system, module 2 comprises:
module 21, configured such that the slave queries whether the read request hits in its local cache; if so, it reads the data from the local cache and returns it to the master; otherwise it is determined whether the slave's local data is to be replaced; if so, the replaced block is sent to the master, and the data sent by the master is received and written into the slave's local cache; otherwise the cache of the master or another slave is chosen for replacement and the flow ends.
In the above system, module 3 comprises:
module 31, configured such that the slave queries whether the write request hits in its local cache; if so, it receives the data sent by the master, merges it with the data in the local cache, and writes the result back to the slave's local cache; otherwise it is determined whether the slave's local data is to be replaced; if so, the replaced block is sent to the master, and the data sent by the master is received and written into the slave's local cache; otherwise the flow ends.
In the above system, the preset condition comprises: the memory-access bandwidth requirement of the CPU is greater than a threshold, or the compute-to-memory-access ratio of the CPU is below a threshold.
In the above system, the master and the slaves are located in the same packaged chip.

Claims (6)

1. A dynamic capacity expansion method for a cache under a multi-CPU co-packaging architecture based on advanced packaging technology, characterized by comprising the following steps:
Step 1: set a CPU that meets a preset condition as the master, and, according to the master's memory-access bandwidth requirement and the cache sizes of the remaining CPUs other than the master, select from the remaining CPUs those whose caches can satisfy that bandwidth requirement and set them as slaves of the master; the master and the slaves are located in the same packaged chip; the preset condition is that the memory-access bandwidth requirement of the CPU is greater than a threshold, or the compute-to-memory-access ratio of the CPU is below a threshold;
Step 2: when the master reads cache data, query whether the read request hits in the master's local cache; if so, read the data from the local cache and return it; otherwise send a request to the slaves and query whether the read request hits in a slave; if so, read the hit slave's cache and return the data; otherwise read the data from memory, return it, and write the read data into the local cache or a slave cache;
Step 3: when the master writes cache data, query whether the write request hits in the master's local cache; if so, read the data from the local cache, merge it with the write data, and write the result back to the local cache; otherwise send a request to the slaves and query whether the write request hits in a slave; if so, write the data into the hit slave's cache; otherwise write the data into the master's or a slave's cache after replacing a block in the local cache or a slave cache.
2. The method for dynamically expanding a cache under a multi-CPU co-packaging architecture according to claim 1, wherein step 2 comprises:
Step 21: the slave queries whether the read request hits in its local cache; if so, it reads the data from the local cache and returns it to the master; otherwise it is determined whether the slave's local data is to be replaced; if so, the replaced block is sent to the master, and the data sent by the master is received and written into the slave's local cache; otherwise the cache of the master or another slave is chosen for replacement and the flow ends.
3. The method for dynamically expanding a cache under a multi-CPU co-packaging architecture based on advanced packaging technology according to claim 1, wherein step 3 comprises:
Step 31: the slave queries whether the write request hits in its local cache; if so, it receives the data sent by the master, merges it with the data in the local cache, and writes the result back to the slave's local cache; otherwise it is determined whether the slave's local data is to be replaced; if so, the replaced block is sent to the master, and the data sent by the master is received and written into the slave's local cache; otherwise the flow ends.
4. A dynamic capacity expansion system for a cache under a multi-CPU co-packaging architecture based on advanced packaging technology, comprising:
module 1, configured to set a CPU that meets a preset condition as the master and, according to the master's memory-access bandwidth requirement and the cache sizes of the remaining CPUs other than the master, select from the remaining CPUs those whose caches can satisfy that bandwidth requirement as slaves of the master; the master and the slaves are located in the same packaged chip; the preset condition is that the memory-access bandwidth requirement of the CPU is greater than a threshold, or the compute-to-memory-access ratio of the CPU is below a threshold;
module 2, configured to, when the master reads cache data, query whether the read request hits in the master's local cache; if so, read the data from the local cache and return it; otherwise send a request to the slaves and query whether the read request hits in a slave; if so, read the hit slave's cache and return the data; otherwise read the data from memory, return it, and write the read data into the local cache or a slave cache; and
module 3, configured to, when the master writes cache data, query whether the write request hits in the master's local cache; if so, read the data from the local cache, merge it with the write data, and write the result back to the local cache; otherwise send a request to the slaves and query whether the write request hits in a slave; if so, write the data into the hit slave's cache; otherwise write the data into the master's or a slave's cache after replacing a block in the local cache or a slave cache.
5. The dynamic capacity expansion system for a cache under a multi-CPU co-packaging architecture based on advanced packaging technology according to claim 4, wherein module 2 comprises:
module 21, configured such that the slave queries whether the read request hits in its local cache; if so, it reads the data from the local cache and returns it to the master; otherwise it is determined whether the slave's local data is to be replaced; if so, the replaced block is sent to the master, and the data sent by the master is received and written into the slave's local cache; otherwise the cache of the master or another slave is chosen for replacement and the flow ends.
6. The dynamic capacity expansion system for a cache under a multi-CPU co-packaging architecture based on advanced packaging technology according to claim 4, wherein module 3 comprises:
module 31, configured such that the slave queries whether the write request hits in its local cache; if so, it receives the data sent by the master, merges it with the data in the local cache, and writes the result back to the slave's local cache; otherwise it is determined whether the slave's local data is to be replaced; if so, the replaced block is sent to the master, and the data sent by the master is received and written into the slave's local cache; otherwise the flow ends.
CN202110622895.2A 2021-06-04 2021-06-04 Dynamic capacity expansion method and system for cache under multi-CPU co-packaging architecture based on advanced packaging technology Active CN113392604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110622895.2A CN113392604B (en) 2021-06-04 2021-06-04 Dynamic capacity expansion method and system for cache under multi-CPU co-packaging architecture based on advanced packaging technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110622895.2A CN113392604B (en) 2021-06-04 2021-06-04 Dynamic capacity expansion method and system for cache under multi-CPU co-packaging architecture based on advanced packaging technology

Publications (2)

Publication Number Publication Date
CN113392604A CN113392604A (en) 2021-09-14
CN113392604B (en) 2023-08-01

Family

ID=77618188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110622895.2A Active CN113392604B (en) Dynamic capacity expansion method and system for cache under multi-CPU co-packaging architecture based on advanced packaging technology

Country Status (1)

Country Link
CN (1) CN113392604B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5608890A (en) * 1992-07-02 1997-03-04 International Business Machines Corporation Data set level cache optimization
CN101706755A (en) * 2009-11-24 2010-05-12 中国科学技术大学苏州研究院 Caching collaboration system of on-chip multi-core processor and cooperative processing method thereof
CN103370696A (en) * 2010-12-09 2013-10-23 国际商业机器公司 Multicore system, and core data reading method
CN107111553A (en) * 2015-01-13 2017-08-29 高通股份有限公司 System and method for providing dynamic caching extension in many cluster heterogeneous processor frameworks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5608890A (en) * 1992-07-02 1997-03-04 International Business Machines Corporation Data set level cache optimization
CN101706755A (en) * 2009-11-24 2010-05-12 中国科学技术大学苏州研究院 Caching collaboration system of on-chip multi-core processor and cooperative processing method thereof
CN103370696A (en) * 2010-12-09 2013-10-23 国际商业机器公司 Multicore system, and core data reading method
CN107111553A (en) * 2015-01-13 2017-08-29 高通股份有限公司 System and method for providing dynamic caching extension in many cluster heterogeneous processor frameworks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of On-Chip Cache Sharing Strategies for Multiprocessors; Liang Chenghao; China Master's Theses Full-text Database, Information Science and Technology; I137-6 *

Also Published As

Publication number Publication date
CN113392604A (en) 2021-09-14

Similar Documents

Publication Publication Date Title
US7490217B2 (en) Design structure for selecting memory busses according to physical memory organization information stored in virtual address translation tables
Carvalho The gap between processor and memory speeds
US7539842B2 (en) Computer memory system for selecting memory buses according to physical memory organization information stored in virtual address translation tables
US20150261698A1 (en) Memory system, memory module, memory module access method, and computer system
US20080077732A1 (en) Memory module system and method for operating a memory module
WO2020103058A1 (en) Programmable operation and control chip, a design method, and device comprising same
CN103150216A (en) SoC-integrated multi-port DDR2/3 scheduler and scheduling method
US20130191587A1 (en) Memory control device, control method, and information processing apparatus
US20180336034A1 (en) Near memory computing architecture
CN104409099A (en) FPGA (field programmable gate array) based high-speed eMMC (embedded multimedia card) array controller
CN113392604B (en) Dynamic capacity expansion method and system for cache under multi-CPU co-packaging architecture based on advanced packaging technology
Cho et al. A case for cxl-centric server processors
US11281397B2 (en) Stacked memory device performing function-in-memory (FIM) operation and method of operating the same
CN114240731B (en) Distributed storage interconnection structure, video card and memory access method of graphics processor
Chen et al. MIMS: Towards a message interface based memory system
US20230195368A1 (en) Write Request Buffer
CN111258949A (en) Loongson 3A +7A + FPGA-based heterogeneous computer module
Wang et al. Alloy: Parallel-serial memory channel architecture for single-chip heterogeneous processor systems
US8516179B2 (en) Integrated circuit with coupled processing cores
US11928039B1 (en) Data-transfer test mode
CN217588059U (en) Processor system
US20240079036A1 (en) Standalone Mode
CN116627880B (en) PCIe Switch supporting RAID acceleration and RAID acceleration method thereof
US20240004560A1 (en) Efficient memory power control operations
US6260105B1 (en) Memory controller with a plurality of memory address buses

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant