CN113392604A - Dynamic capacity expansion method and system for a cache under a multi-CPU (Central Processing Unit) co-packaged architecture based on advanced packaging technology

Info

Publication number
CN113392604A
Authority
CN
China
Prior art keywords
cache
data
slave
master
cpu
Prior art date
Legal status
Granted
Application number
CN202110622895.2A
Other languages
Chinese (zh)
Other versions
CN113392604B (en)
Inventor
李晓霖
郝沁汾
叶笑春
范东睿
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN202110622895.2A
Publication of CN113392604A
Application granted
Publication of CN113392604B
Status: Active

Classifications

    • G06F 30/32 Computer-aided design [CAD]: circuit design at the digital level
    • G06F 12/084 Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G06F 12/0877 Cache access modes
    • G06F 2115/12 Printed circuit boards [PCB] or multi-chip modules [MCM]
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a dynamic capacity expansion method and system for the cache under a multi-CPU co-packaged architecture based on advanced packaging technology. It aims to solve the increased chip cost and packaging difficulty caused by enlarging the cache of a CPU chip, and provides a novel CPU cache structure whose capacity can be expanded dynamically. In this structure, by designing an interaction mechanism between the caches of different CPUs and relying on advanced packaging, the cache in one CPU chip can access the caches in identical CPU chips, thereby dynamically expanding the cache capacity available to the chip and realizing cache sharing among multiple CPUs.

Description

Dynamic capacity expansion method and system for a cache under a multi-CPU (Central Processing Unit) co-packaged architecture based on advanced packaging technology
Technical Field
The invention relates to cache structure design in the field of CPU architecture design, and in particular to a dynamic capacity expansion method and system for the cache under a multi-CPU co-packaged architecture based on advanced packaging technology.
Background
In the era of big data and cloud computing, the diversity of users and data sources means that more and more CPU workloads exhibit a low compute-to-memory-access ratio and weak data locality. Graph computing, for example, is a typical data-center application used to process ever-growing graph data quickly. The irregular, unstructured nature of graph workloads makes their execution behavior highly irregular; their irregular, fine-grained memory accesses give the cache an extremely low hit rate and leave cache blocks poorly utilized, so the existing general-purpose CPU architecture requires larger memory-access bandwidth.
Because CPU memory-access bandwidth grows slowly, the CPU must integrate a larger cache to bridge the widening gap between CPU performance and memory performance. However, the SRAM that makes up a cache occupies a large area, and chip cost is almost proportional to die area, so integrating a larger cache inevitably raises chip cost. A larger die also makes single-chip packaging far more challenging. This poses significant challenges to CPU design and packaging.
Disclosure of Invention
The invention is based on advanced packaging technology, through which the interaction between multiple chips in the same package becomes much faster.
The invention aims to solve the increased chip cost and packaging difficulty caused by enlarging the cache of a CPU chip, and provides a novel CPU cache structure whose capacity can be expanded dynamically. In this structure, by designing an interaction mechanism between the caches of different CPUs and relying on packaging technology, the cache in one CPU chip can access the caches in identical CPU chips, thereby dynamically expanding the cache capacity available to the chip and realizing cache sharing among multiple CPUs.
The invention has the following key points:
1. by accessing the cache in other CPU chips, the cache capacity in the CPU chip is expanded, the extremely small area of the chip is increased, the chip feeding cost is reduced, and the packaging difficulty is reduced.
2. The accessed CPU chips are of the same kind, so only one CPU chip needs to be designed, and the CPU design difficulty and the CPU chip casting cost are reduced.
3. Even if a CPU chip partially fails due to the problem of good product rate in the tape-out, the cache can still be fully utilized as long as the internal cache circuit is correct, thereby reducing the loss caused by the failed CPU chip.
4. The invention designs an interaction mechanism between caches. By this mechanism, the cache of one CPU chip can operate on the cache of another CPU chip of the same type.
5. The cache structure designed by the invention can dynamically expand the capacity, so that the CPU chips can work independently without being influenced by other CPU chips, and can be combined into a CPU chip with a high-capacity cache for running programs with large access and storage requirements, such as graph calculation application, and the cache structure has flexibility.
6. With advanced packaging techniques, multiple CPU chips are integrated into a single package while maintaining performance close to monolithic integration. Therefore, in the cache structure designed by the invention, the single chip package of N CPU chips is integrated, and the performance of the cache structure is close to that of a single CPU chip with N times of cache capacity expansion.
Specifically, to overcome the defects of the prior art, the present invention provides a dynamic capacity expansion method for the cache under a multi-CPU co-packaged architecture based on advanced packaging technology, the method comprising:
Step 1: set a CPU that meets a preset condition as the master end, and, according to the master end's memory-access bandwidth demand and the cache sizes of the other CPUs, select from those CPUs the ones that can satisfy the demand as slave ends of the master end;
Step 2: when the master end reads cache data, query whether the read request hits in the master end's local cache; if so, read the data from the local cache and return it; otherwise send a request to the slave ends and query whether the read request hits in a slave end; if so, read the hitting slave end's cache and return the data; otherwise read the data from memory and return it, while writing the read data into the local cache or a slave end's cache;
Step 3: when the master end writes cache data, query whether the write request hits in the master end's local cache; if so, read the data from the local cache, merge it with the write data, and write it back to the local cache; otherwise send a request to the slave ends and query whether the write request hits in a slave end; if so, write the data into the hitting slave end's cache; otherwise replace a block in the local or a slave end's cache and write the data into the master end's or the slave end's cache.
In the above dynamic capacity expansion method for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, step 2 comprises:
Step 21: the slave end queries whether the read request hits in its local cache. If it hits, the slave end reads the data from the local cache and returns it to the master end. Otherwise it judges whether its own local data has been selected for replacement: if so, it sends the replaced block to the master end, then receives the data sent by the master end and writes it back into its local cache; if not, the cache of the master end or of another slave end is selected for replacement, and the flow ends.
In the above dynamic capacity expansion method for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, step 3 comprises:
Step 31: the slave end queries whether the write request hits in its local cache. If it hits, the slave end receives the data sent by the master end, merges it with the data in its local cache, and writes the result back into its local cache. Otherwise it judges whether its own local data has been selected for replacement: if so, it sends the replaced block to the master end, then receives the data sent by the master end and writes it back into its local cache; if not, the flow ends.
In the above dynamic capacity expansion method for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, the preset condition comprises: the memory-access bandwidth demand of the CPU is greater than a threshold, or the compute-to-memory-access ratio of the CPU is below a threshold.
In the above dynamic capacity expansion method for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, the master end and the slave ends are located in the same packaged chip.
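As an illustration of step 1, the following minimal C++ sketch shows one way the master end could be chosen and slave ends enlisted. It is a sketch under assumptions, not the invention's implementation: the CpuInfo fields, the threshold test, and the greedy accumulation of slave cache capacity are hypothetical names introduced here for clarity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-CPU descriptor; all field names are illustrative.
struct CpuInfo {
    int         id;
    double      bandwidth_demand;      // measured memory-access bandwidth demand
    double      compute_access_ratio;  // compute ops per memory-access op
    std::size_t cache_bytes;           // size of this CPU's local cache
    bool        assigned = false;      // already enlisted as a slave end
};

// Step 1 condition: a CPU qualifies as master end when its bandwidth demand
// exceeds a threshold or its compute-to-memory-access ratio falls below one.
bool qualifies_as_master(const CpuInfo& c, double bw_thresh, double ratio_thresh) {
    return c.bandwidth_demand > bw_thresh || c.compute_access_ratio < ratio_thresh;
}

// Greedily enlist other CPUs as slave ends until their combined cache size
// meets the master end's demand (a stand-in for the unspecified selection rule).
std::vector<int> select_slaves(std::vector<CpuInfo>& others,
                               std::size_t needed_cache_bytes) {
    std::vector<int> slaves;
    std::size_t gathered = 0;
    for (CpuInfo& c : others) {
        if (gathered >= needed_cache_bytes) break;
        if (c.assigned) continue;
        c.assigned = true;
        gathered += c.cache_bytes;
        slaves.push_back(c.id);
    }
    return slaves;
}
```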
The invention also provides a dynamic capacity expansion system for the cache under a multi-CPU co-packaged architecture based on advanced packaging technology, comprising:
Module 1, configured to set a CPU that meets a preset condition as the master end, and, according to the master end's memory-access bandwidth demand and the cache sizes of the other CPUs, select from those CPUs the ones that can satisfy the demand as slave ends of the master end;
Module 2, configured to query, when the master end reads cache data, whether the read request hits in the master end's local cache; if so, read the data from the local cache and return it; otherwise send a request to the slave ends and query whether the read request hits in a slave end; if so, read the hitting slave end's cache and return the data; otherwise read the data from memory and return it, while writing the read data into the local cache or a slave end's cache;
Module 3, configured to query, when the master end writes cache data, whether the write request hits in the master end's local cache; if so, read the data from the local cache, merge it with the write data, and write it back to the local cache; otherwise send a request to the slave ends and query whether the write request hits in a slave end; if so, write the data into the hitting slave end's cache; otherwise replace a block in the local or a slave end's cache and write the data into the master end's or the slave end's cache.
In the above dynamic capacity expansion system for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, module 2 comprises:
Module 21, configured so that the slave end queries whether the read request hits in its local cache; if it hits, reads the data from the local cache and returns it to the master end; otherwise judges whether its own local data has been selected for replacement; if so, sends the replaced block to the master end, then receives the data sent by the master end and writes it back into its local cache; if not, selects the cache of the master end or of another slave end for replacement, and the flow ends.
In the above dynamic capacity expansion system for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, module 3 comprises:
Module 31, configured so that the slave end queries whether the write request hits in its local cache; if it hits, receives the data sent by the master end, merges it with the data in its local cache, and writes the result back into its local cache; otherwise judges whether its own local data has been selected for replacement; if so, sends the replaced block to the master end, then receives the data sent by the master end and writes it back into its local cache; if not, the flow ends.
In the above dynamic capacity expansion system for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, the preset condition comprises: the memory-access bandwidth demand of the CPU is greater than a threshold, or the compute-to-memory-access ratio of the CPU is below a threshold.
In the above dynamic capacity expansion system for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, the master end and the slave ends are located in the same packaged chip.
According to the above scheme, the invention has the following advantages. In the structure of the invention, by accessing the caches of other CPU chips and relying on advanced packaging technology, the cache of a CPU chip can expand its capacity severalfold at the cost of only a small area increase. On the one hand, overall chip cost is reduced, since the added cost of advanced packaging is lower than the added cost of taping out a larger die. On the other hand, within a single package integrating multiple CPU chips, each chip can work independently, unaffected by the others, or the chips can be combined into one CPU with a large-capacity cache to run programs with heavy memory-access demands, such as graph computing applications. This flexibility is obtained while only one CPU needs to be designed, reducing CPU design difficulty and tape-out cost.
Drawings
FIG. 1 is a structural diagram of a single packaged chip integrating 4 CPU chips, where all 4 chips are in the general mode and suited to general computation;
FIG. 2 is a structural diagram of a single packaged chip integrating 4 CPU chips, where CPU chip 1 is in the master mode and the other CPU chips are in the slave mode, suited to large memory-access bandwidth demands;
FIG. 3 is a flow chart of reading cache data in the general mode;
FIG. 4 is a flow chart of writing cache data in the general mode;
FIG. 5 is a flow chart of reading cache data in the master mode;
FIG. 6 is a flow chart of writing cache data in the master mode;
FIG. 7 is a flow chart of reading cache data in the slave mode;
FIG. 8 is a flow chart of writing cache data in the slave mode.
Detailed Description
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
In the cache structure design of the invention, the memory-access mode of a CPU chip is divided into the following three types:
1. General mode. In this mode, the memory-access behavior of the CPU chip is identical to that of a CPU chip not designed with this cache structure: it cannot access the caches of other CPU chips, and its own cache cannot be accessed by other CPU chips.
2. Master mode. In this mode, the CPU chip can access the caches of other CPU chips, but its own cache cannot be accessed by other CPU chips.
3. Slave mode. In this mode, the cache of the CPU chip can be accessed by other CPU chips, but the chip cannot access the caches of other CPU chips. Meanwhile, in this mode only the cache of the CPU chip keeps operating; the remaining parts stop working.
The memory-access mode of a CPU chip can be configured statically, for example through top-level pins, or dynamically, through registers inside the chip. In the static configuration adopted by the invention, the memory-access mode is set through the CPU's top-level pins before the application runs and does not change while the application is running. In the dynamic configuration adopted by the invention, the remaining CPU chips are configured during execution according to the running application. For example, in Fig. 2, when running an application with a large memory-access bandwidth demand, CPU1 can write, through the pins interconnecting the CPUs, the registers inside CPU2/3/4 to set them as slave ends and itself as the master end. If the running application's bandwidth demand is small, the chips switch back to the general mode and no other CPUs are needed as slave ends.
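As an illustration of the dynamic configuration path just described, the sketch below models the mode registers and the reconfiguration performed by CPU1 in the Fig. 2 example. The AccessMode encodings, register layout, and function names are assumptions of this sketch, not definitions from the invention.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// The three memory-access modes described above; encodings are illustrative.
enum class AccessMode : uint8_t { General = 0, Master = 1, Slave = 2 };

// Hypothetical mode register, one per CPU chip, reachable over the
// inter-CPU pins inside the package.
struct ModeRegister { volatile uint8_t mode; };

// Dynamic configuration as in the Fig. 2 example: before a run with a
// large memory-access bandwidth demand, CPU1 makes itself the master end
// and demotes CPU2/3/4 to slave ends.
void configure_for_large_bandwidth(std::array<ModeRegister*, 4>& regs) {
    regs[0]->mode = static_cast<uint8_t>(AccessMode::Master);
    for (std::size_t i = 1; i < regs.size(); ++i)
        regs[i]->mode = static_cast<uint8_t>(AccessMode::Slave);
}

// When the bandwidth demand is small again, all chips return to the
// general mode and work independently.
void configure_for_general_compute(std::array<ModeRegister*, 4>& regs) {
    for (ModeRegister* r : regs)
        r->mode = static_cast<uint8_t>(AccessMode::General);
}
```

The static path would achieve the same effect by strapping the top-level pins before boot, at the price of fixing the mode for the whole run.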
In the description of the drawings, Figs. 1 and 2 show 4 CPU chips designed with the cache structure of the invention integrated in a single packaged chip. Fig. 1 shows the 4 CPU chips in the general mode, operating normally and independently without affecting one another. This mode suits ordinary computation, i.e., low memory-access bandwidth demand and a high compute-to-memory-access ratio; this ratio is the proportion of compute operations to memory-access operations during execution, and the lower it is, the more memory-access operations there are and the greater the bandwidth demand. If a computation with a large memory-access bandwidth demand is required, for example when the CPU's bandwidth demand exceeds a threshold or its compute-to-memory-access ratio falls below a threshold, then as shown in Fig. 2, one CPU chip (for example, CPU chip 1) can be configured in the master mode and the remaining chips in the slave mode; CPU chip 1 can then quadruple its cache capacity by accessing the caches of the other 3 CPU chips, improving the performance of the computation. The application is not limited to this: since dynamic configuration is adopted, according to the CPU's actual demand only one core may be set to the master mode and one other to the slave mode, with the remaining two still in the general mode. In that case, CPU chip 1 can still double its own cache capacity.
The CPU chips are interconnected with Intel's Advanced Interface Bus (AIB) protocol. Because the AIB protocol supports a high data transmission rate and adopts a compact layout, the occupied area is minimized. By means of advanced packaging, a single package integrating N CPU chips designed with this cache structure achieves performance close to a single CPU chip with N times the cache capacity.
Based on the cache structure design of the invention, the memory-access mode of a CPU chip is divided into the general mode, the master mode and the slave mode. The flows for reading and writing cache data in the three modes are shown in FIGS. 3-8 and described below; each flow is followed by a short illustrative C++ sketch, which is a toy model added in this text for clarity, not the invention's implementation.
Reading cache data in the general mode (sketch below):
1. Query whether the read request hits in the local cache. If so, jump to step 2; otherwise jump to step 3.
2. Local cache hit. Read the data from the local cache and return it. The flow ends.
3. Local cache miss. Judge whether the replaced block needs to be written back to memory; if so, write it back to memory.
4. Read the data from memory, return it, and write it into the local cache. The flow ends.
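A minimal sketch of the general-mode read flow, assuming a toy direct-mapped cache with one word per line and a map-backed memory; the sizes and names are illustrative assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Toy direct-mapped cache standing in for a CPU's local cache.
constexpr std::size_t kSets = 64;
struct CacheLine { bool valid = false; bool dirty = false; uint64_t tag = 0; uint64_t data = 0; };
std::array<CacheLine, kSets> local_cache;
std::unordered_map<uint64_t, uint64_t> memory;   // backing store

std::size_t set_of(uint64_t addr) { return addr % kSets; }

// General-mode read, following steps 1-4 above.
uint64_t read_general(uint64_t addr) {
    CacheLine& line = local_cache[set_of(addr)];
    if (line.valid && line.tag == addr)           // steps 1-2: local hit,
        return line.data;                         //   return the data
    if (line.valid && line.dirty)                 // step 3: miss; write the
        memory[line.tag] = line.data;             //   replaced block back
    line = {true, false, addr, memory[addr]};     // step 4: refill from memory
    return line.data;
}
```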
Writing cache data in the general mode (sketch below):
1. Query whether the write request hits in the local cache. If so, jump to step 2; otherwise jump to step 3.
2. Local cache hit. Read the data from the local cache, merge it with the write data, and write it back to the local cache. The flow ends.
3. Local cache miss. Judge whether the replaced block needs to be written back to memory; if so, write it back to memory.
4. Read the data from memory, merge it with the write data, and write it into the local cache. The flow ends.
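A matching sketch of the general-mode write flow, reusing the toy cache of the read sketch; the wmask parameter is an assumption used to make the merge of old and new data explicit.

```cpp
// General-mode write, following steps 1-4 above; wmask selects the bits
// being written, so the merge of old and new data is explicit.
void write_general(uint64_t addr, uint64_t wdata, uint64_t wmask) {
    CacheLine& line = local_cache[set_of(addr)];
    if (!(line.valid && line.tag == addr)) {        // step 3: local miss;
        if (line.valid && line.dirty)               //   write the replaced
            memory[line.tag] = line.data;           //   block back if dirty
        line = {true, false, addr, memory[addr]};   // step 4: fetch old data
    }
    line.data = (line.data & ~wmask) | (wdata & wmask); // steps 2/4: merge
    line.dirty = true;                              // modified, mark dirty
}
```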
Reading cache data in the master mode (sketch below):
1. Query whether the read request hits in the master end's local cache. If so, jump to step 2; otherwise jump to step 3.
2. Local cache hit. Read the data from the local cache and return it. The flow ends.
3. Local cache miss. Send a request to the slave ends, query whether the read request hits in any slave end, and wait for all slave ends to reply.
4. All slave ends have replied. If a slave end hits, jump to step 5; otherwise jump to step 6.
5. A slave end hits. Read the data from that slave end's reply and return it. The flow ends.
6. No slave end hits. Choose to replace a block in either the local cache or a slave end's cache. If the local cache is chosen, jump to step 7; otherwise jump to step 9.
7. The local cache is replaced. Judge whether the replaced block needs to be written back to memory; if so, write it back to memory.
8. Read the data from memory, return it, and write it into the local cache. The flow ends.
9. A slave end's cache is replaced. Judge whether the slave end needs to write back to memory; if so, read the data the slave end must write back and write it to memory.
10. Read the data from memory, return it, and write it into the slave end's cache. The flow ends.
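A sketch of the master-mode read flow under the same toy model, restated so it stands alone. The slave caches are modelled as further arrays reachable inside the package, the probe-and-wait handshake of steps 3-4 collapses into a loop, and the choice between local and slave replacement in step 6 is reduced to a fixed clean-victim-first rule; all of these are assumptions of the sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct CacheLine { bool valid = false; bool dirty = false; uint64_t tag = 0; uint64_t data = 0; };
constexpr std::size_t kSets = 64;
using Cache = std::array<CacheLine, kSets>;
Cache master_cache;
std::vector<Cache> slave_caches(3);              // three slave-end chips
std::unordered_map<uint64_t, uint64_t> memory;   // backing store

std::size_t set_of(uint64_t addr) { return addr % kSets; }

// Master-mode read, following steps 1-10 above.
uint64_t read_master(uint64_t addr) {
    CacheLine& local = master_cache[set_of(addr)];
    if (local.valid && local.tag == addr)        // steps 1-2: local hit
        return local.data;
    for (Cache& s : slave_caches) {              // steps 3-4: probe all slaves
        CacheLine& line = s[set_of(addr)];
        if (line.valid && line.tag == addr)      // step 5: a slave hit; read
            return line.data;                    //   the slave's reply
    }
    // step 6: no hit anywhere; replace locally if the local victim is
    // clean, otherwise in the first slave (toy stand-in for the policy)
    CacheLine& victim =
        (!local.valid || !local.dirty) ? local : slave_caches[0][set_of(addr)];
    if (victim.valid && victim.dirty)            // steps 7/9: write back (the
        memory[victim.tag] = victim.data;        //   real flow routes a slave's
                                                 //   write-back via the master)
    victim = {true, false, addr, memory[addr]};  // steps 8/10: refill
    return victim.data;
}
```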
Writing cache data in the master mode (sketch below):
1. Query whether the write request hits in the master end's local cache. If so, jump to step 2; otherwise jump to step 3.
2. Local cache hit. Read the data from the local cache, merge it with the write data, and write it back to the local cache. The flow ends.
3. Local cache miss. Send a request to the slave ends, query whether the write request hits in any slave end, and wait for all slave ends to reply.
4. All slave ends have replied. If a slave end hits, jump to step 5; otherwise jump to step 6.
5. A slave end hits. Write the data into that slave end's cache. The flow ends.
6. No slave end hits. Choose to replace a block in either the local cache or a slave end's cache. If the local cache is chosen, jump to step 7; otherwise jump to step 9.
7. The local cache is replaced. Judge whether the replaced block needs to be written back to memory; if so, write it back to memory.
8. Read the data from memory, merge it with the write data, and write it into the local cache. The flow ends.
9. A slave end's cache is replaced. Judge whether the slave end needs to write back to memory; if so, read the data the slave end must write back and write it to memory.
10. Read the data from memory, merge it with the write data, and write it into the slave end's cache. The flow ends.
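A sketch of the master-mode write flow, reusing the model and the clean-victim-first replacement rule assumed in the master-mode read sketch; merge semantics follow the general-mode write sketch.

```cpp
// Master-mode write, following steps 1-10 above.
void write_master(uint64_t addr, uint64_t wdata, uint64_t wmask) {
    CacheLine& local = master_cache[set_of(addr)];
    if (local.valid && local.tag == addr) {      // steps 1-2: local hit,
        local.data = (local.data & ~wmask) | (wdata & wmask); // merge
        local.dirty = true;
        return;
    }
    for (Cache& s : slave_caches) {              // steps 3-4: probe all slaves
        CacheLine& line = s[set_of(addr)];
        if (line.valid && line.tag == addr) {    // step 5: slave hit; write
            line.data = (line.data & ~wmask) | (wdata & wmask);
            line.dirty = true;
            return;
        }
    }
    CacheLine& victim =                          // step 6: choose the victim
        (!local.valid || !local.dirty) ? local : slave_caches[0][set_of(addr)];
    if (victim.valid && victim.dirty)            // steps 7/9: write back
        memory[victim.tag] = victim.data;
    victim = {true, true, addr, memory[addr]};   // steps 8/10: read from
    victim.data = (victim.data & ~wmask) | (wdata & wmask); // memory, merge
}
```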
Reading cache data in the slave mode (sketch below):
1. Query whether the read request hits in the local cache. If so, jump to step 2; otherwise jump to step 3.
2. Local cache hit. Read the data from the local cache and return it to the master end. The flow ends.
3. Both the master end and this slave end miss. Judge whether this slave end's local data is selected for replacement. If the master end's (or another slave end's) cache is selected instead, the flow ends; otherwise jump to step 4.
4. Judge whether the replaced block needs to be written back to memory; if so, send it to the master end.
5. Receive the data sent by the master end and write it into the local cache. The flow ends.
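Seen from the slave end, the same transaction can be sketched as the handler below. The replace_here flag stands in for the master end's replacement decision of step 3, and the two link functions are hypothetical stand-ins for messages on the in-package interconnect.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

struct CacheLine { bool valid = false; bool dirty = false; uint64_t tag = 0; uint64_t data = 0; };
constexpr std::size_t kSets = 64;
std::array<CacheLine, kSets> slave_cache;

// Placeholder interconnect messages (assumed, not the invention's protocol).
void send_victim_to_master(const CacheLine&) { /* uplink message */ }
uint64_t receive_fill_from_master() { return 0; /* downlink message */ }

// Slave-mode read handler, following steps 1-5 above; returns the data on
// a hit, or std::nullopt when the master end replaces elsewhere.
std::optional<uint64_t> slave_handle_read(uint64_t addr, bool replace_here) {
    CacheLine& line = slave_cache[addr % kSets];
    if (line.valid && line.tag == addr)          // steps 1-2: hit, reply
        return line.data;                        //   with the data
    if (!replace_here)                           // step 3: another cache was
        return std::nullopt;                     //   chosen; flow ends here
    if (line.valid && line.dirty)                // step 4: dirty victim is
        send_victim_to_master(line);             //   sent to the master end
    line = {true, false, addr, receive_fill_from_master()}; // step 5: refill
    return line.data;
}
```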
Writing cache data in the slave mode (sketch below):
1. Query whether the write request hits in the local cache. If so, jump to step 2; otherwise jump to step 3.
2. Local cache hit. Receive the data sent by the master end, merge it with the data in the local cache, and write it back to the local cache. The flow ends.
3. Both the master end and this slave end miss. Judge whether this slave end's local data is selected for replacement. If the master end's (or another slave end's) cache is selected instead, the flow ends; otherwise jump to step 4.
4. Judge whether the replaced block needs to be written back to memory; if so, send it to the master end.
5. Receive the data sent by the master end and write it into the local cache. The flow ends.
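A matching sketch of the slave-mode write handler, reusing the cache model and link stubs of the slave-mode read sketch; the merge semantics follow the other write sketches.

```cpp
// Slave-mode write handler, following steps 1-5 above; returns whether
// this slave end kept the block.
bool slave_handle_write(uint64_t addr, uint64_t wdata, uint64_t wmask,
                        bool replace_here) {
    CacheLine& line = slave_cache[addr % kSets];
    if (line.valid && line.tag == addr) {        // steps 1-2: hit; merge the
        line.data = (line.data & ~wmask) | (wdata & wmask); // master's data
        line.dirty = true;
        return true;
    }
    if (!replace_here) return false;             // step 3: replaced elsewhere
    if (line.valid && line.dirty)                // step 4: dirty victim is
        send_victim_to_master(line);             //   sent to the master end
    line = {true, true, addr, receive_fill_from_master()};  // step 5: refill
    line.data = (line.data & ~wmask) | (wdata & wmask);     //   and merge
    return true;
}
```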
The following is a system embodiment corresponding to the above method embodiment, and the two can be implemented in cooperation with each other. The technical details mentioned in the above embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the technical details of this embodiment also apply to the above embodiment.
The invention also provides a dynamic capacity expansion system for the cache under a multi-CPU co-packaged architecture based on advanced packaging technology, comprising:
Module 1, configured to set a CPU that meets a preset condition as the master end, and, according to the master end's memory-access bandwidth demand and the cache sizes of the other CPUs, select from those CPUs the ones that can satisfy the demand as slave ends of the master end;
Module 2, configured to query, when the master end reads cache data, whether the read request hits in the master end's local cache; if so, read the data from the local cache and return it; otherwise send a request to the slave ends and query whether the read request hits in a slave end; if so, read the hitting slave end's cache and return the data; otherwise read the data from memory and return it, while writing the read data into the local cache or a slave end's cache;
Module 3, configured to query, when the master end writes cache data, whether the write request hits in the master end's local cache; if so, read the data from the local cache, merge it with the write data, and write it back to the local cache; otherwise send a request to the slave ends and query whether the write request hits in a slave end; if so, write the data into the hitting slave end's cache; otherwise replace a block in the local or a slave end's cache and write the data into the master end's or the slave end's cache.
In the above dynamic capacity expansion system for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, module 2 comprises:
Module 21, configured so that the slave end queries whether the read request hits in its local cache; if it hits, reads the data from the local cache and returns it to the master end; otherwise judges whether its own local data has been selected for replacement; if so, sends the replaced block to the master end, then receives the data sent by the master end and writes it back into its local cache; if not, selects the cache of the master end or of another slave end for replacement, and the flow ends.
In the above dynamic capacity expansion system for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, module 3 comprises:
Module 31, configured so that the slave end queries whether the write request hits in its local cache; if it hits, receives the data sent by the master end, merges it with the data in its local cache, and writes the result back into its local cache; otherwise judges whether its own local data has been selected for replacement; if so, sends the replaced block to the master end, then receives the data sent by the master end and writes it back into its local cache; if not, the flow ends.
In the above dynamic capacity expansion system for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, the preset condition comprises: the memory-access bandwidth demand of the CPU is greater than a threshold, or the compute-to-memory-access ratio of the CPU is below a threshold.
In the above dynamic capacity expansion system for the cache under the multi-CPU co-packaged architecture based on advanced packaging technology, the master end and the slave ends are located in the same packaged chip.

Claims (10)

1. A dynamic capacity expansion method for a cache under a multi-CPU co-packaged architecture based on advanced packaging technology, characterized by comprising:
step 1: setting a CPU that meets a preset condition as a master end, and, according to the master end's memory-access bandwidth demand and the cache sizes of the other CPUs, selecting from those CPUs the ones that can satisfy the demand as slave ends of the master end;
step 2: when the master end reads cache data, querying whether the read request hits in the master end's local cache; if so, reading the data from the local cache and returning it; otherwise sending a request to the slave ends and querying whether the read request hits in a slave end; if so, reading the hitting slave end's cache and returning the data; otherwise reading the data from memory and returning it, while writing the read data into the local cache or a slave end's cache;
and step 3: when the master end writes cache data, querying whether the write request hits in the master end's local cache; if so, reading the data from the local cache, merging it with the write data, and writing it back to the local cache; otherwise sending a request to the slave ends and querying whether the write request hits in a slave end; if so, writing the data into the hitting slave end's cache; otherwise replacing a block in the local or a slave end's cache and writing the data into the master end's or the slave end's cache.
2. The method as claimed in claim 1, wherein step 2 comprises:
step 21: the slave end querying whether the read request hits in its local cache; if it hits, reading the data from the local cache and returning it to the master end; otherwise judging whether its own local data has been selected for replacement; if so, sending the replaced block to the master end, then receiving the data sent by the master end and writing it back into its local cache; if not, selecting the cache of the master end or of another slave end for replacement, and the flow ends.
3. The method as claimed in claim 1, wherein step 3 comprises:
step 31: the slave end querying whether the write request hits in its local cache; if it hits, receiving the data sent by the master end, merging it with the data in its local cache, and writing the result back into its local cache; otherwise judging whether its own local data has been selected for replacement; if so, sending the replaced block to the master end, then receiving the data sent by the master end and writing it back into its local cache; if not, the flow ends.
4. The method of claim 1, wherein the preset condition comprises: the memory-access bandwidth demand of the CPU is greater than a threshold, or the compute-to-memory-access ratio of the CPU is below a threshold.
5. The method of claim 1, wherein the master end and the slave ends are located in the same packaged chip.
6. A dynamic capacity expansion system for a cache under a multi-CPU co-packaged architecture based on advanced packaging technology, characterized by comprising:
module 1, configured to set a CPU that meets a preset condition as a master end, and, according to the master end's memory-access bandwidth demand and the cache sizes of the other CPUs, select from those CPUs the ones that can satisfy the demand as slave ends of the master end;
module 2, configured to query, when the master end reads cache data, whether the read request hits in the master end's local cache; if so, read the data from the local cache and return it; otherwise send a request to the slave ends and query whether the read request hits in a slave end; if so, read the hitting slave end's cache and return the data; otherwise read the data from memory and return it, while writing the read data into the local cache or a slave end's cache;
and module 3, configured to query, when the master end writes cache data, whether the write request hits in the master end's local cache; if so, read the data from the local cache, merge it with the write data, and write it back to the local cache; otherwise send a request to the slave ends and query whether the write request hits in a slave end; if so, write the data into the hitting slave end's cache; otherwise replace a block in the local or a slave end's cache and write the data into the master end's or the slave end's cache.
7. The system of claim 6, wherein module 2 comprises:
module 21, configured so that the slave end queries whether the read request hits in its local cache; if it hits, reads the data from the local cache and returns it to the master end; otherwise judges whether its own local data has been selected for replacement; if so, sends the replaced block to the master end, then receives the data sent by the master end and writes it back into its local cache; if not, selects the cache of the master end or of another slave end for replacement, and the flow ends.
8. The system of claim 6, wherein module 3 comprises:
module 31, configured so that the slave end queries whether the write request hits in its local cache; if it hits, receives the data sent by the master end, merges it with the data in its local cache, and writes the result back into its local cache; otherwise judges whether its own local data has been selected for replacement; if so, sends the replaced block to the master end, then receives the data sent by the master end and writes it back into its local cache; if not, the flow ends.
9. The system of claim 6, wherein the preset condition comprises: the memory-access bandwidth demand of the CPU is greater than a threshold, or the compute-to-memory-access ratio of the CPU is below a threshold.
10. The system of claim 6, wherein the master end and the slave ends are located in the same packaged chip.
CN202110622895.2A 2021-06-04 2021-06-04 Dynamic capacity expansion method and system for cache under multi-CPU co-encapsulation architecture based on advanced encapsulation technology Active CN113392604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110622895.2A CN113392604B (en) 2021-06-04 2021-06-04 Dynamic capacity expansion method and system for cache under multi-CPU co-encapsulation architecture based on advanced encapsulation technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110622895.2A CN113392604B (en) 2021-06-04 2021-06-04 Dynamic capacity expansion method and system for cache under multi-CPU co-encapsulation architecture based on advanced encapsulation technology

Publications (2)

Publication Number Publication Date
CN113392604A 2021-09-14
CN113392604B CN113392604B (en) 2023-08-01

Family

ID=77618188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110622895.2A Active CN113392604B (en) 2021-06-04 2021-06-04 Dynamic capacity expansion method and system for cache under multi-CPU co-encapsulation architecture based on advanced encapsulation technology

Country Status (1)

Country Link
CN (1) CN113392604B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5608890A (en) * 1992-07-02 1997-03-04 International Business Machines Corporation Data set level cache optimization
CN101706755A (en) * 2009-11-24 2010-05-12 中国科学技术大学苏州研究院 Caching collaboration system of on-chip multi-core processor and cooperative processing method thereof
CN103370696A (en) * 2010-12-09 2013-10-23 国际商业机器公司 Multicore system, and core data reading method
CN107111553A (en) * 2015-01-13 2017-08-29 高通股份有限公司 System and method for providing dynamic caching extension in many cluster heterogeneous processor frameworks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIANG, Chenghao: "Research and Implementation of an On-chip Cache Sharing Strategy for Multiprocessors", China Master's Theses Full-text Database, Information Science and Technology, pages 137-6 *

Also Published As

Publication number Publication date
CN113392604B (en) 2023-08-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant