WO2024041131A1 - A system on chip - Google Patents

A system on chip

Publication number
WO2024041131A1
Authority
WO
WIPO (PCT)
Prior art keywords
mram
chip
data
cache
processing module
Prior art date
Application number
PCT/CN2023/101056
Other languages
English (en)
French (fr)
Inventor
陈伟
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024041131A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/78: Architectures of general purpose stored program computers comprising a single central processing unit

Definitions

  • the present application relates to the field of communication technology, and in particular to a system on a chip.
  • SoC system on chip
  • the cache includes the first-level cache and the second-level cache used independently by each core, and also includes the third-level cache shared by each core.
  • SRAM Static Random Access Memory
  • This application provides an on-chip system to reduce cache power consumption.
  • embodiments of the present application provide a system-on-chip, which includes a processing module and a cache.
  • the processing module is the core of the system on chip.
  • This application does not limit the number of processing modules.
  • the system-on-chip may include one processing module or multiple processing modules.
  • the processing module can process data and read and write data to the cache.
  • the cache includes multiple levels of cache.
  • the last level cache LLC among the multiple levels of cache includes the first MRAM.
  • MRAM is a non-volatile memory, and data is not lost when power is turned off.
  • MRAM as part of the cache can reduce the power consumption of the cache.
  • MRAM has higher data reading and writing capabilities, can support large-capacity data storage, improves cache data reading and writing speed, and increases cache capacity.
  • the last level cache also includes SRAM.
  • the LLC includes both SRAM and the first MRAM.
  • SRAM is used to support data writing, and the first MRAM is used to support data reading. That is to say, the processing module can write data into the SRAM and read data from the first MRAM. For example, when the processing module writes data to the last level cache, it writes the data to the SRAM. Data in the SRAM can be migrated into the first MRAM; that is, the first MRAM stores the data migrated from the SRAM.
  • SRAM can realize high-speed data writing, and the processing module gives priority to writing data to SRAM, which can ensure the data writing speed.
  • MRAM has better data retention and the data is not easy to change.
  • the processing module can accurately read data from the first MRAM.
  • the cache of the last level also includes a second MRAM. That is to say, the LLC includes two kinds of MRAM, one is the first MRAM and the other is the second MRAM.
  • the data retention capacity of the first MRAM is greater than the data retention capacity of the second MRAM, and the data writing speed of the first MRAM is smaller than the data writing speed of the second MRAM.
  • the second MRAM is used to support data writing, and the first MRAM is used to support data reading. That is to say, the processing module can write data into the second MRAM and read data from the first MRAM. For example, when the processing module writes data to the last level cache, it writes the data to the second MRAM. Data in the second MRAM can be migrated into the first MRAM; that is, the first MRAM stores the data migrated from the second MRAM.
  • the first MRAM has high data retention, and the data is not easily changed in the first MRAM, thereby reducing the bit error rate of the data.
  • the second MRAM has higher data writing capacity and can achieve high-speed data writing.
  • the processing module preferentially writes data to the second MRAM, which can ensure the data writing speed.
  • the system-on-chip may be 2D packaged.
  • the cache and the processing module are located on the same plane.
  • the system-on-chip may be 3D packaged.
  • Various components in the system-on-chip (such as the processing modules and the multiple levels of cache) may be partially located on the same plane and partially located on different planes.
  • the first MRAM and the second MRAM are located on the same plane, and the first MRAM and the processing module are located on different planes.
  • the first MRAM and the second MRAM are located on a different plane from the processing module, providing a larger deployment area for the processing module, so that a larger number of processing modules can be included in the system on a chip.
  • this arrangement also allows the first MRAM and the second MRAM to occupy a larger area, further increasing the capacity of the LLC.
  • when the first MRAM and the second MRAM are located on the same plane, they can be grown on the same substrate. This can simplify the LLC structure.
  • the first MRAM and the second MRAM can also be grown on different substrates.
  • the substrate on which the first MRAM is grown is spliced with the substrate on which the second MRAM is grown.
  • the two substrates are in the same plane after splicing, which can reduce the difficulty of LLC production and improve process yield.
  • the system-on-chip may be 3D packaged. Some of the various components in the on-chip system may be located on the same plane, and some may be located on different planes.
  • the first MRAM, the second MRAM, and the processing module are respectively located on different planes.
  • the first MRAM, the second MRAM, and the processing module can be arranged in a layer-by-layer stacking manner, and the first MRAM, the second MRAM, and the processing module are located on different layers.
  • the first MRAM, the second MRAM, and the processing module are located on different planes, which provides a larger deployment area for the processing module, so that the system on a chip can include a larger number of processing modules.
  • this composition also allows the first MRAM and the second MRAM to occupy a larger area, further increasing the capacity of the LLC.
  • when the first MRAM, the second MRAM, and the processing module are located on different planes, the second MRAM can be arranged on the plane close to the processing module, and the first MRAM can be arranged on the plane far from the processing module.
  • the second MRAM is closer to the processing module, which enables the processing module to quickly write data to the second MRAM and improves the data processing efficiency of the on-chip system.
  • the system-on-chip can also adopt a 3D packaging method.
  • the first MRAM and the SRAM are located on the same plane, and the first MRAM and the processing module are located on different planes.
  • the deployment area of the processing module is larger, and the system-on-chip allows the deployment of more processing modules.
  • using this packaging method also provides a larger deployment area for SRAM and the first MRAM, effectively increasing the capacity of LLC.
  • when the first MRAM and the SRAM are located on the same plane, they can be grown on the same substrate, which simplifies the LLC structure. The first MRAM and the SRAM can also be grown on different substrates, and the different substrates are located on the same plane after being spliced. This can reduce the difficulty of LLC production and improve process yield.
  • the system-on-chip can also adopt a 3D packaging method.
  • the first MRAM, SRAM, and processing module are respectively located on different planes.
  • a larger deployment area is provided for the first MRAM, SRAM, and processing modules, allowing a larger number of processing modules to be deployed, and a larger capacity of the first MRAM and SRAM.
  • the SRAM is located on a plane close to the processing module, so that the processing module can quickly write data to the SRAM, and the first MRAM is located on a plane away from the processing module.
  • embodiments of the present application also provide a system-on-chip.
  • the system-on-chip includes the system-on-chip provided in the first aspect and any possible embodiment of the first aspect.
  • the system-on-chip further includes a memory, which can be used as the memory of the on-chip system to provide storage space for the on-chip system.
  • embodiments of the present application further provide a computing device, which includes the system-on-chip provided in the second aspect.
  • Figure 1 is a schematic structural diagram of an MTJ provided by this application.
  • FIG. 2 is a schematic structural diagram of a storage unit provided by this application.
  • FIG. 3 is a schematic structural diagram of an SoC provided by this application.
  • Figures 4A to 4D are schematic structural diagrams of a system-on-chip provided by this application.
  • Figures 5A to 5D are schematic structural diagrams of a system-on-chip provided by this application.
  • MRAM magnetic random access memory
  • MRAM mainly uses magnetic tunnel junction (MTJ) to display different resistance values under the action of different currents to achieve data storage.
  • MTJ magnetic tunnel junction
  • a magnetic tunnel junction generally refers to a sandwich structure composed of a ferromagnetic layer, a non-magnetic insulating layer, and a ferromagnetic layer.
  • the ferromagnetic layer at the bottom is called the reference layer.
  • the non-magnetic insulating layer is called a barrier.
  • the top ferromagnetic layer is called the free layer.
  • the reference layer has higher magnetism and the free layer has weaker magnetism.
  • STT-MRAM spin-torque-transfer MRAM
  • the magnetic properties of the free layer are weak and are not enough to polarize a spin current that is sufficient to change the magnetization direction of the reference layer.
  • the stronger magnetism of the reference layer will bounce the spin state opposite to its own magnetic moment back to the free layer, forming a spin current opposite to the magnetization direction of the reference layer and injecting it into the free layer. Since the magnetism of the free layer is weak, the magnetization direction of the free layer is deflected by the spin current and tends to be opposite to the magnetization direction of the reference layer. In this case, the resistance of the MTJ is large, which can be regarded as writing data "1".
  • STT-MRAM includes multiple memory cells, each of which mainly includes a metal oxide semiconductor field effect transistor (MOSFET) and MTJ.
  • MOSFET metal oxide semiconductor field effect transistor
  • the MOSFET is an N-type field effect transistor as an example.
  • the gate of the N-type field effect transistor is connected to the word line (WL).
  • WL is used to control the conduction or disconnection of the N-type field effect transistor, that is, WL controls the working status of the N-type field effect transistor.
  • the source (or the drain) of the N-type field effect transistor is connected to the reference layer of the MTJ through the source line (SL).
  • the free layer of the MTJ is connected to the bit line (BL). Applying different voltages between the bit line and the source line generates a current flowing through the magnetic tunnel junction (this current can be called a write current).
  • this write current can change the magnetization direction of the free layer of the magnetic tunnel junction, which changes the resistance of the MTJ and thereby completes the writing of data "0" or "1" in the storage unit.
  • when reading the data in the memory cell, the bit line inputs a read current to the MTJ (the read current is usually smaller than the write current used for data writing). After the read current passes through the MTJ and the N-type field effect transistor, it flows out through the source line. Whether the data in the memory cell is "0" or "1" is determined by detecting the voltage between the current input terminal and the current output terminal (the voltage is related to the resistance of the MTJ).
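  • The read operation described above can be sketched as follows. This is a minimal illustration only: the resistance values, the read current, and the midpoint reference are hypothetical numbers chosen for the example, and the `MTJ` class and `read_bit` function are not part of the application.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; the application gives no numbers.
R_LOW_OHMS = 2_000.0    # free layer parallel to the reference layer -> data "0"
R_HIGH_OHMS = 5_000.0   # free layer opposite to the reference layer -> data "1"
READ_CURRENT_A = 10e-6  # read current, assumed smaller than any write current


@dataclass
class MTJ:
    antiparallel: bool = False  # False: low resistance ("0"), True: high ("1")

    def write(self, bit: int) -> None:
        """Model a write: the write current sets the free-layer direction."""
        self.antiparallel = bool(bit)

    def resistance(self) -> float:
        return R_HIGH_OHMS if self.antiparallel else R_LOW_OHMS


def read_bit(mtj: MTJ, read_current: float = READ_CURRENT_A) -> int:
    """Sense the voltage across the MTJ and compare it with a midpoint reference."""
    voltage = read_current * mtj.resistance()               # V = I * R
    v_ref = read_current * (R_LOW_OHMS + R_HIGH_OHMS) / 2   # assumed reference
    return 1 if voltage > v_ref else 0


cell = MTJ()
cell.write(1)
bit_after_writing_one = read_bit(cell)
cell.write(0)
bit_after_writing_zero = read_bit(cell)
```

  • The sketch only captures the principle that the sensed voltage tracks the MTJ resistance; a real sense amplifier and reference scheme would differ.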
  • MRAM achieves data storage through the same or reverse magnetization direction of the reference layer and the free layer in the MTJ.
  • the speed of data writing in MRAM depends on the speed of change of the magnetization direction of the free layer in the MTJ. That is to say, when a write current is input to the MTJ, if the magnetization direction of the free layer in the MTJ can be deflected in a short time, data can be written faster. If the magnetization direction of the free layer in the MTJ takes a long time to deflect, the data writing speed is relatively slow.
  • some properties of the MTJ in MRAM can be changed. For example, the degree of oxidation at the upper and lower interfaces of the free layer can be reduced. As another example, the thickness of the free layer can be reduced.
  • the data retention capacity of MRAM refers to the ability of data to remain unchanged after data is written into the memory cell.
  • the data retention capacity of MRAM is related to the ease with which the magnetization direction of the free layer in the MTJ is deflected. In other words, after data is written to the MTJ, if the magnetization direction of the free layer in the MTJ does not easily deflect for a long time, the resistance of the MTJ does not change easily, the written data is not changed, and the MRAM has strong data retention. If the magnetization direction of the free layer in the MTJ is affected by the environment and deflects again, the resistance of the MTJ changes easily, the written data is lost, and the data retention of the MRAM is poor.
  • some properties of MTJ in MRAM can also be changed.
  • the degree of oxidation at the upper and lower interfaces of the free layer can be increased.
  • the thickness of the free layer can be increased.
  • a metal-based film or metal oxide film is inserted in the middle of the free layer.
  • the embodiments of this application involve two MRAMs with different performances, one is an MRAM with relatively high data retention capacity, and the other is an MRAM with a relatively high data writing speed.
  • the comparison range of data retention and data writing speed here is limited to two types of MRAM.
  • the MRAM with relatively higher data retention capacity among the two MRAMs is called the high retention capacity MRAM 2311.
  • a MRAM with relatively high data writing speed is called high-speed MRAM2312.
  • the high-speed MRAM2312 shows that the data retention of the MRAM is relatively weak.
  • the high-speed MRAM2312 can be configured with an additional circuit. This circuit can input a refresh current to the high-speed MRAM2312 to ensure that the magnetization direction of the free layer of the MTJ in the high-speed MRAM2312 remains unchanged after data is written.
  • a retention threshold can be set.
  • the retention threshold can be a time value. If the time for the data to change after the MRAM writes data is greater than the retention threshold, then the MRAM can be considered to be a high retention MRAM 2311. Otherwise, the MRAM is considered to have low data retention and is a low-retention MRAM. Therefore, in the embodiment of the present application, the high retention MRAM 2311 is the MRAM that takes the longest time for data to change after writing data.
  • a speed threshold can also be set.
  • the speed threshold can also be a speed value. If the speed of writing data in the MRAM is greater than the speed threshold, then the MRAM can be considered to be a high-speed MRAM 2312. Otherwise, the MRAM is considered to have a low data writing speed and is a low-speed MRAM. Therefore, in the embodiment of the present application, the high-speed MRAM 2312 is the MRAM with the highest data writing speed among the two MRAMs.
  • the above-mentioned method of evaluating the MRAM data writing speed is only an example. This application does not limit the specific evaluation criteria of the MRAM data writing speed and the setting method of the speed threshold.
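  • The two evaluation approaches described above (an absolute threshold, or a relative comparison between the two MRAMs) can be sketched as follows. The `MramProfile` type, the function names, and all numeric values are hypothetical and only serve the illustration; the application does not fix any evaluation criteria.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MramProfile:
    name: str
    retention_time_s: float  # time before written data may change
    write_speed: float       # data writing speed, arbitrary units


def is_high_retention(m: MramProfile, retention_threshold_s: float) -> bool:
    """Threshold rule: retention time above the threshold means high retention."""
    return m.retention_time_s > retention_threshold_s


def is_high_speed(m: MramProfile, speed_threshold: float) -> bool:
    """Threshold rule: write speed above the threshold means high speed."""
    return m.write_speed > speed_threshold


def classify_pair(a: MramProfile, b: MramProfile) -> Tuple[MramProfile, MramProfile]:
    """Relative rule: of two MRAMs, return (high-retention one, high-speed one)."""
    high_retention = max((a, b), key=lambda m: m.retention_time_s)
    high_speed = max((a, b), key=lambda m: m.write_speed)
    return high_retention, high_speed


# Hypothetical example devices.
mram_a = MramProfile("A", retention_time_s=3.0e8, write_speed=1.0)
mram_b = MramProfile("B", retention_time_s=1.0e3, write_speed=4.0)
high_retention, high_speed = classify_pair(mram_a, mram_b)
```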
  • the SoC 10 includes a processing module 100 and a cache 200 (cache).
  • the processing module 100 is used for data processing.
  • the processing module 100 can read and write data to the cache 200 . That is to say, the processing module 100 can write data in the cache 200 or read data from the cache 200 .
  • the data that the processing module 100 needs to process can be stored in the cache 200, and the processing module 100 can read the data from the cache 200 when it needs to process the data.
  • the processing module 100 can write the processed data into the cache 200 .
  • the processing module 100 can be the core of a processor, that is, the processing module 100 can be the part responsible for data processing in the system-on-chip.
  • the embodiments of this application do not limit the type of processor.
  • the processor can be a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a data processing unit (DPU), a neural network processing unit (NPU), etc.
  • the processing module 100 can also be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • CPLD complex programmable logic device
  • FPGA field-programmable gate array
  • GAL generic array logic
  • the cache 200 is the data storage module closest to the processing module 100 on the SoC 10, and the processing module 100 will directly read and write data to the cache 200.
  • the cache 200 includes N levels of cache, where N is a positive integer greater than 2.
  • when the processing module 100 reads data from the cache 200, it reads from the caches in ascending order of level until the data is hit. That is to say, the processing module 100 first reads data from the first-level cache 210 (the first-level cache in this document can be denoted L1, that is, the first-level cache 210 can be identified as L1-210). If the data is hit in the first-level cache 210, the read operation ends; if the data is not hit in the first-level cache 210, the processing module 100 continues to read the data from the second-level cache 220 (the second-level cache in this document can be denoted L2, that is, the second-level cache 220 can be identified as L2-220). If the data is hit in the second-level cache 220, the read operation ends; if the data is not hit in the second-level cache 220, the processing module 100 continues to read the data from the next-level cache. If the data is still not hit in the last-level cache, the processing module 100 reads the data from a memory outside the cache 200.
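  • The read order described above can be sketched as follows. This is a simplified model, assuming each cache level is a plain address-to-data mapping; the `read` function and the example addresses are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

CacheLevel = Dict[int, bytes]  # hypothetical model: address -> cached data


def read(levels: List[CacheLevel], memory: Dict[int, bytes],
         addr: int) -> Tuple[Optional[bytes], str]:
    """Probe L1, L2, ... in ascending order; fall back to memory on a full miss."""
    for index, level in enumerate(levels, start=1):
        if addr in level:
            return level[addr], f"L{index}"  # hit: the read ends here
    return memory.get(addr), "memory"        # missed in every level, LLC included


l1 = {0x10: b"a"}
l2 = {0x20: b"b"}
llc = {0x30: b"c"}
memory = {0x40: b"d"}
hit_in_l2 = read([l1, l2, llc], memory, 0x20)
miss_everywhere = read([l1, l2, llc], memory, 0x40)
```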
  • when the processing module 100 writes data into the cache 200, it writes the data into the first-level cache 210.
  • a cache at a given level migrates its data to the cache at the next level so that data can continue to be written to the cache at that level.
  • the migrated data can be the earliest-written data in that cache, or data in that cache whose read and write frequency is lower than a threshold.
  • when there is no free space in the first-level cache or data cannot be written to it, the data in the first-level cache is migrated to the second-level cache to ensure that the first-level cache can continue to accept writes.
  • when there is no free space in the second-level cache or data cannot continue to be written to it, the data in the second-level cache is migrated to the next-level cache to ensure that the second-level cache can continue to accept writes.
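  • The write path and level-to-level migration described above can be sketched as follows. This is a minimal model under stated assumptions: each level has a fixed capacity counted in entries, the earliest-written entry is the one migrated, and data pushed out of the last level is written to memory (the text above does not say what happens at that point). `CacheLevel` and `write` are hypothetical names.

```python
from collections import OrderedDict
from typing import Dict, List


class CacheLevel:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Insertion order doubles as write order, oldest entry first.
        self.lines: "OrderedDict[int, bytes]" = OrderedDict()

    def is_full(self) -> bool:
        return len(self.lines) >= self.capacity


def write(levels: List[CacheLevel], memory: Dict[int, bytes],
          addr: int, data: bytes, depth: int = 0) -> None:
    """Write into levels[depth]; if it is full, migrate its oldest entry down."""
    level = levels[depth]
    if addr not in level.lines and level.is_full():
        old_addr, old_data = level.lines.popitem(last=False)  # earliest-written
        if depth + 1 < len(levels):
            write(levels, memory, old_addr, old_data, depth + 1)
        else:
            memory[old_addr] = old_data  # assumption: last level spills to memory
    level.lines[addr] = data
```

  • Processor writes always enter at `depth=0` (the first-level cache); the recursion only models the cascade of migrations toward the last-level cache.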
  • the embodiment of the present application does not limit the number of processing modules 100.
  • the system-on-chip 10 may include one processing module 100 or may include multiple processing modules 100.
  • the multiple processing modules 100 can share the N levels of cache. That is to say, the multiple processing modules 100 can all read data from and write data to the N levels of cache.
  • the multiple processing modules 100 may also share only some levels of cache.
  • the plurality of processing modules 100 may share only the last one or more levels of cache among the N levels of cache. That is to say, each processing module 100 has a partial-level cache that is used independently, and the remaining partial-level caches can be jointly used by the multiple processing modules 100 .
  • each processing module 100 is configured with an independently used first-level cache 210 and a second-level cache 220. Multiple processing modules 100 share the last level cache 230 .
  • the last-level cache (LLC) 230 in the cache 200 includes MRAM 231.
  • MRAM231 is a non-volatile memory.
  • SRAM static random access memory
  • MRAM231 can support large-capacity data storage.
  • MRAM231 has a higher degree of integration. That is to say, for the same area of SRAM and MRAM231, MRAM231 has a larger storage space, which can increase the storage space of cache 200.
  • This application provides the following two types of LLC230 including MRAM231.
  • the first type, LLC230 includes SRAM232 and high retention MRAM2311.
  • the second type, LLC 230 includes high-speed MRAM 2312 and high-retention MRAM 2311.
  • the structure of the LLC 230 including MRAM is also suitable for a system-on-chip 10 having a different number of processing modules 100 or a different number of cache levels.
  • the first type, LLC230 includes SRAM232 and high retention MRAM2311.
  • the system-on-chip 10 includes two processing modules 100 and a cache 200 .
  • Each processing module 100 independently uses the first-level cache 210 and the second-level cache 220 .
  • LLC230 includes two different types of memory, namely SRAM232 and high retention MRAM2311.
  • the SRAM 232 is used for data writing of the LLC 230, and the high-retention MRAM 2311 is used for data reading of the LLC 230. That is to say, when data is written to the LLC 230, it is first written to the SRAM 232. When there is no free space in the SRAM 232 or data cannot continue to be written to it, the high-retention MRAM 2311 can migrate the data in the SRAM 232 into itself, so that the SRAM 232 can continue to accept writes. In this way, when the processing module 100 needs to read data from the LLC 230, the data can be read from the high-retention MRAM 2311.
  • the embodiment of the present application does not limit the ratio of the area of the SRAM 232 in the LLC 230 (that is, the area occupied by the SRAM 232 in the SoC 10) and the area occupied by the high retention MRAM 2311 (that is, the area occupied by the high retention MRAM 2311 in the SoC 10).
  • the area of high retention force MRAM2311 can be larger than the area of SRAM232.
  • since the high-retention MRAM 2311 has high data retention, the data stored in the high-retention MRAM 2311 is not easily changed, and the bit error rate of the data can be reduced.
  • when the MTJ in the high-retention MRAM 2311 stores data 0 or data 1, the difference in resistance is pronounced. Only a low read current is needed to determine the data written to the MTJ in the high-retention MRAM 2311, which can reduce data read latency.
  • the components of the system-on-chip 10 are located in the same plane, which is a 2D packaging method.
  • the overall area of the system-on-chip 10 is related to the area of each component in the system-on-chip 10 .
  • the 2D packaging method will restrict the overall area of the system-on-chip 10, so that the system-on-chip 10 cannot include more processing modules 100, and the size of the cache 200 is also restricted.
  • a 3D packaging method is proposed. 3D packaging means that the components on the system-on-chip 10 are no longer all located in the same plane. Instead, the components on the system-on-chip 10 are packaged together in a three-dimensional manner. These components may be located in different planes in the system-on-chip 10 .
  • an on-chip system 10 is provided in an embodiment of the present application.
  • the on-chip system 10 includes two processing modules 100 and a cache 200 .
  • Each processing module 100 independently uses the first-level cache 210 and the second-level cache 220 .
  • LLC230 includes two different types of memory, namely SRAM232 and high retention MRAM2311.
  • the LLC 230 is located on the two processing modules 100, the first-level cache 210, and the second-level cache 220. That is, LLC 230 may be located in a different plane than processing module 100 .
  • the SRAM 232 and the high retention force MRAM 2311 included in the LLC 230 may be grown on the same substrate (such as a silicon wafer). That is, the SRAM232 and the high retention force MRAM2311 are respectively grown on the same substrate.
  • the two processing modules 100, the first-level cache 210, and the second-level cache 220 may be located on another substrate. As shown in FIG. 4B , the substrate on which the SRAM 232 and the high-retention MRAM 2311 are grown is located on another substrate on which the processing module 100 , the first-level cache 210 , and the second-level cache 220 are deployed, forming a 3D packaged system-on-chip 10 .
  • SRAM232 and high retention MRAM2311 can be grown on different substrates respectively. For details, see Figure 4C .
  • an on-chip system 10 is provided in an embodiment of the present application.
  • the on-chip system 10 includes two processing modules 100 and a cache 200 .
  • Each processing module 100 independently uses the first-level cache 210 and the second-level cache 220 .
  • LLC230 includes two different types of memory, namely SRAM232 and high retention MRAM2311.
  • the LLC 230 is located on the two processing modules 100, the first-level cache 210, and the second-level cache 220. That is, LLC 230 may be located on a different plane than processing module 100 .
  • LLC230 includes SRAM232 and high retention MRAM2311 grown on different substrates, that is, there are two substrates, one substrate is SRAM232, and the other substrate is high retention MRAM2311. Compared with growing SRAM232 and high-retention MRAM2311 on the same substrate, growing SRAM232 and high-retention MRAM2311 on different substrates can improve the manufacturing process yield and reduce costs.
  • the two substrates can be spliced together, and the spliced substrate can be located on the substrate on which the processing module 100, the first-level cache 210, and the second-level cache 220 are deployed, forming a 3D packaged system-on-chip 10.
  • growing the LLC230 on two substrates is used here as an example. In practical applications, in order to further reduce the process difficulty, the LLC230 can be grown on more than two substrates, and these substrates can be spliced together to form the LLC230.
  • the LLC 230 is located on one layer, and the processing module 100 and the other levels of cache are located on the other layer.
  • the system-on-chip 10 adopts a two-layer stacking method. In some scenarios, the system-on-chip 10 may also be stacked in more than two layers. The following description takes the system-on-chip 10 shown in FIG. 4D as an example.
  • a system-on-chip 10 is provided in an embodiment of the present application.
  • the system-on-chip 10 includes two processing modules 100 and a cache 200 .
  • Each processing module 100 independently uses the first-level cache 210 and the second-level cache 220 .
  • LLC230 includes two different types of memory, namely SRAM232 and high retention MRAM2311.
  • the SRAM 232 and the high retention force MRAM 2311 are located on different planes.
  • the SRAM 232 and the processing module 100 are located on different planes.
  • the high retention force MRAM 2311 and the processing module 100 are located on different planes.
  • the SRAM 232 and the high-retention MRAM 2311 included in the LLC 230 are grown on different substrates; that is, there are two substrates, one on which the SRAM 232 is grown and the other on which the high-retention MRAM 2311 is grown.
  • these two substrates are located on the substrate on which the processing module 100, the first-level cache 210, and the second-level cache 220 are deployed.
  • the two substrates do not need to be spliced together; instead, they are stacked and lie on two different planes.
  • the substrate on which the SRAM 232 is grown can be located on a layer closer to the processing module 100
  • the substrate on which the high retention force MRAM 2311 is grown can be located on a layer further away from the processing module 100 .
  • LLC230 is grown on two substrates as an example here.
  • LLC230 can be grown on more than two substrates, and these substrates can be placed on the processing module 100 in a layer-by-layer stacked manner.
  • the substrate can also be placed on the processing module 100 in a partially spliced and partially stacked manner. That is to say, some of these substrates can be spliced together and located on the same plane, while the remaining unspliced substrates are stacked and located on different planes.
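  • The three 3D arrangements of the first type of LLC 230 (one shared substrate, spliced substrates, stacked substrates) can be compared with a small data model. This is an illustrative sketch only: the nested-list representation, the `plane_of` helper, and the component labels are assumptions made for the example.

```python
from typing import List, Optional, Set

Substrate = Set[str]     # components grown or deployed on one substrate
Layer = List[Substrate]  # substrates spliced side by side share one plane
Layout = List[Layer]     # index 0 is the plane holding the processing modules


def plane_of(layout: Layout, component: str) -> Optional[int]:
    """Return the index of the plane that contains the component, if any."""
    for index, layer in enumerate(layout):
        if any(component in substrate for substrate in layer):
            return index
    return None


BASE: Substrate = {"processing module 100", "L1 cache 210", "L2 cache 220"}

# SRAM and MRAM grown on the same substrate, above the processing plane.
same_substrate: Layout = [[BASE], [{"SRAM 232", "MRAM 2311"}]]
# SRAM and MRAM grown on separate substrates that are spliced into one plane.
spliced: Layout = [[BASE], [{"SRAM 232"}, {"MRAM 2311"}]]
# SRAM and MRAM grown on separate substrates that are stacked, SRAM nearer.
stacked: Layout = [[BASE], [{"SRAM 232"}], [{"MRAM 2311"}]]
```

  • In the stacked layout the SRAM 232 sits on a lower plane index than the MRAM 2311, matching the placement in which the write-side memory is closer to the processing module 100.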
  • the second type, LLC 230 includes high-speed MRAM 2312 and high-retention MRAM 2311.
  • the system-on-chip 10 includes two processing modules 100 and a cache 200 .
  • Each processing module 100 independently uses the first-level cache 210 and the second-level cache 220 .
  • LLC230 includes two different types of memory, namely high-speed MRAM2312 and high-retention MRAM2311.
  • the high-speed MRAM 2312 is used for data writing of the LLC 230, and the high-retention MRAM 2311 is used for data reading of the LLC 230. That is to say, when data is written to the LLC 230, it is first written to the high-speed MRAM 2312.
  • when there is no free space in the high-speed MRAM 2312 or data cannot continue to be written to it, the high-retention MRAM 2311 can migrate the data in the high-speed MRAM 2312 into itself, so that the high-speed MRAM 2312 can continue to accept writes. In this way, when the processing module 100 needs to read data from the LLC 230, the data can be read from the high-retention MRAM 2311.
  • the high-speed MRAM 2312 has the same functions as the SRAM 232 in FIG. 4A . Since MRAM itself can support large-capacity data storage, high-speed MRAM2312 can provide larger storage space, further increasing the capacity of LLC230.
  • the embodiment of the present application does not limit the ratio of the area of the high-speed MRAM 2312 in the LLC 230 (that is, the area occupied by the high-speed MRAM 2312 in the SoC 10) to the area occupied by the high-retention MRAM 2311.
  • the area of the high-retention MRAM 2311 may be larger than the area of the high-speed MRAM 2312.
  • The high-speed MRAM 2312 gains write speed by giving up some data retention.
  • The write current for the high-speed MRAM 2312 is smaller than the write current for the high-retention MRAM 2311.
  • As a result, the barrier layer of each MTJ in the high-speed MRAM 2312 is under less stress and is less likely to break down.
  • The lifetime of the high-speed MRAM 2312 is therefore relatively long.
  • Because the high-speed MRAM 2312 has weak data retention, a refresh circuit can be added to supply it with a refresh current and keep the written data from changing. This adds some power consumption, but less than the SRAM 232 would consume.
  • the components on the system-on-chip 10 can be located in the same plane, which is a 2D packaging method.
  • the overall area of the system-on-chip 10 is limited by the area of each component in the system-on-chip 10 . If more processing modules 100 need to be deployed in the system-on-chip 10, or a larger cache 200 needs to be deployed on the system-on-chip 10, the overall area of the system-on-chip 10 will increase. In order to further reduce the overall area of the system-on-chip 10 while ensuring the number of processing modules 100 or the size of the cache 200 of the system-on-chip, a 3D packaging method is proposed.
  • 3D packaging means that the components in the system-on-chip 10 are no longer all located in the same plane. Instead, the components on the system-on-chip 10 are packaged together in a three-dimensional manner. These components may be located in different planes in the system-on-chip 10 .
  • As shown in FIG. 5B, an embodiment of this application provides a system-on-chip 10.
  • the on-chip system 10 includes two processing modules 100 and a cache 200 .
  • Each processing module 100 independently uses the first-level cache 210 and the second-level cache 220 .
  • LLC230 includes two different types of memory, namely high-speed MRAM2312 and high-retention MRAM2311.
  • the LLC 230 is located on the two processing modules 100, the first-level cache 210, and the second-level cache 220. That is, LLC 230 may be located on a different plane than processing module 100 .
  • the two MRAMs included in the LLC 230 may be grown on the same substrate (such as a silicon wafer), that is, the high-speed MRAM 2312 and the high-retention MRAM 2311 are respectively grown on the same substrate.
  • the two processing modules 100, the first-level cache 210, and the second-level cache 220 may be located on another substrate.
  • the substrate on which the high-speed MRAM 2312 and the high-retention MRAM 2311 are grown is located on another substrate on which the processing module 100 , the first-level cache 210 , and the second-level cache 220 are deployed.
  • a 3D packaged system-on-chip 10 is formed.
  • As shown in FIG. 5C, an embodiment of this application provides a system-on-chip 10.
  • the on-chip system 10 includes two processing modules 100 and a cache 200 .
  • Each processing module 100 independently uses the first-level cache 210 and the second-level cache 220 .
  • LLC230 includes two different types of memory, namely high-speed MRAM2312 and high-retention MRAM2311.
  • the LLC 230 is located on the two processing modules 100, the first-level cache 210, and the second-level cache 220. That is, LLC 230 may be located in a different plane than processing module 100 .
  • the two MRAMs included in LLC230 are grown on different substrates, that is, there are two substrates, one for high-speed MRAM 2312 and the other for high-retention MRAM 2311 .
  • Compared with growing both MRAMs on one substrate, growing the high-speed MRAM 2312 and the high-retention MRAM 2311 on different substrates can improve process yield and reduce the cost of fabricating the system-on-chip.
  • The two substrates can be spliced together, and the spliced substrates can be located on the substrate where the processing module 100, the first-level cache 210, and the second-level cache 220 are deployed, forming a 3D-packaged system-on-chip 10.
  • Here, the LLC 230 grown on two substrates is used as an example. In practice, to improve overall process yield, the LLC 230 can be grown on more than two substrates, which are spliced together to form the LLC 230.
  • the LLC 230 is located on one layer (that is, the same plane), and the processing module 100 and other levels of cache 200 are located on another layer (that is, another plane).
  • The system-on-chip 10 thus uses a two-layer stack. In some scenarios, the system-on-chip 10 may also use a stack of more than two layers.
  • As shown in FIG. 5D, an embodiment of this application provides a system-on-chip 10.
  • the system-on-chip 10 includes two processing modules 100 and a cache 200 .
  • Each processing module 100 independently uses the first-level cache 210 and the second-level cache 220 .
  • LLC230 includes two different types of memory, namely high-speed MRAM2312 and high-retention MRAM2311.
  • the high-speed MRAM 2312 and the high-retention force MRAM 2311 are located on different planes
  • the high-speed MRAM 2312 and the processing module 100 are located on different planes
  • the high-retention force MRAM 2311 and the processing module 100 are located on different planes.
  • the two MRAMs included in LLC 230 are grown on different substrates, that is, there are two substrates, one for high-speed MRAM 2312 and the other for high-retention MRAM 2311 .
  • Both substrates are located on the substrate where the processing module 100, the first-level cache 210, and the second-level cache 220 are deployed. The two substrates are not spliced together; instead, they are stacked in two different planes, one above the other.
  • To further reduce write latency, the substrate on which the high-speed MRAM 2312 is grown can be located on the layer closer to the processing module 100.
  • The substrate on which the high-retention MRAM 2311 is grown can be located on the layer farther from the processing module 100.
  • Here, the LLC 230 grown on two substrates is used as an example. In practice, to improve overall process yield, the LLC 230 can be grown on more than two substrates, which can be stacked layer by layer on the processing module 100. The substrates can also be placed on the processing module 100 partly spliced and partly stacked. That is, some of the substrates are spliced together on the same layer, while the remaining, unspliced substrates are stacked on different layers.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Hall/Mr Elements (AREA)

Abstract

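A system-on-chip. In this application, the system-on-chip includes a processing module and a cache. The processing module is a core of the system-on-chip; it can process data and read data from and write data to the cache. The cache is the module in the system-on-chip that is close to the processing module and stores data. It includes multiple levels of cache, and the last-level cache (LLC) among them includes a first MRAM. MRAM is a non-volatile memory that keeps its data on power loss, so using MRAM as part of the cache reduces cache power consumption. MRAM also offers high data read/write capability and supports large-capacity data storage, which improves the cache's read/write speed and increases its capacity.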

Description

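A system on chip
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202211021940.X, filed with the China Patent Office on August 24, 2022 and entitled "一种片上系统" (A system on chip), which is incorporated herein by reference in its entirety.
Technical field
This application relates to the field of communication technology, and in particular to a system on chip.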
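Background
A system on chip (SoC) that integrates a central processing unit (CPU) includes the CPU cores and the caches of those cores. The cache includes a first-level cache and a second-level cache used independently by each core, and a third-level cache shared by all cores.
The first-, second-, and third-level caches all use static random access memory (SRAM). SRAM is volatile: its data is lost when the system-on-chip loses power. To keep the data in the SRAM from being lost, an additional circuit is needed to supply current to the SRAM continuously, which gives the SRAM relatively high power consumption.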
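Summary
This application provides a system-on-chip to reduce the power consumption of the cache.
In a first aspect, an embodiment of this application provides a system-on-chip that includes a processing module and a cache. The processing module is a core of the system-on-chip. This application does not limit the number of processing modules: the system-on-chip may include one processing module or several.
As the core of the system-on-chip, the processing module can process data and read from and write to the cache.
The cache is the module in the system-on-chip that is close to the processing module and stores data. It includes multiple levels of cache, and the last-level cache (LLC) among them includes a first MRAM.
In this system-on-chip, MRAM is a non-volatile memory that retains its data on power loss, so using MRAM as part of the cache reduces the power consumption of the cache. In addition, MRAM has high data read/write capability and supports large-capacity data storage, which improves the read/write speed of the cache and increases its capacity.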
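In a possible implementation, the last-level cache further includes an SRAM; that is, the LLC includes both the SRAM and the first MRAM. The SRAM supports data writes, and the first MRAM supports data reads. In other words, the processing module writes data to the SRAM and reads data from the first MRAM. For example, when the processing module writes data to the last-level cache, it writes the data to the SRAM. Data in the SRAM can be migrated to the first MRAM; that is, the first MRAM stores data migrated out of the SRAM.
In this system-on-chip, the SRAM supports high-speed writes, and the processing module writes data to the SRAM first, which preserves write speed. The MRAM has good data retention, so its data does not change easily, and the processing module can read data accurately from the first MRAM.
In a possible implementation, the last-level cache further includes a second MRAM; that is, the LLC includes two kinds of MRAM, the first MRAM and the second MRAM. The data retention of the first MRAM is higher than that of the second MRAM, and the write speed of the first MRAM is lower than that of the second MRAM. The second MRAM supports data writes, and the first MRAM supports data reads. In other words, the processing module writes data to the second MRAM and reads data from the first MRAM. For example, when the processing module writes data to the last-level cache, it writes the data to the second MRAM. Data in the second MRAM can be migrated to the first MRAM; that is, the first MRAM stores data migrated out of the second MRAM.
In this system-on-chip, the first MRAM has high data retention, so data stored in it does not change easily, which lowers the bit error rate. The second MRAM has a high write speed and supports high-speed writes; the processing module writes data to the second MRAM first, which preserves write speed.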
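In a possible implementation, the system-on-chip uses 2D packaging; that is, the cache and the processing module are located in the same plane.
This gives the system-on-chip a simple structure and simplifies its fabrication.
In a possible implementation, the system-on-chip uses 3D packaging. Of its components (such as the processing module and the multiple levels of cache), some may be located in the same plane and others in different planes. For example, the first MRAM and the second MRAM are located in the same plane, and the first MRAM and the processing module are located in different planes.
With this arrangement, the first MRAM and the second MRAM are in a different plane from the processing module, which leaves more area for the processing module so that the system-on-chip can include more processing modules. It also lets the first MRAM and the second MRAM occupy more area, further increasing the capacity of the LLC.
In a possible implementation, when the first MRAM and the second MRAM are located in the same plane, they may be grown on the same substrate, which simplifies the structure of the LLC. They may also be grown on different substrates: the substrate carrying the first MRAM is spliced to the substrate carrying the second MRAM so that, after splicing, the two substrates lie in the same plane. This makes the LLC easier to fabricate and improves process yield.
In a possible implementation, the system-on-chip uses 3D packaging. Some of its components may be located in the same plane and others in different planes. For example, the first MRAM, the second MRAM, and the processing module are each located in a different plane. They may be arranged in a layer-by-layer stack, each on a different layer.
With this arrangement, the first MRAM, the second MRAM, and the processing module are in different planes, which leaves more area for the processing module so that the system-on-chip can include more processing modules. It also lets the first MRAM and the second MRAM occupy more area, further increasing the capacity of the LLC.
In a possible implementation, when the first MRAM, the second MRAM, and the processing module are located in different planes, the second MRAM may be placed in the plane close to the processing module and the first MRAM in the plane far from the processing module.
With this arrangement, the second MRAM is closer to the processing module, so the processing module can write data to the second MRAM quickly, which improves the data processing efficiency of the system-on-chip.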
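In a possible implementation, when the LLC includes the first MRAM and the SRAM, the system-on-chip may likewise use 3D packaging. For example, the first MRAM and the SRAM are located in the same plane, and the first MRAM and the processing module are located in different planes.
With this packaging, the processing module has more area available, so more processing modules can be deployed. It likewise leaves more area for the SRAM and the first MRAM, effectively increasing the capacity of the LLC.
In a possible implementation, when the first MRAM and the SRAM are located in the same plane, they may be grown on the same substrate, which simplifies the structure of the LLC. They may also be grown on different substrates that lie in the same plane after splicing, which makes the LLC easier to fabricate and improves process yield.
In a possible implementation, when the LLC includes the first MRAM and the SRAM, the system-on-chip may likewise use 3D packaging. For example, the first MRAM, the SRAM, and the processing module are each located in a different plane.
This provides more area for the first MRAM, the SRAM, and the processing module, allowing more processing modules and a larger first MRAM and SRAM to be deployed.
In a possible implementation, when the first MRAM, the SRAM, and the processing module are each located in a different plane, the SRAM is located in the plane close to the processing module so that the processing module can write data to the SRAM quickly, and the first MRAM is located in the plane far from the processing module.
In a second aspect, an embodiment of this application further provides a system-on-chip that includes the system-on-chip of the first aspect or of any possible implementation of the first aspect, and further includes a memory. The memory can serve as the main memory of the system-on-chip and provide it with storage space.
In a third aspect, an embodiment of this application further provides a computing device that includes the system-on-chip of the second aspect.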
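Brief description of the drawings
FIG. 1 is a schematic structural diagram of an MTJ provided in this application;
FIG. 2 is a schematic structural diagram of a memory cell provided in this application;
FIG. 3 is a schematic structural diagram of an SoC provided in this application;
FIGS. 4A to 4D are schematic structural diagrams of a system-on-chip provided in this application;
FIGS. 5A to 5D are schematic structural diagrams of a system-on-chip provided in this application.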
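Detailed description
Before the system-on-chip provided in this application is introduced, a memory used in the embodiments is described first.
This application introduces a non-volatile memory, magnetoresistive random access memory (MRAM). MRAM offers high-speed data reads and writes and also supports large-capacity data storage.
MRAM stores data mainly by relying on a magnetic tunnel junction (MTJ), which exhibits different resistance values under different currents.
As shown in FIG. 1, a magnetic tunnel junction generally refers to a sandwich structure consisting of a ferromagnetic layer, a non-magnetic insulating layer, and a ferromagnetic layer. The bottom ferromagnetic layer is called the reference layer. The non-magnetic insulating layer is called the barrier layer. The top ferromagnetic layer is called the free layer. The reference layer is usually strongly magnetic, and the free layer weakly magnetic.
The working principle of MRAM is explained here using spin-transfer torque MRAM (STT-MRAM), a common type of MRAM.
In STT, when current flows through the MTJ from the free layer to the reference layer (electrons move opposite to the current direction), the electrons pass through the reference layer first. Because the reference layer is strongly magnetic, the electrons are polarized by it, forming a spin current with the same orientation as the reference layer's magnetization, and this spin current is injected into the free layer. Because the free layer is weakly magnetic, its magnetization rotates under the spin current and tends to align with the reference layer's magnetization. In this case the MTJ resistance is low, which can be treated as writing data "0".
When current flows through the MTJ from the reference layer to the free layer, the electrons pass through the free layer first. The free layer is too weakly magnetic to polarize a spin current strong enough to change the reference layer's magnetization. When the electrons reach the surface of the reference layer, the strongly magnetic reference layer reflects electrons whose spin is opposite to its own magnetic moment back into the free layer, forming a spin current opposite to the reference layer's magnetization that is injected into the free layer. Because the free layer is weakly magnetic, its magnetization rotates under the spin current and tends to become antiparallel to the reference layer's magnetization. In this case the MTJ resistance is high, which can be treated as writing data "1".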
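The two write cases above amount to a mapping from current direction to free-layer alignment, MTJ resistance, and stored bit. The following is a minimal sketch of that mapping; the resistance values and all names are hypothetical and chosen only for illustration, not taken from the application.

```python
from enum import Enum


class CurrentDirection(Enum):
    """Direction of the write current through the MTJ."""
    FREE_TO_REFERENCE = "free -> reference"
    REFERENCE_TO_FREE = "reference -> free"


# Hypothetical resistances in ohms; real values depend on the device.
R_PARALLEL = 2_000.0
R_ANTIPARALLEL = 5_000.0


def write_mtj(direction):
    """Return (alignment, resistance, bit) after a write current in the given direction."""
    if direction is CurrentDirection.FREE_TO_REFERENCE:
        # Free layer aligns with the reference layer: low resistance, data "0".
        return ("parallel", R_PARALLEL, 0)
    # Free layer ends up opposite to the reference layer: high resistance, data "1".
    return ("antiparallel", R_ANTIPARALLEL, 1)


for d in CurrentDirection:
    print(d.value, write_mtj(d))
```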
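An STT-MRAM includes multiple memory cells, each of which mainly includes a metal-oxide-semiconductor field-effect transistor (MOSFET) and an MTJ.
The following uses an N-type MOSFET as an example. As shown in FIG. 2, the gate of the N-type FET is connected to a word line (WL). The WL switches the N-type FET on or off; that is, the WL controls the operating state of the N-type FET. The source (or drain) of the N-type FET is connected to the reference layer of the MTJ through a source line (SL). The line connected to the free layer of the MTJ is the bit line (BL). Applying different voltages between the bit line and the source line produces a current through the MTJ (the write current). The write current can change the magnetization direction of the MTJ's free layer and therefore the MTJ's resistance, which writes data "0" or "1" to the memory cell.
To read the data in the memory cell, a read current is applied to the MTJ from the bit line (the read current is usually smaller than the write current used for writing). After passing through the MTJ and the N-type FET, the read current exits at the source line. By measuring the voltage between the current input and the current output (which depends on the MTJ resistance), it is determined whether the cell holds "0" or "1".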
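The read step can be illustrated with Ohm's law: for a fixed read current, the voltage across the cell is higher when the MTJ is in its high-resistance state. The sketch below assumes a simple comparison against a midpoint reference voltage; the resistance and current values, the transistor on-resistance, and the midpoint reference are all illustrative assumptions, not details from the application.

```python
# Hypothetical device values, for illustration only.
R_PARALLEL = 2_000.0       # ohms, low-resistance state (data "0")
R_ANTIPARALLEL = 5_000.0   # ohms, high-resistance state (data "1")
R_TRANSISTOR = 500.0       # ohms, on-resistance of the access transistor
I_READ = 20e-6             # amperes, read current (smaller than a write current)


def sense_bit(r_mtj, i_read=I_READ):
    """Infer the stored bit from the voltage between bit line and source line."""
    v_cell = i_read * (r_mtj + R_TRANSISTOR)
    # Assumed reference: the voltage of a cell halfway between the two states.
    v_ref = i_read * ((R_PARALLEL + R_ANTIPARALLEL) / 2 + R_TRANSISTOR)
    return 1 if v_cell > v_ref else 0


print(sense_bit(R_PARALLEL), sense_bit(R_ANTIPARALLEL))
```

A larger gap between the two resistance states widens the voltage margin around the reference, which is consistent with the later remark that a high-retention MRAM can be read with a low read current.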
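As the description above shows, MRAM stores data through the parallel or antiparallel magnetization of the reference layer and the free layer in the MTJ. The write speed of MRAM depends on how fast the magnetization direction of the free layer changes. That is, when a write current is applied to the MTJ, if the free layer's magnetization flips within a short time, data can be written quickly; if it takes a long time to flip, writes are relatively slow.
To increase write speed, some properties of the MTJ in the MRAM can be changed. For example, the degree of oxidation at the upper and lower interfaces of the free layer can be reduced, or the free layer can be made thinner.
The data retention of MRAM is the ability of data to remain unchanged after it is written to a memory cell. It is related to how easily the magnetization direction of the free layer in the MTJ flips. That is, after data is written to the MTJ, if the free layer's magnetization does not readily flip over a long period, the MTJ's resistance does not readily change and the written data is not altered, so the MRAM has strong data retention. If the free layer's magnetization is easily affected by the environment and flips again, the MTJ's resistance changes easily and the written data is lost, so the MRAM has poor data retention.
To improve the data retention of MRAM, properties of the MTJ can likewise be changed. For example, the degree of oxidation at the upper and lower interfaces of the free layer can be increased, the free layer can be made thicker, or a predominantly metallic thin film or a metal-oxide thin film can be inserted in the middle of the free layer.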
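The embodiments of this application involve two MRAMs with different performance: one with relatively high data retention and one with relatively high write speed.
Note that data retention and write speed are compared here only between these two MRAMs. For ease of description, the one with the relatively higher data retention is called the high-retention MRAM 2311, and the one with the relatively higher write speed is called the high-speed MRAM 2312.
In addition, the high-speed MRAM 2312 has relatively weak data retention. To avoid losing data written to it, an additional circuit can be configured for the high-speed MRAM 2312. The circuit can supply a refresh current to the high-speed MRAM 2312 so that the magnetization direction of the free layer of each MTJ stays unchanged after data is written.
Furthermore, to evaluate MRAM data retention more precisely, a retention threshold can be set. For example, the retention threshold can be a time value: if the time it takes for data to change after being written to an MRAM is greater than the threshold, the MRAM can be regarded as a high-retention MRAM 2311; otherwise, its data retention is considered low and it is a low-retention MRAM. Accordingly, in the embodiments of this application, the high-retention MRAM 2311 is the one of the two MRAMs whose data takes longest to change after being written. This way of evaluating MRAM data retention is only an example; this application does not limit the specific evaluation criterion or how the retention threshold is set.
Similarly, to evaluate MRAM write speed more precisely, a speed threshold can be set. For example, the speed threshold can be a speed value: if the write speed of an MRAM is greater than the threshold, the MRAM can be regarded as a high-speed MRAM 2312; otherwise, its write speed is considered low and it is a low-speed MRAM. Accordingly, in the embodiments of this application, the high-speed MRAM 2312 is the one of the two MRAMs with the highest write speed. This way of evaluating MRAM write speed is only an example; this application does not limit the specific evaluation criterion or how the speed threshold is set.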
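The threshold-based classification described above can be sketched as follows. The application leaves the evaluation criteria and threshold values open, so the units, the numbers, and the names `MramProfile` and `classify` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MramProfile:
    name: str
    retention_s: float       # time until written data changes, in seconds (assumed unit)
    writes_per_s: float      # write speed, in writes per second (assumed unit)


def classify(profile, retention_threshold_s, speed_threshold):
    """Label an MRAM against a retention threshold and a speed threshold."""
    retention = "high-retention" if profile.retention_s > retention_threshold_s else "low-retention"
    speed = "high-speed" if profile.writes_per_s > speed_threshold else "low-speed"
    return (retention, speed)


# Hypothetical figures for the two MRAMs in the LLC.
mram_2311 = MramProfile("MRAM 2311", retention_s=3.0e8, writes_per_s=5.0e7)
mram_2312 = MramProfile("MRAM 2312", retention_s=1.0e-2, writes_per_s=2.0e8)

for m in (mram_2311, mram_2312):
    print(m.name, classify(m, retention_threshold_s=1.0, speed_threshold=1.0e8))
```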
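The system-on-chip provided in the embodiments of this application is described below. FIG. 3 is a schematic structural diagram of an SoC 10 provided in an embodiment of this application. The SoC 10 includes a processing module 100 and a cache 200. The processing module 100 processes data and can read from and write to the cache 200. That is, the processing module 100 can write data to the cache 200 or read data from it. For example, data that the processing module 100 needs to process can be stored in the cache 200, and the processing module 100 reads it from the cache 200 when needed. As another example, the processing module 100 can write processed data to the cache 200.
The embodiments do not limit the specific form of the processing module 100. For example, the processing module 100 may be a processor core; that is, it may be the part of the system-on-chip responsible for data processing. The embodiments do not limit the processor type either: the processor may be a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a data processing unit (DPU), a neural network processing unit (NPU), or the like.
The processing module 100 may also be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The cache 200 is the data storage module closest to the processing module 100 on the SoC 10, and the processing module 100 reads from and writes to it directly.
The cache 200 includes N levels of cache, where N is a positive integer greater than 2. When reading data from the cache 200, the processing module 100 reads from the levels in ascending order until the data is hit. That is, the processing module 100 first reads from the first-level cache 210 (the first-level cache may be denoted L1, so the first-level cache 210 may be labeled L1-210). If the data is hit in the first-level cache 210, the read ends; if not, the processing module 100 goes on to read from the second-level cache 220 (the second-level cache may be denoted L2, so the second-level cache 220 may be labeled L2-220). If the data is hit in the second-level cache 220, the read ends; if not, the processing module 100 goes on to read from the next level. Only if the data is still not hit in the last-level cache does the processing module 100 read it from a memory outside the cache 200.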
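The level-by-level read order can be sketched as a simple loop. This is an illustrative model only: each cache level is represented as a plain dictionary, and the function name and the returned hit label are hypothetical.

```python
def read(levels, backing_memory, address):
    """Look up an address in L1, L2, ..., LLC in order, then in memory outside the cache.

    levels: list of dicts ordered from the first-level cache to the last-level cache.
    Returns (value, where_it_was_found).
    """
    for index, level in enumerate(levels, start=1):
        if address in level:
            # Hit: stop searching lower levels.
            return level[address], f"L{index}"
    # Missed in every level, including the LLC: fall back to memory outside the cache.
    return backing_memory[address], "memory"


l1, l2, llc = {0x10: "a"}, {0x20: "b"}, {0x30: "c"}
memory = {0x10: "a", 0x20: "b", 0x30: "c", 0x40: "d"}

print(read([l1, l2, llc], memory, 0x20))
print(read([l1, l2, llc], memory, 0x40))
```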
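When writing data to the cache 200, the processing module 100 writes the data to the first-level cache 210. Among the N levels, when an upper-level cache has no free space or cannot accept a write, it migrates some of its data to the next-level cache so that it can continue to accept writes. The migrated data may be the data written earliest to the upper-level cache, or data in the upper-level cache whose read/write frequency is below a threshold. For example, when the first-level cache has no free space or cannot accept a write, data in the first-level cache is migrated to the second-level cache so that the first-level cache can continue to accept writes. When the second-level cache has no free space or cannot accept further writes, data in the second-level cache is migrated to the next-level cache so that the second-level cache can continue to accept writes.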
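The write path with migration between levels can be sketched as a chain of fixed-capacity levels. This sketch picks the "earliest written" policy mentioned above (first in, first out); the class name, capacities, and the choice to drop data evicted from the last level are assumptions made to keep the example short.

```python
from collections import OrderedDict


class CacheLevel:
    """One cache level that migrates its oldest entry to the next level when full."""

    def __init__(self, name, capacity, next_level=None):
        self.name = name
        self.capacity = capacity
        self.next_level = next_level
        self.lines = OrderedDict()  # insertion order = write order

    def write(self, address, value):
        if address not in self.lines and len(self.lines) >= self.capacity:
            # No free space: migrate the earliest-written entry one level down.
            old_address, old_value = self.lines.popitem(last=False)
            if self.next_level is not None:
                self.next_level.write(old_address, old_value)
            # Assumption: data evicted from the last level is simply dropped here.
        self.lines[address] = value


l3 = CacheLevel("L3", capacity=4)
l2 = CacheLevel("L2", capacity=2, next_level=l3)
l1 = CacheLevel("L1", capacity=1, next_level=l2)

for address, value in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    l1.write(address, value)  # the processing module always writes to L1

print(list(l1.lines), list(l2.lines), list(l3.lines))
```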
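The embodiments of this application do not limit the number of processing modules 100. The system-on-chip 10 may include one processing module 100 or multiple processing modules 100.
When the system-on-chip 10 includes multiple processing modules 100, they may share all N levels of cache; that is, every processing module 100 can read from and write to the N levels of cache.
The multiple processing modules 100 may also share only some levels of cache. For example, they may share only the last one or more levels of the N levels. That is, each processing module 100 has some levels of cache for its own use, while the remaining levels are shared by the multiple processing modules 100.
Taking N equal to 3 as an example, each processing module 100 has its own first-level cache 210 and second-level cache 220, and the multiple processing modules 100 share the last-level cache 230.
In the embodiments of this application, the last-level cache (LLC) 230 in the cache 200 includes an MRAM 231. Unlike static random access memory (SRAM), the MRAM 231 is non-volatile: when the system-on-chip 10 loses power, the data in the MRAM 231 is not easily lost, and no current needs to be supplied continuously to retain it, which effectively reduces power consumption. In addition, the MRAM 231 supports large-capacity data storage and has a higher integration density than SRAM; that is, for the same area, the MRAM 231 provides more storage space than SRAM, which increases the storage space of the cache 200.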
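This application provides the following two kinds of LLC 230 that include the MRAM 231.
First type: the LLC 230 includes an SRAM 232 and a high-retention MRAM 2311.
Second type: the LLC 230 includes a high-speed MRAM 2312 and a high-retention MRAM 2311.
The structures of the two kinds of LLC 230 are described below, using a system-on-chip 10 with two processing modules 100 and a cache 200 with three levels as an example. It should be understood that the embodiments mainly change the structure of the last-level cache 230 in the cache 200, so the LLC 230 structures provided here also apply to a system-on-chip 10 with a different number of processing modules 100 or a different number of cache levels.
First type: the LLC 230 includes an SRAM 232 and a high-retention MRAM 2311.
FIG. 4A shows a system-on-chip provided in an embodiment of this application. The system-on-chip 10 includes two processing modules 100 and a cache 200, and each processing module 100 has its own first-level cache 210 and second-level cache 220. The LLC 230 includes two different types of memory, the SRAM 232 and the high-retention MRAM 2311.
In the LLC 230, the SRAM 232 handles data writes to the LLC 230, and the high-retention MRAM 2311 handles data reads from the LLC 230. That is, data written to the LLC 230 is written to the SRAM 232 first. When the SRAM 232 has no free space or cannot accept further writes, its data can be migrated to the high-retention MRAM 2311 so that the SRAM 232 can continue to accept writes. When the processing module 100 needs to read data from the LLC 230, it can read the data from the high-retention MRAM 2311.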
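The split between a write tier and a read tier inside the LLC can be sketched as below. The class and method names are hypothetical. The text only says that reads are served from the high-retention MRAM; checking the write tier first for data that has not yet been migrated is an added assumption, as is migrating the whole write tier at once.

```python
from collections import OrderedDict


class TwoTierLLC:
    """LLC with a fast write tier (SRAM 232) and a read tier (high-retention MRAM 2311)."""

    def __init__(self, write_tier_capacity):
        self.capacity = write_tier_capacity
        self.write_tier = OrderedDict()  # models the SRAM 232
        self.read_tier = {}              # models the high-retention MRAM 2311

    def write(self, address, value):
        # Writes always land in the write tier first.
        if address not in self.write_tier and len(self.write_tier) >= self.capacity:
            self.migrate()
        self.write_tier[address] = value

    def migrate(self):
        # Assumption: move everything, oldest first, to free the write tier.
        while self.write_tier:
            address, value = self.write_tier.popitem(last=False)
            self.read_tier[address] = value

    def read(self, address):
        # Assumption: data not yet migrated is still readable from the write tier.
        if address in self.write_tier:
            return self.write_tier[address]
        return self.read_tier.get(address)


llc = TwoTierLLC(write_tier_capacity=2)
for address, value in [("a", 1), ("b", 2), ("c", 3)]:
    llc.write(address, value)

print(dict(llc.write_tier), llc.read_tier)
```

The same structure applies to the second type of LLC described later, with the high-speed MRAM 2312 taking the place of the SRAM 232 as the write tier.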
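The embodiments of this application do not limit the ratio between the area of the SRAM 232 in the LLC 230 (that is, the area it occupies in the SoC 10) and the area of the high-retention MRAM 2311 (that is, the area it occupies in the SoC 10). Because the high-retention MRAM 2311 serves data reads, its area may be larger than that of the SRAM 232.
Because the high-retention MRAM 2311 has high data retention, data stored in it is not easily altered, which lowers the bit error rate. The resistance of an MTJ in the high-retention MRAM 2311 differs markedly between stored data 0 and data 1, so only a low read current is needed to determine the data stored in the MTJ, which reduces read latency.
In the system-on-chip 10 shown in FIG. 4A, the components are located in the same plane, which is 2D packaging. The overall area of the system-on-chip 10 depends on the area of each of its components. 2D packaging constrains the overall area of the system-on-chip 10, so it cannot include many processing modules 100, and the size of the cache 200 is also constrained. To further improve the performance of the system-on-chip 10, 3D packaging is proposed. In 3D packaging, the components of the system-on-chip 10 are no longer all located in the same plane. Instead, they are packaged together in three dimensions and may be located in different planes within the system-on-chip 10.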
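Two examples of a 3D-packaged system-on-chip 10 are given below.
FIG. 4B shows a system-on-chip 10 provided in an embodiment of this application. The system-on-chip 10 includes two processing modules 100 and a cache 200, and each processing module 100 has its own first-level cache 210 and second-level cache 220. The LLC 230 includes two different types of memory, the SRAM 232 and the high-retention MRAM 2311. The LLC 230 is located on top of the two processing modules 100, the first-level caches 210, and the second-level caches 220. That is, the LLC 230 may be located in a different plane from the processing module 100.
In FIG. 4B, the SRAM 232 and the high-retention MRAM 2311 in the LLC 230 may be grown on the same substrate (such as a silicon wafer); that is, the SRAM 232 and the high-retention MRAM 2311 are each grown on one shared substrate. The two processing modules 100, the first-level caches 210, and the second-level caches 220 may be located on another substrate. As shown in FIG. 4B, the substrate carrying the SRAM 232 and the high-retention MRAM 2311 is located on top of the other substrate, on which the processing modules 100, the first-level caches 210, and the second-level caches 220 are deployed, forming a 3D-packaged system-on-chip 10.
Growing two completely different types of memory on the same substrate places stringent demands on the fabrication process, with low process yield and high cost. The SRAM 232 and the high-retention MRAM 2311 can therefore be grown on different substrates; see FIG. 4C.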
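FIG. 4C shows a system-on-chip 10 provided in an embodiment of this application. The system-on-chip 10 includes two processing modules 100 and a cache 200, and each processing module 100 has its own first-level cache 210 and second-level cache 220. The LLC 230 includes two different types of memory, the SRAM 232 and the high-retention MRAM 2311. The LLC 230 is located on top of the two processing modules 100, the first-level caches 210, and the second-level caches 220. That is, the LLC 230 may be located in a different plane from the processing module 100.
Unlike FIG. 4B, the SRAM 232 and the high-retention MRAM 2311 in the LLC 230 are grown on different substrates; that is, there are two substrates, one carrying the SRAM 232 and the other carrying the high-retention MRAM 2311. Compared with growing both on the same substrate, growing the SRAM 232 and the high-retention MRAM 2311 on separate substrates improves process yield and reduces cost.
The two substrates can be spliced together, and the spliced substrates can be located on top of the substrate on which the processing modules 100, the first-level caches 210, and the second-level caches 220 are deployed, forming a 3D-packaged system-on-chip 10.
Note that the LLC 230 grown on two substrates is used here only as an example. In practice, to further reduce process difficulty, the LLC 230 can be grown on more than two substrates, which are spliced together to form the LLC 230.
In the system-on-chip 10 shown in FIG. 4B and FIG. 4C, the LLC 230 is located on one layer, and the processing modules 100 and the other cache levels are located on another layer, so the system-on-chip 10 uses a two-layer stack. In some scenarios, the system-on-chip 10 may also use a stack of more than two layers. The system-on-chip 10 shown in FIG. 4D is described below as an example.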
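FIG. 4D shows a system-on-chip 10 provided in an embodiment of this application. The system-on-chip 10 includes two processing modules 100 and a cache 200, and each processing module 100 has its own first-level cache 210 and second-level cache 220. The LLC 230 includes two different types of memory, the SRAM 232 and the high-retention MRAM 2311. The SRAM 232 and the high-retention MRAM 2311 are located in different planes, the SRAM 232 and the processing module 100 are located in different planes, and the high-retention MRAM 2311 and the processing module 100 are located in different planes.
As in FIG. 4C, the SRAM 232 and the high-retention MRAM 2311 in the LLC 230 are grown on different substrates; that is, there are two substrates, one carrying the SRAM 232 and the other carrying the high-retention MRAM 2311.
Unlike FIG. 4C, both substrates are located on top of the substrate on which the processing modules 100, the first-level caches 210, and the second-level caches 220 are deployed, and they are not spliced together but stacked in two different planes. To further reduce write latency, the substrate carrying the SRAM 232 can be located on the layer closer to the processing module 100, and the substrate carrying the high-retention MRAM 2311 on the layer farther from the processing module 100.
Note that the LLC 230 grown on two substrates is used here only as an example. In practice, to improve process yield, the LLC 230 can be grown on more than two substrates, which can be stacked layer by layer on top of the processing module 100. The substrates can also be placed on top of the processing module 100 partly spliced and partly stacked. That is, some of the substrates are spliced together in the same plane, while the remaining, unspliced substrates are stacked in different planes.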
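Second type: the LLC 230 includes a high-speed MRAM 2312 and a high-retention MRAM 2311.
FIG. 5A shows a system-on-chip provided in an embodiment of this application. The system-on-chip 10 includes two processing modules 100 and a cache 200, and each processing module 100 has its own first-level cache 210 and second-level cache 220. The LLC 230 includes two different types of memory, the high-speed MRAM 2312 and the high-retention MRAM 2311.
In the LLC 230, the high-speed MRAM 2312 handles data writes to the LLC 230, and the high-retention MRAM 2311 handles data reads from the LLC 230. That is, data written to the LLC 230 is written to the high-speed MRAM 2312 first. When the high-speed MRAM 2312 has no free space or cannot accept further writes, its data can be migrated to the high-retention MRAM 2311 so that the high-speed MRAM 2312 can continue to accept writes. When the processing module 100 needs to read data from the LLC 230, it can read the data from the high-retention MRAM 2311. In FIG. 5A, the high-speed MRAM 2312 plays the same role as the SRAM 232 in FIG. 4A. Because MRAM itself supports large-capacity data storage, the high-speed MRAM 2312 can provide more storage space, further increasing the capacity of the LLC 230.
The embodiments of this application do not limit the ratio between the area of the high-speed MRAM 2312 in the LLC 230 (that is, the area it occupies in the SoC 10) and the area occupied by the high-retention MRAM 2311. Because the high-retention MRAM 2311 serves data reads, its area may be larger than that of the high-speed MRAM 2312.
The high-speed MRAM 2312 gains write speed by giving up some data retention. The write current for the high-speed MRAM 2312 is smaller than the write current for the high-retention MRAM 2311, so the barrier layer of each MTJ in the high-speed MRAM 2312 is under less stress and is less likely to break down, and the lifetime of the high-speed MRAM 2312 is relatively long.
However, because the high-speed MRAM 2312 has weak data retention, a refresh circuit can be added to supply it with a refresh current and keep the written data from changing. This adds some power consumption, but less than the SRAM 232 would consume.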
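One way to reason about the refresh circuit is to treat the retention time as an upper bound on how long a cell may go without a refresh. The application does not specify any refresh timing, so the following is only a sketch under that assumption; the function name, the safety margin, and the numbers are hypothetical.

```python
def refresh_times(retention_s, hold_s, safety_margin=0.5):
    """Return the times (in seconds after the write) at which to refresh a cell.

    retention_s: assumed time after which unrefreshed data may change.
    hold_s: how long the data must be kept before it is migrated or overwritten.
    safety_margin: fraction of the retention time used as the refresh interval.
    """
    if retention_s <= 0 or hold_s < 0:
        raise ValueError("retention_s must be positive and hold_s non-negative")
    if not 0 < safety_margin <= 1:
        raise ValueError("safety_margin must be in (0, 1]")
    interval = retention_s * safety_margin
    times = []
    n = 1
    while n * interval < hold_s:
        times.append(n * interval)
        n += 1
    return times


# Data that is migrated before the first interval elapses needs no refresh at all.
print(refresh_times(retention_s=1.0, hold_s=2.0))
print(refresh_times(retention_s=1.0, hold_s=0.4))
```

Under this model, the refresh cost falls as data is migrated to the high-retention MRAM sooner, since fewer refresh intervals elapse while the data sits in the high-speed MRAM.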
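In the system-on-chip 10 shown in FIG. 5A, the components may be located in the same plane, which is 2D packaging. The overall area of the system-on-chip 10 is limited by the area of each of its components. Deploying more processing modules 100 or a larger cache 200 on the system-on-chip 10 increases its overall area. To reduce the overall area of the system-on-chip 10 while preserving the number of processing modules 100 or the size of the cache 200, 3D packaging is proposed. In 3D packaging, the components of the system-on-chip 10 are no longer all located in the same plane. Instead, they are packaged together in three dimensions and may be located in different planes within the system-on-chip 10.
Two examples of a 3D-packaged system-on-chip 10 are given below.
FIG. 5B shows a system-on-chip 10 provided in an embodiment of this application. The system-on-chip 10 includes two processing modules 100 and a cache 200, and each processing module 100 has its own first-level cache 210 and second-level cache 220. The LLC 230 includes two different types of memory, the high-speed MRAM 2312 and the high-retention MRAM 2311. The LLC 230 is located on top of the two processing modules 100, the first-level caches 210, and the second-level caches 220. That is, the LLC 230 may be located in a different plane from the processing module 100.
In FIG. 5B, the two MRAMs in the LLC 230 may be grown on the same substrate (such as a silicon wafer); that is, the high-speed MRAM 2312 and the high-retention MRAM 2311 are each grown on one shared substrate. The two processing modules 100, the first-level caches 210, and the second-level caches 220 may be located on another substrate. As shown in FIG. 5B, the substrate carrying the high-speed MRAM 2312 and the high-retention MRAM 2311 is located on top of the other substrate, on which the processing modules 100, the first-level caches 210, and the second-level caches 220 are deployed, forming a 3D-packaged system-on-chip 10.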
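FIG. 5C shows a system-on-chip 10 provided in an embodiment of this application. The system-on-chip 10 includes two processing modules 100 and a cache 200, and each processing module 100 has its own first-level cache 210 and second-level cache 220. The LLC 230 includes two different types of memory, the high-speed MRAM 2312 and the high-retention MRAM 2311. The LLC 230 is located on top of the two processing modules 100, the first-level caches 210, and the second-level caches 220. That is, the LLC 230 may be located in a different plane from the processing module 100.
Unlike FIG. 5B, the two MRAMs in the LLC 230 are grown on different substrates; that is, there are two substrates, one carrying the high-speed MRAM 2312 and the other carrying the high-retention MRAM 2311. Compared with growing both MRAMs on the same substrate, growing the high-speed MRAM 2312 and the high-retention MRAM 2311 on separate substrates improves the process yield of fabricating the system-on-chip and reduces cost.
The two substrates can be spliced together, and the spliced substrates can be located on top of the substrate on which the processing modules 100, the first-level caches 210, and the second-level caches 220 are deployed, forming a 3D-packaged system-on-chip 10.
Note that the LLC 230 grown on two substrates is used here only as an example. In practice, to improve overall process yield, the LLC 230 can be grown on more than two substrates, which are spliced together to form the LLC 230.
In the system-on-chip 10 shown in FIG. 5B and FIG. 5C, the LLC 230 is located on one layer (that is, in one plane), and the processing modules 100 and the other levels of the cache 200 are located on another layer (that is, in another plane), so the system-on-chip 10 uses a two-layer stack. In some scenarios, the system-on-chip 10 may also use a stack of more than two layers.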
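FIG. 5D shows a system-on-chip 10 provided in an embodiment of this application. The system-on-chip 10 includes two processing modules 100 and a cache 200, and each processing module 100 has its own first-level cache 210 and second-level cache 220. The LLC 230 includes two different types of memory, the high-speed MRAM 2312 and the high-retention MRAM 2311. The high-speed MRAM 2312 and the high-retention MRAM 2311 are located in different planes, the high-speed MRAM 2312 and the processing module 100 are located in different planes, and the high-retention MRAM 2311 and the processing module 100 are located in different planes.
As in FIG. 5C, the two MRAMs in the LLC 230 are grown on different substrates; that is, there are two substrates, one carrying the high-speed MRAM 2312 and the other carrying the high-retention MRAM 2311.
Unlike FIG. 5C, both substrates are located on top of the substrate on which the processing modules 100, the first-level caches 210, and the second-level caches 220 are deployed, and they are not spliced together but stacked in two different planes, one above the other. To further reduce write latency, the substrate carrying the high-speed MRAM 2312 can be located on the layer closer to the processing module 100, and the substrate carrying the high-retention MRAM 2311 on the layer farther from the processing module 100.
Note that the LLC 230 grown on two substrates is used here only as an example. In practice, to improve overall process yield, the LLC 230 can be grown on more than two substrates, which can be stacked layer by layer on top of the processing module 100. The substrates can also be placed on top of the processing module 100 partly spliced and partly stacked. That is, some of the substrates are spliced together on the same layer, while the remaining, unspliced substrates are stacked on different layers.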
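The three 3D layouts for the second type of LLC (shared substrate, spliced substrates, stacked substrates) can be captured in a small data model. This is a descriptive sketch only: each stack is a list of planes from bottom to top, each plane is a list of substrates, and each substrate is a set of component names. The names and the helper functions are hypothetical.

```python
BASE = frozenset({"processing_module_100", "l1_cache_210", "l2_cache_220"})
HS = "high_speed_mram_2312"
HR = "high_retention_mram_2311"

# Planes are listed bottom to top; substrates in the same plane are spliced side by side.
FIG_5B = [[BASE], [frozenset({HS, HR})]]                    # both MRAMs on one substrate
FIG_5C = [[BASE], [frozenset({HS}), frozenset({HR})]]       # two substrates, spliced
FIG_5D = [[BASE], [frozenset({HS})], [frozenset({HR})]]     # two substrates, stacked


def plane_of(stack, component):
    """Return the index of the plane that holds the given component."""
    for plane_index, substrates in enumerate(stack):
        if any(component in substrate for substrate in substrates):
            return plane_index
    raise KeyError(component)


def write_tier_not_farther(stack):
    """Check that the high-speed MRAM is no farther from the processing module than the high-retention MRAM."""
    core = plane_of(stack, "processing_module_100")
    return abs(plane_of(stack, HS) - core) <= abs(plane_of(stack, HR) - core)


for name, stack in [("5B", FIG_5B), ("5C", FIG_5C), ("5D", FIG_5D)]:
    print(name, len(stack), "planes,", "write tier placement ok:", write_tier_not_farther(stack))
```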
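Clearly, those skilled in the art can make various changes and variations to this application without departing from its scope. If such modifications and variations fall within the scope of the claims of this application and their technical equivalents, this application is intended to cover them as well.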

Claims (14)

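  1. A system-on-chip, characterized in that the system-on-chip comprises a processing module and a cache;
    the processing module is configured to process data and to read data from and write data to the cache;
    the cache comprises multiple levels of cache, and a last-level cache (LLC) among the multiple levels of cache comprises a first magnetoresistive random access memory (MRAM).
  2. The system-on-chip according to claim 1, characterized in that the last-level cache further comprises a static random access memory (SRAM);
    when writing data to the cache, the processing module is specifically configured to:
    write the data to the SRAM;
    the first MRAM is configured to store data migrated out of the SRAM.
  3. The system-on-chip according to claim 1, characterized in that the last-level cache further comprises a second MRAM, wherein the data retention of the first MRAM is greater than the data retention of the second MRAM;
    when writing data to the cache, the processing module is specifically configured to:
    write the data to the second MRAM;
    the first MRAM is configured to store data migrated out of the second MRAM.
  4. The system-on-chip according to any one of claims 1 to 3, characterized in that the cache and the processing module are located in the same plane.
  5. The system-on-chip according to claim 3, characterized in that the first MRAM and the second MRAM are located in the same plane, and the first MRAM and the processing module are located in different planes.
  6. The system-on-chip according to claim 5, characterized in that the first MRAM and the second MRAM are grown on the same substrate.
  7. The system-on-chip according to claim 5, characterized in that the first MRAM and the second MRAM are grown on different substrates, and the different substrates are located in the same plane after being spliced.
  8. The system-on-chip according to claim 3, characterized in that the first MRAM, the second MRAM, and the processing module are each located in a different plane.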
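  9. The system-on-chip according to claim 8, characterized in that the second MRAM is located in the plane close to the processing module, and the first MRAM is located in the plane far from the processing module.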
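  10. The system-on-chip according to claim 2, characterized in that the first MRAM and the SRAM are located in the same plane, and the first MRAM and the processing module are located in different planes.
  11. The system-on-chip according to claim 10, characterized in that the first MRAM and the SRAM are grown on the same substrate.
  12. The system-on-chip according to claim 10, characterized in that the first MRAM and the SRAM are grown on different substrates, and the different substrates are located in the same plane after being spliced.
  13. The system-on-chip according to claim 2, characterized in that the first MRAM, the SRAM, and the processing module are each located in a different plane.
  14. The system-on-chip according to claim 13, characterized in that the SRAM is located in the plane close to the processing module, and the first MRAM is located in the plane far from the processing module.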
PCT/CN2023/101056 2022-08-24 2023-06-19 A system on chip WO2024041131A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211021940.XA CN117667829A (zh) 2022-08-24 2022-08-24 A system on chip
CN202211021940.X 2022-08-24

Publications (1)

Publication Number Publication Date
WO2024041131A1 true WO2024041131A1 (zh) 2024-02-29

Family

ID=90012337

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101056 WO2024041131A1 (zh) 2022-08-24 2023-06-19 A system on chip

Country Status (2)

Country Link
CN (1) CN117667829A (zh)
WO (1) WO2024041131A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
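CN103810118A (zh) * 2014-02-28 2014-05-21 Beihang University A novel STT-MRAM cache design method
CN103810119A (zh) * 2014-02-28 2014-05-21 Beihang University Cache design method that uses on-chip temperature differences in three-dimensional integrated circuits to reduce STT-RAM power consumption
CN104871248A (zh) * 2012-12-20 2015-08-26 Qualcomm Incorporated Integrated MRAM cache module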

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
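CN104871248A (zh) * 2012-12-20 2015-08-26 Qualcomm Incorporated Integrated MRAM cache module
CN103810118A (zh) * 2014-02-28 2014-05-21 Beihang University A novel STT-MRAM cache design method
CN103810119A (zh) * 2014-02-28 2014-05-21 Beihang University Cache design method that uses on-chip temperature differences in three-dimensional integrated circuits to reduce STT-RAM power consumption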

Also Published As

Publication number Publication date
CN117667829A (zh) 2024-03-08

Similar Documents

Publication Publication Date Title
US9412787B2 (en) Method and system for providing magnetic tunneling junction elements having improved performance through capping layer induced perpendicular anisotropy and memories using such magnetic elements
US8913350B2 (en) Method and system for providing magnetic tunneling junction elements having improved performance through capping layer induced perpendicular anisotropy and memories using such magnetic elements
US10832749B2 (en) Perpendicular magnetic memory with symmetric fixed layers
US7109539B2 (en) Multiple-bit magnetic random access memory cell employing adiabatic switching
US8891290B2 (en) Method and system for providing inverted dual magnetic tunneling junction elements
US10832847B2 (en) Low stray field magnetic memory
US9178137B2 (en) Magnetoresistive element and magnetic memory
JP2012104825A (ja) Method and system for providing hybrid magnetic tunnel junction elements with improved switching
WO2015116415A1 (en) Multi-level cell designs for high density low power gshe-stt mram
WO2012021297A1 (en) Magnetic tunneling junction elements having a biaxial anisotropy
US10522739B2 (en) Perpendicular magnetic memory with reduced switching current
WO2009110530A1 (ja) Semiconductor device
US20130258750A1 (en) Dual-cell mtj structure with individual access and logical combination ability
JP5723311B2 (ja) Magnetic tunnel junction element and magnetic memory
WO2024041131A1 (zh) A system on chip
WO2021142681A1 (zh) Magnetic random access memory and electronic device
US10847198B2 (en) Memory system utilizing heterogeneous magnetic tunnel junction types in a single chip
CN114335329B (zh) Magnetic random access memory with high resistance to magnetic field interference
US20210390994A1 (en) Memory and access method
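CN116844598A (zh) Processor with multi-level cache, and electronic device
WO2022041278A1 (zh) Memory
CN112201747A (zh) Magnetic memory structure, read/write method, and electronic device
CN116406221A (zh) Multi-state magnetic memory based on double-layer magnetic tunnel junctions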

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856223

Country of ref document: EP

Kind code of ref document: A1