CN113722244B - Cache structure, access method and electronic equipment - Google Patents

Cache structure, access method and electronic equipment

Info

Publication number
CN113722244B
CN113722244B (application CN202111285507.2A)
Authority
CN
China
Prior art keywords
cache
data
cache structure
physical
tag
Prior art date
Legal status
Active
Application number
CN202111285507.2A
Other languages
Chinese (zh)
Other versions
CN113722244A (en)
Inventor
李祖松
赵继业
郇丹丹
Current Assignee
Beijing Micro Core Technology Co ltd
Original Assignee
Beijing Micro Core Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Micro Core Technology Co ltd
Priority to CN202111285507.2A
Publication of CN113722244A
Application granted
Publication of CN113722244B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844Multiple simultaneous or quasi-simultaneous cache accessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present disclosure provides a cache structure, an access method and an electronic device. The cache structure is physically organized with a depth of 2^n × M entries and a bit width of N/2^n bits, where M, N and n are natural numbers. The cache structure is configured such that, when data is written to the cache structure, a data write operation is performed on 2^n physical Cache lines, and when data is read from the cache structure, a data read operation is performed on one physical Cache line, i.e., N/2^n bits, at a time. By adopting the cache structure of the present disclosure, the area of the cache can be reduced, thereby lowering cost and improving the access speed of the cache.

Description

Cache structure, access method and electronic equipment
Technical Field
The present invention relates to the field of memory technologies, and in particular, to a cache structure, an access method, and an electronic device.
Background
One of the limiting factors in today's computer innovation is memory and storage technology. In conventional computer systems, system memory (also referred to as primary memory, main memory, executable memory) is typically implemented by Dynamic Random Access Memory (DRAM). Conventional computer systems also rely on multiple levels of Cache (Cache) to improve performance. A cache is a high-speed memory located between a processor and system memory in order to more quickly service memory access requests. Cache management protocols may be used to ensure that the most frequently accessed data and instructions are stored in a certain level of a multi-level cache, thereby reducing the number of times a processor directly accesses memory to perform access transactions and improving performance.
Cache accesses exhibit spatial locality, i.e., programs and data tend to access data at nearby memory addresses. Therefore, the width of a Cache line is generally designed to be relatively large, for example 512 bits, i.e., each Cache line stores 512 bits of data. A single memory access instruction that reads the Cache reads a complete Cache line, so the wide Cache line design effectively prefetches data for subsequent accesses.
Caches are typically implemented using Static Random Access Memory (SRAM). The organization of an SRAM is generally expressed in terms of depth and bit width. For example, a common organization for today's instruction Caches and data Caches has a capacity of 32KB with 8-way set associativity: 8 physical SRAMs of 4KB each, a Cache line width of 512 bits, and a depth of 64 entries, so that the specification of each physical SRAM block is 64 × 512.
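As a quick arithmetic check of this organization (a sketch for illustration only; every constant comes from the example above and nothing here is part of the patent), the per-block geometry can be derived in C as follows:

```c
#include <stdio.h>

/* Derive the 64 x 512 physical SRAM organization from a 32KB,
 * 8-way set-associative cache with 512-bit lines. */
int main(void) {
    const unsigned capacity_bytes = 32 * 1024; /* 32KB total capacity   */
    const unsigned ways           = 8;         /* 8-way set associative */
    const unsigned line_bits      = 512;       /* Cache line width      */
    const unsigned line_bytes     = line_bits / 8;

    unsigned bytes_per_way = capacity_bytes / ways;      /* 4KB per physical SRAM */
    unsigned depth         = bytes_per_way / line_bytes; /* entries per SRAM      */

    printf("each SRAM: %u x %u bits (%u KB)\n",
           depth, line_bits, bytes_per_way / 1024);      /* 64 x 512 (4 KB) */
    return 0;
}
```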
Generally, for the same total number of stored bits, the more entries (i.e., the deeper) an SRAM has, the smaller its area and the lower its cost. From a layout (Floorplan) point of view, an SRAM whose shape is closer to a square is easier to place. A shallow but wide SRAM wastes area, which increases cost and reduces access speed.
No effective solution has yet been proposed for the problem of large area caused by the SRAM organization of the prior art.
The statements in this background section merely represent techniques known to the inventors and do not necessarily represent the prior art in the field.
Disclosure of Invention
To address the problems of the prior art, embodiments of the present invention provide a cache structure. The core concept is a "Cache folding" scheme that, while keeping the same total storage capacity, reduces the physical Cache line width and increases the number of physical Cache lines, so that the physical organization of the Cache has more entries and a smaller bit width while the logical Cache line width remains unchanged. This reduces the area of the Cache, lowering cost and increasing access speed.
In one aspect, embodiments of the present disclosure provide a cache structure. The cache structure includes a tag portion and a data portion. The cache structure is physically organized with a depth of 2^n × M entries and a bit width of N/2^n bits, where M, N and n are natural numbers. The cache structure is configured such that, when data is written to the cache structure, a data write operation is performed on 2^n physical Cache lines, and when data is read from the cache structure, a data read operation is performed on one physical Cache line, i.e., N/2^n bits, at a time. Each tag stored in the tag portion corresponds to 2^n physical Cache lines.
Optionally, the minimum value of N/2^n is the maximum data width accessed by a memory access instruction of the processor, such as 32 or 64 bits.
Optionally, N = 256, N = 512 or N = 1024, and n = 1, n = 2 or n = 3.
Optionally, the cache structure is constructed from SRAM, SEDRAM, SSRAM, SDRAM or DRAM.
In another aspect, embodiments of the present disclosure provide a method for accessing a cache structure. The cache structure includes a tag portion and a data portion and is physically organized with a depth of 2^n × M entries and a bit width of N/2^n bits, where M, N and n are natural numbers. The method includes: when writing data to the cache structure, performing a data write operation on 2^n physical Cache lines; and when reading data from the cache structure, performing a data read operation on one physical Cache line, i.e., N/2^n bits, at a time. Each tag stored in the tag portion corresponds to 2^n physical Cache lines.
In yet another aspect, an embodiment of the present disclosure provides an electronic device, which includes the aforementioned cache structure.
Embodiments of the present disclosure provide a novel cache configuration that solves the problem of large area caused by the cache organization of the prior art, while retaining the prefetching benefit of a wide logical Cache line, fully exploiting the spatial locality of Cache accesses, and allowing the shape of the Cache to be adjusted flexibly to the layout by folding.
Drawings
Further details, features and advantages of the invention are disclosed in the following description of exemplary embodiments with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram illustrating the formation of a Cache structure according to an embodiment of the present invention by way of "Cache folding";
FIG. 2 is a schematic diagram illustrating a method of accessing a cache structure according to an embodiment of the invention; and
fig. 3 schematically shows a block diagram of an exemplary electronic device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present invention. It should be understood that the drawings and the embodiments of the present invention are illustrative only and are not intended to limit the scope of the present invention.
It should be understood that the various steps recited in the method embodiments of the present invention may be performed in a different order and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present invention are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an" and "the" in the present invention are intended to be illustrative rather than limiting; those skilled in the art will understand them to mean "one or more" unless the context clearly indicates otherwise.
The conventional Cache structure mainly consists of two parts: a Tag part and a Data part. The Data part holds the data of a contiguous range of addresses, and the Tag part stores the common address of that contiguous data. In the conventional Cache structure, a Tag and all the Data corresponding to it form a line called a Cache Line, and the Data portion of a Cache Line is called a Data Block. One Tag corresponds to one Cache Line. Correspondingly, a typical processor accesses the Cache through an address structure by which it can reach every byte of data in the Cache. The address structure comprises three parts: a Tag, an Index, and a Block Offset. The Index is used to select a set of Cache lines from the Tag part of the Cache; the Tags read out by the Index are compared with the Tag in the address structure, and only a match indicates that the Cache line is the desired one. A Cache Line holds data for multiple accesses, and the truly desired data, down to individual bytes, is located through the Block Offset part of the address structure together with the access width of the memory access instruction. Each Cache Line also has a valid bit indicating whether it holds valid data: only a previously accessed memory address has its data stored in the corresponding Cache Line, with the corresponding valid bit set to 1.
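As an illustrative sketch (not part of the patent text; the field widths assume the 32KB, 8-way, 512-bit-line example above, giving 64 sets and 64-byte blocks), the conventional Tag/Index/Block Offset decomposition might be written in C as:

```c
#include <stdint.h>

/* Assumed geometry: 64 sets (6 index bits) and 512-bit lines
 * (64-byte data blocks, 6 offset bits); the tag takes the rest. */
#define OFFSET_BITS 6   /* log2(64-byte data block) */
#define INDEX_BITS  6   /* log2(64 entries)         */

typedef struct {
    uint64_t tag;
    uint32_t index;
    uint32_t offset;
} cache_addr_t;

static cache_addr_t decode_addr(uint64_t paddr) {
    cache_addr_t a;
    a.offset = paddr & ((1u << OFFSET_BITS) - 1);
    a.index  = (paddr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    a.tag    = paddr >> (OFFSET_BITS + INDEX_BITS);
    return a;
}
```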
A conventional cache is illustrated as a comparative example to facilitate describing embodiments and advantages of the present disclosure. One conventional Cache is organized 64 × 512, i.e., 64 entries deep with a 512-bit width (also referred to as the Cache line width). In most cases, the data required by a central processor memory access instruction will not exceed 64 bits, in which case most of the 512 bits read out of the cache are discarded, and only the required 64 bits are selected from the 512-bit line.
According to one embodiment of the present disclosure, the SRAM organized as 64 × 512 may be modified to 128 × 256, i.e., the depth is increased from 64 entries to 128 entries and the bit width is decreased from 512 bits to 256 bits. That is, one 64-entry, 512-bit-wide SRAM is folded into one 128-entry, 256-bit-wide SRAM, which corresponds to folding the Cache once. The storage capacity is unchanged after folding, but from the layout (Floorplan) point of view the area is reduced.
According to an embodiment of the present disclosure, a cache structure is provided. The cache structure includes a tag portion and a data portion. The cache structure is physically organized with a depth of 2^n × M entries and a bit width of N/2^n bits, where M, N and n are natural numbers. The cache structure is configured such that, when data is written to the cache structure, a data write operation is performed on 2^n physical Cache lines, and when data is read from the cache structure, a data read operation is performed on one physical Cache line, i.e., N/2^n bits, at a time. Each tag stored in the tag portion corresponds to 2^n physical Cache lines.
Unlike the conventional Cache, in which one Tag corresponds to one Cache line, according to embodiments of the present disclosure one Tag in the tag portion corresponds to multiple physical Cache lines, and when data is written into the Cache, specifically into its data portion, the data is written into one or more physical Cache lines corresponding to one Tag. Correspondingly, the address structure with which the processor accesses the Cache can be adapted: in addition to the Tag, the Index and the Block Offset, it can include a line flag (Cache line flag) that uniquely identifies one Cache line among the multiple Cache lines corresponding to one Tag, as sketched below. This address structure is given only to illustrate that every line and every byte of the Cache structure according to embodiments of the present disclosure can be accessed by the processor; other address structures are also contemplated, and the address structure can be designed flexibly according to the design requirements of the processor.
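A minimal sketch of the adjusted decomposition follows. The assumptions are this sketch's, not the patent's: N = 512 and n = 1 (a logical 512-bit line folded once into two 256-bit physical lines), 64 sets, and a field order that places the line flag between the physical-line offset and the index.

```c
#include <stdint.h>

#define N_FOLDS       1  /* n: number of folds; width of the line flag  */
#define PHYS_OFF_BITS 5  /* log2(256/8): byte offset in a physical line */
#define INDEX_BITS    6  /* log2(64 logical sets)                       */

typedef struct {
    uint64_t tag;        /* shared by all 2^n physical lines            */
    uint32_t index;      /* selects the set / tag entry                 */
    uint32_t line_flag;  /* picks one of the 2^n physical lines         */
    uint32_t offset;     /* byte offset within that physical line       */
} folded_addr_t;

static folded_addr_t decode_folded(uint64_t paddr) {
    folded_addr_t a;
    a.offset    = paddr & ((1u << PHYS_OFF_BITS) - 1);
    a.line_flag = (paddr >> PHYS_OFF_BITS) & ((1u << N_FOLDS) - 1);
    a.index     = (paddr >> (PHYS_OFF_BITS + N_FOLDS)) & ((1u << INDEX_BITS) - 1);
    a.tag       = paddr >> (PHYS_OFF_BITS + N_FOLDS + INDEX_BITS);
    return a;
}
```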
In one embodiment, N/2^n must be at least the maximum data width accessed by a memory access instruction of the processor. For example, if the maximum access width of a memory access instruction is a word (32 bits) or a double word (64 bits), the minimum bit width N/2^n is 32 bits or 64 bits, respectively.
In one embodiment, N is an integer multiple of 8. Optionally, N may be 512, and the value of N may also be 256, 1024, or the like.
In one embodiment, n =1, n =2 or n = 3.
In one embodiment, M may be 32, 64, or 128, for example.
It should be understood that the values of M, N and n are exemplary and not limiting.
The cache structure according to embodiments of the present disclosure may be constructed from SRAM, SSRAM (synchronous SRAM), SDRAM (synchronous DRAM), DRAM, or SEDRAM (Stacked Embedded DRAM). As regards SEDRAM, reference may be made to the paper presented at the International Electron Devices Meeting (IEDM 2020) entitled "A Stacked Embedded DRAM Array for LPDDR4/4X using Hybrid Bonding 3D Integration with 34GB/s/1Gb 0.88pJ/b Logic-to-Memory Interface". SEDRAM is a DRAM structure in which an embedded DRAM is interconnected (i.e., bonded) with the CPU, thereby obtaining a higher bit width and a higher data transmission speed.
FIG. 1 is a schematic diagram illustrating the formation of a Cache structure according to an embodiment of the present invention by way of "Cache folding". Fig. 1 is provided only as a reference for understanding the technical solution of the present disclosure; it should be understood that the cache structure proposed by the present disclosure is not obtained by logically folding a conventional cache but is a new cache configuration. In Fig. 1, each Cache line of the Cache before folding comprises two regions, a high region denoted Hi and a low region denoted Lo; after one fold the bit width of the Cache line is halved while the depth is correspondingly doubled. It should be understood that Fig. 1 schematically shows a conventional cache folded once, but the number of folds is only illustrative; the number of folds n can in theory take any natural number.
The memory access instructions of different instruction sets are not identical in format. Setting aside vector instructions, the load instructions of a general-purpose processor typically comprise four types: load byte (lb), half word (lh), word (lw) and double word (ld); the store instructions likewise comprise four types: store byte (sb), half word (sh), word (sw) and double word (sd). The bit width N/2^n of the cache according to the present disclosure may be chosen based on the maximum data width accessed by the memory access instructions of the processor it supports.
According to embodiments of the present disclosure, the organization of the physical Cache lines can be folded from the original N bits per line into 2^n lines of N/2^n bits each. That is, an N-bit Cache line is folded to N/2, N/4, N/8 bits and so on, as long as the folded Cache line is not smaller than the maximum access width (e.g., 64 bits).
After the Cache is folded, the Cache is accessed in units of the folded Cache line, i.e., the bit width of each access is N/2^n bits. Cache line fills, however, are still performed in units of the N-bit traditional logical Cache line, i.e., 2^n physical Cache lines are filled at a time. Each tag stored in the tag portion of the Cache structure corresponds to 2^n physical Cache lines.
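Under the same illustrative assumptions (N = 512, n = 1, and a direct-mapped organization for simplicity; none of these names come from the patent), the asymmetric access granularity (read one physical line, fill 2^n physical lines) can be sketched as:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed: each tag covers 2^n = 2 physical lines of 32 bytes each. */
#define N_FOLDS    1
#define FOLDS      (1u << N_FOLDS)
#define SETS       64
#define PHYS_BYTES 32                      /* N/2^n bits = 32 bytes */

typedef struct {
    bool     valid;
    uint64_t tag;                          /* one tag per logical line */
    uint8_t  data[FOLDS][PHYS_BYTES];      /* 2^n physical lines       */
} set_t;

static set_t cache[SETS];

/* Read: one physical line (N/2^n bits) per access. */
static const uint8_t *cache_read(uint32_t index, uint32_t line_flag) {
    return cache[index].data[line_flag];
}

/* Fill: one logical line, i.e. all 2^n physical lines, per miss. */
static void cache_fill(uint32_t index, uint64_t tag,
                       const uint8_t block[FOLDS * PHYS_BYTES]) {
    set_t *s = &cache[index];
    s->tag   = tag;
    s->valid = true;
    for (unsigned i = 0; i < FOLDS; i++)   /* write 2^n physical lines */
        memcpy(s->data[i], block + i * PHYS_BYTES, PHYS_BYTES);
}
```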
With the Cache structure according to embodiments of the present disclosure, the area of the SRAM can be reduced. Generally, when a Cache is manufactured on a wafer, the larger the number of entries of an SRAM of a given capacity, the smaller its area. For example, compared with a conventional Cache block of 64 entries and 128-bit width, using the Cache structure of an embodiment of the present invention and folding the Cache once, the number of entries increases to 128 and the bit width is reduced to 64 bits. The total number of stored bits of the Cache block is unchanged (64 × 128 = 128 × 64), but the area the block occupies on the wafer changes: under 28nm process conditions it is 5650 mm² (square millimeters) before folding and can be reduced to 4430 mm² (square millimeters) after folding, a reduction of more than 20%.
Fig. 2 schematically illustrates an access method of a cache structure according to an embodiment of the present invention. The cache structure includes a tag portion and a data portion and is physically organized with a depth of 2^n × M entries and a bit width of N/2^n bits, where M, N and n are natural numbers. Each tag stored in the tag portion corresponds to 2^n physical Cache lines.
Referring to Fig. 2, the access method 20 of the cache structure includes: step S210, when writing data to the cache structure, performing a data write operation on 2^n physical Cache lines; and step S220, when reading data from the cache structure, performing a data read operation on one physical Cache line, i.e., N/2^n bits, at a time.
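Reusing the sketches above (still assuming N = 512 and n = 1), steps S210 and S220 might chain together as follows; this is a hypothetical driver for illustration, not the patented implementation:

```c
/* Hypothetical driver reusing decode_folded(), cache_fill() and
 * cache_read() from the sketches above. */
const uint8_t *cache_access(uint64_t paddr,
                            const uint8_t block[FOLDS * PHYS_BYTES]) {
    folded_addr_t a = decode_folded(paddr);
    set_t *s = &cache[a.index];
    if (!s->valid || s->tag != a.tag)
        cache_fill(a.index, a.tag, block);   /* S210: write 2^n physical lines */
    return cache_read(a.index, a.line_flag); /* S220: read one N/2^n-bit line  */
}
```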
In yet another aspect, an embodiment of the present disclosure further provides an electronic device, which includes the aforementioned cache structure.
Fig. 3 schematically shows a block diagram of an exemplary electronic device according to an embodiment of the present invention. Referring to Fig. 3, the structure of an electronic device 30 will now be described; the electronic device 30 may be a server or a client of the present invention and is an example of a hardware device to which aspects of the present invention may be applied. Electronic devices are intended to represent various forms of digital electronic computer devices, such as data center servers, notebook computers, thin clients, laptop computers, desktop computers, workstations, personal digital assistants, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 3, the electronic device 30 includes a computing unit 301 that can perform various appropriate actions and processes according to a computer program stored in a memory 302 or a hard disk 303. During the operation of the computing unit, various programs and data in the memory 302 or the hard disk 303 are loaded into the cache 304. The computing unit 301, the memory 302, the hard disk 303, and the cache 304 are connected to each other via a bus 309. An input/output (I/O) interface 305 is also connected to bus 309. The cache 304 may be implemented as the aforementioned cache structure according to the present disclosure. While the cache 304 is shown as being an external unit to the computing unit 301, it should be understood that the cache 304 may be an on-chip cache integrated with the computing unit 301.
A number of components in the electronic device 30 are connected to the I/O interface 305, including: an input unit 306, an output unit 307, a storage unit 308, and a communication unit 308. The input unit 306 may be any type of device capable of inputting information to the electronic device 30, and the input unit 306 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. Output unit 307 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 308 may include, but is not limited to, a magnetic disk, an optical disk. The communication unit 308 allows the electronic device 30 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and/or chipsets, such as bluetooth devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 301 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 301 performs the respective methods and processes described above. For example, in some embodiments, the method of determining an access address may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as a hard disk.
While the invention has been described with reference to what are presently considered to be the embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. On the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
Although the present disclosure has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (9)

1. A cache structure, the cache structure comprising a tag portion and a data portion, wherein the cache structure is physically organized with a depth of 2^n × M entries and a bit width of N/2^n bits, where M, N and n are natural numbers,
wherein the cache structure is configured to, when writing data to the cache structure, perform a data write operation on 2^n physical Cache lines, and, when reading data from the cache structure, perform a data read operation on one physical Cache line, i.e., N/2^n bits, at a time, and
wherein each tag stored in the tag portion corresponds to 2^n physical Cache lines.
2. The cache structure of claim 1, wherein the minimum value of N/2^n is the maximum data width accessed by a memory access instruction of the processor.
3. The cache structure of claim 1, wherein N = 256, N = 512 or N = 1024, and n = 1, n = 2 or n = 3.
4. A cache structure according to any one of claims 1-3, characterized in that the cache structure is constructed from SEDRAM, SRAM, SSRAM, SDRAM or DRAM.
5. A method of accessing a cache structure, the cache structure comprising a tag portion and a data portion, the cache structure being physically organized with a depth of 2^n × M entries and a bit width of N/2^n bits, where M, N and n are natural numbers, the method comprising:
when writing data to the cache structure, performing a data write operation on 2^n physical Cache lines; and
when reading data from the cache structure, performing a data read operation on one physical Cache line, i.e., N/2^n bits, at a time,
wherein each tag stored in the tag portion corresponds to 2^n physical Cache lines.
6. The access method of claim 5, wherein the minimum value of N/2^n is the maximum data width accessed by a memory access instruction of the processor.
7. The access method according to claim 5, wherein N = 256, N = 512 or N = 1024, and n = 1, n = 2 or n = 3.
8. An access method according to any one of claims 5-7, characterized in that the cache structure is constructed from SEDRAM, SRAM, SSRAM, SDRAM or DRAM.
9. An electronic device, characterized in that it comprises a cache structure according to any of claims 1-4.
CN202111285507.2A 2021-11-02 2021-11-02 Cache structure, access method and electronic equipment Active CN113722244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111285507.2A CN113722244B (en) 2021-11-02 2021-11-02 Cache structure, access method and electronic equipment


Publications (2)

Publication Number Publication Date
CN113722244A (en) 2021-11-30
CN113722244B (en) 2022-02-22

Family

ID=78686492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111285507.2A Active CN113722244B (en) 2021-11-02 2021-11-02 Cache structure, access method and electronic equipment

Country Status (1)

Country Link
CN (1) CN113722244B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1659525A (en) * 2002-06-04 2005-08-24 杉桥技术公司 Method and apparatus for multithreaded cache with simplified implementation of cache replacement policy
US20080250205A1 (en) * 2006-10-04 2008-10-09 Davis Gordon T Structure for supporting simultaneous storage of trace and standard cache lines
CN103077132A (en) * 2013-01-07 2013-05-01 浪潮(北京)电子信息产业有限公司 Cache processing method and protocol processor cache control unit
CN103246613A (en) * 2012-02-08 2013-08-14 联发科技(新加坡)私人有限公司 Cache device and cache data acquiring method therefor
CN104298622A (en) * 2013-07-17 2015-01-21 飞思卡尔半导体公司 Least recently used (lru) cache replacement implementation using a fifo
CN105612499A (en) * 2013-10-29 2016-05-25 华中科技大学 Mixed cache management
CN108351835A (en) * 2015-11-23 2018-07-31 英特尔公司 Instruction for cache control operation and logic
CN112558889A (en) * 2021-02-26 2021-03-26 北京微核芯科技有限公司 Stacked Cache system based on SEDRAM, control method and Cache device

Also Published As

Publication number Publication date
CN113722244A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
US11086792B2 (en) Cache replacing method and apparatus, heterogeneous multi-core system and cache managing method
US8949544B2 (en) Bypassing a cache when handling memory requests
CN112272816B (en) Prefetch signaling in a memory system or subsystem
WO2020199061A1 (en) Processing method and apparatus, and related device
US11355169B2 (en) Indicating latency associated with a memory request in a system
US20130346706A1 (en) Data reading/writing method and memory device
JP2010532517A (en) Cache memory with configurable association
US20160239432A1 (en) Application-layer managed memory cache
EP3534265A1 (en) Memory access technique
US20210056030A1 (en) Multi-level system memory with near memory capable of storing compressed cache lines
CN107729261B (en) Cache address mapping method in multi-core/many-core processor
CN113900966B (en) Access method and device based on Cache
US9535843B2 (en) Managed memory cache with application-layer prefetching
JP2018503924A (en) Providing memory bandwidth compression using continuous read operations by a compressed memory controller (CMC) in a central processing unit (CPU) based system
US20230094634A1 (en) Memory system and data processing system including the same
US20220066949A1 (en) Codeword rotation for zone grouping of media codewords
CN113722244B (en) Cache structure, access method and electronic equipment
JP2018508894A (en) Method and device for accessing a data visitor directory in a multi-core system
WO2021008552A1 (en) Data reading method and apparatus, and computer-readable storage medium
CN113656331A (en) Method and device for determining access address based on high and low bits
CN114327281B (en) TCG software and hardware acceleration method and device for SSD, computer equipment and storage medium
CN112559401A (en) PIM technology-based sparse matrix chain access system
CN116841922A (en) TLB page table entry management method
CN116257465A (en) Addressing method, storage medium and electronic device
KR101416248B1 (en) Data processing apparatus and data processing method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant