CN114758687B - Self-adaptive Cache access circuit and implementation method thereof - Google Patents

Self-adaptive Cache access circuit and implementation method thereof

Info

Publication number
CN114758687B
Authority
CN
China
Prior art keywords
sram
self
signal
data
circuit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210674360.4A
Other languages
Chinese (zh)
Other versions
CN114758687A (en)
Inventor
任力争
陈庆
戴瑞萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Low Power Chip Technology Research Institute Co ltd
Original Assignee
Nanjing Low Power Chip Technology Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Low Power Chip Technology Research Institute Co ltd
Priority to CN202210674360.4A
Publication of CN114758687A
Application granted
Publication of CN114758687B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
    • G11C7/1051 Data output circuits, e.g. read-out amplifiers, data output buffers, data output registers, data output level conversion circuits
    • G11C7/1057 Data output buffers, e.g. comprising level conversion circuits, circuits for adapting load
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
    • G11C7/1078 Data input circuits, e.g. write amplifiers, data input buffers, data input registers, data input level conversion circuits
    • G11C7/1084 Data input buffers, e.g. comprising level conversion circuits, circuits for adapting load
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/22 Read-write [R-W] timing or clocking circuits; Read-write [R-W] control signal generators or management
    • G11C7/225 Clock input buffers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a self-adaptive Cache access circuit comprising a load monitoring module, an access mode switching circuit and a two-way set-associative Cache. The load monitoring module monitors the CPU load; the access mode switching circuit switches the access mode of the Cache; each way of the set-associative Cache comprises a TAG SRAM, four DATA SRAMs, a comparison circuit and a self-timing circuit. In the implementation method, the access mode switching circuit switches between a parallel access mode and a self-timing serial access mode by selecting the internal clock of the DATA SRAM according to the output of the load monitoring module, so that the requirements of different scenarios are met in time. Under low load, the invention uses the self-timing serial access mode, which reduces the access power consumption of the DATA SRAM on a miss; under high load, it uses the parallel access mode to achieve high-performance access.

Description

Self-adaptive Cache access circuit and implementation method thereof
Technical Field
The invention relates to a Cache access circuit and an implementation method thereof, in particular to a self-adaptive Cache access circuit and an implementation method thereof.
Background
With continuing advances in semiconductor technology and processor design, microprocessors have developed rapidly. While microprocessor performance has grown exponentially, memory performance has nearly stagnated, and this has become a major bottleneck for system performance. The "multi-level memory hierarchy" was created to address this memory wall: in such a hierarchy, the levels closer to the microprocessor are faster, smaller and more expensive, while the levels farther from it are slower, larger and cheaper. In this hierarchy, the module closest to the microprocessor, with a speed comparable to that of the microprocessor, is the Cache (cache memory). The capacity, operating frequency (relative to the microprocessor) and power consumption of the Cache are therefore often used as important indicators of microprocessor performance. The Cache alleviates the speed mismatch between the microprocessor and memory to some extent, but new problems follow: as the operating frequency of the microprocessor rises and the chip scale grows, power consumption is gradually becoming the bottleneck of microprocessor development.
Common Cache low-power techniques start from the architecture: the read and write power consumption of the Cache is reduced by optimizing the access mode according to the spatial and temporal locality of the code. The common access modes are serial access and parallel access. Serial access accesses the TAG Cache and the DATA Cache in successive steps, which avoids wasting power on redundant DATA SRAM accesses; it does not avoid the power wasted on redundant TAG Cache accesses, and it increases the average access time of the instruction Cache, so it trades time for power. Parallel access accesses the TAG Cache and the DATA Cache simultaneously, which is fast but incurs redundant power consumption. An access scheme that balances performance and power consumption is therefore highly desirable.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a self-adaptive Cache access circuit, and an implementation method thereof, that improve the energy efficiency of the Cache and maintain performance while reducing power consumption.
Technical scheme: the Cache access circuit comprises a load monitoring module, an access mode switching circuit and a two-way set-associative Cache;
the load monitoring module is used for monitoring the CPU load;
the access mode switching circuit is used for switching the access mode of the Cache;
each way of the set-associative Cache comprises a TAG SRAM, four DATA SRAMs, a comparison circuit and a self-timing circuit;
the self-timing circuit and the comparison circuit are used for triggering the generation of the internal clock of the DATA SRAM during self-timing serial access;
the TAG SRAM is used for storing main memory addresses;
the DATA SRAM is used for storing the data that the microprocessor needs to read and write;
when the TAG SRAM has been completely read and hits, the self-timing circuit outputs a high-level signal to the access mode switching circuit, which serves as the internal clock of the DATA SRAM during self-timing serial access; the other input signal of the access mode switching circuit is the external clock signal, which serves as the internal clock of the DATA SRAM during parallel access; the output signal PLINE of the load monitoring module is the selection signal of the access mode switching circuit, and the output signal of the access mode switching circuit is sent to the internal clock input of the DATA SRAM.
Further, the access mode switching circuit comprises a two-to-one selector MUX_1 and a first AND gate AND_1;
one input of the first AND gate AND_1 is connected to the read-write enable signal WEN of the SRAM, and the other input is connected to the serial-parallel selection signal PLINE; the output of AND_1 is connected to the select input of the two-to-one selector MUX_1 and serves as its selection signal; the '0' input of MUX_1 is connected to the internal clock synchronous with the TAG SRAM, the '1' input of MUX_1 is connected to the Q_pulse signal generated by the self-timing circuit, and the output signal of MUX_1 serves as the internal clock signal of the DATA SRAM.
Further, the comparison circuit comprises a first exclusive-OR gate XOR_1; one input of XOR_1 is connected to the CPU address, and the other input is connected to the output data TAG_Q of the TAG SRAM; the output of XOR_1 is the comparison result HIT of the CPU address and TAG_Q: when the two are consistent, HIT is high; when they are inconsistent, HIT is low; the output signal HIT is sent to the self-timing circuit.
Further, the self-timing circuit comprises a first inverter INV1, a second inverter INV2, a third inverter INV3, a second exclusive-OR gate XOR_2, a fourth inverter INV4, a fifth inverter INV5, a second AND gate AND_2, a sixth inverter INV6, a seventh inverter INV7, a first PMOS transistor MP1, a second PMOS transistor MP2, a third PMOS transistor MP3, a fourth PMOS transistor MP4, a fifth PMOS transistor MP5, a first NMOS transistor MN1, a second NMOS transistor MN2, a third NMOS transistor MN3, a fourth NMOS transistor MN4, a fifth NMOS transistor MN5 and a sixth NMOS transistor MN6;
the sense amplifier enable signal SAE of the TAG SRAM is connected to the input of the first inverter INV1, the output of INV1 is connected to the input of the second inverter INV2, the output of INV2 is connected to the input of the third inverter INV3, and the output of INV3 is the Q_valid signal, which marks the end of the read operation of the TAG SRAM; the output of INV3 is connected to one input of the second exclusive-OR gate XOR_2, the other input of XOR_2 is held constantly low, the output of XOR_2 is connected to the input of the fourth inverter INV4, the output of INV4 is connected to the input of the fifth inverter INV5, and the output of INV5 marks the completion of the comparison operation;
the output of the fifth inverter INV5 is connected to one input of the second AND gate AND_2, and the other input of AND_2 is connected to the output signal HIT of the comparison circuit, generating the Q_en signal; the Q_en signal passes through the sixth inverter INV6 to generate the COMPB signal; the source of the first PMOS transistor MP1 is connected to the power supply VDD, its drain is connected to the source of the second PMOS transistor MP2, and its gate is connected to SAEB, the inverse of SAE; the drain of MP2 is the output signal Q_pulse, and its gate is connected to the COMPB signal; the source of the third PMOS transistor MP3 is connected to the power supply VDD, its gate input is SAE, and its drain is connected to the source of the fourth PMOS transistor MP4; the gate input of MP4 is CLKB, the inverse of CLK, and its drain is connected to the fifth PMOS transistor MP5; the gate input of MP5 is Q_pulseb, the inverse of Q_pulse, and its drain is the Q_pulse signal;
the drain of the first NMOS transistor MN1 is connected to Q_pulse, its gate input is CLKB, and its source is connected to the power ground VSS; the drain of the second NMOS transistor MN2 is connected to Q_pulse, its gate input is Q_pulseb, and its source is connected to the third NMOS transistor MN3; the gate input of MN3 is CLK, and its source is connected to the drain of the fourth NMOS transistor MN4; the gate input of MN4 is SAE, and its source is connected to the power ground VSS; the drain of the fifth NMOS transistor MN5 is connected to Q_pulse, its gate input is COMPB, the inverse of the COMP signal, and its source is connected to the sixth NMOS transistor MN6; the gate input of MN6 is SAE, and its source is connected to the power ground VSS; the output Q_pulse signal passes through the seventh inverter INV7 to generate the inverted signal Q_pulseb; the Q_pulse signal is sent to the access mode switching circuit.
The implementation method of the Cache access circuit comprises the following steps: the access mode switching circuit switches between the parallel access mode and the self-timing serial access mode by selecting the internal clock of the DATA SRAM according to the output of the load monitoring module, so that the requirements of different scenarios are met in time;
in the parallel access mode, the internal clocks of the TAG SRAM and the DATA SRAM are the same, the read operations are performed simultaneously, and the comparison circuit and the self-timing circuit are bypassed;
in the self-timing serial access mode, the internal clock of the TAG SRAM is consistent with the external clock and the TAG SRAM is read first; when the TAG SRAM hits, the comparison circuit issues the HIT signal, and the self-timing circuit tracks the read operation time of the TAG SRAM and generates the internal clock; the access mode switching circuit selects the clock generated by the self-timing circuit and sends it to the internal clock input of the DATA SRAM, and the DATA SRAM generates its read operation signals from this internal clock, performs the read operation and reads out the data of the DATA Cache; if the TAG SRAM misses, the internal clock of the DATA SRAM is not generated, and the DATA Cache is not accessed.
Further, the access mode switching circuit adjusts the access mode of the Cache in time according to the monitored CPU load:
when the CPU is in a high-load scenario, the parallel access mode is adopted;
when the CPU is in a low-load scenario, self-timing serial access is adopted;
through the self-timing circuit and the comparison circuit, once the TAG SRAM has been read and hits, the internal clock of the DATA SRAM is started to perform the read operation, completing the self-timing serial access.
Further, self-timing serial access first accesses the TAG SRAM; when the data read from the TAG SRAM is valid and consistent with the CPU address, the internal clock of the DATA SRAM is started through the comparison circuit and the self-timing circuit to perform the read operation, and the data are read within one cycle; otherwise, the internal clock of the DATA SRAM is not started, and the DATA SRAM is not accessed.
Further, the output of the TAG SRAM is sent to the comparison circuit; if the TAG SRAM hits, the HIT signal is output and sent to the self-timing circuit; meanwhile, the sense amplifier enable signal SAE of the TAG SRAM is also sent to the self-timing circuit and is used for tracking the time at which the read operation of the TAG SRAM completes;
when WEN is at low level, write operation is performed on the Cache; when WEN is high level, read operation is performed on the Cache;
when the PLINE is at a low level, the load of the CPU is high, and parallel access is adopted; when the PLINE signal is at a high level, the CPU is indicated to be in a low-load condition, and self-timing serial access is adopted;
when writing operation is carried out, the internal clock of the DATA SRAM is consistent with the internal clock of the TAG SRAM, and parallel access is carried out; when reading operation is carried out, if the PLINE is '0', the internal clock of the DATA SRAM is consistent with the internal clock of the TAG SRAM, and the Cache carries out parallel access; when the PLINE is 1, an internal clock of the DATA SRAM is generated by the comparison circuit and the self-timing circuit, and the Cache performs self-timing serial access;
if and only if the TAG SRAM has finished reading its data and hits, the comparison circuit and the self-timing circuit trigger the internal clock of the DATA SRAM to perform the read operation of the DATA SRAM.
Compared with the prior art, the invention has the following remarkable effects:
1. the invention adjusts the access mode in time according to the load of the CPU: under high load, the parallel access mode is adopted, the reading and writing of the TAG SRAM and the DATA SRAM are completed simultaneously within one cycle, and the performance of the Cache is improved; under low load, self-timing serial access is adopted, and through the comparison circuit and the self-timing circuit, the internal clock of the DATA SRAM is started to read the DATA SRAM only once the TAG SRAM has been read and hits;
2. in the self-timing serial access mode, the self-timing circuit and the comparison circuit start the internal clock of the DATA SRAM for the read operation once the TAG SRAM has been completely read and hits; compared with the conventional two-cycle serial access mode, the self-timing serial access mode reads the data of the DATA SRAM within the same cycle, so performance is maintained while power consumption is reduced.
Drawings
FIG. 1 is a circuit diagram of an adaptive access scheme according to the present invention;
FIG. 2 is a flow chart of the adaptive access method of the present invention;
FIG. 3 is a waveform diagram illustrating an adaptive access scheme according to the present invention;
FIG. 4 is a diagram of an implementation of an adaptive access mode switching circuit according to the present invention;
FIG. 5 is a schematic diagram of a comparison circuit and a self-timing circuit in serial access according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
As shown in FIG. 1, the adaptive Cache access circuit of the present invention includes a load monitoring module, an access mode switching circuit, a self-timing circuit, a comparison circuit, and the TAG SRAMs and DATA SRAMs of a two-way set-associative Cache. The load monitoring module monitors the CPU load; the access mode switching circuit switches the access mode of the Cache; each way of the set-associative Cache comprises a TAG SRAM, four DATA SRAMs, a comparison circuit and a self-timing circuit; the self-timing circuit and the comparison circuit trigger the generation of the internal clock of the DATA SRAM during self-timing serial access; the TAG SRAM stores main memory addresses; and the DATA SRAM stores the data that the microprocessor needs to read and write. In each way of the instruction Cache, the output of the TAG SRAM is sent to the comparison circuit; if the TAG SRAM hits, the HIT signal is output and sent to the self-timing circuit, and meanwhile the sense amplifier enable signal SAE of the TAG SRAM is also sent to the self-timing circuit and is used for tracking the time at which the read operation of the TAG SRAM completes. When the TAG SRAM has been completely read and hits, the self-timing circuit outputs a high-level signal to the access mode switching circuit, which serves as the internal clock of the DATA SRAM during self-timing serial access; the other input signal of the access mode switching circuit is the external clock signal, which serves as the internal clock of the DATA SRAM during parallel access; the output signal PLINE of the load monitoring module is the selection signal of the access mode switching circuit, and the output signal of the access mode switching circuit is sent to the internal clock input of the DATA SRAM.
The access mode switching circuit switches between the parallel access mode and the self-timing serial access mode by selecting the internal clock of the DATA SRAM according to the output of the load monitoring module, so that the requirements of different scenarios are met in time. The parallel access mode refers to synchronous access of the TAG SRAM and the DATA SRAM, with both read at the same time; the serial access mode refers to asynchronous access of the TAG SRAM and the DATA SRAM: within the same clock cycle, the TAG SRAM is accessed first, and then the DATA SRAM is accessed.
In the parallel access mode, the internal clocks of the TAG SRAM and the DATA SRAM are the same, the read operations are performed simultaneously, and the comparison circuit and the self-timing circuit are bypassed; the access speed is high, which meets the requirements of high-performance scenarios.
In the self-timing serial access mode, the internal clock of the TAG SRAM is consistent with the external clock and the TAG SRAM is read first; when the TAG SRAM hits, the comparison circuit issues the HIT signal, and the self-timing circuit tracks the read operation time of the TAG SRAM and generates the internal clock. The access mode switching circuit selects the clock generated by the self-timing circuit and sends it to the internal clock input of the DATA SRAM; the DATA SRAM generates its read operation signals from this internal clock, performs the read operation and reads out the data of the DATA Cache. If the TAG SRAM misses, the internal clock of the DATA SRAM is not generated and the DATA Cache is not accessed. The access power consumption of the DATA SRAM in the miss way is thereby saved, which effectively reduces the access power consumption of the Cache in scenarios with low performance requirements.
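The saving described above can be illustrated with a simple count of how many ways have their DATA SRAMs clocked during a sequence of reads. The following Python sketch is only an illustration under stated assumptions: the function name `data_way_activations` and the example trace are hypothetical and do not come from the patent, and the count is in way activations, not in measured energy.

```python
from typing import List, Optional

def data_way_activations(hit_way_per_read: List[Optional[int]],
                         ways: int = 2,
                         serial: bool = False) -> int:
    """Count how many ways have their DATA SRAMs clocked over a read trace.

    hit_way_per_read holds, for each read, the index of the way whose TAG
    hits, or None when every way misses (hypothetical trace format).
    """
    total = 0
    for hit_way in hit_way_per_read:
        if serial:
            # Self-timing serial access: only the hit way gets a DATA clock.
            total += 0 if hit_way is None else 1
        else:
            # Parallel access: every way is clocked regardless of hit or miss.
            total += ways
    return total

trace = [0, 1, None, 0]  # three hits and one miss in a two-way Cache
print(data_way_activations(trace, serial=False))  # 8
print(data_way_activations(trace, serial=True))   # 3
```

Each way contains four DATA SRAMs according to the text, so every activation counted here stands for the DATA SRAMs of one way being clocked.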
The flow of the implementation method of the self-adaptive Cache access circuit of the invention is shown in FIG. 2 and comprises the following steps:
step S1, the processor issues an access request, and whether it is a read or a write is determined from the read-write signal issued by the processor;
step S2, for a write operation, the TAG SRAM and the DATA SRAM are accessed simultaneously, and data are written into the TAG SRAM and the DATA SRAM at the same time;
for a read operation, the CPU load is checked first: under high load, the Cache uses parallel access, and the TAG SRAM and the DATA SRAM are accessed simultaneously to read data regardless of whether the TAG Cache hits; under low load, the Cache uses self-timing serial access: when the TAG SRAM has been read and hits, the self-timing circuit generates the internal clock of the DATA SRAM to access the DATA SRAM, and the data in the DATA SRAM are read in the same cycle; when the TAG SRAM misses, no internal clock signal of the DATA SRAM is generated and the DATA SRAM is not accessed, which saves the DATA SRAM power consumption on a miss.
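The decision flow of steps S1 and S2 can be sketched as a small behavioural model. This is a minimal Python sketch, not the circuit itself: the function name `data_ways_clocked` and its argument layout are hypothetical, and the only signal convention taken from the text is that PLINE low means high load (parallel access) and PLINE high means low load (self-timing serial access).

```python
from typing import List

def data_ways_clocked(is_write: bool, pline: int, tag_hits: List[bool]) -> List[bool]:
    """Return, per way, whether the DATA SRAM receives an internal clock.

    is_write : True for a write operation, False for a read (step S1).
    pline    : 0 for high CPU load (parallel), 1 for low load (serial).
    tag_hits : per-way TAG comparison results (hypothetical input format).
    """
    if is_write:
        # Writes always access TAG SRAM and DATA SRAM together.
        return [True] * len(tag_hits)
    if pline == 0:
        # High load: parallel read, every way is clocked whether it hits or not.
        return [True] * len(tag_hits)
    # Low load: self-timing serial read, only a hitting way is clocked.
    return list(tag_hits)

print(data_ways_clocked(False, 0, [False, True]))   # [True, True]
print(data_ways_clocked(False, 1, [False, True]))   # [False, True]
print(data_ways_clocked(False, 1, [False, False]))  # [False, False]
```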
FIG. 3 shows the waveforms of the adaptive Cache access method of the present invention. As can be seen from the figure:
For a write operation, the internal clocks of the TAG SRAM and DATA SRAM of both ways are consistent with the external clock, and data are written into the TAG SRAM and DATA SRAM of both ways simultaneously.
For a read operation, if the CPU is under high load, the PLINE signal output by the CPU load monitoring module is low, the internal clocks of the TAG SRAM and DATA SRAM of both ways are consistent with the external clock, the read operations are performed simultaneously, and the data of the TAG SRAM and DATA SRAM of both ways are read out in parallel. If the CPU is under low load, the PLINE signal is high and the TAG SRAMs of the two Cache ways are accessed first; when a TAG SRAM has been read and the comparison result is a hit, the HIT signal of the hit way goes high and is sent to the self-timing circuit, which generates the wide pulse Q_pulse and sends it to the internal clock interface of the DATA SRAM; the DATA SRAM then starts its read operation and the data of the hit way are read out. The miss way does not generate DATA CLK, so its DATA SRAM is not accessed. When both ways miss, the internal clock DATA CLK of the DATA SRAM of both ways stays low and neither way is read or written. It can be seen that self-timing serial access saves the DATA SRAM read power consumption of the miss way.
As shown in FIG. 4, the adaptive access mode switching circuit of the present invention includes a two-to-one selector MUX_1 and a first AND gate AND_1. One input of the first AND gate AND_1 is connected to the read-write enable signal WEN of the SRAM, and the other input is connected to the serial-parallel selection signal PLINE. When WEN is low, the Cache performs a write operation, and when WEN is high, the Cache performs a read operation; when PLINE is low, the CPU load is high and parallel access is needed, and when PLINE is high, the CPU is under low load and self-timing serial access is adopted. The output of AND_1 is connected to the select input of the two-to-one selector MUX_1 as its selection signal. The '0' input of the selector is connected to the internal clock synchronous with the TAG SRAM; the '1' input is connected to the Q_pulse signal generated by the self-timing circuit; the output signal of MUX_1 is used as the internal clock signal of the DATA SRAM. Therefore, for a write operation, the internal clock of the DATA SRAM is consistent with the internal clock of the TAG SRAM, and parallel access is performed; for a read operation, if PLINE is "0", the internal clock of the DATA SRAM is consistent with the internal clock of the TAG SRAM, and the Cache performs parallel access; if PLINE is "1", the internal clock of the DATA SRAM is generated by the comparison circuit and the self-timing circuit, and the Cache performs self-timing serial access.
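The selection logic of FIG. 4 can be written as a short behavioural model. The Python sketch below is illustrative only: the function name `data_sram_clock` and the parameter name `tag_clk` are hypothetical, while WEN, PLINE, Q_pulse, AND_1 and MUX_1 follow the description above.

```python
def data_sram_clock(wen: int, pline: int, tag_clk: int, q_pulse: int) -> int:
    """Behavioural model of AND_1 followed by MUX_1.

    wen     : 0 = write, 1 = read.
    pline   : 0 = high load (parallel), 1 = low load (self-timing serial).
    tag_clk : internal clock synchronous with the TAG SRAM ('0' input of MUX_1).
    q_pulse : Q_pulse from the self-timing circuit ('1' input of MUX_1).
    """
    sel = wen & pline                    # AND_1: high only for a read under low load
    return q_pulse if sel else tag_clk   # MUX_1 selects the DATA SRAM internal clock

# Which clock source drives the DATA SRAM for each WEN/PLINE combination.
for wen in (0, 1):
    for pline in (0, 1):
        source = "Q_pulse" if (wen & pline) else "TAG-synchronous clock"
        print(f"WEN={wen} PLINE={pline} -> {source}")
```

Only the combination WEN = 1 and PLINE = 1 routes Q_pulse to the DATA SRAM, which matches the statement that writes and high-load reads use parallel access.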
As shown in FIG. 5, the comparison circuit of the present invention includes a first exclusive-OR gate XOR_1; one input of XOR_1 is connected to the CPU address, the other input is connected to the output data TAG_Q of the TAG SRAM, and XOR_1 outputs the comparison result HIT between the CPU address and TAG_Q. When the two are consistent, HIT is high; when they are inconsistent, HIT is low. The output signal HIT is sent to the self-timing circuit.
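For a multi-bit tag, one common way to model such a comparison in software is a bitwise XOR followed by a check that no bit differs. The text names only the gate XOR_1 and the polarity of HIT, so the zero-detect step in the Python sketch below is an assumption, as are the function name `tag_hit` and the example values.

```python
def tag_hit(cpu_tag: int, tag_q: int) -> int:
    """Return 1 (HIT high) when the CPU address tag equals TAG_Q, else 0.

    The bitwise XOR sets a 1 for every mismatching bit; HIT is taken as
    high only when no bit differs (assumed reduction, not from the text).
    """
    diff = cpu_tag ^ tag_q
    return 1 if diff == 0 else 0

print(tag_hit(0x1A3, 0x1A3))  # 1
print(tag_hit(0x1A3, 0x1A2))  # 0
```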
The self-timing circuit comprises a first inverter INV1, a second inverter INV2, a third inverter INV3, a second exclusive-OR gate XOR_2, a fourth inverter INV4, a fifth inverter INV5, a second AND gate AND_2, a sixth inverter INV6, a seventh inverter INV7, a first PMOS transistor MP1, a second PMOS transistor MP2, a third PMOS transistor MP3, a fourth PMOS transistor MP4, a fifth PMOS transistor MP5, a first NMOS transistor MN1, a second NMOS transistor MN2, a third NMOS transistor MN3, a fourth NMOS transistor MN4, a fifth NMOS transistor MN5 and a sixth NMOS transistor MN6.
The sense amplifier enable signal SAE of the TAG SRAM is connected to the input of the first inverter INV1, the output of INV1 is connected to the input of the second inverter INV2, the output of INV2 is connected to the input of the third inverter INV3, and INV3 outputs the Q_valid signal, which marks the end of the read operation of the TAG SRAM; the output of INV3 is connected to one input of the second exclusive-OR gate XOR_2, the other input of XOR_2 is held constantly low, the output of XOR_2 is connected to the input of the fourth inverter INV4, the output of INV4 is connected to the input of the fifth inverter INV5, and the output of INV5 marks the completion of the comparison operation. The output of INV5 is connected to one input of the second AND gate AND_2, and the other input of AND_2 is connected to the output signal HIT of the comparison circuit, generating the Q_en signal. When Q_en is low, the TAG SRAM has been completely read but has not hit, and the DATA SRAM does not perform a read operation; when Q_en is high, the TAG SRAM has been completely read and hits, and the internal clock signal of the DATA SRAM can be triggered. The pulse width of Q_en is too narrow for it to serve directly as the internal clock signal of the DATA SRAM, so Q_en passes through the sixth inverter INV6 to generate the COMPB signal. The source of the first PMOS transistor MP1 is connected to the power supply VDD, its drain is connected to the source of the second PMOS transistor MP2, and its gate is connected to SAEB, the inverse of SAE; the drain of MP2 is the output signal Q_pulse, and its gate is connected to the COMPB signal; the source of the third PMOS transistor MP3 is connected to the power supply VDD, its gate input is SAE, and its drain is connected to the source of the fourth PMOS transistor MP4; the gate input of MP4 is CLKB, the inverse of CLK, and its drain is connected to the fifth PMOS transistor MP5; the gate input of MP5 is Q_pulseb, the inverse of Q_pulse, and its drain is the Q_pulse signal; the drain of the first NMOS transistor MN1 is connected to Q_pulse, its gate input is CLKB, and its source is connected to the power ground VSS; the drain of the second NMOS transistor MN2 is connected to Q_pulse, its gate input is Q_pulseb, and its source is connected to the drain of the third NMOS transistor MN3; the gate input of MN3 is CLK, and its source is connected to the drain of the fourth NMOS transistor MN4; the gate input of MN4 is SAE, and its source is connected to the power ground VSS; the drain of the fifth NMOS transistor MN5 is connected to Q_pulse, its gate input is COMPB, the inverse of the COMP signal, and its source is connected to the sixth NMOS transistor MN6; the gate input of MN6 is SAE, and its source is connected to the power ground VSS; the output signal Q_pulse passes through the seventh inverter INV7 to generate the inverted signal Q_pulseb. The generated signal Q_pulse is a pulse whose rising edge follows the rising edge of Q_en and whose falling edge follows the falling edge of CLK; it is sent to the access mode switching circuit as the internal clock of the DATA SRAM when the read operation of the TAG SRAM has finished and hits.
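The pulse-widening behaviour described above (Q_pulse rises with Q_en and falls with CLK) can be sketched on sampled waveforms. The Python model below is a logic-level approximation only, assuming ideal edges and no gate delay; the function name `q_pulse_waveform` and the sample waveforms are hypothetical. It relies on the stated fact that MN1, gated by CLKB, pulls Q_pulse to VSS while CLK is low.

```python
from typing import List

def q_pulse_waveform(clk: List[int], q_en: List[int]) -> List[int]:
    """Logic-level model: Q_pulse is set by a rising edge of Q_en while CLK
    is high and cleared when CLK goes low (no transistor-level timing)."""
    out = []
    q = 0
    prev_en = 0
    for c, en in zip(clk, q_en):
        if c == 0:
            q = 0          # CLK low: MN1 (gate CLKB) holds Q_pulse low
        elif en == 1 and prev_en == 0:
            q = 1          # rising edge of Q_en: TAG read complete and hit
        out.append(q)      # otherwise Q_pulse keeps its previous value
        prev_en = en
    return out

# Two clock cycles: a hit with a one-sample Q_en pulse, then a miss.
clk  = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
q_en = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(q_pulse_waveform(clk, q_en))
# [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

In this example the one-sample Q_en pulse is stretched until CLK falls, and no pulse appears in the second cycle, where Q_en stays low.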
According to the invention, the access mode switching circuit adjusts the access mode of the Cache in a timely manner by monitoring the load of the CPU. When the CPU load is high, for example during web browsing or gaming, the parallel access mode is adopted to improve Cache performance; when the CPU load is low, for example during music playback or background timing, self-timing serial access is adopted to reduce the power consumed by the DATA SRAM when the Cache misses. In addition, in self-timing serial access, the self-timing circuit and the comparison circuit start the internal clock of the DATA SRAM for a read operation only when the TAG SRAM has been completely read and has hit.
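To make the power argument concrete, the following sketch counts DATA SRAM read activations in the two access modes over a sequence of read requests. It is an illustrative model under stated assumptions: the function name, the `trace` sequence, and the use of the number of DATA SRAM activations as a stand-in for power are all hypothetical and are not taken from the patent.

```python
def data_sram_reads(hits, serial):
    """Count DATA SRAM read activations for a list of hit/miss outcomes.

    hits   -- list of booleans, True when the TAG SRAM lookup hits
    serial -- True for self-timing serial access, False for parallel access
    """
    if serial:
        # Serial mode: the DATA SRAM clock is generated only after a hit.
        return sum(1 for h in hits if h)
    # Parallel mode: the DATA SRAM is read together with the TAG SRAM
    # on every request, whether or not the lookup hits.
    return len(hits)


# Hypothetical trace of eight read requests, three of which miss.
trace = [True, False, True, True, False, True, False, True]
parallel_reads = data_sram_reads(trace, serial=False)
serial_reads = data_sram_reads(trace, serial=True)
saved_reads = parallel_reads - serial_reads
```

Under this simplified model the saving equals the number of misses, which is why the serial mode is attractive when latency matters less than power.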

Claims (5)

1. A self-adaptive Cache access circuit, characterized by comprising a load monitoring module, an access mode switching circuit and a two-way set-associative Cache;
the load monitoring module is used for monitoring the CPU load;
the access mode switching circuit is used for switching the access mode of the Cache;
each way of the set-associative Cache comprises a TAG SRAM, four DATA SRAMs, a comparison circuit and a self-timing circuit;
the self-timing circuit and the comparison circuit are used for triggering the generation of an internal clock of the DATA SRAM in self-timing serial access;
the TAG SRAM is used for storing a main memory address;
the DATA SRAMs are used for storing data to be read and written by the microprocessor;
when the TAG SRAM has been completely read and has hit, the self-timing circuit outputs a high-level signal, which is sent to the access mode switching circuit to serve as the internal clock of the DATA SRAM during self-timing serial access; the other input signal of the access mode switching circuit is an external clock signal, which serves as the internal clock of the DATA SRAM during parallel access; the output data PLINE of the load monitoring module serves as the selection signal of the access mode switching circuit, and the clock signal selected by the access mode switching circuit is sent to the clock terminal of the DATA SRAM;
the access mode switching circuit comprises a two-to-one selector (MUX_1) and a first AND gate (AND_1);
one input of the first AND gate (AND_1) receives the read-write enable signal WEN of the SRAM, and the other input receives the serial-parallel selection signal PLINE; the output of the first AND gate (AND_1) is connected to the select input of the two-to-one selector (MUX_1) and serves as its selection signal; the '0' input of the two-to-one selector (MUX_1) is connected to the internal clock synchronous with the TAG SRAM, the '1' input is connected to the Q_pulse signal generated by the self-timing circuit, and the output signal of the two-to-one selector (MUX_1) serves as the internal clock signal of any one DATA SRAM;
the comparison circuit comprises a first exclusive-OR gate (XOR_1), one input of which is connected to the CPU address and the other input of which is connected to the output data TAG_Q of the TAG SRAM; the first exclusive-OR gate (XOR_1) outputs the comparison result HIT of the CPU address and TAG_Q: when the two are consistent, HIT is high; when they are inconsistent, HIT is low; the output signal HIT is sent to the self-timing circuit;
the self-timing circuit comprises a first inverter (INV1), a second inverter (INV2), a third inverter (INV3), a second exclusive-OR gate (XOR_2), a fourth inverter (INV4), a fifth inverter (INV5), a second AND gate (AND_2), a sixth inverter (INV6), a seventh inverter (INV7), a first PMOS transistor (MP1), a second PMOS transistor (MP2), a third PMOS transistor (MP3), a fourth PMOS transistor (MP4), a fifth PMOS transistor (MP5), a first NMOS transistor (MN1), a second NMOS transistor (MN2), a third NMOS transistor (MN3), a fourth NMOS transistor (MN4), a fifth NMOS transistor (MN5) and a sixth NMOS transistor (MN6);
the sense amplifier enable signal SAE of the TAG SRAM is connected to the input of the first inverter (INV1), the output of the first inverter (INV1) is connected to the input of the second inverter (INV2), the output of the second inverter (INV2) is connected to the input of the third inverter (INV3), and the third inverter (INV3) outputs the Q_valid signal, which marks the end of the read operation of the TAG SRAM; the output of the third inverter (INV3) is connected to one input of the second exclusive-OR gate (XOR_2), the other input of the second exclusive-OR gate (XOR_2) is held constantly low, the output of the second exclusive-OR gate (XOR_2) is connected to the input of the fourth inverter (INV4), the output of the fourth inverter (INV4) is connected to the input of the fifth inverter (INV5), and the output of the fifth inverter (INV5) marks the completion of the comparison operation;
the output of the fifth inverter (INV5) is connected to one input of the second AND gate (AND_2), and the other input of the second AND gate (AND_2) is connected to the output signal HIT of the comparison circuit, so as to generate the Q_en signal; the Q_en signal passes through the sixth inverter (INV6) to generate the COMPB signal; the source of the first PMOS transistor (MP1) is connected to the power supply VDD, its drain is connected to the source of the second PMOS transistor (MP2), and its gate is connected to SAEB, the inverse of SAE; the drain of the second PMOS transistor (MP2) is the output signal Q_pulse, and its gate is connected to the COMPB signal; the source of the third PMOS transistor (MP3) is connected to the power supply VDD, its gate input is SAE, and its drain is connected to the source of the fourth PMOS transistor (MP4); the gate input of the fourth PMOS transistor (MP4) is CLKB, the inverse of CLK, and its drain is connected to the source of the fifth PMOS transistor (MP5); the gate input of the fifth PMOS transistor (MP5) is Q_pulseb, the inverse of Q_pulse, and its drain is the Q_pulse signal;
the drain of the first NMOS transistor (MN1) is connected to Q_pulse, its gate input is CLKB, and its source is connected to the power ground VSS; the drain of the second NMOS transistor (MN2) is connected to Q_pulse, its gate input is Q_pulseb, and its source is connected to the drain of the third NMOS transistor (MN3); the gate input of the third NMOS transistor (MN3) is CLK, and its source is connected to the drain of the fourth NMOS transistor (MN4); the gate input of the fourth NMOS transistor (MN4) is SAE, and its source is connected to the power ground VSS; the drain of the fifth NMOS transistor (MN5) is connected to Q_pulse, its gate input is COMPB, the inverse of COMP, and its source is connected to the drain of the sixth NMOS transistor (MN6); the gate input of the sixth NMOS transistor (MN6) is SAE, and its source is connected to the power ground VSS; the output signal Q_pulse passes through the seventh inverter (INV7) to generate the inverse signal Q_pulseb; the Q_pulse signal is sent to the access mode switching circuit.
2. The method for implementing the self-adaptive Cache access circuit according to claim 1, wherein the access mode switching circuit selects a different internal clock for the DATA SRAM according to the result output by the load monitoring module, so as to switch between a parallel access mode and a self-timing serial access mode;
in the parallel access mode, the internal clocks of the TAG SRAM and the DATA SRAM are the same, the reading operation is carried out simultaneously, and the comparison circuit and the self-timing circuit are bypassed;
in the self-timing serial access mode, the internal clock of the TAG SRAM is consistent with the external clock and the TAG SRAM is read first; when the TAG SRAM hits, the comparison circuit outputs the HIT signal, and the self-timing circuit tracks the read operation time of the TAG SRAM and generates the internal clock; the access mode switching circuit selects the clock generated by the self-timing circuit and sends it to the internal clock terminal of the DATA SRAM, and the DATA SRAM generates its read operation signals from this internal clock to perform the read operation and read out the data of the data Cache; if the data Cache misses, the internal clock of the DATA SRAM is not generated, and the data Cache is not accessed further.
3. The method for implementing the self-adaptive Cache access circuit according to claim 2, wherein the access mode switching circuit adjusts the access mode of the Cache in a timely manner by monitoring the load of the CPU:
when the CPU is in a high-load scene, a parallel access mode is adopted;
when the CPU is in a low-load scene, self-timing serial access is adopted;
when the TAG SRAM has been read out and has hit, the self-timing circuit and the comparison circuit start the internal clock of the DATA SRAM to carry out the read operation, completing the self-timing serial access.
4. The method for implementing the self-adaptive Cache access circuit according to claim 2, wherein the self-timing serial access first accesses the TAG SRAM; when the data read from the TAG SRAM is consistent with the CPU address and valid, the internal clock of the DATA SRAM is started through the comparison circuit and the self-timing circuit to perform the read operation, so that the data can be read within one cycle; otherwise, the internal clock of the DATA SRAM is not started, and the DATA SRAM is not accessed.
5. The method of claim 2, wherein the output of the TAG SRAM is sent to the comparison circuit; if the TAG SRAM hits, the HIT signal is output to the self-timing circuit, and the SAE signal is also sent to the self-timing circuit to track the time of the TAG SRAM read operation;
when WEN is at low level, write operation is performed on the Cache; when WEN is high level, read operation is performed on the Cache;
when PLINE is at a low level, the CPU is under high load and parallel access is adopted; when PLINE is at a high level, the CPU is under low load and self-timing serial access is adopted;
when a write operation is performed, the internal clock of the DATA SRAM is consistent with the internal clock of the TAG SRAM, and parallel access is performed; when a read operation is performed, if PLINE is '0', the internal clock of the DATA SRAM is consistent with the internal clock of the TAG SRAM, and the Cache performs parallel access; if PLINE is '1', the internal clock of the DATA SRAM is generated by the comparison circuit and the self-timing circuit, and the Cache performs self-timing serial access;
if and only if the TAG SRAM has finished reading and has hit, the comparison circuit and the self-timing circuit trigger the internal clock of the DATA SRAM to carry out the read operation of the DATA SRAM.
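The selection rules in claim 5 reduce to a small truth table. The sketch below is a hedged behavioural model and not the claimed circuit: the function name and the string labels are hypothetical, WEN and PLINE are treated as 0/1 values, and the select signal is assumed to be WEN AND PLINE, as recited for the access mode switching circuit in claim 1.

```python
def data_sram_clock_source(wen, pline):
    """Return which clock drives the DATA SRAM for given WEN and PLINE levels.

    wen   -- 0 for a write operation, 1 for a read operation
    pline -- 0 for high CPU load (parallel), 1 for low CPU load (serial)
    """
    if wen not in (0, 1) or pline not in (0, 1):
        raise ValueError("wen and pline must be 0 or 1")
    select = wen & pline  # models AND_1 driving the select input of MUX_1
    # MUX_1: the '0' input is the clock shared with the TAG SRAM,
    # the '1' input is Q_pulse from the self-timing circuit.
    return "Q_pulse" if select else "TAG_clock"


# Enumerate all four combinations of (WEN, PLINE).
table = {(wen, pline): data_sram_clock_source(wen, pline)
         for wen in (0, 1) for pline in (0, 1)}
```

Only a read under low load selects the self-timed clock; writes always use the clock shared with the TAG SRAM, regardless of PLINE.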

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210674360.4A CN114758687B (en) 2022-06-15 2022-06-15 Self-adaptive Cache access circuit and implementation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210674360.4A CN114758687B (en) 2022-06-15 2022-06-15 Self-adaptive Cache access circuit and implementation method thereof

Publications (2)

Publication Number Publication Date
CN114758687A CN114758687A (en) 2022-07-15
CN114758687B true CN114758687B (en) 2022-09-02

Family

ID=82336330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210674360.4A Active CN114758687B (en) 2022-06-15 2022-06-15 Self-adaptive Cache access circuit and implementation method thereof

Country Status (1)

Country Link
CN (1) CN114758687B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02170248A (en) * 1988-12-23 1990-07-02 Hitachi Ltd Disk access control method for disk controller containing cache
CN201540564U (en) * 2009-12-21 2010-08-04 东南大学 Dynamic distribution circuit for distributing on-chip heterogenous storage resources by utilizing virtual memory mechanism
CN106708747A (en) * 2015-11-17 2017-05-24 深圳市中兴微电子技术有限公司 Memory switching method and device

Also Published As

Publication number Publication date
CN114758687A (en) 2022-07-15

Similar Documents

Publication Publication Date Title
US6134180A (en) Synchronous burst semiconductor memory device
KR100274731B1 (en) Synchronous dram whose power consumption is minimized
US5808961A (en) Internal clock generating circuit for clock synchronous type semiconductor memory device
Chang et al. Zero-aware asymmetric SRAM cell for reducing cache power in writing zero
Dreslinski et al. Reconfigurable energy efficient near threshold cache architectures
US6961276B2 (en) Random access memory having an adaptable latency
KR100540964B1 (en) Semiconductor device
US11221665B2 (en) Static power reduction in caches using deterministic naps
KR100459726B1 (en) Data inversion circuit of multi-bit pre-fetch semiconductor device and method there-of
US7649764B2 (en) Memory with shared write bit line(s)
CN114758687B (en) Self-adaptive Cache access circuit and implementation method thereof
Lee et al. A selective filter-bank TLB system
US9601191B2 (en) Exploiting phase-change memory write asymmetries to accelerate write
Karl et al. Timing error correction techniques for voltage-scalable on-chip memories
JPH0845275A (en) Control method for memory reading and writing, and memory device using the method
KR100398954B1 (en) Multi-way set associative cache memory and data reading method therefrom
CN101158926B (en) Apparatus and method for saving power in a trace cache
Masgonty et al. Low-power sram and rom memories
CN114121112A (en) Bubble collapse register in semiconductor device
US6862242B2 (en) SRAM control circuit with a power saving function
CN110634517A (en) High-performance static random access memory
Kumar et al. A 0.47 V-1.17 V 32KB Timing Speculative SRAM in 28nm HKMG CMOS
Pon Technology scaling impact on NOR and NAND flash memories and their applications
CN105608021A (en) Storage device and method capable of utilizing content addressing MRAM (Magnetic Random Access Memory)
JP3559631B2 (en) Semiconductor memory and data processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant