CN116166606B - Cache control architecture based on shared tightly coupled memory - Google Patents


Publication number: CN116166606B
Application number: CN202310431873.7A
Authority: CN (China)
Prior art keywords: data, instruction, read, address, dcache
Legal status: Active (granted)
Other versions: CN116166606A (Chinese)
Inventors: 郑茳, 肖佐楠, 匡启和, 陈石, 王惠忠, 邹海春
Assignee (current and original): Wuxi Guoxin Micro Hi Tech Co ltd
Application filed by Wuxi Guoxin Micro Hi Tech Co ltd; priority to CN202310431873.7A

Classifications

    • G: PHYSICS; G06F: Electric digital data processing
    • G06F 15/167: Interprocessor communication using a common memory, e.g. mailbox
    • G06F 3/061: Improving I/O performance
    • G06F 3/0629: Configuration or reconfiguration of storage systems
    • G06F 3/064: Management of blocks
    • G06F 3/0656: Data buffering arrangements
    • G06F 3/0658: Controller construction arrangements
    • G06F 3/0659: Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F 3/0679: Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a Cache control architecture based on a shared tightly coupled memory (TCM), which relates to the field of embedded processors. A shared controller is arranged between the TCM and the Cache: the data sharing control module of the shared controller performs data addressing, determines warm data in the DCache and writes data to be read and written into the DTCM; the data cache control module performs read-write control on the DCache according to the addressing result; the instruction sharing control module performs instruction addressing, determines warm instructions in the ICache and writes instructions to be read into the ITCM; and the instruction cache control module performs read-write control on the ICache according to the addressing result. The architecture uses the idle space of the TCM as an extension space of the Cache and performs partition control and scheduling, so that the Core can read and write more data and instructions from the Cache and the TCM, reducing the frequency and time of accessing the main memory.

Description

Cache control architecture based on shared tightly coupled memory
Technical Field
The embodiment of the application relates to the field of embedded processors, in particular to a cache control architecture based on a shared tightly coupled memory.
Background
Chips built around embedded high-speed Central Processing Units (CPUs) are widely applied in fields such as automotive control and industrial control, and these application scenarios place strict requirements on the computing capacity and real-time response of the CPU. The instructions and data required for CPU operation are stored in memory, and the speed of memory access affects the actual operating speed of the CPU. To make the speed of memory access match the clock frequency of the CPU, CPUs generally adopt a hierarchical memory architecture: the closer a memory is to the processor core (Core), the faster, smaller and more expensive it is.
Mainstream CPUs adopt a Cache as a bridge between the Core and the main memory; the read-write speed of the Cache is hundreds of times that of the main memory. Logically, the Cache is a subset of the main memory: it is a real-time mirror of a portion of the main memory address space. The Cache is divided into an instruction Cache (ICache) and a data Cache (DCache), which store instructions and data respectively. Some instructions need to be read frequently in actual operation, such as interrupt handlers and encryption algorithms, and some data needs to be read and written frequently, such as stacks and interrupt vectors. To further accelerate the reading and writing of such instructions and data, high-speed CPUs adopt a Tightly Coupled Memory (TCM) to store them; the read-write speed of the TCM is the same as that of the Cache. Logically, the TCM is a complement to the main memory: the TCM address range and the main-memory-accessible address range are disjoint. The TCM is divided into an Instruction Tightly Coupled Memory (ITCM) and a Data Tightly Coupled Memory (DTCM). Both the Cache and the TCM typically use Static Random Access Memory (SRAM), while the main memory typically uses Dynamic Random Access Memory (DRAM); the former has a read-write speed about 100 times that of the latter, and the latter in turn is about 100 times faster than nonvolatile Flash memory.
In the design of a chip storage architecture, the sizes of the Cache and the TCM are determined according to the specific application scenario of the chip, and software allocates common instructions and data to be written into the ITCM and the DTCM. However, when the application scenario of the chip changes, that is, when the instructions and data operated on by the CPU differ, the read-write load of the Cache often becomes mismatched with that of the TCM, reducing the access efficiency of the processor. Three countermeasures are generally adopted:
1. Redesign the chip, resize the Cache and the TCM, and tape out again; however, this approach incurs significant cost overhead.
2. Optimize the control strategy of the Cache, such as adjusting or adding Cache groupings and tuning the replacement and write-back strategies. This is essentially trading space against time: fixing one bottleneck tends to surface another (a whack-a-mole effect), so the overall memory access efficiency improves only slightly.
3. Adjust software compilation to change the instructions and data placed in the TCM, so that as much of the TCM space as possible is used. However, the TCM has no scheduling strategy and can only be read and written passively, and the access frequency of instructions and data cannot be accurately predicted at compile time, so the read-write speed cannot be substantially optimized.
Disclosure of Invention
The embodiment of the application provides a cache control architecture based on a shared tightly coupled memory, which further improves the memory access efficiency of the processor. Specifically, the cache control architecture comprises a data tightly coupled memory DTCM, an instruction tightly coupled memory ITCM, a data cache DCache, an instruction cache ICache, a main memory and a shared controller; the shared controller is arranged between the main memory on one side and the tightly coupled memories and caches on the other, connected through a data bus and an instruction bus, and comprises:
the data sharing control module, configured to address the DCache, the DTCM and the main memory according to a target data address, determine warm data in the DCache and transfer it into the DTCM, and write the address block of the data to be read and written into the DTCM; the warm data is determined based on the data replacement frequency for the target data address;
the data cache control module, configured to perform read-write control on the DCache according to the addressing result of the target data address;
the instruction sharing control module, configured to address the ICache, the ITCM and the main memory according to a target instruction address, determine warm instructions in the ICache, transfer them into the ITCM and write the address block of the instruction to be read into the ITCM; the warm instructions are determined based on the instruction replacement frequency for the target instruction address;
and the instruction cache control module, configured to perform read-write control of the target instruction on the ICache according to the addressing result of the target instruction address.
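The division of labor among these four modules can be sketched as a minimal Python model (class and method names are illustrative assumptions, not from the patent; memories are modeled as plain dictionaries mapping addresses to values):

```python
class SharedController:
    """Illustrative model of the four control modules described above."""

    def __init__(self, dcache, dtcm, icache, itcm, main_memory):
        self.dcache, self.dtcm = dcache, dtcm
        self.icache, self.itcm = icache, itcm
        self.main_memory = main_memory

    def read_data(self, addr):
        # Data sharing control module: address the DTCM and DCache first.
        for store in (self.dtcm, self.dcache):
            if addr in store:
                return store[addr]          # hit: serve from fast memory
        value = self.main_memory[addr]      # miss: fall back to main memory
        self.dcache[addr] = value           # data cache control module refills
        return value

    def read_instruction(self, addr):
        # Instruction path mirrors the data path (ITCM/ICache instead).
        for store in (self.itcm, self.icache):
            if addr in store:
                return store[addr]
        value = self.main_memory[addr]
        self.icache[addr] = value
        return value
```

In this sketch a TCM hit and a cache hit both avoid the main-memory access, which is the whole point of treating the TCM's idle space as a cache extension.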
Specifically, the storage area of the DTCM is divided into a data main area, a data to-be-read area, a data to-be-written area and a data expansion area;
the data main area is configured to store core data frequently read by the CPU;
the data to-be-read area is configured to store the to-be-read-and-written data read back from the main memory;
the data to-be-written area is configured to store synchronization data in the DCache that is to be updated to the main memory;
the data expansion area is configured to store the warm data transferred from the DCache.
Specifically, the memory area of the ITCM is divided into an instruction main area, an instruction to-be-read area and an instruction expansion area;
the instruction main area is configured to store core instructions frequently read by a CPU;
the instruction to-be-read area is configured to store the to-be-read instruction read back by the ICache from the main memory;
the instruction expansion area is configured to store the warm instructions transferred from the ICache.
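The partition scheme of both TCMs can be summarized in a short Python sketch (field names are illustrative; the asymmetry matches the text above: the ITCM has no to-be-written area because instructions are never written back):

```python
from dataclasses import dataclass, field

@dataclass
class DTCMLayout:
    """Functional partitions of the DTCM (region names follow the text)."""
    main_area: dict = field(default_factory=dict)      # core data, software-allocated
    to_be_read: dict = field(default_factory=dict)     # blocks read back from main memory
    to_be_written: dict = field(default_factory=dict)  # dirty data awaiting sync to main memory
    extension: dict = field(default_factory=dict)      # warm data transferred from DCache

@dataclass
class ITCMLayout:
    """Partitions of the ITCM; read-only, so no to-be-written area."""
    main_area: dict = field(default_factory=dict)      # core instructions
    to_be_read: dict = field(default_factory=dict)     # blocks read back from main memory
    extension: dict = field(default_factory=dict)      # warm instructions from ICache
```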
Specifically, the data sharing control module comprises a data address index unit, a data state flag unit and a DTCM control unit;
The data address indexing unit is configured to address the target data address in the DTCM and the DCache, and to index the target data and the address block of the data to be read and written from the main memory when addressing in the DTCM and the DCache fails;
the data state flag unit is configured to record the number of consecutive DCache addressing failures, the warm data and the dirty-bit data; the dirty-bit data is generated when addressing the DTCM fails and the target data is written into the DCache;
the DTCM control unit is configured to: when a data read request is received, read the target data from the DTCM according to the addressing result, write the warm data in the DCache into the data expansion area, and write the read-back to-be-read-and-written data into the data to-be-read area; when a data write request is received, write the target data into the DTCM according to the addressing result, transfer the warm data in the DCache to the data expansion area, and synchronize the dirty-bit data in the DCache to the data to-be-written area.
Specifically, the instruction sharing control module comprises an instruction address index unit, an instruction state flag unit and an ITCM control unit;
The instruction address indexing unit is configured to address the target instruction address in the ITCM and the ICache, and to index the target instruction and the address block of the instruction to be read from the main memory when addressing in the ITCM and the ICache fails;
the instruction state flag unit is configured to record the number of consecutive ICache addressing failures and the warm instructions;
the ITCM control unit is configured to read the target instruction from the ITCM according to the addressing result, transfer the warm instructions in the ICache to the instruction expansion area, and write the read-back instructions into the instruction to-be-read area.
Specifically, the DCache has a set-associative structure and comprises at least two DCache groups; when addressing the DTCM and the DCache fails, the data state flag unit determines the tag data with the lowest replacement frequency in a DCache group as the warm data based on the addressing result of the target data address;
and the data address index unit addresses the DCache and the DTCM based on a cache line index and a data tag address.
Specifically, the ICache has a set-associative structure and comprises at least two ICache groups; when addressing the ITCM and the ICache fails, the instruction state flag unit determines the tag instruction with the lowest replacement frequency in an ICache group as the warm instruction based on the addressing result of the target instruction address;
and the instruction address index unit addresses the ICache and the ITCM based on a cache line index and an instruction tag address.
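The cache-line-index-plus-tag addressing used by both index units can be illustrated with a small example (the 32-byte line size and 128 sets are assumed parameters for illustration, not values from the patent):

```python
LINE_BYTES = 32      # cache line size (illustrative assumption)
NUM_SETS   = 128     # number of sets (illustrative assumption)

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 5 bits of byte offset
INDEX_BITS  = NUM_SETS.bit_length() - 1     # 7 bits of cache line index

def split_address(addr: int):
    """Split an address into (tag, cache line index, byte offset).

    The index selects a set; the tag is compared against the stored
    tag addresses of each group (way) in that set to detect a hit.
    """
    offset = addr & (LINE_BYTES - 1)
    index  = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```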
Specifically, based on the target data address of a data read request, when addressing the DTCM and the DCache fails, the warm data in the DCache is transferred to the data expansion area, and the number of consecutive DCache addressing failures is read;
when the number of consecutive failures does not exceed a set threshold, the target data is read from the main memory through the data cache control module and written into the target data address of the DCache;
when the number of consecutive failures exceeds the set threshold, the target data is read from the main memory and written into the target data address of the DCache, and the address block of the data to be read and written is read from the main memory through the DTCM control unit and written into the data to-be-read area; the data to be read and written and the target data are stored in consecutive address blocks of the main memory.
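The threshold-gated read-miss policy described above can be sketched in Python (the threshold of 2, the prefetch depth of 4 blocks, and resetting the counter after a prefetch are illustrative assumptions; the patent leaves these as a "set threshold" and an "address block"):

```python
MISS_THRESHOLD = 2   # stand-in for the patent's "set threshold" (assumption)
PREFETCH_LINES = 4   # consecutive blocks pulled into the to-be-read area (assumption)

def handle_read_miss(addr, dcache, dtcm_to_be_read, main_memory, miss_counter):
    """Read-miss policy: always refill the DCache; after repeated misses,
    additionally pull the consecutive address block into the DTCM."""
    miss_counter += 1
    data = main_memory[addr]
    dcache[addr] = data                       # refill via data cache control module
    if miss_counter > MISS_THRESHOLD:
        # DTCM control unit reads the consecutive block into the to-be-read area.
        for a in range(addr + 1, addr + 1 + PREFETCH_LINES):
            if a in main_memory:
                dtcm_to_be_read[a] = main_memory[a]
        miss_counter = 0                      # assumption: counter resets after prefetch
    return data, miss_counter
```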
Specifically, based on the target data address of a data write request, when addressing hits the data expansion area or the data to-be-read area, the target data address is transferred from the hit area to the data to-be-written area, and the target data is written;
when addressing hits the DCache, the target data is written into the target data address in the DCache, the written target data is marked as dirty-bit data, and the dirty-bit data is synchronized to the data to-be-written area through the DTCM control unit;
when addressing of the DCache and the DTCM fails, the warm data in the DCache is determined and transferred to the data expansion area of the DTCM, the target data is written into the target data address in the DCache and marked as dirty-bit data, and the dirty-bit data is synchronized to the data to-be-written area through the DTCM control unit.
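The three write cases can be modeled as follows (a minimal sketch; the warm-line selection is stood in by a simple `min` over addresses, whereas the patent selects the line with the lowest replacement frequency):

```python
def handle_write(addr, value, dcache, dtcm, warm_select=min):
    """Write policy for the three cases above (region handling simplified):
    hit in the DTCM extension or to-be-read area -> move to the to-be-written area;
    hit in the DCache -> write and mark dirty, mirror into the to-be-written area;
    miss everywhere -> evict a warm line to the extension area, then write."""
    for region in ("extension", "to_be_read"):
        if addr in dtcm[region]:
            dtcm[region].pop(addr)
            dtcm["to_be_written"][addr] = value
            return "dtcm-hit"
    if addr in dcache:
        dcache[addr] = value
        dtcm["to_be_written"][addr] = value   # dirty data synced by DTCM control unit
        return "dcache-hit"
    if dcache:                                # miss: make room by moving a warm line
        victim = warm_select(dcache)          # stand-in for lowest-frequency selection
        dtcm["extension"][victim] = dcache.pop(victim)
    dcache[addr] = value
    dtcm["to_be_written"][addr] = value
    return "miss"
```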
Specifically, based on the target instruction address of an instruction read request, when addressing the ITCM and the ICache fails, the warm instructions in the ICache are transferred to the instruction expansion area, and the number of consecutive ICache addressing failures is read;
when the number of consecutive failures does not exceed a set threshold, the target instruction is read from the main memory through the instruction cache control module and written into the target instruction address of the ICache;
when the number of consecutive failures exceeds the set threshold, the target instruction is read from the main memory and written into the target instruction address of the ICache, and the address block of the instruction to be read is read from the main memory through the ITCM control unit and written into the instruction to-be-read area; the instruction to be read and the target instruction are stored in consecutive address blocks of the main memory.
Specifically, when the CPU clock is idle or the data in the data to-be-written area reaches a storage threshold, the synchronization data temporarily stored in the data to-be-written area is written into the main memory by the DTCM control unit, and the dirty-bit data in the DCache is deleted by the data cache control module.
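This write-back condition can be sketched as follows (the threshold value of 4 entries is an assumption; the patent only requires "CPU idle or storage threshold reached"):

```python
WRITE_BACK_THRESHOLD = 4   # illustrative storage threshold for the to-be-written area

def maybe_drain(to_be_written, dcache_dirty, main_memory, cpu_idle):
    """Flush the to-be-written area when the CPU is idle or the area is full,
    then clear the corresponding dirty entries in the DCache."""
    if not (cpu_idle or len(to_be_written) >= WRITE_BACK_THRESHOLD):
        return False
    main_memory.update(to_be_written)   # DTCM control unit writes back
    for addr in to_be_written:
        dcache_dirty.discard(addr)      # data cache control module clears dirty bits
    to_be_written.clear()
    return True
```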
Specifically, the space of the data to-be-read area is N1 times the size of a DCache cache line block, and the data to-be-written area has the same size as the data to-be-read area; N1 takes the value 16, 32, 64 or 128.
Specifically, the space of the instruction to-be-read area is N2 times the size of an ICache cache line block, and N2 takes the value 16, 32, 64 or 128.
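The resulting region sizes follow directly from N1, N2 and the cache line size; for example, with a 32-byte line, N1 = 64 and N2 = 32 (values chosen here for illustration only):

```python
def region_sizes(line_bytes, n1, n2):
    """Byte sizes of the TCM transfer regions as multiples of the line size."""
    assert n1 in (16, 32, 64, 128) and n2 in (16, 32, 64, 128)
    data_to_be_read = n1 * line_bytes        # data to-be-read area
    data_to_be_written = data_to_be_read     # same size as the to-be-read area
    instr_to_be_read = n2 * line_bytes       # instruction to-be-read area
    return data_to_be_read, data_to_be_written, instr_to_be_read
```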
The beneficial effects of the technical scheme provided by the embodiment of the application include at least the following: a shared controller is added to the traditional CPU hierarchical storage structure, the ITCM and the DTCM are partitioned so that their idle areas are fully utilized, and they are jointly managed with the ICache and the DCache to realize the extension of data and instruction space; when addressing of the Cache fails, the shared controller transfers the warm data/instructions in the Cache to the idle partitions in the TCM and reads external data into the idle partitions of the Cache and the TCM, so that the Cache operates efficiently and the Core can read and write more data from the Cache and the TCM, reducing the number and duration of main memory accesses.
From the system level, the control architecture improves the read-write access efficiency of the last-level storage of the Core and improves the overall computing and control performance of the CPU: cache performance is optimized, the miss rate is reduced, miss overhead is lowered, and hit time is shortened. From the hardware level, the shared controller realizes sharing and dynamic scheduling of the storage space, broadens the application scenarios of the CPU chip, and reduces the cost of repeated development and production of similar chips.
Drawings
FIG. 1 is a schematic diagram of a hierarchical memory structure of a CPU in the related art;
FIG. 2 is a schematic diagram of a cache control architecture based on shared tightly coupled memory provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a shared memory provided in an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of the partitioning of DTCM and ITCM provided by an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of the architecture of ICache and ITCM addressing indexes provided by exemplary embodiments of the present application;
FIG. 6 is a flow chart of a CPU reading target data provided in an exemplary embodiment of the present application;
FIG. 7 is a flow chart of a CPU read target instruction provided in an exemplary embodiment of the present application;
fig. 8 is a schematic flow chart of writing target data by a CPU according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
References herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
Fig. 1 is a schematic diagram of a hierarchical storage structure of a CPU in the related art. This structure, commonly used in embedded CPUs, includes a processor core (Core), an instruction cache (ICache), a data cache (DCache), an Instruction Tightly Coupled Memory (ITCM), a Data Tightly Coupled Memory (DTCM), a main memory, and nonvolatile Flash memory. Core is not a specific component: to highlight the memory structure, the non-memory parts of the CPU, such as the arithmetic logic unit (ALU), the Program Counter (PC) and the Register Set (REGs), are collectively referred to herein as Core.
After the system is powered on, common instructions and data are read directly from Flash and stored in the ITCM and the DTCM through channels 101 and 103, respectively. Core reads core instructions in the ITCM via channel 102 and core data in the DTCM via channel 104. For instructions not in the ITCM, Core initiates a read request to the ICache via channel 107; if the ICache hits, Core reads directly; if the ICache misses, a request is initiated to the main memory via channel 106. For data not in the DTCM, Core initiates a read-write request to the DCache through channel 109; if the DCache hits, Core reads and writes directly; if the DCache misses, a read-write request is initiated to the main memory through channel 108. If necessary, the main memory initiates a read-write request to Flash through channel 105.
It should be noted that which core instructions are stored in the ITCM and which core data are stored in the DTCM is determined and allocated at software compile time. However, compilation cannot fully predict which instructions and data will be frequently read and written when the program actually runs, especially in scenarios with high real-time requirements. In actual operation, a large part of the storage space of the ITCM and the DTCM often remains idle, or the code and data placed in them are read less frequently than code and data that were not placed there, producing substantial waste and reducing the working efficiency of the system when reading data and instructions.
FIG. 2 is a schematic diagram of a cache control architecture based on a shared tightly coupled memory according to an embodiment of the present application, in which a shared controller is added to the conventional CPU hierarchical storage structure. The shared controller is arranged between the main memory and the TCM and Cache through a data bus and an instruction bus, changing the instruction and data paths of the original system architecture. As shown in FIG. 2, Core reads core instructions in the ITCM through channel 202, reads core data in the DTCM through channel 204, initiates read requests to the ICache through channel 207, and initiates read-write requests to the DCache through channel 209. The ITCM and the DTCM read the relevant instructions and data from Flash via channels 201 and 203, respectively. The shared controller communicates with the ICache and the ITCM through channels 206 and 211, so that the ICache can read and write the ITCM through the shared controller. The DCache and the DTCM communicate through channels 208 and 212, and the DCache reads and writes the DTCM through the shared controller. In addition, since the shared controller is connected to the main memory through channel 210, the ICache and the DCache can access the main memory through the shared controller, and bidirectional reads and writes between the ITCM and the main memory and between the DTCM and the main memory are realized. If necessary, the main memory initiates read-write requests to Flash through channel 205, including reading data and instructions from Flash or writing data to Flash.
The shared controller is arranged to utilize the free space of the ITCM and the DTCM as extension space for the ICache and the DCache, so that when the Core executes read-write requests, more data is read and written from the Cache and the TCM, reducing the number and duration of accesses to the main memory and Flash.
Fig. 3 is a schematic structural diagram of the shared controller according to an exemplary embodiment of the present application. The shared controller includes a data sharing control module 320, a data cache control module 310, an instruction sharing control module 340, and an instruction cache control module 330. The data sharing control module 320 and the data cache control module 310 are connected to the DTCM, the DCache and the main memory through the data bus 360; the instruction sharing control module 340 and the instruction cache control module 330 are connected to the ITCM, the ICache and the main memory through the instruction bus 350. The data sharing control module 320 and the instruction sharing control module 340 handle partition management, read-write addressing, read-write policies and so on for the DTCM and the ITCM, respectively, while the data cache control module 310 and the instruction cache control module 330 handle grouping management, read-write addressing, read-write policies and so on for the DCache and the ICache, respectively.
Specifically, the data sharing control module is configured to address the DCache, the DTCM and the main memory according to the target data address in a data read-write request of the CPU Core, determine the warm data in the DCache, and write the address block of the data to be read and written into the DTCM. The data cache control module is configured to perform read-write control on the DCache according to the addressing result of the target data address. The warm data is determined according to the data replacement frequency when the target data address is addressed in the DCache: it occupies storage space in the DCache while being hit rarely, so for much of the time the Core can only read and replace data from the main memory or Flash. If the Core later needs this data again, it must once more be read from the main memory or Flash, and such frequent external reads reduce the working efficiency of the whole system. The purpose of the transfer is to move this data into the idle space of the DTCM in time, so that the DCache is used efficiently; since the read-write speed of the DTCM is orders of magnitude higher than that of the main memory and Flash, the read speed is greatly improved compared with reading the data from outside again.
The instruction sharing control module is configured to address the ICache, the ITCM, and the main memory according to the target instruction address in a CPU Core read request, determine the warm instruction in the ICache, and write the address block of the instruction to be read into the ITCM. The instruction cache control module is configured to perform read-write control on the ICache according to the addressing result of the target instruction address. Similarly, a warm instruction is an instruction in the ICache that is determined according to the instruction replacement frequency when the target instruction address is addressed in the ICache, and it needs to be transferred to the ITCM in time.
In a traditional DTCM and ITCM, the whole storage area stores data and instructions frequently read by the CPU Core. In this application, the storage areas of the DTCM and the ITCM are instead divided at fine granularity according to function, so that dynamic adjustment and optimization of data and instructions are realized.
Fig. 4 is a schematic diagram of the partitioning of the DTCM and the ITCM provided in an exemplary embodiment of the present application; the storage area of the DTCM is divided into a data main area 401, a data to-be-read area 402, a data to-be-written area 403, and a data expansion area 404. The data main area 401 is configured to store core data frequently read by the CPU; the data to-be-read area 402 is configured to store data to be read and written that is read back from the main memory; the data to-be-written area 403 is configured to store synchronous data in the DCache to be updated to the main memory; the data expansion area 404 is configured to store warm data transferred from the DCache. The data to-be-read area, the data to-be-written area, and the data expansion area can all be addressed, read, and written by the DCache.
Similarly, the memory area of the ITCM is divided into an instruction main area 405, an instruction to-be-read area 406, and an instruction expansion area 407. The instruction main area 405 is configured to store core instructions frequently read by the CPU Core; the instruction to-be-read area 406 is configured to store instructions to be read that the ICache reads back from the main memory; the instruction expansion area 407 is configured to store warm instructions transferred from the ICache. The instruction to-be-read area and the instruction expansion area can be addressed, read, and written by the ICache. Because instructions in the ITCM are only read and never updated, the ITCM has no instruction to-be-written area.
In the embodiment of the application, the space size of the data to-be-read area is N1 times the DCache buffer line block size, and the space size of the instruction to-be-read area is N2 times the ICache buffer line block size, where N1 and N2 typically take the values 16, 32, 64, or 128. The specific partition sizes and buffer line block sizes, however, must be set at the software level according to the application scenario.
In one possible implementation, the partitioning process for each partition of the DTCM is as follows:
allocating a data main area space, and placing data accessed by the CPU program in the data main area, wherein the specific space size is determined according to the data size;
and allocating the data to-be-read area, whose size is N1 times the DCache buffer line block size. For example, with a DCache capacity of 16KB and a buffer line block size of 8 bytes (BS = 8B), setting N1 to 32 gives a data to-be-read area of 32 × BS = 32 × 8B = 256B;

allocating the data to-be-written area, whose size is the same as that of the data to-be-read area;
the remaining DTCM space serves as a data expansion area space.
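The steps above can be sketched as a short calculation. This is only an illustration: the text fixes BS = 8B and N1 = 32 for the example, while the 64 KB DTCM size, the 32 KB data main area, and the function name are assumptions of ours.

```python
# Illustrative sizing of the DTCM partitions from the steps above.
# Only block_size=8 and n1=32 come from the text's example; the DTCM and
# main-area sizes used below are assumed figures.
def dtcm_partitions(dtcm_size, main_area_size, block_size=8, n1=32):
    """Return the byte size of each DTCM partition."""
    to_read = n1 * block_size            # data to-be-read area: N1 x BS
    to_write = to_read                   # same size as the to-be-read area
    expansion = dtcm_size - main_area_size - to_read - to_write
    return {"main": main_area_size, "to_read": to_read,
            "to_write": to_write, "expansion": expansion}

sizes = dtcm_partitions(dtcm_size=64 * 1024, main_area_size=32 * 1024)
# to_read and to_write are each 32 x 8 B = 256 B; the rest is expansion area
```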
Similarly, for the partitioning of the ITCM, the instruction main area space and the instruction to-be-read area space are determined in turn, and the remaining ITCM space is used as the instruction expansion area space.
As shown in fig. 3, the data sharing control module 320 specifically includes a data address index unit 321, a data status flag unit 322, and a DTCM control unit 323 (i.e., DTCM direct connection control).
The data address indexing unit is configured to address the target data address in the DTCM and the DCache, and, when addressing in both fails, to index the address blocks of the target data and of the data to be read and written in the main memory (or Flash). The address blocks of the target data and the data to be read and written are contiguous in the storage space; the data to be read and written is the data the CPU Core will read or write after the target data, and reading it into a designated area in advance further improves the read efficiency of the control architecture.
The data status flag unit is configured to record the number of consecutive DCache addressing failures, to mark warm data that is infrequently replaced in the DCache, and to mark dirty bit data. Dirty bit data is generated when addressing the DTCM fails and the data is written into the DCache instead; it must be cleared in time to keep the DCache dynamically optimized.
The DTCM control unit handles both read requests and write requests. For a read request from the CPU Core, it reads the target data from the DTCM according to the addressing result, writes warm data from the DCache into the data expansion area, and writes the read-back data to be read and written into the data to-be-read area. For a write request from the CPU Core, it writes the target data into the DTCM according to the addressing result, transfers warm data from the DCache to the data expansion area, and synchronizes dirty bit data from the DCache to the data to-be-written area.
Similarly, the instruction sharing control module 340 includes an instruction address index unit 341, an instruction status flag unit 342, and an ITCM control unit 343 (i.e., ITCM direct control).
The instruction address indexing unit is configured to address the target instruction address in the ITCM and the ICache, and, when addressing in both fails, to index the address blocks of the target instruction and of the instruction to be read in the main memory (or Flash). The address blocks of the target instruction and the instruction to be read are likewise contiguous in the storage space.
The instruction status flag unit is configured to record the number of consecutive addressing failures in the ICache and to flag warm instructions that are not frequently read and written in the ICache.
The ITCM control unit is configured to read a target instruction from the ITCM according to an addressing result, write a temperature instruction in the ICache into the instruction expansion area, and write a read-back instruction to be read into the instruction to-be-read area.
The warm data, the warm instructions, the dirty bit data, and the numbers of consecutive addressing failures of the DCache and the ICache are all determined based on the addressing of read-write requests.
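As a minimal sketch of this bookkeeping (the class and method names are ours, not the patent's, and the reset-on-hit behaviour is an assumption implied by "consecutive" failures), a status flag unit might look like:

```python
class StatusFlagUnit:
    """Counts consecutive addressing failures; the flag goes high once the
    count reaches the threshold (three in the Fig. 6 example), signalling
    that a batch read-ahead into the TCM should be triggered."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_misses = 0

    def record(self, hit):
        # a hit resets the count; a failure advances it (current count + 1)
        self.consecutive_misses = 0 if hit else self.consecutive_misses + 1
        return self.flag

    @property
    def flag(self):
        return self.consecutive_misses >= self.threshold
```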
In the present application, the DCache is set as a set-associative structure comprising at least two DCache sets. When the data address index unit performs addressing, all DCache sets, the data to-be-read area, and the data expansion area form a contiguous addressing region, and the DCache and the DTCM are addressed based on a buffer line index and a data tag address.
Similarly, the ICache is also set as a set-associative structure comprising at least two ICache sets. When addressing, the instruction address index unit forms a contiguous addressing region from all ICache sets, the instruction to-be-read area, and the instruction expansion area, and addresses the ICache and the ITCM based on a buffer line index and an instruction tag address. The following description takes the ICache and ITCM addressing index as an example.
Fig. 5 is a schematic structural diagram of the ICache and ITCM addressing index provided in an exemplary embodiment of the present application; a first ICache set 501 and a second ICache set 502 are taken as an example. To reuse mature designs as much as possible, each buffer line of the Cache is divided into a flag bit (State), a tag bit (Tag), and data bits (Data); the data bit size, i.e., the buffer line block size, is BS = 8 bytes. From the addressing perspective, the instruction to-be-read area 503 (i.e., the ITCM to-be-read area) and the instruction expansion area 504 (i.e., the ITCM expansion area) are regarded as extended sets of the ICache, so the address space can be viewed as 4 ICache sets, Set0 to Set3.
In one possible implementation, the address is carried on a 32-bit bus. Field 505 is the intra-block offset (Block offset); with 8 bytes per block (BS = 8B), it is represented by 3 bits. Field 506 is the set offset (Set offset), covering the 4 sets for instruction storage (5 for data storage, which adds the DTCM to-be-written area), represented by 3 bits; the set offset eases addressing into the ITCM. Field 507 is the buffer line index (Index) pointing to the corresponding buffer line; calculated over a single 32K set, it covers a range of 4K blocks (8 bytes each) using 12 bits. Field 508 is the tag address used for matching, i.e., the target data address in the read-write request, represented by the remaining 14 bits. A buffer line is hit when the tag address matches the instruction tag address in the buffer line; otherwise addressing fails.
It should be noted that the foregoing embodiment describes the addressing process in terms of the ICache sets and the ICache extended sets; the instruction main area also participates in the addressing process, which is not described in detail here.
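The 3/3/12/14-bit field split of Fig. 5 can be expressed as a small decoding sketch; the function and constant names are illustrative, only the bit widths come from the description above.

```python
# Field widths from Fig. 5: 3-bit block offset (8-byte blocks), 3-bit set
# offset, 12-bit buffer line index, 14-bit tag (3 + 3 + 12 + 14 = 32 bits).
BLOCK_BITS, SET_BITS, INDEX_BITS = 3, 3, 12

def decode(addr):
    """Split a 32-bit address into (tag, index, set_offset, block_offset)."""
    block_offset = addr & ((1 << BLOCK_BITS) - 1)
    set_offset = (addr >> BLOCK_BITS) & ((1 << SET_BITS) - 1)
    index = (addr >> (BLOCK_BITS + SET_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (BLOCK_BITS + SET_BITS + INDEX_BITS)) & 0x3FFF
    return tag, index, set_offset, block_offset
```

Addressing then amounts to comparing the decoded tag against the tag stored in the buffer line selected by the index, with the set offset choosing among Set0 to Set3.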
For a read data request from the CPU Core, addressing is first performed, according to the target data address, in the DTCM and the DCache, which can be read at the fastest speed. The data is read directly from the DTCM when the address hits the DTCM, and directly from the DCache when the address hits the DCache.
When addressing both the DTCM and the DCache fails, the data can only be read from external memory. Since the DCache is itself a small-capacity storage device, the remaining storage space must be considered: when the target data is read from outside, warm data in the DCache must first be transferred to the data expansion area, and the number of consecutive DCache addressing failures (the current count plus 1) is read through the data status flag unit. After the warm data is transferred to the data expansion area, a vacancy is left at the target data address, and the target data read from outside is placed into that vacancy.
When the number of consecutive failures does not exceed the set threshold, the target data is read from the main memory (or Flash) through the data cache control module and written into the target data address of the DCache.

When the number of consecutive failures exceeds the set threshold, the corresponding target data is read through the data cache control module and written into the target data address of the DCache, and the address block of the data to be read and written is read from the main memory (or Flash) through the DTCM control unit and written into the data to-be-read area.
The data to be read and written and the target data are stored in contiguous address blocks of the main memory. This step is a prediction of memory access: consecutive failures indicate that a new data segment is being read, so subsequent data is likely to be read soon. Reading it into the DTCM in batches lets the DCache fetch it from the DTCM instead of from the main memory (or Flash). Under this reading rule, the space in the data expansion area and the data to-be-read area of the DTCM is reused cyclically, and the update rule is to replace the warm data that has gone unaccessed the longest, in units of contiguous blocks.
Fig. 6 is a schematic flow chart of a CPU reading target data according to an exemplary embodiment of the present application, including the following steps:
step 601, core reads A address data;
step 602, accessing DTCM addressing;
when the address hits the DTCM, step 603 is performed; when the DTCM fails, the process goes to step 604.
Step 603, directly reading from the DTCM;
step 604, accessing DCache addressing;
when the address hits DCache, step 605 is performed; when DCache fails, go to step 606.
Step 605, directly reading from DCache;
step 606, the warm data block under the A-address tag that has gone unused the longest in the DCache set is transferred to the data expansion area, leaving a vacancy at the A address;

this operation improves on the DCache replacement policy: the replaced block is not discarded but transferred to the data expansion area, because by the temporal and spatial locality of data, data that was accessed before may well be accessed again. When the Core subsequently needs to read or write the A-address data, the data expansion area is hit directly, avoiding the case where replaced data must be read back from the main memory or Flash.
Step 607, has the DCache failed 3 consecutive times?

The number of consecutive DCache failures is recorded by the data status flag unit and can be implemented as a flag bit; the failure threshold is set according to the application scenario, and three is used here as an example, the flag bit going high once the count reaches three.

When the read flag bit is still low, step 608 is executed; otherwise, jump to step 609.
Step 608, reading data corresponding to the A address from the main memory or Flash and placing the data in an A address space in DCache;
step 609, the data corresponding to the A address is read from the main memory or Flash and placed in the A address space in the DCache, and the eight consecutive data address blocks immediately following it are read at the same time and placed in the data to-be-read area.
The data corresponding to the A address is the target data, and the eight continuous data address blocks are the address blocks of the data to be read and written.
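The Fig. 6 flow, steps 601 to 609, can be condensed into the following non-authoritative sketch. The dict-based stores, the `main_mem` callback, and the first-key victim choice are simplified stand-ins of ours (a real design would track recency for the least-recently-used choice); only the control flow mirrors the steps above.

```python
PREFETCH_BLOCKS = 8        # "eight consecutive data address blocks" (step 609)
MISS_THRESHOLD = 3         # consecutive-failure threshold of step 607

def read_data(addr, dtcm, dcache, expansion, to_read, state, main_mem):
    """One pass through the Fig. 6 read flow for address `addr`."""
    if addr in dtcm:                       # steps 602-603: DTCM hit
        return dtcm[addr]
    if addr in dcache:                     # steps 604-605: DCache hit
        state["misses"] = 0                # a hit resets the failure count
        return dcache[addr]
    if dcache:                             # step 606: move a warm block out,
        victim = next(iter(dcache))        # stand-in for the LRU selection
        expansion[victim] = dcache.pop(victim)
    state["misses"] += 1                   # step 607: count the failure
    dcache[addr] = main_mem(addr)          # steps 608/609: external read
    if state["misses"] >= MISS_THRESHOLD:  # step 609: batch read-ahead of
        for i in range(1, PREFETCH_BLOCKS + 1):  # the next eight blocks
            to_read[addr + i] = main_mem(addr + i)
    return dcache[addr]
```

After three consecutive misses, the sketch fills the data to-be-read area with the eight following blocks, so later reads of that segment hit the DTCM instead of going back to main memory.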
Similarly, a read instruction request from the CPU Core is first addressed, according to the target instruction address, in the ITCM and the ICache, which can be read at the fastest speed. The instruction is read directly from the ITCM when the address hits the ITCM, and directly from the ICache when the address hits the ICache.

When addressing both the ITCM and the ICache fails, the instruction can only be read from external memory; the warm instruction in the ICache must first be transferred to the instruction expansion area, and the number of consecutive ICache addressing failures (the current count plus 1) is read through the instruction status flag unit.

When the number of consecutive failures does not exceed the set threshold, the target instruction is read from the main memory (or Flash) through the instruction cache control module and written into the target instruction address of the ICache.

When the number of consecutive failures exceeds the set threshold, the corresponding target instruction is read through the instruction cache control module and written into the target instruction address of the ICache, and the address block of the instruction to be read is read from the main memory (or Flash) through the ITCM control unit and written into the instruction to-be-read area.
Fig. 7 is a schematic flow chart of a CPU reading a target instruction according to an exemplary embodiment of the present application, including the following steps:
step 701, core reads the B address instruction;
step 702, accessing ITCM addressing;
when the address hits ITCM, step 703 is performed; when ITCM fails, the process goes to step 704.
Step 703, directly reading from the ITCM;
step 704, accessing ICache addressing;
when the address hits the ICache, step 705 is performed; when the ICache fails, the process goes to step 706.
Step 705, directly reading from ICache;
step 706, the warm instruction block under the B-address tag that has gone unused the longest in the ICache set is transferred to the instruction expansion area, leaving a vacancy at the B address;

here too the replaced block is not deleted but transferred to the instruction expansion area. On subsequent reads the instruction expansion area is hit directly, with no need to read back from the main memory or Flash.
Step 707, has the ICache failed 3 consecutive times?

When the number of consecutive failures does not exceed 3, step 708 is executed; otherwise, jump to step 709.
Step 708, reading the instruction corresponding to the B address from the main memory or Flash and placing the instruction in the B address space in the ICache;
step 709, the instruction corresponding to the B address is read from the main memory or Flash and placed in the B address space in the ICache, and the eight consecutive instruction address blocks immediately following it are read at the same time and placed in the instruction to-be-read area.
The instruction corresponding to the B address is the target instruction, and the eight consecutive instruction address blocks are the address blocks of the instruction to be read.
A write data request from the CPU Core is likewise first addressed, according to the target data address, in the DTCM and the DCache, which can be accessed at the fastest speed. The data is written directly to the DTCM when the address hits the DTCM, and directly to the DCache when the address hits the DCache.

Unlike a read, however, since the DTCM is divided into different partitions by function, a hit in the DTCM requires a further judgment according to the partition before the corresponding operation is performed.

When the data main area is hit, the target data is written into it directly. When the data expansion area or the data to-be-read area is hit, however, the target data address must be transferred from the hit area to the data to-be-written area, and the target data written there. This transfer is required because only the data to-be-written area is configured to synchronize data to the main memory; data left in the expansion area or the to-be-read area would otherwise go unsynchronized for a long time.
When addressing hits the DCache, the target data is written into the target data address in the DCache through the data cache control module, the written address data is marked as dirty bit data, and the dirty bit data is synchronized to the data to-be-written area through the DTCM control unit. Data written into the DCache on a hit still needs to be synchronized to the main memory for storage, so the DCache here serves only as a transfer station: the data is marked as dirty bit data and deleted in time afterwards to keep the DCache running efficiently.

When addressing both the DCache and the DTCM fails, the designated addresses in all partitions are occupied. The warm data in the DCache must then be determined and transferred to the data expansion area of the DTCM. After the warm data is transferred, the target data address becomes a vacancy; the target data is written into that vacancy, the written data is again marked as dirty bit data, and it is synchronized to the main memory at an appropriate time.
Fig. 8 is a schematic flow chart of writing target data by a CPU according to an exemplary embodiment of the present application, including the following steps:
step 801, core writes the C address data;
step 802, accessing DTCM addressing;
when the DTCM is hit, step 803 is continued to be executed; otherwise, the DTCM fails, and jumps to step 808;
Step 803, hit data main region?
When the data primary region is hit, step 804 is performed; otherwise, jumping to step 805;
step 804, writing data to the C address in the data main area;
after step 804 is executed, jump to the end of the flow;
step 805, hit data to write area?
Executing step 806 when the data to be written is missed; otherwise, jump to step 807;
step 806, writing the C address in the hit area from the hit area to the data to be written area, and deleting from the area;
after executing step 806, the process goes to step 811;
step 807, writing data to the C address in the data area to be written;
after executing step 807, the process goes to step 811;
step 808, addressing access DCache;
when hitting DCache, jumping to execute step 810, otherwise, DCache fails to execute step 809;
step 809, the data block under the C-address tag that has gone unused the longest in the DCache set is transferred to the data expansion area, leaving a vacancy at the C address;
step 810, writing data to a C address in DCache, marking dirty bit data and synchronizing the dirty bit data to a data area to be written;
step 811, is the clock idle or the data to-be-written area full?
Step 812 is performed when the condition is satisfied, otherwise the Core is waited to write the next address data.
Step 812, the data in the data to-be-written area is written into the main memory, and the dirty bit data in the DCache is cleared.
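Likewise, the Fig. 8 write flow, steps 801 to 812, can be condensed into a dict-based sketch; all names and data structures here are illustrative stand-ins of ours rather than the patent's hardware design, and the first-key victim choice again stands in for the LRU selection.

```python
def write_data(addr, value, main_area, to_write, expansion, to_read,
               dcache, dirty):
    """One pass through the Fig. 8 write flow for address `addr`."""
    if addr in main_area:                  # steps 803-804: data main area hit
        main_area[addr] = value
        return
    for area in (expansion, to_read):      # steps 805-806: transfer the hit
        if addr in area:                   # address to the to-be-written area
            area.pop(addr)
            to_write[addr] = value
            return
    if addr in to_write:                   # step 807: to-be-written area hit
        to_write[addr] = value
        return
    if addr not in dcache and dcache:      # step 809: dump a warm block first
        victim = next(iter(dcache))        # stand-in for the LRU selection
        expansion[victim] = dcache.pop(victim)
    dcache[addr] = value                   # step 810: write, mark as dirty,
    dirty.add(addr)                        # and synchronize the dirty data
    to_write[addr] = value                 # to the to-be-written area

def flush(to_write, main_mem, dcache, dirty):
    """Steps 811-812: run on an idle clock or a full to-be-written area."""
    main_mem.update(to_write)              # write synchronous data back
    to_write.clear()
    for a in dirty:                        # clear dirty bit data in DCache
        dcache.pop(a, None)
    dirty.clear()
```

The `flush` helper plays the role of steps 811 and 812: the to-be-written area is drained to main memory and the corresponding dirty entries are removed from the DCache.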
In summary, a shared controller is added to the traditional CPU hierarchical storage structure, while the ITCM and the DTCM are partitioned so that their idle areas are fully utilized and used jointly with the ICache and the DCache, extending both data and space. When Cache addressing fails, the shared controller transfers warm instructions and warm data that are infrequently read and written in the ICache and DCache to idle TCM partitions in time, and reads external data into the idle partitions of the Cache and TCM, so that the Cache runs efficiently, the Core reads and writes more data from the Cache and TCM, and the frequency and duration of main memory accesses are reduced.
From the system level, the control architecture improves the read-write access efficiency of the storage closest to the Core and improves the overall computing and control performance of the CPU: cache performance is optimized, the miss rate is lowered, the miss overhead is reduced, and the hit time is shortened. From the hardware level, the shared controller realizes sharing and dynamic scheduling of the storage space, broadens the application scenarios of the CPU chip, and reduces the cost of repeated development and production of similar chips.
The foregoing describes preferred embodiments of the present invention; it is to be understood that the invention is not limited to the specific embodiments described above, wherein devices and structures not described in detail are to be understood as being implemented in a manner common in the art; any person skilled in the art will make many possible variations and modifications, or adaptations to equivalent embodiments without departing from the technical solution of the present invention, which do not affect the essential content of the present invention; therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims (13)

1. A cache control architecture based on a tightly coupled memory, which is characterized by comprising a data tightly coupled memory DTCM, an instruction tightly coupled memory ITCM, a data cache DCache, an instruction cache ICache, a main memory and a shared controller; the shared controller is arranged between the main memory and the close-coupled memory and the cache through a data bus and an instruction bus, and comprises:
the data sharing control module is configured to address the DCache, the DTCM and the main memory according to a target data address, determine temperature data in the DCache, transfer the temperature data into the DTCM, and write an address block of data to be read and written into the DTCM; the temperature data is determined based on a data replacement frequency for the target data address;
the data cache control module is configured to perform read-write control on the DCache according to the addressing result of the target data address;
the instruction sharing control module is configured to address the ICache, the ITCM and the main memory according to a target instruction address, determine a temperature instruction in the ICache, transfer the temperature instruction into the ITCM and write an address block of an instruction to be read into the ITCM; the warm instruction is determined based on an instruction replacement frequency for the target instruction address;
And the instruction cache control module is configured to carry out read-write control on the ICache according to the addressing result of the target instruction address.
2. The tightly coupled memory based cache control architecture of claim 1, wherein the memory area of the DTCM is divided into a data main area, a data to read area, a data to write area, and a data expansion area;
the data main area is configured to store core data frequently read by the CPU;
the data to-be-read area is configured to store the data to be read and written read back from the main memory;
the data to-be-written area is configured to store synchronous data to be updated to the main memory in the DCache;
the data expansion area is configured to store the temperature data transferred from the DCache.
3. The tightly-coupled memory-based cache control architecture of claim 1, wherein the memory area of the ITCM is divided into an instruction main area, an instruction pending area, and an instruction extension area;
the instruction main area is configured to store core instructions frequently read by a CPU;
the instruction to-be-read area is configured to store the to-be-read instruction read back by the ICache from the main memory;
The instruction extension area is configured to store the temperature instruction transferred from the ICache.
4. The tightly-coupled memory-based cache control architecture of claim 2, wherein the data sharing control module comprises a data address indexing unit, a data status flag unit, and a DTCM control unit;
the data address indexing unit is configured to address the target data address from the DTCM and the DCache, and index the target data and the address block of the data to be read and written from the main memory when the DTCM and the DCache fail;
the data state flag unit is configured to record the number of times of continuous addressing failure of the DCache, the temperature data and dirty bit data; the dirty bit data is generated based on the failure of addressing the DTCM and writing the dirty bit data into the DCache;
the DTCM control unit is configured to read the target data from the DTCM according to an addressing result when receiving a data reading request, write the temperature data in the DCache into the data expansion area, and write the read-back data to be read and written into the data to-be-read area; when a data writing request is received, write the target data into the DTCM according to an addressing result, transfer the temperature data in the DCache to the data expansion area, and synchronize the dirty bit data in the DCache to the data to-be-written area.
5. The tightly-coupled memory-based cache control architecture of claim 3, wherein the instruction sharing control module comprises an instruction address index unit, an instruction status flag unit, and an ITCM control unit;
the instruction address indexing unit is configured to address the target instruction address from the ITCM and the ICache, and index the target instruction and the address block of the instruction to be read from the main memory when the ITCM and the ICache fail;
the instruction state flag unit is configured to record the number of consecutive addressing failures of the ICache and the temperature instruction;
the ITCM control unit is configured to read the target instruction from the ITCM according to an addressing result, transfer the temperature instruction in the ICache to the instruction expansion area, and write the read-back instruction to be read into the instruction to-be-read area.
6. The tightly-coupled memory-based cache control architecture of claim 4, wherein the DCache is a set associative structure comprising at least two DCache sets; when the DTCM and the DCache are invalid, the data state flag unit determines the label data with the lowest replacement frequency in the DCache group as the temperature data based on the addressing result of the target data address;
And the data address index unit addresses the DCache and the DTCM based on a buffer line index and a data tag address.
7. The tightly-coupled memory-based cache control architecture of claim 5, wherein the ICache is a set associative structure comprising at least two ICache sets; when the ITCM and the ICache are invalid, the instruction state flag unit determines a tag instruction with the lowest replacement frequency in an ICache set as the temperature instruction based on an addressing result of the target instruction address;
and the instruction address index unit is used for addressing the ICache and the ITCM based on a buffer line index and an instruction tag address.
8. The tightly coupled memory based cache control architecture of claim 6, wherein upon invalidation of the DTCM and the DCache based on the target data address of a read data request, the warm data in the DCache is dumped to the data expansion area and the number of times of DCache consecutive addressing invalidations is read;
When the continuous failure times do not exceed a set threshold value, reading the target data from the main memory through the data cache control module, and writing the target data into the target data address of the DCache;
when the continuous failure times exceed a set threshold value, reading the target data from the main memory, writing the target data into the target data address of the DCache, and reading an address block of the data to be read and written from the main memory through the DTCM control unit, and writing the address block into the data to be read area; wherein the data to be read and written and the target data are stored in consecutive address blocks of the main memory.
9. The tightly coupled memory-based cache control architecture of claim 6, wherein, based on the target data address of a write data request, when an address hits the data expansion region or data pending read region, the target data address is transferred from the hit region to the data pending write region and the target data is written;
when addressing hits the DCache, writing the target data into the target data address in the DCache, determining the written target data as the dirty bit data, and synchronizing the dirty bit data into a data to-be-written area through the DTCM control unit;
When addressing of the DCache and the DTCM fails, the temperature data in the DCache is determined, the temperature data are transferred to the data expansion area of the DTCM, the target data are written into the target data address in the DCache, the target data are determined to be the dirty bit data, and the dirty bit data are synchronized to a data to be written area through the DTCM control unit.
10. The tightly coupled memory based cache control architecture of claim 7, wherein when addressing of both the ITCM and the ICache misses on the target instruction address of a read instruction request, the warm instructions in the ICache are transferred to the instruction expansion area and the number of consecutive ICache addressing misses is read;
when the number of consecutive misses does not exceed a set threshold, the target instruction is read from the main memory through the instruction cache control module and written into the target instruction address of the ICache;
when the number of consecutive misses exceeds the set threshold, the target instruction is read from the main memory and written into the target instruction address of the ICache, and an address block of the instruction to be read is read from the main memory through the ITCM control unit and written into the instruction to-be-read area; wherein the instruction to be read and the target instruction are stored in consecutive address blocks of the main memory.
11. The tightly coupled memory based cache control architecture of claim 9, wherein when a clock of the CPU is idle or the data in the data to-be-written area reaches a storage threshold, the synchronized data buffered in the data to-be-written area is written to the main memory by the DTCM control unit and the dirty bit data in the DCache is deleted by the data cache control module.
12. The tightly coupled memory based cache control architecture of claim 6, wherein the space size of the data to-be-read area is N1 times the DCache cache line block size, and the data to-be-written area and the data to-be-read area have the same space size; N1 takes the value 16, 32, 64 or 128.
13. The tightly coupled memory based cache control architecture of claim 7, wherein the space size of the instruction to-be-read area is N2 times the ICache cache line block size, and N2 takes the value 16, 32, 64 or 128.
CN202310431873.7A 2023-04-21 2023-04-21 Cache control architecture based on shared tightly coupled memory Active CN116166606B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310431873.7A CN116166606B (en) 2023-04-21 2023-04-21 Cache control architecture based on shared tightly coupled memory

Publications (2)

Publication Number Publication Date
CN116166606A CN116166606A (en) 2023-05-26
CN116166606B true CN116166606B (en) 2023-07-14

Family

ID=86416641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310431873.7A Active CN116166606B (en) 2023-04-21 2023-04-21 Cache control architecture based on shared tightly coupled memory

Country Status (1)

Country Link
CN (1) CN116166606B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5249281A (en) * 1990-10-12 1993-09-28 Lsi Logic Corporation Testable ram architecture in a microprocessor having embedded cache memory
CN102207916B (en) * 2011-05-30 2013-10-30 西安电子科技大学 Instruction prefetch-based multi-core shared memory control equipment
CN106227676B (en) * 2016-09-22 2019-04-19 大唐微电子技术有限公司 A kind of cache and the method and apparatus that data are read from cache


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant