CN116361232A

CN116361232A - Processing method and device for on-chip cache, chip and storage medium

Info

Publication number: CN116361232A
Application number: CN202310369853.1A
Authority: CN
Inventors: 杨宇清; 张亚林
Original assignee: Shanghai Enflame Technology Co ltd
Current assignee: Shanghai Enflame Technology Co ltd
Priority date: 2023-04-07
Filing date: 2023-04-07
Publication date: 2023-06-30

Abstract

The invention discloses a processing method and device for an on-chip cache, a chip, and a storage medium, and relates to high-performance chip technology. The method comprises the following steps: acquiring a first read-write instruction sent by a host; if the instruction type of the first read-write instruction is a preset instruction type, executing the first read instruction of the first read-write instruction and starting an instruction life cycle, wherein the preset instruction type is a read-write instruction or a partial write instruction; during the instruction life cycle, if a second read-write instruction of the preset instruction type is received and a second address accessed by the second read-write instruction is the same as the first address, merging the first write instruction of the first read-write instruction with the write instruction of the second read-write instruction to obtain a target write instruction; and when the first data read by the first read instruction is returned, ending the instruction life cycle and executing the target write instruction according to the first data. The method merges write instructions that access the same address, improving the read-write performance of the on-chip cache and the read-write efficiency of cached data.

Description

Processing method and device for on-chip cache, chip and storage medium

Technical Field

The embodiment of the invention relates to the technical fields of high-performance chips, semiconductor chips, artificial intelligence chips, systems-on-chip, chip data processing and the like, and in particular to a processing method and device for an on-chip cache, a chip, and a storage medium.

Background

With the development of semiconductor technology, processor performance continues to improve. In the field of artificial intelligence in particular, main-memory access speed cannot keep up with the computing power of the compute cores, so an on-chip cache is needed to support high-compute scenarios.

In the current approach, the on-chip cache processes and executes the cache operation instructions sent by the main chip strictly in sequence. For example, after receiving a read-write instruction sent by the main chip, the cache executes the read instruction of the read-write instruction, and only after the data has been read from the storage space does it write data according to the write instruction of the read-write instruction.

However, when this approach handles a large number of cache operations, performance drops and data read-write efficiency is low.

Disclosure of Invention

The invention provides a processing method, a processing device, a chip and a storage medium for on-chip cache, which are used for improving the read-write performance of the on-chip cache and improving the read-write efficiency of cache data.

In a first aspect, an embodiment of the present invention provides a method for processing an on-chip cache, including:

acquiring a first read-write instruction sent by a host;

if the instruction type of the first read-write instruction is a preset instruction type, executing the first read instruction of the first read-write instruction and starting an instruction life cycle, wherein the preset instruction type is a read-write instruction or a partial write instruction;

In the instruction life cycle, if a second read-write instruction of a preset instruction type is received and a second address accessed by the second read-write instruction is the same as the first address, merging the first write instruction of the first read-write instruction with the write instruction of the second read-write instruction to obtain a target write instruction, wherein the first address is the address accessed by the first read instruction;

and when the first data read by the first read instruction is returned, the instruction life cycle is ended, and the target write instruction is executed according to the first data.

In a second aspect, an embodiment of the present invention further provides a processing device for an on-chip cache, including:

the instruction analysis module is used for acquiring a first read-write instruction sent by the host;

the read instruction execution module is used for executing the first read instruction of the first read-write instruction and starting an instruction life cycle if the instruction type of the first read-write instruction is a preset instruction type, wherein the preset instruction type is a read-write instruction or a partial write instruction;

the merging module is used for merging the first write instruction of the first read-write instruction with the write instruction of the second read-write instruction to obtain a target write instruction if a second read-write instruction of a preset instruction type is received and a second address accessed by the second read-write instruction is identical to the first address in the instruction life cycle;

and the write instruction execution module is used for ending the instruction life cycle and executing the target write instruction according to the first data when the first data read by the first read instruction is returned.

In a third aspect, an embodiment of the present invention further provides a chip, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements a processing method of on-chip cache as shown in the embodiment of the present invention when the processor executes the program.

In a fourth aspect, embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a method of processing an on-chip cache as shown in the embodiments of the present invention.

The processing method for the on-chip cache provided by the embodiment of the invention acquires a first read-write instruction sent by a host; if the instruction type of the first read-write instruction is a preset instruction type, executes the first read instruction of the first read-write instruction and starts an instruction life cycle, wherein the preset instruction type is a read-write instruction or a partial write instruction; during the instruction life cycle, if a second read-write instruction of the preset instruction type is received and a second address accessed by the second read-write instruction is the same as the first address, merges the first write instruction of the first read-write instruction with the write instruction of the second read-write instruction to obtain a target write instruction, wherein the first address is the address accessed by the first read instruction; and when the first data read by the first read instruction is returned, ends the instruction life cycle and executes the target write instruction according to the first data. In the current approach, cache operations are executed serially, so performance drops and data read-write efficiency is low when a large number of cache operations must be handled. In contrast, the embodiment merges, within the instruction life cycle of the first read-write instruction, the write instructions of one or more other read-write instructions that access the same address. When the read instruction of the first read-write instruction returns, the instruction life cycle ends and the data acquired from the first address is written according to the target write instruction. Multiple read-write instructions to the same address are thus completed within the clock time occupied by the first read-write instruction, which improves the read-write performance of the on-chip cache and the read-write efficiency of cached data.

Assume that the instruction life cycle lasts N clock cycles and that, after the data is returned, the write operation takes 1 clock cycle, so that one read-write operation takes N+1 clock cycles. If three read-write instructions to the first address are triggered in succession within the instruction life cycle, the current technical scheme must execute three read operations and three write operations, for a total of 3N+3 clock cycles. With the scheme provided by the invention, the write instructions of the latter two read-write instructions are merged with that of the first within the instruction life cycle of the first read-write instruction to obtain the target write instruction; the first read instruction returns after N clock cycles, and the target write instruction is then executed in 1 clock cycle. The three consecutive read-write instructions are therefore completed in N+1 clock cycles.
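The cycle counts in the paragraph above can be checked with a short calculation. The following is a minimal sketch in Python; the function names `serial_cycles` and `merged_cycles` are hypothetical, and it assumes, as the paragraph does, that every read takes N clock cycles, every write takes 1 clock cycle, and all k read-write instructions to the same address arrive within the life cycle of the first one.

```python
def serial_cycles(n_life: int, k: int) -> int:
    """Clock cycles when k read-write instructions run one after another.

    Each instruction pays for its own read (n_life cycles) and write (1 cycle).
    """
    return k * (n_life + 1)


def merged_cycles(n_life: int, k: int) -> int:
    """Clock cycles when the k writes are merged into one target write.

    Assumes all k instructions hit the same address and arrive inside the
    life cycle of the first read, so only one read and one write are paid for.
    """
    if k < 1:
        raise ValueError("at least one read-write instruction is required")
    return n_life + 1


if __name__ == "__main__":
    n = 10  # illustrative life cycle length, not a figure from the source
    print(serial_cycles(n, 3), merged_cycles(n, 3))
```

Under these assumptions the saving grows linearly with the number of merged instructions, since the merged cost does not depend on k.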

Drawings

Fig. 1 is a flowchart of a processing method of an on-chip cache in a first embodiment of the present invention.

Fig. 2 is a flowchart of a processing method of on-chip caching in the second embodiment of the present invention.

Fig. 3 is a flowchart of a processing method of on-chip caching in the third embodiment of the present invention.

Fig. 4 is a flowchart of a processing method of on-chip caching in the fourth embodiment of the present invention.

Fig. 5 is a flowchart of a processing method of an on-chip cache in a fifth embodiment of the present invention.

FIG. 6 is a schematic diagram of an on-chip cache circuit module according to a sixth embodiment of the present invention.

FIG. 7 is a prior art instruction flow timing diagram in a sixth embodiment of the present invention.

Fig. 8 is a timing diagram of an instruction flow according to the technical solution in the sixth embodiment of the present invention.

Fig. 9 is a schematic structural diagram of a processing device for on-chip cache in a seventh embodiment of the present invention.

Fig. 10 is a schematic diagram of the structure of a chip in the eighth embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.

Example 1

Fig. 1 is a flowchart of a processing method of on-chip cache according to an embodiment of the present invention, where the embodiment is applicable to a data read-write scenario of on-chip cache, and the method may be executed by an on-chip cache chip, and specifically includes the following steps:

step 101, a first read-write instruction sent by a host is obtained.

The host may be a host device connected to the cache chip via a network, or may be a host chip or processor locally connected to the cache chip via a data bus. When multiple compute cores are provided locally, the compute cores may act as hosts. After triggering the cache read-write instruction, the host sends the instruction to the cache chip, wherein the instruction can be a first read-write instruction. The cache chip acquires a first read-write instruction sent by the host.

The type of the first read-write instruction may be a read instruction, a full write instruction, a partial write instruction, or a read-write instruction.

Step 102, if the instruction type of the first read/write instruction is a preset instruction type, executing the first read instruction of the first read/write instruction, starting the instruction life cycle, and the preset instruction type is a read/write instruction or a partial write instruction.

The read-write instruction and the partial write instruction are configured in advance as the preset instruction types. Under certain data verification policies, for example a policy that uses an error correcting code (ECC), a read-write instruction or a partial write instruction needs to be converted into a read instruction and a write instruction. For example, if the instruction type of the first read-write instruction is the preset instruction type, the first read-write instruction is parsed into a first read instruction of the first read-write instruction and a first write instruction of the first read-write instruction.

The instruction life cycle represents the clock cycles taken to execute the first read instruction. When the first read instruction of the first read-write instruction is executed, the instruction life cycle starts, and the first read instruction is carried out as a data read operation. If the data read operation takes a fixed number of clock cycles, that number is used as the instruction life cycle. If it does not, the instruction life cycle ends when the data read operation returns the read data.

Step 103, in the instruction life cycle, if a second read-write instruction of a preset instruction type is received and the second address accessed by the second read-write instruction is the same as the first address, merging the first write instruction of the first read-write instruction with the write instruction of the second read-write instruction to obtain the target write instruction.

The first address is the address accessed by the first read instruction. During the instruction life cycle, if a second read-write instruction of the preset instruction type is received, it is determined whether the second address accessed by the second read-write instruction is the same as the first address. If they are the same, the first write instruction of the first read-write instruction and the write instruction of the second read-write instruction are merged to obtain the target write instruction. A partial write instruction or a read-write instruction must first read the data and then rewrite it after the read completes. If, during the instruction life cycle of the first read instruction, a second read-write instruction of the preset instruction type that also accesses the first address is received, the second read-write instruction also needs to read the first data at the first address. The second write instruction of the second read-write instruction can therefore be merged with the first write instruction of the first read-write instruction to obtain the target write instruction.

The second read-write instruction is any read-write instruction of the preset instruction type received during the instruction life cycle of the first read instruction. There may be one or more second read-write instructions.

Optionally, receiving a second read-write instruction of the preset instruction type during the instruction life cycle may mean that the second read-write instruction arrives in real time after the instruction life cycle of the first read instruction has started. Alternatively, the received read-write instructions may be parsed in advance and those accessing the same address aggregated, so that the aggregated read-write instructions are executed consecutively after the instruction life cycle of the first read instruction starts.

Step 104, when the first data read by the first read instruction is returned, the instruction life cycle is ended, and the target write instruction is executed according to the first data.

When the first data is read, the first reading instruction returns the first data, and the instruction life cycle of the first reading instruction is ended.

If the first write command and one or more second write commands are combined in step 103 to obtain a target write command, performing a data write operation on the first data according to the target write command.

If the first write command is not combined with the second write command in step 103, performing a data write operation on the first data according to the first write command.

Further, after the start of the instruction lifecycle, the method further comprises:

during the instruction life cycle, if a third read-write instruction is acquired, executing the one or more third read-write instructions in sequence, wherein the third read-write instruction is a read instruction or a full write instruction.

During the instruction life cycle of the first read instruction, a received read-write instruction whose type is a read instruction or a full write instruction is a third read-write instruction.

Since a third read-write instruction occupies only one clock cycle, it can be executed directly after it is received.

In other words, during the instruction life cycle of the first read instruction, if a second read-write instruction is received, the second write instruction of the second read-write instruction is merged with the first write instruction to obtain the target write instruction, and after the first read instruction returns the first data, a data write operation is performed on the first data according to the target write instruction. If no second read-write instruction is received, a data write operation is performed on the first data according to the first write instruction after the first read instruction returns the first data. Third read-write instructions received during the instruction life cycle of the first read instruction are executed in sequence.
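The flow of steps 101 to 104 can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit described in the embodiments: the class `OnChipCacheModel`, its fields, and the representation of a write as a (mask, data) pair are hypothetical, and third read-write instructions (plain reads and full writes) are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class OnChipCacheModel:
    """Toy model: one in-flight read per address, writes merged while it is in flight."""

    mem: dict                                         # address -> stored word
    open_cycles: dict = field(default_factory=dict)   # address -> list of (mask, data)
    reads_issued: int = 0
    writes_executed: int = 0

    def submit(self, addr: int, mask: int, data: int) -> str:
        """Accept a read-write or partial write instruction (the preset type)."""
        if addr in self.open_cycles:
            # A life cycle is already open for this address: merge the write
            # instead of issuing a second read.
            self.open_cycles[addr].append((mask, data))
            return "merged"
        # First read-write instruction: issue its read and open the life cycle.
        self.open_cycles[addr] = [(mask, data)]
        self.reads_issued += 1
        return "read issued"

    def read_returned(self, addr: int) -> int:
        """The first read returns: close the life cycle and write once."""
        word = self.mem[addr]                         # the "first data"
        for mask, data in self.open_cycles.pop(addr):
            # Writes are folded in arrival order, so later ones win on shared bits.
            word = (word & ~mask) | (data & mask)
        self.mem[addr] = word
        self.writes_executed += 1
        return word


if __name__ == "__main__":
    cache = OnChipCacheModel(mem={0x40: 0})
    cache.submit(0x40, 0b11110000, 0b10100000)
    cache.submit(0x40, 0b00001111, 0b00000101)
    print(bin(cache.read_returned(0x40)), cache.reads_issued, cache.writes_executed)
```

In this model three submissions to one address cost a single read and a single write, which is the effect the embodiment attributes to merging within the instruction life cycle.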

Example two

Fig. 2 is a flowchart of a processing method for on-chip cache according to a second embodiment of the present invention. As a further explanation of the above embodiment, after the first read-write instruction sent by the host is acquired, the method further includes: where the data verification policy is an error correcting code, if the instruction type of the first read-write instruction is a partial write instruction, converting the first read-write instruction into a first read instruction and a first write instruction. The method comprises the following steps:

step 201, a first read-write instruction sent by a host is obtained.

Step 202, where the data verification policy is an error correcting code, if the instruction type of the first read-write instruction is a partial write instruction, converting the first read-write instruction into a first read instruction and a first write instruction.

In this embodiment of the invention, the data verification policy is an error correcting code. The cached data is stored in a static random-access memory (SRAM). When a read-write instruction or a partial write instruction is received, it is parsed into a read instruction and a write instruction. In other words, when a first read-write instruction of the preset instruction type is received, it is parsed into a read instruction of the first read-write instruction and a write instruction of the first read-write instruction.

When data is received, it is encoded, and the generated check code and the data are written to the SRAM at the destination address. When a data read operation is performed, the data and the check code are read from the SRAM, and the read data is decoded and compared with the check code. Optionally, when the read request comes from a read instruction, the SRAM returns the read data and the verification result directly to the peripheral host that initiated the read instruction. Optionally, when the read request comes from the read instruction of a read-write instruction, the SRAM sends the read data and the verification result to an instruction information processing module.

Step 203, if the instruction type of the first read/write instruction is a preset instruction type, executing the first read instruction of the first read/write instruction, starting the instruction life cycle, and the preset instruction type is a read/write instruction or a partial write instruction.

Step 204, if a second read-write command of a preset command type is received and the second address accessed by the second read-write command is the same as the first address in the command life cycle, merging the first write command of the first read-write command with the write command of the second read-write command to obtain the target write command.

The first address is an address accessed by the first read instruction.

Step 205, when the first data read by the first read instruction is returned, the instruction life cycle is ended, and the target write instruction is executed according to the first data.

To improve chip reliability, a high-capacity on-chip cache often adopts a data verification policy with error correcting codes (ECC). In this case, if the data valid bits carried by a write instruction are only partially valid (a partial write), the on-chip cache needs to decode the partial write instruction into a read-write instruction. To keep the on-chip cache compatible, the data bit width of a peripheral host is often smaller than that of the on-chip cache, in which case every write instruction the host issues to the on-chip cache is a partial write instruction. Such partial write instructions often access the same address in succession. If the on-chip cache adopts an ECC policy, the resulting large number of read-write instructions can cause significant performance degradation. The processing method for the on-chip cache provided by the embodiment of the invention can, in a verification-policy scenario that uses error correcting codes, merge the write instructions of read-write instructions that access the same address within the instruction life cycle, thereby improving the performance of executing read-write instructions.
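Why a partial write turns into a read followed by a write under a check-code policy can be sketched as follows. The check code here is a plain XOR of the four bytes of a 32-bit word, chosen only to keep the example short; it is a stand-in and not a real ECC, and all function names are hypothetical. Because the check code covers the whole stored word, updating some of its bits requires reading the old word, merging in the new bits, and re-encoding.

```python
def check_code(word: int) -> int:
    """Stand-in check code: XOR of the four bytes of a 32-bit word (not a real ECC)."""
    code = 0
    for shift in (0, 8, 16, 24):
        code ^= (word >> shift) & 0xFF
    return code


def store(word: int):
    """Encode a full word and return the (data, check code) pair kept in memory."""
    return (word, check_code(word))


def load(entry):
    """Read a stored pair and verify the data against its check code."""
    word, code = entry
    if check_code(word) != code:
        raise ValueError("check code mismatch")
    return word


def partial_write(entry, mask: int, data: int):
    """Update only the bits selected by mask, as a read followed by a write."""
    old = load(entry)                      # read step: the code covers the whole word
    new = (old & ~mask) | (data & mask)    # merge new bits into the old word
    return store(new)                      # write step: re-encode the full word


if __name__ == "__main__":
    entry = store(0x11223344)
    entry = partial_write(entry, 0x0000FF00, 0x0000AA00)
    print(hex(load(entry)))
```

A full write, by contrast, can call `store` directly, which is why only partial writes and read-write instructions pay for the extra read.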

Example III

Fig. 3 is a flowchart of a processing method for on-chip cache according to a third embodiment of the present invention. As a further explanation of the above embodiment, merging the first write instruction of the first read-write instruction with the write instruction of the second read-write instruction to obtain the target write instruction includes: acquiring a first data valid bit of the first write instruction and a second data valid bit of the second write instruction of the second read-write instruction; and if the first data valid bit is different from the second data valid bit, generating a target write instruction for writing both the first data valid bit and the second data valid bit. The method comprises the following steps:

step 301, a first read-write instruction sent by a host is obtained.

Step 302, if the instruction type of the first read/write instruction is a preset instruction type, executing the first read instruction of the first read/write instruction, starting the instruction life cycle, and the preset instruction type is a read/write instruction or a partial write instruction.

Step 303, in the instruction life cycle, if a second read-write instruction of a preset instruction type is received and the second address accessed by the second read-write instruction is the same as the first address, acquiring the first data valid bit of the first write instruction and the second data valid bit of the second write instruction of the second read-write instruction.

The first address is an address accessed by the first read instruction. In some scenarios, the read-write instruction and the partial-write instruction do not write all of the data in the first address, but rather write the data in the partial data bits. It is therefore necessary to acquire the first data valid bit and the second data valid bit. It is determined whether the first data valid bit and the second data valid bit are identical in position.

Step 304, if the first data valid bit is the same as the second data valid bit, the write data of the first write instruction is covered by the write data of the second write instruction, so as to obtain the target write instruction.

If the second data valid bit is the same as the first data valid bit, the second read-write instruction writes the same data bits as the first read-write instruction. Since the second read-write instruction is triggered later, the write data of the first write instruction is overwritten with the write data of the second write instruction, and the resulting target write instruction writes the second write data to the positions marked by the second data valid bit.

Optionally, the writing data of the first writing instruction is overwritten with the writing data of the second writing instruction to obtain the target writing instruction, which may be implemented by the following ways:

Determining a first data valid bit and first write data according to the write instruction of the first read-write instruction; determining a second data valid bit and second write data according to the write instruction of the second read-write instruction; determining the reserved data bits according to the first data valid bit and the second data valid bit; determining a target data valid bit according to the reserved data bits and the second data valid bit; overwriting the first write data according to the target data valid bit and the second write data to obtain target write data; and determining the target write instruction according to the target write data, wherein the target write instruction writes the target write data according to the target data valid bit.

For example, the first address is 8 bits long, and the first data valid bit and the second data valid bit are 11110000 each, i.e., each is a write to the first four data bits. The generated target write instruction is for writing second data to the first four data bits of the first data.

Optionally, if the first data valid bit is different from the second data valid bit, combining the write data of the second write instruction and the write data of the first write instruction to obtain the target write instruction.

If the first data valid bit is different from the second data valid bit, the data bits written by the two writing operations are not overlapped, the data of the first writing instruction are written into the first data valid bit, and the data of the second writing instruction are written into the second data valid bit.

For example, the first address has a length of 8 bits, the first data valid bit is 11110000, and the second data valid bit is 00001111, so the first data valid bit is completely different from the second data valid bit. When the target write instruction is generated, its data valid bit is 11111111. The generated target write instruction writes the write data of the first write instruction to the first four data bits of the first data and the write data of the second write instruction to the last four data bits.

Optionally, when the first data valid bit is partially different from the second data valid bit, every position that is valid in the second data valid bit overwrites the corresponding position of the first write instruction, regardless of whether that position is valid in the first data valid bit. Positions that are invalid in the second data valid bit do not overwrite positions that are valid in the first data valid bit.

Illustratively, the first address is 8 bits long, the first data valid bit is 10010000, and the second data valid bit is 11000011, so the first data valid bit is partially identical to the second data valid bit. When the target write instruction is generated, its data valid bit is 11010011. The fourth bit takes the corresponding data from the first write data, and the remaining valid data bits take the corresponding data from the second write data.

Step 305, when the first data read by the first read instruction is returned, the instruction life cycle is ended, and the target write instruction is executed according to the first data.

The processing method of the on-chip cache provided by the embodiment of the invention can determine the data valid bit of the target write instruction according to the first data valid bit and the second data valid bit, realize the accurate combination of the write instruction and improve the accuracy of instruction combination.
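The three valid-bit cases of this embodiment (identical, disjoint, and partially overlapping) reduce to one bitwise rule, sketched below. The function name `merge_writes` and its argument names are hypothetical, and the sketch assumes that the leftmost character of the 8-bit patterns in the text corresponds to the most significant bit of a Python binary literal.

```python
def merge_writes(first_valid: int, first_data: int,
                 second_valid: int, second_data: int):
    """Merge two writes to the same address into one target write.

    Returns (target_valid, target_data). Positions valid in the second write
    always take the second write's data; positions valid only in the first
    write are reserved and keep the first write's data.
    """
    reserved = first_valid & ~second_valid        # valid only in the first write
    target_valid = reserved | second_valid        # union of both valid patterns
    target_data = (first_data & reserved) | (second_data & second_valid)
    return target_valid, target_data


if __name__ == "__main__":
    # Partially overlapping patterns from the text: 10010000 and 11000011.
    valid, data = merge_writes(0b10010000, 0b10010000, 0b11000011, 0b01000001)
    print(format(valid, "08b"), format(data, "08b"))
```

Merging more than two writes can be done by applying the same function repeatedly in arrival order, feeding each result in as the new first write.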

Example IV

Fig. 4 is a flowchart of a processing method for on-chip cache according to a fourth embodiment of the present invention. As a further explanation of the foregoing embodiment, acquiring the first read-write instruction sent by the host includes: analyzing an instruction stream sent by the host; parsing a plurality of read-write instructions in the instruction stream; and determining the response order of the read-write instructions according to a preset priority, wherein the weight of the write instruction of the first read-write instruction is greater than the weights of the other instructions, the other instructions include full read instructions, full write instructions, and the read instruction of the first read-write instruction, and the interval between the read instruction of the first read-write instruction and the write instruction of the first read-write instruction is the instruction life cycle. The method comprises the following steps:

step 401, analyzing an instruction stream sent by a host.

The read and write instructions generated by the host may be transmitted to the on-chip cache chip in the form of an instruction stream.

Step 402, analyzing a plurality of read-write instructions in the instruction stream.

After the instruction stream is received, the read-write instructions in it are parsed to obtain the individual read-write instructions. A read-write instruction may be a full read instruction, a full write instruction, a read-write instruction, or a partial write instruction.

If the instruction is a read-write instruction, it is parsed into a read instruction of the read-write instruction and a write instruction of the read-write instruction. If the instruction is a partial write instruction, it is parsed into a read instruction of the partial write instruction and a write instruction of the partial write instruction.
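The parsing described above can be sketched as a function that expands each instruction of the stream into the operations actually executed. The type names (`"read"`, `"full_write"`, `"read_write"`, `"partial_write"`) and the operation labels `"rw_read"` and `"rw_write"` are hypothetical encodings chosen for this illustration, not identifiers from the source.

```python
PRESET_TYPES = {"read_write", "partial_write"}   # split into a read and a write
DIRECT_TYPES = {"read", "full_write"}            # executed as a single operation


def split_instruction(kind: str, addr: int):
    """Expand one instruction into the operation(s) it is parsed into."""
    if kind in PRESET_TYPES:
        # Read first, then write once the read data has come back.
        return [("rw_read", addr), ("rw_write", addr)]
    if kind in DIRECT_TYPES:
        return [(kind, addr)]
    raise ValueError(f"unknown instruction type: {kind}")


def parse_stream(stream):
    """Parse an instruction stream, given as (kind, addr) pairs, into operations."""
    ops = []
    for kind, addr in stream:
        ops.extend(split_instruction(kind, addr))
    return ops


if __name__ == "__main__":
    print(parse_stream([("read", 0x10), ("partial_write", 0x20)]))
```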

Step 403, determining the response order of the read-write instructions according to the preset priority, wherein the weight of the write instruction of the first read-write instruction is greater than the weights of the other instructions, the other instructions include full read instructions, full write instructions, and the read instruction of the first read-write instruction, and the interval between the read instruction of the first read-write instruction and the write instruction of the first read-write instruction is the instruction life cycle.

The static configuration may be initialized by the peripheral control circuitry, the static configuration including a preset priority of the read-write instructions. And realizing the configuration of the preset priority through static configuration. The preset priority orders the received read-write instructions according to the instruction weight from high to low, and the response sequence of the read-write instructions is determined. And sequentially executing the analyzed instructions according to the response sequence.

The read instruction of the preset instruction type is executed before its write instruction, and the number of clock cycles between the two is fixed. The preset priority may include a priority ordering of the read instructions of the preset instruction type, full read instructions, and full write instructions. After sorting according to the preset priority, the instructions are executed in turn according to the sorting result. During execution, once the read instruction of the preset instruction type has been executed, the write instruction of the preset instruction type is executed after the clock cycles of the instruction life cycle have elapsed.

When a read instruction, the write instruction of the current read-write instruction, and the read instruction of the next read-write instruction arrive in the same clock cycle, the write instruction of the read-write instruction is always granted arbitration and executed, and the other instructions wait for arbitration in subsequent clock cycles and are then executed in turn; the arbitration policy for the remaining instructions may be statically configured as different arbitration priorities or as round-robin arbitration.
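The arbitration rule above can be illustrated with a minimal Python sketch. The instruction kind names, the `arbitrate` and `drain` helpers, and the one-grant-per-clock-cycle model are assumptions made for illustration only; the text states just that the write instruction of a read-write instruction always wins and that the remaining instructions may use fixed priorities or round-robin arbitration.

```python
from collections import deque

# Hypothetical instruction kinds; the names are illustrative only.
RMW_WRITE = "rmw_write"  # write half of a read-write or partial write instruction
RMW_READ = "rmw_read"    # read half of a read-write or partial write instruction
READ = "read"            # full read instruction
WRITE = "write"          # full write instruction


def arbitrate(pending, rr_order):
    """Pick the single instruction kind granted in this clock cycle.

    pending  -- kinds competing in this cycle
    rr_order -- deque holding the round-robin order of the other kinds
    """
    if RMW_WRITE in pending:
        # The write of a read-write instruction always wins arbitration.
        return RMW_WRITE
    for _ in range(len(rr_order)):
        kind = rr_order[0]
        rr_order.rotate(-1)  # the kind just examined moves to the back
        if kind in pending:
            return kind
    return None


def drain(pending):
    """Grant one instruction per cycle until nothing is pending."""
    rr_order = deque([READ, WRITE, RMW_READ])
    pending = list(pending)
    grants = []
    while pending:
        granted = arbitrate(pending, rr_order)
        if granted is None:
            raise ValueError("unknown instruction kind in pending list")
        pending.remove(granted)
        grants.append(granted)
    return grants


# Three instructions arrive in the same cycle; the write half goes first.
print(drain([READ, RMW_WRITE, RMW_READ]))
```

A fixed-priority policy could be modelled in the same way by replacing the rotating deque with a static list.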

Step 404, if the instruction type of the first read-write instruction is a preset instruction type, executing the first read instruction of the first read-write instruction and starting the instruction life cycle, wherein the preset instruction type is a read-write instruction or a partial write instruction.

Step 405, in the instruction life cycle, if a second read-write instruction of a preset instruction type is received and the second address accessed by the second read-write instruction is the same as the first address, merging the first write instruction of the first read-write instruction with the write instruction of the second read-write instruction to obtain the target write instruction.

The first address is an address accessed by the first read instruction.

Step 406, when the first data read by the first read instruction is returned, the instruction life cycle is ended, and the target write instruction is executed according to the first data.

The processing method of the on-chip cache provided by the embodiment of the invention can sort the received read-write instructions according to the preset priority, and configure the read-write instructions which are executed preferentially according to the priority demand, thereby improving the usability.

Example five

Fig. 5 is a flowchart of a processing method for on-chip cache provided in a fifth embodiment of the present invention. As a further explanation of the foregoing embodiments, after the first read-write instruction sent by the host is obtained, the method further includes: configuring order-preserving information for the first read-write instruction, wherein the order-preserving information represents the order in which instruction read-write results are fed back to the host. Accordingly, after the target write instruction is executed according to the first data, the method further includes: feeding back result data to the host according to the order-preserving information. The method comprises the following steps:

Step 501, a first read-write instruction sent by a host is obtained.

Step 502, configuring order-preserving information for the first read-write instruction, where the order-preserving information indicates an order of feeding back the instruction read-write result to the host.

The instructions in the instruction stream of the host are sent in sequence. When the on-chip cache chip feeds back the execution result of the read-write instruction to the host, the on-chip cache chip can feed back the execution result sequentially according to the sequence. The order-preserving information is used for recording the sequence order of the read-write instruction in the instruction stream sent by the host.

Step 503, if the instruction type of the first read-write instruction is a preset instruction type, executing the first read instruction of the first read-write instruction and starting the instruction life cycle, wherein the preset instruction type is a read-write instruction or a partial write instruction.

Step 504, in the instruction life cycle, if a second read-write instruction of the preset instruction type is received and the second address accessed by the second read-write instruction is the same as the first address, merging the first write instruction of the first read-write instruction with the write instruction of the second read-write instruction to obtain the target write instruction.

The first address is an address accessed by the first read instruction.

Step 505, when the first data read by the first read instruction is returned, the instruction life cycle is ended, and the target write instruction is executed according to the first data.

Step 506, feeding back result data to the host according to the order-preserving information.

After the target write instruction is executed, the position of the first read-write instruction within the result data fed back to the host is determined according to the order-preserving information. The multiple instruction execution results obtained can be ordered according to the order-preserving information, so that the order of the results matches the order of the read-write instructions in the instruction stream sent by the host.
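One way to picture the order-preserving information is as a sequence tag that lets results completed out of order be released in the original order. The sketch below is only an illustration under that assumption; the `OrderKeeper` class, its method names, and the use of integer tags are hypothetical and are not taken from the embodiment.

```python
import heapq


class OrderKeeper:
    """Releases results to the host in the order the instructions arrived."""

    def __init__(self):
        self._next_tag = 0  # tag handed to the next incoming instruction
        self._next_out = 0  # tag of the next result owed to the host
        self._done = []     # min-heap of (tag, result) awaiting release

    def tag(self):
        """Assign order-preserving information to a newly received instruction."""
        assigned = self._next_tag
        self._next_tag += 1
        return assigned

    def complete(self, tag, result):
        """Record a finished instruction and return the results now releasable."""
        heapq.heappush(self._done, (tag, result))
        released = []
        while self._done and self._done[0][0] == self._next_out:
            released.append(heapq.heappop(self._done)[1])
            self._next_out += 1
        return released


keeper = OrderKeeper()
first, second = keeper.tag(), keeper.tag()
print(keeper.complete(second, "result of second"))  # held back: first not done yet
print(keeper.complete(first, "result of first"))    # both released, in stream order
```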

The processing method for on-chip caching provided by the embodiment of the invention can determine the sequence of the result data of the read-write instruction fed back to the host according to the order-preserving information, so that the sequence of the result data is matched with the instruction stream sequence sent by the host, and the reliability of caching operation is improved.

Example six

Fig. 6 is a schematic structural diagram of an on-chip cache circuit module according to a sixth embodiment of the present invention. The circuit module is applied to an on-chip cache chip and includes the following modules: an analysis module, an instruction arbitration module, an instruction information processing module, a queue control module, and a data storage module.

The analysis module is used for analyzing the read instruction, the write instruction and the read-write instruction. The instruction arbitration module is used for arbitrating the execution sequence of the parsed instructions. The instruction information processing module is used for maintaining a read-write instruction information list and completing data merging. The queue control module is used for maintaining instruction order-preserving information. The data storage module is used for storing data and check data, and the data check strategy adopted by the data storage module is Error Correction Code (ECC).

In one application environment, an on-chip cache chip may act as a slave connected to multiple hosts through a data bus. The data resources in the on-chip cache circuit module are shared by the hosts, and each host can initiate read instructions, write instructions, and read-write instructions to the on-chip cache circuit module. Meanwhile, the peripheral circuit comprises an order-preserving control module which, according to the order-preserving information returned by the on-chip cache, fulfils the order-preserving requirement of the bus protocol between the hosts and the on-chip cache circuit module.

The analysis module in the on-chip cache circuit module receives and parses the instructions from the hosts, including read instructions, write instructions, and read-write instructions. When a read instruction is identified, the analysis module sends it to the instruction arbitration module. When a write instruction is identified and all of its data valid bits are valid, the analysis module sends it to the instruction arbitration module. When a write instruction is identified and only part of its data valid bits are valid, the analysis module parses it into a read-write instruction. When a read-write instruction, or a read-write instruction parsed from a partial write instruction, is identified, the analysis module compares the access address of the instruction with the write instruction destination addresses of the valid entries in the instruction information list.

The instruction information list is shown in fig. 6 and includes an entry valid bit, a write instruction destination address, data valid bits, and data. If the comparison finds a valid entry in the instruction information list whose address matches the instruction access address (for example, the second address is the same as the first address), the analysis module judges that the read-write instruction hits, and the instruction does not need to enter the instruction arbitration module. The order-preserving information of the instruction is sent to the queue control module, and the data valid bits and data of the instruction are sent to the instruction information list.

If the effective item consistent with the instruction access address does not exist in the instruction information list after the retrieval comparison, the analysis module judges that the read-write instruction is not hit, decodes the read-write instruction into a read instruction and a write instruction, and sends the read instruction and the write instruction to the instruction arbitration module respectively. The interval between the read instruction and the write instruction after the read-write instruction decoding is the life cycle of instruction processing, and the life cycle of instruction processing is the delay of returning read data to the instruction information processing module after the read instruction is received by the data storage module.

The instruction arbitration module arbitrates the execution sequence of the instructions from the analysis module. The parsed instructions comprise: read instructions, write instructions, the read instruction of a read-write instruction, and the write instruction of a read-write instruction. Among these, the write instruction of a read-write instruction has the highest priority. When a read instruction, the write instruction of the current read-write instruction, and the read instruction of the next read-write instruction arrive in the same clock cycle, the write instruction of the read-write instruction is always granted arbitration and executed, and the other instructions wait for arbitration in subsequent clock cycles and are then executed in turn. The arbitration policy for the remaining instructions may be statically configured as different arbitration priorities or as round-robin arbitration.

The instruction arbitration module generates an instruction sequence number by cyclic self-increment; the range of the sequence number matches the number of entries in the instruction information list. When the write instruction of a read-write instruction is granted arbitration, its data and address information are sent to the corresponding instruction information list entry in the instruction information processing module according to the instruction sequence number generated in the current clock cycle. In that case the instruction arbitration module also sends the order-preserving information carried by the instruction, together with the instruction sequence number generated in the current clock cycle, to the queue control module. When a read instruction, or the read instruction of a read-write instruction, is granted arbitration, the instruction arbitration module sends a read request to the data storage module.

The instruction information processing module in the on-chip cache circuit module comprises an instruction information list composed of multiple entries. It temporarily stores the information of the write instructions of read-write instructions, handles the merging of partial write instructions and read-write instructions to the same address, receives the read data returned by the data storage module, and handles the write operation of read-write instructions. The contents of each entry of the instruction information list include an entry valid bit, the destination address of the write instruction, data valid bits, and data. The number of entries in the instruction information list is determined by the delay with which read data is returned from the data storage module to the instruction information processing module when the cache processes a read-write instruction.

In this embodiment, the return delay of the read data is referred to as the life cycle of instruction processing, which is also the life cycle of each entry in the instruction information list. When the instruction information processing module receives from the instruction arbitration module a partial write instruction or the write instruction of a read-write instruction, the instruction information list stores the instruction information into the corresponding entry according to the instruction sequence number sent by the instruction arbitration module, and sets the entry valid bit.

When the instruction information processing module receives from the analysis module the write instruction information of a partial write or read-write instruction, it merges the data valid bits and data of the instruction with the data valid bits and data in the valid entry that the instruction hits, to obtain the target write instruction. Instructions with the same access address are thereby merged within the life cycle of the entry.

When the instruction information processing module receives read return data from the data storage module, the instruction information list updates the data in the corresponding entry: the valid part of the data in the entry is kept unchanged, and the invalid part is updated to the read return data on the corresponding data bits. At this point the life cycle of the entry ends; the instruction information processing module writes the data into the data storage module according to the write instruction destination address in the entry, and sets the entry valid bit to invalid.
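The end of an entry's life cycle can be sketched as follows. This is a minimal illustration, assuming byte-granular data valid bits and a dictionary standing in for the data storage module; the `Entry` fields mirror the entry contents named above, while the `on_read_return` helper and its parameters are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    valid: bool = False                             # entry valid bit
    address: int = 0                                # write instruction destination address
    byte_valid: list = field(default_factory=list)  # data valid bits, one per byte (assumed)
    data: list = field(default_factory=list)        # pending write data


def on_read_return(entry, read_data, storage):
    """Fill the invalid bytes from the read data, write back, free the entry."""
    merged = [
        pending if is_valid else returned
        for pending, is_valid, returned in zip(entry.data, entry.byte_valid, read_data)
    ]
    storage[entry.address] = merged  # write to the data storage module
    entry.valid = False              # the life cycle of the entry ends
    return merged


storage = {0x40: [1, 2, 3, 4]}
entry = Entry(True, 0x40, [False, True, False, True], [0, 0xAA, 0, 0xBB])
print(on_read_return(entry, storage[0x40], storage))
```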

The queue control module in the on-chip cache circuit module receives, from the analysis module and the instruction arbitration module, the bus serial number and order-preserving information carried by write instructions and read-write instructions, together with the corresponding instruction sequence number. The queue control module maintains a set of queues whose number corresponds to the number of entries in the instruction information list and therefore depends on the life cycle of instruction processing.

When the queue control module receives an instruction from the instruction arbitration module or the analysis module, the bus serial number and the order-preserving information carried by the instruction are enqueued into the queue matching the instruction sequence number. When the life cycle of the instruction ends, the queue dequeues the instruction information, which is merged with the verification information of the read data returned by the data storage module and then output to the order-preserving control module outside the on-chip cache circuit module.

The data storage module in the on-chip cache circuit module comprises a data storage unit (SRAM) for storing data and a data verification circuit module, wherein the verification strategy is Error Correction Code (ECC). When receiving the data information sent by the instruction information processing module, the data checking module codes the data information and sends the generated check codes and the data information into a data storage unit (SRAM) according to the destination address. When the data storage module receives a read request sent by the instruction arbitration module, data read from the data storage unit (SRAM) and the check code are sent to the data check module, and the data check module decodes the read data and compares the read data with the check code. When the read request comes from the read instruction, the data storage module directly returns the data returned by the read and the verification result to the peripheral host computer initiating the read instruction; when the read request comes from the read instruction in the read-write instruction, the data storage module sends the read returned data and the verification result to the instruction information processing module.
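The embodiment names ECC as the verification strategy but does not specify a particular code. Purely as an illustration of the encode-on-write and check-on-read flow, the sketch below uses the classic Hamming(7,4) code, which corrects any single-bit error in a 4-bit value; the choice of this code, the 4-bit data width, and the function names are assumptions and not part of the embodiment.

```python
def ecc_encode(nibble):
    """Encode a 4-bit value as 7 bits (codeword positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p3 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]


def ecc_decode(bits):
    """Return (nibble, error_detected), correcting a single flipped bit."""
    b = list(bits)
    s1 = b[0] ^ b[2] ^ b[4] ^ b[6]
    s2 = b[1] ^ b[2] ^ b[5] ^ b[6]
    s3 = b[3] ^ b[4] ^ b[5] ^ b[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)  # 1-based position of the bad bit
    if syndrome:
        b[syndrome - 1] ^= 1
    nibble = b[2] | (b[4] << 1) | (b[5] << 2) | (b[6] << 3)
    return nibble, syndrome != 0


stored = ecc_encode(0b1011)
stored[4] ^= 1             # simulate a single-bit upset in the storage unit
print(ecc_decode(stored))  # the original value is recovered and the error flagged
```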

The response of the read-write command can be performed by combining the modules in the following manner.

In step 601, the peripheral control circuit initializes the static configuration of the on-chip cache circuit module, such as the arbitration policy of the instruction arbitration module.

Step 602, the instruction from the peripheral host is decoded by the instruction decoding module into a read instruction, a write instruction whose data valid bits are all valid, or a write instruction whose data valid bits are partially valid or a read-write instruction.

Step 603, when the instruction is decoded as a read instruction and the instruction arbitration module grants the read instruction, the storage unit data is read and returned to the host.

Step 604, when the instruction is decoded as a write instruction whose data valid bits are all valid and the instruction arbitration module grants the write instruction, the instruction information is written into the storage unit, and the host is notified that the write instruction is completed.

Step 605, when the instruction is decoded as a write instruction whose data valid bits are partially valid, or as a read-write instruction, it is determined whether the destination address hits a valid entry in the instruction information list.

Step 606, when the destination address of the instruction hits in the valid entry in the instruction information list, merging the data valid bit and data information of the instruction with the information of the entry, and enqueuing the instruction order-preserving information.

Step 607, when the destination address of the instruction misses a valid entry in the instruction information list and the instruction arbitration module arbitrates the write instruction execution of the instruction, the instruction information is filled into an invalid entry and is set to be valid.

Step 608, in the case of step 606 or step 607, waiting for the read data of the corresponding entry to return, merging the instruction information in the entry, that is, keeping the valid part of the data information in the entry unchanged, updating the invalid part into the read return data on the corresponding data bit, and writing the merged instruction information into the data storage unit; the instruction order-preserving information is dequeued, combined with the data verification information and returned to the host computer, and the host computer is informed of the completion of the instruction; the entry is set to invalid.
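The decision points in steps 602 to 607 can be summarised in a short sketch. It assumes a 4-bit data valid mask and represents the valid entries of the instruction information list as a set of live addresses; `FULL_MASK`, `decode`, `route`, and the returned labels are hypothetical names chosen for illustration.

```python
FULL_MASK = 0b1111  # assumption: four data valid bits per access


def decode(op, valid_mask=0):
    """Step 602: classify a host instruction."""
    if op == "read":
        return "read"
    if op == "write" and valid_mask == FULL_MASK:
        return "full_write"
    return "read_write"  # partial write or explicit read-write instruction


def route(op, address, valid_mask, live_addresses):
    """Steps 603 to 607: decide where the decoded instruction goes next."""
    kind = decode(op, valid_mask)
    if kind != "read_write":
        return kind, "arbitrate"           # steps 603 and 604
    if address in live_addresses:
        return kind, "merge_into_entry"    # step 606: hit on a valid entry
    return kind, "arbitrate_and_allocate"  # step 607: miss, fill an invalid entry


live = {0xA0}
print(route("write", 0xA0, 0b0011, live))  # partial write that hits a live entry
print(route("write", 0xC0, 0b1111, live))  # full write goes straight to arbitration
```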

Illustratively, as shown in FIG. 7, the instruction stream sent by the host includes write instruction 1 (address A, partial write), write instruction 2 (address A, partial write), write instruction 3 (address C), read instruction 1 (address D), write instruction 4 (address E), and write instruction 5 (address F). In the prior-art instruction response mode, after write instruction 1, which is handled as a read and rewrite, reads address A, the read delay begins. Since a read instruction is not blocked by a read-write or partial write instruction, read instruction 1 (read address D) is executed in the second clock cycle. The following four clock cycles are spent waiting for the data read from address A to return. In the seventh clock cycle, the data of address A returns and the partial write to address A is executed. In the eighth clock cycle, write instruction 2 (address A, partial write) is executed. As with write instruction 1, a read delay is incurred until the data that write instruction 2 reads from address A returns, whereupon the partial write to address A is executed. Write instruction 3 (address C), write instruction 4 (address E), and write instruction 5 (address F) are then executed, for a total of 17 clock cycles.

By adopting the technical scheme provided by the embodiment of the invention, as shown in fig. 8, the 6 instructions sent from the host (excluding the access of write instruction 6) can be processed by the cache in 7 clock cycles, far fewer than the 17 clock cycles needed without this optimization.

Read-write instruction 1, obtained after instruction decoding, starts the instruction life cycle for accessing address A, and read-write instructions to address A are merged within that life cycle. For example, write instruction 2 (address A, partial write) may be merged with write instruction 1 (address A, partial write) to obtain a target write instruction. Write instruction 3 (address C), read instruction 1 (address D), write instruction 4 (address E), and write instruction 5 (address F) may be executed sequentially during the instruction life cycle of address A.

When read-write instruction 6 arrives, the life cycle of address A started by read-write instruction 1 has already ended. Even though the access address is the same, read-write instruction 6 cannot be merged with read-write instruction 1 and read-write instruction 2. When read-write instruction 6 is executed, the next life cycle of address A is started.
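The merging window described above can be modelled with a small sketch. It assumes that a life cycle lasts a fixed number of clock cycles after the first read is issued and that an instruction arriving strictly before the end of that window is merged; the arrival cycles and the latency of 5 used in the example are hypothetical and are not read from fig. 8.

```python
def group_by_lifecycle(arrivals, latency):
    """Group read-write or partial write instructions that merge together.

    arrivals -- list of (arrival_cycle, address), sorted by arrival cycle
    latency  -- assumed read-return delay, i.e. the life cycle length
    Returns lists of instruction indices; each list becomes one target write.
    """
    live = {}    # address -> (cycle at which its life cycle ends, group)
    groups = []
    for index, (cycle, address) in enumerate(arrivals):
        window = live.get(address)
        if window is not None and cycle < window[0]:
            window[1].append(index)  # same address, life cycle still open
        else:
            group = [index]          # miss: a new life cycle is started
            groups.append(group)
            live[address] = (cycle + latency, group)
    return groups


# Two early accesses to address "A" merge; the late one starts a new life cycle.
print(group_by_lifecycle([(0, "A"), (1, "A"), (9, "A")], latency=5))
```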

The circuit module of the on-chip cache supports out-of-order instruction processing, thereby improving the access performance of read-write instructions. When a read-write instruction is recognized, the method inserts subsequent read and write accesses to different addresses into the read-return delay of the read-write instruction, thereby making full use of that read delay and reducing the performance loss caused by read-write instructions. Merging instructions with the same access address reduces cache power consumption and improves performance; that is, within the life cycle of instruction processing, the method identifies partial write or read-write instructions with the same destination address, merges them into a single write, and then writes the merged result into the storage unit. The clock cycle range for identifying the same destination address is determined by the read data return delay of the data storage unit in the on-chip cache, namely the life cycle of instruction processing in the method.

Example seven

Fig. 9 shows a processing apparatus for on-chip cache according to a seventh embodiment of the present invention. The present embodiment is applicable to data read-write scenarios of an on-chip cache, and the apparatus may be implemented by an on-chip cache chip and specifically includes: instruction parsing module 71, read instruction execution module 72, merge module 73, and write instruction execution module 74.

The instruction parsing module 71 is configured to obtain a first read-write instruction sent by the host;

the read instruction execution module 72 is configured to execute the first read instruction of the first read-write instruction if the instruction type of the first read-write instruction is a preset instruction type, and start the instruction life cycle, wherein the preset instruction type is a read-write instruction or a partial write instruction;

the merging module 73 is configured to merge, in the instruction lifecycle, the first write instruction of the first read-write instruction and the write instruction of the second read-write instruction to obtain the target write instruction if the second read-write instruction of the preset instruction type is received and the second address accessed by the second read-write instruction is the same as the first address, where the first address is the address accessed by the first read-write instruction;

the write instruction execution module 74 is configured to execute the target write instruction according to the first data when the first data read by the first read instruction returns, and the instruction lifecycle is ended.

On the basis of the embodiment, the data verification strategy is an error correction code; the system also comprises an instruction conversion module.

The instruction conversion module is configured to convert the first read-write instruction into a first read instruction and a first write instruction if the instruction type of the first read-write instruction is a partial write instruction.

On the basis of the above embodiment, the merging module 73 is configured to:

acquiring a first data valid bit of the first write instruction and a second data valid bit of the second write instruction of the second read-write instruction;

if the first data valid bit is different from the second data valid bit, a target write instruction for writing the first data valid bit and the second data valid bit is generated.

On the basis of the above embodiment, the merging module 73 is configured to:

when merging the first write instruction of the first read-write instruction with the write instruction of the second read-write instruction, if the first data valid bit is the same as the second data valid bit, overwriting the write data of the first write instruction with the write data of the second write instruction to obtain the target write instruction.

On the basis of the above embodiment, the merging module 73 is configured to:

when overwriting the write data of the first write instruction with the write data of the second write instruction to obtain the target write instruction:

determining a first data valid bit and first write data according to a write instruction of the first read-write instruction;

determining a second data valid bit and second write data according to a write instruction of the second read-write instruction;

determining a reserved data bit according to the first data valid bit and the second data valid bit;

determining a target data valid bit according to the reserved data bit and the second data valid bit;

overwriting the first write data with the second write data according to the target data valid bit to obtain the target write data;

and determining a target write instruction according to the target write data, wherein the target write instruction is used for writing the target write data according to the valid bit of the target data.
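The steps above can be sketched as follows. The text does not spell out how the reserved data bit and the target data valid bit are computed, so this sketch makes an assumption: a reserved bit is one that only the first write marks valid, and the target valid bits are the reserved bits together with the second write's valid bits. Per-byte valid flags and the `merge_writes` name are likewise illustrative.

```python
def merge_writes(first_valid, first_data, second_valid, second_data):
    """Merge two writes to the same address into one target write.

    Each *_valid list holds one flag per byte; each *_data list holds the bytes.
    """
    # Assumed rule: bytes only the first write supplies are reserved (kept).
    reserved = [a and not b for a, b in zip(first_valid, second_valid)]
    # Assumed rule: the target covers the reserved bytes plus the second write's bytes.
    target_valid = [r or b for r, b in zip(reserved, second_valid)]
    # The later write wins wherever it supplies a byte.
    target_data = [
        new if new_ok else old
        for old, new, new_ok in zip(first_data, second_data, second_valid)
    ]
    return target_valid, target_data


valid, data = merge_writes(
    [True, True, False, False], [0x11, 0x22, 0x00, 0x00],
    [False, True, True, False], [0x00, 0x99, 0x33, 0x00],
)
print(valid, [hex(b) for b in data])
```

Bytes that neither write marks valid keep a placeholder here; in the described flow they would be filled from the read return data when the life cycle ends.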

On the basis of the above embodiment, the apparatus further comprises an other-instruction processing module, configured to: in the instruction life cycle, if one or more third read-write instructions are acquired, execute them in sequence, wherein a third read-write instruction is a read instruction or a full write instruction.

Based on the above embodiment, the instruction parsing module 71 is configured to:

analyzing an instruction stream sent by a host;

analyzing a plurality of read-write instructions in the instruction stream;

and determining the response sequence of the read-write instructions according to the preset priority, wherein the weight of the write instruction of the first read-write instruction is greater than the weights of the other instructions, the other instructions comprise full read instructions, full write instructions, and the read instruction of the first read-write instruction, and the interval between the read instruction and the write instruction of the first read-write instruction is the instruction life cycle.

On the basis of the above embodiment, the system further includes an order-preserving module, configured to configure order-preserving information for the first read-write instruction, where the order-preserving information indicates an order of feeding back the instruction read-write result to the host;

after the target write instruction is executed according to the first data, the result data is fed back to the host according to the order-preserving information.

In the on-chip cache processing apparatus provided by the embodiment of the invention, the instruction parsing module 71 is configured to obtain a first read-write instruction sent by the host; the read instruction execution module 72 is configured to execute the first read instruction of the first read-write instruction if the instruction type of the first read-write instruction is a preset instruction type, and start the instruction life cycle, wherein the preset instruction type is a read-write instruction or a partial write instruction; the merging module 73 is configured to merge, in the instruction life cycle, the first write instruction of the first read-write instruction and the write instruction of the second read-write instruction to obtain the target write instruction if the second read-write instruction of the preset instruction type is received and the second address accessed by the second read-write instruction is the same as the first address, where the first address is the address accessed by the first read-write instruction; the write instruction execution module 74 is configured to end the instruction life cycle when the first data read by the first read instruction is returned, and execute the target write instruction according to the first data. Cache operations are currently executed serially, so that performance degrades and data read-write efficiency is low when a large number of cache operations are faced. By comparison, the apparatus merges, within the instruction life cycle of the first read-write instruction, the write instructions of one or more other read-write instructions accessing the same address. When the read instruction of the first read-write instruction returns, the instruction life cycle ends, and the write operation is performed on the data acquired from the first address according to the target write instruction. A plurality of read-write instructions with the same address are thus executed within the clock time occupied by the first read-write instruction, which improves the read-write performance of the on-chip cache and the read-write efficiency of cached data.

Example eight

Fig. 10 is a schematic structural diagram of a chip according to an eighth embodiment of the present invention. As shown in fig. 10, the chip includes a processor 80 and a memory 81; the number of processors 80 in the chip may be one or more, and one processor 80 is taken as an example in fig. 10; the processor 80 and the memory 81 in the chip may be connected by a bus or by other means, and connection by a bus is taken as an example in fig. 10.

The memory 81 is a computer readable storage medium that can be used to store a software program, a computer executable program, and a module, such as program instructions/modules (e.g., the merge module 73) corresponding to the processing method of the on-chip cache in the embodiment of the invention. The processor 80 executes various functional applications of the chip and data processing, that is, implements the on-chip cache processing method described above, by running software programs, instructions, and modules stored in the memory 81.

The memory 81 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to the use of the terminal, etc. In addition, the memory 81 may include static random access memory (SRAM) or other high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 81 may further include memory located remotely from the processor 80, which may be connected to the chip through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Example nine

A ninth embodiment of the present invention also provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a method of processing an on-chip cache, the method comprising:

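acquiring a first read-write instruction sent by a host;

if the instruction type of the first read-write instruction is a preset instruction type, executing a first read instruction of the first read-write instruction and starting an instruction life cycle, wherein the preset instruction type is a read-write instruction or a partial write instruction;

in the instruction life cycle, if a second read-write instruction of the preset instruction type is received and a second address accessed by the second read-write instruction is the same as a first address, merging a first write instruction of the first read-write instruction with a write instruction of the second read-write instruction to obtain a target write instruction, wherein the first address is the address accessed by the first read instruction;

and when the first data read by the first read instruction is returned, ending the instruction life cycle and executing the target write instruction according to the first data.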

On the basis of the above embodiment, after obtaining the first read-write instruction sent by the host, the method further includes:

the data verification strategy is error correction code;

if the instruction type of the first read-write instruction is a partial write instruction, converting the first read-write instruction into a first read instruction and a first write instruction.

On the basis of the above embodiment, merging the first write instruction of the first read-write instruction and the write instruction of the second read-write instruction to obtain the target write instruction includes:

On the basis of the above embodiment, the step of overlaying the write data of the first write instruction with the write data of the second write instruction to obtain the target write instruction includes:

On the basis of the above embodiment, after the start of the instruction life cycle, the method further includes:

On the basis of the above embodiment, obtaining the first read-write instruction sent by the host includes:

analyzing an instruction stream sent by a host;

analyzing a plurality of read-write instructions in the instruction stream;

configuring order-preserving information for the first read-write instruction, wherein the order-preserving information represents the order of feeding back the instruction read-write result to the host;

accordingly, after executing the target write instruction according to the first data, further comprising:

and feeding back result data to the host according to the order-preserving information.
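
The order-preserving feedback described above can be sketched in Python as follows. This is an illustrative assumption rather than the mechanism of the embodiment: the order-preserving information is modelled as a sequence tag assigned on arrival, and OrderedResponder, tag and complete are hypothetical names. Results that finish early are held back until every earlier instruction has been answered.

```python
from typing import Dict, List


class OrderedResponder:
    """Returns results to the host in the order the instructions arrived."""

    def __init__(self) -> None:
        self._next_tag = 0                 # tag given to the next instruction
        self._next_to_send = 0             # tag the host expects next
        self._done: Dict[int, bytes] = {}  # finished results waiting their turn

    def tag(self) -> int:
        """Configure order-preserving information for a new instruction."""
        t = self._next_tag
        self._next_tag += 1
        return t

    def complete(self, tag: int, result: bytes) -> List[bytes]:
        """Record a finished instruction; return every result now releasable."""
        self._done[tag] = result
        released: List[bytes] = []
        while self._next_to_send in self._done:
            released.append(self._done.pop(self._next_to_send))
            self._next_to_send += 1
        return released
```

With such a scheme, merged or reordered execution inside the cache stays invisible to the host, which still receives results in the order it issued the instructions.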

Of course, the storage medium containing the computer executable instructions provided in the embodiments of the present invention is not limited to the above method operations, and may also perform related operations in the on-chip cache processing method provided in any embodiment of the present invention.
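
As an illustration of the partial-write conversion listed in this embodiment, the following Python sketch splits a partial write instruction into a read instruction and a write instruction when the data verification strategy is an error correction code. The Instruction type, its kind strings and the split_partial_write function are hypothetical names chosen for the example. The premise that the read is needed because check bits are computed over the whole protected word is a general property of ECC memory and is assumed here, not quoted from the embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instruction:
    kind: str         # "read", "write", "full_write" or "partial_write" (assumed names)
    address: int
    data: bytes = b""
    valid: int = 0    # data valid bits, one bit per byte


def split_partial_write(instr: Instruction, ecc_enabled: bool) -> List[Instruction]:
    """Convert a partial write into a read followed by a write under ECC.

    Other instructions, or any instruction when ECC is not in use, pass
    through unchanged.
    """
    if ecc_enabled and instr.kind == "partial_write":
        read = Instruction("read", instr.address)
        write = Instruction("write", instr.address, instr.data, instr.valid)
        return [read, write]
    return [instr]
```

The interval between issuing the read half and receiving its data corresponds to the instruction life cycle in which later writes to the same address can be merged.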

From the above description of the embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and the necessary general-purpose hardware, and of course also by means of hardware alone, although in many cases the former is the preferred embodiment. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method of the embodiments of the present invention.

It should be noted that, in the above embodiment of the on-chip cache processing apparatus, the units and modules included are divided only according to functional logic; the division is not limited to the one described, so long as the corresponding functions can be implemented. In addition, the specific names of the functional units are used only to distinguish them from each other and do not limit the protection scope of the present invention.

Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims

1. A method for processing an on-chip cache, comprising:

acquiring a first read-write instruction sent by a host;

if the instruction type of the first read-write instruction is a preset instruction type, executing a first read instruction of the first read-write instruction, and starting an instruction life cycle, wherein the preset instruction type is a read-write instruction or a partial write instruction;

in the instruction life cycle, if a second read-write instruction of the preset instruction type is received and a second address accessed by the second read-write instruction is the same as a first address, merging a first write instruction of the first read-write instruction with a write instruction of the second read-write instruction to obtain a target write instruction, wherein the first address is the address accessed by the first read instruction;

and when the first data read by the first read instruction is returned, ending the instruction life cycle, and executing the target write instruction according to the first data.

2. The method of claim 1, further comprising, after obtaining the first read-write instruction sent by the host:

the data verification strategy is error correction code;

and if the instruction type of the first read-write instruction is a partial write instruction, converting the first read-write instruction into a first read instruction and a first write instruction.

3. The method of claim 1, wherein merging the first write instruction of the first read-write instruction with the write instruction of the second read-write instruction results in a target write instruction, comprising:

acquiring a first data valid bit of the first write instruction and a second data valid bit of a second write instruction of the second read-write instruction;

and if the first data valid bit is different from the second data valid bit, generating a target write instruction for writing the first data valid bit and the second data valid bit.

4. The method of claim 1, wherein merging the first write instruction of the first read-write instruction with the write instruction of the second read-write instruction results in a target write instruction, comprising:

5. The method of claim 4, wherein the overwriting of the write data of the first write instruction with the write data of the second write instruction results in a target write instruction, comprising:

determining a second data valid bit and second write data according to the write instruction of the second read-write instruction;

covering the first write data according to the target data valid bit and the second write data to obtain target write data;

and determining a target write instruction according to the target write data, wherein the target write instruction is used for writing the target write data according to the target data valid bit.

6. The method of claim 1, further comprising, after initiating the instruction lifecycle:

and in the instruction life cycle, if one or more third read-write instructions are acquired, sequentially executing the one or more third read-write instructions, wherein each third read-write instruction is a read instruction or a full write instruction.

7. The method of claim 1, wherein the obtaining the first read-write instruction sent by the host includes:

analyzing an instruction stream sent by a host;

analyzing a plurality of read-write instructions in the instruction stream;

and determining a response order of the plurality of read-write instructions according to a preset priority, wherein the weight of the write instruction of the first read-write instruction is greater than the weights of the other instructions, the other instructions comprising full read instructions, full write instructions and the read instruction of the first read-write instruction, and the interval between the read instruction and the write instruction of the first read-write instruction is the instruction life cycle.

8. The method of claim 1, further comprising, after obtaining the first read-write instruction sent by the host:

configuring order-preserving information for the first read-write instruction, wherein the order-preserving information represents the order of feeding back instruction read-write results to a host;

correspondingly, after executing the target write instruction according to the first data, the method further comprises:

and feeding back result data to the host according to the order-preserving information.

9. An on-chip cache processing apparatus, comprising:

the instruction analysis module is used for acquiring a first read-write instruction sent by a host;

the read instruction execution module is used for executing a first read instruction of the first read-write instruction if the instruction type of the first read-write instruction is a preset instruction type, and starting an instruction life cycle, wherein the preset instruction type is a read-write instruction or a partial write instruction;

the merging module is configured to merge, in the instruction lifecycle, a first write instruction of the first read-write instruction and a write instruction of the second read-write instruction to obtain a target write instruction if a second read-write instruction of the preset instruction type is received and a second address accessed by the second read-write instruction is the same as a first address, where the first address is an address accessed by the first read-write instruction;

and the write instruction execution module is used for ending the instruction life cycle when the first data read by the first read instruction is returned, and executing the target write instruction according to the first data.

10. A chip comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-8 when executing the program.

11. A storage medium containing computer executable instructions for performing the method of any of claims 1-8 when executed by a computer processor.