CN114924794B - Address storage and scheduling method and device for transmission queue of storage component - Google Patents

Address storage and scheduling method and device for transmission queue of storage component

Info

Publication number
CN114924794B
CN114924794B
Authority
CN
China
Prior art keywords
access
address
block
offset
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210849915.4A
Other languages
Chinese (zh)
Other versions
CN114924794A
Inventor
李祖松
郇丹丹
Current Assignee
Beijing Micro Core Technology Co ltd
Original Assignee
Beijing Micro Core Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Micro Core Technology Co ltd
Priority to CN202210849915.4A
Publication of CN114924794A
Application granted
Publication of CN114924794B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/34 Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
    • G06F 9/355 Indexed addressing
    • G06F 9/30098 Register arrangements
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0875 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • G06F 12/0893 Caches characterised by their organisation or structure
    • G06F 12/0895 Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an address storage and scheduling method and device for the transmission queue of a storage component. The method includes: determining a first access address each time a memory access instruction is received; processing the first intra-Block Offset of the first access address to obtain a second intra-Block Offset, so that the second intra-Block Offsets of a plurality of instructions are distributed dispersedly; generating a second access address and storing it in the transmission queue; determining, in the transmission queues of a plurality of storage components, the respective second access address to be transmitted next; and transmitting the second access address to the corresponding storage component and executing the memory access instruction, which includes accessing the corresponding Bank based on the second intra-Block Offset. The invention can reduce the probability of access conflicts and improve processing efficiency.

Description

Address storage and scheduling method and device for transmission queue of storage component
Technical Field
The present invention relates to the field of electronic technologies, and in particular, to an address storage and scheduling method and apparatus for the transmission queue of a storage component.
Background
In the field of electronic technology, a computer device may determine, from a memory access instruction, the address with which to access memory, and then read or write data at the physical address that it indicates.
When a computer processes a task, multiple data accesses may be involved. The locations accessed in the Cache may then be concentrated in one region, so the probability of access conflicts is high.
When an access conflict occurs, the accesses must be executed sequentially, which increases the latency of the task and hurts processing efficiency. For example, when multiple memory access instructions target the same cache Bank, the Bank, having only one port, cannot serve them simultaneously, and they must execute one after another.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide an address storage and scheduling method and apparatus for the transmission queue of a storage component. The technical solution is as follows:
according to an aspect of the present invention, there is provided an address storage and scheduling method for the transmission queue of a storage component, the method including:
determining a first access address of a memory access instruction each time a memory access instruction directed to a storage component is received;
acquiring a first intra-Block Offset of the first access address;
processing the first intra-Block Offset to obtain a second intra-Block Offset, such that the second intra-Block Offsets of a plurality of memory access instructions are distributed dispersedly;
generating a second access address of the memory access instruction and storing it in the transmission queue of the storage component, the second access address including the second intra-Block Offset;
determining, in the transmission queues of a plurality of storage components, the respective second access address to be transmitted next; and
transmitting the second access address to the corresponding storage component and executing the memory access instruction, which includes accessing the corresponding cache Bank based on the second intra-Block Offset.
Optionally, the processing of the first intra-Block Offset includes:
obtaining an Index of the first access address; and
processing the first intra-Block Offset based on the Index.
Optionally, the Index is a virtual address Index or a physical address Index, and the processing of the first intra-Block Offset based on the Index includes:
processing the first intra-Block Offset based on the virtual address Index; or
processing the first intra-Block Offset based on the physical address Index.
Optionally, the processing of the first intra-Block Offset based on the virtual address Index includes:
processing the first intra-Block Offset based on all bits of the virtual address Index; or
processing the first intra-Block Offset based on the bits of the virtual address Index other than the address ambiguity bits.
Optionally, the determining, in the transmission queues of the plurality of storage components, of the second access addresses to be transmitted includes:
in the transmission queue of each storage component, when second access addresses of a plurality of memory access instructions are present, obtaining those access addresses among them whose accesses do not conflict, as the second access addresses to be transmitted.
Optionally, the obtaining of the non-conflicting access addresses from the plurality of second access addresses includes:
selecting, among the plurality of second access addresses, access addresses whose Bank bits differ from one another.
Optionally, the processing of the first intra-Block Offset includes:
hashing the first intra-Block Offset.
Optionally, the hashing of the first intra-Block Offset includes: performing addition processing or exclusive-OR processing on the first intra-Block Offset.
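The hashing and conflict-selection operations described above can be illustrated with a minimal sketch. It assumes a 6-bit Block Offset with Bank bits [5:3] and Byte bits [2:0]; the helper names are hypothetical and not part of the claims.

```python
# Illustrative sketch only: 6-bit Block Offset assumed,
# Bank bits = [5:3], Byte bits = [2:0].

def hash_offset(block_offset: int, index_bits: int) -> int:
    """XOR 3 Index bits into the Bank bits; Byte bits are unchanged."""
    bank = (block_offset >> 3) & 0b111
    byte = block_offset & 0b111
    new_bank = bank ^ (index_bits & 0b111)
    return (new_bank << 3) | byte

def select_non_conflicting(addresses):
    """Pick addresses whose Bank bits all differ; the rest wait a cycle."""
    chosen, seen_banks = [], set()
    for addr in addresses:
        bank = (addr >> 3) & 0b111
        if bank not in seen_banks:
            seen_banks.add(bank)
            chosen.append(addr)
    return chosen
```

Because the Byte bits pass through untouched, the hash changes which Bank is accessed without changing which byte within the Bank word is addressed.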
According to another aspect of the present invention, there is provided a method for address storage and scheduling of the transmission queue of a storage component in the instruction pipeline stages, the method including:
fetching an instruction to be executed from the instruction Cache (I-Cache);
decoding the instruction to obtain a memory access instruction;
renaming the logical registers of the memory access instruction to physical registers;
determining a first access address of the memory access instruction;
acquiring a first intra-Block Offset of the first access address;
processing the first intra-Block Offset to obtain a second intra-Block Offset, such that the second intra-Block Offsets of a plurality of memory access instructions are distributed dispersedly;
generating a second access address of the memory access instruction and storing it in the transmission queue of the corresponding storage component, the second access address including the second intra-Block Offset;
sending the memory access instruction through the transmission queue to a functional unit (FU) for execution;
reading the corresponding operands from the physical register file according to the memory access instruction;
executing the computation task in the FU according to the type and operands of the memory access instruction;
transmitting the second access address to the corresponding storage component and executing the memory access instruction, which includes accessing the corresponding cache Bank based on the second intra-Block Offset; and
writing the final result of the memory access instruction into the destination register.
According to another aspect of the present invention, there is provided an address storage and scheduling apparatus for the transmission queue of a storage component, the apparatus including:
a determining module, configured to determine a first access address of a memory access instruction each time a memory access instruction directed to the storage component is received;
an obtaining module, configured to obtain a first intra-Block Offset of the first access address;
a dispersion module, configured to process the first intra-Block Offset to obtain a second intra-Block Offset, such that the second intra-Block Offsets of a plurality of memory access instructions are distributed dispersedly, and to generate a second access address of the memory access instruction and store it in the transmission queue of the storage component, the second access address including the second intra-Block Offset;
a memory access module, configured to determine, in the transmission queues of a plurality of storage components, the respective second access address to be transmitted next, and to transmit the second access address to the corresponding storage component and execute the memory access instruction, which includes accessing the corresponding cache Bank based on the second intra-Block Offset.
Optionally, the dispersion module is configured to:
obtain an Index of the first access address; and
process the first intra-Block Offset based on the Index.
Optionally, the Index is a virtual address Index or a physical address Index, and the dispersion module is configured to:
process the first intra-Block Offset based on the virtual address Index; or
process the first intra-Block Offset based on the physical address Index.
Optionally, the dispersion module is configured to:
process the first intra-Block Offset based on all bits of the virtual address Index; or
process the first intra-Block Offset based on the bits of the virtual address Index other than the address ambiguity bits.
Optionally, the memory access module is configured to:
in the transmission queue of each storage component, when second access addresses of a plurality of memory access instructions are present, obtain those access addresses among them whose accesses do not conflict, as the second access addresses to be transmitted.
Optionally, the memory access module is configured to:
select, among the plurality of second access addresses, access addresses whose Bank bits differ from one another.
Optionally, the dispersion module is configured to hash the first intra-Block Offset.
Optionally, the dispersion module is configured to perform addition processing or exclusive-OR processing on the first intra-Block Offset.
According to another aspect of the present invention, there is provided an electronic device including:
a processor; and
a memory for storing a program,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the above address storage and scheduling method for the transmission queue of a storage component.
According to another aspect of the present invention, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the above address storage and scheduling method for the transmission queue of a storage component.
In the present invention, when a memory access instruction directed to a storage component is received, a first access address of the instruction is determined, the Block Offset in the first access address is processed to obtain a second access address, and the second access address is stored in the transmission queue; this processing ensures that the second intra-Block Offsets of a plurality of memory access instructions are distributed dispersedly. The second access addresses can then be transmitted from the transmission queues to the respective storage components, so that the Banks accessed within each storage component are dispersed, the probability of access conflicts is reduced, and the processing efficiency of the whole system is improved.
Drawings
Further details, features and advantages of the invention are disclosed in the following description of exemplary embodiments with reference to the accompanying drawings, in which:
FIG. 1 illustrates the structure of a Multi-bank cache provided according to an exemplary embodiment of the present invention;
FIG. 2 illustrates an address diagram of Index and Block Offset provided in accordance with an exemplary embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for address storage and scheduling of the transmission queue of storage components provided in accordance with an exemplary embodiment of the present invention;
FIG. 4 illustrates a transmission-queue-based logic diagram provided in accordance with an exemplary embodiment of the present invention;
FIG. 5 illustrates a flowchart of a method for address storage and scheduling for the transmission queue of storage components provided in accordance with an exemplary embodiment of the present invention;
FIG. 6 is a flowchart illustrating a method for address storage and scheduling for the transmission queue of storage components in an instruction pipeline stage according to an exemplary embodiment of the present invention;
FIG. 7 is a schematic block diagram illustrating an address storage and scheduling apparatus for the transmission queue of storage components provided in accordance with an exemplary embodiment of the present invention;
FIG. 8 illustrates a block diagram of an exemplary electronic device that can be used to implement an embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present invention. It should be understood that the drawings and the embodiments of the invention are for illustration purposes only and are not intended to limit the scope of the invention.
It should be understood that the various steps recited in the method embodiments of the present invention may be performed in a different order and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present invention are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in the present invention are intended to be illustrative rather than limiting; those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present invention are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
To clearly describe the method provided by the embodiments of the present invention, the technologies involved are first described below.
1. Cache
A cache memory (Cache) sits between the CPU (Central Processing Unit) and the main memory DRAM (Dynamic Random Access Memory), and is generally composed of SRAM (Static Random-Access Memory). The CPU is far faster than memory: when the CPU accesses data directly from memory it must wait a certain number of clock cycles, whereas the Cache can be accessed quickly. The Cache stores part of the data the CPU has just used or will use again; if the CPU needs that data again it can be fetched directly from the Cache, avoiding long-latency memory accesses, reducing the CPU's waiting time, and improving system efficiency.
The Cache consists mainly of two parts: a Tag part and a Data part. The Data part holds the data of a contiguous address range, and the Tag part stores the common address of that contiguous data. One Tag and all the Data corresponding to it form a line, called a Cache Line, and the Data portion of a Cache Line is called a Data Block. If a piece of data can be stored in multiple places in the Cache, the several Cache Lines found by the same address are called a Cache Set.
2. Organization of the Cache
A Cache may be organized as direct-mapped, set-associative, or fully associative; the present invention mainly concerns the set-associative organization. Direct-mapped and fully associative caches can each be regarded as a special set-associative organization whose number of ways is 1 or equals the number of Cache Lines, respectively. The address with which the processor accesses memory is divided into three parts: Tag, Index, and Block Offset. The Index is used to find a group of Cache Lines, i.e., a Cache Set, in the Cache; the Tags read out via the Index are compared with the Tag in the access address, and only an equal Tag indicates that the Cache Line is the desired one. One Cache Line holds several pieces of access data; the truly desired data, down to individual bytes, is located through the Block Offset part of the memory address together with the access width of the memory access instruction. A Cache Line also contains a valid bit, marking whether the Cache Line holds valid data: only a previously accessed memory address has its data stored in the corresponding Cache Line, with the corresponding valid bit set to 1.
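As a concrete illustration of this three-part address split, the following sketch assumes a cache with 64-byte blocks and 512 sets, so the Block Offset occupies bits [5:0] and the Index bits [14:6], a layout like the one in fig. 2. These parameters and the function name are illustrative assumptions, not part of the invention.

```python
# Sketch of splitting a memory address into Tag / Index / Block Offset.
# Geometry (64-byte blocks, 512 sets) is assumed for illustration only.

BLOCK_BYTES = 64   # block size -> 6 Block Offset bits
NUM_SETS = 512     # number of Cache Sets -> 9 Index bits

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1   # 6
INDEX_BITS = NUM_SETS.bit_length() - 1       # 9

def split_address(addr: int):
    """Return (tag, index, block_offset) for an access address."""
    block_offset = addr & (BLOCK_BYTES - 1)          # bits [5:0]
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)   # bits [14:6]
    tag = addr >> (OFFSET_BITS + INDEX_BITS)         # remaining high bits
    return tag, index, block_offset
```

The Index selects the Cache Set, the Tag disambiguates which block occupies a way of that set, and the Block Offset locates the data within the block.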
3. Memory access instruction
Memory access instructions are divided into load instructions and store instructions, both of which must access the cache. The formats of memory access instructions differ across instruction sets; in general, load instructions include the four types byte load (lb), half-word load (lh), word load (lw), and double-word load (ld), and store instructions include the four types byte store (sb), half-word store (sh), word store (sw), and double-word store (sd).
4. Pipeline stages of instructions
The pipeline stages of an instruction include:
(1) Instruction fetch, responsible for fetching instructions from the I-Cache, mainly comprises two parts: the I-Cache, which stores recently used instructions, and the branch predictor, which determines the PC value of the next instruction.
(2) Decode, which identifies the type of the instruction, the operands it requires, some of its control signals, and so on.
(3) Register renaming, which renames the logical registers defined by the instruction set to the physical registers used inside the processor. There are more physical registers than logical registers, and register renaming lets the processor schedule more instructions for parallel execution.
(4) Dispatch. At this stage, the renamed instructions are written, in the order specified by the program, into the Issue Queue, the reorder buffer (ROB), the load-order maintenance queue (Load Queue), the store-order maintenance queue (Store Queue), and so on.
The Issue Queue: most Functional Units (FUs) can execute instructions out of order. Before an instruction is sent to an FU for execution it is placed in a buffer, the issue queue; each FU corresponds to an issue queue. When an instruction enters the buffer its operands may not all be ready, so it waits there; as soon as all of its source operands are ready, it can be sent to the FU for execution regardless of its original position in program order. Superscalar processors rely on this to achieve higher performance, which requires the issue queue to support out-of-order instruction issue.
(5) Issue. After the dispatch stage of the pipeline, the instruction is written into the Issue Queue, which sends it to the FU for execution.
(6) Register file read. The issued instruction reads its operands from the Physical Register File (PRF).
(7) Execute. When the instruction has obtained its required operands, it is sent to the corresponding FU, which completes the computation according to the instruction type: arithmetic operations for arithmetic instructions, address calculation for memory access instructions, and so on.
(8) D-Cache access, which applies only to memory access instructions (mainly load/store instructions); other instruction types do nothing in this pipeline stage.
(9) Write-back. If the instruction has a destination register, the final execution result of the instruction is written into it.
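The out-of-order behavior of the issue queue described in stages (4) and (5) can be shown with a toy sketch: an instruction issues once all of its source operands are ready, regardless of program order. The class, function, and register names are hypothetical.

```python
# Toy issue-queue model: an instruction may issue as soon as all of its
# source operands are ready, irrespective of program order.

class Instr:
    def __init__(self, name, sources):
        self.name = name
        self.sources = sources   # physical registers the instruction reads

def issue_ready(queue, ready_regs):
    """Return names of queued instructions whose sources are all ready."""
    return [i.name for i in queue if all(s in ready_regs for s in i.sources)]

queue = [Instr("load  r1, 0(r2)", {"r2"}),
         Instr("add   r3, r1, r4", {"r1", "r4"}),   # waits on r1
         Instr("sub   r5, r6, r7", {"r6", "r7"})]

# r1 is not yet produced, so the younger sub issues ahead of the add
issued = issue_ready(queue, ready_regs={"r2", "r4", "r6", "r7"})
```

Here the sub issues ahead of the older add, which is exactly the out-of-order issue property the text attributes to superscalar processors.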
5. Multi-bank caching
The Multi-bank architecture, which divides a storage component into many small Banks, is widely used in processors and is shown in fig. 1.
As an example, each data block in the Cache may be divided into 8 independent Banks, each of which may be a 64-bit-wide SRAM. Referring to the schematic address diagram of Index and Block Offset shown in fig. 2, the Block Offset may be divided into Bank bits and Byte bits: after the Cache Line is found based on the Index, the corresponding Bank is found and accessed based on the Bank bits [5:3], and the specific byte is then found based on the Byte bits.
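With this layout (8 Banks of 8 bytes each), locating the Bank and the byte amounts to two bit-field extractions; a minimal sketch with a hypothetical helper name:

```python
# Sketch: locating the Bank and byte from the fig. 2 layout.
# Bank bits are address bits [5:3], Byte bits are [2:0] (assumed layout).

def bank_and_byte(addr: int):
    bank = (addr >> 3) & 0b111   # which of the 8 Banks
    byte = addr & 0b111          # which byte within the 64-bit Bank word
    return bank, byte
```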
The embodiment of the present invention provides an address storage and scheduling method for a transmission queue of a storage component, where the method may be applied to a terminal, a server, and/or other electronic devices with processing capability, and the present invention is not limited thereto.
The method will be described with reference to the flowchart, shown in fig. 3, of the address storage and scheduling method for the transmission queue of a storage component. The method comprises the following steps 301 to 306.
Step 301, determining a first access address of a memory access instruction each time a memory access instruction directed to a storage component is received.
The storage component may include a Cache, an SRAM, a Memory, and the like, which is not limited in this embodiment.
In a possible implementation manner, during the operation of the device, the memory access component in the device may receive a plurality of memory access instructions. For example, when a device performs a certain calculation task, a corresponding fetch instruction may be triggered to obtain data required for calculation. The embodiment does not limit the specific task of triggering the memory access instruction.
The address with which the processor accesses memory is divided into three parts: Tag, Index, and Block Offset. The Index is used to find a group of Cache Lines, i.e., a Cache Set, in the Cache; the Tags read out via the Index are compared with the Tag in the access address, and only an equal Tag indicates that the Cache Line is the desired one. One Cache Line stores several pieces of access data, and the Bank bits and Byte bits of the Block Offset part of the memory address locate the data word and the byte.
The memory access instruction may be a load instruction or a store instruction, and the device may calculate the first access address from the base address and the offset carried in the instruction, for example by adding them. The first access address may be a virtual address or a physical address.
As an example, fig. 4 shows a logic diagram based on the transmission queue. Each time a memory access instruction is triggered, the address calculation logic module in fig. 4 computes the corresponding first access address from the base address and the offset in the instruction.
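The address calculation itself reduces to an addition of base address and offset; a minimal sketch, assuming 64-bit wrap-around arithmetic and a hypothetical helper name:

```python
# Sketch of the address-calculation step: first access address = base + offset.
# A 64-bit wrap-around is assumed for illustration.

MASK64 = (1 << 64) - 1

def first_access_address(base: int, offset: int) -> int:
    """Compute the first access address of a load/store from its operands."""
    return (base + offset) & MASK64
```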
The processing of steps 302 to 304 will be described for a single memory access instruction.
Step 302, obtain the first intra-Block Offset of the first access address.
In one possible implementation, the first access address may include a Tag, an Index, and a Block Offset. The device may obtain the Tag, Index, and Block Offset of the first access address according to the address format with which the processor accesses the storage component. It should be noted that the numbers of bits of the Tag, Index, and Block Offset may differ across storage components and depend on the organization of the storage component.
Step 303, processing the first intra-Block Offset to obtain a second intra-Block Offset.
The processing may be hashing, i.e., the first intra-Block Offset is hashed to obtain the second intra-Block Offset. Specifically, the hashing may be an addition or an exclusive-OR. Of course, any other processing that dispersedly distributes the Block Offsets corresponding to a plurality of memory access instructions may also be adopted, which is not limited in this embodiment.
In one possible implementation, the Block Offset of the first access address may be processed. When there are a plurality of memory access instructions, this processing disperses their corresponding Block Offsets, i.e., the accessed locations, thereby reducing access conflicts.
Optionally, the processing of the Block Offset of the first access address may be processing all bits of the Block Offset, so that both the Bank bits and the Byte bits are changed.
Alternatively, the processing of the Block Offset of the first access address may be processing Bank bits in the Block Offset, so that banks accessed by multiple access instructions may be dispersed, thereby avoiding concentration on one or more banks, reducing queuing time for accessing the banks, and reducing time delay.
Optionally, the operation of processing the first intra-Block Offset may be as follows: obtaining an Index of the first access address, and processing the first intra-Block Offset based on the Index. On this basis, the processed second intra-Block Offset carries the Index information.
In one possible implementation, a combined array of the Index and the first intra-Block Offset may be determined, and the combined array may be processed to obtain the second intra-Block Offset. In order to reduce the processing delay, the specific processing performed on the first intra-Block Offset may be exclusive-or processing or addition processing.
Taking the processing of the Bank bits as an example, the combined array may be determined in the following two ways.
In the first method, a portion of the Index with the same number of bits as the Bank bits is selected, and this portion and the Bank bits are used as the combined array. For example, referring to the address diagram of Index and Block Offset shown in fig. 2, the Index is [14:6] and the Bank bits are [5:3], so 3 bits of the Index (e.g., the upper 3 bits [14:12], but not limited thereto) can be taken and exclusive-ored with the Bank bits to obtain new Bank bits, and the new Bank bits and the original Byte bits are combined to obtain the second intra-Block Offset.
In the second method, the Index is folded until its width is the same as that of the Bank bits, the folded Index is processed, and the folded and processed Index and the Bank bits are used as the combined array. For example, referring to the address diagram of Index and Block Offset shown in fig. 2, the Index may be folded into three groups [14:12], [11:9], and [8:6], and the three groups may be exclusive-ored to obtain a 3-bit result carrying the Index information; this result may then be exclusive-ored with the Bank bits to obtain new Bank bits, and the new Bank bits and the original Byte bits are combined to obtain the second intra-Block Offset. The specific folding manner may be adjusted according to the address structure of the storage component, and is not limited in this embodiment.
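The two combination methods may be sketched as follows; this is an illustrative model under the fig. 2 bit widths (9-bit Index, 3-bit Bank bits), not the actual scatter logic circuit:

```python
# Method one (assumed widths): XOR the Bank bits with an equal-width
# slice of the Index -- here its upper 3 bits.
def scatter_bank_method1(index: int, bank: int) -> int:
    return bank ^ ((index >> 6) & 0x7)

# Method two (assumed widths): fold the 9-bit Index into three 3-bit
# groups, XOR the groups together, then XOR the result with the Bank bits.
def scatter_bank_method2(index: int, bank: int) -> int:
    folded = (index & 0x7) ^ ((index >> 3) & 0x7) ^ ((index >> 6) & 0x7)
    return bank ^ folded
```

In both cases the new Bank bits replace the original ones while the Byte bits are left unchanged, giving the second intra-Block Offset.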
When all bits of the first intra-Block Offset are processed, the combined array may be determined in the same two ways, namely: in the first method, a portion of the Index with the same number of bits as the first intra-Block Offset is determined, and this portion and the first intra-Block Offset are used as the combined array; in the second method, the Index is folded until its width is the same as that of the first intra-Block Offset, the folded Index is processed, and the folded and processed Index and the first intra-Block Offset are used as the combined array.
Optionally, corresponding to the case that the first access address may be a virtual address or a physical address, the Index may be a virtual address Index or a physical address Index. Correspondingly, the first intra-Block Offset may be processed based on the virtual address Index, or based on the physical address Index.
Due to the ambiguity of virtual-to-physical address translation, the operation of processing the first intra-Block Offset based on the virtual address Index may include two cases.
In case one, the first intra-Block Offset is processed based on all bits of the virtual address Index. This case may be applicable to application scenarios where address ambiguity does not exist.
In case two, the first intra-Block Offset is processed based on the bits of the virtual address Index excluding the address ambiguity bits. This case may be applicable to application scenarios where address ambiguities may exist. At this time, the address ambiguity bits of the virtual address Index may also be stored to determine the corresponding virtual address.
As an example, referring to fig. 4, corresponding to the above-mentioned case of processing all bits of Block Offset, new Bank bits and Byte bits may be obtained by processing the first intra-Block Offset in the first access address based on the scatter logic module.
Alternatively, corresponding to the above-described case of processing the Bank bits, the Bank bits in the first intra-Block Offset are processed based on the scatter logic module to obtain new Bank bits.
Step 304, generating a second access address of the access instruction and storing the second access address in a transmission queue of the storage component.
Wherein the second access address comprises a second intra-Block Offset.
In one possible implementation, the Block Offset in the first access address may be updated to the second intra-Block Offset, so as to obtain the second access address of the access instruction. As an example, the second access address may include the above Tag, the Index, and the second intra-Block Offset.
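Under the same assumed fig. 2 layout, generating the second access address is then a recomposition of the unchanged Tag and Index with the second intra-Block Offset (a sketch for illustration only; the bit positions are assumptions):

```python
# Recompose Tag | Index | second intra-Block Offset in the assumed
# layout: Tag above bit 15, Index at [14:6], Block Offset at [5:0].
def make_second_address(tag: int, index: int, second_block_offset: int) -> int:
    return (tag << 15) | (index << 6) | second_block_offset
```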
Further, the second access address may be stored in the transmission queue corresponding to the storage component. Each storage component may have its own transmission queue, and the transmission queues of different storage components do not conflict with each other.
As an example, referring to fig. 4, the issue queue may have matching allocation and arbitration circuits. For example, when a memory access instruction is dispatched to the issue queue, the allocation circuit may determine the entry in the issue queue corresponding to the memory access instruction, and the first access address of the memory access instruction together with the new Bank bits and Byte bits obtained by the scatter logic module (or, in the Bank-bits-only case, together with the new Bank bits) may be allocated to that entry and stored in the corresponding fields. At this time, all the information of the second access address is stored in the transmission queue, which is equivalent to storing the second access address.
Because the timing before an instruction enters the transmission queue is not critical, the first access address can be obtained, and the processing of the first intra-Block Offset performed, before the instruction enters the transmission queue.
Step 305, respectively determining the current second access addresses to be transmitted in the transmission queues of the plurality of storage components.
In one possible implementation, for each storage component, the second access address currently to be transmitted may be determined in its transmission queue in each clock cycle.
Optionally, when second access addresses of multiple access instructions exist in the transmission queue, access addresses that do not conflict with each other may be selected from the multiple second access addresses as the second access addresses to be transmitted. On this basis, it can be ensured that the second access addresses transmitted to each storage component in one clock cycle do not conflict.
Specifically, among the plurality of second access addresses in the transmission queue, access addresses with mutually different Bank bits may be selected as the non-conflicting access addresses.
As an example, referring to fig. 4, the selection may be implemented by the arbitration circuit, which selects the entries to dequeue from the transmission queue according to their different Bank bits, and generates the second access address to be transmitted based on the corresponding first access address, the new Bank bits, and the Byte bits, or based on the corresponding first access address and the new Bank bits.
On this basis, the Banks accessed by the second access addresses transmitted in each clock cycle are guaranteed to be different, and after the processing, the number of second access addresses transmitted per clock cycle can be increased, thereby improving the memory access efficiency. For example, before the memory access method provided by the invention is adopted, a plurality of memory access instructions may access Bank0 and Bank1 of the Cache in a concentrated manner, so that only 2 memory access instructions are serviced per clock cycle; after the method is adopted, the memory access instructions can access Bank0-Bank7 of the Cache in a dispersed manner, so that 8 memory access instructions can be serviced per clock cycle, improving the memory access efficiency.
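A behavioral sketch of this arbitration (an assumption for illustration, not the actual arbitration circuit) is selecting, from the pending entries of one transmission queue, accesses whose Bank bits are pairwise different, so that all of them can be transmitted in the same clock cycle:

```python
# Given the Bank bits of the pending entries (in queue order), return the
# indices of the entries that may issue together: at most one per Bank.
def pick_conflict_free(pending_bank_bits):
    chosen, used_banks = [], set()
    for i, bank in enumerate(pending_bank_bits):
        if bank not in used_banks:  # first pending access wins each Bank
            used_banks.add(bank)
            chosen.append(i)
    return chosen
```

For instance, if every pending entry targets the same Bank, only one issues per cycle; if the entries cover all eight Banks, all eight issue together.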
Step 306, transmitting the second access address to the corresponding storage component, and executing the memory access instruction, wherein executing the memory access instruction comprises accessing the corresponding Bank based on the second intra-Block Offset.
Taking access to the Cache as an example, multiple Cache lines may be read from the Cache based on the Index of the second access address, and the hit target Cache Line may be found among them based on the Tag of the second access address. The corresponding Bank is then accessed according to the Bank bits in the Block Offset of the second access address (namely, the second intra-Block Offset), and the corresponding position in the target Cache Line is found according to the Byte bits of the Block Offset to perform the memory access operation.
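This lookup order can be modeled in miniature as follows; a direct-mapped toy Cache (one line per Index) and the fig. 2 bit widths are assumptions for illustration:

```python
# Toy model of the described Cache access: Index selects the line,
# the Tag compare decides hit/miss, and the second intra-Block Offset
# selects the Bank and the Byte position within the hit Cache Line.
def cache_read(cache, tag, index, block_offset):
    line = cache.get(index)
    if line is None or line["tag"] != tag:   # Tag compare: miss
        return None
    bank = (block_offset >> 3) & 0x7         # Bank bits [5:3]
    byte = block_offset & 0x7                # Byte bits [2:0]
    return line["banks"][bank][byte]         # access the selected Bank
```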
In the embodiment of the invention, when an access instruction is received, a first access address of the access instruction can be determined, the Block Offset in the first access address is processed to obtain a second access address, and the second access address is stored in a transmission queue, wherein the processing enables the second intra-Block Offsets of a plurality of access instructions to be distributed in a dispersed manner. Thereafter, the second access addresses can be transmitted from the transmission queue to the respective storage components, so that the Banks accessed in each storage component are dispersed, the probability of access conflict is reduced, and the overall processing efficiency of the system is improved.
The invention also provides another embodiment of an address storing and scheduling method of the transmission queue of the storage component. The method will be described with reference to the flowchart of the address storing and scheduling method of the transmission queue of the storage unit shown in fig. 5. The method comprises the following steps 501-507.
Step 501, every time a memory access instruction for a memory component is received, a first access address of the memory access instruction is determined. The implementation of step 501 is the same as step 301, and is not described herein again.
Step 502, obtain Tag, first Index and first intra-Block Offset of the first access address. The implementation of step 502 is the same as step 302, and is not described herein again.
Step 503, processing the first Index to obtain a second Index.
In one possible implementation, the Index of the first access address may be processed. When a plurality of access instructions exist, the processing can enable indexes corresponding to the plurality of access instructions to be dispersed, namely corresponding storage positions to be dispersed, and storage conflicts are reduced.
Optionally, the operation of processing the first Index may be as follows: processing the first Index based on the Tag. On this basis, the processed second Index carries the information of the Tag. Because the Tags of different data are necessarily different, the invention reflects the information of the Tag in the Index; changing the Index in this way can disperse the data storage positions, so that the probability of generating a conflict is lower, thereby improving the processing efficiency.
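One way the Tag information could be reflected in the Index is the same folding-and-XOR scheme described above for the Block Offset; the following is a hedged sketch only, assuming a 9-bit Index, and is not the actual circuit:

```python
# Hypothetical sketch of step 503: fold the Tag down to the Index width
# and XOR it into the first Index, so the second Index carries Tag
# information; a 9-bit Index is an assumption for illustration.
def scatter_index(tag: int, first_index: int) -> int:
    folded = 0
    while tag:
        folded ^= tag & 0x1FF  # XOR in the next 9-bit chunk of the Tag
        tag >>= 9
    return first_index ^ folded
```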
For the specific processing of the first Index, reference may be made to the implementation manner of step 303, which is not described in detail in this embodiment.
Step 504, processing the first intra-Block Offset to obtain the second intra-Block Offset. The implementation of step 504 is the same as that of step 303 described above.
In some embodiments, when processing the first intra-Block Offset based on the Index, the Index used may be the second Index described above. On this basis, the Banks accessed by a plurality of access instructions can be further dispersed, and the probability of access conflict further reduced.
Step 505, generating a second access address of the access instruction and storing the second access address in a transmission queue of the storage component. The second access address may include the Tag, the second Index, and the second intra-Block Offset.
Step 506, in the transmission queues of the plurality of storage components, the second access address to be transmitted currently is determined respectively.
Step 507, transmitting the second access address to the corresponding storage component, and executing the access instruction.
The implementation of steps 505 to 507 is the same as that of steps 304 to 306, and is not described herein again.
In the embodiment of the invention, when a memory access instruction is received, a first access address of the memory access instruction can be determined, the Index and the Block Offset in the first access address are processed to obtain a second access address, and the second access address is stored in a transmission queue, wherein the processing enables the second intra-Block Offsets of a plurality of memory access instructions to be distributed in a dispersed manner. Thereafter, the second access addresses can be transmitted from the transmission queue to the respective storage components, so that the positions of data storage are dispersed and the accessible Banks are dispersed, reducing the probability of storage conflict and access conflict, and further improving the overall processing efficiency of the system.
The embodiment of the invention provides an address storage and scheduling method for a transmission queue of a storage unit in an instruction pipeline stage.
Referring to fig. 6, a flowchart of an address storing and scheduling method for a issue queue of a memory unit in an instruction pipeline stage will be described by taking an example of a process of executing a memory access instruction in the instruction pipeline stage.
Step 601, taking out the instruction to be executed from the I-Cache. This step corresponds to the fetch stage of the instruction pipeline stage.
Step 602, decode the instruction to obtain the access instruction. This step corresponds to the decode stage of the instruction pipeline stage.
In one possible implementation, after the instruction is decoded, the type of the instruction is determined, and the operands required by the instruction, some control signals for the instruction, and so on are identified. In this embodiment, the decoded instruction is a memory access instruction, i.e., a load instruction or a store instruction.
Step 603, rename the logical register of the access instruction to a physical register. This step corresponds to the register renaming stage of the instruction pipeline stage.
Step 604, determining the access address of the access instruction, and allocating the access address to a transmission queue. This step corresponds to the Dispatch stage of the instruction pipeline. The implementation of step 604 is the same as that of steps 301-304 or 501-505, and is not described herein again.
In step 605, the access instruction is sent to the FU (Function Unit) through the issue queue for execution. This step corresponds to the issue stage of the instruction pipeline stage. The implementation of step 605 is the same as that of step 305 or step 506, and is not described herein again.
In step 606, corresponding operands are read from a Physical Register File (PRF) according to the memory access instruction. This step corresponds to the read register file stage of the instruction pipeline stage.
Step 607, according to the type and operand of the access instruction, the calculation task is executed in the FU. This step corresponds to the execution stage of the instruction pipeline stage. For example, arithmetic operations may be performed on arithmetic type instructions. Since the address calculation for the memory access instruction is completed in the dispatch stage, repeated execution is not required at this stage.
And step 608, accessing the corresponding storage part according to the access address. This step corresponds to the D-Cache stage of access of the instruction pipeline stage. The D-Cache is the storage component and can be Cache, SRAM, Memory and the like. The implementation of step 608 is similar to that of step 306 or step 507, and is not described herein again.
And step 609, writing the final result of the access instruction into a destination register. This step corresponds to the write back stage of the instruction pipeline stage.
In the embodiment of the invention, when an access instruction to a storage component is received, a first access address of the access instruction can be determined, the Block Offset in the first access address is processed to obtain a second access address, and the second access address is stored in a transmission queue, wherein the processing enables the second intra-Block Offsets of a plurality of access instructions to be distributed in a dispersed manner. Thereafter, the second access addresses can be transmitted from the transmission queue to the respective storage components, so that the accessible Banks are dispersed, the probability of access conflict is reduced, and the processing efficiency is improved.
Because the timing before an instruction enters the transmission queue is not critical, the first access address can be obtained, and the processing of the first intra-Block Offset performed, before the instruction enters the transmission queue.
The embodiment of the invention provides an address storage and scheduling apparatus for a transmission queue of a storage component, which is used for implementing the above address storage and scheduling method for a transmission queue of a storage component. As shown in fig. 7, the address storage and scheduling apparatus 700 for a transmission queue of a storage component includes: a determining module 701, an obtaining module 702, a dispersing module 703, and a memory access module 704.
A determining module 701, configured to determine, each time a memory access instruction to a memory component is received, a first access address of the memory access instruction;
an obtaining module 702, configured to obtain a first intra-Block Offset of the first access address;
a dispersing module 703, configured to process the first intra-Block Offset to obtain a second intra-Block Offset, and to generate a second access address of the memory access instruction and store the second access address in a transmission queue of the storage component, wherein the second access address comprises the second intra-Block Offset;
the memory access module 704 is configured to determine second access addresses to be currently transmitted in the transmission queues of the plurality of storage components, respectively; and transmitting the second access address to a corresponding storage component, and executing the memory access instruction, wherein the memory access instruction comprises accessing a corresponding cache Bank based on the second intra-Block Offset.
Optionally, the dispersion module 703 is configured to:
obtaining an Index of the first access address;
and processing the first intra-Block Offset based on Index.
Optionally, the Index is a virtual address Index or a physical address Index;
the dispersion module 703 is configured to:
processing the first intra-Block Offset based on the virtual address Index; or
And processing the first intra-Block Offset based on the physical address Index.
Optionally, the dispersion module 703 is configured to:
processing the first intra-Block Offset based on all bits of the virtual address Index; or
and processing the first intra-Block Offset based on the bits of the virtual address Index excluding the address ambiguity bits.
Optionally, the memory access module 704 is configured to:
in the transmission queue of each storage component, when second access addresses of a plurality of access instructions exist, obtain, from the plurality of second access addresses, the second access addresses that do not conflict in access, as the second access addresses to be transmitted.
Optionally, the memory access module 704 is configured to:
select, from the plurality of second access addresses, access addresses with mutually different Bank bits.
Optionally, the dispersing module 703 is configured to: perform hash processing on the first intra-Block Offset.
Optionally, the dispersing module 703 is configured to: perform addition processing or exclusive-or processing on the first intra-Block Offset.
In the embodiment of the invention, when an access instruction to a storage component is received, a first access address of the access instruction can be determined, the Block Offset in the first access address is processed to obtain a second access address, and the second access address is stored in a transmission queue, wherein the processing enables the second intra-Block Offsets of a plurality of access instructions to be distributed in a dispersed manner. Thereafter, the second access addresses can be transmitted from the transmission queue to the respective storage components, so that the accessible Banks are dispersed, the probability of access conflict is reduced, and the processing efficiency is improved.
An exemplary embodiment of the present invention also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor, the computer program, when executed by the at least one processor, is for causing the electronic device to perform a method according to an embodiment of the invention.
Exemplary embodiments of the present invention also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is operable to cause the computer to perform a method according to an embodiment of the present invention.
The exemplary embodiments of the invention further provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is adapted to cause the computer to carry out the method according to the embodiments of the invention.
Referring to fig. 8, a block diagram of the structure of an electronic device 800 will now be described. The electronic device 800 may be a server or a client of the present invention, and is an example of a hardware device to which aspects of the present invention may be applied. Electronic devices are intended to represent various forms of digital electronic computer devices, such as data center servers, notebook computers, thin clients, laptop computers, desktop computers, workstations, personal digital assistants, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the device 800 can also be stored. The calculation unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
A number of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, an output unit 807, a storage unit 808, and a communication unit 809. The input unit 806 may be any type of device capable of inputting information to the electronic device 800, and the input unit 806 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. Output unit 807 can be any type of device capable of presenting information and can include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 808 may include, but is not limited to, a magnetic disk, an optical disk. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as bluetooth devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
Computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The calculation unit 801 executes the respective methods and processes described above. For example, in some embodiments, the address storage and scheduling method of the transmit queue of the storage component may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. In some embodiments, the computing unit 801 may be configured to perform the address storage and scheduling method of the transmit queue of storage components by any other suitable means (e.g., by means of firmware).
Program code for implementing the methods of the present invention may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims (10)

1. A method for storing and scheduling addresses of a transmit queue of a memory component, the method comprising:
determining a first access address of a memory access instruction every time the memory access instruction to a memory component is received;
acquiring a first intra-Block Offset of the first access address;
processing the first intra-Block Offset to obtain a second intra-Block Offset, so that the second intra-Block Offsets of a plurality of access instructions are distributed in a dispersed manner;
generating a second access address of the memory access instruction and storing the second access address in a transmission queue of the storage component, wherein the second access address comprises the second intra-Block Offset;
in the transmission queue of each storage component, when second access addresses of a plurality of access instructions exist, selecting access addresses with mutually different Bank bits from the plurality of second access addresses as the second access addresses to be transmitted;
and transmitting the second access address to the corresponding storage component and executing the memory access instruction, wherein executing the memory access instruction comprises accessing a corresponding cache Bank based on the second intra-Block Offset.
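The selection step of claim 1 can be illustrated with a minimal Python sketch. The address layout here (64-byte blocks with the two Bank-select bits at the top of the intra-Block Offset) and the function name are illustrative assumptions, not values fixed by the claims:

```python
def select_conflict_free(pending: list[int], bank_bits: int = 2,
                         block_bits: int = 6) -> list[int]:
    """From the second access addresses pending in one storage component's
    transmission queue, pick addresses whose Bank bits all differ, so the
    picked group can access distinct cache Banks in parallel.
    Field widths are assumed: 6-bit intra-Block Offset, top 2 bits = Bank."""
    shift = block_bits - bank_bits
    picked, used_banks = [], set()
    for addr in pending:
        bank = (addr >> shift) & ((1 << bank_bits) - 1)
        if bank not in used_banks:        # skip addresses that would conflict
            used_banks.add(bank)
            picked.append(addr)
    return picked
```

For example, among pending addresses whose Bank bits are 0, 1, 1, 2, the second address with Bank bit 1 is held back so the issued group touches three distinct Banks.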
2. The method of claim 1, wherein the processing the first intra-Block Offset comprises:
obtaining an Index of the first access address;
and processing the first intra-Block Offset based on the Index.
3. The method of claim 2, wherein the Index is a virtual address Index or a physical address Index;
the processing the first intra-Block Offset based on the Index includes:
processing the first intra-Block Offset based on the virtual address Index; or
processing the first intra-Block Offset based on the physical address Index.
4. The method according to claim 3, wherein the processing the first intra-Block Offset based on the virtual address Index comprises:
processing the first intra-Block Offset based on all bits of the virtual address Index; or
processing the first intra-Block Offset based on the bits of the virtual address Index other than the address ambiguity bits.
5. The method of claim 1, wherein the processing the first intra-Block Offset comprises:
performing hash processing on the first intra-Block Offset.
6. The method of claim 5, wherein the performing hash processing on the first intra-Block Offset comprises: performing addition processing or exclusive-OR processing on the first intra-Block Offset.
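The two hash options named in claim 6 can be sketched as follows. The 6-bit offset width, the choice of Index bits as the hash operand, and the function name are illustrative assumptions:

```python
def hash_offset(offset: int, index: int, width: int = 6,
                use_xor: bool = True) -> int:
    """Hash the first intra-Block Offset with low Index bits, by either
    exclusive-OR or addition (the two options of claim 6). Both variants
    keep the result within the block (mod 2**width), and different Index
    values steer identical offsets to different second offsets."""
    mask = (1 << width) - 1
    if use_xor:
        return (offset ^ index) & mask     # exclusive-OR processing
    return (offset + index) & mask         # addition processing, wraps in-block
```

Either variant is a bijection of the offset for a fixed Index, so no two distinct first offsets in the same block collide after hashing.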
7. An address storage and scheduling method for a transmission queue of storage components in an instruction pipeline, the method comprising:
fetching an instruction to be executed from the instruction Cache (I-Cache);
decoding the instruction to obtain a memory access instruction;
renaming a logical register of the memory access instruction to a physical register;
determining a first access address of the access instruction;
acquiring a first intra-Block Offset of the first access address;
processing the first intra-Block Offset to obtain a second intra-Block Offset, so that the second intra-Block Offsets of a plurality of memory access instructions are distributed in a dispersed manner;
generating a second access address of the memory access instruction and storing the second access address in the transmission queue of the corresponding storage component, wherein the second access address comprises the second intra-Block Offset;
issuing the memory access instruction from the transmission queue to a functional unit (FU) for execution, which comprises: in the transmission queue of each storage component, when second access addresses of a plurality of memory access instructions exist, selecting access addresses having different Bank bits from among the plurality of second access addresses as the second access addresses to be transmitted;
reading corresponding operands from a physical register file according to the access instruction;
executing a calculation task in the FU according to the type and operands of the memory access instruction;
transmitting the second access address to the corresponding storage component and executing the memory access instruction, wherein executing the memory access instruction comprises accessing a corresponding cache Bank based on the second intra-Block Offset;
and writing the final result of the memory access instruction into a destination register.
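The address-generation and offset-transform stages above can be combined into one end-to-end sketch showing the dispersal effect. All field widths (64-byte block, 2 Bank bits, XOR with the low Index bits) are illustrative assumptions rather than values fixed by the claims:

```python
def second_address(addr: int, block_bits: int = 6, bank_bits: int = 2) -> int:
    """Derive the second access address from a first access address:
    XOR the Bank-select bits of the intra-Block Offset with the low bits
    of the Index, leaving the rest of the address untouched."""
    off_mask = (1 << block_bits) - 1
    offset = addr & off_mask                           # first intra-Block Offset
    index = (addr >> block_bits) & ((1 << bank_bits) - 1)
    shift = block_bits - bank_bits
    bank = ((offset >> shift) ^ index) & ((1 << bank_bits) - 1)
    new_offset = (bank << shift) | (offset & ((1 << shift) - 1))
    return (addr & ~off_mask) | new_offset             # second access address

# Strided accesses that all hit Bank 0 before the transform (offset 0 in
# every block) are spread across all four Banks afterwards:
addrs = [i * 64 for i in range(4)]
banks = [(second_address(a) >> 4) & 0b11 for a in addrs]
```

Running this yields one distinct Bank per access, which is exactly the condition the transmission-queue scheduler exploits when picking conflict-free addresses.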
8. An address storage and scheduling apparatus for a transmission queue of storage components, the apparatus comprising:
a determining module, configured to determine, each time a memory access instruction to a storage component is received, a first access address of the memory access instruction;
an obtaining module, configured to obtain a first intra-Block Offset of the first access address;
a dispersion module, configured to process the first intra-Block Offset to obtain a second intra-Block Offset, so that the second intra-Block Offsets of a plurality of memory access instructions are distributed in a dispersed manner; and to generate a second access address of the memory access instruction and store the second access address in the transmission queue of the storage component, wherein the second access address comprises the second intra-Block Offset;
a memory access module, configured to, when second access addresses of a plurality of memory access instructions exist in the transmission queue of each storage component, select access addresses having different Bank bits from among the plurality of second access addresses as the second access addresses to be transmitted; and to transmit the second access address to the corresponding storage component and execute the memory access instruction, wherein executing the memory access instruction comprises accessing a corresponding cache Bank based on the second intra-Block Offset.
9. An electronic device, comprising:
a processor; and
a memory storing a program,
wherein the program comprises instructions which, when executed by the processor, cause the processor to carry out the method according to any one of claims 1-7.
10. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202210849915.4A 2022-07-20 2022-07-20 Address storage and scheduling method and device for transmission queue of storage component Active CN114924794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210849915.4A CN114924794B (en) 2022-07-20 2022-07-20 Address storage and scheduling method and device for transmission queue of storage component


Publications (2)

Publication Number Publication Date
CN114924794A (en) 2022-08-19
CN114924794B (en) 2022-09-23

Family

ID=82816180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210849915.4A Active CN114924794B (en) 2022-07-20 2022-07-20 Address storage and scheduling method and device for transmission queue of storage component

Country Status (1)

Country Link
CN (1) CN114924794B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115563027B (en) * 2022-11-22 2023-05-12 北京微核芯科技有限公司 Method, system and device for executing stock instruction

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950017A (en) * 2019-05-14 2020-11-17 龙芯中科技术有限公司 Memory data protection method, device, equipment and storage medium
CN113656330A (en) * 2021-10-20 2021-11-16 北京微核芯科技有限公司 Method and device for determining access address
CN113900966A (en) * 2021-11-16 2022-01-07 北京微核芯科技有限公司 Access method and device based on Cache

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11442864B2 (en) * 2020-06-29 2022-09-13 Marvell Asia Pte, Ltd. Managing prefetch requests based on stream information for previously recognized streams




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant