CN116909755B - Access method, processor, electronic device and readable storage medium - Google Patents

Access method, processor, electronic device and readable storage medium


Publication number
CN116909755B
Authority
CN
China
Prior art keywords
value
index
parameter
micro
access instruction
Prior art date
Legal status
Active
Application number
CN202311176283.0A
Other languages
Chinese (zh)
Other versions
CN116909755A (en)
Inventor
马建露
王华强
王凯帆
陈键
唐丹
包云岗
Current Assignee
Beijing Open Source Chip Research Institute
Original Assignee
Beijing Open Source Chip Research Institute
Priority date
Filing date
Publication date
Application filed by Beijing Open Source Chip Research Institute
Priority to CN202311176283.0A
Publication of CN116909755A
Application granted
Publication of CN116909755B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44521Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading

Abstract

The embodiment of the invention provides a memory access method, a processor, an electronic device and a readable storage medium, relating to the field of computer technology. The method comprises the following steps: acquiring an index access instruction to be executed, together with a first parameter and a second parameter of the index access instruction; determining a first value N1 corresponding to the first parameter and a second value N2 corresponding to the second parameter according to a preset mapping rule; splitting the index access instruction into at least one micro-operation according to the first value and the second value; splitting each micro-operation at element granularity to obtain the sub-operations corresponding to the micro-operation; determining an element index value of the index access instruction according to the first value, the second value, and each sub-operation corresponding to the micro-operation; and executing the memory access operation based on the element index value. The embodiment of the invention can reduce selection complexity and helps improve the memory access performance of the processor.

Description

Access method, processor, electronic device and readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a memory access method, a processor, an electronic device, and a readable storage medium.
Background
An index access instruction, such as a load/store index instruction, is defined in the vector extension instruction set manual. For a load index instruction, the base address is stored in scalar register rs1, its index values are stored in vector register vs2, and its destination register vd is also a vector register. The number of vs2 vector registers used by the instruction is specified by emul, and the width of the index data read from the vs2 registers for each address calculation is specified by eew; the number of vd vector registers used is specified by lmul, and the width of each write is specified by sew.
For a store index instruction, the base address is likewise stored in scalar register rs1, its index values are stored in vector register vs2, and the source data is stored in the vs3 vector registers. The number of vs2 vector registers used is specified by emul, and eew-width data is read from the vs2 registers each time a memory address is calculated; the number of vs3 vector registers used is specified by lmul, and each access reads sew-width data from the vs3 registers to store at the computed address.
For both instructions, emul and lmul may each take the following 7 values: 1/8, 1/4, 1/2, 1, 2, 4, 8; eew and sew may each take 4 values: 1 Byte, 2 Byte, 4 Byte, 8 Byte.
A load/store index instruction may select any of the legal emul and lmul values listed above. Taking the store index instruction as an example, two limit cases may occur: lmul = 1 and emul = 8, i.e. the instruction uses 1 vs3 register and 8 vs2 registers at the same time; or lmul = 8 and emul = 1, i.e. the instruction uses 8 vs3 registers and 1 vs2 register at the same time. The same applies to the load index instruction. From the micro-architecture implementation point of view, based on timing and area considerations, it is impractical to send the data of 9 vector registers at once.
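The register pressure in these limit cases can be checked with a small arithmetic sketch (the helper name `regs_used` is an assumption for illustration; it counts one register per fractional emul/lmul group, matching the behavior described later in the description):

```python
import math

def regs_used(emul, lmul):
    """Vector registers touched by a store index instruction:
    vs2 registers (emul, at least one even for fractional values)
    plus vs3 registers (lmul, likewise at least one)."""
    return max(math.ceil(emul), 1) + max(math.ceil(lmul), 1)

# Both limit cases described above touch 9 vector registers at once:
#   regs_used(8, 1) and regs_used(1, 8) each evaluate to 9.
```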
Disclosure of Invention
The embodiment of the invention provides a memory access method, a processor, electronic equipment and a readable storage medium, which can reduce the complexity of selecting an index memory access instruction and improve the memory access performance of the processor.
In order to solve the above problems, an embodiment of the present invention discloses a memory access method, which is applied to a processor, and the method includes:
acquiring an index access instruction to be executed and a first parameter and a second parameter of the index access instruction; the first parameter is used for indicating the number of vector registers storing address offset values, and the second parameter is used for indicating the number of vector registers storing data;
determining a first numerical value N1 corresponding to the first parameter and a second numerical value N2 corresponding to the second parameter according to a preset mapping rule, wherein both the N1 and the N2 are greater than or equal to 1; the mapping rule is used for mapping a first parameter or a second parameter with the value smaller than or equal to 1;
splitting the index access instruction into at least one micro-operation according to the first value and the second value;
splitting the micro-operation with the element as granularity to obtain sub-operation corresponding to the micro-operation;
determining an element index value of the index access instruction according to the first numerical value, the second numerical value and each sub-operation corresponding to the micro-operation;
and executing the memory access operation based on the element index value.
In another aspect, the embodiment of the invention discloses a processor, which comprises a processor back end, an issue queue and a memory access module;
the back end of the processor is used for acquiring an index access instruction to be executed and first parameters and second parameters of the index access instruction; the first parameter is used for indicating the number of vector registers storing address offset values, and the second parameter is used for indicating the number of vector registers storing data; determining a first numerical value N1 corresponding to the first parameter and a second numerical value N2 corresponding to the second parameter according to a preset mapping rule, wherein both the N1 and the N2 are greater than or equal to 1; the mapping rule is used for mapping a first parameter or a second parameter with the value smaller than or equal to 1; and splitting the index access instruction into at least one micro-operation according to the first value and the second value;
the issue queue is used for splitting the micro-operation at element granularity to obtain the sub-operations corresponding to the micro-operation, and determining an element index value of the index access instruction according to the first value, the second value and each sub-operation corresponding to the micro-operation;
the memory access module is used for executing memory access operation based on the element index value.
In still another aspect, the embodiment of the invention also discloses an electronic device, which comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is used for storing executable instructions, and the executable instructions enable the processor to execute the memory access method.
The embodiment of the invention also discloses a readable storage medium, which enables the electronic equipment to execute the memory access method when the instructions in the readable storage medium are executed by the processor of the electronic equipment.
The embodiment of the invention has the following advantages:
According to the memory access method provided by the embodiment of the invention, the first parameter emul is mapped to a first value N1 and the second parameter lmul is mapped to a second value N2 according to a preset mapping rule; the index access instruction is then split into at least one micro-operation according to the first and second values; each micro-operation is further split into sub-operations at element granularity; the element index value of the index access instruction is determined according to the first value, the second value and each sub-operation of the index access instruction; and finally the memory access operation is executed based on the element index value. By using the preset mapping rule to merge the values of the first parameter emul and the second parameter lmul, the original 7 values are reduced to 4, and the original 49 combinations are reduced to 16, which lowers the selection complexity and helps improve the memory access performance of the processor.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of an embodiment of a memory access method of the present invention;
FIG. 2 is a schematic diagram of a processor architecture according to the present invention;
FIG. 3 is a schematic diagram of another processor architecture of the present invention;
FIG. 4 is a schematic diagram of a processor architecture according to yet another embodiment of the present invention;
FIG. 5 is a block diagram of an electronic device for memory access provided by an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present invention may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type, and are not limited to the number of objects, such as the first object may be one or more. Furthermore, the term "and/or" as used in the specification and claims to describe an association of associated objects means that there may be three relationships, e.g., a and/or B, may mean: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship. The term "plurality" in embodiments of the present invention means two or more, and other adjectives are similar.
Method embodiment
Referring to fig. 1, a step flow diagram of an embodiment of a memory access method of the present invention is shown, where the method specifically may include the following steps:
Step 101, acquiring an index access instruction to be executed and a first parameter and a second parameter of the index access instruction;
step 102, determining a first numerical value N1 corresponding to the first parameter and a second numerical value N2 corresponding to the second parameter according to a preset mapping rule, wherein both the N1 and the N2 are greater than or equal to 1; the mapping rule is used for mapping a first parameter or a second parameter with the value smaller than or equal to 1;
step 103, splitting the index access instruction into at least one micro operation according to the first value and the second value;
step 104, splitting the micro-operation with element as granularity to obtain sub-operation corresponding to the micro-operation;
step 105, determining an element index value of the index access instruction according to the first value, the second value and each sub-operation corresponding to the micro-operation;
and 106, executing memory access operation based on the element index value.
Wherein, the first parameter emul is used for indicating the number of vector registers storing address offset values; the second parameter lmul is used for indicating the number of vector registers for storing data.
The memory access method provided by the embodiment of the invention can be applied to a processor. Referring to fig. 2, a schematic architecture diagram of a processor according to an embodiment of the present invention is shown. As shown in fig. 2, a processor 200 provided in an embodiment of the present invention includes a processor back end, an issue queue (FlowQueue), and a memory access module. The issue queue handles vector memory access operations.
The back end of the processor can decode and split the index access instruction. The index access instruction is used for executing access operation according to the index value. The index access instruction may be provided by the processor front end.
It may be appreciated that the index access instruction in the embodiment of the present invention may be a load/store index type instruction. As known from the vector instruction set extension manual, for a load index instruction, the base address of the instruction is stored in scalar register rs1, the index values are stored in vector register vs2, and the destination register vd is also a vector register; the number of vs2 vector registers used by the instruction is specified by emul, and the width of the address offset value read from the vs2 registers for each address calculation is specified by eew; the number of vd vector registers used is specified by lmul, and the width of each write is specified by sew. For a store index instruction, the base address is likewise stored in scalar register rs1, the index values are stored in vector register vs2, and the source data is stored in the vs3 vector registers; the number of vs2 vector registers used is specified by emul, and eew-width data is read from the vs2 registers each time a memory address is calculated; the number of vs3 vector registers used is specified by lmul, and each access reads sew-width data from the vs3 registers to store at the specified address.
The values of emul and lmul may occur as follows: 1/8, 1/4, 1/2, 1, 2, 4, 8. When emul or lmul is smaller than 1, the processor back end still sends the data to be loaded at the granularity of one vector register, or reads the data to be stored at the granularity of one vector register. Thus, in the embodiment of the present invention, values of emul and lmul less than 1 may both be mapped to 1, as shown in Table 1:

Table 1
emul / lmul:        1/8  1/4  1/2  1  2  4  8
emulNum / lmulNum:  1    1    1    1  2  4  8
after the first parameter emul and the second parameter lmul are mapped in a merging manner as shown in table 1, the index access instruction is split into N1 or N2 micro-operations according to the first value (emulNum) or the second value (lmulNum) after the merging and mapping.
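The merge mapping of Table 1 can be sketched as follows (a minimal illustration; the function name `map_mul` is an assumption, not taken from the patent):

```python
from fractions import Fraction

# Legal emul/lmul values from the vector extension manual.
LEGAL_VALUES = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
                Fraction(1), Fraction(2), Fraction(4), Fraction(8)]

def map_mul(value):
    """Merge-map an emul/lmul value to emulNum/lmulNum (Table 1):
    values below 1 collapse to 1; values >= 1 map to themselves."""
    value = Fraction(value)
    assert value in LEGAL_VALUES, "illegal emul/lmul value"
    return max(1, int(value))

# The 7 original values collapse to the 4 distinct counts {1, 2, 4, 8}.
```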
From the micro-architecture design point of view, for a load index instruction, the back end of the processor can only send data of one vs2 vector register and one vd vector register at most at a time; for a store index instruction, the processor back-end can only send data for one vs2 vector register and one vs3 vector register at a time at most.
Based on this, in the memory access method provided by the embodiment of the present invention, the index memory access instruction is split into a plurality of micro operations (uops) based on the number of registers indicated by the first parameter emul and the second parameter lmul, and each micro operation is used for accessing and storing data corresponding to one vector register. For example, mapping the value of the first parameter emul according to table 1 to obtain a first value N1, splitting the index memory access instruction into N1 micro-operations; or, the value of the second parameter lmul is mapped according to table 1 to obtain a second value N2, and the index memory access instruction is split into N2 micro-operations.
Optionally, the splitting the index access instruction into at least one micro-operation according to the first value and the second value includes:
splitting the index memory access instruction into N2 micro-operations if the first value is less than or equal to the second value;
and splitting the index access instruction into N1 micro-operations under the condition that the first value is larger than the second value.
Specifically, if the first value (emulNum) is less than or equal to the second value (lmulNum), splitting the index memory access instruction into N2 micro-operations; if the first value is greater than the second value, the index access instruction is split into N1 micro-operations. Wherein N1 is a value obtained by mapping the value of the first parameter emul according to table 1, i.e. the first value; n2 is a value obtained by mapping the value of the second parameter lmul according to table 1, i.e. the second value.
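The two branches above reduce to taking the larger of the two mapped counts; a sketch (parameter names assumed):

```python
def num_uops(emul_num, lmul_num):
    """Number of micro-operations an index access instruction is split
    into: N2 when N1 <= N2, otherwise N1 (i.e. max(N1, N2))."""
    return lmul_num if emul_num <= lmul_num else emul_num
```

For example, the limit cases from the Background (emulNum = 8, lmulNum = 1 or vice versa) each split into 8 micro-operations, so at most two vector registers need to be sent per micro-operation.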
For example, assume an index access instruction with emul = 1/4 and lmul = 1/2. After the first parameter emul and the second parameter lmul are merge-mapped according to Table 1, emulNum = lmulNum = 1, so the index access instruction is converted into 1 micro-operation.
And then, splitting each micro-operation with the element as granularity to obtain sub-operation (flow) corresponding to each micro-operation.
It may be appreciated that in the embodiment of the present invention, the index access instruction may be split into the micro-operations (uop) by the processor back-end based on the first value or the second value, the processor back-end sends the split micro-operations to the issue queue (FlowQueue), and the issue queue further splits the micro-operations into the sub-operations.
Each sub-operation obtained by splitting the micro-operation at element granularity accesses a single element. Illustratively, the third parameter sew of the index access instruction indicates the data element width and the fourth parameter eew indicates the address-offset element width; the issue queue may split the micro-operation according to the third parameter sew or the fourth parameter eew. In other words, the memory access width of each sub-operation equals the width indicated by sew or eew.
In the embodiment of the invention, when the processor back end splits the index access instruction based on the magnitude relation between the first value and the second value, two cases exist: the index access instruction is split into N2 micro-operations, where N2 is the value obtained by mapping the second parameter according to Table 1; or it is split into N1 micro-operations, where N1 is the value obtained by mapping the first parameter according to Table 1. The second parameter lmul indicates the number of vector registers storing data, and the corresponding third parameter sew indicates the width of the read data; the first parameter emul indicates the number of vector registers storing address offsets, and the corresponding fourth parameter eew indicates the width of the read address offset. When the issue queue splits a micro-operation, it can follow the same basis the processor back end used to split the index access instruction: if the back end split the instruction using lmulNum, the issue queue splits the micro-operation according to sew; if the back end split the instruction using emulNum, the issue queue splits the micro-operation according to eew.
Optionally, in step 104, splitting the micro-operation with element granularity to obtain a sub-operation corresponding to the micro-operation, including:
step S11, obtaining a third parameter (sew) and a fourth parameter (eew) of the index access instruction; the third parameter is used for indicating the data width; the fourth parameter is used for indicating the width of the address offset value;
step S12, splitting the micro-operation according to the third parameter to obtain a sub-operation corresponding to the micro-operation when the first value is smaller than or equal to the second value;
and step S13, splitting the micro-operation according to the fourth parameter under the condition that the first value is larger than the second value to obtain the sub-operation corresponding to the micro-operation.
The processor back end splits the index memory access instruction into micro-operations (uops) based on a first value (emulNum) and a second value (lmulNum), and the issue queue may split the micro-operations into sub-operations (flows) based on the first value (emulNum) and the second value (lmulNum) as well. Specifically, if the first value is less than or equal to the second value, splitting the micro-operation according to the third parameter sew; if the first value is greater than the second value, the micro-operation is split according to a fourth parameter eew.
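Steps S12/S13 can be condensed into a small helper (a sketch; `VLEN_BYTES = 16` assumes a 128-bit vector register, consistent with the worked example given later):

```python
VLEN_BYTES = 16  # assumed 128-bit vector registers

def flow_width(emul_num, lmul_num, sew, eew):
    """Element width (bytes) used to split a micro-op into sub-ops:
    sew when emulNum <= lmulNum (step S12), eew otherwise (step S13)."""
    return sew if emul_num <= lmul_num else eew

def num_flows(emul_num, lmul_num, sew, eew):
    """Sub-operations (flows) per micro-op: one per element."""
    return VLEN_BYTES // flow_width(emul_num, lmul_num, sew, eew)
```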
In the embodiment of the invention, after the index access instruction is converted into sub-operations through the two splitting steps, the element index value of the index access instruction can be further determined according to the first value emulNum, the second value lmulNum and each sub-operation corresponding to the micro-operation.
Illustratively, for a load index instruction, if the micro-operation is split according to the third parameter sew, the sub-operations of the micro-operation are ordered in sequence, and the sequence value flowIdx of a sub-operation is the element index value of the destination register vd of the load index instruction; the element index value of the vector register storing the address offset values, i.e. vector register vs2, may be determined according to the first value emulNum and the second value lmulNum. For example, the memory access information of the sub-operation is determined from the base address of the instruction, emulNum and lmulNum, and the element index value of vs2 is then determined from that information. If the micro-operation is split according to the fourth parameter eew, the sequence value flowIdx of a sub-operation is the element index value of vector register vs2, and the element index value of the destination register vd may be determined from emulNum, lmulNum and flowIdx.
For a store index instruction, if the micro-operation is split according to the third parameter sew, the sub-operations of the micro-operation are ordered in sequence, and the sequence value flowIdx of a sub-operation is the element index value of the source data of the store index instruction, i.e. the element index value of vector register vs3; the element index value of the vector register storing the address offset values, i.e. vector register vs2, may be determined according to the first value emulNum and the second value lmulNum. If the micro-operation is split according to the fourth parameter eew, the sequence value flowIdx of a sub-operation is the element index value of vector register vs2, and the element index value of the source data, i.e. of vector register vs3, may be determined from emulNum, lmulNum and flowIdx.
Finally, a memory access operation is performed based on the determined element index value. For example, for a load index instruction, loading data in the memory into a destination register according to the element index value; for the store index instruction, the source data is stored into memory according to the element index value.
It should be noted that with the emul and lmul values of the related art, an index access instruction may in the limit use 9 vector registers simultaneously; sending the data of 8 vector registers at once would require a data bus 1024 bits wide or wider, which is essentially infeasible to implement physically. The memory access method provided by the embodiment of the invention merges the values of the first parameter emul and the second parameter lmul with the preset mapping rule, reducing the original 7 values to 4.
In the related art, emul and lmul each have 7 possible values, giving 7 × 7 = 49 possible combinations, which would require a 49-to-1 selector and occupy a very large area. In the embodiment of the invention, after the values of emul and lmul are merge-mapped, the 7 × 7 possibilities are reduced to 4 × 4 = 16, which lowers the selection complexity and helps improve the memory access performance of the processor.
In an optional embodiment of the present invention, the sub-operation carries an index value of a micro-operation corresponding to the sub-operation; step 105 of determining the element index value of the index access instruction according to the first value, the second value and each sub-operation corresponding to the micro-operation includes:
Step S21, sequentially arranging all sub-operations corresponding to the micro-operations according to the index value to obtain a sequence value of the sub-operations;
step S22, determining an element index value of the index access instruction according to the sequence value of the sub-operation, the first value and the second value.
In the embodiment of the invention, the issue queue receives the micro-operations of the index access instruction and splits each micro-operation at element granularity to obtain the sub-operations corresponding to each micro-operation.
When determining the element index value corresponding to each sub-operation, the issue queue can determine which micro-operation a sub-operation belongs to according to the micro-operation index value carried by the sub-operation. By arranging all sub-operations corresponding to a micro-operation in order of that index value, the sequence value of each sub-operation of the micro-operation can be determined. The element index value of the index access instruction is then determined according to the sequence value of the sub-operation, the first value and the second value.
Optionally, the instruction type of the index access instruction is a load instruction. In the case that the first value is less than or equal to the second value, determining the element index value of the index access instruction according to the sequence value of the sub-operation, the first value and the second value in step S22 includes:
step A11, determining a first element index value of the destination register according to the sequence value of the sub-operation;
step A12, determining a second element index value of the vector register storing the address offset values of the index access instruction according to the ratio of the second value to the first value.
In the embodiment of the present invention, for a load instruction, i.e. a load index instruction, if the first value (emulNum) is less than or equal to the second value (lmulNum), the issue queue splits the micro-operation according to the third parameter sew. In this case the sequence value flowIdx of the sub-operation may be directly taken as the first element index value of the destination register; the second element index value, that of the vector register storing the address offset values, is then determined according to the ratio of the second value to the first value.
Illustratively, referring to Table 2, an example of a method of calculating the vs2 register element index value vs2Idx (the second element index value) of a load index instruction is shown.
Here, uopIdx is the index value of a micro-operation and indicates its ordering among the micro-operations corresponding to the index access instruction; flowNum is the number of sub-operations corresponding to one micro-operation; flowIdx is the sequence value of a sub-operation.
As an example, let lmul=2, sew=2 bytes, eew=1 byte, and emul=1 for one index instruction. This instruction needs to be split into two uops: the first uop handles the first vd register together with the vs2 register, and the second uop handles the second vd register together with the vs2 register. The reason is as follows: for the first uop (to fill the first vd register), 16Byte/2Byte = 8 addresses need to be calculated, using the first eight bytes of data in the vs2 register; the second uop (to fill the second vd register) likewise needs to calculate 8 addresses, using the last eight bytes of the vs2 data.
To achieve this, the lower n bits of uopIdx are used to indicate how many vd registers share one vs2 register, and the specific element position within vs2 is determined by shifting uopIdx and adding flowIdx.
In this example, two vd registers share one vs2 register. For the second vd register, the value of uopIdx is 1. Since one uop is split into 8 sub-operations, that is, flowNum is 8 and log2(flowNum) = 3, shifting uopIdx left by 3 bits gives 8, so the second uop fetches data starting from the 8th element of the vs2 register.
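The shift-and-add computation described above can be sketched as follows. This is a minimal illustration, not the patented Table 2 itself; the function name is hypothetical, and it assumes vs2Idx = (uopIdx << log2(flowNum)) + flowIdx, as implied by the worked example:

```python
import math

def vs2_elem_index(uop_idx: int, flow_idx: int, flow_num: int) -> int:
    """Element index into the vs2 (offset) register for one sub-operation.

    Shifting uopIdx left by log2(flowNum) selects which group of flowNum
    elements this uop consumes; flowIdx selects the element in the group.
    """
    return (uop_idx << int(math.log2(flow_num))) + flow_idx

# Worked example from the text: lmul=2, sew=2B, eew=1B, emul=1.
# Each uop is split into 16B/2B = 8 sub-operations (flowNum = 8).
# The second uop (uopIdx = 1) starts at element 8 of vs2:
assert vs2_elem_index(uop_idx=1, flow_idx=0, flow_num=8) == 8
# The first uop (uopIdx = 0) covers elements 0..7:
assert [vs2_elem_index(0, f, 8) for f in range(8)] == list(range(8))
```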
Optionally, the instruction type of the index access instruction is a load instruction; in the case that the first value is greater than the second value, determining, in step S22, the element index value of the index access instruction according to the sequence value of the sub-operations, the first value, and the second value includes:
A21, determining a first element index value of the destination register according to the ratio of the second value to the first value;
A22, determining a second element index value of the first vector register in which the index access instruction stores the address offset value, according to the sequence value of the sub-operation.
In the embodiment of the present invention, for a load instruction, that is, a load index instruction, if the first value (emulNum) is greater than the second value (lmulNum), the issue queue splits the micro-operation according to the fourth parameter eew. In this case, the sequence value of the sub-operation may be directly taken as the second element index value of the first vector register in which the index access instruction stores the address offset value; then, the first element index value of the destination register is determined according to the ratio of the second value to the first value. The specific determination method can refer to the calculation method shown in Table 2.
Optionally, the performing a memory access operation based on the element index value in step 106 includes:
step S31, determining a first memory access address according to the second element index value and the base address of the index memory access instruction;
step S32, reading target data from a memory according to the first access address, and loading the target data into a target register according to the first element index value; and the width of the target data is the data width indicated by the third parameter of the index access instruction.
For the load index instruction, the first memory access address can be calculated from the base address and the address offset value obtained via the second element index value. Illustratively, an address offset value offset is read from the vector register vs2 according to the second element index value, the width of the read offset being specified by the fourth parameter eew; the read offset is then added to the base address to obtain the first memory access address.
Then, the target data is read from the first memory access address of the memory and loaded into the destination register according to the first element index value. The width of the read target data is specified by the third parameter sew.
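The address calculation for one sub-operation can be sketched as below. This is an illustrative model only: the function name is hypothetical, memory is modeled as a byte string, and vs2 is assumed to be already unpacked into eew-wide integer offsets:

```python
def load_indexed_element(mem: bytes, base: int, vs2: list, vs2_idx: int,
                         sew: int) -> bytes:
    """Return the sew-byte target data for one load sub-operation.

    The address offset is read from vs2 at the second element index
    (its width eew is implicit in how vs2 was unpacked); the first
    memory access address is base + offset.
    """
    offset = vs2[vs2_idx]            # offset width specified by eew
    addr = base + offset             # first memory access address
    return mem[addr:addr + sew]      # target data, sew bytes wide

mem = bytes(range(32))               # toy memory: mem[i] == i
vs2 = [0, 4, 8]                      # offsets already unpacked from vs2
assert load_indexed_element(mem, base=16, vs2=vs2, vs2_idx=1, sew=2) == bytes([20, 21])
```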
In an optional embodiment of the present invention, in step S32, the step of reading target data from the memory according to the first memory access address, and loading the target data into the destination register according to the first element index value includes:
s321, reading target data from a memory according to the first access address; the width of the target data is the data width indicated by the third parameter of the index access instruction;
sub-step S322, filling the target data into the data field corresponding to the destination register according to the first element index value;
sub-step S323, loading the target data in the data field into the destination register when the target data read from the memory by each sub-operation corresponding to the micro-operation has all been written into the data field corresponding to the destination register.
In the embodiment of the invention, sub-operations access memory at element granularity: each sub-operation accesses the memory once to fetch a single element (or a single effective element), so the target data required by one destination register may need multiple sub-operations and multiple memory accesses. In the embodiment of the invention, only when all the target data fetched from the memory by the sub-operations corresponding to a micro-operation has been written into the data field of the corresponding destination register is the data field loaded into the destination register, which reduces the power consumption of data loading and improves memory access efficiency.
Referring to fig. 3, a schematic architecture diagram of a processor according to an embodiment of the present invention is shown. As shown in fig. 3, for the load index instruction, when emulNum <= lmulNum, the processor back end splits the load index instruction into N2 uops, the number of registers being specified by lmulNum; when emulNum > lmulNum, the processor back end splits the load index instruction into N1 uops, the number of registers being specified by emulNum.
The processor back end sends the uops to the FlowQueue, where each uop is further split into sub-operation flows. The principle of splitting is: if the processor back end split the instruction into uops by lmulNum, the flows are split here by sew; if the back end split the instruction into uops by emulNum, the flows are split here by eew.
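The two-level splitting rule above can be sketched as follows. This is a simplified model under stated assumptions: the vector register length is taken as 16 bytes (VLEN = 128 bits, consistent with the 16Byte/2Byte example in the text), and the function name is illustrative:

```python
def split_counts(emul_num: int, lmul_num: int, vlen_bytes: int,
                 sew: int, eew: int) -> tuple:
    """Number of uops produced by the back end, and the number of
    sub-operation flows each uop is split into by the FlowQueue."""
    if emul_num <= lmul_num:
        n_uops = lmul_num                 # back end splits by lmulNum
        flow_num = vlen_bytes // sew      # flows split by sew
    else:
        n_uops = emul_num                 # back end splits by emulNum
        flow_num = vlen_bytes // eew      # flows split by eew
    return n_uops, flow_num

# Example from the text: lmul=2, sew=2B, eew=1B, emul=1 (VLEN = 16B):
# two uops, each split into 16B/2B = 8 flows.
assert split_counts(emul_num=1, lmul_num=2, vlen_bytes=16, sew=2, eew=1) == (2, 8)
```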
The issue queue (FlowQueue) consists of two 32-entry queues, each with one read port and one write port. For accurate redirect handling, each queue adopts an out-of-order enqueue, out-of-order dequeue mechanism maintained by an FQfreelist. When the load unit (LoadUnit) is idle, the access information, such as the element index value, is sent to the LoadUnit; vector accesses have the lowest priority in the LoadUnit.
The micro-operation queue (UopQueue) consists of one 32-entry queue with four write ports (two for the processor back end to write uops, and two for writing back the data fetched by the LoadUnit) and two read ports for reporting valid data and exceptions to the back end. The UopQueue also enqueues out of order and is maintained by an available-space table (freelist).
For convenience of unified signal control, the FlowQueue and UopQueue are encapsulated in a VLWrapper.
Optionally, the method further comprises:
step S41, determining the number of sub-operations corresponding to the micro-operations according to the third parameter or the fourth parameter;
step S42, setting the count value of a counter according to the number of sub-operations;
step S43, whenever any sub-operation corresponding to the micro-operation has read target data from the memory and the data has been written into the data field of the destination register, decrementing the count value by 1;
step S44, under the condition that the count value of the counter is equal to 0, determining that all the target data read from the memory by each sub-operation corresponding to the micro-operation are written into the data field corresponding to the target register.
In the embodiment of the invention, whether the target data which are taken out from the memory by each sub-operation corresponding to the micro-operation are all written into the data field corresponding to the target register can be judged by the counter.
Illustratively, as shown in FIG. 3, after the processor back end splits the index access instruction into micro-operations (uops), the micro-operations may also be sent to the micro-operation queue (UopQueue). The UopQueue calculates the number of sub-operations (flows) corresponding to the current uop according to the parameter the back end used to split the uop, that is, the third parameter sew or the fourth parameter eew, and stores this number in a counter. Each time the target data of one flow is received, the count value is decremented by 1; when the count value reaches 0, the target data in the data field may be written back to the register file. For example, if emulNum <= lmulNum, the back end splits the instruction into lmulNum uops, and the UopQueue derives from sew the number of flows to be received and writes it into the counter. When one flow retrieves its target data, the count value is decremented by 1; when the count value equals 0, all target data has been written into the data field of the destination register and can be written back to the destination register in the register file.
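The counter-based write-back condition of steps S41 to S44 can be sketched as follows. The class and method names are hypothetical; the data field is modeled as a flat byte buffer with one sew-wide slot per flow:

```python
class UopEntry:
    """One UopQueue entry: a data field for the destination register
    and a counter of outstanding sub-operation flows."""

    def __init__(self, flow_num: int, sew: int):
        self.counter = flow_num                     # step S42: set count
        self.data_field = bytearray(flow_num * sew)
        self.sew = sew

    def write_back_flow(self, flow_idx: int, data: bytes) -> bool:
        """Fill one flow's target data into the data field; return True
        once every flow has written back (counter reaches 0, step S44)."""
        self.data_field[flow_idx * self.sew:(flow_idx + 1) * self.sew] = data
        self.counter -= 1                           # step S43: decrement
        return self.counter == 0

entry = UopEntry(flow_num=4, sew=2)
done = [entry.write_back_flow(i, bytes([i, i])) for i in range(4)]
assert done == [False, False, False, True]          # ready only at the end
```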
Optionally, the instruction type of the index access instruction is a store instruction; in the case that the first value is less than or equal to the second value, determining, in step S22, the element index value of the index access instruction according to the sequence value of the sub-operations, the first value, and the second value includes:
B11, determining a third element index value of the source data according to the sequence value of the sub-operation;
B12, determining a fourth element index value of the second vector register in which the index access instruction stores the address offset value, according to the ratio of the second value to the first value.
In the embodiment of the present invention, for a store instruction, that is, a store index instruction, if the first value (emulNum) is less than or equal to the second value (lmulNum), the issue queue splits the micro-operation according to the third parameter sew. In this case, the sequence value of the sub-operation may be directly taken as the third element index value of the source data; then, the fourth element index value of the second vector register in which the store index instruction stores the address offset value is determined according to the ratio of the second value to the first value. The specific determination method can refer to the calculation method shown in Table 2.
Optionally, the instruction type of the index access instruction is a store instruction; in the case that the first value is greater than the second value, determining, in step S22, the element index value of the index access instruction according to the sequence value of the sub-operations, the first value, and the second value includes:
B21, determining a third element index value of the source data according to the ratio of the second value to the first value;
B22, determining a fourth element index value of the second vector register in which the index access instruction stores the address offset value, according to the sequence value of the sub-operation.
In an embodiment of the present invention, for a store instruction, that is, a store index instruction, if the first value (emulNum) is greater than the second value (lmulNum), the issue queue splits the micro-operation according to the fourth parameter eew. In this case, the sequence value of the sub-operation may be directly taken as the fourth element index value of the second vector register in which the store index instruction stores the address offset value. Then, the third element index value of the source data is determined according to the ratio of the second value to the first value. The specific determination method can refer to the calculation method shown in Table 2.
Optionally, the performing a memory access operation based on the element index value includes:
step S51, determining a second memory access address according to the fourth element index value and the base address of the index memory access instruction;
step S52, storing the source data to the second memory address of the memory according to the third element index value; the width of the source data read by each access is the data width indicated by the third parameter of the index access instruction.
For the store index instruction, the second memory access address may be determined from the fourth element index value and the base address. Illustratively, the address offset value is first read from the vector register vs2 according to the fourth element index value, the width of each read offset being specified by the fourth parameter eew; the read offset is then added to the base address to obtain the second memory access address.
Then, the source data is read from the vector register vs3 according to the third element index value and stored to the second memory access address of the memory. The width of the source data read each time is specified by the third parameter sew.
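The store-side address calculation can be sketched analogously to the load side. Again an illustrative model under assumptions: the function name is hypothetical, memory is a mutable byte buffer, vs2 holds already-unpacked eew-wide offsets, and vs3 is the packed source-data register:

```python
def store_indexed_element(mem: bytearray, base: int, vs2: list, vs2_idx: int,
                          vs3: bytes, vs3_idx: int, sew: int) -> None:
    """Store one sew-byte element of source data for a store sub-operation.

    The offset is read from vs2 at the fourth element index; the source
    data is read from vs3 at the third element index; the second memory
    access address is base + offset.
    """
    addr = base + vs2[vs2_idx]                      # second memory access address
    src = vs3[vs3_idx * sew:(vs3_idx + 1) * sew]    # sew-wide source element
    mem[addr:addr + sew] = src

mem = bytearray(16)
vs3 = bytes([0xAA, 0xBB, 0xCC, 0xDD])
store_indexed_element(mem, base=4, vs2=[0, 8], vs2_idx=1, vs3=vs3, vs3_idx=1, sew=2)
assert mem[12:14] == bytes([0xCC, 0xDD])            # element 1 stored at 4 + 8
```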
Referring to fig. 4, a schematic architecture diagram of another processor according to an embodiment of the present invention is shown. As shown in fig. 4, for the store index instruction, when emulNum <= lmulNum, the processor back end splits the store index instruction into N2 uops, the number of registers being specified by lmulNum; when emulNum > lmulNum, the processor back end splits the store index instruction into N1 uops, the number of registers being specified by emulNum.
The processor back end sends the uops to a micro-operation queue (UopQueue), which can essentially be understood here as a buffer. On the one hand, this module generates a mask according to the effective element width sew; on the other hand, it temporarily stores uops to satisfy timing relationships.
The UopQueue sends the uops to the issue queue (FlowQueue), where the uops are further split into sub-operation flows. The splitting principle is the same as before: if the processor back end split the instruction into uops by lmulNum, the flows are split here by sew; if the back end split by emulNum, the flows are split here by eew.
Illustratively, the UopQueue in fig. 4 is a 32-entry queue with two write ports and two read ports. To simplify receive processing, the UopQueue also enqueues out of order and is maintained by an available-space table (freelist).
The FlowQueue consists of two 32-entry queues. It receives uops from the UopQueue, splits them at element granularity, and stores the results; when the store unit is idle, it sends the access information of a sub-operation flow to the store unit, which writes the store queue; when the instruction commits, the data is written to a commit buffer and then to memory. The access information may carry the third element index value and the fourth element index value corresponding to the sub-operation. The resend operation of vector stores is maintained by the FlowQueue; likewise, for accurate redirect handling, the FlowQueue still adopts an out-of-order dequeue mechanism.
It should be noted that, in the embodiment of the present invention, the memory module in fig. 2 may be divided into a load module (as shown in fig. 3) and a store module (as shown in fig. 4), where the load module is configured to execute a load operation corresponding to a load index instruction, and the store module is configured to execute a store operation corresponding to a store index instruction.
In an optional embodiment of the present invention, the obtaining the index access instruction to be executed and the first parameter and the second parameter of the index access instruction includes:
step S51, acquiring an index access instruction to be executed and second parameters, third parameters and fourth parameters of the index access instruction;
step S52, recoding the fourth parameter based on the coding rule of the third parameter to obtain a fourth coding value of the fourth parameter;
step S53, determining a first code value of a first parameter according to the second code value of the second parameter, the third code value of the third parameter and the fourth code value of the fourth parameter; the first code value=fourth code value-third code value+second code value;
step S54, determining a value of the first parameter according to the first coding value, where the value of the first parameter is a positive number.
In the related art, the RISC-V instruction set specifies emul= eew/sew ×lmul. Referring to tables 3 to 5, the coding formats of eew, sew and lmul are shown, respectively:
as can be seen from tables 3 to 5, there are 4 cases for the codes eew and sew, 7 cases for the codes lmul, and if the value of emul is determined by looking up the codes eew, sew and lmul, there are 112 possibilities, and a 112-entry table is required to record emul, which occupies a large memory space.
In the embodiment of the invention, the way the first parameter emul is determined is improved. Specifically, the processor back end may re-encode the fourth parameter eew according to the encoding rules of the third parameter sew. For example, if the code value of eew in a segment memory access instruction is "0101", the fourth code value obtained by re-encoding it is "001". Then, the back end calculates the first code value of the first parameter emul from the fourth code value of the fourth parameter eew, the third code value of the third parameter sew and the second code value of the second parameter lmul as emul = eew' - sew + lmul, where eew' is the fourth code value obtained after re-encoding eew. Finally, the value of the first parameter is determined from the first code value.
With the method provided by the embodiment of the invention, the code value of the first parameter emul can be determined with only one 3-bit adder, which reduces the storage overhead and effectively reduces the computation cost of the first parameter emul.
Optionally, before the determining the value of the first parameter according to the first encoded value, the method further includes:
and under the condition that the first coded value of the first parameter overflows, determining that the first coded value is illegal, and re-determining the first coded value of the first parameter.
In the process of calculating the first code value of the first parameter emul, if overflow occurs (including a positive result wrapping to negative or a negative result wrapping to positive), the code value obtained is erroneous, that is, the first code value of the first parameter is illegal; the currently calculated first code value may be discarded and recalculated according to the preceding steps S51 to S54.
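The 3-bit adder with overflow detection can be sketched as follows. This is a sketch under assumptions: the code values are treated as the log2-scale codes implied by the worked example (e.g., lmul=2 maps to code 1, sew=2 bytes to code 1, eew=1 byte to code 0), and the function name is illustrative, not the patented circuit:

```python
def emul_code(eew_code: int, sew_code: int, lmul_code: int):
    """3-bit signed adder computing emul = eew' - sew + lmul on log2 codes.

    Returns the signed 3-bit code, or None if the result overflows
    3 signed bits (a positive value wrapping negative or vice versa),
    in which case the code is illegal and must be recomputed.
    """
    full = eew_code - sew_code + lmul_code   # full-precision reference
    raw = full & 0b111                       # keep only 3 bits
    signed = raw - 8 if raw & 0b100 else raw # sign-extend bit 2
    return signed if signed == full else None

# Example from the text: lmul=2 (code 1), sew=2B (code 1), eew=1B (code 0)
# gives emul code 0, i.e. emul = 1.
assert emul_code(eew_code=0, sew_code=1, lmul_code=1) == 0
# Positive overflow: 3 - 0 + 3 = 6 does not fit in 3 signed bits.
assert emul_code(eew_code=3, sew_code=0, lmul_code=3) is None
```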
In summary, the embodiment of the invention provides a memory access method. The first parameter emul is mapped into a first value N1 and the second parameter lmul into a second value N2 according to a preset mapping rule; the index access instruction is then split according to the first value and the second value to obtain at least one micro-operation; each micro-operation is further split into sub-operations with the element as granularity; the element index value of the index access instruction is determined according to the first value, the second value and each sub-operation; finally, the memory access operation is executed based on the element index value. In the embodiment of the invention, the preset mapping rule merges the values of the first parameter emul and the second parameter lmul, simplifying the original 7 values to 4; after merging and mapping, the original 49 combinations are reduced to 16, which lowers selection complexity and improves the memory access performance of the processor.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Device embodiment
Referring to FIG. 2, there is shown a schematic architecture of a processor of the present invention, the processor including a processor back-end, a transmit queue, and a memory module;
the back end of the processor is used for acquiring an index access instruction to be executed and first parameters and second parameters of the index access instruction; the first parameter is used for indicating the number of vector registers storing address offset values, and the second parameter is used for indicating the number of vector registers storing data; determining a first numerical value N1 corresponding to the first parameter and a second numerical value N2 corresponding to the second parameter according to a preset mapping rule, wherein both the N1 and the N2 are greater than or equal to 1; the mapping rule is used for mapping a first parameter or a second parameter with the value smaller than or equal to 1; splitting the index access instruction into at least one micro-operation according to the first value and the second value;
The emission queue is used for splitting the micro-operation with the element as granularity to obtain sub-operation corresponding to the micro-operation; determining an element index value of the index access instruction according to the first numerical value, the second numerical value and each sub-operation corresponding to the micro-operation;
the memory access module is used for executing memory access operation based on the element index value.
Optionally, the splitting the index access instruction into at least one micro-operation according to the first value and the second value includes:
splitting the index memory access instruction into N2 micro-operations if the first value is less than or equal to the second value;
and splitting the index access instruction into N1 micro-operations under the condition that the first value is larger than the second value.
Optionally, the splitting processing is performed on the micro-operation with the element as granularity to obtain sub-operations corresponding to the micro-operation, including:
acquiring a third parameter and a fourth parameter of the index access instruction; the third parameter is used for indicating the data width; the fourth parameter is used for indicating the width of the address offset value;
splitting the micro-operation according to the third parameter under the condition that the first value is smaller than or equal to the second value to obtain a sub-operation corresponding to the micro-operation;
And under the condition that the first value is larger than the second value, splitting the micro-operation according to the fourth parameter to obtain the sub-operation corresponding to the micro-operation.
Optionally, the sub-operation carries an index value of the micro-operation corresponding to the sub-operation;
the determining the element index value of the index access instruction according to the first value, the second value and each sub-operation corresponding to the micro-operation includes:
sequentially arranging all sub-operations corresponding to the micro-operations according to the index value to obtain a sequence value of the sub-operations;
and determining an element index value of the index access instruction according to the sequence value of the sub-operation, the first value and the second value.
Optionally, the instruction type of the index access instruction is a load instruction; and if the first value is less than or equal to the second value, determining an element index value of the index access instruction according to the sequence value of the sub-operations, the first value and the second value, including:
determining a first element index value of a destination register according to the sequence value of the sub-operation;
and determining a second element index value of a first vector register of the index access instruction deposit address offset value according to the ratio of the second value to the first value.
Optionally, the instruction type of the index access instruction is a load instruction; and if the first value is greater than the second value, determining an element index value of the index access instruction according to the sequence value of the sub-operations, the first value and the second value, including:
determining a first element index value of a destination register according to the ratio of the second value to the first value;
and determining a second element index value of a first vector register of the index access instruction deposit address offset value according to the sequence value of the sub-operation.
Optionally, the performing a memory access operation based on the element index value includes:
determining a first memory address according to the second element index value and the base address of the index memory access instruction;
reading target data from a memory according to the first memory address, and loading the target data into a target register according to the first element index value; and the width of the target data is the data width indicated by the third parameter of the index access instruction.
Optionally, the instruction type of the index access instruction is a storage instruction; and if the first value is less than or equal to the second value, determining an element index value of the index access instruction according to the sequence value of the sub-operations, the first value and the second value, including:
Determining a third element index value of the source data according to the sequence value of the sub-operation;
and determining a fourth element index value of a second vector register of the index access instruction storage address offset value according to the ratio of the second value to the first value.
Optionally, the instruction type of the index access instruction is a storage instruction; and if the first value is greater than the second value, determining an element index value of the index access instruction according to the sequence value of the sub-operations, the first value and the second value, including:
determining a third element index value of the source data according to the ratio of the second value to the first value;
and determining a fourth element index value of a second vector register of the index access instruction storing address offset value according to the sequence value of the sub-operation.
Optionally, the performing a memory access operation based on the element index value includes:
determining a second memory address according to the fourth element index value and the base address of the index memory access instruction;
storing the source data into the second memory address of the memory according to the third element index value; the width of the source data read by each access is the data width indicated by the third parameter of the index access instruction.
Optionally, the reading the target data from the memory according to the first memory access address, and loading the target data into the destination register according to the first element index value, includes:
reading target data from a memory according to the first access address; the width of the target data is the data width indicated by the third parameter of the index access instruction;
filling the target data into a data field corresponding to the target register according to the first element index value;
and under the condition that target data read from the memory by each sub-operation corresponding to the micro-operation is written into a data field corresponding to the target register, loading the target data in the data field into the target register.
Optionally, the obtaining the index access instruction to be executed and the first parameter and the second parameter of the index access instruction includes:
acquiring an index access instruction to be executed and second, third and fourth parameters of the index access instruction;
recoding the fourth parameter based on the coding rule of the third parameter to obtain a fourth coding value of the fourth parameter;
determining a first encoded value of a first parameter according to the second encoded value of the second parameter, the third encoded value of the third parameter and the fourth encoded value of the fourth parameter; the first code value=fourth code value-third code value+second code value;
And determining the value of the first parameter according to the first coding value, wherein the value of the first parameter is a positive number.
Optionally, the processor back-end is further configured to:
and under the condition that the first coded value of the first parameter overflows, determining that the first coded value is illegal, and re-determining the first coded value of the first parameter.
According to the processor provided by the embodiment of the invention, the first parameter emul is mapped into a first value N1 and the second parameter lmul into a second value N2 according to a preset mapping rule; the index access instruction is then split according to the first value and the second value to obtain at least one micro-operation; each micro-operation is further split into sub-operations with the element as granularity; the element index value of the index access instruction is determined according to the first value, the second value and each sub-operation; finally, the memory access operation is executed based on the element index value. In the embodiment of the invention, the preset mapping rule merges the values of the first parameter emul and the second parameter lmul, simplifying the original 7 values to 4; after merging and mapping, the original 49 combinations are reduced to 16, which lowers selection complexity and improves the memory access performance of the processor.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively brief; for relevant details, reference may be made to the description of the method embodiments.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The specific manner in which the various modules of the processor of the above-described embodiments perform their operations has been described in detail in the embodiments of the method and will not be elaborated here.
Referring to fig. 5, a block diagram of an electronic device for access according to an embodiment of the present invention is provided. As shown in fig. 5, the electronic device includes: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is configured to store executable instructions that cause the processor to perform the memory access method of the foregoing embodiment.
The processor may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that implements a computing function, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The communication bus may include a path for transferring information between the memory and the communication interface. The communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in fig. 5, but this does not mean that there is only one bus or only one type of bus.
The memory may be a ROM (Read-Only Memory) or other type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
Embodiments of the present invention also provide a non-transitory computer-readable storage medium; when instructions in the storage medium are executed by a processor of an electronic device (a server or a terminal), the processor is enabled to perform the memory access method shown in fig. 1.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of those embodiments may occur to those skilled in the art once they learn of the basic inventive concept. It is therefore intended that the appended claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or terminal device that comprises the element.
The memory access method, the processor, the electronic device and the readable storage medium provided by the present invention have been described in detail above, and specific examples have been used herein to illustrate the principles and embodiments of the present invention; the above description of the embodiments is intended only to help understand the method and core idea of the present invention. Meanwhile, for those skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the present invention; in summary, the content of this specification should not be construed as limiting the present invention.
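Before the claims, one concrete (and partly hypothetical) reading of the element-index derivation for a load instruction may help: per claims 4 to 6 below, the destination-register element index follows the sub-operation sequence value when N1 <= N2, while the offset-register element index is derived from the ratio of the second value to the first value (and vice versa when N1 > N2). The text does not state the exact arithmetic behind "according to the ratio", so the integer division used below is purely an assumption for illustration.

```python
def load_element_indices(seq: int, n1: int, n2: int):
    """Hypothetical element-index derivation for an index load.

    seq : sequence value of the sub-operation (sub-ops ordered by the
          index value of their parent micro-operation).
    n1  : first numerical value (from emul, address-offset registers).
    n2  : second numerical value (from lmul, data registers).

    When n1 <= n2, the destination-register index tracks the sequence
    value and the offset-register index advances by the ratio n2 // n1
    (assumed formula); when n1 > n2, the roles are swapped.
    """
    if n1 <= n2:
        dest_index = seq                   # first element index (claim 5)
        offset_index = seq // (n2 // n1)   # second element index, assumed
    else:
        dest_index = seq // (n1 // n2)     # assumed symmetric form (claim 6)
        offset_index = seq
    return dest_index, offset_index
```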

Claims (16)

1. A memory access method, applied to a processor, the method comprising:
acquiring an index access instruction to be executed and a first parameter and a second parameter of the index access instruction; the first parameter is used for indicating the number of vector registers storing address offset values, and the second parameter is used for indicating the number of vector registers storing data;
determining a first numerical value N1 corresponding to the first parameter and a second numerical value N2 corresponding to the second parameter according to a preset mapping rule, wherein both N1 and N2 are greater than or equal to 1; the mapping rule is used for mapping a first parameter or a second parameter with a value smaller than or equal to 1;
splitting the index access instruction into at least one micro-operation according to the first value and the second value;
splitting the micro-operation with the element as granularity to obtain a sub-operation corresponding to the micro-operation;
determining an element index value of the index access instruction according to the first numerical value, the second numerical value and each sub-operation corresponding to the micro-operation;
and executing the memory access operation based on the element index value.
2. The method of claim 1, wherein splitting the index access instruction into at least one micro-operation according to the first value and the second value comprises:
splitting the index memory access instruction into N2 micro-operations if the first value is less than or equal to the second value;
and splitting the index access instruction into N1 micro-operations under the condition that the first value is larger than the second value.
3. The method of claim 1, wherein the splitting the micro-operation with element granularity to obtain the sub-operation corresponding to the micro-operation comprises:
acquiring a third parameter and a fourth parameter of the index access instruction; the third parameter is used for indicating the data width; the fourth parameter is used for indicating the width of the address offset value;
splitting the micro-operation according to the third parameter under the condition that the first value is smaller than or equal to the second value to obtain a sub-operation corresponding to the micro-operation;
and under the condition that the first value is larger than the second value, splitting the micro-operation according to the fourth parameter to obtain the sub-operation corresponding to the micro-operation.
4. The method according to claim 1, wherein the sub-operations carry index values of micro-operations corresponding to the sub-operations;
The determining the element index value of the index access instruction according to the first value, the second value and each sub-operation corresponding to the micro-operation includes:
sequentially arranging all sub-operations corresponding to the micro-operations according to the index value to obtain a sequence value of the sub-operations;
and determining an element index value of the index access instruction according to the sequence value of the sub-operation, the first value and the second value.
5. The method of claim 4, wherein the instruction type of the index access instruction is a load instruction; and if the first value is less than or equal to the second value, determining an element index value of the index access instruction according to the sequence value of the sub-operations, the first value and the second value, including:
determining a first element index value of a destination register according to the sequence value of the sub-operation;
and determining a second element index value of a first vector register storing the address offset value of the index access instruction according to the ratio of the second value to the first value.
6. The method of claim 4, wherein the instruction type of the index access instruction is a load instruction; and if the first value is greater than the second value, determining an element index value of the index access instruction according to the sequence value of the sub-operations, the first value and the second value, including:
determining a first element index value of a destination register according to the ratio of the second value to the first value;
and determining a second element index value of a first vector register storing the address offset value of the index access instruction according to the sequence value of the sub-operation.
7. The method of claim 5 or 6, wherein the performing a memory access operation based on the element index value comprises:
determining a first memory address according to the second element index value and the base address of the index memory access instruction;
reading target data from a memory according to the first memory address, and loading the target data into a destination register according to the first element index value; wherein the width of the target data is the data width indicated by the third parameter of the index access instruction.
8. The method of claim 4, wherein the instruction type of the index access instruction is a store instruction; and if the first value is less than or equal to the second value, determining an element index value of the index access instruction according to the sequence value of the sub-operations, the first value and the second value, including:
determining a third element index value of source data according to the sequence value of the sub-operation;
and determining a fourth element index value of a second vector register storing the address offset value of the index access instruction according to the ratio of the second value to the first value.
9. The method of claim 4, wherein the instruction type of the index access instruction is a store instruction; and if the first value is greater than the second value, determining an element index value of the index access instruction according to the sequence value of the sub-operations, the first value and the second value, including:
determining a third element index value of the source data according to the ratio of the second value to the first value;
and determining a fourth element index value of a second vector register storing the address offset value of the index access instruction according to the sequence value of the sub-operation.
10. The method of claim 8 or 9, wherein the performing a memory access operation based on the element index value comprises:
determining a second memory address according to the fourth element index value and the base address of the index memory access instruction;
storing the source data into the second memory address of the memory according to the third element index value; the width of the source data read by each access is the data width indicated by the third parameter of the index access instruction.
11. The method of claim 7, wherein the reading the target data from the memory according to the first memory address and loading the target data into the destination register according to the first element index value comprises:
reading the target data from the memory according to the first memory address; the width of the target data is the data width indicated by the third parameter of the index access instruction;
filling the target data into a data field corresponding to the destination register according to the first element index value;
and under the condition that the target data read from the memory by each sub-operation corresponding to the micro-operation has been written into the data field corresponding to the destination register, loading the target data in the data field into the destination register.
12. The method of claim 1, wherein the obtaining the index access instruction to be executed and the first parameter and the second parameter of the index access instruction comprises:
acquiring an index access instruction to be executed and second, third and fourth parameters of the index access instruction;
recoding the fourth parameter based on the encoding rule of the third parameter to obtain a fourth encoded value of the fourth parameter;
determining a first encoded value of the first parameter according to a second encoded value of the second parameter, a third encoded value of the third parameter and the fourth encoded value of the fourth parameter, wherein the first encoded value = the fourth encoded value - the third encoded value + the second encoded value;
and determining the value of the first parameter according to the first encoded value, wherein the value of the first parameter is a positive number.
13. The method of claim 12, wherein before determining the value of the first parameter according to the first encoded value, the method further comprises:
and under the condition that the first encoded value of the first parameter overflows, determining that the first encoded value is illegal, and re-determining the first encoded value of the first parameter.
14. A processor, characterized by comprising a processor back end, an issue queue and a memory access module;
the back end of the processor is used for acquiring an index access instruction to be executed and first parameters and second parameters of the index access instruction; the first parameter is used for indicating the number of vector registers storing address offset values, and the second parameter is used for indicating the number of vector registers storing data; determining a first numerical value N1 corresponding to the first parameter and a second numerical value N2 corresponding to the second parameter according to a preset mapping rule, wherein both the N1 and the N2 are greater than or equal to 1; the mapping rule is used for mapping a first parameter or a second parameter with the value smaller than or equal to 1; splitting the index access instruction into at least one micro-operation according to the first value and the second value;
the issue queue is used for splitting the micro-operation with the element as granularity to obtain the sub-operation corresponding to the micro-operation; and determining an element index value of the index access instruction according to the first numerical value, the second numerical value and each sub-operation corresponding to the micro-operation;
the memory access module is used for executing memory access operation based on the element index value.
15. An electronic device, comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other via the communication bus; the memory is configured to store executable instructions that cause the processor to perform the memory access method of any one of claims 1 to 13.
16. A readable storage medium, characterized in that instructions in the readable storage medium, when executed by a processor of an electronic device, enable the processor to perform the memory access method of any one of claims 1 to 13.
CN202311176283.0A 2023-09-13 2023-09-13 Access method, processor, electronic device and readable storage medium Active CN116909755B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311176283.0A CN116909755B (en) 2023-09-13 2023-09-13 Access method, processor, electronic device and readable storage medium


Publications (2)

Publication Number Publication Date
CN116909755A CN116909755A (en) 2023-10-20
CN116909755B true CN116909755B (en) 2023-12-22

Family

ID=88355044


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5513366A (en) * 1994-09-28 1996-04-30 International Business Machines Corporation Method and system for dynamically reconfiguring a register file in a vector processor
CN113961247A (en) * 2021-09-24 2022-01-21 北京睿芯众核科技有限公司 RISC-V processor based vector access instruction execution method, system and device
CN115934168A (en) * 2022-12-26 2023-04-07 海光信息技术股份有限公司 Processor and memory access method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10509726B2 (en) * 2015-12-20 2019-12-17 Intel Corporation Instructions and logic for load-indices-and-prefetch-scatters operations
US10282204B2 (en) * 2016-07-02 2019-05-07 Intel Corporation Systems, apparatuses, and methods for strided load
WO2018107331A1 (en) * 2016-12-12 2018-06-21 华为技术有限公司 Computer system and memory access technology




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant