CN117093268B - Vector processing method, system, equipment and storage medium - Google Patents

Vector processing method, system, equipment and storage medium

Info

Publication number
CN117093268B
Authority
CN
China
Legal status
Active
Application number
CN202311356528.8A
Other languages
Chinese (zh)
Other versions
CN117093268A (en)
Inventor
胡皓琰
蒋江
张弛
施军
蔡学武
Current Assignee
Chaorui Technology Changsha Co ltd
Original Assignee
Chaorui Technology Changsha Co ltd
Application filed by Chaorui Technology Changsha Co ltd
Priority to CN202311356528.8A
Publication of CN117093268A
Application granted
Publication of CN117093268B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3005 Arrangements for executing specific machine instructions to perform operations for flow control

Abstract

The invention belongs to the field of processor microarchitecture design, and particularly relates to a vector processing method, system, equipment and storage medium. The vector processing method comprises: after a vector instruction is acquired, predictively setting the vector length in the decoding stage and marking the first vector length setting instruction; obtaining the actual set vector length based on the software application vector length and the hardware support length; if the predicted vector length is the same as the actual set vector length, processing the vector to be processed based on the vector length setting instruction processing times, the processing times and the tail vector length; and if they differ, flushing the vector instructions and processing the vector to be processed according to the actual set vector length. The method predicts the vector length value without waiting for the decoding result and improves vector processing efficiency.

Description

Vector processing method, system, equipment and storage medium
Technical Field
The invention belongs to the field of design of a processor micro-architecture, and particularly relates to a vector processing method, a vector processing system, vector processing equipment and a vector processing storage medium.
Background
RISC-V is a widely used reduced instruction set architecture. Its vector extension defines a variable-length vector instruction set that offers a rich range of instruction types and makes programs more flexible to write.
In the related art, the RISC-V vector instruction set provides instructions for configuring the vector state, including the vector length; for example, vsetvl rd, rs1, rs2 sets the vector length. The scalar source operand rs1 conveys the vector length requested by software (the application vector length, AVL), and rd receives the vector length vl actually set by the hardware. If the value conveyed by rs1 is greater than the maximum vector length of the hardware vector registers, the hardware sets vl to that maximum. Every time software sets vl with a vsetvl instruction through scalar register operands, the execution of subsequent instructions becomes dependent on the value of the scalar register rs1 used to set vl.
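To make this dependence concrete, the following is a minimal Python sketch of the usual strip-mining loop, assuming the simplified rule stated above (vl = AVL if AVL does not exceed the maximum, otherwise the maximum). The function names `vsetvl` and `strip_mine` are hypothetical and model only the arithmetic, not real hardware. Each pass requests the remaining element count as AVL, so every vsetvl depends on the result of the previous one.

```python
def vsetvl(avl: int, vlmax: int) -> int:
    """Return the vl granted for a requested application vector length (AVL).

    Simplified rule from the text: vl = AVL if AVL <= VLMAX, else VLMAX.
    (The ratified RVV 1.0 spec also allows ceil(AVL/2) <= vl <= VLMAX
    when VLMAX < AVL < 2*VLMAX; this sketch ignores that freedom.)
    """
    return min(avl, vlmax)


def strip_mine(n: int, vlmax: int) -> list[int]:
    """Split an n-element job into the vl values a vsetvl loop would use."""
    chunks = []
    remaining = n
    while remaining > 0:
        vl = vsetvl(remaining, vlmax)  # rd receives the granted vl
        chunks.append(vl)
        remaining -= vl                # the next AVL depends on this vl
    return chunks


print(strip_mine(23, 4))  # [4, 4, 4, 4, 4, 3]
```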
In the related art, an implementation of the vector extension has to virtualize or rename vl to eliminate this data dependence, so that instructions depending on vl wait for the vl value to become ready through an out-of-order scheduling algorithm. However, supporting multiple vl values in a processor requires multiple vl copies to back the vector instructions, and the exact vl value still cannot be obtained in the decoding stage.
Disclosure of Invention
The technical problem addressed by the invention is that setting the vector length requires multiple vl copies in the processor, while the exact vl value still cannot be obtained in the decoding stage, so subsequent instructions cannot be decoded according to the vl value.
A vector processing method, comprising:
acquiring a vector instruction;
sequentially decoding the vector instructions to obtain decoding results;
predictively setting a predicted vector length and marking a first vector length setting instruction;
acquiring an operand address based on the decoding result;
reading a software application vector length based on the operand address;
acquiring a hardware support length;
based on the software application vector length and the hardware support length, obtaining an actual set vector length;
acquiring the length of a preset vector to be processed;
obtaining the processing times and the tail vector length based on the preset vector length to be processed and the actual set vector length, wherein the processing times and the tail vector length form a vector prediction table;
judging whether the length of the actual set vector is equal to the length of the predicted vector;
if the actual set vector length is equal to the predicted vector length, obtaining the vector length setting instruction processing times based on the first vector length setting instruction;
processing the vector to be processed based on the vector length setting instruction processing times, the processing times and the tail vector length;
and if the actual set vector length is not equal to the predicted vector length, flushing the vector instructions, restoring the vector length setting instruction processing times, and processing the vector to be processed based on the actual set vector length.
After the vector instruction is acquired, decoding begins and the predicted vector length is set in advance. Its initial value is the hardware support length, and it is updated to the actual set vector length when the prediction is wrong. When the first vector instruction is processed, it is marked, i.e., the first vector length setting instruction is marked, so that the number of times the vector to be processed has been processed, namely the vector length setting instruction processing times, can be tracked with a counter; the counter is incremented by 1 each time the vector to be processed is successfully processed.
Optionally, the obtaining the actual set vector length value based on the software application vector length and the hardware support length includes:
judging whether the length of the software application vector is greater than the hardware support length;
if the software application vector length is greater than the hardware support length, taking the hardware support length as the actual set vector length;
and if the software application vector length is smaller than or equal to the hardware support length, taking the software application vector length as the actual set vector length.
Optionally, the obtaining the processing times and the tail vector length based on the preset to-be-processed vector length and the actual set vector length includes:
calculating the dividing times and the residual length based on the actual set vector length and the preset vector length to be processed;
the dividing times are equal to the integer part of the preset vector length to be processed divided by the actual set vector length, plus one, and the dividing times are taken as the processing times;
and the residual length is the remainder of the preset vector length to be processed divided by the actual set vector length, and the residual length is taken as the tail vector length.
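The two quantities of the vector prediction table can be sketched in Python as follows. The function name `build_prediction_table` is hypothetical. One deliberate deviation is labeled in the code: it uses ceiling division, which matches "integer part plus one" whenever the length is not an exact multiple of the vector length, but avoids an extra, empty tail pass when it is.

```python
def build_prediction_table(total_len: int, vl: int) -> tuple[int, int]:
    """Return (processing_times, tail_len) for a vector of total_len elements.

    Assumption: ceiling division is used instead of the literal
    total_len // vl + 1, so an exact multiple yields no empty tail pass.
    """
    if vl <= 0:
        raise ValueError("vl must be positive")
    if total_len <= 0:
        return 0, 0
    processing_times = -(-total_len // vl)  # ceil(total_len / vl)
    tail_len = total_len % vl or vl         # last pass is full if divisible
    return processing_times, tail_len


print(build_prediction_table(23, 4))  # (6, 3)
print(build_prediction_table(27, 5))  # (6, 2)
```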
Optionally, the processing the vector to be processed based on the vector length setting instruction processing times, the processing times and the tail vector length includes:
acquiring vector length setting instruction processing times;
judging whether the vector length setting instruction processing times are smaller than the processing times or not;
if the vector length setting instruction processing times are smaller than the processing times, setting the predicted vector length based on the hardware support length;
and if the vector length setting instruction processing times are equal to the processing times, setting the predicted vector length as the tail vector length.
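The selection rule above can be sketched as a small Python function. The name `predict_vl` is hypothetical, and the alignment of the counter is an assumption: `pass_index` is taken to be 1-based, i.e. the vector length setting instruction processing times counting the pass being predicted, so that "equal" identifies the last pass.

```python
def predict_vl(pass_index: int, processing_times: int,
               tail_len: int, hw_len: int) -> int:
    """Predicted vl for one pass of a strip-mined vector loop.

    Assumption: pass_index is 1-based and includes the current pass.
    """
    if pass_index < processing_times:
        return hw_len   # not the last pass: predict the hardware support length
    return tail_len     # last pass: predict the tail vector length


# 23 elements, hardware support length 4 -> prediction table (6, 3)
print([predict_vl(k, 6, 3, 4) for k in range(1, 7)])  # [4, 4, 4, 4, 4, 3]
```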
Optionally, the acquiring the vector length setting instruction processing times includes:
performing software loop marking on the first vector length setting instruction based on the vector instruction;
and calculating the vector length setting instruction processing times based on the software loop marker.
Optionally, after the vector to be processed is processed based on the vector length setting instruction processing times, the processing times and the tail vector length, the method further includes:
clearing the software loop mark;
based on the decoding result, acquiring the length of the next preset vector to be processed;
judging whether the length of the next preset vector to be processed is the same as the length of the last preset vector to be processed;
if the length of the next preset vector to be processed is the same as the length of the last preset vector to be processed, processing the vector to be processed based on the length of the application vector, the processing times and the length of the tail vector;
and if the length of the next preset to-be-processed vector is different from the length of the last preset to-be-processed vector, re-acquiring the processing times and the tail vector length based on the application vector length and the length of the next preset to-be-processed vector.
Optionally, the calculating the vector length setting instruction processing times based on the software loop flag includes:
setting the software loop flag i to 0, and denoting the vector length setting instruction processing times by j;
when the actual set vector length is equal to the predicted vector length, the vector length setting instruction processing times become j = j + 1;
when the actual set vector length is not equal to the predicted vector length, the vector length set instruction processing times remain unchanged.
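The counter update rule can be written as a one-step Python function. The name `update_counter` and the returned flush flag are illustrative assumptions; the flag stands for the flush that the method performs on a misprediction.

```python
def update_counter(j: int, predicted_vl: int, actual_vl: int) -> tuple[int, bool]:
    """Return (new j, flush_needed) once a vector length setting instruction resolves."""
    if actual_vl == predicted_vl:
        return j + 1, False   # prediction correct: count this pass
    return j, True            # misprediction: j unchanged, flush younger instructions


print(update_counter(5, 4, 4))  # (6, False)
print(update_counter(5, 4, 3))  # (5, True)
```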
The invention also provides a vector length setting system.
A vector length setting system comprising:
the first acquisition module is used for acquiring vector instructions;
the decoding module is used for decoding the vector instructions in sequence to obtain a decoding result;
the setting module is used for predictively setting the length of the predicted vector and marking a first vector length setting instruction;
the address acquisition module is used for acquiring an operand address based on the decoding result;
a reading module for reading a software application vector length based on the operand address;
the second acquisition module is used for acquiring the hardware support length;
the comparison module is used for obtaining the actual set vector length based on the software application vector length and the hardware support length;
the third acquisition module is used for acquiring the length of the preset vector to be processed;
the fourth acquisition module is used for obtaining the processing times and the tail vector length based on the preset vector length to be processed and the actual set vector length, wherein the processing times and the tail vector length form a vector prediction table;
the judging module is used for judging whether the actual set vector length is equal to the predicted vector length;
the first execution module is used for obtaining the vector length setting instruction processing times based on the marked first vector length setting instruction if the actual set vector length is equal to the predicted vector length;
the processing module is used for processing the vector to be processed based on the vector length setting instruction processing times, the processing times and the tail vector length;
and the second execution module is used for flushing the vector instructions, restoring the vector length setting instruction processing times, and processing the vector to be processed based on the actual set vector length if the actual set vector length is not equal to the predicted vector length.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the vector processing method when executing the program.
The invention also provides a computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to execute a vector processing method.
The beneficial effects of the invention are as follows:
When the vector to be processed is processed according to the vector instruction, the predicted vector length is set in the decoding stage, so that instructions can keep executing without stalling. The first vector length setting instruction is marked on the first pass to obtain the vector length setting instruction processing times. On every pass the predicted vector length is compared with the actual set vector length: if they are equal, the pass completes and the vector length setting instruction processing times are increased by 1; if not, the instructions must be flushed and the vector to be processed is processed again according to the tail vector length. Because the vector length of each pass is set in the decoding stage, multiple VL copies are not needed, which improves processing efficiency.
Drawings
FIG. 1 is a flow chart of a vector processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of obtaining an actual set vector length based on a software application vector length and a hardware support length according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of obtaining the processing times and the tail vector length based on the preset vector length to be processed and the actual set vector length according to an embodiment of the present application;
FIG. 4 is a flow chart of processing the vector to be processed based on the vector length setting instruction processing times, the processing times and the tail vector length according to an embodiment of the present application;
FIG. 5 is a flow chart of obtaining the vector length setting instruction processing times according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of the steps performed after the vector to be processed has been processed based on the vector length setting instruction processing times, the processing times and the tail vector length according to an embodiment of the present application;
FIG. 7 is a flow chart of calculating the vector length setting instruction processing times based on a software loop flag according to an embodiment of the present application;
FIG. 8 is a hardware processing flow chart of a vector processing method according to an embodiment of the present application.
Reference numerals illustrate:
1. an instruction acquisition module; 2. a decoding module; 3. a renaming module; 4. a reading module; 5. a control module; 6. a refresh module; 7. a vector length prediction module.
Detailed Description
A vector processing method, as in fig. 1, comprising:
S100, acquiring a vector instruction.
In particular, vector instructions are a type of computer instruction that performs parallel computation on vector data. Compared with the traditional scalar instruction, the vector instruction can simultaneously execute the same operation on a plurality of data elements, thereby improving the calculation efficiency.
S101, decoding vector instructions in sequence to obtain decoding results.
S102, predictively setting a predicted vector length and marking a first vector length setting instruction.
S103, acquiring an operand address based on the decoding result.
Specifically, the vector instructions are decoded in program order. Sequential decoding parses the opcode and operand fields of each vector instruction into executable control signals in the order in which the vectors are executed.
The predicted vector length allows the vector instructions to keep running without restarting decoding when the result is returned, which improves processing speed. The initial value of the predicted vector length is generally the hardware support length, and it is changed to the actual vector length after a misprediction occurs during processing. The first vector length setting instruction is marked in order to count how many times the preset vector to be processed has been processed, i.e., the vector length setting instruction processing times: the mark is set on the first pass, the count is then increased by 1 on each subsequent pass, and it can be recorded by a counter.
The decoder determines the specific address of the operand, such as a memory address or a register address, based on the operand field in the instruction.
S104, reading the length of the software application vector based on the operand address.
Specifically, the corresponding register is located according to the operand address. This register stores the software application vector length, which is the per-pass vector length set by software at the start of vector processing.
S105, acquiring the hardware support length.
Specifically, the hardware support length is the maximum vector length that the hardware can actually handle.
S106, obtaining the actual set vector length based on the software application vector length and the hardware support length.
Specifically, the software application vector length is transferred to the hardware, but the hardware may not necessarily support the software application vector length in the actual use process, so the software application vector length needs to be adjusted according to the hardware support length to be used as the actual set vector length.
S107, acquiring the length of a preset vector to be processed.
S108, obtaining the processing times and the tail vector length based on the preset vector length to be processed and the actual set vector length, wherein the processing times and the tail vector length form a vector prediction table.
Specifically, the preset vector length to be processed is the total length of the vector to be processed; when it is long, the vector has to be processed in several passes. The vector prediction table holds two pieces of information: the processing times, i.e., the number of passes needed to process the preset vector at the actual set vector length, and the tail vector length, i.e., the length processed in the last pass.
S109, judging whether the actual set vector length is equal to the predicted vector length.
S110, if the actual set vector length is equal to the predicted vector length, obtaining the vector length setting instruction processing times based on the first vector length setting instruction.
Specifically, the first vector length setting instruction is marked as recording the start of processing the vector to be processed for the first time, and then the number of times of processing the vector length setting instruction is increased by 1 each time the vector to be processed is successfully processed.
S111, processing the vector to be processed based on the vector length setting instruction processing times, the processing times and the tail vector length.
Specifically, the predicted vector length differs depending on which pass is being processed, and the last pass is processed according to the tail vector length.
S112, if the actual set vector length is not equal to the predicted vector length, flushing the vector instructions, restoring the vector length setting instruction processing times, and processing the vector to be processed based on the actual set vector length.
Specifically, each time the vector to be processed is processed with the predicted vector length, the predicted length is compared with the actual set vector length. The pass succeeds only when the two are equal, and the counter is then increased by 1. When the actual set vector length is not equal to the predicted vector length, the vector instructions must be flushed; the counter is restored to the value it held after the last committed pass, and the vector to be processed is processed according to the tail vector length. For example, if the vector to be processed has been processed 5 times and an error occurs on the 6th pass, the instructions are flushed, the counter remains at 5, the pass is processed according to the tail vector length, and the counter becomes 6 once that pass completes.
After each pass, the remaining length decreases. For example, if the vector length is 23 and each pass processes 4, the remaining length is 19 after the first pass and 15 after the second, and so on until the last pass, when the remaining length is 3 while the predicted vector length is still 4. A misprediction occurs at this point, and the remaining elements are processed according to the tail vector length 3.
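The example can be replayed with a small Python simulation. It is a sketch under stated assumptions, not the patented hardware: each pass requests the remaining element count, the hardware grants min(remaining, hardware support length), and the predictor either always predicts the hardware support length (its initial value, as in the example) or, with `use_table=True`, predicts the tail vector length on the last pass. The function name `simulate` is hypothetical.

```python
def simulate(total_len: int, hw_len: int,
             use_table: bool) -> tuple[list[int], list[int]]:
    """Return (vl committed on each pass, 1-based passes that needed a flush)."""
    processing_times = -(-total_len // hw_len)   # ceil(total_len / hw_len)
    tail_len = total_len % hw_len or hw_len      # tail vector length
    committed, flushed = [], []
    j = 0                 # vector length setting instruction processing times
    remaining = total_len
    while remaining > 0:
        actual = min(remaining, hw_len)          # vl the hardware actually sets
        if use_table and j + 1 == processing_times:
            predicted = tail_len                 # last pass: use the table
        else:
            predicted = hw_len                   # initial / non-last prediction
        if predicted != actual:
            flushed.append(j + 1)                # flush; j unchanged, replay with actual vl
        committed.append(actual)                 # the (replayed) pass completes
        j += 1
        remaining -= actual
    return committed, flushed


print(simulate(23, 4, use_table=False))  # ([4, 4, 4, 4, 4, 3], [6])
print(simulate(23, 4, use_table=True))   # ([4, 4, 4, 4, 4, 3], [])
```

With the prediction table in place, the final pass is predicted correctly and the flush from the example disappears.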
In existing approaches, every time software sets vl with a vsetvl instruction through scalar register operands, the execution of subsequent instructions becomes dependent on the scalar register rs1 used to set vl. In such implementations of the vector extension, an out-of-order processor may virtualize or rename vl to eliminate the data dependence, and instructions that depend on vl wait for the vl value to become ready through the out-of-order scheduling algorithm, so that the processor can decode and dispatch subsequent operations before the operand used to set vl is ready. Eliminating the vl dependence in this way means that multiple sets of vl values are actually stored in the processor.
The implementation principle of the embodiment is as follows: when the vector to be processed is processed according to the vector instruction, the vector length is predicted in the decoding stage, so that instructions can keep executing without stalling. The first vector length setting instruction is marked on the first pass to obtain the vector length setting instruction processing times. On every pass the predicted vector length is compared with the actual set vector length: if they are equal, the pass completes and the vector length setting instruction processing times are increased by 1; if not, the instructions must be flushed and the vector to be processed is processed again according to the tail vector length. Because the vector length of each pass is set in the decoding stage, multiple VL copies are not needed, which improves processing efficiency.
In one implementation manner of the present embodiment, as shown in fig. 2, step S106, that is, obtaining the actual set vector length based on the software application vector length and the hardware support length, includes:
S200, judging whether the software application vector length is greater than the hardware support length.
S210, if the software application vector length is greater than the hardware support length, changing the software application vector length into the hardware support length as the application vector length.
S220, if the software application vector length is smaller than or equal to the hardware support length, using the software application vector length as the application vector length.
Specifically, when the software application vector length is greater than the hardware support length, step S210 is executed: because the hardware cannot support the vector length set by the software, it returns the value it can support to the software through the destination register, and the software updates its request to the hardware support length so that the hardware can execute the commands subsequently issued by the software.
When the software application vector length is less than or equal to the hardware support length, step S220 is executed: the software application vector length is passed directly to the hardware, and the hardware uses it as the length of each processed vector.
The implementation principle of the embodiment is as follows: the application vector length is set according to the relation between the length the hardware actually supports and the software application vector length, so that the hardware can execute the commands issued by the software.
In one implementation manner of the present embodiment, as shown in fig. 3, step S108 includes, based on a preset vector length to be processed and an actual set vector length, obtaining the processing times and the tail vector length:
S300, calculating the dividing times and the residual length based on the application vector length and the preset vector length to be processed.
S310, the dividing times are equal to the integer part of the preset vector length to be processed divided by the application vector length, plus one, and the dividing times are taken as the processing times.
S320, the residual length is equal to the length of the preset vector to be processed divided by the length of the application vector to obtain a remainder, and the residual length is used as the length of the tail vector.
Specifically, because the hardware cannot process the whole vector to be processed at once, the vector has to be divided; the dividing times are the number of passes required to process it, and the tail vector length is the vector length used in the last pass. For example, if the preset vector length to be processed is 27 bytes and the application vector length is 5 bytes, then the dividing times = int(27/5) + 1 = 5 + 1 = 6, and the tail vector length = 27 mod 5 = 2, the remainder of 27 divided by 5. Thus, to process a 27-byte vector 5 bytes at a time, five passes of 5 bytes are needed, followed by a sixth and final pass of 2 bytes.
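In symbols, writing L for the preset vector length to be processed, V for the application vector length, N for the dividing times and T for the tail vector length, the example works out as below. The "+1" form assumes L is not an exact multiple of V; for L = 24 and V = 4 it would give 7 passes with an empty tail, whereas 6 full passes suffice.

```latex
N = \left\lfloor \frac{L}{V} \right\rfloor + 1
  = \left\lfloor \frac{27}{5} \right\rfloor + 1 = 5 + 1 = 6,
\qquad
T = L \bmod V = 27 \bmod 5 = 2,
\qquad
(N - 1)\,V + T = 5 \cdot 5 + 2 = 27.
```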
Specifically, the vector prediction table includes a tail vector length and the number of processing times, and the vector prediction table can know when the processing of the preset vector to be processed is completed and what the vector length is processed in the last processing.
In one implementation manner of the present embodiment, as shown in fig. 4, step S111, that is, processing the vector to be processed based on the vector length setting instruction processing times, the processing times and the tail vector length, includes:
S400, acquiring the vector length setting instruction processing times.
Specifically, the vector length setting instruction processing times are the number of times the vector to be processed has already been processed.
S410, judging whether the processing times of the vector length setting instruction are smaller than the processing times.
Specifically, the vector length setting instruction processing times are compared with the processing times to determine whether the vector to be processed has entered its last pass, because a different vector length is set for the last pass.
S420, if the vector length setting instruction processing times are smaller than the processing times, setting the predicted vector length based on the hardware support length and processing the vector to be processed.
S430, if the vector length setting instruction processing times are equal to the processing times, setting the predicted vector length as the tail vector length to process the vector to be processed.
Specifically, if the number of times of vector length setting instruction processing is equal to the number of times of processing, it indicates that the vector to be processed is processed to enter the last time of processing, step S430 is executed, and if the number of times of vector length setting instruction processing is smaller than the number of times of processing, it indicates that the vector to be processed is not processed to enter the last time of processing, step S420 is executed.
In a specific embodiment, the tail elements of the software loop semantically correspond to the loop exit on the scalar processor, although the loop exit itself may not be included. One implementation therefore adds no separate logic and reuses the loop predictor to implement the functions above.
When vset is executed for the first time, the acquired vector execution parameters are stored. They include the loop count, which allows the branch predictor to predict values accurately, and the vector length of the tail elements, which is set in the last iteration. Because the loop predictor is responsible for identifying the last vector iteration, the number of the rs register that holds the AVL must still be recorded so that the vector branch instruction can be associated with the loop.
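The stored parameters can be pictured as a small table keyed by the AVL source register number. The following is a hypothetical software model, not the hardware structure described in the patent. The class and field names are illustrative, and clearing the counter on the last iteration follows the overall flow described for Fig. 8.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VsetPredictionEntry:
    avl_rs: int        # number of the rs register holding the AVL
    loop_count: int    # vector loop iterations (processing count)
    tail_vl: int       # vector length used for the tail elements
    counter: int = 0   # vset executions counted in the current loop


class VsetPredictionTable:
    """Hypothetical model of the stored vset parameters, keyed by AVL register."""

    def __init__(self) -> None:
        self._entries: Dict[int, VsetPredictionEntry] = {}

    def train(self, avl_rs: int, loop_count: int, tail_vl: int) -> None:
        # Record the parameters acquired when vset executes for the first time
        self._entries[avl_rs] = VsetPredictionEntry(avl_rs, loop_count, tail_vl)

    def lookup(self, avl_rs: int) -> Optional[VsetPredictionEntry]:
        return self._entries.get(avl_rs)

    def on_vset(self, avl_rs: int) -> bool:
        """Count one vset of this loop; return True if it is the last iteration."""
        entry = self._entries[avl_rs]
        entry.counter += 1
        if entry.counter == entry.loop_count:
            entry.counter = 0  # clear so the next loop starts from zero
            return True
        return False


table = VsetPredictionTable()
table.train(avl_rs=10, loop_count=3, tail_vl=2)
print([table.on_vset(10) for _ in range(3)])  # [False, False, True]
```

Keying by the AVL register number mirrors the requirement above that the register number be recorded so the loop's vset and branch instructions can be associated.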
In one implementation of this embodiment, as shown in Fig. 5, step S400 acquires the vector length setting instruction processing count. This step includes:
S500: based on the vector instruction, apply a software loop mark to the first vector length setting instruction.
Specifically, the software loop mark confirms that processing of the vector to be processed has started. The mark is applied when the vector is processed for the first time, and through it the vector length setting instruction processing count can be tracked against the length of the vector to be processed. At the mark, the current processing count can be regarded as 0.
S510: calculate the vector length setting instruction processing count based on the software loop mark.
Specifically, after marking, the vector length setting instruction processing count is incremented by 1 each time the vector is processed.
In one implementation of this embodiment, as shown in Fig. 6, step S112 runs after the vector length setting instruction processing count has been refreshed and the vector to be processed has been processed based on the tail vector length. This step includes:
S600: clear the software loop mark.
Specifically, once a vector to be processed has been fully processed, the software loop mark is cleared. This resets the current processing count to zero so that the next vector can be processed.
S610: acquire the length of the next preset vector to be processed based on the decoding result.
Specifically, processing rarely involves only one vector. The length of the next preset vector to be processed is the length of the vector that will be processed next.
S620: determine whether the length of the next preset vector to be processed is the same as the length of the previous preset vector to be processed.
S630: if the two lengths are the same, process the vector to be processed based on the application vector length, the processing count, and the tail vector length.
S640: if the two lengths differ, re-acquire the processing count and the tail vector length based on the application vector length and the length of the next preset vector to be processed.
Specifically, when the two lengths are the same, the processing count and the tail vector length are also the same. The settings used for the previous vector are therefore reused directly, and step S630 is executed. When the lengths differ, the processing count and the tail vector length must be re-acquired from the application vector length and the length of the next preset vector to be processed, and step S640 is executed. Reusing the previous settings when the lengths match saves processing time.
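Steps S620 to S640 reuse the previous split whenever the next length repeats. This sketch assumes the split rounds the division up and uses a full-length tail when the remainder is 0, which is how the first vset execution is described. The class and method names are hypothetical.

```python
from typing import Optional, Tuple


def split_length(total_len: int, vl: int) -> Tuple[int, int]:
    """Return (processing count, tail vector length) for total_len elements."""
    quotient, remainder = divmod(total_len, vl)
    count = quotient + (1 if remainder else 0)  # round the division up
    tail = remainder if remainder else vl       # full-length tail if remainder is 0
    return count, tail


class LoopParameterCache:
    """Hypothetical helper that keeps the previous vector's settings (S620 to S640)."""

    def __init__(self, vl: int) -> None:
        self.vl = vl                                   # application vector length
        self.last_len: Optional[int] = None
        self.params: Optional[Tuple[int, int]] = None
        self.recomputations = 0

    def parameters_for(self, next_len: int) -> Tuple[int, int]:
        if self.params is not None and next_len == self.last_len:
            return self.params                         # S630: same length, reuse
        self.params = split_length(next_len, self.vl)  # S640: re-acquire
        self.last_len = next_len
        self.recomputations += 1
        return self.params


cache = LoopParameterCache(vl=4)
print(cache.parameters_for(10), cache.parameters_for(10), cache.parameters_for(8))
# (3, 2) (3, 2) (2, 4)
print(cache.recomputations)  # 2
```

Only a change in length triggers a recomputation, which is where the time saving described above comes from.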
In one implementation of this embodiment, as shown in Fig. 7, step S510 calculates the vector length setting instruction processing count based on the software loop mark. This step includes:
S700: set the software loop mark i to 0 and denote the vector length setting instruction processing count as j.
S710: when the actual set vector length equals the predicted vector length, increment the count: j = j + 1.
S720: when the actual set vector length does not equal the predicted vector length, keep the vector length setting instruction processing count unchanged.
Specifically, when the actual set vector length equals the predicted vector length, each completed software loop iteration advances the count by one, j = j + 1, where 0 ≤ j ≤ k and k is the division count. When the actual set vector length does not equal the predicted vector length, the vector instruction has executed with a wrong length. That execution cannot be counted as a processing pass even though it took place. The vector instruction is flushed, and the count is incremented by 1 only after the vector has been processed again with the correct tail vector length.
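The counting rule in S700 to S720 can be modeled as a small state object. This is a hypothetical sketch: the class name is illustrative, and the mispredicted execution is represented simply as a call that returns False without touching j.

```python
class VsetCounter:
    """Hypothetical model of the software loop mark i and the count j (S700 to S720)."""

    def __init__(self, division_count: int) -> None:
        self.k = division_count  # k: division count, so 0 <= j <= k
        self.i = 0               # S700: software loop mark set to 0
        self.j = 0               # S700: vector length setting instruction processing count

    def on_vset(self, actual_vl: int, predicted_vl: int) -> bool:
        """Record one vset execution; return True if it counts as a processing pass."""
        if actual_vl != predicted_vl:
            # S720: mispredicted, so the instruction is flushed and j is unchanged
            return False
        if self.j >= self.k:
            raise RuntimeError("count would exceed the division count k")
        self.j += 1              # S710: one software loop iteration completed
        return True


counter = VsetCounter(division_count=3)
# The last pass is first mispredicted (2 vs 4), then re-executed with the right length
events = [counter.on_vset(a, p) for a, p in [(4, 4), (4, 4), (2, 4), (2, 2)]]
print(events, counter.j)  # [True, True, False, True] 3
```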
As shown in Fig. 8, the overall flow for processing vectors is as follows.
The instruction fetch module 1 fetches a vset instruction, and the decoding module 2 decodes it and looks up the vector length prediction module. If the vset instruction is not being executed for the first time, the vector loop count and the vector length processed in each iteration (including the length processed in the last iteration) are obtained from the previous training result. The flow then checks whether the vset has been marked as a loop. If it has, the counter value is updated each time the vset executes; if it has not, the counter is cleared. Whether this is the last vset execution is determined by whether the counter value equals the vector loop count. If it does, the mark is cleared so that a subsequent new loop can start directly when its vset executes. The vector length processed in each iteration is obtained through the renaming module 3, which reads the corresponding physical register and destination register addresses according to the operand addresses in the decoded instruction. The physical register value is then read through the reading module 4.
If this is not the first execution of the vset instruction, the vector is processed according to the predicted application vector length, and the vset loop count is incremented by 1 for each iteration. This continues until a misprediction occurs, that is, until the predicted application vector length differs from the vector length that actually needs to be processed. The pipeline must then be reset: the control module 5 notifies the refresh module 6 to flush the entire pipeline and re-execute the mispredicted vector instruction. The value that actually needs to be processed is written to the destination physical register, that is, to the vector length prediction module 7. The vector length prediction module 7 passes the written value to the decoding module 2, and the vector is processed again according to that value.
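The misprediction path can be illustrated with a small behavioural simulation. It is a sketch under stated assumptions, not the pipeline of Fig. 8. It assumes RVV-style strip mining, where each vset sets min(remaining elements, hardware length). The function name and its trained inputs are hypothetical.

```python
from typing import List, Tuple


def run_vset_loop(total_len: int, vlmax: int,
                  loop_count: int, tail_vl: int) -> Tuple[List[int], int]:
    """Simulate predicted vs actual vset lengths for one strip-mined loop.

    loop_count and tail_vl are the trained parameters (hypothetical inputs).
    Returns the lengths actually processed and the number of pipeline flushes.
    """
    processed: List[int] = []
    flushes = 0
    remaining = total_len
    j = 0  # passes counted so far
    while remaining > 0:
        # Full hardware length until the trained last pass, then the tail length
        predicted = tail_vl if j + 1 >= loop_count else vlmax
        actual = min(remaining, vlmax)  # length the vset actually sets
        if predicted != actual:
            # Misprediction: flush the pipeline and re-execute with the actual value
            flushes += 1
        processed.append(actual)        # the pass completes with the actual length
        remaining -= actual
        j += 1
    return processed, flushes


print(run_vset_loop(10, 4, loop_count=3, tail_vl=2))  # ([4, 4, 2], 0)
print(run_vset_loop(11, 4, loop_count=3, tail_vl=2))  # ([4, 4, 3], 1)
```

When the trained parameters match the loop, no flush occurs. A stale tail length costs one flush on the final pass, after which the pass completes with the value actually required.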
If vset is being executed for the first time, the preset vector length to be processed is divided by the application vector length, taking the remainder and the rounded-up quotient respectively. This yields the vector length containing the tail elements and the number of software loop iterations. If the remainder is 0, the tail vector length is set to the hardware-supported maximum. The resulting values are sent to the vector length prediction module 7.
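The first-execution training step is a division with remainder. This minimal sketch makes two assumptions. First, the application vector length is capped by the hardware support length. Second, as in RVV-style strip mining, the AVL seen on the first execution is the full element count. The function name is hypothetical. A zero remainder gives a full-length tail, which equals the hardware maximum whenever the preset length is at least that maximum.

```python
from typing import Tuple


def train_vset(preset_len: int, avl: int, hw_support_len: int) -> Tuple[int, int]:
    """Return (software loop count, tail vector length) for the first vset execution."""
    actual_vl = min(avl, hw_support_len)       # AVL capped by the hardware support length
    if actual_vl <= 0:
        raise ValueError("vector lengths must be positive")
    quotient, remainder = divmod(preset_len, actual_vl)
    loop_count = quotient + (1 if remainder else 0)  # division rounded up
    tail_vl = remainder if remainder else actual_vl  # full length if remainder is 0
    return loop_count, tail_vl


print(train_vset(10, 10, 4))  # (3, 2)
print(train_vset(8, 8, 4))    # (2, 4)
```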
A vector length setting system comprising:
the first acquisition module is used for acquiring vector instructions;
the decoding module is used for decoding the vector instructions in sequence to obtain a decoding result;
the setting module is used for predictively setting the length of the predicted vector and marking a first vector length setting instruction;
the address acquisition module is used for acquiring an operand address based on the decoding result;
a reading module for reading a software application vector length based on the operand address;
the second acquisition module is used for acquiring the hardware support length;
the comparison module is used for obtaining the actual set vector length based on the software application vector length and the hardware support length;
the third acquisition module is used for acquiring the length of the preset vector to be processed;
the fourth acquisition module is used for acquiring the processing count and the tail vector length based on the preset vector length to be processed and the actual set vector length, wherein the processing count and the tail vector length form a vector prediction table;
the judging module is used for judging whether the length of the actual set vector is equal to the length of the predicted vector;
the first execution module is used for obtaining the vector length setting instruction processing count based on the marked first vector length setting instruction if the actual set vector length is equal to the predicted vector length;
the processing module is used for processing the vector to be processed based on the vector length setting instruction processing count, the processing count, and the tail vector length;
and the second execution module is used for refreshing the vector length setting instruction processing count and processing the vector to be processed based on the actual set vector length if the actual set vector length is not equal to the predicted vector length.
An embodiment of the present application further discloses an electronic device comprising a memory and a processor. The memory stores a computer program executable on the processor, and the processor performs the vector processing method when it loads and executes the computer program.
The terminal device may be a computer device such as a desktop computer, a notebook computer, or a cloud server. It includes, but is not limited to, a processor and a memory, and may further include, for example, an input/output device, a network access device, and a bus.
The processor may be a central processing unit (CPU) or, depending on actual use, another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor; this application imposes no limitation in this respect.
The memory may be an internal storage unit of the terminal device, such as its hard disk or memory. It may also be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the terminal device, or a combination of internal and external storage. The memory stores the computer program and the other programs and data required by the terminal device, and may also temporarily store data that has been or will be output; this application imposes no limitation in this respect.
The vector processing method of this embodiment is stored in the memory of the terminal device and is loaded and executed on its processor, which makes the method convenient to use.
An embodiment of the present application further discloses a computer-readable storage medium storing a computer program which, when executed by a processor, performs the vector processing method of the above embodiment.
The computer program may be stored in a computer-readable medium and comprises computer program code, which may be in source code form, object code form, executable file form, some intermediate form, or the like. The computer-readable medium includes any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunication signal, or a software distribution medium, but is not limited to these.
With this computer-readable storage medium, the vector processing method of the above embodiment is stored on the medium and loaded and executed on a processor, which facilitates storage and application of the method.
The foregoing describes certain embodiments of the present invention. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to imply that the scope of the present application is limited to such examples; the technical features of the above embodiments or in the different embodiments may also be combined under the idea of the present application, the steps may be implemented in any order, and there are many other variations of the different aspects of one or more embodiments in the present application as above, which are not provided in details for the sake of brevity.
One or more embodiments herein are intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the present application. Any omissions, modifications, equivalents, improvements, and the like, which are within the spirit and principles of the one or more embodiments in the present application, are therefore intended to be included within the scope of the present application.

Claims (7)

1. A vector processing method, comprising:
acquiring a vector instruction;
sequentially decoding the vector instructions to obtain decoding results;
predictively setting a predicted vector length and marking a first vector length setting instruction;
acquiring an operand address based on the decoding result;
reading a software application vector length based on the operand address;
acquiring a hardware support length;
obtaining an actual set vector length based on the software application vector length and the hardware support length;
acquiring the length of a preset vector to be processed;
obtaining a processing count and a tail vector length based on the preset vector length to be processed and the actual set vector length, wherein the processing count and the tail vector length form a vector prediction table;
judging whether the length of the actual set vector is equal to the length of the predicted vector;
if the actual set vector length is equal to the predicted vector length, obtaining a vector length setting instruction processing count based on the first vector length setting instruction;
processing a vector to be processed based on the vector length setting instruction processing count, the processing count, and the tail vector length;
if the actual set vector length is not equal to the predicted vector length, refreshing the vector length setting instruction processing count and processing the vector to be processed based on the actual set vector length;
the obtaining the actual set vector length based on the software application vector length and the hardware support length includes:
judging whether the length of the software application vector is greater than the hardware support length;
if the software application vector length is greater than the hardware support length, taking the hardware support length as the actual set vector length;
if the software application vector length is smaller than or equal to the hardware support length, taking the software application vector length as the actual set vector length;
the obtaining the processing count and the tail vector length based on the preset vector length to be processed and the actual set vector length includes:
calculating a division count and a remainder length based on the actual set vector length and the preset vector length to be processed;
the division count is equal to the preset vector length to be processed divided by the actual set vector length, plus one, and the division count is taken as the processing count;
the remainder length is equal to the remainder of the preset vector length to be processed divided by the actual set vector length, and the remainder length is taken as the tail vector length;
the processing the vector to be processed based on the vector length setting instruction processing count, the processing count, and the tail vector length includes:
acquiring the vector length setting instruction processing count;
determining whether the vector length setting instruction processing count is smaller than the processing count;
if the vector length setting instruction processing count is smaller than the processing count, setting the predicted vector length based on the hardware support length and processing the vector to be processed;
and if the vector length setting instruction processing count is equal to the processing count, setting the predicted vector length to the tail vector length and processing the vector to be processed.
2. The vector processing method of claim 1, wherein acquiring the vector length setting instruction processing count comprises:
applying, based on the vector instruction, a software loop mark to the first vector length setting instruction;
and calculating the vector length setting instruction processing count based on the software loop mark.
3. The vector processing method according to claim 2, wherein, after the vector to be processed is processed based on the vector length setting instruction processing count, the processing count, and the tail vector length, the method comprises:
clearing the software loop mark;
based on the decoding result, acquiring the length of the next preset vector to be processed;
determining whether the length of the next preset vector to be processed is the same as the length of the previous preset vector to be processed;
if the length of the next preset vector to be processed is the same as the length of the previous preset vector to be processed, processing the vector to be processed based on the application vector length, the processing count, and the tail vector length;
and if the length of the next preset vector to be processed is different from the length of the previous preset vector to be processed, re-acquiring the processing count and the tail vector length based on the application vector length and the length of the next preset vector to be processed.
4. The vector processing method according to claim 2, wherein calculating the vector length setting instruction processing count based on the software loop mark comprises:
setting the software loop mark i to 0, and denoting the vector length setting instruction processing count as j;
when the actual set vector length is equal to the predicted vector length, incrementing the vector length setting instruction processing count: j = j + 1;
when the actual set vector length is not equal to the predicted vector length, keeping the vector length setting instruction processing count unchanged.
5. A vector processing system, comprising:
the first acquisition module is used for acquiring vector instructions;
the decoding module is used for decoding the vector instructions in sequence to obtain a decoding result;
the setting module is used for predictively setting the length of the predicted vector and marking a first vector length setting instruction;
the address acquisition module is used for acquiring an operand address based on the decoding result;
a reading module for reading a software application vector length based on the operand address;
the second acquisition module is used for acquiring the hardware support length;
the comparison module is used for obtaining the actual set vector length based on the software application vector length and the hardware support length;
the third acquisition module is used for acquiring the length of the preset vector to be processed;
the fourth acquisition module is used for acquiring the processing count and the tail vector length based on the preset vector length to be processed and the actual set vector length, wherein the processing count and the tail vector length form a vector prediction table;
the judging module is used for judging whether the length of the actual set vector is equal to the length of the predicted vector;
the first execution module is used for obtaining the vector length setting instruction processing count based on the marked first vector length setting instruction if the actual set vector length is equal to the predicted vector length;
the processing module is used for processing the vector to be processed based on the vector length setting instruction processing count, the processing count, and the tail vector length;
and the second execution module is used for refreshing the vector length setting instruction processing count and processing the vector to be processed based on the actual set vector length if the actual set vector length is not equal to the predicted vector length.
6. A terminal device comprising a memory and a processor, characterized in that the memory stores a computer program executable on the processor, and the processor, when loading and executing the computer program, performs the method according to any one of claims 1-4.
7. A computer-readable storage medium storing a computer program, characterized in that the method according to any one of claims 1-4 is performed when the computer program is loaded and executed by a processor.
CN202311356528.8A 2023-10-19 2023-10-19 Vector processing method, system, equipment and storage medium Active CN117093268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311356528.8A CN117093268B (en) 2023-10-19 2023-10-19 Vector processing method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117093268A CN117093268A (en) 2023-11-21
CN117093268B true CN117093268B (en) 2024-01-30

Family

ID=88777286

Country Status (1)

Country Link
CN (1) CN117093268B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279327A (en) * 2013-04-28 2013-09-04 中国人民解放军信息工程大学 Automatic vectorizing method for heterogeneous SIMD expansion components
JP2017117064A (en) * 2015-12-22 2017-06-29 日本電気株式会社 Image processing unit, image processing method, and program
CN107408102A (en) * 2015-02-02 2017-11-28 优创半导体科技有限公司 Vector processor configured to operate on variable-length vectors using digital signal processing instructions
CN108885550A (en) * 2016-04-01 2018-11-23 Arm有限公司 Complex multiplication instruction
CN109062605A (en) * 2018-07-10 2018-12-21 北京智涵芯宇科技有限公司 Method and device for sliding-window vector processing
CN109074256A (en) * 2016-04-26 2018-12-21 Arm有限公司 Apparatus and method for managing address conflicts when performing vector operations
CN111742296A (en) * 2018-02-28 2020-10-02 Arm有限公司 Data processing
CN112667289A (en) * 2020-12-21 2021-04-16 苏州浪潮智能科技有限公司 CNN inference acceleration system, acceleration method and medium
CN113590197A (en) * 2021-07-30 2021-11-02 中国人民解放军国防科技大学 Configurable processor supporting variable-length vector processing and implementation method thereof
CN113721985A (en) * 2021-11-02 2021-11-30 超验信息科技(长沙)有限公司 RISC-V vector register grouping setting method, device and electronic equipment
US11477161B1 (en) * 2021-10-29 2022-10-18 Splunk Inc. Systems and methods for detecting DNS communications through time-to-live analyses
CN115878188A (en) * 2023-02-20 2023-03-31 湖南大学 High-performance realization method of pooling layer function based on SVE instruction set

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9015656B2 (en) * 2013-02-28 2015-04-21 Cray Inc. Mapping vector representations onto a predicated scalar multi-threaded system
US11281726B2 (en) * 2017-12-01 2022-03-22 Palantir Technologies Inc. System and methods for faster processor comparisons of visual graph features

Non-Patent Citations (2)

Title
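一种基于汇编代码的单重循环向量化方法 (A single-loop vectorization method based on assembly code); 陆洪毅, 戴葵, 王志英; 计算机科学 (Computer Science), No. 04; pp. 116-124 *
面向SDR应用的向量存储器的设计与优化 (Design and optimization of vector memory for SDR applications); 陈海燕, 刘胜, 刘仲, 陈书明; 国防科技大学学报 (Journal of National University of Defense Technology), No. 03; pp. 98-102 *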

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant