CN113867682B

CN113867682B - Coprocessor for realizing floating-point number out-of-order conversion

Info

Publication number: CN113867682B
Application number: CN202111473279.1A
Authority: CN
Inventors: 欧艳凤; 陈钦树; 朱伏生; 朱晓明
Original assignee: Guangdong Communications and Networks Institute
Current assignee: Guangdong Communications and Networks Institute
Priority date: 2021-12-06
Filing date: 2021-12-06
Publication date: 2022-02-22
Anticipated expiration: 2041-12-06
Also published as: CN113867682A

Abstract

The invention discloses a coprocessor for realizing out-of-order conversion of floating point numbers. The coprocessor comprises: an input decoding module, which is used for acquiring floating point number operation instructions, decoding them to generate floating point number information and an instruction token that marks the currently executing instruction, matching the corresponding functional sub-module according to the floating point number information, and transmitting the instruction token to the downstream instruction scheduling module; a plurality of functional sub-modules, which are used for performing floating point number conversion operations out of order according to the floating point number information to generate floating point number conversion results; an instruction scheduling module, which is used for storing the thread order of the instruction tokens sent by the input decoding module; and an output decoding module, which is used for outputting the floating point number conversion results in the instruction token thread order. The floating point conversion processing speed can therefore be increased, and the problem of unsynchronized data transfer between the coprocessor and the general-purpose processor caused by out-of-order execution of the coprocessor can be solved.

Description

Coprocessor for realizing floating-point number out-of-order conversion

Technical Field

The invention relates to the technical field of integrated circuit design, in particular to a coprocessor for realizing floating-point number out-of-order conversion.

Background

Data conversion is an indispensable step in the floating point arithmetic unit of a processor. Typical data conversion instructions include conversion between data precisions, conversion between integers and floating point numbers, conversion between floating point numbers and fixed point numbers, and the like. Data conversion may be implemented in software or in hardware.

To support data conversion in hardware, the RISC-V instruction set (an open instruction set architecture (ISA) based on reduced instruction set computing (RISC) principles, where "V" denotes the fifth generation of RISC) provides dedicated floating point instruction set extensions. However, the floating point conversion processing speed in the RISC-V instruction set is not fast enough, and out-of-order execution by a coprocessor causes data transfer between the coprocessor and the general-purpose processor to become unsynchronized.

Disclosure of Invention

The technical problem to be solved by the present invention is to provide a coprocessor for implementing floating-point number out-of-order conversion, with which integer, single-precision and double-precision data operands can be processed out of order and committed in order, which helps improve the floating point conversion processing speed.

To solve the above technical problem, a first aspect of the present invention discloses a coprocessor for implementing floating-point out-of-order conversion, the coprocessor comprising: an input decoding module configured to acquire a floating-point number operation instruction, decode it to generate floating-point number information and an instruction token marking the currently executed instruction, match the corresponding functional sub-module according to the floating-point number information, and transmit the instruction token to the downstream instruction scheduling module; a plurality of functional sub-modules configured to perform floating-point number conversion out of order according to the floating-point number information to generate floating-point number conversion results; an instruction scheduling module configured to store the thread order of the instruction tokens sent by the input decoding module; and an output decoding module configured to output the floating-point number conversion results in the instruction token thread order.

In some implementations, the input decoding module includes: a judging unit configured to check the floating-point number operation instruction against the operand rule and output the instructions that conform to the rule; an analysis unit configured to parse the conforming floating-point number operation instruction and generate a function field and an operation code; a matching unit configured to match the corresponding functional sub-module according to the function field and dispatch the instruction to that sub-module; and a thread unit configured to generate an instruction token according to the floating-point number operation instruction executed by the current thread.

In some embodiments, the functional sub-modules include at least a thread for performing conversion between integer and single-precision floating point numbers, a thread for performing conversion between integer and double-precision floating point numbers, and a thread for performing conversion between single-precision and double-precision floating point numbers.

In some embodiments, the plurality of functional sub-modules operate independently of each other, and the plurality of functional sub-modules and the instruction scheduling module operate in parallel in the same pipeline.

In some implementations, the output decoding module includes: a polling module configured to poll the functional sub-modules and acquire the completion signal of the floating point number conversion result generated by each functional sub-module; and an output module configured to output the floating point number conversion results according to the instruction token thread order sent by the input decoding module and the completion signals.

In some embodiments, the output decoding module further comprises an exception handling unit configured to receive exception data generated by the functional sub-modules.

In some embodiments, the coprocessor further comprises an input temporary storage module configured to cache floating-point number operation instructions, and the input decoding module acquires floating-point number operation instructions from the input temporary storage module.

In some embodiments, the coprocessor further comprises an output temporary storage module configured to cache the floating point number conversion results output by the output decoding module, where a floating point number conversion result is a 32-bit single-precision floating point or integer result, or a 64-bit double-precision floating point result.

In some embodiments, the input temporary storage module, the instruction scheduling module, and the output temporary storage module are all FIFO modules.

According to a second aspect of the present invention, an apparatus for out-of-order conversion of floating point numbers is disclosed, the apparatus comprising: a buffer configured to store floating-point number operation instructions; and a coprocessor coupled with the buffer, the coprocessor being the coprocessor for implementing floating point out-of-order conversion described above.

Compared with the prior art, the invention has the beneficial effects that:

The invention executes floating point number conversion operations out of order across a plurality of functional sub-modules, reducing execution time, and uses the execution order recorded by the instruction scheduling module to realize an in-order commit mechanism. As a result, the floating point processing delay does not have to be set to the longest floating point processing delay, and commitment does not have to wait until the floating point numbers of all functional sub-modules have been processed. Integer, single-precision and double-precision data operands can therefore be processed out of order and committed in order, which helps improve the floating point conversion processing speed.

Drawings

FIG. 1 is a schematic structural diagram of a coprocessor for implementing out-of-order conversion of floating point numbers according to an embodiment of the present invention;

FIG. 2 is a schematic view of a processing flow of an input decoding module according to an embodiment of the present invention;

FIG. 3 is a schematic view of a processing flow of an output decoding module according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of another coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention.

Detailed Description

For better understanding and implementation, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.

The embodiment of the invention discloses a coprocessor for realizing floating point number out-of-order conversion, which executes floating point number conversion operations out of order across a plurality of functional sub-modules to reduce execution time, and uses the execution order recorded by the instruction scheduling module to realize an in-order commit mechanism, so that the floating point processing delay does not have to be set to the longest floating point processing delay and commitment does not have to wait until all functional sub-modules have finished processing. Integer, single-precision and double-precision data operands can therefore be processed out of order and committed in order, which helps improve the floating point conversion processing speed.

Example one

Referring to fig. 1, fig. 1 is a schematic structural diagram of a coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention. The coprocessor can be applied to the RISC-V instruction set, although the embodiment of the invention does not limit its application range. As shown in fig. 1, the coprocessor includes an input decoding module 1, a plurality of functional sub-modules 2, an instruction scheduling module 3 and an output decoding module 4.

The input decoding module 1 is used for acquiring a floating point number operation instruction, decoding it to generate floating point number information and an instruction token marking the currently executed instruction, matching the corresponding functional sub-module 2 according to the floating point number information, and transmitting the instruction token to the downstream instruction scheduling module 3. The input decoding module 1 comprises a judging unit 11, which is configured to check the floating-point number operation instruction against the operand rule and output the instructions that meet the rule. The acquired floating-point number operation instruction may take the form of a floating-point conversion request packet containing the operands, the opcode and the function field of the conversion operation. In this embodiment, the operand width is set to 64 bits, so that either only 32 bits are used to transfer integer or single-precision floating-point data, or all 64 bits are used to transfer double-precision floating-point data. For signed 32-bit data, the upper 32 bits are filled with the sign bit of the data. In the instruction set standard defined by RISC-V, the function field used by floating-point conversion instructions (for example, integer to single-precision floating-point, or single-precision to double-precision floating-point) is funct5, and the conversion, move and sign-injection instructions for floating-point numbers are all encoded in the OP-FP major opcode space (OP-FP being the name the RISC-V standard gives to this major opcode).

When the judging unit 11 checks the operand rule of a floating-point operation instruction, as shown in fig. 2, it checks whether the opcode of the received instruction is OP-FP (RISC-V standard). If the opcode does not match OP-FP, the instruction is judged not to be a coprocessor instruction. Instructions that are coprocessor instructions are passed to the analysis unit 12 for the next stage, which parses the floating-point operation instruction meeting the operand rule and generates the function field funct5 and the format field fmt. Together, funct5 and fmt determine the type of transaction the coprocessor handles. The parsing can follow the RISC-V standard, and illegal instructions (instructions other than floating-point conversion, move and sign-injection instructions) are not executed after decoding, because the coprocessor has no corresponding functional sub-module for them. The matching unit 13 then matches the corresponding functional sub-module according to the function field and outputs the instruction to it. At the same time, the thread unit 14 generates an instruction token according to the floating-point operation instruction executed by the current thread; the token marks the floating-point operation instruction currently being executed.
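The judging, analysis and matching steps above can be sketched in Python as follows. This is a minimal, non-authoritative model that assumes the standard RISC-V bit layout of OP-FP instructions (opcode in bits 6:0, rd in 11:7, rm in 14:12, rs1 in 19:15, rs2 in 24:20, fmt in 26:25, funct5 in 31:27) and models only the S and D formats. The sub-module labels (`int_to_fp`, `fp_to_int` and so on), the helper names `decode`, `encode_op_fp` and `extend_operand`, and the zero extension of unsigned 32-bit data are hypothetical choices for illustration, not part of the patent.

```python
from dataclasses import dataclass
from typing import Optional

OP_FP = 0b1010011  # RISC-V major opcode of floating-point operations

# funct5 values from the RISC-V F/D extensions; sub-module names are hypothetical
FUNCT5_TO_SUBMODULE = {
    0b00100: "sign_injection",  # FSGNJ / FSGNJN / FSGNJX
    0b01000: "fp_to_fp",        # FCVT.S.D / FCVT.D.S
    0b11000: "fp_to_int",       # FCVT.W[U]/L[U].fmt
    0b11010: "int_to_fp",       # FCVT.fmt.W[U]/L[U]
    0b11100: "move",            # FMV.X.W / FMV.X.D (funct3=001 here is FCLASS)
    0b11110: "move",            # FMV.W.X / FMV.D.X
}


@dataclass
class DecodedFpOp:
    funct5: int
    fmt: int  # 0 = single precision (S), 1 = double precision (D)
    rs2: int
    rs1: int
    rm: int
    rd: int
    submodule: str


def encode_op_fp(funct5: int, fmt: int, rs2: int, rs1: int, rm: int, rd: int) -> int:
    """Assemble an OP-FP instruction word from its fields (handy for testing)."""
    return ((funct5 << 27) | (fmt << 25) | (rs2 << 20) | (rs1 << 15)
            | (rm << 12) | (rd << 7) | OP_FP)


def decode(word: int) -> Optional[DecodedFpOp]:
    """Judge, parse and match one instruction; None means 'not handled here'."""
    if (word & 0x7F) != OP_FP:          # judging unit: opcode is not OP-FP
        return None
    funct5 = (word >> 27) & 0x1F        # analysis unit: extract the fields
    fmt = (word >> 25) & 0x3
    rs2 = (word >> 20) & 0x1F
    rs1 = (word >> 15) & 0x1F
    rm = (word >> 12) & 0x7
    rd = (word >> 7) & 0x1F
    if fmt not in (0, 1):               # only S and D are modeled
        return None
    submodule = FUNCT5_TO_SUBMODULE.get(funct5)  # matching unit
    if submodule is None or (funct5 == 0b11100 and rm != 0):
        return None                     # e.g. FADD.S or FCLASS: no sub-module
    return DecodedFpOp(funct5, fmt, rs2, rs1, rm, rd, submodule)


def extend_operand(value32: int, signed: bool) -> int:
    """Place 32-bit data in a 64-bit operand, sign-extending signed data."""
    value32 &= 0xFFFFFFFF
    if signed and (value32 & 0x80000000):
        return value32 | 0xFFFFFFFF_00000000  # upper 32 bits copy the sign bit
    return value32                      # unsigned: zero-extended (assumption)
```

For example, `decode(encode_op_fp(0b11010, 0, 0, 10, 0b111, 0))` (an FCVT.S.W word with the dynamic rounding mode) is routed to `int_to_fp`, while an integer ADDI word such as `0x00000013` is rejected at the opcode check.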

The plurality of functional sub-modules 2 are used for executing floating point number conversion operations out of order according to the floating point number information to generate floating point number conversion results. The functional sub-modules comprise a number of conversion modules, two sign-injection modules and a move module. The conversion modules comprise a thread for executing conversion between integer and single-precision floating point numbers, a thread for executing conversion between integer and double-precision floating point numbers, and a thread for executing conversion between single-precision and double-precision floating point numbers; they may also comprise other necessary conversion modules, and the number of conversion modules may be increased as needed. In a specific implementation, each functional sub-module may generate its operation result and exceptions in accordance with the IEEE 754 standard. The functional sub-modules run independently without interfering with each other's execution, several of them can be executing at the same time, and the functional sub-modules and the instruction scheduling module run in parallel in the same pipeline.
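The behaviour of three of the conversion sub-modules can be sketched as follows. The sketch assumes round-to-nearest-even for the narrowing conversions, round-toward-zero (one of the RISC-V rounding modes) with RISC-V-style saturation for float-to-integer conversion, and uses the bit positions of the RISC-V `fflags` register as the "exception data". The function names are hypothetical, and underflow and signaling-NaN handling are not modeled, so this is an illustration rather than the patent's circuit.

```python
import math
import struct

# Exception flag bits, laid out like the RISC-V fflags register
NV, DZ, OF, UF, NX = 0x10, 0x08, 0x04, 0x02, 0x01
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def to_f32(x: float) -> float:
    """Round a binary64 value to binary32 (raises OverflowError if it rounds to inf)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fcvt_s_w(value: int) -> tuple[float, int]:
    """int32 -> single precision, round-to-nearest-even; NX if rounding occurred."""
    result = to_f32(float(value))  # float(value) is exact for any int32
    return result, (NX if int(result) != value else 0)


def fcvt_w_s_rtz(x: float) -> tuple[int, int]:
    """Single precision -> int32, round toward zero, saturating like RISC-V."""
    if math.isnan(x):
        return INT32_MAX, NV
    if math.isinf(x):
        return (INT32_MAX if x > 0 else INT32_MIN), NV
    t = math.trunc(x)
    if t > INT32_MAX:
        return INT32_MAX, NV
    if t < INT32_MIN:
        return INT32_MIN, NV
    return t, (NX if t != x else 0)


def fcvt_s_d(x: float) -> tuple[float, int]:
    """Double -> single precision, round-to-nearest-even (underflow not modeled)."""
    try:
        result = to_f32(x)
    except OverflowError:  # the value rounds past the largest binary32
        return math.copysign(math.inf, x), OF | NX
    inexact = not math.isnan(x) and result != x
    return result, (NX if inexact else 0)
```

Because each function depends only on its own input, the sub-modules are naturally independent, which is what lets them run concurrently as the paragraph above describes. For instance, `fcvt_s_w(16777217)` returns `16777216.0` with the NX flag set, since 2^24 + 1 is not representable in single precision.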

The instruction scheduling module 3 is used for storing the instruction token thread order sent by the input decoding module 1. In a specific application, the instruction scheduling module may be implemented as a FIFO (first-in, first-out) memory, which enforces the instruction scheduling order by storing the tokens of the instructions currently being executed in the pipeline. The depth of the instruction scheduling FIFO may be set to the maximum number of instructions that can be in the coprocessor pipeline, which is 5 in this embodiment. Because the input decoding module sends tokens in instruction order, the FIFO keeps the instruction scheduling in order. This ensures that instructions are ultimately committed in order rather than out of order (by the nature of a FIFO, the first token stored is necessarily the first one popped).
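The thread unit and the token FIFO of the instruction scheduling module can be modeled as a bounded queue of depth 5. In this sketch, which is an assumption rather than the patent's implementation, a token is a monotonically increasing integer paired with the name of the sub-module the instruction was dispatched to (so that the output side knows which completion signal to poll), and `issue` refuses new tokens while the FIFO is full. The class and method names are hypothetical.

```python
from collections import deque
from itertools import count


class InstructionScheduler:
    """Token FIFO of bounded depth; tokens enter in program order."""

    def __init__(self, depth: int = 5):
        self.depth = depth
        self.fifo = deque()
        self._tokens = count()  # thread unit: increasing token values

    def can_issue(self) -> bool:
        """True while another instruction may enter the pipeline."""
        return len(self.fifo) < self.depth

    def issue(self, submodule: str) -> int:
        """Generate a token for a newly dispatched instruction and enqueue it."""
        if not self.can_issue():
            raise RuntimeError("scheduling FIFO full; input decoder must stall")
        token = next(self._tokens)
        self.fifo.append((token, submodule))
        return token

    def head(self):
        """Oldest in-flight instruction as (token, submodule), or None if empty."""
        return self.fifo[0] if self.fifo else None

    def pop_head(self):
        """Remove and return the oldest entry (first in, first out)."""
        return self.fifo.popleft()
```

A depth equal to the number of instructions the pipeline can hold means the FIFO can never overflow in normal operation; raising an error here simply makes a violated assumption visible.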

The output decoding module 4 is used for outputting the floating point number conversion results in the instruction token thread order. The output decoding module 4 comprises: a polling module 41, configured to poll the functional sub-modules and obtain the completion signal of the floating point number conversion result generated by each functional sub-module; an output module 42, configured to output the floating-point number conversion results according to the instruction token thread order sent by the input decoding module and the completion signals; and an exception handling unit 43, configured to receive exception data generated by the functional sub-modules.

To ensure that the floating-point conversion results of the whole coprocessor are committed in order, the output decoding module 4 commits the oldest instruction in the pipeline, that is, the instruction at the top of the FIFO in the instruction scheduling module 3. Meanwhile, the polling module 41 polls the completion signals from the matched functional sub-modules. Once a functional sub-module has finished a conversion and generated a floating point number conversion result, the result is read, and the exception data in the functional sub-module is stored in the exception handling unit 43 (for example, in an output FIFO with a depth of 2). Finally, the output module 42 outputs the floating-point number conversion result according to the instruction token thread order sent by the input decoding module and the completion signal. The result may be delivered as an FPU response packet that contains only the converted data, which is a 32-bit (single-precision floating point or integer) or 64-bit (double-precision floating point) result.

For example, as shown in FIG. 3, the instruction at the top of the instruction scheduling module FIFO is instruction 1, which is executed by functional sub-module 1; the instruction after instruction 1 is instruction 2, which is executed by functional sub-module 2. Suppose functional sub-module 1 has not yet finished instruction 1, while functional sub-module 2 has already finished instruction 2. Even so, the output decoding module does not store the response of instruction 2 into the output FIFO; it waits for functional sub-module 1 to finish instruction 1, stores the response of instruction 1 into the FIFO, and only then stores the response of instruction 2. The output decoding module needs to know which instruction is to be committed next, so it first reads the head instruction of the scheduling FIFO; by the nature of a FIFO, the instruction is removed from the FIFO once it has been read. After instruction 1 is removed, instruction 2 becomes the top instruction of the instruction scheduling module FIFO.
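The in-order commit of the FIG. 3 example can be simulated as follows. The sketch peeks at the head of the scheduling FIFO and pops it once that instruction's completion signal has been seen; the description instead pops the token when it is read and holds it while waiting, which yields the same commit order. The function and parameter names are hypothetical.

```python
from collections import deque


def commit_in_order(program_order, completions):
    """Simulate in-order commit of out-of-order completions.

    program_order: tokens as pushed into the scheduling FIFO (oldest first).
    completions:   (token, result) pairs in the order sub-modules finish.
    Returns (output_fifo, trace), where trace[i] lists the tokens committed
    right after the i-th completion signal is observed.
    """
    schedule = deque(program_order)
    finished = {}  # token -> result, latched when the completion is polled
    output_fifo, trace = [], []
    for token, result in completions:
        finished[token] = result
        committed_now = []
        # Only the head of the scheduling FIFO may commit; younger results wait.
        while schedule and schedule[0] in finished:
            head = schedule.popleft()
            output_fifo.append(finished.pop(head))
            committed_now.append(head)
        trace.append(committed_now)
    return output_fifo, trace


# FIG. 3 scenario: instruction 2 finishes before instruction 1
out, trace = commit_in_order([1, 2], [(2, "r2"), (1, "r1")])
print(out, trace)  # ['r1', 'r2'] [[], [1, 2]]
```

When instruction 2 completes first, nothing is committed; when instruction 1 completes, both responses enter the output FIFO in program order.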

Therefore, the coprocessor provided by this embodiment executes floating point number conversion operations out of order across the plurality of functional sub-modules, reducing execution time, and uses the execution order recorded by the instruction scheduling module to realize an in-order commit mechanism, so that the floating point processing delay does not have to be set to the longest floating point processing delay and commitment does not have to wait until all functional sub-modules have finished processing. Integer, single-precision and double-precision data operands can therefore be processed out of order and committed in order, which helps improve the floating point conversion processing speed.

Example two

Referring to fig. 4, fig. 4 is a schematic diagram of another coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention. The coprocessor for implementing floating-point out-of-order conversion can be applied to a RISC-V instruction set, and the application range of the coprocessor is not limited by the embodiment of the invention. As shown in fig. 4, the coprocessor for implementing floating-point out-of-order conversion includes:

an input temporary storage module 5, an input decoding module 1, a plurality of functional sub-modules 2, an instruction scheduling module 3, an output decoding module 4 and an output temporary storage module 6. The input decoding module 1, the functional sub-modules 2, the instruction scheduling module 3 and the output decoding module 4 are implemented substantially as in the above embodiment and are not described again here.

The input temporary storage module 5 is a FIFO memory in which incoming floating point number operation instructions are stored, and the input decoding module 1 fetches floating point number operation instructions directly from it. The depth (that is, the capacity) of the FIFO memory can be set as required; for example, the acquired instructions are stored in a FIFO with a depth of 2, so that at most two instructions are buffered in the input temporary storage module 5, which copes with frequent floating point conversion requests. If the input temporary storage module 5 is full, the coprocessor reports a busy state to the outside and temporarily accepts no new floating point number conversion instructions.

Furthermore, after the input decoding module has decoded an instruction, the instruction can be executed directly if the corresponding functional sub-module is idle. If the functional sub-module is already occupied, the instruction is temporarily held, the input decoding module is marked as busy, and no further instructions are taken from the input temporary storage module. Once the current instruction is executed, the input decoding module is released and takes the next floating point number conversion instruction from the input temporary storage module 5 for decoding.
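The input-side backpressure described above can be sketched as a depth-2 input FIFO in front of a decoder that holds an instruction while its matched sub-module is occupied. The class name, the `(name, submodule)` instruction tuples and the explicit `finish` call standing in for a sub-module's completion are all hypothetical; the sketch only illustrates the two busy conditions (input FIFO full, and decoder stalled on an occupied sub-module).

```python
from collections import deque


class InputStage:
    """Depth-limited input FIFO feeding a decoder that stalls on busy sub-modules."""

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.buffer = deque()   # input temporary storage module
        self.held = None        # decoded instruction waiting for its sub-module
        self.occupied = set()   # functional sub-modules currently executing

    @property
    def busy(self) -> bool:
        """Busy state reported to the outside while the input FIFO is full."""
        return len(self.buffer) >= self.depth

    def accept(self, instr) -> bool:
        """Offer a new (name, submodule) instruction; False means rejected (busy)."""
        if self.busy:
            return False
        self.buffer.append(instr)
        return True

    def decode_step(self):
        """Dispatch one instruction if possible; return its name or None."""
        if self.held is None:
            if not self.buffer:
                return None
            self.held = self.buffer.popleft()
        name, submodule = self.held
        if submodule in self.occupied:
            return None         # decoder busy: no further fetches from the FIFO
        self.occupied.add(submodule)
        self.held = None
        return name

    def finish(self, submodule: str) -> None:
        """Stand-in for a sub-module's completion, which releases the decoder."""
        self.occupied.discard(submodule)
```

With two conversions bound for the same sub-module, the second is held by the decoder until `finish` is called for that sub-module, while a third instruction waits in the input FIFO without being fetched.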

The output temporary storage module 6 is a FIFO memory and is configured to cache the floating-point number conversion result output by the output decoding module, where the floating-point number conversion result includes a 32-bit single-precision floating-point or integer result, or a 64-bit double-precision floating-point result.

Therefore, the coprocessor provided by this embodiment executes floating point number conversion operations out of order across the plurality of functional sub-modules, reducing execution time, and uses the execution order recorded by the instruction scheduling module to realize an in-order commit mechanism, so that the floating point processing delay does not have to be set to the longest floating point processing delay and commitment does not have to wait until all functional sub-modules have finished processing. In addition, the input temporary storage module and the output temporary storage module regulate instruction intake and result output, further improving the processing efficiency of the whole coprocessor. Integer, single-precision and double-precision data operands can therefore be processed out of order and committed in order, which helps improve the floating point conversion processing speed.

EXAMPLE III

Referring to fig. 5, fig. 5 is a schematic structural diagram of a coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention. The coprocessor described in fig. 5 may be applied to the RISC-V instruction set, although the embodiment of the present invention does not limit the instruction set to which it is applied. As shown in fig. 5, the coprocessor may include:

a memory 601 in which executable program code is stored;

a processor 602 coupled to a memory 601;

the processor 602 calls the executable program code stored in the memory 601 to perform the functions of the coprocessor for implementing floating point out-of-order conversion described in Example one.

Example four

The embodiment of the invention discloses a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the functions of the coprocessor for realizing floating-point out-of-order conversion described in Example one.

EXAMPLE five

An embodiment of the invention discloses a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the functions of the coprocessor for implementing floating point out-of-order conversion described in Example one or Example two.

The above-described embodiments are only illustrative, and the modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above detailed description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general hardware platform, or by hardware. Based on such understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.

Finally, it should be noted that the coprocessor for implementing floating-point out-of-order conversion disclosed in the embodiments of the present invention is only a preferred embodiment, intended to illustrate rather than limit the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A coprocessor for implementing floating point out-of-order conversion, the coprocessor comprising:

an input decoding module, configured to acquire a floating-point number operation instruction, decode the floating-point number operation instruction to generate floating-point number information and an instruction token marking the currently executed instruction, match the corresponding functional sub-module according to the floating-point number information, and transmit the instruction token to the downstream instruction scheduling module;

a plurality of functional sub-modules, configured to perform floating point number conversion out of order according to the floating point number information to generate floating point number conversion results;

an instruction scheduling module, configured to store the thread order of the instruction tokens sent by the input decoding module; and

an output decoding module, configured to output the floating point number conversion results in the instruction token thread order; wherein the output decoding module commits the top instruction of the FIFO in the instruction scheduling module, polls the completion signal from each matched functional sub-module, reads the floating point conversion result after a functional sub-module completes the conversion and generates the result, and outputs the floating point conversion result according to the instruction token thread order sent by the input decoding module and the completion signal.

2. The coprocessor of claim 1, wherein the input decoding module comprises: a judging unit, configured to check the floating-point number operation instruction against an operand rule and output the floating-point number operation instruction that conforms to the operand rule;

an analysis unit, configured to parse the floating-point number operation instruction that conforms to the operand rule and generate a function field and an operation code;

a matching unit, configured to match the corresponding functional sub-module according to the function field and output the instruction to the corresponding functional sub-module; and

a thread unit, configured to generate an instruction token according to the floating point number operation instruction executed by the current thread.

3. The coprocessor for implementing out-of-order conversion of floating point numbers according to claim 2, wherein the functional sub-modules comprise at least a thread for executing conversion between integer and single-precision floating point numbers, a thread for executing conversion between integer and double-precision floating point numbers, and a thread for executing conversion between single-precision and double-precision floating point numbers.

4. The coprocessor of claim 3, wherein the plurality of functional sub-modules operate independently of one another, and the plurality of functional sub-modules operate in parallel with the instruction scheduling module in the same pipeline.

5. The coprocessor of claim 1, wherein the output decoding module comprises: a polling module, configured to poll the functional sub-modules and acquire a completion signal of the floating point number conversion result generated by the functional sub-modules; and

an output module, configured to output the floating point number conversion result according to the instruction token thread order sent by the input decoding module and the completion signal.

6. The coprocessor of claim 5, wherein the output decoding module further comprises:

an exception handling unit, configured to receive exception data generated by the functional sub-modules.

7. The coprocessor for implementing floating-point out-of-order conversion according to claim 2, further comprising:

an input temporary storage module, configured to cache floating-point number operation instructions;

wherein the input decoding module acquires floating point number operation instructions from the input temporary storage module.

8. The coprocessor for implementing floating-point out-of-order conversion according to claim 5, further comprising:

an output temporary storage module, configured to cache the floating point number conversion result output by the output decoding module, wherein the floating point number conversion result is a 32-bit single-precision floating point or integer result or a 64-bit double-precision floating point result.

9. The coprocessor for implementing floating-point out-of-order conversion according to any one of claims 1 to 8, wherein

the input temporary storage module, the instruction scheduling module and the output temporary storage module are FIFO modules.

10. An apparatus for out-of-order conversion of floating point numbers, the apparatus comprising:

a buffer, configured to store floating-point number operation instructions; and

a coprocessor coupled with the buffer;

wherein the coprocessor is the coprocessor for implementing floating point out-of-order conversion as claimed in any one of claims 1 to 9.