CN113867682B - Coprocessor for realizing floating-point number out-of-order conversion - Google Patents
- Publication number
- CN113867682B (application number CN202111473279.1A)
- Authority
- CN
- China
- Prior art keywords
- floating
- instruction
- point number
- floating point
- module
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
Landscapes
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Nonlinear Science (AREA)
- Advance Control (AREA)
Abstract
The invention discloses a coprocessor for out-of-order floating-point conversion, comprising: an input decoding module, which acquires a floating-point operation instruction, decodes it to generate floating-point information and an instruction token that marks the instruction currently being executed, matches the corresponding functional sub-module according to the floating-point information, and passes the instruction token to the downstream instruction scheduling module; a plurality of functional sub-modules, which perform floating-point conversion operations out of order according to the floating-point information to generate floating-point conversion results; the instruction scheduling module, which stores the order of the instruction tokens sent by the input decoding module; and an output decoding module, which outputs the floating-point conversion results in instruction-token order. This speeds up floating-point conversion and solves the problem that out-of-order execution in the coprocessor desynchronizes data transfer between the coprocessor and the general-purpose processor.
Description
Technical Field
The invention relates to the technical field of integrated circuit design, and in particular to a coprocessor for out-of-order floating-point conversion.
Background
Data conversion is an indispensable operation in the floating-point unit of a processor. Typical data conversion instructions include conversions between data precisions, between integers and floating-point numbers, and between floating-point and fixed-point numbers. Data conversion can be implemented in software or in hardware.
To support data conversion in hardware, the RISC-V instruction set (an open instruction set architecture (ISA) based on reduced instruction set computing (RISC) principles, where "V" denotes the fifth generation of RISC) defines dedicated floating-point instruction-set extensions. However, floating-point conversion under the RISC-V instruction set is not fast enough, and out-of-order execution in a coprocessor causes data transfer between the coprocessor and the general-purpose processor to fall out of sync.
Disclosure of Invention
The technical problem addressed by the present invention is to provide a coprocessor for out-of-order floating-point conversion that processes integer, single-precision and double-precision operands out of order while committing the results in order, thereby increasing floating-point conversion speed.
To solve the above technical problem, a first aspect of the present invention discloses a coprocessor for out-of-order floating-point conversion, comprising: an input decoding module, which acquires a floating-point operation instruction, decodes it to generate floating-point information and an instruction token that marks the instruction currently being executed, matches the corresponding functional sub-module according to the floating-point information, and passes the instruction token to the downstream instruction scheduling module; a plurality of functional sub-modules, which perform floating-point conversion operations out of order according to the floating-point information to generate floating-point conversion results; the instruction scheduling module, which stores the order of the instruction tokens sent by the input decoding module; and an output decoding module, which outputs the floating-point conversion results in instruction-token order.
In some embodiments, the input decoding module includes: a judging unit, which checks each floating-point operation instruction against the operand rule and outputs the instructions that satisfy it; a parsing unit, which parses the instructions that satisfy the operand rule to generate a function field and an operation code; a matching unit, which matches the corresponding functional sub-module according to the function field and forwards the instruction to it; and a thread unit, which generates an instruction token for the floating-point operation instruction executed by the current thread.
In some embodiments, the functional sub-modules include at least a thread that converts between integers and single-precision floating-point numbers, a thread that converts between integers and double-precision floating-point numbers, and a thread that converts between single-precision and double-precision floating-point numbers.
In some embodiments, the functional sub-modules operate independently of one another, and the functional sub-modules and the instruction scheduling module operate in parallel within the same pipeline.
In some embodiments, the output decoding module includes: a polling module, which polls the functional sub-modules and acquires the completion signals for the floating-point conversion results they generate; and an output module, which outputs the floating-point conversion results according to the instruction-token order sent by the input decoding module and the completion signals.
In some embodiments, the output decoding module further includes an exception handling unit, which receives exception data generated by the functional sub-modules.
In some embodiments, the coprocessor further includes an input temporary storage module for buffering floating-point operation instructions, from which the input decoding module acquires the instructions.
In some embodiments, the coprocessor further includes an output temporary storage module for buffering the floating-point conversion results output by the output decoding module, where each result is a 32-bit single-precision floating-point or integer result, or a 64-bit double-precision floating-point result.
In some embodiments, the input temporary storage module, the instruction scheduling module and the output temporary storage module are all FIFO modules.
A second aspect of the present invention discloses an apparatus for out-of-order floating-point conversion, comprising: a buffer for storing floating-point operation instructions; and a coprocessor coupled to the buffer, the coprocessor being the coprocessor for out-of-order floating-point conversion described above.
Compared with the prior art, the invention has the beneficial effects that:
The invention performs floating-point conversion operations out of order across multiple functional sub-modules, which reduces execution time, and uses the order recorded by the instruction scheduling module to implement an in-order commit mechanism. As a result, the floating-point processing latency need not be fixed at the longest floating-point processing latency, and results need not wait until every functional sub-module has finished before being committed. Integer, single-precision and double-precision operands can therefore be processed out of order and committed in order, which increases floating-point conversion speed.
Drawings
FIG. 1 is a schematic structural diagram of a coprocessor for implementing out-of-order conversion of floating point numbers according to an embodiment of the present invention;
FIG. 2 is a schematic view of a processing flow of an input decoding module according to an embodiment of the present invention;
FIG. 3 is a schematic view of a processing flow of an output decoding module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention.
Detailed Description
For a better understanding and implementation, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from these embodiments without creative effort fall within the protection scope of the present invention.
The terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses a coprocessor for out-of-order floating-point conversion. It performs floating-point conversion operations out of order across multiple functional sub-modules to reduce execution time, and uses the order recorded by the instruction scheduling module to implement an in-order commit mechanism, so the floating-point processing latency need not be fixed at the longest floating-point processing latency, and results need not wait until every functional sub-module has finished before being committed. Integer, single-precision and double-precision operands can therefore be processed out of order and committed in order, which increases floating-point conversion speed.
Example one
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a coprocessor for out-of-order floating-point conversion according to an embodiment of the present invention. The coprocessor can be applied to the RISC-V instruction set, although the embodiments of the invention do not limit its scope of application. As shown in FIG. 1, the coprocessor includes an input decoding module 1, a plurality of functional sub-modules 2, an instruction scheduling module 3 and an output decoding module 4.
The input decoding module 1 acquires a floating-point operation instruction, decodes it to generate floating-point information and an instruction token that marks the instruction currently being executed, matches the corresponding functional sub-module 2 according to the floating-point information, and passes the instruction token to the downstream instruction scheduling module 3. The input decoding module 1 includes a judging unit 11, which checks each floating-point operation instruction against the operand rule and outputs the instructions that satisfy it. The acquired instruction may take the form of a floating-point conversion request packet containing the operands, the opcode and the function field for the conversion. In this embodiment the operand width is 64 bits: integer or single-precision data occupy only 32 bits, while double-precision data use all 64 bits. Signed 32-bit data are sign-extended, that is, the upper 32 bits are filled with the sign bit. Under the RISC-V instruction-set standard, floating-point conversion instructions (for example, integer to single precision, or single precision to double precision) are identified by the funct5 function field, and floating-point conversion, move and sign-injection instructions are all encoded in the OP-FP major opcode space (OP-FP is the major opcode defined by the RISC-V standard).
As shown in FIG. 2, when the judging unit 11 checks the operand rule, it determines whether the opcode of the received instruction is OP-FP. If it is not, the instruction is judged not to be a coprocessor instruction. Coprocessor instructions are passed to the parsing unit 12 for the next stage, which parses each instruction that satisfies the operand rule and extracts the function field funct5 and the operation code fmt (in RISC-V terms, the 2-bit format field). Only funct5 and fmt together determine the type of operation the coprocessor performs. Parsing can follow the RISC-V standard. Illegal instructions (anything other than floating-point conversion, move or sign-injection instructions) are not executed after decoding, because the coprocessor has no corresponding functional sub-module. The matching unit 13 then matches the corresponding functional sub-module according to the function field and forwards the instruction to it. At the same time, the thread unit 14 generates an instruction token for the floating-point operation instruction executed by the current thread; the token marks the instruction currently being executed.
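The patent defers the parsing details to the RISC-V standard. The following Python sketch models the judging, parsing and matching steps in software. It assumes the standard RISC-V F/D encodings (OP-FP major opcode 1010011, funct5 in bits 31:27, fmt in bits 26:25, FCLASS sharing funct5 with FMV.X.W); the sub-module names in `FUNCT5_TO_UNIT` and the helper `encode_op_fp` are hypothetical illustrations, not taken from the patent, and token generation by the thread unit is not modelled.

```python
from dataclasses import dataclass
from typing import Optional

OP_FP = 0b1010011  # RISC-V major opcode shared by FP conversion, move and sign injection

# Hypothetical routing table: funct5 -> functional sub-module name.
# funct5 values follow the RISC-V F/D extensions; the unit names are illustrative.
FUNCT5_TO_UNIT = {
    0b11010: "int_to_fp",    # FCVT.S.W[U]/L[U], FCVT.D.W[U]/L[U]
    0b11000: "fp_to_int",    # FCVT.W[U]/L[U].S, FCVT.W[U]/L[U].D
    0b01000: "fp_to_fp",     # FCVT.S.D, FCVT.D.S
    0b00100: "sign_inject",  # FSGNJ, FSGNJN, FSGNJX
    0b11100: "move",         # FMV.X.W / FMV.X.D when funct3 = 000 (001 is FCLASS)
    0b11110: "move",         # FMV.W.X / FMV.D.X
}


@dataclass
class DecodedOp:
    funct5: int
    fmt: int     # 00 = single precision, 01 = double precision
    rs2: int     # selects W/WU/L/LU, or the source format for FCVT.S.D / FCVT.D.S
    funct3: int  # rounding mode, or the sign-injection / move variant
    unit: str


def decode(instr: int) -> Optional[DecodedOp]:
    """Return routing info, or None if this coprocessor should not execute instr."""
    if (instr & 0x7F) != OP_FP:
        return None  # judging step: opcode is not OP-FP, not a coprocessor instruction
    funct5 = (instr >> 27) & 0x1F
    fmt = (instr >> 25) & 0x3
    rs2 = (instr >> 20) & 0x1F
    funct3 = (instr >> 12) & 0x7
    unit = FUNCT5_TO_UNIT.get(funct5)
    if funct5 == 0b11100 and funct3 != 0b000:
        unit = None  # FCLASS: no functional sub-module in this sketch
    if unit is None:
        return None  # e.g. FADD/FMUL: no matching functional sub-module
    return DecodedOp(funct5, fmt, rs2, funct3, unit)


def encode_op_fp(funct7: int, rs2: int, rs1: int, funct3: int, rd: int) -> int:
    """Hypothetical helper: assemble an R-type OP-FP instruction word."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | OP_FP


# fcvt.s.w f1, x2 with dynamic rounding (rm = 111)
print(decode(encode_op_fp(0b1101000, 0, 2, 0b111, 1)))
```

In hardware the table lookup would be a small combinational decoder, and the fmt and rs2 fields would further select which conversion thread inside the matched sub-module performs the work.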
The functional sub-modules 2 perform floating-point conversion operations out of order according to the floating-point information to generate floating-point conversion results. They comprise a number of conversion modules, two sign-injection modules and a move module. The conversion modules include a thread for conversion between integers and single-precision floating-point numbers, a thread for conversion between integers and double-precision floating-point numbers, and a thread for conversion between single-precision and double-precision floating-point numbers; other necessary conversion modules may also be included, or the number of conversion modules may be increased as needed. In a specific implementation, the functional sub-modules may produce the operation result and exceptions according to IEEE 754-…. The functional sub-modules run independently without interfering with one another, several of them can be executing at the same time, and they run in parallel with the instruction scheduling module in the same pipeline.
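The patent does not describe the internals of a conversion thread. As a rough software model of what the integer-to-single-precision thread computes (FCVT.S.W in RISC-V terms), the sketch below returns the 32-bit result pattern plus an exception flag. It assumes round-to-nearest-even only (Python's `struct` float32 packing on IEEE platforms) and reports only the inexact flag, using the RISC-V `fflags` bit position for NX; a hardware sub-module would also honour the rm field.

```python
import struct
from typing import Tuple

NX = 0x01  # RISC-V fflags bit 0: IEEE 754 "inexact" exception


def fcvt_s_w(value: int) -> Tuple[int, int]:
    """Convert a signed 32-bit integer to single precision.

    Returns (32-bit result bit pattern, exception flags).
    Only round-to-nearest-even is modelled.
    """
    if not -(1 << 31) <= value < (1 << 31):
        raise ValueError("operand is not a signed 32-bit integer")
    # float(value) is exact for any int32; packing with "<f" rounds to float32.
    packed = struct.pack("<f", float(value))
    bits = struct.unpack("<I", packed)[0]
    rounded = struct.unpack("<f", packed)[0]
    flags = NX if rounded != value else 0  # result differs from operand: inexact
    return bits, flags


for v in (1, -1, 16777217):
    bits, flags = fcvt_s_w(v)
    print(v, hex(bits), flags)
```

2^24 + 1 = 16777217 is the smallest positive integer that single precision cannot represent, so it is the natural case for exercising the inexact path that the exception handling unit would later collect.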
The instruction scheduling module 3 stores the order of the instruction tokens sent by the input decoding module 1. In a specific application, it may be implemented as a FIFO (first-in, first-out) memory that establishes the instruction scheduling order by storing the tokens of the instructions currently in the pipeline. The depth of the instruction scheduling FIFO may be set to the maximum number of instructions in flight in the coprocessor pipeline, which is 5 in this embodiment. Because the input decoding module sends instructions in program order, the FIFO preserves that order in instruction scheduling. This guarantees that instructions are ultimately committed in order rather than out of order (by the nature of a FIFO, the first token stored is necessarily the first one popped).
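A minimal Python model of the instruction scheduling FIFO, assuming that tokens are plain integers and that a full FIFO stalls the input decoder (the patent sets the depth but does not state the full-FIFO behaviour, so the stall is an assumption). The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class InstructionScheduler:
    """Token FIFO recording the program order of in-flight instructions."""

    def __init__(self, depth: int = 5) -> None:
        # depth: maximum number of in-flight instructions (5 in this embodiment)
        self.depth = depth
        self._tokens: Deque[int] = deque()

    def full(self) -> bool:
        return len(self._tokens) >= self.depth

    def push(self, token: int) -> bool:
        """Called by the input decoder in issue order; False means the decoder must stall."""
        if self.full():
            return False
        self._tokens.append(token)
        return True

    def head(self) -> Optional[int]:
        """Token of the oldest in-flight instruction, or None if the FIFO is empty."""
        return self._tokens[0] if self._tokens else None

    def pop(self) -> int:
        """Remove and return the oldest token (raises IndexError if empty)."""
        return self._tokens.popleft()


sched = InstructionScheduler()
for token in range(6):
    print(token, sched.push(token))  # the sixth push is refused: FIFO depth is 5
```

Because tokens leave in exactly the order they entered, whoever pops the head always sees the oldest outstanding instruction, which is the property the output decoding module relies on.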
The output decoding module 4 outputs the floating-point conversion results in instruction-token order. It includes: a polling module 41, which polls the functional sub-modules and acquires the completion signals for the conversion results they generate; an output module 42, which outputs the conversion results according to the instruction-token order sent by the input decoding module and the completion signals; and an exception handling unit 43, which receives exception data generated by the functional sub-modules.
To ensure that the coprocessor commits its floating-point conversion results in order, the output decoding module 4 commits the oldest instruction in the pipeline, namely the instruction at the head of the FIFO in the instruction scheduling module 3. Meanwhile, the polling module 41 polls the completion signals from the matched functional sub-modules. Once a functional sub-module finishes converting and produces a result, the result is read, and the sub-module's exception data are stored in the exception handling unit 43, for example in an output FIFO of depth 2. Finally, the output module 42 outputs the result according to the instruction-token order sent by the input decoding module and the completion signal. The result may be delivered as an FPU response packet containing only the converted data, either a 32-bit result (single-precision floating point or integer) or a 64-bit result (double-precision floating point).
For example, as shown in FIG. 3, the instruction at the head of the instruction scheduling FIFO is instruction 1, executed by functional sub-module 1; the next instruction is instruction 2, executed by functional sub-module 2. Suppose functional sub-module 1 has not yet finished instruction 1, but functional sub-module 2 has finished instruction 2. The output decoding module still does not write the response for instruction 2 into the output FIFO; it must wait until functional sub-module 1 finishes instruction 1, write the response for instruction 1, and only then write the response for instruction 2. Because the output decoding module needs to know which instruction to issue next, it first reads the head of the instruction scheduling FIFO, and by the nature of a FIFO the token is removed once it has been read. After instruction 1 is removed, instruction 2 becomes the head of the instruction scheduling FIFO.
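The FIG. 3 scenario can be reproduced with a short Python sketch of the in-order commit rule. Assumptions: a completion signal is modelled as a dictionary entry holding the result (None while the sub-module is still working), and the function name `commit_in_order` is hypothetical. Unlike the patent, which removes the head token as soon as it is read and holds it while waiting, this sketch peeks at the head and pops only on commit; the observable commit order is the same.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


def commit_in_order(
    token_fifo: Deque[int],
    done: Dict[int, Optional[int]],
) -> List[Tuple[int, int]]:
    """Commit every finished result that is at the head of the token FIFO.

    token_fifo: instruction tokens in issue order (head = oldest instruction).
    done: token -> result once the sub-module signals completion, else None.
    Returns the (token, result) pairs written to the output FIFO this cycle.
    """
    committed: List[Tuple[int, int]] = []
    while token_fifo:
        token = token_fifo[0]
        result = done.get(token)
        if result is None:
            break  # oldest instruction unfinished: younger results must wait
        token_fifo.popleft()
        committed.append((token, result))
    return committed


tokens = deque([1, 2])             # instruction 1 was issued before instruction 2
done = {1: None, 2: 0x40000000}    # sub-module 2 finished first
print(commit_in_order(tokens, done))  # nothing committed yet
done[1] = 0x3F800000               # sub-module 1 now signals completion
print(commit_in_order(tokens, done))  # instruction 1, then instruction 2
```

The loop drains several results in one call when a slow head instruction finally completes, which mirrors how buffered younger results are released in a burst.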
The coprocessor of this embodiment therefore performs floating-point conversion operations out of order across multiple functional sub-modules, reducing execution time, and uses the order recorded by the instruction scheduling module to implement an in-order commit mechanism, so the floating-point processing latency need not be fixed at the longest floating-point processing latency, and results need not wait until every functional sub-module has finished before being committed. Integer, single-precision and double-precision operands can therefore be processed out of order and committed in order, which increases floating-point conversion speed.
Example two
Referring to FIG. 4, FIG. 4 is a schematic diagram of another coprocessor for out-of-order floating-point conversion according to an embodiment of the present invention. This coprocessor can also be applied to the RISC-V instruction set, and the embodiments of the invention do not limit its scope of application. As shown in FIG. 4, the coprocessor includes:
an input temporary storage module 5, an input decoding module 1, a plurality of functional sub-modules 2, an instruction scheduling module 3, an output decoding module 4 and an output temporary storage module 6. The input decoding module 1, the functional sub-modules 2, the instruction scheduling module 3 and the output decoding module 4 are implemented substantially as in the previous embodiment and are not described again here.
The input temporary storage module 5 is a FIFO memory. Floating-point operation instructions are stored in it, and the input decoding module 1 fetches them directly from it. The FIFO depth (that is, its capacity) can be set as required; for example, with a depth of 2, at most two instructions are buffered in the input temporary storage module 5, which accommodates frequent floating-point conversion requests. When the input temporary storage module 5 is full, the coprocessor signals a busy state to the outside and temporarily accepts no new floating-point conversion instructions.
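A minimal Python model of the input temporary storage module with the depth-2 example and the external busy signal. The class name and the `offer`/`take` methods are hypothetical stand-ins for the host-side handshake, which the patent does not specify.

```python
from collections import deque
from typing import Deque, Optional


class InputStaging:
    """Input temporary storage: a small FIFO in front of the input decoder."""

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth
        self._q: Deque[int] = deque()

    @property
    def busy(self) -> bool:
        # Reported to the host: while full, no new conversion instruction is accepted.
        return len(self._q) >= self.depth

    def offer(self, instr: int) -> bool:
        """Host side: try to hand over an instruction; False means 'busy, retry later'."""
        if self.busy:
            return False
        self._q.append(instr)
        return True

    def take(self) -> Optional[int]:
        """Decoder side: fetch the oldest buffered instruction, if any."""
        return self._q.popleft() if self._q else None


staging = InputStaging()
print([staging.offer(i) for i in (10, 11, 12)], staging.busy)
```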
Furthermore, for the input decoding module, if the functional sub-module matched after decoding is idle, the instruction is executed immediately. If that sub-module is already occupied, the instruction is temporarily not executed, the input decoding module is marked busy, and it takes no further instructions from the input temporary storage module. Once the current instruction is executed, the input decoding module is released and fetches the next floating-point conversion instruction from the input temporary storage module 5 for decoding.
The output temporary storage module 6 is a FIFO memory that buffers the floating-point conversion results output by the output decoding module; each result is a 32-bit single-precision floating-point or integer result, or a 64-bit double-precision floating-point result.
The coprocessor of this embodiment therefore performs floating-point conversion operations out of order across multiple functional sub-modules, reducing execution time, and uses the order recorded by the instruction scheduling module to implement an in-order commit mechanism, so the floating-point processing latency need not be fixed at the longest floating-point processing latency, and results need not wait until every functional sub-module has finished before being committed. In addition, the input temporary storage module and the output temporary storage module manage instruction intake and result output, further improving the processing efficiency of the coprocessor as a whole. Integer, single-precision and double-precision operands can therefore be processed out of order and committed in order, which increases floating-point conversion speed.
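The stall rule for the input decoding module can be sketched in Python as a one-instruction holding register. Assumptions: `unit_of` stands in for the decode and match steps, `unit_busy` stands in for per-sub-module occupancy, and dispatch marks the sub-module busy until some other logic clears it; all of these names are hypothetical, and pushing the token to the instruction scheduling FIFO is omitted.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional, Tuple


class InputDecoder:
    """Holds at most one decoded instruction while its sub-module is occupied."""

    def __init__(self, unit_of: Callable[[int], str]) -> None:
        self.unit_of = unit_of            # decode/match step: instruction -> sub-module
        self.held: Optional[int] = None   # instruction waiting for its sub-module

    @property
    def busy(self) -> bool:
        return self.held is not None

    def step(self, staging: Deque[int], unit_busy: Dict[str, bool]) -> Optional[Tuple[int, str]]:
        """One cycle: returns (instruction, sub-module) if dispatched, else None."""
        if self.held is None:
            if not staging:
                return None
            self.held = staging.popleft()  # fetch only when not holding an instruction
        unit = self.unit_of(self.held)
        if unit_busy.get(unit, False):
            return None  # sub-module occupied: stay busy, fetch nothing new
        instr, self.held = self.held, None
        unit_busy[unit] = True  # sub-module now executing this instruction
        return instr, unit


dec = InputDecoder(lambda i: "int_to_fp" if i % 2 == 0 else "fp_to_fp")
q, occupied = deque([2, 4]), {}
print(dec.step(q, occupied), dec.step(q, occupied), dec.busy)
```

Holding a single instruction keeps the decoder simple; the cost is that a younger instruction bound for an idle sub-module waits behind the stalled one.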
Example three
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a coprocessor for out-of-order floating-point conversion according to an embodiment of the present invention. The coprocessor described in FIG. 5 may be applied to the RISC-V instruction set, and the embodiments of the invention do not limit the instruction set to which it is applied. As shown in FIG. 5, the coprocessor may include:
a memory 601 in which executable program code is stored;
a processor 602 coupled to a memory 601;
the processor 602 calls the executable program code stored in the memory 601 to perform the out-of-order floating-point conversion of the coprocessor described in Example one.
Example four
An embodiment of the invention discloses a computer-readable storage medium storing a computer program for electronic data exchange, where the computer program causes a computer to perform the out-of-order floating-point conversion of the coprocessor described in Example one.
Example five
An embodiment of the invention discloses a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the out-of-order floating-point conversion of the coprocessor described in Example one or Example two.
The above-described embodiments are only illustrative, and the modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
From the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. On this understanding, the above technical solutions may be embodied in the form of a software product stored in a computer-readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
Finally, it should be noted that the coprocessor for out-of-order floating-point conversion disclosed in the embodiments of the present invention is only a preferred embodiment, used to illustrate the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A coprocessor for implementing floating-point out-of-order conversion, the coprocessor comprising:
the input decoding module is used for acquiring a floating-point number operation instruction, decoding the floating-point number operation instruction to generate floating-point number information and an instruction token that marks the currently executed instruction, matching the corresponding functional sub-module according to the floating-point number information, and transmitting the instruction token to the instruction scheduling module;
the plurality of functional sub-modules are used for performing floating-point number conversions out of order according to the floating-point number information to generate floating-point number conversion results;
the instruction scheduling module is used for storing the order of the instruction token threads sent by the input decoding module;
the output decoding module is used for outputting the floating-point number conversion results in the order of the instruction token thread; the instruction at the top of the FIFO in the instruction scheduling module is committed in the output decoding module, and the output decoding module polls the completion signal from each matched functional sub-module, reads the floating-point number conversion result after the functional sub-module has completed the conversion and generated that result, and outputs the floating-point number conversion result according to the instruction token thread order sent by the input decoding module and the completion signal.
2. The coprocessor of claim 1, wherein the input decoding module comprises: the judging unit is used for performing an operand rule check on the floating-point number operation instruction and outputting the floating-point number operation instructions that conform to the operand rules;
the analysis unit is used for parsing the floating-point number operation instructions that conform to the operand rules and generating a function field and an operation code;
the matching unit is used for matching the corresponding functional sub-module according to the function field and outputting the parsed instruction to that functional sub-module;
and the thread unit is used for generating an instruction token according to the floating point number operation instruction executed by the current thread.
3. The coprocessor for implementing out-of-order conversion of floating-point numbers according to claim 2, wherein the functional sub-modules comprise at least a thread for performing conversion between integers and single-precision floating-point numbers, a thread for performing conversion between integers and double-precision floating-point numbers, and a thread for performing conversion between single-precision and double-precision floating-point numbers.
4. The coprocessor of claim 3, wherein the plurality of functional sub-modules operate independently of one another, and the plurality of functional sub-modules operate in parallel with the instruction scheduling module in the same pipeline.
5. The coprocessor of claim 1, wherein the output decoding module comprises: the polling module is used for polling the functional sub-modules and acquiring the completion signal indicating that a functional sub-module has generated its floating-point number conversion result;
and the output module is used for outputting the floating point number conversion result according to the instruction token thread sequence sent by the input decoding module and the completion signal.
6. The coprocessor of claim 5, wherein the output decoding module further comprises:
and the exception processing unit is used for receiving exception data generated by the functional sub-module.
7. The coprocessor for implementing floating-point out-of-order conversion according to claim 2, further comprising:
the input temporary storage module is used for caching floating-point number operation instructions;
the input decoding module acquires floating point number operation instructions from the input temporary storage module.
8. The coprocessor for implementing floating-point out-of-order conversion according to claim 5, further comprising:
and the output temporary storage module is used for caching the floating point number conversion result output by the output decoding module, wherein the floating point number conversion result comprises a 32-bit single-precision floating point or integer result or a 64-bit double-precision floating point result.
9. The coprocessor for implementing floating-point out-of-order conversion according to any one of claims 1 to 8, wherein
the input temporary storage module, the instruction scheduling module and the output temporary storage module are FIFO modules.
10. An apparatus for out-of-order conversion of floating point numbers, the apparatus comprising:
the buffer is used for storing floating-point number operation instructions;
a coprocessor coupled with the buffer;
the coprocessor is implemented as a coprocessor for implementing floating point out-of-order conversion as claimed in any one of claims 1 to 9.
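The decoding path of claim 2 (judging unit, analysis unit, matching unit, thread unit) can be illustrated with a minimal Python sketch. The patent does not disclose an instruction encoding or the exact operand rule, so the bit layout, the opcode value `0x0B`, the function-field codes, the acceptance rule and the names `InputDecoder` and `SUBMODULE_BY_FUNC` are all hypothetical:

```python
import itertools

# Hypothetical 32-bit encoding; the patent does not specify one.
#   bits [6:0]   operation code
#   bits [31:27] function field selecting the conversion sub-module
CONV_OPCODE = 0x0B  # hypothetical opcode shared by all conversion instructions
SUBMODULE_BY_FUNC = {  # hypothetical function-field -> functional sub-module map
    0b00001: "int<->single",
    0b00010: "int<->double",
    0b00011: "single<->double",
}


class InputDecoder:
    def __init__(self):
        # thread unit: hands out monotonically increasing instruction tokens
        self._tokens = itertools.count()

    def decode(self, insn: int):
        """Return (token, opcode, func, sub-module), or None if the check fails."""
        # analysis unit: extract operation code and function field
        # (done first here so the assumed rule below can inspect them)
        opcode = insn & 0x7F
        func = (insn >> 27) & 0x1F
        # judging unit (assumed rule): only known conversion instructions pass
        if opcode != CONV_OPCODE or func not in SUBMODULE_BY_FUNC:
            return None
        # matching unit: select the functional sub-module from the function field
        submodule = SUBMODULE_BY_FUNC[func]
        # thread unit: tag the accepted instruction with the next token
        token = next(self._tokens)
        return token, opcode, func, submodule
```

In this sketch a rejected instruction does not consume a token, so accepted instructions receive consecutive tokens in issue order, which is the property the instruction scheduling FIFO relies on.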
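The three conversion classes named in claim 3 can be modelled in software as below. This is only a behavioural sketch, not the hardware of the functional sub-modules: it emulates IEEE 754 binary32 rounding with Python's `struct` module (format `'f'`), assumes round-to-nearest-even for narrowing and truncation toward zero for float-to-integer conversion, and restricts integers to 32 bits. The function names are illustrative.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # the value rounds beyond the largest binary32 number: round-to-nearest gives +/-inf
        return math.copysign(math.inf, x)


def int32_to_single(i: int) -> float:
    return to_single(float(i))  # float(i) is exact for 32-bit i, so only one rounding occurs


def int32_to_double(i: int) -> float:
    return float(i)  # every 32-bit integer is exactly representable in binary64


def float_to_int32(x: float) -> int:
    return int(x)  # truncation toward zero (assumed); NaN/out-of-range not handled here


def double_to_single(x: float) -> float:
    return to_single(x)


def single_to_double(x: float) -> float:
    return x  # widening is exact; Python floats are already binary64
```

For example, `int32_to_single(16777217)` yields `16777216.0`, because 2^24 + 1 has no exact binary32 representation and the tie rounds to the even neighbour, while `int32_to_double(16777217)` is exact.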
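The ordering mechanism of claims 1 and 5 (tokens queued in issue order, results arriving out of order, output released only from the head of the FIFO) can be sketched as follows. The class and method names are hypothetical, a `deque` stands in for the instruction scheduling FIFO, and a call to `complete` stands in for a functional sub-module's completion signal:

```python
from collections import deque


class OutputDecoder:
    """In-order release of results that complete out of order (hypothetical model)."""

    def __init__(self):
        self.token_fifo = deque()  # models the instruction scheduling FIFO
        self.finished = {}  # token -> result, written when a sub-module completes

    def issue(self, token):
        # the input decoding module pushes tokens in program order
        self.token_fifo.append(token)

    def complete(self, token, result):
        # a functional sub-module raises its completion signal with a result
        self.finished[token] = result

    def poll(self):
        """Commit every result whose token has reached the head of the FIFO."""
        committed = []
        while self.token_fifo and self.token_fifo[0] in self.finished:
            token = self.token_fifo.popleft()
            committed.append((token, self.finished.pop(token)))
        return committed
```

If tokens 0, 1 and 2 are issued and the sub-modules finish tokens 2 and 1 first, `poll()` returns an empty list; once token 0 completes, a single `poll()` returns all three results in token order. Fast conversions therefore never overtake slow ones at the output, even though they execute in parallel.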
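The patent does not define the exception data received by the exception processing unit of claim 6. One plausible form is the IEEE 754 status of a conversion: "invalid" for a NaN or out-of-range input and "inexact" when fractional bits are discarded. The sketch below assumes truncation toward zero and a saturating result on invalid input (NaN maps to the maximum value); real instruction sets differ on these defaults, and the names `ConvFlags` and `float_to_int32_with_flags` are hypothetical.

```python
import math
from dataclasses import dataclass

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


@dataclass
class ConvFlags:
    invalid: bool = False  # NaN or result outside the int32 range
    inexact: bool = False  # fractional bits were discarded


def float_to_int32_with_flags(x: float):
    """Truncating float -> int32 conversion that also reports exception data."""
    if math.isnan(x):
        return INT32_MAX, ConvFlags(invalid=True)  # assumed saturating result for NaN
    if not math.isfinite(x) or not (INT32_MIN <= math.trunc(x) <= INT32_MAX):
        # out of range: only "invalid" is raised, and the result saturates (assumed)
        return (INT32_MAX if x > 0 else INT32_MIN), ConvFlags(invalid=True)
    t = math.trunc(x)
    return t, ConvFlags(inexact=(t != x))
```

A functional sub-module built this way would hand the `ConvFlags` value to the exception processing unit alongside its completion signal, leaving the policy for reacting to the flags to the output side.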
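Claim 8 states that each buffered result is either a 32-bit single-precision floating-point or integer value or a 64-bit double-precision floating-point value. The raw bit patterns such an output buffer entry might hold can be computed as below, assuming IEEE 754 binary32/binary64 formats and two's-complement integers; the function name and the kind strings are illustrative.

```python
import struct


def result_bits(value, kind: str) -> int:
    """Raw bit pattern of a conversion result as stored in a hypothetical buffer entry."""
    if kind == "single":
        return struct.unpack("<I", struct.pack("<f", value))[0]  # 32-bit word
    if kind == "int32":
        return value & 0xFFFFFFFF  # two's complement, 32-bit word
    if kind == "double":
        return struct.unpack("<Q", struct.pack("<d", value))[0]  # 64-bit word
    raise ValueError(f"unknown result kind: {kind}")
```

For instance, `1.0` is stored as `0x3F800000` in a single-precision entry and as `0x3FF0000000000000` in a double-precision entry, which is why the output temporary storage module must accommodate both widths.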
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111473279.1A CN113867682B (en) | 2021-12-06 | 2021-12-06 | Coprocessor for realizing floating-point number out-of-order conversion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113867682A | 2021-12-31 |
CN113867682B | 2022-02-22 |
Family
ID=78986076
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111473279.1A Active CN113867682B (en) | 2021-12-06 | 2021-12-06 | Coprocessor for realizing floating-point number out-of-order conversion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113867682B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108027750A (en) * | 2015-09-19 | 2018-05-11 | 微软技术许可有限责任公司 | Out of order submission |
CN111966406A (en) * | 2020-08-06 | 2020-11-20 | 北京微核芯科技有限公司 | Method and device for scheduling out-of-order execution queue in out-of-order processor |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6094719A (en) * | 1997-06-25 | 2000-07-25 | Sun Microsystems, Inc. | Reducing data dependent conflicts by converting single precision instructions into microinstructions using renamed phantom registers in a processor having double precision registers |
US7734901B2 (en) * | 2005-10-31 | 2010-06-08 | Mips Technologies, Inc. | Processor core and method for managing program counter redirection in an out-of-order processor pipeline |
US9262140B2 (en) * | 2008-05-19 | 2016-02-16 | International Business Machines Corporation | Predication supporting code generation by indicating path associations of symmetrically placed write instructions |
CN101645017A (en) * | 2009-09-07 | 2010-02-10 | 深圳市茁壮网络股份有限公司 | Cross-platform byte order processing method, device and byte code running platform |
US10678544B2 (en) * | 2015-09-19 | 2020-06-09 | Microsoft Technology Licensing, Llc | Initiating instruction block execution using a register access instruction |
US20190220278A1 (en) * | 2019-03-27 | 2019-07-18 | Menachem Adelman | Apparatus and method for down-converting and interleaving multiple floating point values |
CN111198715A (en) * | 2019-12-26 | 2020-05-26 | 核芯互联科技(青岛)有限公司 | Out-of-order high-performance core-oriented memory controller command scheduling method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |