CN113867682B - Coprocessor for realizing floating-point number out-of-order conversion - Google Patents

Publication number
CN113867682B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111473279.1A
Other languages
Chinese (zh)
Other versions
CN113867682A (en)
Inventor
欧艳凤
陈钦树
朱伏生
朱晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Communications and Networks Institute
Original Assignee
Guangdong Communications and Networks Institute

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3877 Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Nonlinear Science (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a coprocessor for implementing floating-point number out-of-order conversion, comprising: an input decoding module, configured to acquire a floating-point operation instruction, decode it to generate floating-point number information and an instruction token that marks the currently executed instruction, match the corresponding functional sub-module according to the floating-point number information, and pass the instruction token to an instruction scheduling module; a plurality of functional sub-modules, configured to execute floating-point conversion operations out of order according to the floating-point number information and generate floating-point conversion results; the instruction scheduling module, configured to store the instruction token thread order sent by the input decoding module; and an output decoding module, configured to output the floating-point conversion results according to the instruction token thread order. The floating-point conversion processing speed is thereby increased, and the problem of unsynchronized data transfer between the coprocessor and the general-purpose processor caused by out-of-order execution in the coprocessor is solved.

Description

Coprocessor for realizing floating-point number out-of-order conversion
Technical Field
The invention relates to the technical field of integrated circuit design, in particular to a coprocessor for realizing floating-point number out-of-order conversion.
Background
The data conversion operation is an indispensable flow in a floating point arithmetic unit of a processor, and a typical data conversion mode may include data conversion instructions such as conversion of various data precisions, conversion between integers and floating point numbers, conversion between floating point numbers and fixed point numbers, and the like. The data conversion operation may be implemented by software or hardware.
To implement data conversion in hardware, the RISC-V instruction set (an open instruction set architecture (ISA) based on reduced instruction set computing (RISC) principles; the V denotes the fifth generation of RISC) provides a dedicated floating-point instruction set extension. However, floating-point conversion under the RISC-V instruction set is not fast enough, and out-of-order execution in a coprocessor causes data transfer between the coprocessor and the general-purpose processor to fall out of sync.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a coprocessor for implementing floating-point number out-of-order conversion, so that integer, single-precision and double-precision data operands can be processed out of order and committed in order, which helps to improve the floating-point conversion processing speed.
To solve the above technical problem, a first aspect of the present invention discloses a coprocessor for implementing floating-point number out-of-order conversion, the coprocessor comprising: an input decoding module, configured to acquire a floating-point operation instruction, decode it to generate floating-point number information and an instruction token that marks the currently executed instruction, match the corresponding functional sub-module according to the floating-point number information, and pass the instruction token to an instruction scheduling module; a plurality of functional sub-modules, configured to execute floating-point conversion operations out of order according to the floating-point number information and generate floating-point conversion results; the instruction scheduling module, configured to store the instruction token thread order sent by the input decoding module; and an output decoding module, configured to output the floating-point conversion results according to the instruction token thread order.
In some embodiments, the input decoding module includes: a judging unit, configured to perform operand rule judgment on the floating-point operation instruction and output the floating-point operation instructions that meet the operand rule; an analysis unit, configured to parse the floating-point operation instructions that meet the operand rule and generate a function field and an operation code; a matching unit, configured to match the corresponding functional sub-module according to the function field and output the instruction to that functional sub-module; and a thread unit, configured to generate an instruction token according to the floating-point operation instruction executed by the current thread.
In some embodiments, the functional sub-module includes at least a thread for performing conversion between integer and single-precision floating point numbers, a thread for performing conversion between integer and double-precision floating point numbers, and a thread for performing conversion between single-precision floating point numbers and double-precision floating point numbers.
In some embodiments, the plurality of functional sub-modules operate independently of each other, and the plurality of functional sub-modules and the instruction scheduling module operate in parallel in the same pipeline.
In some embodiments, the output decoding module includes: a polling module, configured to poll the functional sub-modules and obtain the completion signal indicating that a functional sub-module has generated a floating-point conversion result; and an output module, configured to output the floating-point conversion result according to the instruction token thread order sent by the input decoding module and the completion signal.
In some embodiments, the output decoding module further includes an exception handling unit, configured to receive exception data generated by the functional sub-modules.
In some embodiments, the coprocessor further includes an input temporary storage module, configured to cache floating-point operation instructions; the input decoding module acquires floating-point operation instructions from the input temporary storage module.
In some embodiments, the coprocessor further includes an output temporary storage module, configured to cache the floating-point conversion result output by the output decoding module, where the floating-point conversion result is a 32-bit single-precision floating-point or integer result, or a 64-bit double-precision floating-point result.
In some embodiments, the input temporary storage module, the instruction scheduling module, and the output temporary storage module are all FIFO modules.
According to a second aspect of the present invention, there is disclosed an apparatus for out-of-order conversion of floating point numbers, the apparatus comprising: the buffer is used for storing floating-point number operation instructions; a coprocessor coupled with the buffer; the coprocessor is implemented as the coprocessor for implementing floating point out-of-order conversion described above.
Compared with the prior art, the invention has the beneficial effects that:
The invention executes floating-point conversion operations out of order across a plurality of functional sub-modules, which reduces execution time, and then uses the order stored in the instruction scheduling module to implement an in-order commit mechanism. As a result, the floating-point processing latency need not be fixed at the longest floating-point processing delay, and results need not wait until every functional sub-module has finished before being committed. Integer, single-precision and double-precision data operands can thus be processed out of order and committed in order, which helps to improve the floating-point conversion processing speed.
Drawings
FIG. 1 is a schematic structural diagram of a coprocessor for implementing out-of-order conversion of floating point numbers according to an embodiment of the present invention;
FIG. 2 is a schematic view of a processing flow of an input decoding module according to an embodiment of the present invention;
FIG. 3 is a schematic view of a processing flow of an output decoding module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention.
Detailed Description
For better understanding and implementation, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses a coprocessor for implementing floating-point number out-of-order conversion. It executes floating-point conversion operations out of order across a plurality of functional sub-modules, which reduces execution time, and then uses the order stored in the instruction scheduling module to implement an in-order commit mechanism. As a result, the floating-point processing latency need not be fixed at the longest floating-point processing delay, and results need not wait until every functional sub-module has finished before being committed. Integer, single-precision and double-precision data operands can thus be processed out of order and committed in order, which helps to improve the floating-point conversion processing speed.
Example one
Referring to fig. 1, fig. 1 is a schematic structural diagram of a coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention. The coprocessor for implementing floating-point out-of-order conversion can be applied to a RISC-V instruction set, and the application range of the coprocessor is not limited by the embodiment of the invention. As shown in fig. 1, the coprocessor for implementing floating-point out-of-order conversion includes: the device comprises an input decoding module 1, a plurality of functional sub-modules 2, an instruction scheduling module 3 and an output decoding module 4.
The input decoding module 1 is configured to acquire a floating-point operation instruction, decode it to generate floating-point number information and an instruction token that marks the currently executed instruction, match the corresponding functional sub-module 2 according to the floating-point number information, and pass the instruction token to the instruction scheduling module 3. The input decoding module 1 includes a judging unit 11, configured to perform operand rule judgment on the floating-point operation instruction and output the floating-point operation instructions that meet the operand rule. The acquired floating-point operation instruction may be implemented as a floating-point conversion request packet containing an operand, an operation code, and a function field for the conversion operation. In this embodiment the operand width is set to 64 bits, so that either only 32 bits are used to carry integer data or single-precision floating-point data, or all 64 bits are used to carry double-precision floating-point data. For signed 32-bit data, the upper 32 bits are filled with the sign bit of the data. According to the instruction set standard defined by RISC-V, the function field used by floating-point conversion instructions (for example, integer to single-precision floating point, or single-precision to double-precision floating point) is funct5, and floating-point conversion, move, and sign-injection instructions are all encoded in the OP-FP major opcode space (OP-FP is the symbolic name that the RISC-V standard uses for this operation code).
When the judging unit 11 performs the operand rule judgment, as shown in FIG. 2, it checks whether the operation code of the received floating-point operation instruction is OP-FP (RISC-V standard). If the operation code does not match OP-FP, the instruction is judged not to be an instruction of this coprocessor. Instructions that pass the check are input to the analysis unit 12 for the next-stage parsing operation, which parses the floating-point operation instruction that meets the operand rule and generates the function field funct5 and the operation code fmt (the format field in RISC-V terms). Only funct5 and fmt together determine the type of transaction that the coprocessor can handle. Parsing can follow the RISC-V standard implementation; instructions found to be illegal after parsing and decoding (that is, anything other than floating-point conversion, move, or sign-injection instructions) are not executed, because the coprocessor has no corresponding functional sub-module. The matching unit 13 then matches the corresponding functional sub-module according to the function field and outputs the instruction to that functional sub-module. At the same time, the thread unit 14 generates an instruction token according to the floating-point operation instruction executed by the current thread; the token marks the floating-point operation instruction currently being executed.
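The judging, parsing, and matching steps described above can be sketched in Python as follows. This is a minimal illustration, not the patented hardware: the bit positions of the opcode, funct5, and fmt fields and the funct5 values follow the RISC-V F/D extension encodings, while the names `decode`, `FUNCT5_TO_SUBMODULE`, `sign_extend_32_to_64`, and the sub-module labels are hypothetical and chosen only for this example.

```python
# Sketch of the judging unit, analysis unit, and matching unit.
# Field positions follow the RISC-V R-type floating-point layout;
# all function and sub-module names are hypothetical.

OP_FP = 0b1010011  # RISC-V OP-FP major opcode (instruction bits 6..0)

# funct5 (instruction bits 31..27) mapped to an assumed sub-module label.
FUNCT5_TO_SUBMODULE = {
    0b00100: "sign_injection",  # FSGNJ / FSGNJN / FSGNJX
    0b01000: "convert_fp_fp",   # FCVT.S.D / FCVT.D.S
    0b11000: "convert_fp_int",  # FCVT.W[U].S, FCVT.W[U].D, ...
    0b11010: "convert_int_fp",  # FCVT.S.W[U], FCVT.D.W[U], ...
    0b11100: "move",            # FMV.X.W (FCLASS shares this funct5)
    0b11110: "move",            # FMV.W.X
}


def decode(instr: int):
    """Return (submodule, funct5, fmt), or None if the instruction is rejected."""
    # Judging unit: only OP-FP instructions belong to this coprocessor.
    if instr & 0x7F != OP_FP:
        return None
    # Analysis unit: extract funct5 and the 2-bit fmt field (bits 26..25).
    funct5 = (instr >> 27) & 0x1F
    fmt = (instr >> 25) & 0x3
    # Matching unit: look up the functional sub-module; unknown funct5
    # values (for example FADD) have no sub-module and are not executed.
    submodule = FUNCT5_TO_SUBMODULE.get(funct5)
    if submodule is None:
        return None
    return submodule, funct5, fmt


def sign_extend_32_to_64(value: int) -> int:
    """Fill the upper 32 bits of the 64-bit operand with the sign bit."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value
```

For instance, the encoding of `fcvt.s.w fa0, a0` (0xD0057553) would be routed to the assumed "convert_int_fp" sub-module, while an integer `add` is rejected at the opcode check.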
The plurality of functional sub-modules 2 are configured to execute floating-point conversion operations out of order according to the floating-point number information and generate floating-point conversion results. The functional sub-modules comprise a certain number of conversion modules, two sign-injection modules, and a move module. The conversion modules include a thread for converting between integers and single-precision floating-point numbers, a thread for converting between integers and double-precision floating-point numbers, and a thread for converting between single-precision and double-precision floating-point numbers; other necessary conversion modules may also be included, or the number of conversion modules may be increased as needed. In a specific implementation, the functional sub-modules may produce operation results and exceptions according to the IEEE 754-. The functional sub-modules run independently and do not interfere with one another, several functional sub-modules can be in the executing state at the same time, and the functional sub-modules and the instruction scheduling module run in parallel in the same pipeline.
The instruction scheduling module 3 is configured to store the instruction token thread order sent by the input decoding module 1. In a specific application, the instruction scheduling module may be implemented as a first-in first-out (FIFO) memory, which establishes the instruction scheduling rule by storing the instruction tokens currently being executed in the pipeline. The depth of the instruction scheduling FIFO may be set to the maximum pipeline number of the coprocessor, and may be set to 5 in this embodiment. Because the input decoding module sends instructions in program order, the tokens in the instruction scheduling FIFO are kept in that order. This ensures that instructions are finally committed in order rather than out of order (by the nature of a FIFO, the first token stored is necessarily the first token popped).
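The token FIFO described above can be modeled in a few lines of Python. This is a sketch under assumptions: the class name `InstructionScheduler` and its methods are hypothetical, the default depth of 5 is the value given for this embodiment, and refusing a push when the FIFO is full is an assumed back-pressure behavior that the text does not spell out.

```python
from collections import deque


class InstructionScheduler:
    """Hypothetical model of the instruction scheduling FIFO of tokens."""

    def __init__(self, depth: int = 5):
        self.depth = depth       # depth 5 as in this embodiment
        self._tokens = deque()   # oldest token sits at the left end

    def is_full(self) -> bool:
        return len(self._tokens) >= self.depth

    def push(self, token) -> bool:
        """Store a token in program order; refuse when full (assumption)."""
        if self.is_full():
            return False
        self._tokens.append(token)
        return True

    def head(self):
        """Peek at the oldest token without removing it."""
        return self._tokens[0] if self._tokens else None

    def pop(self):
        """Remove and return the oldest token (first in, first out)."""
        return self._tokens.popleft()
```

Because tokens are appended in the order the input decoding module issues them and removed only from the head, the pop order always equals the issue order, which is the property the in-order commit relies on.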
The output decoding module 4 is configured to output the floating-point conversion results according to the instruction token thread order. The output decoding module 4 includes: a polling module 41, configured to poll the functional sub-modules and obtain the completion signal indicating that a functional sub-module has generated a floating-point conversion result; an output module 42, configured to output the floating-point conversion result according to the instruction token thread order sent by the input decoding module and the completion signal; and an exception handling unit 43, configured to receive exception data generated by the functional sub-modules.
To ensure that the floating-point conversion results of the whole coprocessor are committed in order, the output decoding module 4 commits the oldest instruction in the pipeline, that is, the instruction at the top of the FIFO in the instruction scheduling module 3. At the same time, the polling module 41 polls the completion signals from the matched functional sub-modules. After a functional sub-module completes its conversion and generates a floating-point conversion result, the result is read, and the exception data from the functional sub-module is stored in the exception handling unit 43; the result may be stored, for example, in an output FIFO with a depth of 2. Finally, the output module 42 outputs the floating-point conversion result according to the instruction token thread order sent by the input decoding module and the completion signal. The floating-point conversion result may be implemented as an FPU response packet that contains only the converted data, which may be a 32-bit (single-precision floating-point or integer) or 64-bit (double-precision floating-point) result.
Illustratively, as shown in FIG. 3, the instruction at the top of the instruction scheduling module FIFO is instruction 1, and the functional sub-module that executes instruction 1 is functional sub-module 1; the instruction after instruction 1 is instruction 2, which is executed by functional sub-module 2. Suppose functional sub-module 1 has not yet finished instruction 1, but functional sub-module 2 has already finished instruction 2. Even so, the output decoding module does not store the response of instruction 2 in the output FIFO; it must wait until functional sub-module 1 finishes instruction 1, store the response of instruction 1 in the output FIFO, and only then store the response of instruction 2. The output decoding module needs to know which instruction is to be committed next, so it first reads the head instruction of the scheduling FIFO; by the nature of a FIFO, the instruction is removed from the FIFO once it has been read. After instruction 1 is removed, instruction 2 becomes the top instruction of the instruction scheduling module FIFO.
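The instruction 1 / instruction 2 scenario above can be replayed with a short Python sketch. It is illustrative only: `commit_in_order`, the `(token, submodule)` tuples, and the `done` dictionary standing in for the polled completion signals are all hypothetical names, and the sketch peeks at the head token and removes it once its sub-module reports completion, which is a simplification of the read-then-wait behavior described in the text.

```python
from collections import deque


def commit_in_order(token_fifo: deque, done: dict) -> list:
    """Commit finished results strictly in token (program) order.

    token_fifo holds (token, submodule) pairs, oldest first.
    done maps a sub-module name to its finished result and models the
    completion signals seen by the polling module (hypothetical layout).
    """
    committed = []
    while token_fifo:
        token, submodule = token_fifo[0]      # oldest outstanding instruction
        if submodule not in done:
            break                             # head not finished: younger results wait
        token_fifo.popleft()                  # head leaves the scheduling FIFO
        committed.append((token, done.pop(submodule)))
    return committed
```

With instruction 2 finished but instruction 1 still running, a call returns an empty list; once sub-module 1 reports completion, the next call returns the responses of instruction 1 and then instruction 2, in that order.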
The coprocessor provided by this embodiment therefore executes floating-point conversion operations out of order across the plurality of functional sub-modules, which reduces execution time, and then uses the order stored in the instruction scheduling module to implement an in-order commit mechanism. As a result, the floating-point processing latency need not be fixed at the longest floating-point processing delay, and results need not wait until every functional sub-module has finished before being committed. Integer, single-precision and double-precision data operands can thus be processed out of order and committed in order, which helps to improve the floating-point conversion processing speed.
Example two
Referring to fig. 4, fig. 4 is a schematic diagram of another coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention. The coprocessor for implementing floating-point out-of-order conversion can be applied to a RISC-V instruction set, and the application range of the coprocessor is not limited by the embodiment of the invention. As shown in fig. 4, the coprocessor for implementing floating-point out-of-order conversion includes:
the device comprises an input temporary storage module 5, an input decoding module 1, a plurality of functional sub-modules 2, an instruction scheduling module 3, an output decoding module 4 and an output temporary storage module 6. The implementation manners of the input decoding module 1, the plurality of functional sub-modules 2, the instruction scheduling module 3, and the output decoding module 4 are substantially the same as those in the above embodiments, and are not described herein again.
The input temporary storage module 5 is a FIFO memory. Floating-point operation instructions are stored in the input temporary storage module 5, and the input decoding module 1 acquires them directly from it. The depth of the FIFO memory, that is, its storage capacity, can be set as required. Illustratively, the acquired floating-point operation instructions are stored in a FIFO with a depth of 2, so that at most two instructions are cached in the input temporary storage module 5, in order to handle frequent floating-point conversion operations. If the input temporary storage module 5 is full, the coprocessor as a whole is reported to the outside as busy and temporarily accepts no new floating-point conversion instructions.
Furthermore, for the input decoding module, if the corresponding functional sub-module is idle after the decoding operation, the instruction can be executed directly. If the functional sub-module is already occupied, the instruction is temporarily not executed, the input decoding module is marked as busy, and no further instruction is fetched from the input temporary storage module. When the current instruction has been executed, the input decoding module is released and fetches the next floating-point conversion instruction from the input temporary storage module 5 for decoding.
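The busy and stall behavior of the input side can be sketched as follows. This is a minimal model under assumptions: `InputStage`, `try_dispatch`, the `busy_submodules` set, and the `target_of` callback are hypothetical names, and the depth of 2 is the illustrative value from this embodiment.

```python
from collections import deque


class InputStage:
    """Hypothetical model of the input temporary storage FIFO (depth 2)."""

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.fifo = deque()

    @property
    def busy(self) -> bool:
        # A full FIFO is reported to the outside as "coprocessor busy".
        return len(self.fifo) >= self.depth

    def accept(self, instr) -> bool:
        """Cache a new instruction, or refuse it while busy."""
        if self.busy:
            return False
        self.fifo.append(instr)
        return True


def try_dispatch(stage: InputStage, busy_submodules: set, target_of):
    """Issue the head instruction only if its target sub-module is idle.

    Returns (instr, target) on success, or None when the FIFO is empty
    or the decoder must stall because the target is occupied.
    """
    if not stage.fifo:
        return None
    instr = stage.fifo[0]
    target = target_of(instr)
    if target in busy_submodules:
        return None                  # stall: the instruction stays in the FIFO
    stage.fifo.popleft()
    busy_submodules.add(target)      # sub-module is occupied until it finishes
    return instr, target
```

A stalled dispatch leaves the instruction at the head of the FIFO, so once the sub-module is released the same instruction is issued next and program order is preserved.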
The output temporary storage module 6 is a FIFO memory and is configured to cache the floating-point number conversion result output by the output decoding module, where the floating-point number conversion result includes a 32-bit single-precision floating-point or integer result, or a 64-bit double-precision floating-point result.
The coprocessor provided by this embodiment therefore executes floating-point conversion operations out of order across the plurality of functional sub-modules, which reduces execution time, and then uses the order stored in the instruction scheduling module to implement an in-order commit mechanism. As a result, the floating-point processing latency need not be fixed at the longest floating-point processing delay, and results need not wait until every functional sub-module has finished before being committed. In addition, the input temporary storage module and the output temporary storage module organize the receiving of instructions and the output of results, which further improves the processing efficiency of the whole coprocessor. Integer, single-precision and double-precision data operands can thus be processed out of order and committed in order, which helps to improve the floating-point conversion processing speed.
Example three
Referring to fig. 5, fig. 5 is a schematic structural diagram of a coprocessor for implementing floating-point out-of-order conversion according to an embodiment of the present invention. The coprocessor described in fig. 5 may be applied to a RISC-V instruction set, and the embodiment of the present invention does not limit the instruction set to which the coprocessor is applied. As shown in fig. 5, the coprocessor may include:
a memory 601 in which executable program code is stored;
a processor 602 coupled to a memory 601;
the processor 602 calls the executable program code stored in the memory 601 for executing the co-processing for implementing floating point out-of-order conversion as described in the first embodiment.
Example four
The embodiment of the invention discloses a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the functions of the coprocessor for implementing floating-point out-of-order conversion described in the first embodiment.
Example five
An embodiment of the invention discloses a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the functions of the coprocessor for implementing floating-point out-of-order conversion described in the first or second embodiment.
The above-described embodiments are only illustrative, and the modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above detailed description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. Based on such understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, where the storage medium includes a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disk memory, tape memory, or any other computer-readable medium that can be used to carry or store data.
Finally, it should be noted that the coprocessor for implementing floating-point out-of-order conversion disclosed in the embodiments of the present invention is only a preferred embodiment of the present invention and is used only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A coprocessor for implementing floating-point out-of-order conversion, the coprocessor comprising:
an input decoding module, configured to acquire a floating-point number operation instruction, decode the floating-point number operation instruction to generate floating-point number information and an instruction token marking the currently executed instruction, match the corresponding functional sub-module according to the floating-point number information, and transmit the instruction token to the instruction scheduling module recited below;
a plurality of functional sub-modules, configured to execute floating-point number conversion operations out of order according to the floating-point number information and to generate floating-point number conversion results;
an instruction scheduling module, configured to store the instruction token thread order sent by the input decoding module;
an output decoding module, configured to output the floating-point number conversion results according to the instruction token thread order; wherein the instruction at the top of the FIFO in the instruction scheduling module is submitted to the output decoding module, and the output decoding module polls the completion signal of each matched functional sub-module, reads the floating-point number conversion result after the functional sub-module completes the conversion and generates that result, and outputs the floating-point number conversion result according to the instruction token thread order sent by the input decoding module and the completion signal.
2. The coprocessor of claim 1, wherein the input decoding module comprises: a judging unit, configured to perform operand rule judgment on the floating-point number operation instruction and to output the floating-point number operation instructions that conform to the operand rule;
an analysis unit, configured to parse the floating-point number operation instruction that conforms to the operand rule and to generate a function field and an operation code;
a matching unit, configured to match the corresponding functional sub-module according to the function field and to output to the corresponding functional sub-module;
and a thread unit, configured to generate the instruction token according to the floating-point number operation instruction executed by the current thread.
3. The coprocessor for implementing floating-point out-of-order conversion according to claim 2, wherein the functional sub-modules comprise at least a thread for executing conversion between integers and single-precision floating-point numbers, a thread for executing conversion between integers and double-precision floating-point numbers, and a thread for executing conversion between single-precision floating-point numbers and double-precision floating-point numbers.
4. The coprocessor of claim 3, wherein the plurality of functional sub-modules operate independently of one another, and the plurality of functional sub-modules operate in parallel with the instruction scheduling module in the same pipeline.
5. The coprocessor of claim 1, wherein the output decoding module comprises: a polling module, configured to poll the functional sub-modules and to acquire the completion signal indicating that a functional sub-module has generated a floating-point number conversion result;
and an output module, configured to output the floating-point number conversion result according to the instruction token thread order sent by the input decoding module and the completion signal.
6. The coprocessor of claim 5, wherein the output decoding module further comprises:
an exception processing unit, configured to receive exception data generated by the functional sub-modules.
7. The coprocessor for implementing floating-point out-of-order conversion according to claim 2, further comprising:
an input temporary storage module, configured to cache floating-point number operation instructions;
wherein the input decoding module acquires the floating-point number operation instructions from the input temporary storage module.
8. The coprocessor for implementing floating-point out-of-order conversion according to claim 5, further comprising:
an output temporary storage module, configured to cache the floating-point number conversion result output by the output decoding module, wherein the floating-point number conversion result comprises a 32-bit single-precision floating-point or integer result, or a 64-bit double-precision floating-point result.
9. The coprocessor for implementing floating-point out-of-order conversion according to any one of claims 1 to 8,
wherein the input temporary storage module, the instruction scheduling module and the output temporary storage module are FIFO modules.
10. An apparatus for out-of-order conversion of floating-point numbers, the apparatus comprising:
a buffer, configured to store floating-point number operation instructions;
and a coprocessor coupled with the buffer;
wherein the coprocessor is the coprocessor for implementing floating-point out-of-order conversion as claimed in any one of claims 1 to 9.
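As an informal illustration of the scheme recited in claims 1, 5 and 9 (out-of-order execution in the functional sub-modules, in-order output driven by a token FIFO), the following is a minimal software sketch, not the patented hardware. The operation names (`i2f`, `i2d`, `f2d`, `d2f`), the cycle latencies, the one-instruction-per-cycle issue rate and the function `run` are all assumptions made for this example; none of them come from the claims.

```python
import struct
from collections import deque

# Hypothetical per-operation latencies in cycles (illustrative values only).
LATENCY = {"i2f": 1, "i2d": 1, "f2d": 1, "d2f": 4}

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Stand-ins for the functional sub-modules: one conversion routine per kind.
CONVERT = {
    "i2f": lambda v: to_single(float(v)),  # integer -> single precision
    "i2d": float,                          # integer -> double precision
    "f2d": lambda v: float(to_single(v)),  # single -> double precision
    "d2f": to_single,                      # double -> single precision
}

def run(program):
    """Simulate issue, out-of-order completion and in-order output.

    program is a list of (operation, operand) pairs. Returns the order in
    which tokens finished and the list of (token, result) pairs as output.
    """
    pending = deque(enumerate(program))  # token = position in program order
    token_fifo = deque()                 # instruction scheduling module (FIFO)
    in_flight = {}                       # token -> (done_cycle, result)
    finished, committed = [], []
    cycle = 0
    while pending or token_fifo:
        # Input decoding: issue one instruction per cycle and record its token.
        if pending:
            token, (op, value) = pending.popleft()
            token_fifo.append(token)
            # Simplification: the result is computed now and only becomes
            # visible once done_cycle is reached.
            in_flight[token] = (cycle + LATENCY[op], CONVERT[op](value))
        # Note which sub-modules raise their completion signal this cycle.
        for token, (done, _) in in_flight.items():
            if done == cycle:
                finished.append(token)
        # Output decoding: poll only the token at the head of the FIFO.
        if token_fifo:
            head = token_fifo[0]
            done, result = in_flight[head]
            if done <= cycle:
                token_fifo.popleft()
                del in_flight[head]
                committed.append((head, result))
        cycle += 1
    return finished, committed

finished, committed = run([("d2f", 0.1), ("i2f", 3), ("i2d", 7)])
print(finished)                    # [1, 2, 0]
print([t for t, _ in committed])   # [0, 1, 2]
```

With these assumed latencies the slow double-to-single conversion (token 0) finishes last, yet its result is output first, because the output side only ever releases the token at the head of the FIFO.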

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111473279.1A CN113867682B (en) 2021-12-06 2021-12-06 Coprocessor for realizing floating-point number out-of-order conversion

Publications (2)

Publication Number Publication Date
CN113867682A CN113867682A (en) 2021-12-31
CN113867682B CN113867682B (en) 2022-02-22

Family

ID=78986076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111473279.1A Active CN113867682B (en) 2021-12-06 2021-12-06 Coprocessor for realizing floating-point number out-of-order conversion

Country Status (1)

Country Link
CN (1) CN113867682B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108027750A (en) * 2015-09-19 2018-05-11 微软技术许可有限责任公司 Out of order submission
CN111966406A (en) * 2020-08-06 2020-11-20 北京微核芯科技有限公司 Method and device for scheduling out-of-order execution queue in out-of-order processor

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6094719A (en) * 1997-06-25 2000-07-25 Sun Microsystems, Inc. Reducing data dependent conflicts by converting single precision instructions into microinstructions using renamed phantom registers in a processor having double precision registers
US7734901B2 (en) * 2005-10-31 2010-06-08 Mips Technologies, Inc. Processor core and method for managing program counter redirection in an out-of-order processor pipeline
US9262140B2 (en) * 2008-05-19 2016-02-16 International Business Machines Corporation Predication supporting code generation by indicating path associations of symmetrically placed write instructions
CN101645017A (en) * 2009-09-07 2010-02-10 深圳市茁壮网络股份有限公司 Cross-platform byte order processing method, device and byte code running platform
US10678544B2 (en) * 2015-09-19 2020-06-09 Microsoft Technology Licensing, Llc Initiating instruction block execution using a register access instruction
US20190220278A1 (en) * 2019-03-27 2019-07-18 Menachem Adelman Apparatus and method for down-converting and interleaving multiple floating point values
CN111198715A (en) * 2019-12-26 2020-05-26 核芯互联科技(青岛)有限公司 Out-of-order high-performance core-oriented memory controller command scheduling method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant