CN113835677A

CN113835677A - Operand processing system, method and processor

Info

Publication number: CN113835677A
Application number: CN202111116342.6A
Authority: CN
Inventors: 杨灿; 郑雅文; 邢金璋
Original assignee: Loongson Technology Corp Ltd
Current assignee: Loongson Technology Corp Ltd
Priority date: 2021-09-23
Filing date: 2021-09-23
Publication date: 2021-12-24

Abstract

The invention provides an operand processing system, a method and a processor. The preprocessing module determines the width to be rounded of the source operand according to the precision conversion instruction, the precision value of the target format and the precision value of the source operand; the random number generator generates a random number according to the width to be rounded, wherein the width of the random number is the same as the width to be rounded; the shifter aligns the random number with the lowest bits of the source operand to obtain an aligned number; the adder adds the source operand and the aligned number to obtain an addition result; the multiplexer truncates the lowest bits of the addition result to obtain the target operand in the target format. In the invention, all operations in the unbiased rounding process are completed by pure hardware, so that the number of instructions in the whole process is reduced, the calculation time is shortened, and the processing efficiency is improved.

Description

Operand processing system, method and processor

Technical Field

The embodiment of the invention relates to the technical field of communication, in particular to an operand processing system, an operand processing method and a processor.

Background

In the fields of machine learning, financial analysis and the like, due to the requirement on the model precision of a relevant algorithm model, operands in the model are often required to be optimized, so that the model calculation efficiency is improved as much as possible on the premise of ensuring the model precision.

In the related art, operands in a model may be optimized as follows: a high-precision operand in the model is converted into a low-precision operand by unbiased rounding, which reduces the computing resources required by the model and improves efficiency. At present, unbiased rounding of an operand is implemented purely by software simulation: an operand with a first precision value is taken as input, each operation in the unbiased rounding process is implemented by multiple instructions, and after multi-step calculation a target operand is obtained whose precision value is a second precision value lower than the first precision value, which completes the unbiased rounding operation.

However, in the current scheme, unbiased rounding of operands is implemented in a pure software simulation manner, and due to too many instructions adopted in the implementation process, the whole process consumes a lot of time, and the processing efficiency is seriously reduced.

Disclosure of Invention

Embodiments of the present invention provide an operand processing system, a method and a processor, so as to solve the problem in the related art that a large amount of time is consumed in the whole process and the processing efficiency is seriously reduced due to unbiased rounding of operands implemented in a pure software simulation manner.

In a first aspect, an operand processing system is provided, the system comprising:

a preprocessing module, a random number generator, a shifter, an adder and a multiplexer, wherein the preprocessing module, the random number generator, the shifter, the adder and the multiplexer are sequentially connected;

the preprocessing module is used for: acquiring a precision conversion instruction, and determining the width to be rounded of a source operand according to the precision conversion instruction and the precision value of a target format and the precision value of the source operand;

the random number generator is configured to: generating a random number according to the width to be rounded, wherein the width of the random number is the same as the width to be rounded;

the shifter is used for: aligning the random number with the lowest order bits of the source operand to obtain an aligned number;

the adder is configured to: adding the source operand and the alignment number to obtain an addition result;

the multiplexer is configured to: truncating the lowest bits of the addition result to obtain the target operand in the target format.

Optionally, in the case that the source format is a fixed point number format; the preprocessing module comprises: a subtractor submodule;

the preprocessing module is specifically configured to: and calculating the difference value between the precision value of the source operand and the precision value of the target format through the subtracter submodule according to the precision conversion instruction, and determining the difference value as the width to be rounded.

Optionally, in the case when the source format is a floating point format; the preprocessing module comprises: a format conversion submodule and a subtracter submodule;

the preprocessing module is specifically configured to:

converting the format of the source operand into a fixed point number format through the format conversion submodule according to the precision conversion instruction to obtain a new source operand;

determining the width to be rounded to 0 if the precision value of the new source operand is less than or equal to the precision value of the target format;

and under the condition that the precision value of the new source operand is larger than the precision value of the target format, calculating the difference value of the precision value of the new source operand and the precision value of the target format through the subtracter submodule, and determining the difference value as the width to be rounded.

Optionally, the shifter is specifically configured to: and aligning the random number with the source operand to obtain an aligned number with the same width as the source operand, wherein the lowest bit of the random number is placed at the position of the lowest bit of the aligned number, and the other bits of the aligned number except the random number of the lowest bit are set to be 0.

Optionally, the adder is specifically configured to: and after aligning the decimal point of the alignment number with the decimal point of the source operand, performing addition operation of the alignment number and the source operand to obtain an addition result.

Optionally, the multiplexer is specifically configured to: and truncating the lowest bit corresponding to the random number in the addition result to obtain the target operand.

Optionally, the precision converting instruction includes: the first register identification of the source register, the second register identification of the destination register, the third register identification of the source operand precision indication register and the fourth register identification of the destination operand precision indication register; the preprocessing module comprises: a decoding sub-module;

the preprocessing module is further configured to: decoding the precision conversion instruction by the decoding submodule to obtain the first register identifier, the second register identifier, the third register identifier and the fourth register identifier;

obtaining the source operand from the source register according to the first register identifier;

according to the third register identification, obtaining the precision value of the source operand from the source operand precision indication register;

and obtaining the precision value of the target format from the destination operand precision indication register according to the fourth register identifier.

Optionally, after the multiplexer truncates the lowest bits of the addition result to obtain the target operand in the target format, the multiplexer is further configured to: writing the target operand into the destination register according to the second register identification.

Optionally, the target format has a preset expressible range;

replacing the value of the source operand with a maximum value of the expressible range when the value of the source operand is greater than or equal to the maximum value;

replacing the value of the source operand with the minimum value when the value of the source operand is less than or equal to the minimum value of the expressible range.

Optionally, the random number generator includes: a linear feedback shift register.

In a second aspect, a processor is provided, the processor comprising:

the device comprises an instruction fetching module, a decoding module and a function module;

the instruction fetching module is used for: acquiring a precision conversion instruction;

the decoding module is configured to: decoding the precision conversion instruction to obtain a source operand in the source format, the target format and a precision value of the target format;

the functional module is used for: determining the width to be rounded of the source operand according to the precision value of the target format and the precision value of the source operand;

generating a random number according to the width to be rounded, wherein the width of the random number is the same as the width to be rounded;

aligning the random number with the lowest order bits of the source operand to obtain an aligned number;

adding the source operand and the alignment number to obtain an addition result;

and truncating the lowest bit of the addition result to obtain the target operand in the target format.

Optionally, the functional module includes:

the device comprises a preprocessing module, a random number generator, a shifter, an adder and a multiplexer;

the preprocessing module is used for: determining the width to be rounded of the source operand according to the precision value of the target format and the precision value of the source operand;

the preprocessing module is specifically configured to:

Optionally, the precision converting instruction includes: the first register identification of the source register, the second register identification of the destination register, the third register identification of the source operand precision indication register and the fourth register identification of the destination operand precision indication register;

the decoding module is specifically configured to:

performing a decoding operation on the precision conversion instruction to obtain the first register identifier, the second register identifier, the third register identifier and the fourth register identifier;

the functional module is specifically configured to:

Optionally, the processor further includes a write-back module, where the write-back module is configured to: writing the target operand into the destination register according to the second register identification.

Optionally, the target format has a preset expressible range;

In a third aspect, an operand processing method is provided, where the method is applied to the operand processing system, and the method includes:

acquiring a precision conversion instruction, and determining the width to be rounded of a source operand according to the precision conversion instruction and the precision value of a target format and the precision value of the source operand;

Optionally, in the case that the source format is a fixed point number format; the determining, according to the precision conversion instruction, the to-be-rounded width of the source operand according to the precision value of the target format and the precision value of the source operand includes:

and calculating the difference value between the precision value of the source operand and the precision value of the target format according to the precision conversion instruction, and determining the difference value as the width to be rounded.

Optionally, when the source format is a floating point format, the determining, according to the precision conversion instruction, a to-be-rounded width of the source operand according to the precision value of the target format and the precision value of the source operand includes:

converting the format of the source operand into a fixed point number format according to the precision conversion instruction to obtain a new source operand;

and calculating a difference value between the precision value of the new source operand and the precision value of the target format under the condition that the precision value of the new source operand is larger than the precision value of the target format, and determining the difference value as the width to be rounded.

Optionally, the aligning the random number with the lowest order bits of the source operand to obtain an aligned number includes:

and aligning the random number with the source operand to obtain an aligned number with the same width as the source operand, wherein the lowest bit of the random number is placed at the position of the lowest bit of the aligned number, and the other bits of the aligned number except the random number of the lowest bit are set to be 0.

Optionally, the summing the source operand and the alignment number to obtain a summed result includes:

and after aligning the decimal point of the alignment number with the decimal point of the source operand, performing addition operation of the alignment number and the source operand to obtain an addition result.

Optionally, the truncating the least significant bit of the sum to obtain the target operand in the target format includes:

and truncating the lowest bit corresponding to the random number in the addition result to obtain the target operand.

the determining, according to the precision conversion instruction, the to-be-rounded width of the source operand according to the precision value of the target format and the precision value of the source operand includes:

according to the fourth register identification, obtaining the precision value of the target format from the destination operand precision indication register;

and determining the width to be rounded of the source operand according to the precision value of the target format and the precision value of the source operand.

Optionally, after truncating the least significant bits of the sum to obtain the target operand in the target format, the method further includes:

and writing the target operand into the destination register according to the second register identification.

In an embodiment of the invention, a system comprises a preprocessing module, a random number generator, a shifter, an adder and a multiplexer. The preprocessing module is used for acquiring a precision conversion instruction and determining the width to be rounded of a source operand according to the precision conversion instruction, the precision value of a target format and the precision value of the source operand; the random number generator is used for generating a random number according to the width to be rounded, wherein the width of the random number is the same as the width to be rounded; the shifter is used for aligning the random number with the lowest bits of the source operand to obtain an aligned number; the adder is used for adding the source operand and the aligned number to obtain an addition result; the multiplexer is used for truncating the lowest bits of the addition result to obtain the target operand in the target format. In the invention, a high-precision source operand can be converted into a low-precision target operand by unbiased rounding with a single precision conversion instruction executed in pure hardware; all operations in the unbiased rounding process are completed by hardware, so that the number of instructions in the whole process is reduced, the calculation time is shortened, and the processing efficiency is improved.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is an architecture diagram of an operand processing system according to an embodiment of the present invention;

FIG. 2 is an architecture diagram of a processor according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating steps of a method for processing operands according to an embodiment of the present invention;

Detailed Description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

Common rounding modes, including round to nearest, round toward 0, round toward plus infinity and round toward minus infinity, all have a fixed, i.e. biased, rounding direction. Unlike these common modes, unbiased rounding implements a rounding operation with no bias: whether the rounded result is larger or smaller than the source operand is random. In the machine learning field, it has been found that first reducing the precision of operands by unbiased rounding has no significant effect on the precision of the application model. In addition, a model obtained with unbiased rounding is more precise and more stable than one obtained with common biased rounding. Unbiased rounding is a mature rounding mode in machine learning: using a generated random number, it performs shift, addition and low-order truncation operations on a source operand of higher precision to finally obtain a target operand of lower precision. The rounding process does not consistently round up or round down (i.e. it is unbiased), and in the field of machine learning the target operand obtained by unbiased rounding does not greatly affect the output precision of the model to which it is applied.

Fig. 1 is an architecture diagram of an operand processing system according to an embodiment of the present invention, and as shown in fig. 1, the system may include:

a preprocessing module 10, a random number generator 20, a shifter 30, an adder 40, and a multiplexer 50. The preprocessing module 10, the random number generator 20, the shifter 30, the adder 40, and the multiplexer 50 are connected in sequence to form a peripheral circuit, which can operate independently of the processor.

In one embodiment, "connected" may be used to indicate that two or more elements/modules are in direct physical or electrical contact with each other.

Specifically, the preprocessing module 10 is configured to: acquire the precision conversion instruction, and obtain the source operand in the source format, the target format and the precision value of the target format by decoding the precision conversion instruction, wherein the source format can be a fixed-point format or a floating-point format, and the precision of the source operand is higher than that of the target format.

In order to effectively reduce the requirement of the algorithm model on hardware on the basis of ensuring the calculation accuracy of the algorithm model and reduce the calculation resources required by the algorithm model, the embodiment of the invention needs to convert the source operand in the source format into the target operand in the target format with lower accuracy in an unbiased rounding mode. Unbiased rounding generally involves generating a random number, summing the random number with the input value to produce a result of greater width, and finally truncating the result of greater width to produce the final result.
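
The add-then-truncate procedure described above can be sketched in software to show why it is unbiased. The following is only an illustrative model, not the hardware of the embodiment: the function name unbiased_round and the sample values are assumptions, and operands are modelled as non-negative raw integers.

```python
def unbiased_round(value: int, m_roundoff: int, rand: int) -> int:
    """Add an m_roundoff-bit random number, then drop the m_roundoff lowest bits."""
    if not 0 <= rand < (1 << m_roundoff):
        raise ValueError("rand must fit in m_roundoff bits")
    return (value + rand) >> m_roundoff


# Illustrative values only: a raw value rounded by 9 bits.
value, m_roundoff = 6501, 9

# Enumerate every possible random number instead of sampling.
results = [unbiased_round(value, m_roundoff, r) for r in range(1 << m_roundoff)]

# Each outcome is one of the two neighbouring representable values ...
print(sorted(set(results)))
# ... and the average over all equally likely random numbers equals
# the exact quotient value / 2**m_roundoff, i.e. there is no bias.
print(sum(results) / len(results) == value / (1 << m_roundoff))
```

Because the sum of the results over all 2^m_roundoff random numbers equals the original value, the expected rounded value equals the exact value, provided the random numbers are uniformly distributed.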

In the embodiment of the present invention, a source operand, precision indication information of the source operand, and precision indication information of a destination operand may be obtained by parsing a precision conversion instruction, where the precision indication information of the source operand is used to reflect a precision value of a source format of the source operand, and the precision indication information of the destination operand is used to reflect a precision value of a target format.

The source operand can be obtained by parsing in three ways:

Mode 1: a register is provided for the preprocessing module, the source operand is stored in the register, and the register identifier of the register is added to the precision conversion instruction. The preprocessing module can obtain the register identifier by parsing the precision conversion instruction and can then extract the source operand directly from the register through the register identifier. The operand processing method provided by the embodiment of the invention can be implemented by a processor or a peripheral circuit. A register is a high-speed memory designed for fast instruction reading and writing by the processor; it can therefore reside in the processor or in the peripheral circuit, can temporarily store binary data, and has a high read speed.

Mode 2: the source operand is stored in a preset storage area, and the address of the storage area is added to the precision conversion instruction. The preprocessing module parses the precision conversion instruction to obtain the address of the source operand and extracts the source operand from the storage area through the address. The preset storage area may be a storage area such as a memory or a flash memory.

Mode 3: the source operand is encoded directly in the precision conversion instruction according to a preset encoding mode, and the preprocessing module decodes the precision conversion instruction according to the decoding mode corresponding to the encoding mode to obtain the source operand encoded in the precision conversion instruction.

Further, the precision indication information of the source operand can be added to the precision conversion instruction in two ways. In the first way, the precision indication information of the source operand is encoded directly in the precision conversion instruction; see mode 3 above. In the second way, the register identifier of a register storing the precision indication information of the source operand is added to the precision conversion instruction; see mode 1 above.

Further, the precision indication information of the target operand can be added to the precision conversion instruction in two ways. In the first way, the precision indication information of the target operand is encoded directly in the precision conversion instruction; see mode 3 above. In the second way, the register identifier of a register storing the precision indication information of the target operand is added to the precision conversion instruction; see mode 1 above.
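
The register-identifier approach can be illustrated with a small software model. The instruction layout below (four 5-bit register identifiers packed into the low 20 bits of a word), the names PrecisionConvert and decode, and the register contents are all hypothetical and chosen only for illustration; the embodiment does not specify an encoding.

```python
from typing import NamedTuple


class PrecisionConvert(NamedTuple):
    src_reg: int       # register holding the source operand
    dst_reg: int       # register that receives the target operand
    src_prec_reg: int  # register holding the precision value of the source operand
    dst_prec_reg: int  # register holding the precision value of the target format


def decode(word: int) -> PrecisionConvert:
    """Extract four register identifiers from a hypothetical instruction word."""
    return PrecisionConvert(
        src_reg=word & 0x1F,
        dst_reg=(word >> 5) & 0x1F,
        src_prec_reg=(word >> 10) & 0x1F,
        dst_prec_reg=(word >> 15) & 0x1F,
    )


# Hypothetical register file: r1 holds the raw source bits, r3 and r4 the precisions.
registers = {1: 0b1011100101100101, 3: 13, 4: 4}

fields = decode(1 | (2 << 5) | (3 << 10) | (4 << 15))
source_operand = registers[fields.src_reg]
src_precision = registers[fields.src_prec_reg]
dst_precision = registers[fields.dst_prec_reg]
print(fields, source_operand, src_precision, dst_precision)
```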

The preprocessing module 10 determines the width to be rounded of the source operand according to the precision conversion instruction and through the precision value of the target format and the precision value of the source operand.

In particular, a fixed-point number may be recorded as <ix, fx>, where ix is the data of the integer part, located to the left of the decimal point, and fx is the data of the fractional part, located to the right of the decimal point. Let the length of ix be IL and the length of fx be FL; then <ix, fx> is a fixed-point number of length IL + FL. For example, the fixed-point number <110, 10011> represents the binary number (110.10011)₂, which converts to the decimal value -2^2 + 2^1 + 2^-1 + 2^-4 + 2^-5, and the length of the fixed-point number <110, 10011> is IL + FL = 3 + 5 = 8.
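
The <ix, fx> notation can be checked with a short sketch. It assumes a two's-complement interpretation, which is inferred from the negative weight of the leading bit in the example above; the function name fixed_point_value is hypothetical.

```python
def fixed_point_value(ix: str, fx: str) -> float:
    """Decimal value of the fixed-point number <ix, fx>, read as two's complement."""
    bits = ix + fx
    raw = int(bits, 2)
    if bits[0] == "1":           # leading bit carries a negative weight
        raw -= 1 << len(bits)
    return raw / (1 << len(fx))  # scale by 2**-FL to place the binary point


ix, fx = "110", "10011"
print(len(ix) + len(fx))          # total length IL + FL
print(fixed_point_value(ix, fx))  # -2^2 + 2^1 + 2^-1 + 2^-4 + 2^-5
```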

In the embodiment of the invention, the width of a numerical value may be the number of digits to the right of the decimal point in the value. The input source operand may be a fixed-point number or a floating-point number. When the source operand is a fixed-point number, it can be written as <isrc, fsrc>, with precision input_precision = FL(fsrc).

In the embodiment of the invention, a user can preset the precision value of the target format. The preprocessing module obtains the precision value of the source operand as indicated by the precision conversion instruction, and determines the width to be rounded of the source operand by using the precision value of the target format and the precision value of the source operand, wherein the width to be rounded can be understood as the number of bits of the source operand that need to be rounded off.

For example, the user may preset the target format as a fixed-point number format, with the target operand recorded as <idest, fdest>, and the length FL(idest) of the integer part and the length FL(fdest) of the fractional part of the target format both being 4 bits; that is, the precision requirement of the target format on the target operand is that the fractional part is 4 bits long. The preprocessing module receives the source operand in fixed-point number format as (101.1100101100101)₂, whose precision value FL(fsrc) is 13 bits, and the preprocessing module may calculate the width to be rounded of the source operand by using a built-in subtractor: m_roundoff = FL(fsrc) - FL(fdest) = 13 - 4 = 9.
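
The subtraction performed by the preprocessing module can be sketched as follows. The function name width_to_round is hypothetical; clamping the result to 0 follows the case described for a converted floating-point operand whose precision does not exceed that of the target format, and is applied here to all inputs as a simplifying assumption.

```python
def width_to_round(src_precision: int, dst_precision: int) -> int:
    """Number of low-order fractional bits to discard (0 if nothing needs rounding)."""
    return max(src_precision - dst_precision, 0)


source = "101.1100101100101"
fl_src = len(source.split(".")[1])      # FL(fsrc): bits right of the binary point
m_roundoff = width_to_round(fl_src, 4)  # FL(fdest) = 4 in the example
print(fl_src, m_roundoff)
```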

The random number generator 20 generates a random number according to the width to be rounded, and the width of the random number is the same as the width to be rounded.

In the embodiment of the present invention, the random number generator may be implemented by using a linear feedback shift register, which is a logic circuit formed by a register, an xor gate, and the like, and is essentially a "pseudo random number generator" that can generate pseudo random numbers quickly and conveniently.
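
A software model of a linear feedback shift register is sketched below. The 16-bit width, the tap positions (16, 14, 13, 11, a commonly cited maximal-length configuration), the seed and the class name Lfsr16 are assumptions made for illustration; the embodiment does not specify them.

```python
class Lfsr16:
    """16-bit Fibonacci LFSR; the tap choice and seed are illustrative assumptions."""

    def __init__(self, seed: int = 0xACE1) -> None:
        if not 0 < seed < (1 << 16):
            raise ValueError("seed must be a non-zero 16-bit value")
        self.state = seed

    def step(self) -> int:
        """Advance one clock and return the bit shifted out."""
        out = self.state & 1
        # XOR of the tapped bits forms the feedback bit.
        feedback = (self.state ^ (self.state >> 2)
                    ^ (self.state >> 3) ^ (self.state >> 5)) & 1
        self.state = (self.state >> 1) | (feedback << 15)
        return out

    def random_bits(self, width: int) -> int:
        """Collect `width` output bits into one pseudo-random number."""
        value = 0
        for _ in range(width):
            value = (value << 1) | self.step()
        return value


generator = Lfsr16()
rand = generator.random_bits(9)  # width equals the width to be rounded
print(format(rand, "09b"))
```

Stepping the register once per requested bit keeps the width of the generated number equal to the width to be rounded, at the cost of one clock per bit in this model.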

Alternatively, the random number generator may be implemented as a hardware random number generator, which is a device that generates random numbers from a physical process rather than a computer program. Such devices are generally based on microscopic phenomena that produce low-level, statistically random "noise" signals, which are, in theory, completely unpredictable.

Hardware random number generators typically include: a basic circuit, and a converter and an amplifier arranged on the basic circuit. The converter is used for converting physical phenomena (such as noise signals) into electric signals, the amplifier can amplify the electric signals obtained after conversion, so that the amplitude of the electric signals which fluctuate randomly is increased to a measurable level, and the hardware random number generator can obtain a series of random numbers by repeatedly sampling the signals which change randomly.

In the embodiment of the invention, a random number can be directly generated according to the width to be rounded through a random number generator, and the precision of the random number is controlled so that the width of the random number is the same as the width to be rounded.

Shifter 30 aligns the random number with the lowest order bits of the source operand, resulting in an aligned number.

Specifically, the shifter is a circuit for performing shift operation on a numerical value, and is capable of performing operations such as shifting and aligning of the numerical value.

The shifter may align the random number rand with the lowest bits of the source operand <isrc, fsrc> by placing rand on the lowest m_roundoff bits of a number with the same length as the source operand, Qx = FL(isrc) + FL(fsrc), and filling the other bits with 0, i.e. a total of (Qx - m_roundoff) bits are filled with 0, resulting in an aligned number for the subsequent addition operation.

For example, the target format preset by the user is a fixed-point number format, and the precision value FL(fdest) of the target format is 4 bits. The source operand received by the preprocessing module in fixed-point format is (101.1100101100101)₂, and the precision value FL(fsrc) of the source operand is 13 bits. The preprocessing module may calculate, by using a built-in subtractor, the width to be rounded of the source operand, m_roundoff = FL(fsrc) - FL(fdest) = 13 - 4 = 9, then generate a random number rand with a width of 9 by using the random number generator, and align rand with the source operand; assuming rand = (011101101)₂, the aligned number (000.0000011101101)₂ is obtained.
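
The alignment step can be modelled as a zero-extension of the random number to the width of the source operand. The sketch below is an assumption-laden illustration (the function name align_random and the string representation are not from the embodiment); it places a 9-bit random number in the lowest 9 bits of a 16-bit word with 3 integer bits and 13 fractional bits.

```python
def align_random(rand: int, m_roundoff: int, il_src: int, fl_src: int) -> str:
    """Place rand in the lowest m_roundoff bits of a Qx-bit word, zero elsewhere."""
    qx = il_src + fl_src  # Qx: total width of the source operand
    if not 0 <= m_roundoff <= qx:
        raise ValueError("m_roundoff must not exceed the source width")
    if not 0 <= rand < (1 << m_roundoff):
        raise ValueError("rand must fit in m_roundoff bits")
    bits = format(rand, "b").zfill(qx)  # upper Qx - m_roundoff bits become 0
    return bits[:il_src] + "." + bits[il_src:]


aligned = align_random(0b011101101, 9, il_src=3, fl_src=13)
print(aligned)
```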

The adder 40 adds the source operand and the alignment number to obtain a sum result.

In an embodiment of the present invention, an adder is a device that produces the sum of two numbers, taking an augend and an addend as inputs, and is often used in the arithmetic logic unit of a computer. In electronics, an adder is a digital circuit that performs digital addition.

Specifically, in the present embodiment, with reference to the example described above for shifter 30, the adder may add the aligned number (000.0000011101101)₂ and the source operand (101.1100101100101)₂, and the result is (000.0000011101101)₂ + (101.1100101100101)₂ = (101.1101001010010)₂.
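
The addition can be checked on the raw bit patterns, since both operands have their binary points at the same position. In the sketch below, the function name add_aligned is hypothetical, and the extra leading bit of the result models a possible carry out of the adder, which is an assumption about the adder width.

```python
def add_aligned(source_bits: str, aligned_bits: str) -> str:
    """Add two equal-width bit strings; the result is one bit wider for the carry."""
    if len(source_bits) != len(aligned_bits):
        raise ValueError("operands must have the same width")
    total = int(source_bits, 2) + int(aligned_bits, 2)
    return format(total, "b").zfill(len(source_bits) + 1)


source_bits = "1011100101100101"   # 101.1100101100101 without the binary point
aligned_bits = "0000000011101101"  # 000.0000011101101 without the binary point
total_bits = add_aligned(source_bits, aligned_bits)

# Re-insert the binary point 13 bits from the right.
print(total_bits[:-13] + "." + total_bits[-13:])
```

With these inputs no carry occurs, so the leading bit is 0 and the remaining bits match the addition result of the example.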

It should be noted that, referring to fig. 1, the preprocessing module 10, the random number generator 20, the shifter 30, the adder 40, and the multiplexer 50 may be connected in sequence to form a data transmission link, and after obtaining the source operand in the source format, the preprocessing module 10 may transmit the source operand to the adder 40 through the data transmission link, so that the adder 40 may add the source operand and the alignment number.

In addition, the adder 40 may also be connected to the preprocessing module 10, and the adder 40 may obtain the source operand from the preprocessing module 10 and perform the addition operation of the source operand and the alignment number while obtaining the alignment number from the shifter 30.

The multiplexer 50 truncates the lowest bits of the addition result to obtain the target operand in the target format.

In the embodiment of the present invention, since the lowest bits of the aligned numbers include the random number with the width to be rounded through the alignment operation of the random number with the lowest bits of the source operand, and the lowest bits of the addition result also include the random number with the width to be rounded, the lowest bits of the addition result can be rounded off through the multiplexer, that is, several bits corresponding to the random number in the addition result can be rounded off, so as to obtain the target operand in the target format. The function of converting the high-precision source operand into the low-precision target operand according to unbiased rounding is realized.

In the embodiment of the present invention, the multiplexer may select a subset of the bits of the addition result as the final result; in other words, the remaining bits of the addition result are discarded.

For example, with reference to the above example, the lowest 9 bits of the addition result (101.1101001010010)₂, which correspond to the random number rand = (011101101)₂, are truncated to obtain (101.1101)₂. Since the target format requires the length FL(idest) of the integer part to be 4 bits, the integer part of (101.1101)₂ is zero-padded, and the target operand (0101.1101)₂ is obtained.

To sum up, an operand processing system provided by an embodiment of the present invention includes a preprocessing module, a random number generator, a shifter, an adder and a multiplexer. The preprocessing module is used for acquiring a precision conversion instruction and determining, according to the precision conversion instruction, the width to be rounded of a source operand from the precision value of a target format and the precision value of the source operand. The random number generator is used for generating a random number according to the width to be rounded, the width of the random number being the same as the width to be rounded. The shifter is used for aligning the random number with the lowest bits of the source operand to obtain an aligned number. The adder is used for adding the source operand and the aligned number to obtain an addition result. The multiplexer is used for truncating the lowest bits of the addition result to obtain the target operand in the target format. In the invention, a high-precision source operand can be converted into a low-precision target operand by unbiased rounding with a single precision conversion instruction executed purely in hardware; because every operation in the unbiased rounding process is completed by hardware, the number of instructions in the whole process is reduced, the calculation time is shortened, and the processing efficiency is improved.

Optionally, referring to fig. 1, in the case that the source format is a fixed-point number format, the preprocessing module 10 comprises a subtractor submodule 12, and the preprocessing module is specifically configured to:

calculate, according to the precision conversion instruction and through the subtractor submodule, the difference between the precision value of the source operand and the precision value of the target format, and determine the difference as the width to be rounded.

In the embodiment of the present invention, when the source format of the input source operand is the fixed-point number format, the difference between the precision value of the source operand and the precision value of the target format may be calculated by a subtractor sub-module built in the preprocessing module, and the difference is determined as the width to be rounded.

Optionally, referring to fig. 1, in the case that the source format is a floating-point number format, the preprocessing module 10 comprises a format conversion submodule 11 and a subtractor submodule 12, and the preprocessing module is specifically configured to:

convert, according to the precision conversion instruction and through the format conversion submodule, the format of the source operand into a fixed-point number format to obtain a new source operand.

In the embodiment of the present invention, when the source format of the input source operand is the floating point format, the source operand in the floating point format needs to be converted into the source operand in the fixed point format.

For example, if the source operand is in the standard half-precision floating-point format specified in IEEE 754-2008, that is, a floating-point number with a 1-bit sign, a 5-bit exponent and a 10-bit mantissa, the preprocessing module may, through the format conversion submodule and according to the floating-point to fixed-point conversion rule, convert the floating-point source operand (0011111010111011)₂ into the fixed-point number (1.1010111011)₂.
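The conversion in this example can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the conversion rule of the embodiment: the function name `half_to_fixed`, the returned triple, and the choice to keep 10 fractional bits for values of magnitude 1 or more are all hypothetical.

```python
from typing import Tuple


def half_to_fixed(bits: int) -> Tuple[int, int, int]:
    """Split a 16-bit IEEE 754 half-precision pattern into
    (sign, magnitude, frac_bits), where
    value = (-1)**sign * magnitude / 2**frac_bits.
    Zero, subnormal and normal numbers are handled; Inf/NaN are rejected."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0x1F:
        raise ValueError("Inf/NaN has no fixed-point equivalent")
    if exponent == 0:
        # Zero or subnormal: no hidden bit, value = mantissa * 2**-24.
        return sign, mantissa, 24
    significand = (1 << 10) | mantissa   # restore the hidden leading 1
    shift = exponent - 15                # remove the exponent bias
    if shift >= 0:
        # Keep 10 fractional bits and move the binary point by shifting left.
        return sign, significand << shift, 10
    # Magnitude below 1: express the value with more fractional bits instead.
    return sign, significand, 10 - shift


sign, magnitude, frac_bits = half_to_fixed(0b0011111010111011)
print(sign, bin(magnitude), frac_bits)   # 0 0b11010111011 10
```

The printed magnitude 11010111011 with 10 fractional bits is the fixed-point number (1.1010111011)₂ of the example, so the precision value of the new source operand is 10.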

If the precision value of the new source operand is less than or equal to the precision value of the target format, the preprocessing module sets the width to be rounded to 0.

In the embodiment of the present invention, if the precision value of the new source operand is less than or equal to the precision value of the target format, the precision of the source operand does not exceed that of the target format, so the width to be rounded may be set to 0 and the unbiased rounding operation on the source operand may be skipped. In addition, in this case the user may be prompted to reset the precision value of the target format so that the newly set value is smaller than the precision value of the new source operand.

And under the condition that the precision value of the new source operand is larger than the precision value of the target format, the preprocessing module calculates the difference value between the precision value of the new source operand and the precision value of the target format through the subtracter submodule and determines the difference value as the width to be rounded.

In the embodiment of the present invention, if the precision value of the new source operand is greater than the precision value of the target format, a difference between the precision value of the source operand and the precision value of the target format may be calculated by a subtractor sub-module built in the preprocessing module, and the difference is determined as the width to be rounded.

For example, suppose the source operand is a floating-point number in the standard half-precision format specified in IEEE 754-2008, with a 1-bit sign, a 5-bit exponent and a 10-bit mantissa, namely (0011111010111011)₂, and the target format has an integer part 4 bits long and a fractional part 4 bits long. The source operand is first converted into the fixed-point number (1.1010111011)₂, and the width to be rounded is then calculated as FL(fsrc) − FL(fdest) = 10 − 4 = 6.
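The two cases handled by the subtractor submodule can be summarised in a few lines. This is a minimal sketch; the function name `width_to_round` and its parameter names are assumptions, and precision values are taken to be fractional bit counts as in the examples.

```python
def width_to_round(src_frac_bits: int, dst_frac_bits: int) -> int:
    """Width to be rounded = source precision - target precision, or 0
    when the source is not more precise than the target."""
    if src_frac_bits <= dst_frac_bits:
        return 0          # nothing to discard, rounding is skipped
    return src_frac_bits - dst_frac_bits


print(width_to_round(10, 4))   # half-precision example: 6
print(width_to_round(13, 4))   # 13-bit fixed-point source: 9
print(width_to_round(4, 8))    # target at least as precise: 0
```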

Optionally, the random number generator comprises a linear feedback shift register. A Linear Feedback Shift Register (LFSR) is a shift register whose input bit is a linear function of its previous state. Exclusive-or (XOR) is the most common single-bit linear function: certain bits of the register are XORed together to form the input bit, and every bit in the register is then shifted by one position.

The initial value given to the register is called the "seed". Because the operation of a linear feedback shift register is deterministic, the data stream it generates is completely determined by its current or previous state. Moreover, since the register has a finite number of states, the sequence must eventually repeat. However, with a primitive feedback polynomial, a linear feedback shift register can generate a sequence that appears random and has a very long period. Shift registers are structurally simple and fast; most practical key stream generators are based on them, and shift register theory is also a foundation of modern stream ciphers.

Applications of linear feedback shift registers include the generation of pseudo-random numbers, pseudo-random noise sequences, fast digital counters, and scramblers. The application of linear feedback shift registers in both hardware and software is very common. The mathematical principle for fast checking transmission errors in cyclic redundancy check is closely related to the linear feedback shift register.
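As an illustration of the principle, the sketch below models a 16-bit Fibonacci LFSR in software. The embodiment does not specify the register width, the taps or the seed; the well-known feedback polynomial x^16 + x^14 + x^13 + x^11 + 1, the seed 0xACE1 and the class name `Lfsr16` are assumptions chosen only for the example.

```python
class Lfsr16:
    """16-bit Fibonacci LFSR with feedback polynomial
    x^16 + x^14 + x^13 + x^11 + 1 (an assumed choice, not from the source)."""

    def __init__(self, seed: int = 0xACE1) -> None:
        if seed & 0xFFFF == 0:
            raise ValueError("seed must be non-zero: the all-zero state never changes")
        self.state = seed & 0xFFFF

    def step(self) -> int:
        """Advance one clock and return the bit that is shifted out."""
        s = self.state
        out = s & 1
        # XOR of the tapped bits forms the new input bit.
        feedback = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (feedback << 15)
        return out

    def random_bits(self, width: int) -> int:
        """Collect `width` successive output bits into one integer."""
        value = 0
        for _ in range(width):
            value = (value << 1) | self.step()
        return value


# Count how many clocks it takes for the state to return to the seed.
lfsr = Lfsr16()
start, period = lfsr.state, 0
while True:
    lfsr.step()
    period += 1
    if lfsr.state == start:
        break
print(period == 2**16 - 1)   # True: every non-zero state is visited once
```

A call such as `Lfsr16().random_bits(9)` would then supply a 9-bit value of the kind used as rand in the examples; a hardware generator would typically size the register and select the bits differently.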

Optionally, the shifter is specifically configured to: align the random number with the source operand to obtain an aligned number with the same width as the source operand,

wherein the random number occupies the lowest bits of the aligned number, and all other bits of the aligned number are set to 0.

Specifically, the shifter aligns the random number rand with the lowest bits of the source operand <isrc, fsrc>, whose total length is Qx = FL(isrc) + FL(fsrc). The random number rand is placed in the lowest m_roundoff bits of a Qx-bit word and the remaining (Qx − m_roundoff) bits are filled with 0, which yields the aligned number used in the subsequent addition.

For example, the target format preset by the user is a fixed-point number format, and the precision value FL(fdest) of the target format is 4 bits. The preprocessing module receives the source operand (101.1100101100101)₂ in fixed-point number format, whose precision value FL(fsrc) is 13 bits. Using its built-in subtractor, the preprocessing module calculates the width to be rounded as m_roundoff = FL(fsrc) − FL(fdest) = 13 − 4 = 9. The random number generator then generates a random number rand with a width of 9 bits. Assuming rand = (011101101)₂, aligning rand with the source operand gives the aligned number (000.0000011101101)₂.

Specifically, in the embodiment of the present invention, referring to the above example, the adder aligns the binary point of the aligned number (000.0000011101101)₂ with that of the source operand (101.1100101100101)₂ according to the precision conversion instruction, and then performs the addition: (000.0000011101101)₂ + (101.1100101100101)₂ ＝ (101.1101001010010)₂.
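The alignment and addition of this example can be reproduced with ordinary integer arithmetic. The sketch below assumes that a fixed-point value is held as an integer scaled by 2^13; the helper `to_bin` and the constant names are hypothetical and exist only to print the bit patterns.

```python
def to_bin(value: int, int_bits: int, frac_bits: int) -> str:
    """Render a non-negative fixed-point encoding as 'iii.fff' binary text
    (frac_bits is expected to be at least 1)."""
    int_part = value >> frac_bits
    frac_part = value & ((1 << frac_bits) - 1)
    return f"{int_part:0{int_bits}b}.{frac_part:0{frac_bits}b}"


INT_BITS, FRAC_BITS = 3, 13      # source operand <isrc, fsrc>
M_ROUNDOFF = 9                   # width to be rounded
src = 0b1011100101100101         # (101.1100101100101)_2
rand = 0b011101101               # 9-bit random number from the example

# Shifter: rand occupies the lowest M_ROUNDOFF bits, all upper bits are 0.
aligned = rand & ((1 << M_ROUNDOFF) - 1)

# Adder: both values share one binary point, so integer addition suffices.
total = src + aligned

print(to_bin(aligned, INT_BITS, FRAC_BITS))   # 000.0000011101101
print(to_bin(total, INT_BITS, FRAC_BITS))     # 101.1101001010010
```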

Optionally, the multiplexer is specifically configured to: discard the lowest bits of the addition result that correspond to the random number, to obtain the target operand.

In the embodiment of the invention, because the random number is aligned with the lowest bits of the source operand, the lowest bits of the aligned number hold the random number of the width to be rounded, and the same bit positions of the addition result carry its contribution. The multiplexer therefore discards the lowest bits of the addition result, that is, the bits corresponding to the random number, to obtain the target operand in the target format. This realizes the conversion of a high-precision source operand into a low-precision target operand by unbiased rounding.

For example, with reference to the above example, the lowest 9 bits of the addition result (101.1101001010010)₂, which correspond to the random number rand = (011101101)₂, are truncated to obtain (101.1101)₂. Since the target format requires the length FL(idest) of the integer part to be 4 bits, the integer part of (101.1101)₂ is zero-padded to obtain the target operand (0101.1101)₂.
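The bit selection performed by the multiplexer in this example amounts to a right shift followed by formatting in the target layout. The following sketch makes the same assumption of an integer-scaled encoding; the function name `select_result` is hypothetical.

```python
def select_result(total: int, m_roundoff: int,
                  dst_int_bits: int, dst_frac_bits: int) -> str:
    """Drop the lowest m_roundoff bits of the addition result and render the
    kept bits in the target format <idest, fdest>, zero-padding the integer part."""
    kept = total >> m_roundoff                      # discard the random-number bits
    int_part = kept >> dst_frac_bits
    frac_part = kept & ((1 << dst_frac_bits) - 1)
    return f"{int_part:0{dst_int_bits}b}.{frac_part:0{dst_frac_bits}b}"


total = 0b1011101001010010               # (101.1101001010010)_2 from the adder
print(select_result(total, 9, 4, 4))     # 0101.1101
```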

Optionally, the precision conversion instruction includes: a first register identifier of a source register, a second register identifier of a destination register, a third register identifier of a source operand precision indication register, and a fourth register identifier of a destination operand precision indication register; and the preprocessing module comprises a decoding submodule.

In a preferred implementation of the embodiment of the present invention, the source operand, the precision value of the source operand and the precision value of the target format may be stored in corresponding registers, and the preprocessing module may obtain them by reading those registers. Specifically, a plurality of registers may be arranged in the integrated circuit of the operand processing system, each with a register identifier, and the preprocessing module may access a register through its identifier to obtain the data stored in it. Registers are high-speed storage elements designed to let a processor read and write data quickly while executing instructions; they may be arranged in the processor or in a peripheral circuit and are used for temporarily storing binary data.

For example, the embodiment of the present invention may provide a plurality of registers: a source register, a destination register, a source operand precision indication register and a destination operand precision indication register. The preprocessing module may comprise a decoding submodule in which a preset decoding rule is encapsulated for decoding the precision conversion instruction.

The preprocessing module carries out decoding operation on the precision conversion instruction through the decoding submodule to obtain a first register identifier, a second register identifier, a third register identifier and a fourth register identifier, and obtains the source operand from the source register according to the first register identifier; according to the third register identification, obtaining the precision value of the source operand from the source operand precision indication register; and obtaining the precision value of the target format from the destination operand precision indication register according to the fourth register identifier.
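The decode-then-read sequence described above might look as follows in software. The instruction encoding is not given in the embodiment, so the layout used here (four 5-bit register identifiers packed into the low 20 bits of the instruction word), the names `Decoded`, `decode` and `read_operands`, and the register numbers are purely hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class Decoded(NamedTuple):
    src_reg: int        # first register identifier (source register)
    dst_reg: int        # second register identifier (destination register)
    src_prec_reg: int   # third register identifier (source precision indication)
    dst_prec_reg: int   # fourth register identifier (destination precision indication)


def decode(instr: int) -> Decoded:
    """Extract four 5-bit register identifiers (hypothetical field layout)."""
    return Decoded(
        src_reg=instr & 0x1F,
        dst_reg=(instr >> 5) & 0x1F,
        src_prec_reg=(instr >> 10) & 0x1F,
        dst_prec_reg=(instr >> 15) & 0x1F,
    )


def read_operands(instr: int, regs: Dict[int, int]) -> Tuple[int, int, int]:
    """Return (source operand, source precision, target precision)."""
    d = decode(instr)
    return regs[d.src_reg], regs[d.src_prec_reg], regs[d.dst_prec_reg]


# Register file for the running example; register numbers are arbitrary.
regs = {1: 0b1011100101100101, 2: 0, 3: 13, 4: 4}
instr = (4 << 15) | (3 << 10) | (2 << 5) | 1
print(read_operands(instr, regs))   # (47461, 13, 4)
```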

The source operand precision indication register may also specify whether the source operand is a floating-point number or a fixed-point number. If it is a floating-point number, the register specifies its format; that is, the floating-point format, for example double precision, single precision or another format compatible with standard IEEE 754-2008, may be stored in the source operand precision indication register in advance. If the source operand is a fixed-point number, the register specifies the widths of its integer part and fractional part; that is, these widths may be stored in the source operand precision indication register in advance, thereby restricting the type of the source operand. The destination operand precision indication register may likewise indicate the widths of the integer part and fractional part of the fixed-point target operand, which may be stored in it in advance.

In the precision conversion instruction, the number of the source registers can exceed one, so as to realize the unbiased rounding operation of a plurality of source operands in parallel, and in addition, the instruction can be expanded into a vector version instruction by using a single instruction multiple data technology.

For example, the precision conversion instruction may be extended to a 256-bit wide vector instruction:

wd[16×k+15:16×k]＝unbiased_round(ws[32×k+31:32×k]),0≤k≤7；

wd[16×k+15+128:16×k+128]＝unbiased_round(wt[32×k+31:32×k]),0≤k≤7。

This instruction indicates that the system has two source registers ws and wt and one target register wd; ws, wt and wd are the register identifiers of 256-bit registers. unbiased_round() denotes the conversion operation that rounds a single-precision source operand to a 16-bit fixed-point target operand by unbiased rounding.
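The lane mapping of the indexed formula above can be sketched by treating each 256-bit register as a Python integer. The function name `vector_unbiased_round` and the `lane_op` callback are assumptions; the stand-in lane operation used in the usage lines simply keeps the upper 16 bits of each lane and is not the unbiased rounding itself.

```python
from typing import Callable


def vector_unbiased_round(ws: int, wt: int,
                          lane_op: Callable[[int], int]) -> int:
    """Apply lane_op to the eight 32-bit lanes of ws and of wt and pack the
    sixteen 16-bit results into a 256-bit wd, following the indexed formula."""
    wd = 0
    for k in range(8):
        lane_s = (ws >> (32 * k)) & 0xFFFFFFFF                 # ws[32k+31 : 32k]
        lane_t = (wt >> (32 * k)) & 0xFFFFFFFF                 # wt[32k+31 : 32k]
        wd |= (lane_op(lane_s) & 0xFFFF) << (16 * k)           # wd[16k+15 : 16k]
        wd |= (lane_op(lane_t) & 0xFFFF) << (16 * k + 128)     # wd[16k+143 : 16k+128]
    return wd


# Fill the lanes with recognisable values and use a stand-in lane operation.
ws = sum(((k + 1) << 16) << (32 * k) for k in range(8))
wt = sum(((k + 9) << 16) << (32 * k) for k in range(8))
wd = vector_unbiased_round(ws, wt, lambda lane: lane >> 16)
lanes = [(wd >> (16 * i)) & 0xFFFF for i in range(16)]
print(lanes == list(range(1, 17)))   # True: ws fills the low half, wt the high half
```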

Specifically, the precision conversion instruction specifically executes the following operations:

Wd[15:0]＝unbiased_round(ws[31:0])；

Wd[31:16]＝unbiased_round(ws[63:32])；

Wd[47:32]＝unbiased_round(ws[95:64])；

Wd[63:48]＝unbiased_round(ws[127:96])；

Wd[79:64]＝unbiased_round(wt[31:0])；

Wd[95:80]＝unbiased_round(wt[63:32])；

Wd[111:96]＝unbiased_round(wt[95:64])；

Wd[127:112]＝unbiased_round(wt[127:96])。

unbiased_round() refers to the unbiased rounding operation performed directly by the hardware units described above. Therefore, a high-precision source operand can be converted into a low-precision target operand by unbiased rounding with a single precision conversion instruction executed purely in hardware; because the generation of the random number and the addition are both completed by hardware during the unbiased rounding process, the number of instructions in the whole process is reduced, the calculation time is shortened, and the processing efficiency is improved.

In an embodiment of the present invention, the source register may further have a certain width value, so as to limit the number of source operands placed therein by the width value, for example, when the width value of the source register is 256 bits, 8 32-bit source operands may be stored.

In an optional implementation of the embodiment of the present invention, the source operand, the precision value of the source operand and the precision value of the target format may also be stored and read in other forms. In one implementation, they may be encoded directly in the precision conversion instruction according to a preset encoding manner, and the preprocessing module may obtain them by decoding the precision conversion instruction in the decoding manner corresponding to that encoding manner.

In another implementation, the source operand may be stored in a preset memory region and the address of that region added to the precision conversion instruction; the preprocessing module may obtain the address of the source operand by parsing the precision conversion instruction and then read the source operand from the memory region at that address.

Optionally, the multiplexer is specifically configured to: write the target operand to the destination register specified by the precision conversion instruction.

In an implementation of the embodiment of the present invention, after the multiplexer obtains the target operand, it may write the target operand, according to the second register identifier, to the destination register that corresponds to that identifier and is used for storing the target operand. The processor can then read the target operand from the destination register and return it to the client, or use it directly in the relevant processing of the algorithm model.

In another implementation of the embodiment of the present invention, after the multiplexer obtains the target operand, it may instead store the target operand in the corresponding memory region according to the corresponding storage address, which provides an alternative to storage in a register.

Optionally, the target format has a preset expressible range; replacing the value of the source operand with a maximum value of the expressible range when the value of the source operand is greater than or equal to the maximum value; replacing the value of the source operand with the minimum value when the value of the source operand is less than or equal to the minimum value of the expressible range.

In the embodiment of the present invention, an expressible range of a target format supported by unbiased rounding processing may also be preset in the precision conversion instruction, and when the value of the source operand exceeds the expressible range, a fixed default value is taken to replace the source operand. For example, when the value of the source operand is greater than the maximum fixed-point number that can be expressed by the target format, it may be defined that the source operand is the maximum fixed-point number that can be expressed by the target format; when the value of the source operand is less than the minimum fixed-point number that the target format can express, it may be defined that the source operand is the minimum fixed-point number that the target format can express.
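The range check can be sketched as a clamp applied to the source operand before rounding. The embodiment does not state whether the formats are signed, so the sketch assumes unsigned fixed-point values held as scaled integers; the function name `saturate_source` and its parameters are likewise hypothetical.

```python
def saturate_source(src: int, src_frac_bits: int,
                    dst_int_bits: int, dst_frac_bits: int) -> int:
    """Clamp an unsigned source encoding (value * 2**src_frac_bits) to the
    range expressible by the unsigned target format <idest, fdest>.
    Assumes src_frac_bits >= dst_frac_bits."""
    shift = src_frac_bits - dst_frac_bits
    max_dst = (1 << (dst_int_bits + dst_frac_bits)) - 1   # all target bits set
    hi = max_dst << shift      # target maximum expressed in the source scale
    lo = 0                     # unsigned assumption: the minimum is zero
    return max(lo, min(hi, src))


in_range = 0b1011100101100101        # (101.1100101100101)_2, fits <4, 4>
too_large = 0b101011100101100101     # (10101.1100101100101)_2, exceeds <4, 4>

print(saturate_source(in_range, 13, 4, 4) == in_range)       # True: unchanged
print(bin(saturate_source(too_large, 13, 4, 4)))             # 0b11111111000000000
```

Clamping to the target maximum in the source scale also means that adding any 9-bit random number afterwards cannot carry out of the 8 target bits.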

Fig. 2 is a block diagram of an architecture of a processor according to an embodiment of the present invention, as shown in fig. 2, including: an instruction fetch module 60, a decode module 70, and a function module 100.

The instruction fetching module 60 is configured to obtain a precision conversion instruction; according to the address of the precision conversion instruction, the instruction fetching module 60 in the processor may fetch the precision conversion instruction from the register in the processor that corresponds to that address.

The decoding module 70 is configured to perform a decoding operation on the precision conversion instruction to obtain a source operand in a source format, a target format, and a precision value in the target format; in the embodiment of the present invention, a preset decoding rule is encapsulated in the decoding module 70, and is used for performing a decoding operation on the precision conversion instruction, so as to obtain a precision value of a target format and a precision value of a source operand in the precision conversion instruction.

The functional module 100 is configured to: determining the width to be rounded of the source operand according to the precision value of the target format and the precision value of the source operand; generating a random number according to the width to be rounded, wherein the width of the random number is the same as the width to be rounded; aligning the random number with the lowest bit of the source operand to obtain an aligned number; adding the source operand and the alignment number to obtain an addition result; the least significant bits of the sum result are truncated to obtain the destination operand in the destination format.

Optionally, the functional module 100 includes a preprocessing module 10, a random number generator 20, a shifter 30, an adder 40 and a multiplexer 50, which are connected in sequence.

The preprocessing module 10 is configured to: determining the width to be rounded of the source operand according to the precision value of the target format and the precision value of the source operand;

the random number generator 20 is configured to: generating a random number according to the width to be rounded, wherein the width of the random number is the same as the width to be rounded;

the shifter 30 is configured to: aligning the random number with the lowest order bits of the source operand to obtain an aligned number;

the adder 40 is configured to: adding the source operand and the alignment number to obtain an addition result;

the multiplexer 50 is configured to: and truncating the lowest bit of the addition result to obtain the target operand in the target format.

In the embodiment of the present invention, in addition to the instruction fetching module 60 and the decoding module 70, the integrated circuit of the processor may further integrate the preprocessing module 10, the random number generator 20, the shifter 30, the adder 40 and the multiplexer 50, so that the functional module 100 formed by these components serves as a module of the processor dedicated to performing the unbiased rounding operation.

Optionally, in the case that the source format is a fixed-point number format, the preprocessing module comprises a subtractor submodule, and the preprocessing module is specifically configured to: calculate, according to the precision conversion instruction and through the subtractor submodule, the difference between the precision value of the source operand and the precision value of the target format, and determine the difference as the width to be rounded.

Optionally, in the case that the source format is a floating-point number format, the preprocessing module comprises a format conversion submodule and a subtractor submodule, and the preprocessing module is specifically configured to: convert, according to the precision conversion instruction and through the format conversion submodule, the format of the source operand into a fixed-point number format to obtain a new source operand; set the width to be rounded to 0 if the precision value of the new source operand is less than or equal to the precision value of the target format; and, if the precision value of the new source operand is greater than the precision value of the target format, calculate through the subtractor submodule the difference between the precision value of the new source operand and the precision value of the target format, and determine the difference as the width to be rounded.

Optionally, the precision conversion instruction includes: a first register identifier of a source register, a second register identifier of a destination register, a third register identifier of a source operand precision indication register, and a fourth register identifier of a destination operand precision indication register. The decoding module is specifically configured to: decode the precision conversion instruction to obtain the first register identifier, the second register identifier, the third register identifier and the fourth register identifier. The preprocessing module is specifically configured to: obtain the source operand from the source register according to the first register identifier; obtain the precision value of the source operand from the source operand precision indication register according to the third register identifier; and obtain the precision value of the target format from the destination operand precision indication register according to the fourth register identifier.

Optionally, referring to fig. 2, the processor further includes a write-back module 80, where the write-back module 80 is configured to: write the target operand to the destination register according to the second register identifier.

In the embodiment of the present invention, the basic circuit of the processor has a write-back module 80, which may write the target operand to the destination register according to the second register identifier corresponding to the destination register. The processor can then read the target operand from the destination register and return it to the client, or use it directly in the relevant processing of the algorithm model.

For the related description of the processor, reference may be made to the description of the operand processing system in the above embodiments of the present invention, and details are not repeated here.

To sum up, an embodiment of the present invention provides a processor including an instruction fetching module, a decoding module, a preprocessing module, a random number generator, a shifter, an adder and a multiplexer, which are connected in sequence. The instruction fetching module is used for acquiring a precision conversion instruction, and the decoding module is used for decoding the precision conversion instruction to obtain a source operand in a source format, a target format and a precision value of the target format. The preprocessing module is used for determining the width to be rounded of the source operand according to the precision value of the target format and the precision value of the source operand. The random number generator is used for generating a random number according to the width to be rounded, the width of the random number being the same as the width to be rounded. The shifter is used for aligning the random number with the lowest bits of the source operand to obtain an aligned number. The adder is used for adding the source operand and the aligned number to obtain an addition result. The multiplexer is used for truncating the lowest bits of the addition result to obtain the target operand in the target format. In the invention, a high-precision source operand can be converted into a low-precision target operand by unbiased rounding with a single precision conversion instruction executed purely in hardware; because every operation in the unbiased rounding process is completed by hardware, the number of instructions in the whole process is reduced, the calculation time is shortened, and the processing efficiency is improved.

Fig. 3 is a flowchart illustrating steps of an operand processing method according to an embodiment of the present invention, where as shown in fig. 3, the method may include:

Step 101, obtaining a precision conversion instruction, and determining, according to the precision conversion instruction, the width to be rounded of a source operand from the precision value of a target format and the precision value of the source operand.

Optionally, in the case that the source format is a fixed-point number format, step 101 may specifically include:

Sub-step 1011, calculating a difference between the precision value of the source operand and the precision value of the target format according to the precision conversion instruction, and determining the difference as the width to be rounded.

Optionally, in the case that the source format is a floating point format, step 101 may specifically include:

Sub-step 1012, converting the format of the source operand into a fixed-point number format according to the precision conversion instruction to obtain a new source operand.

Sub-step 1013, determining the width to be rounded to 0 if the precision value of the new source operand is less than or equal to the precision value of the target format.

Sub-step 1014, in case the precision value of the new source operand is greater than the precision value of the target format, calculating a difference value between the precision value of the new source operand and the precision value of the target format and determining the difference value as the width to be rounded.

Optionally, the precision converting instruction includes: the first register identification of the source register, the second register identification of the destination register, the third register identification of the source operand precision indication register and the fourth register identification of the destination operand precision indication register; step 101 may specifically include:

Sub-step 1015, obtaining the source operand from the source register according to the first register identifier.

Sub-step 1016, obtaining the precision value of the source operand from the source operand precision indication register according to the third register identifier.

Sub-step 1017, obtaining the precision value of the target format from the destination operand precision indication register according to the fourth register identifier.

Sub-step 1018, determining a width to be rounded of the source operand from the precision value of the target format and the precision value of the source operand.
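Sub-steps 1015 to 1018 can be illustrated with the following minimal sketch. The instruction layout, the field names and the dictionary used as a register file are all assumptions made for illustration; the description does not define an encoding. The width is computed with the fixed-point rule of sub-step 1011, assuming the source precision is not smaller than the target precision.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PrecisionConvertInstruction:
    """Hypothetical decoded form of the precision conversion instruction."""
    src_reg: int        # first register identifier (source register)
    dst_reg: int        # second register identifier (destination register)
    src_prec_reg: int   # third register identifier (source precision)
    dst_prec_reg: int   # fourth register identifier (target precision)


def decode(instr: PrecisionConvertInstruction,
           regs: Dict[int, int]) -> Tuple[int, int]:
    """Return (source operand, width to be rounded)."""
    source = regs[instr.src_reg]                # sub-step 1015
    src_precision = regs[instr.src_prec_reg]    # sub-step 1016
    dst_precision = regs[instr.dst_prec_reg]    # sub-step 1017
    width = src_precision - dst_precision       # sub-step 1018 (fixed-point case)
    return source, width


# Assumed register contents: r1 holds the operand, r3 and r4 the precisions.
registers = {1: 0b10110111, 2: 0, 3: 8, 4: 4}
instruction = PrecisionConvertInstruction(src_reg=1, dst_reg=2,
                                          src_prec_reg=3, dst_prec_reg=4)
source, width = decode(instruction, registers)
```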

Optionally, the target operand may be stored in the destination register according to the second register identifier.

Step 102, generating a random number according to the width to be rounded, wherein the width of the random number is the same as the width to be rounded.
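By way of example only, the random number generator may comprise a linear feedback shift register. The sketch below uses a 16-bit Fibonacci register with taps at bits 16, 14, 13 and 11, a commonly cited maximal-length configuration; the register length, the tap positions, the seed and the function names are assumptions and are not taken from the embodiment.

```python
from typing import Tuple


def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR by one step (taps 16, 14, 13, 11)."""
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (feedback << 15)


def random_bits(state: int, width: int) -> Tuple[int, int]:
    """Return (random number of `width` bits, next LFSR state)."""
    value = 0
    for _ in range(width):
        value = (value << 1) | (state & 1)  # collect the bit being shifted out
        state = lfsr16_step(state)
    return value, state


seed = 0xACE1  # assumed nonzero seed; an all-zero state would never change
number, next_state = random_bits(seed, 8)  # width to be rounded = 8
```

When the width to be rounded is 0, the loop body never runs and the random number is 0, so the later addition leaves the source operand unchanged.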

Step 103, aligning the random number with the lowest bit of the source operand to obtain an aligned number.

Optionally, step 103 may specifically include:

Sub-step 1031, aligning the random number with the source operand to obtain an aligned number having the same width as the source operand, wherein the lowest bit of the random number is placed at the position of the lowest bit of the aligned number, and the bits of the aligned number other than those occupied by the random number are set to 0.
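The alignment of sub-step 1031 amounts to zero-extending the random number to the width of the source operand. A minimal sketch follows, assuming a 16-bit source operand and a 4-bit random number; the function name and the chosen widths are illustrative only.

```python
def align_random(random_number: int, width_to_round: int, operand_width: int) -> int:
    """Place the random number in the low bits of an operand-wide word, high bits 0."""
    if not 0 <= width_to_round <= operand_width:
        raise ValueError("width to be rounded must fit inside the source operand")
    if not 0 <= random_number < (1 << width_to_round):
        raise ValueError("random number is wider than the width to be rounded")
    operand_mask = (1 << operand_width) - 1
    return random_number & operand_mask  # upper bits are already 0


aligned = align_random(0b1011, width_to_round=4, operand_width=16)
print(format(aligned, "016b"))  # 0000000000001011
```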

Step 104, adding the source operand and the aligned number to obtain an addition result.

Optionally, step 104 may specifically include:

Sub-step 1041, after aligning the decimal point of the aligned number with the decimal point of the source operand, adding the aligned number and the source operand to obtain the addition result.

Step 105, truncating the least significant bits of the addition result to obtain the target operand in the target format.

Optionally, step 105 may specifically include:

Sub-step 1051, truncating the least significant bits of the addition result that correspond to the random number to obtain the target operand.
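Steps 104 and 105 can be illustrated together. The sketch below assumes a non-negative fixed-point source operand stored as an integer, so that the decimal points of the two addends are aligned implicitly, and it keeps any carry out of the addition because Python integers are unbounded; how a hardware adder handles that carry is not modeled. The function name and the example values are assumptions.

```python
def add_and_truncate(source: int, aligned: int, width_to_round: int) -> int:
    """Add the aligned number (step 104), then drop the low bits (step 105)."""
    addition_result = source + aligned
    return addition_result >> width_to_round


source = 0b10110111  # 183/256 with 8 fractional bits; the low 4 bits are 0111
rounded_up = add_and_truncate(source, 0b1011, 4)    # 0111 + 1011 carries into bit 4
rounded_down = add_and_truncate(source, 0b0100, 4)  # 0111 + 0100 does not carry

print(format(rounded_up, "04b"))    # 1100
print(format(rounded_down, "04b"))  # 1011
```

The same source operand therefore rounds to 12/16 or 11/16 depending on the random number, which is what makes the rounding stochastic rather than fixed.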

For the related description of step 101 to step 105, reference may be made to the description of the operand processing system in the above embodiment of the present invention, and details are not repeated here.

In summary, the operand processing method provided in the embodiment of the present invention includes: acquiring a precision conversion instruction, and determining, according to the precision conversion instruction, the width to be rounded of a source operand from the precision value of the target format and the precision value of the source operand; generating a random number according to the width to be rounded, wherein the width of the random number is the same as the width to be rounded; aligning the random number with the lowest bit of the source operand to obtain an aligned number; adding the source operand and the aligned number to obtain an addition result; and truncating the least significant bits of the addition result to obtain the target operand in the target format. In the present invention, a high-precision source operand can be converted into a low-precision target operand by unbiased rounding using a single precision conversion instruction executed in pure hardware. Because all operations in the unbiased rounding process are completed by hardware devices, the number of instructions in the whole process is reduced, the calculation time is reduced, and the processing efficiency is improved.
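The unbiased property of the add-then-truncate scheme summarized above can be checked exhaustively for a small example. The sketch assumes a non-negative source operand and a random number that is uniformly distributed over all values of the width to be rounded; a hardware generator only approximates that distribution. The function names are hypothetical.

```python
from fractions import Fraction


def stochastic_round(source: int, random_number: int, width: int) -> int:
    """Add the random number to the source operand and truncate `width` low bits."""
    return (source + random_number) >> width


def expected_result(source: int, width: int) -> Fraction:
    """Exact mean of the rounded result over every possible random number."""
    outcomes = [stochastic_round(source, r, width) for r in range(1 << width)]
    return Fraction(sum(outcomes), len(outcomes))


source, width = 0b10110111, 4
mean = expected_result(source, width)
exact = Fraction(source, 1 << width)  # the unrounded value on the target scale
print(mean == exact)  # True
```

The equality holds because, writing the discarded low bits as f, exactly f of the 2^width possible random numbers produce a carry into the retained bits, so the result is rounded up with probability f / 2^width.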

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An operand processing system, characterized in that the system comprises:

2. The system of claim 1, wherein the pre-processing module comprises: a subtractor submodule;

the preprocessing module is specifically configured to: calculate, according to the precision conversion instruction, the difference between the precision value of the source operand and the precision value of the target format through the subtractor submodule, and determine the difference as the width to be rounded, wherein the source format of the source operand is a fixed-point number format.

3. The system of claim 2, wherein, in the case that the source format of the source operand is a floating-point number format, the preprocessing module further comprises: a format conversion submodule;

the preprocessing module is specifically configured to:

converting the source operand into a fixed point number format through the format conversion submodule according to the precision conversion instruction to obtain a new source operand;

4. The system according to any of claims 1-3, wherein the shifter is specifically configured to: align the random number with the source operand to obtain an aligned number having the same width as the source operand, wherein the lowest bit of the random number is placed at the position of the lowest bit of the aligned number, and the bits of the aligned number other than those occupied by the random number are set to 0.

5. The system according to any of claims 1-3, wherein the adder is specifically configured to: after aligning the decimal point of the aligned number with the decimal point of the source operand, add the aligned number and the source operand to obtain the addition result.

6. The system of any of claims 1-5, wherein the precision conversion instruction comprises: a first register identifier of the source register, a second register identifier of the destination register, a third register identifier of the source operand precision indication register, and a fourth register identifier of the destination operand precision indication register; the preprocessing module comprises: a decoding sub-module;

7. The system of claim 6, wherein, after truncating the least significant bits of the addition result to obtain the target operand in the target format, the multiplexer is further configured to: store the target operand in the destination register according to the second register identifier.

8. The system of claim 1, wherein the random number generator comprises: a linear feedback shift register.

9. A processor, comprising:

the functional module is configured to: determine the width to be rounded of the source operand according to the precision value of the target format and the precision value of the source operand; generate, according to the width to be rounded, a random number whose width is the same as the width to be rounded; align the random number with the lowest bit of the source operand to obtain an aligned number; add the source operand and the aligned number to obtain an addition result; and truncate the least significant bits of the addition result to obtain the target operand in the target format.

10. The processor of claim 9, wherein the functional module comprises: the operand processing system of any of claims 1-8.

11. An operand processing method, applied to the system of any one of claims 1 to 9, the method comprising:

12. The method of claim 11, wherein, in the case that the source format of the source operand is a fixed-point number format, the determining, according to the precision conversion instruction, the width to be rounded of the source operand from the precision value of the target format and the precision value of the source operand comprises:

13. The method of claim 11, wherein, in the case that the source format of the source operand is a floating-point number format, the determining, according to the precision conversion instruction, the width to be rounded of the source operand from the precision value of the target format and the precision value of the source operand comprises:

14. The method of any of claims 11-13, wherein the aligning the random number with the lowest bit of the source operand to obtain an aligned number comprises:

15. The method of any of claims 11-13, wherein the adding the source operand and the aligned number to obtain an addition result comprises:

16. The method of any of claims 11-15, wherein the precision conversion instruction comprises: a first register identifier of the source register, a second register identifier of the destination register, a third register identifier of the source operand precision indication register, and a fourth register identifier of the destination operand precision indication register;

17. The method of claim 16, wherein, after the truncating the least significant bits of the addition result to obtain the target operand in the target format, the method further comprises: