CN118312130A - Data processing method and device, processor, electronic equipment and storage medium

Data processing method and device, processor, electronic equipment and storage medium

Info

Publication number
CN118312130A
Authority
CN
China
Prior art keywords
mantissa, operand, source operand, bit, integer
Legal status
Pending
Application number
CN202410720964.7A
Other languages
Chinese (zh)
Inventor
Name withheld at the inventor's request (请求不公布姓名)
Current Assignee
Shanghai Bi Ren Technology Co ltd
Beijing Bilin Technology Development Co ltd
Original Assignee
Shanghai Bi Ren Technology Co ltd
Beijing Bilin Technology Development Co ltd
Application filed by Shanghai Bi Ren Technology Co ltd and Beijing Bilin Technology Development Co ltd
Publication of CN118312130A


Abstract

A data processing method, a processor, a data processing apparatus, an electronic device, and a non-transitory computer readable storage medium. The data processing method comprises the following steps: receiving a first source operand and a second source operand as input parameters, wherein the type of the first source operand is a floating point number type and the type of the second source operand is a first integer type; representing a mantissa portion of the first source operand with a first intermediate operand of a second integer type, wherein the bit width of the second integer type is greater than the effective bit width L of the mantissa portion of the first source operand, the bit width of the maximum value of the second source operand is less than the difference between the bit width of the second integer type and the effective bit width L, and L is a positive integer; converting the second source operand into the second integer type to obtain a second intermediate operand; performing integer multiplication of the second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is of the second integer type; and obtaining a destination operand according to the mantissa multiplication result.

Description

Data processing method and device, processor, electronic equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to a data processing method, a processor, an electronic device, and a non-transitory computer readable storage medium.
Background
Floating Point (FP) numbers are mainly used to represent fractional values, and a floating point number typically consists of three parts: a sign bit, an exponent part (which may also be referred to as the exponent code or step code part), and a mantissa part. For example, a floating point number V may generally be expressed in the form

V = (-1)^s × M × 2^E

where the sign bit s occupies 1 bit and determines whether the floating point number V is negative or positive; M denotes the mantissa part, which may include a plurality of bits in the form of a binary fraction and defines the precision of the floating point number; and E denotes the exponent (also called the exponent code or step code value), which weights the floating point number, reflects the position of the radix point in the floating point number V, and defines the range of values the floating point number can represent.
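As a concrete illustration (not part of the original patent text), the following C++ sketch extracts the sign bit s, the biased exponent, and the stored mantissa bits from an IEEE-754 double-precision value; the variable names are illustrative only.

```cpp
// Illustrative sketch (assumption: IEEE-754 binary64 layout): split a double
// into sign bit s, biased exponent e, and the 52 stored mantissa bits m.
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    double v = -6.25;
    uint64_t bits = 0;
    std::memcpy(&bits, &v, sizeof bits);     // reinterpret the 64 raw bits

    uint64_t s = bits >> 63;                 // 1 sign bit
    uint64_t e = (bits >> 52) & 0x7FF;       // 11 exponent (step code) bits, biased by 1023
    uint64_t m = bits & ((1ULL << 52) - 1);  // 52 stored mantissa bits (hidden bit not stored)

    std::printf("sign=%llu exponent(biased)=%llu mantissa=0x%013llx\n",
                (unsigned long long)s, (unsigned long long)e, (unsigned long long)m);
    return 0;
}
```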
Disclosure of Invention
At least one embodiment of the present disclosure provides a data processing method, including: receiving a first source operand and a second source operand as input parameters, wherein the type of the first source operand is a floating point type, and the type of the second source operand is a first integer type; representing a mantissa portion of the first source operand with a first intermediate operand of a second integer type, wherein a bit width of the second integer type is greater than a significant bit width L of the mantissa portion of the first source operand, a bit width of a maximum value of the second source operand is less than a difference between the bit width of the second integer type and the significant bit width L, L being a positive integer; converting the second source operand into the second integer type to obtain a second intermediate operand; performing integer multiplication of the second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is of the second integer type; and obtaining a destination operand serving as an output parameter according to the mantissa multiplication result.
For example, in a data processing method provided in at least one embodiment of the present disclosure, representing a mantissa portion of a first source operand with a first intermediate operand of a second integer type includes: filling hidden bits in the mantissa portion of the first source operand to obtain a valid mantissa portion; and taking L bits of the valid mantissa part as 1 st bit to L th bit of the first intermediate operand, wherein the 1 st bit of the first intermediate operand is the lowest bit.
For example, in a data processing method provided in at least one embodiment of the present disclosure, the destination operand is represented by three independent integer parameters, the three independent integer parameters including a first integer parameter representing a mantissa portion of the destination operand, a second integer parameter representing a sign bit of the destination operand, and a third integer parameter representing an exponent portion of the destination operand, and obtaining the destination operand according to the mantissa multiplication result includes: directly taking the mantissa multiplication result as the first integer parameter; determining the second integer parameter according to the sign bit of the first source operand and the sign bit of the second source operand; and taking the exponent portion of the first source operand as the third integer parameter.
For example, in the data processing method provided in at least one embodiment of the present disclosure, the destination operand is a floating point number, and obtaining the destination operand according to the mantissa multiplication result includes: determining a mantissa portion and an exponent portion of the destination operand based on the mantissa multiplication result; and determining the sign bit of the destination operand according to the sign bit of the first source operand and the sign bit of the second source operand.
For example, in a data processing method provided in at least one embodiment of the present disclosure, determining a mantissa portion and an exponent portion of the destination operand according to the mantissa multiplication result includes: determining a valid portion from the mantissa multiplication result, wherein the starting point of the valid portion is the first 1 that occurs in the mantissa multiplication result along a preset direction, and the preset direction is the direction from the high-order bit to the low-order bit of the mantissa multiplication result; determining a mantissa intermediate result according to the valid portion, wherein the bit width of the mantissa intermediate result is less than or equal to L; and performing shift processing on the mantissa intermediate result according to the exponent portion of the first source operand to obtain the mantissa portion and the exponent portion of the destination operand.
For example, in the data processing method provided in at least one embodiment of the present disclosure, the end point of the valid portion is the last 1 occurring along the preset direction in the mantissa multiplication result, or the end point of the valid portion is the least significant bit of the mantissa multiplication result, and determining the mantissa intermediate result according to the valid portion includes: rounding the valid portion according to a preset rounding rule to obtain the mantissa intermediate result.
For example, in the data processing method provided in at least one embodiment of the present disclosure, rounding the valid portion according to a preset rounding rule to obtain the mantissa intermediate result includes: in response to the length N of the valid portion being less than or equal to L, directly taking the valid portion as the mantissa intermediate result; and in response to the length N of the valid portion being greater than L, determining the mantissa intermediate result according to the (L+1)-th bit in the valid portion, wherein, when the length N of the valid portion is greater than L, the bit width of the mantissa intermediate result is equal to L, the most significant bit in the valid portion is the 1st bit, and the (L+1)-th bit in the valid portion is the (L+1)-th bit counted from the 1st bit along the preset direction.
For example, in a data processing method provided in at least one embodiment of the present disclosure, determining the mantissa intermediate result according to the (L+1)-th bit in the valid portion includes: in response to the (L+1)-th bit of the valid portion being a first value, discarding the (L+1)-th to N-th bits of the valid portion and taking the 1st to L-th bits of the valid portion as the mantissa intermediate result; in response to the (L+1)-th bit of the valid portion being a second value and at least 1 bit of the (L+2)-th to N-th bits of the valid portion not being the first value, discarding the (L+1)-th to N-th bits of the valid portion and performing a carry operation on the L-th bit of the valid portion to obtain the mantissa intermediate result; and in response to the (L+1)-th bit of the valid portion being the second value and the (L+2)-th to N-th bits of the valid portion all being the first value, determining the mantissa intermediate result according to the L-th bit of the valid portion.
For example, in a data processing method provided in at least one embodiment of the present disclosure, determining the valid portion from the mantissa multiplication result includes: starting from the starting point, consecutively selecting at most L bits from the mantissa multiplication result along the preset direction as the valid portion; and determining the mantissa intermediate result according to the valid portion includes: taking the valid portion as the mantissa intermediate result.
For example, in a data processing method provided in at least one embodiment of the present disclosure, the destination operand is a product result of the first source operand and the second source operand, and the shifting includes shifting the mantissa intermediate result according to a value of an exponent portion of the first source operand.
For example, in a data processing method provided in at least one embodiment of the present disclosure, the destination operand is a rounded result obtained by performing a further rounding operation on the product result of the first source operand and the second source operand, and in response to the rounding operation indicating that a rounding-down operation is performed on the product result, the shift processing includes shifting the mantissa intermediate result according to the value of the exponent portion of the first source operand.
For example, in a data processing method provided in at least one embodiment of the present disclosure, the destination operand is a rounded result obtained by performing a further rounding operation on the product result of the first source operand and the second source operand, and in response to the rounding operation indicating that a rounding-up operation is performed on the product result, the shift processing includes: shifting the mantissa intermediate result according to the value of the exponent portion of the first source operand to obtain a shift result and an overflow mantissa portion generated due to the shifting; in response to the overflow mantissa portion being a first value, obtaining the mantissa portion and the exponent portion of the destination operand according to the shift result; and in response to the overflow mantissa portion not being the first value, performing a carry operation on the shift result and obtaining the mantissa portion and the exponent portion of the destination operand according to the carry result.
For example, in a data processing method provided in at least one embodiment of the present disclosure, determining the sign bit of the destination operand according to the sign bit of the first source operand and the sign bit of the second source operand includes: determining that the sign bit of the destination operand is a positive sign bit in response to the sign bit of the first source operand and the sign bit of the second source operand being the same, wherein the positive sign bit indicates that the destination operand is a positive number; and determining that the sign bit of the destination operand is a negative sign bit in response to the sign bit of the first source operand and the sign bit of the second source operand being different, wherein the negative sign bit indicates that the destination operand is a negative number.
At least one embodiment of the present disclosure provides a data processing method, including: receiving a data processing instruction indicating that at least a multiplication of a floating point number and an integer is to be performed, wherein the data processing instruction comprises a first source operand and a second source operand as input parameters and a destination operand as an output parameter, the type of the first source operand is a floating point number type, and the type of the second source operand is a first integer type; and executing the data processing instruction by an operation unit after the data processing instruction is parsed, wherein executing the data processing instruction by the operation unit comprises: representing a mantissa portion of the first source operand with a first intermediate operand of a second integer type, wherein a bit width of the second integer type is greater than a significant bit width L of the mantissa portion of the first source operand, a bit width of a maximum value of the second source operand is less than a difference between the bit width of the second integer type and the significant bit width L, and L is a positive integer; converting the second source operand into the second integer type to obtain a second intermediate operand; performing integer multiplication of the second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is of the second integer type; and obtaining the destination operand serving as the output parameter according to the mantissa multiplication result.
At least one embodiment of the present disclosure provides a processor, including an instruction parsing unit and an operation unit, where the instruction parsing unit is configured to receive and parse a data processing instruction, the data processing instruction includes a first source operand and a second source operand as input parameters and a destination operand as an output parameter, the first source operand is of a floating point type, the second source operand is of a first integer type, and the operation unit executes the data processing instruction after the instruction parsing unit parses the data processing instruction, where executing the data processing instruction by the operation unit includes performing the following operations: representing a mantissa portion of the first source operand with a first intermediate operand of a second integer type, wherein a bit width of the second integer type is greater than a significant bit width L of the mantissa portion of the first source operand, a bit width of a maximum value of the second source operand is less than a difference between the bit width of the second integer type and the significant bit width L, and L is a positive integer; converting the second source operand into the second integer type to obtain a second intermediate operand; performing integer multiplication of the second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is of the second integer type; and obtaining the destination operand serving as the output parameter according to the mantissa multiplication result.
At least one embodiment of the present disclosure provides a data processing apparatus including: an acquisition module configured to receive a first source operand and a second source operand as input parameters, the first source operand being of a floating point type and the second source operand being of a first integer type; a first conversion module configured to represent a mantissa portion of the first source operand with a first intermediate operand of a second integer type, wherein the bit width of the second integer type is greater than a valid bit width L of the mantissa portion of the first source operand, the bit width of a maximum value of the second source operand is less than a difference between the bit width of the second integer type and the valid bit width L, L being a positive integer; the second conversion module is configured to convert the second source operand into the second integer type to obtain a second intermediate operand; a multiplication calculation module configured to perform integer multiplication of the second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is the second integer type; and the output module is configured to obtain a destination operand serving as an output parameter according to the mantissa multiplication result.
At least one embodiment of the present disclosure provides an electronic device, including: a memory non-transitory storing computer-executable instructions; a processor configured to execute the computer-executable instructions, wherein the computer-executable instructions, when executed by the processor, implement a data processing method according to any embodiment of the present disclosure.
At least one embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement a data processing method according to any embodiment of the present disclosure.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure, not to limit the present disclosure.
FIG. 1 is a schematic diagram of a process of scaling a picture with the Resize operator;
FIG. 2 is a schematic flow chart of a data processing method provided by at least one embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a relationship between a first source operand and a first intermediate operand according to at least one embodiment of the present disclosure;
FIGS. 4A and 4B are schematic diagrams of valid portions provided by at least one embodiment of the present disclosure;
FIG. 5 is a schematic flow chart diagram of a data processing method provided by at least one embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of a data processing apparatus provided in accordance with at least one embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of a processor provided in accordance with at least one embodiment of the present disclosure;
FIG. 8 is a schematic block diagram of an electronic device provided in an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a non-transitory computer readable storage medium according to at least one embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings of the embodiments of the present disclosure. It is apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. Based on the described embodiments of the present disclosure, all other embodiments that can be obtained by one of ordinary skill in the art without inventive effort fall within the scope of the present disclosure.
Unless defined otherwise, technical or scientific terms used in this disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word encompasses the elements or items listed after the word and equivalents thereof, but does not exclude other elements or items. Terms such as "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may change when the absolute position of the object described changes. In order to keep the following description of the embodiments of the present disclosure clear and concise, the present disclosure omits a detailed description of some known functions and known components.
Conventional floating point numbers typically include three formats, namely, half-precision floating point number (FP 16), single-precision floating point number (FP 32), and double-precision floating point number (FP 64), with exponent and mantissa portions having different numbers of bits.
AI (Artificial Intelligence) accelerators and the like have been widely used for deep learning model training. For the convolution operations commonly found in deep learning models, special optimizations are made in software and hardware design to accelerate computation. For example, various floating point data formats have been developed for the fields of artificial intelligence and deep learning, such as BF16 (brain floating point, with a bit width of 16 bits), BF24 (brain floating point, with a bit width of 24 bits), and TF32 (Tensor Float 32, with a bit width of 19 bits); these data formats can greatly reduce the computational resources and power consumption required for processing, especially for matrix multiplication or convolution operations. In addition, processors support some conventional floating point types, such as half-precision floating point (FP16, with a bit width of 16 bits) or single-precision floating point (FP32, with a bit width of 32 bits).
Table 1 below is a data format for several floating point precision types.
TABLE 1 Data formats

Precision type   Total bits   Sign bit   Exponent bits   Mantissa bits   Mantissa effective bit width
FP32             32           1          8               23              24
FP64             64           1          11              52              53
BF16             16           1          8               7               8
BF24             24           1          8               15              16
As shown in table 1, for the floating-point number precision type FP32, the total number of bits thereof is 32 bits, including 1 sign bit, the exponent portion (i.e., the step code) includes 8 bits, the mantissa portion includes 23 bits, and the mantissa valid bit width is 23+1=24 bits. For FP64, the total number of bits is 64 bits, including 1 sign bit, the exponent portion (i.e., the step code) includes 11 bits, the mantissa portion includes 52 bits, and the mantissa valid bit width is 52+1=53 bits. For BF16, the total number of bits is 16 bits, including 1 sign bit, the exponent portion (i.e., the step code) includes 8 bits, the mantissa portion includes 7 bits, and the mantissa valid bit width is 7+1=8 bits. For BF24, the total number of bits is 24 bits, including 1 sign bit, the exponent portion (i.e., the step code) includes 8 bits, the mantissa portion includes 15 bits, and the mantissa valid bit width is 15+1=16 bits.
In order to represent as many significant data bits as possible in the mantissa while keeping the representation of a floating point number unique, the encoding of floating point numbers follows a certain specification: the mantissa portion is given as a pure fraction, and the absolute value of the mantissa is greater than or equal to 1/R (where R is the radix, typically 2) and less than 1, that is, the first bit after the radix point is non-zero; a floating point number that does not meet this specification can be made to meet it by modifying the exponent and shifting the mantissa accordingly. Thus, for a normalized floating point number, the most significant bit of its mantissa is 1, and its mantissa effective bit width is the number of bits of the mantissa portion plus 1. For example, for a single-precision floating point number, the mantissa portion includes 23 bits, the mantissa effective bit width is 24 bits, and the most significant bit is 1.
The Resize operator is a common operator used to resize input data by interpolation methods to obtain output data of a desired size.
In PyTorch, the Resize operator is a function in the torchvision library, primarily used to resize images. It receives two main parameters: size and interpolation. The size parameter specifies the size of the target image and may be an integer or a tuple; the interpolation parameter specifies the interpolation mode and defaults to bilinear interpolation.
In the field of deep learning, TensorRT is a performance-optimized deep learning inference engine, in which the Resize operator plays an important role in image processing and model deployment. It scales and crops images by processing the input data with an interpolation algorithm.
Common interpolation methods include nearest neighbor interpolation, bilinear interpolation, bicubic interpolation, and the like. The choice of interpolation method affects both the performance of the Resize operator and the quality of the output image.
Taking the Resize operator in bilinear interpolation (bilinear) mode as an example, the process of scaling a picture with the Resize operator is described with reference to FIG. 1.
As shown in the bilinear interpolation schematic diagram in FIG. 1, a pixel point D in the scaled picture is mapped back to a point in the picture before scaling according to the scaling relationship, and the mapped point is usually not at an integer pixel position (i.e., it lies between 4 pixel points), as shown by point S in FIG. 1.
The bilinear interpolation mode defines the value of point S as the result of bilinear interpolation of the 4 points P11, P12, P21, and P22 around point S in FIG. 1. First, linear interpolation is performed on P11 and P21 to obtain Q1, linear interpolation is performed on P12 and P22 to obtain Q2, and linear interpolation is then performed on Q1 and Q2 to obtain the value of S. The linear interpolation coordinate X_1 is obtained as

X_1 = floor(X_frac · X_D)

where floor represents the rounding-down function, X_frac represents the scaling ratio, that is, the size of the scaled image divided by the size of the image before scaling, and X_D represents the coordinate of the element on the image before scaling.
Obviously, the precision of the Resize operator depends mainly on whether the linear interpolation coordinates are calculated accurately, and the precision of the interpolation coordinates depends on X_frac·X_D, that is, on the precision of the data used in the multiplication of the floating point number and the integer.
At present, many artificial intelligence chips do not generally support FP64 high-precision floating point numbers, and bilinear interpolation generally adopts FP32 floating point multiplication, namely: X_frac is of the FP32 type, X_D is converted from an integer into the FP32 type, FP32 multiplication is then performed on X_frac and X_D to obtain a result X, and finally X is rounded down to obtain X_1. This way of calculating may cause scaling errors due to insufficient precision: because of the precision error, the floating point calculation result can be extremely close to the exact value yet slightly different from it, and after a rounding operation such as rounding down this small difference can become a large difference, thereby causing larger scaling errors.
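The following C++ sketch (not from the patent; the concrete values are hypothetical) shows the FP32 computation path described above, X_1 = floor(X_frac · X_D), next to a higher-precision FP64 reference; for some inputs the two floor results can differ, which is exactly the scaling error discussed here.

```cpp
// Minimal sketch of the FP32 path: X = X_frac * X_D in FP32, then X_1 = floor(X).
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
    float x_frac = 1.0f / 3.0f;                  // scaling ratio held in FP32 (hypothetical)
    int32_t x_d = 300;                           // integer coordinate (hypothetical)

    float x = x_frac * static_cast<float>(x_d);  // FP32 multiplication
    int32_t x1_fp32 = static_cast<int32_t>(std::floor(x));

    // Reference: the same ratio and product evaluated in FP64.
    double x_ref = (1.0 / 3.0) * static_cast<double>(x_d);
    int32_t x1_fp64 = static_cast<int32_t>(std::floor(x_ref));

    std::printf("FP32 floor = %d, FP64 floor = %d\n", x1_fp32, x1_fp64);
    return 0;
}
```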
At least one embodiment of the present disclosure provides a data processing method, a processor, an electronic device, and a non-transitory computer readable storage medium. The data processing method comprises the following steps: receiving a first source operand and a second source operand as input parameters, wherein the type of the first source operand is a floating point number type, and the type of the second source operand is a first integer type; representing a mantissa portion of a first source operand with a first intermediate operand of a second integer type, wherein the bit width of the second integer type is greater than the effective bit width L of the mantissa portion of the first source operand, and the bit width of a maximum value of the second source operand is less than the difference between the bit width of the second integer type and the effective bit width L, L being a positive integer; converting the second source operand into a second integer type to obtain a second intermediate operand; executing integer multiplication of a second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is of the second integer type; and obtaining a destination operand according to the mantissa multiplication result.
In at least one embodiment, the method can emulate high-precision multiplication of an integer and a floating point number, improve the calculation precision, reduce the precision error of the multiplication of the integer and the floating point number, improve the precision of operators that apply the method, such as the Resize operator, and avoid scaling errors.
Embodiments of the present disclosure will be described in detail below with reference to the attached drawings, but the present disclosure is not limited to these specific embodiments.
Fig. 2 is a schematic flow chart of a data processing method according to at least one embodiment of the present disclosure.
As shown in fig. 2, the data processing method provided in at least one embodiment of the present disclosure includes steps S101 to S105.
In step S101, a first source operand and a second source operand are received as input parameters.
For example, the first source operand is of the floating point type and the second source operand is of the first integer type.
For example, the first source operand and the second source operand may be input parameters provided by an instruction or a microinstruction.
For example, the first source operand and the second source operand may be intermediate parameters in the data processing, such as those generated during execution of the instruction.
The first source operand and the second source operand may have different physical meanings, depending on the application field.
For example, in the field of speech processing, the first source operand and the second source operand may be any parameters used, input, generated in tasks such as feature extraction, speech enhancement, speech recognition, etc., that require floating point numbers and integer multiplications, such as speech feature vectors, filtering parameters, etc.
For example, in the field of image processing, the first source operand and the second source operand may be any parameters used, input, or generated in preprocessing, feature extraction, image segmentation, object detection, and the like that require multiplication of a floating point number and an integer, such as image feature vectors, various edge detection operators (such as the Sobel, Canny, and Prewitt operators), image filtering operators (such as Gaussian filtering, median filtering, and bilateral filtering), morphological operators (such as erosion, dilation, opening, and closing), and the like. For example, in one particular example, the first source operand may be X_frac in the Resize operator, representing the scaling ratio, and the second source operand may be X_D in the Resize operator, representing the coordinate of the element on the image before scaling.
For example, in the field of text processing, the first source operand and the second source operand may be any parameters used, input, generated in tasks such as text classification, emotion analysis, text generation, etc., that require floating point numbers and integer multiplications, such as semantic feature vectors of text, etc.
For example, in the field of video processing, the first source operand and the second source operand may be parameters in the field of image processing as described above, or parameters specific to the field of video processing such as use, input, generation of optical flow operators (for estimating motion between video frames), object tracking operators (for tracking specific objects in video), etc.
Of course, the disclosure is not limited thereto, and as long as the multiplication calculation of floating point numbers and integers is required for other application scenarios or fields, the data processing method described in at least one embodiment of the disclosure may be applied, and will not be described in detail herein.
In step S102, the mantissa portion of the first source operand is represented by a first intermediate operand of a second integer type.
For example, the bit width of the second integer type is greater than the effective bit width L of the mantissa portion of the first source operand, and the bit width of the maximum value of the second source operand is less than the difference between the bit width of the second integer type and the effective bit width L, L being a positive integer.
For example, in one example, a second integer type may be selected based on the effective bit width L and the bit width of the first integer type, e.g., the bit width of the second integer type is greater than the sum of the effective bit width L and the bit width of the first integer type.
For example, in another example, the second integer type to be used may be determined first, e.g., the second integer type needs a bit width greater than the effective bit width L, and then the bit width of the maximum value of the second source operand is determined based on the bit width of the second integer type and the effective bit width L. Specifically, in one specific example, the first source operand is of the FP64 type, the effective bit width L of its mantissa portion is 53, and the second integer type is determined to be UINT64 (64-bit unsigned integer) according to factors such as the integer multiplications supported by the processor or the balance of computing power; the upper limit of the bit width of the second source operand (i.e., the bit width of the maximum value of the second source operand) is then 10 bits, because if the value of the second source operand is smaller than 1024, that is, it occupies at most 10 bits, the result of the UINT64 multiplication of 10 bits by 53 bits does not exceed 64 bits.
For example, the bit width of the maximum value of the second source operand may be the difference between the bit width of the second integer type and the effective bit width L of the mantissa portion of the first source operand minus 1.
For example, in order that the mantissa multiplication result can use as many bits as possible, and because the mantissa multiplication process does not take sign bits into account, the second integer type may be set to an unsigned integer type.
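As a small illustrative check (an assumption for the FP64/UINT64 example, not text from the patent), the bit-width budget described above can be written down directly; the constant names are hypothetical.

```cpp
// Illustrative check of the bit-width budget: the L-bit significand times the
// second source operand must still fit in the second integer type.
#include <cstdio>

constexpr unsigned kSecondIntegerBits = 64;  // bit width of the second integer type (UINT64)
constexpr unsigned kMantissaBitsL     = 53;  // effective mantissa bit width L of FP64

// Example upper bound on the bit width of the second source operand's maximum
// value, taken here as (bit width of the second integer type) - L - 1.
constexpr unsigned kMaxSecondOperandBits = kSecondIntegerBits - kMantissaBitsL - 1;  // = 10

static_assert(kMantissaBitsL + kMaxSecondOperandBits < kSecondIntegerBits,
              "mantissa multiplication would overflow the second integer type");

int main() {
    std::printf("second source operand may occupy at most %u bits (values below %u)\n",
                kMaxSecondOperandBits, 1u << kMaxSecondOperandBits);
    return 0;
}
```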
In step S103, the second source operand is converted into a second integer type, resulting in a second intermediate operand.
In step S104, integer multiplication of the second integer type is performed on the first intermediate operand and the second intermediate operand, resulting in a mantissa multiplication result.
For example, the mantissa multiplication result is a second integer type.
In step S105, a destination operand is obtained from the mantissa multiplication result.
In at least one embodiment of the present disclosure, the multiplication of a floating point number and an integer is converted into a multiplication of two integers; specifically, the mantissa portion of the floating point number is converted into integer form and multiplied with the integer, so that as many mantissa bits as possible can be retained, the calculation precision is improved, the precision error of the multiplication of the integer and the floating point number is reduced, the precision of operators that apply the method, such as the Resize operator, is improved, and scaling errors are avoided. For example, if a processor or chip does not support high-precision floating point numbers or high-precision floating point multiplication, high-precision floating point multiplication may be performed using the data processing method provided by at least one embodiment of the present disclosure.
For example, step S102 may include: filling hidden bits in the mantissa part of the first source operand to obtain a valid mantissa part; the L bits of the valid mantissa portion are taken as the 1 st bit to the L th bit of the first intermediate operand, wherein the 1 st bit of the first intermediate operand is the lowest bit. The L-th bit of the first intermediate operand is the L-th bit in the low to high direction.
For a normalized floating point number, the most significant bit of its mantissa is always 1; therefore, in order to represent more significant data bits in the mantissa, this bit is typically not stored and is referred to as the hidden bit.
For example, for a normalized floating point number, at least the hidden bits of the mantissa portion are padded in a padding operation, i.e., a1 is padded before the most significant bit of the mantissa portion of the input parameter.
For example, all bits of the first intermediate operand are used to represent the size of the value of the mantissa significant portion of the first source operand, which corresponds to all bits being used to encode the mantissa of the first source operand.
FIG. 3 is a schematic diagram illustrating a relationship between a first source operand and a first intermediate operand according to at least one embodiment of the present disclosure.
As shown in fig. 3, first, the mantissa portion (L-1 bits) of the first source operand is padded with hidden bits to obtain a valid mantissa portion, and as shown in fig. 3, the most significant bit of the valid mantissa portion is 1.
Then, the 1st bit of the valid mantissa portion is taken as the 1st bit of the first intermediate operand, the 2nd bit of the valid mantissa portion is taken as the 2nd bit of the first intermediate operand, and so on, until the L-th bit of the valid mantissa portion is taken as the L-th bit of the first intermediate operand.
Here, the 1 st bit of the valid mantissa portion is the lowest bit, and the L-th bit of the valid mantissa portion is the L-th bit in the low-order to high-order direction.
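A minimal C++ sketch of step S102 as illustrated in FIG. 3, under the assumption that the first source operand is an IEEE-754 FP64 value and the second integer type is UINT64; the function and variable names are illustrative only.

```cpp
// Pad the hidden bit onto the stored mantissa and place the resulting
// L = 53 valid mantissa bits into the low bits of a UINT64 first intermediate operand.
#include <cstdint>
#include <cstring>

// Returns the 53-bit valid mantissa (hidden bit included) of a normalized,
// non-zero double in the low bits of a uint64_t.
uint64_t first_intermediate_operand(double first_source_operand) {
    uint64_t bits = 0;
    std::memcpy(&bits, &first_source_operand, sizeof bits);
    uint64_t stored_mantissa = bits & ((1ULL << 52) - 1);  // the 52 stored mantissa bits
    return (1ULL << 52) | stored_mantissa;                 // pad the hidden bit '1' in front
}

int main() {
    uint64_t m = first_intermediate_operand(1.5);  // 1.5 = 1.1b, so the result is 0x18000000000000
    return (m == 0x18000000000000ULL) ? 0 : 1;
}
```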
For example, in step S104, since the first intermediate operand is of the second integer type and the second intermediate operand is also of the second integer type, integer multiplication of the type of the second integer type may be directly performed on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, which is also of the second integer type.
For example, integer multiplication of the first intermediate operand and the second intermediate operand may be performed with a multiplier of the second integer type, such as a U-bit multiplier, where U is a positive integer and is the bit width of the second integer type.
The mantissa multiplication result reflects the relative value size of the product result of the first source operand and the second source operand.
For example, the precision type of the destination operand is the same as the precision type of the first source operand, e.g., FP64 for the first source operand and FP64 for the destination operand. For example, the type of destination operand is the same as the type of first source operand, e.g., both are floating point numbers, but the precision of the destination operand may be higher than the precision of the first source operand, e.g., the mantissa valid bit width of the destination operand is greater than the mantissa valid bit width of the first source operand.
For example, the processor or chip may not support FP64 multiplication or even the FP64 precision format; that is, in a processor or chip employing the data processing method provided by at least one embodiment of the present disclosure, the destination operand possibly cannot be represented directly in a high-precision floating point format. In this case, the destination operand may be represented by three separate integer parameters, the three separate integer parameters including a first integer parameter representing the mantissa portion of the destination operand, a second integer parameter representing the sign bit of the destination operand, and a third integer parameter representing the exponent portion of the destination operand.
For example, in this embodiment, step S105 may include: directly taking the mantissa multiplication result as the first integer parameter; determining the second integer parameter according to the sign bit of the first source operand and the sign bit of the second source operand; and taking the exponent portion of the first source operand as the third integer parameter.
Thus, the three integer types can be used to represent the higher precision multiplication results, thereby enabling the processor or chip to support higher precision floating point number and integer type multiplication.
For example, the first source operand and the second source operand may both be positive numbers; for example, in the Resize operator, the first source operand is, e.g., X_frac, which is a positive floating point number, and the second source operand is, e.g., X_D, which is a positive integer. In this case, the determination of the sign bit may be omitted, and the sign bit of the destination operand defaults to a positive sign bit.
Of course, the disclosure is not limited thereto, and the first source operand may be positive or negative, and the second source operand may be positive or negative, where sign bits of the multiplication result (e.g., the destination operand) may be obtained according to sign bits of the first source operand and sign bits of the second source operand.
For example, determining the second integer parameter based on the sign bit of the first source operand and the sign bit of the second operand may include: determining a second integer parameter as a first preset value in response to the sign bit of the first source operand and the sign bit of the second source operand being the same, wherein the first preset value represents that the destination operand is a positive number; and determining the second integer parameter as a second preset value in response to the sign bit of the first source operand and the sign bit of the second source operand being different, wherein the second preset value represents that the destination operand is negative.
For example, the first preset value may be 0 and the second preset value may be 1.
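The following C++ sketch (with illustrative, hypothetical names) shows one way to hold the destination operand as the three independent integer parameters described above; whether the third parameter stores the biased or the unbiased exponent is an assumption made here, not specified by the text.

```cpp
#include <cstdint>
#include <cstring>

struct DestinationOperand {
    uint64_t mantissa;  // first integer parameter: the UINT64 mantissa multiplication result
    uint32_t sign;      // second integer parameter: 0 = positive, 1 = negative (first/second preset values)
    int32_t  exponent;  // third integer parameter: exponent of the first source operand (unbiased here, an assumption)
};

DestinationOperand make_destination(double first_src, int32_t second_src, uint64_t mantissa_product) {
    uint64_t bits = 0;
    std::memcpy(&bits, &first_src, sizeof bits);
    uint32_t sign_a = static_cast<uint32_t>(bits >> 63);
    uint32_t sign_b = (second_src < 0) ? 1u : 0u;

    DestinationOperand dst;
    dst.mantissa = mantissa_product;                                   // taken directly as the first parameter
    dst.sign     = (sign_a == sign_b) ? 0u : 1u;                       // same signs -> positive, else negative
    dst.exponent = static_cast<int32_t>((bits >> 52) & 0x7FF) - 1023;  // unbiased FP64 exponent
    return dst;
}

int main() {
    // Hypothetical inputs; the mantissa product would come from the UINT64 multiplication step.
    DestinationOperand d = make_destination(0.46875, 1000, 15000ULL << 49);
    return static_cast<int>(d.sign);  // 0: both operands positive
}
```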
For example, in some embodiments, the destination operand is a normalized floating point number, such as a floating point number of a precision supported by the processor or chip; in this case, the mantissa multiplication result, which represents the relative size of the multiplication result, needs to be converted into a standard floating point number format.
For example, in this embodiment, step S105 may include: determining a mantissa portion and an exponent portion of the destination operand according to the mantissa multiplication result; and determining the sign bit of the destination operand according to the sign bit of the first source operand and the sign bit of the second source operand.
For example, determining the mantissa portion and the exponent portion of the destination operand according to the mantissa multiplication result may include: determining a valid portion from the mantissa multiplication result, wherein the starting point of the valid portion is the first 1 that appears in the mantissa multiplication result along a preset direction, and the preset direction is the direction from the high-order bit to the low-order bit of the mantissa multiplication result; determining a mantissa intermediate result according to the valid portion, wherein the bit width of the mantissa intermediate result is less than or equal to L; and performing shift processing on the mantissa intermediate result according to the exponent portion of the first source operand to obtain the mantissa portion and the exponent portion of the destination operand.
For example, there are three embodiments for determining the mantissa portion and exponent portion of the destination operand.
For example, in some embodiments, determining the valid portion from the mantissa multiplication result may include: starting from the starting point, consecutively selecting at most L bits from the mantissa multiplication result along the preset direction as the valid portion.
For example, in this embodiment, determining the mantissa intermediate result according to the valid portion may include: taking the valid portion as the mantissa intermediate result.
For example, in this embodiment, at most L bits are selected directly from the mantissa multiplication result, starting from the starting point, as the mantissa intermediate result, and the remainder is discarded, which makes the implementation simpler.
For example, when the number of bits between the starting point and the lowest bit of the mantissa multiplication result is greater than or equal to L, L bits are consecutively selected starting from the starting point as the mantissa intermediate result.
For example, when the number of bits between the starting point and the lowest bit of the mantissa multiplication result is smaller than L, for example equal to L', where L' is a positive integer smaller than L, then L' bits are consecutively selected starting from the starting point as the mantissa intermediate result.
FIG. 4A is a schematic diagram of a valid portion provided in at least one embodiment of the present disclosure.
For example, in other embodiments, as shown in FIG. 4A, the starting point of the valid portion is the first 1 that occurs in the mantissa multiplication result in the direction from the high-order bit to the low-order bit, and the end point is the last 1 that occurs in the mantissa multiplication result along the preset direction.
For example, in this embodiment, determining the mantissa intermediate result according to the valid portion may include: rounding the valid portion according to a preset rounding rule to obtain the mantissa intermediate result.
Since the high-order 0s have no influence on the precision, the valid portion is first selected from the mantissa multiplication result, so that the part that substantially affects the precision can be obtained more accurately; further, since the length of the valid portion is not necessarily equal to L, when the length of the valid portion is greater than L the valid portion is converted into an L-bit mantissa intermediate result by the rounding operation, so that the mantissa portion of the product result of the floating point number can be represented more accurately and the mantissa intermediate result is closer to the mantissa multiplication result.
For example, in at least one embodiment of the present disclosure, in response to the length N of the valid portion being less than or equal to L, the valid portion is directly taken as the mantissa intermediate result, where N is a positive integer greater than 1; and in response to the length N of the valid portion being greater than L, the mantissa intermediate result is determined according to the (L+1)-th bit in the valid portion, wherein, when the length N of the valid portion is greater than L, the bit width of the mantissa intermediate result is equal to L, the most significant bit in the valid portion is the 1st bit, and the (L+1)-th bit in the valid portion is the (L+1)-th bit counted from the 1st bit along the preset direction.
For example, in at least one embodiment of the present disclosure, determining the mantissa intermediate result according to the (L+1)-th bit in the valid portion may include: in response to the (L+1)-th bit being the first value, discarding the (L+1)-th to N-th bits of the valid portion and taking the 1st to L-th bits of the valid portion as the mantissa intermediate result; in response to the (L+1)-th bit being a second value and at least 1 bit from the (L+2)-th to N-th bits of the valid portion not being the first value, discarding the (L+1)-th to N-th bits of the valid portion and performing a carry operation on the L-th bit of the valid portion to obtain the mantissa intermediate result; and in response to the (L+1)-th bit being the second value and the (L+2)-th to N-th bits of the valid portion all being the first value, determining the mantissa intermediate result according to the L-th bit of the valid portion.
For example, determining the mantissa intermediate result according to the L-th bit of the valid portion may include: in response to the L-th bit of the valid portion being the first value, taking the 1st to L-th bits of the valid portion as the mantissa intermediate result; and in response to the L-th bit of the valid portion being the second value, discarding the (L+1)-th to N-th bits of the valid portion and performing a carry operation on the L-th bit of the valid portion to obtain the mantissa intermediate result.
The length N of the valid portion is the number of bits of the valid portion, including the start point and the end point.
For example, the preset rounding rule may specify that, when N is less than or equal to L, the valid portion may be directly used as the mantissa intermediate result; and when N is greater than L, the valid portion is rounded according to the (L+1)-th bit in the valid portion to obtain the mantissa intermediate result.
For example, the first value may be 0 and the second value may be 1.
For example, the preset rounding rule may specify that, when N is greater than L, if the (L+1)-th bit is 0, the (L+1)-th to N-th bits of the valid portion are discarded and the 1st to L-th bits of the valid portion are used as the mantissa intermediate result; if the (L+1)-th bit is 1, then, since the N-th bit is necessarily 1, at least 1 bit from the (L+2)-th to N-th bits of the valid portion is necessarily 1, so the (L+1)-th to N-th bits of the valid portion are discarded and a carry operation is performed on the L-th bit, that is, 1 is added to the L-th bit, to obtain the mantissa intermediate result.
Of course, the preset rounding rules may be set as required, and the disclosure is not limited to use of the rounding rules described in the above embodiments, and will not be repeated here.
FIG. 4B is a schematic diagram of a valid portion provided in at least one embodiment of the present disclosure.
For example, in other embodiments, as shown in FIG. 4B, the starting point of the valid portion is the first 1 that occurs in the mantissa multiplication result in the direction from the high-order bit to the low-order bit, and the end point is the lowest-order bit of the mantissa multiplication result.
For example, in this embodiment, determining the mantissa intermediate result according to the valid portion may include: rounding the valid portion according to a preset rounding rule to obtain the mantissa intermediate result.
Similar to the above embodiment, since the high-order 0s have no influence on the precision, the valid portion is first selected from the mantissa multiplication result so that the part that substantially affects the precision can be obtained more accurately; further, since the length of the valid portion is not necessarily equal to L, the valid portion is converted into an L-bit mantissa intermediate result by a rounding operation to represent the mantissa portion of the floating point product result more accurately.
For example, the preset rounding rule may specify that: when the (L+1)-th bit of the valid portion is 0, the (L+1)-th to N-th bits of the valid portion are discarded and the 1st to L-th bits of the valid portion are taken as the mantissa intermediate result; when the (L+1)-th bit of the valid portion is 1, if any one of the (L+2)-th to N-th bits of the valid portion is 1, a carry operation is performed, that is, 1 is added to the L-th bit, to obtain the mantissa intermediate result; if the (L+2)-th to N-th bits of the valid portion are all 0 and the L-th bit is 0, the (L+1)-th to N-th bits of the valid portion are discarded and the 1st to L-th bits of the valid portion are taken as the mantissa intermediate result; and if the (L+2)-th to N-th bits of the valid portion are all 0 and the L-th bit is 1, the (L+1)-th to N-th bits of the valid portion are discarded and a carry operation is performed, that is, 1 is added to the L-th bit, to obtain the mantissa intermediate result.
For more details on rounding the valid portion according to the preset rounding rule to obtain the mantissa intermediate result, reference may be made to the related description of the above embodiment, and details are not repeated here.
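A C++ sketch of one possible reading of the rounding rules above (illustrative names, not the patent's verbatim procedure): round a non-zero mantissa multiplication result to at most L significant bits, using the (L+1)-th bit as a guard bit and falling back to the L-th bit when all lower bits are 0 (i.e., round half to even). Returning the number of removed low-order bits for the subsequent shift step is an assumption of this sketch.

```cpp
#include <cstdint>
#include <cstdio>

struct RoundedMantissa {
    uint64_t value;        // mantissa intermediate result, at most L significant bits
    unsigned removed_bits; // how many low-order bits were dropped (0 if none)
};

RoundedMantissa round_valid_portion(uint64_t product, unsigned L) {
    unsigned nbits = 0;                        // number of bits from the leading 1 down to bit 0
    for (uint64_t t = product; t != 0; t >>= 1) ++nbits;

    if (nbits <= L) return {product, 0};       // length N <= L: take the valid portion directly

    unsigned drop = nbits - L;                 // bits below the L-th position
    uint64_t kept  = product >> drop;          // the 1st to L-th bits of the valid portion
    uint64_t guard = (product >> (drop - 1)) & 1u;          // the (L+1)-th bit
    uint64_t lower = product & ((1ULL << (drop - 1)) - 1);  // the (L+2)-th to N-th bits

    if (guard == 1 && (lower != 0 || (kept & 1u) == 1)) {
        ++kept;  // carry into the L-th bit; a carry out of the L-th bit is not specially handled here
    }
    return {kept, drop};
}

int main() {
    RoundedMantissa r = round_valid_portion(0b110101011ULL, 5);  // 9 significant bits rounded to 5
    std::printf("value=%llu removed_bits=%u\n", (unsigned long long)r.value, r.removed_bits);
    return 0;
}
```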
For example, in some embodiments, the destination operand is a product result of the first source operand and the second source operand.
For example, in this embodiment, shifting the mantissa intermediate result according to the exponent portion of the first source operand to obtain the mantissa portion and the exponent portion of the destination operand may include: and shifting the mantissa intermediate result according to the value of the exponent part of the first source operand to obtain the mantissa part and the exponent part of the destination operand.
For example, shifting the mantissa intermediate result according to the value E of the exponent portion of the first source operand corresponds to multiplying the mantissa intermediate result by 2^E, and normalizing the shifted result to obtain the mantissa portion and the exponent portion of the destination operand.
For example, in some embodiments, the destination operand is a rounded result obtained by performing a further rounding operation on the product result of the first source operand and the second source operand, e.g., the rounding operation indicates performing a rounding-down operation on the product result. For example, the first source operand may be the scaling ratio X_frac in the Resize operator and the second source operand may be the coordinate X_D of the element on the image before scaling in the Resize operator, the rounding operation being floor(X) performed on the product result X.
For example, in this embodiment, shifting the mantissa intermediate result according to the exponent portion of the first source operand to obtain the mantissa portion and the exponent portion of the destination operand may include: and shifting the mantissa intermediate result according to the value of the exponent part of the first source operand to obtain the mantissa part and the exponent part of the destination operand.
For the rounding-down operation, when the value E of the exponent part is larger than 0, the intermediate mantissa result is amplified, and the overflow mantissa part cannot occur; when the value E of the exponent part is less than 0, there may be some bits that become overflow mantissa parts (the least significant bits of the mantissa flow out of the right end of the mantissa field), but these overflow mantissa parts are themselves truncated due to the rounding down operation, and thus have no effect on precision. Therefore, when the further rounding operation is a rounding down operation, no additional operation is required on the product result of the first source operand and the second source operand.
For example, in some embodiments, the destination operand is a rounded result after performing a further rounding operation on the product result of the first source operand and the second source operand, e.g., the rounding operation indicates performing a round-up operation on the product result.
For example, in this embodiment, shifting the mantissa intermediate result according to the exponent portion of the first source operand to obtain the mantissa portion and the exponent portion of the destination operand may include: shifting the mantissa intermediate result according to the value of the exponent portion of the first source operand to obtain a shift result and an overflow mantissa portion generated by the shifting; obtaining the mantissa portion and the exponent portion of the destination operand according to the shift result in response to the overflow mantissa portion being the first value; and, in response to the overflow mantissa portion not being the first value, performing a carry operation on the shift result and obtaining the mantissa portion and the exponent portion of the destination operand according to the carry result.
For the round-up operation, when the value E of the exponent portion is greater than 0, the mantissa intermediate result is amplified and no overflow mantissa portion occurs, so the mantissa portion and the exponent portion of the destination operand can be obtained from the shift result, for example by normalizing the shift result. When the value E of the exponent portion is less than 0, some bits may become an overflow mantissa portion (the least significant bits of the mantissa flow out of the right end of the mantissa field). If the overflow mantissa portion, that is, the bits flowing out of the right end of the mantissa field, are all 0, the mantissa portion and the exponent portion of the destination operand can likewise be obtained by normalizing the shift result; if the overflow mantissa portion is not all 0, a carry operation is performed on the shift result, for example, 1 is added to the least significant bit of the shift result, and the mantissa portion and the exponent portion of the destination operand are obtained according to the carry result.
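As an illustration only (not part of the patent text), the following C++ sketch shows one way the shift result and the overflow mantissa portion described above could be handled for the round-down and round-up cases. The function name, the treatment of the result as a plain integer, and the way the effective shift amount is obtained are assumptions of this sketch; in the method itself the shift result is further normalized into the mantissa portion and exponent portion of the destination operand.

```cpp
#include <cstdint>

// Sketch of the shift step for the round-down / round-up embodiments.
// 'mantissa' is the mantissa intermediate result; 'shift' is the effective
// (combined) shift amount derived from the exponent portion of the first
// source operand (how that amount is assembled is an assumption of the sketch).
uint64_t shift_with_rounding(uint64_t mantissa, int shift, bool round_up) {
    if (shift >= 0) {
        // Amplification: no bits flow out of the right end of the mantissa field
        // (this sketch assumes the left shift does not overflow 64 bits).
        return mantissa << shift;
    }
    int s = -shift;                                           // assumed 0 < s < 64
    uint64_t shifted  = mantissa >> s;                        // shift result
    uint64_t overflow = mantissa & ((uint64_t{1} << s) - 1);  // overflow mantissa portion
    if (round_up && overflow != 0) {
        return shifted + 1;  // carry operation for the round-up case
    }
    return shifted;          // round-down case: the overflow bits are simply discarded
}
```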
Of course, in addition to the round-up and round-down operations, other rounding operations may be provided, such as rounding to the nearest value, rounding toward zero, round-half-to-even rounding, and the like, which may be set by one of ordinary skill in the art as needed; the present disclosure is not limited in this regard.
For example, determining the sign bit of the destination operand according to the sign bit of the first source operand and the sign bit of the second source operand may include: determining that the sign bit of the destination operand is a positive sign bit in response to the sign bit of the first source operand and the sign bit of the second source operand being the same, wherein the positive sign bit indicates that the destination operand is a positive number; and determining that the sign bit of the destination operand is a negative sign bit in response to the sign bit of the first source operand and the sign bit of the second source operand being different, wherein the negative sign bit indicates that the destination operand is a negative number.
For example, a positive sign bit may be represented by 0 and a negative sign bit may be represented by 1.
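As a minimal illustration (the function name is not from the patent), this rule reduces to an exclusive OR of the two sign bits under the usual 0/1 encoding:

```cpp
#include <cstdint>

// Sign rule of the method: identical sign bits give a positive destination operand
// (sign bit 0), different sign bits give a negative one (sign bit 1).
uint32_t destination_sign_bit(uint32_t sign_bit_a, uint32_t sign_bit_b) {
    return sign_bit_a ^ sign_bit_b;  // 0^0 = 1^1 = 0 (positive), 0^1 = 1^0 = 1 (negative)
}
```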
In the following, the execution process of the data processing method provided in at least one embodiment of the present disclosure will be described in detail, taking as an example the case where the first source operand is of the FP64 type, the second source operand is of the INT32 (32-bit signed integer) type, and the second integer type is UINT64.
As previously described, the effective bit width of the mantissa portion of FP64 is 53. Since "the bit width of the second integer type is larger than the effective bit width L of the mantissa portion of the first source operand, and the bit width of the maximum value of the second source operand is smaller than the difference between the bit width of the second integer type and the effective bit width L", the bit width of the maximum value of the second source operand is at most 10, and the value of the second source operand needs to be smaller than 1024.
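The following check is only an illustrative sketch of this precondition; the function name and the use of an assertion are assumptions rather than part of the method.

```cpp
#include <cstdint>
#include <cassert>

// With UINT64 as the second integer type and L = 53, the magnitude of the INT32
// second source operand must occupy fewer than 64 - 53 = 11 bits, i.e. it must be
// below 1024, so that the 53-bit-by-10-bit product always fits in 64 unsigned bits.
inline void check_second_source_operand(int32_t x) {
    int64_t magnitude = x < 0 ? -static_cast<int64_t>(x) : static_cast<int64_t>(x);
    assert(magnitude < (int64_t{1} << 10) && "second source operand out of range");
}
```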
First, referring to step S101, a first source operand and a second source operand are received. For example, the first source operand may be the scale X_frac in the resolution operator and the second source operand may be the coordinates x_D of the element on the image before scaling in the resolution operator.
Thereafter, referring to step S102, the mantissa portion of the first source operand is represented by a first intermediate operand of the UINT64 type. Specifically, the hidden bit in the mantissa portion of the first source operand may first be appended to obtain a valid mantissa portion, and then the 53 bits of the valid mantissa portion are taken as the 1st to 53rd bits of the first intermediate operand; for example, the lowest bit (1st bit) of the valid mantissa portion is taken as the lowest bit (1st bit) of the first intermediate operand, the 2nd bit of the valid mantissa portion is taken as the 2nd bit of the first intermediate operand, and so on.
Referring to step S103, the second source operand is converted from the INT32 type to the UINT64 type to obtain a second intermediate operand. It should be noted that, during the conversion, the numerical portion of the second source operand is retained in the second intermediate operand, and the sign bit of the second source operand may be stored separately for later determining the sign bit of the destination operand.
Referring to step S104, UINT64 multiplication is performed on the first intermediate operand and the second intermediate operand, resulting in a mantissa multiplication result n. The mantissa multiplication result n is also of the UINT64 type.
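For illustration, the following C++ sketch shows one possible software realization of steps S102 to S104 for this FP64/INT32/UINT64 example. It assumes a normal, non-zero FP64 input (so that the hidden bit is 1), and all function names are illustrative rather than taken from the patent.

```cpp
#include <cstdint>
#include <cstring>

// Step S102 (sketch): represent the mantissa portion of a normal, non-zero FP64 value
// as a UINT64 first intermediate operand (52 stored mantissa bits plus the hidden bit).
uint64_t fp64_mantissa_as_uint64(double x) {
    uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);            // reinterpret the FP64 encoding
    uint64_t mantissa = bits & 0xFFFFFFFFFFFFFull;  // low 52 bits: stored mantissa
    return mantissa | (uint64_t{1} << 52);          // append the hidden bit -> 53 significant bits
}

// Helper used later for the shift step (an assumption of the sketch):
// the unbiased exponent E of a normal FP64 value.
int fp64_exponent(double x) {
    uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<int>((bits >> 52) & 0x7FF) - 1023;  // remove the FP64 exponent bias
}

// Steps S103/S104 (sketch): convert the INT32 second source operand to UINT64,
// keeping its sign separately, then perform the UINT64 multiplication.
uint64_t mantissa_multiply(double a, int32_t b, bool* negative) {
    uint64_t m = fp64_mantissa_as_uint64(a);  // first intermediate operand
    *negative = (a < 0.0) != (b < 0);         // sign of the destination operand, kept aside
    uint64_t u = static_cast<uint64_t>(       // second intermediate operand (magnitude only)
        b < 0 ? -static_cast<int64_t>(b) : static_cast<int64_t>(b));
    // Per the constraint above, |b| < 1024, so the product cannot overflow UINT64.
    return m * u;                             // mantissa multiplication result n
}
```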
Referring to step S105, the mantissa multiplication result n represents only the relative size of the mantissa portion; rounding, shifting, and similar processing may still be required to obtain the absolute size of the product result of the first source operand and the second source operand.
For example, the valid portion may first be determined from the mantissa multiplication result n, and the mantissa intermediate result may be determined from the valid portion. Any of the three embodiments described above may be specifically employed.
Here, the process of obtaining the mantissa intermediate result is described taking as an example the case where the starting point of the effective portion is the 1st 1 occurring along the preset direction in the mantissa multiplication result n and the end point of the effective portion is the lowest bit of the mantissa multiplication result n.
Table 2 provides preset rounding rules for converting significant portions to mantissa intermediate results in accordance with at least one embodiment of the present disclosure.
Table 2 preset rounding rules

Length N of effective portion | 54th bit | 55th to Nth bits | 53rd bit | Mantissa intermediate result
N ≤ 53                        | -        | -                | -        | the effective portion itself
N > 53                        | 0        | -                | -        | first 53 bits (54th to Nth bits truncated)
N > 53                        | 1        | not all 0        | -        | first 53 bits, with 1 added to the 53rd bit
N > 53                        | 1        | all 0            | 0        | first 53 bits (54th to Nth bits truncated)
N > 53                        | 1        | all 0            | 1        | first 53 bits, with 1 added to the 53rd bit
As shown in Table 2, the preset rounding rule specifies that if the length N of the effective portion is less than or equal to 53, the effective portion is directly used as the mantissa intermediate result. The preset rounding rule further specifies that if the length N of the effective portion is greater than 53: when the 54th bit is 0, the 54th to Nth bits of the effective portion are truncated and the first 53 bits are retained as the mantissa intermediate result; when the 54th bit is 1 and the 55th to Nth bits are not all 0, the first 53 bits of the effective portion are retained and 1 is added to the 53rd bit to obtain the mantissa intermediate result; when the 54th bit is 1 and the 55th to Nth bits are all 0, if the 53rd bit is 0, the 54th to Nth bits of the effective portion are truncated and the first 53 bits are retained as the mantissa intermediate result, and if the 53rd bit is 1, the first 53 bits of the effective portion are retained and 1 is added to the 53rd bit to obtain the mantissa intermediate result.
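A C++ sketch of this rule is given below for illustration. It relies on the GCC/Clang builtin __builtin_clzll to locate the leading 1, assumes n is non-zero, and additionally reports the number of dropped low-order bits so that the later shift step can account for them; that extra bookkeeping, like the function name, is an assumption of the sketch rather than something stated in the text.

```cpp
#include <cstdint>

// Table 2 rule (sketch): reduce the effective portion of the UINT64 mantissa
// multiplication result n (n != 0 assumed) to at most 53 bits.
// *dropped_bits receives the number of low-order bits removed, which corresponds
// to a factor of 2^(*dropped_bits) to be absorbed by the later shift/normalization.
uint64_t round_to_53_bits(uint64_t n, int* dropped_bits) {
    int length = 64 - __builtin_clzll(n);  // length N of the effective portion
    if (length <= 53) {                    // N <= 53: use the effective portion directly
        *dropped_bits = 0;
        return n;
    }
    int shift = length - 53;
    uint64_t top53 = n >> shift;                              // bits 1..53 of the effective portion
    uint64_t bit54 = (n >> (shift - 1)) & 1u;                 // bit 54
    uint64_t lower = n & ((uint64_t{1} << (shift - 1)) - 1);  // bits 55..N
    *dropped_bits = shift;
    if (bit54 == 0) return top53;      // truncate
    if (lower != 0) return top53 + 1;  // carry (a carry out of bit 53 is resolved by normalization)
    return top53 + (top53 & 1u);       // tie: carry only if the 53rd bit is 1
}
```

For example, if the effective portion of n is 55 bits long, the result keeps the top 53 bits of n (rounded as above) and *dropped_bits is set to 2.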
After obtaining the mantissa intermediate result, shifting the mantissa intermediate result according to the exponent part of the first source operand to obtain the mantissa part and the exponent part of the destination operand.
For example, the mantissa intermediate result is shifted according to the value E of the exponent portion of the first source operand, resulting in the mantissa portion and exponent portion of the destination operand.
The specific shift process may refer to the related description of the foregoing embodiment, and the repetition is not repeated.
In addition, if the first source operand and the second source operand are not limited to positive numbers, the sign bit of the destination operand needs to be determined according to the sign bit of the first source operand and the sign bit of the second source operand; the specific process is as described above and is not repeated here.
Therefore, through the above process, UINT64 integer multiplication can be used to emulate FP64 multiplication in software, realizing high-precision multiplication of a floating point number and an integer, reducing the precision error of the floating point number and integer multiplication, improving the precision of operators that apply the method, such as the resolution operator, and avoiding scaling errors.
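Putting the pieces together, the following usage sketch chains the helper functions sketched above to evaluate floor(X_frac · x_D) for one concrete pair of inputs. The numeric values, the function names, and the way the effective shift amount is assembled are illustrative assumptions, not taken from the patent.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    double  x_frac = 0.4378;  // scaling, FP64 first source operand
    int32_t x_d    = 731;     // coordinate, INT32 second source operand (magnitude < 1024)

    bool negative = false;    // stays false here because both inputs are positive
    uint64_t n = mantissa_multiply(x_frac, x_d, &negative);  // steps S102 to S104

    int dropped = 0;
    uint64_t m53 = round_to_53_bits(n, &dropped);            // Table 2 rounding

    // Effective shift amount: exponent of x_frac, minus the 52 fraction bits of the
    // FP64 significand, plus the bits dropped during rounding (combining the
    // bookkeeping this way is an assumption of the sketch).
    int shift = fp64_exponent(x_frac) - 52 + dropped;
    uint64_t result = shift_with_rounding(m53, shift, /*round_up=*/false);

    // 0.4378 * 731 = 320.03..., so the expected output is 320.
    std::printf("floor(%g * %d) = %llu\n", x_frac, x_d,
                static_cast<unsigned long long>(result));
    return 0;
}
```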
The data processing method provided in at least one embodiment of the present disclosure may be applied to different systems or devices, such as the electronic device 500 shown in fig. 8. The electronic device 500 may be a terminal, such as a mobile phone, a tablet computer, a notebook computer, an AR device, a VR device, or a vehicle-mounted terminal, or may be a server or the like. The data processing method provided in at least one embodiment of the present disclosure may be applied to scenarios in the electronic device 500 that involve floating point number and integer multiplication, such as CPU, high performance computing (HPC), and artificial intelligence (AI) scenarios, for example in a scalar computing unit, a vector computing unit, a matrix computing unit, or a tensor computing unit. Of course, the present disclosure is not limited in this regard, and any scenario, device, or apparatus involving floating point number and integer multiplication may employ the data processing method provided by at least one embodiment of the present disclosure.
For example, referring to the electronic device 500 shown in fig. 8, when implementing the data processing method provided by at least one embodiment of the present disclosure, the processing device 501 performs various suitable actions and processes according to non-transitory computer readable instructions stored in a memory to implement the data processing method. For example, the first source operand and the second source operand are stored in a storage device, such as a register, a cache, or a memory; when a floating point number and integer multiplication is required, the first source operand and the second source operand are read from the storage device, and the processing device 501 performs the floating point number and integer multiplication and related processing on the first source operand and the second source operand according to the data processing method described in at least one embodiment of the present disclosure to obtain the destination operand. The destination operand may be transferred to a corresponding operator or the like for use, or transferred to the storage device for storage.
At least one embodiment of the present disclosure also provides a data processing method. Fig. 5 is a schematic flow chart of a data processing method according to at least one embodiment of the present disclosure.
As shown in fig. 5, the data processing method provided in at least one embodiment of the present disclosure includes steps S201 to S202.
For example, in step S201, a data processing instruction indicating that at least floating point number and integer multiplication are performed is received.
For example, the data processing instruction includes a first source operand and a second source operand as input parameters, and a destination operand as an output parameter.
For example, the first source operand is of the floating point type and the second source operand is of the first integer type.
For example, the data processing instruction may be for performing only floating point number and integer multiply calculations, where the destination operand is the result of the product of the first source operand and the second source operand.
For example, in addition to performing the calculation of the floating point number and integer multiplication, the data processing instruction may also be combined with other operations. For example, a further calculation operation, such as a rounding operation, may be performed on the multiplication result of the floating point number and integer multiplication.
For example, the data processing instructions may perform further rounding operations on the multiplication results of floating point numbers and integer multiplications, the result of the rounding operations being performed as a destination operand.
Of course, the data processing instructions may include more or other computing operations, which are not particularly limited by the present disclosure.
For example, the data processing instruction may be a machine instruction, or the data processing instruction may be a microinstruction. For example, the data processing instruction may be an instruction or a microinstruction for performing floating point number and integer multiplication; for example, the data processing instruction may be a microinstruction for performing the floor(X_frac · x_D) operation in the resolution operator as described above.
For example, the first source operand is a floating point number or a tensor of a floating point precision type, e.g., FP32, FP64, etc. For example, the second source operand is an integer or an integer tensor, e.g., of the UINT32, UINT64, or similar type.
The types of the first source operand and the second source operand may be set as needed, which is not particularly limited by the present disclosure.
In step S202, the data processing instruction is executed using the arithmetic unit after the data processing instruction is parsed.
For example, after receiving the data processing instruction, the processor parses the data processing instruction, e.g., decodes the data processing instruction, generates a microinstruction, and sends the microinstruction to the instruction allocation unit; the instruction distribution unit sends the micro instruction to a corresponding dispatch queue according to the micro instruction class; in response to the microinstruction, after the first source operand and the second source operand (all or a desired portion) are prepared, the data is read by the arithmetic unit and the associated operations of the data processing instruction are performed.
For example, the arithmetic unit may include hardware involved in executing the data processing instruction, such as multipliers, registers, queues, ALUs (arithmetic logic units), and the like.
For example, step S202 may include: representing a mantissa portion of the first source operand with a first intermediate operand of a second integer type, wherein the bit width of the second integer type is greater than the effective bit width L of the mantissa portion of the first source operand, and the bit width of a maximum value of the second source operand is less than the difference between the bit width of the second integer type and the effective bit width L, L being a positive integer; converting the second source operand into a second integer type to obtain a second intermediate operand; executing integer multiplication of a second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is of the second integer type; and obtaining a destination operand according to the mantissa multiplication result.
Regarding "representing the mantissa portion of the first source operand with the first intermediate operand having the second integer type", reference may be made to the related content of the aforementioned step S102, and details thereof are not repeated herein.
Regarding "converting the second source operand into the second integer type to obtain the second intermediate operand", reference may be made to the related content of step S103, which is not described herein.
Regarding "integer multiplication of the second integer type is performed on the first intermediate operand and the second intermediate operand to obtain the mantissa multiplication result", reference may be made to the relevant content of the foregoing step S104, and details thereof are not repeated here.
Regarding "obtaining the destination operand from the mantissa multiplication result", reference may be made to the related content of the foregoing step S105, and details are not repeated here.
The data processing method provided in at least one embodiment of the present disclosure may achieve similar technical effects as the foregoing data processing method, and the repetition is not repeated.
At least one embodiment of the present disclosure further provides a data processing apparatus. Fig. 6 is a schematic block diagram of a data processing apparatus according to at least one embodiment of the present disclosure.
As shown in fig. 6, the data processing apparatus 300 includes an acquisition module 301, a first conversion module 302, a second conversion module 303, a multiplication calculation module 304, and an output module 305.
For example, the fetch module 301 is configured to receive a first source operand and a second source operand as input parameters. For example, the first source operand is of the floating point type and the second source operand is of the first integer type.
For example, the first conversion module 302 is configured to represent a mantissa portion of the first source operand with a first intermediate operand of a second integer type.
For example, the bit width of the second integer type is greater than the effective bit width L of the mantissa portion of the first source operand, and the bit width of the maximum value of the second source operand is less than the difference between the bit width of the second integer type and the effective bit width L, L being a positive integer.
For example, the second conversion module 303 is configured to convert the second source operand to a second integer type resulting in a second intermediate operand.
For example, the multiplication computation module 304 is configured to perform integer multiplication of a second integer type on the first intermediate operand and the second intermediate operand resulting in a mantissa multiplication result, wherein the mantissa multiplication result is of the second integer type.
For example, the output module 305 is configured to obtain a destination operand as an output parameter from the mantissa multiplication result.
For example, destination operands may be output directly from data processing apparatus 300, transferred to other components requiring use of output parameters, such as storage devices or other computing devices, etc.
For example, the acquisition module 301, the first conversion module 302, the second conversion module 303, the multiplication calculation module 304, and the output module 305 may include code and programs stored in a memory; a processing unit having data processing capability and/or instruction execution capability, such as a central processing unit (CPU), another general-purpose processor, a single-chip microcomputer, a microprocessor, a digital signal processor, a dedicated image processing chip, or a field programmable logic array, may execute the code and programs to implement some or all of the functions of the acquisition module 301, the first conversion module 302, the second conversion module 303, the multiplication calculation module 304, and the output module 305 as described above. For example, the acquisition module 301, the first conversion module 302, the second conversion module 303, the multiplication calculation module 304, and the output module 305 may also be implemented as one circuit board or a combination of circuit boards for realizing the functions described above. In an embodiment of the present disclosure, the circuit board or the combination of circuit boards may include: (1) one or more processors; (2) one or more non-transitory memories connected to the processors; and (3) firmware stored in the memories and executable by the processors.
It should be noted that the acquisition module 301 may be used to implement step S101 shown in fig. 2, the first conversion module 302 may be used to implement step S102 shown in fig. 2, the second conversion module 303 may be used to implement step S103 shown in fig. 2, the multiplication calculation module 304 may be used to implement step S104 shown in fig. 2, and the output module 305 may be used to implement step S105 shown in fig. 2. Thus, for a specific description of the functions that can be implemented by the acquisition module 301, the first conversion module 302, the second conversion module 303, the multiplication calculation module 304, and the output module 305, reference may be made to the related descriptions of step S101 to step S105 in the above embodiments of the data processing method, and the repetition is omitted. In addition, the data processing apparatus 300 may achieve technical effects similar to those of the foregoing data processing method, which will not be described herein.
It should be noted that, in at least one embodiment of the present disclosure, the data processing apparatus 300 may include more or fewer circuits or units, and the connection relationship between the respective circuits or units is not limited, and may be determined according to actual requirements. The specific configuration of each circuit or unit is not limited, and may be constituted by an analog device according to the circuit principle, a digital chip, or other applicable means.
For example, the data processing apparatus 300 may be implemented in hardware, software, or a combination of hardware and software, which is not particularly limited by the present disclosure.
The data processing apparatus provided in at least one embodiment of the present disclosure may achieve technical effects similar to those of the data processing method described above, and details are not repeated here.
Fig. 7 is a schematic block diagram of a processor provided in at least one embodiment of the present disclosure. As shown in fig. 7, the processor 400 includes an instruction parsing unit 401 and an operation unit 402.
For example, the instruction parsing unit 401 is used to receive and parse data processing instructions.
For example, the data processing instruction includes a first source operand and a second source operand as input parameters, the first source operand being of a floating point type and the second source operand being of a first integer type, and a destination operand as output parameter.
For example, the arithmetic unit 402 executes the data processing instruction after the instruction parsing unit 401 parses the data processing instruction.
For example, the arithmetic unit 402, when executing a data processing instruction, includes performing the following operations: representing a mantissa portion of a first source operand with a first intermediate operand of a second integer type, wherein the bit width of the second integer type is greater than the effective bit width L of the mantissa portion of the first source operand, and the bit width of a maximum value of the second source operand is less than the difference between the bit width of the second integer type and the effective bit width L, L being a positive integer; converting the second source operand into a second integer type to obtain a second intermediate operand; executing integer multiplication of a second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is of the second integer type; and obtaining a destination operand according to the mantissa multiplication result.
Specifically, upper-layer software based on the processor (such as an AI application, an HPC application, or a scientific computing application) may send a data processing instruction for computation to the processor (such as a CPU or GPU) through a uniformly packaged function library, and the data processing instruction may carry the first source operand and the second source operand as input parameters. When the processor receives the data processing instruction, the instruction parsing unit 401 parses the data processing instruction to obtain the first source operand and the second source operand as input parameters, and the processor schedules the operation unit to execute the data processing task on the first source operand and the second source operand. For example, after parsing the data processing instruction, the processor may store the first source operand and the second source operand carried in the data processing instruction in a register or memory, so that the input parameters can be retrieved from the register or memory when the arithmetic unit 402 performs the computation.
Regarding the specific procedure of executing the data processing instruction using the operation unit 402, reference may be made to steps S102 to S105 in the data processing method as described above, and the repetition is omitted.
Fig. 8 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 8, the electronic device 500 is suitable for use, for example, to implement the data processing method provided by the embodiments of the present disclosure. It should be noted that the components of the electronic device 500 shown in fig. 8 are exemplary only and not limiting, and that the electronic device 500 may have other components as desired for practical applications.
As shown in fig. 8, the electronic device 500 may include a processing apparatus (e.g., central processor, graphics processor, etc.) 501 that may perform various suitable actions and processes in accordance with non-transitory computer readable instructions stored in a memory to implement various functions.
For example, computer readable instructions, when executed by the processing device 501, may perform one or more steps of a data processing method according to any of the embodiments described above. It should be noted that, for a detailed description of the processing procedure of the data processing method, reference may be made to the relevant description in the above-described embodiment of the data processing method.
For example, the memory may comprise any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random Access Memory (RAM) 503 and/or cache memory (cache) and the like, for example, computer readable instructions may be loaded into Random Access Memory (RAM) 503 from storage 508 to execute the computer readable instructions. The non-volatile memory may include, for example, read-only memory (ROM) 502, a hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. Various applications and various data, such as style images, and various data used and/or generated by the applications, may also be stored in the computer readable storage medium.
For example, the processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, flash memory, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other electronic devices wirelessly or by wire to exchange data. While fig. 8 shows the electronic device 500 with various means, it is to be understood that not all of the illustrated means are required to be implemented or provided, and that the electronic device 500 may alternatively be implemented or provided with more or fewer means. For example, the processing device 501 may control other components in the electronic apparatus 500 to perform desired functions. The processing means 501 may be a Central Processing Unit (CPU), tensor Processor (TPU) or a graphics processor GPU or the like having data processing capabilities and/or program execution capabilities. The Central Processing Unit (CPU) may be an X86, ARM, RISC-V architecture, or the like. The GPU may be integrated directly into the SOC, directly onto the motherboard, or built into the north bridge chip of the motherboard.
Fig. 9 is a schematic diagram of a non-transitory computer readable storage medium according to at least one embodiment of the present disclosure. For example, as shown in fig. 9, the storage medium 600 may be a non-transitory computer-readable storage medium, and one or more computer-readable instructions 601 may be stored non-transitory on the storage medium 600. For example, computer readable instructions 601, when executed by a processor, may perform one or more steps in accordance with the data processing methods described above.
For example, the storage medium 600 may be applied to the electronic device 500 described above, and for example, the storage medium 600 may include the storage 508 in the electronic device 500.
For example, the storage device may comprise any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, random Access Memory (RAM) and/or cache memory (cache) and the like. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer readable instructions may be stored on the computer readable storage medium that can be executed by a processor to perform various functions of the processor. Various applications and various data, etc. may also be stored in the storage medium.
For example, the storage medium may include a memory card of a smart phone, a memory component of a tablet computer, a hard disk of a personal computer, random Access Memory (RAM), read Only Memory (ROM), erasable Programmable Read Only Memory (EPROM), portable compact disc read only memory (CD-ROM), flash memory, or any combination of the foregoing, as well as other suitable storage media.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in this disclosure is not limited to technical solutions formed by the specific combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.
For the purposes of this disclosure, the following points are also noted:
(1) The drawings of the embodiments of the present disclosure relate only to the structures related to the embodiments of the present disclosure, and other structures may refer to the general design.
(2) The embodiments of the present disclosure and features in the embodiments may be combined with each other to arrive at a new embodiment without conflict.
The foregoing is merely a specific embodiment of the disclosure, but the scope of the disclosure is not limited thereto and should be determined by the scope of the claims.

Claims (18)

1. A data processing method, comprising:
receiving a first source operand and a second source operand as input parameters, wherein the type of the first source operand is a floating point type, and the type of the second source operand is a first integer type;
Representing a mantissa portion of the first source operand with a first intermediate operand of a second integer type, wherein a bit width of the second integer type is greater than a significant bit width L of the mantissa portion of the first source operand, a bit width of a maximum value of the second source operand is less than a difference between the bit width of the second integer type and the significant bit width L, L being a positive integer;
converting the second source operand into the second integer type to obtain a second intermediate operand;
Performing integer multiplication of the second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is of the second integer type;
And obtaining a destination operand serving as an output parameter according to the mantissa multiplication result.
2. The data processing method of claim 1, wherein representing the mantissa portion of the first source operand with a first intermediate operand of a second integer type comprises:
filling hidden bits in the mantissa portion of the first source operand to obtain a valid mantissa portion;
and taking L bits of the valid mantissa part as the 1st bit to the Lth bit of the first intermediate operand, wherein the 1st bit of the first intermediate operand is the lowest bit.
3. The data processing method of claim 1, wherein the destination operand is represented by three independent integer parameters including a first integer parameter representing a mantissa portion of the destination operand, a second integer parameter representing sign bits of the destination operand, and a third integer parameter representing an exponent portion of the destination operand,
Obtaining the destination operand according to the mantissa multiplication result, including:
directly taking the mantissa multiplication result as the first integer parameter;
Determining the second integer parameter according to the sign bit of the first source operand and the sign bit of the second source operand;
and taking the exponent part of the first source operand as the third integer parameter.
4. The data processing method of claim 1, wherein the destination operand is a floating point number,
Obtaining the destination operand according to the mantissa multiplication result, including:
determining a mantissa portion and an exponent portion of the destination operand based on the mantissa multiplication result;
and determining the sign bit of the destination operand according to the sign bit of the first source operand and the sign bit of the second source operand.
5. The data processing method of claim 4, wherein determining the mantissa portion and the exponent portion of the destination operand based on the mantissa multiplication result comprises:
Determining an effective portion from the mantissa multiplication result, wherein the starting point of the effective portion is the 1st 1 in the mantissa multiplication result occurring along a preset direction, and the preset direction is the direction from the high order to the low order in the mantissa multiplication result;
determining a mantissa intermediate result according to the effective portion, wherein the bit width of the mantissa intermediate result is less than or equal to L;
And performing shift processing on the mantissa intermediate result according to the exponent part of the first source operand to obtain a mantissa part and an exponent part of the destination operand.
6. The data processing method according to claim 5, wherein an end point of the effective portion is a last 1 in the mantissa multiplication result occurring in the preset direction, or an end point of the effective portion is a least significant bit in the mantissa multiplication result,
Determining a mantissa intermediate result from the active portion, comprising:
And rounding the effective portion according to a preset rounding rule to obtain the mantissa intermediate result.
7. The data processing method according to claim 6, wherein rounding the effective portion according to the preset rounding rule to obtain the mantissa intermediate result comprises:
in response to the length N of the effective portion being equal to or less than L, directly taking the effective portion as the mantissa intermediate result;
And determining the mantissa intermediate result according to the (L+1)th bit in the effective portion in response to the length N of the effective portion being greater than L, wherein the bit width of the mantissa intermediate result is equal to L when the length N of the effective portion is greater than L, the most significant bit in the effective portion is the 1st bit, and the (L+1)th bit in the effective portion is the (L+1)th bit in the preset direction from the 1st bit.
8. The data processing method of claim 7, wherein determining the mantissa intermediate result according to the (L+1)th bit in the effective portion comprises:
discarding the (L+1)th to Nth bits of the effective portion to obtain the mantissa intermediate result in response to the (L+1)th bit of the effective portion being a first value;
in response to the (L+1)th bit in the effective portion being a second value and at least 1 bit of the (L+2)th to Nth bits of the effective portion not being the first value, discarding the (L+1)th to Nth bits of the effective portion and performing a carry operation on the Lth bit of the effective portion to obtain the mantissa intermediate result;
and in response to the (L+1)th bit in the effective portion being the second value, and the (L+2)th to Nth bits of the effective portion all being the first value, determining the mantissa intermediate result according to the Lth bit of the effective portion.
9. The data processing method of claim 5, wherein determining the effective portion from the mantissa multiplication result comprises:
starting from the starting point, continuously selecting at most L bits from the mantissa multiplication result along the preset direction as the effective portion;
and determining the mantissa intermediate result according to the effective portion comprises:
taking the effective portion as the mantissa intermediate result.
10. The data processing method of claim 5, wherein the destination operand is a product of the first source operand and the second source operand,
The shifting process includes shifting the mantissa intermediate result according to a value of an exponent portion of the first source operand.
11. The data processing method of claim 5, wherein the destination operand is a rounded result after performing a further rounding operation on the product result of the first source operand and the second source operand,
In response to the rounding operation indicating a downward rounding operation on the product result, the shifting process includes shifting the mantissa intermediate result according to a value of an exponent portion of the first source operand.
12. The data processing method of claim 5, wherein the destination operand is a rounded result after performing a further rounding operation on the product result of the first source operand and the second source operand,
In response to the rounding operation indicating a round-up operation on the product result, the shifting process comprises:
shifting the mantissa intermediate result according to the value of the exponent part of the first source operand to obtain a shifted result and an overflow mantissa part generated due to the shifting;
obtaining the mantissa portion and the exponent portion of the destination operand according to the shift result in response to the overflow mantissa portion being a first value;
and in response to the overflow mantissa portion not being the first value, performing a carry operation on the shift result and obtaining a mantissa portion and an exponent portion of the destination operand from the carry result.
13. The data processing method of claim 4, wherein determining the sign bit of the destination operand from the sign bit of the first source operand and the sign bit of the second source operand comprises:
determining that the sign bit of the destination operand is a positive sign bit in response to the sign bit of the first source operand and the sign bit of the second source operand being the same, wherein the positive sign bit indicates that the destination operand is a positive number;
and determining that the sign bit of the destination operand is a negative sign bit in response to the sign bit of the first source operand and the sign bit of the second source operand being different, wherein the negative sign bit indicates that the destination operand is a negative number.
14. A data processing method, comprising:
Receiving a data processing instruction indicating at least a floating point number and an integer multiplication, wherein the data processing instruction comprises a first source operand and a second source operand as input parameters, the first source operand being of the floating point type and the second source operand being of the first integer type,
and executing the data processing instruction using an arithmetic unit after the data processing instruction is parsed,
Wherein executing the data processing instruction using the arithmetic unit includes:
Representing a mantissa portion of the first source operand with a first intermediate operand of a second integer type, wherein a bit width of the second integer type is greater than a significant bit width L of the mantissa portion of the first source operand, a bit width of a maximum value of the second source operand is less than a difference between the bit width of the second integer type and the significant bit width L, L being a positive integer;
converting the second source operand into the second integer type to obtain a second intermediate operand;
Performing integer multiplication of the second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is of the second integer type;
And obtaining the destination operand according to the mantissa multiplication result.
15. A processor includes an instruction parsing unit and an arithmetic unit, wherein,
The instruction parsing unit is configured to receive and parse a data processing instruction, where the data processing instruction includes a first source operand and a second source operand as input parameters, and a destination operand as output parameter, the first source operand is of a floating point type, the second source operand is of a first integer type,
The arithmetic unit executes the data processing instruction after the instruction parsing unit parses the data processing instruction,
Wherein the arithmetic unit executes the data processing instructions, including performing the following operations:
Representing a mantissa portion of the first source operand with a first intermediate operand of a second integer type, wherein a bit width of the second integer type is greater than a significant bit width L of the mantissa portion of the first source operand, a bit width of a maximum value of the second source operand is less than a difference between the bit width of the second integer type and the significant bit width L, L being a positive integer;
converting the second source operand into the second integer type to obtain a second intermediate operand;
Performing integer multiplication of the second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is of the second integer type;
And obtaining the destination operand according to the mantissa multiplication result.
16. A data processing apparatus comprising:
An acquisition module configured to receive a first source operand and a second source operand as input parameters, the first source operand being of a floating point type and the second source operand being of a first integer type;
A first conversion module configured to represent a mantissa portion of the first source operand with a first intermediate operand of a second integer type, wherein the bit width of the second integer type is greater than a valid bit width L of the mantissa portion of the first source operand, the bit width of a maximum value of the second source operand is less than a difference between the bit width of the second integer type and the valid bit width L, L being a positive integer;
The second conversion module is configured to convert the second source operand into the second integer type to obtain a second intermediate operand;
A multiplication calculation module configured to perform integer multiplication of the second integer type on the first intermediate operand and the second intermediate operand to obtain a mantissa multiplication result, wherein the mantissa multiplication result is the second integer type;
and the output module is configured to obtain a destination operand serving as an output parameter according to the mantissa multiplication result.
17. An electronic device, comprising:
A memory non-transitory storing computer-executable instructions;
a processor configured to execute the computer-executable instructions,
Wherein the computer executable instructions when executed by the processor implement the data processing method according to any of claims 1-14.
18. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions,
The computer executable instructions, when executed by a processor, implement the data processing method according to any one of claims 1-14.
CN202410720964.7A 2024-06-05 Data processing method and device, processor, electronic equipment and storage medium Pending CN118312130A (en)

Publications (1)

Publication number: CN118312130A


Legal Events

PB01: Publication