CN112130805B - Chip comprising floating point adder, device and control method of floating point operation - Google Patents
- Publication number: CN112130805B (application number CN202011004685.9A)
- Authority
- CN
- China
- Prior art keywords
- operand
- operation result
- floating point
- exponent
- operands
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06F7/487—Multiplying; Dividing
- G06F7/4876—Multiplying
Abstract
The application discloses a chip comprising a floating point adder, a device, and a control method for floating point operations, and relates to the technical field of chips. The floating point adder includes: a merging unit, which prepends a significand bit to the mantissa portion of each of the n input operands to obtain merged data corresponding to each of the n operands, where n is a positive integer greater than or equal to 3; a variable shifting unit, which performs a variable shift operation on the merged data corresponding to each of the n operands to obtain n updated operands in fixed point format; a configurable adding unit, which adds the n updated operands to obtain an operation result; and a conversion output unit, which converts the operation result into floating point format and outputs it. The application enables one floating point adder to support addition of 3 or more floating point numbers, improves the processing performance of the floating point adder, and thereby provides higher-performance computing power for cloud technology, artificial intelligence technology, and the like.
Description
Technical Field
The embodiments of the application relate to the technical fields of cloud technology, artificial intelligence, and chip technology, and in particular to a chip comprising a floating point adder, a device, and a control method for floating point operations.
Background
A floating point arithmetic unit is a processor that executes floating point operations. It is widely used as a coprocessor in graphics processing units (GPU), artificial intelligence (AI) chips, central processing units (CPU), field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), and the like, and is a key data processing engine of these chips.
A floating point adder is a processor that implements the addition of floating point numbers. Currently, a single floating point adder supports addition of only 2 operands, which is a significant limitation.
Disclosure of Invention
The embodiments of the application provide a chip comprising a floating point adder, a device, and a control method for floating point operations, which enable a single floating point adder to support addition of 3 or more floating point numbers. The technical scheme is as follows:
According to an aspect of an embodiment of the present application, there is provided a chip including a floating-point adder including: the device comprises a merging unit, a variable shifting unit, a configurable adding unit and a conversion output unit;
The merging unit is used for prepending a significand bit to the mantissa portion of each of the n input operands, to obtain merged data corresponding to each of the n operands, where n is a positive integer greater than or equal to 3;
The variable shifting unit is used for performing variable shifting operation on the merging data corresponding to the n operands respectively to obtain n updated operands in a fixed point number format;
The configurable adding unit is used for adding the n updating operands to obtain an operation result;
the conversion output unit is used for converting the operation result into a floating point number format and outputting the operation result in the floating point number format.
According to an aspect of an embodiment of the present application, there is provided a computer apparatus including the above chip including a floating-point adder.
According to an aspect of the embodiment of the present application, there is provided a control method of a floating point operation, applied to a chip including a floating point adder, the method including:
prepending a significand bit to the mantissa portion of each of n input operands, to obtain merged data corresponding to each of the n operands, where n is a positive integer greater than or equal to 3;
respectively carrying out variable shift operation on the merging data corresponding to the n operands to obtain n updated operands in a fixed point number format;
performing addition operation on the n updated operands to obtain an operation result;
and converting the operation result into a floating point number format, and outputting the operation result in the floating point number format.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
By converting the input floating point numbers into fixed point numbers before performing the addition, a single floating point adder can support addition of 3 or more floating point numbers. This improves the processing performance of the floating point adder and provides higher-performance computing power for cloud technology, artificial intelligence technology, and the like.
In addition, cascading two floating point adders realizes the addition of 3 floating point numbers in two summation steps, whereas the technical scheme provided by the embodiments of the application realizes it in a single summation step. This improves the processing efficiency of the floating point adder and can halve the processing delay.
Moreover, since the processing performance of the floating point adder is improved, a single floating point adder can support addition of a larger number of floating point numbers, so the chip area can be reduced while achieving the same computing capacity.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings show only some embodiments of the application; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a floating point adder implemented in a GPU according to one embodiment of the present application;
FIG. 2 is a schematic diagram of a floating point adder implemented in an AI chip, as provided by one embodiment of the application;
FIG. 3 is a schematic diagram of a floating point adder implemented in a CPU according to one embodiment of the present application;
FIG. 4 is a block diagram of a floating point adder in a chip provided in accordance with one embodiment of the present application;
FIG. 5 is a block diagram of a floating point adder in a chip provided in accordance with another embodiment of the present application;
FIG. 6 is a block diagram of a floating point adder in a chip provided in accordance with another embodiment of the present application;
FIG. 7 is a flow chart of a method of controlling floating point operations provided by one embodiment of the present application;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
First, several terms involved in the present application are briefly introduced:
1. Floating point number (floating-point number): a representation of numbers used in computers, similar to scientific notation. Any binary number N can always be written as:
N = (-1)^S × 2^E × M;
where × denotes multiplication; M is the fractional part (also called the mantissa) of the floating point number N and is a pure fraction; E is the exponent portion (also called the step code or exponent) of N and is an integer; S is the sign bit of N: when S is 0, N is positive, and when S is 1, N is negative. Because the position of the radix point can float within a certain range according to the scale factor, this is called a floating point representation.
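The decomposition above can be made concrete for IEEE 754 single precision, a common instance of this format. The sketch below is illustrative (the helper name `decompose_float32` is not from the patent); it splits a value into the S, E, and M fields:

```python
import struct

def decompose_float32(x: float):
    """Split an IEEE 754 single-precision value into its S (sign),
    E (biased exponent), and M (fraction) fields."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31                 # 1 sign bit
    exponent = (bits >> 23) & 0xFF    # 8 exponent bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 fraction bits (hidden bit excluded)
    return sign, exponent, mantissa

# -1.5 = (-1)^1 * 2^0 * 1.5 -> sign 1, biased exponent 127, fraction 0b100...0
```

Note that the stored exponent is biased (127 for single precision), and the leading 1 of a normal number is not stored; both points matter for the merging unit described later.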
2. Floating point adder (floating-point adder): a processor that implements the addition of floating point numbers. The basic operations on floating point numbers are addition, subtraction, multiplication, and division. In a general-purpose processor, floating point addition, subtraction, conversion, and transfer can all ultimately be realized with a floating point adder, so the floating point adder is the most frequently used floating point operation module.
The technical scheme of the application relates to the fields of cloud technology and artificial intelligence, and provides a chip comprising a floating point adder. The chip may be a GPU, an AI chip, a CPU, or the like, and provides computing support for cloud technology and artificial intelligence.
Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, enabling machines to perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, with both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer vision (CV) is the science of studying how to make machines "see": using cameras and computers in place of human eyes to recognize and measure targets, and further performing graphics processing so that images become more suitable for human observation or instrument inspection. As a scientific discipline, computer vision studies theories and techniques for building artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.
The key technologies of speech technology are automatic speech recognition (ASR), speech synthesis (Text To Speech, TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the future direction of human-computer interaction, and speech is expected to become one of its best modes.
Natural language processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use daily, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, and knowledge graph techniques.
Machine learning (ML) is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
Automatic driving technology generally comprises high-precision maps, environment perception, behavior decision-making, path planning, motion control, and other technologies, and has broad application prospects.
With the research and advancement of artificial intelligence technology, it is being studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service. It is believed that as technology develops, artificial intelligence will be applied in ever more fields and deliver increasing value.
Cloud technology involves basic technologies such as cloud computing, cloud storage, databases, and big data, as well as cloud applications such as medical cloud, cloud internet of things, cloud security, cloud calling, private cloud, public cloud, hybrid cloud, cloud gaming, cloud education, cloud conferencing, cloud social networking, and artificial intelligence cloud services.
Cloud computing is a computing model that distributes computing tasks across a large resource pool of computers, enabling application systems to acquire computing power, storage space, and information services as needed. The network that provides the resources is called the "cloud". From the user's perspective, resources in the cloud appear infinitely expandable, and can be acquired at any time, used on demand, expanded at any time, and paid for per use. As a basic capability provider of cloud computing, a cloud computing resource pool (a cloud platform, generally called an IaaS (Infrastructure as a Service) platform) is established, and multiple types of virtual resources are deployed in the pool for external customers to use. The cloud computing resource pool mainly includes computing devices (virtualized machines, including operating systems), storage devices, and network devices.
One or more servers may be deployed in the cloud platform. A server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. A terminal accesses the cloud platform to obtain the services it provides. The terminal may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, or smart watch. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.
An artificial intelligence cloud service is also commonly referred to as AIaaS (AI as a Service). This is currently the mainstream service mode of artificial intelligence platforms: an AIaaS platform splits out several common AI services and provides them independently or in packages in the cloud. This service mode is similar to an AI-themed mall: all developers can access one or more of the platform's artificial intelligence services through an API (Application Programming Interface), and some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate, and maintain their own proprietary cloud artificial intelligence services.
The above briefly introduces artificial intelligence technology and cloud technology. To realize the various product functions built on them, a chip with strong computing capability must be provided in computer equipment to perform mathematical operations such as addition, subtraction, multiplication, and division. The embodiments of the application provide a chip comprising a floating point adder, which enables one floating point adder to support addition of 3 or more floating point numbers and improves the processing performance of the floating point adder, thereby providing higher-performance computing power for cloud technology, artificial intelligence technology, and the like. The floating point adder can be applied to chips such as GPUs, AI chips, and CPUs.
In one example, the floating point adder is applied to a GPU. GPUs are widely used for training deep learning models, and when the hardware is designed, special optimizations are made to speed up the tensor operations common in such models. A Tensor Core is a new type of hardware processing core that rapidly performs matrix mathematical operations. As shown in fig. 1, two 4×4 FP16 matrices are multiplied, the result is added to a 4×4 FP16 or FP32 matrix, and finally a new 4×4 FP16 or FP32 matrix is output. Here, FP16 is a data type encoded and stored in 2 bytes (16 bits); similarly, FP32 is a data type encoded and stored in 4 bytes (32 bits).
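The Tensor Core operation described above (D = A×B + C on 4×4 matrices) can be sketched conceptually as follows. This plain-Python model only shows the dataflow; it ignores the FP16/FP32 precision distinction, and the function name `mma_4x4` is illustrative:

```python
def mma_4x4(A, B, C):
    """4x4 matrix multiply-accumulate, D = A*B + C.
    On a Tensor Core, A and B would be FP16 and the accumulation FP32;
    here all arithmetic is plain Python floats."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) + C[i][j]
             for j in range(4)]
            for i in range(4)]

I = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
Z = [[0.0] * 4 for _ in range(4)]
# mma_4x4(I, I, Z) reproduces the identity matrix
```

Each inner `sum(...)` is exactly the kind of multi-operand accumulation that motivates a floating point adder with more than 2 inputs.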
In another example, an AI chip also contains a specialized hardware acceleration unit. As shown in fig. 2, the vector calculation unit 20 is mainly responsible for vector or matrix operations. In floating point calculation mode, the intermediate results of the product of the first input matrix row vector A and the second input matrix column vector B need to be accumulated by a cascade of multiple floating point adders 21 to obtain the final calculation result.
In another example, the CPU is a general-purpose processor whose internal architecture includes a fetch unit, a decode unit, and an execution unit. As shown in fig. 3, the execution unit includes a floating point adder 31, which mainly performs addition, subtraction, and similar operations on floating point data.
Fig. 4 is a block diagram illustrating the structure of a floating-point adder 40 on a chip provided in accordance with one embodiment of the present application. The floating point adder 40 may include: a combining unit 41, a variable shifting unit 42, a configurable adding unit 43, and a conversion output unit 44.
The merging unit 41 is configured to prepend a significand bit to the mantissa portion of each of the n input operands, obtaining merged data corresponding to each of the n operands, where n is a positive integer greater than or equal to 3.
The input operands are in floating point format. In an embodiment of the present application, one floating-point adder 40 supports the addition of 3 or more floating-point numbers.
Optionally, the merging unit 41 is configured to obtain the exponent portion and mantissa portion of an operand; if the exponent portion is 0, the significand bit of the operand is determined to be 0; if the exponent portion is not 0, the significand bit is determined to be 1. The significand bit is then merged with the mantissa portion to obtain the merged data corresponding to the operand.
The significand bit (also called the hidden or implicit bit) takes the value 0 or 1. Illustratively, take any operand N = (-1)^S × 2^E × M: the exponent portion of N is E and the mantissa portion is M. When the exponent portion E is 0, a significand bit 0 is prepended to the mantissa portion M, giving merged data 0.M; when E is not 0, a significand bit 1 is prepended, giving merged data 1.M.
Illustratively, the processing procedure of the merging unit 41 described above may be expressed as follows:
exponent=data[EXPLOCmax:EXPLOCmin]
mantissa=data[MATLOCmax:MATLOCmin]
if(exponent==0)
combine_data={1’b0,mantissa}
else
combine_data={1’b1,mantissa}
where exponent denotes the exponent portion, mantissa denotes the mantissa portion, EXPLOCmax and EXPLOCmin denote the most and least significant bit positions of the exponent portion, MATLOCmax and MATLOCmin denote the most and least significant bit positions of the mantissa portion, combine_data denotes the merged data, 'b marks a binary literal, 1'b0 denotes the single bit 0, and 1'b1 denotes the single bit 1.
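The merging step above can be modeled in software as follows. This is a sketch: the function name `combine` and the `mant_bits` parameter (defaulting to the 23 fraction bits of single precision) are illustrative, not from the patent:

```python
def combine(exponent: int, mantissa: int, mant_bits: int = 23) -> int:
    """Prepend the significand bit to the mantissa: 0 when the exponent
    field is 0 (zero/subnormal), 1 otherwise (normal number)."""
    significand_bit = 0 if exponent == 0 else 1
    return (significand_bit << mant_bits) | mantissa

# normal number (exponent != 0): merged data is 1.M
assert combine(127, 0x400000) == 0xC00000
# zero/subnormal (exponent == 0): merged data is 0.M
assert combine(0, 0x400000) == 0x400000
```

This mirrors the `if (exponent == 0)` branch of the pseudocode: the returned integer is the full significand, ready for the variable shift stage.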
And the variable shifting unit 42 is configured to perform variable shifting operation on the merged data corresponding to the n operands, so as to obtain n updated operands in a fixed-point format.
The variable shifting unit 42 is used to convert the respective input operands from floating point format to fixed point format. The conversion from the floating point format to the fixed point format is achieved by performing a variable shift operation on the merged data corresponding to the operands.
In a computer, if the position of a number's radix point floats (i.e., is not fixed), the number is a floating point number; if the position of the radix point is fixed, it is a fixed point number. Generally, fixed point formats can represent only a limited range of values but require relatively simple processing hardware, while floating point formats can represent a large range of values but require relatively complex hardware.
In an exemplary embodiment, as shown in fig. 5, the variable shifting unit 42 includes an exponent determining unit 421 and a difference shifting unit 422.
The exponent determining unit 421 is configured to determine a target exponent value. The target exponent value may be the maximum exponent value among the exponent portions of the n operands, the minimum such exponent value, any one of those exponent values, or a predefined exponent value.
For example, assume there are 3 input operands N1, N2, and N3, with exponent values E1, E2, and E3 respectively, and assume E1 > E2 > E3. If the target exponent value is the maximum exponent value among the exponent portions of the n operands, it is E1; if it is the minimum such value, it is E3. Of course, in some other examples, E2 may also be selected as the target exponent value, or a predefined exponent value (e.g., 4 or another value) may be used. The manner of determining the target exponent value may be preset according to actual requirements, which is not limited in the embodiments of the application.
A difference shift unit 422 for determining, for an ith operand of the n operands, a difference between an exponent portion of the ith operand and a target exponent value; performing variable shift operation on the combined data corresponding to the ith operand according to the difference value to obtain an updated operand corresponding to the ith operand; wherein i is a positive integer less than or equal to n.
The shift direction and the number of shifted bits of the variable shift operation are determined by the difference. For example, let delta be the exponent portion of the ith operand minus the target exponent value: if delta is less than 0, the merged data corresponding to the ith operand is shifted right by |delta| bits; if delta is greater than 0, it is shifted left by delta bits; if delta equals 0, no shift is performed.
Illustratively, when the target exponent value is the maximum exponent value among the exponent portions of the n operands, the difference shift unit 422 shifts the merged data corresponding to the ith operand by k bits to obtain the updated operand corresponding to the ith operand, where k is the (integer) difference between the exponent portion of the ith operand and the target exponent value.
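The alignment step can be sketched as below (the function name `align` and its parameters are illustrative). Note that when the maximum exponent is chosen as the target, delta is never positive, so only right shifts occur:

```python
def align(merged: int, exponent: int, target_exponent: int) -> int:
    """Shift merged data so the operand is expressed at target_exponent:
    delta < 0 -> right shift by |delta|; delta > 0 -> left shift by delta."""
    delta = exponent - target_exponent
    if delta < 0:
        return merged >> -delta   # loses low bits: alignment is lossy
    return merged << delta

# operand with exponent 5 aligned to target exponent 7: right shift 2 bits
assert align(0b1100, 5, 7) == 0b11
# operand already at the target exponent: no shift
assert align(0b1100, 7, 7) == 0b1100
```

After this step all n operands share the same (implicit) exponent, so they can be summed as plain fixed point integers.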
The configurable adding unit 43 is configured to add the n updated operands to obtain an operation result.
The configurable adding unit 43 may combine the sign bits of the n operands when adding the n updated operands. Suppose there are 3 operands N1, N2, and N3 with sign bits 0, 1, and 0 respectively, where sign bit 0 denotes a positive number and sign bit 1 a negative number; the addition of the 3 operands is then N1 - N2 + N3.
In an exemplary embodiment, the configurable adding unit 43 is configured to obtain the sign bits of the n operands respectively; for each operand whose sign bit indicates a negative number, perform a two's complement operation on the updated operand corresponding to that operand, obtaining n converted updated operands; and add the n converted updated operands to obtain an operation result.
The complement here is the two's complement (2's complement), a binary representation of signed numbers that also provides a way to negate a number. The two's complement is defined for binary numbers with a sign bit: the two's complement of a positive number or of 0 is the number itself, while the two's complement of a negative number is obtained by inverting the bits of the corresponding positive number and then adding 1.
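A minimal sketch of this rule in a fixed width (the 13-bit width is an arbitrary assumption for illustration):

```python
WIDTH = 13  # assumed datapath width for this sketch

def twos_complement(value, width=WIDTH):
    """Invert all bits and add 1 within the fixed width."""
    return (~value + 1) & ((1 << width) - 1)

# Adding the two's complement of b is subtraction modulo 2**WIDTH:
a, b = 0b0110, 0b0011
s = (a + twos_complement(b)) & ((1 << WIDTH) - 1)
assert s == a - b  # 6 - 3 == 3
```

This is why the adder needs no separate subtractor: negating via two's complement turns every sign combination into a single unsigned sum.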
In addition, the adder performing the addition operation may be configured as a full adder, a carry-save adder, a carry-lookahead adder, etc., which is not limited in the embodiments of the present application. After the addition yields an operation result, the sign bit of the result can be examined: if the sign bit indicates that the result is negative, the sign bit and the unsigned data portion of the result are extracted; if the sign bit indicates that the result is positive, the result is output directly.
The conversion output unit 44 is configured to convert the operation result into a floating point format and output the operation result in the floating point format.
After the operation result is obtained, the operation result is converted into a floating point format, and the operation result in the floating point format is output.
In an example embodiment, as shown in fig. 5, the conversion output unit 44 includes: a mantissa truncating and rounding unit 441, an exponent updating unit 442, and an output unit 443.
The mantissa truncating and rounding unit 441 is configured to determine, for the unsigned data corresponding to the operation result, the significand of the unsigned data; determine the updated position information of the significand, which indicates the relative position between the updated position and the original position of the significand; and truncate bit data of a set bit width according to the configured rounding mode to obtain the mantissa portion corresponding to the operation result.
The exponent updating unit 442 is configured to determine an exponent portion corresponding to the operation result according to the updated location information.
And the output unit 443 is configured to output the operation result in the floating point number format according to the sign bit corresponding to the operation result, the mantissa portion corresponding to the operation result, and the exponent portion corresponding to the operation result.
For example, assuming that the original position of the effective number is OriDist and the updated position is UpdDist, then:
OriDist=GetLocation(OriSig)
UpdDist=GetLocation(UpdSig)
Updated position information (also referred to as relative position information): relative_distance = OriDist - UpdDist;
where OriSig is short for Original Significand, the original significand, and UpdSig is short for Updated Significand, the updated significand. OriDist is obtained by locating the original significand, and UpdDist is obtained by locating the updated significand.
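As an illustrative sketch (the bit-index convention and all names are assumptions), GetLocation can be read as the bit index of the leading 1 in the unsigned result, so relative_distance records how far the significand moved:

```python
def get_location(x):
    """Bit index of the leading 1 (-1 if x is 0); an assumed reading of GetLocation."""
    return x.bit_length() - 1

ori_dist = get_location(0b11000000000)   # significand at bit 10 before the add
upd_dist = get_location(0b111011000000)  # a carry-out moved it up to bit 11
relative_distance = ori_dist - upd_dist  # -1: the "left shift" case of Table 1
```

Under this convention a carry-out gives a negative relative_distance (left shift) and a borrow gives a positive one (right shift); the patent's sign convention may differ.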
Alternatively, the relative position information covers the following 3 cases: left shift, unchanged, and right shift. Each case is described in Table 1 below:
TABLE 1
Relative position information | Description |
Left shift | The most significant bit carries out; the position of the significand shifts left by some number of bits |
Unchanged | The position of the significand remains unchanged |
Right shift | The most significant bit borrows; the position of the significand shifts right by some number of bits |
In the embodiments of the present application, bit data of a set bit width is truncated according to the configured rounding mode to obtain the mantissa part corresponding to the operation result, so the precision of the finally output floating-point operation result can be configured by the user.
Optionally, the mantissa truncating and rounding unit 441 is further configured to perform exception handling on the operation result according to the exception mode and the rounding mode if an exception occurs while converting the operation result into floating-point format, where the exception includes at least one of: overflow, underflow, and inexact data.
The result of an operation on significands is typically held in a longer register; when the result is packed back into floating-point format, the extra bits must be discarded. This is floating-point rounding, and it can be performed in a number of ways. For example, the rounding modes include those in Table 2 below:
TABLE 2
In an exemplary embodiment, the exponent updating unit 442 is configured to adjust the target exponent value according to the updated position information to determine the exponent portion corresponding to the operation result.
For example, assuming the target exponent value is the maximum exponent value among the exponent portions of the n operands, the adjustment includes, but is not limited to, the following:
1. the exponent portion corresponding to the operation result is set to positive infinity;
2. the exponent portion corresponding to the operation result is set to negative infinity;
3. exponent portion corresponding to the operation result = maximum exponent value + relative position information;
4. exponent portion corresponding to the operation result = maximum exponent value + relative position information + 1;
5. exponent portion corresponding to the operation result = maximum exponent value - relative position information;
6. exponent portion corresponding to the operation result = maximum exponent value - relative position information + 1.
In summary, in the technical solution provided by the embodiments of the present application, the input floating-point numbers are converted into fixed-point numbers before the addition is performed, so a single floating-point adder can support the addition of 3 or more floating-point numbers, improving the processing performance of the floating-point adder.
In addition, compared with cascading two floating-point adders, which adds 3 floating-point numbers in two sequential summation steps, the technical solution provided by the embodiments of the present application adds 3 floating-point numbers in a single summation step, improving the processing efficiency of the floating-point adder and reducing the processing delay to half of the original. For example, at a system clock frequency of 100 MHz, the existing scheme takes 20 ns to produce the result, while the present scheme needs only 10 ns, halving the delay.
Moreover, since the processing performance of the floating-point adder is improved, a single floating-point adder can support the addition of a larger number of floating-point numbers, so the chip area can be reduced while the chip achieves the same computing capacity.
In addition, bit data of a set bit width is truncated according to the configured rounding mode to obtain the mantissa part corresponding to the operation result, so the precision of the finally output floating-point result can be configured and adjusted by the user, for example, to maximize algorithm precision for different application scenarios and algorithm requirements.
Referring to fig. 6, the embodiment of the present application will be described by taking an example in which the floating-point adder 40 supports addition operations on 3 floating-point numbers.
The merging unit 41 is configured to prepend the significand to the mantissa portions of the input first, second, and third operands, obtaining the merged data corresponding to each of the 3 operands.
As shown in fig. 6, assume the first operand is FP1, with sign bit S1, exponent portion E1, and mantissa portion M1; the second operand is FP2, with sign bit S2, exponent portion E2, and mantissa portion M2; and the third operand is FP3, with sign bit S3, exponent portion E3, and mantissa portion M3. Denoting the significand bit of FP1 as L1, that of FP2 as L2, and that of FP3 as L3, the merged data corresponding to FP1, FP2, and FP3 may be denoted as { L1, M1}, { L2, M2}, and { L3, M3}, respectively.
The exponent determining unit 421 is configured to determine a target exponent value. Optionally, the exponent determining unit 421 is configured to select the maximum exponent value among the exponent portions of the n operands as the target exponent value.
For example, the exponent determining unit 421 first obtains the exponent portions of the first, second, and third operands, and then compares the 3 exponent portions to obtain the maximum exponent value as the target exponent value. The exponent portions may, for example, be extracted as follows:
for (i = 0; i <= 2; i = i + 1)
    exponent[i] = data[i][EXPLOCmax:EXPLOCmin];
In addition, the maximum exponent value among the 3 exponent portions may be obtained by comparison in, for example, the following ways:
Mode one: comparing the exponent portion of the first operand with the exponent portion of the second operand to obtain a larger value therebetween; comparing the larger value with the exponent part of the third operand to obtain a larger value between the larger value and the exponent part of the third operand, namely the maximum exponent value;
Mode two: comparing the exponent portion of the first operand with the exponent portion of the third operand to obtain a larger value therebetween; comparing the larger value with the exponent part of the second operand to obtain a larger value between the larger value and the exponent part of the second operand, namely the maximum exponent value;
Mode three: comparing the exponent portion of the second operand with the exponent portion of the third operand to obtain a larger value therebetween; comparing the larger value with the exponent part of the first operand to obtain a larger value between the larger value and the exponent part of the first operand, namely the maximum exponent value.
The difference shift unit 422 is configured to perform variable shift operation on the merged data corresponding to the first operand, the second operand, and the third operand, to obtain a first updated operand, a second updated operand, and a third updated operand.
Assume the two exponent values other than the maximum among the exponent portions of the 3 operands are a first smaller exponent value and a second smaller exponent value. The difference shift unit 422 takes the merged data of the operand with the first smaller exponent value, computes the difference delta1 between the maximum exponent value and the first smaller exponent value, and right-shifts that merged data by delta1 bits to obtain one updated operand. Likewise, it takes the merged data of the operand with the second smaller exponent value, computes the difference delta2 between the maximum exponent value and the second smaller exponent value, and right-shifts that merged data by delta2 bits to obtain another updated operand. The merged data of the operand with the maximum exponent value is used directly as an updated operand, without any shift.
The configurable adding unit 43 is configured to obtain the sign bits of the first, second, and third operands respectively; for each operand whose sign bit indicates a negative number, perform a two's complement operation on the corresponding updated operand, obtaining 3 converted updated operands; and add the 3 converted updated operands to obtain an operation result. Note that the two's complement operation is applied only to operands whose sign bit indicates a negative number; operands whose sign bit indicates a positive number require no conversion.
Taking A, B, and C as the first, second, and third operands, the embodiment of the present application can support the following operations:
1、A+B+C
2、A+B-C
3、A-B+C
4、A-B-C
5、-A+B+C
6、-A+B-C
7、-A-B+C
8、-A-B-C
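All eight sign combinations fall out of the same datapath: negate via two's complement exactly when an operand's sign bit is 1, sum once, and complement back if the sum's sign bit is set. A sketch under an assumed 16-bit internal width (names and width are illustrative assumptions):

```python
WIDTH = 16  # assumed internal width with headroom for carries

def add3(updated, signs, width=WIDTH):
    """Sum three updated operands with per-operand signs; return (sign, magnitude)."""
    mask = (1 << width) - 1
    total = 0
    for u, s in zip(updated, signs):
        # two's-complement the operand when its sign bit is 1
        total = (total + ((~u + 1) & mask if s else u)) & mask
    if total >> (width - 1):              # sign bit set: result is negative
        return 1, (~total + 1) & mask     # complement back to a magnitude
    return 0, total

assert add3([8, 3, 2], [0, 1, 0]) == (0, 7)  # A - B + C = 8 - 3 + 2
assert add3([2, 8, 1], [0, 1, 1]) == (1, 7)  # A - B - C = 2 - 8 - 1
```

The single accumulation loop is the "one-step summation" the text contrasts with cascaded two-input adders.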
The mantissa truncating and rounding unit 441 is configured to determine, for the unsigned data corresponding to the operation result, the significand of the unsigned data; determine the updated position information of the significand, which indicates the relative position between the updated position and the original position of the significand; and truncate bit data of a set bit width according to the configured rounding mode to obtain the mantissa portion corresponding to the operation result.
The exponent updating unit 442 is configured to determine an exponent portion corresponding to the operation result according to the updated location information.
The output unit 443 is configured to output the operation result in the floating-point number format according to the sign bit corresponding to the operation result, the mantissa portion corresponding to the operation result, and the exponent portion corresponding to the operation result.
In summary, the technical solution provided by the embodiments of the present application enables one floating-point adder to support the addition of 3 floating-point numbers. Compared with cascading two floating-point adders, which adds 3 floating-point numbers in two sequential summation steps, the present solution adds 3 floating-point numbers in a single summation step, improving the processing efficiency of the floating-point adder and reducing the processing delay to half of the original.
The technical solution of the present application is described and illustrated by the following examples, in which the input first, second, and third operands are assumed to be in FP16 format.
Example 1: the significand position remains unchanged, as shown in Table 3 below:
TABLE 3
 | Sign bit | Exponent part | Mantissa part | Updated operand |
First operand | 0 | 5 | 0000000000 | 1.0000000000 |
Second operand | 0 | 3 | 0001000000 | 0.010001000000 |
Third operand | 0 | 3 | 0110100000 | 0.010110100000 |
Output result | 0 | 5 | 1001111000 | |
The result of summing the 3 updated operands shown in Table 3 is:
1.0000000000+0.010001000000+0.010110100000=1.100111100000
The significand's relative position is unchanged, so the exponent part needs no adjustment.
Example 2: the significand position shifts left by 1 bit, as shown in Table 4 below:
TABLE 4
 | Sign bit | Exponent part | Mantissa part | Updated operand |
First operand | 0 | 8 | 1000000000 | 1.1000000000 |
Second operand | 0 | 7 | 1001000000 | 0.11001000000 |
Third operand | 0 | 8 | 0110100000 | 1.0110100000 |
Output result | 0 | 9 | 1101100000 | |
The result of summing the 3 updated operands shown in Table 4 is:
1.1000000000+0.11001000000+1.0110100000=11.10110000000
So the significand's relative position shifts left; the exponent part must be adjusted by adding 1 to the maximum exponent value.
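Example 2 can be checked with plain integers by giving every aligned operand 11 fractional bits (an arbitrary choice for this check):

```python
F = 11  # fractional bits after alignment, assumed for this check
a = 0b110000000000   # 1.10000000000
b = 0b011001000000   # 0.11001000000 (second operand, right-shifted by 1)
c = 0b101101000000   # 1.01101000000
total = a + b + c
assert total == 0b1110110000000   # 11.10110000000: carry into a new top bit
# the significand moved left by one place, so the exponent becomes 8 + 1 = 9
```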
Example 3: the significand position shifts right by 4 bits, as shown in Table 5 below:
TABLE 5
 | Sign bit | Exponent part | Mantissa part | Updated operand |
First operand | 0 | 4 | 1100101000 | 1.1100101000 |
Second operand | 1 | 4 | 1001000000 | 1.1001000000 |
Third operand | 1 | 1 | 0000000000 | 0.0010000000000 |
Output result | 0 | 0 | 1010000000 | |
The result of summing the 3 updated operands shown in Table 5 is:
1.1100101000-1.1001000000-0.0010000000000=0.0001101000000
So the significand's relative position shifts right; the exponent part must be adjusted by subtracting 4 from the maximum exponent value.
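Likewise, Example 3 checks out with 13 fractional bits (again an arbitrary choice for this check), the negative operands handled as plain subtraction:

```python
F = 13  # fractional bits after alignment, assumed for this check
a = 0b11100101000000  # +1.1100101000
b = 0b11001000000000  # magnitude of -1.1001000000
c = 0b00010000000000  # magnitude of -0.0010000000000 (already shifted right by 3)
result = a - b - c
assert result == 0b00001101000000   # 0.0001101000000
# the leading 1 now sits 4 places below the original position: exponent 4 - 4 = 0
assert result.bit_length() - 1 == F - 4
```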
FIG. 7 is a flow chart of a method for controlling floating point operations provided by one embodiment of the present application, which may be applied to the chip described in the above embodiments, where the chip includes a floating point adder. The method may comprise the following steps:
In step 701, the significand is prepended to the mantissa portions of the n operands, obtaining the merged data corresponding to each of the n operands, where n is a positive integer greater than or equal to 3.
In an exemplary embodiment, the above step 701 includes the following sub-steps:
1. obtaining the exponent portion and mantissa portion of an operand;
2. if the exponent portion of the operand is 0, determining that the significand of the operand is 0;
3. if the exponent portion of the operand is not 0, determining that the significand of the operand is 1;
4. merging the significand of the operand with the mantissa portion of the operand to obtain the merged data corresponding to the operand.
Step 702, performing variable shift operation on the merged data corresponding to the n operands, respectively, to obtain n updated operands in fixed point format.
In an exemplary embodiment, the above-described step 702 includes the following sub-steps:
1. determining a target exponent value;
2. for the ith operand of the n operands, determining the difference between the exponent portion of the ith operand and the target exponent value;
3. performing a variable shift operation on the merged data corresponding to the ith operand according to the difference to obtain the updated operand corresponding to the ith operand, where i is a positive integer less than or equal to n.
In step 703, the n updated operands are added to obtain an operation result.
In an exemplary embodiment, the above step 703 includes the following sub-steps:
1. respectively acquiring the sign bits of the n operands;
2. for each operand whose sign bit indicates a negative number, performing a two's complement operation on the corresponding updated operand to obtain n converted updated operands;
3. adding the n converted updated operands to obtain an operation result.
Step 704, converting the operation result into floating point number format, and outputting the operation result in floating point number format.
In an exemplary embodiment, step 704 described above includes the following sub-steps:
1. for the unsigned data corresponding to the operation result, determining the significand of the unsigned data;
2. determining the updated position information of the significand, which indicates the relative position between the updated position and the original position of the significand;
3. truncating bit data of a set bit width according to the configured rounding mode to obtain the mantissa portion corresponding to the operation result;
4. determining the exponent portion corresponding to the operation result according to the updated position information;
5. outputting the operation result in floating-point format according to the sign bit, mantissa portion, and exponent portion corresponding to the operation result.
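Steps 701 to 704 can be strung together in one hypothetical sketch (FP16-style fields, truncation rounding, and all names are assumptions; subnormals, infinities, and exception handling are omitted):

```python
F = 10  # FP16-style mantissa width

def fp_add3(ops):
    """ops: list of (sign, exponent, mantissa) triples. Returns the same triple form."""
    # 701: prepend the significand bit to each mantissa
    merged = [(s, e, ((0 if e == 0 else 1) << F) | m) for s, e, m in ops]
    # 702: align everything to the maximum exponent by right-shifting
    target = max(e for _, e, _ in merged)
    updated = [(s, d >> (target - e)) for s, e, d in merged]
    # 703: signed single-step sum (Python ints stand in for two's complement)
    total = sum(-d if s else d for s, d in updated)
    sign, mag = (1, -total) if total < 0 else (0, total)
    if mag == 0:
        return sign, 0, 0
    # 704: renormalize, adjust the exponent, truncate back to F mantissa bits
    shift = mag.bit_length() - 1 - F        # how far the significand moved
    mant = (mag >> shift if shift > 0 else mag << -shift) & ((1 << F) - 1)
    return sign, target + shift, mant

# Example 1 (Table 3) reproduces the documented output
ops = [(0, 5, 0b0000000000), (0, 3, 0b0001000000), (0, 3, 0b0110100000)]
assert fp_add3(ops) == (0, 5, 0b1001111000)
```

The same sketch reproduces Examples 2 and 3 (output exponents 9 and 0, respectively), which makes it a convenient sanity check for the flow, though it is not the patented circuit.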
In an exemplary embodiment, the above method further comprises: during conversion of the operation result into floating-point format, if an exception occurs, performing exception handling on the operation result according to the exception mode and the rounding mode, where the exception includes at least one of: overflow, underflow, and inexact data.
In summary, according to the technical scheme provided by the embodiment of the application, the input floating point number is converted into the fixed point number and then the addition operation is performed, so that one floating point adder can support the addition operation of 3 or more floating point numbers, and the processing performance of the floating point adder is improved.
In addition, for details not described in detail in the method embodiments, reference may be made to the description in the other embodiments described above.
An exemplary embodiment of the present application provides a computer apparatus including a chip that includes the floating-point adder described in the above embodiments, which may be used to perform the control method of floating-point operations provided in the above embodiments. Optionally, the computer device may be a terminal device such as a mobile phone, a PC (Personal Computer), a tablet computer, an Internet-of-Things device, or a wearable device, or a server, such as a cloud server in a cloud service system.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
The computer device 800 includes a processing unit 801 (e.g., one or more of a CPU, GPU, AI chip, FPGA, ASIC), a system Memory 804 including a random access Memory (Random Access Memory, RAM) 802 and a Read-Only Memory (ROM) 803, and a system bus 805 connecting the system Memory 804 and the processing unit 801. The computer device 800 also includes a basic Input/Output system (I/O) 806, which helps to transfer information between various devices within the computer, and a mass storage device 807, which stores an operating system 813, application programs 814, and other program modules 815.
The basic input/output system 806 includes a display 808 for displaying information and an input device 809, such as a mouse, keyboard, or the like, for user input of information. Wherein the display 808 and the input device 809 are connected to the processing unit 801 via an input output controller 810 connected to the system bus 805. The basic input/output system 806 can also include an input/output controller 810 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input output controller 810 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 807 is connected to the processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable media provide non-volatile storage for the computer device 800. That is, the mass storage device 807 may include a computer readable medium (not shown) such as a hard disk or CD-ROM (Compact Disc Read-Only Memory) drive.
The computer readable medium may include computer storage media and communication media without loss of generality. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state memory technology, CD-ROM, DVD (Digital Versatile Disc) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the above. The system memory 804 and mass storage device 807 described above may be collectively referred to as memory.
According to various embodiments of the application, the computer device 800 may also be connected through a network, such as the Internet, to other devices on the network. That is, the computer device 800 may be connected to a network 812 via a network interface unit 811 connected to the system bus 805, or the network interface unit 811 may be used to connect to other types of networks or devices (not shown).
In an exemplary embodiment, as shown in FIG. 8, a chip 816 is included in the processing unit 801 of the computer device 800, the chip 816 including the floating point adder described in the above embodiments. The chip 816 including the floating-point adder may implement the control method of the floating-point operation as provided in the above embodiment.
It should be understood that references herein to "a plurality" are to two or more. In addition, the step numbers described herein are merely exemplary of one possible execution sequence among steps, and in some other embodiments, the steps may be executed out of the order of numbers, such as two differently numbered steps being executed simultaneously, or two differently numbered steps being executed in an order opposite to that shown, which is not limiting.
The foregoing description of the exemplary embodiments of the application is not intended to limit the application to the particular embodiments disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the application.
Claims (6)
1. A chip comprising a floating point adder, the floating point adder comprising: the device comprises a merging unit, a variable shifting unit, a configurable adding unit and a conversion output unit;
The merging unit is configured to acquire exponent portions and mantissa portions of the n input operands; for each operand, determine that the significand of the operand is 0 if the exponent portion of the operand is 0, and determine that the significand of the operand is 1 if the exponent portion of the operand is not 0; and prepend the significand to the mantissa portion of the operand to obtain the merged data corresponding to the operand, wherein n is a positive integer greater than or equal to 3;
The variable shifting unit is configured to determine a target exponent value; for an ith operand of the n operands, determine a difference value of an exponent portion of the ith operand and the target exponent value; if the difference value is less than 0, shift the merged data of the ith operand right by k bits to obtain an updated operand in a fixed point number format corresponding to the ith operand; if the difference value is greater than 0, shift the merged data of the ith operand left by k bits to obtain an updated operand in a fixed point number format corresponding to the ith operand; and if the difference value is equal to 0, determine the merged data of the ith operand as the updated operand in a fixed point number format corresponding to the ith operand; wherein i is a positive integer less than or equal to n, the magnitude of the difference between the exponent portion of the ith operand and the target exponent value is k, and k is an integer;
the configurable adding unit is used for adding n updating operands to obtain an operation result;
the conversion output unit is used for converting the operation result into a floating point number format and outputting the operation result in the floating point number format;
Wherein the conversion output unit includes: a mantissa truncating and rounding unit, an exponent updating unit, and an output unit; the mantissa truncating and rounding unit is configured to determine the significand of the unsigned data corresponding to the operation result; determine updated position information of the significand, wherein the updated position information is used for indicating relative position information between an updated position and an original position of the significand; and truncate bit data of a set bit number according to a configured rounding mode to obtain a mantissa part corresponding to the operation result; the exponent updating unit is configured to adjust the target exponent value according to the updated position information and determine an exponent part corresponding to the operation result; the output unit is configured to output the operation result in the floating point number format according to the sign bit corresponding to the operation result, the mantissa part corresponding to the operation result, and the exponent part corresponding to the operation result;
The mantissa truncating and rounding unit is further configured to perform exception handling on the operation result according to an exception mode and a rounding mode if an exception occurs in the process of converting the operation result into the floating point number format; wherein the exception includes at least one of: overflow, underflow, and data inaccuracy.
2. The chip of claim 1, wherein the chip comprises a plurality of chips,
The target exponent value is a maximum exponent value in the exponent portion of the n operands;
Or alternatively
The target exponent value is a minimum exponent value in the exponent portion of the n operands;
Or alternatively
The target exponent value is any exponent value in the exponent portion of the n operands;
Or alternatively
The target exponent value is a predefined exponent value.
3. The chip of claim 1, wherein the configurable summing unit is configured to:
Respectively acquiring sign bits of the n operands;
For an operand whose sign bit indicates a negative number, carrying out a two's complement operation on the updated operand corresponding to the operand to obtain n converted updated operands;
And carrying out addition operation on the converted n updating operands to obtain the operation result.
4. A computer device comprising a chip comprising a floating point adder as claimed in any one of claims 1 to 3.
5. A method of controlling floating point operations for use in a chip including a floating point adder, the method comprising:
acquiring exponent parts and mantissa parts of n operands;
For each operand, determining that the significand of the operand is 0 if the exponent portion of the operand is 0, and determining that the significand of the operand is 1 if the exponent portion of the operand is not 0;
Prepending the significand to the mantissa portion of the operand to obtain the merged data corresponding to the operand, wherein n is a positive integer greater than or equal to 3;
Determining a target index value; for an ith operand of the n operands, determining a difference value of an exponent portion of the ith operand and the target exponent value; if the difference value is greater than 0, shifting the merged data of the ith operand by k bits to the right to obtain an updated operand in a fixed point number format corresponding to the ith operand; if the difference value is greater than 0, shifting the merged data of the ith operand left by k bits to obtain an updated operand in a fixed point number format corresponding to the ith operand; if the difference value is equal to 0, determining the merging data of the ith operand as an updating operand in a fixed point number format corresponding to the ith operand; wherein i is a positive integer less than or equal to n, the difference between the exponent portion of the ith operand and the target exponent value is k, and k is an integer;
Respectively carrying out variable shift operation on the merging data corresponding to the n operands to obtain n updated operands in a fixed point number format;
performing addition operation on the n updated operands to obtain an operation result;
for the unsigned data corresponding to the operation result, determining the effective number of the unsigned data;
determining updated position information of the effective number, the updated position information indicating the relative position between an updated position and an original position of the effective number;
truncating bit data of a set bit width according to a configured rounding mode to obtain a mantissa part corresponding to the operation result;
adjusting the target exponent value according to the updated position information, and determining an exponent part corresponding to the operation result;
outputting the operation result in a floating point number format according to the sign bit corresponding to the operation result, the mantissa part corresponding to the operation result, and the exponent part corresponding to the operation result;
and in the process of converting the operation result into the floating point number format, if an exception condition occurs, performing exception handling on the operation result according to an exception mode and a rounding mode; wherein the exception condition includes at least one of: overflow, underflow, and inexact data.
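The control method of claim 5 (merge the implicit bit, align to a target exponent, sum in two's complement, then renormalize and repack) can be sketched end to end. This is a simplified model under stated assumptions: the field widths are illustrative half-precision-like values, the target exponent is taken as the maximum, rounding is plain truncation toward zero, guard bits are dropped during alignment, and the claim's overflow/underflow/inexact exception handling is omitted.

```python
# Minimal, assumption-laden sketch of claim 5 for n >= 3 operands,
# using Python integers as the fixed-point datapath.

def fp_multi_add(operands, frac_bits=10):
    """operands: list of (sign, biased_exponent, fraction) bit fields.
    Returns (sign, biased_exponent, fraction) of the sum, truncating
    extra precision instead of applying an IEEE rounding mode."""
    # Step 1: prepend the effective number (implicit bit) to each fraction.
    merged = []
    for s, e, f in operands:
        lead = 0 if e == 0 else 1              # exponent 0 -> leading bit 0
        merged.append((s, e, (lead << frac_bits) | f))
    # Step 2: pick the target exponent (here: the maximum, so every
    # alignment below is a right shift and k = e - target is <= 0).
    target = max(e for _, e, _ in merged)
    # Steps 3-4: variable shift to align, then two's-complement addition.
    total = 0
    for s, e, m in merged:
        aligned = m >> (target - e)            # guard bits dropped here
        total += -aligned if s else aligned
    sign = 1 if total < 0 else 0
    mag = abs(total)                           # unsigned data of the result
    if mag == 0:
        return (0, 0, 0)
    # Step 5: updated position of the leading 1 relative to the hidden-bit
    # position drives both the exponent adjustment and the renormalization.
    shift = mag.bit_length() - 1 - frac_bits
    exponent = target + shift                  # exception handling omitted
    if shift > 0:
        mag >>= shift                          # truncate toward zero
    elif shift < 0:
        mag <<= -shift
    frac = mag & ((1 << frac_bits) - 1)        # drop the hidden bit
    return (sign, exponent, frac)
```

With half-precision-style fields, adding 1.0 three times (`(0, 15, 0)` each) yields `(0, 16, 512)`, i.e. 1.5 x 2^1 = 3.0.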
6. The method of claim 5, wherein adding the n updated operands to obtain an operation result comprises:
acquiring the sign bits of the n operands respectively;
for each operand whose sign bit indicates a negative value, performing a two's complement operation on the updated operand corresponding to that operand, to obtain n converted updated operands;
and performing an addition operation on the n converted updated operands to obtain the operation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011004685.9A CN112130805B (en) | 2020-09-22 | 2020-09-22 | Chip comprising floating point adder, device and control method of floating point operation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011004685.9A CN112130805B (en) | 2020-09-22 | 2020-09-22 | Chip comprising floating point adder, device and control method of floating point operation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112130805A CN112130805A (en) | 2020-12-25 |
CN112130805B true CN112130805B (en) | 2024-05-24 |
Family
ID=73842610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011004685.9A Active CN112130805B (en) | 2020-09-22 | 2020-09-22 | Chip comprising floating point adder, device and control method of floating point operation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112130805B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113625989B (en) * | 2021-07-16 | 2024-06-28 | 深圳云天励飞技术股份有限公司 | Data operation device, method, electronic device, and storage medium |
CN115718586B (en) * | 2022-11-29 | 2024-01-19 | 格兰菲智能科技有限公司 | Pixel color mixing operation method, graphic drawing method, device and equipment |
CN117632081B (en) * | 2024-01-24 | 2024-04-19 | 沐曦集成电路(上海)有限公司 | Matrix data processing system for GPU |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101847087A (en) * | 2010-04-28 | 2010-09-29 | 中国科学院自动化研究所 | Reconfigurable transverse summing network structure for supporting fixed and floating points |
CN102520906A (en) * | 2011-12-13 | 2012-06-27 | 中国科学院自动化研究所 | Vector dot product accumulating network supporting reconfigurable fixed floating point and configurable vector length |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101847087A (en) * | 2010-04-28 | 2010-09-29 | 中国科学院自动化研究所 | Reconfigurable transverse summing network structure for supporting fixed and floating points |
CN102520906A (en) * | 2011-12-13 | 2012-06-27 | 中国科学院自动化研究所 | Vector dot product accumulating network supporting reconfigurable fixed floating point and configurable vector length |
Non-Patent Citations (2)
Title |
---|
FPGA Implementation of a Floating-Point Adder with Three Data Channels; Ji Xunsheng; Electronic Engineer; 2004-08-31; pp. 43-45 *
Wang Cheng (ed.). Principles of Computer Organization. 2008, p. 59. *
Also Published As
Publication number | Publication date |
---|---|
CN112130805A (en) | 2020-12-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112074806B (en) | System, method and computer storage medium for block floating point computing | |
CN112130805B (en) | Chip comprising floating point adder, device and control method of floating point operation | |
Lian et al. | High-performance FPGA-based CNN accelerator with block-floating-point arithmetic | |
CN109902186B (en) | Method and apparatus for generating neural network | |
US20190347553A1 (en) | Training neural networks using mixed precision computations | |
KR20210029785A (en) | Neural network acceleration and embedding compression system and method including activation sparse | |
US10579334B2 (en) | Block floating point computations using shared exponents | |
WO2021135455A1 (en) | Semantic recall method, apparatus, computer device, and storage medium | |
WO2022001724A1 (en) | Data processing method and device | |
Yin et al. | Designs of approximate floating-point multipliers with variable accuracy for error-tolerant applications | |
CN111033462A (en) | Providing efficient floating point operations using matrix processors in processor-based systems | |
CN118314264B (en) | Urban model rendering system and method based on digital twin | |
CN114586055A (en) | Multi-scale factor image super-resolution with microstructure mask | |
US20210279594A1 (en) | Method and apparatus for video coding | |
CN111915689A (en) | Method, apparatus, electronic device and computer readable medium for generating objective function | |
CN116700665B (en) | Method and device for determining floating point number square root reciprocal | |
US20220113943A1 (en) | Method for multiply-add operations for neural network | |
CN111860824A (en) | Data processing method and related product | |
CN111274818A (en) | Word vector generation method and device | |
CN111860825A (en) | Data processing method and related product | |
CN117058668B (en) | Three-dimensional model face reduction evaluation method and device | |
Zendegani et al. | AMCAL: Approximate Multiplier with the Configurable Accuracy Levels for Image Processing and Convolutional Neural Network | |
CN116071370A (en) | Smearing type image segmentation method and device, electronic equipment and medium | |
US20230110383A1 (en) | Floating-point logarithmic number system scaling system for machine learning | |
KR20230096902A (en) | Operator structure for floating-point dot product, and apparatus having the same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||