CN116661729A - Adder tree design system of mixed floating point - Google Patents
Adder tree design system of mixed floating point
- Publication number
- CN116661729A (application number CN202310652558.7A)
- Authority
- CN
- China
- Prior art keywords
- floating point
- module
- data
- data transmission
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06F7/485—Adding; Subtracting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a mixed floating-point adder tree design system, and relates to the technical field of AI chips. The mixed floating-point adder tree design system comprises an AI operation system, inside of which a mixed floating-point addition tree system is arranged. The mixed floating-point addition tree system is provided with an input module, an in-memory computing circuit, an exponent comparison module, a fraction alignment module, a floating-point calculation circuit, a normalization rounding module and an output module; the exponent comparison module is connected with the in-memory computing circuit through a data transmission technology, and the floating-point calculation circuit is connected with the fraction alignment module through the data transmission technology. Through the mixed floating-point addition tree structure of the floating-point operation circuit and its optimized, combined design, the system saves circuit design area, effectively improves operation efficiency, reduces the power consumption of hardware operation, and greatly improves calculation precision, so that high-precision data operation can be realized at a lower power-consumption cost.
Description
Technical Field
The invention relates to the technical field of AI chips, and in particular to a mixed floating-point adder tree design system.
Background
Artificial intelligence (AI) is a technical science that researches and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. As a branch of computer science, it attempts to understand the essence of intelligence and to produce new intelligent machines capable of responding in a manner similar to human intelligence.
In recent years, with the wide application of artificial intelligence technology in various fields, the performance of computer hardware has been iteratively optimized. Existing AI chips continuously pursue high computing power, low power consumption and low cost to meet the calculation requirements of deep learning algorithms. However, as deep learning algorithms develop rapidly, new algorithms are poorly supported by fixed circuit design structures, so AI chip circuit designs must be continuously optimized. The traditional floating-point adder tree is designed by directly stacking multiple single floating-point adders, each of which independently completes the addition of two floating-point numbers. This design has the following disadvantages: 1) a single floating-point adder operation requires four steps (exponent comparison, fraction alignment, fraction addition and normalization), so directly stacking multiple single floating-point adders incurs large area and power-consumption overhead; 2) a single floating-point adder rounds again after each addition, and this repeated rounding has a great influence on the accuracy of the operation result.
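The four-step single adder described above can be sketched as follows. This is an illustrative toy model, not the patented circuit: the 4-bit fraction width, the function names and the sample values are assumptions chosen to make the per-addition rounding loss of disadvantage 2) visible.

```python
FRAC_BITS = 4  # toy fraction width so the rounding in step 4 is visible

def fp_add(a, b):
    """Add two (fraction, exponent) pairs; each fraction lies in [1, 2)."""
    (fa, ea), (fb, eb) = a, b
    # Step 1: exponent comparison
    if ea < eb:
        (fa, ea), (fb, eb) = (fb, eb), (fa, ea)
    # Step 2: fraction alignment (shift the smaller operand right)
    fb = fb / 2 ** (ea - eb)
    # Step 3: fraction addition
    fs, es = fa + fb, ea
    # Step 4: normalization and rounding back to FRAC_BITS bits
    while fs >= 2:
        fs, es = fs / 2, es + 1
    fs = round(fs * 2**FRAC_BITS) / 2**FRAC_BITS
    return fs, es

# Stacking single adders rounds at every node, so the small addends vanish:
vals = [(1.0, 4)] + [(1.0, -2)] * 8   # exact sum: 16 + 8 * 0.25 = 18
acc = vals[0]
for v in vals[1:]:
    acc = fp_add(acc, v)
print(acc)  # -> (1.0, 4), i.e. 16: every 0.25 contribution was rounded away
```

Each 0.25 addend aligns to 1/64 of the large operand's scale and is rounded off before the next addition, which is exactly the accumulated-rounding error the mixed addition tree is designed to avoid.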
Disclosure of Invention
(I) Technical Problem to be Solved
Aiming at the defects of the prior art, the invention provides a mixed floating-point adder tree design system to solve the problems described in the background art above.
(II) Technical Solution
In order to achieve the above purpose, the invention is realized by the following technical solution: a mixed floating-point adder tree design system comprises an AI operation system, inside of which a mixed floating-point addition tree system is arranged for performing mixed floating-point addition operations on data, improving the computational efficiency of the floating-point operation unit and solving the problems of low precision and high design cost in existing chip circuits. The mixed floating-point addition tree system is internally provided with an input module, an in-memory computing circuit, an exponent comparison module, a fraction alignment module, a floating-point calculation circuit, a normalization rounding module and an output module. The in-memory computing circuit is used for performing logical operations on input and/or stored data so as to form a first calculation result. The exponent comparison module is connected with the in-memory computing circuit through a data transmission technology and is used for performing exponent comparison on the first calculation result formed by the in-memory computing circuit; the result of the exponent comparison serves as the reference for fraction alignment. The fraction alignment module is connected with the exponent comparison module through a data transmission technology and is used for guiding the completion of multiple fraction alignments. The floating-point calculation circuit is connected with the fraction alignment module through a data transmission technology and is used for performing floating-point operations on the first calculation result so as to form a second calculation result.
Preferably, the input module comprises a data input unit, a data storage unit and a data transmission unit. The data input unit is connected with the data storage unit through a data transmission technology and is used for temporarily storing externally input data; the data storage unit is connected with the data transmission unit through a data transmission technology; and the data transmission unit is connected with the in-memory computing circuit through a data transmission technology and is used for transmitting the input and/or stored data to the in-memory computing circuit, where data logical operations are performed on them.
Preferably, the output module is configured to output an operation result of the normalization rounding module.
Preferably, the logical operation includes one or more of AND, OR, NOT, NAND, NOR, XOR (exclusive OR), and XNOR (exclusive NOR).
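As a quick illustration, the listed operations can be realized bitwise. This is a hypothetical 4-bit sketch, not part of the patented circuit; Python has no XNOR operator, so it is built from XOR and NOT:

```python
# Bitwise forms of the seven logical operations on 4-bit values.
a, b, MASK = 0b1100, 0b1010, 0b1111  # MASK keeps results within 4 bits

print(f"{a & b:04b}")            # AND  -> 1000
print(f"{a | b:04b}")            # OR   -> 1110
print(f"{~a & MASK:04b}")        # NOT  -> 0011
print(f"{~(a & b) & MASK:04b}")  # NAND -> 0111
print(f"{~(a | b) & MASK:04b}")  # NOR  -> 0001
print(f"{a ^ b:04b}")            # XOR  -> 0110
print(f"{~(a ^ b) & MASK:04b}")  # XNOR -> 1001
```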
Preferably, the operation of the mixed floating-point adder tree design system mainly comprises the following steps:
a. data to be operated on are input through the input module and transmitted to the in-memory computing circuit through a signal transmission technology;
b. the in-memory computing circuit performs logical operations on the transmitted and/or stored data to form a first calculation result, and inputs the first calculation result to the exponent comparison module through a data transmission technology;
c. the exponent comparison module performs exponent comparison on the first calculation result, and the comparison result serves as the fraction alignment reference that guides the fraction alignment module to complete the alignment of multiple fractions;
d. the floating-point calculation circuit performs floating-point operations on the first calculation result whose exponent comparison and fraction alignment have been completed by the exponent comparison module and the fraction alignment module, so as to form a second calculation result, and transmits the second calculation result to the normalization rounding module through a data transmission technology;
e. the normalization rounding module performs unified normalization rounding on the second calculation result, normalizes the mantissa of the result, and finally transmits the normalized and rounded data to the output module through a data transmission technology to output the final floating-point operation result.
Preferably, in step b, the floating point number of the first calculation result comprises a sign bit of 1 bit, an exponent field of n bits and a mantissa field of m bits, so as to store numbers of the format ±1.xx × 2^e.
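The 1 + n + m bit layout described above can be sketched as follows. The concrete widths (n = 5, m = 10), the exponent bias and the helper names are illustrative assumptions only; the patent does not fix them:

```python
import math

N, M = 5, 10               # assumed exponent and mantissa field widths
BIAS = (1 << (N - 1)) - 1  # assumed exponent bias

def pack(value):
    """Pack a nonzero float as 1-bit sign | N-bit exponent | M-bit mantissa."""
    sign = 1 if value < 0 else 0
    frac, e = math.frexp(abs(value))  # abs(value) = frac * 2**e, frac in [0.5, 1)
    frac, e = frac * 2, e - 1         # renormalize so value = 1.xx * 2**e
    mant = round((frac - 1.0) * (1 << M))  # M bits after the implicit leading 1
    return (sign << (N + M)) | ((e + BIAS) << M) | mant

def unpack(bits):
    sign = (bits >> (N + M)) & 1
    e = ((bits >> M) & ((1 << N) - 1)) - BIAS
    frac = 1.0 + (bits & ((1 << M) - 1)) / (1 << M)
    return (-1.0) ** sign * frac * 2.0**e

print(unpack(pack(-6.5)))  # -> -6.5 (exactly representable, so it round-trips)
```

This sketch omits special cases (zero, overflow, carry-out of the mantissa on rounding); it only shows how a ±1.xx × 2^e value maps onto the three bit fields.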
Working principle: First, the invention changes the internal structure of the single floating-point adders and combines a plurality of them into a mixed floating-point addition tree, so that multiple single floating-point adders merge into one complete operator and the design cost of the single floating-point addition circuits is reduced. Second, when data enter the floating-point operation unit, all addends in the mixed floating-point addition tree undergo exponent comparison, and the comparison result serves as the fraction alignment reference that guides the completion of the multiple fraction alignment and accumulation operations. Finally, the total result obtained after the aligned accumulation undergoes a unified normalization rounding, the mantissa of the result is normalized, and the final floating-point operation result is output.
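The working principle above can be sketched end to end. This is a minimal model under an assumed toy precision, not the hardware design: exponents are compared once across all addends, every fraction is aligned to the maximum exponent, accumulation happens without intermediate rounding, and a single normalization/rounding step runs at the end.

```python
FRAC_BITS = 4  # toy fraction width (assumed for illustration)

def tree_sum(addends):
    """Sum (fraction, exponent) pairs with one shared alignment and one rounding."""
    e_max = max(e for _, e in addends)                    # exponent comparison
    acc = sum(f / 2 ** (e_max - e) for f, e in addends)   # aligned accumulation
    while acc >= 2:                                       # normalize once
        acc, e_max = acc / 2, e_max + 1
    acc = round(acc * 2**FRAC_BITS) / 2**FRAC_BITS        # round once
    return acc, e_max

vals = [(1.0, 4)] + [(1.0, -2)] * 8   # exact sum: 16 + 8 * 0.25 = 18
f, e = tree_sum(vals)
print(f * 2**e)  # -> 18.0; rounding after every addition instead would give 16
```

With these inputs, the one-shot rounding keeps the eight small addends that a chain of individually rounding adders would discard, illustrating how the unified normalization rounding avoids the precision error of repeated rounding.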
(III) beneficial effects
The invention provides a mixed floating point adder tree design system. The beneficial effects are as follows:
1. The invention creatively provides a mixed floating-point addition tree structure for the floating-point operation circuit. The optimized, combined design structure not only saves circuit design area cost but also effectively improves operation efficiency, reduces the power consumption of hardware operation, and greatly improves calculation precision, so that high-precision data operation can be realized at a lower power-consumption cost.
2. The invention solves the problem that the floating-point operation module in current AI chip circuits struggles to balance efficiency against power-consumption cost, meets the calculation requirements of deep learning algorithms with large parameter counts and high operational complexity, and can be widely applied to high-precision, high-computing-power, low-power AI chips.
Drawings
FIG. 1 is a schematic diagram of a system of the present invention;
FIG. 2 is a schematic diagram of an input module according to the present invention;
FIG. 3 is a schematic diagram of a hybrid floating point addition tree of the present invention.
Wherein: 1. AI operation system; 2. mixed floating-point addition tree system; 3. input module; 4. in-memory computing circuit; 5. exponent comparison module; 6. fraction alignment module; 7. floating-point calculation circuit; 8. normalization rounding module; 9. output module; 10. data input unit; 11. data storage unit; 12. data transmission unit.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples:
As shown in figs. 1-3, the embodiment of the present invention provides a mixed floating-point adder tree design system, which comprises an AI operation system 1. A mixed floating-point addition tree system 2 is arranged inside the AI operation system 1 and is used for performing mixed floating-point addition operations on data, improving the computational efficiency of the floating-point operation unit and solving the problems of low precision and high design cost in existing chip circuits. An input module 3, an in-memory computing circuit 4, an exponent comparison module 5, a fraction alignment module 6, a floating-point calculation circuit 7, a normalization rounding module 8 and an output module 9 are arranged inside the mixed floating-point addition tree system 2. The in-memory computing circuit 4 is used for performing logical operations on input and/or stored data to form a first calculation result. The exponent comparison module 5 is connected with the in-memory computing circuit 4 through a data transmission technology and is used for performing exponent comparison on the first calculation result formed by the in-memory computing circuit 4; the exponent comparison result serves as the reference for fraction alignment. The fraction alignment module 6 is connected with the exponent comparison module 5 through a data transmission technology and is used for guiding the completion of multiple fraction alignments. The floating-point calculation circuit 7 is connected with the fraction alignment module 6 through a data transmission technology and is used for performing floating-point operations on the first calculation result so as to form a second calculation result.
The input module 3 comprises a data input unit 10, a data storage unit 11 and a data transmission unit 12. The data input unit 10 is connected with the data storage unit 11 through a data transmission technology and is used for temporarily storing externally input data; the data storage unit 11 is connected with the data transmission unit 12 through a data transmission technology; and the data transmission unit 12 is connected with the in-memory computing circuit 4 through a data transmission technology and is used for transmitting the input and/or stored data to the in-memory computing circuit 4, where data logical operations are performed on them.
The output module 9 is configured to output the operation result of the normalization rounding module 8.
The logical operations include one or more of AND, OR, NOT, NAND, NOR, XOR (exclusive OR), and XNOR (exclusive NOR).
Firstly, the internal structures of a plurality of single floating-point adders are changed and the adders are combined into a mixed floating-point addition tree, in which the plurality of single floating-point adders merge into one complete operator, reducing the design cost of the single floating-point addition circuits. Secondly, when data are input into the floating-point operation unit, exponent comparison is performed on all addends in the mixed floating-point addition tree, and the comparison result serves as the fraction alignment reference that guides the completion of the multiple fraction alignment and accumulation operations. Finally, the total result obtained after the aligned accumulation undergoes a unified normalization rounding, the mantissa of the result is normalized, and the final floating-point operation result is output, so that the calculation precision errors caused by multiple roundings are effectively avoided.
The operation of the mixed floating-point adder tree design system mainly comprises the following steps:
a. data to be operated on are input through the input module 3 and transmitted to the in-memory computing circuit 4 through a signal transmission technology;
b. the in-memory computing circuit 4 performs logical operations on the transmitted and/or stored data to form a first calculation result, and inputs the first calculation result to the exponent comparison module 5 through a data transmission technology;
c. the exponent comparison module 5 performs exponent comparison on the first calculation result, and the comparison result serves as the fraction alignment reference that guides the fraction alignment module 6 to complete multiple fraction alignments;
d. the floating-point calculation circuit 7 performs floating-point operations on the first calculation result whose exponent comparison and fraction alignment have been completed by the exponent comparison module 5 and the fraction alignment module 6, so as to form a second calculation result, and transmits the second calculation result to the normalization rounding module 8 through a data transmission technology;
e. the normalization rounding module 8 performs unified normalization rounding on the second calculation result, normalizes the mantissa of the result, and finally transmits the normalized and rounded data to the output module 9 through a data transmission technology to output the final floating-point operation result.
In step b, the floating point number of the first calculation result comprises a sign bit of 1 bit, an exponent field of n bits and a mantissa field of m bits, so as to store numbers of the format ±1.xx × 2^e.
The invention creatively provides a mixed floating-point addition tree structure for the floating-point operation circuit. The optimized, combined design structure not only saves circuit design area cost but also effectively improves operation efficiency, reduces the power consumption of hardware operation, and greatly improves calculation precision, so that high-precision data operation can be realized at a lower power-consumption cost.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (6)
1. A mixed floating-point adder tree design system, comprising an AI operation system (1), characterized in that: a mixed floating-point addition tree system (2) is arranged inside the AI operation system (1) and is used for performing mixed floating-point addition operations on data, improving the computational efficiency of the floating-point operation unit and solving the problems of low precision and high design cost in existing chip circuits; an input module (3), an in-memory computing circuit (4), an exponent comparison module (5), a fraction alignment module (6), a floating-point calculation circuit (7), a normalization rounding module (8) and an output module (9) are arranged inside the mixed floating-point addition tree system (2); the in-memory computing circuit (4) is used for performing logical operations on input and/or stored data to form a first calculation result; the exponent comparison module (5) is connected with the in-memory computing circuit (4) through a data transmission technology and is used for performing exponent comparison on the first calculation result formed by the in-memory computing circuit (4), the exponent comparison result serving as the reference for fraction alignment; the fraction alignment module (6) is connected with the exponent comparison module (5) through a data transmission technology and is used for guiding the completion of multiple fraction alignments; and the floating-point calculation circuit (7) is connected with the fraction alignment module (6) through a data transmission technology and is used for performing floating-point operations on the first calculation result to form a second calculation result.
2. The mixed floating point adder tree design system of claim 1, wherein: the input module (3) comprises a data input unit (10), a data storage unit (11) and a data transmission unit (12), wherein the data input unit (10) is connected with the data storage unit (11) through a data transmission technology and is used for temporarily storing externally input data, the data storage unit (11) is connected with the data transmission unit (12) through the data transmission technology, and the data transmission unit (12) is connected with the in-memory computing circuit (4) through the data transmission technology and is used for transmitting input and/or stored data to the in-memory computing circuit (4) and performing data logic operation on the data.
3. The mixed floating point adder tree design system of claim 1, wherein: the output module (9) is used for outputting the operation result of the normalization rounding module (8).
4. The mixed floating-point adder tree design system of claim 1, wherein: the logical operation includes one or more of AND, OR, NOT, NAND, NOR, XOR (exclusive OR), and XNOR (exclusive NOR).
5. The mixed floating-point adder tree design system of claim 1, wherein: the operation of the mixed floating-point adder tree design system mainly comprises the following steps:
a. data to be operated on are input through the input module (3) and transmitted to the in-memory computing circuit (4) through a signal transmission technology;
b. the in-memory computing circuit (4) performs logical operations on the transmitted and/or stored data to form a first calculation result, and inputs the first calculation result to the exponent comparison module (5) through a data transmission technology;
c. the exponent comparison module (5) performs exponent comparison on the first calculation result, and the comparison result serves as the fraction alignment reference that guides the fraction alignment module (6) to complete multiple fraction alignments;
d. the floating-point calculation circuit (7) performs floating-point operations on the first calculation result whose exponent comparison and fraction alignment have been completed by the exponent comparison module (5) and the fraction alignment module (6), so as to form a second calculation result, and transmits the second calculation result to the normalization rounding module (8) through a data transmission technology;
e. the normalization rounding module (8) performs unified normalization rounding on the second calculation result, normalizes the mantissa of the result, and finally transmits the normalized and rounded data to the output module (9) through a data transmission technology to output the final floating-point operation result.
6. The mixed floating-point adder tree design system of claim 5, wherein: in step b, the floating point number of the first calculation result comprises a sign bit of 1 bit, an exponent field of n bits and a mantissa field of m bits, so as to store numbers of the format ±1.xx × 2^e.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310652558.7A CN116661729A (en) | 2023-06-05 | 2023-06-05 | Adder tree design system of mixed floating point |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116661729A true CN116661729A (en) | 2023-08-29 |
Family
ID=87727521
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310652558.7A Pending CN116661729A (en) | 2023-06-05 | 2023-06-05 | Adder tree design system of mixed floating point |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116661729A (en) |
- 2023-06-05: application CN202310652558.7A filed in China; patent CN116661729A (en) active, status Pending

Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117130664A (en) * | 2023-10-23 | 2023-11-28 | 北京星宇天航科技有限公司 | Artificial intelligent chip computing method and device for chip detection task and chip
CN117130664B (en) * | 2023-10-23 | 2024-01-09 | 北京星宇天航科技有限公司 | Artificial intelligent chip computing method and device for chip detection task and chip
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |