CN116661729A - Adder tree design system of mixed floating point - Google Patents

Adder tree design system of mixed floating point

Info

Publication number
CN116661729A
CN116661729A (Application CN202310652558.7A)
Authority
CN
China
Prior art keywords
floating point
module
data
data transmission
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310652558.7A
Other languages
Chinese (zh)
Inventor
杨越
燕博南
范安骏逸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Pingxin Technology Co ltd
Original Assignee
Beijing Pingxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Pingxin Technology Co ltd filed Critical Beijing Pingxin Technology Co ltd
Priority to CN202310652558.7A priority Critical patent/CN116661729A/en
Publication of CN116661729A publication Critical patent/CN116661729A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/485 Adding; Subtracting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a mixed floating point adder tree design system and relates to the technical field of AI chips. The mixed floating point adder tree design system comprises an AI operation system in which a mixed floating point addition tree system is arranged; the mixed floating point addition tree system is internally provided with an input module, an in-memory computing circuit, an exponent comparison module, a fraction alignment module, a floating point calculation circuit, a normalization rounding module and an output module, the exponent comparison module is connected with the in-memory computing circuit through a data transmission technology, and the floating point calculation circuit is connected with the fraction alignment module through a data transmission technology. Through the mixed floating point addition tree structure of the floating point operation circuit and its optimized, combined design, the circuit design area is reduced, operation efficiency is effectively improved, the power consumption of hardware operation is lowered, and calculation precision is greatly improved, so that high-precision data operation can be realized at a lower power consumption cost.

Description

Adder tree design system of mixed floating point
Technical Field
The invention relates to the technical field of AI chips, in particular to a mixed floating point adder tree design system.
Background
Artificial intelligence (AI) is a branch of computer science that researches and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. It aims to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence.
In recent years, with the wide application of artificial intelligence technology in various fields, the performance of computer hardware has also been iteratively optimized. Existing AI chips continuously pursue high computing power, low power consumption and low cost to meet the calculation requirements of deep learning algorithms in the artificial intelligence field. However, as deep learning algorithms develop rapidly, new algorithms are not well supported by fixed circuit design structures, so AI chip circuit designs must be continuously optimized. The design mode of the traditional floating point adder tree is to directly stack a plurality of single floating point adders, each of which independently completes the addition of two floating point numbers. This design has the following disadvantages: 1) each single floating point adder operation requires the four steps of exponent comparison, fraction alignment, fraction addition and normalization, so directly stacking many single floating point adders incurs a large area cost and power consumption overhead; 2) each single floating point adder rounds its result after every operation, and this repeated rounding has a great influence on the accuracy of the final operation result.
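The second disadvantage can be illustrated with a short simulation. This is not part of the patent; it is a minimal sketch assuming a toy format with a 3-bit fraction and round-to-nearest, contrasting rounding after every single addition with a single rounding of an exactly accumulated total.

```python
import math

def round_to_mantissa_bits(x: float, m: int) -> float:
    """Round x to the nearest value representable with an m-bit fraction
    (a toy stand-in for a low-precision floating point format)."""
    if x == 0.0:
        return 0.0
    frac, exp = math.frexp(x)                       # x = frac * 2**exp, 0.5 <= |frac| < 1
    scaled = round(frac * (1 << (m + 1)))           # keep 1 implicit + m fraction bits
    return math.ldexp(scaled, exp - (m + 1))

values = [1.0, 0.0625, 0.0625, 0.0625, 0.0625]      # one large addend, four small ones
M = 3                                               # deliberately tiny fraction width

# Conventional adder tree: every single adder rounds its own result.
acc = 0.0
for v in values:
    acc = round_to_mantissa_bits(acc + v, M)

# Mixed addition tree idea: accumulate without intermediate rounding, round once.
once = round_to_mantissa_bits(sum(values), M)

print("round after each add:", acc)                 # 1.0  - the small addends are lost
print("single final rounding:", once)               # 1.25 - matches the exact sum
```

With the toy format, rounding after each addition discards every small addend and the result stays at 1.0, while a single final rounding recovers the exact sum 1.25.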
Disclosure of Invention
(I) Technical problems to be solved
Aiming at the defects of the prior art, the invention provides a mixed floating point adder tree design system, which addresses the problems described in the background art above.
(II) technical scheme
In order to achieve the above purpose, the invention is realized by the following technical scheme: a mixed floating point adder tree design system includes an AI operation system, inside which a mixed floating point addition tree system is arranged for performing mixed floating point addition operations on data, improving the calculation efficiency of the floating point operation unit and solving the problems of low precision and high design cost of existing chip circuits. The mixed floating point addition tree system is internally provided with an input module, an in-memory computing circuit, an exponent comparison module, a fraction alignment module, a floating point calculation circuit, a normalization rounding module and an output module. The in-memory computing circuit is used for performing logic operations on input and/or stored data to form a first calculation result; the exponent comparison module is connected with the in-memory computing circuit through a data transmission technology and is used for performing exponent comparison on the first calculation result formed by the in-memory computing circuit, the result of the exponent comparison serving as the reference for fraction alignment; the fraction alignment module is connected with the exponent comparison module through a data transmission technology and is used for guiding the completion of a plurality of fraction alignments; the floating point calculation circuit is connected with the fraction alignment module through a data transmission technology and is used for performing a floating point operation on the first calculation result to form a second calculation result.
Preferably, the input module comprises a data input unit, a data storage unit and a data transmission unit, wherein the data input unit is connected with the data storage unit through a data transmission technology so that externally input data can be temporarily stored, the data storage unit is connected with the data transmission unit through a data transmission technology, and the data transmission unit is connected with the in-memory computing circuit through a data transmission technology and is used for transmitting the input and/or stored data to the in-memory computing circuit, where data logic operations are performed on them.
Preferably, the output module is configured to output an operation result of the normalization rounding module.
Preferably, the logical operation includes one or more of AND, OR, NOT, NAND, NOR, XOR and XNOR.
Preferably, the operation of the mixed floating point adder tree design system mainly comprises the following steps:
a. data to be operated on is input through the input module and transmitted to the in-memory computing circuit through a signal transmission technology;
b. the in-memory computing circuit performs a logic operation on the transmitted and/or stored data to form a first calculation result, and the first calculation result is input to the exponent comparison module through a data transmission technology;
c. the exponent comparison module performs exponent comparison on the first calculation result, and the exponent comparison result is used as the fraction alignment reference to guide the fraction alignment module to finish the alignment of a plurality of fractions;
d. the floating point calculation circuit performs a floating point operation on the first calculation result for which the exponent comparison module and the fraction alignment module have completed exponent comparison and fraction alignment, so as to form a second calculation result, and the second calculation result is transmitted to the normalization rounding module through a data transmission technology;
e. the normalization rounding module performs a unified normalization rounding on the second calculation result, normalizes the mantissa of the result, and finally transmits the normalized and rounded data to the output module through a data transmission technology to output the final floating point operation result.
Preferably, in step b, the floating point number of the first calculation result includes a 1-bit sign bit, an n-bit exponent field and an m-bit mantissa field, so as to store numbers of the form ±1.xx…×2^y.
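For concreteness, a minimal encoder/decoder for such a 1 + n + m bit word might look like the sketch below. The exponent bias of 2^(n-1)-1 and the omission of subnormals, infinities and NaN are assumptions borrowed from the usual IEEE-754 convention; the patent only fixes the general ±1.xx…×2^y layout.

```python
import math

def encode(value: float, n: int, m: int) -> int:
    """Pack a normal number of the form ±1.xx... * 2^y into a 1 + n + m bit word.
    Assumes an IEEE-like exponent bias of 2**(n-1) - 1; no subnormal/Inf/NaN handling."""
    sign = 1 if value < 0 else 0
    frac, exp = math.frexp(abs(value))              # value = frac * 2**exp, frac in [0.5, 1)
    scaled = round(frac * (1 << (m + 1)))           # 1 implicit bit + m fraction bits
    if scaled == (1 << (m + 1)):                    # fraction rounded up to 2.0: renormalize
        scaled >>= 1
        exp += 1
    mantissa = scaled - (1 << m)                    # drop the implicit leading 1
    biased_exp = (exp - 1) + ((1 << (n - 1)) - 1)
    return (sign << (n + m)) | (biased_exp << m) | mantissa

def decode(word: int, n: int, m: int) -> float:
    """Inverse of encode() for normal numbers."""
    sign = -1.0 if (word >> (n + m)) & 1 else 1.0
    biased_exp = (word >> m) & ((1 << n) - 1)
    mantissa = word & ((1 << m) - 1)
    exp = biased_exp - ((1 << (n - 1)) - 1)
    return sign * (1.0 + mantissa / (1 << m)) * 2.0 ** exp

w = encode(2.5, n=5, m=10)                          # FP16-like widths: 1 + 5 + 10 bits
print(hex(w), decode(w, n=5, m=10))                 # 0x4100 2.5
```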
Working principle: firstly, the invention changes the internal structure of the single floating point adders and combines a plurality of single floating point adders into a mixed floating point addition tree, in which the plurality of single floating point adders are merged into one complete operator, reducing the design cost of the single floating point addition circuits; secondly, when data is input into the floating point operation unit, the exponents of all addends in the mixed floating point addition tree are compared, and the result of the exponent comparison is used as the reference for fraction alignment to guide the completion of the fraction alignment and accumulation operations; finally, the total operation result obtained after the fraction alignment and accumulation are completed is subjected to a unified normalization rounding, the mantissa of the result is normalized, and the final floating point operation result is output.
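The working principle can be summarized by the following behavioral sketch (illustrative only; the function name, accumulator width and guard-bit count are assumptions, not part of the patent): the exponents of all addends are compared once, each fraction is shifted onto a fixed-point grid aligned to the largest exponent, the aligned fractions are accumulated in a wide signed register, and normalization and rounding are applied a single time to the total.

```python
import math

def mixed_fp_add_tree(addends, mantissa_bits=23, guard_bits=8):
    """Behavioral sketch of the mixed floating point addition tree:
    one exponent comparison, one fraction alignment onto a wide fixed-point
    grid, accumulation, and a single normalization + rounding at the end."""
    nonzero = [x for x in addends if x != 0.0]
    if not nonzero:
        return 0.0

    # Step 1: exponent comparison - the largest exponent is the alignment reference.
    parts = [math.frexp(x) for x in nonzero]        # x = frac * 2**exp, 0.5 <= |frac| < 1
    max_exp = max(e for _, e in parts)

    # Step 2: fraction alignment and accumulation; guard bits keep the alignment
    # shifts from discarding significant low-order bits of the small addends.
    width = mantissa_bits + guard_bits
    acc = 0                                          # wide signed fixed-point accumulator
    for frac, exp in parts:
        acc += round(math.ldexp(frac, width - (max_exp - exp)))

    # Step 3: a single normalization + rounding of the accumulated total.
    if acc == 0:
        return 0.0
    frac, exp = math.frexp(float(acc))               # rounds once, to the host float format
    return math.ldexp(frac, exp + max_exp - width)

# Example: one large addend plus many small ones that per-addition rounding would drop.
print(mixed_fp_add_tree([1.0] + [2.0 ** -20] * 1024))   # 1.0009765625
```

Because the intermediate sum never leaves the wide accumulator, only one rounding is performed for the whole tree, which is the source of the precision advantage over stacking single floating point adders.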
(III) Beneficial effects
The invention provides a mixed floating point adder tree design system. The beneficial effects are as follows:
1. The invention creatively provides a mixed floating point addition tree structure for the floating point operation circuit. By using an optimized and combined design structure, it not only saves circuit design area, but also effectively improves operation efficiency, reduces the power consumption of hardware operation and greatly improves calculation precision, so that high-precision data operation can be realized at a lower power consumption cost.
2. The invention solves the problem that the floating point operation module in current AI chip circuits struggles to balance efficiency and power consumption cost, meets the calculation requirements of deep learning algorithms with large parameter counts and high operation complexity, and can be widely applied to AI chips requiring high precision, high computing power and low power consumption.
Drawings
FIG. 1 is a schematic diagram of a system of the present invention;
FIG. 2 is a schematic diagram of an input module according to the present invention;
FIG. 3 is a schematic diagram of a hybrid floating point addition tree of the present invention.
Wherein: 1. AI operation system; 2. mixed floating point addition tree system; 3. input module; 4. in-memory computing circuit; 5. exponent comparison module; 6. fraction alignment module; 7. floating point calculation circuit; 8. normalization rounding module; 9. output module; 10. data input unit; 11. data storage unit; 12. data transmission unit.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiment:
As shown in fig. 1-3, the embodiment of the present invention provides a mixed floating point adder tree design system, which includes an AI operation system 1. A mixed floating point addition tree system 2 is disposed in the AI operation system 1 and is used for performing mixed floating point addition operations on data, improving the calculation efficiency of the floating point operation unit and solving the problems of low precision and high design cost of existing chip circuits. An input module 3, an in-memory computing circuit 4, an exponent comparison module 5, a fraction alignment module 6, a floating point calculation circuit 7, a normalization rounding module 8 and an output module 9 are disposed in the mixed floating point addition tree system 2. The in-memory computing circuit 4 is used for performing logic operations on input and/or stored data to form a first calculation result. The exponent comparison module 5 is connected with the in-memory computing circuit 4 through a data transmission technology and is used for performing exponent comparison on the first calculation result formed by the in-memory computing circuit 4; the exponent comparison result is used as the reference for fraction alignment. The fraction alignment module 6 is connected with the exponent comparison module 5 through a data transmission technology and is used for guiding the completion of multiple fraction alignments. The floating point calculation circuit 7 is connected with the fraction alignment module 6 through a data transmission technology and is used for performing a floating point operation on the first calculation result to form a second calculation result.
The input module 3 includes a data input unit 10, a data storage unit 11 and a data transmission unit 12, where the data input unit 10 is connected with the data storage unit 11 through a data transmission technology so that externally input data can be temporarily stored, the data storage unit 11 is connected with the data transmission unit 12 through a data transmission technology, and the data transmission unit 12 is connected with the in-memory computing circuit 4 through a data transmission technology and is used for transmitting input and/or stored data to the in-memory computing circuit 4, where data logic operations are performed on them.
The output module 9 is configured to output the operation result of the normalization rounding module 8.
The logical operations include one or more of AND, OR, NOT, NAND, NOR, XOR and XNOR.
Firstly, the internal structure of the single floating point adders is changed and a plurality of single floating point adders are combined into a mixed floating point addition tree, in which the plurality of single floating point adders are merged into one complete operator, reducing the design cost of the single floating point addition circuits; secondly, when data is input into the floating point operation unit, the exponents of all addends in the mixed floating point addition tree are compared, and the result of the exponent comparison is used as the reference for fraction alignment to guide the completion of the fraction alignment and accumulation operations; finally, the total operation result obtained after the fraction alignment and accumulation are completed is subjected to a unified normalization rounding, the mantissa of the result is normalized, and the final floating point operation result is output, so that calculation precision errors caused by multiple roundings can be effectively avoided.
The operation of the mixed floating point adder tree design system mainly comprises the following steps:
a. data to be operated on is input through the input module 3 and transmitted to the in-memory computing circuit 4 through a signal transmission technology;
b. the in-memory computing circuit 4 performs a logic operation on the transmitted and/or stored data to form a first calculation result, and inputs the first calculation result to the exponent comparison module 5 through a data transmission technology;
c. the exponent comparison module 5 performs exponent comparison on the first calculation result, and the exponent comparison result is used as the fraction alignment reference to guide the fraction alignment module 6 to finish the alignment of a plurality of fractions;
d. the floating point calculation circuit 7 performs a floating point operation on the first calculation result for which the exponent comparison module 5 and the fraction alignment module 6 have completed exponent comparison and fraction alignment, so as to form a second calculation result, and transmits the second calculation result to the normalization rounding module 8 through a data transmission technology;
e. the normalization rounding module 8 performs a unified normalization rounding on the second calculation result, normalizes the mantissa of the result, and finally transmits the normalized and rounded data to the output module 9 through a data transmission technology to output the final floating point operation result.
In step b, the floating point number of the first calculation result includes a 1-bit sign bit, an n-bit exponent field and an m-bit mantissa field, so as to store numbers of the form ±1.xx…×2^y.
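The description leaves n and m generic. As a point of reference, the short sketch below lists the field widths of a few commonly used formats and the range and precision they imply; these concrete formats are illustrative assumptions and are not claimed by the patent.

```python
# Field widths (n exponent bits, m mantissa bits) of some commonly used formats,
# shown only as illustrative reference points; the patent leaves n and m generic.
FORMATS = {"FP16": (5, 10), "BF16": (8, 7), "FP32": (8, 23)}

for name, (n, m) in FORMATS.items():
    bias = (1 << (n - 1)) - 1                       # IEEE-style exponent bias
    max_normal = (2 - 2.0 ** -m) * 2.0 ** ((1 << n) - 2 - bias)
    ulp_at_one = 2.0 ** -m                          # spacing of representable values near 1.0
    print(f"{name}: width = {1 + n + m} bits, bias = {bias}, "
          f"max ≈ {max_normal:.3e}, ulp(1.0) = {ulp_at_one:.2e}")
```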
The invention creatively provides a mixed floating point addition tree structure for the floating point operation circuit. By using an optimized and combined design structure, it not only saves circuit design area, but also effectively improves operation efficiency, reduces the power consumption of hardware operation and greatly improves calculation precision, so that high-precision data operation can be realized at a lower power consumption cost.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A mixed floating point adder tree design system, comprising an AI operation system (1), characterized in that: a mixed floating point addition tree system (2) is arranged inside the AI operation system (1) for performing mixed floating point addition operations on data, so that the calculation efficiency of the floating point operation unit is improved and the problems of low precision and high design cost of existing chip circuits are solved; the mixed floating point addition tree system (2) is internally provided with an input module (3), an in-memory computing circuit (4), an exponent comparison module (5), a fraction alignment module (6), a floating point calculation circuit (7), a normalization rounding module (8) and an output module (9); the in-memory computing circuit (4) is used for performing logic operations on input and/or stored data to form a first calculation result; the exponent comparison module (5) is connected with the in-memory computing circuit (4) through a data transmission technology and is used for performing exponent comparison on the first calculation result formed by the in-memory computing circuit (4), the exponent comparison result being used as the reference for fraction alignment; the fraction alignment module (6) is connected with the exponent comparison module (5) through a data transmission technology and is used for guiding the completion of a plurality of fraction alignments; the floating point calculation circuit (7) is connected with the fraction alignment module (6) through a data transmission technology and is used for performing a floating point operation on the first calculation result to form a second calculation result.
2. The mixed floating point adder tree design system of claim 1, wherein: the input module (3) comprises a data input unit (10), a data storage unit (11) and a data transmission unit (12), wherein the data input unit (10) is connected with the data storage unit (11) through a data transmission technology so that externally input data can be temporarily stored, the data storage unit (11) is connected with the data transmission unit (12) through a data transmission technology, and the data transmission unit (12) is connected with the in-memory computing circuit (4) through a data transmission technology and is used for transmitting input and/or stored data to the in-memory computing circuit (4), where data logic operations are performed on them.
3. The mixed floating point adder tree design system of claim 1, wherein: the output module (9) is used for outputting the operation result of the normalization rounding module (8).
4. The mixed floating point adder tree design system of claim 1, wherein: the logical operation includes one or more of AND, OR, NOT, NAND, NOR, XOR and XNOR.
5. The mixed floating point adder tree design system of claim 1, wherein: the operation of the mixed floating point adder tree design system mainly comprises the following steps:
a. data to be operated on is input through the input module (3) and transmitted to the in-memory computing circuit (4) through a signal transmission technology;
b. the in-memory computing circuit (4) performs a logic operation on the transmitted and/or stored data to form a first calculation result, and the first calculation result is input to the exponent comparison module (5) through a data transmission technology;
c. the exponent comparison module (5) performs exponent comparison on the first calculation result, and the exponent comparison result is used as the fraction alignment reference to guide the fraction alignment module (6) to finish the alignment of a plurality of fractions;
d. the floating point calculation circuit (7) performs a floating point operation on the first calculation result for which the exponent comparison module (5) and the fraction alignment module (6) have completed exponent comparison and fraction alignment, so as to form a second calculation result, and the second calculation result is transmitted to the normalization rounding module (8) through a data transmission technology;
e. the normalization rounding module (8) performs a unified normalization rounding on the second calculation result, normalizes the mantissa of the result, and finally transmits the normalized and rounded data to the output module (9) through a data transmission technology to output the final floating point operation result.
6. The mixed floating point adder tree design system according to claim 5, wherein: in step b, the floating point number of the first calculation result includes a 1-bit sign bit, an n-bit exponent field and an m-bit mantissa field, so as to store numbers of the form ±1.xx…×2^y.
CN202310652558.7A 2023-06-05 2023-06-05 Adder tree design system of mixed floating point Pending CN116661729A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310652558.7A CN116661729A (en) 2023-06-05 2023-06-05 Adder tree design system of mixed floating point

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310652558.7A CN116661729A (en) 2023-06-05 2023-06-05 Adder tree design system of mixed floating point

Publications (1)

Publication Number Publication Date
CN116661729A 2023-08-29

Family

ID=87727521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310652558.7A Pending CN116661729A (en) 2023-06-05 2023-06-05 Adder tree design system of mixed floating point

Country Status (1)

Country Link
CN (1) CN116661729A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117130664A (en) * 2023-10-23 2023-11-28 北京星宇天航科技有限公司 Artificial intelligent chip computing method and device for chip detection task and chip
CN117130664B (en) * 2023-10-23 2024-01-09 北京星宇天航科技有限公司 Artificial intelligent chip computing method and device for chip detection task and chip

Legal Events

Date Code Title Description
PB01 Publication