CN116661729A - Adder tree design system of mixed floating point - Google Patents

Adder tree design system of mixed floating point

Info

Publication number
CN116661729A
CN116661729A (Application CN202310652558.7A)
Authority
CN
China
Prior art keywords
floating point
module
data
data transmission
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310652558.7A
Other languages
Chinese (zh)
Inventor
杨越
燕博南
范安骏逸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Pingxin Technology Co ltd
Original Assignee
Beijing Pingxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Pingxin Technology Co ltd filed Critical Beijing Pingxin Technology Co ltd
Priority to CN202310652558.7A priority Critical patent/CN116661729A/en
Publication of CN116661729A publication Critical patent/CN116661729A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/485 Adding; Subtracting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a mixed floating point adder tree design system and relates to the technical field of AI chips. The mixed floating point adder tree design system comprises an AI operation system in which a mixed floating point addition tree system is arranged; the mixed floating point addition tree system is internally provided with an input module, an in-memory computing circuit, an exponent comparison module, a fraction alignment module, a floating point calculation circuit, a normalization rounding module and an output module, the exponent comparison module is connected with the in-memory computing circuit through a data transmission technology, and the floating point calculation circuit is connected with the fraction alignment module through a data transmission technology. Through the mixed floating point addition tree structure of the floating point operation circuit and its optimized, combined design, the circuit design area is reduced, operation efficiency is effectively improved, the power consumption of hardware operation is lowered, and calculation precision is greatly improved, so that high-precision data operation can be realized at a lower power consumption cost.

Description

Adder tree design system of mixed floating point
Technical Field
The invention relates to the technical field of AI chips, in particular to a mixed floating point adder tree design system.
Background
Artificial intelligence (AI) is a branch of computer science that researches and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. It aims to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence.
In recent years, with the wide application of artificial intelligence technology in various fields, the performance of computer hardware has also been iteratively optimized. Existing AI chips continuously pursue high computing power, low power consumption and low cost to meet the calculation requirements of deep learning algorithms in the artificial intelligence field. However, as deep learning algorithms develop rapidly, new algorithms are not well supported by fixed circuit design structures, so AI chip circuit designs must be continuously optimized. The design mode of the traditional floating point adder tree is to directly stack a plurality of single floating point adders, each of which independently completes the addition of two floating point numbers. This design has the following disadvantages: 1) each single floating point adder operation requires the four steps of exponent comparison, fraction alignment, fraction addition and normalization, so directly stacking many single floating point adders incurs a large area cost and power consumption overhead; 2) each single floating point adder rounds its result after every operation, and this repeated rounding has a great influence on the accuracy of the final operation result.
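The second disadvantage can be illustrated with a short simulation. This is not part of the patent; it is a minimal sketch assuming a toy format with a 3-bit fraction and round-to-nearest, contrasting rounding after every single addition with a single rounding of an exactly accumulated total.

```python
import math

def round_to_mantissa_bits(x: float, m: int) -> float:
    """Round x to the nearest value representable with an m-bit fraction
    (a toy stand-in for a low-precision floating point format)."""
    if x == 0.0:
        return 0.0
    frac, exp = math.frexp(x)                       # x = frac * 2**exp, 0.5 <= |frac| < 1
    scaled = round(frac * (1 << (m + 1)))           # keep 1 implicit + m fraction bits
    return math.ldexp(scaled, exp - (m + 1))

values = [1.0, 0.0625, 0.0625, 0.0625, 0.0625]      # one large addend, four small ones
M = 3                                               # deliberately tiny fraction width

# Conventional adder tree: every single adder rounds its own result.
acc = 0.0
for v in values:
    acc = round_to_mantissa_bits(acc + v, M)

# Mixed addition tree idea: accumulate without intermediate rounding, round once.
once = round_to_mantissa_bits(sum(values), M)

print("round after each add:", acc)                 # 1.0  - the small addends are lost
print("single final rounding:", once)               # 1.25 - matches the exact sum
```

With the toy format, rounding after each addition discards every small addend and the result stays at 1.0, while a single final rounding recovers the exact sum 1.25.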
Disclosure of Invention
(I) Technical problems to be solved
Aiming at the defects of the prior art, the invention provides a mixed floating point adder tree design system, which addresses the problems described in the background art above.
(II) technical scheme
In order to achieve the above purpose, the invention is realized by the following technical scheme: a mixed floating point adder tree design system includes an AI operation system, inside which a mixed floating point addition tree system is arranged for performing mixed floating point addition operations on data, improving the calculation efficiency of the floating point operation unit and solving the problems of low precision and high design cost of existing chip circuits. The mixed floating point addition tree system is internally provided with an input module, an in-memory computing circuit, an exponent comparison module, a fraction alignment module, a floating point calculation circuit, a normalization rounding module and an output module. The in-memory computing circuit is used for performing logic operations on input and/or stored data to form a first calculation result; the exponent comparison module is connected with the in-memory computing circuit through a data transmission technology and is used for performing exponent comparison on the first calculation result formed by the in-memory computing circuit, the result of the exponent comparison serving as the reference for fraction alignment; the fraction alignment module is connected with the exponent comparison module through a data transmission technology and is used for guiding the completion of a plurality of fraction alignments; the floating point calculation circuit is connected with the fraction alignment module through a data transmission technology and is used for performing a floating point operation on the first calculation result to form a second calculation result.
Preferably, the input module comprises a data input unit, a data storage unit and a data transmission unit, wherein the data input unit is connected with the data storage unit through a data transmission technology so that externally input data can be temporarily stored, the data storage unit is connected with the data transmission unit through a data transmission technology, and the data transmission unit is connected with the in-memory computing circuit through a data transmission technology and is used for transmitting the input and/or stored data to the in-memory computing circuit, where data logic operations are performed on them.
Preferably, the output module is configured to output an operation result of the normalization rounding module.
Preferably, the logical operation includes one or more of AND, OR, NOT, NAND, NOR, XOR and XNOR.
Preferably, the operation of the mixed floating point adder tree design system mainly comprises the following steps:
a. data to be operated on is input through the input module and transmitted to the in-memory computing circuit through a signal transmission technology;
b. the in-memory computing circuit performs a logic operation on the transmitted and/or stored data to form a first calculation result, and the first calculation result is input to the exponent comparison module through a data transmission technology;
c. the exponent comparison module performs exponent comparison on the first calculation result, and the exponent comparison result is used as the fraction alignment reference to guide the fraction alignment module to finish the alignment of a plurality of fractions;
d. the floating point calculation circuit performs a floating point operation on the first calculation result for which the exponent comparison module and the fraction alignment module have completed exponent comparison and fraction alignment, so as to form a second calculation result, and the second calculation result is transmitted to the normalization rounding module through a data transmission technology;
e. the normalization rounding module performs a unified normalization rounding on the second calculation result, normalizes the mantissa of the result, and finally transmits the normalized and rounded data to the output module through a data transmission technology to output the final floating point operation result.
Preferably, in step b, the floating point number of the first calculation result includes a 1-bit sign bit, an n-bit exponent field and an m-bit mantissa field, so as to store numbers of the form ±1.xx…×2^y.
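For concreteness, a minimal encoder/decoder for such a 1 + n + m bit word might look like the sketch below. The exponent bias of 2^(n-1)-1 and the omission of subnormals, infinities and NaN are assumptions borrowed from the usual IEEE-754 convention; the patent only fixes the general ±1.xx…×2^y layout.

```python
import math

def encode(value: float, n: int, m: int) -> int:
    """Pack a normal number of the form ±1.xx... * 2^y into a 1 + n + m bit word.
    Assumes an IEEE-like exponent bias of 2**(n-1) - 1; no subnormal/Inf/NaN handling."""
    sign = 1 if value < 0 else 0
    frac, exp = math.frexp(abs(value))              # value = frac * 2**exp, frac in [0.5, 1)
    scaled = round(frac * (1 << (m + 1)))           # 1 implicit bit + m fraction bits
    if scaled == (1 << (m + 1)):                    # fraction rounded up to 2.0: renormalize
        scaled >>= 1
        exp += 1
    mantissa = scaled - (1 << m)                    # drop the implicit leading 1
    biased_exp = (exp - 1) + ((1 << (n - 1)) - 1)
    return (sign << (n + m)) | (biased_exp << m) | mantissa

def decode(word: int, n: int, m: int) -> float:
    """Inverse of encode() for normal numbers."""
    sign = -1.0 if (word >> (n + m)) & 1 else 1.0
    biased_exp = (word >> m) & ((1 << n) - 1)
    mantissa = word & ((1 << m) - 1)
    exp = biased_exp - ((1 << (n - 1)) - 1)
    return sign * (1.0 + mantissa / (1 << m)) * 2.0 ** exp

w = encode(2.5, n=5, m=10)                          # FP16-like widths: 1 + 5 + 10 bits
print(hex(w), decode(w, n=5, m=10))                 # 0x4100 2.5
```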
Working principle: firstly, the invention changes the internal structure of the single floating point adders and combines a plurality of single floating point adders into a mixed floating point addition tree, in which the plurality of single floating point adders are merged into one complete operator, reducing the design cost of the single floating point addition circuits; secondly, when data is input into the floating point operation unit, the exponents of all addends in the mixed floating point addition tree are compared, and the result of the exponent comparison is used as the reference for fraction alignment to guide the completion of the fraction alignment and accumulation operations; finally, the total operation result obtained after the fraction alignment and accumulation are completed is subjected to a unified normalization rounding, the mantissa of the result is normalized, and the final floating point operation result is output.
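The working principle can be summarized by the following behavioral sketch (illustrative only; the function name, accumulator width and guard-bit count are assumptions, not part of the patent): the exponents of all addends are compared once, each fraction is shifted onto a fixed-point grid aligned to the largest exponent, the aligned fractions are accumulated in a wide signed register, and normalization and rounding are applied a single time to the total.

```python
import math

def mixed_fp_add_tree(addends, mantissa_bits=23, guard_bits=8):
    """Behavioral sketch of the mixed floating point addition tree:
    one exponent comparison, one fraction alignment onto a wide fixed-point
    grid, accumulation, and a single normalization + rounding at the end."""
    nonzero = [x for x in addends if x != 0.0]
    if not nonzero:
        return 0.0

    # Step 1: exponent comparison - the largest exponent is the alignment reference.
    parts = [math.frexp(x) for x in nonzero]        # x = frac * 2**exp, 0.5 <= |frac| < 1
    max_exp = max(e for _, e in parts)

    # Step 2: fraction alignment and accumulation; guard bits keep the alignment
    # shifts from discarding significant low-order bits of the small addends.
    width = mantissa_bits + guard_bits
    acc = 0                                          # wide signed fixed-point accumulator
    for frac, exp in parts:
        acc += round(math.ldexp(frac, width - (max_exp - exp)))

    # Step 3: a single normalization + rounding of the accumulated total.
    if acc == 0:
        return 0.0
    frac, exp = math.frexp(float(acc))               # rounds once, to the host float format
    return math.ldexp(frac, exp + max_exp - width)

# Example: one large addend plus many small ones that per-addition rounding would drop.
print(mixed_fp_add_tree([1.0] + [2.0 ** -20] * 1024))   # 1.0009765625
```

Because the intermediate sum never leaves the wide accumulator, only one rounding is performed for the whole tree, which is the source of the precision advantage over stacking single floating point adders.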
(III) Beneficial effects
The invention provides a mixed floating point adder tree design system. The beneficial effects are as follows:
1. The invention creatively provides a mixed floating point addition tree structure for the floating point operation circuit. By using an optimized and combined design structure, it not only saves circuit design area, but also effectively improves operation efficiency, reduces the power consumption of hardware operation and greatly improves calculation precision, so that high-precision data operation can be realized at a lower power consumption cost.
2. The invention solves the problem that the floating point operation module in current AI chip circuits struggles to balance efficiency and power consumption cost, meets the calculation requirements of deep learning algorithms with large parameter counts and high operation complexity, and can be widely applied to AI chips requiring high precision, high computing power and low power consumption.
Drawings
FIG. 1 is a schematic diagram of a system of the present invention;
FIG. 2 is a schematic diagram of an input module according to the present invention;
FIG. 3 is a schematic diagram of a hybrid floating point addition tree of the present invention.
Wherein: 1. AI operation system; 2. mixed floating point addition tree system; 3. input module; 4. in-memory computing circuit; 5. exponent comparison module; 6. fraction alignment module; 7. floating point calculation circuit; 8. normalization rounding module; 9. output module; 10. data input unit; 11. data storage unit; 12. data transmission unit.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiment:
As shown in fig. 1-3, the embodiment of the present invention provides a mixed floating point adder tree design system, which includes an AI operation system 1. A mixed floating point addition tree system 2 is disposed in the AI operation system 1 and is used for performing mixed floating point addition operations on data, improving the calculation efficiency of the floating point operation unit and solving the problems of low precision and high design cost of existing chip circuits. An input module 3, an in-memory computing circuit 4, an exponent comparison module 5, a fraction alignment module 6, a floating point calculation circuit 7, a normalization rounding module 8 and an output module 9 are disposed in the mixed floating point addition tree system 2. The in-memory computing circuit 4 is used for performing logic operations on input and/or stored data to form a first calculation result. The exponent comparison module 5 is connected with the in-memory computing circuit 4 through a data transmission technology and is used for performing exponent comparison on the first calculation result formed by the in-memory computing circuit 4; the exponent comparison result is used as the reference for fraction alignment. The fraction alignment module 6 is connected with the exponent comparison module 5 through a data transmission technology and is used for guiding the completion of multiple fraction alignments. The floating point calculation circuit 7 is connected with the fraction alignment module 6 through a data transmission technology and is used for performing a floating point operation on the first calculation result to form a second calculation result.
The input module 3 includes a data input unit 10, a data storage unit 11 and a data transmission unit 12, where the data input unit 10 is connected with the data storage unit 11 through a data transmission technology so that externally input data can be temporarily stored, the data storage unit 11 is connected with the data transmission unit 12 through a data transmission technology, and the data transmission unit 12 is connected with the in-memory computing circuit 4 through a data transmission technology and is used for transmitting input and/or stored data to the in-memory computing circuit 4, where data logic operations are performed on them.
The output module 9 is configured to output the operation result of the normalization rounding module 8.
The logical operations include one or more of AND, OR, NOT, NAND, NOR, XOR and XNOR.
Firstly, the internal structure of the single floating point adders is changed and a plurality of single floating point adders are combined into a mixed floating point addition tree, in which the plurality of single floating point adders are merged into one complete operator, reducing the design cost of the single floating point addition circuits; secondly, when data is input into the floating point operation unit, the exponents of all addends in the mixed floating point addition tree are compared, and the result of the exponent comparison is used as the reference for fraction alignment to guide the completion of the fraction alignment and accumulation operations; finally, the total operation result obtained after the fraction alignment and accumulation are completed is subjected to a unified normalization rounding, the mantissa of the result is normalized, and the final floating point operation result is output, so that calculation precision errors caused by multiple roundings can be effectively avoided.
The operation of the mixed floating point adder tree design system mainly comprises the following steps:
a. data to be operated on is input through the input module 3 and transmitted to the in-memory computing circuit 4 through a signal transmission technology;
b. the in-memory computing circuit 4 performs a logic operation on the transmitted and/or stored data to form a first calculation result, and inputs the first calculation result to the exponent comparison module 5 through a data transmission technology;
c. the exponent comparison module 5 performs exponent comparison on the first calculation result, and the exponent comparison result is used as the fraction alignment reference to guide the fraction alignment module 6 to finish the alignment of a plurality of fractions;
d. the floating point calculation circuit 7 performs a floating point operation on the first calculation result for which the exponent comparison module 5 and the fraction alignment module 6 have completed exponent comparison and fraction alignment, so as to form a second calculation result, and transmits the second calculation result to the normalization rounding module 8 through a data transmission technology;
e. the normalization rounding module 8 performs a unified normalization rounding on the second calculation result, normalizes the mantissa of the result, and finally transmits the normalized and rounded data to the output module 9 through a data transmission technology to output the final floating point operation result.
In step b, the floating point number of the first calculation result includes a 1-bit sign bit, an n-bit exponent field and an m-bit mantissa field, so as to store numbers of the form ±1.xx…×2^y.
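The description leaves n and m generic. As a point of reference, the short sketch below lists the field widths of a few commonly used formats and the range and precision they imply; these concrete formats are illustrative assumptions and are not claimed by the patent.

```python
# Field widths (n exponent bits, m mantissa bits) of some commonly used formats,
# shown only as illustrative reference points; the patent leaves n and m generic.
FORMATS = {"FP16": (5, 10), "BF16": (8, 7), "FP32": (8, 23)}

for name, (n, m) in FORMATS.items():
    bias = (1 << (n - 1)) - 1                       # IEEE-style exponent bias
    max_normal = (2 - 2.0 ** -m) * 2.0 ** ((1 << n) - 2 - bias)
    ulp_at_one = 2.0 ** -m                          # spacing of representable values near 1.0
    print(f"{name}: width = {1 + n + m} bits, bias = {bias}, "
          f"max ≈ {max_normal:.3e}, ulp(1.0) = {ulp_at_one:.2e}")
```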
The invention creatively provides a mixed floating point addition tree structure for the floating point operation circuit. By using an optimized and combined design structure, it not only saves circuit design area, but also effectively improves operation efficiency, reduces the power consumption of hardware operation and greatly improves calculation precision, so that high-precision data operation can be realized at a lower power consumption cost.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A mixed floating point adder tree design system, comprising an AI operation system (1), characterized in that: a mixed floating point addition tree system (2) is arranged inside the AI operation system (1) for performing mixed floating point addition operations on data, so that the calculation efficiency of the floating point operation unit is improved and the problems of low precision and high design cost of existing chip circuits are solved; the mixed floating point addition tree system (2) is internally provided with an input module (3), an in-memory computing circuit (4), an exponent comparison module (5), a fraction alignment module (6), a floating point calculation circuit (7), a normalization rounding module (8) and an output module (9); the in-memory computing circuit (4) is used for performing logic operations on input and/or stored data to form a first calculation result; the exponent comparison module (5) is connected with the in-memory computing circuit (4) through a data transmission technology and is used for performing exponent comparison on the first calculation result formed by the in-memory computing circuit (4), the exponent comparison result being used as the reference for fraction alignment; the fraction alignment module (6) is connected with the exponent comparison module (5) through a data transmission technology and is used for guiding the completion of a plurality of fraction alignments; the floating point calculation circuit (7) is connected with the fraction alignment module (6) through a data transmission technology and is used for performing a floating point operation on the first calculation result to form a second calculation result.
2. The mixed floating point adder tree design system of claim 1, wherein: the input module (3) comprises a data input unit (10), a data storage unit (11) and a data transmission unit (12), wherein the data input unit (10) is connected with the data storage unit (11) through a data transmission technology so that externally input data can be temporarily stored, the data storage unit (11) is connected with the data transmission unit (12) through a data transmission technology, and the data transmission unit (12) is connected with the in-memory computing circuit (4) through a data transmission technology and is used for transmitting input and/or stored data to the in-memory computing circuit (4), where data logic operations are performed on them.
3. The mixed floating point adder tree design system of claim 1, wherein: the output module (9) is used for outputting the operation result of the normalization rounding module (8).
4. The mixed floating point adder tree design system of claim 1, wherein: the logical operation includes one or more of AND, OR, NOT, NAND, NOR, XOR and XNOR.
5. The mixed floating point adder tree design system of claim 1, wherein: the operation of the mixed floating point adder tree design system mainly comprises the following steps:
a. data to be operated on is input through the input module (3) and transmitted to the in-memory computing circuit (4) through a signal transmission technology;
b. the in-memory computing circuit (4) performs a logic operation on the transmitted and/or stored data to form a first calculation result, and the first calculation result is input to the exponent comparison module (5) through a data transmission technology;
c. the exponent comparison module (5) performs exponent comparison on the first calculation result, and the exponent comparison result is used as the fraction alignment reference to guide the fraction alignment module (6) to finish the alignment of a plurality of fractions;
d. the floating point calculation circuit (7) performs a floating point operation on the first calculation result for which the exponent comparison module (5) and the fraction alignment module (6) have completed exponent comparison and fraction alignment, so as to form a second calculation result, and the second calculation result is transmitted to the normalization rounding module (8) through a data transmission technology;
e. the normalization rounding module (8) performs a unified normalization rounding on the second calculation result, normalizes the mantissa of the result, and finally transmits the normalized and rounded data to the output module (9) through a data transmission technology to output the final floating point operation result.
6. The mixed floating point adder tree design system according to claim 5, wherein: in step b, the floating point number of the first calculation result includes a 1-bit sign bit, an n-bit exponent field and an m-bit mantissa field, so as to store numbers of the form ±1.xx…×2^y.
CN202310652558.7A 2023-06-05 2023-06-05 Adder tree design system of mixed floating point Pending CN116661729A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310652558.7A CN116661729A (en) 2023-06-05 2023-06-05 Adder tree design system of mixed floating point

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310652558.7A CN116661729A (en) 2023-06-05 2023-06-05 Adder tree design system of mixed floating point

Publications (1)

Publication Number Publication Date
CN116661729A 2023-08-29

Family

ID=87727521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310652558.7A Pending CN116661729A (en) 2023-06-05 2023-06-05 Adder tree design system of mixed floating point

Country Status (1)

Country Link
CN (1) CN116661729A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117130664A (en) * 2023-10-23 2023-11-28 北京星宇天航科技有限公司 Artificial intelligent chip computing method and device for chip detection task and chip
CN117130664B (en) * 2023-10-23 2024-01-09 北京星宇天航科技有限公司 Artificial intelligent chip computing method and device for chip detection task and chip

Legal Events

Date Code Title Description
PB01 Publication