CN117251134A - Data processing method and device of neural network, storage medium and electronic equipment - Google Patents
- Publication number
- CN117251134A (application CN202210642072.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- processed
- order
- accumulated
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements using non-contact-making devices for evaluating functions by calculation
- G06F7/5443—Sum of products
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06F7/4836—Computations with rational numbers
- G06F7/485—Adding; Subtracting
- G06F7/487—Multiplying; Dividing
- G06F7/4876—Multiplying
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation using electronic means
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The disclosure provides a data processing method and apparatus for a neural network, a storage medium, and an electronic device, and relates to the field of computer technology. The method comprises the following steps: acquiring multiple groups of to-be-processed data input to a current operation unit in a neural network, where each to-be-processed datum comprises an order and a mantissa; extracting a common order factor from the groups of to-be-processed data, and multiplying the mantissas in each group by the residual order remaining after the common order factor is extracted, to obtain the to-be-accumulated data corresponding to each group; and accumulating the to-be-accumulated data of all groups and multiplying the accumulated result by the common order factor to obtain the output data of the current operation unit. The method improves the data processing efficiency of the neural network and reduces hardware power consumption.
Description
Technical Field
The disclosure relates to the field of computer technology, and in particular to a data processing method of a neural network, a data processing apparatus of a neural network, a computer-readable storage medium, and an electronic device.
Background
In neural-network-based algorithms, data can be divided into floating-point data and fixed-point data, and the core computational unit is the multiply-accumulate operation. In the related art, however, floating-point and fixed-point data must be processed by different neural-network hardware, which reduces the generality of the data processing method; moreover, during accumulation, the wide-bit-width data to be accumulated are added directly, which slows down the operation, increases hardware power consumption, and reduces the data processing efficiency of the neural network.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The disclosure provides a data processing method of a neural network, a data processing apparatus of a neural network, a computer-readable storage medium, and an electronic device, so as to alleviate, at least to some extent, the problem of low data processing efficiency in neural networks.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a data processing method of a neural network, including: acquiring multiple groups of to-be-processed data input to a current operation unit in the neural network, where the to-be-processed data comprise an order and a mantissa; extracting a common order factor from the multiple groups of to-be-processed data, and multiplying the mantissas in each group by the residual order remaining after the common order factor is extracted, to obtain the to-be-accumulated data corresponding to each group; and accumulating the to-be-accumulated data corresponding to each group, and multiplying the accumulated result by the common order factor to obtain the output data of the current operation unit.
According to a second aspect of the present disclosure, there is provided a data processing apparatus of a neural network, comprising: a to-be-processed data acquisition module, configured to acquire multiple groups of to-be-processed data input to a current operation unit in a neural network, where the to-be-processed data comprise an order and a mantissa; a common-order-factor extraction module, configured to extract a common order factor from the groups of to-be-processed data, and to multiply the mantissas in each group by the residual order remaining after the common order factor is extracted, to obtain the to-be-accumulated data corresponding to each group; and an output data acquisition module, configured to accumulate the to-be-accumulated data corresponding to each group and to multiply the accumulated result by the common order factor to obtain the output data of the current operation unit.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method of the neural network of the first aspect described above and possible implementations thereof.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and the memory is used for storing executable instructions of the processor. Wherein the processor is configured to perform the data processing method of the neural network of the first aspect described above and possible implementations thereof via execution of the executable instructions.
The technical scheme of the present disclosure has the following beneficial effects:
In the data processing process of the neural network, multiple groups of input to-be-processed data are obtained; a common order factor of the to-be-processed data is extracted; the mantissas in each group are multiplied by the residual order remaining after the common order factor is extracted, yielding the to-be-accumulated data; the to-be-accumulated data are accumulated; and the accumulated result is multiplied by the common order factor to obtain the output data of the current operation unit. On the one hand, because a common order factor is extracted from each group of to-be-processed data, the method can handle both floating-point and fixed-point operations on top of existing fixed-point multipliers and adders, which improves the generality of the data processing method of the neural network. On the other hand, multiplying the mantissas by the residual orders before accumulating reduces the bit width of both the to-be-accumulated data and the accumulation operation, which increases the operation speed, improves the data processing efficiency of the neural network, and reduces hardware power consumption.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 illustrates a system architecture of an operating environment of the present exemplary embodiment;
fig. 2 is a flowchart showing a data processing method of a neural network in the present exemplary embodiment;
fig. 3 is a schematic diagram showing multiply-accumulate operations performed on three sets of data to be processed in the present exemplary embodiment;
fig. 4 is a diagram showing a procedure of acquiring a step code of a common order factor in the present exemplary embodiment;
fig. 5 is a schematic diagram showing a process of accumulating data to be accumulated after extracting a common order factor in the present exemplary embodiment;
fig. 6 is a flowchart showing a procedure of an accumulation operation in a data processing method of a neural network in the present exemplary embodiment;
FIG. 7 shows a flow chart of another method of data processing for a neural network in the present exemplary implementation;
fig. 8 is a schematic diagram showing a data processing procedure of a convolutional neural network in the case where data to be processed is image data in the present exemplary implementation;
fig. 9 is a schematic diagram showing an operation procedure of an accumulation operation in data processing of a convolutional neural network when data to be processed is image data in the present exemplary implementation;
fig. 10 is a schematic diagram showing the structure of a data processing apparatus of a neural network in the present exemplary embodiment;
fig. 11 shows a schematic diagram of an electronic device in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. However, those skilled in the art will recognize that the aspects of the present disclosure may be practiced with one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
In a computer system, data may be represented in either fixed-point or floating-point form. In fixed-point representation, the position of the decimal point is fixed and the value range of the data is limited; fixed-point data may represent a decimal fraction or an integer. For example, currency values with four digits of precision, two of them after the decimal point, may be represented in fixed point as 88.00 or 00.88.
Because the fixed position of the decimal point fixes the number of integer and fractional digits, fixed-point representation is too rigid: it cannot conveniently express very large and very small numbers at the same time, and representing two values of very different magnitudes in fixed point requires a long machine word, so the utilization of the data storage unit is low. In practice, therefore, most modern computers represent data in floating point, which expresses real numbers in scientific notation. A floating-point datum comprises an order (exponent), a mantissa, a radix, and a sign bit. For example, the decimal fixed-point value 123.45 may be represented in floating point as (-1)^0 × 1.2345 × 10^2, where (-1)^0 is the sign bit, 1.2345 is the mantissa, 10 is the radix, and 2 is the step code; through the exponent in 10^2, the decimal point effectively "floats", which controls the numerical range of the data. Floating-point data can represent a larger range with higher computational precision, but at a greater hardware implementation cost.
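The order/mantissa decomposition described above can be observed for radix-2 floating point with Python's standard `math.frexp` and `math.ldexp` functions (a hedged illustration of the representation discussed here, not the patent's data format):

```python
import math

# Decompose 123.45 into mantissa m and binary step code e so that
# 123.45 == m * 2**e, with 0.5 <= |m| < 1 (frexp's normalization).
m, e = math.frexp(123.45)
print(m, e)  # mantissa around 0.9644..., step code 7

# Reconstruct the value by shifting the mantissa back by e binary places.
assert math.ldexp(m, e) == 123.45
```

The round trip is exact because multiplying and dividing by powers of the radix only changes the exponent, not the stored mantissa bits.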
In the related art, a typical artificial-intelligence computing engine applied to neural networks comprises a convolution computation engine, a vector operation engine, a storage unit, an overall scheduling unit, and a partial-sum accumulation unit. The core operation in both the convolution computation engine and the vector operation engine is multiply-accumulate.
Taking the multiply-accumulate operation on floating-point data A, B, C, D, E, F as an example, any floating-point datum can be expressed in the following form:

A = 2^ea × ma (1)

where ea is the step code (exponent), 2 is the radix, and ma is the mantissa. The multiply-accumulate operation unit then proceeds as follows.
before the data is calculated, all input operands in multiply-accumulate operation are converted into a floating point representation format, and only the order and mantissa of each data need to be transmitted in the data transmission process. In addition, the multiplication results are added up, which corresponds to the result of first calculating a×b+c×d and then adding e×f.
As can be seen from equation (2), each product comprises both an order part and a mantissa part. For A×B, the order part is 2^(ea+eb) and the mantissa part is (ma×mb). Here ma×mb is a fixed-point multiplication that can be implemented in hardware with a multiplier, and multiplying by 2^(ea+eb) corresponds to shifting the result of ma×mb, which can be implemented with a shifter. However, the result A×B = 2^(ea+eb) × (ma×mb) can have a very large bit width. For example, for the floating-point number A = 2^ea × ma, suppose ea and ma together occupy 8 bits, with ea occupying 4 bits and therefore representing values from 0 to 15; when the multiplication A×B is performed, ea+eb ranges from 0 to 30 and thus requires 5 bits.
In the above process, adding two data widens the result by 1 bit. In equation (2), besides the addition of the orders, the data themselves must also be accumulated, so the bit width of the data grows with each addition, and the required input bit width of the adder grows with the number of accumulations. The increasing operand width slows down the operation, increases hardware power consumption, and reduces data processing efficiency.
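The bit-width growth described above can be checked numerically. The sketch below is a hedged illustration (plain Python integers, not the patent's hardware) using `int.bit_length` to show that summing two 4-bit step codes needs 5 bits, and that accumulating several full-width mantissa products needs extra carry bits:

```python
# Step codes are 4-bit, so the summed order ea + eb needs one extra bit.
ea_max = eb_max = 2**4 - 1                 # largest 4-bit step code: 15
print(ea_max + eb_max)                     # 30
print((ea_max + eb_max).bit_length())      # 5 bits needed for the summed order

# Accumulating k products of two n-bit mantissas needs up to 2n + ceil(log2 k) bits.
n = 8
products = [(2**n - 1) * (2**n - 1)] * 4   # four worst-case 8-bit x 8-bit products
acc = sum(products)
print(acc.bit_length())                    # 18 = 2*8 + 2 extra carry bits
```

This is exactly the adder-width pressure the common-order-factor scheme aims to relieve: the narrower the values fed to the accumulator, the narrower (and cheaper) the adder.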
In view of the foregoing, exemplary embodiments of the present disclosure provide a data processing method of a neural network. The system architecture of the operating environment of the present exemplary embodiment is described below in conjunction with fig. 1.
Referring to fig. 1, a system architecture 100 may include a terminal device 110 and a server 120. The terminal device 110 may be an electronic device such as a smart phone, a tablet computer, a notebook computer, etc. The server 120 generally refers to a background system that provides a data processing-related service of the neural network in the present exemplary embodiment, and may be, for example, a server that implements data processing of the neural network. Server 120 may be a server or a cluster of servers, which is not limited by this disclosure. The terminal device 110 and the server 120 may form a connection through a wired or wireless communication link for data interaction.
In one embodiment, the neural network may be deployed at the terminal device 110, and the data processing method of the neural network in the present exemplary embodiment is performed by the terminal device 110.
In one embodiment, the neural network may be disposed in the server 120, and after the terminal device 110 transmits the data to be processed to the server 120, the server 120 performs the data processing method of the neural network in the present exemplary embodiment.
The data processing method of the neural network is described below with reference to fig. 2. Fig. 2 shows an exemplary flow of a data processing method of a neural network, including the following steps S210 to S230:
step S210, obtaining a plurality of groups of data to be processed, which are input into a current operation unit in a neural network, wherein the data to be processed comprise an order and a mantissa;
step S220, extracting common order factors of a plurality of groups of data to be processed, and multiplying mantissas in each group of data to be processed by the residual orders after the common order factors are extracted to obtain data to be accumulated corresponding to each group of data to be processed;
step S230, accumulating the data to be accumulated corresponding to each group of data to be processed, and multiplying the accumulated result by a common order factor to obtain the output data of the current operation unit.
Based on this method, on the one hand, because a common order factor is extracted from each group of to-be-processed data, the method can handle both floating-point and fixed-point operations on top of existing fixed-point multipliers and adders, which improves the generality of the data processing method of the neural network. On the other hand, multiplying the mantissas in each group by the residual order left after extracting the common order factor yields the to-be-accumulated data, which are then accumulated; this reduces the bit width of both the to-be-accumulated data and the accumulation operation, increases the operation speed, improves the data processing efficiency of the neural network, and reduces hardware power consumption.
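The three steps S210–S230 can be sketched as a minimal software model. This is an illustrative Python sketch under assumed conventions, not the patent's hardware: the function name `common_factor_mac`, the `(step_code, mantissa)` tuple format, and the choice of the minimum total step code as the common order factor are all assumptions for demonstration.

```python
def common_factor_mac(groups):
    """Multiply-accumulate over groups of (step_code, mantissa) operands.

    Each group holds two operands A = 2**ea * ma and B = 2**eb * mb; the
    product's total step code is ea + eb and its total mantissa ma * mb.
    A common order factor 2**e_min is extracted so that the accumulation
    runs on narrow values (only small residual shifts remain).
    """
    # S210/S220: total step code and total mantissa of each group's product.
    totals = [(ea + eb, ma * mb) for (ea, ma), (eb, mb) in groups]
    # Common order factor: smallest total step code (illustrative choice,
    # keeping every residual order non-negative, i.e. a pure left shift).
    e_min = min(e for e, _ in totals)
    # Shift each mantissa product by its residual order, then accumulate.
    acc = sum(m << (e - e_min) for e, m in totals)
    # S230: multiply the accumulated result back by the common order factor.
    return acc << e_min  # equals acc * 2**e_min


# A*B + C*D + E*F with A = 2**2*3, B = 2**1*5, C = 2**0*7, D = 2**3*2,
# E = 2**1*4, F = 2**1*6  ->  120 + 112 + 96 = 328.
groups = [((2, 3), (1, 5)), ((0, 7), (3, 2)), ((1, 4), (1, 6))]
print(common_factor_mac(groups))  # 328
```

Note that the additions inside `sum` operate on values shifted only by small residual orders; the single wide shift by `e_min` happens once, after the accumulation.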
Each step in fig. 2 is specifically described below.
Referring to fig. 2, in step S210, a plurality of sets of data to be processed, which includes an order and a mantissa, input to a current operation unit in a neural network are acquired.
A neural network is an algorithmic mathematical model that imitates the behavioral characteristics of biological neural networks to perform distributed, parallel information processing; it processes information by adjusting the interconnections among a large number of internal nodes according to the complexity of the system. Generally, "biological neural network" refers mainly to the neural network of the human brain. For example, the neural network may be a convolutional neural network: a class of feedforward neural networks with a deep structure that perform convolution computations and are capable of feature learning.
The to-be-processed data are the data input to the current operation unit of the neural network for computation; they may be of any type, such as image, text, or audio data, or intermediate data produced by earlier layers of the network. Each to-be-processed datum comprises an order and a mantissa; at the hardware level, the datum is represented by shifting the mantissa by the number of bits indicated by the order. For example, the to-be-processed datum A may be represented as A = 2^ea × ma, where 2^ea is the order of A and ma is its mantissa; in hardware, A is obtained by shifting ma left by ea bits. The present disclosure does not limit the specific content of the to-be-processed data.
Multiple groups of to-be-processed data are input to the current operation unit, which performs the multiply-accumulate operation. A multiply-accumulate operation unit comprises multipliers and accumulators: the multipliers compute the products of the data, and the accumulators add up those products. During accumulation, besides the result of each multiplier in the current module, the output data of the preceding module must also be added to the input of the accumulation.
In one embodiment, each of the multiple groups of to-be-processed data is used to perform a multiplication, and the results of different groups are accumulated. For example, as shown in fig. 3, A, B, C, D, E, and F are six to-be-processed data forming three groups: A and B, C and D, and E and F. Each group is multiplied (A×B, C×D, E×F); the products of the different groups are then added to obtain A×B + C×D + E×F, and the output data of the previous-stage operation unit is added to obtain the output data of the current operation unit.
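The grouping in fig. 3 corresponds to a plain multiply-accumulate with a carried-in partial sum. The tiny sketch below is a hedged illustration; the function name `mac_stage` and the operand values are hypothetical:

```python
def mac_stage(pairs, prev_output=0):
    """Sum of per-group products plus the previous stage's output data."""
    return prev_output + sum(x * y for x, y in pairs)


# Three groups (A,B), (C,D), (E,F) as in fig. 3, plus a previous partial sum.
print(mac_stage([(12, 10), (7, 16), (8, 12)], prev_output=100))  # 428
```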
In one embodiment, the data to be processed may include output data of a previous stage operation unit of the current operation unit.
The previous-stage operation unit is an operation unit that has completed its multiply-accumulate operation and outputs the result, i.e. the output data of the previous-stage unit, which may likewise be expressed in the form of an order multiplied by a mantissa.
In step S220, a common order factor of the multiple groups of to-be-processed data is extracted, and the mantissas in each group are multiplied by the residual order remaining after the common order factor is extracted, to obtain the to-be-accumulated data corresponding to each group.
Each group of to-be-processed data may contain two to-be-processed data; each group is used to perform a multiplication, and the results of different groups are accumulated. In one embodiment, a group may also consist of the output data of the previous-stage operation unit of the current operation unit; the present disclosure does not limit the specific content of each group.
In the present exemplary embodiment, each group of data to be processed may be regarded as a whole and expressed in the form of a total order multiplied by a total mantissa. For example, for A×B, where A = 2^ea × ma and B = 2^eb × mb, A×B may be expressed as A×B = 2^ea × ma × 2^eb × mb = 2^(ea+eb) × (ma×mb); the total order of this group of data to be processed is 2^(ea+eb) and the total mantissa is (ma×mb).
The total mantissa may be a product of different mantissas in each set of data to be processed. It should be noted that, if a certain set of data to be processed includes only one data, the order of the data may be the total order of the set of data to be processed, and the mantissa of the data may be the total mantissa of the set of data to be processed.
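The order/mantissa split above can be illustrated with Python's `math.frexp`, which returns m and e with x = m × 2^e. This is only an illustrative decomposition, not the patent's fixed-point bit layout, and the helper name is an assumption.

```python
import math

def total_order_and_mantissa(a, b):
    """Split A and B into order and mantissa, then form the group's
    total order exponent (ea+eb) and total mantissa (ma×mb)."""
    ma, ea = math.frexp(a)   # a == ma * 2**ea, 0.5 <= |ma| < 1
    mb, eb = math.frexp(b)
    return ea + eb, ma * mb

e, m = total_order_and_mantissa(6.0, 10.0)
# 2**(ea+eb) × (ma×mb) reproduces the product A×B exactly here.
assert math.ldexp(m, e) == 60.0
```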
The common order factor may be a common factor of the total orders of the groups of data to be processed. For example, given two groups of data to be processed, A×B and C×D, where A×B = 2^(ea+eb) × (ma×mb) and C×D = 2^(ec+ed) × (mc×md), the common order factor may be the total order of one of the groups, 2^(ea+eb) or 2^(ec+ed). In one embodiment, the common order factor may also be some preset order value, which may be an order value less than the minimum of the total orders; the common order factor may also be the order of the output data of the previous stage operation unit; the present disclosure does not particularly limit the manner of obtaining the common order factor and the total order.
In an embodiment, the extracting the common order factor of the plurality of sets of data to be processed may include:
the total order of each group of data to be processed is determined, and the common order factor is obtained by comparing the total orders.
For example, there are two groups of data to be processed: the first group A and B, and the second group C and D. The total order of the first group may be determined to be 2^(ea+eb), and the total order of the second group to be 2^(ec+ed). For the total orders 2^(ea+eb) and 2^(ec+ed), since their bases are the same, the comparison of the total orders may be achieved by comparing their exponent codes ea+eb and ec+ed to obtain the common order factor.
In one embodiment, the comparing the total orders to obtain the common order factor may include the steps of:
the common order factor is determined from the minimum of the total orders.
In one embodiment, if the current operation unit has no previous stage operation unit, the data to be processed only includes data processed by the current operation unit, and the total orders may include the total order of each group of data to be processed. If the current operation unit has a previous stage operation unit, the data to be processed may include the output data of that previous stage operation unit; when determining the common order factor, the total orders may then include the order 2^emin′ of that output data, i.e., the common order factor of the previous stage operation unit, together with the total order of every other group of data to be processed, and the common order factor 2^emin may be determined according to the minimum of the total orders. Furthermore, when there is only one group of data to be processed, the total order of that group may be taken directly as the common order factor.
For example, a common order factor is determined for two groups of data to be processed: a first group A and B, and a second group C and D. The total order of the first group is 2^(ea+eb) and that of the second group is 2^(ec+ed). Comparing 2^(ea+eb) and 2^(ec+ed), since the bases of the total orders are both 2, the exponent codes ea+eb and ec+ed are compared and the minimum emin among them is selected as the exponent code of the common order factor, giving the common order factor 2^emin.
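The selection of emin can be sketched as follows. The function and parameter names are illustrative assumptions; the candidates are the groups' total-order exponent codes plus, when a previous stage exists, its common-factor exponent emin′.

```python
def common_order_exponent(group_exponents, prev_emin=None):
    """Return emin, the exponent code of the common order factor:
    the minimum over the groups' total exponents (ea+eb, ec+ed, ...)
    and, if present, the previous stage's exponent emin'."""
    candidates = list(group_exponents)
    if prev_emin is not None:
        candidates.append(prev_emin)
    return min(candidates)

assert common_order_exponent([7, 9, 4]) == 4
assert common_order_exponent([7, 9, 4], prev_emin=2) == 2
```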
Determining the common order factor from the minimum of the total orders reduces the bit width of the data to be accumulated to the greatest extent: since the minimum of the total orders is the greatest common factor of the total orders of the groups, extracting it as the common order factor yields remaining orders of minimal bit width.
In one embodiment, the remaining order may comprise a quotient of the total order of each set of data to be processed divided by a common order factor.
For example, if the current operation unit is the first stage operation unit, the data to be processed does not include the output result of a previous stage operation unit, and the common order factor 2^emin may be extracted from multiple groups of data to be processed such as A, B, C, D, E, and F according to formula (3). Formula (3) may include the common order factor and the data to be accumulated, where the data to be accumulated may include the operation result obtained by accumulating the products of the mantissa and the remaining order corresponding to each group of data to be processed.
For example, 2^((ea+eb)-emin) × K1, where K1 = (ma×mb); the remaining order 2^((ea+eb)-emin) may include the quotient obtained by dividing the total order 2^(ea+eb) of the group of data to be processed by the common order factor 2^emin.
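With fixed-point (integer) mantissas, multiplying by the remaining order 2^((ea+eb)-emin) is just a left shift, since the exponent difference is non-negative when emin is the minimum. A minimal sketch, with assumed names:

```python
def data_to_accumulate(mantissa_product, total_exp, emin):
    """Scale the group's mantissa product K = ma*mb by the remaining
    order 2**(total_exp - emin); for integer mantissas this is a shift."""
    return mantissa_product << (total_exp - emin)

# K1 = 5 with total exponent ea+eb = 7 and emin = 4: shift left by 3.
assert data_to_accumulate(5, 7, 4) == 40
# The group whose total exponent equals emin is left unshifted.
assert data_to_accumulate(3, 4, 4) == 3
```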
In one embodiment, the current operation unit may include a multiplier and an adder, and the mantissa in each set of the data to be processed may be multiplied by the multiplier with the remaining order after the common order factor is extracted, for example, the remaining order and the mantissa of each set of the data to be processed in the formula (3) may be multiplied by the multiplier.
Wherein emin is the minimum of ea+eb, ec+ed, ee+ef, …; 2^emin may be extracted from the product of each group of data to be processed as the common order factor, with K1 = (ma×mb), K2 = (mc×md), K3 = (me×mf), …
In one embodiment, the current operation unit may include an order processing unit and a mantissa processing unit, and the order and mantissa of each data to be processed may be respectively sent to the order processing unit and the mantissa processing unit for processing.
For example, there are a plurality of data to be processed A, B, C, D, E, F and the output data Result_prev of the previous stage operation unit of the current operation unit; the data to be processed may be calculated according to formula (4). Since each datum to be processed may include an order and a mantissa, the order and the mantissa of each datum to be processed may be sent to the order processing unit and the mantissa processing unit, respectively, for processing.
Here emin may be the minimum of ea+eb, ec+ed, ee+ef and emin′, and may be used as the exponent code of the common order factor extracted from the product of each group of data to be processed, with K1 = (ma×mb), K2 = (mc×md), K3 = (me×mf); R′ may be the mantissa of the output data of the previous stage operation unit of the current operation unit.
In one embodiment, as shown in fig. 4, the order processing unit receives the exponent code emin′ of the common order factor of the previous stage of the current operation unit, and sends the orders of the data to be processed to adders to obtain the exponent codes of the total orders of the other groups of data to be processed; the exponent codes of the total orders of all groups may include ea+eb, ec+ed, ee+ef and emin′. The exponent codes of all total orders are input to the MIN module to obtain their minimum, so that the exponent code emin of the common order factor of the current operation unit may be obtained from the minimum of the exponent codes of the total orders.
In one embodiment, if a certain set of data to be processed is output data of a previous stage operation unit, the total order of the set of data to be processed may be a common order factor of the previous stage operation unit.
In one embodiment, if the data to be processed includes the output data of the previous stage operation unit, the common order factor may be obtained by comparing the total order of each group of data to be processed with the common order factor of the previous stage operation unit.
In one embodiment, in the mantissa processing unit, as shown in fig. 5, the mantissas of the data to be processed may be input to multipliers to obtain the mantissas K1, K2 and K3 corresponding to each group of data to be processed, while the mantissa R′ of the previous stage operation unit of the current operation unit is received; ea+eb, ec+ed, ee+ef, emin′ and emin are input to subtracters to calculate the values of ea+eb-emin, ec+ed-emin, ee+ef-emin and emin′-emin; then K1, K2, K3 and R′ may be shifted according to these values, and the shifted results added to obtain the mantissa R of the output result of the current operation unit. Finally, the exponent code emin of the common order factor and the mantissa R in the output result of the current operation unit may be output to the next operation unit of the current operation unit to continue processing the data.
In one embodiment, the present disclosure is based on a fixed-point multiplier and a fixed-point adder, and the common order factor is extracted from each group of data to be processed, so that the scheme can process both floating-point and fixed-point operations, improving the universality of the data processing method of the neural network. In addition, the remaining order of each group of data to be processed after the common order factor is extracted is multiplied by the mantissa to obtain the data to be accumulated, which reduces the bit width of the addition operation. For example, for data to be processed A and B with a bit width of 16 bits, the exponent codes ea and eb of the orders may use 5 bits and the mantissas ma and mb may use 11 bits; the data range that ea and eb can each represent is 0-31. If ea and eb are added directly, the resulting range of ea+eb may be 0-62, so the required bit width becomes 6 bits, and the required bit width grows larger and larger as additions accumulate. By extracting the common order factor in the process of calculating A×B and subtracting the exponent code emin of the common order factor when performing the addition of the orders, the data range of ea+eb-emin remains 0-31, and the bit width of the result of the addition remains 5 bits.
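The bit-width argument can be checked directly on the worked numbers from the text (5-bit exponent codes, range 0-31); the specific values below are illustrative, following the example above.

```python
# Worst case from the text: ea = eb = 31 with 5-bit exponent codes.
ea, eb, emin = 31, 31, 31

# Adding the codes directly overflows 5 bits (62 needs 6 bits) ...
assert (ea + eb).bit_length() == 6
# ... but subtracting the common factor's code emin keeps the
# difference within 0-31, i.e. within the original 5 bits.
assert (ea + eb - emin).bit_length() <= 5
```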
With continued reference to fig. 2, in step S230, the data to be accumulated corresponding to each group of data to be processed is accumulated, and the accumulated result is multiplied by the common order factor to obtain the output data of the current operation unit.
The data to be accumulated may include the mantissa corresponding to each group of data to be processed and the remaining order after the common order factor is extracted. For example, when the disclosure is applied to processing a group of data to be processed A = 2^ea × ma and B = 2^eb × mb, with the common order factor of the group being 2^emin, the data to be accumulated corresponding to the group may be expressed as 2^((ea+eb)-emin) × (ma×mb); the specific content of the data to be accumulated is not particularly limited in the present disclosure.
In one embodiment, accumulating the data to be accumulated corresponding to each group of data to be processed, and multiplying the accumulated result by a common order factor to obtain the output data of the current operation unit may include:
and accumulating the data to be accumulated corresponding to each group of data to be processed through an adder, and multiplying the accumulated result with a common order factor through a multiplier to obtain the output data of the current operation unit.
In one embodiment, the output data of the current operation unit may be a product of a common order factor and an accumulation result, and may be expressed as a multiplication of an order and a mantissa, and the specific form of the output data of the current operation unit is not particularly limited in this disclosure.
In one embodiment, the output data of the current operation unit may be input to the next operation unit to continue the processing of the data.
In one embodiment, the current operation unit may further include a carry save adder, and the adder may accumulate the data to be accumulated corresponding to each group of data to be processed, and may include the following steps:
when the number of items of the data to be accumulated is smaller than or equal to the preset number of items, adding the data to be accumulated through an adder to obtain an accumulation result;
and when the number of items of the data to be accumulated is larger than the preset number of items, adding the data to be accumulated by combining the carry save adder and the adder to obtain an accumulation result.
The carry save adder may be an adder for summing a large number of operands. It takes three source operands as input and outputs two operation results; the CSA thus compresses the addition of three numbers into the addition of two numbers, so that when many data are added, the carries can be saved and carry propagation need only be performed once.
The preset number of items may be two, three or other, and the specific number of items of the preset number of items is not particularly limited in the present disclosure.
In each CSA calculation, three operands a1, a2 and a3 may be input, and the two output data b1 and b2 may be obtained according to the following equations (5) and (6):

b1 = a1 ⊕ a2 ⊕ a3 (5)

b2 = ((a1 & a2) | (a2 & a3) | (a1 & a3)) << 1 (6)
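Equations (5) and (6) translate directly into code: the sum word is the bitwise XOR of the three operands and the carry word is the bitwise majority function shifted left by one, and together they preserve the arithmetic sum. A minimal sketch (the function name is an assumption):

```python
def csa(a1, a2, a3):
    """One carry-save adder step per equations (5) and (6):
    returns (b1, b2) with b1 + b2 == a1 + a2 + a3."""
    b1 = a1 ^ a2 ^ a3                                   # equation (5)
    b2 = ((a1 & a2) | (a2 & a3) | (a1 & a3)) << 1       # equation (6)
    return b1, b2

b1, b2 = csa(13, 7, 9)
assert b1 + b2 == 13 + 7 + 9
```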
In one embodiment, if there are only two items of data to be accumulated, they may be added directly by the adder; if there are more than two items, the carry save adder (CSA) may be combined to compress the accumulation operation, and when the data to be accumulated have been compressed to two items, the two items are added by the adder to obtain the accumulation result.
For example, as shown in FIG. 6, the data a0~a5 to be accumulated may be summed by combining the carry save adder and the adder. Since a CSA takes three data as input and outputs two data, three of the six data to be accumulated may be selected and input to the first CSA, and the remaining three input to the second CSA; then the two outputs of the first CSA and one output of the second CSA are input as input data to a third CSA to obtain two output data; the two output data of the third CSA and the remaining output of the second CSA are input as input data to a fourth CSA to obtain two output data b0 and b1. At this point the six data a0~a5 to be accumulated have been compressed by the several CSAs into the two output data b0 and b1, which are then input to the adder for addition to obtain the accumulation result of the data to be accumulated.
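The four-stage wiring of fig. 6 can be sketched as below. Since each CSA step preserves the sum of its three inputs as the sum of its two outputs, the final pair b0, b1 sums to the total of all six addends; the stage ordering is an assumption following the figure description.

```python
def csa(a1, a2, a3):
    """Carry-save step: (sum word, carry word), preserving the sum."""
    s = a1 ^ a2 ^ a3
    c = ((a1 & a2) | (a2 & a3) | (a1 & a3)) << 1
    return s, c

def compress6(a):
    """Compress six addends a[0..5] to two words via four CSA stages."""
    s1, c1 = csa(a[0], a[1], a[2])   # first CSA
    s2, c2 = csa(a[3], a[4], a[5])   # second CSA
    s3, c3 = csa(s1, c1, s2)         # third CSA
    b0, b1 = csa(s3, c3, c2)         # fourth CSA
    return b0, b1                    # one ordinary addition remains

b0, b1 = compress6([3, 5, 7, 11, 13, 17])
assert b0 + b1 == 3 + 5 + 7 + 11 + 13 + 17
```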
As can be seen from formulas (5) and (6), the CSA operation logic is simpler, so that the CSA is used for compressing and then adding the data to be accumulated, the operation speed is effectively improved, and the power consumption caused by accumulation operation is reduced.
In one embodiment, fig. 7 illustrates an exemplary flow of a data processing method of the neural network of the present disclosure, and referring to fig. 7, data processing may be performed based on the neural network according to steps S701 to S709.
Step S701, inputting a plurality of groups of data to be processed into a neural network, wherein the data to be processed comprises an order and a mantissa;
step S702, obtaining the total order of each group of data to be processed;
step S703, taking the minimum value of the total order of each group of data to be processed as a common order factor;
step S704, extracting a common order factor from each group of data to be processed;
step S705, taking the mantissa of each group of data to be processed and the residual order after the common order factor is extracted as data to be accumulated;
step S706, judging whether the number of items of the current data to be accumulated is two, if yes, jumping to step S707, otherwise jumping to step S708;
Step S707, directly adding the two items of data to be accumulated through an adder to obtain an accumulation result;
step S708, adding the data to be accumulated by combining the carry save adder and the adder to obtain an accumulation result;
step S709, multiplying the accumulated result by the common order factor to obtain the output data of the current operation unit.
In one embodiment, the neural network may include a convolutional neural network, and the data to be processed may include the data of the image blocks covered by the convolution kernel in the image data, together with the convolution kernel of the convolutional neural network.
For example, the present disclosure may be used in connection with a convolutional neural network for processing image data. As shown in fig. 8, the image data to be processed may first be input into the convolutional neural network, which includes a convolution kernel, and a target image block may be obtained from the image blocks covered by the convolution kernel in the image data. The data to be processed may include the pixel data A, B, E, F of the image block and the convolution kernel weights W, X, Y, Z, all of which may be expressed in the form of an order multiplied by a mantissa. The pixel data of the target image block may be multiplied by the corresponding weights of the convolution kernel and the products added to perform the convolution operation; specifically, the convolution operation may be performed on the data to be processed according to the following formula:
Wherein 2^emin is the minimum of the total orders of the groups of data to be processed; the data within each group may be multiplied, and the products of different groups accumulated; K1 = (ma×mw), K2 = (mb×mx), K3 = (me×my), K4 = (mf×mz).
In the convolution operation of the pixel data of the target image block with the convolution kernel, there are four groups of data to be processed: A and W, B and X, E and Y, and F and Z, and each datum may be expressed in the form of an order multiplied by a mantissa for the convolution operation. The exponent code emin of the common order factor is obtained from the minimum of the exponent codes ea+ew, eb+ex, ee+ey, ef+ez of the total orders of the groups, giving the common order factor 2^emin. The common factor is extracted from each group of data to be processed according to the common order factor to obtain the data to be accumulated, which may include the mantissas K1, K2, K3 and K4 corresponding to each group and the remaining orders after the common order factor is extracted. Since there are four items of data to be accumulated, as shown in fig. 9, after the data to be accumulated are compressed to two items by the CSA, the two compressed items are input to the adder for addition to obtain the accumulation result. Finally, the convolution result of the current target image block with the convolution kernel is obtained from the product of the common order factor and the accumulation result, and the obtained convolution result is output to the convolution operation of the next target image block with the convolution kernel so as to obtain the convolution result of the next target image block with the convolution kernel.
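The whole convolution step can be sketched end to end. This uses `math.frexp`/`math.ldexp` as an illustrative float splitting, not the patent's fixed-point bit layout, and the function name is an assumption: each pixel/weight pair forms a group, the minimum total exponent is factored out as the common order factor, the scaled mantissa products are accumulated, and the common factor is multiplied back in.

```python
import math

def shared_exponent_dot(pixels, weights):
    """Dot product of pixel data and kernel weights with a shared
    (common) order factor 2**emin extracted before accumulation."""
    pairs = [(math.frexp(p), math.frexp(w)) for p, w in zip(pixels, weights)]
    # Each group's total mantissa (mp*mw) and total exponent (ep+ew).
    totals = [(mp * mw, ep + ew) for (mp, ep), (mw, ew) in pairs]
    emin = min(e for _, e in totals)                        # common order factor exponent
    acc = sum(math.ldexp(m, e - emin) for m, e in totals)   # data to be accumulated, summed
    return math.ldexp(acc, emin)                            # multiply back by 2**emin

result = shared_exponent_dot([1.0, 2.0, 3.0, 4.0], [0.5, 0.25, 2.0, 1.0])
assert result == 1.0*0.5 + 2.0*0.25 + 3.0*2.0 + 4.0*1.0
```

On these power-of-two values the result matches the ordinary floating-point dot product exactly; in a fixed-point realization the shifts replace the `ldexp` calls.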
Furthermore, the exemplary embodiments of the present disclosure also provide a data processing apparatus of a neural network. Referring to fig. 10, the data processing apparatus 1000 of the neural network may include:
a to-be-processed data acquisition module 1010 configured to acquire a plurality of sets of to-be-processed data input to a current operation unit in the neural network, the to-be-processed data including an order and a mantissa;
the common order factor extraction module 1020 is configured to extract common order factors of a plurality of groups of data to be processed, and multiply mantissas in each group of data to be processed with residual orders after the common order factors are extracted to obtain data to be accumulated corresponding to each group of data to be processed;
the output data obtaining module 1030 is configured to accumulate the data to be accumulated corresponding to each group of data to be processed, and multiply the accumulated result with the common order factor to obtain the output data of the current operation unit.
In one embodiment, the neural network may include a convolutional neural network, and the data to be processed may include the data of the image blocks covered by the convolution kernel in the image data, together with the convolution kernel of the convolutional neural network.
In an embodiment, the data to be processed may further include output data of a previous stage operation unit of the current operation unit.
In one embodiment, the extracting the common order factor of the plurality of sets of data to be processed includes:
the total order of each group of data to be processed is determined, and the common order factor is obtained by comparing the total orders.
In one embodiment, the comparing the total order to obtain the common order factor includes:
and determining the common order factor according to the minimum value of the total orders.
In one embodiment, the remaining order may include a quotient of a total order of each set of data to be processed divided by a common order factor.
In one embodiment, the current operation unit may include a multiplier and an adder, where multiplying the mantissa in each set of data to be processed by the remaining order after extracting the common order factor includes:
multiplying mantissas in each group of data to be processed with residual orders after the common order factors are extracted through a multiplier;
the above-mentioned to-be-accumulated data corresponding to each group of to-be-processed data are accumulated, and the accumulated result is multiplied by a common order factor to obtain the output data of the current operation unit, including:
and accumulating the data to be accumulated corresponding to each group of data to be processed through an adder, and multiplying the accumulated result with a common order factor through a multiplier to obtain the output data of the current operation unit.
In one embodiment, the current operation unit may further include a carry save adder, and the adding the to-be-added data corresponding to each set of to-be-processed data by the adder includes:
when the number of items of the data to be accumulated is smaller than or equal to the preset number of items, adding the data to be accumulated through an adder to obtain an accumulation result;
and when the number of items of the data to be accumulated is larger than the preset number of items, adding the data to be accumulated by combining the carry save adder and the adder to obtain an accumulation result.
The specific details of each part in the above apparatus are already described in the method part embodiments, and thus will not be repeated.
Exemplary embodiments of the present disclosure also provide a computer readable storage medium, which may be implemented in the form of a program product comprising program code for causing an electronic device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the above section of the "exemplary method" when the program product is run on the electronic device. In an alternative embodiment, the program product may be implemented as a portable compact disc read only memory (CD-ROM) and comprises program code and may run on an electronic device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Exemplary embodiments of the present disclosure also provide an electronic device. The electronic device may include a processor and a memory. The memory stores executable instructions of the processor, such as program code. The processor performs the method of the present exemplary embodiment by executing the executable instructions.
The electronic device is exemplarily described in the form of a general-purpose computer with reference to fig. 11. It should be understood that the electronic device 1100 illustrated in fig. 11 is merely an example and should not be taken as limiting the functionality and scope of use of the disclosed embodiments.
As shown in fig. 11, an electronic device 1100 may include: processor 1110, memory 1120, bus 1130, I/O (input/output) interface 1140, network adapter 1150.
Processor 1110 may include one or more processing units, such as: the processor 1110 may include a central processing unit (CPU), an application processor (AP), a modem processor, a display processing unit (DPU), a graphics processing unit (GPU), an image signal processor (ISP), a controller, an encoder, a decoder, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like. The data processing method of the neural network in the present exemplary embodiment may be performed by a GPU or an NPU. In one embodiment, the neural network may be deployed in the GPU; after the data to be processed are acquired, the GPU extracts the common order factor from the data to be processed, takes the remaining data after the common order factor is extracted as the data to be accumulated, adds the data to be accumulated by the CSA and the adder to obtain an accumulation result, and takes the product of the common order factor and the accumulation result as the output data of the current operation unit.
The memory 1120 may include volatile memory such as RAM 1121, cache unit 1122, and nonvolatile memory such as ROM 1123. Memory 1120 may also include one or more program modules 1124, such program modules 1124 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 1124 may include, for example, the modules in apparatus 1000 described above.
Bus 1130 is used to enable connections between the different components of electronic device 1100 and may include a data bus, an address bus, and a control bus.
The electronic device 1100 may communicate with one or more external devices (e.g., terminal devices, keyboards, mice, external controllers, etc.) through the I/O interface 1140.
The electronic device 1100 may communicate with one or more networks through a network adapter 1150, e.g., the network adapter 1150 may provide mobile communication solutions such as 3G/4G/5G or wireless communication solutions such as wireless local area network, bluetooth, near field communication, etc. Network adapter 1150 may communicate with other modules of electronic device 1100 via bus 1130.
Although not shown in fig. 11, other hardware and/or software modules may also be provided in the electronic device 1100, including but not limited to: displays, microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.) or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module" or "system." Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (10)
1. A data processing method of a neural network, comprising:
acquiring a plurality of groups of data to be processed, which are input to a current operation unit in a neural network, wherein the data to be processed comprise an order and a mantissa;
extracting common order factors of the multiple groups of data to be processed, and multiplying mantissas in each group of data to be processed by the residual orders after the common order factors are extracted to obtain data to be accumulated corresponding to each group of data to be processed;
and accumulating the data to be accumulated corresponding to each group of data to be processed, and multiplying the accumulated result by the common order factor to obtain the output data of the current operation unit.
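The method of claim 1 can be illustrated with a minimal Python sketch. It assumes binary orders (exponents) and uses `math.frexp`/`math.ldexp` to split and recombine floating-point values; the function and variable names are illustrative, not taken from the patent:

```python
from math import frexp, ldexp

def accumulate_with_common_order(values):
    """Sketch of the claimed method: factor out a common order before summing.

    Each input is decomposed into a mantissa and an order (exponent). The
    common order factor is extracted, each mantissa is multiplied by its
    remaining order, the scaled terms are accumulated, and the common order
    factor is reapplied once to the accumulated result.
    """
    # Decompose every input into (mantissa, order).
    parts = [frexp(v) for v in values]
    # Take the minimum order as the common order factor (cf. claim 5).
    common_order = min(e for _, e in parts)
    # Multiply each mantissa by its remaining order (cf. claim 4:
    # in binary, dividing orders is subtracting exponents).
    to_accumulate = [m * 2 ** (e - common_order) for m, e in parts]
    # Accumulate, then restore the common order factor (cf. claim 1).
    return ldexp(sum(to_accumulate), common_order)
```

Because the common factor is applied only once after accumulation, the intermediate additions operate on uniformly scaled terms, which is the motivation for the single final multiplication in the claim.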
2. The method of claim 1, wherein the neural network comprises a convolutional neural network, and the data to be processed comprises image block data covered by a convolution kernel of the convolutional neural network in image data, and the convolution kernel.
3. The method according to claim 1, wherein the data to be processed comprises output data of an operation unit at a previous stage of the current operation unit.
4. The method of claim 1, wherein extracting the common order factor for the plurality of sets of data to be processed and multiplying the mantissa in each set of data to be processed by the remaining order after extracting the common order factor comprises:
determining the total order of each group of data to be processed, and comparing the total orders to obtain the common order factor;
the remaining order comprises a quotient of the total order of each set of data to be processed divided by the common order factor.
5. The method of claim 4, wherein the obtaining the common-order factor by comparing the total order comprises:
and determining the common order factor according to the minimum value of the total orders.
6. The method of claim 1, wherein the current arithmetic unit comprises a multiplier and an adder; the multiplying the mantissa in each group of data to be processed by the residual order after the common order factor is extracted comprises the following steps:
multiplying, by the multiplier, mantissas in each group of data to be processed by the residual order after the common order factor is extracted;
the step of accumulating the data to be accumulated corresponding to each group of data to be processed and multiplying the accumulated result by the common order factor to obtain the output data of the current operation unit comprises the following steps:
and accumulating the data to be accumulated corresponding to each group of data to be processed through the adder, and multiplying the accumulated result by the common order factor through the multiplier to obtain the output data of the current operation unit.
7. The method of claim 6, wherein the current arithmetic unit further comprises a carry-save adder, the accumulating, by the adder, the data to be accumulated corresponding to each set of data to be processed, comprising:
when the number of items of the data to be accumulated is less than or equal to a preset number of items, adding the data to be accumulated through the adder to obtain the accumulation result;
and when the number of items of the data to be accumulated is greater than the preset number of items, adding the data to be accumulated by combining the carry-save adder and the adder to obtain the accumulation result.
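Claim 7's two-path accumulation can be sketched as follows. This is an illustrative Python model, not the patent's hardware: `csa` is a bitwise 3:2 compressor (one carry-save stage), the term-count threshold and bit width are assumed parameters, and a real circuit would implement the reduction as a CSA tree rather than a loop:

```python
def csa(a, b, c, mask):
    """One carry-save stage: reduce three addends to a sum word and a carry word."""
    s = (a ^ b ^ c) & mask                              # bitwise sum without carries
    cy = (((a & b) | (a & c) | (b & c)) << 1) & mask    # carries, shifted into place
    return s, cy

def accumulate(terms, threshold=2, width=32):
    """Few terms: use the plain adder. Many terms: reduce with carry-save
    stages to two words, then perform a single carry-propagate addition."""
    mask = (1 << width) - 1
    if len(terms) <= threshold:
        return sum(terms) & mask                # adder-only path
    words = [t & mask for t in terms]
    while len(words) > 2:                       # carry-save reduction
        s, cy = csa(words[0], words[1], words[2], mask)
        words = words[3:] + [s, cy]
    return (words[0] + words[1]) & mask         # final carry-propagate add
```

The carry-save stages defer carry propagation, so only the final addition pays the full carry-chain delay, which is why the combination is attractive when many terms must be accumulated.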
8. A data processing apparatus of a neural network, comprising:
the system comprises a data acquisition module to be processed, a data processing module and a data processing module, wherein the data acquisition module to be processed is configured to acquire a plurality of groups of data to be processed which are input to a current operation unit in a neural network, and the data to be processed comprise an order and a mantissa;
the common order factor extraction module is configured to extract common order factors of the plurality of groups of data to be processed, and multiply mantissas in each group of data to be processed with residual orders after the common order factors are extracted to obtain data to be accumulated corresponding to each group of data to be processed;
and the output data acquisition module is configured to accumulate the data to be accumulated corresponding to each group of data to be processed, and multiply the accumulated result with the common order factor to obtain the output data of the current operation unit.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
10. An electronic device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any one of claims 1 to 7 via execution of the executable instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210642072.0A CN117251134A (en) | 2022-06-08 | 2022-06-08 | Data processing method and device of neural network, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117251134A (en) | 2023-12-19 |
Family
ID=89135530
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210642072.0A | Data processing method and device of neural network, storage medium and electronic equipment | 2022-06-08 | 2022-06-08 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117251134A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107608715B (en) | Apparatus and method for performing artificial neural network forward operations | |
CN109165732B (en) | Neural network processing device and method for executing vector multiply-add instruction | |
EP3657399A1 (en) | Weight pruning and quantization method for a neural network and accelerating device therefor | |
CN109032670B (en) | Neural network processing device and method for executing vector copy instruction | |
CN110163360B (en) | Computing device and method | |
CN110689125A (en) | Computing device | |
US10949168B2 (en) | Compressing like-magnitude partial products in multiply accumulation | |
TWI763079B (en) | Multiplier and method for floating-point arithmetic, integrated circuit chip, and computing device | |
CN113076083B (en) | Data multiply-add operation circuit | |
US11909421B2 (en) | Multiplication and accumulation (MAC) operator | |
CN117472325B (en) | Multiplication processor, operation processing method, chip and electronic equipment | |
KR101929847B1 (en) | Apparatus and method for computing a sparse matrix | |
TWI847252B (en) | Hardware circuit and method for multiplying sets of inputs, and non-transitory machine-readable storage device | |
CN117420982A (en) | Chip comprising a fused multiply-accumulator, device and control method for data operations | |
CN117251134A (en) | Data processing method and device of neural network, storage medium and electronic equipment | |
CN116954551A (en) | Data processing method, device, equipment and storage medium | |
CN115237372A (en) | Multiplication circuit, machine learning operation circuit, chip and data processing method | |
TW202333041A (en) | System and method performing floating-point operations | |
CN113591031A (en) | Low-power-consumption matrix operation method and device | |
CN111047024B (en) | Computing device and related product | |
CN113986194A (en) | Neural network approximate multiplier implementation method and device based on preprocessing | |
CN109416757B (en) | Method, apparatus and computer-readable storage medium for processing numerical data | |
CN118034642A (en) | Floating point operation method, device, electronic equipment and storage medium | |
WO2024212952A1 (en) | Computing apparatus and method, electronic device, and storage medium | |
WO2021073511A1 (en) | Multiplier, method, integrated circuit chip, and computing device for floating point operation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |