CN109409514A - Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks - Google Patents
- Publication number
- CN109409514A (application number CN201811302449.8A / CN201811302449A)
- Authority
- CN
- China
- Prior art keywords
- register
- eigenvalue
- weight
- value
- convolutional layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The embodiments of the invention disclose a fixed-point calculation method, apparatus, device and storage medium for a convolutional neural network, the convolutional neural network including a convolutional layer. The method includes: receiving the input activation values of the current convolutional layer through input channels, the input channels having corresponding weights; performing a fixed-point operation on the input activation values to obtain first eigenvalues; writing the first eigenvalues and the weights into the registers of multiple register groups; and, for each of the multiple register groups, performing multiply-add operations on the first eigenvalues and weights in its registers to obtain multiple second eigenvalues. Because a processor usually provides multiple registers, the accumulation operations can be spread across them, i.e. accumulated in groups, which reduces the number of multiply-add operations each group absorbs, lowers the overflow risk, improves the processing efficiency of the operation instructions, and increases overall throughput, while accuracy is maintained and the range of applications is preserved.
Description
Technical field
The embodiments of the present invention relate to deep-learning technology, and in particular to a fixed-point calculation method, apparatus, device and storage medium for a convolutional neural network.
Background technique
In recent years, deep learning has been widely applied in fields such as computer vision, where the family of algorithms built around the CNN (Convolutional Neural Network) achieves good results in applications such as image classification, object detection and pixel-level segmentation.
However, the computational load of a convolutional neural network (CNN) is large, with more than 90% of it concentrated in the convolution operations. A common implementation converts the convolutional layer into the multiplication of two matrices, whose core is:
O = Σ_{i=1}^{N} w_i · a_i
where w is the weight, a is the input activation value, O is the output activation value, and N is the common edge length of w and a.
Since the computational load of convolution is large, in actual deployment convolution usually requires hardware acceleration by a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array) to meet real-time requirements.
To port convolutional neural networks onto devices with limited computing resources, such as mobile terminals and embedded devices, the current practice is to train smaller, more streamlined models, supplemented by fixed-pointing and pruning, attempting to strike a balance between speed and accuracy.
Among these, fixed-pointing (or quantization) methods have attracted wide attention because they do not change the network structure and require no retraining.
Fixed-pointing converts a value originally represented as a 32-bit floating-point number into a representation with a fixed number of bits through a mapping method. Taking Q notation as an example, U2Q6 denotes an 8-bit fixed-point number with a 2-bit integer part and a 6-bit fractional part. On a general-purpose processor, fixed-point operations usually have lower latency and higher throughput than floating-point operations; compared with 32-bit floating-point operations, 8-bit fixed-point operations can theoretically bring a 4x performance gain for CNNs.
However, the bit width grows during fixed-point computation, and to preserve precision the output results need to be stored at a larger bit width. For example, multiplying two U2Q6 fixed-point numbers yields an S4Q12 result, i.e. 16 bits. This growth in bit width hurts instruction throughput, and ultimately the actual fixed-point performance falls far short of the theoretical performance.
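As a non-limiting illustration of the bit-width growth described above, the following Python sketch quantizes two values into a Q6 fixed-point format (6 fractional bits, as in U2Q6) and multiplies them; the product carries 12 fractional bits and needs roughly twice the bit width. The helper names are illustrative, not from the patent text.

```python
def to_fixed(x, frac_bits):
    """Quantize a float to an integer fixed-point value with `frac_bits` fractional bits."""
    return round(x * (1 << frac_bits))

def to_float(q, frac_bits):
    """Recover the real value represented by the fixed-point integer `q`."""
    return q / (1 << frac_bits)

# Two U2Q6-style values (8 bits each: 2 integer bits, 6 fractional bits)
a = to_fixed(1.75, 6)    # 112
b = to_fixed(2.5, 6)     # 160
prod = a * b             # the scale is now 2^12: a Q12 result needing a 16-bit store
print(prod, to_float(prod, 12))   # 17920 4.375
```

The doubled fractional scale is exactly why an 8-bit-by-8-bit fixed-point multiply must be written back at 16 bits.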
To improve the performance of fixed-point computation, there are currently two approaches.
The first method discards low-order bits by a shift operation after the fixed-point computation to reduce the bit width. For example, by adding a shift operation in the activation layer, some relatively unimportant low bits are removed, reducing the complexity of subsequent fixed-point processing. However, this method requires specially designed hardware support, because intermediate results usually need to be stored at non-power-of-two bit widths, while general-purpose processing units usually offer only operation instructions of power-of-two bit widths, so processing efficiency is low.
The second method replaces the multiplication instruction with instructions carrying a lower overflow risk, such as shift instructions. For example, by quantizing the weights to a few sparse specific values such as {0.125, 0.25, 0.5}, multiplication by these specific values can be converted into a shift by the corresponding number of bits. However, the accuracy of this method is lower and its application range is narrower; it is only suitable for simple image-classification tasks.
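The shift-based replacement can be sketched as follows; the Q6 format and the mapping table are assumptions chosen for illustration only.

```python
# Quantizing weights to sparse powers of two lets multiplication become a right shift.
# Mapping from the sparse weight value to its shift amount: w = 2**-s, so a * w == a >> s.
SHIFT = {0.125: 3, 0.25: 2, 0.5: 1}

def mul_by_sparse_weight(a_fixed, w):
    """Multiply a fixed-point activation by a sparse power-of-two weight via a shift."""
    return a_fixed >> SHIFT[w]

a = 96                                 # 1.5 in Q6 (1.5 * 64)
print(mul_by_sparse_weight(a, 0.25))   # 24, i.e. 0.375 in Q6
```

Since a shift never widens the operand, the overflow risk of the replaced multiplication disappears, at the cost of restricting the weights to those few values.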
Summary of the invention
The embodiments of the present invention provide a fixed-point calculation method, apparatus, device and storage medium for a convolutional neural network, so as to improve fixed-point computation efficiency while guaranteeing the range of applications.
In a first aspect, an embodiment of the invention provides a fixed-point calculation method for a convolutional neural network that includes a convolutional layer, the method comprising:
receiving the input activation values of the current convolutional layer through input channels, the input channels having corresponding weights;
performing a fixed-point operation on the input activation values to obtain first eigenvalues;
writing the first eigenvalues and the weights into the registers of multiple register groups;
for each of the multiple register groups, performing multiply-add operations on the first eigenvalues and the weights in its registers to obtain multiple second eigenvalues.
Preferably, the registers of each register group include a register for storing the first eigenvalue, a register for storing the weight, a multiplication register and an addition register;
performing, for each of the multiple register groups, multiply-add operations on the first eigenvalues and the weights in its registers to obtain multiple second eigenvalues comprises:
for each register group, multiplying, in the multiplication register, the first eigenvalue and the weight corresponding to the same input channel to obtain feature product data;
accumulating the feature product data in the addition register to obtain a second eigenvalue.
Preferably, the method further comprises:
merging the multiple second eigenvalues to obtain a third eigenvalue;
performing a floating-point operation on the third eigenvalue to obtain a fourth eigenvalue;
generating the output activation value of the current convolutional layer from the fourth eigenvalue.
Preferably, before receiving the input activation values of the current convolutional layer through the input channels, the method further comprises:
compressing the bit number of the weights, so as to compress the bit number of the second eigenvalues.
Preferably, the input activation values of the current convolutional layer are the output activation values of the previous convolutional layer;
before receiving the input activation values of the current convolutional layer through the input channels, the method further comprises:
compressing the bit number of the output activation values of the previous convolutional layer, so as to compress the bit number of the second eigenvalues.
Preferably, the convolutional layer is a grouped convolutional layer comprising multiple convolution groups;
the method further comprises:
for each convolution group, enumerating candidate arrangements of the input channels assigned to it, the input channels having corresponding first training values;
writing the first training values and the weights into the registers of the multiple register groups;
for each of the multiple register groups, performing multiply-add operations on the first training values and the weights in its registers under each candidate arrangement, to obtain multiple second training values;
determining, among the second training values, the second training value with the smallest absolute value as the target training value;
setting the candidate arrangement corresponding to the target training value as the target arrangement of the input channels.
Preferably, the registers of each register group include a register for storing the first eigenvalue, a register for storing the weight, a multiplication register and an addition register;
performing, for each of the multiple register groups, multiply-add operations on the first training values and the weights in its registers under each candidate arrangement to obtain multiple second training values comprises:
multiplying, in the multiplication register, the first training value and the weight corresponding to each input channel to obtain training product data;
selecting, from the training product data, the m training product data with the smallest values as target training product data;
writing the target training product data into the addition register;
accumulating the training product data other than the target training product data into the addition register according to each arrangement, to obtain the second training values.
Preferably, the convolutional layer is a grouped convolutional layer comprising multiple convolution groups;
receiving the input activation values of the current convolutional layer through the input channels comprises:
determining the input channels assigned to each convolution group;
in each convolution group, receiving input activation values through the assigned input channels;
writing the first eigenvalues and the weights into the registers of the multiple register groups comprises:
determining a target convolution group from the multiple convolution groups in turn;
in the target convolution group, writing the first eigenvalues corresponding to the assigned input channels and the weights into the registers of the multiple register groups;
performing, for each of the multiple register groups, multiply-add operations on the first eigenvalues and the weights in its registers to obtain multiple second eigenvalues comprises:
in the target convolution group, performing multiply-add operations on the first eigenvalues and the weights in the register groups according to a preset target arrangement, to obtain multiple second eigenvalues.
In a second aspect, an embodiment of the invention further provides a fixed-point calculation apparatus for a convolutional neural network that includes a convolutional layer, the apparatus comprising:
an input-activation-value receiving module, configured to receive the input activation values of the current convolutional layer through input channels, the input channels having corresponding weights;
a fixed-point conversion module, configured to perform a fixed-point operation on the input activation values to obtain first eigenvalues;
a grouped storage module, configured to write the first eigenvalues and the weights into the registers of multiple register groups;
a multiply-add operation module, configured to perform, for each of the multiple register groups, multiply-add operations on the first eigenvalues and the weights in its registers to obtain multiple second eigenvalues.
Preferably, the registers of each register group include a register for storing the first eigenvalue, a register for storing the weight, a multiplication register and an addition register;
the multiply-add operation module comprises:
a multiplication submodule, configured to multiply, for each register group, in the multiplication register, the first eigenvalue and the weight corresponding to the same input channel to obtain feature product data;
an addition submodule, configured to accumulate the feature product data in the addition register to obtain a second eigenvalue.
Preferably, the apparatus further comprises:
an eigenvalue merging module, configured to merge the multiple second eigenvalues to obtain a third eigenvalue;
a floating-point conversion module, configured to perform a floating-point operation on the third eigenvalue to obtain a fourth eigenvalue;
an output-activation-value generation module, configured to generate the output activation value of the current convolutional layer from the fourth eigenvalue.
Preferably, the apparatus further comprises:
a weight compression module, configured to compress the bit number of the weights, so as to compress the bit number of the second eigenvalues.
Preferably, the input activation values of the current convolutional layer are the output activation values of the previous convolutional layer;
the apparatus further comprises:
an output-activation-value compression module, configured to compress the bit number of the output activation values of the previous convolutional layer, so as to compress the bit number of the second eigenvalues.
Preferably, the convolutional layer is a grouped convolutional layer comprising multiple convolution groups;
the apparatus further comprises:
a candidate-arrangement enumeration module, configured to enumerate, for each convolution group, candidate arrangements of the input channels assigned to it, the input channels having corresponding first training values;
a training-set writing module, configured to write the first training values and the weights into the registers of the multiple register groups;
a training-set training module, configured to perform, for each of the multiple register groups, multiply-add operations on the first training values and the weights in its registers under each candidate arrangement, to obtain multiple second training values;
a target-training-value selection module, configured to determine, among the second training values, the second training value with the smallest absolute value as the target training value;
a target-arrangement setting module, configured to set the candidate arrangement corresponding to the target training value as the target arrangement of the input channels.
Preferably, the registers of each register group include a register for storing the first eigenvalue, a register for storing the weight, a multiplication register and an addition register;
the training-set training module comprises:
a training-product-data computation submodule, configured to multiply, in the multiplication register, the first training value and the weight corresponding to each input channel to obtain training product data;
a target-training-product-data selection submodule, configured to select, from the training product data, the m training product data with the smallest values as target training product data;
a target-training-product-data writing submodule, configured to write the target training product data into the addition register;
a training-product-data accumulation submodule, configured to accumulate the training product data other than the target training product data into the addition register according to each arrangement, to obtain the second training values.
Preferably, the convolutional layer is a grouped convolutional layer comprising multiple convolution groups;
the input-activation-value receiving module comprises:
a channel assignment submodule, configured to determine the input channels assigned to each convolution group;
a channel receiving submodule, configured to receive, in each convolution group, input activation values through the assigned input channels;
the grouped storage module comprises:
a target-convolution-group determination submodule, configured to determine a target convolution group from the multiple convolution groups in turn;
a channel storage submodule, configured to write, in the target convolution group, the first eigenvalues corresponding to the assigned input channels and the weights into the registers of the multiple register groups;
the multiply-add operation module comprises:
an arranged multiply-add submodule, configured to perform, in the target convolution group, multiply-add operations on the first eigenvalues and the weights in the register groups according to a preset target arrangement, to obtain multiple second eigenvalues.
In a third aspect, an embodiment of the invention further provides a device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the fixed-point calculation method for a convolutional neural network provided by the embodiments of the first aspect of the invention.
In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the fixed-point calculation method for a convolutional neural network provided by the embodiments of the first aspect of the invention.
In the embodiments of the invention, the input activation values of the current convolutional layer are received through input channels; a fixed-point operation is performed on the input activation values to obtain first eigenvalues; the first eigenvalues and the weights are written into the registers of multiple register groups; and, for each register group, multiply-add operations are performed on the first eigenvalues and weights in its registers to obtain multiple second eigenvalues. Because a processor usually provides multiple registers, the accumulation operations can be spread across them, i.e. accumulated in groups, which reduces the number of multiply-add operations each group absorbs, lowers the overflow risk, improves the processing efficiency of the operation instructions, and increases overall throughput, while accuracy is maintained and the range of applications is preserved.
Brief description of the drawings
Fig. 1 is a flowchart of the fixed-point calculation of a convolutional neural network provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the convolutional layer in Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the register groups in Embodiment 1 of the present invention;
Fig. 4 is a flowchart of the fixed-point calculation of a convolutional neural network provided by Embodiment 2 of the present invention;
Fig. 5 is a flowchart of the fixed-point calculation of a convolutional neural network provided by Embodiment 3 of the present invention;
Fig. 6 is a schematic diagram of the grouped convolutional layer in Embodiment 3 of the present invention;
Fig. 7 is a structural schematic diagram of the fixed-point calculation apparatus for a convolutional neural network provided by Embodiment 4 of the present invention;
Fig. 8 is a structural schematic diagram of the device provided by Embodiment 5 of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Fig. 1 is a flowchart of the fixed-point calculation of a convolutional neural network provided by Embodiment 1 of the present invention, which specifically includes the following steps:
S110: receive the input activation values of the current convolutional layer through input channels.
In a concrete implementation, the embodiments of the invention may be applied in a device that has a general-purpose computing unit, such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor), which commonly uses operation instructions of power-of-two bit widths.
The device may be one with limited computing resources, such as a mobile terminal or an embedded device, but may also be a server cluster with more abundant computing resources, such as a distributed system; the embodiments of the invention are not limited in this respect.
Currently, the basic layered structure of a convolutional neural network may include a convolution layer, a pooling layer, an activation layer and a full connection layer, among others. The convolution layer, pooling layer and activation layer can be combined into a single layer of operations for completing the convolution computation of the convolutional neural network.
For a given convolutional layer, the input activation values of that layer can be received through the input channels (in_channels); the input activation values may be floating-point data.
In addition, each input channel has a corresponding weight, also known as a convolution kernel. A convolutional layer generally consists of multiple neuron maps, each map consisting of multiple neural units; all neural units of the same map share one weight. A weight often represents a feature: for example, if a weight represents an arc segment, the convolution output may also be an arc segment.
It should be noted that, for fixed-point computation, the weight may be fixed-pointed in advance, i.e. the weight may be fixed-point data.
S120: perform a fixed-point operation on the input activation values to obtain first eigenvalues.
In the embodiments of the invention, each input activation value can be fixed-pointed, for example by the Q-notation method, converting it from floating-point data to fixed-point data; the result of this conversion is the first eigenvalue.
S130: write the first eigenvalues and the weights into the registers of multiple register groups.
In the embodiments of the invention, as shown in Fig. 2, multiple registers are provided in the device, and at least part of the registers can be divided in advance into multiple register groups 202.
In one example, the processor possesses 32 or more SIMD (Single Instruction Multiple Data) registers. A single multiply-add operation (MAC) occupies 4 registers: the weight and the first eigenvalue occupy 1 register each, and the 16-bit MAC output occupies 2 registers. In addition, collecting the accumulated results of the register groups occupies 8 registers: the 32-bit accumulated result occupies 4 registers, and to prevent the data from stalling the pipeline this is doubled, for 8 registers in total.
In this example, excluding the 8 registers occupied by collecting the accumulated results, 24 registers remain. Each register group performs its MACs independently and occupies at least 4 registers, so the registers can be divided into 6 register groups, each carrying out MACs with its own registers.
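The register budget of this example can be sketched as the following arithmetic; the figures follow the example above, and other processors may of course differ.

```python
# Sketch of the register budget described above (assumed: a 32-register SIMD file).
total_regs = 32
collect_regs = 8          # 32-bit accumulated result (4 regs), doubled to avoid pipeline stalls
regs_per_mac_group = 4    # weight (1) + first eigenvalue (1) + 16-bit MAC output (2)

available = total_regs - collect_regs         # 24 registers left for MAC groups
num_groups = available // regs_per_mac_group  # 6 independent register groups
print(num_groups)  # 6
```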
In the embodiments of the invention, as shown in Fig. 2, the first eigenvalues and weights corresponding to the input channels (in_channels) 201 can be written, sequentially, randomly or otherwise, into the registers of the corresponding register groups.
S140: for each of the multiple register groups, perform multiply-add operations on the first eigenvalues and the weights in its registers to obtain multiple second eigenvalues.
In a concrete implementation, as shown in Fig. 3, the registers of each register group include a register 301 for storing the first eigenvalue, a register 302 for storing the weight, a multiplication register 303 and an addition register 304.
Normally, the raw data input to the convolutional neural network can be split, by overlapping, into input feature values with the same quantity and size as the weights, so that input feature values and weights correspond one to one. For example, if the input to the convolutional neural network is a 320*240 image and the weight is 6*6*10, the image data can be split into multiple 6*6*10 input feature values.
For each register group, the first eigenvalue and the weight corresponding to the same input channel are multiplied in the multiplication register 303 to obtain feature product data, and the feature product data are accumulated in the addition register 304 to obtain a second eigenvalue.
Accumulation means that, starting from the first feature product datum, the addition register keeps adding the value of each feature product datum without clearing, until the last feature product datum has been added; the data obtained once the accumulation completes are the second eigenvalue.
More specifically, the multiply-add operation of the register groups can be expressed as:
O_g = Σ_{i=1}^{N/G} w_{g,i} · a_{g,i}, g = 1, …, G
where O_g is the second eigenvalue of register group g, G is the number of register groups, w is the weight, a is the first eigenvalue, N is the common edge length between w and a, and N/G is the number of multiplications assigned to each register group.
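A minimal sketch of the grouped accumulation, assuming N is divisible by G and using plain Python integers in place of SIMD registers; the function and variable names are illustrative, not from the patent.

```python
def grouped_mac(weights, activations, G):
    """Split N products across G accumulators; each absorbs only N/G multiply-adds."""
    N = len(weights)
    assert N % G == 0
    per_group = N // G
    partial = [0] * G
    for g in range(G):
        for i in range(per_group):
            idx = g * per_group + i
            partial[g] += weights[idx] * activations[idx]  # MAC within one register group
    return partial   # the G second eigenvalues; summing them yields O

w = [1, 2, 3, 4, 5, 6]
a = [6, 5, 4, 3, 2, 1]
parts = grouped_mac(w, a, G=3)
print(parts, sum(parts))   # [16, 24, 16] 56
```

Each accumulator sees only N/G additions instead of N, which is exactly how the grouping lowers the overflow risk of a fixed-bit-width accumulator without changing the final sum.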
In the embodiments of the invention, the input activation values of the current convolutional layer are received through input channels; a fixed-point operation is performed on the input activation values to obtain first eigenvalues; the first eigenvalues and the weights are written into the registers of multiple register groups; and, for each register group, multiply-add operations are performed on the first eigenvalues and weights in its registers to obtain multiple second eigenvalues. Because a processor usually provides multiple registers, the accumulation operations can be spread across them, i.e. accumulated in groups, which reduces the number of multiply-add operations each group absorbs, lowers the overflow risk, improves the processing efficiency of the operation instructions, and increases overall throughput, while accuracy is maintained and the range of applications is preserved.
Fig. 4 is a flowchart of the fixed-point calculation of a convolutional neural network provided by Embodiment 2 of the present invention. Based on the preceding embodiment, this embodiment further adds operations for compressing the bit number of the weights and/or the output activation values, and thereby the bit number of the second eigenvalues. The method specifically includes the following steps:
S410: compress the bit number of the weights, so as to compress the bit number of the second eigenvalues.
S420: compress the bit number of the output activation values of the previous convolutional layer, so as to compress the bit number of the second eigenvalues.
Since multiplying two 8-bit fixed-point numbers yields a 16-bit result, the bit number of the output can be limited to less than 16 bits. In practical applications, 8-bit weights and/or 8-bit input activation values are not strictly necessary; the bit number of the weights and/or input activation values can be compressed while a certain precision is maintained, so as to compress the bit number of the output second eigenvalues.
For the current convolutional layer, its input activation values are the output activation values of the previous convolutional layer; the bit number of the output activation values can therefore be compressed in the previous convolutional layer.
In a concrete implementation, the bit number of the weights and/or output activation values can be compressed in the following ways:
1. Uniform compression: multiply each value by 255/(maximum value - minimum value), then round to the nearest integer.
2. Truncated compression based on relative entropy: obtain a threshold by a statistical method; values whose absolute value exceeds the threshold are clamped to 255, and all other values are multiplied by 255/threshold.
3. Non-uniform quantization: multiply by a dynamically varying value, so that the value density is high in the parts requiring high accuracy and low elsewhere.
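The first (uniform) mode can be sketched as follows, assuming NumPy is available; zero-point/offset handling is omitted for brevity, and the function name is illustrative.

```python
import numpy as np

def uniform_compress(values):
    """Uniform compression: scale by 255/(max - min), then round to nearest."""
    values = np.asarray(values, dtype=np.float64)
    scale = 255.0 / (values.max() - values.min())
    return np.rint(values * scale).astype(np.int32)

w = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(uniform_compress(w))   # [-128  -64    0   64  128]
```

Note that `np.rint` rounds halves to the nearest even integer, which is one reasonable reading of "nearest-neighbor rounding" here.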
Of course, the above compression modes are only examples; when implementing the embodiments of the invention, other compression modes can be set according to the actual situation, and the embodiments of the invention are not limited in this respect. In addition to the above modes, those skilled in the art may also use other compression modes according to actual needs, and the embodiments of the invention place no restriction on this either.
The current standard signed 8-bit fixed-point value range is [-128, 127]. The following table lists how the output bit width changes when the range of one operand of a fixed-point multiplication is limited:
Here, operand A can be the weight and operand B the input activation value; the output range and output bit width are the value range and bit width of the result (such as the Second Eigenvalue) of multiplying operand A by operand B.
Optionally, since compressing the bit width of the output activation value in real time incurs overhead, one may choose to compress part of the bit width of the weights offline, so that the weights loaded by the convolutional neural network CNN at run time are already the compressed weights.
At present, for the vast majority of applications of convolutional neural networks CNN, including image classification, object detection and segmentation, the bit width of the weights can be compressed to the ranges in the table above while a certain accuracy is maintained.
In the embodiment of the present invention, by compressing the bit number of the weight and/or the output activation value, that is, by limiting the value range of the weight and/or the output activation value, the bit number of the Second Eigenvalue can be compressed, which avoids splitting the multiply-add operation into two separate instructions, a multiplication and an addition.
It should be noted that, besides limiting the value ranges of the weight and the input activation value, a new hardware instruction may be introduced so that the multiply-add operation MAC is completed by a single instruction, and the embodiment of the present invention is not limited thereto.
S430, receive the input activation value of this convolutional layer through input channels.
The input channels have corresponding weights.
For this convolutional layer, the input activation value is the output activation value of the upper convolutional layer; that is, the output activation value of the upper convolutional layer, after its bit number is compressed, is fed through the output channels out_channels into this convolutional layer, so the input activation value received by this convolutional layer has already been compressed.
S440, carry out a fixed-point operation on the input activation value to obtain the First Eigenvalue.
S450, write the First Eigenvalue and the weight respectively into the registers of multiple register groups.
S460, for the multiple register groups, carry out a multiply-add operation according to the First Eigenvalue and the weight in the registers of each group, obtaining multiple Second Eigenvalues.
S470, merge the multiple Second Eigenvalues to obtain the third feature value.
S480, carry out a floating-point operation on the third feature value to obtain the fourth feature value.
S490, generate the output activation value of this convolutional layer according to the fourth feature value.
In the embodiment of the present invention, since the device has multiple register groups, the multiply-add operations yield multiple Second Eigenvalues.
The multiple Second Eigenvalues are merged pairwise; the data obtained after merging is the third feature value.
The third feature value is fixed-point data; remapping it to floating-point data yields the fourth feature value.
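As an illustrative sketch of steps S450 to S490 (the round-robin channel assignment, the single scale factor for the fixed-point-to-float remapping, and all function names are assumptions of this sketch, not details stated in the embodiment), the grouped multiply-add, the merging, and the floating-point remapping can be simulated as:

```python
def grouped_mac(activations, weights, num_groups):
    """S450/S460: distribute the per-channel products across register
    groups and accumulate within each group, yielding one Second
    Eigenvalue per group."""
    sums = [0] * num_groups
    for i, (a, w) in enumerate(zip(activations, weights)):
        sums[i % num_groups] += a * w   # round-robin channel assignment
    return sums

def merge_and_remap(second_values, scale):
    """S470/S480: merge the per-group Second Eigenvalues into the third
    feature value, then remap the fixed-point result to floating point
    (here via a single illustrative scale factor)."""
    third = sum(second_values)
    return third * scale   # fourth feature value, floating point
```

With two groups, grouped_mac([1, 2, 3, 4], [5, 6, 7, 8], 2) accumulates channels 1 and 3 in one group and channels 2 and 4 in the other, matching the dispersal of accumulation described above.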
The fourth feature value is normalized by means such as a BN (Batch Normalization) operator and passed through an activation operation such as a ReLU (Rectified Linear Units) function or a Sigmoid function; the result can serve as the output activation value of this convolutional layer and is output through the output channels out_channels to the lower convolutional layer.
Optionally, for this convolutional layer, after the output activation value is generated, the bit number of the output activation value can be compressed in real time, and the compressed output activation value is output through the output channels out_channels to the lower convolutional layer.
During the multiply-add operations MAC of the register groups:
For each register group, the feature multiplicands are 8 bits and the accumulated Second Eigenvalue is 16 bits. This step theoretically takes (N/G)*t1, where t1 is the average time a processing unit needs to execute one multiply-add operation MAC; since the multiply-add operations MAC can be interleaved without pipeline stalls, only the pipeline setup time needs to be considered here.
Suppose there are 6 register groups, so that accumulation yields 6 Second Eigenvalues of 16 bits each. Pairing the Second Eigenvalues and accumulating them pairwise yields 3 intermediate values of 32 bits, taking 3*t2, where t2 is the time of a single accumulation including pipeline stalls.
Accumulating the above intermediate values twice more yields the third feature value, taking 2*t2.
When (N/G) is much larger than 5, the multiply-add operations take far longer than the merge accumulations, and the amount of data (throughput) that an 8-bit multiply-add operation MAC can process at a time is theoretically 4 times that of 32-bit full-precision data; more data can therefore be processed per unit time, so the overall time consumption is smaller.
It should be noted that the input operands of the merge operation are still 16 bits and the output is 32 bits; compared with the 8-bit multiplication, the throughput is halved. Therefore, in practical applications, those skilled in the art can adjust the number of register groups and the bit width of the weights to reach a compromise between speed and precision.
Fig. 5 is a flowchart of the fixed-point calculation of a convolutional neural network provided by embodiment three of the present invention. On the basis of the foregoing embodiments, this embodiment further adds the processing of a grouped convolutional layer. The method specifically comprises the following steps:
S501, for each convolution group, enumerate the candidate arrangement modes of the assigned input channels.
In the embodiment of the present invention, a certain convolutional layer in the convolutional neural network CNN is a grouped convolutional layer (Grouped Convolution), also called a group convolutional layer, which comprises multiple convolution groups.
Compared with an ordinary convolutional layer, a grouped convolutional layer has fewer parameters and a faster operation speed; moreover, owing to its excellent performance and cache-friendly characteristics, the grouped convolutional layer has become one of the classic structures with which embedded devices implement convolutional neural networks CNN.
Suppose the upper layer outputs N feature maps, i.e. the number of input channels in_channel = N, and suppose further that the grouped convolutional layer has M convolution groups. Then, in the grouped convolutional layer, the in_channel input channels are divided into M parts; each convolution group corresponds to, and is independently connected to, N/M input channels, and after each convolution group completes its convolution, the outputs are stacked (concatenated) as the output channels out_channel of this layer.
In practical applications, the input channels in_channel assigned to each convolution group of a grouped convolutional layer are generally fixed when the convolutional neural network CNN is designed; however, according to the commutative law of addition, the input channels in_channel within a convolution group can be reordered without affecting the result of the addition.
For a convolutional layer, the order of the input channels in_channel has no bearing on the output result, and the products of input activation values and weights are positive and negative with roughly equal probability; therefore, by adjusting the order of the input channels in_channel, the result of the addition within each convolution group can be made as small as possible, thereby reducing the overflow risk.
Accordingly, the candidate arrangement modes of the assigned input channels can be enumerated offline for each convolution group.
S502, write the first trained values and the weights respectively into the registers of multiple register groups.
On the one hand, the input channels in_channel have corresponding first trained values, which serve as the input feature values of this grouped convolutional layer for training the arrangement mode.
Since the first trained values are data intended to simulate the actual usage scenario, they can be extracted directly from the test set used for training the convolutional neural network CNN.
On the other hand, the input channels in_channel have corresponding weights.
Similarly, multiple registers are provided in the device, and at least some of the registers can be divided into multiple register groups in advance.
In a sequential, random or other manner, the first trained values and weights corresponding to the input channels in_channels are respectively written into the registers of the multiple register groups.
S503, for the multiple register groups, carry out a multiply-add operation under every candidate arrangement mode according to the first trained values and the weights in the registers, obtaining multiple second trained values.
S504, determine, among the second trained values, the second trained value with the smallest absolute value as the target trained value.
In a specific implementation, the convolution groups run serially, and each convolution group can call multiple register groups to carry out multiply-add operations MAC independently.
For each convolution group, a multiply-add operation MAC can be performed on the first trained values and the weights under every candidate arrangement mode, obtaining multiple second trained values.
The goal of the optimization is to make the absolute value of the accumulation result of each register group (i.e. the second trained value) as small as possible, so as to reduce the overflow risk.
The above optimization goal can be expressed as choosing the arrangement mode s that minimizes R(s), where G is the number of register groups and R(s) is the maximum of the second trained values over the G register groups under the current arrangement mode s.
In one embodiment of the present invention, the registers of each register group include a register for storing the First Eigenvalue, a register for storing the weight, a multiplication register and an addition register.
Then, in the embodiment of the present invention, S504 may include:
S5041, in the multiplication register, carry out a multiplication operation on the first trained value and the weight corresponding to each input channel, obtaining training product data.
S5042, select from the training product data the m training product data with the smallest values, as target training product data.
S5043, write the target training product data into the addition register.
S5044, add the training product data other than the target training product data into the addition register according to each arrangement mode, obtaining the second trained values.
In the embodiment of the present invention, the training product data obtained by the multiplication between the first trained value and the weight corresponding to each input channel are estimated.
From all the training product data, the m training product data with the smallest absolute values (m is a positive integer, generally 2-32) are taken as the target training product data.
The target training product data are written into the addition registers as the initial values of the accumulation.
The remaining training product data are taken from all the training product data and added into the addition registers according to the different arrangement modes, so as to obtain the second trained values.
At this point, the input channels in_channels whose accumulation with the current register value yields the smallest absolute value are assigned to the addition registers, and the accumulated values of the addition registers are then recalculated; when the accumulation is complete, the second trained value with the smallest absolute value is obtained.
The above second trained value can be expressed as follows:
R(s) = max over the G register groups of | Σ over the ⌈N/G⌉ assigned multiplications and over W, H of w·a |
where R is the second trained value, G is the number of register groups, w is the weight, a is the First Eigenvalue, N is the length of the common edge between w and a, ⌈N/G⌉ is the number of multiplications assigned to each register group, s is the arrangement mode, and W and H are the width and height of the first trained values.
It should be noted that, besides the brute-force traversal method, the arrangement operation on the channels can also be carried out by dynamic programming to choose the optimal arrangement mode; alternatively, the optimal arrangement mode can be chosen by judging indirect indicators such as accuracy, and the embodiment of the present invention is not limited thereto.
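A minimal sketch of the brute-force variant of this search (the round-robin assignment of positions to register groups, the scoring by largest per-group |accumulated sum|, and the function name are assumptions of this sketch; it operates on precomputed channel products, as in S5041):

```python
from itertools import permutations

def best_arrangement(products, num_groups):
    """Enumerate channel orderings; for each, assign the products
    round-robin to register groups and score by the largest per-group
    absolute accumulated sum (R under that arrangement). Return the
    ordering with the smallest score, i.e. the target arrangement."""
    best_order, best_score = None, float("inf")
    for order in permutations(range(len(products))):
        sums = [0.0] * num_groups
        for pos, idx in enumerate(order):
            sums[pos % num_groups] += products[idx]
        score = max(abs(s) for s in sums)
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score
```

With products [3, -3, 2, -2] and two groups, the search pairs cancelling products in the same group, driving the peak accumulated magnitude to zero; for larger channel counts, the factorial enumeration would be replaced by dynamic programming as noted above.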
S505, set the candidate arrangement mode corresponding to the target trained value as the target arrangement mode of the input channels.
According to the channel IDs of the input channels in_channels distributed to each register, interleaving yields the final target arrangement mode.
In the grouped multiply-add stage, one 16-bit register needs to accumulate the results of (N/G) multiplications in total; once (N/G) is large, the absolute value may, during the accumulation, exceed the upper limit of what 15 bits (plus 1 sign bit) can store, causing an overflow and wrap-around (a positive number becomes negative).
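The wrap-around just described (a positive number becoming negative) can be illustrated for a signed 16-bit accumulator with a pure-Python simulation of two's-complement arithmetic (the helper name is illustrative and not tied to any particular processor):

```python
def wrap_int16(x):
    """Simulate two's-complement wrap-around of a signed 16-bit register:
    results are folded into the range [-32768, 32767]."""
    return ((x + 32768) % 65536) - 32768

# Accumulating one past 32767 (the 15-bit magnitude limit) wraps negative.
acc = wrap_int16(32767 + 1)   # -32768
```

This is exactly the failure mode the grouped accumulation and the channel reordering aim to avoid: keeping partial sums small so the accumulator never crosses this boundary.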
In the embodiment of the present invention, since the addition within a convolution group is independent of the order of the input channels, the input channels can be arranged in any order; the accumulation order is adjusted so that the increment of the absolute value after each multiply-add operation on the first trained values is minimal, thereby reducing the absolute-value peaks that occur during accumulation.
Since the first trained values realize a sampling of the ground truth, when their quantity is sufficient, the distribution of their absolute values approaches the true distribution according to the law of large numbers; it can therefore be considered that, in practical applications, this ordering also makes the increment of the absolute value after each multiply-add operation minimal.
S506, determine the input channels assigned to each convolution group.
S507, receive the input activation values in each convolution group through the assigned input channels.
As shown in Fig. 6, when the convolutional neural network CNN is applied online and this convolutional layer is a grouped convolutional layer, the input channels in_channels 601 pre-assigned to each convolution group 602 can be queried.
For each convolution group 602, the input activation values fed into the convolution group 602 are received through the assigned input channels in_channels 601.
S508, carry out a fixed-point operation on the input activation value to obtain the First Eigenvalue.
S509, determine target convolution groups in turn from the multiple convolution groups.
The multiple convolution groups run serially in sequence; the currently running convolution group is the target convolution group.
S510, in the target convolution group, write the First Eigenvalues and weights corresponding to the assigned input channels respectively into the registers of multiple register groups.
As shown in Fig. 6, when the convolution group 602 runs, it can call multiple register groups 603 and, in a sequential, random or other manner, write the First Eigenvalues and weights corresponding to the assigned input channels in_channels 601 respectively into the registers of the multiple register groups 603.
S511, in the target convolution group, carry out a multiply-add operation in each register group according to the First Eigenvalue, the weight and the preset target arrangement mode, obtaining multiple Second Eigenvalues.
In practical applications, the target arrangement mode corresponding to the input channels in_channels assigned to the target convolution group is queried, thereby determining the order in which the First Eigenvalues and weights are added within the target convolution group; after the multiplications, the additions are carried out in this order to obtain the Second Eigenvalues.
It should be noted that the target arrangement mode may be the arrangement mode trained offline, or the arrangement mode set by default when the convolutional neural network CNN is designed, and the embodiment of the present invention is not limited thereto.
In the embodiment of the present invention, by introducing convolution groups to divide the common edge, the number of common edges between the First Eigenvalues and the weights can be kept from growing as the number of input channels grows, extending the method to convolutional layers with any number of input channels.
In addition, the smaller number of input channels in each convolution group makes it possible to obtain the optimal solution of the input-channel reordering by the simple means of brute-force search.
It should be noted that, besides grouped convolutional layers, there are other ways to limit the number of elements participating in a single convolution, including but not limited to depthwise separable convolution (Depth-wise Convolution), network pruning and sparsification, dilated convolution, etc., and the embodiment of the present invention is not limited thereto.
Among them, network pruning and sparsification and dilated convolution reduce the number of elements participating in the operation by resetting some of the elements in the convolution kernel to 0.
In order to enable those skilled in the art to better understand the embodiments of the present invention, the fixed-point calculation method of the convolutional neural network in the embodiment of the present invention is illustrated below by specific examples.
Example one: an ordinary convolutional layer
This convolutional layer has 8 input channels: channel_1, channel_2, channel_3, channel_4, channel_5, channel_6, channel_7, channel_8.
Weights of 8 fixed-point data, w1, w2, w3, w4, w5, w6, w7, w8, are set for the 8 input channels respectively.
The device divides its registers into two register groups, A1 and A2.
When offline:
Compress the bit number of w1, w2, w3, w4, w5, w6, w7, w8
When online:
Receive input activation values a1, a2, a3, a4, a5, a6, a7, a8 from channel_1, channel_2, channel_3, channel_4, channel_5, channel_6, channel_7 and channel_8 respectively.
Convert a1, a2, a3, a4, a5, a6, a7, a8 into fixed-point data as the First Eigenvalues.
channel_1, channel_3, channel_5 and channel_7 are assigned to A1.
channel_2, channel_4, channel_6 and channel_8 are assigned to A2.
The Second Eigenvalue of the multiply-add operation computed in A1 is: a1*w1+a3*w3+a5*w5+a7*w7=D1
The Second Eigenvalue of the multiply-add operation computed in A2 is: a2*w2+a4*w4+a6*w6+a8*w8=D2
D1 and D2 merge into the third feature value: D1+D2=D3
D3 is mapped to a floating-point value and, after operations such as normalization and activation by an activation function, serves as the output activation value.
Compress the bit number of D3
D3 is output to lower layer's convolutional layer by output channel
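Example one can be replayed numerically (the activation and weight values below are toy numbers chosen for illustration; they are not values given in the embodiment):

```python
# Eight fixed-point input activation values (First Eigenvalues) and weights.
a = [1, 2, 3, 4, 5, 6, 7, 8]
w = [1, 1, 1, 1, 1, 1, 1, 1]

# Register group A1 takes the odd-numbered channels, A2 the even-numbered.
D1 = sum(a[i] * w[i] for i in (0, 2, 4, 6))   # a1*w1 + a3*w3 + a5*w5 + a7*w7
D2 = sum(a[i] * w[i] for i in (1, 3, 5, 7))   # a2*w2 + a4*w4 + a6*w6 + a8*w8
D3 = D1 + D2                                  # merged third feature value
```

Each register group accumulates only half of the eight products, which is the dispersal of the accumulation that reduces the per-register overflow risk.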
Example two: a grouped convolutional layer
The grouped convolutional layer has 2 convolution groups, B1 and B2, and 8 input channels: channel_1, channel_2, channel_3, channel_4, channel_5, channel_6, channel_7, channel_8.
Among them, channel_1, channel_2, channel_3 and channel_4 are assigned to B1, and channel_5, channel_6, channel_7 and channel_8 are assigned to B2.
Weights of 8 fixed-point data, w1, w2, w3, w4, w5, w6, w7, w8, are set for the 8 input channels respectively.
The device divides its registers into two register groups, A1 and A2.
When offline:
Compress the bit number of w1, w2, w3, w4, w5, w6, w7, w8
Receive the first trained values a1', a2', a3', a4', a5', a6', a7', a8' from channel_1, channel_2, channel_3, channel_4, channel_5, channel_6, channel_7 and channel_8 respectively.
In B1, estimate the output values of channel_1, channel_2, channel_3 and channel_4, i.e. a1'*w1, a2'*w2, a3'*w3 and a4'*w4.
The two channels with the smallest absolute values are channel_1 and channel_2, i.e. the absolute values of a1'*w1 and a2'*w2 are less than those of a3'*w3 and a4'*w4.
Add a3'*w3 and a4'*w4 respectively on the basis of a1'*w1.
Add a3'*w3 and a4'*w4 respectively on the basis of a2'*w2.
Suppose the sum of the absolute values of (a1'*w1+a3'*w3) and (a2'*w2+a4'*w4) is less than that of (a1'*w1+a4'*w4) and (a2'*w2+a3'*w3).
Then the optimal arrangement mode of the input channels in B1 (i.e. the target arrangement mode) is (channel_1, channel_3), (channel_2, channel_4).
In B2, estimate the output values of channel_5, channel_6, channel_7 and channel_8, i.e. a5'*w5, a6'*w6, a7'*w7 and a8'*w8.
The two channels with the smallest absolute values are channel_6 and channel_8, i.e. the absolute values of a6'*w6 and a8'*w8 are less than those of a5'*w5 and a7'*w7.
Add a5'*w5 and a7'*w7 respectively on the basis of a6'*w6.
Add a5'*w5 and a7'*w7 respectively on the basis of a8'*w8.
Suppose the sum of the absolute values of (a6'*w6+a5'*w5) and (a7'*w7+a8'*w8) is less than that of (a6'*w6+a8'*w8) and (a7'*w7+a5'*w5).
Then the optimal arrangement mode of the input channels in B2 (i.e. the target arrangement mode) is (channel_6, channel_5), (channel_7, channel_8).
When online:
Receive input activation values a1, a2, a3, a4, a5, a6, a7, a8 from channel_1, channel_2, channel_3, channel_4, channel_5, channel_6, channel_7 and channel_8 respectively.
Convert a1, a2, a3, a4, a5, a6, a7, a8 into fixed-point data as the First Eigenvalues.
channel_1, channel_2, channel_3 and channel_4 are assigned to B1.
channel_5, channel_6, channel_7 and channel_8 are assigned to B2.
B1 is processed first:
In B1, read the target arrangement mode of B1, assign channel_1 and channel_3 to A1, and assign channel_2 and channel_4 to A2.
The Second Eigenvalue of the multiply-add operation computed in A1 is: a1*w1+a3*w3=E1
The Second Eigenvalue of the multiply-add operation computed in A2 is: a2*w2+a4*w4=E2
E1 and E2 merge into: E1+E2=E3
After B1 is processed, B2 is processed:
In B2, read the target arrangement mode of B2, assign channel_6 and channel_5 to A1, and assign channel_7 and channel_8 to A2.
The Second Eigenvalue of the multiply-add operation computed in A1 is: a6*w6+a5*w5=E4
The Second Eigenvalue of the multiply-add operation computed in A2 is: a7*w7+a8*w8=E5
E4 and E5 merge into: E4+E5=E6
E3 and E6 merge into third feature value: E3+E6=E7
E7 is mapped to a floating-point value and, after operations such as normalization and activation by an activation function, serves as the output activation value.
Compress the bit number of E7
E7 is output to lower layer's convolutional layer by output channel
Fig. 7 is a structural schematic diagram of the fixed-point calculation device for a convolutional neural network provided by embodiment four of the present invention, which may specifically include the following modules:
an input activation value receiving module 710, configured to receive the input activation value of this convolutional layer through input channels, the input channels having corresponding weights;
a fixed-point conversion module 720, configured to carry out a fixed-point operation on the input activation value to obtain the First Eigenvalue;
a grouped storage module 730, configured to write the First Eigenvalue and the weight respectively into the registers of multiple register groups;
a multiply-add operation module 740, configured to, for the multiple register groups, carry out a multiply-add operation according to the First Eigenvalue and the weight in the registers, obtaining multiple Second Eigenvalues.
In an optional embodiment of the present invention, the registers of each register group include a register for storing the First Eigenvalue, a register for storing the weight, a multiplication register and an addition register;
the multiply-add operation module 740 includes:
a multiplication submodule, configured to, for each register group, carry out a multiplication operation in the multiplication register on the First Eigenvalue and the weight corresponding to the same input channel, obtaining feature product data;
an addition submodule, configured to accumulate the feature product data into the addition register to obtain the Second Eigenvalue.
In an optional embodiment of the present invention, the device further includes:
a characteristic value merging module, configured to merge the multiple Second Eigenvalues to obtain the third feature value;
a floating-point conversion module, configured to carry out a floating-point operation on the third feature value to obtain the fourth feature value;
an output activation value generation module, configured to generate the output activation value of this convolutional layer according to the fourth feature value.
In an optional embodiment of the present invention, the device further includes:
a weight compression module, configured to compress the bit number of the weight so as to compress the bit number of the Second Eigenvalue.
In an optional embodiment of the present invention, the input activation value of this convolutional layer is the output activation value of the upper convolutional layer;
the device further includes:
an output activation value compression module, configured to compress the bit number of the output activation value of the upper convolutional layer so as to compress the bit number of the Second Eigenvalue.
In an optional embodiment of the present invention, the convolutional layer is a grouped convolutional layer comprising multiple convolution groups;
the device further includes:
a candidate arrangement mode enumeration module, configured to, for each convolution group, enumerate the candidate arrangement modes of the assigned input channels, the input channels having corresponding first trained values;
a training set writing module, configured to write the first trained values and the weights respectively into the registers of multiple register groups;
a training set training module, configured to, for the multiple register groups, carry out a multiply-add operation under every candidate arrangement mode according to the first trained values and the weights in the registers, obtaining multiple second trained values;
a target trained value selection module, configured to determine, among the second trained values, the second trained value with the smallest absolute value as the target trained value;
a target arrangement mode setting module, configured to set the candidate arrangement mode corresponding to the target trained value as the target arrangement mode of the input channels.
In an optional embodiment of the present invention, the registers of each register group include a register for storing the First Eigenvalue, a register for storing the weight, a multiplication register and an addition register;
the training set training module includes:
a training product data computation submodule, configured to carry out, in the multiplication register, a multiplication operation on the first trained value and the weight corresponding to each input channel, obtaining training product data;
a target training product data selection submodule, configured to select from the training product data the m training product data with the smallest values, as target training product data;
a target training product data writing submodule, configured to write the target training product data into the addition register;
a training product data accumulation submodule, configured to add the training product data other than the target training product data into the addition register according to each arrangement mode, obtaining the second trained values.
In an optional embodiment of the present invention, the convolutional layer is a grouped convolutional layer comprising multiple convolution groups;
the input activation value receiving module includes:
a channel distribution submodule, configured to determine the input channels assigned to each convolution group;
a channel reception submodule, configured to receive the input activation values in each convolution group through the assigned input channels;
the grouped storage module includes:
a target convolution group determination submodule, configured to determine target convolution groups in turn from the multiple convolution groups;
a channel storage submodule, configured to, in the target convolution group, write the First Eigenvalues and weights corresponding to the assigned input channels respectively into the registers of multiple register groups;
the multiply-add operation module includes:
an arranged multiply-add submodule, configured to, in the target convolution group, carry out a multiply-add operation in each register group according to the First Eigenvalue, the weight and the preset target arrangement mode, obtaining multiple Second Eigenvalues.
The fixed-point calculation device for a convolutional neural network provided by the embodiment of the present invention can execute the fixed-point calculation method of a convolutional neural network provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for executing the method.
Fig. 8 is a structural schematic diagram of the equipment provided by embodiment five of the present invention. As shown in Fig. 8, the equipment includes a processor 80, a memory 81, an input device 82 and an output device 83; the number of processors 80 in the equipment can be one or more, and one processor 80 is taken as an example in Fig. 8; the processor 80, memory 81, input device 82 and output device 83 in the equipment can be connected by a bus or in other ways, and connection by a bus is taken as an example in Fig. 8.
The processor 80 includes a central processing unit (Central Processing Unit, CPU), and the registers 801 are components of the central processing unit. The registers 801 are high-speed storage elements of limited capacity; they can be used to temporarily hold instructions, data and addresses. The control unit of the central processing unit includes registers such as the instruction register (IR) and the program counter (PC); the arithmetic and logic unit of the central processing unit includes registers 801 such as the accumulator (ACC).
The memory 81, as a computer-readable storage medium, can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the fixed-point calculation method of a convolutional neural network in the embodiment of the present invention (for example, the input activation value receiving module 710, the fixed-point conversion module 720, the grouped storage module 730 and the multiply-add operation module 740 in the fixed-point calculation device for a convolutional neural network). By running the software programs, instructions and modules stored in the memory 81, the processor 80 executes the various functional applications and data processing of the equipment/terminal/server, thereby realizing the above fixed-point calculation method of a convolutional neural network.
The memory 81 may mainly include a program storage area and a data storage area, where the program storage area can store an operating system and application programs required by at least one function, and the data storage area can store data created according to the use of the terminal, etc. In addition, the memory 81 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage component. In some examples, the memory 81 may further include memories remotely located relative to the processor 80, and these remote memories can be connected to the equipment/terminal/server through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.
The input device 82 may be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the device. The output device 83 may include a display device such as a display screen.
An embodiment of the present invention also provides a storage medium containing computer-executable instructions. When executed by a computer processor, the computer-executable instructions are used to perform a fixed-point calculation method of a convolutional neural network, the method comprising:
receiving the input activation value of the current convolutional layer through input channels, each input channel having a corresponding weight;
performing a fixed-point operation on the input activation value to obtain a first eigenvalue;
writing the first eigenvalue and the weights into the registers of multiple register groupings;
for the multiple register groupings, performing multiply-add operations respectively according to the first eigenvalue and the weights in the registers, obtaining multiple second eigenvalues.
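The grouped accumulation described by these steps can be sketched in plain Python. The group count, the 16-bit partial-accumulator width, and the round-robin channel-to-grouping assignment below are illustrative assumptions, not details fixed by the patent text:

```python
GROUPS = 4           # number of register groupings (assumed)
ACC_BITS = 16        # width of each partial accumulator (assumed)
ACC_MAX = (1 << (ACC_BITS - 1)) - 1   # 32767

def grouped_mac(activations, weights, groups=GROUPS):
    """Spread the per-channel products over several accumulators so each
    one absorbs only len(activations) / groups additions, which lowers the
    chance that any single accumulator overflows."""
    assert len(activations) == len(weights)
    partial = [0] * groups
    for i, (a, w) in enumerate(zip(activations, weights)):
        g = i % groups                 # round-robin channel-to-grouping map
        partial[g] += a * w            # multiply-add inside one grouping
        if abs(partial[g]) > ACC_MAX:
            raise OverflowError(f"grouping {g} overflowed")
    return partial                     # the multiple "second eigenvalues"

acts = [12, -7, 3, 25, -14, 8, 19, -2]    # fixed-point input activations
wts  = [ 3,  5, -2,  1,   4, -6,  2,  7]  # per-channel weights
partials = grouped_mac(acts, wts)
total = sum(partials)                     # merging recovers the full sum
```

Because each grouping accumulates only a fraction of the products, the peak magnitude any single accumulator must hold is reduced, which is the overflow-risk argument made in the abstract.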
Certainly, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the fixed-point calculation of the convolutional neural network provided by any embodiment of the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention may be implemented by software plus the necessary general-purpose hardware, and certainly may also be implemented by hardware, although in many cases the former is the better embodiment. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.
It is worth noting that, in the above embodiment of the fixed-point calculation device of the convolutional neural network, the included units and modules are divided only according to functional logic, but the division is not limited to the above, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not intended to limit the protection scope of the present invention.
It should be noted that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, the present invention is not limited thereto and may include other equivalent embodiments without departing from the inventive concept, the scope of the present invention being determined by the scope of the appended claims.
Claims (14)
1. A fixed-point calculation method of a convolutional neural network, characterized in that the convolutional neural network includes a convolutional layer, and the method comprises:
receiving the input activation value of the current convolutional layer through input channels, each input channel having a corresponding weight;
performing a fixed-point operation on the input activation value to obtain a first eigenvalue;
writing the first eigenvalue and the weights into the registers of multiple register groupings;
for the multiple register groupings, performing multiply-add operations respectively according to the first eigenvalue and the weights in the registers, obtaining multiple second eigenvalues.
2. The fixed-point calculation method according to claim 1, characterized in that the registers of each register grouping include a register for storing the first eigenvalue, a register for storing the weight, a multiplication register, and an addition register;
for the multiple register groupings, performing multiply-add operations respectively according to the first eigenvalue and the weights in the registers, obtaining multiple second eigenvalues, comprises:
for each register grouping, performing a multiplication operation in the multiplication register on the first eigenvalue and the weight corresponding to the same input channel, obtaining feature product data;
accumulating the feature product data into the addition register, obtaining a second eigenvalue.
3. The fixed-point calculation method according to claim 1, characterized by further comprising:
merging the multiple second eigenvalues to obtain a third eigenvalue;
performing a floating-point operation on the third eigenvalue to obtain a fourth eigenvalue;
generating the output activation value of the current convolutional layer according to the fourth eigenvalue.
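The merge-then-dequantize step of claim 3 can be illustrated as follows. The scale factors and the use of ReLU to generate the output activation are assumptions made for the sketch; the claim does not fix a particular quantization scheme or activation function:

```python
ACT_SCALE = 1 / 128.0   # assumed fixed-point scale of the activations
WT_SCALE  = 1 / 64.0    # assumed fixed-point scale of the weights

def merge_and_dequantize(partials, bias=0.0):
    """Merge the grouped partial sums into the third eigenvalue, then
    convert back to floating point (the fourth eigenvalue) and apply an
    assumed ReLU to produce the layer's output activation value."""
    third = sum(partials)                          # merged third eigenvalue
    fourth = third * ACT_SCALE * WT_SCALE + bias   # floating-point operation
    return max(fourth, 0.0)                        # assumed ReLU output

out = merge_and_dequantize([-20, -83, 32, 11])     # negative sum -> 0.0
```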
4. The fixed-point calculation method according to any one of claims 1-3, characterized in that, before receiving the input activation value of the current convolutional layer through the input channels, the method further comprises:
compressing the bit number of the weights, so as to compress the bit number of the second eigenvalues.
5. The fixed-point calculation method according to any one of claims 1-3, characterized in that the input activation value of the current convolutional layer is the output activation value of the previous convolutional layer;
before receiving the input activation value of the current convolutional layer through the input channels, the method further comprises:
compressing the bit number of the output activation value of the previous convolutional layer, so as to compress the bit number of the second eigenvalues.
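Claims 4 and 5 both compress bit widths so that the multiply-add products, and hence the second eigenvalues, fit in fewer accumulator bits. The shift-based round-to-nearest scheme below is one assumed way to do such compression, not the patent's specified method:

```python
def compress_bits(value, from_bits=8, to_bits=4):
    """Requantize a signed fixed-point value to fewer bits by rounding away
    the low-order bits, then clamping to the narrower signed range."""
    shift = from_bits - to_bits
    rounded = (value + (1 << (shift - 1))) >> shift   # round to nearest
    lo, hi = -(1 << (to_bits - 1)), (1 << (to_bits - 1)) - 1
    return max(lo, min(hi, rounded))                  # clamp into range

# 8-bit weights compressed to 4 bits; extremes clamp to the int4 range.
compressed = [compress_bits(w) for w in [127, -128, 33, -7, 0]]
```

After compression, an int4 weight times an int8 activation needs at most 12 bits per product, so each grouped accumulator can absorb more additions before overflowing.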
6. The fixed-point calculation method according to claim 1, characterized in that the convolutional layer is a grouped convolutional layer, the grouped convolutional layer comprising multiple convolution groups;
the method further comprises:
for each convolution group, enumerating candidate arrangement modes of the assigned input channels, each input channel having a corresponding first training value;
writing the first training values and the weights into the registers of multiple register groupings;
for the multiple register groupings, performing multiply-add operations respectively according to the first training values and the weights in the registers under each candidate arrangement mode, obtaining multiple second training values;
determining the second training value with the smallest absolute value among the second training values as the target training value;
setting the candidate arrangement mode corresponding to the target training value as the target arrangement mode of the input channels.
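One literal reading of claim 6, sketched below: enumerate the candidate orderings of a convolution group's channels, run the grouped multiply-add under each ordering, and keep the ordering whose second training value has the smallest absolute value. The brute-force permutation search and the round-robin grouping are assumptions; a real implementation of the claimed training step would presumably prune this search:

```python
from itertools import permutations

def pick_arrangement(train_vals, weights, groups=2):
    """Try every candidate ordering of the input channels and keep the one
    whose smallest-magnitude second training value is minimal (the target
    training value of claim 6)."""
    best_order, best_abs = None, None
    for order in permutations(range(len(train_vals))):
        partial = [0] * groups
        for pos, ch in enumerate(order):
            partial[pos % groups] += train_vals[ch] * weights[ch]
        smallest = min(abs(p) for p in partial)     # candidate target value
        if best_abs is None or smallest < best_abs:
            best_order, best_abs = order, smallest
    return best_order, best_abs

best_order, best_abs = pick_arrangement([3, -2, 5, 1], [2, 4, 1, 3])
```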
7. The fixed-point calculation method according to claim 6, characterized in that the registers of each register grouping include a register for storing the first eigenvalue, a register for storing the weight, a multiplication register, and an addition register;
for the multiple register groupings, performing multiply-add operations respectively according to the first training values and the weights in the registers under each candidate arrangement mode, obtaining multiple second training values, comprises:
performing a multiplication operation in the multiplication register on the first training value and the weight corresponding to each input channel, obtaining training product data;
selecting the m training product data with the smallest values from the training product data as target training product data;
writing the target training product data into the addition register;
accumulating the other training product data, except the target training product data, into the addition register according to each arrangement mode, obtaining the second training values.
8. The fixed-point calculation method according to claim 1, 2, 3, 6, or 7, characterized in that the convolutional layer is a grouped convolutional layer, the grouped convolutional layer comprising multiple convolution groups;
receiving the input activation value of the current convolutional layer through input channels comprises:
determining the input channels assigned to each convolution group;
in each convolution group, receiving the input activation value through the assigned input channels;
writing the first eigenvalue and the weights into the registers of multiple register groupings comprises:
successively determining a target convolution group from the multiple convolution groups;
in the target convolution group, writing the first eigenvalues corresponding to the assigned input channels and the weights into the registers of multiple register groupings;
for the multiple register groupings, performing multiply-add operations respectively according to the first eigenvalue and the weights in the registers, obtaining multiple second eigenvalues, comprises:
in the target convolution group, performing multiply-add operations respectively according to the first eigenvalues and the weights in the register groupings under the preset target arrangement mode, obtaining multiple second eigenvalues.
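The grouped-convolution flow of claim 8 can be sketched as follows: each convolution group owns a slice of the input channels, the groups are processed in turn as the target convolution group, and the grouped multiply-add runs only over that group's channels. The contiguous channel split and the round-robin register grouping are assumptions made for illustration:

```python
def grouped_conv_mac(acts, wts, conv_groups=2, reg_groups=2):
    """Process each convolution group in turn as the target group, running
    the grouped multiply-add only over that group's assigned channels."""
    per = len(acts) // conv_groups
    results = []
    for g in range(conv_groups):                # successive target groups
        chans = range(g * per, (g + 1) * per)   # channels assigned to group g
        partial = [0] * reg_groups
        for pos, ch in enumerate(chans):
            partial[pos % reg_groups] += acts[ch] * wts[ch]
        results.append(partial)                 # second eigenvalues per group
    return results

res = grouped_conv_mac([1, 2, 3, 4], [5, 6, 7, 8])   # -> [[5, 12], [21, 32]]
```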
9. A fixed-point calculation device of a convolutional neural network, characterized in that the convolutional neural network includes a convolutional layer, and the device comprises:
an input activation value receiving module, configured to receive the input activation value of the current convolutional layer through input channels, each input channel having a corresponding weight;
a fixed-point conversion module, configured to perform a fixed-point operation on the input activation value to obtain a first eigenvalue;
a grouping storage module, configured to write the first eigenvalue and the weights into the registers of multiple register groupings;
a multiply-add operation module, configured to, for the multiple register groupings, perform multiply-add operations respectively according to the first eigenvalue and the weights in the registers, obtaining multiple second eigenvalues.
10. The fixed-point calculation device according to claim 9, characterized in that the registers of each register grouping include a register for storing the first eigenvalue, a register for storing the weight, a multiplication register, and an addition register;
the multiply-add operation module comprises:
a multiplication submodule, configured to, for each register grouping, perform a multiplication operation in the multiplication register on the first eigenvalue and the weight corresponding to the same input channel, obtaining feature product data;
an addition submodule, configured to accumulate the feature product data into the addition register, obtaining a second eigenvalue.
11. The fixed-point calculation device according to claim 9, characterized by further comprising:
an eigenvalue merging module, configured to merge the multiple second eigenvalues to obtain a third eigenvalue;
a floating-point conversion module, configured to perform a floating-point operation on the third eigenvalue to obtain a fourth eigenvalue;
an output activation value generation module, configured to generate the output activation value of the current convolutional layer according to the fourth eigenvalue.
12. The fixed-point calculation device according to any one of claims 9-11, characterized by further comprising:
a weight compression module, configured to compress the bit number of the weights, so as to compress the bit number of the second eigenvalues.
13. a kind of equipment including memory, processor and stores the computer journey that can be run on a memory and on a processor
Sequence, which is characterized in that the processor realizes such as convolutional Neural net described in any one of claims 1-8 when executing described program
The fixed-point calculation method of network.
14. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the fixed-point calculation method of the convolutional neural network according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811302449.8A CN109409514A (en) | 2018-11-02 | 2018-11-02 | Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811302449.8A CN109409514A (en) | 2018-11-02 | 2018-11-02 | Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109409514A true CN109409514A (en) | 2019-03-01 |
Family
ID=65471379
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811302449.8A Pending CN109409514A (en) | 2018-11-02 | 2018-11-02 | Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109409514A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110780845A (en) * | 2019-10-17 | 2020-02-11 | 浙江大学 | Configurable approximate multiplier for quantization convolutional neural network and implementation method thereof |
CN110796245A (en) * | 2019-10-25 | 2020-02-14 | 浪潮电子信息产业股份有限公司 | Method and device for calculating convolutional neural network model |
CN110874813A (en) * | 2020-01-16 | 2020-03-10 | 湖南极点智能科技有限公司 | Image processing method, device and equipment and readable storage medium |
CN110929862A (en) * | 2019-11-26 | 2020-03-27 | 陈子祺 | Fixed-point neural network model quantization device and method |
CN111210017A (en) * | 2019-12-24 | 2020-05-29 | 北京迈格威科技有限公司 | Method, device, equipment and storage medium for determining layout sequence and processing data |
CN111767980A (en) * | 2019-04-02 | 2020-10-13 | 杭州海康威视数字技术股份有限公司 | Model optimization method, device and equipment |
CN113408715A (en) * | 2020-03-17 | 2021-09-17 | 杭州海康威视数字技术股份有限公司 | Fixed-point method and device for neural network |
CN113785312A (en) * | 2019-05-16 | 2021-12-10 | 日立安斯泰莫株式会社 | Arithmetic device and arithmetic method |
WO2022006919A1 (en) * | 2020-07-10 | 2022-01-13 | 中国科学院自动化研究所 | Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network |
CN110298446B (en) * | 2019-06-28 | 2022-04-05 | 济南大学 | Deep neural network compression and acceleration method and system for embedded system |
CN114692833A (en) * | 2022-03-30 | 2022-07-01 | 深圳齐芯半导体有限公司 | Convolution calculation circuit, neural network processor and convolution calculation method |
CN115994561A (en) * | 2023-03-22 | 2023-04-21 | 山东云海国创云计算装备产业创新中心有限公司 | Convolutional neural network acceleration method, system, storage medium, device and equipment |
CN118426734A (en) * | 2024-07-02 | 2024-08-02 | 深圳鲲云信息科技有限公司 | Accumulator, method for accumulator and computing device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101882238A (en) * | 2010-07-15 | 2010-11-10 | 长安大学 | Wavelet neural network processor based on SOPC (System On a Programmable Chip) |
CN102665049A (en) * | 2012-03-29 | 2012-09-12 | 中国科学院半导体研究所 | Programmable visual chip-based visual image processing system |
CN103399304A (en) * | 2013-07-22 | 2013-11-20 | 西安电子科技大学 | Field programmable gate array (FPGA) implementation equipment and method for self-adaptive clutter suppression of external radiation source radar |
CN106127302A (en) * | 2016-06-23 | 2016-11-16 | 杭州华为数字技术有限公司 | Circuit for processing data, image processing system, and method and apparatus for processing data
CN107292382A (en) * | 2016-03-30 | 2017-10-24 | 中国科学院声学研究所 | Fixed-point quantization method for the activation function of a neural network acoustic model
CN107636697A (en) * | 2015-05-08 | 2018-01-26 | 高通股份有限公司 | Fixed-point neural network based on floating-point neural network quantization
CN108133270A (en) * | 2018-01-12 | 2018-06-08 | 清华大学 | Convolutional neural networks accelerating method and device |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101882238A (en) * | 2010-07-15 | 2010-11-10 | 长安大学 | Wavelet neural network processor based on SOPC (System On a Programmable Chip) |
CN102665049A (en) * | 2012-03-29 | 2012-09-12 | 中国科学院半导体研究所 | Programmable visual chip-based visual image processing system |
CN103399304A (en) * | 2013-07-22 | 2013-11-20 | 西安电子科技大学 | Field programmable gate array (FPGA) implementation equipment and method for self-adaptive clutter suppression of external radiation source radar |
CN107636697A (en) * | 2015-05-08 | 2018-01-26 | 高通股份有限公司 | Fixed-point neural network based on floating-point neural network quantization |
CN107292382A (en) * | 2016-03-30 | 2017-10-24 | 中国科学院声学研究所 | Fixed-point quantization method for the activation function of a neural network acoustic model |
CN106127302A (en) * | 2016-06-23 | 2016-11-16 | 杭州华为数字技术有限公司 | Circuit for processing data, image processing system, and method and apparatus for processing data |
CN108133270A (en) * | 2018-01-12 | 2018-06-08 | 清华大学 | Convolutional neural networks accelerating method and device |
Non-Patent Citations (1)
Title |
---|
Liu Yang: "Digital Image Object Recognition: Detailed Theory and Practice" (《数字图像物体识别理论详解与实战》), 31 January 2018, Beijing University of Posts and Telecommunications Press *
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767980B (en) * | 2019-04-02 | 2024-03-05 | 杭州海康威视数字技术股份有限公司 | Model optimization method, device and equipment |
CN111767980A (en) * | 2019-04-02 | 2020-10-13 | 杭州海康威视数字技术股份有限公司 | Model optimization method, device and equipment |
CN113785312A (en) * | 2019-05-16 | 2021-12-10 | 日立安斯泰莫株式会社 | Arithmetic device and arithmetic method |
CN113785312B (en) * | 2019-05-16 | 2024-06-07 | 日立安斯泰莫株式会社 | Arithmetic device and arithmetic method |
CN110298446B (en) * | 2019-06-28 | 2022-04-05 | 济南大学 | Deep neural network compression and acceleration method and system for embedded system |
CN110780845A (en) * | 2019-10-17 | 2020-02-11 | 浙江大学 | Configurable approximate multiplier for quantization convolutional neural network and implementation method thereof |
CN110780845B (en) * | 2019-10-17 | 2021-11-30 | 浙江大学 | Configurable approximate multiplier for quantization convolutional neural network and implementation method thereof |
CN110796245A (en) * | 2019-10-25 | 2020-02-14 | 浪潮电子信息产业股份有限公司 | Method and device for calculating convolutional neural network model |
CN110796245B (en) * | 2019-10-25 | 2022-03-22 | 浪潮电子信息产业股份有限公司 | Method and device for calculating convolutional neural network model |
CN110929862B (en) * | 2019-11-26 | 2023-08-01 | 陈子祺 | Fixed-point neural network model quantification device and method |
CN110929862A (en) * | 2019-11-26 | 2020-03-27 | 陈子祺 | Fixed-point neural network model quantization device and method |
CN111210017A (en) * | 2019-12-24 | 2020-05-29 | 北京迈格威科技有限公司 | Method, device, equipment and storage medium for determining layout sequence and processing data |
CN111210017B (en) * | 2019-12-24 | 2023-09-26 | 北京迈格威科技有限公司 | Method, device, equipment and storage medium for determining layout sequence and data processing |
CN110874813A (en) * | 2020-01-16 | 2020-03-10 | 湖南极点智能科技有限公司 | Image processing method, device and equipment and readable storage medium |
WO2021185125A1 (en) * | 2020-03-17 | 2021-09-23 | 杭州海康威视数字技术股份有限公司 | Fixed-point method and apparatus for neural network |
CN113408715A (en) * | 2020-03-17 | 2021-09-17 | 杭州海康威视数字技术股份有限公司 | Fixed-point method and device for neural network |
CN113408715B (en) * | 2020-03-17 | 2024-05-28 | 杭州海康威视数字技术股份有限公司 | Method and device for fixing neural network |
WO2022006919A1 (en) * | 2020-07-10 | 2022-01-13 | 中国科学院自动化研究所 | Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network |
CN114692833A (en) * | 2022-03-30 | 2022-07-01 | 深圳齐芯半导体有限公司 | Convolution calculation circuit, neural network processor and convolution calculation method |
CN114692833B (en) * | 2022-03-30 | 2023-11-21 | 广东齐芯半导体有限公司 | Convolution calculation circuit, neural network processor and convolution calculation method |
CN115994561A (en) * | 2023-03-22 | 2023-04-21 | 山东云海国创云计算装备产业创新中心有限公司 | Convolutional neural network acceleration method, system, storage medium, device and equipment |
CN118426734A (en) * | 2024-07-02 | 2024-08-02 | 深圳鲲云信息科技有限公司 | Accumulator, method for accumulator and computing device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109409514A (en) | Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks | |
CN110378468B (en) | Neural network accelerator based on structured pruning and low bit quantization | |
Gondimalla et al. | SparTen: A sparse tensor accelerator for convolutional neural networks | |
CN109063825B (en) | Convolutional neural network accelerator | |
US20180204110A1 (en) | Compressed neural network system using sparse parameters and design method thereof | |
CN107451659B (en) | Neural network accelerator for bit width partition and implementation method thereof | |
CN109478144B (en) | Data processing device and method | |
KR102476343B1 (en) | Apparatus and method for supporting neural network calculation of fixed-point numbers with relatively few digits | |
CN109543816B (en) | Convolutional neural network calculation method and system based on weight kneading | |
CN108053028A (en) | Data fixed point processing method, device, electronic equipment and computer storage media | |
US11797855B2 (en) | System and method of accelerating execution of a neural network | |
CN108701250A (en) | Data fixed point method and apparatus | |
CN112668708B (en) | Convolution operation device for improving data utilization rate | |
WO2019239254A1 (en) | Parallel computational architecture with reconfigurable core-level and vector-level parallelism | |
CN106127302A (en) | Circuit for processing data, image processing system, and method and apparatus for processing data | |
CN110717583B (en) | Convolution circuit, processor, chip, board card and electronic equipment | |
CN110705703A (en) | Sparse neural network processor based on systolic array | |
TWI738048B (en) | Arithmetic framework system and method for operating floating-to-fixed arithmetic framework | |
CN111985597B (en) | Model compression method and device | |
JP7085600B2 (en) | Similar area enhancement method and system using similarity between images | |
Shahshahani et al. | Memory optimization techniques for fpga based cnn implementations | |
Delaye et al. | Deep learning challenges and solutions with xilinx fpgas | |
CN110337636A (en) | Data transfer device and device | |
Ahn et al. | Deeper weight pruning without accuracy loss in deep neural networks: Signed-digit representation-based approach | |
Lei et al. | Compressing deep convolutional networks using k-means based on weights distribution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190301 ||