CN107688855A - Layered quantization method and apparatus for complex neural networks - Google Patents

Layered quantization method and apparatus for complex neural networks

Info

Publication number
CN107688855A
CN107688855A CN201610698184.2A CN201610698184A
Authority
CN
China
Prior art keywords
layers
layer
concat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610698184.2A
Other languages
Chinese (zh)
Other versions
CN107688855B (en)
Inventor
Yu Jincheng (余金城)
Yao Song (姚颂)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xilinx Inc
Original Assignee
Beijing Insight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Insight Technology Co Ltd filed Critical Beijing Insight Technology Co Ltd
Publication of CN107688855A
Application granted granted Critical
Publication of CN107688855B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to artificial neural networks (ANNs), such as convolutional neural networks (CNNs), and more particularly to compressing and accelerating artificial neural networks by fixed-point quantization of complex neural networks.

Description

Layered quantization method and apparatus for complex neural networks
Cross-reference to priority applications
This application claims priority to the previously filed Chinese patent applications 201610663201.9, "A method of optimizing an artificial neural network", and 201610663563.8, "A deep processing unit for implementing an ANN".
Technical field
The present invention relates to artificial neural networks (ANNs), such as convolutional neural networks (CNNs), and more particularly to compressing and accelerating artificial neural networks by fixed-point quantization of complex neural networks.
Background technology
Based on artificial neural network, especially convolutional neural networks (CNN, Convolutional Neural Network) Method all achieve great success in many applications, especially obtain most powerful sum always in computer vision field Widely use.
Image classification is a basic problem in computer vision (CV).Convolutional neural networks (CNN) cause image point Class precision obtains very big progress.In Image-Net Large Scale Vision Recognition Challenge (ILSVRC) 2012, Krizhevsky et al. is represented, by obtaining 84.7% preceding 5 accuracys rate, wherein CNN in classification task With great role, this is apparently higher than other traditional image classification methods.In ensuing several years, such as ILSVRC2013, ILSVRC2014 and ILSVRC2015, precision bring up to 88.8%, 93.3% and 96.4%.
Although the method based on CNN has state-of-the-art performance, need more to calculate compared with conventional method and interior Deposit resource.It is most of that large server is necessarily dependent upon based on CNN methods.However, there is one can not neglect for embedded system Depending on market, this market demands high accuracy and can real time target recognitio, such as autonomous driving vehicle and robot.But for insertion Formula system, the problem of limited battery and resource are serious.
Convolutional neural networks have a very wide range of applications in present image process field, and neutral net has training method simple The characteristics of single, calculating structure is unified.But neutral net storage amount of calculation is all very big.In present system it is many using 32 or 64 floating number numeral expression systems of person.But bulk redundancy be present in the data of neutral net, with low bit fixed-point number one The data result of neutral net is influenceed in the case of a little little.
In past patent application, the successively change precision fixed point for convolutional neural networks has been proposed in inventor Method and device.For example, it is directed to the scheme of simple branchiess neutral net.Specifically, can only be directed to pure serial Neutral net (any one layer of Layer N of neutral net, one and only one precursor layer, one and only one subsequent layer), As shown in Fig. 2 and re -training is not carried out to neutral net.Basic procedure includes:For the neutral net of input, by from defeated Enter to output, successively by a function error is minimized, decide each layer of precision, to the last one layer.
Such scheme is successively propagated using fix-point method, the front layer that each layer of fixed point will rely on.For network structure That branch merges with branch be present to have no idea to handle.Such scheme for current trend network (GoogLeNet, SqueezeNet etc.) and do not apply to.
Summary of the invention
The goal of the invention is precisely to solve the fixed-point quantization problem for networks with branches. The present invention devises a scheme that, given the fixed-point bit width, determines the position of the radix point of the fixed-point numbers. Analyzing the dynamic range of the data at the same time, we found that the dynamic range of each layer of a neural network is different, so we propose a method that quantizes the different parameters within each layer separately, according to the layer.
According to one aspect of the invention, a method of quantizing an artificial neural network (ANN) is proposed, the method comprising: (a) performing fixed-point quantization on the numerical range of the output of each layer in the artificial neural network; (b) determining each sub-network in the artificial neural network, wherein each sub-network has a concatenation layer as its last layer, and each concatenation layer merges the outputs of the multiple layers of the preceding stage into one output; (c) for each sub-network, based on the fixed-point quantization range of the concatenation layer's output, modifying the fixed-point quantization range of the output of each layer of the preceding stage of the concatenation layer.
According to another aspect of the invention, a method of quantizing an artificial neural network (ANN) is proposed, wherein the ANN is a neural network with branches, comprising at least: the 1st, 2nd, ..., n-th convolutional layers (CONV layers), the 1st, 2nd, ..., m-th fully connected layers (FC layers), and the 1st, 2nd, ..., l-th CONCAT layers, where n, m, l are positive integers. The method comprises: performing fixed-point quantization on the numerical range of the output of each of the CONV layers, FC layers and CONCAT layers; determining multiple sub-networks in the ANN, each sub-network having a CONCAT layer as its last layer, wherein each CONCAT layer merges the outputs of the multiple layers of the preceding stage into one output; and, for each sub-network, based on the fixed-point quantization range of the CONCAT layer's output, modifying the fixed-point quantization range of the output of each layer of the preceding stage of the CONCAT layer.
According to another aspect of the invention, an apparatus for quantizing an artificial neural network (ANN) is proposed, wherein the ANN is a neural network with branches, comprising at least: the 1st, 2nd, ..., n-th convolutional layers (CONV layers), the 1st, 2nd, ..., m-th fully connected layers (FC layers), and the 1st, 2nd, ..., l-th CONCAT layers, where n, m, l are positive integers. The apparatus comprises: data fixed-point quantization means for performing fixed-point quantization on the numerical range of the output of each of the CONV layers, FC layers and CONCAT layers; CONCAT-layer determining means for determining multiple sub-networks in the ANN, each sub-network having a CONCAT layer as its last layer, wherein each CONCAT layer merges the outputs of the multiple layers of the preceding stage into one output; and fixed-point adjusting means for, for each sub-network, based on the fixed-point quantization range of the CONCAT layer's output, modifying the fixed-point quantization range of the inputs of the CONCAT layer.
According to another aspect of the invention, during dynamic quantization the neural network is allowed to have a multi-layer parallel structure, for example the layer denoted "DepthConcat" in Fig. 6. The basic flow of fixed-point quantization: the input neural network is divided into several stages from input to output, where multiple parallel inputs of a given layer are regarded as one stage; the error is minimized stage by stage with an objective function, and the precision of each stage is determined, up to the last stage.
According to another aspect of the invention, the scheme can be further improved to support cross-layer structures and RoI pooling layers, which could not be supported before, yielding a more comprehensive and novel quantization method.
Brief description of the drawings
Fig. 1 shows a schematic diagram of a typical CNN.
Fig. 2 shows a schematic diagram of compressing, quantizing and compiling a CNN to achieve optimized acceleration.
Fig. 3 shows a schematic diagram of the quantization step of the flow shown in Fig. 2.
Fig. 4 shows a schematic diagram of the convolutional layers, fully connected layers and per-layer outputs of a serially connected CNN.
Fig. 5 shows the quantization scheme for a serially connected CNN.
Figs. 6a-6c show GoogLeNet, an example of a complex CNN in which both parallel and serial connections exist.
Fig. 7 shows a CONCAT operation in a complex CNN.
Fig. 8 shows a schematic diagram of the CONCAT operation.
Fig. 9 shows the quantization scheme for a complex CNN.
Fig. 10 shows an example, according to an embodiment of the invention, of adjusting the fixed-point quantization configuration of the preceding inputs based on the CONCAT layer.
Detailed description of the embodiments
Part of the content of this application was previously published in the academic article "Going Deeper with Embedded FPGA Platform for Convolutional Neural Network" (February 2016) by the inventor Yao Song. This application incorporates the content of the above article and makes further improvements on its basis.
In this application, the improvements of the present invention to CNNs are mainly explained with image processing as the example. The scheme of this application applies to various artificial neural networks, including deep neural networks (DNNs), recurrent neural networks (RNNs) and convolutional neural networks (CNNs). CNNs are used as the example below.
Basic concepts of CNNs
CNNs achieve state-of-the-art performance in a wide range of vision-related tasks. To help understand the CNN-based image classification algorithms analyzed in this application, we first describe the basics of CNNs and introduce the ImageNet dataset and existing CNN models.
As shown in Fig. 1, a typical CNN consists of a series of layers that run in order.
The parameters of a CNN model are called "weights". The first layer of a CNN reads the input image and outputs a series of feature maps. The following layers read the feature maps produced by the previous layer and output new feature maps. Finally, a classifier outputs the probability of each category that the input image may belong to. CONV layers (convolutional layers) and FC layers (fully connected layers) are two basic layer types in a CNN. A CONV layer is usually followed by a pooling layer.
In this application, for a CNN layer, $f_j^{in}$ denotes the j-th input feature map, $f_i^{out}$ denotes the i-th output feature map, and $b_i$ denotes the bias term of the i-th output map.
For a CONV layer, $n_{in}$ and $n_{out}$ denote the numbers of input and output feature maps respectively.
For an FC layer, $n_{in}$ and $n_{out}$ denote the lengths of the input and output feature vectors respectively.
Definition of CONV layers (convolutional layers): a CONV layer takes a series of feature maps as input and obtains output feature maps by convolution with convolution kernels.
A nonlinear layer, i.e. a nonlinear activation function, usually attached to a CONV layer, is applied to each element of the output feature maps.
A CONV layer can be represented by Expression 1:
$$f_i^{out} = \sum_{j=1}^{n_{in}} f_j^{in} \otimes g_{i,j} + b_i \qquad (1 \le i \le n_{out}) \tag{1}$$
where $g_{i,j}$ is the convolution kernel applied to the j-th input feature map and the i-th output feature map.
Definition of FC layers (fully connected layers): an FC layer applies a linear transformation to the input feature vector:
$$f^{out} = W f^{in} + b \tag{2}$$
where W is an $n_{out} \times n_{in}$ transformation matrix and b is the bias term. Note that for an FC layer the input is not the combination of several two-dimensional feature maps but a single feature vector. Consequently, in Expression 2 the parameters $n_{in}$ and $n_{out}$ actually correspond to the lengths of the input and output feature vectors.
Pooling layers: usually attached to a CONV layer, they output the maximum or average value of each sub-area in each feature map. Max pooling can be represented by Expression 3:
$$f_i^{out}(x, y) = \max_{0 \le m, n < p} f_i^{in}(x \cdot p + m,\ y \cdot p + n) \tag{3}$$
where p is the size of the pooling kernel. This nonlinear "down-sampling" not only reduces the feature map size and the computation for the next layer, but also provides a form of translation invariance.
A CNN can be used for image classification in the forward inference process. But before being used for any task, the CNN should first be trained on a dataset. Recent research has shown that a CNN model trained on a large dataset for a given task can be used for other tasks and achieves high accuracy with a minor adjustment of the network weights; this minor adjustment is called "fine-tuning". The training of a CNN is mainly carried out on large servers. For embedded FPGA platforms, we focus on accelerating the inference process of a CNN.
Fig. 2 shows, from the angles of processing flow and hardware architecture, the complete technical scheme proposed to accelerate CNNs.
The left side of Fig. 2 shows the artificial neural network model, i.e. the target to be optimized. The middle of Fig. 2 illustrates how the CNN model is compressed, quantized and compiled to reduce memory occupation and the amount of operations while minimizing accuracy loss. The right side of Fig. 2 shows the specialized hardware provided for the compressed CNN.
Dynamic quantization scheme for serial neural networks
Fig. 3 shows more details of the quantization step of Fig. 2.
For a fixed-point number, its value is represented as follows:
$$y = \sum_{i=0}^{bw-1} B_i \cdot 2^{-f_l} \cdot 2^i \tag{4}$$
where bw is the bit width of the number and $f_l$ is the fractional length, which can be negative.
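As an illustration of Expression 4, here is a minimal Python sketch (an illustration only: two's-complement codes and round-to-nearest are assumed, neither being mandated by the text) that maps floating-point values onto the grid representable with bit width bw and fractional length f_l:

```python
import numpy as np

def quantize(x, bw, fl):
    """Map floats to the nearest bw-bit fixed-point values with
    fractional length fl (Expression 4): y = sum_i B_i * 2**(-fl) * 2**i."""
    step = 2.0 ** (-fl)                  # resolution of the fixed-point grid
    lo = -(2 ** (bw - 1)) * step         # most negative representable value
    hi = (2 ** (bw - 1) - 1) * step      # most positive representable value
    return np.clip(np.round(x / step) * step, lo, hi)
```

For example, quantize(0.3, 8, 5) yields 0.3125, the nearest multiple of 2^-5 = 0.03125.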
To obtain the highest possible accuracy while converting floating-point numbers into fixed-point numbers, we propose a dynamic-precision data quantization strategy and an automatic workflow.
Unlike previous static-precision quantization strategies, in the proposed data quantization flow $f_l$ changes dynamically across different layers and feature map sets while staying static within one layer, so as to minimize the truncation error of each layer.
As shown in Fig. 3, the quantization flow proposed in this application consists mainly of two phases: the weight quantization phase and the data quantization phase.
The purpose of weight quantization stage is the optimal f for the weight for finding a layerl, such as expression formula 5:
Wherein W is weight, and W (bw, fl) is represented in given bw and flUnder W fixed point format.
Alternatively, the dynamic range of each layer of weight is analyzed first, such as is estimated by sampling.Afterwards, in order to Avoid data from overflowing, initialize fl.In addition, we are in initial flThe optimal f of neighborhood searchl
Optionally, in the weight fixed-point quantization step, the optimal $f_l$ is found in another way, as in Expression 6:
$$f_l = \arg\min_{f_l} \sum \Big| \sum_i k_i \big| W_{float_i} - W(bw, f_l)_i \big| \Big| \tag{6}$$
where i denotes a bit position among the bw bits and $k_i$ is the weight given to that bit. In the manner of Expression 6, different bits are given different weights, and the optimal $f_l$ is then computed.
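The neighborhood search just described could be sketched as follows, reusing quantize from the sketch above; the overflow-safe choice of the initial f_l and the search radius of 2 are assumptions, since the text does not fix them:

```python
import numpy as np

def initial_fl(x, bw):
    """Largest fractional length (finest grid) that still keeps
    max |x| inside the bw-bit range, to avoid overflow."""
    m = float(np.abs(x).max())
    return (bw - 1) - int(np.ceil(np.log2(m))) if m > 0 else bw - 1

def best_weight_fl(w, bw, radius=2):
    """Expression 5: fl = argmin_fl sum |W_float - W(bw, fl)|,
    searched in a small neighborhood of the initial fl."""
    fl0 = initial_fl(w, bw)
    return min(range(fl0 - radius, fl0 + radius + 1),
               key=lambda fl: float(np.abs(w - quantize(w, bw, fl)).sum()))
```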
The data quantization phase aims to find the optimal $f_l$ for the feature map sets between two layers of the CNN model. In this phase, the CNN can be run on a training dataset (a benchmark). The training dataset may be dataset 0.
Optionally, the weight quantization of all CONV and FC layers of the CNN is completed first, and data quantization is carried out afterwards. In that case, the training dataset is fed into the CNN whose weights have been quantized and is processed layer by layer by the CONV and FC layers, yielding each layer's input feature maps.
For each layer's input feature maps, a greedy algorithm compares the data between the fixed-point CNN model and the floating-point CNN model layer by layer in order to reduce the accuracy loss. The optimization target of each layer is shown in Expression 7:
$$f_l = \arg\min_{f_l} \sum \left| x^+_{float} - x^+(bw, f_l) \right| \tag{7}$$
In Expression 7, A denotes the computation of one layer (e.g. a certain CONV layer or FC layer), x denotes the input, and with $x^+ = Ax$, $x^+$ denotes the output of that layer. It is worth noting that for a CONV or FC layer the direct result $x^+$ has a longer bit width than the given standard, so it needs to be truncated when the optimal $f_l$ is selected. Finally, the whole data quantization configuration is generated.
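In code, the greedy criterion of Expression 7 could be read as follows (a sketch reusing quantize and initial_fl from above; the text leaves the exact feeding of intermediate results open, so here the already-quantized input is pushed through the layer and only the truncation of its raw output x+ is scored):

```python
import numpy as np

def best_output_fl(layer_fn, x_quantized, bw, radius=2):
    """Expression 7: pick the fl minimizing the error between the
    layer's raw output x+ = A(x) and its bw-bit truncation x+(bw, fl)."""
    x_plus = layer_fn(x_quantized)       # full-precision result of this layer
    fl0 = initial_fl(x_plus, bw)
    return min(range(fl0 - radius, fl0 + radius + 1),
               key=lambda fl: float(np.abs(x_plus - quantize(x_plus, bw, fl)).sum()))
```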
According to another embodiment of the invention, in the data fixed-point quantization step the optimal $f_l$ is found in another way, as in Expression 8:
$$f_l = \arg\min_{f_l} \sum \Big| \sum_N k_i \big| x^+_{float_i} - x^+(bw, f_l)_i \big| \Big| \tag{8}$$
where i denotes a bit position among the bw bits and $k_i$ is the weight given to that bit. As with Expression 6, different bits are given different weights, and the optimal $f_l$ is then computed.
The above data quantization step yields the optimal $f_l$.
In addition, weight quantization and data quantization can be carried out alternately. In terms of the order of data processing, the layers among the convolutional layers (CONV layers) and fully connected layers (FC layers) of the ANN are in a serial relationship, and the training dataset is processed layer by layer by the CONV and FC layers of the ANN to produce the feature map sets.
Specifically, the weight quantization step and the data quantization step are carried out alternately following that serial order: in the weight quantization step, after the fixed-point quantization of the current layer is completed and before the fixed-point quantization of the next layer begins, the data quantization step is performed on the feature map set output by the current layer.
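For a purely serial network, the alternating order just described might be driven as follows; this is a sketch built on the helpers above, and the layer objects with weights and forward attributes are assumptions rather than anything given in the text:

```python
def quantize_serial_network(layers, calib_data, bw):
    """Alternate weight and data quantization layer by layer:
    quantize layer k's weights, then its output feature maps,
    before touching layer k+1 (serial networks only)."""
    x = calib_data
    for layer in layers:
        layer.w_fl = best_weight_fl(layer.weights, bw)
        layer.weights = quantize(layer.weights, bw, layer.w_fl)
        layer.out_fl = best_output_fl(layer.forward, x, bw)
        x = quantize(layer.forward(x), bw, layer.out_fl)  # truncated output feeds the next layer
    return layers
```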
The above layer-by-layer variable-precision fixed-point method and apparatus apply to simple branchless neural networks.
Fig. 4 shows a purely serial neural network: for any layer Layer N of the network, there is exactly one predecessor layer and exactly one successor layer. The basic flow: for an input neural network, going from input to output, minimize the error layer by layer with an objective function and determine each layer's precision, up to the last layer.
The fixed-point approach shown in Fig. 5: the most suitable fixed-point configuration is found layer by layer.
It can be seen that the method of Fig. 5 needs to generate the fixed-point neural network online. "Online" means choosing some typical pictures and testing this series of pictures; the intermediate results become known in the course of testing these pictures. Because the scheme of Fig. 5 quantizes the neural network in a layer-by-layer manner, a testing tool that supports fixed-point numbers is needed: the input of the tool is the output that has already passed through the quantized previous layer, and the output of the tool is the result of this layer of the fixed-point network.
Fixed-point dynamic quantization scheme for complex networks
The scheme of Fig. 5 propagates the fixed-point configuration layer by layer; each layer's quantization depends on the layers before it. It has no way to handle network structures in which branches and branch merges exist.
The scheme of Fig. 5 does not apply to currently popular networks (GoogLeNet, SqueezeNet, etc.). In the method of Fig. 5, each layer's fixed-point operation depends on the fixed-point result of the previous layer, which places considerable restrictions on the network structure.
Figs. 6a-6c show GoogLeNet, an example of a complex neural network, in which the network has multiple branches and contains both serial and parallel relationships; Fig. 6c is the input of the GoogLeNet model and Fig. 6a is its output. For more information on the complex network GoogLeNet shown in Fig. 6, see the article "Going deeper with convolutions" by Christian Szegedy et al.
In a network with branches (such as GoogLeNet), a concatenation (CONCAT) operation over multiple layers exists: the outputs of multiple upper layers are connected as the inputs of the CONCAT.
The CONCAT operation connects (concatenates) the data of each input layer along the channel dimension into a new layer, which is then output to the next layer. For example, suppose a CONCAT has two inputs, input A and input B: input A's feature maps have size W×H with C1 channels, and input B's feature maps have size W×H with C2 channels. The feature map dimensions after the CONCAT layer are W×H×(C1+C2), as illustrated below.
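In array terms, the CONCAT of this example is simply a channel-axis concatenation; a small numpy illustration with made-up sizes:

```python
import numpy as np

A = np.zeros((28, 28, 64))               # input A: W x H x C1
B = np.zeros((28, 28, 32))               # input B: W x H x C2
out = np.concatenate([A, B], axis=-1)    # cascade along the channel axis
print(out.shape)                         # (28, 28, 96), i.e. W x H x (C1 + C2)
```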
In the example shown in Fig. 7, the CONCAT layer has four inputs: a 1×1 convolutional layer, a 3×3 convolutional layer, a 5×5 convolutional layer and a 3×3 max pooling layer; the CONCAT layer cascades these four inputs and provides one output. A complex neural network with branches needs the CONCAT operation, so there is a corresponding CONCAT layer in the neural network model.
Fig. 8 shows an example of the operation performed by a CONCAT layer.
A BLOB (binary large object) is a container that can store binary files. In computing, BLOB is often a database field type used to store binary files. A BLOB can be understood as a large file; a typical BLOB is a picture or an audio file. Because of their size, they must be handled in a special way (e.g. uploaded, downloaded or stored in a database).
In embodiments of the present invention, a BLOB can be understood as a four-dimensional data structure. The CONCAT layer cascades the outputs BLOB1, BLOB2, ..., BLOBn of the multiple layers of the preceding stage into one output.
Further, when the CONCAT operation is implemented in hardware, the merging of branches is realized by changing the position (memory address) of each input BLOB1, BLOB2, ..., BLOBn in memory.
As shown in Fig. 8, the fixed-point configuration information of BLOBs 1, 2, 3, ..., n may be inconsistent. In actual hardware, however, the fixed-point configurations of all inputs of a CONCAT layer are required to be consistent. Inconsistent fixed-point configuration information would cause a data conflict at the CONCAT layer, and the neural network could no longer run layer by layer to the next layer.
As shown in Fig. 9, to solve the above problem we use a new method to determine the fixed-point position of the input ranges of the neural network.
In the method shown in Fig. 9, the CNN (convolutional neural network) is a neural network with branches, comprising at least: the 1st, 2nd, ..., n-th convolutional layers (CONV layers), the 1st, 2nd, ..., m-th fully connected layers (FC layers), and the 1st, 2nd, ..., l-th CONCAT layers, where n, m, l are positive integers.
The weight parameter flow of the left branch of Fig. 9 is roughly the same as in Fig. 5. Different from the method shown in Fig. 5, the data quantization flow of the right branch of Fig. 9 comprises the following steps.
First, the numerical range of the output of each layer of the CNN (each of the CONV layers, FC layers and CONCAT layers) is estimated, where the values are floating-point numbers.
According to one embodiment of the invention, the first step comprises: supplying input data to the CNN; the input data is processed by the n convolutional layers (CONV layers), m fully connected layers (FC layers) and l CONCAT layers of the CNN, yielding the output of each layer.
Second, the numerical range of the above outputs is quantized from floating point to fixed point.
The above step quantizes every layer's output from floating-point to fixed-point numbers, where a quantization range is dynamically chosen for each layer's output and that quantization range stays constant within the layer.
According to one embodiment of the invention, the optimal $f_l$ can be computed in the manner of Expression 7 or 8, thereby determining the fixed-point quantization range.
Third, based on the fixed-point quantization range of the CONCAT layer's output, the fixed-point quantization range of each input of the CONCAT layer is modified.
The third step comprises: determining each CONCAT layer in the CNN, where each CONCAT layer merges the outputs of the multiple layers of the preceding stage into its own output. For example, multiple sub-networks can be found in the network model of the CNN, each sub-network having a CONCAT layer as its last layer, so that processing can be carried out with the sub-network as the unit.
According to one embodiment of the invention, the third step further comprises: for the outputs of the multiple layers of the preceding stage of the CONCAT layer, comparing the fixed-point quantization range of each preceding layer's output with the fixed-point quantization range of the CONCAT layer's output; if they are not the same, the fixed-point quantization range of that input is modified to the fixed-point quantization range of the CONCAT layer's output.
According to one embodiment of the invention, the third step further comprises: if some input of the preceding stage of a CONCAT layer is itself another CONCAT layer, the third step is performed iteratively with said other CONCAT layer as another sub-network.
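Putting the three sub-steps together, the alignment pass might be sketched as follows; layer objects with out_fl, predecessors and kind attributes are assumed here (they are not given in the text), and the recursion implements the iterative case just described:

```python
def align_concat_inputs(concat_layer):
    """For one sub-network: overwrite every predecessor's output
    quantization range with the CONCAT layer's, so the hardware sees
    one consistent fixed-point configuration at the merge point."""
    for prev in concat_layer.predecessors:
        if prev.out_fl != concat_layer.out_fl:
            prev.out_fl = concat_layer.out_fl
        if prev.kind == "CONCAT":        # nested CONCAT: treat it as another
            align_concat_inputs(prev)    # sub-network and iterate
```

Applied to the Fig. 10 example below, this pass rewrites CONV3 (f_l = 5) and CONV4 (f_l = 3) to the CONCAT layer's f_l = 4.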
As shown in the left branch of Fig. 9, according to one embodiment of the invention, the method further comprises a weight fixed-point quantization step: the weights of each of the CONV layers, FC layers and CONCAT layers are quantized from floating-point to fixed-point numbers.
In addition, the weight quantization flow of the left branch of Fig. 9 and the data quantization flow of the right branch can be executed simultaneously or alternately.
For example, before the data fixed-point quantization step is executed, the weight fixed-point quantization step is first completed for all CONV layers, FC layers and CONCAT layers.
Alternatively, the weight quantization step and the data quantization step can be carried out alternately. Following the order in which the input data is processed: in the weight quantization step, after the fixed-point quantization of the current layer among the convolutional layers (CONV layers), fully connected layers (FC layers) and CONCAT layers is completed, and before the fixed-point quantization of the next layer begins, the data quantization step is performed on the output of the current layer.
According to one embodiment of the invention, a fourth step is additionally included: after the third step, the fixed-point quantization range of the output of each of the CONV layers, FC layers and CONCAT layers is output.
Fig. 10 shows an example, according to an embodiment of the invention, of adjusting the fixed-point quantization configuration of the preceding inputs based on the CONCAT layer.
In the example of Fig. 10, the CONCAT layer has two inputs, the convolutional layers CONV3 and CONV4; the input of the CONV3 layer is CONV2, and the input of the CONV2 layer is CONV1.
According to the flow of Fig. 9, the first step comprises: inputting data into the neural network, obtaining the output data of each layer, and determining the numerical range of each layer's output. For example, Fig. 10 shows the numerical ranges of the outputs of the CONV layers and the CONCAT layer, e.g. following a Gaussian distribution.
In the second step, the numerical range of each layer's output is quantized from floating point to fixed point. For example, referring to Expression 4, suppose floating-point numbers are quantized to 8-bit fixed-point numbers, i.e. bw = 8. Suppose the output of the CONV3 layer is quantized with $f_l = 5$, the output of the CONV4 layer with $f_l = 3$, and the output of the CONCAT layer with $f_l = 4$.
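For intuition about these numbers (a direct consequence of Expression 4, assuming two's-complement codes), the representable range for a given bw and $f_l$ is $[-2^{bw-1} \cdot 2^{-f_l},\ (2^{bw-1}-1) \cdot 2^{-f_l}]$ with resolution $2^{-f_l}$; for bw = 8:
$f_l = 5$ (CONV3): range [-4, 3.96875], step 0.03125;
$f_l = 3$ (CONV4): range [-16, 15.875], step 0.125;
$f_l = 4$ (CONCAT): range [-8, 7.9375], step 0.0625.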
In the third step, based on the fixed-point configuration information of the CONCAT layer's output, the fixed-point quantization ranges of the inputs CONV3 and CONV4 of the CONCAT layer are modified.
For example, the fixed-point quantization range of CONV3's output ($f_l = 5$) is compared with the fixed-point quantization range of the CONCAT layer's output ($f_l = 4$). The two differ, so the fixed-point quantization range of CONV3's output is modified to that of the CONCAT layer's output, i.e. to $f_l = 4$.
Next, the fixed-point quantization range of CONV4's output ($f_l = 3$) is compared with that of the CONCAT layer's output ($f_l = 4$). The two differ, so the fixed-point quantization range of CONV4's output is modified to that of the CONCAT layer's output, i.e. to $f_l = 4$.
If the CONCAT layer has other inputs, they are modified in a similar manner.
In addition, if some input of a CONCAT layer CONCAT1 is itself a CONCAT layer CONCAT2, an iterative operation is performed. First, CONCAT2 is regarded as an input of the preceding stage and is modified according to the fixed-point configuration of the output of CONCAT1; then the modified CONCAT2 is in turn taken as the output, and the fixed-point configuration of each input of CONCAT2's preceding stage is modified.
Moreover, it should be understood that the solution of the present invention applies to various forms of complex artificial neural networks; it is not limited to branched artificial neural networks with CONCAT cascade operations. In addition, the CONCAT operation should also be understood broadly, i.e. as an operation that combines different sub-networks (or network branches) into one network.
In addition, "multiple" in the description and claims of the present invention means two or more.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts among the embodiments reference may be made from one to another.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions and operations of apparatuses, methods and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of code, and the module, program segment or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The above are merely preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall be included within the scope of protection. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item has been defined in one drawing, it need not be further defined or explained in subsequent drawings.
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any person familiar with the art can readily conceive of changes or replacements within the technical scope disclosed by the present invention, and these shall all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the scope of the claims.

Claims (25)

1. A method of quantizing an artificial neural network (ANN), wherein the ANN is a neural network with branches, comprising at least: the 1st, 2nd, ..., n-th convolutional layers (CONV layers), the 1st, 2nd, ..., m-th fully connected layers (FC layers), and the 1st, 2nd, ..., l-th concatenation layers (CONCAT layers), where n, m, l are positive integers, the method comprising:
(a) performing fixed-point quantization on the numerical range of the output of each of the CONV layers, FC layers and CONCAT layers;
(b) determining each sub-network in the ANN, each sub-network having a CONCAT layer as its last layer, wherein each CONCAT layer merges the outputs of the multiple layers of the preceding stage into one output;
(c) for each sub-network, based on the fixed-point quantization range of the output of the CONCAT layer, modifying the fixed-point quantization range of the output of each layer of the preceding stage of the CONCAT layer.
2. The method of claim 1, wherein step (c) comprises:
for the multiple layers of the preceding stage of the CONCAT layer, comparing the fixed-point quantization range of the output of each layer of the preceding stage of the CONCAT layer with the fixed-point quantization range of the output of the CONCAT layer; if they are not the same, the fixed-point quantization range of that input is modified to the fixed-point quantization range of the output of the CONCAT layer.
3. The method of claim 1, wherein step (c) comprises:
if the last layer of the sub-network is a CONCAT layer and some input of the CONCAT layer is another CONCAT layer, performing step (c) with said other CONCAT layer as another sub-network.
4. The method of claim 1, wherein step (a) comprises:
supplying input data to the ANN, the input data being processed by the n convolutional layers (CONV layers), m fully connected layers (FC layers) and l CONCAT layers of the ANN; and quantizing every layer's output from floating-point to fixed-point numbers, wherein a quantization range is dynamically chosen for each layer's output and the quantization range stays constant within that layer.
5. The method of claim 4, wherein quantizing every layer's output from floating point to fixed point further comprises:
representing a fixed-point number y using the following expression:
$$y = \sum_{i=0}^{bw-1} B_i \cdot 2^{-f_l} \cdot 2^i$$
where bw is the bit width of the fixed-point number and $f_l$ is the fractional length, which can be negative;
finding the optimal $f_l$ for the output of each layer obtained when the input data is processed by the CONV layers, FC layers and CONCAT layers of the ANN.
6. The method of claim 5, wherein the step of finding the optimal $f_l$ comprises:
analyzing the floating-point value range of every layer's output;
based on that range, setting an initial value for $f_l$;
searching the neighborhood of the initial value for the optimal $f_l$ with the following expression, such that the error between the fixed-point representation based on $f_l$ and the floating-point representation is minimized:
$$f_l = \arg\min_{f_l} \sum \left| x^+_{float} - x^+(bw, f_l) \right|$$
where $x^+ = Ax$, A denotes any layer among the CONV layers, FC layers or CONCAT layers, x denotes the input supplied to said any layer, and $x^+$ denotes the output of said any layer.
7. The method of claim 5, wherein the step of finding the optimal $f_l$ comprises:
analyzing the floating-point value range of every layer's output;
based on that range, setting an initial value for $f_l$;
searching the neighborhood of the initial value for the optimal $f_l$ with the following expression:
$$f_l = \arg\min_{f_l} \sum \Big| \sum_N k_i \big| x^+_{float_i} - x^+(bw, f_l)_i \big| \Big|$$
where $x^+ = Ax$, A denotes any layer among the CONV layers, FC layers or CONCAT layers, x denotes the input supplied to said any layer, and $x^+$ denotes the output of said any layer;
i denotes a bit position among the bw bits, and $k_i$ is the weight of that bit.
8. The method of claim 1, further comprising:
after step (c), outputting the fixed-point quantization range of the output of each of the CONV layers, FC layers and CONCAT layers.
9. The method of claim 1, further comprising:
a weight fixed-point quantization step of quantizing the weights of each of the CONV layers, FC layers and CONCAT layers from floating-point to fixed-point numbers.
10. The method of claim 9, wherein quantizing the weight parameters of each of the convolutional layers (CONV layers), fully connected layers (FC layers) and CONCAT layers of the ANN from floating-point to fixed-point numbers comprises: dynamically choosing a quantization range for the weight parameters of each layer, the quantization range staying constant within that layer.
11. The method of claim 10, wherein dynamically choosing a quantization range for the weight parameters of each layer further comprises:
representing a fixed-point number y using the following expression:
$$y = \sum_{i=0}^{bw-1} B_i \cdot 2^{-f_l} \cdot 2^i$$
where bw is the bit width of the fixed-point number and $f_l$ is the fractional length, which can be negative;
finding the optimal $f_l$ for each layer's weights, to represent said quantized weights.
12. The method of claim 11, wherein the step of finding the optimal $f_l$ comprises:
estimating the floating-point value range of each layer's weight parameters;
based on that range, setting an initial value for $f_l$;
searching the neighborhood of the initial value for the optimal $f_l$ with the following expression, such that the error between the fixed-point weight parameters under $f_l$ and the floating-point weight parameters is minimized:
$$f_l = \arg\min_{f_l} \sum \left| W_{float} - W(bw, f_l) \right|$$
where W is a weight and $W(bw, f_l)$ denotes the fixed-point format of W under the given bw and $f_l$.
13. The method of claim 11, wherein the step of finding the optimal $f_l$ comprises:
estimating the floating-point value range of each layer's weight parameters;
based on that range, setting an initial value for $f_l$;
searching the neighborhood of the initial value for the optimal $f_l$ with the following expression:
$$f_l = \arg\min_{f_l} \sum \Big| \sum k_i \big| W_{float_i} - W(bw, f_l)_i \big| \Big|$$
where W is a weight and $W(bw, f_l)$ denotes the fixed-point format of W under the given bw and $f_l$;
i denotes a bit position among the bw bits, and $k_i$ is the weight of that bit.
14. The method of claim 9, further comprising:
before step (a), completing the weight fixed-point quantization step for all CONV layers, FC layers and CONCAT layers.
15. The method of claim 9, further comprising:
carrying out the weight quantization step and step (a) alternately, wherein in the weight quantization step, after the fixed-point quantization of the current layer among the convolutional layers (CONV layers), fully connected layers (FC layers) and CONCAT layers is completed, and before the fixed-point quantization of the next layer begins, step (a) is performed on the output of that layer.
16. An apparatus for quantizing an artificial neural network (ANN), wherein the ANN is a neural network with branches, comprising at least: the 1st, 2nd, ..., n-th convolutional layers (CONV layers), the 1st, 2nd, ..., m-th fully connected layers (FC layers), and the 1st, 2nd, ..., l-th concatenation layers (CONCAT layers), where n, m, l are positive integers, the apparatus comprising:
data fixed-point quantization means for performing fixed-point quantization on the numerical range of the output of each of the CONV layers, FC layers and CONCAT layers;
CONCAT-layer determining means for determining each sub-network in the ANN, each sub-network having a CONCAT layer as its last layer, wherein each CONCAT layer merges the outputs of the multiple layers of the preceding stage into one output;
fixed-point adjusting means for, for each sub-network, based on the fixed-point quantization range of the output of the CONCAT layer, modifying the fixed-point quantization range of the output of each layer of the preceding stage of the CONCAT layer.
17. The apparatus of claim 16, wherein the fixed-point adjusting means is configured to:
for the outputs of the multiple layers of the preceding stage of the CONCAT layer, compare the fixed-point quantization range of the output of each layer of the preceding stage of the CONCAT layer with the fixed-point quantization range of the output of the CONCAT layer; if they are not the same, modify the fixed-point quantization range of that input to the fixed-point quantization range of the output of the CONCAT layer.
18. The apparatus of claim 16, wherein the fixed-point adjusting means is configured to:
if the last layer of the sub-network is a CONCAT layer and some input of the CONCAT layer is another CONCAT layer, take said other CONCAT layer as another sub-network and then carry out the fixed-point adjustment operation.
19. The apparatus of claim 16, wherein the data fixed-point quantization means is configured to:
supply input data to the ANN, the input data being processed by the n convolutional layers (CONV layers), m fully connected layers (FC layers) and l CONCAT layers of the ANN; and quantize every layer's output from floating-point to fixed-point numbers, wherein a quantization range is dynamically chosen for each layer's output and the quantization range stays constant within that layer.
20. The apparatus of claim 16, further comprising:
weight fixed-point quantization means for quantizing the weights of each of the CONV layers, FC layers and CONCAT layers from floating-point to fixed-point numbers.
21. A method of quantizing an artificial neural network (ANN), the method comprising:
(a) performing fixed-point quantization on the numerical range of the output of each layer in the artificial neural network;
(b) determining each sub-network in the artificial neural network, wherein each sub-network has a concatenation layer as its last layer, and each concatenation layer merges the outputs of the multiple layers of the preceding stage into one output;
(c) for each sub-network, based on the fixed-point quantization range of the output of the concatenation layer, modifying the fixed-point quantization range of the output of each layer of the preceding stage of the concatenation layer.
22. The method of claim 21, wherein step (c) comprises: for the multiple layers of the preceding stage of the concatenation layer, comparing the fixed-point quantization range of the output of each layer of the preceding stage of the concatenation layer with the fixed-point quantization range of the output of the concatenation layer; if they are not the same, the fixed-point quantization range of that input is modified to the fixed-point quantization range of the output of the concatenation layer.
23. The method of claim 21, wherein step (c) comprises: if the last layer of the sub-network is a concatenation layer and some input of the concatenation layer is another concatenation layer, performing step (c) with said other concatenation layer as another sub-network.
24. The method of claim 21, wherein the ANN is a neural network with branches, comprising at least: the 1st, 2nd, ..., n-th convolutional layers (CONV layers), the 1st, 2nd, ..., m-th fully connected layers (FC layers), and the 1st, 2nd, ..., l-th concatenation layers (CONCAT layers).
25. The method of claim 24, wherein step (a) comprises:
supplying input data to the ANN, the input data being processed by the n convolutional layers (CONV layers), m fully connected layers (FC layers) and l concatenation layers (CONCAT layers) of the ANN; and quantizing every layer's output from floating-point to fixed-point numbers, wherein a quantization range is dynamically chosen for every layer's output and the quantization range stays constant within that layer.
CN201610698184.2A 2016-08-12 2016-08-19 Hierarchical quantization method and device for complex neural network Active CN107688855B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN2016106635638 2016-08-12
CN201610663563 2016-08-12
CN2016106632019 2016-08-12
CN201610663201 2016-08-12

Publications (2)

Publication Number Publication Date
CN107688855A true CN107688855A (en) 2018-02-13
CN107688855B CN107688855B (en) 2021-04-13

Family

ID=61127258

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201610695285.4A Active CN107657316B (en) 2016-08-12 2016-08-19 Design of cooperative system of general processor and neural network processor
CN201610698184.2A Active CN107688855B (en) 2016-08-12 2016-08-19 Hierarchical quantization method and device for complex neural network

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201610695285.4A Active CN107657316B (en) 2016-08-12 2016-08-19 Design of cooperative system of general processor and neural network processor

Country Status (1)

Country Link
CN (2) CN107657316B (en)

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510067A (en) * 2018-04-11 2018-09-07 西安电子科技大学 The convolutional neural networks quantization method realized based on engineering
CN108805265A (en) * 2018-05-21 2018-11-13 Oppo广东移动通信有限公司 Neural network model treating method and apparatus, image processing method, mobile terminal
CN110009096A (en) * 2019-03-06 2019-07-12 开易(北京)科技有限公司 Target detection network model optimization method based on embedded device
CN110309877A (en) * 2019-06-28 2019-10-08 北京百度网讯科技有限公司 A kind of quantization method, device, electronic equipment and the storage medium of feature diagram data
CN110348562A (en) * 2019-06-19 2019-10-18 北京迈格威科技有限公司 The quantization strategy of neural network determines method, image-recognizing method and device
CN110555508A (en) * 2018-05-31 2019-12-10 北京深鉴智能科技有限公司 Artificial neural network adjusting method and device
CN110555450A (en) * 2018-05-31 2019-12-10 北京深鉴智能科技有限公司 Face recognition neural network adjusting method and device
CN110837890A (en) * 2019-10-22 2020-02-25 西安交通大学 Weight value fixed-point quantization method for lightweight convolutional neural network
CN110874628A (en) * 2018-09-03 2020-03-10 三星电子株式会社 Artificial neural network and method for controlling fixed point therein
WO2020056718A1 (en) * 2018-09-21 2020-03-26 华为技术有限公司 Quantization method and apparatus for neural network model in device
CN111144511A (en) * 2019-12-31 2020-05-12 上海云从汇临人工智能科技有限公司 Image processing method, system, medium and electronic terminal based on neural network
CN111178522A (en) * 2020-04-13 2020-05-19 杭州雄迈集成电路技术股份有限公司 Software and hardware cooperative acceleration method and system and computer readable storage medium
CN111476362A (en) * 2019-01-23 2020-07-31 StradVision, Inc. (斯特拉德视觉公司) Method and device for determining FL value
CN109523016B (en) * 2018-11-21 2020-09-01 济南大学 Multi-valued quantization depth neural network compression method and system for embedded system
WO2021036362A1 (en) * 2019-08-28 2021-03-04 上海寒武纪信息科技有限公司 Method and apparatus for processing data, and related product
CN112561933A (en) * 2020-12-15 2021-03-26 深兰人工智能(深圳)有限公司 Image segmentation method and device
CN114708180A (en) * 2022-04-15 2022-07-05 电子科技大学 Bit depth quantization and enhancement method for pre-distorted image with dynamic range preservation
US11397579B2 (en) 2018-02-13 2022-07-26 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11437032B2 (en) 2017-09-29 2022-09-06 Shanghai Cambricon Information Technology Co., Ltd Image processing apparatus and method
US11442785B2 (en) 2018-05-18 2022-09-13 Shanghai Cambricon Information Technology Co., Ltd Computation method and product thereof
US11513586B2 (en) 2018-02-14 2022-11-29 Shanghai Cambricon Information Technology Co., Ltd Control device, method and equipment for processor
US11544059B2 (en) 2018-12-28 2023-01-03 Cambricon (Xi'an) Semiconductor Co., Ltd. Signal processing device, signal processing method and related products
WO2023000898A1 (en) * 2021-07-20 2023-01-26 腾讯科技(深圳)有限公司 Image segmentation model quantization method and apparatus, computer device, and storage medium
US11609760B2 (en) 2018-02-13 2023-03-21 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11630666B2 (en) 2018-02-13 2023-04-18 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11675676B2 (en) 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
US11676029B2 (en) 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
US11703939B2 (en) 2018-09-28 2023-07-18 Shanghai Cambricon Information Technology Co., Ltd Signal processing device and related products
US11762690B2 (en) 2019-04-18 2023-09-19 Cambricon Technologies Corporation Limited Data processing method and related products
US11847554B2 (en) 2019-04-18 2023-12-19 Cambricon Technologies Corporation Limited Data processing method and related products
US11966583B2 (en) 2018-08-28 2024-04-23 Cambricon Technologies Corporation Limited Data pre-processing method and device, and related computer device and storage medium
US12001955B2 (en) 2019-08-23 2024-06-04 Anhui Cambricon Information Technology Co., Ltd. Data processing method, device, computer equipment and storage medium
US12112265B2 (en) 2020-12-18 2024-10-08 Analog Devices International Unlimited Company Architecture for running convolutional networks on memory and mips constrained embedded devices

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564165B (en) * 2018-03-13 2024-01-23 Shanghai Jiao Tong University Method and system for fixed-point optimization of convolutional neural networks
EP3770775A4 (en) * 2018-03-23 2021-06-02 Sony Corporation Information processing device and information processing method
CN108491890B (en) * 2018-04-04 2022-05-27 Baidu Online Network Technology (Beijing) Co., Ltd. Image method and device
CN110413255B (en) * 2018-04-28 2022-08-19 Xilinx Electronic Technology (Beijing) Co., Ltd. Artificial neural network adjusting method and device
KR20190136431A (en) * 2018-05-30 2019-12-10 Samsung Electronics Co., Ltd. Neural network system, application processor including the same, and method of operating the neural network system
CN110598839A (en) * 2018-06-12 2019-12-20 Huawei Technologies Co., Ltd. Convolutional neural network system and method for quantizing convolutional neural network
CN109034025A (en) * 2018-07-16 2018-12-18 Southeast University Face key point detection system based on ZYNQ
CN112805727A (en) * 2018-10-08 2021-05-14 Shen'ai Intelligent Technology Co., Ltd. Artificial neural network operation acceleration device for distributed processing, artificial neural network acceleration system using same, and method for accelerating artificial neural network
CN109389120A (en) * 2018-10-29 2019-02-26 Jinan Inspur Hi-Tech Investment and Development Co., Ltd. Object detection device based on ZynqMP
CN109740619B (en) * 2018-12-27 2021-07-13 Beijing Aerospace Feiteng Equipment Technology Co., Ltd. Neural network terminal operation method and device for target recognition
CN110889497B (en) * 2018-12-29 2021-04-23 Cambricon Technologies Corporation Limited Learning task compiling method of artificial intelligence processor and related product
CN109711367B (en) * 2018-12-29 2020-03-06 Cambricon Technologies Corporation Limited Operation method, device and related product
DE102020100209A1 (en) * 2019-01-21 2020-07-23 Samsung Electronics Co., Ltd. Neural network device, neural network system and method for processing a neural network model by using a neural network system
WO2021012148A1 (en) * 2019-07-22 2021-01-28 SZ DJI Technology Co., Ltd. Data processing method and apparatus based on deep neural network, and mobile device
CN110569713B (en) * 2019-07-22 2022-04-08 Beijing Aerospace Automatic Control Institute Target detection system and method using a DMA controller for serial-parallel two-dimensional data transfer
US11635893B2 (en) * 2019-08-12 2023-04-25 Micron Technology, Inc. Communications between processors and storage devices in automotive predictive maintenance implemented via artificial neural networks
CN110990060B (en) * 2019-12-06 2022-03-22 Beijing Hannuo Semiconductor Technology Co., Ltd. Embedded processor, instruction set and data processing method for a compute-in-memory chip
CN111626414B (en) * 2020-07-30 2020-10-27 University of Electronic Science and Technology of China Dynamic multi-precision neural network acceleration unit
CN113240101B (en) * 2021-05-13 2022-07-05 Hunan University Software-hardware co-acceleration method for implementing a convolutional neural network on a heterogeneous SoC (system on chip)
CN113361695B (en) * 2021-06-30 2023-03-24 Digital Grid Research Institute of China Southern Power Grid Co., Ltd. Convolutional neural network accelerator

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100388234C (en) * 2005-12-09 2008-05-14 ZTE Corporation Method for monitoring memory variable rewriting based on a finite state machine
CN104794102B (en) * 2015-05-14 2018-09-07 Harbin Institute of Technology Embedded SoC for accelerating Cholesky decomposition
CN105224482B (en) * 2015-10-16 2018-05-25 Inspur (Beijing) Electronic Information Industry Co., Ltd. High-speed memory system for FPGA accelerator cards
CN105630735A (en) * 2015-12-25 2016-06-01 Nanjing University Coprocessor based on a reconfigurable computing array

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016039651A1 (en) * 2014-09-09 2016-03-17 Intel Corporation Improved fixed point integer implementations for neural networks
CN105760933A (en) * 2016-02-18 2016-07-13 Tsinghua University Method and apparatus for layer-wise variable-precision fixed-pointing in convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIANTAO QIU et al.: "Going Deeper with Embedded FPGA Platform for Convolutional Neural Network", ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA) *

Cited By (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11437032B2 (en) 2017-09-29 2022-09-06 Shanghai Cambricon Information Technology Co., Ltd Image processing apparatus and method
US11609760B2 (en) 2018-02-13 2023-03-21 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11704125B2 (en) 2018-02-13 2023-07-18 Cambricon (Xi'an) Semiconductor Co., Ltd. Computing device and method
US12073215B2 (en) 2018-02-13 2024-08-27 Shanghai Cambricon Information Technology Co., Ltd Computing device with a conversion unit to convert data values between various sizes of fixed-point and floating-point data
US11740898B2 (en) 2018-02-13 2023-08-29 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11397579B2 (en) 2018-02-13 2022-07-26 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11507370B2 (en) 2018-02-13 2022-11-22 Cambricon (Xi'an) Semiconductor Co., Ltd. Method and device for dynamically adjusting decimal point positions in neural network computations
US11720357B2 (en) 2018-02-13 2023-08-08 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11630666B2 (en) 2018-02-13 2023-04-18 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11663002B2 (en) 2018-02-13 2023-05-30 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11709672B2 (en) 2018-02-13 2023-07-25 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11620130B2 (en) 2018-02-13 2023-04-04 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11513586B2 (en) 2018-02-14 2022-11-29 Shanghai Cambricon Information Technology Co., Ltd Control device, method and equipment for processor
CN108510067B (en) * 2018-04-11 2021-11-09 Xidian University Convolutional neural network quantization method based on engineering implementation
CN108510067A (en) * 2018-04-11 2018-09-07 Xidian University Convolutional neural network quantization method based on engineering implementation
US11442785B2 (en) 2018-05-18 2022-09-13 Shanghai Cambricon Information Technology Co., Ltd Computation method and product thereof
US11442786B2 (en) 2018-05-18 2022-09-13 Shanghai Cambricon Information Technology Co., Ltd Computation method and product thereof
CN108805265B (en) * 2018-05-21 2021-03-30 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Neural network model processing method and device, image processing method and mobile terminal
CN108805265A (en) * 2018-05-21 2018-11-13 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Neural network model processing method and apparatus, image processing method, and mobile terminal
CN110555450A (en) * 2018-05-31 2019-12-10 Beijing Shenjian Intelligent Technology Co., Ltd. Face recognition neural network adjusting method and device
CN110555508A (en) * 2018-05-31 2019-12-10 Beijing Shenjian Intelligent Technology Co., Ltd. Artificial neural network adjusting method and device
US11966583B2 (en) 2018-08-28 2024-04-23 Cambricon Technologies Corporation Limited Data pre-processing method and device, and related computer device and storage medium
CN110874628A (en) * 2018-09-03 2020-03-10 Samsung Electronics Co., Ltd. Artificial neural network and method for controlling fixed point therein
CN112449703A (en) * 2018-09-21 2021-03-05 Huawei Technologies Co., Ltd. Method and apparatus for quantizing a neural network model in a device
WO2020056718A1 (en) * 2018-09-21 2020-03-26 Huawei Technologies Co., Ltd. Quantization method and apparatus for neural network model in device
US11703939B2 (en) 2018-09-28 2023-07-18 Shanghai Cambricon Information Technology Co., Ltd Signal processing device and related products
CN109523016B (en) * 2018-11-21 2020-09-01 University of Jinan Multi-valued quantization deep neural network compression method and system for embedded systems
US11544059B2 (en) 2018-12-28 2023-01-03 Cambricon (Xi'an) Semiconductor Co., Ltd. Signal processing device, signal processing method and related products
CN111476362A (en) * 2019-01-23 2020-07-31 StradVision, Inc. Method and device for determining FL value
CN111476362B (en) * 2019-01-23 2024-05-03 StradVision, Inc. Method and device for determining FL value
CN110009096A (en) * 2019-03-06 2019-07-12 Kaiyi (Beijing) Technology Co., Ltd. Target detection network model optimization method based on embedded device
US11847554B2 (en) 2019-04-18 2023-12-19 Cambricon Technologies Corporation Limited Data processing method and related products
US11762690B2 (en) 2019-04-18 2023-09-19 Cambricon Technologies Corporation Limited Data processing method and related products
US11934940B2 (en) 2019-04-18 2024-03-19 Cambricon Technologies Corporation Limited AI processor simulation
US11676028B2 (en) 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
US11675676B2 (en) 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
US11676029B2 (en) 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
CN110348562B (en) * 2019-06-19 2021-10-15 Beijing Megvii Technology Co., Ltd. Neural network quantization strategy determination method, image recognition method and device
CN110348562A (en) * 2019-06-19 2019-10-18 Beijing Megvii Technology Co., Ltd. Neural network quantization strategy determination method, image recognition method and device
CN110309877B (en) * 2019-06-28 2021-12-07 Beijing Baidu Netcom Science and Technology Co., Ltd. Feature map data quantization method and device, electronic equipment and storage medium
CN110309877A (en) * 2019-06-28 2019-10-08 Beijing Baidu Netcom Science and Technology Co., Ltd. Feature map data quantization method and apparatus, electronic device, and storage medium
US12001955B2 (en) 2019-08-23 2024-06-04 Anhui Cambricon Information Technology Co., Ltd. Data processing method, device, computer equipment and storage medium
WO2021036362A1 (en) * 2019-08-28 2021-03-04 Shanghai Cambricon Information Technology Co., Ltd. Method and apparatus for processing data, and related product
JP2022502724A (en) * 2019-08-28 2022-01-11 Shanghai Cambricon Information Technology Co., Ltd. Methods, equipment, and related products for processing data
JP7034336B2 (en) 2019-08-28 2022-03-11 Shanghai Cambricon Information Technology Co., Ltd. Methods, equipment, and related products for processing data
CN110837890A (en) * 2019-10-22 2020-02-25 Xi'an Jiaotong University Weight fixed-point quantization method for lightweight convolutional neural networks
CN111144511A (en) * 2019-12-31 2020-05-12 Shanghai CloudWalk Huilin Artificial Intelligence Technology Co., Ltd. Image processing method, system, medium and electronic terminal based on neural network
CN111178522A (en) * 2020-04-13 2020-05-19 Hangzhou Xiongmai Integrated Circuit Technology Co., Ltd. Software-hardware cooperative acceleration method and system, and computer-readable storage medium
CN111178522B (en) * 2020-04-13 2020-07-10 Hangzhou Xiongmai Integrated Circuit Technology Co., Ltd. Software-hardware cooperative acceleration method and system, and computer-readable storage medium
CN112561933A (en) * 2020-12-15 2021-03-26 DeepBlue Artificial Intelligence (Shenzhen) Co., Ltd. Image segmentation method and device
US12112265B2 (en) 2020-12-18 2024-10-08 Analog Devices International Unlimited Company Architecture for running convolutional networks on memory and mips constrained embedded devices
WO2023000898A1 (en) * 2021-07-20 2023-01-26 Tencent Technology (Shenzhen) Company Limited Image segmentation model quantization method and apparatus, computer device, and storage medium
CN114708180A (en) * 2022-04-15 2022-07-05 University of Electronic Science and Technology of China Bit-depth quantization and enhancement method for predistorted images with dynamic range preservation

Also Published As

Publication number Publication date
CN107657316B (en) 2020-04-07
CN107688855B (en) 2021-04-13
CN107657316A (en) 2018-02-02

Similar Documents

Publication Publication Date Title
CN107688855A (en) Layered quantization method and apparatus for complex neural networks
CN110782015B (en) Training method, device and storage medium for network structure optimizer of neural network
EP3295385B1 (en) Fixed point neural network based on floating point neural network quantization
US11657267B2 (en) Neural network apparatus, vehicle control system, decomposition device, and program
US20190087713A1 (en) Compression of sparse deep convolutional network weights
JP6823495B2 (en) Information processing device and image recognition device
JP2019032808A (en) Machine learning method and device
US20180018555A1 (en) System and method for building artificial neural network architectures
Ding et al. Where to prune: Using LSTM to guide data-dependent soft pruning
US11586924B2 (en) Determining layer ranks for compression of deep networks
CN105760933A (en) Method and apparatus for layer-wise variable-precision fixed-pointing in convolutional neural networks
CN111882040A (en) Convolutional neural network compression method based on channel number search
CN112381763A (en) Surface defect detection method
US20230196202A1 (en) System and method for automatic building of learning machines using learning machines
CN112699958A (en) Target detection model compression and acceleration method based on pruning and knowledge distillation
CN114792378B (en) Quantum image recognition method and device
US11531888B2 (en) Method, device and computer program for creating a deep neural network
CN106339753A (en) Method for effectively enhancing robustness of convolutional neural network
CN110275928B (en) Iterative entity relation extraction method
CN109376763A (en) Sample classification method, system and medium based on multi-sample inference neural network
CN115797808A (en) Unmanned aerial vehicle inspection defect image identification method, system, device and medium
CN115879533A (en) Analog incremental learning method and system based on analog learning
CN111723203A (en) Text classification method based on lifelong learning
CN105976027A (en) Data processing method and device, chip
CN113469262A (en) Incremental learning method based on Fisher information matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20180606

Address after: 17th floor, Building 4, Yard 1, Wangzhuang Road, Haidian District, Beijing, 100083

Applicant after: Beijing Shenjian Intelligent Technology Co., Ltd.

Address before: Room 1705, Block D, Tsinghua Tongfang Technology Plaza, 1 Wangzhuang Road, Haidian District, Beijing, 100084

Applicant before: Beijing Insight Technology Co., Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20190927

Address after: 2100 Logic Drive, San Jose, California, USA

Applicant after: Xilinx, Inc.

Address before: 17th floor, Building 4, Yard 1, Wangzhuang Road, Haidian District, Beijing, 100083

Applicant before: Beijing Shenjian Intelligent Technology Co., Ltd.

GR01 Patent grant