CN107392305A - Method and computer-readable medium for implementing and executing a neural network - Google Patents
Method and computer-readable medium for implementing and executing a neural network Download PDF Info
- Publication number
- CN107392305A CN107392305A CN201710333745.3A CN201710333745A CN107392305A CN 107392305 A CN107392305 A CN 107392305A CN 201710333745 A CN201710333745 A CN 201710333745A CN 107392305 A CN107392305 A CN 107392305A
- Authority
- CN
- China
- Prior art keywords
- weight
- neural network
- reorder
- trained
- version
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Abstract
A neural network is trained to generate feature maps and associated weights. Reordering is performed to produce a functionally equivalent network. The reordering may be performed to improve at least one of weight compression, load balancing, and execution. In one implementation, zero-valued weights are grouped, allowing them to be skipped during execution.
Description
Cross-reference to related applications
This application claims the benefit of U.S. Provisional Application No. 62/336,493, filed May 13, 2016, U.S. Application No. 15/421,423, filed January 31, 2017, and Korean Patent Application No. 10-2017-0048036, filed April 13, 2017, the contents of which are incorporated herein by reference.
Technical field
Embodiments of the invention generally relate to neural networks.
Background
An artificial neural network (NN) can be designed and trained to perform a wide range of functions. Example NN applications include image processing, speech recognition, data processing, and control, among others. An NN model may include a significant number of layers and parameters (weights). Processors with highly parallel architectures, such as graphics processing units (GPUs), can facilitate efficient implementation of large-scale NNs.
Brief description of the drawings
Fig. 1 is a block diagram showing reordering of the feature maps and weights of a neural network according to an embodiment.
Fig. 2 shows a portion of a neural network according to an embodiment.
Fig. 3 shows a portion of a neural network according to an embodiment.
Fig. 4 shows a method of reordering a neural network according to an embodiment.
Fig. 5 shows a method of executing a reordered neural network according to an embodiment.
Fig. 6 shows a method of reordering a neural network, including pruning, according to an embodiment.
Fig. 7 shows a method of executing a reordered neural network to skip zero-valued weights according to an embodiment.
Figs. 8A and 8B show reordering to improve load balancing according to an embodiment.
Figs. 9A and 9B show Huffman coding of weights according to an embodiment.
Fig. 10 shows mask stream decoding and value stream decoding in a neural network according to an embodiment.
Detailed description
Fig. 1 is a high-level block diagram according to an embodiment. In one embodiment, a neural network (NN) development framework 105 generates a set of weights for all layers of the network. In one embodiment, additional processing of the weights is performed offline on a computer system. In one embodiment, optional post-processing 110 is performed, which includes pruning, which eliminates many weights by setting them to zero (0), as described in more detail below. Feature-map reordering 115 is performed, which results in an equivalent network with reordered weights. The reordered weights are compressed 120. An optimized network 125 is compiled corresponding to the reordered version of the originally trained neural network. In one embodiment, a neural network using the compressed weights may be implemented to exploit parallel processing. In addition, a neural network using the compressed weights may be implemented such that the parallel processors need not process input weight values that are all zero.
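The offline pipeline of Fig. 1 (optional pruning, then compression of the weights) can be illustrated with a minimal sketch. The function names, the threshold value, and the mask-plus-values compression format below are illustrative assumptions, not the patent's actual scheme:

```python
import numpy as np

def prune(w, threshold):
    """Optional post-processing: clamp low-magnitude weights to zero."""
    p = w.copy()
    p[np.abs(p) < threshold] = 0.0
    return p

def compress(w):
    """Stand-in for entropy coding: store a zero mask plus packed non-zeros."""
    mask = w != 0
    return mask, w[mask]

def decompress(mask, vals):
    w = np.zeros(mask.shape, dtype=vals.dtype)
    w[mask] = vals
    return w

w = np.array([0.02, 0.9, -0.01, 0.5, 0.0, -0.7])
p = prune(w, threshold=0.05)     # pruning 110 sets small weights to zero
mask, vals = compress(p)         # compression 120 stores only non-zeros
assert np.array_equal(decompress(mask, vals), p)
assert len(vals) == 3            # only the non-zero weights are stored
```

The round trip through `compress`/`decompress` is lossless, while pruning itself is the lossy step performed before compression.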
Fig. 2 is a block diagram of an example of a portion of a neural network using compressed weights according to an embodiment. A memory (e.g., static random access memory (SRAM)) is provided to store the compressed weights and input feature maps (IFMs). In one embodiment, a control unit includes special control logic for controlling the parallel units, and a central processing unit (CPU) that works in combination to control the operation of the SRAM memory, the multiply-accumulate array (MAA) units, and the input data path (IDP) units. In many NNs, such as convolutional NNs, many computations may be implemented as operations based on calculations using the MAA units.
In one embodiment, each IDP unit receives compressed weights and input feature map data and outputs decompressed weights and IFM data to the MAA units. For example, each IDP may include at least one decompressor and a buffer to buffer input data. In one embodiment, the MAA accumulates results corresponding to output feature map (OFM) data and intermediate results. One or more units (labeled DRU in Fig. 2) may be provided to support additional processing functions on the outputs of the MAA units, such as scaling, adding a bias, applying an activation function, and pooling. In one embodiment, the MAA receives an IFM and a non-zero weight from each IDP.
In one embodiment, the number of IDPs is 8, although more generally a different number of IDPs may be used. In one embodiment, the IDP units run in parallel, each supplying one non-zero weight and a group of feature map values (a subset of an IFM) to the MAA computing units. In one embodiment, the input units iterate over subsets of the IFM and the corresponding weights over a sequence of cycles to generate a set of OFMs in parallel.
Fig. 3 illustrates in greater detail an example of some of the data flows feeding the MAA units according to an embodiment. For purposes of illustration, 8 parallel IDPs and 16 MAAs are shown. More generally, however, any number of units may be configured to support parallel processing. For example, with 8 SRAM units, each individual SRAM stores a portion (e.g., 1/8) of the weights. In one embodiment, a single IDP provides one non-zero weight to the MAAs and provides one IFM (e.g., a 4×4 block) to each of the MAAs.
Fig. 4 is a flowchart showing a method of generating reordered compressed NN weights according to an embodiment. The feature maps and weights of a trained neural network are received 403. Optional optimization 404 of the trained network may be performed. The feature maps and/or weights are reordered to generate a reordered version 405 of the trained neural network. After the reordering, the weights of the reordered version of the trained neural network may be compressed 407 and stored 409 (for example, in the memory of a neural network device, although more generally the compressed weights may be stored in a storage medium or memory unit).
The stored compressed weights may then be used to execute the neural network, as illustrated in the flowchart of Fig. 5. The compressed weights are read 505 and decompressed 510. The model of the neural network is executed 515 using the weights of the reordered version of the neural network.
NN training algorithms typically result in the feature maps of an NN layer being organized arbitrarily in memory. Consequently, the weights corresponding to the feature maps will generally also be organized arbitrarily in memory. This arbitrary organization can in turn affect compression and execution efficiency. One aspect of reordering is that a neural network has many functionally equivalent orderings. However, some of these functionally equivalent orderings may be selected to obtain a structure that yields a better compression ratio than others, and this can be exploited. As an illustration, suppose that feature maps 0 and 10 of a layer can be exchanged; as long as the corresponding weights of that layer are exchanged as well, there is no effect on the input/output relationship of the NN. The same weights are applied to the same inputs, and the results are summed to the same totals in both the original and the reordered network. However, the reordering may be selected to produce a structure that is better suited to compression and/or has advantages for execution. For example, the NN weights may be reordered so that similar weights are grouped together in memory. That is, after training the NN and before its weights are compressed, the feature maps of the NN and the associated weight values may be reordered.
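The feature-map exchange described above can be checked numerically. The sketch below uses a hypothetical two-layer fully connected network: permuting the hidden feature maps of layer 1, while applying the matching permutation to the input columns of layer 2, leaves the network's input/output relationship unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # layer 1: 3 inputs -> 4 feature maps
W2 = rng.standard_normal((2, 4))   # layer 2: 4 feature maps -> 2 outputs
x = rng.standard_normal(3)

relu = lambda v: np.maximum(v, 0.0)
y_orig = W2 @ relu(W1 @ x)

perm = [2, 0, 3, 1]                # reorder the hidden feature maps
W1r = W1[perm, :]                  # permute layer-1 output rows
W2r = W2[:, perm]                  # permute layer-2 input columns to match
y_reordered = W2r @ relu(W1r @ x)

assert np.allclose(y_orig, y_reordered)   # functionally equivalent network
```

Any permutation works; the one chosen offline is the one that best groups the weights for compression or load balancing.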
In one embodiment, the neural network reordering may be selected to introduce an ordering to the weights that improves the ability to compress them (i.e., reduces the amount of data representing the NN). By reordering the network layers, a selected ordering of the weights can be introduced to provide better weight compression. One option is to perform the reordering by introducing a structure to the weights that aids in compressing them. For example, the weights may be grouped or sorted by value. Another option is to perform the reordering based on the characteristics of the coding technique used for compression (such as Huffman coding or Golomb-Rice coding). As an example, the feature maps may be reordered so that the frequency distribution is sharper in specific local regions. Moreover, the reordering may be selected to improve prediction accuracy in the coding. As another example, the network feature maps may be reordered so that the weight values tend to increase, or so that the number of zero-valued weights increases.
In addition, by redistributing the non-zero weights, zero-valued weights can be skipped more effectively during network execution. One option is to perform a reordering that groups the zero-valued weights, allowing them to be skipped during execution.
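As a simplified illustration of this grouping (ignoring, for clarity, the constraint that weight moves must correspond to feature-map exchanges), packing the zero-valued weights together lets whole SIMD groups of weights be skipped at once:

```python
import numpy as np

def groups_skipped(w, group=4):
    """Count SIMD groups whose weights are all zero and can be skipped."""
    w = w.reshape(-1, group)
    return int(np.sum(np.all(w == 0, axis=1)))

w = np.array([0, 1, 0, 0,  2, 0, 0, 0,  0, 0, 3, 0], dtype=float)
# Scattered zeros: no group of four is entirely zero, so none are skipped.
assert groups_skipped(w) == 0

# A reordering that collects the zero-valued weights together...
w_sorted = np.concatenate([w[w != 0], w[w == 0]])
# ...lets two whole groups of four be skipped during execution.
assert groups_skipped(w_sorted) == 2
```

The group size of 4 is arbitrary here; the text below mentions groups of, e.g., 16, depending on implementation details.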
As another example, the weights may be reordered to create better load balancing during parallel processing of the neural network model. For example, a reordering may be performed such that each processing unit in the parallel processing is supplied more equal amounts (e.g., about the same number) of non-zero weights over a selected number of cycles.
In one embodiment, pruning of the network and clustering of selected weights are performed after network training. Clustering includes, for example, mapping a larger number of different weight values to a smaller number of weight values to improve compression. For example, 1,000 or more slightly different weights may be mapped to 32 weight values. Clustering is also sometimes referred to as quantization.
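A toy version of this clustering step might look as follows; the one-dimensional k-means below is an illustrative sketch, not the patent's quantizer:

```python
import numpy as np

def quantize(weights, n_clusters=4, iters=20, seed=0):
    """Toy k-means: map many distinct weights to a few centroid values."""
    rng = np.random.default_rng(seed)
    w = weights.ravel()
    centroids = rng.choice(w, n_clusters, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = w[labels == k].mean()
    return labels.reshape(weights.shape), centroids

w = np.array([0.11, 0.09, 0.52, 0.48, -0.31, -0.29, 0.10, 0.50])
labels, centroids = quantize(w, n_clusters=3)
# Each weight is replaced by its cluster centroid, so at most
# 3 distinct values remain; only the small labels need be stored.
quantized = centroids[labels]
assert len(np.unique(quantized)) <= 3
```

After quantization, the weight tensor is representable as a short codebook (the centroids) plus per-weight indices, which is what the later Huffman coding of weight indices operates on.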
In one embodiment, low-magnitude weights are pruned (set to zero). In one embodiment, the pruning is performed without affecting network accuracy. In the pruning step, low-magnitude weights are clamped to zero. The network may then be retrained to adjust the remaining non-zero weights and regain any lost accuracy. That is, to offset the loss of accuracy, retraining may be performed to readjust some weights so that the overall network maintains the same or nearly the same accuracy while retaining the benefits of compression.
In one embodiment, pruning increases the percentage of zero-valued weights. This has potential advantages for both compression and execution. During execution in an end NN device, multiple weights may be applied in parallel in a SIMD manner in a given cycle (e.g., all parallel compute units apply weights, or all skip zero-valued weights). That is, zero-valued weights need not be applied during execution, because these weights have no effect. In some cases, pruning may result in a significant proportion of the weights ending up zero (e.g., about 60% to 95% or more), which in turn provides an opportunity to accelerate network execution.
In one embodiment, the zero-valued weights are grouped to improve execution. It may be difficult to eliminate the processing cycles of many zero-valued weights individually. However, multiple zero-valued weights can be skipped when they are grouped so that they are collected together in the same cycle. This can help speed up execution while also improving compression.
In addition to reordering the network and losslessly compressing the reordered weights, example embodiments may also utilize lossy compression, which may be omitted in other embodiments. In this case, together with the reordering, the weights are adjusted (e.g., by small adjustments) to improve compression.
Fig. 6 shows a method including pruning and retraining according to an embodiment. The feature maps and weights of a trained neural network are received 601.
The weights are pruned 610 to improve weight compression efficiency and reduce network computation cost. In one embodiment, the pruning is performed with a variable threshold. For example, the threshold may be selected based on a predetermined scale factor applied to a distance metric of the weights. In an example embodiment, the threshold is selected to be a value equal to about 20% of the L1 distance of each convolution kernel in a convolutional layer or of each weight vector in a fully connected layer. A different scale factor or a different distance metric may be used in alternative embodiments. In another example, the threshold may be found iteratively via dynamic programming, so as to maximize the number of zero values in each cluster generated by a rule satisfying a threshold constraint.
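A sketch of per-kernel threshold pruning follows. Reading the "20% of the L1 distance" criterion as a scale factor applied to the mean absolute weight of a kernel is an assumption made here for illustration:

```python
import numpy as np

def prune_kernel(kernel, scale=0.20):
    """Prune with a per-kernel threshold: scale times the mean absolute
    weight (one plausible reading of the L1 criterion; an assumption)."""
    threshold = scale * np.mean(np.abs(kernel))
    pruned = kernel.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

k = np.array([[0.8, -0.02], [0.05, -0.9]])
# mean |w| = (0.8 + 0.02 + 0.05 + 0.9) / 4 = 0.4425; threshold ~ 0.0885
pk = prune_kernel(k)
assert pk[0, 1] == 0.0 and pk[1, 0] == 0.0   # small weights clamped to zero
assert pk[0, 0] == 0.8 and pk[1, 1] == -0.9  # large weights survive
```

Because the threshold is computed per kernel, kernels with uniformly small weights are not wiped out wholesale, which matches the variable-threshold idea above.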
The remaining weights are retrained 615. As shown in block 620, in some embodiments the pruning and retraining may optionally be repeated one or more times until a stop condition is met, such as completing a predetermined number of iterations.
Quantization of the weights, with optional retraining, may be performed 625. In an example embodiment, the clustering of the weights is carried out based on k-means clustering, where the centroid of each cluster is used to represent the weights included in that cluster.
The groups of quantized weights are reordered 630. As previously stated, the reordering may include reordering corresponding to swapping feature maps, or feature-map nodes in fully connected layers. However, the reordering may also include reordering the weights to improve compression. The reordering may include reordering into clusters and reordering based on row and column attributes. The groups of quantized weights within a cluster may also be selected to maximize the effectiveness of prediction. For example, the reordering may include a reordering such that cluster 0 is the most common and cluster 31 the least common. As one option, the columns may be reordered in increasing order into clusters of a selected number of columns (e.g., 16, depending on implementation details) to maximize the effectiveness of compression across some of the columns. In addition, the rows may be reordered within a group of columns to compress effectively in the row dimension iteratively. For example, the elements of row 1 are predicted to be the same as row 0 plus some small positive increment, and the increments are compressed. In alternative embodiments, a cluster may be any suitable number of columns. In alternative embodiments, clusters may be formed from any suitable elements (e.g., rows).
Deltas are computed 635 relative to predictions. For example, the differences between adjacent columns and/or rows within a cluster may be computed. Other transformations may be applied to a "base" column or row that is used to predict the other columns and rows. For example, suppose column 0 is chosen as the "base" column, and every other column in the group (e.g., 16 columns) is predicted by a different scale factor applied to the base column. For example, a row may be predicted as row 0 multiplied by a scale factor plus some delta. In some cases, the deltas can be very small.
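The base-row prediction can be sketched as follows: each row in a group is modeled as a scale factor times the base row plus a small delta (the least-squares fit for the scale factor is an illustrative choice, not the patent's method):

```python
import numpy as np

def encode_deltas(block):
    """Predict each row from row 0 ('base') times a per-row scale factor;
    store the base, the scales, and the small residual deltas."""
    base = block[0].astype(float)
    scales = block @ base / (base @ base)     # least-squares scale per row
    deltas = block - np.outer(scales, base)
    return base, scales, deltas

def decode(base, scales, deltas):
    return np.outer(scales, base) + deltas

base_row = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 0.25])
block = np.vstack([base_row,
                   2.0 * base_row + 0.01,    # near-multiple of the base
                   -0.5 * base_row])          # exact multiple of the base
b, s, d = encode_deltas(block)
assert np.allclose(decode(b, s, d), block)   # encoding is invertible
assert np.max(np.abs(d)) < 0.02              # deltas stay small
```

When the rows really are near-multiples of the base, the delta stream is dominated by values close to zero, which is exactly what the entropy coding step below exploits.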
An optional adjustment of the deltas may be performed to improve compressibility 645, followed by retraining to mitigate accuracy loss. For example, to improve compressibility, the delta values may be adjusted up or down by small amounts. This adjustment constitutes the lossy part of the compression scheme.
The deltas and base predictions are then compressed 650. An encoding scheme, such as an entropy coding scheme, may be used. For example, Huffman coding may be used to represent the multiplicities of the deltas. Effective compression can be achieved by representing the most common deltas with as few bits as possible.
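A minimal Huffman construction illustrates why this works: code lengths grow with rarity, so a delta stream dominated by zeros compresses well. This is a generic textbook sketch, not the patent's encoder:

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Build Huffman code lengths: frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries are (count, unique id, symbol list); the id breaks ties.
    heap = [(n, i, [s]) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freq}
    uid = len(heap)
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:      # every merge adds one bit to these codes
            lengths[s] += 1
        heapq.heappush(heap, (n1 + n2, uid, s1 + s2))
        uid += 1
    return lengths

# A mostly-zero delta stream: the common value 0 gets the shortest code.
deltas = [0] * 12 + [1] * 3 + [2] * 2 + [5]
lengths = huffman_code_lengths(deltas)
assert lengths[0] == 1                 # most common symbol: 1 bit
assert lengths[5] >= lengths[1]        # rarer symbols: longer codes
```

With the 18 symbols above, the total cost is 12·1 + 3·2 + 2·3 + 1·3 = 27 bits instead of 36 bits at a fixed 2 bits per symbol.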
The compressed representation of the reordered model is then written to a data storage device 655.
Fig. 7 is a flowchart showing a method of execution that includes skipping zero-valued weights according to an embodiment. The compressed weights are read 705. The weights are decompressed 710. During execution of the neural network, weights are applied 715 in parallel in groups of a selected size (e.g., 16, depending on implementation details). When a cluster of (a group of) values has all of its weights set to zero, that cluster is skipped 720. Otherwise, convolutions and vector products are processed during execution as in conventional neural network execution.
In one embodiment, the manner in which zero values are handled depends in part on the layer type (e.g., convolutional layers versus fully connected layers). That is, the manner of skipping zero-valued weights depends on the layer type (which in turn corresponds to different mathematical operations, such as the vector product operation of a fully connected layer and the convolution operation of a convolutional layer). For example, zero-valued weights may be grouped to skip them more effectively in a fully connected layer computing a vector product. For a convolutional layer, however, the zero values may be distributed (spread) to aid load balancing across the parallel compute units. Because zero weights need not be combined in the convolution operation of a convolutional layer, the processing of zero values can be skipped. Consider an example of a convolutional layer with load balancing. In this example, each input unit finds the next non-zero weight for its input subset and moves to that weight. Thus, each input unit moves through its input data at a different rate, jumping from one non-zero weight to the next. If each input unit has about the same number of non-zero weights to apply within its input subset, then the system supports balancing and effectively skips the cycles that zero-valued weights would otherwise require.
Figs. 8A and 8B show an example of reordering in a convolutional layer to improve load balancing. Fig. 8A shows an example with two input units (input unit 1 and input unit 2). Input unit 1 processes feature map 1 with kernel 1 (where the * operation is convolution), and feature map 3 with kernel 3. Input unit 2 processes feature map 2 with kernel 2 and feature map 4 with kernel 4.
Fig. 8A shows an example without reordering, in which there is a large load imbalance. Input unit 1 needs 4 cycles to emit the four non-zero weights in kernel 1, then needs 3 cycles to emit the three non-zero weights in kernel 3, for a total of 7 cycles. Input unit 2 needs 5 cycles to emit the 5 non-zero weights in kernel 2, then needs 6 cycles to emit the non-zero weights in kernel 4, for a total of 11 cycles. Therefore, because of the load imbalance, 11 cycles are needed overall to process the four feature maps on the two input units.
Fig. 8B shows an example according to an embodiment, in which the reordering shuffles the IFMs in the network to obtain a more load-balanced equivalent network. Feature map 2 and feature map 3 are swapped by redefining the neural network, and the corresponding weight kernels are exchanged as well. Thus, feature map 3 is reordered as feature map 3' with corresponding kernel 3'. Feature map 2' and its corresponding kernel 2' are likewise reordered. In this example, the reordering results in greater load balance. Input unit 1 needs four cycles to emit the four non-zero weights in kernel 1, then needs 5 cycles to emit the non-zero weights of kernel 3', for a total of 9 cycles to process feature map 1 and feature map 3'. Input unit 2 needs three cycles to emit the three non-zero weights in kernel 2', and needs six cycles to emit the non-zero weights of kernel 4, for a total of 9 cycles. Therefore, in Fig. 8B, nine cycles are needed to process the four feature maps on the two input units.
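The cycle counts described for Figs. 8A and 8B can be reproduced with a one-line model: each input unit emits one non-zero weight per cycle, and the layer completes when the busiest unit finishes:

```python
def cycles(assignments):
    """Total cycles = the busiest unit's non-zero-weight count, where
    assignments[i] lists the non-zero counts of the kernels on unit i."""
    return max(sum(unit) for unit in assignments)

# Before reordering (Fig. 8A): unit 1 gets kernels 1 and 3 (4 + 3 non-zeros),
# unit 2 gets kernels 2 and 4 (5 + 6 non-zeros).
assert cycles([[4, 3], [5, 6]]) == 11

# After swapping feature maps 2 and 3 (Fig. 8B): unit 1 gets 4 + 5,
# unit 2 gets 3 + 6, so both finish in 9 cycles.
assert cycles([[4, 5], [3, 6]]) == 9
```

The total work (18 non-zero weights) is unchanged; only its distribution across units improves, which is why the reordered network is functionally equivalent yet faster.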
In one embodiment, hardware support is provided for on-the-fly load balancing. For example, offline processing may perform an optional reordering to obtain the IFM reordering and perform the OFM reordering. In one embodiment, remapping logic and a remapping table are supported to specify variable remappings performed during network hardware execution.
As previously discussed, reordering can result in an equivalent version of the same network, for example by exchanging the feature maps of different layers and exchanging the corresponding weights (e.g., exchanging maps 2 and 10 and exchanging the weights corresponding to maps 2 and 10). However, in one embodiment, the reordering includes producing an additional remapping table to assist the hardware in a neural processing unit. The remapping table may instruct the hardware to perform the exchange. For example, the remapping table may instruct the hardware computing output map 3 to exchange input maps 2 and 10.
As previously discussed, many different data compression algorithms can be used for the weights, such as, but not limited to, Huffman coding or any other suitable compression algorithm, such as Golomb-Rice coding. Compression performance can depend on the organization of the data to be compressed. For example, the compression may depend on making a prediction and representing the differences from the prediction using a variable number of bits. For example, more common values are compressed into fewer bits.
Figs. 9A and 9B show aspects of Huffman coding according to an embodiment of the invention. As shown in Fig. 9A, weight decoding can in principle be performed using a single shared Huffman table. For a sequence of output nodes (e.g., output nodes 0, 1...7), there is a group of weight indices. The weight indices have a usage distribution in which low indices are more common than high indices. A single Huffman table is used for the low indices used at higher frequency across the whole weight group. However, in Fig. 9A, suppose the weight index usage is distributed such that low indices are more common than high indices, but are more common in the left columns than in the right columns. In the example of Fig. 9A, each column of weight indices is in random order. For example, column O0 has a random index distribution corresponding to whatever came out of training, and for each column of weight indices in Fig. 9A, column O1 has a random index distribution, and so on.
Fig. 9B shows the use of Huffman coding for context-adaptive variable weight compression according to an embodiment. The columns (and/or rows) can be sorted by the frequency of low indices to generate a weight organization that allows two or more different Huffman tables to be used. For example, the distribution of weight index usage may be selected so that, compared with the right-hand columns, low indices are more common than high indices for the left-hand columns. In the example of Fig. 9B, the reordering moves low-valued weights to one side of the matrix and high values to the opposite side. After the reordering of the weight matrix, a group of Huffman tables is optimized for subsets of the nodes. For example, each table may correspond to a different group of nodes, where each table has a different frequency of low indices. For example, consider first the leftmost two columns. The weight index column for output node O0' has the most common low weight indices among the columns. The weight index column for output node O1' has an index distribution similar to the column to its left. The weight indices for the first two nodes (0' and 1') use a first Huffman table for nodes 0' and 1', corresponding to a very high frequency of low indices. Moving to the next two columns, the weight index column for output node 2' has low indices that are less common than in the columns to its left. The weight index column for output node 3' has a distribution similar to the column to its left. The weight indices for nodes 2' and 3' use a second Huffman table for nodes 2' and 3'. The ordering continues from left to right across the reordered output nodes, ending at output node 6', which has the least common low indices, and output node 7', whose weight index column has a distribution similar to that for output node 6'.
Fig. 10 shows an embodiment in which an IDP decompressor for Huffman or Golomb-Rice decoding includes a compressed weight mask stream decoder and a compressed weight value stream decoder. In one embodiment, a weight kernel is represented with a mask specifying the (pruned) weights and indices for the non-zero weights. A further lookup table (LUT) may be provided to support the decoding. In one embodiment, the output includes a zero mask buffer and a weight value buffer.
Exemplary embodiments may be deployed as an electronic device including a processor and a memory storing instructions. Furthermore, it should be understood that embodiments may be deployed as a standalone device or deployed across multiple devices in a distributed client-server networked system.
A non-limiting example of an execution environment for embodiments of the invention is a graphics processing unit (GPU). Although a GPU can provide substantial computing power for implementing an NN, it may be difficult to implement an NN on a device with limited memory and/or power. By clustering zero-valued weights so that they can be skipped more effectively, the example embodiments disclosed herein can achieve improved compression of the neural network weight parameters stored in the memory of a GPU, and provide improved efficiency of network execution.
Herein, where appropriate, a computer-readable non-transitory storage medium or media may include one or more semiconductor-based or other integrated circuits (ICs) (such as, for example, field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs)), hard disk drives (HDDs), hybrid hard drives (HHDs), optical discs, optical disc drives (ODDs), magneto-optical discs, magneto-optical drives, floppy disks, floppy disk drives (FDDs), magnetic tapes, solid-state drives (SSDs), RAM drives, secure digital cards or drives, any other suitable computer-readable non-transitory storage media, or any suitable combination of two or more of these. Where appropriate, a computer-readable non-transitory storage medium may be volatile, non-volatile, or a combination of volatile and non-volatile.
Herein, "or" is inclusive and not exclusive, unless expressly indicated otherwise or indicated otherwise by context. Therefore, herein, "A or B" means "A, B, or both", unless expressly indicated otherwise or indicated otherwise by context. Moreover, "and" is both joint and several, unless expressly indicated otherwise or indicated otherwise by context. Therefore, herein, "A and B" means "A and B, jointly or severally", unless expressly indicated otherwise or indicated otherwise by context.
The scope of this disclosure encompasses all changes, substitutions, variations, alterations, and modifications to the example embodiments described or illustrated herein that a person having ordinary skill in the art would comprehend. The scope of this disclosure is not limited to the example embodiments described or illustrated herein. Moreover, although this disclosure describes and illustrates respective embodiments herein as including particular components, elements, features, functions, operations, or steps, any of these embodiments may include any combination or permutation of any of the components, elements, features, functions, operations, or steps described or illustrated anywhere herein that a person having ordinary skill in the art would comprehend. Additionally, although this disclosure describes or illustrates particular embodiments as providing particular advantages, particular embodiments may provide some or all of these advantages, or none of them.
Although the present invention has been described in conjunction with specific embodiments, it should be understood that the invention is not intended to be limited to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. The invention may be practiced without some or all of these specific details. In addition, well-known features may not have been described in detail to avoid unnecessarily obscuring the invention. In accordance with the present invention, components, process steps, and/or data structures may be implemented using various types of operating systems, programming languages, computing platforms, computer programs, and/or computing devices. In addition, those of ordinary skill in the art will recognize that devices such as hardwired devices, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and the like may also be used without departing from the scope and spirit of the inventive concepts disclosed herein. The present invention may also be tangibly embodied as a set of computer instructions stored on a computer-readable medium, such as a memory device.
Claims (20)
1. a kind of method for realizing neutral net, including:
Receive the data for housebroken neutral net, including characteristic pattern and weight;
The characteristic pattern and/or weight of the housebroken neutral net that reorders, to generate reordering for housebroken neutral net
Version;And
After execution is reordered, the weight of the version to reorder of housebroken neutral net is compressed.
2. according to the method for claim 1, wherein the packet reordering includes the characteristic pattern of the neutral net that reorders to reset
The weight of sequence neutral net.
3. according to the method for claim 1, wherein the packet reordering includes the weight for the neutral net that reorders, with quilt
Select to improve the structure of compression efficiency compared with the weight of received data.
4. according to the method for claim 1, wherein the packet reordering includes considers to reorder in weight based on load balancing
It is at least some with distribution of weights.
5. according to the method for claim 1, at least some weights are divided by weighted value wherein the packet reordering includes
Group.
6. according to the method for claim 5, wherein at least some null value weight is grouped.
7. The method of claim 1, further comprising, before the reordering, clustering the weights by mapping weights having a first number of different weight values to a second number of different weight values, wherein the second number is less than the first number.
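Claims 5 to 7 describe grouping weights by value and clustering many distinct weight values down to a smaller set before reordering. A minimal illustrative sketch of such clustering, here using a simple one-dimensional k-means (Lloyd's algorithm); the claims do not mandate any particular clustering algorithm:

```python
def cluster_weights(weights, num_clusters, iters=20):
    """Map a list of weights with many distinct values onto at most
    `num_clusters` representative values (1-D k-means / Lloyd's)."""
    lo, hi = min(weights), max(weights)
    # Initialize centroids evenly across the weight range.
    step = (hi - lo) / (num_clusters - 1)
    centroids = [lo + k * step for k in range(num_clusters)]
    assign = [0] * len(weights)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        assign = [min(range(num_clusters),
                      key=lambda k: abs(w - centroids[k]))
                  for w in weights]
        # Move each centroid to the mean of its assigned weights.
        for k in range(num_clusters):
            members = [w for w, a in zip(weights, assign) if a == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return [centroids[a] for a in assign]

weights = [0.11, 0.09, 0.52, 0.48, -0.31, -0.29, 0.0, 0.02]
clustered = cluster_weights(weights, num_clusters=4)
print(sorted(set(clustered)))  # at most 4 distinct values remain
```

After clustering, each weight takes one of a few representative values, which makes runs of equal (and especially zero) values longer and the subsequent compression step more effective.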
8. The method of claim 1, further comprising, before the compressing, reordering indexes of the input and output nodes of the reordered weights.
9. The method of claim 1, wherein the reordered version of the trained neural network is an equivalent version of the trained neural network.
10. The method of claim 1, wherein the reordering includes generating a remapping table for the neural network to implement a remapping of feature maps that realizes the reordered version of the trained neural network.
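Claims 1 to 3 and 10 together describe reordering weights into a structure chosen for better compression while recording a remapping table that keeps the reordered network functionally equivalent. A toy sketch under those assumptions (not the patented method): rows of a fully connected weight matrix are sorted so zero-heavy rows become contiguous, which helps run-length-style compression, and the returned permutation serves as the remapping table for the corresponding output feature maps:

```python
def reorder_rows_for_compression(weight_matrix):
    """Sort the rows of a weight matrix so rows with the most zeros
    come first, grouping zeros into long runs that compress well.
    Returns the reordered matrix plus the permutation, which acts as
    the remapping table for the output feature maps."""
    order = sorted(range(len(weight_matrix)),
                   key=lambda r: -weight_matrix[r].count(0.0))
    reordered = [weight_matrix[r] for r in order]
    return reordered, order  # `order` is the remapping table

W = [
    [0.5, 0.1, 0.0, 0.2],   # row 0: one zero
    [0.0, 0.0, 0.0, 0.0],   # row 1: all zero
    [0.3, 0.0, 0.0, 0.0],   # row 2: three zeros
    [0.0, 0.0, 0.7, 0.0],   # row 3: three zeros
]
Wr, remap = reorder_rows_for_compression(W)
print(remap)   # permutation to apply/undo at execution time
print(Wr[0])   # the all-zero row now leads the matrix
```

Because the permutation is recorded, execution can consume the reordered weights directly and remap the produced feature maps back, so the network remains equivalent to the trained one.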
11. A method of executing a neural network, comprising:
providing a model of the neural network, wherein the model corresponds to a reordered version of a trained neural network generated by reordering feature maps and/or weights of the trained neural network; and
executing the model of the neural network.
12. The method of claim 11, wherein executing the model includes skipping execution of groups having all-zero weights.
13. The method of claim 11, wherein executing the model includes skipping execution of distributed zero-value weights in a convolution pattern.
14. The method of claim 11, wherein the reordered version includes an ordering of the weights based on load balance conditions for execution on a group of parallel processing units.
15. The method of claim 11, wherein the model of the neural network is executed on a group of parallel processing units, and the reordered version has non-zero weight values distributed based on load balance conditions such that, for at least one convolutional layer, each parallel processing unit operates on approximately the same average number of non-zero weights per cycle over a number of cycles.
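Claims 14 and 15 describe ordering weights so that, across a group of parallel processing units, each unit sees roughly the same number of non-zero weights per cycle. A hypothetical greedy sketch of that idea, dealing weight rows to the unit with the fewest non-zeros so far (a least-loaded heuristic, not necessarily the patented scheme):

```python
def balance_rows(weight_rows, num_units):
    """Greedily assign weight rows to parallel units so the total
    non-zero count per unit stays roughly equal (least-loaded first)."""
    units = [[] for _ in range(num_units)]
    loads = [0] * num_units
    # Hand out the heaviest rows first for a tighter balance.
    for row in sorted(weight_rows,
                      key=lambda r: -sum(1 for w in r if w != 0)):
        u = loads.index(min(loads))  # pick the least-loaded unit
        units[u].append(row)
        loads[u] += sum(1 for w in row if w != 0)
    return units, loads

rows = [[1, 0, 2], [0, 0, 3], [4, 5, 6], [0, 7, 0], [8, 0, 0], [9, 1, 0]]
units, loads = balance_rows(rows, num_units=2)
print(loads)  # non-zero counts per unit come out nearly equal
```

With equal non-zero counts per unit, no unit idles while another works through a dense region, which is the load-balance condition the claims target.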
16. The method of claim 11, wherein the model includes a remapping table for the neural network to implement a remapping of feature maps that realizes the reordered version of the trained neural network.
17. The method of claim 16, wherein the remapping table is utilized by hardware during execution to perform feature map reordering.
18. The method of claim 11, wherein the reordered version is a network equivalent to the trained neural network or an optimized version of the trained neural network.
19. The method of claim 11, wherein the weights of the neural network are stored in a compressed format, and the method further comprises:
reading the compressed weights;
decompressing the compressed weights;
skipping execution of zero-value weights, including at least one of: skipping any cluster of weights in which all weights are zero for a fully connected layer, or skipping execution of scattered zero-value weights for a convolutional layer; and
performing execution for the neural network using the remaining decompressed weights.
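Claim 19 walks through execution with compressed weights: read, decompress, skip zero-value weights, and compute with the remainder. A simplified sketch under stated assumptions — run-length encoding of zero runs is used here as the compression format, which the claim does not specify:

```python
def compress_rle(weights):
    """Run-length encode zeros: a run of n zeros becomes ('Z', n);
    non-zero weights are stored as-is."""
    out, zeros = [], 0
    for w in weights:
        if w == 0:
            zeros += 1
        else:
            if zeros:
                out.append(('Z', zeros))
                zeros = 0
            out.append(w)
    if zeros:
        out.append(('Z', zeros))
    return out

def dot_skipping_zeros(compressed, inputs):
    """Multiply-accumulate over the decompressed weight stream,
    skipping the zero runs entirely (no multiplies issued for them)."""
    acc, i = 0.0, 0
    for tok in compressed:
        if isinstance(tok, tuple):   # ('Z', n): advance past n inputs
            i += tok[1]
        else:                        # non-zero weight: do the MAC
            acc += tok * inputs[i]
            i += 1
    return acc

w = [0.0, 0.0, 2.0, 0.0, 3.0, 0.0, 0.0, 1.0]
x = [1.0, 1.0, 4.0, 1.0, 5.0, 1.0, 1.0, 6.0]
c = compress_rle(w)
print(c)                         # [('Z', 2), 2.0, ('Z', 1), 3.0, ('Z', 2), 1.0]
print(dot_skipping_zeros(c, x))  # 29.0  (= 2*4 + 3*5 + 1*6)
```

Grouping zeros into runs (as the reordering claims arrange) is what makes this skipping cheap: a whole run costs one index increment instead of one multiply per weight.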
20. A computer-readable medium comprising a storage medium storing instructions that, when executed on a processor, implement a method, the method comprising:
receiving data for a trained neural network, including feature maps and weights;
reordering the feature maps and/or weights of the trained neural network to generate a reordered version of the trained neural network; and
after performing the reordering, compressing the weights of the reordered version of the trained neural network.
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662336493P | 2016-05-13 | 2016-05-13 | |
US62/336,493 | 2016-05-13 | ||
US15/421,423 US20180082181A1 (en) | 2016-05-13 | 2017-01-31 | Neural Network Reordering, Weight Compression, and Processing |
US15/421,423 | 2017-01-31 | ||
KR1020170048036A KR20170128080A (en) | 2016-05-13 | 2017-04-13 | Method and apparatus for implementing neural network |
KR10-2017-0048036 | 2017-04-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107392305A true CN107392305A (en) | 2017-11-24 |
Family
ID=60338932
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710333745.3A Withdrawn CN107392305A (en) | 2016-05-13 | 2017-05-12 | Method and computer-readable medium for implementing and executing a neural network
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107392305A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416425A (en) * | 2018-02-02 | 2018-08-17 | 浙江大华技术股份有限公司 | Convolution operation method and device |
CN108710906A (en) * | 2018-05-11 | 2018-10-26 | 北方民族大学 | Real-time point cloud model classification method based on lightweight network LightPointNet |
CN110084364A (en) * | 2018-01-25 | 2019-08-02 | 北京深鉴智能科技有限公司 | Deep neural network compression method and device |
WO2019177731A1 (en) * | 2018-03-13 | 2019-09-19 | Recogni Inc. | Cluster compression for compressing weights in neural networks |
CN111587436A (en) * | 2018-01-17 | 2020-08-25 | 昕诺飞控股有限公司 | System and method for object recognition using neural networks |
CN111937009A (en) * | 2018-04-05 | 2020-11-13 | Arm有限公司 | Systolic convolutional neural network |
WO2020259031A1 (en) * | 2019-06-27 | 2020-12-30 | 深圳市中兴微电子技术有限公司 | Data processing method and device, storage medium and electronic device |
TWI727641B (en) * | 2020-02-03 | 2021-05-11 | 華邦電子股份有限公司 | Memory apparatus and operation method thereof |
WO2022073160A1 (en) * | 2020-10-07 | 2022-04-14 | 浙江大学 | Encoding method, decoding method, encoder, decoder, and storage medium |
WO2022183345A1 (en) * | 2021-03-01 | 2022-09-09 | 浙江大学 | Encoding method, decoding method, encoder, decoder, and storage medium |
US11562235B2 (en) | 2020-02-21 | 2023-01-24 | International Business Machines Corporation | Activation function computation for neural networks |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5214746A (en) * | 1991-06-17 | 1993-05-25 | Orincon Corporation | Method and apparatus for training a neural network using evolutionary programming |
US20030033265A1 (en) * | 2001-08-10 | 2003-02-13 | Cabana David R. | Artificial neurons including weights that define maximal projections |
CN1470000A (en) * | 2000-09-11 | 2004-01-21 | Westerngeco Seismic Holdings Ltd | Neural net prediction of seismic streamer shape |
CN102521382A (en) * | 2011-12-21 | 2012-06-27 | 中国科学院自动化研究所 | Method for compressing video dictionary |
US20150006444A1 (en) * | 2013-06-28 | 2015-01-01 | Denso Corporation | Method and system for obtaining improved structure of a target neural network |
- 2017-05-12: CN CN201710333745.3A patent CN107392305A (en), status: not active (withdrawn)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5214746A (en) * | 1991-06-17 | 1993-05-25 | Orincon Corporation | Method and apparatus for training a neural network using evolutionary programming |
CN1470000A (en) * | 2000-09-11 | 2004-01-21 | Westerngeco Seismic Holdings Ltd | Neural net prediction of seismic streamer shape |
US20030033265A1 (en) * | 2001-08-10 | 2003-02-13 | Cabana David R. | Artificial neurons including weights that define maximal projections |
CN102521382A (en) * | 2011-12-21 | 2012-06-27 | 中国科学院自动化研究所 | Method for compressing video dictionary |
US20150006444A1 (en) * | 2013-06-28 | 2015-01-01 | Denso Corporation | Method and system for obtaining improved structure of a target neural network |
Non-Patent Citations (3)
Title |
---|
余文芳: "BP Neural Network Implementation of Weight Allocation in Group Decision-Making", Microcomputer Information * |
白雅娟: "Weight Allocation Method for Evaluating Cab Human-Machine Interface Matching Based on Artificial Neural Networks", Journal of North University of China (Natural Science Edition) * |
胡彩萍: "Research on Ranking Evaluation Algorithms Based on BP Neural Networks and Their Application", China Master's Theses Full-text Database, Basic Sciences * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111587436A (en) * | 2018-01-17 | 2020-08-25 | 昕诺飞控股有限公司 | System and method for object recognition using neural networks |
CN110084364A (en) * | 2018-01-25 | 2019-08-02 | 北京深鉴智能科技有限公司 | Deep neural network compression method and device |
CN110084364B (en) * | 2018-01-25 | 2021-08-27 | 赛灵思电子科技(北京)有限公司 | Deep neural network compression method and device |
CN108416425B (en) * | 2018-02-02 | 2020-09-29 | 浙江大华技术股份有限公司 | Convolution operation method and device |
CN108416425A (en) * | 2018-02-02 | 2018-08-17 | 浙江大华技术股份有限公司 | Convolution operation method and device |
WO2019177731A1 (en) * | 2018-03-13 | 2019-09-19 | Recogni Inc. | Cluster compression for compressing weights in neural networks |
US11468316B2 (en) | 2018-03-13 | 2022-10-11 | Recogni Inc. | Cluster compression for compressing weights in neural networks |
CN111937009A (en) * | 2018-04-05 | 2020-11-13 | Arm有限公司 | Systolic convolutional neural network |
CN108710906B (en) * | 2018-05-11 | 2022-02-11 | 北方民族大学 | Real-time point cloud model classification method based on lightweight network LightPointNet |
CN108710906A (en) * | 2018-05-11 | 2018-10-26 | 北方民族大学 | Real-time point cloud model classification method based on lightweight network LightPointNet |
JP2022538735A (en) * | 2019-06-27 | 2022-09-06 | 中▲興▼通▲訊▼股▲ふぇん▼有限公司 | Data processing method, device, storage medium and electronic equipment |
WO2020259031A1 (en) * | 2019-06-27 | 2020-12-30 | 深圳市中兴微电子技术有限公司 | Data processing method and device, storage medium and electronic device |
JP7332722B2 (en) | 2019-06-27 | 2023-08-23 | セインチップス テクノロジー カンパニーリミテッド | Data processing method, device, storage medium and electronic equipment |
TWI727641B (en) * | 2020-02-03 | 2021-05-11 | 華邦電子股份有限公司 | Memory apparatus and operation method thereof |
US11562235B2 (en) | 2020-02-21 | 2023-01-24 | International Business Machines Corporation | Activation function computation for neural networks |
WO2022073160A1 (en) * | 2020-10-07 | 2022-04-14 | 浙江大学 | Encoding method, decoding method, encoder, decoder, and storage medium |
WO2022183345A1 (en) * | 2021-03-01 | 2022-09-09 | 浙江大学 | Encoding method, decoding method, encoder, decoder, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107392305A (en) | Method and computer-readable medium for implementing and executing a neural network | |
US20180082181A1 (en) | Neural Network Reordering, Weight Compression, and Processing | |
CN110378468B (en) | Neural network accelerator based on structured pruning and low bit quantization | |
CN109472350B (en) | Neural network acceleration system based on block-circulant sparse matrix | |
CN111062472B (en) | Sparse neural network accelerator based on structured pruning and acceleration method thereof | |
KR20170128080A (en) | Method and apparatus for implementing neural network | |
US10534839B2 (en) | Method for matrix by vector multiplication for use in artificial neural network | |
US10599935B2 (en) | Processing artificial neural network weights | |
JP6998968B2 (en) | Deep neural network execution method, execution device, learning method, learning device and program | |
Isenburg et al. | Lossless compression of predicted floating-point geometry | |
JP6869676B2 (en) | Information processing equipment, information processing methods and programs | |
Yang et al. | Legonet: Efficient convolutional neural networks with lego filters | |
KR20160142791A (en) | Method and apparatus for implementing neural network | |
CN104704825B (en) | The lossless compression of segmented image data | |
CN109451308A (en) | Video compression method and device, electronic equipment and storage medium | |
WO2019234794A1 (en) | Arithmetic method | |
JP2014087058A (en) | Encoder, decoder and method thereof | |
IT202000018043A1 (en) | ARTIFICIAL NEURAL NETWORK PROCESSES AND PROCESSING SYSTEMS | |
CN113642726A (en) | System and method for compressing activation data | |
TWI745697B (en) | Computing system and compressing method thereof for neural network parameters | |
KR20230155417A (en) | Sparse matrix multiplication in hardware | |
US10559093B2 (en) | Selecting encoding options | |
CN111626415B (en) | High-efficiency matrix data format suitable for artificial neural network | |
US20220076122A1 (en) | Arithmetic apparatus and arithmetic method | |
Kekre et al. | Vector quantized codebook optimization using modified genetic algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20171124 |