CN109102074A - Training device - Google Patents

Training device

Info

Publication number
CN109102074A
CN109102074A
Authority
CN
China
Prior art keywords
data
value
judgment
synapse
training device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710474297.9A
Other languages
Chinese (zh)
Other versions
CN109102074B (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201710474297.9A priority Critical patent/CN109102074B/en
Priority to PCT/CN2018/090901 priority patent/WO2018228399A1/en
Priority to EP19217768.1A priority patent/EP3657403A1/en
Priority to EP18818258.8A priority patent/EP3637327B1/en
Publication of CN109102074A publication Critical patent/CN109102074A/en
Priority to US16/698,976 priority patent/US11544542B2/en
Priority to US16/698,984 priority patent/US11544543B2/en
Priority to US16/698,988 priority patent/US11537858B2/en
Application granted granted Critical
Publication of CN109102074B publication Critical patent/CN109102074B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N 3/08 Learning methods

Abstract

The present disclosure provides a training device, comprising: a data processing module; and a computing module connected to the data processing module, for receiving the data processed by the data processing module and performing operations on the data. The training device of the disclosure can effectively accelerate the backward training part and can also be used to accelerate the training process of an entire neural network, improving training and operation speed, reducing training power consumption, and improving the efficiency of operations.

Description

Training device
Technical field
The present disclosure relates to the field of artificial intelligence, and more particularly to a sparse training device.
Background art
Deep neural networks are the foundation of many current artificial intelligence applications and have achieved breakthrough applications in speech recognition, image processing, data analysis, advertisement recommendation systems, autonomous driving, and other areas, so that deep neural networks are now applied in many aspects of life. However, the enormous amount of computation of deep neural networks has always restricted their faster development and wider application. When accelerator designs are considered to speed up the operation of deep neural networks, the enormous amount of computation inevitably comes with a large energy overhead, which likewise restricts the further wide application of such accelerators.
The operation of a neural network mainly includes two parts: forward inference and backward training. However, most existing accelerators support only the forward inference part and do not consider the backward training part. This brings a problem: such an accelerator can only accelerate the forward inference part and cannot accelerate the backward training part, and therefore cannot complete the entire training process of a neural network, which is an obvious limitation.
Summary of the invention
(1) Technical problems to be solved
In order to solve, or at least partly alleviate, the above technical problem, the present disclosure provides a sparse training device. The sparse training device of the disclosure can support the forward inference part of sparse or dense neural networks, can also accelerate the backward training part, and can be used to accelerate the training process of an entire neural network.
(2) Technical solutions
According to one aspect of the disclosure, a training device is provided, comprising:
a data processing module, for compressing or expanding input data; and
a computing module connected to the data processing module, for receiving the data processed by the data processing module and performing operations on the data.
In some embodiments, the data processing module includes:
a data compression unit, for compressing input data according to a compression judgment condition; and
a data expansion unit, for expanding input data according to an expansion judgment condition.
In some embodiments, the data expansion unit is used to expand input data by expanding compressed sparse data into an uncompressed format.
In some embodiments, the compression judgment condition and the expansion judgment condition include a threshold judgment condition or a function-mapping judgment condition.
In some embodiments, the threshold judgment condition includes: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
In some embodiments, the data compression unit screens and compresses input data according to the sparse index values of the data to obtain the data to be operated on; or it judges according to the values of the data themselves, thereby screening and compressing to obtain the values that satisfy the compression judgment condition.
In some embodiments, the data compression unit screens and compresses the input neuron data according to the sparse index values of the synapse data to obtain the neuron data to be operated on, or screens and compresses the input synapse data according to the sparse index values of the neuron data to obtain the synapse data to be operated on.
In some embodiments, the data compression unit compares the values of the synapses themselves with a given threshold, screening and compressing to obtain the synapse data whose absolute values are not less than the given threshold, or compares the values of the neurons themselves with a given threshold, screening and compressing to obtain the neuron data whose absolute values are not less than the given threshold.
In some embodiments, the data processing module is further used to determine, according to a gradient-value judgment condition, whether to send the gradient values and the data to be operated on to the computing module.
In some embodiments, the gradient-value judgment condition includes a threshold judgment condition or a function-mapping judgment condition.
In some embodiments, the threshold judgment condition includes: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
In some embodiments, if the data processing module determines that the absolute value of a neuron gradient value is less than the given threshold, the gradient value and the corresponding synapse to be operated on are compressed away, i.e., not sent to the computing module for operation; otherwise, if the absolute value of the gradient value is not less than the given threshold, the gradient value and the corresponding synapse to be operated on are sent to the computing module for operation.
In some embodiments, if the synapses are stored in sparse form, the data processing module expands the synapses and the synapse index values into a non-sparse form before they are input to the computing module for operation.
In some embodiments, the computing module includes:
a first arithmetic unit, comprising multiple PEs, each PE including a multiplier and/or an adder, for completing multiplication, addition, or multiply-add operations;
a second arithmetic unit, comprising two groups of adder trees, each group including multiple adder trees, for completing accumulation operations; and
a third arithmetic unit, comprising ALUs.
In some embodiments, the first arithmetic unit includes M*N PEs, each PE including one multiplier and one adder; the second arithmetic unit includes two groups of adder trees, one group including M adder trees with N inputs each and the other group including N adder trees with M inputs each; and the third arithmetic unit includes max(M, N) ALUs, where M and N are positive integers.
In some embodiments, in the computing module, the first arithmetic unit is used to complete the multiplication of gradient values and synapses, i.e., the synapses are multiplied element-wise by the corresponding gradient values, and the second arithmetic unit accumulates the data of the same column using adder trees. If the accumulation result is not the final accumulation result needed, i.e., the accumulation operation is not yet complete, the third arithmetic unit is skipped and the intermediate result is saved in a buffer to be accumulated further next time; otherwise, the ALUs in the third arithmetic unit complete the subsequent operations.
In some embodiments, if the previous layer has an activation function, the third arithmetic unit is further used to multiply the accumulation result by the derivative of the activation function to obtain the final gradient value.
In some embodiments, the third arithmetic unit is further used to zero the gradient values according to a zeroing judgment condition.
In some embodiments, the zeroing judgment condition is a threshold judgment condition: if the absolute value of a gradient value is less than the zeroing threshold, the gradient value is set to 0; otherwise the gradient value remains unchanged.
In some embodiments, the training device further includes:
a storage module, for storing data; and
a control module, for storing and issuing instructions to control the storage module, the data processing module, and the computing module.
Another aspect of the disclosure provides a chip including the training device.
Another aspect of the disclosure provides a chip packaging structure including the chip.
Another aspect of the disclosure provides a board card including the chip packaging structure.
Another aspect of the disclosure provides an electronic device including the board card.
(3) Beneficial effects
It can be seen from the above technical solutions that the sparse training device of the disclosure has at least one of the following beneficial effects:
(1) The sparse training device of the disclosure can effectively accelerate the backward training part, which substantially increases training speed and reduces training power consumption.
(2) By expanding or compressing the data, training power consumption is reduced.
(3) At the same time, the sparse training device of the disclosure can well support the forward inference part of sparse or dense neural networks, and thus can accelerate the training process of an entire neural network.
(4) During backward training, the added gradient-value judgment condition and zeroing judgment condition further improve the efficiency of operations and increase operation speed.
Brief description of the drawings
Fig. 1 is a functional block diagram of the sparse training device according to the disclosure.
Fig. 2 is a schematic structural diagram of the computing module according to the disclosure.
Fig. 3 is another schematic structural diagram of the computing module according to the disclosure.
Fig. 4 is a schematic structural diagram of the data processing module according to the disclosure.
Fig. 5 is a schematic diagram of the data sparsification process according to the disclosure.
Fig. 6 is another schematic diagram of the data sparsification process according to the disclosure.
Fig. 7 is a schematic diagram of the data expansion process according to the disclosure.
Fig. 8 is another schematic structural diagram of the computing module according to the disclosure.
Detailed description of the embodiments
To make the purposes, technical solutions, and advantages of the disclosure clearer, the disclosure is described in further detail below in conjunction with specific embodiments and with reference to the drawings.
It should be noted that similar or identical parts use the same reference numbers in the drawings and in the description of the specification. Implementations not shown or described in the drawings are of a form known to those of ordinary skill in the technical field. Additionally, although examples of parameters with particular values may be provided herein, it should be understood that the parameters need not exactly equal the corresponding values, but may approximate them within acceptable error margins or design constraints. Furthermore, the direction terms mentioned in the following embodiments, such as "upper", "lower", "front", "rear", "left", and "right", refer only to directions in the drawings. Accordingly, the direction terms are used for illustration and not to limit the disclosure.
The disclosure mainly proposes a sparse training device for neural networks that can support the entire training process of a neural network. As shown in Fig. 1, the sparse training device includes a control module, a storage module, a data processing module, and a computing module.
The control module is mainly used to store and issue instructions so as to regulate the storage module, the data processing module, and the computing module, enabling the whole sparse training device to cooperate in a harmonious and orderly way.
The storage module is mainly used to store data, including the neuron data and synapse data to be operated on during computation, intermediate result data, final result data, and other relevant parameters needed in the operation.
The data processing module is mainly used to process and screen the data to be operated on, selecting, according to certain judgment conditions, the data that need to be input to the computing module for operation.
The computing module is mainly used to perform the neural network operations and to send the intermediate result data and final result data that need to be stored back to the storage module for saving.
Further, the computing module can complete the training process through cooperation of its units. Specifically, as shown in Figs. 2-3, the computing module includes multiple groups of arithmetic units, and each group includes a first arithmetic unit 1, a second arithmetic unit 2, and a third arithmetic unit 3. The first arithmetic unit includes multiple PEs, each PE including a multiplier and/or an adder, for completing multiplication, addition, or multiply-add operations. The second arithmetic unit includes multiple adder trees for completing accumulation operations. The third arithmetic unit includes ALUs, preferably lightweight ALUs, i.e., ALUs containing only the required functions. The computing module thus covers a series of basic operations: multiplication, accumulation, nonlinear operations such as activation and comparison, and individual multiplications and additions. In actual operation, the first, second, and third arithmetic units can be pipelined as needed, and an arithmetic unit, or some operation stage of an arithmetic unit, can be skipped; for example, a POOLING layer does not need the accumulation operation in the second arithmetic unit, which can then be skipped directly.
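For illustration only, the following minimal Python sketch models the data flow through one group of arithmetic units as described above; the function and parameter names are assumptions made for this sketch, not the patent's implementation, and the skip logic mirrors the POOLING example:

```python
import numpy as np

def arithmetic_group(a, b, accumulate=True, post_op=None):
    """Model of one group of arithmetic units: PE array -> adder trees -> ALU."""
    # First arithmetic unit: each PE multiplies one pair of operands.
    products = a * b
    # Second arithmetic unit: adder trees accumulate each column.
    if not accumulate:          # e.g. a POOLING layer skips the adder trees
        return products
    sums = products.sum(axis=0)
    # Third arithmetic unit: lightweight ALU for remaining scalar ops;
    # skipped (None) when the accumulation is not yet the final result.
    return sums if post_op is None else post_op(sums)

# Usage: multiply-accumulate, then an elementwise op in the ALU stage.
out = arithmetic_group(np.ones((4, 3)), np.full((4, 3), 0.5),
                       post_op=lambda s: np.maximum(s, 0.0))
```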
As shown in Fig. 4, the data processing module includes a data compression unit and a data expansion unit. The data compression unit compresses input data according to a compression judgment condition; the data expansion unit expands input data according to an expansion judgment condition. The compression judgment condition and the expansion judgment condition include a threshold judgment condition or a function-mapping judgment condition. The threshold judgment condition includes: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
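As a concrete reading of these judgment conditions, a minimal sketch (the names are assumed for illustration):

```python
def threshold_condition(kind, t=None, lo=None, hi=None):
    """One of the four threshold judgment conditions named above."""
    return {
        "less_than":    lambda x: x < t,
        "greater_than": lambda x: x > t,
        "in_range":     lambda x: lo <= x <= hi,
        "out_of_range": lambda x: x < lo or x > hi,
    }[kind]

# A function-mapping judgment condition is any boolean mapping of a value:
is_significant = lambda x: abs(x) >= 3      # example mapping

keep = threshold_condition("greater_than", t=3)
print(keep(5), keep(2))                     # True False
```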
Specifically, the data compression unit compresses data; it can compress neuron data as well as synapse data. More specifically, when the synapse data are in sparse representation, the input neurons can be screened and compressed according to the sparse index values of the synapse data, filtering out the "effective" neuron data to be operated on, which are then sent together with the synapse values to the subsequent computing module for operation. Conversely, when the neuron data are in sparse representation, the input synapse data can be screened and compressed according to the sparse index values of the neuron data.
As shown in Fig. 5, the synapses are data in sparse representation and the neurons are data in dense representation; the neuron values are compressed according to the index values of the sparsely represented synapses. The sparse representation chosen here contains two arrays: the first records the synapse values of the sparse representation, and the other stores the positions, i.e., indexes, corresponding to these synapse values. Suppose the original sequence has length 8; the index values then indicate that the synapse values of the sparse representation are located at positions 1, 3, 6, and 7 of the original sequence (starting from position 0). With this group of index values, the neuron values involved in the operation to be performed, i.e., the neurons at positions 1, 3, 6, and 7, can be filtered out. Screening the neuron data to be operated on in this way yields the compressed neuron sequence, i.e., N1 N3 N6 N7, which is then sent to the arithmetic units together with the sparsely represented synapse values for operation.
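A minimal sketch of this index-based screening, reproducing the Fig. 5 example (the function name is an assumption):

```python
def compress_by_index(neurons, synapse_index):
    """Screen dense neuron data with the sparse synapse index, keeping
    only the neurons that meet a stored (nonzero) synapse value."""
    return [neurons[i] for i in synapse_index]

neurons       = ["N0", "N1", "N2", "N3", "N4", "N5", "N6", "N7"]
synapse_index = [1, 3, 6, 7]     # positions of the stored synapse values
print(compress_by_index(neurons, synapse_index))  # ['N1', 'N3', 'N6', 'N7']
```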
In addition, the data compression unit can also judge according to the values of the data themselves, screening and compressing to obtain the values that satisfy the compression judgment condition, which are then passed to the subsequent computing module for operation. Taking a threshold judgment condition as an example, the data compression unit can compare the values of the synapses themselves with a given threshold, screening and compressing to obtain the synapse data whose absolute values are not less than the given threshold; it can also compare the values of the neurons themselves with a given threshold, screening and compressing to obtain the neuron data whose absolute values are not less than the given threshold.
Fig. 6 shows the data compression unit of the disclosure compressing according to the values themselves. Suppose the original sequence has length 8 and is 3 5 2 1 0 7 4 1. A compression condition is given that removes the values less than 3, so 0, 1, and 2 are compressed away. The other data are retained, forming the compressed sequence, i.e., 3 5 7 4.
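The value-based compression of the Fig. 6 example could be sketched as follows (the function name is an assumption):

```python
def compress_by_value(data, threshold):
    """Keep only the values whose absolute value reaches the threshold."""
    return [x for x in data if abs(x) >= threshold]

print(compress_by_value([3, 5, 2, 1, 0, 7, 4, 1], 3))  # [3, 5, 7, 4]
```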
The data expansion unit is used to expand input data, including neuron data and synapse data, i.e., to expand data originally stored compressed and sparse into an uncompressed format.
Specifically, Fig. 7 shows the data expansion unit of the disclosure expanding sparsely represented synapse data. The sparse representation chosen here contains two arrays: the first array records the synapse values of the sparse representation, and the other array stores the positions, i.e., indexes, corresponding to these synapse values. Suppose the original sequence has length 8; each bit in the index sequence then corresponds to one number of the original sequence, where 1 indicates that the original sequence value is valid and 0 indicates that the original sequence value is 0. The synapse values of the sparse representation shown here are thus located at positions 1, 3, 6, and 7 of the original sequence (starting from position 0). Using this group of index values, the synapse values are filled into their original positions in turn and the other positions are set to 0, which yields the expanded, uncompressed sequence.
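A minimal sketch of this expansion, reproducing the Fig. 7 example with a bitmask index (the function name is an assumption):

```python
def expand_sparse(values, index_mask):
    """Expand compressed sparse data to uncompressed form: the mask has
    one bit per original position, 1 = the next stored value goes here,
    0 = the original value was 0."""
    it = iter(values)
    return [next(it) if bit else 0 for bit in index_mask]

# Stored synapse values at positions 1, 3, 6, 7 of a length-8 sequence:
print(expand_sparse([7, 5, 2, 4], [0, 1, 0, 1, 0, 0, 1, 1]))
# -> [0, 7, 0, 5, 0, 0, 2, 4]
```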
If the data need neither compression nor expansion, the data processing module can be skipped and the data passed directly from the storage module to the computing module for operation.
In an embodiment of the disclosure, as shown in Fig. 8, the first arithmetic unit 1 includes M*N PEs, each PE including one multiplier and one adder. The second arithmetic unit 2 includes two groups of adder trees: one group contains M adder trees with N inputs each, and the other group contains N adder trees with M inputs each. The third arithmetic unit 3 includes max(M, N) (the greater of M and N) lightweight ALUs, i.e., ALUs containing only the arithmetic units required for the operation. Those skilled in the art will appreciate that the second arithmetic unit 2 may instead include a single group of max(M, N) adder trees; it is only necessary to add the corresponding data transmission lines to complete the same operations.
The following takes the convolutional layer of a sparse convolutional neural network as an example to introduce how the sparse training device of the disclosure accelerates the backward training process. During backward training, assume that the input gradient values passed up from the lower layer are gradient, the synapses used during forward inference are w with corresponding sparse index index, and the input neuron values used during forward inference are input.
It should be noted that, during backward training, the data processing module determines, according to the gradient-value judgment condition, whether to send the gradient values and the data to be operated on to the computing module. The gradient-value judgment condition includes a threshold judgment condition or a function-mapping judgment condition. The threshold judgment condition includes: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
First, the previous layer's neuron gradient values gradient' are updated. The control module issues a load instruction to read the gradient values gradient and the corresponding synapses w, which are sent to the data compression unit of the data processing module for screening and compression. If the absolute value of a gradient is less than the given threshold, that gradient and the corresponding synapses needing operation are skipped, i.e., not sent to the computing module for operation; otherwise, if the absolute value of the gradient value is not less than the given threshold, it is sent to the computing module together with the corresponding synapses w for operation. In addition, if the synapses w are stored in sparse form, then, to make accumulating the neuron gradient values convenient, w and index need to be sent together to the data expansion unit of the data processing module to be expanded into non-sparse form before being fed to the arithmetic part, and then sent to the computing module together with gradient for operation. In the computing module, the gradient values and synapses are first sent to the first arithmetic unit 1, which completes the multiplication of gradient values and synapses, i.e., each w is multiplied element-wise by the corresponding gradient; the results are then sent to the adder trees of the second arithmetic unit 2 corresponding to that column, which accumulate the data of the same column that need accumulating. If this accumulation result is not the final accumulation result needed, i.e., the accumulation operation is not yet complete, the third arithmetic unit 3 is skipped and the result is saved to be accumulated further next time; otherwise, it is sent to the ALU in the corresponding third arithmetic unit 3 to complete the subsequent operations. If the previous layer has an activation function, the third arithmetic unit 3 multiplies the accumulation result by the derivative of the activation function, yielding the final gradient'.
In addition, the ALU can be used to zero gradient values according to a zeroing judgment condition. Take a zeroing-threshold judgment condition as the zeroing judgment condition: if the user supplies a zeroing threshold, the judgment is performed in the third arithmetic unit 3, and if the absolute value of gradient' is less than the zeroing threshold, then gradient' = 0; otherwise it is unchanged. The result is written back to the storage module. The zeroing judgment condition can also be another condition, such as a function mapping: when gradient' satisfies the given judgment condition, gradient' is saved as zero; otherwise the original value is saved.
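Read as plain linear algebra, the gradient update described above could be sketched as follows; this is a simplified dense sketch assuming the column accumulations amount to a matrix-vector product, all names are assumptions, and the sparse screening, expansion, and buffering of partial sums are elided:

```python
import numpy as np

def backprop_gradient(gradient, w, act_deriv,
                      grad_threshold=0.0, zero_threshold=None):
    """Update the previous layer's gradient values gradient'.

    gradient : (n,) gradients passed up from the lower layer
    w        : (m, n) synapses used during forward inference
    act_deriv: (m,) derivative of the previous layer's activation
    """
    # Data processing module: screen out gradients below the threshold.
    gradient = np.where(np.abs(gradient) >= grad_threshold, gradient, 0.0)
    # First unit (multiplications) + second unit (column accumulation):
    grad_prev = w @ gradient
    # Third unit: activation derivative, then the zeroing judgment.
    grad_prev = grad_prev * act_deriv
    if zero_threshold is not None:
        grad_prev[np.abs(grad_prev) < zero_threshold] = 0.0
    return grad_prev
```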
Next, the synapses are updated. The control module issues a load instruction to read out the neuron data input, the index data index corresponding to the synapses w, and the gradient values gradient, which are sent to the data compression unit of the data processing part for compression, i.e., input is compressed accordingly by index, and then sent to the computing module for operation. In each group of arithmetic units, the gradient values gradient and the original neuron data input are first sent to the first arithmetic unit 1, which completes the multiplication, i.e., each input is multiplied element-wise by the corresponding gradient; the results are then sent to the second arithmetic unit 2, which uses adder trees to accumulate the data of the same row that need accumulating. If this accumulation result is not the final accumulation result needed, i.e., the accumulation operation is not yet complete, the third arithmetic unit 3 is skipped and the result is saved to be accumulated further next time; otherwise, it is sent to the ALU in the corresponding third arithmetic unit 3 for the subsequent operations. The third arithmetic unit 3 receives the accumulated data and divides it by the number of gradients connected to this synapse, obtaining the update amount of the synapse. The update amount is multiplied by the learning rate and saved to the storage module. Then the synapses w and the synapse update amounts are read out from the storage module and sent to the first arithmetic unit 1 of the computing module to complete the individual addition operation; since no further accumulation or nonlinear operations are needed, the second arithmetic unit 2 and the third arithmetic unit 3 can be skipped directly, and the updated synapse data are saved.
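The synapse update could be sketched similarly; this is a simplified dense sketch that follows the description literally (the division is the averaging over the gradients connected to each synapse, and the scaled update amount is added to the stored synapses; sign conventions may differ in practice):

```python
import numpy as np

def update_synapses(w, inputs, gradients, learning_rate):
    """Synapse update for the convolutional-layer example.

    inputs    : (k, m) input neurons at each of the k positions where
                the synapse group was used
    gradients : (k, n) gradients at those k positions
    w         : (m, n) synapses
    """
    # First unit: input*gradient products; second unit: accumulation.
    acc = inputs.T @ gradients
    # Third unit: divide by the number of connected gradients (k),
    # then scale by the learning rate to get the update amount.
    delta = learning_rate * acc / gradients.shape[0]
    # Individual addition in the first unit; units 2 and 3 are skipped.
    return w + delta
```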
The sparse training device of the disclosure can also support the forward inference part of sparse or dense neural networks, the process being similar to that of updating the synapses. The sparse training device of the disclosure can thus be used to accelerate the training process of an entire neural network.
The following takes the fully connected layer of a sparse convolutional neural network as an example to introduce how the sparse training device of the disclosure accelerates the backward training process. During backward training, assume that the gradient values passed up from the lower layer are gradient, the synapses used during forward inference are w with corresponding sparse index index, and the input neuron values used during forward inference are input.
First, the previous layer's gradient values gradient' are updated. The control module issues a load instruction to read the gradient values gradient and the corresponding synapses w, which are sent to the data compression unit of the data processing module for screening and compression. Suppose a certain judgment threshold is given here: if the absolute value of a gradient is less than the given threshold, that gradient and the corresponding synapses needing operation are skipped, i.e., not sent to the computing module for operation; otherwise, if the absolute value of the gradient value is not less than the given threshold, it is sent to the computing module together with the corresponding synapses w for operation. The judgment threshold here can also be another condition, such as a given judgment range or a given function mapping: when the given condition is satisfied, the gradient and the corresponding synapses needing operation are skipped. If the synapses w are stored in sparse form, then, before being sent to the computing module, w and index need to be sent together to the data expansion unit of the data processing module to be expanded into non-sparse form, and then sent to the computing module together with gradient for operation. In the computing module, the data are first sent to the first arithmetic unit 1, which completes the multiplication of gradient values and synapses, i.e., each w is multiplied element-wise by the corresponding gradient; the results are then sent to the adder trees of the second arithmetic unit 2 corresponding to that column, which accumulate the N data of the same column that need accumulating. If this accumulation result is not the final accumulation result needed, i.e., the accumulation operation is not yet complete, the third arithmetic unit 3 is skipped and the result is saved to be accumulated further next time; otherwise, it is sent to the ALU in the corresponding third arithmetic unit 3 to complete the subsequent operations: if the previous layer has an activation function, the accumulation result is multiplied by the derivative of the activation function in the third arithmetic unit 3, obtaining the final gradient'. If the user supplies a zeroing threshold, the judgment is performed in the third arithmetic unit 3: if the absolute value of gradient' is less than the zeroing threshold, then gradient' = 0; otherwise it is unchanged. The result is written back to the storage module. The zeroing threshold can also be another judgment condition, such as a function mapping: when gradient' satisfies the given judgment condition, gradient' is saved as zero; otherwise the original value is saved.
Next, the synapses are updated. The control module issues a load instruction to read out the neuron data input, the index data index corresponding to the synapses w, and the gradient values gradient, which are sent to the data compression unit of the data processing module for compression, i.e., the input neurons input are compressed accordingly by the synapse index data index, and then sent to the computing module for operation. In each group of arithmetic units, the data are first sent to the first arithmetic unit 1, which completes the multiplication, i.e., each input is multiplied element-wise by the corresponding gradient, yielding the synapse update amount, which is then multiplied by the learning rate; after being added in the PEs to the original synapses received from the storage module, the new synapses w are obtained directly. The second arithmetic unit 2 and the third arithmetic unit 3 are then skipped, and the result is saved back to the storage module.
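For the fully connected layer, both steps reduce to a matrix-vector product and an outer product; a minimal dense sketch under the same assumptions as above (names assumed, screening and sparse handling elided):

```python
import numpy as np

def fc_backward(w, inputs, gradient, act_deriv, lr, zero_threshold=None):
    """Fully connected layer: gradient propagation and synapse update.

    w: (m, n) synapses, inputs: (m,), gradient: (n,), act_deriv: (m,)
    """
    # gradient': per-column multiply-accumulate of N products, then the
    # activation derivative and the zeroing judgment in the third unit.
    grad_prev = (w @ gradient) * act_deriv
    if zero_threshold is not None:
        grad_prev[np.abs(grad_prev) < zero_threshold] = 0.0
    # Synapse update: outer product scaled by the learning rate and
    # added directly to the original synapses (units 2 and 3 skipped).
    w_new = w + lr * np.outer(inputs, gradient)
    return grad_prev, w_new
```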
In addition, although in the above embodiments the gradient-value judgment is performed against a given threshold, the gradient-value judgment condition of the disclosure is not limited to threshold judgment conditions and can also be a function-mapping judgment condition; the threshold judgment condition can include: less than a given threshold, greater than a given threshold, within a given value range, outside a given value range, and so on. Furthermore, the gradient-value judgment condition, the compression judgment condition, and the expansion judgment condition of the disclosure can use the same judgment condition (for example, all can use threshold judgment, and the chosen thresholds can likewise be the same or different) or different judgment conditions (for example, threshold judgment for one and mapping judgment for another, and the specific thresholds and mapping relations can also differ) without affecting the realization of the disclosure.
The sparse training device of the disclosure can also support the forward inference part of sparse or dense neural networks, the process being similar to that of updating the synapses. The sparse training device of the disclosure can thus be used to accelerate the training process of an entire neural network. Moreover, the sparse training device of the disclosure is not limited to the fully connected layers and convolutional layers of a neural network; it can also be applied to other layers without affecting the realization of the disclosure.
All modules of the sparse training device of the disclosure can be hardware structures. Physical realizations of the hardware structures include but are not limited to physical devices, and the physical devices include but are not limited to transistors, memristors, and DNA computers.
In one embodiment, the disclosure discloses a chip that includes the above neural network training device.
In one embodiment, the disclosure discloses a chip packaging structure that includes the above chip.
In one embodiment, the disclosure discloses a board card that includes the above chip packaging structure.
In one embodiment, the disclosure discloses an electronic device that includes the above board card.
The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, driving recorder, navigator, sensor, webcam, cloud server, camera, video camera, projector, watch, earphones, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an airplane, ship, and/or car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, B-mode ultrasound scanner, and/or electrocardiograph.
The specific embodiments described above further explain in detail the purposes, technical solutions, and beneficial effects of the disclosure. It should be understood that the above are merely specific embodiments of the disclosure and are not intended to limit the disclosure; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the disclosure shall be included within the scope of protection of the disclosure.

Claims (10)

1. A training device, comprising:
a data processing module, for compressing or expanding input data; and
a computing module connected to the data processing module, for receiving the data processed by the data processing module and performing operations on the data.
2. The training device according to claim 1, wherein the data processing module comprises:
a data compression unit, for compressing input data according to a compression judgment condition; and
a data expansion unit, for expanding input data according to an expansion judgment condition.
3. The training device according to claim 2, wherein the data expansion unit is used to expand input data by expanding compressed sparse data into an uncompressed format.
4. The training device according to claim 2, wherein the compression judgment condition and the expansion judgment condition comprise a threshold judgment condition or a function-mapping judgment condition.
5. The training device according to claim 4, wherein the threshold judgment condition comprises: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
6. The training device according to claim 2, wherein the data compression unit screens and compresses input data according to the sparse index values of the data to obtain the data to be operated on; or judges according to the values of the data themselves, thereby screening and compressing to obtain the values satisfying the compression judgment condition.
7. The training device according to claim 6, wherein the data compression unit screens and compresses the input neuron data according to the sparse index values of the synapse data to obtain the neuron data to be operated on, or screens and compresses the input synapse data according to the sparse index values of the neuron data to obtain the synapse data to be operated on.
8. The training device according to claim 6, wherein the data compression unit compares the values of the synapses themselves with a given threshold, screening and compressing to obtain the synapse data whose absolute values are not less than the given threshold, or compares the values of the neurons themselves with a given threshold, screening and compressing to obtain the neuron data whose absolute values are not less than the given threshold.
9. The training device according to claim 1, wherein the data processing module is further configured to determine, according to a gradient-value judgment condition, whether to send gradient values and the data to be operated on to the computing module.
10. The training device according to claim 9, wherein the gradient-value judgment condition comprises a threshold judgment condition or a function-mapping judgment condition.
CN201710474297.9A 2017-06-13 2017-06-21 Training device Active CN109102074B (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
CN201710474297.9A CN109102074B (en) 2017-06-21 2017-06-21 Training device
EP19217768.1A EP3657403A1 (en) 2017-06-13 2018-06-12 Computing device and method
EP18818258.8A EP3637327B1 (en) 2017-06-13 2018-06-12 Computing device and method
PCT/CN2018/090901 WO2018228399A1 (en) 2017-06-13 2018-06-12 Computing device and method
US16/698,976 US11544542B2 (en) 2017-06-13 2019-11-28 Computing device and method
US16/698,984 US11544543B2 (en) 2017-06-13 2019-11-28 Apparatus and method for sparse training acceleration in neural networks
US16/698,988 US11537858B2 (en) 2017-06-13 2019-11-28 Computing device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710474297.9A CN109102074B (en) 2017-06-21 2017-06-21 Training device

Publications (2)

Publication Number Publication Date
CN109102074A 2018-12-28
CN109102074B CN109102074B (en) 2021-06-01

Family

ID=64795966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710474297.9A Active CN109102074B (en) 2017-06-13 2017-06-21 Training device

Country Status (1)

Country Link
CN (1) CN109102074B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107004157A * 2015-01-22 2017-08-01 高通股份有限公司 Model compression and fine-tuning
CN105512723A * 2016-01-20 2016-04-20 南京艾溪信息科技有限公司 Artificial neural network computing device and method for sparse connections
CN106796668A * 2016-03-16 2017-05-31 香港应用科技研究院有限公司 Method and system for bit-depth reduction in artificial neural networks
CN106027300A * 2016-05-23 2016-10-12 深圳市飞仙智能科技有限公司 System and method for parameter optimization of an intelligent robot applying a neural network
CN105913450A * 2016-06-22 2016-08-31 武汉理工大学 Tire rubber carbon black dispersity evaluation method and system based on neural network image processing
CN106650928A * 2016-10-11 2017-05-10 广州视源电子科技股份有限公司 Method and device for optimizing a neural network
CN106548234A * 2016-11-17 2017-03-29 北京图森互联科技有限责任公司 Neural network pruning method and device
CN106779051A * 2016-11-24 2017-05-31 厦门中控生物识别信息技术有限公司 Convolutional neural network model parameter processing method and system
CN106779068A * 2016-12-05 2017-05-31 北京深鉴智能科技有限公司 Method and apparatus for adjusting an artificial neural network

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726800A * 2018-12-29 2019-05-07 北京中科寒武纪科技有限公司 Operation method and device, and related product

Also Published As

Publication number Publication date
CN109102074B (en) 2021-06-01

Similar Documents

Publication Publication Date Title
CN109543832B (en) Computing device and board card
US11544543B2 (en) Apparatus and method for sparse training acceleration in neural networks
CN109522052B (en) Computing device and board card
CN108416327B (en) Target detection method and device, computer equipment and readable storage medium
CN107341547A Apparatus and method for performing convolutional neural network training
CN107578453A (en) Compressed image processing method, apparatus, electronic equipment and computer-readable medium
EP3564863B1 (en) Apparatus for executing lstm neural network operation, and operational method
CN109754074A Neural network quantization method and device, and related product
CN111738433B (en) Reconfigurable convolution hardware accelerator
CN107957976A Computing method and related product
CN111160547B (en) Device and method for artificial neural network operation
CN110909870B (en) Training device and method
CN109903350A Image compression method and related apparatus
CN109711540B (en) Computing device and board card
CN109583579B (en) Computing device and related product
CN110837567A (en) Method and system for embedding knowledge graph
CN109102074A (en) A kind of training device
CN109359542A Neural-network-based vehicle damage level determination method and terminal device
CN109684085A Memory access method and related product
CN117063182A (en) Data processing method and device
CN110020720B (en) Operator splicing method and device
Li et al. A fault detection optimization method based on chaos adaptive artificial fish swarm algorithm on distributed control system
CN111062477A (en) Data processing method, device and storage medium
CN109543835A Operation method and device, and related product
CN109558565A Operation method and device, and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant