CN107977707A - Method and computing device for defensive distillation of a neural network model - Google Patents
- Publication number
- CN107977707A CN107977707A CN201711179045.XA CN201711179045A CN107977707A CN 107977707 A CN107977707 A CN 107977707A CN 201711179045 A CN201711179045 A CN 201711179045A CN 107977707 A CN107977707 A CN 107977707A
- Authority
- CN
- China
- Prior art keywords
- network model
- label
- loss
- training
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for defensive distillation of a neural network model, where the neural network model comprises a feedforward network with a feature-layer structure and a softmax layer that outputs a probability vector over multiple classes. The method is adapted to be executed in a computing device and comprises the steps of: adding a scaling layer between the feedforward network and the softmax layer of the original neural network model according to a distillation temperature, generating a first neural network model; training the first neural network model with the first labels of the training samples themselves to obtain a second neural network model; inputting the training samples into the second neural network model and outputting, through the softmax layer, second labels that characterize each training sample as a probability vector over the classes; training the second neural network model under the simultaneous constraint of the second labels and the first labels to obtain a third neural network model; and deleting the scaling layer in the third neural network model to obtain the defensively distilled neural network model. The invention also discloses a corresponding computing device.
Description
Technical field
The present invention relates to the technical field of image processing, and in particular to a method and computing device for defensive distillation of a neural network model.
Background technology
Deep neural networks now achieve highly accurate results on classification and regression problems, and deep neural network models trained with the support of massive data also generalize strongly; in recent years they have therefore been widely applied in computer vision, speech recognition, and related fields. These deep neural network models nevertheless exhibit defects and vulnerabilities in practical applications. For example, even without knowledge of the network structure and parameters, specially crafted small perturbations can be applied to the network's input; to a human observer these perturbations do not affect the judgement at all, yet they can cause the network model to output erroneous results with very high confidence. Inputs altered by such "small perturbations" are called "adversarial examples". These problems directly affect the generalization ability and security of neural network models.
A common scheme for improving the generalization ability and security of a neural network model is to add adversarial examples to the training data, thereby reducing the model's error rate on such examples while further improving its generalization ability. However, factors such as the diversity of the adversarial examples that must be constructed prevent this approach from reaching the expected effect.
Therefore, a scheme is needed that can improve the generalization ability and security of neural network models.
Summary of the invention
To this end, the present invention provides a method and computing device for defensive distillation of a neural network model, in an effort to solve, or at least alleviate, at least one of the problems above.
According to one aspect of the invention, a method for defensive distillation of a neural network model is provided, where the neural network model comprises a feedforward network with a feature-layer structure and a softmax layer that outputs a probability vector over multiple classes. The method is adapted to be executed in a computing device and comprises the steps of: adding a scaling layer between the feedforward network and the softmax layer of the original neural network model according to a distillation temperature, generating a first neural network model; training the first neural network model with the first labels of the training samples themselves to obtain a second neural network model; inputting the training samples into the second neural network model and outputting, through the softmax layer, second labels that characterize each training sample as a probability vector over the classes; training the second neural network model under the simultaneous constraint of the second labels and the first labels to obtain a third neural network model; and deleting the scaling layer in the third neural network model to obtain the defensively distilled neural network model.
Optionally, in the method according to the invention, the scaling layer is adapted to reduce the input of the softmax layer according to the distillation temperature.
Optionally, in the method according to the invention, the step of training the first neural network model with the first labels of the training samples themselves to obtain a second neural network model comprises: supervising the training of the first neural network model with the first labels through a first loss function, obtaining the second neural network model.
Optionally, in the method according to the invention, the step of training the second neural network model under the simultaneous constraint of the second labels and the first labels to obtain a third neural network model comprises: performing classification supervised training on the second neural network model with the first labels through the first loss function; performing regression supervised training on the second neural network model with the second labels through a second loss function; and combining the first loss function and the second loss function to train the third neural network model.
Optionally, in the method according to the invention, the step of combining the first loss function and the second loss function to train the third neural network model comprises: weighting the first loss function and the second loss function to obtain the final loss function for training the third neural network model; and training the third neural network model with the final loss function.
Optionally, in the method according to the invention, the first loss function is:
loss1 = -log f(z_k)
where
f(z_k) = exp(z_k) / Σ_j exp(z_j)
In the formulas, loss1 is the value of the first loss function, averaged over the N samples of a batch (N being the batch size during training), and z_k is the output of the k-th neuron of the fully connected layer in the feedforward network.
Optionally, in the method according to the invention, the second loss function is:
loss2 = (1 / (2M)) Σ_{i=1..M} (x1_i − x2_i)²
In the formula, loss2 is the value of the second loss function, M is the total number of classes output through the softmax layer, x1_i is the probability the current network outputs for the i-th class, and x2_i is the probability for the i-th class characterized by the corresponding second label.
Optionally, in the method according to the invention, the final loss function for training the third neural network model is defined as:
Loss = w1 × loss1 + w2 × loss2
In the formula, Loss is the value of the final loss function, and w1 and w2 are the weight factors of the first loss value and the second loss value, respectively.
According to another aspect of the invention, a computing device is provided, comprising: one or more processors; a memory; and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods described above.
According to a further aspect of the invention, a computer-readable storage medium storing one or more programs is provided, the one or more programs including instructions which, when executed by a computing device, cause the computing device to perform any of the methods described above.
According to the method for defensive distillation of a neural network model of the present invention, a scaling layer is added to the original neural network model so that the neural network model is distilled by itself, without changing the feature-layer structure of the model, which effectively reduces the model's error rate when confronted with adversarial examples. Moreover, supervising the training of the neural network model simultaneously with the second labels of the training samples and their own first labels improves the model's generalization ability.
Brief description of the drawings
To achieve the above and related objects, certain illustrative aspects are described herein in conjunction with the following description and the accompanying drawings. These aspects indicate various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout the disclosure, identical reference numerals generally refer to identical components or elements.
Fig. 1 shows a block diagram of a computing device 100 according to one embodiment of the invention; and
Fig. 2 shows a flow chart of a method 200 for defensive distillation of a neural network model according to one embodiment of the invention.
Detailed description of embodiments
Exemplary embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, the computing device 100 typically comprises a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processors 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be a processor of any type, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-1 cache 110 and a level-2 cache 112, a processor core 114, and registers 116. An exemplary processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An exemplary memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be memory of any type, including but not limited to volatile memory (such as RAM) and non-volatile memory (such as ROM or flash memory), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, the applications 122 may be arranged to operate on the operating system with the program data 124. In certain embodiments, the computing device 100 is configured to perform the method for defensive distillation of a neural network model, and the program data 124 contains the instructions for performing that method.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Exemplary output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to facilitate communication with various external devices such as displays or loudspeakers via one or more A/V ports 152. Exemplary peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication, via one or more I/O ports 158, with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device, or image input device) or other peripherals (such as printers or scanners). An exemplary communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied as computer-readable instructions, data structures, or program modules in a modulated data signal such as a carrier wave or another transmission mechanism, and may include any information delivery medium. A "modulated data signal" may be a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer-readable medium as used herein may include both storage media and communication media. In certain embodiments, one or more programs are stored in a computer-readable medium, the one or more programs including instructions for performing certain methods; for example, according to embodiments of the invention, the computing device 100 performs the method for defensive distillation of a neural network model through these instructions.
The computing device 100 may be implemented as part of a small-sized portable (or mobile) electronic device, such as a cellular phone, a digital camera, a personal digital assistant (PDA), a personal media player, a wireless web browsing device, a personal head-mounted device, an application-specific device, or a hybrid device including any of the above functions. The computing device 100 may also be implemented as a personal computer including both desktop and notebook configurations.
The implementation flow of a method 200 for defensive distillation of a neural network model according to an embodiment of the invention is elaborated below with reference to Fig. 2.
The general structure of a neural network model according to embodiments of the invention can be divided into two parts: a feedforward network with a feature-layer structure, and a softmax layer that outputs a probability vector over multiple classes. The feedforward network generally has at least one convolutional layer, a pooling layer, and a fully connected layer; input data, for example, passes through multiple convolution and pooling operations and is then merged and output through the fully connected layer. The softmax layer can be understood as normalizing the output of the feedforward network. Suppose the neural network model is used to classify pictures and there are currently 100 picture classes; the softmax layer then outputs a 100-dimensional vector, where the first value in the vector is the probability that the current picture belongs to the first class, the second value is the probability that it belongs to the second class, and so on, and the values of this 100-dimensional vector sum to 1.
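The normalization the softmax layer performs can be illustrated with a minimal NumPy sketch; this is an illustration only, not code from the patent, and the 3-class logits are made up for the example.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is mathematically unchanged.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# A 3-class example: the feedforward network emits one logit per class,
# and softmax turns the logits into a probability vector that sums to 1.
logits = np.array([2.0, 0.5, 1.0])
probs = softmax(logits)
print(probs.round(3))            # one probability per class
print(round(probs.sum(), 6))     # the probabilities sum to 1
```

The class with the largest logit keeps the largest probability, so softmax changes the scale of the scores without changing their ranking.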
It should be noted that the general application scenario of the method 200 is classification processing with a neural network model; the concrete structure of the neural network model is not limited. In practical applications, the feedforward network of the neural network model may be any one of existing or newly defined network structures such as AlexNet, VGGNet, Google Inception Net, or ResNet; the embodiments of the invention are not restricted in this regard.
The method 200 starts at step S210. Let the distillation temperature be T; a scaling layer is added between the feedforward network and the softmax layer of the original neural network model according to the distillation temperature, generating a first neural network model. According to one embodiment of the invention, the scaling layer reduces the input of the softmax layer (that is, the output of the feedforward network) according to the distillation temperature T. In other words, a scaling layer is inserted between the feedforward network and the softmax layer of the original neural network model; the scaling layer reduces the output of the feedforward network (that is, the output of the last fully connected layer in the feedforward network) by a factor of 1/T and then feeds the reduced data into the softmax layer. The embodiments of the invention do not restrict the value of the distillation temperature T; in practical applications, the value of T is chosen according to the size of the feedforward network and the actual conditions.
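The effect of the scaling layer can be sketched as follows; this is a hypothetical illustration of the 1/T reduction, not the patent's implementation, and the logits are invented for the example.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def scaled_softmax(logits, T):
    # The scaling layer divides the feedforward network's output by T
    # (i.e. multiplies it by 1/T) before it reaches the softmax layer.
    return softmax(np.asarray(logits, dtype=float) / T)

logits = [8.0, 2.0, 4.0]
print(scaled_softmax(logits, T=1).round(3))    # sharp distribution (no scaling)
print(scaled_softmax(logits, T=20).round(3))   # softened distribution at T = 20
```

At T = 1 the scaling layer is a no-op, which is why deleting it after training (step S250) is equivalent to setting T to 1; at larger T the output probabilities move toward uniform while the predicted class stays the same.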
Then, in step S220, the first neural network model is trained with the first labels of the training samples themselves, obtaining a second neural network model. According to one embodiment of the invention, the training samples are input into the first neural network model, the training of the first neural network model is supervised by a first loss function on the training samples annotated with the first labels, and the trained network with all its internal parameters is taken as the second neural network model. Here the first label is the label of the training sample itself, also called the hard label.
Then, in step S230, the training samples of step S220 are input into the second neural network model trained in step S220, and the softmax layer outputs second labels that characterize each training sample as a probability vector over the classes. A second label, also called a soft target, is the second neural network model's predicted probability vector for the training sample.
Then, in step S240, the second neural network model (that is, the distilled model) is trained under the simultaneous constraint of the second labels and the first labels, obtaining a third neural network model. According to one embodiment of the invention, when training the second neural network model, the network is trained simultaneously with the first labels (hard labels) and the second labels (soft targets) of the above training samples, and two loss functions are assigned. The specific steps are described as follows.
On the one hand, for the first labels (hard labels), classification supervised training is performed on the second neural network model through the first loss function. The training process here can, as in step S220, train the second neural network model with the first labels of the training samples themselves. Optionally, the first loss function is, for example, the Softmax with loss in Caffe. Caffe, whose full name is Convolutional Architecture for Fast Feature Embedding, provides an open-source toolkit for training, testing, fine-tuning, and deploying deep learning models. According to one embodiment of the invention, the Softmax with loss in Caffe is chosen for classification supervised learning.
On the other hand, for the second labels (soft targets), regression supervised training is performed on the second neural network model through the second loss function. The second loss function is, for example, the Euclidean loss in Caffe, used to learn to fit the output vector of the soft target. It should be noted that, because of the presence of the scaling layer, the output of the feedforward network is reduced by 1/T throughout this training of the second neural network.
Finally, the first loss function and the second loss function are combined to train the third neural network model. According to one embodiment of the invention, different weights are set for the first loss function and the second loss function respectively; the two loss functions are weighted to obtain the final loss function for training the third neural network model, and the third neural network model is trained with this final loss function.
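The two-term objective of step S240 can be sketched for a single sample as follows. This is a simplified NumPy illustration, not the patent's Caffe implementation: the logits, hard label, soft target, and weights are invented, and real training would average both terms over a batch and backpropagate through the network.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def final_loss(logits, hard_label, soft_target, T, w1, w2):
    # The scaling layer reduces the feedforward output by 1/T before softmax.
    probs = softmax(np.asarray(logits, dtype=float) / T)
    # Classification term on the hard label (first loss function, softmax loss).
    loss1 = -np.log(probs[hard_label])
    # Regression term fitting the soft target (second loss function, Euclidean loss).
    M = probs.size
    loss2 = np.sum((probs - np.asarray(soft_target)) ** 2) / (2 * M)
    # Weighted combination: Loss = w1 * loss1 + w2 * loss2.
    return w1 * loss1 + w2 * loss2

loss = final_loss([8.0, 2.0, 4.0], hard_label=0,
                  soft_target=[0.93, 0.02, 0.05], T=20, w1=1.0, w2=1.0)
print(round(loss, 4))
```

The sketch makes the simultaneous constraint concrete: the same softened output vector is pulled toward the hard label by loss1 and toward the soft target by loss2, with w1 and w2 balancing the two.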
Then, in step S250, the scaling layer in the third neural network model is deleted to obtain the defensively distilled neural network model. That is, the distillation temperature T of the scaling layer in the third neural network model trained in step S240 is set to 1 (that is, the scaling is cancelled), obtaining the defensively distilled neural network model.
According to the embodiments of the invention, adding a scaling layer to the original neural network model distills the neural network model by itself, without changing the feature-layer structure of the model, which significantly reduces the model's error rate when confronted with adversarial examples. Moreover, supervising the training of the neural network model simultaneously with the post-distillation probability vectors (that is, the second labels) and the model's own first labels improves its generalization ability.
To further illustrate the method 200, its concrete execution is introduced below, taking chin classification within facial-feature classification as an example.
First step: the traditional VGG-Face network is chosen as the feedforward network, and the distillation temperature is taken as T = 20. A scaling layer is added between the feedforward network and the softmax layer; the output of the feedforward network (that is, the output of the last fully connected layer in the feedforward network) is reduced by 1/20 and then input into the softmax layer. The neural network model with the scaling layer added is taken as the first neural network model.
VGGNet is a deep convolutional neural network developed jointly by the computer vision research group of the University of Oxford (Visual Geometry Group) and researchers of Google DeepMind, and is often used to extract image features; VGG-Face is the network in the VGG family used for face recognition. Exploiting the relationship between the depth of a convolutional neural network and its performance, VGGNet repeatedly stacks small 3 × 3 convolution kernels and 2 × 2 max-pooling layers to construct convolutional neural networks of 16 to 19 layers; the whole network has a simple structure, using convolution kernels of the same size (3 × 3) and max-pooling of the same size (2 × 2) throughout. For more details on VGGNet, refer to the paper VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION; its network structure is not elaborated here.
Second step: training is performed with the first loss function on the training samples (that is, training images) annotated with the first labels, and the trained neural network model is taken as the second neural network model. The first label is the label of the training image itself. The first loss function (softmax loss) is defined as:
loss1 = -log f(z_k)
where
f(z_k) = exp(z_k) / Σ_j exp(z_j)
In the above formulas, loss1 is the value of the first loss function, averaged over a batch of size N (batch_size); colloquially, N can be understood as the number of input samples the first neural network model handles in one forward propagation. z_k is the output of the k-th neuron of the fully connected layer in the feedforward network.
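The batch-averaged softmax loss above can be sketched in NumPy; this is an illustration rather than Caffe's implementation, and the two-sample batch of 3-class logits is invented for the example.

```python
import numpy as np

def batch_softmax(z):
    # Row-wise softmax over a batch of logit vectors.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_loss(logits, hard_labels):
    # loss1 = -(1/N) * sum over the batch of log f(z_k),
    # where k is each sample's true class and N the batch size.
    N = logits.shape[0]
    probs = batch_softmax(logits)
    return -np.log(probs[np.arange(N), hard_labels]).mean()

logits = np.array([[3.0, 0.5, 1.0],
                   [0.2, 2.5, 0.1]])
labels = np.array([0, 1])   # first labels (hard labels) for the batch
print(round(softmax_loss(logits, labels), 4))
```

The loss is small when each sample assigns high probability to its hard label and grows without bound as that probability approaches zero.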
Third step: the original training images are input into the second neural network model, and the probability vector each training image produces in the second neural network model (that is, the distilled model) is obtained as the second label. For example, the second label of some training image is [0.93, 0.02, 0.05], whose three probability values correspond respectively to the probabilities that the chin in the image is a square chin, a pointed chin, or a round chin.
Fourth step: the training of the second neural network model is supervised simultaneously with the first labels and the second labels, and the trained neural network model is taken as the third neural network model.
As stated above, according to one embodiment of the invention, for the first labels, the first loss function of the second step (softmax loss) is still used to perform classification supervision on the second neural network model; for the second labels, the Euclidean loss function (euclidean loss) is chosen as the second loss function to perform regression supervision on the second neural network model. The second loss function (that is, euclidean loss) is defined as:
loss2 = (1 / (2M)) Σ_{i=1..M} (x1_i − x2_i)²
In the above formula, loss2 is the value of the second loss function, M is the total number of classes output through the softmax layer (that is, the dimension of the feature), x1_i is the probability the current network outputs for the i-th class, and x2_i is the probability for the i-th class characterized by the corresponding second label.
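A minimal sketch of this Euclidean regression term, reusing the three-class soft target [0.93, 0.02, 0.05] from the chin example; the current network's output vector is invented for illustration.

```python
import numpy as np

def euclidean_loss(pred, soft_target):
    # loss2 = (1 / (2M)) * sum_i (x1_i - x2_i)^2 over the M class probabilities.
    pred = np.asarray(pred, dtype=float)
    soft_target = np.asarray(soft_target, dtype=float)
    M = pred.size
    return np.sum((pred - soft_target) ** 2) / (2 * M)

# Second label (soft target) from the distilled model vs. a hypothetical
# probability vector currently output by the network being trained.
soft_target = [0.93, 0.02, 0.05]
current = [0.80, 0.10, 0.10]
print(round(euclidean_loss(current, soft_target), 5))   # → 0.0043
```

The loss is zero exactly when the network reproduces the soft target, so minimizing it fits the distilled model's full probability vector rather than only its top class.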
Then, different weights are assigned to the above two loss functions, and the final loss function for training the third neural network model is defined as:
Loss = w1 × loss1 + w2 × loss2
In the formula, Loss is the value of the final loss function, and w1 and w2 are the weight factors of the first loss value and the second loss value, respectively. The values of w1 and w2 depend on the training; the embodiments of the invention are not restricted in this regard.
Fifth step: the scaling layer in the trained third neural network model is deleted, that is, the distillation temperature T is set to 1; the resulting neural network model can serve as the defensively distilled neural network model.
According to the method for defensive distillation of a neural network model of the present invention, distilling the neural network model by itself effectively reduces the model's error rate when confronted with adversarial examples; at the same time, the classification surfaces obtained for classification problems are more reasonable and the classification results more accurate. Moreover, the scheme according to the invention does not need to construct adversarial examples, which well improves the security of the neural network model.
It should be appreciated that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art should understand that the modules, units, or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or divided into multiple submodules.
Those skilled in the art will appreciate that the modules in the devices of the embodiments may be adaptively changed and arranged in one or more devices different from those of the embodiments. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple submodules, subunits, or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include some features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software, or a combination of both. Thus, the method and apparatus of the invention, or certain aspects or portions thereof, may take the form of program code (that is, instructions) embedded in tangible media such as floppy disks, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into a machine such as a computer and executed by the machine, the machine becomes an apparatus for practicing the invention.
Where the program code is executed on programmable computers, the computing device generally comprises a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to perform the method of the invention according to the instructions in the program code stored in the memory.
By way of example, and not limitation, computer-readable media comprise computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
In addition, some of these embodiments are described herein as a method, or a combination of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor having the necessary instructions for implementing such a method or method element forms a means for implementing the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
Although the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention thus described. It should additionally be noted that the language used in this specification has been principally selected for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the present disclosure is illustrative and not restrictive, and the scope of the invention is defined by the appended claims.
Claims (10)
1. A method for resisting distillation of a neural network model, wherein the neural network model comprises a feedforward network with a layered feature structure and a softmax layer outputting a multi-class probability vector, the method being adapted to be executed in a computing device and comprising the steps of:
adding a scaling layer between the feedforward network and the softmax layer of an original neural network model according to a distillation temperature, to generate a first neural network model;
training the first neural network model with first labels, i.e. the labels of the training samples themselves, to obtain a second neural network model;
inputting the training samples into the second neural network model, the softmax layer outputting second labels characterizing the multi-class probability vectors of the training samples;
training the second neural network model under the simultaneous constraint of the second labels and the first labels, to obtain a third neural network model; and
deleting the scaling layer in the third neural network model, to obtain the distillation-resistant neural network model.
2. The method of claim 1, wherein the scaling layer is adapted to scale down the input to the softmax layer according to the distillation temperature.
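The scaling layer of claims 1 and 2 can be illustrated with a small sketch. This is a minimal, hedged illustration, not the patented implementation: the function names (`softmax`, `scaled_softmax`) are chosen here for illustration, and it assumes the common temperature-scaling formulation in which the logits are divided by the distillation temperature T before the softmax, so that larger T yields a softer (higher-entropy) probability vector.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a vector of logits.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def scaled_softmax(z, T):
    # "Scaling layer" sketch: shrink the softmax input by the
    # distillation temperature T before applying softmax.
    return softmax(np.asarray(z, dtype=float) / T)
```

For example, `scaled_softmax([2, 1, 0], 5.0)` is closer to uniform than `scaled_softmax([2, 1, 0], 1.0)`, which is what makes the soft second labels informative about inter-class similarity.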
3. The method of claim 1 or 2, wherein the step of training the first neural network model with the first labels of the training samples themselves to obtain the second neural network model comprises:
supervising the training of the first neural network model with the first labels through a first loss function, to obtain the second neural network model.
4. The method of any one of claims 1-3, wherein the step of training the second neural network model under the simultaneous constraint of the second labels and the first labels to obtain the third neural network model comprises:
performing classification-supervised training on the second neural network model with the first labels through the first loss function;
performing regression-supervised training on the second neural network model with the second labels through a second loss function; and
combining the first loss function and the second loss function to train and obtain the third neural network model.
5. The method of claim 4, wherein the step of combining the first loss function and the second loss function to train and obtain the third neural network model comprises:
weighting the first loss function and the second loss function to obtain a final loss function for training the third neural network model; and
training the third neural network model with the final loss function.
6. The method of claim 5, wherein the first loss function is:
loss1 = -log f(zk)
where loss1 is the value of the first loss function, N is the batch size during training, and zk is the output of the k-th neuron of the fully-connected layer in the feedforward network.
7. The method of claim 6, wherein the second loss function is:
loss2 = (1/(2M)) Σ_{i=1}^{M} ||x1i - x2i||^2
where loss2 is the value of the second loss function, M is the total number of classes output through the softmax layer, x1i is the probability output by the current network for the i-th class, and x2i is the probability for the i-th class characterized by the corresponding second label.
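The second loss function of claim 7 is a squared-error regression loss between the current network's probability vector and the soft second label. A minimal sketch, assuming the straightforward reading of the formula (sum of squared per-class differences, divided by 2M); the name `loss2` is chosen here for illustration.

```python
import numpy as np

def loss2(x1, x2):
    # x1: probability vector of the current network (length M)
    # x2: probability vector characterized by the second (soft) label
    # loss2 = (1/(2M)) * sum_i ||x1_i - x2_i||^2
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    M = x1.shape[-1]
    return np.sum((x1 - x2) ** 2) / (2.0 * M)
```

Identical vectors give zero loss; the loss grows as the student's output drifts from the soft label, which is what pulls the third model toward the second labels during regression-supervised training.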
8. The method of claim 7, wherein the final loss function for training the third neural network model is defined as:
loss = w1 × loss1 + w2 × loss2
where loss is the final loss value, and w1 and w2 are the weighting factors of the first loss value and the second loss value, respectively.
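The weighted combination of claim 8 can be sketched end-to-end: one scalar jointly constrains the network with the hard first labels (classification term) and the soft second labels (regression term). This is an illustrative sketch only; the function name `final_loss`, its signature, and the default weights are assumptions, and loss2 is computed here per sample for simplicity.

```python
import numpy as np

def softmax(z):
    # Row-wise numerically stable softmax.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def final_loss(logits, hard_labels, soft_labels, w1=1.0, w2=1.0):
    # loss = w1 * loss1 + w2 * loss2
    # hard_labels: integer first labels; soft_labels: second labels.
    p = softmax(np.asarray(logits, dtype=float))
    n, m = p.shape
    l1 = -np.log(p[np.arange(n), hard_labels]).mean()          # classification
    l2 = np.sum((p - np.asarray(soft_labels)) ** 2) / (2 * m)  # regression
    return w1 * l1 + w2 * l2
```

Setting w2 = 0 recovers plain hard-label training, while raising w2 shifts supervision toward matching the soft second labels.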
9. A computing device, comprising:
one or more processors; and
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods according to claims 1-8.
10. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform any one of the methods according to claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711179045.XA CN107977707B (en) | 2017-11-23 | 2017-11-23 | Method and computing equipment for resisting distillation neural network model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107977707A true CN107977707A (en) | 2018-05-01 |
CN107977707B CN107977707B (en) | 2020-11-06 |
Family
ID=62011190
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711179045.XA Active CN107977707B (en) | 2017-11-23 | 2017-11-23 | Method and computing equipment for resisting distillation neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107977707B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241988A (en) * | 2018-07-16 | 2019-01-18 | 北京市商汤科技开发有限公司 | Feature extraction method and device, electronic device, storage medium, and program product |
CN109886160A (en) * | 2019-01-30 | 2019-06-14 | 浙江工商大学 | Face recognition method under unconstrained conditions |
CN109961442A (en) * | 2019-03-25 | 2019-07-02 | 腾讯科技(深圳)有限公司 | Training method and device for a neural network model, and electronic device |
CN110427466A (en) * | 2019-06-12 | 2019-11-08 | 阿里巴巴集团控股有限公司 | Training method and device for a neural network model for question-answer matching |
CN110490202A (en) * | 2019-06-18 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Detection model training method, device, computer equipment and storage medium |
WO2020062262A1 (en) * | 2018-09-30 | 2020-04-02 | Shanghai United Imaging Healthcare Co., Ltd. | Systems and methods for generating a neural network model for image processing |
CN111027060A (en) * | 2019-12-17 | 2020-04-17 | 电子科技大学 | Knowledge distillation-based neural network black box attack type defense method |
CN111079574A (en) * | 2019-11-29 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Method and system for training neural network |
CN111105008A (en) * | 2018-10-29 | 2020-05-05 | 富士通株式会社 | Model training method, data recognition method and data recognition device |
CN111832701A (en) * | 2020-06-09 | 2020-10-27 | 北京百度网讯科技有限公司 | Model distillation method, device, electronic equipment and storage medium |
CN112561076A (en) * | 2020-12-10 | 2021-03-26 | 支付宝(杭州)信息技术有限公司 | Model processing method and device |
CN112820313A (en) * | 2020-12-31 | 2021-05-18 | 北京声智科技有限公司 | Model training method, voice separation method and device and electronic equipment |
JPWO2020161935A1 (en) * | 2019-02-05 | 2021-11-25 | 日本電気株式会社 | Learning equipment, learning methods, and programs |
US11443069B2 (en) | 2019-09-03 | 2022-09-13 | International Business Machines Corporation | Root cause analysis of vulnerability of neural networks to adversarial examples |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101847019A (en) * | 2009-03-23 | 2010-09-29 | 上海都峰智能科技有限公司 | Multichannel temperature controller |
US20120066163A1 (en) * | 2010-09-13 | 2012-03-15 | Nottingham Trent University | Time to event data analysis method and system |
CN102626557A (en) * | 2012-04-13 | 2012-08-08 | 长春工业大学 | Molecular distillation process parameter optimizing method based on GA-BP (Genetic Algorithm-Back Propagation) algorithm |
CN105069212A (en) * | 2015-07-30 | 2015-11-18 | 南通航运职业技术学院 | Ballast water microbe quantity prediction method based on artificial neural network |
Non-Patent Citations (5)
Title |
---|
CRISTIAN BUCILĂ et al.: "Model Compression", in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) * |
GEOFFREY HINTON et al.: "Distilling the Knowledge in a Neural Network", arXiv:1503.02531v1 [stat.ML] * |
LI Fanchang et al. (eds.): "Lie Group Machine Learning", 30 April 2013, Hefei: University of Science and Technology of China Press * |
YANG Wenjian et al.: "A lumped kinetics-BP neural network hybrid model for predicting delayed", Petroleum Processing and Petrochemicals * |
LU Hong (ed.): "Big Data Analysis Methods", 30 June 2017, Guangzhou: China Fortune Press * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241988A (en) * | 2018-07-16 | 2019-01-18 | 北京市商汤科技开发有限公司 | Feature extraction method and device, electronic device, storage medium, and program product |
WO2020062262A1 (en) * | 2018-09-30 | 2020-04-02 | Shanghai United Imaging Healthcare Co., Ltd. | Systems and methods for generating a neural network model for image processing |
US11599796B2 (en) | 2018-09-30 | 2023-03-07 | Shanghai United Imaging Healthcare Co., Ltd. | Systems and methods for generating a neural network model for image processing |
US11907852B2 (en) | 2018-09-30 | 2024-02-20 | Shanghai United Imaging Healthcare Co., Ltd. | Systems and methods for generating a neural network model for image processing |
CN111105008A (en) * | 2018-10-29 | 2020-05-05 | 富士通株式会社 | Model training method, data recognition method and data recognition device |
CN109886160B (en) * | 2019-01-30 | 2021-03-09 | 浙江工商大学 | Face recognition method under non-limited condition |
CN109886160A (en) * | 2019-01-30 | 2019-06-14 | 浙江工商大学 | Face recognition method under unconstrained conditions |
JPWO2020161935A1 (en) * | 2019-02-05 | 2021-11-25 | 日本電気株式会社 | Learning equipment, learning methods, and programs |
JP7180697B2 (en) | 2019-02-05 | 2022-11-30 | 日本電気株式会社 | LEARNING DEVICE, LEARNING METHOD, AND PROGRAM |
CN109961442B (en) * | 2019-03-25 | 2022-11-18 | 腾讯科技(深圳)有限公司 | Training method and device of neural network model and electronic equipment |
CN109961442A (en) * | 2019-03-25 | 2019-07-02 | 腾讯科技(深圳)有限公司 | Training method and device for a neural network model, and electronic device |
CN110427466A (en) * | 2019-06-12 | 2019-11-08 | 阿里巴巴集团控股有限公司 | Training method and device for a neural network model for question-answer matching |
CN110427466B (en) * | 2019-06-12 | 2023-05-26 | 创新先进技术有限公司 | Training method and device for neural network model for question-answer matching |
CN110490202A (en) * | 2019-06-18 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Detection model training method, device, computer equipment and storage medium |
US11443069B2 (en) | 2019-09-03 | 2022-09-13 | International Business Machines Corporation | Root cause analysis of vulnerability of neural networks to adversarial examples |
CN111079574A (en) * | 2019-11-29 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Method and system for training neural network |
CN111079574B (en) * | 2019-11-29 | 2022-08-02 | 支付宝(杭州)信息技术有限公司 | Method and system for training neural network |
CN111027060A (en) * | 2019-12-17 | 2020-04-17 | 电子科技大学 | Knowledge distillation-based neural network black box attack type defense method |
CN111027060B (en) * | 2019-12-17 | 2022-04-29 | 电子科技大学 | Knowledge distillation-based neural network black box attack type defense method |
CN111832701B (en) * | 2020-06-09 | 2023-09-22 | 北京百度网讯科技有限公司 | Model distillation method, model distillation device, electronic equipment and storage medium |
CN111832701A (en) * | 2020-06-09 | 2020-10-27 | 北京百度网讯科技有限公司 | Model distillation method, device, electronic equipment and storage medium |
CN112561076A (en) * | 2020-12-10 | 2021-03-26 | 支付宝(杭州)信息技术有限公司 | Model processing method and device |
CN112820313B (en) * | 2020-12-31 | 2022-11-01 | 北京声智科技有限公司 | Model training method, voice separation method and device and electronic equipment |
CN112820313A (en) * | 2020-12-31 | 2021-05-18 | 北京声智科技有限公司 | Model training method, voice separation method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107977707B (en) | 2020-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107977707A (en) | A kind of method and computing device for resisting distillation neural network model | |
US11373087B2 (en) | Method and apparatus for generating fixed-point type neural network | |
CN108334499A (en) | A kind of text label tagging equipment, method and computing device | |
US20210004663A1 (en) | Neural network device and method of quantizing parameters of neural network | |
US20220101090A1 (en) | Neural Architecture Search with Factorized Hierarchical Search Space | |
CN109522942B (en) | Image classification method and device, terminal equipment and storage medium | |
US20200097828A1 (en) | Processing method and accelerating device | |
CN110825884B (en) | Embedded representation processing method and device based on artificial intelligence and electronic equipment | |
CN106780512A (en) | The method of segmentation figure picture, using and computing device | |
US11887005B2 (en) | Content adaptive attention model for neural network-based image and video encoders | |
US20160283842A1 (en) | Neural network and method of neural network training | |
CN107977665A (en) | The recognition methods of key message and computing device in a kind of invoice | |
CN106295521A (en) | A kind of gender identification method based on multi output convolutional neural networks, device and the equipment of calculating | |
Xia et al. | Fully dynamic inference with deep neural networks | |
CN112613581A (en) | Image recognition method, system, computer equipment and storage medium | |
US10657439B2 (en) | Processing method and device, operation method and device | |
CN112418292B (en) | Image quality evaluation method, device, computer equipment and storage medium | |
CN107832794A (en) | A kind of convolutional neural networks generation method, the recognition methods of car system and computing device | |
CN116188878A (en) | Image classification method, device and storage medium based on neural network structure fine adjustment | |
CN111275033A (en) | Character recognition method and device, electronic equipment and storage medium | |
CN114742210A (en) | Hybrid neural network training method, traffic flow prediction method, apparatus, and medium | |
CN106503386A (en) | The good and bad method and device of assessment luminous power prediction algorithm performance | |
CN116342420A (en) | Method and system for enhancing mixed degraded image | |
CN116269312A (en) | Individual brain map drawing method and device based on brain map fusion model | |
He et al. | Rank-based greedy model averaging for high-dimensional survival data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |