CN107977707A - Method and computing device for adversarial distillation of a neural network model - Google Patents

Method and computing device for adversarial distillation of a neural network model Download PDF

Info

Publication number
CN107977707A
CN107977707A CN201711179045.XA
Authority
CN
China
Prior art keywords
network model
label
loss
training
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711179045.XA
Other languages
Chinese (zh)
Other versions
CN107977707B (en)
Inventor
陈良
洪炜冬
张伟
许清泉
王喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd filed Critical Xiamen Meitu Technology Co Ltd
Priority to CN201711179045.XA priority Critical patent/CN107977707B/en
Publication of CN107977707A publication Critical patent/CN107977707A/en
Application granted granted Critical
Publication of CN107977707B publication Critical patent/CN107977707B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for adversarial distillation of a neural network model, wherein the neural network model comprises a feedforward network with a feature-layer structure and a softmax layer that outputs a probability vector over multiple classes. The method is suitable for execution in a computing device and comprises the steps of: adding a scaling layer between the feedforward network and the softmax layer of the original neural network model according to a distillation temperature, to generate a first neural network model; training the first neural network model with the first labels of the training samples themselves, to obtain a second neural network model; inputting the training samples into the second neural network model, and outputting through the softmax layer second labels characterizing the training samples as probability vectors over the classes; training the second neural network model under the joint constraint of the second labels and the first labels, to obtain a third neural network model; and deleting the scaling layer in the third neural network model, to obtain the adversarially distilled neural network model. The invention also discloses a corresponding computing device.

Description

Method and computing device for adversarial distillation of a neural network model
Technical field
The present invention relates to the technical field of image processing, and in particular to a method and a computing device for adversarial distillation of a neural network model.
Background technology
On current classification and regression problems, deep neural networks can achieve very accurate results, and deep neural network models trained on massive amounts of data also generalize very well; deep neural networks have therefore been widely applied in recent years to computer vision, speech recognition and other fields. However, these deep neural network models also exhibit defects and vulnerabilities in practical applications. For example, even without knowledge of the network structure and parameters, small, specially crafted perturbations can be applied to the network input; from a human's subjective point of view these perturbations do not affect the judgement at all, yet they can cause the network model to output an erroneous result with very high confidence. Inputs modified by such "small perturbations" are called "adversarial examples". This problem directly affects the generalization ability and the security of neural network models.
A commonly used scheme for improving the generalization ability and security of a neural network model is to add adversarial examples to the training data of the neural network model, thereby reducing the error rate of the network model on such adversarial examples while further improving its generalization ability. However, because of the difficulty of constructing diverse adversarial examples, this approach does not reach the expected effect.
Therefore, a scheme capable of improving the generalization ability and security of neural network models is needed.
Summary of the invention
To this end, the present invention provides a method and a computing device for adversarial distillation of a neural network model, in an effort to solve, or at least alleviate, at least one of the problems above.
According to one aspect of the invention, there is provided a method for adversarial distillation of a neural network model, wherein the neural network model comprises a feedforward network with a feature-layer structure and a softmax layer that outputs a probability vector over multiple classes. The method is suitable for execution in a computing device and comprises the steps of: adding a scaling layer between the feedforward network and the softmax layer of the original neural network model according to a distillation temperature, to generate a first neural network model; training the first neural network model with the first labels of the training samples themselves, to obtain a second neural network model; inputting the training samples into the second neural network model, and outputting through the softmax layer second labels characterizing the training samples as probability vectors over the classes; training the second neural network model under the joint constraint of the second labels and the first labels, to obtain a third neural network model; and deleting the scaling layer in the third neural network model, to obtain the adversarially distilled neural network model.
Optionally, in the method according to the invention, the scaling layer is adapted to shrink the input of the softmax layer according to the distillation temperature.
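As a minimal sketch only, and assuming a PyTorch-style implementation that the disclosure itself does not prescribe (the embodiments below use Caffe), such a scaling layer simply divides its input, i.e. the logits produced by the feedforward network, by the distillation temperature T:

import torch
import torch.nn as nn

class ScaleLayer(nn.Module):
    """Scaling layer: shrinks the softmax input (the feedforward logits) by 1/T."""
    def __init__(self, temperature: float = 20.0):
        super().__init__()
        self.temperature = temperature

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # Dividing the logits by T before the softmax flattens the output distribution.
        return logits / self.temperature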
Optionally, in the method according to the invention, the step of training the first neural network model with the first labels of the training samples themselves to obtain the second neural network model comprises: supervising the training of the first neural network model with the first labels through a first loss function, to obtain the second neural network model.
Optionally, in the method according to the invention, the step of training the second neural network model under the joint constraint of the second labels and the first labels to obtain the third neural network model comprises: performing classification supervised training on the second neural network model with the first labels through the first loss function; performing regression supervised training on the second neural network model with the second labels through a second loss function; and training with the combination of the first loss function and the second loss function to obtain the third neural network model.
Optionally, in the method according to the invention, the step of training with the combination of the first loss function and the second loss function to obtain the third neural network model comprises: weighting the first loss function and the second loss function to obtain a final loss function for training the third neural network model; and training the third neural network model with the final loss function.
Optionally, in the method according to the invention, the first loss function is:
loss1 = -log f(zk)
where loss1 is the value of the first loss function, N is the batch size during training, zk is the output of the k-th neuron of the fully connected layer in the feedforward network, and f(·) denotes the softmax normalization of those outputs.
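For concreteness, a sketch of this first loss under the stated definitions, assuming (as in common softmax-with-loss implementations) that the loss is averaged over the N samples of a batch:

import torch
import torch.nn.functional as F

def first_loss(logits: torch.Tensor, hard_labels: torch.Tensor) -> torch.Tensor:
    """loss1 = -log f(zk): negative log softmax probability of the true class,
    averaged over the N samples in the batch. `logits` has shape (N, num_classes)."""
    log_probs = F.log_softmax(logits, dim=1)                    # log f(z) for every class
    return -log_probs.gather(1, hard_labels.unsqueeze(1)).mean()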
Optionally, in the method according to the invention, the second loss function is:
loss2 = (1/(2M)) Σ_{i=1..M} ||x1i - x2i||^2
where loss2 is the value of the second loss function, M is the total number of classes output through the softmax layer, x1i is the probability output by the current network for the i-th class, and x2i is the probability of the i-th class characterized by the corresponding second label.
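A corresponding sketch of this Euclidean (regression) loss, assuming x1 and x2 hold the M-dimensional probability vectors of the current network and of the second labels respectively, with batch averaging as an additional assumption:

import torch

def second_loss(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """loss2 = (1/(2M)) * sum_{i=1..M} (x1_i - x2_i)^2, averaged over the batch.
    x1: probabilities output by the current network, shape (N, M).
    x2: probabilities given by the second (soft) labels, shape (N, M)."""
    M = x1.shape[-1]
    per_sample = ((x1 - x2) ** 2).sum(dim=-1) / (2.0 * M)
    return per_sample.mean()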
Optionally, in the method according to the invention, the final loss function for training the third neural network model is defined as: loss = w1 × loss1 + w2 × loss2,
where loss is the final loss function value, and w1 and w2 are the weight factors of the first loss function value and the second loss function value respectively.
According to another aspect of the present invention, there is provided a computing device, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods described above.
In accordance with a further aspect of the present invention, there is provided a computer-readable storage medium storing one or more programs, the one or more programs including instructions which, when executed by a computing device, cause the computing device to perform any of the methods described above.
According to the method for adversarial distillation of a neural network model of the present invention, a scaling layer is added to the original neural network model and the neural network model itself is distilled, without changing the feature-layer structure of the neural network model, which effectively reduces the error rate of the neural network model when confronted with adversarial examples; moreover, the training of the neural network model is supervised simultaneously with the second labels of the training samples and with their own first labels, which improves the generalization ability of the neural network model.
Brief description of the drawings
To achieve the above and related objects, certain illustrative aspects are described herein in connection with the following description and the accompanying drawings. These aspects indicate the various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout the disclosure, the same reference numerals generally refer to the same components or elements.
Fig. 1 shows a schematic structural diagram of a computing device 100 according to an embodiment of the invention; and
Fig. 2 shows a flowchart of a method 200 for adversarial distillation of a neural network model according to an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be more thoroughly understood and so that the scope of the disclosure can be fully conveyed to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, the computing device 100 typically comprises a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-1 cache 110 and a level-2 cache 112, a processor core 114 and registers 116. An exemplary processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An exemplary memory controller 118 may be used together with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM or flash memory), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122 and program data 124. In some embodiments, the applications 122 may be arranged to operate with the program data 124 on the operating system. In certain embodiments, the computing device 100 is configured to perform the method for adversarial distillation of a neural network model, and the program data 124 contains instructions for performing that method.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144 and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Exemplary output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to facilitate communication, via one or more A/V ports 152, with various external devices such as a display or loudspeakers. Exemplary peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication, via one or more I/O ports 158, with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device or image input device) or other peripherals (such as a printer or scanner). An exemplary communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer-readable instructions, data structures or program modules in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) or other wireless media. The term computer-readable media as used herein may include both storage media and communication media. In certain embodiments, one or more programs are stored in a computer-readable medium, the one or more programs including instructions for performing certain methods; for example, according to an embodiment of the invention, the computing device 100 performs, through these instructions, the method for adversarial distillation of a neural network model.
The computing device 100 may be implemented as part of a small-sized portable (or mobile) electronic device, such as a cellular phone, a digital camera, a personal digital assistant (PDA), a personal media player device, a wireless web-browsing device, a personal head-mounted device, an application-specific device, or a hybrid device that includes any of the above functions. The computing device 100 may also be implemented as a personal computer including both desktop and notebook computer configurations.
The method 200 for adversarial distillation of a neural network model according to an embodiment of the present invention, and its implementation flow, are described in detail below with reference to Fig. 2.
The general structure of the neural network model according to embodiments of the present invention can be divided into two parts, namely a feedforward network with a feature-layer structure and a softmax layer that outputs a probability vector over multiple classes. The feedforward network generally has at least one convolutional layer, at least one pooling layer and at least one fully connected layer; the input data, after for example multiple convolution and pooling operations, is combined through the fully connected layer and then output. The softmax layer can be understood as normalizing the output of the feedforward network. Suppose the neural network model is used to classify pictures and there are currently 100 picture classes; the softmax layer then outputs a 100-dimensional vector, in which the first value is the probability that the current picture belongs to the first class, the second value is the probability that the current picture belongs to the second class, and so on, and the values of this 100-dimensional vector sum to 1.
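Purely as an illustrative sketch of this two-part structure (the layer sizes, the 32x32 input resolution and the PyTorch framework are assumptions chosen here for brevity; the disclosure does not prescribe them):

import torch
import torch.nn as nn

class FeedforwardNet(nn.Module):
    """A toy feedforward network: convolution and pooling layers followed by a
    fully connected layer, as described above (sizes chosen only for illustration)."""
    def __init__(self, num_classes: int = 100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * 8 * 8, num_classes)   # assumes 3x32x32 input pictures

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.features(x)
        return self.fc(z.flatten(1))                   # logits z_k of the fully connected layer

# The softmax layer turns the logits into a probability vector whose entries sum to 1.
softmax = nn.Softmax(dim=1)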
It should be noted that the general application scenario of the method 200 is classification with a neural network model; the concrete structure of the neural network model is not limited. In practical applications, the feedforward network of the neural network model can be any one of AlexNet, VGGNet, Google Inception Net, ResNet or other existing or newly defined network structures, which is not restricted by the embodiments of the present invention.
The method 200 starts at step S210. Let the distillation temperature be T; a scaling layer is added between the feedforward network and the softmax layer of the original neural network model according to the distillation temperature, generating the first neural network model. According to one embodiment of the present invention, the scaling layer shrinks the input of the softmax layer (that is, the output of the feedforward network) according to the distillation temperature T. In other words, a scaling layer is inserted between the feedforward network and the softmax layer of the original neural network model; the scaling layer shrinks the output of the feedforward network (that is, the output of the last fully connected layer of the feedforward network) by a factor of 1/T and then feeds the shrunken data into the softmax layer. The embodiments of the present invention do not limit the value of the distillation temperature T; in practical applications, the value of T is chosen according to the size of the feedforward network and the actual situation.
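A minimal sketch of step S210, assuming the FeedforwardNet and ScaleLayer classes sketched earlier (the value T = 20 is only an illustrative choice; during training the softmax is typically folded into the loss and applied to these scaled logits):

import torch.nn as nn

T = 20.0  # distillation temperature; chosen here only for illustration

# First neural network model: feedforward network followed by the scaling layer.
# The softmax layer sits on top of the scaled logits and is applied explicitly
# in the training and labelling sketches below.
first_model = nn.Sequential(FeedforwardNet(num_classes=100), ScaleLayer(T))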
Then, in step S220, the first neural network model is trained with the first labels of the training samples themselves, obtaining the second neural network model. According to one embodiment of the present invention, the training samples are input into the first neural network model, the training of the first neural network model on the training samples annotated with the first labels is supervised through the first loss function, and the trained network with all its parameters is taken as the second neural network model, where the first label is the label of the training sample itself, also called the hard label.
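A sketch of step S220 under the assumptions above (PyTorch instead of Caffe; cross_entropy plays the role of softmax-with-loss; the optimiser settings, epoch count and data loader are illustrative only):

import copy
import torch
import torch.nn.functional as F

def train_with_hard_labels(first_model, train_loader, epochs=10, lr=1e-3):
    """Step S220: supervise the first model with the hard (first) labels through the
    first loss function; the trained parameters form the second neural network model."""
    model = copy.deepcopy(first_model)          # keep the first model intact
    optim = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, hard_labels in train_loader:
            scaled_logits = model(images)       # feedforward output shrunk by 1/T
            loss1 = F.cross_entropy(scaled_logits, hard_labels)  # softmax with loss
            optim.zero_grad()
            loss1.backward()
            optim.step()
    return model                                # second neural network model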
Then, in step S230, the training samples of step S220 are input into the second neural network model trained in step S220, and second labels characterizing the training samples as probability vectors over the classes are output through the softmax layer; these are also called soft targets, i.e., the second label is the probability vector predicted by the second neural network model for the training sample.
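A sketch of step S230, collecting the softmax probability vectors of the second model as second labels (again a PyTorch-style assumption; the loader is assumed to iterate the samples in a fixed order so that the soft labels stay aligned with them):

import torch
import torch.nn.functional as F

@torch.no_grad()
def collect_soft_labels(second_model, train_loader):
    """Step S230: run every training sample through the second model and keep the
    softmax output (the probability vector over the classes) as its second label."""
    second_model.eval()
    soft_labels = []
    for images, _hard_labels in train_loader:
        scaled_logits = second_model(images)            # already divided by T
        soft_labels.append(F.softmax(scaled_logits, dim=1))
    return torch.cat(soft_labels, dim=0)                # one soft target per sample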
Then, in step S240, the second neural network model (that is, the distilled neural network model) is trained under the joint constraint of the second labels and the first labels, obtaining the third neural network model. According to one embodiment of the present invention, while training the second neural network model, the network model is trained simultaneously with the first labels (hard labels) and the second labels (soft targets) of the above training samples, and two loss functions are assigned. The specific steps are described as follows:
On the one hand, for the first labels (hard labels), classification supervised training is performed on the second neural network model through the first loss function. The training process here may be the same as in step S220, i.e., training the second neural network model with the first labels of the training samples themselves. Optionally, the first loss function is, for example, the Softmax with loss layer in Caffe; Caffe, whose full name is Convolutional Architecture for Fast Feature Embedding, is an open-source toolkit for training, testing, fine-tuning and deploying deep learning models. According to one embodiment of the present invention, the Softmax with loss in Caffe is chosen for classification supervised learning.
On the other hand, for the second labels (soft targets), regression supervised training is performed on the second neural network model through the second loss function. The second loss function is, for example, the Euclidean loss in Caffe, used to learn to fit the output vectors of the soft targets. It should be noted that, when training the second neural network here, because of the presence of the scaling layer, the outputs of the feedforward network have all been shrunk by 1/T.
Finally, the third neural network model is obtained by training with the combination of the first loss function and the second loss function. According to one embodiment of the present invention, different weights are set for the first loss function and the second loss function respectively, the first loss function and the second loss function are weighted to obtain the final loss function for training the third neural network model, and the third neural network model is trained with this final loss function, as sketched below.
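Combining the two supervisions, a sketch of step S240 could look as follows (w1, w2, the optimiser, the epoch count and the non-shuffling loader are illustrative assumptions, not values fixed by the disclosure):

import copy
import torch
import torch.nn.functional as F

def train_with_both_labels(second_model, train_loader, soft_labels,
                           w1=1.0, w2=1.0, epochs=10, lr=1e-4):
    """Step S240: train the (distilled) second model under the joint constraint of the
    hard labels (classification, loss1) and the soft labels (regression, loss2)."""
    model = copy.deepcopy(second_model)
    optim = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        offset = 0                      # the loader must iterate samples in a fixed order
        for images, hard_labels in train_loader:
            scaled_logits = model(images)                       # shrunk by 1/T
            probs = F.softmax(scaled_logits, dim=1)
            targets = soft_labels[offset:offset + len(images)]  # matching soft labels
            offset += len(images)
            loss1 = F.cross_entropy(scaled_logits, hard_labels) # classification supervision
            M = probs.shape[1]
            loss2 = ((probs - targets) ** 2).sum(dim=1).div(2 * M).mean()  # regression supervision
            loss = w1 * loss1 + w2 * loss2                      # final loss function
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model                                                # third neural network model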
Then, in step S250, the scaling layer in the third neural network model is deleted to obtain the adversarially distilled neural network model. That is, the distillation temperature T of the scaling layer in the third neural network model trained in step S240 is set to 1 (that is, the scaling is cancelled), and the adversarially distilled neural network model is obtained.
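Finally, a sketch of step S250 under the same assumptions: removing the scaling layer (equivalently, setting its temperature to 1) yields the distilled model used for inference, with the softmax layer restored on top of the unscaled logits.

import torch.nn as nn

def remove_scaling_layer(third_model: nn.Sequential) -> nn.Sequential:
    """Step S250: drop the ScaleLayer (or set T = 1) so that inference uses the
    unscaled logits; the remaining layers form the adversarially distilled model."""
    kept = [m for m in third_model if not isinstance(m, ScaleLayer)]
    return nn.Sequential(*kept, nn.Softmax(dim=1))   # distilled classifier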
According to the embodiments of the present invention, by adding a scaling layer to the original neural network model, the neural network model itself is distilled without changing its feature-layer structure, which significantly reduces the error rate of the neural network model when confronted with adversarial examples; moreover, the training of the neural network model is supervised simultaneously with the distilled probability vectors (i.e., the second labels) and with the first labels of the samples themselves, which improves the generalization ability of the neural network model.
To further illustrate the above method 200, its specific implementation procedure is introduced below, taking chin classification within facial feature classification as an example.
In the first step, a conventional VGG-Face network is chosen as the feedforward network, and the distillation temperature is taken as T = 20. A scaling layer is added between the feedforward network and the softmax layer; the output of the feedforward network (that is, the output of the last fully connected layer of the feedforward network) is shrunk by 1/20 and then input into the softmax layer, and the neural network model with the added scaling layer is taken as the first neural network model.
VGGNet is a deep convolutional neural network developed jointly by the Visual Geometry Group of the University of Oxford and researchers from Google DeepMind; it is often used to extract image features, and VGG-Face is one of the VGG networks used for face recognition. It exploits the relationship between the depth of a convolutional neural network and its performance: by repeatedly stacking 3×3 small convolution kernels and 2×2 max-pooling layers, convolutional neural networks of 16 to 19 layers are constructed. The whole network has a simple structure, using the same convolution kernel size (3×3) and max-pooling size (2×2) throughout. More details on VGGNet can be found in the paper "VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION"; its network structure is not described in excessive detail here.
In the second step, training is carried out with the first loss function on the training samples (i.e., training images) annotated with the first labels, and the trained neural network model is taken as the second neural network model. The first label is the label of the training image itself. The first loss function (softmax loss) is defined as:
loss1 = -log f(zk)
where loss1 is the value of the first loss function, N is the batch size (batch_size) during training (informally, N can be understood as the number of input samples processed by the first neural network model in one forward pass), zk is the output of the k-th neuron of the fully connected layer in the feedforward network, and f(·) denotes the softmax normalization of those outputs.
In the third step, the original training images are input into the second neural network model, and the probability vector output by the second neural network model (that is, the distilled model) for each training image is obtained as the second label. For example, the second label of a certain training image is [0.93, 0.02, 0.05], and the three probability values correspond respectively to the probabilities that the chin in the image is a square chin, a pointed chin, or a round chin.
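As a small worked check of the second loss on such a label (the current prediction used here is a hypothetical value, not taken from the disclosure):

# Hypothetical current prediction vs. the soft label [0.93, 0.02, 0.05] (M = 3).
x1 = [0.80, 0.10, 0.10]          # probabilities output by the current network
x2 = [0.93, 0.02, 0.05]          # second label from the distilled model
M = len(x2)
loss2 = sum((a - b) ** 2 for a, b in zip(x1, x2)) / (2 * M)
print(loss2)                     # (0.13^2 + 0.08^2 + 0.05^2) / 6 = 0.0043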
In the fourth step, the training of the second neural network model is supervised simultaneously with the first labels and the second labels, and the trained neural network model is taken as the third neural network model.
As stated above, according to one embodiment of the present invention, for the first labels, the first loss function (softmax loss) of the second step is still used to carry out classification supervision on the second neural network model; for the second labels, the Euclidean loss function (euclidean loss) is chosen as the second loss function to carry out regression supervision on the second neural network model. The second loss function (that is, euclidean loss) is defined as:
loss2 = (1/(2M)) Σ_{i=1..M} ||x1i - x2i||^2
where loss2 is the value of the second loss function, M is the total number of classes output through the softmax layer (that is, the dimension of the feature), x1i is the probability output by the current network for the i-th class, and x2i is the probability of the i-th class characterized by the corresponding second label.
Then, different weights are assigned to the above two loss functions, and the final loss function for training the third neural network model is defined as:
loss = w1 × loss1 + w2 × loss2
where loss is the final loss function value, and w1 and w2 are the weight factors of the first loss function value and the second loss function value respectively. The values of w1 and w2 depend on the training, and the embodiments of the present invention do not restrict them.
In the fifth step, the scaling layer in the trained third neural network model is deleted, i.e., the distillation temperature T is set to 1, and the resulting neural network model can be used as the adversarially distilled neural network model.
According to the method for adversarial distillation of a neural network model of the present invention, by distilling the neural network model itself, the error rate of the neural network model when confronted with adversarial examples is effectively reduced; at the same time, the classification plane obtained for the classification problem is more reasonable and the classification results are more accurate. Moreover, the scheme according to the present invention does not need to construct adversarial examples, which further improves the security of the neural network model.
It should be appreciated that, in the above description of exemplary embodiments of the invention, in order to simplify the disclosure and to aid the understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together into a single embodiment, figure or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art should understand that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the devices of the embodiments can be adaptively changed and arranged in one or more devices different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component, and may furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software, or a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practising the invention.
In the case of program code executing on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store program code; the processor is configured to execute the method of the present invention according to the instructions in the program code stored in the memory.
By way of example and not limitation, computer-readable media include computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules or other data. Communication media generally embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
In addition, some of the embodiments are described herein as methods or combinations of method elements that can be implemented by a processor of a computer system or by other means of carrying out the described functions. Thus, a processor with the necessary instructions for implementing such a method or method element forms a means for implementing the method or method element. Furthermore, the elements of the apparatus embodiments described herein are examples of means for carrying out the functions performed by those elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinal numbers "first", "second", "third", etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention thus described. In addition, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure is illustrative and not restrictive, and the scope of the invention is defined by the appended claims.

Claims (10)

1. A method for adversarial distillation of a neural network model, wherein the neural network model comprises a feedforward network with a feature-layer structure and a softmax layer that outputs a probability vector over multiple classes, the method being suitable for execution in a computing device and comprising the steps of:
adding a scaling layer between the feedforward network and the softmax layer of the original neural network model according to a distillation temperature, to generate a first neural network model;
training the first neural network model with the first labels of the training samples themselves, to obtain a second neural network model;
inputting the training samples into the second neural network model, and outputting through the softmax layer second labels characterizing the training samples as probability vectors over the classes;
training the second neural network model under the joint constraint of the second labels and the first labels, to obtain a third neural network model; and
deleting the scaling layer in the third neural network model, to obtain the adversarially distilled neural network model.
2. The method of claim 1, wherein the scaling layer is adapted to shrink the input of the softmax layer according to the distillation temperature.
3. The method of claim 1 or 2, wherein the step of training the first neural network model with the first labels of the training samples themselves to obtain the second neural network model comprises:
supervising the training of the first neural network model with the first labels through a first loss function, to obtain the second neural network model.
4. The method of any one of claims 1-3, wherein the step of training the second neural network model under the joint constraint of the second labels and the first labels to obtain the third neural network model comprises:
performing classification supervised training on the second neural network model with the first labels through the first loss function;
performing regression supervised training on the second neural network model with the second labels through a second loss function; and
training with the combination of the first loss function and the second loss function to obtain the third neural network model.
5. The method of claim 4, wherein the step of training with the combination of the first loss function and the second loss function to obtain the third neural network model comprises:
weighting the first loss function and the second loss function to obtain a final loss function for training the third neural network model; and
training the third neural network model with the final loss function.
6. The method of claim 5, wherein the first loss function is:
loss1 = -log f(zk)
where loss1 is the value of the first loss function, N is the batch size during training, zk is the output of the k-th neuron of the fully connected layer in the feedforward network, and f(·) denotes the softmax normalization of those outputs.
7. The method of claim 6, wherein the second loss function is:
loss2 = (1/(2M)) Σ_{i=1..M} ||x1i - x2i||^2
where loss2 is the value of the second loss function, M is the total number of classes output through the softmax layer, x1i is the probability output by the current network for the i-th class, and x2i is the probability of the i-th class characterized by the corresponding second label.
8. The method of claim 7, wherein the final loss function for training the third neural network model is defined as:
loss = w1 × loss1 + w2 × loss2
where loss is the final loss function value, and w1 and w2 are the weight factors of the first loss function value and the second loss function value respectively.
9. A computing device, comprising:
one or more processors; and
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods according to claims 1-8.
10. A computer-readable storage medium storing one or more programs, the one or more programs including instructions which, when executed by a computing device, cause the computing device to perform any one of the methods according to claims 1-8.
CN201711179045.XA 2017-11-23 2017-11-23 Method and computing device for adversarial distillation of a neural network model Active CN107977707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711179045.XA CN107977707B (en) 2017-11-23 2017-11-23 Method and computing device for adversarial distillation of a neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711179045.XA CN107977707B (en) Method and computing device for adversarial distillation of a neural network model

Publications (2)

Publication Number Publication Date
CN107977707A true CN107977707A (en) 2018-05-01
CN107977707B CN107977707B (en) 2020-11-06

Family

ID=62011190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711179045.XA Active CN107977707B (en) Method and computing device for adversarial distillation of a neural network model

Country Status (1)

Country Link
CN (1) CN107977707B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241988A (en) * 2018-07-16 2019-01-18 北京市商汤科技开发有限公司 Feature extracting method and device, electronic equipment, storage medium, program product
CN109886160A (en) * 2019-01-30 2019-06-14 浙江工商大学 Face recognition method under unconstrained conditions
CN109961442A (en) * 2019-03-25 2019-07-02 腾讯科技(深圳)有限公司 Training method and device for a neural network model, and electronic device
CN110427466A (en) * 2019-06-12 2019-11-08 阿里巴巴集团控股有限公司 Training method and device for a neural network model for question-answer matching
CN110490202A (en) * 2019-06-18 2019-11-22 腾讯科技(深圳)有限公司 Detection model training method, device, computer equipment and storage medium
WO2020062262A1 (en) * 2018-09-30 2020-04-02 Shanghai United Imaging Healthcare Co., Ltd. Systems and methods for generating a neural network model for image processing
CN111027060A (en) * 2019-12-17 2020-04-17 电子科技大学 Knowledge distillation-based neural network black box attack type defense method
CN111079574A (en) * 2019-11-29 2020-04-28 支付宝(杭州)信息技术有限公司 Method and system for training neural network
CN111105008A (en) * 2018-10-29 2020-05-05 富士通株式会社 Model training method, data recognition method and data recognition device
CN111832701A (en) * 2020-06-09 2020-10-27 北京百度网讯科技有限公司 Model distillation method, device, electronic equipment and storage medium
CN112561076A (en) * 2020-12-10 2021-03-26 支付宝(杭州)信息技术有限公司 Model processing method and device
CN112820313A (en) * 2020-12-31 2021-05-18 北京声智科技有限公司 Model training method, voice separation method and device and electronic equipment
JPWO2020161935A1 (en) * 2019-02-05 2021-11-25 日本電気株式会社 Learning equipment, learning methods, and programs
US11443069B2 (en) 2019-09-03 2022-09-13 International Business Machines Corporation Root cause analysis of vulnerability of neural networks to adversarial examples

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101847019A (en) * 2009-03-23 2010-09-29 上海都峰智能科技有限公司 Multichannel temperature controller
US20120066163A1 (en) * 2010-09-13 2012-03-15 Nottingham Trent University Time to event data analysis method and system
CN102626557A (en) * 2012-04-13 2012-08-08 长春工业大学 Molecular distillation process parameter optimizing method based on GA-BP (Genetic Algorithm-Back Propagation) algorithm
CN105069212A (en) * 2015-07-30 2015-11-18 南通航运职业技术学院 Ballast water microbe quantity prediction method based on artificial neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101847019A (en) * 2009-03-23 2010-09-29 上海都峰智能科技有限公司 Multichannel temperature controller
US20120066163A1 (en) * 2010-09-13 2012-03-15 Nottingham Trent University Time to event data analysis method and system
CN102626557A (en) * 2012-04-13 2012-08-08 长春工业大学 Molecular distillation process parameter optimizing method based on GA-BP (Genetic Algorithm-Back Propagation) algorithm
CN105069212A (en) * 2015-07-30 2015-11-18 南通航运职业技术学院 Ballast water microbe quantity prediction method based on artificial neural network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Cristian Bucilă et al.: "Model Compression", in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD *
Geoffrey Hinton et al.: "Distilling the Knowledge in a Neural Network", arXiv:1503.02531v1 [stat.ML] *
Li Fanzhang et al. (eds.): "Lie Group Machine Learning", Hefei: University of Science and Technology of China Press, 30 April 2013 *
Yang Wenjian et al.: "Lumped kinetics-BP neural network hybrid model for predicting delayed...", Petroleum Processing and Petrochemicals *
Lu Hong (ed.): "Big Data Analysis Methods", Guangzhou: China Fortune Press, 30 June 2017 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241988A (en) * 2018-07-16 2019-01-18 北京市商汤科技开发有限公司 Feature extracting method and device, electronic equipment, storage medium, program product
WO2020062262A1 (en) * 2018-09-30 2020-04-02 Shanghai United Imaging Healthcare Co., Ltd. Systems and methods for generating a neural network model for image processing
US11599796B2 (en) 2018-09-30 2023-03-07 Shanghai United Imaging Healthcare Co., Ltd. Systems and methods for generating a neural network model for image processing
US11907852B2 (en) 2018-09-30 2024-02-20 Shanghai United Imaging Healthcare Co., Ltd. Systems and methods for generating a neural network model for image processing
CN111105008A (en) * 2018-10-29 2020-05-05 富士通株式会社 Model training method, data recognition method and data recognition device
CN109886160B (en) * 2019-01-30 2021-03-09 浙江工商大学 Face recognition method under non-limited condition
CN109886160A (en) * 2019-01-30 2019-06-14 浙江工商大学 Face recognition method under unconstrained conditions
JPWO2020161935A1 (en) * 2019-02-05 2021-11-25 日本電気株式会社 Learning equipment, learning methods, and programs
JP7180697B2 (en) 2019-02-05 2022-11-30 日本電気株式会社 LEARNING DEVICE, LEARNING METHOD, AND PROGRAM
CN109961442B (en) * 2019-03-25 2022-11-18 腾讯科技(深圳)有限公司 Training method and device of neural network model and electronic equipment
CN109961442A (en) * 2019-03-25 2019-07-02 腾讯科技(深圳)有限公司 Training method and device for a neural network model, and electronic device
CN110427466A (en) * 2019-06-12 2019-11-08 阿里巴巴集团控股有限公司 Training method and device for a neural network model for question-answer matching
CN110427466B (en) * 2019-06-12 2023-05-26 创新先进技术有限公司 Training method and device for neural network model for question-answer matching
CN110490202A (en) * 2019-06-18 2019-11-22 腾讯科技(深圳)有限公司 Detection model training method, device, computer equipment and storage medium
US11443069B2 (en) 2019-09-03 2022-09-13 International Business Machines Corporation Root cause analysis of vulnerability of neural networks to adversarial examples
CN111079574A (en) * 2019-11-29 2020-04-28 支付宝(杭州)信息技术有限公司 Method and system for training neural network
CN111079574B (en) * 2019-11-29 2022-08-02 支付宝(杭州)信息技术有限公司 Method and system for training neural network
CN111027060A (en) * 2019-12-17 2020-04-17 电子科技大学 Knowledge distillation-based neural network black box attack type defense method
CN111027060B (en) * 2019-12-17 2022-04-29 电子科技大学 Knowledge distillation-based neural network black box attack type defense method
CN111832701B (en) * 2020-06-09 2023-09-22 北京百度网讯科技有限公司 Model distillation method, model distillation device, electronic equipment and storage medium
CN111832701A (en) * 2020-06-09 2020-10-27 北京百度网讯科技有限公司 Model distillation method, device, electronic equipment and storage medium
CN112561076A (en) * 2020-12-10 2021-03-26 支付宝(杭州)信息技术有限公司 Model processing method and device
CN112820313B (en) * 2020-12-31 2022-11-01 北京声智科技有限公司 Model training method, voice separation method and device and electronic equipment
CN112820313A (en) * 2020-12-31 2021-05-18 北京声智科技有限公司 Model training method, voice separation method and device and electronic equipment

Also Published As

Publication number Publication date
CN107977707B (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN107977707A (en) Method and computing device for adversarial distillation of a neural network model
US11373087B2 (en) Method and apparatus for generating fixed-point type neural network
CN108334499A (en) Text label tagging device, method and computing device
US20210004663A1 (en) Neural network device and method of quantizing parameters of neural network
US20220101090A1 (en) Neural Architecture Search with Factorized Hierarchical Search Space
CN109522942B (en) Image classification method and device, terminal equipment and storage medium
US20200097828A1 (en) Processing method and accelerating device
CN110825884B (en) Embedded representation processing method and device based on artificial intelligence and electronic equipment
CN106780512A (en) Image segmentation method, application and computing device
US11887005B2 (en) Content adaptive attention model for neural network-based image and video encoders
US20160283842A1 (en) Neural network and method of neural network training
CN107977665A (en) Method for recognizing key information in an invoice, and computing device
CN106295521A (en) Gender identification method and device based on a multi-output convolutional neural network, and computing device
Xia et al. Fully dynamic inference with deep neural networks
CN112613581A (en) Image recognition method, system, computer equipment and storage medium
US10657439B2 (en) Processing method and device, operation method and device
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN107832794A (en) Convolutional neural network generation method, vehicle model recognition method and computing device
CN116188878A (en) Image classification method, device and storage medium based on neural network structure fine adjustment
CN111275033A (en) Character recognition method and device, electronic equipment and storage medium
CN114742210A (en) Hybrid neural network training method, traffic flow prediction method, apparatus, and medium
CN106503386A (en) Method and device for assessing the performance of an optical power prediction algorithm
CN116342420A (en) Method and system for enhancing mixed degraded image
CN116269312A (en) Individual brain map drawing method and device based on brain map fusion model
He et al. Rank-based greedy model averaging for high-dimensional survival data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant