CN107977707A - Method and computing device for defensive distillation of a neural network model - Google Patents
- Publication number
- CN107977707A CN107977707A CN201711179045.XA CN201711179045A CN107977707A CN 107977707 A CN107977707 A CN 107977707A CN 201711179045 A CN201711179045 A CN 201711179045A CN 107977707 A CN107977707 A CN 107977707A
- Authority
- CN
- China
- Prior art keywords
- network model
- label
- loss
- training
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for defensive distillation of a neural network model, where the neural network model comprises a feedforward network with a feature-layer structure and a softmax layer that outputs a probability vector over multiple classes. The method is adapted to be executed in a computing device and comprises the steps of: adding a scaling layer between the feedforward network and the softmax layer of the original neural network model according to a distillation temperature, generating a first neural network model; training the first neural network model with the first labels of the training samples themselves to obtain a second neural network model; inputting the training samples into the second neural network model and outputting, through the softmax layer, second labels that characterize each training sample as a probability vector over the classes; training the second neural network model under the simultaneous constraint of the second labels and the first labels to obtain a third neural network model; and deleting the scaling layer in the third neural network model to obtain the defensively distilled neural network model. The invention also discloses a corresponding computing device.
Description
Technical field
The present invention relates to the technical field of image processing, and in particular to a method and computing device for defensive distillation of a neural network model.
Background technology
Deep neural networks now achieve highly accurate results on classification and regression problems, and deep neural network models trained with the support of massive data also generalize strongly; in recent years they have therefore been widely applied in computer vision, speech recognition, and related fields. These deep neural network models nevertheless exhibit defects and vulnerabilities in practical applications. For example, even without knowledge of the network structure and parameters, specially crafted small perturbations can be applied to the network's input; to a human observer these perturbations do not affect the judgement at all, yet they can cause the network model to output erroneous results with very high confidence. Inputs altered by such "small perturbations" are called "adversarial examples". These problems directly affect the generalization ability and security of neural network models.
A common scheme for improving the generalization ability and security of a neural network model is to add adversarial examples to the training data, thereby reducing the model's error rate on such examples while further improving its generalization ability. However, factors such as the diversity of the adversarial examples that must be constructed prevent this approach from reaching the expected effect.
Therefore, a scheme is needed that can improve the generalization ability and security of neural network models.
Summary of the invention
To this end, the present invention provides a method and computing device for defensive distillation of a neural network model, in an effort to solve, or at least alleviate, at least one of the problems above.
According to one aspect of the invention, a method for defensive distillation of a neural network model is provided, where the neural network model comprises a feedforward network with a feature-layer structure and a softmax layer that outputs a probability vector over multiple classes. The method is adapted to be executed in a computing device and comprises the steps of: adding a scaling layer between the feedforward network and the softmax layer of the original neural network model according to a distillation temperature, generating a first neural network model; training the first neural network model with the first labels of the training samples themselves to obtain a second neural network model; inputting the training samples into the second neural network model and outputting, through the softmax layer, second labels that characterize each training sample as a probability vector over the classes; training the second neural network model under the simultaneous constraint of the second labels and the first labels to obtain a third neural network model; and deleting the scaling layer in the third neural network model to obtain the defensively distilled neural network model.
Optionally, in the method according to the invention, the scaling layer is adapted to reduce the input of the softmax layer according to the distillation temperature.
Optionally, in the method according to the invention, the step of training the first neural network model with the first labels of the training samples themselves to obtain a second neural network model comprises: supervising the training of the first neural network model with the first labels through a first loss function, obtaining the second neural network model.
Optionally, in the method according to the invention, the step of training the second neural network model under the simultaneous constraint of the second labels and the first labels to obtain a third neural network model comprises: performing classification supervised training on the second neural network model with the first labels through the first loss function; performing regression supervised training on the second neural network model with the second labels through a second loss function; and combining the first loss function and the second loss function to train the third neural network model.
Optionally, in the method according to the invention, the step of combining the first loss function and the second loss function to train the third neural network model comprises: weighting the first loss function and the second loss function to obtain the final loss function for training the third neural network model; and training the third neural network model with the final loss function.
Optionally, in the method according to the invention, the first loss function is:
loss1 = -log f(z_k)
where
f(z_k) = exp(z_k) / Σ_j exp(z_j)
In the formulas, loss1 is the value of the first loss function, averaged over the N samples of a batch (N being the batch size during training), and z_k is the output of the k-th neuron of the fully connected layer in the feedforward network.
Optionally, in the method according to the invention, the second loss function is:
loss2 = (1 / (2M)) Σ_{i=1..M} (x1_i − x2_i)²
In the formula, loss2 is the value of the second loss function, M is the total number of classes output through the softmax layer, x1_i is the probability the current network outputs for the i-th class, and x2_i is the probability for the i-th class characterized by the corresponding second label.
Optionally, in the method according to the invention, the final loss function for training the third neural network model is defined as:
Loss = w1 × loss1 + w2 × loss2
In the formula, Loss is the value of the final loss function, and w1 and w2 are the weight factors of the first loss value and the second loss value, respectively.
According to another aspect of the invention, a computing device is provided, comprising: one or more processors; a memory; and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods described above.
According to a further aspect of the invention, a computer-readable storage medium storing one or more programs is provided, the one or more programs including instructions which, when executed by a computing device, cause the computing device to perform any of the methods described above.
According to the method for defensive distillation of a neural network model of the present invention, a scaling layer is added to the original neural network model so that the neural network model is distilled by itself, without changing the feature-layer structure of the model, which effectively reduces the model's error rate when confronted with adversarial examples. Moreover, supervising the training of the neural network model simultaneously with the second labels of the training samples and their own first labels improves the model's generalization ability.
Brief description of the drawings
To achieve the above and related objects, certain illustrative aspects are described herein in conjunction with the following description and the accompanying drawings. These aspects indicate various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout the disclosure, identical reference numerals generally refer to identical components or elements.
Fig. 1 shows a block diagram of a computing device 100 according to one embodiment of the invention; and
Fig. 2 shows a flow chart of a method 200 for defensive distillation of a neural network model according to one embodiment of the invention.
Detailed description of embodiments
Exemplary embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, the computing device 100 typically comprises a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processors 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be a processor of any type, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-1 cache 110 and a level-2 cache 112, a processor core 114, and registers 116. An exemplary processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An exemplary memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be memory of any type, including but not limited to volatile memory (such as RAM) and non-volatile memory (such as ROM or flash memory), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, the applications 122 may be arranged to operate on the operating system with the program data 124. In certain embodiments, the computing device 100 is configured to perform the method for defensive distillation of a neural network model, and the program data 124 contains the instructions for performing that method.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Exemplary output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to facilitate communication with various external devices such as displays or loudspeakers via one or more A/V ports 152. Exemplary peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication, via one or more I/O ports 158, with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device, or image input device) or other peripherals (such as printers or scanners). An exemplary communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied as computer-readable instructions, data structures, or program modules in a modulated data signal such as a carrier wave or another transmission mechanism, and may include any information delivery medium. A "modulated data signal" may be a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer-readable medium as used herein may include both storage media and communication media. In certain embodiments, one or more programs are stored in a computer-readable medium, the one or more programs including instructions for performing certain methods; for example, according to embodiments of the invention, the computing device 100 performs the method for defensive distillation of a neural network model through these instructions.
The computing device 100 may be implemented as part of a small-sized portable (or mobile) electronic device, such as a cellular phone, a digital camera, a personal digital assistant (PDA), a personal media player, a wireless web browsing device, a personal head-mounted device, an application-specific device, or a hybrid device including any of the above functions. The computing device 100 may also be implemented as a personal computer including both desktop and notebook configurations.
The implementation flow of a method 200 for defensive distillation of a neural network model according to an embodiment of the invention is elaborated below with reference to Fig. 2.
The general structure of a neural network model according to embodiments of the invention can be divided into two parts: a feedforward network with a feature-layer structure, and a softmax layer that outputs a probability vector over multiple classes. The feedforward network generally has at least one convolutional layer, a pooling layer, and a fully connected layer; input data, for example, passes through multiple convolution and pooling operations and is then merged and output through the fully connected layer. The softmax layer can be understood as normalizing the output of the feedforward network. Suppose the neural network model is used to classify pictures and there are currently 100 picture classes; the softmax layer then outputs a 100-dimensional vector, where the first value in the vector is the probability that the current picture belongs to the first class, the second value is the probability that it belongs to the second class, and so on, and the values of this 100-dimensional vector sum to 1.
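The normalization the softmax layer performs can be illustrated with a minimal NumPy sketch; this is an illustration only, not code from the patent, and the 3-class logits are made up for the example.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is mathematically unchanged.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# A 3-class example: the feedforward network emits one logit per class,
# and softmax turns the logits into a probability vector that sums to 1.
logits = np.array([2.0, 0.5, 1.0])
probs = softmax(logits)
print(probs.round(3))            # one probability per class
print(round(probs.sum(), 6))     # the probabilities sum to 1
```

The class with the largest logit keeps the largest probability, so softmax changes the scale of the scores without changing their ranking.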
It should be noted that the general application scenario of the method 200 is classification processing with a neural network model; the concrete structure of the neural network model is not limited. In practical applications, the feedforward network of the neural network model may be any one of existing or newly defined network structures such as AlexNet, VGGNet, Google Inception Net, or ResNet; the embodiments of the invention are not restricted in this regard.
The method 200 starts at step S210. Let the distillation temperature be T; a scaling layer is added between the feedforward network and the softmax layer of the original neural network model according to the distillation temperature, generating a first neural network model. According to one embodiment of the invention, the scaling layer reduces the input of the softmax layer (that is, the output of the feedforward network) according to the distillation temperature T. In other words, a scaling layer is inserted between the feedforward network and the softmax layer of the original neural network model; the scaling layer reduces the output of the feedforward network (that is, the output of the last fully connected layer in the feedforward network) by a factor of 1/T and then feeds the reduced data into the softmax layer. The embodiments of the invention do not restrict the value of the distillation temperature T; in practical applications, the value of T is chosen according to the size of the feedforward network and the actual conditions.
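The effect of the scaling layer can be sketched as follows; this is a hypothetical illustration of the 1/T reduction, not the patent's implementation, and the logits are invented for the example.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def scaled_softmax(logits, T):
    # The scaling layer divides the feedforward network's output by T
    # (i.e. multiplies it by 1/T) before it reaches the softmax layer.
    return softmax(np.asarray(logits, dtype=float) / T)

logits = [8.0, 2.0, 4.0]
print(scaled_softmax(logits, T=1).round(3))    # sharp distribution (no scaling)
print(scaled_softmax(logits, T=20).round(3))   # softened distribution at T = 20
```

At T = 1 the scaling layer is a no-op, which is why deleting it after training (step S250) is equivalent to setting T to 1; at larger T the output probabilities move toward uniform while the predicted class stays the same.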
Then, in step S220, the first neural network model is trained with the first labels of the training samples themselves, obtaining a second neural network model. According to one embodiment of the invention, the training samples are input into the first neural network model, the training of the first neural network model is supervised by a first loss function on the training samples annotated with the first labels, and the trained network with all its internal parameters is taken as the second neural network model. Here the first label is the label of the training sample itself, also called the hard label.
Then, in step S230, the training samples of step S220 are input into the second neural network model trained in step S220, and the softmax layer outputs second labels that characterize each training sample as a probability vector over the classes. A second label, also called a soft target, is the second neural network model's predicted probability vector for the training sample.
Then, in step S240, the second neural network model (that is, the distilled model) is trained under the simultaneous constraint of the second labels and the first labels, obtaining a third neural network model. According to one embodiment of the invention, when training the second neural network model, the network is trained simultaneously with the first labels (hard labels) and the second labels (soft targets) of the above training samples, and two loss functions are assigned. The specific steps are described as follows.
On the one hand, for the first labels (hard labels), classification supervised training is performed on the second neural network model through the first loss function. The training process here can, as in step S220, train the second neural network model with the first labels of the training samples themselves. Optionally, the first loss function is, for example, the Softmax with loss in Caffe. Caffe, whose full name is Convolutional Architecture for Fast Feature Embedding, provides an open-source toolkit for training, testing, fine-tuning, and deploying deep learning models. According to one embodiment of the invention, the Softmax with loss in Caffe is chosen for classification supervised learning.
On the other hand, for the second labels (soft targets), regression supervised training is performed on the second neural network model through the second loss function. The second loss function is, for example, the Euclidean loss in Caffe, used to learn to fit the output vector of the soft target. It should be noted that, because of the presence of the scaling layer, the output of the feedforward network is reduced by 1/T throughout this training of the second neural network.
Finally, the first loss function and the second loss function are combined to train the third neural network model. According to one embodiment of the invention, different weights are set for the first loss function and the second loss function respectively; the two loss functions are weighted to obtain the final loss function for training the third neural network model, and the third neural network model is trained with this final loss function.
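The two-term objective of step S240 can be sketched for a single sample as follows. This is a simplified NumPy illustration, not the patent's Caffe implementation: the logits, hard label, soft target, and weights are invented, and real training would average both terms over a batch and backpropagate through the network.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def final_loss(logits, hard_label, soft_target, T, w1, w2):
    # The scaling layer reduces the feedforward output by 1/T before softmax.
    probs = softmax(np.asarray(logits, dtype=float) / T)
    # Classification term on the hard label (first loss function, softmax loss).
    loss1 = -np.log(probs[hard_label])
    # Regression term fitting the soft target (second loss function, Euclidean loss).
    M = probs.size
    loss2 = np.sum((probs - np.asarray(soft_target)) ** 2) / (2 * M)
    # Weighted combination: Loss = w1 * loss1 + w2 * loss2.
    return w1 * loss1 + w2 * loss2

loss = final_loss([8.0, 2.0, 4.0], hard_label=0,
                  soft_target=[0.93, 0.02, 0.05], T=20, w1=1.0, w2=1.0)
print(round(loss, 4))
```

The sketch makes the simultaneous constraint concrete: the same softened output vector is pulled toward the hard label by loss1 and toward the soft target by loss2, with w1 and w2 balancing the two.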
Then, in step S250, the scaling layer in the third neural network model is deleted to obtain the defensively distilled neural network model. That is, the distillation temperature T of the scaling layer in the third neural network model trained in step S240 is set to 1 (that is, the scaling is cancelled), obtaining the defensively distilled neural network model.
According to the embodiments of the invention, adding a scaling layer to the original neural network model distills the neural network model by itself, without changing the feature-layer structure of the model, which significantly reduces the model's error rate when confronted with adversarial examples. Moreover, supervising the training of the neural network model simultaneously with the post-distillation probability vectors (that is, the second labels) and the model's own first labels improves its generalization ability.
To further illustrate the method 200, its concrete execution is introduced below, taking chin classification within facial-feature classification as an example.
First step: the traditional VGG-Face network is chosen as the feedforward network, and the distillation temperature is taken as T = 20. A scaling layer is added between the feedforward network and the softmax layer; the output of the feedforward network (that is, the output of the last fully connected layer in the feedforward network) is reduced by 1/20 and then input into the softmax layer. The neural network model with the scaling layer added is taken as the first neural network model.
VGGNet is a deep convolutional neural network developed jointly by the computer vision research group of the University of Oxford (Visual Geometry Group) and researchers of Google DeepMind, and is often used to extract image features; VGG-Face is the network in the VGG family used for face recognition. Exploiting the relationship between the depth of a convolutional neural network and its performance, VGGNet repeatedly stacks small 3 × 3 convolution kernels and 2 × 2 max-pooling layers to construct convolutional neural networks of 16 to 19 layers; the whole network has a simple structure, using convolution kernels of the same size (3 × 3) and max-pooling of the same size (2 × 2) throughout. For more details on VGGNet, refer to the paper VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION; its network structure is not elaborated here.
Second step: training is performed with the first loss function on the training samples (that is, training images) annotated with the first labels, and the trained neural network model is taken as the second neural network model. The first label is the label of the training image itself. The first loss function (softmax loss) is defined as:
loss1 = -log f(z_k)
where
f(z_k) = exp(z_k) / Σ_j exp(z_j)
In the above formulas, loss1 is the value of the first loss function, averaged over a batch of size N (batch_size); colloquially, N can be understood as the number of input samples the first neural network model handles in one forward propagation. z_k is the output of the k-th neuron of the fully connected layer in the feedforward network.
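The batch-averaged softmax loss above can be sketched in NumPy; this is an illustration rather than Caffe's implementation, and the two-sample batch of 3-class logits is invented for the example.

```python
import numpy as np

def batch_softmax(z):
    # Row-wise softmax over a batch of logit vectors.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_loss(logits, hard_labels):
    # loss1 = -(1/N) * sum over the batch of log f(z_k),
    # where k is each sample's true class and N the batch size.
    N = logits.shape[0]
    probs = batch_softmax(logits)
    return -np.log(probs[np.arange(N), hard_labels]).mean()

logits = np.array([[3.0, 0.5, 1.0],
                   [0.2, 2.5, 0.1]])
labels = np.array([0, 1])   # first labels (hard labels) for the batch
print(round(softmax_loss(logits, labels), 4))
```

The loss is small when each sample assigns high probability to its hard label and grows without bound as that probability approaches zero.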
Third step: the original training images are input into the second neural network model, and the probability vector each training image produces in the second neural network model (that is, the distilled model) is obtained as the second label. For example, the second label of some training image is [0.93, 0.02, 0.05], whose three probability values correspond respectively to the probabilities that the chin in the image is a square chin, a pointed chin, or a round chin.
Fourth step: the training of the second neural network model is supervised simultaneously with the first labels and the second labels, and the trained neural network model is taken as the third neural network model.
As stated above, according to one embodiment of the invention, for the first labels, the first loss function of the second step (softmax loss) is still used to perform classification supervision on the second neural network model; for the second labels, the Euclidean loss function (euclidean loss) is chosen as the second loss function to perform regression supervision on the second neural network model. The second loss function (that is, euclidean loss) is defined as:
loss2 = (1 / (2M)) Σ_{i=1..M} (x1_i − x2_i)²
In the above formula, loss2 is the value of the second loss function, M is the total number of classes output through the softmax layer (that is, the dimension of the feature), x1_i is the probability the current network outputs for the i-th class, and x2_i is the probability for the i-th class characterized by the corresponding second label.
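A minimal sketch of this Euclidean regression term, reusing the three-class soft target [0.93, 0.02, 0.05] from the chin example; the current network's output vector is invented for illustration.

```python
import numpy as np

def euclidean_loss(pred, soft_target):
    # loss2 = (1 / (2M)) * sum_i (x1_i - x2_i)^2 over the M class probabilities.
    pred = np.asarray(pred, dtype=float)
    soft_target = np.asarray(soft_target, dtype=float)
    M = pred.size
    return np.sum((pred - soft_target) ** 2) / (2 * M)

# Second label (soft target) from the distilled model vs. a hypothetical
# probability vector currently output by the network being trained.
soft_target = [0.93, 0.02, 0.05]
current = [0.80, 0.10, 0.10]
print(round(euclidean_loss(current, soft_target), 5))   # → 0.0043
```

The loss is zero exactly when the network reproduces the soft target, so minimizing it fits the distilled model's full probability vector rather than only its top class.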
Then, different weights are assigned to the above two loss functions, and the final loss function for training the third neural network model is defined as:
Loss = w1 × loss1 + w2 × loss2
In the formula, Loss is the value of the final loss function, and w1 and w2 are the weight factors of the first loss value and the second loss value, respectively. The values of w1 and w2 depend on the training; the embodiments of the invention are not restricted in this regard.
Fifth step: the scaling layer in the trained third neural network model is deleted, that is, the distillation temperature T is set to 1; the resulting neural network model can serve as the defensively distilled neural network model.
According to the method for defensive distillation of a neural network model of the present invention, distilling the neural network model by itself effectively reduces the model's error rate when confronted with adversarial examples; at the same time, the classification surfaces obtained for classification problems are more reasonable and the classification results more accurate. Moreover, the scheme according to the invention does not need to construct adversarial examples, which well improves the security of the neural network model.
It should be appreciated that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art should understand that the modules, units, or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or divided into multiple submodules.
Those skilled in the art will appreciate that the modules in the devices of the embodiments may be adaptively changed and arranged in one or more devices different from those of the embodiments. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple submodules, subunits, or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include some features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software, or a combination of both. Thus, the method and apparatus of the invention, or certain aspects or portions thereof, may take the form of program code (that is, instructions) embedded in tangible media such as floppy disks, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into a machine such as a computer and executed by the machine, the machine becomes an apparatus for practicing the invention.
Where the program code is executed on programmable computers, the computing device generally comprises a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to perform the method of the invention according to the instructions in the program code stored in the memory.
By way of example, and not limitation, computer-readable media comprise computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
In addition, some of these embodiments are described herein as a method, or a combination of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor having the necessary instructions for implementing such a method or method element forms a means for implementing the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
Although the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention thus described. It should additionally be noted that the language used in this specification has been principally selected for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the present disclosure is illustrative and not restrictive, and the scope of the invention is defined by the appended claims.
Claims (10)
1. A method for resisting distillation of a neural network model, wherein the neural network model comprises a feedforward network with a layered feature structure and a softmax layer outputting a multi-class probability vector, the method being adapted to be executed in a computing device and comprising the steps of:
adding a scaling layer between the feedforward network and the softmax layer of an original neural network model according to a distillation temperature, to generate a first neural network model;
training the first neural network model with first labels, i.e. the labels of the training samples themselves, to obtain a second neural network model;
inputting the training samples into the second neural network model, the softmax layer outputting second labels characterizing the multi-class probability vectors of the training samples;
training the second neural network model under the simultaneous constraint of the second labels and the first labels, to obtain a third neural network model; and
deleting the scaling layer in the third neural network model, to obtain the distillation-resistant neural network model.
2. The method of claim 1, wherein the scaling layer is adapted to scale down the input to the softmax layer according to the distillation temperature.
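The scaling layer of claims 1 and 2 can be illustrated with a small sketch. This is a minimal, hedged illustration, not the patented implementation: the function names (`softmax`, `scaled_softmax`) are chosen here for illustration, and it assumes the common temperature-scaling formulation in which the logits are divided by the distillation temperature T before the softmax, so that larger T yields a softer (higher-entropy) probability vector.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a vector of logits.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def scaled_softmax(z, T):
    # "Scaling layer" sketch: shrink the softmax input by the
    # distillation temperature T before applying softmax.
    return softmax(np.asarray(z, dtype=float) / T)
```

For example, `scaled_softmax([2, 1, 0], 5.0)` is closer to uniform than `scaled_softmax([2, 1, 0], 1.0)`, which is what makes the soft second labels informative about inter-class similarity.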
3. The method of claim 1 or 2, wherein the step of training the first neural network model with the first labels of the training samples themselves to obtain the second neural network model comprises:
supervising the training of the first neural network model with the first labels through a first loss function, to obtain the second neural network model.
4. The method of any one of claims 1-3, wherein the step of training the second neural network model under the simultaneous constraint of the second labels and the first labels to obtain the third neural network model comprises:
performing classification-supervised training on the second neural network model with the first labels through the first loss function;
performing regression-supervised training on the second neural network model with the second labels through a second loss function; and
combining the first loss function and the second loss function to train and obtain the third neural network model.
5. The method of claim 4, wherein the step of combining the first loss function and the second loss function to train and obtain the third neural network model comprises:
weighting the first loss function and the second loss function to obtain a final loss function for training the third neural network model; and
training the third neural network model with the final loss function.
6. The method of claim 5, wherein the first loss function is:
loss1 = -log f(zk)
where loss1 is the value of the first loss function, N is the batch size during training, and zk is the output of the k-th neuron of the fully-connected layer in the feedforward network.
7. The method of claim 6, wherein the second loss function is:
loss2 = (1/(2M)) Σ_{i=1}^{M} ||x1i - x2i||^2
where loss2 is the value of the second loss function, M is the total number of classes output through the softmax layer, x1i is the probability output by the current network for the i-th class, and x2i is the probability for the i-th class characterized by the corresponding second label.
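The second loss function of claim 7 is a squared-error regression loss between the current network's probability vector and the soft second label. A minimal sketch, assuming the straightforward reading of the formula (sum of squared per-class differences, divided by 2M); the name `loss2` is chosen here for illustration.

```python
import numpy as np

def loss2(x1, x2):
    # x1: probability vector of the current network (length M)
    # x2: probability vector characterized by the second (soft) label
    # loss2 = (1/(2M)) * sum_i ||x1_i - x2_i||^2
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    M = x1.shape[-1]
    return np.sum((x1 - x2) ** 2) / (2.0 * M)
```

Identical vectors give zero loss; the loss grows as the student's output drifts from the soft label, which is what pulls the third model toward the second labels during regression-supervised training.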
8. The method of claim 7, wherein the final loss function for training the third neural network model is defined as:
loss = w1 × loss1 + w2 × loss2
where loss is the final loss value, and w1 and w2 are the weighting factors of the first loss value and the second loss value, respectively.
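The weighted combination of claim 8 can be sketched end-to-end: one scalar jointly constrains the network with the hard first labels (classification term) and the soft second labels (regression term). This is an illustrative sketch only; the function name `final_loss`, its signature, and the default weights are assumptions, and loss2 is computed here per sample for simplicity.

```python
import numpy as np

def softmax(z):
    # Row-wise numerically stable softmax.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def final_loss(logits, hard_labels, soft_labels, w1=1.0, w2=1.0):
    # loss = w1 * loss1 + w2 * loss2
    # hard_labels: integer first labels; soft_labels: second labels.
    p = softmax(np.asarray(logits, dtype=float))
    n, m = p.shape
    l1 = -np.log(p[np.arange(n), hard_labels]).mean()          # classification
    l2 = np.sum((p - np.asarray(soft_labels)) ** 2) / (2 * m)  # regression
    return w1 * l1 + w2 * l2
```

Setting w2 = 0 recovers plain hard-label training, while raising w2 shifts supervision toward matching the soft second labels.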
9. A computing device, comprising:
one or more processors; and
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods according to claims 1-8.
10. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform any one of the methods according to claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711179045.XA CN107977707B (en) | 2017-11-23 | 2017-11-23 | Method and computing equipment for resisting distillation neural network model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107977707A true CN107977707A (en) | 2018-05-01 |
CN107977707B CN107977707B (en) | 2020-11-06 |
Family
ID=62011190
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711179045.XA Active CN107977707B (en) | 2017-11-23 | 2017-11-23 | Method and computing equipment for resisting distillation neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107977707B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241988A (en) * | 2018-07-16 | 2019-01-18 | 北京市商汤科技开发有限公司 | Feature extraction method and device, electronic device, storage medium, and program product |
CN109886160A (en) * | 2019-01-30 | 2019-06-14 | 浙江工商大学 | Face recognition method under unconstrained conditions |
CN109961442A (en) * | 2019-03-25 | 2019-07-02 | 腾讯科技(深圳)有限公司 | Training method and device for a neural network model, and electronic device |
CN110427466A (en) * | 2019-06-12 | 2019-11-08 | 阿里巴巴集团控股有限公司 | Training method and device for a neural network model for question-answer matching |
CN110490202A (en) * | 2019-06-18 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Detection model training method, device, computer equipment and storage medium |
WO2020062262A1 (en) * | 2018-09-30 | 2020-04-02 | Shanghai United Imaging Healthcare Co., Ltd. | Systems and methods for generating a neural network model for image processing |
CN111027060A (en) * | 2019-12-17 | 2020-04-17 | 电子科技大学 | Knowledge distillation-based neural network black box attack type defense method |
CN111079574A (en) * | 2019-11-29 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Method and system for training neural network |
CN111105008A (en) * | 2018-10-29 | 2020-05-05 | 富士通株式会社 | Model training method, data recognition method and data recognition device |
CN111832701A (en) * | 2020-06-09 | 2020-10-27 | 北京百度网讯科技有限公司 | Model distillation method, device, electronic equipment and storage medium |
CN112561076A (en) * | 2020-12-10 | 2021-03-26 | 支付宝(杭州)信息技术有限公司 | Model processing method and device |
CN112820313A (en) * | 2020-12-31 | 2021-05-18 | 北京声智科技有限公司 | Model training method, voice separation method and device and electronic equipment |
JPWO2020161935A1 (en) * | 2019-02-05 | 2021-11-25 | 日本電気株式会社 | Learning equipment, learning methods, and programs |
US11443069B2 (en) | 2019-09-03 | 2022-09-13 | International Business Machines Corporation | Root cause analysis of vulnerability of neural networks to adversarial examples |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101847019A (en) * | 2009-03-23 | 2010-09-29 | 上海都峰智能科技有限公司 | Multichannel temperature controller |
US20120066163A1 (en) * | 2010-09-13 | 2012-03-15 | Nottingham Trent University | Time to event data analysis method and system |
CN102626557A (en) * | 2012-04-13 | 2012-08-08 | 长春工业大学 | Molecular distillation process parameter optimizing method based on GA-BP (Genetic Algorithm-Back Propagation) algorithm |
CN105069212A (en) * | 2015-07-30 | 2015-11-18 | 南通航运职业技术学院 | Ballast water microbe quantity prediction method based on artificial neural network |
Non-Patent Citations (5)
Title |
---|
CRISTIAN BUCILĂ et al.: "Model Compression", in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) * |
GEOFFREY HINTON et al.: "Distilling the Knowledge in a Neural Network", arXiv:1503.02531v1 [stat.ML] * |
LI Fanchang et al. (eds.): "Lie Group Machine Learning", 30 April 2013, Hefei: University of Science and Technology of China Press * |
YANG Wenjian et al.: "A lumped kinetics-BP neural network hybrid model for predicting delayed", Petroleum Processing and Petrochemicals * |
LU Hong (ed.): "Big Data Analysis Methods", 30 June 2017, Guangzhou: China Fortune Press * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241988A (en) * | 2018-07-16 | 2019-01-18 | 北京市商汤科技开发有限公司 | Feature extraction method and device, electronic device, storage medium, and program product |
WO2020062262A1 (en) * | 2018-09-30 | 2020-04-02 | Shanghai United Imaging Healthcare Co., Ltd. | Systems and methods for generating a neural network model for image processing |
US11599796B2 (en) | 2018-09-30 | 2023-03-07 | Shanghai United Imaging Healthcare Co., Ltd. | Systems and methods for generating a neural network model for image processing |
US11907852B2 (en) | 2018-09-30 | 2024-02-20 | Shanghai United Imaging Healthcare Co., Ltd. | Systems and methods for generating a neural network model for image processing |
CN111105008A (en) * | 2018-10-29 | 2020-05-05 | 富士通株式会社 | Model training method, data recognition method and data recognition device |
CN109886160B (en) * | 2019-01-30 | 2021-03-09 | 浙江工商大学 | Face recognition method under non-limited condition |
CN109886160A (en) * | 2019-01-30 | 2019-06-14 | 浙江工商大学 | Face recognition method under unconstrained conditions |
JPWO2020161935A1 (en) * | 2019-02-05 | 2021-11-25 | 日本電気株式会社 | Learning equipment, learning methods, and programs |
JP7180697B2 (en) | 2019-02-05 | 2022-11-30 | 日本電気株式会社 | LEARNING DEVICE, LEARNING METHOD, AND PROGRAM |
CN109961442B (en) * | 2019-03-25 | 2022-11-18 | 腾讯科技(深圳)有限公司 | Training method and device of neural network model and electronic equipment |
CN109961442A (en) * | 2019-03-25 | 2019-07-02 | 腾讯科技(深圳)有限公司 | Training method and device for a neural network model, and electronic device |
CN110427466A (en) * | 2019-06-12 | 2019-11-08 | 阿里巴巴集团控股有限公司 | Training method and device for a neural network model for question-answer matching |
CN110427466B (en) * | 2019-06-12 | 2023-05-26 | 创新先进技术有限公司 | Training method and device for neural network model for question-answer matching |
CN110490202A (en) * | 2019-06-18 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Detection model training method, device, computer equipment and storage medium |
US11443069B2 (en) | 2019-09-03 | 2022-09-13 | International Business Machines Corporation | Root cause analysis of vulnerability of neural networks to adversarial examples |
CN111079574A (en) * | 2019-11-29 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Method and system for training neural network |
CN111079574B (en) * | 2019-11-29 | 2022-08-02 | 支付宝(杭州)信息技术有限公司 | Method and system for training neural network |
CN111027060A (en) * | 2019-12-17 | 2020-04-17 | 电子科技大学 | Knowledge distillation-based neural network black box attack type defense method |
CN111027060B (en) * | 2019-12-17 | 2022-04-29 | 电子科技大学 | Knowledge distillation-based neural network black box attack type defense method |
CN111832701B (en) * | 2020-06-09 | 2023-09-22 | 北京百度网讯科技有限公司 | Model distillation method, model distillation device, electronic equipment and storage medium |
CN111832701A (en) * | 2020-06-09 | 2020-10-27 | 北京百度网讯科技有限公司 | Model distillation method, device, electronic equipment and storage medium |
CN112561076A (en) * | 2020-12-10 | 2021-03-26 | 支付宝(杭州)信息技术有限公司 | Model processing method and device |
CN112820313B (en) * | 2020-12-31 | 2022-11-01 | 北京声智科技有限公司 | Model training method, voice separation method and device and electronic equipment |
CN112820313A (en) * | 2020-12-31 | 2021-05-18 | 北京声智科技有限公司 | Model training method, voice separation method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107977707B (en) | 2020-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107977707A (en) | A kind of method and computing device for resisting distillation neural network model | |
US11373087B2 (en) | Method and apparatus for generating fixed-point type neural network | |
CN108334499A (en) | A kind of text label tagging equipment, method and computing device | |
US20210004663A1 (en) | Neural network device and method of quantizing parameters of neural network | |
US20220101090A1 (en) | Neural Architecture Search with Factorized Hierarchical Search Space | |
CN109522942B (en) | Image classification method and device, terminal equipment and storage medium | |
US20200097828A1 (en) | Processing method and accelerating device | |
CN110825884B (en) | Embedded representation processing method and device based on artificial intelligence and electronic equipment | |
CN106780512A (en) | The method of segmentation figure picture, using and computing device | |
US11887005B2 (en) | Content adaptive attention model for neural network-based image and video encoders | |
US20160283842A1 (en) | Neural network and method of neural network training | |
CN107977665A (en) | The recognition methods of key message and computing device in a kind of invoice | |
CN106295521A (en) | A kind of gender identification method based on multi output convolutional neural networks, device and the equipment of calculating | |
Xia et al. | Fully dynamic inference with deep neural networks | |
CN112613581A (en) | Image recognition method, system, computer equipment and storage medium | |
US10657439B2 (en) | Processing method and device, operation method and device | |
CN112418292B (en) | Image quality evaluation method, device, computer equipment and storage medium | |
CN107832794A (en) | A kind of convolutional neural networks generation method, the recognition methods of car system and computing device | |
CN116188878A (en) | Image classification method, device and storage medium based on neural network structure fine adjustment | |
CN111275033A (en) | Character recognition method and device, electronic equipment and storage medium | |
CN114742210A (en) | Hybrid neural network training method, traffic flow prediction method, apparatus, and medium | |
CN106503386A (en) | The good and bad method and device of assessment luminous power prediction algorithm performance | |
CN116342420A (en) | Method and system for enhancing mixed degraded image | |
CN116269312A (en) | Individual brain map drawing method and device based on brain map fusion model | |
He et al. | Rank-based greedy model averaging for high-dimensional survival data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |