CN109165738A - Neural network model optimization method and device, electronic device and storage medium - Google Patents

Neural network model optimization method and device, electronic device and storage medium

Info

Publication number
CN109165738A
Authority
CN
China
Prior art keywords
model
output
fully connected layer
student model
student
Prior art date
Legal status
Granted
Application number
CN201811093361.XA
Other languages
Chinese (zh)
Other versions
CN109165738B (en)
Inventor
罗棕太
张学森
伊帅
闫俊杰
王晓刚
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201811093361.XA priority Critical patent/CN109165738B/en
Publication of CN109165738A publication Critical patent/CN109165738A/en
Application granted granted Critical
Publication of CN109165738B publication Critical patent/CN109165738B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The present disclosure relates to a neural network model optimization method and device, an electronic device, and a storage medium. The method comprises: selecting some neurons from a fully connected layer before the output layer of a student model; inputting input data into the student model and a teacher model respectively, and obtaining the output of the student model based on the neurons selected from the fully connected layer before the output layer of the student model; obtaining the output of the teacher model based on all neurons in the fully connected layer before the output layer of the teacher model; and optimizing the student model based on the output of the student model and the output of the teacher model. Embodiments of the present disclosure can stably improve the accuracy of a neural network model without increasing the total amount of training data and without retraining, and can enhance the generalization ability of the neural network model.

Description

Neural network model optimization method and device, electronic device and storage medium
Technical field
The present disclosure relates to the field of deep learning technology, and in particular to a neural network model optimization method and device, an electronic device, and a storage medium.
Background technique
Currently, neural network models are applied in many areas such as computer vision and natural language processing (for example, pedestrian retrieval and face recognition), and have achieved good results. After training of a neural network model is completed, the accuracy measured in testing often sets the upper limit of the accuracy the model can reach.
In the related art, the accuracy of a neural network model is usually improved by increasing the total amount of training data, changing the structure of the model, or making fine adjustments to the model and retraining it. All of these methods place very high demands on the amount of data. Increasing the total amount of training data or making fine adjustments does not necessarily improve accuracy, possibly because uneven data quality increases the difficulty of training the model. Changing the structure of a neural network model usually follows no specific standard, so the accuracy of the trained model cannot be guaranteed.
Summary of the invention
The present disclosure proposes a technical solution for optimizing neural network models.
According to one aspect of the present disclosure, a neural network model optimization method is provided, comprising:
selecting some neurons from a fully connected layer before the output layer of a student model;
inputting input data into the student model and a teacher model respectively, and obtaining the output of the student model based on the neurons selected from the fully connected layer before the output layer of the student model;
obtaining the output of the teacher model based on all neurons in the fully connected layer before the output layer of the teacher model; and
optimizing the student model based on the output of the student model and the output of the teacher model.
In one possible implementation, selecting some neurons from the fully connected layer before the output layer of the student model comprises:
selecting some neurons from the last fully connected layer before the output layer of the student model.
In one possible implementation, selecting some neurons from the fully connected layer before the output layer of the student model comprises:
for each neuron in a first fully connected layer, generating a random number in a first interval, where the first fully connected layer is the fully connected layer before the output layer of the student model in which neurons are selected;
if the random number corresponding to a first neuron in the first fully connected layer belongs to a second interval, selecting the first neuron from the first fully connected layer, where the second interval is a subset of the first interval and the second interval is not equal to the first interval.
In one possible implementation, optimizing the student model based on the output of the student model and the output of the teacher model comprises:
determining the mean squared error between the output of the student model and the output of the teacher model;
obtaining a first loss function based on the mean squared error;
optimizing the student model using the first loss function.
In one possible implementation, optimizing the student model based on the output of the student model and the output of the teacher model comprises:
determining the relative entropy of the output of the student model with respect to the output of the teacher model;
obtaining a second loss function based on the relative entropy;
optimizing the student model using the second loss function.
In one possible implementation, obtaining the second loss function based on the relative entropy comprises:
determining the adjustment coefficient corresponding to the relative entropy;
determining the product of the relative entropy and the adjustment coefficient as the second loss function.
In one possible implementation, the output is the logits output by a logits layer.
According to one aspect of the present disclosure, a neural network model optimization device is provided, comprising:
a selection module, configured to select some neurons from a fully connected layer before the output layer of a student model;
a first determination module, configured to input input data into the student model and a teacher model respectively, and to obtain the output of the student model based on the neurons selected from the fully connected layer before the output layer of the student model;
a second determination module, configured to obtain the output of the teacher model based on all neurons in the fully connected layer before the output layer of the teacher model;
an optimization module, configured to optimize the student model based on the output of the student model and the output of the teacher model.
In one possible implementation, the selection module is configured to:
select some neurons from the last fully connected layer before the output layer of the student model.
In one possible implementation, the selection module includes:
a generation submodule, configured to generate, for each neuron in the first fully connected layer, a random number in a first interval, where the first fully connected layer is the fully connected layer before the output layer of the student model in which neurons are selected;
a selection submodule, configured to select a first neuron from the first fully connected layer if the random number corresponding to the first neuron belongs to a second interval, where the second interval is a subset of the first interval and the second interval is not equal to the first interval.
In one possible implementation, the optimization module includes:
a first determination submodule, configured to determine the mean squared error between the output of the student model and the output of the teacher model;
a second determination submodule, configured to obtain a first loss function based on the mean squared error;
a first optimization submodule, configured to optimize the student model using the first loss function.
In one possible implementation, the optimization module includes:
a third determination submodule, configured to determine the relative entropy of the output of the student model with respect to the output of the teacher model;
a fourth determination submodule, configured to obtain a second loss function based on the relative entropy;
a second optimization submodule, configured to optimize the student model using the second loss function.
In one possible implementation, the fourth determination submodule includes:
a first determination unit, configured to determine the adjustment coefficient corresponding to the relative entropy;
a second determination unit, configured to determine the product of the relative entropy and the adjustment coefficient as the second loss function.
In one possible implementation, the output is the logits output by a logits layer.
According to one aspect of the present disclosure, an electronic device is provided, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the above neural network model optimization method.
According to one aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored; the computer program instructions, when executed by a processor, implement the above neural network model optimization method.
In the embodiments of the present disclosure, the output of the student model is obtained based on the neurons selected from the fully connected layer before the output layer of the student model, the output of the teacher model is obtained based on all neurons in the fully connected layer before the output layer of the teacher model, and the student model is optimized based on the output of the student model and the output of the teacher model. This makes it possible to stably improve the accuracy of a neural network model without increasing the total amount of training data and without retraining, and the approach is generally applicable to most neural network models and data. Moreover, because the output of the student model is fitted using only some of the neurons of the fully connected layer before its output layer, overfitting of the neural network model can be effectively mitigated and its generalization ability enhanced.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Detailed description of the invention
The accompanying drawings, which are incorporated into and form part of this specification, show embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.
Fig. 1 shows a flowchart of a neural network model optimization method according to an embodiment of the present disclosure.
Fig. 2 shows an exemplary flowchart of step S11 of the neural network model optimization method according to an embodiment of the present disclosure.
Fig. 3 shows an exemplary flowchart of step S14 of the neural network model optimization method according to an embodiment of the present disclosure.
Fig. 4 shows another exemplary flowchart of step S14 of the neural network model optimization method according to an embodiment of the present disclosure.
Fig. 5 shows an exemplary flowchart of step S145 of the neural network model optimization method according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of a neural network model optimization device according to an embodiment of the present disclosure.
Fig. 7 shows an exemplary block diagram of the neural network model optimization device according to an embodiment of the present disclosure.
Fig. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment.
Fig. 9 is a block diagram of an electronic device 1900 according to an exemplary embodiment.
Specific embodiment
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. Identical reference signs in the drawings indicate elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically noted.
The word "exemplary" here means "serving as an example, embodiment, or illustration". Any embodiment described here as "exemplary" should not be construed as preferable to or better than other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, "at least one of A, B, and C" may mean any one or more elements selected from the set consisting of A, B, and C.
In addition, numerous specific details are given in the following detailed description in order to better explain the present disclosure. Those skilled in the art will understand that the present disclosure can be implemented without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, in order to highlight the gist of the present disclosure.
Fig. 1 shows a flowchart of a neural network model optimization method according to an embodiment of the present disclosure. The method may be executed by a neural network model optimization device; for example, it may be executed by a terminal device, a server, or another processing device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. The neural network model in the embodiments of the present disclosure can be applied in fields such as intelligent video analysis, security monitoring, and face recognition. In some possible implementations, the method may be implemented by a processor calling computer-readable instructions stored in a memory. As shown in Fig. 1, the method comprises steps S11 to S14.
In step S11, some neurons are selected from a fully connected layer before the output layer of a student model.
In the embodiments of the present disclosure, the student model and the teacher model are both neural network models. The student model and the teacher model may be identical neural network models or different neural network models. The student model may be used for tasks such as feature extraction and/or object recognition, where object recognition may be person recognition or thing recognition; for example, person recognition may be face recognition or pedestrian re-identification, and thing recognition may be vehicle recognition. The teacher model may likewise be used for tasks such as feature extraction and/or object recognition.
In one possible implementation, some neurons may be selected at random from the fully connected layer before the output layer of the student model.
In one possible implementation, selecting some neurons from the fully connected layer before the output layer of the student model comprises: selecting some neurons from the last fully connected layer before the output layer of the student model.
As an example of this implementation, some neurons may be selected at random from the last fully connected layer before the output layer of the student model.
In one possible implementation, selecting some neurons from the fully connected layer before the output layer of the student model comprises: selecting at least N neurons from a first fully connected layer, where the first fully connected layer is the fully connected layer before the output layer of the student model in which neurons are selected, the number of selected neurons is smaller than the total number of neurons in the first fully connected layer, N equals the product of the total number of neurons in the first fully connected layer and a first threshold, and the first threshold is greater than 0 and less than 1. For example, if the fully connected layer in which neurons are selected is the last fully connected layer before the output layer of the student model, then the first fully connected layer is that last fully connected layer. For example, the first threshold equals 0.7.
In this implementation, selecting at least N neurons from the first fully connected layer avoids selecting too few neurons, which could greatly increase the training difficulty of the student model and/or prevent the student model from converging.
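As an illustration only (the disclosure itself contains no code), the at-least-N constraint described above might be sketched as follows in Python; the function name, the use of the random module, and the uniform choice of subset size are assumptions, not part of the disclosure:

```python
import random

def select_at_least_n(num_neurons: int, first_threshold: float = 0.7) -> list:
    """Select a strict subset of neuron indices of size at least N, where
    N is the product of the layer's neuron count and the first threshold."""
    n = int(num_neurons * first_threshold)      # the lower bound N
    k = random.randint(n, num_neurons - 1)      # at least N, fewer than all
    return sorted(random.sample(range(num_neurons), k))
```

For a first fully connected layer with 1024 neurons and a first threshold of 0.7, this returns between 716 and 1023 indices.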
In step S12, the input data are input into the student model and the teacher model respectively, and the output of the student model is obtained based on the neurons selected from the fully connected layer before the output layer of the student model.
In the embodiments of the present disclosure, the input data may be training data. For example, when the student model is trained on a pedestrian re-identification task, the input data may be images of pedestrians.
In one possible implementation, the output is the logits output by a logits layer.
In another possible implementation, the output is the final output of the neural network model; for example, the output of a softmax layer.
In the embodiments of the present disclosure, selecting some neurons from the fully connected layer before the output layer of the student model, and obtaining the output of the student model based on these selected neurons, reduces the degree to which the student model fits the training data and avoids overfitting, thereby enhancing the generalization ability of the student model.
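A minimal sketch of how the selected neurons could determine the student model's output, assuming a PyTorch-style head in which the unselected neurons of the last fully connected layer are zeroed out before the logits layer; all class and variable names are illustrative:

```python
import torch
import torch.nn as nn

class StudentHead(nn.Module):
    """Last fully connected layer plus logits layer of a hypothetical student model."""
    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, hidden_dim)        # first fully connected layer
        self.logits_layer = nn.Linear(hidden_dim, num_classes)

    def forward(self, features: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        hidden = torch.relu(self.fc(features))
        hidden = hidden * mask                         # keep only the selected neurons
        return self.logits_layer(hidden)               # the logits output
```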
In step S13, the output of the teacher model is obtained based on all neurons in the fully connected layer before the output layer of the teacher model.
In the embodiments of the present disclosure, no selection is performed on the neurons of the teacher model's fully connected layer; the output of the teacher model is obtained based on all of its neurons.
In step S14, the student model is optimized based on the output of the student model and the output of the teacher model.
In one possible implementation, a loss function corresponding to the student model may be determined based on the output of the student model and the output of the teacher model, and the student model may be optimized using that loss function. In this implementation, the student model may be optimized using its loss function and the back-propagation algorithm.
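One way this optimization step could look in code, again as a PyTorch-style sketch rather than the patent's own implementation; the teacher's forward pass runs without gradients because only the student is optimized:

```python
import torch

def train_step(student, teacher, optimizer, input_data, mask, loss_fn):
    """One optimization step; student, teacher, and loss_fn are assumed defined."""
    with torch.no_grad():
        teacher_out = teacher(input_data)        # all neurons of the teacher model
    student_out = student(input_data, mask)      # only the selected neurons
    loss = loss_fn(student_out, teacher_out)     # the student model's loss function
    optimizer.zero_grad()
    loss.backward()                              # back-propagation algorithm
    optimizer.step()
    return loss.item()
```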
In the embodiments of the present disclosure, the output of the student model is obtained based on the neurons selected from the fully connected layer before the output layer of the student model, the output of the teacher model is obtained based on all neurons in the fully connected layer before the output layer of the teacher model, and the student model is optimized based on the output of the student model and the output of the teacher model. This makes it possible to stably improve the accuracy of a neural network model without increasing the total amount of training data and without retraining, and the approach is generally applicable to most neural network models and data. Moreover, because the output of the student model is fitted using only some of the neurons of the fully connected layer before its output layer, overfitting of the neural network model can be effectively mitigated and its generalization ability enhanced.
Fig. 2 shows an exemplary flowchart of step S11 of the neural network model optimization method according to an embodiment of the present disclosure. As shown in Fig. 2, step S11 may include steps S111 and S112.
In step S111, for each neuron in the first fully connected layer, a random number is generated in a first interval, where the first fully connected layer is the fully connected layer before the output layer of the student model in which neurons are selected.
For example, the first interval is [0, 1].
In step S112, if the random number corresponding to a first neuron in the first fully connected layer belongs to a second interval, the first neuron is selected from the first fully connected layer, where the second interval is a subset of the first interval and the second interval is not equal to the first interval.
For example, the second interval is [0, 0.7].
In the embodiments of the present disclosure, each neuron in the first fully connected layer is selected or not according to whether its corresponding random number belongs to the second interval: if the random number corresponding to a neuron belongs to the second interval, the neuron is selected; otherwise it is not. Generating a random number in the first interval for each neuron and selecting exactly those neurons whose random numbers fall in the second interval achieves the effect of randomly selecting some of the neurons in the first fully connected layer.
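A sketch of this interval-based selection, assuming the example intervals given above; note that torch.rand draws from [0, 1), which serves as the first interval here:

```python
import torch

def interval_mask(num_neurons: int, second_interval_end: float = 0.7) -> torch.Tensor:
    """One random number per neuron in the first interval; a neuron is selected
    iff its number falls in the second interval [0, second_interval_end]."""
    r = torch.rand(num_neurons)
    return (r <= second_interval_end).float()   # 1.0 = selected, 0.0 = not selected
```

The resulting mask can then be applied to the activations of the first fully connected layer, as in the forward-pass sketch shown earlier.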
Fig. 3 shows an exemplary flowchart of step S14 of the neural network model optimization method according to an embodiment of the present disclosure. As shown in Fig. 3, step S14 may include steps S141 to S143.
In step S141, the mean squared error between the output of the student model and the output of the teacher model is determined.
In the embodiments of the present disclosure, the mean squared error between the output of the student model and the output of the teacher model may be determined using methods from the related art, which are not detailed here.
In step S142, a first loss function is obtained based on the mean squared error.
In one possible implementation, the mean squared error may be used as the first loss function.
In another possible implementation, the product of the mean squared error and a first coefficient may be used as the first loss function.
It should be noted that although ways of obtaining the first loss function have been described through the implementations above, those skilled in the art will understand that the present disclosure is not limited thereto. Those skilled in the art may flexibly set the specific way of obtaining the first loss function according to the requirements of the practical application scenario and/or personal preference, as long as it is obtained based on the mean squared error.
In step S143, the student model is optimized using the first loss function.
In one possible implementation, the student model may be optimized using the first loss function and a first learning rate.
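A possible reading of the first loss function in code, with the optional first coefficient and the first learning rate shown as illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def first_loss(student_logits, teacher_logits, first_coefficient: float = 1.0):
    """Mean squared error between the two outputs, optionally scaled."""
    return first_coefficient * F.mse_loss(student_logits, teacher_logits)

# e.g. optimizer = torch.optim.SGD(student.parameters(), lr=first_learning_rate)
```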
Fig. 4 shows another exemplary flowchart of step S14 of the neural network model optimization method according to an embodiment of the present disclosure. As shown in Fig. 4, step S14 may include steps S144 to S146.
In step S144, the relative entropy of the output of the student model with respect to the output of the teacher model is determined.
In the embodiments of the present disclosure, the relative entropy of the output of the student model with respect to the output of the teacher model may be determined using methods from the related art, which are not detailed here.
In step S145, a second loss function is obtained based on the relative entropy.
In one possible implementation, the product of the relative entropy and an adjustment coefficient may be used as the second loss function.
In another possible implementation, the relative entropy may be used as the second loss function.
It should be noted that although ways of obtaining the second loss function have been described through the implementations above, those skilled in the art will understand that the present disclosure is not limited thereto. Those skilled in the art may flexibly set the specific way of obtaining the second loss function according to the requirements of the practical application scenario and/or personal preference, as long as it is obtained based on the relative entropy.
In step S146, the student model is optimized using the second loss function.
In one possible implementation, the student model may be optimized using the second loss function and a second learning rate, where the second learning rate is smaller than the first learning rate.
Fig. 5 shows an exemplary flowchart of step S145 of the neural network model optimization method according to an embodiment of the present disclosure. As shown in Fig. 5, step S145 may include steps S1451 and S1452.
In step S1451, the adjustment coefficient corresponding to the relative entropy is determined.
In one possible implementation, the adjustment coefficient corresponding to the relative entropy is greater than 1.
In step S1452, the product of the relative entropy and the adjustment coefficient is determined as the second loss function.
Determining the product of the relative entropy and the adjustment coefficient as the second loss function can prevent training of the student model from failing because of excessively large gradients during back-propagation.
It should be noted that although obtaining the second loss function based on the relative entropy has been described with the product of the relative entropy and the adjustment coefficient as an example, those skilled in the art will understand that the present disclosure is not limited thereto. Those skilled in the art may flexibly set the specific way of obtaining the second loss function based on the relative entropy according to the requirements of the practical application scenario and/or personal preference. For example, the ratio of the relative entropy to an adjustment coefficient may be determined as the second loss function, where that adjustment coefficient is less than 1.
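The second loss function might be sketched as follows, reading the relative entropy as the KL divergence between the softmax distributions of the two logits vectors; the direction of the divergence, the use of softmax, and the coefficient value are assumptions rather than details given in the disclosure:

```python
import torch.nn.functional as F

def second_loss(student_logits, teacher_logits, adjustment_coefficient: float = 2.0):
    # F.kl_div(input, target) computes KL(target || input); taking the teacher's
    # distribution as the target is a common choice in knowledge distillation.
    log_p_student = F.log_softmax(student_logits, dim=1)
    p_teacher = F.softmax(teacher_logits, dim=1)
    kl = F.kl_div(log_p_student, p_teacher, reduction="batchmean")  # relative entropy
    return adjustment_coefficient * kl      # the product is the second loss function
```

As described above, this loss would be paired with a second learning rate smaller than the first.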
In the embodiments of the present disclosure, the parameters of the teacher model are fixed, while the parameters of the student model are not fixed; that is, the student model remains in an optimizable state, and its parameters are optimized through training. The hyperparameters of the student model can be adjusted manually during training. After training, the student model can be applied to tasks other than the training task.
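In a PyTorch-style setup, keeping the teacher fixed while the student stays optimizable could look like this, where teacher, student, and the learning-rate value are assumed to be defined elsewhere:

```python
import torch

first_learning_rate = 0.01            # illustrative value, not from the disclosure

for p in teacher.parameters():
    p.requires_grad = False           # the teacher model's parameters are fixed
teacher.eval()

# only the student model's parameters remain in an optimizable state
optimizer = torch.optim.SGD(student.parameters(), lr=first_learning_rate)
```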
In the embodiments of the present disclosure, the student model and the teacher model may be identical; in other words, the embodiments of the present disclosure do not require a model of much higher capability to supervise the learning. The accuracy of the student model can be improved through knowledge distillation, and the improvement is stable.
The embodiments of the present disclosure require no additional data, place low demands on computation and computing time, and can quickly and effectively improve the performance of the student model even when computing resources are scarce.
It can be understood that the method embodiments mentioned in the present disclosure can be combined with one another, without violating their principles and logic, to form combined embodiments; owing to space limitations, these are not described again in the present disclosure.
Those skilled in the art will understand that, in the above methods of the specific embodiments, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
In addition, the present disclosure also provides a neural network model optimization device, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the neural network model optimization methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which are not repeated here.
Fig. 6 shows a block diagram of a neural network model optimization device according to an embodiment of the present disclosure. As shown in Fig. 6, the neural network model optimization device includes: a selection module 61, configured to select some neurons from a fully connected layer before the output layer of a student model; a first determination module 62, configured to input input data into the student model and a teacher model respectively, and to obtain the output of the student model based on the neurons selected from the fully connected layer before the output layer of the student model; a second determination module 63, configured to obtain the output of the teacher model based on all neurons in the fully connected layer before the output layer of the teacher model; and an optimization module 64, configured to optimize the student model based on the output of the student model and the output of the teacher model.
In one possible implementation, the selection module 61 is configured to select some neurons from the last fully connected layer before the output layer of the student model.
Fig. 7 shows an exemplary block diagram of the neural network model optimization device according to an embodiment of the present disclosure. As shown in Fig. 7:
In one possible implementation, the selection module 61 includes: a generation submodule 611, configured to generate, for each neuron in the first fully connected layer, a random number in a first interval, where the first fully connected layer is the fully connected layer before the output layer of the student model in which neurons are selected; and a selection submodule 612, configured to select a first neuron from the first fully connected layer if the random number corresponding to the first neuron belongs to a second interval, where the second interval is a subset of the first interval and the second interval is not equal to the first interval.
In one possible implementation, the optimization module 64 includes: a first determination submodule 641, configured to determine the mean squared error between the output of the student model and the output of the teacher model; a second determination submodule 642, configured to obtain a first loss function based on the mean squared error; and a first optimization submodule 643, configured to optimize the student model using the first loss function.
In one possible implementation, the optimization module 64 includes: a third determination submodule 644, configured to determine the relative entropy of the output of the student model with respect to the output of the teacher model; a fourth determination submodule 645, configured to obtain a second loss function based on the relative entropy; and a second optimization submodule 646, configured to optimize the student model using the second loss function.
In one possible implementation, the fourth determination submodule 645 includes: a first determination unit, configured to determine the adjustment coefficient corresponding to the relative entropy; and a second determination unit, configured to determine the product of the relative entropy and the adjustment coefficient as the second loss function.
In one possible implementation, the output is the logits output by a logits layer.
In the embodiments of the present disclosure, the output of the student model is obtained based on the neurons selected from the fully connected layer before the output layer of the student model, the output of the teacher model is obtained based on all neurons in the fully connected layer before the output layer of the teacher model, and the student model is optimized based on the output of the student model and the output of the teacher model. This makes it possible to stably improve the accuracy of a neural network model without increasing the total amount of training data and without retraining, and the approach is generally applicable to most neural network models and data. Moreover, because the output of the student model is fitted using only some of the neurons of the fully connected layer before its output layer, overfitting of the neural network model can be effectively mitigated and its generalization ability enhanced.
In some embodiments, the functions or modules of the device provided in the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above; for specific implementations, refer to the descriptions of those method embodiments, which, for brevity, are not repeated here.
The embodiments of the present disclosure also propose a computer-readable storage medium on which computer program instructions are stored; the computer program instructions, when executed by a processor, implement the above method. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
The embodiments of the present disclosure also propose an electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the above method.
The electronic device may be provided as a terminal, a server, or a device in another form.
Fig. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to Fig. 8, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation on the electronic device 800. Examples of such data include instructions for any application or method operated on the electronic device 800, contact data, phone book data, messages, pictures, video, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 806 provides power for the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a photographing mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC); when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, and so on. These buttons may include, but are not limited to: a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing state assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the on/off state of the electronic device 800 and the relative positioning of components (for example, the display and keypad of the electronic device 800); it can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in its temperature. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions; the above computer program instructions can be executed by the processor 820 of the electronic device 800 to complete the above method.
Fig. 9 is a block diagram of an electronic device 1900 according to an exemplary embodiment. For example, the electronic device 1900 may be provided as a server. Referring to Fig. 9, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as applications. An application stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute instructions so as to execute the above method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, or similar.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions; the above computer program instructions can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can retain and store instructions used by an instruction-execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through fiber-optic cables), or electrical signals transmitted through a wire.
Computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical-fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembly instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing state information of the computer-readable program instructions; the electronic circuit can execute the computer-readable program instructions so as to implement various aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data-processing apparatus to produce a machine, such that these instructions, when executed by the processor of the computer or other programmable data-processing apparatus, produce an apparatus implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data-processing apparatus, and/or other devices to work in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data-processing apparatus, or another device, so that a series of operational steps are executed on the computer, the other programmable data-processing apparatus, or the other device to produce a computer-implemented process, such that the instructions executed on the computer, the other programmable data-processing apparatus, or the other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of instructions that contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two successive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that executes the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present disclosure have been described above; the above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein has been chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A neural network model optimization method, characterized by comprising:
selecting some neurons from a fully connected layer before the output layer of a student model;
inputting input data into the student model and a teacher model respectively, and obtaining the output of the student model based on the neurons selected from the fully connected layer before the output layer of the student model;
obtaining the output of the teacher model based on all neurons in the fully connected layer before the output layer of the teacher model; and
optimizing the student model based on the output of the student model and the output of the teacher model.
2. The method according to claim 1, characterized in that selecting some neurons from the fully connected layer before the output layer of the student model comprises:
selecting some neurons from the last fully connected layer before the output layer of the student model.
3. The method according to claim 1 or 2, characterized in that selecting some neurons from the fully connected layer before the output layer of the student model comprises:
for each neuron in a first fully connected layer, generating a random number in a first interval, where the first fully connected layer is the fully connected layer before the output layer of the student model in which neurons are selected; and
if the random number corresponding to a first neuron in the first fully connected layer belongs to a second interval, selecting the first neuron from the first fully connected layer, where the second interval is a subset of the first interval and the second interval is not equal to the first interval.
4. The method according to claim 1, characterized in that the output is the logits output by a logits layer.
5. A neural network model optimization device, characterized by comprising:
a selection module, configured to select some neurons from a fully connected layer before the output layer of a student model;
a first determination module, configured to input input data into the student model and a teacher model respectively, and to obtain the output of the student model based on the neurons selected from the fully connected layer before the output layer of the student model;
a second determination module, configured to obtain the output of the teacher model based on all neurons in the fully connected layer before the output layer of the teacher model; and
an optimization module, configured to optimize the student model based on the output of the student model and the output of the teacher model.
6. The device according to claim 5, characterized in that the selection module is configured to:
select some neurons from the last fully connected layer before the output layer of the student model.
7. The device according to claim 5 or 6, characterized in that the selection module comprises:
a generation submodule, configured to generate, for each neuron in a first fully connected layer, a random number in a first interval, where the first fully connected layer is the fully connected layer before the output layer of the student model in which neurons are selected; and
a selection submodule, configured to select a first neuron from the first fully connected layer if the random number corresponding to the first neuron belongs to a second interval, where the second interval is a subset of the first interval and the second interval is not equal to the first interval.
8. The device according to any one of claims 5 to 7, characterized in that the output is the logits output by a logits layer.
9. An electronic device, characterized by comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method according to any one of claims 1 to 4.
10. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 4.
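
Editor's note: the neuron selection in claim 3 amounts to sampling an independent random mask over the fully connected layer. Below is a minimal Python sketch, assuming the first interval is [0, 1) and the second interval is [0, keep_prob); the function name, interval endpoints, and the keep_prob value of 0.7 are illustrative assumptions, not fixed by the claims.

import numpy as np

def select_neurons(num_neurons, keep_prob=0.7, seed=None):
    # Claim 3: draw one random number per neuron from a first interval
    # ([0, 1) here) and keep the neuron when the number falls inside a
    # second interval ([0, keep_prob)), a strict subset of the first.
    rng = np.random.default_rng(seed)
    r = rng.uniform(0.0, 1.0, size=num_neurons)  # first interval: [0, 1)
    return r < keep_prob                         # second interval: [0, keep_prob)

mask = select_neurons(256, keep_prob=0.7, seed=0)
print(mask.sum(), "of 256 neurons selected")

A fresh mask per training step makes the selection behave like dropout applied only on the student's side.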
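Claims 1 and 4 together describe one optimization step: the student's logits are produced with only the selected neurons active, the teacher's logits with all neurons active, and the student is updated to bring the two outputs closer. The following PyTorch sketch illustrates this under stated assumptions: the Head module, the layer sizes, the frozen pre-trained teacher, and the MSE loss on logits are all hypothetical choices, since the claims fix neither the architecture nor the loss function.

import torch
import torch.nn.functional as F

class Head(torch.nn.Module):
    # Hypothetical head: `fc` is the last fully connected layer before
    # the output layer; `logits` is the output (logits) layer of claim 4.
    def __init__(self, d_in, d_hidden, n_classes):
        super().__init__()
        self.fc = torch.nn.Linear(d_in, d_hidden)
        self.logits = torch.nn.Linear(d_hidden, n_classes)

    def forward(self, x, mask=None):
        h = F.relu(self.fc(x))
        if mask is not None:        # student path: zero out unselected neurons
            h = h * mask
        return self.logits(h)

student = Head(128, 256, 10)
teacher = Head(128, 256, 10)        # assumed pre-trained; not updated here
optimizer = torch.optim.SGD(student.parameters(), lr=0.01)

x = torch.randn(32, 128)                   # a batch of input data
mask = (torch.rand(256) < 0.7).float()     # claim-3 style neuron selection

optimizer.zero_grad()
student_logits = student(x, mask=mask)     # student: selected neurons only
with torch.no_grad():
    teacher_logits = teacher(x)            # teacher: all neurons (claim 1)
loss = F.mse_loss(student_logits, teacher_logits)  # assumed matching loss
loss.backward()
optimizer.step()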
CN201811093361.XA 2018-09-19 2018-09-19 Neural network model optimization method and device, electronic device and storage medium Active CN109165738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811093361.XA CN109165738B (en) 2018-09-19 2018-09-19 Neural network model optimization method and device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN109165738A true CN109165738A (en) 2019-01-08
CN109165738B CN109165738B (en) 2021-09-14

Family

ID=64879584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811093361.XA Active CN109165738B (en) 2018-09-19 2018-09-19 Neural network model optimization method and device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN109165738B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030233335A1 (en) * 2002-06-17 2003-12-18 Mims Aj Student neural network
CN106548190A (en) * 2015-09-18 2017-03-29 三星电子株式会社 Model training method and equipment and data identification method
CN108140144A (en) * 2016-03-31 2018-06-08 富士通株式会社 A kind of method, apparatus being trained to neural network model and electronic equipment
CN107341541A (en) * 2016-04-29 2017-11-10 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing full articulamentum neural metwork training
CN106709565A (en) * 2016-11-16 2017-05-24 广州视源电子科技股份有限公司 Optimization method and device for neural network
US20180197425A1 (en) * 2017-01-06 2018-07-12 Washington State University Self-monitoring analysis and reporting technologies
CN106991440A (en) * 2017-03-29 2017-07-28 湖北工业大学 A kind of image classification algorithms of the convolutional neural networks based on spatial pyramid
CN107273976A (en) * 2017-06-29 2017-10-20 广州日滨科技发展有限公司 A kind of optimization method of neutral net, device, computer and storage medium
CN108122003A (en) * 2017-12-19 2018-06-05 西北工业大学 A kind of Weak target recognition methods based on deep neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JUNHO YIM et al.: "A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
LIANG LU et al.: "Knowledge Distillation for Small-footprint Highway Networks", https://arxiv.org/abs/1608.00892 *
YI WEI et al.: "Quantization Mimic: Towards Very Tiny CNN for Object Detection", https://arxiv.org/abs/1805.02152v3 *
ZHAO SHENGWEI et al.: "Traffic Sign Classification Based on Enhanced-Supervision Knowledge Distillation", China Sciencepaper *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009052A (en) * 2019-04-11 2019-07-12 腾讯科技(深圳)有限公司 A kind of method of image recognition, the method and device of image recognition model training
CN110009052B (en) * 2019-04-11 2022-11-18 腾讯科技(深圳)有限公司 Image recognition method, image recognition model training method and device
CN110222705A (en) * 2019-04-23 2019-09-10 华为技术有限公司 A kind of training method and relevant apparatus of network model
CN110222705B (en) * 2019-04-23 2023-10-24 华为技术有限公司 Training method of network model and related device
WO2021047286A1 (en) * 2019-09-12 2021-03-18 华为技术有限公司 Text processing model training method, and text processing method and apparatus
CN110807434A (en) * 2019-11-06 2020-02-18 威海若维信息科技有限公司 Pedestrian re-identification system and method based on combination of human body analysis and coarse and fine particle sizes
CN110807434B (en) * 2019-11-06 2023-08-15 威海若维信息科技有限公司 Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination
CN112825143A (en) * 2019-11-20 2021-05-21 北京眼神智能科技有限公司 Deep convolutional neural network compression method, device, storage medium and equipment
WO2022052997A1 (en) * 2020-09-09 2022-03-17 Huawei Technologies Co.,Ltd. Method and system for training neural network model using knowledge distillation

Also Published As

Publication number Publication date
CN109165738B (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN109165738A (en) Optimization method and device, electronic equipment and the storage medium of neural network model
CN110210535A (en) Neural network training method and device and image processing method and device
CN109800744A (en) Image clustering method and device, electronic equipment and storage medium
CN109800737A (en) Face recognition method and device, electronic equipment and storage medium
CN109919300A (en) Neural network training method and device and image processing method and device
CN109829433A (en) Facial image recognition method, device, electronic equipment and storage medium
CN109614876A (en) Critical point detection method and device, electronic equipment and storage medium
CN110348537A (en) Image processing method and device, electronic equipment and storage medium
CN110516745A (en) Training method, device and the electronic equipment of image recognition model
CN109658352A (en) Optimization method and device, electronic equipment and the storage medium of image information
CN109614613A (en) The descriptive statement localization method and device of image, electronic equipment and storage medium
EP3896587A1 (en) Electronic device for performing user authentication and operation method therefor
CN109543537A (en) Weight identification model increment training method and device, electronic equipment and storage medium
CN109635920A (en) Neural network optimization and device, electronic equipment and storage medium
CN109145970A (en) Question and answer treating method and apparatus, electronic equipment and storage medium based on image
CN109934275A (en) Image processing method and device, electronic equipment and storage medium
CN109858614A (en) Neural network training method and device, electronic equipment and storage medium
CN110458218A (en) Image classification method and device, sorter network training method and device
CN109711546A (en) Neural network training method and device, electronic equipment and storage medium
CN109615006A (en) Character recognition method and device, electronic equipment and storage medium
CN109978891A (en) Image processing method and device, electronic equipment and storage medium
CN109920016A (en) Image generating method and device, electronic equipment and storage medium
CN110245757A (en) A kind of processing method and processing device of image pattern, electronic equipment and storage medium
CN110458102A (en) A kind of facial image recognition method and device, electronic equipment and storage medium
CN109902738A (en) Network module and distribution method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant