CN114219078A - Neural network model interactive training method and device and storage medium - Google Patents

Neural network model interactive training method and device and storage medium

Info

Publication number
CN114219078A
Authority
CN
China
Prior art keywords
neural network
interactive training
network
determining
training
Prior art date
Legal status
Pending
Application number
CN202111545139.0A
Other languages
Chinese (zh)
Inventor
乔少华 (Qiao Shaohua)
Current Assignee
Heading Data Intelligence Co Ltd
Original Assignee
Heading Data Intelligence Co Ltd
Priority date
Filing date
Publication date
Application filed by Heading Data Intelligence Co Ltd filed Critical Heading Data Intelligence Co Ltd
Priority to CN202111545139.0A (2021-12-15)
Publication of CN114219078A (2022-03-22)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Abstract

The invention relates to a neural network model interactive training method and device. The method comprises the following steps: determining a primary neural network and at least one secondary neural network to participate in interactive training; determining an objective function for the interactive training according to the difference between the prediction distributions of the primary and secondary neural networks; and training the primary and secondary neural networks against the objective function until the objective function value reaches a threshold and stabilizes, so as to obtain the trained primary neural network. The method uses KL divergence to measure the difference between the prediction probability distributions of the primary and secondary networks and guides the primary network's learning with the mutual learning experience of the two networks, so that the primary network attains performance similar to, or slightly above, that of the secondary network. This alleviates the slow convergence and tendency to fall into local optima when the primary network is trained alone and, especially when the training sample volume is limited, the weak generalization and low detection rate of the network model.

Description

Neural network model interactive training method and device and storage medium
Technical Field
The invention belongs to the technical field of deep learning, and in particular relates to a neural network model interactive training method, device, and storage medium.
Background
In recent years, deep neural networks have achieved remarkable results in computer vision, natural language processing, intelligent speech recognition, and other fields, and their application scenarios have steadily matured. In more complex environments, however, algorithm models often produce unstable predictions, degrading the application experience. Researchers have found that the main cause is that the model does not fully learn the complex scene information during iterative training, so its predictions are not accurate enough.
To improve network model performance, the current mainstream solution supports iterative model training by collecting large numbers of effective samples for each scene and by methods such as data augmentation. A sufficient sample volume can clearly yield stable performance gains, but it greatly increases training cost: producing training samples is a slow process, currently dominated by manual annotation, and is inefficient. In particular, when a network model is verified and evaluated with limited training samples, the model's output accuracy is often unsatisfactory and its performance indicators suffer, so a network model training strategy is needed to alleviate these problems.
Disclosure of Invention
To address the low accuracy of a neural network when training samples are limited, and its instability when the training environment is complex, a first aspect of the invention provides a neural network model interactive training method, comprising: determining a primary neural network and at least one secondary neural network to participate in interactive training; determining an objective function for the interactive training according to the difference between the prediction distributions of the primary and secondary neural networks; and training the primary and secondary neural networks against the objective function until the objective function value reaches a threshold and stabilizes, so as to obtain the trained primary neural network.
In some embodiments of the present invention, determining an objective function for interactive training according to the distribution difference between the primary and secondary neural networks comprises: determining a supervised loss function of the primary neural network; determining a loss function for the interactive training of the primary neural network with each secondary neural network; and determining the objective function from the supervised loss function and each interactive training loss function.
Further, the objective function for interactive training is determined as follows:

L_01 = α·l_01 + (1 − α)·D,

where L_01 is the overall objective function, l_01 is the supervised loss function of the primary neural network, α is a weighting factor, and D is the loss function of the interactive training between the primary neural network and each secondary network.
Preferably, the loss function of the interactive training of the primary neural network with each secondary network is measured by KL divergence.
Further, the supervised loss function of the primary neural network is the Focal Loss function.
In the above embodiments, determining the primary neural network and at least one secondary neural network to participate in interactive training includes: taking the network model with the highest matching degree as the primary neural network according to the requirements; and taking one or more neural network models with higher performance or stronger generalization capability than the primary neural network as secondary neural networks.
In a second aspect of the present invention, there is provided a neural network model interactive training device, comprising: a first determining module for determining a primary neural network and at least one secondary neural network participating in interactive training; a second determining module for determining an objective function for the interactive training according to the distribution difference between the primary and secondary neural networks; and a training module for training the primary and secondary neural networks according to the objective function until the objective function value reaches a threshold and stabilizes, so as to obtain the trained primary neural network.
In some embodiments of the invention, the second determining module comprises: a first determining unit for determining a supervised loss function of the primary neural network; a second determining unit for determining a loss function for the interactive training of the primary neural network with each secondary neural network; and a third determining unit for determining the objective function from the supervised loss function and each interactive training loss function.
In a third aspect of the present invention, there is provided an electronic device comprising: one or more processors; a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement the neural network model interactive training method provided in the first aspect of the present invention.
In a fourth aspect of the present invention, a computer-readable medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the neural network model interactive training method provided in the first aspect of the present invention.
The invention has the beneficial effects that:
1. The invention provides a neural network interactive training strategy: the primary network model's weights are updated with a supervised loss function, while KL divergence serves as the interactive learning loss that measures the difference between the prediction probability distributions of the primary and secondary networks. By fully exploiting the high-performance secondary network's best estimates and the mutual learning experience of the two networks to guide the primary network's learning, the primary network attains performance similar to, or slightly above, that of the secondary network. The strategy addresses the slow convergence and susceptibility to local optima of training the primary network alone and, especially when the training sample volume is limited, the weak generalization and low detection rate of the network model;
2. The invention mainly uses interactive training between the primary and secondary network models to help the primary network learn sample information effectively. Its main advantage is that the strong secondary network outputs an optimal prediction probability distribution to guide the primary network's iterative training, bringing it toward a minimum point more quickly. With a limited training sample volume, interactive training lets the primary network reach performance similar to, or even slightly above, the secondary network's: because the secondary network's learning and understanding of the data are at a higher level, passing its best estimates to the primary network for update iterations gives the primary network a learning capacity consistent with the secondary network's and improves the model's generalization;
3. The interactive training objective function is weighted: in the initial stage of interactive network training, the secondary network's predictions dominate the interactive training of the whole network and accelerate the primary network's convergence; in the middle and later stages, the primary and secondary networks learn the different knowledge held in each other's models, and the secondary network's good fit to complex scenes is shared as learning experience through the KL divergence, so the two improve jointly and realize the interactive training process. Certainly, more information can be mined; for example, the primary and secondary networks learn the feature spaces of different targets, express the relevance between targets to a certain degree, and show good generalization capability for coping with complex scenes;
4. While the invention focuses on interactive training and learning between a primary and a secondary network model, it can also be extended to interactive training among multiple network models, using n − 1 models to assist the primary network's learning so that the model performs more efficiently in accuracy, generalization, and stability, as in the sketch below.
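As a minimal sketch of this multi-network extension (in PyTorch, which the invention does not mandate; the equal weighting of the n − 1 secondary networks is an assumption, since the combination rule is left unspecified):

    import torch
    import torch.nn.functional as F

    def multi_secondary_kl(primary_logits, secondary_logits_list):
        # Average KL divergence from each frozen secondary network's prediction
        # distribution to the primary network's; equal weights are assumed here.
        log_p_primary = F.log_softmax(primary_logits, dim=1)
        kl_terms = []
        for logits in secondary_logits_list:
            p_secondary = F.softmax(logits, dim=1).detach()  # secondary weights frozen
            kl_terms.append(F.kl_div(log_p_primary, p_secondary, reduction="batchmean"))
        return torch.stack(kl_terms).mean()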
Drawings
FIG. 1 is a schematic basic flow diagram of a neural network model interactive training method in some embodiments of the present invention;
FIG. 2 is a first flowchart illustrating a neural network model interactive training method according to some embodiments of the present invention;
FIG. 3 is a second flowchart illustrating an interactive training method for neural network models according to some embodiments of the present invention;
FIG. 4 is a schematic diagram of a neural network model interactive training device in some embodiments of the present invention;
fig. 5 is a schematic structural diagram of an electronic device in some embodiments of the invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, in a first aspect of the present invention, there is provided a neural network model interactive training method, comprising: S100, determining a primary neural network and at least one secondary neural network to participate in interactive training; S200, determining an objective function for the interactive training according to the distribution difference between the primary and secondary neural networks; S300, training the primary and secondary neural networks according to the objective function until the objective function value reaches a threshold and stabilizes, so as to obtain the trained primary neural network.
Referring to fig. 2, in step S200 of some embodiments of the present invention, determining an objective function for interactive training according to the distribution difference between the primary and secondary neural networks includes: S201, determining a supervised loss function of the primary neural network;
specifically, CNN _ 01 predicts the probability:
Figure BDA0003413756470000051
wherein the content of the first and second substances,
Figure BDA0003413756470000052
for CNN _ 01 pair input data xiAnd (5) performing prediction and softmax calculation output.
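As a minimal sketch of this prediction step (in PyTorch, which the patent does not mandate):

    import torch
    import torch.nn.functional as F

    def predict_probability(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
        # Forward inference followed by softmax, as in the equation above.
        logits = model(x)                # raw class scores f_01(x_i)
        return F.softmax(logits, dim=1)  # predicted probability distribution P_01(x_i)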
In particular, so that the model pays more attention to hard samples during iterative training over both hard and easily classified samples, the Focal Loss is adopted to define the loss function:

l_01 = −(1 − p̂_t)^γ · log(p̂_t),

where p is the true value and p̂ is the predicted probability, with p̂_t = p̂ when p = 1 and p̂_t = 1 − p̂ otherwise, and γ ≥ 0 is the focusing parameter that down-weights easy samples.
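A sketch of this supervised loss in PyTorch follows; the default γ = 2 is an assumption taken from the Focal Loss literature, since the description does not fix it:

    import torch
    import torch.nn.functional as F

    def focal_loss(logits: torch.Tensor, target: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
        # Multi-class Focal Loss: -(1 - p_t)^gamma * log(p_t), averaged over the batch.
        log_probs = F.log_softmax(logits, dim=1)
        log_p_t = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)  # log-prob of true class
        p_t = log_p_t.exp()
        return (-(1.0 - p_t) ** gamma * log_p_t).mean()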
s202, determining a loss function of interactive training of the main neural network and each secondary neural network;
specifically, the interaction objective function measures the difference value of the prediction probability distribution of the primary network and the secondary network by adopting KL divergence:
Figure BDA0003413756470000056
wherein D isKLIn order to obtain a KL divergence, the dispersion,
Figure BDA0003413756470000057
the prediction probability of CNN _ 02 is obtained, m and n respectively represent the training sample indexes of the primary neural network and the secondary neural network, and KL divergence is used for calculating the difference between the prediction probability distribution of the primary network and the secondary network. The high precision of the secondary network is benefited at the initial training stage, the primary network is biased to the learning experience of the secondary network during iterative training, the convergence of the model is accelerated, and the generalization performance of the model is improved to a certain extent.
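In PyTorch terms, this interactive loss might be sketched as follows (assuming classification logits and a frozen secondary network):

    import torch.nn.functional as F

    def interactive_kl_loss(primary_logits, secondary_logits):
        # D_KL(P_02 || P_01): divergence of the primary network's distribution
        # from the frozen secondary network's, per the equation above.
        log_p01 = F.log_softmax(primary_logits, dim=1)
        p02 = F.softmax(secondary_logits, dim=1).detach()  # no gradient to CNN_02
        return F.kl_div(log_p01, p02, reduction="batchmean")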
S203, determining the objective function for interactive training from the supervised loss function and each interactive training loss function.
Further, the objective function for interactive training is the weighted sum of the primary network's supervised loss function and the interactive training loss function:

L_01 = α·l_01 + (1 − α)·D_KL,

where α is a weighting factor. Because the secondary network is a pretrained weight model with higher prediction accuracy, its loss term is assigned the smaller weight: α is initially set to 0.85, and the α value is gradually reduced in the later training period according to the results of the network model on the validation set. The weighting factor can also be processed adaptively, constrained by the primary network's training and learning progress, such as the number of epochs, the accuracy, and the recall rate. The two loss functions not only let the primary network learn different types of information; the secondary network's prediction probabilities also promote the primary network's generalization, so that a gentle minimum extreme point is reached more easily. Optionally, besides KL divergence, the interactive training loss function D_KL may use the MSE (mean squared error) loss, the SVM hinge loss, the cross-entropy loss, the Smooth L1 loss commonly used in object detection, or any other loss function that measures the difference between two probability distributions.
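Reusing the focal_loss and interactive_kl_loss sketches above, the weighted objective and an illustrative α schedule might look like this; only the initial value α = 0.85 and the fact that α decreases with validation results come from the description, so the decay rule below is an assumption:

    def objective(primary_logits, secondary_logits, target, alpha: float = 0.85):
        # L_01 = alpha * l_01 + (1 - alpha) * D_KL, with alpha initially 0.85.
        supervised = focal_loss(primary_logits, target)
        interactive = interactive_kl_loss(primary_logits, secondary_logits)
        return alpha * supervised + (1.0 - alpha) * interactive

    def update_alpha(alpha: float, val_accuracy: float) -> float:
        # Illustrative adaptive rule only: decay alpha once validation accuracy
        # is high; the patent leaves the exact schedule to the practitioner.
        return max(0.5, alpha - 0.05) if val_accuracy > 0.90 else alpha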
In step S100 of the above embodiment, determining the primary neural network and at least one secondary neural network to participate in interactive training includes: S101, taking the network model that best matches the requirements as the primary neural network; and S102, taking one or more neural network models with higher performance or stronger generalization capability than the primary neural network as secondary neural networks.
Specifically, the hardware and software environment of the model deployment platform is evaluated, and the network model with the highest degree of match to the requirements is selected as the primary network CNN_01; lightweight backbone networks such as VGG, ResNet, ShuffleNet, and MobileNet are recommended. The selection criteria for the secondary network model CNN_02 are high performance, higher network structural complexity, and good generalization capability across many different scenes. Likewise, when the interactive training method is applied to natural language processing tasks, the primary network CNN_01 may be an RNN model such as an LSTM or GRU, and the secondary network model CNN_02 may be GPT-3 or a BERT-series model. It will be appreciated that, following developments in ensemble learning and federated learning, the secondary network model may also be realized as an organic combination or fusion of multiple neural network models, as in the pairing sketch below.
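A minimal pairing sketch using torchvision (the specific model choices and the 10-class head are illustrative assumptions, not mandated by the invention):

    import torchvision.models as models

    # Lightweight primary network CNN_01, trained from scratch.
    primary = models.mobilenet_v3_small(weights=None, num_classes=10)

    # Heavier pretrained secondary network CNN_02; its classifier head would
    # still need adapting to the task, so this is a sketch only.
    secondary = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    for p in secondary.parameters():
        p.requires_grad = False  # S302: freeze the secondary network's weights
    secondary.eval()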
Referring to fig. 3, in step S300 of the above embodiment, the interactive training process of the neural network includes: S301: input the network model training dataset x_i with the corresponding labels y, and set the input size h × w and the learning rate γ; S302: randomly initialize the primary network's initial weights and freeze the secondary network's weights; S303: start iterative training; at t = 0, randomly draw x_i from the dataset and feed it into the networks for forward inference to obtain the probability estimates P_01 and P_02 of the primary and secondary networks respectively, and judge whether the objective function meets the set threshold; S304: during the primary network's backpropagation, compute the gradients of each layer and update the weights and biases accordingly:

w ← w − γ·∂L_01/∂w,
b ← b − γ·∂L_01/∂b,

where w is the corresponding convolution kernel weight and b is the per-layer network bias; S305: repeat S303 and S304 until the objective function error meets the set threshold, then output and save the network's training weights. It can be understood that steps S301 to S303 set the initial training parameters according to the primary network model's backbone, and S304 to S305 are the stage of adjusting the training parameters according to the prediction results on the validation set. A sketch of this loop follows.
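Pulling the pieces together, steps S301 to S305 might be sketched as the following loop (hyperparameter values are illustrative, and objective is the weighted function sketched earlier):

    import torch

    def interactive_train(primary, secondary, loader, epochs=50, lr=1e-3,
                          alpha=0.85, threshold=1e-3):
        # S301-S302: training data arrive via `loader`; the secondary network is
        # assumed frozen (see the pairing sketch above); lr plays the role of gamma.
        optimizer = torch.optim.SGD(primary.parameters(), lr=lr)
        for epoch in range(epochs):
            for x, y in loader:                  # S303: draw x_i with label y
                p01_logits = primary(x)          # forward inference, primary
                with torch.no_grad():
                    p02_logits = secondary(x)    # forward inference, secondary
                loss = objective(p01_logits, p02_logits, y, alpha)
                optimizer.zero_grad()
                loss.backward()                  # S304: per-layer gradients
                optimizer.step()                 # update weights w and biases b
            if loss.item() < threshold:          # S305: objective meets threshold
                break
        torch.save(primary.state_dict(), "primary_weights.pt")  # store weights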
Example 2
Referring to fig. 4, in a second aspect of the present invention, there is provided a neural network model interactive training apparatus 1, including: a first determining module 11, configured to determine a primary neural network and at least one secondary neural network participating in interactive training; a second determining module 12, configured to determine an objective function participating in interactive training according to a distribution difference between the primary neural network and the secondary neural network; and the training module 13 is configured to train the primary neural network and the secondary neural network according to the objective function until the objective function value reaches a threshold value and tends to be stable, so as to obtain a trained primary neural network.
In some embodiments of the invention, the second determination module 12 comprises: a first determination unit for determining a supervised loss function of the main neural network; the second determining unit is used for determining a loss function of interactive training of the main neural network and each secondary neural network; and the third determining unit is used for determining an objective function participating in interactive training according to the supervision loss function and the loss function of each interactive training.
Example 3
Referring to fig. 5, in a third aspect of the present invention, there is provided an electronic apparatus comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of the invention in the first aspect.
The electronic device 500 may include a processing means (e.g., a central processing unit or graphics processor) 501 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage means 508 into a random access memory (RAM) 503. The RAM 503 also stores the various programs and data necessary for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 507 including, for example, a liquid crystal display (LCD), speakers, and vibrators; a storage device 508 including, for example, a hard disk; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all of the illustrated means must be implemented or provided; more or fewer devices may alternatively be implemented or provided. Each block shown in fig. 5 may represent one device or multiple devices as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more computer programs which, when executed by the electronic device, cause the electronic device to implement the neural network model interactive training method described above.
computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, Python, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A neural network model interactive training method is characterized by comprising the following steps:
determining a primary neural network and at least one secondary neural network participating in interactive training;
determining an objective function participating in interactive training according to the distribution difference between the primary neural network and the secondary neural network;
and training the primary neural network and the secondary neural network according to the objective function until the objective function value reaches a threshold value and tends to be stable, so as to obtain the trained primary neural network.
2. The interactive training method for neural network models according to claim 1, wherein the determining an objective function involved in interactive training according to the distribution difference between the primary neural network and the secondary neural network comprises:
determining a supervised loss function of the primary neural network;
determining a loss function of interactive training of the primary neural network and each secondary neural network;
and determining an objective function participating in interactive training according to the supervision loss function and the loss function of each interactive training.
3. The neural network model interactive training method of claim 2, wherein the objective function involved in interactive training is determined by:
L_01 = α·l_01 + (1 − α)·D,

wherein l_01 represents the supervised loss function of the primary neural network, α is a weighting factor, and D represents the loss function of the interactive training of the primary neural network with each secondary network.
4. The neural network model interactive training method of claim 3, wherein the loss function of the interactive training of the primary neural network with each secondary network is measured by KL divergence.
5. The neural network model interactive training method of claim 2, wherein the supervised loss function of the primary neural network is the Focal Loss function.
6. The neural network model interactive training method of any one of claims 1 to 5, wherein the determining a primary neural network involved in interactive training and at least one secondary neural network comprises:
taking the network model with the highest matching degree as the primary neural network according to the requirements;
and taking one or more neural network models with higher performance or stronger generalization capability than the primary neural network as secondary neural networks.
7. An interactive training device for neural network models, comprising:
a first determining module for determining a primary neural network and at least one secondary neural network participating in interactive training;
a second determining module for determining an objective function participating in interactive training according to the distribution difference between the primary neural network and the secondary neural network;
and a training module for training the primary neural network and the secondary neural network according to the objective function until the objective function value reaches a threshold value and tends to be stable, so as to obtain the trained primary neural network.
8. The interactive neural network model training device of claim 7, wherein the second determining module comprises:
a first determining unit for determining a supervised loss function of the primary neural network;
a second determining unit for determining a loss function for the interactive training of the primary neural network with each secondary neural network;
and a third determining unit for determining the objective function participating in interactive training from the supervised loss function and each interactive training loss function.
9. An electronic device, comprising: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the neural network model interactive training method of any one of claims 1-6.
10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out a neural network model interaction training method as claimed in any one of claims 1 to 6.
CN202111545139.0A 2021-12-15 2021-12-15 Neural network model interactive training method and device and storage medium Pending CN114219078A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111545139.0A CN114219078A (en) 2021-12-15 2021-12-15 Neural network model interactive training method and device and storage medium

Publications (1)

Publication Number Publication Date
CN114219078A true CN114219078A (en) 2022-03-22

Family

ID=80703069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111545139.0A Pending CN114219078A (en) 2021-12-15 2021-12-15 Neural network model interactive training method and device and storage medium

Country Status (1)

Country Link
CN (1) CN114219078A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116384460A (en) * 2023-03-29 2023-07-04 清华大学 Robust optical neural network training method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination