CN114298299A

CN114298299A - Model training method, device, equipment and storage medium based on course learning

Info

Publication number: CN114298299A
Application number: CN202110895919.1A
Authority: CN
Inventors: 王子丰; 张子恒
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-08-05
Filing date: 2021-08-05
Publication date: 2022-04-08

Abstract

The embodiment of the application discloses a model training method, a device, equipment and a storage medium based on course learning, and belongs to the technical field of artificial intelligence. The method comprises the following steps: acquiring a training data set of a neural network model, wherein the training data set comprises a plurality of data samples; for target data samples in the training data set, acquiring T group model parameters based on parameter distribution sampling of a neural network model, and acquiring T group prediction results corresponding to the target data samples; determining a difficulty index of the target data sample based on the T groups of prediction results; and generating a course learning scheme aiming at the neural network model based on the difficulty index of each data sample in the training data set, and training the model according to the course learning scheme. The application provides a course study scheme that degree of automation is high and more accurate on judging sample difficulty to help promoting the accuracy of the model of training out.

Description

Model training method, device, equipment and storage medium based on course learning

Technical Field

The embodiment of the application relates to the technical field of artificial intelligence, in particular to a model training method, a device, equipment and a storage medium based on course learning.

Background

Curriculum Learning (Curriculum Learning) is a strategy for training neural networks that mimics the process of human Learning. The method advocates that the model learns simple samples/tasks first and then learns complex samples/tasks, so that the model can obtain stronger generalization capability and convergence rate.

In the course learning-based model training process, difficulty degree labeling needs to be carried out on data samples. In the related art, a sample difficulty marking scheme based on a characteristic rule is provided. The difficulty of the data sample is judged by manually compiling characteristic rules by utilizing expert knowledge from some fields and combining the data characteristics of the training data set.

However, this method requires a lot of rules to be written manually, has low automation degree, and often makes the judgment of the difficulty of the sample by people and the model not consistent, thus resulting in low accuracy of the model.

Disclosure of Invention

The embodiment of the application provides a model training method, a device, equipment and a storage medium based on course learning, and provides a course learning scheme which is high in automation degree and more accurate in sample difficulty degree judgment, so that the accuracy of a trained model is improved. The technical scheme is as follows:

according to an aspect of an embodiment of the present application, there is provided a method for training a model based on curriculum learning, the method including:

acquiring a training data set of a neural network model, wherein the training data set comprises a plurality of data samples;

for target data samples in the training data set, based on the parameter distribution of the neural network model, sampling to obtain T groups of model parameters, and respectively performing model prediction processing on the target data samples based on the T groups of model parameters to obtain T groups of prediction results corresponding to the target data samples, wherein T is an integer greater than 1;

determining a difficulty index of the target data sample based on T groups of prediction results corresponding to the target data sample, wherein the difficulty index is used for quantitatively representing the difficulty of the target data sample;

and generating a course learning scheme aiming at the neural network model based on the difficulty degree indexes of the data samples in the training data set, and training the neural network model based on the course learning scheme.

According to an aspect of an embodiment of the present application, there is provided a model training apparatus based on curriculum learning, the apparatus including:

the acquisition module is used for acquiring a training data set of a neural network model, wherein the training data set comprises a plurality of data samples;

the prediction module is used for sampling target data samples in the training data set to obtain T groups of model parameters based on the parameter distribution of the neural network model, and performing model prediction processing on the target data samples respectively based on the T groups of model parameters to obtain T groups of prediction results corresponding to the target data samples, wherein T is an integer greater than 1;

a determining module, configured to determine a difficulty index of the target data sample based on T groups of prediction results corresponding to the target data sample, where the difficulty index is used to quantitatively represent the difficulty of the target data sample;

and the training module is used for generating a course learning scheme aiming at the neural network model based on the difficulty degree indexes of the data samples in the training data set, and training the neural network model based on the course learning scheme.

According to an aspect of embodiments of the present application, there is provided a computer device, comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the above-mentioned course learning-based model training method.

According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by a processor to implement the above-mentioned course learning-based model training method.

According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the above-mentioned course learning-based model training method.

The technical scheme provided by the embodiment of the application can bring the following beneficial effects:

the automatic course learning framework is designed aiming at the problems that the sample difficulty is determined based on manual writing rules, the automation degree is low and the accuracy is not high enough. The framework measures the parameter uncertainty of the neural network model, so that the prediction uncertainty of the model for a single sample can be obtained and scored, and the uncertainty can be used for obtaining the difficulty of the sample for model learning, so that the samples are combined and supplied to the model for learning. The automation degree of the scheme is high, manual verification is not needed by field experts, and the difficulty of each sample can be rapidly calculated and used for course arrangement. In addition, according to the technical scheme, the calculation of uncertainty can be adjusted in real time according to the updating of the model parameters, the course can be updated in real time, and the accuracy of the trained model is finally improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a schematic illustration of an environment for implementing an embodiment provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of a scenario for application of the solution provided by an embodiment of the present application;

FIG. 3 is a flow diagram of a method for curriculum learning-based model training provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a model architecture provided by one embodiment of the present application;

FIG. 5 is a block diagram of a curriculum learning-based model training apparatus provided by an embodiment of the present application;

fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

Machine Learning (ML for short) is a multi-domain cross subject, and relates to multiple subjects such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.

The technical scheme of the application relates to an artificial intelligence machine learning technology, in particular to a model training scheme based on course learning, and the technical scheme of the application is introduced and explained through a plurality of embodiments.

Refer to fig. 1, which illustrates a schematic diagram of an environment for implementing an embodiment of the present application. The implementation environment of the scheme can be realized as a model training method based on course learning. The embodiment implementation environment may include a model training apparatus 10 and a model using apparatus 20.

The model training device 10 may be an electronic device such as a computer, server, intelligent robot, or some other electronic device with greater computing power. The model training apparatus 10 is used to train a neural network model. In the embodiments of the present application, the function and structure of the neural network model are not limited. For example, neural network models may be applied to classification tasks, regression tasks, and the like. Illustratively, the classification tasks include, but are not limited to, text classification, picture classification, and the like. In addition, the structure of the Neural Network model may be a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), or some extensions or modifications based on the above-listed Neural Network structures. The model training device 10 can train the neural network model using a class learning-based model training method to achieve better performance.

The trained neural network model can be deployed in the model using device 20 for performing corresponding tasks and obtaining corresponding execution results. The model using device 20 may be a terminal device such as a mobile phone, a computer, a smart television, a multimedia playing device, a wearable device, a medical device, or a server, which is not limited in this application.

In some embodiments, as shown in FIG. 1, the curriculum learning-based model training method includes a sample difficulty metric module 31 and a curriculum scheduling module 32. The sample difficulty measuring module 31 is configured to obtain an uncertainty index corresponding to the data sample, and obtain a difficulty index of the data sample based on the uncertainty index corresponding to the data sample. Wherein the uncertainty indicator includes, but is not limited to, any of the following: an accidental uncertainty index, a cognitive uncertainty index, a comprehensive uncertainty index, and a signal-to-noise ratio uncertainty index. For an explanation of the various uncertainties, reference is made to the following examples. The course arrangement module 32 arranges a course learning scheme from easy to difficult based on the difficulty index of the data sample, and trains the neural network model according to the course learning scheme to obtain the trained neural network model.

In the following method embodiments, the specific process of the curriculum learning-based model training method will be described in detail.

In some embodiments, taking the application of the technical solution of the present application to disease prediction based on electronic medical record data as an example, the neural network model may be implemented as a disease prediction model, and a disease prediction result may be output based on input electronic medical record data. As shown in fig. 2, the scenario application scenario may include: a terminal 210 and a server 220.

The terminal 210 may be an electronic device such as a mobile phone, a tablet Computer, a PC (Personal Computer), a wearable device, and the like. A client running a target application for presenting a disease prediction result may be installed in the terminal 210. For example, the target application may be a dedicated application for disease prediction and/or disease prediction result presentation, or may be a social application, an instant messaging application, a life service application, and the like with a disease prediction and/or disease prediction result presentation function, which is not limited in this application.

The server 220 may be one server, a server cluster composed of a plurality of servers, or a cloud computing service center. Server 220 may be a background server for the target application described above, and is configured to provide background services for the target application. Illustratively, the disease prediction model is deployed in the server 220, and the server 220 processes the electronic medical record data through the disease prediction model to obtain a corresponding disease prediction result, and then sends the disease prediction result to the client of the target application program for presentation.

The terminal 210 and the server 220 may communicate with each other via a network.

For example, the client of the target application may display a disease prediction interface 230 for a given patient, in which patient information 231 and disease prediction results 232 for the patient are displayed. For example, the patient information 231 of the patient to be diagnosed includes electronic medical record data thereof, including data of patient chief complaints, current medical history, and the like, the disease prediction model outputs a disease prediction result 232 of the patient to be diagnosed based on the electronic medical record data of the patient to be diagnosed, and the disease prediction result 232 may include a plurality of possible diseases of the patient to be diagnosed, such as pharyngolaryngitis, bronchitis, acute upper respiratory infection, and the like shown in fig. 2. Optionally, the plurality of possible diseases in the disease prediction result 232 are ranked and displayed from high to low confidence.

Of course, the above functional description of the neural network model is only exemplary and explanatory, and the neural network model may also be used in many other possible application scenarios, such as the purpose recognition of the input text, the content recognition of the input picture, and the like, and the application is not limited thereto. That is, the Model-modeling technical scheme is Model-independent, and the Model training method based on course learning provided by the application can be applied to training of any Model, and is beneficial to improving the convergence speed and accuracy of the Model.

Referring to fig. 3, a flowchart of a course learning-based model training method according to an embodiment of the present application is shown. The execution subject of the steps of the method may be the model training apparatus 10 in the embodiment environment shown in fig. 1. The method comprises the following steps (310-340):

step 310, a training data set of the neural network model is obtained, wherein the training data set comprises a plurality of data samples.

In the embodiment of the application, the difficulty of the data sample is evaluated, and the model is trained in a course learning mode according to the difficulty index of the data sample.

And 320, for target data samples in the training data set, acquiring T group model parameters based on parameter distribution of a neural network model, and performing model prediction processing on the target data samples respectively based on the T group model parameters to acquire T group prediction results corresponding to the target data samples, wherein T is an integer greater than 1.

The target data sample may be any one of the data samples in the training data set. In the embodiments of the present application, for convenience of explanation, only the target data sample is taken as an example, and a method of determining the difficulty level index of the target data sample will be described. It should be appreciated that each data sample in the training data set may have a difficulty index determined for each data sample in the same manner as the difficulty index for the target data sample.

The parameters of the neural network model have uncertainty (uncertainties). Uncertainty is an important concept in machine learning, and aims to measure the uncertain distribution of neural network model parameters or the uncertainty of a measurement model for a prediction result. Based on the measure of uncertainty, we can obtain confidence in the predicted result, thereby improving model interpretability. When the parameters of the neural network model have uncertainty, i.e., the parameters of the model themselves have distributions, the prediction results of the model also have distributions. Briefly, the neural network model may include a plurality of parameters (e.g., a plurality of weight parameters, a plurality of bias parameters, etc.), each of which is not a fixed value but has a certain distribution, and it is understood that each of the parameters corresponds to a value range. In each prediction, for each parameter, a value is sampled from the corresponding distribution (or value range) thereof, and the value is used as the value of the parameter in the current prediction process. Because the neural network model has a plurality of parameters, each parameter can be sampled in a similar manner from the corresponding distribution (or value range), thereby obtaining a set of model parameters.

In the embodiment of the application, the model parameters are sampled for T times to obtain T groups of model parameters. And for each group of model parameters in the T groups of model parameters, performing model prediction processing on the target data sample by using the group of model parameters to obtain a group of prediction results corresponding to the target data sample. Therefore, T groups of model parameters can obtain T groups of prediction results, and the T groups of model parameters correspond to the T groups of prediction results one by one. For example, the T-th set of model parameters corresponds to the T-th set of predicted results, where T is a positive integer less than or equal to T.

Each set of predictors may be a single value or may include multiple values. For example, in the case that the neural network model performs a two-classification task (i.e. includes 2 prediction classes), each group of prediction results includes 2 values, for example, the 2 values are prediction probability values corresponding to the 2 prediction classes respectively. For another example, in a case where the neural network model performs m classification tasks (i.e., includes m prediction classes), each group of prediction results includes m values, for example, the m values are prediction probability values corresponding to the m prediction classes, respectively, and m is an integer greater than 1.

And 330, determining a difficulty index of the target data sample based on the T groups of prediction results corresponding to the target data sample, wherein the difficulty index is used for quantitatively representing the difficulty of the target data sample.

In the embodiment of the present application, the difficulty index of the target data sample is obtained by evaluating the uncertainty of the T groups of prediction results corresponding to the target data sample, because the uncertainty of the model parameters may cause the uncertainty of the prediction results.

In some embodiments, an uncertainty indicator corresponding to the target data sample is calculated based on the T sets of predictors corresponding to the target data sample, and then a difficulty indicator for the target data sample is determined based on the uncertainty indicator. Optionally, the uncertainty indicator is calculated according to the variance of the T groups of prediction results. The calculation method of the uncertainty index and the difficulty index will be explained in the following examples.

And 340, generating a course learning scheme aiming at the neural network model based on the difficulty degree indexes of the data samples in the training data set, and training the neural network model based on the course learning scheme.

After the difficulty index of each data sample is obtained, a corresponding course learning scheme can be formulated. In some embodiments, the entire training data set D may be equally divided into S portions according to the difficulty index of the data sample, i.e., D ═ D₁，D₂,…，D_S}. Wherein the sample difficulty is from D₁To D_SGradually increasing. When the neural network model is trained, the data samples are easy to learn.

For example, in the first stage, the model is at D₁Training is performed, and in the second stage, the model is in D₁∪D₂Training is performed, and in the third stage, the model is in D₁∪D₂∪D₃Training is carried out on the model, and so on, and in the S stage, the model is in D₁∪D₂∪…D_STraining is carried out until all data samples in the training data set D are learned.

As another example, in the first stage, the model is at D₁Training is performed, and in the second stage, the model is in D₂Training is performed, and in the third stage, the model is in D₃Training is carried out on the model, and so on, and in the S stage, the model is in D_STraining is carried out until all data samples in the training data set D are learned.

Next, a description will be given of a manner of determining the difficulty index of the data sample. In the following embodiments, only the target data sample is taken as an example for description, and the determination manner of the difficulty index of other data samples is the same as or similar to that. The embodiments of the present application provide several possible implementation schemes as follows.

Scheme 1: and calculating an accidental uncertainty index corresponding to the target data sample based on the T groups of prediction results corresponding to the target data sample, and determining the accidental uncertainty index as a difficulty index of the target data sample.

Occasional uncertainty indicators are used to quantify the uncertainty that represents the model prediction result due to the indelible errors implied in the sample data.

In some embodiments, the occasional uncertainty indicator corresponding to the target data sample is calculated by:

1. for each group of prediction results in the T groups of prediction results, converting n-dimensional prediction numerical vectors corresponding to the prediction results into n x n diagonal matrixes, and calculating the outer product of the prediction numerical vectors and the prediction numerical vectors to obtain n x n first outer product matrixes, wherein n is an integer greater than 1;

2. subtracting the diagonal matrix from the first outer product matrix to obtain a difference matrix corresponding to the prediction result;

3. and adding difference matrixes respectively corresponding to the T groups of prediction results, and dividing the sum by T to obtain an accidental uncertainty index corresponding to the target data sample.

Illustratively, the occasional uncertainty index Σ for a target data sample_aleaThe calculation formula of (a) is as follows:

wherein T represents the number of groups of the prediction results, namely the sampling times of the model parameters,

an n-dimensional vector of predictor values representing the T-th set of predictors of the T sets of predictors,

show that

Converted to a diagonal matrix of n x n,

representation calculation

And

the outer product of (a) results in a first outer product matrix of n x n.

Scheme 2: and calculating a cognitive uncertainty index corresponding to the target data sample based on the T groups of prediction results corresponding to the target data sample, and determining the cognitive uncertainty index as a difficulty index of the target data sample.

The cognitive uncertainty index is used for quantifying and representing the uncertainty of the model prediction result caused by the self-ability limitation of the neural network model.

In some embodiments, the cognitive uncertainty indicator corresponding to the target data sample is calculated by:

1. for each group of prediction results in the T groups of prediction results, subtracting the n-dimensional prediction numerical value vector corresponding to the prediction result from the n-dimensional mean vector to obtain an n-dimensional difference vector; the mean vector is the mean of the prediction numerical vectors respectively corresponding to the T groups of prediction results, and n is an integer greater than 1;

2. calculating the outer product of the difference vector and the difference vector to obtain a second outer product matrix of n x n corresponding to the prediction result;

3. and adding the second accumulated matrixes respectively corresponding to the T groups of prediction results, and dividing the sum by T to obtain the cognitive uncertainty index corresponding to the target data sample.

Illustratively, the cognitive uncertainty index Σ for a target data sample_episThe calculation formula of (a) is as follows:

represents a mean vector of the n dimensions and,

i.e. calculating a difference vector

Sum difference vector

And (3) obtaining a second outer product matrix of n x n.

Scheme 3: and calculating an accidental uncertainty index and a cognitive uncertainty index corresponding to the target data sample based on the T groups of prediction results corresponding to the target data sample, determining a comprehensive uncertainty index according to the accidental uncertainty index and the cognitive uncertainty index, and determining the comprehensive uncertainty index as a difficulty index of the target data sample.

In some embodiments, the accidental uncertainty index and the cognitive uncertainty index corresponding to the target data sample are added to obtain a comprehensive uncertainty index corresponding to the target data sample.

Scheme 4: calculating accidental uncertainty indexes and cognitive uncertainty indexes corresponding to the target data samples based on T groups of prediction results corresponding to the target data samples, determining signal-to-noise ratio uncertainty indexes according to the accidental uncertainty indexes and the cognitive uncertainty indexes, and determining the signal-to-noise ratio uncertainty indexes as the difficulty index of the target data samples.

In some embodiments, the accidental uncertainty index and the cognitive uncertainty index corresponding to the target data sample are added to obtain a comprehensive uncertainty index corresponding to the target data sample; dividing the mean prediction result of the neural network model for the target data sample by the comprehensive uncertainty index to obtain a signal-to-noise ratio uncertainty index corresponding to the target data sample; the mean prediction result refers to the mean of the T prediction values corresponding to the label of the target data sample in the T groups of prediction results.

Scheme 5: and determining T predicted values of a prediction category corresponding to the real category of the target data sample based on the T groups of prediction results corresponding to the target data sample, and determining the difficulty index of the target data sample based on the variance of the T predicted values.

For example, the neural network model performs m classification tasks (that is, m prediction classes are included, m is an integer greater than 1), and assuming that the real class of the target data sample is a target prediction class in the m prediction classes, the prediction values corresponding to the target prediction class are respectively obtained from T groups of prediction results corresponding to the target data sample, that is, T prediction values can be obtained, and the variance of the T prediction values is determined as the difficulty index of the target data sample.

In the embodiment of the application, various schemes for determining the difficulty index of a data sample are provided, and the core idea is to calculate the difficulty of the sample for model processing based on the uncertainty of the prediction result corresponding to the sample by the model. The calculation of the scheme 1, the scheme 2 and the scheme 5 is simpler and more efficient, and the scheme 3 and the scheme 4 comprehensively consider two factors, namely accidental uncertainty indexes and cognitive uncertainty indexes, so that the accuracy of the final judgment of the difficulty level of the sample and the accuracy of the trained model are higher. In practical application, the model training effects of the various schemes can be combined, and a proper scheme can be selected for calculating the difficulty level of the sample, which is not limited in the application.

Due to the complexity of the neural network model, even if the parameters of the model are assumed to follow the gaussian distribution, the predicted distribution cannot obtain an analytic solution by an integral method. Instead, we need to approximate the prediction distribution by means of Monte Carlo sampling (Monte Carlo sampling). Specifically, it is assumed that the real parameters of the neural network model obey the distribution P (ω | D), which is a posterior distribution based on the data set D, can be written as follows according to the bayesian theorem:

where ω is a model parameter. Since the right part of the above equation cannot be computed, we usually use a Gaussian distribution q_θ(ω) to approximate the posterior distribution P (ω | D). At this time, for one target data sample (x)^*,y^*) The predicted distribution of (c) can be approximated as:

wherein x is^*Sample data being a sample of the target data, i.e. input data, y, of the sample input model^*Is the true data (also called tag data or true tag) of the target data sample. Omega_tRepresenting the approximate distribution q_θ(ω) the T-th sampling in the set of model parameters, T being the number of samplings. By sampling T times and averaging we can get an approximate distribution, which is the monte carlo sampling.

For the treatment of the syndrome by Monte CarloBy sampling T groups of predicted results, we can calculate the target data sample (x) of the neural network model^*,y^*) Uncertainty of prediction. Suppose q (y)^*) Is the predicted distribution of the neural network model to the target data sample, the variance of the distribution

Can be written as in the formula:

wherein the content of the first and second substances,

is to expect the outer product of the predicted distributions of the target data samples,

is the desired outer product of the predicted distributions of the target data samples.

First half of the above equation

An accidental uncertainty index corresponding to a target data sample refers to uncertainty of a model prediction result caused by an irremovable error implied in the sample data; the latter half of the above equation

The cognitive uncertainty index corresponding to the target data sample refers to uncertainty of a model prediction result caused by self-ability limitation of a neural network model, and is related to data sample distribution learned by the model in the past.

The above formula can be further written as:

wherein the target data sample corresponds to an occasional uncertainty indicator

Cognitive uncertainty index corresponding to target data sample

This form of uncertainty indicator can be conveniently calculated based on monte carlo sampling.

Based on the two uncertainty indexes Sigma_aleaSum-sigma_episThe application designs the following difficulty index of four samples: omega_alea，Ω_epis，Ω_total，Ω_snr. Wherein omega_aleaAnd Ω_episIs the predicted contingent and cognitive uncertainty, Ω, of the corresponding real tag_totalIs the sum of the two. Note that ∑ is_aleaSum-sigma_episThe method is a matrix, and the value of the position of the diagonal line of the matrix corresponding to a real label is used as a difficulty index. In particular, the index based on the signal-to-noise ratio is designed

Represents the mean prediction of the neural network model for the target data sample. The index not only considers the variance, but also considers the predicted mean. Except omega_snrBesides the smaller the sample is, the harder the sample is, the larger the sample is, the more difficult the other indexes are.

To this end, how to get the result of an approximate Gaussian posterior q_θ(ω) sampling for Monte Carlo remains an urgent problem to be solved. First, since the parameter θ of the distribution is unknown (μ, Σ), we need to estimate the parameter distribution. Deep fusion (deep ensemble) is a convenient approach to approximate Monte Carlo samplingMeans are provided. Specifically, the deep fusion uses the idea of model fusion (ensemble) for reference, and obtains multiple groups of model parameters by training the model of the same architecture for multiple times through different random numbers, so that the parameters are used for approximating the posterior distribution of the model. However, in practice, training a model multiple times results in a large waste of resources. The application provides a pseudo model fusion (pseudo ensemble) parameter estimation method based on a cutting method. The pseudo model fusion is implemented by an Influence Function (IF). The influence function is a method for approximating the change of model parameters by the knife-cut method (Jackknife), and is derived from the concept in Robust Statistics (Robust Statistics). Through the influence function, the contribution degree of a single data sample to the model parameter can be measured, and therefore important data samples can be screened out. The impact function is used to measure the difference between the new model parameters obtained by retraining the model and the model parameters before the elimination when a certain data sample is eliminated from the training data set. This process of rejecting a data sample is a knife cutting process. Specifically, the parameter estimation method based on the pseudo model fusion of the cutting method comprises the following steps:

let the model parameter before culling be θ, which is obtained by minimizing the error on the full training data set:

where z represents a single data sample, l (z) refers to the loss of the model for data sample z, and n represents the total number of data samples in the full training data set. After the data sample z is removed from the full training data set, the new model parameter θ \_zComprises the following steps:

the difference Δ θ between the two sets of model parameters can be approximated by an influence function:

where ψ (z) is an influence function, and the expression of the influence function in theory is:

wherein the content of the first and second substances,

is the loss of data sample z, l (z), is graded, and H is the Hessian matrix (Hessian matrix) of the model loss function.

Approximating the posterior q according to n model parameters obtained by the cutting method_θThe covariance of (ω) is:

wherein the content of the first and second substances,

means that the model parameter theta obtained by training on the training data set of the ith data sample is removed^*Is the true model parameter, in combination with the definition of the influence function, the formula can be further written as:

wherein, I_θRepresenting Fisher information matrix (Fisher information matrix), ψ_iIs a function of the influence of the motion of the object,

a gradient representing the loss of the model for the ith data sample,

is that

The transposing of (1).

Due to the fact that

The final form of the above formula is:

wherein the content of the first and second substances,

wherein, the model parameters of the modeling obey Gaussian distribution, and the mean value is theta^*Covariance of

Thus, the covariance matrix of approximate posterior distribution is obtained through the Fisher information matrix

Therefore, the parameters of the model can be conveniently sampled to calculate the difficulty index of each data sample.

Fig. 4 illustrates a schematic diagram of a network architecture of a neural network model. Illustratively, the neural network model may employ a TextCNN network. The TextCNN network is a convolutional neural network, can be used for performing text classification tasks, has the advantages of simple structure and good effect, and is widely applied to the fields of Natural Language Processing (NLP for short) such as text classification and recommendation.

For the text to be classified, firstly, the vector representation of each Word in the text is obtained by using Word2Vec, and the vector representation corresponding to the text is obtained. Where Word2Vec is the correlation model used to generate Word vectors (or called Word embedding). Then, a CNN convolution network is used to perform convolution calculation on the vector representation corresponding to the text, and this process may be understood as using a filter (convolution kernel) to filter each small region of the image, so as to obtain the feature values of these small regions. Each convolution kernel represents an image pattern, and multiple convolution processes may be performed in a single convolution layer. The convolution kernel performs a convolution process on the original vector from left to right and from top to bottom in sequence to obtain a convolution vector which still has characteristics but with reduced scale. The output of the convolutional layer may be pooled (Pooling) to reduce the vector size, while effectively avoiding vector overfitting and reducing the computational process. The convolution and pooling processes can be performed for multiple times, and finally, the feature expression vector corresponding to the text is obtained. And finally, inputting the feature expression vector into a classification network, and outputting a classification result through the classification network. The classification network comprises a full connection layer and an output layer (such as a softmax layer), the full connection layer processes the feature expression vectors, and then the softmax layer calculates the classification result.

The present invention is not limited to the above, and the present invention can be applied to various models such as various text processing models and image processing models because the Model-independent (Model-independent) Model is used.

We name the technical scheme of the present application as UGCL (uncertain-Guided Curriculum Learning), compare the UGCL scheme with some Curriculum Learning schemes provided by the related art, take an image classification task and a text classification task as examples respectively, and verify the model effect on two public data sets to obtain the experimental data shown in table 1 below.

TABLE 1

Specifically, the experiment was performed on multiple photo-classified datasets (including CIFAR-10, CIFAR-100, STL-10, SVHN, etc.) and on multiple text-classified datasets (including Ohsumed, R52, MR, 20NG, etc.). Compared with the UGCL of the scheme, the method has various curriculum learning methods, including SPL, SPCL, SPL-IR, CL-TL, MentorNet, CurricumNet, Data-Param and the like shown in the table 1. The numbers in table 1 are the optimum test accuracy that can be achieved by the corresponding method obtained experimentally. Therefore, the UGCL of the scheme is superior to other methods for comparison in all indexes. In particular, the method based on SNR (Signal Noise Ratio) index (i.e. the above-mentioned SNR uncertainty index) achieves the best effect.

The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.

Referring to fig. 5, a block diagram of a class learning-based model training apparatus according to an embodiment of the present application is shown. The device has the function of realizing the model training method based on the course learning, and the function can be realized by hardware or hardware executing corresponding software. The device may be the model training apparatus described above, or may be provided in the model training apparatus. The apparatus 500 may comprise: an acquisition module 510, a prediction module 520, a determination module 530, and a training module 540.

An obtaining module 510, configured to obtain a training data set of the neural network model, where the training data set includes a plurality of data samples.

And the prediction module 520 is configured to, for a target data sample in the training data set, sample to obtain T sets of model parameters based on parameter distribution of the neural network model, and perform model prediction processing on the target data sample based on the T sets of model parameters, respectively, to obtain T sets of prediction results corresponding to the target data sample, where T is an integer greater than 1.

A determining module 530, configured to determine a difficulty index of the target data sample based on the T groups of prediction results corresponding to the target data sample, where the difficulty index is used to quantize the difficulty representing the target data sample.

The training module 540 is configured to generate a course learning scheme for the neural network model based on the difficulty index of each data sample in the training data set, and train the neural network model based on the course learning scheme.

In some embodiments, the determining module 530 is configured to:

calculating an accidental uncertainty index corresponding to the target data sample based on the T groups of prediction results corresponding to the target data sample, wherein the accidental uncertainty index is used for quantitatively representing uncertainty of a model prediction result caused by an irremovable error hidden in the sample data;

calculating a cognitive uncertainty index corresponding to the target data sample based on the T groups of prediction results corresponding to the target data sample, wherein the cognitive uncertainty index is used for quantitatively expressing uncertainty of the model prediction result caused by self-capability limitation of the neural network model;

and determining the difficulty index of the target data sample according to the accidental uncertainty index and the cognitive uncertainty index.

In some embodiments, the determining module 530 is configured to:

and adding the accidental uncertainty index and the cognitive uncertainty index to obtain the difficulty index of the target data sample.

In some embodiments, the determining module 530 is configured to:

adding the accidental uncertainty index and the cognitive uncertainty index to obtain a comprehensive uncertainty index;

dividing the mean prediction result of the neural network model for the target data sample by the comprehensive uncertainty index to obtain the difficulty index of the target data sample; the mean prediction result refers to the mean of the T prediction values corresponding to the label of the target data sample in the T groups of prediction results.

In some embodiments, the determining module 530 is configured to:

for each group of prediction results in the T groups of prediction results, converting n-dimensional prediction numerical vectors corresponding to the prediction results into n x n diagonal matrixes, and calculating the outer product of the prediction numerical vectors and the prediction numerical vectors to obtain n x n first outer product matrixes, wherein n is an integer greater than 1;

subtracting the diagonal matrix from the first outer product matrix to obtain a difference matrix corresponding to the prediction result;

and adding difference matrixes respectively corresponding to the T groups of prediction results, and dividing the sum by T to obtain an accidental uncertainty index corresponding to the target data sample.

In some embodiments, the determining module 530 is configured to:

for each group of prediction results in the T groups of prediction results, subtracting the n-dimensional prediction numerical value vector corresponding to the prediction result from the n-dimensional mean vector to obtain an n-dimensional difference vector; the mean vector is the mean of the prediction numerical vectors respectively corresponding to the T groups of prediction results, and n is an integer greater than 1;

calculating the outer product of the difference vector and the difference vector to obtain a second outer product matrix of n x n corresponding to the prediction result;

and adding the second accumulated matrixes respectively corresponding to the T groups of prediction results, and dividing the sum by T to obtain the cognitive uncertainty index corresponding to the target data sample.

In some embodiments, the prediction module 520 is configured to:

obtaining an influence function corresponding to the target data sample, wherein the influence function is used for measuring the parameter difference of the neural network model obtained by training before and after the target data sample is removed from the training data set;

calculating the covariance of the parameter distribution according to the influence function, and determining the parameter distribution based on the covariance;

and sampling for T times from the parameter distribution to obtain the T groups of model parameters.

It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.

Referring to fig. 6, a schematic structural diagram of a computer device according to an embodiment of the present application is shown. The Computer device may be any electronic device with data calculation, processing and storage functions, such as a mobile phone, a tablet Computer, a PC (Personal Computer) or a server. The computer device may be implemented as a model training device for implementing the course learning based model training method provided in the above embodiments. Specifically, the method comprises the following steps:

in some embodiments, the computer device 600 includes a Processing Unit (e.g., a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), etc.) 601, a system Memory 604 including a RAM (Random-Access Memory) 602 and a ROM (Read-Only Memory) 603, and a system bus 605 connecting the system Memory 604 and the Central Processing Unit 601. The computer device 600 also includes a basic Input/Output System (I/O System) 606 for facilitating information transfer between the various components within the server, and a mass storage device 607 for storing an operating System 613, application programs 614, and other program modules 615.

In some embodiments, the basic input/output system 606 includes a display 608 for displaying information and an input device 609, such as a mouse, keyboard, etc., for a user to input information. Wherein the display 608 and the input device 609 are connected to the central processing unit 601 through an input output controller 610 connected to the system bus 605. The basic input/output system 606 may also include an input/output controller 610 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input/output controller 610 may also provide output to a display screen, a printer, or other type of output device.

In some embodiments, the mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605. The mass storage device 607 and its associated computer-readable media provide non-volatile storage for the computer device 600. That is, the mass storage device 607 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM (Compact disk Read-Only Memory) drive.

Without loss of generality, the computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash Memory or other solid state Memory technology, CD-ROM, DVD (Digital Video Disc) or other optical, magnetic, tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 604 and mass storage device 607 described above may be collectively referred to as memory.

According to an embodiment of the present application, the computer apparatus 600 may also operate as a remote computer connected to a network via a network, such as the Internet. That is, the computer device 600 may be connected to the network 612 through the network interface unit 611 connected to the system bus 605, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 611.

The memory also includes at least one instruction, at least one program, set of codes, or set of instructions stored in the memory and configured to be executed by the one or more processors to implement the course learning based model training method described above.

In an exemplary embodiment, a computer readable storage medium is also provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which when executed by a processor of a computer device, implements the course learning based model training method provided by the above embodiments.

Optionally, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random-Access Memory), SSD (Solid State drive), or optical disk. The Random Access Memory may include a ReRAM (resistive Random Access Memory) and a DRAM (Dynamic Random Access Memory).

In an exemplary embodiment, a computer program product or a computer program is also provided, the computer program product or the computer program comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the above-mentioned course learning-based model training method.

It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In addition, the step numbers described herein only exemplarily show one possible execution sequence among the steps, and in some other embodiments, the steps may also be executed out of the numbering sequence, for example, two steps with different numbers are executed simultaneously, or two steps with different numbers are executed in a reverse order to the order shown in the figure, which is not limited by the embodiment of the present application.

The above description is only exemplary of the present application and is not intended to limit the present application, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method for model training based on curriculum learning, the method comprising:

2. The method according to claim 1, wherein the determining the difficulty level indicator of the target data sample based on the T groups of prediction results corresponding to the target data sample comprises:

calculating an accidental uncertainty index corresponding to the target data sample based on the T groups of prediction results corresponding to the target data sample, wherein the accidental uncertainty index is used for quantifying and representing uncertainty of a model prediction result caused by an irremovable error hidden in the sample data;

calculating a cognitive uncertainty index corresponding to the target data sample based on the T groups of prediction results corresponding to the target data sample, wherein the cognitive uncertainty index is used for quantifying uncertainty of model prediction results caused by self-capability limitation of the neural network model;

3. The method of claim 2, wherein determining the target data sample's difficulty index based on the incidental uncertainty indicator and the cognitive uncertainty indicator comprises:

and adding the accidental uncertainty index and the cognitive uncertainty index to obtain a difficulty index of the target data sample.

4. The method of claim 2, wherein determining the target data sample's difficulty index based on the incidental uncertainty indicator and the cognitive uncertainty indicator comprises:

dividing the average prediction result of the neural network model for the target data sample by the comprehensive uncertainty index to obtain a difficulty index of the target data sample; wherein the mean prediction result is a mean of T prediction values corresponding to the label of the target data sample in the T groups of prediction results.

5. The method of claim 2, wherein the calculating the contingent uncertainty indicator for the target data sample based on the T sets of predictors for the target data sample comprises:

for each group of the T groups of the prediction results, converting n-dimensional prediction numerical vectors corresponding to the prediction results into n x n diagonal matrixes, and calculating the outer product of the prediction numerical vectors and the prediction numerical vectors to obtain n x n first outer product matrixes, wherein n is an integer greater than 1;

6. The method according to claim 2, wherein the calculating the cognitive uncertainty indicator corresponding to the target data sample based on the T groups of predicted results corresponding to the target data sample comprises:

for each group of prediction results in the T groups of prediction results, subtracting the n-dimensional prediction numerical value vector corresponding to the prediction result from the n-dimensional mean vector to obtain an n-dimensional difference vector; the mean vector is the mean of the predicted numerical value vectors respectively corresponding to the T groups of prediction results, and n is an integer greater than 1;

and adding the second accumulative matrixes respectively corresponding to the T groups of prediction results, and dividing the sum by T to obtain the cognitive uncertainty index corresponding to the target data sample.

7. The method according to any one of claims 1 to 6, wherein the sampling of the T sets of model parameters based on the parameter distribution of the neural network model comprises:

8. A curriculum learning-based model training apparatus, the apparatus comprising:

9. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the method of any one of claims 1 to 7.

10. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method according to any one of claims 1 to 7.