CN113642727A - Training method of neural network model and processing method and device of multimedia information - Google Patents

Info

Publication number: CN113642727A
Application number: CN202110905430.8A
Authority: CN (China)
Prior art keywords: model, models, training, neural network, network model
Legal status: Pending (the status listed is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 章文俊, 黄强, 卓泽城, 潘旭, 洪赛丁, 杨哲, 徐思琪, 刘晨晖
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110905430.8A
Publication of CN113642727A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The disclosure provides a training method of a neural network model and a processing method of multimedia information, and relates to the field of artificial intelligence, in particular to the fields of deep learning, natural language processing, computer vision and cloud computing; it can be applied to scenarios such as smart cities. The training method comprises the following steps: determining a neural network model group and a sample data set according to a target task, wherein the neural network model group comprises m models with different initial parameters, and the sample data set comprises a training sample subset and a test sample subset; in the current training period: adjusting parameters of each model in the neural network model group based on the training sample subset to obtain m adjusted models; selecting, based on the test sample subset, n models with higher precision from the m adjusted models; under the condition that the current training period does not meet the training stop condition, updating the neural network model group based on the n models, and returning to execute the next training period; otherwise, selecting the model for executing the target task from the n models.

Description

Training method of neural network model and processing method and device of multimedia information
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the fields of deep learning, natural language processing, computer vision and cloud computing, and can be applied to scenes such as smart cities.
Background
With the development of computer technology and network technology, deep learning technology has been widely used in many fields. At present, the mainstream training method for deep learning models is fine tuning, in which a single model is trained by optimizing its hyper-parameters. If the optimization direction is not correct (for example, the hyper-parameters are chosen poorly), the model easily overfits. When a deep learning model composed of a plurality of submodels is trained by an ensemble learning method, because the submodels are independent of one another, each submodel converges on its own, but the performance of the overall model cannot be further improved.
Disclosure of Invention
Based on the above, the present disclosure provides a training method of a neural network model and a processing method, apparatus, device and storage medium of multimedia information, which improve the performance of the model.
According to an aspect of the present disclosure, there is provided a training method of a neural network model, including: determining a neural network model group and a first sample data set according to a target task, wherein the neural network model group comprises m models with different initial parameters, and the first sample data set comprises a training sample subset and a test sample subset; in the current training period: adjusting parameters of each model in the neural network model group based on the training sample subset to obtain m adjusted models; based on the test sample subset, selecting n first models with higher precision from the adjusted m models; under the condition that the current training period does not meet the training stopping condition, updating the neural network model group based on the n first models, and returning to execute the next training period; and under the condition that the current training period meets the training stopping condition, selecting a model for executing the target task from the n first models, wherein m and n are both natural numbers larger than 1, and m is larger than n.
According to another aspect of the present disclosure, there is provided a method for processing multimedia information, including: inputting multimedia information into a neural network model to obtain output data; and determining the category of the multimedia information based on the output data, wherein the neural network model is obtained by training by adopting the training method of the neural network model described in the foregoing, and the target task comprises a multimedia information classification task.
According to another aspect of the present disclosure, there is provided a training apparatus of a neural network model, including: a data determination module for determining a neural network model group and a first sample data set according to a target task, wherein the neural network model group comprises m models with different initial parameters, and the first sample data set comprises a training sample subset and a test sample subset; the model adjusting module is used for adjusting parameters of each model in the current neural network model group based on the training sample subset in the current training period to obtain m adjusted models; the first model selection module is used for selecting n first models with higher precision from the adjusted m models based on the test sample subset; the first model group updating module is used for updating the current neural network model group based on the n first models and returning to execute the next training period under the condition that the current training period does not meet the training stopping condition; and the second model selection module is used for selecting a model for executing the target task from the n first models under the condition that the current training period meets the training stop condition, wherein m and n are both natural numbers greater than 1, and m is greater than n.
According to another aspect of the present disclosure, there is provided a multimedia information processing apparatus including: the data acquisition module is used for inputting the multimedia information into the neural network model to obtain output data; and a class determination module, configured to determine a class of the multimedia information based on the output data, where the neural network model is obtained by training the training device of the neural network model described above, and the target task includes a multimedia information classification task.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of training a neural network model and/or a method of processing multimedia information provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a training method of a neural network model and/or a processing method of multimedia information provided by the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method of training a neural network model and/or a method of processing multimedia information provided by the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic view of an application scenario of a training method of a neural network model and a processing method and device of multimedia information according to an embodiment of the disclosure;
FIG. 2 is a schematic flow diagram of a method of training a neural network model according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of updating a set of neural network models, according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a method of training a neural network model according to an embodiment of the present disclosure;
FIG. 5 is a flow chart illustrating a method for processing multimedia information according to an embodiment of the disclosure;
FIG. 6 is a block diagram of a training apparatus for a neural network model according to an embodiment of the present disclosure;
fig. 7 is a block diagram of a multimedia information processing apparatus according to an embodiment of the present disclosure; and
fig. 8 is a block diagram of an electronic device for implementing a neural network model training method and/or a multimedia information processing method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure provides a training method of a neural network model, comprising a data determination phase and a training phase. In a data determination phase, a neural network model group and a first sample data set are determined according to a target task, the neural network model group comprises m models with different initial parameters, and the first sample data set comprises a training sample subset and a testing sample subset. In each training period, firstly adjusting the parameters of each model in the neural network model group based on the training sample subset to obtain m adjusted models; then based on the test sample subset, selecting n first models with higher precision from the adjusted m models; under the condition that the current training period does not meet the training stopping condition, updating the neural network model group based on the n first models, and returning to execute the next training period; and under the condition that the current training period meets the training stopping condition, selecting a model for executing the target task from the n first models. Wherein m and n are both natural numbers larger than 1, and m is larger than n.
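For concreteness, the following is an illustrative Python sketch of this training loop. The helper callables build_models, train_one_cycle, evaluate and update_group, as well as the fixed-cycle stop condition, are assumptions made for illustration and are not defined by the disclosure.

```python
from typing import Any, Callable, List

def train_population(
    build_models: Callable[[int], List[Any]],              # returns m models with different initial parameters
    train_one_cycle: Callable[[Any, Any], Any],            # adjusts one model on the training sample subset
    evaluate: Callable[[Any, Any], float],                 # precision of one model on the test sample subset
    update_group: Callable[[List[Any], int], List[Any]],   # refills the group back to m models
    train_subset: Any,
    test_subset: Any,
    m: int,
    n: int,
    max_cycles: int,
) -> List[Any]:
    """Train m models jointly and keep the n most precise ones each cycle."""
    models = build_models(m)
    best_n: List[Any] = []
    for cycle in range(max_cycles):
        adjusted = [train_one_cycle(model, train_subset) for model in models]
        # select the n first models with higher precision among the m adjusted models
        best_n = sorted(adjusted, key=lambda mod: evaluate(mod, test_subset),
                        reverse=True)[:n]
        if cycle < max_cycles - 1:
            # stop condition not met: rebuild the group and run the next training period
            models = update_group(best_n, m)
    # stop condition met: the model for the target task is chosen from best_n
    return best_n
```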
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 is a schematic view of an application scenario of a training method of a neural network model and a processing method and device of multimedia information according to an embodiment of the disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include a terminal device 120.
The terminal device 120 may have various client applications installed. The terminal device 120 may be any of various electronic devices having a display screen and processing capability, including but not limited to a smart phone, a tablet computer, a laptop computer, a desktop computer, and the like. The terminal device 120 may be used, for example, to process multimedia information to accomplish a target task. The multimedia information may be either an image 111 or a text 112.
In an embodiment, the terminal device 120 may perform tasks such as text classification, emotion analysis, or sequence tagging based on the text 112, or the terminal device 120 may perform tasks such as image classification, scene word recognition, or object detection based on the image 111. The present disclosure is not limited thereto.
In one embodiment, as shown in fig. 1, the application scenario 100 may further include a server 130, and the terminal device 120 may be communicatively connected to the server 130 through a network. For example, the server 130 may be, for example, an application server for providing support for a client application run by the terminal device 120.
In one embodiment, the terminal device 120 may send the multimedia information 140 to the server 130, and the server 130 performs the target task. The server 130 may then feed back the processing result 150 obtained after executing the target task to the terminal device 120, and the terminal device displays the processing result 150.
Illustratively, the terminal device 120 or the server 130 may employ a neural network model that matches the target task to perform the target task. For example, in the text classification task, the neural network model may be a TextCNN model, a Dynamic Convolutional Neural Network (DCNN) model, or the like. In the sequence labeling task, the neural network model may be a Bi-LSTM + CRF model or the like. In the scene text recognition task, the neural network model may be a Convolutional Recurrent Neural Network (CRNN) model or the like. In the image classification task, the neural network model may be a VGG network or a residual network (e.g., Residual Neural Network, ResNet), etc. In the object detection task, the neural network model may be a You Only Look Once (YOLO) detector or the like.
In one embodiment, server 130 may be, for example, a server incorporating a blockchain. Alternatively, the server 130 may also be a virtual server, a cloud server, or the like.
It should be noted that the training method of the neural network model provided in the present disclosure may be performed by the server 130, or may be performed by another server communicatively connected to the server 130. Accordingly, the training apparatus of the neural network model provided by the present disclosure may be disposed in the server 130, or disposed in another server communicatively connected to the server 130. The processing method of the multimedia information provided by the present disclosure may be executed by the terminal device 120 or executed by the server 130. Accordingly, the multimedia information processing apparatus provided by the present disclosure may be provided in the terminal device 120 or in the server 130.
It should be understood that the number and types of terminal devices, servers, and multimedia information in fig. 1 are merely illustrative. There may be any number and type of terminal devices, servers, and multimedia information, as desired for an implementation.
The training method of the neural network model provided by the present disclosure will be described in detail with reference to fig. 1 through fig. 2 to 4 below.
As shown in fig. 2, the training method of the neural network model of this embodiment may include operations S210 to S260, in which operations S220 to S250 are executed in a loop until the current training period satisfies the training stop condition.
In operation S210, a neural network model group and a first sample dataset are determined according to a target task.
According to an embodiment of the present disclosure, the neural network model set includes m models whose initial parameters are different from each other. For example, a model architecture may be selected according to a target task, and then m sets of parameters are randomly generated and respectively used as model parameters of the selected model architecture to obtain m initial models. For example, for a target task of target detection, the model architecture may be a YOLO architecture or the like.
According to the embodiment of the disclosure, a plurality of pieces of multimedia information can be acquired from a predetermined database based on the target task, and a plurality of sample data can be obtained by adding labels to the multimedia information based on the target task. Then, based on a predetermined ratio, the plurality of sample data is divided into a training sample subset and a test sample subset. For example, 10,000 collected road images may be acquired from an image library, and a label indicating the vehicle contained in the image may be added to each image. Then, 8,000 of the 10,000 images are used as sample data in the training sample subset, and the remaining 2,000 images are used as sample data in the test sample subset.
According to an embodiment of the present disclosure, the plurality of sample data in the training sample subset may also be divided into a plurality of training groups, each training group containing approximately the same amount of sample data. A plurality of training periods are then completed based on the plurality of training groups, respectively. The training of a single training period will be described in detail below in conjunction with operations S220 to S250.
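The following sketch illustrates this data preparation; the names, the 80/20 ratio and the number of training groups are examples only and not mandated by the disclosure.

```python
import random

def prepare_sample_data(labelled_samples, train_ratio=0.8, num_groups=10, seed=0):
    """Split labelled sample data into a training sample subset and a test sample
    subset, then divide the training subset into roughly equal training groups."""
    rng = random.Random(seed)
    shuffled = list(labelled_samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)          # e.g. 8,000 of 10,000 images
    train_subset, test_subset = shuffled[:cut], shuffled[cut:]
    training_groups = [train_subset[i::num_groups] for i in range(num_groups)]
    return training_groups, test_subset
```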
In operation S220, parameters of each model in the neural network model group are adjusted based on the training sample subset, and m adjusted models are obtained. Wherein m is a natural number greater than 1.
For example, a plurality of sample data of any one of a plurality of training sets may be sequentially input to each of the m neural network models. For each model, a plurality of predicted results sequentially corresponding to a plurality of sample data may be output via each model. The parameters of each model are adjusted based on differences between the plurality of predicted results and a plurality of actual results indicated by the tags of the plurality of sample data.
For example, in an object detection scenario, after a plurality of sample data are sequentially input into each model, a predicted position of an object and a predicted probability of the object for a predetermined category may be output by each model. The label of each sample data in the plurality of sample data indicates an actual location and an actual category of the object in the each sample data. Based on the predicted position, the predicted probability, the actual position, and the actual class, a value of a predetermined loss function (e.g., represented by a weighted sum of the classification loss function and the regression loss function) may be derived. And then, adjusting the parameters of each model based on the value of the predetermined loss function through a back propagation algorithm or a gradient descent algorithm to obtain an adjusted model. The adjusted m models can be obtained by adjusting the parameters of each model in the m models.
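A minimal PyTorch-style sketch of one such adjustment pass is shown below; the use of torch, the loss weights and the assumption that the model returns class logits and a predicted position are all illustrative, the weighted sum of a classification loss and a regression loss standing in for the predetermined loss function.

```python
import torch
import torch.nn.functional as F

def adjust_model(model, optimizer, training_group, cls_weight=1.0, reg_weight=1.0):
    """One adjustment pass of a single model over one training group for a
    detection-style task."""
    model.train()
    for image, actual_class, actual_position in training_group:
        pred_logits, pred_position = model(image)       # predicted probability and position
        cls_loss = F.cross_entropy(pred_logits, actual_class)
        reg_loss = F.smooth_l1_loss(pred_position, actual_position)
        loss = cls_weight * cls_loss + reg_weight * reg_loss
        optimizer.zero_grad()
        loss.backward()                                  # back propagation
        optimizer.step()                                 # parameter update (e.g. gradient descent)
    return model
```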
In operation S230, n first models with higher accuracy are selected from the adjusted m models based on the test sample subset. Wherein n is a natural number greater than 1, and m is greater than n.
According to the embodiment of the disclosure, a plurality of sample data in a test sample subset can be divided into a plurality of test groups, one accuracy index of a model is obtained based on each test group, and finally, an average value of a plurality of accuracy indexes obtained based on the plurality of test groups is used as the accuracy of the model.
For example, the plurality of sample data in each test group may be respectively input into each of the adjusted m models, and a prediction result may be output by each adjusted model. The prediction result is similar to the prediction result output by each of the m models described above. Then, the prediction results of each adjusted model and the actual results indicated by the labels of the plurality of sample data are counted, and the precision index of each adjusted model based on each test group is determined. The precision index may be represented by, for example, the precision, the recall rate, or a weighted sum of the precision and the recall rate, etc. When the weighted sum of the precision and the recall rate is calculated, the respective weights of the precision and the recall rate can be set according to actual requirements, which is not limited by the disclosure.
For example, for the target detection task, if, of the prediction probabilities output by each adjusted model based on input certain sample data, the maximum probability is for the first class in the predetermined classes, and the actual class indicated by the label of the certain sample data is also the first class, it is determined that the prediction result of each adjusted model for the certain sample data is correct. And if the actual class indicated by the label of the certain sample data is the second class in the preset classes, determining that the prediction result of each adjusted model on the certain sample data is wrong. The embodiment can obtain the accuracy of the prediction result of each adjusted model on the sample data in each test group.
Taking a target detection task of detecting mud-headed vehicles (muck trucks) as an example, referring to Table 1 below: if the predetermined class with the maximum probability is mud-headed vehicle (i.e., the prediction is True) but the actual class indicated by the label is non-mud-headed vehicle (i.e., the actual value is False), then for each adjusted model the sample data is a negative sample that the model predicted incorrectly, i.e., a False Positive (FP) for that model. Similarly, if the predetermined class with the maximum probability is mud-headed vehicle (predicted True) and the actual class indicated by the label is also mud-headed vehicle (actual True), the sample data is a positive sample that the model predicted correctly, i.e., a True Positive (TP). If the predetermined class with the maximum probability is non-mud-headed vehicle (predicted False) but the actual class indicated by the label is mud-headed vehicle (actual True), the sample data is a positive sample that the model predicted incorrectly, i.e., a False Negative (FN). If the predetermined class with the maximum probability is non-mud-headed vehicle (predicted False) and the actual class indicated by the label is also non-mud-headed vehicle (actual False), the sample data is a negative sample that the model predicted correctly, i.e., a True Negative (TN). The precision and the recall rate can be obtained by counting the numbers of true positive, true negative, false positive and false negative samples in each test group.
TABLE 1

                   Predicted True    Predicted False
    Actual True    TP                FN
    Actual False   FP                TN
For example, the accuracy P = (number of TPs + number of TNs) / (number of TPs + number of TNs + number of FPs + number of FNs), and the recall rate R = number of TPs / (number of TPs + number of FNs).
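In code form, a direct transcription of the two formulas above:

```python
def accuracy_and_recall(tp: int, tn: int, fp: int, fn: int):
    """Accuracy index P and recall rate R computed from the counts in Table 1."""
    p = (tp + tn) / (tp + tn + fp + fn)
    r = tp / (tp + fn)
    return p, r

# Example: tp=70, tn=20, fp=5, fn=5  ->  P = 0.90, R ≈ 0.93
```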
By the above method, the precision of each of the m adjusted models can be obtained. Then, n first models with higher precision can be selected from the m models. The value of n may be 1/3, 1/2, etc. of m, which is not limited in this disclosure.
In operation S240, it is determined whether the current training period satisfies a training stop condition. If not, operation S250 is performed, and if so, operation S260 is performed.
According to an embodiment of the present disclosure, the training stop condition may be, for example, that the number of training cycles reaches a predetermined number. Operation S240 may determine whether the number of training cycles completed so far has reached the predetermined number, that is, whether the current training period is the training period corresponding to the predetermined number of training cycles. If so, it is determined that the training stop condition is met; otherwise, it is determined that the training stop condition is not met. Alternatively, the training stop condition may be that the lowest precision among the selected n first models is higher than a predetermined precision. Alternatively, the training stop condition may be that the selected n first models no longer change. For example, if the initial models of the n first models selected in a predetermined number of consecutive training cycles are all the same, it may be determined that the training stop condition is satisfied. The predetermined number may be any integer greater than 1, such as 3 or 5, which is not limited in this disclosure.
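The three alternative stop conditions above can be checked, for instance, as follows; the parameter names and the default of three consecutive cycles are illustrative assumptions.

```python
def training_should_stop(cycle, max_cycles=None,
                         selected_accuracies=None, target_accuracy=None,
                         selection_history=None, unchanged_cycles=3):
    """Return True if any of the alternative training stop conditions holds."""
    # (1) the number of training cycles reaches a predetermined number
    if max_cycles is not None and cycle >= max_cycles:
        return True
    # (2) the lowest precision among the n selected first models exceeds a threshold
    if selected_accuracies and target_accuracy is not None \
            and min(selected_accuracies) > target_accuracy:
        return True
    # (3) the same n initial models are selected for several consecutive cycles
    if selection_history and len(selection_history) >= unchanged_cycles:
        recent = selection_history[-unchanged_cycles:]
        if all(set(sel) == set(recent[0]) for sel in recent):
            return True
    return False
```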
In operation S250, the neural network model group is updated based on the n first models, and execution of the next training cycle is returned.
According to an embodiment of the present disclosure, if the training stop condition is not satisfied, (m-n) sets of parameters may be randomly generated again in a manner similar to operation S210, resulting in (m-n) initial models. The (m-n) obtained initial models and the n selected first models form a new neural network model group. When the (m-n) sets of parameters are randomly generated, a constraint condition for the random generation can be set according to the values of the parameters of the n first models. For example, the maximum and minimum values of each parameter across the n first models may be used as the upper and lower limits of the constraint condition, so that the value of the corresponding parameter in each of the randomly generated (m-n) sets of parameters lies between the maximum and the minimum.
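A sketch of this constrained re-initialization is given below, assuming, purely for illustration, that each model's parameters are represented as a flat list of floats.

```python
import random

def reinitialize_within_bounds(best_param_sets, count, seed=None):
    """Randomly generate `count` new parameter sets; each parameter value lies
    between the minimum and maximum observed for that parameter across the
    n selected first models."""
    rng = random.Random(seed)
    lows = [min(values) for values in zip(*best_param_sets)]
    highs = [max(values) for values in zip(*best_param_sets)]
    return [[rng.uniform(low, high) for low, high in zip(lows, highs)]
            for _ in range(count)]
```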
According to the embodiment of the present disclosure, an Evolutionary Algorithm (EA) may instead be adopted to process the n first models with higher precision, so as to obtain next-generation network models of the n first models. The models other than the n first models among the adjusted m models are then updated based on the next-generation network models, thereby completing the update of the neural network model group. The evolutionary algorithm may comprise, for example, at least one of a particle swarm algorithm, a genetic algorithm and a fireworks algorithm. An evolutionary algorithm imitates the rules of biological evolution (breeding, competition, reproduction and renewed competition) to realize survival of the fittest and to approach, step by step, the optimal solution of a complex engineering problem. When the n first models are processed by the evolutionary algorithm, the n first models may be used as parent models, and the obtained next-generation network models may be used as child models of the parent models.
The number of the obtained next generation network models can be (m-n), and other models are directly replaced by the next generation network models. Alternatively, the number of the obtained next generation network models may be larger than (m-n), and then the accuracy of each next generation network model is obtained based on the aforementioned method of obtaining the accuracy of each adjusted model. And finally, selecting (m-n) models with higher precision from the next generation network models to replace other models.
In operation S260, a model to perform the target task is selected from the n first models.
According to the embodiment of the present disclosure, if the training stop condition is satisfied, for example, the model with the highest precision among the n first models may be used as the model for executing the target task. Alternatively, all or some of the n first models may be used as models for executing the target task. When the target task is executed, the processing results of these multiple models are counted, and the final processing result is determined based on the statistical result.
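When several of the n first models are kept, their results can be combined, for instance, by a simple majority vote; this is one possible way to count the processing results, the disclosure not prescribing a specific aggregation.

```python
from collections import Counter

def aggregate_results(per_model_results):
    """Return the processing result produced by the largest number of models."""
    return Counter(per_model_results).most_common(1)[0][0]

# aggregate_results(["truck", "truck", "car"]) -> "truck"
```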
The embodiment of the disclosure trains the plurality of models in the neural network model group at the same time, selects models with higher precision from the plurality of models to adjust the neural network model group, and trains the models in the next period, so that evolutionary computation and deep learning can be integrated. Based on this, the idea of "survival of the fittest" can be applied to the training of the deep learning model, so that the precision of the model can be further improved through evolutionary computation even when the neural network model has converged to a locally optimal solution.
Moreover, if the evolutionary algorithm is adopted to obtain the next-generation network models, inheritance of the models at the parameter level can be fully exploited, thereby improving the training efficiency of the models. This is because the optimal model can generally be derived by mutating other models; through the evolutionary algorithm, mutated variants of the n first models with higher precision can be obtained.
The principle of updating the neural network model set using the evolutionary algorithm will be described in detail below with reference to fig. 3.
FIG. 3 is a schematic diagram of a principle of updating a set of neural network models, according to an embodiment of the present disclosure.
As shown in fig. 3, in this embodiment 300, when the n first models are processed by using the evolutionary algorithm, any one model 311 of the n first models may be copied first to obtain a copied first model 321. The parameters in the copied first model 321 are then adjusted. The adjustment strategy can be set according to actual requirements, which is not limited by the disclosure.
For example, the value of the at least one target parameter 330 in the copied first model 321 may be adjusted based on a predetermined value. Or the value of the at least one target parameter 331 in the copied first model 321 may be adjusted to a random value. Adjusted target parameters 341 are thereby obtained. The adjusted target parameters 341 are used to replace the at least one target parameter 331 in the copied first model 321, so as to obtain a next-generation network model 351. The predetermined value may be, for example, 0 or any value within a parameter value range, and the parameter value range may be set according to actual requirements, which is not limited in this disclosure.
For example, a crossover mutation strategy may be employed to derive a next-generation network model based on the n first models. As shown in fig. 3, another model 312 other than the model 311 may be selected from the n first models, then at least one target parameter 342 is randomly selected from the target parameters 332 of the other model 312, and the value of the corresponding at least one target parameter in the copied first model 321 is adjusted based on the value of the at least one target parameter 342; that is, the value of the corresponding parameter in the model 321 is replaced by the at least one target parameter 342, so as to obtain a next-generation network model 352.
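The two variation strategies of fig. 3 (perturbing copied parameters, and crossing them with another selected model) can be sketched as follows, again assuming flat parameter lists; the mutation rate and value range are illustrative assumptions.

```python
import copy
import random

def mutate(parent_params, rate=0.1, predetermined_value=0.0, rng=random):
    """Copy a selected first model's parameters and adjust a random subset of
    them, either to a predetermined value or to a random value in [-1, 1]."""
    child = copy.deepcopy(parent_params)
    for i in range(len(child)):
        if rng.random() < rate:
            child[i] = predetermined_value if rng.random() < 0.5 else rng.uniform(-1.0, 1.0)
    return child

def crossover(parent_params, other_params, rate=0.1, rng=random):
    """Copy one parent and replace a random subset of its target parameters
    with the corresponding values from another selected first model."""
    child = copy.deepcopy(parent_params)
    for i in range(len(child)):
        if rng.random() < rate:
            child[i] = other_params[i]
    return child
```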
In an embodiment, each model in the neural network model group may include a feature extraction sub-model and a task execution sub-model. The feature extraction sub-model is used for extracting features of sample data. For example, if the sample data is text data, the feature extraction sub-model may be a Bert model or the like, and the extracted features are text features. If the sample data is an image, the feature extraction sub-model may be a VGG model, and the extracted features are image features. The task execution sub-model may consist of a fully connected layer and a normalization function (e.g., a softmax function). It is to be understood that the above structures of the feature extraction sub-model and the task execution sub-model are only examples to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto.
In this case, only the parameters of the task execution submodel may be selected as the target parameters. This is because the influence of the task execution submodel on the prediction result is generally large, and the parameters of the task execution submodel easily fall into the locally optimal solution. By selecting the target parameters, the training efficiency of the neural network model can be improved.
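A minimal sketch of such a two-part model is given below, with the task execution sub-model's parameters exposed as the target parameters; the backbone, feature dimension and class count are assumptions made for illustration.

```python
import torch

class TwoPartModel(torch.nn.Module):
    """Feature extraction sub-model followed by a task execution sub-model
    (a fully connected layer plus a softmax normalization)."""

    def __init__(self, feature_extractor, feature_dim=768, num_classes=10):
        super().__init__()
        self.feature_extractor = feature_extractor      # e.g. a Bert- or VGG-style backbone
        self.task_head = torch.nn.Linear(feature_dim, num_classes)

    def forward(self, x):
        features = self.feature_extractor(x)
        return torch.softmax(self.task_head(features), dim=-1)

    def target_parameters(self):
        # only the task execution sub-model's parameters are treated as target parameters
        return list(self.task_head.parameters())
```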
According to an embodiment of the present disclosure, the neural network model may be trained based on offline data. After training a model to perform a target task, the model may be used to perform an online task. In the process of executing the on-line task, the model for executing the on-line task can be subjected to incremental training by using data generated by the on-line task, so that the stability of the model is improved, and the precision of the model is further improved. The principle of the training method of the neural network model in this manner will be described in detail below with reference to fig. 4.
Fig. 4 is a schematic diagram of a method of training a neural network model according to an embodiment of the present disclosure.
As shown in FIG. 4, the embodiment 400 may retrieve a first sample dataset 420 from an offline database 410 according to a target task when determining a neural network model group and the first sample dataset according to the target task. Each model in the neural network model group 430 is trained using the obtained first sample dataset 420, resulting in m trained models 440. A model for performing the target task may then be obtained as the on-line model 441 using a similar method as described above. The online model 441 may then be used to process the online generated multimedia information 450 to obtain a prediction 460.
The embodiment may also generate a second sample data set based on a first amount of data produced by the on-line task of the target task. That is, the multimedia information 450 generated on line is collected to obtain a first amount of multimedia information, and a tag 470, input by the user and indicating the actual processing result of the multimedia information, is added to the multimedia information, resulting in a second sample data set 480 consisting of the first amount of tagged multimedia information. Subsequently, the previously obtained m trained models 440 may be incrementally trained based on the second sample data set 480, using the training method described above for each training period. It should be noted that, in the incremental training, it is not necessary to determine whether the current training period satisfies the stop condition; when a model for executing the target task is selected from the n second models with higher accuracy in the current neural network model group, the current neural network model group needs to be updated based on the n second models with higher accuracy. The updating method is similar to the method, described above, of updating the neural network model group based on the n first models.
For example, any one of the n second models may be copied first to obtain a third model. The values of the target parameters in the third model are then adjusted.
For example, in the method described above for updating the neural network model group based on the n first models, the values of a first predetermined proportion of the target parameters in the copied first model may be adjusted to the predetermined value described above. Alternatively, the values of all the target parameters in the copied first model may be adjusted to the predetermined value. When the current neural network model group is updated based on the n second models, the values of a second predetermined proportion of the target parameters in the third model may be adjusted to the predetermined value, the second predetermined proportion being smaller than the first predetermined proportion. By setting the second predetermined proportion, the situation in which parameters that have learned more information are discarded due to model mutation can be avoided. It is understood that the target parameter may be a learning rate, a regularization parameter, etc., which is not limited by this disclosure.
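A sketch of this gentler mutation used during incremental training follows; the 5% proportion is an assumption, the only requirement stated above being that it is smaller than the proportion used during off-line training.

```python
import copy
import random

def mutate_for_incremental_training(second_model_params, proportion=0.05,
                                    predetermined_value=0.0, seed=None):
    """Copy one of the n second models (yielding the third model) and reset only
    a small proportion of its target parameters, preserving most of what was
    learned from the on-line data."""
    rng = random.Random(seed)
    third_model = copy.deepcopy(second_model_params)
    indices = list(range(len(third_model)))
    rng.shuffle(indices)
    for i in indices[:int(len(indices) * proportion)]:
        third_model[i] = predetermined_value
    return third_model
```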
Based on the training method of the neural network model provided by the disclosure, the disclosure also provides a multimedia information processing method. This method will be described in detail below with reference to fig. 5.
Fig. 5 is a flowchart illustrating a method for processing multimedia information according to an embodiment of the disclosure.
As shown in fig. 5, the method 500 of this embodiment may include operations S510 to S520.
In operation S510, the multimedia information is input into the neural network model, resulting in output data.
According to an embodiment of the present disclosure, the output data may include a probability that the multimedia information belongs to each of the predetermined categories. The neural network model is obtained by training using the training method of the neural network model described above. The output data is similar to the prediction results described above and will not be described herein.
The multimedia information may be a text or an image described above, and is not described herein again. And when the neural network model is trained, the target task is a classification task of the multimedia information.
In operation S520, a category of the multimedia information is determined based on the output data.
According to the embodiment of the present disclosure, the category corresponding to the maximum probability in the output data may be used as the category of the multimedia information.
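A minimal inference sketch for operations S510 to S520 is shown below, assuming (for illustration only) a PyTorch-style trained model that returns one probability per predetermined category.

```python
import torch

def classify_multimedia(model, multimedia_input, class_names):
    """Feed the multimedia information into the trained neural network model and
    return the category with the maximum probability in the output data."""
    model.eval()
    with torch.no_grad():
        output_data = model(multimedia_input)       # probabilities over predetermined categories
    return class_names[int(output_data.argmax(dim=-1))]
```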
Based on the training method of the neural network model provided by the disclosure, the disclosure also provides a training device of the neural network model. The apparatus will be described in detail below with reference to fig. 6.
Fig. 6 is a block diagram of a structure of a training apparatus of a neural network model according to an embodiment of the present disclosure.
As shown in fig. 6, the training apparatus 600 for neural network model of this embodiment may include a data determination module 610, a model adjustment module 620, a first model selection module 630, a first model group update module 640, and a second model selection module 650.
The data determination module 610 is configured to determine a neural network model group and a first sample dataset from the target task. The set of neural network models includes m models having initial parameters different from each other, and the first sample data set includes a training sample subset and a test sample subset. In an embodiment, the data determining module 610 may be configured to perform the operation S210 described above, which is not described herein again.
The model adjusting module 620 is configured to adjust parameters of each model in the current neural network model group based on the training sample subset in the current training period, so as to obtain m adjusted models. Wherein m is a natural number greater than 1. In an embodiment, the model adjusting module 620 may be configured to perform the operation S220 described above, which is not described herein again.
The first model selecting module 630 is configured to select n first models with higher accuracy from the adjusted m models based on the test sample subset. Wherein n is a natural number greater than 1, and m is greater than n. In an embodiment, the first model selecting module 630 may be configured to perform the operation S230 described above, which is not described herein again.
The first model group updating module 640 is configured to update the current neural network model group based on the n first models and return to execute the next training cycle when the current training cycle does not satisfy the training stop condition. In an embodiment, the first model group updating module 640 may be configured to perform the operation S250 described above, and is not described herein again.
The second model selection module 650 is configured to select a model for executing the target task from the n first models if the current training period satisfies the training stop condition. In an embodiment, the second model selecting module 650 may be configured to perform the operation S260 described above, which is not described herein again.
According to an embodiment of the present disclosure, the first model group update module includes a model processing sub-module and a model update sub-module. The model processing submodule is used for processing the n first models by adopting an evolutionary algorithm to obtain a next generation network model of the n first models. And the model updating submodule is used for updating other models except the n first models in the adjusted m models based on the next generation network model.
According to an embodiment of the present disclosure, the model processing submodule includes a model copying unit and a numerical value adjusting unit. The model copying unit is used for copying any one of the n first models. The value adjusting unit is used for adjusting the value of the target parameter in the copied first model by adopting at least one of the following modes: adjusting the value of at least one target parameter in the copied first model based on a preset value; adjusting the numerical value of at least one target parameter in the copied first model into a random value; and adjusting the value of at least one target parameter in the copied first model based on the value of at least one target parameter of the other models except any one model in the n first models.
According to the embodiment of the disclosure, each model in the neural network model group comprises a feature extraction submodel and a task execution submodel, and the target parameters are parameters included in the task execution submodel.
According to an embodiment of the present disclosure, the data determination module 610 is configured to obtain a first sample data set from an offline database according to a target task. The adjusting the value of at least one target parameter in the copied first model based on the predetermined value may include any one of: adjusting the numerical value of the target parameter with the first preset proportion in the copied first model to a preset value; and adjusting the numerical values of all the target parameters in the copied first model to preset values.
According to an embodiment of the present disclosure, the apparatus 600 may further include a data set generation module, an incremental training module, and a second model group update module. The data set generation module is configured to, after the second model selection module 650 selects a model from the n first models to perform the target task: a second set of sample data is generated based on a first predetermined amount of data produced by the on-line task of the target task. The incremental training module is used for carrying out incremental training on the model for executing the target task based on the second sample data set. And the second model group updating module is used for updating the current neural network model group based on n second models with higher precision in the current neural network model group.
According to an embodiment of the present disclosure, the second model group update module includes a replication submodule and an adjustment submodule. And the replication submodule is used for replicating any one model of the n second models to obtain a third model. The adjusting submodule is used for adjusting the value of the target parameter of the second preset proportion in the third model to a preset value. Wherein the second predetermined ratio is less than the first predetermined ratio.
According to an embodiment of the present disclosure, the apparatus further includes a training stop determining module, configured to determine that the current training period satisfies a training stop condition by any one of the following manners: determining that the current training period meets a training stop condition under the condition that the current training period is a training period with preset training times; and under the condition that the initial models of the n first models selected in the second preset number of training cycles are the same, determining that the current training cycle meets the training stop condition.
Based on the multimedia information processing method provided by the disclosure, the disclosure also provides a multimedia information processing device. The apparatus will be described in detail below with reference to fig. 7.
As shown in fig. 7, the apparatus 700 of this embodiment may include a data obtaining module 710 and a category determining module 720.
The data obtaining module 710 is configured to input the multimedia information into the neural network model to obtain output data. The neural network model is obtained by training the training device of the neural network model described above, and the target task includes a multimedia information classification task. In an embodiment, the data obtaining module 710 is configured to perform the operation S510 described above, which is not described herein again.
The category determination module 720 is configured to determine a category of the multimedia information based on the output data. In an embodiment, the category determining module 720 is configured to perform the operation S520 described above, which is not described herein again.
In the technical scheme of the present disclosure, the processes of collecting, storing, using, processing, transmitting, providing, disclosing and the like of the personal information of the related users all conform to the regulations of the related laws and regulations, and do not violate the good custom of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement the training methods of neural network models and/or the processing methods of multimedia information of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The calculation unit 801 performs the various methods and processes described above, such as a training method of a neural network model and/or a processing method of multimedia information. For example, in some embodiments, the training method of the neural network model and/or the processing method of the multimedia information may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto device 800 via ROM 802 and/or communications unit 809. When loaded into RAM 803 and executed by the computing unit 801, a computer program may perform one or more steps of the method of training a neural network model and/or the method of processing multimedia information described above. Alternatively, in other embodiments, the computing unit 801 may be configured by any other suitable means (e.g., by means of firmware) to perform the training method of the neural network model and/or the processing method of the multimedia information.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server may be a cloud Server, which is also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service extensibility in a traditional physical host and a VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A method of training a neural network model, comprising:
determining a neural network model group and a first sample data set according to a target task, wherein the neural network model group comprises m models with different initial parameters, and the first sample data set comprises a training sample subset and a test sample subset;
in the current training period:
adjusting parameters of each model in the current neural network model group based on the training sample subset to obtain m adjusted models;
based on the test sample subset, selecting n first models with higher precision from the adjusted m models;
under the condition that the current training period does not meet the training stopping condition, updating the current neural network model group based on the n first models, and returning to execute the next training period; and
selecting a model for executing the target task from the n first models in a case where the current training period satisfies a training stop condition,
wherein m and n are both natural numbers greater than 1, and m is greater than n.
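For readers who want a concrete picture of the loop recited in claim 1, the following is a minimal, non-authoritative Python sketch assuming PyTorch-style models and data loaders. The names build_model, evaluate, make_offspring and stop_condition are illustrative only (the latter two are sketched after claims 3 and 8) and are not taken from the disclosure.

```python
import torch


def train_model_group(build_model, train_loader, test_loader,
                      m=8, n=2, max_periods=20, lr=1e-3):
    """Population-based loop of claim 1: keep m models, select the n most accurate."""
    group = [build_model() for _ in range(m)]   # m models with different initial parameters
    history = []                                # selection history used by the stop condition

    for period in range(max_periods):
        # Adjust the parameters of every model on the training sample subset.
        for model in group:
            optimizer = torch.optim.SGD(model.parameters(), lr=lr)
            model.train()
            for x, y in train_loader:
                optimizer.zero_grad()
                torch.nn.functional.cross_entropy(model(x), y).backward()
                optimizer.step()

        # Rank the m adjusted models by accuracy on the test sample subset.
        ranked = sorted(group, key=lambda mdl: evaluate(mdl, test_loader), reverse=True)
        best_n = ranked[:n]                     # the n "first models"

        if stop_condition(period, best_n, history, max_periods):
            return best_n[0]                    # model selected to execute the target task

        # Otherwise replace the other m - n models with perturbed copies of the survivors.
        offspring = [make_offspring(best_n[i % n], best_n) for i in range(m - n)]
        group = best_n + offspring

    return best_n[0]


def evaluate(model, test_loader):
    """Accuracy of one model on the test sample subset."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in test_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / max(total, 1)
```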
2. The method of claim 1, wherein the updating the current neural network model set based on the n first models in the case that the current training period does not satisfy a training stop condition comprises:
processing the n first models by adopting an evolutionary algorithm to obtain a next generation network model of the n first models; and
updating, based on the next generation network model, the models other than the n first models among the m adjusted models.
3. The method of claim 2, wherein the processing the n first models using an evolutionary algorithm to obtain a next generation network model of the n first models comprises: copying any one of the n first models, and adjusting the value of a target parameter in the copied first model in at least one of the following manners:
adjusting the value of at least one target parameter in the copied first model based on a predetermined value;
adjusting the value of at least one target parameter in the copied first model to a random value; and
adjusting the value of at least one target parameter in the copied first model based on the value of at least one target parameter of a model, other than the copied model, among the n first models.
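The three adjustment modes of claim 3 can be pictured with the hypothetical helper below. It assumes the task execution sub-model of claim 4 is exposed as a model.head sub-module, that 0.0 stands in for the predetermined value, and that one mode is applied per offspring; none of these details are fixed by the disclosure.

```python
import copy
import random
import torch


def make_offspring(parent, peers, predetermined_value=0.0):
    """Copy one selected first model and perturb the target parameters of the copy."""
    child = copy.deepcopy(parent)
    donors = [p for p in peers if p is not parent] or [parent]
    mode = random.choice(["preset", "random", "peer"])       # one of the three claimed modes

    with torch.no_grad():
        for name, param in child.head.named_parameters():    # task execution sub-model only
            if mode == "preset":
                param.fill_(predetermined_value)              # mode 1: a predetermined value
            elif mode == "random":
                param.copy_(torch.randn_like(param))          # mode 2: a random value
            else:
                donor = random.choice(donors)                 # mode 3: value from another first model
                param.copy_(dict(donor.head.named_parameters())[name])
    return child
```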
4. The method of claim 3, wherein each model of the neural network model group includes a feature extraction sub-model and a task execution sub-model; and the target parameters are parameters included in the task execution sub-model.
5. The method of claim 3 or 4, wherein:
the determining the first sample data set according to the target task comprises: acquiring the first sample data set from an offline database according to the target task;
the adjusting the value of at least one target parameter in the copied first model based on a predetermined value comprises any one of:
adjusting the values of a first predetermined proportion of the target parameters in the copied first model to the predetermined value; and
adjusting the values of all the target parameters in the copied first model to the predetermined value.
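Under the same illustrative assumptions as the previous sketch, the "first predetermined proportion" option of claim 5 could look like the following; setting proportion to 1.0 corresponds to adjusting all of the target parameters.

```python
import torch


def reset_proportion(model, proportion=0.2, predetermined_value=0.0):
    """Set a given fraction of each target-parameter tensor to the predetermined value."""
    with torch.no_grad():
        for param in model.head.parameters():            # target parameters (task execution sub-model)
            mask = torch.rand_like(param) < proportion    # proportion = 1.0 adjusts every value
            param[mask] = predetermined_value
    return model
```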
6. The method of claim 5, further comprising, after selecting the model from the n first models to perform the target task:
generating a second sample data set based on a first predetermined amount of data produced when the target task runs online;
performing incremental training on a model executing the target task based on the second sample data set; and
updating the current neural network model group based on n second models with higher precision in the current neural network model group.
7. The method of claim 6, wherein the updating the current neural network model set based on the n second models of the current neural network model set with higher precision comprises:
copying any one of the n second models to obtain a third model; and
adjusting the values of a second predetermined proportion of the target parameters in the third model to the predetermined value,
wherein the second predetermined ratio is less than the first predetermined ratio.
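Claims 6 and 7 describe an online stage: incremental training of the deployed model on data produced by the online task, followed by a lighter refresh of the model group. A hypothetical sketch, reusing the evaluate and reset_proportion helpers from the earlier sketches, might read as follows; the learning rate, the batch handling, and the choice of which second model to copy are illustrative assumptions.

```python
import copy
import torch


def incremental_update(deployed, group, online_loader, test_loader,
                       n=2, second_proportion=0.05, predetermined_value=0.0, lr=1e-4):
    """Online stage of claims 6-7: incremental training plus a light group refresh."""
    # Incrementally train the model that executes the target task on online data.
    optimizer = torch.optim.SGD(deployed.parameters(), lr=lr)
    deployed.train()
    for x, y in online_loader:
        optimizer.zero_grad()
        torch.nn.functional.cross_entropy(deployed(x), y).backward()
        optimizer.step()

    # Keep the n most accurate "second models" in the current group and spawn a
    # lightly perturbed copy of one of them (the "third model" of claim 7).
    ranked = sorted(group, key=lambda mdl: evaluate(mdl, test_loader), reverse=True)
    second_models = ranked[:n]
    third_model = reset_proportion(copy.deepcopy(second_models[0]),
                                   proportion=second_proportion,   # smaller than the offline proportion
                                   predetermined_value=predetermined_value)
    return deployed, second_models + [third_model]
```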
8. The method of claim 1, further comprising determining that the current training period satisfies the training stop condition by any one of:
determining that the current training period satisfies the training stop condition in a case where the current training period reaches a predetermined training number; and
determining that the current training period satisfies the training stop condition in a case where the initial models of the n first models selected in a second predetermined number of training periods are the same.
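A hypothetical way to encode the two alternative stop conditions of claim 8 is sketched below; max_periods stands for the predetermined training number, patience for the second predetermined number, and origin_id is an assumed attribute used to track which initial model a selected model descends from.

```python
def stop_condition(period, best_n, history, max_periods=20, patience=3):
    """Two alternative stop criteria, per claim 8."""
    # Condition 1: the current period reaches the predetermined training number.
    if period + 1 >= max_periods:
        return True

    # Condition 2: the same initial models were selected for `patience` periods in a
    # row. `origin_id` is an assumed per-model tag; id() is only a fallback.
    history.append(frozenset(getattr(mdl, "origin_id", id(mdl)) for mdl in best_n))
    recent = history[-patience:]
    return len(recent) == patience and len(set(recent)) == 1
```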
9. A method for processing multimedia information, comprising:
inputting multimedia information into a neural network model to obtain output data; and
determining a category of the multimedia information based on the output data,
wherein the neural network model is obtained by training according to the method of any one of claims 1-8, and the target task comprises a multimedia information classification task.
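As an illustration of the processing method of claim 9, a minimal inference sketch is given below; the class_names list, the single-sample input, and the assumption that the multimedia information has already been converted to an input tensor are all hypothetical.

```python
import torch


def classify_multimedia(model, features, class_names):
    """Feed the multimedia input to the trained model and map its output to a category."""
    model.eval()
    with torch.no_grad():
        output = model(features)                 # output data of the neural network model
        index = output.argmax(dim=-1).item()     # highest-scoring class index
    return class_names[index]


# e.g. category = classify_multimedia(best_model, feature_tensor, ["news", "sports", "ads"])
```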
10. An apparatus for training a neural network model, comprising:
a data determination module, configured to determine a neural network model group and a first sample data set according to a target task, wherein the neural network model group comprises m models with different initial parameters, and the first sample data set comprises a training sample subset and a test sample subset;
a model adjusting module, configured to adjust, in a current training period, parameters of each model in the current neural network model group based on the training sample subset to obtain m adjusted models;
a first model selection module, configured to select, based on the test sample subset, n first models with higher accuracy from the adjusted m models;
a first model group updating module, configured to update the current neural network model group based on the n first models and return to execute a next training cycle when the current training cycle does not meet a training stop condition; and
a second model selection module for selecting a model for executing the target task from the n first models if the current training period satisfies a training stop condition,
wherein m and n are both natural numbers greater than 1, and m is greater than n.
11. The apparatus of claim 10, wherein the first model group update module comprises:
a model processing sub-module, configured to process the n first models using an evolutionary algorithm to obtain a next generation network model of the n first models; and
a model updating sub-module, configured to update, based on the next generation network model, the models other than the n first models among the m adjusted models.
12. The apparatus of claim 11, wherein the model processing sub-module comprises:
a model copying unit, configured to copy any one of the n first models; and
a value adjusting unit, configured to adjust the value of a target parameter in the copied first model in at least one of the following manners:
adjusting the value of at least one target parameter in the copied first model based on a predetermined value;
adjusting the value of at least one target parameter in the copied first model to a random value; and
adjusting the value of at least one target parameter in the copied first model based on the value of at least one target parameter of a model, other than the copied model, among the n first models.
13. The apparatus of claim 12, wherein each model of the neural network model group includes a feature extraction sub-model and a task execution sub-model; and the target parameters are parameters included in the task execution sub-model.
14. The apparatus of claim 12 or 13, wherein:
the data determination module is configured to acquire the first sample data set from an offline database according to the target task;
the adjusting the value of at least one target parameter in the copied first model based on a predetermined value comprises any one of:
adjusting the values of a first predetermined proportion of the target parameters in the copied first model to the predetermined value; and
adjusting the values of all the target parameters in the copied first model to the predetermined value.
15. The apparatus of claim 14, further comprising:
a dataset generation module, configured to generate, after the second model selection module selects the model for executing the target task from the n first models, a second sample data set based on a first predetermined amount of data produced when the target task runs online;
an incremental training module for performing incremental training on a model executing the target task based on the second sample data set; and
a second model group updating module, configured to update the current neural network model group based on n second models with higher precision in the current neural network model group.
16. The apparatus of claim 15, wherein the second model group update module comprises:
a replication sub-module, configured to replicate any one of the n second models to obtain a third model; and
an adjusting sub-module, configured to adjust the values of a second predetermined proportion of the target parameters in the third model to the predetermined value,
wherein the second predetermined ratio is less than the first predetermined ratio.
17. The apparatus of claim 10, further comprising a training stop determination module, configured to determine that the current training period satisfies the training stop condition in any one of the following manners:
determining that the current training period satisfies the training stop condition in a case where the current training period reaches a predetermined training number; and
determining that the current training period satisfies the training stop condition in a case where the initial models of the n first models selected in a second predetermined number of training periods are the same.
18. An apparatus for processing multimedia information, comprising:
a data acquisition module, configured to input multimedia information into a neural network model to obtain output data; and
a category determination module for determining a category of the multimedia information based on the output data,
wherein the neural network model is obtained by training with the apparatus of any one of claims 10-17, and the target task comprises a multimedia information classification task.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 9.
CN202110905430.8A 2021-08-06 2021-08-06 Training method of neural network model and processing method and device of multimedia information Pending CN113642727A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110905430.8A CN113642727A (en) 2021-08-06 2021-08-06 Training method of neural network model and processing method and device of multimedia information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110905430.8A CN113642727A (en) 2021-08-06 2021-08-06 Training method of neural network model and processing method and device of multimedia information

Publications (1)

Publication Number Publication Date
CN113642727A (en) 2021-11-12

Family

ID=78420061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110905430.8A Pending CN113642727A (en) 2021-08-06 2021-08-06 Training method of neural network model and processing method and device of multimedia information

Country Status (1)

Country Link
CN (1) CN113642727A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190355344A1 (en) * 2018-05-18 2019-11-21 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating text-to-speech model
WO2019242222A1 (en) * 2018-06-21 2019-12-26 北京字节跳动网络技术有限公司 Method and device for use in generating information
CN111046959A (en) * 2019-12-12 2020-04-21 上海眼控科技股份有限公司 Model training method, device, equipment and storage medium
CN112784985A (en) * 2021-01-29 2021-05-11 北京百度网讯科技有限公司 Training method and device of neural network model, and image recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DU LAN; LIU BIN; WANG YAN; LIU HONGWEI; DAI HUI: "SAR Image Target Detection Algorithm Based on Convolutional Neural Network", Journal of Electronics &amp; Information Technology, vol. 38, no. 12

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114444718A (en) * 2022-01-26 2022-05-06 北京百度网讯科技有限公司 Training method of machine learning model, signal control method and device
CN114444718B (en) * 2022-01-26 2023-03-24 北京百度网讯科技有限公司 Training method of machine learning model, signal control method and device
CN116186534A (en) * 2022-12-23 2023-05-30 北京百度网讯科技有限公司 Pre-training model updating method and device and electronic equipment
CN116186534B (en) * 2022-12-23 2024-02-23 北京百度网讯科技有限公司 Pre-training model updating method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN113822494B (en) Risk prediction method, device, equipment and storage medium
EP3467723B1 (en) Machine learning based network model construction method and apparatus
CN109408731B (en) Multi-target recommendation method, multi-target recommendation model generation method and device
US20230102337A1 (en) Method and apparatus for training recommendation model, computer device, and storage medium
CN112270379A (en) Training method of classification model, sample classification method, device and equipment
CN111753914A (en) Model optimization method and device, electronic equipment and storage medium
JP2022033695A (en) Method, device for generating model, electronic apparatus, storage medium and computer program product
CN110674636B (en) Power consumption behavior analysis method
CN111967971A (en) Bank client data processing method and device
CN113642727A (en) Training method of neural network model and processing method and device of multimedia information
CN112560985A (en) Neural network searching method and device and electronic equipment
CN111667056A (en) Method and apparatus for searching model structure
JP6172317B2 (en) Method and apparatus for mixed model selection
CN115294397A (en) Classification task post-processing method, device, equipment and storage medium
CN115062718A (en) Language model training method and device, electronic equipment and storage medium
CN114037059A (en) Pre-training model, model generation method, data processing method and data processing device
CN116127376A (en) Model training method, data classification and classification method, device, equipment and medium
CN114090601B (en) Data screening method, device, equipment and storage medium
CN113392920B (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
CN114610953A (en) Data classification method, device, equipment and storage medium
CN113806541A (en) Emotion classification method and emotion classification model training method and device
CN114611609A (en) Graph network model node classification method, device, equipment and storage medium
CN113822112A (en) Method and apparatus for determining label weights
US11830081B2 (en) Automated return evaluation with anomoly detection
CN114241243B (en) Training method and device for image classification model, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination