CN109376267B

CN109376267B - Method and apparatus for generating a model

Info

Publication number: CN109376267B
Application number: CN201811273684.7A
Authority: CN
Inventors: 袁泽寰; 王长虎
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Douyin Vision Co Ltd; Douyin Vision Beijing Co Ltd
Priority date: 2018-10-30
Filing date: 2018-10-30
Publication date: 2020-11-13
Anticipated expiration: 2038-10-30
Also published as: CN109376267A

Abstract

The embodiment of the application discloses a method and a device for generating a model. One embodiment of the method comprises: obtaining a sample set; extracting part of samples in the sample set to form a subset, and executing the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the labeling information carried by the samples in the subset; selecting loss values of a target number according to the sequence of the loss values from small to large, and determining the average value of the selected loss values as a target loss value; determining whether the initial model is trained completely based on the target loss value; and if so, determining the trained initial model as the target model. This embodiment improves the accuracy of the generated model.

Description

Method and apparatus for generating a model

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a method and a device for generating a model.

Background

In the field of machine learning, model training using a sample set is often required. However, in the sample set used for model training, many samples have noise. As an example, when training a model for detecting video categories (e.g., classified as high-quality video category, low-quality video category), the sample set used is typically based on the number of clicks. Videos with high click volumes are generally labeled as high quality videos, and videos with low click volumes are generally labeled as low quality videos. However, there are also some cases where low quality video is marked as high quality video due to a large number of clicks. For example, publishers of low quality videos have more attendees, resulting in a greater number of video clicks. Meanwhile, there are some cases where high quality video is marked as low quality video due to a small number of clicks. For example, high quality video is not pushed due to a push system failure.

The correlation method usually does not consider sample noise, directly utilizes the sample set, and utilizes the supervised learning method to carry out model training.

Disclosure of Invention

The embodiment of the application provides a method and a device for generating a model.

In a first aspect, an embodiment of the present application provides a method for generating a model, where the method includes: acquiring a sample set, wherein the samples in the sample set are provided with marking information; extracting a part of samples in the sample set to form a subset, and executing the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the labeling information carried by the samples in the subset; selecting loss values of a target number according to the sequence of the loss values from small to large, and determining the average value of the selected loss values as the target loss value, wherein the target number is smaller than the number of samples in the subset; determining whether the initial model is trained based on the target loss value; and if so, determining the trained initial model as the target model.

In some embodiments, the method further comprises: in response to determining that the initial model is not trained, updating parameters in the initial model based on the target loss value, and determining whether samples which do not perform the training step exist in the sample set; in response to determining that there is a sample composition subset extracted from the samples for which the training step was not performed, the training step is continued using the initial model with updated parameters as the initial model.

In some embodiments, the method further comprises: in response to determining that there are no samples in the sample set for which training steps are not performed, determining whether the target number is less than a preset number; and in response to the fact that the target quantity is smaller than the preset value, taking the sum of the target quantity and the specified value as the target quantity, taking the initial model after the parameters are updated as the initial model, re-extracting part of sample composition subsets in the sample set, and continuing to execute the training step.

In some embodiments, the method further comprises: and in response to the fact that the target quantity is not smaller than the preset value, using the initial model with the updated parameters as the initial model, re-extracting part of the sample composition subsets in the sample set, and continuing to execute the training step.

In some embodiments, the initial value of the target number is one-half of the number of samples in the initially composed subset.

In some embodiments, the initial model is obtained by: and training to obtain an initial model by using a machine learning method and taking the samples in the sample set as input and the label information of the input samples as output.

In some embodiments, the samples in the sample set are sample videos, the labeling information carried by the samples is used for indicating the category of the sample videos, and the target model is a video category detection model for detecting the category of the videos.

In a second aspect, an embodiment of the present application provides an apparatus for generating a model, where the apparatus includes: the acquisition unit is configured to acquire a sample set, wherein the samples in the sample set are provided with marking information; a training unit configured to extract a part of the sample composition subsets in the sample set, and perform the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the labeling information carried by the samples in the subset; selecting loss values of a target number according to the sequence of the loss values from small to large, and determining the average value of the selected loss values as the target loss value, wherein the target number is smaller than the number of samples in the subset; determining whether the initial model is trained based on the target loss value; and if so, determining the trained initial model as the target model.

In some embodiments, the apparatus further comprises: a first determination unit configured to update parameters in the initial model based on the target loss value in response to determining that the initial model is not trained, determine whether there are samples in the sample set for which training steps are not performed; and a first execution unit configured to extract a sample composition subset from samples for which the training step is not performed in response to the determination of existence, and continue performing the training step using the initial model after updating the parameters as the initial model.

In some embodiments, the apparatus further comprises: a second determination unit configured to determine whether the target number is smaller than a preset value in response to determining that there is no sample in the sample set for which the training step is not performed; and the second execution unit is configured to respond to the fact that the target quantity is smaller than the preset value, take the sum of the target quantity and the specified value as the target quantity, use the initial model after the parameters are updated as the initial model, re-extract a part of sample composition subsets in the sample set, and continue to execute the training step.

In some embodiments, the apparatus further comprises: and the third execution unit is configured to, in response to the fact that the target quantity is not smaller than the preset value, use the initial model with the updated parameters as the initial model, re-extract a part of the sample composition subsets in the sample set, and continue to execute the training step.

In a third aspect, an embodiment of the present application provides a method for detecting a video category, including: receiving a target video; the frames in the target video are input into the video category detection model generated by the method described in the embodiment of the first aspect, and the video category detection result is obtained.

In a fourth aspect, an embodiment of the present application provides an apparatus for detecting a video category, including: a receiving unit configured to receive a target video; an input unit configured to input frames in a target video into a video category detection model generated by the method described in the embodiment of the first aspect, resulting in a video category detection result.

In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; storage means having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the embodiments of the first and third aspects above.

In a sixth aspect, the present application provides a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method according to any one of the first and third aspects.

According to the method and the device for generating the model, the sample set is obtained, and the sample composition subset can be extracted from the sample set to train the initial model. Wherein, the samples in the sample set are provided with the labeling information. In this way, the samples in the extracted subset are input to the initial model, and information corresponding to each sample output by the initial model can be obtained. Then, based on the information output by the initial model and the labeling information carried by the samples in the extracted subset, the loss value of each input sample can be determined. Then, the loss values of the target number (smaller than the number of samples in the subset) may be selected in the order of the loss values from small to large, and the average of the selected loss values may be determined as the target loss value. Thereafter, it may be determined whether the initial model is trained based on the target loss value. If the initial model training is completed, the trained initial model can be determined as the target model. Because the loss value of the noise sample is usually large, the loss values of the target number (smaller than the number of samples in the subset) are selected according to the sequence from small to large of the loss value to train the initial model, so that the influence of the noise sample can be screened out, and the accuracy of the generated model is improved.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for generating a model according to the present application;

FIG. 3 is a schematic illustration of an application scenario of a method for generating a model according to the present application;

FIG. 4 is a flow diagram of yet another embodiment of a method for generating a model according to the present application;

FIG. 5 is a schematic diagram of an embodiment of an apparatus for generating a model according to the present application;

FIG. 6 is a flow diagram for one embodiment of a method for detecting video categories in accordance with the present application;

FIG. 7 is a schematic block diagram illustrating an embodiment of an apparatus for detecting video category according to the present application;

FIG. 8 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Fig. 1 illustrates an exemplary system architecture 100 to which the method for generating a model or the apparatus for generating a model of the present application may be applied.

As shown in fig. 1, the system architecture 100 may include

terminal devices

101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the

terminal devices

101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the

terminal devices

101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a video recording application, a video playing application, a voice interaction application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the

terminal devices

101, 102, and 103.

The

terminal apparatuses

101, 102, and 103 may be hardware or software. When the

terminal devices

101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the

terminal apparatuses

101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.

When the

terminal devices

101, 102, 103 are hardware, an image capturing device may be mounted thereon. The image acquisition device can be various devices capable of realizing the function of acquiring images, such as a camera, a sensor and the like. The user may capture video using an image capture device on the

terminal device

101, 102, 103.

The server 105 may be a server that provides various services, such as a data processing server for data storage and data processing. The data processing server may have a sample set stored therein. The sample set may contain a large number of samples. Wherein, the samples in the sample set may have the label information. In addition, the data processing server may train the initial model using the samples in the sample set, and may store the training results (e.g., the generated target model). In this way, the trained target model can be used to perform corresponding data processing to realize the functions supported by the target model.

The server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.

It should be noted that the method for generating the model provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for generating the model is generally disposed in the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a model according to the present application is shown. The method for generating the model comprises the following steps:

step 201, a sample set is obtained.

In the present embodiment, the execution subject of the method for generating a model (e.g., server 105 shown in fig. 1) may obtain a sample set in a variety of ways. For example, the executing entity may obtain the existing sample set stored therein from another server (e.g., a database server) for storing the samples through a wired connection or a wireless connection. As another example, a user may collect a sample via a terminal device (e.g.,

terminal devices

101, 102, 103 shown in FIG. 1). In this way, the execution entity may receive samples collected by the terminal and store the samples locally, thereby generating a sample set. It should be noted that the wireless connection means may include, but is not limited to, a 3G/4G connection, a WiFi connection, a bluetooth connection, a WiMAX connection, a Zigbee connection, a uwb (ultra wideband) connection, and other wireless connection means now known or developed in the future.

Here, the sample set may include a large number of samples. Wherein, the samples in the sample set may have the label information. It should be noted that, the samples in the sample set may be obtained according to actual requirements. For example, if a model capable of image type detection needs to be trained, the samples in the sample set may be sample images, and the annotation information may be an annotation indicating the type of the image. For another example, if a model capable of performing face detection needs to be trained, the samples in the sample set may be sample face images, and the annotation information may be position information indicating an area where a face object in the sample images is located.

In step 202, a subset of the sample composition in the sample set is extracted.

In this embodiment, the executing subject may extract a part of samples from the sample set acquired in step 201 to form a subset, and perform the training steps from step 203 to step 206. The manner of extracting the sample is not limited in this application. For example, a certain number of samples to be currently extracted may also be extracted from the sample set in a specified order.

In the field of machine learning, a subset of sample composition per extraction may be referred to as a minipatch (mini-batch). The behavior of traversing all samples in a complete set of samples can be called an epoch. By way of example, there are 128000 samples in the sample set, and 128 samples of the 128000 samples can be selected at a time to form a subset for model training. The 128000 samples in the sample set may be grouped into 1000 subsets in turn. When each subset is used, an epoch is considered to have elapsed. It should be noted that different epochs can extract different numbers of samples to form subsets. For example, the first epoch, 128 data can be extracted at a time to form a subset. The second epoch, 256 data can be extracted at a time to form a subset.

Because the number of samples in the sample set is usually large, in each training round, if all the samples in the sample set are used at one time, the time consumption is large, and the processing efficiency is low. A part of samples in the sample set are selected to form subsets, during training, each subset is subjected to gradient descent once, and finally, the samples in the sample set are traversed, so that the data volume of each iteration is small. Therefore, time consumption can be reduced and processing efficiency can be improved.

And 203, inputting the samples in the subset into the initial model, and determining the loss value of each input sample based on the information output by the initial model and the labeling information carried by the samples in the subset.

In this embodiment, the execution subject may first input the samples in the subset formed in step 202 into the initial model. The initial model can perform feature extraction, analysis and other processing on the sample, and then output information. It should be noted that the initial model may be a model that is pre-established as needed, or may be a model obtained after an existing model is initially trained as needed. For example, if a model capable of image type detection or text type detection needs to be trained, an existing classification model can be used as the initial model. By way of example, existing classification models may use convolutional neural networks of various existing structures (e.g., DenseBox, VGGNet, ResNet, SegNet, etc.). A Support Vector Machine (SVM) or the like may also be used.

After inputting the samples in the extracted subset to the initial model, the execution body may extract information output by the initial model. Wherein, each input sample can correspond to the information output by one initial model. For example, 128 samples in the subset, the initial model may output 128 pieces of information corresponding to the 128 input samples one to one.

Then, the execution agent may determine a loss value of each input sample based on information output by the initial model and label information carried by the samples in the subset. Here, the objective of training the initial model is to make the difference between the output information and the annotation information carried by the input sample as small as possible. Therefore, a value for characterizing a difference between the information output from the initial model and the annotation information can be used as the loss value. In practice, various existing loss functions (loss functions) can be used to characterize the difference between the information output by the initial model and the annotation information. For each input sample, inputting the information corresponding to the sample and the labeling information of the sample, which are output by the initial model, into a loss function, so as to obtain a loss value of the sample.

In practice, the loss function can be used to measure the degree of inconsistency between the predicted value (i.e., the output information) and the actual value (i.e., the annotation information) of the initial model. It is a non-negative real-valued function. In general, the smaller the value of the loss function (loss value), the better the robustness of the model. The loss function may be set according to actual requirements. For example, euclidean distances, cross entropy loss functions, etc. may be used.

In some optional implementation manners of this embodiment, the initial model may also be a model obtained after a pre-established model is preliminarily trained according to needs. Specifically, the initial model is obtained by the following steps: the initial model can be obtained by training using the samples in the sample set as input and the label information of the input samples as output by using a machine learning method. Here, the initial model can be trained using the existing model structure. As an example, if a model for detecting image categories needs to be trained, a convolutional neural network may be preliminarily trained by using a supervised learning manner, using a corresponding sample set (the sample may be an image, and label information of the sample may be used to indicate the category of the image). And determining the preliminarily trained convolutional neural network as an initial model. Specifically, samples may be sequentially extracted from the sample set to form subsets, and after training with the samples in each subset, the model may be updated once with a gradient descent algorithm. The executing subject may determine a model trained when the sample traversal in the sample set is completed as an initial model.

And 204, selecting loss values of the target quantity according to the sequence of the loss values from small to large, and determining the average value of the selected loss values as the target loss value.

In this embodiment, the execution body may select a target number of loss values in the order of the loss values from small to large, and determine an average value of the selected loss values as the target loss value. Wherein the target number is less than the number of samples in the subset composed in step 202. In practice, the target loss value is the extracted loss value of the subset. Different target numbers may be preset for different training rounds. In addition, the target number may also be set to a fixed value.

Because the loss value of the noise sample is usually large, the loss values of the target number (smaller than the number of samples in the subset) are selected from the small loss value to the large loss value for training the initial model, so that the influence of the noise sample can be screened out, and the accuracy of the generated model can be improved.

In some optional implementations of the present embodiment, the target number may be preset for a round of training. The larger the round, the larger the number of targets. For example, for the first round of training (i.e., the first epoch), the target number may be set to 64. For the second round of training (i.e., the second epoch), the target number may be set to 66. For the third round of training (i.e., the third epoch), the target number may be set to 68. As another example, the target number may be set to 64 for the first round of training. For the second round of training, the target number may be set to 68. For the third round of training, the target number may be set to 70.

In some optional implementations of the present embodiment, the target number may also be automatically calculated based on a preset initial value and a training turn. The larger the round, the larger the number of targets. For example, an initial value set in advance may be used as the target number used for the first round of training. Then, for each round of training thereafter, the product of the round of the previous round of training and a certain specified value may be determined, and the sum of the product and the initial value may be determined as the target number used in the round of training. For example, for the first round of training, the target number is an initial value of 64. For the second round of training, the target number was 66(64+1 × 2). The third round of training, the target number was 68(64+2 × 2). And so on.

In some alternative implementations of the present embodiment, the initial value of the target number may be one-half of the number of samples in the initially composed subset.

Since the initial model generally performs better and better (i.e., the prediction accuracy) during the initial model training process as the number of training rounds increases, the calculated loss value becomes more and more accurate. For a well-behaved model, the loss value of the noise sample is usually large. After the loss values are sorted from small to large, the noise samples are ordered later. However, when the iteration round is low, the loss value output by the initial model may not be accurate enough. Resulting in an insufficient order of noise samples. By setting the target number to a small number at this time, the loss value of the noise sample can be effectively screened out. The method is beneficial to improving the training effect of the model, and the trained model has better performance. Therefore, setting the number of targets to a value that increases as the number of training rounds increases, the accuracy of the generated model can be improved.

Step 205, determining whether the initial model is trained completely based on the target loss value.

In this embodiment, the execution subject may determine whether the initial model is trained completely in various ways based on the target loss value. As an example, the execution body described above may determine whether the target loss value has converged. When it is determined that the target loss value converges, it may be determined that the initial model at this time is trained. As yet another example, the execution subject may first compare the target loss value with a preset value. In response to determining that the target loss value is less than or equal to the preset value, the ratio of the number of target loss values less than or equal to the preset value to the preset number among the target loss values determined in the latest preset number (for example, 100) training steps may be counted. When the ratio is greater than a preset ratio (e.g., 95%), it may be determined that the initial model training is completed. It should be noted that the preset value can be generally used as an ideal case for representing the degree of inconsistency between the predicted value and the actual value. That is, when the loss value is less than or equal to the preset value, the predicted value may be considered to be close to or approximate to the true value. The preset value can be set according to actual requirements.

It is noted that in response to determining that the initial model has been trained, execution may continue to step 206. In response to determining that the initial model is not trained, parameters in the initial model may be updated based on the target loss value determined in step 204, the sample composition subset may be re-extracted from the sample set, and the training step may be continued using the initial model with updated parameters as the initial model. Here, the gradient of the target loss value with respect to the model parameters may be found using a back propagation algorithm, and then the model parameters may be updated based on the gradient using a gradient descent algorithm. It should be noted that the unselected loss values do not participate in the gradient descent. It should be noted that the back propagation algorithm, the gradient descent algorithm, and the machine learning method are well-known technologies that are currently widely researched and applied, and are not described herein again.

In some optional implementations of this embodiment, in response to determining that the initial model is not trained, the performing agent may update parameters in the initial model based on the target loss value. And, it may be determined whether there are samples in the set of samples for which the training step is not performed. It can be understood that when there are samples in the sample set for which no training step is performed, this means that the training of the current round (current epoch) is not completed, i.e. the samples in the sample set of the current round are not traversed to be completed. In this case, the sample composition subset may be extracted from samples that have not been subjected to the training step, and the training step may be continued using the initial model with updated parameters as the initial model. In general, the number of samples taken at a time (i.e., the number of samples in a subset of each component) may be the same during the same round of training (same epoch). Thus, the number of samples taken here may be the same as the number of samples taken in step 202.

Optionally, in response to determining that there are no samples in the sample set for which the training step is not performed, that is, the current round (current epoch) training is completed, and the samples in the sample set of the current round are traversed to be completed. It may be determined whether the target number is less than a preset number. As an example, if the number of samples in the subset is 128, the preset value may be set to a value smaller than the number of samples, such as 110 or 120. In response to determining that the target number is smaller than the preset value, the sum of the target number and a specified value (for example, 2) may be used as the target number, the initial model with the updated parameters is used as the initial model, a part of the sample composition subsets in the sample set is re-extracted, and the training step is continued. It is understood that, since one round of training is completed at this time, the next round of training is performed by extracting a part of the sample composition subset from the sample set again for training.

Optionally, in response to determining that the target number is not less than the preset value, the executing entity may use the initial model with updated parameters as the initial model, re-extract a part of the sample composition subsets in the sample set, and continue to execute the training step. It is understood that, since one round of training is completed at this time, the next round of training is performed by extracting a part of the sample composition subset from the sample set again for training.

And step 206, in response to determining that the training of the initial model is completed, determining the trained initial model as the target model.

In this embodiment, in response to determining that the training of the initial model is completed, the executing entity may determine the trained initial model as the target model.

In some optional implementation manners of this embodiment, the samples in the sample set are sample videos, the labeling information carried in the samples is used to indicate the type of the sample videos, and the target model is a video type detection model for detecting the type of the videos.

In some optional implementations of this embodiment, the execution subject may store the target model locally, or may send the target model to other electronic devices (for example,

terminal devices

101, 102, and 103 shown in fig. 1).

With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating a model according to the present embodiment. In the application scenario of fig. 3, a terminal device 301 used by a user may have a model training application installed thereon. After a user opens the application and uploads the sample set or the storage path of the sample set, the server 302 providing background support for the application may run a method for generating a model, including:

first, a sample set may be obtained. Wherein, the samples in the sample set are provided with marking information.

Then, a subset 303 is formed by extracting a part of samples in the sample set, and the following training steps are performed: the samples in the subset 303 are input to the initial model 304, and the loss value of each input sample is determined based on the information output by the initial model and the labeling information carried by the samples in the subset 303. Next, a target number of loss values may be selected in the order of the loss values from small to large, and the average of the selected loss values may be determined as the target loss value 305, where the target number is smaller than the number of the selected loss values. Then, it may be determined whether the initial model is trained based on the target loss value. If it is determined that the training is complete, the initial model after training may be determined to be the target model 306.

The method provided by the above embodiment of the present application may extract the sample composition subset from the sample set by obtaining the sample set, so as to perform training of the initial model. Wherein, the samples in the sample set are provided with the labeling information. In this way, the samples in the extracted subset are input to the initial model, and information corresponding to each sample output by the initial model can be obtained. Then, based on the information output by the initial model and the labeling information carried by the samples in the extracted subset, the loss value of each input sample can be determined. Then, the loss values of the target number (smaller than the number of samples in the subset) may be selected in the order of the loss values from small to large, and the average of the selected loss values may be determined as the target loss value. Thereafter, it may be determined whether the initial model is trained based on the target loss value. If the initial model training is completed, the trained initial model can be determined as the target model. Because the loss value of the noise sample is usually large, the loss values of the target number (smaller than the number of samples in the subset) are selected according to the sequence from small to large of the loss value to train the initial model, so that the influence of the noise sample can be screened out, and the accuracy of the generated model is improved.

With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for generating a model is shown. The process 400 of the method for generating a model includes the steps of:

step 401, a sample set is obtained.

In this embodiment, an executing agent of the method for generating a model (e.g., server 105 shown in FIG. 1) may obtain a sample set. Here, the sample set may include a large number of samples. Wherein, the samples in the sample set may have the label information.

In this embodiment, the samples in the sample set may be sample videos, the labeling information carried by the samples may be used to indicate the category of the sample videos,

at step 402, a subset of the sample composition in the sample set is extracted.

In this embodiment, the executing entity may extract a part of samples from the sample set acquired in step 401 to compose a subset, and execute the training steps of steps 403 to 411. The manner of extracting the sample is not limited in this application. For example, the samples currently to be extracted may be extracted from the sample set in a specified order.

And 403, inputting the samples in the subset into the initial model, and determining the loss value of each input sample based on the information output by the initial model and the extracted labeling information carried by the samples in the subset.

In this embodiment, the executing agent may first input the samples in the subset composed in step 402 into the initial model. The information output by the initial model can then be extracted. Wherein, each input sample can correspond to the information output by one initial model. The loss value for each sample entered may then be determined based on the information output by the initial model and the labeling information carried by the samples in the subset. It should be noted that the operation of calculating the loss value is substantially the same as the operation described in step 203, and is not described herein again.

In this embodiment, the initial model may be a model obtained after a pre-established model is initially trained according to needs. Specifically, the initial model is obtained by the following steps: the initial model can be obtained by training using the samples in the sample set as input and the label information of the input samples as output by using a machine learning method. Here, the initial model can be trained using the existing model structure.

In this embodiment, a model for video category detection may be trained. The execution subject can initially train the convolutional neural network by using a supervised learning mode. And determining the trained convolutional neural network as an initial model. It should be noted that, after each subset traversal is completed, the model may be updated once by using a gradient descent algorithm. The executing subject may determine a model trained when the sample traversal in the sample set is completed as an initial model.

Step 404, selecting loss values of the target number according to the sequence of the loss values from small to large, and determining the average value of the selected loss values as the target loss value.

In this embodiment, the execution body may select a target number of loss values in the order of the loss values from small to large, and determine an average value of the selected loss values as the target loss value. Wherein the target number may be less than the number of samples in the subset composed in step 402. Because the loss value of the noise sample is usually large, the loss values of the target number are selected from the small loss value to the large loss value to train the initial model, so that the influence of the noise sample can be screened out, and the accuracy of the generated model is improved.

In the present embodiment, the target number may also be automatically calculated based on a preset initial value and the number of training rounds. The larger the round, the larger the number of targets. For example, a preset initial value may be used as the target number used for the first round (i.e., the first epoch) of training. Here, the initial value of the target number may be one-half of the number of samples in the subset of the initial composition. Then, for each subsequent round of training, the sum of the target number used in the previous round of training and a specified value (e.g., 2) can be determined as the target number used in the round of training. For example, for the first round of training, the target number is an initial value of 64. The second round of training, the target number is 66(64+ 2). The third round of training, target number 68(66+ 2). And so on until the target number reaches a preset value (e.g., 110). When the target number reaches a preset value, the target number may not be updated in each subsequent training round.

Step 405, determining whether the initial model is trained completely based on the target loss value.

In this embodiment, the execution subject may determine whether the initial model is trained completely in various ways based on the target loss value. As an example, the execution body described above may determine whether the target loss value has converged. When it is determined that the target loss value converges, it may be determined that the initial model at this time is trained.

It should be noted that, in response to determining that the initial model training is completed, the step 411 may be executed continuously. Step 406 may be performed in response to determining that the initial model is not trained to completion.

And 406, in response to determining that the initial model is not trained, updating parameters in the initial model based on the target loss value, and determining whether a sample which is not subjected to the training step exists in the sample set.

In this embodiment, in response to determining that the initial model is not trained, the execution subject may update parameters in the initial model based on the target loss value. Here, the gradient of the target loss value with respect to the model parameters may be found using a back propagation algorithm, and then the model parameters may be updated based on the gradient using a gradient descent algorithm. It should be noted that the unselected loss values do not participate in the gradient descent. It should be noted that the back propagation algorithm, the gradient descent algorithm, and the machine learning method are well-known technologies that are currently widely researched and applied, and are not described herein again. At the same time, it may be determined whether there are samples in the set of samples for which the training step was not performed. If so, step 407 may be performed, and if not, step 408 may be performed.

Step 407, in response to determining that there are samples in the sample set that have not been subjected to the training step, extracting samples from the samples that have not been subjected to the training step to form a subset, and continuing to perform the training step using the initial model with updated parameters as the initial model.

It can be understood that when there are samples in the sample set for which no training step is performed, this means that the training of the current round (current epoch) is not completed, i.e. the samples in the sample set of the current round are not traversed to be completed. In this case, the sample composition subset may be extracted from samples that have not been subjected to the training step, and the training step may be continued using the initial model with updated parameters as the initial model.

Step 408, in response to determining that there are no samples in the sample set that have not been subjected to the training step, determining whether the target number is less than a preset number.

In this embodiment, in response to determining that there is no sample in the sample set that has not been subjected to the training step, the executing entity may determine whether the current target number is smaller than a preset value. As an example, if the number of samples in the subset is 128, the preset value may be set to a value smaller than the number of samples, such as 110 or 120. In response to determining that the target number is less than the preset number, step 409 may be performed. In response to determining that the target quantity is not less than the preset value, step 410 may be performed.

And 409, in response to the fact that the target quantity is smaller than the preset value, taking the sum of the target quantity and the specified value as the target quantity, taking the initial model after the parameters are updated as the initial model, re-extracting part of samples in the sample set to form a subset, and continuing to execute the training step.

In this embodiment, in response to determining that the target number is smaller than the preset value, the executing entity may use a sum of the target number and a specified value (e.g., 2) as the target number, use the initial model with updated parameters as the initial model, re-extract a part of the sample composition subsets in the sample set, and continue to execute the training step. It is understood that, since one round of training is completed at this time, the next round of training is performed by extracting a part of the sample composition subset from the sample set again for training.

Step 410, in response to the fact that the target quantity is not smaller than the preset value, using the initial model with updated parameters as the initial model, re-extracting a part of samples in the sample set to form a subset, and continuing to perform the training step.

In this embodiment, in response to determining that the target number is not less than the preset value, the executing entity may use the initial model with updated parameters as the initial model, re-extract a part of the sample composition subsets in the sample set, and continue to execute the training step. It is understood that, since one round of training is completed at this time, the next round of training is performed by extracting a part of the sample composition subset from the sample set again for training. Note that the number of targets used in the training process at this time is not updated.

In step 411, in response to determining that the training of the initial model is completed, the trained initial model is determined as the target model.

In this embodiment, in response to determining that the training of the initial model is completed, the executing entity may determine the trained initial model as the target model. Here, the target model is a video category detection model for detecting a video category.

As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for generating a model in the present embodiment involves an operation of gradually increasing the number of targets, i.e., gradually increasing the number of selected loss values, in the training process. This can further improve the accuracy of the generated model.

With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for generating a model, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.

As shown in fig. 5, the apparatus 500 for generating a model according to the present embodiment includes: an obtaining unit 501, configured to obtain a sample set, where samples in the sample set have tagging information; a training unit 502 configured to extract a part of the samples in the sample set to form a subset, and perform the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the labeling information carried by the samples in the subset; selecting loss values of a target number according to the sequence of the loss values from small to large, and determining the average value of the selected loss values as the target loss value, wherein the target number is smaller than the number of samples in the subset; updating parameters in the initial model based on the target loss value; determining whether the initial model is trained; and if so, determining the trained initial model as the target model.

In some optional embodiments of this embodiment, the apparatus further comprises a first determining unit and a first executing unit (not shown in the figure). Wherein the first determining unit may be configured to determine whether there are samples in the sample set for which the training step is not performed, in response to determining that the initial model is not trained. The first performing unit may be configured to extract a sample composition subset from samples for which the training step is not performed in response to the determination of existence, and continue performing the training step using the initial model with updated parameters as an initial model.

In some optional embodiments of this embodiment, the apparatus further comprises a second determining unit and a second performing unit (not shown in the figure). Wherein the second determining unit is configured to determine whether the target number is smaller than a preset value in response to determining that there is no sample in the sample set for which the training step is not performed; and the second execution unit is configured to respond to the fact that the target quantity is smaller than the preset numerical value, take the sum of the target quantity and the preset quantity as the target quantity, use the initial model after the parameters are updated as the initial model, re-extract a part of sample composition subsets in the sample set, and continue to execute the training step.

In some optional embodiments of this embodiment, the apparatus further comprises a third execution unit (not shown in the figure). And the third execution unit is configured to, in response to determining that the target number is not less than the preset value, use the initial model after updating the parameters as the initial model, re-extract a part of the sample composition subsets in the sample set, and continue to execute the training step.

In some optional embodiments of this embodiment, the initial value of the target number may be one-half of the number of samples in the initially composed subset.

In some optional embodiments of this embodiment, the initial model may be obtained by: and training to obtain an initial model by using a machine learning method and taking the samples in the sample set as input and the label information of the input samples as output.

In some optional embodiments of this embodiment, the samples in the sample set may be sample videos, the labeling information carried in the samples may be used to indicate the types of the sample videos, and the target model may be a video type detection model for detecting the types of the videos.

The apparatus provided by the above embodiment of the present application obtains the sample set through the obtaining unit 501, and may extract the sample composition subset from the sample set to perform training of the initial model. Wherein, the samples in the sample set are provided with the labeling information. In this way, the training unit 502 inputs the samples in the extracted subset to the initial model, and can obtain information corresponding to each sample output by the initial model. Then, the training unit 502 may determine the loss value of each input sample based on the information output by the initial model and the label information carried by the samples in the extracted subset. Then, the loss values of the target number (smaller than the number of samples in the subset) may be selected in the order of the loss values from small to large, and the average of the selected loss values may be determined as the target loss value. Thereafter, it may be determined whether the initial model is trained based on the target loss value. If the initial model training is completed, the trained initial model can be determined as the target model. Because the loss value of the noise sample is usually large, the loss values of the target number (smaller than the number of samples in the subset) are selected according to the sequence from small to large of the loss value to train the initial model, so that the influence of the noise sample can be screened out, and the accuracy of the generated model is improved.

Referring to fig. 6, a flowchart 600 of an embodiment of a method for detecting video category provided by the present application is shown. The method for detecting a video category may comprise the steps of:

step 601, receiving a target video.

In this embodiment, an execution subject (for example, the server 105 shown in fig. 1, or another server storing a video category detection model) for detecting a video category may receive a target video transmitted by a terminal device (for example, the

terminal devices

101, 102, and 103 shown in fig. 1) by using a wired connection or a wireless connection.

Step 602, inputting the frames in the target video into the video category detection model to obtain the video category detection result.

In this embodiment, the executing entity may input the frame in the target video into a video category detection model to obtain a video category detection result. The video category detection model may be generated using the method of generating the object model as described in the embodiment of fig. 2 above. For a specific generation process, reference may be made to the related description of the embodiment in fig. 2, which is not described herein again. The video category detection result can be used for indicating the category of the target video

In some optional implementations of this embodiment, after obtaining the video category detection result, the execution subject may store the target video in a video library corresponding to the category indicated by the video category detection result.

The method for detecting the video category can be used for detecting the category of the video, and the accuracy of video category detection can be improved.

With continuing reference to FIG. 7, as an implementation of the method illustrated in FIG. 6 above, the present application provides one embodiment of an apparatus for detecting video categories. The embodiment of the device corresponds to the embodiment of the method shown in fig. 6, and the device can be applied to various electronic devices.

As shown in fig. 7, the apparatus 700 for detecting video category according to the present embodiment includes: a receiving unit 701 configured to receive a target video; an input unit 702 is configured to input the frames in the target video into a video category detection model, so as to obtain a video category detection result.

It will be understood that the elements described in the apparatus 700 correspond to various steps in the method described with reference to fig. 6. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 700 and the units included therein, and will not be described herein again.

Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU)801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.

The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output section 807 including a signal such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is mounted on the storage section 808 as necessary.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 801. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit and a training unit. Where the names of these units do not in some cases constitute a limitation of the unit itself, for example, the acquisition unit may also be described as a "unit acquiring a sample set".

As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: obtaining a sample set; extracting part of samples in the sample set to form a subset, and executing the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the labeling information carried by the samples in the subset; selecting loss values of a target number according to the sequence of the loss values from small to large, and determining the average value of the selected loss values as a target loss value; updating parameters in the initial model based on the target loss value; determining whether the initial model is trained; and if so, determining the trained initial model as the target model.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A method for generating a model, comprising:

acquiring a sample set, wherein samples in the sample set are provided with marking information, the samples in the sample set are sample videos, and the marking information provided by the samples is used for indicating the types of the sample videos;

extracting a part of samples in the sample set to form a subset, and executing the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the labeling information carried by the samples in the subset; selecting loss values of a target number according to the sequence of the loss values from small to large, and determining the average value of the selected loss values as the target loss value, wherein the target number is smaller than the number of samples in the subset; determining whether the initial model is trained completely based on the target loss value; and if so, determining the trained initial model as a target model, wherein the target model is a video category detection model for detecting video categories.

2. The method for generating a model of claim 1, wherein the method further comprises:

in response to determining that the initial model is not trained to completion, updating parameters in the initial model based on the target loss value, determining whether there are samples in the sample set for which the training step was not performed;

in response to determining that there is a sample composition subset extracted from the samples for which the training step was not performed, continuing the training step using the initial model with updated parameters as the initial model.

3. The method for generating a model of claim 2, wherein the method further comprises:

in response to determining that there are no samples in the sample set for which the training step was not performed, determining whether a target number is less than a preset number;

and in response to the fact that the target quantity is smaller than the preset value, taking the sum of the target quantity and the specified value as the target quantity, using the initial model after the parameters are updated as the initial model, re-extracting part of sample composition subsets in the sample set, and continuing to execute the training step.

4. The method for generating a model of claim 3, wherein the method further comprises:

and in response to the fact that the target quantity is not smaller than the preset value, using the initial model with updated parameters as an initial model, re-extracting part of sample composition subsets in the sample set, and continuing to execute the training step.

5. A method for generating a model as claimed in claim 3, wherein the initial value of the target number is one half of the number of samples in the primary composed subset.

6. A method for generating a model as claimed in claim 1, wherein the initial model is obtained by:

and training to obtain an initial model by using a machine learning method and taking the samples in the sample set as input and the marking information of the input samples as output.

7. An apparatus for generating a model, comprising:

the system comprises an acquisition unit, a storage unit and a processing unit, wherein the acquisition unit is configured to acquire a sample set, samples in the sample set are provided with marking information, the samples in the sample set are sample videos, and the marking information provided by the samples is used for indicating the category of the sample videos;

a training unit configured to extract a part of the sample composition subsets from the sample set, and perform the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the labeling information carried by the samples in the subset; selecting loss values of a target number according to the sequence of the loss values from small to large, and determining the average value of the selected loss values as the target loss value, wherein the target number is smaller than the number of samples in the subset; determining whether the initial model is trained completely based on the target loss value; and if so, determining the trained initial model as a target model, wherein the target model is a video category detection model for detecting video categories.

8. The apparatus for generating a model of claim 7, wherein the apparatus further comprises:

a first determining unit configured to update parameters in an initial model based on the target loss value in response to determining that the initial model is not trained, determine whether there are samples in the sample set for which the training step is not performed;

a first execution unit configured to extract a sample composition subset from samples for which the training step is not performed in response to the determination of existence, and continue performing the training step using the initial model after updating parameters as an initial model.

9. The apparatus for generating a model of claim 8, wherein the apparatus further comprises:

a second determination unit configured to determine whether a target number is less than a preset number in response to determining that there is no sample in the sample set for which the training step is not performed;

and the second execution unit is configured to respond to the fact that the target quantity is smaller than the preset value, take the sum of the target quantity and the specified value as the target quantity, use the initial model after the parameters are updated as the initial model, re-extract a part of sample composition subsets in the sample set, and continue to execute the training step.

10. The apparatus for generating a model of claim 9, wherein the apparatus further comprises:

and the third execution unit is configured to, in response to the fact that the target quantity is determined to be not smaller than the preset value, use the initial model with the updated parameters as an initial model, re-extract a part of the sample composition subsets in the sample set, and continue to execute the training step.

11. The apparatus for generating a model of claim 9, wherein the initial value of the target number is one-half of the number of samples in the subset of primary constituents.

12. Apparatus for generating a model according to claim 7, wherein the initial model is obtained by:

13. A method for detecting video categories, comprising:

receiving a target video;

inputting the frames in the target video into the video category detection model generated by the method according to claim 1 to obtain the video category detection result.

14. An apparatus for detecting video categories, comprising:

a receiving unit configured to receive a target video;

an input unit configured to input frames in the target video into the video category detection model generated by the method according to claim 1, and obtain a video category detection result.

15. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6, 13.

16. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-6, 13.