CN109447156B - Method and apparatus for generating a model

Method and apparatus for generating a model

Info

Publication number: CN109447156B
Application number: CN201811273681.3A
Authority: CN (China)
Prior art keywords: samples, subset, sample, model, initial model
Other languages: Chinese (zh)
Other versions: CN109447156A
Inventors: 袁泽寰, 王长虎
Assignee (original and current): Beijing ByteDance Network Technology Co Ltd
Application filed by Beijing ByteDance Network Technology Co Ltd
Application publication: CN109447156A; grant publication: CN109447156B
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items


Abstract

Embodiments of the present application disclose a method and apparatus for generating a model. One embodiment of the method comprises: obtaining a sample set; extracting some of the samples in the sample set to form a subset; and performing the following training steps: inputting the samples in the subset into an initial model, and determining a loss value for each input sample based on the information output by the initial model and the class labels carried by the samples in the subset; selecting the loss values of the positive samples and the loss values of some of the negative samples in the subset, and determining the average of the selected loss values as a target loss value; determining, based on the target loss value, whether training of the initial model is complete; and if so, determining the trained initial model as a class detection model. This embodiment improves the accuracy of the generated model.

Description

Method and apparatus for generating a model
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for generating a model.
Background
In the field of machine learning, a model generally must be trained on a sample set. However, in sample sets used for model training, the numbers of samples of different categories are often far from uniform, because some categories of samples are much harder to obtain than others. As an example, when training a model to detect video categories (e.g., to classify videos as abnormal or normal), the number of positive samples (abnormal-category videos) is usually very small, while negative samples (normal-category videos) are plentiful.
Related methods typically use such a sample set directly, training the model by supervised learning.
Disclosure of Invention
The embodiment of the application provides a method and a device for generating a model.
In a first aspect, an embodiment of the present application provides a method for generating a model, where the method includes: obtaining a sample set, wherein the sample set comprises positive samples and negative samples, the number of the positive samples is less than that of the negative samples, and the samples in the sample set are labeled with categories; extracting a part of samples in the sample set to form a subset, and executing the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the class labels carried by the samples in the subset; selecting loss values of positive samples and loss values of partial negative samples in the subset, and determining the average value of the selected loss values as a target loss value; determining whether the initial model is trained based on the target loss value; and if so, determining the trained initial model as a class detection model.
In some embodiments, selecting the loss values of the positive samples and the loss values of some of the negative samples in the subset comprises: selecting the loss values of the positive samples in the subset; and selecting the loss values of a target number of negative samples in descending order of loss value, wherein the ratio of the target number to the number of positive samples in the extracted subset is within a preset value interval.
In some embodiments, selecting the loss values of the positive samples and the loss values of some of the negative samples in the subset comprises: in response to determining that no positive sample exists in the subset, selecting a preset number or a preset proportion of loss values, taken in descending order of loss value.
In some embodiments, the method further comprises: in response to determining that the initial model is not trained, updating parameters in the initial model based on the target loss value, re-extracting samples from the sample set to form a subset, and continuing the training step using the updated initial model as the initial model.
In some embodiments, the initial model is obtained by: training with a machine learning method, taking the samples in the sample set as input and the class labels of the input samples as the expected output, to obtain the initial model.
In some embodiments, the samples in the sample set are sample videos, the category labels carried by the samples are used for indicating the categories of the sample videos, and the category detection model is a video category detection model for detecting the categories of the videos.
In a second aspect, an embodiment of the present application provides an apparatus for generating a model, where the apparatus includes: the obtaining unit is configured to obtain a sample set, wherein the sample set comprises positive samples and negative samples, the number of the positive samples is less than that of the negative samples, and the samples in the sample set are labeled with categories; a training unit configured to extract a part of the sample composition subsets in the sample set, and perform the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the class labels carried by the samples in the subset; selecting loss values of positive samples and loss values of partial negative samples in the subset, and determining the average value of the selected loss values as a target loss value; determining whether the initial model is trained based on the target loss value; and if so, determining the trained initial model as a class detection model.
In some embodiments, the training unit is further configured to: select the loss values of the positive samples in the subset; and select the loss values of a target number of negative samples in descending order of loss value, wherein the ratio of the target number to the number of positive samples in the extracted subset is within a preset value interval.
In some embodiments, the training unit is further configured to: in response to determining that no positive sample exists in the subset, select a preset number or a preset proportion of loss values, taken in descending order of loss value.
In some embodiments, the apparatus further comprises: an updating unit configured to, in response to determining that the initial model is not trained, update parameters in the initial model based on the target loss value, re-extract samples from the sample set to form a subset, and continue the training step using the updated initial model as the initial model.
In some embodiments, the initial model is obtained by: training with a machine learning method, taking the samples in the sample set as input and the class labels of the input samples as the expected output, to obtain the initial model.
In some embodiments, the samples in the sample set are sample videos, the category labels carried by the samples are used for indicating the categories of the sample videos, and the category detection model is a video category detection model for detecting the categories of the videos.
In a third aspect, an embodiment of the present application provides a method for detecting a video category, including: receiving a target video; the frames in the target video are input into the video category detection model generated by the method described in the embodiment of the first aspect, and the video category detection result is obtained.
In a fourth aspect, an embodiment of the present application provides an apparatus for detecting a video category, including: a receiving unit configured to receive a target video; an input unit configured to input frames in a target video into a video category detection model generated by the method described in the embodiment of the first aspect, resulting in a video category detection result.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; storage means having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the embodiments of the first and third aspects above.
In a sixth aspect, the present application provides a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method according to any one of the first and third aspects.
According to the method and apparatus for generating a model provided by embodiments of the present application, a sample set is obtained, and subsets of its samples can be extracted to train an initial model. The samples in the sample set carry class labels, and the number of positive samples is smaller than the number of negative samples. Inputting the samples of a subset into the initial model yields a piece of output information for each sample. Based on this output information and the class labels carried by the samples in the subset, a loss value can be determined for each input sample. The loss values of the positive samples and the loss values of some of the negative samples in the subset are then selected, and the average of the selected loss values is determined as the target loss value. Based on the target loss value, it can then be determined whether training of the initial model is complete; if it is, the trained initial model can be determined as a class detection model. Because the numbers of positive and negative samples in the subsets are unbalanced, selecting the loss values of only some of the negative samples alongside the loss values of the positive samples effectively balances the two and improves the accuracy of the generated model.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for generating a model according to the present application;
FIG. 3 is a schematic illustration of an application scenario of a method for generating a model according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for generating a model according to the present application;
FIG. 5 is a schematic diagram of an embodiment of an apparatus for generating a model according to the present application;
FIG. 6 is a flow diagram for one embodiment of a method for detecting video categories in accordance with the present application;
FIG. 7 is a schematic block diagram illustrating an embodiment of an apparatus for detecting video category according to the present application;
FIG. 8 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which the method for generating a model or the apparatus for generating a model of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a video recording application, a video playing application, a voice interaction application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
When the terminal devices 101, 102, 103 are hardware, an image capturing device may be mounted thereon. The image acquisition device can be various devices capable of realizing the function of acquiring images, such as a camera, a sensor and the like. The user may capture video using an image capture device on the terminal device 101, 102, 103.
The server 105 may be a server that provides various services, such as a data processing server for data storage and data processing. The data processing server may have a sample set stored therein. The sample set may contain a large number of samples. Wherein, the samples in the sample set may have category labels. In addition, the data processing server may train the initial model using the samples in the sample set, and may store the training results (e.g., the generated class detection model). In this way, the trained class detection model can be used to perform corresponding data processing to realize the functions supported by the class detection model.
The server 105 may be hardware or software. When it is hardware, it may be implemented as a distributed cluster of multiple servers or as a single server. When it is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be noted that the method for generating the model provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for generating the model is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a model according to the present application is shown. The method for generating the model comprises the following steps:
step 201, a sample set is obtained.
In the present embodiment, the execution subject of the method for generating a model (e.g., server 105 shown in fig. 1) may obtain a sample set in a variety of ways. For example, the execution subject may obtain an existing sample set from another server that stores samples (e.g., a database server) through a wired or wireless connection. As another example, a user may collect samples via a terminal device (e.g., terminal devices 101, 102, 103 shown in fig. 1); the execution subject may then receive the samples collected by the terminal and store them locally, thereby generating the sample set. It should be noted that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (Ultra-Wideband) connections, as well as other wireless connection types now known or developed in the future.
Here, the sample set may include a large number of samples. The sample set may contain positive and negative samples. In practice, positive and negative samples can typically characterize two different classes of samples. In addition, the positive samples may also represent samples of a certain class, and the negative samples may represent samples of other classes (which may include one or more classes) except for the class corresponding to the positive samples. The types of samples corresponding to the positive and negative samples may be set as needed, and are not limited herein.
It should be noted that positive samples are typically of a type that is difficult to obtain. Thus, the number of positive samples in the sample set may be less than the number of negative samples.
Here, the samples in the sample set may be labeled with categories. The above-described class labels may be used to indicate the class of the sample. The types of the samples respectively corresponding to the positive sample and the negative sample can be preset according to needs, so that whether the samples are positive samples or negative samples can be known through type marking. As an example, when the positive sample and the negative sample respectively represent two different classes of samples, the class label of the positive sample may be set to 1; the class label for the negative examples is set to 0.
It should be noted that, the samples in the sample set may be obtained according to actual requirements. For example, if a model capable of detecting image types (e.g., facial image types, non-facial image types) needs to be trained, the samples in the sample set may be sample images, and the type labels may be used to indicate the types of the images.
In step 202, a subset of the sample composition in the sample set is extracted.
In this embodiment, the executing subject may extract a part of samples from the sample set acquired in step 201 to form a subset, and perform the training steps from step 203 to step 207. The manner of extracting the sample is not limited in this application. For example, the samples currently to be extracted may be extracted from the sample set in a specified order.
In the field of machine learning, each extracted subset of samples may be referred to as a mini-batch, and one complete traversal of all samples in the sample set may be called an epoch. As an example, with 128000 samples in the sample set, 128 samples may be selected at a time to form a subset for model training, so the 128000 samples are grouped into 1000 subsets in turn; once every subset has been used, one epoch has elapsed. Note that different epochs may extract different numbers of samples per subset: in the first epoch, each subset may contain 128 samples; in the second, 256.
Because the number of samples in a sample set is usually large, using all of them at once in every round of training would be slow and inefficient. Instead, parts of the sample set are selected to form subsets; during training, one gradient-descent step is performed per subset, and the whole sample set is eventually traversed, so the amount of data in each iteration is small. This reduces time consumption and improves processing efficiency.
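As an illustrative sketch of this subset extraction (in Python; the function and variable names are assumptions, not part of the patent), one epoch of mini-batch extraction might look like the following.

    import random

    def iter_minibatches(sample_set, batch_size):
        """Yield subsets (mini-batches) of the sample set; one full pass over them is an epoch."""
        indices = list(range(len(sample_set)))
        random.shuffle(indices)  # the extraction order is left open above; random order is one choice
        for start in range(0, len(indices), batch_size):
            yield [sample_set[i] for i in indices[start:start + batch_size]]

    # For example, 128000 samples with batch_size=128 give 1000 subsets per epoch;
    # a later epoch may use a different batch size (e.g., 256).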
Step 203, inputting the samples in the subset into the initial model, and determining the loss value of each input sample based on the information output by the initial model and the class labels carried by the samples in the subset.
In this embodiment, the execution subject may first input the samples in the subset formed in step 202 into the initial model. The initial model may perform feature extraction, analysis, and other processing on the samples, and then output information. The initial model may be a pre-established classification model (any existing model structure capable of classification may be used), or a model obtained by preliminarily training an existing classification model as needed. For example, if a model for detecting image or text categories is to be trained, an existing classification model may serve as the initial model. As examples, existing classification models include convolutional neural networks of various existing structures (e.g., DenseBox, VGGNet, ResNet, SegNet); a Support Vector Machine (SVM) or the like may also be used.
After the samples in the subset are input into the initial model, the execution subject may extract the information output by the initial model, where each input sample corresponds to one piece of output information. For example, if there are 128 samples in the subset, the initial model may output 128 pieces of information, corresponding one-to-one to the 128 input samples.
The execution subject may then determine a loss value for each input sample based on the information output by the initial model and the class labels carried by the samples in the subset. Here, the goal of training the initial model is to make the difference between the output information and the class label carried by the input sample as small as possible; a value characterizing this difference can therefore be used as the loss value. In practice, various existing loss functions can be used to characterize the difference between the information output by the initial model and the class labels. For each input sample, the information output by the initial model for that sample and the sample's class label are input into the loss function, giving the loss value for that sample.
In practice, the loss function measures the degree of inconsistency between the initial model's predicted value (i.e., the output information) and the actual value (i.e., the class label). It is a non-negative real-valued function; in general, the smaller the loss value, the more robust the model. The loss function may be chosen according to actual requirements, for example a Euclidean distance or a cross-entropy loss.
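As a hedged sketch of this per-sample loss computation (PyTorch and a binary cross-entropy loss are assumptions; the patent only requires some loss function that yields one value per sample, and the names below are illustrative):

    import torch
    import torch.nn.functional as F

    def per_sample_losses(initial_model, inputs, labels):
        """One loss value per input sample; reduction is deliberately disabled.
        Assumes the model outputs one logit per sample (shape (B,))."""
        logits = initial_model(inputs)                  # information output by the initial model
        return F.binary_cross_entropy_with_logits(
            logits, labels.float(), reduction="none")  # difference from the class labels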
In some optional implementation manners of this embodiment, the initial model may also be a model obtained after a pre-established classification model is preliminarily trained according to needs. Specifically, the initial model is obtained by the following steps: the initial model can be obtained by training using a machine learning method with the samples in the sample set as input and the class labels of the input samples as output. As an example, if a model for image class detection needs to be trained, a convolutional neural network may be preliminarily trained by using a supervised learning manner, using a corresponding sample set (the sample may be an image, and a class label of the sample may be used to indicate a class of the image). And determining the preliminarily trained convolutional neural network as an initial model. Specifically, samples may be sequentially extracted from the sample set to form subsets, each subset traversal is completed, and the model may be updated once using a gradient descent algorithm. The executing body may determine the model trained when the traversal of each subset is completed as the initial model.
And step 204, selecting loss values of the positive samples and loss values of partial negative samples in the subset, and determining the average value of the selected loss values as a target loss value.
In this embodiment, the samples in the sample set are labeled with categories, so the categories corresponding to positive and negative samples are known. The execution subject may therefore select the loss values of the positive samples and the loss values of some of the negative samples from the subset extracted in step 202, and determine the average of the selected loss values as the target loss value. In practice, the target loss value serves as the loss value of the extracted subset.
Here, since the number of positive samples in the sample set is generally small and the number of negative samples large, the subset typically also contains few positive samples and many negative samples. In practice, the loss values of all positive samples in the extracted subset may be selected. Meanwhile, the loss values of only some of the negative samples may be selected: a portion may be extracted at random, or a portion may be selected in descending order of loss value.
It should be noted that the number of selected negative-sample loss values may be the same as, or close to, the number of positive samples. For example, if a subset of 128 samples contains 10 positive samples and 118 negative samples, the loss values of 10 (or, say, 15) negative samples may be selected. As another example, if a subset of 128 samples contains 1 positive sample and 127 negative samples, the loss value of 1 negative sample may be selected. Alternatively, the number of selected negative-sample loss values may be specified in advance, regardless of the number of positive samples. In practice, the number of positive samples usually varies little across subsets, so the number of selected loss values also varies little.
In some optional implementations of this embodiment, the execution subject may select the loss values of the positive samples in the extracted subset, and select the loss values of a target number of negative samples in descending order of loss value, where the ratio of the target number to the number of positive samples in the extracted subset lies within a preset value interval (e.g., [1, 2]). A larger loss value indicates a sample whose category the model finds harder to judge; selecting negative-sample loss values in descending order therefore trains on the hardest samples, which can improve training efficiency.
In some optional implementations of this embodiment, in response to determining that no positive sample exists in the extracted subset, the execution subject may select a preset number (e.g., 10, or 20) or a preset proportion (e.g., 10%) of loss values in descending order of the loss values.
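A minimal sketch of this selection step (PyTorch assumed; neg_pos_ratio and fallback_k stand in for the preset value interval and preset number, which the text above leaves configurable):

    import torch

    def target_loss(per_sample_loss, labels, neg_pos_ratio=1.0, fallback_k=10):
        """Average the positive-sample losses together with the hardest negative-sample losses."""
        pos = per_sample_loss[labels == 1]
        neg = per_sample_loss[labels == 0]
        if pos.numel() == 0:
            # No positive sample in this subset: keep a preset number of the largest
            # negative losses instead (a preset proportion would also work).
            return neg.topk(min(fallback_k, neg.numel())).values.mean()
        # Target number of negatives: its ratio to the positive count stays in a preset interval.
        k = max(1, min(int(neg_pos_ratio * pos.numel()), neg.numel()))
        hard_neg = neg.topk(k).values  # the k largest negative losses, i.e. descending order
        return torch.cat([pos, hard_neg]).mean()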
Step 205, determining whether the initial model is trained completely based on the target loss value.
In this embodiment, the execution subject may determine, in various ways, whether the initial model has been trained based on the target loss value. As an example, the execution subject may determine whether the target loss value has converged; when it has, the initial model may be considered trained. As another example, the execution subject may first compare the target loss value with a preset value. In response to determining that the target loss value is less than or equal to the preset value, it may count, among the target loss values determined in the most recent preset number (for example, 100) of training steps, the ratio of those less than or equal to the preset value. When that ratio exceeds a preset ratio (e.g., 95%), the initial model may be considered trained. It should be noted that the preset value generally represents an acceptable degree of inconsistency between the predicted and actual values; that is, when the loss value is less than or equal to the preset value, the prediction may be considered close to the true value. The preset value can be set according to actual requirements.
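A sketch of the second, ratio-based check (pure Python; the threshold value is an assumption, while the window of 100 and the 95% ratio are the example figures above):

    from collections import deque

    class StopCriterion:
        """Training counts as complete when at least `ratio` of the last `window`
        target loss values are at or below `threshold`."""
        def __init__(self, threshold=0.05, window=100, ratio=0.95):
            self.threshold, self.ratio = threshold, ratio
            self.recent = deque(maxlen=window)

        def trained(self, target_loss_value):
            self.recent.append(target_loss_value)
            if len(self.recent) < self.recent.maxlen:
                return False
            below = sum(v <= self.threshold for v in self.recent)
            return below / len(self.recent) >= self.ratio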
In response to determining that the initial model has been trained, execution may continue to step 206. In response to determining that it has not, parameters in the initial model may be updated based on the target loss value determined in step 204, a new subset of samples may be extracted from the sample set, and the training step may be continued using the updated model as the initial model. Here, a back-propagation algorithm may be used to find the gradient of the target loss value with respect to the model parameters, and a gradient-descent algorithm may then update the parameters based on that gradient. Note that the unselected loss values do not participate in the gradient descent. The back-propagation algorithm, gradient-descent algorithm, and machine learning methods are well-known, widely studied technologies and are not described further here.
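Putting the update path together, a sketch of the loop might read as follows; it reuses the helpers sketched above and assumes each extracted subset is already collated into (inputs, labels) tensors, with SGD standing in for whatever gradient-descent optimizer is used.

    import torch

    optimizer = torch.optim.SGD(initial_model.parameters(), lr=0.01)
    stop = StopCriterion()
    done = False
    while not done:
        for inputs, labels in iter_minibatches(sample_set, batch_size=128):
            losses = per_sample_losses(initial_model, inputs, labels)
            loss = target_loss(losses, labels)  # unselected loss values are excluded here,
                                                # so they take no part in the gradient descent
            if stop.trained(loss.item()):
                done = True                     # trained: the model becomes the class detection model
                break
            optimizer.zero_grad()
            loss.backward()                     # back propagation: gradient of the target loss
            optimizer.step()                    # gradient descent updates the parameters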
In response to determining that the training of the initial model is complete, step 206, the trained initial model is determined to be a class detection model.
In this embodiment, in response to determining that the training of the initial model is completed, the executing entity may determine the trained initial model as the class detection model.
In some optional implementation manners of this embodiment, the samples in the sample set may be sample videos, the category labels carried by the samples may be used to indicate categories of the sample videos, and the category detection model may be a video category detection model for detecting categories of the videos.
In some optional implementations of this embodiment, the execution subject may store the class detection model locally, or may send the class detection model to other electronic devices (e.g., terminal devices 101, 102, 103 shown in fig. 1).
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating a model according to the present embodiment. In the application scenario of fig. 3, a terminal device 301 used by a user may have a model training application installed thereon. After a user opens the application and uploads the sample set or the storage path of the sample set, the server 302 providing background support for the application may run a method for generating a model, including:
first, a sample set may be obtained. The sample set may include positive samples and negative samples. The number of positive samples is smaller than that of negative samples, and the samples in the sample set are labeled with categories.
Then, a part of the samples in the sample set is extracted to form a subset 303, and the following training steps are performed: the samples in the subset 303 are input into the initial model 304, and the loss value of each input sample is determined based on the information output by the initial model and the class labels carried by the samples in the subset 303. Next, the loss values of the positive samples and the loss values of some of the negative samples in the extracted subset are selected, and the average of the selected loss values is determined as the target loss value 305. It may then be determined, based on the target loss value, whether training of the initial model is complete. If it is, the trained initial model may be determined as the target model 306.
The method provided by the above embodiment of the present application obtains a sample set and extracts subsets of its samples to train an initial model. The samples in the sample set carry class labels, and the number of positive samples is smaller than the number of negative samples. Inputting the samples of a subset into the initial model yields a piece of output information for each sample, from which, together with the class labels carried by the samples, a loss value can be determined for each input sample. The loss values of the positive samples and the loss values of some of the negative samples in the subset are then selected, and their average is determined as the target loss value, based on which it can be determined whether training of the initial model is complete. If it is, the trained initial model can be determined as a class detection model. Because the numbers of positive and negative samples in the subsets are unbalanced, selecting the loss values of only some of the negative samples alongside the loss values of the positive samples effectively balances the two and improves the accuracy of the generated model.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for generating a model is shown. The process 400 of the method for generating a model includes the steps of:
step 401, a sample set is obtained.
In the present embodiment, the execution subject of the method for generating a model (e.g., server 105 shown in fig. 1) may obtain a sample set in a variety of ways. Here, the sample set may include a large number of samples. The sample set may contain positive and negative samples. The number of positive samples in the sample set may be less than the number of negative samples. Here, the samples in the sample set may be labeled with categories. The above-described class labels may be used to indicate the class of the sample.
In this embodiment, the samples in the sample set may be sample videos, and the category labels carried by the samples may be used to indicate the categories of the sample videos. A positive sample may be a sample video of some specified category, for example a video in a prohibited or illegal category; negative samples may be videos of other categories.
At step 402, a subset of the sample composition in the sample set is extracted.
In this embodiment, the executing entity may extract a part of samples from the sample set acquired in step 401 to compose a subset, and execute the training steps from step 403 to step 408. The manner of extracting the sample is not limited in this application. For example, the samples currently to be extracted may be extracted from the sample set in a specified order.
And 403, inputting the samples in the subset into the initial model, and determining the loss value of each input sample based on the information output by the initial model and the class labels carried by the samples in the subset.
In this embodiment, the executing agent may first input the samples in the subset composed in step 402 into the initial model. The information output by the initial model can then be extracted. Wherein, each input sample can correspond to the information output by one initial model. The loss value for each sample entered may then be determined based on the information output by the initial model and the class labels carried by the samples in the subset. It should be noted that the operation of calculating the loss value is substantially the same as the operation described in step 203, and is not described herein again.
Here, the initial model may be a model obtained by preliminarily training a pre-established model as needed. Specifically, the initial model is obtained by the following steps: the initial model can be obtained by training using a machine learning method with the samples in the sample set as input and the class labels of the input samples as output. Here, the initial model can be trained using the existing model structure.
In this embodiment, a model for video category detection may be trained. The execution subject may preliminarily train a convolutional neural network using supervised learning and determine the trained convolutional neural network as the initial model. Note that after each subset is traversed, the model may be updated once using a gradient-descent algorithm; the execution subject may determine the model obtained when traversal of the sample set is complete as the initial model.
At step 404, a loss value for a positive sample in the subset is selected.
In this embodiment, the execution subject may select the loss value of the positive sample in the extracted subset. It will be appreciated that, since there are fewer positive samples, the loss values for all positive samples in the subset may be selected.
And step 405, selecting loss values of the negative samples of the target number according to the sequence of the loss values from large to small.
In this embodiment, the execution subject may select, from the loss values obtained in step 403, the loss values of a target number of negative samples in descending order of loss value, where the ratio of the target number to the number of positive samples in the extracted subset lies within a preset value interval (e.g., [1, 2]). A larger loss value indicates a sample whose category the model finds harder to judge; selecting negative-sample loss values in descending order therefore trains on the hardest samples, which can improve training efficiency.
In this embodiment, in response to determining that no positive sample exists in the subset, the execution subject may select a preset number (e.g., 10 or 20) of loss values or a preset proportion (e.g., 10%) of loss values, taken in descending order of loss value.
In step 406, the average of the selected loss values is determined as the target loss value.
In this embodiment, the execution subject may determine an average value of the loss values selected in step 404 and step 405 as a target loss value. In practice, the target loss value is the extracted loss value of the subset.
Step 407, determining whether the initial model is trained completely based on the target loss value.
In this embodiment, the execution subject may determine whether the initial model is trained completely in various ways based on the target loss value. As an example, the execution body described above may determine whether the target loss value has converged. When it is determined that the target loss value converges, it may be determined that the initial model at this time is trained.
In response to determining that the initial model has been trained, execution may proceed to step 408. In response to determining that it has not, parameters in the initial model may be updated based on the target loss value determined in step 406, samples may be re-extracted from the sample set to form a new subset, and the training step may be continued using the updated model as the initial model. Here, a back-propagation algorithm may be used to find the gradient of the target loss value with respect to the model parameters, and a gradient-descent algorithm may then update the parameters based on that gradient. Note that the unselected loss values do not participate in the gradient descent. The back-propagation algorithm, gradient-descent algorithm, and machine learning methods are well-known, widely studied technologies and are not described further here.
In response to determining that the initial model training is complete, the trained initial model is determined to be a class detection model, step 408.
In this embodiment, in response to determining that the training of the initial model is completed, the executing entity may determine the trained initial model as the class detection model. Here, the above-described category detection model may be a video category detection model for detecting a video category.
In this embodiment, the execution subject may store the video category detection model locally, or may send the video category detection model to other electronic devices (for example, terminal devices 101, 102, and 103 shown in fig. 1).
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for generating a model in this embodiment highlights the step of selecting the loss values of a target number of negative samples in descending order of loss value. A larger loss value indicates a sample whose category the model finds harder to judge; selecting negative-sample loss values in descending order therefore trains on the hardest samples, which can improve training efficiency.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for generating a model, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 5, the apparatus 500 for generating a model according to the present embodiment includes: an obtaining unit 501 configured to obtain a sample set, where the sample set includes positive samples and negative samples, the number of the positive samples is smaller than the number of the negative samples, and the samples in the sample set are labeled with categories; a training unit 502 configured to extract a part of the samples in the sample set to form a subset, and perform the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the class labels carried by the samples in the subset; selecting loss values of the positive samples and loss values of part of the negative samples in the subset, and determining the average value of the selected loss values as a target loss value; determining whether the initial model is trained based on the target loss value; and if so, determining the trained initial model as a class detection model.
In some optional implementations of this embodiment, the training unit 502 may be further configured to: select the loss values of the positive samples in the subset; and select the loss values of a target number of negative samples in descending order of loss value, wherein the ratio of the target number to the number of positive samples in the extracted subset is within a preset value interval.
In some optional implementations of this embodiment, the training unit 502 may be further configured to: in response to determining that no positive sample exists in the subset, select a preset number or a preset proportion of loss values, taken in descending order of loss value.
In some optional implementations of this embodiment, the apparatus may further include an updating unit (not shown in the figure) configured to, in response to determining that the initial model is not trained, update parameters in the initial model based on the target loss value, re-extract samples from the sample set to form a subset, and continue the training step using the updated initial model as the initial model.
In some optional implementations of this embodiment, the initial model may be obtained by: training with a machine learning method, taking the samples in the sample set as input and the class labels of the input samples as the expected output, to obtain the initial model.
In some optional implementation manners of this embodiment, the samples in the sample set may be sample videos, the category labels carried by the samples may be used to indicate categories of the sample videos, and the category detection model may be a video category detection model for detecting video categories.
The apparatus provided by the above embodiment of the present application obtains the sample set through the obtaining unit 501, and may extract the sample composition subset from the sample set to perform training of the initial model. And the samples in the sample set are labeled with categories, and the number of positive samples in the sample set is smaller than that of negative samples. In this way, the training unit 502 inputs the samples in the subset into the initial model, so as to obtain information corresponding to each sample output by the initial model. Then, the training unit 502 may determine the loss value of each input sample based on the information output by the initial model and the class labels carried by the samples in the subset. Then, the loss value of the positive sample and the loss value of the partial negative sample in the subset may be selected, and the average of the selected loss values may be determined as the target loss value. Then, it may be determined whether the initial model is trained to be completed based on the target loss value. If the initial model training is completed, the trained initial model can be determined as a class detection model. Because the number of the positive samples and the negative samples in the subsets is unbalanced, the loss values of part of the negative samples are selected for training the initial model under the condition of selecting the loss value of the positive sample, so that the number of the positive samples and the number of the negative samples can be effectively balanced, and the accuracy of the generated model is improved.
Referring to fig. 6, a flowchart 600 of an embodiment of a method for detecting video category provided by the present application is shown. The method for detecting a video category may comprise the steps of:
step 601, receiving a target video.
In this embodiment, an execution subject (for example, the server 105 shown in fig. 1, or another server storing a video category detection model) for detecting a video category may receive a target video transmitted by a terminal device (for example, the terminal devices 101, 102, and 103 shown in fig. 1) by using a wired connection or a wireless connection.
Step 602, inputting the frames in the target video into the video category detection model to obtain the video category detection result.
In this embodiment, the executing entity may input the frame in the target video into a video category detection model to obtain a video category detection result. The video category detection model may be generated using the method of generating a category detection model as described in the embodiment of fig. 2 above. For a specific generation process, reference may be made to the related description of the embodiment in fig. 2, which is not described herein again. The video category detection result may be used to indicate the category of the target video.
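As a hedged usage sketch for this step (video_category_model, extract_frames, the frame tensor layout, and the score aggregation are illustrative assumptions, not specified here):

    import torch

    video_category_model.eval()            # a model generated as in the fig. 2 embodiment
    frames = extract_frames(target_video)  # e.g., a (num_frames, C, H, W) tensor of frames
    with torch.no_grad():
        score = torch.sigmoid(video_category_model(frames)).mean()  # aggregate per-frame scores
    detection_result = "abnormal" if score > 0.5 else "normal"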
In some optional implementations of this embodiment, after obtaining the video category detection result, the execution subject may store the target video in a video library corresponding to the category indicated by the video category detection result.
The method for detecting the video category can be used for detecting the category of the video, and the accuracy of video category detection can be improved.
With continuing reference to FIG. 7, as an implementation of the method illustrated in FIG. 6 above, the present application provides one embodiment of an apparatus for detecting video categories. The embodiment of the device corresponds to the embodiment of the method shown in fig. 6, and the device can be applied to various electronic devices.
As shown in fig. 7, the apparatus 700 for detecting video categories according to the present embodiment includes: a receiving unit 701 configured to receive a target video; and an input unit 702 configured to input the frames of the target video into a video category detection model to obtain a video category detection result.
It will be understood that the elements described in the apparatus 700 correspond to various steps in the method described with reference to fig. 6. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 700 and the units included therein, and will not be described herein again.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU)801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read therefrom can be installed into the storage section 808 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 801. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor comprising an acquisition unit and a training unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquisition unit may also be described as "a unit that acquires a sample set".
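As a minimal sketch of this decomposition, assuming plain Python; the class names and method signatures here are illustrative, not structures mandated by the application:

class AcquisitionUnit:
    """A unit that acquires a sample set."""
    def acquire(self, source):
        # Materialize an iterable of (sample video, category label) pairs.
        return list(source)

class TrainingUnit:
    """A unit that extracts subsets from the sample set and runs training steps on them."""
    def __init__(self, model):
        self.model = model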
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: obtain a sample set; extract part of the samples in the sample set to form a subset, and execute the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the category labels carried by the samples in the subset; selecting the loss values of the positive samples and the loss values of part of the negative samples in the subset, and determining the average of the selected loss values as a target loss value; determining, based on the target loss value, whether training of the initial model is complete; and if so, determining the trained initial model as a category detection model.
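By way of illustration only, the following Python sketch shows one possible rendering of this training step, assuming a PyTorch binary classifier over batched video features; the loss function, the negative-to-positive ratio bound, the completion threshold, and all names are assumptions of the sketch, not limitations of the application.

import torch
import torch.nn.functional as F

def training_step(model, videos, labels, max_neg_ratio=3.0, done_thresh=0.05):
    """One training step: keep all positive-sample losses plus the largest
    negative-sample losses, and average the kept losses into the target loss.

    videos: a batch tensor for the samples in the subset.
    labels: a 0/1 tensor of category labels (1 = positive sample).
    max_neg_ratio and done_thresh are illustrative assumptions.
    """
    # Information output by the initial model for each input sample.
    logits = model(videos).squeeze(-1)
    # Per-sample loss values (reduction="none" keeps one loss per sample).
    losses = F.binary_cross_entropy_with_logits(
        logits, labels.float(), reduction="none")

    pos_losses = losses[labels == 1]   # keep every positive-sample loss
    neg_losses = losses[labels == 0]

    # Keep only part of the negatives: rank their losses from large to
    # small and take a target number tied to the positive count.
    target_num = min(len(neg_losses),
                     int(max_neg_ratio * max(len(pos_losses), 1)))
    hard_negs, _ = torch.topk(neg_losses, k=target_num)

    # The average of the selected loss values is the target loss value.
    target_loss = torch.cat([pos_losses, hard_negs]).mean()
    done = target_loss.item() < done_thresh   # one possible completion test
    return target_loss, done

When done is True, the trained initial model would be taken as the category detection model; otherwise the caller updates the parameters and repeats, as in the claims below.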
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention disclosed herein is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but are not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for generating a video category detection model, comprising:
obtaining a sample set, wherein the sample set comprises positive samples and negative samples, the number of positive samples is less than the number of negative samples, the samples in the sample set carry category labels, the samples in the sample set are sample videos, and the category labels are used for indicating the categories of the sample videos;
extracting part of the samples in the sample set to form a subset, and executing the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the category labels carried by the samples in the subset; selecting the loss values of the positive samples and the loss values of part of the negative samples in the subset, and determining the average of the selected loss values as a target loss value; determining, based on the target loss value, whether training of the initial model is complete; and if so, determining the trained initial model as a category detection model, wherein the category detection model is a video category detection model for detecting video categories;
wherein the selecting the loss values of the positive samples and the loss values of part of the negative samples in the subset comprises: selecting the loss values of the positive samples in the subset; and selecting the loss values of a target number of negative samples in descending order of loss value, wherein the ratio of the target number to the number of positive samples in the extracted subset is within a preset value interval.
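By way of illustration, a minimal sketch of this selection rule over plain lists of per-sample losses; the interval (2.0, 4.0) is an assumed preset, not a value taken from the claim.

def select_target_loss(pos_losses, neg_losses, interval=(2.0, 4.0)):
    # All positive-sample loss values in the subset are kept.
    selected = list(pos_losses)
    # Rank negative-sample losses in the order from large to small.
    ranked = sorted(neg_losses, reverse=True)
    # Pick a target number whose ratio to the positive count lies in the
    # preset value interval (here its upper bound, clipped to what exists).
    target_num = min(len(ranked), int(interval[1] * max(len(pos_losses), 1)))
    selected += ranked[:target_num]
    # The mean of the selected loss values is the target loss value.
    return sum(selected) / len(selected) if selected else 0.0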
2. The method for generating a video category detection model according to claim 1, wherein the selecting the loss values of the positive samples and the loss values of part of the negative samples in the subset comprises:
in response to determining that no positive sample exists in the subset, selecting a preset number or a preset proportion of loss values in descending order of loss value.
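A minimal sketch of this fallback, assuming an illustrative preset proportion of 0.25:

def select_when_no_positives(neg_losses, preset_num=None, preset_frac=0.25):
    # With no positive sample in the subset, keep a preset number (or a
    # preset proportion) of loss values, taken from largest to smallest.
    ranked = sorted(neg_losses, reverse=True)
    k = preset_num if preset_num is not None else max(1, int(preset_frac * len(ranked)))
    kept = ranked[:k]
    return sum(kept) / len(kept) if kept else 0.0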
3. The method for generating a video category detection model according to claim 1, further comprising:
in response to determining that training of the initial model is not complete, updating parameters of the initial model based on the target loss value, re-extracting samples from the sample set to form a new subset, and continuing the training steps using the updated initial model as the initial model.
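A minimal sketch of the resulting outer loop, reusing the training_step sketch given earlier; draw_subset is a hypothetical helper, and the batch size and step limit are assumptions:

def train(model, sample_set, optimizer, max_steps=10_000):
    for _ in range(max_steps):
        # Re-extract part of the samples from the sample set to form a subset.
        videos, labels = draw_subset(sample_set, batch_size=64)  # hypothetical helper
        target_loss, done = training_step(model, videos, labels)
        if done:
            # The trained initial model is the video category detection model.
            return model
        # Not finished: update parameters in the initial model based on the
        # target loss value, then continue with the updated model.
        optimizer.zero_grad()
        target_loss.backward()
        optimizer.step()
    return model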
4. The method for generating a video category detection model according to claim 1, wherein the initial model is obtained by:
training an initial model by using a machine learning method, taking the samples in the sample set as input and the category labels of the input samples as output.
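A minimal sketch of such pretraining, assuming a linear classifier over pre-extracted video features; iter_minibatches is a hypothetical helper, and the feature dimension, learning rate, and epoch count are assumptions:

import torch
import torch.nn.functional as F

def build_initial_model(sample_set, feat_dim=2048, lr=0.01, epochs=1):
    # An illustrative linear classifier over pre-extracted video features.
    model = torch.nn.Linear(feat_dim, 1)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for videos, labels in iter_minibatches(sample_set):  # hypothetical helper
            loss = F.binary_cross_entropy_with_logits(
                model(videos).squeeze(-1), labels.float())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model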
5. An apparatus for generating a video category detection model, comprising:
an acquiring unit configured to acquire a sample set, wherein the sample set comprises positive samples and negative samples, the number of positive samples is less than the number of negative samples, the samples in the sample set carry category labels, the samples in the sample set are sample videos, and the category labels are used for indicating the categories of the sample videos;
a training unit configured to extract part of the samples from the sample set to form a subset, and to perform the following training steps: inputting the samples in the subset into an initial model, and determining the loss value of each input sample based on the information output by the initial model and the category labels carried by the samples in the subset; selecting the loss values of the positive samples and the loss values of part of the negative samples in the subset, and determining the average of the selected loss values as a target loss value; determining, based on the target loss value, whether training of the initial model is complete; and if so, determining the trained initial model as a category detection model, wherein the category detection model is a video category detection model for detecting video categories;
wherein the training unit is further configured to: select the loss values of the positive samples in the subset; and select the loss values of a target number of negative samples in descending order of loss value, wherein the ratio of the target number to the number of positive samples in the extracted subset is within a preset value interval.
6. The apparatus for generating a video category detection model according to claim 5, wherein the training unit is further configured to:
in response to determining that no positive sample exists in the subset, select a preset number or a preset proportion of loss values in descending order of loss value.
7. The apparatus for generating a video category detection model according to claim 5, wherein the apparatus further comprises:
an updating unit configured to, in response to determining that training of the initial model is not complete, update parameters of the initial model based on the target loss value, re-extract samples from the sample set to form a new subset, and continue the training steps using the updated initial model as the initial model.
8. The apparatus for generating a video category detection model according to claim 5, wherein the initial model is obtained by:
training an initial model by using a machine learning method, taking the samples in the sample set as input and the category labels of the input samples as output.
9. A method for detecting video categories, comprising:
receiving a target video;
inputting frames of the target video into the video category detection model generated by the method according to claim 1, to obtain a video category detection result.
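A minimal sketch of this detection step, assuming per-frame features, mean aggregation over frames, and a 0.5 threshold; none of these choices are fixed by the claim:

import torch

def detect_video_category(model, frames, threshold=0.5):
    # frames: a tensor of per-frame features extracted from the target video.
    with torch.no_grad():
        scores = torch.sigmoid(model(frames).squeeze(-1))
    # Aggregate the frame scores (mean here) and threshold them into a
    # video-level detection result.
    return bool(scores.mean().item() > threshold)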
10. An apparatus for detecting video categories, comprising:
a receiving unit configured to receive a target video;
an input unit configured to input frames of the target video into the video category detection model generated by the method according to claim 1, to obtain a video category detection result.
11. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-4 and 9.
12. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-4 and 9.
CN201811273681.3A 2018-10-30 2018-10-30 Method and apparatus for generating a model Active CN109447156B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811273681.3A CN109447156B (en) 2018-10-30 2018-10-30 Method and apparatus for generating a model

Publications (2)

Publication Number Publication Date
CN109447156A CN109447156A (en) 2019-03-08
CN109447156B true CN109447156B (en) 2022-05-17

Family

ID=65549749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811273681.3A Active CN109447156B (en) 2018-10-30 2018-10-30 Method and apparatus for generating a model

Country Status (1)

Country Link
CN (1) CN109447156B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344908B * 2018-10-30 2020-04-28 Beijing ByteDance Network Technology Co., Ltd. Method and apparatus for generating a model
CN110070505A * 2019-04-12 2019-07-30 Beijing Megvii Technology Co., Ltd. Method and apparatus for enhancing the noise robustness of image classification
CN112434073B * 2019-08-24 2024-03-19 Beijing Horizon Robotics Technology Research and Development Co., Ltd. Method and device for determining a sample selection model
CN110728306A * 2019-09-17 2020-01-24 Ping An Technology (Shenzhen) Co., Ltd. Target parameter selection method in reverse proxy evaluation model and related device
CN112347278A * 2019-10-25 2021-02-09 Beijing Wodong Tianjun Information Technology Co., Ltd. Method and apparatus for training a characterization model
CN113079130B * 2020-01-06 2022-08-19 Shanghai Jiao Tong University Multimedia management and control system and management and control method
CN111770317B * 2020-07-22 2023-02-03 Ping An International Smart City Technology Co., Ltd. Video monitoring method, device, equipment and medium for an intelligent community
CN112395179B * 2020-11-24 2023-03-10 AInnovation (Xi'an) Technology Co., Ltd. Model training method, disk prediction method, device and electronic equipment
CN113780485B * 2021-11-12 2022-03-25 Zhejiang Dahua Technology Co., Ltd. Image acquisition, target recognition and model training method and equipment
CN114140670A * 2021-11-25 2022-03-04 Alipay (Hangzhou) Information Technology Co., Ltd. Method and device for model ownership verification based on exogenous features

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8121967B2 (en) * 2008-06-18 2012-02-21 International Business Machines Corporation Structural data classification
US9667636B2 (en) * 2015-04-27 2017-05-30 Cisco Technology, Inc. Detecting network address translation devices in a network based on network traffic logs

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102009009228A1 * 2009-02-17 2010-08-26 GEMAC-Gesellschaft für Mikroelektronikanwendung Chemnitz mbH Agglutination-based detection of disease comprises adding substrate of buffer, measuring initial intensity of buffer, diluting blood sample with buffer, measuring reference intensity and originating test person with disease to diagnose
KR20170083419A * 2016-01-08 2017-07-18 Mauki Studio Co., Ltd. Deep learning model training method using many unlabeled training data and deep learning system performing the same
CN106485230A * 2016-10-18 2017-03-08 Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences Neural-network-based face detection model training method, face detection method and system
CN106528771A * 2016-11-07 2017-03-22 Sun Yat-sen University Fast structural SVM text classification optimization algorithm
CN107766860A * 2017-10-31 2018-03-06 Wuhan University Natural scene image text detection method based on cascaded convolutional neural networks
CN107909021A * 2017-11-07 2018-04-13 Zhejiang Normal University Road sign detection method based on a single deep convolutional neural network
CN108364073A * 2018-01-23 2018-08-03 Institute of Computing Technology, Chinese Academy of Sciences Multi-label learning method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks; Shaoqing Ren et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 6 June 2016; Vol. 39, No. 6, pp. 1137-1149 *
Research on the Classification of Imbalanced Data Sets Based on Hybrid Sampling; Ouyang Yuanyou; China Master's Theses Full-text Database, Information Science and Technology; 15 January 2015; Vol. 2015, No. 01, pp. I140-42 *
An Improved Proximal Support Vector Machine Algorithm for Handling Imbalanced Samples; Liu Yan et al.; Journal of Computer Applications; 10 June 2014; Vol. 34, No. 6, pp. 1618-1621 *

Also Published As

Publication number Publication date
CN109447156A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN109447156B (en) Method and apparatus for generating a model
CN109344908B (en) Method and apparatus for generating a model
CN109376267B (en) Method and apparatus for generating a model
CN109214343B (en) Method and device for generating face key point detection model
US11915134B2 (en) Processing cell images using neural networks
CN109492128B (en) Method and apparatus for generating a model
CN108830235B (en) Method and apparatus for generating information
US10565442B2 (en) 2020-02-18 Picture recognition method and apparatus, computer device and computer-readable medium
CN112966712B (en) Language model training method and device, electronic equipment and computer readable medium
CN109145828B (en) Method and apparatus for generating video category detection model
CN108280477B (en) Method and apparatus for clustering images
CN109308490B (en) Method and apparatus for generating information
CN108197652B (en) Method and apparatus for generating information
CN109034069B (en) Method and apparatus for generating information
CN108960316B (en) Method and apparatus for generating a model
CN109740018B (en) Method and device for generating video label model
CN110363220B (en) Behavior class detection method and device, electronic equipment and computer readable medium
CN109447246B (en) Method and apparatus for generating a model
CN109214501B (en) Method and apparatus for identifying information
CN110070076B (en) Method and device for selecting training samples
CN108509994B (en) Method and device for clustering character images
CN110209658B (en) Data cleaning method and device
CN112200173B (en) Multi-network model training method, image labeling method and face image recognition method
CN111915086A (en) Abnormal user prediction method and equipment
CN109064464B (en) Method and device for detecting burrs of battery pole piece

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant