WO2020087974A1 - Method and apparatus for generating a model - Google Patents

Method and apparatus for generating a model

Info

Publication number
WO2020087974A1
WO2020087974A1 (PCT/CN2019/095078)
Authority
WO
WIPO (PCT)
Prior art keywords
low, video, sample, quality, probability
Application number
PCT/CN2019/095078
Other languages
English (en)
French (fr)
Inventor
袁泽寰
王长虎
Original Assignee
北京字节跳动网络技术有限公司
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2020087974A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content

Definitions

  • The embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for generating a model.
  • With the development of computer technology, short-video applications have emerged, and users can use them to upload and publish videos. After receiving a video, the server can detect the video to determine whether it is a low-quality video.
  • A low-quality video is generally a video of relatively low quality; for example, it may include a blurred video, a black-screen video, or a screen-recording video.
  • In the related art, videos are usually divided into multiple categories, for example, black-screen videos, screen-recording videos, blurred videos, and normal videos.
  • A classification model is trained to determine the probability that a video belongs to each category; the sum of the probabilities that the video belongs to the abnormal categories is taken as the probability that the video is a low-quality video, which in turn determines whether the video is low-quality.
  • the embodiments of the present application provide a method and a device for generating a model.
  • an embodiment of the present application provides a method for generating a model.
  • The method includes: obtaining a sample set, where a sample in the sample set includes a sample video and first labeling information indicating whether the sample video belongs to low-quality video; in the case that the sample video belongs to low-quality video, the sample further includes second labeling information indicating the low-quality category of the sample video, there being multiple low-quality categories.
  • The method further includes: extracting a sample from the sample set and performing the following training process: inputting the frames of the sample video in the extracted sample into an initial model to obtain the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category; determining the loss value of the sample based on the labeling information in the extracted sample, the obtained probabilities, and a pre-established loss function; comparing the loss value with a target value to determine whether training of the initial model is completed; and, in response to determining that training of the initial model is completed, determining the trained initial model as a low-quality video detection model.
  • In a second aspect, an embodiment of the present application provides an apparatus for generating a model.
  • The apparatus includes: an acquiring unit configured to acquire a sample set, where a sample in the sample set includes a sample video and first labeling information indicating whether the sample video belongs to low-quality video; in the case that the sample video belongs to low-quality video, the sample further includes second labeling information indicating the low-quality category of the sample video, there being multiple low-quality categories.
  • The apparatus further includes a training unit configured to extract samples from the sample set and perform the following training process: input the frames of the sample video in the extracted sample into an initial model to obtain the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category; determine the loss value of the sample based on the labeling information in the extracted sample, the obtained probabilities, and a pre-established loss function; compare the loss value with a target value to determine whether training of the initial model is completed; and, in response to determining that training is completed, determine the trained initial model as a low-quality video detection model.
  • In a third aspect, an embodiment of the present application provides a method for detecting low-quality video, including: receiving a low-quality video detection request containing a target video; inputting the frames of the target video into a low-quality video detection model generated by the method described in any embodiment of the first aspect to obtain a detection result, where the detection result includes the probability that the target video belongs to low-quality video; and, in response to determining that the probability that the target video belongs to low-quality video is greater than a first preset threshold, determining that the target video is a low-quality video.
  • In a fourth aspect, an embodiment of the present application provides an apparatus for detecting low-quality video, including: a first receiving unit configured to receive a low-quality video detection request containing a target video; an input unit configured to input the frames of the target video into a low-quality video detection model generated by the method described in any embodiment of the first aspect to obtain a detection result, where the detection result includes the probability that the target video belongs to low-quality video; and a first determining unit configured to determine that the target video is a low-quality video in response to determining that the probability that the target video belongs to low-quality video is greater than a first preset threshold.
  • In a fifth aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a storage device on which at least one program is stored, where the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any embodiment of the first or third aspect.
  • In a sixth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method according to any embodiment of the first or third aspect.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of the method for generating a model according to the present application;
  • FIG. 3 is a schematic diagram of an application scenario of the method for generating a model according to the present application;
  • FIG. 4 is a flowchart of still another embodiment of the method for generating a model according to the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for generating a model according to the present application.
  • FIG. 6 is a flowchart of an embodiment of a method for detecting low-quality video according to the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of an apparatus for detecting low-quality video according to the present application.
  • FIG. 8 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • FIG. 1 shows an exemplary system architecture 100 to which the method or apparatus for generating a model of the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • the network 104 is a medium used to provide a communication link between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, and so on.
  • Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as video recording applications, video playback applications, voice interaction applications, search applications, instant messaging tools, email clients, and social platform software.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • the terminal devices 101, 102, and 103 may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, and desktop computers.
  • When the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
  • When the terminal devices 101, 102, and 103 are hardware, an image acquisition device may also be installed on them.
  • the image acquisition device may be various devices capable of realizing image acquisition functions, such as cameras, sensors, and so on. Users can use the image acquisition devices on the terminal devices 101, 102, and 103 to collect video.
  • the server 105 may be a server that provides various services, for example, a video processing server for storing, managing, or analyzing videos uploaded by the terminal devices 101, 102, and 103.
  • the video processing server can obtain the sample set. A large number of samples can be included in the sample set.
  • the samples in the sample set may include sample video, first labeling information indicating whether the sample video belongs to low-quality video, and second labeling information indicating low-quality categories of the sample video belonging to the low-quality video.
  • the video processing server can use the samples in the sample set to train the initial model, and can store the training results (such as the generated low-quality video detection model). In this way, after the user uploads the video using the terminal devices 101, 102, and 103, the server 105 can detect whether the video uploaded by the user is a low-quality video, and further, can perform operations such as prompt information push.
  • the server 105 may be hardware or software.
  • When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server.
  • When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
  • the method for generating models provided by the embodiments of the present application is generally executed by the server 105, and accordingly, the device for generating models is generally provided in the server 105.
  • It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are only illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
  • the method for generating a model includes the following steps:
  • step 201 a sample set is obtained.
  • In this embodiment, the execution subject of the method for generating a model (for example, the server 105 shown in FIG. 1) can obtain the sample set in various ways.
  • For example, the execution subject may obtain, through a wired or wireless connection, a sample set stored in another server configured to store samples (for example, a database server).
  • a user may collect samples through a terminal device (such as the terminal devices 101, 102, and 103 shown in FIG. 1). In this way, the above-mentioned execution subject can receive the samples collected by the terminal and store these samples locally, thereby generating a sample set.
  • The wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, ultra-wideband (UWB), and other wireless connection methods that are currently known or developed in the future.
  • the sample set may include a large number of samples.
  • A sample may include a sample video and first labeling information indicating whether the sample video belongs to low-quality video. For example, when the sample video belongs to low-quality video, the first labeling information may be "1"; when it does not, the first labeling information may be "0".
  • In the case that the sample video in a sample belongs to low-quality video, the sample further includes second labeling information indicating the low-quality category of the sample video.
  • It should be noted that a low-quality video is generally a video of relatively low quality.
  • For example, low-quality video may include, but is not limited to, blurred video, black-screen video, and screen-recording video.
  • Correspondingly, the low-quality categories may include, but are not limited to, the blurred video category, the black-screen video category, and the screen-recording video category.
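  • By way of illustration only (this layout is not part of the original disclosure, and all field names are hypothetical), a sample carrying the two kinds of labeling information might look as follows:

```python
# Hypothetical sample layout: the second labeling information is present
# only when the first labeling information marks the video as low-quality.
sample_low_quality = {
    "video_frames": "path/to/frames",        # frames of the sample video
    "is_low_quality": 1,                     # first labeling information
    "low_quality_category": "black_screen",  # second labeling information
}
sample_normal = {
    "video_frames": "path/to/other_frames",
    "is_low_quality": 0,  # normal video: no second labeling information
}
```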
  • step 202 samples are extracted from the sample set.
  • the execution subject may extract samples from the sample set acquired in step 201, and perform the training process from step 203 to step 206.
  • The manner of sample extraction is not limited in this application.
  • For example, a sample may be selected at random, or the next sample may be extracted from the sample set in a specified order.
  • step 203 the frames in the sample video in the extracted samples are input to the initial model, and the probability that the sample video belongs to the low-quality video and the probability that the sample video belongs to each low-quality category are respectively obtained.
  • The execution subject may input the frames of the sample video in the sample extracted in step 202 into the initial model.
  • By performing feature extraction and analysis on the frames, the initial model can output the probability that the sample video belongs to low-quality video, as well as the probability that the sample video belongs to each low-quality category. It should be noted that the probability that the sample video belongs to a low-quality category can be understood as the conditional probability that the sample video belongs to that category given that it is a low-quality video. For example, if the probability that a video is low-quality is 0.8 and the conditional probability of the black-screen category is 0.5, the overall probability that the video is a black-screen video is 0.8 × 0.5 = 0.4.
  • the initial model may be various models with image feature extraction function and classification function created based on machine learning technology.
  • the initial model can perform feature extraction on the frames in the video, and then perform fusion, analysis and other processing on the extracted features, and finally output the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category.
  • In practice, during training, the probabilities output by the initial model are usually inaccurate.
  • The purpose of training the initial model is to make the probabilities output by the trained model more accurate.
  • the initial model may be a convolutional neural network using structures in various related technologies (eg, DenseBox, VGGNet, ResNet, SegNet, etc.).
  • In practice, a convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within part of their coverage area, and it performs well in image processing. A convolutional neural network can therefore be used to extract frame features from the sample video.
  • In this example, the established convolutional neural network may include convolutional layers, pooling layers, a feature fusion layer, fully connected layers, and so on. The convolutional layers can be used to extract image features.
  • the pooling layer can be used to downsample input information.
  • The feature fusion layer may be used to fuse the image features obtained for each frame (for example, in the form of a feature matrix or a feature vector). For example, the feature values at the same position in the feature matrices corresponding to different frames may be averaged to generate a fused feature matrix.
  • the fully connected layer can be used to classify the resulting features.
  • It can be understood that, since the initial model outputs both probabilities, the fully connected layer can be composed of two parts: one part outputs the probability that the sample video belongs to low-quality video, and the other outputs the probability that the sample video belongs to each low-quality category. In practice, each part can use an independent softmax function to calculate its probabilities.
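  • As an illustrative sketch only, such a two-part output might be written as follows in PyTorch; the backbone layers, dimensions, and names are assumptions made for the example, not the applicant's actual implementation:

```python
import torch
import torch.nn as nn

class LowQualityDetector(nn.Module):
    """Sketch of the initial model: per-frame features, averaged fusion, two heads."""
    def __init__(self, num_low_quality_categories: int = 3):
        super().__init__()
        # convolutional and pooling layers extract per-frame image features
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # pooling layer downsamples the input
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # the fully connected layer is composed of two parts
        self.low_quality_head = nn.Linear(64, 2)  # low-quality vs. normal
        self.category_head = nn.Linear(64, num_low_quality_categories)

    def forward(self, frames: torch.Tensor):
        # frames: (num_frames, 3, H, W) for one video
        per_frame = self.backbone(frames).flatten(1)  # (num_frames, 64)
        fused = per_frame.mean(dim=0)  # feature fusion layer: average over frames
        # each part uses an independent softmax
        p_low = torch.softmax(self.low_quality_head(fused), dim=-1)[1]
        p_cat = torch.softmax(self.category_head(fused), dim=-1)  # P(category | low-quality)
        return p_low, p_cat
```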
  • the above initial model may also be another model with image feature extraction function and classification function, which is not limited to the above example, and the specific model structure is not limited here.
  • step 204 the loss value of the sample is determined based on the labeled information in the extracted sample, the obtained probability, and the pre-established loss function.
  • the execution subject may determine the loss value of the sample based on the extracted labeling information (including the first labeling information and the second labeling information) in the sample, the obtained probability, and the pre-established loss function.
  • the loss function can be used to estimate the degree of inconsistency between the information (such as probability) output by the initial model and the true value (such as labeled information).
  • In general, the smaller the value of the loss function (the loss value), the better the robustness of the model.
  • the loss function can be set according to actual needs.
  • the loss function can be set to take into account the two-part loss (for example, it can be set as the sum of the two-part loss or the weighted result of the two-part loss).
  • One part of the loss can be used to characterize the degree of difference between the probability, output by the initial model, that the sample video belongs to low-quality video and the true value (given by the first labeling information: if the first labeling information indicates that the sample video is low-quality, the true value is 1; otherwise, 0).
  • The other part of the loss can be used to characterize the degree of difference between the probability, output by the initial model, that the sample video belongs to the low-quality category indicated by the second labeling information and the true value (such as 1).
  • It should be noted that when the extracted sample does not contain second labeling information, this part of the loss can be set to a preset value (for example, 0).
  • the two partial losses can be calculated using cross-entropy loss.
  • the above-mentioned execution subject may determine the loss value of the sample according to the following steps:
  • the first label information in the extracted sample and the probability that the sample video belongs to low-quality video are input to a pre-established first loss function to obtain a first loss value.
  • the first loss function can be used to characterize the difference between the probability that the sample video output by the initial model belongs to low-quality video and the first annotation information.
  • the first loss function can use cross-entropy loss.
  • Second, in response to determining that the extracted sample does not contain second labeling information, the first loss value may be determined as the loss value of the extracted sample.
  • In an embodiment, in response to determining that the extracted sample contains second labeling information, the execution subject may determine the loss value of the sample as follows. First, the low-quality category indicated by the second labeling information in the extracted sample is taken as the target category. Then, the second labeling information contained in the extracted sample and the probability, output by the initial model, that the sample video belongs to the target category may be input into a pre-established second loss function to obtain a second loss value.
  • the second loss function can be used to characterize the difference between the probability that the sample video output by the initial model belongs to the target category (that is, the low-quality category indicated by the second annotation information) and the true value (for example, 1).
  • the second loss function can also use cross-entropy loss.
  • the sum of the first loss value and the second loss value may be determined as the loss value of the extracted sample.
  • The loss value of the sample can also be obtained in other ways.
  • For example, a weighted result of the first loss value and the second loss value may be determined as the loss value of the extracted sample.
  • The weights may be preset by a technician as needed.
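  • The two-part loss described above can be sketched as follows, assuming the model outputs of the previous sketch; this is a minimal illustration of the cross-entropy forms the text refers to, not the applicant's actual loss code:

```python
import torch
import torch.nn.functional as F

def sample_loss(p_low, p_cat, is_low_quality, category_index=None, weight=1.0):
    """Loss of one sample: a sum (or weighted sum) of two cross-entropy parts.

    p_low: probability that the sample video is low-quality (scalar tensor).
    p_cat: probabilities over the low-quality categories (conditional).
    is_low_quality: first labeling information (1 or 0).
    category_index: index given by the second labeling information, or None.
    """
    # first loss: cross-entropy against the first labeling information
    target = torch.tensor(float(is_low_quality))
    first_loss = F.binary_cross_entropy(p_low, target)
    if category_index is None:
        # no second labeling information: this part is a preset value (0)
        return first_loss
    # second loss: cross-entropy for the target category (true value 1)
    second_loss = -torch.log(p_cat[category_index])
    return first_loss + weight * second_loss
```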
  • step 205 the loss value is compared with the target value to determine whether the initial model has been trained.
  • In this embodiment, the execution subject may determine whether training of the initial model is completed based on a comparison of the determined loss value with the target value.
  • As an example, the execution subject may determine whether the loss value has converged. When the loss value has converged, it can be determined that the initial model has been trained.
  • As another example, the execution subject may first compare the loss value with the target value. In response to determining that the loss value is less than or equal to the target value, it may count, among the loss values determined in the most recent preset number of training iterations (for example, the last 100), the proportion that are less than or equal to the target value.
  • When that proportion is greater than a preset ratio (for example, 95%), it can be determined that training of the initial model is completed.
  • It should be noted that multiple (at least two) samples may be extracted in step 202.
  • In this case, the operations described in steps 202 to 204 can be used to calculate the loss value of each sample.
  • The execution subject can then compare the loss value of each sample with the target value, thereby determining whether each loss value is less than or equal to the target value.
  • It should be pointed out that the target value can generally be used to represent the ideal degree of inconsistency between the predicted value and the true value. That is to say, when the loss value is less than or equal to the target value, the predicted value can be considered close to or approximating the true value.
  • The target value can be set according to actual needs.
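  • A minimal sketch of the convergence criterion just described; the window size, ratio, and function names are illustrative assumptions:

```python
from collections import deque

def make_convergence_checker(target_value, window=100, ratio=0.95):
    """Training is deemed complete when, among the most recent `window`
    loss values, the proportion <= target_value exceeds `ratio`."""
    recent = deque(maxlen=window)
    def trained(loss_value):
        recent.append(loss_value)
        hits = sum(1 for v in recent if v <= target_value)
        return len(recent) == window and hits / window > ratio
    return trained
```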
  • It should be noted that, in response to determining that the initial model has been trained, execution may continue with step 206.
  • In response to determining that the initial model has not been trained, the parameters of the initial model can be updated based on the determined loss values of the samples, samples can be re-extracted from the sample set, and the above training process can be continued using the updated model as the initial model.
  • the gradient of the loss value relative to the model parameters can be obtained using a back propagation algorithm, and then the model parameters can be updated based on the gradient using a gradient descent algorithm.
  • the above-mentioned back propagation algorithm, gradient descent algorithm and machine learning method are well-known technologies that have been widely researched and applied at present, and will not be repeated here.
  • The manner of sample extraction here is likewise not limited in this application. For example, when the sample set contains a large number of samples, the execution subject may extract samples that have not been extracted before.
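  • Putting the pieces together, the training process might be sketched as follows; this reuses sample_loss and make_convergence_checker from the sketches above, and plain SGD stands in for the gradient descent algorithm mentioned in the text (all of this is illustrative, not the applicant's code):

```python
import random
import torch

def training_process(model, sample_set, target_value, lr=1e-3):
    """Extract a sample, compute its loss, check completion, otherwise update."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # gradient descent
    trained = make_convergence_checker(target_value)
    while True:
        sample = random.choice(sample_set)  # extraction order is not limited
        p_low, p_cat = model(sample["frames"])
        loss = sample_loss(p_low, p_cat,
                           sample["is_low_quality"],
                           sample.get("category_index"))
        if trained(loss.item()):
            # the trained model becomes the low-quality video detection model
            return model
        optimizer.zero_grad()
        loss.backward()   # back propagation obtains the gradients
        optimizer.step()  # gradient descent updates the parameters
```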
  • step 206 in response to determining that the initial model training is completed, the trained initial model is determined to be a low-quality video detection model.
  • the above-mentioned execution subject may determine the trained initial model as a low-quality video detection model.
  • the low-quality video detection model can detect whether the video is low-quality video, and at the same time, can detect the low-quality category of the low-quality video.
  • FIG. 3 is a schematic diagram of an application scenario of the method for generating a model according to this embodiment.
  • a model training application may be installed on the terminal device 301 used by the user. After the user opens the application and uploads the sample set or the storage path of the sample set, the server 302 that provides background support for the application can run a method for generating a low-quality video detection model, including:
  • the sample set can be obtained.
  • The samples in the sample set may include a sample video 303, first labeling information 304 indicating whether the sample video belongs to low-quality video, and second labeling information 305 indicating the low-quality category of a sample video that belongs to low-quality video.
  • Samples can then be extracted from the sample set and the following training process performed: the frames of the sample video in the extracted sample are input into the initial model 306 to obtain the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category; the loss value 307 of the sample is determined based on the labeling information in the extracted sample, the obtained probabilities, and the pre-established loss function; and the loss value is compared with the target value to determine whether the initial model has been trained. If training of the initial model is completed, the trained initial model is determined as the low-quality video detection model 308.
  • In the method provided by the above embodiment of the present application, a sample set is obtained, from which samples can be extracted to train the initial model.
  • The samples in the sample set may include a sample video, first labeling information indicating whether the sample video belongs to low-quality video, and second labeling information indicating the low-quality category of a sample video that belongs to low-quality video.
  • Once training of the initial model is completed, the trained initial model can be determined as a low-quality video detection model.
  • A model usable for low-quality video detection is thereby obtained, which helps improve the efficiency of low-quality video detection.
  • FIG. 4 shows a flow 400 of yet another embodiment of a method of generating a model.
  • the process 400 of the method for generating a model includes the following steps:
  • step 401 a sample set is obtained.
  • the execution subject of the method of generating the model can obtain the sample set.
  • the sample may include a sample video and first labeling information indicating whether the sample video belongs to low-quality video.
  • the sample further includes second labeling information indicating the low-quality category of the sample video.
  • step 402 samples are extracted from the sample set.
  • the execution subject may extract samples from the sample set acquired in step 401, and perform the training process from step 403 to step 410.
  • The manner of sample extraction is not limited in this application.
  • For example, a sample may be selected at random, or the next sample may be extracted from the sample set in a specified order.
  • step 403 the frames in the sample video in the extracted samples are input to the initial model, respectively obtaining the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category.
  • The execution subject may input the frames of the sample video in the sample extracted in step 402 into the initial model.
  • By performing feature extraction and analysis on the frames, the initial model can output the probability that the sample video belongs to low-quality video, as well as the probability that the sample video belongs to each low-quality category.
  • the initial model may use a convolutional neural network created based on machine learning techniques.
  • The established convolutional neural network can include convolutional layers, pooling layers, a feature fusion layer, fully connected layers, and so on.
  • The fully connected layer can be composed of two parts: one part outputs the probability that the sample video belongs to low-quality video, and the other outputs the probability that the sample video belongs to each low-quality category. In practice, each part can use an independent softmax function to calculate its probabilities.
  • step 404 the first label information in the extracted sample and the probability that the sample video belongs to the low-quality video are input to a pre-established first loss function to obtain a first loss value.
  • In this embodiment, the execution subject may input the first labeling information in the extracted sample and the probability, obtained in step 403, that the sample video belongs to low-quality video into the pre-established first loss function to obtain the first loss value.
  • the first loss function can be used to characterize the difference between the probability that the sample video output by the initial model belongs to low-quality video and the first annotation information.
  • the first loss function can use cross-entropy loss.
  • step 405 it is determined whether the extracted sample contains second annotation information.
  • the above-mentioned execution subject may determine whether the extracted sample contains the second annotation information. If not, step 406 can be executed to determine the loss value of the sample. If so, steps 407-408 can be performed to determine the loss value of the sample.
  • step 406 in response to determining that the extracted sample does not contain the second label information, the first loss value is determined as the loss value of the extracted sample.
  • the above-mentioned execution subject may determine the first loss value as the loss value of the extracted sample.
  • In step 407, in response to determining that the extracted sample contains second labeling information, the low-quality category indicated by the second labeling information in the extracted sample is taken as the target category, and the second labeling information and the probability that the sample video belongs to the target category are input into a pre-established second loss function to obtain a second loss value.
  • In this embodiment, the execution subject may take the low-quality category indicated by the second labeling information in the extracted sample as the target category, and input the second labeling information contained in the extracted sample and the probability that the sample video belongs to the target category into the pre-established second loss function to obtain the second loss value.
  • the second loss function can be used to characterize the difference between the probability that the sample video output by the initial model belongs to the target category and the true value (for example, 1).
  • the second loss function can also use cross-entropy loss.
  • step 408 the sum of the first loss value and the second loss value is determined as the loss value of the extracted sample.
  • the execution subject may determine the sum of the first loss value and the second loss value as the loss value of the extracted sample.
  • step 409 the loss value is compared with the target value to determine whether the initial model has been trained.
  • In this embodiment, the execution subject may determine whether training of the initial model is completed based on a comparison of the determined loss value with the target value.
  • As an example, the execution subject may determine whether the loss value has converged. When the loss value has converged, it can be determined that the initial model has been trained.
  • As another example, the execution subject may first compare the loss value with the target value. In response to determining that the loss value is less than or equal to the target value, it may count, among the loss values determined in the most recent preset number of training iterations (for example, the last 100), the proportion that are less than or equal to the target value; when that proportion is greater than a preset ratio (for example, 95%), training can be determined to be completed.
  • The target value can generally be used to represent the ideal degree of inconsistency between the predicted value and the true value. That is to say, when the loss value is less than or equal to the target value, the predicted value can be considered close to or approximating the true value.
  • The target value can be set according to actual needs.
  • It should be noted that, in response to determining that the initial model has been trained, execution may continue with step 410.
  • In response to determining that the initial model has not been trained, the parameters of the initial model can be updated based on the determined loss values of the samples, samples can be re-extracted from the sample set, and the above training process can be continued using the updated model as the initial model.
  • the gradient of the loss value relative to the model parameters can be obtained using a back propagation algorithm, and then the model parameters can be updated based on the gradient using a gradient descent algorithm.
  • the above-mentioned back propagation algorithm, gradient descent algorithm and machine learning method are well-known technologies that have been widely researched and applied at present, and will not be repeated here.
  • The manner of sample extraction here is likewise not limited in this application. For example, when the sample set contains a large number of samples, the execution subject may extract samples that have not been extracted before.
  • step 410 in response to determining that the initial model training is completed, the trained initial model is determined to be a low-quality video detection model.
  • the above-mentioned execution subject may determine the trained initial model as a low-quality video detection model.
  • the low-quality video detection model can detect whether the video is low-quality video, and at the same time, can detect the low-quality category of the low-quality video.
  • The flow 400 of the method for generating a model in this embodiment details a way of calculating the loss value. Training the initial model with a loss value calculated in this way enables the trained model to detect low-quality video as well as the low-quality category of a low-quality video. At the same time, using the trained low-quality video detection model for video detection helps improve both the detection speed for low-quality videos and the detection of low-quality categories.
  • the present application provides an embodiment of a device for generating a model.
  • The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
  • As shown in FIG. 5, the apparatus 500 for generating a model includes: an obtaining unit 501 configured to obtain a sample set, where a sample in the sample set may include a sample video and first labeling information indicating whether the sample video belongs to low-quality video.
  • In the case that the sample video belongs to low-quality video, the sample further includes second labeling information indicating the low-quality category of the sample video.
  • The apparatus 500 further includes a training unit 502 configured to extract samples from the sample set and perform the following training process: input the frames of the sample video in the extracted sample into the initial model to obtain the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category; determine the loss value of the sample based on the labeling information in the extracted sample, the obtained probabilities, and the pre-established loss function; compare the loss value with the target value to determine whether training of the initial model is completed; and, in response to determining that training is completed, determine the trained initial model as a low-quality video detection model.
  • In an embodiment, the training unit 502 may be configured to: input the first labeling information in the extracted sample and the probability that the sample video belongs to low-quality video into the pre-established first loss function to obtain a first loss value; and, in response to determining that the extracted sample does not contain second labeling information, determine the first loss value as the loss value of the extracted sample.
  • In an embodiment, the training unit 502 may be configured to: in response to determining that the extracted sample contains second labeling information, take the low-quality category indicated by the second labeling information in the extracted sample as the target category; input the second labeling information contained in the extracted sample and the probability that the sample video belongs to the target category into a pre-established second loss function to obtain a second loss value; and determine the sum of the first loss value and the second loss value as the loss value of the extracted sample.
  • the apparatus may further include an update unit (not shown in the figure).
  • the above-mentioned updating unit may be configured to update the parameters in the initial model based on the determined loss value of the sample in response to determining that the initial model is not trained, and re-extract samples from the above-mentioned sample set, using the updated initial model as the Initial model, continue to perform the above training process.
  • In the apparatus provided by this embodiment, the obtaining unit 501 obtains a sample set, and the training unit 502 can extract samples from it to train the initial model.
  • the samples in the sample set may include sample video, first labeling information indicating whether the sample video belongs to low-quality video, and second labeling information indicating low-quality categories of the sample video belonging to the low-quality video.
  • the training unit 502 inputs the frames of the sample video in the extracted samples to the initial model, and can obtain the probability that the sample video output by the initial model belongs to the low-quality video and the probability that the sample video belongs to each low-quality category. Then, based on the labeled information in the extracted sample, the obtained probability and the loss function established in advance, the loss value of the sample can be determined.
  • Once training of the initial model is completed, the trained initial model can be determined as a low-quality video detection model.
  • a model that can be used for low-quality video detection can be obtained, which helps to improve the efficiency of low-quality video detection.
  • FIG. 6 shows a process 600 of an embodiment of a method for detecting low-quality video provided by the present application.
  • the method for detecting low-quality video may include the following steps:
  • step 601 a low-quality video detection request containing a target video is received.
  • an execution subject that detects low-quality video may receive a low-quality video detection request that includes a target video.
  • the target video may be a video to be subjected to low-quality video detection.
  • the target video may be stored in the above-mentioned execution subject in advance. It may also be sent by other electronic devices (eg, terminal devices 101, 102, and 103 shown in FIG. 1).
  • step 602 the frames in the target video are input into a low-quality video detection model to obtain a detection result.
  • the above-mentioned execution subject may input the frame in the target video into the low-quality video detection model to obtain the detection result.
  • the detection result may include the probability that the target video belongs to low-quality video.
  • the low-quality video detection model may be generated by using the method for generating a low-quality video detection model as described in the embodiment of FIG. 2 above. For the generation process, reference may be made to the related description in the embodiment of FIG. 2, and details are not described herein again.
  • step 603 in response to determining that the probability that the target video belongs to low-quality video is greater than the first preset threshold, the target video is determined to be low-quality video.
  • the above-mentioned execution subject may determine that the target video is the low-quality video.
  • the detection result may further include the probability that the target video belongs to each low-quality category among multiple low-quality categories.
  • In an embodiment, in response to receiving a low-quality category detection request, the execution subject may also perform the following operations:
  • First, the probability that the target video belongs to low-quality video may be taken as a first probability, and, for each low-quality category, the product of the probability that the target video belongs to that low-quality category and the first probability may be determined as the probability that the target video belongs to that category.
  • Then, a low-quality category whose probability is greater than a second preset value may be determined as a low-quality category of the target video.
  • In this way, the low-quality category of the target video can be determined.
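  • For illustration, detection with the trained model might be sketched as follows, combining the first preset threshold with the product rule just described; the threshold values and names are assumptions:

```python
import torch

def detect(model, frames, first_threshold=0.5, second_threshold=0.3):
    """Sketch: flag a low-quality video and its low-quality categories."""
    with torch.no_grad():
        p_low, p_cat = model(frames)  # model from the earlier sketch
    result = {"is_low_quality": p_low.item() > first_threshold}
    # probability of each low-quality category: product of the conditional
    # probability and the first probability
    joint = p_low * p_cat
    result["categories"] = [i for i, p in enumerate(joint.tolist())
                            if p > second_threshold]
    return result
```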
  • It should be noted that the method for detecting low-quality video in this embodiment may be used to test the low-quality video detection model generated by the foregoing embodiments, and the model can then be continuously optimized according to the test results. The method may also be a practical application of the low-quality video detection model generated by the above embodiments: using that model for low-quality video detection helps improve both the detection speed for low-quality videos and the detection of low-quality categories.
  • the present application provides an embodiment of an apparatus for detecting low-quality video.
  • the device embodiment corresponds to the method embodiment shown in FIG. 6, and the device can be applied to various electronic devices.
  • As shown in FIG. 7, the apparatus 700 for detecting low-quality video includes: a first receiving unit 701 configured to receive a low-quality video detection request containing a target video; an input unit 702 configured to input the frames of the target video into the low-quality video detection model to obtain a detection result, where the detection result includes the probability that the target video belongs to low-quality video; and a first determining unit 703 configured to determine that the target video is a low-quality video in response to determining that the probability that the target video belongs to low-quality video is greater than a first preset threshold.
  • the detection result may further include the probability that the target video belongs to each low-quality category among multiple low-quality categories.
  • the above device may further include a second receiving unit and a second determining unit (not shown in the figure).
  • The second receiving unit may be configured to, in response to receiving a low-quality category detection request, take the probability that the target video belongs to low-quality video as a first probability and, for each low-quality category, determine the product of the probability that the target video belongs to that low-quality category and the first probability as the probability that the target video belongs to that category.
  • The second determining unit may be configured to determine a low-quality category whose probability is greater than a second preset value as the low-quality category of the target video.
  • the units recorded in the device 700 correspond to the various steps in the method described with reference to FIG. 6. Therefore, the operations, features, and beneficial effects described above for the method are also applicable to the device 700 and the units included therein, and details are not described herein again.
  • FIG. 8 shows a schematic structural diagram of a computer system 800 suitable for implementing an electronic device according to an embodiment of the present application.
  • the electronic device shown in FIG. 8 is only an example, and should not bring any limitation to the functions and use scope of the embodiments of the present application.
  • As shown in FIG. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the system 800.
  • the CPU 801, ROM 802, and RAM 803 are connected to each other through a bus 804.
  • An input / output (Input / Output, I / O) interface 805 is also connected to the bus 804.
  • The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 808; and a communication section 809.
  • the communication section 809 performs communication processing via a network such as the Internet.
  • A drive 810 is also connected to the I/O interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed on the drive 810 as necessary, so that a computer program read therefrom is installed into the storage portion 808 as necessary.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart.
  • The computer program may be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811.
  • When the computer program is executed by the central processing unit (CPU) 801, the functions defined in the method of the present application are performed.
  • the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with at least one wire, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal that is propagated in a baseband or as part of a carrier wave, in which a computer-readable program code is carried.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a computer-readable medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium may be transmitted on any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (Radio Frequency, RF), etc., or any suitable combination of the foregoing.
  • Each block in the flowchart or block diagrams may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains at least one executable instruction for implementing the specified logical function.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks represented in succession may actually be executed in parallel, and they may sometimes be executed in reverse order, depending on the functions involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented in software or hardware.
  • The described units may also be provided in a processor; for example, it may be described as: a processor including an obtaining unit and a training unit.
  • The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the obtaining unit may also be described as "a unit for obtaining a sample set".
  • the present application also provides a computer-readable medium, which may be included in the device described in the foregoing embodiments; or may exist alone without being assembled into the device.
  • The computer-readable medium carries at least one program which, when executed by the device, causes the device to: obtain a sample set; extract samples from the sample set and perform the following training process: input the frames of the sample video in the extracted sample into the initial model to obtain the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category; determine the loss value of the sample based on the labeling information in the extracted sample, the obtained probabilities, and the pre-established loss function; determine whether training of the initial model is completed based on a comparison of the loss value with the target value; and, in response to determining that training is completed, determine the trained initial model as a low-quality video detection model.

Abstract

A method and apparatus for generating a model. An example embodiment of the method includes: obtaining a sample set (201); extracting a sample from the sample set (202) and performing the following training process: inputting the frames of the sample video in the extracted sample into an initial model to obtain the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category (203); determining the loss value of the sample based on the labeling information in the extracted sample, the obtained probabilities, and a pre-established loss function (204); comparing the loss value with a target value to determine whether training of the initial model is completed (205); and, in response to determining that training of the initial model is completed, determining the trained initial model as a low-quality video detection model (206).

Description

Method and apparatus for generating a model
This application claims priority to Chinese Patent Application No. 201811273468.2, filed with the Chinese Patent Office on October 30, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for generating a model.
Background
With the development of computer technology, short-video applications have emerged. Users can use short-video applications to upload and publish videos. After receiving a video, the server can detect the video to determine whether it is a low-quality video. Here, a low-quality video is generally a video of relatively low quality; for example, it may include a blurred video, a black-screen video, or a screen-recording video.
In the related art, videos are usually divided into multiple categories, for example, black-screen videos, screen-recording videos, blurred videos, and normal videos. A classification model is trained to determine the probability that a video belongs to each category, and the sum of the probabilities that the video belongs to the abnormal categories is taken as the probability that the video is a low-quality video, which in turn determines whether the video is low-quality.
Summary
The embodiments of the present application provide a method and apparatus for generating a model.
In a first aspect, an embodiment of the present application provides a method for generating a model. The method includes: obtaining a sample set, where a sample in the sample set includes a sample video and first labeling information indicating whether the sample video belongs to low-quality video, and, in the case that the sample video belongs to low-quality video, the sample further includes second labeling information indicating the low-quality category of the sample video, there being multiple low-quality categories; extracting a sample from the sample set and performing the following training process: inputting the frames of the sample video in the extracted sample into an initial model to obtain the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category; determining the loss value of the sample based on the labeling information in the extracted sample, the obtained probabilities, and a pre-established loss function; comparing the loss value with a target value to determine whether training of the initial model is completed; and, in response to determining that training of the initial model is completed, determining the trained initial model as a low-quality video detection model.
In a second aspect, an embodiment of the present application provides an apparatus for generating a model. The apparatus includes: an acquiring unit configured to acquire a sample set, where a sample in the sample set includes a sample video and first labeling information indicating whether the sample video belongs to low-quality video, and, in the case that the sample video belongs to low-quality video, the sample further includes second labeling information indicating the low-quality category of the sample video, there being multiple low-quality categories; and a training unit configured to extract samples from the sample set and perform the following training process: input the frames of the sample video in the extracted sample into an initial model to obtain the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category; determine the loss value of the sample based on the labeling information in the extracted sample, the obtained probabilities, and a pre-established loss function; compare the loss value with a target value to determine whether training of the initial model is completed; and, in response to determining that training of the initial model is completed, determine the trained initial model as a low-quality video detection model.
In a third aspect, an embodiment of the present application provides a method for detecting low-quality video, including: receiving a low-quality video detection request containing a target video; inputting the frames of the target video into a low-quality video detection model generated by the method described in any embodiment of the first aspect to obtain a detection result, where the detection result includes the probability that the target video belongs to low-quality video; and, in response to determining that the probability that the target video belongs to low-quality video is greater than a first preset threshold, determining that the target video is a low-quality video.
In a fourth aspect, an embodiment of the present application provides an apparatus for detecting low-quality video, including: a first receiving unit configured to receive a low-quality video detection request containing a target video; an input unit configured to input the frames of the target video into a low-quality video detection model generated by the method described in any embodiment of the first aspect to obtain a detection result, where the detection result includes the probability that the target video belongs to low-quality video; and a first determining unit configured to determine that the target video is a low-quality video in response to determining that the probability that the target video belongs to low-quality video is greater than a first preset threshold.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a storage device on which at least one program is stored, where the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any embodiment of the first or third aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method according to any embodiment of the first or third aspect.
Brief Description of the Drawings
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
FIG. 2 is a flowchart of an embodiment of the method for generating a model according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the method for generating a model according to the present application;
FIG. 4 is a flowchart of still another embodiment of the method for generating a model according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the apparatus for generating a model according to the present application;
FIG. 6 is a flowchart of an embodiment of the method for detecting low-quality video according to the present application;
FIG. 7 is a schematic structural diagram of an embodiment of the apparatus for detecting low-quality video according to the present application;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
Detailed Description
The present application is described in detail below with reference to the drawings and embodiments. It should be understood that the example embodiments described here are used only to explain the related invention and do not limit that invention. It should also be noted that, for ease of description, the drawings show only the parts related to the invention concerned.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which the method for generating a model or the apparatus for generating a model of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium used to provide communication links between the terminal devices 101, 102, and 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as video recording applications, video playback applications, voice interaction applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
When the terminal devices 101, 102, and 103 are hardware, an image acquisition device may also be installed on them. The image acquisition device may be any device capable of acquiring images, such as a camera or a sensor. The user may use the image acquisition devices on the terminal devices 101, 102, and 103 to capture video.
The server 105 may be a server that provides various services, for example, a video processing server for storing, managing, or analyzing the videos uploaded by the terminal devices 101, 102, and 103. The video processing server can obtain a sample set containing a large number of samples. A sample in the sample set may include a sample video, first labeling information indicating whether the sample video belongs to low-quality video, and second labeling information indicating the low-quality category of a sample video that belongs to low-quality video. In addition, the video processing server can use the samples in the sample set to train an initial model and can store the training result (such as the generated low-quality video detection model). In this way, after a user uploads a video with the terminal devices 101, 102, and 103, the server 105 can detect whether the uploaded video is a low-quality video and can then push prompt information or perform other operations.
It should be noted that the server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be noted that the method for generating a model provided by the embodiments of the present application is generally executed by the server 105; accordingly, the apparatus for generating a model is generally provided in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are only illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
With continued reference to FIG. 2, a flow 200 of an embodiment of the method for generating a model according to the present application is shown. The method for generating a model includes the following steps:
In step 201, a sample set is obtained.
In this embodiment, the execution subject of the method for generating a model (for example, the server 105 shown in FIG. 1) can obtain the sample set in various ways. For example, the execution subject may obtain, through a wired or wireless connection, a sample set stored in another server configured to store samples (for example, a database server). As another example, a user may collect samples through terminal devices (for example, the terminal devices 101, 102, and 103 shown in FIG. 1); the execution subject can then receive the samples collected by the terminals and store them locally, thereby generating the sample set. It should be pointed out that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, ultra-wideband (UWB), and other wireless connection methods that are currently known or developed in the future.
Here, the sample set may include a large number of samples. A sample may include a sample video and first labeling information indicating whether the sample video belongs to low-quality video. For example, when the sample video belongs to low-quality video, the first labeling information may be "1"; when it does not, the first labeling information may be "0". In the case that the sample video in a sample belongs to low-quality video, the sample further includes second labeling information indicating the low-quality category of the sample video.
It should be noted that a low-quality video is generally a video of relatively low quality. For example, low-quality video may include, but is not limited to, blurred video, black-screen video, and screen-recording video. Correspondingly, the low-quality categories may include, but are not limited to, the blurred video category, the black-screen video category, and the screen-recording video category.
In step 202, a sample is extracted from the sample set.
In this embodiment, the execution subject may extract a sample from the sample set obtained in step 201 and perform the training process of steps 203 to 206. The manner of sample extraction is not limited in this application. For example, a sample may be selected at random, or the next sample may be extracted from the sample set in a specified order.
In step 203, frames of the sample video in the extracted sample are input into the initial model, and the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category are obtained respectively.
In this embodiment, the execution body may input the frames of the sample video in the sample extracted in step 202 into the initial model. By performing feature extraction, analysis and the like on the frames of the video, the initial model can output the probability that the sample video belongs to low-quality video, as well as the probability that the sample video belongs to each low-quality category. It should be noted that the probability that the sample video belongs to each low-quality category can be understood as the conditional probability that the sample video belongs to that low-quality category given that the sample video belongs to low-quality video.
In this embodiment, the initial model may be any of various models created based on machine learning techniques that have an image feature extraction function and a classification function. The initial model may extract features from the frames of the video, then fuse and analyze the extracted features, and finally output the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category. In practice, during the training of the initial model, the probabilities output by the initial model are usually inaccurate; the purpose of training is to make the probabilities output by the trained initial model more accurate.
As an example, the initial model may be a convolutional neural network using a structure from various related techniques (such as DenseBox, VGGNet, ResNet or SegNet). In practice, a convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within part of the coverage area, and it performs well in image processing; therefore, a convolutional neural network can be used to extract features from the frames of the sample video. In this example, the established convolutional neural network may include a convolutional layer, a pooling layer, a feature fusion layer, a fully connected layer and the like. The convolutional layer may be used to extract image features. The pooling layer may be used to downsample the input information. The feature fusion layer may be used to fuse the image features obtained for each frame (for example, in the form of feature matrices or feature vectors); for example, the feature values at the same positions in the feature matrices corresponding to different frames may be averaged, so as to fuse the features and generate one fused feature matrix. The fully connected layer may be used to classify the obtained features.
It can be understood that, since the initial model outputs both the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category, the fully connected layer may consist of two parts: one part outputs the probability that the sample video belongs to low-quality video, and the other part outputs the probability that the sample video belongs to each low-quality category. In practice, each part may use an independent softmax function for probability calculation.
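A minimal sketch of such a two-head network is given below in PyTorch, assuming a toy backbone, mean-pooling feature fusion across frames, and three low-quality categories; the patent does not fix a concrete architecture, so all layer sizes here are illustrative:

```python
import torch
import torch.nn as nn

class LowQualityVideoModel(nn.Module):
    def __init__(self, num_low_quality_categories: int = 3):
        super().__init__()
        # Per-frame feature extractor: convolution + pooling (downsampling).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # one value per channel per frame
        )
        # Head 1: low-quality vs. normal (softmax over two classes).
        self.quality_head = nn.Linear(64, 2)
        # Head 2: conditional low-quality category (blurred / black screen / ...).
        self.category_head = nn.Linear(64, num_low_quality_categories)

    def forward(self, frames: torch.Tensor):
        # frames: (num_frames, 3, H, W) for a single video.
        per_frame = self.backbone(frames).flatten(1)   # (num_frames, 64)
        fused = per_frame.mean(dim=0, keepdim=True)    # feature fusion by averaging
        p_low_quality = torch.softmax(self.quality_head(fused), dim=1)[:, 1]
        p_categories = torch.softmax(self.category_head(fused), dim=1)
        return p_low_quality, p_categories
```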
It should be noted that the initial model may also be another model having an image feature extraction function and a classification function, and is not limited to the above example; the specific model structure is not limited here.
In step 204, the loss value of the sample is determined based on the labeling information in the extracted sample, the obtained probabilities, and a pre-established loss function.
In this embodiment, the execution body may determine the loss value of the sample based on the labeling information in the extracted sample (including the first labeling information and the second labeling information), the obtained probabilities, and the pre-established loss function. In practice, a loss function can be used to measure the degree of inconsistency between the information output by the initial model (such as probabilities) and the true values (such as the labeling information). In general, the smaller the value of the loss function (the loss value), the more robust the model. The loss function may be set according to actual requirements.
In this embodiment, the loss function may take two parts of loss into account (for example, it may be set as the sum of the two parts, or as a weighted combination of the two parts). One part of the loss may characterize the difference between the probability, output by the initial model, that the sample video belongs to low-quality video and the true value (per the first labeling information: 1 if it indicates that the sample video is low-quality video, and 0 otherwise). The other part of the loss may characterize the difference between the probability, output by the initial model, that the sample video belongs to the low-quality category indicated by the second labeling information and the true value (such as 1). It should be noted that when the extracted sample does not contain second labeling information, this part of the loss may be set to a preset value (for example, 0). In practice, the two parts of the loss may each be calculated using a cross-entropy loss.
In some implementations of this embodiment, the execution body may determine the loss value of the sample as follows:
In a first step, the first labeling information in the extracted sample and the probability that the sample video belongs to low-quality video are input into a pre-established first loss function to obtain a first loss value. Here, the first loss function may characterize the difference between the probability, output by the initial model, that the sample video belongs to low-quality video and the first labeling information. In practice, the first loss function may use a cross-entropy loss.
In a second step, in response to determining that the extracted sample does not contain second labeling information, the first loss value may be determined as the loss value of the extracted sample.
In an embodiment of the above implementation, in response to determining that the extracted sample contains second labeling information, the execution body may determine the loss value of the sample as follows. First, the low-quality category indicated by the second labeling information in the extracted sample may be taken as the target category. Then, the second labeling information contained in the extracted sample and the probability, output by the initial model, that the sample video belongs to the target category may be input into a pre-established second loss function to obtain a second loss value. Here, the second loss function may characterize the difference between the probability, output by the initial model, that the sample video belongs to the target category (that is, the low-quality category indicated by the second labeling information) and the true value (for example, 1). In practice, the second loss function may also use a cross-entropy loss. After that, the sum of the first loss value and the second loss value may be determined as the loss value of the extracted sample. Here, the loss value of the sample may also be obtained in other ways; for example, a weighted combination of the first loss value and the second loss value may be determined as the loss value of the extracted sample, where the weights may be preset by a technician as required.
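The two-part loss computation described above could be sketched as follows, continuing the earlier PyTorch example; the 0/1 label encoding and the unweighted sum are the example choices from the text, not fixed requirements:

```python
import torch
import torch.nn.functional as F
from typing import Optional

def sample_loss(p_low_quality: torch.Tensor,   # shape (1,), first output head
                p_categories: torch.Tensor,    # shape (1, K), second output head
                first_label: int,              # 1 = low-quality video, 0 = not
                second_label: Optional[int]) -> torch.Tensor:
    # First loss: cross entropy between the predicted low-quality
    # probability and the first labeling information.
    target = torch.tensor([float(first_label)])
    first_loss = F.binary_cross_entropy(p_low_quality, target)
    if second_label is None:
        # No second labeling information: that part of the loss takes the
        # preset value (0 here), so the first loss is the sample loss.
        return first_loss
    # Second loss: cross entropy between the probability of the target
    # category and a true value of 1, i.e. -log p(target category).
    second_loss = -torch.log(p_categories[0, second_label])
    return first_loss + second_loss
```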
In step 205, the loss value is compared with a target value to determine whether training of the initial model is completed.
In this embodiment, the execution body may determine whether training of the initial model is completed based on a comparison between the determined loss value and the target value. As an example, the execution body may determine whether the loss value has converged; when it is determined that the loss value has converged, it may be determined that training of the initial model is completed. As another example, the execution body may first compare the loss value with the target value; in response to determining that the loss value is less than or equal to the target value, it may count, among the loss values determined in a preset number of most recent training iterations (for example, the last 100), the proportion of loss values less than or equal to the target value. When this proportion is greater than a preset proportion (for example, 95%), it may be determined that training of the initial model is completed. It should be noted that multiple (at least two) samples may be extracted in step 202; in this case, the loss value of each sample may be calculated separately using the operations described in steps 202 to 204, and the execution body may compare the loss value of each sample with the target value to determine whether the loss value of each sample is less than or equal to the target value. It should be pointed out that the target value can generally represent an ideal degree of inconsistency between the predicted value and the true value; that is, when the loss value is less than or equal to the target value, the predicted value can be considered close or approximately equal to the true value. The preset value may be set according to actual requirements.
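The second example criterion above (counting recent loss values at or below the target value) might look like the following sketch; the window size of 100 and the ratio of 95% are the example figures from the text:

```python
from collections import deque

class StoppingCriterion:
    def __init__(self, target: float, window: int = 100, ratio: float = 0.95):
        self.target = target
        self.ratio = ratio
        self.recent = deque(maxlen=window)  # loss values of recent iterations

    def update(self, loss: float) -> bool:
        self.recent.append(loss)
        # Only check the proportion once the current loss is at or below
        # the target value and a full window has been observed.
        if loss > self.target or len(self.recent) < self.recent.maxlen:
            return False
        below = sum(1 for v in self.recent if v <= self.target)
        return below / len(self.recent) > self.ratio
```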
It should be noted that, in response to determining that training of the initial model is completed, step 206 may be performed. In response to determining that training of the initial model is not completed, the parameters of the initial model may be updated based on the determined loss value of the sample, a sample may be re-extracted from the sample set, and the training process may be continued using the initial model with updated parameters as the initial model. Here, a back-propagation algorithm may be used to obtain the gradient of the loss value with respect to the model parameters, and then a gradient descent algorithm may be used to update the model parameters based on the gradient. It should be noted that the back-propagation algorithm, the gradient descent algorithm and machine learning methods are well-known techniques that are currently widely researched and applied, and will not be repeated here. It should also be pointed out that the way samples are re-extracted is likewise not limited in the present application; for example, when there are a large number of samples in the sample set, the execution body may extract samples that have not been extracted before.
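Putting the pieces together, the parameter-update loop (back-propagation plus gradient descent) could be sketched as follows, reusing the earlier example definitions; `draw_sample` is a hypothetical helper standing in for the unspecified sample-extraction strategy, and the learning rate and target value are illustrative:

```python
import torch

model = LowQualityVideoModel()                      # from the earlier sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = StoppingCriterion(target=0.05)          # from the earlier sketch

trained = False
while not trained:
    frames, first_label, second_label = draw_sample()   # hypothetical sampler
    p_low, p_cat = model(frames)
    loss = sample_loss(p_low, p_cat, first_label, second_label)
    optimizer.zero_grad()
    loss.backward()      # back-propagation: gradient of the loss w.r.t. parameters
    optimizer.step()     # gradient descent: update the model parameters
    trained = criterion.update(loss.item())
# The trained model is then taken as the low-quality video detection model.
```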
In step 206, in response to determining that training of the initial model is completed, the trained initial model is determined as the low-quality video detection model.
In this embodiment, in response to determining that training of the initial model is completed, the execution body may determine the trained initial model as the low-quality video detection model. The low-quality video detection model can detect whether a video is low-quality video, and at the same time can detect the low-quality category of a low-quality video.
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for generating a model according to this embodiment. In the application scenario of FIG. 3, a model training application may be installed on the terminal device 301 used by a user. After the user opens the application and uploads a sample set or the storage path of a sample set, the server 302 providing back-end support for the application may run the method for generating a low-quality video detection model, including:
A sample set may be obtained, where a sample in the sample set may include a sample video 303, first labeling information 304 indicating whether the sample video belongs to low-quality video, and second labeling information 305 indicating the low-quality category of a sample video belonging to low-quality video. After that, a sample may be extracted from the sample set and the following training process performed: inputting frames of the sample video in the extracted sample into an initial model 306 to respectively obtain the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category; determining a loss value 307 of the sample based on the labeling information in the extracted sample, the obtained probabilities and a pre-established loss function; and comparing the loss value with a target value to determine whether training of the initial model is completed. If training of the initial model is completed, the trained initial model is determined as a low-quality video detection model 308.
The method provided by the above embodiment of the present application obtains a sample set from which samples can be extracted to train an initial model. A sample in the sample set may include a sample video, first labeling information indicating whether the sample video belongs to low-quality video, and second labeling information indicating the low-quality category of a sample video belonging to low-quality video. In this way, by inputting the frames of the sample video in the extracted sample into the initial model, the probability output by the initial model that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category can be obtained. Then, based on the labeling information in the extracted sample, the obtained probabilities and the pre-established loss function, the loss value of the sample can be determined. After that, by comparing the loss value with the target value, it can be determined whether training of the initial model is completed. If training of the initial model is completed, the trained initial model can be determined as a low-quality video detection model. A model usable for low-quality video detection is thus obtained, and this model helps improve the efficiency of low-quality video detection.
Referring to FIG. 4, a flow 400 of another embodiment of the method for generating a model is shown. The flow 400 of the method for generating a model includes the following steps:
In step 401, a sample set is obtained.
In this embodiment, the execution body of the method for generating a model (for example, the server 105 shown in FIG. 1) may obtain a sample set, where a sample may include a sample video and first labeling information indicating whether the sample video belongs to low-quality video. In the case that the sample video in a sample belongs to low-quality video, the sample further includes second labeling information indicating the low-quality category of the sample video.
In step 402, a sample is extracted from the sample set.
In this embodiment, the execution body may extract a sample from the sample set obtained in step 401 and perform the training process of steps 403 to 410. The way the sample is extracted is not limited in the present application. For example, a sample may be extracted at random, or the sample currently to be extracted may be taken from the sample set in a specified order.
In step 403, frames of the sample video in the extracted sample are input into the initial model, and the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category are obtained respectively.
In this embodiment, the execution body may input the frames of the sample video in the sample extracted in step 402 into the initial model. By performing feature extraction, analysis and the like on the frames of the video, the initial model can output the probability that the sample video belongs to low-quality video, as well as the probability that the sample video belongs to each low-quality category.
In this embodiment, the initial model may use a convolutional neural network created based on machine learning techniques. The established convolutional neural network may include a convolutional layer, a pooling layer, a feature fusion layer, a fully connected layer and the like. The fully connected layer may consist of two parts: one part outputs the probability that the sample video belongs to low-quality video, and the other part outputs the probability that the sample video belongs to each low-quality category. In practice, each part may use an independent softmax function for probability calculation.
In step 404, the first labeling information in the extracted sample and the probability that the sample video belongs to low-quality video are input into a pre-established first loss function to obtain a first loss value.
In this embodiment, the execution body may input the first labeling information in the extracted sample and the probability, output in step 403, that the sample video belongs to low-quality video into the pre-established first loss function to obtain the first loss value. Here, the first loss function may characterize the difference between the probability, output by the initial model, that the sample video belongs to low-quality video and the first labeling information. In practice, the first loss function may use a cross-entropy loss.
In step 405, it is determined whether the extracted sample contains second labeling information.
In this embodiment, the execution body may determine whether the extracted sample contains second labeling information. If it does not, step 406 may be performed to determine the loss value of the sample. If it does, steps 407 and 408 may be performed to determine the loss value of the sample.
In step 406, in response to determining that the extracted sample does not contain second labeling information, the first loss value is determined as the loss value of the extracted sample.
In this embodiment, in response to determining that the extracted sample does not contain second labeling information, the execution body may determine the first loss value as the loss value of the extracted sample.
In step 407, in response to determining that the extracted sample contains second labeling information, the low-quality category indicated by the second labeling information in the extracted sample is taken as the target category, and the second labeling information contained in the extracted sample and the probability that the sample video belongs to the target category are input into a pre-established second loss function to obtain a second loss value.
In this embodiment, in response to determining that the extracted sample contains second labeling information, the execution body may take the low-quality category indicated by the second labeling information in the extracted sample as the target category, and input the second labeling information contained in the extracted sample and the probability that the sample video belongs to the target category into the pre-established second loss function to obtain the second loss value. Here, the second loss function may characterize the difference between the probability, output by the initial model, that the sample video belongs to the target category and the true value (for example, 1). In practice, the second loss function may also use a cross-entropy loss.
In step 408, the sum of the first loss value and the second loss value is determined as the loss value of the extracted sample.
In this embodiment, the execution body may determine the sum of the first loss value and the second loss value as the loss value of the extracted sample.
In step 409, the loss value is compared with a target value to determine whether training of the initial model is completed.
In this embodiment, the execution body may determine whether training of the initial model is completed based on a comparison between the determined loss value and the target value. As an example, the execution body may determine whether the loss value has converged; when it is determined that the loss value has converged, it may be determined that training of the initial model is completed. As another example, the execution body may first compare the loss value with the target value; in response to determining that the loss value is less than or equal to the target value, it may count, among the loss values determined in a preset number of most recent training iterations (for example, the last 100), the proportion of loss values less than or equal to the target value. When this proportion is greater than a preset proportion (for example, 95%), it may be determined that training of the initial model is completed. It should be noted that the target value can generally represent an ideal degree of inconsistency between the predicted value and the true value; that is, when the loss value is less than or equal to the target value, the predicted value can be considered close or approximately equal to the true value. The preset value may be set according to actual requirements.
It should be noted that, in response to determining that training of the initial model is completed, step 410 may be performed. In response to determining that training of the initial model is not completed, the parameters of the initial model may be updated based on the determined loss value of the sample, a sample may be re-extracted from the sample set, and the training process may be continued using the initial model with updated parameters as the initial model. Here, a back-propagation algorithm may be used to obtain the gradient of the loss value with respect to the model parameters, and then a gradient descent algorithm may be used to update the model parameters based on the gradient. The back-propagation algorithm, the gradient descent algorithm and machine learning methods are well-known techniques that are currently widely researched and applied, and will not be repeated here. The way samples are re-extracted is likewise not limited in the present application; for example, when there are a large number of samples in the sample set, the execution body may extract samples that have not been extracted before.
In step 410, in response to determining that training of the initial model is completed, the trained initial model is determined as the low-quality video detection model.
In this embodiment, in response to determining that training of the initial model is completed, the execution body may determine the trained initial model as the low-quality video detection model. The low-quality video detection model can detect whether a video is low-quality video, and at the same time can detect the low-quality category of a low-quality video.
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for generating a model in this embodiment involves a specific way of calculating the loss value. Training the initial model based on the loss value calculated in this way enables the trained model to detect low-quality video and to detect the low-quality category of low-quality video. At the same time, using the trained low-quality video detection model for video detection helps improve the detection speed for low-quality video, as well as the detection effect for low-quality categories.
Referring to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating a model. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 5, the apparatus 500 for generating a model of this embodiment includes: an obtaining unit 501 configured to obtain a sample set, where a sample may include a sample video and first labeling information indicating whether the sample video belongs to low-quality video, and in the case that the sample video in a sample belongs to low-quality video, the sample further includes second labeling information indicating the low-quality category of the sample video; and a training unit 502 configured to extract a sample from the sample set and perform the following training process: inputting frames of the sample video in the extracted sample into an initial model to respectively obtain the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category; determining a loss value of the sample based on the labeling information in the extracted sample, the obtained probabilities and a pre-established loss function; comparing the loss value with a target value to determine whether training of the initial model is completed; and in response to determining that training of the initial model is completed, determining the trained initial model as a low-quality video detection model.
In some implementations of this embodiment, the training unit 502 may be configured to: input the first labeling information in the extracted sample and the probability that the sample video belongs to low-quality video into a pre-established first loss function to obtain a first loss value; and in response to determining that the extracted sample does not contain second labeling information, determine the first loss value as the loss value of the extracted sample.
In some implementations of this embodiment, the training unit 502 may be configured to: in response to determining that the extracted sample contains second labeling information, take the low-quality category indicated by the second labeling information in the extracted sample as the target category, and input the second labeling information contained in the extracted sample and the probability that the sample video belongs to the target category into a pre-established second loss function to obtain a second loss value; and determine the sum of the first loss value and the second loss value as the loss value of the extracted sample.
In some implementations of this embodiment, the apparatus may further include an updating unit (not shown in the figure). The updating unit may be configured to, in response to determining that training of the initial model is not completed, update the parameters of the initial model based on the determined loss value of the sample, re-extract a sample from the sample set, and continue the training process using the initial model with updated parameters as the initial model.
In the apparatus provided by the above embodiment of the present application, the obtaining unit 501 obtains a sample set, from which the training unit 502 can extract samples to train an initial model. A sample in the sample set may include a sample video, first labeling information indicating whether the sample video belongs to low-quality video, and second labeling information indicating the low-quality category of a sample video belonging to low-quality video. In this way, by inputting the frames of the sample video in the extracted sample into the initial model, the training unit 502 can obtain the probability output by the initial model that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category. Then, based on the labeling information in the extracted sample, the obtained probabilities and the pre-established loss function, the loss value of the sample can be determined. After that, by comparing the loss value with the target value, it can be determined whether training of the initial model is completed. If training of the initial model is completed, the trained initial model can be determined as a low-quality video detection model. A model usable for low-quality video detection is thus obtained, and this model helps improve the efficiency of low-quality video detection.
Referring to FIG. 6, a flow 600 of an embodiment of the method for detecting low-quality video provided by the present application is shown. The method for detecting low-quality video may include the following steps:
In step 601, a low-quality video detection request containing a target video is received.
In this embodiment, the execution body of the method for detecting low-quality video (for example, the server 105 shown in FIG. 1, or another server storing the low-quality video detection model) may receive a low-quality video detection request containing a target video. Here, the target video may be a video to be subjected to low-quality video detection. The target video may be pre-stored in the execution body, or may be sent by another electronic device (for example, the terminal devices 101, 102 and 103 shown in FIG. 1).
In step 602, frames of the target video are input into the low-quality video detection model to obtain a detection result.
In this embodiment, the execution body may input the frames of the target video into the low-quality video detection model to obtain a detection result, where the detection result may include the probability that the target video belongs to low-quality video. The low-quality video detection model may be generated by the method for generating a low-quality video detection model described in the embodiment of FIG. 2 above; for the generation process, reference may be made to the relevant description of the FIG. 2 embodiment, which will not be repeated here.
In step 603, in response to determining that the probability that the target video belongs to low-quality video is greater than a first preset threshold, the target video is determined to be low-quality video.
In this embodiment, in response to determining that the probability that the target video belongs to low-quality video is greater than the first preset threshold, the execution body may determine that the target video is low-quality video.
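A minimal sketch of this detection step is given below, reusing the model from the earlier examples; the threshold value of 0.5 and the way frames are obtained are assumptions for illustration:

```python
import torch

def detect_low_quality(model, frames: torch.Tensor, first_threshold: float = 0.5):
    model.eval()
    with torch.no_grad():
        p_low, p_categories = model(frames)
    # The detection result: the probability that the target video belongs
    # to low-quality video, compared against the first preset threshold.
    is_low_quality = p_low.item() > first_threshold
    return is_low_quality, p_low.item(), p_categories
```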
在本实施例的一些实现方式中,上述检测结果还可以包括上述目标视频属于多种低质类别中每种低质类别的概率。在确定上述目标视频为低质视频之后,上述执行主体还可以执行如下操作:
首先,响应于接收到低质类别检测请求,可以将上述目标视频属于低质视频的概率作为第一概率,对于每种低质类别,确定上述目标视频属于该低质类别的概率与上述第一概率的乘积,将上述乘积确定为上述目标视频属于该低质类别的概率。
之后,可以将概率大于第二预设数值的低质类别确定为上述目标视频的低质类别。从而,可以确定出目标视频的低质类别。
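As a worked sketch of this product rule in Python (the probabilities and the second preset value below are invented for illustration):

```python
# With a first probability of 0.9 (the video is low-quality) and the
# conditional category probabilities below, the product rule gives the
# unconditional probability for each low-quality category.
first_probability = 0.9
conditional = {"blurred": 0.7, "black_screen": 0.2, "screen_recording": 0.1}
second_preset_value = 0.5   # example threshold

category_probability = {c: p * first_probability for c, p in conditional.items()}
# {'blurred': 0.63, 'black_screen': 0.18, 'screen_recording': 0.09}
detected = [c for c, p in category_probability.items() if p > second_preset_value]
# detected == ['blurred']: only the blurred category exceeds the threshold.
```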
It should be noted that the method for detecting low-quality video of this embodiment can be used to test the low-quality video detection models generated by the above embodiments, and the low-quality video detection model can then be continuously optimized according to the test results. The method may also be a practical application method of the low-quality video detection models generated by the above embodiments. Using the low-quality video detection models generated by the above embodiments to perform low-quality video detection helps improve the performance of the low-quality video detection model. At the same time, performing low-quality video detection with the above low-quality video detection model improves the detection speed for low-quality video, as well as the detection effect for low-quality categories.
With continued reference to FIG. 7, as an implementation of the method shown in FIG. 6 above, the present application provides an embodiment of an apparatus for detecting low-quality video. This apparatus embodiment corresponds to the method embodiment shown in FIG. 6, and the apparatus can be applied to various electronic devices.
As shown in FIG. 7, the apparatus 700 for detecting low-quality video of this embodiment includes: a first receiving unit 701 configured to receive a low-quality video detection request containing a target video; an input unit 702 configured to input frames of the target video into the low-quality video detection model to obtain a detection result, where the detection result includes the probability that the target video belongs to low-quality video; and a first determining unit 703 configured to, in response to determining that the probability that the target video belongs to low-quality video is greater than a first preset threshold, determine that the target video is low-quality video.
In some implementations of this embodiment, the detection result may further include the probability that the target video belongs to each of multiple low-quality categories, and the apparatus may further include a second receiving unit and a second determining unit (not shown in the figure). The second receiving unit may be configured to, in response to receiving a low-quality category detection request, take the probability that the target video belongs to low-quality video as a first probability, and, for each low-quality category, determine the product of the probability that the target video belongs to that low-quality category and the first probability, and take the product as the probability that the target video belongs to that low-quality category. The second determining unit may be configured to determine low-quality categories for which the probability that the target video belongs to the category is greater than a second preset value as the low-quality categories of the target video.
It can be understood that the units described in the apparatus 700 correspond to the steps of the method described with reference to FIG. 6. Therefore, the operations, features and resulting beneficial effects described above for the method are also applicable to the apparatus 700 and the units contained therein, and will not be repeated here.
Referring now to FIG. 8, a schematic structural diagram of a computer system 800 suitable for implementing an electronic device of the embodiments of the present application is shown. The electronic device shown in FIG. 8 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present application.
As shown in FIG. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage portion 808 into a random access memory (RAM) 803. Various programs and data required for the operation of the system 800 are also stored in the RAM 803. The CPU 801, the ROM 802 and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse and the like; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a speaker and the like; a storage portion 808 including a hard disk and the like; and a communication portion 809 including a network interface card such as a local area network (LAN) card or a modem. The communication portion 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom can be installed into the storage portion 808 as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be installed by at least one of the following means: downloaded and installed from a network through the communication portion 809, or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above functions defined in the method of the present application are executed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having at least one wire, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (RF) and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code, and the module, program segment or portion of code contains at least one executable instruction for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be disposed in a processor; for example, a processor may be described as including an obtaining unit and a training unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the obtaining unit may also be described as "a unit that obtains a sample set".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries at least one program which, when executed by the apparatus, causes the apparatus to: obtain a sample set; and extract a sample from the sample set and perform the following training process: inputting frames of the sample video in the extracted sample into an initial model to respectively obtain the probability that the sample video belongs to low-quality video and the probability that the sample video belongs to each low-quality category; determining a loss value of the sample based on the labeling information in the extracted sample, the obtained probabilities and a pre-established loss function; determining, based on a comparison between the loss value and a target value, whether training of the initial model is completed; and in response to determining that training of the initial model is completed, determining the trained initial model as a low-quality video detection model.

Claims (14)

  1. A method for generating a model, comprising:
    obtaining a sample set, wherein samples in the sample set comprise a sample video and first labeling information indicating whether the sample video belongs to low-quality video; in the case that the sample video belongs to low-quality video, the sample further comprises second labeling information indicating a low-quality category of the sample video, and the low-quality category of the sample video comprises multiple low-quality categories;
    extracting a sample from the sample set, and performing the following training process: inputting frames of the sample video in the extracted sample into an initial model to respectively obtain a probability that the sample video belongs to low-quality video and a probability that the sample video belongs to each low-quality category; determining a loss value of the sample based on labeling information in the extracted sample, the obtained probabilities and a pre-established loss function; comparing the loss value with a target value to determine whether training of the initial model is completed; and in response to determining that training of the initial model is completed, determining the trained initial model as a low-quality video detection model.
  2. The method according to claim 1, wherein the determining the loss value of the sample based on the labeling information in the extracted sample, the obtained probabilities and the pre-established loss function comprises:
    inputting the first labeling information in the extracted sample and the probability that the sample video belongs to low-quality video into a pre-established first loss function to obtain a first loss value; and
    in response to determining that the extracted sample does not contain second labeling information, determining the first loss value as the loss value of the extracted sample.
  3. The method according to claim 2, wherein the determining the loss value of the sample based on the labeling information in the extracted sample, the obtained probabilities and the pre-established loss function further comprises:
    in response to determining that the extracted sample contains second labeling information, taking the low-quality category indicated by the second labeling information in the extracted sample as a target category, and inputting the second labeling information contained in the extracted sample and the probability that the sample video belongs to the target category into a pre-established second loss function to obtain a second loss value; and
    determining a sum of the first loss value and the second loss value as the loss value of the extracted sample.
  4. The method according to claim 1, further comprising:
    in response to determining that training of the initial model is not completed, updating parameters of the initial model based on the determined loss value of the sample, re-extracting a sample from the sample set, and continuing the training process using the initial model with updated parameters as the initial model.
  5. An apparatus for generating a model, comprising:
    an obtaining unit configured to obtain a sample set, wherein samples in the sample set comprise a sample video and first labeling information indicating whether the sample video belongs to low-quality video; in the case that the sample video belongs to low-quality video, the sample further comprises second labeling information indicating a low-quality category of the sample video, and the low-quality category of the sample video comprises multiple low-quality categories; and
    a training unit configured to extract a sample from the sample set and perform the following training process: inputting frames of the sample video in the extracted sample into an initial model to respectively obtain a probability that the sample video belongs to low-quality video and a probability that the sample video belongs to each low-quality category; determining a loss value of the sample based on labeling information in the extracted sample, the obtained probabilities and a pre-established loss function; comparing the loss value with a target value to determine whether training of the initial model is completed; and in response to determining that training of the initial model is completed, determining the trained initial model as a low-quality video detection model.
  6. The apparatus according to claim 5, wherein the training unit is configured to:
    input the first labeling information in the extracted sample and the probability that the sample video belongs to low-quality video into a pre-established first loss function to obtain a first loss value; and
    in response to determining that the extracted sample does not contain second labeling information, determine the first loss value as the loss value of the extracted sample.
  7. The apparatus according to claim 6, wherein the training unit is configured to:
    in response to determining that the extracted sample contains second labeling information, take the low-quality category indicated by the second labeling information in the extracted sample as a target category, and input the second labeling information contained in the extracted sample and the probability that the sample video belongs to the target category into a pre-established second loss function to obtain a second loss value; and
    determine a sum of the first loss value and the second loss value as the loss value of the extracted sample.
  8. The apparatus according to claim 5, further comprising:
    an updating unit configured to, in response to determining that training of the initial model is not completed, update parameters of the initial model based on the determined loss value of the sample, re-extract a sample from the sample set, and continue the training process using the initial model with updated parameters as the initial model.
  9. A method for detecting low-quality video, comprising:
    receiving a low-quality video detection request containing a target video;
    inputting frames of the target video into a low-quality video detection model generated by the method according to any one of claims 1-4 to obtain a detection result, wherein the detection result comprises a probability that the target video belongs to low-quality video; and
    in response to determining that the probability that the target video belongs to low-quality video is greater than a first preset threshold, determining that the target video is low-quality video.
  10. The method according to claim 9, wherein the detection result further comprises a probability that the target video belongs to each of multiple low-quality categories; and
    after the determining that the target video is low-quality video, the method further comprises:
    in response to receiving a low-quality category detection request for the target video, taking the probability that the target video belongs to low-quality video as a first probability, and, for each low-quality category, determining a product of the probability that the target video belongs to the low-quality category and the first probability, and determining the product as the probability that the target video belongs to the low-quality category; and
    determining a low-quality category for which the probability that the target video belongs to the low-quality category is greater than a second preset value as the low-quality category of the target video.
  11. An apparatus for detecting low-quality video, comprising:
    a first receiving unit configured to receive a low-quality video detection request containing a target video;
    an input unit configured to input frames of the target video into a low-quality video detection model generated by the method according to any one of claims 1-4 to obtain a detection result, wherein the detection result comprises a probability that the target video belongs to low-quality video; and
    a first determining unit configured to, in response to determining that the probability that the target video belongs to low-quality video is greater than a first preset threshold, determine that the target video is low-quality video.
  12. The apparatus according to claim 11, wherein the detection result further comprises a probability that the target video belongs to each of multiple low-quality categories; and
    the apparatus further comprises:
    a second receiving unit configured to, in response to receiving a low-quality category detection request for the target video, take the probability that the target video belongs to low-quality video as a first probability, and, for each low-quality category, determine a product of the probability that the target video belongs to the low-quality category and the first probability, and determine the product as the probability that the target video belongs to the low-quality category; and
    a second determining unit configured to determine a low-quality category for which the probability that the target video belongs to the low-quality category is greater than a second preset value as the low-quality category of the target video.
  13. An electronic device, comprising:
    at least one processor; and
    a storage device storing at least one program,
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1-4 and 9-10.
  14. A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-4 and 9-10.
PCT/CN2019/095078 2018-10-30 2019-07-08 生成模型的方法和装置 WO2020087974A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811273468.2 2018-10-30
CN201811273468.2A CN109344908B (zh) 2018-10-30 2018-10-30 用于生成模型的方法和装置

Publications (1)

Publication Number Publication Date
WO2020087974A1 true WO2020087974A1 (zh) 2020-05-07

Family

ID=65311066

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/095078 WO2020087974A1 (zh) 2018-10-30 2019-07-08 生成模型的方法和装置

Country Status (2)

Country Link
CN (1) CN109344908B (zh)
WO (1) WO2020087974A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344908B (zh) * 2018-10-30 2020-04-28 北京字节跳动网络技术有限公司 用于生成模型的方法和装置
CN109961032B (zh) * 2019-03-18 2022-03-29 北京字节跳动网络技术有限公司 用于生成分类模型的方法和装置
CN110188833B (zh) * 2019-06-04 2021-06-18 北京字节跳动网络技术有限公司 用于训练模型的方法和装置
CN111770353A (zh) * 2020-06-24 2020-10-13 北京字节跳动网络技术有限公司 一种直播监控方法、装置、电子设备及存储介质
CN114336258B (zh) * 2021-12-31 2023-09-08 武汉锐科光纤激光技术股份有限公司 光束的功率控制方法、装置和存储介质及电子设备

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101426150B (zh) * 2008-12-08 2011-05-11 青岛海信电子产业控股股份有限公司 视频图像质量测评的方法和系统
SE535070C2 (sv) * 2010-09-10 2012-04-03 Choros Cognition Ab Förfarande för att automatiskt klassificera en två-eller högredimensionell bild
CN104346810A (zh) * 2014-09-23 2015-02-11 上海交通大学 基于图片质量水平分类的图片质量评价方法
CN105451016A (zh) * 2015-12-07 2016-03-30 天津大学 一种适用于视频监控系统的无参考视频质量评价方法
CN107451148A (zh) * 2016-05-31 2017-12-08 北京金山安全软件有限公司 一种视频分类方法、装置及电子设备
CN107578034A (zh) * 2017-09-29 2018-01-12 百度在线网络技术(北京)有限公司 信息生成方法和装置

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194419A (zh) * 2017-05-10 2017-09-22 百度在线网络技术(北京)有限公司 视频分类方法及装置、计算机设备与可读介质
CN108038413A (zh) * 2017-11-02 2018-05-15 平安科技(深圳)有限公司 欺诈可能性分析方法、装置及存储介质
CN108197618A (zh) * 2018-04-08 2018-06-22 百度在线网络技术(北京)有限公司 用于生成人脸检测模型的方法和装置
CN109145828A (zh) * 2018-08-24 2019-01-04 北京字节跳动网络技术有限公司 用于生成视频类别检测模型的方法和装置
CN109344908A (zh) * 2018-10-30 2019-02-15 北京字节跳动网络技术有限公司 用于生成模型的方法和装置
CN109376267A (zh) * 2018-10-30 2019-02-22 北京字节跳动网络技术有限公司 用于生成模型的方法和装置
CN109447156A (zh) * 2018-10-30 2019-03-08 北京字节跳动网络技术有限公司 用于生成模型的方法和装置
CN109447246A (zh) * 2018-10-30 2019-03-08 北京字节跳动网络技术有限公司 用于生成模型的方法和装置

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832290B (zh) * 2020-05-25 2024-04-02 北京三快在线科技有限公司 用于确定文本相关度的模型训练方法、装置、电子设备及可读存储介质
CN111832290A (zh) * 2020-05-25 2020-10-27 北京三快在线科技有限公司 用于确定文本相关度的模型训练方法、装置、电子设备及可读存储介质
CN111639591A (zh) * 2020-05-28 2020-09-08 深圳地平线机器人科技有限公司 轨迹预测模型生成方法、装置、可读存储介质及电子设备
CN111639591B (zh) * 2020-05-28 2023-06-30 深圳地平线机器人科技有限公司 轨迹预测模型生成方法、装置、可读存储介质及电子设备
CN111626219B (zh) * 2020-05-28 2023-06-09 深圳地平线机器人科技有限公司 轨迹预测模型生成方法、装置、可读存储介质及电子设备
CN111626219A (zh) * 2020-05-28 2020-09-04 深圳地平线机器人科技有限公司 轨迹预测模型生成方法、装置、可读存储介质及电子设备
CN111813932A (zh) * 2020-06-17 2020-10-23 北京小米松果电子有限公司 文本数据的处理方法、分类方法、装置及可读存储介质
CN111813932B (zh) * 2020-06-17 2023-11-14 北京小米松果电子有限公司 文本数据的处理方法、分类方法、装置及可读存储介质
CN111724371A (zh) * 2020-06-19 2020-09-29 联想(北京)有限公司 一种数据处理方法、装置及电子设备
CN111814846A (zh) * 2020-06-19 2020-10-23 浙江大华技术股份有限公司 属性识别模型的训练方法、识别方法及相关设备
CN111724371B (zh) * 2020-06-19 2023-05-23 联想(北京)有限公司 一种数据处理方法、装置及电子设备
CN112287225A (zh) * 2020-10-29 2021-01-29 北京奇艺世纪科技有限公司 一种对象推荐方法及装置
CN112287225B (zh) * 2020-10-29 2023-09-08 北京奇艺世纪科技有限公司 一种对象推荐方法及装置
CN112734641A (zh) * 2020-12-31 2021-04-30 百果园技术(新加坡)有限公司 目标检测模型的训练方法、装置、计算机设备及介质
CN112926621A (zh) * 2021-01-21 2021-06-08 百度在线网络技术(北京)有限公司 数据标注方法、装置、电子设备及存储介质
CN112926621B (zh) * 2021-01-21 2024-05-10 百度在线网络技术(北京)有限公司 数据标注方法、装置、电子设备及存储介质
CN112749685A (zh) * 2021-01-28 2021-05-04 北京百度网讯科技有限公司 视频分类方法、设备和介质
CN112749685B (zh) * 2021-01-28 2023-11-03 北京百度网讯科技有限公司 视频分类方法、设备和介质
CN112966712A (zh) * 2021-02-01 2021-06-15 北京三快在线科技有限公司 语言模型训练方法、装置、电子设备和计算机可读介质
CN112966712B (zh) * 2021-02-01 2023-01-20 北京三快在线科技有限公司 语言模型训练方法、装置、电子设备和计算机可读介质
CN112819078B (zh) * 2021-02-04 2023-12-15 上海明略人工智能(集团)有限公司 一种图片识别模型的迭代方法和装置
CN112819078A (zh) * 2021-02-04 2021-05-18 上海明略人工智能(集团)有限公司 一种识别模型的迭代方法和装置
CN112906810B (zh) * 2021-03-08 2024-04-16 共达地创新技术(深圳)有限公司 目标检测方法、电子设备和存储介质
CN112906810A (zh) * 2021-03-08 2021-06-04 共达地创新技术(深圳)有限公司 目标检测方法、电子设备和存储介质
CN112784111A (zh) * 2021-03-12 2021-05-11 有半岛(北京)信息科技有限公司 视频分类方法、装置、设备及介质
CN113077815A (zh) * 2021-03-29 2021-07-06 腾讯音乐娱乐科技(深圳)有限公司 一种音频评估方法及组件
CN113077815B (zh) * 2021-03-29 2024-05-14 腾讯音乐娱乐科技(深圳)有限公司 一种音频评估方法及组件
CN113177529A (zh) * 2021-05-27 2021-07-27 腾讯音乐娱乐科技(深圳)有限公司 识别花屏的方法、装置、设备及存储介质
CN113177529B (zh) * 2021-05-27 2024-04-23 腾讯音乐娱乐科技(深圳)有限公司 识别花屏的方法、装置、设备及存储介质
CN113255824B (zh) * 2021-06-15 2023-12-08 京东科技信息技术有限公司 训练分类模型和数据分类的方法和装置
CN113255824A (zh) * 2021-06-15 2021-08-13 京东数科海益信息科技有限公司 训练分类模型和数据分类的方法和装置
CN113435528A (zh) * 2021-07-06 2021-09-24 北京有竹居网络技术有限公司 对象分类的方法、装置、可读介质和电子设备
CN113435528B (zh) * 2021-07-06 2024-02-02 北京有竹居网络技术有限公司 对象分类的方法、装置、可读介质和电子设备
CN113723616A (zh) * 2021-08-17 2021-11-30 上海智能网联汽车技术中心有限公司 一种多传感器信息半自动标注方法、系统及存储介质
CN117471421A (zh) * 2023-12-25 2024-01-30 中国科学技术大学 对象跌倒检测模型的训练方法及跌倒检测方法
CN117471421B (zh) * 2023-12-25 2024-03-12 中国科学技术大学 对象跌倒检测模型的训练方法及跌倒检测方法

Also Published As

Publication number Publication date
CN109344908B (zh) 2020-04-28
CN109344908A (zh) 2019-02-15

Similar Documents

Publication Publication Date Title
WO2020087974A1 (zh) 生成模型的方法和装置
WO2020087979A1 (zh) 生成模型的方法和装置
CN109376267B (zh) 用于生成模型的方法和装置
US11176423B2 (en) Edge-based adaptive machine learning for object recognition
CN109447156B (zh) 用于生成模型的方法和装置
CN109308490B (zh) 用于生成信息的方法和装置
CN108520220B (zh) 模型生成方法和装置
CN109104620B (zh) 一种短视频推荐方法、装置和可读介质
CN109145828B (zh) 用于生成视频类别检测模型的方法和装置
WO2020000879A1 (zh) 图像识别方法和装置
CN111741330B (zh) 一种视频内容评估方法、装置、存储介质及计算机设备
CN109360028B (zh) 用于推送信息的方法和装置
CN109740018B (zh) 用于生成视频标签模型的方法和装置
CN109446990B (zh) 用于生成信息的方法和装置
CN108197652B (zh) 用于生成信息的方法和装置
CN108520470B (zh) 用于生成用户属性信息的方法和装置
JP7394809B2 (ja) ビデオを処理するための方法、装置、電子機器、媒体及びコンピュータプログラム
CN109447246B (zh) 用于生成模型的方法和装置
CN109582825B (zh) 用于生成信息的方法和装置
KR102592402B1 (ko) 연합 학습을 활용한 사용자 특성 분석을 위한 딥 러닝 모델 생성 방법
CN110490304B (zh) 一种数据处理方法及设备
CN111083469A (zh) 一种视频质量确定方法、装置、电子设备及可读存储介质
CN110059743B (zh) 确定预测的可靠性度量的方法、设备和存储介质
WO2020173270A1 (zh) 用于分析数据的方法、设备和计算机存储介质
CN111161238A (zh) 图像质量评价方法及装置、电子设备、存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19880187

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 18.08.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19880187

Country of ref document: EP

Kind code of ref document: A1