CN114298991A - Method and device for generating screen splash, and method and device for training screen splash detection model - Google Patents

Method and device for generating screen splash, and method and device for training screen splash detection model

Info

Publication number
CN114298991A
Authority
CN
China
Prior art keywords
screen
image
data
video data
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111572966.9A
Other languages
Chinese (zh)
Inventor
梅丽 (Mei Li)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd filed Critical Spreadtrum Communications Shanghai Co Ltd
Priority to CN202111572966.9A
Publication of CN114298991A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a method and a device for generating a screen splash, and a method and a device for training a screen splash detection model. The method for generating a screen splash includes: acquiring video playing data and performing format conversion on the video playing data to obtain first compressed data; modifying the resolution of the first compressed data to obtain first video data, or dropping key frames in the first compressed data to obtain second video data; and generating a sample screen splash image according to the first video data or the second video data. By generating a large number of screen splash images and training the screen splash detection model on them, the invention improves the accuracy of the detection results of the screen splash detection model.

Description

Method and device for generating screen splash, and method and device for training screen splash detection model
Technical Field
The invention relates to the technical field of image processing, and in particular to a method and a device for generating a screen splash and a method and a device for training a screen splash detection model.
Background
Screen splash (garbled-frame) artifacts can occur during video playback, and manual inspection of splash images by testers is tedious. To improve the user's playback experience, screen splash is therefore generally detected by a machine-learning-based screen splash detection model, so that remedial action can be taken promptly when the artifact appears. Such a detection model depends heavily on its data set and requires a large number of screen splash image samples for training. However, because the screen splash phenomenon occurs somewhat by chance, collecting a large number of splash images is difficult, and the large number of samples required by the detection model cannot be acquired quickly.
Therefore, the invention provides a method and a device for generating a screen splash, and a method and a device for training a screen splash detection model, which generate a large number of screen splash images and train the detection model on them.
Disclosure of Invention
The embodiments of the invention provide a method and a device for generating a screen splash, and a method and a device for training a screen splash detection model. By acquiring a large number of screen splash images for training, the accuracy of the detection results of the screen splash detection model is improved.
In a first aspect, the present invention provides a method for generating a screen splash, including: acquiring video playing data, and performing format conversion on the video playing data to obtain first compressed data; modifying the resolution of the first compressed data to obtain first video data, or dropping key frames in the first compressed data to obtain second video data; and generating a sample screen splash image according to the first video data or the second video data.
The beneficial effects are that: the sample screen splash image is obtained by processing the first compressed data, either by modifying its resolution or by dropping key frames. The operation is simple to implement, both resolution and key frames strongly influence whether a splash image is produced, and so modifying the resolution or dropping key frames yields sample splash images efficiently.
Optionally, the modifying the resolution of the first compressed data to obtain the first video data includes: increasing the resolution of the first compressed data to obtain second compressed data, and transcoding the second compressed data to obtain the first video data. The beneficial effects are that: the greater the increase in resolution, the more severe the screen splash in the generated sample image, i.e., the more favorable for splash generation.
Optionally, the dropping key frames in the first compressed data to obtain the second video data includes: determining the key frame time in the first compressed data, discarding data frames within a preset range of the key frame time, and obtaining the second video data. The beneficial effects are that: because a key frame carries all the information of the video in the first compressed data, it is the data frame with the greatest influence on encoding and decoding; when key frames are dropped, the second video data is very likely to exhibit screen splash, which greatly facilitates the acquisition of sample splash images.
Optionally, the generating a sample screen splash image according to the first video data or the second video data includes: playing the first video data or the second video data, and capturing screenshots of the played picture to generate the sample screen splash image. The beneficial effects are that: capturing the playback picture of the first or second video data is a convenient way to obtain sample splash images.
Optionally, the generating the sample screen splash image according to the first video data or the second video data includes: generating a candidate screen splash image according to the first video data or the second video data; and if the candidate screen splash image meets a preset splash-image condition, selecting the candidate image as the sample screen splash image. The beneficial effects are that: with a preset splash-image condition, sample images that meet the condition can be screened out easily.
Further optionally, the selecting the candidate screen splash image as the sample screen splash image if it meets the preset condition includes: transcoding the first compressed data to obtain third video data; generating, from the third video data, the original video image corresponding to the candidate splash image; comparing the original video image with the candidate splash image; and if the similarity between the two is smaller than a preset threshold, selecting the candidate image as the target screen splash image. The beneficial effects are that: because machine judgment carries some error, the most accurate result is obtained by comparing the candidate splash image against the original video image.
In a second aspect, the present invention provides a training method for a screen splash detection model, including: acquiring the original video image and the sample screen splash image as described in any embodiment of the first aspect, establishing a data set, and dividing the data set into a training set and a verification set; performing normalization preprocessing on the images in the training set; acquiring a neural network training model and inputting the images in the training set into it for training; and calculating the training loss of the training-set images through a loss function, back-propagating the training loss to iteratively update the model, and ending training when the model's verification result on the verification set meets a preset requirement, to obtain a trained screen splash detection model.
The beneficial effects are that: by building a data set from the original video images and the sample screen splash images of the first aspect, the neural network training model obtains enough training samples, making the detection results of the screen splash detection model more accurate.
Optionally, the performing normalization preprocessing on the images in the training set includes: performing at least one of rotation, scaling and cropping on the images in the training set. The beneficial effects are that: rotating, scaling or cropping the training-set images strengthens the training of the neural network model and improves the detection accuracy of the screen splash detection model.
In a third aspect, the present invention provides a device for generating a screen splash, comprising modules for performing the method of any one of the possible designs of the first aspect. These modules/units may be implemented by hardware, or by hardware executing corresponding software.
In a fourth aspect, the present invention provides a training apparatus for a screen splash detection model, comprising modules for performing the method of any one of the possible designs of the second aspect. These modules/units may be implemented by hardware, or by hardware executing corresponding software.
For the advantageous effects of the third and fourth aspects, reference may be made to the descriptions of the first and second aspects above.
Drawings
Fig. 1 is a flowchart of a method for generating a screen splash according to an embodiment of the present application;
fig. 2 is a flowchart of a training method of a screen splash detection model according to an embodiment of the present application;
fig. 3 is a schematic view of a screen splash generation device according to an embodiment of the present application;
fig. 4 is a schematic view of a training device of a screen splash detection model according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings. In the description of the embodiments, the terminology used in the following embodiments is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in the specification of the present application and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms, such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of the present application, "at least one" and "one or more" mean one, two, or more than two. The term "and/or" describes an association relationship between objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise. The term "coupled" includes both direct and indirect connections, unless otherwise noted. "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as examples, illustrations or descriptions. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
The embodiment of the present application provides a method for generating a screen splash, the flow of which is shown in fig. 1 and which comprises the following steps:
s101, video playing data are obtained, format conversion is carried out on the video playing data, and first compressed data are obtained.
In this step, the video playing data may be obtained through a terminal device, where the terminal device may be one or more of a smart phone, a tablet computer and a portable computer, or of course a desktop computer or the like, and the source of the video playing data may be a server. The terminal device obtains the video playing data from the server through a network, which may include various connection types, such as wired and wireless communication links. The numbers of terminal devices, networks and servers may be set according to actual needs; for example, the terminal device may also obtain the video playing data through the network from a server cluster composed of a plurality of servers.
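As a concrete illustration of the format conversion in S101, one could re-encode the captured playback data with a video tool such as FFmpeg. The patent does not name any tool, so the following Python sketch, including the file names and codec choice, is an assumption:

import subprocess

def convert_format(src_path: str, dst_path: str) -> None:
    """Convert captured video playing data into compressed data
    (the 'first compressed data') by re-encoding it with H.264."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_path,
         "-c:v", "libx264",   # re-encode the video stream
         "-an",               # drop audio; only frames matter for splash samples
         dst_path],
        check=True,
    )

# Hypothetical usage:
convert_format("playback_capture.webm", "first_compressed.mp4")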
S102, modifying the resolution of the first compressed data to obtain first video data; or dropping key frames in the first compressed data to obtain second video data.
S103, generating a sample screen splash image according to the first video data or the second video data.
In S102 and S103, the user may use the terminal device to interact with the server through the network, to receive or transmit messages and the like; the server may be a server providing various services. For example, a user acquires video playing data with a terminal device and performs format conversion on it to obtain first compressed data; modifies the resolution of the first compressed data to obtain first video data, or drops key frames in the first compressed data to obtain second video data; and generates a sample screen splash image according to the first or second video data. In this way, the large number of splash image samples required for training the screen splash detection model can be generated quickly, improving the efficiency of splash image generation.
It should be noted that the method for generating a screen splash image provided in the embodiments of the present application is generally executed by a terminal device, and accordingly the apparatus for generating a screen splash image is generally disposed in the terminal device. However, in other embodiments of the present application, the server may have functions similar to those of the terminal device and thus execute the method for generating a screen splash image provided herein. For example, a user uploads video playing data to a server using a terminal device, format conversion is performed on the video playing data to obtain first compressed data, and the server modifies the resolution of the first compressed data to obtain first video data, or drops key frames in the first compressed data to obtain second video data; a sample screen splash image is then generated according to the first or second video data, and the server pushes the generated sample image to the terminal device.
In one possible implementation, the modifying the resolution of the first compressed data to obtain the first video data includes: increasing the resolution of the first compressed data to obtain second compressed data, and transcoding the second compressed data to obtain the first video data. In this embodiment, the greater the increase in resolution, the more severe the screen splash in the generated sample image, i.e., the more favorable for splash generation.
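The patent does not spell out how the resolution change is applied, so the following is only one plausible realization, sketched in Python around FFmpeg: decode the first compressed data to raw frames, then reinterpret the raw bytes at a larger (wrong) resolution before transcoding, which misaligns the pixel rows and garbles every frame. The sizes and file names are assumptions.

import subprocess

def splash_by_resolution(src: str, dst: str,
                         fake_size: str = "1408x792") -> None:
    # Step 1: decode the compressed data to raw YUV frames.
    subprocess.run(["ffmpeg", "-y", "-i", src,
                    "-pix_fmt", "yuv420p", "-f", "rawvideo", "raw.yuv"],
                   check=True)
    # Step 2: reinterpret the same bytes at an increased (incorrect)
    # resolution and transcode; the row misalignment yields splash frames.
    subprocess.run(["ffmpeg", "-y", "-f", "rawvideo",
                    "-pix_fmt", "yuv420p", "-video_size", fake_size,
                    "-i", "raw.yuv", "-c:v", "libx264", dst],
                   check=True)

splash_by_resolution("first_compressed.mp4", "first_video.mp4")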
In yet another possible embodiment, the dropping key frames in the first compressed data to obtain the second video data includes: determining the key frame time in the first compressed data, discarding data frames within a preset range of the key frame time, and obtaining the second video data. In this embodiment, since a key frame carries all the information of the video in the first compressed data, it is the data frame with the greatest influence on encoding and decoding; when key frames are dropped, the second video data is very likely to exhibit screen splash, which greatly facilitates the acquisition of sample splash images.
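The key-frame dropping path can be sketched with the PyAV bindings for FFmpeg: remux packets while skipping those flagged as key frames. Treating exactly the key-frame packets as the "preset range" to discard is an assumption; the patent leaves the range open.

import av  # PyAV

def drop_key_frames(src: str, dst: str) -> None:
    inp = av.open(src)
    out = av.open(dst, mode="w")
    in_stream = inp.streams.video[0]
    out_stream = out.add_stream(template=in_stream)  # copy codec parameters
    for packet in inp.demux(in_stream):
        if packet.dts is None:          # flush packet, skip
            continue
        if packet.is_keyframe:          # discard frames at the key-frame time
            continue
        packet.stream = out_stream
        out.mux(packet)                 # decoder now lacks reference frames
    out.close()
    inp.close()

drop_key_frames("first_compressed.mp4", "second_video.mp4")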
In a further possible embodiment, the generating a sample screen splash image from the first video data or the second video data includes: playing the first video data or the second video data, and capturing screenshots of the played picture to generate the sample screen splash image. In this embodiment, the sample image can be conveniently obtained by capturing the playback picture of the first or second video data.
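Capturing sample images from the corrupted playback can be sketched with OpenCV; the sampling interval and file naming below are assumed parameters.

import cv2

def capture_splash_frames(video_path: str, out_pattern: str,
                          step: int = 30) -> int:
    """Save every `step`-th decoded frame as a candidate splash image."""
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(out_pattern.format(saved), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

capture_splash_frames("second_video.mp4", "splash_{:04d}.png")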
In yet another possible embodiment, the generating the sample screen splash image from the first video data or the second video data includes: generating a candidate screen splash image according to the first or second video data; and if the candidate image meets a preset splash-image condition, selecting it as the sample screen splash image. In this embodiment, with a preset splash-image condition, sample images that meet it can be screened out easily.
Further, the selecting the candidate screen splash image as the sample image if it meets the preset condition includes: transcoding the first compressed data to obtain third video data; generating, from the third video data, the original video image corresponding to the candidate splash image; comparing the original video image with the candidate image; and if the similarity between the two is smaller than a preset threshold, selecting the candidate image as the target screen splash image. In this embodiment, since machine judgment may carry some error, the most accurate result is obtained by comparing the candidate splash image against the original video image.
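The comparison against the original video image can use any image-similarity measure; the sketch below uses SSIM from scikit-image, with the 0.8 threshold being an assumption (the patent only says "preset threshold").

import cv2
from skimage.metrics import structural_similarity as ssim

def select_splash(original_path: str, candidate_path: str,
                  threshold: float = 0.8) -> bool:
    """Return True if the candidate differs enough from the original
    frame to be selected as a (target) screen splash image."""
    orig = cv2.imread(original_path, cv2.IMREAD_GRAYSCALE)
    cand = cv2.imread(candidate_path, cv2.IMREAD_GRAYSCALE)
    cand = cv2.resize(cand, (orig.shape[1], orig.shape[0]))
    return ssim(orig, cand) < threshold   # low similarity => splash sample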
The embodiment of the present application provides a training method for a screen splash detection model, the flow of which is shown in fig. 2 and which comprises the following steps:
s201, acquiring the original video image and the sample screen-blooming image in any embodiment, establishing a data set, and dividing the data set into a training set and a verification set.
S202, carrying out normalization preprocessing on the images in the training set.
S203, obtaining a neural network training model, and inputting the images in the training set into the neural network training model for training.
S204, calculating the training loss of the images in the training set through a loss function, performing back propagation on the training loss to iteratively update the neural network training model, and finishing training when the verification result of the neural network training model on the verification set meets the preset requirement to obtain the trained screen detection model.
Illustratively, when the original video images and the sample splash images are acquired, the images in the data set may be classified, for example into normal images and splash images, labeled "normal" and "splash" respectively; appropriate numbers of each are then placed in the training set and the verification set to train the neural network model, until its verification result on the verification set meets the preset requirement and the trained screen splash detection model is obtained.
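Steps S201 to S204 and the labeling scheme above map onto a standard PyTorch classification loop. In the sketch below, the network (ResNet-18), directory layout, batch size and the 95% accuracy stopping rule are all assumptions, since the patent leaves the model architecture and the "preset requirement" open.

import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Assumed layout: data/train/{normal,splash}/*, data/val/{normal,splash}/*
tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_dl = torch.utils.data.DataLoader(
    datasets.ImageFolder("data/train", tfm), batch_size=32, shuffle=True)
val_dl = torch.utils.data.DataLoader(
    datasets.ImageFolder("data/val", tfm), batch_size=32)

model = models.resnet18(num_classes=2)          # "normal" vs "splash"
loss_fn = nn.CrossEntropyLoss()                 # the loss function of S204
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):
    model.train()
    for x, y in train_dl:
        opt.zero_grad()
        loss = loss_fn(model(x), y)             # training loss
        loss.backward()                         # back-propagation
        opt.step()                              # iterative update
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_dl:
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    if correct / total >= 0.95:                 # assumed preset requirement
        break                                   # trained detection model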
By building a data set from the original video images and sample splash images of any of the above embodiments, the neural network training model obtains enough training samples, making the detection results of the screen splash detection model more accurate.
In a possible embodiment, the normalization preprocessing of the training-set images includes performing at least one of rotation, scaling and cropping on them. In this embodiment, rotating, scaling or cropping the training-set images strengthens the training of the neural network model and improves the detection accuracy of the screen splash detection model.
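The rotation, scaling and cropping preprocessing maps directly onto torchvision transforms; the specific angle and scale values below are assumptions, as the patent names only the operation types.

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                # rotation processing
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # scaling + cropping
    transforms.ToTensor(),
])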
The embodiment of the present application provides a device for generating a screen splash. As shown in fig. 3, the device 300 includes an obtaining module 301, a converting module 302, a processing module 303 and a generating module 304.
The obtaining module 301 is configured to obtain video playing data. The converting module 302 is configured to perform format conversion on the video playing data to obtain first compressed data. The processing module 303 is configured to modify the resolution of the first compressed data to obtain first video data, or to drop key frames in the first compressed data to obtain second video data. The generating module 304 is configured to generate a sample screen splash image according to the first video data or the second video data.
In this embodiment, the obtaining module 301, the converting module 302, the processing module 303 and the generating module 304 may be implemented by hardware, or may be implemented by hardware executing corresponding software.
For example, the device 300 for generating a screen splash provided in the embodiments of the present application may be disposed in the terminal device mentioned in any of the above embodiments. The obtaining module 301 may obtain the video playing data through a terminal device, which may be one or more of a smart phone, a tablet computer and a portable computer, or a desktop computer or the like, and the source of the video playing data may be a server. The terminal device obtains the video playing data from the server through a network, which may include various connection types, such as wired and wireless communication links. The numbers of terminal devices, networks and servers may be set according to actual needs; for example, the terminal device may also obtain the video playing data through the network from a server cluster composed of a plurality of servers. The user may use the terminal device to interact with the server through the network, to receive or transmit messages and the like; the server may be a server providing various services.
For example, the obtaining module 301 obtains video playing data through the terminal device, and the converting module 302 performs format conversion on it to obtain first compressed data; the processing module 303 modifies the resolution of the first compressed data to obtain first video data, or drops key frames in the first compressed data to obtain second video data; the generating module 304 then generates a sample screen splash image from the first or second video data. In this way, the large number of splash image samples required for training the screen splash detection model can be generated quickly, improving the efficiency of splash image generation.
However, in other embodiments of the present application, the server may have functions similar to those of the terminal device and thus execute the method for generating a screen splash image provided herein. For example, a user uploads video playing data to a server using a terminal device; the obtaining module 301 obtains the video playing data, the converting module 302 performs format conversion on it to obtain first compressed data, and the processing module 303 modifies the resolution of the first compressed data to obtain first video data, or drops key frames in the first compressed data to obtain second video data; the generating module 304 generates a sample screen splash image from the first or second video data, and the server pushes the generated sample image to the terminal device.
In a possible embodiment, the processing module 303 modifies the resolution of the first compressed data to obtain the first video data by: increasing the resolution of the first compressed data to obtain second compressed data, and transcoding the second compressed data to obtain the first video data. Illustratively, the greater the increase in resolution, the more severe the screen splash in the generated sample image, i.e., the more favorable for splash generation.
In another possible embodiment, the processing module 303 drops key frames in the first compressed data to obtain the second video data by: determining the key frame time in the first compressed data, discarding data frames within a preset range of the key frame time, and obtaining the second video data.
Illustratively, since a key frame carries all the information of the video in the first compressed data and is the data frame with the greatest influence on encoding and decoding, when the processing module 303 drops key frames, the second video data is very likely to exhibit screen splash, which greatly facilitates the acquisition of sample splash images.
In a further possible embodiment, the generating module 304 generates a sample screen splash image according to the first video data or the second video data by: playing the first or second video data, with the generating module 304 capturing screenshots of the played picture to generate the sample image. In this embodiment, the generating module 304 can conveniently obtain sample splash images by capturing the playback picture of the first or second video data.
In yet another possible embodiment, the generating module 304 generates the sample screen splash image by: generating a candidate screen splash image according to the first or second video data, and selecting the candidate image as the sample screen splash image if it meets a preset splash-image condition. In this embodiment, with a preset splash-image condition, sample images that meet it can be screened out easily.
Further, the selecting includes: the generating module 304 transcodes the first compressed data to obtain third video data; generates, from the third video data, the original video image corresponding to the candidate splash image; compares the original video image with the candidate image; and, if the similarity between the two is smaller than a preset threshold, selects the candidate image as the target screen splash image. In this embodiment, since machine judgment may carry some error, the most accurate result is obtained by comparing the candidate image against the original video image.
The embodiment of the present application provides a training device 400 for a screen splash detection model. As shown in fig. 4, the device 400 includes an acquisition module 401, a preprocessing module 402, a training module 403 and a verification module 404.
The acquisition module 401 is configured to acquire the original video image and the sample screen splash image as described in any of the above embodiments, establish a data set, and divide it into a training set and a verification set. The preprocessing module 402 is configured to perform normalization preprocessing on the images in the training set. The training module 403 is configured to acquire a neural network training model and input the training-set images into it for training. The verification module 404 is configured to calculate the training loss of the training-set images through a loss function, back-propagate the training loss to iteratively update the model, and end training when the model's verification result on the verification set meets a preset requirement, obtaining the trained screen splash detection model.
For example, when the acquisition module 401 acquires the original video images and the sample splash images, the images in the data set may be classified, for example into normal images and splash images, labeled "normal" and "splash" respectively; appropriate numbers of each are then placed in the training set and the verification set to train the neural network model until its verification result on the verification set meets the preset requirement, at which point training ends and the trained screen splash detection model is obtained.
In this embodiment, the obtaining module 401, the preprocessing module 402, the training module 403, and the verifying module 404 may be implemented by hardware, or may be implemented by hardware executing corresponding software.
In a possible embodiment, the preprocessing module 402 performs normalization preprocessing on the training-set images by performing at least one of rotation, scaling and cropping on them. In this embodiment, rotating, scaling or cropping the training-set images strengthens the training of the neural network model and improves the detection accuracy of the screen splash detection model.
Of course, to further improve the accuracy of the screen splash detection model, in the embodiments of the present application the verification module 404 may also iteratively update the neural network training model with a gradually decreasing learning rate. Because each update brings the model closer to the final screen splash detection model, iteratively updating it with a decreasing learning rate allows the trained detection model to be obtained more quickly.
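The gradually decreasing learning rate can be sketched with a step scheduler in PyTorch, reusing `model` and the epoch loop from the earlier training sketch; the decay factor and step size below are assumptions.

import torch

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Multiply the learning rate by 0.1 every 5 epochs, so updates shrink
# as the network approaches the final screen splash detection model.
scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=5, gamma=0.1)

for epoch in range(20):
    # ... one training epoch as in the earlier sketch ...
    scheduler.step()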
The above description covers only specific implementations of the embodiments of the present application, but the scope of the embodiments is not limited thereto; any change or substitution within the technical scope disclosed herein shall be covered by the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for generating a screen splash, comprising:
acquiring video playing data, and performing format conversion on the video playing data to obtain first compressed data;
modifying the resolution of the first compressed data to obtain first video data; or dropping key frames in the first compressed data to obtain second video data;
generating a sample screen splash image according to the first video data or the second video data.
2. The method of claim 1, wherein modifying the resolution of the first compressed data and obtaining the first video data comprises:
and increasing the resolution of the first compressed data to obtain second compressed data, and transcoding the second compressed data to obtain the first video data.
3. The method of claim 1, wherein dropping key frames from the first compressed data and obtaining second video data comprises:
and determining the key frame time in the first compressed data, discarding data frames within the preset range of the key frame time, and acquiring second video data.
4. The method of claim 1, wherein generating a sample screen splash image from the first video data or the second video data comprises:
playing the first video data or the second video data, and capturing screenshots of the picture played from the first video data or the second video data to generate the sample screen splash image.
5. The method of claim 4, wherein generating the sample screen splash image from the first video data or the second video data comprises:
generating a candidate screen splash image according to the first video data or the second video data;
and if the candidate screen splash image meets a preset splash-image condition, selecting the candidate screen splash image as the sample screen splash image.
6. The method according to claim 5, wherein the selecting the candidate screen splash image as the sample screen splash image if the candidate screen splash image meets a preset splash-image condition comprises:
transcoding the first compressed data to obtain third video data;
generating, according to the third video data, an original video image corresponding to the candidate screen splash image;
comparing the original video image with the candidate screen splash image;
and if the similarity between the original video image and the candidate screen splash image is smaller than a preset threshold, selecting the candidate screen splash image as the target screen splash image.
7. A training method for a screen splash detection model, comprising:
acquiring the original video image and the sample screen splash image according to any one of claims 1 to 6, establishing a data set, and dividing the data set into a training set and a verification set;
carrying out normalization preprocessing on the images in the training set;
acquiring a neural network training model, and inputting the images in the training set into the neural network training model for training;
calculating the training loss of the images in the training set through a loss function, back-propagating the training loss to iteratively update the neural network training model, and ending training when the verification result of the neural network training model on the verification set meets a preset requirement, to obtain a trained screen splash detection model.
8. The method of claim 7, wherein the performing normalization preprocessing on the images in the training set comprises:
performing at least one of rotation, scaling and cropping on the images in the training set.
9. A screen splash generation apparatus, configured to perform the method of any one of claims 1 to 6.
10. A training apparatus for a screen splash detection model, configured to perform the method of any one of claims 7 to 8.
CN202111572966.9A 2021-12-21 2021-12-21 Method and device for generating screen splash, and method and device for training screen splash detection model Pending CN114298991A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111572966.9A CN114298991A (en) 2021-12-21 2021-12-21 Method and device for generating screen splash, and method and device for training screen splash detection model


Publications (1)

Publication Number Publication Date
CN114298991A 2022-04-08

Family

ID=80968492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111572966.9A Pending CN114298991A (en) 2021-12-21 2021-12-21 Method and device for generating screen splash, and method and device for training screen splash detection model

Country Status (1)

Country Link
CN (1) CN114298991A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination