CN114612830A - Method, device and equipment for identifying screen pattern image and storage medium - Google Patents

Method, device and equipment for identifying screen pattern image and storage medium Download PDF

Info

Publication number
CN114612830A
CN114612830A (application CN202210248824.5A)
Authority
CN
China
Prior art keywords
image
training
data
images
screen
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210248824.5A
Other languages
Chinese (zh)
Inventor
张博深
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210248824.5A priority Critical patent/CN114612830A/en
Publication of CN114612830A publication Critical patent/CN114612830A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method, an apparatus, a device, and a storage medium for identifying screen-splash (glitched-frame) images, which can be applied to the map field. An image sequence corresponding to a target video is acquired; the video images in the image sequence are input into a target network model to obtain a screen-splash confidence corresponding to each video image, where the target network model is trained on annotated data and non-annotated data, and the non-annotated data is configured with pseudo labels determined from first image data and second image data; the screen-splash images in the target video are then determined according to the screen-splash confidence. Model training therefore requires only a small number of annotated images together with unannotated images, and exploiting the model's prediction consistency on unannotated data to generate pseudo labels expands the annotated data set and improves the accuracy of screen-splash image identification.

Description

Method, device, equipment and storage medium for identifying screen pattern (screen-splash) images
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for identifying screen-splash images.
Background
With the rapid development of internet technology, people's requirements for video content keep rising. Screen-splash detection is one way to improve video quality.
Generally, screen-splash detection extracts hand-crafted features from an input image and then trains a classifier on those features to perform binary classification on whether the current picture is a screen-splash image.
However, hand-crafted features generalize poorly and may fail to cover all cases in complex scenes, which limits the accuracy of screen-splash image recognition.
Disclosure of Invention
In view of this, the present application provides a method for identifying screen-splash images, which can effectively improve the accuracy of screen-splash image identification.
A first aspect of the present application provides a method for identifying a screen-splash image, which may be applied to a system or a program that includes a screen-splash image identification function in a terminal device, and specifically includes:
acquiring an image sequence corresponding to a target video;
inputting video images in the image sequence into a target network model to obtain a screen-splash confidence corresponding to the video images, wherein the target network model is obtained by training based on annotated data and non-annotated data, the non-annotated data is configured with a pseudo label determined based on first image data and second image data, the first image data comprises a plurality of serial first processed images obtained by adjustment based on training images, the second image data comprises a second processed image obtained by adjustment based on the training images, and the adjustment amplitude corresponding to the second processed image is larger than the adjustment amplitude corresponding to the first processed image;
and determining the screen-splash images in the target video according to the screen-splash confidence.
Optionally, in some possible implementations of the present application, the method further includes:
acquiring the unmarked training image;
performing a first adjustment operation on the training image to obtain a first serial image;
performing the first adjustment operation on the first serial image to obtain a second serial image;
associating the first serial image and the second serial image as a serial plurality of the first processed images, the first adjustment operation being for indicating an adjustment direction that approximates the training image;
performing a second adjustment operation on the training image to obtain a second processed image, wherein the second adjustment operation is used for indicating an adjustment direction different from the training image;
inputting a plurality of first processing images into a first model to obtain a first training parameter;
inputting the second processed image into a second model to obtain a second training parameter, wherein the first model and the second model share a weight parameter;
weighting the first training parameter and the second training parameter to obtain a target parameter;
performing a sharpening operation based on the target parameter to determine a pseudo label corresponding to the training image;
and training the first model according to the training image and the labeling data which configure the pseudo label to obtain the target network model.
Optionally, in some possible implementations of the present application, the inputting the plurality of first processed images into a first model to obtain a first training parameter includes:
inputting a plurality of the first processed images into a first model to obtain a parameter sequence used for indicating the screen-splash probability corresponding to the first processed images;
and carrying out average operation on the parameter sequence to obtain the first training parameter.
Optionally, in some possible implementations of the present application, after performing an averaging operation on the parameter sequence to obtain the first training parameter, the method further includes:
acquiring consistency loss information corresponding to the first model based on the parameter sequence;
updating parameters of the first model based on the consistency loss information.
Optionally, in some possible implementations of the present application, the performing a sharpening operation based on the target parameter to determine a pseudo label corresponding to the training image includes:
acquiring distribution information corresponding to a one-hot coding mode adopted by the first model;
determining a sharpening coefficient based on the distribution information;
and carrying out sharpening operation on the target parameter according to the sharpening coefficient so as to determine a pseudo label corresponding to the training image.
Optionally, in some possible implementation manners of the present application, the training the first model according to the training image and the annotation data configuring the pseudo tag to obtain the target network model includes:
acquiring corresponding first loss information based on the labeling data;
analyzing the consistency of the probability estimation corresponding to the pseudo label to obtain second loss information;
performing supervised training based on the pseudo label to obtain third loss information;
weighting the first loss information, the second loss information and the third loss information to obtain target loss information;
training the first model based on the target loss information to obtain the target network model.
Optionally, in some possible implementations of the present application, the method further includes:
acquiring a data volume corresponding to the annotation data associated with the target video;
if the data volume corresponding to the annotation data does not reach a preset proportion, calling a video frame image in the target video;
taking the video frame image as associated non-annotation data;
training the target network model based on the annotation data and the associated label-free data, wherein the associated label-free data is configured with a pseudo label determined based on third image data and fourth image data, the third image data is a plurality of serial third processed images obtained by successively performing image adjustment operation on the video frame images, the fourth image data comprises a fourth processed image obtained by adjusting based on the video frame images, and the adjustment amplitude corresponding to the fourth processed image is larger than the adjustment amplitude corresponding to the third processed image.
The second aspect of the present application provides an apparatus for identifying screen-splash images, including:
the acquisition unit is used for acquiring an image sequence corresponding to the target video;
an input unit, configured to input a video image in the image sequence into a target network model to obtain a screen-splash confidence corresponding to the video image, where the target network model is obtained by training based on annotation data and non-annotation data, the non-annotation data is configured with a pseudo tag determined based on first image data and second image data, the first image data is a plurality of serial first processed images obtained by successively performing image adjustment operations on a training image, the second image data includes a second processed image obtained by adjustment based on the training image, and an adjustment amplitude corresponding to the second processed image is greater than an adjustment amplitude corresponding to the first processed image;
and the identification unit is used for determining the screen-splash images in the target video according to the screen-splash confidence.
Optionally, in some possible implementation manners of the present application, the identification unit is specifically configured to obtain the unmarked training image;
the identification unit is specifically configured to perform a first adjustment operation on the training image to obtain a first serial image;
the identification unit is specifically configured to perform the first adjustment operation on the first serial image to obtain a second serial image;
the identification unit is specifically configured to associate the first serial image and the second serial image as a plurality of first processing images in series, where the first adjustment operation is used to indicate an adjustment direction approximate to the training image;
the identification unit is specifically configured to perform a second adjustment operation on the training image to obtain the second processed image, where the second adjustment operation is used to indicate an adjustment direction different from the training image;
the identification unit is specifically configured to input the plurality of first processed images into a first model to obtain a first training parameter;
the identification unit is specifically configured to input the second processed image into a second model to obtain a second training parameter, where the first model and the second model share a weight parameter;
the identification unit is specifically configured to weight the first training parameter and the second training parameter to obtain a target parameter;
the identification unit is specifically configured to perform a sharpening operation based on the target parameter to determine a pseudo label corresponding to the training image;
the identification unit is specifically configured to train the first model according to the training image and the labeling data configuring the pseudo tag, so as to obtain the target network model.
Optionally, in some possible implementations of the present application, the identifying unit is specifically configured to input a plurality of the first processed images into a first model, so as to obtain a parameter sequence indicating the screen-splash probability corresponding to the first processed images;
the identification unit is specifically configured to perform an averaging operation on the parameter sequence to obtain the first training parameter.
Optionally, in some possible implementation manners of the present application, the identifying unit is specifically configured to obtain consistency loss information corresponding to the first model based on the parameter sequence;
the identification unit is specifically configured to update parameters of the first model based on the consistency loss information.
Optionally, in some possible implementation manners of the present application, the identifying unit is specifically configured to obtain distribution information corresponding to a one-hot coding manner adopted by the first model;
the identification unit is specifically configured to determine a sharpening coefficient based on the distribution information;
the identification unit is specifically configured to perform a sharpening operation on the target parameter according to the sharpening coefficient to determine a pseudo label corresponding to the training image.
Optionally, in some possible implementation manners of the present application, the identification unit is specifically configured to obtain corresponding first loss information based on the labeled data;
the identification unit is specifically configured to analyze consistency of probability estimation corresponding to the pseudo tag to obtain second loss information;
the identification unit is specifically configured to perform supervised training based on the pseudo label to obtain third loss information;
the identification unit is specifically configured to weight the first loss information, the second loss information, and the third loss information to obtain target loss information;
the identification unit is specifically configured to train the first model based on the target loss information to obtain the target network model.
Optionally, in some possible implementations of the present application, the identification unit is specifically configured to obtain a data amount corresponding to annotation data associated with the target video;
the identification unit is specifically configured to call a video frame image in the target video if the data volume corresponding to the annotation data does not reach a preset proportion;
the identification unit is specifically configured to use the video frame image as associated non-labeled data;
the identification unit is specifically configured to train the target network model based on the annotation data and the associated label-free data, where the associated label-free data is configured with a pseudo label determined based on third image data and fourth image data, the third image data is a plurality of serial third processed images obtained by successively performing image adjustment operations on the video frame image, the fourth image data includes a fourth processed image obtained by performing image adjustment on the video frame image, and an adjustment amplitude corresponding to the fourth processed image is greater than an adjustment amplitude corresponding to the third processed image.
A third aspect of the present application provides a computer device, comprising: a memory, a processor, and a bus system; the memory is used for storing program code; and the processor is configured to execute the method for identifying screen-splash images according to the first aspect or any one of the optional implementations of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to execute the method for identifying screen-splash images according to the first aspect or any one of the optional implementations of the first aspect.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method for identifying screen-splash images provided in the first aspect or the various optional implementations of the first aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
An image sequence corresponding to a target video is acquired; the video images in the image sequence are then input into a target network model to obtain a screen-splash confidence corresponding to each video image, where the target network model is trained on annotated data and non-annotated data, the non-annotated data is configured with a pseudo label determined from first image data and second image data, the first image data comprises a plurality of serial first processed images obtained by adjusting the training image, the second image data comprises a second processed image obtained by adjusting the training image, and the adjustment amplitude of the second processed image is larger than that of the first processed image; the screen-splash images in the target video are then determined according to the screen-splash confidence. Model training therefore requires only a small number of manually annotated images plus a large number of unannotated images, and exploiting the prediction consistency of the model on unannotated data to generate pseudo labels further expands the annotated data set and improves the accuracy of screen-splash image identification.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of a network architecture in which a screen-splash image identification system operates;
fig. 2 is a flowchart of screen-splash image identification according to an embodiment of the present application;
fig. 3 is a flowchart of a method for identifying screen-splash images according to an embodiment of the present application;
fig. 4 is a scene schematic diagram of a method for identifying screen-splash images according to an embodiment of the present application;
fig. 5 is a scene schematic diagram of another method for identifying screen-splash images according to an embodiment of the present application;
fig. 6 is a scene schematic diagram of another method for identifying screen-splash images according to an embodiment of the present application;
FIG. 7 is a flowchart of another method for identifying screen-splash images according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an apparatus for identifying screen-splash images according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide a method and a related apparatus for identifying screen-splash images, which can be applied to a system or a program containing a screen-splash image identification function in a terminal device. An image sequence corresponding to a target video is acquired; the video images in the image sequence are then input into a target network model to obtain a screen-splash confidence corresponding to each video image, where the target network model is trained on annotated data and non-annotated data, the non-annotated data is configured with a pseudo label determined from first image data and second image data, the first image data comprises a plurality of serial first processed images obtained by adjusting the training image, the second image data comprises a second processed image obtained by adjusting the training image, and the adjustment amplitude of the second processed image is larger than that of the first processed image; the screen-splash images in the target video are then determined according to the screen-splash confidence. Model training therefore requires only a small number of manually annotated images and a large number of unannotated images, and exploiting the prediction consistency of the model on unannotated data to generate pseudo labels expands the annotated data set and improves the accuracy of screen-splash image identification.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some nouns that may appear in the embodiments of the present application are explained.
One-Hot encoding: also known as one-bit-effective encoding; N states are encoded with an N-bit status register, each state has its own independent register bit, and only one bit is valid (set) at any time.
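As an illustration only (not part of the patent), a minimal Python sketch of one-hot encoding; the two-class labels (0 = normal, 1 = screen-splash) are assumed for this example:

```python
import torch
import torch.nn.functional as F

# Hypothetical 2-class labels for a screen-splash detector: 0 = normal, 1 = screen-splash.
labels = torch.tensor([0, 1, 1, 0])

# Each label becomes an N-bit vector in which exactly one position is "hot" (set to 1).
one_hot = F.one_hot(labels, num_classes=2).float()
print(one_hot)
# tensor([[1., 0.],
#         [0., 1.],
#         [0., 1.],
#         [1., 0.]])
```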
It should be understood that the method for identifying screen-splash images provided by the present application may be applied to a system or a program containing a screen-splash image identification function in a terminal device, for example a video player. Specifically, the screen-splash image identification system may operate in the network architecture shown in fig. 1, which is a diagram of the network architecture in which the system runs. As can be seen from the figure, the system can identify screen-splash images from multiple information sources: a trigger operation on the terminal side causes screen-splash identification to be performed on the video delivered by the server, so as to improve the playing quality of the video. It can be understood that fig. 1 shows various terminal devices, which may be computer devices; in an actual scene there may be more or fewer types of terminal devices participating in the identification process, and the specific number and types are determined by the actual scene and are not limited here. In addition, fig. 1 shows one server, but in an actual scene multiple servers may participate, especially in multi-video interaction scenes; the specific number of servers is determined by the actual scene.
In this embodiment, the server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, or a vehicle-mounted terminal. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and may be connected to form a blockchain network, which is not limited here.
It is understood that the above screen-splash image identification system may run on a personal mobile terminal, for example as an application such as a video player; it may also run on a server or on a third-party device to provide screen-splash image identification and obtain the identification result for an information source. The identification system may run in the above devices as a program, as a system component, or as one of the cloud service programs.
With the rapid development of internet technology, people's requirements for video content keep rising. Screen-splash detection is one way to improve video quality.
Generally, screen-splash detection extracts hand-crafted features from an input image and then trains a classifier on those features to perform binary classification on whether the current picture is a screen-splash image.
However, hand-crafted features generalize poorly and may fail to cover all cases in complex scenes, which limits the accuracy of screen-splash image recognition.
In order to solve the above problem, the present application provides a method for identifying screen-splash images, which is applied to the screen-splash image identification flow framework shown in fig. 2. In the flow framework provided in this embodiment of the present application, a user selects the corresponding video through a play operation on the terminal, and the server then performs screen-splash detection on the video. Screen-splash detection determines whether a glitched-picture phenomenon exists in a frame; the final output is the probability that the current image is a screen-splash image, from which a binary result of whether the current image is a screen-splash image can further be output. This task is a necessary step in picture and video quality analysis and can be used to evaluate the quality of the current picture or video.
In this embodiment, training images are expanded to configure pseudo labels, and the target network model is then trained. The algorithm is a semi-supervised screen-splash detection algorithm based on serial image enhancement: training of the model can be completed with only a small amount of screen-splash annotated data and a certain amount of training data without screen-splash annotation, and the trained model can directly classify screen-splash and normal images with high precision.
It can be understood that the method provided by the present application may be implemented as processing logic written as a program in a hardware system, or as an apparatus for identifying screen-splash images that implements the processing logic in an integrated or external manner. As one implementation, the apparatus for identifying screen-splash images acquires an image sequence corresponding to a target video; it then inputs the video images in the image sequence into a target network model to obtain a screen-splash confidence corresponding to each video image, where the target network model is trained on annotated data and non-annotated data, the non-annotated data is configured with a pseudo label determined from first image data and second image data, the first image data comprises a plurality of serial first processed images obtained by adjusting the training image, the second image data comprises a second processed image obtained by adjusting the training image, and the adjustment amplitude of the second processed image is larger than that of the first processed image; the screen-splash images in the target video are then determined according to the screen-splash confidence. Model training therefore requires only a small number of manually annotated images and a large number of unannotated images, and exploiting the prediction consistency of the model on unannotated data to generate pseudo labels further expands the annotated data set and improves the accuracy of screen-splash image identification.
The scheme provided by the embodiment of the application relates to the computer vision technology of artificial intelligence, and is specifically explained by the following embodiment:
With reference to the above flow architecture, the method for identifying screen-splash images in the present application is described below. Please refer to fig. 3, which is a flowchart of a method for identifying screen-splash images according to an embodiment of the present application. The method may be executed by a server or a terminal, and the embodiment of the present application includes at least the following steps:
301. Acquire an image sequence corresponding to the target video.
In this embodiment, the target video may be a video in a user playlist, may also be a video in a cloud video database, and may also be a video played in real time, where a specific video type is determined by an actual scene.
Specifically, the image sequence may be obtained by parsing video frames of the target video. The specific frame interval may be determined by the playing configuration of the player; for example, the higher the playing frame rate, the smaller the interval between sampled video frames and the greater the number of images in the resulting image sequence.
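In one possible implementation (provided here only as an illustrative sketch, not as part of the claimed method), the frame sampling could be written as follows; OpenCV and the fixed sampling interval are assumptions:

```python
import cv2

def extract_image_sequence(video_path: str, every_n_frames: int = 30):
    """Decode a video and keep one frame every `every_n_frames` frames.

    A minimal sketch; the real sampling interval would follow the player's
    playback configuration described above (higher frame rate -> smaller gap).
    """
    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % every_n_frames == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames

# Example usage with an assumed file name and interval:
image_sequence = extract_image_sequence("target_video.mp4", every_n_frames=15)
```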
302. Input the video images in the image sequence into a target network model to obtain a screen-splash confidence corresponding to the video images, where the target network model is obtained through training based on annotated data and non-annotated data, the non-annotated data is configured with a pseudo label determined based on first image data and second image data, the first image data are a plurality of serial first processed images obtained by successively performing image adjustment operations on a training image, the second image data comprise a second processed image obtained through adjustment based on the training image, and the adjustment amplitude corresponding to the second processed image is larger than that corresponding to the first processed image.
In this embodiment, that is, during training with sample expansion based on unannotated data, screen-splash images have a gradual-degradation characteristic, as shown in fig. 4, which is a scene schematic diagram of the method for identifying screen-splash images provided in this embodiment of the present application. The figure shows screen-splash images of different degrees; it can be seen that screen-splash images result from gradually adjusting normal images. Making the adjustment amplitude of the second processed image larger than that of the first processed image therefore simulates the process by which screen-splash samples are generated.
Specifically, the framework of the semi-supervised screen-splash detection algorithm based on serial image enhancement is shown in fig. 5, which is a scene schematic diagram of another method for identifying screen-splash images provided in this embodiment of the present application. For an input image x without a manual label, two types of data enhancement are used. First, the unannotated training image is acquired; a first adjustment operation (simple enhancement) is then performed on the training image to obtain a first serial image; the first adjustment operation is performed again on the first serial image to obtain a second serial image; and the first serial image and the second serial image are associated as a plurality of serial first processed images, the first adjustment operation being used to indicate an adjustment direction that remains close to the training image. Next, a second adjustment operation (complex enhancement) is performed on the training image to obtain a second processed image, the second adjustment operation being used to indicate an adjustment direction that departs from the training image. The plurality of first processed images are then input into a first model to obtain a first training parameter, and the second processed image is input into a second model to obtain a second training parameter, where the first model and the second model share weight parameters so that the training effects of different samples are aligned. The first training parameter and the second training parameter are further weighted to obtain a target parameter; a sharpening operation is performed based on the target parameter to determine a pseudo label corresponding to the training image; and the first model is then trained according to the training image configured with the pseudo label and the annotated data, to obtain the target network model.
It is to be understood that in the above example the first adjustment operation is performed twice; in an actual scene the first adjustment operation may be performed more times. For example, if 5 serial first processed images are required, the specific procedure is: perform a first simple enhancement on the training image to obtain serial image A1; perform a simple enhancement on serial image A1 to obtain serial image A2; perform a simple enhancement on serial image A2 to obtain serial image A3; perform a simple enhancement on serial image A3 to obtain serial image A4; and perform a simple enhancement on serial image A4 to obtain serial image A5, so that the resulting plurality of first processed images is A1-A5. The specific number is determined by the actual scene and is not limited here.
It should be noted that the specific operations used in the simple enhancement steps may be the same or different. The simple enhancement may include operations such as linear transformation, contrast transformation, brightness transformation, image equalization, and image flipping (horizontal and vertical); in an actual scene it may be any combination of the above, which is not limited here.
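The sketch below illustrates the serial simple enhancement and the single complex enhancement. The concrete torchvision operations, their parameters, and the value of N are assumptions made for illustration, since the text only names the categories of operations:

```python
import random
import torchvision.transforms as T

# Stand-ins for the "simple" adjustments listed above (brightness/contrast change,
# flipping, etc.); the exact operations and parameters are not fixed by the text.
SIMPLE_OPS = [
    T.ColorJitter(brightness=0.2),
    T.ColorJitter(contrast=0.2),
    T.RandomHorizontalFlip(p=1.0),
    T.RandomVerticalFlip(p=1.0),
]

# A stand-in "complex" adjustment: random erasing destroys pixel content in a region.
COMPLEX_OP = T.Compose([T.ToTensor(), T.RandomErasing(p=1.0, scale=(0.1, 0.3))])

def serial_simple_augment(image, n: int):
    """Apply a simple adjustment n times in series: x -> x1 -> x2 -> ... -> xn."""
    outputs = []
    current = image
    for _ in range(n):
        current = random.choice(SIMPLE_OPS)(current)  # each step starts from the previous output
        outputs.append(current)
    return outputs  # the plurality of serial "first processed images" (converted to tensors later)

def complex_augment(image):
    """Apply a single, larger-amplitude adjustment to obtain the "second processed image"."""
    return COMPLEX_OP(image)
```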
Specifically, the first adjustment operation is used to indicate an adjustment direction that remains close to the training image; that is, information such as the texture, spatial arrangement, and content of the image is kept relatively intact, and the adjusted image is similar to the image before adjustment. In one possible scenario, the first adjustment operation, i.e. simple enhancement, may include linear transformation, contrast transformation, brightness transformation, image equalization, image flipping (horizontal and vertical), and the like. Correspondingly, the second adjustment operation is used to indicate an adjustment direction that departs from the training image, i.e. a larger-amplitude adjustment of the original image; complex enhancement may be achieved by random erasure, large-angle image rotation, adding complex noise, and so on. The two methods differ mainly in the degree of image enhancement: simple image enhancement keeps information such as the texture, spatial arrangement, and content of the picture relatively intact, whereas complex image enhancement usually assigns new values to image pixels in a certain area of the original image in a large-amplitude manner, for example random erasure, so that the pixel values (image content) of the assigned area are lost. Simple enhancement is performed N times and complex enhancement is performed only once, so that consistency supervision is applied to the features after the N simple enhancements; this is equivalent to a regularization term, which stabilizes the training of the network and improves the generalization ability of the model. The N+1 enhanced inputs are passed through a deep convolutional neural network (CNN) model with shared weights to obtain final output probabilities P_1, P_2, ..., P_N and P_c, where a consistency loss is computed over P_1, P_2, ..., P_N; the rationale is that after the same picture is simply enhanced N times, the same CNN model should produce relatively consistent predictions. Subsequently, P_1, P_2, ..., P_N are averaged; that is, the plurality of first processed images are input into the first model to obtain a parameter sequence, and the parameter sequence is averaged to obtain the first training parameter. The result is further fused with P_c by weighting to obtain a fused output probability over all enhanced input data, and a probability sharpening operation is then applied to the fused probability. The purpose of this operation is to sharpen the probability into a distribution closer to one-hot (which better matches the form of a manual label), and the sharpened probability is used directly as the pseudo label of the current image. In this way the input image x without a manual label obtains a reliable pseudo label, which is equivalent to giving part of the unlabeled data a manual label, so that the CNN model can continue to be trained in a supervised manner until the network converges. A small amount of labeled data and a large amount of unlabeled data can thus be used for training, finally yielding a robust CNN model.
It can be understood that, in this embodiment, the target network model is the CNN model only for example, and the specific network structure is not limited, and a CNN model with a mature academia at present may be used as the structure of the deep network.
In addition, the simple enhancement and the complex enhancement are not limited to the above-mentioned methods, and more complex enhancement methods can be designed to further improve the effect.
In a possible scenario, for the process of generating a pseudo label through simple and complex image enhancement, a consistency loss calculation can be performed; that is, consistency loss information corresponding to the first model is obtained based on the parameter sequence, and the parameters of the first model are then updated based on the consistency loss information. Simple image enhancement is performed serially N times, yielding x_1, x_2, ..., x_N, while complex image enhancement is performed only once, yielding x_c. The N simply enhanced inputs are passed through the CNN model to obtain N probability outputs P_1, P_2, ..., P_N, on which a consistency loss is imposed:

[Equation figure BDA0003545998680000131: consistency loss L_e imposed on P_1, P_2, ..., P_N.]

where x_1, x_2, ..., x_N denote the serially (simply) enhanced image sequence.
It will be appreciated that this consistency loss drives the network to optimize so that P_1, P_2, ..., P_N are as close to each other as possible; the rationale is that for simple image enhancement the outputs of the CNN model should be highly consistent, which gives the CNN model a supervision signal with which its parameters can be updated. For complex image enhancement, because the image texture, spatial information, content information, and so on are damaged, the probability output P_c produced by the CNN model may differ considerably from P_1, P_2, ..., P_N and is therefore not suitable for the consistency loss. However, complexly enhanced images help the generalization of the network and can keep the CNN model from settling into a locally optimal solution.
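A hedged sketch of the consistency loss follows. The patent's exact expression is contained in an equation figure, so this example assumes one common form (mean squared deviation of each P_i from the mean prediction), which matches the stated goal of driving P_1, ..., P_N together:

```python
import torch

def consistency_loss(probs):
    """Consistency loss L_e over the N predictions P_1 ... P_N from the
    serially (simply) enhanced inputs.

    The original expression is only available as an equation figure; as an
    assumption, this sketch penalizes the mean squared deviation of each P_i
    from the mean prediction, which pushes P_1 ... P_N to agree.
    """
    stacked = torch.stack(probs, dim=0)           # shape (N, batch, num_classes)
    mean_pred = stacked.mean(dim=0, keepdim=True) # shape (1, batch, num_classes)
    return ((stacked - mean_pred) ** 2).mean()

# Usage sketch (shared-weight model applied to each simply enhanced input):
# probs = [model(x_i).softmax(dim=-1) for x_i in simply_enhanced_inputs]
# loss_e = consistency_loss(probs)
```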
In addition, the purpose of the sharpening operation is to make the target parameter more discriminative. First, the distribution information corresponding to the one-hot encoding mode adopted by the first model is obtained, so that the probability distribution after the sharpening operation matches the encoding mode, i.e. the probability is sharpened to be closer to a one-hot distribution and to better match the form of a manual label; a sharpening coefficient may then be determined based on the distribution information, and the sharpening operation is performed on the target parameter according to the sharpening coefficient to determine the pseudo label corresponding to the training image.
It will be appreciated that when configuring the pseudo label, the probabilities P_1, P_2, ..., P_N after simple enhancement and the probability P_c after complex enhancement are considered simultaneously:

P_soft = ((P_1 + P_2 + ... + P_N)/N + P_c) / 2

where P_soft integrates the probability distribution information of the simply enhanced and complexly enhanced images and can be used to reflect the potential label of the current input image x. However, P_soft is often not "sharp" enough, e.g. (0.6, 0.4), which is inconsistent with the (1, 0) one-hot label given by manual annotation, so a probability sharpening operation is applied to P_soft:
[Equation figure BDA0003545998680000141: probability sharpening of P_soft.]

where p_i is the probability distribution corresponding to P_soft, p_j is the probability distribution after the sharpening operation, and T ∈ (0, 1) is a sharpening coefficient indicating the degree of sharpening: the smaller T, the closer the resulting P_hard is to the one-hot form. The final P_hard is used as the pseudo label of the current picture for training the CNN model in the subsequent stage.
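The sketch below combines the P_soft fusion given above with a temperature-style sharpening step. Because the sharpening equation itself is only available as an equation figure, the p^(1/T) form used here is an assumption that reproduces the described behaviour (the smaller T, the closer P_hard is to one-hot):

```python
import torch

def fuse_and_sharpen(simple_probs, complex_prob, temperature: float = 0.5):
    """Build the pseudo label: P_soft = ((P_1 + ... + P_N)/N + P_c) / 2,
    then sharpen it toward a one-hot-like distribution.

    The exact sharpening formula is given as an equation figure in the patent;
    the temperature form below (p raised to 1/T, then renormalized) is an
    assumption that matches the stated behaviour for T in (0, 1).
    """
    p_soft = (torch.stack(simple_probs, dim=0).mean(dim=0) + complex_prob) / 2.0
    powered = p_soft ** (1.0 / temperature)
    p_hard = powered / powered.sum(dim=-1, keepdim=True)
    return p_hard  # used as the pseudo label of the unlabeled image
```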
The following describes the configuration of the loss function of the target network model, which combines three components: labeled data, parameter-sequence consistency, and unlabeled data. First, the corresponding first loss information is obtained based on the annotated data; the consistency of the probability estimates corresponding to the pseudo label is then analyzed to obtain second loss information; supervised training based on the pseudo label yields third loss information; the first, second, and third loss information are further weighted to obtain target loss information; and the first model is then trained based on the target loss information to obtain the target network model.
Specifically, for labeled (annotated) data, the corresponding loss function adopts the cross-entropy loss:

L_S(x_s, y_s) = -[y_s · log(f(x_s, θ)) + (1 - y_s) · log(1 - f(x_s, θ))]

where x_s denotes a labeled sample, y_s is the corresponding label, f denotes the nonlinear operation of the CNN, and θ is the parameter of f, i.e. the weights to be updated.
In addition, for unlabeled data, besides the consistency loss L_e described above, the generated pseudo label P_hard can also supervise the training of the network, in the following form:

L_u(x_u, P_hard) = -[P_hard · log(f(x_u, θ)) + (1 - P_hard) · log(1 - f(x_u, θ))]
thus, the target loss function can be expressed as:
L = L_S + α·L_u + β·L_e

where α and β are used to control the weights of the corresponding loss terms.
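A sketch of the combined objective, assuming the model outputs a screen-splash probability and using the binary cross-entropy forms given above; the values of α and β are placeholders, not values specified by the text:

```python
import torch.nn.functional as F

def total_loss(p_labeled, y_s, p_unlabeled, p_hard, loss_e,
               alpha: float = 1.0, beta: float = 1.0):
    """L = L_S + alpha * L_u + beta * L_e, following the formulas above.

    p_labeled:   f(x_s, theta), predicted screen-splash probability for labeled samples
    y_s:         ground-truth labels (0 = normal, 1 = screen-splash), as floats
    p_unlabeled: f(x_u, theta), predicted probability for unlabeled samples
    p_hard:      the sharpened pseudo label (its screen-splash component), in [0, 1]
    loss_e:      the consistency loss computed over P_1 ... P_N
    alpha, beta: weights of the unlabeled and consistency terms (assumed values)
    """
    loss_s = F.binary_cross_entropy(p_labeled, y_s)       # L_S on labeled data
    loss_u = F.binary_cross_entropy(p_unlabeled, p_hard)  # L_u supervised by the pseudo label
    return loss_s + alpha * loss_u + beta * loss_e
```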
It is understood that the choice of loss function is not limited to the cross-entropy loss; other loss functions, including the L1 loss, the L2 loss, or the JS divergence, can also be used.
This embodiment relies on only a small number of labeled images. The semi-supervised screen-splash detection algorithm based on serial image enhancement provided by the present application can be trained with only a small number of manually annotated images together with a large number of unannotated images, which reduces the labor cost of annotation. Because the number of training samples is sufficient, the training effect of the CNN model can also be improved.
303. Determine the screen-splash images in the target video according to the screen-splash confidence.
In this embodiment, the screen-splash confidence indicates the probability that an image is a screen-splash image. A specific recognition result is shown in fig. 6, which is a scene schematic diagram of another method for identifying screen-splash images provided in this embodiment of the present application; the figure shows that the screen-splash confidence of the recognized screen-splash image is 0.99 and that of the normal image is 0.05.
This embodiment can serve as a video analysis function that directly outputs the confidence that a single image is a screen-splash image, and from it a detection result (screen-splash or normal), thereby guiding further video processing operations and improving the quality of the video.
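A minimal inference sketch under the assumption that the trained model outputs a single screen-splash logit; the 0.5 decision threshold is illustrative, not specified by the patent:

```python
import torch

@torch.no_grad()
def detect_screen_splash(model, image_sequence, threshold: float = 0.5):
    """Run the trained model on each frame and return (confidence, is_splash) pairs.

    The threshold is an assumption; the text only states that a confidence
    (e.g. 0.99 vs. 0.05) is produced, from which a binary result can follow.
    """
    model.eval()
    results = []
    for frame in image_sequence:                       # frame: preprocessed tensor (C, H, W)
        confidence = model(frame.unsqueeze(0)).sigmoid().item()
        results.append((confidence, confidence >= threshold))
    return results
```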
With the above embodiments, an image sequence corresponding to a target video is acquired; the video images in the image sequence are then input into a target network model to obtain a screen-splash confidence corresponding to each video image, where the target network model is trained on annotated data and non-annotated data, the non-annotated data is configured with a pseudo label determined from first image data and second image data, the first image data comprises a plurality of serial first processed images obtained by adjusting the training image, the second image data comprises a second processed image obtained by adjusting the training image, and the adjustment amplitude of the second processed image is larger than that of the first processed image; the screen-splash images in the target video are then determined according to the screen-splash confidence. Model training therefore requires only a small number of manually annotated images and a large number of unannotated images, and exploiting the prediction consistency of the model on unannotated data to generate pseudo labels expands the annotated data set and improves the accuracy of screen-splash image identification.
In a possible scenario, it may be determined whether the annotation data corresponding to the target video meets the requirement, so that data supplementation can be performed; this is described below. Please refer to fig. 7, which is a flowchart of another method for identifying screen-splash images according to an embodiment of the present application. The embodiment includes at least the following steps:
701. Acquire the data volume corresponding to the annotation data associated with the target video.
In this embodiment, in the process of identifying the target video, the data amount corresponding to the labeled data may be determined to determine whether to supplement the unlabeled data, so as to ensure the accuracy of the identification model.
702. If the data volume corresponding to the annotation data does not reach the preset proportion, call video frame images from the target video and use them as associated non-annotation data.
In this embodiment, the preset proportion may be the proportion of the data volume of the annotation data to all the training data, or a proportion parameter dynamically configured according to the bit rate of the target video; for example, the higher the bit rate, the higher the preset proportion.
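A trivial sketch of this check, with an assumed preset proportion of 0.3:

```python
def needs_unlabeled_supplement(num_labeled: int, num_total: int,
                               preset_proportion: float = 0.3) -> bool:
    """Return True when the annotation data does not reach the preset proportion,
    in which case video frames from the target video are pulled in as associated
    non-annotation data. The 0.3 value is an assumption; the text also allows the
    proportion to be set dynamically from the video bit rate.
    """
    return (num_labeled / max(num_total, 1)) < preset_proportion
```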
In one possible scenario, the video frame images called from the target video may be video frame images marked as positions where screen-splash frequently appears, such as plot key points and openings, thereby improving the effectiveness of the associated non-annotation data.
703. Train the target network model based on the annotation data and the associated non-annotation data.
In this embodiment, the associated non-annotation data is unlabeled data associated with the target video, for example unlabeled data obtained from video frames in the target video; the video frames may be selected automatically or chosen as frames with high popularity, for example frames with a large number of on-screen comments, so as to improve the effectiveness of the non-annotation data.
In addition, the specific training process is shown as the description of step 302 shown in fig. 3, and is not described herein again. That is, the associated non-labeling data is configured with a pseudo label determined based on third image data and fourth image data, the third image data is a plurality of serial third processed images obtained by sequentially performing image adjustment operations on the video frame image (that is, the third image data is a serial image obtained by simply enhancing the video frame image), the fourth image data includes a fourth processed image obtained by adjusting based on the video frame image (that is, the fourth image data is an image obtained by complexly enhancing the video frame image), and the adjustment range corresponding to the fourth processed image is larger than the adjustment range corresponding to the third processed image.
This embodiment provides a semi-supervised screen-splash detection algorithm based on serial image enhancement. Training of the model can be completed with only a small amount of screen-splash annotated data and a certain amount of training data without screen-splash annotation; the trained model can directly classify screen-splash and normal images with high precision, achieving good classification accuracy while greatly reducing the labor cost of annotation.
In order to better implement the above aspects of the embodiments of the present application, a related apparatus for implementing the above aspects is also provided below. Please refer to fig. 8, which is a schematic structural diagram of an apparatus for identifying screen-splash images according to an embodiment of the present application; the apparatus 800 for identifying screen-splash images includes:
an acquiring unit 801, configured to acquire an image sequence corresponding to a target video;
an input unit 802, configured to input a video image in the image sequence into a target network model to obtain a screen-splash confidence corresponding to the video image, where the target network model is trained based on annotated data and non-annotated data, where the non-annotated data configures a pseudo tag determined based on first image data and second image data, the first image data includes a plurality of serial first processed images adjusted based on a training image, the second image data includes a second processed image adjusted based on the training image, and an adjustment amplitude corresponding to the second processed image is greater than an adjustment amplitude corresponding to the first processed image;
and the identifying unit 803 is configured to determine the screen-splash images in the target video according to the screen-splash confidence.
Optionally, in some possible implementations of the present application, the identifying unit 803 is specifically configured to obtain the training images that are not labeled;
the identifying unit 803 is specifically configured to perform a first adjustment operation on the training image multiple times to obtain a plurality of serial first processed images, where the first adjustment operation is used to indicate an adjustment direction close to the training image;
the identifying unit 803 is specifically configured to perform a second adjustment operation on the training image to obtain the second processed image, where the second adjustment operation is used to indicate an adjustment direction different from the training image;
the identifying unit 803 is specifically configured to input the plurality of first processed images into a first model to obtain a first training parameter;
the identifying unit 803 is specifically configured to input the second processed image into a second model to obtain a second training parameter, where the first model and the second model share a weight parameter;
the identifying unit 803 is specifically configured to weight the first training parameter and the second training parameter to obtain a target parameter;
the identifying unit 803 is specifically configured to perform a sharpening operation based on the target parameter to determine a pseudo label corresponding to the training image;
the identifying unit 803 is specifically configured to train the first model according to the training image configured with the pseudo label and the annotated data, so as to obtain the target network model.
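The pseudo-label construction described above (averaging the predictions on the serial weak views, weighting them against the strong-view prediction, then sharpening) could be sketched as follows; the weighting coefficient alpha, the temperature T, and the use of a single shared network for both branches are assumptions consistent with, but not dictated by, this description.

```python
# Sketch of pseudo-label generation for one unlabeled training image.
# The first and second models share weights, so a single network is reused.
import torch
import torch.nn.functional as F

def make_pseudo_label(model, weak_views, strong_view, alpha=0.5, T=0.5):
    with torch.no_grad():
        # First training parameter: average of the serial weak-view predictions.
        p_weak = torch.stack([F.softmax(model(v), dim=1) for v in weak_views]).mean(0)
        # Second training parameter: prediction on the strongly adjusted view.
        p_strong = F.softmax(model(strong_view), dim=1)
    p = alpha * p_weak + (1 - alpha) * p_strong      # weighted target parameter
    p = p ** (1.0 / T)                               # temperature sharpening
    return p / p.sum(dim=1, keepdim=True)            # renormalized pseudo label
```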
Optionally, in some possible implementations of the present application, the identifying unit 803 is specifically configured to input a plurality of the first processed images into a first model to obtain a parameter sequence;
the identifying unit 803 is specifically configured to perform an averaging operation on the parameter sequence to obtain the first training parameter.
Optionally, in some possible implementation manners of the present application, the identifying unit 803 is specifically configured to obtain consistency loss information corresponding to the first model based on the parameter sequence;
the identifying unit 803 is specifically configured to update the parameter of the first model based on the consistency loss information.
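One plausible form of the consistency loss over the parameter sequence is the mean squared deviation of the serial predictions from their average, as in the sketch below; the exact loss form used by this embodiment is not specified here, so this is an assumption.

```python
# Sketch: consistency loss over the predictions of the serial weak views.
import torch

def serial_consistency_loss(prob_sequence):
    # prob_sequence: list of softmax outputs, one per serial first processed image.
    probs = torch.stack(prob_sequence)               # (k, batch, classes)
    mean = probs.mean(dim=0, keepdim=True)
    return ((probs - mean) ** 2).mean()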
Optionally, in some possible implementation manners of the present application, the identifying unit 803 is specifically configured to obtain distribution information corresponding to a one-hot coding manner adopted by the first model;
the identifying unit 803 is specifically configured to determine a sharpening coefficient based on the distribution information;
the identifying unit 803 is specifically configured to perform a sharpening operation on the target parameter according to the sharpening coefficient, so as to determine a pseudo label corresponding to the training image.
Optionally, in some possible implementation manners of the present application, the identifying unit 803 is specifically configured to obtain corresponding first loss information based on the labeled data;
the identifying unit 803 is specifically configured to analyze consistency of probability estimates corresponding to the pseudo tags to obtain second loss information;
the identifying unit 803 is specifically configured to perform supervised training based on the pseudo tag to obtain third loss information;
the identifying unit 803 is specifically configured to weight the first loss information, the second loss information, and the third loss information to obtain target loss information;
the identifying unit 803 is specifically configured to train the first model based on the target loss information to obtain the target network model.
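A hedged sketch of the weighted combination of the three losses is given below; the choice of cross-entropy for the annotated data, KL divergence for the pseudo-label term, the reuse of serial_consistency_loss from the earlier sketch, and the weights w_sup, w_cons, w_pseudo are all illustrative assumptions rather than the exact losses of this embodiment.

```python
# Sketch: combine the supervised, consistency, and pseudo-label losses.
# serial_consistency_loss is the helper defined in the earlier sketch.
import torch
import torch.nn.functional as F

def total_loss(logits_labeled, labels, prob_sequence, logits_weak, pseudo_label,
               w_sup=1.0, w_cons=1.0, w_pseudo=1.0):
    l_sup = F.cross_entropy(logits_labeled, labels)                  # first loss
    l_cons = serial_consistency_loss(prob_sequence)                  # second loss
    l_pseudo = F.kl_div(F.log_softmax(logits_weak, dim=1),
                        pseudo_label, reduction="batchmean")         # third loss
    return w_sup * l_sup + w_cons * l_cons + w_pseudo * l_pseudo
```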
Optionally, in some possible implementations of the present application, the identifying unit 803 is specifically configured to obtain a data amount corresponding to annotation data associated with the target video;
the identifying unit 803 is specifically configured to, if the data amount corresponding to the annotation data does not reach a preset ratio, invoke a video frame image in the target video;
the identifying unit 803 is specifically configured to use the video frame image as associated non-labeled data;
the identifying unit 803 is specifically configured to train the target network model based on the annotation data and the associated label-free data, where the associated label-free data is configured with a pseudo label determined based on third image data and fourth image data, the third image data is a plurality of serial third processed images obtained by successively performing image adjustment operations on the video frame image, the fourth image data includes a fourth processed image obtained by performing image adjustment on the video frame image, and an adjustment range corresponding to the fourth processed image is greater than an adjustment range corresponding to the third processed image.
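For illustration, the selection of associated label-free data could follow the sketch below: video frames are drawn in only when the annotated share of the data falls below a preset ratio, and higher-popularity frames (for example, those with more bullet comments) are preferred. The frame tuple layout, min_ratio, and top_k are assumptions.

```python
# Sketch: supplement training data with associated unlabeled video frames
# only when the annotated data does not reach the preset proportion.
def build_associated_unlabeled(frames, num_labeled, num_total,
                               min_ratio=0.1, top_k=500):
    # frames: list of (frame_id, image, bullet_count) tuples from the target video.
    if num_total and num_labeled / num_total >= min_ratio:
        return []                                    # enough annotated data already
    ranked = sorted(frames, key=lambda f: f[2], reverse=True)   # popularity first
    return [(fid, img) for fid, img, _ in ranked[:top_k]]
```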
An image sequence corresponding to a target video is acquired; the video images in the image sequence are then input into a target network model to obtain a flower-screen confidence corresponding to each video image, where the target network model is trained based on annotated data and non-annotated data, the non-annotated data is configured with a pseudo label determined based on first image data and second image data, the first image data includes a plurality of serial first processed images adjusted based on a training image, the second image data includes a second processed image adjusted based on the training image, and the adjustment amplitude corresponding to the second processed image is greater than that corresponding to the first processed image; the flower-screen images in the target video are then determined according to the flower-screen confidence. In this way, model training requires only a small number of manually annotated images together with a large number of unannotated images; by exploiting the consistency of model predictions on unannotated data and generating pseudo labels, the annotated data set is effectively expanded and the accuracy of flower-screen image identification is improved.
An embodiment of the present application further provides a terminal device. Fig. 9 is a schematic structural diagram of another terminal device provided in an embodiment of the present application; for convenience of description, only the portion related to the embodiment of the present application is shown, and for specific technical details that are not disclosed, please refer to the method part of the embodiments of the present application. The terminal may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, a vehicle-mounted computer, and the like; the following takes a mobile phone as an example:
fig. 9 is a block diagram illustrating a partial structure of a mobile phone related to a terminal provided in an embodiment of the present application. Referring to fig. 9, the handset includes: radio Frequency (RF) circuitry 910, memory 920, input unit 930, display unit 940, sensor 950, audio circuitry 960, wireless fidelity (WiFi) module 970, processor 980, and power supply 990. Those skilled in the art will appreciate that the handset configuration shown in fig. 9 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 9:
The RF circuit 910 may be used for receiving and transmitting signals during information transmission and reception or during a call; in particular, it receives downlink information from a base station and forwards it to the processor 980 for processing, and transmits uplink data to the base station. In general, the RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 910 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), etc.
The memory 920 may be used to store software programs and modules, and the processor 980 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 920. The memory 920 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile phone, and the like. Further, the memory 920 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 930 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone. Specifically, the input unit 930 may include a touch panel 931 and other input devices 932. The touch panel 931, also referred to as a touch screen, may collect touch operations by a user (e.g., operations by the user on or near the touch panel 931 using any suitable object or accessory such as a finger or a stylus, and spaced-apart touch operations within a certain range on the touch panel 931) and drive the corresponding connection devices according to a preset program. Alternatively, the touch panel 931 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 980, and can receive and execute commands sent by the processor 980. In addition, the touch panel 931 may be implemented by various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. The input unit 930 may include other input devices 932 in addition to the touch panel 931. In particular, other input devices 932 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 940 may be used to display information input by the user or information provided to the user and various menus of the mobile phone. The display unit 940 may include a display panel 941, and optionally, the display panel 941 may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 931 may cover the display panel 941, and when the touch panel 931 detects a touch operation on or near the touch panel 931, the touch panel transmits the touch operation to the processor 980 to determine the type of the touch event, and then the processor 980 provides a corresponding visual output on the display panel 941 according to the type of the touch event. Although in fig. 9, the touch panel 931 and the display panel 941 are two independent components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 931 and the display panel 941 may be integrated to implement the input and output functions of the mobile phone.
The handset may also include at least one sensor 950, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 941 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 941 and/or backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, the description is omitted here.
The audio circuit 960, the speaker 961 and the microphone 962 may provide an audio interface between the user and the mobile phone. The audio circuit 960 may transmit the electrical signal converted from the received audio data to the speaker 961, and the speaker 961 converts the electrical signal into a sound signal for output; on the other hand, the microphone 962 converts a collected sound signal into an electrical signal, which is received by the audio circuit 960 and converted into audio data; the audio data is then output to the processor 980 for processing and subsequently sent via the RF circuit 910 to, for example, another mobile phone, or output to the memory 920 for further processing.
WiFi belongs to short-distance wireless transmission technology, and the mobile phone can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 970, and provides wireless broadband Internet access for the user. Although fig. 9 shows the WiFi module 970, it is understood that it does not belong to the essential constitution of the handset, and can be omitted entirely as needed within the scope not changing the essence of the invention.
The processor 980 is a control center of the mobile phone, connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 920 and calling data stored in the memory 920, thereby integrally monitoring the mobile phone. Alternatively, processor 980 may include one or more processing units; alternatively, processor 980 may integrate an application processor, which primarily handles operating systems, user interfaces, and applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 980.
The handset also includes a power supply 990 (e.g., a battery) for powering the various components, which may optionally be logically connected to the processor 980 via a power management system, such that the power management system may be used to manage charging, discharging, and power consumption.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.
In the embodiment of the present application, the processor 980 included in the terminal further has the function of performing each step of the above-described method for identifying a flower-screen image.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a server provided in an embodiment of the present application. The server 1000 may vary considerably in configuration or performance, and may include one or more Central Processing Units (CPUs) 1022 (e.g., one or more processors), a memory 1032, and one or more storage media 1030 (e.g., one or more mass storage devices) storing an application 1042 or data 1044. The memory 1032 and the storage medium 1030 may be transient storage or persistent storage. The program stored on the storage medium 1030 may include one or more modules (not shown), and each module may include a series of instruction operations for the server. Still further, the central processing unit 1022 may be configured to communicate with the storage medium 1030 and execute, on the server 1000, the series of instruction operations in the storage medium 1030.
The server 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the apparatus in the above-described embodiments may be based on the server structure shown in fig. 10.
An embodiment of the present application further provides a computer-readable storage medium, which stores instructions for identifying a flower-screen image; when the instructions are run on a computer, the computer is caused to perform the steps performed by the apparatus for identifying a flower-screen image in the methods described in the foregoing embodiments shown in fig. 3 to fig. 7.
Also provided in the embodiments of the present application is a computer program product including instructions for identifying a flower-screen image, which, when run on a computer, causes the computer to perform the steps performed by the apparatus for identifying a flower-screen image in the methods described in the embodiments of fig. 3 to fig. 7.
The embodiment of the present application further provides a system for identifying a flower-screen image, where the system for identifying a flower-screen image may include an apparatus for identifying a flower-screen image in the embodiment described in fig. 8, a terminal device in the embodiment described in fig. 9, or a server described in fig. 10.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a device for identifying an image of a flower screen, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A method for identifying a flower screen image is characterized by comprising the following steps:
acquiring an image sequence corresponding to a target video;
inputting video images in the image sequence into a target network model to obtain a flower-screen confidence corresponding to the video images, wherein the target network model is obtained by training based on annotated data and non-annotated data, the non-annotated data is configured with a pseudo label determined based on first image data and second image data, the first image data is a plurality of serial first processed images obtained by successively performing image adjustment operations on a training image, the second image data comprises a second processed image obtained by adjustment based on the training image, and the adjustment amplitude corresponding to the second processed image is larger than the adjustment amplitude corresponding to the first processed image;
and determining the flower-screen image in the target video according to the flower-screen confidence.
2. The method of claim 1, further comprising:
acquiring the unmarked training image;
performing a first adjustment operation on the training image to obtain a first serial image;
performing the first adjustment operation on the first serial image to obtain a second serial image;
taking the first serial image and the second serial image as the plurality of serial first processed images, wherein the first adjustment operation is used for indicating an adjustment direction close to the training image;
performing a second adjustment operation on the training image to obtain a second processed image, wherein the second adjustment operation is used for indicating an adjustment direction different from the training image;
inputting the plurality of first processed images into a first model to obtain a first training parameter;
inputting the second processed image into a second model to obtain a second training parameter, wherein the first model and the second model share a weight parameter;
weighting the first training parameter and the second training parameter to obtain a target parameter;
performing a sharpening operation based on the target parameter to determine a pseudo label corresponding to the training image;
and training the first model according to the training image configured with the pseudo label and the annotation data, so as to obtain the target network model.
3. The method of claim 2, wherein inputting the plurality of first processed images into a first model to obtain a first training parameter comprises:
inputting the plurality of first processed images into the first model to obtain a parameter sequence used for indicating the flower-screen probability corresponding to each first processed image;
and carrying out average operation on the parameter sequence to obtain the first training parameter.
4. The method of claim 3, wherein after averaging the sequence of parameters to obtain the first training parameter, the method further comprises:
acquiring consistency loss information corresponding to the first model based on the parameter sequence;
updating parameters of the first model based on the consistency loss information.
5. The method of claim 2, wherein performing a sharpening operation based on the target parameter to determine a corresponding pseudo label for the training image comprises:
acquiring distribution information corresponding to a one-hot coding mode adopted by the first model;
determining a sharpening coefficient based on the distribution information;
and carrying out sharpening operation on the target parameter according to the sharpening coefficient so as to determine a pseudo label corresponding to the training image.
6. The method of claim 2, wherein training the first model according to the training images configured with the pseudo labels and the annotation data to obtain the target network model comprises:
acquiring corresponding first loss information based on the labeling data;
analyzing the consistency of the probability estimation corresponding to the pseudo label to obtain second loss information;
performing supervised training based on the pseudo label to obtain third loss information;
weighting the first loss information, the second loss information and the third loss information to obtain target loss information;
training the first model based on the target loss information to obtain the target network model.
7. The method according to any one of claims 1-6, further comprising:
acquiring a data volume corresponding to the annotation data associated with the target video;
if the data volume corresponding to the marking data does not reach a preset proportion, calling a video frame image in the target video;
taking the video frame image as associated non-annotation data;
training the target network model based on the annotation data and the associated label-free data, wherein the associated label-free data is configured with a pseudo label determined based on third image data and fourth image data, the third image data is a plurality of serial third processed images obtained by successively performing image adjustment operation on the video frame images, the fourth image data comprises a fourth processed image obtained by adjusting based on the video frame images, and the adjustment amplitude corresponding to the fourth processed image is larger than the adjustment amplitude corresponding to the third processed image.
8. An apparatus for identifying a flower-screen image, comprising:
the acquisition unit is used for acquiring an image sequence corresponding to the target video;
an input unit, configured to input a video image in the image sequence into a target network model to obtain a flower-screen confidence corresponding to the video image, where the target network model is obtained by training based on annotation data and non-annotation data, the non-annotation data is configured with a pseudo label determined based on first image data and second image data, the first image data is a plurality of serial first processed images obtained by successively performing image adjustment operations on a training image, the second image data includes a second processed image obtained by adjustment based on the training image, and an adjustment amplitude corresponding to the second processed image is greater than an adjustment amplitude corresponding to the first processed image;
and the identification unit is used for determining the flower-screen image in the target video according to the flower-screen confidence.
9. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing program codes; the processor is configured to execute the method for identifying a screenful of images according to any one of claims 1 to 7 according to instructions in the program code.
10. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to execute the method for identifying a flower-screen image according to any one of claims 1 to 7.