CN113297420A - Video image processing method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN113297420A
CN113297420A (application CN202110486585.2A)
Authority
CN
China
Prior art keywords
image
video
images
target image
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110486585.2A
Other languages
Chinese (zh)
Inventor
杨睿智 (Yang Ruizhi)
卢江虎 (Lu Jianghu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Singapore Pte Ltd
Original Assignee
Bigo Technology Singapore Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Singapore Pte Ltd filed Critical Bigo Technology Singapore Pte Ltd
Priority to CN202110486585.2A priority Critical patent/CN113297420A/en
Publication of CN113297420A publication Critical patent/CN113297420A/en
Pending legal-status Critical Current

Classifications

    • G - Physics
    • G06 - Computing; Calculating or Counting
    • G06F - Electric digital data processing
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74 - Browsing; Visualisation therefor
    • G06F16/75 - Clustering; Classification
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval characterised by using metadata automatically derived from the content
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06T - Image data processing or generation, in general
    • G06T7/00 - Image analysis
    • G06T7/0002 - Inspection of images, e.g. flaw detection
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning

Abstract

The invention provides a video image processing method and apparatus, a computer-readable storage medium, and an electronic device, and belongs to the field of network technologies. The method comprises: obtaining a video to be processed; selecting N frames of video images from the video to be processed to obtain a first image sequence, where N is a positive integer; and removing blurred images in the first image sequence and clustering the first images in the first image sequence to remove repeated images, so as to obtain a first target image. In this way, blurred and repeated low-quality images among the first target images can be reduced to a certain extent, the image quality of the selected first target images is ensured, and the subsequent processing effect is in turn ensured. Meanwhile, since low-quality images are removed, the amount of calculation can be reduced and the processing efficiency improved to a certain extent.

Description

Video image processing method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of network technologies, and in particular, to a video image processing method and apparatus, a computer-readable storage medium, and an electronic device.
Background
With the development of Internet technology, novel original videos such as short videos and live broadcasts have increased greatly, making Internet videos more and more abundant. Accordingly, it is sometimes necessary to process these videos. For example, more and more illegal videos may be uploaded to the Internet and cause adverse effects; videos on the Internet therefore need to be audited so that illegal videos can be handled in time.
In order to audit a video, it is often necessary to select some video images from the video as target images and then process the target images, thereby processing the video to be processed. In the prior art, video images taken at fixed time intervals are often used directly as target images. However, among the target images selected in this way there are often images with poor display quality, so that the efficiency of video processing using the target images is low and the effect of video processing is poor.
Disclosure of Invention
In view of the above, the present invention provides a video image processing method and apparatus, a computer-readable storage medium, and an electronic device, which solve, to a certain extent, the problems that images with poor display quality exist among the selected target images, so that the efficiency of video processing using the target images is low and the effect of video processing is poor.
According to a first aspect of the present invention, there is provided a video image processing method, which may include:
acquiring a video to be processed;
selecting N frames of video images from the video to be processed to obtain a first image sequence; n is a positive integer;
and removing blurred images in the first image sequence, and clustering the first images in the first image sequence to remove repeated images in the first image sequence to obtain a first target image.
According to a second aspect of the present invention, there is provided a video image processing apparatus, which may include:
the acquisition module is used for acquiring a video to be processed;
the selection module is used for selecting N frames of video images from the video to be processed so as to obtain a first image sequence; n is a positive integer;
and the removing module is used for removing the blurred image in the first image sequence and clustering the first image in the first image sequence to remove the repeated image in the first image sequence to obtain a first target image.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the video image processing method according to the first aspect.
In a fourth aspect, the present invention provides an electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the video image processing method according to the first aspect.
Compared with the prior art, the invention has the following advantages:
In the video image processing method provided by the embodiment of the invention, a video to be processed is obtained, N frames of video images are selected from the video to be processed to obtain a first image sequence, where N is a positive integer, blurred images in the first image sequence are removed, and the first images in the first image sequence are clustered to remove repeated images, so as to obtain a first target image. Meanwhile, since low-quality images are removed, the amount of calculation can be reduced and the processing efficiency improved to a certain extent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart illustrating steps of a video image processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a process provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of another process provided by an embodiment of the invention;
fig. 4 is a block diagram of a video image processing apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The following describes a video image processing method in the embodiment of the present invention in detail.
Example one
Fig. 1 is a flowchart illustrating steps of a video image processing method according to an embodiment of the present invention, where as shown in fig. 1, the method may include:
step 101, obtaining a video to be processed.
In the embodiment of the invention, the video to be processed may be a short video or a video on a live-broadcast platform. The video to be processed may be obtained by querying annotation information of the video on the Internet, where the annotation information may be information such as the number and title of the video to be processed. A storage address corresponding to the video to be processed can be determined from the annotation information, and the video can then be downloaded to a local device according to the storage address, thereby obtaining the video to be processed.
Step 102, selecting N frames of video images from the video to be processed to obtain a first image sequence; N is a positive integer.
In the embodiment of the invention, the video images can be extracted from the video to be processed according to a preset mode to obtain N frames of video images, and the N frames of video images are used as the first image sequence corresponding to the video to be processed. Specifically, the time information of the video image to be selected may be predetermined according to the playing time sequence of the video to be processed, and the video image corresponding to the time information is extracted from the video to be processed according to the selected time information to obtain N frames of video images, where the video image may be selected according to a preset fixed time interval, for example, one video image is obtained every 4 seconds.
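As a minimal sketch of the fixed-interval selection described above (the function name, the default 4-second interval, and the use of frame indices are illustrative assumptions, not mandated by this embodiment), the preset time points can be mapped to frame indices as follows:

```python
def select_frame_indices(total_frames, fps, interval_s=4.0):
    """Map a fixed time interval (e.g. one image every 4 seconds) to frame indices.

    total_frames: number of frames in the video to be processed
    fps:          playing frame rate of the video
    interval_s:   preset fixed time interval, in seconds
    """
    step = max(1, int(round(fps * interval_s)))  # frames between two selected images
    return list(range(0, total_frames, step))    # indices of the N selected frames
```

For a 25 fps video of 300 frames, `select_frame_indices(300, 25)` yields frames 0, 100, and 200, i.e. one image every 4 seconds; a real decoder (e.g. FFmpeg or OpenCV) would then fetch those frames.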
Step 103, removing blurred images in the first image sequence, and clustering the first images in the first image sequence to remove repeated images in the first image sequence, so as to obtain a first target image.
In this embodiment of the present invention, removing the blurred images in the first image sequence may be performed as follows: for every first image in the first image sequence, a pixel value of the first image is calculated by using a preset blurred-image algorithm, whether the first image is a blurred image is determined according to the pixel value, and if so, the first image is deleted from the first image sequence. The preset blurred-image algorithm may be a blur detection method based on the Fourier transform, a detection algorithm that performs filtering with the Laplacian operator, or a blurred-image determination method using a machine learning model, which is not limited in the present invention.
In the embodiment of the present invention, clustering the first images in the first image sequence to remove the repeated images may be performed by using a preset image clustering algorithm to cluster any one of the first images in the first image sequence, determining whether an image that repeats the first image exists in the first image sequence, and if so, deleting the repeated image from the first image sequence. The preset image clustering algorithm may be a K-means clustering algorithm or a density-based clustering algorithm such as DBSCAN, which is not limited in the present invention. Of course, repeated images could also be determined and removed based on the histogram difference, similarity, and the like between adjacent images in the image sequence. However, such a method cannot take into account images separated by long time intervals, and its effect is poor. In this step, because the first images in the first image sequence are clustered, the clustering operation is not affected by the interval in the time sequence, so that the removal effect for repeated images can be ensured.
It should be noted that the first target image may be a first image after removing the blurred image and the repeated image in the first image sequence, that is, the first target image is a first image which is displayed clearly and does not have the same image in the first image sequence. When the first image in the first image sequence is processed, the blurred image may be removed first, and then the repeated image may be removed, or the repeated image may be removed first, and then the blurred image is removed, which is not limited in the present invention.
In summary, in the video image processing method provided by the embodiment of the present invention, a video to be processed is obtained, N frames of video images are selected from the video to be processed to obtain a first image sequence, where N is a positive integer, blurred images in the first image sequence are removed, and the first images in the first image sequence are clustered to remove repeated images, so as to obtain a first target image. In this way, blurred and repeated low-quality images among the first target images can be reduced to a certain extent, the image quality of the selected first target images is ensured, and the subsequent processing effect is in turn ensured. Meanwhile, since low-quality images are removed, the amount of calculation can be reduced and the processing efficiency improved to a certain extent.
Example two
Optionally, the following steps may also be executed in the embodiment of the present invention:
and performing model training based on the first target image, or auditing the video to be processed based on the first target image.
In the embodiment of the present invention, the model training may be performed based on the first target image, where the first target image is used as a sample image, and the model is trained, and the model may be a machine learning model, for example, a Convolutional Neural Network (CNN) model for determining a video category, or a Neural network model for recognizing a feature of a human in a video, which is not limited in the present invention. Because the first target image is the first image left after the blurred image and the repeated image are removed from the first image sequence, the effectiveness of model training can be improved by training the model by using the first target image, so that the efficiency of model training can be improved, and the effect of model training can be improved to a certain extent.
In the embodiment of the present invention, auditing the video to be processed based on the first target image may be performed by taking the first target image as an input of a preset audit model and determining, according to the obtained output result, whether the video to be processed is illegal. The preset audit model may be obtained by pre-training an initial audit model with sample images, and the initial audit model may be an Inflated 3D ConvNet (I3D) model among convolutional neural network (CNN) models, which is not limited in the present invention. In this way, using the first target image to audit the video to be processed avoids overlong auditing time or inaccurate auditing results caused by blurred or repeated images, so that the efficiency of video auditing can be improved.
EXAMPLE III
Optionally, in an implementation manner, in an embodiment of the present invention, the video to be processed may be a sample video, and the sample video may be used to train a first video audit model, and accordingly, the step of selecting N frames of video images from the video to be processed to obtain the first image sequence may be implemented by steps shown in the following substeps (1) to (2):
substep (1): and extracting N frames of video images from the video to be processed according to a preset selection mode.
In the embodiment of the present invention, the preset selection manner may be to select the video images according to a preset time interval, for example, 3 frames of video images may be selected every 5 seconds, or one frame of video image may be selected every 0.3 seconds. The method includes extracting N frames of video images from a video to be processed according to a preset selection mode, namely decompressing the video to be processed to obtain each frame of video image sequenced according to a playing time sequence, and extracting corresponding each frame of video image according to the preset selection mode to obtain the N frames of video images. The video images are extracted in a preset selection mode, so that the time for acquiring the video images can be shortened, and the efficiency for acquiring the video images can be improved.
Substep (2): and adjusting the size of the N frames of video images to a preset image size, and forming the first image sequence based on the adjusted N frames of video images.
In an embodiment of the present invention, the preset image size may be an image size suitable for the input model, for example, the preset image size may be 28 × 28. The size of the N frames of video images is adjusted to a preset image size, which may be to scale the size of the video images to the preset image size, for example, the size of the video images is 1280 × 720, the size of the video images may be reduced to 28 × 28, or the video images may be divided according to a preset division manner, so that the size of the divided sub-images is the preset image size. The first image sequence is composed based on the adjusted N frames of video images, and may be the N frames of video images adjusted to a preset image size as the first image sequence. By adjusting the size of the image in advance, the image can be conveniently processed subsequently, and therefore the processing efficiency of the image can be improved.
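A possible sketch of the size adjustment in substep (2), using simple nearest-neighbor scaling on a NumPy array (the function name and the choice of nearest-neighbor sampling are assumptions; any scaling method that yields the preset image size would serve):

```python
import numpy as np

def resize_nearest(img, out_h=28, out_w=28):
    """Scale an image array to the preset image size (28 x 28 by default)
    by nearest-neighbor sampling of rows and columns."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows][:, cols]
```

A 1280 x 720 frame is thereby reduced to 28 x 28 before being placed in the first image sequence.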
Optionally, in the embodiment of the present invention, the step of removing the blurred image in the first image sequence and clustering the first image in the first image sequence to remove the repeated image in the first image sequence to obtain the first target image may be implemented by the steps shown in the following substeps (3) to (5):
substep (3): for any first image in the first image sequence, detecting dispersion of pixel values in the first image; the dispersion includes a variance, a standard deviation, and a dispersion coefficient of the pixel values.
In the embodiment of the present invention, the detecting the dispersion of the pixel values in the first image may be to calculate the dispersion of the pixel values of the first image according to the pixel value of each pixel in the first image, and detect the dispersion, where the dispersion may be a variance, a standard deviation, a dispersion coefficient, and the like of the pixel values, which is not limited in this invention.
It should be noted that, before detecting the dispersion of the pixel values in the first image, the embodiment of the present invention may further perform the following steps: converting the first image into a grayscale image, and filtering the grayscale image with the Laplacian operator to obtain a processed first image. Specifically, the first image may be converted into a grayscale image by using a preset conversion algorithm, where the preset conversion algorithm may be average-value conversion, that is, the average of the red, green, and blue channel values of each pixel in the image is taken as the gray value of that pixel, so as to obtain the grayscale image corresponding to the first image. Filtering the grayscale image with the Laplacian operator may be performed by filtering each pixel in the grayscale image according to a preset filtering template and the coefficients of the template. Specifically, the filtering template slides over the grayscale image; the center of the template is aligned with a pixel position in the image; each coefficient of the template is multiplied by the corresponding pixel, and all the products are summed; the sum is then taken as the output response of the template and assigned to the pixel at the center position of the template. After the template has traversed every pixel in the grayscale image, the filtering result for the grayscale image, i.e., the filtered first image, is obtained. Filtering the first image enhances the display effect of the image, which improves the accuracy of determining blurred images and thus the efficiency of image processing.
Substep (4): and if the dispersion of the first image is not larger than a first preset threshold value, deleting the first image from the first image sequence.
In the embodiment of the present invention, the first preset threshold may be a value preset for determining whether the image is a blurred image. When the dispersion of the first image is not greater than the first preset threshold, the first image may be determined to be a blurred image, and the first image may be deleted from the first image sequence. For example, if the variance of the pixel values in the first image is 0.079 and the variance of the pixel values in the first image is less than the first preset threshold value of 0.1, the first image may be determined to be a blurred image, and the first image may be deleted from the first image sequence. Because the processing result obtained by processing the blurred image is often inaccurate and takes much time, the quality of the image in the image sequence can be ensured by deleting the blurred image with poor quality, so that the processing efficiency can be improved when the subsequent processing is carried out.
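Substeps (3) and (4), together with the grayscale conversion and Laplacian filtering described above, can be sketched in plain NumPy as follows (the function names and the default threshold of 0.1, taken from the example above, are illustrative assumptions):

```python
import numpy as np

# 3x3 Laplacian filtering template and its coefficients
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=np.float64)

def to_gray(img_rgb):
    """Average-value conversion: mean of the R, G, B channels per pixel."""
    return img_rgb.mean(axis=2)

def laplacian_response(gray):
    """Slide the template over the image; each output pixel is the sum of
    template coefficients times the covered pixels (valid region only)."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += LAPLACIAN[i, j] * gray[i:i + h - 2, j:j + w - 2]
    return out

def is_blurred(img_rgb, threshold=0.1):
    """Substep (4): the image is treated as blurred when the dispersion
    (here, the variance) of the filtered response is not above the threshold."""
    return laplacian_response(to_gray(img_rgb)).var() <= threshold
```

A flat, featureless frame yields a near-zero variance and is dropped; a frame with strong edges yields a large variance and is kept.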
Substep (5): and clustering the remaining first images in the first image sequence to obtain the first target image.
In the embodiment of the invention, the remaining first images in the first image sequence, from which the blurred images have been removed, are clustered by using a preset image clustering algorithm; the category to which each first image belongs is determined, and one frame of image is then selected from each category to obtain the first target image. Specifically, a feature vector may first be extracted from each first image, the feature vectors of the remaining first images may be stored in a feature matrix, and the feature vectors may be clustered by using the preset image clustering algorithm; the category to which each feature vector belongs is determined and taken as the category of the first image corresponding to that feature vector. The preset image clustering algorithm may be K-means clustering or a density-based clustering algorithm (Density-Based Spatial Clustering of Applications with Noise, DBSCAN). Repeated images in the first image sequence can be removed through the clustering processing, which avoids wasting processing resources on repeated images during video processing and can therefore improve the efficiency of video processing.
Optionally, in the embodiment of the present invention, the step of clustering the remaining first images in the first image sequence to obtain the first target image may be implemented by:
calculating a feature value of each of the first images remaining in the first image sequence; based on a preset clustering algorithm, clustering the remaining first images in the first image sequence according to the characteristic values of the first images to obtain a plurality of image groups; and respectively acquiring a first image from each image group as the first target image.
In this embodiment of the present invention, calculating the feature value of each remaining first image in the first image sequence may be performed by calculating the feature value of each first image according to a preset feature function, where the preset feature function may be a hash function; correspondingly, the feature value may be a perceptual hash value (pHash), a mean hash value, a gradient hash value, a wavelet hash value, or the like, which is not limited in the present invention. If calculating the feature value of the first image means calculating its perceptual hash value, the specific steps may include: (1) reduce the size: shrink the picture to 8x8, for a total of 64 pixels; (2) simplify the colors: convert the reduced picture into a grayscale image; (3) calculate the average: compute the average gray value of all 64 pixels; (4) compare pixel gray values: compare the gray value of each pixel with the average, recording a pixel whose gray value is greater than or equal to the average as 1 and a pixel whose gray value is smaller than the average as 0; (5) calculate the hash value: combine the comparison results of the previous step to obtain a 64-bit hash value of 0s and 1s, which can be used as the feature of the picture.
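The five steps above can be sketched as a mean-style 64-bit hash (a simplified stand-in: a true perceptual hash applies a DCT before thresholding, whereas this sketch thresholds the 8x8 gray values directly, as a mean hash does; the function name is an assumption):

```python
import numpy as np

def mean_hash_64(img_rgb):
    """Steps (1)-(5): shrink to 8x8, grayscale, average, compare, pack bits."""
    gray = img_rgb.mean(axis=2)              # (2) simplify colors
    h, w = gray.shape
    rows = np.arange(8) * h // 8             # (1) reduce to 8x8 = 64 pixels
    cols = np.arange(8) * w // 8
    small = gray[rows][:, cols]
    bits = (small >= small.mean()).astype(np.uint8)  # (3)+(4) compare to average
    return ''.join(bits.flatten().astype(str))       # (5) 64-bit hash of 0s and 1s
```

Identical or near-identical frames yield identical (or nearly identical) 64-bit strings, so the hash can serve as the feature value fed to the clustering algorithm.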
In the embodiment of the present invention, the preset clustering algorithm may be the DBSCAN clustering algorithm; clustering the remaining first images in the first image sequence according to the feature values of the respective first images based on the preset clustering algorithm to obtain a plurality of image groups may be performed by using the DBSCAN clustering algorithm to cluster the feature values of the remaining first images in the first image sequence to obtain image groups of a plurality of categories. Specifically, in the first step, the feature values of the remaining first images in the first image sequence may be taken as a data set and input into the DBSCAN clustering algorithm, together with a neighborhood radius Eps and a threshold MinPts on the number of data objects in a neighborhood, where the neighborhood radius and the threshold are adjustable parameters that may be preset according to the actual situation. In the second step, the DBSCAN clustering algorithm randomly selects a data object point p from the data set. In the third step, if the selected data object point p is a core point with respect to the parameters Eps and MinPts, all data object points that are density-reachable from p are found to form a cluster. In the fourth step, if the selected data object point p is an edge point, another data object point is selected. In the fifth step, the operations of the third and fourth steps are repeated until all points have been processed. It should be noted that one cluster may be regarded as one category, and the images corresponding to the data object points contained in one cluster may be regarded as an image group of the same category.
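The five DBSCAN steps above can be sketched as follows (a minimal, self-contained implementation for illustration only; in practice a library implementation such as scikit-learn's `DBSCAN` would typically be used, and the function and parameter names here are assumptions):

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN over feature vectors X; returns a cluster label per point
    (-1 means the point joined no cluster)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # Eps-neighborhoods
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for p in range(n):                     # step 2: pick an unprocessed point p
        if visited[p]:
            continue
        visited[p] = True
        if len(neighbors[p]) < min_pts:    # step 4: p is an edge point, move on
            continue
        labels[p] = cluster                # step 3: p is a core point; collect
        queue = list(neighbors[p])         # all points density-reachable from p
        while queue:
            q = queue.pop()
            if not visited[q]:
                visited[q] = True
                if len(neighbors[q]) >= min_pts:
                    queue.extend(neighbors[q])
            if labels[q] == -1:
                labels[q] = cluster
        cluster += 1                       # step 5: repeat until all points done
    return labels
```

Points with equal labels form one cluster, i.e. one image group of the same category.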
In the embodiment of the present invention, one first image is acquired from each image group, and may be selected as the first target image from the image group corresponding to each category. Illustratively, three image groups are obtained through clustering, which are respectively: the image processing method comprises a first image group, a second image group and a third image group, wherein the first image group comprises first images a, b and c, the second image group comprises first images d and e, the third image group comprises a first image f, and one first image is acquired from each image group and is used as a first target image, so that the first target images can be the first images b, e and f. Therefore, the characteristic value of the first image is clustered by using a preset clustering algorithm to determine the first target image, so that the accuracy of clustering can be improved, and the image processing efficiency is improved.
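Acquiring one first image from each image group, as described above, can be sketched as follows (the policy of keeping the first member of each cluster is an illustrative assumption; the embodiment does not fix which member is chosen):

```python
def pick_representatives(labels):
    """Return one image index per cluster label (points labeled -1 are skipped)."""
    reps = {}
    for i, c in enumerate(labels):
        if c != -1 and c not in reps:
            reps[c] = i        # remember the first image seen in this group
    return sorted(reps.values())
```

For groups {a, b, c}, {d, e}, {f} with labels [0, 0, 0, 1, 1, 2], this keeps one image per group, here indices 0, 3, and 5.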
Optionally, in the embodiment of the present invention, when the video to be processed is multiple videos, the following sub-steps may be performed:
substep (6): and taking the first target image corresponding to each video to be processed as a second target image, and clustering the second target image to remove repeated images in the second target image to obtain a third target image.
In the embodiment of the present invention, the first target image corresponding to each video to be processed may be used as a second target image, the feature value of each second target image is calculated, a preset clustering algorithm is then used to perform clustering processing on the feature values of the second target images to obtain a plurality of image groups, and one second target image is obtained from each image group and used as a third target image, so as to remove repeated images in the second target image. The specific clustering step may refer to the clustering process of the foregoing step, which is not limited in the embodiment of the present invention. Therefore, the number of the target images can be reduced, so that when the target images are used for video processing, the problem of processing resource waste caused by repeated images in the target images of a plurality of videos to be processed can be avoided, and the efficiency of video processing can be improved.
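A much-simplified sketch of substep (6), using exact-match grouping on feature values instead of full clustering (the function name and the exact-match simplification are assumptions): once the feature values of the second target images collected from all videos to be processed are comparable, repeats can be dropped by keeping one index per distinct feature value.

```python
def dedupe_by_feature(features):
    """Keep the first second-target-image index for each distinct feature value."""
    seen = {}
    for i, f in enumerate(features):
        seen.setdefault(f, i)   # only the first occurrence of each value survives
    return sorted(seen.values())
```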
Optionally, the following steps may also be executed in the embodiment of the present invention:
outputting the third target image to an annotator; and receiving a judgment identifier returned by the annotator, wherein the judgment identifier is generated by the annotator according to the third target image and is used for indicating whether the third target image is illegal.
In the embodiment of the present invention, outputting the third target image to the annotator may include displaying the third target image to the annotator. For example, when the video image processing method provided by the embodiment of the present invention is executed by a server, the server may send the third target image to a terminal used by the annotator, and the terminal displays it to the annotator. When the method is executed by a terminal, the terminal may display the image to the annotator directly through a display component. The terminal may be a computer, a mobile phone, a tablet, or the like used by the annotator.
In the embodiment of the invention, after receiving the output third target image, the annotator may determine, according to the image information it provides, whether the displayed content complies with the relevant regulations on internet video distribution, returning a judgment identifier indicating compliance when it does and one indicating non-compliance when it does not. For example, the annotator may return a judgment identifier of "1" if the third target image complies with the regulations and "0" if it does not.
In the embodiment of the invention, because the third target image is a video image from which blurred and repeated images have been removed, outputting it to the annotator and receiving the returned judgment identifier allows the annotator to quickly determine whether the image content is illegal. This avoids the extra annotation time that blurred and repeated images would otherwise require, and thus improves the efficiency of annotating images.
Correspondingly, in the embodiment of the present invention, the step of training the initial video audit model by using the third target image may include:
taking the judgment identifier as the label of the third target image, and training the initial video audit model according to the third target image and its label.
In the embodiment of the present invention, the judgment identifier of the third target image may be set as its label; for example, if the third target image is an illegal image, the corresponding judgment identifier is "0", and accordingly its label is "0". When training the initial video audit model according to the third target image and its label, a preset model training method such as gradient descent may be used, which is not limited in the embodiment of the present invention. Training the initial video audit model with the third target image avoids both the poor training results caused by blurred images and the ineffective training caused by repeated images, improving the efficiency of training the model and, in turn, the training effect.
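As a hedged, toy-scale illustration of gradient-descent training on labelled samples (a one-feature logistic model stands in for the real video audit model; the features, labels, and hyperparameters are invented for the example):

```python
import math

def train_logistic(samples, labels, lr=0.5, epochs=200):
    """Gradient descent on binary cross-entropy for a one-feature
    logistic model: a toy stand-in for training the initial video
    audit model from (third target image, judgment identifier) pairs."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(label=1)
            grad = p - y                              # dLoss/d(wx+b)
            w -= lr * grad * x
            b -= lr * grad
    return w, b

# Hypothetical per-image features with judgment identifiers as labels
# (1 = compliant, 0 = violating, as in the "1"/"0" example above).
xs = [0.1, 0.2, 0.8, 0.9]
ys = [0, 0, 1, 1]
w, b = train_logistic(xs, ys)

def predict(x):
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0
```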
Optionally, in the embodiment of the present invention, the step of outputting the third target image to the annotator may include:
determining a violation score of the third target image based on a preset image audit model; and outputting the third target image whose violation score is larger than a preset score threshold to the annotator.
In the embodiment of the present invention, the preset image audit model may be a trained model, for example one based on the Scale-Invariant Feature Transform (SIFT). The third target image is input into the preset image audit model, and its violation score is determined by identifying the similarity between the content displayed in the third target image and the content of known violation images. When the detected violation score is larger than a preset score threshold, the third target image is output to the annotator. The preset score threshold may be a score predetermined according to actual conditions; for example, if the preset score threshold is 65 and the violation score of the third target image is 72, the third target image is output to the annotator because its violation score exceeds the threshold.
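The score-threshold filter described above can be sketched as follows (image names and scores are hypothetical; the default of 65 echoes the example threshold in the text):

```python
def select_for_annotation(scored_images, threshold=65):
    """Keep only the third target images whose violation score from
    the preset image audit model exceeds the preset score threshold."""
    return [name for name, score in scored_images if score > threshold]

# Hypothetical scores: 72 and 66 exceed the threshold of 65 and are
# queued for annotation; 40 is filtered out.
queue = select_for_annotation([("frame_07", 72), ("frame_12", 40), ("frame_19", 66)])
```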
In the embodiment of the invention, only the third target images whose violation scores are larger than the preset score threshold, that is, the images with a higher probability of being violation images, are sent to the annotator. Images of low annotation value, namely those with a low probability of being violation images, are therefore never sent, sparing the annotator meaningless annotation operations, reducing the annotator's workload to a certain extent, and improving the efficiency of annotating images.
Fig. 2 is a schematic diagram of a processing procedure provided by an embodiment of the present invention. As shown in Fig. 2, the videos to be processed are video 1, video 2, and so on. After the videos to be processed are input, step 1' is executed to extract images from each video at a preset time interval, obtaining the picture frames corresponding to each video to be processed; step 2' is executed to remove blurred frames, leaving the remaining picture frames of each video; step 3' is executed to select suspicious frames using an existing image model; step 4' is executed to remove repeated frames within each video; and step 5' is executed to remove repeated frames across videos, obtaining all the picture frames to be annotated. The model is then trained using the picture frames to be annotated as sample images.
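The five steps of Fig. 2 can be read as one per-video pipeline followed by a cross-video deduplication. A sketch in which every step is a caller-supplied placeholder callable (the names are illustrative, not from the patent):

```python
def prepare_training_frames(videos, extract, remove_blur, select_suspicious,
                            dedupe_within, dedupe_across):
    """Composes the five steps of Fig. 2 over multiple videos to be
    processed; the result is the set of picture frames to be annotated."""
    per_video = []
    for video in videos:
        frames = extract(video)             # step 1': sample at a time interval
        frames = remove_blur(frames)        # step 2': drop blurred frames
        frames = select_suspicious(frames)  # step 3': keep suspicious frames
        frames = dedupe_within(frames)      # step 4': remove in-video repeats
        per_video.append(frames)
    return dedupe_across(per_video)         # step 5': remove cross-video repeats

# Toy run with identity/deduplication placeholders standing in for the
# real per-step implementations; frames are represented by integers.
result = prepare_training_frames(
    [[1, 1, 2], [2, 3]],
    extract=lambda v: v,
    remove_blur=lambda f: f,
    select_suspicious=lambda f: f,
    dedupe_within=lambda f: sorted(set(f)),
    dedupe_across=lambda pv: sorted({x for fr in pv for x in fr}),
)
```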
Optionally, the step of performing model training based on the first target image in the embodiment of the present invention may include:
training an initial video auditing model by using the third target image to obtain the first video auditing model; and the preset image size is the input size of the initial video auditing model.
In this embodiment of the present invention, the third target images are used as sample images to train an initial video audit model, and the trained model is used as the first video audit model. The initial video audit model may be an Inflated 3D ConvNet (I3D) model, and the preset image size may be the input size of the initial video audit model. Specifically, the training process of the initial video audit model may include: inputting positive samples and negative samples into the initial video audit model, which learns to identify them; through continuous iterative training, the accuracy of identifying illegal videos is improved, with the negative samples serving as a supplement that further improves this accuracy. Model training ends when the identification accuracy of the initial video audit model reaches a preset accuracy, such as 97%, and the model that reaches the preset accuracy is used as the trained video audit model.
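The accuracy-based stopping rule can be sketched as follows; `train_step` and `evaluate` are hypothetical callables standing in for the real I3D model's update and validation passes:

```python
def train_until_accurate(train_step, evaluate, target_acc=0.97, max_iters=1000):
    """Iterative training that stops once evaluation accuracy reaches
    the preset accuracy (97% in the text), returning the iteration
    count and the final accuracy."""
    for i in range(1, max_iters + 1):
        train_step()
        acc = evaluate()
        if acc >= target_acc:
            return i, acc
    return max_iters, evaluate()

# Toy simulation: accuracy rises by 2 points per iteration from 90%,
# so training stops after the fourth iteration.
state = {"acc": 0.90}
def _step():
    state["acc"] = min(1.0, state["acc"] + 0.02)
def _eval():
    return state["acc"]
iters, final_acc = train_until_accurate(_step, _eval)
```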
In the embodiment of the invention, because the third target image is a video image without blurred or repeated images, the diversity of the image content is improved, the clarity of the displayed images is ensured, and the proportion of violation images is increased, which improves both the efficiency and the effect of model training.
Example Three
Optionally, in another implementation manner, in an embodiment of the present invention, the video to be processed may be a video to be audited, and accordingly, the step of auditing the video to be processed based on the first target image may include:
taking the first target image as the input of a second video auditing model to obtain the output result of the second video auditing model; and determining whether the video to be audited violates the rules according to the output result.
In the embodiment of the present invention, the second video audit model may be constructed based on a Convolutional Neural Network (CNN). Specifically, the first target image may be used as the input of the video audit model; an image feature vector is extracted from the display information of the first target image by the convolutional layers of the model; the fully connected layer then processes the image feature vector to obtain a target vector; the softmax layer of the model then determines the probability that the input belongs to each preset category; and finally, the category with the maximum probability may be taken as the category to which the video to be audited belongs. In the embodiment of the present invention, the preset categories may be a violation category and a non-violation category. The layer preceding the softmax layer may include neurons corresponding one-to-one to the two preset categories, so the elements of the target vector it outputs correspond to the preset categories; for each element in the target vector, the softmax layer maps the element's value into (0, 1) using the softmax function, yielding the probability value of the corresponding preset category. The output result may be the probability that the video to be audited belongs to an illegal video, and a video to be audited whose output result is larger than a preset probability threshold is determined to be an illegal video. For example, if the output result indicates that the probability of the video being illegal is 86% and the preset probability threshold is 75%, the video to be audited is determined to be an illegal video.
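The softmax mapping and threshold decision described above can be sketched as follows (the 0.75 threshold follows the text's example, but the two-element ordering `[violation, non_violation]` is an assumption made for illustration):

```python
import math

def softmax(vec):
    # Map the target-vector elements into (0, 1) probabilities that sum to 1.
    exps = [math.exp(v - max(vec)) for v in vec]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]

def audit_decision(target_vector, threshold=0.75):
    """target_vector: raw two-element output of the fully connected
    layer, assumed ordered [violation, non_violation]. Returns True
    when the violation probability exceeds the probability threshold."""
    p_violation = softmax(target_vector)[0]
    return p_violation > threshold

# A logit gap of 2 in favour of the violation category exceeds the
# threshold under this sketch, so the video would be judged illegal.
flag = audit_decision([2.0, 0.0])
```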
In the embodiment of the invention, taking the first target image as the input of the video audit model reduces the number of images to be evaluated and avoids both the extra time spent determining whether the video is illegal and the inaccurate output results that blurred or repeated images would cause. Because the first target image is of higher quality, the accuracy of determining whether an image is illegal is also improved, so the efficiency of video auditing is improved.
For example, Fig. 3 is a schematic diagram of another processing procedure provided by an embodiment of the present invention. As shown in Fig. 3, a video is input; step 1 is executed to extract images from the video at a preset time interval, obtaining all the picture frames corresponding to the video; step 2 is executed to remove blurred frames, leaving the remaining picture frames of the video; and step 3 is executed to remove repeated frames within the video, obtaining the picture frames of the video to be audited. These picture frames are then input into the video audit model to determine whether the video to be audited is illegal.
Example Four
Fig. 4 is a block diagram of a video image processing apparatus according to an embodiment of the present invention, and as shown in fig. 4, the apparatus 20 may include:
an obtaining module 201, configured to obtain a video to be processed;
a selecting module 202, configured to select N frames of video images from the video to be processed to obtain a first image sequence; n is a positive integer;
the first removing module 203 is configured to remove blurred images in the first image sequence, and perform clustering on the first images in the first image sequence to remove repeated images in the first image sequence, so as to obtain a first target image.
Optionally, the apparatus 20 further includes:
the application module is used for performing model training based on the first target image, or auditing the video to be processed based on the first target image.
Optionally, the first removing module 203 is further configured to:
for any first image in the first image sequence, detecting dispersion of pixel values in the first image; the dispersion comprises variance, standard deviation and dispersion coefficient of the pixel values;
if the dispersion of the first image is not larger than a first preset threshold value, deleting the first image from the first image sequence;
and clustering the remaining first images in the first image sequence to obtain the first target image.
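The dispersion-based blur removal performed by this module can be sketched in pure Python over flattened grayscale pixel lists (the variance threshold of 100 is an illustrative value, not one given in the text):

```python
def dispersion(pixels):
    """Variance, standard deviation, and coefficient of variation
    (dispersion coefficient) of a frame's pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = var ** 0.5
    cv = std / mean if mean else 0.0
    return var, std, cv

def remove_blurred(frames, threshold=100.0):
    # A frame whose pixel-value variance is not larger than the first
    # preset threshold is treated as blurred and deleted.
    return [f for f in frames if dispersion(f)[0] > threshold]

sharp = [0, 255, 0, 255]      # high-contrast frame: variance 16256.25
flat = [128, 128, 129, 128]   # near-uniform frame, treated as blurred
kept = remove_blurred([sharp, flat])
```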
Optionally, the first removing module 203 is further configured to:
calculating a feature value of each of the first images remaining in the first image sequence;
based on a preset clustering algorithm, clustering the remaining first images in the first image sequence according to the characteristic values of the first images to obtain a plurality of image groups;
and respectively acquiring a first image from each image group as the first target image.
Optionally, the video to be processed is a sample video, and the sample video is used for training a first video auditing model; the selecting module 202 is further configured to:
extracting N frames of video images from the video to be processed according to a preset selection mode;
and adjusting the size of the N frames of video images to a preset image size, and forming the first image sequence based on the adjusted N frames of video images.
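The "preset selection mode" of sampling frames at a fixed time interval can be sketched as an index computation (the durations and rates are illustrative; a real implementation would read the frames with a video library such as OpenCV):

```python
def frame_indices(duration_s, fps, interval_s):
    """Indices of the frames sampled every interval_s seconds from a
    video of duration_s seconds at fps frames per second: one concrete
    reading of extracting N frames at a preset time interval."""
    step = max(1, int(round(interval_s * fps)))
    total = int(duration_s * fps)
    return list(range(0, total, step))

# A 10 s clip at 30 fps sampled every 2 s yields N = 5 frames.
indices = frame_indices(10, 30, 2)
```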
Optionally, in a case that the video to be processed is a plurality of videos, the apparatus 20 further includes:
the second removing module is used for taking the first target image corresponding to each video to be processed as a second target image, and clustering the second target images to remove repeated images in the second target images, obtaining a third target image.
Optionally, the application module is further configured to:
training an initial video auditing model by using the third target image to obtain the first video auditing model; and the preset image size is the input size of the initial video auditing model.
Optionally, the apparatus 20 further includes:
the output module is used for outputting the third target image to an annotator;
the receiving module is used for receiving the judgment identifier returned by the annotator; the judgment identifier is generated by the annotator according to the third target image and is used for indicating whether the third target image is illegal;
the application module is further configured to:
taking the judgment identifier as the label of the third target image, and training the initial video auditing model according to the third target image and its label.
Optionally, the output module is further configured to:
determining a violation score of the third target image based on a preset image auditing model;
and outputting the third target image with the violation score larger than a preset score threshold to the annotator.
Optionally, the video to be processed is a video to be audited; the application module is further configured to:
taking the first target image as the input of a second video auditing model to obtain the output result of the second video auditing model;
and determining whether the video to be audited violates the rules according to the output result.
The video image processing apparatus provided by the embodiment of the present invention is provided with functional modules corresponding to the video image processing method, can execute the video image processing method provided by the embodiment of the present invention, and can achieve the same beneficial effects.
In another embodiment provided by the present invention, there is also provided an electronic device, which may include a processor, a memory, and a computer program stored on the memory and executable on the processor; when the processor executes the program, the processes of the above video image processing method embodiments are realized and the same technical effects are achieved, which are not repeated here to avoid repetition. For example, as shown in fig. 5, the electronic device may specifically include: a processor 401, a storage device 402, a display screen 403 with touch functionality, an input device 404, an output device 405, and a communication device 406. The number of processors 401 in the electronic device may be one or more; one processor 401 is taken as an example in fig. 5. The processor 401, the storage device 402, the display screen 403, the input device 404, the output device 405, and the communication device 406 of the electronic device may be connected by a bus or in other ways.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to perform the video image processing method described in any of the above embodiments.
In yet another embodiment, a computer program product containing instructions is provided, which when run on a computer, causes the computer to perform the video image processing method of any of the above embodiments.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (13)

1. A method for video image processing, the method comprising:
acquiring a video to be processed;
selecting N frames of video images from the video to be processed to obtain a first image sequence; n is a positive integer;
and removing blurred images in the first image sequence, and clustering the first images in the first image sequence to remove repeated images in the first image sequence to obtain a first target image.
2. The method of claim 1, further comprising:
and performing model training based on the first target image, or auditing the video to be processed based on the first target image.
3. The method of claim 2, wherein removing blurred images in the first image sequence and clustering the first image in the first image sequence to remove duplicate images in the first image sequence to obtain the first target image comprises:
for any first image in the first image sequence, detecting dispersion of pixel values in the first image; the dispersion comprises variance, standard deviation and dispersion coefficient of the pixel values;
if the dispersion of the first image is not larger than a first preset threshold value, deleting the first image from the first image sequence;
and clustering the remaining first images in the first image sequence to obtain the first target image.
4. The method of claim 3, wherein clustering the remaining first images of the first sequence of images to obtain the first target image comprises:
calculating a feature value of each of the first images remaining in the first image sequence;
based on a preset clustering algorithm, clustering the remaining first images in the first image sequence according to the characteristic values of the first images to obtain a plurality of image groups;
and respectively acquiring a first image from each image group as the first target image.
5. The method according to any one of claims 1 to 4, wherein the video to be processed is a sample video, and the sample video is used for training a first video audit model; the selecting N frames of video images from the video to be processed to obtain a first image sequence includes:
extracting N frames of video images from the video to be processed according to a preset selection mode;
and adjusting the size of the N frames of video images to a preset image size, and forming the first image sequence based on the adjusted N frames of video images.
6. The method according to claim 5, wherein in the case that the video to be processed is a plurality of videos, the method further comprises:
and taking the first target image corresponding to each video to be processed as a second target image, and clustering the second target image to remove repeated images in the second target image to obtain a third target image.
7. The method of claim 6, wherein the model training based on the first target image comprises:
training an initial video auditing model by using the third target image to obtain the first video auditing model; and the preset image size is the input size of the initial video auditing model.
8. The method of claim 7, further comprising:
outputting the third target image to an annotator;
receiving a judgment identifier returned by the annotator; the judgment identifier is generated by the annotator according to the third target image, and is used for indicating whether the third target image is illegal;
the training of the initial video auditing model by using the third target image comprises:
and taking the judgment identifier as a label of the third target image, and training the initial video auditing model according to the third target image and the label of the third target image.
9. The method of claim 8, wherein the outputting the third target image to an annotator comprises:
determining a violation score of the third target image based on a preset image auditing model;
and outputting the third target image with the violation score larger than a preset score threshold to the annotator.
10. The method according to claim 2, wherein the video to be processed is a video to be audited; the auditing the video to be processed based on the first target image comprises:
taking the first target image as the input of a second video auditing model to obtain the output result of the second video auditing model;
and determining whether the video to be audited violates the rules according to the output result.
11. A video image processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a video to be processed;
the selection module is used for selecting N frames of video images from the video to be processed so as to obtain a first image sequence; n is a positive integer;
the first removing module is used for removing blurred images in the first image sequence and clustering the first images in the first image sequence to remove repeated images in the first image sequence to obtain a first target image.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
13. An electronic device, comprising:
processor, memory and computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 10 are implemented when the processor executes the program.
CN202110486585.2A 2021-04-30 2021-04-30 Video image processing method and device, storage medium and electronic equipment Pending CN113297420A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110486585.2A CN113297420A (en) 2021-04-30 2021-04-30 Video image processing method and device, storage medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN113297420A true CN113297420A (en) 2021-08-24

Family

ID=77320728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110486585.2A Pending CN113297420A (en) 2021-04-30 2021-04-30 Video image processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113297420A (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567738A (en) * 2012-01-06 2012-07-11 华南理工大学 Rapid detection method for pornographic videos based on Gaussian distribution
CN105893930A (en) * 2015-12-29 2016-08-24 乐视云计算有限公司 Video feature identification method and device
CN107133948A (en) * 2017-05-09 2017-09-05 电子科技大学 Image blurring and noise evaluating method based on multitask convolutional neural networks
US20170277955A1 (en) * 2016-03-23 2017-09-28 Le Holdings (Beijing) Co., Ltd. Video identification method and system
CN108154134A (en) * 2018-01-11 2018-06-12 天格科技(杭州)有限公司 Internet live streaming pornographic image detection method based on depth convolutional neural networks
CN108228872A (en) * 2017-07-21 2018-06-29 北京市商汤科技开发有限公司 Facial image De-weight method and device, electronic equipment, storage medium, program
CN108833942A (en) * 2018-06-28 2018-11-16 北京达佳互联信息技术有限公司 Video cover choosing method, device, computer equipment and storage medium
CN109241898A (en) * 2018-08-29 2019-01-18 合肥工业大学 Object localization method and system, the storage medium of hysteroscope video
CN109344289A (en) * 2018-09-21 2019-02-15 北京字节跳动网络技术有限公司 Method and apparatus for generating information


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023072173A1 (en) * 2021-10-27 2023-05-04 北京字跳网络技术有限公司 Video processing method and apparatus, and electronic device and storage medium
CN114565538A (en) * 2022-03-10 2022-05-31 山东大学齐鲁医院 Endoscope image processing method, system, storage medium and equipment
CN114565538B (en) * 2022-03-10 2024-03-01 山东大学齐鲁医院 Endoscopic image processing method, system, storage medium and equipment
CN115661145A (en) * 2022-12-23 2023-01-31 海马云(天津)信息技术有限公司 Cloud application bad frame detection method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination