CN117409419A

CN117409419A - Image detection method, device and storage medium

Info

Publication number: CN117409419A
Application number: CN202311387773.5A
Authority: CN
Inventors: 武继龙
Original assignee: Beijing 58 Information Technology Co Ltd
Current assignee: Beijing 58 Information Technology Co Ltd
Priority date: 2023-10-24
Filing date: 2023-10-24
Publication date: 2024-01-16

Abstract

The invention provides an image detection method, an image detection device and a storage medium. The method comprises the following steps: selecting, from a violation image database, a reference violation image that meets a similarity requirement with an image to be detected; determining a first text description corresponding to the reference violation image; processing the image to be detected with a first image recognition model to obtain a first feature vector; processing the reference violation image with a second image recognition model to obtain a second feature vector; processing the first text description with a text recognition model to obtain a third feature vector; and determining whether the image to be detected contains violation content according to the first feature vector, the second feature vector and the third feature vector. Because the image to be detected is checked for violation content from multiple angles, the accuracy of violation content detection is improved. Because the image recognition models are trained on sample images that do not contain violation content, newly added violation content can be responded to without continuous iterative updating of the models.

Description

Image detection method, device and storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to an image detection method, apparatus, and storage medium.

Background

With the development of internet technology, user-generated content in new forms such as short videos and live streaming has grown greatly in recent years, making internet video increasingly rich. At the same time, a large number of violating videos have also appeared.

At present, video auditing generally adopts machine learning: whether the images in training videos violate the rules, and their violation categories, are manually labeled; the images and the corresponding violation categories are then input into a machine learning model for training; and other video content is identified with the trained violation identification model.

However, the internet is fast, open and timely, which often causes hot-spot violation content to spread rapidly. A traditional violation identification model can only identify the violation content included in its training set and cannot accurately identify newly added violation content, which easily leads to new image content security problems.

Disclosure of Invention

The embodiments of the invention provide an image detection method, an image detection device and a storage medium, which can identify in time whether an image to be detected contains newly added violation content, without continuous iterative updating of the image recognition model, and which further improve the accuracy of image violation detection.

In a first aspect, an embodiment of the present invention provides an image detection method, including:

acquiring an image to be detected;

selecting a reference violation image meeting the requirement of similarity with the image to be detected from a violation image database, wherein the reference violation image refers to an image containing violation contents;

determining a first text description corresponding to the reference violation image;

inputting the image to be detected into a first image recognition model to obtain a first feature vector corresponding to the image to be detected, inputting the reference violation image into a second image recognition model to obtain a second feature vector corresponding to the reference violation image, wherein the first image recognition model and the second image recognition model are obtained based on sample image training without violation content;

inputting the first text description into a text recognition model to obtain a third feature vector corresponding to the first text description;

and determining whether the image to be detected contains illegal contents according to the first feature vector, the second feature vector and the third feature vector.

In a second aspect, an embodiment of the present invention provides an image detection apparatus, including:

the acquisition module is used for acquiring the image to be detected;

the selection module is used for selecting a reference violation image meeting the requirement on similarity with the image to be detected from a violation image database, wherein the reference violation image refers to an image containing violation contents;

the first determining module is used for determining a first text description corresponding to the reference violation image;

the image processing module is used for inputting the image to be detected into a first image recognition model to obtain a first feature vector corresponding to the image to be detected, inputting the reference violation image into a second image recognition model to obtain a second feature vector corresponding to the reference violation image, and the first image recognition model and the second image recognition model are obtained based on sample image training without violation content;

the text processing module is used for inputting the first text description into a text recognition model to obtain a third feature vector corresponding to the first text description;

and the second determining module is used for determining whether the image to be detected contains illegal contents or not according to the first feature vector, the second feature vector and the third feature vector.

In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the image detection method of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer storage medium storing a computer program, where the computer program causes a computer to implement the image detection method in the first aspect.

In the image detection scheme provided by this embodiment, a reference violation image that meets the similarity requirement with the image to be detected is selected from a violation image database, and a first text description corresponding to the reference violation image is determined. The image to be detected is processed with a first image recognition model to obtain a first feature vector, the reference violation image is processed with a second image recognition model to obtain a second feature vector, and the first text description is processed with a text recognition model to obtain a third feature vector. The first image recognition model and the second image recognition model are trained on sample images that do not contain violation content. Finally, whether the image to be detected contains violation content is determined according to the first feature vector, the second feature vector and the third feature vector.

In this scheme, selecting from the violation image database a reference violation image that meets the similarity requirement with the image to be detected amounts to a coarse detection of violation content. The reference violation image and its first text description are then combined to perform a precise detection of violation content on the image to be detected from multiple angles, which makes the detection result more accurate. In addition, because the first image recognition model and the second image recognition model are not trained on sample images containing violation content, the image recognition models need not be continuously and iteratively updated when newly added violation content appears, and the newly added violation content can be responded to in time. Therefore, when the two models are used to detect the image to be detected, whether it contains newly added violation content can be detected in time, which avoids new image content security problems.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of an image detection method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of an application scenario of an image detection method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a second application scenario of an image detection method according to an embodiment of the present invention;

Fig. 4 is a schematic flow chart of determining whether an image to be detected is a violation image according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of a model training method according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an image detection device according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an electronic device corresponding to the image detection device provided in the embodiment shown in Fig. 6.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In addition, the sequence of steps in the method embodiments described below is only an example and is not strictly limited.

In recent years, countries have attached increasing importance to network content security and have issued a series of regulations on it. Penalties and measures for content security incidents on the network are also increasingly strict. To avoid such incidents, content published online needs to be audited for violations.

At present, a machine learning model is generally used to perform violation detection on content published on the network: whether the images in training videos violate the rules, and their violation categories, are manually labeled; the images and the corresponding violation categories are then input into a machine learning model for training; and other video content is identified with the trained violation identification model.

However, the internet is fast, open and timely, which often causes hot-spot violation content to spread rapidly. A traditional violation identification model can only identify the violation content included in its training set. When new violation content suddenly appears on the network, the model cannot respond to it in real time and cannot accurately identify whether an online video contains it, so new content security problems easily arise.

To solve the above technical problems, the embodiments of the invention provide a new image detection scheme. Reference violation images in a violation image database are combined with three machine learning models to perform violation content detection on the image to be detected and obtain a final detection result. Because the image to be detected is examined from multiple angles, the accuracy of image violation detection can be improved. In addition, the machine learning models are trained on sample images that do not contain violation content. When new violation content appears on the network, the image recognition models therefore need not be continuously and iteratively updated: the new violation content can be detected directly from the newly added reference violation images in the violation image database together with the three machine learning models. Newly added violation content is thus responded to in time, and new image content security problems are avoided.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the case where there is no conflict between the embodiments, the following embodiments and features in the embodiments may be combined with each other.

Fig. 1 is a schematic flow chart of an image detection method according to an embodiment of the present invention; referring to fig. 1, the method may be performed by an image detection apparatus, which may be implemented as software, or a combination of software and hardware. Specifically, the image detection method may include the steps of:

101. Acquire an image to be detected.

102. Select, from a violation image database, a reference violation image that meets a similarity requirement with the image to be detected, where the reference violation image is an image containing violation content.

103. Determine a first text description corresponding to the reference violation image.

104. Input the image to be detected into a first image recognition model to obtain a first feature vector corresponding to the image to be detected, and input the reference violation image into a second image recognition model to obtain a second feature vector corresponding to the reference violation image, where the first image recognition model and the second image recognition model are trained on sample images that do not contain violation content.

105. Input the first text description into a text recognition model to obtain a third feature vector corresponding to the first text description.

106. Determine, according to the first feature vector, the second feature vector and the third feature vector, whether the image to be detected contains violation content.
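As a reading aid, the six steps can be sketched as a small pipeline. This is only an illustrative sketch: the class name `ImageDetector` and all of its fields are hypothetical stand-ins for the models and the database lookup, and the behaviour when no reference violation image is found is an assumption that the text does not specify.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence

Vector = Sequence[float]


@dataclass
class ImageDetector:
    """Wires steps 102-106 together; every field is a hypothetical stand-in."""

    select_reference: Callable[[Any], Optional[Any]]   # step 102
    describe: Callable[[Any], str]                     # step 103
    encode_query_image: Callable[[Any], Vector]        # step 104, first model
    encode_reference_image: Callable[[Any], Vector]    # step 104, second model
    encode_text: Callable[[str], Vector]               # step 105
    decide: Callable[[Vector, Vector, Vector], bool]   # step 106

    def detect(self, image: Any) -> bool:
        """Return True if the image (step 101 input) is judged to contain violation content."""
        reference = self.select_reference(image)
        if reference is None:
            # Assumption: with no sufficiently similar reference, report no violation.
            return False
        description = self.describe(reference)
        v1 = self.encode_query_image(image)
        v2 = self.encode_reference_image(reference)
        v3 = self.encode_text(description)
        return self.decide(v1, v2, v3)
```

Each field can be replaced independently, which mirrors the way the embodiments leave the concrete models open.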

In practical application, in order to ensure the security of network content, the content such as video and image uploaded by users on the network needs to be detected, and the image detection method provided by the embodiment of the invention can be used for detecting illegal content of each frame of image in the video. Specifically, when an image is detected, an image to be detected is first acquired. The image to be detected can be any type of image which needs to be subjected to illegal content detection, and the type of the image and the content included in the image are not limited.

In addition, the embodiments of the invention can also detect video content. When detecting violation content in a video uploaded to the network by a user, video frames may first be extracted from the video to be detected, the images corresponding to the extracted video frames are taken as the images to be detected, and violation content identification is performed on each of these images to determine whether the video to be detected contains violation content.
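The frame-by-frame handling of a video can be illustrated with a minimal sketch. The fixed sampling step and the rule that one violating frame marks the whole video are assumptions made for illustration; the text does not fix either, and the names `sample_frame_indices` and `video_contains_violation` are hypothetical.

```python
from typing import Any, Callable, Iterable, List


def sample_frame_indices(total_frames: int, step: int) -> List[int]:
    """Pick every `step`-th frame index (assumed sampling strategy)."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, total_frames, step))


def video_contains_violation(
    frames: Iterable[Any], detect: Callable[[Any], bool]
) -> bool:
    """Assumed aggregation rule: the video is flagged if any sampled frame is flagged."""
    return any(detect(frame) for frame in frames)
```

Here `detect` stands for the per-image detection described in steps 101 to 106.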

In addition, the specific implementation manner of the image detection device to obtain the image to be detected is not limited in this embodiment, and those skilled in the art may set the implementation manner according to specific application requirements and design requirements, for example: the image to be detected can be stored in a preset area, and the image detection device can obtain the image to be detected by accessing the preset area.

After the image to be detected is obtained, a reference violation image that meets the similarity requirement with the image to be detected is selected from the violation image database. The reference violation image is an image containing violation content. The violation image database may store a plurality of violation images of at least one category and may be updated according to real-time hot-spot content security regulations. The database can be configured according to actual requirements. For example, it may store a large number of violation images of various categories, and after new violation content appears, the images corresponding to the new violation content are added to it. Alternatively, the database may store only the violation images corresponding to hot-spot violation content: when new violation content appears, it is added to the database and the violation images of other categories are deleted, so that the database keeps the violation images corresponding to the latest violation categories.
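The two database policies described above (accumulating all categories, or keeping only the latest hot-spot categories) can be sketched with a small in-memory structure. The class `ViolationImageDatabase` and its methods are hypothetical; a real deployment would presumably use persistent storage and store image features rather than identifiers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class ViolationImageDatabase:
    """In-memory stand-in that groups violation image identifiers by category."""

    def __init__(self) -> None:
        self._by_category: Dict[str, List[str]] = defaultdict(list)

    def add(self, category: str, image_id: str) -> None:
        """Policy 1: simply add images of a new or existing category."""
        self._by_category[category].append(image_id)

    def retain_only(self, categories: Iterable[str]) -> None:
        """Policy 2: drop every category that is not in the given hot-spot set."""
        keep = set(categories)
        for category in list(self._by_category):
            if category not in keep:
                del self._by_category[category]

    def categories(self) -> List[str]:
        return sorted(self._by_category)

    def image_ids(self) -> List[str]:
        """All stored image identifiers, ordered by category name."""
        return [i for c in self.categories() for i in self._by_category[c]]
```

Because the recognition models are not retrained, updating this structure is the only step needed when a new violation category appears.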

When the image to be detected is examined, the reference violation image serves as an image reference standard for determining whether the image to be detected contains violation content. Because a large number of violation images are stored in the violation image database, the similarity between each violation image and the image to be detected may be determined, and a violation image that meets the similarity requirement is taken as the reference violation image corresponding to the image to be detected, which improves detection accuracy. Selecting the reference violation image is thus equivalent to a coarse screening for violation content: the image to be detected is compared with the violation content in the stored violation images to find the reference violation image closest to it, that is, to determine the violation content that the image to be detected may contain. The image to be detected is then screened more precisely on the basis of the reference violation image, which improves image detection accuracy.
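The coarse screening can be illustrated as a thresholded nearest-neighbour search. The sketch below assumes that each stored violation image has already been reduced to a numeric vector and that cosine similarity is the similarity measure; the text specifies neither, and the names `cosine_similarity` and `select_references` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_references(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    threshold: float,
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Return up to `top_k` database entries whose similarity to `query` meets `threshold`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

An empty result means that no stored violation image meets the similarity requirement. For a large database, the linear scan would normally be replaced by an approximate nearest-neighbour index.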

Next, a first text description corresponding to the reference violation image is determined. The first text description mainly describes the main features of the reference violation image. This embodiment does not limit how the first text description is determined; a person skilled in the art may choose an implementation according to specific application and design requirements. For example, the first text description may be obtained through a preset machine learning model, which yields an accurate text description of the image. Specifically, determining the first text description corresponding to the reference violation image may include: analyzing the reference violation image with a first machine learning model to obtain the first text description, where the first machine learning model is trained to determine the text description corresponding to a violation image.

In addition, the first machine learning model may be generated by training a deep convolutional neural network, that is, by training the network with preset reference images and the text descriptions corresponding to those reference images. Once the first machine learning model is established, it can be used to analyze the reference violation image and obtain the corresponding first text description.

In this embodiment, the trained first machine learning model analyzes the reference violation image to obtain the corresponding first text description, which ensures that the text description is accurate and reliable. Detecting the image to be detected on the basis of this text description in turn helps ensure detection quality and efficiency and further improves the stability and reliability of the method.

After the reference violation image corresponding to the image to be detected and the first text description corresponding to the reference violation image are obtained, the image to be detected can be analyzed from multiple angles based on the reference violation image and the first text description corresponding to the reference violation image to determine whether the image to be detected contains the violation content, so that the image detection accuracy can be improved, and the occurrence of the violation content on a network can be avoided.

However, in practical applications an image may contain a great deal of content. If the reference violation image and its first text description were compared directly with the image to be detected, information such as the image background could affect the comparison result. To improve the detection result, machine learning models may therefore be used to obtain the feature vector corresponding to the image to be detected, the feature vector corresponding to the reference violation image and the feature vector corresponding to the first text description, and these feature vectors are then compared and analyzed to determine whether the image to be detected contains violation content.

Specifically, an image to be detected is input into a first image recognition model to obtain a first feature vector corresponding to the image to be detected, a reference violation image is input into a second image recognition model to obtain a second feature vector corresponding to the reference violation image, a first text description is input into a text recognition model to obtain a third feature vector corresponding to the first text description. The first image recognition model is trained to be used for determining the feature vector corresponding to the image to be detected, the second image recognition model is trained to be used for determining the feature vector corresponding to the reference violation image, and the text recognition model is trained to be used for determining the feature vector of the text description corresponding to the reference violation image.

In addition, whereas a traditional image recognition model is trained on a large number of sample images containing violation content, the first image recognition model and the second image recognition model in the embodiments of the invention are trained on sample images that do not contain violation content, and may be trained on large open-source image sample sets. The two models therefore have good recognition capability and can accurately obtain the first feature vector corresponding to the image to be detected and the second feature vector corresponding to the reference violation image, which improves the accuracy of the detection result determined from these vectors. They can also recognize any type of content contained in an image rather than being limited to a certain type of image content. When new violation content appears on the network, the image recognition models need not be continuously and iteratively updated, which saves a large amount of computing resources, improves detection efficiency, allows the new violation content to be detected in time and avoids image security problems.

In addition, so that the machine learning models have better feature recognition and feature comparison capability, in the embodiments of the invention the first image recognition model, the second image recognition model and the text recognition model may be jointly trained by contrastive learning, so that the three feature vectors output by the three models can be compared and analyzed directly.
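The text does not state which contrastive objective is used. As one common possibility, the sketch below computes an InfoNCE-style loss over a batch of matched vector pairs (for example, image vectors and the vectors of their text descriptions). The function name `info_nce`, the temperature value and the choice of this particular loss are all assumptions.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Sequence[float]) -> List[float]:
    # Assumes a non-zero vector; a zero vector would divide by zero.
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def info_nce(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Mean cross-entropy of matching anchors[i] to positives[i] against the rest of the batch."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [_dot(anchor, candidate) / temperature for candidate in p]
        peak = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - logits[i]
    return total / len(a)
```

Minimizing such a loss pulls matched pairs together and pushes mismatched pairs apart in a shared vector space, which is what allows vectors from different encoders to be compared directly. With three encoders, the loss could be applied to each pairing in turn.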

Then, whether the image to be detected contains violation content is determined according to the first feature vector, the second feature vector and the third feature vector. The machine learning models learn the essential features of each image or text: the first image recognition model yields a first feature vector reflecting the essential features of the image to be detected, the second image recognition model yields a second feature vector reflecting the essential features of the reference violation image, and the text recognition model yields a third feature vector reflecting the essential features of the first text description corresponding to the reference violation image. The three feature vectors can then be compared and analyzed directly to determine whether the image to be detected contains violation content.

Specifically, when the three feature vectors are compared and analyzed, the first feature vector and the second feature vector may be compared to determine the similarity between them, and this similarity is used to determine whether the image to be detected contains the violation content in the reference violation image, which realizes comparison at the image-to-image level. The first feature vector and the third feature vector are then compared to determine the similarity between them, and this similarity is used to determine whether the image to be detected contains the violation content described in the first text description, which realizes comparison at the text-to-image level.
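The two comparisons can be sketched as follows. Cosine similarity, the threshold values and the rule that both similarities must pass are assumptions made for illustration; the text only states that the two similarities are used to reach the decision. The function name `compare_vectors` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def compare_vectors(
    first: Sequence[float],
    second: Sequence[float],
    third: Sequence[float],
    image_threshold: float = 0.8,
    text_threshold: float = 0.8,
) -> Tuple[float, float, bool]:
    """Return (image-to-image similarity, text-to-image similarity, flagged)."""
    image_similarity = cosine(first, second)  # image to be detected vs. reference image
    text_similarity = cosine(first, third)    # image to be detected vs. text description
    # Assumed combination rule: flag only when both comparisons pass.
    flagged = image_similarity >= image_threshold and text_similarity >= text_threshold
    return image_similarity, text_similarity, flagged
```

Returning both scores alongside the decision keeps the combination rule easy to change, for example to a weighted sum.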

In an alternative embodiment, to improve the accuracy and efficiency of determining the similarity between feature vectors, a pre-trained discriminant model may be used to determine the similarity between the first feature vector and the second feature vector and the similarity between the first feature vector and the third feature vector, so that whether the image to be detected contains violation content is determined according to these similarities.

Specifically, the first feature vector, the second feature vector and the third feature vector are input into the discriminant model to obtain a result indicating whether the image to be detected contains violation content. The discriminant model is trained in advance to determine, according to the first feature vector, the second feature vector and the third feature vector, whether the image to be detected contains violation content.

In the embodiments of the invention, selecting from the violation image database a reference violation image that meets the similarity requirement with the image to be detected amounts to a coarse detection of violation content. Precise detection of violation content is then performed by combining the reference violation image with its first text description, so the image to be detected undergoes two-stage detection and the detection result is more accurate. Because the reference violation image and its first text description are combined, violation content is detected from multiple angles, which improves the accuracy of image violation detection. In addition, because the first image recognition model and the second image recognition model are not trained on sample images containing violation content, the image recognition models need not be continuously and iteratively updated when newly added violation content appears, and the newly added violation content can be responded to in time. Therefore, when the two models are used to detect the image to be detected, whether it contains newly added violation content can be detected in time, which avoids new image content security problems.

The above embodiment describes the process of detecting whether an image to be detected contains violation content. During detection, a reference violation image that meets the similarity requirement with the image to be detected is first selected from the violation image database, which coarsely screens the image for possible violation content. The image to be detected is then compared and analyzed on the basis of the reference violation image and its first text description to determine whether it contains violation content.

The violation image database may store a large number of violation images of at least one category, and the violation images stored in it can be set according to actual requirements. To improve image detection efficiency, violation images of a plurality of categories may be stored in the database, and after new violation content appears, the violation images of the new category may be stored directly in the database. The database then contains both the violation images of the previously stored categories and the violation images of the newly added violation content. When the image to be detected is processed, it can be compared with the violation images of both the previous categories and the newly added category to obtain a reference violation image that meets the similarity requirement. The selected reference violation image may therefore belong either to a previous category or to the newly added category. Selecting the reference violation image thus amounts to identifying violation content of all categories in the image to be detected, which improves image detection accuracy across all categories.

For example, if the image to be detected contains violation content of a newly added category, the selected reference violation image belongs to that newly added category. Analyzing the image to be detected together with this reference violation image allows changes in real-time hot topics to be perceived promptly and the newly added violation content in the image to be found in time, which prevents a large amount of new violation content from spreading on the network.

In addition, in practical applications, if the violation image database contains a large number of violation images of many categories, selecting a reference violation image that meets the similarity requirement requires comparing the image to be detected with the stored images one by one, which makes the workflow cumbersome. As an alternative, the violation image database may store only the violation images corresponding to newly added categories of violation content. In that case, an image detection model first detects violation content of the previous categories; if the image to be detected does not contain violation content of a previous category, it is then checked against the reference violation images of the newly added categories stored in the violation image database.

In order to facilitate understanding of the specific implementation process of detecting the image to be detected in the above embodiment, the description is made with reference to an application scenario.

In the first embodiment, referring to fig. 2, the violation image database stores violation images of newly added categories, that is, the violation image database contains violation images of a first category. The execution subject of the image detection method is an image detection device. After a user uploads a video through an internet platform, images are extracted from the video to obtain a plurality of images to be detected. After the image detection device obtains an image to be detected, it inputs the image into an image detection model to obtain the target image category corresponding to the image, where the image detection model is trained on violation images of a plurality of different second categories.

If the target image category does not match any of the plurality of different second categories, a reference violation image that meets the similarity requirement with respect to the image to be detected is selected from the first-category violation images contained in the violation image database, and the first text description corresponding to the reference violation image is determined.

Then, the image to be detected is input into a first image recognition model to obtain a first feature vector corresponding to the image to be detected, the reference violation image is input into a second image recognition model to obtain a second feature vector corresponding to the reference violation image, the first text description is input into a text recognition model to obtain a third feature vector corresponding to the first text description. And determining whether the image to be detected contains illegal contents or not according to the first feature vector, the second feature vector and the third feature vector.

If the to-be-detected image contains violation content, the to-be-detected image is used as a newly added first-class violation image and added into a violation image database, and when the number of the first-class violation images stored in the violation image database reaches a set threshold, the image detection model is updated based on the first-class violation images, and the first-class violation images are removed from the violation image database.
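The threshold-triggered update described above can be sketched as follows. This is a minimal illustration only: the class name `NewCategoryStore`, the `retrain` callback and the in-memory list are hypothetical and are not specified by the source, which does not describe how the image detection model is updated.

```python
from typing import Callable, List


class NewCategoryStore:
    """Holds first-category (newly added) violation images until a threshold is hit.

    `retrain` is a hypothetical callback standing in for "update the image
    detection model based on the first-category violation images".
    """

    def __init__(self, threshold: int, retrain: Callable[[List[object]], None]):
        self.threshold = threshold
        self.retrain = retrain
        self.images: List[object] = []

    def add(self, image: object) -> None:
        # A newly confirmed violation image joins the first-category set.
        self.images.append(image)
        if len(self.images) >= self.threshold:
            # Hand a copy to the model update, then remove the images
            # from the database, as the embodiment describes.
            self.retrain(list(self.images))
            self.images.clear()
```

Once the update has run, the former first-category images are covered by the image detection model, so the database stays small and the one-by-one comparison stays cheap.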

In this image detection method, the image to be detected is input into the image detection model to obtain its target image category, where the image detection model is trained on violation images of a plurality of different second categories. If the target image category does not match the plurality of different second categories, a reference violation image that meets the similarity requirement is selected from the first-category violation images contained in the violation image database, and whether the image to be detected contains violation content is determined based on the reference violation image and its corresponding first text description. In this way, violation content of both the original (preset) categories and the newly added categories can be detected, which makes the detection result more accurate and allows trending violation content to be detected promptly, avoiding new content security problems.
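The two-stage flow of the first embodiment can be sketched as below. The function and parameter names (`detect_two_stage`, `classify`, `check_against_database`) are hypothetical, and the sketch assumes that a match with one of the second categories is itself treated as a violation, which the source implies but does not state as a step.

```python
from typing import Callable, Optional, Sequence


def detect_two_stage(
    image: object,
    classify: Callable[[object], Optional[str]],
    second_categories: Sequence[str],
    check_against_database: Callable[[object], bool],
) -> bool:
    """Return True if the image is judged to contain violation content."""
    # Stage 1: the image detection model, trained on the second categories.
    target_category = classify(image)
    if target_category in second_categories:
        # Assumption: a match with a known category counts as a violation.
        return True
    # Stage 2: no match, so fall back to the reference-image comparison
    # against the first-category images in the violation image database.
    return check_against_database(image)
```

Keeping the database check behind the classifier means the comparison against stored reference images only runs for images the classifier does not already recognize.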

In the second embodiment, referring to fig. 3, the violation image database contains known violation images of all categories. The execution subject of the image detection method is again an image detection device. After a user uploads a video through an internet platform, images are extracted from the video to obtain a plurality of images to be detected. After obtaining an image to be detected, the image detection device calculates the similarity between that image and each violation image in the violation image database, determines from these similarities a reference violation image that meets the similarity requirement, and then determines the first text description corresponding to the reference violation image.
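The one-by-one similarity comparison can be sketched as follows. The source does not specify the similarity measure or the exact requirement, so this sketch assumes cosine similarity over precomputed embeddings and treats "meets the similarity requirement" as "highest similarity at or above a minimum value". The names `cosine`, `select_reference` and `min_similarity` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_reference(
    query: Sequence[float],
    database: List[Tuple[str, Sequence[float]]],
    min_similarity: float = 0.5,
) -> Optional[Tuple[str, float]]:
    """Return (image_id, similarity) of the best match, or None if none qualifies."""
    best: Optional[Tuple[str, float]] = None
    for image_id, embedding in database:
        similarity = cosine(query, embedding)
        if similarity >= min_similarity and (best is None or similarity > best[1]):
            best = (image_id, similarity)
    return best
```

A linear scan like this is what makes the full-category database costly at scale, which is the motivation the text gives for the first embodiment.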

To facilitate understanding, a specific implementation of determining whether the image to be detected contains violation content according to the first feature vector, the second feature vector, and the third feature vector is described below by way of example with reference to fig. 4.

FIG. 4 is a schematic flow chart of determining whether an image to be detected is a violation image according to an embodiment of the present invention; on the basis of the above embodiment, with continued reference to fig. 4, the present embodiment provides a manner of determining whether the image to be detected is an offending image, which may specifically include the following steps:

401. Determining the violation score corresponding to the image to be detected according to the first feature vector, the second feature vector and the third feature vector.

402. If the violation score is greater than a preset violation threshold, determining that the image to be detected contains violation content.

After a first feature vector corresponding to the image to be detected, a second feature vector corresponding to the reference violation image and a third feature vector corresponding to the first text description are obtained, the first feature vector, the second feature vector and the third feature vector are analyzed and processed, so that the violation score corresponding to the image to be detected is determined according to the first feature vector, the second feature vector and the third feature vector, and whether the image to be detected contains violation content is determined according to the violation score.

In an optional embodiment, according to the first feature vector, the second feature vector, and the third feature vector, a specific implementation manner of determining the violation score corresponding to the image to be detected may be: acquiring a first product value of the first feature vector and the second feature vector and a second product value of the first feature vector and the third feature vector; determining a sum of the first product value and the second product value; and determining the violation score corresponding to the image to be detected according to the sum value.

In a specific implementation, let the first feature vector be a, the second feature vector be b, and the third feature vector be c. The first product value of the first and second feature vectors is then a × b, the second product value of the first and third feature vectors is a × c, and the sum of the two product values is a × b + a × c. From this sum, the violation score corresponding to the image to be detected is determined as Score = (a × b + a × c) / 2.

It follows from this calculation that the violation score is determined from two similarities: the similarity between the first feature vector and the second feature vector, and the similarity between the first feature vector and the third feature vector. The higher the violation score, the more similar the image to be detected is to the reference violation image and its text description, and the greater the likelihood that the image to be detected contains violation content.

After the violation score corresponding to the image to be detected is obtained, it is compared with the preset violation threshold; if the violation score is greater than the preset violation threshold, the image to be detected is determined to contain violation content. The preset violation threshold can be set according to actual requirements: the larger the threshold, the higher the score an image must reach before it is judged to contain violation content.
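The score and threshold check can be illustrated with a short sketch. The source speaks only of a "product value" of two feature vectors; the sketch assumes this is the inner product of vectors that have already been L2-normalized, so that each product behaves like a cosine similarity. The function names are hypothetical.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product; assumed here to be the 'product value' of two feature vectors."""
    return sum(x * y for x, y in zip(u, v))


def violation_score(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Score = (a x b + a x c) / 2, with a, b, c assumed L2-normalized."""
    return (dot(a, b) + dot(a, c)) / 2


def contains_violation(
    a: Sequence[float], b: Sequence[float], c: Sequence[float], threshold: float
) -> bool:
    # Strictly greater than the preset violation threshold, as in step 402.
    return violation_score(a, b, c) > threshold
```

For example, with a = [1, 0], b = [1, 0] and c = [0, 1], the image matches the reference image exactly but not its text description, and the score is (1 + 0) / 2 = 0.5.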

In this embodiment, the first feature vector, the second feature vector and the third feature vector are used to determine the violation score corresponding to the image to be detected, and determine whether the image to be detected contains the violation content according to the violation score corresponding to the image to be detected and the preset violation threshold, so that not only is the accuracy and reliability of obtaining the violation score corresponding to the image to be detected ensured, but also the accuracy of detecting the image violation is improved.

In the embodiment of the invention, the first image recognition model, the second image recognition model and the text recognition model not only can accurately recognize the feature vectors corresponding to the images or the texts, but also have certain contrast capability, so that the output feature vectors can be directly subjected to contrast analysis, and the detection result corresponding to the image to be detected, which is determined according to the first feature vector, the second feature vector and the third feature vector, is more accurate. Then, in order to facilitate understanding of the training process of the first image recognition model, the second image recognition model, and the text recognition model, the training process of jointly training the first image recognition model, the second image recognition model, and the text recognition model will be exemplarily described with reference to fig. 5.

FIG. 5 is a schematic diagram of a model training method according to an embodiment of the present invention; referring to fig. 5, specifically, training each model in the image detection method may include the following steps:

501. Obtaining a training sample, wherein the training sample comprises a first sample image without illegal contents, a second text description corresponding to the first sample image and a second sample image obtained after data enhancement processing is carried out on the first sample image.

502. And determining a first prediction feature vector corresponding to the first sample image through the first image recognition model.

503. And determining a second prediction feature vector corresponding to the second sample image through the second image recognition model.

504. And determining a third prediction feature vector corresponding to the second text description through the text recognition model.

505. A first similarity between the first predictive feature vector and the second predictive feature vector and a second similarity between the first predictive feature vector and the third predictive feature vector are determined.

506. And training the first image recognition model, the second image recognition model and the text recognition model according to the first similarity and the second similarity.

In the embodiment of the invention, the first image recognition model, the second image recognition model and the text recognition model can be jointly trained until the three models converge. Specifically, training samples are first obtained, where each training sample is a triplet consisting of a first sample image that does not contain violation content, a second text description corresponding to the first sample image, and a second sample image obtained by applying data enhancement processing to the first sample image.

The first sample images may be taken from an open-source image set, that is, a collection of images of multiple categories gathered from the network. After a large number of such first sample images have been collected, a corresponding second text description is generated for each first sample image, and data enhancement processing is applied to each first sample image to obtain the corresponding second sample image, so that a large number of training samples can be obtained. Training the first image recognition model, the second image recognition model and the text recognition model on this large number of training samples gives the models better performance.
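The construction of training triplets can be sketched as follows. The source does not say which data enhancement is applied or how text descriptions are generated, so this sketch uses a horizontal flip as a stand-in enhancement and takes the description generator as a hypothetical `describe` callable. Images are represented as nested lists of pixel values purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]  # toy stand-in for real image data


@dataclass
class TrainingSample:
    first_image: Image   # first sample image (no violation content)
    text: str            # second text description of the first sample image
    second_image: Image  # data-enhanced version of the first sample image


def horizontal_flip(image: Image) -> Image:
    """One possible data enhancement: mirror each row."""
    return [row[::-1] for row in image]


def build_samples(
    images: List[Image],
    describe: Callable[[Image], str],
    augment: Callable[[Image], Image] = horizontal_flip,
) -> List[TrainingSample]:
    """Turn each first sample image into an (image, text, enhanced image) triplet."""
    return [TrainingSample(img, describe(img), augment(img)) for img in images]
```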

After a training sample is acquired, inputting a first sample image into a first image recognition model to determine a first prediction feature vector corresponding to the first sample image through the first image recognition model; inputting the second sample image into a second image recognition model to determine a second prediction feature vector corresponding to the second sample image through the second image recognition model; and inputting the second text description into the text recognition model to determine a third predictive feature vector corresponding to the second text description through the text recognition model.

And then, determining a first similarity between the first predicted feature vector and the second predicted feature vector and a second similarity between the first predicted feature vector and the third predicted feature vector, and training a first image recognition model, a second image recognition model and a text recognition model according to the first similarity and the second similarity.

In an alternative embodiment, contrastive learning may be adopted: the first image recognition model, the second image recognition model, and the text recognition model are trained according to the first similarity and the second similarity, and a contrastive loss function is used to fine-tune each model until it converges.
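One common form of contrastive loss is sketched below for illustration. The source does not name a specific loss, so the InfoNCE-style formulation, the `temperature` parameter and the function names are assumptions. Within a batch, the i-th second (or third) prediction feature vector is treated as the positive for the i-th first prediction feature vector, and all other entries act as negatives.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector],
             temperature: float = 0.07) -> float:
    """Mean cross-entropy where positives[i] is the match for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


def joint_loss(first: Sequence[Vector], second: Sequence[Vector],
               third: Sequence[Vector], temperature: float = 0.07) -> float:
    """Image-to-enhanced-image term (first similarity) plus image-to-text term (second similarity)."""
    return info_nce(first, second, temperature) + info_nce(first, third, temperature)
```

Minimizing a loss of this kind pulls each first prediction feature vector toward its own enhanced image and text description and away from the others in the batch, which is one way to obtain the comparison capability the text attributes to the trained models.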

In addition, to improve training efficiency, pre-trained model weights can be obtained first and used as the starting point for training the first image recognition model, the second image recognition model and the text recognition model. For example, the image recognition models may be initialized with the open-source CLIP image model weights, and the text recognition model may be initialized with the open-source CLIP text model weights.

In the embodiment of the invention, a training sample is obtained that comprises a first sample image that does not contain violation content, a second text description corresponding to the first sample image, and a second sample image obtained by applying data enhancement processing to the first sample image. A first prediction feature vector corresponding to the first sample image is determined through the first image recognition model, a second prediction feature vector corresponding to the second sample image is determined through the second image recognition model, and a third prediction feature vector corresponding to the second text description is determined through the text recognition model. A first similarity between the first and second prediction feature vectors and a second similarity between the first and third prediction feature vectors are then determined, and the three models are trained according to these two similarities. As a result, the three models gain better feature recognition and feature extraction capability as well as comparison capability, the first, second and third feature vectors they produce are more accurate and more distinctive, and the detection result determined from those feature vectors is more accurate.

The implementation process and technical effects of the method in this embodiment are similar to those of the method in the embodiment shown in fig. 1 to 4, and reference may be made to the above description for details, which are not repeated here.

Fig. 6 is a schematic structural diagram of an image detection device according to an embodiment of the present invention; referring to fig. 6, the present embodiment provides an image detection apparatus, which may perform the image detection method corresponding to fig. 1, and the image detection apparatus may include an acquisition module 11, a selection module 12, a first determination module 13, an image processing module 14, a text processing module 15, and a second determination module 16.

An acquisition module 11, configured to acquire an image to be detected.

And the selection module 12 is used for selecting a reference violation image meeting the requirement of similarity with the image to be detected from a violation image database, wherein the reference violation image refers to an image containing violation contents.

A first determining module 13, configured to determine a first text description corresponding to the reference violation image.

The image processing module 14 is configured to input the image to be detected into a first image recognition model, obtain a first feature vector corresponding to the image to be detected, input the reference violation image into a second image recognition model, obtain a second feature vector corresponding to the reference violation image, where the first image recognition model and the second image recognition model are obtained based on sample image training that does not include violation content.

The text processing module 15 is configured to input the first text description into a text recognition model, and obtain a third feature vector corresponding to the first text description.

A second determining module 16, configured to determine whether the image to be detected contains illegal content according to the first feature vector, the second feature vector, and the third feature vector.
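The way the modules hand data to one another can be sketched as a simple pipeline. This is an illustrative arrangement only: the class `ImageDetectionDevice` and its field names are hypothetical, each module is represented by a plain callable, and the acquisition module 11 is represented by the `image` argument passed to `detect`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Vector = Sequence[float]


@dataclass
class ImageDetectionDevice:
    select_reference: Callable[[object], object]      # selection module 12
    describe: Callable[[object], str]                 # first determining module 13
    encode_query: Callable[[object], Vector]          # image processing module 14, first model
    encode_reference: Callable[[object], Vector]      # image processing module 14, second model
    encode_text: Callable[[str], Vector]              # text processing module 15
    decide: Callable[[Vector, Vector, Vector], bool]  # second determining module 16

    def detect(self, image: object) -> bool:
        """Run one image through the modules in the order described above."""
        reference = self.select_reference(image)
        text = self.describe(reference)
        first = self.encode_query(image)
        second = self.encode_reference(reference)
        third = self.encode_text(text)
        return self.decide(first, second, third)
```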

In an alternative embodiment, the violation image database includes a first category of violation images, and the selection module 12 may specifically be configured to: inputting the image to be detected into an image detection model to obtain a target image category corresponding to the image to be detected, wherein the image detection model is obtained based on illegal image training of a plurality of different second categories; and if the target image category is not matched with the plurality of different second categories, selecting a reference violation image meeting the requirement on similarity with the image to be detected from the violation images of the first category contained in the violation image database.

In an alternative embodiment, the violation image database includes violation images of a first category, and the selection module 12 may be further configured to: if the image to be detected contains illegal contents, the image to be detected is used as the newly added first class illegal image and is added into the illegal image database; and if the number of the first class of violation images stored in the violation image database reaches a set threshold, updating the image detection model based on the first class of violation images, and removing the first class of violation images from the violation image database.

In an alternative embodiment, the violation image database contains known violation images of all classes; the selection module 12 may in particular also be used to: respectively calculating the similarity between the image to be detected and each violation image in the violation image database; and determining a reference violation image meeting the requirement of the similarity of the image to be detected according to the similarity.

In an alternative embodiment, the second determining module 16 may specifically be configured to: determining the violation score corresponding to the image to be detected according to the first feature vector, the second feature vector and the third feature vector; and if the violation score is larger than a preset violation threshold, determining that the image to be detected is a violation image.

In an alternative embodiment, the second determining module 16 may be further configured to: acquiring a first product value of the first feature vector and the second feature vector and a second product value of the first feature vector and the third feature vector; determining a sum of the first product value and the second product value; and determining the violation score corresponding to the image to be detected according to the sum value.

In an alternative embodiment, the apparatus may further include a training module, and may specifically be configured to: obtaining a training sample, wherein the training sample comprises a first sample image which does not contain illegal contents, a second text description corresponding to the first sample image and a second sample image obtained after data enhancement processing is carried out on the first sample image; determining a first prediction feature vector corresponding to the first sample image through the first image recognition model; determining a second prediction feature vector corresponding to the second sample image through the second image recognition model; determining a third prediction feature vector corresponding to the second text description through the text recognition model; determining a first similarity between the first predictive feature vector and the second predictive feature vector and a second similarity between the first predictive feature vector and the third predictive feature vector; and training the first image recognition model, the second image recognition model and the text recognition model according to the first similarity and the second similarity.

In an alternative embodiment, the training module may be further configured to: and respectively carrying out learning training on the first image recognition model, the second image recognition model and the text recognition model according to the first similarity and the second similarity by adopting a contrast learning mode.

The apparatus of fig. 6 may perform the method of the embodiment of fig. 1-4, and reference is made to the relevant description of the embodiment of fig. 1-4 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiments shown in fig. 1 to 4, and are not described herein.

In one possible design, the structure of the image detection apparatus shown in fig. 6 may be implemented as an electronic device, which may be a mobile phone, a tablet computer, a server, or another device. As shown in fig. 7, the electronic device may include a processor 21 and a memory 22. The memory 22 stores a program that enables the electronic device to execute the image detection method provided in the embodiments shown in fig. 1-4, and the processor 21 is configured to execute the program stored in the memory 22.

The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the processor 21, are capable of performing the steps of:

acquiring an image to be detected;

Further, the processor 21 is further configured to perform all or part of the steps in the embodiments shown in fig. 1-4.

The electronic device may further include a communication interface 23 in its structure for communicating with other devices or with a communication network.

In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for an electronic device, where the computer storage medium includes a program for executing the image detection method according to the embodiment of the method shown in fig. 1 to fig. 4.

The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of a necessary general-purpose hardware platform, or by a combination of hardware and software. Based on this understanding, the essence of the foregoing technical solutions, or the portions that contribute to the prior art, may be embodied in the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An image detection method, comprising:

acquiring an image to be detected;

2. The method of claim 1, wherein the violation image database includes violation images of a first category, wherein the selecting a reference violation image from the violation image database that meets a requirement for similarity to the image to be detected includes:

inputting the image to be detected into an image detection model to obtain a target image category corresponding to the image to be detected, wherein the image detection model is obtained based on illegal image training of a plurality of different second categories;

and if the target image category is not matched with the plurality of different second categories, selecting a reference violation image meeting the requirement on similarity with the image to be detected from the violation images of the first category contained in the violation image database.

3. The method according to claim 2, wherein the method further comprises:

if the image to be detected contains violation content, adding the image to be detected to the violation image database as a newly added violation image of the first category;

and if the number of violation images of the first category stored in the violation image database reaches a set threshold, updating the image detection model based on the violation images of the first category, and removing the violation images of the first category from the violation image database.
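The routing of claim 2 and the database maintenance of claim 3 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class `FirstCategoryStore`, the callback `update_model`, the function `needs_reference_lookup`, and the default threshold are all hypothetical names and values chosen for illustration.

```python
class FirstCategoryStore:
    """Hypothetical holder for newly found first-category violation images."""

    def __init__(self, update_model, threshold=3):
        # update_model is an assumed callback that retrains or fine-tunes
        # the image detection model on a batch of images.
        self.images = []
        self.update_model = update_model
        self.threshold = threshold

    def add_violation(self, image):
        """Store a newly detected violation image.

        Returns True if the set threshold was reached, in which case the
        detection model is updated and the stored images are removed.
        """
        self.images.append(image)
        if len(self.images) >= self.threshold:
            batch = list(self.images)
            self.update_model(batch)
            self.images.clear()
            return True
        return False


def needs_reference_lookup(target_category, second_categories):
    """Claim 2 routing: look up a reference violation image only when the
    predicted category matches none of the known second categories."""
    return target_category not in second_categories
```

In this sketch the stored images are removed only after the update callback returns, so a failed update would leave them in place for a later attempt.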

4. The method of claim 1, wherein the violation image database contains known violation images of all categories; the selecting the reference violation image meeting the requirement on similarity to the image to be detected from the violation image database comprises the following steps:

respectively calculating the similarity between the image to be detected and each violation image in the violation image database;

and determining, according to the similarities, a reference violation image meeting the requirement on similarity to the image to be detected.
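The selection in claim 4 can be sketched as follows. The claim does not fix a similarity measure or a selection rule, so this sketch assumes cosine similarity over precomputed feature vectors and picks the most similar entry at or above a hypothetical `min_similarity` threshold; the function names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_reference(query_vec, database, min_similarity=0.5):
    """Compare the query with every database entry and return the best match.

    database is a list of (image_id, vector) pairs. Returns a tuple
    (image_id, similarity), or None if no entry reaches min_similarity.
    """
    best = None
    for image_id, vec in database:
        sim = cosine_similarity(query_vec, vec)
        if sim >= min_similarity and (best is None or sim > best[1]):
            best = (image_id, sim)
    return best
```

A linear scan matches the wording "each violation image in the violation image database"; a large database would more likely use an approximate nearest-neighbor index, which the claim neither requires nor excludes.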

5. The method of claim 1, wherein the determining whether the image to be detected contains violation content based on the first feature vector, the second feature vector, and the third feature vector comprises:

determining the violation score corresponding to the image to be detected according to the first feature vector, the second feature vector and the third feature vector;

and if the violation score is larger than a preset violation threshold, determining that the image to be detected contains violation content.

6. The method of claim 5, wherein determining the corresponding violation score for the image to be detected based on the first feature vector, the second feature vector, and the third feature vector comprises:

acquiring a first product value of the first feature vector and the second feature vector and a second product value of the first feature vector and the third feature vector;

determining a sum of the first product value and the second product value;

and determining the violation score corresponding to the image to be detected according to the sum value.
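The scoring of claims 5 and 6 can be sketched in Python under stated assumptions. The claims do not say which product is meant or how the sum maps to a score, so this sketch assumes an inner (dot) product and a logistic mapping of the sum to the range (0, 1); the default `threshold` is likewise hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def violation_score(first_vec, second_vec, third_vec):
    """Score from the sum of two product values (claim 6).

    first_vec: feature vector of the image to be detected
    second_vec: feature vector of the reference violation image
    third_vec: feature vector of the first text description
    """
    first_product = dot(first_vec, second_vec)
    second_product = dot(first_vec, third_vec)
    total = first_product + second_product
    # Assumed mapping: squash the sum into (0, 1) with a logistic function.
    return 1.0 / (1.0 + math.exp(-total))


def contains_violation(first_vec, second_vec, third_vec, threshold=0.5):
    """Claim 5: violation content is present if the score exceeds the threshold."""
    return violation_score(first_vec, second_vec, third_vec) > threshold
```

With the logistic mapping, a threshold of 0.5 corresponds to a positive sum of the two product values; any other monotonic mapping of the sum would fit the claim wording equally well.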

7. The method according to claim 1, wherein the method further comprises:

obtaining a training sample, wherein the training sample comprises a first sample image which does not contain violation content, a second text description corresponding to the first sample image, and a second sample image obtained by performing data augmentation on the first sample image;

determining a first prediction feature vector corresponding to the first sample image through the first image recognition model;

determining a second prediction feature vector corresponding to the second sample image through the second image recognition model;

determining a third prediction feature vector corresponding to the second text description through the text recognition model;

determining a first similarity between the first prediction feature vector and the second prediction feature vector and a second similarity between the first prediction feature vector and the third prediction feature vector;

and training the first image recognition model, the second image recognition model and the text recognition model according to the first similarity and the second similarity.

8. The method of claim 7, wherein training the first image recognition model, the second image recognition model and the text recognition model based on the first similarity and the second similarity comprises:

and training the first image recognition model, the second image recognition model and the text recognition model respectively, according to the first similarity and the second similarity, by means of contrastive learning.
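One way to turn the two similarities of claims 7 and 8 into a training objective is sketched below. The claims name only contrastive learning; the InfoNCE-style loss, the cosine similarity, the use of other samples in the batch as negatives, and the `temperature` value are assumptions made for this sketch. It computes the loss value only and leaves gradient updates to whatever training framework is used.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: anchors[i] should match positives[i] and differ
    from every positives[j] with j != i (in-batch negatives)."""
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need two non-empty batches of equal size")
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the maximum for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def training_loss(first_vecs, second_vecs, third_vecs, temperature=0.1):
    """Sum of the image-to-augmented-image and image-to-text losses.

    first_vecs: first prediction feature vectors (first sample images)
    second_vecs: second prediction feature vectors (augmented images)
    third_vecs: third prediction feature vectors (text descriptions)
    """
    return (info_nce(first_vecs, second_vecs, temperature)
            + info_nce(first_vecs, third_vecs, temperature))
```

Minimizing this value raises the first and second similarities for matched samples relative to mismatched ones, which is the usual sense of contrastive training.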

9. An electronic device, comprising: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the image detection method of any of claims 1-8.

10. A non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the image detection method of any of claims 1 to 8.