WO2021237570A1

WO2021237570A1 - Image auditing method and apparatus, device, and storage medium

Info

Publication number: WO2021237570A1
Application number: PCT/CN2020/092923
Authority: WO
Inventors: 罗茂
Original assignee: 深圳市欢太科技有限公司; Oppo广东移动通信有限公司
Priority date: 2020-05-28
Filing date: 2020-05-28
Publication date: 2021-12-02
Also published as: CN115443490A

Abstract

An image auditing method, comprising: using a target classification model to perform feature extraction on an image file to be audited to obtain a corresponding feature vector, wherein the target classification model is obtained by training using multiple sample image files and corresponding multiple image transformation files; determining the similarity between the feature vector of the image file to be audited and at least one reference feature vector in an auditing set; and determining, according to the relationship between the determined similarity and a first threshold, whether the image file to be audited is an offending file. Also provided are an image auditing apparatus, a device, and a storage medium.

Description

Image review method and device, equipment and storage medium

Technical field

The embodiments of this application relate to Internet technology, and in particular, but not exclusively, to an image review method and apparatus, a device, and a storage medium.

Background

In Internet content review, malicious users deliberately transform illegal image files in various ways to evade the image review device and then spread those files on the Internet. Image files can be transformed in many ways, for example by rotation, liquefaction, deformation, added noise, rendering, and other basic transformations, or by combinations of them. Transformed illegal image files uploaded to the Internet therefore pose a significant technical challenge to the image review device.

Summary of the invention

The image review method, device, equipment, and storage medium provided in the embodiments of this application are implemented as follows:

The image review method provided by the embodiments of this application includes: performing feature extraction on an image file to be reviewed using a target classification model to obtain a corresponding feature vector, wherein the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files; determining the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in a review set; and determining, according to the relationship between the determined similarity and a first threshold, whether the image file to be reviewed is a violation file.

The image review device provided by the embodiments of this application includes: a feature extraction module configured to use a target classification model to perform feature extraction on an image file to be reviewed to obtain a corresponding feature vector, wherein the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files; a first determining module configured to determine the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in a review set; and a review module configured to determine, according to the relationship between the determined similarity and a first threshold, whether the image file to be reviewed is a violation file.

The electronic device provided by an embodiment of the present application includes a memory and a processor. The memory stores a computer program that can run on the processor. When the processor executes the program, the steps in the image review method described in any of the embodiments of the present application are implemented.

The computer-readable storage medium provided by the embodiment of the present application has a computer program stored thereon, and when the computer program is executed by a processor, it implements the steps in any one of the image review methods described in the embodiment of the present application.

In the embodiments of this application, the electronic device uses the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector, where the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files. In this way, even if the image file to be reviewed has undergone transformations such as rotation, liquefaction, and deformation, a feature vector consistent with that of the original file can still be extracted, so that arbitrarily transformed image files are recognized accurately and the robustness of the image review method is enhanced.

Description of the drawings

FIG. 1 is a schematic diagram of an exemplary application scenario of an image review method according to an embodiment of this application;

FIG. 2 is a schematic diagram of the implementation process of the image review method according to the embodiment of the application;

FIG. 3 is a schematic diagram of the training process of the target classification model according to the embodiment of the application;

FIG. 4 is a schematic diagram of the implementation process of the method for generating a review set according to an embodiment of the application;

FIG. 5 is a schematic diagram of an implementation process of a method for determining a first threshold according to an embodiment of the application;

FIG. 6 is a schematic diagram of the implementation process of another image review method according to an embodiment of the application;

FIG. 7A is a schematic structural diagram of MobileNetV2 according to an embodiment of the application;

FIG. 7B is a schematic structural diagram of a feature extraction structure according to an embodiment of the application;

FIG. 8 is a schematic diagram of the implementation process of another image review method according to an embodiment of the application;

FIG. 9 is a schematic diagram of the implementation process of yet another image review method according to an embodiment of the application;

FIG. 10 is a schematic diagram of the implementation process of another image review method according to an embodiment of the application;

FIG. 11 is a schematic diagram of the implementation process of another image review method according to an embodiment of the application;

FIG. 12 is a schematic diagram of a transformation operation performed on an original picture according to an embodiment of the application;

FIG. 13 is a simplified structural diagram of MobileNetV2 according to an embodiment of the application;

FIG. 14 is a schematic diagram of the curve of the sigmoid function;

FIG. 15 is a schematic diagram of a process of image matching according to an embodiment of the application;

FIG. 16 shows the recall and wrong_recall obtained for candidate thresholds from 35 to 70 according to an embodiment of the application;

FIG. 17 shows the recall and wrong_recall obtained for candidate thresholds from 50 to 55 according to an embodiment of the application;

FIG. 18 is a schematic flowchart of a picture review system according to an embodiment of the application;

FIG. 19 is a schematic diagram of the Mobilehashnet algorithm flow in the picture review system according to an embodiment of the application;

FIG. 20A is a schematic diagram of the structure of an image file review device according to an embodiment of the application;

FIG. 20B is a schematic structural diagram of another image file review device according to an embodiment of the application;

FIG. 21 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the application.

Detailed description

In order to make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the specific technical solutions of the present application will be described in further detail below in conjunction with the drawings in the embodiments of the present application. The following examples are used to illustrate the application, but are not used to limit the scope of the application.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terminology used herein is only for the purpose of describing the embodiments of the application, and is not intended to limit the application.

In the following description, reference is made to “some embodiments”, which describe a subset of all possible embodiments. It is understood that “some embodiments” may be the same subset or different subsets of all possible embodiments, and may be combined with each other where no conflict arises.

It should be pointed out that the terms "first", "second", and "third" in the embodiments of the present application only distinguish similar or different objects and do not represent a specific order of objects. Understandably, "first", "second", and "third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in a sequence other than those illustrated or described herein.

The following first describes an exemplary application scenario of the image review method provided in the embodiment of the present application.

FIG. 1 is a schematic diagram of an exemplary application scenario 100 of an image review method provided by an embodiment of the present application. As shown in FIG. 1, the scenario 100 includes a terminal 101, an image review device 102, and a second database 103. The image review device 102 reviews the image file 104 input by the user at the terminal 101 to determine whether the file is a violation file. If it is a violation file, storing the file in the second database 103 is forbidden; otherwise, that is, if the file is a compliant file, the file is allowed to be stored in the second database 103 so that the user or other users can retrieve, browse, or download it.

It should be noted that the terminal 101 may be a mobile terminal with wireless communication capabilities, such as a mobile phone, a tablet computer, or a notebook computer, or may be a device with computing functions that is inconvenient to move, such as a desktop computer.

The image review device 102 may be configured in the terminal 101, or may be configured independently of the terminal 101. There may be one or more image review devices 102 in the application scene 100. Multiple image review devices 102 can review the image files input by different users in parallel, thereby increasing the data processing speed.

The second database 103 may be configured independently of the image review device 102 and the terminal 101; alternatively, when the image review device 102 is configured on the network side, the second database 103 may be configured in the image review device 102.

In the case that the terminal 101, the image review device 102, and the second database 103 are separate devices, the terminal 101 and the image review device 102 can communicate through a network, and the image review device 102 and the second database 103 can also communicate through a network. The network may be a wireless network or a wired network; the embodiments of the present application do not specifically limit the communication mode.

The embodiment of the application provides an image review method, which can be applied to an electronic device with an image review device. The electronic device can be a computer device, a notebook computer, any node server in a distributed computing architecture, a mobile terminal, or the like. The functions implemented by the image review method can be implemented by the processor in the electronic device invoking program code, and the program code can be stored in a computer storage medium. It can be seen that the electronic device includes at least a processor and a storage medium.

FIG. 2 is a schematic diagram of the implementation process of the image review method according to the embodiment of the application. As shown in FIG. 2, the method may include the following steps 201 to 203:

Step 201: Use the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector; wherein the target classification model is obtained through training of multiple sample image files and corresponding multiple image transformation files.

It should be noted that the target classification model may be a deep learning model, for example, a neural network model. There is no limit to the number of layers included in the model. The model can be a lightweight neural network model, such as MobileNetV2, or a non-lightweight neural network model. The electronic device can train the target classification model through steps 301 to 304 in the following embodiment.

Understandably, the so-called image transformation file refers to a file obtained by performing transformation processing such as inversion, rotation, liquefaction, scaling, cropping, mosaic, noise, color change, or occlusion on a sample image file, or a combination of these transformation methods.

The image file to be reviewed may be of various types. For example, the image file to be reviewed is an image or a piece of video (for example, a short video, a live video, a movie, a TV series, etc.). In the case that the image file to be reviewed is a video, the electronic device can randomly sample one or more video frame images from the video, and then perform feature extraction on these images through the target classification model to obtain the feature vector corresponding to the video.

Step 202: Determine the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in the review set.

Generally, to ensure the accuracy of the review, the review set used when the image file to be reviewed is an image can differ from the one used when it is a video. That is, when the image file to be reviewed is an image, each reference feature vector in the corresponding review set is extracted by the electronic device from a single image; when the image file to be reviewed is a video, each reference feature vector in the corresponding review set is extracted by the electronic device from multiple images. In general, the dimension of the feature vector of the image file to be reviewed is consistent with the dimension of the reference feature vectors, although this rule is not mandatory and the two dimensions may also differ.

The similarity can be characterized by various parameters, for example the Hamming distance, the Euclidean distance, or the cosine similarity.
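The three similarity parameters named above can be sketched as follows. This is a minimal illustration in plain Python, not the implementation of the application; the function names are hypothetical, and the vectors are assumed to be equal-length sequences of numbers.

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
```

Note that the two distances decrease as the vectors become more alike, while the cosine similarity increases, so the direction of the comparison with the first threshold depends on which parameter is chosen.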

Step 203: Determine whether the to-be-reviewed image file is a violation file according to the determined relationship between the similarity and the first threshold.

Understandably, the review set generated from compliant reference image files (referred to as the compliance set for brevity) and the review set generated from violating reference image files (hereinafter referred to as the violation set) correspond to different judgment criteria.

For example, the similarity is characterized by the Hamming distance. The Hamming distance between two strings of equal length refers to the number of different characters at the corresponding positions of the two strings. Therefore, the smaller the Hamming distance, the more similar the two feature vectors, and the more similar the corresponding two image files. For the violation set, in one example, the ratio of the number of similarities less than the first threshold to the total number of similarities is determined, and when the ratio is greater than the second threshold, the image file to be reviewed is determined to be a violation file. For the compliance set, in one example, when the ratio is greater than the second threshold, the image file to be reviewed is determined to be a compliance file.

There are many ways to determine whether an image file to be reviewed is a violation file. For example, the electronic device can use steps 604 to 606 in the following embodiment. As another example, the electronic device can use steps 802 to 809 in the following embodiment. In one such example, the similarity characterizes the number of different features between two feature vectors and the review set is a violation set: each time the electronic device determines the similarity with a reference feature vector, it counts the similarities determined so far that are less than the first threshold. If that number is greater than or equal to the third threshold, the calculation of similarities is stopped and the image file to be reviewed is determined to be a violation file, which is output as the review result.
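The early-stopping count described above can be sketched as follows, assuming binary feature vectors compared by Hamming distance and a violation set. The function name and the return shape are hypothetical. The paragraph above does not say what happens when the third threshold is never reached, so returning a non-violation verdict in that case is an assumption of this sketch.

```python
def review_with_early_stop(query, reference_vectors, first_threshold, third_threshold):
    """Return (is_violation, comparisons_made) for a violation set.

    Stops comparing as soon as the number of reference vectors whose
    Hamming distance to `query` is below `first_threshold` reaches
    `third_threshold`.
    """
    hits = 0
    compared = 0
    for reference in reference_vectors:
        compared += 1
        distance = sum(x != y for x, y in zip(query, reference))
        if distance < first_threshold:
            hits += 1
            if hits >= third_threshold:
                return True, compared  # enough close matches: stop early
    # Assumption: not reaching the third threshold means "not a violation".
    return False, compared


query = [1, 0, 1, 0]
references = [[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1], [1, 0, 1, 0]]
early = review_with_early_stop(query, references, first_threshold=2, third_threshold=2)
full = review_with_early_stop(query, references, first_threshold=2, third_threshold=5)
```

In this toy run, `early` stops after the second reference vector, whereas `full` has to scan all four, which illustrates why the counting rule can shorten the review when the violation set is large.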

For another example, the electronic device can also determine whether the image file to be reviewed is a violation file through steps 902 to 904 in the following embodiment.

In the embodiment of the present application, the electronic device uses the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector, where the target classification model is obtained by training with multiple sample image files and corresponding multiple image transformation files. In this way, even if the image file to be reviewed was obtained by applying multiple transformations such as rotation, liquefaction, and deformation to an original file, a feature vector consistent with that of the original file can still be extracted. Arbitrarily transformed image files are thus identified accurately, which enhances the robustness of the image review method.

In some embodiments, before the electronic device reviews the image file to be reviewed, it may train the target classification model, generate the review set, and determine the first threshold in advance, as described below.

For the training process of the target classification model, as shown in FIG. 3, the following steps 301 to 304 may be included. It should be noted that the electronic device may perform the following steps 301 to 304 before performing feature extraction on the image file to be reviewed. The electronic device may also execute the following steps 301 to 304 when it is configured to have an image review function.

Step 301: Obtain the type label of each sample image file.

Understandably, the sample image files include illegal image files and compliant image files. Illegal image files may be, for example, files related to terror, violence, pornography, or gambling. Compliant image files may be, for example, files related to natural scenery or buildings. The electronic device can sample some illegal sample files from a first database that collects a variety of illegal image files, and sample some compliant sample files from a second database that collects a variety of compliant image files.

In order to reduce the labeling work of each sample image file, usually, a certain number of image files are selected from the first database and the second database as the sample files. For example, select 100 illegal images and 100 compliant images from these two databases as sample image files.

Step 302: Perform transformation processing on each of the sample image files according to multiple transformation rules to obtain a set of image transformation files corresponding to the files.

The transformation rules can be various. For example, the basic transformation rules include flip, rotation, liquefaction, zoom, crop, mosaic, noise, color change, and occlusion. A combined transformation rule is a combination of at least two basic transformation rules. Taking the above 9 basic transformation rules as an example, there are 502 combined transformation rules, namely $\sum_{k=2}^{9} \binom{9}{k} = 2^{9} - \binom{9}{0} - \binom{9}{1} = 512 - 1 - 9 = 502$.
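The count of 502 can be checked by enumeration. The sketch below assumes that a combined rule is an unordered selection of at least two distinct basic rules, which is the reading under which nine basic rules give exactly 502 combinations; the helper name is hypothetical.

```python
from itertools import combinations
from math import comb

BASIC_RULES = [
    "flip", "rotation", "liquefaction", "zoom", "crop",
    "mosaic", "noise", "color change", "occlusion",
]


def combined_rules(rules):
    """Every unordered combination of at least two basic rules."""
    return [
        combo
        for size in range(2, len(rules) + 1)
        for combo in combinations(rules, size)
    ]


combined = combined_rules(BASIC_RULES)
closed_form = sum(comb(len(BASIC_RULES), k) for k in range(2, len(BASIC_RULES) + 1))
```

If the order in which the basic rules are applied were also treated as significant, the number of combined rules would be larger than 502.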

In an example, the electronic device may perform transformation processing on the sample image file according to 100 different transformation rules to obtain 100 image transformation files corresponding to the file.

Step 303: Assign the type label of each sample image file to each image transformation file in the corresponding image transformation file set.

Understandably, the type label of an image file should be the same before and after transformation. For example, if an illegal image file is liquefied, the liquefied file is still illegal; its nature is unchanged. Therefore, the type label of each image transformation file corresponding to a sample image file can be the same as the type label of that sample image file.

Step 304: Train a specific neural network model according to each of the sample image files, each of the image transformation files, and respective corresponding type labels to obtain the target classification model.

In the embodiment of this application, each sample image file is transformed according to multiple transformation rules to obtain the image transformation file set of that file; the type label of each sample image file is assigned to each image transformation file in the corresponding set; and a specific neural network model is trained on the sample image files, the image transformation files, and their respective type labels to obtain the target classification model.

In this way, on the one hand, the training samples include image transformation files obtained by applying multiple transformations to the sample image files, which enriches the diversity of the training samples and makes the trained target classification model more robust. When the image file to be reviewed is reviewed based on the target classification model, transformed files can be handled: even if the user flips, rotates, scales, crops, or applies a mosaic to the file before inputting it, the model can still accurately extract the feature vector of the transformed file and thus accurately identify whether the file is a violation file. Put simply, the feature vectors that the model extracts from an image file before and after transformation are basically the same, so the electronic device can accurately identify whether an input file is a violation file even when it has been transformed.

On the other hand, in the embodiment of the present application, the type label of each sample image file is assigned to each image transformation file in the corresponding image transformation file set. In this way, while the diversity of training samples is ensured, manual labeling costs are reduced, because the type label of each image transformation file does not need to be assigned manually. The electronic device can automatically obtain a large number of rich and diverse training samples by transforming the sample image files.
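Steps 301 to 303 can be sketched as follows. This is a toy illustration under stated assumptions: images are small nested lists rather than real files, only two basic transformations and one combination are implemented, and all names (`TRANSFORMS`, `build_training_set`) are hypothetical. Training the neural network in step 304 is outside the sketch.

```python
def flip_horizontal(image):
    """Mirror each row of a 2-D image given as a list of rows."""
    return [list(reversed(row)) for row in image]


def rotate_90(image):
    """Rotate a 2-D image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_then_rotate(image):
    """A combined rule: two basic rules applied in sequence."""
    return rotate_90(flip_horizontal(image))


TRANSFORMS = [flip_horizontal, rotate_90, flip_then_rotate]


def build_training_set(labelled_samples):
    """Return originals plus transformed copies, each with the original's label."""
    training = []
    for image, label in labelled_samples:
        training.append((image, label))
        for transform in TRANSFORMS:
            # Step 303: the transformed file inherits the sample's type label.
            training.append((transform(image), label))
    return training


samples = [([[1, 2], [3, 4]], "violation"), ([[5, 6], [7, 8]], "compliant")]
training_set = build_training_set(samples)
```

Only the two original samples are labelled by hand here; the six transformed files receive their labels automatically, which is the labeling saving described above.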

In some embodiments, the electronic device may load the generated review set into the cache in advance. There is no restriction on the timing of loading. For example, the electronic device can load the generated review set before using the target classification model to extract the features of the image file to be reviewed. As another example, the electronic device can load the generated review set after extracting the features of the image file to be reviewed and before determining the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in the review set. As yet another example, the electronic device can load the generated review set when it is configured to have the image review function.

In some embodiments, the method for generating an audit set, as shown in FIG. 4, may include the following steps 401 and 402:

Step 401: Using the target classification model, perform feature extraction on multiple reference image files to obtain feature vectors of corresponding files.

In some embodiments, the multiple reference image files may be violation files, for example, all or part of the files in the first database, and the review set obtained from them is the violation set. In other embodiments, the multiple reference image files may be compliant files, for example, all or part of the files in the second database. As mentioned above, the compliance set and the violation set differ in nature, so the corresponding judgment criteria in the image review stage also differ.

When the multiple reference image files are part of the files in the database, they may be files randomly extracted from the database by the electronic device, or some representative files in the database, such as some files with higher priority.

Step 402: Use the feature vector of each reference image file as a reference feature vector to generate the review set.

In the embodiment of the present application, the review set is loaded into the buffer area in advance. In this way, in the process of reviewing the image file to be reviewed, the electronic device does not need to perform feature extraction on the multiple reference image files to generate the review set; instead, it can directly use the pre-generated review set to perform the image review. In this way, the time consumption of the feature extraction process can be saved, so that the time for reviewing the image can be saved.
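Steps 401 and 402, together with the idea of preparing the review set once and reusing it, can be sketched as follows. The extractor here is a hypothetical stand-in that thresholds pixel values; in the application the feature vectors come from the target classification model. All names are illustrative.

```python
def toy_extract(pixels):
    """Stand-in feature extractor: 1 where a pixel value is >= 128, else 0."""
    return [1 if value >= 128 else 0 for value in pixels]


def build_review_set(reference_files, extract_features):
    """Step 401 and step 402: one reference feature vector per reference file."""
    return [tuple(extract_features(reference)) for reference in reference_files]


# Built once, ahead of any review request, and then kept in memory.
REFERENCE_FILES = [[200, 10, 130, 90], [0, 255, 255, 0]]
REVIEW_SET = build_review_set(REFERENCE_FILES, toy_extract)


def nearest_distance(pixels):
    """Review-time work: extract one vector and compare it with the cached set."""
    query = toy_extract(pixels)
    return min(sum(x != y for x, y in zip(query, ref)) for ref in REVIEW_SET)
```

At review time only the incoming file goes through feature extraction; the reference files are never re-processed, which is where the time saving comes from.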

In some embodiments, the electronic device may load the determined first threshold into the cache in advance. There is no restriction on the timing of loading. For example, the electronic device may load the determined first threshold before determining whether the image file to be reviewed is a violation file. As another example, the electronic device may load the determined first threshold before performing feature extraction on the image file to be reviewed. As yet another example, the electronic device may load the determined first threshold when it is configured to have the image review function.

In some embodiments, the method for determining the first threshold, as shown in FIG. 5, may include the following steps 501 to 503:

Step 501: Taking each of a plurality of different candidate thresholds in turn as the first threshold, determine according to the image review method whether each of a plurality of verification image files is a violation file, so as to obtain the review result set corresponding to each candidate threshold.

In some embodiments, the plurality of verification image files may include a violation image file and a compliance image file. The verification image file is different from the file used to train the neural network model. The multiple verification image files may also include files obtained after the electronic device performs various transformation processes on the original image files. The transformation rules used in the transformation processing may be the same as the transformation rules used in the model training stage.

Understandably, by implementing step 501, a review result set is obtained for each candidate threshold. As shown in Table 1, the review result set corresponding to candidate threshold 1 is the content of the second column of Table 1.

Table 1

File	Candidate threshold 1	Candidate threshold 2	...	Candidate threshold N
Verification image file 1	1	1	...	1
Verification image file 2	0	1	...	1
...	...	...	...	...
Verification image file M	1	1	...	0

Among them, "1" in the column to which the candidate threshold belongs indicates that the audit result of the corresponding file is a compliant file, and "0" indicates that the audit result of the corresponding file is a violation file.

Step 502: Determine the correct recall rate and the error recall rate under each candidate threshold according to the corresponding review result set and the type label of each verification image file.

In an example, the calculation formula for the correct recall rate is shown in the following formula (1):

The calculation formula of the error recall rate is shown in the following formula (2):

In equations (1) and (2), TN represents the number of violation documents reviewed as violations; FP represents the number of violation documents reviewed as compliance documents; FN represents the number of compliance documents reviewed as violation documents.
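One reading of formulas (1) and (2) that is consistent with the symbol definitions above is given below. The numerators follow directly from the definitions; the denominators are an assumption, chosen because TN, FP, and FN are the only quantities the definitions introduce. The names recall and wrong_recall are the ones used in the captions of FIG. 16 and FIG. 17.

```latex
\begin{align}
\text{recall} &= \frac{TN}{TN + FP} \tag{1} \\
\text{wrong\_recall} &= \frac{FN}{TN + FN} \tag{2}
\end{align}
```

Under this reading, the correct recall rate is the share of actual violation files that the review catches, and the error recall rate is the share of files reviewed as violations that are in fact compliant.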

Step 503: Determine, as the first threshold, the candidate threshold whose correct recall rate and false recall rate meet specific conditions.

Understandably, which candidate threshold is selected as the first threshold directly determines the recognition accuracy of the image review method. Therefore, on the premise of ensuring a higher correct recall rate, the false recall rate should be reduced as much as possible, so as to select the corresponding candidate threshold as the first threshold. For example, in the case of ensuring that the correct recall rate is greater than or equal to the minimum correct recall rate (such as 0.85), the candidate threshold corresponding to the minimum error recall rate is selected as the first threshold.
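The selection rule in this example can be sketched as follows. The sketch uses the encoding of Table 1 (1 for compliant, 0 for violation) and assumes the rates are computed as TN / (TN + FP) and FN / (TN + FN); those denominators are an assumption of this sketch, and all function names and data are hypothetical.

```python
def recall_rates(predictions, labels):
    """Return (correct recall, error recall); 1 = compliant, 0 = violation."""
    pairs = list(zip(predictions, labels))
    tn = sum(p == 0 and y == 0 for p, y in pairs)  # violation reviewed as violation
    fp = sum(p == 1 and y == 0 for p, y in pairs)  # violation reviewed as compliant
    fn = sum(p == 0 and y == 1 for p, y in pairs)  # compliant reviewed as violation
    recall = tn / (tn + fp) if tn + fp else 0.0
    wrong_recall = fn / (tn + fn) if tn + fn else 0.0
    return recall, wrong_recall


def select_first_threshold(results_by_threshold, labels, min_recall=0.85):
    """Among thresholds meeting the recall floor, pick the lowest error recall."""
    best_threshold, best_wrong = None, None
    for threshold, predictions in results_by_threshold.items():
        recall, wrong_recall = recall_rates(predictions, labels)
        if recall < min_recall:
            continue
        if best_wrong is None or wrong_recall < best_wrong:
            best_threshold, best_wrong = threshold, wrong_recall
    return best_threshold  # None when no candidate meets the recall floor


labels = [0, 0, 0, 0, 1, 1, 1, 1]
results = {
    40: [0, 0, 1, 1, 1, 1, 1, 1],  # misses half of the violations
    50: [0, 0, 0, 0, 1, 1, 1, 1],  # perfect on this toy data
    60: [0, 0, 0, 0, 0, 0, 1, 1],  # flags two compliant files
}
chosen = select_first_threshold(results, labels)
```

Each dictionary entry plays the role of one column of Table 1, and `labels` holds the type labels of the verification image files.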

In some embodiments, the electronic device may adopt a grid search method to gradually approach the optimal value, so as to select the first threshold from a plurality of candidate thresholds.
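One way to read "gradually approach the optimal value" is a coarse-to-fine grid search: scan a wide range with a large step, then rescan the neighborhood of the best coarse candidate with a step of 1. The sketch below illustrates that reading only. The cost function is a hypothetical stand-in for "error recall rate subject to the recall floor", and the range 35 to 70 is borrowed from the caption of FIG. 16 purely as an example.

```python
def grid_search(cost, low, high, step):
    """Return the candidate in [low, high], stepping by `step`, with the lowest cost."""
    return min(range(low, high + 1, step), key=cost)


def coarse_to_fine(cost, low, high, coarse_step):
    """Coarse scan over the whole range, then a step-1 scan around the best point."""
    coarse_best = grid_search(cost, low, high, coarse_step)
    fine_low = max(low, coarse_best - coarse_step)
    fine_high = min(high, coarse_best + coarse_step)
    return grid_search(cost, fine_low, fine_high, 1)


def toy_cost(threshold):
    """Hypothetical cost with its minimum at 53 (lower is better)."""
    return abs(threshold - 53)


coarse_pick = grid_search(toy_cost, 35, 70, 5)
final_pick = coarse_to_fine(toy_cost, 35, 70, 5)
```

With this toy cost the two-stage search evaluates 19 candidates instead of the 36 a step-1 scan of the whole range would need.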

The embodiment of the application further provides an image review method. FIG. 6 is a schematic diagram of the implementation process of the image review method according to the embodiment of the application. As shown in FIG. 6, the method may include the following steps 601 to 606:

Step 601: Obtain a feature vector extraction structure of the target classification model. The feature vector extraction structure includes the layers of the target classification model from the input layer to the non-linear activation layer, wherein the target classification model is obtained by training with a plurality of sample image files and the corresponding multiple image transformation files.

For example, the target classification model can be the lightweight neural network model MobileNetV2. The structure of the network, as shown in FIG. 7A, includes a "bottleneck structure", a conv2d layer, a sigmoid activation layer, an n×1 dimensional fully connected layer (Dense), and a normalized exponential layer (softmax). In some embodiments, as shown in FIG. 7B, the "bottleneck structure", the conv2d layer, and the sigmoid activation layer may be used as the feature vector extraction structure.

Step 602: Use the feature vector extraction structure to perform feature extraction on the image file to be reviewed to obtain a corresponding feature vector.

In other words, the output of the nonlinear activation layer of the feature vector extraction structure is the feature vector corresponding to the file.
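The idea of cutting the classification model at the non-linear activation layer can be sketched with a toy layer list. Nothing here is MobileNetV2: `toy_backbone` and `toy_dense` are hypothetical stand-ins for the bottleneck and conv2d layers and for the fully connected layer. The `binarize` helper, which turns sigmoid outputs into a 0/1 code at a cutoff of 0.5 so that a Hamming distance can be computed, is an assumption that this section does not state.

```python
import math


def toy_backbone(values):
    """Stand-in for the bottleneck structure and conv2d layer."""
    return [2.0 * v - 1.0 for v in values]


def sigmoid_layer(values):
    """Non-linear activation layer; its output is the feature vector."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def toy_dense(values):
    """Stand-in for the fully connected layer (two class logits)."""
    total = sum(values)
    return [total, -total]


def softmax_layer(values):
    """Normalized exponential layer producing class probabilities."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    return [e / sum(exps) for e in exps]


def run_layers(layers, values):
    """Feed `values` through the given layers in order."""
    for layer in layers:
        values = layer(values)
    return values


FULL_MODEL = [toy_backbone, sigmoid_layer, toy_dense, softmax_layer]
FEATURE_EXTRACTOR = FULL_MODEL[:2]  # input through the sigmoid activation only


def binarize(features, cutoff=0.5):
    """Assumed post-processing: threshold sigmoid outputs into a 0/1 code."""
    return [1 if f >= cutoff else 0 for f in features]


features = run_layers(FEATURE_EXTRACTOR, [1.0, 0.0, 0.5])
probabilities = run_layers(FULL_MODEL, [1.0, 0.0, 0.5])
```

The full model is what gets trained as a classifier; at review time only the truncated structure is run, so the dense and softmax layers add no cost.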

Step 603: Determine the similarity between the feature vector of the image file to be reviewed and each reference feature vector in the review set, where the similarity represents the number of different features between the two feature vectors.

Step 604: Determine the number of similarities less than the first threshold.

For example, the similarity is the Hamming distance.

Step 605: Determine the ratio of the number to the total number of similarities;

Step 606: Determine whether the to-be-reviewed image file is a violation file according to the relationship between the ratio and the second threshold.

Understandably, when the audit set is a violation set: if the ratio is greater than the second threshold, the image file to be reviewed is determined to be a violation file; if the ratio is less than or equal to the second threshold, the file is determined to be a compliant file.

When the audit set is a compliance set: if the ratio is greater than the second threshold, the image file to be reviewed is determined to be a compliant file; if the ratio is less than or equal to the second threshold, the file is determined to be a violation file.

In this embodiment of the application, the number of similarities less than the first threshold is counted, and the ratio of this number to the total number of similarities is determined; whether the image file to be reviewed is a violation file is then determined according to the relationship between the ratio and the second threshold. Compared with obtaining the audit result from the similarity with a single reference feature vector, the audit result obtained in this way is more reliable and the recognition accuracy is higher.
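Steps 604 to 606 can be sketched as follows. This is a minimal illustration rather than the claimed implementation; it assumes the Hamming distances of step 603 have already been computed, and the function name `review_by_ratio` and its parameters are hypothetical.

```python
def review_by_ratio(distances, first_threshold, second_threshold,
                    audit_set_is_violation=True):
    """Count close matches, form the ratio, and compare it (steps 604 to 606)."""
    if not distances:
        raise ValueError("the review set must contain at least one reference vector")
    close = sum(1 for d in distances if d < first_threshold)  # step 604
    ratio = close / len(distances)                             # step 605
    resembles_audit_set = ratio > second_threshold             # step 606
    # With a violation set, resembling the set means "violation";
    # with a compliance set, resembling the set means "compliant".
    if audit_set_is_violation:
        is_violation = resembles_audit_set
    else:
        is_violation = not resembles_audit_set
    return is_violation, ratio
```

For instance, with distances `[10, 20, 60, 70]`, a first threshold of 52, and a second threshold of 0.4, half of the distances fall below the first threshold, so a violation set yields a violation result.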

The embodiment of the application further provides an image review method. FIG. 8 is a schematic diagram of the implementation process of the image review method according to the embodiment of the application. As shown in FIG. 8, the method may include the following steps 801 to 809:

In step 801, the target classification model is used to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector; wherein the target classification model is obtained through training of multiple sample image files and corresponding multiple image transformation files.

Understandably, a target classification model usually consists of multiple sequentially connected layers. The first layer generally takes an image as input and extracts features from the image through specific operations. Each subsequent layer takes the features extracted by the previous layer as input and transforms them in a specific form to obtain more complex features. This hierarchical feature extraction process accumulates, which gives the neural network powerful feature extraction capabilities. After many layers of transformation, the neural network can transform the initial input image into higher-level abstract features.

This abstract process from simple to complex, from low-level to high-level can be experienced through examples in life. For example, in the process of English learning, through the combination of letters, you can get words; through the combination of words, you can get sentences; through the analysis of sentences, you can understand the semantics; through the analysis of semantics, you can get the expressed thought or purpose. And this kind of semantics, thoughts, etc. is a higher level of abstraction.

Therefore, in the embodiment of the present application, when feature extraction is performed on the image file to be reviewed through the target classification model, no matter how complicated the transformation applied to the original file to obtain the image file to be reviewed, the extracted feature vector is basically unchanged. In this way, the image review method is highly robust, and even if a violation file is transformed and uploaded to the network, it can still be accurately identified.

Step 802: Determine the similarity between the feature vector of the image file to be reviewed and the i-th reference feature vector in the review set, where i is greater than 0 and less than or equal to the total number of reference feature vectors in the review set;

Step 803: Determine whether the image file to be reviewed is a violation file according to the relationship between the similarity corresponding to the i-th reference feature vector and the first threshold; if so, go to step 804; otherwise, go to step 807;

The so-called similarity corresponding to the i-th reference feature vector refers to the similarity between the feature vector of the image file to be reviewed and the i-th reference feature vector.

Step 804: Count the first number of determinations, that is, the number of times the image file to be reviewed has been determined to be a violation file;

Step 805: Determine whether the first number of determinations is greater than the third threshold; if yes, go to step 806; otherwise, increment i by 1 and go back to step 802;

Step 806: Output that the image file to be reviewed is a violation file.

Understandably, if the first number of determinations is greater than the third threshold, this is sufficient to reliably determine that the image file to be reviewed is a violation file, and there is no need to continue calculating the similarity between the feature vector of the image file to be reviewed and the remaining reference feature vectors, thereby reducing the amount of calculation and shortening the review time.

For example, suppose that the audit set includes 10,000 reference feature vectors, the third threshold is 900, and the similarity is represented by the Hamming distance. Suppose further that, when the similarity corresponding to the 1000th reference feature vector is calculated, the first number of determinations reaches 901. That is, among the similarities corresponding to the first to 1000th reference feature vectors, 901 similarities are less than the first threshold. At this point, the image review process can be ended, and the review result that the image file to be reviewed is a violation file is output. There is no need to continue to calculate the similarity with the remaining 9,000 reference feature vectors.

Step 807: Count the second number of determinations, that is, the number of times the image file to be reviewed has been determined to be a compliant file;

Step 808: Determine whether the second number of determinations is greater than the fourth threshold; if yes, go to step 809; otherwise, increment i by 1 and go back to step 802;

In some embodiments, the fourth threshold is greater than the third threshold. In this way, the false detection rate of illegal files can be reduced.

Step 809: Output that the image file to be reviewed is a compliance file.
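The loop formed by steps 802 to 809 can be sketched as follows, assuming the Hamming distances arrive lazily so that an early exit really skips the remaining comparisons. The function name and the "undecided" result are assumptions: the source does not state what happens if the review set is exhausted before either counter exceeds its threshold.

```python
def review_with_early_exit(distance_stream, first_threshold,
                           third_threshold, fourth_threshold):
    """Walk the review set one reference vector at a time (steps 802 to 809)."""
    violation_hits = 0   # the "first number of determinations"
    compliant_hits = 0   # the "second number of determinations"
    compared = 0
    for distance in distance_stream:                  # step 802
        compared += 1
        if distance < first_threshold:                # step 803
            violation_hits += 1                       # step 804
            if violation_hits > third_threshold:      # step 805
                return "violation", compared          # step 806
        else:
            compliant_hits += 1                       # step 807
            if compliant_hits > fourth_threshold:     # step 808
                return "compliant", compared          # step 809
    # Assumption: neither threshold was exceeded before the set ran out.
    return "undecided", compared
```

With 10,000 reference vectors and a third threshold of 900, a stream in which the 901st close match occurs at the 1000th vector returns after exactly 1000 comparisons, as in the example above.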

The embodiment of the application further provides an image review method. FIG. 9 is a schematic diagram of the implementation process of the image review method of the embodiment of the application. As shown in FIG. 9, the method may include the following steps 901 to 904:

Step 901, using the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector; wherein the target classification model is obtained through training of multiple sample image files and corresponding multiple image transformation files;

Step 902: Determine the similarity between the feature vector of the image file to be reviewed and the i-th reference feature vector in the review set, where i is greater than 0 and less than or equal to the total number of reference feature vectors in the review set, and the reference image file corresponding to each reference feature vector is a violation file; the similarity is used to characterize the number of different features between the two feature vectors;

Step 903: Determine whether the similarity corresponding to the i-th reference feature vector is less than the first threshold; if yes, go to step 904; otherwise, increment i by 1 and go back to step 902;

Step 904: Determine that the image file to be reviewed is a violation file, and output the review result.

Compared with the above steps 802 to 809, here, if the similarity corresponding to the i-th reference feature vector is less than the first threshold, the review process is ended and the review result that the image file to be reviewed is a violation file is output; otherwise, the next reference feature vector is traversed, until it is determined that the image file to be reviewed is a violation file. Of course, in some embodiments, if every reference feature vector in the audit set has been traversed and each corresponding similarity is greater than or equal to the first threshold, the review result that the image file to be reviewed is a compliant file is output.

In the related art, the similarity between the input picture (that is, the picture to be reviewed) and the pictures in the violation gallery (that is, an example of the first database) is calculated to determine whether the input picture is in violation. Commonly used similarity algorithms include the perceptual hash (pHash) algorithm and the Scale-Invariant Feature Transform (SIFT) algorithm.

The pHash algorithm is a manually designed rule-based algorithm. Its basic principle is to obtain the hash value of the input picture and then calculate the hash "distance" between the input picture and a picture in the violation gallery to obtain the similarity of the two pictures; when the similarity is greater than the set threshold, the match is considered successful. The implementation process of the algorithm is as follows:

Reduce the size of the input picture; simplify the color of the reduced picture; calculate the average value of the simplified picture; compare the grayscale of each pixel with the average; calculate the hash value based on the comparison results; calculate, based on the hash value, the Hamming distance to a picture in the violation gallery; when the Hamming distance is less than the set threshold, it is determined that the matching is successful and the input picture is a violation picture.
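The steps listed above compare each pixel with the mean, which resembles what is commonly called an average hash. A minimal sketch follows, assuming the picture has already been reduced to a small grayscale grid (a list of rows of integers); the function names are hypothetical, and treating a pixel equal to the mean as 1 is an assumption.

```python
def average_hash(gray_grid):
    """Return one bit per pixel: 1 if the pixel is at least the mean, else 0."""
    pixels = [p for row in gray_grid for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p >= mean else 0 for p in pixels]


def hamming(hash_a, hash_b):
    """Number of positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have equal length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def matches(gray_a, gray_b, threshold):
    """Matching succeeds when the Hamming distance is below the threshold."""
    return hamming(average_hash(gray_a), average_hash(gray_b)) < threshold
```

In this sketch a uniform brightness shift leaves the hash unchanged, while a horizontal flip of the same grid changes every bit, which illustrates why a hand-designed hash of this kind tolerates only certain transformations.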

The SIFT algorithm is used to detect and describe the local features in the picture. It finds extreme points in the spatial scale and extracts its position, scale, and rotation invariants. The description and detection of local features can help to identify objects. SIFT features are based on some local appearance points of interest on the object and have nothing to do with the size and rotation of the picture.

However, the algorithm factors (ie, image feature extraction operators) of the pHash algorithm and the SIFT algorithm are both artificially designed, so they can only meet specific matching scenarios. The pHash algorithm can only maintain the invariance of scale scaling and color change; the SIFT algorithm can only maintain the invariance of rotation, scale scaling, brightness change, affine, and noise.

Based on this, an exemplary application of the embodiment of the present application in an actual application scenario will be described below.

For the end-to-end deep learning matching algorithm, the neural network model is mainly used to directly calculate whether the two pictures match. The implementation process is shown in Figure 10, which is divided into a training phase and a prediction phase. The basic process of the training phase includes the following steps 1001 to 1004:

Step 1001, design a model structure (including convolutional layer, fully connected layer, pooling layer, etc.) to obtain an initial similarity model, that is, a neural network model;

Step 1002, prepare a large amount of image data as training samples;

Step 1003: Perform data enhancement processing on each picture in the training sample, for example, rotate, mirror, and render the pictures separately; combine two pictures obtained by applying different data transformations to the same picture into a positive sample (1), and take other transformed pictures as negative samples (0).

In step 1004, the initial similarity model is updated through the gradient descent series optimization algorithm and the training samples after data enhancement, to obtain the trained similarity model, that is, the target classification model.

The basic process of the prediction phase, as shown in Figure 10, includes steps 1005 to 1007:

Step 1005, the input picture and each picture in the illegal library are calculated for similarity;

Step 1006: Determine whether the ratio of the number of similarities less than the first threshold to the total number of similarities is greater than the second threshold; if so, go to step 1007;

In step 1007, it is considered that the matching is successful, and it is determined that the input picture is a violation picture.

In the end-to-end deep learning matching algorithm, the deep learning model contains multiple convolution kernels obtained through gradient descent. The convolution kernels have a strong ability to express image features and basically cover all image transformation scenarios. However, in the prediction stage, each input picture must be matched in a loop against all pictures in the gallery; together with the computational cost of the neural network model itself, the resulting resource consumption is unacceptable.

In the embodiment of the present application, combining the characteristics of hash and deep learning, a deep neural network is used to extract image features to obtain the image hash, which is an example of feature vector; compare the similarity of the two image hashes to determine whether the matching is successful.

The following describes in detail the implementation process of the image review method provided by the embodiment of the present application. As shown in FIG. 11, the process may include the following Step1) to Step4):

Step1) Data preparation. Prepare 200 original pictures, as shown in Figure 12, perform picture transformation operations such as flipping, rotating, scaling, cropping, liquefying, mosaicing, noise, discoloration, and occlusion on each original picture, or a combination of them. Perform 100 different transformation operations on each picture, so that a total of 20,000 samples are obtained.
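The data preparation can be sketched as follows: every original picture is passed through a list of transformations, and each result keeps the index of its original as its class label, which is how the training stage uses the originals as labels. This is a toy illustration on nested lists; the helper names are hypothetical, and only flips and a 90-degree rotation are shown out of the transformations named above.

```python
def flip_horizontal(img):
    return [row[::-1] for row in img]


def flip_vertical(img):
    return [row[:] for row in img[::-1]]


def rotate_90(img):
    """Rotate the grid by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def compose(*steps):
    """Combine several transformations into one, applied left to right."""
    def combined(img):
        for step in steps:
            img = step(img)
        return img
    return combined


def build_samples(originals, transforms):
    """Return (transformed picture, label) pairs; the label is the original's index."""
    samples = []
    for label, img in enumerate(originals):
        for transform in transforms:
            samples.append((transform(img), label))
    return samples
```

With 200 originals and 100 transformations per original, `build_samples` would return 200 × 100 = 20,000 labelled samples.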

Step2) Design the model. The lightweight deep neural network MobileNetV2 is selected as the feature extractor. Before training the model, modify the MobileNetV2 network structure. The original structure of MobileNetV2 is shown in Table 2 below. The header "Input" is the input size of the structure layer, and "Operator" is the structure type of the layer. "c" is the dimension of the output feature layer of this layer, "n" is the number of repetitions of this layer, and "s" is the stride of the depthwise convolution kernel.

Table 2

Num	Input	Operator	c	n	s
1	224²×3	Conv2d	32	1	2
2	112²×32	bottleneck	16	1	1
3	112²×16	bottleneck	24	2	2
4	56²×24	bottleneck	32	3	2
5	28²×32	bottleneck	64	4	2
6	14²×64	bottleneck	96	3	1
7	14²×96	bottleneck	160	3	2
8	7²×160	bottleneck	320	1	1
9	7²×320	Conv2d 1×1	1280	1	1
10	7²×1280	Avgpool 7×7	-	1	-
11	1×1×1280	Conv2d 1×1	k	-	-
12	k×1	Active-Softmax	k	-	-

The input size of the 11th layer of MobileNetV2 is fixed at 1×1×1280, and k 1×1 size convolution kernels are used for convolution calculation, so as to output a 1-dimensional vector of length k. Finally, connect the softmax activation layer to calculate the probability of k categories.
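Because the input to the 11th layer is a single 1×1 position, each 1×1 convolution kernel reduces to a dot product with the pooled feature vector. A minimal sketch of layers 11 and 12 under that reading follows; bias terms are omitted for brevity, and the function names are hypothetical.

```python
import math


def conv1x1_on_pooled(features, kernels):
    """Apply k 1x1 kernels to a 1x1xC input: one dot product per kernel."""
    return [sum(w * x for w, x in zip(kernel, features)) for kernel in kernels]


def softmax(logits):
    """Turn k scores into k probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

In the real network the feature vector has 1280 entries and there are k kernels, so the result is a length-k probability vector.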

For ease of description, the first to tenth layers shown in Table 2 are referred to as "bottleneck structure", and the simplified MobileNetV2 structure is shown in Figure 13.

The MobileNetV2 structure is modified as follows: between the conv2d layer and the softmax layer, a sigmoid activation layer and an n×1 dimensional fully connected layer (Dense) are added. The modified MobileNetV2 structure is shown in Figure 7A.

Step3) Model training stage.

Take the 20000 pictures obtained in Step1 as training samples and 200 original pictures as the labels of the training samples to train a picture classification model, that is, a specific neural network model. Corresponding to FIG. 7A, k=200, and n is the dimension of the hash that needs to be encoded (for example, 300). Train the modified MobileNetV2 classification model shown in Figure 7A.

The model loss function is a multi-category cross-entropy loss (categorical_crossentropy), the optimization algorithm is Adam, the learning rate is fixed at 0.001, and the accuracy of the trained model is >99.5%.

Step4) Matching stage.

Load the model parameters obtained in Step3. In order to obtain the hash value of the picture, delete the last two layers of the model, namely the Dense layer and the softmax layer. The modified model is shown in Figure 7B. For ease of description, this model is called "Mobilehashnet", which is an example of feature vector extraction structure. The image review method based on this model is called Mobilehashnet algorithm.

At this time, the output of the model is a 1-dimensional vector with a length of n (for example, 300). As shown in Figure 14, since the activation function is sigmoid, the value range of the output is (0, 1). The output is then binarized according to the following rule: an output less than 0.5 becomes 0, and an output greater than 0.5 becomes 1. Finally, a hash vector with a length of 300 and values of 0 or 1 is obtained, that is, a feature vector.
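The binarization rule can be sketched in a few lines. The text does not say how an output of exactly 0.5 is handled; mapping it to 1 here is an assumption, and the function name `binarize` is hypothetical.

```python
def binarize(activations, cutoff=0.5):
    """Map sigmoid outputs in (0, 1) to hash bits.

    Assumption: a value exactly equal to the cutoff is mapped to 1.
    """
    return [1 if a >= cutoff else 0 for a in activations]
```

Applied to the n sigmoid outputs of the model, this yields the length-n hash vector of 0s and 1s.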

It should be noted that the reason why the extracted feature vector is called a hash vector is because even if the input image is a transformed image of the original image, the feature vector extracted by Mobilehashnet is still consistent with the feature vector of the original image.

As shown in Figure 15, after obtaining the hash vectors of picture 1 and picture 2, the Hamming distance of the two pictures can be calculated according to the hash vector of the picture. The smaller the distance, the more similar the two pictures. The realization of matching can specify a first threshold. When the Hamming distance is lower than the first threshold, the two pictures are considered to be the same picture and the matching is successful; otherwise, the matching fails.
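A minimal sketch of this matching rule, assuming the two hash vectors are equal-length lists of 0s and 1s; the function names are hypothetical.

```python
def hamming_distance(hash_a, hash_b):
    """Count the positions at which two hash vectors differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hash vectors must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_same_picture(hash_a, hash_b, first_threshold):
    """Matching succeeds when the distance is below the first threshold."""
    return hamming_distance(hash_a, hash_b) < first_threshold
```

For 300-bit hashes and a first threshold of 52, two hashes that differ in 51 positions match, while two that differ in 52 positions do not.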

It should be noted that the selection of the first threshold here needs to be obtained through verification in advance. The preparation process of the validation set is the same as the above training set. Prepare several pictures in the non-training set, perform data enhancement, and calculate the correct recall rate (recall) and wrong recall rate (wrong_recall) of the matching model under different candidate thresholds.

A good matching model should minimize the false recall rate while ensuring the correct recall rate. In some embodiments, a grid search method can be used to gradually approach the optimal value. The grid search results are shown in Figure 16 and Figure 17: Figure 16 shows the recall and wrong_recall corresponding to candidate thresholds from 35 to 70, and Figure 17 shows the recall and wrong_recall corresponding to candidate thresholds from 50 to 55.

In an example, at candidate threshold=52, recall=0.85 and wrong_recall=0.15, which is a good value. This is because, on the premise that the value of recall is greater than or equal to 0.85, the smaller the value of wrong_recall, the better, so the candidate threshold corresponding to the smallest wrong_recall can be determined as the first threshold.
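The selection rule can be sketched as follows: keep only the candidates whose recall reaches the required minimum, then take the one with the smallest wrong_recall. The function name is hypothetical, and how recall and wrong_recall are measured on the validation set is left outside this sketch.

```python
def select_first_threshold(grid, min_recall=0.85):
    """Pick a threshold from (candidate_threshold, recall, wrong_recall) rows."""
    eligible = [row for row in grid if row[1] >= min_recall]
    if not eligible:
        return None  # no candidate reaches the required recall
    return min(eligible, key=lambda row: row[2])[0]


# Only the row for threshold 52 comes from the text; the other two rows are
# made-up values used purely to exercise the function.
example_grid = [(50, 0.80, 0.10), (52, 0.85, 0.15), (54, 0.88, 0.21)]
chosen = select_first_threshold(example_grid)
```

On this illustrative grid the candidate 50 is excluded because its recall is below 0.85, and 52 is chosen over 54 because its wrong_recall is smaller.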

The hash dimension directly determines the number of convolution kernels of the 2D convolutional layer (conv2d 1×1) in the modified MobileNetV2 structure and the output dimension n of the activation layer. Since this layer is at the end of the network structure, its size directly affects the learning ability of the model. If the hash dimension is too small, the model will underfit and the upper limit on the number of library pictures will be reduced; if it is too large, both the time needed to generate the hash and the time needed to calculate the Hamming distance increase. A reasonable hash dimension therefore needs to be chosen.

In an example, the hash dimension n is taken to be 1.5 times the number of original pictures (classification types), that is, n=1.5×200=300.

Understandably, compared with relying on purely manually designed operators, Mobilehashnet uses a deep neural network to extract image features, which theoretically has performance advantages. In order to illustrate its performance more intuitively, the matching performance of the Mobilehashnet algorithm, the pHash algorithm, and the SIFT algorithm is compared under different image transformation methods. The experimental results are shown in Table 3.

Table 3

From the comparison results shown in Table 3, it can be seen that the pHash algorithm is basically unable to match under image transformations such as flipping, rotating, and zooming, and the SIFT algorithm scores low under all types of image transformations. In the embodiment of this application, the Mobilehashnet algorithm achieves 100% recall under the flipping, distorting, cutting, mosaic, and noise transformations, and under the other image transformations its recall value is higher and its wrong_recall value is lower.

Compared with the related technologies, in which a picture classification model is trained on a large number of manually labeled samples, the Mobilehashnet algorithm provided in the embodiment of this application can be trained without manually labeling a large number of samples, because a large number of training samples are obtained automatically through image data enhancement.

The Mobilehashnet algorithm provided by the embodiments of this application extracts image features by using a deep neural network, generates image hashes based on these features, and performs image matching. Compared with the related image matching/similarity algorithm, it effectively improves the correct recall rate, reduces the false recall rate, and does not require a large amount of manual data annotation.

The picture review system reviews the pictures uploaded by users to prevent the spread of a large number of illegal pictures. Due to the complexity of image content, as shown in Figure 18, the process of the image review system includes an illegal library matching model, an image classification model, a face recognition model, a text recognition model, and a text classification model. The pictures to be reviewed are reviewed by each model in turn. When the results of all models are "normal", the review result can be "normal", that is, a compliant picture; otherwise, it is a violating picture.
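The sequential review can be sketched as a chain of callables, each returning "normal" or some other result. The names are hypothetical, and stopping at the first non-normal result is an assumption; the text only requires that every model report "normal" for a compliant picture.

```python
def review_picture(picture, models):
    """Run (name, model) pairs in order; report the first model that objects."""
    for name, model in models:
        if model(picture) != "normal":
            return "violation", name
    return "normal", None


# Placeholder models standing in for the matching, classification,
# face recognition, text recognition, and text classification stages.
def always_normal(picture):
    return "normal"


def flags_banned_tag(picture):
    return "violation" if "banned" in picture else "normal"
```

A picture passes only if every stage in the list returns "normal"; otherwise the name of the stage that rejected it is reported.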

Among them, the illegal library matching model in the image review system can be implemented by the Mobilehashnet algorithm provided in the embodiment of this application, which ensures a high correct recall rate and a low false recall rate for matching. The implementation process of this algorithm is shown in Figure 19: extract the hash vector of the picture to be reviewed; determine the Hamming distance between this hash vector and each hash vector in the illegal hash library corresponding to the illegal library, that is, calculate the Hamming distances in batches; judge whether each Hamming distance is greater than the first threshold, so as to obtain the recall result, that is, the correct recall rate and the false recall rate.

In some embodiments, the offending hash library can be obtained when the system is initialized, and only one hash calculation is required for matching, that is, only the feature extraction of the image to be reviewed is required.
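One way to realize this is to pack every reference hash into an integer once at initialization, so that each later comparison is a single XOR followed by a bit count. This is a sketch under that assumption; the class `HashLibrary` and the helper `pack_bits` are hypothetical names, not part of the described system.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into one integer, first bit most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


class HashLibrary:
    """Reference hashes packed once when the library is created."""

    def __init__(self, reference_hashes):
        self.packed = [pack_bits(h) for h in reference_hashes]

    def distances(self, query_hash):
        """Hamming distance from the query to every reference hash."""
        query = pack_bits(query_hash)
        # XOR leaves a 1 wherever the two hashes differ; count those bits.
        return [bin(query ^ ref).count("1") for ref in self.packed]
```

At review time only the query picture needs feature extraction; the reference side is already stored in packed form.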

Based on the foregoing embodiments, the image file review device provided by the embodiments of the present application, including the modules it comprises and the units comprised in each module, can be implemented by the processor in the terminal; of course, it can also be implemented by a specific logic circuit. In the implementation process, the processor can be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field programmable gate array (FPGA), etc.

FIG. 20A is a schematic structural diagram of an image file review device according to an embodiment of the application. As shown in FIG. 20A, the device 200 includes a feature extraction module 201, a first determination module 202, and a review module 203, wherein:

The feature extraction module 201 is configured to use the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector; wherein the target classification model is obtained through training of multiple sample image files and corresponding multiple image transformation files;

The first determining module 202 is configured to determine the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in the review set;

The review module 203 is configured to determine whether the image file to be reviewed is a violation file according to the determined relationship between the similarity and the first threshold.

In some embodiments, the feature extraction module 201 is configured to obtain a feature vector extraction structure of the target classification model, where the feature vector extraction structure includes the input layer to the non-linear activation layer of the target classification model and the type of the target classification model is a neural network model; and to use the feature vector extraction structure to perform feature extraction on the image file to be reviewed to obtain a corresponding feature vector.

In some embodiments, as shown in FIG. 20B, the image review device 200 further includes: a tag acquisition module 204, configured to acquire the type label of each sample image file; a transformation processing module 205, configured to perform transformation processing on each of the sample image files according to a variety of transformation rules to obtain an image transformation file set of the corresponding file; a tag labeling module 206, configured to assign the type label of each sample image file to each image transformation file in the corresponding image transformation file set; and a model training module 207, configured to train a specific neural network model according to each of the sample image files, each of the image transformation files, and their corresponding type labels to obtain the target classification model.

In some embodiments, the review module 203 is configured to: determine the number of similarities less than the first threshold, where the similarity is used to characterize the number of different features between two feature vectors; determine the ratio of the number to the total number of similarities; and determine, according to the relationship between the ratio and the second threshold, whether the image file to be reviewed is a violation file.

In some embodiments, the first determining module 202 is configured to determine the similarity between the feature vector of the image file to be reviewed and the i-th reference feature vector in the review set, where i is greater than 0 and less than or equal to the total number of reference feature vectors in the review set, the similarity is used to characterize the number of different features between two feature vectors, and the reference image file corresponding to the reference feature vector is a violation file; accordingly, the review module 203 is configured to determine that the image file to be reviewed is a violation file when the similarity corresponding to the i-th reference feature vector is less than the first threshold.

In some embodiments, the first determining module 202 is further configured to: when the similarity corresponding to the i-th reference feature vector is greater than or equal to the first threshold, determine the similarity between the feature vector of the image file to be reviewed and the (i+1)-th reference feature vector in the review set, so as to determine whether the image file to be reviewed is a violation file.

In some embodiments, as shown in FIG. 20B, the image review device 200 further includes: a loading module 208 configured to load the generated review set; correspondingly, the feature extraction module 201 is further configured to: use the target classification model to perform feature extraction on multiple reference image files to obtain the feature vector of the corresponding file; and use the feature vector of each reference image file as a reference feature vector to generate the review set.

In some embodiments, the loading module 208 is configured to load the determined first threshold;

Correspondingly, the device further includes a second determination module, configured to: with the first threshold assumed to take each of a plurality of different candidate thresholds, use the feature extraction module, the first determination module, and the review module of the device to determine whether a plurality of verification image files are violation files, so as to obtain the review result set corresponding to each candidate threshold; determine, according to each review result set and the type label of each verification image file, the correct recall rate and the false recall rate under the corresponding candidate threshold; and determine, as the first threshold, the candidate threshold whose correct recall rate and false recall rate meet specific conditions.

The description of the above device embodiment is similar to the description of the above method embodiment, and has similar beneficial effects as the method embodiment. For technical details not disclosed in the device embodiments of the present application, please refer to the description of the method embodiments of the present application for understanding.

It should be noted that, in the embodiments of the present application, if the above-mentioned image review method is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions to enable the electronic device to execute all or part of the method described in each embodiment of the present application. The aforementioned storage media include: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), magnetic disk or optical disk and other media that can store program codes. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.

Correspondingly, an embodiment of the present application provides an electronic device. FIG. 21 is a schematic diagram of the hardware entity of the electronic device according to an embodiment of the application. As shown in FIG. 21, the electronic device 210 includes a memory 211 and a processor 212. The memory 211 stores a computer program that can be run on the processor 212, and the processor 212 implements the steps in the image review method provided in the foregoing embodiment when executing the program.

It should be noted that the memory 211 is configured to store instructions and applications executable by the processor 212, and can also cache data to be processed or already processed by the processor 212 and each module in the electronic device 210 (for example, image data, audio data, voice communication data, and video communication data), which can be implemented by flash memory (FLASH) or random access memory (Random Access Memory, RAM).

Correspondingly, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps in the image review method provided in the above-mentioned embodiments are implemented.

It should be pointed out here that the descriptions of the foregoing storage medium, chip, and terminal device embodiments are similar to the description of the foregoing method embodiments, and these embodiments have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the storage medium, chip, and terminal device embodiments of this application, please refer to the description of the method embodiments of this application.

It should be understood that "one embodiment", "an embodiment", or "some embodiments" mentioned throughout the specification means that a specific feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Therefore, appearances of "in one embodiment", "in an embodiment", or "in some embodiments" in various places throughout the specification do not necessarily refer to the same embodiment. In addition, these specific features, structures, or characteristics can be combined in one or more embodiments in any suitable manner. It should be understood that, in the various embodiments of the present application, the magnitude of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application. The serial numbers of the foregoing embodiments of the present application are for description only, and do not represent the superiority or inferiority of the embodiments.

It should be noted that, in this document, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article, or device. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.

In the several embodiments provided in this application, it should be understood that the disclosed device and method can be implemented in other ways. The embodiments of the touch screen system described above are merely illustrative. For example, the division of the modules is only a division by logical function, and there may be other division methods in actual implementation: multiple modules or components can be combined or integrated into another system, or some features can be ignored or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed can be an indirect coupling or communication connection through some interfaces, devices, or modules, and can be electrical, mechanical, or in other forms.

The modules described above as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules; they may be located in one place or distributed on multiple network units; some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional modules in the embodiments of the present application may all be integrated into one processing unit, or each module may be individually used as a unit, or two or more modules may be integrated into one unit; the above-mentioned integrated module can be implemented in the form of hardware, or in the form of hardware plus software functional units.

A person of ordinary skill in the art can understand that all or part of the steps in the above method embodiments can be implemented by a program instructing relevant hardware. The foregoing program can be stored in a computer-readable storage medium, and when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (Read Only Memory, ROM), a magnetic disk, or an optical disk.

Alternatively, if the aforementioned integrated unit of this application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the related technologies, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that enable the electronic device to execute all or part of the method described in each embodiment of the present application. The aforementioned storage media include removable storage devices, ROMs, magnetic disks, optical disks, and other media that can store program code.

The methods disclosed in the several method embodiments provided in this application can be combined arbitrarily without conflict to obtain new method embodiments.

The features disclosed in the several product embodiments provided in this application can be combined arbitrarily without conflict to obtain new product embodiments.

The features disclosed in the several method or device embodiments provided in this application can be combined arbitrarily without conflict to obtain a new method embodiment or device embodiment.

The above are only implementation manners of this application, but the protection scope of this application is not limited to them. Any change or substitution that a person skilled in the art can easily conceive of within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

1. An image review method, comprising:

Using a target classification model to perform feature extraction on an image file to be reviewed to obtain a corresponding feature vector; wherein the target classification model is obtained by training with multiple sample image files and multiple corresponding image transformation files;

Determining the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in a review set;

Determining, according to the relationship between the determined similarity and a first threshold, whether the image file to be reviewed is a violation file.

2. The method according to claim 1, wherein the type of the target classification model is a neural network model, and the using the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector comprises:

Acquiring a feature vector extraction structure of the target classification model, where the feature vector extraction structure includes an input layer to a non-linear activation layer of the target classification model;

Using the feature vector extraction structure, feature extraction is performed on the image file to be reviewed to obtain the corresponding feature vector.
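
As an illustration of truncating a classification network at a non-linear activation layer, the following minimal Python sketch treats a model as an ordered list of named layers and keeps only the prefix that ends at the chosen activation. The toy layers, the weights, and the names `MODEL`, `feature_extraction_structure`, and `extract_features` are assumptions made for this example and are not taken from the application; a real implementation would use a trained neural network.

```python
import math

def dense(weights, bias):
    """Return a fully connected layer computing W.x + b (toy, pure Python)."""
    def layer(x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return layer

def relu(x):
    """Non-linear activation layer."""
    return [max(0.0, v) for v in x]

def softmax(x):
    """Classification output layer; not part of the feature extractor."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy classification model: ordered (layer_name, function) pairs.
MODEL = [
    ("input", lambda x: list(x)),
    ("fc1", dense([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 0.0])),
    ("relu1", relu),
    ("fc_out", dense([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [0.0, 0.0])),
    ("softmax", softmax),
]

def feature_extraction_structure(model, last_layer):
    """Keep the layers from the input layer up to and including `last_layer`."""
    names = [name for name, _ in model]
    return model[: names.index(last_layer) + 1]

def extract_features(structure, x):
    """Run the input through the truncated structure to get a feature vector."""
    for _, fn in structure:
        x = fn(x)
    return x

extractor = feature_extraction_structure(MODEL, "relu1")
features = extract_features(extractor, [2.0, 1.0])
print(features)
```
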

3. The method according to claim 1 or 2, wherein the training process of the target classification model includes:

Acquiring the type label of each of the sample image files;

Performing transformation processing on each of the sample image files according to multiple transformation rules to obtain a corresponding set of image transformation files;

Assigning the type label of each sample image file to each image transformation file in the corresponding image transformation file set;

According to each of the sample image files, each of the image transformation files, and respective corresponding type labels, a specific neural network model is trained to obtain the target classification model.
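
The data preparation part of this training process can be sketched as follows. This is a minimal illustration, assuming images are small 2-D pixel grids and that the transformation rules are a 90-degree rotation and a horizontal flip; the rule set and the name `build_training_set` are hypothetical, and the training of the neural network itself is not shown.

```python
def rotate90(img):
    """Rotate a 2-D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_horizontal(img):
    """Mirror a 2-D pixel grid left to right."""
    return [row[::-1] for row in img]

# Hypothetical rule set; the application also mentions noise, deformation, etc.
TRANSFORM_RULES = [rotate90, flip_horizontal]

def build_training_set(samples, rules=TRANSFORM_RULES):
    """samples: list of (image, type_label) pairs.

    Returns each original plus its transformed copies, where every
    transformed copy inherits the type label of its source sample.
    """
    training = []
    for image, label in samples:
        training.append((image, label))
        for rule in rules:
            training.append((rule(image), label))
    return training

samples = [([[1, 2], [3, 4]], "violation"), ([[0, 0], [0, 1]], "normal")]
training_set = build_training_set(samples)
for image, label in training_set:
    print(label, image)
```
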

4. The method according to claim 1, wherein the determining whether the image file to be reviewed is a violation file according to the relationship between the determined similarity and the first threshold comprises:

Determining the number of similarities less than the first threshold, where the similarity is used to characterize the number of different features between two feature vectors;

Determine the ratio of the number to the total number of similarities;

Determining, according to the relationship between the ratio and a second threshold, whether the image file to be reviewed is a violation file.
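
The ratio-based decision can be sketched in Python as follows. The sketch assumes that feature vectors are binary and that "the number of different features" is computed as a Hamming distance. It also assumes that a file is flagged when the ratio exceeds the second threshold, a direction the claim leaves open. The function names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def is_violation_by_ratio(feature, review_set, first_threshold, second_threshold):
    """Flag the file when enough reference vectors are close to it.

    Assumption: "close" means distance < first_threshold, and the file is
    flagged when the share of close references exceeds second_threshold.
    """
    if not review_set:
        raise ValueError("review set must not be empty")
    distances = [hamming_distance(feature, ref) for ref in review_set]
    close = sum(1 for d in distances if d < first_threshold)
    ratio = close / len(distances)
    return ratio > second_threshold

review_set = [[1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0]]
feature = [1, 0, 1, 1]
print(is_violation_by_ratio(feature, review_set, 2, 0.5))
```
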

5. The method according to claim 1, wherein the similarity is used to characterize the number of different features between two feature vectors, and the reference image file corresponding to the reference feature vector is a violation file;

The determining the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in the review set includes:

Determining the similarity between the feature vector of the image file to be reviewed and the i-th reference feature vector in the review set; where i is greater than 0 and less than or equal to the total number of reference feature vectors in the review set;

Correspondingly, the determining whether the image file to be reviewed is a violation file according to the determined relationship between the similarity and the first threshold includes:

In a case where the similarity corresponding to the i-th reference feature vector is less than the first threshold, it is determined that the image file to be reviewed is a violation file.

6. The method according to claim 5, further comprising:

When the similarity corresponding to the i-th reference feature vector is greater than or equal to the first threshold, determining the similarity between the feature vector of the image file to be reviewed and the (i+1)-th reference feature vector in the review set, so as to determine whether the image file to be reviewed is a violation file.
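
The sequential comparison of claims 5 and 6 amounts to a loop with an early exit. The sketch below assumes binary feature vectors and a Hamming distance as the "number of different features"; the name `review_sequentially` and the returned index are illustrative additions, not part of the application.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def review_sequentially(feature, review_set, first_threshold):
    """Compare against the i-th reference vector, then the (i+1)-th, and so on.

    Returns (True, i) as soon as the i-th distance is below the threshold,
    or (False, None) when no reference vector in the set is close enough.
    """
    for i, ref in enumerate(review_set, start=1):
        if hamming_distance(feature, ref) < first_threshold:
            return True, i
    return False, None

review_set = [[1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0]]
print(review_sequentially([1, 0, 1, 0], review_set, 1))
print(review_sequentially([1, 1, 1, 1], review_set, 1))
```

Stopping at the first match avoids computing the remaining distances, which matters when the review set is large.
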

7. The method according to any one of claims 1 to 6, further comprising: loading the generated review set;

The method for generating the review set includes:

Using the target classification model, feature extraction is performed on multiple reference image files to obtain feature vectors of corresponding files;

The feature vector of each reference image file is used as a reference feature vector to generate the review set.
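
Generating the review set offline and loading it later can be sketched as below. The extractor `toy_extract` is a hypothetical stand-in for the trained target classification model, and storing the set as a JSON file is an assumption made only for this example.

```python
import json
import os
import tempfile

def toy_extract(image):
    """Hypothetical stand-in for the trained model: flatten and binarize pixels."""
    return [1 if p >= 128 else 0 for row in image for p in row]

def generate_review_set(reference_images, extract):
    """Use the feature vector of each reference image as a reference vector."""
    return [extract(img) for img in reference_images]

def save_review_set(review_set, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(review_set, f)

def load_review_set(path):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

reference_images = [[[200, 10], [130, 0]], [[0, 255], [255, 0]]]
review_set = generate_review_set(reference_images, toy_extract)

# Round trip through a temporary file to mimic "generate once, load at review time".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "review_set.json")
    save_review_set(review_set, path)
    loaded = load_review_set(path)

print(loaded)
```
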

8. The method according to any one of claims 1 to 6, further comprising: loading the determined first threshold; wherein the method for determining the first threshold comprises:

Taking each of a plurality of different candidate thresholds in turn as the first threshold, determining, according to the image review method, whether each of a plurality of verification image files is a violation file, so as to obtain a review result set corresponding to each of the candidate thresholds;

Determining the correct recall rate and the false recall rate under the corresponding candidate threshold according to each review result set and the type label of each verification image file;

Determining, as the first threshold, the candidate threshold whose correct recall rate and false recall rate meet a specific condition.
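
The threshold search can be sketched as a sweep over the candidate values. Several points here are assumptions, since the claim does not define them: the correct recall rate is taken as the share of true violation files that are flagged, the false recall rate as the share of non-violation files that are flagged, and the "specific condition" as the highest correct recall rate among candidates whose false recall rate does not exceed a hypothetical limit `max_false_recall`. Verification files are represented directly by binary feature vectors.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def flagged(feature, review_set, threshold):
    """Review one file: flagged if any reference vector is closer than threshold."""
    return any(hamming_distance(feature, ref) < threshold for ref in review_set)

def recall_rates(verification, review_set, threshold):
    """verification: list of (feature_vector, is_violation) pairs."""
    tp = fp = pos = neg = 0
    for feature, is_violation in verification:
        hit = flagged(feature, review_set, threshold)
        if is_violation:
            pos += 1
            tp += hit
        else:
            neg += 1
            fp += hit
    correct = tp / pos if pos else 0.0
    false = fp / neg if neg else 0.0
    return correct, false

def choose_first_threshold(verification, review_set, candidates, max_false_recall):
    """Return (threshold, correct_recall, false_recall), or None if none qualifies."""
    best = None
    for t in candidates:
        correct, false = recall_rates(verification, review_set, t)
        if false <= max_false_recall and (best is None or correct > best[1]):
            best = (t, correct, false)
    return best

review_set = [[1, 1, 1, 1]]
verification = [
    ([1, 1, 1, 1], True),
    ([1, 1, 1, 0], True),
    ([1, 0, 0, 0], False),
    ([0, 0, 0, 0], False),
]
best = choose_first_threshold(verification, review_set, [1, 2, 4], 0.0)
print(best)
```
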

9. An image review device, comprising:

The feature extraction module is configured to use the target classification model to perform feature extraction on the image file to be reviewed to obtain the corresponding feature vector; wherein the target classification model is obtained through training of multiple sample image files and corresponding multiple image transformation files;

The first determining module is configured to determine the similarity between the feature vector of the image file to be reviewed and at least one reference feature vector in the review set;

The review module is configured to determine whether the image file to be reviewed is a violation file based on the determined relationship between the similarity and the first threshold.

10. The device according to claim 9, wherein the feature extraction module is configured to:

Acquiring a feature vector extraction structure of the target classification model, the feature vector extraction structure including an input layer to a non-linear activation layer of the target classification model; wherein the type of the target classification model is a neural network model;

Using the feature vector extraction structure, feature extraction is performed on the image file to be reviewed to obtain the corresponding feature vector.

11. The device according to claim 9 or 10, further comprising:

The label obtaining module is configured to obtain the type label of each sample image file;

The transformation processing module is configured to perform transformation processing on each of the sample image files according to multiple transformation rules to obtain a corresponding set of image transformation files;

The labeling module is configured to assign the type label of each sample image file to each image transformation file in the corresponding image transformation file set;

The model training module is configured to train a specific neural network model according to each of the sample image files, each of the image transformation files and respective corresponding type labels to obtain the target classification model.

12. The device according to claim 9, wherein the review module is configured to:

Determining the number of similarities less than the first threshold, where the similarity is used to characterize the number of different features between two feature vectors;

Determine the ratio of the number to the total number of similarities;

Determining, according to the relationship between the ratio and a second threshold, whether the image file to be reviewed is a violation file.

13. The device according to claim 9, wherein:

The first determining module is configured to determine the similarity between the feature vector of the image file to be reviewed and the i-th reference feature vector in the review set;

Wherein, i is greater than 0 and less than or equal to the total number of reference feature vectors in the review set; the similarity is used to characterize the number of different features between two feature vectors, and the reference image file corresponding to the reference feature vector is a violation file;

Correspondingly, the review module is configured to determine that the image file to be reviewed is a violation file when the similarity corresponding to the i-th reference feature vector is less than the first threshold.

14. The device according to claim 13, wherein the first determining module is further configured to:

When the similarity corresponding to the i-th reference feature vector is greater than or equal to the first threshold, determine the similarity between the feature vector of the image file to be reviewed and the (i+1)-th reference feature vector in the review set, so as to determine whether the image file to be reviewed is a violation file.

15. The device according to any one of claims 9 to 14, further comprising:

A loading module, configured to load the generated review set;

Correspondingly, the feature extraction module is further configured to: use the target classification model to perform feature extraction on multiple reference image files to obtain the feature vector of each file; and use the feature vector of each reference image file as a reference feature vector to generate the review set.

16. The device according to any one of claims 9 to 14, further comprising:

A loading module, configured to load the determined first threshold;

Correspondingly, the device further includes a second determining module, configured to: with each of a plurality of different candidate thresholds taken in turn as the first threshold, use the feature extraction module, the first determining module, and the review module of the device to determine whether each of a plurality of verification image files is a violation file, so as to obtain a review result set corresponding to each candidate threshold; determine, according to each review result set and the type label of each verification image file, the correct recall rate and the false recall rate under the corresponding candidate threshold; and determine, as the first threshold, the candidate threshold whose correct recall rate and false recall rate meet a specific condition.

17. An electronic device, comprising a memory and a processor, the memory storing a computer program that can run on the processor, wherein the processor implements the steps in the image review method of any one of claims 1 to 8 when executing the program.

18. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the steps in the image review method of any one of claims 1 to 8 are implemented.