CN112699902A - Fine-grained sensitive image detection method based on bilinear attention pooling mechanism - Google Patents
- Publication number
- CN112699902A (application number CN202110031134.XA)
- Authority
- CN
- China
- Prior art keywords
- attention
- sensitive image
- feature
- fine
- pooling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06F18/2132—Feature extraction, e.g. by transforming the feature space, based on discrimination criteria, e.g. discriminant analysis
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Abstract
The invention relates to a fine-grained sensitive image detection method based on a bilinear attention pooling mechanism, comprising the following steps. Step S1: acquire sensitive images and perform data cleaning on the collected image set to obtain an NSFW sensitive image training data set. Step S2: input the NSFW sensitive image training data set into a fine-grained sensitive-image intelligent auditing network model, perform feature extraction, and generate a feature map and an attention map. Step S3: perform attention-based data enhancement on the NSFW training set according to the obtained attention maps, applying attention cropping and attention dropping to the images. Step S4: aggregate the feature map and the attention maps through a bilinear attention pooling mechanism to generate part feature maps, extract local features through convolution and pooling, and combine all local features into a final feature. Step S5: predict the sensitive image category from the final feature. The method can effectively improve detection accuracy on sensitive images in hard-sample scenes.
Description
Technical Field
The invention relates to the technical field of image recognition, in particular to a fine-grained sensitive image detection method based on a bilinear attention pooling mechanism.
Background
In recent years, deep learning image classification networks based on convolutional neural networks have been applied to the intelligent auditing of sensitive images; a typical example is Yahoo's sensitive-image auditing model Open_NSFW. However, this model performs poorly in the intelligent auditing of domestic sensitive images, mainly because its training data differ from the practical application: the training set was collected largely from Western sources and consists mostly of images of white subjects, so such a deep learning image classification network generalizes poorly when applied to the domestic sensitive-image auditing task.
Disclosure of Invention
In view of this, the present invention provides a fine-grained sensitive image detection method based on a bilinear attention pooling mechanism, which can effectively improve the detection accuracy of sensitive images in hard-sample scenes.
In order to achieve the purpose, the invention adopts the following technical scheme:
a fine-grained sensitive image detection method based on a bilinear attention pooling mechanism comprises the following steps:
step S1: acquiring a sensitive image, and performing data cleaning on the acquired sensitive image set to obtain an NSFW sensitive image training data set;
step S2: constructing a fine-grained sensitive image intelligent auditing network model, inputting an NSFW sensitive image training data set into the fine-grained sensitive image intelligent auditing network model, extracting features, and generating a feature map and an attention map;
step S3: performing attention-based data enhancement on the NSFW training set according to the obtained attention maps, and applying attention cropping and attention dropping to the images while preserving the salient discriminative regions in the sensitive images;
step S4: aggregating the feature map and the attention maps through a bilinear attention pooling mechanism to generate part feature maps, extracting local features through convolution and pooling, and combining all the local features into a final feature;
step S5: and predicting the sensitive image category according to the final characteristics.
Further, the step S1 is specifically:
step S11: acquiring images of the five categories Drawings, Neutral, Sexy, Hentai, and Porn in batches through URL addresses;
step S12: classifying the images of the Sexy and Porn categories by using Yahoo's open-source Open_NSFW pornographic-content recognition model, and adjusting, screening, and filtering out sample images that do not belong to the corresponding category or are unavailable;
step S13: dividing the cleaned sample set into a training set and a test set at a ratio of 8:1 to construct the NSFW data set.
Further, the step S2 is specifically:
step S21: fine-tuning the pre-trained BiT-M model on the obtained NSFW sensitive image training data set, and extracting features with a ResNet50 backbone network to obtain a feature map F;
step S22: performing a 1 × 1 convolution on the feature map F to obtain an attention map A;
step S23: employing an attention regularization loss to weakly supervise the attention learning process.
Further, the step S23 is specifically: the attention regularization loss balances the variance of local features belonging to the same object part, so that each local feature f_k approaches its global feature center c_k ∈ R^{1×N} and each attention map A_k is activated on the same k-th object part. The loss is applied only to the original image and is denoted L_A:

L_A = Σ_{k=1}^{M} ||f_k - c_k||₂²

c_k ← c_k + β(f_k - c_k)

wherein L_A is the loss function, M is the number of attention maps, f_k is the k-th local feature, c_k is its global feature center, initialized to zero and updated with a moving average, and β controls the update rate of c_k.
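Steps S21 and S22 above can be sketched as follows. A 1 × 1 convolution over the feature map F is equivalent to a per-pixel linear projection from the C feature channels to M attention channels; the channel counts and the ReLU nonlinearity below are illustrative assumptions, not values fixed by the text:

```python
import numpy as np

# Sketch of steps S21-S22: a 1x1 convolution is a pointwise linear map
# from C feature channels to M attention channels. Shapes are assumed.
def attention_maps(F, W):
    # F: (H, W, C) feature map, W: (C, M) 1x1-conv weights
    A = np.einsum("hwc,cm->hwm", F, W)  # pointwise channel projection
    return np.maximum(A, 0.0)           # ReLU keeps attention non-negative

rng = np.random.default_rng(0)
F = rng.standard_normal((14, 14, 2048))  # e.g. ResNet50 final-stage feature map
W = rng.standard_normal((2048, 32))      # M = 32 attention maps (assumed)
A = attention_maps(F, W)
print(A.shape)  # (14, 14, 32)
```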
Further, the step S3 is specifically:
step S31: for each training image in the NSFW sensitive image data set, an attention map A_k of the image is randomly selected to guide the data enhancement process and is normalized to obtain the k-th augmentation map A*_k:

A*_k = (A_k - min(A_k)) / (max(A_k) - min(A_k))

wherein A_k is the selected attention map, A*_k is the normalized augmentation map, and min(A_k) and max(A_k) are its minimum and maximum values;
step S32: magnify the salient feature region through the augmentation map A*_k and extract local features;
step S33: set a bounding box B_k covering the entire positive region of the crop mask, and use the image obtained by enlarging this region from the original image as the input data for data enhancement;
step S34: attention dropping, supervised by the attention regularization loss. In each attention map A_k ∈ R^{H×W}, which represents a part of the same k-th object, the elements of A*_k larger than the threshold θ_d ∈ [0,1] are set to 0 and the other elements to 1:

D_k(i,j) = 0 if A*_k(i,j) > θ_d, otherwise 1

wherein A*_k(i,j) is the normalized attention value at position (i,j), D_k(i,j) is the drop mask at position (i,j), and θ_d is the threshold.
Further, the step S32 is specifically: the crop mask C_k is obtained from A*_k by setting each element larger than the threshold θ_c ∈ [0,1] to 1 and the others to 0:

C_k(i,j) = 1 if A*_k(i,j) > θ_c, otherwise 0

wherein A*_k(i,j) is the normalized attention value at position (i,j), C_k(i,j) is the crop mask at position (i,j), and θ_c is the threshold.
Further, the step S4 is specifically:
step S41: each attention map represents one part of the object; each attention map is multiplied element-wise with the feature map to generate a part feature map, and discriminative local features are then further extracted by an additional feature extraction function to obtain the k-th attention saliency feature;
step S42: the local features f_k are stacked to form the object feature, represented by a part feature matrix P ∈ R^{M×N}; Γ(A, F) denotes the bilinear attention pooling of the attention map A and the feature map F:

P = Γ(A, F) = (g(a_1 ∘ F), g(a_2 ∘ F), ..., g(a_M ∘ F)) = (f_1, f_2, ..., f_M)

wherein Γ(A, F) is the bilinear attention pooling function, g(·) is a feature extraction function, ∘ denotes element-wise multiplication, a_1, ..., a_M are the local attention maps, F is the feature map, and f_1, ..., f_M are the local features;
step S43: local features are extracted from the partial feature map generated in step S41 by convolution or pooling operation, and a final feature matrix is composed of all the partial features.
A fine-grained sensitive image detection system based on a bilinear attention pooling scheme, comprising a memory, a processor and computer program instructions stored on the memory and executable by the processor, which when executed by the processor, implement the method steps of any of claims 1-7.
A computer-readable storage medium having stored thereon computer program instructions executable by a processor, the computer program instructions, when executed by the processor, performing the method steps of any of claims 1-7.
Compared with the prior art, the invention has the following beneficial effects:
1. The fine-grained sensitive image detection method based on the bilinear attention pooling mechanism can effectively solve the problem of extracting discriminative features from sensitive images in hard-sample scenes. It fully exploits the feature-extraction strength of deep learning: simple features are first learned from a large data set and progressively more complex, abstract deep features are then learned, without relying on manual feature engineering, completing intelligent auditing of fine-grained sensitive images under refined classification;
2. according to the method, the data cleaning based on the deep learning Open _ Nsfw model is carried out on the sensitive image data set collected from the Internet, the invalid data samples in the collected sensitive image are effectively screened and filtered, and compared with manual screening, a large amount of time and labor cost are saved;
3. Random data enhancement of sensitive image data sets is inefficient and, especially when the target is small, introduces a high proportion of background noise. The invention therefore provides an attention-based sensitive-image data enhancement method that performs attention cropping and attention dropping while preserving the salient discriminative regions in the sensitive image, effectively improving the usefulness of the data enhancement;
4. the invention provides a fine-grained sensitive image content intelligent auditing model based on a bilinear attention pooling mechanism, aiming at the problems of few thinning categories, poor performance on difficult sample sensitive images and the like of the conventional sensitive image auditing model. The model firstly generates a target feature map and an attention map representing the salient features of the target through weak supervised learning, then generates a local feature map from the aggregate feature map and the attention map through a bilinear attention pooling mechanism, extracts local features through convolution and pooling, and finally combines all the local features into a final feature to improve the discrimination of the model. The discrimination capability of the hard sample sensitive image is effectively enhanced, and the accuracy of intelligent examination of the content of the sensitive image is finally improved.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
Referring to fig. 1, the present invention provides a fine-grained sensitive image detection method based on bilinear attention pooling, which includes the following steps:
step S1: acquiring a sensitive image, and performing data cleaning on the acquired sensitive image set to obtain an NSFW sensitive image training data set;
step S2: constructing a fine-grained sensitive image intelligent auditing network model, inputting an NSFW sensitive image training data set into the fine-grained sensitive image intelligent auditing network model, extracting features, and generating a feature map and an attention map;
step S3: performing attention-based data enhancement on the NSFW training set according to the obtained attention maps, and applying attention cropping and attention dropping to the images while preserving the salient discriminative regions in the sensitive images;
step S4: aggregating the feature map and the attention maps through a bilinear attention pooling mechanism to generate part feature maps, extracting local features through convolution and pooling, and combining all the local features into a final feature;
step S5: and predicting the sensitive image category according to the final characteristics.
In this embodiment, the step S1 specifically includes:
step S11: acquiring 63486 images of the five categories Drawings, Neutral, Sexy, Hentai, and Porn in batches through URL addresses;
step S12: classifying the images of the Sexy and Porn categories by using Yahoo's open-source Open_NSFW pornographic-content recognition model, and adjusting, screening, and filtering out sample images that do not belong to the corresponding category or are unavailable;
step S13: and dividing the sample set subjected to data cleaning into a training set and a testing set according to the proportion of 8:1, and constructing an NSFW data set.
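The 8:1 split of step S13 can be sketched as follows; the shuffle seed and the file-name list are illustrative assumptions:

```python
import random

# Sketch of step S13: split the cleaned sample set into a training set and
# a test set at an 8:1 ratio. Seed and sample names are illustrative only.
def split_dataset(samples, ratio=(8, 1), seed=0):
    rng = random.Random(seed)
    samples = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(samples)
    n_train = len(samples) * ratio[0] // sum(ratio)
    return samples[:n_train], samples[n_train:]

train, test = split_dataset([f"img_{i}.jpg" for i in range(9000)])
print(len(train), len(test))  # 8000 1000
```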
In this embodiment, the step S2 specifically includes:
step S21: fine-tuning the pre-trained BiT-M model on the obtained NSFW sensitive image training data set, and extracting features with a ResNet50 backbone network to obtain a feature map F;
step S22: performing a 1 × 1 convolution on the feature map F to obtain an attention map A;
step S23: employing an attention regularization loss to weakly supervise the attention learning process.
The attention regularization loss balances the variance of local features belonging to the same object part: each local feature f_k approaches its global feature center c_k ∈ R^{1×N}, and each attention map A_k is activated on the same k-th object part. The loss is applied only to the original image and is denoted L_A:

L_A = Σ_{k=1}^{M} ||f_k - c_k||₂²

c_k ← c_k + β(f_k - c_k)

wherein L_A is the loss function, M is the number of attention maps, f_k is the k-th local feature, c_k is its global feature center, initialized to zero and updated with a moving average, and β controls the update rate of c_k.
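The attention regularization loss and the moving-average center update above can be sketched as follows; the feature dimension N, the number of maps M, and the value of β are illustrative assumptions:

```python
import numpy as np

# Sketch of step S23: L_A = sum_k ||f_k - c_k||_2^2, with centers updated as
# c_k <- c_k + beta * (f_k - c_k). Dimensions and beta are illustrative.
def attention_reg_loss(f, c):
    # f, c: (M, N) local features and their global feature centers
    return float(np.sum((f - c) ** 2))

def update_centers(f, c, beta=0.05):
    # moving-average update pulls each center toward its local feature
    return c + beta * (f - c)

M, N = 32, 2048
f = np.ones((M, N))
c = np.zeros((M, N))              # centers initialized from zero, as in the text
loss = attention_reg_loss(f, c)   # 32 * 2048 * 1.0 = 65536.0
c = update_centers(f, c, beta=0.1)
print(loss, c[0, 0])  # 65536.0 0.1
```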
In this embodiment, the step S3 specifically includes:
step S31: for each training image in the NSFW sensitive image data set, an attention map A_k of the image is randomly selected to guide the data enhancement process and is normalized to obtain the k-th augmentation map A*_k:

A*_k = (A_k - min(A_k)) / (max(A_k) - min(A_k))

wherein A_k is the selected attention map, A*_k is the normalized augmentation map, and min(A_k) and max(A_k) are its minimum and maximum values;
step S32: magnify the salient feature region through the augmentation map A*_k and extract local features;
the crop mask C_k is obtained from A*_k by setting each element larger than the threshold θ_c ∈ [0,1] to 1 and the others to 0:

C_k(i,j) = 1 if A*_k(i,j) > θ_c, otherwise 0

wherein A*_k(i,j) is the normalized attention value at position (i,j), C_k(i,j) is the crop mask at position (i,j), and θ_c is the threshold;
step S33: set a bounding box B_k covering the entire positive region of the crop mask, and use the image obtained by enlarging this region from the original image as the input data for data enhancement; as the local region of the object is enlarged, the object can be observed more clearly and finer-grained features can be extracted;
step S34: attention dropping, supervised by the attention regularization loss. In each attention map A_k ∈ R^{H×W}, which represents a part of the same k-th object, the elements of A*_k larger than the threshold θ_d ∈ [0,1] are set to 0 and the other elements to 1:

D_k(i,j) = 0 if A*_k(i,j) > θ_d, otherwise 1

wherein A*_k(i,j) is the normalized attention value at position (i,j), D_k(i,j) is the drop mask at position (i,j), and θ_d is the threshold. Because the k-th object part is erased from the sensitive image, the network is encouraged to extract features from other discriminative regions, so the object is perceived more completely, ultimately improving classification robustness and localization accuracy.
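The normalization, crop mask, drop mask, and bounding box of steps S31-S34 can be sketched as follows; the threshold values and the toy attention map are illustrative assumptions:

```python
import numpy as np

# Sketch of steps S31-S34: normalize one attention map to [0, 1], then build
# the crop mask C_k (1 where attention > theta_c) and the drop mask D_k
# (0 where attention > theta_d). Thresholds are illustrative.
def normalize_attention(Ak, eps=1e-8):
    # A*_k = (A_k - min(A_k)) / (max(A_k) - min(A_k))
    return (Ak - Ak.min()) / (Ak.max() - Ak.min() + eps)

def crop_mask(Ak_star, theta_c=0.5):
    return (Ak_star > theta_c).astype(np.uint8)

def drop_mask(Ak_star, theta_d=0.5):
    return 1 - (Ak_star > theta_d).astype(np.uint8)

def crop_bbox(Ck):
    # bounding box B_k covering the positive region of the crop mask
    ys, xs = np.nonzero(Ck)
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1

Ak = np.array([[0.0, 0.2, 0.1],
               [0.1, 1.0, 0.8],
               [0.0, 0.9, 0.2]])
star = normalize_attention(Ak)
Ck, Dk = crop_mask(star), drop_mask(star)
print(crop_bbox(Ck))  # (1, 1, 3, 3)
```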
In this embodiment, the step S4 specifically includes:
step S41: each attention map represents one part of the object; each attention map is multiplied element-wise with the feature map to generate a part feature map, and discriminative local features are then further extracted by an additional feature extraction function to obtain the k-th attention saliency feature;
step S42: the local features f_k are stacked to form the object feature, represented by a part feature matrix P ∈ R^{M×N}; Γ(A, F) denotes the bilinear attention pooling of the attention map A and the feature map F:

P = Γ(A, F) = (g(a_1 ∘ F), g(a_2 ∘ F), ..., g(a_M ∘ F)) = (f_1, f_2, ..., f_M)

wherein Γ(A, F) is the bilinear attention pooling function, g(·) is a feature extraction function, ∘ denotes element-wise multiplication, a_1, ..., a_M are the local attention maps, F is the feature map, and f_1, ..., f_M are the local features;
step S43: local features are extracted from the partial feature map generated in step S41 by convolution or pooling operation, and a final feature matrix is composed of all the partial features.
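The bilinear attention pooling of steps S41-S43 can be sketched as follows, taking g(·) to be global average pooling over each attended feature map, an assumption consistent with the convolution-and-pooling description; the map counts are illustrative:

```python
import numpy as np

# Sketch of steps S41-S43: f_k = g(a_k * F) with g = global average pooling
# (assumed). Stacking the M local features gives the part feature matrix
# P in R^{M x N}.
def bilinear_attention_pooling(A, F):
    # A: (H, W, M) attention maps, F: (H, W, N) feature map -> P: (M, N)
    H, W, _ = A.shape
    return np.einsum("hwm,hwn->mn", A, F) / (H * W)

rng = np.random.default_rng(0)
A = rng.random((14, 14, 32))     # 32 attention maps (assumed)
F = rng.random((14, 14, 2048))   # backbone feature map (assumed shape)
P = bilinear_attention_pooling(A, F)
print(P.shape)  # (32, 2048)
```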
Preferably, in this embodiment, there is also provided a fine-grained sensitive image detection system based on a bilinear attention pooling scheme, which includes a memory, a processor, and computer program instructions stored on the memory and capable of being executed by the processor, and when the computer program instructions are executed by the processor, the method steps as described in any one of the above are implemented.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing describes preferred embodiments of the present invention; further embodiments may be devised without departing from its basic scope, which is determined by the claims that follow. Any simple modification, equivalent change, or refinement of the above embodiments made according to the technical essence of the present invention falls within the protection scope of the present invention.
Claims (9)
1. A fine-grained sensitive image detection method based on a bilinear attention pooling mechanism is characterized by comprising the following steps:
step S1: acquiring a sensitive image, and performing data cleaning on the acquired sensitive image set to obtain an NSFW sensitive image training data set;
step S2: constructing a fine-grained sensitive image intelligent auditing network model, inputting an NSFW sensitive image training data set into the fine-grained sensitive image intelligent auditing network model, extracting features, and generating a feature map and an attention map;
step S3: performing attention-based data enhancement on the NSFW training set according to the obtained attention maps, and applying attention cropping and attention dropping to the images while preserving the salient discriminative regions in the sensitive images;
step S4: aggregating the feature map and the attention map through a bilinear attention pooling mechanism to generate a local feature map, extracting local features through convolution and pooling, and combining all the local features into a final feature;
step S5: and predicting the sensitive image category according to the final characteristics.
2. The fine-grained sensitive image detection method based on bilinear attention pooling of claim 1, wherein the step S1 specifically comprises:
step S11, acquiring five category images of Drawings, Neutral, Sexy, Hentai and Porn in batches through URL addresses;
step S12: classifying the images of the Sexy and Porn categories by using Yahoo's open-source Open_NSFW pornographic-content recognition model, and adjusting, screening, and filtering out sample images that do not belong to the corresponding category or are unavailable;
step S13: and dividing the sample set subjected to data cleaning into a training set and a testing set according to the proportion of 8:1, and constructing an NSFW data set.
3. The fine-grained sensitive image detection method based on bilinear attention pooling of claim 1, wherein the step S2 specifically comprises:
step S21: fine-tuning the pre-trained BiT-M model according to the obtained NSFW sensitive image training data set, and extracting features by using a ResNet50 network as a main network to obtain a feature map F;
step S22: performing 1 × 1 convolution operation on the obtained feature diagram F to obtain an attention diagram A;
step S23: an attention regularization loss mechanism is employed to weakly supervise the attention learning based process.
4. The fine-grained sensitive image detection method based on bilinear attention pooling of claim 3, wherein the step S23 specifically comprises: the attention regularization loss balances the variance of local features belonging to the same object part, so that each local feature f_k approaches its global feature center c_k ∈ R^{1×N} and each attention map A_k is activated on the same k-th object part; the loss is applied only to the original image and is denoted L_A:

L_A = Σ_{k=1}^{M} ||f_k - c_k||₂²

c_k ← c_k + β(f_k - c_k)

wherein L_A is the loss function, M is the number of attention maps, f_k is the k-th local feature, c_k is its global feature center, initialized to zero and updated with a moving average, and β controls the update rate of c_k.
5. The fine-grained sensitive image detection method based on bilinear attention pooling of claim 1, wherein step S3 is specifically:
step S31: for each training image in the NSFW sensitive image data set, an attention map A_k of the image is randomly selected to guide the data enhancement process and is normalized to obtain the k-th augmentation map A*_k:

A*_k = (A_k - min(A_k)) / (max(A_k) - min(A_k))

wherein A_k is the selected attention map, A*_k is the normalized augmentation map, and min(A_k) and max(A_k) are its minimum and maximum values;
step S32: magnify the salient feature region through the augmentation map A*_k and extract local features;
step S33: setting a bounding box B_k covering the entire positive region of the crop mask, and using the image obtained by enlarging this region from the original image as the input data for data enhancement;
step S34: attention dropping, supervised by the attention regularization loss: in each attention map A_k ∈ R^{H×W}, which represents a part of the same k-th object, the elements of A*_k larger than the threshold θ_d ∈ [0,1] are set to 0 and the other elements to 1:

D_k(i,j) = 0 if A*_k(i,j) > θ_d, otherwise 1

wherein D_k(i,j) is the drop mask at position (i,j) and θ_d is the threshold.
6. The fine-grained sensitive image detection method based on bilinear attention pooling of claim 5, wherein the step S32 specifically comprises: obtaining the crop mask C_k from A*_k by setting each element larger than the threshold θ_c ∈ [0,1] to 1 and the others to 0:

C_k(i,j) = 1 if A*_k(i,j) > θ_c, otherwise 0

wherein C_k(i,j) is the crop mask at position (i,j) and θ_c is the threshold.
7. The fine-grained sensitive image detection method based on bilinear attention pooling of claim 1, wherein the step S4 specifically comprises:
step S41: each attention diagram represents a part of a specific object, each attention diagram is multiplied by a feature diagram element by element to generate a partial feature diagram, and then discriminant local features are further extracted through an additional feature extraction function to obtain a kth attention saliency feature;
step S42: the local features f_k are stacked to generate the object feature, represented by the partial feature matrix P ∈ R^(M×N); Γ(A, F) denotes the bilinear attention pooling of the attention maps A and the feature map F:

P = Γ(A, F) = (g(a_1 ⊙ F), ..., g(a_M ⊙ F)) = (f_1, ..., f_M)

wherein Γ(A, F) is the bilinear attention pooling function, g(·) is a feature extraction function, a_1, ..., a_M are the local attention maps, F is the feature map, and f_1, ..., f_M are the local features;
step S43: local features are extracted from the partial feature map generated in step S41 by convolution or pooling operation, and a final feature matrix is composed of all the partial features.
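The bilinear attention pooling of steps S41-S43 can be sketched as follows (illustrative only; global average pooling is assumed for the extraction function g(·), which the claim leaves open as "convolution or pooling"):

```python
import numpy as np

def bilinear_attention_pooling(attn, feat):
    """Sketch of Gamma(A, F).

    attn: (M, H, W) attention maps a_1..a_M
    feat: (C, H, W) feature map F
    Returns P, an (M, C) matrix whose k-th row is the local feature f_k.
    """
    # Element-wise multiply each attention map with every feature channel
    parts = attn[:, None, :, :] * feat[None, :, :, :]  # (M, C, H, W)
    # g(.) chosen here as global average pooling over the spatial dims
    return parts.mean(axis=(2, 3))
```

Stacking the M rows gives the partial feature matrix P ∈ R^(M×N) of step S42, which is then flattened into the final classification feature.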
8. A fine-grained sensitive image detection system based on a bilinear attention pooling mechanism, comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor, wherein the computer program instructions, when executed by the processor, perform the method steps of any one of claims 1-7.
9. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, perform the method steps according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110031134.XA CN112699902A (en) | 2021-01-11 | 2021-01-11 | Fine-grained sensitive image detection method based on bilinear attention pooling mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112699902A true CN112699902A (en) | 2021-04-23 |
Family
ID=75513854
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112699902A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190362199A1 (en) * | 2018-05-25 | 2019-11-28 | Adobe Inc. | Joint blur map estimation and blur desirability classification from an image |
CN111489334A (en) * | 2020-04-02 | 2020-08-04 | 暖屋信息科技(苏州)有限公司 | Defect workpiece image identification method based on convolution attention neural network |
CN111539469A (en) * | 2020-04-20 | 2020-08-14 | 东南大学 | Weak supervision fine-grained image identification method based on vision self-attention mechanism |
CN112163465A (en) * | 2020-09-11 | 2021-01-01 | 华南理工大学 | Fine-grained image classification method, fine-grained image classification system, computer equipment and storage medium |
CN112183602A (en) * | 2020-09-22 | 2021-01-05 | 天津大学 | Multi-layer feature fusion fine-grained image classification method with parallel convolution blocks |
Non-Patent Citations (2)
Title |
---|
HUA WEI ET AL.: "Two-Level Progressive Attention Convolutional Network for Fine-Grained Image Recognition", IEEE Access * |
LI FENGLEI: "Fine-Grained Image Classification Based on Multi-Layer Weight-Adaptive Bilinear Pooling and Attention Mechanism", China Masters' Theses Full-text Database, Information Science and Technology * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113610757A (en) * | 2021-07-02 | 2021-11-05 | 华中科技大学同济医学院附属同济医院 | Medical x-ray lung image detection method based on fine granularity |
CN113627377A (en) * | 2021-08-18 | 2021-11-09 | 福州大学 | Cognitive radio frequency spectrum sensing method and system Based on Attention-Based CNN |
CN113627377B (en) * | 2021-08-18 | 2024-07-02 | 福州大学 | Cognitive radio spectrum sensing method and system Based on Attention-Based CNN |
CN113936145A (en) * | 2021-10-08 | 2022-01-14 | 南京信息工程大学 | Fine-grained identification method based on attention map sorting |
CN113936145B (en) * | 2021-10-08 | 2024-06-11 | 南京信息工程大学 | Fine-grained identification method based on attention map sorting |
CN114708466A (en) * | 2022-06-08 | 2022-07-05 | 南京智莲森信息技术有限公司 | Part abnormal fine granularity classification method and system, storage medium and computing equipment |
CN114708466B (en) * | 2022-06-08 | 2022-09-09 | 南京智莲森信息技术有限公司 | Part abnormal fine granularity classification method and system, storage medium and computing equipment |
CN116458897A (en) * | 2023-04-18 | 2023-07-21 | 山东省人工智能研究院 | Electrocardiosignal quality assessment method based on two-dimensional image and attention mechanism |
CN116458897B (en) * | 2023-04-18 | 2024-01-26 | 山东省人工智能研究院 | Electrocardiosignal quality assessment method based on two-dimensional image and attention mechanism |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112699902A (en) | Fine-grained sensitive image detection method based on bilinear attention pooling mechanism | |
CN109154978A (en) | System and method for detecting plant disease | |
CN110728330A (en) | Object identification method, device, equipment and storage medium based on artificial intelligence | |
CN110837768B (en) | Online detection and identification method for rare animal protection | |
CN109800682B (en) | Driver attribute identification method and related product | |
CN111986183B (en) | Chromosome scattered image automatic segmentation and identification system and device | |
CN111179216B (en) | Crop disease identification method based on image processing and convolutional neural network | |
CN114092450B (en) | Real-time image segmentation method, system and device based on gastroscopy video | |
CN112417955A (en) | Patrol video stream processing method and device | |
CN112101352A (en) | Underwater floc state identification method and monitoring device, computer equipment and storage medium | |
Bai et al. | Robust texture-aware computer-generated image forensic: Benchmark and algorithm | |
CN114140663A (en) | Multi-scale attention and learning network-based pest identification method and system | |
CN117253071B (en) | Semi-supervised target detection method and system based on multistage pseudo tag enhancement | |
CN106780286A (en) | Particle swarm optimization watermarking method based on blind watermark extraction | |
CN113344935B (en) | Image segmentation method and system based on multi-scale difficulty perception | |
CN104899875A (en) | Rapid image cooperation salient region monitoring method based on integration matching | |
CN110751034B (en) | Pedestrian behavior recognition method and terminal equipment | |
CN113963178A (en) | Method, device, equipment and medium for detecting infrared dim and small target under ground-air background | |
Das et al. | Ayurvedic Medicinal Plant Identification System Using Embedded Image Processing Techniques | |
Selvy et al. | A proficient clustering technique to detect CSF level in MRI brain images using PSO algorithm | |
Nair et al. | Under water fish species recognition | |
Liu et al. | Microscopic image analysis and recognition on pathological cells | |
CN118279596B (en) | Underwater fish sunlight refraction image denoising method and system | |
Wu et al. | Tumor segmentation on whole slide images: training or prompting? | |
Bao et al. | An Improved Densenet-Cnn Model to Classify the Damage Caused by Cotton Aphid |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210423 |