CN113688695A - Picture identification method, system, storage medium and electronic equipment

Info

Publication number: CN113688695A
Application number: CN202110885641.XA
Authority: CN
Inventors: 王少将; 唐会军; 刘拴林; 梁堃; 陈建
Original assignee: Beijing Nextdata Times Technology Co ltd
Current assignee: Beijing Nextdata Times Technology Co ltd
Priority date: 2021-08-03
Filing date: 2021-08-03
Publication date: 2021-11-23

Abstract

The invention relates to a picture identification method, a system, a storage medium and electronic equipment. Based on a picture set comprising a plurality of pictures, a deep convolutional neural network to which an ADL layer has been added is trained to obtain an intermediate picture identification model; the ADL layer is then removed from the intermediate picture identification model to obtain a picture identification model; and a picture to be identified is input into the picture identification model to obtain an identification result. Because the ADL layer enables weaker features to be learned, the weaker features in an occluded picture can be identified when the picture to be identified is an occluded picture, which improves the identification precision and ensures the accuracy of the identification result. A large number of occluded pictures is not needed, which saves collection time and cost. In addition, an importance map is generated by performing an activation operation on the self-attention map, and training on the feature map obtained from the output features of the previous layer and the importance map ensures the identification precision for non-occluded pictures.

Description

Picture identification method, system, storage medium and electronic equipment

Technical Field

The present invention relates to the field of image recognition technologies, and in particular, to an image recognition method, an image recognition system, a storage medium, and an electronic device.

Background

With the evolution of multimedia technology, the ways in which people socialize have been greatly enriched, and images, namely pictures and video streams, have become a very important means of communication. As regulatory requirements grow more sophisticated, identifying pornographic picture/video content has become a major issue in network content regulation. To evade network content supervision, propagators of pornographic content often partially occlude or smear the pornographic content in pictures/videos. Specifically, occlusion layers such as mosaics, watermarks or other marker pictures are added over parts of the pornographic content, so that the resulting occluded pictures, or video streams composed of a plurality of occluded pictures, escape interception by the supervision system. At present, the deep convolutional neural network is the most widely applied deep learning technology in picture identification, namely image identification; specifically, occluded pictures are identified by an identification model obtained by training a deep convolutional neural network. However, the following problems exist:

1) In order to rapidly learn the differences between pictures, the recognition model trained by this algorithm usually focuses on the features of the most discriminative regions in the pictures. When the key information in a picture is occluded, the recognition effect drops markedly, that is, the recognition precision is reduced, and the recognition effect on non-occluded pictures is also greatly reduced.

2) Training the deep convolutional neural network requires a large number of occluded pictures. On the one hand, collecting a large number of occluded pictures is time-consuming; on the other hand, establishing a database comprising a large number of occluded pictures is costly.

Disclosure of Invention

The invention provides a picture identification method, a picture identification system, a storage medium and electronic equipment, aiming at the defects of the prior art.

The technical scheme of the picture identification method is as follows:

training a deep convolutional neural network to which an ADL layer has been added, based on a picture set comprising a plurality of pictures, to obtain an intermediate picture recognition model, and removing the ADL layer from the intermediate picture recognition model to obtain a picture recognition model, wherein the ADL layer is used for: obtaining a self-attention map according to the output features of the previous layer, performing a threshold binarization operation on the self-attention map to generate an inactivation mask, performing an activation operation on the self-attention map to generate an importance map, obtaining a feature map according to the output features of the previous layer and whichever of the importance map and the inactivation mask is selected according to a preset condition, and inputting the feature map into the next layer;

and inputting the picture to be recognized into the picture recognition model to obtain a recognition result.

The picture identification method has the following beneficial effects:

The ADL layer obtains the self-attention map from the output features of the previous layer. On the one hand, a threshold binarization operation is performed on the self-attention map to generate an inactivation mask; when training uses the feature map obtained from the output features of the previous layer and the inactivation mask, weaker features can be learned, so that when the picture to be identified is an occluded picture, the weaker features in the occluded picture can be identified, which improves the identification precision and ensures the accuracy of the identification result. In addition, because the inactivation mask generated by the threshold binarization operation simulates the occlusion layer of an occluded picture, a large number of occluded pictures need not be collected, which saves time and cost. On the other hand, an activation operation is performed on the self-attention map to generate an importance map; when training uses the feature map obtained from the output features of the previous layer and the importance map, the identification precision for non-occluded pictures can be ensured.

On the basis of the above scheme, the picture identification method of the present invention may be further improved as follows.

Further, the process of the ADL layer obtaining the self-attention map includes:

the ADL layer performs an average pooling operation over the channel dimension on the output features of the previous layer to obtain the self-attention map.

Further, the process of the ADL layer obtaining the importance map includes:

the ADL layer performs an activation operation on the self-attention map by using a sigmoid activation function to generate the importance map.
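The operations of the ADL layer stated so far can be written compactly as follows. This is only a notational sketch: the symbols F (output features of the previous layer, of size H × W × C), A (self-attention map), τ (threshold), M (selected map) and F' (feature map passed to the next layer) are assumptions introduced here for illustration, and the element-wise product in the last line follows the multiplication operation described in the detailed description.

```latex
\begin{aligned}
A_{h,w} &= \frac{1}{C}\sum_{c=1}^{C} F_{h,w,c}
  && \text{(average pooling over the channel dimension)} \\
M^{\mathrm{drop}}_{h,w} &=
  \begin{cases}
    0, & A_{h,w} > \tau \\
    1, & A_{h,w} \le \tau
  \end{cases}
  && \text{(threshold binarization, inactivation mask)} \\
M^{\mathrm{imp}}_{h,w} &= \sigma(A_{h,w}) = \frac{1}{1 + e^{-A_{h,w}}}
  && \text{(sigmoid activation, importance map)} \\
F'_{h,w,c} &= F_{h,w,c}\, M_{h,w},
  \quad M \in \{M^{\mathrm{drop}},\, M^{\mathrm{imp}}\}
  && \text{(feature map input into the next layer)}
\end{aligned}
```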

The technical scheme of the picture identification system is as follows:

The system comprises a training module and an identification module, wherein the training module is used for: training a deep convolutional neural network to which an ADL layer has been added, based on a picture set comprising a plurality of pictures, to obtain an intermediate picture recognition model, and removing the ADL layer from the intermediate picture recognition model to obtain a picture recognition model, wherein the ADL layer is used for: obtaining a self-attention map according to the output features of the previous layer, performing a threshold binarization operation on the self-attention map to generate an inactivation mask, performing an activation operation on the self-attention map to generate an importance map, obtaining a feature map according to the output features of the previous layer and whichever of the importance map and the inactivation mask is selected according to a preset condition, and inputting the feature map into the next layer;

the identification module is used for: inputting the picture to be recognized into the picture recognition model to obtain a recognition result.

The picture identification system has the following beneficial effects:

The ADL layer obtains a self-attention map from the output features of the previous layer. On the one hand, a threshold binarization operation is performed on the self-attention map to generate an inactivation mask; when training uses the feature map obtained from the output features of the previous layer and the inactivation mask, weaker features can be learned, so that when the picture to be identified is an occluded picture, the weaker features in the occluded picture can be identified, which improves the identification precision and ensures the accuracy of the identification result. In addition, because the inactivation mask generated by the threshold binarization operation simulates the occlusion layer of an occluded picture, a large number of occluded pictures need not be collected, which saves time and cost. On the other hand, an activation operation is performed on the self-attention map to generate an importance map; when training uses the feature map obtained from the output features of the previous layer and the importance map, the identification precision for non-occluded pictures can be ensured.

On the basis of the above scheme, the picture recognition system of the present invention can be further improved as follows.

Further, the ADL layer is specifically configured to: perform an average pooling operation over the channel dimension on the output features of the previous layer to obtain the self-attention map.

Further, the ADL layer is also specifically configured to: perform an activation operation on the self-attention map by using a sigmoid activation function to generate the importance map.

The storage medium of the present invention stores instructions, and when the instructions are read by a computer, the computer is caused to execute any one of the above-mentioned picture recognition methods.

An electronic device of the present invention includes a memory, a processor, and a program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the steps of any one of the above-mentioned picture identification methods.

Drawings

Fig. 1 is a schematic flowchart of a picture identification method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the structure of an ADL layer;

Fig. 3 is a schematic structural diagram of a picture recognition system according to an embodiment of the present invention.

Detailed Description

As shown in Fig. 1, a picture identification method according to an embodiment of the present invention includes the following steps:

S1, training a deep convolutional neural network to which an ADL layer has been added, based on a picture set comprising a plurality of pictures, to obtain an intermediate picture recognition model, and removing the ADL layer from the intermediate picture recognition model to obtain a picture recognition model, wherein the ADL layer is used for: obtaining a self-attention map according to the output features of the previous layer, performing a threshold binarization operation on the self-attention map to generate an inactivation mask, performing an activation operation on the self-attention map to generate an importance map, obtaining a feature map according to the output features of the previous layer and whichever of the importance map and the inactivation mask is selected according to a preset condition, and inputting the feature map into the next layer;

S2, inputting the picture to be recognized into the picture recognition model to obtain a recognition result, wherein the recognition result indicates whether or not the picture to be recognized is a pornographic picture or a violating picture.

The deep convolutional neural network comprises an input layer, hidden layers and an output layer, wherein the hidden layers comprise a convolutional layer, a pooling layer and a fully connected layer. An ADL layer (ADL: Attention-based Dropout Layer) can be located between any two adjacent layers; for example, the ADL layer is located between the convolutional layer and the pooling layer, or between the pooling layer and the fully connected layer. Then:

1) when the ADL layer is located between the convolutional layer and the pooling layer, the previous layer is the convolutional layer and the next layer is the pooling layer;

2) when the ADL layer is located between the pooling layer and the fully connected layer, the previous layer is the pooling layer and the next layer is the fully connected layer.

In another embodiment, a plurality of ADL layers may be provided; for example, during training, one ADL layer may be added between the convolutional layer and the pooling layer and another between the pooling layer and the fully connected layer. Specifically:

the first ADL layer obtains a self-attention map according to the output features of its previous layer, namely the convolutional layer, performs a threshold binarization operation on the self-attention map to generate an inactivation mask, performs an activation operation on the self-attention map to generate an importance map, obtains a feature map according to the output features of the previous layer and whichever of the importance map and the inactivation mask is selected according to a preset condition, and inputs the feature map into its next layer, namely the pooling layer;

the second ADL layer obtains a self-attention map according to the output features of its previous layer, namely the pooling layer, performs a threshold binarization operation on the self-attention map to generate an inactivation mask, performs an activation operation on the self-attention map to generate an importance map, obtains a feature map according to the output features of the previous layer and whichever of the importance map and the inactivation mask is selected according to a preset condition, and inputs the feature map into its next layer, namely the fully connected layer.
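The placement of an ADL layer between two adjacent layers, and its removal after training, might be sketched as follows. This is a minimal illustration rather than the patented implementation: the network is modelled as a plain list of named callables, and `conv_stub`, `pool_stub`, `adl_stub` and `run` are hypothetical stand-ins introduced only for this example. The removal is straightforward because the ADL output has the same shape as its input.

```python
import numpy as np

def conv_stub(x):
    """Hypothetical stand-in for a convolutional layer: applies ReLU only."""
    return np.maximum(x, 0.0)

def pool_stub(x):
    """Hypothetical stand-in for a pooling layer: global average over H and W."""
    return x.mean(axis=(0, 1))

def adl_stub(x):
    """Simplified ADL stand-in: erase positions above 0.9 * max attention."""
    attention = x.mean(axis=-1, keepdims=True)          # H x W x 1
    return x * (attention <= 0.9 * attention.max())     # shape is unchanged

def run(layers, x):
    """Feed x through each (name, layer) pair in order."""
    for _, layer in layers:
        x = layer(x)
    return x

# Training-time network: ADL sits between the convolutional and pooling layers.
training_net = [("conv", conv_stub), ("adl", adl_stub), ("pool", pool_stub)]

# After training, every ADL layer is dropped to obtain the recognition model.
inference_net = [(name, layer) for name, layer in training_net if name != "adl"]

# Toy 2 x 2 x 3 input whose top-left position has the strongest response.
x = np.ones((2, 2, 3))
x[0, 0] = 5.0
train_out = run(training_net, x)   # strongest position erased before pooling
infer_out = run(inference_net, x)  # no erasing at inference time
```

With this toy input the pooled output differs between the two networks, because the strongest position is erased only while the ADL layer is present.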

The following description takes the case where only one ADL layer is added as an example, in conjunction with Fig. 2. Specifically:

S10, obtaining a self-attention map (Self-attention map) according to the output features of the previous layer (output feature map of previous): the ADL layer performs an average pooling operation over the channel dimension (Channelwise Pooling) on the output features of the previous layer to obtain the self-attention map. Specifically:

the output features of the convolutional layer have size H × W × C, where H is the height of the output features, W is the width, and C is the number of channels. After the average pooling operation over the channel dimension, a self-attention map of size H × W × 1 is generated. The implementation details of the channel-dimension average pooling operation are known to those skilled in the art and are not described here.
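As an illustration of step S10, the channel-dimension average pooling can be sketched with NumPy. The function name `self_attention_map` and the toy input are assumptions made for this example; the only behaviour taken from the text is that an H × W × C tensor is averaged over its channels into an H × W × 1 map.

```python
import numpy as np

def self_attention_map(features: np.ndarray) -> np.ndarray:
    """Average an H x W x C feature tensor over its channel axis.

    Returns an H x W x 1 self-attention map.
    """
    if features.ndim != 3:
        raise ValueError("expected a feature tensor of shape (H, W, C)")
    # keepdims=True keeps the trailing channel axis of length 1.
    return features.mean(axis=-1, keepdims=True)

# Hypothetical 2 x 2 feature map with 3 channels (values 0..11).
features = np.arange(12, dtype=np.float64).reshape(2, 2, 3)
attention = self_attention_map(features)
```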

S11, performing a threshold binarization operation on the self-attention map to generate an inactivation mask (Drop mask). The core idea of the threshold binarization operation is: positions whose pixel values in the self-attention map are greater than the threshold are set to 0 in the inactivation mask, and positions whose pixel values are not greater than the threshold are set to 1. The generated inactivation mask simulates the occlusion layer of an occluded picture, so a large number of occluded pictures need not be collected, which saves time and cost. The threshold is generally chosen as 0.9 times the maximum pixel value in the self-attention map.
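Step S11 can be sketched as follows, assuming the threshold is 0.9 times the maximum value of the self-attention map as stated above. The function name `drop_mask` and the parameter name `ratio` are hypothetical and introduced only for this example.

```python
import numpy as np

def drop_mask(attention: np.ndarray, ratio: float = 0.9) -> np.ndarray:
    """Binarize a self-attention map into an inactivation mask.

    Positions whose attention exceeds ratio * max(attention) become 0
    (erased); all other positions become 1 (kept).
    """
    threshold = ratio * attention.max()
    return (attention <= threshold).astype(attention.dtype)

# Toy 2 x 2 x 1 self-attention map; the threshold here is 0.9 * 1.0 = 0.9.
attention = np.array([[0.2, 1.0], [0.95, 0.5]]).reshape(2, 2, 1)
mask = drop_mask(attention)
```

In this toy map the two positions with values 1.0 and 0.95 exceed the threshold and are zeroed, which is the erased region that simulates an occlusion layer.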

S12, performing an activation operation on the self-attention map to generate an importance map (Importance map): a sigmoid activation function is used to perform the activation operation on the self-attention map to generate the importance map; this process is known to those skilled in the art and is not described here.
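For completeness, step S12 can be sketched as an element-wise sigmoid over the self-attention map. The function name `importance_map` and the toy values are assumptions made for this example.

```python
import numpy as np

def importance_map(attention: np.ndarray) -> np.ndarray:
    """Apply the element-wise sigmoid 1 / (1 + exp(-x)) to the attention map."""
    return 1.0 / (1.0 + np.exp(-attention))

# Toy 2 x 2 x 1 self-attention map.
attention = np.array([[0.0, 2.0], [-2.0, 4.0]]).reshape(2, 2, 1)
importance = importance_map(attention)
```

Every value of the resulting map lies strictly between 0 and 1, and larger attention values give larger weights, so high-response positions are emphasized rather than erased.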

S13, obtaining a feature map (Out of ADL) according to the output features of the previous layer and the importance map or the inactivation mask (Drop mask) selected according to a preset condition, and inputting the feature map into the next layer. Specifically:

in one case, the preset condition is: randomly selecting (random select) the importance map or the inactivation mask; the selected importance map or inactivation mask is then multiplied (spatialwise multiplication) with the output features of the previous layer to obtain the feature map, and the feature map is input into the next layer;

in another case, the preset condition is: manually setting the probability of selecting the importance map and the probability of selecting the inactivation mask, and selecting accordingly; the selected importance map or inactivation mask is then multiplied (spatialwise multiplication) with the output features of the previous layer to obtain the feature map, and the feature map is input into the next layer.
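Steps S10 to S13 might be combined as in the following sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `adl_forward` and the parameters `drop_prob`, `ratio` and `rng` are hypothetical. Setting `drop_prob=0.5` corresponds to a purely random choice between the two maps, and any other value corresponds to manually set selection probabilities.

```python
import numpy as np

def adl_forward(features, drop_prob=0.5, ratio=0.9, rng=None):
    """One training-time pass through a sketch of the ADL layer.

    features:  H x W x C output features of the previous layer.
    drop_prob: probability of selecting the inactivation mask; the
               importance map is selected with probability 1 - drop_prob.
    """
    rng = np.random.default_rng() if rng is None else rng
    # S10: channel-wise average pooling gives an H x W x 1 self-attention map.
    attention = features.mean(axis=-1, keepdims=True)
    if rng.random() < drop_prob:
        # S11: inactivation mask, 0 where attention exceeds the threshold.
        selected = (attention <= ratio * attention.max()).astype(features.dtype)
    else:
        # S12: importance map via the sigmoid activation.
        selected = 1.0 / (1.0 + np.exp(-attention))
    # S13: spatial-wise multiplication, broadcast over all C channels.
    return features * selected

# Toy 2 x 2 x 3 input whose top-left position has the strongest response.
features = np.ones((2, 2, 3))
features[0, 0] = 5.0
dropped = adl_forward(features, drop_prob=1.0)   # always the inactivation mask
weighted = adl_forward(features, drop_prob=0.0)  # always the importance map
```

Forcing `drop_prob` to 1.0 or 0.0, as in the usage lines, is only a convenient way to inspect each branch deterministically; during training the branch would be chosen at random on each pass.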

That is, when the inactivation mask is selected to generate the feature map, the region with the highest response is erased (simulating occlusion), which forces the deep convolutional neural network with the added ADL layer to learn the weak feature regions and enhances the robustness of the model. When the importance map is selected to generate the feature map, the attention mechanism strengthens the learning of the high-response regions (strong features) to improve the overall recognition effect, that is, to improve the recognition precision of the picture recognition model for non-occluded pictures.

For example, when distinguishing the breed of a dog, it is easiest to distinguish by the dog's head, but it is also possible, though more difficult, to distinguish by the dog's body. The dog's head is a strong feature, and the position of the head in the picture is a strong feature region; the dog's body is a weak feature, and the position of the body in the picture is a weak feature region.

In the training process, after the ADL layer is added, the ADL layer obtains a self-attention map from the output features of the previous layer. On the one hand, a threshold binarization operation is performed on the self-attention map to generate an inactivation mask; when training uses the feature map obtained from the output features of the previous layer and the inactivation mask, weak features can be learned, so that when the picture to be identified is an occluded picture, weak features in the occluded picture, such as the edge of the occlusion layer, can be identified, which improves the identification precision and ensures the accuracy of the identification result. In addition, because the inactivation mask generated by the threshold binarization operation simulates the occlusion layer of an occluded picture, a large number of occluded pictures is not needed, which saves time and cost. On the other hand, an activation operation is performed on the self-attention map to generate an importance map; when training uses the feature map obtained from the output features of the previous layer and the importance map, the identification precision for non-occluded pictures can be ensured.

Although the steps in the above embodiments are numbered S1, S2, etc., these are only specific embodiments given in this application; those skilled in the art may adjust the execution order of S1, S2, etc. according to the actual situation, which also falls within the protection scope of the present invention. It is understood that some embodiments may include some or all of the above embodiments.

As shown in Fig. 3, a picture recognition system 200 according to an embodiment of the present invention includes a training module 210 and a recognition module 220;

the training module 210 is configured to: train a deep convolutional neural network to which an ADL layer has been added, based on a picture set comprising a plurality of pictures, to obtain an intermediate picture recognition model, and remove the ADL layer from the intermediate picture recognition model to obtain a picture recognition model, wherein the ADL layer is used for: obtaining a self-attention map according to the output features of the previous layer, performing a threshold binarization operation on the self-attention map to generate an inactivation mask, performing an activation operation on the self-attention map to generate an importance map, obtaining a feature map according to the output features of the previous layer and whichever of the importance map and the inactivation mask is selected according to a preset condition, and inputting the feature map into the next layer;

the recognition module 220 is configured to: input the picture to be recognized into the picture recognition model to obtain a recognition result.

Preferably, in the above technical solution, the ADL layer is specifically configured to: perform an average pooling operation over the channel dimension on the output features of the previous layer to obtain the self-attention map.

Preferably, in the above technical solution, the ADL layer is further specifically configured to: perform an activation operation on the self-attention map by using a sigmoid activation function to generate the importance map.

For the steps by which the parameters and unit modules of the picture recognition system 200 of the present invention realize their corresponding functions, reference may be made to the parameters and steps in the above embodiment of the picture identification method, which are not repeated here.

An electronic device according to an embodiment of the present invention includes a memory, a processor, and a program stored in the memory and runnable on the processor, where the processor, when executing the program, implements the steps of any of the picture identification methods described above.

The electronic device may be a computer, a mobile phone, or the like; correspondingly, the program is computer software or a mobile phone APP. For the parameters and steps in the electronic device of the present invention, reference may be made to the parameters and steps in the above embodiment of the picture identification method, which are not repeated here.

As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product.

Accordingly, the present disclosure may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software, any of which may generally be referred to herein as a "circuit," "module" or "system." Furthermore, in some embodiments, the invention may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied in the medium.

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A picture recognition method is characterized by comprising the following steps:

2. The picture recognition method according to claim 1, wherein the process of obtaining the self-attention map by the ADL layer comprises:

3. The picture recognition method according to claim 1 or 2, wherein the process of obtaining the importance map by the ADL layer comprises:

4. A picture recognition system, comprising a training module and a recognition module, wherein the training module is configured to: train a deep convolutional neural network to which an ADL layer has been added, based on a picture set comprising a plurality of pictures, to obtain an intermediate picture recognition model, and remove the ADL layer from the intermediate picture recognition model to obtain a picture recognition model, wherein the ADL layer is used for: obtaining a self-attention map according to the output features of the previous layer, performing a threshold binarization operation on the self-attention map to generate an inactivation mask, performing an activation operation on the self-attention map to generate an importance map, obtaining a feature map according to the output features of the previous layer and the importance map or the inactivation mask selected according to preset conditions, and inputting the feature map into the next layer;

5. The picture recognition system of claim 4, wherein the ADL layer is specifically configured to: perform an average pooling operation over the channel dimension on the output features of the previous layer to obtain the self-attention map.

6. The picture recognition system according to claim 4 or 5, wherein the ADL layer is further specifically configured to: perform an activation operation on the self-attention map by using a sigmoid activation function to generate the importance map.

7. A storage medium having stored therein instructions which, when read by a computer, cause the computer to execute a picture recognition method according to any one of claims 1 to 3.

8. An electronic device comprising a memory, a processor and a program stored on the memory and running on the processor, wherein the steps of a picture recognition method according to any one of claims 1 to 3 are implemented when the program is executed by the processor.