CN107704877B - Image privacy perception method based on deep learning - Google Patents


Info

Publication number
CN107704877B
CN107704877B (application CN201710928967.XA)
Authority
CN
China
Prior art keywords
privacy
image
perception
bilinear
neural network
Prior art date
Legal status
Active
Application number
CN201710928967.XA
Other languages
Chinese (zh)
Other versions
CN107704877A (en)
Inventor
王鸿鹏
张阳
尤磊
何华门
黄兴森
Current Assignee
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201710928967.XA priority Critical patent/CN107704877B/en
Priority to US16/099,836 priority patent/US11256952B2/en
Priority to PCT/CN2017/113068 priority patent/WO2019071754A1/en
Publication of CN107704877A publication Critical patent/CN107704877A/en
Application granted granted Critical
Publication of CN107704877B publication Critical patent/CN107704877B/en

Classifications

    • G06N 3/02 — Neural networks
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods
    • G06F 18/211 — Selection of the most significant subset of features
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/2148 — Training-pattern generation characterised by the process organisation or structure, e.g. boosting cascade
    • G06F 18/24 — Classification techniques
    • G06F 18/253 — Fusion techniques applied to extracted features
    • G06F 21/6245 — Protecting personal data, e.g. for financial or medical purposes
    • G06V 10/764 — Image or video recognition or understanding using classification, e.g. of video objects


Abstract

The invention provides an image privacy perception method based on deep learning, comprising the following steps: S1, constructing a privacy classification data set with class labels and training a privacy perception network by transfer learning; S2, identifying private images with a privacy-perception-oriented deep convolutional neural network; and S3, extracting an attention distribution map from the deep convolutional features of the network and locating the region where attention concentrates, thereby perceiving the image privacy region. The beneficial effects of the invention are: end-to-end training and testing are completed on a deep neural network; private images can be accurately distinguished and the privacy regions within them accurately located, which facilitates selective protection of the private information in an image.

Description

Image privacy perception method based on deep learning
Technical Field
The invention relates to artificial intelligence, in particular to an image privacy perception method based on deep learning.
Background
Privacy perception is an important prerequisite of any privacy protection process. Images are among the most important information types in today's social networks, so privacy perception over massive image data has become particularly critical. Because the concept of privacy is strongly subjective, existing image privacy perception methods usually define image privacy in a general sense (personal certificates, family photos, snapshots of confidential documents, etc.) or rely on individual users' annotations on the social network. The existing methods mainly have the following shortcomings:
First, in private-image feature extraction, most existing methods use traditional image features such as SIFT, RGB statistics, and color histograms. These have limited expressive power, cannot represent deep semantic-level features, and yield classification models with poor generalization.
Second, some existing methods need extra information to achieve an acceptable perception effect, such as descriptive labels subjectively attached to an image by the user or user-defined access control policies. This information is hard to obtain in most situations, imposes harsh requirements on the application scenario, and limits the general applicability of the model.
Third, existing image privacy perception methods only perform image-level perception, i.e. they decide whether a whole image is private, without perceiving the privacy region inside it. In practice, however, it is sometimes necessary to block or blur the privacy region of an image to achieve privacy protection.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an image privacy perception method based on deep learning.
The invention provides an image privacy perception method based on deep learning, comprising the following steps:
S1, constructing a privacy classification data set with class labels and training a privacy perception network by transfer learning;
S2, identifying private images with a privacy-perception-oriented deep convolutional neural network;
and S3, extracting an attention distribution map from the deep convolutional features of the network and locating the region where attention concentrates, so as to perceive the image privacy region.
As a further improvement of the present invention, step S1 includes: pre-training a deep convolutional neural network model on a large-scale image data set, then constructing a privacy classification data set and fine-tuning the pre-trained model on it.
As a further improvement of the present invention, step S2 includes: adding a bilinear operation layer after the last convolutional layer of the deep convolutional neural network to enhance the feature expression capability of the model, and replacing the fully connected layer with a pooling layer.
As a further improvement of the present invention, step S3 includes: obtaining a weighted high-level feature map as the attention distribution map from the correspondence between the weights of the pooling-layer nodes and the bilinearly transformed feature maps, and locating the privacy region in the original image through scale conversion.
As a further improvement of the invention, the bilinear operation layer mainly computes the element-wise products between pairs of convolved feature maps.
Let the original feature map set be M = {m_1, m_2, …, m_{n1}},
and the output bilinear feature map set be M′ = {m′_1, m′_2, …, m′_{n1×n1}};
the conversion formula is:

m′_i = m_{⌈i/n1⌉} ⊙ m_{((i−1) mod n1)+1}    (1)

where ⊙ denotes the element-wise product of matrices, ⌈·⌉ denotes rounding up, n1 represents the number of original feature maps, and i represents the subscript of the bilinear feature map.
As a further improvement of the invention, dimension reduction operation is carried out on the bilinear feature map.
As a further improvement of the invention, the dimension reduction operation is carried out on the bilinear feature map by adopting a Tensor Sketch algorithm.
As a further improvement of the invention, the bilinear feature map is a c × w × h tensor while the input of the Tensor Sketch algorithm is a vector, so each position in the bilinear feature map is computed in turn when the Tensor Sketch algorithm is used, i.e. the w × h c-dimensional vectors are processed separately and remapped into a w × h × d-dimensional space. First, the parameter sets for the hash operation are randomly generated: h_k ∈ {1, …, d}^c and s_k ∈ {1, −1}^c (k = 1, 2), where h_k stores the remapped indices of the input vector's elements and s_k randomly negates the element values. According to these parameter sets, the remapped Count Sketch vectors are obtained by accumulation. By the convolution theorem, convolution in the time or spatial domain equals the product in the corresponding frequency domain; therefore the two Count Sketch vectors are transformed to the frequency domain with the fast Fourier transform, their product in the frequency domain is obtained, and the product is transformed back to the spatial domain by the inverse Fourier transform, yielding the convolution of the Count Sketch vectors.
As a further improvement of the invention, the fully connected layer is changed into an average pooling layer, which pools each whole feature map by averaging its elements, finally producing a d-dimensional vector.
As a further improvement of the invention, the average-pooling-layer nodes correspond to the feature maps, and the attention distribution map is obtained through a weighted sum of the feature maps.
Let the dimension-reduced bilinear feature map sequence be P = {p_1, p_2, …, p_d},
and let the finally generated attention distribution map be A; the calculation formula is:

A = Σ_{k=1}^{d} w_k^{n2} · p_k    (2)

where n2 is the class label assigned to the input image and w_k^{n2} is the connection weight from the k-th pooling-layer node to category n2.
The local privacy region is then located from this result: the attention distribution map obtained above is rescaled to the size of the original image, a threshold is set to binarize it, and the minimum enclosing rectangle of the binarized image is taken as the local perception result of the private image.
The beneficial effects of the invention are: end-to-end training and testing are completed on a deep neural network; private and non-private images can be accurately distinguished and the privacy region in an image located, which facilitates selective protection of the private information in the image and provides a good precondition for the privacy protection process. In terms of advancement, the method effectively addresses the low accuracy, poor generalization, and dependence on extra user information of traditional privacy perception methods, and extends privacy perception from whole-image perception to perception of the image privacy region without training an additional neural network model.
Drawings
FIG. 1 is a flowchart of an image privacy perception method based on deep learning according to the invention.
FIG. 2 is a deep convolutional neural network structure diagram of an image privacy perception method based on deep learning.
Detailed Description
The invention is further described with reference to the following description and embodiments in conjunction with the accompanying drawings.
As shown in fig. 1 to 2, a method for image privacy perception based on deep learning mainly includes:
constructing a privacy data set: collecting related images and marking according to privacy and non-privacy;
pre-training a neural network: training a deep convolutional neural network on a large-scale image dataset (e.g., ImageNet);
neural network improvement and training: improving the pre-trained neural network, and finely adjusting the private image data set;
image overall privacy perception: automatically judging whether the input image is a privacy image;
image privacy zone perception: a privacy zone in an image is automatically detected.
In the neural network improvement and training step, the pre-trained convolutional neural network is improved: a bilinear operation layer is added after the last convolutional layer to enhance the feature expression capability of the model, and the fully connected layer is replaced by a pooling layer, laying the foundation for privacy region perception.
Image privacy zone awareness does not require network retraining. According to the invention, a weighted high-level feature map is obtained according to the corresponding relation between the weight of each node of the classification network pooling layer and the feature map subjected to bilinear operation, an attention distribution map is obtained through scale change, and an attention concentrated area is positioned as a privacy area.
The specific implementation mode of each step is as follows:
constructing a privacy data set: in order to improve the data set construction efficiency, the first n alternative images of Baidu and Google image search are selected in a keyword search mode. Keywords mainly relate to categories of certificate photos, family/group photos, file snapshots, etc. In the process of acquiring the keywords, a correlation model (for example, word2vec and GloVe models trained by a large amount of corpora) capable of calculating the similarity between words is used to help generate similar words of the input keywords, so that privacy keywords are increased, and more images can be conveniently searched. Then, the few privacy-independent images obtained by the manual screening search were collected into 4384 privacy images. For non-private images, 200 types of common objects in the ImageNet data set are selected, 4800 images are randomly extracted, and finally, the method comprises the following steps of 1: 1, the training set and the test set are divided, so that the training and the testing of the subsequent neural network are facilitated.
Pre-training a neural network: in this step the deep convolutional neural network is trained on the large-scale ImageNet image data set. ImageNet contains about 1.2 million images covering 1000 classes of common objects. Pre-training is needed because the privacy data set is small while the deep convolutional neural network has many parameters, so training it directly would hardly converge. After pre-training on a large-scale data set, the network starts from good initial weights and already has some feature expression capability, so it can converge quickly on the small data set and reach a better classification result. The pre-trained network is VGG16, a convolutional neural network that currently performs well; VGG16 comprises 16 weight layers (13 convolutional and 3 fully connected) and achieves good results on general classification tasks.
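The pre-train-then-fine-tune idea can be illustrated with a minimal numpy sketch: a frozen feature extractor stands in for the pre-trained VGG16 convolutional stack, and only a new classification head is trained on a small labelled "privacy" set. All dimensions, data, and the learning rate here are invented for illustration; the patent itself uses VGG16 pre-trained on ImageNet.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pre-trained" feature extractor: stands in for the VGG16 conv stack.
# Its weights are never updated during fine-tuning.
W_frozen = rng.normal(size=(64, 16)) / np.sqrt(64)

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)  # frozen features + ReLU

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy "privacy" data set: label 1 (private) vs 0 (non-private).
x = rng.normal(size=(200, 64))
y = (x[:, 0] > 0).astype(int)

# New classification head, trained from scratch on the small privacy set.
W_head = np.zeros((16, 2))
losses = []
for _ in range(200):
    f = extract_features(x)
    p = softmax(f @ W_head)
    losses.append(-np.mean(np.log(p[np.arange(len(y)), y] + 1e-12)))
    grad = f.T @ (p - np.eye(2)[y]) / len(y)
    W_head -= 0.5 * grad  # only the head is updated: transfer learning

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")  # cross-entropy decreases
```

The design point mirrors the patent's motivation: the small privacy data set only has to fit the head's parameters, while the expensive feature extractor keeps the representations learned on the large data set.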
Neural network improvement and training: the pre-trained model is first improved and then fine-tuned on the privacy data set. The main improvements are as follows:
for the pre-trained VGG16 network, a bilinear operation layer is added after the last convolutional layer, so that the characteristic expression capability of the model is enhanced. The bilinear operation layer mainly calculates the feature map after convolutionThe result of dot multiplication between two points is set as the original feature map set M ═ M1,m2,…,mn1The output bilinear feature map set is M '═ M'1,m’2,…,m’n1×n1And then the formula of the conversion is as follows:
Figure GDA0002323726530000052
wherein "
Figure GDA0002323726530000053
"dot product of a representative matrix"
Figure GDA0002323726530000054
"represents rounding up, n1 represents the number of original feature maps, and i represents the subscript of the bilinear feature map.
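The bilinear expansion can be sketched in a few lines of numpy. The exact pairing of the two factors (ceiling for the first index, remainder for the second) is an assumption reconstructed from the surrounding description, since the original formula is reproduced only as an image in the patent:

```python
import numpy as np

def bilinear_maps(feature_maps):
    """Expand n1 feature maps into n1*n1 bilinear maps:
    m'_i = m_{ceil(i/n1)} (element-wise product) m_{((i-1) mod n1)+1}."""
    n1 = len(feature_maps)
    out = []
    for i in range(1, n1 * n1 + 1):   # 1-based index, as in the patent
        a = int(np.ceil(i / n1)) - 1  # first factor (0-based)
        b = (i - 1) % n1              # second factor (0-based)
        out.append(feature_maps[a] * feature_maps[b])
    return out

# n1 = 2 toy feature maps of size 2x2 -> 4 bilinear maps
m = [np.array([[1., 2.], [3., 4.]]), np.array([[0., 1.], [1., 0.]])]
mp = bilinear_maps(m)
print(len(mp))  # 4
```

With n1 = 512 maps in the last VGG16 layer this produces 512 × 512 maps, which is exactly the dimensionality problem the next paragraph addresses.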
However, during system implementation, directly computing the bilinear feature maps of the last convolutional layer leads to a curse of dimensionality. With 512 feature maps in the last layer of the framework, the 512 × 512 bilinear feature maps produced by formula (1) bring a large amount of subsequent computation, so a dimension-reduction operation on the bilinear feature maps is needed. For this, the invention uses the Tensor Sketch algorithm (TS algorithm for short), a Count Sketch-based method for estimating vector outer products. Count Sketch is a data hashing method first used in mining frequent item sets of data streams; Pham et al. later showed that the outer product of two vectors (i.e. the products of every pair of their elements) can be estimated by computing the convolution of their Count Sketches.
Since the feature maps above form a c × w × h tensor while the input of the TS algorithm is a vector, the invention applies the TS algorithm to each spatial position in turn, i.e. processes the w × h c-dimensional vectors separately and remaps them into a w × h × d-dimensional space. First, the parameter sets for the hash operation are randomly generated: h_k ∈ {1, …, d}^c and s_k ∈ {1, −1}^c (k = 1, 2), where h_k stores the remapped indices of the input vector's elements and s_k randomly negates the element values. According to these parameter sets, the remapped Count Sketch vectors are obtained by accumulation. By the convolution theorem, convolution in the time or spatial domain equals the product in the corresponding frequency domain. The convolution of the Count Sketch vectors is therefore computed by transforming the two vectors to the frequency domain with a fast Fourier transform (FFT), multiplying them there, and transforming the product back to the spatial domain by the inverse Fourier transform. (The detailed algorithm listing appears as an image in the original patent.)
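A minimal numpy sketch of the Count Sketch projection and its FFT-based convolution, following the description above; it uses 0-based hash indices instead of the patent's 1-based h_k, and the dimensions c and d are illustrative:

```python
import numpy as np

def count_sketch(x, h, s, d):
    """Project vector x (length c) into R^d: y[h[j]] += s[j] * x[j]."""
    y = np.zeros(d)
    np.add.at(y, h, s * x)  # unbuffered accumulation over hash buckets
    return y

def tensor_sketch(x, h1, s1, h2, s2, d):
    """Approximate the outer product of x with itself in d dimensions:
    circular convolution of two Count Sketches, done in the frequency domain."""
    y1 = count_sketch(x, h1, s1, d)
    y2 = count_sketch(x, h2, s2, d)
    return np.real(np.fft.ifft(np.fft.fft(y1) * np.fft.fft(y2)))

rng = np.random.default_rng(1)
c, d = 512, 64  # e.g. 512 channels sketched down to 64 dimensions
h1, h2 = rng.integers(0, d, c), rng.integers(0, d, c)
s1, s2 = rng.choice([-1, 1], c), rng.choice([-1, 1], c)

x = rng.normal(size=c)
ts = tensor_sketch(x, h1, s1, h2, s2, d)

# sanity check: FFT-based circular convolution matches the direct definition
y1, y2 = count_sketch(x, h1, s1, d), count_sketch(x, h2, s2, d)
direct = np.array([sum(y1[j] * y2[(k - j) % d] for j in range(d)) for k in range(d)])
print(np.allclose(ts, direct))  # True
```

In the full pipeline this routine would be applied once per spatial position, turning the c × w × h activation tensor into a d × w × h one.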
In addition to the bilinear operation layer, the invention also replaces the fully connected layer after the last convolutional layer of the original network with an average pooling layer (Average Pooling). The average pooling layer pools each whole feature map, averaging its elements, finally producing a d-dimensional vector. A pooling layer replaces the fully connected layer because it has no learnable parameters, which greatly reduces the model parameters, speeds up convergence, and to some extent avoids overfitting. It also preserves the correspondence between the feature maps and the pooled feature vector, creating the conditions for subsequently extracting the attention distribution map.
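The global average pooling that stands in for the fully connected layer reduces each map to one scalar (sizes here are illustrative):

```python
import numpy as np

def global_average_pool(feature_maps):
    """Replace the fully connected layer: average each w x h feature map to a
    single scalar, giving one d-dimensional descriptor with no learnable weights."""
    return np.array([fm.mean() for fm in feature_maps])

# d = 3 dimension-reduced bilinear feature maps of size 2x2
p = [np.full((2, 2), v) for v in (1.0, 2.0, 3.0)]
v = global_average_pool(p)
print(v)  # [1. 2. 3.]
```

Because component k of the pooled vector comes from exactly one feature map p_k, the classifier weight attached to that component can later be pushed back onto p_k, which is what makes the attention distribution map possible.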
Image overall privacy perception: this step automatically identifies whether an input image is private. The test image is fed into the trained privacy perception network, and whether it is a private image is determined from the class membership probabilities output by the network.
Image privacy zone perception: this step is used to automatically detect privacy zones in the image. The attention distribution map is extracted mainly through deep convolution characteristics of the network, and the attention focusing area is positioned, so that the perception of the privacy area is completed.
Because the pooling-layer nodes correspond to the feature maps, the attention distribution map can be obtained by a weighted sum of feature maps. Let the dimension-reduced bilinear feature map sequence be P = {p_1, p_2, …, p_d} and the finally generated attention distribution map be A; the calculation formula is:

A = Σ_{k=1}^{d} w_k^{n2} · p_k    (2)

where n2 is the class label assigned to the input image and w_k^{n2} is the connection weight from the k-th pooling-layer node to category n2.
The local privacy region is located from this result as follows: the attention distribution map obtained above is rescaled to the size of the original image, a threshold is set to binarize it, and the minimum enclosing rectangle of the binarized image is taken as the local perception result of the private image.
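The attention-map extraction and localization steps can be sketched in numpy. The nearest-neighbour upscaling and the 0.5·max binarization threshold are illustrative assumptions, not values fixed by the patent:

```python
import numpy as np

def attention_map(p_maps, weights):
    """Formula (2): weighted sum of the d dimension-reduced feature maps,
    using the pooling-to-class connection weights of the predicted class n2."""
    return sum(w * p for w, p in zip(weights, p_maps))

def localize(att_map, scale, thresh_ratio=0.5):
    """Upscale the attention map to the original image size (nearest neighbour),
    binarize at a threshold, and return the minimum enclosing rectangle."""
    big = np.kron(att_map, np.ones((scale, scale)))  # crude scale conversion
    mask = big >= thresh_ratio * big.max()           # image binarization
    rows, cols = np.where(mask)
    return int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max())

# toy example: attention concentrated in the lower-right 2x2 corner of a 4x4 map
att = np.zeros((4, 4)); att[2:, 2:] = 1.0
p_maps = [att, np.zeros((4, 4))]
a = attention_map(p_maps, [1.0, 0.5])  # weights of the predicted class
box = localize(a, scale=8)             # original image assumed 32x32
print(box)  # (16, 31, 16, 31): top, bottom, left, right of the privacy region
```

The returned rectangle is what a downstream privacy-protection step would block or blur.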
The invention has wide application, such as:
first, photo sharing has become an increasingly popular way of communicating in social networks. However, there is a certain safety hazard in photo sharing by users, for example, many people, especially young people, directly share photos that may expose their privacy to a social network without considering their own safety enough, and some lawbreakers may use this information to engage in illegal activities, which undoubtedly brings a certain safety threat to themselves or their relatives and friends. In contrast, if the privacy perception mechanism is used, the privacy which may be related to the picture of the uploader can be reminded in time, and the function of preventing the picture from being overlooked is achieved. In addition, in some cases, the user wishes to mask or blur the areas of the public photograph that are related to privacy. The method for sensing the privacy sensitive area of the image can better solve the problems, can automatically position the privacy area in the image, is convenient for subsequent processing, and avoids manual operation.
Second, cloud storage is now widely used, and cloud platforms collect a large amount of users' personal information, much of it image data. Most cloud platforms are untrusted systems, and leaks of personal data from them are not uncommon. To keep individual privacy from leaking, some companies protect it with encryption or data perturbation, but processing all of the massive image data this way requires a great deal of computing resources. If the method of this invention is used to analyze the image data first, distinguishing the private images or locating the privacy-sensitive regions so that protection can be targeted, the computation overhead can be greatly reduced while information security is preserved.
On one hand the invention remedies defects of existing image privacy perception methods; on the other it extends the privacy perception problem to perception of the image privacy region, meeting different needs. Compared with traditional privacy perception methods, this method learns only from image content and category labels, is not constrained by user-set image tags or access policies, and can therefore work in a variety of application scenarios. Meanwhile, the deep convolutional network offers stronger feature expressiveness than traditional feature extraction, improving the model's classification accuracy and generalization.
The image privacy perception method based on deep learning provided by the invention has the following advantages:
First, a good precondition is provided for image privacy protection. The invention provides an automatic privacy perception mechanism that perceives both whole-image privacy and local image privacy, meeting the diversified needs of image privacy protection. Private images can be selectively protected while the user's privacy is guaranteed, greatly saving the computation overhead of privacy protection.
Second, the privacy perception data set constructed in the invention contains images retrieved with a large privacy corpus, so the model can perceive many common privacy categories, including certificate photos and document snapshots, giving the method strong general applicability.
Third, the training and testing stages are end-to-end (the input is a raw picture, the output is the perception result, with no human intervention in between); the same model completes perception of both the private image and the image privacy region, is convenient to use, and is easy to popularize in various practical application scenarios.
Fourth, an optimization strategy based on bilinear operation is introduced, further improving the feature expression capability over the original model, which helps raise image perception accuracy and greatly facilitates locating the privacy region.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (6)

1. An image privacy perception method based on deep learning is characterized by comprising the following steps:
s1, constructing a privacy classification data set with class labels, and training a privacy perception network by using a transfer learning method;
s2, completing the identification of the private image by using a deep convolutional neural network facing to privacy perception;
s3, extracting an attention distribution map according to the deep convolution characteristics of the neural network, and positioning an attention concentration area to finish sensing an image privacy area;
step S1 includes: pre-training a deep convolutional neural network model on a large-scale image data set, then constructing a privacy classification data set, and finely adjusting the pre-trained deep convolutional neural network model on the privacy classification data set;
step S2 includes: adding a bilinear operation layer after the last convolution layer of the deep convolution neural network, wherein the bilinear operation layer is used for calculating the result of dot product between two feature images after convolution, enhancing the feature expression capability of a deep convolution neural network model, and simultaneously changing a full-connection layer into an average pooling layer, and is used for pooling the whole feature image, averaging the elements of each feature image and finally obtaining a d-dimensional vector;
step S3 includes: and obtaining a weighted high-level feature map as an attention distribution map according to the corresponding relation between the weight of each node of the average pooling layer and the feature map subjected to bilinear operation, and positioning the privacy area in the original map through scale conversion.
2. The image privacy perception method based on deep learning of claim 1, wherein:
the bilinear operation layer mainly computes the dot product between the convolved feature maps;
let the original feature map set be M = {m_1, m_2, …, m_n1},
and the output bilinear feature map set be M′ = {m′_1, m′_2, …, m′_{n1×n1}};
the formula for the conversion is as follows:

m′_i = m_{⌈i/n1⌉} ⊙ m_{((i−1) mod n1)+1}

wherein ⊙ represents the dot product of matrices, ⌈·⌉ represents rounding up, n1 represents the number of original feature maps, and i represents the subscript of the bilinear feature map.
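The pairwise dot products of claim 2 can be sketched in NumPy as below. The index convention (map i pairs the ⌈i/n1⌉-th map with the ((i−1) mod n1)+1-th) is one plausible reading of the claim's rounded-up index, reconstructed from the garbled formula, so treat it as an assumption.

```python
import numpy as np

def bilinear_maps(feats):
    """Element-wise products of every ordered pair of feature maps.

    feats: array of shape (n1, h, w).
    Returns an array of shape (n1*n1, h, w), where output map i
    (1-indexed) is feats[ceil(i/n1)-1] * feats[(i-1) % n1].
    """
    n1 = feats.shape[0]
    out = []
    for i in range(1, n1 * n1 + 1):
        a = feats[int(np.ceil(i / n1)) - 1]  # index obtained by rounding up
        b = feats[(i - 1) % n1]              # cycles through all n1 maps
        out.append(a * b)
    return np.stack(out)
```

The output grows quadratically in n1, which is why the later claims apply dimensionality reduction to the bilinear maps.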
3. The image privacy perception method based on deep learning of claim 2, wherein: and performing dimension reduction operation on the bilinear feature map.
4. The image privacy perception method based on deep learning of claim 3, wherein: and (5) performing dimensionality reduction on the bilinear feature map by adopting a TensorSketch algorithm.
5. The image privacy perception method based on deep learning of claim 4, wherein: the bilinear feature map is a matrix of c × w × h, and the input of the Tensor Sketch algorithm is a vector, so each position in the bilinear feature map is processed in sequence when the Tensor Sketch algorithm is used, namely the w × h c-dimensional vectors are respectively transformed and remapped into a w × h × d-dimensional space; firstly, parameter sets h_k ∈ {1, …, d}^c and s_k ∈ {1, −1}^c (k = 1, 2) for the hash operation are randomly generated, where h_k stores the remapped indices of the input vector and s_k realizes random negation of each element value of the input vector; according to these parameter sets, the remapped Count Sketch vectors are obtained through accumulation; by the convolution theorem, convolution in the time or spatial domain equals the product in the corresponding frequency domain; therefore, the two Count Sketch vectors are converted into the frequency domain by fast Fourier transform, their product in the frequency domain is obtained, and the product is then converted back into the spatial domain by inverse Fourier transform, yielding the convolution of the Count Sketch vectors.
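The Count Sketch accumulation and the FFT-based convolution of claim 5 can be sketched as follows. This is an illustrative NumPy version: it uses 0-based hash indices {0, …, d−1} rather than the claim's {1, …, d}, and the parameter sets are passed in explicitly rather than generated inside.

```python
import numpy as np

def count_sketch(x, h, s, d):
    """Count Sketch of vector x: s[j]*x[j] is accumulated at index h[j]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)  # unbuffered add handles repeated hash indices
    return out

def tensor_sketch(x, y, h1, s1, h2, s2, d):
    """d-dimensional sketch of the outer product of x and y.

    Per the convolution theorem, the circular convolution of the two
    Count Sketch vectors is computed as a product in the frequency
    domain (FFT, multiply, inverse FFT).
    """
    cs1 = count_sketch(x, h1, s1, d)
    cs2 = count_sketch(y, h2, s2, d)
    return np.real(np.fft.ifft(np.fft.fft(cs1) * np.fft.fft(cs2)))
```

One convenient sanity check: the elements of a circular convolution sum to the product of the two input sums, which holds exactly regardless of the random hash parameters.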
6. The image privacy perception method based on deep learning of claim 5, wherein: each node of the average pooling layer corresponds to a feature map, and the attention distribution map is obtained through the weighted summation of the feature maps;
let the dimension-reduced bilinear feature map sequence be P = {p_1, p_2, …, p_d},
and let the finally generated attention distribution map be A; the calculation formula is as follows:

A = Σ_{k=1}^{d} w_k^{n2} · p_k

where n2 is the class label to which the input image is classified, and w_k^{n2} represents the connection weight of the k-th node of the average pooling layer corresponding to category n2;
the local privacy region of the image is then located according to this result; the specific method is to scale the attention distribution map obtained in the above step to the size of the original image, set a threshold to complete image binarization, and take the minimum bounding rectangle of the binarized image as the local perception result of the privacy image.
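The weighted summation and localization of claim 6 can be sketched as below. This is a minimal NumPy illustration under stated assumptions: integer-factor nearest-neighbour upsampling stands in for the claim's unspecified scale conversion, and the threshold is taken as a fraction of the map's maximum (the claim sets a threshold but does not fix how it is chosen).

```python
import numpy as np

def attention_map(feature_maps, weights):
    """Weighted sum A = sum_k w_k * p_k over the pooled feature maps.

    feature_maps: shape (d, h, w); weights: shape (d,), the average
    pooling weights for the predicted category.
    """
    return np.tensordot(weights, feature_maps, axes=1)

def locate_private_region(att, scale, thresh_ratio=0.5):
    """Upsample the attention map, binarize it, and return the minimum
    bounding rectangle as (x_min, y_min, x_max, y_max)."""
    up = np.kron(att, np.ones((scale, scale)))  # nearest-neighbour upsampling
    mask = up >= thresh_ratio * up.max()        # threshold-based binarization
    ys, xs = np.where(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()
```

A production system would typically substitute bilinear interpolation and a connected-component analysis here, but the pipeline shape (weighted sum, rescale, threshold, bounding box) matches the claim.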
CN201710928967.XA 2017-10-09 2017-10-09 Image privacy perception method based on deep learning Active CN107704877B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201710928967.XA CN107704877B (en) 2017-10-09 2017-10-09 Image privacy perception method based on deep learning
US16/099,836 US11256952B2 (en) 2017-10-09 2017-11-27 Image privacy perception method based on deep learning
PCT/CN2017/113068 WO2019071754A1 (en) 2017-10-09 2017-11-27 Method for sensing image privacy on the basis of deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710928967.XA CN107704877B (en) 2017-10-09 2017-10-09 Image privacy perception method based on deep learning

Publications (2)

Publication Number Publication Date
CN107704877A CN107704877A (en) 2018-02-16
CN107704877B true CN107704877B (en) 2020-05-29

Family

ID=61184658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710928967.XA Active CN107704877B (en) 2017-10-09 2017-10-09 Image privacy perception method based on deep learning

Country Status (3)

Country Link
US (1) US11256952B2 (en)
CN (1) CN107704877B (en)
WO (1) WO2019071754A1 (en)

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11734567B2 (en) * 2018-02-13 2023-08-22 Samsung Electronics Co., Ltd. Method and system for reducing deep neural network architectures
CN109101523A (en) * 2018-06-14 2018-12-28 北京搜狗科技发展有限公司 A kind of image processing method, device and electronic equipment
CN109145816B (en) * 2018-08-21 2021-01-26 北京京东尚科信息技术有限公司 Commodity identification method and system
CN109376757B (en) * 2018-09-06 2020-09-08 苏州飞搜科技有限公司 Multi-label classification method and system
CN109743580A (en) * 2018-12-24 2019-05-10 秒针信息技术有限公司 A kind of method for processing video frequency and device, storage medium and processor
CN109743579A (en) * 2018-12-24 2019-05-10 秒针信息技术有限公司 A kind of method for processing video frequency and device, storage medium and processor
CN109756842B (en) * 2019-02-19 2020-05-08 山东大学 Wireless indoor positioning method and system based on attention mechanism
CN109993207B (en) * 2019-03-01 2022-10-25 华南理工大学 Image privacy protection method and system based on target detection
CN109993212B (en) * 2019-03-06 2023-06-20 西安电子科技大学 Position privacy protection method in social network picture sharing and social network platform
CN110334571B (en) * 2019-04-03 2022-12-20 复旦大学 Millimeter wave image human body privacy protection method based on convolutional neural network
US10762607B2 (en) 2019-04-10 2020-09-01 Alibaba Group Holding Limited Method and device for sensitive data masking based on image recognition
CN110163218A (en) * 2019-04-10 2019-08-23 阿里巴巴集团控股有限公司 Desensitization process method and device based on image recognition
CN109903289B (en) * 2019-04-17 2023-05-05 广东工业大学 Terahertz image nondestructive testing method, device and equipment
CN110069947B (en) * 2019-04-22 2020-09-15 鹏城实验室 Picture privacy prediction method and device, storage medium and electronic equipment
CN111860068A (en) * 2019-04-30 2020-10-30 四川大学 Fine-grained bird identification method based on cross-layer simplified bilinear network
CN110175469B (en) * 2019-05-16 2020-11-17 山东大学 Social media user privacy leakage detection method, system, device and medium
CN110163177B (en) * 2019-05-28 2022-12-09 李峥嵘 Unmanned aerial vehicle automatic sensing and identifying method for wind turbine generator blades
KR102234097B1 (en) * 2019-07-17 2021-04-01 부산대학교 산학협력단 Image processing method and system for deep-learning
CN110717856A (en) * 2019-09-03 2020-01-21 天津大学 Super-resolution reconstruction algorithm for medical imaging
CN111177757A (en) * 2019-12-27 2020-05-19 支付宝(杭州)信息技术有限公司 Processing method and device for protecting privacy information in picture
CN111639359B (en) * 2020-04-22 2023-09-12 中国科学院计算技术研究所 Method and system for detecting and early warning privacy risk of social network picture
CN111753885B (en) * 2020-06-09 2023-09-01 华侨大学 Privacy enhanced data processing method and system based on deep learning
CN111724424B (en) * 2020-06-24 2024-05-14 上海应用技术大学 Image registration method
CN111784757B (en) * 2020-06-30 2024-01-23 北京百度网讯科技有限公司 Training method of depth estimation model, depth estimation method, device and equipment
CN111814165B (en) * 2020-07-07 2024-01-26 重庆大学 Image privacy protection method based on deep neural network middle layer
CN111967318A (en) * 2020-07-13 2020-11-20 北京邮电大学 Camera-assisted Internet of vehicles wireless communication method based on privacy protection principle
CN112101437B (en) * 2020-09-07 2024-05-31 平安科技(深圳)有限公司 Fine granularity classification model processing method based on image detection and related equipment thereof
CN115114967A (en) * 2020-09-21 2022-09-27 武汉科技大学 Steel microstructure automatic classification method based on self-organization increment-graph convolution neural network
US11704433B2 (en) 2020-09-21 2023-07-18 International Business Machines Corporation Dynamic photograph classification
CN112347512A (en) * 2020-11-13 2021-02-09 支付宝(杭州)信息技术有限公司 Image processing method, device, equipment and storage medium
US12047355B2 (en) * 2021-03-08 2024-07-23 Adobe Inc. Machine learning techniques for mitigating aggregate exposure of identifying information
CN113095989B (en) * 2021-03-31 2023-07-07 西安理工大学 Zero watermark copyright protection algorithm based on image style migration
CN113642717B (en) * 2021-08-31 2024-04-02 西安理工大学 Convolutional neural network training method based on differential privacy
CN113837269A (en) * 2021-09-23 2021-12-24 中国特种设备检测研究院 Metallographic structure identification method based on bilinear convolutional neural network
CN114091651B (en) * 2021-11-03 2024-05-24 支付宝(杭州)信息技术有限公司 Method, device and system for multi-party combined training of graph neural network
CN114821705A (en) * 2022-03-17 2022-07-29 南京邮电大学 Human head posture estimation method based on classification before regression
CN114419719B (en) * 2022-03-29 2022-08-12 北京爱笔科技有限公司 Biological characteristic processing method and device
CN115062331A (en) * 2022-05-22 2022-09-16 北京理工大学 Privacy protection deep learning method based on additive homomorphic encryption
CN114913467A (en) * 2022-06-14 2022-08-16 南京邮电大学 CRNN combined network video privacy protection degree evaluation method for monitoring violent behaviors in home scene
CN115906186B (en) * 2023-02-16 2023-05-16 广州优刻谷科技有限公司 Face image privacy protection method, device and storage medium
CN116721302B (en) * 2023-08-10 2024-01-12 成都信息工程大学 Ice and snow crystal particle image classification method based on lightweight network
CN116740650B (en) * 2023-08-10 2023-10-20 青岛农业大学 Crop breeding monitoring method and system based on deep learning
CN118132141B (en) * 2024-05-06 2024-07-30 西安电子科技大学 Function automatic reconstruction method and device based on code characteristic diagram and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102929B (en) * 2014-07-25 2017-05-03 哈尔滨工业大学 Hyperspectral remote sensing data classification method based on deep learning
US9607217B2 (en) * 2014-12-22 2017-03-28 Yahoo! Inc. Generating preference indices for image content
US10878320B2 (en) * 2015-07-22 2020-12-29 Qualcomm Incorporated Transfer learning in neural networks
EP3913591A1 (en) * 2016-01-29 2021-11-24 KiwiSecurity Software GmbH Methods and apparatus for using video analytics to detect regions for privacy protection within images from moving cameras
CN106295584A (en) * 2016-08-16 2017-01-04 深圳云天励飞技术有限公司 Depth migration study is in the recognition methods of crowd's attribute
CN106778740A (en) * 2016-12-06 2017-05-31 北京航空航天大学 A kind of TFDS non-faulting image detecting methods based on deep learning
CN106682694A (en) * 2016-12-27 2017-05-17 复旦大学 Sensitive image identification method based on depth learning

Also Published As

Publication number Publication date
CN107704877A (en) 2018-02-16
US20210224586A1 (en) 2021-07-22
WO2019071754A1 (en) 2019-04-18
US11256952B2 (en) 2022-02-22

Similar Documents

Publication Publication Date Title
CN107704877B (en) Image privacy perception method based on deep learning
Yang et al. Ml-loo: Detecting adversarial examples with feature attribution
Arietta et al. City forensics: Using visual elements to predict non-visual city attributes
Parisot et al. Property inference attacks on convolutional neural networks: Influence and implications of target model's complexity
Thieltges et al. The devil’s triangle: Ethical considerations on developing bot detection methods
Rehman et al. Face liveness detection using convolutional-features fusion of real and deep network generated face images
Ratre et al. Tucker visual search-based hybrid tracking model and Fractional Kohonen Self-Organizing Map for anomaly localization and detection in surveillance videos
Thakur et al. Machine learning based saliency algorithm for image forgery classification and localization
Chen et al. Image splicing localization using residual image and residual-based fully convolutional network
Uma et al. Copy-move forgery detection of digital images using football game optimization
Wang et al. Suspect multifocus image fusion based on sparse denoising autoencoder neural network for police multimodal big data analysis
Ilyas et al. E-Cap Net: an efficient-capsule network for shallow and deepfakes forgery detection
Sedik et al. AI-enabled digital forgery analysis and crucial interactions monitoring in smart communities
Kumar et al. Image steganography analysis based on deep learning
Mahmood Defocus Blur Segmentation Using Genetic Programming and Adaptive Threshold.
CN112183299B (en) Pedestrian attribute prediction method and device, electronic equipment and storage medium
CN114638356A (en) Static weight guided deep neural network back door detection method and system
Nair et al. Identification of multiple copy-move attacks in digital images using FFT and CNN
Zheng et al. Content-adaptive selective steganographer detection via embedding probability estimation deep networks
Dhar et al. Detecting deepfake images using deep convolutional neural network
Yu et al. Multi-task learning for hand heat trace time estimation and identity recognition
Lakshminarasimha et al. Deep Learning Base Face Anti Spoofing-Convolutional Restricted Basis Neural Network Technique
Kalimuthu et al. Semantic‐Based Facial Image‐Retrieval System with Aid of Adaptive Particle Swarm Optimization and Squared Euclidian Distance
Ramachandra et al. Face presentation attack detection using multi-classifier fusion of off-the-shelf deep features
Liu et al. Hybrid network of convolutional neural network and transformer for deepfake geographic image detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant