CN116310920A - Image privacy prediction method based on scene context awareness - Google Patents

Image privacy prediction method based on scene context awareness

Info

Publication number
CN116310920A
Authority
CN
China
Prior art keywords
privacy
image
information
label
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310270840.9A
Other languages
Chinese (zh)
Inventor
李红波
李钊
袁霖
高新波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202310270840.9A priority Critical patent/CN116310920A/en
Publication of CN116310920A publication Critical patent/CN116310920A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/30 Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/768 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using context analysis, e.g. recognition aided by known co-occurring patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of image processing and specifically relates to an image privacy prediction method based on scene context awareness. The method comprises: obtaining an image to be shared and the scene context information of the image, namely the sharing time, sharing location, and sharing target group at the moment of sharing; constructing a privacy tag prediction network and using it to predict the privacy tags of the image to be shared; and constructing a cross-modal image privacy prediction network based on scene context awareness, into which the image to be shared and its privacy tags are input together with the scene context information of the image, to predict whether the image is a private image. The prediction model completes the privacy prediction task with only two small-scale deep neural network models, is more efficient than the prior art, and supports personalized image privacy settings for different users.

Description

Image privacy prediction method based on scene context awareness
Technical Field
The invention belongs to the field of image processing technology and particularly relates to an image privacy prediction method based on scene context awareness.
Background
With the increasing popularity of smartphones and other mobile devices, high-quality cameras have become commonplace. Capturing pictures and sharing them on social platforms is now a routine part of daily life, and people increasingly share pictures on the internet. Picture sharing happens not only within a circle of friends but also, increasingly, outside the user's social circle as a way to form new social connections. Although current social networking sites allow users to adjust their privacy preferences, this is often a cumbersome task, and most users face difficulties in assigning and managing privacy settings. When these settings are inappropriate, online image sharing can lead to unwanted disclosure and privacy violations. In today's interconnected world, it has therefore become necessary to automatically predict the privacy of a picture and alert users to private or sensitive content before the picture is uploaded to a social networking site. Without proper privacy protection, a shared image can reveal much of a user's personal and social environment and private life, since an image directly shows when and where an event took place, the persons involved, and their relationships. Unfortunately, many people, particularly young users of social networks, often share private photographs of themselves, their friends, and their classmates without being aware of the unnecessary disclosure involved or of the potential impact of such privacy-violating actions on their future lives.
As concern about picture privacy grows, major social networking sites have begun to provide privacy settings that let users manually specify what suits them, such as whether a picture is public, private, or visible only to family or friends. These sites also offer social grouping options, so users can group friends by social circle, assign different labels to different groups, and restrict specific pictures to certain fixed groups. However, lacking privacy expertise, average users find it difficult to configure these settings to reach their desired level of privacy protection. Moreover, given the sheer number of shared images, setting a privacy level for every picture costs a great deal of time, so some users are unwilling to spend time configuring their privacy preferences, skip the privacy-setting process altogether, and pay little attention to their own privacy, ultimately leading to privacy disclosure. Although personal privacy setting methods are gradually maturing, an effective technical means is still needed to ensure that users can protect their privacy when sharing images on social media.
Disclosure of Invention
To address the insufficient accuracy of current mainstream image privacy prediction methods, the invention provides an image privacy prediction method based on scene context awareness, which specifically comprises the following steps:
acquiring an image to be shared and the scene context information of the image, namely the sharing time, sharing location, and sharing target group at the time of sharing;
constructing a privacy tag prediction network and using it to predict the privacy tags of the image to be shared;
constructing a cross-modal image privacy prediction network based on scene context awareness, inputting the image to be shared and its privacy tags into the network together with the scene context information of the image, and predicting whether the image is a private image.
Further, obtaining the sharing time, sharing location, and sharing target group corresponding to a predicted tag based on historical data specifically comprises the following steps:
obtaining the sharing time, sharing location, and sharing target group at the time of image publication;
setting privacy tags, manually annotating the historical data set of each single-attribute tag, and computing privacy scores for each single-attribute tag's historical data set with respect to sharing time, sharing location, and sharing target group;
taking the highest score of each single-tag data set across the three aspects of sharing time, sharing location, and sharing target group as the sensitivity corresponding to that single tag.
Further, if N privacy tags are set, then for the n-th privacy tag, single-tag images corresponding to that tag are collected as the single-tag attribute set for that tag; each image in the set is manually annotated by judging and scoring it with respect to its sharing time, sharing location, and sharing target group, with scores from 0 to 4, where 0 indicates no privacy violation, 1 a slight violation, 2 a violation, 3 a serious violation, and 4 a very serious violation.
Further, the sharing time comprises working time and rest time, the sharing location comprises public places, formal places, and private places, and the sharing target group comprises strangers, people with ordinary relationships, and people with intimate relationships.
Further, the privacy tag prediction network predicts the privacy tags of the image to be shared as follows:
acquiring the user's sensitivity to each tag, where a tag the user considers more private receives a higher sensitivity;
extracting image features from the image shared by the user with a ResNet network, and performing multi-label classification on the image features with a softmax classifier to obtain a confidence for each tag;
multiplying the confidence of each tag by the sensitivity corresponding to that tag, and taking every tag whose product exceeds 1 as a privacy tag of the image, as sketched below.
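As an illustration only, the following is a minimal sketch of how such a first sub-network could be assembled in PyTorch. The ResNet50 backbone (the embodiment later reports best results with ResNet50), the 24-label head, and all identifiers such as `PrivacyLabelPredictor` and `predict_privacy_tags` are assumptions of this sketch, not the patent's reference implementation; the softmax classifier and the confidence-times-sensitivity rule with threshold 1 follow the text above.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_LABELS = 24  # the embodiment uses 24 VISPR privacy categories

class PrivacyLabelPredictor(nn.Module):
    """First sub-network: ResNet feature extractor plus multi-label classifier."""
    def __init__(self, num_labels: int = NUM_LABELS):
        super().__init__()
        backbone = models.resnet50(weights=None)  # backbone choice assumed
        backbone.fc = nn.Identity()               # keep the 2048-d pooled feature
        self.backbone = backbone
        self.classifier = nn.Linear(2048, num_labels)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)             # (B, 2048) image features
        logits = self.classifier(feats)           # (B, num_labels)
        return torch.softmax(logits, dim=-1)      # per-label confidence, per the text

def predict_privacy_tags(confidence: torch.Tensor,
                         sensitivity: torch.Tensor,
                         threshold: float = 1.0) -> torch.Tensor:
    """Keep the labels whose confidence times user sensitivity exceeds 1."""
    return (confidence * sensitivity) > threshold  # sensitivity: user's 0-4 rating
```

For example, with the 0-4 sensitivity scale, a label predicted with confidence 0.3 that the user rates 4 is retained, since 0.3 * 4 = 1.2 > 1, while the same confidence with sensitivity 2 is not (0.6 < 1).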
Further, judging whether an image is a private image with the cross-modal image privacy prediction network based on scene context awareness comprises:
acquiring, from the privacy tags of the image and the scene context information of the image, an affinity matrix representing the degree of association between the two;
using the affinity matrix to extract the salient information of the privacy tags from the image's privacy tags, and the salient information of the scene context from the image's scene context information;
fusing the salient scene-context features with the privacy-tag features to obtain the first privacy tag information, and fusing the salient privacy-tag features with the scene-context features to obtain the privacy feature information;
fusing the first privacy tag information with the image features via cross-attention to obtain the second privacy tag information;
taking the similarity between the image features and the second privacy tag information as the local similarity, the similarity between the image features and the privacy feature information as the global similarity, and the weighted sum of the two as the similarity between the image and the privacy information;
ranking by the similarity between images and privacy information, and judging an image as private when its similarity exceeds a set threshold, otherwise as not private.
Further, the process of obtaining an affinity matrix representing a degree of association between a privacy tag of an image and scene context information of the image includes:
A = (R_a W_a)(R_d W_d)^T

where A is the affinity matrix; R_a is the feature obtained by encoding the privacy tags with word2vec and processing them with a GRU model; R_d is the vector, of the same dimension as the privacy-tag feature, obtained by embedding the scene context information with the sentence2vector model; and W_a, W_d are mapping matrices.
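Restated directly in code as a sketch; the shapes are assumptions, with R_a of shape (G_r, h_a), R_d of shape (T, h_d), and W_a, W_d projecting both into a shared dimension d:

```python
import torch

def affinity_matrix(R_a: torch.Tensor, W_a: torch.Tensor,
                    R_d: torch.Tensor, W_d: torch.Tensor) -> torch.Tensor:
    """A = (R_a W_a)(R_d W_d)^T: pairwise tag-context association scores.

    R_a: (G_r, h_a) privacy-tag features, W_a: (h_a, d),
    R_d: (T, h_d) scene-context features, W_d: (h_d, d);
    returns A of shape (G_r, T).
    """
    return (R_a @ W_a) @ (R_d @ W_d).T
```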
Further, the acquiring process of the first privacy tag information and the privacy feature information includes:
[The three equations here are rendered only as images in the original publication. Per the surrounding description, the affinity matrix A is normalized along the privacy-tag feature dimension to obtain attention weights, the attended tag and context features are computed, and each is concatenated with the original features to yield E_a and E_d.]

where R̃_a denotes the privacy-tag feature associated with the scene context information; G_r is the number of privacy tag categories; R̃_d denotes the scene-context feature associated with the privacy tags; R_a is the vector obtained by word2vec encoding of the privacy tags; R_d is the vector, of the same dimension as the privacy-tag vector, obtained by embedding the scene context information with the sentence2vector model; E_a denotes the first privacy tag information; and E_d denotes the privacy feature information.
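Because the fusion equations appear only as images in the source, the sketch below is an interpretation of the prose: normalize A along one axis for each side, attend, then concatenate. The softmax axes and the pairing of each concatenation with E_a versus E_d are assumptions of this sketch, as is the shared feature width h.

```python
import torch
import torch.nn.functional as F

def coattention_fuse(R_a: torch.Tensor, R_d: torch.Tensor, A: torch.Tensor):
    """Sketch of the described co-attention fusion.

    R_a: (G_r, h) privacy-tag features; R_d: (T, h) scene-context features;
    A:   (G_r, T) affinity matrix from the previous step.
    """
    attn_over_tags = F.softmax(A, dim=0)      # normalize over the tag dimension
    attn_over_ctx = F.softmax(A, dim=1)       # normalize over the context dimension
    R_a_ctx = attn_over_tags.T @ R_a          # (T, h)   tag features tied to the context
    R_d_tag = attn_over_ctx @ R_d             # (G_r, h) context features tied to the tags
    E_a = torch.cat([R_a, R_d_tag], dim=-1)   # (G_r, 2h) first privacy tag information
    E_d = torch.cat([R_d, R_a_ctx], dim=-1)   # (T, 2h)   privacy feature information
    return E_a, E_d
```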
Further, the acquiring process of the second privacy tag information includes:
passing the first privacy tag information through three fully connected layers and then mapping it into a query vector through a trainable matrix;
mapping the image features into key vectors and value vectors through two trainable matrixes respectively;
an attention value is calculated based on the query vector, the key vector, and the value vector, and the attention value is used as second privacy tag information.
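A sketch of this cross-attention step, with the first privacy tag information supplying the query and the image features supplying key and value. Folding the fully connected layers mentioned above into single linear projections, and the head count of 8, are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class TagImageCrossAttention(nn.Module):
    """Fuse first privacy tag information with image features (sketch)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)  # trainable matrix mapping tag info to queries
        self.k_proj = nn.Linear(dim, dim)  # trainable matrices mapping image features
        self.v_proj = nn.Linear(dim, dim)  # to keys and values, respectively
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, E_a: torch.Tensor, E_v: torch.Tensor) -> torch.Tensor:
        Q = self.q_proj(E_a)               # (B, G_r, dim) queries
        K = self.k_proj(E_v)               # (B, N, dim)   keys
        V = self.v_proj(E_v)               # (B, N, dim)   values
        out, _ = self.attn(Q, K, V)        # (B, G_r, dim) second privacy tag information
        return out
```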
Further, the obtaining of the similarity between the image and the privacy information includes:
S(I, M) = w_1 S_local(I, M) + w_2 S_global(I, M)

[The equation for S_local is rendered as an image in the original; per the symbol definitions below, it combines the pooled image features and the second privacy tag information through concatenation, a two-layer perceptron, and a sigmoid activation.]

S_global(I, M) = cosine(Pool(E_v), Pool(E_d))

where S(I, M) denotes the similarity between the image and the privacy information, I denotes the image features, and M denotes the privacy information features; w_1, w_2 are balance factors with w_1 + w_2 = 1; Sigmoid(·) denotes a sigmoid activation layer; MLP(·) denotes a two-layer perceptron; [·‖·] denotes the concatenation operation; Pool(·) denotes the average pooling operation; E_v denotes the image features; E_d denotes the privacy feature information; and Ê_a denotes the second privacy tag information.
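A sketch of the similarity computation under the definitions above. The exact form of the local term is an image in the source, so its concatenation, two-layer perceptron, and sigmoid composition here is an assumption built from the listed ingredients, as are the default weight w1 = 0.5 and the decision threshold.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrivacySimilarity(nn.Module):
    """S(I, M) = w1 * S_local + w2 * S_global, with w1 + w2 = 1 (sketch)."""
    def __init__(self, dim: int, w1: float = 0.5):
        super().__init__()
        self.w1, self.w2 = w1, 1.0 - w1
        self.mlp = nn.Sequential(          # the two-layer perceptron of the text
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, E_v, E_a_hat, E_d):
        # Local similarity: pooled image features vs. second privacy tag information.
        local = torch.sigmoid(self.mlp(torch.cat(
            [E_v.mean(dim=1), E_a_hat.mean(dim=1)], dim=-1))).squeeze(-1)
        # Global similarity: cosine of pooled image and privacy feature information.
        global_ = F.cosine_similarity(E_v.mean(dim=1), E_d.mean(dim=1), dim=-1)
        return self.w1 * local + self.w2 * global_

def is_private(similarity: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Decision rule of the method: similarity above a set threshold means private."""
    return similarity > threshold
```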
According to the invention, by establishing a cross-modal prediction network and jointly modeling the original image and the scene context information with the personal privacy preferences, the prediction performance of image privacy is ensured and machine recognition is made possible, effectively solving the problems that image privacy prediction lacks personalized settings and does not incorporate image context information. The invention has the following specific beneficial effects:
1) Higher accuracy: in image privacy prediction, the method combines image features with scene context information to improve the accuracy of image privacy prediction, and experiments verify that the prediction accuracy of the method exceeds that of methods proposed in related research;
2) Experiments show that the proposed prediction model completes the privacy prediction task with only two small-scale deep neural network models and is therefore highly efficient;
3) Strong usability: the method supports personalized image privacy settings for different users.
Drawings
FIG. 1 is a flow chart of a method for image privacy prediction based on scene context according to the present invention;
FIG. 2 is a cross-modal fusion diagram based on context awareness in the present invention;
fig. 3 is a flow chart of the system of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In this embodiment, the privacy prediction process mainly comprises two sub-networks, and the training process trains both. The first sub-network, the image privacy label prediction network, takes an image as input, extracts image features with a ResNet network, performs multi-label classification with a softmax classifier, and outputs prediction confidences for 24 privacy labels together with the image publisher's sensitivity to the corresponding privacy labels;
the second sub-network is a cross-mode image privacy prediction network based on scene context awareness, as shown in fig. 2, the training process inputs the privacy label predicted by the image, the privacy context information of the image, and outputs the confidence level of the privacy result of the image. The information fusion module is executed by using a cross-attention mechanism, and an affinity matrix of the privacy label and the privacy context is calculated, wherein the affinity matrix represents the association degree between the privacy context and the privacy label, and the calculation formula is as follows:
Figure BDA0004134583550000061
wherein A is an affinity matrix; r is R a The method comprises the steps that a feature of a privacy tag is obtained through word2vec encoding and GRU model processing, namely the privacy tag obtains vector representation of the privacy tag through word2vec encoding, and the vector representation obtains feature representation of the privacy tag after GRU model processing; r is R d Embedding a sense 2vector model for the context information of the scene to obtain a vector with the same temperature as that of the privacy tag; w (W) a 、W d To map the matrices, the purpose of both matrices is to R a 、R d Performing linear transformation to obtain R a Mapping to R d Dimension of (2), R d Mapping to R a Is a dimension of (c).
To obtain features that fuse the privacy labels and the privacy context, the affinity matrix A is normalized along the privacy-label feature dimension to obtain a privacy-label attention matrix specific to the privacy-context features; the privacy-label features associated with the privacy-context features are then obtained, and the privacy-context features associated with the privacy-label features are obtained in the same way. The corresponding features are concatenated along the last dimension to obtain features that fuse the privacy labels with the privacy context information; this fused feature information constitutes the privacy information features. The process is as follows:
[The three equations here are rendered only as images in the original publication. Following the description above, the affinity matrix A is normalized along the privacy-label feature dimension to obtain attention weights, the attended label and context features are computed, and each is concatenated with the original features to yield E_a and E_d.]

where R̃_a denotes the privacy-label feature associated with the scene context information; G_r is the number of privacy label categories; R̃_d denotes the scene-context feature associated with the privacy labels; R_a is the vector obtained by word2vec encoding of the privacy labels; R_d is the vector, of the same dimension as the privacy-label vector, obtained by embedding the scene context information with the sentence2vector model; E_a denotes the first privacy label information; and E_d denotes the privacy feature information.
Next, the image features and the privacy label features fused with the privacy context are mapped to the same dimension with a shared fully connected layer, and multi-head cross-attention is used to explore the latent connection between the image and the privacy labels. The attention process is implemented as follows:

Attn(Q, K, V) = softmax(QK^T / sqrt(d_k)) V

where Q (query), K (key), and V (value) are the three sets of input vectors, the image and privacy features, mapped to the same dimension. The privacy label features are mapped to Q through three different fully connected layers, and the image features are mapped to K and V (each mapping uses a learnable matrix corresponding to Q, K, or V; how Q, K, and V are obtained follows the standard attention mechanism and is known to a person skilled in the art, so it is not repeated here). These features are used to obtain the image features related to the privacy labels: the attention weights are generated from Q and K, and after the multi-head cross-attention process the information relating the image features and the privacy label features can be extracted, Attn(Q, K, V) being the extracted related information. To combine the privacy context information and the privacy labels of the image, the similarity between the image and the privacy information is split into a local similarity and a global similarity: the local similarity is computed between the image features and the second privacy label information, the global similarity between the image features and the privacy feature information, and their weighted sum is taken as the final similarity between the image and the privacy information:
S(I, M) = w_1 S_local(I, M) + w_2 S_global(I, M)

[The equation for S_local is rendered as an image in the original; per the symbol definitions below, it combines the pooled image features and the second privacy tag information through concatenation, a two-layer perceptron, and a sigmoid activation.]

S_global(I, M) = cosine(Pool(E_v), Pool(E_d))

where S(I, M) denotes the similarity between the image and the privacy information, I denotes the image features, and M denotes the privacy information features; w_1, w_2 are balance factors with w_1 + w_2 = 1; Sigmoid(·) denotes a sigmoid activation layer; MLP(·) denotes a two-layer perceptron; [·‖·] denotes the concatenation operation; Pool(·) denotes the average pooling operation; E_v denotes the image features; E_d denotes the privacy feature information; and Ê_a denotes the second privacy tag information.
The invention provides an image privacy prediction method based on scene context awareness, which specifically comprises the following steps:
acquiring an image to be shared and the scene context information of the image, namely the sharing time, sharing location, and sharing target group at the time of sharing;
constructing a privacy tag prediction network and using it to predict the privacy tags of the image to be shared;
constructing a cross-modal image privacy prediction network based on scene context awareness, inputting the image to be shared and its privacy tags into the network together with the scene context information of the image, and predicting whether the image is a private image.
In this embodiment, as shown in fig. 1, the image privacy prediction method based on scene context awareness specifically comprises the following steps:
For an image to be predicted, first collect the context information at the time of publication and perform structured modeling, converting it into data usable by a neural network.
Input the image to be predicted into a neural network for image feature extraction, fuse the scene context information of the image through a cross-attention mechanism, finally fuse the image and the scene context information with a multi-head attention mechanism to compute similarity, and predict the privacy content of the image according to the similarity ranking.
The user's sensitivity to different scene labels is collected, so that by marking the labels they consider strongly private, users can have image privacy predicted in combination with their personal privacy preferences. The construction of the deep image fusion network adopted in this embodiment comprises the following steps:
Step 1), a feature extractor consisting of several convolutional dense blocks is connected to a classifier to construct the prediction model for the image privacy labels;
Step 2), the scene context information of the image is fused through a cross-attention mechanism, and finally the image and the scene context information are fused with a multi-head attention mechanism to compute similarity; image privacy prediction of the image's privacy content is performed according to the similarity ranking.
The method mainly fuses the image privacy labels and the image scene context information with a cross-attention mechanism, uses multi-head attention to fuse the image features with the privacy information features in a cross-modal manner to compute the similarity between the image and the privacy information, and finally ranks the similarities to achieve image privacy prediction.
In this embodiment, two prediction networks with the same structure are selected to construct the deep image fusion network; other networks in the field can also be selected to fuse images and scene context. Moreover, any prior-art network can be used for the fusion, which may occur at the decoder or the encoder; the invention is not limited in this respect.
As a preferred implementation, the deep image fusion network in this embodiment may use sub-classification networks of different types and strengths for each training run, each training producing a corresponding deep image prediction fusion network.
The deep image prediction fusion network is constrained by a loss function; the loss L_Predict between the predicted privacy result and the real result is finally expressed as:

L_Predict = -Σ_{i=1}^{n} y_i log(ŷ_i)

where L_Predict is the prediction cross-entropy loss, which measures the accuracy of the final prediction of the image's privacy degree; y_i is the true distribution of the image, ŷ_i is the network output, and n is the total number of categories, here 2.
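With n = 2 this is ordinary cross-entropy between the predicted privacy distribution and the true label. A one-line PyTorch equivalent (the function name is an assumption):

```python
import torch
import torch.nn.functional as F

def prediction_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Cross-entropy L_Predict for the binary private/public decision."""
    return F.cross_entropy(logits, target)  # target in {0: public, 1: private}
```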
This embodiment also provides a specific label-setting scheme. The context information is divided into three types: sharing time, sharing location, and sharing target group. Because these three types of information cover many categories, the sharing time is divided into rest_time (rest time) and work_time (working time); the sharing location is divided into public_place (public occasions), work_place (work or study places such as offices and classrooms), and rest_place (rest places such as homes and dormitories); and the sharing target group is divided into stranger (strangers), work_partner (work or study partners such as colleagues and supervisors), and family_and_friends (people in close relationships such as family and friends). A structured representation of these categories is sketched below.
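One possible structured modeling of the three context dimensions, using the category names of the taxonomy above; the dataclass layout and the text serialization fed to the sentence embedding model are assumptions of this sketch.

```python
from dataclasses import dataclass
from enum import Enum

class ShareTime(Enum):
    REST_TIME = 0
    WORK_TIME = 1

class SharePlace(Enum):
    PUBLIC_PLACE = 0
    WORK_PLACE = 1   # work or study places such as offices and classrooms
    REST_PLACE = 2   # rest places such as homes and dormitories

class ShareAudience(Enum):
    STRANGER = 0
    WORK_PARTNER = 1         # work or study partners such as colleagues
    FAMILY_AND_FRIENDS = 2   # people in close relationships

@dataclass
class SceneContext:
    """Structured scene context attached to an image before prediction."""
    time: ShareTime
    place: SharePlace
    audience: ShareAudience

    def as_text(self) -> str:
        # A simple serialization that a sentence embedding model could consume.
        return f"{self.time.name} {self.place.name} {self.audience.name}"
```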
The existing privacy image dataset is expanded to establish a dataset containing image context information. The existing VISPR dataset covers 24 privacy categories: body parts, receipts, home addresses, passports, manuscripts, signatures, faces, nudity, identification cards, landmarks, usernames, names, cell phone numbers, driver's licenses, student cards, prescriptions, educational history, ethnicity, tickets, credit cards, fingerprints, disabilities, email addresses, and birthdays. First, MTurk (Amazon Mechanical Turk) is selected as the data collection platform: a person who needs some service (called a Requester) posts a task on the internet, and a person willing to do the task (called a Worker) can accept it and receive a reward. A questionnaire-based data collection method is designed on MTurk. The images with the most evident privacy categories in the dataset are divided into 24 batches of 500 pictures each; each questionnaire contains 24 pictures drawn from different batches, and for each image the questionnaire indicates which privacy category the picture belongs to as a prompt to the Worker. For each picture the Worker answers three simple questions, namely under which sharing time, sharing location, and sharing target group the picture would violate privacy, and scores the privacy of each scenario from 0 to 4, representing the degree of privacy violation: no violation, slight violation, violation, relatively serious violation, and very serious violation. In this way the relation between the 24 privacy labels and the context information is established.
Next, the collected data are compiled and cleaned according to the following criteria: the scores of the different context categories in each batch are tallied to obtain total scores for each batch's context information; the highest-scoring combination of context information, i.e. the combination of sharing time, sharing location, and sharing group, is taken as the privacy context information of that batch, and this privacy context information is manually written into an annotation matched to the image according to its privacy category, completing the privacy context modeling of the image.
In modeling personal privacy preferences, the image publisher's sensitivity to the 24 privacy categories is first collected, with 0 to 4 representing the degree of privacy violation (no violation, slight violation, violation, relatively serious violation, very serious violation); the higher the score, the more sensitive the publisher is to that privacy attribute.
Next, image features are extracted with a ResNet network to train the image privacy label prediction model, which predicts from an image the confidence of each privacy label the image contains.
Finally, the predicted image privacy labels are combined with the collected sensitivities of the image publisher, in the form of predicted-label confidence multiplied by sensitivity score; the privacy labels whose final score exceeds 1 are taken as the output of the first sub-network, completing the modeling that incorporates personal privacy preferences.
The privacy labels annotated on the original image and those predicted for the image in combination with personal privacy preferences, together with the scene context information of the image, are input into the deep image prediction fusion network, which outputs the privacy labels predicted from the image content and the prediction of whether the image is private. The method can rely on a pre-trained image feature extraction model and a deep learning prediction model, so the image feature extraction model does not need to be retrained; predictions can be made directly with the trained models, with good prediction accuracy.
The invention also provides an image privacy prediction system based on scene context information awareness, comprising a scene context information selection module for the time of image publication, an image input preprocessing module, a personal privacy preference setting module, a deep image cross-modal prediction network, and an image prediction result output module. The scene context information selection module inputs the scene context information at the time of publication; the image input preprocessing module preprocesses the input original image to obtain a preprocessed image and performs image content recognition, where the preprocessing methods include but are not limited to image blurring and pixelation; the deep image cross-modal prediction network fuses the features of the original image and the scene context information; and the image prediction result output module outputs the final prediction result.
The method can be used in a system for predicting image privacy; a specific embodiment is shown in fig. 3. Fig. 3 illustrates the system flow of image privacy prediction combining personal privacy preferences and context information: first the user sets personal privacy preferences; the system captures the images to be shared and uploaded to social media together with their context information; the captured image information is combined with the personal privacy preferences to perform image privacy prediction; finally the system judges whether the image is private and outputs the image's privacy labels to remind the user.
This embodiment also provides a specific training process for the deep image prediction network, specifically comprising:
1) Data set and preprocessing
VISPR dataset: contains 22,000 images covering 28 privacy attributes, each image annotated with the privacy attributes it contains, for example whether it contains a face or a mobile phone number. The training set of this dataset is used to train the model in this embodiment, and the test set is used for testing.
The images in the dataset are detected, cropped, and aligned with a pre-trained image processing tool, keeping the size and resolution of all images the same, for example as follows.
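A plausible torchvision pipeline for the size and resolution normalization described above; the 224 by 224 target and the ImageNet normalization statistics are assumptions, since the embodiment does not specify them.

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),         # scale the short side
    transforms.CenterCrop(224),     # identical size for all images
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics (assumed)
                         std=[0.229, 0.224, 0.225]),
])
```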
2) Training of a network
The proposed deep image prediction network is trained on the VISPR training set. Three image feature extraction base models are used in training, generating five training results in total. The three base models are:
·VGG network architecture
·ResNet network architecture
·MobileNetV2 network architecture
Comparing the training results of the three base models shows that the invention achieves the best results with the ResNet50 network structure. The feature extraction model for the image privacy labels is:
·word2vec
The feature extraction model for the image privacy context is:
·sentence2vector
the invention also provides an image privacy prediction system based on scene context awareness, as shown in fig. 3, which is used for realizing an image privacy prediction method based on scene context awareness, taking images shared by users and scene information generated by sharing the images as input of the West perpetual, combining privacy preference customized by users, and judging by the system
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. An image privacy prediction method based on scene context awareness, characterized by comprising the following steps:
acquiring an image to be shared and the scene context information of the image, namely the sharing time, sharing location, and sharing target group at the time of sharing;
constructing a privacy tag prediction network and using it to predict the privacy tags of the image to be shared;
constructing a cross-modal image privacy prediction network based on scene context awareness, inputting the image to be shared and its privacy tags into the network together with the scene context information of the image, and predicting whether the image is a private image.
2. The image privacy prediction method based on scene context awareness according to claim 1, wherein obtaining the sharing time, sharing location, and sharing target group corresponding to a predicted tag based on historical data specifically comprises the following steps:
obtaining the sharing time, sharing location, and sharing target group at the time of image publication;
setting privacy tags, manually annotating the historical data set of each single-attribute tag, and computing privacy scores for each single-attribute tag's historical data set with respect to sharing time, sharing location, and sharing target group;
taking the highest score of each single-tag data set across the three aspects of sharing time, sharing location, and sharing target group as the sensitivity corresponding to that single tag.
3. The image privacy prediction method based on scene context awareness according to claim 2, wherein if N privacy tags are set, then for the n-th privacy tag, single-tag images corresponding to that tag are collected as the single-tag attribute set for that tag; each image in the set is manually annotated by judging and scoring it with respect to its sharing time, sharing location, and sharing target group, with scores from 0 to 4, where 0 indicates no privacy violation, 1 a slight violation, 2 a violation, 3 a serious violation, and 4 a very serious violation.
4. The image privacy prediction method based on scene context awareness according to claim 2 or 3, wherein the sharing time comprises working time and rest time, the sharing location comprises public places, formal places, and private places, and the sharing target group comprises strangers, people with ordinary relationships, and people with intimate relationships.
5. The image privacy prediction method based on scene context awareness according to claim 1, wherein the privacy tag prediction network predicts the privacy tags of the image to be shared as follows:
acquiring the user's sensitivity to each tag, where a tag the user considers more private receives a higher sensitivity;
extracting image features from the image shared by the user with a ResNet network, and performing multi-label classification on the image features with a softmax classifier to obtain a confidence for each tag;
multiplying the confidence of each tag by the sensitivity corresponding to that tag, and taking every tag whose product exceeds 1 as a privacy tag of the image.
6. The image privacy prediction method based on scene context awareness according to claim 1, wherein judging whether an image is a private image with the cross-modal image privacy prediction network based on scene context awareness comprises:
acquiring, from the privacy tags of the image and the scene context information of the image, an affinity matrix representing the degree of association between the two;
using the affinity matrix to extract the salient information of the privacy tags from the image's privacy tags, and the salient information of the scene context from the image's scene context information;
fusing the salient scene-context features with the privacy-tag features to obtain the first privacy tag information, and fusing the salient privacy-tag features with the scene-context features to obtain the privacy feature information;
fusing the first privacy tag information with the image features via cross-attention to obtain the second privacy tag information;
taking the similarity between the image features and the second privacy tag information as the local similarity, the similarity between the image features and the privacy feature information as the global similarity, and the weighted sum of the two as the similarity between the image and the privacy information;
ranking by the similarity between images and privacy information, and judging an image as private when its similarity exceeds a set threshold, otherwise as not private.
7. The method of image privacy prediction based on scene context awareness according to claim 6, wherein the process of obtaining an affinity matrix representing a degree of association between a privacy tag of an image and scene context information of the image comprises:
A = (R_a W_a)(R_d W_d)^T

where A is the affinity matrix; R_a is the feature obtained by encoding the privacy tags with word2vec and processing them with a GRU model; R_d is the vector, of the same dimension as the privacy-tag feature, obtained by embedding the scene context information with the sentence2vector model; and W_a, W_d are mapping matrices.
8. The method for image privacy prediction based on scene context awareness as defined in claim 6, wherein the acquiring of the first privacy tag information and the privacy feature information comprises:
[The three equations of this claim are rendered only as images in the original publication; they compute, from the affinity matrix, the attended tag and context features and concatenate them into E_a and E_d as described in claim 6.]

where R̃_a denotes the privacy-tag feature associated with the scene context information; G_r is the number of privacy tag categories; R̃_d denotes the scene-context feature associated with the privacy tags; R_a is the vector obtained by word2vec encoding of the privacy tags; R_d is the vector, of the same dimension as the privacy-tag vector, obtained by embedding the scene context information with the sentence2vector model; E_a denotes the first privacy tag information; and E_d denotes the privacy feature information.
9. The method for image privacy prediction based on scene context awareness as in claim 6, wherein the obtaining the second privacy tag information comprises:
passing the first privacy tag information through three fully connected layers and then mapping it into a query vector through a trainable matrix;
mapping the image features into key vectors and value vectors through two trainable matrixes respectively;
an attention value is calculated based on the query vector, the key vector, and the value vector, and the attention value is used as second privacy tag information.
10. The method of image privacy prediction based on scene context awareness according to claim 6, wherein the obtaining of the similarity between the image and the privacy information comprises:
S(I, M) = w_1 S_local(I, M) + w_2 S_global(I, M)

[The equation for S_local is rendered as an image in the original; per the symbol definitions below, it combines the pooled image features and the second privacy tag information through concatenation, a two-layer perceptron, and a sigmoid activation.]

S_global(I, M) = cosine(Pool(E_v), Pool(E_d))

where S(I, M) denotes the similarity between the image and the privacy information, I denotes the image features, and M denotes the privacy information features; w_1, w_2 are balance factors with w_1 + w_2 = 1; Sigmoid(·) denotes a sigmoid activation layer; MLP(·) denotes a two-layer perceptron; [·‖·] denotes the concatenation operation; Pool(·) denotes the average pooling operation; E_v denotes the image features; E_d denotes the privacy feature information; and Ê_a denotes the second privacy tag information.
CN202310270840.9A 2023-03-20 2023-03-20 Image privacy prediction method based on scene context awareness Pending CN116310920A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310270840.9A CN116310920A (en) 2023-03-20 2023-03-20 Image privacy prediction method based on scene context awareness

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310270840.9A CN116310920A (en) 2023-03-20 2023-03-20 Image privacy prediction method based on scene context awareness

Publications (1)

Publication Number Publication Date
CN116310920A true CN116310920A (en) 2023-06-23

Family

ID=86781213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310270840.9A Pending CN116310920A (en) 2023-03-20 2023-03-20 Image privacy prediction method based on scene context awareness

Country Status (1)

Country Link
CN (1) CN116310920A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117094032A (en) * 2023-10-17 2023-11-21 成都乐超人科技有限公司 User information encryption method and system based on privacy protection
CN117094032B (en) * 2023-10-17 2024-02-09 成都乐超人科技有限公司 User information encryption method and system based on privacy protection

Similar Documents

Publication Publication Date Title
KR102106462B1 (en) Method for filtering similar problem based on weight
CN104715023B (en) Method of Commodity Recommendation based on video content and system
US10331729B2 (en) System and method for accessing electronic data via an image search engine
CN110276366A (en) Carry out test object using Weakly supervised model
US20150103097A1 (en) Method and Device for Implementing Augmented Reality Application
US20160070809A1 (en) System and method for accessing electronic data via an image search engine
Qi et al. An investigation of the visual features of urban street vitality using a convolutional neural network
US20220286438A1 (en) Machine learning techniques for mitigating aggregate exposure of identifying information
JP2010218373A (en) Server system, terminal apparatus, program, information storage medium, and image retrieving method
CN113792871A (en) Neural network training method, target identification method, device and electronic equipment
US20160034496A1 (en) System And Method For Accessing Electronic Data Via An Image Search Engine
WO2019200737A1 (en) Real estate data uploading method and apparatus, computer device, and storage medium
CN116310920A (en) Image privacy prediction method based on scene context awareness
Cho et al. Classifying tourists’ photos and exploring tourism destination image using a deep learning model
CN111552865A (en) User interest portrait method and related equipment
Cuesta-Valiño et al. The effects of the aesthetics and composition of hotels’ digital photo images on online booking decisions
CN114372580A (en) Model training method, storage medium, electronic device, and computer program product
Al Qudah et al. Using Artificial Intelligence applications for E-Government services as iris recognition
Ramesh et al. Facial recognition as a tool to identify Roman emperors: towards a new methodology
JP6896608B2 (en) Information presentation devices, methods and programs
CN109614547A (en) The method and apparatus of preferred watermark for electric terminal
US20130273969A1 (en) Mobile app that generates a dog sound to capture data for a lost pet identifying system
CN114722280A (en) User portrait based course recommendation method, device, equipment and storage medium
JP4752628B2 (en) Drawing search system, drawing search method, and drawing search terminal
CN113378859A (en) Interpretable image privacy detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination