CN116310920A - Image privacy prediction method based on scene context awareness - Google Patents

Image privacy prediction method based on scene context awareness

Info

Publication number
CN116310920A
Authority
CN
China
Prior art keywords
privacy
image
information
label
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310270840.9A
Other languages
Chinese (zh)
Inventor
李红波
李钊
袁霖
高新波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202310270840.9A priority Critical patent/CN116310920A/en
Publication of CN116310920A publication Critical patent/CN116310920A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/30 Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/768 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using context analysis, e.g. recognition aided by known co-occurring patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of image processing and specifically relates to an image privacy prediction method based on scene context awareness. The method comprises: obtaining an image to be shared and the scene context information of the image, namely the sharing time, sharing location, and sharing target group at the moment of sharing; constructing a privacy tag prediction network and using it to predict the privacy tags of the image to be shared; and constructing a cross-modal image privacy prediction network based on scene context awareness, into which the image to be shared and its privacy tags are input together with the scene context information of the image, to predict whether the image is a private image. The prediction model completes the privacy prediction task with only two small-scale deep neural network models, is more efficient than the prior art, and supports personalized image privacy settings for different users.

Description

Image privacy prediction method based on scene context awareness
Technical Field
The invention belongs to the field of image processing technology and particularly relates to an image privacy prediction method based on scene context awareness.
Background
With the increasing popularity of smartphones and other mobile devices, high-quality cameras have become commonplace. Capturing pictures and sharing them on social platforms is now a routine part of daily life, and people increasingly share pictures on the internet. Picture sharing happens not only within a circle of friends but also, increasingly, outside the user's social circle as a way to form new social connections. Although current social networking sites allow users to adjust their privacy preferences, this is often a cumbersome task, and most users face difficulties in assigning and managing privacy settings. When these settings are inappropriate, online image sharing can lead to unwanted disclosure and privacy violations. In today's interconnected world, it has therefore become necessary to automatically predict the privacy of a picture and alert users to private or sensitive content before the picture is uploaded to a social networking site. Without proper privacy protection, a shared image can reveal much of a user's personal and social environment and private life, since an image directly shows when and where an event took place, the persons involved, and their relationships. Unfortunately, many people, particularly young users of social networks, often share private photographs of themselves, their friends, and their classmates without being aware of the unnecessary disclosure involved or of the potential impact of such privacy-violating actions on their future lives.
As concern about picture privacy grows, major social networking sites have begun to provide privacy settings that let users manually specify what suits them, such as whether a picture is public, private, or visible only to family or friends. These sites also offer social grouping options, so users can group friends by social circle, assign different labels to different groups, and restrict specific pictures to certain fixed groups. However, lacking privacy expertise, average users find it difficult to configure these settings to reach their desired level of privacy protection. Moreover, given the sheer number of shared images, setting a privacy level for every picture costs a great deal of time, so some users are unwilling to spend time configuring their privacy preferences, skip the privacy-setting process altogether, and pay little attention to their own privacy, ultimately leading to privacy disclosure. Although personal privacy setting methods are gradually maturing, an effective technical means is still needed to ensure that users can protect their privacy when sharing images on social media.
Disclosure of Invention
To address the insufficient accuracy of current mainstream image privacy prediction methods, the invention provides an image privacy prediction method based on scene context awareness, which specifically comprises the following steps:
acquiring an image to be shared and the scene context information of the image, namely the sharing time, sharing location, and sharing target group at the time of sharing;
constructing a privacy tag prediction network and using it to predict the privacy tags of the image to be shared;
constructing a cross-modal image privacy prediction network based on scene context awareness, inputting the image to be shared and its privacy tags into the network together with the scene context information of the image, and predicting whether the image is a private image.
Further, obtaining the sharing time, sharing location, and sharing target group corresponding to a predicted tag based on historical data specifically comprises the following steps:
obtaining the sharing time, sharing location, and sharing target group at the time of image publication;
setting privacy tags, manually annotating the historical data set of each single-attribute tag, and computing privacy scores for each single-attribute tag's historical data set with respect to sharing time, sharing location, and sharing target group;
taking the highest score of each single-tag data set across the three aspects of sharing time, sharing location, and sharing target group as the sensitivity corresponding to that single tag.
Further, if N privacy tags are set, then for the n-th privacy tag, single-tag images corresponding to that tag are collected as the single-tag attribute set for that tag; each image in the set is manually annotated by judging and scoring it with respect to its sharing time, sharing location, and sharing target group, with scores from 0 to 4, where 0 indicates no privacy violation, 1 a slight violation, 2 a violation, 3 a serious violation, and 4 a very serious violation.
Further, the sharing time comprises working time and rest time, the sharing location comprises public places, formal places, and private places, and the sharing target group comprises strangers, people with ordinary relationships, and people with intimate relationships.
Further, the privacy tag prediction network predicts the privacy tags of the image to be shared as follows:
acquiring the user's sensitivity to each tag, where a tag the user considers more private receives a higher sensitivity;
extracting image features from the image shared by the user with a ResNet network, and performing multi-label classification on the image features with a softmax classifier to obtain a confidence for each tag;
multiplying the confidence of each tag by the sensitivity corresponding to that tag, and taking every tag whose product exceeds 1 as a privacy tag of the image, as sketched below.
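As an illustration only, the following is a minimal sketch of how such a first sub-network could be assembled in PyTorch. The ResNet50 backbone (the embodiment later reports best results with ResNet50), the 24-label head, and all identifiers such as `PrivacyLabelPredictor` and `predict_privacy_tags` are assumptions of this sketch, not the patent's reference implementation; the softmax classifier and the confidence-times-sensitivity rule with threshold 1 follow the text above.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_LABELS = 24  # the embodiment uses 24 VISPR privacy categories

class PrivacyLabelPredictor(nn.Module):
    """First sub-network: ResNet feature extractor plus multi-label classifier."""
    def __init__(self, num_labels: int = NUM_LABELS):
        super().__init__()
        backbone = models.resnet50(weights=None)  # backbone choice assumed
        backbone.fc = nn.Identity()               # keep the 2048-d pooled feature
        self.backbone = backbone
        self.classifier = nn.Linear(2048, num_labels)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)             # (B, 2048) image features
        logits = self.classifier(feats)           # (B, num_labels)
        return torch.softmax(logits, dim=-1)      # per-label confidence, per the text

def predict_privacy_tags(confidence: torch.Tensor,
                         sensitivity: torch.Tensor,
                         threshold: float = 1.0) -> torch.Tensor:
    """Keep the labels whose confidence times user sensitivity exceeds 1."""
    return (confidence * sensitivity) > threshold  # sensitivity: user's 0-4 rating
```

For example, with the 0-4 sensitivity scale, a label predicted with confidence 0.3 that the user rates 4 is retained, since 0.3 * 4 = 1.2 > 1, while the same confidence with sensitivity 2 is not (0.6 < 1).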
Further, judging whether an image is a private image with the cross-modal image privacy prediction network based on scene context awareness comprises:
acquiring, from the privacy tags of the image and the scene context information of the image, an affinity matrix representing the degree of association between the two;
using the affinity matrix to extract the salient information of the privacy tags from the image's privacy tags, and the salient information of the scene context from the image's scene context information;
fusing the salient scene-context features with the privacy-tag features to obtain the first privacy tag information, and fusing the salient privacy-tag features with the scene-context features to obtain the privacy feature information;
fusing the first privacy tag information with the image features via cross-attention to obtain the second privacy tag information;
taking the similarity between the image features and the second privacy tag information as the local similarity, the similarity between the image features and the privacy feature information as the global similarity, and the weighted sum of the two as the similarity between the image and the privacy information;
ranking by the similarity between images and privacy information, and judging an image as private when its similarity exceeds a set threshold, otherwise as not private.
Further, the process of obtaining an affinity matrix representing a degree of association between a privacy tag of an image and scene context information of the image includes:
A = (R_a W_a)(R_d W_d)^T

where A is the affinity matrix; R_a is the feature obtained by encoding the privacy tags with word2vec and processing them with a GRU model; R_d is the vector, of the same dimension as the privacy-tag feature, obtained by embedding the scene context information with the sentence2vector model; and W_a, W_d are mapping matrices.
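Restated directly in code as a sketch; the shapes are assumptions, with R_a of shape (G_r, h_a), R_d of shape (T, h_d), and W_a, W_d projecting both into a shared dimension d:

```python
import torch

def affinity_matrix(R_a: torch.Tensor, W_a: torch.Tensor,
                    R_d: torch.Tensor, W_d: torch.Tensor) -> torch.Tensor:
    """A = (R_a W_a)(R_d W_d)^T: pairwise tag-context association scores.

    R_a: (G_r, h_a) privacy-tag features, W_a: (h_a, d),
    R_d: (T, h_d) scene-context features, W_d: (h_d, d);
    returns A of shape (G_r, T).
    """
    return (R_a @ W_a) @ (R_d @ W_d).T
```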
Further, the acquiring process of the first privacy tag information and the privacy feature information includes:
[The three equations here are rendered only as images in the original publication. Per the surrounding description, the affinity matrix A is normalized along the privacy-tag feature dimension to obtain attention weights, the attended tag and context features are computed, and each is concatenated with the original features to yield E_a and E_d.]

where R̃_a denotes the privacy-tag feature associated with the scene context information; G_r is the number of privacy tag categories; R̃_d denotes the scene-context feature associated with the privacy tags; R_a is the vector obtained by word2vec encoding of the privacy tags; R_d is the vector, of the same dimension as the privacy-tag vector, obtained by embedding the scene context information with the sentence2vector model; E_a denotes the first privacy tag information; and E_d denotes the privacy feature information.
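Because the fusion equations appear only as images in the source, the sketch below is an interpretation of the prose: normalize A along one axis for each side, attend, then concatenate. The softmax axes and the pairing of each concatenation with E_a versus E_d are assumptions of this sketch, as is the shared feature width h.

```python
import torch
import torch.nn.functional as F

def coattention_fuse(R_a: torch.Tensor, R_d: torch.Tensor, A: torch.Tensor):
    """Sketch of the described co-attention fusion.

    R_a: (G_r, h) privacy-tag features; R_d: (T, h) scene-context features;
    A:   (G_r, T) affinity matrix from the previous step.
    """
    attn_over_tags = F.softmax(A, dim=0)      # normalize over the tag dimension
    attn_over_ctx = F.softmax(A, dim=1)       # normalize over the context dimension
    R_a_ctx = attn_over_tags.T @ R_a          # (T, h)   tag features tied to the context
    R_d_tag = attn_over_ctx @ R_d             # (G_r, h) context features tied to the tags
    E_a = torch.cat([R_a, R_d_tag], dim=-1)   # (G_r, 2h) first privacy tag information
    E_d = torch.cat([R_d, R_a_ctx], dim=-1)   # (T, 2h)   privacy feature information
    return E_a, E_d
```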
Further, the acquiring process of the second privacy tag information includes:
passing the first privacy tag information through three fully connected layers and then mapping it into a query vector through a trainable matrix;
mapping the image features into key vectors and value vectors through two trainable matrixes respectively;
an attention value is calculated based on the query vector, the key vector, and the value vector, and the attention value is used as second privacy tag information.
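A sketch of this cross-attention step, with the first privacy tag information supplying the query and the image features supplying key and value. Folding the fully connected layers mentioned above into single linear projections, and the head count of 8, are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class TagImageCrossAttention(nn.Module):
    """Fuse first privacy tag information with image features (sketch)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)  # trainable matrix mapping tag info to queries
        self.k_proj = nn.Linear(dim, dim)  # trainable matrices mapping image features
        self.v_proj = nn.Linear(dim, dim)  # to keys and values, respectively
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, E_a: torch.Tensor, E_v: torch.Tensor) -> torch.Tensor:
        Q = self.q_proj(E_a)               # (B, G_r, dim) queries
        K = self.k_proj(E_v)               # (B, N, dim)   keys
        V = self.v_proj(E_v)               # (B, N, dim)   values
        out, _ = self.attn(Q, K, V)        # (B, G_r, dim) second privacy tag information
        return out
```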
Further, the obtaining of the similarity between the image and the privacy information includes:
S(I, M) = w_1 S_local(I, M) + w_2 S_global(I, M)

[The equation for S_local is rendered as an image in the original; per the symbol definitions below, it combines the pooled image features and the second privacy tag information through concatenation, a two-layer perceptron, and a sigmoid activation.]

S_global(I, M) = cosine(Pool(E_v), Pool(E_d))

where S(I, M) denotes the similarity between the image and the privacy information, I denotes the image features, and M denotes the privacy information features; w_1, w_2 are balance factors with w_1 + w_2 = 1; Sigmoid(·) denotes a sigmoid activation layer; MLP(·) denotes a two-layer perceptron; [·‖·] denotes the concatenation operation; Pool(·) denotes the average pooling operation; E_v denotes the image features; E_d denotes the privacy feature information; and Ê_a denotes the second privacy tag information.
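A sketch of the similarity computation under the definitions above. The exact form of the local term is an image in the source, so its concatenation, two-layer perceptron, and sigmoid composition here is an assumption built from the listed ingredients, as are the default weight w1 = 0.5 and the decision threshold.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrivacySimilarity(nn.Module):
    """S(I, M) = w1 * S_local + w2 * S_global, with w1 + w2 = 1 (sketch)."""
    def __init__(self, dim: int, w1: float = 0.5):
        super().__init__()
        self.w1, self.w2 = w1, 1.0 - w1
        self.mlp = nn.Sequential(          # the two-layer perceptron of the text
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, E_v, E_a_hat, E_d):
        # Local similarity: pooled image features vs. second privacy tag information.
        local = torch.sigmoid(self.mlp(torch.cat(
            [E_v.mean(dim=1), E_a_hat.mean(dim=1)], dim=-1))).squeeze(-1)
        # Global similarity: cosine of pooled image and privacy feature information.
        global_ = F.cosine_similarity(E_v.mean(dim=1), E_d.mean(dim=1), dim=-1)
        return self.w1 * local + self.w2 * global_

def is_private(similarity: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Decision rule of the method: similarity above a set threshold means private."""
    return similarity > threshold
```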
According to the invention, by establishing a cross-modal prediction network and jointly modeling the original image and the scene context information with the personal privacy preferences, the prediction performance of image privacy is ensured and machine recognition is made possible, effectively solving the problems that image privacy prediction lacks personalized settings and does not incorporate image context information. The invention has the following specific beneficial effects:
1) Higher accuracy: in image privacy prediction, the method combines image features with scene context information to improve the accuracy of image privacy prediction, and experiments verify that the prediction accuracy of the method exceeds that of methods proposed in related research;
2) Experiments show that the proposed prediction model completes the privacy prediction task with only two small-scale deep neural network models and is therefore highly efficient;
3) Strong usability: the method supports personalized image privacy settings for different users.
Drawings
FIG. 1 is a flow chart of a method for image privacy prediction based on scene context according to the present invention;
FIG. 2 is a cross-modal fusion diagram based on context awareness in the present invention;
fig. 3 is a flow chart of the system of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In this embodiment, the privacy prediction process mainly comprises two sub-networks, and the training process trains both. The first sub-network, the image privacy label prediction network, takes an image as input, extracts image features with a ResNet network, performs multi-label classification with a softmax classifier, and outputs prediction confidences for 24 privacy labels together with the image publisher's sensitivity to the corresponding privacy labels;
the second sub-network is a cross-mode image privacy prediction network based on scene context awareness, as shown in fig. 2, the training process inputs the privacy label predicted by the image, the privacy context information of the image, and outputs the confidence level of the privacy result of the image. The information fusion module is executed by using a cross-attention mechanism, and an affinity matrix of the privacy label and the privacy context is calculated, wherein the affinity matrix represents the association degree between the privacy context and the privacy label, and the calculation formula is as follows:
Figure BDA0004134583550000061
wherein A is an affinity matrix; r is R a The method comprises the steps that a feature of a privacy tag is obtained through word2vec encoding and GRU model processing, namely the privacy tag obtains vector representation of the privacy tag through word2vec encoding, and the vector representation obtains feature representation of the privacy tag after GRU model processing; r is R d Embedding a sense 2vector model for the context information of the scene to obtain a vector with the same temperature as that of the privacy tag; w (W) a 、W d To map the matrices, the purpose of both matrices is to R a 、R d Performing linear transformation to obtain R a Mapping to R d Dimension of (2), R d Mapping to R a Is a dimension of (c).
To obtain features that fuse the privacy labels and the privacy context, the affinity matrix A is normalized along the privacy-label feature dimension to obtain a privacy-label attention matrix specific to the privacy-context features; the privacy-label features associated with the privacy-context features are then obtained, and the privacy-context features associated with the privacy-label features are obtained in the same way. The corresponding features are concatenated along the last dimension to obtain features that fuse the privacy labels with the privacy context information; this fused feature information constitutes the privacy information features. The process is as follows:
[The three equations here are rendered only as images in the original publication. Following the description above, the affinity matrix A is normalized along the privacy-label feature dimension to obtain attention weights, the attended label and context features are computed, and each is concatenated with the original features to yield E_a and E_d.]

where R̃_a denotes the privacy-label feature associated with the scene context information; G_r is the number of privacy label categories; R̃_d denotes the scene-context feature associated with the privacy labels; R_a is the vector obtained by word2vec encoding of the privacy labels; R_d is the vector, of the same dimension as the privacy-label vector, obtained by embedding the scene context information with the sentence2vector model; E_a denotes the first privacy label information; and E_d denotes the privacy feature information.
Next, the image features and the privacy label features fused with the privacy context are mapped to the same dimension with a shared fully connected layer, and multi-head cross-attention is used to explore the latent connection between the image and the privacy labels. The attention process is implemented as follows:

Attn(Q, K, V) = softmax(QK^T / sqrt(d_k)) V

where Q (query), K (key), and V (value) are the three sets of input vectors, the image and privacy features, mapped to the same dimension. The privacy label features are mapped to Q through three different fully connected layers, and the image features are mapped to K and V (each mapping uses a learnable matrix corresponding to Q, K, or V; how Q, K, and V are obtained follows the standard attention mechanism and is known to a person skilled in the art, so it is not repeated here). These features are used to obtain the image features related to the privacy labels: the attention weights are generated from Q and K, and after the multi-head cross-attention process the information relating the image features and the privacy label features can be extracted, Attn(Q, K, V) being the extracted related information. To combine the privacy context information and the privacy labels of the image, the similarity between the image and the privacy information is split into a local similarity and a global similarity: the local similarity is computed between the image features and the second privacy label information, the global similarity between the image features and the privacy feature information, and their weighted sum is taken as the final similarity between the image and the privacy information:
S(I, M) = w_1 S_local(I, M) + w_2 S_global(I, M)

[The equation for S_local is rendered as an image in the original; per the symbol definitions below, it combines the pooled image features and the second privacy tag information through concatenation, a two-layer perceptron, and a sigmoid activation.]

S_global(I, M) = cosine(Pool(E_v), Pool(E_d))

where S(I, M) denotes the similarity between the image and the privacy information, I denotes the image features, and M denotes the privacy information features; w_1, w_2 are balance factors with w_1 + w_2 = 1; Sigmoid(·) denotes a sigmoid activation layer; MLP(·) denotes a two-layer perceptron; [·‖·] denotes the concatenation operation; Pool(·) denotes the average pooling operation; E_v denotes the image features; E_d denotes the privacy feature information; and Ê_a denotes the second privacy tag information.
The invention provides an image privacy prediction method based on scene context awareness, which specifically comprises the following steps:
acquiring an image to be shared and the scene context information of the image, namely the sharing time, sharing location, and sharing target group at the time of sharing;
constructing a privacy tag prediction network and using it to predict the privacy tags of the image to be shared;
constructing a cross-modal image privacy prediction network based on scene context awareness, inputting the image to be shared and its privacy tags into the network together with the scene context information of the image, and predicting whether the image is a private image.
In this embodiment, as shown in fig. 1, the image privacy prediction method based on scene context awareness specifically comprises the following steps:
For an image to be predicted, first collect the context information at the time of publication and perform structured modeling, converting it into data usable by a neural network.
Input the image to be predicted into a neural network for image feature extraction, fuse the scene context information of the image through a cross-attention mechanism, finally fuse the image and the scene context information with a multi-head attention mechanism to compute similarity, and predict the privacy content of the image according to the similarity ranking.
The user's sensitivity to different scene labels is collected, so that by marking the labels they consider strongly private, users can have image privacy predicted in combination with their personal privacy preferences. The construction of the deep image fusion network adopted in this embodiment comprises the following steps:
Step 1), a feature extractor consisting of several convolutional dense blocks is connected to a classifier to construct the prediction model for the image privacy labels;
Step 2), the scene context information of the image is fused through a cross-attention mechanism, and finally the image and the scene context information are fused with a multi-head attention mechanism to compute similarity; image privacy prediction of the image's privacy content is performed according to the similarity ranking.
The method mainly fuses the image privacy labels and the image scene context information with a cross-attention mechanism, uses multi-head attention to fuse the image features with the privacy information features in a cross-modal manner to compute the similarity between the image and the privacy information, and finally ranks the similarities to achieve image privacy prediction.
In this embodiment, two prediction networks with the same structure are selected to construct the deep image fusion network; other networks in the field can also be selected to fuse images and scene context. Moreover, any prior-art network can be used for the fusion, which may occur at the decoder or the encoder; the invention is not limited in this respect.
As a preferred implementation, the deep image fusion network in this embodiment may use sub-classification networks of different types and strengths for each training run, each training producing a corresponding deep image prediction fusion network.
The deep image prediction fusion network is constrained by a loss function; the loss L_Predict between the predicted privacy result and the real result is finally expressed as:

L_Predict = -Σ_{i=1}^{n} y_i log(ŷ_i)

where L_Predict is the prediction cross-entropy loss, which measures the accuracy of the final prediction of the image's privacy degree; y_i is the true distribution of the image, ŷ_i is the network output, and n is the total number of categories, here 2.
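With n = 2 this is ordinary cross-entropy between the predicted privacy distribution and the true label. A one-line PyTorch equivalent (the function name is an assumption):

```python
import torch
import torch.nn.functional as F

def prediction_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Cross-entropy L_Predict for the binary private/public decision."""
    return F.cross_entropy(logits, target)  # target in {0: public, 1: private}
```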
This embodiment also provides a specific label-setting scheme. The context information is divided into three types: sharing time, sharing location, and sharing target group. Because these three types of information cover many categories, the sharing time is divided into rest_time (rest time) and work_time (working time); the sharing location is divided into public_place (public occasions), work_place (work or study places such as offices and classrooms), and rest_place (rest places such as homes and dormitories); and the sharing target group is divided into stranger (strangers), work_partner (work or study partners such as colleagues and supervisors), and family_and_friends (people in close relationships such as family and friends). A structured representation of these categories is sketched below.
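One possible structured modeling of the three context dimensions, using the category names of the taxonomy above; the dataclass layout and the text serialization fed to the sentence embedding model are assumptions of this sketch.

```python
from dataclasses import dataclass
from enum import Enum

class ShareTime(Enum):
    REST_TIME = 0
    WORK_TIME = 1

class SharePlace(Enum):
    PUBLIC_PLACE = 0
    WORK_PLACE = 1   # work or study places such as offices and classrooms
    REST_PLACE = 2   # rest places such as homes and dormitories

class ShareAudience(Enum):
    STRANGER = 0
    WORK_PARTNER = 1         # work or study partners such as colleagues
    FAMILY_AND_FRIENDS = 2   # people in close relationships

@dataclass
class SceneContext:
    """Structured scene context attached to an image before prediction."""
    time: ShareTime
    place: SharePlace
    audience: ShareAudience

    def as_text(self) -> str:
        # A simple serialization that a sentence embedding model could consume.
        return f"{self.time.name} {self.place.name} {self.audience.name}"
```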
The existing privacy image dataset is expanded to establish a dataset containing image context information. The existing VISPR dataset covers 24 privacy categories: body parts, receipts, home addresses, passports, manuscripts, signatures, faces, nudity, identification cards, landmarks, usernames, names, cell phone numbers, driver's licenses, student cards, prescriptions, educational history, ethnicity, tickets, credit cards, fingerprints, disabilities, email addresses, and birthdays. First, MTurk (Amazon Mechanical Turk) is selected as the data collection platform: a person who needs some service (called a Requester) posts a task on the internet, and a person willing to do the task (called a Worker) can accept it and receive a reward. A questionnaire-based data collection method is designed on MTurk. The images with the most evident privacy categories in the dataset are divided into 24 batches of 500 pictures each; each questionnaire contains 24 pictures drawn from different batches, and for each image the questionnaire indicates which privacy category the picture belongs to as a prompt to the Worker. For each picture the Worker answers three simple questions, namely under which sharing time, sharing location, and sharing target group the picture would violate privacy, and scores the privacy of each scenario from 0 to 4, representing the degree of privacy violation: no violation, slight violation, violation, relatively serious violation, and very serious violation. In this way the relation between the 24 privacy labels and the context information is established.
Next, the collected data are compiled and cleaned according to the following criteria: the scores of the different context categories in each batch are tallied to obtain total scores for each batch's context information; the highest-scoring combination of context information, i.e. the combination of sharing time, sharing location, and sharing group, is taken as the privacy context information of that batch, and this privacy context information is manually written into an annotation matched to the image according to its privacy category, completing the privacy context modeling of the image.
In modeling personal privacy preferences, the image publisher's sensitivity to the 24 privacy categories is first collected, with 0 to 4 representing the degree of privacy violation (no violation, slight violation, violation, relatively serious violation, very serious violation); the higher the score, the more sensitive the publisher is to that privacy attribute.
Next, image features are extracted with a ResNet network to train the image privacy label prediction model, which predicts from an image the confidence of each privacy label the image contains.
Finally, the predicted image privacy labels are combined with the collected sensitivities of the image publisher, in the form of predicted-label confidence multiplied by sensitivity score; the privacy labels whose final score exceeds 1 are taken as the output of the first sub-network, completing the modeling that incorporates personal privacy preferences.
The privacy labels annotated on the original image and those predicted for the image in combination with personal privacy preferences, together with the scene context information of the image, are input into the deep image prediction fusion network, which outputs the privacy labels predicted from the image content and the prediction of whether the image is private. The method can rely on a pre-trained image feature extraction model and a deep learning prediction model, so the image feature extraction model does not need to be retrained; predictions can be made directly with the trained models, with good prediction accuracy.
The invention also provides an image privacy prediction system based on scene context information awareness, comprising a scene context information selection module for the time of image publication, an image input preprocessing module, a personal privacy preference setting module, a deep image cross-modal prediction network, and an image prediction result output module. The scene context information selection module inputs the scene context information at the time of publication; the image input preprocessing module preprocesses the input original image to obtain a preprocessed image and performs image content recognition, where the preprocessing methods include but are not limited to image blurring and pixelation; the deep image cross-modal prediction network fuses the features of the original image and the scene context information; and the image prediction result output module outputs the final prediction result.
The method can be used in a system for predicting image privacy; a specific embodiment is shown in fig. 3. Fig. 3 illustrates the system flow of image privacy prediction combining personal privacy preferences and context information: first the user sets personal privacy preferences; the system captures the images to be shared and uploaded to social media together with their context information; the captured image information is combined with the personal privacy preferences to perform image privacy prediction; finally the system judges whether the image is private and outputs the image's privacy labels to remind the user.
This embodiment also provides a specific training process for the deep image prediction network, specifically comprising:
1) Data set and preprocessing
VISPR dataset: contains 22,000 images covering 28 privacy attributes, each image annotated with the privacy attributes it contains, for example whether it contains a face or a mobile phone number. The training set of this dataset is used to train the model in this embodiment, and the test set is used for testing.
The images in the dataset are detected, cropped, and aligned with a pre-trained image processing tool, keeping the size and resolution of all images the same, for example as follows.
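A plausible torchvision pipeline for the size and resolution normalization described above; the 224 by 224 target and the ImageNet normalization statistics are assumptions, since the embodiment does not specify them.

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),         # scale the short side
    transforms.CenterCrop(224),     # identical size for all images
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics (assumed)
                         std=[0.229, 0.224, 0.225]),
])
```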
2) Training of a network
The proposed deep image prediction network is trained on the VISPR training set. Three image feature extraction base models are used in training, generating five training results in total. The three base models are:
·VGG network architecture
·ResNet network architecture
·MobileNetV2 network architecture
Comparing the training results of the three base models shows that the invention achieves the best results with the ResNet50 network structure. The feature extraction model for the image privacy labels is:
·word2vec
The feature extraction model for the image privacy context is:
·sentence2vector
the invention also provides an image privacy prediction system based on scene context awareness, as shown in fig. 3, which is used for realizing an image privacy prediction method based on scene context awareness, taking images shared by users and scene information generated by sharing the images as input of the West perpetual, combining privacy preference customized by users, and judging by the system
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. An image privacy prediction method based on scene context awareness, characterized by comprising the following steps:
acquiring an image to be shared and the scene context information of the image, namely the sharing time, sharing location, and sharing target group at the time of sharing;
constructing a privacy tag prediction network and using it to predict the privacy tags of the image to be shared;
constructing a cross-modal image privacy prediction network based on scene context awareness, inputting the image to be shared and its privacy tags into the network together with the scene context information of the image, and predicting whether the image is a private image.
2. The image privacy prediction method based on scene context awareness according to claim 1, wherein obtaining the sharing time, sharing location, and sharing target group corresponding to a predicted tag based on historical data specifically comprises the following steps:
obtaining the sharing time, sharing location, and sharing target group at the time of image publication;
setting privacy tags, manually annotating the historical data set of each single-attribute tag, and computing privacy scores for each single-attribute tag's historical data set with respect to sharing time, sharing location, and sharing target group;
taking the highest score of each single-tag data set across the three aspects of sharing time, sharing location, and sharing target group as the sensitivity corresponding to that single tag.
3. The image privacy prediction method based on scene context awareness according to claim 2, wherein if N privacy tags are set, then for the n-th privacy tag, single-tag images corresponding to that tag are collected as the single-tag attribute set for that tag; each image in the set is manually annotated by judging and scoring it with respect to its sharing time, sharing location, and sharing target group, with scores from 0 to 4, where 0 indicates no privacy violation, 1 a slight violation, 2 a violation, 3 a serious violation, and 4 a very serious violation.
4. The image privacy prediction method based on scene context awareness according to claim 2 or 3, wherein the sharing time comprises working time and rest time, the sharing location comprises public places, formal places, and private places, and the sharing target group comprises strangers, people with ordinary relationships, and people with intimate relationships.
5. The image privacy prediction method based on scene context awareness according to claim 1, wherein the privacy tag prediction network predicts the privacy tags of the image to be shared as follows:
acquiring the user's sensitivity to each tag, where a tag the user considers more private receives a higher sensitivity;
extracting image features from the image shared by the user with a ResNet network, and performing multi-label classification on the image features with a softmax classifier to obtain a confidence for each tag;
multiplying the confidence of each tag by the sensitivity corresponding to that tag, and taking every tag whose product exceeds 1 as a privacy tag of the image.
6. The image privacy prediction method based on scene context awareness according to claim 1, wherein judging whether an image is a private image with the cross-modal image privacy prediction network based on scene context awareness comprises:
acquiring, from the privacy tags of the image and the scene context information of the image, an affinity matrix representing the degree of association between the two;
using the affinity matrix to extract the salient information of the privacy tags from the image's privacy tags, and the salient information of the scene context from the image's scene context information;
fusing the salient scene-context features with the privacy-tag features to obtain the first privacy tag information, and fusing the salient privacy-tag features with the scene-context features to obtain the privacy feature information;
fusing the first privacy tag information with the image features via cross-attention to obtain the second privacy tag information;
taking the similarity between the image features and the second privacy tag information as the local similarity, the similarity between the image features and the privacy feature information as the global similarity, and the weighted sum of the two as the similarity between the image and the privacy information;
ranking by the similarity between images and privacy information, and judging an image as private when its similarity exceeds a set threshold, otherwise as not private.
7. The method of image privacy prediction based on scene context awareness according to claim 6, wherein the process of obtaining an affinity matrix representing a degree of association between a privacy tag of an image and scene context information of the image comprises:
A = (R_a W_a)(R_d W_d)^T

where A is the affinity matrix; R_a is the feature obtained by encoding the privacy tags with word2vec and processing them with a GRU model; R_d is the vector, of the same dimension as the privacy-tag feature, obtained by embedding the scene context information with the sentence2vector model; and W_a, W_d are mapping matrices.
8. The method for image privacy prediction based on scene context awareness as defined in claim 6, wherein the acquiring of the first privacy tag information and the privacy feature information comprises:
[The three equations of this claim are rendered only as images in the original publication; they compute, from the affinity matrix, the attended tag and context features and concatenate them into E_a and E_d as described in claim 6.]

where R̃_a denotes the privacy-tag feature associated with the scene context information; G_r is the number of privacy tag categories; R̃_d denotes the scene-context feature associated with the privacy tags; R_a is the vector obtained by word2vec encoding of the privacy tags; R_d is the vector, of the same dimension as the privacy-tag vector, obtained by embedding the scene context information with the sentence2vector model; E_a denotes the first privacy tag information; and E_d denotes the privacy feature information.
9. The method for image privacy prediction based on scene context awareness as in claim 6, wherein the obtaining the second privacy tag information comprises:
passing the first privacy tag information through three fully connected layers and then mapping it into a query vector through a trainable matrix;
mapping the image features into key vectors and value vectors through two trainable matrixes respectively;
an attention value is calculated based on the query vector, the key vector, and the value vector, and the attention value is used as second privacy tag information.
10. The method of image privacy prediction based on scene context awareness according to claim 6, wherein the obtaining of the similarity between the image and the privacy information comprises:
S(I, M) = w_1 S_local(I, M) + w_2 S_global(I, M)

[The equation for S_local is rendered as an image in the original; per the symbol definitions below, it combines the pooled image features and the second privacy tag information through concatenation, a two-layer perceptron, and a sigmoid activation.]

S_global(I, M) = cosine(Pool(E_v), Pool(E_d))

where S(I, M) denotes the similarity between the image and the privacy information, I denotes the image features, and M denotes the privacy information features; w_1, w_2 are balance factors with w_1 + w_2 = 1; Sigmoid(·) denotes a sigmoid activation layer; MLP(·) denotes a two-layer perceptron; [·‖·] denotes the concatenation operation; Pool(·) denotes the average pooling operation; E_v denotes the image features; E_d denotes the privacy feature information; and Ê_a denotes the second privacy tag information.
CN202310270840.9A 2023-03-20 2023-03-20 Image privacy prediction method based on scene context awareness Pending CN116310920A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310270840.9A CN116310920A (en) 2023-03-20 2023-03-20 Image privacy prediction method based on scene context awareness

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310270840.9A CN116310920A (en) 2023-03-20 2023-03-20 Image privacy prediction method based on scene context awareness

Publications (1)

Publication Number Publication Date
CN116310920A true CN116310920A (en) 2023-06-23

Family

ID=86781213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310270840.9A Pending CN116310920A (en) 2023-03-20 2023-03-20 Image privacy prediction method based on scene context awareness

Country Status (1)

Country Link
CN (1) CN116310920A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117094032A (en) * 2023-10-17 2023-11-21 成都乐超人科技有限公司 User information encryption method and system based on privacy protection
CN117094032B (en) * 2023-10-17 2024-02-09 成都乐超人科技有限公司 User information encryption method and system based on privacy protection

Similar Documents

Publication Publication Date Title
KR102106462B1 (en) Method for filtering similar problem based on weight
CN104715023B (en) Method of Commodity Recommendation based on video content and system
US10331729B2 (en) System and method for accessing electronic data via an image search engine
CN110276366A (en) Carry out test object using Weakly supervised model
US20150103097A1 (en) Method and Device for Implementing Augmented Reality Application
US20160070809A1 (en) System and method for accessing electronic data via an image search engine
Qi et al. An investigation of the visual features of urban street vitality using a convolutional neural network
US20220286438A1 (en) Machine learning techniques for mitigating aggregate exposure of identifying information
JP2010218373A (en) Server system, terminal apparatus, program, information storage medium, and image retrieving method
CN113792871A (en) Neural network training method, target identification method, device and electronic equipment
US20160034496A1 (en) System And Method For Accessing Electronic Data Via An Image Search Engine
WO2019200737A1 (en) Real estate data uploading method and apparatus, computer device, and storage medium
CN116310920A (en) Image privacy prediction method based on scene context awareness
Cho et al. Classifying tourists’ photos and exploring tourism destination image using a deep learning model
CN111552865A (en) User interest portrait method and related equipment
Cuesta-Valiño et al. The effects of the aesthetics and composition of hotels’ digital photo images on online booking decisions
CN114372580A (en) Model training method, storage medium, electronic device, and computer program product
Al Qudah et al. Using Artificial Intelligence applications for E-Government services as iris recognition
Ramesh et al. Facial recognition as a tool to identify Roman emperors: towards a new methodology
JP6896608B2 (en) Information presentation devices, methods and programs
CN109614547A (en) The method and apparatus of preferred watermark for electric terminal
US20130273969A1 (en) Mobile app that generates a dog sound to capture data for a lost pet identifying system
CN114722280A (en) User portrait based course recommendation method, device, equipment and storage medium
JP4752628B2 (en) Drawing search system, drawing search method, and drawing search terminal
CN113378859A (en) Interpretable image privacy detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination