CN113919998B - Picture anonymization method based on semantic and pose graph guidance - Google Patents

Picture anonymization method based on semantic and pose graph guidance

Info

Publication number
CN113919998B
Authority
CN
China
Prior art keywords
semantic
graph
picture
pose
anonymization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111196429.9A
Other languages
Chinese (zh)
Other versions
CN113919998A (en)
Inventor
张继东
吕超
曹靖城
吴宇松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Digital Life Technology Co Ltd
Original Assignee
Tianyi Digital Life Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Digital Life Technology Co Ltd filed Critical Tianyi Digital Life Technology Co Ltd
Priority to CN202111196429.9A priority Critical patent/CN113919998B/en
Publication of CN113919998A publication Critical patent/CN113919998A/en
Priority to PCT/CN2022/097530 priority patent/WO2023060918A1/en
Application granted granted Critical
Publication of CN113919998B publication Critical patent/CN113919998B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/0021Image watermarking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a picture anonymization method based on semantic graph and pose graph guidance, and to a corresponding picture anonymization system. In the system, the picture semantic anonymization module is configured to first perform semantic segmentation on the picture to obtain a semantic graph, and then to use a generative adversarial network, guided by the semantic graph, to generate a scene graph with the same semantics but different content. The character pose anonymization module is configured to further guide the generation of the persons in the picture on top of the picture semantic anonymization module: it first performs pose estimation on the persons to obtain a pose graph, and then uses a generative adversarial network, guided by the pose graph, to generate a new portrait graph with the same pose but a different person. The overlay module is configured to overlay the scene graph generated by the picture semantic anonymization module and the new portrait graph generated by the character pose anonymization module according to the semantic graph to obtain the final anonymized picture.

Description

Picture anonymization method based on semantic and pose graph guidance
Technical Field
The invention relates to the field of image processing, in particular to the field of picture anonymization.
Background
Video surveillance has evolved from the initial closed-circuit television systems (the first generation of analog surveillance), through PC-card-based digital video surveillance systems, to today's network video surveillance systems, which are built on networking and communication technologies together with embedded technology and are characterized by intelligent image analysis.
As machine learning and artificial intelligence techniques develop and continue to advance, intelligent video surveillance is becoming increasingly popular. Current intelligent video analysis technology mainly analyzes real-time video images in order to provide early warning. With the growth of online sharing, users attach more and more importance to personal privacy, and pictures, as a rich information carrier, are especially sensitive. Early picture anonymization simply masked, blurred, or pixelated the sensitive information. While these methods are easy to use, they are essentially ineffective against the currently popular deep-learning-based identification methods. In recent years, researchers have proposed more complex and effective methods, for example face anonymization using the k-same algorithm, and picture anonymization using the generative adversarial network (GAN) framework.
The patent "Face anonymity privacy protection method based on generative adversarial networks" (CN111242837A) discloses a face anonymization privacy protection method based on a generative adversarial network. The method first preprocesses face image data; it then constructs a generative adversarial network structure, establishes an anonymization objective function for the face region, establishes an objective function for preserving the scene content region, and combines the face anonymization and scene preservation objectives; finally, it trains and tests on public datasets and outputs the result. The method replaces the face region in the image with a synthesized face to achieve face anonymity, and compared with earlier mosaic-masking methods it is more efficient and more visually friendly. However, it replaces only the face: body parts other than the face and the rest of the scene are left untouched, so for pictures of indoor home scenes there is still a privacy risk. The method also depends on the accuracy of face detection, and anonymization may fail when detection fails.
The patent "Service robot visual picture privacy protection method based on generative adversarial networks" (CN110363183A) discloses a method in which data collected by a visual acquisition end is first preprocessed, a privacy recognition module then judges whether the preprocessed input contains privacy, and, if so, the picture is converted into picture data that involves no privacy and stored; training data growth and feature learning are used to update a training dataset, and a feature model obtained from the training dataset by a modified Cycle-GAN algorithm is used for the picture conversion. This makes the picture data privacy-free at the source, but because the original picture is transferred directly with Cycle-GAN and there is no fixed guidance mechanism, the style can differ considerably between processing results, making the output unsuitable as training and test data.
Therefore, there is a need for an improved technique for anonymizing pictures while maintaining the original semantic information of the pictures and the pose information of the figures.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The invention targets video surveillance scenarios and uses guided generative adversarial networks to anonymize pictures globally, protecting user privacy to the greatest extent. At the same time, the invention preserves the usability of the picture data as far as possible, so that users' practical needs for both privacy protection and development can be met.
According to one embodiment of the invention, a picture anonymization method based on semantic graph and pose graph guidance is disclosed, comprising: performing semantic segmentation on the original picture to obtain a semantic graph; generating, using a picture semantic anonymization generative adversarial network, a scene graph with the same semantics as the original picture but different content under the guidance of the semantic graph; using the portrait portion of the semantic graph as a mask to cut a portrait graph out of the original picture; extracting and estimating the pose of the person in the portrait graph to generate a pose graph; generating, using a character pose anonymization generative adversarial network, a new portrait graph with the same pose as the portrait graph but depicting a different person under the guidance of the pose graph; and superimposing the scene graph and the new portrait graph according to the semantic graph to obtain the final anonymized picture.
According to one embodiment of the invention, a picture anonymization system based on semantic graph and pose graph guidance is disclosed, comprising a picture semantic anonymization module, a character pose anonymization module, and an overlay module. The picture semantic anonymization module is configured to: perform semantic segmentation on the original picture to obtain a semantic graph; and generate, using a picture semantic anonymization generative adversarial network, a scene graph with the same semantics as the original picture but different content under the guidance of the semantic graph. The character pose anonymization module is configured to: use the portrait portion of the semantic graph as a mask to cut a portrait graph out of the original picture; extract and estimate the pose of the person in the portrait graph to generate a pose graph; and generate, using a character pose anonymization generative adversarial network, a new portrait graph with the same pose as the portrait graph but depicting a different person under the guidance of the pose graph. The overlay module is configured to superimpose the scene graph and the new portrait graph according to the semantic graph to obtain the final anonymized picture.
According to another embodiment of the invention, a computing device for semantic graph and pose graph guided picture anonymization is disclosed, comprising: a processor; and a memory storing instructions that, when executed by the processor, cause the processor to perform the method described above.
These and other features and advantages will become apparent upon reading the following detailed description and upon reference to the associated drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.
Drawings
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this invention and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
FIG. 1 illustrates a block diagram of a picture anonymization system 100 based on semantic graph and pose graph guidance according to one embodiment of the present invention;
FIG. 2 shows a diagram 200 further describing the functionality of the picture semantic anonymization module 101 according to one embodiment of the present invention;
FIG. 3 shows a diagram of a multi-channel attention selection model 300 in accordance with one embodiment of the invention;
FIG. 4 shows a diagram 400 further describing the functionality of the character pose anonymization module 102, according to one embodiment of the present invention;
FIG. 5 illustrates a data flow diagram 500 of a semantic graph and pose graph guided picture anonymization process according to one embodiment of the present invention;
FIG. 6 illustrates a flow diagram of a method 600 for picture anonymization based on semantic graph and pose graph guidance, according to one embodiment of the present invention; and
FIG. 7 illustrates a block diagram 700 of an exemplary computing device, according to one embodiment of the invention.
Detailed Description
The features of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings.
User demands in the home camera field are becoming richer and richer, and the accuracy of many AI functions depends on how rich the relevant picture and video training data is. Although a large amount of very valuable real data accumulates while users use the product, such data cannot be used in actual development because of privacy protection and similar concerns. This contradiction between privacy protection and the shortage of model training data has long troubled developers.
The invention uses semantic graph guidance and pose graph guidance to globally anonymize the user's original picture, which both ensures that the user's privacy is not disclosed and preserves the picture's original semantic information and the pose information of the persons in it. The invention can provide usable training data for developing and optimizing AI algorithm models with low requirements on faces, such as human-shape detection and motion detection, and can also provide users with a privacy protection mechanism of active anonymization encryption.
FIG. 1 illustrates a block diagram of a picture anonymization system 100 based on semantic graph and pose graph guidance according to one embodiment of the present invention. As shown in FIG. 1, the system 100 is divided into modules, with communication and data exchange between the modules taking place in manners known in the art. In the present invention, each module may be implemented in software, in hardware, or in a combination thereof. The system 100 includes a picture semantic anonymization module 101, a character pose anonymization module 102, and an overlay module 103.
According to one embodiment of the invention, the picture semantic anonymization module 101 is configured to first semantically segment the picture to obtain a semantic graph, and then to use a generative adversarial network, guided by the semantic graph, to generate a scene graph with the same semantics but different content.
According to one embodiment of the present invention, the character pose anonymization module 102 is configured to further guide the generation of the persons in the picture on top of the picture semantic anonymization module 101: it first performs pose estimation on the persons to obtain a human-body keypoint pose graph, and then uses a generative adversarial network, guided by the pose graph, to generate a new portrait graph with the same pose but a different person.
According to one embodiment of the present invention, the overlay module 103 is configured to overlay the scene graph generated by the picture semantic anonymization module 101 and the new portrait graph generated by the character pose anonymization module 102 according to the semantic graph, so as to obtain the final anonymized picture. The position of the person on the picture is known from the semantic segmentation, and this information is used to perform the final superposition of the anonymized picture.
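As an illustration of this overlay step, a minimal sketch follows; the person class label id, the array shapes, and the NumPy-based compositing are assumptions for illustration rather than the patent's actual implementation:

```python
import numpy as np

def overlay_anonymized(scene_graph: np.ndarray,
                       new_portrait: np.ndarray,
                       semantic_graph: np.ndarray,
                       person_label: int = 15) -> np.ndarray:
    """Composite the anonymized scene and the new portrait into one picture.

    scene_graph    : H x W x 3 anonymized scene image
    new_portrait   : H x W x 3 anonymized portrait image, aligned with the original
    semantic_graph : H x W per-pixel class labels from semantic segmentation
    person_label   : class id of the person region (assumed value)
    """
    # The semantic graph tells us where the person is located on the picture.
    person_mask = (semantic_graph == person_label)[..., None]   # H x W x 1 boolean
    # Person pixels come from the new portrait, everything else from the scene graph.
    return np.where(person_mask, new_portrait, scene_graph)
```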
As known to those skilled in the art, the cameras involved in the intelligent video surveillance technology of the present invention generally include home cameras in the smart home field, monitoring probes in the smart city field, and camera devices installed in public places to perform a surveillance function. Such a monitoring device can photograph and capture a scene, store the acquired image data locally for subsequent processing, or send the data to a remote device (such as a smart home control platform, a central control platform, or another computing device) for processing. The manner of connection and communication between the monitoring device and the remote device is not limited here and may be carried out in a variety of ways known in the art. According to one embodiment of the invention, the system 100 may be implemented in the monitoring device or on the remote device. According to another embodiment of the invention, one or more modules of system 100 may be implemented separately in the monitoring device and in the remote device.
FIG. 2 shows a diagram 200 further describing the functionality of the picture semantic anonymization module 101 according to one embodiment of the present invention. The picture semantic anonymization module 101 is configured to implement three stages: semantic segmentation, semantic-guided reconstruction, and picture optimization.
As shown in FIG. 2, in the semantic segmentation stage, an encoder-decoder built with ShuffleNet as the backbone network is used as the semantic generator; it performs inference on the input original picture Ig to obtain the scene semantic graph Sg.
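A minimal sketch of this inference step is shown below (in PyTorch); the ShuffleNet-backbone encoder-decoder is represented only by a placeholder module, since the patent does not disclose its exact architecture, and the tensor shapes are assumptions:

```python
import torch

def semantic_segmentation(generator: torch.nn.Module,
                          original_picture: torch.Tensor) -> torch.Tensor:
    """Run the semantic generator on the original picture Ig to obtain Sg.

    generator        : encoder-decoder with a ShuffleNet backbone (assumed interface)
    original_picture : 1 x 3 x H x W tensor, the original picture Ig
    returns          : 1 x H x W tensor of per-pixel class labels (semantic graph Sg)
    """
    generator.eval()
    with torch.no_grad():
        logits = generator(original_picture)     # 1 x C x H x W class scores
        semantic_graph = logits.argmax(dim=1)    # most likely class per pixel
    return semantic_graph
```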
In the context of the present invention, the semantic-guided reconstruction stage and the picture optimization stage together logically/functionally constitute a picture semantic anonymization generative adversarial network that combines cascaded semantic guidance with a multi-channel attention selection mechanism. Within this network, the semantic-guided reconstruction stage produces a coarse-grained picture semantic anonymization result under cascaded semantic guidance, and the picture optimization stage produces a finer result through the multi-channel attention selection mechanism.
In the semantic-guided reconstruction stage, a target texture picture Ir is randomly selected from a scene texture picture library as the conditional image. The selected texture picture Ir is cascaded with the scene semantic graph Sg obtained in the semantic segmentation stage, and the cascaded result is fed into a generator Gi to produce the generated image I′g. The generator Gi is a U-Net model built on RefineNet, and during training it is optimized by minimizing a loss function between the semantic graph S′g of the generated image I′g and the original scene semantic graph Sg, where L1-L4 denote the four components of that loss function.
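A rough sketch of the cascading and generation step, assuming the semantic graph is a per-pixel label map that is one-hot encoded before channel-wise concatenation, and leaving the U-Net generator Gi as a placeholder module:

```python
import torch
import torch.nn.functional as F

def semantic_guided_reconstruction(generator_gi: torch.nn.Module,
                                   condition_ir: torch.Tensor,
                                   semantic_sg: torch.Tensor,
                                   num_classes: int) -> torch.Tensor:
    """Cascade the conditional image Ir with the semantic graph Sg and run Gi.

    condition_ir : 1 x 3 x H x W texture picture drawn from the scene library
    semantic_sg  : 1 x H x W per-pixel labels from the segmentation stage
    returns      : 1 x 3 x H x W coarse generated image I'g
    """
    # One-hot encode the semantic graph so it can be concatenated channel-wise.
    sg_onehot = F.one_hot(semantic_sg, num_classes).permute(0, 3, 1, 2).float()
    cascaded = torch.cat([condition_ir, sg_onehot], dim=1)   # 1 x (3+C) x H x W
    return generator_gi(cascaded)
```

During training, Gi would then be optimized against the loss between S′g and Sg described above.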
The picture optimization stage uses a multi-channel attention selection model to optimize the generated image I′g from the previous stage and obtain the final scene graph I″g. The purpose of the multi-channel attention selection model is to produce finer-grained results from a larger generation space and to produce uncertainty maps that guide the optimization of the pixel loss. FIG. 3 shows a diagram of a multi-channel attention selection model 300 in accordance with one embodiment of the invention.
The multi-channel attention selection model 300 includes a multi-scale spatial pooling part and a multi-channel attention selection part. The multi-scale spatial pooling part applies average pooling with a set of different kernel sizes and strides to the same input features, yielding multi-scale features with different receptive fields that perceive different spatial contexts. The multi-channel attention selection part generates a series of different intermediate pictures and combines them into the final output.
Referring to FIGS. 2 and 3, the multi-channel attention selection model 300 cascades the conditional image Ir, the generated image I′g, and the feature maps output by the last convolutional layers of the generator Gi and of the semantic segmentation stage, and feeds this cascade into the multi-scale spatial pooling part, which performs average pooling at different scales to obtain multi-scale spatial context features. The pooled features of each scale are multiplied with the input features in order to preserve useful information, and the result is convolved to produce new multi-scale features that serve as the input of the multi-channel attention selection part. The multi-channel attention selection part expands the channel representation of the image through a convolutional network and combines the attention maps to produce a more reasonable result.
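The multi-scale spatial pooling part can be sketched as follows; the pooling scales and channel counts are illustrative assumptions, not values taken from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSpatialPooling(nn.Module):
    """Average-pool the same input features at several scales, re-weight the
    input with each pooled map, and convolve to produce new multi-scale features."""

    def __init__(self, in_channels: int, out_channels: int,
                 pool_sizes=(2, 4, 8)):                       # assumed scales
        super().__init__()
        self.pool_sizes = pool_sizes
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
             for _ in pool_sizes])

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        h, w = features.shape[-2:]
        outputs = []
        for pool_size, conv in zip(self.pool_sizes, self.convs):
            pooled = F.avg_pool2d(features, kernel_size=pool_size, stride=pool_size)
            pooled = F.interpolate(pooled, size=(h, w), mode='bilinear',
                                   align_corners=False)
            # Multiply with the input features to preserve useful information,
            # then convolve to produce new multi-scale features.
            outputs.append(conv(features * pooled))
        return torch.cat(outputs, dim=1)
```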
Specifically, with further reference to FIGS. 2 and 3, the multi-channel attention selection model 300 concatenates the conditional image Ir, the generated image I′g, and the feature maps Fi and Fs output by the last convolutional layers of the generator Gi and of the semantic segmentation stage, and feeds this concatenation into the multi-scale spatial pooling part; the resulting multi-scale features serve as the input of the multi-channel attention selection part. The multi-channel attention selection part enlarges the channel representation of the image through a convolutional network, where the intermediate pictures and the corresponding attention maps are computed as shown in formula (1).
Finally, the learned attention maps are used to select among the intermediate pictures and combine them into the output, as shown in formula (2).
meanwhile, by learning the uncertainty map (uncertainty maps), the pixel level Loss (Loss function) optimization computation can be made more robust.
According to one embodiment of the invention, the picture semantic anonymization module 101 trains the generator Gi and the multi-channel attention selection model 300 using the inside 09 indoor scene dataset.
FIG. 4 shows a diagram 400 further describing the functionality of the character pose anonymization module 102, according to one embodiment of the present invention. As shown in FIG. 4, the character pose anonymization module 102 is implemented similarly to the picture semantic anonymization module 101, except that the semantic graph is replaced by the pose graph extracted by an OpenPose model, and the conditional image is a picture randomly selected from a public portrait picture dataset. The character pose anonymization module 102 is trained on the CUHK03 person dataset.
Specifically, the character pose anonymization module 102 is configured to implement three stages: pose estimation, pose-guided reconstruction, and picture optimization.
As shown in FIG. 4, in the pose estimation stage, the portrait portion of the semantic graph obtained by the picture semantic anonymization module 101 is used as a mask to cut the original portrait graph Ig out of the original input picture, and the pose of the person in the original portrait graph Ig is extracted and estimated with an OpenPose model to generate the pose graph Sg.
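A sketch of the pose-graph construction; the keypoint detector is abstracted behind a hypothetical `estimate_keypoints` callable (for example a wrapper around an OpenPose model), and the skeleton connectivity and confidence threshold below are assumptions used only for illustration:

```python
import numpy as np
import cv2

# Assumed simplified limb connectivity for rendering the pose graph.
LIMBS = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7),
         (1, 8), (8, 9), (9, 10), (1, 11), (11, 12), (12, 13)]

def build_pose_graph(portrait_ig: np.ndarray, estimate_keypoints) -> np.ndarray:
    """Extract body keypoints from the portrait image Ig and render a pose graph Sg.

    portrait_ig        : H x W x 3 portrait cut out of the original picture
    estimate_keypoints : callable returning a K x 3 array of (x, y, confidence)
                         keypoints, e.g. an OpenPose wrapper (assumed interface)
    """
    keypoints = estimate_keypoints(portrait_ig)
    pose_graph = np.zeros_like(portrait_ig)
    for a, b in LIMBS:
        if keypoints[a, 2] > 0.1 and keypoints[b, 2] > 0.1:   # both ends detected
            pt_a = (int(keypoints[a, 0]), int(keypoints[a, 1]))
            pt_b = (int(keypoints[b, 0]), int(keypoints[b, 1]))
            cv2.line(pose_graph, pt_a, pt_b, color=(255, 255, 255), thickness=3)
    return pose_graph
```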
In the context of the present invention, the pose-guided reconstruction stage and the picture optimization stage together logically/functionally constitute a character pose anonymization generative adversarial network that combines cascaded pose guidance with a multi-channel attention selection mechanism. Within this network, the pose-guided reconstruction stage produces a coarse-grained character pose anonymization result, and the picture optimization stage produces a finer result through the multi-channel attention selection mechanism.
In the pose-guided reconstruction stage, a portrait picture Ir is randomly selected from a portrait picture dataset as the conditional image. The selected portrait picture Ir is cascaded with the pose graph Sg obtained in the pose estimation stage, and the cascaded result is fed into a generator Gi to produce the generated image I′g. The generator Gi is a U-Net model built on RefineNet, and during training it is optimized by minimizing a loss function between the pose graph S′g of the generated image I′g and the original pose graph Sg, where L1-L4 denote the four components of that loss function.
In the picture optimization stage, the generated image I′g from the previous stage is optimized using the multi-channel attention selection model to obtain the final portrait graph I″g. For a detailed description of the multi-channel attention selection model, see the description of FIG. 3 above.
FIG. 5 illustrates a data flow diagram 500 of a semantic graph and pose graph guided picture anonymization process according to one embodiment of the present invention. The data flow 500 can be divided into a picture semantic anonymization stage 501, a character pose anonymization stage 502, and an overlay stage 503.
Referring to FIG. 5, in the picture semantic anonymization stage 501, the input picture is semantically segmented to form a semantic graph, and a scene graph is generated from the semantic graph through the picture semantic anonymization generative adversarial network described above. Once the semantic graph is available, the character pose anonymization stage 502 can begin: the portrait portion of the semantic graph is used as a mask to cut the original portrait graph out of the input picture, pose extraction and estimation is performed on the original portrait graph to generate a pose graph, and the pose graph is passed through the character pose anonymization generative adversarial network described above to generate a new portrait graph. After the picture semantic anonymization stage 501 and the character pose anonymization stage 502 are complete, the overlay stage 503 can begin, in which the scene graph generated by stage 501 and the new portrait graph generated by stage 502 are superimposed according to the semantic graph to form the anonymized picture for output.
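Putting the three stages together, the overall data flow could be expressed as in the sketch below, where every model is passed in as a callable and all names, interfaces, and the person label id are assumptions for illustration:

```python
import numpy as np

def anonymize_picture(input_picture, semantic_generator, scene_gan,
                      pose_estimator, portrait_gan, person_label=15):
    """End-to-end sketch of the semantic- and pose-graph-guided anonymization flow."""
    # Stage 501: semantic segmentation, then scene generation guided by the semantic graph.
    semantic_graph = semantic_generator(input_picture)          # H x W labels
    scene_graph = scene_gan(input_picture, semantic_graph)      # anonymized scene

    # Stage 502: cut out the person with the portrait mask, estimate the pose,
    # then generate a new portrait guided by the pose graph.
    person_mask = (semantic_graph == person_label)[..., None]
    portrait = np.where(person_mask, input_picture, 0)
    pose_graph = pose_estimator(portrait)
    new_portrait = portrait_gan(portrait, pose_graph)

    # Stage 503: overlay the new portrait onto the anonymized scene at the person's location.
    return np.where(person_mask, new_portrait, scene_graph)
```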
FIG. 6 shows a flow diagram of a method 600 for picture anonymization based on semantic graph and pose graph guidance, according to one embodiment of the present invention.
In step 601, the original picture is semantically segmented to obtain a semantic graph. According to one embodiment of the present invention, the original picture may be a picture taken by a surveillance camera, a frame of a video taken by the surveillance camera, or a picture selected by a user. According to one embodiment of the invention, an encoder-decoder built with ShuffleNet as the backbone network is used as the semantic generator to perform inference on the original picture and obtain the semantic graph. According to one embodiment of the invention, the semantic graph may indicate the position of the person on the original picture.
In step 602, a scene graph having the same semantics as the original picture but different content is generated under the guidance of the semantic graph using a picture semantic anonymization generative adversarial network. According to one embodiment of the invention, this network comprises a semantic-guided reconstruction stage and a picture optimization stage: the semantic-guided reconstruction stage generates a coarse-grained picture semantic anonymization result by applying cascaded semantic guidance based on the semantic graph, and the picture optimization stage refines that result through a multi-channel attention selection mechanism to obtain the final, finer-grained scene graph.
In step 603, the portrait portion of the semantic graph obtained in step 601 is used as a mask to cut the original portrait graph out of the original picture.
In step 604, the pose of the person in the original portrait graph is extracted and estimated to generate a pose graph. According to one embodiment of the present invention, the pose of the person in the original portrait graph obtained in step 603 is extracted and estimated using an OpenPose model to generate the pose graph.
In step 605, a new portrait graph having the same pose as the original portrait graph but depicting a different person is generated under the guidance of the pose graph using a character pose anonymization generative adversarial network. According to one embodiment of the invention, this network comprises a pose-guided reconstruction stage and a picture optimization stage: the pose-guided reconstruction stage generates a coarse-grained character pose anonymization result by applying cascaded pose guidance based on the pose graph, and the picture optimization stage refines that result through a multi-channel attention selection mechanism to obtain the final, finer-grained portrait graph.
In step 606, the scene graph generated in step 602 and the new portrait graph generated in step 605 are superimposed according to the semantic graph obtained in step 601 to obtain the final anonymized picture. According to one embodiment of the invention, the position of the person on the original picture is known from the semantic segmentation, and this information is used to superimpose the scene graph and the new portrait graph.
In summary, compared with the prior art, the main advantages of the invention are: (1) the picture is anonymized globally; the generated picture retains only the abstract semantic graph and the character pose graph of the original, while the face, the person, and the background are completely replaced, which reduces the risk of privacy leakage to the greatest extent; (2) on the basis of complete anonymization, the original semantic information, character pose information, and object motion information of the picture are preserved, so a large amount of usable training data can be provided for developing and optimizing non-identity-related AI algorithm models such as human-shape detection and motion detection; (3) the initial output of the generative adversarial network is further refined by the multi-channel attention selection model, so the output picture is of higher quality.
In addition, in practical applications the invention offers further benefits: for example, online try-on applications built on similar techniques can adjust clothing while the facial and background information is replaced, protecting user privacy to the greatest extent.
FIG. 7 illustrates a block diagram 700 of an exemplary computing device, which is one example of a hardware device that may be used in connection with aspects of the invention, according to one embodiment of the invention. For example, the monitoring devices, remote devices, and user computing devices mentioned above may all be implemented as the computing device of FIG. 7. Computing device 700 may be any machine that may be configured to implement processes and/or calculations and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a smart phone, a vehicle-mounted computer, or any combination thereof. Computing device 700 may include components that may be connected to or communicate with a bus 702 via one or more interfaces. For example, computing device 700 may include a bus 702, one or more processors 704, one or more input devices 706, and one or more output devices 708. The one or more processors 704 may be any type of processor and may include, but are not limited to, one or more general-purpose processors and/or one or more special-purpose processors (e.g., dedicated processing chips). Input device 706 may be any type of device capable of inputting information to a computing device and may include, but is not limited to, a mouse, a keyboard, a touch screen, a microphone, and/or a remote controller. Output device 708 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. Computing device 700 may also include or be connected to a non-transitory storage device 710, which may be non-transitory and capable of data storage, and which may include, but is not limited to, a disk drive, an optical storage device, solid-state memory, a floppy disk, a flexible disk, a hard disk, a magnetic tape or any other magnetic medium, an optical disk or any other optical medium, a ROM (read-only memory), a RAM (random-access memory), a cache memory, and/or any memory chip or cartridge, and/or any other medium from which a computer may read data, instructions, and/or code. The non-transitory storage device 710 may be detachable from an interface. The non-transitory storage device 710 may have data/instructions/code for implementing the methods and steps described above. Computing device 700 may also include a communication device 712. Communication device 712 may be any type of device or system capable of communicating with external devices and/or with a network and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a Bluetooth device, an IEEE 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
Bus 702 can include, but is not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Computing device 700 may also include a working memory 714, which working memory 714 may be any type of working memory capable of storing instructions and/or data that facilitate the operation of processor 704 and may include, but is not limited to, random access memory and/or read-only memory devices.
Software components may reside in the working memory 714 and include, but are not limited to, an operating system 716, one or more application programs 718, drivers, and/or other data and code. Instructions for implementing the above-described methods and steps of the present invention may be included in the one or more application programs 718 and the above-described method 600 of the present invention may be implemented by the processor 704 reading and executing the instructions of the one or more application programs 718.
It should also be appreciated that variations may be made according to particular needs. For example, custom hardware may be used, and/or particular components may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. In addition, connections to other computing devices, such as network input/output devices, may be employed. For example, some or all of the disclosed methods and apparatus may be implemented with programmable hardware (e.g., programmable logic circuits including field-programmable gate arrays (FPGAs) and/or programmable logic arrays (PLAs)) using an assembly language or a hardware programming language (e.g., Verilog, VHDL, C++).
Although aspects of the present invention have been described with reference to the accompanying drawings, the above-described methods and apparatuses are merely examples, and the scope of the present invention is not limited to these aspects but is defined only by the appended claims and their equivalents. Various components may be omitted or replaced by equivalent components. The steps may also be carried out in an order different from that described in the present invention, and the various components may be combined in various ways. It is also important to note that, as technology advances, many of the described components may be replaced by equivalent components that appear later.

Claims (6)

1. A picture anonymization method based on semantic graph and pose graph guidance, comprising the following steps:
carrying out semantic segmentation on the original picture to obtain a semantic graph;
generating, using a picture semantic anonymization generative adversarial network, a scene graph having the same semantics as the original picture but different content under the guidance of the semantic graph, wherein the picture semantic anonymization generative adversarial network comprises a semantic-guided reconstruction stage and a first picture optimization stage, the semantic-guided reconstruction stage being used to generate a coarse-grained picture semantic anonymization result by applying cascaded semantic guidance based on the semantic graph, and the first picture optimization stage being used to refine the picture semantic anonymization result generated by the semantic-guided reconstruction stage through a multi-channel attention selection mechanism so as to obtain the scene graph at a finer granularity;
using the portrait portion of the semantic graph as a mask to cut a portrait graph out of the original picture;
extracting and estimating the pose of the person in the portrait graph to generate a pose graph;
generating, using a character pose anonymization generative adversarial network, a new portrait graph having the same pose as the portrait graph but depicting a different person under the guidance of the pose graph, wherein the character pose anonymization generative adversarial network comprises a pose-guided reconstruction stage and a second picture optimization stage, the pose-guided reconstruction stage being used to generate a coarse-grained character pose anonymization result by applying cascaded pose guidance based on the pose graph, and the second picture optimization stage being used to refine the character pose anonymization result generated by the pose-guided reconstruction stage through a multi-channel attention selection mechanism so as to obtain the new portrait graph at a finer granularity;
and superimposing the scene graph and the new portrait graph according to the semantic graph to obtain a final anonymized picture.
2. The method of claim 1, wherein semantically segmenting the original picture to obtain a semantic graph further comprises: using an encoder-decoder built with ShuffleNet as the backbone network as the semantic generator, and performing inference on the original picture to obtain the semantic graph.
3. The method of claim 1, wherein extracting and estimating the pose of the person in the portrait graph to generate a pose graph further comprises: extracting and estimating the pose of the person in the portrait graph using an OpenPose model to generate the pose graph.
4. A picture anonymization system based on semantic graph and pose graph guidance, comprising:
a picture semantic anonymization module configured to:
carrying out semantic segmentation on the original picture to obtain a semantic graph;
generating, using a picture semantic anonymization generative adversarial network, a scene graph having the same semantics as the original picture but different content under the guidance of the semantic graph, wherein the picture semantic anonymization generative adversarial network comprises a semantic-guided reconstruction stage and a first picture optimization stage, the semantic-guided reconstruction stage being used to generate a coarse-grained picture semantic anonymization result by applying cascaded semantic guidance based on the semantic graph, and the first picture optimization stage being used to refine the picture semantic anonymization result generated by the semantic-guided reconstruction stage through a multi-channel attention selection mechanism so as to obtain the scene graph at a finer granularity;
a character pose anonymization module configured to:
using the portrait portion of the semantic graph as a mask to cut a portrait graph out of the original picture;
extracting and estimating the pose of the person in the portrait graph to generate a pose graph;
generating, using a character pose anonymization generative adversarial network, a new portrait graph having the same pose as the portrait graph but depicting a different person under the guidance of the pose graph, wherein the character pose anonymization generative adversarial network comprises a pose-guided reconstruction stage and a second picture optimization stage, the pose-guided reconstruction stage being used to generate a coarse-grained character pose anonymization result by applying cascaded pose guidance based on the pose graph, and the second picture optimization stage being used to refine the character pose anonymization result generated by the pose-guided reconstruction stage through a multi-channel attention selection mechanism so as to obtain the new portrait graph at a finer granularity;
an overlay module configured to:
superimposing the scene graph and the new portrait graph according to the semantic graph to obtain a final anonymized picture.
5. The system of claim 4, wherein semantically segmenting the original picture to obtain a semantic graph further comprises: using an encoder-decoder built with ShuffleNet as the backbone network as the semantic generator, and performing inference on the original picture to obtain the semantic graph;
and wherein extracting and estimating the pose of the person in the portrait graph to generate a pose graph further comprises: extracting and estimating the pose of the person in the portrait graph using an OpenPose model to generate the pose graph.
6. A computing device for semantic graph and pose graph guided picture anonymization, comprising:
a processor;
a memory storing instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1-3.
CN202111196429.9A 2021-10-14 2021-10-14 Picture anonymizing method based on semantic and gesture graph guidance Active CN113919998B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111196429.9A CN113919998B (en) 2021-10-14 2021-10-14 Picture anonymizing method based on semantic and gesture graph guidance
PCT/CN2022/097530 WO2023060918A1 (en) 2021-10-14 2022-06-08 Image anonymization method based on guidance of semantic and pose graphs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111196429.9A CN113919998B (en) 2021-10-14 2021-10-14 Picture anonymizing method based on semantic and gesture graph guidance

Publications (2)

Publication Number Publication Date
CN113919998A CN113919998A (en) 2022-01-11
CN113919998B true CN113919998B (en) 2024-05-14

Family

ID=79240288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111196429.9A Active CN113919998B (en) 2021-10-14 2021-10-14 Picture anonymizing method based on semantic and gesture graph guidance

Country Status (2)

Country Link
CN (1) CN113919998B (en)
WO (1) WO2023060918A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778564B * 2023-08-24 2023-11-17 Wuhan University Identity-maintained face anonymization method, system and equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110021051A * 2019-04-01 2019-07-16 浙江大学 Text-guided object image generation method based on generative adversarial networks
KR20190119261A (en) * 2018-04-12 2019-10-22 가천대학교 산학협력단 Apparatus and method for segmenting of semantic image using fully convolutional neural network based on multi scale image and multi scale dilated convolution
CN110473266A * 2019-07-08 2019-11-19 南京邮电大学盐城大数据研究院有限公司 Pose-guided person action video generation method that preserves the source scene
CN111242837A (en) * 2020-01-03 2020-06-05 杭州电子科技大学 Face anonymous privacy protection method based on generation of countermeasure network
CN112241708A (en) * 2020-10-19 2021-01-19 戴姆勒股份公司 Method and apparatus for generating new person image from original person image
CN112651423A (en) * 2020-11-30 2021-04-13 深圳先进技术研究院 Intelligent vision system
CN113160035A (en) * 2021-04-16 2021-07-23 浙江工业大学 Human body image generation method based on posture guidance, style and shape feature constraints
CN113255813A (en) * 2021-06-02 2021-08-13 北京理工大学 Multi-style image generation method based on feature fusion
WO2021164283A1 (en) * 2020-02-18 2021-08-26 苏州科达科技股份有限公司 Clothing color recognition method, device and system based on semantic segmentation
CN113343878A (en) * 2021-06-18 2021-09-03 北京邮电大学 High-fidelity face privacy protection method and system based on generation countermeasure network

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11397462B2 (en) * 2012-09-28 2022-07-26 Sri International Real-time human-machine collaboration using big data driven augmented reality technologies
US10755479B2 (en) * 2017-06-27 2020-08-25 Mad Street Den, Inc. Systems and methods for synthesizing images of apparel ensembles on models
US10546387B2 (en) * 2017-09-08 2020-01-28 Qualcomm Incorporated Pose determination with semantic segmentation
EP3471060B1 (en) * 2017-10-16 2020-07-08 Nokia Technologies Oy Apparatus and methods for determining and providing anonymized content within images
US11179064B2 (en) * 2018-12-30 2021-11-23 Altum View Systems Inc. Method and system for privacy-preserving fall detection
US11244504B2 (en) * 2019-05-03 2022-02-08 Facebook Technologies, Llc Semantic fusion
CN110363183B * 2019-07-30 2020-05-08 Guizhou University Service robot visual image privacy protection method based on generative adversarial networks
US11475608B2 (en) * 2019-09-26 2022-10-18 Apple Inc. Face image generation with pose and expression control
DE102020203473A1 (en) * 2020-03-18 2021-09-23 Robert Bosch Gesellschaft mit beschränkter Haftung Anonymization device, monitoring device, method, computer program and storage medium
CN111539262B (en) * 2020-04-02 2023-04-18 中山大学 Motion transfer method and system based on single picture

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190119261A (en) * 2018-04-12 2019-10-22 가천대학교 산학협력단 Apparatus and method for segmenting of semantic image using fully convolutional neural network based on multi scale image and multi scale dilated convolution
CN110021051A * 2019-04-01 2019-07-16 浙江大学 Text-guided object image generation method based on generative adversarial networks
CN110473266A * 2019-07-08 2019-11-19 南京邮电大学盐城大数据研究院有限公司 Pose-guided person action video generation method that preserves the source scene
CN111242837A (en) * 2020-01-03 2020-06-05 杭州电子科技大学 Face anonymous privacy protection method based on generation of countermeasure network
WO2021164283A1 (en) * 2020-02-18 2021-08-26 苏州科达科技股份有限公司 Clothing color recognition method, device and system based on semantic segmentation
CN112241708A (en) * 2020-10-19 2021-01-19 戴姆勒股份公司 Method and apparatus for generating new person image from original person image
CN112651423A (en) * 2020-11-30 2021-04-13 深圳先进技术研究院 Intelligent vision system
CN113160035A (en) * 2021-04-16 2021-07-23 浙江工业大学 Human body image generation method based on posture guidance, style and shape feature constraints
CN113255813A (en) * 2021-06-02 2021-08-13 北京理工大学 Multi-style image generation method based on feature fusion
CN113343878A (en) * 2021-06-18 2021-09-03 北京邮电大学 High-fidelity face privacy protection method and system based on generation countermeasure network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Semantic-based Scene segmentation using convolutional neural networks; Aya M. Shaaban et al.; AEU - International Journal of Electronics and Communications; Vol. 125; full text *
Image-semantics-based visual privacy behavior recognition and protection system for service robots; Li Zhongyi et al.; Journal of Computer-Aided Design & Computer Graphics; No. 10; full text *
Text-guided person image editing method based on generative adversarial networks; Huang Tao et al.; Journal of Guangdong Polytechnic Normal University; No. 03; full text *

Also Published As

Publication number Publication date
WO2023060918A1 (en) 2023-04-20
CN113919998A (en) 2022-01-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant