WO2023168903A1 - Model training and identity anonymization method, apparatus, device, storage medium and program product - Google Patents

Model training and identity anonymization method, apparatus, device, storage medium and program product

Info

Publication number
WO2023168903A1
WO2023168903A1 (PCT/CN2022/111704; CN2022111704W)
Authority
WO
WIPO (PCT)
Prior art keywords
identity
vectors
image
loss
network model
Prior art date
Application number
PCT/CN2022/111704
Other languages
English (en)
French (fr)
Inventor
罗宇辰
朱俊伟
贺珂珂
储文青
邰颖
汪铖杰
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2022566254A (JP2024513274A)
Priority to EP22773378.9A (EP4270232A4)
Priority to KR1020227038590A (KR20230133755A)
Priority to US18/076,073 (US20230290128A1)
Publication of WO2023168903A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Definitions

  • This application relates to the field of image processing technology, and in particular to a model training and identity anonymization method, device, equipment, storage medium and program product.
  • Identity anonymization, also known as de-identification, refers to removing identifiable identity features (Identity) from images or videos while keeping other identity-unrelated attributes unchanged, and ensuring that the anonymized pictures or videos remain visually authentic.
  • In the related art, conditional generative adversarial networks are used to generate anonymized images: the pose key points of the original image are extracted, and the pose key points together with the background image from which the facial area has been removed are fed into the model as condition inputs, and the model generates a new virtual identity to fill the vacant facial area.
  • However, because this method uses the background image with the facial area removed as the model input, the quality of the images generated by the model is poor.
  • Embodiments of the present application provide a model training method, an identity anonymization method, a device, a computing device, a computer-readable storage medium, and a computer program product, which can improve the quality of generating identity anonymization images.
  • the embodiment of this application provides a model training method, including:
  • the first training image is projected to the target space through the projection module in the target network model to obtain N first virtual identity vectors, where N is a positive integer;
  • attribute vectors are extracted from the second training image through the attribute module in the target network model to obtain M attribute vectors, where M is a positive integer;
  • image generation is performed based on the N first virtual identity vectors and the M attribute vectors to obtain an identity anonymized image of the second training image;
  • based on the identity anonymized image, a loss of the target network model is determined, and the target network model is trained based on the loss.
  • the embodiment of this application also provides an identity anonymization method, including:
  • sampling is performed on the target space of the projection module in the target network model to obtain N virtual identity vectors, where N is a positive integer;
  • attribute vectors are extracted from the image to be processed to obtain M attribute vectors, where M is a positive integer;
  • image generation is performed based on the N virtual identity vectors and the M attribute vectors to obtain an identity anonymized image of the image to be processed.
  • An embodiment of the present application also provides a model training device, including:
  • a projection unit configured to project the first training image to the target space through the projection module in the target network model to obtain N first virtual identity vectors, where N is a positive integer;
  • the attribute unit is configured to extract attribute vectors from the second training image through the attribute module in the target network model to obtain M attribute vectors, where M is a positive integer;
  • a fusion unit configured to generate an image based on the N first virtual identity vectors and the M attribute vectors through the fusion module of the target network model to obtain an identity anonymized image of the second training image
  • a training unit configured to determine a loss of the target network model based on the identity anonymized image, and train the target network model based on the loss.
  • the embodiment of the present application also provides an identity anonymization device, including:
  • a sampling unit configured to sample on the target space of the projection module in the target network model to obtain N virtual identity vectors, where N is a positive integer;
  • the attribute unit is configured to extract attribute vectors of the image to be processed through the attribute module in the target network model, and obtain M attribute vectors, where M is a positive integer;
  • the anonymization unit is configured to generate an image based on the N virtual identity vectors and the M attribute vectors through the fusion module of the target network model to obtain an identity anonymized image of the image to be processed.
  • An embodiment of the present application also provides a computing device, including a processor and a memory.
  • the memory is configured to store a computer program
  • the processor is configured to call and run the computer program stored in the memory to execute the above model training method or identity anonymization method provided by embodiments of the present application.
  • An embodiment of the present application also provides a chip configured to implement the above model training method or identity anonymization method provided by the embodiment of the present application.
  • the chip includes: a processor configured to call and run a computer program from a memory, so that a device installed with the chip executes the above model training method or identity anonymization method provided by embodiments of the present application.
  • Embodiments of the present application also provide a computer-readable storage medium configured to store a computer program.
  • When the computer program is executed, the above model training method or identity anonymization method provided by the embodiments of the present application is implemented.
  • An embodiment of the present application also provides a computer program product, which includes computer program instructions.
  • When the computer program instructions are executed by a computer, the above-mentioned model training method or identity anonymization method provided by the embodiments of the present application is implemented.
  • An embodiment of the present application also provides a computer program that, when run on a computer, implements the above model training method or identity anonymization method provided by the embodiment of the present application.
  • Through the above technical solution, the first training image is projected to the target space to obtain N first virtual identity vectors, so that the target network model can fully learn the identity information in the image;
  • attribute vector extraction is performed to obtain M attribute vectors, which enables the target network model to fully learn the attribute information in the image;
  • the identity anonymized image of the second training image is obtained, so that the trained model can generate images carrying virtual identity information while ensuring that the attribute information of the original image remains unchanged;
  • at inference time, sampling in the target space yields N virtual identity vectors, realizing the generation of virtual identity information;
  • attribute vectors are extracted from the image to be processed to obtain M attribute vectors;
  • the image is generated based on the N virtual identity vectors and the M attribute vectors to obtain the identity anonymized image of the image to be processed, so that an identity anonymized image carrying virtual identity information, that is, hiding the true identity, is generated while the attribute information of the image to be processed remains unchanged. That is, in the embodiment of the present application, an independent virtual identity is generated through the target network model during identity anonymization, without the need to remove facial areas from images, thereby increasing the fidelity and resolution of identity anonymization.
  • Figure 1A is a schematic diagram of a real image provided by an embodiment of the present application.
  • Figures 1B-1D are schematic diagrams of the identity anonymization images corresponding to Figure 1A provided by the embodiment of the present application;
  • Figure 2 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Figure 3 is a schematic flow chart of the model training method provided by the embodiment of the present application.
  • Figures 4 to 6 are schematic structural diagrams of the target network model provided by the embodiment of the present application.
  • Figure 7 is a schematic structural diagram of the fusion module provided by the embodiment of the present application.
  • Figure 8 is a schematic structural diagram of the target network model provided by the embodiment of the present application.
  • Figures 9 and 10 are schematic diagrams of contrast loss determination provided by embodiments of the present application.
  • Figure 11 is a schematic flow chart of the identity anonymization method provided by the embodiment of the present application.
  • Figure 12 is a schematic diagram of the projection module provided by the embodiment of the present application.
  • Figure 13 is a schematic diagram of identity anonymization image determination provided by the embodiment of this application.
  • Figure 14 is a schematic block diagram of a model training device provided by an embodiment of the present application.
  • Figure 15 is a schematic block diagram of an identity anonymization device provided by an embodiment of the present application.
  • Figure 16 is a schematic block diagram of a computing device provided by an embodiment of the present application.
  • In the embodiments of this application, "B corresponding to A" means that B is associated with A, and B can be determined based on A.
  • However, determining B based on A does not mean determining B only based on A; B can also be determined based on A and/or other information.
  • In the embodiments of this application, words such as "first", "second" and "third" are used to distinguish identical items or similar items with basically the same functions and effects. Those skilled in the art can understand that words such as "first", "second" and "third" do not limit the quantity or the order of execution, and that items distinguished by such words are not necessarily different.
  • Artificial Intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive subject that covers a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
  • Figure 1A is a real image, and Figures 1B to 1D are identity anonymized images of Figure 1A. Comparing Figure 1A with Figures 1B to 1D, it can be seen that Figures 1B to 1D remove the identifiable identity features (Identity) in Figure 1A, while leaving other attributes unrelated to identity unchanged and remaining visually authentic.
  • Scenario 1: The embodiments of this application can be applied to privacy protection scenarios. For example, for pictures or videos related to human faces, the method of the embodiments of this application can be used to replace the real identity with a virtual identity, so that subsequent detection and other tasks can continue to be performed without compromising privacy. In addition, users can also use the method of the embodiments of this application to hide their identity when posting pictures or videos, to avoid leaking real information.
  • Scenario 2: The embodiments of the present application can be applied to the scenario of generating a virtual image.
  • In this scenario, the technical solutions of the embodiments of the present application can be used to generate virtual identities, for example by fixing the identity latent variables and replacing the background pictures, so as to generate pictures or videos of a specific virtual image in different scenes.
  • The above Scenario 1 and Scenario 2 take a human face as the target as an example.
  • In practice, the method of the embodiment of the present application can also be applied to anonymizing the identity of targets other than human faces, such as animals or vehicles in the image to be processed.
  • the methods of the embodiments of the present application can be applied to intelligent transportation systems.
  • An Intelligent Traffic System, also known as an Intelligent Transportation System, applies advanced science and technology (information technology, computer technology, data communication technology, sensor technology, electronic control technology, automatic control theory, operations research, artificial intelligence, etc.) to transportation, strengthening the connections among vehicles, roads and users, so as to form a comprehensive transportation system that ensures safety, improves efficiency, improves the environment, and saves energy.
  • For example, in a solution combining this application with intelligent transportation, a vehicle-mounted device collects the user's face image, anonymizes the identity of the collected face image using the method of the embodiment of this application, and then sends it to other devices for task analysis, such as illegal driving analysis or intelligent driving analysis.
  • Figure 2 is a schematic diagram of a system architecture involved in an embodiment of the present application, including user equipment 101, data collection equipment 102, training equipment 103, execution equipment 104, database 105, content library 106, I/O interface 107 and target network model 108 .
  • the data collection device 102 is configured to read training data from the content library 106 and store the read training data in the database 105 .
  • the training data involved in the embodiment of the present application includes first training images, second training images, and third training images.
  • the first training images, second training images, and third training images are all used to train the target network model.
  • the user device 101 is configured to perform annotation operations on data in the database 105 .
  • the training device 103 trains the target network model 108 based on the training data maintained in the database 105, so that the trained target network model 108 can generate an identity anonymized image of the image to be processed.
  • the target network model 108 obtained by training the device 103 can be applied to different systems or devices.
  • the execution device 104 is configured with an I/O interface 107 for data interaction with external devices.
  • the image to be processed sent by the user device 101 is received through the I/O interface.
  • the computing module 109 in the execution device 104 uses the trained target network model 108 to process the input image to be processed, outputs the identity anonymized image, and outputs the generated identity anonymized image to the user device 101 for display, or inputs it into other task models for the processing of other tasks.
  • the user device 101 may include a mobile phone, a tablet computer, a notebook computer, a handheld computer, a mobile internet device (mobile internet device, MID) or other terminal devices with the function of installing a browser.
  • the execution device 104 may be a server, and there can be one or more servers. When there are multiple servers, at least one of the following situations may exist: at least two servers are configured to provide different services, or at least two servers are configured to provide the same service, for example in a load balancing manner.
  • the above-mentioned server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms. A server can also serve as a node of a blockchain.
  • the execution device 104 is connected to the user device 101 through the network.
  • the network may be an intranet, the Internet, a Global System for Mobile communication (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, a call network, or another wireless or wired network.
  • Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the above-mentioned data collection device 102, the user device 101, the training device 103 and the execution device 104 may be the same device.
  • the above-mentioned database 105 can be distributed on one server or multiple servers, and the above-mentioned content library 106 can be distributed on one server or multiple servers.
  • This application provides a target network model, which is used to perform identity anonymization processing on targets (such as faces) in images to be processed, and to generate identity anonymized images of the images to be processed. Therefore, in some embodiments, the target network model may be referred to as an identity anonymization model, or an identity anonymizer.
  • Figure 3 is a schematic flowchart of the model training method provided by the embodiment of the present application.
  • the execution subject of the embodiment of the present application is a device with a model training function, such as a model training device.
  • the model training device may be a computing device, or a part of the computing device.
  • the following description takes the execution subject as a computing device as an example.
  • the method in the embodiment of this application includes:
  • the computing device projects the first training image to the target space through the projection module in the target network model, and obtains N first virtual identity vectors, where N is a positive integer.
  • the first training image in this embodiment of the present application is a training image in the training data. It should be noted that if the above-mentioned first training image is a face image, the first training image is obtained with the user's permission and consent, and the collection, use and processing of the relevant image data need to comply with the relevant laws, regulations and standards of the relevant countries and regions.
  • the process of training the model using each first training image is basically similar.
  • a first training image is used as an example for explanation.
  • the embodiment of the present application projects the first training image into the target space through the target network model to obtain one or more virtual identity vectors of the first training image, so that the target network model learns the identity information of the first training image. After the target network model has fully learned the identity information, when actually performing identity anonymization processing, the target space of the target network model can be directly sampled to generate a virtual identity vector.
  • the embodiments of this application mainly involve the concepts of attribute vectors and virtual identity vectors.
  • the virtual identity vector is a vector corresponding to the virtual identity information
  • the virtual identity information is the identity information after hiding the identifiable identity features, such as facial information after hiding the identifiable identity features of the face.
  • the attribute vector is the vector corresponding to the attribute information.
  • Other feature information in the image other than identifiable identity features is called attribute information, such as background information.
  • the target network model of the embodiment of the present application can generate an independent virtual identity vector.
  • Figure 4 is a schematic structural diagram of the target network model provided by the embodiment of the present application.
  • the target network model of the embodiment of the present application includes a projection module, an attribute module and a fusion module.
  • the projection module is configured to project the first training image into the target space to obtain N first virtual identity vectors of the first training image.
  • N is a positive integer.
  • the embodiment of the present application does not limit the value of N and can be set according to actual needs.
  • the attribute module is configured to perform attribute vector extraction on the second training image to extract M attribute vectors of the second training image.
  • M is a positive integer.
  • the embodiment of the present application does not limit the value of M and can be set according to actual needs. In some embodiments, M equals N.
  • the fusion module is configured to perform image generation based on the above-mentioned N first virtual identity vectors and M attribute vectors to obtain an identity anonymized image of the second training image.
  • If N is a positive integer greater than 1, the N first virtual identity vectors respectively correspond to different resolutions.
  • the projection module is configured to generate a virtual identity vector of the target in the second training image.
  • the virtual identity vector hides the true identity characteristics of the target in the second training image.
  • the attribute module is configured to extract the attribute vector of the second training image, and the attribute vector retains features other than the true identity characteristics of the target in the second training image. In this way, after the fusion module performs image generation based on the above virtual identity vector and attribute vector, an anonymized image that hides the target identity in the second training image, that is, an identity anonymized image, can be obtained.
  • the projection module includes a first projection unit and a second projection unit, and the target space includes a first space Z and a second space W.
  • the above-mentioned computing device can, through the projection module, project the first training image to the target space in the following manner to obtain N first virtual identity vectors:
  • extract the prior identity information of the first training image; project the prior identity information into the first space Z through the first projection unit to obtain N identity latent vectors; and project the N identity latent vectors into the second space W to obtain N first virtual identity vectors.
  • the a priori identity information of the first training image is extracted, for example, through a pre-trained recognition model, the a priori identity information of the first training image is extracted. Then, through the first projection unit, the prior identity information of the first training image is projected into the first space Z to obtain N identity latent vectors, and then through the second projection unit, the N identity latent vectors are projected to the second Space W, get N first virtual identity vectors.
  • the first space Z and the second space W may be different hidden spaces.
  • the embodiment of the present application places no restrictions on the first space Z and the second space W.
  • the first space is a latent space Z
  • the latent space Z conforms to a standard Gaussian distribution.
  • the above-mentioned first projection unit can project the prior identity information into the first space Z in the following manner to obtain N identity latent vectors:
  • the prior identity information is projected into the mean and variance of the first space through the first projection unit; sampling is performed based on the mean and variance of the first space to obtain N identity latent vectors.
  • the first projection unit is a variational autoencoder (VAE), such as a conditional variational autoencoder (CVAE).
  • the conditional variational autoencoder is a generative network that learns the distribution of data through an encoder to obtain latent variables, and then restores the latent variables to the original form of the data through a decoder.
  • Conditional variational autoencoders can learn the distribution of data and then sample to generate new data, often used for image generation.
  • the VAE projects the prior identity information into the mean and variance of the first space. Then, sampling is performed based on the mean and variance of the first space to obtain N identity latent vectors of the first training image.
  • the above-mentioned first space is a latent space Z that conforms to the standard Gaussian distribution. Therefore, in order to enhance the expression ability of the latent space, the embodiment of the present application generates different latent vectors at different resolution levels, for example generating N identity latent vectors, which is equivalent to constructing a Z+ space containing multiple identity latent vectors.
  • the second space W is obtained from the latent space Z, for example, obtained by linear or nonlinear mapping from the latent space Z.
  • the embodiment of the present application does not limit the network structure of the second projection unit, for example, it is a mapping network.
  • the mapping network is composed of multiple fully connected layers.
  • By projecting the a priori identity information of the first training image into the latent space (i.e., the target space) of the projection module, the projection module can fully learn the identity information of the first training image, so that realistic virtual identity vectors can be generated subsequently.
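  • As an illustrative sketch only (the patent publishes no code), the projection module described above can be approximated as a VAE-style first projection unit producing one mean/variance pair per resolution level, followed by a fully connected mapping network as the second projection unit. All names, layer counts and dimensions below are assumptions.

```python
import torch
import torch.nn as nn

class ProjectionModule(nn.Module):
    """Sketch: projects prior identity features into N per-resolution
    virtual identity vectors (first space Z -> second space W)."""
    def __init__(self, id_dim=512, latent_dim=512, n_levels=3):
        super().__init__()
        # First projection unit (VAE-style): one mean / log-variance head
        # per resolution level, together forming the "Z+" space.
        self.mu_heads = nn.ModuleList(
            [nn.Linear(id_dim, latent_dim) for _ in range(n_levels)])
        self.logvar_heads = nn.ModuleList(
            [nn.Linear(id_dim, latent_dim) for _ in range(n_levels)])
        # Second projection unit: a mapping network of fully connected
        # layers that maps Z into the second space W.
        self.mapping = nn.Sequential(
            nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(0.2),
            nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(0.2),
            nn.Linear(latent_dim, latent_dim))

    def forward(self, id_prior):
        # id_prior: (B, id_dim) prior identity information extracted by a
        # pre-trained recognition model.
        ws, mus, logvars = [], [], []
        for mu_head, lv_head in zip(self.mu_heads, self.logvar_heads):
            mu, logvar = mu_head(id_prior), lv_head(id_prior)
            # Reparameterized sampling from N(mu, sigma^2) in space Z.
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            ws.append(self.mapping(z))  # identity latent vector -> space W
            mus.append(mu)
            logvars.append(logvar)
        # N first virtual identity vectors, plus statistics for the KL term.
        return ws, mus, logvars
```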
  • the second training image is any image in the training data set.
  • the second training image and the first training image may be the same image or different images.
  • the attribute module in the embodiment of the present application is configured to learn attribute information of the second training image to generate M attribute vectors.
  • the embodiments of this application do not limit the network model of the attribute module.
  • the attribute module includes a coding unit and a decoding unit.
  • the attribute vector of the second training image can be extracted in the following manner to obtain M attribute vectors:
  • the second training image is input into the encoding unit to obtain feature information of the second training image; the feature information is input into the decoding unit to obtain M attribute vectors.
  • the coding unit includes multiple feature extraction layers
  • the decoding unit also includes multiple feature extraction layers, and there is a skip connection between at least one feature extraction layer in the coding unit and at least one feature extraction layer in the decoding unit.
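  • A minimal sketch of such an encoder/decoder with skip connections follows (a U-Net-style layout; the channel counts and the choice of which feature maps serve as attribute vectors are assumptions, not the patent's actual design).

```python
import torch
import torch.nn as nn

class AttributeModule(nn.Module):
    """Sketch: coding/decoding units with skip connections whose
    multi-resolution decoder features serve as the M attribute vectors."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 3, 2, 1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, 2, 1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.Conv2d(ch * 2, ch * 4, 3, 2, 1), nn.ReLU())
        self.dec3 = nn.Sequential(
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.ReLU())
        self.dec2 = nn.Sequential(
            nn.ConvTranspose2d(ch * 4, ch, 4, 2, 1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1)

    def forward(self, x):
        e1 = self.enc1(x)                           # H/2
        e2 = self.enc2(e1)                          # H/4
        e3 = self.enc3(e2)                          # H/8 (bottleneck)
        d3 = self.dec3(e3)                          # H/4
        d2 = self.dec2(torch.cat([d3, e2], dim=1))  # skip connection, H/2
        d1 = self.dec1(torch.cat([d2, e1], dim=1))  # skip connection, H
        return [e3, d3, d2, d1]  # M = 4 attribute maps, low to high res
```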
  • Example 1: Splice the N first virtual identity vectors to obtain a spliced first virtual identity vector, and splice the M attribute vectors to obtain a spliced attribute vector. After the spliced first virtual identity vector and the spliced attribute vector are fused, the result is input into the fusion module to generate the identity anonymized image.
  • For example, the spliced first virtual identity vector and the spliced attribute vector are concatenated and then input into the fusion module to generate an identity anonymized image.
  • Alternatively, the spliced first virtual identity vector and the spliced attribute vector are added together and then input into the fusion module to generate an identity anonymized image.
  • Example 2: The fusion module includes multiple resolution layers of different resolutions. In this case, the fusion module can perform image generation based on the N first virtual identity vectors and the M attribute vectors in the following manner to obtain the identity anonymized image of the second training image:
  • the N first virtual identity vectors are used as styles and the M attribute vectors are used as noise, and they are input into the corresponding resolution layers to obtain the identity anonymized image of the second training image.
  • For example, N is 3, M is 4, and the fusion module includes 4 resolution layers of different resolutions. The 3 first virtual identity vectors are recorded as first virtual identity vector 1, first virtual identity vector 2 and first virtual identity vector 3, and the 4 attribute vectors are recorded as attribute vector 1, attribute vector 2, attribute vector 3 and attribute vector 4. The four resolution layers are recorded, in order of resolution size, as resolution layer 1, resolution layer 2, resolution layer 3 and resolution layer 4.
  • The first virtual identity vector 1 corresponds to the lower-resolution resolution layer 1 and resolution layer 2, the first virtual identity vector 2 corresponds to the medium-resolution resolution layer 3, and the first virtual identity vector 3 corresponds to the highest-resolution resolution layer 4. The four attribute vectors correspond to the four resolution layers in order of resolution size.
  • Based on this, the first virtual identity vector 1 is input into resolution layer 1 to obtain feature information 1. After attribute vector 1 is merged with feature information 1, the result is input into resolution layer 2 together with the first virtual identity vector 1 to obtain feature information 2. After attribute vector 2 is merged with feature information 2, the result is input into resolution layer 3 together with the first virtual identity vector 2 to obtain feature information 3. After attribute vector 3 is merged with feature information 3, the result is input into resolution layer 4 together with the first virtual identity vector 3 to obtain feature information 4, and after attribute vector 4 is merged with feature information 4, the identity anonymized image of the second training image is generated.
  • the fusion module is a style-based generator (StyleGAN2).
  • An AdaIN layer is included between two adjacent resolution layers of the fusion module. For example, an affine transform (AT) is performed on the first virtual identity vector i+1; after the feature information i output by the i-th resolution layer is merged with the attribute vector i, it is input, together with the affine-transformed first virtual identity vector i+1, into the AdaIN layer, where the AdaIN operation is performed; the result of the AdaIN operation is then input into the (i+1)-th resolution layer.
  • In some embodiments, the fusion module in the embodiment of the present application can also be another adversarial model such as StyleGAN3 or ProGAN.
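  • As a rough sketch of this style/noise injection (assuming a StyleGAN2-like layer; the channel counts, affine parameterization and upsampling scheme are assumptions):

```python
import torch
import torch.nn as nn

class AdaINLayer(nn.Module):
    """Sketch: one resolution layer. The virtual identity vector acts as
    the style (affine transform feeding AdaIN), and the attribute map is
    merged into the features like per-layer noise."""
    def __init__(self, in_ch, out_ch, w_dim=512):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.norm = nn.InstanceNorm2d(out_ch, affine=False)
        self.affine = nn.Linear(w_dim, out_ch * 2)  # affine transform (AT)

    def forward(self, feat, w, attr=None):
        h = self.conv(feat)
        if attr is not None:
            h = h + attr  # merge the attribute map ("noise")
        scale, bias = self.affine(w).chunk(2, dim=1)
        h = self.norm(h)  # AdaIN: normalize, then re-style
        return h * (1 + scale[:, :, None, None]) + bias[:, :, None, None]

class FusionModule(nn.Module):
    """Sketch: resolution layers driven by N styles and M attribute maps
    (shapes are assumed to match each layer)."""
    def __init__(self, chs=(512, 256, 128, 64), w_dim=512):
        super().__init__()
        self.layers = nn.ModuleList(
            [AdaINLayer(ci, co, w_dim) for ci, co in zip(chs[:-1], chs[1:])])
        self.up = nn.Upsample(scale_factor=2, mode='bilinear',
                              align_corners=False)
        self.to_rgb = nn.Conv2d(chs[-1], 3, 1)

    def forward(self, const, ws, attrs):
        h = const  # learned constant input, as in style-based generators
        for layer, w, a in zip(self.layers, ws, attrs):
            h = layer(self.up(h), w, a)
        return self.to_rgb(h)  # identity anonymized image
```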
  • For fusion modules with different structures, the methods for determining the identity anonymized image of the second training image may be different, and the embodiment of the present application does not impose restrictions on this.
  • the model training process of the embodiment of the present application is introduced.
  • During training, the first training image Xs is passed through a pre-trained face recognition model to generate prior identity information. Then, the prior identity information is input into the VAE, which projects it into the first space Z to obtain N identity latent vectors; for example, N is 3, and the 3 identity latent vectors respectively correspond to 3 different resolutions: low, medium and high. Next, the N identity latent vectors are input into the mapping network, which projects them from the first space Z to the second space W to obtain N first virtual identity vectors.
  • the second training image Xt is input into the autoencoder, and after the second training image Xt is processed by the autoencoder, M attribute vectors are generated. Finally, the M attribute vectors are used as noise and the N first virtual identity vectors are used as styles and input into each layer of StyleGAN2 to obtain the identity anonymized image Ys,t of the second training image output by StyleGAN2.
  • the first training image and the second training image are input into the target network model to obtain the identity anonymized image of the second training image output by the target network model. Then, the following S304 is performed to train the target network model.
  • the target network model outputs the identity anonymized image of the second training image, and the loss of the target network model is determined based on the identity anonymized image.
  • the identity anonymized image is input into a judgment model, which is a pre-trained model that can predict the degree of anonymization of the identity anonymized image.
  • the identity anonymized image is input into the judgment model, the judgment model identifies the identity anonymized image, and determines the recognition result as a loss of the target network model. If the recognition accuracy is high, it means that the anonymization effect of the current target network model is not ideal. At this time, the parameters in the target network model are adjusted according to the loss of the target network model.
  • Then, a new first training image and a new second training image are selected to perform the above steps S301 to S304, and training of the target network model continues until the target network model reaches the training end condition.
  • the training end conditions at least include that the number of training times reaches the preset number, or the degree of anonymization of the model reaches the expected effect.
  • In some embodiments, the embodiment of the present application imposes a KL divergence constraint L_kl on the N identity latent vectors in the first space Z, to ensure that the identity information is projected to the standard Gaussian distribution.
  • In this case, determining the loss of the target network model based on the identity anonymized image may include: determining the loss of the target network model based on the identity anonymized image and the divergence constraint.
  • the divergence constraint L_kl of the N identity latent vectors can be determined through the following formula (1), here in the standard form of the KL divergence to the standard Gaussian:
  • $L_{kl} = \frac{1}{2}\sum_{i=1}^{N}\left(\mu_i^{2} + \sigma_i - \log \sigma_i - 1\right)$ (1)
  • where μ_i is the mean corresponding to the i-th identity latent vector among the N identity latent vectors, and σ_i is the variance corresponding to the i-th identity latent vector among the N identity latent vectors (the terms are summed elementwise over the dimensions of each latent vector).
  • The above formula (1) is only an example; the method of determining the divergence constraint of the N identity latent vectors in the embodiment of the present application includes but is not limited to the above formula (1). For example, a deformation of formula (1) or another way of calculating the divergence constraint can also be used.
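  • A minimal sketch of this constraint, assuming the mean and log-variance outputs of the projection sketch above:

```python
import torch

def kl_constraint(mus, logvars):
    """Sketch of formula (1): KL divergence between each identity latent
    distribution N(mu_i, sigma_i^2) and the standard Gaussian, summed
    over the N resolution levels."""
    loss = 0.0
    for mu, logvar in zip(mus, logvars):
        # Per dimension: 0.5 * (mu^2 + sigma^2 - log sigma^2 - 1).
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1)
        loss = loss + kl.sum(dim=1).mean()  # sum dims, average the batch
    return loss
```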
  • In some embodiments, the above-mentioned second space is obtained from the first space through non-linear mapping and follows a complex non-Gaussian distribution.
  • If the distribution of the second space W is not uniform, the real identity vectors gather into multiple different centers and do not overlap with the generated virtual identity vectors, so the virtual identity vectors cannot produce a reasonable face identity.
  • To address this, the embodiment of this application proposes to use a contrastive loss to constrain the latent vectors of the second space W (i.e., the first virtual identity vectors), so that latent vectors from the same identity are aggregated together, latent vectors from different identities repel each other, and all latent vectors are evenly distributed throughout the space.
  • the embodiment of this application can also determine identity loss in the following ways:
  • Step 1: Obtain a third training image;
  • Step 2: Process the third training image through the projection reference module to obtain N second virtual identity vectors;
  • Step 3: Determine the identity loss based on the N first virtual identity vectors and the N second virtual identity vectors.
  • the above-mentioned third training image and the first training image are two different images of the first target.
  • the third training image and the first training image are two different face images of the same user.
  • the above-mentioned projection reference module has the same network structure as the projection module and is updated according to the projection module.
  • the projection reference module is updated according to the momentum of the projection module; that is, the projection reference module is slowly updated as the projection module is updated.
  • In some embodiments, the projection reference module can be updated according to the following formula (2), here in the standard momentum (exponential moving average) form:
  • $P_{\theta'}(t) = (1-\lambda)\,P_{\theta'}(t-1) + \lambda\,P_{\theta}(t)$ (2)
  • where P_θ′(t) is the projection reference module parameter after the t-th update, P_θ′(t−1) is the projection reference module parameter after the (t−1)-th update, P_θ(t) is the projection module parameter at the t-th update, and λ is a small value, such as 0.01.
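  • A minimal sketch of this momentum update, assuming the two modules expose matching parameters and λ = 0.01:

```python
import torch

@torch.no_grad()
def momentum_update(proj_ref, proj, lam=0.01):
    """Sketch of formula (2): the projection reference module is slowly
    dragged toward the projection module (lam is the small value λ)."""
    for p_ref, p in zip(proj_ref.parameters(), proj.parameters()):
        # P_ref(t) = (1 - λ) * P_ref(t-1) + λ * P(t)
        p_ref.mul_(1.0 - lam).add_(p, alpha=lam)
```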
  • the embodiment of the present application sets a projection reference module whose network structure is completely consistent with that of the projection module, to constrain the first virtual identity vectors output by the projection module.
  • the first training image is input into the projection module to obtain N first virtual identity vectors of the first training image
  • the third training image is input into the projection reference module to obtain N second virtual identity vectors of the third training image.
  • Since the first training image and the third training image are images of the same target, and the network structures of the projection module and the projection reference module are consistent, once model training is completed, the difference between the N first virtual identity vectors corresponding to the first training image and the N second virtual identity vectors should be small.
  • Based on this, the projection module in the target network model can be trained based on the N first virtual identity vectors and the N second virtual identity vectors corresponding to the first training image, so that the projection module can generate virtual identity vectors that meet the requirements.
  • the methods for determining identity loss based on N first virtual identity vectors and N second virtual identity vectors include but are not limited to the following:
  • Method 1: Determine the differences between the N first virtual identity vectors and the N second virtual identity vectors at different resolutions, and determine the sum of the differences, or the average of the differences, as the identity loss.
  • For example, if N is 3: determine difference 1 between the first virtual identity vector 1 and the second virtual identity vector 1, difference 2 between the first virtual identity vector 2 and the second virtual identity vector 2, and difference 3 between the first virtual identity vector 3 and the second virtual identity vector 3.
  • Then the sum of difference 1, difference 2 and difference 3, or the average of difference 1, difference 2 and difference 3, is determined as the identity loss.
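  • A minimal sketch of Method 1; using the L1 distance as the per-resolution "difference" is an assumption, since the patent does not name the metric.

```python
import torch.nn.functional as F

def identity_loss_method1(first_ws, second_ws, average=False):
    """Sketch: sum (or average) of per-resolution differences between the
    N first and N second virtual identity vectors."""
    diffs = [F.l1_loss(w1, w2.detach())  # reference vectors are not trained
             for w1, w2 in zip(first_ws, second_ws)]
    return sum(diffs) / len(diffs) if average else sum(diffs)
```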
  • Method 2: Maintain N dynamic lists K, which store the representations in the second space (the W+ space) of all the different target identities (such as face identities) in the entire training set.
  • the identity loss can be determined in the following way:
  • Step 31: For the i-th first virtual identity vector among the N first virtual identity vectors, use the i-th second virtual identity vector to update the virtual identity vector corresponding to the first target in the i-th dynamic list.
  • the i-th dynamic list includes virtual identity vectors of different targets at the i-th resolution, and i is a positive integer from 1 to N.
  • each of the N second virtual identity vectors corresponds to a dynamic list.
  • For example, N is 3, corresponding to low resolution, medium resolution and high resolution respectively.
  • Accordingly, there are also three dynamic lists, namely the first dynamic list corresponding to low resolution, the second dynamic list corresponding to medium resolution, and the third dynamic list corresponding to high resolution.
  • Step 32: Determine the identity sub-loss corresponding to the i-th first virtual identity vector according to the i-th first virtual identity vector and the updated i-th dynamic list.
  • the first training image and the third training image are two different images of the first target j.
  • the first training image Xj is input into the projection module to obtain N first virtual identity vectors Wj.
  • the i-th dynamic list Ki includes second virtual identity vectors of different targets at the i-th resolution, and the i-th dynamic list Ki is updated in real time.
  • the i-th second virtual identity vector is used to update the virtual identity vector kj corresponding to the first target j in the i-th dynamic list Ki, that is, kj is updated to Wj'.
  • the identity sub-loss i corresponding to the i-th first virtual identity vector is determined.
  • In the embodiment of the present application, the method of determining the identity sub-loss corresponding to the i-th first virtual identity vector is not limited.
  • For example, loss methods such as center loss and triplet loss can be used to determine the identity sub-loss corresponding to the i-th first virtual identity vector based on the i-th first virtual identity vector and the updated i-th dynamic list.
  • the above-mentioned determination of the identity sub-loss corresponding to the i-th first virtual identity vector among the N first virtual identity vectors may include the following steps:
  • Step 321: Obtain the first ratio between the i-th second virtual identity vector and the first preset value, multiply the first ratio by the i-th first virtual identity vector to obtain the first result, and perform an exponential operation on the first result to obtain the first operation value;
  • Step 322: Obtain the second ratio between each second virtual identity vector in the updated i-th dynamic list and the first preset value; for each second ratio, multiply the second ratio by the corresponding i-th first virtual identity vector to obtain a second result, and perform an exponential operation on the second result to obtain the second operation value corresponding to each second virtual identity vector;
  • Step 323: Determine the sum of the second operation values corresponding to the second virtual identity vectors, obtain the third ratio of the first operation value to the sum, and perform a logarithmic operation on the third ratio to obtain the third operation value;
  • Step 324: Determine the negative of the third operation value as the identity sub-loss corresponding to the i-th first virtual identity vector.
  • For example, the identity sub-loss L_c is determined using a contrastive loss in the InfoNCE (Info Noise Contrastive Estimation) form, where InfoNCE is a loss function derived from mutual information through an autoregressive modification.
  • In some embodiments, the identity sub-loss L_c(i) corresponding to the i-th first virtual identity vector is determined according to the following formula (3), here in the standard InfoNCE form:
  • $L_c(i) = -\log \frac{\exp\left(w_j \cdot K[j] / \tau\right)}{\sum_{k=1}^{K} \exp\left(w_j \cdot K[k] / \tau\right)}$ (3)
  • where w_j is the i-th first virtual identity vector of the first target j, K[j] is the i-th second virtual identity vector of the first target j, τ is the first preset value, K[k] is the i-th second virtual identity vector corresponding to the k-th target in the i-th dynamic list, and K is the total number of targets included in the i-th dynamic list.
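  • A minimal sketch of formula (3) for one resolution level, and of the sum over the N levels described in Step 33 below; the temperature value 0.07 is an assumption for τ, the first preset value.

```python
import torch
import torch.nn.functional as F

def identity_sub_loss(w, j, K, tau=0.07):
    """Sketch of formula (3) at one resolution level i.
    w:   (D,) i-th first virtual identity vector of target j
    j:   row index of the first target in the dynamic list
    K:   (num_targets, D) i-th dynamic list, with K[j] already updated
    tau: the first preset value (a temperature)."""
    logits = K @ w / tau  # w . K[k] / tau for every target k in the list
    # -log( exp(w.K[j]/tau) / sum_k exp(w.K[k]/tau) )
    return -F.log_softmax(logits, dim=0)[j]

def identity_loss(ws, j, K_lists, tau=0.07):
    """Step 33: sum the N identity sub-losses, one per resolution level."""
    return sum(identity_sub_loss(w, j, K, tau)
               for w, K in zip(ws, K_lists))
```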
  • Step 33: Determine the sum of the identity sub-losses corresponding to the N first virtual identity vectors as the identity loss of the target network model.
  • the identity sub-loss corresponding to the i-th first virtual identity vector is determined, the sum of the identity sub-losses corresponding to the N first virtual identity vectors is determined as the identity loss.
  • For example, N is 3.
  • In this case, the identity sub-loss corresponding to each of the three first virtual identity vectors is determined, and then the sum of the identity sub-losses corresponding to the three first virtual identity vectors is determined as the identity loss of the model.
  • the loss of the target network model is determined based on the identity anonymization image and divergence constraints, including the following steps:
  • the reconstruction loss between the identity anonymized image and the second training image is determined, and the loss of the target network model is determined based on the reconstruction loss, divergence constraints and identity loss.
  • the difference between the identity anonymized image and the second training image is determined as the reconstruction loss.
  • the sum of the differences between each pixel of the identity anonymized image and the corresponding pixel of the second training image is determined as the reconstruction loss.
  • In some embodiments, the reconstruction loss L_rec is determined according to the following formula (4):
  • $L_{rec} = \left\| Y_{s,t} - X_t \right\|_{1}$ (4)
  • where Y_s,t is the identity anonymized image, X_t is the second training image, and ‖·‖_1 is the 1-norm operation.
  • the loss of the target network model is determined based on the reconstruction loss, divergence constraint and identity loss. For example, the weighted sum of reconstruction loss, divergence constraints, and identity loss is determined as the final loss of the target network model.
  • the embodiments of the present application also include determining the identity contrast loss of the identity anonymized image. For example, the following steps are included:
  • Step A: Determine a first distance between the identity anonymized image and the first training image, a second distance between the identity anonymized image and the second training image, and a third distance between the first training image and the second training image;
  • Step B: Determine the contrast loss based on the first distance, the second distance and the third distance.
  • The first distance, the second distance and the third distance can be determined by any distance measure, such as cosine distance.
  • Example 1: After the first distance, the second distance and the third distance are determined according to Step A, the sum of the first distance, the second distance and the third distance is determined as the contrast loss.
  • Example 2: Determine the sum of the first distance and the square of the difference between the second distance and the third distance; determine the difference between the preset value and this sum as the contrast loss.
  • In some embodiments, following Example 2 with the preset value taken as 1, the contrast loss L_ICL is determined according to the following formula (5):
  • $L_{ICL} = 1 - \left[\cos\left(z_{id}(Y_{s,t}), z_{id}(X_s)\right) + \left(\cos\left(z_{id}(Y_{s,t}), z_{id}(X_t)\right) - \cos\left(z_{id}(X_s), z_{id}(X_t)\right)\right)^{2}\right]$ (5)
  • where z_id(·) represents the 512-dimensional identity vector representation of an image, cos(z_id(Y_s,t), z_id(X_s)) is the first distance between the identity anonymized image and the first training image, cos(z_id(Y_s,t), z_id(X_t)) is the second distance between the identity anonymized image and the second training image, and cos(z_id(X_s), z_id(X_t)) is the third distance between the first training image and the second training image.
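  • A minimal sketch of formula (5), assuming 512-dimensional identity vectors extracted by a pre-trained recognition model and a preset value of 1:

```python
import torch.nn.functional as F

def identity_contrast_loss(z_y, z_s, z_t, preset=1.0):
    """Sketch of formula (5), following Example 2 above.
    z_y, z_s, z_t: identity vectors of the identity anonymized image,
    the first training image and the second training image."""
    d1 = F.cosine_similarity(z_y, z_s, dim=-1)  # first distance
    d2 = F.cosine_similarity(z_y, z_t, dim=-1)  # second distance
    d3 = F.cosine_similarity(z_s, z_t, dim=-1)  # third distance
    # preset value minus (first distance + squared gap between the
    # second and third distances).
    return (preset - (d1 + (d2 - d3).pow(2))).mean()
```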
  • In some embodiments, the loss of the target network model is determined based on the reconstruction loss, divergence constraint, identity loss and contrast loss. For example, the weighted sum of the reconstruction loss, divergence constraint, identity loss and contrast loss is determined as the loss of the target network model.
  • the adversarial loss of the model is also determined. For example, the adversarial loss is determined based on the identity anonymized image and the first training image.
  • In some embodiments, the adversarial loss L_GAN is determined according to the following formula (6), here in the standard GAN form:
  • $L_{GAN} = \min_{G} \max_{D} \; E\left[\log D(X_s)\right] + E\left[\log\left(1 - D(Y_{s,t})\right)\right]$ (6)
  • where D is the discriminator, G is the generator, E(·) represents the expected value of the distribution function, D(X_s) is the discrimination result of the discriminator on the first training image X_s, and D(Y_s,t) is the discrimination result of the discriminator on the identity anonymized image Y_s,t.
  • In some embodiments, the loss of the target network model can be determined based on the reconstruction loss, divergence constraint, identity loss, contrast loss and adversarial loss.
  • For example, the weighted sum of the reconstruction loss, divergence constraint, identity loss, contrast loss and adversarial loss is determined as the loss of the target network model.
  • the embodiments of this application do not limit the size of the weight values corresponding to the reconstruction loss, divergence constraint, identity loss, contrast loss and adversarial loss, and can be determined according to actual needs.
  • In some embodiments, the reconstruction loss, divergence constraint, identity loss, contrast loss and adversarial loss are weighted and summed according to formula (7) to obtain the loss L_total of the target network model, of the general form $L_{total} = \lambda_{1} L_{rec} + \lambda_{2} L_{kl} + \lambda_{3} L_{c} + \lambda_{4} L_{ICL} + \lambda_{5} L_{GAN}$, where λ_1 to λ_5 are the loss weights.
  • the weight corresponding to each loss in the above formula (7) is an example.
  • the weight corresponding to each loss in the embodiment of this application includes but is not limited to what is shown in the above formula (7) and can be determined as needed.
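  • The assembly of the losses above can be sketched as follows; the L1 reconstruction term follows formula (4), while the equal default weights are placeholders, since the patent leaves the weight values to be set as needed.

```python
import torch.nn.functional as F

def reconstruction_loss(Y_st, X_t):
    """Sketch of formula (4): L_rec = || Y_s,t - X_t ||_1."""
    return F.l1_loss(Y_st, X_t)

def total_loss(l_rec, l_kl, l_id, l_icl, l_gan,
               weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Sketch of formula (7): weighted sum of the five losses."""
    w_rec, w_kl, w_id, w_icl, w_gan = weights  # placeholder weights
    return (w_rec * l_rec + w_kl * l_kl + w_id * l_id
            + w_icl * l_icl + w_gan * l_gan)
```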
  • In this way, the embodiment of the present application achieves identity anonymization by generating first virtual identity vectors corresponding to different resolutions, which can improve the resolution of the anonymization result. For example, an anonymization result with a resolution of 1024² can be generated with fewer picture artifacts and higher fidelity.
  • In addition, the embodiment of the present application does not rely on key point regression models or segmentation models during model training; that is, the face area in the image is not removed, and the posture, details and occlusions in the original image are retained.
  • To sum up, the first training image is projected to the target space through the projection module to obtain N first virtual identity vectors, so that the target network model can fully learn the identity information in the image.
  • The identity anonymized image of the second training image is then obtained, so that the trained model can generate an image carrying virtual identity information while ensuring that the attribute information of the original image remains unchanged. That is, this application provides a new target network model.
  • Through training, the target network model can independently generate a virtual identity while fully learning the identity information in the first training image and the attribute information in the second training image. There is no need to remove the facial area in the image during the entire learning process, and there is no need to use real identity information for guidance.
  • In addition, the target network model is trained using the clear supervision targets of the face-swapping task, improving the fidelity and resolution of the identity anonymization generation of the target network model, so that the trained target network model can generate high-quality identity anonymized images.
  • FIG 11 is a schematic flowchart of an identity anonymization method provided by an embodiment of the present application.
  • the identity anonymization method shown in Figure 11 uses the above-trained target network model for identity anonymization processing. As shown in Figure 11, the method includes:
  • the embodiment of the present application uses the first training image to train the projection module, so that the projection module can fully learn the identity information in the first training image.
  • N virtual identity vectors can be obtained by sampling the target space of the projection module.
  • the implementation methods of the above S401 include but are not limited to the following:
  • Method 1: sampling is performed based on the mean and variance of the target space of the trained projection module to obtain N virtual identity vectors. For example, a random sample is drawn from the variance of the target space and added to the mean of the target space to obtain one virtual identity vector; repeating this step yields N virtual identity vectors (see the sketch below).
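  • A minimal sketch of Method 1, assuming the trained target space is summarized by a mean and a standard deviation tensor (both names are hypothetical):

```python
import torch

def sample_virtual_identities(mean, std, n):
    # Draw Gaussian noise, scale it by the standard deviation of the
    # target space and add it to the mean; the N draws are vectorized.
    # `mean` and `std` are assumed tensors of shape (dim,).
    return mean + std * torch.randn(n, mean.shape[0])
```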
  • Method 2: the target space includes a first space and a second space, and the target network model includes a second projection unit. In this case, the target space of the projection module in the target network model can be sampled as follows to obtain the N virtual identity vectors:
  • the first projection unit in the projection module is no longer used, and only the second projection unit in the projection module is used for projection.
  • sampling is performed in the first space Z that conforms to the standard Gaussian distribution to obtain N identity latent vectors, and then the N identity latent vectors are input into the second projection unit.
  • the second projection unit projects the N identity latent vectors into the second space W to obtain the N virtual identity vectors.
  • Figure 12 takes N = 3, with a mapping network as the second projection unit, as an example.
  • the projection module in the embodiment of the present application is not limited to that shown in Figure 12.
  • the first training image is used to train the first space so that the variance and mean of the first space conform to the standard Gaussian distribution.
  • sampling is performed on the first space to generate N identity latent vectors.
  • sampling is performed based on the mean and variance of the first space to obtain N identity latent vectors.
  • a random sample is drawn from the variance of the first space and then added to the mean of the first space to obtain one identity latent vector; repeating this step yields the N identity latent vectors.
  • the N identity latent vectors are projected to the second space through the second projection unit to obtain N virtual identity vectors.
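  • A minimal sketch of such a second projection unit, assuming a mapping network built from fully connected layers; the width (512) and depth (4) are assumptions rather than values fixed by the embodiments:

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    # Projects identity latent vectors from the first space Z into the
    # second space W through a stack of fully connected layers.
    def __init__(self, dim=512, depth=4):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(dim, dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):      # z: (N, dim) identity latent vectors
        return self.net(z)     # (N, dim) virtual identity vectors in W
```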
  • the attribute module in the embodiment of the present application is configured to extract attribute information in the image to be processed.
  • the attribute module includes an encoding unit and a decoding unit. In this case, the attribute vector of the image to be processed can be extracted as follows to obtain the M attribute vectors:
  • the above-mentioned encoding unit may include multiple feature extraction layers.
  • the above-mentioned decoding unit may also include multiple feature extraction layers; wherein the feature extraction layer may include a convolutional layer, etc.
  • the M attribute vectors generated above can correspond to different resolutions.
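  • A minimal sketch of such an encoder-decoder attribute module with skip connections; the channel counts and the number of levels (M = 3 here) are assumptions, not values fixed by the embodiments:

```python
import torch
import torch.nn as nn

def down(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, 2, 1), nn.ReLU())

def up(c_in, c_out):
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, 3, 2, 1, output_padding=1),
        nn.ReLU())

class AttributeModule(nn.Module):
    # The encoder extracts feature information, the decoder emits one
    # attribute map per resolution, and encoder features are reused
    # through skip connections.
    def __init__(self):
        super().__init__()
        self.enc = nn.ModuleList([down(3, 64), down(64, 128), down(128, 256)])
        self.dec = nn.ModuleList([up(256, 128), up(128, 64), up(64, 64)])

    def forward(self, x):
        skips = []
        for layer in self.enc:
            x = layer(x)
            skips.append(x)
        attrs = []
        for layer, skip in zip(self.dec, reversed(skips)):
            x = layer(x + skip)    # skip connection from the encoder
            attrs.append(x)
        return attrs               # M attribute vectors, one per resolution
```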
  • the attribute module is an autoencoder.
  • image generation is performed based on N virtual identity vectors and M attribute vectors to obtain an identity anonymized image of the image to be processed.
  • the N virtual identity vectors and M attribute vectors generated above are input into the fusion module to obtain the identity anonymized image of the image to be processed.
  • Example 1: concatenate the N virtual identity vectors, and likewise concatenate the M attribute vectors; the two concatenated vectors are fused and then input into the fusion module.
  • for example, the concatenated virtual identity vector and the concatenated attribute vector are cascaded and then input into the fusion module.
  • alternatively, the concatenated virtual identity vector and the concatenated attribute vector are added together and then input into the fusion module.
  • the fusion module includes multiple different resolution layers.
  • Example 2: according to the resolutions corresponding to the N virtual identity vectors, the N virtual identity vectors are used as styles and the M attribute vectors are used as noise; they are input into the corresponding resolution layers to obtain the identity anonymized image of the image to be processed.
  • the fusion module is StyleGAN2.
  • an AdaIN layer is included between two adjacent resolution layers of the fusion module.
  • for example, virtual identity vector i+1 is subjected to an affine transformation; the output feature information i of the i-th resolution layer is merged with attribute vector i and, together with the affine-transformed virtual identity vector i+1, input into the AdaIN layer; the AdaIN operation is performed, and the AdaIN result is input into the (i+1)-th resolution layer.
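  • A minimal sketch of this AdaIN operation, assuming `affine` is a learned linear layer (e.g. nn.Linear(w_dim, 2 * channels)) mapping a virtual identity vector to a per-channel scale and bias:

```python
import torch
import torch.nn.functional as F

def adain(features, w, affine):
    # The affine transform of the virtual identity vector w yields a
    # per-channel scale and bias that re-style the instance-normalized
    # feature information from the previous resolution layer.
    b, c = features.shape[:2]
    scale, bias = affine(w).reshape(b, 2, c, 1, 1).unbind(1)
    return scale * F.instance_norm(features) + bias
```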
  • the fusion module in the embodiment of this application can also be an adversarial model such as StyleGAN3 and ProGAN.
  • taking the second projection unit as a mapping network, the attribute module as an autoencoder and the fusion module as StyleGAN2 as an example, the identity anonymization process of the embodiment of the present application is introduced below.
  • sampling is performed in the first space Z of the projection module, and N identity latent vectors are obtained.
  • for example, 3 identity latent vectors are obtained, and these 3 identity latent vectors respectively correspond to 3 different resolutions: low, medium and high.
  • N identity latent vectors are input into the mapping network, and the N identity latent vectors are projected from the first space Z to the second space W through the mapping network to obtain N virtual identity vectors.
  • the image Xt to be processed is input into the autoencoder, and after the image Xt to be processed is processed by the autoencoder, M attribute vectors are generated.
  • the M attribute vectors are used as noise and the N virtual identity vectors are used as styles, which are input into each layer of StyleGAN2 to obtain the identity anonymized image Ys,t of the image to be processed output by StyleGAN2.
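  • Putting the pieces together, a minimal sketch of this inference flow, reusing the MappingNetwork and AttributeModule sketches above; `fusion` stands in for a StyleGAN2-style generator and is stubbed only so the flow can be traced end to end:

```python
import torch

mapping_network = MappingNetwork()
attribute_module = AttributeModule()
fusion = lambda styles, noise: noise[-1]   # placeholder generator, not StyleGAN2

x_t = torch.randn(1, 3, 256, 256)          # image to be processed X_t
z = torch.randn(3, 512)                    # sample N = 3 latent vectors in Z
w = mapping_network(z)                     # N virtual identity vectors in W
attrs = attribute_module(x_t)              # M attribute vectors
y_st = fusion(styles=w, noise=attrs)       # identity anonymized image Y_s,t
```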
  • the identity anonymization method provided by the embodiment of the present application samples on the target space of the projection module in the target network model to obtain N virtual identity vectors.
  • the attribute vector of the image to be processed is extracted through the attribute module of the target network model to obtain M attribute vectors;
  • image generation is performed by the fusion module of the target network model based on the N virtual identity vectors and the M attribute vectors to obtain the identity anonymized image of the image to be processed. That is to say, the target network model of the embodiment of the present application can independently generate a virtual identity; when anonymizing the image to be processed, the facial region does not need to be removed, which improves the fidelity of the identity anonymization.
  • Figure 14 is a schematic block diagram of a model training device provided by an embodiment of the present application.
  • the training device 10 may be a computing device or a part of a computing device.
  • the model training device 10 includes:
  • the projection unit 11 is configured to project the first training image to the target space through the projection module in the target network model to obtain N first virtual identity vectors, where N is a positive integer;
  • the attribute unit 12 is configured to extract attribute vectors from the second training image through the attribute module in the target network model to obtain M attribute vectors, where M is a positive integer;
  • the fusion unit 13 is configured to perform image generation based on the N first virtual identity vectors and the M attribute vectors through the fusion module of the target network model to obtain an identity anonymized image of the second training image;
  • the training unit 14 is configured to determine the loss of the target network model based on the identity anonymized image, and train the target network model based on the loss.
  • the projection module includes a first projection unit and a second projection unit
  • the target space includes a first space and a second space
  • the projection unit 11 is further configured to: extract the prior identity information of the first training image; project the prior identity information to the first space through the first projection unit to obtain N identity latent vectors; and project the N identity latent vectors to the second space through the second projection unit to obtain the N first virtual identity vectors.
  • the projection unit 11 is further configured to project the prior identity information into the mean and variance of the first space through the first projection unit, and to perform sampling based on the mean and variance of the first space to obtain the N identity latent vectors.
  • the training unit 14 is further configured to determine the divergence constraints of the N identity latent vectors; and determine the loss of the target network model according to the identity anonymized image and the divergence constraints.
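  • One common way to realize such a divergence constraint is the closed-form KL divergence to the standard Gaussian, sketched below; the log-variance parametrization is an implementation assumption, not a requirement of the embodiments:

```python
import torch

def kl_constraint(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, I) ) in closed form, summed over the
    # latent dimension and averaged over the N identity latent vectors.
    return (0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0)).sum(-1).mean()
```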
  • the N first virtual identity vectors respectively correspond to different resolutions.
  • the first projection unit is a variational autoencoder.
  • the training unit 14 is also configured to: obtain a third training image, where the third training image and the first training image are two different images of the first target; project the third training image to the target space through the projection reference module in the target network model to obtain N second virtual identity vectors, where the projection reference module has the same network structure as the projection module and is updated according to the projection module; determine the identity loss according to the N first virtual identity vectors and the N second virtual identity vectors; and determine the loss of the target network model according to the identity anonymized image, the divergence constraint and the identity loss.
  • the training unit 14 is further configured to: for the i-th second virtual identity vector among the N second virtual identity vectors, use the i-th second virtual identity vector to update the virtual identity vector corresponding to the first target in the i-th dynamic list, where the i-th dynamic list includes the virtual identity vectors of different targets at the i-th resolution and i is a positive integer from 1 to N; determine, according to the i-th first virtual identity vector and the updated i-th dynamic list, the identity sub-loss corresponding to the i-th first virtual identity vector; and determine the sum of the identity sub-losses corresponding to the N first virtual identity vectors as the identity loss.
  • the training unit 14 is further configured to: obtain a first ratio of the i-th second virtual identity vector to a first preset value, multiply the first ratio by the i-th first virtual identity vector to obtain a first result, and perform an exponential operation on the first result to obtain a first operation value; obtain, in the updated i-th dynamic list, the second ratio of each second virtual identity vector to the first preset value, multiply each second ratio by the corresponding i-th first virtual identity vector to obtain a second result, and perform an exponential operation on each second result to obtain the second operation value corresponding to each second virtual identity vector; determine the sum of the second operation values corresponding to the second virtual identity vectors, obtain the third ratio of the first operation value to that sum, and perform a logarithmic operation on the third ratio to obtain a third operation value; and determine the negative of the third operation value as the identity sub-loss corresponding to the i-th first virtual identity vector (a sketch follows below).
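  • A minimal sketch of this InfoNCE-style sub-loss: w_i is the i-th first virtual identity vector of target j, dyn_list is the updated i-th dynamic list whose row j holds the matching second virtual identity vector, and tau (the first preset value) is an assumed temperature:

```python
import torch

def identity_sub_loss(w_i, dyn_list, j, tau=0.07):
    # logits[k] = (w_i . dyn_list[k]) / tau for every target k in the
    # dynamic list; the sub-loss is -log of the softmax probability of
    # the positive entry j, as described above.
    logits = dyn_list @ w_i / tau
    return -torch.log_softmax(logits, dim=0)[j]
```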
  • the attribute module includes an encoding unit and a decoding unit.
  • the attribute unit 12 is also configured to perform feature extraction on the second training image through the encoding unit to obtain the feature information of the second training image, and to decode the feature information through the decoding unit to obtain the M attribute vectors.
  • the fusion module includes a plurality of different resolution layers, and the fusion unit 13 is further configured to, according to the resolutions corresponding to the N first virtual identity vectors, use the N first virtual identity vectors as styles and the M attribute vectors as noise, and input them into the corresponding resolution layers to obtain the identity anonymized image of the second training image.
  • the training unit 14 is further configured to determine a reconstruction loss between the identity anonymized image and the second training image, and to determine the loss of the target network model according to the reconstruction loss, the divergence constraint and the identity loss.
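  • A minimal sketch of the reconstruction loss, following the L1 form of formula (4) in the description; averaging rather than summing over pixels is an implementation choice:

```python
def reconstruction_loss(y_st, x_t):
    # L1 distance between the identity anonymized image Y_s,t and the
    # second training image X_t.
    return (y_st - x_t).abs().mean()
```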
  • the training unit 14 is further configured to: determine a first distance between the identity anonymized image and the first training image, a second distance between the identity anonymized image and the second training image, and a third distance between the first training image and the second training image; determine a contrast loss according to the first distance, the second distance and the third distance; and determine the loss of the target network model according to the reconstruction loss, the divergence constraint, the identity loss and the contrast loss.
  • the training unit 14 is further configured to determine the sum of the first distance and the square of the difference between the second distance and the third distance, and to determine the difference between a preset value and that sum as the contrast loss (a sketch follows below).
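  • A minimal sketch of this contrast loss as given in equation (5) of the description; z_y, z_s and z_t are the identity vectors of the anonymized image, the first training image and the second training image, assumed to come from a pre-trained face recognition model:

```python
import torch.nn.functional as F

def contrast_loss(z_y, z_s, z_t):
    d1 = F.cosine_similarity(z_y, z_s, dim=-1)   # first distance
    d2 = F.cosine_similarity(z_y, z_t, dim=-1)   # second distance
    d3 = F.cosine_similarity(z_s, z_t, dim=-1)   # third distance
    # Equation (5): 1 - d1 plus the squared deviation of d2 from d3.
    return (1.0 - d1 + (d2 - d3) ** 2).mean()
```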
  • the training unit 14 is further configured to determine an adversarial loss based on the identity anonymized image and the first training image, and to determine the weighted sum of the reconstruction loss, the divergence constraint, the identity loss, the contrast loss and the adversarial loss as the loss of the target network model.
  • the device embodiments and method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.
  • the device shown in Figure 14 can execute the embodiment of the model training method shown in Figure 3, and the foregoing and other operations and/or functions of the modules in the device respectively implement the corresponding method embodiment of the computing device; for the sake of brevity, they will not be repeated here.
  • Figure 15 is a schematic block diagram of an identity anonymization device provided by an embodiment of the present application.
  • the identity anonymization device 20 may be a computing device or a part of the computing device. As shown in Figure 15, the identity anonymization device 20 includes:
  • the sampling unit 21 is configured to sample on the target space of the projection module in the target network model to obtain N virtual identity vectors, where N is a positive integer;
  • the attribute unit 22 is configured to extract attribute vectors of the image to be processed through the attribute module in the target network model, and obtain M attribute vectors, where M is a positive integer;
  • the anonymization unit 23 is configured to generate an image based on the N virtual identity vectors and the M attribute vectors through the fusion module of the target network model to obtain an identity anonymized image of the image to be processed.
  • the target space includes a first space and a second space
  • the target network model includes a second projection unit
  • the sampling unit 21 is also configured to sample on the first space to obtain N identity latent vectors, and to project the N identity latent vectors to the second space through the second projection unit to obtain the N virtual identity vectors.
  • the mean and variance of the first space satisfy the standard Gaussian distribution
  • the sampling unit 21 is also configured to perform sampling based on the mean and variance of the first space to obtain the N identity latent vectors.
  • the N virtual identity vectors respectively correspond to different resolutions.
  • the attribute module includes an encoding unit and a decoding unit.
  • the attribute unit 22 is also configured to perform feature extraction on the image to be processed through the encoding unit to obtain the feature information of the image to be processed;
  • the feature information is decoded by the decoding unit to obtain M attribute vectors.
  • the fusion module includes multiple different resolution layers
  • the anonymization unit 23 is also configured to use the N virtual identity vectors as styles and the M attribute vectors as noise, and input them into the corresponding resolution layers to obtain the identity anonymized image of the image to be processed.
  • the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.
  • the device shown in Figure 15 can execute the embodiment of the identity anonymization method shown in Figure 11, and the foregoing and other operations and/or functions of the modules in the device respectively implement the corresponding method embodiment of the computing device; for the sake of brevity, they will not be repeated here.
  • this functional module can be implemented in the form of hardware, can also be implemented through instructions in the form of software, or can also be implemented through a combination of hardware and software modules.
  • each step of the method embodiment in the embodiment of the present application can be completed through the integrated logic circuit of the hardware in the processor and/or instructions in the form of software.
  • the steps of the method disclosed in the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor.
  • the software module may be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, register, etc.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiment in combination with its hardware.
  • Figure 16 is a schematic block diagram of a computing device provided by an embodiment of the present application.
  • the computing device is configured to execute the above method embodiment.
  • the computing device 30 may include:
  • a memory 31 and a processor 32, where the memory 31 is configured to store a computer program 33 and transmit the program code 33 to the processor 32.
  • the processor 32 can call and run the computer program 33 from the memory 31 to implement the method in the embodiment of the present application.
  • the processor 32 may be configured to perform the above method steps according to instructions in the computer program 33 .
  • the processor 32 may include but is not limited to:
  • general-purpose processors, Digital Signal Processors (DSP), Application-Specific Integrated Circuits (ASIC), Field-Programmable Gate Arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and so on.
  • the memory 31 includes but is not limited to:
  • volatile memory and/or non-volatile memory. The non-volatile memory can be Read-Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), Electrically EPROM (EEPROM) or flash memory. The volatile memory can be Random Access Memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synch-link DRAM (SLDRAM) and Direct Rambus RAM (DR RAM).
  • the computer program 33 can be divided into one or more modules, and the one or more modules are stored in the memory 31 and executed by the processor 32 to complete the method provided by this application.
  • the one or more modules may be a series of computer program instruction segments capable of completing specific functions. The instruction segments are used to describe the execution process of the computer program 33 in the computing device.
  • the computing device 30 may also include:
  • a transceiver 34, where the transceiver 34 can be connected to the processor 32 or the memory 31.
  • the processor 32 can control the transceiver 34 to communicate with other devices, for example, it can send information or data to other devices, or receive information or data sent by other devices.
  • Transceiver 34 may include a transmitter and a receiver.
  • the transceiver 34 may also include an antenna, and the number of antennas may be one or more.
  • the components of the computing device 30 are connected through a bus system, where, in addition to the data bus, the bus system also includes a power bus, a control bus and a status signal bus.
  • Embodiments of the present application provide a computer storage medium on which a computer program is stored. When the computer program is executed by a computer, the computer can perform the method of the above method embodiment. In other words, embodiments of the present application also provide a computer program product containing instructions, which when executed by a computer causes the computer to perform the method of the above method embodiments.
  • Embodiments of the present application provide a computer program product or computer program.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computing device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computing device performs the method of the above method embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Bioethics (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

This application provides a model training and identity anonymization method, apparatus, device, storage medium and program product, applicable to various scenarios such as cloud technology, artificial intelligence, intelligent transportation and assisted driving. The method includes: sampling on the target space of the projection module in a target network model to obtain N virtual identity vectors; extracting attribute vectors from the image to be processed through the attribute module in the target network model to obtain M attribute vectors; and performing image generation based on the N virtual identity vectors and the M attribute vectors through the fusion module of the target network model to obtain an identity anonymized image of the image to be processed.

Description

模型训练和身份匿名化方法、装置、设备、存储介质及程序产品
相关申请的交叉引用
本申请基于申请号为202210234385.2、申请日为2022年03月10日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本申请涉及图像处理技术领域,尤其涉及一种模型训练和身份匿名化方法、装置、设备、存储介质及程序产品。
背景技术
身份匿名化又叫做去身份化(De-Identification),指去除图像或视频中可识别的身份特征(Identity),但同时保留其他与身份无关的属性不变,并保证匿名化的图片或视频必须在视觉上仍然真实的。
相关技术中,使用条件生成式对抗网络(Generative Adversarial Networks,GAN)生成匿名化的图片,通过提取原始图片的姿态关键点,并将原始图片的姿态关键点和去除面部区域之后的背景图片作为条件输入模型中,以生成新的虚拟身份来填补空缺的面部区域。但是,该方法,以去除面部区域之后的背景图片作为模型输入,使得模型生成的图片质量差。
发明内容
本申请实施例提供一种模型训练方法和身份匿名化方法、装置、计算设备、计算机可读存储介质及计算机程序产品,能够提高身份匿名化图像的生成质量。
本申请实施例提供了一种模型训练方法,包括:
通过目标网络模型中的投影模块,将第一训练图像投影至目标空间,得到N个第一虚拟身份向量,所述N为正整数;
通过所述目标网络模型中的属性模块,对第二训练图像进行属性向量提取,得到M个属性向量,所述M为正整数;
通过所述目标网络模型的融合模块,基于所述N个第一虚拟身份向量和所述M个属性向量进行图像生成,得到所述第二训练图像的身份匿名化图像;
根据所述身份匿名化图像,确定所述目标网络模型的损失,并根据所述损失对所述目标网络模型进行训练。
本申请实施例还提供了一种身份匿名化方法,包括:
在目标网络模型中投影模块的目标空间上进行采样,得到N个虚拟身份向 量,所述N为正整数;
通过目标网络模型中的属性模块,对待处理图像进行属性向量提取,得到M个属性向量,所述M为正整数;
通过所述目标网络模型的融合模块,基于所述N个虚拟身份向量和所述M个属性向量进行图像生成,得到所述待处理图像的身份匿名化图像。
本申请实施例还提供了一种模型训练装置,包括:
投影单元,配置为通过目标网络模型中的投影模块,将第一训练图像投影至目标空间,得到N个第一虚拟身份向量,所述N为正整数;
属性单元,配置为通过所述目标网络模型中的属性模块,对第二训练图像进行属性向量提取,得到M个属性向量,所述M为正整数;
融合单元,配置为通过所述目标网络模型的融合模块,基于所述N个第一虚拟身份向量和所述M个属性向量进行图像生成,得到所述第二训练图像的身份匿名化图像;
训练单元,配置为根据所述身份匿名化图像,确定所述目标网络模型的损失,并根据所述损失对所述目标网络模型进行训练。
本申请实施例还提供了一种身份匿名化装置,包括:
采样单元,配置为在目标网络模型中投影模块的目标空间上进行采样,得到N个虚拟身份向量,所述N为正整数;
属性单元,配置为通过目标网络模型中的属性模块,对待处理图像进行属性向量提取,得到M个属性向量,所述M为正整数;
匿名化单元,配置为通过所述目标网络模型的融合模块,基于所述N个虚拟身份向量和所述M个属性向量进行图像生成,得到所述待处理图像的身份匿名化图像。
本申请实施例还提供了一种计算设备,包括处理器和存储器。所述存储器,配置为存储计算机程序,所述处理器,配置为调用并运行所述存储器中存储的计算机程序,以执行本申请实施例提供的上述模型训练方法或身份匿名化方法。
本申请实施例还提供了一种芯片,配置为实现本申请实施例提供的上述模型训练方法或身份匿名化方法法。所述芯片包括:处理器,配置为从存储器中调用并运行计算机程序,使得安装有所述芯片的设备执行本申请实施例提供的上述模型训练方法或身份匿名化方法。
本申请实施例还提供了一种计算机可读存储介质,配置为存储计算机程序,所述计算机程序被执行时,实现本申请实施例提供的上述模型训练方法或身份匿名化方法。
本申请实施例还提供了一种计算机程序产品,包括计算机程序指令,所述计算机程序指令被计算机执行时,实现本申请实施例提供的上述模型训练方法或身份匿名化方法。
本申请实施例还提供了一种计算机程序,当其在计算机上运行时,实现本申请实施例提供的上述模型训练方法或身份匿名化方法。
本申请实施例具有以下有益效果:
在目标网络模型训练过程中,通过将第一训练图像投影至目标空间,得到N个第一虚拟身份向量,使得目标网络模型能够对图像中的身份信息进行充分学习,而通过对第二训练图像进行属性向量提取,得到M个属性向量,实现了目标网络模型对图像中属性信息的充分学习,基于将N个第一虚拟身份向量和M个属性向量进行图像生成,得到第二训练图像的身份匿名化图像,如此,使得训练得到的模型能够在保证原始图像的属性信息不变的情况下,生成携带虚拟身份信息的图像;
在目标网络模型的应用过程中,通过在投影模块的目标空间上进行采样,得到N个虚拟身份向量,实现了虚拟身份信息的生成,通过对待处理图像进行属性向量提取,得到M个属性向量,保证待处理图像中属性特征的不丢失,进而保证所生成身份匿名化图像的质量,基于N个虚拟身份向量和M个属性向量进行图像生成,得到待处理图像的身份匿名化图像,实现了在保证待处理图像的属性信息不变的情况下,生成携带虚拟身份信息、即隐藏真实身份的身份匿名化图像,即本申请实施例在身份匿名化时,通过目标网络模型生成独立虚拟身份,无需去除图像中面部区域,进而提高身份匿名化的保真度和分辨率。
附图说明
图1A为本申请实施例提供的真实图像示意图;
图1B-图1D为本申请实施例提供的图1A对应的身份匿名化图像示意图;
图2为本申请实施例提供的一种系统架构示意图;
图3为本申请实施例提供的模型训练方法的流程示意图;
图4至图6为本申请实施例提供的目标网络模型的结构示意图;
图7为本申请实施例提供的融合模块结构示意图;
图8为本申请实施例提供的目标网络模型的结构示意图;
图9及图10为本申请实施例提供的对比损失确定的示意图;
图11为本申请实施例提供的身份匿名化方法流程示意图;
图12为本申请实施例提供的投影模块示意图;
图13为本申请施例提供的身份匿名化图像确定示意图;
图14为本申请实施例提供的模型训练装置的示意性框图;
图15为本申请实施例提供的身份匿名化装置的示意性框图;
图16为本申请实施例提供的计算设备的示意性框图。
具体实施方式
下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行描述。
应理解,在本发明实施例中,“与A对应的B”表示B与A相关联。在 一种实现方式中,可以根据A确定B。但还应理解,根据A确定B并不意味着仅仅根据A确定B,还可以根据A和/或其它信息确定B。
在本申请实施例的描述中,除非另有说明,“多个”是指两个或多于两个。
另外,为了便于清楚描述本申请实施例的技术方案,在本申请的实施例中,采用了“第一”、“第二”、“第三”等字样对功能和作用基本相同的相同项或相似项进行区分。本领域技术人员可以理解“第一”、“第二”、“第三”等字样并不对数量和执行次序进行限定,并且“第一”、“第二”、“第三”等字样也并不限定一定不同。
为了便于理解本申请的实施例,首先对本申请实施例涉及到的相关概念进行如下简单介绍:
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。
机器学习(Machine Learning,ML)是一门多领域交叉学科,涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机怎样模拟或实现人类的学习行为,以获取新的知识或技能,重新组织已有的知识结构使之不断改善自身的性能。机器学习是人工智能的核心,是使计算机具有智能的根本途径,其应用遍及人工智能的各个领域。机器学习和深度学习通常包括人工神经网络、置信网络、强化学习、迁移学习、归纳学习、式教学习等技术。
本申请实施例的方法,可以应用于任意需要对图像进行匿名化处理的场景。例如,图1A至图1D所示,图1A为真实图像,图1B至图1D为图1A的身份匿名化图像。对比图1A和图1B至图1D可知,图1B至图1D去除图1A中可识别的身份特征(Identity),同时保留其他与身份无关的属性不变,并保证在视觉上仍然真实。
场景1,本申请实施例可以应用于隐私保护场景,例如对于人脸相关的图片或视频,可以使用本申请实施例的方法将真实身份替换为虚拟身份,这样后续的检测等任务可以继续执行而不会泄露隐私。另外,用户在发布图片或视频时也可使用本申请实施例的方法隐去自己的身份特征,避免真实信息泄露。
场景2,本申请实施例可以应用于生成虚拟形象场景,例如本申请实施例的技术方案可被用于生成虚拟身份,例如固定身份隐变量,替换背景图片,可以生成某个特定虚拟形象在不同场景下的图片或视频。
需要说明的是,上述场景1和场景2以目标为人脸为例进行说明,本申请实施例的方法还可以应用于非人脸的其他目标的身份匿名化的场景中,例如对待处理图像中动物、车辆等任意目标进行身份匿名化。
在一些实施例中,本申请实施例的方法可以应用于智能交通系统,智能交通系统(Intelligent Traffic System,ITS)又称智能运输系统(Intelligent Transportation System),是将先进的科学技术(信息技术、计算机技术、数据通信技术、传感器技术、电子控制技术、自动控制理论、运筹学、人工智能等)有效地综合运用于交通运输、服务控制和车辆制造,加强车辆、道路、使用者三者之间的联系,从而形成一种保障安全、提高效率、改善环境、节约能源的综合运输系统。示例性的,本申请与智能交通相结合的方案可以是,车载设备采集用户的人脸图像,并采用本申请实施例的方法,对采集的人脸图像进行身份匿名化处理后,发送给其他设备进行任务分析等,例如进行非法驾驶分析、或智能驾驶分析等。
图2为本申请实施例涉及的一种系统架构示意图,包括用户设备101、数据采集设备102、训练设备103、执行设备104、数据库105、内容库106、I/O接口107和目标网络模型108。
其中,数据采集设备102,配置为从内容库106中读取训练数据,并将读取的训练数据存储至数据库105中。本申请实施例涉及的训练数据包括第一训练图像、第二训练图像和第三训练图像,第一训练图像、第二训练图像和第三训练均用于训练目标网络模型。
在一些实施例中,用户设备101,配置为对数据库105中的数据进行标注操作。
训练设备103基于数据库105中维护的训练数据,对目标网络模型108进行训练,使得训练后的目标网络模型108可以生成待处理图像的身份匿名化图像。在一些实施例中,训练设备103得到的目标网络模型108可以应用到不同的系统或设备中。
在附图2中,执行设备104配置有I/O接口107,与外部设备进行数据交互。比如通过I/O接口接收用户设备101发送的待处理图像。执行设备104中的计算模块109使用训练好的目标网络模型108对输入的待处理图像进行处理,输出身份匿名化图像,并将生成的身份匿名化图像输出给用户设备101进行显示,或者输入其他任务模型中进行其他任务处理。
其中,用户设备101可以包括手机、平板电脑、笔记本电脑、掌上电脑、移动互联网设备(mobile internet device,MID)或其他具有安装浏览器功能的终端设备。
执行设备104可以为服务器。服务器可以是一台或多台。服务器是多台 时,可以存在如下情况至少之一:至少两台服务器配置为提供不同的服务,至少两台服务器配置为提供相同的服务;比如以负载均衡方式提供同一种服务,本申请实施例对此不加以限定。其中,上述服务器可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(Content Delivery Network,CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器。服务器也可以成为区块链的节点。
本实施例中,执行设备104通过网络与用户设备101连接。所述网络可以是企业内部网(Intranet)、互联网(Internet)、全球移动通讯系统(Global System of Mobile communication,GSM)、宽带码分多址(Wideband Code Division Multiple Access,WCDMA)、4G网络、5G网络、蓝牙(Bluetooth)、Wi-Fi、通话网络等无线或有线网络。
需要说明的是,附图2仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制。在一些实施例中,上述数据采集设备102与用户设备101、训练设备103和执行设备104可以为同一个设备。上述数据库105可以分布在一个服务器上也可以分布在多个服务器上,上述的内容库106可以分布在一个服务器上也可以分布在多个服务器上。
下面通过一些实施例对本申请实施例的技术方案进行详细说明。下面这几个实施例可以相互结合,对于相同或相似的概念或过程可能在某些实施例不再赘述。
本申请提供一种目标网络模型,该目标网络模型用于对待处理图像中的目标(例如面部)进行身份匿名化处理,生成待处理图像的身份匿名化图像。因此,在一些实施例中,可以将目标网络模型称为身份匿名化模型,或者身份匿名化器。
首先,对目标网络模型的训练过程进行介绍。
图3为本申请实施例提供的模型训练方法的流程示意图。本申请实施例的执行主体为具有模型训练功能的装置,例如模型训练装置,该模型训练装置可以为计算设备,或者为计算设备中的一部分。下面以执行主体为计算设备为例进行说明。如图3所示,本申请实施例的方法包括:
S301、计算设备通过目标网络模型中的投影模块,将第一训练图像投影至目标空间,得到N个第一虚拟身份向量,N为正整数。
本申请实施例的第一训练图像为训练数据中的一张训练图像。需要说明的是,若上述第一训练图像为人脸图像,则上述第一训练图像为经过用户许可同意后得到的,且相关图像数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。
本申请实施例中,使用各第一训练图像对模型进行训练的过程基本相似,为了便于描述,以一张第一训练图像为例进行说明。
本申请实施例通过目标网络模型,将第一训练图像投影到目标空间中,得到第一训练图像的一个或多个虚拟身份向量,使得目标网络模型对第一训练图像的身份信息进行学习。在目标网络模型对身份信息充分学习后,在实际进行身份匿名化处理时,可以直接对目标网络模型的目标空间进行采样,生成虚拟身份向量。
本申请实施例主要涉及属性向量和虚拟身份向量这几个概念。
其中,虚拟身份向量为虚拟身份信息对应的向量,虚拟身份信息为隐藏可识别身份特征后的身份信息,例如隐藏面部可识别的身份特征后的面部信息。
属性向量为属性信息对应的向量,将图像中除可识别身份特征外的其他特征信息称为属性信息,例如背景信息等。
本申请实施例的目标网络模型可以生成独立的虚拟身份向量。
图4为本申请实施例提供的目标网络模型的结构示意图,如图4所示,本申请实施例的目标网络模型包括投影模块、属性模块和融合模块。
其中,投影模块,配置为将第一训练图像投影到目标空间中,得到第一训练图像的N个第一虚拟身份向量。N为正整数,本申请实施例对N的取值不做限制,可根据实际需要进行设定。
属性模块,配置为对第二训练图像进行属性向量提取,以提取第二训练图像的M个属性向量。M为正整数,本申请实施例对M的取值不做限制,可根据实际需要进行设定。在一些实施例中,M等于N。
融合模块,配置为基于上述N个第一虚拟身份向量和M个属性向量进行图像生成,得到第二训练图像的身份匿名化图像。
若上述N为大于1的正整数时,则N个第一虚拟身份向量分别对应不同的分辨率。
由上述可知,本申请实施例的目标网络模型,其中投影模块配置为生成第二训练图像中目标的虚拟身份向量,该虚拟身份向量隐藏了第二训练图像中目标的真实身份特性,属性模块配置为提取第二训练图像的属性向量,该属性向量保留了第二训练图像中目标的真实身份特性外的其他特征。这样,融合模块基于上述虚拟身份向量和属性向量进行图像生成后,可得到隐藏第二训练图像中目标身份的匿名化图像,即身份匿名化图像。
在一些实施例中,如图5所示,投影模块包括第一投影单元和第二投影单元,目标空间包括第一空间Z和第二空间W,此时,上述计算设备通过目标网络模型中的投影模块,可采用如下方式实现将第一训练图像投影至目标空间,得到N个第一虚拟身份向量:
提取第一训练图像的先验身份信息;通过第一投影单元,将先验身份信息投影至第一空间Z,得到N个身份隐向量;通过第二投影单元,将N个身 份隐向量投影至第二空间W,得到N个第一虚拟身份向量。
如图5所示,首先提取第一训练图像的先验身份信息,例如通过预先训练好的识别模型,提取第一训练图像的先验身份信息。接着,通过第一投影单元,将第一训练图像的先验身份信息投影到第一空间Z中,得到N个身份隐向量,再通过第二投影单元,将N个身份隐向量投影至第二空间W,得到N个第一虚拟身份向量。
上述第一空间Z与第二空间W可以为不同的隐空间。本申请实施例对第一空间Z和第二空间W不做限制。
在一些实施例中,第一空间为隐空间Z,该隐空间Z符合标准高斯分布。
此时,上述第一投影单元,可采用如下方式将先验身份信息投影至第一空间Z,得到N个身份隐向量:
通过第一投影单元将先验身份信息,投影为所述第一空间的均值和方差;基于第一空间的均值和方差进行采样,得到N个身份隐向量。
在一些实施例中,第一投影单元为变分自编码器(variational autoencoder,VAE),例如为条件变分自编码器(conditional variational autoencoder,CVAE),条件变分自编码器是一种生成网络,通过编码器学习数据的分布,得到隐变量,然后通过解码器将隐变量恢复到数据的原始形式。条件变分自编码器可以学习到数据的分布,然后抽样生成新的数据,通常用于图像生成。
这样,可以通过将第一训练图像的先验身份信息输入该VAE中,该VAE将先验身份信息投影为第一空间的均值和方差。接着,基于第一空间的均值和方差进行采样,得到第一训练图像的N个身份隐向量。
该示例中,上述第一空间为符合标准高斯分布的隐空间Z,因此,为了增强隐空间的表达能力,本申请实施例在不同的分辨率层次上,生成不同的隐向量,例如生成N个身份隐向量,这等价于构建一个包含多个身份隐向量的Z+空间。
在一些实施例中,第二空间W为由隐空间Z得到,例如,由隐空间Z进行线性或非线性映射得到。
本申请实施例对第二投影单元的网络结构不做限制,例如为映射网络(Mapping Network),该映射网络由多个全连接层组成。
本申请实施例，通过将第一训练图像的先验身份信息投影到投影模块的隐空间（即目标空间），以使投影模块对第一训练图像的身份信息进行充分学习，以便后续生成符合实际的虚拟身份向量。
S302、通过目标网络模型中的属性模块,对第二训练图像进行属性向量提取,得到M个属性向量,M为正整数。
其中,第二训练图像为训练数据集中的任一图像,该第二训练图像与上述第一训练图像可以为同一张图像,也可以是不同图像。
本申请实施例的属性模块配置为学习第二训练图像的属性信息,以生成M个属性向量。
本申请实施例对属性模块的网络模型不做限制。
在一些实施例中,如图6所示,属性模块包括编码单元和解码单元,此时,通过目标网络模型中的属性模块,可采用如下方式对第二训练图像进行属性向量提取,得到M个属性向量:
将第二训练图像输入编码单元,得到第二训练图像的特征信息;将特征信息输入解码单元,得到M个属性向量。
在一些实施例中,编码单元包括多个特征提取层,解码单元也包括多个特性提取单元,编码单元中的至少一个特征提取层与解码单元中的至少一个特征提取层之间跳跃连接。
根据上述步骤,生成N个第一虚拟身份向量和M个属性向量后,执行如下S303。
S303、通过目标网络模型的融合模块,基于N个第一虚拟身份向量和M个属性向量进行图像生成,得到第二训练图像的身份匿名化图像。
示例1,对N个第一虚拟身份向量进行拼接,得到拼接后的第一虚拟身份向量,对M个属性向量进行拼接,拼接后的属性向量,将拼接后的第一虚拟身份向量和拼接后的属性向量进行图像生成后,输入融合模块中,以进行身份匿名化图像的生成。
例如,将拼接后的第一虚拟身份向量和接后的属性向量进行级联后,输入融合模块中,以进行身份匿名化图像的生成。
再例如,将拼接后的第一虚拟身份向量和接后的属性向量进行相加后,输入融合模块中,以进行身份匿名化图像的生成。
示例2,融合模块包括多个不同的分辨率层,此时,融合模块,可采用如下方式,基于N个第一虚拟身份向量和M个属性向量进行图像生成,得到第二训练图像的身份匿名化图像:
根据N个第一虚拟身份向量所对应的分辨率,将N个第一虚拟身份向量作为样式,将M个属性向量作为噪音,输入对应的分辨率层中,得到第二训练图像的身份匿名化图像。
举例说明，N为3，M为4，融合模块包括4个不同的分辨率层，其中3个第一虚拟身份向量记为第一虚拟身份向量1、第一虚拟身份向量2和第一虚拟身份向量3，4个属性向量记为属性向量1、属性向量2、属性向量3和属性向量4。4个分辨率层根据分辨率的大小依次记为分辨率层1、分辨率层2、分辨率层3和分辨率层4。第一虚拟身份向量1对应分辨率较低的分辨率层1和分辨率层2，第一虚拟身份向量2对应分辨率中等的分辨率层3，第一虚拟身份向量3对应分辨率最高的分辨率层4。4个属性向量与4个分辨率层按照分辨率大小依次对应。
示例性的，将第一虚拟身份向量1输入分辨率层1，得到特征信息1，属性向量1与特征信息1合并后，与第一虚拟身份向量1同时输入分辨率层2，得到特征信息2。属性向量2与特征信息2合并后，与第一虚拟身份向量2同时输入分辨率层3，得到特征信息3。属性向量3与特征信息3合并后，与第一虚拟身份向量3同时输入分辨率层4，得到特征信息4。最后，特征信息4和属性向量4进行合并等处理后，生成第二训练图像的身份匿名化图像。
在一些实施例中,融合模块为基于样式的生成器(Style-based generator,StyleGAN2)。如图7所示,在融合模块相邻两个分辨率层之间包括AdaIN层,例如,对第一虚拟身份向量i+1进行仿射变换(Affine transform,AT),将第i个分辨率层的输出的特征信息i与属性向量i合并后,与仿射变换后的第一虚拟身份向量i+1输入AdaIN层,执行AdaIN操作,并将AdaIN操作结果输入第i+1个分辨率层。
本申请实施例的融合模块还可以是StyleGAN3和ProGAN等对抗模型,当融合模块采用不同的对抗模型时,确定第二训练图像的身份匿名化图像的方式可以不相同,本申请实施例对此不做限制。
在一些实施例中,以第一投影单元为VAE,第二投影单元为映射网络,属性模块为自编码器,融合模块为StyleGAN2为例,对本申请实施例的模型训练过程进行介绍。
示例性地，如图8所示，将第一训练图像Xs通过预训练人脸识别模型，生成先验身份信息。接着，将先验身份信息输入VAE，通过VAE将先验身份信息投影至第一空间Z中，得到N个身份隐向量，例如，得到3个身份隐向量，这3个身份隐向量分别对应低、中、高3个不同的分辨率。接着，将N个身份隐向量输入映射网络，通过映射网络将N个身份隐向量从第一空间Z投影到第二空间W中，得到N个第一虚拟身份向量。另外，将第二训练图像Xt输入自编码器中，通过自编码器对第二训练图像Xt进行处理后，生成M个属性向量。最后，将M个属性向量作为噪声，将N个第一虚拟身份向量作为样式，输入StyleGAN2的各层中，得到StyleGAN2输出的第二训练图像的身份匿名化图像Ys,t。
根据上述步骤,将第一训练图像和第二训练图像输入目标网络模型中,得到目标网络模型输出的第二训练图像的身份匿名化图像,接着,执行如下S304,以对目标网络模型进行训练。
S304、根据身份匿名化图像,确定目标网络模型的损失,并根据损失对目标网络模型进行训练。
根据上述步骤,目标网络模型输出第二训练图像的身份匿名化图像,根据该身份匿名化图像,确定目标网络模型的损失。
在一些实施例中，将身份匿名化图像输入判断模型中，该判断模型为预先训练的、可以预测身份匿名化图像的匿名化程度的模型。例如，将该身份匿名化图像输入该判断模型，该判断模型对该身份匿名化图像进行身份识别，将识别结果确定为该目标网络模型的损失。若识别准确性高，则说明当前目标网络模型的匿名化效果不理想，此时，根据目标网络模型的损失，对该目标网络模型中的参数进行调整。接着，选择新的第一训练图像和第二训练图像执行上述S301至S304的步骤，对目标网络模型继续进行训练，直到该目标网络模型达到训练结束条件。其中，训练结束条件至少包括训练次数达到预设次数，或者模型的匿名化程度达到预期效果。
在一些实施例中，若图5所示的第一空间Z为符合标准高斯分布的隐空间，则本申请实施例对第一空间Z中的N个身份隐向量加以KL散度约束 $L_{kl}$，以保证身份信息被投影到标准高斯分布。
基于此,本申请实施例还可确定N个身份隐向量的散度约束,此时,根据身份匿名化图像,确定目标网络模型的损失可以包括:根据身份匿名化图像和散度约束,确定目标网络模型的损失。
示例性的，可以通过如下公式（1），确定N个身份隐向量的散度约束 $L_{kl}$：

$$L_{kl}=\frac{1}{2}\sum_{i=1}^{N}\left(\mu_{i}^{2}+\sigma_{i}^{2}-\log\sigma_{i}^{2}-1\right)\tag{1}$$

其中，$\mu_{i}$ 为N个身份隐向量中第i个身份隐向量对应的均值，$\sigma_{i}$ 为N个身份隐向量中第i个身份隐向量对应的方差。
需要说明的是,上述公式(1)只是一种示例,本申请实施例确定N个身份隐向量的散度约束的方式包括但不限于上述公式(1),例如可以是对上述公式(1)进行变形等其他计算散度约束的方式。
本申请实施例，对N个身份隐向量加以散度约束 $L_{kl}$，这样经过训练，不仅使得投影模块对身份信息进行充分学习，且使得投影模块的第一空间满足标准高斯分布，这样在后期匿名化处理时，可以直接对第一空间进行采样，生成符合标准高斯分布的N个身份隐向量，用于生成虚拟身份向量。
在一些实施例中,上述第二空间是由第一空间经过非线性映射得到,是一个复杂的非高斯分布。如图5所示,在将身份信息映射到第一空间后,发现此时的中间隐空间第二空间W分布并不均匀,真实的身份向量聚集到多个不同的中心,且与生成的虚拟身份向量没有重合,因此虚拟的身份向量无法产生合理的人脸身份。因此,本申请实施例提出了使用一个对比损失来对第二空间W空间的隐向量(即第一虚拟身份向量)进行约束,使来自同一身份的隐向量聚合到一起,而与不同身份的隐向量相斥,并使所有的隐向量均匀分布到整个空间。
基于此,本申请实施例还可通过如下方式确定身份损失:
步骤1,获取第三训练图像;
步骤2,通过投影参考模块对第三训练图像进行处理,得到N个第二虚拟身份向量;
步骤3,根据N个第一虚拟身份向量和N个第二虚拟身份向量,确定身份损失。
上述第三训练图像和第一训练图像均为第一目标的两张不同的图像。例如,第三训练图像和第一训练图像为同一个用户的两张不同人脸图像。
上述投影参考模块与投影模块的网络结构相同,且根据投影模块进行更 新。例如,投影参考模块根据投影模块动量更新,即投影参考模块随着投影模块的更新进行缓慢更新。
示例性的,投影参考模块可以根据如下公式(2)进行更新:
$$P_{\theta'}(t)=(1-\Delta)\cdot P_{\theta'}(t-1)+\Delta\cdot P_{\theta}(t)\tag{2}$$

其中，$P_{\theta'}(t)$ 为第t次更新后的投影参考模块参数，$P_{\theta'}(t-1)$ 为第t-1次更新后的投影参考模块参数，$P_{\theta}(t)$ 为第t次更新后的投影模块参数，$\Delta$ 为较小值，例如为0.01。
如图9所示,在模型训练过程中,为了确定身份损失,则本申请实施例设定一个与投影模块的网络结构完全一致的投影参考模块,以对投影模块输出的第一虚拟身份向量进行约束。示例性地,将第一训练图像输入投影模块,得到第一训练图像的N个第一虚拟身份向量,将第三训练图像输入投影参考模块,得到第三训练图像的N个第二虚拟身份向量。由于第一训练图像和第三训练图像为同一个目标的图像,且投影模块与投影参考模块网络结构一致,这样若模型训练结束后,第一训练图像对应的N个第一虚拟身份向量与N个第二虚拟身份向量之间的差异较小,基于此,可以根据第一训练图像对应的N个第一虚拟身份向量与N个第二虚拟身份向量对目标网络模型中的投影模块进行训练,以使投影模块可以生成符合要求的虚拟身份向量。
上述步骤3中，根据N个第一虚拟身份向量和N个第二虚拟身份向量，确定身份损失的方式包括但不限于如下几种：
方式1,确定N个第一虚拟身份向量和N个第二虚拟身份向量关于不同分辨率上的差值,将差值的和值,或差值的平均值,确定为身份损失。例如,N为3,确定第一虚拟身份向量1与第二虚拟身份向量1的差值1,确定第一虚拟身份向量2与第二虚拟身份向量2的差值2,确定第一虚拟身份向量3与第二虚拟身份向量3的差值3。将差值1、差值2和差值3的和值确定为身份损失,或者,将差值1、差值2和差值3的平均值,确定为身份损失。
方式2,本申请实施例设计了N个动态列表K,该动态列表存储了整个训练集中所有不同目标身份(例如人脸身份)在第二空间W+空间的表示。此时,根据N个第一虚拟身份向量和N个第二虚拟身份向量,可采用如下方式确定身份损失:
步骤31、针对N个第一虚拟身份向量中的第i个第一虚拟身份向量,使用第i个第二虚拟身份向量更新第i个动态列表中,第一目标对应的虚拟身份向量。
其中,第i个动态列表中包括第i个分辨率下不同目标的虚拟身份向量,i为从1到N的正整数。
本申请实施例中，N个第二虚拟身份向量中，每个第二虚拟身份向量对应一个动态列表，例如N为3，分别对应低分辨率、中分辨率和高分辨率，这样动态列表也包括3个，分别为低分辨率对应的第一动态列表，中分辨率对应的第二动态列表和高分辨率对应的第三动态列表。
假设i=1,使用第一个第二虚拟身份向量更新第一动态列表中,第一目标对应的虚拟身份向量。
假设i=2,使用第二个第二虚拟身份向量更新第二动态列表中,第一目标对应的虚拟身份向量。
假设i=3,使用第三个第二虚拟身份向量更新第三动态列表中,第一目标对应的虚拟身份向量。
步骤32、根据第i个第一虚拟身份向量和更新后的第i个动态列表,确定第i个第一虚拟身份向量对应的身份子损失。
示例性的,如图10所示,第一训练图像和第三训练图像为第一目标j的两张不同图像,将第一训练图像Xj输入投影模块,得到N个第一虚拟身份向量Wj,将第三训练图像Xj’输入投影参考模块,得到N个第二虚拟身份向量Wj’。针对N个分辨率中的第i个分辨率,第i个动态列表Ki中包括不同目标在第i个分辨率下的第二虚拟身份向量,且该第i个动态列表Ki实时更新。示例性地,使用第i个第二虚拟身份向量更新第i个动态列表Ki中,第一目标j对应的虚拟身份向量kj,即将kj更新为Wj’。接着,根据第i个第二虚拟身份向量和更新后的第i个动态列表,确定第i个第一虚拟身份向量对应的身份子损失i。
本申请实施例中确定第i个第一虚拟身份向量对应的身份子损失的方式不做限制。
例如,使用中心损失(Center loss)、三元损失(Triplet loss)等损失方式,根据第i个第一虚拟身份向量和更新后的第i个动态列表,确定第i个第一虚拟身份向量对应的身份子损失。
在一些实施例中,上述确定N个第一虚拟身份向量中,第i个第一虚拟身份向量对应的身份子损失,可包括如下步骤:
步骤321、获取第i个第二虚拟身份向量与第一预设值的第一比值,将第一比值与第i个第一虚拟身份向量相乘,得到第一结果,并对第一结果进行指数运算,得到第一运算值;
步骤322、获取更新后的第i个动态列表中,每个第二虚拟身份向量与第一预设值的第二比值,针对各所述第二比值,将所述第二比值与对应的第i个第一虚拟身份向量相乘,得到第二结果,并对所述第二结果进行指数运算,得到每个第二虚拟身份向量对应的第二运算值;
步骤323、确定每个第二虚拟身份向量对应的第二运算值的和,获取所述第一运算值与该和的第三比值,并对第三比值进行对数运算,得到第三运算值;
步骤324、将第三运算值的负数,确定为第i个第一虚拟身份向量对应的身份子损失。
示例性的，以 $w_{j}$ 为锚点，$K_{i}$ 中的第j项为正样本，其余为负样本，使用InfoNCE（Information Noise Contrastive Estimation，信息噪声对比估计）形式下的对比损失确定身份子损失 $L_{c}$，其中InfoNCE是一种基于互信息（Mutual Information）修改自回归的损失函数。
示例性的，根据如下公式（3）确定第i个第一虚拟身份向量对应的身份子损失 $L_{c}(i)$：

$$L_{c}(i)=-\log\frac{\exp\left(w_{j}\cdot K[j]/\tau\right)}{\sum_{k=1}^{K}\exp\left(w_{j}\cdot K[k]/\tau\right)}\tag{3}$$

其中，$w_{j}$ 为第一目标j的第i个第一虚拟身份向量，$K[j]$ 为第一目标j的第i个第二虚拟身份向量，$\tau$ 为第一预设值，$K[k]$ 为第i个动态列表中第k个目标对应的第i个第二虚拟身份向量，$K$ 为第i个动态列表所包括的目标的总数。
步骤33、将N个第一虚拟身份向量分别对应的身份子损失之和,确定为目标网络模型的身份损失。
根据上述步骤32,确定出第i个第一虚拟身份向量对应的身份子损失后,将N个第一虚拟身份向量分别对应的身份子损失之和,确定为身份损失。例如N为3,根据上述方法确定出3个第一虚拟身份向量中每个第一虚拟身份向量对应的身份子损失,再将这3个第一虚拟身份向量对应的身份子损失之和,确定为模型的身份损失。
本申请实施例,根据上述方法确定出模型训练过程中的身份损失后,根据身份匿名化图像和散度约束,确定目标网络模型的损失,包括如下步骤:
根据身份匿名化图像、散度约束和身份损失,确定目标网络模型的损失。
示例性地,确定所述身份匿名化图像和第二训练图像之间的重建损失,根据重建损失、散度约束和身份损失,确定目标网络模型的损失。
在一种示例中,将身份匿名化图像和第二训练图像之间差值,确定为重建损失。例如,将身份匿名化图像各像素点与第二训练图像对应像素点之间的差值之和,确定为重建损失。
在另一种示例中，根据如下公式（4），确定重建损失 $L_{rec}$：

$$L_{rec}=\left\|Y_{s,t}-X_{t}\right\|_{1}\tag{4}$$

其中，$Y_{s,t}$ 为身份匿名化图像，$X_{t}$ 为第二训练图像，$\|\cdot\|_{1}$ 为1范数运算。
根据上述步骤，确定出重建损失 $L_{rec}$ 后，根据重建损失、散度约束和身份损失，确定目标网络模型的损失。例如，将重建损失、散度约束和身份损失的加权和，确定为目标网络模型的最终损失。
在一些实施例中,为了提高模型的训练准确性,本申请实施例还包括确定身份匿名化图像的身份对比损失,示例性地,包括如下步骤:
步骤A、确定身份匿名化图像和第一训练图像的第一距离,身份匿名化图像和第二训练图像的第二距离,以及第一训练图像和第二训练图像之间的第三距离;
步骤B、根据第一距离、第二距离和第三距离,确定对比损失;
其中,上述第一距离、第二距离和第三距离可以是余弦距离等任意距离方式确定得到。
示例1,根据步骤A确定出第一距离、第二距离和第三距离后,将第一距离、第二距离和第三距离之和,确定为对比损失。
示例2,确定所述第二距离与所述第三距离差的平方,与所述第一距离的和值;将预设值与所述和值的差值,确定为所述对比损失。
在一种示例中，根据如下公式（5），确定对比损失 $L_{ICL}$：

$$L_{ICL}=1-\cos\left(z_{id}(Y_{s,t}),z_{id}(X_{s})\right)+\left(\cos\left(z_{id}(Y_{s,t}),z_{id}(X_{t})\right)-\cos\left(z_{id}(X_{s}),z_{id}(X_{t})\right)\right)^{2}\tag{5}$$

其中，$z_{id}(X)$ 表示从预训练人脸识别模型中提取的关于图像X的512维身份向量表示，$\cos(z_{id}(Y_{s,t}),z_{id}(X_{s}))$ 为身份匿名化图像和第一训练图像的第一距离，$\cos(z_{id}(Y_{s,t}),z_{id}(X_{t}))$ 为身份匿名化图像和第二训练图像的第二距离，$\cos(z_{id}(X_{s}),z_{id}(X_{t}))$ 为第一训练图像和第二训练图像之间的第三距离。
根据上述步骤，确定出对比损失 $L_{ICL}$ 后，根据重建损失、散度约束、身份损失和对比损失，确定目标网络模型的损失，例如，将重建损失、散度约束、身份损失和对比损失的加权和，确定为目标网络模型的损失。
在一些实施例中,若融合模块为对抗网络,则在模型训练过程中,还确定模型的对抗损失,例如,根据身份匿名化图像和第一训练图像,确定对抗损失。
示例性的，根据如下公式（6），确定对抗损失 $L_{GAN}$：

$$L_{GAN}=\min_{G}\max_{D}\;E\left[\log\left(D(X_{s})\right)\right]+E\left[\log\left(1-D(Y_{s,t})\right)\right]\tag{6}$$

其中，$D$ 为判别器，$G$ 为生成器，$E(\cdot)$ 表示分布函数的期望值，$D(X_{s})$ 为判别器对第一训练图像 $X_{s}$ 的判别结果，$D(Y_{s,t})$ 为判别器对身份匿名化图像 $Y_{s,t}$ 的判别结果。
根据上述步骤，确定出对抗损失 $L_{GAN}$ 后，可以根据重建损失、散度约束、身份损失、对比损失和对抗损失，确定目标网络模型的损失，例如，将重建损失、散度约束、身份损失、对比损失和对抗损失的加权和，确定为目标网络模型的损失。
需要说明的是,本申请实施例对重建损失、散度约束、身份损失、对比损失和对抗损失对应的权重值的大小不做限制,可根据实际需要进行确定。
在一些实施例中，根据如下公式（7），对重建损失、散度约束、身份损失、对比损失和对抗损失进行加权运算，得到目标网络模型的损失 $L_{total}$：

$$L_{total}=L_{GAN}+10\cdot L_{rec}+5\cdot L_{ICL}+L_{c}+0.0001\cdot L_{kl}\tag{7}$$
上述公式(7)中各损失对应的权重为一种示例,本申请实施例中各损失对应的权重包括但不限于上述公式(7)所示,可根据需要确定。
在一些实施例中,为了提高目标网络模型的训练准确性,则可以确定除上述实施例所述的各损失之外的其他损失,本申请实施例对此不做限制,可根据实际需要确定。
由上述可知，本申请实施例通过生成不同分辨率对应的第一虚拟身份向量来实现身份匿名化，可以提高匿名化的分辨率，例如可以生成1024²分辨率的匿名化结果，同时产生较少的图片伪影，具有较高的保真度。另外，本申请实施例在模型训练时，不依赖关键点回归模型和分割模型，即没有对图像中人脸区域进行去除，保留原始图片中的姿态、细节和遮挡。
本申请实施例,在目标网络模型的应用过程中,通过投影模块,将第一训练图像投影至目标空间,得到N个第一虚拟身份向量,使得目标网络模型能够对图像中的身份信息进行充分学习,通过对第二训练图像进行属性向量提取,得到M个属性向量,实现了目标网络模型对图像中属性信息的充分学习,基于N个第一虚拟身份向量和M个属性向量进行图像生成,得到第二训练图像的身份匿名化图像,如此,使得训练得到的模型能够在保证原始图像的属性信息不变的情况下,生成携带虚拟身份信息的图像。即本申请提供一种新的目标网络模型,通过上述训练方法,使得目标网络模型对第一训练图像中的身份信息进行学习,这样目标网络模型可以独立生成虚拟身份,同时让目标网络模型对第二训练图像中的属性信息进行充分学习,在整个学习的过程中无需去除图像中面部区域,也无需使用真实身份信息进行指导,并通过利用换脸任务中明确的监督目标对目标网络模型进行训练,提高目标网络模型的身份匿名化生成的保真度和分辨率,使得训练后的目标网络模型可以生成高质量的身份匿名化图像。
上文结合图3至图10，详细描述了本申请的模型训练方法；下文结合图11至图13，详细描述本申请的身份匿名化方法。
图11为本申请一实施例提供的身份匿名化方法流程示意图。图11所示的身份匿名化方法是使用上述训练好的目标网络模型进行身份匿名化处理。如图11所示,该方法包括:
S401、在目标网络模型中投影模块的目标空间上进行采样,得到N个虚拟身份向量,N为正整数。
由上述实施例可知,本申请实施例使用第一训练图像对投影模块进行训练,使得投影模块对第一训练图像中的身份信息进行充分学习。这样在实际使用时,可以通过对投影模块的目标空间进行采样,得到N个虚拟身份向量。
上述S401的实现方式包括但不限于如下几种:
方式1,基于训练后的投影模块的目标空间的均值和方差进行采样,得到N个虚拟身份向量。例如,在目标空间的方差中进行随机采样,然后加到目标空间的均值上,得到一个虚拟身份向量,重复执行上述步骤,可以得到N个虚拟身份向量。
方式2,目标空间包括第一空间和第二空间,目标网络模型包括第二投影单元,此时,可采用如下方式,在目标网络模型中投影模块的目标空间上进行采样,得到N个虚拟身份向量:
在第一空间上进行采样,得到N个身份隐向量;通过第二投影单元,将N个身份隐向量投影至第二空间,得到N个虚拟身份向量。
本申请实施例,在实际匿名化时,投影模块中的第一投影单元不再使用,只使用投影模块中的第二投影单元进行投影。示例性地,如图12所示,在符合标准高斯分布的第一空间Z中进行采样,得到N个身份隐向量,接着将N个身份隐向量输入第二投影单元中。第二投影单元将N个身份隐向量投影到W空间中,得到N个虚拟身份向量。图12中以N为3,第二投影单元为映射网络为例,但是本申请实施例的投影模块不局限于图12所示。
由上述可知,使用第一训练图像对第一空间进行训练,使得第一空间的方差和均值符合标准高斯分布。这样,首先在第一空间上进行采样,生成N个身份隐向量,例如,基于第一空间的均值和方差进行采样,得到N个身份隐向量,在第一空间的方差中进行随机采样,然后加到第一空间的均值上,得到一个身份隐向量,重复执行上述步骤,可以得到N个身份隐向量。接着,通过第二投影单元,将N个身份隐向量投影至第二空间,得到N个虚拟身份向量。
在一些实施例中,上述N个虚拟身份向量分别对应不同的分辨率,例如N=3,其中,第一个虚拟身份向量对应低分辨率,第二个虚拟身份向量对应中分辨率,第三个虚拟身份向量对应高分辨率。
根据上述方法,得到N个虚拟身份向量后,执行如下S402和S403的步骤,得到待处理图像的身份匿名化图像。
S402、通过目标网络模型中的属性模块,对待处理图像进行属性向量提取,得到M个属性向量,M为正整数。
本申请实施例的属性模块配置为提取待处理图像中的属性信息。
在一些实施例中,属性模块包括编码单元和解码单元,此时,可采用如下方式对待处理图像进行属性向量提取,得到M个属性向量:
将待处理图像输入编码单元,得到待处理图像的特征信息;将特征信息输入解码单元,得到M个属性向量。
在一些实施例中,上述编码单元可以包括多个特征提取层,同理,上述解码单元也可以包括多个特征提取层;其中,特征提取层可以包括卷积层等。
在一些实施例中,编码单元中的至少一个特征提取层与解码单元中的至少一个特征提取层之间跳跃连接。
上述生成的M个属性向量可以对应不同的分辨率。
在一些实施例中,上述目标网络模型为自编码器。
S403、通过目标网络模型的融合模块,基于N个虚拟身份向量和M个属性向量进行图像生成,得到待处理图像的身份匿名化图像。
根据上述步骤,生成N个虚拟身份向量和M个属性向量输入融合模块中,得到待处理图像的身份匿名化图像。
上述S403的实现方式包括但不限于如下几种示例:
示例1,对N个虚拟身份向量进行拼接,同时对M个属性向量进行拼接,将拼接后的虚拟身份向量和属性向量进行融合后,输入融合模块中。
例如,将拼接后的虚拟身份向量和属性向量进行级联后,输入融合模块中。
再例如,将拼接后的虚拟身份向量和属性向量进行相加后,输入融合模块中。
示例2,融合模块包括多个不同的分辨率层,此时,可以根据N个虚拟身份向量所对应的分辨率,将N个虚拟身份向量作为样式,将M个属性向量作为噪音,输入对应的分辨率层中,得到待处理图像的身份匿名化图像。
在一些实施例中,融合模块为StyleGAN2。此时,如图7所示,在融合模块相邻两个分辨率层之间包括AdaIN层,例如,对虚拟身份向量i+1进行仿射变换,将第i个分辨率层的输出的特征信息i与属性向量i合并后,与仿射变换后的虚拟身份向量i+1输入AdaIN层,执行AdaIN操作,并将AdaIN操作结果输入第i+1个分辨率层。
本申请实施例的融合模块还可以是StyleGAN3和ProGAN等对抗模型。在一些实施例中,以第二投影单元为映射网络,属性模块为自编码器,融合模块为StyleGAN2为例,对本申请实施例的身份匿名化过程进行介绍。
示例性地,如图13所示,在投影模块的第一空间Z中进行采样,得到N个身份隐向量,例如,得到3个N个身份隐向量,这3个N个身份隐向量分别对应低、中、高3个不同的分辨率。接着,将N个身份隐向量输入映射网络,通过映射网络将N个身份隐向量从第一空间Z投影到第二空间W中,得到N个虚拟身份向量。另外,将待处理图像Xt输入自编码器中,通过自编码器对待处理图像Xt进行处理后,生成M个属性向量。最后,将M个属性向量作为噪声,将N个虚拟身份向量作为样式,输入StyleGAN2的各层中,得到StyleGAN2输出的待处理图像的身份匿名化图像Ys,t。
本申请实施例提供的身份匿名化方法,在目标网络模型中投影模块的目标空间上进行采样,得到N个虚拟身份向量,通过目标网络模型中的属性模块,对待处理图像进行属性向量提取,得到M个属性向量,通过目标网络模型的融合模块,基于N个虚拟身份向量和M个属性向量进行图像生成,得到待处理图像的身份匿名化图像。即本申请实施例的目标网络模型可以独立生成虚拟身份,在对待处理图像进行身份匿名化时,无需去除待处理图像中的 面部区域,进而提高身份匿名化的保真度。
上文结合图3至图13,详细描述了本申请的方法实施例,下文结合图14至图15,详细描述本申请的装置实施例。图14是本申请实施例提供的模型训练装置的示意性框图。该训练装置10可以为计算设备或者为计算设备中的一部分。如图14所示,模型训练装置10包括:
投影单元11,配置为通过目标网络模型中的投影模块,将第一训练图像投影至目标空间,得到N个第一虚拟身份向量,所述N为正整数;
属性单元12,配置为通过所述目标网络模型中的属性模块,对第二训练图像进行属性向量提取,得到M个属性向量,所述M为正整数;
融合单元13,配置为通过所述目标网络模型的融合模块,基于所述N个第一虚拟身份向量和所述M个属性向量进行图像生成,得到所述第二训练图像的身份匿名化图像;
训练单元14,配置为根据所述身份匿名化图像,确定所述目标网络模型的损失,并根据所述损失对所述目标网络模型进行训练。
在一些实施例中,所述投影模块包括第一投影单元和第二投影单元,所述目标空间包括第一空间和第二空间,投影单元11,还配置为提取所述第一训练图像的先验身份信息;通过所述第一投影单元,将所述先验身份信息投影至第一空间,得到N个身份隐向量;通过所述第二投影单元,将所述N个身份隐向量投影至第二空间,得到所述N个第一虚拟身份向量。
在一些实施例中,投影单元11,还配置为通过所述第一投影单元将先验身份信息,投影为所述第一空间的均值和方差;基于所述第一空间的均值和方差进行采样,得到所述N个身份隐向量。
在一些实施例中,训练单元14还配置为确定所述N个身份隐向量的散度约束;并根据所述身份匿名化图像和所述散度约束,确定所述目标网络模型的损失。
在一些实施例中,N个第一虚拟身份向量分别对应不同的分辨率。
在一些实施例中,所述第一投影单元为变分自编码器。
在一些实施例中,训练单元14,还配置为获取第三训练图像,所述第三训练图像和所述第一训练图像均为第一目标的两张不同的图像;通过所述目标网络模型中的投影参考模块,将所述第三训练图像投影至目标空间,得到N个第二虚拟身份向量,所述投影参考模块与所述投影模块的网络结构相同,且根据所述投影模块进行更新;根据所述N个第一虚拟身份向量和所述N个第二虚拟身份向量,确定身份损失;根据所述身份匿名化图像、所述散度约束和所述身份损失,确定所述目标网络模型的损失。
在一些实施例中,训练单元14,还配置为针对所述N个第二虚拟身份向量中的第i个第二虚拟身份向量,使用所述第i个第二虚拟身份向量更新第i个动态列表中,所述第一目标对应的虚拟身份向量,其中,所述第i个动态列 表中包括第i个分辨率下不同目标的虚拟身份向量,所述i为从1到N的正整数;根据第i个第一虚拟身份向量和更新后的所述第i个动态列表,确定所述第i个第一虚拟身份向量对应的身份子损失;将所述N个第一虚拟身份向量分别对应的身份子损失之和,确定为所述身份损失。
在一些实施例中，训练单元14，还配置为获取所述第i个第二虚拟身份向量与第一预设值的第一比值，将所述第一比值与所述第i个第一虚拟身份向量相乘，得到第一结果，并对所述第一结果进行指数运算，得到第一运算值；获取更新后的所述第i个动态列表中，每个第二虚拟身份向量与第一预设值的第二比值，针对各所述第二比值，将所述第二比值与对应的第i个第一虚拟身份向量相乘，得到第二结果，并对所述第二结果进行指数运算，得到所述每个第二虚拟身份向量对应的第二运算值；确定每个第二虚拟身份向量对应的第二运算值的和，获取所述第一运算值与所述和的第三比值，并对所述第三比值进行对数运算，得到第三运算值；将所述第三运算值的负数，确定为所述第i个第一虚拟身份向量对应的身份子损失。
在一些实施例中,所述属性模块包括编码单元和解码单元,属性单元12,还配置为通过所述编码单元对所述第二训练图像进行特征提取,得到所述第二训练图像的特征信息;通过所述解码单元对所述特征信息进行解码,得到M个属性向量。
在一些实施例中,所述编码单元中的至少一个特征提取层与所述解码单元中的至少一个特征提取层之间跳跃连接。
在一些实施例中,所述融合模块包括多个不同的分辨率层,融合单元13,还配置为根据所述N个第一虚拟身份向量所对应的分辨率,将所述N个第一虚拟身份向量作为样式,将所述M个属性向量作为噪音,输入对应的分辨率层中,得到所述第二训练图像的身份匿名化图像。
在一些实施例中,训练单元14,还配置为确定所述身份匿名化图像和所述第二训练图像之间的重建损失;根据所述重建损失、所述散度约束和所述身份损失,确定所述目标网络模型的损失。
在一些实施例中,训练单元14,还配置为确定所述身份匿名化图像和所述第一训练图像的第一距离,所述身份匿名化图像和所述第二训练图像的第二距离,以及所述第一训练图像和所述第二训练图像之间的第三距离;根据所述第一距离、所述第二距离和所述第三距离,确定对比损失;根据所述重建损失、所述散度约束、所述身份损失和所述对比损失,确定所述目标网络模型的损失。
在一些实施例中,训练单元14,还配置为确定所述第二距离与所述第三距离差的平方,与所述第一距离的和值;将预设值与所述和值的差值,确定为所述对比损失。
在一些实施例中,若所述融合模块为对抗网络,训练单元14,还配置为根据所述身份匿名化图像和所述第一训练图像,确定对抗损失;将所述重建 损失、所述散度约束、所述身份损失、所述对比损失和所述对抗损失的加权和,确定为所述目标网络模型的损失。
应理解的是,装置实施例与方法实施例可以相互对应,类似的描述可以参照方法实施例。为避免重复,此处不再赘述。示例性地,图14所示的装置可以执行上述图3所示的模型训练方法的实施例,并且装置中的各个模块的前述和其它操作和/或功能分别为了实现计算设备对应的方法实施例,为了简洁,在此不再赘述。
图15是本申请实施例提供的身份匿名化装置的示意性框图。该身份匿名化装置20可以为计算设备或者为计算设备中的一部分。如图15所示,身份匿名化装置20包括:
采样单元21,配置为在目标网络模型中投影模块的目标空间上进行采样,得到N个虚拟身份向量,所述N为正整数;
属性单元22,配置为通过目标网络模型中的属性模块,对待处理图像进行属性向量提取,得到M个属性向量,所述M为正整数;
匿名化单元23,配置为通过所述目标网络模型的融合模块,基于所述N个虚拟身份向量和所述M个属性向量进行图像生成,得到所述待处理图像的身份匿名化图像。
在一些实施例中,所述目标空间包括第一空间和第二空间,所述目标网络模型包括第二投影单元,采样单元21,还配置为在所述第一空间上进行采样,得到N个身份隐向量;通过所述第二投影单元,将所述N个身份隐向量投影至第二空间,得到所述N个虚拟身份向量。
在一些实施例中,所述第一空间的均值和方差满足标准高斯分布,采样单元21,还配置为基于所述第一空间的均值和方差进行采样,得到所述N个身份隐向量。
在一些实施例中,所述N个虚拟身份向量分别对应不同的分辨率。
在一些实施例中,所述属性模块包括编码单元和解码单元,属性单元22,还配置为通过所述编码单元,对所述待处理图像进行特征提取,得到所述待处理图像的特征信息;通过所述解码单元对所述特征信息进行解码,得到M个属性向量。
在一些实施例中,所述编码单元中的至少一个特征提取层与所述解码单元中的至少一个特征提取层之间跳跃连接。
在一些实施例中,所述融合模块包括多个不同的分辨率层,匿名化单元23,还配置为根据所述N个虚拟身份向量所对应的分辨率,将所述N个虚拟身份向量作为样式,将所述M个属性向量作为噪音,输入对应的分辨率层中,得到所述待处理图像的身份匿名化图像。
应理解的是,装置实施例与方法实施例可以相互对应,类似的描述可以参照方法实施例。为避免重复,此处不再赘述。示例性地,图15所示的装置 可以执行上述图11所示的身份匿名化方法的实施例,并且装置中的各个模块的前述和其它操作和/或功能分别为了实现计算设备对应的方法实施例,为了简洁,在此不再赘述。
上文中结合附图从功能模块的角度描述了本申请实施例的装置。应理解,该功能模块可以通过硬件形式实现,也可以通过软件形式的指令实现,还可以通过硬件和软件模块组合实现。例如,本申请实施例中的方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路和/或软件形式的指令完成,结合本申请实施例公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。可选地,软件模块可以位于随机存储器,闪存、只读存储器、可编程只读存储器、电可擦写可编程存储器、寄存器等本领域的成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法实施例中的步骤。
图16是本申请实施例提供的计算设备的示意性框图,该计算设备配置为执行上述方法实施例。如图16所示,该计算设备30可包括:
存储器31和处理器32,该存储器31配置为存储计算机程序33,并将该程序代码33传输给该处理器32。换言之,该处理器32可以从存储器31中调用并运行计算机程序33,以实现本申请实施例中的方法。
例如,该处理器32可配置为根据该计算机程序33中的指令执行上述方法步骤。
在本申请的一些实施例中,该处理器32可以包括但不限于:
通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等等。
在本申请的一些实施例中,该存储器31包括但不限于:
易失性存储器和/或非易失性存储器。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synch link DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。
在本申请的一些实施例中，该计算机程序33可以被分割成一个或多个模块，该一个或者多个模块被存储在该存储器31中，并由该处理器32执行，以完成本申请提供的方法。该一个或多个模块可以是能够完成特定功能的一系列计算机程序指令段，该指令段用于描述该计算机程序33在该计算设备中的执行过程。
如图16所示,该计算设备30还可包括:
收发器34,该收发器34可连接至该处理器32或存储器31。
其中,处理器32可以控制该收发器34与其他设备进行通信,例如,可以向其他设备发送信息或数据,或接收其他设备发送的信息或数据。收发器34可以包括发射机和接收机。收发器34还可以包括天线,天线的数量可以为一个或多个。
应当理解,该计算设备30中的各个组件通过总线系统相连,其中,总线系统除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。
本申请实施例提供了一种计算机存储介质,其上存储有计算机程序,该计算机程序被计算机执行时使得该计算机能够执行上述方法实施例的方法。或者说,本申请实施例还提供一种包含指令的计算机程序产品,该指令被计算机执行时使得计算机执行上述方法实施例的方法。
本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算设备执行上述方法实施例的方法。
以上内容,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以该权利要求的保护范围为准。

Claims (21)

  1. 一种模型训练方法,所述方法由计算设备执行,包括:
    通过目标网络模型中的投影模块,将第一训练图像投影至目标空间,得到N个第一虚拟身份向量,所述N为正整数;
    通过所述目标网络模型中的属性模块,对第二训练图像进行属性向量提取,得到M个属性向量,所述M为正整数;
    通过所述目标网络模型的融合模块,基于所述N个第一虚拟身份向量和所述M个属性向量进行图像生成,得到所述第二训练图像的身份匿名化图像;
    根据所述身份匿名化图像,确定所述目标网络模型的损失,并根据所述损失对所述目标网络模型进行训练。
  2. 根据权利要求1所述的方法,其中,所述投影模块包括第一投影单元和第二投影单元,所述目标空间包括第一空间和第二空间,所述通过目标网络模型中的投影模块,将第一训练图像投影至目标空间,得到N个第一虚拟身份向量,包括:
    提取所述第一训练图像的先验身份信息;
    通过所述第一投影单元,将所述先验身份信息投影至第一空间,得到N个身份隐向量;
    通过所述第二投影单元,将所述N个身份隐向量投影至第二空间,得到所述N个第一虚拟身份向量。
  3. 根据权利要求2所述的方法,其中,所述通过所述第一投影单元,将所述先验身份信息投影至第一空间,得到N个身份隐向量,包括:
    通过所述第一投影单元将所述先验身份信息,投影为所述第一空间的均值和方差;
    基于所述第一空间的均值和方差进行采样,得到所述N个身份隐向量。
  4. 根据权利要求2或3所述的方法,其中,所述方法还包括:
    确定所述N个身份隐向量的散度约束;
    所述根据所述身份匿名化图像,确定所述目标网络模型的损失,包括:
    根据所述身份匿名化图像和所述散度约束,确定所述目标网络模型的损失。
  5. 根据权利要求4所述的方法,其中,所述方法还包括:
    获取第三训练图像,所述第三训练图像和所述第一训练图像均为第一目标的两张不同的图像;
    通过所述目标网络模型中的投影参考模块,将所述第三训练图像投 影至目标空间,得到N个第二虚拟身份向量,所述投影参考模块与所述投影模块的网络结构相同,且根据所述投影模块进行更新;
    根据所述N个第一虚拟身份向量和所述N个第二虚拟身份向量,确定身份损失;
    所述根据所述身份匿名化图像和所述散度约束,确定所述目标网络模型的损失,包括:
    根据所述身份匿名化图像、所述散度约束和所述身份损失,确定所述目标网络模型的损失。
  6. 根据权利要求5所述的方法,其中,所述根据所述N个第一虚拟身份向量和所述N个第二虚拟身份向量,确定身份损失,包括:
    针对所述N个第二虚拟身份向量中的第i个第二虚拟身份向量,使用所述第i个第二虚拟身份向量更新第i个动态列表中,所述第一目标对应的虚拟身份向量,其中,所述第i个动态列表中包括第i个分辨率下不同目标的虚拟身份向量,所述i为从1到N的正整数;
    根据第i个第一虚拟身份向量和更新后的所述第i个动态列表,确定所述第i个第一虚拟身份向量对应的身份子损失;
    将所述N个第一虚拟身份向量分别对应的身份子损失之和,确定为所述身份损失。
  7. 根据权利要求6所述的方法,其中,所述根据第i个第一虚拟身份向量和更新后的所述第i个动态列表,确定所述第i个第一虚拟身份向量对应的身份子损失,包括:
    获取所述第i个第二虚拟身份向量与第一预设值的第一比值,将所述第一比值与所述第i个第一虚拟身份向量相乘,得到第一结果,并对所述第一结果进行指数运算,得到第一运算值;
    获取更新后的所述第i个动态列表中,每个第二虚拟身份向量与第一预设值的第二比值,针对各所述第二比值,将所述第二比值与对应的第i个第一虚拟身份向量相乘,得到第二结果,并对所述第二结果进行指数运算,得到所述每个第二虚拟身份向量对应的第二运算值;
    确定每个第二虚拟身份向量对应的第二运算值的和,获取所述第一运算值与所述和的第三比值,并对所述第三比值进行对数运算,得到第三运算值;
    将所述第三运算值的负数,确定为所述第i个第一虚拟身份向量对应的身份子损失。
  8. 根据权利要求1-7任一项所述的方法,其中,所述属性模块包括编码单元和解码单元,所述通过所述目标网络模型中的属性模块,对第二训练图像进行属性向量提取,得到M个属性向量,包括:
    通过所述编码单元对所述第二训练图像进行特征提取,得到所述第二训练图像的特征信息;
    通过所述解码单元对所述特征信息进行解码,得到M个属性向量。
  9. 根据权利要求1-7任一项所述的方法,其中,所述融合模块包括多个不同的分辨率层,所述通过所述目标网络模型的融合模块,基于所述N个第一虚拟身份向量和所述M个属性向量进行图像生成,得到所述第二训练图像的身份匿名化图像,包括:
    根据所述N个第一虚拟身份向量所对应的分辨率,将所述N个第一虚拟身份向量作为样式,将所述M个属性向量作为噪音,输入对应的分辨率层中,得到所述第二训练图像的身份匿名化图像。
  10. 根据权利要求5所述的方法,其中,所述根据所述身份匿名化图像、所述散度约束和所述身份损失,确定所述目标网络模型的损失,包括:
    确定所述身份匿名化图像和所述第二训练图像之间的重建损失;
    根据所述重建损失、所述散度约束和所述身份损失,确定所述目标网络模型的损失。
  11. 根据权利要求10所述的方法,其中,所述方法还包括:
    确定所述身份匿名化图像和所述第一训练图像的第一距离、所述身份匿名化图像和所述第二训练图像的第二距离,以及所述第一训练图像和所述第二训练图像之间的第三距离;
    根据所述第一距离、所述第二距离和所述第三距离,确定对比损失;
    所述根据所述重建损失、所述散度约束和所述身份损失,确定所述目标网络模型的损失,包括:
    根据所述重建损失、所述散度约束、所述身份损失和所述对比损失,确定所述目标网络模型的损失。
  12. 根据权利要求11所述的方法,其中,所述根据所述第一距离、所述第二距离和所述第三距离,确定对比损失,包括:
    确定所述第二距离与所述第三距离差的平方,与所述第一距离的和值;
    将预设值与所述和值的差值,确定为所述对比损失。
  13. 根据权利要求11所述的方法,其中,若所述融合模块为对抗网络,则所述根据所述重建损失、所述散度约束、所述身份损失和所述对比损失,确定所述目标网络模型的损失,包括:
    根据所述身份匿名化图像和所述第一训练图像,确定对抗损失;
    将所述重建损失、所述散度约束、所述身份损失、所述对比损失和所述对抗损失的加权和,确定为所述目标网络模型的损失。
  14. 一种身份匿名化方法,所述方法由计算设备执行,包括:
    在目标网络模型中投影模块的目标空间上进行采样,得到N个虚拟身份向量,所述N为正整数;
    通过目标网络模型中的属性模块,对待处理图像进行属性向量提取, 得到M个属性向量,所述M为正整数;
    通过所述目标网络模型的融合模块,基于所述N个虚拟身份向量和所述M个属性向量进行图像生成,得到所述待处理图像的身份匿名化图像。
  15. 根据权利要求14所述的方法,其中,所述目标空间包括第一空间和第二空间,所述目标网络模型包括第二投影单元,所述在目标网络模型中投影模块的目标空间上进行采样,得到N个虚拟身份向量,包括:
    在所述第一空间上进行采样,得到N个身份隐向量;
    通过所述第二投影单元,将所述N个身份隐向量投影至第二空间,得到所述N个虚拟身份向量。
  16. 根据权利要求15所述的方法,其中,所述第一空间的均值和方差满足标准高斯分布,所述在所述第一空间上进行采样,得到N个身份隐向量,包括:
    基于所述第一空间的均值和方差进行采样,得到所述N个身份隐向量。
  17. 一种模型训练装置,所述装置包括:
    投影单元,配置为通过目标网络模型中的投影模块,将第一训练图像投影至目标空间,得到N个第一虚拟身份向量,所述N为正整数;
    属性单元,配置为通过所述目标网络模型中的属性模块对第二训练图像进行属性向量提取,得到M个属性向量,所述M为正整数;
    融合单元,配置为通过所述目标网络模型的融合模块,基于所述N个虚拟身份向量和所述M个属性向量进行图像生成,得到所述第二训练图像的身份匿名化图像;
    训练单元,配置为根据所述身份匿名化图像,确定所述目标网络模型的损失,并根据所述损失对所述目标网络模型进行训练。
  18. 一种身份匿名化装置,所述装置包括:
    采样单元,配置为在目标网络模型中投影模块的目标空间上进行采样,得到N个虚拟身份向量,所述N为正整数;
    属性单元,配置为通过目标网络模型中的属性模块,对待处理图像进行属性向量提取,得到M个属性向量,所述M为正整数;
    匿名化单元,配置为通过所述目标网络模型的融合模块,基于所述N个虚拟身份向量和所述M个属性向量进行图像生成,得到所述待处理图像的身份匿名化图像。
  19. 一种计算设备,所述计算设备包括处理器和存储器;
    所述存储器,配置为存储计算机程序;
    所述处理器,配置为执行所述计算机程序以实现如上述权利要求1至13任一项所述的方法,或者实现如上述权利要求14至16任一项所述的方法。
  20. 一种计算机可读存储介质，所述计算机可读存储介质配置为存储计算机程序，所述计算机程序使得计算机执行时，实现如上述权利要求1至13任一项所述的方法，或者实现如上述权利要求14至16任一项所述的方法。
  21. 一种计算机程序产品,包括计算机程序或指令,所述计算机程序或指令被处理器执行时,实现权利要求14至16任一项所述的方法,或者实现权利要求1至13任一项所述的方法。
PCT/CN2022/111704 2022-03-10 2022-08-11 模型训练和身份匿名化方法、装置、设备、存储介质及程序产品 WO2023168903A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2022566254A JP2024513274A (ja) 2022-03-10 2022-08-11 モデル訓練方法及び装置、アイデンティティ匿名化方法及び装置、機器、記憶媒体並びにコンピュータプログラム
EP22773378.9A EP4270232A4 (en) 2022-03-10 2022-08-11 MODEL TRAINING METHOD AND APPARATUS, IDENTITY ANONYMIZATION METHOD AND APPARATUS, APPARATUS, STORAGE MEDIUM AND PROGRAM PRODUCT
KR1020227038590A KR20230133755A (ko) 2022-03-10 2022-08-11 모델 트레이닝 방법 및 장치, 아이덴티티 익명화 방법 및 장치, 디바이스, 저장 매체, 그리고 프로그램 제품
US18/076,073 US20230290128A1 (en) 2022-03-10 2022-12-06 Model training method and apparatus, deidentification method and apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210234385.2 2022-03-10
CN202210234385.2A CN114936377A (zh) 2022-03-10 2022-03-10 模型训练和身份匿名化方法、装置、设备及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/076,073 Continuation US20230290128A1 (en) 2022-03-10 2022-12-06 Model training method and apparatus, deidentification method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023168903A1 true WO2023168903A1 (zh) 2023-09-14

Family

ID=82862564

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/111704 WO2023168903A1 (zh) 2022-03-10 2022-08-11 模型训练和身份匿名化方法、装置、设备、存储介质及程序产品

Country Status (2)

Country Link
CN (1) CN114936377A (zh)
WO (1) WO2023168903A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274316A (zh) * 2023-10-31 2023-12-22 广东省水利水电科学研究院 一种河流表面流速的估计方法、装置、设备及存储介质
CN117688538A (zh) * 2023-12-13 2024-03-12 上海深感数字科技有限公司 一种基于数字身份安全防范的互动教育管理方法及系统
CN118536163A (zh) * 2024-06-04 2024-08-23 北京高科数聚技术有限公司 汽车企业运营数据管理方法及系统

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118213048A (zh) * 2023-11-20 2024-06-18 清华大学 影像处理方法、模型训练方法、设备、介质及产品


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021258920A1 (zh) * 2020-06-24 2021-12-30 百果园技术(新加坡)有限公司 生成对抗网络训练方法、图像换脸、视频换脸方法及装置
CN113033511A (zh) * 2021-05-21 2021-06-25 中国科学院自动化研究所 一种基于操控解耦身份表示的人脸匿名方法
CN113642409A (zh) * 2021-07-15 2021-11-12 上海交通大学 一种人脸匿名化系统及方法、终端
CN114120041A (zh) * 2021-11-29 2022-03-01 暨南大学 一种基于双对抗变分自编码器的小样本分类方法
CN114139198A (zh) * 2021-11-29 2022-03-04 杭州电子科技大学 基于层次k匿名身份替换的人脸生成隐私保护方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4270232A4

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274316A (zh) * 2023-10-31 2023-12-22 广东省水利水电科学研究院 一种河流表面流速的估计方法、装置、设备及存储介质
CN117274316B (zh) * 2023-10-31 2024-05-03 广东省水利水电科学研究院 一种河流表面流速的估计方法、装置、设备及存储介质
CN117688538A (zh) * 2023-12-13 2024-03-12 上海深感数字科技有限公司 一种基于数字身份安全防范的互动教育管理方法及系统
CN117688538B (zh) * 2023-12-13 2024-06-07 上海深感数字科技有限公司 一种基于数字身份安全防范的互动教育管理方法及系统
CN118536163A (zh) * 2024-06-04 2024-08-23 北京高科数聚技术有限公司 汽车企业运营数据管理方法及系统

Also Published As

Publication number Publication date
CN114936377A (zh) 2022-08-23

Similar Documents

Publication Publication Date Title
WO2023168903A1 (zh) 模型训练和身份匿名化方法、装置、设备、存储介质及程序产品
CN113688855B (zh) 数据处理方法、联邦学习的训练方法及相关装置、设备
US11164046B1 (en) Method for producing labeled image from original image while preventing private information leakage of original image and server using the same
US11475608B2 (en) Face image generation with pose and expression control
CN111767906B (zh) 人脸检测模型训练方法、人脸检测方法、装置及电子设备
WO2021184754A1 (zh) 视频对比方法、装置、计算机设备和存储介质
CN109977832B (zh) 一种图像处理方法、装置及存储介质
CN118202391A (zh) 从单二维视图进行对象类的神经辐射场生成式建模
JP2023507248A (ja) 物体検出および認識のためのシステムおよび方法
US20220207861A1 (en) Methods, devices, and computer readable storage media for image processing
CN117095019B (zh) 一种图像分割方法及相关装置
Choraś et al. Image Processing & Communications Challenges 6
CN114972016A (zh) 图像处理方法、装置、计算机设备、存储介质及程序产品
CN114783017A (zh) 基于逆映射的生成对抗网络优化方法及装置
Wei et al. Contrastive distortion‐level learning‐based no‐reference image‐quality assessment
Sharjeel et al. Real time drone detection by moving camera using COROLA and CNN algorithm
CN114707589A (zh) 对抗样本的生成方法、装置、存储介质、设备及程序产品
Dai et al. An optimized method for variational autoencoders based on Gaussian cloud model
US20220172416A1 (en) System and method for performing facial image anonymization
JP2024515907A (ja) 画像処理方法及び装置、コンピューター機器、並びにコンピュータープログラム
US20230290128A1 (en) Model training method and apparatus, deidentification method and apparatus, device, and storage medium
CN112463936A (zh) 一种基于三维信息的视觉问答方法及系统
CN116704588B (zh) 面部图像的替换方法、装置、设备及存储介质
CN117932314A (zh) 模型训练方法、装置、电子设备、存储介质及程序产品
Du et al. IGCE: A Compositional Energy Concept Based Deep Image Generation Neural Network

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022773378

Country of ref document: EP

Effective date: 20220929

WWE Wipo information: entry into national phase

Ref document number: 2022566254

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 202237065423

Country of ref document: IN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22773378

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 11202254237E

Country of ref document: SG