WO2022160773A1 - Pedestrian re-identification method based on virtual samples - Google Patents

Pedestrian re-identification method based on virtual samples

Info

Publication number
WO2022160773A1
WO2022160773A1 PCT/CN2021/122343 CN2021122343W WO2022160773A1 WO 2022160773 A1 WO2022160773 A1 WO 2022160773A1 CN 2021122343 W CN2021122343 W CN 2021122343W WO 2022160773 A1 WO2022160773 A1 WO 2022160773A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual
pedestrian
data set
real
samples
Prior art date
Application number
PCT/CN2021/122343
Other languages
English (en)
French (fr)
Inventor
杜博
郭小洋
林雨恬
张超
王正
Original Assignee
武汉大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 武汉大学 filed Critical 武汉大学
Publication of WO2022160773A1 publication Critical patent/WO2022160773A1/zh
Priority to US18/337,439 priority Critical patent/US11837007B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Definitions

  • the invention belongs to the technical field of pedestrian re-identification, in particular to a pedestrian re-identification method based on virtual samples.
  • Person Re-ID aims to match images of each person from multiple non-overlapping cameras deployed at different locations.
  • In recent years, person re-identification technology has developed rapidly and has rich application scenarios, such as finding people of interest (for example, lost children or criminals) and tracking specific people, which has led to in-depth research on person re-identification. Benefiting from deep convolutional neural networks, many of the proposed person re-identification methods achieve very high performance.
  • However, these pedestrian re-identification methods rely on images from a large number of pedestrian surveillance videos for training, which exposes personal privacy information and may lead to further security problems. Due to growing concern about privacy issues, some real pedestrian datasets have been required to be withdrawn, and it has even been required that images from these datasets must not be displayed in any form of publication.
  • In the field of person re-identification, unsupervised domain adaptation methods can still learn the relevant features of the target-domain dataset with the help of a source-domain dataset without relying on target-domain pedestrian labels, which to a certain extent avoids the direct exposure of identity-specific information about pedestrians in the target domain.
  • State-of-the-art unsupervised domain adaptation methods generally fall into two categories: clustering-based methods and generation-based methods. Notably, the former relies on target images for unsupervised clustering, while the latter requires target images for image translation or adversarial training; both therefore depend heavily on publicly released pedestrian datasets, which indirectly leaves the privacy of pedestrians in the target images exposed on platforms that can be accessed with almost no permission restrictions.
  • The purpose of the present invention is to provide, in view of the deficiencies of the prior art, a pedestrian re-identification method based on virtual samples, which realizes pedestrian re-identification under privacy protection through virtual samples and solves two challenges faced by the privacy-protected re-identification task in the prior art: the missing pedestrian appearance in target images and the large domain gap between virtual and real images.
  • To solve the above technical problems, the present invention adopts the following technical solution:
  • A pedestrian re-identification method based on virtual samples, comprising the following steps:
  • Step S1: obtaining virtual characters generated by a game engine and preprocessing them, and generating a batch of virtual samples with character labels by fusing the background of the target dataset and the poses of real people through a multi-factor variational generation network;
  • Step S2: rendering the generated virtual samples according to the lighting conditions of the target dataset;
  • Step S3: sampling the rendered virtual samples according to the character attributes of the target dataset;
  • Step S4: constructing a training dataset from the sampled virtual samples to train the pedestrian re-identification model, and verifying the recognition effect of the trained model.
  • step S1 includes:
  • Step S11: extracting k characters from the virtual dataset generated by the game engine and l backgrounds from the real pedestrian dataset, denoted {c_1, ..., c_k} and {b_1, ..., b_l} respectively, and directly compositing the two to obtain n virtual images fusing virtual characters and real backgrounds as training samples {x_1, ..., x_n};
  • Step S12: extracting the character pose of each training sample, taking it together with the training sample and the corresponding background as the input of the constructed deep neural network based on a variational autoencoder, namely the multi-factor variational generation network, and constructing an objective function for training so that the network learns the transformation law of the composite images with respect to character, background and pose;
  • Step S13 adjusting the resolution of the virtual character according to the character resolution of the target data set
  • Step S14 taking the adjusted virtual character, the real background and the pose extracted from the target data set as the input of the network, and generating a batch of virtual samples with character labels through the network.
  • x represents the input training sample
  • z_(x,b) represents the joint latent variable
  • D_θ represents the decoder network serving as the generator
  • Φ_i represents the features extracted from different layers of the network
  • q_φ represents the posterior distribution
  • p_θ represents the prior distribution
  • KL represents the Kullback-Leibler divergence
  • i and λ_i are preset hyperparameters used to control the contribution of different network layers to the total loss.
  • Further, the pixel proportions of the characters in the images of the virtual dataset and of the real pedestrian dataset are calculated respectively, and the resolution of the virtual characters is adjusted by scaling the characters in the virtual dataset so that they have a resolution similar to that of the target dataset.
  • Further, in step S2, each image is converted into HSV format, the V channel is extracted, and the average value of the V channel is calculated as the brightness value of the image, which ranges from 0 to 255, so as to obtain the lighting conditions of the target dataset.
  • step S3 two attributes of the color of the upper body clothes and the color of the lower body clothes are selected as the basic attributes for sampling to perform attribute distribution statistics of the data set.
  • The recognition verification process includes: using the trained model to match a query picture against the gallery to determine the pictures of the same identity, outputting the corresponding picture indexes in order of likelihood, and comparing them with the ground-truth labels.
  • Addressing the poor robustness of existing pedestrian re-identification models caused by the varied backgrounds and poses of pedestrians under different cameras in real scenes, the inconsistent resolution caused by the varying distance between pedestrians and cameras, the differences in image brightness caused by different lighting conditions, and the inconsistent attribute distribution caused by different clothing that may result from seasonal changes, the present invention uses a virtual image generation framework integrating translation, rendering and sampling to bring the distributions of virtual images and real images as close as possible and to generate a batch of new virtual samples, and further uses these virtual samples to train the pedestrian re-identification model, which can then be effectively applied to pedestrian datasets in real scenes.
  • In this way, an effective pedestrian re-identification model is learned without obtaining the appearance of people in the real pedestrian dataset of the target domain, and the task of pedestrian re-identification under privacy protection is completed. Specifically, this includes the following aspects:
  • (1) To achieve privacy protection for pedestrians, for the target scene the data provider only needs to supply information unrelated to pedestrian privacy, without any appearance or identity-discriminative information of real pedestrians; virtual characters generated by a game engine are used in place of real pedestrians to train the re-identification model.
  • (2) To make full use of the real-world information of the target scene, the present invention defines three types of privacy-irrelevant information, specifically including content information, namely background and pose; imaging information, such as resolution and lighting conditions; and description information, such as person attributes like clothing color.
  • (3) To overcome the huge domain gap between virtual samples and real images, the present invention adopts a virtual image generation framework integrating image translation, rendering and sampling to process the virtual data generated in the game engine into virtual samples, effectively realizing the approximation of the domain distribution from virtual samples to real images.
  • (4) The present invention has high adaptability and strong flexibility in image translation, and proposes a deep neural network based on variational autoencoders, the multi-factor variational generation network, which encodes and fuses multiple privacy-irrelevant factors and can effectively generate virtual samples that fuse virtual characters with real-world information.
  • FIG. 1 is a flowchart of a method for pedestrian re-identification based on virtual samples in an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a deep neural network of a multi-factor variational generation network in an embodiment of the present invention.
  • This embodiment discloses a pedestrian re-identification method based on virtual samples, which aims to provide a pedestrian re-identification scheme under privacy protection. Since the appearance of real pedestrians cannot be obtained, this scheme uses the virtual images generated by a game engine as the source dataset for extracting character features. However, if the virtual source dataset X_s is simply used to train the pedestrian re-identification model and the model is applied directly to the real pedestrian target dataset X_t, the huge domain gap between the virtual source dataset and the real pedestrian dataset prevents the method from learning effective discriminative feature representations of the real pedestrians in the target dataset, so the model performance falls far short of practical needs.
  • Further, in order to better adapt the model learned from virtual samples to the real target domain, this scheme introduces three types of privacy-irrelevant information, including content information (background and pose, etc.), imaging information (foreground resolution and lighting conditions, etc.) and description information (person attributes such as clothing color).
  • content information contains real-world information and the limb state of real pedestrians
  • imaging information forces the image style to approach the target domain
  • description information makes the overall attribute distribution of the dataset have statistical semantic consistency.
  • the virtual sample-based pedestrian re-identification method specifically includes the following steps:
  • step S1 virtual data generated by the game engine is acquired and preprocessed to obtain a batch of virtual samples with character labels. Specifically, this step S1 includes the following steps:
  • Step S11: k characters are extracted from the virtual dataset generated by the game engine and l backgrounds are extracted from the real pedestrian dataset, denoted {c_1, ..., c_k} and {b_1, ..., b_l} respectively; the two are directly composited to obtain n virtual images fusing virtual characters and real backgrounds as training samples {x_1, ..., x_n}.
  • Step S12: the character pose of each training sample is extracted and, together with the training sample and the corresponding background, is used as the input of the constructed deep neural network based on a variational autoencoder, namely the multi-factor variational generation network; an objective function is constructed for training so that the network learns the transformation law of the composite images with respect to character, background and pose.
  • In step S12, the objective function is
  • $$\mathcal{L} = \sum_{i} \lambda_i \left\| \Phi_i(x) - \Phi_i\left(D_\theta\left(z_{(x,b)}\right)\right) \right\| + \mathrm{KL}\left( q_\phi\left(z_{(x,b)} \mid x\right) \,\|\, p_\theta\left(z_{(x,b)}\right) \right)$$
  • where x represents the input training sample
  • z_(x,b) represents the joint latent variable
  • D_θ represents the decoder network serving as the generator
  • Φ_i represents the features extracted from different layers of the network
  • q_φ represents the posterior distribution
  • p_θ represents the prior distribution
  • KL represents the Kullback-Leibler divergence
  • i and λ_i are preset hyperparameters used to control the contribution of different network layers to the total loss.
  • Step S13 Adjust the resolution of the virtual character according to the character resolution of the target data set.
  • In step S13, the pixel proportions of the characters in the virtual dataset and in the real pedestrian dataset are calculated respectively, and the resolution of the virtual characters is adjusted by scaling the characters in the virtual dataset so that they have a resolution similar to that of the target dataset.
  • Step S14 taking the adjusted virtual character, the real background and the pose extracted from the target data set as the input of the network, and generating a batch of virtual samples with character labels through the network.
  • Step S2 Render the generated virtual sample according to the lighting condition of the target data set.
  • each image is converted into HSV format, the V channel is extracted and the average value of the V channel is calculated as the brightness value of the image, so as to obtain the illumination condition of the target dataset.
  • the brightness values of the image range from 0-255.
  • Step S3 Sampling the rendered virtual samples according to the character attributes of the target data set.
  • step S3 two attributes of the color of the upper body clothes and the color of the lower body clothes are selected as the basic attributes of sampling to perform attribute distribution statistics of the data set.
  • Step S4 constructing a training data set according to the sampled virtual samples to train the pedestrian re-identification model, and verify the recognition effect of the trained model.
  • The specific recognition verification process includes: using the trained model to match a query picture against the gallery to determine the pictures judged to have the same identity, outputting the corresponding picture indexes in order of likelihood, and comparing them with the ground-truth labels.
  • According to the pedestrian re-identification method of this embodiment, under the privacy-protected setting the pedestrian appearance in the real pedestrian dataset of the target domain cannot be obtained, so virtual characters generated by a game engine are used in place of real people as foreground information to extract the identity-discriminative features of pedestrians; based on this strategy, a batch of new virtual samples is generated by fusing virtual characters and real backgrounds to serve as the training set of the pedestrian re-identification model.
  • The model trained by the method provided in this embodiment can effectively protect the privacy of pedestrians from infringement, while making use of privacy-irrelevant information in the target domain as much as possible to narrow the gap to the target-domain distribution: the content information of the real pedestrian dataset of the target domain (background and pose, etc.) is used to realize the basic transformation of the virtual characters; the imaging information (foreground resolution and lighting conditions, etc.) is then extracted from the real pedestrian dataset of the target domain and applied to the virtual samples; and the virtual samples are sampled according to the description information (person attributes such as clothing color), so as to achieve effective pedestrian re-identification.
  • During model training, only access to the generated virtual samples is provided, and the testing and evaluation of the model's recognition effect on the real pedestrian dataset is completed under black-box conditions, thereby achieving the goal of pedestrian re-identification under privacy protection.
  • Step S1: Since the virtual samples lack real-world information, privacy-irrelevant content is introduced from the real-world dataset to generate more realistic images; therefore, the virtual dataset X_s and the real pedestrian dataset X_t need to be prepared in advance. Pedestrian images usually contain two parts, the background and the pedestrian as the foreground. In the traditional person re-identification task, many methods propose to reduce the influence of the background through attention mechanisms, segmentation, or local feature extraction, so that the model pays more attention to the pedestrian itself.
  • However, in the re-identification task under privacy protection, learning from the pedestrian images of a purely virtual dataset prevents the model from capturing the focus of pedestrians in real scenes, and training on purely virtual data weakens the generalization ability of the re-identification model; to alleviate this problem, this scheme proposes fusing the virtual characters of the virtual dataset with the real backgrounds of the target-domain dataset.
  • a self-correcting human parsing network is used to extract the person mask in each image, and the area covered by the mask is further erased from the pedestrian image, thereby avoiding the leakage of appearance information involving pedestrian privacy.
  • the background image with pedestrians removed is inpainted using the recurrent feature inference network to obtain the complete background image.
  • the edge of the person mask obtained by the self-correcting human parsing network is incomplete, so dilation and erosion techniques are used to fill the missing pixels to further improve the integrity of the person mask.
  • the erasing process of real pedestrian images should be done by the image provider to avoid privacy leakage.
  • Unlike the complex real scenes of the real dataset, this embodiment uses a matting script to extract the virtual characters from virtual images with solid-color backgrounds, which separates the virtual characters from their backgrounds more quickly and conveniently.
  • Suppose k characters are extracted from the virtual dataset generated by the game engine and l backgrounds are extracted from the real pedestrian dataset, denoted {c_1, ..., c_k} and {b_1, ..., b_l} respectively; the two are directly composited to obtain n virtual images fusing virtual characters and real backgrounds as training samples {x_1, ..., x_n}.
  • Further, the character pose of each training sample is extracted and, together with the training sample and the corresponding background, is used as the input of the deep neural network based on a variational autoencoder, namely the multi-factor variational generation network; an objective function is constructed for training so that the network learns the transformation law of the composite images with respect to character, background and pose.
  • As shown in FIG. 2, the multi-factor variational generation network feeds multiple privacy-irrelevant factors (such as background and pose) into encoder networks to obtain the corresponding codes, models the joint latent variable over these codes through autoregressive group modeling, and then generates virtual samples with the target image content through the decoder network.
  • the specific modeling process is as follows:
  • Assume that a training sample image x is fused from a foreground character c and a background b. To control c and b, the maximized probability distribution p(x|c,b) is constructed as the generator; an effective way is to use a variational autoencoder to model p(x|z), where z represents the latent variable and p(z) represents the standard normal prior in the variational autoencoder framework. However, under this prior there is no guarantee that the latent variables for c and b are separated in the latent space.
  • Therefore, in order for the modeling of z to express the spatial information of c and b without losing that information during encoding, z is represented as the joint latent variable z_(c,b). Since the foreground content information of character c is contained in the fused image x, x is used to encode c. As the goal shifts to learning p(x|z_(x,b)), the log-likelihood of the observations, i.e., the input training samples x, needs to be maximized, and a neural network is used to infer the latent variable z encoded from x and b. A variational lower bound on log p(x) can then be written, where KL represents the Kullback-Leibler divergence.
  • To this end, this scheme proposes a novel multi-factor variational generation network.
  • As shown in FIG. 2, the multi-factor variational generation network feeds the person, background and pose separately into encoder networks to obtain their low-dimensional feature codes.
  • Before fusing with the person code, the multi-factor variational generation network concatenates the target-domain-related codes into a joint code.
  • Meanwhile, to improve the expressive power of the variational autoencoder, the multi-factor variational generation network adopts autoregressive group modeling to construct the joint latent variable representation z_(x,b).
  • According to the variational lower bound and the given prior p(z_(x,b)), the parameters needed by the generative model can be learned by training the multi-factor variational generation network described above.
  • This embodiment assumes that the parameters of the prior distribution and the posterior distribution are ⁇ and ⁇ , respectively.
  • p(z_(x,b)) is modeled as a Gaussian distribution, and the parameters θ and φ are inferred by neural networks. From this, the loss function for training can be derived as follows:
  • $$\mathcal{L}_{\mathrm{VAE}} = -\,\mathbb{E}_{q_\phi\left(z_{(x,b)} \mid x\right)}\left[\log p_\theta\left(x \mid z_{(x,b)}\right)\right] + \mathrm{KL}\left( q_\phi\left(z_{(x,b)} \mid x\right) \,\|\, p_\theta\left(z_{(x,b)}\right) \right)$$
  • On this basis, this embodiment incorporates a perceptual function Φ to extract features that better match visual intuition, which is used to compute the perceptual loss between the original input image and the image generated by the decoder network. Therefore, the final loss function of this scheme is defined as follows:
  • $$\mathcal{L} = \sum_{i} \lambda_i \left\| \Phi_i(x) - \Phi_i\left(D_\theta\left(z_{(x,b)}\right)\right) \right\| + \mathrm{KL}\left( q_\phi\left(z_{(x,b)} \mid x\right) \,\|\, p_\theta\left(z_{(x,b)}\right) \right)$$
  • Φ_i represents the features extracted from each layer of the visual perception network
  • i and λ_i are hyperparameters used to control the contribution of different layers of the visual perception network to the total loss
  • D_θ represents the decoder network serving as the generator
  • Character resolution refers to the number of pixels of foreground pedestrians in the image.
  • In real scenes, the character resolution usually differs between pedestrian images according to the position and viewpoint of the camera.
  • In the virtual dataset obtained from the game engine, however, the number of pixels occupied by each person in virtual images of the same size is basically the same. Therefore, there is a large gap in the distribution of person resolution between the virtual source domain and the real target domain.
  • In this embodiment, by scaling the characters in the source domain, the pixel proportion of the characters in the whole image can be made closer to that of the target domain.
  • the mask of the person in each image is first obtained through the self-corrected human parsing network, and then the number of pixels occupied by the person mask is divided by the number of pixels of the whole image to obtain the percentage.
  • the pixel proportions of the characters in the virtual dataset and the target dataset are calculated separately, and the characters in the virtual dataset are scaled accordingly to adjust the character resolution of the virtual character to have a similar percentage to the target domain.
  • a batch of virtual samples with person labels is generated by using the adjusted virtual person, the real background, and the pedestrian pose extracted from the target dataset as the input of the deep neural network.
  • Step S2 Render the generated virtual sample according to the lighting condition of the target data set.
  • lighting conditions can vary widely across datasets. Some datasets only have specific lighting conditions, such as those captured at night. Due to the huge difference in brightness, the learned person re-ID model may not be properly applied to the actual target domain.
  • this scheme adjusts the lighting situation of the source domain to adapt to the lighting situation of the target domain.
  • each image is converted to HSV format, the V channel is extracted and the average value of the V channel is calculated as the brightness value of the image, which ranges from 0-255.
  • this embodiment multiplies each image by the same coefficient to adjust the illumination of the source domain so that the luminance distributions of the two domains have similar peak distributions.
  • Step S3 Sampling the rendered virtual samples according to the character attributes of the target data set.
  • The sampling process selects virtual samples according to description information of the target domain, such as clothing style, age and gender.
  • the attributes of characters can be manually set to ensure diversity.
  • the description information of virtual characters usually has various characteristics.
  • In real scenes, the images of a dataset are usually captured in a specific area within a limited period of time; for example, a real pedestrian dataset captured on a campus in summer contains a large number of pedestrians wearing T-shirts and carrying backpacks.
  • the virtual image is sampled according to the description information of the real target domain, so that the attributes of the virtual character are as consistent as possible with the real scene, so that the learned person re-identification model can better adapt to the target domain.
  • two attributes are selected as the basic attributes of sampling, including the color of upper body clothes and the color of lower body clothes.
  • Step S4: the recognition effect is verified; a training dataset is constructed from the sampled virtual samples to train the pedestrian re-identification model, and the trained model is used to match a query picture against the gallery, determine the pictures judged to have the same identity, output the corresponding picture indexes in order of likelihood, and compare them with the ground-truth labels.
  • The implementation platform of this embodiment is the PyCharm software; data reading and writing, basic mathematical operations, optimization solving and other foundations are well-known techniques in this technical field and are not described in detail here.
  • the automatic operation of the process can be realized by means of software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

A pedestrian re-identification method based on virtual samples, comprising the following steps: obtaining virtual characters generated by a game engine and preprocessing them, and generating a batch of virtual samples with character labels by fusing the background of the target dataset and the poses of real people through a multi-factor variational generation network; rendering the generated virtual samples according to the lighting conditions; sampling the rendered virtual samples according to the character attributes; constructing a training dataset from the sampled virtual samples to train a pedestrian re-identification model, and verifying the recognition effect of the trained model. The method uses a virtual image generation framework integrating translation, rendering and sampling to bring the distributions of virtual images and real images closer together, generates a batch of virtual samples, and trains the pedestrian re-identification model on them; the model can be effectively applied to pedestrian datasets in real scenes, so that an effective pedestrian re-identification model is learned under privacy protection.

Description

Pedestrian Re-identification Method Based on Virtual Samples
Technical Field
The present invention belongs to the technical field of pedestrian re-identification, and in particular relates to a pedestrian re-identification method based on virtual samples.
Background Art
Pedestrian re-identification aims to match the images of each person captured by multiple non-overlapping cameras deployed at different locations. In recent years, pedestrian re-identification technology has developed rapidly and has rich application scenarios, such as finding people of interest (for example, lost children or criminals) and tracking specific people, which has led to in-depth research on pedestrian re-identification. Benefiting from deep convolutional neural networks, many of the proposed pedestrian re-identification methods achieve very high performance. However, these methods rely on images from a large number of pedestrian surveillance videos for training, which exposes personal privacy information and may further lead to security problems. As privacy issues receive more and more attention, some real pedestrian datasets have been required to be withdrawn, and it has even been required that images from these datasets must not be displayed in any form of publication.
In the field of pedestrian re-identification, unsupervised domain adaptation methods can still learn the relevant features of the target-domain dataset with the help of a source-domain dataset without relying on target-domain pedestrian labels, which to a certain extent avoids the direct exposure of the specific identity information of pedestrians in the target domain. The latest unsupervised domain adaptation methods generally fall into two categories: clustering-based methods and generation-based methods. It should be noted in particular that the former relies on target images for unsupervised clustering, while the latter also requires target images for image translation or adversarial training; both depend to a large extent on pedestrian datasets that are open to the public, which indirectly leaves the privacy information of pedestrians in the target images fully exposed on public platforms that can be accessed at any time with almost no permission restrictions. This is a problem that urgently needs attention, and effective solutions are urgently required to deal with the challenge that this phenomenon brings to the field of pedestrian re-identification.
Summary of the Invention
The purpose of the present invention is to provide, in view of the deficiencies of the prior art, a pedestrian re-identification method based on virtual samples, which realizes pedestrian re-identification under privacy protection through virtual samples and solves the challenges faced by the privacy-protected pedestrian re-identification task in the prior art, namely the missing pedestrian appearance in target images and the huge domain gap between virtual images and real images.
To solve the above technical problems, the present invention adopts the following technical solution:
A pedestrian re-identification method based on virtual samples, comprising the following steps:
Step S1: obtaining virtual characters generated by a game engine and preprocessing them, and generating a batch of virtual samples with character labels by fusing the background of the target dataset and the poses of real people through a multi-factor variational generation network;
Step S2: rendering the generated virtual samples according to the lighting conditions of the target dataset;
Step S3: sampling the rendered virtual samples according to the character attributes of the target dataset;
Step S4: constructing a training dataset from the sampled virtual samples to train the pedestrian re-identification model, and verifying the recognition effect of the trained model.
Further, the step S1 includes:
Step S11: extracting k characters from the virtual dataset generated by the game engine and l backgrounds from the real pedestrian dataset, denoted {c_1, ..., c_k} and {b_1, ..., b_l} respectively, and directly compositing the two to obtain n virtual images fusing virtual characters and real backgrounds as training samples {x_1, ..., x_n};
Step S12: extracting the character pose of each training sample, taking it together with the training sample and the corresponding background as the input of the constructed deep neural network based on a variational autoencoder, namely the multi-factor variational generation network, and constructing an objective function for training so that the network learns the transformation law of the composite images with respect to character, background and pose;
Step S13: adjusting the resolution of the virtual characters according to the character resolution of the target dataset;
Step S14: taking the adjusted virtual characters, the real backgrounds and the poses extracted from the target dataset as the input of the network, and generating a batch of virtual samples with character labels through the network.
Further, in the step S12, the objective function is
$$\mathcal{L} = \sum_{i} \lambda_i \left\| \Phi_i(x) - \Phi_i\left(D_\theta\left(z_{(x,b)}\right)\right) \right\| + \mathrm{KL}\left( q_\phi\left(z_{(x,b)} \mid x\right) \,\|\, p_\theta\left(z_{(x,b)}\right) \right)$$
where x represents the input training sample, z_(x,b) represents the joint latent variable, D_θ represents the decoder network serving as the generator, Φ_i represents the features extracted from different layers of the network, q_φ represents the posterior distribution, p_θ represents the prior distribution, KL represents the Kullback-Leibler divergence, and i and λ_i are preset hyperparameters used to control the contribution of different network layers to the total loss.
Further, in the step S13, the pixel proportions of the characters in the images of the virtual dataset and of the real pedestrian dataset are calculated respectively, and the resolution of the virtual characters is adjusted by scaling the characters in the virtual dataset so that they have a resolution similar to that of the target dataset.
Further, in the step S2, each image is converted into HSV format, the V channel is extracted, and the average value of the V channel is calculated as the brightness value of the image, which ranges from 0 to 255, so as to obtain the lighting conditions of the target dataset.
Further, in the step S3, two attributes, the color of the upper-body clothes and the color of the lower-body clothes, are selected as the basic attributes for sampling in order to compute the attribute distribution statistics of the dataset.
Further, in the step S4, the recognition verification process includes: using the trained model to match a query picture against the gallery to determine the pictures judged to have the same identity, outputting the corresponding picture indexes in order of likelihood, and comparing them with the ground-truth labels.
Compared with the prior art, the beneficial effects of the present invention are as follows:
Addressing the poor robustness of existing pedestrian re-identification models caused by the varied backgrounds and poses of pedestrians under different cameras in real scenes, the inconsistent resolution caused by the varying distance between pedestrians and cameras, the differences in image brightness caused by different lighting conditions, and the inconsistent attribute distribution caused by different clothing that may result from seasonal changes, the present invention uses a virtual image generation framework integrating translation, rendering and sampling to bring the distributions of virtual images and real images as close as possible and to generate a batch of new virtual samples, and further uses these virtual samples to train the pedestrian re-identification model, which can be effectively applied to pedestrian datasets in real scenes; an effective pedestrian re-identification model is thus learned without obtaining the appearance of people in the real pedestrian dataset of the target domain, and the task of pedestrian re-identification under privacy protection is completed. This specifically includes the following aspects:
(1) To achieve privacy protection for pedestrians, for the target scene the data provider only needs to supply information unrelated to pedestrian privacy, without any appearance or identity-discriminative information of real pedestrians; training the pedestrian re-identification model with virtual characters generated by a game engine in place of real pedestrians is sufficient.
(2) To make full use of the real-world information of the target scene, the present invention defines three types of privacy-irrelevant information, specifically including content information, namely background and pose; imaging information, namely resolution and lighting conditions; and description information, namely person attributes such as clothing color.
(3) To overcome the huge domain gap between virtual samples and real images, the present invention adopts a virtual image generation framework integrating image translation, rendering and sampling to process the virtual data generated in the game engine into virtual samples, effectively realizing the approximation of the domain distribution from virtual samples to real images.
(4) The present invention has high adaptability and strong flexibility in image translation, and proposes a deep neural network based on variational autoencoders, the multi-factor variational generation network, which encodes and fuses multiple privacy-irrelevant factors and can effectively generate virtual samples that fuse virtual characters with real-world information.
Brief Description of the Drawings
FIG. 1 is a flowchart of the pedestrian re-identification method based on virtual samples in an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of the deep neural network of the multi-factor variational generation network in an embodiment of the present invention.
Detailed Description of the Embodiments
The present invention is further described below with reference to the embodiments shown in the accompanying drawings.
This embodiment discloses a pedestrian re-identification method based on virtual samples, which aims to provide a pedestrian re-identification scheme under privacy protection. Since the appearance of real pedestrians cannot be obtained, this scheme uses the virtual images generated by a game engine as the source dataset for extracting character features. However, if the virtual source dataset X_s is simply used to train the pedestrian re-identification model and the model is applied directly to the real pedestrian target dataset X_t, the huge domain gap between the virtual source dataset and the real pedestrian dataset prevents the method from learning effective discriminative feature representations of the real pedestrians in the target dataset, so the model performance falls far short of practical needs. Further, in order to better adapt the model learned from virtual samples to the real target domain, this scheme introduces three types of privacy-irrelevant information, specifically including content information (background and pose, etc.), imaging information (foreground resolution and lighting conditions, etc.) and description information (person attributes such as clothing color). The content information contains real-world information and the limb state of real pedestrians, the imaging information forces the image style to approach the target domain, and the description information makes the overall attribute distribution of the dataset statistically and semantically consistent.
As shown in FIG. 1, the pedestrian re-identification method based on virtual samples specifically includes the following steps:
Step S1: the virtual data generated by the game engine are obtained and preprocessed to obtain a batch of virtual samples with character labels. Specifically, step S1 includes the following steps:
Step S11: k characters are extracted from the virtual dataset generated by the game engine and l backgrounds are extracted from the real pedestrian dataset, denoted {c_1, ..., c_k} and {b_1, ..., b_l} respectively; the two are directly composited to obtain n virtual images fusing virtual characters and real backgrounds as training samples {x_1, ..., x_n}.
Step S12: the character pose of each training sample is extracted and, together with the training sample and the corresponding background, is used as the input of the constructed deep neural network based on a variational autoencoder, namely the multi-factor variational generation network; an objective function is constructed for training so that the network learns the transformation law of the composite images with respect to character, background and pose.
In step S12, the objective function is
$$\mathcal{L} = \sum_{i} \lambda_i \left\| \Phi_i(x) - \Phi_i\left(D_\theta\left(z_{(x,b)}\right)\right) \right\| + \mathrm{KL}\left( q_\phi\left(z_{(x,b)} \mid x\right) \,\|\, p_\theta\left(z_{(x,b)}\right) \right)$$
where x represents the input training sample, z_(x,b) represents the joint latent variable, D_θ represents the decoder network serving as the generator, Φ_i represents the features extracted from different layers of the network, q_φ represents the posterior distribution, p_θ represents the prior distribution, KL represents the Kullback-Leibler divergence, and i and λ_i are preset hyperparameters used to control the contribution of different network layers to the total loss.
Step S13: the resolution of the virtual characters is adjusted according to the character resolution of the target dataset.
In step S13, the pixel proportions of the characters in the images of the virtual dataset and of the real pedestrian dataset are calculated respectively, and the resolution of the virtual characters is adjusted by scaling the characters in the virtual dataset so that they have a resolution similar to that of the target dataset.
Step S14: the adjusted virtual characters, the real backgrounds and the poses extracted from the target dataset are taken as the input of the network, and a batch of virtual samples with character labels is generated through the network.
Step S2: the generated virtual samples are rendered according to the lighting conditions of the target dataset.
In step S2, each image is converted into HSV format, the V channel is extracted, and the average value of the V channel is calculated as the brightness value of the image, so as to obtain the lighting conditions of the target dataset. Here, the brightness value of an image ranges from 0 to 255.
Step S3: the rendered virtual samples are sampled according to the character attributes of the target dataset.
In step S3, two attributes, the color of the upper-body clothes and the color of the lower-body clothes, are selected as the basic attributes for sampling in order to compute the attribute distribution statistics of the dataset.
Step S4: a training dataset is constructed from the sampled virtual samples to train the pedestrian re-identification model, and the recognition effect of the trained model is verified. Here, the specific recognition verification process includes: using the trained model to match a query picture against the gallery to determine the pictures judged to have the same identity, outputting the corresponding picture indexes in order of likelihood, and comparing them with the ground-truth labels.
According to the pedestrian re-identification method of this embodiment, under the privacy-protected re-identification setting the pedestrian appearance in the real pedestrian dataset of the target domain cannot be obtained, so virtual characters generated by a game engine are used in place of real people as foreground information to extract the identity-discriminative features of pedestrians; based on this strategy, a batch of new virtual samples is generated by fusing virtual characters and real backgrounds to serve as the training set of the pedestrian re-identification model. The model trained by the method provided in this embodiment can effectively protect the privacy of pedestrians from infringement, while making use of privacy-irrelevant information in the target domain as much as possible to narrow the distance to the target-domain distribution: the content information of the real pedestrian dataset of the target domain (background and pose, etc.) is used to realize the basic transformation of the virtual characters; the imaging information (foreground resolution and lighting conditions, etc.) is then extracted from the real pedestrian dataset of the target domain and applied to the virtual samples; and the virtual samples are sampled by image sampling according to the description information (person attributes such as clothing color), so as to achieve effective pedestrian re-identification. During model training, only access to the generated virtual samples is provided, and the testing and evaluation of the recognition effect of the model applied to the real pedestrian dataset is completed under black-box conditions, thereby achieving the goal of pedestrian re-identification under privacy protection.
The following description is given in combination with an actual algorithm implementation:
Step S1: Since the virtual samples lack real-world information, privacy-irrelevant content is introduced from the real-world dataset to generate more realistic images; therefore, the virtual dataset X_s and the real pedestrian dataset X_t need to be prepared in advance. A pedestrian image usually contains two parts, the background and the pedestrian as the foreground. In the traditional pedestrian re-identification task, many methods propose to reduce the influence of the background through attention mechanisms, segmentation, or local feature extraction, so that the model pays more attention to the pedestrian itself. However, in the pedestrian re-identification task under privacy protection, learning from the pedestrian images of the virtual dataset prevents the model from capturing the focus of pedestrians in real scenes, and using purely virtual data for training weakens the generalization ability of the pedestrian re-identification model. To alleviate this problem, this scheme proposes fusing the virtual characters of the virtual dataset with the real backgrounds of the target-domain dataset.
In a specific implementation, a self-correction human parsing network is used to extract the person mask in each image, and the area covered by the mask is then erased from the pedestrian image, thereby avoiding the leakage of appearance information involving pedestrian privacy. To obtain the complete image background, the background picture with the pedestrian removed is inpainted using a recurrent feature reasoning network to obtain the complete background image. During inpainting, the edge of the person mask obtained by the self-correction human parsing network is incomplete, so dilation and erosion are used to fill the missing pixels and further improve the completeness of the person mask. It is worth mentioning here that the erasing of people in real pedestrian images should be done by the image provider to avoid privacy leakage. Unlike the complex real scenes of the real dataset, this embodiment uses a matting script to extract the virtual characters from virtual images with solid-color backgrounds, which separates the virtual characters from their backgrounds more quickly and conveniently.
Suppose k characters are extracted from the virtual dataset generated by the game engine and l backgrounds are extracted from the real pedestrian dataset, denoted {c_1, ..., c_k} and {b_1, ..., b_l} respectively; the two are directly composited to obtain n virtual images fusing virtual characters and real backgrounds as training samples {x_1, ..., x_n}.
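The following Python sketch illustrates the erasing, inpainting and compositing described above in simplified form. It is not the published embodiment: the person mask is assumed to come from any off-the-shelf human parsing model (standing in for the self-correction human parsing network), OpenCV's Telea inpainting stands in for the recurrent feature reasoning network, and the helper parse_person_mask as well as all sizes and positions are hypothetical.

```python
import cv2
import numpy as np

def parse_person_mask(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: returns a uint8 mask (255 = person pixels) for the image."""
    raise NotImplementedError  # plug in any human parsing / segmentation model here

def prepare_background(real_image: np.ndarray) -> np.ndarray:
    """Erase the real pedestrian and inpaint the hole to obtain a privacy-safe background b."""
    mask = parse_person_mask(real_image)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(mask, kernel, iterations=2)   # fill gaps along the mask border
    mask = cv2.erode(mask, kernel, iterations=1)
    return cv2.inpaint(real_image, mask, 5, cv2.INPAINT_TELEA)

def composite(character_bgra: np.ndarray, background: np.ndarray, top_left=(0, 0)) -> np.ndarray:
    """Paste a matted virtual character c (with alpha channel) onto a real background b."""
    h, w = character_bgra.shape[:2]
    y, x = top_left
    alpha = character_bgra[:, :, 3:4].astype(np.float32) / 255.0
    region = background[y:y + h, x:x + w].astype(np.float32)
    fused = alpha * character_bgra[:, :, :3].astype(np.float32) + (1.0 - alpha) * region
    out = background.copy()
    out[y:y + h, x:x + w] = fused.astype(np.uint8)
    return out
```

In practice the same routine would be looped over the k characters and l backgrounds to produce the n training samples {x_1, ..., x_n}.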
Further, the character pose of each training sample is extracted and, together with the training sample and the corresponding background, is used as the input of the constructed deep neural network based on a variational autoencoder, namely the multi-factor variational generation network; an objective function is constructed for training so that the network learns the transformation law of the composite images with respect to character, background and pose. As shown in FIG. 2, the multi-factor variational generation network feeds multiple privacy-irrelevant factors (such as background and pose) into encoder networks to obtain the corresponding codes, models the joint latent variable over these codes through autoregressive group modeling, and then generates virtual samples with the target image content through the decoder network. The specific modeling process is as follows:
Assume that a training sample image x is fused from a foreground character c and a background b. To control c and b, the maximized probability distribution p(x|c,b) is constructed as the generator. An effective way is to use a variational autoencoder to model p(x|z), where z represents the latent variable and p(z) represents the standard normal prior in the variational autoencoder framework. However, under this prior there is no guarantee that the latent variables for c and b are separated in the latent space. Therefore, in order for the modeling of z to express the spatial information of c and b and to keep that information from being lost during encoding, z is represented as the joint latent variable z_(c,b). Since the foreground content information of character c is contained in the fused image x, x is used to encode c. As the goal shifts to learning p(x|z_(x,b)), the log-likelihood of the given observations, i.e., the input training samples x, needs to be maximized, and a neural network is used to infer the latent variable z encoded from x and b. Thus:
$$\log p(x) = \log \mathbb{E}_{q\left(z_{(x,b)} \mid x\right)}\left[ \frac{p\left(x \mid z_{(x,b)}\right)\, p\left(z_{(x,b)}\right)}{q\left(z_{(x,b)} \mid x\right)} \right]$$
where q(z_(x,b)|x) is the approximate posterior distribution produced by the encoder. To avoid an intractable integral, a variational lower bound can be written from log p(x) as:
$$\log p(x) \ge \mathbb{E}_{q\left(z_{(x,b)} \mid x\right)}\left[\log p\left(x \mid z_{(x,b)}\right)\right] - \mathrm{KL}\left( q\left(z_{(x,b)} \mid x\right) \,\|\, p\left(z_{(x,b)}\right) \right)$$
where KL represents the Kullback-Leibler divergence.
As mentioned above, the traditional encoder-decoder structure used for variational autoencoders is not suitable for learning disentangled representations with multiple latent variables. To this end, this scheme proposes a novel multi-factor variational generation network. As shown in FIG. 2, the multi-factor variational generation network feeds the person, the background and the pose separately into encoder networks to obtain their low-dimensional feature codes. Before fusing with the person code, the multi-factor variational generation network concatenates the target-domain-related codes into a joint code. Meanwhile, to improve the expressive power of the variational autoencoder, the multi-factor variational generation network adopts autoregressive group modeling to construct the joint latent variable representation z_(x,b). According to the above variational lower bound and the given prior p(z_(x,b)), the parameters needed by the generative model can be learned by training the multi-factor variational generation network described above. This embodiment assumes that the parameters of the prior distribution and the posterior distribution are θ and φ, respectively. This embodiment models p(z_(x,b)) as a Gaussian distribution, and the parameters θ and φ are inferred by neural networks. From this, the loss function for training can be derived as follows:
$$\mathcal{L}_{\mathrm{VAE}} = -\,\mathbb{E}_{q_\phi\left(z_{(x,b)} \mid x\right)}\left[\log p_\theta\left(x \mid z_{(x,b)}\right)\right] + \mathrm{KL}\left( q_\phi\left(z_{(x,b)} \mid x\right) \,\|\, p_\theta\left(z_{(x,b)}\right) \right)$$
On this basis, this embodiment incorporates a perceptual function Φ to extract features that better match visual intuition, which is used to compute the perceptual loss between the original input image and the image generated by the decoder network. Therefore, the final loss function of this scheme is defined as follows:
$$\mathcal{L} = \sum_{i} \lambda_i \left\| \Phi_i(x) - \Phi_i\left(D_\theta\left(z_{(x,b)}\right)\right) \right\| + \mathrm{KL}\left( q_\phi\left(z_{(x,b)} \mid x\right) \,\|\, p_\theta\left(z_{(x,b)}\right) \right)$$
where Φ_i represents the features extracted from each layer of the visual perception network, i and λ_i are hyperparameters used to control the contribution of different layers of the visual perception network to the total loss, and D_θ represents the decoder network serving as the generator.
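A minimal sketch of such a loss is given below, under the assumptions (not stated in the published text) that the perceptual network Φ is a frozen VGG16, that the distance is an L1 norm, and that both q_φ and p_θ are diagonal Gaussians whose means and log-variances are produced by the encoder and prior networks; the exact published formula may differ in these details.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen perceptual network Phi: a few VGG16 feature stages (an assumed choice).
_vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)
_STAGES = (3, 8, 15)            # relu1_2, relu2_2, relu3_3
_LAMBDAS = (1.0, 0.75, 0.5)     # layer weights lambda_i (hyperparameters)

def perceptual_features(img):
    """img: float tensor (B, 3, H, W), normalized as expected by torchvision models."""
    feats, h = [], img
    for idx, layer in enumerate(_vgg):
        h = layer(h)
        if idx in _STAGES:
            feats.append(h)
    return feats

def multi_factor_loss(x, x_rec, q_mu, q_logvar, p_mu, p_logvar):
    """Sum_i lambda_i * ||Phi_i(x) - Phi_i(x_rec)||_1 + KL(q_phi || p_theta)."""
    rec = sum(lam * F.l1_loss(fx, fr)
              for lam, fx, fr in zip(_LAMBDAS, perceptual_features(x), perceptual_features(x_rec)))
    # KL divergence between two diagonal Gaussians, summed over latent dims, averaged over the batch.
    kl = 0.5 * (p_logvar - q_logvar
                + (q_logvar.exp() + (q_mu - p_mu) ** 2) / p_logvar.exp() - 1.0)
    return rec + kl.sum(dim=1).mean()
```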
Further, the poses of the pedestrians in the target dataset are extracted and the resolution of the virtual characters is adjusted while ensuring that pedestrian privacy is not infringed, i.e., under black-box conditions. Character resolution refers to the number of pixels of the foreground pedestrian in the image. In real scenes, it usually differs between pedestrian images according to the position and viewpoint of the camera, whereas in the virtual dataset obtained from the game engine, the number of pixels occupied by each character in virtual images of the same size is basically the same. Therefore, there is a large gap in the distribution of character resolution between the virtual source domain and the real target domain. In this embodiment, the characters in the source domain are scaled so that the pixel proportion of the characters in the whole image is closer to that of the target domain. First, the mask of the person in each image is obtained through the self-correction human parsing network, and then the number of pixels occupied by the person mask is divided by the number of pixels of the whole image to obtain a percentage. The pixel proportions of the characters in the virtual dataset and in the target dataset are calculated respectively, and the characters in the virtual dataset are scaled accordingly so that the character resolution of the virtual characters has a percentage similar to that of the target domain.
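A short sketch of this resolution adjustment, assuming the average person pixel ratios of the two datasets have already been computed from the person masks as described above:

```python
import cv2
import numpy as np

def person_pixel_ratio(mask: np.ndarray) -> float:
    """Fraction of image pixels covered by the person mask (mask: uint8, 255 = person)."""
    return float((mask > 0).sum()) / mask.size

def rescale_character(character_bgra: np.ndarray, ratio_virtual: float, ratio_target: float) -> np.ndarray:
    """Scale the matted virtual character so its pixel ratio matches the target domain.
    Covered area scales with the square of the linear size, hence the square root."""
    scale = float(np.sqrt(ratio_target / ratio_virtual))
    h, w = character_bgra.shape[:2]
    new_size = (max(1, int(w * scale)), max(1, int(h * scale)))
    return cv2.resize(character_bgra, new_size, interpolation=cv2.INTER_AREA)
```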
Finally, the adjusted virtual characters, the real backgrounds and the pedestrian poses extracted from the target dataset are taken as the input of the deep neural network, and a batch of virtual samples with character labels is generated.
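As a rough illustration only (not the published architecture), the encoding path of such a generation network could look like the following PyTorch sketch: separate encoders for the fused image x, the background b and the pose, concatenation of the background and pose codes into a joint condition code before fusion with the image code, and a diagonal-Gaussian joint latent z_(x,b) drawn with the reparameterization trick. The autoregressive group modeling of the prior is simplified to a single Gaussian here, and all layer sizes are assumed.

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Small CNN encoder mapping an image-like tensor to a flat feature vector."""
    def __init__(self, in_ch: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, out_dim),
        )
    def forward(self, t):
        return self.net(t)

class MultiFactorEncoder(nn.Module):
    """Encodes (x, background, pose) into a joint Gaussian latent z_(x,b)."""
    def __init__(self, z_dim: int = 128):
        super().__init__()
        self.enc_x = ConvEncoder(3, 256)      # fused image (contains the character)
        self.enc_bg = ConvEncoder(3, 64)      # real background
        self.enc_pose = ConvEncoder(1, 64)    # pose heatmap / keypoint map
        self.to_mu = nn.Linear(256 + 64 + 64, z_dim)
        self.to_logvar = nn.Linear(256 + 64 + 64, z_dim)

    def forward(self, x, background, pose):
        cond = torch.cat([self.enc_bg(background), self.enc_pose(pose)], dim=1)  # joint condition code
        h = torch.cat([self.enc_x(x), cond], dim=1)                              # fuse with image code
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)                  # reparameterization
        return z, mu, logvar
```

A decoder D_θ would then map z_(x,b) back to image space and be trained with the loss sketched earlier.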
Step S2: the generated virtual samples are rendered according to the lighting conditions of the target dataset. Considering images captured at different times, against different backgrounds and from different viewpoints, the lighting conditions may vary greatly between datasets. Some datasets only have specific lighting conditions, for example datasets captured at night. Because of the huge brightness difference, the learned pedestrian re-identification model may not be properly applicable to the actual target domain. To solve this problem, this scheme adjusts the lighting of the source domain to match that of the target domain. To obtain the lighting conditions of the target dataset, each image is converted into HSV format, the V channel is extracted, and the average value of the V channel is calculated as the brightness value of the image, which ranges from 0 to 255. By calculating the brightness values of the images from the virtual source domain and the real target domain, this embodiment multiplies every image by the same coefficient to adjust the lighting of the source domain so that the brightness distributions of the two domains have similar peaks.
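The brightness statistics and the shared rendering coefficient can be computed along the following lines (a sketch using OpenCV; the peak of the target-domain brightness distribution is assumed to be supplied as target_mean_v):

```python
import cv2
import numpy as np

def mean_brightness(img_bgr: np.ndarray) -> float:
    """Average of the V channel in HSV, in the range 0-255."""
    v = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)[:, :, 2]
    return float(v.mean())

def match_brightness(virtual_imgs, target_mean_v: float):
    """Multiply every virtual image by one shared coefficient so that the average
    brightness of the virtual set approaches the target dataset's brightness peak."""
    source_mean_v = float(np.mean([mean_brightness(im) for im in virtual_imgs]))
    coeff = target_mean_v / max(source_mean_v, 1e-6)
    return [np.clip(im.astype(np.float32) * coeff, 0, 255).astype(np.uint8) for im in virtual_imgs]
```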
Step S3: the rendered virtual samples are sampled according to the character attributes of the target dataset. The sampling process selects virtual samples according to description information of the target domain, such as clothing style, age and gender. For the virtual dataset, the attributes of the characters can be set manually to ensure diversity; with the help of a powerful game engine, the description information of virtual characters usually has a wide variety of characteristics. In real scenes, however, the images of a dataset are usually captured in a specific area within a limited period of time; for example, some real pedestrian datasets are captured on a campus in summer, where a large number of pedestrians wear T-shirts and carry backpacks. This embodiment samples the virtual images according to the description information of the real target domain, so that the attribute characteristics of the virtual characters remain as consistent as possible with the real scene and the learned pedestrian re-identification model adapts better to the target domain. To simplify the attribute distribution statistics of the dataset, two attributes are selected as the basic attributes for sampling: the color of the upper-body clothes and the color of the lower-body clothes.
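One possible sketch of such attribute-matched sampling, assuming the (upper color, lower color) pair of every virtual image is known from the game engine settings and that only aggregate attribute statistics of the target domain are available:

```python
import numpy as np
from collections import Counter, defaultdict

def sample_by_attributes(virtual_attrs, target_attrs, n_samples, seed=0):
    """Return indices of virtual images so that the joint distribution of
    (upper-body color, lower-body color) follows the target dataset's statistics.

    virtual_attrs / target_attrs: lists of (upper_color, lower_color) tuples, one per image."""
    rng = np.random.default_rng(seed)
    target_freq = Counter(target_attrs)                 # e.g. {("red", "black"): 120, ...}
    total = sum(target_freq.values())
    by_attr = defaultdict(list)                         # group virtual indices by attribute pair
    for idx, attr in enumerate(virtual_attrs):
        by_attr[attr].append(idx)
    chosen = []
    for attr, cnt in target_freq.items():
        pool = by_attr.get(attr, [])
        if not pool:
            continue                                    # no virtual image with this attribute pair
        k = round(n_samples * cnt / total)
        chosen.extend(rng.choice(pool, size=k, replace=len(pool) < k).tolist())
    return chosen
```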
Step S4: the recognition effect is verified; a training dataset is constructed from the sampled virtual samples to train the pedestrian re-identification model, and the trained model is used to match a query picture against the gallery, determine the pictures judged to have the same identity, output the corresponding picture indexes in order of likelihood, and compare them with the ground-truth labels.
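A sketch of this verification step, assuming query and gallery features have been extracted with the trained re-identification model and using cosine similarity as the (assumed) matching score:

```python
import numpy as np

def rank_gallery(query_feats: np.ndarray, gallery_feats: np.ndarray) -> np.ndarray:
    """For each query, return gallery indices ordered from most to least likely same identity."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sim = q @ g.T                         # cosine similarity matrix (n_query x n_gallery)
    return np.argsort(-sim, axis=1)       # most similar first

def rank1_accuracy(order: np.ndarray, query_ids, gallery_ids) -> float:
    """Compare the top-ranked gallery identity with the ground-truth query label."""
    top1 = np.asarray(gallery_ids)[order[:, 0]]
    return float((top1 == np.asarray(query_ids)).mean())
```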
The implementation platform of this embodiment is the PyCharm software; data reading and writing, basic mathematical operations, optimization solving and other foundations are well-known techniques in this technical field and are not described in detail here. In a specific implementation, the automatic running of the process can be realized by means of software.
The scope of protection of the present invention is not limited to the above embodiments. Obviously, those skilled in the art can make various modifications and variations to the present invention without departing from its scope and spirit. If these modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include them.

Claims (7)

  1. A pedestrian re-identification method based on virtual samples, characterized in that it comprises the following steps:
    Step S1: obtaining virtual characters generated by a game engine and preprocessing them, and generating a batch of virtual samples with character labels by fusing the background of the target dataset and the poses of real people through a multi-factor variational generation network;
    Step S2: rendering the generated virtual samples according to the lighting conditions of the target dataset;
    Step S3: sampling the rendered virtual samples according to the character attributes of the target dataset;
    Step S4: constructing a training dataset from the sampled virtual samples to train the pedestrian re-identification model, and verifying the recognition effect of the trained model.
  2. The pedestrian re-identification method based on virtual samples according to claim 1, characterized in that:
    the step S1 includes:
    Step S11: extracting k characters from the virtual dataset generated by the game engine and l backgrounds from the real pedestrian dataset, denoted {c_1, ..., c_k} and {b_1, ..., b_l} respectively, and directly compositing the two to obtain n virtual images fusing virtual characters and real backgrounds as training samples {x_1, ..., x_n};
    Step S12: extracting the character pose of each training sample, taking it together with the training sample and the corresponding background as the input of the constructed deep neural network based on a variational autoencoder, namely the multi-factor variational generation network, and constructing an objective function for training so that the network learns the transformation law of the composite images with respect to character, background and pose;
    Step S13: adjusting the resolution of the virtual characters according to the character resolution of the target dataset;
    Step S14: taking the adjusted virtual characters, the real backgrounds and the poses extracted from the target dataset as the input of the network, and generating a batch of virtual samples with character labels through the network.
  3. The pedestrian re-identification method based on virtual samples according to claim 2, characterized in that:
    in the step S12, the objective function is
    $$\mathcal{L} = \sum_{i} \lambda_i \left\| \Phi_i(x) - \Phi_i\left(D_\theta\left(z_{(x,b)}\right)\right) \right\| + \mathrm{KL}\left( q_\phi\left(z_{(x,b)} \mid x\right) \,\|\, p_\theta\left(z_{(x,b)}\right) \right)$$
    where x represents the input training sample, z_(x,b) represents the joint latent variable, D_θ represents the decoder network serving as the generator, Φ_i represents the features extracted from different layers of the network, q_φ represents the posterior distribution, p_θ represents the prior distribution, KL represents the Kullback-Leibler divergence, and i and λ_i are preset hyperparameters used to control the contribution of different network layers to the total loss.
  4. The pedestrian re-identification method based on virtual samples according to claim 2, characterized in that:
    in the step S13, the pixel proportions of the characters in the images of the virtual dataset and of the real pedestrian dataset are calculated respectively, and the resolution of the virtual characters is adjusted by scaling the characters in the virtual dataset so that they have a resolution similar to that of the target dataset.
  5. The pedestrian re-identification method based on virtual samples according to claim 1, characterized in that:
    in the step S2, each image is converted into HSV format, the V channel is extracted, and the average value of the V channel is calculated as the brightness value of the image, the channel brightness value ranging from 0 to 255, so as to obtain the lighting conditions of the target dataset.
  6. The pedestrian re-identification method based on virtual samples according to claim 1, characterized in that:
    in the step S3, two attributes, the color of the upper-body clothes and the color of the lower-body clothes, are selected as the basic attributes for sampling in order to compute the attribute distribution statistics of the dataset.
  7. The pedestrian re-identification method based on virtual samples according to claim 1, characterized in that:
    in the step S4, the recognition verification process includes: using the trained model to match a query picture against the gallery to determine the pictures judged to have the same identity, outputting the corresponding picture indexes in order of likelihood, and comparing them with the ground-truth labels.
PCT/CN2021/122343 2021-01-28 2021-09-30 Pedestrian re-identification method based on virtual samples WO2022160773A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/337,439 US11837007B2 (en) 2021-01-28 2023-06-20 Pedestrian re-identification method based on virtual samples

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110122521.4A CN112784783B (zh) 2021-01-28 2021-01-28 Pedestrian re-identification method based on virtual samples
CN202110122521.4 2021-01-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/337,439 Continuation-In-Part US11837007B2 (en) 2021-01-28 2023-06-20 Pedestrian re-identification method based on virtual samples

Publications (1)

Publication Number Publication Date
WO2022160773A1 true WO2022160773A1 (zh) 2022-08-04

Family

ID=75759603

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122343 WO2022160773A1 (zh) 2021-01-28 2021-09-30 Pedestrian re-identification method based on virtual samples

Country Status (3)

Country Link
US (1) US11837007B2 (zh)
CN (1) CN112784783B (zh)
WO (1) WO2022160773A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115496191A (zh) * 2022-11-08 2022-12-20 腾讯科技(深圳)有限公司 Model training method and related apparatus

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784783B (zh) 2021-01-28 2023-05-02 武汉大学 Pedestrian re-identification method based on virtual samples

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414462A (zh) * 2019-08-02 2019-11-05 中科人工智能创新技术研究院(青岛)有限公司 Unsupervised cross-domain pedestrian re-identification method and system
CN110490960A (zh) * 2019-07-11 2019-11-22 阿里巴巴集团控股有限公司 Synthetic image generation method and apparatus
CN110555390A (zh) * 2019-08-09 2019-12-10 厦门市美亚柏科信息股份有限公司 Pedestrian re-identification method, apparatus and medium based on semi-supervised training
US20190378333A1 (en) * 2018-06-08 2019-12-12 Verizon Patent And Licensing Inc. Methods and systems for representing a pre-modeled object within virtual reality data
CN112784783A (zh) * 2021-01-28 2021-05-11 武汉大学 Pedestrian re-identification method based on virtual samples

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138469B2 (en) * 2019-01-15 2021-10-05 Naver Corporation Training and using a convolutional neural network for person re-identification
GB2584727B (en) * 2019-06-14 2024-02-28 Vision Semantics Ltd Optimised machine learning
CN110427813B (zh) * 2019-06-24 2023-06-09 中国矿业大学 Pedestrian re-identification method using a Siamese generative adversarial network based on pose-guided pedestrian image generation
CN110796080B (zh) * 2019-10-29 2023-06-16 重庆大学 Multi-pose pedestrian image synthesis algorithm based on generative adversarial networks
US20230004760A1 (en) * 2021-06-28 2023-01-05 Nvidia Corporation Training object detection systems with generated images

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190378333A1 (en) * 2018-06-08 2019-12-12 Verizon Patent And Licensing Inc. Methods and systems for representing a pre-modeled object within virtual reality data
CN110490960A (zh) * 2019-07-11 2019-11-22 阿里巴巴集团控股有限公司 Synthetic image generation method and apparatus
CN110414462A (zh) * 2019-08-02 2019-11-05 中科人工智能创新技术研究院(青岛)有限公司 Unsupervised cross-domain pedestrian re-identification method and system
CN110555390A (zh) * 2019-08-09 2019-12-10 厦门市美亚柏科信息股份有限公司 Pedestrian re-identification method, apparatus and medium based on semi-supervised training
CN112784783A (zh) * 2021-01-28 2021-05-11 武汉大学 Pedestrian re-identification method based on virtual samples

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115496191A (zh) * 2022-11-08 2022-12-20 腾讯科技(深圳)有限公司 Model training method and related apparatus
CN115496191B (zh) * 2022-11-08 2023-04-07 腾讯科技(深圳)有限公司 Model training method and related apparatus

Also Published As

Publication number Publication date
CN112784783A (zh) 2021-05-11
US11837007B2 (en) 2023-12-05
US20230334895A1 (en) 2023-10-19
CN112784783B (zh) 2023-05-02

Similar Documents

Publication Publication Date Title
Kumar et al. Object detection system based on convolution neural networks using single shot multi-box detector
CN109376582B (zh) 一种基于生成对抗网络的交互式人脸卡通方法
CN112766160B (zh) 基于多级属性编码器和注意力机制的人脸替换方法
WO2022160773A1 (zh) 基于虚拟样本的行人重识别方法
Garrido et al. Corrective 3D reconstruction of lips from monocular video.
Zhang et al. Copy and paste GAN: Face hallucination from shaded thumbnails
CN112215180A (zh) 一种活体检测方法及装置
Shiri et al. Identity-preserving face recovery from stylized portraits
CN113486944A (zh) 人脸融合方法、装置、设备及存储介质
Yang et al. Training with augmented data: Gan-based flame-burning image synthesis for fire segmentation in warehouse
Lu et al. Detection of deepfake videos using long-distance attention
Barni et al. Iris deidentification with high visual realism for privacy protection on websites and social networks
CN112101320A (zh) 模型训练方法、图像生成方法、装置、设备及存储介质
Cai et al. Fcsr-gan: End-to-end learning for joint face completion and super-resolution
Arora et al. A review of techniques to detect the GAN-generated fake images
Zeinstra et al. ForenFace: a unique annotated forensic facial image dataset and toolset
Xu et al. RelightGAN: Instance-level generative adversarial network for face illumination transfer
Jiang et al. DeepFakes detection: the DeeperForensics dataset and challenge
CN112488165A (zh) 一种基于深度学习模型的红外行人识别方法及系统
Shiri et al. Recovering faces from portraits with auxiliary facial attributes
Marnissi et al. GAN-based Vision Transformer for High-Quality Thermal Image Enhancement
CN112233054B (zh) 基于关系三元组的人-物交互图像生成方法
He et al. FA-GANs: Facial attractiveness enhancement with generative adversarial networks on frontal faces
Annadani et al. Augment and adapt: A simple approach to image tampering detection
CN111275778B (zh) 人脸简笔画生成方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922364

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21922364

Country of ref document: EP

Kind code of ref document: A1