WO2023160312A1 - Pedestrian re-identification method, apparatus, device, and storage medium based on self-supervised learning

Info

Publication number
WO2023160312A1
Authority
WO
WIPO (PCT)
Prior art keywords
loss function
data set
training data
pedestrian
neural network
Prior art date
Application number
PCT/CN2023/072914
Other languages
English (en)
French (fr)
Inventor
吴鸿伟
林修明
梁煜麓
沈代明
林淑强
朱海勇
Original Assignee
厦门市美亚柏科信息股份有限公司
Application filed by 厦门市美亚柏科信息股份有限公司
Publication of WO2023160312A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Definitions

  • the present invention relates to the technical field of image processing, in particular to a pedestrian re-identification method, device, equipment and storage medium based on self-supervised learning.
  • Person re-identification is a technology that uses computer vision technology to judge whether there is a specific target in an image or video sequence.
  • person re-identification technology mainly includes representation learning, metric learning, methods based on local features or video sequences. It is very difficult to collect and mark the training data set of the recognition method, and the existing data set can only reach the level of tens of thousands. On the one hand, the small number of training data sets prevents the accuracy of person re-identification methods from being further improved. On the other hand, even combining a small number of training data sets may not necessarily form a positive contribution in training, which hinders the further practical application of person re-identification technology.
  • the purpose of one or more embodiments of the present invention is to propose a self-supervised learning-based pedestrian re-identification method, apparatus, device, and storage medium that solve at least one of the above-mentioned problems.
  • a method for pedestrian re-identification based on self-supervised learning including:
  • the training of the trained pedestrian re-identification model includes:
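  • the second neural network, identical to the first neural network, is trained on a second training data set to obtain a second feature vector of the second training data set, wherein the second training data set is obtained by applying enhancement processing to the first training data set;
  • $C_{i,j}$ denotes the cross-correlation coefficient between the $i$-th sample of the first training data set and the $j$-th sample of the second training data set; $i = j$ indicates that the $j$-th sample is the data-augmented version of the $i$-th sample, and $\lambda$ is the weight factor.
  • the cross-correlation coefficients are defined by a formula (not reproduced in this text) in which $m$ is the number of samples of the first training data set, $T^A_{m,i}$ represents the first output feature vector corresponding to the $i$-th sample, and $T^B_{m,j}$ represents the second output feature vector corresponding to the $j$-th sample.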
  • the second loss function includes: a hard sample sampling triplet loss function and a classification loss function of the first neural network.
  • the overall loss of the first neural network includes the sum of the first loss function and the second loss function, specifically including:
  • $L_{\text{total}} = \alpha \cdot L_{\text{self-sup}} + \beta \cdot L_{\text{triHard}} + \gamma \cdot L_{\text{softmax}}$, where $\alpha$, $\beta$, and $\gamma$ are given parameters, $L_{\text{self-sup}}$ is the first loss function, $L_{\text{triHard}}$ is the hard sample sampling triplet loss function of the first neural network, and $L_{\text{softmax}}$ is the classification loss function of the first neural network.
  • the hard sample sampling triplet loss function includes:
  • a training batch contains P × K pictures
  • the hard sample sampling triplet loss function includes:
  • α is a manually set threshold parameter. The Euclidean distance in feature space between picture a and every picture in the training batch is computed, and the positive sample p farthest from a and the negative sample n closest to a are then selected to compute the triplet loss.
  • the enhancement processing includes: spatial domain enhancement and/or frequency domain enhancement.
  • a self-supervised learning-based pedestrian re-identification device including:
  • An acquisition module configured to acquire image data to be identified
  • a recognition module configured to perform pedestrian re-identification on the image data to be recognized based on the trained pedestrian re-identification model, to obtain a pedestrian re-identification result
  • the training of the trained pedestrian re-identification model includes:
  • the second neural network, identical to the first neural network, is trained on a second training data set to obtain a second feature vector of the second training data set, wherein the second training data set is obtained by applying enhancement processing to the first training data set;
  • an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of the first aspect when executing the program.
  • a non-transitory computer-readable storage medium, characterized in that it stores computer instructions used to cause a computer to execute the method described in the first aspect.
  • the self-supervised learning-based pedestrian re-identification method, apparatus, device, and storage medium use an identical neural network as the self-supervised training branch, trained on the data-augmented training data set, so that the pedestrian re-identification model can learn the inherent prior regularities of the image itself, thereby improving the accuracy of pedestrian re-identification and easing the difficulty of collecting training data.
  • in the actual deployment environment, there is no need to deploy the network of the self-supervised training branch, so it adds no model complexity to the pedestrian re-identification network.
  • FIG. 1 is a schematic flow chart of a method for re-identifying pedestrians based on self-supervised learning according to an embodiment of the present invention
  • Fig. 2 is a schematic example of a pedestrian re-identification method based on self-supervised learning according to an embodiment of the present invention
  • FIG. 3 is a schematic block diagram of a pedestrian re-identification device based on self-supervised learning according to an embodiment of the present invention
  • Fig. 4 is a schematic block diagram of an electronic device according to an embodiment of the present invention.
  • the present invention proposes training of person re-identification based on self-supervised learning to improve the recognition accuracy of pedestrian re-identification.
  • FIG. 1 shows a schematic flowchart of a method for pedestrian re-identification based on self-supervised learning according to an embodiment of the present invention.
  • the person re-identification method based on self-supervised learning includes:
  • Step S110 acquiring image data to be identified
  • Step S120 perform pedestrian re-identification on the image data to be identified based on the trained pedestrian re-identification model, and obtain a pedestrian re-identification result;
  • the training of the trained pedestrian re-identification model includes:
  • the second neural network, identical to the first neural network, is trained on a second training data set to obtain a second feature vector of the second training data set, wherein the second training data set is obtained by applying enhancement processing to the first training data set;
  • features are extracted from the training data set and from its data-augmented copy using the same neural network to obtain the first feature vector and the second feature vector, and the first loss function between the first feature vector and the second feature vector is computed
  • the first loss function is combined with the second loss function of the model's own training to compute the overall loss function of the training process, and the pedestrian re-identification model is obtained by training with the goal of minimizing the overall loss function.
  • This embodiment uses an identical neural network as the self-supervised training branch, trained on the data-augmented training data set, so that the pedestrian re-identification model can learn the inherent prior regularities of the image itself, thereby improving the accuracy of pedestrian re-identification and easing the difficulty of collecting training data.
  • In the actual deployment environment, there is no need to deploy the network of the self-supervised training branch, so it adds no model complexity to the pedestrian re-identification network. The approach can be widely used in scenarios that require pedestrian re-identification.
  • step S110 image data to be recognized is acquired.
  • the image data to be recognized may be real-time data directly collected by an image collection device, or image data obtained from a local data source or a remote data source.
  • the image data to be recognized may include video data and images.
  • the image data to be recognized may be one frame of images or multiple frames of images in video data.
  • video data may be divided into frames to obtain image data.
  • the image data to be recognized may also be a continuous or non-continuous image sequence.
  • step S120 perform pedestrian re-identification on the image data to be recognized based on the trained pedestrian re-identification model, and obtain a pedestrian re-identification result.
  • the image data to be recognized may be input into a trained pedestrian re-identification model, and the trained pedestrian re-identification model may output the pedestrian re-identification result after performing corresponding processing on the image data to be recognized.
  • the pedestrian re-identification result may include identity information of the target object, such as an ID number or a name.
  • the training of the trained pedestrian re-identification model includes:
  • the enhancement processing may include: spatial domain enhancement.
  • the spatial domain enhancement may include at least one of the following: grayscale change, histogram correction, image smoothing, and image sharpening.
  • the grayscale change may include at least one of the following: linear change, piecewise linear change, or nonlinear change (eg, logarithmic transformation, exponential transformation, etc.).
  • image smoothing may include at least one of the following: mean filtering, median filtering, over-limit pixel smoothing, gray-scale K-nearest neighbor averaging, maximum uniformity smoothing, or selective edge-preserving smoothing.
  • image sharpening may include at least one of the following: the gradient sharpening method, the Laplacian method, and the high-pass filtering method.
  • the enhancement processing may include: frequency domain enhancement.
  • the frequency domain enhancement may include at least one of the following: high-pass filtering, low-pass filtering, homomorphic filtering enhancement, color enhancement (e.g., false color enhancement or pseudo-color enhancement).
  • $C_{i,j}$ denotes the cross-correlation coefficient between the $i$-th sample of the first training data set and the $j$-th sample of the second training data set; $i = j$ indicates that the $j$-th sample is the data-augmented version of the $i$-th sample, and $\lambda$ is the weight factor.
  • the cross-correlation coefficients may be defined by a formula (not reproduced in this text) in which $m$ is the number of samples of the first training data set, $T^A_{m,i}$ represents the first output feature vector corresponding to the $i$-th sample, and $T^B_{m,j}$ represents the second output feature vector corresponding to the $j$-th sample.
  • the second loss function includes: a hard sample sampling triplet loss function and a classification loss function of the first neural network.
  • the hard sample sampling triplet loss function may include: for each training batch in the first training data set, randomly select P target pedestrian IDs and, for each target pedestrian, randomly select K different pictures, so that a training batch contains P × K pictures. Then, for each picture a in the training batch, select one hardest positive sample and one hardest negative sample to form a triplet with a.
  • the picture set with the same ID as a can be defined as A, and the picture set formed by the remaining pictures with different IDs as B; the TriHard loss is then given by a formula (not reproduced in this text) in which:
  • α is a manually set threshold parameter. The Euclidean distance in feature space between picture a and every picture in the training batch is computed, and the positive sample p farthest from (least similar to) a and the negative sample n closest to (most similar to) a are then selected to compute the triplet loss.
  • the classification loss function may include a SoftMax loss function L softmax , and those skilled in the art know the calculation method of this function, which will not be repeated here.
  • the overall loss of the first neural network may include the sum of the first loss function and the second loss function, specifically including:
  • $L_{\text{total}} = \alpha \cdot L_{\text{self-sup}} + \beta \cdot L_{\text{triHard}} + \gamma \cdot L_{\text{softmax}}$, where $\alpha$, $\beta$, and $\gamma$ are given parameters.
  • FIG. 2 shows a schematic example of a method for pedestrian re-identification based on self-supervised learning according to an embodiment of the present invention.
  • the second neural network 210, which serves as the self-supervised branch, is identical to the backbone network of the recognition part, that is, the first neural network 220, except that its input data undergoes random data augmentation.
  • the goal of the self-supervised branch concerns two views a and b of the same input batch of the training data set X (corresponding to $X_A$ and $X_B$, where $X_A$ is the original data and $X_B$ is the data obtained by augmenting $X_A$).
  • the diagonal elements of the cross-correlation matrix C between the feature vectors $T_A$ and $T_B$ extracted by the same network structure should be as close to 1 as possible, and the other elements as close to 0 as possible; the first loss function is computed on this basis.
  • the first neural network 220 is denoted f
  • its model parameters are denoted θ.
  • the method in one or more embodiments of the present invention may be executed by a single device, for example a computer or a server.
  • the method of this embodiment can also be applied in a distributed scenario, and is completed by cooperation of multiple devices.
  • one of the multiple devices may perform only one or more steps of the method of one or more embodiments of the present invention, and the multiple devices interact with one another to complete the described method.
  • one or more embodiments of the present invention also provide a pedestrian re-identification device based on self-supervised learning, corresponding to any of the methods in the above-mentioned embodiments.
  • the described pedestrian re-identification device based on self-supervised learning includes:
  • An acquisition module configured to acquire image data to be identified
  • a recognition module configured to perform pedestrian re-identification on the image data to be recognized based on the trained pedestrian re-identification model, to obtain a pedestrian re-identification result
  • the training of the trained pedestrian re-identification model includes:
  • the second neural network, identical to the first neural network, is trained on a second training data set to obtain a second feature vector of the second training data set, wherein the second training data set is obtained by applying enhancement processing to the first training data set;
  • the device of the above-mentioned embodiment is used to implement the corresponding self-supervised learning-based pedestrian re-identification method in any of the above-mentioned embodiments, and has the beneficial effect of the corresponding method embodiment, which will not be repeated here.
  • one or more embodiments of the present invention also provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the self-supervised learning-based pedestrian re-identification method described in any one of the above embodiments.
  • FIG. 4 shows a schematic diagram of a more specific hardware structure of an electronic device provided by this embodiment.
  • the device may include: a processor 410 , a memory 420 , an input/output interface 430 , a communication interface 440 and a bus 450 .
  • the processor 410 , the memory 420 , the input/output interface 430 and the communication interface 440 are connected to each other within the device through the bus 450 .
  • the processor 410 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to realize the technical solutions provided by the embodiments of the present invention.
  • the memory 420 can be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like.
  • the memory 420 can store an operating system and other application programs. When implementing the technical solutions provided by the embodiments of the present invention through software or firmware, the relevant program codes are stored in the memory 420 and invoked by the processor 410 for execution.
  • the input/output interface 430 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, mouse, touch screen, microphone, various sensors, etc.
  • the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 440 is used to connect a communication module (not shown in the figure), so as to realize communication interaction between the device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.), and can also realize communication through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 450 includes a path for transferring information between the various components of the device (eg, processor 410, memory 420, input/output interface 430, and communication interface 440).
  • although the above device shows only the processor 410, the memory 420, the input/output interface 430, the communication interface 440, and the bus 450, in a specific implementation the device may also include other components necessary for normal operation.
  • the above device may also include only the components necessary to realize the solutions of the embodiments of the present invention, and need not include all the components shown in the figure.
  • the electronic device in the above-mentioned embodiments is used to implement the corresponding self-supervised learning-based pedestrian re-identification method in any of the above-mentioned embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.
  • one or more embodiments of the present invention also provide a non-transitory computer-readable storage medium that stores computer instructions; the computer instructions are used to make a computer execute the self-supervised learning-based pedestrian re-identification method as described in any one of the above embodiments.
  • the computer-readable medium in this embodiment includes permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • the computer instructions stored in the storage medium of the above-mentioned embodiments are used to make a computer execute the self-supervised learning-based pedestrian re-identification method described in any of the above-mentioned embodiments, and have the beneficial effects of the corresponding method embodiments, which are not repeated here.
  • well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided figures.
  • devices may be shown in block diagram form in order to avoid obscuring one or more embodiments of the invention, and this also takes into account the fact that details regarding the implementation of these block diagram devices are highly dependent on the platform on which one or more embodiments of the invention are to be implemented (i.e., these details should be well within the purview of those skilled in the art). Where specific details (e.g., circuits) have been set forth to describe example embodiments of the invention, it will be apparent to those skilled in the art that one or more embodiments of the invention may be practiced without these specific details or with variations of them.
  • Accordingly, these descriptions should be regarded as illustrative rather than restrictive.

Abstract

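The present disclosure provides a pedestrian re-identification method, apparatus, device, and storage medium based on self-supervised learning. The method includes: extracting features from a training data set and from a data-augmented copy of that training data set using the same neural network, to obtain a first feature vector and a second feature vector; computing a first loss function between the first feature vector and the second feature vector; combining it with the second loss function of the model's own training to compute the overall loss function of the training process; and training with the objective of minimizing the overall loss function to obtain a pedestrian re-identification model. According to the invention, the pedestrian re-identification model can learn the prior regularities inherent in the images themselves, thereby improving the accuracy of pedestrian re-identification.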

Description

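Pedestrian re-identification method, apparatus, device, and storage medium based on self-supervised learning
Technical Field
The present invention relates to the technical field of image processing, and in particular to a pedestrian re-identification method, apparatus, device, and storage medium based on self-supervised learning.
Background
Pedestrian re-identification (person re-identification) is a technique that uses computer vision to determine whether a specific target is present in an image or video sequence. Current approaches mainly include representation learning, metric learning, and methods based on local features or video sequences. Unlike face recognition, for which pictures of celebrities can be crawled directly from the Internet, the particular nature of the pedestrian re-identification task makes its training data sets hard to collect and label, and existing data sets often reach only the order of tens of thousands. On the one hand, the small size of the training data sets prevents the accuracy of pedestrian re-identification methods from improving further. On the other hand, even merging several small training data sets does not necessarily contribute positively to training. Both factors hinder pedestrian re-identification from becoming practical.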
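Summary of the Invention
This PCT application claims priority to Chinese prior application No. CN202210168277.X, filed on February 23, 2022, the entire contents of which are incorporated herein by reference.
In view of this, one or more embodiments of the present invention aim to provide a pedestrian re-identification method, apparatus, device, and storage medium based on self-supervised learning, so as to solve at least one of the above problems.
To this end, according to a first aspect of the present invention, a pedestrian re-identification method based on self-supervised learning is provided, including:
acquiring image data to be identified;
performing pedestrian re-identification on the image data to be identified based on a trained pedestrian re-identification model, to obtain a pedestrian re-identification result;
wherein training the trained pedestrian re-identification model includes:
training a first neural network based on a first training data set to obtain a first feature vector of the first training data set;
training a second neural network, identical to the first neural network, based on a second training data set to obtain a second feature vector of the second training data set, wherein the second training data set is obtained by applying enhancement processing to the first training data set;
computing a first loss function based on the first feature vector and the second feature vector;
adjusting the model parameters of the first neural network so that the sum of the first loss function and a second loss function of the first neural network is minimized, to obtain the trained pedestrian re-identification model.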
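Optionally, computing the first loss function based on the first feature vector and the second feature vector uses a formula (not reproduced in this text) in which $C_{i,j}$ denotes the cross-correlation coefficient between the $i$-th sample from the first training data set and the $j$-th sample from the second training data set; $i = j$ indicates that the $j$-th sample is the data-augmented version of the $i$-th sample, and $\lambda$ is a weight factor.
Optionally, the cross-correlation coefficient is given by a formula (not reproduced in this text) in which $m$ is the number of samples in the first training data set, $T^A_{m,i}$ denotes the first output feature vector corresponding to the $i$-th sample, and $T^B_{m,j}$ denotes the second output feature vector corresponding to the $j$-th sample.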
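Optionally, the second loss function includes a hard sample sampling triplet loss function and a classification loss function of the first neural network.
Optionally, the overall loss of the first neural network includes the sum of the first loss function and the second loss function, specifically:
$L_{\text{total}} = \alpha \cdot L_{\text{self-sup}} + \beta \cdot L_{\text{triHard}} + \gamma \cdot L_{\text{softmax}}$, where $\alpha$, $\beta$, and $\gamma$ are given parameters, $L_{\text{self-sup}}$ is the first loss function, $L_{\text{triHard}}$ is the hard sample sampling triplet loss function of the first neural network, and $L_{\text{softmax}}$ is the classification loss function of the first neural network.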
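Optionally, the hard sample sampling triplet loss function includes:
for each training batch in the first training data set, randomly selecting P target pedestrian IDs and, for each target pedestrian, randomly selecting K different pictures, so that one training batch contains P × K pictures;
for each picture a in the training batch, selecting one hardest positive sample and one hardest negative sample to form a triplet with a;
defining the set of pictures with the same ID as a as A and the set formed by the remaining pictures with different IDs as B, the hard sample sampling triplet loss function is given by a formula (not reproduced in this text)
in which α is a manually set threshold parameter: the Euclidean distance in feature space between picture a and every picture in the training batch is computed, and the positive sample p farthest from a and the negative sample n closest to a are then selected to compute the triplet loss.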
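The batch construction described above (P identities, K pictures each) can be sketched as follows. This is a minimal, dependency-free Python illustration, not the applicant's implementation: the function name `sample_pk_batch`, the `labels` list, and the choice to skip identities that have fewer than K pictures are all assumptions made for the example.

```python
import random
from collections import defaultdict

def sample_pk_batch(labels, P, K, rng=None):
    """Pick P identities and K distinct image indices per identity.

    labels[i] is the identity (ID) of image i; returns P*K image indices.
    """
    rng = rng or random.Random()
    by_id = defaultdict(list)
    for index, pid in enumerate(labels):
        by_id[pid].append(index)
    # Assumption: only identities with at least K pictures are eligible.
    eligible = [pid for pid, idxs in by_id.items() if len(idxs) >= K]
    if len(eligible) < P:
        raise ValueError("not enough identities with K pictures each")
    batch = []
    for pid in rng.sample(eligible, P):
        batch.extend(rng.sample(by_id[pid], K))
    return batch
```

Calling `sample_pk_batch(labels, P=2, K=3)` on a label list returns six indices covering exactly two identities, three pictures each.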
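Optionally, the enhancement processing includes spatial-domain enhancement and/or frequency-domain enhancement.
According to a second aspect of the present invention, a pedestrian re-identification apparatus based on self-supervised learning is provided, including:
an acquisition module configured to acquire image data to be identified;
a recognition module configured to perform pedestrian re-identification on the image data to be identified based on a trained pedestrian re-identification model, to obtain a pedestrian re-identification result;
wherein training the trained pedestrian re-identification model includes:
training a first neural network based on a first training data set to obtain a first feature vector of the first training data set;
training a second neural network, identical to the first neural network, based on a second training data set to obtain a second feature vector of the second training data set, wherein the second training data set is obtained by applying enhancement processing to the first training data set;
computing a first loss function based on the first feature vector and the second feature vector;
adjusting the model parameters of the first neural network so that the sum of the first loss function and a second loss function of the first neural network is minimized, to obtain the trained pedestrian re-identification model.
According to a third aspect of the present invention, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to the first aspect.
According to a fourth aspect of the present invention, a non-transitory computer-readable storage medium is provided, characterized in that the non-transitory computer-readable storage medium stores computer instructions for causing a computer to execute the method according to the first aspect.
As can be seen from the above, the pedestrian re-identification method, apparatus, device, and storage medium based on self-supervised learning provided by one or more embodiments of the present invention use an identical neural network as a self-supervised training branch that is trained on the data-augmented training data set, so that the pedestrian re-identification model can learn the prior regularities inherent in the images themselves. This improves the accuracy of pedestrian re-identification and also addresses the difficulty of collecting training data. In the actual deployment environment, the network of the self-supervised training branch does not need to be deployed, so it adds no model complexity to the pedestrian re-identification network.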
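Brief Description of the Drawings
To explain more clearly the technical solutions of one or more embodiments of the present invention or of the prior art, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are merely one or more embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a pedestrian re-identification method based on self-supervised learning according to an embodiment of the present invention;
FIG. 2 is a schematic example of a pedestrian re-identification method based on self-supervised learning according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a pedestrian re-identification apparatus based on self-supervised learning according to an embodiment of the present invention;
FIG. 4 is a schematic block diagram of an electronic device according to an embodiment of the present invention.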
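Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that, unless otherwise defined, technical or scientific terms used in one or more embodiments of the present invention have the ordinary meaning understood by a person of ordinary skill in the field to which the invention belongs. "First", "second", and similar words used in one or more embodiments of the present invention do not denote any order, quantity, or importance and serve only to distinguish different components. "Including", "comprising", and similar words mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. "Connected", "coupled", and similar words are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like indicate only relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
Because practical applications involve adverse conditions and interference such as missing frontal faces, pose changes, occlusion, shooting angle, environmental changes, and lighting differences, training data sets for the pedestrian re-identification task are hard to collect and often reach only the order of tens of thousands. The limited training data constrains the training of pedestrian re-identification models and prevents further gains in model accuracy, which in turn hinders pedestrian re-identification from becoming practical. How to overcome the difficulty of data collection and improve the training accuracy of pedestrian re-identification has therefore become a pressing problem.
Images themselves carry inherent prior regularities, and exploiting them effectively lets a computer complete seemingly impossible tasks, such as automatic image colorization using the intrinsic association between object category and color distribution, or image inpainting using the association between object category and shape and texture.
To mitigate the shortage of training samples caused by the difficulty of data collection, the present invention proposes training pedestrian re-identification based on self-supervised learning, improving the recognition accuracy of pedestrian re-identification.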
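Based on the above considerations, an embodiment of the present invention provides a pedestrian re-identification method based on self-supervised learning. FIG. 1 shows a schematic flowchart of the method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step S110: acquire image data to be identified;
Step S120: perform pedestrian re-identification on the image data to be identified based on a trained pedestrian re-identification model, to obtain a pedestrian re-identification result;
wherein training the trained pedestrian re-identification model includes:
training a first neural network based on a first training data set to obtain a first feature vector of the first training data set;
training a second neural network, identical to the first neural network, based on a second training data set to obtain a second feature vector of the second training data set, wherein the second training data set is obtained by applying enhancement processing to the first training data set;
computing a first loss function based on the first feature vector and the second feature vector;
adjusting the model parameters of the first neural network so that the sum of the first loss function and a second loss function of the first neural network is minimized, to obtain the trained pedestrian re-identification model.
Here, features are extracted from the training data set and from its data-augmented copy using the same neural network to obtain the first feature vector and the second feature vector; the first loss function between the two feature vectors is computed and combined with the second loss function of the model's own training to compute the overall loss function of the training process; and the pedestrian re-identification model is obtained by training with the objective of minimizing the overall loss function.
This embodiment uses an identical neural network as the self-supervised training branch, trained on the data-augmented training data set, so that the pedestrian re-identification model can learn the prior regularities inherent in the images themselves. This improves the accuracy of pedestrian re-identification and also addresses the difficulty of collecting training data. In the actual deployment environment, the network of the self-supervised training branch does not need to be deployed, so it adds no model complexity to the pedestrian re-identification network. The approach can be widely used in scenarios that require pedestrian re-identification.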
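Optionally, in step S110, image data to be identified is acquired.
The image data to be identified may be real-time data captured directly by an image acquisition device, or image data obtained from a local or remote data source.
In some embodiments, the image data to be identified may include video data and images. In some embodiments, the image data to be identified may be one or more frames of video data. In some embodiments, video data may be split into frames to obtain image data. In some embodiments, the image data to be identified may also be a continuous or non-continuous image sequence.
Optionally, in step S120, pedestrian re-identification is performed on the image data to be identified based on the trained pedestrian re-identification model, to obtain a pedestrian re-identification result.
Specifically, the image data to be identified may be input into the trained pedestrian re-identification model, which processes it accordingly and outputs the pedestrian re-identification result.
In some embodiments, the pedestrian re-identification result may include identity information of the target object, for example an ID number or a name.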
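In some embodiments, training the trained pedestrian re-identification model includes:
training a first neural network based on a first training data set to obtain a first feature vector of the first training data set;
training a second neural network, identical to the first neural network, based on a second training data set to obtain a second feature vector of the second training data set, wherein the second training data set is obtained by applying enhancement processing to the first training data set;
computing a first loss function based on the first feature vector and the second feature vector;
adjusting the model parameters of the first neural network so that the sum of the first loss function and a second loss function of the first neural network is minimized, to obtain the trained pedestrian re-identification model.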
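In some embodiments, the enhancement processing may include spatial-domain enhancement. In some embodiments, spatial-domain enhancement may include at least one of: grayscale transformation, histogram correction, image smoothing, and image sharpening. In some embodiments, grayscale transformation may include at least one of: linear transformation, piecewise-linear transformation, or nonlinear transformation (for example, logarithmic or exponential transformation). In some embodiments, image smoothing may include at least one of: mean filtering, median filtering, over-limit pixel smoothing, gray-level K-nearest-neighbor averaging, maximum-homogeneity smoothing, or selective edge-preserving smoothing. In some embodiments, image sharpening may include at least one of: gradient sharpening, the Laplacian method, and high-pass filtering.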
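As an illustration of three of the spatial-domain operations listed above (linear grayscale change, logarithmic grayscale change, and mean filtering), the following dependency-free Python sketch operates on a grayscale image stored as a list of rows with values in [0, 255]. The function names, the clipping range, and the border handling are assumptions chosen for the example; the source does not specify them.

```python
import math

def linear_gray(img, gain, bias):
    """Linear grayscale change: out = gain * in + bias, clipped to [0, 255]."""
    return [[min(255.0, max(0.0, gain * v + bias)) for v in row] for row in img]

def log_gray(img):
    """Nonlinear (logarithmic) change that maps [0, 255] onto [0, 255]."""
    scale = 255.0 / math.log(256.0)
    return [[scale * math.log(1.0 + v) for v in row] for row in img]

def mean_filter3(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out
```

Any of these functions could serve as the enhancement step that turns a picture of the first training data set into its counterpart in the second training data set.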
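In some embodiments, the enhancement processing may include frequency-domain enhancement. In some embodiments, frequency-domain enhancement may include at least one of: high-pass filtering, low-pass filtering, homomorphic filtering, and color enhancement (for example, false-color or pseudo-color enhancement).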
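To make the idea of frequency-domain low-pass filtering concrete, here is a small Python sketch that filters a single image row with a naive discrete Fourier transform. It is an assumption-laden illustration: a real pipeline would normally apply a 2-D transform to the whole image with an optimized FFT, and the names `dft`, `idft`, `low_pass`, and the `cutoff` parameter are hypothetical.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def low_pass(signal, cutoff):
    """Zero every frequency bin farther than `cutoff` bins from DC."""
    n = len(signal)
    spectrum = dft(signal)
    kept = [c if min(k, n - k) <= cutoff else 0.0 for k, c in enumerate(spectrum)]
    return idft(kept)
```

For a row that alternates between 1 and 0, `low_pass(row, 1)` removes the alternating component and leaves only the mean brightness; a high-pass filter would keep the complementary set of bins.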
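In some embodiments, computing the first loss function based on the first feature vector and the second feature vector uses a formula (not reproduced in this text) in which $C_{i,j}$ denotes the cross-correlation coefficient between the $i$-th sample from the first training data set and the $j$-th sample from the second training data set; $i = j$ indicates that the $j$-th sample is the data-augmented version of the $i$-th sample, and $\lambda$ is a weight factor.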
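The exact formula for the first loss function is not available in this text. One loss with the stated ingredients (coefficients $C_{i,j}$, a weight factor $\lambda$, diagonal entries driven toward 1 and off-diagonal entries toward 0) is the Barlow Twins objective, $\sum_i (1 - C_{i,i})^2 + \lambda \sum_i \sum_{j \ne i} C_{i,j}^2$. The sketch below implements that form as an assumption; whether the patent uses exactly these squared terms cannot be confirmed from the source, and the name `self_sup_loss` is hypothetical.

```python
def self_sup_loss(C, lam):
    """Barlow-Twins-style loss on a square cross-correlation matrix C.

    Assumed form: diagonal entries are pulled toward 1, off-diagonal
    entries are pushed toward 0 and weighted by lam.
    """
    n = len(C)
    on_diag = sum((1.0 - C[i][i]) ** 2 for i in range(n))
    off_diag = sum(C[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
    return on_diag + lam * off_diag
```

Under this assumed form the loss is zero exactly when C is the identity matrix, which matches the stated goal for the self-supervised branch.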
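In some embodiments, the cross-correlation coefficient may be given by a formula (not reproduced in this text) in which $m$ is the number of samples in the first training data set, $T^A_{m,i}$ denotes the first output feature vector corresponding to the $i$-th sample, and $T^B_{m,j}$ denotes the second output feature vector corresponding to the $j$-th sample.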
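Since the formula itself is missing here, the following Python sketch shows only one plausible reading: $C_{i,j}$ as the normalized dot product between the feature vector of the $i$-th original sample and that of the $j$-th augmented sample. This is an assumption. The Barlow Twins formulation, for comparison, correlates feature dimensions across the batch rather than samples; the source text is ambiguous on this point, and `cross_correlation` is a hypothetical name.

```python
import math

def cross_correlation(TA, TB):
    """C[i][j] = normalized dot product of TA[i] (original) and TB[j] (augmented).

    TA and TB are lists of feature vectors of equal length and dimension.
    """
    def norm(v):
        # Fall back to 1.0 for an all-zero vector to avoid dividing by zero.
        return math.sqrt(sum(x * x for x in v)) or 1.0
    return [[sum(a * b for a, b in zip(ta, tb)) / (norm(ta) * norm(tb))
             for tb in TB] for ta in TA]
```

With this definition every coefficient lies in [-1, 1], and a diagonal entry equals 1 when the augmented sample's feature vector points in the same direction as the original's.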
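In some embodiments, the second loss function includes a hard sample sampling triplet loss function and a classification loss function of the first neural network.
Computing the hard sample sampling triplet loss (triplet loss with batch hard mining, TriHard loss) may include: for each training batch in the first training data set, randomly select P target pedestrian IDs and, for each target pedestrian, randomly select K different pictures, so that one training batch contains P × K pictures. Then, for each picture a in the training batch, select one hardest positive sample and one hardest negative sample to form a triplet with a. Let A be the set of pictures with the same ID as a and B the set formed by the remaining pictures with different IDs; the TriHard loss is then given by a formula (not reproduced in this text)
in which α is a manually set threshold parameter. The Euclidean distance in feature space between picture a and every picture in the training batch is computed, and the positive sample p farthest from (least similar to) a and the negative sample n closest to (most similar to) a are then selected to compute the triplet loss.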
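The description above matches the batch-hard triplet loss as it is commonly written in the re-identification literature, $\frac{1}{P \times K} \sum_{a} \max\bigl(0,\ \max_{p \in A} d_{a,p} - \min_{n \in B} d_{a,n} + \alpha\bigr)$. The Python sketch below implements that common form; treating it as the patent's exact formula is an assumption, and the names `trihard_loss` and `margin` (the threshold α) are chosen for the example. It assumes every picture has at least one other picture with the same ID and at least one with a different ID in the batch, which holds when P ≥ 2 and K ≥ 2.

```python
import math

def trihard_loss(features, labels, margin):
    """Batch-hard triplet loss averaged over all anchors in the batch."""
    def dist(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    total = 0.0
    for a, (fa, la) in enumerate(zip(features, labels)):
        # Distances to same-ID pictures (set A, excluding the anchor itself).
        pos = [dist(fa, fp) for p, (fp, lp) in enumerate(zip(features, labels))
               if lp == la and p != a]
        # Distances to different-ID pictures (set B).
        neg = [dist(fa, fn) for fn, ln in zip(features, labels) if ln != la]
        hardest_pos = max(pos)  # farthest positive
        hardest_neg = min(neg)  # closest negative
        total += max(0.0, hardest_pos - hardest_neg + margin)
    return total / len(features)
```

The loss reaches zero once every anchor's farthest positive is closer than its nearest negative by at least the margin.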
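The classification loss function may include the SoftMax loss function $L_{\text{softmax}}$; those skilled in the art know how to compute it, so it is not described further here.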
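For readers who want the computation spelled out, a SoftMax (cross-entropy) loss for a single sample can be sketched as below. The source does not give its own formula, so this is the standard textbook definition; `logits` (one score per identity class) and `target` (index of the true identity) are names assumed for the example.

```python
import math

def softmax_cross_entropy(logits, target):
    """Negative log-probability of the true class under a softmax."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]
```

In training, this value would typically be averaged over the pictures of a batch.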
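In some embodiments, the overall loss of the first neural network may include the sum of the first loss function and the second loss function, specifically:
$L_{\text{total}} = \alpha \cdot L_{\text{self-sup}} + \beta \cdot L_{\text{triHard}} + \gamma \cdot L_{\text{softmax}}$, where $\alpha$, $\beta$, and $\gamma$ are given parameters.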
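The weighted sum is simple, but a worked example helps fix the roles of the three parameters. Note that the source uses the symbol α both for this weight and for the triplet-loss threshold; the two are independent quantities. In the sketch below the default weights of 1.0 are an assumption, since the source only calls them "given parameters".

```python
def total_loss(l_self_sup, l_trihard, l_softmax, alpha=1.0, beta=1.0, gamma=1.0):
    """Overall training loss: weighted sum of the three component losses."""
    return alpha * l_self_sup + beta * l_trihard + gamma * l_softmax
```

For instance, component losses of 0.05, 0.5, and 1.2 with weights 0.5, 1, and 2 combine as 0.025 + 0.5 + 2.4 = 2.925.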
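In some embodiments, FIG. 2 shows a schematic example of the pedestrian re-identification method based on self-supervised learning according to an embodiment of the present invention. In FIG. 2, the second neural network 210, serving as the self-supervised branch, is identical to the backbone network of the recognition part, namely the first neural network 220, except that its input data undergoes random data augmentation. The goal of the self-supervised branch is that, for two views a and b of the same input batch of the training data set X (corresponding to $X_A$ and $X_B$, where $X_A$ is the original data and $X_B$ is the data obtained by augmenting $X_A$), the cross-correlation matrix C between the feature vectors $T_A$ and $T_B$ extracted by the same network structure has diagonal elements as close to 1 as possible and all other elements as close to 0 as possible; the first loss function is computed on this basis. Let the first neural network 220 be f with model parameters θ. For the inputs $X_A$ and $X_B$, the two views of the training data set X, the output features are $T_A = f_\theta(X_A)$ and $T_B = f_\theta(X_B)$, from which the cross-correlation matrix C and then the first loss function $L_{\text{self-sup}}$ are obtained. The overall loss is then obtained by combining this with the hard sample sampling triplet loss $L_{\text{triHard}}$ and the classification loss $L_{\text{softmax}}$ of the recognition part. Training with minimization of the overall loss as the training objective of the student model yields the trained pedestrian re-identification model.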
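The structural point of FIG. 2, that both branches evaluate the same function $f_\theta$ and only the recognition path is needed after training, can be sketched with a toy model. Everything here is illustrative: the one-layer linear `f`, the `augment` callback, and the function names are assumptions, and a real system would use a deep backbone with gradient-based training.

```python
def f(theta, x):
    """Toy 'network': a single linear layer; theta is a list of weight rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in theta]

def forward_two_views(theta, batch, augment):
    """Apply the same parameters to the original and the augmented batch."""
    TA = [f(theta, x) for x in batch]           # recognition branch, input X_A
    TB = [f(theta, augment(x)) for x in batch]  # self-supervised branch, input X_B
    return TA, TB

def embed(theta, image):
    """Deployment path: only f_theta runs; no augmentation branch is needed."""
    return f(theta, image)
```

Because `forward_two_views` reuses `theta` for both views, dropping the second branch at deployment removes computation without removing any parameters.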
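It should be noted that the method of one or more embodiments of the present invention may be executed by a single device, such as a computer or a server. The method of this embodiment may also be applied in a distributed scenario and completed by multiple devices cooperating with one another. In such a distributed scenario, one of the devices may execute only one or more steps of the method of one or more embodiments of the present invention, and the devices interact with one another to complete the method.
It should be noted that specific embodiments of the present invention have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.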
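Based on the same inventive concept and corresponding to the method of any of the above embodiments, one or more embodiments of the present invention further provide a pedestrian re-identification apparatus based on self-supervised learning.
Referring to FIG. 3, the pedestrian re-identification apparatus based on self-supervised learning includes:
an acquisition module configured to acquire image data to be identified;
a recognition module configured to perform pedestrian re-identification on the image data to be identified based on a trained pedestrian re-identification model, to obtain a pedestrian re-identification result;
wherein training the trained pedestrian re-identification model includes:
training a first neural network based on a first training data set to obtain a first feature vector of the first training data set;
training a second neural network, identical to the first neural network, based on a second training data set to obtain a second feature vector of the second training data set, wherein the second training data set is obtained by applying enhancement processing to the first training data set;
computing a first loss function based on the first feature vector and the second feature vector;
adjusting the model parameters of the first neural network so that the sum of the first loss function and a second loss function of the first neural network is minimized, to obtain the trained pedestrian re-identification model.
For convenience of description, the above apparatus is described in terms of separate modules divided by function. Of course, when implementing one or more embodiments of the present invention, the functions of the modules may be implemented in the same one or more pieces of software and/or hardware.
The apparatus of the above embodiment is used to implement the corresponding pedestrian re-identification method based on self-supervised learning of any of the foregoing embodiments and has the beneficial effects of the corresponding method embodiment, which are not repeated here.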
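Based on the same inventive concept and corresponding to the method of any of the above embodiments, one or more embodiments of the present invention further provide an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the pedestrian re-identification method based on self-supervised learning of any one of the above embodiments.
FIG. 4 shows a more specific schematic diagram of the hardware structure of an electronic device provided by this embodiment. The device may include a processor 410, a memory 420, an input/output interface 430, a communication interface 440, and a bus 450. The processor 410, the memory 420, the input/output interface 430, and the communication interface 440 are communicatively connected to one another within the device through the bus 450.
The processor 410 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided by the embodiments of the present invention.
The memory 420 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 420 may store an operating system and other application programs. When the technical solutions provided by the embodiments of the present invention are implemented in software or firmware, the relevant program code is stored in the memory 420 and invoked for execution by the processor 410.
The input/output interface 430 connects input/output modules to enable information input and output. An input/output module may be configured in the device as a component (not shown in the figure) or attached externally to provide the corresponding functions. Input devices may include a keyboard, mouse, touch screen, microphone, and various sensors; output devices may include a display, speaker, vibrator, and indicator lights.
The communication interface 440 connects a communication module (not shown in the figure) to enable communication between this device and other devices. The communication module may communicate by wired means (for example USB or a network cable) or by wireless means (for example a mobile network, Wi-Fi, or Bluetooth).
The bus 450 includes a path that carries information between the components of the device (for example the processor 410, the memory 420, the input/output interface 430, and the communication interface 440).
It should be noted that although the above device shows only the processor 410, the memory 420, the input/output interface 430, the communication interface 440, and the bus 450, in a specific implementation the device may also include other components necessary for normal operation. Moreover, those skilled in the art will understand that the device may also include only the components necessary to implement the solutions of the embodiments of the present invention, without including all the components shown in the figure.
The electronic device of the above embodiment is used to implement the corresponding pedestrian re-identification method based on self-supervised learning of any of the foregoing embodiments and has the beneficial effects of the corresponding method embodiment, which are not repeated here.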
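Based on the same inventive concept, and corresponding to the method of any of the above embodiments, one or more embodiments of the present invention further provide a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause the computer to execute the pedestrian re-identification method based on self-supervised learning described in any of the above embodiments.
The computer-readable medium of this embodiment includes permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the above embodiment are used to cause the computer to execute the pedestrian re-identification method based on self-supervised learning described in any of the above embodiments and have the beneficial effects of the corresponding method embodiment, which are not repeated here.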
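Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of the present invention (including the claims) is limited to these examples. Within the spirit of the present invention, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be carried out in any order, and there are many other variations of the different aspects of one or more embodiments of the present invention as described above, which are not provided in detail for the sake of brevity.
In addition, to simplify the description and discussion, and so as not to obscure one or more embodiments of the present invention, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided drawings. Furthermore, apparatuses may be shown in block-diagram form in order to avoid obscuring one or more embodiments of the present invention, and this also takes into account the fact that details of the implementation of such block-diagram apparatuses depend heavily on the platform on which one or more embodiments of the present invention are to be implemented (i.e. these details should be well within the understanding of those skilled in the art). Where specific details (e.g. circuits) are set forth to describe exemplary embodiments of the present invention, it will be apparent to those skilled in the art that one or more embodiments of the present invention can be practised without these specific details or with variations of them. Accordingly, these descriptions are to be regarded as illustrative rather than restrictive.
Although the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g. dynamic RAM (DRAM)) may use the embodiments discussed.
One or more embodiments of the present invention are intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement or the like made within the spirit and principles of one or more embodiments of the present invention shall be included within the scope of protection of the present invention.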

Claims (10)

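  1. A pedestrian re-identification method based on self-supervised learning, characterized by comprising:
    acquiring image data to be identified;
    performing pedestrian re-identification on the image data to be identified based on a trained pedestrian re-identification model to obtain a pedestrian re-identification result;
    wherein training of the trained pedestrian re-identification model comprises:
    training a first neural network based on a first training data set to obtain a first feature vector of the first training data set;
    training a second neural network identical to the first neural network based on a second training data set to obtain a second feature vector of the second training data set;
    wherein the second training data set is obtained by performing augmentation processing on the first training data set;
    calculating a first loss function based on the first feature vector and the second feature vector;
    adjusting model parameters of the first neural network so that the sum of the first loss function and a second loss function of the first neural network is minimized, to obtain the trained pedestrian re-identification model.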
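  2. The method according to claim 1, characterized in that calculating the first loss function based on the first feature vector and the second feature vector comprises:
    wherein C_{i,j} denotes the cross-correlation coefficient between the i-th sample from the first training data set and the j-th sample from the second training data set, i = j indicates that the j-th sample is the data-augmented sample of the i-th sample, and λ is a weighting factor.
  3. The method according to claim 2, characterized in that the cross-correlation coefficient comprises:
    wherein m is the number of samples in the first training data set, T^A_{m,i} denotes the first output feature vector corresponding to the i-th sample, and T^B_{m,j} denotes the second output feature vector corresponding to the j-th sample.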
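  4. The method according to claim 3, characterized in that the second loss function comprises: a hard-sample-mining triplet loss function and a classification loss function of the first neural network.
  5. The method according to claim 4, characterized in that the overall loss of the first neural network comprises the sum of the first loss function and the second loss function, specifically:
    L_total = α·L_self-sup + β·L_triHard + γ·L_softmax, where α, β and γ are given parameters, L_self-sup is the first loss function, L_triHard is the hard-sample-mining triplet loss function of the first neural network, and L_softmax is the classification loss function of the first neural network.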
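  6. The method according to claim 5, characterized in that the hard-sample-mining triplet loss function comprises:
    for each training batch in the first training data set, randomly selecting target pedestrians with P IDs and randomly selecting K different pictures for each target pedestrian, so that one training batch contains P×K pictures;
    for each picture a in the training batch, selecting one hardest positive sample and one hardest negative sample to form a triplet with a;
    defining the set of pictures with the same ID as a as A and the set of remaining pictures with different IDs as B, the hard-sample-mining triplet loss function comprises:
    wherein α is a manually set threshold parameter; the Euclidean distance in feature space between picture a and every picture in the training batch is calculated, and the positive sample p farthest from a and the negative sample n nearest to a are then selected to calculate the triplet loss.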
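  7. The method according to claim 1, characterized in that the augmentation processing comprises: spatial-domain augmentation and/or frequency-domain augmentation.
  8. A pedestrian re-identification apparatus based on self-supervised learning, characterized by comprising:
    an acquisition module, configured to acquire image data to be identified;
    an identification module, configured to perform pedestrian re-identification on the image data to be identified based on a trained pedestrian re-identification model to obtain a pedestrian re-identification result;
    wherein training of the trained pedestrian re-identification model comprises:
    training a first neural network based on a first training data set to obtain a first feature vector of the first training data set;
    training a second neural network identical to the first neural network based on a second training data set to obtain a second feature vector of the second training data set, wherein the second training data set is obtained by performing augmentation processing on the first training data set;
    calculating a first loss function based on the first feature vector and the second feature vector;
    adjusting model parameters of the first neural network so that the sum of the first loss function and a second loss function of the first neural network is minimized, to obtain the trained pedestrian re-identification model.
  9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 7.
  10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, the computer instructions being used to cause the computer to execute the method according to any one of claims 1 to 7.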
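The hard-sample mining recited in claim 6 (for each picture a, take the farthest same-ID picture and the nearest different-ID picture within a P×K batch) can be sketched in NumPy as follows. This is a hedged illustration rather than the claimed formula: averaging over the batch and the hinge at zero follow the commonly used batch-hard triplet loss and are assumptions, as are the function name `batch_hard_triplet_loss` and the default `margin` (the threshold the claim calls α).

```python
import numpy as np

def batch_hard_triplet_loss(features, ids, margin=0.3):
    """Batch-hard triplet loss over a batch of P*K feature vectors.

    features: (n, d) array of embeddings; ids: (n,) array of identity labels.
    margin:   threshold parameter (assumed value).
    """
    features = np.asarray(features, dtype=np.float64)
    ids = np.asarray(ids)
    # Pairwise Euclidean distances in feature space, shape (n, n).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = ids[:, None] == ids[None, :]
    # Hardest positive: farthest sample with the same ID (set A).
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)
    # Hardest negative: nearest sample with a different ID (set B).
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)
    # Hinge per anchor, then average over the batch (averaging is an assumption).
    return np.maximum(margin + hardest_pos - hardest_neg, 0.0).mean()
```

With well-separated identities every anchor's hinge term is zero; interleaved identities produce a positive loss that pulls same-ID features together and pushes different-ID features apart.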
PCT/CN2023/072914 2022-02-23 2023-01-18 Pedestrian re-identification method, apparatus, device and storage medium based on self-supervised learning WO2023160312A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210168277.X 2022-02-23
CN202210168277.XA CN114529946A (zh) 2022-02-23 2022-02-23 Pedestrian re-identification method, apparatus, device and storage medium based on self-supervised learning

Publications (1)

Publication Number Publication Date
WO2023160312A1 true WO2023160312A1 (zh) 2023-08-31

Family

ID=81624176

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/072914 WO2023160312A1 (zh) 2022-02-23 2023-01-18 Pedestrian re-identification method, apparatus, device and storage medium based on self-supervised learning

Country Status (3)

Country Link
CN (1) CN114529946A (zh)
WO (1) WO2023160312A1 (zh)
ZA (1) ZA202305534B (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
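CN116912535A (zh) * 2023-09-08 2023-10-20 中国海洋大学 Unsupervised target re-identification method, device and medium based on similarity screening
CN117251555A (zh) * 2023-11-17 2023-12-19 深圳须弥云图空间科技有限公司 Language generation model training method and device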

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
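CN114529946A (zh) * 2022-02-23 2022-05-24 厦门市美亚柏科信息股份有限公司 Pedestrian re-identification method, apparatus, device and storage medium based on self-supervised learning
CN115147871A (zh) * 2022-07-19 2022-10-04 北京龙智数科科技服务有限公司 Pedestrian re-identification method in occluded environments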

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10692002B1 (en) * 2019-01-28 2020-06-23 StradVision, Inc. Learning method and learning device of pedestrian detector for robust surveillance based on image analysis by using GAN and testing method and testing device using the same
US20200250553A1 (en) * 2017-11-15 2020-08-06 Mitsubishi Electric Corporation Out-of-vehicle communication device, out-of-vehicle communication method, information processing device, and computer readable medium
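CN111611880A (zh) * 2020-04-30 2020-09-01 杭州电子科技大学 Efficient pedestrian re-identification method based on neural-network unsupervised contrastive learning
CN113128410A (zh) * 2021-04-21 2021-07-16 湖南大学 Weakly supervised pedestrian re-identification method based on trajectory association learning
CN113657267A (zh) * 2021-08-17 2021-11-16 中国科学院长春光学精密机械与物理研究所 Semi-supervised pedestrian re-identification model, method and device
CN113920540A (zh) * 2021-11-04 2022-01-11 厦门市美亚柏科信息股份有限公司 Pedestrian re-identification method, apparatus, device and storage medium based on knowledge distillation
CN113936302A (zh) * 2021-11-03 2022-01-14 厦门市美亚柏科信息股份有限公司 Training method and apparatus for a pedestrian re-identification model, computing device and storage medium
CN114529946A (zh) * 2022-02-23 2022-05-24 厦门市美亚柏科信息股份有限公司 Pedestrian re-identification method, apparatus, device and storage medium based on self-supervised learning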

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
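CN116912535A (zh) * 2023-09-08 2023-10-20 中国海洋大学 Unsupervised target re-identification method, device and medium based on similarity screening
CN116912535B (zh) * 2023-09-08 2023-11-28 中国海洋大学 Unsupervised target re-identification method, device and medium based on similarity screening
CN117251555A (zh) * 2023-11-17 2023-12-19 深圳须弥云图空间科技有限公司 Language generation model training method and device
CN117251555B (zh) * 2023-11-17 2024-04-16 深圳须弥云图空间科技有限公司 Language generation model training method and device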

Also Published As

Publication number Publication date
ZA202305534B (en) 2023-10-25
CN114529946A (zh) 2022-05-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23758937

Country of ref document: EP

Kind code of ref document: A1