WO2022213717A1 - Model training method, pedestrian re-identification method, apparatus and electronic device - Google Patents

Model training method, pedestrian re-identification method, apparatus and electronic device

Info

Publication number
WO2022213717A1
WO2022213717A1 (PCT/CN2022/075112)
Authority
WO
WIPO (PCT)
Prior art keywords
image
pedestrian
feature
pedestrian image
encoder
Prior art date
Application number
PCT/CN2022/075112
Other languages
English (en)
French (fr)
Inventor
王之港 (Wang Zhigang)
王健 (Wang Jian)
孙昊 (Sun Hao)
丁二锐 (Ding Errui)
Original Assignee
北京百度网讯科技有限公司 (Beijing Baidu Netcom Science Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司 (Beijing Baidu Netcom Science Technology Co., Ltd.)
Priority to US 17/800,880 (published as US20240221346A1)
Priority to JP 2022-547887 (published as JP7403673B2)
Priority to KR 10-2022-7026823 (published as KR20220116331A)
Publication of WO2022213717A1 publication Critical patent/WO2022213717A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Definitions

  • The present disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning technologies, which can be used in smart-city scenarios.
  • Pedestrian re-identification, also known as person re-identification, is a technology that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence.
  • Usually, a large number of sample images can be used to perform supervised or unsupervised training of a pedestrian re-identification model, and the model trained to convergence is used to complete the re-identification task.
  • The performance of the converged model depends on the quality and difficulty of the sample images. In general, such a model can distinguish pedestrians whose appearances are clearly different, but has difficulty distinguishing pedestrians with similar appearances but different identities.
  • the present disclosure provides a model training method, a pedestrian re-identification method, an apparatus and an electronic device.
  • a model training method, comprising:
  • extracting features from a first pedestrian image and a second pedestrian image in a sample data set with a first encoder, to obtain the image feature of the first pedestrian image and the image feature of the second pedestrian image;
  • fusing the image feature of the first pedestrian image and the image feature of the second pedestrian image to obtain a fusion feature;
  • decoding the fusion feature with a first decoder to obtain a third pedestrian image;
  • determining the third pedestrian image as a negative sample image of the first pedestrian image, and training a first preset model to convergence with the first pedestrian image and the negative sample image to obtain a pedestrian re-identification model.
  • a pedestrian re-identification method comprising:
  • the pedestrian re-identification model is used to extract features from the target image and the candidate pedestrian image, respectively, to obtain the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image, wherein the pedestrian re-identification model is obtained by the model training method provided in any embodiment of the present disclosure;
  • the similarity between the target image and the candidate pedestrian image is determined based on the two pedestrian features, and when the similarity meets a preset condition, the candidate pedestrian image is determined as a related image of the target image.
  • a model training apparatus comprising:
  • a first encoding module, configured to extract features from the first pedestrian image and the second pedestrian image in the sample data set with the first encoder, to obtain the image feature of the first pedestrian image and the image feature of the second pedestrian image;
  • a fusion module, configured to fuse the image feature of the first pedestrian image and the image feature of the second pedestrian image to obtain a fusion feature;
  • a first decoding module, configured to perform feature decoding on the fusion feature with the first decoder to obtain a third pedestrian image;
  • a first training module, configured to determine the third pedestrian image as a negative sample image of the first pedestrian image, and to train the first preset model to convergence with the first pedestrian image and the negative sample image, obtaining the pedestrian re-identification model.
  • a pedestrian re-identification device comprising:
  • a second extraction module, used to perform feature extraction on the target image and the candidate pedestrian image respectively with the pedestrian re-identification model, to obtain the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image, wherein the pedestrian re-identification model is obtained by the model training method provided in any embodiment of the present disclosure;
  • the third similarity module is used to determine the similarity between the target image and the candidate pedestrian image based on the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image;
  • the second determination module is configured to determine the candidate pedestrian image as a related image of the target image when the similarity meets the preset condition.
  • an electronic device comprising:
  • the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method in any of the embodiments of the present disclosure.
  • a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method in any of the embodiments of the present disclosure.
  • a computer program product comprising a computer program that, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
  • since the third pedestrian image is obtained by fusing the image features of the first pedestrian image and the image features of the second pedestrian image, the third pedestrian image both contains information from the first pedestrian image and differs from it to some extent.
  • Using the third pedestrian image as the negative sample of the first pedestrian image raises the difficulty of distinguishing the first pedestrian image from its negative sample, so that the pedestrian re-identification model is trained on hard-to-distinguish samples, improving its ability to distinguish pedestrians with similar appearances but different identities.
  • FIG. 1 is a schematic diagram of a model training method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a first stage in a model training method provided by another embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a second stage in a model training method provided by another embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a third stage in a model training method provided by another embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a pedestrian re-identification method provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a model training apparatus provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a model training apparatus provided by another embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a model training apparatus provided by another embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a pedestrian re-identification device provided by an embodiment of the present disclosure.
  • FIG. 10 is a block diagram of an electronic device used to implement the method of an embodiment of the present disclosure.
  • FIG. 1 shows a schematic diagram of a model training method provided by an embodiment of the present disclosure.
  • the model training method includes:
  • Step S11: use the first encoder to perform feature extraction on the first pedestrian image and the second pedestrian image in the sample data set, to obtain the image feature of the first pedestrian image and the image feature of the second pedestrian image;
  • Step S12: fuse the image feature of the first pedestrian image and the image feature of the second pedestrian image to obtain a fusion feature;
  • Step S13: use the first decoder to perform feature decoding on the fusion feature to obtain a third pedestrian image;
  • Step S14: determine the third pedestrian image as a negative sample image of the first pedestrian image, and train the first preset model to convergence with the first pedestrian image and the negative sample image to obtain a pedestrian re-identification model.
  • The first encoder in step S11 can be used to extract image features from a pedestrian image, and the first decoder in step S13 can be used to decode image features into a new image. The first encoder and the first decoder can therefore constitute an image generation model that reconstructs a new pedestrian image from input pedestrian images.
  • the image features extracted by the first encoder may be represented by a first vector.
  • the vector may include feature information in multiple dimensions of the corresponding pedestrian image.
  • different pedestrian images in the sample data set may be input into the first encoder respectively, and the first encoder outputs corresponding image features.
  • the fusion features are obtained by fusing the image features.
  • the fusion feature is input into the first decoder, and the first decoder reconstructs and outputs the third pedestrian image based on the fusion feature.
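  • As a concrete illustration of steps S11-S13, the sketch below wires a toy encoder, weighted feature fusion, and a toy decoder together in PyTorch. The layer sizes, the 32x16 image resolution, and the fusion weight `alpha` are illustrative assumptions, not the architecture prescribed by the disclosure.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for the first encoder: image -> feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Stand-in for the first decoder: feature vector -> image."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 128 * 8 * 4)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 8, 4))

encoder, decoder = Encoder(), Decoder()
img1 = torch.randn(1, 3, 32, 16)       # first pedestrian image (toy resolution)
img2 = torch.randn(1, 3, 32, 16)       # second pedestrian image
f1, f2 = encoder(img1), encoder(img2)  # step S11: image features
alpha = 0.7                            # assumed weight for the weighted fusion of step 402
fused = alpha * f1 + (1 - alpha) * f2  # step S12: fusion feature
img3 = decoder(fused)                  # step S13: reconstructed third pedestrian image
```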
  • Since the third pedestrian image is reconstructed from the fusion feature of the first pedestrian image and the second pedestrian image, it contains information from both images.
  • Using the third pedestrian image as the negative sample image of the first pedestrian image makes the two harder to distinguish, so that the pedestrian re-identification model is trained on hard-to-distinguish samples and is better at distinguishing pedestrians with similar appearances but different identities.
  • the sample dataset may include at least two pedestrian images.
  • Each pedestrian image corresponds to a pedestrian.
  • Different pedestrian images can correspond to different pedestrians or the same pedestrian.
  • In practice, an image can be sampled from the sample data set as the first pedestrian image.
  • Using it as a reference, an image that differs substantially from the first pedestrian image, for example an image corresponding to a different pedestrian, is sampled as the second pedestrian image.
  • The third pedestrian image is reconstructed from the sampled images, and the first pedestrian image and the third pedestrian image are input into the first preset model, which processes each of them and outputs corresponding processing results, such as pedestrian features or pedestrian identifiers in the images.
  • The function value of the loss function is calculated according to the processing results and the loss function corresponding to the first preset model, and the first preset model is updated based on that value until a convergence condition is met, for example the number of updates reaching a first preset threshold, the loss value falling below a second preset threshold, or the loss value no longer changing.
  • The converged first preset model is then determined as the pedestrian re-identification model that can be used to complete the pedestrian re-identification task.
  • For example, the loss function corresponding to the first preset model can be used to constrain the first preset model to push the processing result of the first pedestrian image and the processing result of the negative sample image apart, i.e., to output processing results for the two images that are as far apart as possible in feature space, so that the first preset model learns to distinguish different pedestrian images.
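  • A minimal sketch of such a push-away constraint, assuming the processing results are feature vectors and using cosine similarity with a margin (both assumptions; the disclosure does not fix a particular loss form):

```python
import torch
import torch.nn.functional as F

def push_away_loss(feat_anchor: torch.Tensor, feat_negative: torch.Tensor,
                   margin: float = 0.5) -> torch.Tensor:
    """Penalize the first preset model while the first pedestrian image and its
    generated negative sample are still closer than `margin` in cosine similarity."""
    fa = F.normalize(feat_anchor, dim=-1)
    fn = F.normalize(feat_negative, dim=-1)
    sim = (fa * fn).sum(dim=-1)          # cosine similarity per pair
    return F.relu(sim - margin).mean()   # zero once the pair is pushed far enough apart
```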
  • For example, one third pedestrian image can be generated per sampling step; after a positive-negative sample pair consisting of the first pedestrian image and the third pedestrian image is formed, that pair is used to update the first preset model before the next sampling step. Alternatively, a corresponding negative sample image can first be obtained for every pedestrian image in the sample data set to form multiple positive-negative sample pairs, which are then used for multiple update operations on the first preset model.
  • the first encoder and the first decoder may also be updated.
  • Specifically, the model training method may further include:
  • determining a first similarity based on the first pedestrian image and the negative sample image; determining, based on at least one pedestrian image in the sample data set other than the first pedestrian image, at least one second similarity corresponding to the at least one pedestrian image; and updating the first encoder and the first decoder based on the first similarity, the at least one second similarity, and an adversarial loss function.
  • The adversarial loss function may be used to constrain the first similarity to be greater than any one of the at least one second similarity. On this basis, updating the first encoder and the first decoder based on the first similarity, the at least one second similarity, and the adversarial loss function makes the image they reconstruct more similar to the first pedestrian image, which increases the difficulty of distinguishing the first pedestrian image from the negative sample image and thereby further improves the pedestrian re-identification model.
  • the function value of the adversarial loss function may be calculated based on the first similarity and the second similarity, and the first encoder and the first decoder may be updated based on the function value of the adversarial loss function.
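  • One way to realize this constraint is a hinge over similarity gaps, sketched below; the cosine similarity and margin are assumptions, since the disclosure only requires the first similarity to exceed every second similarity:

```python
import torch
import torch.nn.functional as F

def adversarial_loss(feat_x3: torch.Tensor, feat_x1: torch.Tensor,
                     feat_others: torch.Tensor, margin: float = 0.1) -> torch.Tensor:
    """feat_x3/feat_x1: (1, D) features of the third and first pedestrian images;
    feat_others: (N, D) features of the other images in the sample data set."""
    f3 = F.normalize(feat_x3, dim=-1)
    s1 = (f3 * F.normalize(feat_x1, dim=-1)).sum(-1)       # first similarity
    s2 = F.normalize(feat_others, dim=-1) @ f3.squeeze(0)  # second similarities, (N,)
    return F.relu(margin + s2 - s1).mean()  # zero once s1 beats every s2 by `margin`
```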
  • In some scenarios, the first encoder and the first decoder may also be updated in combination with a reconstruction loss function and/or the realism of the negative sample image.
  • The reconstruction loss function can be used to constrain the similarity between the image reconstructed by the first encoder and the first decoder and the first pedestrian image and/or the second pedestrian image to be higher than a preset threshold; that is, the reconstructed image should have a certain similarity to the input image.
  • The realism can be determined with a realism discriminator.
  • As an example, the function value of the adversarial loss function and that of the reconstruction loss function can be calculated first and the realism determined, and these three can then be used to update the first encoder and the first decoder.
  • Since the first pedestrian image and its negative sample image are used not only to train the first preset model into the pedestrian re-identification model but also to train the first encoder and the first decoder, the quality of the reconstructed negative sample images gradually improves, which in turn gradually improves the training of the first preset model.
  • the first encoder and the first decoder may be pre-trained based on pedestrian images.
  • Specifically, the manner of obtaining the first encoder and the first decoder includes: using a second encoder to extract features from the i-th pedestrian image in the sample data set to obtain its image feature, where i is a positive integer greater than or equal to 1; using a second decoder to decode that image feature to obtain a generated image; updating the second encoder and the second decoder based on the similarity between the i-th pedestrian image and the generated image and a reconstruction loss function; and, when the second encoder and the second decoder meet a convergence condition, determining the second encoder as the first encoder and the second decoder as the first decoder.
  • Here, the reconstruction loss function is used to constrain the difference between the i-th pedestrian image and the generated image to be less than a preset threshold; in other words, it constrains the decoded image to be similar to the image that was encoded.
  • Through this process, the second encoder and the second decoder gradually improve their ability to reconstruct an image similar to the input image.
  • When the convergence condition is met, the second encoder and the second decoder are determined as the first encoder and the first decoder, which therefore possess this reconstruction ability. Applying the first encoder and the first decoder to the generation of negative sample images thus improves the generation effect and, in turn, the training effect of the pedestrian re-identification model.
  • For example, updating the second encoder and the second decoder includes: calculating the function value of the reconstruction loss function based on the similarity between the i-th pedestrian image and the generated image; determining the realism of the generated image with a realism discriminator; and updating the second encoder and the second decoder according to the function value of the reconstruction loss function and the realism of the generated image.
  • That is, during training the images generated by the second encoder and the second decoder are constrained not only to resemble the input images (via the reconstruction loss function) but also to be as realistic as possible.
  • Applying the first encoder and the first decoder obtained by training the second encoder and the second decoder to the generation of negative sample images improves the generation effect and, in turn, the training effect of the pedestrian re-identification model.
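  • A sketch of one such pretraining update, reusing the toy `encoder`/`decoder` from the earlier sketch and adding an assumed small realism discriminator; the L1 reconstruction term, the GAN-style realism term, and the 0.1 weighting are all illustrative choices:

```python
import torch
import torch.nn as nn

discriminator = nn.Sequential(              # assumed realism discriminator
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # one realism logit per image
)
recon_criterion = nn.L1Loss()
gan_criterion = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=2e-4)

def pretrain_step(batch: torch.Tensor) -> float:
    generated = decoder(encoder(batch))
    recon = recon_criterion(generated, batch)        # reconstruction loss function
    realism = discriminator(generated)               # realism of the generated image
    adv = gan_criterion(realism, torch.ones_like(realism))  # "look real" term
    loss = recon + 0.1 * adv                         # assumed weighting
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    # (the discriminator itself would be updated in an alternating step, omitted here)
    return loss.item()
```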
  • the above-mentioned first preset model can also be obtained by pre-training.
  • Specifically, the manner of obtaining the first preset model includes: using a second preset model to extract features from each pedestrian image in the sample data set to obtain the pedestrian feature of each image; clustering the pedestrian images based on the pedestrian features to obtain at least two clusters corresponding to at least two cluster labels, each cluster containing at least one pedestrian image; and training the second preset model to convergence based on each pedestrian image and its corresponding cluster label, to obtain the first preset model.
  • the pedestrian feature may be represented by a second vector.
  • the second vector includes features in multiple dimensions of the pedestrian corresponding to the pedestrian image.
  • It should be noted that the encoders, the first preset model, the second preset model, and the pedestrian re-identification model in the embodiments of the present disclosure can all be used for feature extraction, and they may extract features of different dimensions in the same way or in different ways.
  • For example, an encoder may focus on extracting features related to the visual appearance of the image, such as color, while the first preset model, the second preset model, and the pedestrian re-identification model may focus on extracting pedestrian-related features, such as a pedestrian's height.
  • The above clustering of pedestrian images can be implemented with at least one of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and K-means (the K-means clustering algorithm).
  • Through clustering, the pedestrian images are divided into different clusters, and the cluster label of each cluster can be used as a pseudo-label for every pedestrian image in it.
  • Training the second preset model with the pedestrian images and their cluster labels, i.e. pseudo-labels, enables unsupervised training and reduces the cost of annotating the pedestrian images.
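  • A sketch of the clustering and pseudo-labeling step with scikit-learn; the DBSCAN hyperparameters and the choice to drop noise points are assumptions, not values given by the disclosure:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# features: (N, D) pedestrian features extracted by the second preset model
features = np.random.rand(1000, 256).astype(np.float32)  # placeholder data

labels = DBSCAN(eps=0.6, min_samples=4).fit_predict(features)
# Each non-negative label is a cluster index used directly as the pseudo-label of
# every image in that cluster; DBSCAN marks noise points with -1, which an
# unsupervised pipeline would typically exclude from that training round.
keep = labels >= 0
pseudo_labels = labels[keep]
clustered_features = features[keep]
```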
  • In practice, while training the second preset model to convergence to obtain the first preset model, the loss function corresponding to the second preset model can be used to constrain the model to push apart the processing results of pedestrian images from different clusters and to pull together the processing results of pedestrian images from the same cluster.
  • In this way, the second preset model gradually improves its ability to distinguish different pedestrian images.
  • For example, the first pedestrian image and the second pedestrian image may be pedestrian images from different clusters among the at least two clusters. Using images from different clusters ensures that the third pedestrian image reconstructed from the fusion feature differs from the first pedestrian image, so that the pedestrian re-identification model acquires the ability to discriminate accurately.
  • An optional implementation of the model training method of the embodiments of the present disclosure is described below with a specific application example, in which the method is used to train a pedestrian re-identification model in three stages.
  • FIG. 2 is a schematic diagram of the first stage. As shown in Figure 2, the first stage includes the following steps:
  • Feature extraction step 201: extract features from each pedestrian image in the unlabeled sample data set 200 using an initialized model.
  • The initialized model serves as the second preset model and can be obtained by training on multiple labeled pedestrian images.
  • Clustering step 202: cluster the features extracted in step 201 using one or more clustering algorithms such as DBSCAN or k-means, thereby clustering the images in the unlabeled sample data set 200. In this way, the images in the unlabeled sample data set 200 are divided into different clusters in feature space.
  • Pseudo-label assignment step 203: assign each image a pseudo-label according to the cluster it belongs to in feature space.
  • The pseudo-label is the corresponding cluster index.
  • Unsupervised contrastive training step 204: train the second preset model using each image, the pseudo-labels assigned in step 203, and a loss function.
  • The loss function constrains images within the same cluster to be close to each other in feature space and images from different clusters to be far apart; a sketch of such a loss follows.
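  • The sketch below writes this behavior as a supervised-contrastive-style objective over pseudo-labels; the exact loss is not specified by the text, so the temperature and formulation are assumptions:

```python
import torch
import torch.nn.functional as F

def cluster_contrastive_loss(feats: torch.Tensor, pseudo_labels: torch.Tensor,
                             temperature: float = 0.07) -> torch.Tensor:
    """Pull features with the same pseudo-label together in feature space and
    push features from different clusters apart."""
    f = F.normalize(feats, dim=-1)
    logits = (f @ f.t()) / temperature                  # pairwise similarities
    n = f.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=f.device)
    pos = (pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)) & ~eye
    logits = logits.masked_fill(eye, float("-inf"))     # exclude self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)           # avoid -inf * 0 on the diagonal
    pos_count = pos.sum(1).clamp(min=1)
    return -(log_prob * pos.float()).sum(1).div(pos_count).mean()
```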
  • After the iterative training of step 204, the second preset model converges and the first preset model 205 is obtained.
  • Figure 3 is a schematic diagram of the second stage.
  • the second stage is used to train the image generation model, which includes an encoder and a decoder.
  • the purpose of the second stage is to equip the image generative model with the ability to reconstruct natural images from abstract features.
  • the second stage includes steps:
  • Feature encoding step 300 Use the second encoder in the image generation model to perform feature extraction on each image in the unlabeled sample data set 200 to obtain corresponding image features 301.
  • Feature decoding step 302 decode the image feature 301 by using the second decoder in the image generation model to obtain a generated image.
  • Reality discriminating step 303 determining the realism of the generated image by using a realism discriminator. This step is used to constrain the generated images output by the image generation model to be as realistic as possible.
  • Reconstruction loss calculation step 304: calculate the reconstruction loss function from the generated image and the image from the unlabeled sample data set 200 that was input into the image generation model; the reconstruction loss function constrains the generated image decoded by the second decoder to be similar to the image input to the second encoder.
  • Based on the outputs of steps 303 and 304, the image generation model can be updated.
  • When a preset convergence condition is met, the second encoder in the image generation model may be determined as the first encoder and the second decoder as the first decoder, so that the first encoder and the first decoder can be applied in the third stage.
  • FIG 4 is a schematic diagram of the third stage. As shown in Figure 4, the third stage includes:
  • Sampling step 400 Sampling each image in the unlabeled sample dataset 200 in sequence as a reference image, that is, the first pedestrian image. Then an image that does not belong to the same cluster as the first pedestrian image is sampled as the second pedestrian image.
  • Feature encoding step 401 Use the first encoder in the image generation model to perform feature extraction on the first pedestrian image and the second pedestrian image, respectively, to obtain corresponding image features.
  • Feature fusion step 402: perform weighted fusion of the image features obtained in step 401 to obtain the fusion feature.
  • Feature decoding step 403 use the first decoder in the image generation model to decode the fusion feature to obtain a third pedestrian image 406 .
  • Reality discriminating step 404 determining the realism of the third pedestrian image 406 by using a realism discriminator.
  • Reconstruction and Adversarial Loss Function 405 In addition to computing the reconstruction loss function, this step also computes the adversarial loss function.
  • the adversarial loss function constrains the similarity between the third pedestrian image 406 and the first pedestrian image to be greater than the similarity between the third pedestrian image 406 and other images in the unlabeled sample dataset 200 . That is, the generated third pedestrian image should have a certain similarity with the first pedestrian image in appearance.
  • Unsupervised training step 407: this step uses the third pedestrian image as a negative sample of the first pedestrian image to train the first preset model without supervision.
  • In addition to the loss-function constraints of the unsupervised training step of the first stage, the loss function here also constrains the first pedestrian image and the negative sample image to be pushed as far apart as possible in feature space, so that the model learns to distinguish hard samples.
  • The pedestrian re-identification model 408 is finally output.
  • According to the method of the embodiments of the present disclosure, since the third pedestrian image is obtained by fusing the image features of the first pedestrian image and the image features of the second pedestrian image, it contains information from the first pedestrian image while also differing from it to some extent.
  • Using the third pedestrian image as the negative sample of the first pedestrian image raises the difficulty of distinguishing the two, so that the pedestrian re-identification model is trained on hard-to-distinguish samples and better distinguishes pedestrians with similar appearances but different identities.
  • FIG. 5 shows a pedestrian re-identification method provided by an embodiment of the present disclosure, including:
  • Step S51: use the pedestrian re-identification model to perform feature extraction on the target image and the candidate pedestrian image respectively, to obtain the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image;
  • wherein the pedestrian re-identification model is obtained by the model training method provided in any embodiment of the present disclosure;
  • Step S52: determine the similarity between the target image and the candidate pedestrian image based on the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image;
  • Step S53: when the similarity meets the preset condition, determine the candidate pedestrian image as a related image of the target image.
  • the preset condition is, for example, that the similarity is less than a preset threshold or the similarity is the smallest.
  • Since the model training method provided by the embodiments of the present disclosure trains the pedestrian re-identification model on hard-to-distinguish samples, the model can accurately extract the pedestrian feature of each image.
  • Similarities are computed from these pedestrian features, and the computed similarities make it possible to accurately determine the related images of the target image among the candidate pedestrian images; a sketch follows.
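  • Steps S51-S53 condensed into a sketch; cosine similarity and a threshold test are assumptions here, since the text leaves the exact similarity measure and the preset condition open:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def find_related(model: torch.nn.Module, target_img: torch.Tensor,
                 candidate_imgs: torch.Tensor, threshold: float = 0.5):
    """Step S51: extract pedestrian features with the trained re-identification
    model; step S52: score candidates by similarity to the target; step S53:
    keep candidates whose similarity meets the preset condition."""
    q = F.normalize(model(target_img), dim=-1)      # (1, D) target feature
    g = F.normalize(model(candidate_imgs), dim=-1)  # (N, D) candidate features
    sims = (g @ q.t()).squeeze(1)                   # (N,) similarities
    related_idx = torch.nonzero(sims > threshold).flatten()
    return related_idx, sims
```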
  • As an implementation of the above methods, the present disclosure also provides a model training apparatus.
  • As shown in FIG. 6, the apparatus includes:
  • a first encoding module 610, configured to perform feature extraction on the first pedestrian image and the second pedestrian image in the sample data set with the first encoder, to obtain the image feature of the first pedestrian image and the image feature of the second pedestrian image;
  • a fusion module 620, configured to fuse the image feature of the first pedestrian image and the image feature of the second pedestrian image to obtain a fusion feature;
  • a first decoding module 630, configured to perform feature decoding on the fusion feature with the first decoder to obtain a third pedestrian image;
  • a first training module 640, configured to determine the third pedestrian image as a negative sample image of the first pedestrian image, and to train the first preset model to convergence with the first pedestrian image and the negative sample image, obtaining the pedestrian re-identification model.
  • the device further includes:
  • a first similarity module 710 configured to determine a first similarity based on the first pedestrian image and the negative sample image
  • a second similarity module 720 configured to determine at least one second similarity corresponding to the at least one pedestrian image based on at least one pedestrian image in the sample image set except the first pedestrian image;
  • the first update module 730 is configured to update the first encoder and the first decoder based on the first similarity, the at least one second similarity and the adversarial loss function.
  • the device further includes:
  • the second encoding module 750 is configured to perform feature extraction on the i-th pedestrian image in the sample data set by using the second encoder to obtain the image feature of the i-th pedestrian image; wherein, i is a positive integer greater than or equal to 1;
  • the second decoding module 760 is configured to perform feature decoding on the image feature of the i-th pedestrian image by using the second decoder to obtain a generated image;
  • the second update module 770 is configured to update the second encoder and the second decoder based on the similarity between the i-th pedestrian image and the generated image and the reconstruction loss function;
  • the first determining module 780 is configured to determine the second encoder as the first encoder and the second decoder as the first decoder when the second encoder and the second decoder meet the convergence condition.
  • the second update module 770 includes:
  • a calculation unit 771 configured to calculate the function value of the reconstruction loss function based on the similarity between the i-th pedestrian image and the generated image and the reconstruction loss function;
  • a determining unit 772 configured to determine the authenticity of the generated image by using the authenticity discriminator
  • the updating unit 773 is configured to update the second encoder and the second decoder according to the function value of the reconstruction loss function and the authenticity of the generated image.
  • the device further includes:
  • the first extraction module 810 is configured to perform feature extraction on each pedestrian image in the sample data set by using the second preset model to obtain the pedestrian feature of each pedestrian image;
  • a clustering module 820 configured to cluster each pedestrian image in the sample data set based on pedestrian features, to obtain at least two clusters corresponding to the at least two cluster labels respectively; wherein, each of the at least two clusters The clusters all include at least one pedestrian image;
  • the second training module 830 is configured to train the second preset model to convergence based on each pedestrian image in the sample data set and the cluster label corresponding to each pedestrian image to obtain the first preset model.
  • the first pedestrian image and the second pedestrian image are pedestrian images in different clusters in the at least two clusters.
  • An embodiment of the present disclosure further provides a pedestrian re-identification device, as shown in FIG. 9 , the device includes:
  • a second extraction module 910, configured to perform feature extraction on the target image and the candidate pedestrian image respectively with the pedestrian re-identification model, to obtain the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image, wherein the pedestrian re-identification model is obtained by the above-described model training method;
  • the third similarity module 920 is configured to determine the similarity between the target image and the candidate pedestrian image based on the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image;
  • the second determination module 930 is configured to determine the candidate pedestrian image as a related image of the target image when the similarity meets the preset condition.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure.
  • Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • The electronic device 1000 includes a computing unit 1001, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1002 or loaded from a storage unit 1008 into a random access memory (RAM) 1003.
  • The RAM 1003 can also store various programs and data required for the operation of the electronic device 1000.
  • the computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other through a bus 1004.
  • An input/output (I/O) interface 1005 is also connected to the bus 1004.
  • Various components in the electronic device 1000 are connected to the I/O interface 1005, including: an input unit 1006 such as a keyboard or mouse; an output unit 1007 such as various types of displays and speakers; a storage unit 1008 such as a magnetic disk or optical disc; and a communication unit 1009 such as a network card, modem, or wireless communication transceiver.
  • the communication unit 1009 allows the electronic device 1000 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • The computing unit 1001 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, computing units running machine learning model algorithms, digital signal processors (DSPs), and any suitable processors, controllers, or microcontrollers.
  • the computing unit 1001 performs the various methods and processes described above, such as a model training method or a pedestrian re-identification method.
  • a model training method or a pedestrian re-identification method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1008 .
  • part or all of the computer program may be loaded and/or installed on the electronic device 1000 via the ROM 1002 and/or the communication unit 1009.
  • When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the model training method or the pedestrian re-identification method described above may be performed.
  • Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the model training method or the pedestrian re-identification method by any other suitable means (for example, by means of firmware).
  • Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that, when executed by the processor or controller, it causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • The program code may execute entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • To provide interaction with a user, the systems and techniques described here may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic, voice, or tactile input).
  • The systems and techniques described here may be implemented in a computing system that includes a back-end component (e.g., as a data server), or one that includes a middleware component (e.g., an application server), or one that includes a front-end component (e.g., a user computer with a graphical user interface or web browser through which the user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (eg, a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
  • a computer system can include clients and servers.
  • Clients and servers are generally remote from each other and usually interact through a communication network.
  • The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a model training method, a pedestrian re-identification method, an apparatus and an electronic device, relating to the field of artificial intelligence, in particular to computer vision and deep learning technologies, applicable in smart-city scenarios. The specific implementation is: using a first encoder to perform feature extraction on a first pedestrian image and a second pedestrian image in a sample data set to obtain the image feature of the first pedestrian image and the image feature of the second pedestrian image; fusing the two image features to obtain a fusion feature; using a first decoder to perform feature decoding on the fusion feature to obtain a third pedestrian image; determining the third pedestrian image as a negative sample image of the first pedestrian image, and training a first preset model to convergence with the first pedestrian image and the negative sample image to obtain a pedestrian re-identification model. The embodiments of the present disclosure can improve a model's ability to distinguish pedestrians with similar appearances but different identities.

Description

Model training method, pedestrian re-identification method, apparatus and electronic device
This application claims priority to Chinese Patent Application No. 202110372249.5, filed on April 7, 2021 and entitled "Model training method, pedestrian re-identification method, apparatus and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning technologies, which can be used in smart-city scenarios.
Background
Pedestrian re-identification, also known as person re-identification, is a technology that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence. Usually, a large number of sample images can be used to perform supervised or unsupervised training of a pedestrian re-identification model, and the model trained to convergence is used to complete the re-identification task. The performance of the converged model depends on the quality and difficulty of the sample images. In general, such a model can distinguish pedestrians whose appearances are clearly different, but has difficulty distinguishing pedestrians with similar appearances but different identities.
Summary
The present disclosure provides a model training method, a pedestrian re-identification method, an apparatus and an electronic device.
According to one aspect of the present disclosure, a model training method is provided, including:
using a first encoder to perform feature extraction on a first pedestrian image and a second pedestrian image in a sample data set, to obtain the image feature of the first pedestrian image and the image feature of the second pedestrian image;
fusing the image feature of the first pedestrian image and the image feature of the second pedestrian image to obtain a fusion feature;
using a first decoder to perform feature decoding on the fusion feature to obtain a third pedestrian image;
determining the third pedestrian image as a negative sample image of the first pedestrian image, and training a first preset model to convergence with the first pedestrian image and the negative sample image to obtain a pedestrian re-identification model.
According to another aspect of the present disclosure, a pedestrian re-identification method is provided, including:
using a pedestrian re-identification model to perform feature extraction on a target image and a candidate pedestrian image respectively, to obtain the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image, wherein the pedestrian re-identification model is obtained by the model training method provided in any embodiment of the present disclosure;
determining the similarity between the target image and the candidate pedestrian image based on the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image;
determining the candidate pedestrian image as a related image of the target image when the similarity meets a preset condition.
According to another aspect of the present disclosure, a model training apparatus is provided, including:
a first encoding module, used to perform feature extraction on the first pedestrian image and the second pedestrian image in the sample data set with the first encoder, to obtain the image feature of the first pedestrian image and the image feature of the second pedestrian image;
a fusion module, used to fuse the image feature of the first pedestrian image and the image feature of the second pedestrian image to obtain a fusion feature;
a first decoding module, used to perform feature decoding on the fusion feature with the first decoder to obtain a third pedestrian image;
a first training module, used to determine the third pedestrian image as a negative sample image of the first pedestrian image, and to train the first preset model to convergence with the first pedestrian image and the negative sample image, obtaining a pedestrian re-identification model.
According to another aspect of the present disclosure, a pedestrian re-identification apparatus is provided, including:
a second extraction module, used to perform feature extraction on the target image and the candidate pedestrian image respectively with the pedestrian re-identification model, to obtain the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image, wherein the pedestrian re-identification model is obtained by the model training method provided in any embodiment of the present disclosure;
a third similarity module, used to determine the similarity between the target image and the candidate pedestrian image based on the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image;
a second determination module, used to determine the candidate pedestrian image as a related image of the target image when the similarity meets a preset condition.
According to another aspect of the present disclosure, an electronic device is provided, including:
at least one processor; and
a memory communicatively connected to the at least one processor, wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method in any embodiment of the present disclosure.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions being used to cause a computer to perform the method in any embodiment of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method in any embodiment of the present disclosure.
According to the technology of the present disclosure, since the third pedestrian image is obtained by fusing the image features of the first pedestrian image and the image features of the second pedestrian image, the third pedestrian image both contains information from the first pedestrian image and differs from it to some extent. Using the third pedestrian image as the negative sample of the first pedestrian image raises the difficulty of distinguishing the first pedestrian image from its negative sample, so that the pedestrian re-identification model is trained on hard-to-distinguish samples, improving its ability to distinguish pedestrians with similar appearances but different identities.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the following description.
Brief Description of the Drawings
The drawings are used for a better understanding of the solution and do not constitute a limitation of the present disclosure. In the drawings:
FIG. 1 is a schematic diagram of a model training method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the first stage of a model training method provided by another embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the second stage of a model training method provided by another embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the third stage of a model training method provided by another embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a pedestrian re-identification method provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a model training apparatus provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a model training apparatus provided by another embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a model training apparatus provided by yet another embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a pedestrian re-identification apparatus provided by an embodiment of the present disclosure;
FIG. 10 is a block diagram of an electronic device used to implement the methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the drawings, including various details of the embodiments to aid understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
FIG. 1 shows a schematic diagram of a model training method provided by an embodiment of the present disclosure. As shown in FIG. 1, the model training method includes:
Step S11: use a first encoder to perform feature extraction on a first pedestrian image and a second pedestrian image in a sample data set, to obtain the image feature of the first pedestrian image and the image feature of the second pedestrian image;
Step S12: fuse the image feature of the first pedestrian image and the image feature of the second pedestrian image to obtain a fusion feature;
Step S13: use a first decoder to perform feature decoding on the fusion feature to obtain a third pedestrian image;
Step S14: determine the third pedestrian image as a negative sample image of the first pedestrian image, and train a first preset model to convergence with the first pedestrian image and the negative sample image to obtain a pedestrian re-identification model.
The first encoder in step S11 can be used to extract image features from pedestrian images, and the first decoder in step S13 can be used to decode image features into a new image. The first encoder and the first decoder can therefore constitute an image generation model that reconstructs a new pedestrian image from input pedestrian images. The image features extracted by the first encoder may be represented by a first vector, which may include feature information of the corresponding pedestrian image in multiple dimensions.
In the embodiments of the present disclosure, different pedestrian images in the sample data set, such as the first pedestrian image and the second pedestrian image, may be input into the first encoder respectively, and the first encoder outputs the corresponding image features. The image features are fused to obtain the fusion feature, which is then input into the first decoder; the first decoder reconstructs and outputs the third pedestrian image based on the fusion feature.
Since the third pedestrian image is reconstructed from the fusion feature of the first pedestrian image and the second pedestrian image, it contains information from both images. Using the third pedestrian image as the negative sample image of the first pedestrian image makes the two harder to distinguish, so that the pedestrian re-identification model is trained on hard-to-distinguish samples and is better at distinguishing pedestrians with similar appearances but different identities.
For example, the sample data set may include at least two pedestrian images. Each pedestrian image corresponds to one pedestrian, and different pedestrian images may correspond to different pedestrians or to the same pedestrian.
In practice, an image can be sampled from the sample data set as the first pedestrian image. Using it as a reference, an image that differs substantially from the first pedestrian image, for example an image corresponding to a different pedestrian, is sampled as the second pedestrian image. The third pedestrian image is reconstructed from the sampled images, and the first pedestrian image and the third pedestrian image are input into the first preset model, which processes each of them and outputs corresponding processing results, such as pedestrian features or pedestrian identifiers in the images. The function value of the loss function is calculated according to the processing results of the first preset model and the loss function corresponding to the first preset model, and the first preset model is updated based on that value until it reaches a convergence condition, for example the number of updates reaching a first preset threshold, the function value of the loss function falling below a second preset threshold, or the function value no longer changing; the converged first preset model is determined as the pedestrian re-identification model that can be used to complete the pedestrian re-identification task.
For example, the loss function corresponding to the first preset model can be used to constrain the first preset model to push the processing result of the first pedestrian image and the processing result of the negative sample image apart, i.e., to output processing results for the two images that are as far apart as possible in feature space, so that the first preset model can distinguish different pedestrian images.
For example, one third pedestrian image can be generated per sampling step; after a positive-negative sample pair consisting of the first pedestrian image and the third pedestrian image is formed, that pair is used to update the first preset model before the next sampling step. Alternatively, a corresponding negative sample image can first be obtained for every pedestrian image in the sample data set to form multiple positive-negative sample pairs, which are then used for multiple update operations on the first preset model.
For example, in the process of training the first preset model by updating it, the first encoder and the first decoder may also be updated. Specifically, the model training method may further include:
determining a first similarity based on the first pedestrian image and the negative sample image;
determining, based on at least one pedestrian image in the sample data set other than the first pedestrian image, at least one second similarity corresponding to the at least one pedestrian image;
updating the first encoder and the first decoder based on the first similarity, the at least one second similarity, and an adversarial loss function.
The adversarial loss function may be used to constrain the first similarity to be greater than any one of the at least one second similarity. On this basis, updating the first encoder and the first decoder based on the first similarity, the at least one second similarity and the adversarial loss function makes the image they reconstruct more similar to the first pedestrian image, which increases the difficulty of distinguishing the first pedestrian image from the negative sample image and thereby further improves the pedestrian re-identification model.
For example, the function value of the adversarial loss function may be calculated based on the first similarity and the second similarity, and the first encoder and the first decoder may be updated based on that value.
In some scenarios, the first encoder and the first decoder may also be updated in combination with a reconstruction loss function and/or the realism of the negative sample image. The reconstruction loss function can be used to constrain the similarity between the image reconstructed by the first encoder and the first decoder and the first pedestrian image and/or the second pedestrian image to be higher than a preset threshold; that is, the reconstructed image should have a certain similarity to the input image. The realism can be determined with a realism discriminator. As an example, the function value of the adversarial loss function and that of the reconstruction loss function can be calculated first and the realism determined, and these three can then be used to update the first encoder and the first decoder.
Since, in the process of training the first preset model with the first pedestrian image and its negative sample image to obtain the pedestrian re-identification model, the first pedestrian image and the negative sample image are also used to train the first encoder and the first decoder, the quality of the reconstructed negative sample images gradually improves, which in turn gradually improves the training effect of the first preset model.
For example, the first encoder and the first decoder may be pre-trained on pedestrian images. Specifically, the manner of obtaining the first encoder and the first decoder includes:
using a second encoder to perform feature extraction on the i-th pedestrian image in the sample data set, to obtain the image feature of the i-th pedestrian image, where i is a positive integer greater than or equal to 1;
using a second decoder to perform feature decoding on the image feature of the i-th pedestrian image, to obtain a generated image;
updating the second encoder and the second decoder based on the similarity between the i-th pedestrian image and the generated image and a reconstruction loss function;
when the second encoder and the second decoder meet a convergence condition, determining the second encoder as the first encoder and the second decoder as the first decoder.
Here, the reconstruction loss function is used to constrain the difference between the i-th pedestrian image and the generated image to be less than a preset threshold; in other words, it constrains the decoded image to be similar to the image that was encoded.
Through this process, the second encoder and the second decoder gradually improve their ability to reconstruct an image similar to the input image. When the convergence condition is met, the second encoder and the second decoder are determined as the first encoder and the first decoder, which therefore possess this reconstruction ability. Applying the first encoder and the first decoder to the generation of negative sample images can thus improve the generation effect and, in turn, the training effect of the pedestrian re-identification model.
For example, updating the second encoder and the second decoder based on the similarity between the i-th pedestrian image and the generated image and the reconstruction loss function includes:
calculating the function value of the reconstruction loss function based on the similarity between the i-th pedestrian image and the generated image and the reconstruction loss function;
determining the realism of the generated image with a realism discriminator;
updating the second encoder and the second decoder according to the function value of the reconstruction loss function and the realism of the generated image.
That is, during training the images generated by the second encoder and the second decoder are constrained not only to resemble the input images (via the reconstruction loss function) but also to be as realistic as possible. Applying the first encoder and the first decoder obtained by training the second encoder and the second decoder to the generation of negative sample images can improve the generation effect and, in turn, the training effect of the pedestrian re-identification model.
For example, the first preset model may also be obtained by pre-training. Specifically, the manner of obtaining the first preset model includes:
using a second preset model to perform feature extraction on each pedestrian image in the sample data set, to obtain the pedestrian feature of each pedestrian image;
clustering the pedestrian images in the sample data set based on the pedestrian features, to obtain at least two clusters corresponding respectively to at least two cluster labels, each of the at least two clusters including at least one pedestrian image;
training the second preset model to convergence based on each pedestrian image in the sample data set and the cluster label corresponding to each pedestrian image, to obtain the first preset model.
The pedestrian feature may be represented by a second vector, which includes features of the pedestrian corresponding to the pedestrian image in multiple dimensions.
It should be noted that the encoders, the first preset model, the second preset model, and the pedestrian re-identification model in the embodiments of the present disclosure can all be used for feature extraction, and they may extract features of different dimensions in the same way or in different ways. For example, an encoder may focus on extracting features related to the visual appearance of the image, such as color, while the first preset model, the second preset model, and the pedestrian re-identification model may focus on extracting pedestrian-related features, such as a pedestrian's height.
The above clustering of pedestrian images can be implemented with at least one of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and K-means (the K-means clustering algorithm).
Through clustering, the pedestrian images are divided into different clusters, and the cluster label of each cluster can be used as a pseudo-label for the pedestrian images in it. Training the second preset model with the pedestrian images and their cluster labels, i.e. pseudo-labels, enables unsupervised training and reduces the cost of annotating the pedestrian images.
In practice, while training the second preset model to convergence to obtain the first preset model, the loss function corresponding to the second preset model can be used to constrain the model to push apart the processing results of pedestrian images from different clusters and to pull together the processing results of pedestrian images from the same cluster, so that the second preset model gradually improves its ability to distinguish different pedestrian images.
For example, the first pedestrian image and the second pedestrian image may be pedestrian images from different clusters among the at least two clusters.
Using images from different clusters as the first pedestrian image and the second pedestrian image ensures that the third pedestrian image reconstructed from the fusion feature differs from the first pedestrian image, so that the pedestrian re-identification model acquires the ability to discriminate accurately.
An optional implementation of the model training method of the embodiments of the present disclosure is described below with a specific application example. In the application example, the method is used to train a pedestrian re-identification model, in three stages.
FIG. 2 is a schematic diagram of the first stage. As shown in FIG. 2, the first stage includes the following steps:
Feature extraction step 201: extract features from every pedestrian image in the unlabeled sample data set 200 using an initialized model. The initialized model serves as the second preset model and can be obtained by training on multiple labeled pedestrian images.
Clustering step 202: cluster the features extracted in step 201 using one or more clustering algorithms such as DBSCAN or k-means, thereby clustering the images in the unlabeled sample data set 200. In this way, the images in the unlabeled sample data set 200 are divided into different clusters in feature space.
Pseudo-label assignment step 203: assign each image a pseudo-label according to the cluster it belongs to in feature space; the pseudo-label is the corresponding cluster index.
Unsupervised contrastive training step 204: train the second preset model using each image, the pseudo-labels assigned in step 203, and a loss function. The loss function constrains images within the same cluster to be close to each other in feature space and images from different clusters to be far apart.
After the iterative training of step 204, the second preset model converges and the first preset model 205 is obtained.
FIG. 3 is a schematic diagram of the second stage, which is used to train the image generation model consisting of an encoder and a decoder. The goal of the second stage is to give the image generation model the ability to reconstruct natural images from abstract features. The second stage includes the following steps:
Feature encoding step 300: use the second encoder of the image generation model to extract features from each image in the unlabeled sample data set 200, obtaining the corresponding image features 301.
Feature decoding step 302: use the second decoder of the image generation model to decode the image features 301, obtaining a generated image.
Realism discrimination step 303: determine the realism of the generated image with a realism discriminator. This step constrains the generated images output by the image generation model to be as realistic as possible.
Reconstruction loss calculation step 304: calculate the reconstruction loss function from the generated image and the image from the unlabeled sample data set 200 that was input into the image generation model; the reconstruction loss function constrains the generated image decoded by the second decoder to be similar to the image input to the second encoder.
Based on the outputs of steps 303 and 304, the image generation model can be updated. When a preset convergence condition is met, the second encoder of the image generation model is determined as the first encoder and the second decoder as the first decoder, and the two are applied in the third stage.
FIG. 4 is a schematic diagram of the third stage. As shown in FIG. 4, the third stage includes:
Sampling step 400: sample each image in the unlabeled sample data set 200 in turn as the reference image, i.e., the first pedestrian image; then sample an image that does not belong to the same cluster as the first pedestrian image as the second pedestrian image.
Feature encoding step 401: use the first encoder of the image generation model to extract features from the first pedestrian image and the second pedestrian image respectively, obtaining the corresponding image features.
Feature fusion step 402: perform weighted fusion of the image features obtained in step 401 to obtain the fusion feature.
Feature decoding step 403: use the first decoder of the image generation model to decode the fusion feature, obtaining the third pedestrian image 406.
Realism discrimination step 404: determine the realism of the third pedestrian image 406 with the realism discriminator.
Reconstruction and adversarial loss step 405: in addition to the reconstruction loss function, this step also calculates the adversarial loss function. The adversarial loss function constrains the similarity between the third pedestrian image 406 and the first pedestrian image to be greater than the similarity between the third pedestrian image 406 and the other images in the unlabeled sample data set 200; that is, the generated third pedestrian image should resemble the first pedestrian image in appearance to a certain extent.
Unsupervised training step 407: this step uses the third pedestrian image as a negative sample of the first pedestrian image to train the first preset model without supervision. In addition to the loss-function constraints of the unsupervised training step of the first stage, the loss function here also constrains the first pedestrian image and the negative sample image to be pushed as far apart as possible in feature space, so that the model learns to distinguish hard samples. The pedestrian re-identification model 408 is finally output.
According to the method of the embodiments of the present disclosure, since the third pedestrian image is obtained by fusing the image features of the first pedestrian image and the image features of the second pedestrian image, it contains information from the first pedestrian image while also differing from it to some extent. Using the third pedestrian image as the negative sample of the first pedestrian image raises the difficulty of distinguishing the two, so that the pedestrian re-identification model is trained on hard-to-distinguish samples and better distinguishes pedestrians with similar appearances but different identities.
The embodiments of the present disclosure also provide an application method for the above pedestrian re-identification model. FIG. 5 shows a pedestrian re-identification method provided by an embodiment of the present disclosure, including:
Step S51: use the pedestrian re-identification model to perform feature extraction on a target image and a candidate pedestrian image respectively, to obtain the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image, wherein the pedestrian re-identification model is obtained by the model training method provided in any embodiment of the present disclosure;
Step S52: determine the similarity between the target image and the candidate pedestrian image based on the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image;
Step S53: when the similarity meets a preset condition, determine the candidate pedestrian image as a related image of the target image.
The preset condition is, for example, that the similarity is below a preset threshold, or that the similarity is the smallest.
Since the model training method provided by the embodiments of the present disclosure trains the pedestrian re-identification model on hard-to-distinguish samples, the model can accurately extract the pedestrian feature of each image; similarities are computed from these pedestrian features, and the computed similarities make it possible to accurately determine the related images of the target image among the candidate pedestrian images.
As an implementation of the above methods, the present disclosure also provides a model training apparatus. As shown in FIG. 6, the apparatus includes:
a first encoding module 610, used to perform feature extraction on the first pedestrian image and the second pedestrian image in the sample data set with the first encoder, to obtain the image feature of the first pedestrian image and the image feature of the second pedestrian image;
a fusion module 620, used to fuse the image feature of the first pedestrian image and the image feature of the second pedestrian image to obtain a fusion feature;
a first decoding module 630, used to perform feature decoding on the fusion feature with the first decoder to obtain a third pedestrian image;
a first training module 640, used to determine the third pedestrian image as a negative sample image of the first pedestrian image, and to train the first preset model to convergence with the first pedestrian image and the negative sample image, obtaining a pedestrian re-identification model.
For example, as shown in FIG. 7, the apparatus further includes:
a first similarity module 710, used to determine a first similarity based on the first pedestrian image and the negative sample image;
a second similarity module 720, used to determine, based on at least one pedestrian image in the sample image set other than the first pedestrian image, at least one second similarity corresponding to the at least one pedestrian image;
a first update module 730, used to update the first encoder and the first decoder based on the first similarity, the at least one second similarity, and an adversarial loss function.
For example, as shown in FIG. 7, the apparatus further includes:
a second encoding module 750, used to perform feature extraction on the i-th pedestrian image in the sample data set with a second encoder, to obtain the image feature of the i-th pedestrian image, where i is a positive integer greater than or equal to 1;
a second decoding module 760, used to perform feature decoding on the image feature of the i-th pedestrian image with a second decoder, to obtain a generated image;
a second update module 770, used to update the second encoder and the second decoder based on the similarity between the i-th pedestrian image and the generated image and a reconstruction loss function;
a first determination module 780, used to determine the second encoder as the first encoder and the second decoder as the first decoder when the second encoder and the second decoder meet a convergence condition.
For example, the second update module 770 includes:
a calculation unit 771, used to calculate the function value of the reconstruction loss function based on the similarity between the i-th pedestrian image and the generated image and the reconstruction loss function;
a determination unit 772, used to determine the realism of the generated image with a realism discriminator;
an update unit 773, used to update the second encoder and the second decoder according to the function value of the reconstruction loss function and the realism of the generated image.
For example, as shown in FIG. 8, the apparatus further includes:
a first extraction module 810, used to perform feature extraction on each pedestrian image in the sample data set with a second preset model, to obtain the pedestrian feature of each pedestrian image;
a clustering module 820, used to cluster the pedestrian images in the sample data set based on the pedestrian features, to obtain at least two clusters corresponding respectively to at least two cluster labels, each of the at least two clusters including at least one pedestrian image;
a second training module 830, used to train the second preset model to convergence based on each pedestrian image in the sample data set and the cluster label corresponding to each pedestrian image, to obtain the first preset model.
For example, the first pedestrian image and the second pedestrian image are pedestrian images from different clusters among the at least two clusters.
An embodiment of the present disclosure also provides a pedestrian re-identification apparatus. As shown in FIG. 9, the apparatus includes:
a second extraction module 910, used to perform feature extraction on the target image and the candidate pedestrian image respectively with the pedestrian re-identification model, to obtain the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image, wherein the pedestrian re-identification model is obtained by the above model training method;
a third similarity module 920, used to determine the similarity between the target image and the candidate pedestrian image based on the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image;
a second determination module 930, used to determine the candidate pedestrian image as a related image of the target image when the similarity meets a preset condition.
For the functions of the units, modules or sub-modules in the apparatuses of the embodiments of the present disclosure, reference may be made to the corresponding descriptions in the above method embodiments, which are not repeated here.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
FIG. 10 shows a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing apparatuses. The components shown here, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 10, the electronic device 1000 includes a computing unit 1001, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1002 or loaded from a storage unit 1008 into a random access memory (RAM) 1003. The RAM 1003 can also store various programs and data required for the operation of the electronic device 1000. The computing unit 1001, the ROM 1002 and the RAM 1003 are connected to one another via a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
Various components in the electronic device 1000 are connected to the I/O interface 1005, including: an input unit 1006 such as a keyboard or mouse; an output unit 1007 such as various types of displays and speakers; a storage unit 1008 such as a magnetic disk or optical disc; and a communication unit 1009 such as a network card, modem, or wireless communication transceiver. The communication unit 1009 allows the electronic device 1000 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1001 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, computing units running machine learning model algorithms, digital signal processors (DSPs), and any suitable processors, controllers, or microcontrollers. The computing unit 1001 performs the methods and processes described above, such as the model training method or the pedestrian re-identification method. For example, in some embodiments, the model training method or the pedestrian re-identification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the model training method or the pedestrian re-identification method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the model training method or the pedestrian re-identification method by any other suitable means (for example, by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input apparatus, and at least one output apparatus.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that, when executed by the processor or controller, it causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses or devices, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described here may be implemented on a computer having a display apparatus (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing apparatus (e.g., a mouse or trackball) through which the user can provide input to the computer. Other kinds of apparatuses can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic, voice, or tactile input).
The systems and techniques described here may be implemented in a computing system that includes a back-end component (e.g., as a data server), or one that includes a middleware component (e.g., an application server), or one that includes a front-end component (e.g., a user computer with a graphical user interface or web browser through which the user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.
It should be understood that steps may be reordered, added or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the result desired by the technical solution disclosed in the present disclosure can be achieved; no limitation is imposed here.
The above specific implementations do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions can be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (17)

  1. A model training method, comprising:
    performing feature extraction on a first pedestrian image and a second pedestrian image in a sample data set by using a first encoder, to obtain an image feature of the first pedestrian image and an image feature of the second pedestrian image;
    fusing the image feature of the first pedestrian image and the image feature of the second pedestrian image to obtain a fused feature;
    performing feature decoding on the fused feature by using a first decoder, to obtain a third pedestrian image; and
    determining the third pedestrian image as a negative sample image of the first pedestrian image, and training a first preset model to convergence by using the first pedestrian image and the negative sample image, to obtain a pedestrian re-identification model (this pipeline is illustrated in Sketch 1 following the claims).
  2. The method according to claim 1, further comprising (one reading of this update is illustrated in Sketch 2 following the claims):
    determining a first similarity based on the first pedestrian image and the negative sample image;
    determining, based on at least one pedestrian image in the sample data set other than the first pedestrian image, at least one second similarity respectively corresponding to the at least one pedestrian image; and
    updating the first encoder and the first decoder based on the first similarity, the at least one second similarity, and an adversarial loss function.
  3. The method according to claim 1 or 2, wherein a manner of acquiring the first encoder and the first decoder comprises (claims 3 and 4 are illustrated in Sketch 3 following the claims):
    performing feature extraction on an i-th pedestrian image in the sample data set by using a second encoder, to obtain an image feature of the i-th pedestrian image, wherein i is a positive integer greater than or equal to 1;
    performing feature decoding on the image feature of the i-th pedestrian image by using a second decoder, to obtain a generated image;
    updating the second encoder and the second decoder based on a similarity between the i-th pedestrian image and the generated image and a reconstruction loss function; and
    determining, in a case where the second encoder and the second decoder meet a convergence condition, the second encoder as the first encoder and the second decoder as the first decoder.
  4. The method according to claim 3, wherein the updating the second encoder and the second decoder based on the similarity between the i-th pedestrian image and the generated image and the reconstruction loss function comprises:
    calculating a function value of the reconstruction loss function based on the similarity between the i-th pedestrian image and the generated image and the reconstruction loss function;
    determining a realism degree of the generated image by using a realism discriminator; and
    updating the second encoder and the second decoder according to the function value of the reconstruction loss function and the realism degree of the generated image.
  5. The method according to any one of claims 1-4, wherein a manner of acquiring the first preset model comprises (illustrated in Sketch 4 following the claims):
    performing feature extraction on each pedestrian image in the sample data set by using a second preset model, to obtain a pedestrian feature of each pedestrian image;
    clustering the pedestrian images in the sample data set based on the pedestrian features, to obtain at least two clusters respectively corresponding to at least two cluster labels, wherein each of the at least two clusters includes at least one pedestrian image; and
    training the second preset model to convergence based on each pedestrian image in the sample data set and the cluster label corresponding to each pedestrian image, to obtain the first preset model.
  6. The method according to claim 5, wherein the first pedestrian image and the second pedestrian image are pedestrian images in different clusters of the at least two clusters.
  7. A pedestrian re-identification method, comprising (illustrated in Sketch 5 following the claims):
    performing feature extraction on a target image and a candidate pedestrian image respectively by using a pedestrian re-identification model, to obtain a pedestrian feature of the target image and a pedestrian feature of the candidate pedestrian image, wherein the pedestrian re-identification model is obtained by the model training method according to any one of claims 1-6;
    determining a similarity between the target image and the candidate pedestrian image based on the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image; and
    determining the candidate pedestrian image as a related image of the target image in a case where the similarity meets a preset condition.
  8. A model training apparatus, comprising:
    a first encoding module configured to perform feature extraction on a first pedestrian image and a second pedestrian image in a sample data set by using a first encoder, to obtain an image feature of the first pedestrian image and an image feature of the second pedestrian image;
    a fusion module configured to fuse the image feature of the first pedestrian image and the image feature of the second pedestrian image to obtain a fused feature;
    a first decoding module configured to perform feature decoding on the fused feature by using a first decoder, to obtain a third pedestrian image; and
    a first training module configured to determine the third pedestrian image as a negative sample image of the first pedestrian image, and train a first preset model to convergence by using the first pedestrian image and the negative sample image, to obtain a pedestrian re-identification model.
  9. The apparatus according to claim 8, further comprising:
    a first similarity module configured to determine a first similarity based on the first pedestrian image and the negative sample image;
    a second similarity module configured to determine, based on at least one pedestrian image in the sample data set other than the first pedestrian image, at least one second similarity respectively corresponding to the at least one pedestrian image; and
    a first updating module configured to update the first encoder and the first decoder based on the first similarity, the at least one second similarity, and an adversarial loss function.
  10. The apparatus according to claim 8 or 9, further comprising:
    a second encoding module configured to perform feature extraction on an i-th pedestrian image in the sample data set by using a second encoder, to obtain an image feature of the i-th pedestrian image, wherein i is a positive integer greater than or equal to 1;
    a second decoding module configured to perform feature decoding on the image feature of the i-th pedestrian image by using a second decoder, to obtain a generated image;
    a second updating module configured to update the second encoder and the second decoder based on a similarity between the i-th pedestrian image and the generated image and a reconstruction loss function; and
    a first determination module configured to determine, in a case where the second encoder and the second decoder meet a convergence condition, the second encoder as the first encoder and the second decoder as the first decoder.
  11. The apparatus according to claim 10, wherein the second updating module comprises:
    a calculation unit configured to calculate a function value of the reconstruction loss function based on the similarity between the i-th pedestrian image and the generated image and the reconstruction loss function;
    a determination unit configured to determine a realism degree of the generated image by using a realism discriminator; and
    an updating unit configured to update the second encoder and the second decoder according to the function value of the reconstruction loss function and the realism degree of the generated image.
  12. The apparatus according to any one of claims 8-11, further comprising:
    a first extraction module configured to perform feature extraction on each pedestrian image in the sample data set by using a second preset model, to obtain a pedestrian feature of each pedestrian image;
    a clustering module configured to cluster the pedestrian images in the sample data set based on the pedestrian features, to obtain at least two clusters respectively corresponding to at least two cluster labels, wherein each of the at least two clusters includes at least one pedestrian image; and
    a second training module configured to train the second preset model to convergence based on each pedestrian image in the sample data set and the cluster label corresponding to each pedestrian image, to obtain the first preset model.
  13. The apparatus according to claim 12, wherein the first pedestrian image and the second pedestrian image are pedestrian images in different clusters of the at least two clusters.
  14. A pedestrian re-identification apparatus, comprising:
    a second extraction module configured to perform feature extraction on a target image and a candidate pedestrian image respectively by using a pedestrian re-identification model, to obtain a pedestrian feature of the target image and a pedestrian feature of the candidate pedestrian image, wherein the pedestrian re-identification model is obtained by the model training method according to any one of claims 1-6;
    a third similarity module configured to determine a similarity between the target image and the candidate pedestrian image based on the pedestrian feature of the target image and the pedestrian feature of the candidate pedestrian image; and
    a second determination module configured to determine the candidate pedestrian image as a related image of the target image in a case where the similarity meets a preset condition.
  15. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-7.
  16. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-7.
  17. A computer program product, comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
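
Sketch 1. A minimal PyTorch sketch of the generation pipeline of claim 1: encode two pedestrian images with a shared first encoder, fuse the two image features, and decode the fused feature into a third pedestrian image that serves as a negative sample. The layer shapes, the convolutional encoder/decoder, and the convex-combination fusion are illustrative assumptions; the claims do not fix any of these choices.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for the 'first encoder': maps a pedestrian image to a feature map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Stand-in for the 'first decoder': maps a (fused) feature back to image space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

def fuse(feat_a, feat_b, alpha=0.5):
    # One possible fusion: a convex combination of the two image features.
    # The claim only requires *some* fusion; averaging is an assumption.
    return alpha * feat_a + (1.0 - alpha) * feat_b

encoder, decoder = Encoder(), Decoder()
img1 = torch.randn(1, 3, 256, 128)  # first pedestrian image (dummy data)
img2 = torch.randn(1, 3, 256, 128)  # second pedestrian image (dummy data)

fused = fuse(encoder(img1), encoder(img2))
img3 = decoder(fused)  # third pedestrian image, used as a hard negative for img1
```

Because img3 blends appearance cues from two pedestrians, it resembles img1 without matching it, which is what makes it a hard negative when training the first preset model.
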
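Sketch 2. One plausible reading of the claim 2 update, sketched in PyTorch. The claim names only a first similarity (generated negative versus the first pedestrian image), second similarities (versus other images in the sample data set), and an adversarial loss function; the cosine metric and the margin form below are assumptions.

```python
import torch
import torch.nn.functional as F

def adversarial_loss(feat_neg, feat_first, feats_others, margin=0.3):
    # First similarity (claim 2): generated negative vs. the first pedestrian image.
    first_sim = F.cosine_similarity(feat_neg, feat_first, dim=-1)
    # Second similarities: generated negative vs. other images in the sample data set.
    second_sims = F.cosine_similarity(feat_neg.unsqueeze(0), feats_others, dim=-1)
    # Hypothetical margin objective: keep the negative close in appearance to the
    # first image while pushing it away from unrelated identities.
    return F.relu(margin - first_sim).mean() + F.relu(second_sims - margin).mean()

feat_neg = torch.randn(128, requires_grad=True)  # feature of the generated negative
feat_first = torch.randn(128)                    # feature of the first pedestrian image
feats_others = torch.randn(10, 128)              # features of ten other images
loss = adversarial_loss(feat_neg, feat_first, feats_others)
loss.backward()  # in training, gradients would flow into the first encoder and decoder
```
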
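Sketch 3. A sketch of the encoder/decoder pre-training of claims 3 and 4, reusing the Encoder and Decoder classes from Sketch 1 as the second encoder and second decoder. MSE as the reconstruction loss, the discriminator architecture, and the 0.1 weighting are assumptions; the claims require only a reconstruction-loss value and a realism degree produced by a realism discriminator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Stand-in realism discriminator; the architecture is an assumption."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        # Average patch scores into a single realism degree in (0, 1) per image.
        return torch.sigmoid(self.net(x)).mean(dim=(1, 2, 3))

encoder, decoder, disc = Encoder(), Decoder(), Discriminator()  # Encoder/Decoder from Sketch 1
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

x_i = torch.randn(8, 3, 256, 128)          # batch of i-th pedestrian images (dummy data)
generated = decoder(encoder(x_i))          # the 'generated image' of claim 3
recon_value = F.mse_loss(generated, x_i)   # reconstruction-loss value (MSE is an assumption)
realism = disc(generated).mean()           # realism degree of the generated image
loss = recon_value - 0.1 * realism         # claim 4 combination; the 0.1 weight is an assumption
opt.zero_grad()
loss.backward()
opt.step()
```

Once this loop meets a convergence condition, the second encoder and second decoder are adopted as the first encoder and first decoder of claim 1.
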
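Sketch 4. A sketch of the cluster-label bootstrapping of claim 5. The claim requires only that pedestrian features extracted by the second preset model be clustered into at least two labeled clusters; DBSCAN and its parameters are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def pseudo_label(features: np.ndarray) -> np.ndarray:
    """Cluster pedestrian features; the cluster labels act as pseudo identities.

    features: (num_images, feature_dim) array from the second preset model.
    """
    # DBSCAN and these parameter values are assumptions; the claim only
    # requires clustering into at least two labeled clusters.
    labels = DBSCAN(eps=0.6, min_samples=4, metric="euclidean").fit_predict(features)
    return labels  # -1 marks noise; labels >= 0 are usable cluster labels

feats = np.random.rand(1000, 128).astype(np.float32)  # dummy pedestrian features
cluster_labels = pseudo_label(feats)
```

The returned labels then serve as classification targets for training the second preset model to convergence, yielding the first preset model; under claim 6, the first and second pedestrian images of claim 1 are drawn from different clusters.
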
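Sketch 5. A sketch of the claim 7 inference step, reusing Sketch 1's Encoder as a stand-in feature extractor. The cosine metric and the 0.7 threshold standing in for the claim's "preset condition" are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def is_related(model, target_img, candidate_img, threshold=0.7):
    """Claim 7 inference: compare re-id features of a target and a candidate image.

    'model' is any trained feature extractor; the cosine similarity and the
    0.7 threshold (standing in for the 'preset condition') are assumptions.
    """
    feat_target = model(target_img).flatten(1)
    feat_candidate = model(candidate_img).flatten(1)
    similarity = F.cosine_similarity(feat_target, feat_candidate).item()
    return similarity >= threshold

# Example with Sketch 1's Encoder and dummy images; a trained re-id model
# would be used in practice.
related = is_related(Encoder(), torch.randn(1, 3, 256, 128), torch.randn(1, 3, 256, 128))
```
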
PCT/CN2022/075112 2021-04-07 2022-01-29 Model training method, pedestrian re-identification method, apparatus and electronic device WO2022213717A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US17/800,880 US20240221346A1 (en) 2021-04-07 2022-01-29 Model training method and apparatus, pedestrian re-identification method and apparatus, and electronic device
JP2022547887A JP7403673B2 (ja) 2021-04-07 2022-01-29 Model training method, pedestrian re-identification method, apparatus and electronic device
KR1020227026823A KR20220116331A (ko) 2021-04-07 2022-01-29 Model training method, pedestrian re-identification method, apparatus and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110372249.5A CN112861825B (zh) 2021-04-07 2021-04-07 Model training method, pedestrian re-identification method, apparatus and electronic device
CN202110372249.5 2021-04-07

Publications (1)

Publication Number Publication Date
WO2022213717A1 (zh)

Family

ID=75992221

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/075112 WO2022213717A1 (zh) 2021-04-07 2022-01-29 Model training method, pedestrian re-identification method, apparatus and electronic device

Country Status (2)

Country Link
CN (1) CN112861825B (zh)
WO (1) WO2022213717A1 (zh)

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN112861825B (zh) 2021-04-07 2023-07-04 Beijing Baidu Netcom Science and Technology Co., Ltd. Model training method, pedestrian re-identification method, apparatus and electronic device
CN113947693A (zh) 2021-10-13 2022-01-18 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and apparatus for acquiring a target object recognition model, and electronic device
CN113920404A (zh) 2021-11-09 2022-01-11 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method, image processing method, apparatus, electronic device and storage medium
CN114724090B (zh) 2022-05-23 2022-08-30 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method for a pedestrian re-identification model, and pedestrian re-identification method and apparatus

Citations (5)

Publication number Priority date Publication date Assignee Title
JP2016071502A (ja) 2014-09-29 2016-05-09 Secom Co., Ltd. Object identification device
CN111027421A (zh) 2019-11-26 2020-04-17 Xi'an Honggui Electronic Technology Co., Ltd. Graph-based transductive semi-supervised pedestrian re-identification method
CN111523413A (zh) 2020-04-10 2020-08-11 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and apparatus for generating face images
CN112069929A (zh) 2020-08-20 2020-12-11 Zhejiang Lab Unsupervised pedestrian re-identification method and apparatus, electronic device and storage medium
CN112861825A (zh) 2021-04-07 2021-05-28 Beijing Baidu Netcom Science and Technology Co., Ltd. Model training method, pedestrian re-identification method, apparatus and electronic device

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN109934177A (zh) 2019-03-15 2019-06-25 Aitecheng Information Technology Co., Ltd. Pedestrian re-identification method and system, and computer-readable storage medium
CN110443110B (zh) 2019-06-11 2023-08-25 Ping An Technology (Shenzhen) Co., Ltd. Face recognition method, apparatus, terminal and storage medium based on multi-channel cameras
CN112446270B (zh) 2019-09-05 2024-05-14 Huawei Cloud Computing Technologies Co., Ltd. Training method for a pedestrian re-identification network, and pedestrian re-identification method and apparatus
CN111833306B (zh) 2020-06-12 2024-02-13 Beijing Baidu Netcom Science and Technology Co., Ltd. Defect detection method and model training method for defect detection
CN112183637B (zh) 2020-09-29 2024-04-09 Zhongke Fangcun Zhiwei (Nanjing) Technology Co., Ltd. Neural-network-based illumination re-rendering method and system for single-light-source scenes
CN112418041B (zh) 2020-11-16 2022-04-15 Wuhan University Multi-pose face recognition method based on face frontalization
CN112560874B (zh) 2020-12-25 2024-04-16 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method, apparatus, device and medium for an image recognition model

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN115471875A (zh) 2022-10-31 2022-12-13 Zhejiang Lab Multi-bit-rate visual feature encoding and compression method and apparatus for pedestrian recognition
CN115471875B (zh) 2022-10-31 2023-03-03 Zhejiang Lab Multi-bit-rate visual feature encoding and compression method and apparatus for pedestrian recognition

Also Published As

Publication number Publication date
CN112861825B (zh) 2023-07-04
CN112861825A (zh) 2021-05-28

Legal Events

ENP Entry into the national phase (Ref document number: 20227026823; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2022547887; Country of ref document: JP; Kind code of ref document: A)
WWE WIPO information: entry into national phase (Ref document number: 17800880; Country of ref document: US)
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22783788; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 22783788; Country of ref document: EP; Kind code of ref document: A1)
