WO2023115911A1 - Object re-identification method and apparatus, electronic device, storage medium, and computer program product - Google Patents


Info

Publication number
WO2023115911A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
category
loss
sample
network
Prior art date
Application number
PCT/CN2022/104715
Other languages
French (fr)
Chinese (zh)
Inventor
王皓琦
王新江
钟志权
张伟
Original Assignee
上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Publication of WO2023115911A1 publication Critical patent/WO2023115911A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/762: using clustering, e.g. of similar faces in social networks
    • G06V 10/764: using classification, e.g. of video objects
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to an object re-identification method and device, electronic equipment, storage media and computer program products.
  • Re-identification technology is widely used in various projects, such as re-identification of people, vehicles, and objects.
  • in practical applications, new situations may appear at any time, and correspondingly, previously unseen data will be generated.
  • traditional re-identification algorithms require a large number of annotated samples for training; when the data set shifts or the domain shifts, new data or samples from the new domain must be re-labeled, which consumes considerable manpower and material resources.
  • unsupervised re-identification methods in the related art often produce re-identification results of low accuracy due to influences such as scene variation.
  • Embodiments of the present disclosure propose an object re-identification method and device, electronic equipment, a storage medium, and a computer program product, aiming to improve the accuracy of re-identification results through a re-identification model obtained through unsupervised training.
  • an object re-identification method including:
  • inputting an image to be recognized and an image set into a re-identification network to obtain a re-identification result, where, if a target candidate image exists in the image set, the re-identification result includes the target candidate image, and an object included in the target candidate image matches the target object;
  • the re-identification network is obtained through two-stage training: the first-stage training process is implemented according to at least one sample image and the first category label of each of the sample images, and the second-stage training process is implemented according to the at least one sample image and the pseudo-label and first category label of each of the sample images, where the pseudo-label of each sample image is determined based on the re-identification network after the first-stage training process, and the first category label represents the category of the corresponding image.
  • an object re-identification device including:
  • An image determination module configured to determine an image to be recognized including a target object
  • a set determination module configured to determine a set of images comprising at least one candidate image, each of said candidate images comprising an object
  • a re-identification module configured to input the image to be recognized and the image set into a re-identification network to obtain a re-identification result, where, if a target candidate image exists in the image set, the re-identification result includes the target candidate image, and the object included in the target candidate image matches the target object;
  • the re-identification network is obtained through two-stage training: the first-stage training process is implemented according to at least one sample image and the first category label of each of the sample images, and the second-stage training process is implemented according to the at least one sample image and the pseudo-label and first category label of each of the sample images, where the pseudo-label of each sample image is determined based on the re-identification network after the first-stage training process, and the first category label represents the category of the corresponding image.
  • an electronic device including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory, to perform the above method.
  • a computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
  • a computer program product includes a non-transitory computer-readable storage medium storing a computer program, and when the computer program is read and executed by a computer, the computer performs part or all of the steps of the methods described in the embodiments of the present disclosure.
  • the computer program product may be a software installation package.
  • the performance of the re-identification network is improved through two-stage training, thereby improving the accuracy of the recognition result.
  • FIG. 1 shows a flowchart of an object re-identification method according to an embodiment of the present disclosure
  • Fig. 2 shows a flow chart of training a re-identification network according to an embodiment of the present disclosure
  • Fig. 3 shows a schematic diagram of a preset image according to an embodiment of the present disclosure
  • Fig. 4 shows a schematic diagram of a sample image according to an embodiment of the present disclosure
  • Fig. 5 shows a schematic diagram of determining a sample graph according to an embodiment of the present disclosure
  • FIG. 6 shows a schematic diagram of a first-stage training process of a re-identification network according to an embodiment of the present disclosure
  • FIG. 7 shows a schematic diagram of a second-stage training process of a re-identification network according to an embodiment of the present disclosure
  • Fig. 8 shows a schematic diagram of an object re-identification device according to an embodiment of the present disclosure
  • Fig. 9 shows a schematic diagram of an electronic device according to an embodiment of the present disclosure.
  • Fig. 10 shows a schematic diagram of another electronic device according to an embodiment of the present disclosure.
  • the object re-identification method in the embodiment of the present disclosure may be executed by an electronic device such as a terminal device or a server.
  • the terminal device may be any mobile or fixed terminal such as user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device.
  • the server can be a single server or a server cluster composed of multiple servers. Any such electronic device can implement the object re-identification method of the embodiments of the present disclosure by having its processor call computer-readable instructions stored in the memory.
  • the object re-identification method of the embodiment of the present disclosure can be applied to re-identify any object, such as a person, a vehicle, and an animal.
  • the re-identification method can search for images or video frames containing a specific object among multiple images or video frame sequences, and can be applied to scenarios such as searching for a specific person in images collected by multiple cameras, or tracking objects such as pedestrians and vehicles.
  • Fig. 1 shows a flowchart of an object re-identification method according to an embodiment of the present disclosure.
  • the object re-identification method of the embodiment of the present disclosure may include the following steps S10 to S30.
  • Step S10 determining the image to be recognized including the target object.
  • the image to be recognized may be an image directly obtained by capturing the target object, or an image obtained by cropping, from a captured image, the area where the target object is located.
  • the image to be recognized may be collected by an image acquisition device built into or connected to the electronic device, or directly received from another device.
  • the target object can be any movable or non-movable object such as people, animals, vehicles or even furniture.
  • Step S20 determining an image set including at least one candidate image, each of which includes an object.
  • an image set used as a basis for re-identification of the image to be recognized is determined, including at least one candidate image for matching with the image to be recognized.
  • the image collection may be pre-stored in the electronic device, or in a database connected to the electronic device.
  • each candidate image is obtained by capturing an object of the same kind as the target object, and may be an image obtained by directly capturing the object, or an image obtained by cropping, from a captured image, the area where the object is located. That is, the object in each candidate image is of the same kind as the target object: for example, when the target object is a person, the object in each candidate image is also a person; when the target object is a vehicle, the object in each candidate image is also a vehicle.
  • each candidate image in the image set also has a corresponding second category label, which is used to characterize the category of the object in the candidate image.
  • the second category label may be identity information such as the object's name, phone number, and ID card number.
  • the second category label may be the vehicle's license plate number, vehicle owner information, driving certificate number, and the like.
  • Step S30 input the image to be recognized and the set of images into a re-recognition network to obtain a re-recognition result.
  • the image to be recognized and the image set are input into the re-identification network, which determines, among the multiple candidate images, the candidate image whose object matches the target object and uses it as the target candidate image to obtain the re-identification result. That is, when a target candidate image whose included object matches the target object exists, the target candidate image may be included in the re-identification result.
  • the re-identification result may also include the category of the target object. That is, after the target candidate image is determined, the second category label corresponding to the target candidate image is also determined as the second category label of the image to be recognized.
  • the detailed process of determining the re-identification result through the re-identification network may be as follows: input the image to be recognized and the image set into the re-identification network; extract, through the re-identification network, the target object features of the image to be recognized and the candidate object features of each candidate image; then determine the similarity between each candidate image and the image to be recognized according to the target object features and each set of candidate object features. In response to the similarity between a candidate image and the image to be recognized satisfying a preset condition, it is determined that the object in that candidate image matches the target object, and that candidate image is used as the target candidate image.
  • the target object features can be obtained by cropping the region where the target object is located in the image to be recognized and extracting features of that region through the feature extraction layer of the re-identification network. The candidate object features can likewise be obtained by cropping the area where the object is located in each candidate image and extracting features of that area through the same feature extraction layer.
  • the features of the target object and each candidate object can be represented by vectors, and the similarity can be obtained by calculating the distance between the two corresponding vectors in the feature space. The similarity can be calculated by the following Formula 1 (cosine similarity):

    similarity(A, B) = ( Σ_{i=1}^{n} A_i · B_i ) / ( √(Σ_{i=1}^{n} A_i²) · √(Σ_{i=1}^{n} B_i²) )    (Formula 1)

    where similarity(A, B) is the similarity between A and B; A is the target object feature vector; B is the candidate object feature vector; n is the number of elements in the target object feature and the candidate object feature; and i indexes the current element, that is, the position of the current element within the feature vectors.
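As an illustrative sketch (not part of the disclosure), the similarity of Formula 1 can be computed in plain Python; the function name is an assumption:

```python
import math

def cosine_similarity(a, b):
    """Similarity between target-object feature a and candidate-object feature b,
    computed as the cosine of the angle between the two feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same number of elements n")
    dot = sum(x * y for x, y in zip(a, b))      # sum over A_i * B_i
    norm_a = math.sqrt(sum(x * x for x in a))   # Euclidean norm of A
    norm_b = math.sqrt(sum(y * y for y in b))   # Euclidean norm of B
    return dot / (norm_a * norm_b)
```

Identical vectors yield a similarity of 1, and orthogonal vectors yield 0, which matches the intended use of picking the closest candidate in feature space.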
  • the preset condition may be that the similarity value is the largest and greater than a similarity threshold, that is, the candidate image with the largest similarity value that also exceeds the similarity threshold is determined as the target candidate image.
  • the second category label of the target candidate image is determined as the second category label of the image to be recognized, and a re-identification result including the target candidate image and the corresponding second category label is determined.
  • when no candidate image has a similarity satisfying the preset condition, the category of the target object in the current image to be recognized is a new category, and the re-identification result is determined to be a new category.
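The selection rule above (largest similarity, and above the similarity threshold, otherwise a new category) can be sketched as follows; the function name and the threshold value are illustrative assumptions, not values from the disclosure:

```python
def select_target_candidate(sims, threshold=0.5):
    """Pick the candidate with the largest similarity; if even the best
    similarity does not exceed the threshold, return None to signal that
    the target object belongs to a new category.
    sims: list of (candidate_id, similarity) pairs."""
    if not sims:
        return None
    best_id, best_sim = max(sims, key=lambda pair: pair[1])
    return best_id if best_sim > threshold else None
```

When a candidate id is returned, its second category label would also be assigned to the image to be recognized, as described above.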
  • the re-identification network in the embodiment of the present disclosure is obtained through two-stage training.
  • the first-stage training process is implemented according to at least one sample image and the first category label of each sample image
  • the second-stage training process is implemented according to the at least one sample image, the pseudo-label of each sample image, and the first category label; the pseudo-label of each sample image is determined based on the re-identification network after the first-stage training process, and the first category label represents the category of the corresponding image.
  • the sample images are images that have not been manually labeled.
  • Fig. 2 shows a flow chart of training a re-identification network according to an embodiment of the present disclosure.
  • the training process of the re-identification network in the embodiment of the present disclosure may include the following steps S40 to S80.
  • the electronic device that executes steps S40 to S80 may be an electronic device that executes the object re-identification method, or other electronic devices such as terminals or servers.
  • Step S40 determining at least one preset image including the object.
  • each preset image is obtained by capturing at least one object, and each preset image has at least one image frame for marking an area where the object is located and a first category label corresponding to each image frame.
  • each preset image has at least one image frame, which is used to mark the area where the object in the preset image is located.
  • the image frame can be annotated by any object annotation method.
  • the preset image may be input into a pre-trained object recognition model to recognize the position of each object included in the preset image and output at least one image frame representing the position of the object.
  • the first category label represents the category of the image region in the corresponding image frame, and can be determined according to the captured object. For example, when two people are captured by an image acquisition device to obtain a preset image, the position of each person in the preset image can be identified to obtain two corresponding image frames, and the image frames are assigned the first category labels Person 1 and Person 2, respectively.
  • At least one preset image may be determined through random sampling, that is, random sampling is performed on a set of preset images to obtain at least one preset image including an object.
  • the preset image set may be pre-stored in the electronic device for training the re-recognition network, or stored in other devices, and the electronic device for training the re-recognition network directly extracts at least one preset image from other electronic devices.
  • Fig. 3 shows a schematic diagram of a preset image according to an embodiment of the present disclosure.
  • the preset image 30 may include at least one object, and the preset image 30 also has an image frame for marking the position of the object.
  • the preset image 30 may have an image frame for representing the location of the face of at least one person.
  • the preset image 30 includes characters 1 and 2
  • the preset image 30 has a first image frame 31 representing the area where the face of character 1 is located, and a second image frame 32 representing the area where the face of character 2 is located.
  • the first category label corresponding to the first image frame 31 in the preset image 30 can be directly preset as character 1, and the first category label corresponding to the second image frame 32 as character 2.
  • Step S50 Determine at least one sample image corresponding to each preset image according to the corresponding at least one image frame.
  • At least one sample image corresponding to each preset image is determined according to an image frame corresponding to each preset image.
  • Each sample image is obtained by cropping a part of the preset image.
  • multiple sample images may be obtained by cutting out each image frame of the preset image.
  • at least one round of data augmentation may be performed on each preset image, and after each augmentation the area within at least one image frame may be cropped as a sample image. The augmentation may include translating the image frame, flipping the image frame, shrinking the image frame, and so on, so that the sample images cropped after each augmentation cover different regions of the object.
  • data preprocessing can be performed on each preset image before data augmentation. The preprocessing may include any processing methods such as format conversion, image brightness adjustment, and overall noise reduction, and at least one processing method may be pre-selected as required.
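A sketch of the augment-then-crop step described above, assuming simple translate / flip / shrink operations on a NumPy image array; the function name, shift offsets, and shrink ratio are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def augmented_crops(image, box, shifts=((0, 0), (4, 0), (0, 4)), shrink=0.9):
    """Produce several sample crops from one annotated image frame by
    translating, flipping, and shrinking the frame before cropping.
    `image` is an H x W array; `box` is (top, left, bottom, right)."""
    h, w = image.shape[:2]
    top, left, bottom, right = box
    crops = []
    # translate the frame, clamping it to the image bounds
    for dy, dx in shifts:
        t = max(0, min(h - 1, top + dy))
        l = max(0, min(w - 1, left + dx))
        b = max(t + 1, min(h, bottom + dy))
        r = max(l + 1, min(w, right + dx))
        crops.append(image[t:b, l:r])
    # horizontally flip the original frame's content
    crops.append(np.fliplr(image[top:bottom, left:right]))
    # shrink the frame toward its centre for a tighter crop
    pad_y = int((bottom - top) * (1 - shrink) / 2)
    pad_x = int((right - left) * (1 - shrink) / 2)
    crops.append(image[top + pad_y:bottom - pad_y, left + pad_x:right - pad_x])
    return crops
```

Each augmented crop covers a slightly different region of the same object, which is the stated purpose of the augmentation step.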
  • Fig. 4 shows a schematic diagram of a sample image according to an embodiment of the present disclosure.
  • multiple sample images corresponding to at least one object are obtained by cropping the preset image 30 .
  • the preset image 30 includes character 1 and character 2
  • the image frame on the preset image 30 is the first image frame 31 representing the area where the face of character 1 is located, and the second image frame 32 representing the area where the face of character 2 is located
  • at least one first object sample image 33 corresponding to the person 1 in the preset image 30 and at least one second object sample image 34 corresponding to the person 2 in the preset image 30 may be determined.
  • the content in the image frame is used to obtain the corresponding first object sample image 33 and second object sample image 34 .
  • Fig. 5 shows a schematic diagram of determining a sample graph according to an embodiment of the present disclosure.
  • the preset image set 50 including at least one preset image can be determined first, and random sampling 51 is performed on the preset image set 50 to obtain at least one preset image 52 including an object.
  • the order of randomly sampling preset images from the preset image set and extracting sample images from the preset images can be changed; that is, the preset images can be randomly sampled first and the sample images extracted afterwards, or sample images can first be extracted for each preset image in the set and the random sampling performed afterwards.
  • the sequence of image preprocessing and data enhancement during sample image extraction can also be adjusted.
  • the embodiments of the present disclosure can obtain multiple sample images corresponding to each object through data augmentation, greatly expanding the number of sample images.
  • the image processing can also be performed in parallel by a GPU (Graphics Processing Unit), so as to shorten the image processing time and reduce unnecessary background noise.
  • by randomly selecting preset images, the embodiments of the present disclosure alleviate the problem that the training loss is difficult to calculate when there are too many sample categories; random sampling also makes the extracted preset images representative, so that they reflect the characteristics of the preset image set.
  • Step S60 performing a first-stage training on the re-identification network according to the sample image and the corresponding first category label.
  • the re-identification network may be trained in the first stage directly according to each sample image and the first category label.
  • the re-identification network can output a first predicted category for an input sample image, which represents the network's prediction of the category of the object in the sample image. Since each sample image includes only one object, the real image category of the sample image is also its real object category; losses can be calculated from the real image category and the real object category of the sample image, each against the first predicted category, to obtain the total re-identification network loss used for network adjustment.
  • manual labeling of sample images is not required before training the re-identification network: the first category label of the image frame corresponding to a sample image can be directly used as the first category label of that sample image. That is, the category of the area where each object is located in the preset image is used as the real image category of the sample image cropped from that area.
  • the actual second category label need not be annotated according to the object category in each sample image.
  • the first category label of each sample image is directly used as the second category label representing the object category, and the real object category in the sample image is corrected in the second-stage training process.
  • for example, assume the first category labels of the image frames are "Person 1", "Person 2" and "Person 3", and a sample image is extracted from each corresponding image frame.
  • in supervised training, the identity of each person would be identified in detail, and the corresponding second category labels could be annotated as "Zhang San", "Li Si" and "Wang Wu".
  • in the embodiments of the present disclosure, however, the second category labels are simply "Person 1", "Person 2" and "Person 3".
  • the first-stage training process of the re-identification network includes determining the first category label corresponding to each sample image as the second category label, then inputting each sample image into the re-identification network and outputting the corresponding first predicted category.
  • a first network loss is determined according to the first category label, the second category label and the first predicted category corresponding to each sample image, and the re-identification network is adjusted according to the first network loss.
  • the first loss may be determined according to the first category label and the first predicted category corresponding to each sample image, and the second loss may be determined according to the second category label and the first predicted category corresponding to each sample image. The first network loss is then determined according to the first loss and the second loss, and the re-identification network is adjusted according to the first network loss.
  • the first network loss can be obtained by calculating the weighted sum of the first loss and the second loss.
  • the first loss may be a triplet loss, and the second loss may be a cross-entropy classification loss. That is, the first loss can be obtained by calculating the triplet loss over the first category labels and first predicted categories of the sample images, and the second loss can be obtained by calculating the cross-entropy classification loss over the second category labels and first predicted categories of the sample images.
  • the triplet loss is inversely proportional to the distance between samples of the same object category and proportional to the distance between samples of different object categories.
  • the triplet loss can be reduced through network adjustment, drawing samples of the same object category closer together and pushing samples of different object categories farther apart.
  • the cross-entropy classification loss is inversely proportional to the distance between samples of the same image category, and can be reduced through network adjustment to draw samples of the same image category closer together.
  • the triplet loss and the cross-entropy classification loss can be calculated by the following Formulas 2 and 3, respectively:

    L_th = Σ_a [ α + max_p d(f_a, f_p) − min_n d(f_a, f_n) ]_+    (Formula 2)

    where L_th is the triplet loss; P × K is the total number of sample images over which the sum runs; a is any sample image; p is, among the sample images with the same first category label as a, the one whose feature vector is farthest from the feature vector of a in the feature space; n is, among the sample images whose first category label differs from that of a, the one whose feature vector is closest to the feature vector of a; d(f_a, f_p) is the distance between the corresponding feature vectors; [x]_+ denotes max(x, 0); and α is a preset correction parameter.

    L = −(1/N) Σ_{i=1}^{N} Σ_{c=1}^{M} y_ic · log(p_ic)    (Formula 3)

    where L is the cross-entropy classification loss; N is the number of sample images; M is the number of second category labels; p_ic is the predicted probability that sample image i belongs to the first predicted category c; and y_ic takes the value 1 when the second category label of sample i is c, and 0 otherwise.
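Formulas 2 and 3 can be sketched in NumPy as follows, assuming a batch-hard reading of the triplet loss (farthest positive, nearest negative per anchor); the function names and the margin value are illustrative assumptions:

```python
import numpy as np

def triplet_loss(features, labels, margin=0.3):
    """Batch-hard triplet loss (Formula 2): for each anchor a, take the
    farthest positive p (same label) and the nearest negative n (different
    label). `margin` plays the role of the preset correction parameter α;
    0.3 is an illustrative value."""
    n = len(labels)
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    per_anchor = []
    for a in range(n):
        pos = dists[a][same[a] & (np.arange(n) != a)]
        neg = dists[a][~same[a]]
        if pos.size and neg.size:
            per_anchor.append(max(0.0, margin + pos.max() - neg.min()))
    return float(np.mean(per_anchor))

def cross_entropy_loss(probs, labels):
    """Cross-entropy classification loss (Formula 3): probs[i, c] is the
    predicted probability that sample i belongs to class c, and labels[i]
    is the second category label of sample i."""
    idx = np.arange(len(labels))
    return float(-np.mean(np.log(probs[idx, labels])))
```

The first network loss would then be a weighted sum of these two values, as described above.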
  • a first-stage adjustment may be performed on the re-identification network until the first network loss satisfies a first preset condition.
  • the first preset condition may be that the first network loss is smaller than a preset first threshold.
  • the re-identification network adjusted in the first stage can obtain a well-distributed feature space. That is to say, by adjusting the feature extraction layer of the re-identification network, the network can extract similar feature vectors for images of the same image category, and likewise for images of the same object category.
  • Fig. 6 shows a schematic diagram of a first-stage training process of a re-identification network according to an embodiment of the present disclosure.
  • in the first-stage training, each sample image 60 has a corresponding first category label 61 and second category label 62.
  • Each sample image 60 is input into the re-identification network 63 to obtain a first predicted category 64 , and a first loss 65 is calculated according to the first predicted category 64 and the first category label 61 of each sample image 60 .
  • the second loss 66 is calculated according to the first predicted category 64 and the second category label 62 of each sample image 60, and the re-identification network 63 is jointly adjusted according to the first loss 65 and the second loss 66.
  • the adjustment method may be to calculate the weighted sum of the first loss 65 and the second loss 66 to obtain the first network loss, and perform a first-stage adjustment on the re-identification network 63 until the first network loss meets the first preset condition.
  • Step S70 Determine the pseudo-label of the sample image according to the re-identification network whose training in the first stage is completed.
  • after the first-stage training, the re-identification network has a relatively well-distributed feature space, according to which the pseudo-label of each sample image can be determined. The pseudo-label of each sample image represents the category of the object in the sample image during the second-stage training. Pseudo-labels can be labels of any content, provided each pseudo-label uniquely represents one object category.
  • the pseudo-labels can be determined from the re-identification network trained in the first stage by inputting each sample image into the network and obtaining the feature vector extracted for each sample image.
  • the feature vectors of each sample image are clustered, and identification information uniquely corresponding to each cluster obtained after clustering is determined.
  • the identification information corresponding to each cluster is used as the pseudo-label of the sample image corresponding to each feature vector contained therein.
  • the clustering process can be realized based on the k-means clustering algorithm.
  • the unique identification information corresponding to each cluster can be preset or generated according to preset rules.
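  • The clustering step above can be sketched as follows; this minimal k-means implementation (a library routine such as sklearn.cluster.KMeans could equally be used, and the function name is an assumption) returns the cluster index of each sample as its pseudo-label:

```python
import numpy as np

def assign_pseudo_labels(features, k, n_iter=50, seed=0):
    """Cluster sample feature vectors with k-means; the index of the
    cluster each sample falls into serves as its pseudo-label."""
    rng = np.random.default_rng(seed)
    # initialise centres from k distinct samples
    centers = features[rng.choice(len(features), size=k, replace=False)]
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # assign every feature vector to its nearest cluster centre
        dists = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each centre to the mean of its assigned vectors
        for i in range(k):
            if (labels == i).any():
                centers[i] = features[labels == i].mean(axis=0)
    return labels
```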
  • Step S80 performing a second-stage training on the re-identification network obtained after the first-stage training according to the sample images and corresponding first category labels and pseudo-labels.
  • in the second-stage training process, the first category label corresponding to each sample image is used as the real image category, and the corresponding pseudo-label serves as the real object category.
  • the second-stage training of the re-identification network is performed based on the current real image category of each sample image, the real object category, and the sample image category predicted by the re-identification network. That is to say, each sample image can be input into the re-identification network obtained after the first-stage training to output the corresponding second predicted category.
  • a second network loss is determined according to the first class label, pseudo-label and second predicted class corresponding to each sample image, and the re-identification network is adjusted according to the second network loss.
  • the second-stage training process calculates losses from the real image category and the real object category of each sample image together with the second predicted category, and combines them into a total loss used to adjust the re-identification network. That is to say, adjusting the re-identification network in the second stage may include determining a third loss according to the first category label and the second predicted category corresponding to each sample image, and determining a fourth loss according to the pseudo-label and the second predicted category corresponding to each sample image. A second network loss is then determined from the third loss and the fourth loss, and the re-identification network is adjusted according to the second network loss. The second network loss may be obtained by calculating a weighted sum of the third loss and the fourth loss.
  • the third loss may be a triplet loss
  • the fourth loss may be a cross-entropy classification loss. That is, the third loss can be obtained by calculating the triplet loss over the first category label and the second predicted category of each sample image, and the fourth loss can be obtained by calculating the cross-entropy classification loss over the pseudo-label and the second predicted category of each sample image.
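  • The cross-entropy classification loss (the fourth loss) can be sketched as follows; here logits stand for the network's raw category scores, an illustrative assumption:

```python
import numpy as np

def cross_entropy_loss(logits, labels):
    """Mean cross-entropy between predicted category scores and
    integer ground-truth labels (e.g. the pseudo-labels)."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```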
  • the triplet loss is inversely proportional to the distance between samples of the same object category and proportional to the distance between samples of different object categories.
  • the triplet loss can be reduced by network conditioning, bringing the distance between samples of the same object category closer and the distance between samples of different object categories farther away.
  • the cross-entropy classification loss is inversely proportional to the distance between samples of the same image category, and the cross-entropy classification loss can be reduced through network adjustment to pull in the distance between samples of the same image category.
  • the calculation process of the third loss may be the same as that of the first loss
  • the calculation process of the fourth loss may be the same as that of the second loss.
  • the re-identification network can be adjusted in the second stage until the second network loss satisfies the second preset condition.
  • the second preset condition may be that the second network loss is smaller than a preset second threshold.
  • the re-identification network adjusted in the second stage can obtain a feature space with a more reasonable distribution. That is to say, by adjusting the feature extraction layer of the re-identification network, the network can more accurately extract similar feature vectors for images of the same image category, and likewise for images of the same object category.
  • Fig. 7 shows a schematic diagram of a second-stage training process of a re-identification network according to an embodiment of the present disclosure.
  • Each sample image 70 has a corresponding first category label 71 and pseudo-label 72.
  • Each sample image 70 is input into the re-identification network 73 to obtain the second predicted category 74 , and the third loss 75 is calculated according to the second predicted category 74 and the first category label 71 of each sample image 70 .
  • the fourth loss 76 is calculated according to the second predicted category 74 and the pseudo-label 72 of each sample image 70 , and the re-identification network 73 is jointly adjusted according to the third loss 75 and the fourth loss 76 .
  • the adjustment method may be to calculate the weighted sum of the third loss 75 and the fourth loss 76 to obtain the second network loss, and perform a second-stage adjustment on the re-identification network 73 until the second network loss meets the second preset condition.
  • a high-accuracy re-identification network can be obtained through unlabeled data training at low cost and quickly.
  • the re-identification network can accurately extract similar feature vectors for images of the same image category as well as for images of the same object category, thereby obtaining a reasonably distributed feature space.
  • through the two-stage training, the accuracy of the re-identification network is improved, and the image to be recognized can be accurately re-identified through the re-identification network to obtain an accurate re-identification result.
  • embodiments of the present disclosure further provide an object re-identification method, which is described in detail below:
  • the network input data include each sample image and its type, together with the bounding-box coordinates output by the detection network.
  • Each sample image has its own independent sample label, and the type of the sample image is called its category label. No manual annotation is provided for this label at the beginning; instead, some samples are randomly selected and their sample labels are used as category labels.
  • the data input by the network is divided into training data, query data, and gallery data.
  • each sample in the training data first undergoes specific data preprocessing, and data enhancement is applied using the defect bounding box output by the detection network to generate more equivalent data for each sample; these equivalent data share the sample label and category label of the original sample.
  • the algorithm feeds data and labels into a deep learning neural network, which is characterized by two loss functions.
  • the cross-entropy classification loss function uses the category labels (i.e., the randomly selected sample labels) as ground truth, and the triplet loss function uses the sample labels as ground truth.
  • after the first learning, a feature space is obtained; k-means clustering is performed on this feature space, all samples are assigned pseudo-labels accordingly, and the second learning is carried out.
  • the cross-entropy classification loss function uses the pseudo labels as ground truth, and the triplet loss function uses the sample labels as ground truth.
  • after the second learning, the feature space distribution is more comprehensive and reasonable. The query data and gallery data are preprocessed and input into the network, the similarity between the query data and the gallery data is calculated in the feature space, and this similarity is used to judge whether the query data belong to certain categories.
  • Data enhancement methods include translation, flipping, and reducing the proportion of the defect box within the crop so as to capture a larger field of view.
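  • These enhancements can be sketched as follows; the shift amount, flip flag, and expansion ratio are illustrative parameters (the disclosure does not fix their values), and the function name is an assumption:

```python
import numpy as np

def augment(image, box, shift=(4, 0), flip=True, expand=0.2):
    """Translate and flip the image, and enlarge the defect bounding
    box so the crop captures a larger field of view.
    `box` is (x1, y1, x2, y2) in pixel coordinates."""
    h, w = image.shape[:2]
    out = np.roll(image, shift, axis=(0, 1))  # translation (wrap-around)
    if flip:
        out = out[:, ::-1]                    # horizontal flip
    x1, y1, x2, y2 = box
    dx, dy = expand * (x2 - x1), expand * (y2 - y1)
    bigger = (max(0, x1 - dx), max(0, y1 - dy),
              min(w, x2 + dx), min(h, y2 + dy))
    return out, bigger
```

Each augmented copy would keep the sample label and category label of the original sample, as described above.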
  • a very important step in the algorithm is random sampling, in which the sample label of each sampled image is used as its category label for preliminary training.
  • This method avoids the cost of large-scale manual labeling, and alleviates the problem that an excessive number of category labels makes the cross-entropy classification loss difficult to compute.
  • the randomly selected sample labels are still representative, reflecting the appearance and characteristics of the data set.
  • the triplet loss function with hard sample mining takes, for each sample, the maximum distance to a positive sample (i.e., the hardest positive sample) and the minimum distance to a negative sample (i.e., the hardest negative sample) as the optimization target of the loss function, the goal being to reduce the distance between positive samples and increase the distance between negative samples so as to obtain a better feature space and ensure effective learning.
  • adding the background image (unblemished image) as a negative sample to the training sample can make the neural network better compare the difference between positive and negative samples, thereby improving the effect.
  • the triplet loss function is determined by the following formula four: L_tri = Σ_a max(0, max_{p∈P(a)} d(a, p) − min_{n∈N(a)} d(a, n) + α), where P(a) and N(a) are the positive and negative sample sets of anchor a, d(·,·) is the distance in feature space, and α is the margin.
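  • A sketch of the hard-mining triplet loss described above, operating on a batch of feature vectors (the margin value and function name are illustrative assumptions):

```python
import numpy as np

def batch_hard_triplet_loss(features, labels, margin=0.3):
    """For every anchor, take the farthest same-label sample (hardest
    positive) and the nearest different-label sample (hardest negative),
    and penalise when their gap is within the margin."""
    n = len(features)
    dists = np.linalg.norm(features[:, None] - features[None], axis=2)
    same = labels[:, None] == labels[None]
    idx = np.arange(n)
    losses = []
    for a in range(n):
        pos = dists[a][same[a] & (idx != a)]   # distances to positives
        neg = dists[a][~same[a]]               # distances to negatives
        if len(pos) and len(neg):
            losses.append(max(0.0, pos.max() - neg.min() + margin))
    return float(np.mean(losses)) if losses else 0.0
```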
  • k is an adjustable parameter defined by the user, and the number of categories can be further reduced according to the distribution of the data so that samples of the same category are drawn closer together. Moreover, after clustering, every sample can be assigned a pseudo-label, namely the label of the cluster to which it belongs. The samples that were not randomly drawn before are thereby added to the cross-entropy classification training, further optimizing the feature space distribution.
  • the k-means clustering optimization is determined by the following formula five: argmin_S Σ_{i=1}^{k} Σ_{x∈S_i} ‖x − μ_i‖², where S = {S_1, …, S_k} is the partition of all randomly sampled samples into k clusters, and μ_i is the mean of the samples in cluster i.
  • Stage 1 training uses randomly sampled sample labels as the true value of the cross-entropy loss function
  • stage 2 uses the pseudo-labels of all samples obtained after clustering as the true value of the cross-entropy loss function.
  • the number of labels obtained by random sampling and clustering can remain constant, or grow only slightly, as the data volume surges, ensuring that the computational cost of the cross-entropy does not increase sharply with the data.
  • After the model is trained, only the image to be detected and the image gallery need to be input into the network to obtain the matching result between the image to be detected and the samples in the gallery, from which it can be determined whether the image to be detected belongs to a certain type of sample and, if so, to which type.
  • the similarity is compared by calculating the cosine distance between samples in the feature space, and the similarity can be determined by the following formula six: cos(A, B) = (A · B) / (‖A‖ ‖B‖). If a gallery sample whose similarity with the query sample exceeds a preset threshold exists, the query sample is classified into that class; otherwise it is classified into a new class that does not belong to the gallery samples.
  • A and B represent the feature matrices of the samples.
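  • The similarity comparison can be sketched as follows; the threshold value and function names are illustrative assumptions, and a query that matches no gallery class is reported here as None:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_query(query, gallery, gallery_labels, threshold=0.8):
    """Assign the query to the class of its most similar gallery sample
    when that similarity exceeds the threshold; otherwise report a new
    class (returned as None)."""
    sims = [cosine_similarity(query, g) for g in gallery]
    best = int(np.argmax(sims))
    return gallery_labels[best] if sims[best] > threshold else None
```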
  • the object re-identification method in the embodiment of the present application can achieve the following technical effects:
  • This technique uses an unsupervised re-identification network to assist a classification network to learn to quickly classify images.
  • the network can re-identify and classify images misjudged as a certain category, improve the distribution of the feature space on the basis of the original classification, and refine the feature level, so that the network learns not merely a fuzzy category but the comparison and distribution between the samples of each category, which improves the original classification precision and recall.
  • the network uses rendering technology to process each image, and the data enhancement method greatly expands the sample size. Processing each image separately under such a mechanism is very time-consuming; rendering first and then cropping the box, together with GPU-accelerated optimization, can greatly shorten the image processing time and also reduce unnecessary background noise.
  • embodiments of the present disclosure also provide object re-identification apparatuses, electronic devices, computer-readable storage media, and programs, all of which can be used to implement any object re-identification method provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method sections.
  • Fig. 8 shows a schematic diagram of an object re-identification device according to an embodiment of the present disclosure.
  • the object re-identification apparatus of the embodiment of the present disclosure may include an image determination module 80 , a set determination module 81 and a re-identification module 82 .
  • An image determining module 80 configured to determine an image to be recognized including a target object
  • a set determination module 81 configured to determine an image set comprising at least one candidate image, each of which includes an object;
  • the re-identification module 82 is configured to input the image to be recognized and the image set into a re-identification network to obtain a re-identification result, and if there is a target candidate image in the image set, the re-identification result includes The target candidate image, the object included in the target candidate image matches the target object;
  • the re-identification network is obtained through two-stage training, the first-stage training process is implemented according to at least one sample image and the first category label of each of the sample images, and the second-stage training process is based on the at least one sample image,
  • and the pseudo-label and the first category label of each of the sample images; the pseudo-label of each of the sample images is determined based on the re-identification network after the first stage of the training process, and the first category label represents the category of the corresponding image.
  • each of the candidate images has a corresponding second category label, and the second category label represents the category of the object in the corresponding image;
  • the device also includes:
  • the label determination module is configured to determine the second category label corresponding to the target candidate image as the second category label of the image to be recognized.
  • the training process of the re-identification network includes:
  • determining at least one preset image including an object, each of the preset images having at least one image frame for marking the area where the object is located, and a first category label corresponding to each of the image frames;
  • the second-stage training is performed on the re-identification network obtained after the first-stage training according to the sample images and the corresponding first category labels and pseudo-labels.
  • the determining at least one preset image including an object includes:
  • Random sampling is performed on the set of preset images to obtain at least one preset image including an object.
  • the determining at least one sample image corresponding to each preset image according to the corresponding at least one image frame includes:
  • image preprocessing is performed on the preset images.
  • the first-stage training of the re-identification network according to the sample image and the corresponding first category label includes:
  • a first network loss is determined according to the first category label, the second category label and the first predicted category corresponding to each of the sample images, and the re-identification network is adjusted according to the first network loss.
  • the determining a first network loss according to the first category label, the second category label, and the first predicted category corresponding to each of the sample images, and adjusting the re-identification network according to the first network loss includes:
  • a first network loss is determined based on the first loss and the second loss, and the re-identification network is adjusted based on the first network loss.
  • the determining the pseudo-label of the sample image according to the re-identification network whose training is completed in the first stage includes:
  • Each of the sample images is input into the re-identification network that has been trained in the first stage to obtain a feature vector after feature extraction is performed on each of the sample images;
  • Clustering the feature vectors of each of the sample images, and determining identification information uniquely corresponding to each cluster obtained after clustering;
  • the identification information corresponding to each of the clusters is used as the pseudo-label of the sample image corresponding to each feature vector included in that cluster.
  • the clustering process is implemented based on a k-means clustering algorithm.
  • the second-stage training of the re-identification network obtained after the first-stage training according to the sample image and the corresponding first category label and pseudo-label includes:
  • a second network loss is determined according to the first class label, pseudo-label and second predicted class corresponding to each of the sample images, and the re-identification network is adjusted according to the second network loss.
  • the determining a second network loss according to the first category label, pseudo-label and second predicted category corresponding to each of the sample images, and adjusting the re-identification network according to the second network loss includes:
  • a second network loss is determined based on the third loss and the fourth loss, and the re-identification network is adjusted based on the second network loss.
  • the first loss and/or the third loss is a triplet loss
  • the second loss and/or the fourth loss is a cross-entropy classification loss
  • the re-identification module 82 includes:
  • the image input submodule is configured to input the image to be recognized and the image set into a re-identification network, and extract, through the re-identification network, the target object features of the image to be recognized and the candidate object features of each of the candidate images;
  • a similarity matching submodule configured to determine the similarity between each of the candidate images and the image to be recognized according to the characteristics of the target object and the characteristics of each of the candidate objects;
  • the result output submodule is configured to, in response to the similarity between the candidate image and the image to be recognized satisfying a preset condition, determine that the object in the candidate image matches the target object, and use the candidate image as the target candidate image Get re-identification results.
  • the preset condition includes that the similarity value is the largest and greater than a similarity threshold.
  • an electronic device including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory, to perform the above method.
  • the functions or modules included in the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments, and for detailed implementation, refer to the descriptions of the above method embodiments.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which computer program instructions are stored, and the above-mentioned method is implemented when the computer program instructions are executed by a processor.
  • Computer readable storage media may be volatile or nonvolatile computer readable storage media.
  • An embodiment of the present disclosure also proposes an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
  • An embodiment of the present disclosure also provides a computer program product, including computer-readable codes, or a non-volatile computer-readable storage medium carrying computer-readable codes, when the computer-readable codes are stored in a processor of an electronic device When running in the electronic device, the processor in the electronic device executes the above method.
  • Electronic devices may be provided as terminals, servers, or other forms of devices.
  • FIG. 9 shows a schematic diagram of an electronic device 800 according to an embodiment of the present disclosure.
  • the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
  • electronic device 800 may include one or more of the following components: processing component 802, memory 804, power supply component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and a communication component 816 .
  • the processing component 802 generally controls the overall operations of the electronic device 800, such as those associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the above method. Additionally, processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components. For example, processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802 .
  • the memory 804 is configured to store various types of data to support operations at the electronic device 800 . Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like.
  • the memory 804 can be implemented by any type of volatile or non-volatile storage device or their combination, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic or Optical Disk.
  • the power supply component 806 provides power to various components of the electronic device 800 .
  • Power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for electronic device 800 .
  • the multimedia component 808 includes a screen providing an output interface between the electronic device 800 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense a boundary of a touch or swipe action, but also detect duration and pressure associated with the touch or swipe action.
  • multimedia component 808 includes a front camera and/or rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
  • the audio component 810 is configured to output and/or input audio signals.
  • the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 800 is in operation modes, such as call mode, recording mode and voice recognition mode. Received audio signals may be further stored in memory 804 or sent via communication component 816 .
  • the audio component 810 also includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: a home button, volume buttons, start button, and lock button.
  • Sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of electronic device 800 .
  • the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor component 814 can also detect a change in position of the electronic device 800 or one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and temperature changes of the electronic device 800.
  • Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 814 may also include an optical sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications.
  • the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
  • the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
  • the electronic device 800 can access a wireless network based on a communication standard, such as a wireless network (WiFi), a second generation mobile communication technology (2G) or a third generation mobile communication technology (3G), or a combination thereof.
  • the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the methods described above.
  • a non-volatile computer-readable storage medium such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to implement the above method.
  • FIG. 10 shows a schematic diagram of another electronic device 1900 according to an embodiment of the present disclosure.
  • electronic device 1900 may be provided as a server.
  • electronic device 1900 includes processing component 1922 , which further includes one or more processors, and memory resources represented by memory 1932 for storing instructions executable by processing component 1922 , such as application programs.
  • the application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above method.
  • Electronic device 1900 may also include a power supply component 1926 configured to perform power management of electronic device 1900, a wired or wireless network interface 1950 configured to connect electronic device 1900 to a network, and an input-output (I/O) interface 1958 .
  • the electronic device 1900 can operate based on the operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server TM ), the graphical user interface-based operating system (Mac OS X TM ) introduced by Apple Inc., and the multi-user and multi-process computer operating system (Unix TM ), a free and open source Unix-like operating system (Linux TM ), an open source Unix-like operating system (FreeBSD TM ), or the like.
  • a non-transitory computer-readable storage medium such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to implement the above method.
  • the present disclosure can be a system, method and/or computer program product.
  • a computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present disclosure.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves having instructions recorded thereon, and any suitable combination of the above.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.
  • Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be made to an external computer (for example, through the Internet using an Internet service provider).
  • an electronic circuit such as a programmable logic circuit, field programmable gate array (FPGA), or programmable logic array (PLA)
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create an apparatus for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium, and these instructions cause a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the computer program product can be specifically realized by means of hardware, software or a combination thereof.
  • in an optional embodiment, the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention relate to an object re-identification method and apparatus, an electronic device, a storage medium, and a computer program product. The method comprises: determining an image to be identified that comprises a target object and an image set comprising a candidate image, each candidate image comprising at least one object; and inputting the image to be identified and the image set into a re-identification network to obtain a target candidate image comprising an object matched with the target object. The re-identification network is obtained by means of two-stage training, the first-stage training process is implemented according to a sample image and a corresponding first category label, and the second-stage training process is implemented according to the sample image, a corresponding pseudo label, and the first category label, the pseudo label of each sample image being determined according to the re-identification network after first training. According to the present invention, the performance of the re-identification network is improved by means of two-stage training, thereby improving the accuracy of an identification result.

Description

Object Re-identification Method and Apparatus, Electronic Device, Storage Medium, and Computer Program Product
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese Patent Application No. 202111601354.8, filed on December 24, 2021 and entitled "Object Re-identification Method and Apparatus, Electronic Device, and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to an object re-identification method and apparatus, an electronic device, a storage medium, and a computer program product.
Background
Re-identification technology is widely applied in various projects, such as the re-identification of people, vehicles, and objects. In real, open-world applications, new situations may arise at any time, producing data that has never been seen before. Traditional re-identification algorithms require large amounts of annotated samples for training, and when the dataset or domain shifts, samples from the new data or new domain must be re-annotated, which consumes substantial manpower and material resources. Meanwhile, unsupervised re-identification methods in the related art often produce low-accuracy re-identification results due to the influence of scenes and other factors.
Summary
Embodiments of the present disclosure provide an object re-identification method and apparatus, an electronic device, a storage medium, and a computer program product, aiming to improve the accuracy of re-identification results through a re-identification model obtained by unsupervised training.
According to a first aspect of the embodiments of the present disclosure, an object re-identification method is provided, including:
determining an image to be recognized that includes a target object;
determining an image set including at least one candidate image, each of the candidate images including an object;
inputting the image to be recognized and the image set into a re-identification network to obtain a re-identification result, where, in a case that a target candidate image exists in the image set, the re-identification result includes the target candidate image, and an object included in the target candidate image matches the target object;
wherein the re-identification network is obtained through two-stage training: the first-stage training process is implemented according to at least one sample image and a first category label of each sample image, and the second-stage training process is implemented according to the at least one sample image, a pseudo label of each sample image, and the first category label, the pseudo label of each sample image being determined based on the re-identification network obtained after the first-stage training process, and the first category label representing the category of the corresponding image.
According to a second aspect of the embodiments of the present disclosure, an object re-identification apparatus is provided, including:
an image determination module configured to determine an image to be recognized that includes a target object;
a set determination module configured to determine an image set including at least one candidate image, each of the candidate images including an object;
a re-identification module configured to input the image to be recognized and the image set into a re-identification network to obtain a re-identification result, where, in a case that a target candidate image exists in the image set, the re-identification result includes the target candidate image, and an object included in the target candidate image matches the target object;
wherein the re-identification network is obtained through two-stage training: the first-stage training process is implemented according to at least one sample image and a first category label of each sample image, and the second-stage training process is implemented according to the at least one sample image, a pseudo label of each sample image, and the first category label, the pseudo label of each sample image being determined based on the re-identification network obtained after the first-stage training process, and the first category label representing the category of the corresponding image.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory to perform the above method.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored; when the computer program instructions are executed by a processor, the above method is implemented.
According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided. The computer program product includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, some or all of the steps of the methods described in the embodiments of the present disclosure are implemented. The computer program product may be a software installation package.
In the embodiments of the present disclosure, the performance of the re-identification network is improved through two-stage training, thereby improving the accuracy of recognition results.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Description of Drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the embodiments of the present disclosure.
Fig. 1 shows a flowchart of an object re-identification method according to an embodiment of the present disclosure;
Fig. 2 shows a flowchart of training a re-identification network according to an embodiment of the present disclosure;
Fig. 3 shows a schematic diagram of a preset image according to an embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of a sample image according to an embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of determining a sample graph according to an embodiment of the present disclosure;
Fig. 6 shows a schematic diagram of the first-stage training process of a re-identification network according to an embodiment of the present disclosure;
Fig. 7 shows a schematic diagram of the second-stage training process of a re-identification network according to an embodiment of the present disclosure;
Fig. 8 shows a schematic diagram of an object re-identification apparatus according to an embodiment of the present disclosure;
Fig. 9 shows a schematic diagram of an electronic device according to an embodiment of the present disclosure;
Fig. 10 shows a schematic diagram of another electronic device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features, and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings indicate elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
In addition, numerous details are given in the following specific embodiments in order to better illustrate the present disclosure. Those skilled in the art will understand that the present disclosure may be practiced without certain of these details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.
In a possible implementation, the object re-identification method of the embodiments of the present disclosure may be executed by an electronic device such as a terminal device or a server. The terminal device may be any mobile or fixed terminal, such as user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device. The server may be a single server or a server cluster composed of multiple servers. Any electronic device can implement the object re-identification method of the embodiments of the present disclosure by having its processor call computer-readable instructions stored in a memory.
The object re-identification method of the embodiments of the present disclosure can be applied to re-identifying any object, such as people, vehicles, and animals. The re-identification method can search multiple images or video frame sequences for images or video frames that include a specific object, and can be applied to scenarios such as searching for a specific person among images collected by multiple cameras, or tracking objects such as pedestrians and vehicles.
Fig. 1 shows a flowchart of an object re-identification method according to an embodiment of the present disclosure. As shown in Fig. 1, the object re-identification method of the embodiment of the present disclosure may include the following steps S10 to S30.
Step S10: determining an image to be recognized that includes a target object.
In a possible implementation, the image to be recognized may be an image obtained by directly capturing the target object, or an image obtained by cropping, from a captured image, the region where the target object is located. The image to be recognized may be collected by an image acquisition device built into or connected to the electronic device, or received directly from another device. The target object may be any movable or immovable object, such as a person, an animal, a vehicle, or even furniture.
Step S20: determining an image set including at least one candidate image, each of which includes an object.
In a possible implementation, an image set used as the basis for re-identifying the image to be recognized is determined, which includes at least one candidate image for matching against the image to be recognized. Optionally, the image set may be pre-stored in the electronic device, or in a database connected to the electronic device. Each candidate image is obtained by capturing an object of the same kind as the target object, and may be an image obtained by directly capturing the object, or an image obtained by cropping the region where the object is located from a captured image. That is, the object in each candidate image is of the same kind as the target object. For example, when the target object is a person, the objects in the candidate images are also people; when the target object is a vehicle, the objects in the candidate images are also vehicles.
Optionally, each candidate image in the image set also has a corresponding second category label, which is used to represent the category of the object in the candidate image. For example, when the object in the candidate image is a person, the second category label may be identity information such as the person's name, phone number, or ID card number. When the object in the candidate image is a vehicle, the second category label may be the vehicle's license plate number, owner information, registration certificate number, and the like.
Step S30: inputting the image to be recognized and the image set into a re-identification network to obtain a re-identification result.
In a possible implementation, the image to be recognized and the image set are input into the re-identification network, which determines, among the multiple candidate images, a candidate image whose object matches the target object, and takes that candidate image as the target candidate image to obtain the re-identification result. That is, when a target candidate image whose object matches the target object exists, the re-identification result may include the target candidate image. Optionally, in addition to the target candidate image, the re-identification result may also include the category of the target object; that is, after the target candidate image is determined, the second category label corresponding to the target candidate image is also determined as the second category label of the image to be recognized.
In some embodiments, the detailed process of determining the re-identification result through the re-identification network may be as follows: the image to be recognized and the image set are input into the re-identification network, and the target object feature of the image to be recognized and the candidate object feature of each candidate image are extracted through the re-identification network; then the similarity between each candidate image and the image to be recognized is determined according to the target object feature and each candidate object feature. In response to the similarity between a candidate image and the image to be recognized satisfying a preset condition, it is determined that the object in the candidate image matches the target object, and the candidate image is taken as the target candidate image.
Optionally, when the image to be recognized is an image obtained by directly capturing the target object, the target object feature may be obtained by cropping the region where the target object is located in the image to be recognized and extracting the features of that region through the feature extraction layer of the re-identification network. Similarly, when a candidate image is an image obtained by directly capturing an object, the candidate object feature may also be obtained by cropping the region where the object is located in the candidate image and extracting the features of that region through the feature extraction layer of the re-identification network. The target object feature and each candidate object feature can be represented by vectors, and the similarity can be obtained by computing the distance between the two corresponding vectors in the feature space. The similarity can be calculated by the following Formula 1:
$$\mathrm{similarity}(A,B)=\frac{A\cdot B}{\|A\|\,\|B\|}=\frac{\sum_{i=1}^{n}A_{i}\times B_{i}}{\sqrt{\sum_{i=1}^{n}A_{i}^{2}}\times\sqrt{\sum_{i=1}^{n}B_{i}^{2}}}\qquad\text{(Formula 1)}$$
where similarity(A, B) is the similarity between A and B, A is the target object feature, B is the candidate object feature, n is the number of elements in the target object feature and the candidate object feature, and i denotes the position of the current element within the target object feature and the candidate object feature, i.e., which element the current element is.
In a possible implementation, the preset condition may be that the similarity value is the largest and greater than a similarity threshold; that is, the candidate image whose similarity value is the largest and exceeds the similarity threshold is determined as the target candidate image. In some embodiments, the second category label of the target candidate image is determined as the second category label of the image to be recognized, and a re-identification result including the target candidate image and the corresponding second category label is determined. Optionally, when no similarity value satisfies the preset condition, i.e., when there is no target candidate image whose object matches the target object, the category of the target object in the current image to be recognized may be determined to be a new category, and the re-identification result is determined to be the new category.
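The similarity computation of Formula 1 and the "largest value above a threshold" matching condition described above can be sketched as follows. This is a minimal illustration rather than the disclosed network implementation; the feature vectors, function names, and threshold value are assumptions for demonstration only.

```python
import math

def cosine_similarity(a, b):
    # Formula 1: sum(A_i * B_i) / (sqrt(sum(A_i^2)) * sqrt(sum(B_i^2)))
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_target(query_feature, candidate_features, threshold=0.8):
    """Return the index of the candidate whose similarity to the query is the
    largest and exceeds the threshold; otherwise return None, corresponding to
    treating the target object as a new category."""
    sims = [cosine_similarity(query_feature, f) for f in candidate_features]
    best = max(range(len(sims)), key=lambda i: sims[i])
    return best if sims[best] > threshold else None
```

In practice the features would come from the re-identification network's feature extraction layer; here plain lists of floats stand in for those vectors.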
In a possible implementation, the re-identification network of the embodiments of the present disclosure is obtained through two-stage training. The first-stage training process is implemented according to at least one sample image and the first category label of each sample image; the second-stage training process is implemented according to the at least one sample image, the pseudo label of each sample image, and the first category label. The pseudo label of each sample image is determined based on the re-identification network obtained after the first-stage training process, and the first category label represents the category of the corresponding image. The sample images are sample images that have not been manually annotated.
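The text above does not yet specify how pseudo labels are derived from the stage-one network. A common approach in unsupervised re-identification is to extract features of the unlabeled sample images with the trained network and cluster them, using cluster indices as pseudo labels; the greedy cosine-similarity clustering below is a sketch of that idea, and both the clustering scheme and the threshold are assumptions, not the disclosed method.

```python
import math

def _cosine(a, b):
    # Same similarity measure as Formula 1.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def assign_pseudo_labels(features, sim_threshold=0.9):
    """Greedy clustering sketch: each feature (as extracted by the stage-one
    network) joins the most similar existing cluster if its similarity to that
    cluster's first member exceeds the threshold; otherwise it starts a new
    cluster. The cluster index serves as the pseudo label for stage two."""
    anchors, labels = [], []
    for f in features:
        best, best_sim = None, sim_threshold
        for idx, anchor in enumerate(anchors):
            s = _cosine(f, anchor)
            if s >= best_sim:
                best, best_sim = idx, s
        if best is None:
            anchors.append(list(f))       # first member of a new cluster
            labels.append(len(anchors) - 1)
        else:
            labels.append(best)
    return labels
```

Real systems typically use a more robust clustering algorithm (e.g., density-based clustering) over the whole feature set; the greedy version keeps the sketch self-contained.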
图2示出根据本公开实施例的一种训练重识别网络的流程图。如图2所示,本公开实施例重识别网络的训练过程可以包括以下步骤S40至步骤S80。可选地,执行步骤S40至S80的电子设备可以为执行对象重识别方法的电子设备,或者为其他的终端或服务器等电子设备。Fig. 2 shows a flow chart of training a re-identification network according to an embodiment of the present disclosure. As shown in FIG. 2 , the training process of the re-identification network in the embodiment of the present disclosure may include the following steps S40 to S80. Optionally, the electronic device that executes steps S40 to S80 may be an electronic device that executes the object re-identification method, or other electronic devices such as terminals or servers.
步骤S40、确定至少一个包括对象的预设图像。Step S40, determining at least one preset image including the object.
在一种可能的实现方式中,每个预设图像通过采集至少一个对象得到,每个预设图像具有至少一个用于标注对象所在区域的图像框和每个图像框对应的第一类别标签。其中,每个预设图像具有至少一个图像框,用于标注预设图像中对象所在的区域。该图像框可以通过任意对象标注方式标注得到。例如,可以将预设图像输入预先训练得到的对象识别模型,以识别预设图像中包括的对象位置,输出至少一个表征对象位置的图像框。第一类别标签表征对应图像框内图像区域的类别,可以根据采集的对象确定。例如,当通过图像采集装置采集两个人物得到预设图像时,可以识别预设图像中每个人物所在位置得到两个对应的图像框,并为每个图像框分配对应的第一类别标签为人物1和人物2。In a possible implementation manner, each preset image is obtained by capturing at least one object, and each preset image has at least one image frame for marking an area where the object is located and a first category label corresponding to each image frame. Wherein, each preset image has at least one image frame, which is used to mark the area where the object in the preset image is located. The image frame can be annotated by any object annotation method. For example, the pre-trained object recognition model may be input into the pre-trained image to recognize the position of the object included in the pre-set image, and output at least one image frame representing the position of the object. The first category label represents the category of the image region in the corresponding image frame, and can be determined according to the collected object. For example, when two characters are collected by an image acquisition device to obtain a preset image, the position of each character in the preset image can be identified to obtain two corresponding image frames, and each image frame is assigned a corresponding first category label as Person 1 and Person 2.
可选地,至少一个预设图像可以通过随机抽样确定,即对预设图像集合进行随机抽样得到至少一个包括对象的预设图像。其中,预设图像集合可以预先存储在训练重识别网络的电子设备中,或存储在其他设备中,由训练重识别网络的电子设备在其他电子设备中直接抽取至少一个预设图像。Optionally, at least one preset image may be determined through random sampling, that is, random sampling is performed on a set of preset images to obtain at least one preset image including an object. Wherein, the preset image set may be pre-stored in the electronic device for training the re-recognition network, or stored in other devices, and the electronic device for training the re-recognition network directly extracts at least one preset image from other electronic devices.
Fig. 3 shows a schematic diagram of a preset image according to an embodiment of the present disclosure. As shown in Fig. 3, the preset image 30 may include at least one object, and the preset image 30 also has image frames for marking object positions. For example, when the preset image 30 is obtained by capturing at least one person, the preset image 30 may have an image frame representing the location of each person's face. When the preset image 30 includes Person 1 and Person 2, the preset image 30 has a first image frame 31 representing the region where the face of Person 1 is located, and a second image frame 32 representing the region where the face of Person 2 is located. Optionally, since the preset image 30 is obtained by capturing two persons, the first category label corresponding to the first image frame 31 can be directly preset as Person 1, and the first category label corresponding to the second image frame 32 as Person 2, while the positions of the two persons are being marked.
Step S50: Determine at least one sample image corresponding to each preset image according to the corresponding at least one image frame.
In a possible implementation, after the at least one preset image is determined, at least one sample image corresponding to each preset image is determined according to the image frames of that preset image. Each sample image is obtained by cropping a partial region of a preset image, and multiple sample images may be cropped from each image frame of a preset image. Optionally, data augmentation may be applied to each preset image at least once, and after each augmentation the region inside at least one image frame is cropped out as a sample image. The augmentation may include translating the image frame, flipping it, shrinking its scale, and so on, so that the sample images cropped after each augmentation cover different regions of the object.
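As an illustration, the frame-level augmentation described above (translate the frame, flip the crop, shrink the frame, then cut out the region) can be sketched as follows; the jitter magnitudes, function names, and array-based image representation are assumptions for illustration, not details from the disclosure:

```python
import random
import numpy as np

def jitter_box(box, img_w, img_h, max_shift=10, shrink=0.9):
    """Randomly translate and shrink an image frame (x1, y1, x2, y2),
    clamped to the image bounds. Shift range and shrink factor are assumed."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 + random.randint(-max_shift, max_shift)
    cy = (y1 + y2) / 2 + random.randint(-max_shift, max_shift)
    w, h = (x2 - x1) * shrink, (y2 - y1) * shrink
    nx1, ny1 = max(0, int(cx - w / 2)), max(0, int(cy - h / 2))
    nx2, ny2 = min(img_w, int(cx + w / 2)), min(img_h, int(cy + h / 2))
    return nx1, ny1, nx2, ny2

def crop_samples(image, box, n_augment=4):
    """Crop one sample image per augmentation round from an H x W x C array."""
    h, w = image.shape[:2]
    samples = [image[box[1]:box[3], box[0]:box[2]]]   # original frame content
    for _ in range(n_augment):
        x1, y1, x2, y2 = jitter_box(box, w, h)
        crop = image[y1:y2, x1:x2]
        if random.random() < 0.5:
            crop = crop[:, ::-1]                      # horizontal flip
        samples.append(crop)
    return samples
```

Each call yields the original frame content plus several jittered crops that share the frame's first category label.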
In some embodiments, because different preset images differ in format and attributes, data preprocessing may be performed on each preset image before data augmentation, so that the resulting sample images conform to the format required for training the re-identification network. The preprocessing may include any processing such as format conversion, image brightness adjustment, and overall denoising, and at least one processing method may be pre-selected for the preprocessing as required.
Fig. 4 shows a schematic diagram of sample images according to an embodiment of the present disclosure. In a possible implementation, after the preset image 30 is determined, multiple sample images corresponding to at least one object are obtained by cropping the preset image 30. When the preset image 30 includes Person 1 and Person 2, and the image frames on the preset image 30 are the first image frame 31 representing the region where the face of Person 1 is located and the second image frame 32 representing the region where the face of Person 2 is located, at least one first object sample image 33 corresponding to Person 1 and at least one second object sample image 34 corresponding to Person 2 can be determined in the preset image 30.
Optionally, before the sample images are extracted, data preprocessing is first performed on the preset image 30, and operations such as translation, flipping, and scaling are then applied to the first image frame 31 and the second image frame 32 respectively; after each operation, the content inside the image frame is cropped out, yielding the corresponding first object sample images 33 and second object sample images 34.
Fig. 5 shows a schematic diagram of determining sample images according to an embodiment of the present disclosure. As shown in Fig. 5, when determining the sample images used for training the re-identification network, an embodiment of the present disclosure may first determine a preset image set 50 including at least one preset image, and perform random sampling 51 on the preset image set 50 to obtain at least one preset image 52 that includes an object. Image preprocessing 53 and data augmentation 54 are performed on the preset images 52 in sequence, and the region inside the image frame of each preset image 52 is cropped out to obtain the sample images 55.
In a possible implementation, the order of randomly sampling preset images from the preset image set and extracting sample images from the preset images can be swapped: the preset images may be randomly sampled first and the sample images extracted afterwards, or sample images may first be extracted from every preset image in the set and then randomly sampled. Optionally, the order of image preprocessing and data augmentation during sample-image extraction can also be adjusted.
Based on the above, the embodiments of the present disclosure can obtain multiple sample images for each object through data augmentation, greatly expanding the number of sample images. In some embodiments, image processing can also be performed in parallel on a GPU (Graphics Processing Unit), reducing image-processing time while removing unnecessary background noise. Meanwhile, determining the sample images by randomly sampling preset images alleviates the problem that too many sample categories make the training loss hard to compute, and random sampling keeps the sampled preset images representative, so that they reflect the characteristics of the preset image set.
Step S60: Perform first-stage training on the re-identification network according to the sample images and the corresponding first category labels.
In a possible implementation, after the multiple sample images are determined, the first-stage training of the re-identification network can be performed directly according to each sample image and its first category label. During training, the re-identification network outputs a first predicted category for each input sample image, representing the network's prediction of the category of the object in that sample image. Since a sample image includes only one object, the true image category of the sample image is also the true object category; losses can be computed between the first predicted category and the true image category and the true object category respectively, and combined into a total network loss used to adjust the network.
Optionally, to improve training efficiency and reduce manual annotation cost, no manual annotation of the sample images is required before training: the first category label of the image frame corresponding to a sample image can be used directly as the first category label of that sample image, that is, the category of the region where each object is located in the preset image is used as the true image category of the sample image cropped from that region. In some embodiments, since most preset images contain only one object and the remaining preset images contain only a small number of objects, an actual second category label need not be annotated according to the object category in each sample image in order to improve annotation efficiency. During the first-stage training of the re-identification network, the first category label of each sample image is used directly as the second category label representing its object category, and the true object categories of the sample images are corrected during the second-stage training.
For example, when a preset image has image frames for three person objects whose first category labels are "Person 1", "Person 2", and "Person 3", the sample images inside each image frame are extracted separately. Manually annotating these sample images would require identifying each person in detail, e.g. annotating the corresponding second category labels as "Zhang San", "Li Si", and "Wang Wu". To save annotation time and improve the efficiency of training the re-identification network, the identity of the person in each sample image need not be recognized; instead, the second category labels of the sample images corresponding to each image frame are quickly set to "Person 1", "Person 2", and "Person 3" by inheriting the first category labels.
Based on the above way of determining the second category labels, the first-stage training of the re-identification network includes determining the first category label of each sample image as its second category label, then inputting each sample image into the re-identification network and outputting the corresponding first predicted category. A first network loss is determined according to the first category label, second category label, and first predicted category of each sample image, and the re-identification network is adjusted according to the first network loss. Specifically, a first loss is determined according to the first category label and first predicted category of each sample image, a second loss is determined according to the second category label and first predicted category of each sample image, and the first network loss is then determined from the first loss and the second loss and used to adjust the re-identification network. The first network loss may be obtained by computing a weighted sum of the first loss and the second loss.
In a possible implementation, the first loss may be a triplet loss and the second loss may be a cross-entropy classification loss. That is, the first loss can be obtained by computing the triplet loss between the first category label and the first predicted category of each sample image, and the second loss by computing the cross-entropy classification loss between the second category label and the first predicted category of each sample image. The triplet loss grows with the distance between samples of the same object category and shrinks as the distance between samples of different object categories grows; reducing it through network adjustment therefore pulls samples of the same object category closer together and pushes samples of different object categories further apart. The cross-entropy classification loss grows with the distance between samples of the same image category; reducing it through network adjustment pulls samples of the same image category closer together.
Optionally, the triplet loss and the cross-entropy classification loss can be computed by the following Formula 2 and Formula 3, respectively:
$$L_{th}=\sum_{i=1}^{P}\sum_{a=1}^{K}\left[\alpha+\max_{p=1\ldots K} d\big(f_a^i,f_p^i\big)-\min_{\substack{j=1\ldots P\\ j\neq i}}\min_{n=1\ldots K} d\big(f_a^i,f_n^j\big)\right]_{+} \quad\text{(Formula 2)}$$

$$L=-\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log\big(p_{ic}\big) \quad\text{(Formula 3)}$$
In Formula 2, the triplet loss is $L_{th}$; $P\times K$ is the total number of sample images; $a$ is any sample image; $p$ is, among the sample images with the same first category label as $a$, the one whose feature vector is farthest from the feature vector of $a$ in the feature space; $n$ is, among the sample images with a first category label different from that of $a$, the one whose feature vector is closest to the feature vector of $a$ in the feature space; and $\alpha$ is a preset margin parameter. In Formula 3, the cross-entropy classification loss is $L$; $N$ is the number of sample images; $M$ is the number of second category labels; $p_{ic}$ is the predicted probability that sample image $i$ belongs to first predicted category $c$; and $y_{ic}$ is 1 when the second category label of sample $i$ is $c$, and 0 otherwise.
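To make Formulas 2 and 3 concrete, the NumPy sketch below computes the batch-hard triplet loss and the cross-entropy classification loss from a batch of feature vectors and classifier logits, and combines them into a first network loss; the loss weights `w1` and `w2` are illustrative assumptions, since the disclosure does not fix the weighting:

```python
import numpy as np

def triplet_loss(features, labels, alpha=0.3):
    """Formula 2: for each anchor a, use the farthest same-label sample
    (hardest positive) and the closest different-label sample (hardest negative)."""
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    total = 0.0
    for a in range(len(labels)):
        hardest_pos = dist[a][same[a]].max()
        hardest_neg = dist[a][~same[a]].min()
        total += max(0.0, alpha + hardest_pos - hardest_neg)   # hinge
    return total

def cross_entropy_loss(logits, targets):
    """Formula 3: mean negative log-probability of each sample's true category."""
    z = logits - logits.max(axis=1, keepdims=True)             # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(targets)), targets].mean()

def first_network_loss(features, logits, labels, w1=1.0, w2=1.0):
    """Weighted sum of the two losses (weights are illustrative)."""
    return w1 * triplet_loss(features, labels) + w2 * cross_entropy_loss(logits, labels)
```

When positives of each label lie close together and far from the other labels, the hinge in the triplet term is inactive and only the classification term contributes.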
After the first network loss is determined, the re-identification network can undergo first-stage adjustment until the first network loss satisfies a first preset condition, which may be that the first network loss is smaller than a preset first threshold. Given the properties of the triplet loss and the cross-entropy classification loss, the re-identification network after the first-stage adjustment obtains a reasonably distributed feature space. That is, by adjusting the feature extraction layers of the re-identification network, the network can extract similar feature vectors for images of the same image category, and likewise extract similar feature vectors for images of the same object category.
Fig. 6 shows a schematic diagram of the first-stage training process of the re-identification network according to an embodiment of the present disclosure. As shown in Fig. 6, after the sample images 60 are determined, the first category label 61 and the second category label 62 of each sample image 60 are determined from the first category label 61 acquired when the preset image corresponding to the sample image 60 was captured. Each sample image 60 is input into the re-identification network 63 to obtain a first predicted category 64, and a first loss 65 is computed from the first predicted category 64 and the first category label 61 of each sample image 60. Meanwhile, a second loss 66 is computed from the first predicted category 64 and the second category label 62 of each sample image 60, and the re-identification network 63 is jointly adjusted according to the first loss 65 and the second loss 66. Optionally, the adjustment computes a weighted sum of the first loss 65 and the second loss 66 to obtain the first network loss, and performs first-stage adjustment on the re-identification network 63 until the first network loss satisfies the first preset condition.
Step S70: Determine pseudo-labels of the sample images according to the re-identification network after the first-stage training is completed.
In a possible implementation, after the first-stage training of the re-identification network, the pseudo-label of each sample image can be determined according to the reasonably distributed feature space obtained by that training. The pseudo-label of each sample image is used to represent the category of the object in that sample image during the second-stage training. A pseudo-label can be a label with any content, and each pseudo-label uniquely represents one class of objects.
Optionally, the pseudo-labels may be determined from the first-stage-trained re-identification network as follows: input each sample image into the re-identification network after the first-stage training to obtain the feature vector extracted from each sample image; cluster the feature vectors of all sample images, and determine identification information uniquely corresponding to each cluster obtained; then use the identification information of each cluster as the pseudo-label of the sample images whose feature vectors it contains. The clustering may be implemented with the k-means clustering algorithm, and the unique identification information of each cluster may be preset or generated according to a preset rule.
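A minimal sketch of this pseudo-labeling step, assuming the feature vectors have already been extracted by the first-stage network; the plain NumPy k-means below stands in for any k-means implementation, with the cluster index serving as the pseudo-label:

```python
import numpy as np

def kmeans_pseudo_labels(features, k, n_iter=50, seed=0):
    """Cluster feature vectors with k-means and return one pseudo-label
    (the cluster index) per sample."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each feature to its nearest cluster center
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned features
        for i in range(k):
            if np.any(labels == i):
                centers[i] = features[labels == i].mean(axis=0)
    return labels
```

The choice of k is user-defined, as the disclosure notes; samples whose feature vectors fall in the same cluster receive the same pseudo-label.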
Step S80: Perform second-stage training on the re-identification network obtained after the first-stage training, according to the sample images and the corresponding first category labels and pseudo-labels.
In a possible implementation, after the pseudo-label of each sample image is obtained, the first category label used for each sample image in the first-stage training is taken as the true image category, and the pseudo-label of each sample image is taken as the true object category. In some embodiments, the second-stage training of the re-identification network is performed based on the current true image category and true object category of each sample image and the sample-image category predicted by the network. That is, each sample image is input into the re-identification network obtained after the first-stage training, and the corresponding second predicted category is output. A second network loss is determined according to the first category label, pseudo-label, and second predicted category of each sample image, and the re-identification network is adjusted according to the second network loss.
Optionally, as in the first-stage training, the second-stage training computes losses between the second predicted category and, respectively, the true image category and the true object category of each sample image, yielding a total network loss used to adjust the network. That is, the second-stage adjustment may include determining a third loss according to the first category label and second predicted category of each sample image, and a fourth loss according to the pseudo-label and second predicted category of each sample image; the second network loss is then determined from the third loss and the fourth loss and used to adjust the re-identification network. The second network loss may be obtained by computing a weighted sum of the third loss and the fourth loss.
In a possible implementation, the third loss may be a triplet loss and the fourth loss may be a cross-entropy classification loss. That is, the third loss can be obtained by computing the triplet loss between the first category label and the second predicted category of each sample image, and the fourth loss by computing the cross-entropy classification loss between the pseudo-label and the second predicted category of each sample image. As before, reducing the triplet loss through network adjustment pulls samples of the same object category closer together and pushes samples of different object categories further apart, and reducing the cross-entropy classification loss pulls samples of the same image category closer together. Optionally, the third loss is computed in the same way as the first loss, and the fourth loss in the same way as the second loss.
After the second network loss is determined by computing the weighted sum of the third loss and the fourth loss, the re-identification network can undergo second-stage adjustment until the second network loss satisfies a second preset condition, which may be that the second network loss is smaller than a preset second threshold. Given the properties of the triplet loss and the cross-entropy classification loss, the network after the second-stage adjustment obtains an even more reasonably distributed feature space. That is, by adjusting the feature extraction layers, the re-identification network can more accurately extract similar feature vectors for images of the same image category, and likewise more accurately extract similar feature vectors for images of the same object category.
Fig. 7 shows a schematic diagram of the second-stage training process of the re-identification network according to an embodiment of the present disclosure. As shown in Fig. 7, after the sample images 70 are determined, the first category label 71 and the pseudo-label 72 of each sample image 70 are determined, the first category label 71 being the one acquired when the corresponding preset image was captured. Each sample image 70 is input into the re-identification network 73 to obtain a second predicted category 74, and a third loss 75 is computed from the second predicted category 74 and the first category label 71 of each sample image 70. Meanwhile, a fourth loss 76 is computed from the second predicted category 74 and the pseudo-label 72 of each sample image 70, and the re-identification network 73 is jointly adjusted according to the third loss 75 and the fourth loss 76. Optionally, the adjustment computes a weighted sum of the third loss 75 and the fourth loss 76 to obtain the second network loss, and performs second-stage adjustment on the re-identification network 73 until the second network loss satisfies the second preset condition.
With this training method, a high-accuracy re-identification network can be trained quickly and at low cost from unlabeled data. The network can accurately extract similar feature vectors for images of the same image category, and likewise for images of the same object category, yielding a reasonably distributed feature space. In some embodiments, the two-stage training improves the accuracy of the re-identification network, so that images to be recognized can be accurately re-identified and accurate re-identification results obtained.
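The two-stage schedule described above can be summarized in a short driver sketch; `train_step`, `extract`, and `cluster` are hypothetical callables standing in for one training pass of the network, the feature extractor, and the clustering step, none of which are named in the disclosure:

```python
def two_stage_training(samples, frame_labels, train_step, extract, cluster):
    """Two-stage schedule: stage 1 trains with the frame (first category) labels
    standing in for object labels; stage 2 re-trains with clustering pseudo-labels.
    train_step(samples, image_labels, object_labels) runs one training pass,
    extract(samples) returns feature vectors, and cluster(features) returns
    one pseudo-label per sample (all hypothetical callables)."""
    # Stage 1: the first category label doubles as the second category label.
    train_step(samples, frame_labels, frame_labels)
    # Pseudo-labels from the stage-1 feature space.
    pseudo = cluster(extract(samples))
    # Stage 2: frame labels as image categories, pseudo-labels as object categories.
    train_step(samples, frame_labels, pseudo)
    return pseudo
```

The driver makes the label-flow explicit: only the second (object-category) labels change between the two stages.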
Based on the foregoing embodiments, an embodiment of the present disclosure further provides an object re-identification method, described in detail below:
The network input data includes the sample images and their types, as well as the bounding-box coordinates output by a detection network. Each sample image has its own independent sample label, while the type a sample image belongs to is called its category label; this label initially requires no manual annotation, as some samples are randomly selected and their sample labels are used as category labels. The network input data is divided into training data, query data, and gallery data.
In the re-identification pipeline, each sample in the training data first undergoes specific data preprocessing, and more equivalent data is generated for each sample by applying data augmentation to the defect bounding boxes output by the detection network; this equivalent data shares a common sample label and a common category label with the original sample. In the first step, the algorithm feeds the data and labels into a deep-learning neural network characterized by two loss functions: the cross-entropy classification loss function uses the randomly selected sample labels, serving as category labels, as ground truth, while the triplet loss function uses the sample labels as ground truth. After this training, a feature space is obtained; k-means clustering is performed on this feature space, all samples are assigned pseudo-labels accordingly, and a second round of learning is carried out. In the second round, the cross-entropy classification loss function uses the pseudo-labels as ground truth, and the triplet loss function uses the sample labels as ground truth. After the second round, the feature-space distribution is more comprehensive and reasonable. The query data and gallery data are preprocessed and input into the network, the similarity between the query data and the gallery data in the feature space is computed, and this similarity is used to judge whether the query data belongs to certain categories.
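As a sketch of the final retrieval step, the snippet below compares query features against gallery features by cosine similarity in the learned feature space; the similarity threshold and the decision to return the single best match are assumptions for illustration:

```python
import numpy as np

def retrieve(query_feats, gallery_feats, gallery_labels, threshold=0.5):
    """Return, for each query feature, the label of the most similar gallery
    feature, or None when the best cosine similarity is below the threshold."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sim = q @ g.T                       # cosine similarity matrix
    best = sim.argmax(axis=1)
    results = []
    for i, j in enumerate(best):
        results.append(gallery_labels[j] if sim[i, j] >= threshold else None)
    return results
```

A query whose best similarity falls below the threshold is treated as belonging to none of the gallery categories.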
(1) Image preprocessing
Uniform image preprocessing is applied to each image, and frame cropping and data augmentation are then performed on the processed image, which greatly improves image-processing speed. The augmentation methods include translating the frame up, down, left, and right, flipping it, and shrinking the defect-frame scale so as to crop a larger field of view.
(2) Random sampling
A key step of the algorithm is random sampling, using the sample labels of the sampled items as category labels for preliminary training. This frees the samples from the cost of large-scale manual annotation and alleviates the difficulty of computing the cross-entropy classification loss when there are too many category labels. Meanwhile, the randomly selected sample labels remain representative, reflecting the appearance and characteristics of the data set.
(3) Cross-entropy classification loss function
Using the category labels rather than the sample labels as the ground truth of the cross-entropy classification loss function lets the model learn quickly when there are a large number of samples, and brings samples with the same category label closer to one another.
(4) Triplet loss
A triplet loss with hard-sample mining is adopted: for each sample, the maximum distance to a positive sample (the hardest positive) and the minimum distance to a negative sample (the hardest negative) are taken as the optimization targets of the loss, so that distances between positive samples keep shrinking while distances to negative samples keep growing, yielding a better feature space and guaranteeing effective learning. In addition, background (defect-free) images are added to the training set as negative samples, which helps the neural network contrast the differences between positive and negative samples and thus improves performance. The triplet loss is determined by the following Formula 4:
$$\mathcal{L}_{\text{tri}} = \sum_{a}\Big[\max_{p \in \mathcal{P}(a)} d(f_a, f_p) \;-\; \min_{n \in \mathcal{N}(a)} d(f_a, f_n) + \alpha\Big]_{+} \qquad \text{(Formula 4)}$$
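A numpy sketch of batch-hard triplet mining as described for Formula 4 — hardest positive and hardest negative per anchor inside one batch. The margin name `alpha`, the distance choice (Euclidean) and the function names are illustrative assumptions:

```python
import numpy as np

def batch_hard_triplet_loss(features, labels, alpha=0.3):
    """Triplet loss with hard-sample mining.

    For each anchor, take the farthest positive and the nearest
    negative in the batch; the hinge keeps positives closer than
    negatives by at least the margin alpha.
    """
    # pairwise Euclidean distances between all features in the batch
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)

    same = labels[:, None] == labels[None, :]
    losses = []
    for i in range(len(labels)):
        pos = dist[i][same[i] & (np.arange(len(labels)) != i)]
        neg = dist[i][~same[i]]
        if len(pos) == 0 or len(neg) == 0:
            continue  # anchor has no valid triplet in this batch
        losses.append(max(0.0, pos.max() - neg.min() + alpha))
    return float(np.mean(losses))
```

Adding defect-free background crops to the batch as negatives, as the text suggests, simply means they enter `features` with a dedicated background label.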
(5) Clustering
k-means clustering is performed in the feature space of the model obtained after the first training stage. k is a user-defined, adjustable parameter; according to the data distribution, it can further reduce the number of categories so that samples of the same category move closer together. After clustering, every sample can be assigned a pseudo-label, namely the label of the cluster it belongs to. Samples that were not drawn during the earlier random sampling are thereby added to the cross-entropy classification training, further optimizing the feature-space distribution. The k-means objective is determined by the following Formula 5, where S is the set of all randomly sampled samples and μ_i is the mean of cluster i.
$$\underset{S}{\arg\min}\;\sum_{i=1}^{k}\sum_{x \in S_i}\left\lVert x - \mu_i\right\rVert^{2} \qquad \text{(Formula 5)}$$
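The pseudo-labeling step can be sketched with a plain Lloyd's-algorithm k-means over the extracted feature vectors; cluster indices serve directly as pseudo-labels. The farthest-point initialization is an assumption made to keep the sketch deterministic, not a detail from the application:

```python
import numpy as np

def kmeans_pseudo_labels(features, k, iters=50):
    """Assign each sample a pseudo-label = the index of its k-means cluster.

    Minimizes the Formula 5 objective: sum over clusters i of
    ||x - mu_i||^2 for x in cluster S_i.
    """
    # farthest-point initialization (deterministic for this sketch)
    centers = [features[0]]
    for _ in range(1, k):
        d = np.min([((features - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(iters):
        # assignment step: each sample goes to its nearest center
        d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # update step: recompute each cluster mean
        new_centers = np.array([
            features[labels == i].mean(0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

In the second training stage these returned labels would replace the sampled category labels as the cross-entropy ground truth.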
(6) Comparison of stage-one and stage-two training
Common point: both stages use the sample labels of all samples as the ground truth of the triplet loss, which pulls positive samples closer together and pushes negative samples further apart.
Difference: stage-one training uses the randomly sampled sample labels as the ground truth of the cross-entropy loss, whereas stage two uses the pseudo-labels of all samples obtained after clustering. The number of labels produced by random sampling and clustering remains constant, or grows only slightly, as the data volume surges, ensuring that the computational cost of the cross-entropy loss does not explode with the data.
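The comparison above reduces to which label feeds the cross-entropy term; a schematic of the per-stage total loss (the equal weighting and the loss-function arguments are assumptions, not details from the application):

```python
def stage_loss(stage, logits, features, sample_labels,
               sampled_category_labels, pseudo_labels,
               cross_entropy, triplet):
    """Total loss for a training stage.

    The triplet term always uses sample labels as ground truth;
    the cross-entropy term switches from the randomly sampled
    category labels (stage 1) to the clustering pseudo-labels
    (stage 2).
    """
    ce_target = sampled_category_labels if stage == 1 else pseudo_labels
    return cross_entropy(logits, ce_target) + triplet(features, sample_labels)
```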
(7) Query matching
Once the model is trained, the image to be detected and the gallery are simply fed into the network to obtain the matching results between the image to be detected and the gallery samples. By setting a threshold, it can be determined whether the image to be detected belongs to a certain category of samples, and if so, which one. Similarity is compared by computing the cosine distance between samples in the feature space, as given by the following Formula 6. If some gallery sample exceeds the similarity threshold for the query, the query is assigned to that sample's category; otherwise it is assigned to a new category absent from the gallery. A and B denote the feature matrices of the samples.
$$\text{similarity} = \cos(\theta) = \frac{A\cdot B}{\lVert A\rVert\,\lVert B\rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\;\sqrt{\sum_{i=1}^{n} B_i^{2}}} \qquad \text{(Formula 6)}$$
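Formula 6 and the threshold rule can be sketched as follows; the threshold value of 0.8 and the label strings are illustrative assumptions:

```python
import numpy as np

def cosine_similarity(a, b):
    """Formula 6: A.B / (||A|| * ||B||)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_query(query_feat, gallery_feats, gallery_labels, threshold=0.8):
    """Return the category of the best gallery match if it exceeds the
    threshold; otherwise None, signalling a new, unseen category."""
    sims = [cosine_similarity(query_feat, g) for g in gallery_feats]
    best = int(np.argmax(sims))
    return gallery_labels[best] if sims[best] >= threshold else None
```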
Therefore, the object re-identification method in the embodiments of the present application can achieve the following technical effects:
1) The technique uses an unsupervised re-identification network to help a classification network learn to classify images quickly.
2) Improved accuracy. The network can re-identify and re-classify images misjudged as a certain category, improve the feature-space distribution on top of the original classification, and refine the feature granularity, so that the network learns not merely a coarse, fuzzy category but the contrast and distribution between individual samples, thereby improving the precision and recall of the original classification.
3) Cost-effectiveness. When a category to be judged is encountered in a project running in real time, there is no need to retrain a huge classification model; the re-identification model recognizes the category accurately and quickly. Moreover, unlike conventional re-identification techniques, this model learns without manual annotation, saving substantial labor and material costs.
4) Ability to recognize new categories. When an unseen sample appears, an ordinary classification network assigns it to one of the known categories, whereas the re-identification network filters out low-confidence samples, thereby recognizing new categories.
5) Fast image processing. The network renders every image, and data augmentation greatly expands the sample size; processing each image separately under such a scheme would be very time-consuming. Rendering first and cropping afterwards, together with GPU-accelerated optimizations, greatly shortens image processing time and also reduces unnecessary background noise.
It can be understood that the above method embodiments mentioned in the present disclosure may be combined with one another to form combined embodiments without departing from their principles and logic. Those skilled in the art will appreciate that, in the above methods of the embodiments, the actual execution order of the steps should be determined by their functions and possible internal logic.
In addition, embodiments of the present disclosure further provide an object re-identification apparatus, an electronic device, a computer-readable storage medium and a program, all of which can be used to implement any object re-identification method provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section.
Fig. 8 shows a schematic diagram of an object re-identification apparatus according to an embodiment of the present disclosure. As shown in Fig. 8, the object re-identification apparatus of the embodiment of the present disclosure may include an image determination module 80, a set determination module 81 and a re-identification module 82.
The image determination module 80 is configured to determine an image to be recognized that includes a target object;
the set determination module 81 is configured to determine an image set including at least one candidate image, each candidate image including an object;
the re-identification module 82 is configured to input the image to be recognized and the image set into a re-identification network to obtain a re-identification result, where, in a case where a target candidate image exists in the image set, the re-identification result includes the target candidate image, and the object included in the target candidate image matches the target object;
wherein the re-identification network is obtained through two-stage training: the first training stage is implemented according to at least one sample image and a first category label of each sample image, and the second training stage is implemented according to the at least one sample image and the pseudo-label and first category label of each sample image; the pseudo-label of each sample image is determined based on the re-identification network obtained after the first training stage, and the first category label represents the category of the corresponding image.
In a possible implementation, each candidate image has a corresponding second category label, the second category label representing the category of the object in the corresponding image;
the apparatus further includes:
a label determination module, configured to determine the second category label corresponding to the target candidate image as the second category label of the image to be recognized.
In a possible implementation, the training process of the re-identification network includes:
determining at least one preset image including an object, each preset image having at least one image frame marking the region where an object is located, and a first category label corresponding to each image frame;
determining at least one sample image corresponding to each preset image according to the corresponding at least one image frame;
performing first-stage training on the re-identification network according to the sample images and the corresponding first category labels;
determining pseudo-labels of the sample images according to the re-identification network obtained after the first training stage;
performing second-stage training on the re-identification network obtained after the first-stage training according to the sample images and the corresponding first category labels and pseudo-labels.
In a possible implementation, determining at least one preset image including an object includes:
performing random sampling on a preset image set to obtain at least one preset image including an object.
In a possible implementation, determining at least one sample image corresponding to each preset image according to the corresponding at least one image frame includes:
performing data enhancement on each preset image at least once, and after each data enhancement, cropping the region within at least one image frame as a sample image.
In a possible implementation, before data enhancement is performed on each preset image, image preprocessing is performed on the preset image.
In a possible implementation, performing first-stage training on the re-identification network according to the sample images and the corresponding first category labels includes:
determining the first category label corresponding to each sample image as a second category label;
inputting each sample image into the re-identification network, and outputting a first predicted category corresponding to the sample image;
determining a first network loss according to the first category label, the second category label and the first predicted category corresponding to each sample image, and adjusting the re-identification network according to the first network loss.
In a possible implementation, determining the first network loss according to the first category label, the second category label and the first predicted category corresponding to each sample image, and adjusting the re-identification network according to the first network loss, includes:
determining a first loss according to the first category label and the first predicted category corresponding to each sample image;
determining a second loss according to the second category label and the first predicted category corresponding to each sample image;
determining the first network loss according to the first loss and the second loss, and adjusting the re-identification network according to the first network loss.
In a possible implementation, determining the pseudo-labels of the sample images according to the re-identification network obtained after the first training stage includes:
inputting each sample image into the re-identification network obtained after the first training stage to obtain a feature vector extracted from each sample image;
clustering the feature vectors of the sample images, and determining identification information uniquely corresponding to each cluster obtained after clustering;
using the identification information corresponding to each cluster as the pseudo-label of the sample image corresponding to each feature vector included in the cluster.
In a possible implementation, the clustering process is implemented based on a k-means clustering algorithm.
In a possible implementation, performing second-stage training on the re-identification network obtained after the first-stage training according to the sample images and the corresponding first category labels and pseudo-labels includes:
inputting each sample image into the re-identification network obtained after the first-stage training, and outputting a corresponding second predicted category;
determining a second network loss according to the first category label, the pseudo-label and the second predicted category corresponding to each sample image, and adjusting the re-identification network according to the second network loss.
In a possible implementation, determining the second network loss according to the first category label, the pseudo-label and the second predicted category corresponding to each sample image, and adjusting the re-identification network according to the second network loss, includes:
determining a third loss according to the first category label and the second predicted category corresponding to each sample image;
determining a fourth loss according to the pseudo-label and the second predicted category corresponding to each sample image;
determining the second network loss according to the third loss and the fourth loss, and adjusting the re-identification network according to the second network loss.
In a possible implementation, the first loss and/or the third loss is a triplet loss, and the second loss and/or the fourth loss is a cross-entropy classification loss.
In a possible implementation, the re-identification module 82 includes:
an image input sub-module, configured to input the image to be recognized and the image set into the re-identification network, and extract, through the re-identification network, a target object feature of the image to be recognized and a candidate object feature of each candidate image;
a similarity matching sub-module, configured to determine the similarity between each candidate image and the image to be recognized according to the target object feature and each candidate object feature;
a result output sub-module, configured to, in response to the similarity between a candidate image and the image to be recognized satisfying a preset condition, determine that the object in the candidate image matches the target object, and obtain the re-identification result with the candidate image as the target candidate image.
In a possible implementation, the preset condition includes that the similarity value is the largest and is greater than a similarity threshold.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
In some embodiments, the functions or modules of the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for detailed implementation, refer to the descriptions of the above method embodiments.
Embodiments of the present disclosure further provide a computer-readable storage medium on which computer program instructions are stored; the above method is implemented when the computer program instructions are executed by a processor. The computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.
Embodiments of the present disclosure further provide an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
Embodiments of the present disclosure further provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
The electronic device may be provided as a terminal, a server, or a device in another form.
Fig. 9 shows a schematic diagram of an electronic device 800 according to an embodiment of the present disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device or a personal digital assistant.
Referring to Fig. 9, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operations and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operated on the electronic device 800, contact data, phonebook data, messages, pictures, videos and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk or an optical disc.
The power supply component 806 provides power to the various components of the electronic device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device 800.
The multimedia component 808 includes a screen providing an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 800 is in an operation mode, such as a call mode, a recording mode or a voice recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the on/off state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor component 814 can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and temperature changes of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as wireless fidelity (WiFi), the second-generation mobile communication technology (2G) or the third-generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components for performing the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to implement the above method.
Fig. 10 shows a schematic diagram of another electronic device 1900 according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server. Referring to Fig. 10, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute instructions to perform the above method.
The electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system (Mac OS X™) introduced by Apple, the multi-user multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™) or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to implement the above method.
本公开可以是系统、方法和/或计算机程序产品。计算机程序产品可以包括计算机可读存储介质,其上载有用于使处理器实现本公开的各个方面的计算机可读程序指令。The present disclosure can be a system, method and/or computer program product. A computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present disclosure.
A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, and a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, as well as any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and a conventional procedural programming language such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In scenarios involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, for example, a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized by utilizing state information of the computer-readable program instructions; the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having the instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, causing a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or acts, or by a combination of special-purpose hardware and computer instructions.
The computer program product may be implemented in hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
The embodiments of the present disclosure have been described above. The foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (15)

  1. An object re-identification method, wherein the method comprises:
    determining an image to be recognized that includes a target object;
    determining an image set including at least one candidate image, each candidate image including an object;
    inputting the image to be recognized and the image set into a re-identification network to obtain a re-identification result, wherein, in a case that a target candidate image exists in the image set, the re-identification result includes the target candidate image, an object included in the target candidate image matching the target object;
    wherein the re-identification network is obtained through two-stage training; a first-stage training process is performed according to at least one sample image and a first category label of each sample image; a second-stage training process is performed according to the at least one sample image and the pseudo-label and first category label of each sample image; the pseudo-label of each sample image is determined based on the re-identification network obtained after the first-stage training process; and the first category label represents the category of the corresponding image.
  2. The method according to claim 1, wherein the training process of the re-identification network comprises:
    determining at least one preset image including an object, each preset image having at least one image frame for marking a region where an object is located, and a first category label corresponding to each image frame;
    determining at least one sample image corresponding to each preset image according to the corresponding at least one image frame;
    performing first-stage training on the re-identification network according to the sample images and the corresponding first category labels;
    determining a pseudo-label of each sample image according to the re-identification network obtained after the first-stage training; and
    performing second-stage training on the re-identification network obtained after the first-stage training according to the sample images and the corresponding first category labels and pseudo-labels.
  3. The method according to claim 2, wherein the determining at least one preset image including an object comprises:
    randomly sampling a preset image set to obtain at least one preset image including an object.
  4. The method according to claim 2 or 3, wherein the determining at least one sample image corresponding to each preset image according to the corresponding at least one image frame comprises:
    performing data enhancement on each preset image at least once, and after each data enhancement, cropping the region within at least one image frame as a sample image;
    and/or,
    before performing data enhancement on each preset image, performing image preprocessing on the preset image.
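The cropping step of claim 4 — cutting each labelled image frame out of a preset image to obtain sample images — can be sketched as follows. The `(x1, y1, x2, y2)` box format and the NumPy array representation are illustrative assumptions, and the data-enhancement and preprocessing steps the claim mentions are omitted:

```python
import numpy as np

def crop_image_frames(preset_image, image_frames):
    """Crop each annotated image frame out of a preset image.

    preset_image: H x W (x C) NumPy array.
    image_frames: list of (x1, y1, x2, y2) pixel boxes (an assumed format);
    each crop would serve as one sample image for re-identification training.
    """
    return [preset_image[y1:y2, x1:x2] for (x1, y1, x2, y2) in image_frames]

# A 10x10 toy "image" with two annotated boxes.
image = np.arange(100).reshape(10, 10)
samples = crop_image_frames(image, [(0, 0, 4, 4), (2, 3, 7, 9)])
```

In a real pipeline the same crop would be taken after every augmented copy of the preset image, so one box can yield several sample images.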
  5. The method according to any one of claims 2 to 4, wherein each candidate image has a corresponding second category label, the second category label representing the category of the object in the corresponding image;
    the method further comprises:
    determining the second category label corresponding to the target candidate image as the second category label of the image to be recognized;
    and/or,
    the performing first-stage training on the re-identification network according to the sample images and the corresponding first category labels comprises:
    determining the first category label corresponding to each sample image as its second category label;
    inputting each sample image into the re-identification network, and outputting a first predicted category corresponding to the sample image;
    determining a first network loss according to the first category label, the second category label, and the first predicted category corresponding to each sample image, and adjusting the re-identification network according to the first network loss.
  6. The method according to claim 5, wherein the determining a first network loss according to the first category label, the second category label, and the first predicted category corresponding to each sample image, and adjusting the re-identification network according to the first network loss comprises:
    determining a first loss according to the first category label and the first predicted category corresponding to each sample image;
    determining a second loss according to the second category label and the first predicted category corresponding to each sample image;
    determining the first network loss according to the first loss and the second loss, and adjusting the re-identification network according to the first network loss.
  7. The method according to any one of claims 2 to 6, wherein the determining a pseudo-label of each sample image according to the re-identification network obtained after the first-stage training comprises:
    inputting each sample image into the re-identification network obtained after the first-stage training to obtain a feature vector resulting from feature extraction on each sample image;
    clustering the feature vectors of the sample images, and determining identification information uniquely corresponding to each cluster obtained by the clustering;
    using the identification information corresponding to each cluster as the pseudo-label of the sample image corresponding to each feature vector included in the cluster;
    wherein the clustering process is implemented based on a k-means clustering algorithm.
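The pseudo-label step of claim 7 — cluster the stage-one feature vectors and use each cluster's identifier as a pseudo-label — could look like the sketch below. The toy k-means implementation, the feature dimensionality, and the choice of k are all assumptions; the claim only specifies that a k-means clustering algorithm is used:

```python
import numpy as np

def kmeans_pseudo_labels(features, k, iters=10, seed=0):
    """Cluster feature vectors and return one pseudo-label (cluster id) per sample.

    A toy stand-in for the clustering step: in the claimed method, `features`
    would be extracted by the re-identification network after stage-one training.
    """
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest center (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

# Two well-separated blobs of toy features -> two distinct pseudo-labels.
feats = np.vstack([np.zeros((5, 8)), np.ones((5, 8)) * 10.0])
pseudo = kmeans_pseudo_labels(feats, k=2)
```

Each returned cluster id plays the role of the "identification information uniquely corresponding to each cluster" and is attached to every sample image whose feature vector fell in that cluster.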
  8. The method according to any one of claims 2 to 7, wherein the performing second-stage training on the re-identification network obtained after the first-stage training according to the sample images and the corresponding first category labels and pseudo-labels comprises:
    inputting each sample image into the re-identification network obtained after the first-stage training, and outputting a corresponding second predicted category;
    determining a second network loss according to the first category label, the pseudo-label, and the second predicted category corresponding to each sample image, and adjusting the re-identification network according to the second network loss.
  9. The method according to claim 8, wherein the determining a second network loss according to the first category label, the pseudo-label, and the second predicted category corresponding to each sample image, and adjusting the re-identification network according to the second network loss comprises:
    determining a third loss according to the first category label and the second predicted category corresponding to each sample image;
    determining a fourth loss according to the pseudo-label and the second predicted category corresponding to each sample image;
    determining the second network loss according to the third loss and the fourth loss, and adjusting the re-identification network according to the second network loss.
  10. The method according to any one of claims 6 to 9, wherein the first loss and/or the third loss is a triplet loss, and the second loss and/or the fourth loss is a cross-entropy classification loss.
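Claim 10 names the two loss types: a triplet loss on embeddings and a cross-entropy classification loss on predicted categories. A minimal NumPy sketch of both is shown below; the margin value, the single-sample formulation, and the simple summation into a combined loss are assumptions, as the patent does not fix these details:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Pull the anchor toward a same-identity sample and push it away
    # from a different-identity sample, with a hinge margin.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def cross_entropy_loss(logits, label):
    # Softmax cross-entropy for one sample, computed stably in log space.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

# Toy embeddings and logits; a combined network loss might simply sum the two.
anchor, pos, neg = np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([5.0, 0.0])
combined = triplet_loss(anchor, pos, neg) + cross_entropy_loss(np.array([2.0, 0.5]), 0)
```

With an easy negative (far from the anchor) the triplet term vanishes, so the combined value is driven by the classification term alone.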
  11. The method according to any one of claims 1 to 10, wherein the inputting the image to be recognized and the image set into a re-identification network to obtain a re-identification result comprises:
    inputting the image to be recognized and the image set into the re-identification network, and extracting, through the re-identification network, a target object feature of the image to be recognized and a candidate object feature of each candidate image;
    determining a similarity between each candidate image and the image to be recognized according to the target object feature and each candidate object feature;
    in response to the similarity between a candidate image and the image to be recognized satisfying a preset condition, determining that the object in the candidate image matches the target object, and using the candidate image as the target candidate image to obtain the re-identification result, wherein the preset condition includes the similarity value being the largest and greater than a similarity threshold.
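The matching rule of claim 11 — take the candidate with the highest similarity and accept it only when that similarity also exceeds a threshold — can be sketched as follows. Cosine similarity and the 0.5 threshold are illustrative assumptions; the patent does not specify the similarity measure or the threshold value:

```python
import numpy as np

def re_identify(query_feature, candidate_features, threshold=0.5):
    """Return the index of the matching candidate image, or None if no match.

    query_feature: 1-D feature vector of the image to be recognized.
    candidate_features: 2-D array, one feature vector per candidate image.
    """
    q = query_feature / np.linalg.norm(query_feature)
    g = candidate_features / np.linalg.norm(candidate_features, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to every candidate
    best = int(np.argmax(sims))       # largest similarity value...
    return best if sims[best] > threshold else None  # ...and above the threshold

query = np.array([1.0, 0.0])
gallery = np.array([[0.9, 0.1], [0.0, 1.0]])
match = re_identify(query, gallery)                     # candidate 0 matches
no_match = re_identify(query, np.array([[0.0, 1.0]]))   # best sim below threshold
```

Returning `None` when even the best similarity falls below the threshold mirrors the case where no target candidate image exists in the image set.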
  12. An object re-identification apparatus, wherein the apparatus comprises:
    an image determination module, configured to determine an image to be recognized that includes a target object;
    a set determination module, configured to determine an image set including at least one candidate image, each candidate image including an object;
    a re-identification module, configured to input the image to be recognized and the image set into a re-identification network to obtain a re-identification result, wherein, in a case that a target candidate image exists in the image set, the re-identification result includes the target candidate image, an object included in the target candidate image matching the target object;
    wherein the re-identification network is obtained through two-stage training; a first-stage training process is performed according to at least one sample image and a first category label of each sample image; a second-stage training process is performed according to the at least one sample image and the pseudo-label and first category label of each sample image; the pseudo-label of each sample image is determined based on the re-identification network obtained after the first-stage training process; and the first category label represents the category of the corresponding image.
  13. An electronic device, comprising:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 11.
  14. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 11.
  15. A computer program product, wherein the computer program product comprises a non-transitory computer-readable storage medium storing a computer program, and the computer program, when read and executed by a computer, implements the method according to any one of claims 1 to 11.
PCT/CN2022/104715 2021-12-24 2022-07-08 Object re-identification method and apparatus, electronic device, storage medium, and computer program product WO2023115911A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111601354.8A CN114332503A (en) 2021-12-24 2021-12-24 Object re-identification method and device, electronic equipment and storage medium
CN202111601354.8 2021-12-24

Publications (1)

Publication Number Publication Date
WO2023115911A1

Family

ID=81012974

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104715 WO2023115911A1 (en) 2021-12-24 2022-07-08 Object re-identification method and apparatus, electronic device, storage medium, and computer program product

Country Status (2)

Country Link
CN (1) CN114332503A (en)
WO (1) WO2023115911A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114332503A (en) * 2021-12-24 2022-04-12 商汤集团有限公司 Object re-identification method and device, electronic equipment and storage medium
CN117058489B (en) * 2023-10-09 2023-12-29 腾讯科技(深圳)有限公司 Training method, device, equipment and storage medium of multi-label recognition model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180374233A1 (en) * 2017-06-27 2018-12-27 Qualcomm Incorporated Using object re-identification in video surveillance
CN111783646A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Training method, device, equipment and storage medium of pedestrian re-identification model
CN111967294A (en) * 2020-06-23 2020-11-20 南昌大学 Unsupervised domain self-adaptive pedestrian re-identification method
CN112069929A (en) * 2020-08-20 2020-12-11 之江实验室 Unsupervised pedestrian re-identification method and device, electronic equipment and storage medium
CN113095174A (en) * 2021-03-29 2021-07-09 深圳力维智联技术有限公司 Re-recognition model training method, device, equipment and readable storage medium
CN114332503A (en) * 2021-12-24 2022-04-12 商汤集团有限公司 Object re-identification method and device, electronic equipment and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665135A (en) * 2023-07-28 2023-08-29 中国华能集团清洁能源技术研究院有限公司 Thermal runaway risk early warning method and device for battery pack of energy storage station and electronic equipment
CN116665135B (en) * 2023-07-28 2023-10-20 中国华能集团清洁能源技术研究院有限公司 Thermal runaway risk early warning method and device for battery pack of energy storage station and electronic equipment

Also Published As

Publication number Publication date
CN114332503A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
WO2021155632A1 (en) Image processing method and apparatus, and electronic device and storage medium
US11120078B2 (en) Method and device for video processing, electronic device, and storage medium
WO2023115911A1 (en) Object re-identification method and apparatus, electronic device, storage medium, and computer program product
WO2021128578A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN108629354B (en) Target detection method and device
WO2021056808A1 (en) Image processing method and apparatus, electronic device, and storage medium
US11455491B2 (en) Method and device for training image recognition model, and storage medium
CN113792207B (en) Cross-modal retrieval method based on multi-level feature representation alignment
CN106228556B (en) image quality analysis method and device
US11222231B2 (en) Target matching method and apparatus, electronic device, and storage medium
CN110532956B (en) Image processing method and device, electronic equipment and storage medium
CN111259148A (en) Information processing method, device and storage medium
CN113326768B (en) Training method, image feature extraction method, image recognition method and device
CN111259967B (en) Image classification and neural network training method, device, equipment and storage medium
CN111582383B (en) Attribute identification method and device, electronic equipment and storage medium
TW202141352A (en) Character recognition method, electronic device and computer readable storage medium
CN112150457A (en) Video detection method, device and computer readable storage medium
WO2022141969A1 (en) Image segmentation method and apparatus, electronic device, storage medium, and program
CN112381091A (en) Video content identification method and device, electronic equipment and storage medium
CN111178115B (en) Training method and system for object recognition network
CN110110742B (en) Multi-feature fusion method and device, electronic equipment and storage medium
CN111797746A (en) Face recognition method and device and computer readable storage medium
WO2021061045A2 (en) Stacked object recognition method and apparatus, electronic device and storage medium
WO2023092975A1 (en) Image processing method and apparatus, electronic device, storage medium, and computer program product
CN112801116B (en) Image feature extraction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22909242

Country of ref document: EP

Kind code of ref document: A1