WO2023115911A1 - Object re-identification method and apparatus, electronic device, storage medium and computer program product

Info

Publication number: WO2023115911A1
Authority: WO — WIPO (PCT)
Prior art keywords: image, category, loss, sample, network
Application number: PCT/CN2022/104715
Other languages: English (en), Chinese (zh)
Inventors: 王皓琦, 王新江, 钟志权, 张伟
Original Assignee: 上海商汤智能科技有限公司
Application filed by 上海商汤智能科技有限公司
Publication of WO2023115911A1

Classifications

    • G — PHYSICS; G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis; G06T 7/10 Segmentation; Edge detection; G06T 7/11 Region-based segmentation
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; G06V 10/00 Arrangements for image or video recognition or understanding; G06V 10/70 using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/764 using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/762 using clustering, e.g. of similar faces in social networks

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to an object re-identification method and device, electronic equipment, storage media and computer program products.
  • Re-identification technology is widely used in various projects, such as re-identification of people, vehicles, and objects.
  • some new situations may appear at any time, and correspondingly some never-before-seen data will be generated.
  • Traditional re-identification algorithms require large numbers of annotated samples for training; when the data set or domain shifts, new data or samples from the new domain must be re-labeled, which consumes substantial manpower and material resources.
  • unsupervised re-identification methods in the related art often produce low-accuracy re-identification results due to influences such as scene variation.
  • Embodiments of the present disclosure propose an object re-identification method and device, electronic equipment, a storage medium, and a computer program product, aiming to improve the accuracy of re-identification results through a re-identification model obtained through unsupervised training.
  • an object re-identification method including:
  • inputting the image to be recognized and the image set into a re-identification network to obtain a re-identification result, where, if a target candidate image exists in the image set, the re-identification result includes the target candidate image, and an object included in the target candidate image matches the target object;
  • the re-identification network is obtained through two-stage training: the first-stage training process is implemented according to at least one sample image and the first category label of each of the sample images, and the second-stage training process is implemented according to the at least one sample image and the pseudo-label and first category label of each of the sample images; the pseudo-label of each of the sample images is determined based on the re-identification network after the first-stage training process, and the first category label represents the category of the corresponding image.
  • an object re-identification device including:
  • An image determination module configured to determine an image to be recognized including a target object
  • a set determination module configured to determine a set of images comprising at least one candidate image, each of said candidate images comprising an object
  • the re-identification module is configured to input the image to be recognized and the image set into a re-identification network to obtain a re-identification result, and if there is a target candidate image in the image set, the re-identification result includes the The target candidate image, the object included in the target candidate image matches the target object;
  • the re-identification network is obtained through two-stage training: the first-stage training process is implemented according to at least one sample image and the first category label of each of the sample images, and the second-stage training process is implemented according to the at least one sample image and the pseudo-label and first category label of each of the sample images; the pseudo-label of each of the sample images is determined based on the re-identification network after the first-stage training process, and the first category label represents the category of the corresponding image.
  • an electronic device including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory, to perform the above method.
  • a computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
  • a computer program product includes a non-transitory computer-readable storage medium storing a computer program, and when the computer program is read and executed by a computer, part or all of the steps of the methods described in the embodiments of the present disclosure are performed.
  • the computer program product may be a software installation package.
  • the performance of the re-identification network is improved through two-stage training, thereby improving the accuracy of the recognition result.
  • FIG. 1 shows a flowchart of an object re-identification method according to an embodiment of the present disclosure
  • Fig. 2 shows a flow chart of training a re-identification network according to an embodiment of the present disclosure
  • Fig. 3 shows a schematic diagram of a preset image according to an embodiment of the present disclosure
  • Fig. 4 shows a schematic diagram of a sample image according to an embodiment of the present disclosure
  • Fig. 5 shows a schematic diagram of determining a sample graph according to an embodiment of the present disclosure
  • FIG. 6 shows a schematic diagram of a first-stage training process of a re-identification network according to an embodiment of the present disclosure
  • FIG. 7 shows a schematic diagram of a second-stage training process of a re-identification network according to an embodiment of the present disclosure
  • Fig. 8 shows a schematic diagram of an object re-identification device according to an embodiment of the present disclosure
  • Fig. 9 shows a schematic diagram of an electronic device according to an embodiment of the present disclosure.
  • Fig. 10 shows a schematic diagram of another electronic device according to an embodiment of the present disclosure.
  • the object re-identification method in the embodiment of the present disclosure may be executed by an electronic device such as a terminal device or a server.
  • the terminal device may be user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or any other mobile or fixed terminal.
  • the server can be a single server or a server cluster composed of multiple servers. Any electronic device can realize the object re-identification method of the embodiment of the present disclosure by calling the computer-readable instructions stored in the memory by the processor.
  • the object re-identification method of the embodiment of the present disclosure can be applied to re-identify any object, such as a person, a vehicle, and an animal.
  • the re-identification method can search multiple images or video frame sequences for images or video frames containing a specific object, and can be applied to scenarios such as searching for a specific person in images collected by multiple cameras, or tracking objects such as pedestrians and vehicles.
  • Fig. 1 shows a flowchart of an object re-identification method according to an embodiment of the present disclosure.
  • the object re-identification method of the embodiment of the present disclosure may include the following steps S10 to S30.
  • Step S10 determining the image to be recognized including the target object.
  • the image to be recognized may be an image directly obtained by capturing the target object, or an image obtained by cropping, from a captured image, the area where the target object is located.
  • the image to be recognized may be collected by an image acquisition device built into or connected to the electronic device, or received directly from another device.
  • the target object can be any movable or non-movable object such as people, animals, vehicles or even furniture.
  • Step S20 determining an image set including at least one candidate image, each of which includes an object.
  • an image set used as a basis for re-identification of the image to be recognized is determined, including at least one candidate image for matching with the image to be recognized.
  • the image collection may be pre-stored in the electronic device, or in a database connected to the electronic device.
  • Each candidate image is obtained by capturing an object of the same kind as the target object, and may be an image obtained by directly capturing the object, or an image obtained by cropping the area where the object is located from the captured image. That is, the object in each candidate image is the same kind of object as the target object. For example, when the target object is a person, the object in the candidate image is also a person; when the target object is a vehicle, the object in the candidate image is also a vehicle.
  • each candidate image in the image set also has a corresponding second category label, which is used to characterize the category of the object in the candidate image.
  • the second category label may be identity information such as the object's name, phone number, and ID card number.
  • the second category label may be the vehicle's license plate number, vehicle owner information, driving certificate number, and the like.
  • Step S30 input the image to be recognized and the set of images into a re-recognition network to obtain a re-recognition result.
  • the image to be recognized and the image set are input into the re-identification network, the candidate image whose object matches the target object is determined among the multiple candidate images through the re-identification network, and that candidate image is taken as the target candidate image to obtain the re-identification result. That is, when there is a target candidate image whose included object matches the target object, the target candidate image may be included in the re-identification result.
  • the re-identification result may also include the category of the target object. That is, after the target candidate image is determined, the second category label corresponding to the target candidate image is also determined as the second category label of the image to be recognized.
  • the detailed process of determining the re-identification result through the re-identification network may be to input the image to be recognized and the image set into the re-identification network, and extract, through the re-identification network, the target object features of the image to be recognized and the candidate object features of each candidate image. The similarity between each candidate image and the image to be recognized is then determined according to the target object features and each candidate object's features. In response to the similarity between a candidate image and the image to be recognized satisfying a preset condition, it is determined that the object in that candidate image matches the target object, and the candidate image is used as the target candidate image.
  • the features of the target object can be obtained by intercepting the region where the target object is located in the image to be recognized, and extracting the features of the region through the feature extraction layer of the re-identification network.
  • the features of the candidate object can also be obtained by intercepting the area where the object is located in the candidate image and extracting the features of the area through the feature extraction layer of the re-identification network.
  • the features of the target object and each candidate object can be represented by vectors, and the similarity can be obtained by calculating the distance between the two corresponding vectors in the feature space. The similarity can be calculated by the following Formula 1 (reconstructed here as the cosine similarity described later in this disclosure):

    $$\text{similarity}(A,B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \tag{1}$$

  • where similarity(A, B) is the similarity between A and B, A is the target object feature, B is the candidate object feature, n is the number of elements in the target object feature and in the candidate object feature, and i indexes the current element in the target object feature and in the candidate object feature.
  • the preset condition may be that the similarity value is the largest and greater than the similarity threshold, that is, the candidate image with the largest similarity value and greater than the similarity threshold is determined as the target candidate image.
  • the second category label of the target candidate image is determined as the second category label of the image to be recognized, and a re-identification result including the target candidate image and the corresponding second category label is determined.
  • in a case where no candidate image satisfies the preset condition, the category of the target object in the current image to be recognized is a new category, and the re-identification result is determined to be a new category (a sketch of this matching rule follows below).
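  • as an illustrative sketch only (not part of the patent text), the matching rule above can be written in a few lines of PyTorch; the feature tensors, the threshold value, and the function name are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def re_identify(query_feat, gallery_feats, gallery_labels, threshold=0.7):
    """Match one query feature against candidate features by cosine
    similarity (Formula 1); fall back to a new category if no candidate
    exceeds the threshold.

    query_feat:     (d,)   feature of the image to be recognized
    gallery_feats:  (N, d) features of the candidate images
    gallery_labels: list of N second category labels
    """
    sims = F.cosine_similarity(query_feat.unsqueeze(0), gallery_feats, dim=1)
    best = int(torch.argmax(sims))
    if sims[best].item() > threshold:
        # largest similarity above the threshold: target candidate image,
        # whose second category label is assigned to the image to be recognized
        return best, gallery_labels[best]
    return None, "new category"
```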
  • the re-identification network in the embodiment of the present disclosure is obtained through two-stage training.
  • the first-stage training process is implemented according to at least one sample image and the first category label of each sample image
  • the second-stage training process is implemented according to the at least one sample image, the pseudo-label of each sample image, and the first category label; the pseudo-label of each sample image is determined based on the re-identification network after the first-stage training process, and the first category label represents the category of the corresponding image.
  • the sample image is a sample image that has not been manually labeled.
  • Fig. 2 shows a flow chart of training a re-identification network according to an embodiment of the present disclosure.
  • the training process of the re-identification network in the embodiment of the present disclosure may include the following steps S40 to S80.
  • the electronic device that executes steps S40 to S80 may be an electronic device that executes the object re-identification method, or other electronic devices such as terminals or servers.
  • Step S40 determining at least one preset image including the object.
  • each preset image is obtained by capturing at least one object, and each preset image has at least one image frame for marking an area where the object is located and a first category label corresponding to each image frame.
  • each preset image has at least one image frame, which is used to mark the area where the object in the preset image is located.
  • the image frame can be annotated by any object annotation method.
  • the preset image may be input into a pre-trained object recognition model to recognize the position of the object included in the preset image, and at least one image frame representing the position of the object is output.
  • the first category label represents the category of the image region in the corresponding image frame, and can be determined according to the captured object. For example, when two persons are captured by an image acquisition device to obtain a preset image, the position of each person in the preset image can be identified to obtain two corresponding image frames, and each image frame is assigned a corresponding first category label, such as Person 1 and Person 2.
  • At least one preset image may be determined through random sampling, that is, random sampling is performed on a set of preset images to obtain at least one preset image including an object.
  • the preset image set may be pre-stored in the electronic device for training the re-recognition network, or stored in other devices, and the electronic device for training the re-recognition network directly extracts at least one preset image from other electronic devices.
  • Fig. 3 shows a schematic diagram of a preset image according to an embodiment of the present disclosure.
  • the preset image 30 may include at least one object, and the preset image 30 also has an image frame for marking the position of the object.
  • the preset image 30 may have an image frame for representing the location of the face of at least one person.
  • the preset image 30 includes Person 1 and Person 2
  • the preset image 30 has a first image frame 31 representing the area where the face of Person 1 is located, and a second image frame 32 representing the area where the face of Person 2 is located
  • the first category label corresponding to the first image frame 31 in the preset image 30 can be directly preset as Person 1, and the first category label corresponding to the second image frame 32 as Person 2.
  • Step S50 Determine at least one sample image corresponding to each preset image according to the corresponding at least one image frame.
  • At least one sample image corresponding to each preset image is determined according to an image frame corresponding to each preset image.
  • Each sample image is obtained by cropping a part of the preset image.
  • multiple sample images may be obtained by cutting out each image frame of the preset image.
  • at least one data enhancement may be performed on each preset image, and after each data enhancement, the area within at least one image frame may be cropped as a sample image.
  • the data enhancement process may include translating the image frame, flipping the image frame, shrinking the image frame, etc., so that the sample images cropped after each data enhancement can cover different regions of the object (see the sketch after the preprocessing note below).
  • data preprocessing can be performed on each preset image before data enhancement.
  • the data preprocessing process may include any processing methods such as format conversion, image brightness adjustment, and overall noise reduction, and at least one processing method may be pre-selected for data preprocessing as required.
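  • a minimal sketch of this crop-based augmentation, assuming PIL images and axis-aligned frames; the jitter parameters and helper name are illustrative, not specified by the disclosure:

```python
import random
from PIL import Image

def augment_crops(image: Image.Image, frame, num_views=4, jitter=0.1):
    """Crop several augmented views of one image frame.
    frame = (x0, y0, x1, y1) marks the area where the object is located."""
    w, h = image.size
    views = []
    for _ in range(num_views):
        x0, y0, x1, y1 = frame
        fw, fh = x1 - x0, y1 - y0
        # translate the frame by a random fraction of its size
        cx = (x0 + x1) / 2 + random.uniform(-jitter, jitter) * fw
        cy = (y0 + y1) / 2 + random.uniform(-jitter, jitter) * fh
        # shrink the frame slightly so each view covers a different region
        s = random.uniform(1.0 - jitter, 1.0)
        box = (max(0, cx - s * fw / 2), max(0, cy - s * fh / 2),
               min(w, cx + s * fw / 2), min(h, cy + s * fh / 2))
        crop = image.crop(tuple(int(v) for v in box))
        if random.random() < 0.5:  # random horizontal flip
            crop = crop.transpose(Image.FLIP_LEFT_RIGHT)
        views.append(crop)
    return views
```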
  • Fig. 4 shows a schematic diagram of a sample image according to an embodiment of the present disclosure.
  • multiple sample images corresponding to at least one object are obtained by cropping the preset image 30 .
  • the preset image 30 includes Person 1 and Person 2
  • the image frames on the preset image 30 are the first image frame 31 representing the area where the face of Person 1 is located, and the second image frame 32 representing the area where the face of Person 2 is located
  • at least one first object sample image 33 corresponding to Person 1 in the preset image 30 and at least one second object sample image 34 corresponding to Person 2 in the preset image 30 may be determined.
  • that is, the content in each image frame is cropped to obtain the corresponding first object sample image 33 and second object sample image 34.
  • Fig. 5 shows a schematic diagram of determining a sample graph according to an embodiment of the present disclosure.
  • the preset image set 50 including at least one preset image can be determined first, and the preset image set 50 is randomly sampled 51 to obtain At least one preset image 52 includes an object.
  • the order of randomly sampling preset images from the preset image set and extracting sample images from the preset images can be changed; that is, the preset images can be randomly sampled first and the sample images extracted afterwards, or sample images can first be extracted for each preset image in the preset image set, followed by random sampling.
  • the sequence of image preprocessing and data enhancement during sample image extraction can also be adjusted.
  • the embodiments of the present disclosure can obtain multiple sample images corresponding to each object through data enhancement, greatly expanding the number of sample images.
  • the image processing can also be performed in parallel by a GPU (Graphics Processing Unit), so as to shorten the image processing time and reduce unnecessary background noise.
  • by randomly selecting preset images, the embodiments of the present disclosure alleviate the problem that the training loss is difficult to calculate when there are too many sample categories, and random sampling keeps the extracted preset images representative, so that they can reflect the features of the preset image set.
  • Step S60 performing a first-stage training on the re-identification network according to the sample image and the corresponding first category label.
  • the re-identification network may be trained in the first stage directly according to each sample image and the first category label.
  • the re-identification network can output a first predicted category of the input sample image, which characterizes the network's prediction of the category of the object in the input sample image. Since only one object is included in each sample image, the real image category of the sample image is also its real object category; the loss can be calculated by comparing the real image category and the real object category of the sample image, respectively, with the first predicted category, to obtain the total re-identification network loss used for network adjustment.
  • manual labeling of sample images is not required before training the re-identification network: the first category label of the image frame corresponding to each sample image can be used directly as the first category label of the sample image,
  • that is, the category of the region where each object is located in the preset image is used as the real image category of the sample image cropped from that region.
  • the actual second category label need not be annotated according to the object category in each sample image.
  • instead, the first category label of each sample image is directly used as the second category label representing the object category, and the real object category of the sample image is corrected in the second-stage training process.
  • for example, when the first category labels of the image frames are "Person 1", "Person 2" and "Person 3", sample images are extracted for each frame accordingly.
  • under detailed manual annotation, the identity of each person would be identified and the corresponding second category tags marked as "Zhang San", "Li Si" and "Wang Wu".
  • in the embodiments of the present disclosure, however, the second category labels are directly taken as "Person 1", "Person 2" and "Person 3".
  • the first-stage training process of the re-identification network includes determining the first category label corresponding to each sample image as the second category label, then inputting each sample image into the re-identification network and outputting the corresponding first predicted category.
  • a first network loss is determined according to the first category label, the second category label and the first predicted category corresponding to each sample image, and the re-identification network is adjusted according to the first network loss.
  • the first loss may be determined according to the first category label of the sample image and the first predicted category
  • the second loss may be determined according to the second category label of the sample image and the first predicted category.
  • the first loss can be determined according to the first category label and the first prediction category corresponding to each sample image
  • the second loss can be determined according to the second category label and the first prediction category corresponding to each sample image
  • the first network loss can be determined according to the first loss and the second loss, and the re-identification network is adjusted according to the first network loss.
  • the first network loss can be obtained by calculating the weighted sum of the first loss and the second loss.
  • the first loss may be a triplet loss
  • the second loss may be a cross-entropy classification loss. That is, the first loss can be obtained by calculating the triplet loss between the first category label and the first predicted category of each sample image, and the second loss can be obtained by calculating the cross-entropy classification loss between the second category label and the first predicted category of each sample image.
  • the triplet loss is inversely proportional to the distance between samples of the same object category and proportional to the distance between samples of different object categories.
  • the triplet loss can be reduced by network conditioning, bringing the distance between samples of the same object category closer and the distance between samples of different object categories farther away.
  • the cross-entropy classification loss is inversely proportional to the distance between samples of the same image category, and the cross-entropy classification loss can be reduced through network adjustment to pull in the distance between samples of the same image category.
  • the triplet loss and the cross-entropy classification loss can be calculated by the following Formulas 2 and 3, respectively (reconstructed in standard batch-hard form from the definitions below):

    $$L_{th} = \frac{1}{P \times K} \sum_{a=1}^{P \times K} \left[ \max_{p} d(f_a, f_p) - \min_{n} d(f_a, f_n) + \alpha \right]_{+} \tag{2}$$

  • in Formula 2, the triplet loss is L_th; P × K is the total number of sample images; a is any sample image; p is, among the sample images with the same first category label as a, the one whose feature vector is farthest from a's feature vector in the feature space; n is, among the sample images with a first category label different from a's, the one whose feature vector is closest to a's feature vector in the feature space; d(·, ·) is the feature-space distance; and α is a preset correction parameter (margin).

    $$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \log p_{ic} \tag{3}$$

  • in Formula 3, the cross-entropy classification loss is L; N is the number of sample images; M is the number of second category labels; p_ic is the predicted probability that sample image i belongs to the first predicted class c; and y_ic takes 1 when the second category label of sample i is c, and 0 otherwise.
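  • as a hedged sketch of Formulas 2 and 3 combined into the first network loss (the batch tensors, loss weights, and margin value are assumptions; the disclosure only specifies a weighted sum):

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet(feats, labels, margin=0.3):
    """Formula 2: for each anchor, take the farthest same-label sample
    and the closest different-label sample."""
    dist = torch.cdist(feats, feats)                     # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = (dist * same.float()).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()

def first_network_loss(feats, logits, first_labels, second_labels,
                       w_triplet=1.0, w_ce=1.0):
    """Weighted sum of the first loss (triplet on first category labels)
    and the second loss (cross-entropy on second category labels, Formula 3)."""
    first_loss = batch_hard_triplet(feats, first_labels)
    second_loss = F.cross_entropy(logits, second_labels)
    return w_triplet * first_loss + w_ce * second_loss
```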
  • a first-stage adjustment may be performed on the re-identification network until the first network loss satisfies a first preset condition.
  • the first preset condition may be that the first network loss is smaller than a preset first threshold.
  • the re-identification network adjusted in the first stage can obtain a well-distributed feature space. That is to say, by adjusting the feature extraction layer of the re-identification network, the re-recognition network can extract similar feature vectors for images of the same image category, and can also extract similar feature vectors for images of the same object category.
  • Fig. 6 shows a schematic diagram of a first-stage training process of a re-identification network according to an embodiment of the present disclosure.
  • as shown in Fig. 6, the first category label 61 and the second category label 62 corresponding to each sample image 60 are determined.
  • Each sample image 60 is input into the re-identification network 63 to obtain a first predicted category 64 , and a first loss 65 is calculated according to the first predicted category 64 and the first category label 61 of each sample image 60 .
  • the second loss 66 is calculated according to the first predicted category 64 and the second category label 62 of each sample image 60, and the re-identification network 63 is jointly adjusted according to the first loss 65 and the second loss 66.
  • the adjustment method may be to calculate the weighted sum of the first loss 65 and the second loss 66 to obtain the first network loss, and perform a first-stage adjustment on the re-identification network 63 until the first network loss meets the first preset condition.
  • Step S70 Determine the pseudo-label of the sample image according to the re-identification network whose training in the first stage is completed.
  • the pseudo-label of each sample image can be determined according to the relatively well-distributed feature space obtained after the first-stage training of the re-identification network. Here, the pseudo-label of each sample image is used to represent the category of the object in the sample image during the second-stage training. Pseudo-labels can be labels of any content, and each pseudo-label uniquely represents one class of objects.
  • the pseudo-labels can be determined from the re-identification network trained in the first stage by inputting each sample image into the network after the first-stage training and obtaining the feature vector produced by feature extraction for each sample image.
  • the feature vectors of each sample image are clustered, and identification information uniquely corresponding to each cluster obtained after clustering is determined.
  • the identification information corresponding to each cluster is used as the pseudo-label of the sample image corresponding to each feature vector contained therein.
  • the clustering process can be realized based on the k-means clustering algorithm.
  • the unique identification information corresponding to each cluster can be preset or generated according to preset rules.
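  • a brief sketch of this pseudo-labeling step, assuming scikit-learn's KMeans and a hypothetical model.extract() feature-extraction call:

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_pseudo_labels(model, sample_images, k):
    """Extract a feature vector per sample with the stage-1 network,
    cluster the vectors, and use each cluster's id as the pseudo-label
    of the samples it contains."""
    feats = np.stack([model.extract(img) for img in sample_images])
    clusters = KMeans(n_clusters=k, n_init=10).fit(feats)
    return clusters.labels_  # cluster index = unique identification info
```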
  • Step S80 performing a second-stage training on the re-identification network obtained after the first-stage training according to the sample images and corresponding first category labels and pseudo-labels.
  • as in the first-stage training process, the first category label corresponding to each sample image is used as the real image category, while the corresponding pseudo-label now serves as the real object category.
  • the second-stage training of the re-identification network is performed based on the current real image category of each sample image, the real object category, and the sample image category predicted by the re-identification network. That is to say, each sample image can be input into the re-identification network obtained after the first stage of training, and output the corresponding second predicted category.
  • a second network loss is determined according to the first class label, pseudo-label and second predicted class corresponding to each sample image, and the re-identification network is adjusted according to the second network loss.
  • the second-stage training process calculates the loss by comparing the real image category and the real object category of each sample image, respectively, with the second predicted category, to obtain the total re-identification network loss used for network adjustment. That is to say, the second-stage adjustment of the re-identification network may include determining the third loss according to the first category label and the second predicted category corresponding to each sample image, and determining the fourth loss according to the pseudo-label and the second predicted category corresponding to each sample image. The second network loss is then determined according to the third loss and the fourth loss, and the re-identification network is adjusted according to the second network loss. The second network loss can be obtained by calculating the weighted sum of the third loss and the fourth loss.
  • the third loss may be a triplet loss
  • the fourth loss may be a cross-entropy classification loss. That is, the third loss can be obtained by calculating the triplet loss between the first category label and the second predicted category of each sample image, and the fourth loss can be obtained by calculating the cross-entropy classification loss between the pseudo-label and the second predicted category of each sample image.
  • the triplet loss is inversely proportional to the distance between samples of the same object category and proportional to the distance between samples of different object categories.
  • the triplet loss can be reduced by network conditioning, bringing the distance between samples of the same object category closer and the distance between samples of different object categories farther away.
  • the cross-entropy classification loss is inversely proportional to the distance between samples of the same image category, and the cross-entropy classification loss can be reduced through network adjustment to pull in the distance between samples of the same image category.
  • the calculation process of the third loss may be the same as that of the first loss
  • the calculation process of the fourth loss may be the same as that of the second loss.
  • the re-identification network can be adjusted in the second stage until the second network loss satisfies the second preset condition.
  • the second preset condition may be that the second network loss is smaller than a preset second threshold.
  • the re-identification network adjusted in the second stage can obtain a feature space with a more reasonable distribution. That is to say, by adjusting the feature extraction layer of the re-identification network, the network can more accurately extract similar feature vectors for images of the same image category, and likewise for images of the same object category.
  • Fig. 7 shows a schematic diagram of a second-stage training process of a re-identification network according to an embodiment of the present disclosure.
  • as shown in Fig. 7, the first category label 71 and the pseudo-label 72 corresponding to each sample image 70 are determined.
  • Each sample image 70 is input into the re-identification network 73 to obtain the second predicted category 74 , and the third loss 75 is calculated according to the second predicted category 74 and the first category label 71 of each sample image 70 .
  • the fourth loss 76 is calculated according to the second predicted category 74 and the pseudo-label 72 of each sample image 70, and the re-identification network 73 is jointly adjusted according to the third loss 75 and the fourth loss 76.
  • the adjustment method may be to calculate the weighted sum of the third loss 75 and the fourth loss 76 to obtain the second network loss, and perform a second-stage adjustment on the re-identification network 73 until the second network loss meets the second preset condition (the overall two-stage schedule is sketched below).
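  • the two training stages can be summarized in one loop; the optimizer, epoch counts, and model interface below are assumptions rather than the patent's specification (in practice the classifier head would also be re-sized to the number of clusters before stage two):

```python
import torch

def train_two_stage(model, images, first_labels, k, epochs=(30, 30), lr=3e-4):
    """Stage 1: triplet on first category labels + cross-entropy on the same
    labels reused as second category labels. Stage 2: the cross-entropy
    ground truth switches to k-means pseudo-labels."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce_labels = first_labels.clone()          # stage 1: first labels reused
    for stage in (1, 2):
        if stage == 2:                        # cluster with the stage-1 network
            ce_labels = torch.as_tensor(assign_pseudo_labels(model, images, k))
        for _ in range(epochs[stage - 1]):
            feats, logits = model(images)     # assumed (features, logits) output
            loss = first_network_loss(feats, logits, first_labels, ce_labels)
            opt.zero_grad(); loss.backward(); opt.step()
    return model
```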
  • a high-accuracy re-identification network can be obtained through unlabeled data training at low cost and quickly.
  • the re-identification network can accurately extract similar feature vectors for images of the same image category and for images of the same object category, thereby obtaining a reasonably distributed feature space.
  • that is, a high-accuracy re-identification network is obtained through two-stage training, the image to be recognized can be accurately re-identified through the re-identification network, and an accurate re-identification result can be obtained.
  • embodiments of the present disclosure further provide an object re-identification method, which is described in detail below:
  • the information of the network input data includes the sample image and its type, as well as the bounding box coordinates of the detection network output.
  • Each sample image has its own independent sample label, and the type of the sample image is called a category label. At the beginning, no manual category labeling is provided; some samples are randomly selected, and their sample labels are used as category labels.
  • the data input by the network is divided into training data, query data, and gallery data.
  • each sample in the training data first undergoes specific data preprocessing, and more equivalent data is then generated for each sample by applying data enhancement to the defect bounding box output by the detection network; the equivalent data share the sample label and category label of the original sample.
  • the algorithm feeds data and labels into a deep learning neural network, which is characterized by two loss functions.
  • in the first learning, the cross-entropy classification loss function uses the randomly selected sample labels, taken as category labels, as the ground truth, and the triplet loss function uses the sample labels as the ground truth.
  • after the first learning, a feature space is obtained; k-means clustering is performed on this feature space, all samples are assigned pseudo-labels accordingly, and the second learning is carried out.
  • the cross-entropy classification loss function uses the pseudo labels as ground truth, and the triplet loss function uses the sample labels as ground truth.
  • after the second learning, the feature space distribution is more comprehensive and reasonable. The query data and gallery data are input into the network after preprocessing, the similarity between the query data and the gallery data is calculated in the feature space, and the similarity is used to judge whether the query data belongs to a certain category.
  • Data enhancement methods include translation, flipping, and shrinking the defect frame to capture a larger field of view.
  • a very important step in the algorithm is random sampling, in which the sample labels of the extracted samples are used as category labels for preliminary training.
  • This method frees the samples from the cost of a large number of manual labels, and alleviates the problem that too many category labels are difficult to calculate the cross-entropy classification loss.
  • the randomly selected sample labels are still representative, reflecting the appearance and characteristics of the data set.
  • the triplet loss function adds hard sample mining: for each sample, the maximum-distance positive sample (i.e., the hardest positive sample) and the minimum-distance negative sample (i.e., the hardest negative sample) are taken as the optimization targets of the loss function, so as to reduce the distance between positive samples and increase the distance between negative samples, obtaining a better feature space and ensuring effective learning.
  • adding the background image (unblemished image) as a negative sample to the training sample can make the neural network better compare the difference between positive and negative samples, thereby improving the effect.
  • the triplet loss function is determined by the following Formula 4 (the same batch-hard form as Formula 2 above):

    $$L_{th} = \frac{1}{P \times K} \sum_{a=1}^{P \times K} \left[ \max_{p} d(f_a, f_p) - \min_{n} d(f_a, f_n) + \alpha \right]_{+} \tag{4}$$
  • k is a user-defined adjustable parameter, and the number of categories can be further reduced according to the distribution of the data, bringing samples of the same category closer together. Moreover, after clustering, every sample can be assigned a pseudo-label, namely the label of the cluster it belongs to. Samples that were not randomly drawn before are thus added to the cross-entropy classification training, further optimizing the feature space distribution.
  • the k-means clustering optimization is determined by the following Formula 5, where S is the partition of all randomly sampled samples into k clusters S_1, …, S_k, and μ_i is the mean of cluster i:

    $$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \left\| x - \mu_i \right\|^2 \tag{5}$$
  • Stage 1 training uses randomly sampled sample labels as the true value of the cross-entropy loss function
  • stage 2 uses the pseudo-labels of all samples obtained after clustering as the true value of the cross-entropy loss function.
  • the number of labels obtained by random sampling and clustering can remain constant or grow only slightly as the data volume surges, ensuring that the computational cost of the cross entropy does not increase sharply with the data volume.
  • after the model is trained, it is only necessary to input the image to be detected and the image gallery into the network to obtain the matching result between the image to be detected and the samples in the gallery, thereby determining whether the image to be detected belongs to a certain type of sample and, if so, which type.
  • the method of comparing similarity is to calculate the cosine distance between samples in the feature space; the similarity can be determined by the following Formula 6. If a sample-library sample exceeds a certain preset similarity threshold with the query sample, the query sample is classified into that class; otherwise it is classified into a new class that does not belong to the sample library.

    $$\text{similarity}(A,B) = \frac{A \cdot B}{\left\| A \right\| \left\| B \right\|} \tag{6}$$

  • where A and B represent the feature vectors of the samples.
  • the object re-identification method in the embodiment of the present application can achieve the following technical effects:
  • This technique uses an unsupervised re-identification network to assist a classification network to learn to quickly classify images.
  • the network can re-identify and classify images misjudged as a certain category, improve the distribution of the feature space on the basis of the original classification, and refine the feature level, so that the network learns not only a fuzzy category but also the comparison and distribution between samples of each category, improving the original classification accuracy and recall rate.
  • the network uses rendering technology to process each image, and the data enhancement method greatly expands the sample size. Processing each image separately under such a mechanism is very time-consuming; rendering first and then cropping the frames, together with GPU-accelerated optimization, can greatly shorten the image processing time and reduce unnecessary background noise.
  • embodiments of the present disclosure also provide object re-identification devices, electronic equipment, computer-readable storage media, and programs, all of which can be used to implement any object re-identification method provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section.
  • Fig. 8 shows a schematic diagram of an object re-identification device according to an embodiment of the present disclosure.
  • the object re-identification apparatus of the embodiment of the present disclosure may include an image determination module 80 , a set determination module 81 and a re-identification module 82 .
  • An image determining module 80 configured to determine an image to be recognized including a target object
  • a set determination module 81 configured to determine an image set comprising at least one candidate image, each of which includes an object;
  • the re-identification module 82 is configured to input the image to be recognized and the image set into a re-identification network to obtain a re-identification result, where, if a target candidate image exists in the image set, the re-identification result includes the target candidate image, and the object included in the target candidate image matches the target object;
  • the re-identification network is obtained through two-stage training: the first-stage training process is implemented according to at least one sample image and the first category label of each of the sample images, and the second-stage training process is implemented according to the at least one sample image and the pseudo-label and first category label of each of the sample images; the pseudo-label of each of the sample images is determined based on the re-identification network after the first-stage training process, and the first category label represents the category of the corresponding image.
  • each of the candidate images has a corresponding second category label, and the second category label represents the category of the object in the corresponding image;
  • the device also includes:
  • the label determination module is configured to determine the second category label corresponding to the target candidate image as the second category label of the image to be recognized.
  • the training process of the re-identification network includes:
  • each of the preset images having at least one image frame for marking the area where the object is located, and a first category label corresponding to each of the image frames;
  • the second-stage training is performed on the re-identification network obtained after the first-stage training according to the sample images and the corresponding first category labels and pseudo-labels.
  • the determining at least one preset image including an object includes:
  • Random sampling is performed on the set of preset images to obtain at least one preset image including an object.
  • the determining at least one sample image corresponding to each preset image according to the corresponding at least one image frame includes:
  • image preprocessing is performed on the preset images.
  • the first-stage training of the re-identification network according to the sample image and the corresponding first category label includes:
  • a first network loss is determined according to the first category label, the second category label and the first predicted category corresponding to each of the sample images, and the re-identification network is adjusted according to the first network loss.
  • the determining of the first network loss according to the first category label, the second category label, and the first predicted category corresponding to each of the sample images, and the adjusting of the re-identification network according to the first network loss, include:
  • a first network loss is determined based on the first loss and the second loss, and the re-identification network is adjusted based on the first network loss.
  • the determining the pseudo-label of the sample image according to the re-identification network whose training is completed in the first stage includes:
  • Each of the sample images is input into the re-identification network that has been trained in the first stage to obtain a feature vector after feature extraction is performed on each of the sample images;
  • Clustering the feature vectors of each of the sample images, and determining identification information uniquely corresponding to each cluster obtained after clustering;
  • the identification information corresponding to each of the clusters is used as the pseudo-label of the sample image corresponding to each feature vector contained in that cluster.
  • the clustering process is implemented based on a k-means clustering algorithm.
  • the second-stage training of the re-identification network obtained after the first-stage training according to the sample image and the corresponding first category label and pseudo-label includes:
  • a second network loss is determined according to the first class label, pseudo-label and second predicted class corresponding to each of the sample images, and the re-identification network is adjusted according to the second network loss.
  • the determining of the second network loss according to the first category label, pseudo-label, and second predicted category corresponding to each of the sample images, and the adjusting of the re-identification network according to the second network loss, include:
  • a second network loss is determined based on the third loss and the fourth loss, and the re-identification network is adjusted based on the second network loss.
  • the first loss and/or the third loss is a triplet loss
  • the second loss and/or the fourth loss is a cross-entropy classification loss
  • the re-identification module 82 includes:
  • the image input submodule is configured to input the image to be recognized and the set of images into a re-identification network, and extract the target object features of the image to be recognized through the re-identification network, and the candidate of each of the candidate images object characteristics;
  • a similarity matching submodule configured to determine the similarity between each of the candidate images and the image to be recognized according to the characteristics of the target object and the characteristics of each of the candidate objects;
  • the result output submodule is configured to, in response to the similarity between a candidate image and the image to be recognized satisfying a preset condition, determine that the object in the candidate image matches the target object, and use the candidate image as the target candidate image to obtain the re-identification result.
  • the preset condition includes that the similarity value is the largest and greater than a similarity threshold.
  • an electronic device including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory, to perform the above method.
  • the functions or modules included in the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments, and for detailed implementation, refer to the descriptions of the above method embodiments.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which computer program instructions are stored, and the above-mentioned method is implemented when the computer program instructions are executed by a processor.
  • Computer readable storage media may be volatile or nonvolatile computer readable storage media.
  • An embodiment of the present disclosure also proposes an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
  • An embodiment of the present disclosure also provides a computer program product, including computer-readable codes, or a non-volatile computer-readable storage medium carrying computer-readable codes, when the computer-readable codes are stored in a processor of an electronic device When running in the electronic device, the processor in the electronic device executes the above method.
  • Electronic devices may be provided as terminals, servers, or other forms of devices.
  • FIG. 9 shows a schematic diagram of an electronic device 800 according to an embodiment of the present disclosure.
  • the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
  • electronic device 800 may include one or more of the following components: processing component 802, memory 804, power supply component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and a communication component 816 .
  • the processing component 802 generally controls the overall operations of the electronic device 800, such as those associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the above method. Additionally, processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components. For example, processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802 .
  • the memory 804 is configured to store various types of data to support operations at the electronic device 800 . Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like.
  • the memory 804 can be implemented by any type of volatile or non-volatile storage device or their combination, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic or Optical Disk.
  • the power supply component 806 provides power to various components of the electronic device 800 .
  • Power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for electronic device 800 .
  • the multimedia component 808 includes a screen providing an output interface between the electronic device 800 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense a boundary of a touch or swipe action, but also detect duration and pressure associated with the touch or swipe action.
  • multimedia component 808 includes a front camera and/or rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
  • the audio component 810 is configured to output and/or input audio signals.
  • the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 800 is in operation modes, such as call mode, recording mode and voice recognition mode. Received audio signals may be further stored in memory 804 or sent via communication component 816 .
  • the audio component 810 also includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: a home button, volume buttons, start button, and lock button.
  • Sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of electronic device 800 .
  • the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor component 814 can also detect a change in position of the electronic device 800 or one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and temperature changes of the electronic device 800.
  • Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 814 may also include an optical sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications.
  • the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
  • the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
  • the electronic device 800 can access a wireless network based on a communication standard, such as WiFi, second generation mobile communication technology (2G), or third generation mobile communication technology (3G), or a combination thereof.
  • the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • electronic device 800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the methods described above.
  • a non-volatile computer-readable storage medium such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to implement the above method.
  • FIG. 10 shows a schematic diagram of another electronic device 1900 according to an embodiment of the present disclosure.
  • electronic device 1900 may be provided as a server.
  • electronic device 1900 includes processing component 1922, which further includes one or more processors, and memory resources represented by memory 1932 for storing instructions executable by processing component 1922, such as application programs.
  • the application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above method.
  • Electronic device 1900 may also include a power supply component 1926 configured to perform power management of electronic device 1900, a wired or wireless network interface 1950 configured to connect electronic device 1900 to a network, and an input-output (I/O) interface 1958.
  • the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical user interface-based operating system introduced by Apple Inc. (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
  • a non-transitory computer-readable storage medium such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to implement the above method.
  • the present disclosure can be a system, method and/or computer program product.
  • a computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present disclosure.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include (a non-exhaustive list): portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the above.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device.
  • Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be made to an external computer (for example, through the Internet using an Internet service provider).
  • in some embodiments, an electronic circuit, such as a programmable logic circuit, field programmable gate array (FPGA), or programmable logic array (PLA), can be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.
  • These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium; the instructions cause computers, programmable data processing devices and/or other devices to work in a specific way, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the computer program product can be specifically realized by means of hardware, software or a combination thereof.
  • in an optional embodiment, the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention relate to an object re-identification method and apparatus, an electronic device, a storage medium, and a computer program product. The method comprises: determining an image to be identified that includes a target object, and an image set comprising candidate images, each candidate image including at least one object; and inputting the image to be identified and the image set into a re-identification network to obtain a target candidate image including an object matched with the target object. The re-identification network is obtained by means of two-stage training: the first-stage training process is carried out according to a sample image and a corresponding first category label, and the second-stage training process is carried out according to the sample image, a corresponding pseudo label, and the first category label, the pseudo label of each sample image being determined according to the re-identification network after the first training. According to the present invention, the performance of the re-identification network is improved by means of the two-stage training, thereby improving the accuracy of an identification result.
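To make the two-stage scheme described above concrete, here is a minimal, hypothetical sketch, not the patented implementation. It assumes a PyTorch-style network with a feature extractor and a category classification head, cross-entropy losses for both the first category labels and the pseudo labels, and DBSCAN clustering of first-stage features to generate the pseudo labels; the class and function names, the clustering algorithm and its parameters, and the way the two losses are combined are all illustrative assumptions, since the publication does not fix these details.

```python
# Illustrative sketch only: all concrete choices below (names, losses,
# clustering, hyperparameters) are assumptions, not taken from the patent.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import DBSCAN


class ReIDNet(nn.Module):
    """Toy re-identification network: feature extractor plus a category head."""

    def __init__(self, num_categories: int, feat_dim: int = 128):
        super().__init__()
        # Stand-in for a real CNN backbone; expects 3x64x64 images.
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 64 * 64, feat_dim), nn.ReLU()
        )
        self.category_head = nn.Linear(feat_dim, num_categories)

    def forward(self, x):
        feats = self.backbone(x)
        return self.category_head(feats), feats


def train_stage_one(net, images, category_labels, epochs: int = 5):
    """Stage 1: supervised training with the first category labels only."""
    net.train()
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        logits, _ = net(images)
        loss = ce(logits, category_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()


def make_pseudo_labels(net, images):
    """Cluster stage-1 features; cluster ids serve as pseudo labels (-1 = noise)."""
    net.eval()
    with torch.no_grad():
        _, feats = net(images)
    feats = F.normalize(feats, dim=1).numpy()
    return torch.as_tensor(DBSCAN(eps=0.6, min_samples=2).fit_predict(feats))


def train_stage_two(net, images, pseudo_labels, category_labels, epochs: int = 5):
    """Stage 2: joint loss over the pseudo labels and the first category labels."""
    keep = pseudo_labels >= 0                      # drop unclustered samples
    if keep.sum() == 0:
        return                                     # nothing clustered; tune DBSCAN
    images, category_labels = images[keep], category_labels[keep]
    pseudo_labels = pseudo_labels[keep]
    pseudo_head = nn.Linear(net.category_head.in_features,
                            int(pseudo_labels.max()) + 1)
    net.train()
    opt = torch.optim.Adam(
        list(net.parameters()) + list(pseudo_head.parameters()), lr=1e-4)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        logits, feats = net(images)
        loss = ce(logits, category_labels) + ce(pseudo_head(feats), pseudo_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()


def re_identify(net, query, gallery):
    """Match query images against candidate images by cosine feature similarity."""
    net.eval()
    with torch.no_grad():
        _, qf = net(query)
        _, gf = net(gallery)
    sims = F.normalize(qf, dim=1) @ F.normalize(gf, dim=1).t()
    return sims.argmax(dim=1)                      # best-matching candidate index


if __name__ == "__main__":
    # Hypothetical end-to-end usage with random stand-in data.
    net = ReIDNet(num_categories=10)
    imgs = torch.randn(32, 3, 64, 64)
    cats = torch.randint(0, 10, (32,))
    train_stage_one(net, imgs, cats)
    pseudo = make_pseudo_labels(net, imgs)
    train_stage_two(net, imgs, pseudo, cats)
    best = re_identify(net, imgs[:1], imgs[1:])
```

In practice the pseudo-labelling and second-stage steps would typically be alternated over several rounds, since each round of re-training yields better features for the next round of clustering; that scheduling is likewise an assumption of this sketch.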
PCT/CN2022/104715 2021-12-24 2022-07-08 Procédé et appareil de réidentification d'objet, dispositif électronique, support de stockage et produit programme d'ordinateur WO2023115911A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111601354.8 2021-12-24
CN202111601354.8A CN114332503A (zh) 2021-12-24 2021-12-24 对象重识别方法及装置、电子设备和存储介质

Publications (1)

Publication Number Publication Date
WO2023115911A1 true WO2023115911A1 (fr) 2023-06-29

Family

ID=81012974

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104715 WO2023115911A1 (fr) 2021-12-24 2022-07-08 Procédé et appareil de réidentification d'objet, dispositif électronique, support de stockage et produit programme d'ordinateur

Country Status (2)

Country Link
CN (1) CN114332503A (fr)
WO (1) WO2023115911A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114332503A (zh) * 2021-12-24 2022-04-12 商汤集团有限公司 对象重识别方法及装置、电子设备和存储介质
CN117058489B (zh) * 2023-10-09 2023-12-29 腾讯科技(深圳)有限公司 多标签识别模型的训练方法、装置、设备及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180374233A1 (en) * 2017-06-27 2018-12-27 Qualcomm Incorporated Using object re-identification in video surveillance
CN111967294A (zh) * 2020-06-23 2020-11-20 南昌大学 一种无监督域自适应的行人重识别方法
CN111783646A (zh) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 行人再识别模型的训练方法、装置、设备和存储介质
CN112069929A (zh) * 2020-08-20 2020-12-11 之江实验室 一种无监督行人重识别方法、装置、电子设备及存储介质
CN113095174A (zh) * 2021-03-29 2021-07-09 深圳力维智联技术有限公司 重识别模型训练方法、装置、设备及可读存储介质
CN114332503A (zh) * 2021-12-24 2022-04-12 商汤集团有限公司 对象重识别方法及装置、电子设备和存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665135A (zh) * 2023-07-28 2023-08-29 中国华能集团清洁能源技术研究院有限公司 储能站电池组的热失控风险预警方法、装置和电子设备
CN116665135B (zh) * 2023-07-28 2023-10-20 中国华能集团清洁能源技术研究院有限公司 储能站电池组的热失控风险预警方法、装置和电子设备

Also Published As

Publication number Publication date
CN114332503A (zh) 2022-04-12

Similar Documents

Publication Publication Date Title
WO2021155632A1 (fr) Procédé et appareil de traitement d'image, dispositif électronique et support de stockage
US11120078B2 (en) Method and device for video processing, electronic device, and storage medium
WO2023115911A1 (fr) Procédé et appareil de réidentification d'objet, dispositif électronique, support de stockage et produit programme d'ordinateur
WO2021128578A1 (fr) Procédé et appareil de traitement d'image, dispositif électronique et support de stockage
WO2021056808A1 (fr) Procédé et appareil de traitement d'image, dispositif électronique, et support de stockage
CN108629354B (zh) 目标检测方法及装置
US11455491B2 (en) Method and device for training image recognition model, and storage medium
CN113792207B (zh) 一种基于多层次特征表示对齐的跨模态检索方法
CN106228556B (zh) 图像质量分析方法和装置
CN110532956B (zh) 图像处理方法及装置、电子设备和存储介质
US11222231B2 (en) Target matching method and apparatus, electronic device, and storage medium
CN111259148A (zh) 信息处理方法、装置及存储介质
CN113326768B (zh) 训练方法、图像特征提取方法、图像识别方法及装置
CN111259967B (zh) 图像分类及神经网络训练方法、装置、设备及存储介质
CN111582383B (zh) 属性识别方法及装置、电子设备和存储介质
TW202141352A (zh) 字元識別方法及電子設備和電腦可讀儲存介質
CN112150457A (zh) 视频检测方法、装置及计算机可读存储介质
WO2022141969A1 (fr) Procédé et appareil de segmentation d'image, dispositif électronique, support de stockage et programme
CN112381091A (zh) 视频内容识别方法、装置、电子设备及存储介质
CN111178115B (zh) 对象识别网络的训练方法及系统
CN110110742B (zh) 多特征融合方法、装置、电子设备及存储介质
CN111797746A (zh) 人脸识别方法、装置及计算机可读存储介质
WO2021061045A2 (fr) Procédé et appareil de reconnaissance d'objet empilé, dispositif électronique et support de stockage
WO2023092975A1 (fr) Procédé et appareil de traitement d'images, dispositif électronique, support d'enregistrement, et produit programme informatique
CN112801116B (zh) 图像的特征提取方法及装置、电子设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22909242

Country of ref document: EP

Kind code of ref document: A1