WO2023142551A1 - Model training and image recognition methods and apparatuses, device, storage medium and computer program product - Google Patents

Model training and image recognition methods and apparatuses, device, storage medium and computer program product

Info

Publication number
WO2023142551A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
feature
loss value
image
target
Prior art date
Application number
PCT/CN2022/127109
Other languages
French (fr)
Chinese (zh)
Inventor
唐诗翔
朱烽
赵瑞
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023142551A1 publication Critical patent/WO2023142551A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/22: Matching criteria, e.g. proximity measures

Definitions

  • The embodiments of the present disclosure are based on, and claim priority to, the Chinese patent application No. 202210107742.9, filed on January 28, 2022 and entitled "Model training and image recognition method and apparatus, device and storage medium".
  • The entire content of this Chinese patent application is hereby incorporated into this disclosure by reference.
  • The present disclosure relates to, but is not limited to, the field of computer technology, and in particular to model training and image recognition methods and apparatuses, a device, a storage medium and a computer program product.
  • Object re-identification is also referred to as object re-ID.
  • Object re-identification is a technology that uses computer vision technology to determine whether a specific object exists in an image or video sequence.
  • Object re-identification is widely regarded as a sub-problem of image retrieval: given an image containing an object, retrieve images containing that object across devices. Differences between devices, shooting angles, environments and other factors all affect the results of object re-identification.
  • Embodiments of the present disclosure provide model training and image recognition methods and apparatuses, a device, a storage medium, and a computer program product.
  • An embodiment of the present disclosure provides a model training method, which includes:
  • acquiring a first image sample containing a first object, and using a first network of a first model to be trained to perform feature extraction on the first image sample to obtain a first feature of the first object;
  • using a second network of the first model to update the first feature based on second features of at least one second object, to obtain a first target feature corresponding to the first feature, wherein the similarity between each second object and the first object is not less than a first threshold;
  • determining a target loss value based on the first target feature; and
  • updating the model parameters of the first model at least once based on the target loss value to obtain a trained first model.
  • An embodiment of the present disclosure provides an image recognition method, the method comprising:
  • acquiring a first image and a second image; and
  • using a trained target model to identify the object in the first image and the object in the second image to obtain a recognition result, wherein the trained target model includes the first model obtained by the above model training method, and the recognition result indicates whether the object in the first image and the object in the second image are the same object or different objects.
  • An embodiment of the present disclosure provides a model training device, which includes:
  • a first acquisition part configured to acquire a first image sample containing a first object
  • the feature extraction part is configured to use the first network of the first model to be trained to perform feature extraction on the first image sample to obtain the first feature of the first object;
  • the first update part is configured to use the second network of the first model to update the first feature based on the second features of at least one second object, to obtain the first target feature corresponding to the first feature, wherein the similarity between each second object and the first object is not less than the first threshold;
  • the first determination part is configured to determine a target loss value based on the first target feature
  • the second updating part is configured to update the model parameters of the first model at least once based on the target loss value to obtain the trained first model.
  • An embodiment of the present disclosure provides an image recognition device, which includes:
  • a second acquisition part configured to acquire the first image and the second image
  • the identification part is configured to use the trained target model to identify the object in the first image and the object in the second image to obtain a recognition result, wherein the trained target model includes the first model obtained by the above model training method, and the recognition result indicates whether the object in the first image and the object in the second image are the same object or different objects.
  • An embodiment of the present disclosure provides an electronic device, including a processor and a memory, the memory stores a computer program that can run on the processor, and the above method is implemented when the processor executes the computer program.
  • An embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the foregoing method is implemented.
  • An embodiment of the present disclosure provides a computer program product, where the computer program product includes a computer program or an instruction, and when the computer program or instruction is run on the electronic device, the electronic device is made to execute the above method.
  • In the embodiments of the present disclosure, a first image sample containing a first object is acquired; the first network of the first model to be trained is used to perform feature extraction on the first image sample to obtain the first feature of the first object; the second network of the first model is used to update the first feature based on the second features of at least one second object to obtain the first target feature corresponding to the first feature, wherein the similarity between each second object and the first object is not less than the first threshold; a target loss value is determined based on the first target feature; and the model parameters of the first model are updated at least once based on the target loss value to obtain the trained first model.
  • In this way, the features of the second object are introduced as noise at the feature level of the first image sample containing the first object, and the overall network structure of the first model is trained, so that the robustness of the first model can be enhanced and the performance of the first model can be improved.
  • At the same time, when the target loss value does not meet the preset condition, the model parameters of the first model are updated at least once. Since the target loss value is determined based on the first target feature, the consistency of the trained first model's predictions for different image samples of the same object can be improved, which in turn enables the trained first model to more accurately re-identify objects in images containing multiple objects.
  • FIG. 1 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of an implementation flow of a model training method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of an implementation flow of a model training method provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of an implementation flow of an image recognition method provided by an embodiment of the present disclosure.
  • FIG. 5A is a schematic diagram of the composition and structure of a model training system provided by an embodiment of the present disclosure
  • FIG. 5B is a schematic diagram of a model training system provided by an embodiment of the present disclosure.
  • FIG. 5C is a schematic diagram of determining an occlusion mask provided by an embodiment of the present disclosure.
  • FIG. 5D is a schematic diagram of a first network provided by an embodiment of the present disclosure.
  • FIG. 5E is a schematic diagram of a second subnetwork provided by an embodiment of the present disclosure.
  • FIG. 5F is a schematic diagram of a second network provided by an embodiment of the present disclosure.
  • FIG. 5G is a schematic diagram of obtaining a target loss value provided by an embodiment of the present disclosure.
  • FIG. 5H is a schematic diagram of an occlusion score of a pedestrian image provided by an embodiment of the present disclosure.
  • FIG. 5I is a schematic diagram of an image retrieval result provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of the composition and structure of a model training device provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of the composition and structure of an image recognition device provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a hardware entity of an electronic device in an embodiment of the present disclosure.
  • The embodiment of the present disclosure provides a model training method, which introduces the features of the second object as noise at the feature level of the first image sample containing the first object and trains the overall network structure of the first model, so that the robustness of the first model can be enhanced and the performance of the first model can be improved. At the same time, when the target loss value does not meet the preset conditions, the model parameters of the first model are updated at least once; because the target loss value is determined based on the first target feature, the prediction consistency of the trained first model for different image samples of the same object can be improved, thereby enabling the trained first model to more accurately re-identify objects in images containing multiple objects.
  • Both the model training method and the image recognition method provided by the embodiments of the present disclosure can be executed by an electronic device. The electronic device can be a notebook computer, a tablet computer, a desktop computer, a set-top box, or a mobile device (for example, a mobile phone, a portable music player, or a personal digital assistant) or other terminal device, and can also be implemented as a server.
  • The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), and big data and artificial intelligence platforms.
  • Fig. 1 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 1, the method includes steps S11 to S15, wherein:
  • Step S11 acquiring a first image sample including a first object.
  • the first image sample may be any suitable image containing at least the first object.
  • The content contained in the first image sample may be determined according to the actual application scenario; for example, the sample may contain only the first object, or may contain the first object together with at least one other object.
  • the first object may include, but is not limited to, people, animals, plants, objects, and the like.
  • the first image sample is a face image containing Zhang San.
  • the first image sample is an image including Li Si's whole person.
  • the first image sample may include at least one image.
  • the first image sample is any image in the training set.
  • the first image sample includes a first sub-image and a second sub-image, wherein the first sub-image is an image in the training set, and the second sub-image is an image obtained by augmenting the first sub-image.
  • the augmentation processing may include, but is not limited to, at least one of occlusion processing, scaling processing, cropping processing, resizing processing, padding processing, flipping processing, color jittering processing, grayscale processing, Gaussian blur processing, random erasing processing, and the like.
  • those skilled in the art may use appropriate augmentation processing on the first sub-image to obtain the second sub-image according to actual conditions, which is not limited in the embodiments of the present disclosure.
  • the first image sample includes a first sub-image and a plurality of second sub-images, wherein the first sub-image is an image in the training set, and each second sub-image is an image obtained by performing augmentation processing on the first sub-image.
  • Step S12 using the first network of the first model to be trained, to perform feature extraction on the first image sample to obtain the first feature of the first object.
  • the first model may be any suitable model for object recognition based on image features.
  • the first model may include at least a first network.
  • the first feature may include, but not limited to, the original feature of the first image sample, or a feature obtained by processing the original feature.
  • the original feature may include but not limited to the face feature, body feature, etc. of the first object included in the image.
  • the first network may at least include a first sub-network, and the first sub-network is used to extract features of the first image using a feature extractor.
  • the feature extractor may include, but is not limited to, a recurrent neural network (Recurrent Neural Network, RNN), a convolutional neural network (Convolutional Neural Network, CNN), a Transformer-based feature extraction network, and the like.
  • those skilled in the art may use an appropriate first network in the first model to obtain the first feature according to actual conditions, which is not limited in the embodiments of the present disclosure.
  • the third feature of the first image sample is extracted through the first sub-network, and the third feature is determined as the first feature of the first object.
  • the third feature may include, but not limited to, the original feature of the first image sample and the like.
  • the first network may further include a second sub-network for determining the first feature of the first object based on the third feature of the first image sample.
  • the second sub-network may include an occlusion erasure network, which is used to perform occlusion erasure processing on the input third feature to obtain the first feature of the first object.
  • Step S13 using the second network of the first model to update the first feature based on the second feature of at least one second object to obtain the first target feature corresponding to the first feature.
  • the similarity between each second object and the first object is not less than the first threshold.
  • the first threshold may be preset or obtained by statistics. During implementation, those skilled in the art may independently determine the setting manner of the first threshold according to actual needs, which is not limited in the embodiments of the present disclosure.
  • the similarity between the facial features of the second object and the first object is not less than the first threshold.
  • the similarity between the wearing features of the second object and the first object is not less than the first threshold.
  • For example, neither the similarity between the appearance characteristics of the second object and those of the first object nor the similarity between their clothing characteristics is less than the first threshold.
  • the second feature can be obtained based on the training set, or can be pre-input.
  • the second object may include, but is not limited to, people, animals, plants, objects, and the like.
  • the similarity between each second object and the first object may be obtained based on the similarity between the second feature of each second object and the first feature of the first object. In some implementations, the similarity between each second object and the first object may be obtained based on the similarity between the feature center of each second object and the first feature of the first object.
  • the first model may include a second memory feature library
  • the second memory feature library may include at least one feature of at least one object. The feature center of the second object may be obtained based on at least one feature belonging to the second object in the second memory feature library.
  • features of multiple image samples of at least one object in the training set may be extracted, and the extracted features may be stored in the second memory feature library according to their identity.
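  • For illustration only, the following is a minimal PyTorch-style sketch of how feature centers of second objects whose similarity to the first object is not less than the first threshold might be selected from such a memory feature library. The function and variable names are hypothetical, and cosine similarity is only one possible similarity measure; the disclosure does not prescribe a specific implementation.

```python
import torch
import torch.nn.functional as F

def select_second_feature_centers(first_feature, memory_bank, first_threshold=0.5):
    """Return feature centers of candidate second objects.

    first_feature: (D,) first feature of the first object.
    memory_bank: dict mapping object id -> (N_k, D) tensor of stored features,
                 a stand-in for the second memory feature library.
    """
    selected = []
    f = F.normalize(first_feature, dim=0)
    for obj_id, feats in memory_bank.items():
        center = feats.mean(dim=0)                      # feature center of this object
        sim = torch.dot(f, F.normalize(center, dim=0))  # cosine similarity
        if sim >= first_threshold:                      # not less than the first threshold
            selected.append(center)
    return torch.stack(selected) if selected else torch.empty(0, first_feature.numel())
```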
  • the second network may include a fifth sub-network and a sixth sub-network, the fifth sub-network is used to aggregate the second feature with the first feature to obtain the first aggregated sub-feature; the sixth sub-network The network is used to update the first aggregation sub-feature to obtain the first target feature.
  • Step S14 Determine the target loss value based on the first target feature.
  • the target loss value may include, but is not limited to, at least one of a mean square error loss value, a cross-entropy loss value, a contrastive loss value, and the like.
  • Step S15 based on the target loss value, update the model parameters of the first model at least once to obtain the trained first model.
  • For example, the target loss value is compared with a threshold; if the target loss value is greater than the threshold, the model parameters of the first model are updated, and if the target loss value is not greater than the threshold, the first model is determined as the trained first model. For another example, the target loss value is compared with the previous target loss value; if the target loss value is greater than the previous target loss value, the model parameters of the first model are updated, and if the target loss value is not greater than the previous target loss value, the first model is determined as the trained first model.
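  • As an illustration only, the following PyTorch-style sketch shows how steps S11 to S15 could be iterated with the threshold-based stopping criterion described above. The module and attribute names (first_network, second_network, target_loss) are hypothetical placeholders, not the actual structure of the first model.

```python
import torch

def train_first_model(first_model, data_loader, optimizer, loss_threshold=0.1, max_steps=10000):
    """Hypothetical sketch of steps S11 to S15."""
    for step, (first_image_sample, label) in enumerate(data_loader):
        # S12: the first network extracts the first feature of the first object
        first_feature = first_model.first_network(first_image_sample)
        # S13: the second network updates the first feature using second features
        first_target_feature = first_model.second_network(first_feature)
        # S14: determine the target loss value based on the first target feature
        target_loss = first_model.target_loss(first_target_feature, label)
        # S15: stop when the preset condition is met, otherwise update the parameters once
        if target_loss.item() <= loss_threshold or step >= max_steps:
            break
        optimizer.zero_grad()
        target_loss.backward()
        optimizer.step()
    return first_model
```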
  • In the embodiments of the present disclosure, a first image sample containing a first object is acquired; the first network of the first model to be trained is used to perform feature extraction on the first image sample to obtain the first feature of the first object; the second network of the first model is used to update the first feature based on the second features of at least one second object to obtain the first target feature corresponding to the first feature, wherein the similarity between each second object and the first object is not less than the first threshold; the target loss value is determined based on the first target feature; and the model parameters of the first model are updated at least once based on the target loss value to obtain the trained first model.
  • In this way, the features of the second object are introduced as noise at the feature level of the first image sample containing the first object, and the overall network structure of the first model is trained, so that the robustness of the first model can be enhanced and the performance of the first model can be improved.
  • At the same time, when the target loss value does not meet the preset condition, the model parameters of the first model are updated at least once. Since the target loss value is determined based on the first target feature, the consistency of the trained first model's predictions for different image samples of the same object can be improved, which in turn enables the trained first model to more accurately re-identify objects in images containing multiple objects.
  • the first image sample includes label information
  • the first model includes a first feature memory library
  • the first feature memory library includes at least one feature belonging to at least one object
  • the above step S14 includes steps S141 to S143, wherein:
  • Step S141 Determine a first loss value based on the first target feature and label information.
  • The label information may include, but is not limited to, label values, identifiers, and the like.
  • the first loss value may include, but not limited to, a cross-entropy loss value and the like.
  • the first loss value can be calculated by the following formula (1-1):
  • where W is a linear matrix, W_i and W_j are elements of W, y_i represents the label information of the i-th object, f_i represents the first target feature of the i-th object, and ID_S represents the total number of objects in the training set.
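  • The body of formula (1-1) is not reproduced in this text. A plausible reconstruction, assuming the standard softmax cross-entropy identity loss that matches the variable definitions above, is:

```latex
L_{1} = -\sum_{i} \log \frac{\exp\left(W_{y_i}^{\top} f_i\right)}{\sum_{j=1}^{ID_S} \exp\left(W_j^{\top} f_i\right)} \tag{1-1}
```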
  • Step S142 Determine a second loss value based on the first target feature and at least one feature of at least one object in the first feature memory.
  • the second loss value may include but not limited to contrastive loss and the like.
  • Step S143 Determine a target loss value based on the first loss value and the second loss value.
  • the target loss value may include, but not limited to, the sum of the first loss value and the second loss value, the sum after weighting the first loss value and the second loss value respectively, and the like.
  • the target loss value can be calculated by the following formula (1-2):
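  • The body of formula (1-2) is likewise not reproduced here. Based on the preceding description (the sum, or weighted sum, of the first loss value and the second loss value), one plausible form is:

```latex
L_{target} = \lambda_{1} L_{1} + \lambda_{2} L_{2} \tag{1-2}
```

  • where λ_1 and λ_2 are assumed weighting coefficients (both equal to 1 for the unweighted sum).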
  • step S142 includes step S1421 to step S1422, wherein:
  • Step S1421. From at least one feature of at least one object in the first feature memory, determine a first feature center of the first object and a second feature center of at least one second object.
  • the first feature center may be determined based on the features of the first object in the first feature memory and the first target feature.
  • Each second feature center may be determined based on each feature of each second object in the first feature memory.
  • the feature center of each object can be calculated by the following formula (1-3):
  • where c_k represents the feature center of the k-th object, B_k represents the feature set belonging to the k-th object in the mini-batch, m is the momentum coefficient used for the update, and f_i' is the first feature of the i-th sample.
  • m can be 0.2.
  • That is, when f_i' and B_k belong to the same object, the feature center c_k of that object changes; when f_i' and B_k do not belong to the same object, the feature center c_k remains consistent with the previous c_k.
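  • The body of formula (1-3) is not shown above. A plausible reconstruction, assuming the usual momentum update of a feature center consistent with the variable definitions, is:

```latex
c_k \leftarrow m \, c_k + (1 - m) \, \frac{1}{\lvert B_k \rvert} \sum_{f_i' \in B_k} f_i' \tag{1-3}
```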
  • Step S1422. Determine a second loss value based on the first target feature, the first feature center and each second feature center.
  • the second loss value can be calculated by the following formula (1-4):
  • where τ is a predefined temperature parameter, c_i represents the first feature center of the i-th object, c_j represents each second feature center, f_i represents the first target feature of the i-th object, and ID_S represents the total number of objects in the training set.
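  • The body of formula (1-4) is not reproduced here. Assuming a standard InfoNCE-style contrastive loss over the feature centers, consistent with the variable definitions above, it could read:

```latex
L_{2} = -\sum_{i} \log \frac{\exp\left(\langle f_i, c_i \rangle / \tau\right)}{\sum_{j=1}^{ID_S} \exp\left(\langle f_i, c_j \rangle / \tau\right)} \tag{1-4}
```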
  • step S15 includes step S151 or step S152, wherein:
  • Step S151 if the target loss value does not meet the preset condition, update the model parameters of the first model to obtain an updated first model; based on the updated first model, determine a trained first model.
  • the manner of updating the model parameters of the first model may include but not limited to at least one of gradient descent method, momentum update method, Newton momentum method and the like.
  • those skilled in the art may independently determine the update mode according to actual needs, which is not limited in the embodiments of the present disclosure.
  • Step S152 if the target loss value satisfies the preset condition, determine the updated first model as the trained first model.
  • the preset conditions may include, but are not limited to, the target loss value being smaller than a threshold, the change of the target loss value converging, and the like.
  • those skilled in the art may independently determine the preset conditions according to actual needs, which are not limited by the embodiments of the present disclosure.
  • determining the first model after training based on the updated first model in step S151 includes steps S1511 to S1515, wherein:
  • Step S1511 acquiring the next first image sample
  • Step S1512 Using the updated first network of the first model to be trained, perform feature extraction on the next first image sample to obtain the next first feature;
  • Step S1513 using the updated second network of the first model to update the next first feature based on the second feature of at least one second object, to obtain the next first target feature corresponding to the next first feature;
  • Step S1514 based on the next first target feature, determine the next target loss value
  • Step S1515 Based on the next target loss value, perform at least one next update on the model parameters of the updated first model to obtain the trained first model.
  • step S1511 to step S1515 correspond to the above step S11 to step S15 respectively, and for implementation, reference may be made to the implementation manner of the above step S11 to step S15.
  • In this way, the model parameters of the first model are updated again, and the trained first model is determined based on the first model after this next update, so that the performance of the trained first model can be further improved through continuous iterative updating.
  • the first feature memory library includes feature sets belonging to at least one object, each feature set includes at least one feature of the object to which it belongs, and the method further includes step S16, wherein:
  • Step S16 based on the first target feature, update the feature set belonging to the first object in the first feature storage.
  • the way of updating may include but not limited to adding the first target feature to the first feature storage, replacing a certain feature in the first feature storage with the first target feature, and so on.
  • the first feature center belonging to the first object can be accurately obtained, which further improves the recognition accuracy of the trained first model.
  • Fig. 2 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 2, the method includes steps S21 to S25, wherein:
  • Step S21 acquiring a first sub-image and a second sub-image containing the first object.
  • the second sub-image may be an image after at least occlusion processing is performed on the first sub-image.
  • the second sub-image may include at least one image.
  • the multiple images may be images obtained by at least performing occlusion processing on the first sub-image respectively.
  • Performing at least occlusion processing may include but not limited to only occlusion processing, or occlusion processing and other processing, and the like.
  • other processing may include, but is not limited to, at least one of scaling, cropping, resizing, padding, flipping, color jittering, grayscale, Gaussian blur, and random erasing.
  • those skilled in the art may use an appropriate processing method on the first sub-image to obtain the second sub-image according to actual conditions, which is not limited in the embodiments of the present disclosure.
  • step S21 includes step S211 to step S212, wherein:
  • Step S211 acquiring a first sub-image including a first object.
  • the first sub-image may be any suitable image containing at least the first object.
  • The content contained in the first sub-image may be determined according to the actual application scene; for example, the first sub-image may contain only the first object, or may contain the first object together with at least one other object.
  • the first object may include, but is not limited to, people, animals, plants, objects, and the like.
  • the first sub-image is a face image containing Zhang San.
  • the first sub-image is an image including Li Si's whole person.
  • Step S212 based on the preset occlusion set, perform at least occlusion processing on the first sub-image to obtain a second sub-image.
  • the occlusion set includes at least one occlusion image.
  • the occlusion set may include, but is not limited to, one established based on at least one of a training set, other images, and the like.
  • the occlusion set includes at least a variety of occluder images and background images, such as leaves, vehicles, trash cans, buildings, trees, flowers, and the like. For example, image samples occluded by backgrounds and objects can be found in the training set, and the occluding parts can be manually cropped out to form an occlusion library.
  • a suitable image containing at least one object occlusion is selected, and the occlusion part is manually cut out to form an occlusion library.
  • those skilled in the art may choose an appropriate way to establish an occlusion set according to actual requirements, which is not limited by the embodiments of the present disclosure.
  • the position of the occluder may include, but not limited to, a specified position, a specified size, and the like.
  • For example, the specified position can be set to one of four positions, with the occluder covering one quarter to one half of the area.
  • those skilled in the art may determine the position of the barrier according to actual needs, which is not limited by the embodiments of the present disclosure.
  • performing at least occlusion processing may include, but is not limited to, occlusion processing and other processing.
  • the occlusion image is randomly selected from the occlusion library, and the size of the occlusion image is adjusted based on the adjustment rules.
  • the adjustment rule may include but not limited to adjusting the size of the occluder image, adjusting the size of the first image sample, and the like.
  • For example, if the height of the occluder image exceeds twice its width, the occlusion is regarded as vertical occlusion: the height of the occluder image can be adjusted to the height of the first image sample, and the width of the occluder image can be adjusted to one quarter to one half of the width of the first image sample. Otherwise, the occlusion is regarded as horizontal occlusion: the width of the occluder image can be adjusted to the width of the first image sample, and the height of the occluder image can be adjusted to one quarter to one half of the height of the first image sample.
  • those skilled in the art may determine the adjustment rule according to actual needs, which is not limited by the embodiments of the present disclosure.
  • For example, when the at least occlusion processing includes occlusion processing, resizing processing, padding processing and cropping processing, the resizing processing, padding processing and cropping processing may first be performed on the first image sample, and then the occlusion processing may be performed based on the occlusion set, as in the sketch below.
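  • A minimal sketch of this occlusion augmentation, assuming the occlusion library is a list of PIL images; the paste-position logic is a hypothetical choice, since the disclosure only specifies that the occluder covers part of the image:

```python
import random
from PIL import Image

def occlude(first_sub_image: Image.Image, occlusion_library: list) -> Image.Image:
    """Randomly pick an occluder, resize it with the vertical/horizontal rule, and paste it."""
    occluder = random.choice(occlusion_library)
    W, H = first_sub_image.size
    ow, oh = occluder.size
    ratio = random.uniform(0.25, 0.5)                              # one quarter to one half
    if oh > 2 * ow:                                                # vertical occlusion
        occluder = occluder.resize((int(W * ratio), H))
        position = (random.choice([0, W - occluder.size[0]]), 0)   # left or right side (assumed)
    else:                                                          # horizontal occlusion
        occluder = occluder.resize((W, int(H * ratio)))
        position = (0, random.choice([0, H - occluder.size[1]]))   # top or bottom (assumed)
    second_sub_image = first_sub_image.copy()
    second_sub_image.paste(occluder, position)
    return second_sub_image
```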
  • the method also includes step S213, wherein:
  • Step S213 based on the first sub-image and the second sub-image, determine an occlusion mask.
  • the occlusion mask is used to represent the occlusion information of the image.
  • the occlusion mask can be used for training the first model on object occlusion.
  • the occlusion mask may be determined based on pixel differences between the first sub-image and the second sub-image.
  • the difference between the first sub-image and the second sub-image can be calculated based on the following formula (2-1):
  • where x represents the first sub-image and x' represents the second sub-image.
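  • The body of formula (2-1) is not reproduced in this text. One plausible reconstruction, assuming the difference is computed as the mean absolute pixel difference between the two sub-images, is:

```latex
d = \frac{1}{\lvert x \rvert} \sum_{p} \left| x_p - x'_p \right| \tag{2-1}
```

  • where p indexes pixels and |x| is the number of pixels; this per-pixel form is an assumption, and the same expression can be applied per part when the images are divided as described below.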
  • step S213 includes steps S2131 to S2133, wherein:
  • Step S2131 Divide the first sub-image and the second sub-image into at least one first sub-part image and at least one second sub-part image respectively.
  • Fine-grained occlusion masks tend to contain many false labels due to the misalignment of semantics (e.g., body parts) between different images, so the first sub-image and the second sub-image can be roughly divided horizontally into a plurality of parts, and the occlusion mask is determined based on the pixel differences between each part of the first sub-image and the corresponding part of the second sub-image, for example, into four parts or five parts. During implementation, those skilled in the art may divide the first sub-image and the second sub-image according to actual requirements, which is not limited in the embodiments of the present disclosure.
  • Step S2132 based on each first sub-part image and each second sub-part image, determine an occlusion sub-mask.
  • The pixel difference between each first sub-part image and the corresponding second sub-part image can be obtained based on the above formula (2-1), and the occlusion sub-mask of each part can be determined based on the pixel difference of that part.
  • Step S2133 Determine an occlusion mask based on each occlusion sub-mask.
  • If d_i is not less than the first threshold, it indicates that this part of the image is occluded, and the occlusion sub-mask mask_i can be set to 0; otherwise, it indicates that this part is not occluded, and mask_i can be set to 1. The occlusion mask is then obtained by combining the occlusion sub-masks of all parts.
  • For example, if the first sub-image and the second sub-image are divided into four parts, and there is no occlusion in the first, second and third parts while there is occlusion in the fourth part, the occlusion mask should be 1110.
  • those skilled in the art may determine the occlusion mask according to actual needs, which is not limited by the embodiments of the present disclosure.
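  • A small NumPy sketch of steps S2131 to S2133 under these conventions (occluded part set to 0, unoccluded part set to 1); the part count, the threshold value and the use of the mean absolute difference are illustrative assumptions:

```python
import numpy as np

def occlusion_mask(first_sub_image, second_sub_image, num_parts=4, threshold=0.1):
    """Split both images horizontally, compare corresponding parts, build the mask."""
    x = np.asarray(first_sub_image, dtype=np.float32) / 255.0
    x_occ = np.asarray(second_sub_image, dtype=np.float32) / 255.0
    parts_x = np.array_split(x, num_parts, axis=0)       # horizontal strips
    parts_o = np.array_split(x_occ, num_parts, axis=0)
    mask = []
    for p, q in zip(parts_x, parts_o):
        d_i = np.abs(p - q).mean()                        # per-part pixel difference, cf. formula (2-1)
        mask.append(0 if d_i >= threshold else 1)         # occluded -> 0, not occluded -> 1
    return mask                                           # e.g. [1, 1, 1, 0] for the "1110" example
```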
  • Step S22 Using the first network of the first model to be trained, perform feature extraction on the first sub-image to obtain the first sub-feature of the first object, and perform feature extraction on the second sub-image to obtain the second sub-feature of the first object.
  • the first model may be any suitable model for object recognition based on image features.
  • the first model may include at least a first network.
  • the first sub-feature may include, but not limited to, the original feature of the first sub-image, or a feature obtained by processing the original feature.
  • the second sub-feature may include, but not limited to, the original feature of the second sub-image, or a feature obtained by processing the original feature.
  • the original features may include but not limited to facial features, body features, etc. of the objects contained in the image.
  • Step S23 using the second network of the first model, update the first sub-feature and the second sub-feature respectively based on the second features of at least one second object, to obtain the first target sub-feature corresponding to the first sub-feature and the second target sub-feature corresponding to the second sub-feature.
  • the similarity between each second object and the first object is not less than the first threshold.
  • the first threshold may be preset or obtained by statistics. During implementation, those skilled in the art may independently determine the setting manner of the first threshold according to actual needs, which is not limited in the embodiments of the present disclosure.
  • the similarity between the facial features of the second object and the first object is not less than the first threshold.
  • the similarity between the wearing features of the second object and the first object is not less than the first threshold.
  • For example, neither the similarity between the appearance characteristics of the second object and those of the first object nor the similarity between their clothing characteristics is less than the first threshold.
  • the second feature can be obtained based on the training set, or can be pre-input.
  • the second object may include, but is not limited to, people, animals, plants, objects, and the like.
  • the similarity between each second object and the first object may be obtained based on the similarity between the second feature of each second object and the first feature of the first object. In some implementations, the similarity between each second object and the first object may be obtained based on the similarity between the feature center of each second object and the first feature of the first object.
  • the first model may include a second memory feature library
  • the second memory feature library may include at least one feature of at least one object. The feature center of the second object may be obtained based on at least one feature belonging to the second object in the second memory feature library.
  • features of multiple image samples of at least one object in the training set may be extracted, and the extracted features may be stored in the second memory feature library according to their identity.
  • Step S24 Determine a target loss value based on the first target sub-feature and the second target sub-feature.
  • the target loss value may include, but is not limited to, at least one of a mean square error loss value, a cross-entropy loss value, a contrastive loss value, and the like.
  • Step S25 based on the target loss value, update the model parameters of the first model at least once to obtain the trained first model.
  • the above-mentioned step S25 corresponds to the above-mentioned step S15, and the implementation manner of the above-mentioned step S15 can be referred to for implementation.
  • In the embodiments of the present disclosure, a first sub-image and a second sub-image containing the first object are acquired, where the second sub-image is an image obtained by performing at least occlusion processing on the first sub-image; the first network of the first model to be trained is used to perform feature extraction on the first sub-image to obtain the first sub-feature of the first object, and on the second sub-image to obtain the second sub-feature of the first object; the second network of the first model is used to update the first sub-feature and the second sub-feature respectively based on the second features of at least one second object, to obtain the first target sub-feature corresponding to the first sub-feature and the second target sub-feature corresponding to the second sub-feature, wherein the similarity between each second object and the first object is not less than the first threshold; a target loss value is determined based on the first target sub-feature and the second target sub-feature; and the model parameters of the first model are updated at least once based on the target loss value to obtain the trained first model.
  • In this way, when the target loss value does not meet the preset condition, the model parameters of the first model are updated at least once. Since the target loss value is determined based on the first target feature, the consistency of the trained first model's predictions for different image samples of the same object can be improved, so that the trained first model can more accurately re-identify objects in images containing object occlusion and/or multiple objects.
  • step S24 includes step S241 to step S243, wherein:
  • Step S241 Determine a first target loss value based on the first target sub-feature and the second target sub-feature.
  • the first target loss value may include, but is not limited to, at least one of a mean square error loss value, a cross-entropy loss value, a contrastive loss value, and the like.
  • step S241 includes step S2411 to step S2413, wherein:
  • Step S2411 Based on the first target sub-feature, determine a third target sub-loss value.
  • step S2411 corresponds to the above-mentioned step S14, and the implementation manner of the above-mentioned step S14 can be referred to for implementation.
  • Step S2412. Based on the second target sub-feature, determine the fourth target sub-loss value.
  • step S2412 corresponds to the above-mentioned step S14, and the implementation of the above-mentioned step S14 can be referred to for implementation.
  • Step S2413 Determine the first target loss value based on the third target sub-loss value and the fourth target sub-loss value.
  • the first target loss value may include but not limited to the sum between the third target sub-loss value and the fourth target sub-loss value, the sum after weighting the third target sub-loss value and the fourth target sub-loss value, etc. .
  • those skilled in the art may determine the first target loss value according to actual needs, which is not limited by the embodiments of the present disclosure.
  • Step S242 Determine a second target loss value based on the first sub-feature and the second sub-feature.
  • the second target loss value may include, but is not limited to, at least one of a mean square error loss value, a cross-entropy loss value, a contrastive loss value, and the like.
  • Step S243 Determine a target loss value based on the first target loss value and the second target loss value.
  • the target loss value may include, but not limited to, the sum of the first target loss value and the second target loss value, the sum after weighting the first target loss value and the second target loss value respectively, and the like.
  • those skilled in the art may determine the target loss value according to actual needs, which is not limited by the embodiments of the present disclosure.
  • the target loss value is determined based on the first sub-feature, the second sub-feature, the first target sub-feature and the second target sub-feature. In this way, the accuracy of the target loss value can be improved, so as to accurately judge whether the first model is converged.
  • the first network includes a first subnet and a second subnet
  • step S22 includes steps S221 to S222, wherein:
  • Step S221. Using the first sub-network of the first model to be trained, perform feature extraction on the first sub-image and the second sub-image respectively, to obtain the third sub-feature corresponding to the first sub-image and the fourth sub-feature corresponding to the second sub-image.
  • the first network includes at least a first subnetwork, and the first subnetwork is used to extract features of the image using a feature extractor.
  • the feature extractor may include, but is not limited to, RNN, CNN, a Transform-based feature extraction network, and the like.
  • those skilled in the art may use an appropriate first sub-network in the first model to obtain the third sub-feature according to actual conditions, which is not limited in the embodiments of the present disclosure.
  • a feature of the first sub-image is extracted through the first sub-network, and the feature is determined as a third sub-feature of the first object.
  • the third sub-feature may include but not limited to the original feature of the first sub-image and the like.
  • Step S222 using the second sub-network of the first model, determining the first sub-feature based on the third sub-feature, and determining the second sub-feature based on the fourth sub-feature.
  • the second sub-network may include an occlusion erasure network, which is used to perform occlusion erasure processing on input features and output unoccluded features.
  • the first sub-feature of the first object is obtained after occlusion and erasure processing is performed on the third sub-feature through the second sub-network.
  • the second sub-feature of the first object is obtained after the fourth sub-feature is occluded and erased through the second sub-network.
  • In this way, the overall network structure of the first model is trained by introducing occluder images as noise at the image level of the first image sample containing the first object, so that the robustness of the first model can be enhanced and the performance of the first model can be improved, which further enables the trained first model to more accurately re-identify objects in images containing object occlusions.
  • step S242 includes step S2421 to step S2423, wherein:
  • Step S2421 Based on the first sub-feature and the second sub-feature, determine a first target sub-loss value.
  • the first target sub-loss value may include, but is not limited to, at least one of a mean square error loss value, a cross-entropy loss value, a contrastive loss value, and the like.
  • Step S2422 Based on the third sub-feature and the fourth sub-feature, determine a second target sub-loss value.
  • the second target sub-loss value may include, but is not limited to, at least one of a mean square error loss value, a cross-entropy loss value, a contrastive loss value, and the like.
  • Step S2423 Determine a second target loss value based on the first target sub-loss value and the second target sub-loss value.
  • the second target loss value may include but not limited to the sum between the first target sub-loss value and the second target sub-loss value, the sum after weighting the first target sub-loss value and the second target sub-loss value, etc. .
  • those skilled in the art may determine the second target loss value according to actual needs, which is not limited by the embodiments of the present disclosure.
  • the second target loss value is determined based on the first sub-feature, the second sub-feature, the third sub-feature and the fourth sub-feature. In this way, the accuracy of the second target loss value can be improved, so as to accurately judge whether the first model converges.
  • the first sub-image includes label information
  • step S2422 includes steps S251 to S253, wherein:
  • Step S251. Determine a seventh sub-loss value based on the third sub-feature and label information.
  • The label information may include, but is not limited to, label values, identifiers, and the like.
  • the seventh sub-loss value may include but not limited to a cross-entropy loss value and the like.
  • the seventh sub-loss value can be calculated by the above formula (1-1); in this case, f_i in formula (1-1) is the third sub-feature.
  • Step S252 Determine an eighth sub-loss value based on the fourth sub-feature and label information.
  • the eighth sub-loss value may include but not limited to a cross-entropy loss value and the like.
  • the eighth sub-loss value may be determined according to the above formula (1-1); in this case, f_i in formula (1-1) is the fourth sub-feature.
  • Step S253 Determine a second target sub-loss value based on the seventh sub-loss value and the eighth sub-loss value.
  • the second target sub-loss value may include, but not limited to, the sum between the seventh sub-loss value and the eighth sub-loss value, the sum after weighting the seventh sub-loss value and the eighth sub-loss value, and the like.
  • those skilled in the art may determine the second target sub-loss value according to actual requirements, which is not limited in the embodiments of the present disclosure.
  • the second target sub-loss value is determined based on the third sub-feature, the fourth sub-feature and label information. In this way, the accuracy of the second target sub-loss value can be improved, so as to accurately judge whether the first model is converged.
  • the second subnetwork includes a third subnetwork and a fourth subnetwork
  • step S222 includes steps S2221 to S2222, wherein:
  • Step S2221 using the third sub-network of the first model to determine the first occlusion score based on the third sub-feature, and determine the second occlusion score based on the fourth sub-feature.
  • the second sub-network includes at least a third sub-network
  • the third sub-network is used to perform semantic analysis based on features of the image to obtain an occlusion score corresponding to the image.
  • the third subnetwork includes a pooling subnetwork and at least one occlusion erasure subnetwork, the first occlusion score includes at least one first occlusion subscore, and the second occlusion score includes at least one second occlusion subscore;
  • the above step S2221 includes steps S261 to S262, wherein:
  • Step S261 Divide the third sub-feature into at least one third sub-part feature by using the pooling sub-network, and divide the fourth sub-feature into at least one fourth sub-part feature.
  • the pooling sub-network is used to divide the input feature to obtain at least one sub-part feature of the feature.
  • the number of third sub-part features may be the same as the number of parts into which the first sub-image is divided. For example, if the first sub-image is divided into four parts, the third sub-feature can be divided into four third sub-part features through the pooling sub-network, and each third sub-part feature corresponds to one part feature f_i.
  • Step S262. Using each occlusion erasure sub-network, determine a first occlusion sub-score based on each third sub-part feature, and determine a second occlusion sub-score based on each fourth sub-part feature.
  • each occlusion erasure sub-network is used to perform semantic analysis on the input feature to obtain the occlusion score of the image corresponding to the feature.
  • each occlusion erasing sub-network consists of two fully connected layers, a layer normalization and an activation function, wherein the layer normalization is located between the two fully connected layers, and the activation function is located at the end .
  • the activation function can be a sigmoid function.
  • the number of occlusion-erasing sub-networks is the same as the number of parts into which the first sub-image is divided. For example, if the first sub-image is divided into four parts, and the feature corresponding to each part is f_i, the third sub-network includes four occlusion-erasing sub-networks, and each occlusion-erasing sub-network is used to output the occlusion score corresponding to f_i. Similarly, if the first sub-image is divided into five parts, and the feature corresponding to each part is f_i, the third sub-network includes five occlusion-erasing sub-networks, and each occlusion-erasing sub-network is used to output the occlusion score corresponding to f_i.
  • the occlusion score can be calculated by the following formula (2-2):
  • where W_cp and W_rg are matrices (the weights of the two fully connected layers), LN is layer normalization, c represents the channel dimension, and f_i represents the feature of the i-th part in the third sub-feature or the fourth sub-feature.
  • For example, the third sub-feature is divided into four third sub-part features through the pooling sub-network, and each third sub-part feature is input into the corresponding occlusion-erasing sub-network. The first fully connected layer W_cp compresses the channel dimension to a quarter of the original, layer normalization is performed on the features with the compressed channel dimension, the layer-normalized features are then compressed to one dimension, and finally the occlusion score corresponding to the third sub-part feature is output through the Sigmoid function.
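  • Read this way, one occlusion-erasing sub-network could be sketched as the following PyTorch module; the class name and the assumption that the two fully connected layers are plain nn.Linear layers are illustrative, not taken from the disclosure:

```python
import torch
import torch.nn as nn

class OcclusionErasingSubNetwork(nn.Module):
    """FC (c -> c/4), LayerNorm, FC (c/4 -> 1), Sigmoid, cf. formula (2-2)."""
    def __init__(self, channels: int):
        super().__init__()
        self.w_cp = nn.Linear(channels, channels // 4)   # compress channel dimension to a quarter
        self.ln = nn.LayerNorm(channels // 4)            # layer normalization between the two FC layers
        self.w_rg = nn.Linear(channels // 4, 1)          # compress to one dimension
        self.act = nn.Sigmoid()                          # activation function at the end

    def forward(self, f_i: torch.Tensor) -> torch.Tensor:
        # s_i = Sigmoid(W_rg * LN(W_cp * f_i)), one plausible reading of formula (2-2)
        return self.act(self.w_rg(self.ln(self.w_cp(f_i))))
```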
  • Step S2222. Using the fourth sub-network, determine the first sub-feature based on the third sub-feature and the first occlusion score, and determine the second sub-feature based on the fourth sub-feature and the second occlusion score.
  • the second subnetwork further includes a fourth subnetwork, and the fourth subnetwork is used to determine features after occlusion erasure.
  • step S2222 includes steps S271 to S272, wherein:
  • Step S271 using the fourth sub-network, determine each first sub-part feature based on each third sub-part feature of the third sub-feature and each first occlusion sub-score, and determine each second sub-part feature based on each fourth sub-part feature of the fourth sub-feature and each second occlusion sub-score.
  • the first sub-part feature or the second sub-part feature can be calculated by the following formula (2-3):
  • where s_i denotes the i-th occlusion score and f_i denotes the i-th third sub-part feature or fourth sub-part feature.
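  • The body of formula (2-3) is not reproduced here. Given the definitions above, a plausible reconstruction is that each sub-part feature is re-weighted by its occlusion score:

```latex
\tilde{f}_i = s_i \cdot f_i \tag{2-3}
```

  • where the notation \tilde{f}_i for the resulting first or second sub-part feature is assumed for illustration.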
  • the second feature memory may be updated based on the first sub-feature.
  • the way of updating may include, but not limited to, adding the first sub-feature to the second feature storage, replacing a certain feature in the second feature storage with the first sub-feature, and so on.
  • Step S272 Determine the first sub-feature based on each first sub-part feature, and determine the second sub-feature based on each second sub-part feature.
  • the first sub-feature can be obtained by concatenating the at least one first sub-part feature, and the second sub-feature can likewise be obtained by concatenating the at least one second sub-part feature.
  • the accuracy of the first sub-feature and the second sub-feature can be improved by using the pooling sub-network, at least one occlusion-erasing sub-network and the fourth sub-network.
  • the first sub-image includes label information
  • the first model includes a second feature memory
  • the second feature memory includes at least one feature belonging to at least one object
  • the above step S2421 includes steps S281 to S285, wherein:
  • Step S281. Determine an occlusion mask based on the first sub-image and the second sub-image.
  • step S281 corresponds to the above-mentioned step S213, and the implementation manner of the above-mentioned step S213 can be referred to for implementation.
  • Step S282. Determine a third loss value based on the first occlusion score, the second occlusion score and the occlusion mask.
  • the third loss value may include, but not limited to, a mean square error loss value and the like.
  • Step S283 Determine a fourth loss value based on the first sub-feature, the second sub-feature and label information.
  • the fourth loss value may include but not limited to a cross-entropy loss value and the like.
  • Step S284 Determine a fifth loss value based on the first sub-feature, the second sub-feature, and at least one feature of at least one object in the second feature memory.
  • the fifth loss value may include, but is not limited to, a contrastive loss value and the like.
  • Step S285 based on the third loss value, the fourth loss value and the fifth loss value, determine the first target sub-loss value.
  • the first target sub-loss value may include but not limited to the sum of the third loss value, the fourth loss value and the fifth loss value, after weighting the third loss value, the fourth loss value and the fifth loss value respectively and so on.
  • those skilled in the art may determine the first target sub-loss value according to actual needs, which is not limited in the embodiments of the present disclosure.
  • the first target sub-loss value is determined based on the occlusion mask, the first sub-feature, the second sub-feature, label information and other object characteristics. In this way, the accuracy of the first target sub-loss value can be improved, so as to accurately judge whether the first model is converged.
  • step S282 includes step S2821 to step S2823, wherein:
  • Step S2821 Determine a first sub-loss value based on the first occlusion score and the occlusion mask.
  • the first sub-loss value may include, but not limited to, a mean square error loss value and the like.
  • the first sub-loss value can be calculated according to the following formula (2-4):
  • where N is the total number of occlusion-erasing sub-networks, s_i represents the i-th occlusion score, and mask_i represents the i-th occlusion sub-mask in the occlusion mask.
  • For example, when the occlusion mask is 1110, mask_1 is 1 and mask_4 is 0.
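  • The body of formula (2-4) is not reproduced in this text. Assuming the mean square error form indicated above, the first sub-loss value could read:

```latex
L_{sub1} = \frac{1}{N} \sum_{i=1}^{N} \left( s_i - mask_i \right)^2 \tag{2-4}
```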
  • Step S2822 Determine a second sub-loss value based on the second occlusion score and the occlusion mask.
  • the second sub-loss value may include, but not limited to, a mean square error loss value and the like.
  • the manner of determining the second sub-loss value may be the same as that of determining the first sub-loss value, see step S2821 for details.
  • Step S2823 Determine a third loss value based on the first sub-loss value and the second sub-loss value.
  • the third loss value may include, but not limited to, the sum of the first sub-loss value and the second sub-loss value, the sum after weighting the first sub-loss value and the second sub-loss value, and the like.
  • those skilled in the art may determine the third loss value according to actual requirements, which is not limited in the embodiments of the present disclosure.
  • the third loss value is determined based on the first occlusion score, the second occlusion score and the occlusion mask. In this way, the accuracy of the third loss value can be improved, so as to accurately judge whether the first model is converged.
  • step S283 includes step S2831 to step S2833, wherein:
  • Step S2831 Determine a third sub-loss value based on the first sub-feature and label information.
• the label information may include, but is not limited to, label values, identifiers, and the like.
  • the third sub-loss value may include, but not limited to, a cross-entropy loss value and the like.
  • the third sub-loss value can be calculated by the above formula (1-1), at this time, f i in the formula (1-1) is the first sub-feature.
  • Step S2832 Determine a fourth sub-loss value based on the second sub-feature and label information.
  • the fourth sub-loss value may include but not limited to a cross-entropy loss value and the like.
  • the fourth sub-loss value can be calculated by the above formula (1-1), and at this time, f i in the formula (1-1) is the second sub-feature.
  • Step S2833 Determine a fourth loss value based on the third sub-loss value and the fourth sub-loss value.
  • the fourth loss value may include, but not limited to, the sum between the third sub-loss value and the fourth sub-loss value, the sum after weighting the third sub-loss value and the fourth sub-loss value, and the like.
  • those skilled in the art may determine the fourth loss value according to actual requirements, which is not limited in the embodiments of the present disclosure.
  • the fourth loss value is determined based on the first sub-feature, the second sub-feature and label information. In this way, the accuracy of the fourth loss value can be improved, so as to accurately judge whether the first model is converged.
  • step S284 includes step S2841 to step S2844, wherein:
  • Step S2841 From at least one feature of at least one object in the second feature memory, determine a third feature center of the first object and a fourth feature center of at least one second object.
  • the third feature center may be determined based on the feature of the first object in the second feature memory library and the first sub-feature.
  • Each fourth feature center may be determined based on each feature of each second object in the second feature memory.
  • the feature center of each object can be calculated by the following formula (2-5):
  • c x represents the feature center of the x-th object
  • B k represents the feature set belonging to the k-th object in the mini-batch
  • m is the set update momentum coefficient
• f_i′ is the first sub-feature of the i-th sample.
  • m can be 0.2.
• In the case that f_i′ and B_k belong to the same object, the feature center c_k belonging to that object will change; in the case that f_i′ and B_k do not belong to the same object, the feature center c_k remains consistent with the previous c_k.
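• A minimal sketch of one plausible reading of formula (2-5), assuming the common momentum-average form of a feature center update; the exact form of (2-5), the helper name and the tensor shapes are assumptions.
```python
import torch

def update_center(c_k: torch.Tensor, batch_feats_k: torch.Tensor, m: float = 0.2) -> torch.Tensor:
    # c_k:            (d,) current feature center of the k-th object
    # batch_feats_k:  (n, d) first sub-features f_i' of the k-th object in the mini-batch (set B_k)
    # m:              update momentum coefficient (for example 0.2)
    if batch_feats_k.numel() == 0:
        return c_k                                # no sample of this object: c_k stays unchanged
    return m * c_k + (1.0 - m) * batch_feats_k.mean(dim=0)
```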
  • Step S2842 based on the first sub-feature, the third feature center and each fourth feature center, determine the fifth sub-loss value.
  • the fifth sub-loss value may include but not limited to contrastive loss and the like.
• the fifth sub-loss value can be calculated by the following formula (2-6):
$-\log\frac{\exp\left(\langle f_i, c_y\rangle/\tau\right)}{\sum_{z=1}^{ID_S}\exp\left(\langle f_i, c_z\rangle/\tau\right)} \quad (2\text{-}6)$
• where $\tau$ is a predefined temperature parameter, $c_y$ represents the third feature center of the y-th object, $c_z$ represents the z-th fourth feature center, $f_i$ represents the first sub-feature of the i-th object, and $ID_S$ represents the total number of objects in the training set.
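• A minimal sketch of the temperature-scaled contrastive form of formula (2-6); the feature normalization and the use of a cross-entropy call to realize the negative log-softmax are implementation assumptions for illustration only.
```python
import torch
import torch.nn.functional as F

def center_contrastive_loss(f_i: torch.Tensor, centers: torch.Tensor, y: int,
                            tau: float = 0.05) -> torch.Tensor:
    # f_i:     (d,) first sub-feature of the sample
    # centers: (ID_S, d) feature centers (the third feature center plus the fourth feature centers)
    # y:       index of the third feature center c_y, i.e. the object the sample belongs to
    # tau:     predefined temperature parameter
    logits = F.normalize(centers, dim=1) @ F.normalize(f_i, dim=0) / tau   # (ID_S,)
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([y]))
```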
  • Step S2843 based on the second sub-feature, the third feature center and each fourth feature center, determine the sixth sub-loss value.
  • the sixth sub-loss value may include but not limited to contrastive loss and the like.
  • the manner of determining the sixth sub-loss value may be the same as that of determining the fifth sub-loss value, see step S2842 for details.
  • Step S2844 Determine a sixth loss value based on the fifth sub-loss value and the sixth sub-loss value.
  • the sixth loss value may include, but not limited to, the sum between the fifth sub-loss value and the sixth sub-loss value, the sum after weighting the fifth sub-loss value and the sixth sub-loss value, and the like.
  • those skilled in the art may determine the sixth loss value according to actual needs, which is not limited in the embodiments of the present disclosure.
  • the sixth loss value is determined based on the first sub-feature, the second sub-feature and other object characteristics. In this way, the accuracy of the sixth loss value can be improved, so as to accurately judge whether the first model is converged.
  • the second network includes a fifth subnetwork and a sixth subnetwork
  • step S23 includes steps S231 to S232, wherein:
• Step S231. Using the fifth sub-network, aggregate the first sub-feature and the second sub-feature respectively with the second feature of at least one second object, to obtain the first aggregated sub-feature corresponding to the first sub-feature and the second aggregated sub-feature corresponding to the second sub-feature.
  • the second network includes at least a fifth sub-network
• the fifth sub-network is used to aggregate the first sub-feature with the second features of at least one second object to obtain the first aggregated sub-feature, and to aggregate the second sub-feature with the second features of at least one second object to obtain the second aggregated sub-feature.
  • Step S232 Using the sixth sub-network, determine the first target sub-feature based on the first aggregated sub-feature, and determine the second target sub-feature based on the second aggregated sub-feature.
  • the second network further includes a sixth sub-network for determining the first target sub-feature based on the first aggregated sub-feature, and determining the second target sub-feature based on the second aggregated sub-feature.
• the overall network structure of the first model is trained by introducing the features of the second object as noise at the feature level of the first image sample containing the first object, so that the robustness of the first model can be enhanced and the performance of the first model can be improved, thereby enabling the trained first model to more accurately re-identify objects in images containing multiple objects.
  • step S231 includes step S2311 to step S2314, wherein:
  • Step S2311 based on the first sub-feature and each second feature, determine a first attention matrix.
  • the first attention matrix is used to represent the degree of association between the first sub-feature and each second feature.
  • X second features belonging to at least one second object are determined, where X is a positive integer.
  • X can be 10.
• the X second features belonging to second objects that are closest to the first sub-feature can be searched for in the second feature memory library, and based on each second feature, X first centers can be determined; when looking up, the distance can be calculated based on the cosine distance between features.
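• A minimal sketch of the memory lookup described above: retrieving the X second features closest to the first sub-feature by cosine similarity; the memory layout and the function name are illustrative assumptions.
```python
import torch
import torch.nn.functional as F

def search_second_features(f_prime: torch.Tensor, memory: torch.Tensor, X: int = 10) -> torch.Tensor:
    # f_prime: (d,) first sub-feature; memory: (M, d) features in the second feature memory
    sims = F.normalize(memory, dim=1) @ F.normalize(f_prime, dim=0)   # cosine similarity, (M,)
    idx = sims.topk(k=min(X, memory.shape[0])).indices
    return memory[idx]                                                # (X, d) retrieved second features
```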
  • the network parameters of the fifth sub-network include a first prediction matrix and a second prediction matrix
  • step S2311 includes steps S2321 to S2323, wherein:
  • Step S2321 based on the first sub-feature and the first prediction matrix, determine the first prediction feature.
• the first predictive feature can be calculated by the following formula (2-7):
$f_q = W_1 f' \quad (2\text{-}7)$
• where $W_1$ is the first prediction matrix, $f'$ represents the first sub-feature, and both d and d' are feature dimensions of $f'$.
  • Step S2322. Based on each second feature and the second predictive matrix, determine a second predictive feature.
• the second predictive feature can be calculated by the following formula (2-8):
$f_c = W_2 c \quad (2\text{-}8)$
• where $W_2$ is the second prediction matrix, c is a second feature, and both d and d' are feature dimensions of the first sub-feature.
  • Step S2323 Determine a first attention matrix based on the first predictive feature and each second predictive feature.
• the first attention matrix can be determined by the following formula (2-9):
$m_i = \frac{\exp\left(\langle f_q, f_{c_i}\rangle/\alpha\right)}{\sum_{j=1}^{X}\exp\left(\langle f_q, f_{c_j}\rangle/\alpha\right)} \quad (2\text{-}9)$
• where X represents the total number of second features, i ∈ {1, 2, ..., X}, and $\alpha$ is a scaling factor.
  • Step S2312 based on each second feature and each first attention matrix, determine the first aggregation sub-feature.
  • the network parameters of the fifth sub-network also include a third prediction matrix
  • step S2312 includes steps S2331 to S2332, wherein:
  • Step S2331. Based on each second feature and the third predictive matrix, determine a third predictive feature.
• the third predictive feature can be calculated by the following formula (2-10):
$f_v = W_3 c \quad (2\text{-}10)$
• where $W_3$ is the third prediction matrix, c is a second feature, and both d and d' are feature dimensions of the first sub-feature.
  • Step S2332 based on each third predictive feature and each first attention matrix, determine the first aggregation sub-feature.
• the first aggregation sub-feature can be determined by the following formula (2-11):
$f_d = \sum_{i=1}^{X} m_i \cdot f_{v_i} \quad (2\text{-}11)$
• where $m_i$ represents the i-th first attention matrix, and $f_{v_i}$ represents the i-th third predictive feature.
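• A minimal sketch combining formulas (2-7) to (2-11): project the first sub-feature with the first prediction matrix, project the retrieved second features with the second and third prediction matrices, form normalized attention weights and aggregate. The softmax normalization, the scaling factor value and the single-head layout are assumptions (the text below also mentions a multi-head variant).
```python
import torch
import torch.nn as nn

class FeatureAggregation(nn.Module):
    def __init__(self, d: int, d_prime: int):
        super().__init__()
        self.W1 = nn.Linear(d, d_prime, bias=False)   # first prediction matrix, formula (2-7)
        self.W2 = nn.Linear(d, d_prime, bias=False)   # second prediction matrix, formula (2-8)
        self.W3 = nn.Linear(d, d_prime, bias=False)   # third prediction matrix, formula (2-10)
        self.alpha = d_prime ** 0.5                    # assumed scaling factor in formula (2-9)

    def forward(self, f_prime: torch.Tensor, second_feats: torch.Tensor) -> torch.Tensor:
        # f_prime: (d,) first sub-feature; second_feats: (X, d) retrieved second features
        f_q = self.W1(f_prime)                                   # (d',)
        f_c = self.W2(second_feats)                              # (X, d')
        f_v = self.W3(second_feats)                              # (X, d')
        m = torch.softmax(f_c @ f_q / self.alpha, dim=0)         # (X,) first attention weights
        return (m.unsqueeze(1) * f_v).sum(dim=0)                 # (d',) first aggregation sub-feature
```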
  • Step S2313 Determine a second attention matrix based on the second sub-features and each second feature.
  • the second attention matrix is used to characterize the degree of association between the second sub-features and each second feature.
  • the manner of determining the second attention matrix may be the same as that of determining the first attention matrix, see step S2321 to step S2323.
  • Step S2314 based on each second feature and each second attention matrix, determine a second aggregation sub-feature.
  • the manner of determining the second aggregation sub-feature may be the same as that of determining the first aggregation sub-feature, see step S2331 to step S2332 for details.
• each first center is divided into multiple parts by a multi-head operation, and an attention weight is assigned to each part, so as to ensure that more unique patterns similar to the target object and non-target objects can be aggregated; the robustness of the first model is thereby enhanced, so that the trained first model can more accurately re-identify objects in images containing multiple objects.
  • the sixth subnetwork includes the seventh subnetwork and the eighth subnetwork, and the above step S232 includes steps S2341 to S2343, wherein:
  • Step S2341 Determine an occlusion mask based on the first sub-image and the second sub-image.
  • the occlusion mask is used to represent the occlusion information of the image.
  • the occlusion mask may be determined based on pixel differences between the first sub-image and the second sub-image.
  • Step S2342 Using the seventh sub-network, determine the fifth sub-feature based on the first aggregation sub-feature and the occlusion mask, and determine the sixth sub-feature based on the second aggregation sub-feature and the occlusion mask.
  • the seventh sub-network may be an FFN 1 ( ⁇ ) neural network including two fully connected layers and an activation function.
  • the fifth sub-feature or the sixth sub-feature can be obtained by the following formula (2-12):
  • mask is the occlusion mask and f d is the first aggregated sub-feature or the second aggregated sub-feature.
  • Step S2343 Using the eighth sub-network, determine the first target sub-feature based on the first sub-feature and the fifth sub-feature, and determine the second target sub-feature based on the second sub-feature and the sixth sub-feature.
  • the eighth sub-network may be an FFN 2 ( ⁇ ) neural network including two fully connected layers and an activation function.
  • the first target sub-feature or the second target sub-feature can be obtained by the following formula (2-13):
  • f" is the fifth sub-feature or the sixth sub-feature
  • f' is the first sub-feature or the second sub-feature
• the target feature is obtained in this way, which can ensure that the features of other objects are only added to the human body part of the first object and not to the pre-identified object occlusion part, in order to better simulate the features of multi-pedestrian images.
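• A heavily hedged sketch of one possible reading of formulas (2-12) and (2-13): FFN_1(·) takes the aggregated sub-feature together with the occlusion mask, and FFN_2(·) fuses its output with the original sub-feature into the target sub-feature. The concatenation-based combination and the layer sizes are assumptions; the exact operators of (2-12) and (2-13) are not reproduced here.
```python
import torch
import torch.nn as nn

class FFN(nn.Module):
    # two fully connected layers and an activation function, as FFN_1(.) / FFN_2(.)
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(), nn.Linear(d_out, d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def target_sub_feature(f_prime: torch.Tensor, f_d: torch.Tensor, mask: torch.Tensor,
                       ffn1: FFN, ffn2: FFN) -> torch.Tensor:
    # f_prime: (d,) first (or second) sub-feature; f_d: (d',) aggregated sub-feature
    # mask: (N,) occlusion mask, conditioning the diffused noise on the visible parts
    f_pp = ffn1(torch.cat([f_d, mask]))              # assumed form of formula (2-12)
    return ffn2(torch.cat([f_prime, f_pp]))          # assumed form of formula (2-13)

# usage sketch: ffn1 = FFN(d_prime + n_parts, d_prime); ffn2 = FFN(d + d_prime, d)
```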
  • Fig. 3 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 3, the method includes steps S31 to S37, wherein:
  • Step S31 acquiring a first image sample including a first object.
  • Step S32 using the first network of the first model to be trained, to perform feature extraction on the first image sample to obtain the first feature of the first object.
• Step S33. Using the second network of the first model, update the first feature based on the second feature of at least one second object to obtain the first target feature corresponding to the first feature, where the similarity between each second object and the first object is not less than the first threshold.
  • Step S34 Determine a target loss value based on the first target feature.
  • Step S35 based on the target loss value, update the model parameters of the first model at least once to obtain the trained first model.
  • steps S31 to S35 correspond to the above-mentioned steps S11 to S15 respectively, and for implementation, reference may be made to the specific implementation manners of the above-mentioned steps S11 to S15.
  • Step S36 Determine an initial second model based on the trained first model.
  • the network of the trained first model may be adjusted according to an actual usage scenario, and the adjusted first model may be determined as the initial second model.
• where the first model includes a first network and a second network, the second network in the trained first model can be removed, the first network of the first model can be adjusted according to the actual scene, and the adjusted first model can be determined as the initial second model.
  • Step S37 based on at least one second image sample, update the model parameters of the second model to obtain a trained second model.
  • the second image sample may have label information, or may not have label information.
  • those skilled in the art may determine a suitable second image sample according to an actual application scenario, which is not limited here.
  • fine-tuning training may be performed on model parameters of the second model to obtain a trained second model.
  • an initial second model is determined based on the trained first model, and model parameters of the second model are updated based on at least one second image sample to obtain a trained second model.
• the model parameters of the trained first model can be migrated to the second model to be applicable to various application scenarios, which can not only reduce the amount of calculation in practical applications, but also improve the training efficiency of the second model and the detection accuracy of the trained second model.
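• A minimal sketch of steps S36 and S37 under illustrative assumptions: the second network of the trained first model is removed, its first network is reused as the backbone of the initial second model with a new head adjusted to the scenario, and the second model is fine-tuned on labelled second image samples; all attribute and module names here are hypothetical.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_second_model(first_model: nn.Module, feature_dim: int, num_classes: int) -> nn.Module:
    # Step S36 sketch: keep the first network (feature extraction), drop the second network,
    # and attach a new classification head for the actual usage scenario.
    return nn.Sequential(first_model.first_network, nn.Linear(feature_dim, num_classes))

def finetune(second_model: nn.Module, loader, epochs: int = 5, lr: float = 1e-4) -> nn.Module:
    # Step S37 sketch: update the model parameters of the second model on second image samples.
    opt = torch.optim.AdamW(second_model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:                      # labelled second image samples
            loss = F.cross_entropy(second_model(images), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return second_model
```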
  • Fig. 4 is an image recognition method provided by an embodiment of the present disclosure. As shown in Fig. 4, the method includes steps S41 to S42, wherein:
  • Step S41 acquiring a first image and a second image.
  • the first image and the second image may be any suitable images to be recognized. During implementation, those skilled in the art may select an appropriate image according to an actual application scenario, which is not limited by the embodiments of the present disclosure.
  • the first image may include an occluded image or an unoccluded image.
  • the sources of the first image and the second image may be the same or different.
  • both the first image and the second image are images captured by a camera.
  • the first image is an image captured by a camera
  • the second image may be a frame of an image in a video.
  • Step S42 using the trained target model, to recognize the object in the first image and the object in the second image, and obtain a recognition result.
  • the trained target model may include but not limited to at least one of the first model and the second model.
  • the recognition result indicates that the object in the first image and the object in the second image are the same object or different objects.
• the first target feature corresponding to the first image and the second target feature corresponding to the second image are obtained respectively, and the recognition result is obtained based on the similarity between the first target feature and the second target feature.
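• A minimal sketch of steps S41 and S42 under illustrative assumptions: the trained target model extracts a target feature from each image, and the recognition result is obtained by thresholding the cosine similarity between the two target features; the threshold value and the decision rule are assumptions.
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recognize(model: torch.nn.Module, image_1: torch.Tensor, image_2: torch.Tensor,
              threshold: float = 0.5) -> bool:
    # image_1, image_2: (C, H, W) images to be recognized
    f1 = F.normalize(model(image_1.unsqueeze(0)), dim=1)    # first target feature, (1, d)
    f2 = F.normalize(model(image_2.unsqueeze(0)), dim=1)    # second target feature, (1, d)
    similarity = (f1 * f2).sum().item()                     # cosine similarity
    return similarity >= threshold    # True: same object; False: different objects
```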
• since the model training method in the above embodiments can introduce real noise at the feature level, or introduce real noise at both the image level and the feature level, the overall network structure of the target model is trained, the robustness of the target model is enhanced, and the performance of the target model is effectively improved. Therefore, recognizing images based on the first model and/or the second model obtained by the model training method in the above embodiments enables more accurate pedestrian re-identification.
• FIG. 5A is a schematic diagram of the composition and structure of a model training system 50 provided by an embodiment of the present disclosure. As shown in FIG. 5A, the model training system 50 includes an augmentation part 51, an occlusion erasing part 52, a feature diffusion part 53, an updating part 54 and a feature memory part 55, wherein:
  • the augmentation part 51 is configured to at least perform occlusion processing on the first sub-image containing the first object to obtain the second sub-image.
  • the occlusion erasing part 52 is configured to use the first network of the first model to be trained to perform feature extraction on the first sub-image, obtain the first sub-feature of the first object, and perform feature extraction on the second sub-image, Get the second subfeature of the first object.
  • the feature diffusion part 53 is configured to use the second network of the first model to update the first sub-feature and the second sub-feature respectively based on the second feature of at least one second object, and obtain the first sub-feature corresponding to the first sub-feature A target sub-feature and a second target sub-feature corresponding to the second sub-feature, the similarity between each second object and the first object is not less than the first threshold.
  • the updating part 54 is configured to determine a target loss value based on the first target sub-feature and the second target sub-feature; based on the target loss value, update the model parameters of the first model at least once to obtain the trained first model.
  • the feature memory part 55 is configured to store at least one feature of at least one object.
  • the feature memory part 55 includes a first feature memory and a second feature memory, the first feature memory is used to store the first sub-feature of at least one object, and the second feature memory is used to store at least The first target subfeature of an object.
  • FIG. 5B is a schematic diagram of a model training system 500 provided by an embodiment of the present disclosure.
• the model training system 500 performs augmentation processing on an input first image 501 to obtain a second image 502; after the first image 501 and the second image 502 are input to the occlusion erasing part 52, the first sub-feature f1′ and the second sub-feature f2′ are obtained respectively, and the second feature memory 552 is updated based on the first sub-feature f1′; after the first sub-feature f1′, the second sub-feature f2′ and at least one feature of at least one other object selected from the second feature memory 552 are input to the feature diffusion part 53, the first target sub-feature fd1′ and the second target sub-feature fd2′ are obtained respectively;
• based on the first target sub-feature fd1′, the first feature memory 551 is updated, and the network parameters of the first model are then updated based on the target loss value.
  • the augmentation part 51 is further configured to: determine an occlusion mask based on the first sub-image and the second sub-image.
• Fig. 5C is a schematic diagram of determining an occlusion mask provided by an embodiment of the present disclosure. As shown in Fig. 5C, a pixel comparison operation 503 is performed between the first sub-image 501 and the second sub-image 502; after the pixel comparison operation 503, a binarization operation 504 is performed on the comparison result, and a corresponding occlusion mask 505 is obtained after the binarization operation 504.
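• A minimal sketch of the pixel comparison operation 503 and binarization operation 504 in FIG. 5C under illustrative assumptions: the first and second sub-images are compared pixel-wise, the difference is binarized, and one occlusion sub-mask per horizontal part is derived; the part splitting and the majority rule are assumptions.
```python
import torch

def occlusion_mask(img1: torch.Tensor, img2: torch.Tensor, n_parts: int = 4,
                   eps: float = 1e-3) -> torch.Tensor:
    # img1: (C, H, W) first sub-image; img2: (C, H, W) second (occluded) sub-image
    diff = (img1 - img2).abs().sum(dim=0)            # pixel comparison operation 503, (H, W)
    changed = (diff > eps).float()                    # binarization operation 504
    parts = changed.chunk(n_parts, dim=0)             # split along height into sub-part images
    # assumed rule: a sub-part whose pixels mostly changed is marked occluded (0), else visible (1)
    return torch.tensor([0.0 if p.mean() > 0.5 else 1.0 for p in parts])
```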
  • the first network includes a first sub-network and a second sub-network
• the occlusion erasing part 52 is further configured to: use the first sub-network of the first model to be trained to perform feature extraction on the first sub-image and the second sub-image respectively, to obtain the third sub-feature corresponding to the first sub-image and the fourth sub-feature corresponding to the second sub-image; and use the second sub-network of the first model to determine the first sub-feature based on the third sub-feature, and determine the second sub-feature based on the fourth sub-feature.
  • FIG. 5D is a schematic diagram of a first network 510 provided by an embodiment of the present disclosure.
  • the first network 510 includes a first sub-network 511 and a second sub-network 512.
• the first sub-image 501 and the second sub-image 502 are input into the first sub-network 511 to obtain the third sub-feature f1 corresponding to the first sub-image 501 and the fourth sub-feature f2 corresponding to the second sub-image 502, and the third sub-feature f1 and the fourth sub-feature f2 are input into the second sub-network 512 to obtain the first sub-feature f1′ and the second sub-feature f2′.
  • the second subnetwork includes a third subnetwork and a fourth subnetwork
• the occlusion erasing part 52 is further configured to: use the third sub-network of the first model to determine the first occlusion score based on the third sub-feature, and determine the second occlusion score based on the fourth sub-feature; and use the fourth sub-network to determine the first sub-feature based on the third sub-feature and the first occlusion score, and determine the second sub-feature based on the fourth sub-feature and the second occlusion score.
  • FIG. 5E is a schematic diagram of a second subnetwork 512 provided by an embodiment of the present disclosure.
• the second sub-network 512 includes a third sub-network 521 and a fourth sub-network 522. The third sub-feature f1 and the fourth sub-feature f2 are input into the third sub-network 521 to obtain the first occlusion score s1 corresponding to the third sub-feature f1 and the second occlusion score s2 corresponding to the fourth sub-feature f2, respectively; the first occlusion score s1 and the third sub-feature f1 are input to the fourth sub-network 522 to obtain the first sub-feature f1′, and the second occlusion score s2 and the fourth sub-feature f2 are input to the fourth sub-network 522 to obtain the second sub-feature f2′.
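• A minimal sketch of the third sub-network 521 and fourth sub-network 522 in FIG. 5E under illustrative assumptions: each part feature from the pooling sub-network is scored, and the scores weight the part features before they are concatenated into the first sub-feature f1′; the sigmoid scoring head and the weight-then-concatenate form are assumptions.
```python
import torch
import torch.nn as nn

class OcclusionErasing(nn.Module):
    def __init__(self, d_part: int):
        super().__init__()
        self.scorer = nn.Linear(d_part, 1)     # occlusion erasure sub-network (shared across parts)

    def forward(self, part_feats: torch.Tensor):
        # part_feats: (n_parts, d_part) sub-part features from the pooling sub-network
        scores = torch.sigmoid(self.scorer(part_feats)).squeeze(1)   # (n_parts,) occlusion scores s_i
        erased = part_feats * scores.unsqueeze(1)                    # suppress occluded parts
        return erased.flatten(), scores   # first sub-feature f1' (concatenated parts), first occlusion score s1
```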
  • the second network includes a fifth sub-network and a sixth sub-network
• the feature diffusion part 53 is further configured to: use the fifth sub-network to aggregate the first sub-feature and the second sub-feature respectively with the second feature of at least one second object, to obtain the first aggregated sub-feature corresponding to the first sub-feature and the second aggregated sub-feature corresponding to the second sub-feature; and use the sixth sub-network to determine the first target sub-feature based on the first aggregated sub-feature, and determine the second target sub-feature based on the second aggregated sub-feature.
  • FIG. 5F is a schematic diagram of a second network 520 provided by an embodiment of the present disclosure.
• the second network 520 includes a fifth sub-network 521 and a sixth sub-network 522. The first sub-feature f1′ is input to the fifth sub-network 521, which searches the second feature memory 552 for the K nearest first centers belonging to second objects based on the first sub-feature f1′; based on the first sub-feature f1′ and the first prediction matrix W1, the first prediction feature fq is determined; based on the first centers and the second prediction matrix W2, the second prediction feature fc is determined; and based on the first centers and the third prediction matrix W3, the third prediction feature fv is determined.
• the first attention matrix mi is determined based on the first prediction feature fq and the second prediction feature fc, and the first aggregation sub-feature fd is determined based on the first attention matrix mi and the third prediction feature fv.
• the first aggregation sub-feature fd is then input to the sixth sub-network 522 to obtain the first target sub-feature fd′.
  • the feature diffusion part 53 is further configured to: determine a first attention matrix based on the first sub-feature and each second feature, and the first attention matrix is used to characterize the first sub-feature and each second feature The degree of association between the second features; based on each second feature and each first attention matrix, determine the first aggregation sub-feature; based on the second sub-feature and each second feature, determine the second attention matrix, The second attention matrix is used to characterize the degree of association between the second sub-features and each second feature; based on each second feature and each second attention matrix, the second aggregated sub-features are determined.
  • the network parameters of the fifth sub-network include a first prediction matrix and a second prediction matrix
  • the feature diffusion part 53 is further configured to: determine the first prediction feature based on the first sub-feature and the first prediction matrix ; Based on each second feature and the second predictive matrix, determine a second predictive feature; determine the first attention matrix based on the first predictive feature and each second predictive feature.
  • the network parameters of the fifth sub-network include a third predictive matrix
  • the feature diffusion part 53 is further configured to: determine a third predictive feature based on each second feature and the third predictive matrix; The third predictive feature and each of the first attention matrices determine a first aggregated sub-feature.
  • the sixth sub-network includes a seventh sub-network and an eighth sub-network
  • the feature diffusion part 53 is further configured to: use the seventh sub-network to determine the fifth sub-network based on the first aggregation sub-feature and the occlusion mask Sub-features, and determine the sixth sub-feature based on the second aggregated sub-feature and the occlusion mask; use the eighth sub-network, based on the first sub-feature and the fifth sub-feature, determine the first target sub-feature, and based on the second sub-feature and a sixth sub-feature to determine the second target sub-feature.
  • the updating part 54 is further configured to: determine the first target loss value based on the first target sub-feature and the second target sub-feature; determine the second target loss value based on the first sub-feature and the second sub-feature Loss value; determining the target loss value based on the first target loss value and the second target loss value; based on the target loss value, updating the model parameters of the first model at least once to obtain the trained first model.
  • the updating part 54 is further configured to: update the model parameters of the first model when the target loss value does not meet the preset condition, to obtain the updated first model, based on the updated The first model is to determine the first model after training; if the target loss value satisfies the preset condition, determine the updated first model as the first model after training.
• the updating part 54 is further configured to: determine the first target sub-loss value based on the first sub-feature and the second sub-feature; determine the second target sub-loss value based on the third sub-feature and the fourth sub-feature; and determine the second target loss value based on the first target sub-loss value and the second target sub-loss value.
  • the first sub-image includes label information
  • the first model includes a second feature memory bank
  • the second feature memory bank includes at least one feature belonging to at least one object
• the updating part 54 is further configured to: determine the third loss value based on the first occlusion score, the second occlusion score and the occlusion mask; determine the fourth loss value based on the first sub-feature, the second sub-feature and the label information; determine the fifth loss value based on the first sub-feature, the second sub-feature and at least one feature of at least one object in the second feature memory bank; and determine the first target sub-loss value based on the third loss value, the fourth loss value and the fifth loss value.
  • the updating part 54 is further configured to: determine the first sub-loss value based on the first occlusion score and the occlusion mask; determine the second sub-loss value based on the second occlusion score and the occlusion mask; The first sub-loss value and the second sub-loss value determine the third loss value.
  • the updating part 54 is further configured to: determine a third sub-loss value based on the first sub-feature and label information; determine a fourth sub-loss value based on the second sub-feature and label information; The sub-loss value and the fourth sub-loss value determine the fourth loss value.
• the updating part 54 is further configured to: determine, from at least one feature of at least one object in the second feature memory library, the third feature center of the first object and the fourth feature center of at least one second object; determine the fifth sub-loss value based on the first sub-feature, the third feature center and each fourth feature center; determine the sixth sub-loss value based on the second sub-feature, the third feature center and each fourth feature center; and determine the fifth loss value based on the fifth sub-loss value and the sixth sub-loss value.
  • the updating part 54 is further configured to: determine the seventh sub-loss value based on the third sub-feature and label information; determine the eighth sub-loss value based on the fourth sub-feature and label information; The sub-loss value and the eighth sub-loss value determine the second target sub-loss value.
  • FIG. 5G is a schematic diagram of obtaining a target loss value 540 provided by an embodiment of the present disclosure.
• the target loss value 540 mainly includes the loss values of three parts: the feature extraction part, the occlusion erasing part 52 and the feature diffusion part 53, where:
  • the loss values for this part of feature extraction include:
  • the loss values for this part of the occlusion erasure part 52 include:
  • the loss values for this part of the characteristic diffusion part 53 include:
• the ninth sub-loss value Loss11 (corresponding to the above-mentioned first loss value) determined based on the first target sub-feature fd1′ and the label information of the first sub-image 501, and the tenth sub-loss value Loss12 (corresponding to the above-mentioned first loss value) determined based on the second target sub-feature fd2′ and the label information of the first sub-image 501;
• the eleventh sub-loss value Loss21 (corresponding to the above-mentioned second loss value) determined based on the first target sub-feature fd1′ and the first feature memory bank 551, and the twelfth sub-loss value Loss22 (corresponding to the above-mentioned second loss value) determined based on the second target sub-feature fd2′ and the first feature memory bank 551.
  • the model training system further includes: a second determination part and a third determination part; the second determination part is configured to determine an initial second model based on the trained first model; the third determination part, It is configured to update the model parameters of the second model based on at least one second image sample to obtain the trained second model.
  • the method provided by the embodiment of the present disclosure has at least the following improvements:
• in the related art, pedestrian re-identification (ReID) modeling is mainly based on pose estimation algorithms or human body parsing algorithms for auxiliary training.
  • the modeling of pedestrian re-identification in the embodiment of the present disclosure uses deep learning to perform occluded pedestrian re-identification.
• a Feature Erasing and Diffusion Network (FED) is proposed to simultaneously process NPO and NTP; specifically, based on an Occlusion Erasing Module (OEM), NPO features are eliminated, supplemented by an NPO augmentation strategy that simulates NPO on the holistic pedestrian image and generates accurate occlusion masks.
• the pedestrian features and other memory features are then diffused to synthesize NTP features in the feature space, which realizes the simulation of NPO occlusion interference at the image level and NTP interference at the feature level.
• TP: target pedestrians.
  • the method provided by the embodiments of the present disclosure has at least the following beneficial effects: 1) Make full use of the occlusion information of the picture and the characteristics of other pedestrians to simulate the interference of non-pedestrian occlusion and non-target pedestrians, and can better comprehensively analyze various influencing factors, Improve the model's perception of TP; 2) Use deep learning to make the results of pedestrian re-identification more accurate, and improve the accuracy of pedestrian re-identification in real and complex scenes.
• The following abbreviations are used: Occluded-DukeMTMC (O-Duke); Occluded-REID (O-REID); Partial-REID (P-REID); Cumulative Matching Characteristic (CMC); mean Average Precision (mAP).
  • Table 1 shows the performance comparison of each pedestrian ReID method on the three data sets of O-Duke, O-REID and P-REID. Since there is no corresponding training set for O-REID and P-REID, the model trained on Market-1501 is used for testing.
• the compared pedestrian ReID methods include: Part-based Convolutional Baseline (PCB), Deep Spatial feature Reconstruction (DSR), High-Order re-identification (HOReID), Part-Aware Transformer (PAT), and Transformer-based Object Re-Identification (TransReID); TransReID adopts a Vision Transformer without sliding-window settings.
  • FED achieves the highest Rank-1 and mAP on both O-Duke and O-REID datasets. Especially on the O-REID dataset, it reached 86.3%/79.3% on Rank-1/mAP, surpassing other methods by at least 4.7%/2.6%. On O-Duke, it reaches 68.1%/56.4% on Rank-1/mAP, surpassing other methods by at least 3.6%/0.7%. On the P-REID dataset, the highest mAP accuracy is achieved, reaching 80.5%, which exceeds other methods by 3.9%. Therefore, a good performance is achieved on the occluded ReID dataset.
• Ablation results for NPO Augmentation (NPO Aug), OEM and FDM are shown.
  • Numbers 1 to 5 represent baseline, baseline+NPO Aug, baseline+NPO Aug+OEM, baseline+NPO Aug+FDM and FED, respectively.
  • Model 1 uses ViT as the feature extractor and is optimized by cross-entropy loss (ID Loss) and Triplet Loss.
• ID Loss: cross-entropy loss; Triplet Loss: triplet loss.
  • FDM improves Rank-1 and mAP by 1.7% and 2.4%, respectively. This means that optimizing a network with diffusion features can greatly improve the model's perception of TP. In the end, FED achieved the highest accuracy, showing that each component works both individually and together.
  • the number of searches K in the feature memory search operation is analyzed.
• K is set to 2, 4, 6 and 8, and experiments are performed on DukeMTMC-reID, Market-1501 and Occluded-DukeMTMC.
• the performance on the two holistic person ReID datasets, DukeMTMC-reID and Market-1501, is stable across the various K values, varying within 0.5%.
• for Market-1501, NPO and NTP are few, failing to highlight the effectiveness of FDM.
• for DukeMTMC-reID, a large amount of training data comes with NPO and NTP, and the loss constraints can make the network achieve high accuracy.
• for Occluded-DukeMTMC, since all the training data are holistic pedestrians, the introduction of FDM can greatly simulate the multi-pedestrian situation in the test set; as K increases, FDM can better preserve the characteristics of TP and introduce realistic noise.
  • FIG. 5H is a schematic diagram of occlusion scores of pedestrian images provided by an embodiment of the present disclosure.
• occlusion scores of some pedestrian images produced by the OEM are shown, including images with NPO and images with non-target pedestrians (NTP). From FIG. 5H, it can be seen that for images 551 and 552 with vertical object occlusion, the occlusion score is hardly affected, because symmetric pedestrians with less than half occlusion are not a critical issue for pedestrian ReID.
• the OEM can accurately identify NPO and flag it with a small occlusion score.
• for images 555 and 556 with multi-pedestrian occlusion, the OEM identifies each stripe as valuable; therefore, the subsequent FDM is crucial to improving the model performance.
  • FIG. 5I is a schematic diagram of an image retrieval result provided by an embodiment of the present disclosure. As shown in FIG. 5I , it shows the retrieval results of TransReID and FED.
  • Figure 561 and Figure 562 are object occlusion images. It is obvious that FED has a better recognition ability for NPO and can accurately retrieve the target pedestrian.
  • Figure 563 and Figure 564 are multi-pedestrian images, and FED has a stronger perception of TP and achieves higher retrieval accuracy.
  • FIG. 6 is a schematic diagram of the composition and structure of a model training device provided by an embodiment of the present disclosure.
  • the model training device 60 includes a first acquisition part 61 , feature extraction part 62 , first update part 63 , first determination part 64 and second update part 65 .
  • a first acquiring part 61 configured to acquire a first image sample containing a first object
  • the feature extraction part 62 is configured to use the first network of the first model to be trained to perform feature extraction on the first image sample to obtain the first feature of the first object;
  • the first updating part 63 is configured to use the second network of the first model to update the first features respectively based on the second features of at least one second object to obtain the first target features corresponding to the first features, each the similarity between the second object and the first object is not less than a first threshold;
  • the first determining part 64 is configured to determine a target loss value based on the first target feature
  • the second updating part 65 is configured to update the model parameters of the first model at least once based on the target loss value to obtain the trained first model.
  • the first image sample includes label information
  • the first model includes a first feature memory
  • the first feature memory includes at least one feature belonging to at least one object
• the first determination part 64 is further configured to: determine the first loss value based on the first target feature and the label information; determine the second loss value based on the first target feature and at least one feature of at least one object in the first feature memory; and determine the target loss value based on the first loss value and the second loss value.
  • the first determining part 64 is further configured to: determine the first feature center of the first object and the at least one second object from at least one feature of at least one object in the first feature memory library second feature center; determining a second loss value based on the first target feature, the first feature center, and each second feature center.
• the first feature memory library includes feature sets belonging to at least one object, each feature set includes at least one feature of the object to which it belongs, and the device further includes: a third updating part configured to update, based on the first target feature, the feature set belonging to the first object in the first feature memory.
  • the first acquisition part 61 is further configured to: acquire the first sub-image and the second sub-image containing the first object, the second sub-image is an image obtained by at least performing occlusion processing on the first sub-image
  • the feature extraction part 62 is also configured to: use the first network of the first model to be trained to perform feature extraction on the first sub-image, obtain the first sub-feature of the first object, and perform feature extraction on the second sub-image Extract to obtain the second sub-feature of the first object;
• the first update part 63 is also configured to: use the second network of the first model to update the first sub-feature and the second sub-feature respectively, based on the second feature of at least one second object, to obtain the first target sub-feature corresponding to the first sub-feature and the second target sub-feature corresponding to the second sub-feature;
• the first determining part 64 is also configured to: determine the target loss value based on the first target sub-feature and the second target sub-feature.
• the first determination part 64 is further configured to: determine the first target loss value based on the first target sub-feature and the second target sub-feature; determine the second target loss value based on the first sub-feature and the second sub-feature; and determine the target loss value based on the first target loss value and the second target loss value.
  • the first acquisition part 61 is further configured to: acquire the first sub-image containing the first object; based on the preset occlusion set, at least perform occlusion processing on the first sub-image to obtain the second sub-image , the occlusion set includes at least one occlusion image.
  • the first network includes a first sub-network and a second sub-network
• the feature extraction part 62 is further configured to: use the first sub-network of the first model to be trained to perform feature extraction on the first sub-image and the second sub-image respectively, to obtain the third sub-feature corresponding to the first sub-image and the fourth sub-feature corresponding to the second sub-image; and use the second sub-network of the first model to determine the first sub-feature based on the third sub-feature, and determine the second sub-feature based on the fourth sub-feature.
• the first determining part 64 is further configured to: determine the first target sub-loss value based on the first sub-feature and the second sub-feature; determine the second target sub-loss value based on the third sub-feature and the fourth sub-feature; and determine the second target loss value based on the first target sub-loss value and the second target sub-loss value.
  • the first sub-image includes label information
  • the first determining part 64 is further configured to: determine a seventh sub-loss value based on the third sub-feature and label information; based on the fourth sub-feature and label information, Determine an eighth sub-loss value; determine a second target sub-loss value based on the seventh sub-loss value and the eighth sub-loss value.
  • the second sub-network includes a third sub-network and a fourth sub-network
• the feature extraction part 62 is further configured to: use the third sub-network of the first model to determine the first occlusion score based on the third sub-feature, and determine the second occlusion score based on the fourth sub-feature; and use the fourth sub-network to determine the first sub-feature based on the third sub-feature and the first occlusion score, and determine the second sub-feature based on the fourth sub-feature and the second occlusion score.
  • the third subnetwork includes a pooling subnetwork and at least one occlusion erasure subnetwork
  • the first occlusion score includes at least one first occlusion subscore
  • the second occlusion score includes at least one second occlusion subscore
  • the feature extraction part 62 is also configured to: divide the third sub-feature into at least one third sub-part feature by using the pooling sub-network, and divide the fourth sub-feature into at least one fourth sub-part feature; use each The occlusion erasure sub-network determines each first occlusion subscore based on each third subsection feature, and determines each second occlusion subscore based on each fourth subsection feature.
  • the feature extraction part 62 is further configured to: use the fourth sub-network to determine the first sub-part feature based on each third sub-part feature and each first occlusion sub-score of the third sub-feature , and based on each fourth sub-part feature and each second occlusion sub-score of the fourth sub-feature, determine the second sub-part feature; based on each first sub-part feature, determine the first sub-feature, and based on each The second sub-part feature, to determine the second sub-feature.
  • the first sub-image includes label information
  • the first model includes a second feature memory bank
  • the second feature memory bank includes at least one feature belonging to at least one object
• the first determining part 64 is further configured to: determine the occlusion mask based on the first sub-image and the second sub-image; determine the third loss value based on the first occlusion score, the second occlusion score and the occlusion mask; determine the fourth loss value based on the first sub-feature, the second sub-feature and the label information; determine the fifth loss value based on the first sub-feature, the second sub-feature and at least one feature of at least one object in the second feature memory bank; and determine the first target sub-loss value based on the third loss value, the fourth loss value and the fifth loss value.
  • the first determining part 64 is further configured to: divide the first sub-image and the second sub-image into at least one first sub-part image and at least one second sub-part image; An occlusion sub-mask is determined for a sub-partial image and each second sub-partial image; based on each occlusion sub-mask, an occlusion mask is determined.
  • the first determining part 64 is further configured to: determine the first sub-loss value based on the first occlusion score and the occlusion mask; determine the second sub-loss value based on the second occlusion score and the occlusion mask ; Determine a third loss value based on the first sub-loss value and the second sub-loss value.
  • the first determining part 64 is further configured to: determine a third sub-loss value based on the first sub-feature and label information; determine a fourth sub-loss value based on the second sub-feature and label information; The third sub-loss value and the fourth sub-loss value determine the fourth loss value.
• the first determination part 64 is further configured to: determine, from at least one feature of at least one object in the second feature memory library, the third feature center of the first object and the fourth feature center of at least one second object; determine the fifth sub-loss value based on the first sub-feature, the third feature center and each fourth feature center; determine the sixth sub-loss value based on the second sub-feature, the third feature center and each fourth feature center; and determine the fifth loss value based on the fifth sub-loss value and the sixth sub-loss value.
  • the second network includes a fifth sub-network and a sixth sub-network
• the first updating part 63 is further configured to: use the fifth sub-network to aggregate the first sub-feature and the second sub-feature respectively with the second feature of at least one second object, to obtain the first aggregated sub-feature corresponding to the first sub-feature and the second aggregated sub-feature corresponding to the second sub-feature; and use the sixth sub-network to determine the first target sub-feature based on the first aggregated sub-feature, and determine the second target sub-feature based on the second aggregated sub-feature.
  • the first updating part 63 is further configured to: determine a first attention matrix based on the first sub-feature and each second feature, and the first attention matrix is used to characterize the first sub-feature and each second feature A degree of association between the second features; based on each second feature and each first attention matrix, determine the first aggregation sub-feature; based on the second sub-feature and each second feature, determine the second attention matrix , the second attention matrix is used to characterize the degree of association between the second sub-feature and each second feature; based on each second feature and each second attention matrix, the second aggregation sub-feature is determined.
  • the sixth sub-network includes a seventh sub-network and an eighth sub-network
• the first updating part 63 is further configured to: use the seventh sub-network to determine the fifth sub-feature based on the first aggregation sub-feature and the occlusion mask, and determine the sixth sub-feature based on the second aggregation sub-feature and the occlusion mask; and use the eighth sub-network to determine the first target sub-feature based on the first sub-feature and the fifth sub-feature, and determine the second target sub-feature based on the second sub-feature and the sixth sub-feature.
  • FIG. 7 is a schematic diagram of the composition and structure of an image recognition device provided by an embodiment of the present disclosure.
  • the image recognition device 70 includes a second acquisition part 71 and identification part 72.
  • a second acquiring part 71 configured to acquire the first image and the second image
• the identification part 72 is configured to use the trained target model to identify the object in the first image and the object in the second image to obtain a recognition result, wherein the trained target model includes the first model obtained by using the above-mentioned model training method;
• the recognition result indicates that the object in the first image and the object in the second image are the same object or different objects.
  • a "part" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a unit, a module or a non-modular one.
  • An embodiment of the present disclosure provides an electronic device, including a memory and a processor, the memory stores a computer program that can run on the processor, and the processor implements the above method when executing the computer program.
  • An embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the foregoing method is implemented.
  • Computer readable storage media may be transitory or non-transitory.
  • An embodiment of the present disclosure provides a computer program product.
  • the computer program product includes a non-transitory computer-readable storage medium storing a computer program. When the computer program is read and executed by a computer, part or all of the steps in the above method are implemented.
  • the computer program product can be specifically realized by means of hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium, and in one embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) and the like.
  • FIG. 8 is a schematic diagram of a hardware entity of an electronic device in an embodiment of the present disclosure.
  • the hardware entity of the electronic device 800 includes: a processor 801, a communication interface 802, and a memory 803, wherein:
  • the processor 801 generally controls the overall operation of the electronic device 800 .
  • the communication interface 802 can enable the electronic device to communicate with other terminals or servers through the network.
• the memory 803 is configured to store instructions and applications executable by the processor 801, and can also cache data to be processed or already processed by the processor 801 and by various modules in the electronic device 800 (for example, image data, audio data, voice communication data and video communication data); it can be realized by a flash memory (FLASH) or a random access memory (RAM). Data transmission may be performed between the processor 801, the communication interface 802 and the memory 803 through the bus 804.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are schematic.
  • the division of the units is a logical function division.
  • the coupling, or direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical or other forms of.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed to multiple network units; Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
• all the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may be used as a single unit separately, or two or more units may be integrated into one unit; the above-mentioned integrated unit can be realized in the form of hardware or in the form of hardware plus a software functional unit.
  • the essence of the technical solution of the present disclosure or the part that contributes to related technologies can be embodied in the form of software products, which are stored in a storage medium and include several instructions to make a An electronic device (which may be a personal computer, a server, or a network device, etc.) executes all or part of the methods described in various embodiments of the present disclosure.
  • the aforementioned storage medium includes various media capable of storing program codes such as removable storage devices, ROMs, magnetic disks or optical disks.
  • Embodiments of the present disclosure provide a model training and image recognition method, device, storage medium, and computer program product.
• the model training method includes: acquiring a first image sample containing a first object; performing feature extraction on the first image sample by using the first network of the first model to be trained, to obtain the first feature of the first object; updating the first feature based on the second feature of at least one second object by using the second network of the first model, to obtain the first target feature corresponding to the first feature, where the similarity between each second object and the first object is not less than the first threshold; determining the target loss value based on the first target feature; and updating the model parameters of the first model at least once based on the target loss value, to obtain the trained first model.
• on the one hand, the above scheme can enhance the robustness of the first model and improve the performance of the first model; on the other hand, it can improve the consistency of the trained first model's predictions for different image samples of the same object, thereby enabling the trained first model to more accurately re-identify objects in images containing multiple objects.

Abstract

Embodiments of the present disclosure provide model training and image recognition methods and apparatuses, a device, a storage medium and a computer program product. The model training method comprises: acquiring a first image sample containing a first object; performing feature extraction on the first image sample using a first network of a first model to be trained, to obtain a first feature of the first object; updating the first feature on the basis of a second feature of at least one second object, using a second network of the first model, to obtain a first target feature corresponding to the first feature, a degree of similarity between each second object and the first object being not less than a first threshold; determining a target loss value on the basis of the first target feature; and updating a model parameter of the first model at least once on the basis of the target loss value, to obtain a trained first model.

Description

Model training and image recognition methods and apparatuses, device, storage medium and computer program product
Cross-Reference to Related Applications
The embodiments of the present disclosure are based on, and claim priority to, the Chinese patent application No. 202210107742.9, filed on January 28, 2022 and entitled "Model training and image recognition method and apparatus, device and storage medium", the entire content of which is hereby incorporated into the present disclosure by reference.
Technical Field
The present disclosure relates to, but is not limited to, the field of computer technology, and in particular to model training and image recognition methods and apparatuses, a device, a storage medium and a computer program product.
Background
Object re-identification, also known as object re-ID, is a technology that uses computer vision to determine whether a specific object is present in an image or a video sequence. Object re-identification is widely regarded as a sub-problem of image retrieval: given an image containing an object, retrieve images containing that object across devices. Differences between devices, shooting angles, environments and other factors all affect the result of object re-identification.
Summary
Embodiments of the present disclosure provide model training and image recognition methods and apparatuses, a device, a storage medium and a computer program product.
The technical solutions of the embodiments of the present disclosure are implemented as follows:
An embodiment of the present disclosure provides a model training method, which includes:
acquiring a first image sample containing a first object;
performing feature extraction on the first image sample by using a first network of a first model to be trained, to obtain a first feature of the first object;
updating the first feature based on a second feature of at least one second object by using a second network of the first model, to obtain a first target feature corresponding to the first feature, where the similarity between each second object and the first object is not less than a first threshold;
determining a target loss value based on the first target feature; and
updating model parameters of the first model at least once based on the target loss value, to obtain a trained first model.
An embodiment of the present disclosure provides an image recognition method, which includes:
acquiring a first image and a second image; and
recognizing an object in the first image and an object in the second image by using a trained target model, to obtain a recognition result, where the trained target model includes the first model obtained by the above model training method, and the recognition result indicates that the object in the first image and the object in the second image are the same object or different objects.
An embodiment of the present disclosure provides a model training apparatus, which includes:
a first acquisition part, configured to acquire a first image sample containing a first object;
a feature extraction part, configured to perform feature extraction on the first image sample by using a first network of a first model to be trained, to obtain a first feature of the first object;
a first update part, configured to update the first feature based on a second feature of at least one second object by using a second network of the first model, to obtain a first target feature corresponding to the first feature, where the similarity between each second object and the first object is not less than a first threshold;
a first determination part, configured to determine a target loss value based on the first target feature; and
a second update part, configured to update model parameters of the first model at least once based on the target loss value, to obtain a trained first model.
An embodiment of the present disclosure provides an image recognition apparatus, which includes:
a second acquisition part, configured to acquire a first image and a second image; and
a recognition part, configured to recognize an object in the first image and an object in the second image by using a trained target model, to obtain a recognition result, where the trained target model includes the first model obtained by the above model training method, and the recognition result indicates that the object in the first image and the object in the second image are the same object or different objects.
An embodiment of the present disclosure provides an electronic device, including a processor and a memory, where the memory stores a computer program executable on the processor, and the processor implements the above method when executing the computer program.
An embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the above method.
An embodiment of the present disclosure provides a computer program product, which includes a computer program or instructions, where the computer program or instructions, when run on an electronic device, cause the electronic device to execute the above method.
In the embodiments of the present disclosure, a first image sample containing a first object is acquired; feature extraction is performed on the first image sample by using a first network of a first model to be trained, to obtain a first feature of the first object; the first feature is updated based on a second feature of at least one second object by using a second network of the first model, to obtain a first target feature corresponding to the first feature, where the similarity between each second object and the first object is not less than a first threshold; a target loss value is determined based on the first target feature; and model parameters of the first model are updated at least once based on the target loss value, to obtain a trained first model. In this way, features of second objects are introduced as noise at the feature level of the first image sample containing the first object, and the overall network structure of the first model is trained, which can enhance the robustness of the first model and improve its performance. Moreover, the model parameters of the first model are updated at least once when the target loss value does not satisfy a preset condition; since the target loss value is determined based on the first target feature, the consistency of the trained first model's predictions for different image samples of the same object can be improved, so that the trained first model can more accurately re-identify objects in images containing multiple objects.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure.
Fig. 1 is a schematic flowchart of a model training method provided by an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a model training method provided by an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of a model training method provided by an embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of an image recognition method provided by an embodiment of the present disclosure;
Fig. 5A is a schematic diagram of the composition and structure of a model training system provided by an embodiment of the present disclosure;
Fig. 5B is a schematic diagram of a model training system provided by an embodiment of the present disclosure;
Fig. 5C is a schematic diagram of determining an occlusion mask provided by an embodiment of the present disclosure;
Fig. 5D is a schematic diagram of a first network provided by an embodiment of the present disclosure;
Fig. 5E is a schematic diagram of a second sub-network provided by an embodiment of the present disclosure;
Fig. 5F is a schematic diagram of a second network provided by an embodiment of the present disclosure;
Fig. 5G is a schematic diagram of obtaining a target loss value provided by an embodiment of the present disclosure;
Fig. 5H is a schematic diagram of occlusion scores of a pedestrian image provided by an embodiment of the present disclosure;
Fig. 5I is a schematic diagram of an image retrieval result provided by an embodiment of the present disclosure;
Fig. 6 is a schematic diagram of the composition and structure of a model training apparatus provided by an embodiment of the present disclosure;
Fig. 7 is a schematic diagram of the composition and structure of an image recognition apparatus provided by an embodiment of the present disclosure;
Fig. 8 is a schematic diagram of a hardware entity of an electronic device in an embodiment of the present disclosure.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present disclosure, and all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present disclosure. In the following description, reference to "some embodiments" describes a subset of all possible embodiments; it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict. In the following description, the terms "first/second/third" are merely used to distinguish similar objects and do not represent a specific ordering of the objects; it is understood that, where permitted, "first/second/third" may be interchanged in a specific order or sequence, so that the embodiments of the present disclosure described herein can be implemented in an order other than that illustrated or described herein. Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which the present disclosure belongs. The terms used herein are only for the purpose of describing the embodiments of the present disclosure and are not intended to limit the present disclosure.
In the related art, most algorithms use deep neural networks to extract image features and then implement retrieval through a distance metric. However, because pedestrian re-identification scenes are complex, the target pedestrian is often occluded by non-pedestrian objects or interfered with by non-target pedestrians. These algorithms do not take into account the impact of occlusion on retrieval accuracy, and the extracted pedestrian feature representations contain a large amount of noise, which reduces retrieval accuracy. Although some existing algorithms introduce human-body parsing or pose estimation algorithms to assist the pedestrian re-identification model in extracting pedestrian features, human-body parsing and pose estimation algorithms are not sufficiently robust; they struggle to provide accurate auxiliary information and may even mislead the model into extracting wrong features, reducing retrieval accuracy.
Embodiments of the present disclosure provide a model training method in which features of a second object are introduced as noise at the feature level of a first image sample containing a first object, and the overall network structure of a first model is trained. This can enhance the robustness of the first model and improve its performance. Meanwhile, when the target loss value does not satisfy a preset condition, the model parameters of the first model are updated at least once; since the target loss value is determined based on the first target feature, the consistency of the trained first model's predictions for different image samples of the same object can be improved, so that the trained first model can more accurately re-identify objects in images containing multiple objects. Both the model training method and the image recognition method provided by the embodiments of the present disclosure may be executed by an electronic device, which may be implemented as various types of terminals such as a notebook computer, a tablet computer, a desktop computer, a set-top box, or a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), or as a server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present disclosure.
Fig. 1 is a schematic flowchart of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 1, the method includes steps S11 to S15:
Step S11: acquire a first image sample containing a first object.
Here, the first image sample may be any suitable image that contains at least the first object. The content of the first image sample may be determined according to the actual application scenario; for example, it may include only the first object, or the first object together with at least one of things and other objects. The first object may include, but is not limited to, a person, an animal, a plant, an article, and the like. For example, the first image sample is a face image containing Zhang San. For another example, the first image sample is an image containing the whole person of Li Si. In some implementations, the first image sample may include at least one image. For example, the first image sample is any image in a training set. For another example, the first image sample includes a first sub-image and a second sub-image, where the first sub-image is an image in the training set and the second sub-image is an image obtained by applying augmentation to the first sub-image. The augmentation may include, but is not limited to, at least one of occlusion, scaling, cropping, resizing, padding, flipping, color jitter, grayscale conversion, Gaussian blur, random erasing, and the like; a minimal augmentation sketch is given below. In implementation, those skilled in the art may apply suitable augmentation to the first sub-image to obtain the second sub-image according to the actual situation, which is not limited by the embodiments of the present disclosure. For yet another example, the first image sample includes a first sub-image and multiple second sub-images, where the first sub-image is an image in the training set and each second sub-image is an image obtained by separately augmenting the first sub-image.
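As an illustration only, the following sketch builds such an augmentation pipeline with torchvision; the specific transforms and their parameters are assumptions made for the example and are not prescribed by this disclosure.

```python
# Hypothetical augmentation pipeline for producing a second sub-image from a
# first sub-image; the transform choices and parameters are illustrative only.
import torchvision.transforms as T

augment = T.Compose([
    T.Resize((256, 128)),                      # resizing
    T.Pad(10),                                 # padding
    T.RandomCrop((256, 128)),                  # cropping
    T.RandomHorizontalFlip(p=0.5),             # flipping
    T.ColorJitter(0.2, 0.2, 0.2, 0.1),         # color jitter
    T.RandomGrayscale(p=0.1),                  # grayscale conversion
    T.GaussianBlur(kernel_size=3),             # Gaussian blur
    T.ToTensor(),
    T.RandomErasing(p=0.5),                    # random erasing
])

# second_sub_image = augment(first_sub_image)  # first_sub_image: a PIL image
```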
Step S12: perform feature extraction on the first image sample by using the first network of the first model to be trained, to obtain a first feature of the first object.
Here, the first model may be any suitable model that performs object recognition based on image features. The first model may include at least the first network. The first feature may include, but is not limited to, an original feature of the first image sample, or a feature obtained by processing the original feature. The original feature may include, but is not limited to, a face feature, a body feature and the like of the first object contained in the image. In some implementations, the first network may include at least a first sub-network, and the first sub-network uses a feature extractor to extract features of the first image. The feature extractor may include, but is not limited to, a recurrent neural network (RNN), a convolutional neural network (CNN), a Transformer-based feature extraction network, and the like. In implementation, those skilled in the art may adopt a suitable first network in the first model to obtain the first feature according to the actual situation, which is not limited by the embodiments of the present disclosure. For example, a third feature of the first image sample is extracted through the first sub-network, and the third feature is determined as the first feature of the first object. Here, the third feature may include, but is not limited to, the original feature of the first image sample, and the like.
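Purely as an illustration of such a first sub-network, the sketch below uses a ResNet-50 backbone from torchvision as the feature extractor; the choice of backbone and the feature dimension are assumptions of the example, not requirements of this disclosure.

```python
# Minimal sketch of a first sub-network: a CNN backbone used as a feature extractor.
import torch.nn as nn
import torchvision.models as models

class FirstSubNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)                       # CNN backbone (assumption)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier head

    def forward(self, images):                 # images: (B, 3, H, W)
        feats = self.encoder(images)           # (B, 2048, 1, 1)
        return feats.flatten(1)                # third feature, shape (B, 2048)

# first_feature = FirstSubNetwork()(first_image_sample_batch)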
In some implementations, the first network may further include a second sub-network, and the second sub-network is configured to determine the first feature of the first object based on the third feature of the first image sample. In some implementations, the second sub-network may include an occlusion erasing network, and the occlusion erasing network is configured to perform occlusion erasing on the input third feature to obtain the first feature of the first object.
Step S13: update the first feature based on a second feature of at least one second object by using the second network of the first model, to obtain a first target feature corresponding to the first feature.
Here, the similarity between each second object and the first object is not less than a first threshold. The first threshold may be preset or obtained by statistics. In implementation, those skilled in the art may determine how to set the first threshold according to actual needs, which is not limited by the embodiments of the present disclosure. For example, the similarity between the facial appearance features of the second object and those of the first object is not less than the first threshold. For another example, the similarity between the clothing features of the second object and those of the first object is not less than the first threshold. For yet another example, both the similarity between the facial appearance features and the similarity between the clothing features of the second object and the first object are not less than the first threshold.
The second feature may be obtained based on the training set, or may be input in advance. The second object may include, but is not limited to, a person, an animal, a plant, an article, and the like.
In some implementations, the similarity between each second object and the first object may be obtained based on the similarity between the second feature of each second object and the first feature of the first object. In some implementations, the similarity between each second object and the first object may be obtained based on the similarity between the feature center of each second object and the first feature of the first object. The first model may include a second memory feature library, and the second memory feature library may include at least one feature of at least one object. The feature center of a second object may be obtained based on at least one feature belonging to the second object in the second memory feature library. In some implementations, features of multiple image samples of at least one object in the training set may be extracted, and the extracted features may be stored in the second memory feature library by identity.
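As a hedged illustration of this selection step, the sketch below picks second objects whose feature centers have a cosine similarity to the first feature that is not less than the first threshold; the similarity measure and the threshold value are assumptions of the example.

```python
# Hypothetical selection of second objects: identities whose feature centers are
# similar enough to the first feature (cosine similarity >= first_threshold).
import torch.nn.functional as F

def select_second_features(first_feature, feature_centers, first_threshold=0.5):
    # first_feature: (D,), feature_centers: (num_identities, D)
    sims = F.cosine_similarity(first_feature.unsqueeze(0), feature_centers, dim=1)
    keep = sims >= first_threshold
    return feature_centers[keep], sims[keep]   # second features and their similarities
```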
In some implementations, the second network may include a fifth sub-network and a sixth sub-network. The fifth sub-network is configured to aggregate the second feature with the first feature to obtain a first aggregated sub-feature, and the sixth sub-network is configured to update the first aggregated sub-feature to obtain the first target feature.
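The disclosure does not fix a concrete form for these two sub-networks; the following is only one plausible sketch, assuming a similarity-weighted aggregation as the fifth sub-network and a small residual MLP as the sixth sub-network.

```python
# Illustrative fifth/sixth sub-networks: aggregate neighbour identity features into
# the first feature, then refine the aggregated feature. The architecture is assumed.
import torch.nn as nn
import torch.nn.functional as F

class SecondNetwork(nn.Module):
    def __init__(self, dim=2048):
        super().__init__()
        self.update = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, first_feature, second_features):
        # fifth sub-network: similarity-weighted aggregation of the second features
        weights = F.softmax(second_features @ first_feature, dim=0)          # (K,)
        aggregated = first_feature + (weights.unsqueeze(1) * second_features).sum(0)
        # sixth sub-network: update the aggregated sub-feature
        return aggregated + self.update(aggregated)                          # first target feature
```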
Step S14: determine a target loss value based on the first target feature.
Here, the target loss value may include, but is not limited to, at least one of a mean square error loss value, a cross-entropy loss value, a contrastive loss value, and the like.
Step S15: update the model parameters of the first model at least once based on the target loss value, to obtain the trained first model.
Here, whether the model parameters of the first model need to be updated may be determined based on the target loss value. For example, the target loss value is compared with a threshold: when the target loss value is greater than the threshold, the model parameters of the first model are updated; when the target loss value is not greater than the threshold, the first model is determined as the trained first model. For another example, the target loss value is compared with the previous target loss value: when the target loss value is greater than the previous target loss value, the model parameters of the first model are updated; when the target loss value is substantially equal to the previous target loss value, the first model is determined as the trained first model.
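A minimal sketch of such an update loop is given below, assuming a gradient-descent optimizer and a loss threshold as the stopping condition; both are example choices rather than requirements of the disclosure.

```python
# Illustrative training loop: keep updating model parameters until the target
# loss value satisfies the (assumed) preset condition, here "loss below a threshold".
import torch

def train(first_model, data_loader, compute_target_loss, threshold=0.1, max_steps=10000):
    optimizer = torch.optim.SGD(first_model.parameters(), lr=0.01, momentum=0.9)
    for step, batch in zip(range(max_steps), data_loader):
        target_loss = compute_target_loss(first_model, batch)  # based on the first target feature
        if target_loss.item() <= threshold:                    # preset condition satisfied
            break
        optimizer.zero_grad()
        target_loss.backward()                                 # update the model parameters
        optimizer.step()
    return first_model                                         # trained first model
```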
In the embodiments of the present disclosure, a first image sample containing a first object is acquired; feature extraction is performed on the first image sample by using the first network of the first model to be trained, to obtain a first feature of the first object; the first feature is updated based on a second feature of at least one second object by using the second network of the first model, to obtain a first target feature corresponding to the first feature, where the similarity between each second object and the first object is not less than the first threshold; a target loss value is determined based on the first target feature; and the model parameters of the first model are updated at least once based on the target loss value, to obtain the trained first model. In this way, features of second objects are introduced as noise at the feature level of the first image sample containing the first object, and the overall network structure of the first model is trained, which can enhance the robustness of the first model and improve its performance. Moreover, the model parameters of the first model are updated at least once when the target loss value does not satisfy the preset condition; since the target loss value is determined based on the first target feature, the consistency of the trained first model's predictions for different image samples of the same object can be improved, so that the trained first model can more accurately re-identify objects in images containing multiple objects.
In some implementations, the first image sample includes label information, and the first model includes a first feature memory library, where the first feature memory library includes at least one feature belonging to at least one object; the above step S14 includes steps S141 to S143:
Step S141: determine a first loss value based on the first target feature and the label information.
Here, the label information may include, but is not limited to, a label value, an identifier, and the like. The first loss value may include, but is not limited to, a cross-entropy loss value and the like. In some implementations, the first loss value may be calculated by the following formula (1-1):
L_1 = −log( exp(W_{y_i}·f_i) / Σ_{j=1..ID_S} exp(W_j·f_i) )      (1-1);
where W is a linear matrix, W_i and W_j are elements of W, y_i denotes the label information of the i-th object, f_i denotes the first target feature of the i-th object, and ID_S denotes the total number of objects in the training set.
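For illustration, the following sketch computes such a cross-entropy classification loss over identity logits; the linear classifier and its dimensions are assumptions of the example.

```python
# Illustrative first loss: softmax cross-entropy over identity logits W·f_i.
import torch.nn as nn
import torch.nn.functional as F

num_identities = 751            # ID_S, assumed for the example
feature_dim = 2048
classifier = nn.Linear(feature_dim, num_identities, bias=False)  # rows act as W_j

def first_loss(first_target_features, labels):
    # first_target_features: (B, feature_dim); labels: (B,) identity indices y_i
    logits = classifier(first_target_features)
    return F.cross_entropy(logits, labels)
```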
Step S142: determine a second loss value based on the first target feature and at least one feature of at least one object in the first feature memory library.
Here, the first feature memory library stores at least one feature of the first object and at least one feature of at least one second object. The second loss value may include, but is not limited to, a contrastive loss and the like.
Step S143: determine the target loss value based on the first loss value and the second loss value.
Here, the target loss value may include, but is not limited to, the sum of the first loss value and the second loss value, the weighted sum of the first loss value and the second loss value, and the like. In implementation, those skilled in the art may determine the target loss value according to actual needs, which is not limited by the embodiments of the present disclosure. In some implementations, the target loss value may be calculated by the following formula (1-2):
L = L_1 + L_2      (1-2);
where L_1 denotes the first loss value and L_2 denotes the second loss value.
In some implementations, step S142 includes steps S1421 to S1422:
Step S1421: determine, from at least one feature of at least one object in the first feature memory library, a first feature center of the first object and a second feature center of each of the at least one second object.
In some implementations, the first feature center may be determined based on the features of the first object in the first feature memory library and the first target feature. Each second feature center may be determined based on each feature of each second object in the second feature memory library. In some implementations, the feature center of each object may be calculated by the following formula (1-3):
c_k ← m·c_k + (1 − m)·(1/|B_k|)·Σ_{f_i′ ∈ B_k} f_i′      (1-3);
where c_k denotes the feature center of the k-th object, B_k denotes the feature set belonging to the k-th object in the mini-batch, m is the preset momentum coefficient for the update, and f_i′ is the first feature of the i-th sample. In some implementations, m may be 0.2.
In some implementations, when f_i′ and B_k belong to the same object, the feature center c_k of that object changes; when f_i′ and B_k do not belong to the same object, the feature center c_k of that object remains the same as the previous c_k.
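The following sketch shows one way to maintain such feature centers with a momentum update over a mini-batch; the momentum value and tensor shapes are assumptions taken from the example above.

```python
# Illustrative momentum update of per-identity feature centers (memory bank).
import torch

def update_centers(centers, batch_features, batch_labels, m=0.2):
    # centers: (num_identities, D); batch_features: (B, D); batch_labels: (B,)
    for k in batch_labels.unique():
        batch_k = batch_features[batch_labels == k]            # feature set B_k of identity k
        centers[k] = m * centers[k] + (1.0 - m) * batch_k.mean(dim=0)
    return centers
```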
Step S1422: determine the second loss value based on the first target feature, the first feature center and each second feature center.
In some implementations, the second loss value may be calculated by the following formula (1-4):
L_2 = −log( exp(f_i·c_i/τ) / Σ_{j=1..ID_S} exp(f_i·c_j/τ) )      (1-4);
where τ is a predefined temperature parameter, c_i denotes the first feature center of the i-th object, c_j denotes each second feature center, f_i denotes the first target feature of the i-th object, and ID_S denotes the total number of objects in the training set.
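As an illustration, the sketch below computes such a temperature-scaled contrastive loss against the feature centers; the temperature value is an assumption of the example.

```python
# Illustrative second loss: contrastive loss of the first target feature against
# all identity feature centers, with the ground-truth center as the positive.
import torch.nn.functional as F

def second_loss(first_target_features, centers, labels, tau=0.05):
    # first_target_features: (B, D); centers: (ID_S, D); labels: (B,)
    logits = first_target_features @ centers.t() / tau    # (B, ID_S), entries f_i · c_j / tau
    return F.cross_entropy(logits, labels)                # equals -log softmax at c_{y_i}
```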
In some implementations, the above step S15 includes step S151 or step S152:
Step S151: when the target loss value does not satisfy a preset condition, update the model parameters of the first model to obtain an updated first model, and determine the trained first model based on the updated first model.
Here, the manner of updating the model parameters of the first model may include, but is not limited to, at least one of a gradient descent method, a momentum update method, a Newton momentum method, and the like. In implementation, those skilled in the art may determine the update manner according to actual needs, which is not limited by the embodiments of the present disclosure.
Step S152: when the target loss value satisfies the preset condition, determine the updated first model as the trained first model.
Here, the preset condition may include, but is not limited to, the target loss value being smaller than a threshold, the change of the target loss value converging, and the like. In implementation, those skilled in the art may determine the preset condition according to actual needs, which is not limited by the embodiments of the present disclosure.
In some implementations, determining the trained first model based on the updated first model in step S151 includes steps S1511 to S1515:
Step S1511: acquire a next first image sample;
Step S1512: perform feature extraction on the next first image sample by using the first network of the updated first model to be trained, to obtain a next first feature;
Step S1513: update the next first feature based on the second feature of at least one second object by using the second network of the updated first model, to obtain a next first target feature corresponding to the next first feature;
Step S1514: determine a next target loss value based on the next first target feature;
Step S1515: update the model parameters of the updated first model at least once based on the next target loss value, to obtain the trained first model.
Here, the above steps S1511 to S1515 correspond to the foregoing steps S11 to S15 respectively, and may be implemented with reference to the implementations of the foregoing steps S11 to S15.
In the embodiments of the present disclosure, when the target loss value does not satisfy the preset condition, the model parameters of the first model are updated again, and the trained first model is determined based on the first model after this next update, so that the performance of the trained first model can be further improved through continuous iterative updates.
In some implementations, the first feature memory library includes feature sets belonging to at least one object, and each feature set includes at least one feature of the object to which it belongs. The method further includes step S16:
Step S16: update the feature set belonging to the first object in the first feature memory library based on the first target feature.
Here, the update manner may include, but is not limited to, adding the first target feature to the first feature memory library, replacing a certain feature in the first feature memory library with the first target feature, and the like.
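As a hedged illustration of step S16, the sketch below maintains the per-identity feature sets as bounded queues and either appends the first target feature or implicitly replaces the oldest stored feature; the queue capacity is an assumption of the example.

```python
# Illustrative update of the first feature memory library: per-identity feature sets.
from collections import defaultdict, deque

capacity = 50                                    # assumed maximum features per identity
memory = defaultdict(lambda: deque(maxlen=capacity))

def update_memory(identity, first_target_feature):
    # Appends the new feature; when the deque is full, the oldest feature is
    # dropped, which realises the "replace a certain feature" variant.
    memory[identity].append(first_target_feature.detach().cpu())
```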
In the embodiments of the present disclosure, by updating the features of the first object in the first feature memory library, the first feature center belonging to the first object can be obtained accurately, which further improves the recognition accuracy of the trained first model.
Fig. 2 is a schematic flowchart of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 2, the method includes steps S21 to S25:
Step S21: acquire a first sub-image and a second sub-image containing a first object.
Here, the second sub-image may be an image obtained by performing at least occlusion processing on the first sub-image. The second sub-image may include at least one image. In some implementations, when the second sub-image includes multiple images, the multiple images may be images obtained by separately performing at least occlusion processing on the first sub-image. Performing at least occlusion processing may include, but is not limited to, occlusion processing alone, or occlusion processing combined with other processing. In some implementations, the other processing may include, but is not limited to, at least one of scaling, cropping, resizing, padding, flipping, color jitter, grayscale conversion, Gaussian blur, random erasing, and the like. In implementation, those skilled in the art may apply a suitable processing manner to the first sub-image to obtain the second sub-image according to the actual situation, which is not limited by the embodiments of the present disclosure.
In some implementations, step S21 includes steps S211 to S212:
Step S211: acquire a first sub-image containing the first object.
Here, the first sub-image may be any suitable image that contains at least the first object. The content of the first sub-image may be determined according to the actual application scenario; for example, it may include only the first object, or the first object together with at least one of things and other objects. The first object may include, but is not limited to, a person, an animal, a plant, an article, and the like. For example, the first sub-image is a face image containing Zhang San. For another example, the first sub-image is an image containing the whole person of Li Si.
Step S212: perform at least occlusion processing on the first sub-image based on a preset occlusion set, to obtain the second sub-image.
Here, the occlusion set includes at least one occlusion image. The occlusion set may be established based on, but not limited to, at least one of the training set, other images, and the like. The occlusion set includes at least images of various occluding objects, background images and the like, such as leaves, vehicles, trash cans, buildings, trees and flowers. For example, image samples with background or object occlusion are found in the training set, and the occluded parts are manually cropped out to form an occlusion library. For another example, suitable images containing at least one kind of object occlusion are selected, and the occluding parts are manually cropped out to form the occlusion library. In implementation, those skilled in the art may choose an appropriate way to establish the occlusion set according to actual needs, which is not limited by the embodiments of the present disclosure.
The placement of the occluder may include, but is not limited to, a specified position, a specified size, and the like. In some implementations, since occlusion often occurs in a quarter to a half of the regions at the four positions of top, bottom, left and right, the specified position may be set within a quarter to a half of the regions at these four positions. In implementation, those skilled in the art may determine the position of the occluder according to actual needs, which is not limited by the embodiments of the present disclosure.
In some implementations, performing at least occlusion processing may include, but is not limited to, occlusion processing together with other processing. For example, when the processing includes occlusion processing and resizing, an occluder image is randomly selected from the occlusion library, the occluder image is resized based on an adjustment rule, and the resized occluder image is pasted at the lower right corner of the first image sample based on a preset rule. The adjustment rule may include, but is not limited to, adjusting the size of the occluder image, adjusting the size of the first image sample, and the like. For example, if the height of the occluder image exceeds twice its width, the occlusion is regarded as vertical occlusion, the height of the occluder image may be kept at its vertical height, and the width of the occluder image may be adjusted to a quarter to a half of the width of the first image sample; otherwise, the occlusion is regarded as horizontal occlusion, the width of the occluder image may be kept at its horizontal width, and the height of the occluder image may be adjusted to a quarter to a half of the height of the first image sample. In implementation, those skilled in the art may determine the adjustment rule according to actual needs, which is not limited by the embodiments of the present disclosure. For another example, when the processing includes occlusion processing, resizing, padding and cropping, first the first image sample is resized, padded and cropped; then an occluder image is randomly selected from the occlusion library and resized based on the adjustment rule; and then, based on a preset rule, one corner of the first image sample is randomly selected as a starting point, and the resized occluder image is pasted at that starting point.
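The following sketch implements one reading of these rules: a random occluder is resized according to its aspect ratio and pasted at a randomly chosen corner of the sample. The exact scaling factors and the corner choice are assumptions of the example.

```python
# Illustrative occlusion augmentation: paste a resized occluder at a random corner.
import random
from PIL import Image

def occlude(sample: Image.Image, occluders: list) -> Image.Image:
    occ = random.choice(occluders).copy()
    W, H = sample.size
    if occ.height > 2 * occ.width:                       # vertical occlusion
        new_w = random.randint(W // 4, W // 2)
        occ = occ.resize((new_w, min(occ.height, H)))
    else:                                                # horizontal occlusion
        new_h = random.randint(H // 4, H // 2)
        occ = occ.resize((min(occ.width, W), new_h))
    out = sample.copy()
    corner = random.choice([(0, 0), (W - occ.width, 0),
                            (0, H - occ.height), (W - occ.width, H - occ.height)])
    out.paste(occ, corner)
    return out
```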
In some implementations, the method further includes step S213:
Step S213: determine an occlusion mask based on the first sub-image and the second sub-image.
Here, the occlusion mask is used to represent the occlusion information of the image. The occlusion mask may be used for training the first model with respect to object occlusion. In some implementations, the occlusion mask may be determined based on the pixel difference between the first sub-image and the second sub-image. In implementation, the difference between the first sub-image and the second sub-image may be calculated based on the following formula (2-1):
d = |x − x′|      (2-1);
where x denotes the first sub-image and x′ denotes the second sub-image.
In some implementations, the above step S213 includes steps S2131 to S2133:
Step S2131: divide the first sub-image and the second sub-image into at least one first sub-part image and at least one second sub-part image, respectively.
In some implementations, because the semantics (for example, body parts) of different images are misaligned, a fine-grained occlusion mask tends to contain many false labels. Therefore, the first sub-image and the second sub-image may be roughly divided horizontally into multiple parts, and the occlusion mask is determined based on the pixel difference between each part of the first sub-image and the corresponding part of the second sub-image, for example, dividing into four parts, dividing into five parts, and so on. In implementation, those skilled in the art may divide the first sub-image and the second sub-image according to actual needs, which is not limited by the embodiments of the present disclosure.
Step S2132: determine an occlusion sub-mask based on each first sub-part image and each second sub-part image.
In some implementations, the pixel difference between each first sub-part image and the corresponding second sub-part image may be obtained based on the above formula (2-1), and each occlusion sub-mask is determined based on the pixel difference of each part.
Step S2133: determine the occlusion mask based on each occlusion sub-mask.
In some implementations, when d_i is not less than the first threshold, this part of the image is regarded as occluded and the occlusion sub-mask mask_i may be set to 0; otherwise, this part is regarded as not occluded and mask_i may be set to 1. The corresponding occlusion mask is then formed by the occlusion sub-masks of the parts. For example, both the first sub-image and the second sub-image are divided into four parts; when the first, second and third parts are not occluded and the fourth part is occluded, the occlusion mask should be 1110. In implementation, those skilled in the art may determine the occlusion mask according to actual needs, which is not limited by the embodiments of the present disclosure.
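A minimal sketch of this part-based mask computation is given below, assuming four horizontal stripes and a fixed pixel-difference threshold; both values are example assumptions.

```python
# Illustrative occlusion mask: split both images into horizontal parts, compare
# per-part pixel differences, and mark each part as occluded (0) or visible (1).
import numpy as np

def occlusion_mask(x: np.ndarray, x_occ: np.ndarray, parts=4, threshold=10.0):
    # x, x_occ: (H, W, C) arrays of the first and second sub-images
    d = np.abs(x.astype(np.float32) - x_occ.astype(np.float32))   # formula (2-1)
    stripes = np.array_split(d, parts, axis=0)                    # horizontal division
    return [0 if s.mean() >= threshold else 1 for s in stripes]   # e.g. [1, 1, 1, 0]
```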
Step S22: perform feature extraction on the first sub-image by using the first network of the first model to be trained to obtain a first sub-feature of the first object, and perform feature extraction on the second sub-image to obtain a second sub-feature of the first object.
Here, the first model may be any suitable model that performs object recognition based on image features. The first model may include at least the first network. The first sub-feature may include, but is not limited to, an original feature of the first sub-image, or a feature obtained by processing the original feature. The second sub-feature may include, but is not limited to, an original feature of the second sub-image, or a feature obtained by processing the original feature. The original feature may include, but is not limited to, a face feature, a body feature and the like of the object contained in the image.
Step S23: update the first sub-feature and the second sub-feature respectively based on a second feature of at least one second object by using the second network of the first model, to obtain a first target sub-feature corresponding to the first sub-feature and a second target sub-feature corresponding to the second sub-feature.
Here, the similarity between each second object and the first object is not less than the first threshold. The first threshold may be preset or obtained by statistics. In implementation, those skilled in the art may determine how to set the first threshold according to actual needs, which is not limited by the embodiments of the present disclosure. For example, the similarity between the facial appearance features of the second object and those of the first object is not less than the first threshold. For another example, the similarity between the clothing features of the second object and those of the first object is not less than the first threshold. For yet another example, both the similarity between the facial appearance features and the similarity between the clothing features of the second object and the first object are not less than the first threshold.
The second feature may be obtained based on the training set, or may be input in advance. The second object may include, but is not limited to, a person, an animal, a plant, an article, and the like.
In some implementations, the similarity between each second object and the first object may be obtained based on the similarity between the second feature of each second object and the first feature of the first object. In some implementations, the similarity between each second object and the first object may be obtained based on the similarity between the feature center of each second object and the first feature of the first object. The first model may include a second memory feature library, and the second memory feature library may include at least one feature of at least one object. The feature center of a second object may be obtained based on at least one feature belonging to the second object in the second memory feature library. In some implementations, features of multiple image samples of at least one object in the training set may be extracted, and the extracted features may be stored in the second memory feature library by identity.
Step S24: determine a target loss value based on the first target sub-feature and the second target sub-feature.
Here, the target loss value may include, but is not limited to, at least one of a mean square error loss value, a cross-entropy loss value, a contrastive loss value, and the like.
Step S25: update the model parameters of the first model at least once based on the target loss value, to obtain the trained first model.
Here, the above step S25 corresponds to the foregoing step S15, and may be implemented with reference to the implementation of the foregoing step S15.
In the embodiments of the present disclosure, a first sub-image and a second sub-image containing a first object are acquired, where the second sub-image is an image obtained by performing at least occlusion processing on the first sub-image; feature extraction is performed on the first sub-image by using the first network of the first model to be trained to obtain a first sub-feature of the first object, and feature extraction is performed on the second sub-image to obtain a second sub-feature of the first object; the first sub-feature and the second sub-feature are updated respectively based on a second feature of at least one second object by using the second network of the first model, to obtain a first target sub-feature corresponding to the first sub-feature and a second target sub-feature corresponding to the second sub-feature, where the similarity between each second object and the first object is not less than the first threshold; a target loss value is determined based on the first target sub-feature and the second target sub-feature; and the model parameters of the first model are updated at least once based on the target loss value, to obtain the trained first model. In this way, occluding object images and features of other objects are introduced as noise at the image level and at the feature level, respectively, of the first image sample containing the first object, and the overall network structure of the first model is trained, which can enhance the robustness of the first model and improve its performance. Meanwhile, the model parameters of the first model are updated at least once when the target loss value does not satisfy the preset condition; since the target loss value is determined based on the first target feature, the consistency of the trained first model's predictions for different image samples of the same object can be improved, so that the trained first model can more accurately re-identify objects in images containing object occlusion and/or multiple objects.
在一些实施方式中,步骤S24包括步骤S241至步骤S243,其中:In some embodiments, step S24 includes step S241 to step S243, wherein:
步骤S241、基于第一目标子特征和第二目标子特征,确定第一目标损失值。Step S241: Determine a first target loss value based on the first target sub-feature and the second target sub-feature.
这里,第一目标损失值可以包括但不限于均方误差损失值、交叉熵损失值、对比损失值等中的至少一种。Here, the first target loss value may include, but not limited to, at least one of a mean square error loss value, a cross-entropy loss value, a comparison loss value, and the like.
在一些实施方式中,步骤S241包括步骤S2411至步骤S2413,其中:In some embodiments, step S241 includes step S2411 to step S2413, wherein:
步骤S2411、基于第一目标子特征,确定第三目标子损失值。Step S2411. Based on the first target sub-feature, determine a third target sub-loss value.
这里,上述步骤S2411对应于前述步骤S14,在实施时可以参照前述步骤S14的实施方式。Here, the above-mentioned step S2411 corresponds to the above-mentioned step S14, and the implementation manner of the above-mentioned step S14 can be referred to for implementation.
步骤S2412、基于第二目标子特征,确定第四目标子损失值。Step S2412. Based on the second target sub-feature, determine the fourth target sub-loss value.
这里,上述步骤S2412对应于前述步骤S14,在实施时可以参照前述步骤S14的实施方式。Here, the above-mentioned step S2412 corresponds to the above-mentioned step S14, and the implementation of the above-mentioned step S14 can be referred to for implementation.
步骤S2413、基于第三目标子损失值和第四目标子损失值,确定第一目标损失值。Step S2413: Determine the first target loss value based on the third target sub-loss value and the fourth target sub-loss value.
这里,第一目标损失值可以包括但不限于第三目标子损失值和第四目标子损失值之间的和、对第三目标子损失值和第四目标子损失值分别加权之后的和等。在实施时,本领域技术人员可以根据实际需求确定第一目标损失值的方式,本公开实施例不作限定。Here, the first target loss value may include but not limited to the sum between the third target sub-loss value and the fourth target sub-loss value, the sum after weighting the third target sub-loss value and the fourth target sub-loss value, etc. . During implementation, those skilled in the art may determine the first target loss value according to actual needs, which is not limited by the embodiments of the present disclosure.
步骤S242、基于第一子特征和第二子特征,确定第二目标损失值。Step S242: Determine a second target loss value based on the first sub-feature and the second sub-feature.
这里,第二目标损失值可以包括但不限于均方误差损失值、交叉熵损失值、对比损失值等中的至少一种。Here, the second target loss value may include but not limited to at least one of a mean square error loss value, a cross-entropy loss value, a comparison loss value, and the like.
步骤S243、基于第一目标损失值和第二目标损失值,确定目标损失值。Step S243: Determine a target loss value based on the first target loss value and the second target loss value.
这里,目标损失值可以包括但不限于第一目标损失值和第二目标损失值之间的和、对第一目标损失值和第二目标损失值分别加权之后的和等。在实施时,本领域技术人员可以根据实际需求确定目标损失值的方式,本公开实施例不作限定。Here, the target loss value may include, but not limited to, the sum of the first target loss value and the second target loss value, the sum after weighting the first target loss value and the second target loss value respectively, and the like. During implementation, those skilled in the art may determine the target loss value according to actual needs, which is not limited by the embodiments of the present disclosure.
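As a non-limiting illustration of the weighting option mentioned above, a one-line sketch of the target loss as a weighted sum; the weight values are placeholders, not values prescribed by the disclosure.

```python
def target_loss(first_target_loss, second_target_loss, w1: float = 1.0, w2: float = 1.0):
    """Target loss as a (possibly weighted) sum of the first and second target loss values."""
    return w1 * first_target_loss + w2 * second_target_loss
```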
在本公开实施方式中,基于第一子特征、第二子特征、第一目标子特征和第二目标子特征,确定目标损失值。这样,可以提高目标损失值的准确度,以便于准确判断第一模型是否收敛。In an embodiment of the present disclosure, the target loss value is determined based on the first sub-feature, the second sub-feature, the first target sub-feature and the second target sub-feature. In this way, the accuracy of the target loss value can be improved, so as to accurately judge whether the first model is converged.
在一些实施方式中,第一网络包括第一子网络和第二子网络,步骤S22包括步骤S221至步骤S222,其中:In some implementations, the first network includes a first subnet and a second subnet, and step S22 includes steps S221 to S222, wherein:
步骤S221、利用待训练的第一模型的第一子网络,分别对第一子图像和第二子图像进行特征提取,得到第一子图像对应的第三子特征和第二子图像对应的第四子特征。Step S221. Using the first sub-network of the first model to be trained, perform feature extraction on the first sub-image and the second sub-image respectively, to obtain the third sub-feature corresponding to the first sub-image and the third sub-feature corresponding to the second sub-image. Four features.
这里,第一网络至少包括第一子网络,该第一子网络用于采用特征提取器来提取该图像的特征。该特征提取器可以包括但不限于RNN、CNN、基于转换器(Transform)的特征提取网络等。在实施时,本领域技术人员可以根据实际情况在第一模型中采用合适的第一子网络得到第三子特征,本公开实施例不作限定。例如,通过该第一子网络提取第一子图像的特征,并将该特征确定为第一对象的第三子特征。其中,第三子特征可以包括但不限于第一子图像的原始特征等。Here, the first network includes at least a first subnetwork, and the first subnetwork is used to extract features of the image using a feature extractor. The feature extractor may include, but is not limited to, RNN, CNN, a Transform-based feature extraction network, and the like. During implementation, those skilled in the art may use an appropriate first sub-network in the first model to obtain the third sub-feature according to actual conditions, which is not limited in the embodiments of the present disclosure. For example, a feature of the first sub-image is extracted through the first sub-network, and the feature is determined as a third sub-feature of the first object. Wherein, the third sub-feature may include but not limited to the original feature of the first sub-image and the like.
步骤S222、利用第一模型的第二子网络,基于第三子特征确定第一子特征,并基于第四子特征确定第二子特征。Step S222, using the second sub-network of the first model, determining the first sub-feature based on the third sub-feature, and determining the second sub-feature based on the fourth sub-feature.
在一些实施方式中,第二子网络可以包括遮挡擦除网络,该遮挡擦除网络用于对输入的特征进行遮挡擦除处理,输出无遮挡的特征。例如,通过第二子网络对第三子特征进行遮挡擦除处理后,得到第一对象的第一子特征。又例如,通过第二子网络对第四子特征进行遮挡擦除处理后,得到第一对象的第二子特征。In some implementations, the second sub-network may include an occlusion erasure network, which is used to perform occlusion erasure processing on input features and output unoccluded features. For example, the first sub-feature of the first object is obtained after occlusion and erasure processing is performed on the third sub-feature through the second sub-network. For another example, the second sub-feature of the first object is obtained after the fourth sub-feature is occluded and erased through the second sub-network.
在本公开实施方式中,通过在包含第一对象的第一图像样本的图片层面引入物体图像作为噪声,对第一模型的整体网络结构进行训练,从而可以增强第一模型的鲁棒性和提高第一模型的性能,进而能够使得训练后的第一模型能够更加准确的对包含有物体遮挡的图像中的对象进行重识别。In the embodiment of the present disclosure, the overall network structure of the first model is trained by introducing the object image as noise at the picture level containing the first image sample of the first object, so that the robustness and the improvement of the first model can be enhanced. The performance of the first model can further enable the trained first model to more accurately re-identify objects in images containing object occlusions.
在一些实施方式中,步骤S242包括步骤S2421至步骤S2423,其中:In some embodiments, step S242 includes step S2421 to step S2423, wherein:
步骤S2421、基于第一子特征和第二子特征,确定第一目标子损失值。Step S2421. Based on the first sub-feature and the second sub-feature, determine a first target sub-loss value.
这里,第一目标子损失值可以包括但不限于均方误差损失值、交叉熵损失值、对比损失值等中的至少一种。Here, the first target sub-loss value may include but not limited to at least one of a mean square error loss value, a cross-entropy loss value, a comparison loss value, and the like.
步骤S2422、基于第三子特征和第四子特征,确定第二目标子损失值。Step S2422. Based on the third sub-feature and the fourth sub-feature, determine a second target sub-loss value.
这里,第二目标子损失值可以包括但不限于均方误差损失值、交叉熵损失值、对比损失值等中的至少一种。Here, the second target sub-loss value may include, but not limited to, at least one of a mean square error loss value, a cross-entropy loss value, a comparison loss value, and the like.
步骤S2423、基于第一目标子损失值和第二目标子损失值,确定第二目标损失值。Step S2423: Determine a second target loss value based on the first target sub-loss value and the second target sub-loss value.
这里,第二目标损失值可以包括但不限于第一目标子损失值和第二目标子损失值之间的和、对第一目标子损失值和第二目标子损失值分别加权之后的和等。在实施时,本领域技术人员可以根据实际需求确定第二目标损失值的方式,本公开实施例不作限定。Here, the second target loss value may include but not limited to the sum between the first target sub-loss value and the second target sub-loss value, the sum after weighting the first target sub-loss value and the second target sub-loss value, etc. . During implementation, those skilled in the art may determine the second target loss value according to actual needs, which is not limited by the embodiments of the present disclosure.
在本公开实施方式中,基于第一子特征、第二子特征、第三子特征和第四子特征,确定第二目标损失值。这样,可以提高第二目标损失值的准确度,以便于准确判断第一模型是否收敛。In an embodiment of the present disclosure, the second target loss value is determined based on the first sub-feature, the second sub-feature, the third sub-feature and the fourth sub-feature. In this way, the accuracy of the second target loss value can be improved, so as to accurately judge whether the first model converges.
在一些实施方式中,第一子图像包括标签信息,步骤S2422包括步骤S251至步骤S253,其中:In some implementations, the first sub-image includes label information, and step S2422 includes steps S251 to S253, wherein:
步骤S251、基于第三子特征和标签信息,确定第七子损失值。Step S251. Determine a seventh sub-loss value based on the third sub-feature and label information.
这里,标签信息可以包括但不限于标签值、标识等。第七子损失值可以包括但不限于交叉熵损失值等。在一些实施方式中,可以通过上述公式(1-1)计算第七子损失值,此时公式(1-1)中的f i是第三子特征。 Here, tag information may include, but not limited to, tag values, identifiers, and the like. The seventh sub-loss value may include but not limited to a cross-entropy loss value and the like. In some implementation manners, the seventh sub-loss value can be calculated by the above formula (1-1), and at this time, f i in the formula (1-1) is the third sub-feature.
步骤S252、基于第四子特征和标签信息,确定第八子损失值。Step S252: Determine an eighth sub-loss value based on the fourth sub-feature and label information.
这里,第八子损失值可以包括但不限于交叉熵损失值等。在一些实施方式中,可以根据上述公式(1-1)确定第八子损失值,此时公式(1-1)中的f i是第四子特征。 Here, the eighth sub-loss value may include but not limited to a cross-entropy loss value and the like. In some implementation manners, the eighth sub-loss value may be determined according to the above formula (1-1), at this time, f i in the formula (1-1) is the fourth sub-feature.
步骤S253、基于第七子损失值和第八子损失值,确定第二目标子损失值。Step S253: Determine a second target sub-loss value based on the seventh sub-loss value and the eighth sub-loss value.
这里,第二目标子损失值可以包括但不限于第七子损失值和第八子损失值之间的和、对第七子损失值和第八子损失值分别加权之后的和等。在实施时,本领域技术人员可以根据实际需求确定第二目标子损失值的方式,本公开实施例不作限定。Here, the second target sub-loss value may include, but not limited to, the sum between the seventh sub-loss value and the eighth sub-loss value, the sum after weighting the seventh sub-loss value and the eighth sub-loss value, and the like. During implementation, those skilled in the art may determine the second target sub-loss value according to actual requirements, which is not limited in the embodiments of the present disclosure.
在本公开实施方式中,基于第三子特征、第四子特征和标签信息,确定第二目标子损失值。这样,可以提高 第二目标子损失值的准确度,以便于准确判断第一模型是否收敛。In an embodiment of the present disclosure, the second target sub-loss value is determined based on the third sub-feature, the fourth sub-feature and label information. In this way, the accuracy of the second target sub-loss value can be improved, so as to accurately judge whether the first model is converged.
在一些实施方式中,第二子网络包括第三子网络和第四子网络,步骤S222包括步骤S2221至步骤S2222,其中:In some implementations, the second subnetwork includes a third subnetwork and a fourth subnetwork, and step S222 includes steps S2221 to S2222, wherein:
步骤S2221、利用第一模型的第三子网络,基于第三子特征确定第一遮挡分数,并基于第四子特征确定第二遮挡分数。Step S2221, using the third sub-network of the first model to determine the first occlusion score based on the third sub-feature, and determine the second occlusion score based on the fourth sub-feature.
这里,第二子网络至少包括第三子网络,该第三子网络用于基于对图像的特征进行语义分析,以得到该图像对应的遮挡分数。Here, the second sub-network includes at least a third sub-network, and the third sub-network is used to perform semantic analysis based on features of the image to obtain an occlusion score corresponding to the image.
在一些实施方式中,第三子网络包括池化子网络和至少一个遮挡擦除子网络,第一遮挡分数包括至少一个第一遮挡子分数,第二遮挡分数包括至少一个第二遮挡子分数;上述步骤S2221包括步骤261至步骤S262,其中:In some embodiments, the third subnetwork includes a pooling subnetwork and at least one occlusion erasure subnetwork, the first occlusion score includes at least one first occlusion subscore, and the second occlusion score includes at least one second occlusion subscore; The above step S2221 includes step 261 to step S262, wherein:
步骤S261、利用池化子网络,将第三子特征划分为至少一个第三子部分特征,并将第四子特征划分为至少一个第四子部分特征。Step S261. Divide the third sub-feature into at least one third sub-part feature by using the pooling sub-network, and divide the fourth sub-feature into at least one fourth sub-part feature.
Here, the pooling sub-network is used to divide an input feature into at least one sub-part feature. The number of third sub-part features may be the same as the number of parts into which the first sub-image is divided. For example, if the first sub-image is divided into four parts, the pooling sub-network may divide the third sub-feature into four third sub-part features, each third sub-part feature corresponding to f_i.
步骤S262、利用每一遮挡擦除子网络,基于每一第三子部分特征,确定第一遮挡子分数,并基于每一第四子部分特征,确定第二遮挡子分数。Step S262. Using each occlusion erasure sub-network, determine a first occlusion sub-score based on each third sub-part feature, and determine a second occlusion sub-score based on each fourth sub-part feature.
这里,每一遮挡擦除子网络用于对输入的特征进行语义分析以得到该特征对应的图像的遮挡分数。在一些实施方式中,每一遮挡擦除子网络包括两个全连接层、一个层归一化和一个激活函数构成,其中,层归一化位于两个全连接层之间,激活函数位于最后。在一些实施方式中,激活函数可以是Sigmoid函数。在一些实施方式中,遮挡擦除子网络的数量与第一子图像划分的数量相同。例如,将第一子图像划分为四个部分,每一部分对应的特征为f i,此时第三子网络包括四个遮挡擦除子网络,每一遮挡擦除子网络用于输出fi对应的遮挡分数。又例如,将第一子图像划分为五个部分,每一部分对应的特征为f i,此时第三子网络包括五个遮挡擦除子网络,每一遮挡擦除子模块用于输出f i对应的遮挡分数。 Here, each occlusion erasure sub-network is used to perform semantic analysis on the input feature to obtain the occlusion score of the image corresponding to the feature. In some implementations, each occlusion erasing sub-network consists of two fully connected layers, a layer normalization and an activation function, wherein the layer normalization is located between the two fully connected layers, and the activation function is located at the end . In some embodiments, the activation function can be a sigmoid function. In some embodiments, the number of occlusion erasure sub-networks is the same as the number of first sub-image divisions. For example, the first sub-image is divided into four parts, and the corresponding feature of each part is f i . At this time, the third sub-network includes four occlusion-erasing sub-networks, and each occlusion-erasing sub-network is used to output the corresponding Occlusion score. For another example, the first sub-image is divided into five parts, and the corresponding feature of each part is f i . At this time, the third sub-network includes five occlusion-erasing sub-networks, and each occlusion-erasing sub-module is used to output f i The corresponding occlusion score.
在一些实施方式中,可以通过如下公式(2-2)计算遮挡分数:In some implementations, the occlusion score can be calculated by the following formula (2-2):
s_i = Sigmoid(W_rg · LN(W_cp · f_i))   (2-2);
where W_cp and W_rg are learnable weight matrices (W_cp compresses the channel dimension and W_rg projects the compressed feature to a single value), LN is layer normalization, c denotes the channel dimension, and f_i denotes the feature of the i-th part of the third sub-feature or the fourth sub-feature.
For example, the third sub-feature is divided into four third sub-part features by the pooling sub-network, and each third sub-part feature is input into its corresponding occlusion-erasure sub-network. The first fully connected layer W_cp compresses the channel dimension to one quarter of its original size, layer normalization is applied to the compressed feature, the normalized feature is then compressed to one dimension by the second fully connected layer, and finally the Sigmoid function outputs the first occlusion sub-score s_i corresponding to the third sub-part feature.
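The following is a minimal PyTorch-style sketch of one occlusion-erasure sub-network consistent with formula (2-2) and the description above (two fully connected layers with layer normalization between them and a Sigmoid at the end). The module name, the bias settings and the exact layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class OcclusionScoreHead(nn.Module):
    """One occlusion-erasure sub-network: s_i = Sigmoid(W_rg · LN(W_cp · f_i))."""

    def __init__(self, channels: int):
        super().__init__()
        self.w_cp = nn.Linear(channels, channels // 4, bias=False)  # compress channels to c/4
        self.ln = nn.LayerNorm(channels // 4)                       # layer normalization
        self.w_rg = nn.Linear(channels // 4, 1, bias=False)         # project to a scalar score
        self.sigmoid = nn.Sigmoid()

    def forward(self, part_feature: torch.Tensor) -> torch.Tensor:
        # part_feature: (B, c) feature of one part of the image; output: (B, 1) occlusion score
        return self.sigmoid(self.w_rg(self.ln(self.w_cp(part_feature))))
```

When the first sub-image is divided into four parts, four such heads would be instantiated, one per part feature f_i.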
步骤S2222、利用第四子网络,基于第三子特征和第一遮挡分数,确定第一子特征,并基于第四子特征和第二遮挡分数,确定第二子特征。Step S2222. Using the fourth sub-network, determine the first sub-feature based on the third sub-feature and the first occlusion score, and determine the second sub-feature based on the fourth sub-feature and the second occlusion score.
这里,第二子网络还包括第四子网络,第四子网络用于确定遮挡擦除后的特征。Here, the second subnetwork further includes a fourth subnetwork, and the fourth subnetwork is used to determine features after occlusion erasure.
在一些实施方式中,步骤S2222包括步骤S271至步骤272,其中:In some embodiments, step S2222 includes step S271 to step 272, wherein:
步骤S271、利用第四子网络,基于第三子特征的每一第三子部分特征和每一第一遮挡子分数,确定第一子部分特征,并基于第四子特征的每一第四子部分特征和每一第二遮挡子分数,确定第二子部分特征。Step S271, using the fourth sub-network, based on each third sub-part feature of the third sub-feature and each first occlusion sub-score, determine the first sub-part feature, and based on each fourth sub-part feature of the fourth sub-feature The partial feature and each second occlusion sub-score determine a second sub-part feature.
在一些实施方式中,可以通过如下公式(2-3)计算第一子部分特征或第二子部分特征:In some embodiments, the first sub-part feature or the second sub-part feature can be calculated by the following formula (2-3):
f_i′ = s_i · f_i   (2-3);
where s_i denotes the i-th occlusion score and f_i denotes the i-th third sub-part feature or fourth sub-part feature.
在一些实施方式中,可以基于第一子特征,更新第二特征记忆库。更新的方式可以包括但不限于将第一子特征新增至第二特征记忆库中、将第二特征记忆库中的某一特征替换为第一子特征等。In some implementations, the second feature memory may be updated based on the first sub-feature. The way of updating may include, but not limited to, adding the first sub-feature to the second feature storage, replacing a certain feature in the second feature storage with the first sub-feature, and so on.
步骤S272、基于每一第一子部分特征,确定第一子特征,并基于每一第二子部分特征,确定第二子特征。Step S272: Determine the first sub-feature based on each first sub-part feature, and determine the second sub-feature based on each second sub-part feature.
在一些实施方式中,将至少一个第一子部分特征进行拼接,便可以得到第一子特征。In some embodiments, the first sub-features can be obtained by concatenating at least one first sub-feature.
在本公开实施方式中,通过池化子网络、至少一个遮挡擦除子网络及第四子网络,可以提高第一子特征和第二子特征的准确度。In the embodiments of the present disclosure, the accuracy of the first sub-feature and the second sub-feature can be improved by using the pooling sub-network, at least one occlusion-erasing sub-network and the fourth sub-network.
在一些实施方式中,第一子图像包括标签信息,第一模型包括第二特征记忆库,第二特征记忆库中包括属于至少一个对象的至少一个特征,上述步骤S2421包括步骤S281至步骤S285,其中:In some embodiments, the first sub-image includes label information, the first model includes a second feature memory, and the second feature memory includes at least one feature belonging to at least one object, and the above step S2421 includes steps S281 to S285, in:
步骤S281、基于第一子图像和第二子图像,确定遮挡掩码。Step S281. Determine an occlusion mask based on the first sub-image and the second sub-image.
这里,上述步骤S281对应于前述步骤S213,在实施时可以参照前述步骤S213的实施方式。Here, the above-mentioned step S281 corresponds to the above-mentioned step S213, and the implementation manner of the above-mentioned step S213 can be referred to for implementation.
步骤S282、基于第一遮挡分数、第二遮挡分数和遮挡掩码,确定第三损失值。Step S282. Determine a third loss value based on the first occlusion score, the second occlusion score and the occlusion mask.
这里,第三损失值可以包括但不限于均方误差损失值等。Here, the third loss value may include, but not limited to, a mean square error loss value and the like.
步骤S283、基于第一子特征、第二子特征和标签信息,确定第四损失值。Step S283: Determine a fourth loss value based on the first sub-feature, the second sub-feature and label information.
这里,第四损失值可以包括但不限于交叉熵损失值等。Here, the fourth loss value may include but not limited to a cross-entropy loss value and the like.
步骤S284、基于第一子特征、第二子特征和第二特征记忆库中的至少一个对象的至少一个特征,确定第五 损失值。Step S284: Determine a fifth loss value based on the first sub-feature, the second sub-feature, and at least one feature of at least one object in the second feature memory.
这里,第五损失值可以包括但不限于对比损失值等。Here, the fifth loss value may include, but not limited to, a comparison loss value and the like.
步骤S285、基于第三损失值、第四损失值和第五损失值,确定第一目标子损失值。Step S285, based on the third loss value, the fourth loss value and the fifth loss value, determine the first target sub-loss value.
这里,第一目标子损失值可以包括但不限于第三损失值、第四损失值和第五损失值之间的和,对第三损失值、第四损失值和第五损失值分别加权之后的和等。在实施时,本领域技术人员可以根据实际需求确定第一目标子损失值的方式,本公开实施例不作限定。Here, the first target sub-loss value may include but not limited to the sum of the third loss value, the fourth loss value and the fifth loss value, after weighting the third loss value, the fourth loss value and the fifth loss value respectively and so on. During implementation, those skilled in the art may determine the first target sub-loss value according to actual needs, which is not limited in the embodiments of the present disclosure.
在本公开实施方式中,基于遮挡掩码、第一子特征、第二子特征、标签信息和其它对象的特征,确定第一目标子损失值。这样,可以提高第一目标子损失值的准确度,以便于准确判断第一模型是否收敛。In the embodiments of the present disclosure, the first target sub-loss value is determined based on the occlusion mask, the first sub-feature, the second sub-feature, label information and other object characteristics. In this way, the accuracy of the first target sub-loss value can be improved, so as to accurately judge whether the first model is converged.
在一些实施方式中,步骤S282包括步骤S2821至步骤S2823,其中:In some embodiments, step S282 includes step S2821 to step S2823, wherein:
步骤S2821、基于第一遮挡分数和遮挡掩码,确定第一子损失值。Step S2821: Determine a first sub-loss value based on the first occlusion score and the occlusion mask.
这里,第一子损失值可以包括但不限于均方误差损失值等。在一些实施方式中,可以根据如下公式(2-4)计算第一子损失值:Here, the first sub-loss value may include, but not limited to, a mean square error loss value and the like. In some implementations, the first sub-loss value can be calculated according to the following formula (2-4):
first sub-loss value = (1/N) \sum_{i=1}^{N} (s_i − mask_i)^2   (2-4);
where N is the total number of occlusion-erasure sub-networks, s_i denotes the i-th occlusion score, and mask_i denotes the i-th occlusion sub-mask in the occlusion mask. For example, when the occlusion mask is 1110, mask_1 is 1 and mask_4 is 0.
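A short sketch of this mean-squared-error sub-loss between the N occlusion scores and the N occlusion sub-masks, matching the reconstruction of formula (2-4) given above; the function name is illustrative.

```python
import torch

def occlusion_score_loss(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mean squared error between N occlusion scores s_i and N occlusion sub-masks mask_i.

    scores: (N,) outputs of the N occlusion-erasure sub-networks
    mask:   (N,) binary occlusion sub-masks, e.g. tensor([1., 1., 1., 0.]) for mask "1110"
    """
    return ((scores - mask) ** 2).mean()
```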
步骤S2822、基于第二遮挡分数和遮挡掩码,确定第二子损失值。Step S2822: Determine a second sub-loss value based on the second occlusion score and the occlusion mask.
这里,第二子损失值可以包括但不限于均方误差损失值等。在实施时,确定第二子损失值与确定第一子损失值的方式可以相同,具体参见步骤S2821。Here, the second sub-loss value may include, but not limited to, a mean square error loss value and the like. During implementation, the manner of determining the second sub-loss value may be the same as that of determining the first sub-loss value, see step S2821 for details.
步骤S2823、基于第一子损失值和第二子损失值,确定第三损失值。Step S2823: Determine a third loss value based on the first sub-loss value and the second sub-loss value.
这里,第三损失值可以包括但不限于第一子损失值和第二子损失值之间的和、对第一子损失值和第二子损失值分别加权之后的和等。在实施时,本领域技术人员可以根据实际需求确定第三损失值的方式,本公开实施例不作限定。Here, the third loss value may include, but not limited to, the sum of the first sub-loss value and the second sub-loss value, the sum after weighting the first sub-loss value and the second sub-loss value, and the like. During implementation, those skilled in the art may determine the third loss value according to actual requirements, which is not limited in the embodiments of the present disclosure.
在本公开实施方式中,基于第一遮挡分数、第二遮挡分数和遮挡掩码,确定第三损失值。这样,可以提高第三损失值的准确度,以便于准确判断第一模型是否收敛。In an embodiment of the present disclosure, the third loss value is determined based on the first occlusion score, the second occlusion score and the occlusion mask. In this way, the accuracy of the third loss value can be improved, so as to accurately judge whether the first model is converged.
在一些实施方式中,步骤S283包括步骤S2831至步骤S2833,其中:In some embodiments, step S283 includes step S2831 to step S2833, wherein:
步骤S2831、基于第一子特征和标签信息,确定第三子损失值。Step S2831. Determine a third sub-loss value based on the first sub-feature and label information.
这里,标签信息可以包括但不限于标签值、标识等。第三子损失值可以包括但不限于交叉熵损失值等。在一些实施方式中,可以通过上述公式(1-1)计算第三子损失值,此时公式(1-1)中的f i是第一子特征。 Here, tag information may include, but not limited to, tag values, identifiers, and the like. The third sub-loss value may include, but not limited to, a cross-entropy loss value and the like. In some implementation manners, the third sub-loss value can be calculated by the above formula (1-1), at this time, f i in the formula (1-1) is the first sub-feature.
步骤S2832、基于第二子特征和标签信息,确定第四子损失值。Step S2832. Determine a fourth sub-loss value based on the second sub-feature and label information.
这里,第四子损失值可以包括但不限于交叉熵损失值等。在一些实施方式中,可以通过上述公式(1-1)计算第四子损失值,此时公式(1-1)中的f i是第二子特征。 Here, the fourth sub-loss value may include but not limited to a cross-entropy loss value and the like. In some implementation manners, the fourth sub-loss value can be calculated by the above formula (1-1), and at this time, f i in the formula (1-1) is the second sub-feature.
步骤S2833、基于第三子损失值和第四子损失值,确定第四损失值。Step S2833: Determine a fourth loss value based on the third sub-loss value and the fourth sub-loss value.
这里,第四损失值可以包括但不限于第三子损失值和第四子损失值之间的和、对第三子损失值和第四子损失值分别加权之后的和等。在实施时,本领域技术人员可以根据实际需求确定第四损失值的方式,本公开实施例不作限定。Here, the fourth loss value may include, but not limited to, the sum between the third sub-loss value and the fourth sub-loss value, the sum after weighting the third sub-loss value and the fourth sub-loss value, and the like. During implementation, those skilled in the art may determine the fourth loss value according to actual requirements, which is not limited in the embodiments of the present disclosure.
在本公开实施方式中,基于第一子特征、第二子特征和标签信息,确定第四损失值。这样,可以提高第四损失值的准确度,以便于准确判断第一模型是否收敛。In an embodiment of the present disclosure, the fourth loss value is determined based on the first sub-feature, the second sub-feature and label information. In this way, the accuracy of the fourth loss value can be improved, so as to accurately judge whether the first model is converged.
在一些实施方式中,步骤S284包括步骤S2841至步骤S2844,其中:In some embodiments, step S284 includes step S2841 to step S2844, wherein:
步骤S2841、从第二特征记忆库中的至少一个对象的至少一个特征中,确定第一对象的第三特征中心和至少一个第二对象的第四特征中心。Step S2841. From at least one feature of at least one object in the second feature memory, determine a third feature center of the first object and a fourth feature center of at least one second object.
这里,第二特征记忆库中至少存储了第一对象的至少一个特征和至少一个第二对象的至少一个特征。在一些实施方式中,第三特征中心可以是基于第二特征记忆库中的第一对象的特征和该第一子特征,确定第三特征中心。每一第四特征中心可以是基于第二特征记忆库中的每一第二对象的每一特征确定的。在一些实施方式中,可以通过如下公式(2-5)计算每个对象的特征中心:Here, at least one feature of the first object and at least one feature of at least one second object are stored in the second feature storage. In some implementations, the third feature center may be determined based on the feature of the first object in the second feature memory library and the first sub-feature. Each fourth feature center may be determined based on each feature of each second object in the second feature memory. In some embodiments, the feature center of each object can be calculated by the following formula (2-5):
c_k ← m · c_k + (1 − m) · (1/|B_k|) \sum_{f_i′ ∈ B_k} f_i′   (2-5);
where c_k denotes the feature center of the k-th object, B_k denotes the set of features belonging to the k-th object in the mini-batch, m is a preset update momentum coefficient, and f_i′ is the first sub-feature of the i-th sample. In some implementations, m may be 0.2.
在一些实施方式中,在f i′和B k都属于同一对象的情况下,属于该对象的特征中心c k会变化,在f i′和B k不属于同一对象的情况下,属于该对象的特征中心c k与上一次c k的一致。 In some implementations, when f i ' and B k both belong to the same object, the feature center c k belonging to the object will change, and in the case that f i ' and B k do not belong to the same object, the feature center c k belonging to the object The feature center c k is consistent with the previous c k .
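The following sketch illustrates one possible momentum-style update of a feature center consistent with the description around formula (2-5); the exact update rule and the handling of an empty B_k are assumptions for illustration.

```python
import torch

def update_center(center_k: torch.Tensor, batch_feats_k: torch.Tensor, m: float = 0.2):
    """Momentum update of the feature center of object k.

    center_k:      (d,)   current feature center c_k
    batch_feats_k: (n, d) features f_i' in the mini-batch that belong to object k (set B_k)
    If no feature in the mini-batch belongs to object k, the center stays unchanged,
    as described above.
    """
    if batch_feats_k.numel() == 0:
        return center_k
    return m * center_k + (1.0 - m) * batch_feats_k.mean(dim=0)
```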
步骤S2842、基于第一子特征、第三特征中心和每一第四特征中心,确定第五子损失值。Step S2842, based on the first sub-feature, the third feature center and each fourth feature center, determine the fifth sub-loss value.
这里,第五子损失值可以包括但不限于对比损失等。在一些实施方式中,可以通过如下公式(2-6)计算第五子损失值:Here, the fifth sub-loss value may include but not limited to contrastive loss and the like. In some implementations, the fifth sub-loss value can be calculated by the following formula (2-6):
fifth sub-loss value = −log( exp(f_i · c_y / τ) / \sum_{z=1}^{ID_S} exp(f_i · c_z / τ) )   (2-6);
where τ is a predefined temperature parameter, c_y denotes the third feature center of the y-th object (the object to which the sample belongs), c_z denotes the z-th fourth feature center, f_i denotes the first sub-feature of the i-th object, and ID_S denotes the total number of objects in the training set; the sum in the denominator runs over the centers of all ID_S objects.
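A hedged sketch of an InfoNCE-style contrastive sub-loss over the feature centers, consistent with the reconstruction of formula (2-6) above; using the dot product as the similarity measure and the default temperature value are assumptions.

```python
import torch

def center_contrastive_loss(f_i: torch.Tensor, center_pos: torch.Tensor,
                            all_centers: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """-log( exp(f_i·c_y / tau) / sum_z exp(f_i·c_z / tau) ).

    f_i:         (d,)       first sub-feature of the sample
    center_pos:  (d,)       feature center c_y of the sample's own object
    all_centers: (ID_S, d)  feature centers of all objects in the training set
    """
    logits = all_centers @ f_i / tau          # (ID_S,) similarity to every center
    pos = (center_pos @ f_i) / tau            # similarity to the positive center
    return torch.logsumexp(logits, dim=0) - pos
```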
步骤S2843、基于第二子特征、第三特征中心和每一第四特征中心,确定第六子损失值。Step S2843, based on the second sub-feature, the third feature center and each fourth feature center, determine the sixth sub-loss value.
这里,第六子损失值可以包括但不限于对比损失等。在实施时,确定第六子损失值与确定第五子损失值的方式可以相同,具体参见步骤S2842。Here, the sixth sub-loss value may include but not limited to contrastive loss and the like. During implementation, the manner of determining the sixth sub-loss value may be the same as that of determining the fifth sub-loss value, see step S2842 for details.
步骤S2844、基于第五子损失值和第六子损失值,确定第六损失值。Step S2844: Determine a sixth loss value based on the fifth sub-loss value and the sixth sub-loss value.
这里,第六损失值可以包括但不限于第五子损失值和第六子损失值之间的和、对第五子损失值和第六子损失值分别加权之后的和等。在实施时,本领域技术人员可以根据实际需求确定第六损失值的方式,本公开实施例不作限定。Here, the sixth loss value may include, but not limited to, the sum between the fifth sub-loss value and the sixth sub-loss value, the sum after weighting the fifth sub-loss value and the sixth sub-loss value, and the like. During implementation, those skilled in the art may determine the sixth loss value according to actual needs, which is not limited in the embodiments of the present disclosure.
在本公开实施方式中,基于第一子特征、第二子特征和其它对象的特征,确定第六损失值。这样,可以提高第六损失值的准确度,以便于准确判断第一模型是否收敛。In an embodiment of the present disclosure, the sixth loss value is determined based on the first sub-feature, the second sub-feature and other object characteristics. In this way, the accuracy of the sixth loss value can be improved, so as to accurately judge whether the first model is converged.
在一些实施方式中,第二网络包括第五子网络和第六子网络,步骤S23包括步骤S231至步骤S232,其中:In some embodiments, the second network includes a fifth subnetwork and a sixth subnetwork, and step S23 includes steps S231 to S232, wherein:
步骤S231、利用第五子网络,将第一子特征和第二子特征分别与至少一个第二对象的第二特征进行聚合,得到第一子特征对应的第一聚合子特征和第二子特征对应的第二聚合子特征。Step S231, using the fifth sub-network to aggregate the first sub-feature and the second sub-feature with the second feature of at least one second object respectively, to obtain the first aggregated sub-feature and the second sub-feature corresponding to the first sub-feature The corresponding second aggregate subfeature.
这里,第二网络至少包括第五子网络,该第五子网络用于将第一子特征与至少一个第二对象的第二特征进行聚合,得到第一聚合子特征,将第二子特征与至少一个第二对象的第二特征进行聚合,得到第二聚合子特征。Here, the second network includes at least a fifth sub-network, and the fifth sub-network is used to aggregate the first sub-features with the second features of at least one second object to obtain the first aggregated sub-features, and combine the second sub-features with A second feature of at least one second object is aggregated to obtain a second aggregated sub-feature.
步骤S232、利用第六子网络,基于第一聚合子特征确定第一目标子特征,并基于第二聚合子特征确定第二目标子特征。Step S232. Using the sixth sub-network, determine the first target sub-feature based on the first aggregated sub-feature, and determine the second target sub-feature based on the second aggregated sub-feature.
这里,第二网络还包括第六子网络,该第六子网络用于基于第一聚合子特征确定第一目标子特征,基于第二聚合子特征确定第二目标子特征。Here, the second network further includes a sixth sub-network for determining the first target sub-feature based on the first aggregated sub-feature, and determining the second target sub-feature based on the second aggregated sub-feature.
在本公开实施方式中,通过在包含第一对象的第一图像样本的特征层面引入第二对象的特征作为噪声,对第一模型的整体网络结构进行训练,从而可以增强第一模型的鲁棒性和提高第一模型的性能,进而能够使得训练后的第一模型能够更加准确的对包含多个对象的图像中的对象进行重识别。In the embodiment of the present disclosure, the overall network structure of the first model is trained by introducing the features of the second object as noise at the feature level of the first image sample containing the first object, so that the robustness of the first model can be enhanced and improve the performance of the first model, thereby enabling the trained first model to more accurately re-identify objects in images containing multiple objects.
在一些实施方式中,步骤S231包括步骤S2311至步骤S2314,其中:In some embodiments, step S231 includes step S2311 to step S2314, wherein:
步骤S2311、基于第一子特征和每一第二特征,确定第一注意力矩阵。Step S2311, based on the first sub-feature and each second feature, determine a first attention matrix.
Here, the first attention matrix is used to characterize the degree of association between the first sub-feature and each second feature. In some implementations, based on the first sub-feature, X second features belonging to at least one second object are determined, where X is a positive integer. In some implementations, X may be 10. In some implementations, a K-nearest-neighbor search may be performed in the second feature memory library to find the X second features belonging to second objects that are closest to the first sub-feature, and X first centers c_1^{knn}, …, c_X^{knn} may be determined based on each of these second features. The search may be computed according to the cosine distance between features.
在一些实施方式中,所述第五子网络的网络参数包括第一预测矩阵和第二预测矩阵,步骤S2311包括步骤S2321至步骤S2323,其中:In some implementations, the network parameters of the fifth sub-network include a first prediction matrix and a second prediction matrix, and step S2311 includes steps S2321 to S2323, wherein:
步骤S2321、基于第一子特征和第一预测矩阵,确定第一预测特征。Step S2321, based on the first sub-feature and the first prediction matrix, determine the first prediction feature.
在一些实施方式中,可以通过如下公式(2-7)计算第一预测特征:In some embodiments, the first predictive feature can be calculated by the following formula (2-7):
f_q = f′ W_1   (2-7);
where f′ denotes the first sub-feature, W_1 ∈ R^{d×d′} is the first prediction matrix, and d and d′ are feature dimensions of f′.
步骤S2322、基于每一第二特征和第二预测矩阵,确定第二预测特征。Step S2322. Based on each second feature and the second predictive matrix, determine a second predictive feature.
在一些实施方式中,可以通过如下公式(2-8)计算第二预测特征:In some embodiments, the second predictive feature can be calculated by the following formula (2-8):
f_{c_i} = c_i^{knn} W_2   (2-8);
where c_i^{knn} denotes the i-th first center, i ∈ 1, 2, …, X, W_2 ∈ R^{d×d′} is the second prediction matrix, and d and d′ are feature dimensions of the first sub-feature.
步骤S2323、基于第一预测特征和每一第二预测特征,确定第一注意力矩阵。Step S2323: Determine a first attention matrix based on the first predictive feature and each second predictive feature.
在一些实施方式中,可以通过如下公式(2-9)确定第一注意力矩阵:In some embodiments, the first attention matrix can be determined by the following formula (2-9):
m_i = exp(f_q · f_{c_i} / α) / \sum_{j=1}^{X} exp(f_q · f_{c_j} / α)   (2-9);
where X denotes the total number of second features, i ∈ 1, 2, …, X, f_{c_i} is the second prediction feature of the i-th first center, and α is a scale factor.
步骤S2312、基于每一第二特征和每一第一注意力矩阵,确定第一聚合子特征。Step S2312, based on each second feature and each first attention matrix, determine the first aggregation sub-feature.
在一些实施方式中,第五子网络的网络参数还包括第三预测矩阵,步骤S2312包括步骤S2331至步骤S2332,其中:In some implementations, the network parameters of the fifth sub-network also include a third prediction matrix, and step S2312 includes steps S2331 to S2332, wherein:
步骤S2331、基于每一第二特征和第三预测矩阵,确定第三预测特征。Step S2331. Based on each second feature and the third predictive matrix, determine a third predictive feature.
在一些实施方式中,可以通过如下公式(2-10)计算第三预测特征:In some embodiments, the third predictive feature can be calculated by the following formula (2-10):
f_{v_i} = c_i^{knn} W_3   (2-10);
where c_i^{knn} denotes the i-th first center, i ∈ 1, 2, …, X, W_3 ∈ R^{d×d′} is the third prediction matrix, and d and d′ are feature dimensions of the first sub-feature.
步骤S2332、基于每一第三预测特征和每一第一注意力矩阵,确定第一聚合子特征。Step S2332, based on each third predictive feature and each first attention matrix, determine the first aggregation sub-feature.
在一些实施方式中,可以通过如下公式(2-11)确定第一聚合子特征:In some embodiments, the first aggregation sub-feature can be determined by the following formula (2-11):
f_d = \sum_{i=1}^{X} m_i · f_{v_i}   (2-11);
where m_i denotes the i-th first attention matrix and f_{v_i} denotes the i-th third prediction feature.
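The sketch below chains formulas (2-7) to (2-11) as reconstructed above: the first sub-feature and the X nearest first centers are projected with W_1, W_2 and W_3, softmax attention weights are computed, and the projected centers are aggregated. The single-head form, the scale factor and the layer dimensions are simplifying assumptions; the disclosure also describes a multi-head variant.

```python
import torch
import torch.nn as nn

class FeatureDiffusion(nn.Module):
    """Attention-based aggregation of X nearest first centers into an aggregated sub-feature."""

    def __init__(self, d: int, d_prime: int):
        super().__init__()
        self.w1 = nn.Linear(d, d_prime, bias=False)  # query projection, cf. formula (2-7)
        self.w2 = nn.Linear(d, d_prime, bias=False)  # key projection,   cf. formula (2-8)
        self.w3 = nn.Linear(d, d_prime, bias=False)  # value projection, cf. formula (2-10)

    def forward(self, f_prime: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
        # f_prime: (d,) first sub-feature; centers: (X, d) nearest first centers
        f_q = self.w1(f_prime)                            # (d',)
        f_c = self.w2(centers)                            # (X, d')
        f_v = self.w3(centers)                            # (X, d')
        scale = f_q.shape[-1] ** 0.5                      # assumed scale factor
        attn = torch.softmax(f_c @ f_q / scale, dim=0)    # (X,) cf. formula (2-9)
        return (attn.unsqueeze(1) * f_v).sum(dim=0)       # (d',) cf. formula (2-11)
```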
步骤S2313、基于第二子特征和每一第二特征,确定第二注意力矩阵。Step S2313: Determine a second attention matrix based on the second sub-features and each second feature.
这里,第二注意力矩阵用于表征第二子特征和每一第二特征之间的关联度。在实施时,确定第二注意力矩阵与确定第一注意力矩阵的方式可以相同,参见步骤S2321至步骤S2323。Here, the second attention matrix is used to characterize the degree of association between the second sub-features and each second feature. During implementation, the manner of determining the second attention matrix may be the same as that of determining the first attention matrix, see step S2321 to step S2323.
步骤S2314、基于每一第二特征和每一第二注意力矩阵,确定第二聚合子特征。Step S2314, based on each second feature and each second attention matrix, determine a second aggregation sub-feature.
这里,确定第二聚合子特征与确定第一聚合子特征的方式可以相同,具体参见步骤S2331至步骤S2332。Here, the manner of determining the second aggregation sub-feature may be the same as that of determining the first aggregation sub-feature, see step S2331 to step S2332 for details.
在本公开实施方式中,通过多头操作将每个第一中心分成多个部分,并为每个部分分配注意力权重,从而确保可以聚合更多类似于目标对象和非目标对象的独特模式,以增强第一模型的鲁棒性,进而能够使得训练后的第一模型能够更加准确的对包含多个对象的图像中的对象进行重识别。In the embodiment of the present disclosure, each first center is divided into multiple parts by multi-head operation, and attention weight is assigned to each part, so as to ensure that more unique patterns similar to target objects and non-target objects can be aggregated to The robustness of the first model is enhanced, so that the trained first model can more accurately re-identify objects in images containing multiple objects.
在一些实施方式中,第六子网络包括第七子网络和第八子网络,上述步骤S232包括步骤S2341至步骤S2343,其中:In some embodiments, the sixth subnetwork includes the seventh subnetwork and the eighth subnetwork, and the above step S232 includes steps S2341 to S2343, wherein:
步骤S2341、基于第一子图像和第二子图像,确定遮挡掩码。Step S2341. Determine an occlusion mask based on the first sub-image and the second sub-image.
这里,遮挡掩码用于表示图像的遮挡信息。在一些实施方式中,可以基于第一子图像和第二子图像之间的像素差异,确定该遮挡掩码。Here, the occlusion mask is used to represent the occlusion information of the image. In some implementations, the occlusion mask may be determined based on pixel differences between the first sub-image and the second sub-image.
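A minimal sketch of deriving a per-part occlusion mask from the pixel difference between the two sub-images followed by binarization, in the spirit of the description above; splitting the image into horizontal stripes, the number of parts and the binarization threshold are assumptions for illustration.

```python
import torch

def occlusion_mask(first_img: torch.Tensor, second_img: torch.Tensor,
                   num_parts: int = 4, eps: float = 1e-3) -> torch.Tensor:
    """Per-part occlusion mask from the pixel difference of the two sub-images.

    first_img, second_img: (C, H, W) tensors with values in [0, 1]
    Returns a (num_parts,) binary mask whose i-th entry is 1 when the i-th horizontal
    stripe is unchanged (unoccluded) and 0 when it was altered by the occlusion
    augmentation, e.g. tensor([1., 1., 1., 0.]) for the mask "1110".
    """
    diff = (first_img - second_img).abs().mean(dim=0)   # (H, W) per-pixel difference
    stripes = diff.chunk(num_parts, dim=0)               # split along height into parts
    return torch.tensor([float(s.mean() < eps) for s in stripes])
```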
步骤S2342、利用第七子网络,基于第一聚合子特征和遮挡掩码确定第五子特征,并基于第二聚合子特征和遮挡掩码确定第六子特征。Step S2342. Using the seventh sub-network, determine the fifth sub-feature based on the first aggregation sub-feature and the occlusion mask, and determine the sixth sub-feature based on the second aggregation sub-feature and the occlusion mask.
这里,第七子网络可以是包括两个全连接层和一个激活函数的FFN 1(·)神经网络。在一些实施方式中,可以通过如下公式(2-12),得到第五子特征或第六子特征: Here, the seventh sub-network may be an FFN 1 (·) neural network including two fully connected layers and an activation function. In some embodiments, the fifth sub-feature or the sixth sub-feature can be obtained by the following formula (2-12):
f″ = mask · FFN_1(f_d)   (2-12);
where mask is the occlusion mask and f_d is the first aggregated sub-feature or the second aggregated sub-feature.
步骤S2343、利用第八子网络,基于第一子特征和第五子特征,确定第一目标子特征,并基于第二子特征和第六子特征,确定第二目标子特征。Step S2343. Using the eighth sub-network, determine the first target sub-feature based on the first sub-feature and the fifth sub-feature, and determine the second target sub-feature based on the second sub-feature and the sixth sub-feature.
这里,第八子网络可以是包括两个全连接层和一个激活函数的FFN 2(·)神经网络。在一些实施方式中,可以通过如下公式(2-13),得到第一目标子特征或第二目标子特征: Here, the eighth sub-network may be an FFN 2 (·) neural network including two fully connected layers and an activation function. In some embodiments, the first target sub-feature or the second target sub-feature can be obtained by the following formula (2-13):
f_d′ = FFN_2(f″ + f′)   (2-13);
where f″ is the fifth sub-feature or the sixth sub-feature, and f′ is the first sub-feature or the second sub-feature.
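A compact sketch of FFN_1(·) and FFN_2(·) and of how the first target sub-feature is produced from the aggregated feature, the occlusion mask and the first sub-feature, following formulas (2-12) and (2-13); the hidden size, the activation choice and the way the mask is broadcast are assumptions.

```python
import torch
import torch.nn as nn

def make_ffn(d: int, hidden: int) -> nn.Sequential:
    """Two fully connected layers with one activation, as described for FFN1(.) and FFN2(.)."""
    return nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, d))

class DiffusionHead(nn.Module):
    def __init__(self, d: int, hidden: int = 1024):
        super().__init__()
        self.ffn1 = make_ffn(d, hidden)
        self.ffn2 = make_ffn(d, hidden)

    def forward(self, f_d: torch.Tensor, f_prime: torch.Tensor, mask_value: torch.Tensor):
        # mask_value: scalar or broadcastable tensor derived from the occlusion mask
        f_pp = mask_value * self.ffn1(f_d)     # formula (2-12): f'' = mask · FFN1(f_d)
        return self.ffn2(f_pp + f_prime)       # formula (2-13): f_d' = FFN2(f'' + f')
```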
In the embodiments of the present disclosure, the target feature is obtained based on the occlusion mask, the first sub-feature and the first aggregated sub-feature. This ensures that the features of other objects are added only to the body parts of the first object rather than to the previously identified occluded parts, so as to better simulate the features of images containing multiple pedestrians.
图3为本公开实施例提供的一种模型训练方法的实现流程示意图,如图3所示,该方法包括步骤S31至步骤S37,其中:Fig. 3 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 3, the method includes steps S31 to S37, wherein:
步骤S31、获取包含第一对象的第一图像样本。Step S31 , acquiring a first image sample including a first object.
步骤S32、利用待训练的第一模型的第一网络,对第一图像样本进行特征提取,得到第一对象的第一特征。Step S32 , using the first network of the first model to be trained, to perform feature extraction on the first image sample to obtain the first feature of the first object.
步骤S33、利用第一模型的第二网络,基于至少一个第二对象的第二特征,对第一特征进行更新,得到第一特征对应的第一目标特征,每一第二对象与第一对象的相似度不小于第一阈值。Step S33, using the second network of the first model to update the first feature based on the second feature of at least one second object to obtain the first target feature corresponding to the first feature, and each second object is related to the first object The similarity of is not less than the first threshold.
步骤S34、基于第一目标特征,确定目标损失值。Step S34: Determine a target loss value based on the first target feature.
步骤S35、基于目标损失值,对第一模型的模型参数进行至少一次更新,得到训练后的第一模型。Step S35 , based on the target loss value, update the model parameters of the first model at least once to obtain the trained first model.
这里,上述步骤S31至步骤S35分别对应于前述步骤S11至步骤S15,在实施时,可以参照前述步骤S11至步骤S15的具体实施方式。Here, the above-mentioned steps S31 to S35 correspond to the above-mentioned steps S11 to S15 respectively, and for implementation, reference may be made to the specific implementation manners of the above-mentioned steps S11 to S15.
步骤S36、基于训练后的第一模型,确定初始的第二模型。Step S36: Determine an initial second model based on the trained first model.
这里,可以根据实际的使用场景对训练后的第一模型的网络进行调整,并将调整后的第一模型确定为初始的第二模型。在一些实施方式中,第一模型包括第一网络和第二网络,可以将训练后的第一模型中的第二网络移除,并根据实际的场景对该第一模型的第一网络进行调整,并将调整后的第一模型确定为初始的第二模型。Here, the network of the trained first model may be adjusted according to an actual usage scenario, and the adjusted first model may be determined as the initial second model. In some embodiments, the first model includes a first network and a second network, the second network in the trained first model can be removed, and the first network of the first model can be adjusted according to the actual scene , and determine the adjusted first model as the initial second model.
步骤S37、基于至少一个第二图像样本,对第二模型的模型参数进行更新,得到训练后的第二模型。Step S37 , based on at least one second image sample, update the model parameters of the second model to obtain a trained second model.
这里,第二图像样本可以具有标签信息,也可以是无标签信息。在实施时,本领域技术人员可以根据实际的应用场景确定合适的第二图像样本,这里并不限定。在一些实施方式中,可以基于至少一个第二图像样本,对第二模型的模型参数进行微调训练,得到训练后的第二模型。Here, the second image sample may have label information, or may not have label information. During implementation, those skilled in the art may determine a suitable second image sample according to an actual application scenario, which is not limited here. In some implementations, based on at least one second image sample, fine-tuning training may be performed on model parameters of the second model to obtain a trained second model.
在本公开实施例中,基于训练后的第一模型,确定初始的第二模型,并基于至少一个第二图像样本,对第二模型的模型参数进行更新,得到训练后的第二模型。这样,可以将训练后的第一模型的模型参数迁移至第二模型,以适用于多种应用场景中,不仅可以在实际应用中减小计算量,还可以提高第二模型的训练效率以及训练后的第二模型的检测准确性。In an embodiment of the present disclosure, an initial second model is determined based on the trained first model, and model parameters of the second model are updated based on at least one second image sample to obtain a trained second model. In this way, the model parameters of the trained first model can be migrated to the second model to be applicable to various application scenarios, which can not only reduce the amount of calculation in practical applications, but also improve the training efficiency and training efficiency of the second model. After the detection accuracy of the second model.
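As a non-limiting illustration of this transfer step, the sketch below reuses the first network of the trained first model as the initial second model and fine-tunes it on second image samples; the attribute name first_network, the optimizer and the hyper-parameters are assumptions for illustration.

```python
import copy
import torch

def build_second_model(trained_first_model):
    """Drop the second network and reuse the first network as the initial second model."""
    return copy.deepcopy(trained_first_model.first_network)  # assumed attribute name

def finetune(second_model, loader, loss_fn, lr: float = 1e-4, epochs: int = 5):
    """Fine-tune the second model's parameters on at least one second image sample."""
    opt = torch.optim.Adam(second_model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(second_model(images), labels)
            loss.backward()
            opt.step()
    return second_model
```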
图4为本公开实施例提供的一种图像识别方法,如图4所示,该方法包括步骤S41至步骤S42,其中:Fig. 4 is an image recognition method provided by an embodiment of the present disclosure. As shown in Fig. 4, the method includes steps S41 to S42, wherein:
步骤S41、获取第一图像和第二图像。Step S41, acquiring a first image and a second image.
这里,第一图像和第二图像可以是任意合适的待进行识别的图像。在实施时,本领域技术人员可以根据实际应用场景中选择合适的图像,本公开实施例不作限定。在一些实施方式中,第一图像可以包括带有遮挡的图像,也可以包括未遮挡的图像。在一些实施方式中,第一图像和第二图像的来源可以相同,也可以不同。例如,第一 图像和第二图像均是通过摄像头拍摄的图像。又例如,第一图像是通过摄像头拍摄的图像,第二图像可以是视频中的某一帧图像。Here, the first image and the second image may be any suitable images to be recognized. During implementation, those skilled in the art may select an appropriate image according to an actual application scenario, which is not limited by the embodiments of the present disclosure. In some implementations, the first image may include an occluded image or an unoccluded image. In some embodiments, the sources of the first image and the second image may be the same or different. For example, both the first image and the second image are images captured by a camera. For another example, the first image is an image captured by a camera, and the second image may be a frame of an image in a video.
步骤S42、利用已训练的目标模型,对第一图像中的对象和第二图像中的对象进行识别,得到识别结果。Step S42 , using the trained target model, to recognize the object in the first image and the object in the second image, and obtain a recognition result.
这里,已训练的目标模型可以包括但不限于第一模型、第二模型中的至少之一。该识别结果表征第一图像中的对象和第二图像中的对象为同一对象或者不同对象。在一些实施方式中,基于该目标模型分别获取第一图像对应的第一目标特征和第二图像对应的第二目标特征,并基于第一目标特征和第二目标特征之间的相似度,得到该识别结果。Here, the trained target model may include but not limited to at least one of the first model and the second model. The recognition result indicates that the object in the first image and the object in the second image are the same object or different objects. In some implementations, based on the target model, the first target feature corresponding to the first image and the second target feature corresponding to the second image are obtained respectively, and based on the similarity between the first target feature and the second target feature, it is obtained The recognition result.
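A minimal sketch of this recognition step: the trained target model extracts a target feature for each image, and the two objects are judged to be the same when the cosine similarity of the features exceeds a threshold; the threshold value is an assumption.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recognize(model, first_image: torch.Tensor, second_image: torch.Tensor,
              threshold: float = 0.6) -> bool:
    """True if the objects in the two images are judged to be the same object."""
    feat1 = model(first_image.unsqueeze(0))    # (1, d) first target feature
    feat2 = model(second_image.unsqueeze(0))   # (1, d) second target feature
    sim = F.cosine_similarity(feat1, feat2).item()
    return sim >= threshold
```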
在本公开实施例中,由于上述实施例中的模型训练方法可以在特征层面引入真实噪声,或在图片层面和特征层面均引入真实噪声,对目标模型的整体网络结构进行训练,增强了目标模型的鲁棒性,有有效地提高了目标模型的性能,因此,基于采用上述实施例中的模型训练方法得到的第一模型和/或第二模型对图像进行识别,能够更加准确的对行人进行重识别。In the embodiment of the present disclosure, since the model training method in the above embodiment can introduce real noise at the feature level, or introduce real noise at both the picture level and the feature level, the overall network structure of the target model is trained, and the target model is enhanced. The robustness of the target model has effectively improved the performance of the target model. Therefore, based on the first model and/or the second model obtained by using the model training method in the above embodiment to identify the image, the pedestrian can be more accurately identified. Re-identify.
图5A为本公开实施例提供的一种模型训练系统50的组成结构示意图,如图5A所示,该模型训练系统50包括增广部分51、遮挡擦除部分52、特征扩散部分53、更新部分54和特征记忆库部分55,其中:FIG. 5A is a schematic diagram of the composition and structure of a model training system 50 provided by an embodiment of the present disclosure. As shown in FIG. 54 and feature memory part 55, wherein:
增广部分51,被配置为对包含第一对象的第一子图像至少进行遮挡处理后,得到第二子图像。The augmentation part 51 is configured to at least perform occlusion processing on the first sub-image containing the first object to obtain the second sub-image.
遮挡擦除部分52,被配置为利用待训练的第一模型的第一网络,对第一子图像进行特征提取,得到第一对象的第一子特征,并对第二子图像进行特征提取,得到第一对象的第二子特征。The occlusion erasing part 52 is configured to use the first network of the first model to be trained to perform feature extraction on the first sub-image, obtain the first sub-feature of the first object, and perform feature extraction on the second sub-image, Get the second subfeature of the first object.
特征扩散部分53,被配置为利用第一模型的第二网络,基于至少一个第二对象的第二特征,对第一子特征和第二子特征分别进行更新,得到第一子特征对应的第一目标子特征和第二子特征对应的第二目标子特征,每一第二对象与第一对象的相似度不小于第一阈值。The feature diffusion part 53 is configured to use the second network of the first model to update the first sub-feature and the second sub-feature respectively based on the second feature of at least one second object, and obtain the first sub-feature corresponding to the first sub-feature A target sub-feature and a second target sub-feature corresponding to the second sub-feature, the similarity between each second object and the first object is not less than the first threshold.
更新部分54,被配置为基于第一目标子特征和第二目标子特征,确定目标损失值;基于目标损失值,对第一模型的模型参数进行至少一次更新,得到训练后的第一模型。The updating part 54 is configured to determine a target loss value based on the first target sub-feature and the second target sub-feature; based on the target loss value, update the model parameters of the first model at least once to obtain the trained first model.
特征记忆库部分55,被配置为存储至少一个对象的至少一个特征。The feature memory part 55 is configured to store at least one feature of at least one object.
在一些实施方式中,特征记忆库部分55包括第一特征记忆库和第二特征记忆库,第一特征记忆库用于存储至少一个对象的第一子特征,第二特征记忆库用于存储至少一个对象的第一目标子特征。In some embodiments, the feature memory part 55 includes a first feature memory and a second feature memory, the first feature memory is used to store the first sub-feature of at least one object, and the second feature memory is used to store at least The first target subfeature of an object.
FIG. 5B is a schematic diagram of a model training system 500 provided by an embodiment of the present disclosure. As shown in FIG. 5B, the model training system 500 performs augmentation processing on an input first image 501 to obtain a second image 502. After the first image 501 and the second image 502 are input to the occlusion erasing part 52, the first sub-feature f1′ and the second sub-feature f2′ are obtained respectively, and the second feature memory library 552 is updated based on the first sub-feature f1′. After the first sub-feature f1′, the second sub-feature f2′ and at least one feature of at least one other object selected from the second feature memory library 552 are input to the feature diffusion part 53, the first target sub-feature fd1′ and the second target sub-feature fd2′ are obtained respectively. Based on the first target sub-feature fd1′, the first feature memory library 551 and the network parameters in the occlusion erasing part 52 and the feature diffusion part 53 are updated.
在一些实施方式中,增广部分51,还被配置为:基于第一子图像和第二子图像,确定遮挡掩码。In some implementations, the augmentation part 51 is further configured to: determine an occlusion mask based on the first sub-image and the second sub-image.
图5C为本公开实施例提供的一种确定遮挡掩码的示意图,如图5C所示,将第一子图像501和第二子图像502之间进行像素比较操作503,经过像素比较操作503后,对比较结果进行二值化操作504,经过二值化操作504后,得到对应的遮挡掩码505。Fig. 5C is a schematic diagram of determining an occlusion mask provided by an embodiment of the present disclosure. As shown in Fig. 5C, a pixel comparison operation 503 is performed between the first sub-image 501 and the second sub-image 502, and after the pixel comparison operation 503 , perform a binarization operation 504 on the comparison result, and obtain a corresponding occlusion mask 505 after the binarization operation 504 .
在一些实施方式中,第一网络包括第一子网络和第二子网络,遮挡擦除部分52,还被配置为:利用待训练的第一模型的第一子网络,分别对第一子图像和第二子图像进行特征提取,得到第一子图像对应的第三子特征和第二子图像对应的第四子特征;利用第一模型的第二子网络,基于第三子特征确定第一子特征,并基于第四子特征确定第二子特征。In some implementations, the first network includes a first sub-network and a second sub-network, and the occlusion erasing part 52 is further configured to: use the first sub-network of the first model to be trained to respectively perform the first sub-image Perform feature extraction with the second sub-image to obtain the third sub-feature corresponding to the first sub-image and the fourth sub-feature corresponding to the second sub-image; use the second sub-network of the first model to determine the first sub-feature based on the third sub-feature sub-features, and determine the second sub-features based on the fourth sub-features.
图5D为本公开实施例提供的一种第一网络510的示意图,如图5D所示,第一网络510包括第一子网络511和第二子网络512,将第一子图像501和第二子图像502输入至第一子网络511中,得到第一子图像501对应的第三子特征f1,第二子图像502对应的第四子特征f2,将第三子特征f1和第四子特征f2输入至第二子网络512中,得到第一子特征f1′和第二子特征f2′。FIG. 5D is a schematic diagram of a first network 510 provided by an embodiment of the present disclosure. As shown in FIG. 5D , the first network 510 includes a first sub-network 511 and a second sub-network 512. The sub-image 502 is input into the first sub-network 511 to obtain the third sub-feature f1 corresponding to the first sub-image 501, the fourth sub-feature f2 corresponding to the second sub-image 502, and the third sub-feature f1 and the fourth sub-feature f2 is input into the second sub-network 512 to obtain the first sub-feature f1' and the second sub-feature f2'.
在一些实施方式中,第二子网络包括第三子网络和第四子网络,遮挡擦除部分52,还被配置为:利用第一模型的第三子网络,基于第三子特征确定第一遮挡分数,并基于第四子特征确定第二遮挡分数;利用第四子网络,基于第三子特征和第一遮挡分数,确定第一子特征,并基于第四子特征和第二遮挡分数,确定第二子特征。In some embodiments, the second subnetwork includes a third subnetwork and a fourth subnetwork, and the occlusion erasing part 52 is further configured to: use the third subnetwork of the first model to determine the first occlusion score, and determine the second occlusion score based on the fourth sub-feature; utilize the fourth sub-network, based on the third sub-feature and the first occlusion score, determine the first sub-feature, and based on the fourth sub-feature and the second occlusion score, Determine the second sub-feature.
图5E为本公开实施例提供的一种第二子网络512的示意图,如图5E所示,第二子网络512包括第三子网络521和第四子网络522,将第三子特征f1和第四子特征f2输入至第三子网络521中,分别得到第三子特征f1对应的第一遮挡分数s1,和第四特征f2对应的第二遮挡分数s2,将第一遮挡分数s1和第三子特征f1输入至第四子网络522中,得到第一子特征f1′,将第二遮挡分数s2和第四子特征f2输入至第四子网络522中,得到第二子特征f2′。FIG. 5E is a schematic diagram of a second subnetwork 512 provided by an embodiment of the present disclosure. As shown in FIG. 5E, the second subnetwork 512 includes a third subnetwork 521 and a fourth subnetwork 522, and the third subnetwork f1 and The fourth sub-feature f2 is input into the third sub-network 521, and the first occlusion score s1 corresponding to the third sub-feature f1 and the second occlusion score s2 corresponding to the fourth feature f2 are respectively obtained, and the first occlusion score s1 and the second occlusion score The three sub-features f1 are input to the fourth sub-network 522 to obtain the first sub-feature f1', and the second occlusion score s2 and the fourth sub-feature f2 are input to the fourth sub-network 522 to obtain the second sub-feature f2'.
在一些实施方式中,第二网络包括第五子网络和第六子网络,特征扩散部分53,还被配置为:利用第五子网络,将第一子特征和第二子特征分别与至少一个第二对象的第二特征进行聚合,得到第一子特征对应的第一聚合子特征和第二子特征对应的第二聚合子特征;利用第六子网络,基于第一聚合子特征确定第一目标子特征,并基于第二聚合子特征确定第二目标子特征。In some embodiments, the second network includes a fifth sub-network and a sixth sub-network, and the feature diffusion part 53 is further configured to: use the fifth sub-network to combine the first sub-feature and the second sub-feature with at least one The second feature of the second object is aggregated to obtain the first aggregated sub-feature corresponding to the first sub-feature and the second aggregated sub-feature corresponding to the second sub-feature; the sixth sub-network is used to determine the first aggregated sub-feature based on the first aggregated sub-feature target sub-features, and determine second target sub-features based on the second aggregated sub-features.
FIG. 5F is a schematic diagram of a second network 520 provided by an embodiment of the present disclosure. As shown in FIG. 5F, the second network 520 includes a fifth sub-network 521 and a sixth sub-network 522. When the first sub-feature f1′ is input into the fifth sub-network 521, the fifth sub-network 521 searches the second feature memory bank 552, based on the first sub-feature f1′, for the K nearest first centers belonging to second objects. A first prediction feature fq is determined based on the first sub-feature f1′ and the first prediction matrix W1; a second prediction feature fc is determined based on the first centers and the second prediction matrix W2; and a third prediction feature fv is determined based on the first centers and the third prediction matrix W3. A first attention matrix mi is determined based on the first prediction feature fq and the second prediction feature fc, and a first aggregated sub-feature fd is determined based on the first attention matrix mi and the third prediction feature fv. The first aggregated sub-feature fd is input into FFN_1(·) to obtain a fifth feature f″, and the weighted combination of the first sub-feature f1′ and the fifth feature f″ is input into the sixth sub-network 522 to obtain the first target sub-feature fd′.
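The diffusion step in FIG. 5F is a query-key-value attention over the K nearest centers retrieved from the memory bank. The sketch below is a hypothetical reading of that flow, not the exact implementation; the projection sizes, the softmax scaling and the residual weight alpha are assumptions, while W1, W2 and W3 stand for the first, second and third prediction matrices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDiffusion(nn.Module):
    """Illustrative sketch of the fifth sub-network in FIG. 5F (assumed form)."""

    def __init__(self, dim: int):
        super().__init__()
        self.W1 = nn.Linear(dim, dim, bias=False)  # first prediction matrix  -> fq
        self.W2 = nn.Linear(dim, dim, bias=False)  # second prediction matrix -> fc
        self.W3 = nn.Linear(dim, dim, bias=False)  # third prediction matrix  -> fv
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))  # FFN_1(.)

    def forward(self, f1_prime: torch.Tensor, centers: torch.Tensor, alpha: float = 0.5):
        # f1_prime: (batch, dim) erased sub-feature; centers: (batch, K, dim) K nearest first centers.
        fq = self.W1(f1_prime).unsqueeze(1)                 # (batch, 1, dim) first prediction feature
        fc = self.W2(centers)                               # (batch, K, dim) second prediction feature
        fv = self.W3(centers)                               # (batch, K, dim) third prediction feature
        m = F.softmax(fq @ fc.transpose(1, 2) / fq.size(-1) ** 0.5, dim=-1)  # first attention matrix
        fd = (m @ fv).squeeze(1)                            # first aggregated sub-feature
        f5 = self.ffn(fd)                                   # fifth feature f''
        # Weighted combination of f1' and f'' fed to the sixth sub-network; alpha is an assumed weight.
        return alpha * f1_prime + (1.0 - alpha) * f5
```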
In some implementations, the feature diffusion part 53 is further configured to: determine a first attention matrix based on the first sub-feature and each second feature, where the first attention matrix characterizes the degree of association between the first sub-feature and each second feature; determine the first aggregated sub-feature based on each second feature and each first attention matrix; determine a second attention matrix based on the second sub-feature and each second feature, where the second attention matrix characterizes the degree of association between the second sub-feature and each second feature; and determine the second aggregated sub-feature based on each second feature and each second attention matrix.
In some implementations, the network parameters of the fifth sub-network include a first prediction matrix and a second prediction matrix, and the feature diffusion part 53 is further configured to: determine a first prediction feature based on the first sub-feature and the first prediction matrix; determine a second prediction feature based on each second feature and the second prediction matrix; and determine the first attention matrix based on the first prediction feature and each second prediction feature.
In some implementations, the network parameters of the fifth sub-network include a third prediction matrix, and the feature diffusion part 53 is further configured to: determine a third prediction feature based on each second feature and the third prediction matrix; and determine the first aggregated sub-feature based on each third prediction feature and each first attention matrix.
In some implementations, the sixth sub-network includes a seventh sub-network and an eighth sub-network, and the feature diffusion part 53 is further configured to: use the seventh sub-network to determine a fifth sub-feature based on the first aggregated sub-feature and the occlusion mask, and determine a sixth sub-feature based on the second aggregated sub-feature and the occlusion mask; and use the eighth sub-network to determine the first target sub-feature based on the first sub-feature and the fifth sub-feature, and determine the second target sub-feature based on the second sub-feature and the sixth sub-feature.
In some implementations, the updating part 54 is further configured to: determine a first target loss value based on the first target sub-feature and the second target sub-feature; determine a second target loss value based on the first sub-feature and the second sub-feature; determine the target loss value based on the first target loss value and the second target loss value; and update the model parameters of the first model at least once based on the target loss value to obtain the trained first model.
In some implementations, the updating part 54 is further configured to: when the target loss value does not satisfy a preset condition, update the model parameters of the first model to obtain an updated first model, and determine the trained first model based on the updated first model; and when the target loss value satisfies the preset condition, determine the updated first model as the trained first model.
In some implementations, the updating part 54 is further configured to: determine a first target sub-loss value based on the first sub-feature and the second sub-feature; determine a second target sub-loss value based on the third sub-feature and the fourth sub-feature; and determine the second target loss value based on the first target sub-loss value and the second target sub-loss value.
In some implementations, the first sub-image includes label information, the first model includes a second feature memory bank, and the second feature memory bank includes at least one feature belonging to at least one object. The updating part 54 is further configured to: determine a third loss value based on the first occlusion score, the second occlusion score and the occlusion mask; determine a fourth loss value based on the first sub-feature, the second sub-feature and the label information; determine a fifth loss value based on the first sub-feature, the second sub-feature and the at least one feature of the at least one object in the second feature memory bank; and determine the first target sub-loss value based on the third loss value, the fourth loss value and the fifth loss value.
In some implementations, the updating part 54 is further configured to: determine a first sub-loss value based on the first occlusion score and the occlusion mask; determine a second sub-loss value based on the second occlusion score and the occlusion mask; and determine the third loss value based on the first sub-loss value and the second sub-loss value.
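One natural reading of this third loss value is a regression of the predicted occlusion scores onto the occlusion mask. The mean-squared-error form and the simple sum below are assumptions (the later experiments do refer to an MSE loss tied to the occlusion mask, but the exact formula is not reproduced in this part of the disclosure).

```python
import torch.nn.functional as F

def third_loss(s1, s2, occlusion_mask):
    # s1, s2: predicted occlusion scores of the two branches, shape (batch, parts).
    # occlusion_mask: per-part visibility derived from the first and second sub-images, same shape.
    loss31 = F.mse_loss(s1, occlusion_mask)   # first sub-loss value
    loss32 = F.mse_loss(s2, occlusion_mask)   # second sub-loss value
    return loss31 + loss32                    # third loss value (the sum is an assumed combination)
```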
In some implementations, the updating part 54 is further configured to: determine a third sub-loss value based on the first sub-feature and the label information; determine a fourth sub-loss value based on the second sub-feature and the label information; and determine the fourth loss value based on the third sub-loss value and the fourth sub-loss value.
In some implementations, the updating part 54 is further configured to: determine, from the at least one feature of the at least one object in the second feature memory bank, a third feature center of the first object and a fourth feature center of each of at least one second object; determine a fifth sub-loss value based on the first sub-feature, the third feature center and each fourth feature center; determine a sixth sub-loss value based on the second sub-feature, the third feature center and each fourth feature center; and determine the fifth loss value based on the fifth sub-loss value and the sixth sub-loss value.
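The fifth loss value contrasts an erased sub-feature against identity centers taken from the second feature memory bank. A minimal sketch, assuming an InfoNCE-style form with the temperature τ mentioned in the experimental setup (the disclosure does not spell out the formula here):

```python
import torch
import torch.nn.functional as F

def center_contrastive_loss(f, positive_center, negative_centers, tau: float = 0.05):
    # f: (dim,) erased sub-feature, e.g. f1'; positive_center: (dim,) third feature center of the first object;
    # negative_centers: (num_neg, dim) fourth feature centers of the second objects.
    f = F.normalize(f, dim=0)
    pos = torch.dot(f, F.normalize(positive_center, dim=0)) / tau
    neg = F.normalize(negative_centers, dim=1) @ f / tau            # (num_neg,)
    logits = torch.cat([pos.unsqueeze(0), neg])
    return -F.log_softmax(logits, dim=0)[0]                         # e.g. the fifth sub-loss value
```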
In some implementations, the updating part 54 is further configured to: determine a seventh sub-loss value based on the third sub-feature and the label information; determine an eighth sub-loss value based on the fourth sub-feature and the label information; and determine the second target sub-loss value based on the seventh sub-loss value and the eighth sub-loss value.
FIG. 5G is a schematic diagram of obtaining a target loss value 540 provided by an embodiment of the present disclosure. As shown in FIG. 5G, the target loss value 540 mainly includes loss values from three parts: feature extraction, the occlusion erasing part 52 and the feature diffusion part 53, where:
the loss values of the feature extraction part include:
a seventh loss value Loss7 determined based on the third sub-feature f1 and the label information of the first sub-image 501, and an eighth loss value Loss8 determined based on the fourth sub-feature f2 and the label information of the first sub-image 501;
the loss values of the occlusion erasing part 52 include:
a first sub-loss value Loss31 determined based on the occlusion mask 541 and the first occlusion score s1, and a second sub-loss value Loss32 determined based on the occlusion mask 541 and the second occlusion score s2;
a third sub-loss value Loss41 determined based on the first sub-feature f1′ and the label information of the first sub-image 501, and a fourth sub-loss value Loss42 determined based on the second sub-feature f2′ and the label information of the first sub-image 501;
a fifth sub-loss value Loss51 determined based on the first sub-feature f1′ and the second feature memory bank 552, and a sixth sub-loss value Loss52 determined based on the second sub-feature f2′ and the second feature memory bank 552;
the loss values of the feature diffusion part 53 include:
a ninth sub-loss value Loss11 (corresponding to the first loss value described above) determined based on the first target sub-feature fd1′ and the label information of the first sub-image 501, and a tenth sub-loss value Loss12 (corresponding to the first loss value described above) determined based on the second target sub-feature fd2′ and the label information of the first sub-image 501;
an eleventh sub-loss value Loss21 (corresponding to the second loss value described above) determined based on the first target sub-feature fd1′ and the first feature memory bank 551, and a twelfth sub-loss value Loss22 (corresponding to the second loss value described above) determined based on the second target sub-feature fd2′ and the first feature memory bank 551.
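FIG. 5G enumerates twelve sub-losses across the three stages. A minimal sketch of how they could be combined into the target loss value 540 is given below; the equal weighting is an assumption, since the disclosure only states that the target loss value is determined from the first and second target loss values.

```python
def target_loss(losses: dict):
    # losses holds the sub-losses named in FIG. 5G, e.g. {"Loss7": ..., "Loss31": ..., "Loss11": ...}.
    feature_extraction = losses["Loss7"] + losses["Loss8"]
    occlusion_erasing = (losses["Loss31"] + losses["Loss32"] +
                         losses["Loss41"] + losses["Loss42"] +
                         losses["Loss51"] + losses["Loss52"])
    feature_diffusion = (losses["Loss11"] + losses["Loss12"] +
                         losses["Loss21"] + losses["Loss22"])
    # Equal weighting of the three parts is an assumption; the disclosure does not fix the weights here.
    return feature_extraction + occlusion_erasing + feature_diffusion
```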
In some implementations, the model training system further includes a second determination part and a third determination part. The second determination part is configured to determine an initial second model based on the trained first model; the third determination part is configured to update the model parameters of the second model based on at least one second image sample to obtain a trained second model.
Compared with methods in the related art, the method provided by the embodiments of the present disclosure offers at least the following improvements:
1) In the related art, modeling for pedestrian re-identification (ReID) mainly relies on pose estimation algorithms or human parsing algorithms for auxiliary training. In the embodiments of the present disclosure, occluded pedestrian re-identification is instead modeled with deep learning.
2) In the related art, the modeling of pedestrian re-identification mainly enhances the model's robustness to occlusion through random erasing, focusing on robustness to Non-Pedestrian Occlusions (NPO) while ignoring feature interference from Non-Target Pedestrians (NTP). In the embodiments of the present disclosure, a Feature Erasing and Diffusion Network (FED) is proposed to handle NPO and NTP simultaneously. Specifically, an Occlusion Erasing Module (OEM) eliminates NPO features, aided by an NPO augmentation strategy that simulates NPO on holistic pedestrian images and generates precise occlusion masks. Subsequently, a Feature Diffusion Module (FDM) diffuses pedestrian features with other memorized features to synthesize NTP features in the feature space. By simulating NPO interference at the image level and NTP interference at the feature level, the method greatly improves the model's perception of Target Pedestrians (TP) and mitigates the influence of NPO and NTP.
The method provided by the embodiments of the present disclosure has at least the following beneficial effects: 1) the occlusion information of the image and the features of other pedestrians are fully exploited to simulate non-pedestrian occlusion and non-target pedestrian interference, so that the various influencing factors can be better analyzed jointly and the model's perception of the TP is improved; 2) deep learning makes the pedestrian re-identification results more accurate and improves the accuracy of pedestrian re-identification in real, complex scenes.
To better illustrate the beneficial effects of the embodiments of the present disclosure, experimental data of the method provided by the embodiments of the present disclosure are compared below with those of methods in the related art.
(1) Datasets: Occluded-DukeMTMC (O-Duke), Occluded-REID (O-REID) and Partial-REID (P-REID) are ReID datasets with occlusion; Market-1501 and DukeMTMC-reID are ReID datasets with few occlusions.
(2) Evaluation metrics: to ensure a fair comparison with existing pedestrian ReID methods, all methods are evaluated under the Cumulative Matching Characteristic (CMC) and mean Average Precision (mAP). The CMC curve evaluates the accuracy of person retrieval; mAP is the mean of the average precision over all queries. All experiments are performed in the single-query setting.
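For reference, the two reported metrics can be computed from a ranked gallery as in the sketch below; these are the standard definitions of rank-1 CMC and average precision, not code from the disclosure.

```python
import numpy as np

def rank1_and_ap(ranked_gallery_labels: np.ndarray, query_label: int):
    """ranked_gallery_labels: gallery identity labels sorted by descending similarity to the query."""
    matches = (ranked_gallery_labels == query_label).astype(np.float32)
    rank1 = float(matches[0])                       # CMC at rank 1
    if matches.sum() == 0:
        return rank1, 0.0
    cum_hits = np.cumsum(matches)
    ranks_of_hits = np.flatnonzero(matches) + 1     # 1-based ranks of the correct gallery images
    ap = float((cum_hits[matches == 1] / ranks_of_hits).mean())
    return rank1, ap                                # mAP is the mean of ap over all queries
```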
(3) Initialization of some model parameters: input images are resized to 256×128. The first model is trained end-to-end with a stochastic gradient descent (SGD) optimizer with momentum 0.9 and weight decay 1e-4. The learning rate is initialized to 0.008 with cosine learning-rate decay. For each input branch, the batch size is 64, containing 16 identities with 4 samples per identity. All experiments are run on two RTX 1080Ti GPUs. The temperature τ in the contrastive loss is set to 0.05, and the number of heads in the FDM is set to 0.8.
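The stated optimization setup maps onto a configuration such as the following; PyTorch is assumed (the disclosure does not name a framework), and the placeholder model and the schedule length num_epochs are illustrative only.

```python
import torch

num_epochs = 120                      # assumed schedule length; not specified in this part of the disclosure
model = torch.nn.Linear(768, 702)     # placeholder standing in for the first model (FED)
optimizer = torch.optim.SGD(model.parameters(), lr=0.008,
                            momentum=0.9, weight_decay=1e-4)
# Cosine learning-rate decay starting from the initial learning rate of 0.008.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
# Each input branch uses a batch of 64 images: 16 identities with 4 samples per identity.
```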
For the NPO-augmentation occlusion set, occluders are cropped only from the training data of O-Duke and used to augment all other datasets. This is because Market-1501 contains very few occluded images, while DukeMTMC-reID already contains much occluded data in its training set.
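The NPO augmentation described here amounts to pasting an occluder crop onto a holistic pedestrian image and recording which region was covered. The sketch below is one possible realization; the band-shaped placement, the patch sizing and the row-level mask are assumptions rather than the disclosed strategy.

```python
import random
from PIL import Image

def npo_augment(person_img: Image.Image, occluder_img: Image.Image):
    """Paste an occluder crop (e.g. cut from O-Duke training images) onto a holistic image."""
    w, h = person_img.size
    ow, oh = w, h // 3                       # assumed: the occluder covers roughly one third of the height
    occluder = occluder_img.resize((ow, oh))
    top = random.choice([0, h - oh])         # assumed: occlude either the top or the bottom band
    augmented = person_img.copy()
    augmented.paste(occluder, (0, top))
    mask = [0 if top <= y < top + oh else 1 for y in range(h)]   # 1 = visible row, 0 = occluded row
    return augmented, mask
```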
(4) Experimental results
1) Comparison between the method provided by the embodiments of the present disclosure and existing methods on occluded ReID datasets
Table 1 compares the performance of pedestrian ReID methods on the O-Duke, O-REID and P-REID datasets. Since O-REID and P-REID have no corresponding training sets, the model trained on Market-1501 is used for testing. The pedestrian ReID methods include: the Part-based Convolutional Baseline (PCB), Deep Spatial feature Reconstruction (DSR), High-Order Re-IDentification (HOReID), the Part-Aware Transformer (PAT), Transformer-based Object Re-Identification (TransReID) using a Vision Transformer backbone without the sliding-window setting, and a transformer-based ViT Baseline. The ViT Baseline outperforms TransReID on the O-REID and P-REID datasets because TransReID uses many dataset-specific tokens.
Table 1: Performance comparison of the methods on the O-Duke, O-REID and P-REID datasets
Comparing FED with existing methods, FED achieves the highest Rank-1 and mAP on both the O-Duke and O-REID datasets. In particular, on O-REID it reaches 86.3%/79.3% Rank-1/mAP, surpassing the other methods by at least 4.7%/2.6%. On O-Duke it reaches 68.1%/56.4% Rank-1/mAP, surpassing the other methods by at least 3.6%/0.7%. On P-REID it achieves the highest mAP, 80.5%, exceeding the other methods by 3.9%. FED therefore performs well on occluded ReID datasets.
2) Comparison between the method provided by the embodiments of the present disclosure and existing methods on holistic person ReID datasets
Experiments are also conducted on holistic person ReID datasets, namely Market-1501 and DukeMTMC-reID. When training on DukeMTMC-reID, the MSE loss is not computed, because the training set contains a large number of NPO and accurate occlusion masks cannot be obtained. The results are shown in Table 2. TransReID is used without the sliding-window setting, with an image size of 256×128. TransReID achieves better performance than FED on the holistic person datasets, because TransReID is designed specifically for holistic person ReID and encodes camera information during training. Nevertheless, FED still reaches 84.9% Rank-1 accuracy on DukeMTMC-reID, surpassing the other CNN-based methods and approaching TransReID.
Table 2: Performance comparison of the methods on the Market-1501 and DukeMTMC-reID datasets
3) Effectiveness of FED
Table 3 presents ablation studies of the NPO augmentation strategy (NPO Aug), the OEM and the FDM. Rows 1 to 5 correspond to the baseline, baseline + NPO Aug, baseline + NPO Aug + OEM, baseline + NPO Aug + FDM, and FED, respectively. Model 1 uses ViT as the feature extractor and is optimized with a cross-entropy loss (ID loss) and a triplet loss. Comparing model 1 (baseline) with model 2 (baseline + NPO Aug) shows a large improvement of 4.9% in Rank-1, indicating that the augmented images are realistic and valuable. Comparing model 2 (baseline + NPO Aug) with model 3 (baseline + NPO Aug + OEM) shows that the OEM can further improve the representation by removing latent NPO information. Comparing model 2 (baseline + NPO Aug) with model 4 (baseline + NPO Aug + FDM), the FDM improves Rank-1 and mAP by 1.7% and 2.4%, respectively, which means that optimizing the network with diffused features can greatly improve the model's perception of the TP. Finally, FED achieves the highest accuracy, indicating that each component works both individually and jointly.
Table 3: Effectiveness of FED
4) K-nearest-neighbour analysis of the feature memory bank
Here, the number of retrieved neighbours K in the feature memory bank search operation is analysed. In Table 4, K is set to 2, 4, 6 and 8, and experiments are performed on DukeMTMC-reID, Market-1501 and Occluded-DukeMTMC. The performance on the two holistic person ReID datasets, DukeMTMC-reID and Market-1501, is stable across the different values of K, fluctuating within 0.5%. For Market-1501, NPO and NTP are rare, so the effectiveness of the FDM is not prominent. For DukeMTMC-reID, a large amount of the training data contains NPO and NTP, and the loss constraints allow the network to reach high accuracy. For Occluded-DukeMTMC, since all the training data are holistic pedestrians, introducing the FDM can closely simulate the multi-pedestrian situations in the test set. As K increases, the FDM better preserves the characteristics of the TP while introducing realistic noise.
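The search operation analysed here retrieves, for each erased feature, the K closest identity centers of other objects from the feature memory bank. A minimal sketch, assuming cosine similarity and one stored center per identity:

```python
import torch
import torch.nn.functional as F

def knn_centers(feature: torch.Tensor, memory_centers: torch.Tensor, own_id: int, k: int = 8):
    # feature: (dim,) erased sub-feature; memory_centers: (num_ids, dim) one center per identity.
    sims = F.normalize(memory_centers, dim=1) @ F.normalize(feature, dim=0)   # cosine similarities
    sims[own_id] = float("-inf")              # exclude the target identity itself
    topk = sims.topk(k).indices
    return memory_centers[topk]               # the K nearest centers used by the feature diffusion module
```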
Table 4: K-nearest-neighbour analysis
5) Qualitative analysis of FED
FIG. 5H is a schematic diagram of occlusion scores of pedestrian images provided by an embodiment of the present disclosure. FIG. 5H shows the occlusion scores produced by the OEM for several pedestrian images, including images with NPO and with non-target pedestrians (NTP). As can be seen from FIG. 5H, for images 551 and 552 with vertical object occlusion, the occlusion scores are hardly affected, because a symmetric pedestrian occluded by less than half is not a critical problem for pedestrian ReID. For images 553 and 554 with horizontal occlusion, the OEM can accurately identify the NPO and mark it with a small occlusion score. For the multi-pedestrian images 555 and 556, the OEM marks every stripe as valuable; the subsequent FDM is therefore crucial for improving model performance.
6) Examples of retrieval results using feature and distribution representations
FIG. 5I is a schematic diagram of image retrieval results provided by an embodiment of the present disclosure. As shown in FIG. 5I, retrieval results of TransReID and FED are presented. Images 561 and 562 are object-occluded images; it is clear that FED recognizes NPO better and can accurately retrieve the target pedestrian. Images 563 and 564 are multi-pedestrian images, where FED has a stronger perception of the TP and achieves higher retrieval accuracy.
Based on the above embodiments, an embodiment of the present disclosure provides a model training apparatus. FIG. 6 is a schematic diagram of the composition and structure of a model training apparatus provided by an embodiment of the present disclosure. As shown in FIG. 6, the model training apparatus 60 includes a first acquisition part 61, a feature extraction part 62, a first updating part 63, a first determination part 64 and a second updating part 65.
The first acquisition part 61 is configured to acquire a first image sample containing a first object;
the feature extraction part 62 is configured to perform feature extraction on the first image sample by using a first network of a first model to be trained, to obtain a first feature of the first object;
the first updating part 63 is configured to update the first feature by using a second network of the first model, based on second features of at least one second object, to obtain a first target feature corresponding to the first feature, where the similarity between each second object and the first object is not less than a first threshold;
the first determination part 64 is configured to determine a target loss value based on the first target feature;
the second updating part 65 is configured to update the model parameters of the first model at least once based on the target loss value, to obtain the trained first model.
In some implementations, the first image sample includes label information, the first model includes a first feature memory bank, and the first feature memory bank includes at least one feature belonging to at least one object. The first determination part 64 is further configured to: determine a first loss value based on the first target feature and the label information; determine a second loss value based on the first target feature and the at least one feature of the at least one object in the first feature memory bank; and determine the target loss value based on the first loss value and the second loss value.
In some implementations, the first determination part 64 is further configured to: determine, from the at least one feature of the at least one object in the first feature memory bank, a first feature center of the first object and a second feature center of each of at least one second object; and determine the second loss value based on the first target feature, the first feature center and each second feature center.
In some implementations, the first feature memory bank includes feature sets belonging to at least one object, each feature set including at least one feature of the object to which it belongs. The apparatus further includes a third updating part configured to update, based on the first target feature, the feature set belonging to the first object in the first feature memory bank.
In some implementations, the first acquisition part 61 is further configured to acquire a first sub-image and a second sub-image containing the first object, the second sub-image being an image obtained by performing at least occlusion processing on the first sub-image; the feature extraction part 62 is further configured to perform, by using the first network of the first model to be trained, feature extraction on the first sub-image to obtain a first sub-feature of the first object and feature extraction on the second sub-image to obtain a second sub-feature of the first object; the first updating part 63 is further configured to update, by using the second network of the first model and based on the second features of the at least one second object, the first sub-feature and the second sub-feature respectively, to obtain a first target sub-feature corresponding to the first sub-feature and a second target sub-feature corresponding to the second sub-feature; and the first determination part 64 is further configured to determine the target loss value based on the first target sub-feature and the second target sub-feature.
In some implementations, the first determination part 64 is further configured to: determine a first target loss value based on the first target sub-feature and the second target sub-feature; determine a second target loss value based on the first sub-feature and the second sub-feature; and determine the target loss value based on the first target loss value and the second target loss value.
In some implementations, the first acquisition part 61 is further configured to: acquire a first sub-image containing the first object; and perform at least occlusion processing on the first sub-image based on a preset occlusion set to obtain the second sub-image, the occlusion set including at least one occlusion image.
In some implementations, the first network includes a first sub-network and a second sub-network, and the feature extraction part 62 is further configured to: use the first sub-network of the first model to be trained to perform feature extraction on the first sub-image and the second sub-image respectively, to obtain a third sub-feature corresponding to the first sub-image and a fourth sub-feature corresponding to the second sub-image; and use the second sub-network of the first model to determine the first sub-feature based on the third sub-feature, and determine the second sub-feature based on the fourth sub-feature.
In some implementations, the first determination part 64 is further configured to: determine a first target sub-loss value based on the first sub-feature and the second sub-feature; determine a second target sub-loss value based on the third sub-feature and the fourth sub-feature; and determine the second target loss value based on the first target sub-loss value and the second target sub-loss value.
In some implementations, the first sub-image includes label information, and the first determination part 64 is further configured to: determine a seventh sub-loss value based on the third sub-feature and the label information; determine an eighth sub-loss value based on the fourth sub-feature and the label information; and determine the second target sub-loss value based on the seventh sub-loss value and the eighth sub-loss value.
In some implementations, the second sub-network includes a third sub-network and a fourth sub-network, and the feature extraction part 62 is further configured to: use the third sub-network of the first model to determine a first occlusion score based on the third sub-feature, and determine a second occlusion score based on the fourth sub-feature; and use the fourth sub-network to determine the first sub-feature based on the third sub-feature and the first occlusion score, and determine the second sub-feature based on the fourth sub-feature and the second occlusion score.
In some implementations, the third sub-network includes a pooling sub-network and at least one occlusion erasing sub-network, the first occlusion score includes at least one first occlusion sub-score, and the second occlusion score includes at least one second occlusion sub-score. The feature extraction part 62 is further configured to: use the pooling sub-network to divide the third sub-feature into at least one third sub-part feature and divide the fourth sub-feature into at least one fourth sub-part feature; and use each occlusion erasing sub-network to determine each first occlusion sub-score based on each third sub-part feature, and determine each second occlusion sub-score based on each fourth sub-part feature.
In some implementations, the feature extraction part 62 is further configured to: use the fourth sub-network to determine a first sub-part feature based on each third sub-part feature of the third sub-feature and each first occlusion sub-score, and determine a second sub-part feature based on each fourth sub-part feature of the fourth sub-feature and each second occlusion sub-score; and determine the first sub-feature based on each first sub-part feature, and determine the second sub-feature based on each second sub-part feature.
In some implementations, the first sub-image includes label information, the first model includes a second feature memory bank, and the second feature memory bank includes at least one feature belonging to at least one object. The first determination part 64 is further configured to: determine an occlusion mask based on the first sub-image and the second sub-image; determine a third loss value based on the first occlusion score, the second occlusion score and the occlusion mask; determine a fourth loss value based on the first sub-feature, the second sub-feature and the label information; determine a fifth loss value based on the first sub-feature, the second sub-feature and the at least one feature of the at least one object in the second feature memory bank; and determine the first target sub-loss value based on the third loss value, the fourth loss value and the fifth loss value.
In some implementations, the first determination part 64 is further configured to: divide the first sub-image and the second sub-image into at least one first sub-part image and at least one second sub-part image respectively; determine an occlusion sub-mask based on each first sub-part image and each second sub-part image; and determine the occlusion mask based on each occlusion sub-mask.
In some implementations, the first determination part 64 is further configured to: determine a first sub-loss value based on the first occlusion score and the occlusion mask; determine a second sub-loss value based on the second occlusion score and the occlusion mask; and determine the third loss value based on the first sub-loss value and the second sub-loss value.
In some implementations, the first determination part 64 is further configured to: determine a third sub-loss value based on the first sub-feature and the label information; determine a fourth sub-loss value based on the second sub-feature and the label information; and determine the fourth loss value based on the third sub-loss value and the fourth sub-loss value.
In some implementations, the first determination part 64 is further configured to: determine, from the at least one feature of the at least one object in the second feature memory bank, a third feature center of the first object and a fourth feature center of each of at least one second object; determine a fifth sub-loss value based on the first sub-feature, the third feature center and each fourth feature center; determine a sixth sub-loss value based on the second sub-feature, the third feature center and each fourth feature center; and determine the fifth loss value based on the fifth sub-loss value and the sixth sub-loss value.
In some implementations, the second network includes a fifth sub-network and a sixth sub-network, and the first updating part 63 is further configured to: use the fifth sub-network to aggregate the first sub-feature and the second sub-feature respectively with the second features of the at least one second object, to obtain a first aggregated sub-feature corresponding to the first sub-feature and a second aggregated sub-feature corresponding to the second sub-feature; and use the sixth sub-network to determine the first target sub-feature based on the first aggregated sub-feature, and determine the second target sub-feature based on the second aggregated sub-feature.
In some implementations, the first updating part 63 is further configured to: determine a first attention matrix based on the first sub-feature and each second feature, where the first attention matrix characterizes the degree of association between the first sub-feature and each second feature; determine the first aggregated sub-feature based on each second feature and each first attention matrix; determine a second attention matrix based on the second sub-feature and each second feature, where the second attention matrix characterizes the degree of association between the second sub-feature and each second feature; and determine the second aggregated sub-feature based on each second feature and each second attention matrix.
In some implementations, the sixth sub-network includes a seventh sub-network and an eighth sub-network, and the first updating part 63 is further configured to: use the seventh sub-network to determine a fifth sub-feature based on the first aggregated sub-feature and the occlusion mask, and determine a sixth sub-feature based on the second aggregated sub-feature and the occlusion mask; and use the eighth sub-network to determine the first target sub-feature based on the first sub-feature and the fifth sub-feature, and determine the second target sub-feature based on the second sub-feature and the sixth sub-feature.
Based on the above embodiments, an embodiment of the present disclosure provides an image recognition apparatus. FIG. 7 is a schematic diagram of the composition and structure of an image recognition apparatus provided by an embodiment of the present disclosure. As shown in FIG. 7, the image recognition apparatus 70 includes a second acquisition part 71 and a recognition part 72.
The second acquisition part 71 is configured to acquire a first image and a second image;
the recognition part 72 is configured to recognize an object in the first image and an object in the second image by using a trained target model to obtain a recognition result, where the trained target model includes the first model obtained by the above model training method, and the recognition result indicates whether the object in the first image and the object in the second image are the same object or different objects.
The description of the above apparatus embodiments is similar to the description of the above method embodiments, and has beneficial effects similar to those of the method embodiments. For technical details not disclosed in the apparatus embodiments of the present disclosure, reference may be made to the description of the method embodiments of the present disclosure.
In the embodiments of the present disclosure and other embodiments, a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and may be modular or non-modular.
It should be noted that, in the embodiments of the present disclosure, if the above method is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present disclosure, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk or an optical disk. In this way, the embodiments of the present disclosure are not limited to any specific combination of hardware and software.
An embodiment of the present disclosure provides an electronic device, including a memory and a processor. The memory stores a computer program executable on the processor, and the processor implements the above method when executing the computer program.
An embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above method is implemented. The computer-readable storage medium may be transitory or non-transitory.
An embodiment of the present disclosure provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, part or all of the steps of the above method are implemented. The computer program product may be implemented by hardware, software or a combination thereof. In one embodiment, the computer program product is embodied as a computer storage medium; in another embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
It should be noted that FIG. 8 is a schematic diagram of a hardware entity of an electronic device in an embodiment of the present disclosure. As shown in FIG. 8, the hardware entity of the electronic device 800 includes a processor 801, a communication interface 802 and a memory 803, where:
the processor 801 generally controls the overall operation of the electronic device 800; the communication interface 802 enables the electronic device to communicate with other terminals or servers through a network; and the memory 803 is configured to store instructions and applications executable by the processor 801, and may also cache data to be processed or already processed by the processor 801 and the modules of the electronic device 800 (for example, image data, audio data, voice communication data and video communication data), and may be implemented by a flash memory (FLASH) or a random access memory (RAM). Data may be transferred among the processor 801, the communication interface 802 and the memory 803 through a bus 804.
It should be pointed out here that the description of the above storage medium and device embodiments is similar to the description of the above method embodiments, and has beneficial effects similar to those of the method embodiments. For technical details not disclosed in the storage medium and device embodiments of the present disclosure, reference may be made to the description of the method embodiments of the present disclosure.
It should be understood that reference throughout the specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic related to the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of "in one embodiment" or "in an embodiment" in various places throughout the specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present disclosure, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present disclosure. The serial numbers of the above embodiments of the present disclosure are for description only and do not represent the superiority or inferiority of the embodiments. It should be noted that, herein, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus including that element.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are illustrative. For example, the division of the units is a division by logical function, and there may be other division methods in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms. The units described above as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments. In addition, all functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may serve as a single unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of hardware plus a software functional unit.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium, and when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk or an optical disk. Alternatively, if the above integrated unit of the present disclosure is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present disclosure, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk or an optical disk.
The above are embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in the present disclosure, and such changes or substitutions shall fall within the protection scope of the present disclosure.
Industrial Applicability
Embodiments of the present disclosure provide a model training and image recognition method and apparatus, a device, a storage medium and a computer program product. The model training method includes: acquiring a first image sample containing a first object; performing feature extraction on the first image sample by using a first network of a first model to be trained, to obtain a first feature of the first object; updating the first feature by using a second network of the first model, based on second features of at least one second object, to obtain a first target feature corresponding to the first feature, where the similarity between each second object and the first object is not less than a first threshold; determining a target loss value based on the first target feature; and updating the model parameters of the first model at least once based on the target loss value, to obtain the trained first model. With the above solution, on the one hand, the robustness and performance of the first model can be enhanced; on the other hand, the consistency of the trained first model's predictions for different image samples of the same object can be improved, so that the trained first model can more accurately re-identify objects in images containing multiple objects.

Claims (27)

  1. A model training method, the method comprising:
    acquiring a first image sample containing a first object;
    performing feature extraction on the first image sample by using a first network of a first model to be trained, to obtain a first feature of the first object;
    updating the first feature by using a second network of the first model based on a second feature of at least one second object, to obtain a first target feature corresponding to the first feature, wherein a similarity between each second object and the first object is not less than a first threshold;
    determining a target loss value based on the first target feature; and
    updating model parameters of the first model at least once based on the target loss value, to obtain a trained first model.
  2. The method according to claim 1, wherein the first image sample comprises label information, the first model comprises a first feature memory bank, and the first feature memory bank comprises at least one feature belonging to at least one object; and determining the target loss value based on the first target feature comprises:
    determining a first loss value based on the first target feature and the label information;
    determining a second loss value based on the first target feature and the at least one feature of the at least one object in the first feature memory bank; and
    determining the target loss value based on the first loss value and the second loss value.
  3. The method according to claim 2, wherein determining the second loss value based on the first target feature and the at least one feature of the at least one object in the first feature memory bank comprises:
    determining, from the at least one feature of the at least one object in the first feature memory bank, a first feature center of the first object and a second feature center of the at least one second object; and
    determining the second loss value based on the first target feature, the first feature center and each second feature center.
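As an illustration of the feature-center loss in claim 3, the sketch below assumes each center is the mean of that object's features stored in the memory bank and that the second loss is a softmax cross-entropy over cosine similarities to the centers; the temperature value and the cosine metric are assumptions, not details from the claims.

```python
import torch
import torch.nn.functional as F

def center_loss(target_feature, own_feats, similar_feats_per_object, temperature=0.05):
    """target_feature: (D,) first target feature.
    own_feats: (N, D) features of the first object stored in the memory bank.
    similar_feats_per_object: list of (M_i, D) tensors, one per similar second object."""
    first_center = own_feats.mean(dim=0)                       # first feature center
    second_centers = [f.mean(dim=0) for f in similar_feats_per_object]
    centers = torch.stack([first_center] + second_centers)     # (1 + K, D)
    sims = F.cosine_similarity(target_feature.unsqueeze(0), centers, dim=1) / temperature
    # Index 0 is the center of the object the target feature belongs to.
    return F.cross_entropy(sims.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```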
  4. The method according to claim 2 or 3, wherein the first feature memory bank comprises feature sets belonging to at least one object, each feature set comprising at least one feature of the object to which it belongs; and the method further comprises:
    updating, based on the first target feature, the feature set belonging to the first object in the first feature memory bank.
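Claim 4 leaves the update rule open; a common choice for such feature memory banks is an exponential-moving-average (momentum) update, sketched below with an assumed momentum coefficient and dictionary layout.

```python
import torch
import torch.nn.functional as F

def update_memory_bank(memory, object_id, target_feature, momentum=0.9):
    """memory: dict mapping object id -> (N, D) tensor of stored features."""
    stored = memory[object_id]
    # Blend every stored feature of this object toward the new target feature.
    updated = momentum * stored + (1.0 - momentum) * target_feature.unsqueeze(0)
    memory[object_id] = F.normalize(updated, dim=1)
    return memory
```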
  5. The method according to any one of claims 1 to 4, wherein
    acquiring the first image sample containing the first object comprises: acquiring a first sub-image and a second sub-image containing the first object, the second sub-image being an image obtained by performing at least occlusion processing on the first sub-image;
    performing feature extraction on the first image sample by using the first network of the first model to be trained to obtain the first feature of the first object comprises: performing feature extraction on the first sub-image by using the first network of the first model to be trained to obtain a first sub-feature of the first object, and performing feature extraction on the second sub-image to obtain a second sub-feature of the first object;
    updating the first feature by using the second network of the first model based on the second feature of the at least one second object to obtain the first target feature corresponding to the first feature comprises: updating the first sub-feature and the second sub-feature respectively by using the second network of the first model based on the second feature of the at least one second object, to obtain a first target sub-feature corresponding to the first sub-feature and a second target sub-feature corresponding to the second sub-feature; and
    determining the target loss value based on the first target feature comprises: determining the target loss value based on the first target sub-feature and the second target sub-feature.
  6. The method according to claim 5, wherein determining the target loss value based on the first target sub-feature and the second target sub-feature comprises:
    determining a first target loss value based on the first target sub-feature and the second target sub-feature;
    determining a second target loss value based on the first sub-feature and the second sub-feature; and
    determining the target loss value based on the first target loss value and the second target loss value.
  7. The method according to claim 6, wherein acquiring the first sub-image and the second sub-image containing the first object comprises:
    acquiring the first sub-image containing the first object; and
    performing at least occlusion processing on the first sub-image based on a preset occlusion set, to obtain the second sub-image, the occlusion set comprising at least one occlusion image.
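A minimal sketch of the occlusion processing in claim 7: sample an occlusion image from the preset occlusion set, resize it, and paste it over a random region of the first sub-image. The patch size and placement policy are illustrative assumptions.

```python
import random
import torch
import torch.nn.functional as F

def occlude(first_sub_image, occlusion_set, patch_frac=0.4):
    """first_sub_image: (3, H, W) tensor; occlusion_set: list of (3, h, w) occluder tensors."""
    _, H, W = first_sub_image.shape
    ph, pw = int(H * patch_frac), int(W * patch_frac)
    occluder = random.choice(occlusion_set)
    patch = F.interpolate(occluder.unsqueeze(0), size=(ph, pw),
                          mode="bilinear", align_corners=False).squeeze(0)
    top, left = random.randint(0, H - ph), random.randint(0, W - pw)
    second_sub_image = first_sub_image.clone()
    second_sub_image[:, top:top + ph, left:left + pw] = patch   # paste the occluder
    return second_sub_image
```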
  8. The method according to claim 6 or 7, wherein the first network comprises a first sub-network and a second sub-network; and performing feature extraction on the first sub-image by using the first network of the first model to be trained to obtain the first sub-feature of the first object, and performing feature extraction on the second sub-image to obtain the second sub-feature of the first object, comprises:
    performing feature extraction on the first sub-image and the second sub-image respectively by using the first sub-network of the first model to be trained, to obtain a third sub-feature corresponding to the first sub-image and a fourth sub-feature corresponding to the second sub-image; and
    determining, by using the second sub-network of the first model, the first sub-feature based on the third sub-feature, and the second sub-feature based on the fourth sub-feature.
  9. The method according to claim 8, wherein determining the second target loss value based on the first sub-feature and the second sub-feature comprises:
    determining a first target sub-loss value based on the first sub-feature and the second sub-feature;
    determining a second target sub-loss value based on the third sub-feature and the fourth sub-feature; and
    determining the second target loss value based on the first target sub-loss value and the second target sub-loss value.
  10. The method according to claim 9, wherein the first sub-image comprises label information; and determining the second target sub-loss value based on the third sub-feature and the fourth sub-feature comprises:
    determining a seventh sub-loss value based on the third sub-feature and the label information;
    determining an eighth sub-loss value based on the fourth sub-feature and the label information; and
    determining the second target sub-loss value based on the seventh sub-loss value and the eighth sub-loss value.
  11. The method according to any one of claims 8 to 10, wherein the second sub-network comprises a third sub-network and a fourth sub-network; and determining, by using the second sub-network of the first model, the first sub-feature based on the third sub-feature and the second sub-feature based on the fourth sub-feature comprises:
    determining, by using the third sub-network of the first model, a first occlusion score based on the third sub-feature, and a second occlusion score based on the fourth sub-feature; and
    determining, by using the fourth sub-network, the first sub-feature based on the third sub-feature and the first occlusion score, and the second sub-feature based on the fourth sub-feature and the second occlusion score.
  12. The method according to claim 11, wherein the third sub-network comprises a pooling sub-network and at least one occlusion erasure sub-network, the first occlusion score comprises at least one first occlusion sub-score, and the second occlusion score comprises at least one second occlusion sub-score; and determining, by using the third sub-network of the first model, the first occlusion score based on the third sub-feature and the second occlusion score based on the fourth sub-feature comprises:
    dividing, by using the pooling sub-network, the third sub-feature into at least one third sub-part feature, and the fourth sub-feature into at least one fourth sub-part feature; and
    determining, by using each occlusion erasure sub-network, each first occlusion sub-score based on each third sub-part feature, and each second occlusion sub-score based on each fourth sub-part feature.
  13. The method according to claim 12, wherein determining, by using the fourth sub-network, the first sub-feature based on the third sub-feature and the first occlusion score, and the second sub-feature based on the fourth sub-feature and the second occlusion score, comprises:
    determining, by using the fourth sub-network, a first sub-part feature based on each third sub-part feature of the third sub-feature and each first occlusion sub-score, and a second sub-part feature based on each fourth sub-part feature of the fourth sub-feature and each second occlusion sub-score; and
    determining the first sub-feature based on each first sub-part feature, and the second sub-feature based on each second sub-part feature.
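Claims 12 and 13 can be read as part-level pooling followed by occlusion-aware weighting. The sketch below splits a feature map into horizontal part features, scores each part with a small occlusion-erasure head, and combines the parts weighted by their scores; the horizontal split, the sigmoid head and the weighted sum are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class PartOcclusionHead(nn.Module):
    def __init__(self, channels=256, num_parts=4):
        super().__init__()
        self.num_parts = num_parts
        # Pooling sub-network: one pooled feature per horizontal part.
        self.part_pool = nn.AdaptiveAvgPool2d((num_parts, 1))
        # One occlusion-erasure sub-network per part, each predicting a visibility score.
        self.erase_heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(channels, 1), nn.Sigmoid()) for _ in range(num_parts)])

    def forward(self, feat_map):
        """feat_map: (B, C, H, W) sub-feature; returns (B, C) feature and (B, P) occlusion scores."""
        parts = self.part_pool(feat_map).squeeze(-1).transpose(1, 2)            # (B, P, C)
        scores = torch.cat(
            [head(parts[:, i]) for i, head in enumerate(self.erase_heads)], dim=1)  # (B, P)
        weighted = parts * scores.unsqueeze(-1)   # occlusion-weighted part features
        return weighted.sum(dim=1), scores
```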
  14. The method according to any one of claims 11 to 13, wherein the first sub-image comprises label information, the first model comprises a second feature memory bank, and the second feature memory bank comprises at least one feature belonging to at least one object; and determining the first target sub-loss value based on the first sub-feature and the second sub-feature comprises:
    determining an occlusion mask based on the first sub-image and the second sub-image;
    determining a third loss value based on the first occlusion score, the second occlusion score and the occlusion mask;
    determining a fourth loss value based on the first sub-feature, the second sub-feature and the label information;
    determining a fifth loss value based on the first sub-feature, the second sub-feature and the at least one feature of the at least one object in the second feature memory bank; and
    determining the first target sub-loss value based on the third loss value, the fourth loss value and the fifth loss value.
  15. The method according to claim 14, wherein determining the occlusion mask based on the first sub-image and the second sub-image comprises:
    dividing the first sub-image and the second sub-image into at least one first sub-part image and at least one second sub-part image, respectively;
    determining an occlusion sub-mask based on each first sub-part image and each second sub-part image; and
    determining the occlusion mask based on each occlusion sub-mask.
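A minimal sketch of the part-wise occlusion mask in claim 15, assuming the sub-images are split into horizontal strips and a strip is marked occluded when it differs from the corresponding strip of the unoccluded first sub-image; the strip count and the difference threshold are assumptions.

```python
import torch

def occlusion_mask(first_sub_image, second_sub_image, num_parts=4, threshold=1e-3):
    """Both images: (3, H, W); returns a (num_parts,) mask, 1 = visible, 0 = occluded."""
    first_parts = torch.chunk(first_sub_image, num_parts, dim=1)    # split along height
    second_parts = torch.chunk(second_sub_image, num_parts, dim=1)
    sub_masks = [float((a - b).abs().mean().item() < threshold)     # occlusion sub-masks
                 for a, b in zip(first_parts, second_parts)]
    return torch.tensor(sub_masks)
```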
  16. The method according to claim 14 or 15, wherein determining the third loss value based on the first occlusion score, the second occlusion score and the occlusion mask comprises:
    determining a first sub-loss value based on the first occlusion score and the occlusion mask;
    determining a second sub-loss value based on the second occlusion score and the occlusion mask; and
    determining the third loss value based on the first sub-loss value and the second sub-loss value.
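One plausible instantiation of claim 16 treats each occlusion score as a per-part visibility probability and supervises it against the occlusion mask with binary cross-entropy, summing the two sub-loss values; both the BCE form and the plain sum are assumptions, not details from the claims.

```python
import torch.nn.functional as F

def third_loss(first_scores, second_scores, mask):
    """first_scores, second_scores: (P,) per-part visibility predictions in [0, 1];
    mask: (P,) 0/1 occlusion mask used as the supervision target."""
    first_sub_loss = F.binary_cross_entropy(first_scores, mask)
    second_sub_loss = F.binary_cross_entropy(second_scores, mask)
    return first_sub_loss + second_sub_loss   # third loss value
```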
  17. The method according to any one of claims 14 to 16, wherein determining the fourth loss value based on the first sub-feature, the second sub-feature and the label information comprises:
    determining a third sub-loss value based on the first sub-feature and the label information;
    determining a fourth sub-loss value based on the second sub-feature and the label information; and
    determining the fourth loss value based on the third sub-loss value and the fourth sub-loss value.
  18. The method according to any one of claims 14 to 17, wherein determining the fifth loss value based on the first sub-feature, the second sub-feature and the at least one feature of the at least one object in the second feature memory bank comprises:
    determining, from the at least one feature of the at least one object in the second feature memory bank, a third feature center of the first object and a fourth feature center of the at least one second object;
    determining a fifth sub-loss value based on the first sub-feature, the third feature center and each fourth feature center;
    determining a sixth sub-loss value based on the second sub-feature, the third feature center and each fourth feature center; and
    determining the fifth loss value based on the fifth sub-loss value and the sixth sub-loss value.
  19. The method according to any one of claims 14 to 18, wherein the second network comprises a fifth sub-network and a sixth sub-network; and updating the first sub-feature and the second sub-feature respectively by using the second network of the first model based on the second feature of the at least one second object, to obtain the first target sub-feature corresponding to the first sub-feature and the second target sub-feature corresponding to the second sub-feature, comprises:
    aggregating, by using the fifth sub-network, the first sub-feature and the second sub-feature respectively with the second feature of the at least one second object, to obtain a first aggregated sub-feature corresponding to the first sub-feature and a second aggregated sub-feature corresponding to the second sub-feature; and
    determining, by using the sixth sub-network, the first target sub-feature based on the first aggregated sub-feature, and the second target sub-feature based on the second aggregated sub-feature.
  20. The method according to claim 19, wherein aggregating, by using the fifth sub-network, the first sub-feature and the second sub-feature respectively with the second feature of the at least one second object, to obtain the first aggregated sub-feature corresponding to the first sub-feature and the second aggregated sub-feature corresponding to the second sub-feature, comprises:
    determining a first attention matrix based on the first sub-feature and each second feature, the first attention matrix being used to characterize the degree of association between the first sub-feature and each second feature;
    determining the first aggregated sub-feature based on each second feature and each first attention matrix;
    determining a second attention matrix based on the second sub-feature and each second feature, the second attention matrix being used to characterize the degree of association between the second sub-feature and each second feature; and
    determining the second aggregated sub-feature based on each second feature and each second attention matrix.
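The aggregation in claim 20 can be illustrated with scaled dot-product attention: the attention matrix comes from the similarity between the query sub-feature and each second feature, and the aggregated sub-feature is the attention-weighted sum of the second features. The scaled dot-product form is an assumption made for illustration.

```python
import math
import torch

def aggregate(sub_feature, second_features):
    """sub_feature: (D,) query; second_features: (K, D) features of similar second objects."""
    d = sub_feature.shape[-1]
    # Attention matrix: association between the sub-feature and each second feature.
    attn = torch.softmax(second_features @ sub_feature / math.sqrt(d), dim=0)   # (K,)
    # Aggregated sub-feature: attention-weighted sum of the second features.
    return attn @ second_features                                               # (D,)

# first_aggregated = aggregate(first_sub_feature, second_features)
# second_aggregated = aggregate(second_sub_feature, second_features)
```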
  21. The method according to claim 19 or 20, wherein the sixth sub-network comprises a seventh sub-network and an eighth sub-network; and determining, by using the sixth sub-network, the first target sub-feature based on the first aggregated sub-feature and the second target sub-feature based on the second aggregated sub-feature comprises:
    determining, by using the seventh sub-network, a fifth sub-feature based on the first aggregated sub-feature and the occlusion mask, and a sixth sub-feature based on the second aggregated sub-feature and the occlusion mask; and
    determining, by using the eighth sub-network, the first target sub-feature based on the first sub-feature and the fifth sub-feature, and the second target sub-feature based on the second sub-feature and the sixth sub-feature.
  22. An image recognition method, the method comprising:
    acquiring a first image and a second image; and
    recognizing an object in the first image and an object in the second image by using a trained target model to obtain a recognition result, wherein the trained target model comprises a first model obtained by the model training method according to any one of claims 1 to 21, and the recognition result indicates whether the object in the first image and the object in the second image are the same object or different objects.
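A minimal sketch of the recognition step in claim 22, assuming the target model maps an image to a feature vector and that the recognition result is obtained by thresholding the cosine similarity between the two features; the threshold value and the cosine metric are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recognize(target_model, first_image, second_image, threshold=0.5):
    """Images: (3, H, W) tensors; returns True if both images show the same object."""
    feats = target_model(torch.stack([first_image, second_image]))   # (2, D)
    similarity = F.cosine_similarity(feats[0:1], feats[1:2]).item()
    return similarity >= threshold   # recognition result: same object or not
```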
  23. A model training apparatus, the apparatus comprising:
    a first acquisition part configured to acquire a first image sample containing a first object;
    a feature extraction part configured to perform feature extraction on the first image sample by using a first network of a first model to be trained, to obtain a first feature of the first object;
    a first updating part configured to update the first feature by using a second network of the first model based on a second feature of at least one second object, to obtain a first target feature corresponding to the first feature, wherein a similarity between each second object and the first object is not less than a first threshold;
    a first determining part configured to determine a target loss value based on the first target feature; and
    a second updating part configured to update model parameters of the first model at least once based on the target loss value, to obtain a trained first model.
  24. An image recognition apparatus, the apparatus comprising:
    a second acquisition part configured to acquire a first image and a second image; and
    a recognition part configured to recognize an object in the first image and an object in the second image by using a trained target model to obtain a recognition result, wherein the trained target model comprises a first model obtained by the model training method according to any one of claims 1 to 21, and the recognition result indicates whether the object in the first image and the object in the second image are the same object or different objects.
  25. An electronic device, comprising a processor and a memory, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 22.
  26. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 22.
  27. A computer program product comprising a computer program or instructions, wherein the computer program or instructions, when run on an electronic device, cause the electronic device to execute the method according to any one of claims 1 to 22.
PCT/CN2022/127109 2022-01-28 2022-10-24 Model training and image recognition methods and apparatuses, device, storage medium and computer program product WO2023142551A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210107742.9A CN114445681A (en) 2022-01-28 2022-01-28 Model training and image recognition method and device, equipment and storage medium
CN202210107742.9 2022-01-28

Publications (1)

Publication Number Publication Date
WO2023142551A1 true WO2023142551A1 (en) 2023-08-03

Family

ID=81371764

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/127109 WO2023142551A1 (en) 2022-01-28 2022-10-24 Model training and image recognition methods and apparatuses, device, storage medium and computer program product

Country Status (2)

Country Link
CN (1) CN114445681A (en)
WO (1) WO2023142551A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445681A (en) * 2022-01-28 2022-05-06 上海商汤智能科技有限公司 Model training and image recognition method and device, equipment and storage medium
CN115022282B (en) * 2022-06-06 2023-07-21 天津大学 Novel domain name generation model establishment and application
CN115393953B (en) * 2022-07-28 2023-08-08 深圳职业技术学院 Pedestrian re-recognition method, device and equipment based on heterogeneous network feature interaction

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329785A (en) * 2020-11-25 2021-02-05 Oppo广东移动通信有限公司 Image management method, device, terminal and storage medium
CN113421192A (en) * 2021-08-24 2021-09-21 北京金山云网络技术有限公司 Training method of object statistical model, and statistical method and device of target object
CN113780243A (en) * 2021-09-29 2021-12-10 平安科技(深圳)有限公司 Training method, device and equipment of pedestrian image recognition model and storage medium
CN114445681A (en) * 2022-01-28 2022-05-06 上海商汤智能科技有限公司 Model training and image recognition method and device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372818A (en) * 2023-12-06 2024-01-09 深圳须弥云图空间科技有限公司 Target re-identification method and device
CN117372818B (en) * 2023-12-06 2024-04-12 深圳须弥云图空间科技有限公司 Target re-identification method and device

Also Published As

Publication number Publication date
CN114445681A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
WO2023142551A1 (en) Model training and image recognition methods and apparatuses, device, storage medium and computer program product
Cheng et al. Low-resolution face recognition
WO2021077984A1 (en) Object recognition method and apparatus, electronic device, and readable storage medium
CN110163115B (en) Video processing method, device and computer readable storage medium
Su et al. Multi-type attributes driven multi-camera person re-identification
CN103069415B (en) Computer-implemented method, computer program and computer system for image procossing
CN102549603B (en) Relevance-based image selection
Bianco et al. Predicting image aesthetics with deep learning
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
US20230087863A1 (en) De-centralised learning for re-indentification
CN107003977A (en) System, method and apparatus for organizing the photo of storage on a mobile computing device
Dai et al. Cross-view semantic projection learning for person re-identification
Douze et al. The 2021 image similarity dataset and challenge
CN112508094A (en) Junk picture identification method, device and equipment
WO2020224221A1 (en) Tracking method and apparatus, electronic device, and storage medium
CN110516707B (en) Image labeling method and device and storage medium thereof
US20130343618A1 (en) Searching for Events by Attendants
Wieschollek et al. Transfer learning for material classification using convolutional networks
Ma et al. Low illumination person re-identification
Zhang et al. Multi-level and multi-scale horizontal pooling network for person re-identification
CN111666976A (en) Feature fusion method and device based on attribute information and storage medium
Guehairia et al. Deep random forest for facial age estimation based on face images
Deng et al. A deep multi-feature distance metric learning method for pedestrian re-identification
Zhang et al. Complementary networks for person re-identification
Islam et al. Large-scale geo-facial image analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22923349

Country of ref document: EP

Kind code of ref document: A1