WO2023040195A1 - Object recognition method and apparatus, network training method and apparatus, device, medium, and product


Info

Publication number
WO2023040195A1
Authority
WO
WIPO (PCT)
Prior art keywords
network, domain, loss, image, sample image
Application number
PCT/CN2022/077443
Other languages
English (en)
Chinese (zh)
Inventor
余世杰
朱烽
赵瑞
乔宇
Original Assignee
上海商汤智能科技有限公司
Application filed by 上海商汤智能科技有限公司
Publication of WO2023040195A1

Classifications

    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/24 Classification techniques
    • G06F 18/25 Fusion techniques
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y02T 10/40 Engine management systems

Definitions

  • Embodiments of the present disclosure relate to the technical field of object recognition, and relate to but are not limited to an object recognition method, a network training method and device, equipment, media and products.
  • An embodiment of the present disclosure provides a technical solution for training an object recognition network.
  • An embodiment of the present disclosure provides a training method for an object recognition network.
  • the object recognition network includes a comprehensive network, a first domain network, and a second domain network.
  • the method includes:
  • based on the first source domain sample image and the second source domain sample image, respectively determining the first alignment loss between the comprehensive network and the first domain network and the second alignment loss between the comprehensive network and the second domain network; wherein there is no overlap between the first source domain sample image and the second source domain sample image, the first source domain sample image is used to train the first domain network, and the second source domain sample image is used to train the second domain network;
  • based on the first source domain sample image and the first alignment loss, and on the second source domain sample image and the second alignment loss, respectively determining the domain loss of the first domain network and the domain loss of the second domain network;
  • determining a synergy loss between the first domain network and the second domain network based on the second source domain sample image, the second alignment loss, and the domain loss of the first domain network; determining the network parameters of the comprehensive network based on the network parameters of the first domain network and the second domain network; and adjusting the network parameters of the object recognition network so that the adjusted loss output by the comprehensive network satisfies the convergence condition.
  • In some embodiments, determining the first alignment loss between the comprehensive network and the first domain network based on the first source domain sample image includes: performing feature extraction on the first source domain sample image based on the comprehensive network to obtain first image features; performing feature extraction and projection transformation on the first source domain sample image based on the first domain network to obtain second image features; and determining the first alignment loss between the comprehensive network and the first domain network based on the second image features and the first image features.
  • In this way, the first alignment loss between the comprehensive network and the first domain network can be determined, so that the comprehensive network can guide the domain network during training; this realizes the synergy between the comprehensive network and the domain networks and improves the generalization performance of the network.
  • In some embodiments, determining the first alignment loss between the comprehensive network and the first domain network based on the second image features and the first image features includes: determining, in the feature space of the first source domain sample image, the feature distance between the second image features and the first image features; and determining the first alignment loss based on that feature distance and the ground-truth feature distance between features in the first source domain sample image.
  • In this way, the comprehensive network can use the degree of correlation between features belonging to the same image to give feedback on the training of the first domain network, so that the first domain network takes into account, during training, the correlation between features within the same sample image.
  • In some embodiments, determining the domain loss of the first domain network based on the first source domain sample image and the first alignment loss includes: training the first domain network using the first source domain sample image as meta-training data to obtain the internal loss of the first domain network; determining, based on the internal loss and the first alignment loss, the meta-training loss used to adjust the network parameters of the first domain network; and determining the meta-training loss of the first domain network as the domain loss.
  • In this way, the meta-training loss of the first domain network in the meta-training stage is determined, and the meta-training loss is used as the domain loss for optimizing the network parameters of the first domain network, thereby improving the recognition flexibility of the first domain network.
  • In some embodiments, training the first domain network using the first source domain sample image as meta-training data to obtain the internal loss of the first domain network includes: determining the image distance loss between the first source domain sample image and the positive sample image based on the image features extracted by the first domain network and the image features of the positive sample image; classifying, in the first domain network, the image features extracted by the first domain network to obtain a first classification result; determining the classification loss of the image features extracted by the first domain network based on the first classification result and the ground-truth classification label of the first source domain sample image; and determining the classification loss and the image distance loss as the internal loss of the first domain network. Determining the meta-training loss of the first domain network based on the internal loss and the first alignment loss includes: fusing the first alignment loss, the image distance loss, and the classification loss to obtain the meta-training loss of the first domain network.
  • the first domain network includes: a feature extractor, a projection network, and a classifier
  • performing feature extraction and projection transformation on the first source domain sample image to obtain second image features includes: performing feature extraction on the first source domain sample image based on the feature extractor; and performing projection transformation on the image features extracted by the feature extractor based on the projection network to obtain the second image features;
  • classifying the image features extracted by the first domain network to obtain a first classification result includes: classifying, based on the classifier, the image features extracted by the feature extractor to obtain the first classification result. In this way, by determining the spatial distance between the second image features and the first image features, the comprehensive network can guide the training process of the first domain network based on the alignment loss.
  • In some embodiments, determining the synergy loss between the first domain network and the second domain network based on the second source domain sample image, the second alignment loss, and the domain loss of the first domain network includes: determining adaptation parameters based on the network parameters of the feature extractor of the first domain network and the domain loss of the first domain network; and determining the synergy loss between the first domain network and the second domain network based on the adaptation parameters, the second alignment loss, and the second source domain sample image. In this way, based on the collaborative loss, it is convenient to realize cooperative learning between the first domain network and the second domain network across non-overlapping source domain sample images, thereby improving the generalization ability of the entire network architecture.
  • In some embodiments, determining the synergy loss between the first domain network and the second domain network based on the adaptation parameters, the second alignment loss, and the second source domain sample image includes: using the adaptation parameters as the network parameters of the feature extractor of the second domain network to obtain the updated second domain network; using the second source domain sample image as meta-test data and inputting it to the feature extractor of the second domain network for feature extraction to obtain third image features; performing, based on the second domain network, projection transformation and feature classification respectively on the third image features to obtain fourth image features and a second classification result; determining a meta-test loss of the first domain network based on the fourth image features, the second classification result, and the second alignment loss; and determining the meta-test loss to be the synergy loss.
  • In this way, sample images of another source domain are used in the meta-test phase to test the second domain network; the meta-test loss of the second domain network in the meta-test phase is determined and fed back to the first domain network, so as to optimize the network parameters based on the domain loss of the first domain network.
  • In some embodiments, determining the network parameters of the comprehensive network based on the network parameters of the first domain network and the second domain network includes: in the process of iteratively training the object recognition network to be trained, determining the historical network parameters of the comprehensive network from the previous iteration of training; determining the set of predicted network parameters of the feature extractors in the first domain network and the second domain network for the next iteration of training; and updating the network parameters of the comprehensive network for the next iteration of training based on the historical network parameters and the set of predicted network parameters.
  • In this way, the comprehensive network can collect the knowledge learned by multiple domain networks, so that it can better feed supervision information back to each domain network, realizing collaborative learning between the comprehensive network and the domain networks.
  • In some embodiments, adjusting the network parameters of the object recognition network based on the domain losses and collaborative loss of the first domain network and the second domain network, so that the adjusted loss output by the comprehensive network satisfies the convergence condition, includes: determining the data volume of the first source domain sample image; determining, based on the data volume, the image features of the second source domain sample image extracted by the comprehensive network, and the second image features extracted by the feature extractor of the first domain network, the uniformity loss used to adjust the distribution of image features; adjusting the uniformity loss with a preset balance value to obtain the adjusted uniformity loss; fusing the adjusted uniformity loss, the domain loss of the first domain network, the domain loss of the second domain network, and the collaborative loss to obtain a total loss; and adjusting the network parameters of the object recognition network based on the total loss so that the adjusted loss output by the comprehensive network satisfies the convergence condition. In this way, in one iteration, by determining the total loss of the object recognition network, the network parameters in the network are optimized as a whole, so as to obtain a comprehensive network with stronger generalization performance.
  • An embodiment of the present disclosure provides an object recognition method, the method comprising: acquiring a first image including an object to be recognized; performing feature extraction on the first image and on a second image in a preset image library based on a comprehensive network to obtain the image features of the first image and the image features of the second image, wherein the comprehensive network is obtained by training with the above-mentioned object recognition network training method; and re-recognizing the object to be recognized in the preset image library based on the image features of the first image and the image features of the second image to obtain a recognition result. A sketch of this inference step follows.
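  • For illustration only (the cosine similarity measure and all names below are our assumptions; the disclosure only requires comparing the extracted features), re-identification against the preset image library might look like:

        import torch
        import torch.nn.functional as F

        @torch.no_grad()
        def recognize(comprehensive_net, query_image, gallery_images):
            # Extract features of the first (query) image and the second
            # (gallery) images with the trained comprehensive network.
            q = F.normalize(comprehensive_net(query_image.unsqueeze(0)), dim=1)
            g = F.normalize(comprehensive_net(gallery_images), dim=1)
            # Cosine similarity between query and gallery features; the best
            # matches give the recognition result.
            scores = (q @ g.t()).squeeze(0)          # shape: (num_gallery,)
            return scores.argsort(descending=True)   # gallery indices, best first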
  • An embodiment of the present disclosure provides a training device for an object recognition network.
  • the object recognition network includes a comprehensive network, a first domain network, and a second domain network.
  • the device includes:
  • the first determination module is configured to determine, based on the first source domain sample image and the second source domain sample image respectively, the first alignment loss between the comprehensive network and the first domain network and the second alignment loss between the comprehensive network and the second domain network; wherein there is no overlap between the first source domain sample image and the second source domain sample image, the first source domain sample image is used to train the first domain network, and the second source domain sample image is used to train the second domain network;
  • the second determination module is configured to determine, based on the first source domain sample image and the first alignment loss, and on the second source domain sample image and the second alignment loss respectively, the domain loss of the first domain network and the domain loss of the second domain network;
  • the third determination module is configured to determine the synergy loss between the first domain network and the second domain network based on the second source domain sample image, the second alignment loss, and the domain loss of the first domain network;
  • a fourth determination module configured to determine network parameters of the comprehensive network based on network parameters of the first domain network and the second domain network;
  • the first adjustment module is configured to adjust the network parameters of the object recognition network based on the domain losses and synergy loss of the first domain network and the second domain network, so that the loss output by the adjusted comprehensive network satisfies the convergence condition.
  • An embodiment of the present disclosure provides an object recognition device, and the device includes:
  • a first acquisition module configured to acquire a first image including an object to be identified
  • the first extraction module is configured to perform feature extraction on the first image and the second image in the preset image library based on a comprehensive network, to obtain image features of the first image and image features of the second image;
  • the comprehensive network is obtained by training based on the above-mentioned object network training method;
  • the first recognition module is configured to re-recognize the object to be recognized in the preset image library based on the image features of the first image and the image features of the second image to obtain a recognition result.
  • An embodiment of the present disclosure provides a computer storage medium, on which computer-executable instructions are stored. After the computer-executable instructions are executed, the steps of the foregoing method can be realized.
  • An embodiment of the present disclosure provides a computer device, the computer device including a memory and a processor; the memory stores computer-executable instructions, and the processor can implement the steps of the above method when running the computer-executable instructions stored in the memory.
  • An embodiment of the present disclosure provides a computer program product, where the computer program product includes computer-executable instructions. After the computer-executable instructions are executed, the object recognition method described in any one of the above items can be implemented.
  • Embodiments of the present disclosure provide an object recognition method, a network training method and device, equipment, media and products.
  • the object recognition network includes a comprehensive network, a first domain network, and a second domain network. Firstly, based on the first source domain sample image and the second source domain sample image, the first alignment loss between the comprehensive network and the first domain network and the second alignment loss between the comprehensive network and the second domain network are respectively determined; wherein there is no overlap between the first source domain sample image and the second source domain sample image, the first source domain sample image is used to train the first domain network, and the second source domain sample image is used to train the second domain network; in this way, the alignment losses can be used so that the comprehensive network guides the training of the domain networks.
  • Secondly, based on the first source domain sample image and the first alignment loss, and on the second source domain sample image and the second alignment loss, the domain loss of the first domain network and the domain loss of the second domain network are determined; and based on the second source domain sample image, the second alignment loss, and the domain loss of the first domain network, the synergy loss between the first domain network and the second domain network is determined; in this way, collaborative learning between different domain networks can be realized by using the synergy loss between them. Again, based on the network parameters of the first domain network and the second domain network, the network parameters of the comprehensive network are determined; in this way, the network parameters of multiple domain networks can be brought together into the comprehensive network to update its network parameters. Finally, based on the domain loss of the first domain network, the domain loss of the second domain network, and the synergy loss between the first domain network and the second domain network, the network parameters of the object recognition network are adjusted so that the adjusted loss output by the comprehensive network satisfies the convergence condition; thus, through the collaborative training of multiple domain networks and the guidance and supervision of the comprehensive network over each domain network, the obtained object recognition network has high recognition accuracy and good generalization performance in any data domain.
  • FIG. 1A is a schematic diagram of the implementation flow of the training method of the object recognition network provided by the embodiment of the present disclosure
  • FIG. 1B is a schematic diagram of a system architecture to which an object recognition method according to an embodiment of the present disclosure can be applied;
  • FIG. 2A is a schematic flow diagram of another implementation of the object recognition network training method provided by the embodiment of the present disclosure.
  • FIG. 2B is a schematic diagram of an implementation flow of an object recognition method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of an application scenario of a training method for an object recognition network provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of an implementation framework of a training method for an object recognition network provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of a collaborative learning framework between networks in different domains provided by an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of an implementation framework of a training method for an object recognition network provided by an embodiment of the present disclosure
  • FIG. 7 is a schematic diagram of the structural composition of a training device for an object recognition network according to an embodiment of the present disclosure
  • FIG. 8 is a schematic diagram of the structural composition of an object recognition device according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of the composition and structure of a computer device according to an embodiment of the disclosure.
  • "First/second/third" is only used to distinguish similar objects and does not denote a specific order of the objects. Understandably, where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein.
  • Each domain network corresponds to a deep neural network model, which is divided into three parts, feature extractor, classifier and projection network. In addition, a domain network is responsible for learning the data of a domain. If there are N domains in the training set, then there are N domain networks.
  • The comprehensive expert is responsible for collecting the knowledge learned by the domain networks. There is only one comprehensive expert, and it consists of a feature extractor. It should be noted that the feature extractor structures of the domain networks and the comprehensive expert are exactly the same. A sketch of this architecture follows.
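  • As one concrete reading of this architecture (a minimal sketch; the backbone choice, feature dimension, class count, and the value of N are illustrative, not fixed by the disclosure):

        import torch.nn as nn
        from torchvision.models import resnet50

        def make_extractor():
            # Feature-extraction backbone; a residual network is one of the
            # options named later in the disclosure.
            backbone = resnet50(weights=None)
            backbone.fc = nn.Identity()          # expose 2048-d features
            return backbone

        class DomainNetwork(nn.Module):
            # One expert per source domain: feature extractor, projection
            # network, and classifier.
            def __init__(self, feat_dim=2048, num_classes=1000):
                super().__init__()
                self.extractor = make_extractor()
                self.projector = nn.Sequential(   # a feedforward projection network
                    nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                    nn.Linear(feat_dim, feat_dim))
                self.classifier = nn.Linear(feat_dim, num_classes)

        # The comprehensive expert consists of a feature extractor only, with
        # exactly the same structure as each domain extractor; N domains in
        # the training set give N domain networks.
        comprehensive_expert = make_extractor()
        domain_networks = [DomainNetwork() for _ in range(3)]   # N = 3 here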
  • Collaborative learning between domain networks is mainly accomplished through the meta-test stage of the meta-learning method. When training a certain domain network, in the meta-test stage, the classifiers and projection networks of the other domain networks are used to help train the feature extractor of that domain network.
  • A. The comprehensive expert guides domain network learning. When training a domain network, the image data of the corresponding domain is input into both the comprehensive expert and the domain network.
  • the features extracted by the domain network and the comprehensive expert are required to be as similar as possible.
  • the exemplary application of the object recognition device provided by the embodiment of the present disclosure is described below.
  • the device provided by the embodiments of the present disclosure can be implemented as various types of user terminals such as a notebook computer, a tablet computer, a desktop computer, a camera, or a mobile device (for example, a personal digital assistant, a dedicated messaging device, or a portable game device), and can also be implemented as a server.
  • an exemplary application when the device is implemented as a terminal or a server will be described.
  • the method can be applied to a computer device, and the functions realized by the method can be implemented by a processor in the computer device calling program code.
  • the program code can be stored in a computer storage medium.
  • the computer device includes at least a processor and a storage medium.
  • An embodiment of the present disclosure provides a training method for an object recognition network.
  • the architecture of the object recognition network includes a comprehensive network and at least two domain networks, for example a first domain network and a second domain network, as shown in FIG. 1A; the method is described with reference to the steps shown in FIG. 1A:
  • Step S101: based on the first source domain sample image and the second source domain sample image, respectively determine the first alignment loss between the comprehensive network and the first domain network and the second alignment loss between the comprehensive network and the second domain network.
  • the first source domain sample image is used to train the first domain network, and the second source domain sample image is used to train the second domain network.
  • For example, the first source domain sample image corresponding to the first domain network may be a face-and-body image of a pedestrian; the second source domain sample image corresponding to the second domain network may be an incomplete picture of a pedestrian (for example, a partially occluded pedestrian); and the third source domain sample image corresponding to a third domain network may be images of multiple pedestrians with similar faces, and so on.
  • the first domain network may be any one of the at least two domain networks.
  • the first source domain sample image corresponding to the first domain network is an image from the same source domain; the second source domain sample image and the first source domain sample image are image sets belonging to different source domains.
  • each frame of the sample images includes one or more annotated objects, and may be an image with a complex appearance or an image with a simple appearance.
  • the first source domain sample image may be an image including an annotated object collected by any collection device in any scene.
  • the marked object may be a pedestrian, animal or object to be recognized.
  • the network architecture of each of the at least two domain networks is the same, for example, each domain network includes three parts: a feature extractor, a classifier and a projection network; network parameters of each part in different domain networks are different.
  • the comprehensive network can be implemented with a feature extractor, which can have the same structure as the feature extractor in the domain network.
  • In some embodiments, the first source domain sample image is input into the first domain network for feature extraction and projection transformation, and the projectively transformed features are obtained; at the same time, the first source domain sample image is input into the comprehensive network. Then, by determining the distance in feature space between the projectively transformed features and the image features extracted by the comprehensive network, the first alignment loss between the comprehensive network and the first domain network is obtained. In this way, through the first alignment loss, the comprehensive network can guide the first domain network during the training process.
  • Similarly, the second source domain sample image is input into the second domain network for feature extraction and projection transformation, and the projectively transformed features are obtained; at the same time, the second source domain sample image is input into the comprehensive network for feature extraction to obtain image features. Then, by determining the distance in feature space between the projectively transformed features and the image features extracted by the comprehensive network, the second alignment loss between the comprehensive network and the second domain network is obtained.
  • Step S102: based on the first source domain sample image and the first alignment loss, and on the second source domain sample image and the second alignment loss, determine the domain loss of the first domain network and the domain loss of the second domain network.
  • In some embodiments, the domain loss of each domain network is determined from the source domain sample image and the alignment loss corresponding to that domain.
  • the process of determining the domain loss of the first domain network is: determining, from the first source domain sample image, internal losses of the first domain network such as the classification loss and the triplet loss; and combining these internal losses with the first alignment loss to obtain the domain loss of the first domain network.
  • the first source domain sample image is first input into the first domain network for feature extraction, and projection transformation and classification are performed on the extracted features; then, based on the prediction results output by the first domain network and the ground-truth labels of the objects in the sample image, the internal loss of the first domain network is determined; finally, the first alignment loss between the comprehensive network and the first domain network is fused with the internal loss to obtain the domain loss of the first domain network during training.
  • the process of determining the domain loss of the second domain network is analogous: from the second source domain sample image, internal losses of the second domain network such as the classification loss and the triplet loss are determined; these internal losses are combined with the second alignment loss to obtain the domain loss of the second domain network.
  • the second source domain sample image is first input into the second domain network for feature extraction, and projection transformation and classification are performed on the extracted features; then, based on the prediction results output by the second domain network and the ground-truth labels of the objects in the sample image, the internal loss of the second domain network is determined; finally, the second alignment loss between the comprehensive network and the second domain network is fused with the internal loss to obtain the domain loss of the second domain network during training.
  • the first alignment loss between the comprehensive network and the first domain network is applied to the training process of the first domain network
  • the second alignment loss between the comprehensive network and the second domain network is applied to the training process of the second domain network.
  • the guidance and supervision of the comprehensive network to the first domain network and the second domain network during the training process can be realized.
  • Step S103: based on the second source domain sample image, the second alignment loss, and the domain loss of the first domain network, determine the synergy loss between the first domain network and the second domain network.
  • Here, the second source domain sample image and the first source domain sample image belong to different source domains; in this way, each of the two domain networks is trained by a specific source domain sample image, and the sample images that different domain networks learn during training belong to different domains; thus, the number of the at least two domain networks is the same as the number of source domains of the sample images.
  • the second domain network is any domain network among the multiple domain networks other than the first domain network. For example, if the at least two domain networks include five domain networks, the first domain network may be any one of the five; taking the first of them as the first domain network, the second domain network is any one of the remaining four.
  • the second alignment loss is the loss through which the comprehensive network guides the training of the second domain network; by fusing the internal loss of the second domain network with the second alignment loss, the synergy loss between the first domain network and the second domain network is obtained. In this way, based on the collaborative loss, the first domain network and the second domain network can be trained across sample images of different source domains, so that they have stronger generalization performance.
  • Step S104: based on the network parameters of the first domain network and the second domain network, determine the network parameters of the comprehensive network.
  • In some embodiments, in each iteration the network parameters of the feature extractors of all the domain networks are obtained and processed using the exponential moving average method, and the network parameters of the comprehensive network are updated to obtain its network parameters for this iteration. In this way, by bringing the network parameters of multiple domain networks together into the comprehensive network, the training of each domain network serves as feedback for the comprehensive network.
  • Step S105: based on the domain loss of the first domain network, the domain loss of the second domain network, and the synergy loss between the first domain network and the second domain network, adjust the network parameters of the object recognition network so that the loss output by the adjusted comprehensive network satisfies the convergence condition.
  • In some embodiments, the domain loss of each domain network is used as the meta-training loss of the meta-training phase, and the collaborative loss between domain networks is used as the meta-test loss of the meta-testing phase. The meta-training loss and the meta-test loss are fused to form a total loss; based on the total loss, the network parameters of the object recognition network to be trained are adjusted until the total loss satisfies the convergence condition, at which point the iterative process stops and the training of the object recognition network is complete.
  • the network parameters of each domain network in the object recognition network and the network parameters of the comprehensive network are adjusted based on the total loss, so that the output domain losses and synergy loss all satisfy the convergence condition.
  • the meta-training loss in the total loss is used to optimize the network parameters in the entire framework of the object recognition network to be trained; the meta-test loss does not directly optimize the network parameters, but is used to optimize the algorithmic behavior of the whole framework.
  • the meta-training loss is learned for sample images in one source domain, and the meta-test loss is learned for sample images in different source domains. In this way, optimizing the entire network through the combination of meta-training loss and meta-testing loss can lead to a comprehensive network with stronger generalization performance.
  • the first alignment loss between the comprehensive network and the first domain network is determined, so that the alignment loss can be used to guide the training of the domain network by the comprehensive network.
  • collaborative learning between different domain networks can be realized; based on the network parameters of the at least two domain networks, the network parameters of the comprehensive network can be determined, and the network parameters of the multiple domain networks can be brought together into the comprehensive network to update its network parameters. In this way, through the collaborative training between multiple domain networks and the guidance and supervision of the comprehensive network over each domain network, the obtained object recognition network has high recognition accuracy and good generalization performance in any data domain.
  • In some embodiments, step S101 can be realized through the steps shown in FIG. 2A, which is another schematic flowchart of the object recognition network training method provided by the embodiments of the present disclosure; with reference to the steps shown in FIGS. 1A and 2A, the description is as follows:
  • Step S201: perform feature extraction on the first source domain sample image based on the comprehensive network to obtain first image features.
  • the first source domain sample image is input into a comprehensive network, and the comprehensive network is used to perform feature extraction on the sample image to obtain the first image features.
  • the comprehensive network can be any type of feature extraction network, such as residual network, convolutional neural network, or dense network.
  • Step S202: based on the first domain network, perform feature extraction and projection transformation on the first source domain sample image to obtain second image features.
  • the sample image is fed into the first domain network at the same time as the first source domain sample image is fed into the comprehensive network.
  • feature extraction is first performed on the sample image, and then projective transformation is performed on the extracted image features to realize coordinate conversion of the image features at different spatial levels, and the converted second image features are obtained; In this way, the second image feature and the first image feature are in the same feature space.
  • In some embodiments, the network architecture of each of the multiple domain networks is the same, including a feature extractor, a projection network, and a classifier; the feature extractor can have the same architecture as the feature extractor in the comprehensive network, but with different network parameters.
  • the projection network can be any network that can realize the projection transformation of features; for example, a feedforward neural network.
  • the classifier can be any neural network capable of feature classification; for example, a convolutional neural network or a residual neural network.
  • Step S241: based on the feature extractor, perform feature extraction on the first source domain sample image.
  • the first source domain sample image is input to the feature extractor of the first domain network for feature extraction to obtain extracted image features.
  • Step S242: based on the projection network, perform projection transformation on the image features extracted by the feature extractor to obtain the second image features.
  • the extracted image features are input into the projection network of the first domain network for projection transformation to obtain the second image features in the same feature space as the first image features.
  • In this way, the second image features can be obtained, so that the spatial distance between the second image features and the first image features can be determined, which in turn enables the comprehensive network to guide the training process of the first domain network based on the alignment loss.
  • Step S203: based on the second image features and the first image features, determine the first alignment loss between the comprehensive network and the first domain network.
  • the first alignment loss is calculated from the Euclidean distance between the second image features and the first image features, together with the ground-truth distance between features in the image; the distance between features of the same sample image should be as small as possible, so, based on this, the alignment loss enables the comprehensive network to guide the domain network during the training process.
  • In this way, the first alignment loss between the comprehensive network and the first domain network can be determined, so that the comprehensive network can guide the domain network during training; this realizes the synergy between the comprehensive network and the domain networks and improves the generalization performance of the network.
  • the above steps S201 to S203 implement the "determine the first alignment loss between the comprehensive network and the first domain network based on the first source domain sample image" in the above step S101.
  • the process of "determining the second alignment loss between the comprehensive network and the second domain network based on the second source domain sample image" in step S101 is similar to the above steps S201 to S203, that is, the comprehensive network is first used to Feature extraction is performed on the sample image of the second source domain to obtain the image feature; then the second domain network is used to extract the feature of the sample image of the second source domain to obtain another image feature; finally, based on these two image features, the comprehensive network and the first image feature are determined Second alignment loss between two-domain networks.
  • In some embodiments, the first alignment loss between the comprehensive network and the first domain network is determined based on the ground-truth distance between features in the same sample image and the predicted feature distance between the first image features and the second image features; that is, the above step S203 can be implemented through the following steps S231 and S232 (not shown in the figure):
  • Step S231: in the feature space of the first source domain sample image, determine the feature distance between the second image features and the first image features.
  • the feature space of the first source domain sample image in step S231 refers to the feature space where the image features obtained after the comprehensive network extracts features from the sample image are located.
  • the Euclidean distance between the second image features obtained through the projection transformation of the first domain network's projection network and the first image features extracted by the comprehensive network from the sample image is determined, and this Euclidean distance is taken as the feature distance.
  • Step S232: determine the first alignment loss based on the feature distance and the ground-truth feature distance between features in the first source domain sample image.
  • If the first image feature and the second image feature come from the same image, the feature distance between the two image features should be very small, that is, the ground-truth feature distance is a very small value, indicating that the two image features are highly correlated; if the first image feature and the second image feature come from different images, the feature distance between them should be very large, that is, the two image features are uncorrelated.
  • In this way, the comprehensive network can use the degree of correlation between features belonging to the same image to give feedback on the training of the first domain network, so that the first domain network takes into account, during training, the correlation between features within the same sample image. A sketch of this alignment loss follows.
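  • For illustration only (the function name and tensor shapes are our assumptions; the disclosure specifies a Euclidean feature distance whose ground-truth value for features of the same image is very small):

        import torch

        def first_alignment_loss(second_feats: torch.Tensor,
                                 first_feats: torch.Tensor) -> torch.Tensor:
            # Euclidean distance, in the shared feature space, between the
            # projected domain-network features (second image features) and
            # the comprehensive-network features (first image features) of
            # the same images; since the ground-truth distance is near zero,
            # the distance itself is minimized as the alignment loss.
            return (second_feats - first_feats).pow(2).sum(dim=1).sqrt().mean()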
  • In some embodiments, multiple domain networks are collaboratively trained by means of meta-learning, and the training of the domain networks is completed through the meta-training phase and the meta-testing phase; that is, the above step S102 can be implemented through the following steps S121 to S123 (not shown in the figure):
  • Step S121: train the first domain network using the first source domain sample image as meta-training data to obtain the internal loss of the first domain network.
  • the process of training the object recognition network to be trained includes training each domain network and training the comprehensive network.
  • Co-training the first domain network and the second domain network includes training the first domain network in the meta-training stage: the first source domain sample image is input to the feature extractor of the first domain network for feature extraction; the extracted features are then input to the classifier of the first domain network to determine the classification loss of the classifier; the image distance loss between the first source domain sample image and the positive sample image is determined; and the classification loss and the image distance loss are used as the internal loss for optimizing the network parameters of the first domain network.
  • the internal loss of each domain network is determined during an iterative training process.
  • the domain loss of the first domain network is determined by combining the alignment loss between the first domain network and the comprehensive network with the internal loss of the first domain network itself; that is, the above step S102 can be implemented through the following steps:
  • the first step is to determine the image distance loss between the first source domain sample image and the positive sample image based on the image features extracted by the first domain network and the image features of the positive sample image.
  • Here, the positive sample image and the anchor sample image belong to the same class of image; the image distance loss is used to minimize the distance between the anchor sample image and the positive sample image and to maximize the distance to the negative sample image.
  • the image distance loss may be a triplet loss of the sample image.
  • the second step is to predict the category of the image features extracted by the first domain network to obtain the first classification result.
  • the first domain network further includes a classifier for classifying features, and based on the classifier, the image features extracted by the feature extractor are classified to obtain the classification result. In this way, the extracted image features are input into the classifier of the first domain network for feature classification to obtain a feature classification result.
  • the third step is to determine the classification loss of the image features extracted by the first domain network based on the first classification result and the true classification label of the first source domain sample image.
  • Here, the difference between the predicted classification result of the image features and the ground-truth classification label is determined, and based on this difference the classification loss of the image features can be determined.
  • the fourth step is to determine the classification loss and the image distance loss as internal losses of the first domain network.
  • In this way, the first source domain sample image is input into the first domain network, and the classification loss and the triplet loss of the first domain network are respectively determined and fused as the internal loss of the first domain network.
  • Step S122: based on the internal loss and the first alignment loss, determine the meta-training loss for adjusting the network parameters of the first domain network.
  • Here, the image distance loss, the classification loss, and the first alignment loss of the first domain network are summed element-wise to obtain the meta-training loss used to optimize the network parameters of the first domain network's feature extractor, projection network, and classifier.
  • the first alignment loss, the image distance loss and the classification loss are fused to obtain a meta-training loss of the first domain network.
  • In some embodiments, the first alignment loss, the image distance loss, and the classification loss are summed element-wise to obtain the domain loss of the first domain network; the first domain network can be trained based on this domain loss, so that the first domain network is guided by the comprehensive network while learning from the sample images. A sketch of this loss composition follows.
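  • As an illustration (the names, the margin value, and the use of PyTorch's built-in losses are our assumptions; the element-wise sum is as described above):

        import torch.nn.functional as F

        def meta_training_loss(logits, labels, anchor_f, positive_f, negative_f,
                               align_loss, margin=0.3):
            # Classification loss of the first domain network's classifier.
            cls_loss = F.cross_entropy(logits, labels)
            # Image distance (triplet) loss over anchor/positive/negative features.
            dist_loss = F.triplet_margin_loss(anchor_f, positive_f, negative_f,
                                              margin=margin)
            # Element-wise sum of the three terms: the domain (meta-training) loss.
            return cls_loss + dist_loss + align_loss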
  • Step S123: determine the meta-training loss of the first domain network as the domain loss.
  • In this way, the meta-training loss of the first domain network in the meta-training phase is determined through meta-learning, and the meta-training loss is used as the domain loss for optimizing the network parameters of the first domain network, thereby improving the flexibility of the first domain network for object recognition.
  • each domain network in the object recognition network is trained by means of meta-learning, and the meta-training loss of each domain network in the meta-training stage is determined, so as to realize the optimization of the network parameters of each domain network.
  • In some embodiments, the synergy loss of the second domain network in the meta-test phase is determined; that is, the above step S103 can be implemented through the following steps S131 and S132 (not shown in the figure):
  • Step S131: based on the network parameters of the feature extractor of the first domain network and the domain loss of the first domain network, determine the adaptation parameters.
  • Here, the network parameters of the first domain network's feature extractor are taken as the network parameters to be meta-learned; an adjustment parameter (for example, 0.1) is multiplied by the domain loss of the first domain network from the meta-training phase to obtain a product, and this product is subtracted element by element from the feature extractor's network parameters to obtain the adaptation parameters. In this way, the adaptation parameters are determined from the first domain network's domain loss in the training phase and are used as network parameters of the second domain network, so that collaborative learning between different domain networks across multiple source domain sample images can be realized. A sketch of this inner update follows.
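  • A minimal sketch of this update (read literally, the text scales the loss itself; in meta-learning practice it is the gradient of the domain loss that is scaled, so the sketch below uses the gradient, which is our reading; alpha = 0.1 matches the example above):

        import torch

        def adaptation_parameters(extractor, domain_loss, alpha=0.1):
            # Subtract, element by element, alpha times the gradient of the
            # domain loss from the feature extractor's parameters.
            # create_graph=True keeps the graph so the later meta-test
            # (synergy) loss can be fed back to the first domain network.
            params = list(extractor.parameters())
            grads = torch.autograd.grad(domain_loss, params, create_graph=True)
            return [p - alpha * g for p, g in zip(params, grads)]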
  • Step S132: determine the synergy loss between the first domain network and the second domain network based on the adaptation parameters, the second alignment loss, and the second source domain sample image.
  • First, feature extraction is performed on the second source domain sample image; secondly, the extracted features are respectively input into the classifier and the projection network of the second domain network; again, the classification loss of the second domain network is determined based on the classifier's output, and the second alignment loss between the second domain network and the comprehensive network is determined based on the projection network's output; finally, the synergy loss of the second domain network in the meta-testing stage is determined from the second alignment loss, the classification loss, and the triplet loss of the second domain network.
  • In this way, using the network parameters of the first domain network's feature extractor as the network parameters of the second domain network's feature extractor to determine the second domain network's synergy loss in the meta-test stage enables collaborative learning between the first domain network and the second domain network across non-overlapping source domain sample images, thereby improving the generalization ability of the entire network architecture.
  • In some embodiments, the synergy loss between the second domain network and the first domain network is determined; that is, the above step S132 can be implemented through the following steps:
  • the first step is to use the adaptation parameters as the network parameters of the feature extractor of the second domain network to obtain the updated second domain network.
  • Here, the adaptation parameters determined based on the feature extractor of the first domain network are used as the network parameters of the second domain network's feature extractor, and the second domain network whose feature extractor parameters are the adaptation parameters is taken as the updated second domain network.
  • the network parameters of the feature extractor in the updated second domain network are the adaptation parameters, while the network parameters of the projection network and of the classifier in the updated second domain network remain unchanged, that is, the same as in the original second domain network.
  • the second step is to use the second source domain sample image as meta-test data and input it to the feature extractor of the second domain network for feature extraction to obtain third image features.
  • Here, in the meta-testing phase, the second source domain sample image is used as meta-test data and is simultaneously input into the feature extractor of the updated second domain network and into the comprehensive network; the updated second domain network's feature extractor extracts features from the second source domain sample image to obtain the third image features, while the comprehensive network performs feature extraction on the second source domain sample image to obtain its image features.
  • the third step is to perform, based on the second domain network, projection transformation and feature classification respectively on the third image features to obtain fourth image features and a second classification result.
  • Here, the third image features extracted by the feature extractor are respectively input to the classifier and the projection network of the second domain network; the classifier outputs the second classification result, and the projection network projectively transforms the third image features so that the resulting fourth image features lie in the same feature space as the image features extracted by the comprehensive network.
  • the second classification result is used to characterize the confidence that the third image feature belongs to the category of the object to be recognized.
  • the fourth step is to determine a meta-test loss of the first domain network based on the fourth image feature, the second classification result and the second alignment loss.
  • Here, the feature distance between the fourth image features output by the projection network and the features of the second source domain sample image extracted by the comprehensive network is determined; in addition, the image distance loss is determined from the image features extracted by the second domain network's feature extractor and the image features of the positive sample image.
  • the classification loss of the classifier of the second domain network is determined according to the second classification result and the true value classification result of the image features in the second source domain sample image.
  • the domain loss of the second domain network is obtained by summing the second alignment loss, the image distance loss and the classification loss element-wise.
  • In this way, by using the adaptation parameters determined based on the domain loss of the first domain network as the network parameters of the second domain network's feature extractor, and by combining the second alignment loss between the second domain network and the comprehensive network with the second domain network's own internal loss as the collaborative loss between the second domain network and the first domain network, it is possible to collaboratively train the first domain network and the second domain network on sample images from multiple different source domains.
  • the synergy loss in the meta-test phase is fed back to the first-domain network, so as to optimize the network parameters of the first-domain network in the meta-training phase.
  • the fifth step is to determine the meta-test loss to be the synergy loss.
  • In this way, a sample image of another source domain is used in the meta-test phase to conduct tests based on the second domain network; the meta-test loss of the second domain network in the meta-test phase is determined and fed back to the first domain network, so as to optimize the network parameters based on the domain loss of the first domain network. A sketch of this meta-test (synergy) loss follows.
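  • For illustration only (torch.func.functional_call is our choice for running the extractor with the substituted adaptation parameters; the margin and all names are assumptions):

        import torch
        import torch.nn.functional as F
        from torch.func import functional_call

        def synergy_loss(second_net, adapted_params, comprehensive_net,
                         images, labels, positive_f, negative_f):
            # Run the second domain network's extractor with the adaptation
            # parameters substituted in, so the loss backpropagates to the
            # first domain network through those parameters.
            names = [n for n, _ in second_net.extractor.named_parameters()]
            third_f = functional_call(second_net.extractor,
                                      dict(zip(names, adapted_params)), (images,))
            fourth_f = second_net.projector(third_f)    # fourth image features
            logits = second_net.classifier(third_f)     # second classification result
            with torch.no_grad():
                comp_f = comprehensive_net(images)      # comprehensive features
            # Second alignment loss, classification loss, and triplet loss,
            # summed element-wise: the meta-test loss, taken as the synergy loss.
            align2 = (fourth_f - comp_f).pow(2).sum(dim=1).sqrt().mean()
            cls = F.cross_entropy(logits, labels)
            tri = F.triplet_margin_loss(third_f, positive_f, negative_f, margin=0.3)
            return align2 + cls + tri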
  • the training of the entire object recognition network is realized through multiple iterations. During one iteration, the losses of multiple domain networks and the overall network are determined, and the network parameters of the overall network are optimized based on the exponential moving average method. That is, the above step S104 can be realized through the following steps S141 to S143 (not shown in the figure):
  • Step S141 during the iterative training process of the object recognition network to be trained, determine the historical network parameters of the last iterative training of the comprehensive network.
  • the object recognition network to be trained is trained through multiple iterations until the overall loss of the object recognition network meets the convergence condition, and the iterative process ends.
• The meta-training loss of the domain network is determined based on the alignment loss between the domain network and the comprehensive network, the meta-test loss is further determined based on the meta-training loss, and feeding back the meta-test loss completes one iteration.
  • the network parameters in the last iterative training include: network parameters of all domain networks and network parameters of the overall network, from which the historical network parameters of the overall network are determined.
  • Step S142 determining the set of predicted network parameters in the next iterative training of the feature extractors in the first domain network and the second domain network.
  • the network parameters of the feature extractors of all domain networks are determined to obtain a set of predicted network parameters.
  • Step S143 based on the historical network parameters and the set of predicted network parameters, update the network parameters of the comprehensive network in the next iterative training.
• A weighting coefficient in the exponential moving average method is used to weight the historical network parameters; the average value of the predicted network parameter set is determined and weighted by another weighting coefficient (the sum of this coefficient and the weighting coefficient of the historical network parameters is 1); the two weighted results are summed element by element to obtain the summation result, and the summation result is used as the network parameters of the comprehensive network in the next iterative training.
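• As an illustration of steps S141 to S143, the following is a minimal sketch of the exponential-moving-average parameter update, assuming the parameters are held in plain dictionaries of tensors; the function and variable names (ema_update, gamma, and so on) are illustrative and do not come from the original disclosure.

```python
import torch

def ema_update(comprehensive_params, domain_extractor_params, gamma=0.999):
    """Exponential-moving-average update of the comprehensive network.

    comprehensive_params: historical parameters v^(T-1) (step S141).
    domain_extractor_params: list of parameter dicts, one per domain
        feature extractor, i.e. the predicted parameter set (step S142).
    gamma: weighting coefficient; the two weights sum to 1 (step S143).
    """
    n = len(domain_extractor_params)
    updated = {}
    for name, v_prev in comprehensive_params.items():
        # Average value of the predicted network parameter set.
        mean_theta = sum(p[name] for p in domain_extractor_params) / n
        # Element-wise weighted sum: gamma * history + (1 - gamma) * average.
        updated[name] = gamma * v_prev + (1.0 - gamma) * mean_theta
    return updated

# Tiny usage example with two domain networks and one parameter tensor.
v = {"w": torch.zeros(2)}
thetas = [{"w": torch.ones(2)}, {"w": torch.full((2,), 3.0)}]
print(ema_update(v, thetas)["w"])  # tensor([0.0020, 0.0020])
```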
• A uniformity loss is used to promote a more uniform feature distribution between sample images of different source domains; this uniformity loss is introduced into iterative training to determine the total loss, and the network parameters of the entire network are optimized through the total loss. That is, the above step S105 can be realized through steps S151 to S155 (not shown in the figure):
  • Step S151 determining the data volume of the first source domain sample image.
• A small batch of sample images, that is, the first source domain sample images, is obtained, and the number of frames of the first source domain sample images is taken as the data volume of the first source domain sample images.
  • sample images having the same amount of sample image data as the first source domain sample images are collected in the corresponding source domain.
  • Step S152 based on the amount of data, the image features of the second source domain sample image extracted by the comprehensive network, and the second image features extracted by the feature extractor of the first domain network, determine the Uniform loss over feature distribution.
• The spatial distance between the two image features is determined and used (with a negative sign) as the exponent of a power operation with e as the base; a corresponding exponential term is determined for each source domain sample image, and the exponential terms of all source domain sample images are averaged based on the data volume of the source domain sample images to obtain the uniform loss.
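• A minimal sketch of step S152 is given below, assuming the spatial distance is the Euclidean distance and that it enters the exponent with a negative sign (so that a more spread-out feature distribution lowers the loss); the names comp_feats and domain_feats are hypothetical.

```python
import torch

def uniform_loss(comp_feats: torch.Tensor, domain_feats: torch.Tensor) -> torch.Tensor:
    """Uniform loss over the feature distribution (step S152).

    comp_feats:   (B, D) features extracted by the comprehensive network.
    domain_feats: (B, D) second image features from the domain network.
    B is the data volume (number of frames) determined in step S151.
    """
    dists = torch.norm(comp_feats - domain_feats, dim=1)  # spatial distance per image
    return torch.exp(-dists).mean()  # e-based exponents averaged over the data volume
```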
  • Step S153 using a preset balance amount to adjust the uniform loss to obtain an adjusted uniform loss.
• The uniform loss is adjusted by using a preset balance value (for example, 0.1); for example, the balance value is multiplied by the uniform loss to obtain the adjusted uniform loss.
• Step S154, the adjusted uniform loss, the domain loss of the first domain network, the domain loss of the second domain network, and the collaborative loss are fused to obtain a total loss.
• A preset weighting coefficient (for example, 0.5) is used to weight the domain loss of each domain network and the synergy loss between the two networks; the adjusted uniform loss, the weighted domain losses, and the weighted synergy loss are then summed element by element to obtain the total loss.
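• A minimal sketch of the fusion in steps S153 and S154, using the example balance value 0.1 and weighting coefficient 0.5 from the text; the argument names are illustrative.

```python
def total_loss(l_uniform, domain_losses, l_synergy, balance=0.1, weight=0.5):
    """Fuse the losses into the total loss of step S154.

    l_uniform:     the uniform loss from step S152.
    domain_losses: iterable with the domain losses of the domain networks.
    l_synergy:     the synergy loss between the two domain networks.
    """
    adjusted_uniform = balance * l_uniform           # step S153: balance-adjusted
    weighted_domains = weight * sum(domain_losses)   # weight each domain loss
    weighted_synergy = weight * l_synergy            # weight the synergy loss
    return adjusted_uniform + weighted_domains + weighted_synergy

# Example with scalar losses.
print(total_loss(2.0, [1.0, 1.5], 0.8))  # 0.1*2.0 + 0.5*2.5 + 0.5*0.8 = 1.85
```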
  • Step S155 based on the total loss, adjust the network parameters of the object recognition network, so that the adjusted overall network output loss meets the convergence condition.
• In this way, the network parameters in the network are optimized as a whole, so as to obtain a comprehensive network that can be applied in the inference stage for feature extraction to facilitate object recognition.
• Moreover, the process of optimizing the network parameters based on the total loss can fully take into account the uniformity of the feature distribution, which in turn can improve the network performance of the object recognition network.
  • the embodiment of the present disclosure provides an object recognition method, which can be applied to computer equipment, as shown in Figure 1B, which is a schematic diagram of a system architecture to which the object recognition method of the embodiment of the present disclosure can be applied; as shown in Figure 1B , the system architecture includes: an image acquisition terminal 11 , a network 12 and an object recognition terminal 13 .
  • the image acquisition terminal 11 and the object recognition terminal 13 can establish a communication connection through the network 12 , and the image acquisition terminal 11 reports the collected first image to the object recognition terminal 13 through the network 12 .
• For the first image received by the object recognition terminal 13, first, based on the comprehensive network, feature extraction is performed on the first image and on the second images in the preset image library respectively, to obtain the image features of the first image and the image features of the second images; then, based on these image features, the object to be recognized is re-recognized in the preset image library to obtain a recognition result. Finally, the object recognition terminal 13 uploads the recognition result to the network 12 and sends it to the image acquisition terminal 11 through the network 12. In this way, the trained comprehensive network is used for feature extraction, so that the extracted image features are compact within classes and scattered between classes, and object re-identification based on these image features can improve the accuracy of recognition.
  • the image acquisition terminal 11 may include an image acquisition device, and the object recognition terminal 13 may include a processing device with information processing capability or a remote server.
  • the network 12 may adopt a wired connection or a wireless connection.
• When the object recognition terminal 13 includes a processing device, the image acquisition terminal 11 can communicate with the processing device through a wired connection, such as performing data communication through a bus; when the object recognition terminal 13 is a remote server, the image acquisition terminal 11 can exchange data with the remote server through a wireless network.
  • the image acquisition terminal 11 may be a vision processing device with an image acquisition module, specifically implemented as a host computer with a camera.
  • the object recognition method in the embodiment of the present disclosure may be executed by the object recognition terminal 13 , and the above-mentioned system architecture may not include the network and the image acquisition terminal 11 .
  • the embodiment of the present disclosure provides an object recognition method.
• Pedestrian re-identification is realized based on the extracted image features. FIG. 2B is a schematic diagram of the implementation flow of the object recognition method provided by the embodiment of the present disclosure; the method is described below in combination with the steps shown in FIG. 2B:
  • Step S21 acquiring a first image including an object to be identified.
  • the object to be identified may be a pedestrian, an animal or other objects that need to be identified in a preset database.
  • the first image may be an image collected in any scene including pedestrians, animals or other objects.
  • the first image includes the pre-set restriction conditions of the object to be recognized; taking the object to be recognized as a pedestrian and re-identifying the pedestrian in the preset image library as an example, the first image may include limiting conditions such as describing the pedestrian's identity information, facial features, or physical features, and based on the limiting conditions, images meeting these conditions are matched in the preset image library.
  • Step S22 performing feature extraction on the first image and the second image in the preset image library based on the comprehensive network, to obtain image features of the first image and image features of the second image.
  • the comprehensive network can be obtained by training the object recognition network training method provided in the above embodiments.
• The comprehensive network can be a feature extractor, or a network including both feature extraction and object recognition. If the comprehensive network is a feature extractor, then the comprehensive network is used as a feature extraction module and embedded into the network architecture for object recognition to form a complete object recognition network.
• The preset image library can be a static image library including a large number of images. Taking pedestrian re-identification as an example, the first image including the restriction conditions of the object to be recognized and each image in the preset image library are input into the comprehensive network for feature extraction.
• Since the comprehensive network is obtained through parameter optimization over multiple domain networks based on collaborative learning, even if the preset image library belongs to an unknown domain, the comprehensive network can still extract image features more uniformly; within each class the extracted features are more compact, while the distance between classes is large with a significant gap, which facilitates re-identification.
  • Step S23 based on the image features of the first image and the image features of the second image, re-identify the object to be recognized in the preset image library to obtain a recognition result.
• In the case that the comprehensive network is a feature extractor, both the images of the preset image library and the first image containing the object to be recognized are input into the object recognition network; feature extraction is performed through the comprehensive network, and the other modules in the network architecture then perform object recognition in the preset image library based on the image features extracted by the comprehensive network, so as to identify target images whose similarity with the object to be recognized is greater than a certain threshold. The identification information of the object marked in a target image is the recognition result.
• In this way, the accuracy of feature extraction by the comprehensive network can be improved; the extracted image features are compact within classes and scattered between classes, so object re-identification based on these image features can improve the accuracy of recognition.
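• The following is a minimal inference sketch of steps S21 to S23, assuming the trained comprehensive network maps image batches to feature vectors and that similarity is measured by cosine similarity against a hypothetical threshold; the function and parameter names are illustrative, not from the original disclosure.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def re_identify(comprehensive_net, query_img, gallery_imgs, threshold=0.7):
    """Re-identify the object of the first (query) image in a preset
    image library (gallery) using features from the comprehensive network.

    query_img:    (C, H, W) first image containing the object to be recognized.
    gallery_imgs: (M, C, H, W) second images in the preset image library.
    """
    q = F.normalize(comprehensive_net(query_img.unsqueeze(0)), dim=1)  # (1, D)
    g = F.normalize(comprehensive_net(gallery_imgs), dim=1)            # (M, D)
    sims = (g @ q.t()).squeeze(1)                                      # (M,)
    keep = torch.nonzero(sims > threshold).flatten()   # similarity above threshold
    order = sims[keep].argsort(descending=True)        # rank the target images
    return keep[order], sims[keep][order]
```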
• Pedestrian re-identification is a key technology in intelligent video acquisition systems, which aims to find out pictures similar to query pictures in a large number of database pictures by measuring the similarity between a given query picture and database pictures.
• With the wide deployment of collection equipment, tens of millions of pedestrian data records are generated every day.
• FIG. 3 is a schematic diagram of an application scenario of the training method of the object recognition network provided by the embodiment of the present disclosure. FIG. 3 shows two data sets, 301 and 302, and the features extracted from them. The features of data set 301 are shown as feature 303, in which the data within each class is more compact and there is a significant gap between the data of different classes; the features of data set 302 are shown as feature 304, whose distribution is disorganized, showing the negative impact of the domain shift problem.
• An embodiment of the present disclosure provides a training method for an object recognition network, which is applied in the training process of the pedestrian re-identification model; through the collaborative learning between domain networks and the collaborative learning between the comprehensive expert and the domain networks, the generalization ability of the obtained pedestrian re-identification model is improved; that is, the obtained model can be well applied in real scenes without much performance degradation.
  • the performance of the pedestrian re-identification model in unknown domains is improved by using data sets from multiple domains.
• The pedestrian re-identification model trained based on this method can cope with the problem of domain shift and has good generalization performance.
• During training, N source domains are obtained, that is, N data sets for training the object recognition network, expressed as $\mathcal{D}_S = \{D_1, \ldots, D_N\}$, and M target domains for testing, denoted as $\mathcal{D}_T = \{D_{N+1}, \ldots, D_{N+M}\}$, where there is no overlap between the source and target domains, i.e., $\mathcal{D}_S \cap \mathcal{D}_T = \varnothing$.
• The k-th source domain with $P_k$ images is expressed as $D_k = \{(x_i^k, y_i^k)\}_{i=1}^{P_k}$, where $x_i^k$ is the i-th image and $y_i^k$ is the corresponding label from the label space $Y_k$.
• The source domains of multi-source domain generalization for person re-identification do not share the label space, namely $Y_j \cap Y_k = \varnothing$ for any $j \neq k$.
• The goal of multi-source DG-ReID is to make full use of the N source domains to train a more general model that achieves better performance on the M target domains.
• Embodiments of the present disclosure provide a network architecture based on multi-task learning, that is, a multi-domain equality baseline (Multi-Domain Equality, MDE).
• $\mathcal{L}_{id}$ and $\mathcal{L}_{tri}$, balanced by $\lambda$, are the softmax classification loss (for example, cross-entropy loss) and the triplet loss, as shown in formulas (2) and (3):

$$\mathcal{L}_{id} = -\log \frac{\exp\big(C(F(x;\theta);\phi_n)_y\big)}{\sum_{c}\exp\big(C(F(x;\theta);\phi_n)_c\big)} \tag{2}$$

$$\mathcal{L}_{tri} = \big[\, m + \|f - f^{+}\|_2 - \|f - f^{-}\|_2 \,\big]_{+} \tag{3}$$

• $F(\cdot;\theta)$ and $C(\cdot;\phi_n)$ denote the shared feature extractor and the n-th domain classifier, $f^{+}$ and $f^{-}$ denote the farthest positive sample and the nearest negative sample of the anchor feature $f$, and $m$ is a triplet distance margin fixed at 0.3.
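• As a concrete reading of formulas (2) and (3), the following is an illustrative PyTorch sketch of the MDE baseline losses with batch-hard mining (farthest positive, nearest negative) and margin m = 0.3; it is a sketch under these assumptions, not the original implementation.

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet(features, labels, margin=0.3):
    """Triplet loss of formula (3): farthest positive, nearest negative."""
    dist = torch.cdist(features, features)             # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-identity mask
    pos = dist.masked_fill(~same, float("-inf")).max(dim=1).values  # farthest positive
    neg = dist.masked_fill(same, float("inf")).min(dim=1).values    # nearest negative
    return F.relu(margin + pos - neg).mean()

def mde_losses(features, logits, labels, lam=1.0):
    """Softmax classification loss (formula (2)) plus lambda-balanced triplet loss."""
    return F.cross_entropy(logits, labels) + lam * batch_hard_triplet(features, labels)
```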
  • Figure 4 is a schematic diagram of the implementation framework of the training method of the object recognition network provided by the embodiment of the disclosure, the framework includes: multiple source domains as input 401 to 4n1, multiple domain networks 41 to 4n in the multi-domain network learning phase and a comprehensive expert network 402 to collect knowledge learned from domain networks; where:
• Each domain network is composed of three sub-networks: a feature extractor $F(\cdot;\theta_n)$, a classifier $C(\cdot;\phi_n)$, and a projection network $P(\cdot;\psi_n)$. The N domain networks are represented according to domain categories as $\{F(\cdot;\theta_n), C(\cdot;\phi_n), P(\cdot;\psi_n)\}_{n=1}^{N}$, where $\theta_n$, $\phi_n$, and $\psi_n$ represent the model parameters corresponding to the n-th domain network.
• The comprehensive network consists of a feature extractor parameterized by $v$, denoted as $F(\cdot;v)$.
  • the feature extractors of domain networks 41 to 4n and comprehensive network 402 share the same network architecture.
• Model-agnostic meta-learning is applied to the training of each domain network, so that not only can the generalization ability of the model be further improved, but also, through meta-learning, the classifiers and projection networks of the other domain networks are used to strengthen the interaction between domain networks.
• The input to train the object recognition network includes: data from the N source domains; the N domain networks, with a total of N feature extractors $\{F(\cdot;\theta_n)\}_{n=1}^{N}$, N classifiers $\{C(\cdot;\phi_n)\}_{n=1}^{N}$, and N projection networks $\{P(\cdot;\psi_n)\}_{n=1}^{N}$; and one comprehensive network $F(\cdot;v)$.
• The output is: the network parameters of the comprehensive network.
  • the training of the kth domain network is taken as an example for illustration:
• In the first step, a batch of samples is sampled from the k-th source domain $D_k$ as the meta-training data, denoted as $(x_k, y_k)$.
• In the second step, a domain $D_j$ ($j \neq k$) is randomly selected from the other domains, and then a batch of samples is sampled from it as meta-test data, denoted as $(x_j, y_j)$.
  • the third step is collaborative learning between domain networks.
  • FIG. 5 is a schematic diagram of a collaborative learning framework between different domain networks provided by an embodiment of the present disclosure, including the following process:
• Meta-training data 501 from the source domain $D_k$ (the k-th source domain) is input to the feature extractor $F(\cdot;\theta_k)$ 52 of the k-th domain network, and the features extracted by $F(\cdot;\theta_k)$ 52 are input to the classifier $C(\cdot;\phi_k)$ 53 and the projection network $P(\cdot;\psi_k)$ 54 to determine the meta-training loss $\mathcal{L}_{mtr}$ 503 (which can be understood as the domain loss of the k-th domain network), as shown in formula (4):

$$\mathcal{L}_{mtr}(x_k, y_k; \theta_k, \phi_k, \psi_k) = \mathcal{L}_{id} + \mathcal{L}_{tri} + \mathcal{L}_{align} \tag{4}$$

The adaptation parameters are then obtained by a single gradient step on the feature extractor, as shown in formula (5):

$$\theta'_k = \theta_k - \alpha \nabla_{\theta_k} \mathcal{L}_{mtr} \tag{5}$$

where $\alpha$ is the step size, which can be set to 0.1.
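• A minimal MAML-style sketch of the inner update of formula (5): the adaptation parameters $\theta'_k$ are computed from the meta-training loss without overwriting $\theta_k$, keeping the computation graph so that the meta-test loss evaluated with $\theta'_k$ can still backpropagate to $\theta_k$. Applying the adapted parameters in a forward pass (for example, via torch.func.functional_call in recent PyTorch) is left out; the function name meta_step is illustrative.

```python
import torch

def meta_step(extractor_k: torch.nn.Module, loss_mtr: torch.Tensor, alpha: float = 0.1):
    """One inner adaptation step: theta'_k = theta_k - alpha * grad(L_mtr)."""
    params = list(extractor_k.parameters())
    grads = torch.autograd.grad(loss_mtr, params, create_graph=True)
    # create_graph=True keeps second-order information so that L_mte,
    # evaluated with the adapted parameters, can still optimize theta_k.
    return [p - alpha * g for p, g in zip(params, grads)]
```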
• The meta-test data 502 (from the j-th source domain) is input to the feature extractor $F(\cdot;\theta'_k)$ 55, that is, the feature extractor of the j-th domain network whose network parameters are set to the adaptation parameters $\theta'_k$; the features extracted by the feature extractor 55 are respectively input to the classifier $C(\cdot;\phi_j)$ 56 and the projection network $P(\cdot;\psi_j)$ 57, to determine the meta-test loss $\mathcal{L}_{mte}$ 504, which can be expressed as $\mathcal{L}_{mte}(x_j, y_j; \theta'_k, \phi_j, \psi_j)$ (corresponding to the synergy loss in the above embodiment). The meta-test loss can be determined based on the alignment loss between the comprehensive network and the j-th domain network, the triplet loss of the feature extractor of the j-th domain network, and the classification loss of its classifier.
• That is, the meta-test loss on $(x_j, y_j)$ is determined conditional on $\theta'_k$, where $\phi_j$ and $\psi_j$ are the network parameters of the j-th domain network.
• The network parameters of the classifier and the projection network of the j-th domain network are not optimized during the meta-testing stage.
• $\mathcal{L}_{mte}$ and $\mathcal{L}_{mtr}$ have the same form but different inputs and network parameters.
• $\theta_k$, $\phi_k$, and $\psi_k$ are optimized based on $\mathcal{L}_{mtr}$ and $\mathcal{L}_{mte}$ together, namely:

$$\min_{\theta_k, \phi_k, \psi_k}\; \mathcal{L}_{mtr}(x_k, y_k; \theta_k, \phi_k, \psi_k) + \mathcal{L}_{mte}(x_j, y_j; \theta'_k, \phi_j, \psi_j)$$
  • the fourth step is the collaborative learning between the comprehensive network and the domain network.
  • the collaborative learning between the comprehensive network and the domain network includes the following processes:
• The sampled $(x_k, y_k)$ sample data 601 in the domain is input to the comprehensive network $F(\cdot;v)$ 602 to extract features; at the same time, it is input to the feature extractor $F(\cdot;\theta_k)$ 603 of the k-th domain network and transformed by the projection network $P(\cdot;\psi_k)$ 604 to obtain the features presented in the feature space 605. The features in the feature space 605 are used to determine the first alignment loss $\mathcal{L}_{align}$, as shown in formula (6):

$$\mathcal{L}_{align} = \big\| P\big(F(x_k;\theta_k);\psi_k\big) - F(x_k;v) \big\|_2^2 \tag{6}$$

• In the feature space 605, feature representations of the same shape come from the same sample image, and feature representations of the same color are extracted by the same network; $\|\cdot\|_2$ represents the Euclidean distance between the two features.
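• A minimal sketch of the first alignment loss of formula (6), assuming the comprehensive network's features serve as a fixed target within this loss (its parameters are refreshed by the exponential moving average rather than by this gradient); the function names are illustrative.

```python
import torch

def alignment_loss(comp_net, extractor_k, projector_k, x_k):
    """First alignment loss: distance between P(F(x_k; theta_k); psi_k)
    and the comprehensive network feature F(x_k; v) of the same images."""
    with torch.no_grad():
        target = comp_net(x_k)                  # features from F(.; v)
    projected = projector_k(extractor_k(x_k))   # projected domain features
    return (projected - target).pow(2).sum(dim=1).mean()  # squared Euclidean distance
```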
• The network parameters of the comprehensive network are updated by an exponential moving average, as shown in formula (7):

$$v^{(T)} = \gamma\, v^{(T-1)} + (1-\gamma)\,\frac{1}{N}\sum_{n=1}^{N} \theta_n^{(T)} \tag{7}$$

where $v^{(T-1)}$ represents the parameters of the comprehensive network at iteration (T-1) (i.e., the previous iterative training), $\theta_n^{(T)}$ is the parameter of the feature extractor of the n-th domain network in the current iteration T, and the coefficient $\gamma$ can be set to 0.999.
  • the first step to the fourth step above are repeated to complete the training of N domain networks, and one iteration is completed. Then continue to iterate until the object recognition network converges, and finally output the comprehensive network for actual model deployment.
• In addition, a uniform loss is used to make the feature distribution between different domains more uniform, as shown in formula (8):

$$\mathcal{L}_{uni} = \frac{1}{B}\sum_{i=1}^{B} e^{-\, d\left(f_i^{v},\, f_i^{k}\right)} \tag{8}$$

where $f_i^{v}$ is the feature of the i-th sample image extracted by the comprehensive network, $f_i^{k}$ is the corresponding image feature obtained through the domain network, $d(\cdot,\cdot)$ is the spatial (Euclidean) distance between the two features, and $B$ is the data volume of the sampled batch.
  • model training is implemented based on multi-domain network collaboration, and the generalization of the network is improved.
• Each domain network is responsible for learning the knowledge of one domain and prevents its own overfitting through collaborative learning with the other domain networks; the comprehensive network collects the learned knowledge so that it can understand the information of all domains, which can further improve the generalization performance of the network.
  • FIG. 7 is a schematic diagram of the structural composition of the training device for an object recognition network according to an embodiment of the present disclosure.
  • the training device 700 for an object recognition network includes:
  • the first determining module 701 is configured to determine the first alignment loss between the comprehensive network and the first domain network based on the first source domain sample image and the second source domain sample image, and the comprehensive network and the first domain network The second alignment loss between the second domain network; wherein, there is no overlap between the first source domain sample image and the second source domain sample image, and the first source domain sample image is used to train the A first domain network, the second source domain sample image is used to train the second domain network;
  • the second determination module 702 is configured to determine the first domain network based on the first source domain sample image and the first alignment loss, the second source domain sample image and the second alignment loss, respectively. domain loss, determining the domain loss of said second domain network;
• the third determining module 703 is configured to determine the synergy loss between the first domain network and the second domain network based on the second source domain sample image, the second alignment loss and the domain loss of the first domain network;
  • the fourth determination module 704 is configured to determine the network parameters of the comprehensive network based on the network parameters of the first domain network and the second domain network;
• the first adjustment module 705 is configured to adjust the network parameters of the object recognition network based on the domain loss of the first domain network, the domain loss of the second domain network, and the synergy loss between the first domain network and the second domain network, so that the loss output by the adjusted comprehensive network satisfies the convergence condition.
  • the first determination module 701 includes:
  • the first extraction submodule is configured to perform feature extraction on the first source domain sample image based on the comprehensive network to obtain first image features
  • the first transformation submodule is configured to perform feature extraction and projection transformation on the first source domain sample image based on the first domain network to obtain second image features;
  • a first determining submodule configured to determine a first alignment loss between the comprehensive network and the first domain network based on the second image feature and the first image feature.
  • the first determining submodule includes:
  • a first determining unit configured to determine a feature distance between the second image feature and the first image feature in the feature space of the first source domain sample image
  • the second determining unit is configured to determine the first alignment loss based on the feature distance and a true feature distance between features in the first source domain sample image.
  • the second determining module 702 includes:
  • the first training submodule is configured to use the first source domain sample image as meta-training data to train the first domain network to obtain an internal loss of the first domain network;
  • the second determination submodule is configured to determine a meta-training loss for adjusting network parameters of the first domain network based on the internal loss and the first alignment loss;
  • the third determining submodule is configured to determine the meta-training loss of the first domain network as the domain loss.
  • the first training submodule includes:
  • the third determination unit is configured to determine the image distance loss between the first source domain sample image and the positive sample image based on the image features extracted by the first domain network and the image features of the positive sample image;
  • the first classification unit is configured to classify the image features extracted by the first domain network in the first domain network to obtain a first classification result
  • a fourth determination unit configured to determine the classification loss of the image features extracted by the first domain network based on the first classification result and the ground-truth classification label of the first source domain sample image;
  • a fifth determination unit configured to determine the classification loss and the image distance loss as internal losses of the first domain network
  • the second determining submodule is also configured to:
  • the first alignment loss, the image distance loss and the classification loss are fused to obtain a meta-training loss of the first domain network.
  • the first domain network includes: a feature extractor, a projection network, and a classifier
• the first transformation submodule is further configured to: based on the feature extractor, perform feature extraction on the first source domain sample image; and based on the projection network, perform projection transformation on the image features extracted by the feature extractor to obtain the second image features;
  • the first classification unit is further configured to: classify the image features extracted by the feature extractor based on the classifier to obtain the first classification result.
  • the third determination module 703 includes:
  • the fourth determining submodule is configured to determine adaptation parameters based on the network parameters of the feature extractor of the first domain network and the domain loss of the first domain network;
  • the fifth determining submodule is configured to determine a synergy loss between the first domain network and the second domain network based on the adaptation parameter, the second alignment loss, and the second source domain sample image.
  • the fifth determining submodule includes:
  • the first update unit is configured to use the adaptation parameter as a network parameter of the feature extractor of the second domain network to obtain the updated second domain network;
  • the first extraction unit is configured to use the second source domain sample image as meta-test data, input it into the feature extractor of the second domain network for feature extraction, and obtain a third image feature;
  • the second classification unit is configured to respectively perform projection transformation and feature classification on the third image features based on the second domain network, to obtain fourth image features and second classification results;
  • a sixth determination unit configured to determine a meta-test loss of the first domain network based on the fourth image feature, the second classification result, and the second alignment loss;
  • the seventh determination unit is configured to determine that the meta-test loss is the synergy loss.
  • the fourth determination module 704 includes:
  • the sixth determining submodule is configured to determine the historical network parameters of the comprehensive network in the last iterative training during the iterative training process of the object recognition network to be trained;
  • the seventh determination sub-module is configured to determine the set of predicted network parameters in the next iterative training of the feature extractor in the first domain network and the second domain network;
  • the first update submodule is configured to update the network parameters of the comprehensive network in the next iterative training based on the historical network parameters and the set of predicted network parameters.
  • the first adjustment module 705 includes:
  • the eighth determination submodule is configured to determine the data amount of the first source domain sample image
  • the ninth determination submodule is configured to be based on the amount of data, the image features of the second source domain sample image extracted by the comprehensive network, and the second image features extracted by the feature extractor of the first domain network, Determine the uniform loss used to adjust the distribution of image features;
  • the first adjustment sub-module is configured to adjust the uniform loss by using a preset balance amount to obtain an adjusted uniform loss
  • the first fusion submodule is configured to fuse the adjusted uniform loss, the domain loss of the first domain network, the domain loss of the second domain network, and the collaborative loss to obtain a total loss;
• the second adjustment submodule is configured to adjust the network parameters of the object recognition network based on the total loss, so that the loss output by the adjusted comprehensive network meets a convergence condition.
  • FIG. 8 is a schematic diagram of the structure and composition of the object recognition device according to the embodiment of the present disclosure. As shown in FIG. 8 , the object recognition device 800 includes:
  • the first acquiring module 801 is configured to acquire a first image including an object to be identified
• the first extraction module 802 is configured to perform feature extraction on the first image and the second image in the preset image library based on the comprehensive network, to obtain the image features of the first image and the image features of the second image;
  • the comprehensive network is obtained by training based on the training method of the object recognition network provided in the above-mentioned embodiments;
  • the first recognition module 803 is configured to re-recognize the object to be recognized in the preset image library based on the image features of the first image and the image features of the second image to obtain a recognition result.
• If the above-mentioned object recognition network training method is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
• Based on this understanding, the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a terminal, a server, etc.) to execute all or part of the methods described in various embodiments of the present disclosure.
• The aforementioned storage media include various media that can store program codes, such as a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a magnetic disk, or an optical disk.
  • embodiments of the present disclosure are not limited to any specific combination of hardware and software.
• Correspondingly, an embodiment of the present disclosure further provides a computer program product; the computer program product includes computer-executable instructions, and after the computer-executable instructions are executed, the steps of the object recognition network training method provided by the embodiments of the present disclosure can be implemented.
• Correspondingly, an embodiment of the present disclosure further provides a computer storage medium on which computer-executable instructions are stored; when the computer-executable instructions are executed by a processor, the steps of the training method of the object recognition network provided by the above-mentioned embodiments are implemented.
  • FIG. 9 is a schematic diagram of the composition and structure of a computer device in an embodiment of the present disclosure.
• the computer device 900 includes: a processor 901, at least one communication bus, a communication interface 902, at least one external communication interface, and a memory 903.
  • the communication interface 902 is configured to realize connection and communication between these components.
  • the communication interface 902 may include a display screen, and the external communication interface may include a standard wired interface and a wireless interface.
  • the processor 901 is configured to execute the image processing program in the memory, so as to realize the steps of the method for training the object recognition network provided in the above embodiment.
• The above description of the embodiments of the training device, computer device, and storage medium for the object recognition network is similar to the description of the method embodiments above, and these embodiments have technical descriptions and beneficial effects similar to those of the corresponding method embodiments; due to space limitations, reference can be made to the description of the above method embodiments, which is not repeated here.
• For technical details not disclosed in the embodiments of the training device, computer device, and storage medium of the object recognition network of the present disclosure, please refer to the description of the method embodiments of the present disclosure.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
• The coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or in other forms.
• Each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may be used as a single unit, or two or more units may be integrated into one unit; the above-mentioned integrated unit can be realized in the form of hardware or in the form of hardware plus a software functional unit.
• If the above-mentioned integrated units of the present disclosure are realized in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
• Based on this understanding, the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in various embodiments of the present disclosure.
• The aforementioned storage medium includes various media capable of storing program codes, such as removable storage devices, ROMs, magnetic disks, or optical disks.
  • Embodiments of the present disclosure provide an object recognition method, a network training method and device, equipment, media, and products.
• The object recognition network includes a comprehensive network, a first domain network, and a second domain network.
• The method includes: based on the first source domain sample image and a second source domain sample image, respectively determining a first alignment loss between the comprehensive network and the first domain network, and a second alignment loss between the comprehensive network and the second domain network, wherein there is no overlap between the first source domain sample image and the second source domain sample image, the first source domain sample image is used to train the first domain network, and the second source domain sample image is used to train the second domain network; based on the first source domain sample image and the first alignment loss, and on the second source domain sample image and the second alignment loss, respectively determining the domain loss of the first domain network and the domain loss of the second domain network; based on the second source domain sample image, the second alignment loss, and the domain loss of the first domain network, determining the synergy loss between the first domain network and the second domain network; determining the network parameters of the comprehensive network based on the network parameters of the first domain network and the second domain network; and adjusting the network parameters of the object recognition network based on the domain losses of the two domain networks and the synergy loss between them, so that the loss output by the adjusted comprehensive network satisfies a convergence condition.

Abstract

Embodiments of the present disclosure relate to an object recognition method and apparatus, a network training method and apparatus, a device, a medium, and a product. The network training method includes the steps of: on the basis of a first source domain sample image and a second source domain sample image, respectively determining a first alignment loss between a comprehensive network and a first domain network and a second alignment loss between the comprehensive network and a second domain network; determining a domain loss of the first domain network and a domain loss of the second domain network respectively on the basis of the first source domain sample image and the first alignment loss, and of the second source domain sample image and the second alignment loss; determining a synergy loss between the two domain networks on the basis of the second source domain sample image, the second alignment loss, and the domain loss of the first domain network; determining network parameters of the comprehensive network on the basis of network parameters of the first domain network and the second domain network; and adjusting network parameters of an object recognition network on the basis of the domain losses of the two domain networks and of the synergy loss between them, such that the loss output by the adjusted comprehensive network satisfies a convergence condition. In this way, through the collaborative training between multiple domain networks and the collaboration between the comprehensive network and each domain network, the obtained object recognition network can exhibit high recognition accuracy in any data domain, and the generalization performance of the object recognition network can be improved.
PCT/CN2022/077443 2021-09-15 2022-02-23 Procédé et appareil de reconnaissance d'objet, procédé et appareil d'entraînement de réseau, dispositif, support et produit WO2023040195A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111081370.9 2021-09-15
CN202111081370.9A CN113837256B (zh) 2021-09-15 2021-09-15 对象识别方法、网络的训练方法及装置、设备及介质

Publications (1)

Publication Number Publication Date
WO2023040195A1 true WO2023040195A1 (fr) 2023-03-23

Family

ID=78959464

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/077443 WO2023040195A1 (fr) 2021-09-15 2022-02-23 Procédé et appareil de reconnaissance d'objet, procédé et appareil d'entraînement de réseau, dispositif, support et produit

Country Status (2)

Country Link
CN (1) CN113837256B (fr)
WO (1) WO2023040195A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116340833A (zh) * 2023-05-25 2023-06-27 中国人民解放军海军工程大学 基于改进领域对抗式迁移网络的故障诊断方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837256B (zh) * 2021-09-15 2023-04-07 深圳市商汤科技有限公司 对象识别方法、网络的训练方法及装置、设备及介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200226421A1 (en) * 2019-01-15 2020-07-16 Naver Corporation Training and using a convolutional neural network for person re-identification
CN111476168A (zh) * 2020-04-08 2020-07-31 山东师范大学 一种基于三阶段的跨域行人重识别方法和系统
CN111860823A (zh) * 2019-04-30 2020-10-30 北京市商汤科技开发有限公司 神经网络训练、图像处理方法及装置、设备及存储介质
CN112396119A (zh) * 2020-11-25 2021-02-23 上海商汤智能科技有限公司 一种图像处理方法及装置、电子设备和存储介质
CN113837256A (zh) * 2021-09-15 2021-12-24 深圳市商汤科技有限公司 对象识别方法、网络的训练方法及装置、设备及介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919317A (zh) * 2018-01-11 2019-06-21 华为技术有限公司 一种机器学习模型训练方法和装置
CN111723611A (zh) * 2019-03-20 2020-09-29 北京沃东天骏信息技术有限公司 行人再识别方法、装置及存储介质
CN111126360B (zh) * 2019-11-15 2023-03-24 西安电子科技大学 基于无监督联合多损失模型的跨域行人重识别方法
CN112215280B (zh) * 2020-10-12 2022-03-15 西安交通大学 一种基于元骨干网络的小样本图像分类方法
CN112861995B (zh) * 2021-03-15 2023-03-31 中山大学 基于模型无关元学习的无监督少样本图像分类方法、系统及存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200226421A1 (en) * 2019-01-15 2020-07-16 Naver Corporation Training and using a convolutional neural network for person re-identification
CN111860823A (zh) * 2019-04-30 2020-10-30 北京市商汤科技开发有限公司 神经网络训练、图像处理方法及装置、设备及存储介质
CN111476168A (zh) * 2020-04-08 2020-07-31 山东师范大学 一种基于三阶段的跨域行人重识别方法和系统
CN112396119A (zh) * 2020-11-25 2021-02-23 上海商汤智能科技有限公司 一种图像处理方法及装置、电子设备和存储介质
CN113837256A (zh) * 2021-09-15 2021-12-24 深圳市商汤科技有限公司 对象识别方法、网络的训练方法及装置、设备及介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YU SHIJIE, ZHU FENG, CHEN DAPENG, ZHAO RUI, CHEN HAOBIN, TANG SHIXIANG, ZHU JINGUO, QIAO YU: "Multiple Domain Experts Collaborative Learning: Multi-Source Domain Generalization For Person Re-Identification", ARXIV:2105.12355V1, 26 May 2021 (2021-05-26), XP093048573, Retrieved from the Internet <URL:https://arxiv.org/pdf/2105.12355v1.pdf> [retrieved on 20230522], DOI: 10.48550/arxiv.2105.12355 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116340833A (zh) * 2023-05-25 2023-06-27 中国人民解放军海军工程大学 基于改进领域对抗式迁移网络的故障诊断方法
CN116340833B (zh) * 2023-05-25 2023-10-13 中国人民解放军海军工程大学 基于改进领域对抗式迁移网络的故障诊断方法

Also Published As

Publication number Publication date
CN113837256B (zh) 2023-04-07
CN113837256A (zh) 2021-12-24

Similar Documents

Publication Publication Date Title
Yang et al. Heterogeneous graph attention network for unsupervised multiple-target domain adaptation
Hao et al. HSME: Hypersphere manifold embedding for visible thermal person re-identification
He et al. Neural factorization machines for sparse predictive analytics
Shen et al. Label distribution learning forests
Zhang et al. Progressive meta-learning with curriculum
WO2023040195A1 (fr) Procédé et appareil de reconnaissance d'objet, procédé et appareil d'entraînement de réseau, dispositif, support et produit
Shen et al. Stable learning via differentiated variable decorrelation
JP7345530B2 (ja) SuperLoss:堅牢なカリキュラム学習のための一般的な損失
WO2021152329A1 (fr) Apprentissage décentralisé pour nouvelle identification
Wang et al. Camera compensation using a feature projection matrix for person reidentification
Kim et al. Adaptive graph adversarial networks for partial domain adaptation
CN111382283A (zh) 资源类别标签标注方法、装置、计算机设备和存储介质
Canal et al. Active embedding search via noisy paired comparisons
WO2022134576A1 (fr) Procédé, appareil et dispositif de positionnement de comportement de moment de vidéo infrarouge, et support de stockage
Xu et al. Graphical modeling for multi-source domain adaptation
Wan et al. A new weakly supervised discrete discriminant hashing for robust data representation
Shi et al. Robust and fuzzy ensemble framework via spectral learning for random projection-based fuzzy-c-means clustering
CN113920382A (zh) 基于类一致性结构化学习的跨域图像分类方法和相关装置
Lu et al. Video person re-identification using key frame screening with index and feature reorganization based on inter-frame relation
WO2022088411A1 (fr) Procédé et appareil de détection d'image, procédé et appareil d'entraînement de modèle associé, ainsi que dispositif, support et programme
Kim et al. Embedded face recognition based on fast genetic algorithm for intelligent digital photography
Xu et al. Prdp: Person reidentification with dirty and poor data
Tian et al. Ordinal margin metric learning and its extension for cross-distribution image data
Bai et al. A unified deep learning model for protein structure prediction
Cui et al. Inverse extreme learning machine for learning with label proportions

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE