CN116910547A - Training and detection method of detection model based on domain adaptation, and related device - Google Patents

Training and detection method of detection model based on domain adaptation, and related device

Info

Publication number
CN116910547A
Authority
CN
China
Prior art keywords
region
training sample
training
feature vector
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310850158.7A
Other languages
Chinese (zh)
Inventor
李虹杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Semidrive Technology Co Ltd
Original Assignee
Nanjing Semidrive Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Semidrive Technology Co Ltd
Priority to CN202310850158.7A
Publication of CN116910547A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method and a detection method of a domain-adaptation-based detection model, and related devices, wherein the method comprises the following steps: obtaining training samples under each iteration, wherein the training samples under each iteration comprise source domain samples and/or target domain samples; obtaining a feature map of the training sample under each iteration; obtaining, based on the feature map, a classification result of each region of the training sample under each iteration; obtaining an encoding result of each region of the feature map; obtaining an encoding result of each region of the training sample under each iteration; and performing iterative operations on the model to be trained based on the classification result and the encoding result of each region of the training sample under each iteration and the encoding result of each region of the feature map, so as to train a target detection model, wherein the target detection model is used for performing target detection on an image to be detected.

Description

Training and detection method of detection model based on domain adaptation, and related device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a training method, a detection method, and related devices for a detection model based on domain adaptation.
Background
In the related art, domain adaptation involves two types of data: a target domain and a source domain. In plain terms, the source domain can be regarded as data used in simulation or training, such as sample images, sample speech, and the like, while the target domain can be regarded as data used in practical applications, such as images to be detected, speech to be detected, and the like. Domain adaptation is a technique for training a model when target domain labels are absent and only source domain labels are available, so that the model generalizes to the target domain.
Current domain adaptation techniques include reconstruction-based methods, adversarial-learning-based methods, feature-distribution-alignment-based methods, and pseudo-label-based methods. Adversarial-learning-based methods and feature-distribution-alignment-based methods yield insufficient detection performance, while reconstruction-based methods and pseudo-label-based methods are complex to implement and computationally demanding.
Disclosure of Invention
The present application provides a training method and a detection method of a domain-adaptation-based detection model, and related devices, which at least solve the above technical problems existing in the prior art.
According to a first aspect of the present application, there is provided a training method of a domain-adaptation-based detection model, comprising:
obtaining training samples under each iteration, wherein the training samples under each iteration comprise source domain samples and/or target domain samples;
obtaining a feature map of a training sample under each iteration;
based on the feature map, obtaining a classification result of each region of the training sample under each iteration;
obtaining the coding result of each region of the feature map;
obtaining the coding result of each region of the training sample under each iteration;
and carrying out iterative operation on the model to be trained based on the classification result and the coding result of each region of the training sample under each iteration and the coding result of each region of the feature map so as to train a target detection model, wherein the target detection model is used for carrying out target detection on the image to be detected.
In an embodiment, the obtaining the encoding result of each region of the feature map includes:
determining a partitioning strategy for each region of the training sample;
according to the division strategy of each region of the training sample, carrying out region division on the feature map to obtain each region of the feature map;
and adopting a preset coding algorithm to code each region of the feature map to obtain a coding result of each region of the feature map.
In an embodiment, the performing an iterative operation on the model to be trained based on the classification result and the encoding result of each region of the training sample under each iteration and the encoding result of each region of the feature map to train the target detection model includes:
based on the classification result of each region of the training sample under each iteration and the coding result of each region of the feature map of the training sample, obtaining each first feature vector marked with the classification result, wherein each first feature vector corresponds to each region of the feature map;
based on the classification result and the coding result of each region of the training sample under each iteration, obtaining each second feature vector marked with the classification result, wherein each second feature vector corresponds to each region of the training sample;
performing iterative operation on the model to be trained based on the first feature vectors marked with the classification results and the second feature vectors marked with the classification results and/or based on the second feature vectors marked with the classification results and the third feature vectors marked with the classification results under the previous training samples so as to train a target detection model;
Wherein, each third feature vector is obtained by the classification result and the coding result of each region of the training sample under the previous iteration.
In an embodiment, the performing an iterative operation on the model to be trained based on the first feature vectors with the classification results and the second feature vectors with the classification results to train the target detection model includes:
determining a first feature vector and a second feature vector corresponding to the same region of the training sample and a first feature vector and a second feature vector corresponding to different regions of the training sample from each first feature vector identified with the classification result and each second feature vector identified with the classification result;
obtaining a first instance pair based on a first feature vector and a second feature vector corresponding to the same region of the training sample;
obtaining a second instance pair based on the first feature vector and the second feature vector corresponding to different regions of the training sample;
and carrying out iterative operation on the model to be trained based on the first instance pair and the second instance pair so as to train the target detection model.
In an embodiment, the performing an iterative operation on the model to be trained based on the second feature vectors identified with the classification results and the third feature vectors identified with the classification results under the previous training samples to train the target detection model includes:
determining a second feature vector and a third feature vector which are marked with the same classification result and a second feature vector and a third feature vector which are marked with different classification results from the second feature vectors marked with the classification results and the third feature vectors marked with the classification results;
obtaining a third instance pair based on the second feature vector and the third feature vector identified with the same classification result;
obtaining a fourth instance pair based on the second feature vector and the third feature vector with different classification results;
and carrying out iterative operation on the model to be trained based on the third instance pair and the fourth instance pair so as to train the target detection model.
In one embodiment, each region of the training sample includes a plurality of pixels;
based on the feature map, obtaining a classification result of each region of the training sample under each iteration, including:
based on the feature map, obtaining a classification result of each pixel in each region of the training sample under each iteration;
based on the classification result of each pixel in each region, the classification result of each region is obtained.
According to a second aspect of the present application, there is provided a domain-adaptation-based detection method, comprising:
obtaining an image to be detected;
inputting an image to be detected into a target detection model to obtain one or more target objects in the image to be detected;
the target detection model is obtained by training a model to be trained through classification results and coding results of all areas of a training sample under each iteration and coding results of all areas of a feature map of the training sample; the classification result of each region of the training sample is obtained through a feature map of the training sample; the coding result of each region of the training sample is obtained by extracting the characteristics of the training sample; the training samples under each iteration comprise source domain samples and/or target domain samples; the target detection model classifies one or more target objects in an image to be detected through a feature map of the image to be detected.
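As an aid to understanding, the following is a minimal sketch of the detection stage described above, assuming the trained target detection model is a PyTorch module and that the image to be detected has already been loaded as a tensor; the function and variable names are illustrative only and do not appear in the patent.

    import torch

    def detect(target_detection_model, image_to_detect):
        # image_to_detect: (C, H, W) tensor of the image to be detected
        target_detection_model.eval()
        with torch.no_grad():
            # the model builds a feature map internally and classifies the
            # target objects in the image based on that feature map
            region_scores = target_detection_model(image_to_detect.unsqueeze(0))
            return region_scores.argmax(dim=-1)  # predicted class of each region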
According to a third aspect of the present application, there is provided a training device for a domain-adaptation-based detection model, comprising:
a first obtaining unit, configured to obtain training samples under each iteration, wherein the training samples under each iteration comprise source domain samples and/or target domain samples;
a second obtaining unit, configured to obtain a feature map of the training sample under each iteration;
a third obtaining unit, configured to obtain, based on the feature map, a classification result of each region of the training sample under each iteration;
a fourth obtaining unit, configured to obtain an encoding result of each region of the feature map;
a fifth obtaining unit, configured to perform feature extraction on the training sample under each iteration to obtain an encoding result of each region of the training sample under each iteration;
a training unit, configured to perform iterative operations on the model to be trained based on the classification result and the encoding result of each region of the training sample under each iteration and the encoding result of each region of the feature map, so as to train a target detection model, wherein the target detection model is used for performing target detection on an image to be detected.
According to a fourth aspect of the present application, there is provided a domain-adaptation-based detection device, comprising:
a first acquisition module, configured to acquire an image to be detected;
a second acquisition module, configured to input the image to be detected into a target detection model to obtain one or more target objects in the image to be detected;
the target detection model is obtained by training a model to be trained through classification results and coding results of all areas of a training sample under each iteration and coding results of all areas of a feature map of the training sample; the classification result of each region of the training sample is obtained through a feature map of the training sample; the coding result of each region of the training sample is obtained by extracting the characteristics of the training sample; the training samples under each iteration comprise source domain samples and/or target domain samples; the target detection model classifies one or more target objects in an image to be detected through a feature map of the image to be detected.
According to a fifth aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods of the present application.
According to a sixth aspect of the present application there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of the present application.
In the present application, the training samples and the feature maps of the training samples are divided into regions, and the classification results and encoding results of the regions of the training samples are combined with the encoding results of the regions of the feature maps of the training samples to train the model to be trained. The features of targets (objects) of the same class in different domains and/or in the same domain can thereby be made similar, while the features of targets (objects) of different classes in different domains and/or in the same domain are pushed apart, which ensures the training accuracy of the target detection model.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 shows a first implementation flow diagram of a training method based on a domain-adaptive detection model in an embodiment of the present application;
FIG. 2 shows a second implementation flow chart of a training method based on a domain-adaptive detection model in an embodiment of the present application;
FIG. 3 shows a third implementation flow diagram of a training method based on a domain-adaptive detection model in an embodiment of the present application;
FIG. 4 is a schematic diagram of the division of a sample image in an embodiment of the application;
FIG. 5 shows a fourth implementation flow diagram of a training method based on a domain-adaptive detection model in an embodiment of the present application;
FIG. 6 is a schematic diagram showing an implementation flow of a domain-adaptive detection method in an embodiment of the present application;
FIG. 7 is a schematic diagram showing the composition and structure of a training device based on a domain-adaptive detection model in an embodiment of the present application;
FIG. 8 is a schematic diagram showing the composition and structure of a detection device based on domain adaptation in an embodiment of the present application;
FIG. 9 is a schematic diagram showing a composition structure of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present application more comprehensible, the technical solutions in the embodiments of the present application will be described clearly below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without making any inventive effort shall fall within the scope of protection of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", and the like are merely used to distinguish between similar objects and do not represent a particular ordering of the objects, it being understood that the "first", "second", or the like may be interchanged with one another, if permitted, to enable embodiments of the application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
It should be understood that, in the various embodiments of the present application, the size of the sequence numbers of the implementation processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
In the training stage, iterative operations are performed on the model to be trained based on the classification results and encoding results of the regions of the training sample under each iteration and the encoding results of the regions of the feature map of the training sample, so as to train a target detection model. Compared with the related art, the procedure is simple and easy to implement. The classification results and encoding results of the regions of the training sample and the encoding results of the regions of the feature map of the training sample are combined to train the model to be trained, so that the features of targets (objects) of the same class in different domains and/or in the same domain become similar, while the features of targets (objects) of different classes in different domains and/or in the same domain are pushed apart, thereby ensuring the training accuracy of the target detection model. In the application stage, i.e., the model inference stage, an accurate target detection model improves the detection accuracy of the target objects in the image to be detected, achieving a highly accurate detection effect.
The following describes the technical scheme of the training stage of the application.
Fig. 1 shows a schematic implementation flow diagram of a training method based on a domain-adaptive detection model in an embodiment of the present application. As shown in fig. 1, the method includes:
s101: training samples for each iteration are obtained, the training samples for each iteration comprising source domain samples and/or target domain samples.
In the application, the training sample can be any one of images, texts and voices. In some embodiments, the training samples are preferably images.
In the present application, iterative operations are used to iterate the model to be trained a plurality of times, thereby training the model to be trained. The number of iterations used in the iterative operations can be set flexibly according to the actual situation. The training samples under each iteration are drawn from two different domains (the target domain and the source domain), i.e., from source domain samples and/or target domain samples, where the source domain samples have sample labels and the target domain samples do not.
In practice, the training samples for a given iteration may be only source domain samples or only target domain samples, or may include both source domain samples and target domain samples. It is also possible that, of two adjacent iterations, one uses source domain samples and the other uses target domain samples. This is not specifically limited here.
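As an illustration only (not part of the patent), the following sketch shows one way the training samples for each iteration could be assembled, assuming two Python iterables of source-domain (labeled) and target-domain (unlabeled) samples; the names source_loader and target_loader are hypothetical.

    import itertools

    def iterate_training_samples(source_loader, target_loader, num_iterations):
        # source-domain samples carry labels; target-domain samples do not,
        # and a single iteration may use source samples, target samples, or both
        source_iter = itertools.cycle(source_loader)
        target_iter = itertools.cycle(target_loader)
        for i in range(num_iterations):
            source_batch = next(source_iter)   # (images, region labels)
            target_batch = next(target_iter)   # images only
            yield i, source_batch, target_batch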
It can be understood that the technical solution of the present application trains the model to be trained using both the source domain samples and the target domain samples in the case where target domain sample labels are absent and only source domain sample labels are available, and can therefore be regarded as a domain adaptation scheme.
S102: a feature map of the training sample at each iteration is obtained.
In this step, features are extracted from the training sample under each iteration to obtain the feature map of the training sample under each iteration.
S103: and based on the feature map, obtaining a classification result of each region of the training sample under each iteration.
In the present application, the training sample may be divided into regions according to a first region division strategy, thereby obtaining the regions of the training sample. The first region division strategy may indicate whether the division is into regions of equal or unequal size, and may also indicate the number of regions to be divided.
Illustratively, as shown in fig. 4, a training image is divided into 8 rows by 8 columns, i.e., 64 equally sized regions, according to the first region division strategy. Each region may be regarded as an image block of the training image. The target objects present in each image block are identified using the feature map of the training image, so as to obtain the classification result of each region of the training image. For example, if the target object present in region 1 (image block 1) is a car, the classification result of region 1 is that the target object of the region is a "car"; if the target object present in region 2 (image block 2) is a pedestrian, the classification result of region 2 is that the target object of the region is a "pedestrian".
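A minimal sketch of the 8-row by 8-column region division described above, assuming the training image (or its feature map) is held as a PyTorch tensor of shape (channels, height, width) with height and width divisible by 8; the function name is illustrative.

    import torch

    def divide_into_regions(image, rows=8, cols=8):
        # split a (C, H, W) tensor into rows*cols equally sized regions;
        # region k corresponds to grid cell (k // cols, k % cols)
        c, h, w = image.shape
        rh, rw = h // rows, w // cols
        regions = image.unfold(1, rh, rh).unfold(2, rw, rw)       # (C, rows, cols, rh, rw)
        regions = regions.permute(1, 2, 0, 3, 4).reshape(rows * cols, c, rh, rw)
        return regions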
In the foregoing scheme, each region of the training sample is treated as a whole based on the feature map of the training sample, so as to distinguish or classify the target objects present in the regions of the training sample. In addition, each region of the training sample includes a plurality of pixels; based on the feature map of the training sample, a classification result of each pixel in each region of the training sample under each iteration may be obtained, and the classification result of each region is then obtained from the classification results of the pixels in that region.
Illustratively, taking region 1, which includes a plurality of pixels, as an example, the classification result of each pixel in region 1 is predicted or identified based on the feature map of the training sample. For example, assuming that region 1 includes 100 pixels, the target object identified for pixels 1 to 90 is "car" and the target object identified for pixels 91 to 100 is "pedestrian". The target object of the majority of the pixels (90 pixels) in region 1 is therefore "car". The classification result of the majority of the pixels is taken as the classification result of region 1, that is, the classification result of region 1 is that the target object of the region is a "car".
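A sketch of the majority-vote rule in the example above, assuming the per-pixel classification results are available as a (height, width) tensor of class indices (e.g. 0 for pedestrian, 1 for car); the helper name is illustrative.

    import torch

    def region_classes_by_majority(pixel_classes, rows=8, cols=8):
        # assign each region the class held by the majority of its pixels
        h, w = pixel_classes.shape
        rh, rw = h // rows, w // cols
        region_classes = torch.empty(rows, cols, dtype=torch.long)
        for r in range(rows):
            for c in range(cols):
                block = pixel_classes[r * rh:(r + 1) * rh, c * rw:(c + 1) * rw]
                region_classes[r, c] = torch.mode(block.reshape(-1)).values
        return region_classes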
S104: and obtaining the coding result of each region of the characteristic diagram.
In the present application, the feature map may be divided into regions according to a second region division strategy to obtain the regions of the feature map. The second region division strategy may indicate whether the division is into regions of equal or unequal size, and may also indicate the number of regions to be divided. The first region division strategy and the second region division strategy may be the same strategy or different strategies; in some embodiments, they are preferably the same strategy.
In this step, the feature map may first be divided according to the second region division strategy to obtain the regions of the feature map, and each region of the feature map is then encoded to obtain the encoding result of each region. Alternatively, the feature map may first be encoded to obtain an encoding result for the whole feature map; the feature map is then divided into regions according to the second region division strategy, and the encoding result corresponding to each region is read from the encoding result of the feature map, thereby obtaining the encoding results of the regions of the feature map.
S105: the encoding results for each region of the training sample at each iteration are obtained.
In this step, each region of the training sample under each iteration is encoded to obtain the encoding result of each region of the training sample. Alternatively, the whole training sample under each iteration may be encoded to obtain an encoding result for the training sample, and the encoding result corresponding to each region is then read from the encoding result of the training sample, thereby obtaining the encoding results of the regions of the training sample.
S106: and carrying out iterative operation on the model to be trained based on the classification result and the coding result of each region of the training sample under each iteration and the coding result of each region of the feature map so as to train a target detection model, wherein the target detection model is used for carrying out target detection on the image to be detected.
In S101-S106, over the whole training process the training samples include both source domain samples and target domain samples. The training samples and the feature maps of the training samples are divided into regions, and the classification results and encoding results of the regions of the training samples are combined with the encoding results of the regions of the feature maps of the training samples to train the model to be trained. The encoding results of the regions of the training sample and the encoding results of the regions of the feature map of the training sample both reflect the features of the targets in the training sample, and training based on these features makes the features of targets (objects) of the same class in different domains and/or in the same domain become similar, while the features of targets (objects) of different classes in different domains and/or in the same domain are pushed apart, thereby ensuring the training accuracy of the target detection model. Compared with the related art, the technical solution of the present application is simple in procedure and easy to implement.
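To make the flow of S101-S106 concrete, the following sketch condenses one training iteration into code. It assumes PyTorch-style modules named backbone, detection_head, query_encoder and key_encoder and two loss callables; these names and the exact way the losses are combined are assumptions for illustration, since the patent only states that the classification results and the two sets of region encoding results are combined to train the model.

    def train_one_iteration(sample, region_labels, backbone, detection_head,
                            query_encoder, key_encoder, optimizer,
                            classification_loss, contrastive_loss):
        # region_labels is None when the sample comes from the (unlabeled) target domain
        feature_map = backbone(sample)                # S102: feature map of the training sample
        region_scores = detection_head(feature_map)   # S103: classification result of each region
        query_vectors = query_encoder(feature_map)    # S104: encoding result of each region of the feature map
        key_vectors = key_encoder(sample)              # S105: encoding result of each region of the training sample
        # S106: combine classification and encoding results into the training objective
        loss = contrastive_loss(query_vectors, key_vectors, region_scores)
        if region_labels is not None:                  # classification term only for labeled source samples
            loss = loss + classification_loss(region_scores, region_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()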
As shown in fig. 2, in some embodiments, the above-mentioned technical solution for obtaining the encoding result of each region of the feature map may be implemented as follows.
S201: a partitioning strategy for each region of the training sample is determined.
In this step, it is determined how the training sample is divided; for example, it is determined that the training sample is divided using the first region division strategy.
S202: and carrying out region division on the feature map according to a division strategy of each region of the training sample to obtain each region of the feature map.
In this step, the second region division strategy is taken to be the same as the first region division strategy, and the feature map is divided into regions according to this strategy, so as to obtain the regions of the feature map.
Illustratively, if the training image is divided into 8 rows by 8 columns, i.e., 64 equally sized regions, according to the first region division strategy as shown in fig. 4, then in this step the feature map of the training image is likewise divided into 64 equally sized regions of 8 rows by 8 columns.
S203: and adopting a preset coding algorithm to code each region of the feature map to obtain a coding result of each region of the feature map.
In this step, the preset encoding algorithm may be convolution-based encoding. In implementation, a convolutional layer is used to encode each region of the feature map, so as to obtain the encoding result of each region of the feature map.
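A minimal sketch of such a convolutional region encoder, assuming PyTorch: a single convolution whose kernel size and stride both equal the region size produces exactly one output position per region, i.e. one encoding vector for each of the 8×8 regions. The class name and dimensions are illustrative.

    import torch.nn as nn

    class RegionEncoder(nn.Module):
        # encodes each grid region of a (B, C, H, W) input into one feature vector
        def __init__(self, in_channels, embed_dim, region_size):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, embed_dim,
                                  kernel_size=region_size, stride=region_size)

        def forward(self, x):
            grid = self.conv(x)                 # (B, embed_dim, 8, 8) for an 8x8 division
            b, d, rows, cols = grid.shape
            # one row per region, ordered row by row
            return grid.permute(0, 2, 3, 1).reshape(b, rows * cols, d)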
In S201 to S203, the feature map is divided into regions according to the same division strategy used for the regions of the training sample, which ensures that the region division of the feature map is consistent with that of the training sample and facilitates training of the model to be trained. Each region of the feature map is encoded using the preset encoding algorithm to obtain accurate encoding results, thereby enabling accurate training of the model to be trained.
As shown in fig. 3, in some embodiments, the foregoing technical solution for performing iterative operation on the model to be trained to train the target detection model based on the classification result and the encoding result of each region of the training sample under each iteration and the encoding result of each region of the feature map may be implemented as follows.
S301: and obtaining each first feature vector marked with the classification result based on the classification result of each region of the training sample under each iteration and the coding result of each region of the feature map of the training sample, wherein each first feature vector corresponds to each region of the feature map.
Taking the ith iteration as an example, the convolutional layer is used to encode each region of the feature map of the training sample under the ith iteration, and the resulting encoding results are the feature vectors of the regions of the feature map. For convenience of explanation, a feature vector obtained by encoding a region of the feature map is referred to as a first feature vector. For one training image, the number of first feature vectors obtained by encoding the regions of the feature map equals the number of regions into which the feature map is divided.
For example, if the feature map is divided into 8×8=64 equal-sized regions, the number of the first feature vectors obtained for the feature map is also 64. The 64 first feature vectors (first feature vector 1 to first feature vector 64) correspond to 64 regions (region 1 to region 64) of the feature map. For example, the first feature vector obtained by encoding the region 1 of the feature map using the convolution layer is the first feature vector 1. The first feature vector obtained by encoding the region 2 of the feature map is the first feature vector 2.
Since the training sample and the feature map have the same size, and the region division of the training sample and the region division of the feature map use the same region division strategy, each region of the training sample under the ith iteration corresponds to a region of the feature map of the training sample. Illustratively, region 1 of the training sample (the row 1, column 1 region of the training sample) corresponds to region 1 of the feature map of the training sample (the row 1, column 1 region of the feature map), and region 2 of the training sample (the row 1, column 2 region of the training sample) corresponds to region 2 of the feature map of the training sample (the row 1, column 2 region of the feature map).
When each region of the training sample corresponds to each region of the feature map of the training sample and each first feature vector corresponds to each region of the feature map, each region of the training sample is associated with each first feature vector. In this way, the classification result of each region of the training sample may be used as the classification result of each first feature vector, thereby obtaining each first feature vector identified with the classification result.
Illustratively, if the classification result of region 1 of the training sample is "01", where "01" indicates that the main target object of the region is a car, then the classification result of the first feature vector 1 of region 1 of the feature map of the training sample is also "01". If the classification result of region 2 of the training sample is "00", where "00" indicates that the main target object of the region is a pedestrian, then the classification result of the first feature vector 2 of region 2 of the feature map of the training sample is also "00".
S302: and obtaining second feature vectors marked with the classification results based on the classification results and the coding results of the areas of the training samples under each iteration, wherein the second feature vectors correspond to the areas of the training samples.
Taking the ith iteration as an example, where i is a positive integer greater than or equal to 1, the convolutional layer is used to encode each region of the training sample under the ith iteration, and the resulting encoding results are the feature vectors of the regions of the training sample. For convenience of explanation, a feature vector obtained by encoding a region of the training sample is referred to as a second feature vector. The number of second feature vectors obtained by encoding the regions of one training image equals the number of regions into which the training image is divided.
For example, if the training sample is divided into 8×8=64 equal-sized regions as shown in fig. 4, the number of the obtained second feature vectors is also 64 for the training sample. The 64 second feature vectors (second feature vector 1 to second feature vector 64) correspond to 64 regions (region 1 to region 64) of the training sample. For example, the second feature vector obtained by encoding the region 1 (the 1 st row 1 column region) of the training sample using the convolutional layer is the second feature vector 1. The second feature vector obtained by encoding the region 2 (the 1 st row 2 column region) of the training sample is the second feature vector 2.
When each region of the training sample corresponds to each second feature vector, the classification result of each region of the training sample may be used as the classification result of each second feature vector, thereby obtaining each second feature vector identified with the classification result.
Illustratively, if the classification result of the region 1 of the training sample is "01" and "01" indicates that the main target object of the region 1 is a car, the classification result of the second feature vector 1 corresponding to the region 1 of the training sample is also "01". The classification result of the region 2 of the training sample is "00", and "00" indicates that the main target object of the region 2 is a pedestrian, and the classification result of the second feature vector 2 corresponding to the region 2 of the training sample is also "00".
S303: performing iterative operation on the model to be trained based on the first feature vectors marked with the classification results and the second feature vectors marked with the classification results and/or based on the second feature vectors marked with the classification results and the third feature vectors marked with the classification results under the previous training samples so as to train a target detection model; wherein, each third feature vector is obtained by the classification result and the coding result of each region of the training sample under the previous iteration.
In the present application, taking the training sample under the ith iteration (i greater than 1) as an example, the previous training sample refers to the training sample under the (i-1)th iteration. The way in which the third feature vectors marked with classification results are obtained for the training sample under the (i-1)th iteration follows the same process as obtaining the second feature vectors marked with classification results under the (i-1)th iteration, and is not described in detail again.
In S301 to S303, the training samples include both source domain samples and target domain samples. The model to be trained is trained based on the first feature vectors and the second feature vectors marked with classification results, and/or based on the second feature vectors marked with classification results and the third feature vectors marked with classification results under the previous training sample. The first to third feature vectors reflect the features of the various targets present in the training samples. In the present application, the features of targets of the same class in different domains and the features of targets of different classes in different domains are both taken into account, and training is performed based on these features, so that the training accuracy of the target detection model can be ensured.
In the present application, the scheme of performing iterative operations on the model to be trained, based on the first feature vectors marked with classification results and the second feature vectors marked with classification results, so as to train the target detection model, can be implemented in at least one of the following three manners.
Mode one:
step A1: from each first feature vector identified with the classification result and each second feature vector identified with the classification result, a first feature vector and a second feature vector corresponding to the same region of the training sample and a first feature vector and a second feature vector corresponding to different regions of the training sample are determined.
In plain terms, this step determines, from all the first feature vectors and all the second feature vectors, which first and second feature vectors correspond to the same region of the training sample and which correspond to different regions of the training sample. Illustratively, the first feature vector 1 and the second feature vector 1 are two feature vectors corresponding to the same region of the training sample, namely region 1. The first feature vector 1 corresponds to region 1 of the training sample and the second feature vector 2 corresponds to region 2 of the training sample, so the first feature vector 1 and the second feature vector 2 are a first feature vector and a second feature vector corresponding to different regions of the training sample.
Step A2: a first instance pair is derived based on a first feature vector and a second feature vector corresponding to the same region of the training sample.
In this step, the first feature vector and the second feature vector corresponding to the same region of the training sample are aggregated to form a first instance pair. That is, the first instance pair is a set of two feature vectors (a first feature vector and a second feature vector) corresponding to the same region of the training sample.
If the training sample is divided into 8×8=64 equally sized regions as shown in fig. 4, then, as described above, the numbers of first feature vectors and second feature vectors are both 64, and the number of resulting first instance pairs is also 64. For example, if the first feature vector 1 is the first feature vector corresponding to region 1 of the training sample, and the second feature vector 1 is the second feature vector corresponding to region 1 of the training sample, then the first feature vector 1 and the second feature vector 1 may be grouped into a first instance pair. A first instance pair is a set of two feature vectors corresponding to the same region of the training sample and can be regarded as a positive example pair.
Step A3: a second instance pair is derived based on the first feature vector and the second feature vector corresponding to different regions of the training sample.
In this step, a first feature vector and a second feature vector corresponding to different regions of the training sample are aggregated to form a second instance pair. That is, a second instance pair is a set of two feature vectors (a first feature vector and a second feature vector) corresponding to different regions of the training sample. For example, if the first feature vector 1 is the first feature vector corresponding to region 1 of the training sample, and the second feature vector 2 is the second feature vector corresponding to region 2 of the training sample, then the first feature vector 1 and the second feature vector 2 may be grouped into a second instance pair. A second instance pair is a set of two feature vectors corresponding to different regions of the training sample and can be regarded as a negative example pair.
Step A4: and carrying out iterative operation on the model to be trained based on the first instance pair and the second instance pair so as to train the target detection model.
It will be appreciated that a first instance pair consists of two feature vectors derived from the same region of the same training sample, while a second instance pair consists of two feature vectors derived from different regions of the same training sample; together they reflect the features of targets of the same class and of targets of different classes within the same training sample. Training based on such features within one iteration ensures that the features of targets of the same class in samples of the same domain become similar, while the features of targets (objects) of different classes in the same domain are pushed apart. The whole training process comprises multiple iterations, and the training samples over these iterations include both target domain samples and source domain samples; therefore, training based on these features over the whole training process ensures that the features of targets (objects) of the same class in samples of different domains become similar and that the features of targets of different classes in different domains are pushed apart, thereby achieving accurate training of the target detection model.
In steps A1 to A4, positive and negative example pairs are distinguished from the perspective of the regions of the training sample, thereby enabling accurate training of the target detection model.
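The patent does not prescribe a particular loss over the first and second instance pairs; the following sketch assumes an InfoNCE-style contrastive loss in which, for every region, the matching first/second feature vector pair is pulled together (positive) and all mismatched pairs are pushed apart (negative). The function name and temperature value are assumptions.

    import torch
    import torch.nn.functional as F

    def mode_one_loss(first_vectors, second_vectors, temperature=0.07):
        # first_vectors, second_vectors: (N, D) tensors, one row per region of the
        # same training sample (N = 64 for an 8x8 division); row i of both tensors
        # describes the same region, so (first_i, second_i) is a first (positive)
        # instance pair and (first_i, second_j) with i != j is a second (negative) pair
        q = F.normalize(first_vectors, dim=1)
        k = F.normalize(second_vectors, dim=1)
        logits = q @ k.t() / temperature          # similarity of every first/second pair
        targets = torch.arange(q.size(0))         # the matching region index is the positive
        return F.cross_entropy(logits, targets)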
Mode two:
step B1: from each second feature vector identified with a classification result and each third feature vector identified with a classification result, a second feature vector and a third feature vector identified with the same classification result, and a second feature vector and a third feature vector identified with different classification results are determined.
In plain terms, this step determines, from all the second feature vectors and all the third feature vectors, which second and third feature vectors have the same classification result and which second and third feature vectors have different classification results.
Illustratively, if the classification results of the second feature vector 1 and the third feature vector 1 are both "00", it is determined that the second feature vector 1 and the third feature vector 1 are two feature vectors (second feature vector and third feature vector) having the same classification result. If the classification result of the second feature vector 2 is "00" and the classification result of the third feature vector 2 is "01", it is determined that the second feature vector 2 and the third feature vector 2 are two feature vectors having different classification results.
Step B2: a third instance pair is obtained based on the second feature vector and the third feature vector identified with the same classification result.
In this step, a second feature vector and a third feature vector having the same classification result are aggregated to form a third instance pair. That is, a third instance pair is a set of two feature vectors (a second feature vector and a third feature vector) having the same classification result and can be regarded as a positive example pair. As in the previous example, the second feature vector 1 and the third feature vector 1 may be grouped into a third instance pair.
Step B3: a fourth instance pair is obtained based on the second feature vector and the third feature vector identified with different classification results.
In this step, a second feature vector and a third feature vector having different classification results are aggregated to form a fourth instance pair. That is, a fourth instance pair is a set of two feature vectors (a second feature vector and a third feature vector) having different classification results and can be regarded as a negative example pair. As in the previous example, the second feature vector 2 and the third feature vector 2 may be grouped into a fourth instance pair.
Step B4: and carrying out iterative operation on the model to be trained based on the third instance pair and the fourth instance pair so as to train the target detection model.
It can be understood that, within one iteration, a third instance pair consists of two feature vectors with the same classification result derived from the same training sample, while a fourth instance pair consists of two feature vectors with different classification results derived from the same training sample; together they reflect the features of targets of the same class and of targets of different classes within the same training sample. Training based on such features within one iteration ensures that the features of targets of the same class in samples of the same domain become similar, while the features of targets of different classes in the same domain are pushed apart. The whole training process comprises multiple iterations, and the training samples over these iterations include both target domain samples and source domain samples; therefore, training based on these features over the whole training process ensures that the features of targets of the same class in samples of different domains become similar and that the features of targets of different classes in different domains are pushed apart, thereby achieving accurate training of the target detection model.
In steps B1 to B4, positive and negative example pairs are distinguished from the perspective of the classification results of the training sample, thereby enabling accurate training of the target detection model.
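Analogously, the following sketch treats mode two with a supervised-contrastive-style loss: every second/third feature vector pair with the same classification result is a positive (third instance pair) and every pair with different classification results is a negative (fourth instance pair). The loss form is an assumption; the patent only specifies how the pairs are formed.

    import torch
    import torch.nn.functional as F

    def mode_two_loss(second_vectors, second_labels, third_vectors, third_labels,
                      temperature=0.07):
        # second_vectors: (N, D) vectors of the current training sample
        # third_vectors:  (M, D) vectors of the previous training sample
        # *_labels: region classification results (integer class ids) of each vector
        q = F.normalize(second_vectors, dim=1)
        k = F.normalize(third_vectors, dim=1)
        logits = q @ k.t() / temperature                                  # (N, M)
        pos_mask = (second_labels.unsqueeze(1) == third_labels.unsqueeze(0)).float()
        log_prob = F.log_softmax(logits, dim=1)
        # average log-probability assigned to the positives of each second vector
        per_vector = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
        has_positive = pos_mask.sum(dim=1) > 0
        return per_vector[has_positive].mean()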
Mode three: and performing iterative operation on the model to be trained based on the first instance pair and the second instance pair and the third instance pair and the fourth instance pair to train the target detection model.
It will be appreciated that mode three is essentially a combination of mode one and mode two. If the first and second instance pairs are regarded as a first set of positive and negative example pairs and the third and fourth instance pairs as a second set, iterative operations can be performed on the model to be trained based on both sets of positive and negative example pairs, so that the features of targets (objects) of the same class in different domains and/or in the same domain become similar and the features of targets (objects) of different classes in different domains and/or in the same domain are pushed apart, thereby ensuring the training accuracy of the target detection model.
The following describes the technical scheme of the present application in detail with reference to fig. 4 and 5.
Training samples of two domains are collected in advance: source domain training samples and target domain training samples. In this application scenario, the source domain training samples may be images containing one or more target objects used in a simulation scenario, and the target domain training samples may be images containing one or more target objects in an actual scene. Both types of images may be collectively referred to as sample images, or more specifically as source domain sample images and target domain sample images. Taking the ith iteration as an example, the sample image under the ith iteration may be only a source domain sample image, only a target domain sample image, or both a source domain sample image and a target domain sample image.
As shown in fig. 5, the model to be trained includes a backbone network (backbone) and a detection head network (detection head). The backbone network and the detection head network may specifically be convolutional neural networks (CNNs). In the present application, the model to be trained is trained until the expected stability and robustness are achieved. Fig. 5 includes branch 1, branch 2 and branch 3.
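A minimal PyTorch skeleton of such a model under the assumption that both parts are small CNNs; the layer sizes are placeholders and not taken from the patent.

    import torch.nn as nn

    class DetectionModel(nn.Module):
        # backbone network + detection head network, both convolutional
        def __init__(self, num_classes):
            super().__init__()
            self.backbone = nn.Sequential(     # extracts the feature map of the sample image
                nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            )
            self.detection_head = nn.Conv2d(64, num_classes, kernel_size=1)  # per-pixel class scores

        def forward(self, x):
            feature_map = self.backbone(x)
            pixel_scores = self.detection_head(feature_map)   # (B, num_classes, H, W)
            return feature_map, pixel_scores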
In the present application, the sample image and the feature map of the sample image have the same size, and, where required, the same region division strategy is used to divide the sample image and the feature map into regions. As shown in fig. 4, the sample image and the feature map are both divided using a region division strategy of equal division into 8 rows by 8 columns, 64 regions in total.
For the sample image under the ith iteration, the following processing of branch 1, branch 2 and branch 3 is performed respectively.
The processing of branch 1 includes the following. The sample image under the ith iteration is input into the backbone network to be trained. The backbone network performs feature extraction on the sample image under the ith iteration, thereby obtaining the feature map of the sample image under the ith iteration. The feature map can be understood as a picture with three channels, RGB (red, green, blue). The feature map is input into the detection head network. Based on the feature map, the detection head network performs classification prediction for each pixel in each region (8×8=64 regions) of the sample image under the ith iteration. For a given region, the classification result of the majority of the pixels in the region is taken as the classification result of the region. Illustratively, if within a region the classification result of the majority of the pixels is the target object "car" (which may be represented by 01), then the classification result of the region is the target object "car". In this way, the classification result (classification probabilities) of each of the 64 regions of the sample image under the ith iteration can be obtained.
It will be appreciated that the source domain sample image is a labeled training sample, the label indicating, for example, which target object(s) are included in the source domain sample image, while the target domain sample image is typically an unlabeled training sample. In branch 1, the loss function is calculated from the (actual) labels of the labeled source domain sample image and the classification result of each region of the sample image obtained by the above scheme. This loss may be regarded as a loss caused by incorrect classification and may simply be referred to as a classification loss or a target detection loss. The classification loss function may be any reasonable function for calculating classification or target detection losses, such as a mean squared error loss (Mean Squared Error Loss), a mean absolute error loss (Mean Absolute Error Loss), a quantile loss (Quantile Loss), and the like.
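The following sketch computes the branch-1 loss for a labeled source domain sample image using per-pixel cross-entropy, which is one reasonable choice of classification loss but is not one of the functions explicitly named above; the shapes assume the per-pixel scores produced by the detection head.

    import torch.nn.functional as F

    def branch1_classification_loss(pixel_scores, pixel_labels):
        # pixel_scores: (B, num_classes, H, W) predictions of the detection head
        # pixel_labels: (B, H, W) ground-truth class index of every pixel of the
        # labeled source domain sample image
        return F.cross_entropy(pixel_scores, pixel_labels)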
The processing of branch 2 comprises: the sample image at the ith iteration is input into the backbone network to be trained. The backbone network performs feature extraction on the sample image at the ith iteration to obtain a feature map of that sample image. The feature map can be understood as a picture with three RGB (red, green, blue) channels. As shown in fig. 5, this part of branch 2 is identical to the corresponding part of branch 1 and may be merged into the same processing.
The feature map is divided into 64 regions of 8 rows by 8 columns according to the same region division strategy used for the sample image at the ith iteration. The 64 regions of the feature map are input into a first encoding network (query encoder). The first encoding network can be regarded as a convolution layer; it encodes each region of the feature map through a convolution operation to obtain an encoding result for each region. The convolution operation down-samples each region into one feature vector, yielding 64 feature vectors in total. These 64 feature vectors may be used as the first feature vectors (query vectors). They correspond one-to-one to the 64 regions of the feature map: each region of the feature map has a corresponding feature vector. Illustratively, the region in row 1, column 1 of the feature map corresponds to the 1st of the 64 feature vectors, the region in row 1, column 2 corresponds to the 2nd feature vector, the region in row 2, column 1 corresponds to the 9th feature vector, and so on.
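Illustratively, the first encoding network can be sketched as a single convolution whose kernel size and stride equal the region size, so that one query vector is produced per region (the channel and embedding dimensions, the 64x64 spatial size and the class name are assumptions for illustration only):

```python
import torch
import torch.nn as nn

class QueryEncoder(nn.Module):
    """Sketch of the first encoding network: a convolution layer whose kernel
    and stride equal the region size, which is equivalent to encoding each of
    the 64 regions separately and down-sampling it into one feature vector."""
    def __init__(self, in_channels: int = 256, embed_dim: int = 128, region_size: int = 8):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=region_size, stride=region_size)

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # feature_map: [B, C, H, W] -> [B, embed_dim, H/region, W/region]
        x = self.proj(feature_map)
        # Flatten the 8x8 grid row by row into 64 query vectors, matching the
        # region-to-vector indexing described in the text.
        return x.flatten(2).transpose(1, 2)  # [B, 64, embed_dim]

query_vectors = QueryEncoder()(torch.randn(1, 256, 64, 64))
print(query_vectors.shape)  # torch.Size([1, 64, 128])
```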
The 64 feature vectors obtained through the first encoding network are combined, one by one, with the classification results of the corresponding regions of the sample image at the ith iteration, yielding first feature vectors identified with classification results. For example, if the classification result of the row-1, column-1 region of the sample image is "00", then the 1st of the 64 feature vectors is identified with the classification result "00"; if the classification result of the row-1, column-2 region is "01", then the 2nd feature vector is identified with "01". The first feature vectors identified with classification results are represented in fig. 5 as the classified query embeddings.
The processing of branch 3 comprises: the sample image at the ith iteration is input into a second encoding network (key encoder). The second encoding network is a convolution layer that performs a convolution operation on the sample image at the ith iteration. When the sample image is divided into 64 regions of 8 rows by 8 columns, the convolution operation in the second encoding network encodes each region, producing 64 feature vectors (key embeddings). The 64 feature vectors obtained through the second encoding network may be used as the second feature vectors.
Similarly, the 64 feature vectors obtained through the second encoding network are combined, one by one, with the classification results of the corresponding regions of the sample image at the ith iteration, yielding second feature vectors identified with classification results. For example, if the classification result of the row-1, column-1 region of the sample image is "00", then the 1st of the 64 feature vectors is identified with "00"; if the classification result of the row-1, column-2 region is "01", then the 2nd feature vector is identified with "01". The second feature vectors identified with classification results are represented in fig. 5 as the classified key embeddings.
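Illustratively, attaching the per-region classification results to the query and key embeddings can be sketched as follows (tensor shapes and names are assumptions for illustration only):

```python
import torch

def attach_labels(vectors: torch.Tensor, region_labels: torch.Tensor):
    """Pair each region's embedding with that region's classification result,
    yielding the 'classified' embeddings of fig. 5 as (label, vector) tuples."""
    return [(int(lbl), vec) for lbl, vec in zip(region_labels, vectors)]

query_vecs, key_vecs = torch.randn(64, 128), torch.randn(64, 128)  # from the two encoders
region_labels = torch.randint(0, 4, (64,))                         # from the detection head
classified_query = attach_labels(query_vecs, region_labels)  # classified query embeddings
classified_key = attach_labels(key_vecs, region_labels)      # classified key embeddings
```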
Having obtained the first feature vectors and the second feature vectors at the ith iteration, it is determined, among all first feature vectors and all second feature vectors, which first and second feature vectors correspond to the same region of the training sample, and which first and second feature vectors correspond to different regions of the training sample.
Illustratively, first feature vector 1 and second feature vector 1 are two feature vectors corresponding to the same region of the training sample, such as region 1. First feature vector 2 corresponds to region 2 of the training sample and second feature vector 5 corresponds to region 5 of the training sample; first feature vector 2 and second feature vector 5 are therefore a first feature vector and a second feature vector corresponding to different regions of the training sample.
A first feature vector and a second feature vector corresponding to the same region of the sample image at the ith iteration are grouped or combined to form a positive instance pair (used as a first instance pair). A first feature vector and a second feature vector corresponding to different regions of the sample image at the ith iteration are grouped or combined to form a negative instance pair (used as a second instance pair).
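Illustratively, the first and second instance pairs can be built as follows (a sketch assuming the region-indexed [64, D] query and key tensors above):

```python
import torch

def region_pairs(query_vecs: torch.Tensor, key_vecs: torch.Tensor):
    """First instance pairs: query and key vectors of the same region.
    Second instance pairs: query and key vectors of different regions."""
    n = query_vecs.shape[0]
    positives = [(query_vecs[k], key_vecs[k]) for k in range(n)]
    negatives = [(query_vecs[i], key_vecs[j])
                 for i in range(n) for j in range(n) if i != j]
    return positives, negatives

positives, negatives = region_pairs(torch.randn(64, 128), torch.randn(64, 128))
print(len(positives), len(negatives))  # 64 4032
```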
The foregoing describes how the second feature vectors are obtained at the ith iteration. It will be appreciated that at every iteration preceding the ith iteration, the key embeddings of that iteration are obtained in the same way. The key embeddings of each previous iteration are sorted according to their classification results and stored in a queue. The key embeddings stored in the queue may be regarded as third feature vectors. In the queue, the third feature vectors are stored by class, and each third feature vector in the queue is identified with its classification result.
Among the second feature vectors with classification results obtained from the sample image at the ith iteration and the third feature vectors with classification results obtained from the sample images at previous iterations, the second and third feature vectors having the same classification result are determined. For example, second feature vector 1 and third feature vector 1 both have the classification result target object "car", and/or second feature vector 2 and third feature vector 2 both have the classification result target object "pedestrian". Further, among all second and third feature vectors, the second and third feature vectors having different classification results are determined. For example, second feature vector 3 has the classification result target object "car" while third feature vector 3 has the classification result target object "pedestrian"; second feature vector 4 has the classification result target object "building" while third feature vector 4 has the classification result target object "sidewalk".
A second feature vector and a third feature vector having the same classification result are grouped or combined to obtain a positive instance pair (used as a third instance pair). A second feature vector and a third feature vector having different classification results are grouped or combined to obtain a negative instance pair (used as a fourth instance pair).
After the ith iteration ends, each second feature vector obtained at the ith iteration is stored in the queue according to its identified classification result, for use in later iterations.
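Illustratively, the classified queue of key embeddings and the third and fourth instance pairs can be sketched as follows (the dict-of-deques structure, the `max_len` bound and all names are assumptions for illustration only):

```python
from collections import deque
import torch

class ClassifiedQueue:
    """Stores key embeddings of previous iterations by classification result."""
    def __init__(self, max_len: int = 1024):
        self.max_len = max_len
        self.store = {}  # class id -> deque of third feature vectors

    def push(self, labels: torch.Tensor, key_vecs: torch.Tensor):
        # After an iteration ends, store each second feature vector under its label.
        for lbl, vec in zip(labels.tolist(), key_vecs):
            self.store.setdefault(lbl, deque(maxlen=self.max_len)).append(vec.detach())

    def pairs(self, labels: torch.Tensor, key_vecs: torch.Tensor):
        # Third instance pairs: current key embedding and a queued vector of the
        # same class; fourth instance pairs: current key embedding and a queued
        # vector of a different class.
        third, fourth = [], []
        for lbl, vec in zip(labels.tolist(), key_vecs):
            for stored_lbl, queued in self.store.items():
                for stored_vec in queued:
                    (third if stored_lbl == lbl else fourth).append((vec, stored_vec))
        return third, fourth
```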
As described above, two kinds of positive instance pairs (the first and third instance pairs) and two kinds of negative instance pairs (the second and fourth instance pairs) are obtained; there are typically multiple pairs of each kind. Each instance pair contains two feature vectors. The two feature vectors in each instance pair are multiplied together to obtain a result, and this result is input into a classifier. The model to be trained learns to separate the products of the feature vectors in the positive instance pairs from the products of the feature vectors in the negative instance pairs into two different categories, namely feature vectors of target objects of the same class and feature vectors of target objects of different classes.
It will be appreciated that positive instance pairs are sets of feature vectors having the same classification result or corresponding to the same region of the sample image, while negative instance pairs are sets of feature vectors having different classification results or corresponding to different regions of the sample image. Through the classifier, the model to be trained learns to pull the two feature vectors in each positive instance pair ever closer together while pushing the two feature vectors in each negative instance pair ever further apart. That is, within a single iteration, the features of targets of the same class become similar and the features of targets of different classes move apart.
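Illustratively, one way to realize the pair classifier is sketched below; reading the "multiplication" of the two vectors as a dot product and using a temperature-scaled binary cross-entropy objective are assumptions made for illustration, not choices fixed by the application:

```python
import torch
import torch.nn.functional as F

def pair_contrast_loss(positive_pairs, negative_pairs, temperature: float = 0.07):
    """Classify pair similarities: positives toward 1, negatives toward 0,
    which pulls same-class features together and pushes different-class
    features apart."""
    pos = torch.stack([torch.dot(a, b) for a, b in positive_pairs]) / temperature
    neg = torch.stack([torch.dot(a, b) for a, b in negative_pairs]) / temperature
    logits = torch.cat([pos, neg])
    targets = torch.cat([torch.ones_like(pos), torch.zeros_like(neg)])
    return F.binary_cross_entropy_with_logits(logits, targets)
```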
Over the whole iteration process, the sample images include target domain sample images and source domain sample images, and the iterative operation on the model to be trained using the two kinds of positive instance pairs and the two kinds of negative instance pairs ensures that the features of targets (objects) of the same class, across different domains and/or within the same domain, are similar, while the features of targets of different classes, across different domains and/or within the same domain, are far apart, thereby achieving accurate training of the model to be trained.
In the training scheme of the application, branch 1 is a target detection branch that teaches the model to be trained to detect target objects. Branches 2 and 3 together can be regarded as a contrastive learning branch that teaches the model to be trained to make the features of targets of the same class similar and the features of targets of different classes more distant. Structurally, fig. 5 of the present application adds a contrastive learning branch to the target detection branch. By dividing the sample image and its feature map into regions or image blocks (patches) of the same size, the target detection branch learns the features of target objects in order to detect them, while the contrastive learning branch makes the features of targets (objects) of the same class similar and the features of targets of different classes distant, thereby aligning the features of targets (objects) in the source domain samples and the target domain samples. The model to be trained can thus achieve better generalization in the target domain.
In plain terms, by adding a contrastive learning branch on top of the target detection branch, the domain-adaptive training method provided by the application makes the features of targets (objects) of the same class across different domains and/or within the same domain similar, and the features of targets of different classes across different domains and/or within the same domain distant, thereby improving the generalization ability of the model in the target domain.
The technical scheme of the application can be regarded as a self-supervised contrastive learning method for solving the domain adaptation problem of target object detection. Through the contrastive learning branch, the model to be trained learns the features of target objects in the source domain and the target domain and aligns them, thereby achieving domain adaptation.
In the training scheme of the present application, the loss function includes not only the target detection loss produced by branch 1 (detection loss in fig. 5) but also the contrastive learning loss produced by the contrastive learning branch (contrast loss in fig. 5). The target detection loss is calculated from the label of the source domain sample image and the actual classification result of its target objects detected by branch 1. In brief, the target detection loss is calculated only on source domain samples, giving the model to be trained the ability to find and classify targets. The contrastive learning loss is obtained from the source domain sample images, the target domain sample images and the classification results of the classifier. In brief, the contrastive learning loss is calculated on all sample images, giving the model to be trained the ability to generate, in the target domain, features that are the same as or similar to those of the source domain target objects, so that the target detection model can conveniently detect target objects in the target domain.
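Illustratively, combining the two losses can be sketched as follows (the weighting factor `lambda_contrast` is an assumption for illustration only):

```python
def total_loss(detection_loss_src, contrast_loss_all, lambda_contrast: float = 1.0):
    # detection loss: computed on labelled source domain samples only
    # contrastive loss: computed on all (source and target domain) samples
    return detection_loss_src + lambda_contrast * contrast_loss_all
```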
With this multi-iteration scheme, the iteration stops when the loss function converges or the preset maximum number of iterations is reached, and the training of the model to be trained ends. The model obtained after training can be used as the target detection model.
The solution shown in fig. 5 trains the model to be trained by means of the third mode described above. Alternatively, the training of the model to be trained can be realized in the first mode or the second mode. The training scheme is simple and highly feasible.
The application also provides a detection method based on domain adaptation. As shown in fig. 6, the method comprises the following steps:
S1001: Obtaining an image to be detected;
In this step, an image that is to undergo target detection is taken as the image to be detected, for example a driving image acquired while an automobile is travelling, or an image captured by a user at a shooting location.
S1002: inputting an image to be detected into a target detection model to obtain one or more target objects in the image to be detected; the target detection model is obtained by training a model to be trained through classification results and coding results of all areas of a training sample under each iteration and coding results of all areas of a feature map of the training sample; the classification result of each region of the training sample is obtained through a feature map of the training sample; the coding result of each region of the training sample is obtained by extracting the characteristics of the training sample; the training samples under each iteration comprise source domain samples and/or target domain samples; the target detection model classifies one or more target objects in an image to be detected through a feature map of the image to be detected.
In this step, the image to be detected is input into the target detection model, specifically into the trained backbone network. The backbone network extracts features from the image to be detected to obtain its feature map. The feature map is input into the trained detection head network, which outputs each predicted target object, or each predicted target object together with its position in the image to be detected.
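Illustratively, the inference step can be sketched as follows (the names `backbone`, `detection_head` and `preprocess` are placeholders for the trained networks and the input conversion, used only for illustration):

```python
import torch

@torch.no_grad()
def detect(image, backbone, detection_head, preprocess):
    x = preprocess(image)               # image to be detected -> [1, 3, H, W] tensor
    feature_map = backbone(x)           # feature extraction
    return detection_head(feature_map)  # predicted target objects (and positions)
```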
The target detection model in the application is obtained by the training scheme described above. During training, the features of targets (objects) of the same class across different domains and/or within the same domain are made similar, and the features of targets of different classes across different domains and/or within the same domain are made distant, which improves the generalization ability of the model in the target domain. Using this target detection model with strong generalization ability to detect target objects in the image to be detected is simple to implement and can improve detection accuracy.
The application provides a training device based on a domain-adaptive detection model, as shown in fig. 7, comprising:
a first obtaining unit 601, configured to obtain training samples under each iteration, where the training samples under each iteration include a source domain sample and/or a target domain sample;
A second obtaining unit 602, configured to obtain a feature map of the training sample under each iteration;
a third obtaining unit 603, configured to obtain a classification result for each region of the training sample under each iteration based on the feature map;
a fourth obtaining unit 604, configured to obtain a result of encoding each region of the feature map;
a fifth obtaining unit 605, configured to perform feature extraction on the training sample under each iteration, so as to obtain a coding result of each region of the training sample under each iteration;
the training unit 606 is configured to perform iterative operation on a model to be trained based on the classification result and the encoding result of each region of the training sample under each iteration and the encoding result of each region of the feature map, so as to train a target detection model, where the target detection model is used to perform target detection on an image to be detected.
In some embodiments, the fourth obtaining unit 604 is configured to:
determining a partitioning strategy for each region of the training sample;
according to the division strategy of each region of the training sample, carrying out region division on the feature map to obtain each region of the feature map;
and adopting a preset coding algorithm to code each region of the feature map to obtain a coding result of each region of the feature map.
In some embodiments, the training unit 606 is configured to:
based on the classification result of each region of the training sample under each iteration and the coding result of each region of the feature map of the training sample, obtaining each first feature vector marked with the classification result, wherein each first feature vector corresponds to each region of the feature map;
based on the classification result and the coding result of each region of the training sample under each iteration, obtaining each second feature vector marked with the classification result, wherein each second feature vector corresponds to each region of the training sample;
performing iterative operation on the model to be trained based on the first feature vectors marked with the classification results and the second feature vectors marked with the classification results and/or based on the second feature vectors marked with the classification results and the third feature vectors marked with the classification results under the previous training samples so as to train a target detection model;
wherein, each third feature vector is obtained by the classification result and the coding result of each region of the training sample under the previous iteration.
In some embodiments, the training unit 606 is configured to:
Determining a first feature vector and a second feature vector corresponding to the same region of the training sample and a first feature vector and a second feature vector corresponding to different regions of the training sample from each first feature vector identified with the classification result and each second feature vector identified with the classification result;
obtaining a first instance pair based on a first feature vector and a second feature vector corresponding to the same region of the training sample;
obtaining a second instance pair based on the first feature vector and the second feature vector corresponding to different regions of the training sample;
and carrying out iterative operation on the model to be trained based on the first instance pair and the second instance pair so as to train the target detection model.
In some embodiments, the training unit 606 is configured to:
determining a second feature vector and a third feature vector which are marked with the same classification result and a second feature vector and a third feature vector which are marked with different classification results from the second feature vectors marked with the classification results and the third feature vectors marked with the classification results;
obtaining a third instance pair based on the second feature vector and the third feature vector identified with the same classification result;
Obtaining a fourth instance pair based on the second feature vector and the third feature vector with different classification results;
and carrying out iterative operation on the model to be trained based on the third instance pair and the fourth instance pair so as to train the target detection model.
In some embodiments, each region of the training sample includes a plurality of pixels. The third obtaining unit 603 is further configured to:
based on the feature map, obtaining a classification result of each pixel in each region of the training sample under each iteration;
based on the classification result of each pixel in each region, the classification result of each region is obtained.
The application also provides a detection device based on domain adaptation, as shown in fig. 8, comprising:
a first obtaining module 701, configured to obtain an image to be detected;
a second obtaining module 702, configured to input an image to be detected into a target detection model, to obtain one or more target objects in the image to be detected;
the target detection model is obtained by training a model to be trained through classification results and coding results of all areas of a training sample under each iteration and coding results of all areas of a feature map of the training sample; the classification result of each region of the training sample is obtained through a feature map of the training sample; the coding result of each region of the training sample is obtained by extracting the characteristics of the training sample; the training samples under each iteration comprise source domain samples and/or target domain samples; the target detection model classifies one or more target objects in an image to be detected through a feature map of the image to be detected.
It should be noted that, since the training device based on the domain-adaptive detection model and the detection device based on domain adaptation in the embodiments of the present application solve their problems on principles similar to those of the foregoing training method based on the domain-adaptive detection model and the foregoing domain-adaptive detection method, the implementation process and principle of the two devices can be understood with reference to the implementation process and principle of the foregoing methods, and repeated description is omitted.
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium. Wherein, the electronic equipment includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the foregoing domain-adaptive detection model-based training method and/or domain-adaptive detection method.
A non-transitory computer-readable storage medium is also provided, storing computer instructions for causing a computer to perform the foregoing training method based on the domain-adaptive detection model and/or the foregoing domain-adaptive detection method.
Fig. 9 shows a schematic block diagram of an example electronic device 800 that may be used to implement an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
As shown in fig. 9, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above, such as a training method based on a domain-adaptive detection model and/or a detection method based on domain-adaptation. For example, in some embodiments, the training method of the domain-adaptive-based detection model and/or the domain-adaptive-based detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the training method based on the domain-adaptive detection model and/or the detection method based on the domain-adaptive described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured by any other suitable means (e.g. by means of firmware) to perform a training method of the domain-adaptive based detection model and/or a domain-adaptive based detection method.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A training method of a domain-based adaptive detection model, comprising:
obtaining training samples under each iteration, wherein the training samples under each iteration comprise source domain samples and/or target domain samples;
obtaining a feature map of a training sample under each iteration;
based on the feature map, obtaining a classification result of each region of the training sample under each iteration;
obtaining the coding result of each region of the feature map;
obtaining the coding result of each region of the training sample under each iteration;
and carrying out iterative operation on the model to be trained based on the classification result and the coding result of each region of the training sample under each iteration and the coding result of each region of the feature map so as to train a target detection model, wherein the target detection model is used for carrying out target detection on the image to be detected.
2. The method of claim 1, wherein the obtaining of the coding result of each region of the feature map comprises:
determining a partitioning strategy for each region of the training sample;
according to the division strategy of each region of the training sample, carrying out region division on the feature map to obtain each region of the feature map;
and adopting a preset coding algorithm to code each region of the feature map to obtain a coding result of each region of the feature map.
3. The method according to claim 1 or 2, wherein the performing iterative operation on the model to be trained based on the classification result and the encoding result of each region of the training sample at each iteration and the encoding result of each region of the feature map to train out the target detection model includes:
based on the classification result of each region of the training sample under each iteration and the coding result of each region of the feature map of the training sample, obtaining each first feature vector marked with the classification result, wherein each first feature vector corresponds to each region of the feature map;
based on the classification result and the coding result of each region of the training sample under each iteration, obtaining each second feature vector marked with the classification result, wherein each second feature vector corresponds to each region of the training sample;
Performing iterative operation on the model to be trained based on the first feature vectors marked with the classification results and the second feature vectors marked with the classification results and/or based on the second feature vectors marked with the classification results and the third feature vectors marked with the classification results under the previous training samples so as to train a target detection model;
wherein, each third feature vector is obtained by the classification result and the coding result of each region of the training sample under the previous iteration.
4. A method according to claim 3, wherein the performing an iterative operation on the model to be trained based on the first feature vectors identified with the classification results and the second feature vectors identified with the classification results to train the target detection model comprises:
determining a first feature vector and a second feature vector corresponding to the same region of the training sample and a first feature vector and a second feature vector corresponding to different regions of the training sample from each first feature vector identified with the classification result and each second feature vector identified with the classification result;
obtaining a first instance pair based on a first feature vector and a second feature vector corresponding to the same region of the training sample;
Obtaining a second instance pair based on the first feature vector and the second feature vector corresponding to different regions of the training sample;
and carrying out iterative operation on the model to be trained based on the first instance pair and the second instance pair so as to train the target detection model.
5. A method according to claim 3, wherein the performing an iterative operation on the model to be trained based on the second feature vectors identified with the classification results and the third feature vectors identified with the classification results under the previous training samples to train the target detection model comprises:
determining a second feature vector and a third feature vector which are marked with the same classification result and a second feature vector and a third feature vector which are marked with different classification results from the second feature vectors marked with the classification results and the third feature vectors marked with the classification results;
obtaining a third instance pair based on the second feature vector and the third feature vector identified with the same classification result;
obtaining a fourth instance pair based on the second feature vector and the third feature vector with different classification results;
and carrying out iterative operation on the model to be trained based on the third instance pair and the fourth instance pair so as to train the target detection model.
6. The method of claim 1 or 2, each region of the training sample comprising a plurality of pixels;
based on the feature map, obtaining a classification result of each region of the training sample under each iteration, including:
based on the feature map, obtaining a classification result of each pixel in each region of the training sample under each iteration;
based on the classification result of each pixel in each region, the classification result of each region is obtained.
7. A domain-based adaptive detection method, comprising:
obtaining an image to be detected;
inputting an image to be detected into a target detection model to obtain one or more target objects in the image to be detected;
the target detection model is obtained by training a model to be trained through classification results and coding results of all areas of a training sample under each iteration and coding results of all areas of a feature map of the training sample; the classification result of each region of the training sample is obtained through a feature map of the training sample; the coding result of each region of the training sample is obtained by extracting the characteristics of the training sample; the training samples under each iteration comprise source domain samples and/or target domain samples; the target detection model classifies one or more target objects in an image to be detected through a feature map of the image to be detected.
8. A training device based on a domain-adaptive detection model, comprising:
the first obtaining unit is used for obtaining training samples under each iteration, wherein the training samples under each iteration comprise source domain samples and/or target domain samples;
the second obtaining unit is used for obtaining a feature map of the training sample under each iteration;
the third obtaining unit is used for obtaining a classification result of each region of the training sample under each iteration based on the feature map;
a fourth obtaining unit configured to obtain a result of encoding each region of the feature map;
a fifth obtaining unit, configured to perform feature extraction on the training sample under each iteration, to obtain a coding result of each region of the training sample under each iteration;
the training unit is used for carrying out iterative operation on the model to be trained based on the classification result and the coding result of each region of the training sample under each iteration and the coding result of each region of the feature map so as to train a target detection model, wherein the target detection model is used for carrying out target detection on the image to be detected.
9. A domain-based adaptive detection device, comprising:
the first acquisition module is used for acquiring an image to be detected;
The second acquisition module is used for inputting an image to be detected into the target detection model to obtain one or more target objects in the image to be detected;
the target detection model is obtained by training a model to be trained through classification results and coding results of all areas of a training sample under each iteration and coding results of all areas of a feature map of the training sample; the classification result of each region of the training sample is obtained through a feature map of the training sample; the coding result of each region of the training sample is obtained by extracting the characteristics of the training sample; the training samples under each iteration comprise source domain samples and/or target domain samples; the target detection model classifies one or more target objects in an image to be detected through a feature map of the image to be detected.
10. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6, and/or the method of claim 7.
11. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6, and/or the method of claim 7.
CN202310850158.7A 2023-07-11 2023-07-11 Training and detecting method of detecting model based on domain self-adaption and related equipment Pending CN116910547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310850158.7A CN116910547A (en) 2023-07-11 2023-07-11 Training and detecting method of detecting model based on domain self-adaption and related equipment

Publications (1)

Publication Number Publication Date
CN116910547A true CN116910547A (en) 2023-10-20

Family

ID=88362153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310850158.7A Pending CN116910547A (en) 2023-07-11 2023-07-11 Training and detecting method of detecting model based on domain self-adaption and related equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination