CN112883988A - Training and feature extraction method of feature extraction network based on multiple data sets - Google Patents


Info

Publication number
CN112883988A
CN112883988A (application CN202110298576.0A)
Authority
CN
China
Prior art keywords
feature extraction
training
data sets
extraction network
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110298576.0A
Other languages
Chinese (zh)
Other versions
CN112883988B (en)
Inventor
Yu Yafeng (郁亚峰)
Mao Xiaojiao (毛晓蛟)
Zhang Yong (章勇)
Cao Lijun (曹李军)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd filed Critical Suzhou Keda Technology Co Ltd
Priority to CN202110298576.0A priority Critical patent/CN112883988B/en
Publication of CN112883988A publication Critical patent/CN112883988A/en
Application granted granted Critical
Publication of CN112883988B publication Critical patent/CN112883988B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/464: Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, and in particular to a training method and a feature extraction method for a feature extraction network based on multiple data sets. The training method comprises: acquiring a plurality of sample data sets; sequentially inputting the sample images of any two sample data sets into a feature extraction network to obtain the features corresponding to the sample images; performing gradient reversal on the features corresponding to all the sample images to obtain a gradient reversal result for each sample image; determining the earth mover's distance (Wasserstein distance) between any two sample data sets based on the gradient reversal results; and training the feature extraction network according to the Wasserstein distance to obtain a target feature extraction network. The Wasserstein distance between any two sample data sets is computed on the basis of the gradient reversal results, and since the Wasserstein distance still reflects the distance between two distributions even when they do not overlap, the gradient vanishing phenomenon that occurs in GRL training can be alleviated to a great extent.

Description

Training and feature extraction method of feature extraction network based on multiple data sets
Technical Field
The invention relates to the technical field of image processing, and in particular to a training method and a feature extraction method for a feature extraction network based on multiple data sets.
Background
The rise of deep learning has greatly advanced large-scale image retrieval. Deep learning is data-hungry: the richer the data, the better the result. However, annotating data consumes a large amount of labor and time, so jointly training on multiple data sets has become a convenient way to increase the amount of data.
However, because different data sets differ in background and application scene, joint training on multiple data sets suffers from domain shift, which introduces noise into feature learning. Existing approaches address this problem through domain adaptation, for example domain adaptation based on a Gradient Reversal Layer (GRL); most of them adversarially train a domain classifier against the feature extractor so that the extracted features are unaffected by which data set they come from.
GRL is an adversarial-training method: the sample images of each pair of data sets are input into a feature extractor, the extracted features are passed through a gradient reversal layer and then a domain classifier, the distance between the two data sets is measured with the JS divergence or the KL divergence, and the domain classifier and the feature extractor are trained on the computed distance. However, when the distributions of the two data sets differ greatly, the JS or KL divergence can no longer measure the difference between them; the resulting loss is small, which hinders the optimization of the feature extractor and can cause gradient vanishing.
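The failure mode described above can be seen in a toy example (not from the patent; it is the standard illustration from the Wasserstein-GAN literature): for two unit point masses separated by a gap theta, the JS divergence saturates at log 2 no matter how far apart the masses are, while the earth mover's distance grows linearly with theta and therefore keeps providing a useful training signal.

```python
import math

def js_divergence(p, q):
    """JS divergence between two discrete distributions on a shared support."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def w1_point_masses(theta):
    """Earth mover's distance between unit point masses at 0 and theta."""
    return abs(theta)

# P puts all its mass at position 0, Q puts all its mass at position theta;
# on the shared two-point support the distributions are [1, 0] and [0, 1].
P, Q = [1.0, 0.0], [0.0, 1.0]
for theta in (0.1, 1.0, 10.0):
    print(theta, round(js_divergence(P, Q), 4), w1_point_masses(theta))
# JS stays at log 2 ≈ 0.6931 for every theta, while W1 tracks theta.
```

Because the JS value is constant once the supports are disjoint, its gradient with respect to theta is zero, which is exactly the vanishing-gradient problem the patent attributes to GRL training.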
Disclosure of Invention
In view of this, embodiments of the present invention provide a training method and a feature extraction method for a feature extraction network based on multiple data sets, so as to solve the gradient vanishing problem that occurs in GRL training.
According to a first aspect, an embodiment of the present invention provides a method for training a feature extraction network based on multiple data sets, including:
acquiring a plurality of sample data sets;
sequentially inputting all the sample images of any two sample data sets into a feature extraction network to obtain the features corresponding to the sample images;
performing gradient reversal on the features corresponding to all the sample images to obtain a gradient reversal result for each sample image;
determining the earth mover's distance (Wasserstein distance) between any two sample data sets based on the gradient reversal results;
and training the feature extraction network according to the Wasserstein distance to obtain a target feature extraction network.
In the training method for a feature extraction network based on multiple data sets, the Wasserstein distance between any two sample data sets is computed on the basis of the gradient reversal results, and the feature extraction network is trained with the computed distance. Since the Wasserstein distance still reflects the distance between two distributions even when they do not overlap, the gradient vanishing phenomenon occurring in GRL training can be alleviated to a great extent.
With reference to the first aspect, in a first implementation manner of the first aspect, determining the Wasserstein distance between any two sample data sets based on the gradient reversal results includes:
inputting the gradient reversal results corresponding to the sample images into a preset network to obtain output data corresponding to each sample data set, wherein the preset network corresponds to the pair of sample data sets and is used to fit the Wasserstein distance between them;
and computing the Wasserstein distance from the output data corresponding to each sample data set to obtain the Wasserstein distance between the two sample data sets.
The training method provided by the embodiment of the invention uses the preset network to fit the Wasserstein distance between any two sample data sets, which offers a simple and convenient way to compute that distance, and replaces the domain classifier of GRL with the preset network so as to alleviate gradient vanishing during training and obtain better domain-invariant features.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, inputting the gradient reversal results corresponding to the sample images into a preset network to obtain output data corresponding to each sample data set includes:
processing the gradient reversal results corresponding to the sample images with at least one fully connected layer in the preset network to obtain the output data corresponding to each sample data set.
Processing the gradient reversal results with at least one fully connected layer simplifies the structure of the preset network, reduces the number of model parameters, and improves training efficiency.
With reference to the first or the second implementation manner of the first aspect, in a third implementation manner of the first aspect, training the feature extraction network according to the Wasserstein distance to obtain a target feature extraction network includes:
obtaining the feature extraction loss of the sample images;
determining a feature loss from the Wasserstein distance and the feature extraction loss;
determining a domain shift loss based on the Wasserstein distance;
and training the feature extraction network with the feature loss and the preset network with the domain shift loss to obtain the target feature extraction network.
In the training method provided by the embodiment of the invention, the feature extraction network is trained with the feature loss while the preset network is trained with the domain shift loss; because both networks are trained simultaneously, the domain shift problem is reduced as much as possible and the reliability of the resulting target feature extraction network is ensured.
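As a rough sketch of how the two losses might be combined (the trade-off weights and sign conventions below are illustrative assumptions, not fixed by the patent at this point; the gradient reversal layer is assumed to handle the sign flip seen by the feature extractor):

```python
def feature_loss(extraction_loss, wasserstein, lam=1.0):
    """Loss used to train the feature extractor: the task-specific feature
    extraction loss plus a weighted domain term based on the Wasserstein
    distance (lam is a hypothetical trade-off weight)."""
    return extraction_loss + lam * wasserstein

def domain_shift_loss(wasserstein, gradient_penalty, gp_weight=10.0):
    """Loss used to train the preset network (critic): drive the fitted
    Wasserstein estimate up (i.e. minimize its negative) while keeping the
    critic's input gradient near norm 1 via the gradient penalty."""
    return -wasserstein + gp_weight * gradient_penalty

print(feature_loss(1.0, 0.5))        # → 1.5
print(domain_shift_loss(0.5, 0.05))  # → 0.0
```

The gp_weight default of 10 follows the common WGAN-GP convention and is an assumption, not a value taken from the patent.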
With reference to the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, determining the domain shift loss based on the Wasserstein distance comprises:
interpolating between the features corresponding to each pair of sample images drawn from the two sample data sets to obtain interpolated features;
inputting the interpolated features into the preset network, and computing the gradient of the preset network's output with respect to its input;
computing the gradient penalty loss between the two sample data sets from the obtained gradients;
and determining the domain shift loss from the Wasserstein distance and the gradient penalty loss.
The training method provided by the embodiment of the invention incorporates the gradient with respect to the input into the domain shift loss, which avoids enforcing the Lipschitz constraint by weight clipping; this sidesteps the optimization difficulty caused by weight clipping, improves convergence speed, and yields better learned features.
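A minimal numeric sketch of this gradient penalty, in the style of WGAN-GP (the linear critic is a stand-in assumption: for f(x) = w . x the input gradient at every point is simply w, so the penalty can be evaluated analytically without an autograd framework):

```python
import random

def interpolate(a, b, eps):
    """Random interpolation between two feature vectors, one from each domain."""
    return [eps * ai + (1 - eps) * bi for ai, bi in zip(a, b)]

def gradient_penalty(w, feats_a, feats_b):
    """Penalize the critic's input-gradient norm for deviating from 1.
    For a linear critic f(x) = w . x the gradient at any input is w, so the
    penalty reduces to (||w|| - 1)^2 at every interpolated feature."""
    penalties = []
    for a, b in zip(feats_a, feats_b):
        x = interpolate(a, b, random.random())  # interpolated feature
        grad = w                                # d f / d x for the linear critic
        norm = sum(g * g for g in grad) ** 0.5
        penalties.append((norm - 1.0) ** 2)
    return sum(penalties) / len(penalties)

# A critic with ||w|| = 1 incurs zero penalty; ||w|| = 2 incurs (2 - 1)^2 = 1.
print(gradient_penalty([1.0, 0.0], [[1, 2]], [[3, 4]]))  # → 0.0
print(gradient_penalty([2.0, 0.0], [[1, 2]], [[3, 4]]))  # → 1.0
```

In a real network the gradient would come from automatic differentiation rather than the closed form used here; the point of the sketch is only the penalty formula itself.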
According to a second aspect, an embodiment of the present invention further provides a feature extraction method, including:
acquiring an image to be processed;
inputting the image to be processed into a target feature extraction network to obtain target features, where the target feature extraction network is trained by the method for training a feature extraction network based on multiple data sets according to the first aspect of the present invention or any implementation manner of the first aspect.
In the feature extraction method provided by the embodiment of the invention, the target feature extraction network is obtained by computing the Wasserstein distance between any two sample data sets on the basis of the gradient reversal results, so that the gradient vanishing phenomenon of GRL is alleviated during training and better domain-invariant features are obtained. The target feature extraction network trained on multiple data sets therefore extracts features more accurately, ensuring the reliability of the extracted target features.
According to a third aspect, an embodiment of the present invention further provides a training apparatus for a feature extraction network based on multiple data sets, including:
a first acquisition module, configured to acquire a plurality of sample data sets;
a feature extraction module, configured to sequentially input all the sample images of any two sample data sets into a feature extraction network to obtain the features corresponding to the sample images;
a gradient reversal module, configured to perform gradient reversal on the features corresponding to all the sample images to obtain a gradient reversal result for each sample image;
a distance determining module, configured to determine the Wasserstein distance between any two sample data sets based on the gradient reversal results;
and a training module, configured to train the feature extraction network according to the Wasserstein distance to obtain a target feature extraction network.
In the training apparatus for a feature extraction network based on multiple data sets, the Wasserstein distance between any two sample data sets is computed on the basis of the gradient reversal results, and the feature extraction network is trained with the computed distance; since the Wasserstein distance still reflects the distance between two distributions even when they do not overlap, the gradient vanishing phenomenon occurring in GRL training can be alleviated to a great extent.
According to a fourth aspect, an embodiment of the present invention further provides a feature extraction apparatus, including:
the second acquisition module is used for acquiring an image to be processed;
and a target feature extraction module, configured to input the image to be processed into a target feature extraction network to obtain target features, where the target feature extraction network is trained by the method for training a feature extraction network based on multiple data sets according to the first aspect of the present invention or any implementation manner of the first aspect.
In the feature extraction apparatus provided by the embodiment of the invention, the target feature extraction network is obtained by computing the Wasserstein distance between any two sample data sets on the basis of the gradient reversal results, so that the gradient vanishing phenomenon of GRL is alleviated during training and better domain-invariant features are obtained. The target feature extraction network trained on multiple data sets therefore extracts features more accurately, ensuring the reliability of the extracted target features.
According to a fifth aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing therein computer instructions, and the processor executing the computer instructions to perform the training method for a multiple data set-based feature extraction network according to the first aspect or any one of the embodiments of the first aspect, or to perform the feature extraction method according to the second aspect.
According to a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the training method for a multiple data set-based feature extraction network described in the first aspect or any one of the implementation manners of the first aspect, or execute the feature extraction method described in the second aspect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a method of training a feature extraction network based on multiple datasets in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of a method of training a feature extraction network based on multiple datasets in accordance with an embodiment of the present invention;
FIG. 3 is a flow chart of a method of training a feature extraction network based on multiple datasets in accordance with an embodiment of the present invention;
FIG. 4 is a block diagram of a framework for multi-dataset joint training according to an embodiment of the present invention;
FIG. 5 is a flow diagram of a feature extraction method according to an embodiment of the invention;
FIG. 6 is a block diagram of a training apparatus for a multi-dataset-based feature extraction network according to an embodiment of the present invention;
fig. 7 is a block diagram of the structure of a feature extraction apparatus according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Joint training on multiple data sets causes the problem of domain shift and introduces noise into feature learning. To address this, gradient reversal is often used for the joint training of multiple data sets. In such joint training, the JS or KL divergence used to measure the distance between two distributions cannot measure the difference when the distributions differ greatly, which causes the problem of gradient vanishing. To solve this problem, the training method based on multiple data sets provided in the embodiment of the present invention performs the joint training on the basis of the earth mover's distance (i.e., the Wasserstein distance).
According to an embodiment of the present invention, an embodiment of a method for training a feature extraction network based on multiple data sets is provided. It is noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
In this embodiment, a training method for a feature extraction network based on multiple data sets is provided, which may be used in electronic devices such as computers, mobile phones, and tablet computers. Fig. 1 is a flowchart of the training method according to an embodiment of the present invention; as shown in Fig. 1, the flow includes the following steps:
S11: a plurality of sample data sets are acquired.
The plurality of sample data sets may consist of sample images acquired under different environments, backgrounds, and scenes. For example, a plurality of sample images acquired under scene 1 form sample data set 1; a plurality of sample images acquired under scene 2 form sample data set 2; a plurality of sample images acquired under scene 3 form sample data set 3; and so on.
The number of sample data sets acquired by the electronic device and the number of sample images in each sample data set may be set according to the actual situation and are not limited here.
S12: all the sample images of any two sample data sets are sequentially input into the feature extraction network to obtain the features corresponding to the sample images.
When joint training on multiple data sets is performed, every two of the sample data sets are grouped for training; such a group of two sample data sets is called a sample data set pair. For example, when 4 sample data sets are acquired, there are 6 combinations, i.e., 6 sample data set pairs.
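The number of pairs is simply C(n, 2), which can be checked directly (the data set names below are hypothetical placeholders):

```python
from itertools import combinations

datasets = ["a1", "b1", "a2", "b2"]  # hypothetical names for 4 sample data sets
pairs = list(combinations(datasets, 2))
print(len(pairs))  # → 6, i.e. C(4, 2) = 4 * 3 / 2
print(pairs[0])    # → ('a1', 'b1')
```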
The electronic device sequentially inputs all the sample images of each sample data set pair into the feature extraction network and performs feature extraction on them to obtain the corresponding features. The feature extraction network may be composed of a plurality of network layers or built on existing neural networks; its specific structure is not limited here and may be set according to the actual situation.
For example, sample data set pair 1 consists of sample data set a1 and sample data set b1, each containing N sample images. The feature extraction network extracts features from the N sample images in sample data set a1 and from the N sample images in sample data set b1, respectively, to obtain the corresponding features.
Sample data set pair 2 consists of sample data set a2 and sample data set b2, each containing M sample images. The feature extraction network extracts features from the M sample images in sample data set a2 and from the M sample images in sample data set b2, respectively, to obtain the corresponding features.
S13: gradient reversal is performed on the features corresponding to all the sample images to obtain a gradient reversal result for each sample image.
After obtaining the features corresponding to each sample image of each sample data set pair, the electronic device performs gradient reversal on those features, one sample data set pair at a time, to obtain the gradient reversal results. What the GRL does is multiply the error passed back to this layer by a negative number, which makes the training targets of the networks before and after the GRL opposite to each other, achieving an adversarial effect.
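The sign flip can be sketched framework-independently (the toy class below is illustrative, standing in for a custom autograd function in a deep learning framework; it is not the patent's implementation):

```python
class GradientReversal:
    """Identity in the forward pass; multiplies the gradient by a negative
    number (scaled by lambda_) in the backward pass, so the networks on
    either side of the layer optimize opposing objectives."""

    def __init__(self, lambda_=1.0):
        self.lambda_ = lambda_

    def forward(self, features):
        return features  # features pass through unchanged

    def backward(self, grad_output):
        # Multiply the incoming error by a negative number.
        return [-self.lambda_ * g for g in grad_output]

grl = GradientReversal()
print(grl.forward([0.3, -0.7]))   # → [0.3, -0.7]
print(grl.backward([1.0, -2.0]))  # → [-1.0, 2.0]
```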
For example, for sample data set pair 1, the gradient reversal results of the features corresponding to the N sample images in sample data set a1 and of the features corresponding to the N sample images in sample data set b1 are obtained;
for sample data set pair 2, the gradient reversal results of the features corresponding to the M sample images in sample data set a2 and of the features corresponding to the M sample images in sample data set b2 are obtained.
S14: the Wasserstein distance between any two sample data sets is determined based on the gradient reversal results.
After obtaining the gradient reversal result of the features of each sample image in each sample data set pair, the electronic device uses those results to compute the earth mover's distance, i.e., the Wasserstein distance, between the two sample data sets of the pair. The Wasserstein distance measures how the feature distributions of the two sample data sets relate to each other.
S15: the feature extraction network is trained according to the Wasserstein distance to obtain a target feature extraction network.
After the Wasserstein distance is determined, the electronic device determines the training loss on that basis and trains the feature extraction network with it to obtain the target feature extraction network. For example, the training loss may combine the feature extraction loss of the feature extraction network with the Wasserstein distance, or it may incorporate other losses; this is not limited here, provided the target feature extraction network is trained on the basis of the Wasserstein distance.
In the training method for a feature extraction network based on multiple data sets provided by this embodiment, the Wasserstein distance between any two sample data sets is computed on the basis of the gradient reversal results, and the feature extraction network is trained with the computed distance; since the Wasserstein distance still reflects the distance between two distributions even when they do not overlap, the gradient vanishing phenomenon occurring in GRL training can be alleviated to a great extent.
In this embodiment, a training method for a feature extraction network based on multiple data sets is provided, which may be used in electronic devices such as computers, mobile phones, and tablet computers. Fig. 2 is a flowchart of the training method according to an embodiment of the present invention; as shown in Fig. 2, the flow includes the following steps:
S21: a plurality of sample data sets are acquired.
Please refer to S11 in fig. 1, which is not described herein again.
S22: all the sample images of any two sample data sets are sequentially input into the feature extraction network to obtain the features corresponding to the sample images.
In the following description, sample data set pair 1, i.e. sample data set a1 and sample data set b1, is taken as an example. For the rest, please refer to S12 in the embodiment shown in fig. 1, which is not described herein again.
S23: gradient reversal is performed on the features corresponding to all the sample images to obtain a gradient reversal result for each sample image.
Please refer to S13 in fig. 1, which is not described herein again.
S24: the Wasserstein distance between any two sample data sets is determined based on the gradient reversal results.
In this embodiment, the Wasserstein distance is computed with a preset network that corresponds one-to-one with the sample data set pairs. Specifically, S24 may include:
and S241, inputting the gradient inversion processing result corresponding to the sample image into a preset network to obtain output data corresponding to each sample data set.
The preset network corresponds to any two sample data sets and is used for fitting the bulldozer distance between any two sample data sets.
The preset networks correspond to the sample data set pairs one to one, for example, when there are 6 sample data set pairs, there are 6 preset networks correspondingly. The input of the preset network is a gradient inversion processing result, and the output is an intermediate calculation parameter of the bulldozer distance.
The gradient inversion processing result corresponds to the characteristics of each sample image of the sample data set in the sample data set pair, and the output of the preset network corresponds to the sample data set. That is, the electronic device inputs the gradient inversion processing result corresponding to the sample image into the preset network, and then output data corresponding to each sample data set can be obtained.
The preset network may be composed of several fully connected layers, which classify the input to produce the output data corresponding to each sample data set. Optionally, the gradient reversal results corresponding to the sample images are processed with at least one fully connected layer in the preset network to obtain the output data corresponding to each sample data set. For example, 3 fully connected layers may be employed as the preset network.
Specifically, each sample image of the sample data set pair yields a 256-dimensional feature after passing through the feature extraction network; after gradient reversal and processing by the preset network, a 1-dimensional output is obtained, and this 1-dimensional output is the output data corresponding to the sample data set.
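A minimal sketch of such a preset network: three fully connected layers mapping a 256-dimensional feature to a 1-dimensional output. Only the 256-in/1-out shape comes from the text; the hidden widths 64 and 32, the ReLU activations, and the random initialization are illustrative assumptions.

```python
import random

def linear(x, weights, bias):
    """One fully connected layer: y_j = sum_i x_i * W[i][j] + b_j."""
    return [sum(xi * wi[j] for xi, wi in zip(x, weights)) + bias[j]
            for j in range(len(bias))]

def make_layer(n_in, n_out, rng):
    scale = (2.0 / n_in) ** 0.5  # He-style initialization (assumed)
    w = [[rng.gauss(0, scale) for _ in range(n_out)] for _ in range(n_in)]
    b = [0.0] * n_out
    return w, b

def critic(feature, layers):
    """Preset network: fully connected layers with ReLU on the hidden
    layers; the final layer is left linear so the 1-dimensional output is
    an unbounded score used in the Wasserstein estimate."""
    x = feature
    for i, (w, b) in enumerate(layers):
        x = linear(x, w, b)
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
    return x

rng = random.Random(0)
layers = [make_layer(256, 64, rng), make_layer(64, 32, rng), make_layer(32, 1, rng)]
score = critic([0.1] * 256, layers)
print(len(score))  # → 1: each 256-dimensional feature maps to a 1-dimensional output
```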
Processing the gradient reversal results with at least one fully connected layer simplifies the structure of the preset network, reduces the number of model parameters, and improves training efficiency.
And S242, calculating the bulldozer distance by using the output data corresponding to each sample data set to obtain the bulldozer distance between any two sample data sets.
For each sample data set pair, the electronic device may first average the output data corresponding to each sample data set and then calculate the difference between the two averages, thereby obtaining the Wasserstein distance between the two sample data sets in the pair.
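The mean-difference estimate can be sketched as follows (a minimal illustration, assuming PyTorch; `scores_a` and `scores_b` stand for the preset-network outputs for the two sample data sets in a pair):

```python
import torch


def wasserstein_estimate(scores_a: torch.Tensor,
                         scores_b: torch.Tensor) -> torch.Tensor:
    """Estimate the bulldozer (Wasserstein) distance between two data
    sets as the difference of the mean preset-network scores."""
    return scores_a.mean() - scores_b.mean()
```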
And S25, training the feature extraction network according to the bulldozer distance to obtain a target feature extraction network.
Please refer to S15 in fig. 1, which is not described herein again.
The training method for a feature extraction network based on multiple data sets provided by this embodiment fits the bulldozer distance between any two sample data sets with a preset network, thereby providing a simple and convenient way to calculate the bulldozer distance. Because the preset network replaces the domain classifier used with the GRL, the gradient vanishing phenomenon during training can be alleviated and better domain-invariant features can be obtained.
This embodiment provides a training method for a feature extraction network based on multiple data sets, which can be used in electronic devices such as computers, mobile phones, and tablet computers. Fig. 3 is a flowchart of the training method for a feature extraction network based on multiple data sets according to an embodiment of the present invention. As shown in fig. 3, the flow includes the following steps:
S31, a plurality of sample data sets are obtained.
Please refer to S21 in fig. 2 for details, which are not described herein.
And S32, sequentially inputting all sample images in any two sample data sets into the feature extraction network to obtain a feature set corresponding to the sample images.
Please refer to S22 in fig. 2 for details, which are not described herein.
And S33, performing gradient inversion on the features corresponding to all the sample images to obtain gradient inversion processing results corresponding to all the sample images.
Please refer to S23 in fig. 2 for details, which are not described herein.
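The gradient inversion of S33 is commonly implemented as a gradient reversal layer (GRL): identity in the forward pass, gradient negated in the backward pass. A minimal sketch, assuming PyTorch; the scaling factor `lam` is an assumption, since the patent does not specify an implementation:

```python
import torch


class GradReverse(torch.autograd.Function):
    """Gradient reversal: forward pass is the identity, backward pass
    multiplies the incoming gradient by -lam."""

    @staticmethod
    def forward(ctx, x, lam=1.0):
        ctx.lam = lam
        return x.view_as(x)  # identity

    @staticmethod
    def backward(ctx, grad_output):
        # Negate (and scale) the gradient; None for the lam argument.
        return -ctx.lam * grad_output, None
```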
And S34, determining the bulldozer distance between any two sample data sets based on the gradient inversion processing result.
Please refer to S24 in fig. 2 for details, which are not described herein.
And S35, training the feature extraction network according to the bulldozer distance to obtain a target feature extraction network.
Specifically, the above S35 may include:
S351, acquiring the feature extraction loss of the sample image.
For a sample image belonging to one sample data set in a sample data set pair, the feature of the sample image is obtained after feature extraction by the feature extraction network. A loss function is then calculated using the target feature of the sample image and the extracted feature, giving the feature extraction loss of the sample image. The loss function used to calculate the feature extraction loss is not limited here; for example, the AM-softmax function may be used.
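A common formulation of the AM-softmax loss mentioned above is sketched below. This assumes PyTorch; the scale `s=30.0` and margin `m=0.35` are common defaults from the AM-softmax literature, not values from the patent:

```python
import torch
import torch.nn.functional as F


def am_softmax_loss(features: torch.Tensor, weight: torch.Tensor,
                    labels: torch.Tensor, s: float = 30.0,
                    m: float = 0.35) -> torch.Tensor:
    """AM-softmax: cosine logits with an additive margin m subtracted
    from the target class, scaled by s, then cross-entropy."""
    # Cosine similarity between L2-normalized features and class weights.
    cos = F.linear(F.normalize(features), F.normalize(weight))
    one_hot = F.one_hot(labels, cos.size(1)).float()
    logits = s * (cos - m * one_hot)  # margin only on the target class
    return F.cross_entropy(logits, labels)
```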
And S352, determining the characteristic loss by using the bulldozer distance and the characteristic extraction loss.
After obtaining the bulldozer distance in S34, the electronic device may calculate a weighted sum of the bulldozer distance and the feature extraction loss obtained in S351 to determine the feature loss; alternatively, the two may be summed directly.
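The two combination options can be sketched as follows (plain Python; the weight value is an assumption, with `w_weight=1.0` reducing to the direct sum):

```python
def feature_loss(extraction_loss: float, w_distance: float,
                 w_weight: float = 1.0) -> float:
    """Feature loss as a weighted sum of the feature extraction loss
    (e.g. AM-softmax) and the bulldozer (Wasserstein) distance."""
    return extraction_loss + w_weight * w_distance
```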
And S353, determining the domain offset loss based on the bulldozer distance.
The domain offset loss may be calculated using the bulldozer distance alone, or by combining the bulldozer distance with other quantities, such as a gradient penalty loss.
In some optional implementations of this embodiment, the step S353 may include:
(1) and performing interpolation processing on the characteristics corresponding to every two sample images of any two sample data sets to obtain interpolation characteristics.
For sample data set pair 1, consisting of sample data set a1 and sample data set b1, suppose the two data sets together contain N sample images. The N sample images are combined pairwise to obtain sample image pairs, and the features of each sample image pair are interpolated to obtain interpolation features.
(2) And inputting the interpolation characteristics into a preset network to obtain the gradient of the output of the preset network to the input of the preset network.
The electronic device inputs the interpolation feature corresponding to each sample image pair into the preset network, and the preset network classifies the interpolation feature. For example, a sample image pair is interpolated to obtain a 256-dimensional interpolation feature; this feature is input into the preset network corresponding to the sample data set pair to obtain a 1-dimensional output, and the gradient of the 1-dimensional output with respect to the 256-dimensional interpolation feature is calculated.
(3) And calculating the penalty loss of the gradient between any two sample data sets by using the obtained gradient.
After the gradient is obtained, the gradient penalty loss is calculated from it, for example as the expectation of the square of the difference between the two-norm of the gradient and 1.
(4) And determining the domain offset loss by using the bulldozer distance and the gradient penalty loss.
The electronic device may calculate a weighted combination of the bulldozer distance and the gradient penalty loss to determine the domain offset loss.
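Steps (1)–(4) can be sketched together as follows (assuming PyTorch; the penalty weight `gp_weight=10.0` follows the WGAN-GP convention and is not a value from the patent):

```python
import torch


def gradient_penalty(critic, feats_a: torch.Tensor,
                     feats_b: torch.Tensor) -> torch.Tensor:
    """Penalty on interpolated features: E[(||grad critic(x)||_2 - 1)^2]."""
    # (1) Random interpolation between features of the two data sets.
    eps = torch.rand(feats_a.size(0), 1)
    interp = (eps * feats_a + (1 - eps) * feats_b).requires_grad_(True)
    # (2) Gradient of the preset-network output w.r.t. its input.
    out = critic(interp)
    grad, = torch.autograd.grad(out.sum(), interp, create_graph=True)
    # (3) Expectation of the squared deviation of the 2-norm from 1.
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()


def domain_offset_loss(w_distance: torch.Tensor, gp: torch.Tensor,
                       gp_weight: float = 10.0) -> torch.Tensor:
    """(4) Weighted combination of bulldozer distance and gradient penalty."""
    return w_distance + gp_weight * gp
```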
Through the three loss functions above, namely the bulldozer distance, the feature loss, and the domain offset loss, this embodiment learns the classification task normally while reducing the Wasserstein distance, so that domain-invariant features across the multiple data sets are obtained and joint training on multiple data sets yields better feature extraction.
Because the gradient penalty on the input is incorporated into the domain offset loss, the Lipschitz constraint is enforced without resorting to weight clipping. This avoids the optimization difficulties caused by weight clipping, accelerates convergence, and yields better learned features.
And S354, training the feature extraction network by using the feature loss, and training the preset network by using the domain offset loss to obtain the target feature extraction network.
The feature extraction network and the preset network corresponding to each sample data set pair are trained simultaneously; the difference lies in the loss functions used. Specifically, the electronic device trains the feature extraction network with the feature loss while training the preset networks with the domain offset loss, thereby obtaining the target feature extraction network.
In the training method for a feature extraction network based on multiple data sets provided by this embodiment, the feature extraction network is trained with the feature loss and the preset network is trained with the domain offset loss, and the two are trained simultaneously. In this way the domain shift problem is reduced as much as possible and the reliability of the resulting target feature extraction network is ensured.
As a specific implementation of this embodiment, a training framework for the feature extraction network based on multiple data sets is shown in fig. 4. The training framework is a complete end-to-end algorithm model, mainly comprising a preprocessing module, a feature extraction module, and a loss function calculation module. The input data are first preprocessed; then feature extraction is performed on the original and preprocessed images through a ResNet-50 backbone network to obtain feature vectors. The feature extraction loss is calculated with the AM-softmax loss function, the Wasserstein distance between two data sets is calculated, and the feature extraction network is optimized using the feature extraction loss and the Wasserstein distance. The domain offset loss is calculated using the Wasserstein distance and the gradient penalty loss, and the corresponding preset network is optimized with it.
In practical application, when more than 2 sample data sets are jointly trained, one 3-layer preset network is needed for every pair of sample data sets, to calculate the Wasserstein distance between the corresponding two data sets. For example, with 4 data sets, 6 preset networks are needed; as shown in fig. 4, the Wasserstein distance and the gradient penalty loss are calculated separately for each pair of sample data sets.
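The pair count follows from choosing 2 of n data sets, C(n, 2) = n(n-1)/2; a small illustration:

```python
from itertools import combinations


def critic_pairs(n_datasets: int):
    """One preset network per unordered data set pair:
    C(n, 2) = n(n-1)/2 networks in total."""
    return list(combinations(range(n_datasets), 2))
```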
On a proprietary combined data set, using ResNet-50 as the feature extraction network and 256-dimensional floating point features, the training method based on multiple data sets improves the pass rate by 4.5% at a 1% false recognition rate and by 3.8% at a 0.1% false recognition rate, compared with training directly with AM-softmax.
In accordance with an embodiment of the present invention, there is provided a feature extraction method embodiment, it is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
This embodiment provides a feature extraction method, which can be used in electronic devices such as computers, mobile phones, and tablet computers. Fig. 5 is a flowchart of the feature extraction method according to an embodiment of the present invention. As shown in fig. 5, the flow includes the following steps:
and S41, acquiring the image to be processed.
The image to be processed may be any image requiring feature extraction. The electronic device may capture the image in real time, receive it from another device, or read it from its own storage; the manner in which the electronic device acquires the image to be processed is not limited here.
And S42, inputting the image to be processed into the target feature extraction network to obtain the target feature.
The target feature extraction network is trained according to the training method of the feature extraction network based on multiple datasets in any one of the above embodiments.
Specifically, the features extracted by the target feature extraction network depend on the sample images used for training. For example, a target feature extraction network trained on face sample images extracts face features, while one trained on vehicle sample images extracts vehicle features.
Further, after the electronic device extracts the target features from the image to be processed with the target feature extraction network, it may also perform target recognition using the target features to determine the target object. For example, face recognition may be performed with the target features to determine the person in the image to be processed, or vehicle recognition may be performed to determine the vehicle in the image.
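A minimal sketch of such recognition by cosine similarity between the extracted target feature and a gallery of known features (assuming PyTorch; the matching threshold is illustrative, not a value from the patent):

```python
import torch
import torch.nn.functional as F


def identify(query_feat: torch.Tensor, gallery_feats: torch.Tensor,
             threshold: float = 0.5) -> int:
    """Return the index of the best gallery match by cosine similarity,
    or -1 when no similarity reaches the (illustrative) threshold."""
    sims = F.cosine_similarity(query_feat.unsqueeze(0), gallery_feats, dim=1)
    best = int(torch.argmax(sims))
    return best if float(sims[best]) >= threshold else -1
```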
The subsequent application of the target feature is not limited herein, and may be set according to actual situations.
In the feature extraction method provided by this embodiment, the target feature extraction network is obtained by calculating the bulldozer distance between any two sample data sets on the basis of the gradient inversion processing result. The gradient vanishing phenomenon that occurs with the GRL is thereby alleviated during training and better domain-invariant features are obtained, so that the target feature extraction network trained on multiple data sets extracts features more accurately and the reliability of the extracted target features is ensured.
This embodiment further provides a training apparatus for a feature extraction network based on multiple data sets, and a feature extraction apparatus. The apparatuses are used to implement the foregoing embodiments and preferred implementations; what has already been described will not be repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatuses described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The embodiment provides a training apparatus for a feature extraction network based on multiple data sets, as shown in fig. 6, including:
a first obtaining module 51, configured to obtain a plurality of sample data sets;
the feature extraction module 52 is configured to input all sample images in any two sample data sets into a feature extraction network in sequence, so as to obtain a feature set corresponding to the sample images;
a gradient inversion module 53, configured to perform gradient inversion on the features corresponding to all the sample images to obtain gradient inversion processing results corresponding to all the sample images;
a distance determining module 54, configured to determine a bulldozer distance between any two sample data sets based on the gradient inversion processing result;
and the training module 55 is configured to train the feature extraction network according to the bulldozer distance to obtain a target feature extraction network.
The training apparatus for a feature extraction network based on multiple data sets provided by this embodiment calculates the bulldozer distance between any two sample data sets on the basis of the gradient inversion processing result and trains the feature extraction network with the calculated bulldozer distance. Because the bulldozer distance still reflects the distance between two distributions even when they do not overlap, the gradient vanishing phenomenon occurring with the GRL can be alleviated to a great extent.
The present embodiment also provides a feature extraction apparatus, as shown in fig. 7, including:
a second obtaining module 61, configured to obtain an image to be processed;
a target feature extraction module 62, configured to input the image to be processed into a target feature extraction network to obtain a target feature, where the target feature extraction network is obtained by training according to the training method of the feature extraction network based on multiple data sets according to any of the embodiments.
In the feature extraction apparatus provided by this embodiment, the target feature extraction network is obtained by calculating the bulldozer distance between any two sample data sets on the basis of the gradient inversion processing result. The gradient vanishing phenomenon occurring with the GRL is thereby alleviated during training and better domain-invariant features are obtained, so that the target feature extraction network trained on multiple data sets extracts features more accurately and the reliability of the extracted target features is ensured.
The training apparatus for the feature extraction network based on multiple data sets, or the feature extraction apparatus, in this embodiment is presented in the form of functional units, where a unit may be an ASIC, a processor and memory executing one or more software or firmware programs, and/or another device capable of providing the above functions.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, which has the training apparatus based on the multi-dataset feature extraction network shown in fig. 6 or the feature extraction apparatus shown in fig. 7.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention, and as shown in fig. 8, the electronic device may include: at least one processor 71, such as a CPU (Central Processing Unit), at least one communication interface 73, memory 74, at least one communication bus 72. Wherein a communication bus 72 is used to enable the connection communication between these components. The communication interface 73 may include a Display (Display) and a Keyboard (Keyboard), and the optional communication interface 73 may also include a standard wired interface and a standard wireless interface. The Memory 74 may be a high-speed RAM Memory (volatile Random Access Memory) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The memory 74 may alternatively be at least one memory device located remotely from the processor 71. Wherein the processor 71 may be in connection with the apparatus described in fig. 6 or 7, an application program is stored in the memory 74, and the processor 71 calls the program code stored in the memory 74 for performing any of the above-mentioned method steps.
The communication bus 72 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 72 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
The memory 74 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 74 may also comprise a combination of the above kinds of memory.
The processor 71 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP.
The processor 71 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
Optionally, the memory 74 is also used for storing program instructions. The processor 71 may call program instructions to implement a training method of a feature extraction network based on multiple datasets as shown in the embodiments of fig. 1 to 3 of the present application, or a feature extraction method as shown in the embodiment of fig. 5.
Embodiments of the present invention further provide a non-transitory computer storage medium, where computer-executable instructions are stored, and the computer-executable instructions may execute the training method or the feature extraction method of the feature extraction network based on multiple data sets in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A training method for a feature extraction network based on multiple data sets is characterized by comprising the following steps:
acquiring a plurality of sample data sets;
sequentially inputting all sample images in any two sample data sets into a feature extraction network to obtain a feature set corresponding to the sample images;
carrying out gradient inversion on the characteristics corresponding to all the sample images to obtain gradient inversion processing results corresponding to all the sample images;
determining a bulldozer distance between any two sample data sets based on the gradient inversion processing result;
and training the feature extraction network according to the bulldozer distance to obtain a target feature extraction network.
2. The training method according to claim 1, wherein the determining a bulldozer distance between any two sample data sets based on the gradient inversion processing result comprises:
inputting a gradient inversion processing result corresponding to the sample image into a preset network to obtain output data corresponding to each sample data set, wherein the preset network corresponds to any two sample data sets and is used for fitting the bulldozer distance between any two sample data sets;
and calculating the bulldozer distance by using the output data corresponding to each sample data set to obtain the bulldozer distance between any two sample data sets.
3. The training method according to claim 2, wherein the inputting the gradient inversion processing result corresponding to the sample image into a preset network to obtain output data corresponding to each sample data set comprises:
and processing a gradient inversion processing result corresponding to the sample image by using at least one full connection layer in the preset network to obtain output data corresponding to each sample data set.
4. The training method according to claim 2 or 3, wherein the training the feature extraction network according to the bulldozer distance to obtain a target feature extraction network comprises:
obtaining the feature extraction loss of the sample image;
determining the characteristic loss by using the bulldozer distance and the characteristic extraction loss;
determining a domain offset loss based on the bulldozer distance;
and training the feature extraction network by using the feature loss, and training the preset network by using the domain offset loss to obtain the target feature extraction network.
5. The training method of claim 4, wherein said determining a loss of territorial offset based on said bulldozer distance comprises:
performing interpolation processing on the characteristics corresponding to every two sample images of any two sample data sets to obtain interpolation characteristics;
inputting the interpolation characteristics into the preset network to obtain the gradient of the output of the preset network to the input of the preset network;
calculating the penalty loss of the gradient between any two sample data sets by using the obtained gradient;
and determining the domain migration loss by using the bulldozer distance and the gradient penalty loss.
6. A method of feature extraction, comprising:
acquiring an image to be processed;
inputting the image to be processed into a target feature extraction network to obtain a target feature, wherein the target feature extraction network is obtained by training according to the training method of the feature extraction network based on multiple data sets in any one of claims 1 to 5.
7. A training device for a feature extraction network based on multiple datasets, comprising:
the first acquisition module is used for acquiring a plurality of sample data sets;
the characteristic extraction module is used for sequentially inputting all sample images in any two sample data sets into a characteristic extraction network to obtain a characteristic set corresponding to the sample images;
the gradient inversion module is used for carrying out gradient inversion on the characteristics corresponding to all the sample images to obtain gradient inversion processing results corresponding to all the sample images;
a distance determining module, configured to determine a bulldozer distance between any two sample data sets based on the gradient inversion processing result;
and the training module is used for training the feature extraction network according to the bulldozer distance so as to obtain a target feature extraction network.
8. A feature extraction device characterized by comprising:
the second acquisition module is used for acquiring an image to be processed;
a target feature extraction module, configured to input the image to be processed into a target feature extraction network to obtain a target feature, where the target feature extraction network is obtained by training according to the training method of the multi-dataset-based feature extraction network according to any one of claims 1 to 5.
9. An electronic device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the method for training the multiple data set-based feature extraction network according to any one of claims 1 to 5, or to perform the method for feature extraction according to claim 6.
10. A computer-readable storage medium storing computer instructions for causing a computer to execute the method for training the multiple data set-based feature extraction network according to any one of claims 1 to 5 or the method for feature extraction according to claim 6.
CN202110298576.0A 2021-03-19 2021-03-19 Training and feature extraction method of feature extraction network based on multiple data sets Active CN112883988B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110298576.0A CN112883988B (en) 2021-03-19 2021-03-19 Training and feature extraction method of feature extraction network based on multiple data sets


Publications (2)

Publication Number Publication Date
CN112883988A true CN112883988A (en) 2021-06-01
CN112883988B CN112883988B (en) 2022-07-01

Family

ID=76041514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110298576.0A Active CN112883988B (en) 2021-03-19 2021-03-19 Training and feature extraction method of feature extraction network based on multiple data sets

Country Status (1)

Country Link
CN (1) CN112883988B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190012581A1 (en) * 2017-07-06 2019-01-10 Nokia Technologies Oy Method and an apparatus for evaluating generative machine learning model
CN111898635A (en) * 2020-06-24 2020-11-06 华为技术有限公司 Neural network training method, data acquisition method and device
CN112070209A (en) * 2020-08-13 2020-12-11 河北大学 Stable controllable image generation model training method based on W distance



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant