CN116152644A - Long-tail object identification method based on artificial synthetic data and multi-source transfer learning - Google Patents

Long-tail object identification method based on artificial synthetic data and multi-source transfer learning

Info

Publication number
CN116152644A
Authority
CN
China
Prior art keywords
training
sample
weight
training data
sets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310056416.4A
Other languages
Chinese (zh)
Inventor
Zhang Xuesong (张雪松)
Liu Lijuan (刘丽娟)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Jiaotong University
Original Assignee
Dalian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Jiaotong University filed Critical Dalian Jiaotong University
Priority to CN202310056416.4A priority Critical patent/CN116152644A/en
Publication of CN116152644A publication Critical patent/CN116152644A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention discloses a long-tail object recognition method based on artificially synthesized data and multi-source transfer learning. The method comprises: acquiring a multi-source training image set, preprocessing it and extracting features, determining the class label of each training image, and constructing training datasets from the training feature vectors and class labels; initializing a weight set for each training dataset, setting a weight adjustment factor and a number of iterations, selecting an initial weak-classifier learning algorithm, and training on the training datasets according to the weight adjustment factor, the number of iterations, the initial weak-classifier learning algorithm and the initialized weight sets, to obtain T weak classifiers and their classifier coefficients; and acquiring a target-domain test image set, preprocessing it and extracting features, predicting the class label of each test image with the T weak classifiers, and determining the class label of each test image from the prediction results and the classifier coefficients. The method effectively improves image classifier performance under absolute imbalance of the image class distribution.

Description

Long-tail object identification method based on artificial synthetic data and multi-source transfer learning
Technical Field
The invention relates to the field of recognition of small sample long tail objects, in particular to a long tail object recognition method based on artificial synthetic data and multi-source transfer learning.
Background
Object recognition is a fundamental problem in the field of computer vision. Since class-label distributions in real-world image datasets exhibit long-tailed trends, classifiers trained on class-imbalanced image data tend to overfit the majority-class (head) data and ignore the minority classes (tail) in classification prediction. In particular, when training data are insufficient, the generalization ability of the model is reduced.
For the above problems, the following solutions exist. The Rare-Transfer algorithm based on single-source instance transfer learning is proposed in document [1]; this algorithm is the first to apply single-source instance transfer learning to the class-imbalanced classification problem under insufficient training samples. Rare-Transfer builds on the classical TrAdaBoost framework from the transfer learning domain [2] and prevents premature convergence of the source-domain sample weights by introducing a correction factor Ct at each Boosting iteration. The main disadvantages of the Rare-Transfer algorithm are that it is only suitable for binary classification problems and for instance transfer from a single source domain; when the minority-class samples in the source domain are themselves few, the effect of single-source transfer learning is limited and "negative transfer" is easily caused, i.e., the knowledge contained in the source-domain "positive samples" is harmful for object recognition in the target domain. Documents [3-5] generalize the TrAdaBoost framework from single-source to multi-source transfer learning and propose multi-source transfer learning algorithms based on it, but these algorithms mainly target binary classification of data with balanced class-label distributions. Document [6] proposes the Weighted Multisource TrAdaBoost algorithm, which considers the influence of the total number of training samples in the source and target domains on classifier performance during instance transfer; its main disadvantages are that it applies only to binary classification, introduces new hyperparameters, and does not consider class imbalance in the target domain.
Document [7] combines the multi-class AdaBoost algorithm SAMME [8] with the TrAdaBoost framework and proposes a multi-class single-source instance transfer learning algorithm; its main disadvantages are that it is only suitable for single-source transfer learning and for datasets with balanced class-label distributions, and it cannot cope effectively with imbalanced training data in the target domain.
In summary, the existing methods have the following disadvantages: 1) imbalanced-classification algorithms based on the TrAdaBoost framework consider only single-source instance transfer and binary classification, focus on text classification tasks, and achieve low classification accuracy when the training samples span multiple classes; 2) multi-source instance transfer algorithms based on the TrAdaBoost framework consider classifier training only under balanced class-label distributions and cannot accurately classify training samples with imbalanced class-label distributions; 3) algorithms based on the SMOTE technique and the Boosting framework train models only in the target domain without exploiting source-domain knowledge, which reduces classification accuracy; moreover, they concentrate mainly on training binary classification models and cannot classify training samples spanning multiple classes.
References
[1] Al-Stouhi S., Reddy C. K. Transfer learning for class imbalance problems with inadequate data. Knowledge and Information Systems, 2016, 48(1): 201-228.
[2] Dai W., Yang Q., Xue G. R., Yu Y. Boosting for transfer learning. In: Proceedings of the International Conference on Machine Learning, 2007, pp. 193-200.
[3] Yao Y., Doretto G. Boosting for transfer learning with multiple sources. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010: 1855-1862.
[4] Zhang Qian, Li Ming, Wang Xuesong, et al. Instance-based transfer learning for multi-source domains [J]. Acta Automatica Sinica, 2014, 40(6): 1176-1183.
[5] Zhang Qian, Li Haigang, Li Ming, Cheng Yuhu. Instance transfer learning method based on multi-source dynamic TrAdaBoost [J]. Journal of China University of Mining and Technology, 2014, 43(4): 713-720.
[6] Antunes J., Bernardino A., Smailagic A., Siewiorek D. Weighted Multisource TrAdaBoost. In: Morales A., Fierrez J., Sánchez J., Ribeiro B. (eds) Pattern Recognition and Image Analysis. IbPRIA 2019. Lecture Notes in Computer Science, vol 11867. Springer, Cham, 2019.
[7] He Hanxian, Khoshelham K., Fraser C. A multiclass TrAdaBoost transfer learning algorithm for the classification of mobile lidar data. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 166: 118-127.
[8] Hastie T., Rosset S., Zhu J., Zou H. Multi-class AdaBoost. Statistics and Its Interface, 2009, 2: 349-360.
Disclosure of Invention
The invention provides a long-tail object identification method based on artificial synthetic data and multi-source transfer learning, which aims to overcome the technical problems.
A long-tail object recognition method based on artificially synthesized data and multi-source transfer learning comprises:
step one, acquiring N source-domain training image sets and one target-domain training image set and preprocessing them; extracting, with a deep neural network, the features of each training image in the preprocessed N source-domain training image sets and the preprocessed target-domain training image set, to obtain the training feature vector of each training image;
step two, applying l2-zscore normalization to the training feature vector of each training image; determining the class label of each training image according to the object type it contains; and constructing N source-domain training datasets and one target-domain training dataset from the normalized training feature vectors and the class labels;
step three, initializing the weight sets corresponding to the N source-domain training datasets and the one target-domain training dataset, setting a weight adjustment factor, setting the number of iterations to T, and selecting an initial weak-classifier learning algorithm; training on the N source-domain training datasets and the one target-domain training dataset according to the weight adjustment factor, the number of iterations, the initial weak-classifier learning algorithm and the initialized weight sets, to obtain T weak classifiers and their classifier coefficients;
step four, acquiring a target-domain test image set and preprocessing it; extracting features from the preprocessed target-domain test image set with the deep neural network, to obtain the test feature vector of each test image in the set; predicting the class label of each test feature vector with the T weak classifiers; and determining the class label of each test image, i.e. the object class contained in the test image, from the prediction results of the T weak classifiers and the classifier coefficients.
Preferably, the third step includes:
S1, initializing the weight sets of the N source-domain training datasets and the one target-domain training dataset, setting the weight adjustment factor, and initializing n=1, t=1;
S2, for the t-th iteration, normalizing the weight sets of the N source-domain training datasets and of the target-domain training dataset, and artificially synthesizing samples for the target-domain training dataset and its corresponding weight set, to obtain an artificially synthesized target-domain training dataset and its corresponding weight set;
S3, merging the n-th source-domain training dataset, the target-domain training dataset and the artificially synthesized target-domain training dataset into the n-th merged training set, and merging their three corresponding weight sets into the n-th merged weight set; training on the n-th merged training set according to the selected initial weak-classifier learning algorithm and the n-th merged weight set, to obtain the trained n-th weak classifier; and calculating the n-th training error rate of the n-th weak classifier on the target-domain training dataset;
S4, letting n=n+1 and returning to S3 until n=N, to obtain N weak classifiers; selecting the classifier with the lowest training error rate among the N weak classifiers as the weak classifier of the t-th iteration;
S5, letting n=1; for the weak classifier obtained at the t-th iteration and its corresponding training error rate, calculating the classifier coefficient of the weak classifier from the training error rate, updating the weight set of the target-domain training dataset according to the classifier coefficient, calculating the correction factor from the training error rate, and updating the weight sets of the source-domain training datasets according to the correction factor;
and S6, letting t=t+1 and returning to S2 until t=T, to obtain T weak classifiers and their corresponding T classifier coefficients.
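The classifier-coefficient and weight-update formulas of S5 appear only in the image-rendered algorithm table of the patent, so they cannot be quoted here. Since the method builds on the multi-class SAMME algorithm [8], the following is a plausible sketch of that step under SAMME's standard formulas; the function name and exact update rule are assumptions, not the patent's verbatim method.

```python
import numpy as np

def samme_update(y_true, y_pred, weights, n_classes):
    """One SAMME-style step: weighted error rate, classifier coefficient,
    and target-domain weight update (sketch; the patent's exact formulas
    are given only in its algorithm table)."""
    weights = weights / weights.sum()
    miss = (y_true != y_pred)
    # weighted training error rate on the target-domain samples
    err = weights[miss].sum()
    # SAMME multi-class classifier coefficient [8]
    alpha = np.log((1.0 - err) / max(err, 1e-12)) + np.log(n_classes - 1)
    # increase the weights of misclassified target-domain samples
    new_w = weights * np.exp(alpha * miss)
    return alpha, new_w / new_w.sum()
```

The coefficient is positive (i.e., the weak classifier is retained usefully) whenever its error rate is below (K-1)/K, the multi-class chance level.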
Preferably, the artificial synthesis for the target-domain training dataset and its corresponding weight set includes:
S11, determining the object type contained in each training image in the target-domain training dataset and taking it as the class label; dividing the target-domain training dataset into K sample sets by class label; and calculating the number Nsmote(i) of samples to be artificially synthesized for the i-th sample set, where initially i=1 and i ≤ K;
S12, for the i-th sample set, generating Nsmote(i) synthetic samples based on the SMOTE technique, assigning a synthesis weight and a synthetic class label to each synthetic sample, and storing the synthetic sample set, synthesis weight set and synthetic class-label set of the i-th sample set as the i-th artificially synthesized sample set;
and S13, letting i=i+1 and repeating S12 until i=K, to obtain the artificially synthesized sample sets of the K sample sets in turn; merging the synthetic sample sets in the artificially synthesized sample sets with the target-domain training dataset, and merging the synthesis weight sets with the weight set corresponding to the target-domain training dataset.
Preferably, the selecting of an initial weak-classifier learning algorithm includes randomly selecting one classifier learning algorithm from the SVM, the extreme learning machine and the decision stump as the initial weak-classifier learning algorithm.
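The random selection among the three candidate learners can be sketched with scikit-learn. The extreme learning machine has no standard scikit-learn implementation, so a single-hidden-layer MLP stands in for it here (an assumption); a decision stump is a depth-1 decision tree. The function name is illustrative only.

```python
import random
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

def pick_weak_learner(seed=None):
    """Randomly pick one weak-classifier learning algorithm, as in the
    'Preferably' clause above. The MLP is a stand-in for the extreme
    learning machine (assumption); the depth-1 tree is a decision stump."""
    rng = random.Random(seed)
    candidates = [
        lambda: SVC(kernel="linear"),                      # SVM
        lambda: MLPClassifier(hidden_layer_sizes=(100,)),  # ELM stand-in
        lambda: DecisionTreeClassifier(max_depth=1),       # decision stump
    ]
    return rng.choice(candidates)()
```

The chosen estimator is then reused as the base learner for every boosting iteration of step three.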
Preferably, said S12 includes:
S121, letting m=1; while m ≤ Nsmote(i), performing the m-th random sampling from the i-th sample set, obtaining one anchor sample (x_anchor, y_anchor, w_anchor) each time, where x_anchor is the feature vector of the anchor sample, y_anchor is its class label, and w_anchor is its weight;
S122, finding the k nearest-neighbor samples of x_anchor with the k-nearest-neighbor algorithm and selecting one neighbor sample among them; assigning the feature vector of the selected neighbor sample to the first temporary variable x_nearest, and its weight to the second temporary variable w_nearest;
S123, calculating the feature vector x_syn of the synthetic sample according to formula (1),
x_syn = x_anchor + (x_nearest - x_anchor) .* rand(1, NFeatures)   (1)
where rand(1, NFeatures) randomly generates an NFeatures-dimensional feature vector with values in the interval (0, 1), NFeatures is the feature dimension of x_anchor, and .* denotes element-wise multiplication of the two feature vectors at corresponding positions;
S124, calculating the synthesis weight w_syn of the synthetic sample according to formula (2),
w_syn = w_anchor * r1 + w_nearest * (1 - r1)   (2)
where r1 is a random number in the interval (0, 1);
S125, taking the class label y_anchor of the anchor sample as the synthetic class label y_syn of the synthetic sample, and storing the feature vector, synthesis weight and synthetic class label of the synthetic sample;
S126, letting m=m+1 and returning to S121 until m > Nsmote(i), obtaining Nsmote(i) synthetic samples, storing them as the synthetic sample set of the i-th sample set, and representing the synthetic sample set, synthesis weight set and synthetic label set of the i-th sample set as the i-th artificially synthesized sample set.
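Steps S122-S125 for a single synthetic sample can be sketched as follows, directly implementing formulas (1) and (2). The neighbor search of S122 is assumed to have been done beforehand (e.g., with any k-NN routine), and the function name is illustrative.

```python
import numpy as np

def synthesize_sample(x_anchor, w_anchor, y_anchor, neighbors_x, neighbors_w, rng):
    """Create one weighted synthetic sample per formulas (1) and (2).

    neighbors_x / neighbors_w hold the feature vectors and weights of the
    k nearest neighbors of x_anchor; rng is a numpy random Generator.
    """
    j = rng.integers(len(neighbors_x))          # pick one of the k neighbors
    x_nearest, w_nearest = neighbors_x[j], neighbors_w[j]
    # formula (1): element-wise interpolation with a random vector in (0, 1)
    x_syn = x_anchor + (x_nearest - x_anchor) * rng.random(x_anchor.shape[0])
    # formula (2): weight interpolation with a scalar r1 in (0, 1)
    r1 = rng.random()
    w_syn = w_anchor * r1 + w_nearest * (1.0 - r1)
    # S125: the synthetic sample inherits the anchor's class label
    return x_syn, w_syn, y_anchor
```

Repeating this call Nsmote(i) times over random anchors from the i-th sample set yields the i-th artificially synthesized sample set of S126.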
The invention provides a long-tail object recognition method based on artificially synthesized data and multi-source transfer learning. By artificially synthesizing small-sample long-tail data with the SMOTE technique, the class-label distribution of the target-domain training samples can be dynamically rebalanced, which makes the method particularly suitable for small-sample imbalanced object recognition (image classification) tasks and improves the universality of image classification. The weak classifier at each iteration can use any of several classifier learning algorithms, such as the SVM, the extreme learning machine and the decision stump, which improves the flexibility of image classification. Through the automatic generation of weighted artificially synthesized data and classification training based on multiple source-domain datasets at each iteration, the performance of the image classifier under absolute imbalance of the class distribution is effectively improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the invention, and that a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of the execution of the Weighted-MCSMOTE algorithm of the present invention;
FIG. 3 is a flow chart of the execution of the MSMCTraSMOTEboost algorithm of the present invention;
FIG. 4 is a flow chart of the image classifier training and performance evaluation of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a flowchart of the method of the present invention. As shown in FIG. 1, the method of this embodiment may include:
a long-tail object recognition method based on artificially synthesized data and multi-source transfer learning, comprising:
step one, acquiring N source-domain training image sets and one target-domain training image set and preprocessing them, wherein the target-domain training image set has insufficient samples and an imbalanced class-label distribution. The features of each training image in the preprocessed N source-domain training image sets and the preprocessed target-domain training image set are extracted with a deep neural network, here ResNet18. The ResNet18 network requires a 224×224×3 input image, but the training and test images usually differ in size; therefore, each image is preprocessed to size 224×224×3 before being input into the ResNet18 network. The training feature vector of each training image is then obtained, i.e., the 512-dimensional feature vector output by the global pooling layer 'pool5' at the end of the ResNet18 network;
step two, applying l2-zscore normalization to the training feature vector of each training image, determining the class label of each training image according to the object type it contains, and constructing N source-domain training datasets and one target-domain training dataset from the normalized training feature vectors and the class labels;
step three, initializing the weight sets corresponding to the N source-domain training datasets and the one target-domain training dataset, setting a weight adjustment factor, setting the number of iterations to T, selecting an initial weak-classifier learning algorithm, and training on the N source-domain training datasets and the one target-domain training dataset according to the weight adjustment factor, the number of iterations, the initial weak-classifier learning algorithm and the initialized weight sets, to obtain T weak classifiers and their classifier coefficients;
the initial weak-classifier learning algorithm is selected by randomly choosing one classifier learning algorithm from the SVM, the extreme learning machine and the decision stump;
the third step comprises:
S1, initializing the weight sets of the N source-domain training datasets and the one target-domain training dataset, setting the weight adjustment factor, and initializing n=1, t=1;
S2, for the t-th iteration, normalizing the weight sets of the N source-domain training datasets and of the target-domain training dataset, and artificially synthesizing samples for the target-domain training dataset and its corresponding weight set, to obtain an artificially synthesized target-domain training dataset and its corresponding weight set;
the artificial synthesis for the target-domain training dataset and its corresponding weight set includes:
S11, determining the object type contained in each training image in the target-domain training dataset and taking it as the class label; dividing the target-domain training dataset into K sample sets by class label; and calculating the number Nsmote(i) of samples to be artificially synthesized for the i-th sample set, where initially i=1 and i ≤ K;
S12, for the i-th sample set, generating Nsmote(i) synthetic samples based on the SMOTE technique, assigning a synthesis weight and a synthetic class label to each synthetic sample, and storing the synthetic sample set, synthesis weight set and synthetic class-label set of the i-th sample set as the i-th artificially synthesized sample set;
said S12 includes:
S121, letting m=1; while m ≤ Nsmote(i), performing the m-th random sampling from the i-th sample set, obtaining one anchor sample (x_anchor, y_anchor, w_anchor) each time, where x_anchor is the feature vector of the anchor sample, y_anchor is its class label, and w_anchor is its weight;
S122, finding the k nearest-neighbor samples of x_anchor with the k-nearest-neighbor algorithm and selecting one neighbor sample among them; assigning the feature vector of the selected neighbor sample to the first temporary variable x_nearest, and its weight to the second temporary variable w_nearest;
S123, calculating the feature vector x_syn of the synthetic sample according to formula (1),
x_syn = x_anchor + (x_nearest - x_anchor) .* rand(1, NFeatures)   (1)
where rand(1, NFeatures) randomly generates an NFeatures-dimensional feature vector with values in the interval (0, 1), NFeatures is the feature dimension of x_anchor, and .* denotes element-wise multiplication of the two feature vectors at corresponding positions;
S124, calculating the synthesis weight w_syn of the synthetic sample according to formula (2),
w_syn = w_anchor * r1 + w_nearest * (1 - r1)   (2)
where r1 is a random number uniformly distributed in the interval (0, 1);
S125, taking the class label y_anchor of the anchor sample as the synthetic class label y_syn of the synthetic sample, and storing the feature vector, synthesis weight and synthetic class label of the synthetic sample;
S126, letting m=m+1 and returning to S121 until m > Nsmote(i), obtaining Nsmote(i) synthetic samples, storing them as the synthetic sample set of the i-th sample set, and representing the synthetic sample set, synthesis weight set and synthetic label set of the i-th sample set as the i-th artificially synthesized sample set;
the implementation of S2 is shown as the Weighted-MCSMOTE algorithm, the process of which is shown in Table 1, and the flow chart of which is shown in FIG. 2.
TABLE 1 procedure for Weighted-MCSMOTE Algorithm
Figure BDA0004060611000000081
/>
Figure BDA0004060611000000091
Assume in the algorithm Weighted-MCSMOTE that the maximum per-class sample count in D_T is Nmax = max(classNo). The Weighted-MCSMOTE algorithm scans D_T and, using the SMOTE technique, artificially synthesizes Nmax - classNo(ii) + 1 samples for the ii-th class, so that the number of samples in every class of the training dataset reaches Nmax + 1, and assigns a weight to each artificially synthesized sample according to the formula in line 19.
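The per-class synthesis count Nmax - classNo(ii) + 1 described above can be computed directly from the class labels of D_T; a minimal sketch (the function name is illustrative):

```python
from collections import Counter

def smote_counts(labels):
    """Number of samples to synthesize per class so that every class in
    the target-domain training set D_T reaches Nmax + 1 samples, where
    Nmax is the largest per-class count (per the Weighted-MCSMOTE text)."""
    class_no = Counter(labels)          # classNo(ii) for each class ii
    n_max = max(class_no.values())      # Nmax
    return {c: n_max - n + 1 for c, n in class_no.items()}
```

Note that even the majority class receives one synthetic sample, which is why every class ends with Nmax + 1 rather than Nmax samples.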
S13, letting i=i+1 and repeating S12 until i=K, to obtain the artificially synthesized sample sets of the K sample sets in turn; merging the synthetic sample sets in the artificially synthesized sample sets with the target-domain training dataset, and merging the synthesis weight sets with the weight set corresponding to the target-domain training dataset;
S3, merging the n-th source-domain training dataset, the target-domain training dataset and the artificially synthesized target-domain training dataset into the n-th merged training set, and merging their three corresponding weight sets into the n-th merged weight set; training on the n-th merged training set according to the selected initial weak-classifier learning algorithm and the n-th merged weight set, to obtain the trained n-th weak classifier; and calculating the n-th training error rate of the n-th weak classifier on the target-domain training dataset;
S4, letting n=n+1 and returning to S3 until n=N, to obtain N weak classifiers; selecting the weak classifier with the lowest training error rate among the N weak classifiers as the weak classifier of the t-th iteration;
S5, letting n=1; for the weak classifier obtained at the t-th iteration and its corresponding training error rate, calculating the classifier coefficient of the weak classifier from the training error rate, updating the weight set of the target-domain training dataset according to the classifier coefficient, calculating the correction factor from the training error rate, and updating the weight sets of the source-domain training datasets according to the correction factor;
S6, letting t=t+1 and returning to S2 until t=T, to obtain T weak classifiers and their corresponding T classifier coefficients;
the process of the third step is expressed as the MSMCTraSMOTEboost algorithm; its implementation is listed in Table 2 and its flow chart is shown in FIG. 3.
TABLE 2. Implementation of the MSMCTraSMOTEboost algorithm (the algorithm listing is rendered as images in the original publication and is not reproduced here)
Line 1 of the algorithm MSMCTraSMOTEboost initializes the training sample weights. Line 2 sets the weight adjustment factor of the source-domain samples to βs. Lines 3-18 train the T weak classifiers f_t(x) through T rounds of iteration. Line 4 normalizes the weight vector. Line 5 invokes the Weighted-MCSMOTE algorithm to artificially synthesize weighted data so as to rebalance the class-label distribution of the target-domain training samples. Lines 6-9 traverse the N source-domain datasets separately at iteration t to construct N candidate weak classifiers, and lines 10-11 select the one with the lowest weighted training error rate on D_T as the final weak classifier f_t of the round. Line 7 merges each source-domain training dataset D_Sn with the dataset D_T ∪ D_SMOTE output by the Weighted-MCSMOTE algorithm, and trains a classifier f_t^n on the merged dataset and the current weights of its samples. Line 8 computes the training error rate ε_Tar,t^n of the classifier f_t^n on the target-domain dataset. Thus, through lines 5-11, the weak classifier f_t saved is the one with the lowest weighted training error rate on D_T at the t-th iteration. Line 12 calculates the correction factor C_t, which is mainly used to prevent premature convergence of the source-domain sample weights during transfer learning. Line 13 calculates the weight adjustment factor β_T of the target-domain samples from the training error rate ε_Tar,t and the number of classes K. Line 14 updates the weights of the target-domain training samples according to β_T. Line 15, based on the calculated C_t, reduces the weights of the misclassified samples in the source-domain training datasets while keeping the weights of the correctly classified samples unchanged. Lines 15-17 thus update the source-domain sample weights across the N source-domain training datasets at iteration t.
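The exact lines 15-17 of the algorithm appear only in the image-rendered table, so the following is only a TrAdaBoost-style sketch of the source-domain update, following the pattern of references [1, 2]: misclassified source samples are down-weighted by a factor βs in (0, 1), and the correction factor C_t rescales the weights to slow their convergence. Function name and exact form are assumptions.

```python
import numpy as np

def update_source_weights(weights, miss, beta_s, c_t):
    """TrAdaBoost-style source-domain weight update (sketch): down-weight
    misclassified source samples by beta_s in (0, 1), leave correctly
    classified ones unchanged, then apply the correction factor C_t.
    The patent's exact formulas are given only in its algorithm table."""
    new_w = np.where(miss, weights * beta_s, weights)
    return c_t * new_w
```

Down-weighting misclassified source samples reflects the TrAdaBoost intuition that source instances the target-domain classifier gets wrong are likely dissimilar to the target distribution and should contribute less in later rounds.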
Step four, acquiring a target field test image set, preprocessing, namely, testing image type label distribution balance, preprocessing the size of the test image into 224 multiplied by 3, respectively extracting features of the preprocessed target field test image set according to a deep neural network, acquiring test feature vectors of each test image in the target field test image set, inputting the test images into a ResNet18 network, and extracting 512-dimensional feature vectors output by a global pooling layer 'pool 5';
The class label of the test feature vector of each test image is predicted by the T weak classifiers f_t(x) (t = 1, 2, 3, …, T). According to the prediction results of the T weak classifiers and their classifier coefficients α_t, the class label of each test image is determined; this class label is an object class contained in the test image.
The object recognition procedure of this embodiment can be divided into two stages: an image classifier training stage and a classifier performance evaluation stage, as shown in fig. 4.
1. Image classifier training stage
Step 1: the training images of k source fields and target fields are loaded and preprocessed, and the ResNet18 of the neural network is loaded and trained, so that the sizes of all the images are adjusted to 224 multiplied by 3.
Step 2: and extracting the characteristics of the image by the ResNet18 network, and acquiring 512-dimensional characteristic vectors output by a global pooling layer 'pool5' at the end of the ResNet18 network as characteristic representations of each picture in the source field and the target field.
Step 3: k source field training sample sets D after feature extraction S1 ,D S2 ,…,D Sk And 1 target field training sample set D T Feature matrix of the device 2 -zscore normalization preprocessing, and inputting the zcore normalization preprocessing as training data into a newly proposed MSMCTRASMOTEboost algorithm, and iterating T rounds to train out T weak classifiers f t (x) And classifier weight coefficients
Figure BDA0004060611000000131
Where t=1, 2, …, T.
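The l2-zscore preprocessing in Step 3 can be sketched as follows. This assumes, as the name suggests, l2 normalization of each feature vector followed by z-score standardization of each feature dimension; the exact order is an assumption:

```python
import numpy as np

def l2_zscore(features, eps=1e-12):
    """l2-zscore preprocessing: l2-normalize each feature vector (row),
    then z-score standardize each feature dimension (column)."""
    X = np.asarray(features, dtype=float)
    # l2 normalization per sample: every row gets unit Euclidean norm
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    # z-score standardization per feature dimension: zero mean, unit variance
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)
```

The same transform would be fitted on the training features and reused for the test feature vectors so that both live in the same normalized space.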
2. Image classifier performance evaluation phase
Step 1: and loading a target field test image, balancing the distribution of the test image class labels, and adjusting the size of the test image to 224 multiplied by 3.
Step 2: the test image I is input into a ResNet18 network, and a 512-dimensional feature vector x output by a global pooling layer 'pool5' is extracted.
Step 3: loading T weak classifiers f in the trained image classifier t (x) (t=1, 2,3, …, T) and classifier weight coefficients
Figure BDA0004060611000000132
Step 4: according to the formula
Figure BDA0004060611000000133
And outputting the prediction type label of the image I.
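The weighted-vote decision rule of Step 4 can be sketched as follows (the function name is illustrative; the weak classifiers are assumed to be callables returning a class label):

```python
from collections import defaultdict

def predict_label(x, weak_classifiers, alphas):
    """Weighted majority vote: each weak classifier f_t votes for a class
    with weight alpha_t; the class with the largest total vote wins."""
    votes = defaultdict(float)
    for f_t, alpha_t in zip(weak_classifiers, alphas):
        votes[f_t(x)] += alpha_t
    return max(votes, key=votes.get)
```

A single strong weak classifier can thus outvote several weaker ones, which is exactly the behavior of the boosting coefficients.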
In image multi-classification, the class labels of the training and test data belong to the set {1, 2, …, K}, where K > 2.
The performance of the long-tail recognition classifier in this embodiment is measured by the following steps.
1. definition of metrics
1) Imbalance rate definition
In the field of imbalanced learning, the degree of imbalance of a training-sample class-label distribution can be measured using the imbalance rate (IR), formally expressed as:

IR = max_i{|Y_i|} / min_i{|Y_i|}

where |Y_i| denotes the number of samples of the ith class in the training data set, min_i{|Y_i|} is the sample count of the class with the fewest samples, and max_i{|Y_i|} is the sample count of the class with the most samples.
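A minimal computation of the imbalance rate from a list of class labels:

```python
from collections import Counter

def imbalance_rate(labels):
    """IR = (largest class size) / (smallest class size) over the labels."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())
```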
2) Performance metrics in binary classification
For the binary classification problem, assume the label of the minority class is P (Positive) and the label of the majority class is N (Negative); the confusion matrix is shown in Table 3.
TABLE 3 confusion matrix for two classifications
                  Predicted P            Predicted N
Actual P          TP (true positive)     FN (false negative)
Actual N          FP (false positive)    TN (true negative)
Based on the definitions of TP, FP, TN and FN in the confusion matrix, the classifier performance metrics Precision, Recall, F1-measure, G-mean and Balanced Accuracy (BAC for short) in long-tail classification are defined as:
Precision = TP / (TP + FP)

Recall = TPR = TP / (TP + FN)

F1-measure = 2 · Precision · Recall / (Precision + Recall)

G-mean = sqrt(TPR · TNR), where TNR = TN / (TN + FP)

BAC = (TPR + TNR) / 2
In the above performance metric formulas, Precision measures the precision of the model on the minority-class samples, Recall measures the recall of the model on the minority-class samples, and F1-measure is the harmonic mean of Precision and Recall. G-mean and BAC represent a comprehensive trade-off between TPR (True Positive Rate) and TNR (True Negative Rate).
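The five binary metrics above can be computed directly from the confusion-matrix counts (a sketch; degenerate zero-denominator cases are not handled):

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Precision, Recall, F1, G-mean and BAC from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # TPR, true positive rate
    tnr = tn / (tn + fp)             # TNR, true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * tnr)
    bac = (recall + tnr) / 2
    return {"precision": precision, "recall": recall,
            "f1": f1, "g_mean": g_mean, "bac": bac}
```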
3) Performance metrics in multi-class classification
The multi-class task can be decomposed into multiple binary classification sub-tasks, and a confusion matrix is then established for each class separately. For example, an m-class problem can be decomposed into m binary problems, where the minority class of the ith binary problem is labeled C_i and the majority class is the set {C_j | j = 1, …, m, j ≠ i}. Each performance metric can then be computed by macro-averaging: the metric is first calculated on the confusion matrix of each class, and the arithmetic mean of the per-class values is taken.
For the m-classification task, let Recall_i, Precision_i, F1_i, G-mean_i and BAC_i (i = 1, 2, …, m) denote the recall, precision, F1-measure, G-mean and BAC computed on the ith of the m binary confusion matrices. Then:

Recall_macro = (1/m) Σ_{i=1}^{m} Recall_i

Precision_macro = (1/m) Σ_{i=1}^{m} Precision_i

F1_macro = (1/m) Σ_{i=1}^{m} F1_i

G-mean_macro = (1/m) Σ_{i=1}^{m} G-mean_i

BAC_macro = (1/m) Σ_{i=1}^{m} BAC_i
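Macro-averaging can be sketched for one metric, recall, by decomposing the multi-class labels one-vs-rest (the function name is illustrative):

```python
def macro_recall(y_true, y_pred):
    """Macro-averaged recall: recall is computed one-vs-rest for each class,
    then the arithmetic mean over classes is taken."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # per-class confusion counts for the binary problem "c vs rest"
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)
```

Because each class contributes equally to the mean regardless of its size, macro-averaging is the natural choice for long-tail evaluation: the tail classes are not drowned out by the head classes.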
2. building an embodiment dataset
In order to verify the superiority of the proposed method in small-sample long-tail object recognition applications, this embodiment constructs a multi-class long-tail object recognition data set, Office-Caltech10-LT, based on Office-Caltech10. The Office-Caltech10 data set comprises image data sets from 4 domains (DSLR, Webcam, Amazon and Caltech) and contains 10 object classes, namely the classes shared by the Office-31 and Caltech-256 data sets. DSLR, Webcam and Amazon are source domain data sets, and Caltech is the target domain data set.
[Hoffman,J.,Rodner,E.,Donahue,J.et al.Asymmetric and Category Invariant Feature Transformations for Domain Adaptation.Int J Comput Vis 109,28–41(2014).]
The steps of constructing the long-tail object recognition data set Office-Caltech10-LT are as follows:
step 1: the method comprises the steps of performing batch feature coding on images in each field in an Office-Caltech10 dataset by using a pretrained ResNet18 deep neural network, and allocating a class label c (c E {1,2,3, …,10 }) to each feature vector according to the corresponding object class, wherein the object classes contained in the images in each field are respectively represented as 'back', 'bike', 'calculator', 'headsets', 'calculator', 'keyboard', 'map_computer', 'camera', 'projector', and each object class corresponds to the class labels 1,2,3,4,5,6,7,8,9 and 10.
Step 2: the total samples in the Caltech field are divided into 5 groups of training and testing sample sets by a random sampling mode according to the proportion of 5 rounds of layered cross validation (5-folds Stratified Cross Validation) and the training/testing data volume ratio of 70:30. DSLR, webcam and Amazon are source fields. Therefore, no further random sampling process is performed after feature encoding is performed on the picture.
Step 3: for each set of training samples, further downsampling is performed on the 5 classes of training samples in which the number is 5 bits after the reciprocal. Assuming that maxC is the largest of the training sample amounts of the respective object classes in the Caltech field, the unbalance rate selection is set to IR, the number of training samples after downsampling for each class of samples selected for downsampling is reduced to floor (maxC/IRatio).
Step 4: constructing an Office-Caltch10-LT data set according to feature vectors and class labels of images in a training set of a source field and a target field, wherein the unbalance rate of the Office-Caltch10-LT data set is expressed as IR.
3. Main procedure of the examples
Step 1: training images of the source field DSLR, webcam, amazon and the target field Caltech are loaded and preprocessed, wherein the target field training samples are insufficient and the class labels are unbalanced in distribution. Image feature extraction and encoding uses a deep neural network ResNet18, since ResNet18 networks require input images of 224X 3 in size, but typically training and test images tend to be of different sizes. Therefore, the training images and the test images are automatically resized to 224×224×3 before they are input to the res net18 network.
Step 2: a feature set of 3 source domain training images and 1 target domain training image is obtained. The batch process saves the 512-dimensional feature vector output by the global pooling layer 'pool5' at the end of the ResNet18 network as a feature representation for each picture of the source and target fields. It is worth noting that since the feature extraction of k source fields and 1 target field is the same, the feature vector dimensions output from the res net18 network are the same.
Step 3: Apply l2-zscore normalization preprocessing to the feature matrices of the 3 source domain training sample sets D_S1, D_S2, D_S3 and the 1 target domain training sample set D_T, and input them as training data into the newly proposed MSMCTraSMOTEboost algorithm, iterating T rounds to train T weak classifiers f_t(x) and their classifier weight coefficients α_t, where t = 1, 2, …, T.
Step 4: and loading a target field test image, balancing the distribution of the test image class labels, and adjusting the size of the test image to 224 multiplied by 3.
Step 5: inputting the test image I into a ResNet18 network, extracting 512-dimensional feature vectors x output by a global pooling layer 'pool5', and carrying out l 2 -zscore normalization pretreatment.
Step 6: loading T weak classifiers f in the trained image classifier t (x) (t=1, 2,3, …, T) and classifier weight coefficients
Figure BDA0004060611000000162
Step 7: according to the formula
Figure BDA0004060611000000163
And outputting a predicted class label of the image I, wherein the class label is an object class contained in the test image.
IR = 10 was set in comparative experiment 1, and the training and test sample numbers of the source domains and the target domain are shown in Table 4. It is worth noting that the source domains provide training data only.
Table 4 number of training and test samples in different areas of experiment 1 (ir=10)
Table 5 shows the macro-averaged experimental results of comparative experiment 1. The baseline algorithms in experiment 1 are the multi-class AdaBoost algorithm SAMME without transfer learning and the single-source multi-class transfer learning algorithm MCTrAdaboost (with DSLR, Webcam and Amazon as the source domain respectively), compared against the MSMCTraSMOTEboost algorithm proposed in the present invention.
Table 5 results of comparative experiment 1 using macro average metric
IR = 20 was set in comparative experiment 2, and the training and test sample numbers of the source domains and the target domain (the source domains again providing training data only) are shown in Table 6.
Table 6 Number of training and test samples in different domains of experiment 2 (IR = 20)
Table 7 shows the macro-averaged experimental results of comparative experiment 2. The baseline algorithms in comparative experiment 2 are the multi-class AdaBoost algorithm SAMME without transfer learning and the single-source multi-class transfer learning algorithm MCTrAdaboost (with DSLR, Webcam and Amazon as the source domain respectively), compared against the MSMCTraSMOTEboost algorithm proposed in the present invention.
Table 7 results of comparative experiment 2 using macro average metric
As can be seen from comparing the results of experiments 1 and 2, the image classifier trained by the MSMCTraSMOTEboost algorithm based on artificial synthetic data and multi-source transfer learning is well suited to long-tail recognition applications and can remarkably improve the comprehensive performance of long-tail object recognition under small-sample conditions.
Note that:
The MCTrAdaboost algorithm is from document [1] Hanxian He, Kourosh Khoshelham, Clive Fraser, A multiclass TrAdaBoost transfer learning algorithm for the classification of mobile lidar data, ISPRS Journal of Photogrammetry and Remote Sensing, Volume 166, 2020, Pages 118-127.
The SAMME algorithm is from document [2] Hastie, T., Rosset, S., Zhu, J., Zou, H., 2009. Multi-class AdaBoost. Stat. Interface 2, 349-360.
Overall beneficial effects:
the invention provides a long-tail object recognition method based on artificial synthesis data and multi-source transfer learning, which is used for carrying out artificial synthesis on small sample long-tail data through an SMOTE technology, is particularly suitable for a task of identifying small sample unbalanced objects (classifying images), improves the universality of image classification, can use a plurality of classifier learning algorithms such as SVM, extreme learning machine, decision stumps and the like for weak classifiers in each round of iteration, improves the flexibility of image classification, and effectively improves the performance of the image classifier under the condition of absolute unbalance of class distribution by automatically generating artificial synthesis data with weights and carrying out classification training based on a plurality of source field data in each round of iteration.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (5)

1. A long-tail object identification method based on artificial synthetic data and multi-source transfer learning is characterized by comprising the following steps of,
step one, acquiring N source field training image sets and a target field training image set, preprocessing, respectively extracting the characteristics of each training image in the preprocessed N source field training image sets and the preprocessed target field training image set according to a deep neural network, acquiring the training characteristic vector of each training image,
step two, applying l2-zscore normalization processing to the training feature vector of each training image, determining the class label of each training image according to the object class contained in it, and constructing N source domain training data sets and one target domain training data set according to the normalized training feature vectors and the class labels,
step three, respectively initializing weight sets corresponding to N source field training data sets and one target field training data set, setting a weight adjustment factor, setting iteration times as T, selecting an initial weak classifier learning algorithm, training the N source field training data sets and the one target field training data set according to the weight adjustment factor, the iteration times, the initial weak classifier learning algorithm and the initialized weight set to obtain T weak classifiers and classifier coefficients thereof,
step four, acquiring a target field test image set, preprocessing, extracting features of the preprocessed target field test image set according to a deep neural network, acquiring test feature vectors of each test image in the target field test image set, predicting class labels of the test feature vectors of each test image according to T weak classifiers, and determining class labels of each test image according to prediction results of the T weak classifiers and classifier coefficients, wherein the class labels are object classes contained in the test images.
2. The method for long-tail object recognition based on artificial synthetic data and multi-source transfer learning according to claim 1, wherein the third step comprises,
s1, respectively initializing weight sets of N source field training data sets and one target field training data set, setting weight adjusting factors, initializing n=1, t=1,
s2, for the t-th iteration, normalizing the N source field training data sets and the weight sets of the target field training data sets, respectively manually synthesizing the target field training data sets and the weight sets corresponding to the target field training data sets, obtaining the manually synthesized target field training data sets and the weight sets corresponding to the target field training data sets,
s3, merging the nth source domain training data set with the target domain training data set and the artificially synthesized target domain training data set, denoting the result as the nth merged training set, merging the weight set corresponding to the nth source domain training data set with the weight set corresponding to the target domain training data set and the weight set corresponding to the artificially synthesized target domain training data set, denoting the result as the nth merged weight set, training on the nth merged training set according to the selected initial weak classifier learning algorithm and the nth merged weight set to obtain the trained nth weak classifier, and calculating the nth training error rate of the nth weak classifier on the target domain data set,
s4, letting n = n+1 and returning to execute S3 until the N weak classifiers are obtained, then selecting the classifier with the lowest training error rate from the N weak classifiers as the weak classifier of the tth iteration,
s5, let n=1, obtain weak classifier and its correspondent training error rate that the t th iteration gets, calculate the classifier coefficient of the weak classifier according to training error rate, update the weight set of the training dataset of the goal field according to classifier coefficient, calculate the correction factor according to training error rate, update the weight set of the training dataset of the source field according to correction factor,
and S6, let t=t+1, and return to execute S2 until t=t, so as to obtain T weak classifiers and T classifier coefficients corresponding to the T weak classifiers.
3. The method for long-tail object recognition based on artificial synthesis data and multi-source transfer learning according to claim 2, wherein the artificial synthesis of the target domain training data set and the weight set corresponding to the target domain training data set comprises,
s11, determining the object class contained in each training image in the target domain training data set and taking it as the class label, dividing the target domain training data set into K sample sets according to the class labels, and respectively calculating the number Nsmote(i) of samples to be artificially synthesized for the ith sample set, where 1 ≤ i ≤ K,
s12, for the ith sample set, generating Nsmote (i) synthesized sample sets based on the SMOTE technology, distributing a synthesized weight and a synthesized class label to each synthesized sample in the synthesized sample sets, storing the synthesized sample set, the synthesized weight set and the synthesized class label set of the ith sample set and representing the synthesized sample set as the ith artificial synthesized sample set,
and S13, letting i = i+1 and repeatedly executing S12 until i > K, sequentially obtaining the artificial synthetic sample sets of the K sample sets, respectively merging the synthetic sample sets in the artificial synthetic sample sets with the target domain training data set, and merging the synthetic weight sets in the artificial synthetic sample sets with the weight set corresponding to the target domain training data set.
4. The long-tail object recognition method based on artificial synthesis data and multi-source transfer learning according to claim 1, wherein the selecting an initial weak classifier learning algorithm comprises randomly selecting one classifier learning algorithm from SVM, extreme learning machine and decision stumps as the initial weak classifier learning algorithm.
5. A long-tail object recognition method based on artificial synthetic data and multi-source transfer learning according to claim 3, wherein said S12 comprises,
s121, let m=1, when m is less than or equal to Nsmote (i), performing an mth random sampling based on the ith sample set, and obtaining one anchor sample (x_anchor, y_anchor, w_anchor) each time, wherein x_anchor is a feature vector of the anchor sample, y_anchor is a class label of the anchor sample, and w_anchor is a weight of the anchor sample;
s122, searching the k nearest neighbor samples of the anchor sample x_anchor by using the k-nearest-neighbor algorithm, selecting one neighbor sample from the k neighbor samples, assigning the feature vector of the selected neighbor sample to x_nearest, where x_nearest is a first temporary variable, and assigning the weight of the selected neighbor sample to w_nearest, where w_nearest is a second temporary variable;
s123, calculating a characteristic vector x_syn of the synthesized sample according to the formula (1),
x_syn=x_anchor+(x_nearest-x_anchor).*rand(1,Nfeatures) (1)
where rand(1, NFeatures) represents randomly generating an NFeatures-dimensional feature vector with values in the (0, 1) interval, NFeatures is the feature dimension of the x_anchor feature vector, and .* represents element-wise multiplication of the elements at corresponding positions of the two feature vectors,
s124, calculating the synthesis weight w_syn of the synthesized sample according to the formula (2),
w_syn=w_anchor*r1+w_nearest*(1-r1) (2)
wherein r1 is a random number with a value range between (0, 1),
s125, using the class label y_anchor of the anchor sample as a synthesized class label y_syn of the synthesized sample, storing the feature vector, the synthesis weight and the synthesized class label of the synthesized sample,
s126, letting m = m+1 and returning to S121 until m > Nsmote(i), obtaining the Nsmote(i) synthesized samples and storing them as the synthetic sample set of the ith sample set, and representing the synthetic sample set, the synthetic weight set and the synthetic class label set of the ith sample set as the ith artificial synthetic sample set.
CN202310056416.4A 2023-01-18 2023-01-18 Long-tail object identification method based on artificial synthetic data and multi-source transfer learning Pending CN116152644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310056416.4A CN116152644A (en) 2023-01-18 2023-01-18 Long-tail object identification method based on artificial synthetic data and multi-source transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310056416.4A CN116152644A (en) 2023-01-18 2023-01-18 Long-tail object identification method based on artificial synthetic data and multi-source transfer learning

Publications (1)

Publication Number Publication Date
CN116152644A true CN116152644A (en) 2023-05-23

Family

ID=86353819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310056416.4A Pending CN116152644A (en) 2023-01-18 2023-01-18 Long-tail object identification method based on artificial synthetic data and multi-source transfer learning

Country Status (1)

Country Link
CN (1) CN116152644A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117197591A (en) * 2023-11-06 2023-12-08 青岛创新奇智科技集团股份有限公司 Data classification method based on machine learning
CN117197591B (en) * 2023-11-06 2024-03-12 青岛创新奇智科技集团股份有限公司 Data classification method based on machine learning

Similar Documents

Publication Publication Date Title
CN105701502B (en) Automatic image annotation method based on Monte Carlo data equalization
CN106845401B (en) Pest image identification method based on multi-space convolution neural network
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN112348036A (en) Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
US20220036231A1 (en) Method and device for processing quantum data
Chen et al. SS-HCNN: Semi-supervised hierarchical convolutional neural network for image classification
CN112861982A (en) Long-tail target detection method based on gradient average
CN108446613A (en) A kind of pedestrian's recognition methods again based on distance centerization and projection vector study
CN110991500A (en) Small sample multi-classification method based on nested integrated depth support vector machine
CN111931867B (en) New coronary pneumonia X-ray image classification method and system based on lightweight model
CN116152644A (en) Long-tail object identification method based on artificial synthetic data and multi-source transfer learning
CN114170512A (en) Remote sensing SAR target detection method based on combination of network pruning and parameter quantification
Setyono et al. Betawi traditional food image detection using ResNet and DenseNet
CN114329031A (en) Fine-grained bird image retrieval method based on graph neural network and deep hash
CN110619311A (en) Data classification method based on EEMD-ICA-SVM
CN111860673B (en) Machine learning classification method for screening deep forest based on box-separating confidence
CN114627424A (en) Gait recognition method and system based on visual angle transformation
JP2020181265A (en) Information processing device, system, information processing method, and program
CN111738298A (en) Data classification method based on depth-width-variable multi-core learning
CN111046861A (en) Method for identifying whether sample equipment exists in infrared image, method for constructing power equipment identification model and application
CN115858629B (en) KNN query method based on learning index
CN109902762A (en) The data preprocessing method deviateed based on 1/2 similarity
Lan et al. SAR target recognition via micro convolutional neural network
CN111931416B (en) Hyper-parameter optimization method for graph representation learning combined with interpretability

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination