CN110378872A

CN110378872A - A kind of multi-source adaptive equalization transfer learning method towards crack image detection

Info

Publication number: CN110378872A
Application number: CN201910496225.3A
Authority: CN
Inventors: 毛莺池; 唐江红; 王静; 刘凡; 平萍; 黄倩
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2019-06-10
Filing date: 2019-06-10
Publication date: 2019-10-25

Abstract

The multi-source adaptive equalization transfer learning method towards crack image detection that the invention discloses a kind of, comprising the following steps: 1) correction coefficient is added on the basis of TrAdaBoost algorithm, solves the problems, such as that auxiliary data weight convergence is too fast；2) adaptive covering parameter is introduced in correction coefficient, reflects between auxiliary data collection and target data set whether there is similarity relationships；3) keep finally obtained target data set consistent with each field crack data set different degree with final balance weight method, improve Dam Crack image detection accuracy and efficiency.The present invention can be improved Crack Detection accuracy rate, realize the promotion of Dam Crack picture detection performance on Small Sample Database collection.

Description

A kind of multi-source adaptive equalization transfer learning method towards crack image detection

Technical field

The invention belongs to Distributed Database cluster field, in particular to a kind of multi-source towards crack image detection is adaptive Equilibrium transfer learning method.

Background technique

China is the country that reservoir dam is most in the world, and by the end of 2016, China was completed all kinds of reservoir dams 9.8 More than ten thousand seats.It over time with the growth in dam age, is influenced by natural environment and human factor, dam surface and internal hair There is a series of dam visual defects, the probabilities that are in danger such as deformation, crack, leakage, calcified material precipitation and increases, threatens people in raw deformation People's security of the lives and property, crack are one of main harms of dam.

Traditional machine learning method needs a large amount of training sample, and it is same for needing to meet training data and test data The hypothesis of distribution.However, in practical applications, test data and training data not necessarily fully meet same distributional assumption.Migration Two basic assumptions in conventional machines study are relaxed in study, purpose primarily directed to small, limited sample size Specific area data set is easy to produce over-fitting using machine learning and leads to not the problem of training is with study, by utilizing tool There are trained preferable excellent model and sample in the field of certain similitude to construct the model for meeting mission requirements, thus real The effect of good model is constructed under existing small data set.

TrAdaBoost algorithm is a kind of transfer learning method of Case-based Reasoning selection, for solving training set and test set Different problems are distributed, good result can be obtained when auxiliary data and source data have many similitudes. TrAdaBoost algorithm constructs a part of usable supplemental training collection, combining target training set than using target source merely The more accurate model of training set training.

Summary of the invention

Goal of the invention: in order to overcome Dam Crack image existing in the prior art less, training sample is unevenly distributed weighing apparatus, And TrAdaBoost algorithm easily weakens the problem of auxiliary data collection effect in the training process, the present invention provides a kind of towards splitting The multi-source adaptive equalization transfer learning method for stitching image detection is improved to train the strong classifier of Dam Crack image Crack Detection accuracy rate realizes the promotion of Dam Crack picture detection performance on Small Sample Database collection.

Technical solution: to achieve the above object, the present invention provides a kind of multi-source towards crack image detection and adaptively puts down Weigh transfer learning method, includes the following steps:

(1) multi-source assistant images data set is inputted；

(2) K-means cluster is carried out to image, then rejected and the big picture of target data difference；

(3) discarding and the biggish auxiliary data of target difference from the image library of crack, and classifier is trained；

(4) setting right value update formula isAdaptively covering parameter is To modify corresponding weight；Wherein, increase compensation coefficient in right value update strategy, introduced in compensation coefficient adaptive Parameter is covered, it is auxiliary that target data set final weight is finally reset into each field in last time iteration with final balance weight method Help the average value of training set weight；

(5) weight vectors are updated, return step (3) obtains SVM strong classifier；Finally reset D_TWeight: by D_TWeights resetting Each D after for iteration_sAverage weight；Use D_sWith D after resetting_TOne final classification device of training jointly.

Further, K-means cluster is carried out to image in the step (2), then rejected big with target data difference Picture specific step is as follows:

(2.1) first by image X in the image library of crack_i(i=1,2 ..., n) carries out gray processing, and successively storage is to one-dimensional Matrix D_XIn；

(2.2) then with 10 length in pixels, 3 pixel moving step lengths successively carry out piecemeal storage, record the first place of every fritter It sets, obtains n block of pixels data set, arbitrarily select the gray average of 30 image fritters as initial cluster center；

(2.3) according to the gray average of each image array fritter, using Euclidean distance, shown in following formula, meter These objects are calculated at a distance from 30 image pattern cluster centres；And it is again equal to respective image fritter gray scale according to minimum range Value is divided, and each image array fritter is assigned to most similar class；

Wherein, dis (x_i,y_j) it is two data object x_iAnd y_jThe distance between；As dis (x_i,y_j) value is bigger, illustrate x_iWith y_jIt is more similar；As dis (x_i,y_j) value is smaller, illustrate x_iAnd y_jGap is bigger；

(2.4) mass center of each image fritter pixel grey scale mean value changed is recalculated；

(2.5) it repeats the above steps (2.3), step (2.4) is until the cluster center of each data class is no longer changed Until；

After the image array of input is stored in the form of block of pixels, with K-means clustering algorithm to picture element matrix block It is clustered, the Euclidean distance at the center of block of pixels in cluster set to cluster centre is ranked up, the remote picture of clustering distance It is deleted.

Further, it is trained that specific step is as follows in the step (3) to classifier:

(3.1) target domain tape label training set D_T={ (x_t,y_t), D_SFor the set of N number of auxiliary data collection, i.e. D_s= {D₁, D₂..., D_N}={ (x₁,y₁),...(x_k,y_k),...(x_N,y_N)}；Initialize weight vector w=(w_s,w_T), and to merging The processing of training set samples normalization；By multi-source auxiliary data collection D_i(i=1,2 ..., N) and target domain tape label data set D_TPoint Not Zu He < D_i, D_T>, obtain combined data set D_i,T；

(3.2) start to train network, in combination D_i,TUpper unified progress image preprocessing, image segmentation, feature extraction and handle All features save, and training SVM classifier obtains i Weak Classifier F_m；

(3.3) Weak Classifier F is calculated separately_mIn D_s, D_TUpper error:

Wherein, n_STo assist data set sample size, n_TFor target data set number of samples；y_iReally to classify, F_m(x_i) For classifier F_mExport classification.

Further, target data set final weight is reset into each field in last time iteration in the step (4) Specific step is as follows for the average value of supplemental training collection weight:

(4.1) increase the weight that correction coefficient updates auxiliary data collection:

Increase the weight that correction coefficient updates auxiliary data collection sample on the basis of TrAdaBoost；When the number of iterations M not Disconnected to increase, every field supplemental training collection can be returned correctly, after M iteration, each field of auxiliary sample weights The sum of are as follows:

Wherein, a is auxiliary training set, n_aTo assist number of samples in training set a,For each training sample in source domain a Weight；

The correct sample weights of forecast sample are constant in target data set b, then the weights sum of correct sampleAre as follows:

Wherein, n_bFor number of samples in target data set b,For training sample weight each in b,It is Weak Classifier in b On error rate；

Prediction error sample needs to update in target data set bModify corresponding weight:

The weights sum of error sample in target data set bAre as follows:

The sum of all aiming field sample weights, even if correct sample and error sample weights sum:

When the number of iterations is sufficiently large, each field supplemental training collection can be returned correctly, after iteration,It can obtain:

If auxiliary data, which integrates sample, increases correction coefficient as C^m, weight becomes:

Due to auxiliary data collection sample weights at this time stablize it is constant, i.e.,Correction coefficient can be obtained are as follows:

Find out from correction coefficient formula, correction coefficient C^mWith error rate of the Weak Classifier on target data set bAt anti- Than relationship, i.e. error rateIt is bigger, correction coefficient C^mSmaller, auxiliary data collection sample weights increase, to next iteration training The influence of Weak Classifier increases；Error rateIt is smaller, correction coefficient C^mBigger, auxiliary data collection sample weights reduce, to next The influence of secondary repetitive exercise Weak Classifier reduces；Therefore, correction coefficient C is added on the basis of TrAdaBoost algorithm^mIt can be same When keep target data set and auxiliary data collection sample weights to be restrained；

(4.2) adaptive covering parameter is introduced:

Introduce adaptive covering parameter in correction coefficient, it is adaptive cover parameter be base classifier in auxiliary data collection and The sum of classification accuracy rate on target data set, it may be assumed that

Field of auxiliary data sample weight after the m+1 times iteration:

(4.3) final balance weight method:

Target data set final weight is reset to the average value of each field supplemental training collection weight in last time iteration, Spend finally obtained target data set and each field supplemental training collection unanimously.

The utility model has the advantages that compared with the prior art, the present invention has the following advantages:

(1) it is clustered before TrAdaBoost algorithm using K-means, by the remote picture of clustering distance from crack image library Middle deletion is conducive to the training of subsequent classifier, improves training effectiveness.

(2) TrAdaBoost is compared, introducing correction coefficient can solve the increase due to the number of iterations, and source domain is caused to be weighed Decline too fast, the excessive problem of gap between target source domain weight again.

(3) adaptive covering parameter is introduced in correction coefficient can reflect source domain training dataset and target domain instruction Whether practice between data set has similarity relationships, improvement method detection performance.

(4) final balance weight method can make finally obtained target set of source data and each field crack data set different degree Unanimously, realize that Dam Crack picture classifier performance on Small Sample Database collection is promoted.

Detailed description of the invention

Fig. 1 is flow chart of the invention.

Specific embodiment

Combined with specific embodiments below, the present invention is furture elucidated, it should be understood that these embodiments are merely to illustrate the present invention Rather than limit the scope of the invention, after the present invention has been read, those skilled in the art are to various equivalences of the invention The modification of form falls within the application range as defined in the appended claims.

A kind of transfer learning side multi-source adaptive equalization TrAdaBoost towards crack image detection of the present invention Method, as shown in algorithm 1, including two aspects: K-means image clustering and multi-source adaptive equalization TrAdaBoost migration are learned It practises；

Algorithm 1: it is based on K-means multi-source adaptive equalization TrAdaBoost transfer learning method

1) K-means image clustering:

K-means clustering method, wherein K represents cluster mass center number, and means indicates the mean value of data in cluster.Its core Thought is: randomly selecting the initial center of k cluster, each center representative one cluster；And to remaining data object, according to At a distance from each center cluster, it is respectively allocated to therewith in nearest clustering cluster.

By K-means image clustering method, cluster row is carried out as method for measuring similarity using Euclidean distance Sequence.The remote picture of clustering distance is deleted from the image library of crack, is conducive to the training of subsequent classifier, improves training effectiveness. Specific step is as follows for K-means image clustering method:

Step 1: first by image X in the image library of crack_i(i=1,2 ..., n) carries out gray processing, and successively storage is to one-dimensional Matrix D_XIn；

Step 2: then with 10 length in pixels, 3 pixel moving step lengths successively carry out piecemeal storage, record the first place of every fritter It sets, obtains n block of pixels data set, arbitrarily select the gray average of 30 image fritters as initial cluster center；

Step 3: according to the gray average of each image array fritter, using Euclidean distance, as shown in formula 1, meter These objects are calculated at a distance from 30 image pattern cluster centres；And it is again equal to respective image fritter gray scale according to minimum range Value is divided, and each image array fritter is assigned to most similar class；

Wherein, dis (x_i,y_j) it is two data object x_iAnd y_jThe distance between.As dis (x_i,y_j) value is bigger, illustrate x_iWith y_jIt is more similar；As dis (x_i,y_j) value is smaller, illustrate x_iAnd y_jGap is bigger.

Step 4: recalculating the mass center of each image fritter pixel grey scale mean value changed；

Step 5: repeating the above steps 3,4 until the cluster center of each data class is no longer changed.

After the image array of input is stored in the form of block of pixels, with K-means clustering algorithm to picture element matrix block It is clustered, the Euclidean distance at the center of block of pixels in cluster set to cluster centre is ranked up, the remote picture of clustering distance It is deleted.Discarding and the biggish auxiliary data of target difference, are conducive to the training of subsequent classifier, mention from the image library of crack High training effectiveness.

2) multi-source adaptive equalization TrAdaBoost transfer learning

Multi-source adaptive equalization TrAdaBoost (Multi-source Adaptive Balance TrAdaBoost, MABtrA) correction coefficient is added in transfer learning method on the basis of TrAdaBoost algorithm, solves auxiliary data weight convergence mistake Fast problem；Adaptive covering parameter is introduced in correction coefficient, reflects between auxiliary data collection and target data set whether have There are similarity relationships；After iteration, using final balance weight method, make finally obtained target data set and each field crack Data set different degree is consistent, improves Dam Crack image detection accuracy and efficiency.Increase correction coefficient and updates auxiliary data collection sample Adaptively covering parameter and final balance weight method are specific as follows for this weight, introducing:

(1) increase the weight that correction coefficient updates auxiliary data collection

Due to having differences between each field auxiliary data collection and target data set, the Weak Classifier for causing training to obtain exists Error rate is higher on target data set, and therefore, the weight of each field supplemental training collection constantly reduces as the number of iterations increases, The weight that final training obtains becomes very small so that uncorrelated to auxiliary data collection, can not play auxiliary mark data set The effect of habit.However, the weight of target data set is continuously increased as the number of iterations increases, easily there is difficulty and divide sample situation.

In order to preferably using each field auxiliary data collection and target data set training, increase on the basis of TrAdaBoost The weight of correction coefficient update auxiliary data collection sample.When the number of iterations m constantly increases, every field supplemental training collection can be by It is correct to return, after m iteration, the sum of each field of auxiliary sample weights are as follows:

Wherein, a is auxiliary training set, n_aTo assist number of samples in training set a,For each training sample in source domain a Weight.

Wherein, n_bFor number of samples in target data set b,For training sample weight each in b,It is Weak Classifier in b On error rate.

The weights sum of error sample in target data set bAre as follows:

Therefore, when the auxiliary data collection sample weights distribution of M+1 iteration are as follows:

When the number of iterations is sufficiently large, each field supplemental training collection can be returned correctly, after iteration,Connection formula 7 can obtain:

Due to auxiliary data collection sample weights at this time stablize it is constant, i.e.,It can according to relational expression 8 and 9 Obtain correction coefficient are as follows:

From formula 10 as can be seen that correction coefficient C^mWith error rate of the Weak Classifier on target data set bIt is inversely proportional Relationship, i.e. error rateIt is bigger, correction coefficient C^mSmaller, auxiliary data collection sample weights increase, weak to next iteration training The influence of classifier increases；Error rateIt is smaller, correction coefficient C^mBigger, auxiliary data collection sample weights reduce, to next time The influence of repetitive exercise Weak Classifier reduces.Therefore, correction coefficient C is added on the basis of TrAdaBoost algorithm^mIt can be simultaneously Target data set and auxiliary data collection sample weights is kept to be restrained.

(2) adaptive covering parameter is introduced

However, even if ε_bWhen lower, Weak Classifier can also have differences the classifying quality of source domain training set, this difference Similarities and differences sample can reflect out the correlation between source domain training set and target domain training set.In order to reflect that this similitude is closed System introduces adaptive covering parameter in correction coefficient, and the adaptive parameter that covers is base classifier in auxiliary data collection and target The sum of classification accuracy rate on data set, it may be assumed that

Field of auxiliary data sample weight after the m+1 times iteration:

(3) final balance weight method

The basic conception of final balance weight method is: in an iterative process, auxiliary data weight constantly declines, target data Weight is continuously increased, and after iteration, gap is larger between auxiliary data weight and target data weight, but in final classification device On generation form, it should to target data set and every field supplemental training collection fair play.By target data set final weight The average value for resetting to each field supplemental training collection weight in last time iteration, makes finally obtained target data set and each neck Supplemental training collection in domain will be spent unanimously, improve the Detection accuracy of algorithm.

The evaluation criterion of the specific embodiment of the invention is as follows:

Evaluation criterion recall rate (Recall), the precision (Precision), accuracy rate of the specific embodiment of the invention (Accuracy) and comprehensive evaluation index (F-Measure).Recall rate indicates that positive class predicts positive exact figures and reality in prediction result The ratio between the positive class in border, that is, the target being identified ratio shared in such target；Accuracy representing predict positive class successfully count and Predict that the ratio between successfully, i.e., all identification returns the result ratio shared by middle real goal；Accuracy rate indicates to predict correct sample The ratio of the total sample of this Zhan；The comprehensive assessment of comprehensive evaluation index expression recall rate and accuracy.

F-Measure=(2 × TP)/(2 × TP+FP+FN) (16)

Above-mentioned four kinds of evaluation criterias, the bigger expression algorithm prediction effect of value are more preferable.

Wherein the meaning of TP, FN, FP, TN are as shown in 1 two classification confusion matrix of table.

1 two classification confusion matrix of table

Fig. 1 be the embodiment of the present invention model training flow chart, the course of work as described below:

1. inputting multi-source assistant images data set.

2. image after K-means is clustered, is deleted and the biggish picture of target source difference.The image clustering side K-means Specific step is as follows for method:

3. target domain tape label training set D_T={ (x_t,y_t), D_SFor the set of N number of auxiliary data collection, i.e. D_s={ D₁, D₂..., D_N}={ (x₁,y₁),...(x_k,y_k),...(x_N,y_N)}.Initialize weight vector w=(w_s,w_T), and instructed to merging Practice collection samples normalization processing.By multi-source auxiliary data collection D_i(i=1,2 ..., N) and target domain tape label data set D_TRespectively Combine < D_i, D_T>, obtain combined data set D_i,T。

4. starting to train network, in combination D_i,TUpper unified progress image preprocessing, image segmentation, feature extraction is simultaneously institute There is feature to save, training SVM classifier, available i Weak Classifier F_m。

5. calculating separately Weak Classifier F_mIn D_s, D_TUpper error:

6. setting right value update formula isAdaptively covering parameter is To modify corresponding weight.Wherein, compensation coefficient is increased in right value update strategy, is introduced in compensation coefficient adaptive Parameter should be covered, target data set final weight is finally reset into each field in last time iteration with final balance weight method The average value of supplemental training collection weight.Concrete principle is as follows:

The weights sum of error sample in target data set bAre as follows:

(2) adaptive covering parameter is introduced

Field of auxiliary data sample weight after the m+1 times iteration:

(3) final balance weight method

Target data set final weight is reset to the average value of each field supplemental training collection weight in last time iteration, It spend finally obtained target data set and each field supplemental training collection unanimously, improve the Detection accuracy of algorithm.

7. updating weight vectors:

7. repeating step 4. 5. 6. until reach the number of iterations M of setting, SVM strong classifier is obtained.Finally reset D_TPower Weight: by D_TWeights resetting is each D after iteration_sAverage weight；Use D_sWith D after resetting_TFinal point of common training one Class device.

F-Measure=(2 × TP)/(2 × TP+FP+FN) (16)

1 two classification confusion matrix of table

It is less for Dam Crack image according to above embodiments it is found that in practical applications, training sample distribution Unbalanced and TrAdaBoost algorithm easily weakens the problem of auxiliary data collection effect, method of the invention in the training process The strong classifier of Dam Crack image can be trained, Crack Detection accuracy rate is improved, realizes Dam Crack picture in small sample The promotion of detection performance on data set.

Claims

1. a kind of multi-source adaptive equalization transfer learning method towards crack image detection, which is characterized in that including walking as follows It is rapid:

(1) multi-source assistant images data set is inputted；

(4) setting right value update formula isAdaptively covering parameter is To modify corresponding weight；Wherein, increase compensation coefficient in right value update strategy, adaptive covering ginseng is introduced in compensation coefficient Number, finally resets to each field supplemental training in last time iteration for target data set final weight with final balance weight method Collect the average value of weight；

(5) weight vectors are updated, return step (3) obtains SVM strong classifier；Finally reset D_TWeight: by D_TWeights resetting is repeatedly Each D after generation_sAverage weight；Use D_sWith D after resetting_TOne final classification device of training jointly.

2. a kind of multi-source adaptive equalization transfer learning method towards crack image detection according to claim 1, It is characterized in that, carrying out K-means cluster to image in the step (2), then reject and the big picture of target data difference Specific step is as follows:

(2.1) first by image X in the image library of crack_i(i=1,2 ..., n) carry out gray processing, successively storage arrive one-dimensional matrix D_X In；

(2.2) then with 10 length in pixels, 3 pixel moving step lengths successively carry out piecemeal storage, and the first place for recording every fritter is set, and is obtained To n block of pixels data set, arbitrarily select the gray average of 30 image fritters as initial cluster center；

(2.3) according to the gray average of each image array fritter, using Euclidean distance, shown in following formula, this is calculated A little objects are at a distance from 30 image pattern cluster centres；And according to minimum range again to respective image fritter gray average into Row divides, and each image array fritter is assigned to most similar class；

Wherein, dis (x_i,y_j) it is two data object x_iAnd y_jThe distance between；As dis (x_i,y_j) value is bigger, illustrate x_iAnd y_jMore It is similar；As dis (x_i,y_j) value is smaller, illustrate x_iAnd y_jGap is bigger；

(2.5) it repeats the above steps (2.3), step (2.4) is until the cluster center of each data class is no longer changed；

After the image array of input is stored in the form of block of pixels, picture element matrix block is carried out with K-means clustering algorithm Cluster, the Euclidean distance at the center of block of pixels in cluster set to cluster centre is ranked up, and the remote picture of clustering distance carries out It deletes.

3. a kind of multi-source adaptive equalization transfer learning method towards crack image detection according to claim 1, It is characterized in that, being trained in the step (3) to classifier, specific step is as follows:

(3.1) target domain tape label training set D_T={ (x_t,y_t), D_SFor the set of N number of auxiliary data collection, i.e. D_s={ D₁, D₂..., D_N}={ (x₁,y₁),...(x_k,y_k),...(x_N,y_N)}；Initialize weight vector w=(w_s,w_T), and instructed to merging Practice collection samples normalization processing；By multi-source auxiliary data collection D_i(i=1,2 ..., N) and target domain tape label data set D_TPoint Not Zu He < D_i, D_T>, obtain combined data set D_i,T；

(3.2) start to train network, in combination D_i,TUpper unified progress image preprocessing, image segmentation, feature extraction is simultaneously all Feature saves, and training SVM classifier obtains i Weak Classifier F_m；

(3.3) Weak Classifier F is calculated separately_mIn D_s, D_TUpper error:

Wherein, n_STo assist data set sample size, n_TFor target data set number of samples；y_iReally to classify, F_m(x_i) it is point Class device F_mExport classification.

4. a kind of multi-source adaptive equalization transfer learning method towards crack image detection according to claim 1, It is characterized in that, target data set final weight is reset to each field auxiliary instruction in last time iteration in the step (4) Specific step is as follows for the average value of white silk collection weight:

Increase the weight that correction coefficient updates auxiliary data collection sample on the basis of TrAdaBoost；When the number of iterations M constantly increases Greatly, every field supplemental training collection can be returned correctly, after M iteration, the sum of each field of auxiliary sample weights Are as follows:

Wherein, a is auxiliary training set, n_aTo assist number of samples in training set a,For training sample weight each in source domain a；

Wherein, n_bFor number of samples in target data set b,For training sample weight each in b,It is Weak Classifier on b Error rate；

The weights sum of error sample in target data set bAre as follows:

When the number of iterations is sufficiently large, each field supplemental training collection can be returned correctly, after iteration,It can :

Find out from correction coefficient formula, correction coefficient C^mWith error rate of the Weak Classifier on target data set bBe inversely proportional pass System, i.e. error rateIt is bigger, correction coefficient C^mSmaller, auxiliary data collection sample weights increase, to weak point of next iteration training The influence of class device increases；Error rateIt is smaller, correction coefficient C^mBigger, auxiliary data collection sample weights reduce, to changing next time The influence of generation training Weak Classifier reduces；Therefore, correction coefficient C is added on the basis of TrAdaBoost algorithm^mIt can protect simultaneously It holds target data set and auxiliary data collection sample weights is restrained；

(4.2) adaptive covering parameter is introduced:

Adaptive covering parameter is introduced in correction coefficient, the adaptive parameter that covers is base classifier in auxiliary data collection and target The sum of classification accuracy rate on data set, it may be assumed that

Field of auxiliary data sample weight after the m+1 times iteration:

(4.3) final balance weight method:

The average value that target data set final weight is reset to each field supplemental training collection weight in last time iteration, makes most The target data set and each field supplemental training collection obtained eventually will be spent unanimously.