CN105787501B

CN105787501B - Vegetation classification method with automatic feature selection for power transmission line corridor regions

Info

Publication number: CN105787501B
Application number: CN201510953921.4A
Authority: CN
Inventors: 徐侃; 陈志国; 张校志; 李陶
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2015-12-17
Filing date: 2015-12-17
Publication date: 2019-03-19
Anticipated expiration: 2035-12-17
Also published as: CN105787501A

Abstract

The invention discloses a vegetation classification method for power transmission line corridor regions. The method comprises three steps. Step 1 extracts the features of the training samples and the cross-validation samples; the various features of all training samples form the feature set. Step 2 performs optimized feature selection by cross-validation, based on the training and cross-validation samples, to obtain the preferred features. Step 3 classifies the vegetation in the remote sensing test images using the preferred features. The method needs no extensive iterative computation in the feature selection stage, and the combined features obtained by optimized selection are robust. Using these combined features to classify vegetation in transmission line corridors in remote sensing images can significantly improve both the computational efficiency and the classification accuracy of the algorithm.

Description

Vegetation classification method with automatic feature selection for power transmission line corridor regions

Technical field

The invention belongs to the technical field of intelligent remote sensing image analysis, and in particular relates to a vegetation classification method with automatic feature selection for power transmission line corridor regions.

Background art

With the rapid growth of the national economy, power grid construction has expanded quickly, and the safe and stable operation of transmission lines is an essential condition for people's daily lives. China's transmission lines cover long distances and wide regions, and the terrain and geological conditions along their routes are complex. Vegetation is the main body of terrestrial ecosystems and, as the principal component of green space systems, plays an irreplaceable role in the ecosystem as a whole. For high-voltage transmission lines in the power system, the trend and distribution of vegetation cover strongly affect line design and operation; for example, tower height, tower siting, and route design are all influenced by vegetation growth. In recent years in particular, line tripping accidents caused by wildfires and field burning near transmission line corridors have occurred frequently, causing huge economic losses. Apart from human factors, fires caused by the spontaneous combustion of forest vegetation are a major cause of tripping. As grid construction continues, high-voltage and ultra-high-voltage lines will cover areas with increasingly complex environments, whether developed farmland, coniferous forest, or undeveloped nature reserves, so monitoring vegetation type and growth will become a key issue for transmission line construction and safe operation. China's transmission lines cover a wide area with diverse vegetation along their routes. Classifying vegetation from remote sensing data can effectively identify flammable vegetation types and vegetated areas and thereby locate wildfire-prone zones, which is of great significance for preventing wildfires and line tripping and for ensuring the safe operation of transmission lines.

The ZY-3 ("Ziyuan-3", Resource No. 3) satellite is China's first independently developed civilian high-resolution stereo mapping satellite. Through stereo observation it can produce 1:50,000 topographic maps without field surveys, enabling accurate data acquisition as well as fully digital image processing and compilation. Its main task is to acquire, over the long term and in a continuous, stable, and rapid manner, high-resolution stereo imagery and multispectral imagery covering China, providing services for land and resources surveys and monitoring, disaster prevention and mitigation, agriculture, forestry, water conservancy, the ecological environment, urban planning and construction, transportation, and major national projects.

From the standpoint of transmission line design and operating conditions, suburban and field transmission lines pass through vegetated areas to varying degrees, which directly affects line design and operation. Further investigation of the distribution and characteristics of vegetation in transmission line corridors is therefore particularly important. Classification is an important step in remote sensing image interpretation and a hot topic in remote sensing research. As image resolution has greatly improved, the details of ground objects have become more apparent, and information such as shape, texture, and structure has become more prominent. Given such rich feature information, selecting targeted features is a key step in improving classification performance. The results of Sun Xian et al. [1] show that shape and color features can effectively extract building targets from high-resolution remote sensing images. Reference [2] combined the semantic annotation of remote sensing images with the LDA (Latent Dirichlet Allocation) model [3] and achieved some success. The results of Liu C et al. [4] show that a combination of several features classifies better than any single feature, but that using more feature types does not necessarily improve classification. To select targeted features, Liu C et al. [4] proposed the augmented LDA model (aLDA) to perform optimal feature selection.

In recent years, deep learning features have attracted wide attention in machine learning because they support multi-level feature representation and learning of images, from low level to high level. Hinton's group proposed the deep belief network (DBN) [5]. Structurally, a DBN differs little from a traditional multilayer perceptron and, like it, is trained with a supervised learning algorithm. The only difference is that a DBN first performs unsupervised learning before supervised learning, and then uses the weights obtained by unsupervised learning as the initial values for supervised training. DBN training builds on the contrastive divergence (CD) algorithm in its CD-n form, in which the weights can be updated after only n sampling steps. Once a restricted Boltzmann machine (RBM) has been trained, its weights are fixed; a new layer of hidden units is then stacked on top, the hidden layer of the original RBM becomes the input layer, and a new RBM is thus constructed, whose weights are learned in the same way. Repeating this stacks multiple RBMs to form a DBN. The weights learned by the RBMs serve as the initial weights of the DBN, which is then fine-tuned with the back-propagation algorithm; together these steps form the DBN learning method.
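The CD-n update described above can be illustrated with a minimal NumPy sketch of CD-1 (n = 1) for a binary RBM. The function name `cd1_step`, the learning rate, and the toy data are illustrative assumptions, not taken from [5]; the last line shows how the hidden activations of one trained RBM become the input of the next stacked RBM.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, b_vis, b_hid, lr, rng):
    """One CD-1 update of a binary RBM (a single Gibbs step, n = 1).

    v0: (batch, n_visible) binary data, W: (n_visible, n_hidden).
    Returns the updated parameters and the mean squared reconstruction error.
    """
    # Positive phase: hidden probabilities driven by the data
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: reconstruct the visible units, then recompute the hiddens
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    n = v0.shape[0]
    # Approximate gradient: <v h>_data - <v h>_reconstruction
    W = W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis = b_vis + lr * (v0 - v1_prob).mean(axis=0)
    b_hid = b_hid + lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid, float(((v0 - v1_prob) ** 2).mean())


# Toy data: two complementary binary patterns (illustrative only)
rng = np.random.default_rng(0)
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=float)
W = 0.01 * rng.standard_normal((6, 4))
b_vis, b_hid = np.zeros(6), np.zeros(4)
errors = []
for _ in range(1000):
    W, b_vis, b_hid, err = cd1_step(data, W, b_vis, b_hid, lr=0.5, rng=rng)
    errors.append(err)

# Greedy stacking: the trained hidden activations become the next RBM's input
next_layer_input = sigmoid(data @ W + b_hid)
```

A full DBN would repeat this layer by layer and then fine-tune all weights with back-propagation, which is not shown.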

Following bibliography involved in text:

[1] SUN Xian, WANG Hongqi, ZHANG Zheng. Object-based Boosting method for automatic extraction of building targets in high-resolution remote sensing images [J]. Journal of Electronics & Information Technology, 2009, 31(1): 177-181.

[2] DATCU M. Semantic annotation of satellite images using latent Dirichlet allocation [J]. IEEE Geoscience and Remote Sensing Letters, 2009, 7(1): 28-32.

[3] BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation [J]. Journal of Machine Learning Research, 2003, 3: 993-1022.

[4] LIU C, SHARAN L, ADELSON E H, et al. Exploring features in a Bayesian framework for material recognition [C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington, DC: IEEE Computer Society, 2010: 239-246.

[5] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets [J]. Neural Computation, 2006, 18: 1527-1554.

[6] FAN R E, CHANG K W, HSIEH C J, et al. LIBLINEAR: A library for large linear classification [J]. Journal of Machine Learning Research, 2008, 9: 1871-1874.

Summary of the invention

The object of the present invention is to provide a vegetation classification method for power transmission line corridor regions that improves both the efficiency and the accuracy of the classification algorithm.

The present invention introduces multiple remote sensing image features, including deep learning features, performs optimized selection among them, and combines the selected features for vegetation classification.

In order to achieve the above objectives, the present invention adopts the following technical scheme:

A vegetation classification method for power transmission line corridor regions, comprising:

Step 1, extracting the features of the training samples and the cross-validation samples, which further comprises:

1.1 extracting scene units from the remote sensing training images and dividing the scene units into training samples and cross-validation samples;

1.2 defining the scene category of each scene unit by manual visual interpretation and marking it with a category label, denoted the defined category label, where the scene categories comprise a vegetation class and a non-vegetation class;

1.3 extracting the various features of each scene unit;

Step 2, performing optimized feature selection based on cross-validation, which further comprises:

2.1 quantizing each feature in the feature set into visual words to form the visual vocabulary D_i of that feature, and inputting each visual vocabulary D_i into the LDA model to obtain the latent semantic distribution probability vector θ_i of each feature, where the feature set is the set formed by the various features of the cross-validation samples;

2.2 inputting the θ_i of each feature of the cross-validation samples, together with the defined category labels, into a regularized logistic regression classifier for cross-validation to obtain the classification accuracy of each feature, denoting the feature with the highest classification accuracy the optimal feature, and initializing r with the classification accuracy of the optimal feature;

2.3 concatenating one other feature with the optimal feature to form a combined feature, and extending the visual vocabulary D of the optimal feature with the visual vocabulary of the other feature to obtain a new visual vocabulary D', where the other feature is any feature in the feature set other than the optimal feature;

2.4 inputting the new visual vocabulary D' into the LDA model to obtain the latent semantic distribution probability vector θ';

2.5 inputting θ' and the defined category labels of the cross-validation samples into the regularized logistic regression classifier for cross-validation, and obtaining the classification accuracy r_new of the current combined feature from the predicted category labels output for the cross-validation samples;

2.6 if r_new > r, assigning r_new to r, recording the current combined feature as the optimal feature, and returning to step 2.3; otherwise, stopping, and taking the current optimal feature combination as the preferred feature;

Step 3, classifying the remote sensing test data, which further comprises:

3.1 training a classifier with the latent semantic distribution probability vectors corresponding to the preferred features of the training samples and cross-validation samples, together with their defined category labels;

3.2 extracting scene units from the remote sensing test data and extracting the preferred features of each scene unit;

3.3 classifying the remote sensing test data with the trained classifier, based on the latent semantic distribution probability vectors corresponding to the preferred features of each scene unit, where the scene units classified as vegetation form the vegetation area;

The above latent semantic distribution probability vectors are obtained by inputting the visual vocabularies corresponding to the preferred features into the LDA model.

The above various features include SIFT, DAISY, LBP, BRIEF, and CNN features.

In sub-step 2.1, the features are quantized into visual words with the k-means method or a sparse coding method; that is, each feature is clustered separately, and the cluster centers of each feature form its visual vocabulary.
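A minimal sketch of the k-means quantization in sub-step 2.1, assuming the local descriptors of one feature type are stacked as rows of a NumPy array. The cluster centers serve as the visual vocabulary, each descriptor is mapped to its nearest center, and a scene unit becomes a bag of visual-word ids. Function names, the random initialization, and the fixed iteration count are assumptions; the sparse coding alternative is not shown.

```python
import numpy as np


def quantize(descriptors, centers):
    """Map each descriptor to the id of its nearest center (its visual word)."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers (the visual vocabulary)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = quantize(X, centers)
        for j in range(k):
            if np.any(labels == j):            # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return centers


def bow_histogram(word_ids, vocab_size):
    """Visual-word counts of one scene unit (its 'document')."""
    return np.bincount(word_ids, minlength=vocab_size)


# Toy descriptors drawn around two well-separated points (illustrative only)
rng = np.random.default_rng(0)
descriptors = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])
vocabulary = kmeans(descriptors, k=2)
word_ids = quantize(descriptors, vocabulary)
hist = bow_histogram(word_ids, vocab_size=2)
```

In practice a much larger k would be chosen per feature type, and the resulting word-id sequences are what the LDA model consumes.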

Scene units are extracted in sub-steps 1.1 and 3.2 as follows:

The remote sensing image is divided with a uniform grid; each grid cell represents one scene unit, and adjacent scene units do not overlap. The remote sensing image here is either a remote sensing training image or the remote sensing test data.

The classifier used in Step 3 is a regularized logistic regression classifier or an SVM classifier.

Compared with the prior art, the present invention has the following advantages and beneficial effects:

The method needs no extensive iterative computation in the feature selection stage, and the combined features obtained by optimized selection are robust. Using the combined features for vegetation classification in transmission line corridors in remote sensing images can significantly improve the computational efficiency and classification accuracy of the algorithm.

Description of the drawings

Fig. 1 is a flow chart of the method of the present invention.

Specific embodiment

Specific embodiments of the invention are described in detail below with reference to the drawings. The technical solution of the present invention can be run automatically as computer software.

Step 1, preprocessing of the remote sensing images.

Preprocessing covers both the remote sensing training images and the remote sensing test data.

Preprocessing of the remote sensing training images consists of extracting scene units and defining scene categories by manual visual interpretation. First, each training image is divided into image sub-blocks of essentially equal size, i.e., scene units. The scene category of each scene unit is then defined by manual visual interpretation and marked with a category label, denoted the defined category label. The scene units obtained from the training images are divided into training samples and cross-validation samples. In the present invention, scene categories comprise only vegetation and non-vegetation: forest and grassland belong to vegetation, while cultivated land and bare land belong to non-vegetation. Preprocessing of the remote sensing test data consists of extracting its scene units.

In a specific implementation, the remote sensing images are ZY-3 panchromatic images. They are divided with a uniform grid in which each grid cell represents exactly one scene unit, and adjacent scene units do not overlap. The final goal of the present invention is to assign a scene category label to every scene unit of the remote sensing test data and to display the categories in different colors. In this embodiment, the scene unit size is 100 × 100 pixels.
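A minimal sketch of the uniform-grid division into 100 × 100 scene units, assuming the image is a single-band NumPy array. The function `tile_scene_units` is hypothetical, and dropping incomplete edge tiles is an assumption, since the text does not specify how image borders are handled.

```python
import numpy as np


def tile_scene_units(image, size=100):
    """Cut a 2-D image into non-overlapping size x size scene units on a uniform grid.

    Partial tiles at the right/bottom edges are discarded (an assumption).
    Returns an array of shape (n_units, size, size) and the (row, col)
    grid position of every unit.
    """
    n_rows, n_cols = image.shape[0] // size, image.shape[1] // size
    tiles, positions = [], []
    for r in range(n_rows):
        for c in range(n_cols):
            tiles.append(image[r * size:(r + 1) * size, c * size:(c + 1) * size])
            positions.append((r, c))
    return np.stack(tiles), positions


image = np.arange(250 * 320).reshape(250, 320)   # stand-in for a panchromatic band
units, grid_positions = tile_scene_units(image, size=100)
```

Keeping the grid positions makes it straightforward to paint each unit's predicted category back onto the image later.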

Step 2, extracting the various features of the scene units.

To classify the scenes in a remote sensing image, descriptive features of the scene unit images must first be extracted. There are many kinds of image description features, each emphasizing different aspects and each with its own strengths in classification. In remote sensing images, vegetation density within transmission line corridors is largely non-uniform, edge shapes are varied, and texture and color characteristics are pronounced. Because remote sensing image features are so diverse, five representative features are used here to build the feature set in order to illustrate the effectiveness of the method; the invention is not limited to these features when applied.

(1) SIFT (Scale Invariant Feature Transform) feature extraction.

SIFT features are local features that are invariant to image translation, scaling, and rotation, and that match robustly under affine transformation, viewpoint change, illumination change, and noise. They are obtained by finding local extrema in the image scale space and describing their position, orientation, and scale.

The present invention extracts the SIFT features of the scene units with an existing SIFT toolkit, as follows:

Detect extrema in scale space, localize them accurately, assign orientations to the keypoints, and generate the keypoint descriptors. To generate a descriptor, a 16 × 16 pixel region centered on the keypoint is divided into 4 × 4 sub-blocks of 4 × 4 pixels each, and an 8-direction gradient histogram is computed in each sub-block; together these histograms form the SIFT feature. The resulting SIFT feature vector has 16 × 8 = 128 dimensions. The keypoints are local extrema, carrying orientation information, detected across different scale spaces.

(2) DAISY feature extraction.

DAISY is a local image descriptor designed for dense feature extraction that can be computed quickly. Its basic idea is similar to SIFT: gradient orientation histograms are accumulated block by block. The difference is that DAISY improves the pooling strategy, aggregating the gradient orientation histograms with Gaussian convolution; because Gaussian convolution is fast to compute, DAISY features can be extracted densely and quickly.

DAISY features can be extracted with an existing DAISY toolkit.

(3) LBP (Local Binary Pattern) feature extraction.

LBP is an operator that describes local image texture and has notable advantages such as rotation invariance and gray-scale invariance. LBP features can be extracted with an existing LBP toolkit, as follows:

(a) Divide the detection window into cells of 16 × 16 pixels.

(b) Process each pixel in a cell in turn: compare the gray values of its 8 neighbors with that of the current pixel; if a neighbor's gray value is greater than that of the current pixel, that neighbor's position is marked 1, otherwise 0. Comparing the 8 pixels in the 3 × 3 neighborhood of the current pixel in this way produces an 8-bit binary number, which is the LBP value of the current pixel.

(c) Compute the histogram of each cell, i.e., the frequency with which each LBP value occurs among the pixels of the cell, and normalize the histogram of every cell.

(d) Concatenate the normalized histograms of all cells into one feature vector, which is the LBP feature vector.

(4) BRIEF (Binary Robust Independent Elementary Features) feature extraction.

The BRIEF descriptor encodes the region around a keypoint with a binary code. It is simple to compute and needs less storage than the SIFT descriptor. Because BRIEF descriptors are compared using the Hamming distance, matching is faster.

BRIEF features can be extracted with an existing BRIEF toolkit, as follows:

(a) Select a square region centered on a keypoint; the keypoint is a detected local extremum with orientation information.

(b) Convolve the region with a Gaussian kernel to remove some of the noise.

(c) Randomly generate a point pair <x, y> in the region; if the pixel value at x is greater than that at y, return 1, otherwise return 0.

(d) Repeat step (c) N times, where N is an empirical value, to obtain a 256-dimensional binary-coded descriptor (i.e., N = 256), which is the BRIEF feature.

(5) CNN (convolutional neural network) feature extraction.

A CNN consists of convolutional layers, pooling layers, and fully connected layers; a complete deep CNN is formed by connecting multiple convolutional, pooling, and fully connected layers in series. In a convolutional layer, the input image or input feature map is convolved with several filter banks (also called convolution kernels) to obtain feature maps. The resulting feature maps are then passed through a nonlinear function and sent to a pooling layer; common nonlinear functions include the Sigmoid, Tanh, and ReLU (Rectified Linear Unit) functions.

The pooling layer pools its input feature maps: it divides each feature map into square regions of equal size and computes a statistic for each region, such as the mean or maximum of all pixel responses within a fixed-size window of the feature map. A pooling layer thus down-samples the feature maps, producing smaller ones. The output of a pooling layer can feed another convolutional layer, whose output in turn feeds another pooling layer; given a reasonable number of layers, convolutional and pooling layers alternate in this way. The statistics output by the last pooling layer are fed into the fully connected layers.

The fully connected part consists of several fully connected hidden layers and a Softmax regression decision layer. The network is trained with the back-propagation algorithm. Once training has produced the parameters of every layer, the features of any input image, i.e., its CNN features, can be computed in a feed-forward manner. CNN features have good translation and scale invariance.

Step 3, optimized feature selection based on a semantic model.

Using the training samples and cross-validation samples and based on the semantic model, the preferred features for the remote sensing test data are selected from the features extracted in Step 2; the selected preferred features are then combined and used for classification.

The semantic model derives from data processing models in natural language processing and is used to express complex structures and rich semantics. Image scene classification based on a semantic model generally works by analyzing whether an image contains latent semantics (topics). This embodiment uses the LDA (Latent Dirichlet Allocation) model, which treats the mixing weights of the latent topics as a k-dimensional latent random variable.

From the LDA model, the joint probability of θ, z, and w given α and β is:

$$p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \qquad (1)$$

The marginal probability p(D | α, β) of the corpus is computed by formula (2):

$$p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d \qquad (2)$$

In formulas (1) and (2): D denotes the corpus; d is the document index; M is the total number of documents in the corpus D; N is the document length and N_d the length of the d-th document; w denotes the word sequence (the visual-word statistics), w_n is the n-th word and w_dn the n-th word in the d-th document; θ is the latent semantic (topic) distribution and θ_d the topic distribution of the d-th document; z denotes the latent topics, z_n is the n-th latent topic and z_dn the n-th latent topic in the d-th document; α and β are hyperparameters; p(θ, z, w | α, β) is the joint probability of θ, z, and w given α and β; p(θ | α) is the distribution of the latent topics given α; p(θ_d | α) is the topic distribution of the d-th document given α; p(z_n | θ) is the probability of latent topic z_n; p(z_dn | θ_d) is the probability of latent topic z_dn given the topic distribution of the d-th document; p(w_n | z_n, β) is the probability of generating word w_n given z_n and β; and p(w_dn | z_dn, β) is the probability of generating word w_dn in the d-th document given z_dn and β.

The hyperparameters α and β can be estimated with methods such as variational inference or Markov chain Monte Carlo (MCMC) sampling.

The augmented LDA model (aLDA) proposed in reference [4] uses a greedy algorithm for optimal feature selection. Its core idea is as follows: in the cross-validation stage, each round selects from the feature set the feature that maximizes the classification accuracy (the optimal feature), combines the optimal feature with the other features, and computes the classification accuracy of each combination, until the accuracy no longer increases. The present method instead carries out the optimal feature selection with the standard LDA model.

In the classification stage, reference [4] obtains the category label from the parameters of the aLDA model itself by the maximum a posteriori principle, i.e.:

$$c^{*} = \arg\max_{c} \left[ \lambda_c + L(\alpha_c, \eta) \right] \qquad (3)$$

In formula (3): λ_c = log π_c, where π_c is the component of the parameter π corresponding to class c; c is the true category label, which follows a multinomial distribution with parameter π; c* is the estimated scene category label; L(α_c, η) is the maximized lower bound of the variational inference used in model parameter estimation, where α_c is the hyperparameter α of the LDA model for class c and η is the parameter of the lower-bound function.

When features are combined, suppose m features are available, corresponding to m visual vocabularies, and denote the size of the i-th vocabulary by |D_i| = V_i, where V_i is the number of visual words obtained after quantizing the i-th feature and D_i is the vocabulary formed by these visual words. The visual words of the i-th feature can be written as

$$\mathbf{w}^{(i)} = \left( w^{(i)}_{1}, w^{(i)}_{2}, \ldots, w^{(i)}_{N_i} \right)$$

where N_i is the number of visual words of the i-th feature. Because each feature is clustered independently, the visual words of the different features are combined by concatenation:

$$\mathbf{w} = \left( w^{(1)}_{1}, \ldots, w^{(1)}_{N_1}, w^{(2)}_{1}, \ldots, w^{(2)}_{N_2}, \ldots, w^{(m)}_{1}, \ldots, w^{(m)}_{N_m} \right)$$

where the values of N₁, N₂, …, N_m are set manually.

In reference [4], formula (3) must be evaluated with the LDA model's own parameters and iterated repeatedly until λ_c no longer changes, which takes a long time. Moreover, because classification is performed directly with the model's own parameters, each class of samples must be trained separately in both the cross-validation stage and the final test stage to obtain the parameters of every class, so the computation time of these stages grows linearly with the number of sample classes.

To address these problems, the present invention combines the LDA model with a regularized logistic regression classifier in the cross-validation stage, so that the LDA model needs to be trained only once on the samples of all categories to complete the optimized, automatic feature selection. The latent semantic vectors of the LDA model are fed into the regularized logistic regression classifier in place of formula (3) to obtain the category labels of the cross-validation samples directly. As a result, no extensive iterative computation is needed in the cross-validation stage. Moreover, once the regularized logistic regression classifier is introduced, the semantic model is trained once on all samples, and the model parameters no longer need to be computed iteratively for each sample category, which effectively improves computational efficiency.

In the optimized feature selection stage, the following steps are performed on the scene unit data of the cross-validation samples:

(1) For each feature in the feature set: quantize the feature into visual words, which form the visual vocabulary D_i of that feature (D_i denotes the visual vocabulary of the i-th feature); input each visual vocabulary D_i into the LDA model to obtain the latent semantic distribution probability vector θ_i of each feature. Here the feature set is the set formed by the various features of the cross-validation samples.

Features can be quantized in various ways, for example by the k-means method or by sparse coding. This example clusters each feature in the feature set separately with k-means (k-means clustering); the cluster centers of each feature, taken in sequence, form the corresponding visual vocabulary D_i.

(2) Using the θ_i of each feature of the cross-validation samples and the defined category labels, perform cross-validation separately for each feature with a regularized logistic regression classifier, compute from the cross-validation results the classification accuracy obtained with each feature, denote the feature with the highest classification accuracy the optimal feature, and initialize r with its classification accuracy.

In this sub-step, cross-validation is carried out separately for each feature: the regularized logistic regression classifier is trained with the θ_i of the current feature of the cross-validation samples and the defined category labels; the trained classifier predicts the category labels of the cross-validation samples, denoted the predicted category labels; and comparing the defined and predicted category labels of the cross-validation samples gives the classification accuracy of the current feature.

(3) by a kind of other features and optimal characteristics tandem compound, it is denoted as assemblage characteristic, it is corresponding using a kind of other features Visual vocabulary table expand optimal characteristics visual vocabulary table, obtain new vision vocabulary D'.A kind of described other features refer to spy Any feature in collection in addition to optimal characteristics；

(4) Input the new visual vocabulary D' into the trained LDA model to obtain the latent semantic distribution probability vector θ' of the current combined feature.

(5) Input θ' and the defined class labels of the cross-validation samples into the regularized logistic regression classifier for cross-validation, and obtain the classification accuracy r_new of the current combined feature from the predicted class labels it outputs for the cross-validation samples.

(6) Compare r_new with r. If r_new > r, assign r_new to r, record the current combined feature as the optimal feature and repeat step (3); otherwise, stop the feature optimization selection, and the current combined feature is the optimized feature.
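The loop of sub-steps (2) to (6) amounts to greedy forward feature selection, sketched below. Assumptions: `evaluate` is a hypothetical callback that, for a tuple of feature names, builds the expanded vocabulary, infers θ with LDA and returns the cross-validation accuracy; in each round every remaining feature is tried and the best one is kept (the text only says "one other feature", so this is an interpretation); and the function returns the last accepted combination, i.e. the one whose accuracy is r.

```python
def select_features(features, evaluate):
    """Greedy forward selection of feature kinds, following sub-steps (2)-(6).

    features: list of feature names, e.g. ["SIFT", "LBP", "CNN"].
    evaluate: callable taking a tuple of feature names and returning the
              cross-validation accuracy of their serial combination.
    """
    # Sub-step (2): the single feature with the highest accuracy initializes r.
    best = max(features, key=lambda f: evaluate((f,)))
    selected, r = [best], evaluate((best,))
    remaining = [f for f in features if f != best]
    while remaining:
        # Sub-steps (3)-(5): combine each remaining feature with the current optimum.
        scores = {f: evaluate(tuple(selected) + (f,)) for f in remaining}
        candidate = max(remaining, key=scores.get)
        r_new = scores[candidate]
        if r_new > r:  # sub-step (6): accept the combination and continue
            selected.append(candidate)
            remaining.remove(candidate)
            r = r_new
        else:          # no improvement: stop the selection
            break
    return selected, r


# Toy usage with made-up accuracies.
acc = {("SIFT",): 0.80, ("LBP",): 0.70, ("CNN",): 0.85,
       ("CNN", "SIFT"): 0.90, ("CNN", "LBP"): 0.86, ("CNN", "SIFT", "LBP"): 0.89}
selected, r = select_features(["SIFT", "LBP", "CNN"], lambda combo: acc[combo])
print(selected, r)  # -> ['CNN', 'SIFT'] 0.9
```

Because each round reuses the features already accepted, only on the order of (number of features)² combinations are evaluated, rather than every subset.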

Step 4: classify the remote sensing image test data using the optimized feature.

Concatenate end to end the vocabularies of the features that make up the optimized feature of the training samples and cross-validation samples, input the result into the LDA model as the training-sample vocabulary, and obtain the latent semantic probability distribution vector θ_com. Input θ_com into the classifier together with the defined class labels of the training samples and cross-validation samples in order to train the classifier. Then use the trained classifier to classify the scene units of the remote sensing image test data; the scene units classified as vegetation make up the vegetation area.
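Step 4's end-to-end concatenation and LDA inference can be sketched as follows. This is an illustration under stated assumptions: each scene unit is represented per feature by a list of visual-word indices; concatenating vocabularies is modelled by shifting each feature's word indices past the vocabularies before it; and LDA is fitted with collapsed Gibbs sampling, one common training method, because the patent does not say how its LDA model is trained. The names `concatenate_documents` and `lda_gibbs`, the hyperparameters α and β, and the toy data are hypothetical.

```python
import random


def concatenate_documents(per_feature_docs, vocab_sizes):
    """Join one scene unit's visual-word indices from several features end to end.

    Word j of the k-th feature becomes (sum of earlier vocabulary sizes) + j, so the
    combined vocabulary is the per-feature vocabularies placed one after another.
    """
    combined, offset = [], 0
    for doc, size in zip(per_feature_docs, vocab_sizes):
        combined.extend(offset + w for w in doc)
        offset += size
    return combined


def lda_gibbs(docs, vocab_size, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return theta, one topic mixture per document."""
    rng = random.Random(seed)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]               # topic counts per document
    nkw = [[0] * vocab_size for _ in topics]           # word counts per topic
    nk = [0] * n_topics                                # total tokens per topic
    z = []                                             # current topic of every token
    for d, doc in enumerate(docs):                     # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                            # remove the token from the counts
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Posterior-mean estimate of each document's topic distribution.
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


# Toy usage: two features with vocabulary sizes 3 and 2 (e.g. SIFT and LBP).
vocab_sizes = [3, 2]
units = [([0, 0, 1], [0, 0]), ([2, 2, 1], [1, 1]), ([0, 2], [0, 1])]
docs = [concatenate_documents(u, vocab_sizes) for u in units]
print(docs[1])  # -> [2, 2, 1, 4, 4]
theta_com = lda_gibbs(docs, vocab_size=sum(vocab_sizes), n_topics=2, iters=50)
```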

Reference [6] points out that, with a limited number of samples and a high feature dimension, a regularized logistic regression classifier classifies slightly better than a linear SVM classifier and has a clear advantage in speed. The present invention therefore uses the existing regularized logistic regression classifier of the LIBLINEAR toolkit.

Logistic regression is a method for learning a function f: X → Y, or equivalently P(Y | X), where Y takes discrete values and X = ⟨X_1, X_2, ..., X_m⟩ is any vector whose components may be discrete or continuous. A trained logistic regression classifier is a set of weights w_0, w_1, ..., w_m. When a test sample arrives, these weights compute a value z as a linear weighted sum of the test data; the weights are learned by machine learning so that the corresponding linear surface fits the training samples:

$$z = w_0 + w_1 X_1 + w_2 X_2 + \cdots + w_m X_m \tag{5}$$

where X_1, X_2, ..., X_m are the features of a sample and m is its dimension. The class probability is then obtained from z with the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Thus σ(z), which represents the probability of belonging to a given class, takes values between 0 and 1 while its input ranges over the whole real line, and the curve is continuously differentiable. For a binary classification problem, a threshold can be set to separate the two classes.
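Equation (5), the sigmoid with a threshold decision, and a regularized training procedure can be sketched in pure Python as below. This is a stand-in, not LIBLINEAR: it minimizes the mean log-loss plus an L2 penalty (λ/2)‖w‖² by plain batch gradient descent, whereas LIBLINEAR uses dedicated, much faster solvers. The function names, the learning rate, λ (`lam`) and the epoch count are hypothetical choices, and labels are assumed to be 0/1.

```python
import math


def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decision_value(w0, w, x):
    """Equation (5): z = w0 + w1*x1 + ... + wm*xm."""
    return w0 + sum(wj * xj for wj, xj in zip(w, x))


def train_l2_logistic_regression(X, y, lam=0.1, lr=0.5, epochs=500):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2 (bias w0 not penalized)."""
    n, m = len(X), len(X[0])
    w0, w = 0.0, [0.0] * m
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * m
        for x, t in zip(X, y):
            err = sigmoid(decision_value(w0, w, x)) - t  # d(log-loss)/dz for one sample
            g0 += err
            for j in range(m):
                g[j] += err * x[j]
        w0 -= lr * g0 / n
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, g)]
    return w0, w


def predict(w0, w, x, threshold=0.5):
    """Binary decision: class 1 if sigma(z) >= threshold, else class 0."""
    return 1 if sigmoid(decision_value(w0, w, x)) >= threshold else 0


# Toy usage: made-up theta vectors, 1 = vegetation, 0 = non-vegetation.
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y = [1, 1, 0, 0]
w0, w = train_l2_logistic_regression(X, y)
predictions = [predict(w0, w, x) for x in X]
```

Leaving the bias w_0 out of the penalty is a common convention; it keeps the regularization from shifting the decision threshold.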

The specific embodiment described herein merely illustrates the spirit of the invention. Those skilled in the art to which the invention pertains may make various modifications or additions to the described embodiment, or substitute it in a similar manner, without departing from the spirit of the invention or exceeding the scope of the appended claims.

Claims

1. A vegetation classification method for a power transmission line corridor region, characterized by comprising:

1.1 extracting scene units from remote sensing image training samples, and dividing the scene units into training samples and cross-validation samples;

1.2 defining the scene class of each scene unit by manual visual interpretation and marking its class label, recorded as the defined class label, the scene classes including a vegetation class and a non-vegetation class;

1.3 extracting the various features of each scene unit;

2.1 quantizing each kind of feature in the feature set into visual words to form the visual vocabulary D_i corresponding to each kind of feature, and inputting each visual vocabulary D_i into an LDA model to obtain the latent semantic distribution probability vector θ_i corresponding to each kind of feature, the feature set being the set formed by the various features of the cross-validation samples;

2.2 inputting the θ_i of the various features of the cross-validation samples and the defined class labels into a regularized logistic regression classifier for cross-validation to obtain the classification accuracy of each kind of feature, recording the feature with the highest classification accuracy as the optimal feature, and initializing r with the classification accuracy of the optimal feature;

2.3 serially combining one other feature with the optimal feature, recorded as a combined feature, and expanding the visual vocabulary D corresponding to the optimal feature with that other feature to obtain a new visual vocabulary D', the other feature being any feature in the feature set other than the optimal feature;

2.4 inputting the new visual vocabulary D' into the LDA model to obtain the latent semantic distribution probability vector θ';

2.5 inputting θ' and the defined class labels of the cross-validation samples into the regularized logistic regression classifier for cross-validation, and obtaining the classification accuracy r_new of the current combined feature from the predicted class labels output for the cross-validation samples;

2.6 if r_new > r, assigning r_new to r, recording the current combined feature as the optimal feature, and continuing with step 2.3; otherwise, the current combined feature is the optimized feature;

3.1 concatenating end to end the various features in the optimized feature of the training samples and cross-validation samples, inputting them into the LDA model as the training-sample vocabulary to obtain the latent semantic probability distribution vector θ_com, and inputting θ_com together with the defined class labels of the training samples and cross-validation samples into a classifier to train the classifier;

3.3 classifying the remote sensing image test data with the trained classifier based on the latent semantic distribution probabilities corresponding to the optimized feature of each scene unit, wherein the scene units classified into the vegetation class constitute the vegetation area;

2. The vegetation classification method for a power transmission line corridor region according to claim 1, characterized in that:

the various features include SIFT features, DAISY features, LBP features, BRIEF features and CNN features.

3. The vegetation classification method for a power transmission line corridor region according to claim 1, characterized in that:

in sub-step 2.1, the features are quantized into visual words by the k-means method or the sparse coding method, i.e. each kind of feature is clustered separately, and the cluster centres of each kind of feature are its corresponding visual words.

4. The vegetation classification method for a power transmission line corridor region according to claim 1, characterized in that:

scene units are extracted in sub-step 1.1 and sub-step 3.2 as follows:

the remote sensing image is divided with a uniform grid, each grid cell representing one scene unit, with no overlap between adjacent scene units;

in sub-step 1.1, the remote sensing image is the remote sensing image training sample;

in sub-step 3.2, the remote sensing image is the remote sensing image test data.

5. The vegetation classification method for a power transmission line corridor region according to claim 1, characterized in that: