CN107909039B

CN107909039B - High-resolution remote sensing image earth surface coverage classification method based on parallel algorithm

Info

Publication number: CN107909039B
Application number: CN201711138873.9A
Authority: CN
Inventors: 钟燕飞; 赵济; 吕鹏远; 王晶; 马爱龙; 刘艳飞; 伍丝琪; 张良培
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2017-11-16
Filing date: 2017-11-16
Publication date: 2020-03-10
Anticipated expiration: 2037-11-16
Also published as: CN107909039A

Abstract

The invention discloses a high-resolution remote sensing image earth surface coverage classification method based on a parallel algorithm, which comprises the following steps: S1, segmenting the high-resolution remote sensing image data according to the memory of the computer to obtain segmented high-resolution remote sensing image blocks; S2, distributing all high-resolution remote sensing image blocks to m processors based on an OpenMP parallel framework, and executing earth surface coverage classification processing concurrently; and S3, merging all the high-resolution remote sensing image block data according to the data segmentation principle to obtain a final earth surface coverage classification result. The method automatically segments the data according to the data size and the memory usage of the computer, organizes the classification algorithm workflow through a configuration file, and implements the classification algorithm in parallel, so that it is suitable for high-resolution earth surface coverage mapping tasks with extremely large data volumes and finely divided ground objects.

Description

High-resolution remote sensing image earth surface coverage classification method based on parallel algorithm

Technical Field

The invention belongs to the technical field of remote sensing image processing, and particularly relates to a high-resolution remote sensing image earth surface coverage classification method based on a parallel algorithm.

Background

Land resources are the foundation of human survival and of national social and economic development. With ongoing urbanization, determining earth surface coverage/utilization, which reflects the natural and social attributes of ground-object types on the land surface, is important for earth system models, global environmental change, national land protection, urban development decisions, hydraulic engineering construction and the like. Over the past decades, the international community and various organizations have therefore paid close attention to research on earth surface coverage/utilization and its changes, and research programs have emerged one after another, such as the land use/land cover change core project of the International Geosphere-Biosphere Programme (IGBP) and the International Human Dimensions Programme on Global Environmental Change (IHDP), the land cover/land use change core project of NASA in the United States, and the National 863 Program key project on global earth surface coverage remote sensing mapping and key technology research. Corresponding earth surface coverage products have been released as research results; according to the relevant literature, about 21 sets of earth surface coverage products are currently available at the global scale and at least 43 sets at the regional scale. Meanwhile, to meet the needs of construction and development, China has carried out the Second National Land Survey and the First National Geographic Conditions Census on a large scale, in which acquiring earth surface coverage/utilization is an important task of land survey and geographic national conditions monitoring. In conclusion, large-scale global or regional earth surface coverage mapping has important research significance and application value.

At present, the global earth surface coverage product with the highest resolution is the 30 m spatial resolution product made in China from Landsat TM data. Compared with the Landsat TM data used by the 30 m product, high-resolution remote sensing images used for meter-level earth surface coverage mapping have distinct characteristics: to cover the same area, they require larger image sizes and more pixels, so the data volume is far larger and ground objects are divided more finely, which makes global earth surface coverage mapping based on high-resolution remote sensing images very difficult. The invention aims to provide a fully automatic, large-scale earth surface coverage classification method based on parallel computing for high-resolution remote sensing images.

Disclosure of Invention

To address the large data volume and heavy computation in high-resolution remote sensing image earth surface coverage mapping, the invention adopts both data parallelism and algorithm parallelism, extracts the spectral features of the remote sensing image, fuses the existing maximum likelihood and support vector machine algorithms, and then post-processes the intermediate result with a connected region labeling algorithm, so that a high-resolution earth surface coverage classification result can be obtained quickly.

In order to achieve the purpose, the technical scheme of the invention is a high-resolution remote sensing image earth surface coverage classification method based on a parallel algorithm, which comprises the following steps:

step one, segmenting the high-resolution remote sensing image data according to the memory of the computer to obtain segmented high-resolution remote sensing image blocks, and simultaneously recording the start and end position coordinates of each high-resolution remote sensing image block, as follows:

letting h be the length of the image and w be the width of the image, the width W of each segmented high-resolution remote sensing image block is given by

[formula for W not reproduced in this text]

and the length is H = W·r, where the ratio r = h/w, Mey is the memory size, S_t is the data type, and S_s is the safety factor;

step two, distributing all the high-resolution remote sensing image blocks to m processors based on an OpenMP parallel framework, and executing earth surface coverage classification processing concurrently;

step three, merging all the high-resolution remote sensing image block data according to the start and end position coordinates of each high-resolution remote sensing image block to obtain the final earth surface coverage classification result.

Further, the surface coverage classification processing in the second step includes the following steps:

step 1, extracting the characteristics of a high-resolution remote sensing image block, which comprises the following substeps;

step 1.1, extracting spectral characteristics of a high-resolution remote sensing image block;

step 1.2, calculating a normalized vegetation index as the feature of vegetation extraction;

step 1.3, calculating a normalized water body index as the characteristic of the extracted water body;

step 1.4, calculating texture features based on the gray level co-occurrence matrix as local spatial features;

step 1.5, fusing the extracted spectral features, normalized vegetation index, normalized water body index and texture features by vector stacking, and using the fused vector as the feature input for subsequent classification;

step 2, constructing a classification algorithm fusing maximum likelihood and a support vector machine, classifying the earth surface coverage categories in the high-resolution remote sensing image block, and obtaining a classified image, wherein the classification algorithm comprises the following substeps:

step 2.1, obtaining the membership probability of each earth surface coverage category based on a maximum likelihood classification algorithm;

step 2.2, based on the obtained membership probability of the ground-object category, judging the accuracy of the ground-object discrimination against a threshold, which further comprises the following substeps:

step 2.2.1, determining the optimal mark type of the current pixel according to the maximum posterior probability;

step 2.2.2, judging the accuracy of the pixel category, if the category accuracy is greater than a given threshold value, directly using a classification result formed by labels corresponding to the maximum probability in the maximum likelihood classification algorithm in the step 2.1, otherwise, jumping to the step 2.3;

step 2.3, distinguishing pixels which are difficult to accurately distinguish in the maximum likelihood classification in a nonlinear space by using a nonlinear support vector machine classification algorithm;

step 3, generating segmentation objects on the classified image based on a connected region labeling algorithm, and post-processing the classification result of step 2 through a region merging strategy to obtain the earth surface coverage classification result of the high-resolution remote sensing image block, as follows:

step 3.1, obtaining segmentation objects O by applying the classic eight-neighborhood connected region labeling algorithm to the classification result of step 2;

step 3.2, denoting the obtained segmentation objects as O_i according to the spatial resolution of the high-resolution remote sensing image and the ground-object characteristics, where i is the category number; judging by a threshold T₁ whether O_i is a noise object: when O_i is greater than the given threshold T₁ it is retained directly, otherwise it is merged into the adjacent segmentation object.

Further, the processing of classifying the surface coverage in the second step further includes a step 4 of performing quality evaluation on the classification result of the surface coverage according to a Kappa coefficient, and the specific implementation manner is as follows:

step 4.1, calculating a confusion matrix

$$\begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{bmatrix}$$

where m represents the number of ground-object classes and $p_{ij}$, the entry in the ith row and jth column, is the number of pixels that actually belong to the ith class and are predicted as the jth class; the values on the main diagonal of the matrix are the numbers of pixels that belong to a given class in the ground truth image and to the same class in the corresponding classified image, namely the numbers of correctly classified pixels;

step 4.2, calculating the producer's accuracy based on the confusion matrix, with the formula

$$PA_j = \frac{A_{jj}}{\sum_{i} A_{ji}}$$

where i and j denote the ground-object classes of the ith and jth class in the confusion matrix, $A_{jj}$ is the element on the diagonal of the confusion matrix, and the denominator is the total number of real samples of the jth class;

calculating the overall classification accuracy, with the formula

$$OA = \frac{1}{N}\sum_{i} A_{ii}$$

where N represents the total number of real samples;

calculating the Kappa coefficient, with the formula Kappa = (d - q)/(N - q), where N is the total number of real samples, d is the number of samples on the diagonal of the confusion matrix, C is the number of classes, $A_{ij}$ is the element of the confusion matrix at position (i, j), and q is expressed as

$$q = \frac{1}{N}\sum_{i=1}^{C}\left(\sum_{j=1}^{C} A_{ij}\right)\left(\sum_{j=1}^{C} A_{ji}\right)$$

step 4.3, when the Kappa value is larger than α, the consistency between the classification result and the ground reference information is high, that is, the accuracy is high; when the Kappa value lies between β and α, the consistency is medium; and when the Kappa value is smaller than β, the consistency is poor.

Further, the calculation formula of the normalized vegetation index in step 1.2 is

$$NDVI = \frac{\rho_{nir} - \rho_{red}}{\rho_{nir} + \rho_{red}}$$

where $\rho_{nir}$ is the reflectance value of the near-infrared band of the high-resolution multispectral remote sensing image and $\rho_{red}$ is the reflectance value of the red band.

Further, the calculation formula of the normalized water body index in step 1.3 is

$$NDWI = \frac{\rho_{green} - \rho_{nir}}{\rho_{green} + \rho_{nir}}$$

where $\rho_{nir}$ is the reflectance value of the near-infrared band of the high-resolution multispectral remote sensing image and $\rho_{green}$ is the reflectance value of the green band of the image.

Further, the texture features in step 1.4 include entropy and homogeneity based on the gray level co-occurrence matrix, which are implemented as follows,

let Kx = {1, 2, …, Nx} be the horizontal spatial domain and Ky = {1, 2, …, Ny} be the vertical spatial domain, so that the image is defined on the gray-level pixel domain Kx × Ky; then the entropy is

$$\mathrm{Entropy} = -\sum_{i'}\sum_{j'} Q_{i'j'} \log Q_{i'j'}$$

and the homogeneity is

$$\mathrm{Homogeneity} = \sum_{i'}\sum_{j'} \frac{Q_{i'j'}}{1 + (i' - j')^{2}}$$

where $Q_{i'j'}$ denotes the element in row i' and column j' of the normalized gray level co-occurrence matrix.

Further, β and α in step 4.3 have values of 0.4 and 0.8, respectively.

Compared with the prior art, the invention has the following advantages and beneficial effects:

(1) the adaptability is good, and the data can be automatically segmented according to the size of the data and the condition of using a computer memory, so that the method can be suitable for operating environments with different configurations;

(2) the parallel computing capability is realized, an adaptive parallel algorithm is added, and parallel processing can be performed according to the number of CPUs (central processing units) of the machine;

(3) ease of use: no auxiliary information or manual intervention is required, the calculation speed is high, and processing can be fully automated;

(4) reusability: the global earth surface coverage mapping application system based on high-resolution remote sensing satellites organizes the classification algorithm workflow through configuration files, so that different users can reproduce the corresponding earth surface coverage mapping results from the same configuration file.

The method disclosed by the invention fuses the extracted spectral features, the normalized vegetation index, the normalized water body index and the texture features in a vector superposition mode to obtain fused feature input used as subsequent classification, classifies the high-resolution remote sensing images by utilizing a classification algorithm of a maximum likelihood and support vector machine, performs post-processing on classification results through a region merging strategy, and finally calculates a Kappa coefficient to perform quality evaluation on the classification images to realize the judgment of classification precision.

Drawings

FIG. 1 is a flowchart of a surface coverage classification process according to the present invention;

FIG. 2 is a schematic diagram of a hyperplane in a non-linear support vector machine according to an embodiment of the present invention.

Detailed Description

For a better understanding of the technical solutions of the present invention, the present invention will be further described in detail with reference to the accompanying drawings and examples.

Step 1, establishing a parallel computing framework.

Step 1.1, data parallelism. Because the data volume of a high-resolution remote sensing image is large, the image data must be segmented before earth surface coverage classification. In data parallelism, the data are automatically segmented according to a given strategy before computation: one large dataset is split into several small data modules, which are processed by multiple CPU cores and other computing resources; after the computation is finished, the data are automatically merged according to the data segmentation principle and returned in the required format. The size of the small data modules is calculated as follows.

After the data are automatically segmented, the width W of each small data module is given by

[formula for W not reproduced in this text]

where the ratio r = h/w, h is the length of the image, w is the width of the image, Mey is the memory size, S_t is the data type (for example, integer or floating point), and S_s is the safety factor, which is 4 in this embodiment. The length of each small data module is H = W·r. The start and end position coordinates of each small data module are recorded at the same time.
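The block-size calculation and the recording of start and end coordinates can be sketched in Python as follows. Because the width formula does not survive in this text, the sketch ASSUMES that one block of W × H pixels, at S_t bytes per pixel and inflated by the safety factor S_s, must fit in Mey bytes, which gives W = sqrt(Mey / (r · S_t · S_s)). That relation, the byte units, and the names `block_size` and `split_blocks` are illustrative assumptions, not the patented formula.

```python
import math

def block_size(h, w, mem_bytes, bytes_per_pixel, safety=4):
    """Return (W, H) of one block.

    ASSUMPTION: W * H * bytes_per_pixel * safety <= mem_bytes, with H = W * r.
    """
    r = h / w                                   # ratio r = h / w from the text
    W = int(math.sqrt(mem_bytes / (r * bytes_per_pixel * safety)))
    W = max(1, min(W, w))                       # never wider than the image
    H = max(1, min(int(W * r), h))              # H = W * r from the text
    return W, H

def split_blocks(h, w, W, H):
    """Yield (row_start, row_end, col_start, col_end), end-exclusive."""
    for r0 in range(0, h, H):
        for c0 in range(0, w, W):
            yield (r0, min(r0 + H, h), c0, min(c0 + W, w))
```

Blocks on the right and bottom edges are simply clipped to the image, so the recorded coordinates always tile the image exactly once.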

Step 1.2, algorithm parallelism. Algorithm parallelism is specific to a given algorithm: during the actual computation, the work is distributed to multiple processors and executed concurrently using a parallel framework such as OpenMP. The invention realizes automatic earth surface coverage classification through a customizable workflow, in which the user selects the feature extraction method, the classification model and the classification post-processing method through a configuration file.
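The idea of dispatching blocks to several workers can be illustrated with a short Python sketch. OpenMP itself is a C/C++/Fortran framework, so the sketch swaps in Python's standard `concurrent.futures` thread pool as a stand-in. The function `classify_block` is a hypothetical placeholder for the per-block classification chain, and its 0.5 cut-off is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
import os

def classify_block(block):
    """Hypothetical placeholder for the per-block classification chain."""
    return [[1 if value > 0.5 else 0 for value in row] for row in block]

def classify_all(blocks, workers=None):
    """Classify every block concurrently; results keep the input order."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify_block, blocks))
```

Because `pool.map` preserves input order, each result can be paired with the coordinates recorded for its block. For CPU-bound pure-Python work a process pool would be needed to obtain real parallel speed-up; threads are used here only to keep the sketch simple.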

Step 2, extracting the features of the segmented high-resolution remote sensing image blocks, which further comprises the following substeps.

Step 2.1, using the multispectral bands of the high-resolution remote sensing image block directly as spectral features; stacking the common spectral bands provides a basic discrimination of ground objects.

Step 2.2, calculating the normalized vegetation index (NDVI) as the feature for extracting vegetation, with the formula

$$NDVI = \frac{\rho_{nir} - \rho_{red}}{\rho_{nir} + \rho_{red}}$$

where $\rho_{nir}$ and $\rho_{red}$ are the reflectance values of the near-infrared and red bands.

Step 2.3, calculating the normalized water body index (NDWI) as the feature for extracting water bodies, with the formula

$$NDWI = \frac{\rho_{green} - \rho_{nir}}{\rho_{green} + \rho_{nir}}$$

where $\rho_{green}$ is the reflectance value of the green band.
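Both indices are normalized differences of two band reflectances, which a per-pixel Python sketch makes concrete. It assumes the common forms NDVI = (nir - red)/(nir + red) and NDWI = (green - nir)/(green + nir); returning 0.0 for a zero denominator is an illustrative choice, and the function names are hypothetical.

```python
def normalized_difference(a, b, eps=1e-12):
    """Return (a - b) / (a + b), or 0.0 when the denominator is (near) zero."""
    denom = a + b
    return (a - b) / denom if abs(denom) > eps else 0.0

def ndvi(nir, red):
    """Normalized vegetation index from near-infrared and red reflectance."""
    return normalized_difference(nir, red)

def ndwi(green, nir):
    """Normalized water body index from green and near-infrared reflectance."""
    return normalized_difference(green, nir)
```

Vegetation reflects strongly in the near-infrared, so it yields a high NDVI, while water absorbs near-infrared and therefore yields a positive NDWI.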

Step 2.4, calculating texture features based on the gray level co-occurrence matrix (GLCM) as local spatial features, in order to describe the spatial distribution relationship between pixels. Let Kx = {1, 2, …, Nx} be the horizontal spatial domain and Ky = {1, 2, …, Ny} be the vertical spatial domain, where Nx and Ny are known for a given image, so that the image is defined on the gray-level pixel domain Kx × Ky. The entropy

$$\mathrm{Entropy} = -\sum_{i'}\sum_{j'} Q_{i'j'} \log Q_{i'j'}$$

and the homogeneity

$$\mathrm{Homogeneity} = \sum_{i'}\sum_{j'} \frac{Q_{i'j'}}{1 + (i' - j')^{2}}$$

are used as local spatial features to build the texture statistics, where $Q_{i'j'}$ denotes the element in row i' and column j' of the normalized gray level co-occurrence matrix. The gray level co-occurrence matrix is a matrix function of pixel distance and angle; by calculating the correlation between the gray levels of two points separated by a given distance in a given direction, it reflects the comprehensive information of the image in terms of direction, interval, and the amplitude and speed of variation. Its calculation is prior art, see document [1], and is not described in detail here.

[1] Gao Chengcheng, Hui Xiaowei. Texture feature extraction based on gray level co-occurrence matrix [J]. Computer Systems & Applications, 2010, 19(6): 195-198.
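A minimal Python sketch of the two texture measures is given below. It assumes a single pixel offset (one step to the right by default), a symmetric co-occurrence count, and the natural logarithm in the entropy; the text fixes none of these, and the function names are illustrative.

```python
import math
from collections import Counter

def glcm(image, dx=1, dy=0, symmetric=True):
    """Normalized gray level co-occurrence matrix as a dict {(i, j): Q_ij}."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                a, b = image[y][x], image[y2][x2]
                counts[(a, b)] += 1
                if symmetric:
                    counts[(b, a)] += 1      # count the pair in both directions
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}

def entropy(q):
    """Entropy of the normalized GLCM (natural logarithm, an assumption)."""
    return -sum(p * math.log(p) for p in q.values() if p > 0)

def homogeneity(q):
    """Homogeneity: entries near the diagonal are weighted most heavily."""
    return sum(p / (1 + (i - j) ** 2) for (i, j), p in q.items())
```

A uniform patch gives zero entropy and homogeneity 1, whereas a patch that alternates between gray levels spreads the matrix off the diagonal, raising entropy and lowering homogeneity.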

Step 2.5, fusing the extracted spectral features, normalized vegetation index, normalized water body index and texture features by vector stacking, and using the fused vector as the feature input for subsequent classification.

Step 3, constructing a classification algorithm that fuses maximum likelihood and a support vector machine, and classifying the high-resolution remote sensing image blocks to obtain a classified image, implemented as follows:

Step 3.1, obtaining the membership probability of each earth surface coverage category with the maximum likelihood classification algorithm (MLC), which provides the decision basis for the final earth surface coverage category. The maximum likelihood classification method treats the distribution of multi-band remote sensing data as a multidimensional normal distribution and constructs a discriminant function from it. The basic idea is as follows: the data of the known pixels of each class form a cluster of points in a plane or in space; the data of one class in each dimension form a normal distribution along that axis, so the multidimensional data of the class form a multidimensional normal distribution; once the multidimensional distribution model of every class is available, the probability that any unclassified data vector belongs to each class can be computed in reverse; the probabilities are compared, and the data vector, or pixel, is assigned to the class with the largest probability, which can be expressed as:

[formula not reproduced in this text]

where m is the number of bands and p(ω_i) is the m-dimensional normal distribution density function of the i-th class, which describes the probability that the m-dimensional random variable x takes its possible values in that class. The m-dimensional data vector of a pixel may be represented as:

[formula not reproduced in this text]

where m_i represents the mean vector formed by the mean values of class ω_i in each band, given by the following formula,

[formula not reproduced in this text]

C_i represents the covariance matrix of the class, with the formula

[formula not reproduced in this text]

where n_k is the number of pixels of the k-th class and W_k is the within-class scatter matrix of class k, as shown in the following formula,

$$W_k = \begin{bmatrix} \omega_{k11} & \omega_{k12} & \cdots & \omega_{k1m} \\ \omega_{k21} & \omega_{k22} & \cdots & \omega_{k2m} \\ \vdots & \vdots & \ddots & \vdots \\ \omega_{km1} & \omega_{km2} & \cdots & \omega_{kmm} \end{bmatrix}$$

In the formula, $\omega_{k11}, \omega_{k22}, \ldots, \omega_{kmm}$ are the within-class variances of class k, and $\omega_{k12}, \ldots, \omega_{k1m}$ and $\omega_{k21}, \ldots, \omega_{km1}$ are the within-class covariances of class k, from which the covariance matrix of the class can be derived.

After simplification, the original equation reduces to the following formula,

[formula not reproduced in this text]

where p(ω_i) represents the probability of class ω_i and f denotes the probability density corresponding to class ω_i. Based on the value of the maximum likelihood decision function D_i(f), the membership probability P_i(l) of the ground-object class is obtained from exp(D_i(f)) (the remainder of the expression is truncated in this text), where K is the number of earth surface coverage classes. The maximum likelihood classification algorithm (MLC) is prior art, see document [2].

[2] Tang Guo'an. Remote Sensing Digital Image Processing [M]. Science Press, 2004.
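For orientation, the textbook form of Gaussian maximum likelihood classification is written out below. This is the standard formulation, not a recovery of the patent's own equations, and the notation is an assumption: x is the m-dimensional pixel vector, m_i and C_i are the mean vector and covariance matrix of class ω_i, p(ω_i) is the class prior, and K is the number of classes.

```latex
% Class-conditional density of an m-band pixel vector x (multivariate normal)
p(x \mid \omega_i) = \frac{1}{(2\pi)^{m/2}\,|C_i|^{1/2}}
  \exp\!\left[-\tfrac{1}{2}\,(x - m_i)^{T} C_i^{-1} (x - m_i)\right]

% Log discriminant after dropping the constant -(m/2)\ln(2\pi) shared by all classes
D_i(x) = \ln p(\omega_i) - \tfrac{1}{2}\ln|C_i|
  - \tfrac{1}{2}\,(x - m_i)^{T} C_i^{-1} (x - m_i)

% Membership (posterior) probability obtained by normalizing over the K classes
P(\omega_i \mid x) = \frac{\exp D_i(x)}{\sum_{k=1}^{K} \exp D_k(x)}
```

Since the dropped constant is the same for every class, normalizing exp D_i(x) over the K classes gives exactly the Bayesian posterior, which is the quantity a threshold on membership probability would be applied to.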

Step 3.2, based on the obtained membership probability of the ground-object category, judging the accuracy of the ground-object discrimination against a threshold, which further comprises the following substeps:

Step 3.2.1, determining the optimal label of the current pixel according to the maximum posterior probability, with the formula

$$k = \arg\max_{l} P_i(l)$$

where k represents the optimal label of the current pixel;

Step 3.2.2, judging the accuracy of the pixel category: if the category accuracy is larger than a given threshold, that is, $P_i(k) > T$, the classification result formed by the labels with the maximum probability in the maximum likelihood classification of step 3.1 is used directly; otherwise, the procedure jumps to step 3.3. Here T is the preset accuracy threshold, usually 0.8, above which the current pixel is considered likely to have obtained the correct label.
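The threshold rule can be sketched in Python as follows. The argument `fallback_classifier` is a hypothetical callable standing in for the nonlinear support vector machine; `fuse_labels` and the data layout (one probability list and one feature vector per pixel) are likewise illustrative assumptions.

```python
def fuse_labels(probabilities, features, fallback_classifier, threshold=0.8):
    """Keep the maximum likelihood label when it is confident enough.

    probabilities: per-pixel lists of K membership probabilities.
    features: per-pixel feature vectors passed to the fallback classifier.
    """
    labels = []
    for probs, feat in zip(probabilities, features):
        k = max(range(len(probs)), key=probs.__getitem__)   # argmax label
        if probs[k] > threshold:
            labels.append(k)                                # confident: keep it
        else:
            labels.append(fallback_classifier(feat))        # defer hard pixels
    return labels
```

Only the pixels that fail the threshold reach the more expensive classifier, which is what keeps the fused scheme cheaper than running the support vector machine on every pixel.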

Step 3.3, using a nonlinear support vector machine (SVM) classification algorithm to separate, in a nonlinear space, the pixels that the maximum likelihood classification cannot discriminate accurately. Unlike the probability-based maximum likelihood classification method, the support vector machine is a machine learning algorithm based on statistical learning theory; by adopting the structural risk minimization principle, it reduces the generalization error of the model while minimizing the sample error, thereby improving the generalization ability of the model. The formula f(x) = w · Φ(x) + b is the discriminant function of the hyperplane, where w is the weight vector, b is the offset, and Φ(x) is the feature vector obtained by mapping the input x. As shown in FIG. 2, a two-dimensional plane contains two classes of data, represented by circles and crosses. Since these data are linearly separable, the two classes can be separated by a straight line, which plays the role of the hyperplane in two dimensions. The nonlinear support vector machine classification algorithm is prior art, see document [3].

[3] Li Hang. Statistical Learning Methods [M]. Tsinghua University Press, 2012.
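A toy Python sketch shows how the discriminant f(x) = w · Φ(x) + b separates data that are not linearly separable in the input space. The feature map `phi`, the weights and the offset are hand-picked for illustration and are not trained; a real support vector machine would learn them, usually through a kernel rather than an explicit map.

```python
def phi(x):
    """Hypothetical feature map: lift a scalar x to the vector (x, x**2)."""
    return (x, x * x)

def decision(x, w=(0.0, 1.0), b=-1.0):
    """Discriminant f(x) = w . phi(x) + b; the sign gives the class."""
    return sum(wi * fi for wi, fi in zip(w, phi(x))) + b
```

With these values f(x) = x² - 1, so points with |x| > 1 fall on the positive side and points with |x| < 1 on the negative side: no single cut on the x axis separates them, but a hyperplane in the lifted space does.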

Step 4, generating segmentation objects based on a connected region labeling algorithm and post-processing the classification result of step 3 through a region merging strategy, implemented as follows:

Step 4.1, on the basis of the classification result of step 3, obtaining segmentation objects O with the classic eight-neighborhood connected region labeling algorithm based on the union-find data structure;

Step 4.2, on the basis of the classification result of step 3, denoting the obtained segmentation objects as O_i according to the spatial resolution of the high-resolution remote sensing image and the ground-object characteristics, where i is the category number. A threshold T₁ (typically 0.8) is used to judge whether an object is a noise object: an object above the given threshold T₁ is retained directly, otherwise it is merged into the adjacent segmentation object. The classification result is then obtained per object by a majority voting strategy, which improves the classification result of step 3 and yields a more refined earth surface coverage classification result.
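The post-processing can be sketched in Python with a union-find labeling pass followed by a merge of small regions. The text does not state what quantity is compared with T₁, so the sketch ASSUMES a pixel-count threshold `min_size`, and it merges a small region into the class that occurs most often among its bordering pixels; both choices and all function names are illustrative.

```python
from collections import Counter

NEIGHBORS8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def label_regions(classes):
    """Eight-neighborhood connected region labeling with union-find."""
    rows, cols = len(classes), len(classes[0])
    parent = list(range(rows * cols))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]      # path halving
            a = parent[a]
        return a

    for y in range(rows):
        for x in range(cols):
            # Only already-visited neighbors: left, and the three above.
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols and classes[ny][nx] == classes[y][x]:
                    ra, rb = find(y * cols + x), find(ny * cols + nx)
                    if ra != rb:
                        parent[ra] = rb        # union the two regions
    return [[find(y * cols + x) for x in range(cols)] for y in range(rows)]

def merge_small_regions(classes, min_size):
    """Relabel regions smaller than min_size by a vote of bordering pixels."""
    regions = label_regions(classes)
    rows, cols = len(classes), len(classes[0])
    members = {}
    for y in range(rows):
        for x in range(cols):
            members.setdefault(regions[y][x], []).append((y, x))
    result = [row[:] for row in classes]
    for region_id, pixels in members.items():
        if len(pixels) >= min_size:
            continue                           # large enough: retained directly
        votes = Counter()
        for y, x in pixels:
            for dy, dx in NEIGHBORS8:
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols and regions[ny][nx] != region_id:
                    votes[classes[ny][nx]] += 1
        if votes:
            new_class = votes.most_common(1)[0][0]
            for y, x in pixels:
                result[y][x] = new_class
    return result
```

The votes are taken from the original class map, so the outcome does not depend on the order in which small regions are processed.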

Step 5, merging the data back using the recorded start and end position coordinates of the small data modules, reversing the segmentation of step 1, to obtain the final classification result.
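Merging by recorded coordinates amounts to writing each classified block back into its slot of the full-size result, as in this Python sketch. The end-exclusive tuple layout `(row_start, row_end, col_start, col_end)` and the name `merge_blocks` are assumptions made for illustration.

```python
def merge_blocks(height, width, blocks):
    """Rebuild the full label map.

    blocks: iterable of ((row_start, row_end, col_start, col_end), labels),
    where labels is the classified block as a list of rows.
    """
    mosaic = [[None] * width for _ in range(height)]
    for (r0, r1, c0, c1), labels in blocks:
        if len(labels) != r1 - r0 or any(len(row) != c1 - c0 for row in labels):
            raise ValueError("block shape does not match its coordinates")
        for dy, row in enumerate(labels):
            mosaic[r0 + dy][c0:c1] = row       # write one row of the block
    return mosaic
```

Any cell still holding `None` after the merge marks a part of the image that no block covered, which makes coordinate bookkeeping errors easy to detect.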

Step 6, evaluating the quality of the classified image, implemented as follows:

Step 6.1, calculating a confusion matrix. The confusion matrix is obtained by comparing each ground truth pixel with the classified pixel at the corresponding position in the classification map; its columns represent the number of pixels of each class in the classification map, and its rows represent the true class of the data. The confusion matrix is as follows:

$$\begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{bmatrix}$$

where m represents the number of ground-object classes and $p_{ij}$, the entry in the ith row and jth column, is the number of pixels that actually belong to the ith class and are predicted as the jth class. The values on the main diagonal of the matrix are therefore the numbers of pixels that belong to the same class in the ground truth image and in the corresponding classified image, namely the numbers of correctly classified pixels. The larger the values on the main diagonal, the more pixels are correctly classified and the higher the classification accuracy.

Step 6.2, calculating the producer's accuracy (PA) based on the confusion matrix, with the formula

$$PA_j = \frac{A_{jj}}{\sum_{i} A_{ji}}$$

where i and j denote the ground-object classes of the ith and jth class in the confusion matrix and $A_{jj}$ is the element on the diagonal of the confusion matrix; calculating the overall classification accuracy (OA), with the formula

$$OA = \frac{1}{N}\sum_{i} A_{ii}$$

where N represents the total number of preselected real samples. Calculating the Kappa coefficient, which reflects the imbalance between classes, with the formula Kappa = (d - q)/(N - q), where N is the total number of real samples, d is the number of samples on the diagonal of the confusion matrix, C is the number of classes (which may be user-defined), $A_{ij}$ is the element of the confusion matrix at position (i, j), and q is expressed as

$$q = \frac{1}{N}\sum_{i=1}^{C}\left(\sum_{j=1}^{C} A_{ij}\right)\left(\sum_{j=1}^{C} A_{ji}\right)$$

A Kappa value greater than 0.80 indicates that the consistency between the classification map and the ground reference information is very high, that is, the accuracy is very high; a Kappa value of 0.40 to 0.80 indicates medium consistency; and a Kappa value less than 0.40 indicates poor consistency.
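The accuracy measures can be checked on a small example with the following Python sketch. It follows the convention stated above that rows hold the true class and columns the predicted class, and it assumes the usual definition of producer's accuracy as the diagonal entry divided by its row total; the function name and the returned dictionary layout are illustrative.

```python
def accuracy_report(matrix):
    """matrix[i][j]: number of pixels of true class i predicted as class j."""
    c = len(matrix)
    n = sum(sum(row) for row in matrix)                  # total samples N
    d = sum(matrix[i][i] for i in range(c))              # diagonal sum d
    row_tot = [sum(matrix[i]) for i in range(c)]         # true-class totals
    col_tot = [sum(matrix[i][j] for i in range(c)) for j in range(c)]
    producer = [matrix[i][i] / row_tot[i] if row_tot[i] else 0.0 for i in range(c)]
    q = sum(row_tot[i] * col_tot[i] for i in range(c)) / n   # chance agreement
    kappa = (d - q) / (n - q)
    return {"PA": producer, "OA": d / n, "kappa": kappa}
```

For the matrix [[40, 10], [5, 45]] this gives N = 100, d = 85 and q = 50, so OA = 0.85 and Kappa = 35/50 = 0.70, which falls in the medium consistency band.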

The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art to which the invention relates may effect numerous modifications, additions or substitutions to the specific embodiments described, without departing from the spirit or ambit of the invention as defined in the accompanying claims.

Claims

1. The method for classifying the earth surface coverage of the high-resolution remote sensing image based on the parallel algorithm is characterized by comprising the following steps of:

combining all the high-resolution remote sensing image block data according to the coordinates of the starting and stopping positions of each high-resolution remote sensing image block to obtain a final earth surface coverage classification result;

the earth surface coverage classification processing in the second step comprises the following steps:

2. The method for classifying the earth surface coverage of the high-resolution remote sensing image based on the parallel algorithm as claimed in claim 1, wherein the earth surface coverage classification processing in the second step further comprises a step 4 of performing quality evaluation on the earth surface coverage classification result according to a Kappa coefficient, and the specific implementation mode is as follows:

step 4.1, calculating a confusion matrix

$$\begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{bmatrix}$$

step 4.2, calculating the producer's accuracy based on the confusion matrix, with the formula

$$PA_j = \frac{A_{jj}}{\sum_{i} A_{ji}}$$

calculating the overall classification accuracy, with the formula

$$OA = \frac{1}{N}\sum_{i} A_{ii}$$

where N represents the total number of real samples;

calculating the Kappa coefficient, with the formula Kappa = (d - q)/(N - q), where N is the total number of real samples, d is the number of samples on the diagonal of the confusion matrix, C is the number of classes, $A_{ij}$ is the element of the confusion matrix at position (i, j), and q is expressed as

$$q = \frac{1}{N}\sum_{i=1}^{C}\left(\sum_{j=1}^{C} A_{ij}\right)\left(\sum_{j=1}^{C} A_{ji}\right)$$

step 4.3, when the Kappa value is larger than α, the consistency between the classification result and the ground reference information is high, that is, the accuracy is high; when the Kappa value is in the range [β, α], the consistency is medium; and when the Kappa value is smaller than β, the consistency is poor.

3. The method for classifying the earth surface coverage of the high-resolution remote sensing image based on the parallel algorithm as claimed in claim 2, wherein: the calculation formula of the normalized vegetation index in step 1.2 is

$$NDVI = \frac{\rho_{nir} - \rho_{red}}{\rho_{nir} + \rho_{red}}$$

4. The method for classifying the earth surface coverage of the high-resolution remote sensing image based on the parallel algorithm as claimed in claim 3, wherein: the calculation formula of the normalized water body index in step 1.3 is

$$NDWI = \frac{\rho_{green} - \rho_{nir}}{\rho_{green} + \rho_{nir}}$$

where $\rho_{nir}$ is the reflectance value of the near-infrared band of the high-resolution multispectral remote sensing image and $\rho_{green}$ is the reflectance value of the green band of the image.

5. The method for classifying the earth surface coverage of the high-resolution remote sensing image based on the parallel algorithm as claimed in claim 4, wherein: the texture features in step 1.4 include entropy and homogeneity based on the gray level co-occurrence matrix, which are implemented as follows,

let Kx = {1, 2, …, Nx} be the horizontal spatial domain and Ky = {1, 2, …, Ny} be the vertical spatial domain, so that the image is defined on the gray-level pixel domain Kx × Ky; then the entropy is

$$\mathrm{Entropy} = -\sum_{i'}\sum_{j'} Q_{i'j'} \log Q_{i'j'}$$

and the homogeneity is

$$\mathrm{Homogeneity} = \sum_{i'}\sum_{j'} \frac{Q_{i'j'}}{1 + (i' - j')^{2}}$$

6. The method for classifying the earth surface coverage of the high-resolution remote sensing image based on the parallel algorithm as claimed in claim 5, wherein the values of β and α in the step 4.3 are 0.4 and 0.8 respectively.