CN116385892A - Digital elevation model extraction method based on target context convolution neural network - Google Patents

Digital elevation model extraction method based on target context convolution neural network

Info

Publication number
CN116385892A
Authority
CN
China
Prior art keywords
image
elevation
neural network
pixel
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310106547.9A
Other languages
Chinese (zh)
Inventor
吴丽沙
张谷生
刘建明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Daoda Tianji Technology Co ltd
Original Assignee
Beijing Daoda Tianji Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Daoda Tianji Technology Co ltd
Priority to CN202310106547.9A
Publication of CN116385892A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/36Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; Non-linear local filtering operations, e.g. median filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Remote Sensing (AREA)
  • Computational Linguistics (AREA)
  • Nonlinear Science (AREA)
  • Astronomy & Astrophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a digital elevation model extraction method based on a target context convolution neural network, which comprises the following steps: sequentially carrying out stereoscopic area network adjustment, epipolar image generation, dense image matching and forward intersection on the stereopair images, thereby generating a digital surface model DSM; performing ground object classification on the stereopair images with the target context convolution neural network and extracting ground object boundaries; dividing the digital surface model DSM into a correction target area and a non-correction area according to the classification result of the target context convolution neural network; applying elevation surface fitting to the correction target area and smoothing filtering to the non-correction area to jointly generate the digital elevation model DEM. According to the invention, the digital elevation model DEM is generated from the stereopair images using the target context convolution neural network, so that the production efficiency of the digital elevation model DEM is improved while the production accuracy is ensured, and high-precision automatic extraction of the digital elevation model DEM in different areas is realized.

Description

Digital elevation model extraction method based on target context convolution neural network
Technical Field
The invention relates to the technical field of digital elevation model extraction, in particular to a digital elevation model extraction method based on a target context convolution neural network.
Background
A Digital Elevation Model (DEM) is a digital representation of terrain surface elevation and is widely used in scientific and engineering fields. Traditional data acquisition modes for digital elevation models include space remote sensing among other approaches. Space remote sensing has the advantages of a stable data source, low data acquisition cost, large imaging range, short acquisition period, stable platform, and high attitude control and measurement accuracy, so the accuracy and update cycle of three-dimensional measurement can be fully guaranteed; it is currently the main means by which production units generate digital elevation models over large areas.
Traditional digital elevation model generation methods mainly comprise filtering and manual visual interpretation. Filtering algorithms are mainly based on the curved-surface characteristics of the ground and can remove some noise points, but cannot thoroughly remove objects such as trees and buildings. Manual visual interpretation requires human intervention, has a low degree of automation, and the effectiveness of removing non-ground object information depends on the operator's skill.
Therefore, using the data acquired by space remote sensing to carry out rapid, efficient and accurate remote sensing image information mining and to generate a high-precision digital elevation model has very important research significance.
Disclosure of Invention
The invention aims to generate a Digital Elevation Model (DEM) from stereopair images using a target context convolution neural network, and provides a digital elevation model extraction method based on the target context convolution neural network that is accurate and convenient, ensures production accuracy, improves the production efficiency of the Digital Elevation Model (DEM), and realizes high-precision automatic extraction of the Digital Elevation Model (DEM) in different areas.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
the digital elevation model extraction method based on the target context convolution neural network comprises the following steps:
step 1, sequentially carrying out stereoscopic area network adjustment, epipolar image generation, dense image matching and forward intersection on the stereopair images, thereby generating a digital surface model DSM;
step 2, performing ground object classification on the stereopair images by using a target context convolution neural network, and extracting ground object boundaries;
step 3, dividing the digital surface model DSM into a correction target area and a non-correction area according to the classification result of the target context convolution neural network; applying elevation surface fitting to the correction target area and smoothing filtering to the non-correction area to jointly generate the digital elevation model DEM.
In this scheme, stereoscopic area network adjustment, epipolar image generation, dense image matching and forward intersection are first performed on the high-resolution stereopair images to obtain a digital surface model DSM rich in ground detail; ground object boundary areas such as buildings, vegetation and water bodies are then identified automatically with the target context convolution neural network; finally the elevation surface is fitted with a radial basis function neural network under the boundary constraint, realizing high-precision automatic DEM extraction in different areas. By adding the target context convolution neural network to the generation of the digital elevation model DEM, compared with the inefficiency and errors caused by manual editing in the traditional DEM generation process, the terrain production efficiency can be greatly improved while the production accuracy requirement of a 1:50000 high-precision digital elevation model DEM is met.
Compared with the prior art, the invention has the beneficial effects that:
compared with the traditional approach of manual post-processing after digital surface model DSM filtering, the method can more effectively overcome problems such as time-consuming operation, a low degree of automation, unintended changes to other areas and a certain blindness of the processing; it can repair the elevations of water bodies, buildings and other objects in a targeted manner, completes the repair of the corresponding areas well, and the generated digital elevation model DEM is smooth and close to the result of manual post-processing.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method according to embodiment 1 of the present invention;
fig. 2 is a schematic diagram of matching cost aggregation in 16 directions in embodiment 1 of the present invention;
FIG. 3 is a schematic diagram illustrating the merging process of the fractal net evolution approach (FNEA) segmentation algorithm in embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of the segmentation and merging process of the FNEA segmentation algorithm in embodiment 1 of the present invention;
FIG. 5 is a diagram of the elevation training sample selection interval in embodiment 1 of the present invention;
fig. 6 shows the test results of high-precision digital elevation model DEM generation in embodiment 2 of the invention, where (a) in fig. 6 is the original stereopair image, (b) in fig. 6 is the building extraction result based on the target context convolution neural network, (c) in fig. 6 is a schematic view of the generated digital surface model DSM, and (d) in fig. 6 is a schematic view of the generated digital elevation model DEM;
FIG. 7 is a graphic representation of the digital surface model DSM results in a shading map and its partial area enlargement in accordance with example 2 of the present invention;
FIG. 8 is a graphic representation of the result of a digital elevation model DEM in accordance with embodiment 2 of the present invention showing a shading map and a partial region thereof;
FIG. 9 is a diagram showing the distribution of checkpoints in embodiment 2 of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Also, in the description of the present invention, the terms "first," "second," and the like are used merely to distinguish one from another, and are not to be construed as indicating or implying a relative importance or implying any actual such relationship or order between such entities or operations. In addition, the terms "connected," "coupled," and the like may be used to denote a direct connection between elements, or an indirect connection via other elements.
Example 1:
the invention is realized by the following technical scheme, as shown in fig. 1, a digital elevation model extraction method based on a target context convolution neural network comprises the following steps:
Step 1: sequentially carrying out stereoscopic area network adjustment, epipolar image generation, dense image matching and forward intersection on the stereopair images, thereby generating a digital surface model DSM.
Step 1-1: three-dimensional area net adjustment.
High-precision geometric positioning of the stereopair images acquired by a high-resolution space remote sensing satellite is mainly realized through stereoscopic area network adjustment. First a rational polynomial model is constructed, then the adjustment of the stereoscopic area network is computed using the tie points of the stereopair images and the control points acquired in the field, and finally the refined rational polynomial parameters and the ground coordinates of the tie points are obtained.
v_l = Δa_0 + s·Δa_s + l·Δa_l + (∂l/∂Lon)·ΔLon + (∂l/∂Lat)·ΔLat + (∂l/∂H)·ΔH − F_l
v_s = Δb_0 + s·Δb_s + l·Δb_l + (∂s/∂Lon)·ΔLon + (∂s/∂Lat)·ΔLat + (∂s/∂H)·ΔH − F_s

where l denotes the row and s the column of an image point in the stereopair image; v_l, v_s are the residuals of the image point coordinate observations; l_meas, s_meas are the coordinate observations of the homonymous image points; Δa_0, Δa_s, Δa_l, Δb_0, Δb_s, Δb_l are the correction values of the affine transformation parameters of the rational polynomial of the stereopair image; ∂l/∂Lon, ∂l/∂Lat, ∂l/∂H are the first derivatives of the row coordinate of the stereopair image with respect to longitude, latitude and elevation in the rational polynomial, and ∂s/∂Lon, ∂s/∂Lat, ∂s/∂H are the corresponding first derivatives of the column coordinate; ΔLon, ΔLat, ΔH are the correction values of the homonymous image point in the longitude, latitude and elevation directions; F_l, F_s are constant terms representing the differences, in the row and column directions respectively, between the measured image coordinates and the image coordinates calculated from the rational polynomial:

F_l = l_meas − (l_rpc + a_0 + a_s·s + a_l·l)
F_s = s_meas − (s_rpc + b_0 + b_s·s + b_l·l)

where l_rpc, s_rpc are the image row and column coordinates calculated from the RPC parameters of the stereopair image, and a_0, a_s, a_l, b_0, b_s, b_l are the affine transformation parameters of the rational polynomial of the stereopair image. In the first adjustment run these affine parameters can be initialized to 0; after each adjustment run the computed affine parameter corrections are added to the current values and the next adjustment run is carried out, until the image point coordinate observation residuals v_l, v_s satisfy the set threshold and the iteration terminates.
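For illustration, the iterative refinement of the image-space affine parameters can be sketched in Python as follows. The sketch only estimates the six affine corrections of a single image from tie/control points and omits the ground-coordinate unknowns of the full block adjustment; the function name and the use of NumPy least squares are assumptions, not part of the patent text.

```python
import numpy as np

def refine_affine(l_meas, s_meas, l_rpc, s_rpc, max_iter=10, tol=1e-6):
    """Iteratively estimate the affine corrections (a0, a_s, a_l) and
    (b0, b_s, b_l) that align RPC-projected image coordinates (l_rpc, s_rpc)
    with the measured coordinates (l_meas, s_meas) of the tie/control points.
    Ground-coordinate corrections are omitted for brevity."""
    a = np.zeros(3)                     # a0, a_s, a_l (row direction)
    b = np.zeros(3)                     # b0, b_s, b_l (column direction)
    A = np.column_stack([np.ones_like(l_meas), s_meas, l_meas])
    for _ in range(max_iter):
        # image coordinates computed from the RPC plus the current affine terms
        l_calc = l_rpc + A @ a
        s_calc = s_rpc + A @ b
        F_l = l_meas - l_calc           # constant terms in the row direction
        F_s = s_meas - s_calc           # constant terms in the column direction
        da, *_ = np.linalg.lstsq(A, F_l, rcond=None)
        db, *_ = np.linalg.lstsq(A, F_s, rcond=None)
        a += da
        b += db
        if max(np.abs(da).max(), np.abs(db).max()) < tol:
            break                       # correction threshold reached
    return a, b
```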
Step 1-2: epipolar image generation.
After the stereoscopic area network adjustment is completed, epipolar rectification is required before stereopair image matching; it improves the matching accuracy and at the same time greatly shortens the image matching time. Based on the stereoscopic area network adjustment result, this scheme adopts an epipolar image generation model based on the image space, which uses a quadratic polynomial model:
l_epi = c_0 + c_1·l_org + c_2·r_org + c_3·l_org² + c_4·l_org·r_org + c_5·r_org²
r_epi = d_0 + d_1·l_org + d_2·r_org + d_3·l_org² + d_4·l_org·r_org + d_5·r_org²

where l_epi, r_epi are the row and column coordinates of the epipolar image; l_org, r_org are the row and column coordinates of the stereopair image after the stereoscopic area network adjustment; c_0, c_1, c_2, c_3, c_4, c_5 are the quadratic polynomial fitting parameters of the epipolar image row coordinate, and d_0, d_1, d_2, d_3, d_4, d_5 are the quadratic polynomial fitting parameters of the epipolar image column coordinate.
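As a small illustration, the two quadratic polynomials can be fitted by least squares from a set of corresponding points. The Python sketch below assumes a particular ordering of the six quadratic terms, which is not specified in the text, and uses NumPy only.

```python
import numpy as np

def fit_epipolar_poly(l_org, r_org, l_epi, r_epi):
    """Least-squares fit of the two quadratic polynomials that map original
    image coordinates (l_org, r_org) to epipolar-image coordinates
    (l_epi, r_epi). Only the general quadratic form with six coefficients
    per coordinate follows the text; the term order is an assumption."""
    A = np.column_stack([np.ones_like(l_org), l_org, r_org,
                         l_org**2, l_org * r_org, r_org**2])
    c, *_ = np.linalg.lstsq(A, l_epi, rcond=None)   # c0 .. c5 (row coordinate)
    d, *_ = np.linalg.lstsq(A, r_epi, rcond=None)   # d0 .. d5 (column coordinate)
    return c, d
```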
Because a stereopair image is large (usually about 20000 × 20000 pixels), the stereopair image is tiled on the basis of the quadratic polynomial model to improve matching efficiency; the tiling scheme is usually 5 × 5 or 7 × 7 blocks, so that each left image block can find its corresponding right image block through the affine transformation relation. After tiling, epipolar image pairs are generated for the left and right blocks with the projection trajectory method. This ensures that the parallax images generated by adjacent epipolar image blocks have no gaps, and a certain overlap exists between blocks.
Step 1-3: dense image matching.
The scheme uses a semi-global matching algorithm (SGM) to take mutual information of each pixel as a matching primitive, and carries out overall matching on the basis of an energy function minimization principle, so that the method has the advantages of good stability and high efficiency.
(1) Matching cost calculation
The matching cost calculation mainly measures the matching degree of pixels between the left and right images: the smaller the calculated value, the higher the matching degree of the two pixels to be matched in the left and right images. Optional matching cost calculation methods include the sum of squared gray differences, the sum of absolute gray differences, linear similarity measures, mutual information, etc. This embodiment adopts the mutual information matching cost calculation method, which can handle complex radiometric relations between images and ignore the influence of illumination. The calculation is as follows:
MI_{I1,I2} = Σ_{p∈P} mi_{I1,I2}(I_1p, I_2p)

where I_1 denotes the gray values of the left image and I_2 the gray values of the right image; MI_{I1,I2} is the mutual information matching cost value of the left and right images, and mi_{I1,I2} is the mutual information matching cost value of a pixel point in the left and right images; P is the set of pixel points, p ∈ P; I_1p is the gray value of a pixel point in the left image and I_2p the gray value of the corresponding pixel point in the right image.

mi_{I1,I2}(i, k) = h_{I1}(i) + h_{I2}(k) − h_{I1,I2}(i, k)

where i and k are the gray values of corresponding image points of the left and right images; mi_{I1,I2} is the mutual information of the left and right images; h_{I1} is the entropy of the left image, h_{I2} the entropy of the right image, and h_{I1,I2} the joint entropy of the left and right images.

h_I(x) = −(1/n) · log(g_I(x) ⊗ g(x)) ⊗ g(x)

where h_I(x) is a data term; n is the number of pixel points used in the calculation; g_I(x) is the probability distribution of gray value x in image I; g(x) is a Gaussian convolution kernel and ⊗ denotes convolution. Based on this formula:

h_{I1,I2}(i, k) = −(1/n) · log(g_{I1,I2}(i, k) ⊗ g(i, k)) ⊗ g(i, k)

where g_{I1,I2}(i, k) is the joint probability distribution of the gray values of the left and right images, and the Gaussian convolution is applied over the gray values (i, k) of the corresponding image points of the left and right images.
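A compact way to build the mutual-information table from the (pre-registered) left and right images is sketched below in Python. It follows the entropy formulas above; the use of SciPy Gaussian filters, the bin count and the sign convention are implementation assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_filter1d

def mi_table(I1, I2, sigma=1.0, bins=256):
    """Build the per-gray-value table mi(i, k) for a pair of aligned 8-bit
    images, using Gaussian-smoothed joint and marginal entropies as in the
    formulas above. The matching cost of a pixel pair with gray values
    (i, k) is then taken as -mi[i, k]."""
    n = I1.size
    joint = np.histogram2d(I1.ravel(), I2.ravel(),
                           bins=bins, range=[[0, bins], [0, bins]])[0] / n
    # joint entropy term h_{I1,I2}(i, k)
    h12 = gaussian_filter(-np.log(gaussian_filter(joint, sigma) + 1e-12), sigma) / n
    # marginal entropy terms h_{I1}(i) and h_{I2}(k)
    p1, p2 = joint.sum(axis=1), joint.sum(axis=0)
    h1 = gaussian_filter1d(-np.log(gaussian_filter1d(p1, sigma) + 1e-12), sigma) / n
    h2 = gaussian_filter1d(-np.log(gaussian_filter1d(p2, sigma) + 1e-12), sigma) / n
    # mi(i, k) = h1(i) + h2(k) - h12(i, k)
    return h1[:, None] + h2[None, :] - h12
```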
(2) Matching cost aggregation
The core of the semi-global matching algorithm is to build an energy function E (D) that depends on the parallax image D:
E(D) = Σ_p ( C(p, D_p) + Σ_{q∈N_p} R_1·T[|D_p − D_q| = 1] + Σ_{q∈N_p} R_2·T[|D_p − D_q| > 1] )

where D_p is the parallax value of pixel point p; Σ_p C(p, D_p) is the sum of the matching costs of all pixel points; N_p is the neighbourhood pixel point set of pixel point p. The depth difference between pixel point p and its neighbourhood pixels may change either slightly or strongly; the second and third terms on the right-hand side penalize these two cases with the penalty coefficients R_1 and R_2 respectively. In the function T[·], if the condition |D_p − D_q| = 1 or |D_p − D_q| > 1 holds, T[·] evaluates to 1; otherwise T[·] evaluates to 0. The second and third terms are smoothness constraints that require the depth values of neighbouring pixel points to be as consistent as possible, i.e. to remain smooth. In order to prevent blurring of depth boundaries, the intensity change of the image has to be considered when determining R_2:

R_2 = R_2′ / |I_bp − I_bq|

where I_bp is the gray value of the current pixel point p and I_bq the gray value of a neighbourhood pixel point of p; R_2′ is a penalty coefficient.
Taking the left image as an example, with p running over all pixels in the left image, the semi-global matching algorithm obtains a two-dimensional smoothness-constrained result from one-dimensional smoothness-constrained results along 16 directions, as shown in fig. 2. The accumulated matching cost of pixel point p along the iteration direction r is calculated as follows:
L_r(p, d) = C(p, d) + min(A, B, C, D) − min_k L_r(p − r, k)
A = L_r(p − r, d)
B = L_r(p − r, d − 1) + R_1
C = L_r(p − r, d + 1) + R_1
D = min_i L_r(p − r, i) + R_2

where L_r(p, d) denotes the accumulated matching cost along a certain direction; C(p, d) is the matching cost of pixel point p at parallax d, and min(A, B, C, D) is the minimum matching cost, including the penalty coefficient, of the pixel point p − r preceding p in the current direction; i denotes parallaxes other than d, d − 1 and d + 1. The term min_k L_r(p − r, k) has no effect on the generation of the optimal path; its main purpose is to prevent L_r(p, d) from becoming too large, so that L_r(p, d) ≤ C_max + R_2, where C_max represents the matching cost threshold for a certain direction.
Thus, the matching costs in each direction are accumulated into a total matching cost:
S(p, d) = Σ_r L_r(p, d)

where S(p, d) denotes the accumulated matching cost over all directions for the left image.
Similarly, letting q run over all pixels in the right image, the accumulated matching cost over all directions of the right image is obtained as S(p(q, d), d).
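The 16-direction aggregation can be sketched as repeated one-dimensional scans of the cost volume. The following Python sketch aggregates along a single direction r = (dr, dc); summing its output over the 16 direction vectors gives S(p, d). It is a plain, unoptimized illustration, and the data layout (rows × columns × parallaxes) is an assumption.

```python
import numpy as np

def aggregate_path(C, r, R1=10, R2=120):
    """Aggregate the matching-cost volume C (rows x cols x parallaxes) along
    one scan direction r = (dr, dc), following the recursion for L_r above."""
    H, W, D = C.shape
    L = np.zeros_like(C, dtype=np.float64)
    dr, dc = r
    rows = range(H) if dr >= 0 else range(H - 1, -1, -1)
    cols = range(W) if dc >= 0 else range(W - 1, -1, -1)
    for y in rows:
        for x in cols:
            py, px = y - dr, x - dc                 # predecessor pixel p - r
            if 0 <= py < H and 0 <= px < W:
                prev = L[py, px]
                prev_min = prev.min()
                shift_m1 = np.concatenate(([np.inf], prev[:-1]))  # L_r(p-r, d-1)
                shift_p1 = np.concatenate((prev[1:], [np.inf]))   # L_r(p-r, d+1)
                cand = np.minimum.reduce([prev,
                                          shift_m1 + R1,
                                          shift_p1 + R1,
                                          np.full(D, prev_min + R2)])
                L[y, x] = C[y, x] + cand - prev_min
            else:
                L[y, x] = C[y, x]                   # image border: no predecessor
    return L
```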
(3) Parallax calculation method
The parallax is obtained from the aggregated matching cost. The calculation formula of the left image parallax map D1 is:

d_final_left = min_d S(p, d)

where d_final_left denotes the final left image parallax.

The calculation formula of the right image parallax map D2 is:

d_final_right = min_d S(p(q, d), d)

where d_final_right denotes the final right image parallax.
(4) Parallax optimization
After the parallax images of the left and right images have been generated, a consistency check is required: the parallaxes of the matching point pairs output by the algorithm are compared, and if the difference between the two is too large, the point is marked as unusable. In this way the pixel points of the left and right images can be mapped one to one, and occlusions and mismatches can be detected effectively.
Even after these operations the parallax image may still contain errors, so post-processing such as segmentation, spike filtering and fitting needs to be applied to the parallax image.
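The left-right consistency check can be expressed in a few lines of NumPy, as sketched below; the tolerance value, the rectified-geometry assumption (left pixel x corresponds to right pixel x − d) and the NaN marking of unusable points are assumptions.

```python
import numpy as np

def consistency_check(D_left, D_right, max_diff=1):
    """Keep a left-image parallax only if the parallax of its correspondence
    in the right image agrees within max_diff; otherwise flag the pixel as
    unusable (occlusion or mismatch)."""
    H, W = D_left.shape
    ys = np.arange(H)[:, None].repeat(W, axis=1)
    xs = np.arange(W)[None, :].repeat(H, axis=0)
    xr = np.clip(np.round(xs - D_left).astype(int), 0, W - 1)   # matched column in right image
    valid = np.abs(D_left - D_right[ys, xr]) <= max_diff
    out = D_left.astype(np.float64).copy()
    out[~valid] = np.nan                                        # mark unusable points
    return out
```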
Step 1-4: forward intersection.
According to the matching result and the rational polynomial coefficient (RPC, rational Polynomial Coefficient) after the three-dimensional area network adjustment, calculating the three-dimensional coordinate of each homonymy image point through the front intersection:
∂l_left/∂Lon·ΔLon + ∂l_left/∂Lat·ΔLat + ∂l_left/∂H·ΔH = l_left − l_left_rpc
∂r_left/∂Lon·ΔLon + ∂r_left/∂Lat·ΔLat + ∂r_left/∂H·ΔH = r_left − r_left_rpc
∂l_right/∂Lon·ΔLon + ∂l_right/∂Lat·ΔLat + ∂l_right/∂H·ΔH = l_right − l_right_rpc
∂r_right/∂Lon·ΔLon + ∂r_right/∂Lat·ΔLat + ∂r_right/∂H·ΔH = r_right − r_right_rpc

where ∂l_left/∂Lon, ∂l_left/∂Lat, ∂l_left/∂H are the first derivatives of the row coordinate of the left image with respect to longitude, latitude and elevation in the rational polynomial, and ∂r_left/∂Lon, ∂r_left/∂Lat, ∂r_left/∂H are the corresponding first derivatives of the column coordinate of the left image; ∂l_right/∂Lon, ∂l_right/∂Lat, ∂l_right/∂H and ∂r_right/∂Lon, ∂r_right/∂Lat, ∂r_right/∂H are the corresponding first derivatives of the row and column coordinates of the right image; ΔLon, ΔLat, ΔH are the correction values of the homonymous image point in the longitude, latitude and elevation directions; l_left, r_left respectively represent the row and column coordinates of the homonymous image point in the left image, and l_right, r_right the row and column coordinates of the homonymous image point in the right image; l_left_rpc, r_left_rpc respectively represent the row and column coordinates of the homonymous image point in the left image calculated from the rational polynomial coefficients, and l_right_rpc, r_right_rpc the row and column coordinates of the homonymous image point in the right image calculated from the rational polynomial coefficients.
The longitude and latitude of the centre point of the stereopair image can be chosen as the initial ground point position, and the initial l_left_rpc, r_left_rpc, l_right_rpc, r_right_rpc are computed from this centre point. The system above is then solved iteratively: each solution yields new longitude, latitude and elevation values, from which new l_left_rpc, r_left_rpc, l_right_rpc, r_right_rpc are recomputed. When the corrections fall below the set threshold (chosen according to the spatial resolution of the stereopair image), the iteration ends and the geographic coordinates of the stereo-matched homonymous image point are obtained.
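The iterative forward intersection of one homonymous image point can be sketched as a small linearized least-squares loop. The projection functions (adjusted RPC projection together with its 2 × 3 Jacobian) are assumed to be supplied by the surrounding pipeline; their interface is an assumption made for this sketch.

```python
import numpy as np

def forward_intersection(project_left, project_right,
                         obs_left, obs_right, X0, tol=1e-8, max_iter=20):
    """Forward intersection of one tie point by iterative linearized least
    squares. project_* map (lon, lat, h) to image (row, col) via the adjusted
    RPCs and also return the 2x3 Jacobian of that mapping."""
    X = np.asarray(X0, dtype=float)             # initial (lon, lat, h)
    obs = np.concatenate([obs_left, obs_right]) # measured (l, r) of both images
    for _ in range(max_iter):
        pl, Jl = project_left(X)
        pr, Jr = project_right(X)
        A = np.vstack([Jl, Jr])                 # 4 x 3 design matrix
        w = obs - np.concatenate([pl, pr])      # misclosure: measured - computed
        dX, *_ = np.linalg.lstsq(A, w, rcond=None)
        X += dX
        if np.abs(dX).max() < tol:
            break                               # corrections below threshold
    return X
```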
If the reference digital elevation model data exist, a specified threshold value can be set, and matching points with larger errors are removed through comparison with the generated geographic coordinates.
Step 1-5: digital surface model DSM generation.
By iterating the above formulas, the geographic coordinates of all stereo-matched homonymous image points are calculated, thereby generating a digital surface model DSM of the whole image.
Step 2: performing ground object classification on the stereopair images with the target context convolution neural network and extracting ground object boundaries.
In this scheme a ground object segmentation algorithm is built with the target (object) context convolution neural network (OCRNet). The overall idea is a coarse-to-fine semantic segmentation process: a rough segmentation result is first obtained with a general semantic segmentation model, while the features of each pixel are obtained from the backbone network; from this semantic information and the per-pixel features, the feature of each category can be computed. The similarity between the pixel features and the features of each category is then calculated, and from this similarity the probability that each pixel belongs to each category is obtained. Finally, the object context feature representation is concatenated with the feature representation of the deepest layer of the network and used as the context-enhanced feature representation, so that the semantic category of each pixel can be predicted from the enhanced feature representation.
Step 2-1: category region feature extraction.
The category region feature representation calculation formula is:
f_k = Σ_{i=1}^{L} m_ki · x_i

where k ∈ K denotes the k-th category and K is the total number of categories; f_k is the category region feature of the k-th category; m_ki is the regularized value of pixel p_i for the k-th category, p_i being the i-th pixel of the stereopair image; x_i is the original feature representation corresponding to pixel p_i; L is the size of the stereopair image.
Step 2-2: category region context information is calculated.
Calculating the similarity between the ith pixel of the stereopair image and each category area to obtain a characteristic expression form with the highest matching degree with the pixel, wherein the category area context information calculation formula is as follows:
y_i = Σ_{k=1}^{K} ( exp(κ(x_i, f_k)) / Σ_{j=1}^{K} exp(κ(x_i, f_j)) ) · δ(f_k)

where y_i represents the context information of the i-th pixel with respect to the category regions; k denotes the k-th category, j the j-th category, and K the total number of categories; κ(x_i, f_k) represents the similarity between feature x_i and category region feature f_k, and κ(x_i, f_j) the similarity between feature x_i and category region feature f_j; κ(x, f) = φ(x)^T·ψ(f), and φ(·), ψ(·), δ(·) are all transform networks, each of which can be regarded as a 1×1 Conv+BN+ReLU convolution module used to better learn the correlation between features.
Step 2-3: obtaining the context-enhanced feature representation.
The context-enhanced feature representation is obtained by concatenating the original pixel feature representation x_i with the category region context information representation y_i and reducing the dimension to the required output dimension with a 1×1 Conv+BN+ReLU convolution module. The calculation formula is:

z_i = W([x_i ; y_i])

where z_i is the context-enhanced feature and W(·) is the transform function applied to the concatenation of x_i and y_i.
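Steps 2-1 to 2-3 can be condensed into a few NumPy matrix operations, as sketched below. The transform networks φ, ψ, δ and W are replaced by identity mappings here, so this is only a structural illustration of the object-contextual aggregation, not the trained network of the patent.

```python
import numpy as np

def ocr_context(x, m):
    """Object-contextual aggregation sketch.
    x : (L, C) per-pixel features;  m : (K, L) per-class soft masks,
    already normalized over the spatial dimension (the m_ki values).
    Returns the concatenated representation [x_i ; y_i] of shape (L, 2C)."""
    f = m @ x                                         # (K, C) class-region features f_k
    sim = x @ f.T                                     # kappa(x_i, f_k) similarities
    w = np.exp(sim - sim.max(axis=1, keepdims=True))  # softmax over the K categories
    w /= w.sum(axis=1, keepdims=True)
    y = w @ f                                         # (L, C) context features y_i
    return np.concatenate([x, y], axis=1)             # z_i before the final 1x1 transform
```

For L pixels with C-dimensional features and K categories, the result has 2C channels per pixel, which the 1×1 convolution of step 2-3 would then reduce to the required output dimension.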
Step 3: dividing the digital surface model DSM into a correction target area and a non-correction area according to the classification result of the target context convolution neural network; applying elevation surface fitting to the correction target area and smoothing filtering to the non-correction area to jointly generate the digital elevation model DEM.
Step 3-1: establishing the correction target area of the digital elevation model.
The stereopair image is divided into grids of fixed size with the fractal net evolution approach (FNEA) segmentation algorithm, the category of each subregion is determined from the constructed classification model, and the subregions are merged to construct the correction target area. The FNEA algorithm is a region-merging algorithm based on image objects. Its basic idea is: select any pixel in the area to be segmented as an object, merge all pixels in its neighbourhood that satisfy the set heterogeneity threshold, then take the merged region as the object and search its neighbourhood for pixels that satisfy the threshold, merging them, and repeat until the set threshold is no longer satisfied. The merging process of the FNEA algorithm is shown in fig. 3.
The correction target subregions obtained from the classification result of the target context convolution neural network generally do not form a continuous, connected region, so the target subregions to be corrected need to be merged to obtain the final correction target area. Fig. 4 illustrates how correction target subregions are merged into a correction target area: starting from an arbitrarily selected correction target subregion identified by the classification result, the surroundings are searched outward, and every correction target subregion within radius r is an associated subregion; this process is repeated, searching outward from each newly found associated subregion for further associated subregions until the search is exhausted, and all associated subregions are enclosed by an outer polygon, which is the correction target area.
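The association of correction target subregions within radius r can be sketched as a breadth-first neighbour search over subregion centroids, with the outer polygon approximated by a convex hull. SciPy's cKDTree and ConvexHull are used here as assumed tools, and representing each subregion by its centroid is a simplification.

```python
import numpy as np
from scipy.spatial import cKDTree, ConvexHull

def merge_subregions(centroids, r):
    """Group correction-target subregions whose centroids lie within radius r
    of an already-found member (grown by repeated outward search) and return
    the outer polygon of each group."""
    tree = cKDTree(centroids)
    unvisited = set(range(len(centroids)))
    regions = []
    while unvisited:
        seed = unvisited.pop()
        group, frontier = {seed}, [seed]
        while frontier:                                  # search outward from found members
            idx = frontier.pop()
            for nb in tree.query_ball_point(centroids[idx], r):
                if nb in unvisited:
                    unvisited.remove(nb)
                    group.add(nb)
                    frontier.append(nb)
        pts = centroids[list(group)]
        hull = pts[ConvexHull(pts).vertices] if len(pts) >= 3 else pts
        regions.append(hull)                             # outer polygon of the region
    return regions
```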
Step 3-2: selection of elevation training samples for the correction target area.
The fitting accuracy of the correction target area depends on the selection of the elevation training samples. In this scheme the set of elevation data points within the interval extending outward from each correction target area by distance D is taken as the elevation training sample, and the search is carried out with the elevation-difference attenuation function J_D, as shown in fig. 5.
The elevation-difference attenuation function J_D is computed from the elevation data points within the selected interval, where D is the outward extension distance of the correction target area, m(D) is the number of elevation points in the selected interval of the elevation training sample, i.e. the number of associated elevation data points, and h_i is the elevation value of the i-th elevation data point.
The relation between the search range of the n-th iteration and that of the (n+1)-th iteration is:

D_{n+1} = D_n + η·d

where d is the step size of the outward search distance and η is the sensitivity. When the elevation-difference attenuation function J_D converges, the iteration ends: convergence of J_D means that the search results of two successive iterations change only smoothly. The elevation training sample selection interval extending outward from the correction target area is thereby determined, and the associated elevation data point set is selected as the elevation training sample.
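The outward search for the training sample interval can be sketched as the loop below. Because the closed form of J_D is not reproduced above, the sketch uses the standard deviation of the associated elevation values as a stand-in statistic; that choice, the callable interface and the default parameters are assumptions.

```python
import numpy as np

def select_training_interval(heights_within, d=5.0, eta=0.5,
                             D0=10.0, eps=1e-3, max_iter=50):
    """Grow the outward extension distance D until the statistic J_D computed
    from the elevation points inside the interval stabilizes.
    heights_within(D) returns the associated elevation values within D."""
    D = D0
    J_prev = np.std(heights_within(D))
    for _ in range(max_iter):
        D_next = D + eta * d                  # D_{n+1} = D_n + eta * d
        J = np.std(heights_within(D_next))
        if abs(J - J_prev) < eps:             # successive searches change smoothly
            return D_next
        D, J_prev = D_next, J
    return D
```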
Step 3-3: elevation surface fitting of the correction target area.
In order to prevent gross errors in the elevation training sample from degrading the accuracy of the elevation surface fitting, this scheme adopts a radial basis function (RBF) neural network elevation fitting method that takes gross errors into account. First, the grid point elevations of the associated elevation data point set are mapped to gray values of the stereopair image; then the gross errors in the elevation training sample are detected by a difference-of-Gaussians (DoG) extremum method; the detected gross errors are corrected by least-squares moving surface fitting; and finally the RBF neural network performs surface fitting on the gross-error-corrected elevation data point set.
The RBF neural network comprises an input layer, a hidden layer and an output layer, the hidden layer applying the activation (basis) functions. Let the input be a vector x and let the output layer produce a scalar function of x. In the interpolation over n elevation data points, the mapping from input vector x_p to output t_p is expressed as f(x_p) = t_p, where p = 1, 2, …, n.
In an RBF neural network, the output function f (x) is a weighted linear combination of a series of radial basis functions:
f(x) = Σ_{p=1}^{n} w_p · φ(‖x − x_p‖)

where w_p is the weight of the p-th elevation data point; ‖x − x_p‖ is the Euclidean distance from the elevation data point x to the p-th elevation data point x_p; φ(·) is the radial basis function, and each elevation data point corresponds to the n basis functions φ(‖x − x_p‖).
The interpolation formula for the q-th elevation data point is:
f(x_q) = Σ_{p=1}^{n} w_p · φ(‖x_q − x_p‖) = t_q
At this time f(x) is a continuous fitted surface. In the fitting process the RBF neural network adopts the multiquadric function as the radial basis function, which yields a good fitting result; its calculation formula is:

φ(r) = sqrt(r² + ε²)

where r = ‖x − x_p‖ is the Euclidean distance between the elevation data point x and the p-th elevation data point; ε is a user-defined smoothing coefficient with 0 ≤ ε ≤ 1.
The weighted objective function is defined as:
E = Σ_{q=1}^{n} e_q²

where e_q is the error at the q-th elevation data point input to the RBF neural network. The objective function E is made to converge by iteration, yielding the continuous fitted surface f(x).
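A minimal RBF surface fit over the gross-error-corrected elevation points is sketched below, using the multiquadric basis with smoothing coefficient ε as described above (itself a reconstruction); solving the dense interpolation system directly is an implementation choice, not part of the patent text.

```python
import numpy as np

def fit_rbf_surface(points, heights, eps=0.5):
    """Fit an RBF elevation surface through the training points.
    points : (n, 2) planimetric coordinates;  heights : (n,) elevations.
    Returns a callable f(query) evaluating the fitted surface."""
    diff = points[:, None, :] - points[None, :, :]
    r = np.linalg.norm(diff, axis=2)                 # pairwise distances ||x_q - x_p||
    Phi = np.sqrt(r**2 + eps**2)                     # multiquadric basis matrix
    w = np.linalg.solve(Phi, heights)                # weights w_p from f(x_p) = t_p

    def f(query):                                    # continuous fitted surface f(x)
        q = np.atleast_2d(query)
        rq = np.linalg.norm(q[:, None, :] - points[None, :, :], axis=2)
        return np.sqrt(rq**2 + eps**2) @ w
    return f
```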
Example 2:
This embodiment tests and analyses the scheme of embodiment 1, taking the forward- and backward-looking WorldView-3 stereopair images of a certain airport as the study object: the resolution is 0.4 m, the acquisition time is May 2019, the stereopair image size is 19000 × 9000 pixels, and the study area contains buildings, roads, water bodies, etc. The algorithm is implemented in C++ and Python under the Windows 10 Professional operating system, the development platforms are Visual Studio 2019 and PyCharm, and the hardware is a PC and a server with the following configuration: Intel(R) Xeon(R) Gold 6234 CPU @ 3.30 GHz, 2 × 128 GB memory, 2 × GeForce RTX 3090 (24 GB).
The specific parameters in the test are set as follows: penalty coefficient R_1 is 10 and R_2 is 120; the number of samples is 3912 with size 512 × 512, divided into training, test and validation sets in the ratio 0.8:0.1:0.1; the number of iterations (epochs) is set to 300, the batch size to 16 and the initial learning rate γ to 0.01; the learning rate decays with a polynomial schedule; the optimizer is SGD with momentum 0.9; and the loss function consists of two cross-entropy losses, used for the attention (soft object region) branch and for the classification result.
Please refer to fig. 6, which shows the test results of high-precision digital elevation model DEM generation in this embodiment: (a) in fig. 6 is the original stereopair image; (b) in fig. 6 is the building extraction result based on the target context convolution neural network; (c) in fig. 6 is a schematic diagram of the generated digital surface model DSM with a grid spacing of 0.5 m; (d) in fig. 6 is a schematic diagram of the generated digital elevation model DEM with a grid spacing of 0.5 m.
Please refer to fig. 7, which shows the digital surface model DSM result as a shaded relief map together with an enlarged view of a local area, and to fig. 8, which shows the digital elevation model DEM result in the same way. As can be seen from fig. 7 and fig. 8, the edge contours of buildings are clearly visible in the digital surface model DSM, while the digital elevation model DEM generated after removing the buildings is relatively smooth. However, the quality of the generated digital elevation model DEM depends strongly on the deep-learning extraction result: if a building edge is not extracted, the erroneous building elevation is preserved in the digital elevation model DEM.
In order to quantitatively evaluate the accuracy of the digital elevation model DEM generated by this scheme, the control point data acquired in the field by GPS-RTK are taken as true values, and part of the control points are extracted as checkpoints to evaluate the DEM accuracy; their distribution is shown in fig. 9. The elevation error of the checkpoints is calculated by the following formula, and the accuracy statistics are given in Table 1.
RMSE = sqrt( (1/n) · Σ_{i=1}^{n} (H_i − H_i′)² )

where RMSE is the elevation root-mean-square error of the checkpoints; n is the number of checkpoints; H_i is the elevation value of the digital elevation model DEM generated by this scheme at the i-th checkpoint, and H_i′ is the corresponding reference elevation value.
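For reference, the checkpoint statistic is a plain root-mean-square error, e.g.:

```python
import numpy as np

def checkpoint_rmse(h_dem, h_ref):
    """Elevation RMSE of the checkpoints: DEM heights h_dem against
    reference (GPS-RTK) heights h_ref."""
    h_dem, h_ref = np.asarray(h_dem), np.asarray(h_ref)
    return float(np.sqrt(np.mean((h_dem - h_ref) ** 2)))
```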
TABLE 1 Digital elevation model DEM accuracy assessment (checkpoint elevation RMSE: 4.39 m)
As can be seen from Table 1, the elevation RMSE is 4.39 m, which meets the quality requirement for 1:50000 high-precision digital elevation model DEM products.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A digital elevation model extraction method based on a target context convolution neural network, characterized by comprising the following steps:
step 1, sequentially carrying out stereoscopic area network adjustment, epipolar image generation, dense image matching and forward intersection on the stereopair images, thereby generating a digital surface model DSM;
step 2, performing ground object classification on the stereopair images by using a target context convolution neural network, and extracting ground object boundaries;
step 3, dividing the digital surface model DSM into a correction target area and a non-correction area according to the classification result of the target context convolution neural network; applying elevation surface fitting to the correction target area and smoothing filtering to the non-correction area to jointly generate the digital elevation model DEM.
2. The digital elevation model extraction method based on the target context convolution neural network according to claim 1, wherein the stereoscopic area network adjustment of the stereopair images in step 1 comprises:
and constructing a rational polynomial model:
v_l = Δa_0 + s·Δa_s + l·Δa_l + (∂l/∂Lon)·ΔLon + (∂l/∂Lat)·ΔLat + (∂l/∂H)·ΔH − F_l
v_s = Δb_0 + s·Δb_s + l·Δb_l + (∂s/∂Lon)·ΔLon + (∂s/∂Lat)·ΔLat + (∂s/∂H)·ΔH − F_s

wherein l denotes the row and s the column of an image point in the stereopair image; v_l, v_s are the residuals of the image point coordinate observations; l_meas, s_meas are the coordinate observations of the homonymous image points; Δa_0, Δa_s, Δa_l, Δb_0, Δb_s, Δb_l are the correction values of the affine transformation parameters of the rational polynomial of the stereopair image; ∂l/∂Lon, ∂l/∂Lat, ∂l/∂H are the first derivatives of the row coordinate of the stereopair image with respect to longitude, latitude and elevation in the rational polynomial, and ∂s/∂Lon, ∂s/∂Lat, ∂s/∂H are the corresponding first derivatives of the column coordinate; ΔLon, ΔLat, ΔH are the correction values of the homonymous image point in the longitude, latitude and elevation directions; F_l, F_s are constant terms representing the differences, in the row and column directions respectively, between the measured image coordinates and the image coordinates calculated from the rational polynomial:

F_l = l_meas − (l_rpc + a_0 + a_s·s + a_l·l)
F_s = s_meas − (s_rpc + b_0 + b_s·s + b_l·l)

wherein l_rpc, s_rpc are the image row and column coordinates calculated from the rational polynomial coefficients, and a_0, a_s, a_l, b_0, b_s, b_l are the affine transformation parameters of the rational polynomial of the stereopair image.
3. The digital elevation model extraction method based on the target context convolution neural network according to claim 2, wherein the step of generating the epipolar image in step 1 comprises:
constructing, according to the stereoscopic area network adjustment result, a quadratic polynomial model based on an image-space epipolar image generation model:

l_epi = c_0 + c_1·l_org + c_2·r_org + c_3·l_org² + c_4·l_org·r_org + c_5·r_org²
r_epi = d_0 + d_1·l_org + d_2·r_org + d_3·l_org² + d_4·l_org·r_org + d_5·r_org²

wherein l_epi, r_epi are the row and column coordinates of the epipolar image; l_org, r_org are the row and column coordinates of the stereopair image after the stereoscopic area network adjustment; c_0, c_1, c_2, c_3, c_4, c_5 represent the quadratic polynomial fitting parameters of the epipolar image row coordinate, and d_0, d_1, d_2, d_3, d_4, d_5 represent the quadratic polynomial fitting parameters of the epipolar image column coordinate.
4. The digital elevation model extraction method based on the target context convolution neural network according to claim 3, wherein the step of dense image matching in step 1 comprises:
based on a semi-global matching algorithm, the mutual information of each pixel is used as a matching primitive, and the matching degree of pixels between the left image and the right image is calculated by using a mutual information matching cost calculation method:
MI_{I1,I2} = Σ_{p∈P} mi_{I1,I2}(I_1p, I_2p)

wherein I_1 represents the gray values of the left image and I_2 the gray values of the right image; MI_{I1,I2} represents the mutual information matching cost value of the left and right images, and mi_{I1,I2} represents the mutual information matching cost value of a pixel point in the left and right images; P represents the pixel point set, p ∈ P; I_1p represents the gray value of a pixel point in the left image and I_2p the gray value of the corresponding pixel point in the right image;

mi_{I1,I2}(i, k) = h_{I1}(i) + h_{I2}(k) − h_{I1,I2}(i, k)

wherein i and k represent the gray values of corresponding image points of the left and right images; mi_{I1,I2} represents the mutual information of the left and right images; h_{I1} represents the entropy of the left image, h_{I2} the entropy of the right image, and h_{I1,I2} the joint entropy of the left and right images;

h_I(x) = −(1/n) · log(g_I(x) ⊗ g(x)) ⊗ g(x)

h_{I1,I2}(i, k) = −(1/n) · log(g_{I1,I2}(i, k) ⊗ g(i, k)) ⊗ g(i, k)

wherein g_{I1,I2}(i, k) represents the joint probability distribution of the gray values of the left and right images, and the Gaussian convolution operation is applied over the gray values (i, k) of the corresponding image points of the left and right images;
establishing an energy function E (D) depending on the parallax image D:
E(D) = Σ_p ( C(p, D_p) + Σ_{q∈N_p} R_1·T[|D_p − D_q| = 1] + Σ_{q∈N_p} R_2·T[|D_p − D_q| > 1] )

wherein D_p represents the parallax value of pixel point p; Σ_p C(p, D_p) is the sum of the matching costs of all pixel points; N_p represents the neighbourhood pixel point set of pixel point p; R_1, R_2 represent penalty coefficients; in the function T[·], if the condition |D_p − D_q| = 1 or |D_p − D_q| > 1 is true, the result of T[·] is 1, otherwise the result of T[·] is 0;
taking p as all pixel points in the left image, and obtaining a two-dimensional smooth constraint result by a semi-global matching algorithm through one-dimensional smooth constraint results along N directions; the accumulated matching cost calculation formula of the pixel point p in the iteration direction r is as follows:
L_r(p, d) = C(p, d) + min(A, B, C, D) − min_k L_r(p − r, k)
A = L_r(p − r, d)
B = L_r(p − r, d − 1) + R_1
C = L_r(p − r, d + 1) + R_1
D = min_i L_r(p − r, i) + R_2

wherein L_r(p, d) represents the accumulated matching cost in a certain direction; C(p, d) is the matching cost of pixel point p at parallax d, and min(A, B, C, D) is the minimum matching cost, including the penalty coefficient, of the pixel point p − r preceding p in the current direction; i represents parallaxes other than d, d − 1 and d + 1; min_k L_r(p − r, k) is a limiting term that keeps L_r(p, d) ≤ C_max + R_2, C_max representing the matching cost threshold for a certain direction;
the matching costs in each direction are accumulated as the total matching cost:
S(p, d) = Σ_r L_r(p, d)

wherein S(p, d) represents the accumulated matching cost over all directions of the left image;
taking q as all pixel points in the right image, the accumulated matching cost over all directions of the right image is S(p(q, d), d);
the calculation formula of the left image parallax map D1 is:
d_final_left = min_d S(p, d)
wherein d_final_left denotes the final left image parallax;
the calculation formula of the right image parallax map D2 is:
d_final_right = min_d S(p(q, d), d)
wherein d_final_right denotes the final right image parallax.
5. The digital elevation model extraction method based on the target context convolution neural network according to claim 4, wherein the forward intersection processing in step 1 comprises:
according to the matching result and the rational polynomial coefficient after the three-dimensional area network adjustment, calculating the three-dimensional coordinate of each homonymy image point through front intersection processing:
∂l_left/∂Lon·ΔLon + ∂l_left/∂Lat·ΔLat + ∂l_left/∂H·ΔH = l_left − l_left_rpc
∂r_left/∂Lon·ΔLon + ∂r_left/∂Lat·ΔLat + ∂r_left/∂H·ΔH = r_left − r_left_rpc
∂l_right/∂Lon·ΔLon + ∂l_right/∂Lat·ΔLat + ∂l_right/∂H·ΔH = l_right − l_right_rpc
∂r_right/∂Lon·ΔLon + ∂r_right/∂Lat·ΔLat + ∂r_right/∂H·ΔH = r_right − r_right_rpc

wherein ∂l_left/∂Lon, ∂l_left/∂Lat, ∂l_left/∂H are the first derivatives of the row coordinate of the left image with respect to longitude, latitude and elevation in the rational polynomial, and ∂r_left/∂Lon, ∂r_left/∂Lat, ∂r_left/∂H are the corresponding first derivatives of the column coordinate of the left image; ∂l_right/∂Lon, ∂l_right/∂Lat, ∂l_right/∂H and ∂r_right/∂Lon, ∂r_right/∂Lat, ∂r_right/∂H are the corresponding first derivatives of the row and column coordinates of the right image; ΔLon, ΔLat, ΔH are the correction values of the homonymous image point in the longitude, latitude and elevation directions; l_left, r_left respectively represent the row and column coordinates of the homonymous image point in the left image, and l_right, r_right the row and column coordinates of the homonymous image point in the right image; l_left_rpc, r_left_rpc respectively represent the row and column coordinates of the homonymous image point on the left image calculated from the rational polynomial coefficients, and l_right_rpc, r_right_rpc the row and column coordinates of the homonymous image point on the right image calculated from the rational polynomial coefficients;
performing iterative computation on l_left_rpc, r_left_rpc, l_right_rpc, r_right_rpc to obtain the geographic coordinates of the stereo-matched homonymous image points;
and calculating the geographic coordinates of all the three-dimensional matching homonymous image points, thereby generating a digital surface model DSM of the whole image.
6. The digital elevation model extraction method based on the target context convolution neural network according to claim 1, wherein step 2 specifically comprises:
step 2-1: extracting category region features:
f_k = Σ_{i=1}^{L} m_ki · x_i

wherein k ∈ K denotes the k-th category and K is the total number of categories; f_k is the category region feature of the k-th category; m_ki is the regularized value of pixel p_i for the k-th category, p_i being the i-th pixel of the stereopair image; x_i is the original feature representation corresponding to pixel p_i; L is the size of the stereopair image;
step 2-2: calculating category region context information;
calculating the similarity between the ith pixel of the stereopair image and each category area to obtain a characteristic expression form with the highest matching degree with the pixel, wherein the category area context information calculation formula is as follows:
y_i = Σ_{k=1}^{K} ( exp(κ(x_i, f_k)) / Σ_{j=1}^{K} exp(κ(x_i, f_j)) ) · δ(f_k)

wherein y_i represents the context information of the i-th pixel with respect to the category regions; k denotes the k-th category, j the j-th category, and K the total number of categories; κ(x_i, f_k) represents the similarity between feature x_i and category region feature f_k, and κ(x_i, f_j) the similarity between feature x_i and category region feature f_j; κ(x, f) = φ(x)^T·ψ(f), and φ(·), ψ(·), δ(·) are all transform networks;
step 2-3: obtaining the context-enhanced feature representation;
the context-enhanced feature representation is obtained by concatenating the original pixel feature representation x_i with the category region context information representation y_i and reducing the dimension to the required output dimension with a 1×1 Conv+BN+ReLU convolution module, the calculation formula being:

z_i = W([x_i ; y_i])

wherein z_i is the context-enhanced feature and W(·) is the transform function.
7. The digital elevation model extraction method based on the target context convolution neural network according to claim 1, wherein step 3 specifically comprises:
step 3-1: establishing a correction target area of the digital elevation model;
step 3-2: selecting a correction target area elevation training sample;
defining the elevation data point set within the interval extending outward from each correction target area by distance D as the elevation training sample, and computing the elevation-difference attenuation function J_D from these points, wherein D is the outward extension distance of the correction target area, m(D) is the number of elevation points within the selected interval of the elevation training sample, i.e. the number of associated elevation data points, and h_i represents the elevation value of the i-th elevation data point;
the relation between the search range of the n-th iteration and that of the (n+1)-th iteration is calculated as:
D_{n+1} = D_n + η · d
wherein d is the step length of the outward distance search, and η is the sensitivity;
when J_D converges, the elevation training sample selection interval extending outward from the correction target area is determined, and the associated elevation data point set is selected as the elevation training sample;
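The outward search of step 3-2 can be sketched as follows. The defining equation of J_D is not legible in the source, so the sketch assumes J_D is simply the mean elevation of the currently selected points, and points_within(area, D) is a hypothetical helper returning the elevation values h_i within distance D of the correction target area:

```python
import numpy as np

def select_training_interval(area, points_within, d=10.0, eta=0.5,
                             d0=10.0, tol=1e-3, max_iter=100):
    """Grow the sampling distance D by eta*d per iteration until J_D stabilizes."""
    D, j_prev, h = d0, None, np.array([])
    for _ in range(max_iter):
        h = np.asarray(points_within(area, D))    # associated elevation data points
        j = h.mean() if h.size else np.inf        # assumed form of J_D
        if j_prev is not None and abs(j - j_prev) < tol:
            break                                 # J_D has converged
        j_prev = j
        D = D + eta * d                           # D_{n+1} = D_n + eta * d
    return D, h                                   # selection interval and training sample
```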
step 3-3: performing elevation surface fitting on the correction target area;
mapping the grid point elevations of the associated elevation data point set to the gray values of the stereopair image, then detecting gross errors in the elevation training sample by a DoG (difference of Gaussians) extremum method, correcting the detected gross errors by a least-squares moving surface fitting method, and finally performing surface fitting on the gross-error-corrected elevation data point set by an RBF neural network;
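One possible reading of the DoG gross-error screening, sketched under the assumption that the training sample has been gridded into a 2-D elevation array; the kernel widths and the 3σ threshold are illustrative choices, not values taken from the claim:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_gross_errors(elev_grid, sigma1=1.0, sigma2=2.0, k=3.0):
    """Flag grid cells whose difference-of-Gaussians response is an extreme outlier."""
    dog = gaussian_filter(elev_grid, sigma1) - gaussian_filter(elev_grid, sigma2)
    return np.abs(dog) > k * dog.std()            # boolean mask of suspected gross errors
```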
the RBF neural network comprises an input layer, a hidden layer and an output layer, the hidden layer applying the activation function; let the input data be a vector x, the output layer outputs a scalar function of x, and in the process of interpolation using n elevation data points, the mapping from the input vector x_p to the output t_p is expressed as f(x_p) = t_p, where p = 1, 2, …, n;
in the RBF neural network, the output function f(x) is:
$$f(x) = \sum_{p=1}^{n} w_p\, \varphi\big(\lVert x - x_p \rVert\big)$$
wherein w_p is the weight of the p-th elevation data point; ‖x − x_p‖ is the Euclidean distance between the elevation data point x and the p-th elevation data point x_p; φ(·) is the basis function, and the n elevation data points correspond to n basis functions φ(‖x − x_p‖);
The interpolation formula for the q-th elevation data point is:
$$f(x_q) = \sum_{p=1}^{n} w_p\, \varphi\big(\lVert x_q - x_p \rVert\big) = t_q$$
at this time, f(x) is a continuous fitted surface, and the RBF neural network adopts a multiquadric function as the radial basis function in the fitting process:
$$\varphi(r) = \sqrt{r^2 + \varepsilon^2}$$
wherein r = ‖x − x_p‖ is the Euclidean distance between the elevation data point x and the p-th elevation data point; ε is a user-defined smoothing coefficient with 0 ≤ ε ≤ 1;
the weighted objective function is defined as:
$$E = \sum_{q=1}^{n} e_q^{\,2}$$
wherein e_q is the error when the q-th elevation data point is input to the RBF neural network;
the weighted objective function E is iterated to convergence, yielding the continuous fitted surface f(x).
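For illustration, the RBF surface fit of step 3-3 can be sketched by solving the n×n interpolation system Φw = t directly (the claim instead iterates the weighted objective E to convergence); the multiquadric form sqrt(r² + ε²) is an assumption consistent with the definitions above:

```python
import numpy as np

def multiquadric(r, eps=0.5):
    return np.sqrt(r**2 + eps**2)                 # assumed multiquadric basis

def fit_rbf_surface(xy, h, eps=0.5):
    """xy: (n, 2) planimetric coordinates of elevation points; h: (n,) elevations."""
    r = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)   # pairwise distances
    w = np.linalg.solve(multiquadric(r, eps), h)                   # weights w_p

    def f(query_xy):
        rq = np.linalg.norm(np.asarray(query_xy)[:, None, :] - xy[None, :, :], axis=-1)
        return multiquadric(rq, eps) @ w           # fitted surface f(x) at query points
    return f
```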
CN202310106547.9A 2023-02-13 2023-02-13 Digital elevation model extraction method based on target context convolution neural network Pending CN116385892A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310106547.9A CN116385892A (en) 2023-02-13 2023-02-13 Digital elevation model extraction method based on target context convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310106547.9A CN116385892A (en) 2023-02-13 2023-02-13 Digital elevation model extraction method based on target context convolution neural network

Publications (1)

Publication Number Publication Date
CN116385892A true CN116385892A (en) 2023-07-04

Family

ID=86966299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310106547.9A Pending CN116385892A (en) 2023-02-13 2023-02-13 Digital elevation model extraction method based on target context convolution neural network

Country Status (1)

Country Link
CN (1) CN116385892A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10325370B1 (en) * 2016-05-31 2019-06-18 University Of New Brunswick Method and system of coregistration of remote sensing images
CN108415871A (en) * 2017-02-10 2018-08-17 北京吉威时代软件股份有限公司 Based on the half matched intensive DSM generation methods of global multi-view images of object space
CN111126148A (en) * 2019-11-25 2020-05-08 长光卫星技术有限公司 DSM (digital communication system) generation method based on video satellite images

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
YUHUI YUAN ET AL.: "Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation", arXiv *
ZHEN QIAN ET AL.: "Deep Roof Refiner: A detail-oriented deep learning network for refined delineation of roof structure lines using satellite imagery", ScienceDirect, vol. 107 *
YE LEJIA: "Terrain Reconstruction from Multi-View Lunar Images Based on Object-Space Semi-Global Matching", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 2 *
WU WEIDONG: "Extracting Digital Elevation Models of Mountain Forest Areas by Combining Satellite Remote Sensing Images and Photon LiDAR Data", China Master's Theses Full-text Database, Basic Sciences, no. 1 *
GUO WEI ET AL.: "Fractal Network Segmentation Algorithm under Edge Constraints", Geomatics and Information Science of Wuhan University *
YOU QI: "DEM Repair for UAV Low-Altitude Photogrammetry Supported by Convolutional Neural Networks", China Master's Theses Full-text Database, Basic Science and Technology, no. 1 *
GUO SIYUE: "Research on 3D Reconstruction Technology Based on Satellite Remote Sensing Images", China Master's Theses Full-text Database, Basic Sciences, no. 2 *
GUO SIYUE: "Research on 3D Reconstruction Technology Based on Satellite Remote Sensing Images", China Master's Theses Full-text Database, Basic Science and Technology, no. 2 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821577A (en) * 2023-08-30 2023-09-29 山东卓业医疗科技有限公司 Method for calculating anisotropic function in dosage calculation based on TPSS algorithm
CN116821577B (en) * 2023-08-30 2023-11-21 山东卓业医疗科技有限公司 Method for calculating anisotropic function in dosage calculation based on TPSS algorithm

Similar Documents

Publication Publication Date Title
CN110969088B (en) Remote sensing image change detection method based on significance detection and deep twin neural network
Lafarge et al. Structural approach for building reconstruction from a single DSM
CN104156536B (en) The visualization quantitatively calibrating and analysis method of a kind of shield machine cutter abrasion
CN109887021B (en) Cross-scale-based random walk stereo matching method
CN110197505B (en) Remote sensing image binocular stereo matching method based on depth network and semantic information
CN114332348B (en) Track three-dimensional reconstruction method integrating laser radar and image data
CN113344956B (en) Ground feature contour extraction and classification method based on unmanned aerial vehicle aerial photography three-dimensional modeling
Ermolaev et al. Automated construction of the boundaries of basin geosystems for the Volga Federal District
CN113358091B (en) Method for producing digital elevation model DEM (digital elevation model) by using three-linear array three-dimensional satellite image
CN102169581A (en) Feature vector-based fast and high-precision robustness matching method
Báčová et al. A GIS method for volumetric assessments of erosion rills from digital surface models
CN113065467B (en) Satellite image low coherence region identification method and device based on deep learning
CN112419196B (en) Unmanned aerial vehicle remote sensing image shadow removing method based on deep learning
CN105447452A (en) Remote sensing sub-pixel mapping method based on spatial distribution characteristics of features
AU2020103470A4 (en) Shadow Detection for High-resolution Orthorectificed Imagery through Multi-level Integral Relaxation Matching Driven by Artificial Shadows
CN113033432A (en) Remote sensing image residential area extraction method based on progressive supervision
CN110992366A (en) Image semantic segmentation method and device and storage medium
CN116385892A (en) Digital elevation model extraction method based on target context convolution neural network
CN112396039A (en) Mars grid terrain map generation method based on neighborhood relationship
CN115471749A (en) Multi-view multi-scale target identification method and system for extraterrestrial detection unsupervised learning
CN111611525B (en) Remote sensing data elevation calculation method based on object space matching elevation deviation iterative correction
CN109886988B (en) Method, system, device and medium for measuring positioning error of microwave imager
CN114998630B (en) Ground-to-air image registration method from coarse to fine
Sohn et al. Sequential modelling of building rooftops by integrating airborne LiDAR data and optical imagery: preliminary results
Crespi et al. DSM generation from very high optical and radar sensors: Problems and potentialities along the road from the 3D geometric modeling to the Surface Model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination