CN103955710B - Method for monocular vision space recognition in quasi-earth gravitational field environment - Google Patents


Info

Publication number
CN103955710B
CN103955710B (application CN201410212438.6A)
Authority: CN (China)
Prior art keywords: image, sky, block, ground, value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410212438.6A
Other languages
Chinese (zh)
Other versions
CN103955710A (en)
Inventor
郑李明 (Zheng Liming)
崔兵兵 (Cui Bingbing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Yuanjue Information And Technology Co
Original Assignee
Jinling Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinling Institute of Technology filed Critical Jinling Institute of Technology
Priority to CN201410212438.6A priority Critical patent/CN103955710B/en
Publication of CN103955710A publication Critical patent/CN103955710A/en
Priority to US14/684,431 priority patent/US9390348B2/en
Priority to US14/684,428 priority patent/US9471853B2/en
Priority to US14/684,434 priority patent/US9626598B2/en
Priority to US14/684,433 priority patent/US9805293B2/en
Application granted granted Critical
Publication of CN103955710B publication Critical patent/CN103955710B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for monocular vision space recognition in a quasi-earth gravitational field environment, characterized by comprising the following steps: firstly, superpixel segmentation is performed on the image based on the CIELAB color space values L, a, b of each pixel and its x, y coordinates, generating a superpixel image; secondly, the segmented superpixel image is reduced in dimension and merged into large image blocks by a clustering algorithm based on superpixel color features, texture feature vector distances and adjacency relations; thirdly, the pixels of each large block are multiplied by gravitational field fuzzy distribution density functions representing the sky, the ground and vertical (facade) objects, the expected values of the block are computed, and a preliminary classification into sky, ground and facade objects is completed; fourthly, a refined classification map of sky, ground and facade objects is extracted through single-layer wavelet sampling and Manhattan direction features; finally, a spatial depth perception map is generated from a pinhole imaging model and ground linear perspective information. The method is easy to implement, offers high resolution and has a wide application range.

Description

Method for monocular vision space recognition in a quasi-earth gravitational field environment
Technical Field
The invention relates to an image processing method, and in particular to a monocular visual space recognition method for quasi-earth gravitational field environments, which improves spatial recognition and can be widely applied in fields such as robot visual navigation, large-space target measurement, and target tracking and positioning.
Background
Understanding 3D spatial structure is a basic problem of machine vision and has long attracted attention and research; early work focused on stereoscopic vision or on obtaining 3D cues through viewpoint motion. In recent years many researchers have focused on reconstructing 3D spatial structure from monocular images, and most current monocular 3D space recognition methods employ supervised machine learning, such as Markov Random Fields (MRFs), Conditional Random Fields (CRFs) and Dynamic Bayesian Networks (DBNs). However, these methods rely on prior knowledge, i.e. they learn only the image environments contained in the training set; when the sampling device or the sampling environment changes, the monocular 3D recognition results differ greatly. To solve this problem, the invention constructs a new unsupervised monocular space recognition method by adding gravity field factors to the image analysis.
Disclosure of Invention
The invention aims to solve the problems of existing image recognition methods, most of which must be trained on images and therefore suffer from heavy data processing, low speed, poor adaptability and a limited application range.
The technical scheme of the invention is as follows:
a method for monocular vision space recognition in a quasi-earth gravitational field environment, characterized by comprising the following steps:
firstly, performing superpixel segmentation on the image based on the CIELAB color space values L, a, b of each pixel and its x, y coordinates, generating a superpixel image of a certain density;
secondly, reducing the dimension of the segmented superpixel image and generating large image blocks with a spectral clustering algorithm based on superpixel color features, texture feature vector distances and adjacency relations;
thirdly, multiplying the gravity field fuzzy distribution density functions representing the sky, the ground and facade objects with the pixels of each large block, and finding the expected values of the block, thereby completing the preliminary classification into sky, ground and facade objects;
fourthly, extracting the classification map of sky, ground and facade objects through single-layer wavelet sampling and Manhattan direction features;
finally, generating a spatial depth perception map based on the pinhole imaging model and ground linear perspective information, thereby converting the planar image acquired by the camera into a three-dimensional image and realizing monocular vision space recognition in a quasi-earth gravitational field environment.
The invention has the beneficial effects that:
the invention first proposes adding gravity field factors to image analysis, constructing a novel unsupervised monocular space recognition method that imitates how the human visual system integrates the continuous ground surface, establishing a monocular vision space recognition model of some generality for quasi-earth gravitational field environments, and changing the traditional 3D reconstruction and depth perception paradigm of monocular vision systems.
1. The invention simulates the human visual system and constructs a monocular vision space recognition method of some generality in quasi-earth gravitational field environments; it can be applied to visual space measurement in such environments as the Martian and lunar surfaces, as shown in FIG. 15.
2. When the constraint on sky brightness in the image is removed, the invention can also recognize urban night scenes, as shown in FIG. 16.
3. Monocular images in quasi-earth gravitational field environments can be effectively recognized and 3D-reconstructed without any prior learning or training by the computer.
4. The method changes the traditional 3D reconstruction and depth perception paradigm of monocular vision systems and can be widely applied in robot visual navigation, large-space target measurement, target tracking and positioning, and similar fields.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
FIG. 2 is a schematic diagram of the superpixel-based spectral clustering process and its effect. In FIG. 2: (a) original image, (b) segmentation into 951 superpixels, (c) 145 classes after spectral clustering, (d) 92 classes after 3 iterations to convergence.
FIG. 3 is a schematic diagram of eliminating islands inside a block using the geometric containment relationship. In FIG. 3: (a) island blocks (building windows) remaining after the clustering algorithm, (b) the island blocks eliminated by the geometric-containment clustering algorithm.
Fig. 4 is a schematic diagram of a human gravitational field visual cognition model.
FIG. 5 is a schematic illustration of the determination of the location of the eye level of the present invention.
FIG. 6 is an equivalent schematic diagram of the determination of the horizon position in the image, wherein H_I is the height of the image and H_I = H_S + H_G.
FIG. 7 is a schematic diagram of classifying ground, sky and facade objects according to the gravity field fuzzy distribution density functions.
FIG. 8 is a schematic diagram of a facade object and sky classification algorithm process according to the present invention.
FIG. 9 is a schematic diagram of results that do not conform to the gravity field when classifying with the gravity field fuzzy functions. In the figure, (a) is the original image and (b) is the computed result of distinguishing facade objects from the ground.
FIG. 10 is a diagram of the computed results after the fuzzy-function classification of facade objects and sky: (a) reclassification of the image blocks that did not conform to the gravity field, (b) the clustering result after facade objects are distinguished from the ground.
FIG. 11 shows the output after facade objects are further distinguished from the ground according to the present invention.
Fig. 12 is a schematic view of a physical model of the vision imaging system of the present invention.
FIG. 13 is a schematic diagram of the mapping of the depth projection angle of the present invention in Lab space.
Fig. 14 is a depth perception map corresponding to fig. 11.
FIG. 15 is a diagram illustrating the results of spatial recognition and depth recognition of NASA Mars images using the method of the present invention.
FIG. 16 is a spatial identification and 3D reconstruction of an urban night scene picture using the method of the present invention.
Detailed Description
The invention is further illustrated by the following embodiments and the accompanying drawings.
As shown in fig. 1-14.
A method for monocular vision space recognition in a quasi-earth gravitational field environment comprises the following steps:
(1) first, performing superpixel segmentation on the image based on pixel color and spatial position, forming a superpixel image of a certain density;
(2) applying a spectral clustering algorithm based on superpixel color-space distance, texture feature vector distance and geometric adjacency to reduce the superpixel image to a clustered image of large blocks numbering less than 10% of the superpixels;
(3) multiplying the gravity field fuzzy distribution density functions representing sky, ground and facade objects respectively with the pixels of the large blocks and finding the expected values of the blocks, thus producing the preliminary classification into sky, ground and facade objects; accurate classification maps of sky, ground and facade objects are then extracted through further feature classification steps such as single-layer wavelet sampling and Manhattan direction extraction;
(4) finally, generating a spatial depth perception map based on the pinhole imaging model and ground linear perspective information.
The details are as follows:
1. Superpixel clustering algorithm.
The simple linear iterative clustering algorithm (SLIC, Simple Linear Iterative Clustering) proposed by Achanta et al. can be adopted. The algorithm constructs a 5-dimensional space from the CIELAB color space values L, a, b of each pixel and its x, y coordinates, and defines a normalized distance D_s as follows:

d_lab = sqrt( (l_k − l_i)² + (a_k − a_i)² + (b_k − b_i)² )
d_xy = sqrt( (x_k − x_i)² + (y_k − y_i)² )
D_s = d_lab + (m / S) · d_xy,   S = sqrt(N / K)

wherein: C_k = [l_k, a_k, b_k, x_k, y_k]^T is the cluster center; [l_i, a_i, b_i, x_i, y_i]^T are the 5-dimensional space coordinates of an image pixel; N is the number of pixels of the image; K is the desired number of superpixels; S is the grid spacing of the superpixel centers; D_s is the color distance d_lab and the spatial distance d_xy normalized by S; and m is a controllable superpixel density (compactness) factor.
2. Spectral clustering algorithm based on superpixels.
(1) Take the n superpixels generated by the SLIC algorithm as the vertices V = {v_1, v_2, …, v_n} of an undirected weighted graph G;
(2) Construct the adjacency matrix E = (E_{i,j}), i = 1, 2, …, n; j = 1, 2, …, n, where n is the number of superpixels and E_{i,j} = 1 when superpixels i and j are adjacent;
(3) Construct the weight adjacency matrix W = (w(i, j)), i = 1, 2, …, n; j = 1, 2, …, n.
The specific construction converts the image into the CIELAB color space and divides the range of the L channel into 8 equal levels and the ranges of the a and b channels into 16 levels each; quantizing L into only 8 levels reduces the disturbance of brightness changes on the weights. A normalized histogram h_l(i), l = 1, …, 2048, is computed for each superpixel in this space of dimension 8 × 16 × 16 = 2048; then, when E_{i,j} = 1,

w(i, j) = Σ_{l=1}^{2048} sqrt( h_l(i) · h_l(j) )   (7)

i.e. the Bhattacharyya coefficient of the two histograms.
For the value of the weight w(i, j), two constraint conditions based on color distance and texture energy distance can be added in a specific implementation:
① Color-distance constraint: when w(i, j) ≤ W_T, set w(i, j) = 0, where W_T takes values in (0.7–1.0);
② Texture-energy-distance constraint: apply one layer of wavelet transform sampling and use the l2 norm to compute the average energy measure of each superpixel block, i.e.

e = (1 / N_p) · sqrt( Σ_i Σ_j R(i, j)² )   (8)

where R(i, j) is the wavelet sample value at point (i, j) in the image, N_p is the number of pixels of the block, and the sums run over the block. Compute the four-dimensional wavelet feature vector of each superpixel block according to equation (8), i.e. E(i) = (e_i(LL), e_i(LH), e_i(HL), e_i(HH)), and find the Bhattacharyya coefficient value B_e(i, j) between adjacent superpixels:

B_e(i, j) = Σ_{k ∈ {LL, LH, HL, HH}} sqrt( ē_i(k) · ē_j(k) )   (9)

where ē_i denotes E(i) normalized to unit sum.
When B_e(i, j) ≤ B_T, set w(i, j) = 0, where B_T takes values in (0.85–1.0).
These two constraints raise the color and texture similarity thresholds between adjacent superpixels so as to protect the boundary features between the sky and facade objects and between facade objects and the ground.
(4) Construct the degree matrix D, i = 1, 2, …, n; j = 1, 2, …, n;
(5) Construct the normalized Laplacian matrix using the Normalized-cut criterion:

L_sym = I − D^(−1/2) W D^(−1/2)   (12)

wherein D is the degree matrix and W is the weight adjacency matrix.
(6) Perform an eigenvalue decomposition of L_sym and take the eigenvectors V_1, V_2, …, V_K corresponding to the first K smallest eigenvalues, where K = [0.1 × n], i.e. 10% of n is taken as the dimension of the image clustering feature vectors so as to reduce dimensionality;
(7) Arrange V_1, V_2, …, V_K into a matrix in R^(n×K) and take the absolute value of each element to obtain the matrix U;
(8) For i = 1, 2, …, n, let y_i ∈ R^K be the i-th row vector of the matrix U;
(9) Normalize the non-zero vectors y_i ∈ R^K and cluster them by the Bhattacharyya coefficient method, the threshold of the Bhattacharyya distance B_U lying in (0.85–1.0); that is, two superpixels are clustered together when B_U is greater than or equal to the threshold;
(10) For each clustered image block compute a normalized CIELAB color histogram, compute the Bhattacharyya distance w(i, j) of adjacent block classes with formula (7) and B_e(i, j) of adjacent blocks with formula (9); when w(i, j) ≥ W_T and B_e(i, j) ≥ B_T, cluster the blocks;
(11) Repeat step (10) until convergence.
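A condensed sketch of steps (2)–(8), assuming the per-superpixel 2048-bin Lab histograms and the adjacency matrix are already computed; the texture constraint ② is omitted for brevity and all names are illustrative:

```python
import numpy as np
from scipy.linalg import eigh

def spectral_embedding(hist, adjacency, w_t=0.85):
    """Weight matrix from Bhattacharyya coefficients of per-superpixel CIELAB
    histograms, normalized Laplacian (formula (12)), and the embedding U.

    hist: (n, 2048) array, row i = normalized 8x16x16 Lab histogram of superpixel i.
    adjacency: (n, n) boolean matrix, True where superpixels i and j touch.
    w_t: color-similarity threshold W_T from constraint (1).
    """
    n = hist.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adjacency[i, j]:
                w = np.sum(np.sqrt(hist[i] * hist[j]))  # formula (7)
                W[i, j] = w if w > w_t else 0.0         # constraint (1)
    deg = W.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L_sym = np.eye(n) - d_inv_sqrt @ W @ d_inv_sqrt     # formula (12)
    k = max(1, int(0.1 * n))                            # K = 10% of n
    vals, vecs = eigh(L_sym)                            # eigenvalues ascending
    U = np.abs(vecs[:, :k])                             # first K eigenvectors, |.|
    return U    # row i = y_i, to be normalized and clustered (steps (9)-(11))
```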
The clustering process and the effect of the algorithm on images from the Make3D image database are shown in FIG. 2.
3. Geometric containment relation clustering algorithm.
To improve the accuracy of the fuzzy distribution density functions in judging sky, ground and facade objects, the image blocks need to be clustered on the basis of the geometric containment relationship so as to eliminate island blocks. An island block is one or more blocks completely surrounded by a larger block (as shown in FIG. 3). The geometric-containment clustering algorithm merges island blocks into the large block that completely surrounds them, avoiding the singularities that island blocks would cause in the spatial classification of a geometric context algorithm.
The specific algorithm is as follows:
(1) Search for hollow blocks. The criterion is N_b − n_b > 0, where N_b is the pixel count of all boundaries of the block and n_b is the pixel count of its outer boundary; if N_b − n_b > 0, the block is hollow and the next step is entered, otherwise the block is not a hollow block;
(2) Fill the block, taking the outer boundary as its boundary, with the label value of the original block;
(3) Replace the original hollow block with the filled block.
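A minimal sketch of this island elimination, assuming the clustered blocks are given as an integer label map; scipy.ndimage.binary_fill_holes stands in for the hollow-block test and fill of steps (1)–(3) (names are illustrative):

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def absorb_islands(labels):
    """Relabel any block completely enclosed by a larger block with the
    enclosing block's label (hollow-block test, fill, replace).

    labels: (H, W) integer map, one label per clustered block.
    """
    out = labels.copy()
    for lab in np.unique(labels):
        mask = labels == lab
        filled = binary_fill_holes(mask)   # fills enclosed holes (islands)
        holes = filled & ~mask             # pixels enclosed by this block
        if holes.any():                    # N_b - n_b > 0 in the text's terms
            out[holes] = lab               # absorb the islands
    return out
```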
4. Construction and classification algorithm of the human visual cognition model in a gravity field.
Fig. 4 is a human gravitational field visual cognition model.
The human inference model for sky, ground and facade objects when the horizon is horizontal or nearly horizontal is shown in FIG. 4, where the black dots represent the positions of maximum probability at which sky, ground or facade objects appear in human vision. The density of the probability distribution for sky inference decreases gradually from its maximum at the top of the human field of view down to the bottom, with zero density on the horizon; the density for ground inference decreases gradually from its maximum at the bottom of the field of view up to the top, with zero density on the horizon; and the density for facade-object inference decreases from its maximum on the horizon both upward and downward, approaching zero at the top and bottom of the field of view.
According to this inference model of sky, ground and facade objects and the perspective projection characteristics of the image, the following gravity field fuzzy distribution density functions are constructed:
(1) The position of the horizon of the image is set as shown in FIG. 5: the horizon is the straight line passing through the optical center of the camera parallel to the ground plane or horizontal plane, and the eye-level line in the image is the straight line, parallel to the ground plane, where the horizon plane intersects the imaging target surface of the camera, as shown in FIG. 6.
(2) Ground gravity field visual fuzzy distribution density function G:
When H_G ≥ H_S, G(x) is defined so that its density decreases from a maximum at the bottom edge of the image to zero at the eye-level line (the defining formulas appear as images in the source and are not reproduced).
In the formulas, H_G is the distance from the eye-level line to the bottom edge of the image; H_S is the distance from the eye-level line to the top edge of the image; x is the pixel coordinate in the image height direction; n is the order of the density function.
When H_G < H_S: G(x) = S(x).
Wherein n = 1, 2, 3, …, N, N a positive integer; n may generally be taken as 1.
(3) Sky gravity field visual fuzzy distribution density function S:
When H_G < H_S, S(x) is defined so that its density decreases from a maximum at the top edge of the image to zero at the eye-level line (formulas likewise given as images in the source).
When H_G ≥ H_S: S(x) = G(x).
Wherein n = 1, 2, 3, …, N, N a positive integer; n may generally be taken as 1.
(4) Facade-object gravity field visual fuzzy distribution density function V: its density decreases from a maximum at the eye-level line toward the top and bottom edges of the image (formula given as an image in the source).
(5) For each pixel of a clustered block, the ground fuzzy distribution density function G, the sky fuzzy distribution density function S and the facade-object fuzzy distribution density function V are multiplied along the vertical direction of the image within the range (−H_G, H_S), and the expected values are found:

G_E = Σ_{i=r_b}^{r_t} n_i G(i),   S_E = Σ_{i=r_b}^{r_t} n_i S(i),   V_E = Σ_{i=r_b}^{r_t} n_i V(i)

wherein n_i is the number of pixels of the clustered block in the i-th row, r_b is the lowest extent of the block, r_t its top extent, i ∈ (0, 1, …, H), and H = H_G + H_S. The block is then assigned to the class with the largest of the three expected values, i.e. argmax{G_E, S_E, V_E} (see the sketch below).
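The expectation-and-argmax decision of step (5) is compact enough to sketch; since the density functions themselves appear only as images in the source, they are passed in here as callables, and all names are illustrative:

```python
import numpy as np

def classify_block(rows_pixels, G, S, V):
    """Preliminary sky/ground/facade decision for one clustered block.

    rows_pixels: dict {row x: n_i}, pixel count of the block in each image row,
                 with rows indexed so the eye-level line sits at x = 0 and
                 x ranges over (-H_G, H_S).
    G, S, V: the ground, sky and facade fuzzy distribution density functions.
    Returns the label with the largest expected value G_E, S_E or V_E.
    """
    G_E = sum(n * G(x) for x, n in rows_pixels.items())
    S_E = sum(n * S(x) for x, n in rows_pixels.items())
    V_E = sum(n * V(x) for x, n in rows_pixels.items())
    return ("ground", "sky", "facade")[int(np.argmax([G_E, S_E, V_E]))]
```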
FIG. 7 shows the sky/ground/facade classification results for the clustered blocks generated by the clustering algorithm. It can be seen that the method effectively distinguishes the sky from the ground and judges facade objects near the horizon accurately, but shows some misjudgment between high facade-object and sky blocks and between low facade-object and ground blocks, so further discrimination between facade objects and sky, and between facade objects and ground, is needed.
5. Visual classification algorithm for sky and facade objects in a gravity field.
As mentioned above, because of the gravity field, substances on the earth's surface are distributed in layers according to density: high-density solid matter forms the facade objects standing on the ground, while low-density gaseous matter (e.g. air and clouds) forms the sky. Under illumination, solid facade objects and the sky therefore produce completely different reflection effects and show distinctly different texture features in the image.
In the study of sky characteristics, one layer of wavelet transform sampling is applied to different objects in the image (such as sky, roofs, walls, ground grass and the like) and the l2 norm is used to compute the average energy measure of each block, i.e.

e = (1 / N_p) · sqrt( Σ_{i=r_b}^{r_t} Σ_{j=c_l}^{c_r} R(i, j)² )

wherein N_p is the number of pixels of the block, r_b is the lowest extent of the block, r_t its top extent, c_l the leftmost column of the block in the i-th row, c_r the rightmost column in the i-th row, and R(i, j) the wavelet sample value at point (i, j) in the image. Note that the energy generated by block edges needs to be removed when computing the average energy measure of each block.
The four-dimensional wavelet feature vector of an image block is obtained by this energy measure calculation, i.e. (e_LL, e_LH, e_HL, e_HH), where e_LL characterizes the overall brightness of the block and e_LH, e_HL, e_HH characterize its high-frequency texture. Outdoor daytime sky typically shows high brightness and low-energy high-frequency texture.
According to the above analysis, the following visual classification algorithm for sky and facade objects is proposed:
(1) If e_LL > mean(e_LL1, e_LL2, …, e_LLn), the block is a candidate sky block, where e_LL1, e_LL2, …, e_LLn are the e_LL values of the sky and facade-object blocks and mean() is the mean function;
(2) Where the above condition is met, the single-layer non-down-sampled wavelet energy measure of the block must additionally stay below a threshold E_c; otherwise the block is judged not to be a sky block. E_c takes values in (0–7);
(3) For blocks meeting the above requirements, judge whether blocks bounded by the top edge of the image exist; if so, sky blocks exist, otherwise there is no sky in the image;
(4) If the candidate sky block is not unique once the above conditions are met, select the block with the largest area as the sky block and cluster the sky using the color distance d_ab and the brightness distance d_L as criteria:

d_ab = sqrt( (a_s − a_i)² + (b_s − b_i)² )   and   d_L = |L_s − L_i|

wherein a_s, b_s are the means of the a and b color channels of the sky block in CIELAB color space, a_i, b_i those of a candidate sky block, and L_s, L_i the corresponding mean L-channel values; when a candidate block satisfies d_ab ≤ C and d_L ≤ L it is clustered into the sky, otherwise it is a facade object, where C takes values in (0–30) and L in (0–70);
(5) Compute the area of the sky produced by the clustering; if its pixel count is below 2‰ of the image pixels, reclassify it as a facade object, since such a small sky block contributes little to recognizing the image space;
(6) All non-sky blocks are classified as facade objects.
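A sketch of the block feature used by this algorithm, assuming PyWavelets for the single-layer decomposition; the patent's exact sampling and edge-energy removal are simplified away, and names are illustrative:

```python
import numpy as np
import pywt

def wavelet_energy_features(gray, mask):
    """Four-dimensional wavelet feature (e_LL, e_LH, e_HL, e_HH) of one block:
    one level of 2-D wavelet decomposition, then an l2-norm average energy
    of each sub-band restricted to the block.

    gray: (H, W) grayscale image; mask: boolean block mask at full resolution.
    """
    LL, (LH, HL, HH) = pywt.dwt2(gray.astype(float), 'haar')  # single-layer sampling
    m = mask[::2, ::2]                       # sub-bands are half resolution
    m = m[:LL.shape[0], :LL.shape[1]]
    n_p = max(int(m.sum()), 1)               # N_p: pixels of the block

    def energy(band):
        return np.sqrt(np.sum(band[m] ** 2)) / n_p

    return tuple(energy(b) for b in (LL, LH, HL, HH))
```

A block is then a sky candidate when its e_LL exceeds the mean over all blocks and its high-frequency energies (e.g. e_LH + e_HL + e_HH, the exact combination being given in the source as an image) stay below the threshold E_c, per steps (1)–(2).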
The results of the facade-object and sky classification algorithm are shown in FIG. 8; it can be seen that the algorithm accurately judges whether sky exists in the image (FIG. 8(c)) and clusters non-adjacent sky blocks (FIG. 8(b)).
6. Segmentation algorithm for the ground and facade objects.
As shown in FIG. 8, the above fuzzy functions extract most of the ground in the image, but some facade-object blocks and ground blocks are misjudged. In addition, configurations that do not conform to the gravity field may occur, as with blocks No. 27 and No. 34 in FIG. 9, where ground is suspended above a facade object; the judgment of the fuzzy functions therefore needs further correction.
Situations that violate the geometric logic of the gravity field space only need correction by geometric-context logical judgment. Such misjudgments mainly arise when a large nearby ground or facade object appears in the image, as shown in FIG. 8(c) and (d), so it is necessary to judge whether a large close-range building exists in the image. The specific method is:
(1) According to the continuity of the ground and the geometric context of its gravity field space, ground blocks suspended inside facade objects are reclassified as facade objects, as shown in FIG. 10;
(2) Judge whether a large close-range building exists in the image by applying the Hough transform to the blocks judged to be facade objects and evaluating the strength of the image's Manhattan direction information from the statistical histogram of line direction angles (see the sketch after this list); if not, the ground correction is finished, otherwise go to the next step;
(3) Correct the connecting boundary between the building and the ground blocks according to the Manhattan direction information of the building among the facade objects; FIG. 10 shows the ground boundary correction result for FIG. 9.
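One way to realize the Manhattan-direction test of step (2), sketched with OpenCV's probabilistic Hough transform; the Canny/Hough thresholds and the peak_ratio criterion are assumptions of this sketch, not values from the patent:

```python
import numpy as np
import cv2

def has_strong_manhattan_structure(gray, mask, peak_ratio=0.3):
    """Hough-transform the facade region and check whether the histogram of
    line direction angles concentrates near the vertical/horizontal
    (Manhattan) directions.

    gray: (H, W) uint8 grayscale image; mask: boolean facade-block mask.
    """
    edges = cv2.Canny(gray, 50, 150)
    edges[~mask] = 0                                   # keep facade-block edges only
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 60,
                            minLineLength=40, maxLineGap=5)
    if lines is None:
        return False
    angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 180
              for x1, y1, x2, y2 in lines[:, 0]]
    hist, _ = np.histogram(angles, bins=18, range=(0, 180))  # 10-degree bins
    manhattan = hist[0] + hist[17] + hist[8] + hist[9]       # near 0/90/180 deg
    return manhattan / max(hist.sum(), 1) > peak_ratio
```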
7. A depth perception model.
The model first assumes that the ground extends continuously and is relatively flat, and that the visual imaging system has a definite orientation, i.e. the upper edge of the image corresponds to straight up in 3D space and the lower edge to straight down. The physical model of the visual system, based on the pinhole imaging principle, is shown in FIG. 12.
The perspective projection relationship between ground depth information and the position of ground pixels in the image is as follows: H is the height of the camera above the ground; β is the angle between the optical axis of the camera and the eye-level plane; the depth projection angle α is the angle between the eye-level plane oo′ and the line op, with value range (0, π/2]; p′ is the projection on the imaging target surface of the ground point p; f is the focal length of the lens; and h is the distance on the imaging target surface from the eye-level line to the point p′. The ground distance d perceived by the camera then satisfies relation (22), d = H / tan α, ranging over (0, +∞).
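For the simplest configuration, a horizontal optical axis (β = 0), the geometry reduces to tan α = h / f and hence d = H / tan α = H·f / h; a worked sketch under that assumption (the simplification is ours, not the patent's general case):

```python
import numpy as np

def ground_depth(h, f, H):
    """Ground distance d for a sensor point h below the eye-level line,
    assuming a horizontal optical axis (beta = 0), where tan(alpha) = h / f
    and relation (22) gives d = H / tan(alpha) = H * f / h.

    h: offset below the eye-level line on the imaging target surface
       (same units as f); f: focal length; H: camera height above the ground.
    """
    alpha = np.arctan2(h, f)      # depth projection angle, 0 < alpha <= pi/2
    return H / np.tan(alpha)      # equivalently H * f / h
```

For example, with f = 4 mm, H = 1.5 m and h = 0.2 mm, d = 1.5 × 4 / 0.2 = 30 m.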
8. Depth perception map of an image.
From relation (22) between the ground depth, the camera height H above the ground and the depth projection angle α, when H is constant the depth of each ground pixel projected onto the camera can be represented by the value of α alone. We therefore map α ∈ (0, π/2] onto a circle in the CIELAB color space and define the color of the sky as a fixed color on that circle, as shown in FIG. 13. FIG. 14 shows the depth perception map corresponding to FIG. 11.
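A sketch of that coloring, assuming a linear mapping of α ∈ (0, π/2] onto a hue circle in the CIELAB a–b plane; the lightness L and the circle radius are rendering constants of our choosing, not values from the patent:

```python
import numpy as np

def alpha_to_lab(alpha, L=60.0, radius=60.0):
    """Map the depth projection angle alpha in (0, pi/2] to a CIELAB color
    on a circle in the a-b plane, so near and far ground get distinct hues.
    """
    hue = (alpha / (np.pi / 2)) * 2 * np.pi   # map (0, pi/2] onto the hue circle
    a = radius * np.cos(hue)
    b = radius * np.sin(hue)
    return (L, a, b)
```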
The present invention is not concerned with parts which are the same as or can be implemented using prior art techniques.

Claims (6)

1. A method for monocular vision space recognition in a quasi-earth gravitational field environment, characterized by comprising the following steps:
firstly, carrying out superpixel segmentation on the image based on the CIELAB color space values L, a, b and the x, y coordinate values of its pixels to generate a superpixel image;
secondly, reducing the dimension of the segmented superpixel image and generating large image blocks by a spectral clustering algorithm based on superpixel color features, texture feature vector distances and adjacency relations;
thirdly, multiplying the gravity field fuzzy distribution density functions representing sky, ground and facade objects respectively with the pixels of the obtained large blocks, and finding the expected values of each large block, thereby finishing the preliminary classification into sky, ground and facade objects; the expected values are obtained by multiplying, for each pixel of the large block, the ground fuzzy distribution density function G, the sky fuzzy distribution density function S and the facade-object fuzzy distribution density function V along the vertical direction of the image within the range (−H_G, H_S), according to:

G_E = Σ_{i=r_b}^{r_t} n_i G(i),   S_E = Σ_{i=r_b}^{r_t} n_i S(i),   V_E = Σ_{i=r_b}^{r_t} n_i V(i)

wherein G_E, S_E and V_E are the mathematical expected values for ground, sky and vertical (facade) objects, obtained by summing over an image block on the basis of the ground gravity field fuzzy distribution density function G, the sky gravity field fuzzy distribution density function S and the facade-object gravity field fuzzy distribution density function V; n_i is the number of pixels of the clustered block in the i-th row; r_b is the lowest extent of the block; r_t is its top extent; i ∈ (0, 1, …, H_Z), H_Z being the maximum pixel value of the image in the height direction; H_I = H_G + H_S, wherein H_I is the height of the image, H_G the distance from the image eye-level line to the bottom edge, and H_S the distance from the eye-level line to the top edge; the block is classified into the class with the largest of G_E, S_E and V_E;
fourthly, extracting the classification map of sky, ground and facade objects through single-layer wavelet sampling and Manhattan direction features;
and finally, generating a space depth perception map based on the pinhole imaging model and the ground linear perspective information.
2. The method as claimed in claim 1, wherein the clustering comprises a superpixel clustering method and a superpixel-based spectral clustering method; the superpixel clustering adopts the simple linear iterative clustering (SLIC) algorithm proposed by Achanta et al., which constructs a 5-dimensional space from the CIELAB color space values L, a, b of each pixel and its x, y coordinates, specifically defined as follows:

d_lab = sqrt( (l_k − l_i)² + (a_k − a_i)² + (b_k − b_i)² ),   d_xy = sqrt( (x_k − x_i)² + (y_k − y_i)² ),   D_s = d_lab + (m / S) · d_xy,   S = sqrt(N / K)

wherein: C_k = [l_k, a_k, b_k, x_k, y_k]^T is the cluster center; [l_i, a_i, b_i, x_i, y_i]^T are the 5-dimensional space coordinates of an image pixel; N is the number of pixels of the image; K is the desired number of superpixels; S is the grid spacing of the superpixel centers; D_s is the color distance d_lab and the spatial distance d_xy normalized by S; m is a controllable superpixel density factor;
the superpixel-based spectral clustering method comprises the following steps:
(1) taking the n superpixels generated by the SLIC algorithm as the vertices V = {v_1, v_2, …, v_n} of an undirected weighted graph G;
(2) constructing the adjacency matrix E = (E_{i,j}), i = 1, 2, …, n; j = 1, 2, …, n, wherein n is the number of superpixels and E_{i,j} = 1 when superpixels i and j are adjacent;
(3) constructing the weight adjacency matrix W = (w(i, j)), i = 1, 2, …, n; j = 1, 2, …, n;
the specific construction converts the image into the CIELAB color space, divides the range of the L channel into 8 equal levels and the ranges of the a and b channels into 16 levels each, quantizing L into only 8 levels so as to reduce the disturbance of brightness changes on the weights, and computes for each superpixel a normalized histogram h_l(i), l = 1, …, 2048, in this space of dimension 8 × 16 × 16 = 2048; then, when E_{i,j} = 1,

w(i, j) = Σ_{l=1}^{2048} sqrt( h_l(i) · h_l(j) )   (7)
Two constraint conditions based on the color distance and the texture energy distance are added to the value of the weight w (i, j), and are as follows:
① color-distance constraint: when w(i, j) ≤ W_T, take w(i, j) = 0, wherein W_T takes values in (0.7–1.0);
② texture-energy-distance constraint: one layer of wavelet transform sampling is applied and the l2 norm is used to compute the average energy measure of each superpixel block, i.e.

e = (1 / N_p) · sqrt( Σ_{i=r_b}^{r_t} Σ_{j=c_l}^{c_r} R(i, j)² )   (8)

wherein N_p is the number of pixels of the block, r_b is the lowest extent of the block, r_t its top extent, c_l the leftmost column of the block in the i-th row, c_r the rightmost column in the i-th row, and R(i, j) the wavelet sample value at point (i, j) in the image; the four-dimensional wavelet feature vector of each superpixel block is computed according to formula (8), i.e. E(i) = (e_i(LL), e_i(LH), e_i(HL), e_i(HH)), and the Bhattacharyya coefficient value B_e(i, j) between adjacent superpixels is found:

B_e(i, j) = Σ_{k ∈ {LL, LH, HL, HH}} sqrt( ē_i(k) · ē_j(k) )   (9)

wherein ē_i denotes E(i) normalized to unit sum;
when B_e(i, j) ≤ B_T, take w(i, j) = 0, wherein B_T takes values in (0.85–1.0);
these two constraints raise the color and texture similarity thresholds between adjacent superpixels so as to protect the boundary features between the sky and facade objects and between facade objects and the ground;
(4) constructing the degree matrix D, i = 1, 2, …, n; j = 1, 2, …, n;
(5) constructing the normalized Laplacian matrix, computed by the Normalized-cut criterion:

L_sym = I − D^(−1/2) W D^(−1/2)   (12)

wherein D is the degree matrix and W is the weight adjacency matrix;
(6) performing an eigenvalue decomposition of L_sym and taking the eigenvectors V_1, V_2, …, V_K corresponding to the first K smallest eigenvalues, wherein K = [0.1 × n], i.e. 10% of n is taken as the dimension of the image clustering feature vectors so as to reduce dimensionality;
(7) arranging V_1, V_2, …, V_K into a matrix in R^(n×K) and taking the absolute value of each element to obtain the matrix U;
(8) for i = 1, 2, …, n, letting y_i ∈ R^K be the i-th row vector of the matrix U, R^K being the K-dimensional real vector space;
(9) normalizing the non-zero vectors y_i ∈ R^K and clustering them by the Bhattacharyya coefficient method, the threshold of the Bhattacharyya distance B_U lying between (0.85–1.0), i.e. superpixels are clustered together when B_U is greater than or equal to the threshold;
(10) adopting a normalized CIELAB color histogram for each clustered image block, calculating the Bhattacharyya distance w(i, j) of adjacent block classes by formula (7) and B_e(i, j) of adjacent blocks by formula (9); when w(i, j) > W_T and B_e(i, j) > B_T, clustering them;
(11) repeating step (10) until convergence.
3. The method as claimed in claim 1, wherein the large blocks are generated using a geometric containment relation clustering method to eliminate island blocks, an island block being one or more blocks completely surrounded by a larger block; the geometric-containment clustering algorithm clusters island blocks into the large block completely surrounding them, avoiding the singularities that island blocks would cause in the spatial classification of a geometric context algorithm; the specific method is:
(1) searching for hollow blocks, the criterion being N_b − n_b > 0, wherein N_b is the pixel count of all boundaries of the block and n_b the pixel count of its outer boundary; if N_b − n_b > 0, the next step is entered, otherwise the block is not a hollow block;
(2) filling the block, taking the outer boundary as its boundary, with the label value of the original block;
(3) replacing the original hollow block with the filled block.
4. The method of claim 1, wherein the single-layer wavelet sampling used to extract the classification map of sky and facade objects computes, with the l2 norm, the average energy measure of each object block, i.e.

e = (1 / N_p) · sqrt( Σ_{i=r_b}^{r_t} Σ_{j=c_l}^{c_r} R(i, j)² )

wherein N_p is the number of pixels of the block, r_b is the lowest extent of the block, r_t its top extent, c_l the leftmost column of the block in the i-th row, c_r the rightmost column in the i-th row, and R(i, j) the wavelet sample value at point (i, j) in the image; the energy generated by block edges is removed when calculating the average energy measure of each block;
the four-dimensional wavelet feature vector of an image block is obtained by this energy measure calculation, i.e. (e_LL, e_LH, e_HL, e_HH), wherein e_LL characterizes the overall brightness of the block and e_LH, e_HL, e_HH characterize its high-frequency texture; outdoor daytime sky in an image generally exhibits high brightness and low-energy high-frequency texture;
(1) if e_LL > mean(e_LL1, e_LL2, …, e_LLn), the block is a candidate sky block, wherein e_LL1, e_LL2, …, e_LLn are the e_LL values of the sky and facade-object blocks and mean() is the mean function;
(2) where the above condition is met, the single-layer non-down-sampled wavelet energy measure of the block must additionally stay below a threshold E_c, otherwise the block is judged not to be a sky block, E_c taking values in (0–7);
(3) for blocks meeting the above requirements, judging whether blocks bounded by the top edge of the image exist; if so, sky blocks exist, otherwise there is no sky in the image;
(4) if the candidate sky block is not unique once the above conditions are met, selecting the block with the largest area as the sky block and clustering the sky with the color distance d_ab and the brightness distance d_L as criteria:

d_ab = sqrt( (a_s − a_i)² + (b_s − b_i)² )   and   d_L = |L_s − L_i|

wherein a_s, b_s are the means of the a and b color channels of the sky block in CIELAB color space, a_i, b_i those of a candidate sky block, and L_s, L_i the corresponding mean L-channel values; when a candidate block satisfies d_ab ≤ C and d_L ≤ L it is clustered into the sky, otherwise it is a facade object, wherein C takes values in (0–30) and L in (0–70);
(5) calculating the area of the sky generated by the clustering and, if its pixel count is below 2‰ of the image pixels, reclassifying it as a facade object;
(6) all non-sky blocks are classified as facade objects.
5. The method as claimed in claim 1, wherein the classification map of the ground and facade objects is extracted with the following discrimination method:
(1) according to the continuity of the ground and the geometric context of its gravity field space, ground blocks suspended inside facade objects are reclassified as facade objects;
(2) whether a large close-range building exists in the image is judged by applying the Hough transform to the blocks judged to be facade objects and evaluating the strength of the image's Manhattan direction information from the statistical histogram of line direction angles; if not, the ground correction is finished, otherwise the next step is entered;
(3) the connecting boundary between the building among the facade objects and the ground blocks is corrected according to the Manhattan direction information of the building.
6. The method of claim 1, wherein the gravity field fuzzy distribution density functions of sky, ground and facade objects are:
(1) the ground gravity field fuzzy distribution density function G:
when H_G ≥ H_S, G(x) is defined so that its density decreases from a maximum at the bottom edge of the image to zero at the eye-level line (formula omitted);
wherein H_G is the distance from the eye-level line to the bottom edge of the image; H_S is the distance from the eye-level line to the top edge of the image; x is the pixel coordinate in the image height direction; n is the order of the density function;
when H_G < H_S: G(x) = S(x);
wherein n = 1, 2, 3, …, N, N a positive integer;
(2) the sky gravity field fuzzy distribution density function S:
when H_G < H_S, S(x) is defined so that its density decreases from a maximum at the top edge of the image to zero at the eye-level line (formula omitted);
when H_G ≥ H_S: S(x) = G(x);
wherein n = 1, 2, 3, …, N, N a positive integer;
(3) the facade-object gravity field fuzzy distribution density function V: its density decreases from a maximum at the eye-level line toward the top and bottom edges of the image (formula omitted).
CN201410212438.6A 2013-11-29 2014-05-19 Method for monocular vision space recognition in quasi-earth gravitational field environment Active CN103955710B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201410212438.6A CN103955710B (en) 2013-11-29 2014-05-19 Method for monocular vision space recognition in quasi-earth gravitational field environment
US14/684,431 US9390348B2 (en) 2014-05-19 2015-04-12 Method for categorizing objects in image
US14/684,428 US9471853B2 (en) 2014-05-19 2015-04-12 Method and apparatus for image processing
US14/684,434 US9626598B2 (en) 2014-05-19 2015-04-13 Method and apparatus for image processing
US14/684,433 US9805293B2 (en) 2014-05-19 2015-04-13 Method and apparatus for object recognition in image processing

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201310626783 2013-11-29
CN201310626783X 2013-11-29
CN201310626783.X 2013-11-29
CN201410212438.6A CN103955710B (en) 2013-11-29 2014-05-19 Method for monocular vision space recognition in quasi-earth gravitational field environment

Publications (2)

Publication Number Publication Date
CN103955710A CN103955710A (en) 2014-07-30
CN103955710B true CN103955710B (en) 2017-02-15

Family

ID=50213194

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201310652422.2A Active CN103632167B (en) 2013-11-29 2013-12-05 Monocular vision space recognition method under class ground gravitational field environment
CN201410212438.6A Active CN103955710B (en) 2013-11-29 2014-05-19 Method for monocular vision space recognition in quasi-earth gravitational field environment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201310652422.2A Active CN103632167B (en) 2013-11-29 2013-12-05 Monocular vision space recognition method under class ground gravitational field environment

Country Status (1)

Country Link
CN (2) CN103632167B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103632167B (en) * 2013-11-29 2016-10-12 Jinling Institute of Technology Monocular vision space recognition method under class ground gravitational field environment
CN104091180B (en) * 2014-07-14 2017-07-28 Nanjing Yuanjue Information and Technology Co. The recognition methods of trees and building in outdoor scene image
CN104063707B (en) * 2014-07-14 2017-05-24 Nanjing Yuanjue Information and Technology Co. Color image clustering segmentation method based on multi-scale perception characteristic of human vision
CN104077611B (en) * 2014-07-14 2017-06-09 Nanjing Yuanjue Information and Technology Co. Indoor scene monocular vision space recognition method under class ground gravitational field environment
CN104077603B (en) * 2014-07-14 2017-04-19 Nanjing Yuanjue Information and Technology Co. Outdoor scene monocular vision space recognition method in terrestrial gravity field environment
EP2966616B1 (en) 2014-07-10 2018-06-13 Thomson Licensing Method and apparatus for tracking superpixels between related images
CN104794688B (en) * 2015-03-12 2018-04-03 Beihang University Single image defogging method and device based on separating sky areas by depth information
CN106097252B (en) * 2016-06-23 2019-03-12 Harbin Institute of Technology Hyperspectral image superpixel segmentation method based on graph model
CN111238490B (en) * 2018-11-29 2022-03-08 Beijing Horizon Robotics Technology Research and Development Co., Ltd. Visual positioning method and device and electronic equipment
CN112419392A (en) * 2020-11-30 2021-02-26 Guangzhou Bojin Information Technology Co., Ltd. Method, apparatus and medium for calculating actual size of moving object based on machine vision

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1275189A (en) * 1998-06-08 2000-11-29 Karlheinz Strobel Efficient light engine systems, components and methods of manufacture
US6404920B1 (en) * 1996-09-09 2002-06-11 Hsu Shin-Yi System for generalizing objects and features in an image
CN101126698A (en) * 2006-06-01 2008-02-20 Ana Technology Co. Object analysis method and apparatus
CN101371165A (en) * 2006-01-25 2009-02-18 Arkex Ltd. Geophysical terrain survey correction method
CN103632167A (en) * 2013-11-29 2014-03-12 金陵科技学院 Method for identifying monocular visual spaces in terrestrial gravitational field environments

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6526379B1 (en) * 1999-11-29 2003-02-25 Matsushita Electric Industrial Co., Ltd. Discriminative clustering methods for automatic speech recognition
CN101751666A (en) * 2009-10-16 2010-06-23 Xidian University Semi-supervised multi-spectral remote sensing image segmentation method based on spectral clustering
CN103353987B (en) * 2013-06-14 2015-10-28 Shandong University A kind of superpixel segmentation method based on fuzzy theory
CN103413316B (en) * 2013-08-24 2016-03-02 Xidian University Based on the SAR image segmentation method of super-pixel and optimisation strategy
CN103456013B (en) * 2013-09-04 2016-01-20 Tianjin University A kind of method representing similarity between super-pixel and tolerance super-pixel

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6404920B1 (en) * 1996-09-09 2002-06-11 Hsu Shin-Yi System for generalizing objects and features in an image
CN1275189A (en) * 1998-06-08 2000-11-29 Karlheinz Strobel Efficient light engine systems, components and methods of manufacture
CN101371165A (en) * 2006-01-25 2009-02-18 Arkex Ltd. Geophysical terrain survey correction method
CN101126698A (en) * 2006-06-01 2008-02-20 Ana Technology Co. Object analysis method and apparatus
CN103632167A (en) * 2013-11-29 2014-03-12 金陵科技学院 Method for identifying monocular visual spaces in terrestrial gravitational field environments

Also Published As

Publication number Publication date
CN103632167A (en) 2014-03-12
CN103632167B (en) 2016-10-12
CN103955710A (en) 2014-07-30

Similar Documents

Publication Publication Date Title
CN103955710B (en) Method for monocular vision space recognition in quasi-earth gravitational field environment
CN107093205B (en) A kind of three-dimensional space building window detection method for reconstructing based on unmanned plane image
CN106650640B (en) Negative obstacle detection method based on laser radar point cloud local structure characteristics
CN107622244B (en) Indoor scene fine analysis method based on depth map
CN106909902B (en) Remote sensing target detection method based on improved hierarchical significant model
CN108648161B (en) Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network
CN106204572B (en) Road target depth estimation method based on scene depth mapping
US8472699B2 (en) Arrangement and method for three-dimensional depth image construction
CN109903331B (en) Convolutional neural network target detection method based on RGB-D camera
CN111798475A (en) Indoor environment 3D semantic map construction method based on point cloud deep learning
CN106127791B (en) A kind of contour of building line drawing method of aviation remote sensing image
CN105631892B (en) It is a kind of that detection method is damaged based on the aviation image building of shade and textural characteristics
CN112766184B (en) Remote sensing target detection method based on multi-level feature selection convolutional neural network
CN109034065B (en) Indoor scene object extraction method based on point cloud
CN106503170B (en) It is a kind of based on the image base construction method for blocking dimension
CN113313107B (en) Intelligent detection and identification method for multiple types of diseases on cable surface of cable-stayed bridge
CN104408733A (en) Object random walk-based visual saliency detection method and system for remote sensing image
CN104077603B (en) Outdoor scene monocular vision space recognition method in terrestrial gravity field environment
CN111199245A (en) Rape pest identification method
CN106373126B (en) Image significance detection method based on fusion class geodesic curve and boundary comparison
CN113947724A (en) Automatic line icing thickness measuring method based on binocular vision
CN104077611B (en) Indoor scene monocular vision space recognition method under class ground gravitational field environment
CN117949942B (en) Target tracking method and system based on fusion of radar data and video data
CN107610148A (en) A kind of foreground segmentation method based on Binocular Stereo Vision System
CN113052110B (en) Three-dimensional interest point extraction method based on multi-view projection and deep learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190718

Address after: No. 1009 Tianyuan East Road, Jiangning High-tech Park, Nanjing, Jiangsu Province, 211100

Patentee after: NANJING YUANJUE INFORMATION AND TECHNOLOGY Co.

Address before: No. 99 Hongjing Avenue, Jiangning District, Nanjing, Jiangsu Province, 211169

Patentee before: Jinling Institute of Technology

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20221221

Address after: 271100 No. 001, Huiyuan Street, Laiwu District, Jinan, Shandong

Patentee after: SHANDONG TAIJIN PRECISION FORGING CO.,LTD.

Address before: No. 1009 Tianyuan East Road, Jiangning High-tech Park, Nanjing, Jiangsu Province, 211100

Patentee before: NANJING YUANJUE INFORMATION AND TECHNOLOGY Co.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230306

Address after: Room 907-910, Building 8, Phase II, Fortune Plaza, 228 Tianyuan East Road, Jiangning District, Nanjing, Jiangsu Province, 211100

Patentee after: NANJING YUANJUE INFORMATION AND TECHNOLOGY Co.

Address before: 271100 No. 001, Huiyuan Street, Laiwu District, Jinan, Shandong

Patentee before: SHANDONG TAIJIN PRECISION FORGING CO.,LTD.

TR01 Transfer of patent right