CN111582004A - Target area segmentation method and device in ground image - Google Patents

Target area segmentation method and device in ground image

Info

Publication number
CN111582004A
CN111582004A
Authority
CN
China
Prior art keywords
area
target
image
ground image
ground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910117994.8A
Other languages
Chinese (zh)
Inventor
姜帆
郝志会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910117994.8A priority Critical patent/CN111582004A/en
Publication of CN111582004A publication Critical patent/CN111582004A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/10: Terrestrial scenes
    • G06V 20/182: Network patterns, e.g. roads or rivers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for segmenting a target region in a ground image. The method comprises the following steps: for at least one ground image in a test image data set, inputting the ground image into a deep fully convolutional neural network model obtained by pre-training, to obtain a probability matrix; each element of the probability matrix represents the probability that the pixel point corresponding to that element in the ground image belongs to at least one type of target area; the deep fully convolutional neural network model is obtained by training on different types of target areas of a plurality of ground sample images in an image sample set; obtaining, according to the probability matrix, preselected areas of the various types of target area included in the ground image; and processing the preselected areas to obtain the segmentation result of each type of target area of the ground image. Because each pixel of the ground image is classified by the deep fully convolutional neural network, different target areas are segmented accurately, the implementation process is simple, and the result is not easily affected by imaging quality.

Description

Target area segmentation method and device in ground image
Technical Field
The invention relates to the field of computer vision, in particular to a method and a device for segmenting a target region in a ground image.
Background
The ground image is a film or photograph recording the electromagnetic waves of various ground features, and includes remote sensing images (Remote Sensing Image) and overhead images. Ground image segmentation refers to the technique and process of dividing a ground image into target regions with distinctive characteristics and extracting targets of interest, where the characteristics may be the gray scale, color, texture, etc. of pixels, and a predefined target may correspond to a single region or to a plurality of regions. Ground image segmentation is a key step from ground image processing to ground image analysis, and occupies an important position in image engineering.
In the prior art, segmentation of a target area of a ground image generally uses a supervised learning model such as a Support Vector Machine (SVM). When segmenting a ground image, such a shallow learning model places high requirements on the imaging quality of the ground image, needs a three-dimensional transformation of the ground image or the assistance of other information sources, and can only perform binary classification of the pixel points, so each run can only produce the segmentation result of one specific target area and cannot simultaneously distinguish and segment multiple different target areas in the ground image. Therefore, conventional ground image segmentation has a complex implementation process, is easily affected by imaging quality, and cannot rapidly and accurately obtain the segmentation results of different target areas.
Disclosure of Invention
In view of the above, the present invention has been made to provide a method and apparatus for segmenting a target region in a ground image, which overcome or at least partially solve the above problems.
In a first aspect, an embodiment of the present invention provides a method for segmenting a target region in a ground image, including the following steps:
inputting at least one ground image in a test image data set into a deep fully convolutional neural network model obtained through pre-training to obtain a probability matrix; each element of the probability matrix represents the probability that the pixel point corresponding to the element in the ground image belongs to at least one type of target area; the deep fully convolutional neural network model is obtained by training on different types of target areas of a plurality of ground sample images in an image sample set;
obtaining preselected areas of target areas of various types included in the ground image according to the probability matrix;
and processing the preselected area to obtain the segmentation result of each type of target area of the ground image.
In one embodiment, the process of training the deep fully convolutional neural network comprises:
marking different types of target areas in the ground sample image by using polygons and/or line segments with different thicknesses to obtain a true value of each target area on each ground sample image;
and training the deep full convolution neural network by taking the area marked by the polygon and/or the line segments with different thicknesses as a positive sample and the area not marked as a negative sample to obtain the deep full convolution neural network model.
In one embodiment, the obtaining the preselected regions of the respective types of target regions included in the ground image according to the probability matrix includes:
determining the type of a target area to which each pixel in the ground image belongs according to the probability value of each pixel in the probability matrix to obtain the pixels included in the target areas of all types;
and obtaining the preselected regions of the target areas of the various types included in the ground image according to the pixels included in the target areas of the various types.
In an embodiment, the inputting the ground image into a deep full convolution neural network model obtained by pre-training to obtain a probability matrix includes:
taking the ground image as the input of the deep fully convolutional neural network, obtaining at the output layer of the network, for each pixel in the input ground image, the probability that the pixel belongs to each type of target area, and generating a three-dimensional probability matrix in which all pixels of the ground image respectively belong to each type of target area. Correspondingly,
the obtaining the preselected regions of the target regions of the various types included in the ground image according to the probability matrix includes:
determining the type of a target area with the highest probability value of each pixel according to the three-dimensional probability matrix, and taking the target area of the type as a classification result of the current pixel; and dividing the target area according to the classification result to obtain preselected areas of the target areas of various types.
In an embodiment, the inputting the ground image into a deep full convolution neural network model obtained by pre-training to obtain a probability matrix includes:
taking the ground image as the input of the deep fully convolutional neural network, and obtaining at the output layer of the network, for each specified type of target area, a two-dimensional probability matrix giving the probability that each pixel in the input ground image belongs to that type of target area. Correspondingly,
the obtaining the preselected regions of the target regions of the various types included in the ground image according to the probability matrix includes:
and aiming at each type of target area, judging whether the probability value of each pixel is greater than a specified threshold value according to the two-dimensional probability matrix of the type of target area, extracting all pixels with probability values greater than the threshold value, and obtaining a preselected area in the type of target area.
In an embodiment, the processing the preselected region to obtain the segmentation result of each type of target region of the ground image includes:
regularizing the preselected region by morphological operation;
extracting the target areas in the normalized output image through a connected domain detection algorithm, calculating the positions of the target areas, and obtaining the segmentation results of the target areas of various types of the ground image.
In one embodiment, the regularizing of the preselected region by a morphological operation comprises:
when the preselected area is a linear image area, repairing the fracture of the preselected area through closed operation;
and when the preselected area is a nonlinear image area, cutting the adhesion part of the preselected area through open operation.
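The two repairs above can be sketched with a minimal numpy implementation of morphological closing and opening (3 × 3 structuring element, zero-padded borders). All function names and the toy "road" mask are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def _shifts(img):
    # All nine 3x3-neighbourhood shifts of the image, zero-padded at the border.
    p = np.pad(img, 1)
    h, w = img.shape
    return [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]

def dilate(img):
    return np.max(_shifts(img), axis=0)

def erode(img):
    return np.min(_shifts(img), axis=0)

def closing(img):
    # Dilation then erosion: repairs small breaks in linear regions (e.g. roads).
    return erode(dilate(img))

def opening(img):
    # Erosion then dilation: cuts thin adhesions between non-linear regions.
    return dilate(erode(img))

# A 1-pixel-wide "road" with a one-pixel break at column 3:
road = np.zeros((5, 7), dtype=np.uint8)
road[2, [1, 2, 4, 5]] = 1
print(closing(road)[2])   # [0 1 1 1 1 1 0] — the break at column 3 is filled
```

Note that opening the same one-pixel-thick line erases it entirely, which is why the patent applies closing to linear areas and opening to non-linear (blob-like) ones.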
In a second aspect, an embodiment of the present invention provides an apparatus for segmenting a target region in a ground image, including:
the testing module is used for inputting at least one ground image in a test image data set into a deep fully convolutional neural network model obtained by pre-training to obtain a probability matrix; each element of the probability matrix represents the probability that the pixel point corresponding to the element in the ground image belongs to at least one type of target area; the deep fully convolutional neural network model is obtained by training on different types of target areas of a plurality of ground sample images in an image sample set;
the preselection module is used for obtaining preselection areas of various types of target areas included in the ground image according to the probability matrix;
and the processing module is used for processing the preselected area and acquiring the segmentation result of each type of target area of the ground image.
In one embodiment, the apparatus further comprises:
the marking module is used for marking different types of target areas in the ground sample image by using polygons and/or line segments with different thicknesses to obtain a true value of each target area on each ground sample image;
and the model training module is used for training the deep full convolution neural network by taking the area marked by the polygon and/or the line segments with different thicknesses as a positive sample and the area not marked as a negative sample to obtain the deep full convolution neural network model.
In one embodiment, the preselection module obtains preselection regions for various types of target regions included in the ground image based on the probability matrix, including:
determining the type of a target area to which each pixel in the ground image belongs according to the probability value of each pixel in the probability matrix to obtain the pixels included in the target areas of all types;
and obtaining the preselected regions of the target areas of the various types included in the ground image according to the pixels included in the target areas of the various types.
In one embodiment, the testing module is configured to input the ground image into a deep full convolution neural network model obtained through pre-training to obtain a probability matrix, and includes:
taking the ground image as the input of the deep fully convolutional neural network, obtaining at the output layer of the network, for each pixel in the input ground image, the probability that the pixel belongs to each type of target area, and generating a three-dimensional probability matrix in which all pixels of the ground image respectively belong to each type of target area. Correspondingly,
the preselection module obtains preselection areas of various types of target areas included in the ground image according to the probability matrix, and the preselection areas comprise:
determining the type of a target area with the highest probability value of each pixel according to the three-dimensional probability matrix, and taking the target area of the type as a classification result of the current pixel; and dividing the target area according to the classification result to obtain preselected areas of the target areas of various types.
In one embodiment, the testing module is configured to input the ground image into a deep full convolution neural network model obtained through pre-training to obtain a probability matrix, and includes:
taking the ground image as the input of the deep fully convolutional neural network, and obtaining at the output layer of the network, for each specified type of target area, a two-dimensional probability matrix giving the probability that each pixel in the input ground image belongs to that type of target area. Correspondingly,
the preselection module obtains preselection areas of various types of target areas included in the ground image according to the probability matrix, and the preselection areas comprise:
and aiming at each type of target area, judging whether the probability value of each pixel is greater than a specified threshold value according to the two-dimensional probability matrix of the type of target area, extracting all pixels with probability values greater than the threshold value, and obtaining a preselected area in the type of target area.
In one embodiment, the processing module processes the preselected region to obtain segmentation results of various types of target regions of the ground image, including:
regularizing the preselected region by morphological operation;
extracting the target areas in the normalized ground image through a connected domain detection algorithm, calculating the position of each target area, and obtaining the segmentation result of each type of target area of the ground image.
In one embodiment, the processing module regularizes the preselected region by a morphological operation, comprising:
when the preselected area is a linear image area, repairing the fracture of the preselected area through closed operation;
and when the preselected area is a nonlinear image area, cutting the adhesion part of the preselected area through open operation.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, on which computer instructions are stored, and the computer instructions, when executed by a processor, implement the above-mentioned target region segmentation method in a ground image.
In a fourth aspect, an embodiment of the present invention provides an image processing apparatus, including: a processor, and a memory for storing processor-executable instructions; wherein the processor is configured to perform the above-mentioned method for segmenting a target region in a ground image.
The technical scheme provided by the embodiments of the invention has at least the following beneficial effects:
1. according to the method for segmenting the target area in the ground image, provided by the embodiment of the invention, each pixel of the tested ground image is subjected to multi-classification through the deep full convolution neural network model obtained through pre-training, so that the probability matrix of each pixel in the ground image is obtained, and the target area to which each pixel belongs is judged. The pre-selection areas of the target areas of various types included in the ground image are obtained by classifying the pixels of the ground image, so that the target areas of different types of the ground image can be segmented; the method has the advantages that the depth full convolution neural network model is used for carrying out depth learning on the sample image, so that pixels of different target areas can be classified, segmentation results of the different target areas are obtained, the method is not easily influenced by imaging quality, three-dimensional transformation is not needed in the process of testing the ground image, auxiliary processing of other information sources is not needed, the processed target area is closer to the actual shape and size of the target, and the method is suitable for segmentation of the target area of the ground image with visible light ground image, multispectral ground image and multisource information fusion.
2. According to the method for segmenting the target area in the ground image provided by the embodiment of the invention, the pixels of the ground image are classified by the deep fully convolutional neural network model, so the requirement on the imaging quality of the ground image is low. The method is applicable to target-area segmentation of both high-resolution and low-resolution ground images, offers good segmentation stability, and is not easily affected by weather, complex backgrounds, or other conditions in the ground image, which ensures the information integrity of the segmented target areas and reduces the error of the segmentation results.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a method for segmenting a target region of a ground image according to an embodiment of the present invention;
FIG. 2 is a flow chart of a deep fully convolutional neural network training process in an embodiment of the present invention;
FIG. 3 is a schematic illustration of a remote sensing image in an embodiment of the invention;
FIG. 4 is a schematic diagram of an output image obtained by testing the remote sensing image shown in FIG. 3 according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an image during morphological processing of the image shown in FIG. 4 according to an embodiment of the invention;
FIG. 6 is a schematic diagram of a device for segmenting a target region of a ground image according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a target region segmentation apparatus for another ground image according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In view of the problems in the prior art, an embodiment of the present invention provides a method for segmenting a target region in a ground image, where the flow of the method is shown in fig. 1, and the method includes the following steps:
s11, aiming at least one ground image in the test image data set, inputting the ground image into a depth full convolution neural network model obtained through pre-training to obtain a probability matrix.
For each ground image in the test image data set, the ground image is tested in the deep fully convolutional neural network model obtained by pre-training, to obtain a probability matrix indicating whether each pixel in the ground image belongs to a designated target area.
Each element of the probability matrix represents the probability that the pixel point corresponding to the element in the ground image belongs to the target area of the corresponding type. Assuming that the ground image consists of m × n pixel points and that there are t types of target area in total, the probability matrix is a t × m × n matrix, where each element represents the probability that the corresponding pixel point belongs to the target area of the corresponding type.
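The t × m × n shape described above can be sketched as follows. This is a minimal numpy illustration; the sizes, the softmax step used to turn raw per-pixel scores into probabilities, and all variable names are assumptions for the example, not taken from the patent:

```python
import numpy as np

# Hypothetical example: t = 4 target-area types (e.g. road, building, lake,
# background) over a small m x n = 3 x 3 ground image. The network's raw
# per-pixel scores ("logits") are converted to per-type probabilities with
# a softmax over the type axis, yielding the t x m x n probability matrix.
t, m, n = 4, 3, 3
rng = np.random.default_rng(0)
logits = rng.normal(size=(t, m, n))      # stand-in for the model output

# Softmax along axis 0 (the type axis): each pixel's t values sum to 1.
exp = np.exp(logits - logits.max(axis=0, keepdims=True))
prob_matrix = exp / exp.sum(axis=0, keepdims=True)

print(prob_matrix.shape)                           # (4, 3, 3)
print(np.allclose(prob_matrix.sum(axis=0), 1.0))   # True
```

Element `prob_matrix[k, i, j]` is then the probability that pixel (i, j) belongs to target-area type k.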
Since there may be a plurality of targets in the ground image, the types of target area to be obtained may also change with the purpose of the ground image analysis or with the division standard adopted for target areas. The designated target areas may be any types of target area determined in the ground image. For example, the types of target area of a ground image may include, but are not limited to, a road area, a building area, a river area, a park area, a lake area, an airport area, and a station area.
And S12, obtaining the preselected areas of the target areas of various types included in the ground image according to the obtained probability matrix.
In the embodiment of the invention, the type of a target area to which each pixel in a ground image belongs is determined according to the probability value of each pixel in a probability matrix, and the pixels included in the target areas of all types are obtained; and obtaining the preselected areas of the target areas of the various types included in the ground image according to the pixels included in the target areas of the various types. According to the probability matrix, each type of target region in the obtained ground image is a relatively rough estimation result of a true value of the target region extracted according to the depth full convolution neural network model, and therefore, in the embodiment of the invention, each type of target region in the ground image extracted according to the probability matrix is called a preselection area of the target region.
In the specific implementation process, there may be a plurality of probability matrices obtained in step S11, and there are a plurality of ways to obtain the pre-selected area in step S12, which can be specifically referred to in the following description of specific embodiments.
And S13, processing the preselected area to obtain the segmentation result of each type of target area of the ground image.
In the embodiment of the present invention, the process of processing the preselected region includes the steps of performing morphological operations and processing by a connected component detection algorithm, which can be specifically described in the following detailed description.
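As a rough illustration of the "connected domain detection" step mentioned above, the following sketch labels the 4-connected components of a binary preselected-area mask and computes a bounding box for each component. The function name, the bounding-box output, and the toy mask are assumptions for the example, not taken from the patent:

```python
import numpy as np
from collections import deque

def label_components(mask):
    """Return (labels, boxes): labels is an int array (0 = background),
    boxes maps each label to its (row_min, col_min, row_max, col_max)."""
    labels = np.zeros(mask.shape, dtype=int)
    boxes, current = {}, 0
    for r in range(mask.shape[0]):
        for c in range(mask.shape[1]):
            if mask[r, c] and labels[r, c] == 0:
                current += 1                    # new connected component
                q = deque([(r, c)])
                labels[r, c] = current
                rmin = rmax = r
                cmin = cmax = c
                while q:                        # breadth-first flood fill
                    y, x = q.popleft()
                    rmin, rmax = min(rmin, y), max(rmax, y)
                    cmin, cmax = min(cmin, x), max(cmax, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            q.append((ny, nx))
                boxes[current] = (rmin, cmin, rmax, cmax)
    return labels, boxes

mask = np.array([[1, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 1]], dtype=bool)
labels, boxes = label_components(mask)
print(len(boxes))    # 2 — two separate target areas found
```

Each bounding box gives the position of one detected target area in the image.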
In the embodiment of the present invention, the ground image includes, but is not limited to, a remote sensing image and a high-altitude overhead image, and the technical solution provided by the embodiment of the present invention is described below with the remote sensing image as an example.
In an embodiment, in step S11, the remote sensing image is tested in the deep fully convolutional neural network model obtained by pre-training to obtain a probability matrix indicating whether each pixel in the remote sensing image belongs to a designated target area. A first optional implementation is as follows:
and taking the remote sensing image as the input of the deep full convolution neural network, obtaining the probability that each pixel belongs to each type of target area aiming at each pixel in the input remote sensing image in the output layer of the deep full convolution neural network, and generating a three-dimensional probability matrix that all pixels of the remote sensing image respectively belong to each type of target area.
And obtaining the probability that each pixel of the remote sensing image belongs to each type of target area on the remote sensing image through the deep full convolution neural network, and generating a three-dimensional probability matrix that all pixels of the remote sensing image respectively belong to each type of target area, namely the position of each pixel of the remote sensing image corresponding to the three-dimensional probability matrix has the probability value that the pixel respectively belongs to each type of target area of the remote sensing image.
In this way, a multi-dimensional probability matrix is obtained, each element in the matrix comprises a multi-dimensional probability value, and the multi-dimensional probability values respectively represent the probability that the pixel belongs to each target area.
Accordingly, in step S12, obtaining the preselected regions of the target areas of the various types included in the remote sensing image according to the probability matrix may be implemented as follows:
determining the type of a target area with the highest probability value of each pixel according to the three-dimensional probability matrix, and taking the target area of the type as a classification result of the current pixel; and dividing the target area according to the classification result to obtain preselected areas of the target areas of various types.
For example, suppose the types of target area in the remote sensing image include a road area, a building area and a lake area. Each pixel point on the remote sensing image then receives four probability values at its corresponding position in the three-dimensional probability matrix: the probability A of belonging to the road area, the probability B of belonging to the building area, the probability C of belonging to the lake area, and the probability D of belonging to the background. These probability values are compared to determine the highest one: if A is greater than B, C and D, that is, A is the maximum value, the road area corresponding to A is taken as the classification result of the pixel. The same calculation is applied to all pixels of the remote sensing image, and the target areas are divided according to the classification results of all pixels, obtaining the preselected areas of the road area, the building area and the lake area of the remote sensing image.
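The comparison rule in this example amounts to a per-pixel argmax over the type axis of the three-dimensional probability matrix. The toy 4 × 2 × 2 matrix and the type indices below are illustrative assumptions:

```python
import numpy as np

# Type indices 0..3 stand in for road, building, lake and background
# (the A, B, C, D probabilities of the example above).
prob_matrix = np.array([
    [[0.7, 0.1], [0.2, 0.1]],   # road probabilities (A)
    [[0.1, 0.6], [0.2, 0.1]],   # building probabilities (B)
    [[0.1, 0.2], [0.5, 0.1]],   # lake probabilities (C)
    [[0.1, 0.1], [0.1, 0.7]],   # background probabilities (D)
])                              # shape: t=4 types x 2 x 2 pixels

# For each pixel, keep the type with the highest probability value.
classification = prob_matrix.argmax(axis=0)
print(classification)
# [[0 1]
#  [2 3]]

# The preselected area of one type is the mask of pixels classified as it:
road_preselection = (classification == 0)
```

Collecting the masks for every type yields the preselected areas of all target-area types in one pass.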
Alternatively, more than one three-dimensional probability matrix may be generated to represent a pixel or pixels that may belong to two different categories simultaneously, such as mountain land and green land.
In an embodiment, in step S11, the remote sensing image is tested in the deep fully convolutional neural network model obtained by pre-training to obtain a probability matrix indicating whether each pixel in the remote sensing image belongs to a designated target area. A second optional implementation is as follows:
and taking the remote sensing image as the input of the depth fully-convolution neural network, and obtaining a two-dimensional probability matrix of each pixel in the input remote sensing image as a target area of the type aiming at each specified type of target area in an output layer of the depth fully-convolution neural network.
Specifically, for a specified type of target area to be segmented on the remote sensing image, for example, the specified type of target area of the remote sensing image is segmented into a road area, then, the probability that each pixel in the remote sensing image belongs to the road area is calculated in the deep fully convolutional neural network, and then a two-dimensional probability matrix that each pixel in the remote sensing image is the road area can be obtained through testing in the deep fully convolutional neural network.
In this way, for a plurality of different target regions, there may be a plurality of two-dimensional probability matrices, each including a probability of whether each pixel in the image belongs to a respective target region.
Accordingly, in step S12, the pre-selection area of each type of target area included in the remote sensing image is obtained according to the probability matrix, which can be implemented as follows:
and aiming at each type of target area, judging whether the probability value of each pixel is greater than a specified threshold value according to the two-dimensional probability matrix of the type of target area, extracting all pixels with the probability values greater than the threshold value, and obtaining a preselected area in the type of target area.
Taking the road area as an example again: for each pixel, it is judged whether its probability value in the two-dimensional probability matrix is greater than a specified threshold. If so, the pixel is determined to belong to the road area and is extracted; if not, the pixel is not extracted. After all pixels have been compared with the specified threshold, the extracted pixels form the preselected area of the road area of the remote sensing image.
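The thresholding rule can be sketched as follows. The toy two-dimensional probability matrix and the threshold value 0.5 are illustrative assumptions; in practice the threshold would be chosen per image type:

```python
import numpy as np

# Hypothetical 2-D probability matrix for the "road" type: one probability
# per pixel of a 3 x 3 remote sensing image.
road_prob = np.array([
    [0.9, 0.8, 0.2],
    [0.1, 0.7, 0.3],
    [0.0, 0.6, 0.1],
])
threshold = 0.5     # illustrative value; chosen per image type in practice

# Pixels with probability above the threshold form the preselected area.
road_preselection = road_prob > threshold
print(road_preselection.astype(int))
# [[1 1 0]
#  [0 1 0]
#  [0 1 0]]
```

With one such matrix per target-area type, repeating the comparison gives the preselected area of every type.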
In the embodiment of the present invention, the threshold may be determined according to the actual situation, for example according to the type of remote sensing image, such as a Synthetic Aperture Radar (SAR) image or an optical satellite remote sensing image. The specific method for determining the threshold may be any method in the prior art, which is not limited in the embodiment of the present invention.
It should be noted that, in the embodiment of the present invention, the deep full convolution neural network model is obtained by training different types of target areas of multiple ground sample images in the image sample set.
In one embodiment, the process of training the deep fully convolutional neural network, as shown in fig. 2, may include the following steps:
s21: marking different types of target areas in the ground sample image by using polygons and/or line segments with different thicknesses to obtain a true value of each target area on each ground sample image;
Specifically, manual labeling may be adopted: for each ground sample image in the image sample set, a nonlinear target area is labeled by selecting a polygonal region that closely follows the edge of the target area according to the contour shape of that type of target area, and a linear target area is labeled with line segments of different thicknesses according to the line width of that type of target area. This yields the true value of each target area on the ground sample image, that is, the training sample of each type of target area.
S22: and training the deep full convolution neural network by taking the area marked by the polygon and/or the line segments with different thicknesses as a positive sample and the area not marked as a negative sample to obtain the deep full convolution neural network model.
After the labeling of the image sample set, a plurality of labeled ground sample images are obtained. The nonlinear target areas labeled with polygons and/or the linear target areas labeled with line segments of different thicknesses in the ground sample images are taken as positive samples, the unlabeled areas in the ground sample images are taken as negative samples, and the deep fully convolutional neural network is trained with a back propagation algorithm to obtain the deep fully convolutional neural network model.
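As an illustration of how a polygon annotation can be turned into the per-pixel truth values used as positive and negative samples, the sketch below rasterizes a polygon with the even-odd (ray casting) rule. The patent does not prescribe this rasterization method, and the image size and polygon coordinates are assumed for the example.

```python
import numpy as np

def polygon_mask(height, width, polygon):
    """Rasterize a polygon annotation into a binary truth mask using
    the even-odd (ray casting) rule: a pixel whose centre lies inside
    the polygon becomes a positive sample (1), all others negative (0)."""
    mask = np.zeros((height, width), dtype=np.uint8)
    n = len(polygon)
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5   # pixel centre
            inside = False
            for i in range(n):
                x1, y1 = polygon[i]
                x2, y2 = polygon[(i + 1) % n]
                # Count edges that a horizontal ray from the centre crosses.
                if (y1 > py) != (y2 > py):
                    xint = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                    if px < xint:
                        inside = not inside
            mask[y, x] = 1 if inside else 0
    return mask

# Illustrative 6x6 sample image with a square building annotation.
m = polygon_mask(6, 6, [(1, 1), (5, 1), (5, 5), (1, 5)])
```

A linear annotation would analogously be drawn as a line segment thickened to the labeled line width; the principle of producing a per-pixel truth mask is the same.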
In the embodiment of the invention, some target areas in the ground image have a linear shape, such as road areas, ditch areas and river areas; such linear target areas are linear image areas. Other target areas have an aggregated regular or irregular nonlinear shape, such as building areas, airport areas and lake areas; such target areas are nonlinear image areas. When samples are labeled, nonlinear target areas are labeled with polygons, and linear target areas are labeled with line segments of different thicknesses. Because a remote sensing image may be segmented into a plurality of target areas simultaneously, and the sizes of different types of target areas differ greatly, the samples tend to be unbalanced, which makes fine targets difficult to segment accurately. In the embodiment of the invention, the deep fully convolutional neural network may be trained with a gradient descent method: during training, the quality of the learning result is judged according to the loss value, and the gradient descent direction, that is, the training direction, is calculated from the partial derivatives of the loss function. In the process of training the deep fully convolutional neural network, the following loss function is used:
FL(p_t) = -α · (1 - p_t)^γ · log(p_t)
wherein α and γ are fixed constants, α ∈ [0, 1], γ ∈ [1, +∞); t denotes a certain type of target area, for example a road area or a building area; p_t is the probability predicted by the network for class t when the true value of the target area is t; and FL(p_t) is the loss calculated from p_t. Through this loss function, the contribution to the network error of large, easily learned target areas, such as building areas and lake areas, is reduced, while the contribution of small, hard-to-learn target areas, such as road areas and ditch areas, is increased. The distribution of samples with unbalanced target sizes in the ground sample images is thereby balanced, and the features of all target areas can be better learned during training of the deep fully convolutional neural network.
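This loss down-weights well-learned pixels with high p_t and keeps a high loss for poorly learned ones. A minimal per-pixel sketch follows; the values alpha = 0.25 and gamma = 2.0 are illustrative choices within the stated ranges, not values fixed by the embodiment.

```python
import math

def focal_style_loss(p_t, alpha=0.25, gamma=2.0):
    """Loss for one pixel whose true class is t, given the network's
    predicted probability p_t for class t. The factor (1 - p_t)**gamma
    shrinks the contribution of easy, well-learned pixels (large p_t)
    and preserves that of hard, small targets (small p_t)."""
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

# A well-learned pixel (p_t = 0.9) contributes far less to the network
# error than a poorly learned one (p_t = 0.1).
easy = focal_style_loss(0.9)
hard = focal_style_loss(0.1)
```

In practice this per-pixel value would be summed or averaged over all pixels and classes to form the training loss.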
In the training process of the deep fully convolutional neural network model, the training samples are learned and high-dimensional semantic features of the labeled target areas, such as road features, building features and lake features, are extracted. When an input ground image is tested, the trained model classifies each pixel of the input ground image according to the extracted high-dimensional semantic features, obtains the probability value that each pixel belongs to each type of target area, and generates the probability matrix.
In one embodiment, in step S13, processing the preselected area to obtain the segmentation result of each type of target area of the remote sensing image may be implemented by the following steps:
the preselected area is regularized through morphological operations;
the target areas in the regularized output image are extracted through a connected-component detection algorithm, the position of each target area is calculated, and the segmentation results of each type of target area of the remote sensing image are obtained.
In one embodiment, the foregoing regularizing the preselected region by morphological operations may be implemented by:
when the preselected area is a linear image area, repairing the fracture of the preselected area through closed operation;
and when the preselected area is a nonlinear image area, cutting the adhesion part of the preselected area through open operation.
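The two regularization branches above can be sketched in NumPy. This is an illustrative sketch assuming a 3x3 square structuring element; the embodiment elsewhere uses larger, shape-specific templates, and the broken road mask below is an assumed example.

```python
import numpy as np

def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dy in range(3):          # OR over the 3x3 neighbourhood
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask):
    """Binary erosion with a 3x3 square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, 1)
    out = np.ones_like(mask)
    for dy in range(3):          # AND over the 3x3 neighbourhood
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def regularize(mask, linear):
    """Closing (dilate then erode) repairs breaks in a linear region;
    opening (erode then dilate) cuts adhesions in a nonlinear region."""
    return erode(dilate(mask)) if linear else dilate(erode(mask))

# A one-pixel break in a thin, road-like region is repaired by closing.
broken = np.array([[0, 0, 0, 0, 0, 0, 0],
                   [1, 1, 1, 0, 1, 1, 1],
                   [0, 0, 0, 0, 0, 0, 0]], dtype=np.uint8)
closed = regularize(broken, linear=True)
```

Note that the zero padding also erodes the image border, so the endpoints of the thin region shrink by one pixel; a real pipeline would pad the image or use templates matched to the target size, as the embodiment cautions for small targets.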
Morphological operations are image processing methods applied to binary images according to mathematical morphology. In mathematical morphology, an opening operation is defined as erosion followed by dilation, and a closing operation is defined as dilation followed by erosion. Erosion shrinks the range of the target area, causing the boundary of the image to contract, and can be used to eliminate small, meaningless targets. Dilation enlarges the range of the target area, merging background points in contact with the target area into the target object so that the boundary of the target object expands outward; this fills some holes in the target area and eliminates small particle noise contained in it. The connected-component detection algorithm checks the connectivity of each pixel with its neighboring pixels and is a simple and effective method for labeling each region in a segmented image. A break in a preselected area refers to a place where target areas present in the preselected areas of adjacent target areas are not connected. An adhesion of preselected areas refers to a place where pixels in the preselected areas of two adjacent target areas coincide.
In this embodiment, an area whose difference between length and width is greater than a set difference threshold may be defined as a linear image area, and an area whose difference between length and width is not greater than the set difference threshold may be defined as a nonlinear image area. Correspondingly, a preselected area can be judged according to the difference between its length and width: if the difference is greater than the set difference threshold, it is judged to be a linear image area; otherwise, it is considered a nonlinear image area.
Some target areas in remote sensing images have a linear shape, such as road areas and river areas; these linear target areas belong to linear image areas. Other target areas have an aggregated regular or irregular nonlinear shape, such as building areas, airport areas and lake areas; these nonlinear target areas belong to nonlinear image areas. Among the morphological operations, the dilation, erosion, opening or closing operation may be selected according to the actual shape of the target area.
When the preselected area of a target area is a linear image area, the internal holes of the image and the burrs on its edges are removed through a morphological closing operation, which connects the broken parts of the image. Specifically, the closing operation first performs dilation on the preselected area, enlarging the target by a certain number of points so as to fill the holes in the preselected area and connect the broken parts of the image, and then performs erosion on the enlarged image to eliminate the edge burrs. The smoothing and break repair of the preselected area are thus completed without significantly changing the area of the image.
When the preselected area of a target area is a nonlinear image area, discrete points and edge burrs are eliminated through a morphological opening operation, and adhered targets are separated. Specifically, the opening operation first performs erosion on the preselected area, reducing its area, eliminating discrete points and edge burrs and separating adhered targets, and then performs dilation on the reduced image to fill the holes in it. The smoothing of the preselected area and the cutting of its adhered parts are thus completed without significantly changing the area of the image.
In the embodiment of the invention, each pixel of the remote sensing image is classified by the deep fully convolutional neural network to obtain the preselected area of the target area. The resulting graph is rough: burrs often appear at the edge of the target area, and the edge is not smooth. Moreover, because the remote sensing image contains many targets, the background is complex and some targets are small, the result needs to be refined. The preselected area is therefore regularized through morphological operations: edge burrs of the target area are eliminated, the edge of the target area is smoothed, small target noise and background holes are processed, and false detections of small targets are removed. Finally, the target areas in the regularized output image are extracted through a connected-component detection algorithm, their positions are calculated, and the segmentation result of each type of target area of the remote sensing image is obtained.
In the embodiment of the invention, because some targets on the remote sensing image may be very small, the number of pixels of the target itself needs to be considered when performing morphological operations. When performing morphological erosion on a small target, the radius, in pixels, of the selected erosion template must not be too large; otherwise the target may not be identified, or may be identified poorly.
According to the method for segmenting a target area in a remote sensing image provided by the embodiment of the invention, each pixel of the tested remote sensing image is multi-classified by the pre-trained deep fully convolutional neural network model, so that the probability matrix of each pixel in the remote sensing image is obtained and the target area to which each pixel belongs is judged. Because the preselected area of each type of target area included in the remote sensing image is obtained by classifying the pixels of the remote sensing image, target areas of different types can be segmented. Because the deep fully convolutional neural network model performs deep learning on the sample images, pixels of different target areas can be classified and segmentation results for the different target areas obtained. The method is not easily affected by imaging quality, requires no three-dimensional transformation and no auxiliary processing of other information sources when testing the remote sensing image, produces target areas closer to the actual shape and size of the targets, and is suitable for target area segmentation of visible-light remote sensing images, multispectral remote sensing images and remote sensing images with multi-source information fusion.
Furthermore, because the pixels of the remote sensing image are classified in the deep fully convolutional neural network model, the requirement on the imaging quality of the remote sensing image is low, and the method is suitable for target area segmentation of both high-resolution and low-resolution remote sensing images. The stability of target area segmentation is good, the method is not easily affected by weather conditions or complex backgrounds in the remote sensing image, the information integrity of the segmented target area is ensured, and the error of the segmentation result is reduced.
In a specific embodiment, taking the segmentation of the road area in the remote sensing image as an example, the embodiment of the present invention is further described:
referring to fig. 3, a remote sensing image in a test image data set is obtained, the remote sensing image is used as an input of a deep full convolution neural network obtained through pre-training, the remote sensing image is tested in a deep full convolution neural network model, the probability that each pixel belongs to each type of target area is obtained for each pixel in the input remote sensing image in an output layer of the deep full convolution neural network, and a three-dimensional probability matrix that all pixels of the remote sensing image respectively belong to each type of target area is generated; determining the type of a target area with the highest probability value of each pixel according to the three-dimensional probability matrix, and taking the target area of the type as a classification result of the current pixel; the sum of the probabilities that each element in the three-dimensional probability matrix belongs to the respective target region is 1. For example, when a road region and a building region of an input remote sensing image are simultaneously segmented, each element in the three-dimensional probability matrix outputs three probability values, wherein the three probability values are respectively the probability that a pixel point on the input remote sensing image corresponding to the element belongs to the road region, the probability that the pixel point belongs to the building region and the probability that the pixel point belongs to the background region, and the sum of the three probability values is 1. And if the probability value belonging to the road region is the maximum in the three probability values, classifying the pixel point as a road region pixel point. 
Pixel points of the input remote sensing image corresponding to the elements whose highest probability value is of the road area type are extracted, yielding the output image of the deep fully convolutional neural network shown in FIG. 4, where the white region is the preselected area of the road area. Alternatively:
For the road area, a two-dimensional probability matrix in which each pixel of the input remote sensing image is scored as road area is obtained; whether the probability value corresponding to each pixel in the two-dimensional probability matrix is greater than a specified threshold is judged, and if so, the pixel is taken as a road area pixel. Each element of the two-dimensional probability matrix represents only the probability that the corresponding pixel of the input remote sensing image belongs to the road area. A threshold is specified in advance; if the probability value of an element is greater than the threshold, the corresponding pixel belongs to the road area. All pixels belonging to the road area are extracted, yielding the output image of the deep fully convolutional neural network shown in FIG. 4, where the white region is the preselected area of the road area.
The preselected area of the road area shown in fig. 4, obtained by segmentation with the deep fully convolutional neural network, is a relatively rough intermediate result: pixel classification errors may occur during classification, so several breaks exist between roads in the preselected area, and burrs may appear at road edges. Meanwhile, the interior of a road contains some tiny holes caused by occlusions such as vehicles and pedestrians in the input remote sensing image. Regularizing the preselected area of the road area with the closing operation fills the tiny holes inside roads, repairs road breaks, connects adjacent roads, smooths road edges and eliminates edge burrs. Specifically, referring to fig. 5, the image shown in fig. 4 may first be morphologically dilated and then morphologically eroded to obtain the regularized output image.
In a specific embodiment, when performing the closing operation on the preselected area of the road area shown in fig. 4, suppose a circle with a radius of 10 pixels is used as the template for morphological dilation. The dilation template is convolved with the preselected area of the road area: the local maximum, that is, the maximum value of the preselected-area pixels covered by the template, is calculated and assigned to the pixel specified by the template's reference point. An 8-neighborhood or 4-neighborhood processing mode may be used. Taking the 8-neighborhood mode as an example, after convolution every foreground point of the preselected area turns all 8 of its surrounding pixels into foreground points; after all foreground points have been traversed once, the preselected area of the road area in the image is gradually enlarged, yielding the dilated road area. Then a square with a radius of 10 pixels is used as the template for morphological erosion. The erosion template is convolved with the dilated road area: the local minimum, that is, the minimum value of the covered pixels, is calculated and assigned to the pixel specified by the template's reference point. Again taking the 8-neighborhood mode as an example, if any of the 8 pixels neighboring a foreground point is a background point, the current foreground point becomes background; after all foreground points have been traversed once, the dilated road area in the image gradually shrinks, yielding the eroded road area.
The calculation formula of the closing operation is specifically:
I_r = erode_{radius=10}(dilate_{radius=10}(I))
where I denotes the image of the preselected area of the road area shown in fig. 4, I_r denotes the regularized output image, radius = 10 denotes a template radius of 10 pixels, dilate denotes the dilation operation, and erode denotes the erosion operation.
Using a circle with a radius of 10 pixels as the template for morphological dilation and a square with a radius of 10 pixels as the template for morphological erosion is only one specific example; in practical applications, the template radius can be selected as required.
Finally, the road areas in the regularized output image are extracted through a connected-component detection algorithm, the positions of the interconnected roads in the road areas are calculated, and the road area covered by the interconnected roads is obtained.
When the road area of the regularized output image is extracted, each pixel of the road area is connected with its neighboring pixels to form a connected region, and the pixels of the road area are iterated until all pixels converge, yielding the target area, that is, the road area, whose specific position is then calculated. The connected-component detection algorithm adopted in the embodiment of the invention may be a two-pass scanning method or a seed filling method. Taking the two-pass scanning method as an example, in the first scan of the output image each pixel value is scanned from top to bottom and from left to right, and each valid pixel is given a label according to the following rules: if both the left and upper pixel values in the pixel's 4-neighborhood are 0 and unlabeled, the pixel is given a new label; if exactly one of the left or upper pixel values is 1, the pixel takes the label of that pixel; if both the left and upper pixel values are 1 with the same label, the pixel takes that label; and if both are 1 with different labels, the smaller label is taken as the pixel's label, and the labels of the left and upper pixels are recorded as equivalent. After the labeling is finished, a connected component may still contain several different labels.
Because one or more different labels may be assigned to pixels of the same connected region during the first scan, labels that belong to the same connected region but have different values need to be merged, that is, the equivalence relations between them are recorded. The output image is then scanned a second time, and the pixels marked by all labels in an equivalence relation are classified into one connected region, with the label of smallest value selected as the new label of those pixels. In other words, the already assigned labels are visited and equivalent labels are merged, yielding all connected regions present in the labeled image, that is, the interconnected roads of the road area. The positions of the interconnected roads are calculated, and the road area covered by several interconnected roads is obtained.
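The two-pass labeling with 4-connectivity can be sketched compactly using a union-find forest to record and merge the equivalence relations. This is an illustrative implementation, not the patent's code; the U-shaped test mask is an assumed example.

```python
import numpy as np

def two_pass_label(mask):
    """Two-pass connected-component labeling with 4-connectivity.
    The first pass assigns provisional labels and records equivalences;
    the second pass relabels each pixel with its set's smallest label."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    parent = {}                       # union-find forest over labels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path compression
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    next_label = 1
    for y in range(h):                # first pass
        for x in range(w):
            if mask[y, x] == 0:
                continue
            left = labels[y, x - 1] if x > 0 else 0
            up = labels[y - 1, x] if y > 0 else 0
            if left == 0 and up == 0:
                labels[y, x] = next_label
                parent[next_label] = next_label
                next_label += 1
            elif left and up:
                labels[y, x] = min(left, up)
                union(left, up)       # record the equivalence
            else:
                labels[y, x] = left or up
    for y in range(h):                # second pass: merge equivalences
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels

# A U-shaped road: two provisional labels are merged in the second pass.
road = np.array([[1, 0, 1],
                 [1, 1, 1]], dtype=np.uint8)
labels = two_pass_label(road)
```

From the final label image, the position of each connected region, that is, each set of interconnected roads, can be read off directly.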
Based on the same inventive concept, embodiments of the present invention further provide a device for segmenting a target region in a ground image, a related storage medium and an image processing apparatus. Because the principles by which these solve the problem are similar to those of the method for segmenting a target region in a ground image, their implementation may refer to the implementation of the foregoing method, and repeated details are not repeated.
Referring to fig. 6, an embodiment of the present invention provides a device for segmenting a target region in a ground image, including:
the test module 61 is configured to input, for at least one ground image in the test image data set, the ground image into a depth full convolution neural network model obtained through pre-training to obtain a probability matrix; each element of the probability matrix represents the probability that a pixel point corresponding to the element in the ground image belongs to at least one type of target area; the deep full convolution neural network model is obtained by training different types of target areas of a plurality of ground sample images in an image sample set;
a preselection module 62, configured to obtain preselection areas of various types of target areas included in the ground image according to the probability matrix;
and the processing module 63 is configured to process the preselected area, and obtain a segmentation result of each type of target area of the ground image.
In one embodiment, the target region segmentation apparatus in the ground image further includes:
a labeling module 64, configured to label, for the ground sample image, different types of target regions in the image with polygons and/or line segments of different thicknesses, so as to obtain a true value of each target region on each ground sample image;
and the model training module 65 is configured to train the deep full convolution neural network by using the regions marked by the polygons and/or the line segments with different thicknesses as positive samples and using the regions not marked as negative samples, so as to obtain a deep full convolution neural network model.
In one embodiment, the preselection module 62, based on the probability matrix, obtains preselection regions for each type of target region included in the ground image, including: determining the type of a target area to which each pixel in the ground image belongs according to the probability value of each pixel in the probability matrix to obtain the pixels included in the target areas of all types; and obtaining the preselected areas of the target areas of the various types included in the ground image according to the pixels included in the target areas of the various types.
In one embodiment, the testing module is configured to input the ground image into a deep full convolution neural network model obtained through pre-training to obtain a probability matrix, and includes:
taking the ground image as the input of the deep fully convolutional neural network, obtaining, at the output layer of the network, the probability that each pixel in the input ground image belongs to each type of target area, and generating a three-dimensional probability matrix in which all pixels of the ground image respectively belong to each type of target area; correspondingly,
the preselection module obtains preselection areas of various types of target areas included in the ground image according to the probability matrix, and the preselection areas comprise:
determining the type of a target area with the highest probability value of each pixel according to the three-dimensional probability matrix, and taking the target area of the type as a classification result of the current pixel; and dividing the target area according to the classification result to obtain preselected areas of the target areas of various types.
In one embodiment, the testing module is configured to input the ground image into a deep full convolution neural network model obtained through pre-training to obtain a probability matrix, and includes:
taking the ground image as the input of the deep fully convolutional neural network, and obtaining, for each specified type of target area at the output layer of the network, a two-dimensional probability matrix in which each pixel of the input ground image is scored as that type of target area; correspondingly,
the preselection module obtains preselection areas of various types of target areas included in the ground image according to the probability matrix, and the preselection areas comprise:
and aiming at each type of target area, judging whether the probability value of each pixel is greater than a specified threshold value according to the two-dimensional probability matrix of the type of target area, extracting all pixels with probability values greater than the threshold value, and obtaining a preselected area in the type of target area.
In one embodiment, the processing module 63 is specifically configured to regularize the preselected area through morphological operations; extract the target areas in the regularized ground image through a connected-component detection algorithm, calculate the position of each target area, and obtain the segmentation result of each type of target area of the ground image.
In one embodiment, the processing module 63, specifically configured to perform regularization on the preselected region through a morphological operation, includes:
when the preselected area is a linear image area, repairing the fracture of the preselected area through closed operation;
and when the preselected area is a nonlinear image area, cutting the adhesion part of the preselected area through open operation.
Embodiments of the present invention provide a computer-readable storage medium, on which computer instructions are stored, and when the instructions are executed by a processor, the method for segmenting a target region in a ground image is implemented.
An embodiment of the present invention provides an image processing apparatus, including: a processor, a memory for storing processor executable commands; wherein the processor is configured to perform the above-mentioned target region segmentation method in the ground image.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (16)

1. A method for segmenting a target region in a ground image, comprising:
inputting at least one ground image in a test image data set into a pre-trained deep fully convolutional neural network model to obtain a probability matrix; each element of the probability matrix represents the probability that the pixel corresponding to that element in the ground image belongs to at least one type of target area; the deep fully convolutional neural network model is obtained by training on different types of target areas of a plurality of ground sample images in an image sample set;
obtaining preselected areas of target areas of various types included in the ground image according to the probability matrix;
and processing the preselected area to obtain the segmentation result of each type of target area of the ground image.
2. The method of claim 1, wherein the process of training the deep fully convolutional neural network comprises:
marking different types of target areas in the ground sample image by using polygons and/or line segments with different thicknesses to obtain a true value of each target area on each ground sample image;
and training the deep fully convolutional neural network by taking the areas marked by polygons and/or line segments with different thicknesses as positive samples and the unmarked areas as negative samples, so as to obtain the deep fully convolutional neural network model.
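Claim 2's polygon and line-segment annotations can be rasterized into a per-pixel ground-truth mask for training. The sketch below is a minimal illustration using Pillow; the image size, coordinates, class ids, and line thickness are hypothetical values chosen for the example, not values from the patent:

```python
import numpy as np
from PIL import Image, ImageDraw

# Hypothetical annotations for one ground sample image: a building footprint
# marked as a polygon, and a road marked as a centerline segment whose width
# approximates the road's extent (per claim 2, different thicknesses).
W, H = 32, 32
building_polygon = [(4, 4), (12, 4), (12, 12), (4, 12)]
road_centerline = [(0, 20), (31, 24)]

mask = Image.new("L", (W, H), 0)             # 0 = unmarked (negative sample)
draw = ImageDraw.Draw(mask)
draw.polygon(building_polygon, fill=1)       # class 1: building area
draw.line(road_centerline, fill=2, width=3)  # class 2: road, 3 px thick

truth = np.array(mask)                       # per-pixel ground-truth labels
```

Unmarked pixels keep label 0 and serve as negative samples; each nonzero label marks positive samples for its target-area type.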
3. The method of claim 1, wherein the obtaining preselected areas of respective types of target areas included in the ground image from the probability matrix comprises:
determining the type of a target area to which each pixel in the ground image belongs according to the probability value of each pixel in the probability matrix to obtain the pixels included in the target areas of all types;
and obtaining the preselected regions of the target areas of the various types included in the ground image according to the pixels included in the target areas of the various types.
4. The method of claim 1, wherein the inputting the ground image into a pre-trained deep fully convolutional neural network model to obtain a probability matrix comprises:
taking the ground image as the input of the deep fully convolutional neural network, obtaining, at the output layer of the network, the probability that each pixel in the input ground image belongs to each type of target area, and generating a three-dimensional probability matrix in which all pixels of the ground image respectively belong to each type of target area; and correspondingly,
the obtaining the preselected regions of the target regions of the various types included in the ground image according to the probability matrix includes:
determining, for each pixel and according to the three-dimensional probability matrix, the type of target area with the highest probability value, and taking that type of target area as the classification result of the current pixel; and dividing the target areas according to the classification results to obtain the preselected areas of the target areas of various types.
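The per-pixel argmax classification in claim 4 can be sketched with NumPy as follows; the matrix size, number of classes, and random probabilities are illustrative assumptions:

```python
import numpy as np

# Hypothetical H x W x C probability matrix as produced by the softmax
# output layer of a fully convolutional network (H = W = 4, C = 3 classes).
rng = np.random.default_rng(0)
prob = rng.random((4, 4, 3))
prob /= prob.sum(axis=2, keepdims=True)  # each pixel's probabilities sum to 1

# Assign each pixel the target-area type with the highest probability ...
class_map = np.argmax(prob, axis=2)      # shape (4, 4)

# ... and split the image into one preselected (boolean) region per type.
preselected = {k: class_map == k for k in range(prob.shape[2])}
```

Because argmax assigns exactly one type to every pixel, the preselected regions partition the image.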
5. The method of claim 1, wherein the inputting the ground image into a pre-trained deep fully convolutional neural network model to obtain a probability matrix comprises:
taking the ground image as the input of the deep fully convolutional neural network, and obtaining, at the output layer of the network, for each specified type of target area, a two-dimensional probability matrix of the probability that each pixel in the input ground image belongs to that type of target area; and correspondingly,
the obtaining the preselected regions of the target regions of the various types included in the ground image according to the probability matrix includes:
and for each type of target area, determining, according to the two-dimensional probability matrix of that type of target area, whether the probability value of each pixel is greater than a specified threshold, and extracting all pixels whose probability values are greater than the threshold to obtain the preselected area of that type of target area.
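The per-type thresholding in claim 5 reduces to a single comparison per pixel. In this NumPy sketch, the 3×3 probability matrix and the 0.5 threshold are illustrative assumptions (the patent only specifies "a specified threshold"):

```python
import numpy as np

# Hypothetical 2-D probability matrix for one target-area type (e.g. "road").
prob_road = np.array([[0.9, 0.2, 0.1],
                      [0.8, 0.7, 0.3],
                      [0.1, 0.6, 0.2]])
THRESHOLD = 0.5  # illustrative value

# All pixels whose probability exceeds the threshold form the preselected area.
preselected_road = prob_road > THRESHOLD

print(int(preselected_road.sum()))  # prints 4
```

Unlike the argmax variant of claim 4, thresholding each type independently allows a pixel to belong to several preselected areas, or to none.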
6. The method according to any one of claims 1 to 5, wherein the processing the preselected area to obtain the segmentation results for each type of target area of the ground image comprises:
regularizing the preselected region through a morphological operation;
extracting the target areas in the regularized output image through a connected-domain detection algorithm, calculating the position of each target area, and obtaining the segmentation results of the target areas of various types of the ground image.
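The connected-domain step in claim 6 can be illustrated with `scipy.ndimage` (assumed available; the 6×6 mask below is a toy stand-in for a regularized preselected area):

```python
import numpy as np
from scipy import ndimage

# Toy regularized binary mask containing two separate target areas.
mask = np.zeros((6, 6), dtype=bool)
mask[0:2, 0:2] = True   # area 1
mask[4:6, 3:6] = True   # area 2

# Label connected components, then take each component's bounding box as
# its position (one possible reading of "calculating the positions").
labels, count = ndimage.label(mask)
boxes = ndimage.find_objects(labels)  # one (row_slice, col_slice) per area

print(count)  # prints 2
```

Each labeled component is one segmented target area; its bounding-box slices give pixel coordinates directly usable for cropping or reporting.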
7. The method of claim 6, wherein the regularizing the preselected region through a morphological operation comprises:
when the preselected area is a linear image area, repairing fractures in the preselected area through a closing operation;
and when the preselected area is a nonlinear image area, cutting adhesion parts of the preselected area through an opening operation.
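Claim 7's two morphological repairs can be sketched with `scipy.ndimage` (assumed available); the masks, structuring elements, and break/bridge sizes are illustrative:

```python
import numpy as np
from scipy import ndimage

# Linear (road-like) preselected area with a one-pixel break in the middle.
line = np.zeros((3, 9), dtype=bool)
line[1, :] = True
line[1, 4] = False  # simulated fracture

# Closing (dilation then erosion) repairs small breaks in linear regions.
closed = ndimage.binary_closing(line, structure=np.ones((1, 3)))

# Non-linear (blob-like) area: two blobs joined by a thin one-pixel bridge.
blobs = np.zeros((7, 11), dtype=bool)
blobs[1:6, 1:4] = True
blobs[1:6, 7:10] = True
blobs[3, 4:7] = True  # simulated adhesion

# Opening (erosion then dilation) removes the bridge, separating the blobs.
opened = ndimage.binary_opening(blobs, structure=np.ones((3, 3)))
```

After closing, the fractured pixel is restored; after opening, the bridge is gone and connected-component labeling finds two areas instead of one.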
8. An object region segmentation apparatus in a ground image, comprising:
the testing module is used for inputting at least one ground image in a test image data set into a pre-trained deep fully convolutional neural network model to obtain a probability matrix; each element of the probability matrix represents the probability that the pixel corresponding to that element in the ground image belongs to at least one type of target area; the deep fully convolutional neural network model is obtained by training on different types of target areas of a plurality of ground sample images in an image sample set;
the preselection module is used for obtaining preselection areas of various types of target areas included in the ground image according to the probability matrix;
and the processing module is used for processing the preselected area and acquiring the segmentation result of each type of target area of the ground image.
9. The apparatus of claim 8, further comprising:
the marking module is used for marking different types of target areas in the ground sample image by using polygons and/or line segments with different thicknesses to obtain a true value of each target area on each ground sample image;
and the model training module is used for training the deep full convolution neural network by taking the area marked by the polygon and/or the line segments with different thicknesses as a positive sample and the area not marked as a negative sample to obtain the deep full convolution neural network model.
10. The apparatus of claim 8, wherein the preselection module obtains preselection regions for respective types of target areas included in the ground image based on the probability matrix, comprising:
determining the type of a target area to which each pixel in the ground image belongs according to the probability value of each pixel in the probability matrix to obtain the pixels included in the target areas of all types;
and obtaining the preselected regions of the target areas of the various types included in the ground image according to the pixels included in the target areas of the various types.
11. The apparatus of claim 8, wherein the testing module is configured to input the ground image into a pre-trained deep fully convolutional neural network model to obtain a probability matrix, comprising:
taking the ground image as the input of the deep fully convolutional neural network, obtaining, at the output layer of the network, the probability that each pixel in the input ground image belongs to each type of target area, and generating a three-dimensional probability matrix in which all pixels of the ground image respectively belong to each type of target area; and correspondingly,
the preselection module obtains preselection areas of various types of target areas included in the ground image according to the probability matrix, and the preselection areas comprise:
determining, for each pixel and according to the three-dimensional probability matrix, the type of target area with the highest probability value, and taking that type of target area as the classification result of the current pixel; and dividing the target areas according to the classification results to obtain the preselected areas of the target areas of various types.
12. The apparatus of claim 8, wherein the testing module is configured to input the ground image into a pre-trained deep fully convolutional neural network model to obtain a probability matrix, comprising:
taking the ground image as the input of the deep fully convolutional neural network, and obtaining, at the output layer of the network, for each specified type of target area, a two-dimensional probability matrix of the probability that each pixel in the input ground image belongs to that type of target area; and correspondingly,
the preselection module obtains preselection areas of various types of target areas included in the ground image according to the probability matrix, and the preselection areas comprise:
and for each type of target area, determining, according to the two-dimensional probability matrix of that type of target area, whether the probability value of each pixel is greater than a specified threshold, and extracting all pixels whose probability values are greater than the threshold to obtain the preselected area of that type of target area.
13. The apparatus of any one of claims 8-12, wherein the processing module processes the preselected region to obtain segmentation results for each type of target region of the ground image, including:
regularizing the preselected region through a morphological operation;
extracting the target areas in the regularized ground image through a connected-domain detection algorithm, calculating the position of each target area, and obtaining the segmentation result of each type of target area of the ground image.
14. The apparatus of claim 13, wherein the processing module regularizes the preselected region through a morphological operation, comprising:
when the preselected area is a linear image area, repairing fractures in the preselected area through a closing operation;
and when the preselected area is a nonlinear image area, cutting adhesion parts of the preselected area through an opening operation.
15. A computer readable storage medium having stored thereon computer instructions, which when executed by a processor, implement a method of target region segmentation in a ground image according to any one of claims 1 to 7.
16. An image processing apparatus comprising: a processor, and a memory for storing processor-executable instructions; wherein the processor is configured to perform the method for segmenting a target region in a ground image according to any one of claims 1 to 7.
CN201910117994.8A 2019-02-15 2019-02-15 Target area segmentation method and device in ground image Pending CN111582004A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910117994.8A CN111582004A (en) 2019-02-15 2019-02-15 Target area segmentation method and device in ground image


Publications (1)

Publication Number Publication Date
CN111582004A true CN111582004A (en) 2020-08-25

Family

ID=72122431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910117994.8A Pending CN111582004A (en) 2019-02-15 2019-02-15 Target area segmentation method and device in ground image

Country Status (1)

Country Link
CN (1) CN111582004A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884792A (en) * 2021-02-02 2021-06-01 青岛海信医疗设备股份有限公司 Lung image segmentation method and device, electronic equipment and storage medium
CN113761622A (en) * 2021-08-27 2021-12-07 广州市城市规划勘测设计研究院 Parameterized city modeling method and device, storage medium and terminal equipment
CN113947613A (en) * 2021-12-21 2022-01-18 腾讯科技(深圳)有限公司 Target area detection method, device, equipment and storage medium
CN115861328A (en) * 2023-03-01 2023-03-28 中国科学院空天信息创新研究院 Grave detection method and device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106910202A (en) * 2017-02-15 2017-06-30 武汉喜恩卓科技有限责任公司 The image partition method and system of a kind of remote sensing images atural object
CN108062756A (en) * 2018-01-29 2018-05-22 重庆理工大学 Image, semantic dividing method based on the full convolutional network of depth and condition random field
CN108564587A (en) * 2018-03-07 2018-09-21 浙江大学 A kind of a wide range of remote sensing image semantic segmentation method based on full convolutional neural networks


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Xi et al.: "Road Extraction from High-Resolution Remote Sensing Images Based on Time-Frequency Features", Geospatial Information *


Similar Documents

Publication Publication Date Title
US11144889B2 (en) Automatic assessment of damage and repair costs in vehicles
CN108647585B (en) Traffic identifier detection method based on multi-scale circulation attention network
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
US10692050B2 (en) Automatic assessment of damage and repair costs in vehicles
CN111369572B (en) Weak supervision semantic segmentation method and device based on image restoration technology
WO2019104767A1 (en) Fabric defect detection method based on deep convolutional neural network and visual saliency
CN111582004A (en) Target area segmentation method and device in ground image
CN103049763B (en) Context-constraint-based target identification method
CN111028217A (en) Image crack segmentation method based on full convolution neural network
CN111445488B (en) Method for automatically identifying and dividing salt body by weak supervision learning
Xu et al. Automatic recognition algorithm of traffic signs based on convolution neural network
Bai et al. Feature based fuzzy inference system for segmentation of low-contrast infrared ship images
CN111461212A (en) Compression method for point cloud target detection model
CN112734761B (en) Industrial product image boundary contour extraction method
CN112819840B (en) High-precision image instance segmentation method integrating deep learning and traditional processing
Chen et al. Road damage detection and classification using mask R-CNN with DenseNet backbone
CN106709515A (en) Downward-looking scene matching area selection criteria intervention method
Maggiolo et al. Improving maps from CNNs trained with sparse, scribbled ground truths using fully connected CRFs
CN110097524B (en) SAR image target detection method based on fusion convolutional neural network
CN117094975A (en) Method and device for detecting surface defects of steel and electronic equipment
Qi et al. Micro-concrete crack detection of underwater structures based on convolutional neural network
CN117095180B (en) Embryo development stage prediction and quality assessment method based on stage identification
Omidalizarandi et al. Segmentation and classification of point clouds from dense aerial image matching
CN116648723A (en) Method and device for analyzing microstructure of material
CN116958837A (en) Municipal facilities fault detection system based on unmanned aerial vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200825