CN108830319B - Image classification method and device - Google Patents

Image classification method and device

Info

Publication number
CN108830319B
Authority
CN
China
Prior art keywords
image
sub
region
original image
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810603073.8A
Other languages
Chinese (zh)
Other versions
CN108830319A (en)
Inventor
王宁
曹红杰
刘军
肖计划
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bostar Navigation Iocation Based Services Beijing Co ltd
Beijing Unistrong Science & Technology Co ltd
Original Assignee
Bostar Navigation Iocation Based Services Beijing Co ltd
Beijing Unistrong Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bostar Navigation Iocation Based Services Beijing Co ltd and Beijing Unistrong Science & Technology Co ltd
Priority to CN201810603073.8A
Publication of CN108830319A
Application granted
Publication of CN108830319B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method and device for efficient remote sensing image classification. The method comprises the following steps: performing image segmentation on an original image according to preset features to obtain a plurality of mutually disjoint sub-regions, and recording the position information occupied by each sub-region in the original image; replacing each sub-region in the original image with the pixel characteristic value corresponding to that sub-region to obtain a first image; classifying the pixels in the first image by using a full convolution neural network to obtain a second image; and reconstructing the second image according to the position information to obtain a classified image corresponding to both the size and the pixel positions of the original image.

Description

Image classification method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image classification method and apparatus.
Background
With the continuous improvement of remote sensing image resolution, high-resolution remote sensing images bring new opportunities for the development of remote sensing technology, but their characteristics also pose new challenges for processing remote sensing data: the clearer the ground objects in a remote sensing image, the more complex their structures. The development of deep learning has brought new means for intelligent image analysis, in particular the full convolution neural network for pixel-level semantic segmentation. However, the full convolution network is a pixel-based classification method; applied directly to a high-resolution remote sensing image, its computational complexity is excessive, and the pixel-based processing also causes fragmented image patches and the influence of salt-and-pepper noise.
Disclosure of Invention
The invention provides an image classification method and device, which are used for realizing more efficient remote sensing image classification.
The invention provides an image classification method, which comprises the following steps:
performing image segmentation on an original image according to preset characteristics to obtain a plurality of mutually disjoint sub-areas, and recording position information occupied by each sub-area in the original image;
replacing each subarea in the original image with a pixel characteristic value corresponding to each subarea to obtain a first image;
classifying pixels in the first image by using a full convolution neural network to obtain a second image;
and according to the position information, carrying out image reconstruction on the second image to obtain a classified image corresponding to the size and the pixel position of the original image.
Optionally, the replacing each sub-region in the original image with a pixel feature value corresponding to each sub-region respectively to obtain a first image includes:
acquiring a pixel characteristic value corresponding to each sub-region;
and arranging all the pixel characteristic values according to the sequence of each subarea in the original image to obtain the first image.
Optionally, the obtaining of the pixel characteristic value corresponding to each sub-region includes:
for each sub-region:
taking the average value of all pixels of the sub-region as the pixel characteristic value of the sub-region; or,
taking the maximum value of all pixels of the sub-region as the pixel characteristic value of the sub-region; or,
taking the minimum value of all pixels of the sub-region as the pixel characteristic value of the sub-region.
Optionally, the arranging the feature values of all the pixels according to the sequence of each sub-region in the original image to obtain a first image includes:
and arranging all the pixel characteristic values according to the sequence of the center of each sub-region relative to the upper left corner of the original image to obtain a first image.
Optionally, after the image segmentation is performed on the original image to obtain a plurality of sub-regions, the method further includes:
determining a size of the first image according to a number of the plurality of sub-regions;
the arranging all the pixel feature values according to the sequence of each sub-region in the original image to obtain the first image comprises:
and arranging all pixel characteristic values according to the size and the sequence to obtain the first image.
The present invention also provides an image classification apparatus, comprising:
the segmentation module is used for carrying out image segmentation on the original image according to preset characteristics to obtain a plurality of mutually disjoint sub-areas and recording the position information occupied by each sub-area in the original image;
the first processing module is used for replacing each sub-region in the original image with a pixel characteristic value corresponding to each sub-region respectively to obtain a first image;
the second processing module is used for classifying the pixels in the first image by using a full convolution neural network to obtain a second image;
and the third processing module is used for carrying out image reconstruction on the second image according to the position information to obtain a classified image corresponding to the size and the pixel position of the original image.
Optionally, the first processing module includes:
the obtaining sub-module is used for obtaining a pixel characteristic value corresponding to each sub-region;
and the arrangement submodule is used for arranging all the pixel characteristic values according to the sequence of each subarea in the original image to obtain the first image.
Optionally, the obtaining sub-module is configured to, for each sub-region:
taking the average value of all pixels of the sub-region as the pixel characteristic value of the sub-region; or,
taking the maximum value of all pixels of the sub-region as the pixel characteristic value of the sub-region; or,
taking the minimum value of all pixels of the sub-region as the pixel characteristic value of the sub-region.
Optionally, the arrangement submodule is configured to:
and arranging all the pixel characteristic values according to the sequence of the center of each sub-region relative to the upper left corner of the original image to obtain a first image.
Optionally, the apparatus further comprises:
a determining module for determining a size of the first image according to the number of the plurality of sub-regions;
the permutation submodule is further configured to: and arranging all pixel characteristic values according to the size and the sequence to obtain the first image.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
in this embodiment, since the original image is segmented according to the object-oriented idea and pixels of the same class are aggregated into one object, each object can correspond to one pixel of the input image of the full convolution neural network, thereby combining the object-oriented technique with the full convolution neural network. The scale of the targets to be classified by the full convolution neural network can thus be greatly reduced and the computation speed improved, especially for high-resolution images. Moreover, performing image segmentation with the object-oriented technique alleviates fragmented image patches and the influence of salt-and-pepper noise, improving classification accuracy.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating a method of image classification according to an embodiment of the present invention;
FIG. 2 is a diagram of a full convolution network model in accordance with an embodiment of the present invention;
FIG. 3 is a flowchart of a method of image classification according to an embodiment of the invention;
FIG. 4 is a diagram illustrating image segmentation and image encoding according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating image reconstruction according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating an overall image classification method according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an image classification apparatus according to an embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
In the related art, the full convolution neural network performs semantic segmentation at the pixel level. Applying it directly to a high-resolution remote sensing image incurs excessive computational complexity, and the pixel-based processing causes fragmented image patches and the influence of salt-and-pepper noise.
In order to solve the above problems, in this embodiment, the remote sensing image is subjected to image segmentation according to an object-oriented idea, and the segmented sub-regions are used as pixels of an input image of a full convolution neural network, so that an object-oriented technology is combined with the full convolution neural network, and a more efficient image classification method is provided.
Referring to fig. 1, the image classification method in the present embodiment includes:
step 101: and carrying out image segmentation on the original image according to preset characteristics to obtain a plurality of mutually disjoint sub-areas, and recording the position information occupied by each sub-area in the original image.
Step 102: and replacing each sub-region in the original image with a pixel characteristic value corresponding to each sub-region respectively to obtain a first image.
Step 103: and classifying the pixels in the first image by using a full convolution neural network to obtain a second image.
Step 104: and according to the position information, carrying out image reconstruction on the second image to obtain a classified image corresponding to the size and the pixel position of the original image.
The present embodiment first performs image segmentation on the original image according to the object-oriented idea and aggregates pixels of the same class into an object, which can correspond to one pixel of the input image of the full convolution neural network, thereby combining the object-oriented technique with the full convolution neural network. The scale of the targets to be classified by the full convolution neural network can thus be greatly reduced and the computation speed improved, especially for high-resolution images. Moreover, performing image segmentation with the object-oriented technique alleviates fragmented image patches and the influence of salt-and-pepper noise, improving classification accuracy.
Image segmentation divides an image into a number of mutually disjoint regions, each with its own distinctive properties. This embodiment does not limit the image segmentation algorithm; any suitable algorithm may be used to segment the original image. Segmentation methods are numerous and can generally be divided into three categories: threshold-based, edge-based, and region-based segmentation algorithms. Wherein:
Threshold-based segmentation algorithms: thresholding can be divided into single-threshold and multi-threshold processing. The most commonly used single-threshold algorithm, which also segments well, is the Otsu algorithm (maximum between-class variance). Multi-threshold processing separates K classes with K-1 thresholds, i.e., it computes several between-class variances of the image. (A minimal Otsu sketch follows this list.)
Edge-based segmentation algorithms: since an edge is a set of pixels with abrupt gray-level changes in the image, edges are generally detected by differentiation. Basic edge detection operators include the Roberts, Prewitt, and Sobel operators; slightly more advanced detectors include the Marr-Hildreth and Canny edge detectors.
Region-based segmentation algorithms: these mainly comprise region splitting-and-merging and region growing. Specific algorithms include the watershed algorithm, cluster analysis, Markov random field methods, multi-scale segmentation, and neural network methods.
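As a concrete illustration of the threshold-based category, the following is a minimal Python sketch of Otsu segmentation; it assumes OpenCV is available, the file name is a placeholder, and the snippet is illustrative rather than part of the claimed method.

```python
import cv2

# Load a grayscale image; the file name is a placeholder.
img = cv2.imread("remote_sensing_tile.png", cv2.IMREAD_GRAYSCALE)

# Otsu's method searches for the threshold that maximizes the
# between-class variance; passing 0 lets OpenCV pick it automatically.
otsu_t, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", otsu_t)
```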
In this embodiment, the full convolution network (FCN) model may use any of several architectures, such as the U-Net model (Convolutional Networks for Biomedical Image Segmentation), the SegNet model (A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation), the DeepLab model (Semantic Image Segmentation with Deep Convolutional Nets), and so on.
As a novel deep convolutional architecture, the full convolution network has two clear advantages. First, it accepts input images of any size, without requiring all training and test images to have the same size. Second, the features of many regions are obtained in a single convolution pass, avoiding the repeated convolution computations of region-based convolutional networks, which makes it more efficient. The architecture of the full convolution network is shown in FIG. 2: several convolutional layers are connected in series; to reduce the number of trainable parameters, a pooling layer is added after the convolutional layers; finally, a deconvolution operation is performed on the output feature map to restore it to the size of the input image.
The convolution formula of the FCN model is shown in formula (1):

$x_m = f\left(\sum_{i=1}^{I} \mathrm{conv}(x_i, K_{mi}) + b_m\right)$ (formula 1)

where $x_m$ denotes the m-th output map, $x_i$ the i-th input, $I$ the number of layers of the input feature map, $K_{mi}$ the i-th component of the convolution kernel corresponding to the m-th output, and $b_m$ the offset corresponding to the m-th output, while $f(\cdot)$ denotes the activation function and $\mathrm{conv}(\cdot)$ the convolution process. $K_{mi}$ and $b_m$ are preset parameters.
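For illustration, a minimal NumPy/SciPy sketch of formula (1); the ReLU activation and all names are assumptions, chosen only to mirror the symbols in the formula.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_layer(x, K, b, f=lambda z: np.maximum(z, 0.0)):
    # x: (I, H, W) input maps x_i; K: (M, I, kh, kw) kernels K_mi;
    # b: (M,) offsets b_m; f: activation function (ReLU assumed here).
    M, I = K.shape[0], K.shape[1]
    out = []
    for m in range(M):
        # formula (1): x_m = f( sum_i conv(x_i, K_mi) + b_m )
        s = sum(convolve2d(x[i], K[m, i], mode="valid") for i in range(I))
        out.append(f(s + b[m]))
    return np.stack(out)  # (M, H', W') output maps x_m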
The pooling layer further reduces the network parameters, mainly by down-sampling, while improving the robustness of the features to variations of the input such as translation. Down-sampling operates on input rectangular regions of a fixed size without changing the number of input layers; the convolution process can therefore be performed again on the pooling result, which serves as the input of the next convolution stage. The pooling approach in this embodiment is maximum pooling; see formula (2):

$x_{(i,j)} = \max\left(\{x_{i \times s + k,\; j \times s + k}\}\right)$, $k = 0, 1, \dots, K$ (formula 2)

where $x_{(i,j)}$ denotes the value at coordinates (i, j) of the pooling layer's output, $K$ denotes the side length of the square local region selected during down-sampling, and $s$ denotes the step with which the local region slides during the computation (e.g., s = 1).
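A small sketch of the maximum pooling of formula (2), reading K as the side length of the square window and s as the sliding step; the default values are illustrative.

```python
import numpy as np

def max_pool(x, K=2, s=2):
    # x: (H, W) feature map; K: side length of the square window; s: step.
    H, W = x.shape
    oh, ow = (H - K) // s + 1, (W - K) // s + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # formula (2): maximum over the K x K window anchored at (i*s, j*s)
            out[i, j] = x[i * s : i * s + K, j * s : j * s + K].max()
    return out
```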
The convolution and pooling process is repeated several times, e.g., at least twice, and is followed by a deconvolution process that restores the feature map to the size of the input image.
The deconvolution processing mainly restores the feature map to the size of the input image by up-sampling, preserving the spatial information of the input image. The convolution kernel size for the deconvolution must therefore be determined first; see formula (3):

$O = S(W - 1) + k$ (formula 3)

where $O$ is the size of the input image, $S$ is the convolution step, $W$ is the size of the last pooled image, and $k$ is the size of the deconvolution kernel. With the input and output sizes known, the kernel size that the deconvolution layer should set can be inferred. Substituting $k$ into the deconvolution formula yields the full convolution network model, namely $F_1(X), F_2(X), \dots, F_T(X)$. Training on each training sample set yields a group of $K_{mi}$ and $b_m$, giving the trained full convolution network model.
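Rearranging formula (3) gives the deconvolution kernel size directly as k = O - S(W - 1); a one-function sketch with illustrative numbers:

```python
def deconv_kernel_size(O, S, W):
    # formula (3): O = S * (W - 1) + k  =>  k = O - S * (W - 1)
    return O - S * (W - 1)

# Illustrative numbers only: restoring a 7x7 pooled map to a 224x224
# input with stride 31 requires a 38x38 deconvolution kernel.
print(deconv_kernel_size(224, 31, 7))  # 38
```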
Optionally, step 102 includes step A1 and step A2.
Step A1: and acquiring a pixel characteristic value corresponding to each sub-region.
Step A2: and arranging all the pixel characteristic values according to the sequence of each subarea in the original image to obtain the first image.
In this embodiment, the pixel characteristic values of each sub-region are arranged according to the order of the sub-regions, so that the first image can be obtained simply and conveniently.
Optionally, step A1, acquiring the pixel characteristic value corresponding to each sub-region, includes:
for each sub-region:
taking the average value of all pixels of the sub-region as the pixel characteristic value of the sub-region; or,
taking the maximum value of all pixels of the sub-region as the pixel characteristic value of the sub-region; or,
taking the minimum value of all pixels of the sub-region as the pixel characteristic value of the sub-region.
In this embodiment, different pixel feature value dereferencing methods are provided, and different dereferencing methods can be selected according to different classification conditions and classification effects.
Optionally, step A2 includes: arranging all the pixel characteristic values according to the order in which the center of each sub-region appears relative to the upper left corner of the original image to obtain the first image.
The present embodiment thus provides an effective method of arranging pixel characteristic values, so that they can be arranged even when the divided sub-regions are irregular.
Optionally, after step 101, the method further includes:
determining a size of the first image according to the number of the plurality of sub-regions;
step A2 then includes: arranging all the pixel characteristic values according to the determined size of the first image and the sequence of each sub-region in the original image to obtain the first image.
The size of the first image can be determined according to the number of the plurality of sub-regions, so that the first image obtained after the pixel characteristic values are arranged meets the requirement.
The implementation process is described in detail by the embodiment below.
Referring to fig. 3, the image classification method in this embodiment includes:
step 301: and carrying out image segmentation on the original image according to preset characteristics to obtain a plurality of mutually disjoint sub-regions.
The original image may be segmented according to preset features, for example gray-level features, texture features, color features, brightness features, or edge features, so that the pixels within each segmented sub-region share the same or similar characteristics. As shown in fig. 4, the original image is exemplarily divided into 4 sub-regions a, b, c, d. In other embodiments of the present invention, the partitioned sub-regions may well differ in size and have irregular, differing shapes.
Step 302: and recording the position information occupied by each subarea in the original image.
For example, as shown in fig. 4, position information of pixels contained in the 4 sub-areas a, b, c, d in the original image is recorded, respectively.
Step 303: and acquiring a pixel characteristic value corresponding to each sub-region.
Since pixels in the same sub-region have the same or similar characteristics, it is considered to use one pixel characteristic value to represent one sub-region.
For example, the average value of all pixels of a sub-region may be used as the pixel characteristic value of the sub-region; or the maximum value of all pixels of the sub-region may be used; or the minimum value of all pixels of the sub-region may be used.
As shown in fig. 4, in the present embodiment, the average value of all pixels in a sub-region is taken as the pixel characteristic value of the sub-region. Through this step, 4 pixel characteristic values A, B, C and D corresponding to the 4 sub-regions a, b, c, d, respectively, can be obtained.
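A minimal sketch of the three value choices, assuming each sub-region is supplied as a boolean pixel mask; the helper name and signature are illustrative.

```python
import numpy as np

def region_feature(image, mask, mode="mean"):
    # image: (H, W) array; mask: boolean (H, W) selecting one sub-region.
    pixels = image[mask]
    if mode == "mean":
        return pixels.mean()  # average of all pixels in the sub-region
    if mode == "max":
        return pixels.max()   # maximum pixel value in the sub-region
    return pixels.min()       # minimum pixel value in the sub-region
```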
Step 304: and arranging all the pixel characteristic values according to the sequence of each subarea in the original image to obtain a first image.
For example, as shown in fig. 4, 4 pixel feature values A, B, C and D corresponding to the 4 sub-regions are arranged according to the order of the 4 sub-regions a, b, c, and D in the original image, so as to obtain the first image. At the moment, the original image can be simplified into an image with four pixels, and the calculation complexity is greatly reduced. The above-mentioned steps 303 and 304 may also be referred to collectively as image coding.
In another embodiment of the present invention, since the divided sub-regions may differ in size and have irregular, differing shapes, all the pixel characteristic values may be arranged in the order in which the center of each sub-region appears relative to the upper left corner of the original image to obtain the first image.
In another embodiment of the present invention, the size of the first image may be determined according to the number of sub-regions. For example, if the image segmentation in step 301 yields 3 sub-regions in the first row, 2 in the second row, and 4 in the third row, nine sub-regions in total, the first image is determined to be a 3 × 3 rectangle. All the pixel characteristic values are then arranged, in the order in which the center of each sub-region appears relative to the upper left corner of the original image, to obtain a 3 × 3 image as the first image. Furthermore, if the number of sub-regions cannot fill a rectangle exactly, zeros may be padded at the end so that the first image is a regular rectangle (see the packing sketch below).
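A sketch of this packing step under the stated assumptions: the feature values arrive already sorted by the raster order of the sub-region centers, and the first image is taken as the smallest square that holds them, zero-padded at the end.

```python
import numpy as np

def build_first_image(feats):
    # feats: 1-D array of pixel characteristic values, sorted by the order
    # in which each sub-region's center appears from the upper left corner.
    n = len(feats)
    side = int(np.ceil(np.sqrt(n)))  # smallest square grid holding n values
    grid = np.zeros(side * side)     # trailing cells stay zero (zero padding)
    grid[:n] = feats
    return grid.reshape(side, side)  # e.g., 9 sub-regions -> 3 x 3 first image
```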
Step 305: classifying the pixels in the first image by using a full convolution neural network to obtain a second image.
The full convolution neural network classifies the pixels in the first image at the pixel level; the resulting classified image is the second image, which has the same size as the first image, with pixel positions in one-to-one correspondence.
Step 306: performing image reconstruction on the second image according to the recorded position information occupied by each sub-region in the original image, to obtain a classified image corresponding to both the size and the pixel positions of the original image.
The second image is a classification of the first image, and the pixel positions in the first image already differ from those in the original image, so the pixel positions of the second image no longer match the original image. The second image is therefore reconstructed into a classified image matching both the size and the pixel positions of the original image, according to the per-pixel sub-region positions recorded during image segmentation.
As shown in fig. 5, the second image is reconstructed to obtain the final classification result according to the positions of the pixels in the sub-regions recorded during the image segmentation.
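A sketch of this decoding step, assuming the recorded position information takes the form of a label map that stores, for every original pixel, the index of its sub-region in the first image's scan order.

```python
import numpy as np

def decode_classes(second, labels):
    # second: (h, w) class per first-image pixel; labels: (H, W) array giving,
    # for every original pixel, its sub-region index in the first image's
    # scan order (the position information recorded in step 302).
    flat = second.ravel()  # sub-region k's class sits at flat index k
    return flat[labels]    # (H, W) classified image matching the original
```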
As shown in fig. 6, the general flow of the image classification method provided in this embodiment is as follows: first, the original image is segmented and encoded to obtain the first image; the first image is then classified by the full convolution neural network model to obtain the second image; finally, the second image is reconstructed (i.e., image decoding) to obtain the final classified image.
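Tying the steps together, a sketch of the overall flow of FIG. 6; it reuses the build_first_image and decode_classes sketches above, and segment() and fcn() stand for any chosen segmentation algorithm and trained full convolution network model.

```python
import numpy as np

def classify_image(image, segment, fcn):
    labels = segment(image)                        # step 1: image segmentation
    n = labels.max() + 1
    feats = np.array([image[labels == r].mean() for r in range(n)])
    first = build_first_image(feats)               # step 2: image encoding
    second = fcn(first)                            # step 3: FCN classification
    return decode_classes(second, labels)          # step 4: image decoding
```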
The image classification procedure is described above; it can be carried out by an apparatus whose internal structure and functions are described below.
Referring to fig. 7, the image classification device in the present embodiment includes: a partitioning module 701, a first processing module 702, a second processing module 703 and a third processing module 704.
The segmentation module 701 is configured to perform image segmentation on an original image according to preset features to obtain a plurality of mutually disjoint sub-regions, and record position information occupied by each sub-region in the original image;
a first processing module 702, configured to replace each sub-region in the original image with a pixel feature value corresponding to each sub-region, respectively, to obtain a first image;
a second processing module 703, configured to classify pixels in the first image by using a full convolution neural network to obtain a second image;
a third processing module 704, configured to perform image reconstruction on the second image according to the position information, so as to obtain a classified image corresponding to both the size and the pixel position of the original image.
In an embodiment of the present invention, as shown in fig. 8, the first processing module 702 includes:
the obtaining sub-module 801 is configured to obtain a pixel characteristic value corresponding to each sub-region;
the arrangement submodule 802 is configured to arrange all the pixel characteristic values according to the order of each sub-region in the original image to obtain the first image.
In an embodiment of the present invention, the obtaining sub-module 801 is configured to, for each sub-region:
taking the average value of all pixels of the sub-region as the pixel characteristic value of the sub-region; or,
taking the maximum value of all pixels of the sub-region as the pixel characteristic value of the sub-region; or,
taking the minimum value of all pixels of the sub-region as the pixel characteristic value of the sub-region.
In one embodiment of the invention, the arrangement submodule 802 is configured to:
and arranging all the pixel characteristic values according to the sequence of the center of each sub-region relative to the upper left corner of the original image to obtain a first image.
In an embodiment of the present invention, as shown in fig. 9, the apparatus further includes:
a determining module 705 for determining a size of the first image according to the number of the plurality of sub-regions;
the arrangement submodule 802 is also operable to: and arranging all pixel characteristic values according to the size and the sequence to obtain the first image.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (6)

1. An image classification method, comprising:
performing image segmentation on an original image according to preset characteristics to obtain a plurality of mutually disjoint sub-areas, and recording position information occupied by each sub-area in the original image;
replacing each sub-region in the original image with a pixel characteristic value corresponding to each sub-region respectively to obtain a first image, comprising: for each sub-region: taking the average value of all pixels of the sub-area as the pixel characteristic value of the sub-area; or taking the maximum value of all pixels of the sub-area as the pixel characteristic value of the sub-area; or taking the minimum value of all pixels of the sub-region as the pixel characteristic value of the sub-region; arranging all pixel characteristic values according to the sequence of each subarea in the original image to obtain the first image;
classifying pixels in the first image by using a full convolution neural network to obtain a second image;
and according to the position information, carrying out image reconstruction on the second image to obtain a classified image corresponding to the size and the pixel position of the original image.
2. The image classification method according to claim 1, wherein the arranging all the pixel feature values according to the order of each sub-region in the original image to obtain the first image further comprises:
and arranging all the pixel characteristic values according to the sequence of the center of each sub-region relative to the upper left corner of the original image to obtain a first image.
3. The image classification method according to claim 1, wherein after the image segmentation is performed on the original image to obtain a plurality of sub-regions, the method further comprises:
determining a size of the first image according to a number of the plurality of sub-regions;
the arranging all the pixel feature values according to the sequence of each sub-region in the original image to obtain the first image comprises:
and arranging all pixel characteristic values according to the size and the sequence to obtain the first image.
4. An image classification apparatus, comprising:
the segmentation module is used for carrying out image segmentation on the original image according to preset characteristics to obtain a plurality of mutually disjoint sub-areas and recording the position information occupied by each sub-area in the original image;
the first processing module is configured to replace each sub-region in the original image with a pixel feature value corresponding to each sub-region, respectively, to obtain a first image, and includes: for each sub-region: taking the average value of all pixels of the sub-area as the pixel characteristic value of the sub-area; or taking the maximum value of all pixels of the sub-area as the pixel characteristic value of the sub-area; or taking the minimum value of all pixels of the subarea as the pixel characteristic value of the subarea; arranging all pixel characteristic values according to the sequence of each subarea in the original image to obtain the first image;
the second processing module is used for classifying the pixels in the first image by using a full convolution neural network to obtain a second image;
and the third processing module is used for carrying out image reconstruction on the second image according to the position information to obtain a classified image corresponding to the size and the pixel position of the original image.
5. The image classification apparatus according to claim 4, wherein the arrangement submodule is further configured to:
and arranging all the pixel characteristic values according to the sequence of the center of each sub-region relative to the upper left corner of the original image to obtain a first image.
6. The image classification device according to claim 4, characterized in that the device further comprises:
a determining module for determining a size of the first image according to the number of the plurality of sub-regions;
the arrangement submodule is further configured to: and arranging all pixel characteristic values according to the size and the sequence to obtain the first image.
CN201810603073.8A 2018-06-12 2018-06-12 Image classification method and device Active CN108830319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810603073.8A CN108830319B (en) 2018-06-12 2018-06-12 Image classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810603073.8A CN108830319B (en) 2018-06-12 2018-06-12 Image classification method and device

Publications (2)

Publication Number Publication Date
CN108830319A CN108830319A (en) 2018-11-16
CN108830319B true CN108830319B (en) 2022-09-16

Family

ID=64144798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810603073.8A Active CN108830319B (en) 2018-06-12 2018-06-12 Image classification method and device

Country Status (1)

Country Link
CN (1) CN108830319B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009637B (en) * 2019-04-09 2021-04-16 北京化工大学 Remote sensing image segmentation network based on tree structure
CN110059751A (en) * 2019-04-19 2019-07-26 南京链和科技有限公司 A kind of tire code and tire condition recognition methods based on machine learning
CN110135519B (en) * 2019-05-27 2022-10-21 广东工业大学 Image classification method and device
CN111340139B (en) * 2020-03-27 2024-03-05 中国科学院微电子研究所 Method and device for judging complexity of image content
CN113592830B (en) * 2021-08-04 2024-05-03 航天信息股份有限公司 Image defect detection method, device and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073978A (en) * 2010-12-15 2011-05-25 北京交通大学 Method and system for identifying and recovering digital images by utilizing irregular region segmentation
CN102984524A (en) * 2012-12-18 2013-03-20 浙江大学 Video coding and decoding method based on block layer decomposition
CN104123555A (en) * 2014-02-24 2014-10-29 西安电子科技大学 Super-pixel polarimetric SAR land feature classification method based on sparse representation
CN106529593A (en) * 2016-11-08 2017-03-22 广东诚泰交通科技发展有限公司 Pavement disease detection method and system
CN107240066A (en) * 2017-04-28 2017-10-10 天津大学 Image super-resolution rebuilding algorithm based on shallow-layer and deep layer convolutional neural networks
CN107220657A (en) * 2017-05-10 2017-09-29 中国地质大学(武汉) A kind of method of high-resolution remote sensing image scene classification towards small data set
CN108090913A (en) * 2017-12-12 2018-05-29 河南大学 A kind of image, semantic dividing method based on object level Gauss-Markov random fields

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An object-based supervised classification framework for very-high-resolution remote sensing images using convolutional neural networks; Xiaodong Zhang et al.; Remote Sensing Letters; 30 April 2018; Vol. 9, No. 4; pp. 373-382 *

Also Published As

Publication number Publication date
CN108830319A (en) 2018-11-16

Similar Documents

Publication Publication Date Title
CN108830319B (en) Image classification method and device
Bergmann et al. The MVTec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection
WO2022000426A1 (en) Method and system for segmenting moving target on basis of twin deep neural network
Gao et al. Classification of CT brain images based on deep learning networks
US20180181796A1 (en) Image processing method and apparatus
US20170039723A1 (en) Image Object Segmentation Using Examples
CN111738995B (en) RGBD image-based target detection method and device and computer equipment
CN110738125A (en) Method, device and storage medium for selecting detection frame by using Mask R-CNN
US20090252429A1 (en) System and method for displaying results of an image processing system that has multiple results to allow selection for subsequent image processing
US9330336B2 (en) Systems, methods, and media for on-line boosting of a classifier
CN109934216B (en) Image processing method, device and computer readable storage medium
WO2023116632A1 (en) Video instance segmentation method and apparatus based on spatio-temporal memory information
CN111783665A (en) Action recognition method and device, storage medium and electronic equipment
CN114550021A (en) Surface defect detection method and device based on feature fusion
CN110599453A (en) Panel defect detection method and device based on image fusion and equipment terminal
CN116994721A (en) Quick processing system of digital pathological section graph
CN116403062A (en) Point cloud target detection method, system, equipment and medium
CN115830323A (en) Deep learning segmentation method for carbon fiber composite material data set
CN115546476A (en) Multi-object detection method and data platform based on multi-scale features
US8300936B2 (en) System and method for improving display of tuned multi-scaled regions of an image with local and global control
Lee et al. SAF-Nets: Shape-Adaptive Filter Networks for 3D point cloud processing
CN114550062A (en) Method and device for determining moving object in image, electronic equipment and storage medium
CN114359587A (en) Class activation graph generation method, interpretable method, device, equipment and storage medium
CN113052118A (en) Method, system, device, processor and storage medium for realizing scene change video analysis and detection based on high-speed dome camera
Krüger et al. Multi-modal primitives as functional models of hyper-columns and their use for contextual integration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant