CN109657715B - Semantic segmentation method, device, equipment and medium - Google Patents

Semantic segmentation method, device, equipment and medium Download PDF

Info

Publication number
CN109657715B
CN109657715B (Application CN201811520565.7A)
Authority
CN
China
Prior art keywords
semantic segmentation
target
semantic
frame image
target frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811520565.7A
Other languages
Chinese (zh)
Other versions
CN109657715A (en)
Inventor
黄国恒
陈俊安
黄斯彤
胡可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Airport Group Logistics Co ltd
Original Assignee
Guangdong Airport Group Logistics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Airport Group Logistics Co ltd filed Critical Guangdong Airport Group Logistics Co ltd
Priority to CN201811520565.7A priority Critical patent/CN109657715B/en
Publication of CN109657715A publication Critical patent/CN109657715A/en
Application granted granted Critical
Publication of CN109657715B publication Critical patent/CN109657715B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a semantic segmentation method, device, equipment and medium. The method comprises the following steps: acquiring a target frame image and dividing a plurality of independent target areas in it; performing, on each target area, a semantic segmentation operation based on the corresponding semantic segmentation model to generate a corresponding result image, where each semantic segmentation model is generated by training a convolutional neural network according to the semantic division standard corresponding to its target area. Because the method divides the complete target frame image into a plurality of independent target areas, the semantic segmentation of each area is independent of the others. This relatively avoids the situation in which pixel groups with many distinct semantic meanings must be represented by a limited variety of color marks, so that the color marks become confused with one another during subsequent image understanding, and the usability of the semantic segmentation result is thereby ensured. In addition, the invention also provides a semantic segmentation device, equipment and medium, with the same beneficial effects as described above.

Description

Semantic segmentation method, device, equipment and medium
Technical Field
The present invention relates to the field of computer vision video detection, and in particular, to a semantic segmentation method, apparatus, device, and medium.
Background
Semantic segmentation of images is a foundational technology of image understanding and plays a significant role in application scenarios such as automatic driving systems (specifically, street-view recognition and understanding), unmanned aerial vehicle applications (landing-point judgment), and wearable devices.
It is known that an image is composed of many pixels, and semantic segmentation groups those pixels according to the semantic meaning they express in the image. In traditional semantic segmentation, the main operation is to associate a color with each object category, so that after semantic segmentation the objects of different types in the original image are covered with their corresponding colors in the result image, and regions of the same color in the result image represent pixel groups of the same type. In actual use, however, the content of an image is often rich, so a large number of pixel groups may arise when the semantic segmentation operation is performed. Because the overall variety of colors is relatively small, and colors of different chromaticity within the same color family are hard to tell apart, representing pixel groups of many semantic meanings with a limited variety of color marks easily causes the different color marks to be confused with one another. In the subsequent image-understanding process, pixel information that could serve as an important basis for discrimination may then be lost, making it difficult to guarantee the overall usability of the semantic segmentation result.
Therefore, providing a semantic segmentation method that relatively avoids confusion between marks of different colors in the result image, and thereby ensures the usability of the semantic segmentation result, is an urgent problem for those skilled in the art.
Disclosure of Invention
The invention aims to provide a semantic segmentation method, device, equipment and medium that relatively avoid the situation in which marks of different colors in the result image are confused with one another, thereby ensuring the usability of the semantic segmentation result.
In order to solve the technical problems, the invention provides a semantic segmentation method, which comprises the following steps:
acquiring a target frame image, and dividing a plurality of independent target areas in the target frame image;
respectively executing semantic segmentation operation based on the corresponding semantic segmentation model on each target region to generate a corresponding result image; the semantic segmentation model is generated by training a convolutional neural network according to semantic division standards corresponding to the target area.
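As an illustrative sketch only (the patent specifies no concrete implementation; the region boxes, the model interface, and all names below are assumptions made for demonstration), the two claimed steps can be expressed as:

```python
import numpy as np

def segment_frame(frame, region_boxes, region_models):
    """Divide a frame into independent areas and run each area's own
    segmentation model, returning one result image per area."""
    results = []
    for (top, left, h, w), model in zip(region_boxes, region_models):
        region = frame[top:top + h, left:left + w]
        # Each model is trained on its own semantic division standard,
        # so label spaces are independent between areas.
        results.append(model(region))
    return results

# Toy demonstration: a stand-in "model" that labels pixels by a threshold.
frame = np.arange(16, dtype=float).reshape(4, 4)
boxes = [(0, 0, 2, 4), (2, 0, 2, 4)]          # two non-overlapping areas
models = [lambda r: (r > r.mean()).astype(int)] * 2
masks = segment_frame(frame, boxes, models)
```

Because each area is passed to its own model, the label values in one mask carry no meaning for any other mask, which is the independence the method relies on.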
Preferably, dividing the plurality of independent target areas in the target frame image includes:
selecting key points in the target frame image;
and respectively carrying out pooling processing based on each key point so as to divide a plurality of independent target areas in the target frame image.
Preferably, the semantic segmentation model being generated by training a convolutional neural network according to the semantic division standard corresponding to the target region is specifically:
the semantic segmentation model is generated by training an IndRNN convolutional neural network according to semantic division standards corresponding to the target region.
Preferably, the semantic segmentation operation based on the corresponding semantic segmentation model is executed for each target region respectively, specifically:
and respectively executing semantic segmentation operation based on the corresponding semantic segmentation model on the mutual information among the pixel points in each target area.
Preferably, acquiring the target frame image is specifically acquiring a target frame image from a video.
In addition, the invention also provides a semantic segmentation device, which comprises:
the acquisition and division module is used for acquiring a target frame image and dividing a plurality of independent target areas in the target frame image;
the semantic segmentation module is used for executing semantic segmentation operation based on the corresponding semantic segmentation model on each target area respectively to generate a corresponding result image; the semantic segmentation model is generated by training a convolutional neural network according to semantic division standards corresponding to the target area.
In addition, the invention also provides semantic segmentation equipment, which comprises:
a memory for storing a computer program;
a processor for implementing the steps of the semantic segmentation method as described above when executing a computer program.
In addition, the invention also provides a computer readable storage medium, and the computer readable storage medium stores a computer program, and the computer program realizes the steps of the semantic segmentation method when being executed by a processor.
After a target frame image is acquired, a plurality of independent target areas are divided in it, and semantic segmentation based on the semantic segmentation model corresponding to each target area is then performed to generate a result image for each area. Each target area has its own semantic division standard, and the semantic segmentation model for each area is generated by training a convolutional neural network according to that standard. The method divides the complete target frame image into a plurality of independent target areas, each of which contains relatively less content than the complete image; on this basis, each target area is segmented according to its own standard, so the segmentations are mutually independent. This relatively avoids the situation in which pixel groups with many semantic meanings are represented by a limited variety of color marks and the color marks become confused during subsequent image understanding, and the usability of the semantic segmentation result is thereby ensured. In addition, the invention also provides a semantic segmentation device, equipment and medium, with the same beneficial effects as described above.
Drawings
For a clearer description of the embodiments of the present invention, the drawings required by the embodiments are briefly introduced below. It is apparent that the drawings in the following description cover only some embodiments of the present invention; those skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of a semantic segmentation method according to an embodiment of the present invention;
fig. 2 is a structural diagram of a semantic segmentation device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without making any inventive effort are within the scope of the present invention.
The core of the invention is to provide a semantic segmentation method to relatively avoid the condition that marks with different colors in a result image are confused with each other, thereby ensuring the usability of semantic segmentation results. Another core of the present invention is to provide a semantic segmentation device, apparatus and medium.
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description.
Example 1
Fig. 1 is a flowchart of a semantic segmentation method according to an embodiment of the present invention. Referring to fig. 1, the semantic segmentation method specifically includes the steps of:
step S10: and acquiring a target frame image, and dividing a plurality of independent target areas in the target frame image.
It should be noted that the target frame image in this step may be a content image corresponding to a certain frame in a video, or a single-frame image, that is, a still picture. After the target frame image is obtained, it is divided into independent target areas. The target areas are mutually independent, meaning there are no overlapping parts between them; and, to ensure that the semantic segmentation covers the whole target frame image, the combination of the target areas can completely restore the target frame image. The target areas may be divided according to a preset area size, in which case every target area obtained has the same size but relatively random content. Alternatively, the division may take key content in the target frame image as a center and place that key content, together with content that has a certain association with it, into the same target area; this method relatively ensures high association within each target area and no associated content between target areas. These are only two of many possible division modes; the user can choose according to the actual requirements of semantic segmentation, and no particular limitation is imposed here.
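The first division mode mentioned above, cutting the frame into preset, equal-size, non-overlapping areas whose combination restores the image exactly, can be sketched as follows (a minimal illustration; the tile size and helper name are assumptions, not taken from the patent):

```python
import numpy as np

def grid_regions(frame, tile_h, tile_w):
    """Split a frame into equal-size, non-overlapping tiles; assumes the
    frame dimensions are divisible by the tile size."""
    H, W = frame.shape[:2]
    return [frame[i:i + tile_h, j:j + tile_w]
            for i in range(0, H, tile_h)
            for j in range(0, W, tile_w)]

frame = np.arange(36).reshape(6, 6)
tiles = grid_regions(frame, 3, 3)             # four 3x3 target areas
# Disjoint tiles stitched back together restore the frame completely,
# satisfying the "combination restores the image" requirement.
restored = np.block([[tiles[0], tiles[1]],
                     [tiles[2], tiles[3]]])
```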
Step S11: and respectively executing semantic segmentation operation based on the corresponding semantic segmentation model on each target region to generate a corresponding result image.
The semantic segmentation model is generated by training a convolutional neural network according to semantic division standards corresponding to the target area.
It should be noted that semantic segmentation refers to classifying each pixel in the picture. The core of this step is to perform, on each target region, a semantic segmentation operation based on the region's own semantic segmentation model, so that the operations on different regions are independent of one another. Since each target region contains relatively less content than the complete target frame image, fewer color marks are needed when each region is segmented, which avoids, to the greatest extent, confusion between color marks in the generated result image.
In addition, the semantic segmentation model in this step is generated by training a convolutional neural network according to the semantic division standard corresponding to each target area. The semantic division standard refers to the way pixel types are divided and the correspondence between color marks and pixel types during semantic segmentation; the standards of different target areas may differ, and each result image generated under a different standard is subsequently understood by the image-understanding logic that corresponds to it. For example, consider two result images A and B generated under different semantic division standards: a red mark may represent a plant among the pixel types of result image A but a pedestrian in result image B. The two result images must therefore be interpreted differently, each according to its own semantic division standard, during subsequent image understanding.
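The independence of the semantic division standards can be shown with a toy sketch (the palettes and class names below are hypothetical, echoing the plant/pedestrian example above):

```python
# Hypothetical per-image division standards: the same colour id can mean
# different classes in different result images.
palette_a = {0: "background", 1: "plant"}        # standard of result image A
palette_b = {0: "background", 1: "pedestrian"}   # standard of result image B

def interpret(mask_value, palette):
    """Look up what a colour mark means under a given division standard."""
    return palette[mask_value]
```

Interpreting a mask value with the wrong palette would silently mislabel the pixel group, which is why each result image must be understood under its own standard.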
After a target frame image is acquired, a plurality of independent target areas are divided in it, and semantic segmentation based on the semantic segmentation model corresponding to each target area is then performed to generate a result image for each area. Each target area has its own semantic division standard, and the semantic segmentation model for each area is generated by training a convolutional neural network according to that standard. The method divides the complete target frame image into a plurality of independent target areas, each of which contains relatively less content than the complete image; on this basis, each target area is segmented according to its own standard, so the segmentations are mutually independent. This relatively avoids the situation in which pixel groups with many semantic meanings are represented by a limited variety of color marks and the color marks become confused during subsequent image understanding, and the usability of the semantic segmentation result is ensured.
Example two
On the basis of the above embodiment, the present invention also provides the following series of preferred embodiments.
As a preferred embodiment, dividing the plurality of independent target areas in the target frame image includes:
selecting key points in the target frame image;
and respectively carrying out pooling processing based on each key point so as to divide a plurality of independent target areas in the target frame image.
In this embodiment, key points are selected from the target frame image, and pooling is then performed centered on each key point. Selecting the key points and performing pooling based on each of them can be implemented by convolution, that is, by a convolutional neural network.
The key points are the reference points according to which the target areas are divided; in essence they are pixel points with certain features. They are set so that other pixel points with features similar to a key point can be classified according to that key point's features, and the pixel points so classified gradually form a complete target area. In theory, all extracted features could be used to train a classifier, such as a softmax classifier, but doing so is computationally expensive; moreover, features that are useful in one image region are very likely to be equally applicable in another. Therefore, to describe a large image, aggregate statistics are computed over features at different locations — for example, the average (or maximum) value of a particular feature over a region of the image. These summary statistics not only have lower dimensionality than the full set of extracted features but can also improve the results. This aggregation operation is known as pooling. This embodiment aggregates similar pixel points around the key points by pooling, thereby dividing out the target areas; it ensures that the content within each target area is similar while reducing the computational cost of the division.
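The "summary statistic over an image region" idea behind pooling can be sketched as follows (a minimal illustration under assumed inputs; the window shape and function name are not from the patent):

```python
import numpy as np

def pool_around(image, keypoint, radius, mode="mean"):
    """Aggregate (pool) values in a square window centred on a key point --
    a summary statistic over an image region, as described above."""
    r, c = keypoint
    window = image[max(r - radius, 0):r + radius + 1,
                   max(c - radius, 0):c + radius + 1]
    return float(window.mean() if mode == "mean" else window.max())

img = np.arange(25, dtype=float).reshape(5, 5)
avg = pool_around(img, (2, 2), 1)          # mean over the central 3x3 window
top = pool_around(img, (2, 2), 1, "max")   # max over the same window
```

Either statistic compresses the nine window values into one, which is how pooling lowers the dimensionality of the description of the region.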
In addition, the key points in the present embodiment may be selected in advance from the target frame image by the convolutional neural network model, or may be selected manually, which is not particularly limited herein.
In addition, as a preferred embodiment, the semantic segmentation model being generated by training a convolutional neural network according to the semantic division standard corresponding to the target region is specifically:
the semantic segmentation model is generated by training an IndRNN convolutional neural network according to semantic division standards corresponding to the target region.
Note that recurrent neural networks (RNNs) are widely used for processing sequence data. However, RNNs are often difficult to train because of the well-known vanishing- and exploding-gradient problems and the difficulty of learning long-term patterns. To address these issues, researchers proposed long short-term memory (LSTM) and gated recurrent units (GRU), but their hyperbolic-tangent and sigmoid functions attenuate the gradient across layers, so building an efficient, trainable deep network remains challenging. Moreover, all neurons within an RNN layer are entangled with one another, and their behavior is hard to interpret. To further address these problems, a novel RNN variant has been proposed: the independently recurrent neural network (IndRNN), in which the neurons of each layer are independent of one another and are connected across layers. The layers of an IndRNN are relatively easy to manage, and gradient explosion and vanishing can be prevented. Therefore, in this embodiment, the semantic segmentation model is generated in advance by training an IndRNN network according to the semantic division standard corresponding to the target region, which relatively ensures the usability and reliability of the semantic segmentation model.
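For illustration, the IndRNN recurrence replaces the full recurrent weight matrix of a plain RNN with one independent scalar recurrent weight per neuron, h_t = σ(W·x_t + u ⊙ h_{t-1} + b). A minimal sketch (the weights below are arbitrary demonstration values, not from the patent):

```python
import numpy as np

def indrnn_step(x_t, h_prev, W, u, b):
    """One IndRNN step: each neuron keeps an independent *scalar* recurrent
    weight (elementwise u * h_prev) instead of a full recurrent matrix."""
    return np.maximum(0.0, W @ x_t + u * h_prev + b)   # ReLU activation

# Two neurons, one input feature; all weights are arbitrary demo values.
W = np.array([[1.0], [0.5]])
u = np.array([0.5, 0.0])    # neuron 2 ignores its own history entirely
b = np.zeros(2)
h = np.zeros(2)
for x_t in (np.array([1.0]), np.array([2.0])):
    h = indrnn_step(x_t, h, W, u, b)
```

Because each neuron's recurrence depends only on its own previous state, the neurons stay disentangled, which is what makes the gradient behavior per neuron easy to control.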
Further, on the basis of the above-described embodiments, as a preferred embodiment, the semantic segmentation operation based on the corresponding semantic segmentation model is performed for each target region separately, specifically:
and respectively executing semantic segmentation operation based on the corresponding semantic segmentation model on the mutual information among the pixel points in each target area.
In this embodiment, the region is semantically segmented — that is, the pixels are classified — based on the mutual information between the pixel points contained in the target region. Mutual information is an information measure from information theory, indicating how much information two objects share; it is a common analytical method in computational linguistic models and, in filtering problems, measures how well a feature discriminates a topic. Its definition is close to that of cross entropy. Feature extraction based on mutual-information theory rests on the following assumption: if a term occurs frequently in a particular category but rarely in the other categories, the mutual information between the term and that category is large. Mutual information is typically used as the measure between a feature and a class: it is largest when the feature belongs entirely to that class.
The mutual information between pixel points in this embodiment refers to information representing the relevance between pixel points; from the mutual information of the pixel points in a target area, one can clearly determine how close each pixel point is to each type of pixel group. In implementation, the features corresponding to each pixel group should be set in advance as the basis for evaluating whether a pixel belongs to that group. Because this method makes no assumption about the nature of the relation between the features and the pixel-group categories, the overall efficiency of the semantic segmentation of the target region can be relatively ensured.
Expressed as a formula: let X denote the set of pixel points in a target area and x ∈ X a particular pixel point; let Z denote the set of coding vectors and z ∈ Z the coding vector of a pixel point, i.e., its feature; and let p(z|x) denote the distribution of coding vectors generated from x — taken to be a Gaussian distribution, or, simply understood, the encoder being sought. The correlation of X and Z can then be expressed in terms of mutual information as:

I(X;Z) = E_{x∼p̃(x)} E_{z∼p(z|x)} [ log ( p(z|x) / p(z) ) ]

where p̃(x) denotes the distribution of the raw data, and p(z) = E_{x∼p̃(x)}[p(z|x)] is the distribution of the whole of Z once p(z|x) is given.
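As a numerical illustration of the mutual-information measure in the discrete case (the probabilities below are invented for demonstration and are not from the patent):

```python
import numpy as np

# Discrete analogue of I(X;Z):
#   I(X;Z) = sum_x p(x) sum_z p(z|x) * log(p(z|x) / p(z)),
# with p(z) = sum_x p(x) p(z|x).  Probabilities are invented for the demo.
p_x = np.array([0.5, 0.5])                  # two pixel "types"
p_z_given_x = np.array([[0.9, 0.1],         # encoder output for x = 0
                        [0.2, 0.8]])        # encoder output for x = 1
p_z = p_x @ p_z_given_x                     # marginal distribution of codes
mi = float(np.sum(p_x[:, None] * p_z_given_x *
                  np.log(p_z_given_x / p_z)))
```

The more sharply the encoder p(z|x) separates the pixel types, the larger the mutual information; here it is about 0.275 nats, between 0 (codes independent of pixel type) and log 2 (codes perfectly discriminative).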
On the basis of the above-described series of embodiments, as a preferred embodiment, the acquisition of the target frame image is specifically the acquisition of the target frame image in the video.
It should be noted that, because semantic segmentation serves as a basic technology of image understanding, it is often applied in scenarios such as automatic driving systems (specifically, street-view recognition and understanding), unmanned aerial vehicle applications (landing-point judgment), and wearable devices — scenarios that are dynamic over time rather than fixed images. Acquiring the target frame image from a video and then performing the subsequent processing therefore matches the current use scenarios of semantic segmentation in image understanding, further improving the overall usability of the image-understanding result.
Example III
The embodiments of the semantic segmentation method are described in detail above, and the invention further provides a semantic segmentation device corresponding to the method. Since the embodiments of the device part correspond to those of the method part, the device embodiments may refer to the description of the method embodiments, which is not repeated here.
Fig. 2 is a structural diagram of a semantic segmentation device according to an embodiment of the present invention. The semantic segmentation device provided by the embodiment of the invention comprises:
the acquisition and division module 10 is configured to acquire a target frame image and divide a plurality of independent target areas in the target frame image.
The semantic segmentation module 11 is configured to perform a semantic segmentation operation based on a corresponding semantic segmentation model on each target region, and generate a corresponding result image. The semantic segmentation model is generated by training a convolutional neural network according to semantic division standards corresponding to the target area.
After a target frame image is acquired, a plurality of independent target areas are divided in it, and semantic segmentation based on the semantic segmentation model corresponding to each target area is then performed to generate a result image for each area. Each target area has its own semantic division standard, and the semantic segmentation model for each area is generated by training a convolutional neural network according to that standard. The device divides the complete target frame image into a plurality of independent target areas, each containing relatively less content than the complete image; on this basis, each target area is segmented according to its own standard, so the segmentations are mutually independent. This relatively avoids the situation in which pixel groups with many semantic meanings are represented by a limited variety of color marks and the color marks become confused during subsequent image understanding, and the usability of the semantic segmentation result is ensured.
Example IV
The invention also provides a semantic segmentation device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the semantic segmentation method as described above when executing a computer program.
After a target frame image is acquired, a plurality of independent target areas are divided in it, and semantic segmentation based on the semantic segmentation model corresponding to each target area is then performed to generate a result image for each area. Each target area has its own semantic division standard, and the semantic segmentation model for each area is generated by training a convolutional neural network according to that standard. The equipment divides the complete target frame image into a plurality of independent target areas, each containing relatively less content than the complete image; on this basis, each target area is segmented according to its own standard, so the segmentations are mutually independent. This relatively avoids the situation in which pixel groups with many semantic meanings are represented by a limited variety of color marks and the color marks become confused during subsequent image understanding, and the usability of the semantic segmentation result is ensured.
The invention also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor implements the steps of the semantic segmentation method as described above.
After a target frame image is acquired, a plurality of independent target areas are divided in it, and semantic segmentation based on the semantic segmentation model corresponding to each target area is then performed on each area to generate a result image for each. Each target area has its own semantic division standard, and the semantic segmentation model for each area is generated by training a convolutional neural network according to that standard. Because the computer-readable storage medium divides the complete target frame image into a plurality of independent target areas, each containing relatively less content than the complete image, and each target area is segmented according to its own standard, the semantic segmentations of the target areas are mutually independent. This relatively avoids the situation in which pixel groups with many semantic meanings are represented by a limited variety of color marks and the color marks become confused during subsequent image understanding, and the usability of the semantic segmentation result is ensured.
The semantic segmentation method, device, equipment, and medium provided by the invention have been described in detail above. The embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed therein, its description is relatively brief, and the relevant points can be found in the description of the method. It should be noted that those skilled in the art can make various modifications and adaptations of the invention without departing from its principles, and such modifications and adaptations are intended to fall within the scope of the invention as defined by the appended claims.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element.

Claims (7)

1. A semantic segmentation method, comprising:
acquiring a target frame image, and dividing a plurality of independent target areas in the target frame image;
respectively executing a semantic segmentation operation based on the corresponding semantic segmentation model on each target region to generate a corresponding result image; wherein the semantic segmentation model is generated by training a convolutional neural network according to the semantic division standard corresponding to the target area, and the semantic segmentation operations on the respective target areas are mutually independent;
wherein the executing of the semantic segmentation operation based on the corresponding semantic segmentation model on each target region is specifically:
respectively executing the semantic segmentation operation based on the corresponding semantic segmentation model on the mutual information among the pixel points in each target area; wherein the semantic segmentation operation is performed based on a correspondence between pixel-type division and color marks, the pixel type being the type corresponding to the pixels in which an individual in the image is located.
2. The method of claim 1, wherein the dividing the plurality of independent target areas in the target frame image comprises:
selecting key points in the target frame image;
and respectively carrying out pooling processing based on each key point so as to divide a plurality of independent target areas in the target frame image.
3. The method according to claim 1, wherein the generation of the semantic segmentation model by training a convolutional neural network according to the semantic division standard corresponding to the target region is specifically:
the semantic segmentation model is generated by training an IndRNN convolutional neural network according to the semantic division standard corresponding to the target region.
4. The method according to any one of claims 1 to 3, wherein said acquiring a target frame image is specifically acquiring the target frame image from a video.
5. A semantic segmentation apparatus, comprising:
the acquisition and division module is used for acquiring a target frame image and dividing a plurality of independent target areas in the target frame image;
the semantic segmentation module is used for respectively executing a semantic segmentation operation based on the corresponding semantic segmentation model on each target region to generate a corresponding result image; wherein the semantic segmentation model is generated by training a convolutional neural network according to the semantic division standard corresponding to the target area, and the semantic segmentation operations on the respective target areas are mutually independent;
wherein the process of executing the semantic segmentation operation based on the corresponding semantic segmentation model on each target region through the semantic segmentation module comprises: respectively executing the semantic segmentation operation based on the corresponding semantic segmentation model on the mutual information among the pixel points in each target area; wherein the semantic segmentation operation is performed based on a correspondence between pixel-type division and color marks, the pixel type being the type corresponding to the pixels in which an individual in the image is located.
6. A semantic segmentation device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the semantic segmentation method according to any one of claims 1 to 4 when executing the computer program.
7. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, implements the steps of the semantic segmentation method according to any one of claims 1 to 4.
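As a hedged sketch of the region-division step recited in claim 2 (selecting key points, then performing pooling-style processing around each key point to divide independent target areas), the following illustrates one plausible reading. The function name, the box representation, and the fixed square window size are illustrative assumptions, not the claimed method, and the key points are assumed to be already selected:

```python
import numpy as np

def regions_from_keypoints(shape, keypoints, half=2):
    """Divide a frame into square windows pooled around chosen key points.

    shape     : (H, W) of the target frame image
    keypoints : list of (row, col) key-point centres (assumed already selected)
    half      : half-size of the square window taken around each key point
    Returns a list of (top, left, bottom, right) boxes clipped to the frame,
    one independent target area per key point.
    """
    h, w = shape
    boxes = []
    for r, c in keypoints:
        boxes.append((max(0, r - half), max(0, c - half),
                      min(h, r + half + 1), min(w, c + half + 1)))
    return boxes
```

Each returned box could then be fed to its own segmentation model, as in the per-region pipeline of claim 1.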
CN201811520565.7A 2018-12-12 2018-12-12 Semantic segmentation method, device, equipment and medium Active CN109657715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811520565.7A CN109657715B (en) 2018-12-12 2018-12-12 Semantic segmentation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811520565.7A CN109657715B (en) 2018-12-12 2018-12-12 Semantic segmentation method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN109657715A CN109657715A (en) 2019-04-19
CN109657715B true CN109657715B (en) 2024-02-06

Family

ID=66114413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811520565.7A Active CN109657715B (en) 2018-12-12 2018-12-12 Semantic segmentation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN109657715B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949313A (en) * 2019-05-17 2019-06-28 中科院—南京宽带无线移动通信研发中心 A kind of real-time semantic segmentation method of image
CN110472653B (en) * 2019-07-01 2021-09-21 浙江大学 Semantic segmentation method based on maximized region mutual information
CN112119428A (en) * 2019-09-23 2020-12-22 深圳市大疆创新科技有限公司 Method, device, unmanned aerial vehicle, system and storage medium for acquiring landing position
CN110930419A (en) * 2020-02-13 2020-03-27 北京海天瑞声科技股份有限公司 Image segmentation method and device, electronic equipment and computer storage medium
CN113112480B (en) * 2021-04-16 2024-03-29 北京文安智能技术股份有限公司 Video scene change detection method, storage medium and electronic device
CN115661701A (en) * 2022-10-09 2023-01-31 中国科学院半导体研究所 Real-time image processing method and device, electronic equipment and readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877064A (en) * 2009-04-30 2010-11-03 索尼株式会社 Image classification method and image classification device
CN103020172A (en) * 2012-11-28 2013-04-03 北京京东世纪贸易有限公司 Method and device utilizing video information to search articles
CN103377376A (en) * 2012-04-13 2013-10-30 阿里巴巴集团控股有限公司 Method and system for image classification, and method and system for image retrieval
CN103678315A (en) * 2012-08-31 2014-03-26 富士通株式会社 Image processing device, image processing method and electronic equipment
CN107180430A (en) * 2017-05-16 2017-09-19 华中科技大学 A kind of deep learning network establishing method and system suitable for semantic segmentation
CN107832335A (en) * 2017-10-10 2018-03-23 西安电子科技大学 A kind of image search method based on context deep semantic information
CN107977624A (en) * 2017-11-30 2018-05-01 国信优易数据有限公司 A kind of semantic segmentation method, apparatus and system
CN108427951A (en) * 2018-02-08 2018-08-21 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and computer equipment

Also Published As

Publication number Publication date
CN109657715A (en) 2019-04-19

Similar Documents

Publication Publication Date Title
CN109657715B (en) Semantic segmentation method, device, equipment and medium
CN110032926B (en) Video classification method and device based on deep learning
CN109754015B (en) Neural networks for drawing multi-label recognition and related methods, media and devices
CN108596277B (en) Vehicle identity recognition method and device and storage medium
CN104809443B (en) Detection method of license plate and system based on convolutional neural networks
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
US10445602B2 (en) Apparatus and method for recognizing traffic signs
CN111340123A (en) Image score label prediction method based on deep convolutional neural network
CN103020985B (en) A kind of video image conspicuousness detection method based on field-quantity analysis
CN106126585B (en) The unmanned plane image search method combined based on quality grading with perceived hash characteristics
CN104598924A (en) Target matching detection method
JP4098021B2 (en) Scene identification method, apparatus, and program
CN107330360A (en) A kind of pedestrian's clothing colour recognition, pedestrian retrieval method and device
CN106650615A (en) Image processing method and terminal
CN110188217A (en) Image duplicate checking method, apparatus, equipment and computer-readable storage media
CN111046213B (en) Knowledge base construction method based on image recognition
CN111126401A (en) License plate character recognition method based on context information
CN110222772B (en) Medical image annotation recommendation method based on block-level active learning
CN111428730B (en) Weak supervision fine-grained object classification method
CN117197763A (en) Road crack detection method and system based on cross attention guide feature alignment network
CN111444816A (en) Multi-scale dense pedestrian detection method based on fast RCNN
CN112132145B (en) Image classification method and system based on model extended convolutional neural network
CN112750128B (en) Image semantic segmentation method, device, terminal and readable storage medium
CN113128308B (en) Pedestrian detection method, device, equipment and medium in port scene
CN112487926A (en) Scenic spot feeding behavior identification method based on space-time diagram convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231213

Address after: 510470 Guangzhou Baiyun International Airport Logistics Comprehensive Service Building (Airport), Heng 16th Road, North Work Area, Guangzhou Baiyun International Airport, Huadu District, Guangzhou City, Guangdong Province

Applicant after: Guangdong Airport Group Logistics Co.,Ltd.

Address before: No.729, Dongfeng East Road, Yuexiu District, Guangzhou City, Guangdong Province 510060

Applicant before: GUANGDONG University OF TECHNOLOGY

GR01 Patent grant
GR01 Patent grant