CN109657715A - Semantic segmentation method, apparatus, device and medium - Google Patents

Semantic segmentation method, apparatus, device and medium

Info

Publication number
CN109657715A
CN109657715A (application CN201811520565.7A)
Authority
CN
China
Prior art keywords
semantic segmentation
target area
target
frame image
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811520565.7A
Other languages: Chinese (zh)
Other versions: CN109657715B (en)
Inventor
黄国恒
陈俊安
黄斯彤
胡可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Airport Group Logistics Co ltd
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201811520565.7A
Publication of CN109657715A
Application granted
Publication of CN109657715B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods


Abstract

The invention discloses a semantic segmentation method, apparatus, device and medium. The method comprises the steps of: obtaining a target frame image, and dividing multiple independent target areas in the target frame image; performing, on each target area respectively, a semantic segmentation operation based on a corresponding semantic segmentation model, and generating corresponding result images. Each semantic segmentation model is generated by training a convolutional neural network according to the semantic classification standard corresponding to its target area. Because the method divides the complete target frame image into multiple independent target areas, the semantic segmentation of each target area is independent of the others. This relatively avoids the situation in which pixel groups with many kinds of semantic meaning are characterized by a limited set of color labels and the color labels are consequently confused with one another during subsequent image understanding, thereby ensuring the usability of the semantic segmentation result. In addition, the present invention also provides a semantic segmentation apparatus, device and medium, whose beneficial effects are the same as above.

Description

Semantic segmentation method, apparatus, device and medium
Technical field
The present invention relates to the field of computer-vision video detection, and in particular to a semantic segmentation method, apparatus, device and medium.
Background art
Semantic segmentation of images can be regarded as a basic technology of image understanding. It plays a very important role in application scenarios such as automatic driving systems (specifically street-scene recognition and understanding), unmanned aerial vehicle applications (e.g. landing-point judgment) and wearable devices.
It is well known that an image is composed of many pixels, and semantic segmentation consists in grouping the pixels of an image according to the differences in the semantic meaning they express. In traditional semantic segmentation, the main operation is to associate color labels with object names; after semantic segmentation of an image, objects of different types in the original image are covered with corresponding colors in the result image, and regions of the same color in the result image characterize pixel groups of the same type. In actual use, however, the content of an image is often rich, which may lead to a large number of pixel groups when semantic segmentation is performed on the image. Since the overall number of usable colors is relatively small, and colors of similar hue are difficult to tell apart, characterizing pixel groups with many kinds of semantic meaning by a limited set of color labels easily causes confusion between different color labels. In the subsequent image-understanding process, pixel information that could serve as an important distinguishing criterion may then be lost, so it is difficult to ensure the overall usability of the semantic segmentation result.
It can therefore be seen that providing a semantic segmentation method that relatively avoids confusion between different color labels in the result image, and thereby guarantees the usability of the semantic segmentation result, is a problem to be urgently solved by those skilled in the art.
Summary of the invention
The object of the present invention is to provide a semantic segmentation method, apparatus, device and medium that relatively avoid confusion between different color labels in the result image and thereby guarantee the usability of the semantic segmentation result.
In order to solve the above technical problem, the present invention provides a semantic segmentation method, comprising:
obtaining a target frame image, and dividing multiple independent target areas in the target frame image;
performing, on each target area respectively, a semantic segmentation operation based on a corresponding semantic segmentation model, and generating corresponding result images; wherein each semantic segmentation model is generated by training a convolutional neural network according to the semantic classification standard corresponding to its target area.
Preferably, dividing multiple independent target areas in the target frame image comprises:
selecting key points in the target frame image;
performing pooling processing based on each key point respectively, so as to divide corresponding multiple independent target areas in the target frame image.
Preferably, the semantic segmentation model being generated by training a convolutional neural network according to the semantic classification standard corresponding to the target area is specifically:
the semantic segmentation model is generated by training an IndRNN convolutional neural network according to the semantic classification standard corresponding to the target area.
Preferably, performing on each target area respectively the semantic segmentation operation based on the corresponding semantic segmentation model is specifically:
performing the semantic segmentation operation based on the corresponding semantic segmentation model according to the mutual information between the pixels in each target area respectively.
Preferably, obtaining the target frame image is specifically obtaining a target frame image in a video.
In addition, the present invention also provides a semantic segmentation apparatus, comprising:
an obtaining and division module, configured to obtain a target frame image and divide multiple independent target areas in the target frame image;
a semantic segmentation module, configured to perform, on each target area respectively, a semantic segmentation operation based on a corresponding semantic segmentation model and generate corresponding result images; wherein each semantic segmentation model is generated by training a convolutional neural network according to the semantic classification standard corresponding to its target area.
In addition, the present invention also provides a semantic segmentation device, comprising:
a memory for storing a computer program; and
a processor which, when executing the computer program, implements the steps of the above semantic segmentation method.
In addition, the present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above semantic segmentation method are implemented.
In the semantic segmentation method provided by the present invention, after the target frame image is obtained, multiple independent target areas are divided in the target frame image; semantic segmentation based on the semantic segmentation model corresponding to each target area is then performed on that target area, and the result image corresponding to each target area is generated. Each target area has a corresponding semantic classification standard, and the semantic segmentation model corresponding to each target area is generated by training a convolutional neural network according to the semantic classification standard corresponding to that target area. Because the method divides the complete target frame image into multiple independent target areas, and the content contained in a target area is relatively small compared with the complete target frame image, semantic segmentation is performed on each target area with its corresponding semantic segmentation standard, so the semantic segmentation of each target area is independent of the others. This relatively avoids the situation in which pixel groups with many kinds of semantic meaning are characterized by a limited set of color labels and the color labels are consequently confused with one another during subsequent image understanding, thereby ensuring the usability of the semantic segmentation result. In addition, the present invention also provides a semantic segmentation apparatus, device and medium, whose beneficial effects are the same as above.
Brief description of the drawings
In order to explain the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow chart of a semantic segmentation method provided by an embodiment of the present invention;
Fig. 2 is a structure chart of a semantic segmentation apparatus provided by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below in combination with the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.
The core of the present invention is to provide a semantic segmentation method that relatively avoids confusion between different color labels in the result image and thereby guarantees the usability of the semantic segmentation result. Another core of the present invention is to provide a semantic segmentation apparatus, device and medium.
In order to enable those skilled in the art to better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment one
Fig. 1 is a flow chart of a semantic segmentation method provided by an embodiment of the present invention. Referring to Fig. 1, the specific steps of the semantic segmentation method include:
Step S10: obtaining a target frame image, and dividing multiple independent target areas in the target frame image.
It should be noted that the target frame image in this step may specifically be the content image corresponding to a certain frame in a video, or a single-frame image, i.e. a static picture. After the target frame image is obtained, it is divided into independent target areas. "Independent" means that there is no overlapping part between any two target areas; and, to ensure that semantic segmentation of the target frame image is performed as fully as possible, the combination of all target areas should be able to completely restore the target frame image. The target areas may be divided by splitting the target frame image according to a preset area size, in which case each target area obtained by this division mode has the same size but relatively random content. Alternatively, the division may be centered on the key content contained in the target frame image, with content having a certain relevance to a piece of key content divided into the same target area as that key content; this division mode can relatively ensure high content relevance within a target area, while relatively ensuring that no related content exists between different target areas. These two partitioning schemes for the target frame image are only two examples among many division modes; the user may choose according to the actual demands of the semantic segmentation, which is not specifically limited here.
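The first division mode described above (splitting by a preset area size into non-overlapping areas whose union restores the frame) can be sketched as follows. This is a minimal illustration under the assumption that the frame is a NumPy array; the function name and grid size are illustrative, not from the patent:

```python
import numpy as np

def divide_into_regions(frame, rows, cols):
    """Split a frame (H x W x C array) into rows*cols non-overlapping
    target areas whose union restores the complete frame."""
    h, w = frame.shape[:2]
    # Row/column boundaries; rounding absorbs uneven sizes.
    ys = [round(i * h / rows) for i in range(rows + 1)]
    xs = [round(j * w / cols) for j in range(cols + 1)]
    regions = []
    for i in range(rows):
        for j in range(cols):
            box = (ys[i], ys[i + 1], xs[j], xs[j + 1])  # (top, bottom, left, right)
            regions.append((box, frame[box[0]:box[1], box[2]:box[3]]))
    return regions

frame = np.zeros((480, 640, 3), dtype=np.uint8)
regions = divide_into_regions(frame, 3, 4)
assert len(regions) == 12
# Areas are disjoint and cover the frame exactly: total area equals H*W.
assert sum(r.shape[0] * r.shape[1] for _, r in regions) == 480 * 640
```

Each `(box, sub-image)` pair can then be handed to its own per-area segmentation model, as in step S11 below.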
Step S11: performing, on each target area respectively, a semantic segmentation operation based on the corresponding semantic segmentation model, and generating corresponding result images.
Each semantic segmentation model is generated by training a convolutional neural network according to the semantic classification standard corresponding to its target area.
It should be noted that semantic segmentation means classifying each pixel in a picture. The core of this step lies in performing on each target area a semantic segmentation operation based on the corresponding semantic segmentation model, so that the semantic segmentation operations of the different target areas are independent of one another. Since the content of each target area is relatively small compared with the complete target frame image, relatively few color labels are needed when performing semantic segmentation on each target area, which avoids to the greatest extent confusion between color labels in the generated result images.
In addition, the semantic segmentation model in this step is generated by training a convolutional neural network according to the semantic classification standard corresponding to each target area. The semantic classification standard refers to the way pixel types are divided in semantic segmentation and the correspondence between color labels and pixel types; the semantic classification standard may differ between target areas. For result images generated by semantically segmenting target areas according to mutually different semantic classification standards, subsequent image understanding should be performed with the corresponding image-understanding logic. For example, for two result images A and B generated on the basis of different semantic segmentation standards, the pixel type characterized by a red label may be "plant" in result image A but "pedestrian" in result image B; the way of understanding result image A thus differs from that of result image B, and subsequent image understanding of A and B should be performed according to their respective semantic classification standards.
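The example above (the same red label meaning "plant" in one result image and "pedestrian" in another) amounts to a per-area lookup of color labels. A minimal sketch, in which the area identifiers and label sets are hypothetical:

```python
# Each target area carries its own semantic classification standard, so the
# same color label may denote different pixel types in different areas.
REGION_STANDARDS = {
    "area_A": {"red": "plant", "blue": "road"},
    "area_B": {"red": "pedestrian", "blue": "sky"},
}

def interpret(area_id, color):
    """Resolve a color label using the standard of the area it came from."""
    return REGION_STANDARDS[area_id][color]

assert interpret("area_A", "red") == "plant"
assert interpret("area_B", "red") == "pedestrian"
```

This is why subsequent image understanding must consult the classification standard of the area a result image came from, rather than a single global color map.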
In the semantic segmentation method provided by the present invention, after the target frame image is obtained, multiple independent target areas are divided in the target frame image; semantic segmentation based on the semantic segmentation model corresponding to each target area is then performed on that target area, and the result image corresponding to each target area is generated. Each target area has a corresponding semantic classification standard, and the semantic segmentation model corresponding to each target area is generated by training a convolutional neural network according to the semantic classification standard corresponding to that target area. Because the method divides the complete target frame image into multiple independent target areas, and the content contained in a target area is relatively small compared with the complete target frame image, semantic segmentation is performed on each target area with its corresponding semantic segmentation standard, so the semantic segmentation of each target area is independent of the others. This relatively avoids the situation in which pixel groups with many kinds of semantic meaning are characterized by a limited set of color labels and the color labels are consequently confused with one another during subsequent image understanding, thereby ensuring the usability of the semantic segmentation result.
Embodiment two
On the basis of the above embodiment, the present invention also provides a series of preferred implementations.
As a preferred implementation, dividing multiple independent target areas in the target frame image comprises:
selecting key points in the target frame image;
performing pooling processing based on each key point respectively, so as to divide corresponding multiple independent target areas in the target frame image.
It should be noted that in this implementation key points are selected in the target frame image, and pooling processing is then performed centered on each key point. Selecting the key points in the target frame image and performing pooling based on each key point can be realized by a convolutional neural network by way of convolution.
A key point is a reference point according to which a target area is divided; in essence, a key point is a pixel with certain features. The purpose of setting key points is to classify, according to the features of a key point, the other pixels that have features similar to it, so that the complete target area is gradually composed of the pixels obtained by this classification. Theoretically, all the extracted features could be used to train a classifier, such as a softmax classifier, but the computational cost of doing so is relatively large; moreover, a feature that is useful in one image region is very likely to be equally applicable in another region. Therefore, in order to describe a large image, aggregate statistics of the features at different locations are needed; for example, the average value (or maximum value) of a specific feature over a region of the image can be computed. These summary statistics not only have a lower dimension (compared with using all the extracted features) but can also improve the result. This aggregation operation is called pooling. By aggregating similar pixels through pooling and then dividing according to the key points to obtain the target areas, this implementation can ensure that the content within a target area is similar while reducing the computational overhead of the target-area division process.
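The keypoint-centered pooling described above can be sketched as follows; the window radius and the choice between max and mean pooling are illustrative assumptions, not specified by the patent:

```python
import numpy as np

def pool_around_keypoints(feature_map, keypoints, radius=2, mode="max"):
    """Aggregate (pool) the feature values in a window centered on each
    key point, yielding one summary statistic per key point."""
    h, w = feature_map.shape
    pooled = []
    for (y, x) in keypoints:
        # Clip the window at the image border.
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        window = feature_map[y0:y1, x0:x1]
        pooled.append(window.max() if mode == "max" else window.mean())
    return np.array(pooled)

fm = np.arange(36, dtype=float).reshape(6, 6)
out = pool_around_keypoints(fm, [(0, 0), (5, 5)], radius=1)
# Max over the clipped 2x2 window around (0,0) is fm[1,1] = 7.0
assert out[0] == 7.0
assert out[1] == 35.0
```

The pooled statistic stands in for all the features in the window, which is exactly the dimensionality reduction the paragraph above motivates.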
In addition, the key points in this implementation may be selected in the target frame image in advance by a convolutional neural network model, or may be selected manually, which is not specifically limited here.
In addition, as a preferred implementation, the semantic segmentation model being generated by training a convolutional neural network according to the semantic classification standard corresponding to the target area is specifically:
the semantic segmentation model is generated by training an IndRNN convolutional neural network according to the semantic classification standard corresponding to the target area.
It should be noted that recurrent neural networks (RNNs) have been widely used for processing sequence data. However, RNNs are generally difficult to train, because the common gradient vanishing and exploding problems make it hard to learn long-term patterns. To solve these problems, researchers proposed long short-term memory (LSTM) and the gated recurrent unit (GRU), but the use of tanh and sigmoid functions still makes the gradient decay over layers, so constructing an efficiently trainable deep network remains a challenging task. In addition, all neurons within an RNN layer are entangled with one another, and their behavior is hard to interpret. To further solve these problems, a novel RNN variant has been proposed, namely the independently recurrent neural network (IndRNN). In an IndRNN, the neurons in each layer are independent of one another and connected in parallel across layers; the layers of an IndRNN are relatively easy to regulate, which can prevent gradient explosion and gradient vanishing. Therefore, this implementation trains the semantic segmentation model in advance with an IndRNN according to the semantic classification standard corresponding to the target area, which can relatively ensure the usability and reliability of the semantic segmentation model.
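The defining property of IndRNN, that each neuron keeps only its own previous state through an element-wise recurrent weight, can be sketched as a single layer. The dimensions and the ReLU activation (common in the IndRNN literature) are assumptions here, not details from the patent:

```python
import numpy as np

def indrnn_layer(inputs, W, u, b):
    """One IndRNN layer: h_t = relu(W @ x_t + u * h_{t-1} + b).
    The recurrent weight u is a vector, so each neuron only sees its own
    previous state: neurons within a layer are independent of one another."""
    hidden = np.zeros(u.shape)
    states = []
    for x_t in inputs:
        hidden = np.maximum(0.0, W @ x_t + u * hidden + b)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4
xs = rng.normal(size=(T, d_in))
W = rng.normal(size=(d_h, d_in))
u = np.full(d_h, 0.9)  # per-neuron recurrent weight; bounding |u| controls gradients
b = np.zeros(d_h)
H = indrnn_layer(xs, W, u, b)
assert H.shape == (5, 4)
```

Because the recurrence is element-wise rather than a full matrix, the gradient along time for each neuron scales with powers of its own `u`, which is what makes explosion and vanishing easy to control by bounding `|u|`.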
In addition, on the basis of the above embodiment, as a preferred implementation, performing on each target area respectively the semantic segmentation operation based on the corresponding semantic segmentation model is specifically:
performing the semantic segmentation operation based on the corresponding semantic segmentation model according to the mutual information between the pixels in each target area respectively.
It should be noted that this implementation performs semantic segmentation on a target area, i.e. classifies its pixels, based on the mutual information between the pixels contained in the target area. Mutual information is an information measure in information theory and a common method in computational-linguistics model analysis; it measures the mutual dependence between two objects, and in filtering problems it is used to measure the discrimination of a feature with respect to a topic. The definition of mutual information is similar to that of cross entropy. Using mutual information theory for feature extraction is based on the assumption that entries whose frequency of occurrence is high in a particular category but relatively low in the other categories have larger mutual information with that category. Mutual information is usually used as the measure between a feature and a category: if a feature belongs to a category, their mutual information is largest.
The mutual information between pixels in this implementation refers to information characterizing the relevance between pixels. Through the mutual information of the pixels in a target area, it can be explicitly learned with which pixel group each pixel in the target area has the highest degree of similarity. In a specific implementation, the feature corresponding to each pixel group should be set in advance, as the basis for judging whether a pixel belongs to that pixel group. Since this method does not need to make any assumption about the nature of the relationship between features and pixel-group categories, it can relatively ensure the overall efficiency of the semantic segmentation of the target area.
Mutual information can be expressed in the form of the following formula. Let X denote the set of pixels in the target area, with x ∈ X a certain pixel; let Z denote the set of coding vectors, with z ∈ Z a coding vector of a pixel, i.e. the features possessed by the pixel; and let p(z|x) denote the distribution of the coding vectors produced by x, which we set as a Gaussian distribution, or which can simply be understood as the encoder we want to find. The correlation of X and Z can then be expressed with mutual information as follows:

I(X; Z) = Σ_{x ∈ X} p(x) ∫ p(z|x) log( p(z|x) / p(z) ) dz

where p(x) denotes the distribution of the original data, and p(z) = Σ_{x ∈ X} p(x) p(z|x) is the distribution of the entire Z once p(z|x) is given.
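A discrete stand-in for the mutual-information formula above, replacing the integral over z with a sum; the two toy encoders are hypothetical and serve only to show the two extremes of I(X; Z):

```python
import numpy as np

def mutual_information(p_x, p_z_given_x):
    """I(X;Z) = sum_x p(x) sum_z p(z|x) * log( p(z|x) / p(z) ),
    with the marginal p(z) = sum_x p(x) p(z|x)."""
    p_z = p_x @ p_z_given_x  # marginal distribution over codes
    mi = 0.0
    for i, px in enumerate(p_x):
        for j, pzx in enumerate(p_z_given_x[i]):
            if pzx > 0:
                mi += px * pzx * np.log(pzx / p_z[j])
    return mi

p_x = np.array([0.5, 0.5])
# A deterministic encoder maps each pixel to its own code: maximal dependence.
det = np.array([[1.0, 0.0], [0.0, 1.0]])
# An uninformative encoder ignores the pixel entirely: zero dependence.
flat = np.array([[0.5, 0.5], [0.5, 0.5]])
assert abs(mutual_information(p_x, det) - np.log(2)) < 1e-12
assert abs(mutual_information(p_x, flat)) < 1e-12
```

Maximizing this quantity over the encoder p(z|x) pushes the coding vectors to carry as much information about the pixels as possible, which is the sense in which mutual information guides the grouping of pixels above.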
On the basis of the series of implementations above, as a preferred implementation, obtaining the target frame image is specifically obtaining a target frame image in a video.
It should be noted that, considering that semantic segmentation, as a basic technology of image understanding, is often applied in scenarios such as automatic driving systems (specifically street-scene recognition and understanding), unmanned aerial vehicle applications (landing-point judgment) and wearable devices, and that such scenarios are usually momentarily dynamic scenes rather than fixed images, this implementation obtains the target frame image from a video and performs the subsequent corresponding processing. This fits the current usage scenarios of semantic segmentation in image understanding and further improves the overall usability of the image-understanding result.
Embodiment three
The embodiments of the semantic segmentation method have been described in detail above. The present invention also provides a semantic segmentation apparatus corresponding to the method. Since the embodiments of the apparatus part correspond to the embodiments of the method part, for the embodiments of the apparatus part please refer to the description of the embodiments of the method part, which is not repeated here.
Fig. 2 is a structure chart of a semantic segmentation apparatus provided by an embodiment of the present invention. The semantic segmentation apparatus provided by the embodiment of the present invention comprises:
an obtaining and division module 10, configured to obtain a target frame image and divide multiple independent target areas in the target frame image;
a semantic segmentation module 11, configured to perform, on each target area respectively, a semantic segmentation operation based on the corresponding semantic segmentation model and generate corresponding result images, wherein each semantic segmentation model is generated by training a convolutional neural network according to the semantic classification standard corresponding to its target area.
In the semantic segmentation apparatus provided by the present invention, after the target frame image is obtained, multiple independent target areas are divided in the target frame image; semantic segmentation based on the semantic segmentation model corresponding to each target area is then performed on that target area, and the result image corresponding to each target area is generated. Each target area has a corresponding semantic classification standard, and the semantic segmentation model corresponding to each target area is generated by training a convolutional neural network according to the semantic classification standard corresponding to that target area. Because the apparatus divides the complete target frame image into multiple independent target areas, and the content contained in a target area is relatively small compared with the complete target frame image, semantic segmentation is performed on each target area with its corresponding semantic segmentation standard, so the semantic segmentation of each target area is independent of the others. This relatively avoids the situation in which pixel groups with many kinds of semantic meaning are characterized by a limited set of color labels and the color labels are consequently confused with one another during subsequent image understanding, thereby ensuring the usability of the semantic segmentation result.
Example IV
The present invention also provides a semantic segmentation device, comprising:
a memory for storing a computer program; and
a processor which, when executing the computer program, implements the steps of the above semantic segmentation method.
In the semantic segmentation device provided by the present invention, after the target frame image is obtained, multiple independent target areas are divided in the target frame image; semantic segmentation based on the semantic segmentation model corresponding to each target area is then performed on that target area, and the result image corresponding to each target area is generated. Each target area has a corresponding semantic classification standard, and the semantic segmentation model corresponding to each target area is generated by training a convolutional neural network according to the semantic classification standard corresponding to that target area. Because the device divides the complete target frame image into multiple independent target areas, and the content contained in a target area is relatively small compared with the complete target frame image, semantic segmentation is performed on each target area with its corresponding semantic segmentation standard, so the semantic segmentation of each target area is independent of the others. This relatively avoids the situation in which pixel groups with many kinds of semantic meaning are characterized by a limited set of color labels and the color labels are consequently confused with one another during subsequent image understanding, thereby ensuring the usability of the semantic segmentation result.
The present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above semantic segmentation method are implemented.
With the computer-readable storage medium provided by the present invention, after the target frame image is obtained, multiple independent target areas are divided in the target frame image; semantic segmentation based on the semantic segmentation model corresponding to each target area is then performed on that target area, and the result image corresponding to each target area is generated. Each target area has a corresponding semantic classification standard, and the semantic segmentation model corresponding to each target area is generated by training a convolutional neural network according to the semantic classification standard corresponding to that target area. Because this computer-readable storage medium divides the complete target frame image into multiple independent target areas, and the content contained in a target area is relatively small compared with the complete target frame image, semantic segmentation is performed on each target area with its corresponding semantic segmentation standard, so the semantic segmentation of each target area is independent of the others. This relatively avoids the situation in which pixel groups with many kinds of semantic meaning are characterized by a limited set of color labels and the color labels are consequently confused with one another during subsequent image understanding, thereby ensuring the usability of the semantic segmentation result.
The semantic segmentation method, apparatus, equipment, and medium provided by the present invention have been described in detail above. The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points may be found in the description of the method. It should be pointed out that those of ordinary skill in the art may also make improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the protection scope of the claims of the present invention.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.

Claims (8)

1. A semantic segmentation method, comprising:
acquiring a target frame image, and dividing multiple independent target areas in the target frame image;
performing, on each target area respectively, a semantic segmentation operation based on the corresponding semantic segmentation model, and generating a corresponding result image; wherein the semantic segmentation model is generated by training a convolutional neural network according to the semantic classification standard corresponding to the target area.
2. The method according to claim 1, wherein dividing multiple independent target areas in the target frame image comprises:
selecting key points in the target frame image;
performing pooling processing based on each key point respectively, so as to divide the corresponding multiple independent target areas in the target frame image.
3. The method according to claim 1, wherein the semantic segmentation model being generated by training a convolutional neural network according to the semantic classification standard corresponding to the target area is specifically:
the semantic segmentation model is generated by training an IndRNN convolutional neural network according to the semantic classification standard corresponding to the target area.
4. The method according to claim 1, wherein performing, on each target area respectively, the semantic segmentation operation based on the corresponding semantic segmentation model is specifically:
performing, between the mutual-information pixels in each target area respectively, the semantic segmentation operation based on the corresponding semantic segmentation model.
5. The method according to claim 1, wherein acquiring the target frame image is specifically: acquiring the target frame image in a video.
6. A semantic segmentation apparatus, comprising:
an acquisition and division module, configured to acquire a target frame image and divide multiple independent target areas in the target frame image;
a semantic segmentation module, configured to perform, on each target area respectively, a semantic segmentation operation based on the corresponding semantic segmentation model and generate a corresponding result image; wherein the semantic segmentation model is generated by training a convolutional neural network according to the semantic classification standard corresponding to the target area.
7. A semantic segmentation equipment, comprising:
a memory for storing a computer program; and
a processor, configured to implement the steps of the semantic segmentation method according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the semantic segmentation method according to any one of claims 1 to 5 are implemented.
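The division step of claim 2 — selecting key points and pooling around each to obtain the target areas — might look roughly like the sketch below. The fixed half-window of 32 pixels and the hard-coded key points are illustrative assumptions, and simple window cropping stands in for whatever pooling operation the patent actually applies.

```python
import numpy as np

def regions_from_keypoints(frame, keypoints, half=32):
    """Take a fixed window around each key point as one independent target area,
    clamping windows at the image border so every crop stays inside the frame."""
    h, w = frame.shape[:2]
    regions = []
    for (x, y) in keypoints:
        x0, x1 = max(0, x - half), min(w, x + half)
        y0, y1 = max(0, y - half), min(h, y + half)
        regions.append(frame[y0:y1, x0:x1])
    return regions

frame = np.zeros((240, 320, 3), dtype=np.uint8)  # dummy 320x240 frame
regions = regions_from_keypoints(frame, [(50, 60), (310, 10)])
print([r.shape[:2] for r in regions])  # interior window is full size; border window is clipped
```

Each returned crop could then be fed to its own per-area segmentation model, as in claim 1; windows near the border come out smaller because they are clipped rather than padded, which is one of several reasonable design choices here.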
CN201811520565.7A 2018-12-12 2018-12-12 Semantic segmentation method, device, equipment and medium Active CN109657715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811520565.7A CN109657715B (en) 2018-12-12 2018-12-12 Semantic segmentation method, device, equipment and medium


Publications (2)

Publication Number Publication Date
CN109657715A true CN109657715A (en) 2019-04-19
CN109657715B CN109657715B (en) 2024-02-06

Family

ID=66114413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811520565.7A Active CN109657715B (en) 2018-12-12 2018-12-12 Semantic segmentation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN109657715B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877064A (en) * 2009-04-30 2010-11-03 索尼株式会社 Image classification method and image classification device
CN103020172A (en) * 2012-11-28 2013-04-03 北京京东世纪贸易有限公司 Method and device utilizing video information to search articles
CN103377376A (en) * 2012-04-13 2013-10-30 阿里巴巴集团控股有限公司 Method and system for image classification, and method and system for image retrieval
CN103678315A (en) * 2012-08-31 2014-03-26 富士通株式会社 Image processing device, image processing method and electronic equipment
CN107180430A (en) * 2017-05-16 2017-09-19 华中科技大学 A kind of deep learning network establishing method and system suitable for semantic segmentation
CN107832335A (en) * 2017-10-10 2018-03-23 西安电子科技大学 A kind of image search method based on context deep semantic information
CN107977624A (en) * 2017-11-30 2018-05-01 国信优易数据有限公司 A kind of semantic segmentation method, apparatus and system
CN108427951A (en) * 2018-02-08 2018-08-21 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and computer equipment


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949313A (en) * 2019-05-17 2019-06-28 中科院—南京宽带无线移动通信研发中心 A kind of real-time semantic segmentation method of image
CN110472653A (en) * 2019-07-01 2019-11-19 浙江大学 A kind of semantic segmentation method based on maximization region mutual information
CN110472653B (en) * 2019-07-01 2021-09-21 浙江大学 Semantic segmentation method based on maximized region mutual information
WO2021056139A1 (en) * 2019-09-23 2021-04-01 深圳市大疆创新科技有限公司 Method and device for acquiring landing position, unmanned aerial vehicle, system, and storage medium
CN110930419A (en) * 2020-02-13 2020-03-27 北京海天瑞声科技股份有限公司 Image segmentation method and device, electronic equipment and computer storage medium
CN111784656A (en) * 2020-06-28 2020-10-16 京东数字科技控股有限公司 Railway contact network fault detection method and device, electronic equipment and storage medium
CN113112480A (en) * 2021-04-16 2021-07-13 北京文安智能技术股份有限公司 Video scene change detection method, storage medium and electronic device
CN113112480B (en) * 2021-04-16 2024-03-29 北京文安智能技术股份有限公司 Video scene change detection method, storage medium and electronic device
CN115661701A (en) * 2022-10-09 2023-01-31 中国科学院半导体研究所 Real-time image processing method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN109657715B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
CN109657715A (en) A kind of semantic segmentation method, apparatus, equipment and medium
CN105809146B (en) A kind of image scene recognition methods and device
CN103578119B (en) Target detection method in Codebook dynamic scene based on superpixels
US11900646B2 (en) Methods for generating a deep neural net and for localising an object in an input image, deep neural net, computer program product, and computer-readable storage medium
CN108427920A (en) A kind of land and sea border defense object detection method based on deep learning
CN111597870B (en) Human body attribute identification method based on attention mechanism and multi-task learning
CN109117879A (en) Image classification method, apparatus and system
CN107808129A (en) A kind of facial multi-characteristic points localization method based on single convolutional neural networks
CN110175249A (en) A kind of search method and system of similar pictures
CN107808126A (en) Vehicle retrieval method and device
CN112132145B (en) Image classification method and system based on model extended convolutional neural network
Tian et al. Video object detection for tractability with deep learning method
CN109670517A (en) Object detection method, device, electronic equipment and target detection model
CN108198202A (en) A kind of video content detection method based on light stream and neural network
CN109522970A (en) Image classification method, apparatus and system
Xu et al. Occlusion problem-oriented adversarial faster-RCNN scheme
Lipson Context and configuration based scene classification
Shanthakumari et al. Image Detection and Recognition of different species of animals using Deep Learning
CN114782979A (en) Training method and device for pedestrian re-recognition model, storage medium and terminal
CN117115404A (en) Method, device, computer equipment and storage medium for three-dimensional virtual scene adjustment
Usha et al. Content based image retrieval using combined features of color and texture features with SVM classification
Yao et al. Substation object detection based on enhance RCNN model
Özyurt et al. A new method for classification of images using convolutional neural network based on Dwt-Svd perceptual hash function
Visalatchi et al. Intelligent Vision with TensorFlow using Neural Network Algorithms
EP3627391A1 (en) Deep neural net for localising objects in images, methods for preparing such a neural net and for localising objects in images, corresponding computer program product, and corresponding computer-readable medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20231213

Address after: 510470 Guangzhou Baiyun International Airport Logistics Comprehensive Service Building (Airport), Heng 16th Road, North Work Area, Guangzhou Baiyun International Airport, Huadu District, Guangzhou City, Guangdong Province

Applicant after: Guangdong Airport Group Logistics Co.,Ltd.

Address before: No.729, Dongfeng East Road, Yuexiu District, Guangzhou City, Guangdong Province 510060

Applicant before: GUANGDONG University OF TECHNOLOGY

GR01 Patent grant
GR01 Patent grant