CN109271990A - Semantic segmentation method and device for RGB-D images - Google Patents
- Publication number
- CN109271990A (application CN201811020264.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- rgb
- group
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
Embodiments of the present invention provide a semantic segmentation method and device for RGB-D images. The method includes: obtaining an RGB-D image to be semantically segmented; and inputting the RGB image and the depth image contained in the RGB-D image into a pre-trained neural network to obtain a target label image corresponding to the RGB-D image. The RGB image is input into one branch network layer of the neural network's branch network group, and the depth image is input into the other branch network layer of that group. The neural network includes a branch network group, a feature fusion network layer, and an output network layer connected in sequence, and is trained on sample RGB-D images and their corresponding sample label images, where the sample label image corresponding to any sample RGB-D image is the semantic segmentation result of the sample RGB image contained in that sample RGB-D image. With the embodiments of the present invention, effective semantic segmentation of RGB-D images can be achieved using a neural network.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a semantic segmentation method and device for RGB-D images.
Background art

In recent years, SLAM (Simultaneous Localization And Mapping) systems have developed rapidly; they are mainly used in fields such as autonomous robot localization and navigation. Specifically, a SLAM system uses RGB-D images to perform processing such as feature extraction and matching, in order to build a three-dimensional map and localize in real time. A so-called RGB-D image consists of two images: an RGB image (an image with three RGB channels) and a depth image. The depth image is similar to a grayscale image, except that each of its pixel values is the actual distance from the sensor to the object; moreover, the pixels of the RGB image and the depth image usually correspond one-to-one.

To improve the usability of the constructed three-dimensional map, researchers have proposed the concept of a semantic map based on semantic segmentation technology. Semantic segmentation refers to performing pixel-level segmentation of the content of an image and identifying the category of each object; semantic mapping thus segments the constructed three-dimensional point cloud and identifies the objects in the environment.

Given the rapid development of deep learning and the good results it has achieved in semantic segmentation in recent years, researchers working on SLAM systems hope to use deep-learning neural networks to perform semantic segmentation of RGB-D images.

Therefore, how to use a neural network to perform effective semantic segmentation of RGB-D images is a problem that urgently needs to be solved.
Summary of the invention
The purpose of embodiments of the present invention is to provide a semantic segmentation method and device for RGB-D images, so as to achieve effective semantic segmentation of RGB-D images using a neural network. The specific technical solutions are as follows.

In a first aspect, an embodiment of the present invention provides a semantic segmentation method for RGB-D images, the method including:

obtaining an RGB-D image to be semantically segmented, the RGB-D image including an RGB (three-channel) image and a depth image corresponding to the RGB image;

inputting the RGB image and the depth image contained in the RGB-D image into a pre-trained neural network to obtain a target label image corresponding to the RGB-D image, where the RGB image is input into one branch network layer of the neural network's branch network group and the depth image is input into the other branch network layer of that group. The neural network includes a branch network group, a feature fusion network layer, and an output network layer connected in sequence; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that extracts features from its input image. The neural network is trained on sample RGB-D images and their corresponding sample label images; each sample RGB-D image includes a sample RGB image and a sample depth image, and the sample label image corresponding to any sample RGB-D image is the semantic segmentation result of the sample RGB image contained in that sample RGB-D image.
Optionally, each branch network layer includes three convolution modules connected in series.
Optionally, the input of each target convolution module in the first branch network layer includes the output of the preceding convolution module and the output of the convolution module at the corresponding position in the second branch network layer;

here, the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first one.
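As a rough illustration of this optional cross-branch wiring (not the patented implementation), the numpy sketch below runs three stand-in convolution modules per branch; from the second module on, the RGB branch also receives the output of the position-matched depth-branch module. The 1x1-conv stand-ins, the element-wise sum used to combine the two inputs, and all shapes are assumptions made for the sketch.

```python
import numpy as np

def conv_module(x, w):
    """Stand-in for a convolution module: a 1x1 convolution implemented as a
    per-pixel linear map (weights `w`), followed by ReLU. A real module would
    use spatial kernels; this keeps the data flow easy to follow."""
    return np.maximum(x @ w, 0.0)

def two_branch_forward(rgb, depth, rgb_ws, depth_ws):
    """Run three conv modules per branch. From the second module on, the RGB
    branch receives the element-wise sum of its previous output and the
    position-matched depth-branch output (one plausible reading of 'input
    includes both outputs'; the patent does not fix the combination op)."""
    d = depth
    d_outs = []
    for w in depth_ws:                     # depth branch runs independently
        d = conv_module(d, w)
        d_outs.append(d)
    r = conv_module(rgb, rgb_ws[0])        # first RGB module: RGB input only
    for i, w in enumerate(rgb_ws[1:], start=1):
        r = conv_module(r + d_outs[i - 1], w)  # inject depth-branch features
    return r, d

# Toy run: 4 "pixels", 8 channels, channel count preserved by each module.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 8))
depth = rng.standard_normal((4, 8))
ws = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]
r_out, d_out = two_branch_forward(rgb, depth, ws, ws)
print(r_out.shape, d_out.shape)  # (4, 8) (4, 8)
```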
Optionally, the feature fusion performed by the feature fusion network layer includes concatenating, channel by channel, the feature maps output by the two branch network layers.
Optionally, the feature fusion network layer is connected to the output network layer through a feature selection network layer, where the feature selection network layer includes a pooling sublayer, a first fully connected sublayer, and a second fully connected sublayer connected in sequence.

The pooling sublayer performs max-pooling on the fused feature maps output by the feature fusion network layer, and takes the max-pooling result as a first group of penalty coefficients.

The first fully connected sublayer multiplies the first group of penalty coefficients by the weights of its neurons to obtain a first result, takes this result as a second group of penalty coefficients, and normalizes their values with a sigmoid activation function to obtain a third group of penalty coefficients.

The second fully connected sublayer multiplies the third group of penalty coefficients by the weights of its neurons to obtain a second result, takes this result as a fourth group of penalty coefficients, normalizes their values with a sigmoid activation function to obtain a fifth group of penalty coefficients, and weights the fused feature maps with the fifth group of penalty coefficients to obtain a first feature map.
In a second aspect, an embodiment of the present invention provides a semantic segmentation device for RGB-D images, the device including:

an obtaining module, configured to obtain an RGB-D image to be semantically segmented, the RGB-D image including an RGB (three-channel) image and a depth image corresponding to the RGB image;

a computing module, configured to input the RGB image and the depth image contained in the RGB-D image into a pre-trained neural network to obtain a target label image corresponding to the RGB-D image, where the RGB image is input into one branch network layer of the neural network's branch network group and the depth image is input into the other branch network layer of that group. The neural network includes a branch network group, a feature fusion network layer, and an output network layer connected in sequence; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that extracts features from its input image. The neural network is trained on sample RGB-D images and their corresponding sample label images; each sample RGB-D image includes a sample RGB image and a sample depth image, and the sample label image corresponding to any sample RGB-D image is the semantic segmentation result of the sample RGB image contained in that sample RGB-D image.
Optionally, each branch network layer includes three convolution modules connected in series.
Optionally, the input of each target convolution module in the first branch network layer includes the output of the preceding convolution module and the output of the convolution module at the corresponding position in the second branch network layer;

here, the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first one.
Optionally, the feature fusion performed by the feature fusion network layer includes concatenating, channel by channel, the feature maps output by the two branch network layers.
Optionally, the feature fusion network layer is connected to the output network layer through a feature selection network layer, where the feature selection network layer includes a pooling sublayer, a first fully connected sublayer, and a second fully connected sublayer connected in sequence.

The pooling sublayer performs max-pooling on the fused feature maps output by the feature fusion network layer, and takes the max-pooling result as a first group of penalty coefficients.

The first fully connected sublayer multiplies the first group of penalty coefficients by the weights of its neurons to obtain a first result, takes this result as a second group of penalty coefficients, and normalizes their values with a sigmoid activation function to obtain a third group of penalty coefficients.

The second fully connected sublayer multiplies the third group of penalty coefficients by the weights of its neurons to obtain a second result, takes this result as a fourth group of penalty coefficients, normalizes their values with a sigmoid activation function to obtain a fifth group of penalty coefficients, and weights the fused feature maps with the fifth group of penalty coefficients to obtain a first feature map.
In the solutions provided by embodiments of the present invention, the RGB image and the depth image contained in an RGB-D image are input into a pre-trained neural network to obtain the target label image corresponding to the RGB-D image, the target label image being the semantic segmentation result of the RGB image contained in the RGB-D image. Therefore, the solutions provided by embodiments of the present invention achieve effective semantic segmentation of RGB-D images using a neural network.

Of course, implementing any product or method of the present invention does not necessarily require achieving all of the above advantages simultaneously.
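The optional feature selection network layer described in the summary above amounts to a channel re-weighting block (in the spirit of squeeze-and-excitation). A minimal numpy sketch follows; the plain matrices standing in for the two fully connected sublayers, and all shapes, are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_selection(fused, w1, w2):
    """Channel re-weighting over a fused feature map of shape (H, W, C).
    The two weight matrices stand in for the fully connected sublayers."""
    coeff1 = fused.max(axis=(0, 1))  # max pooling -> 1st group of penalty coefficients, (C,)
    coeff3 = sigmoid(coeff1 @ w1)    # 1st FC sublayer + sigmoid -> 3rd group
    coeff5 = sigmoid(coeff3 @ w2)    # 2nd FC sublayer + sigmoid -> 5th group, each in (0, 1)
    return fused * coeff5            # weight the fused maps -> "first feature map"

rng = np.random.default_rng(1)
fused = rng.standard_normal((6, 6, 16))   # toy fused feature map
w1 = rng.standard_normal((16, 16)) * 0.1  # illustrative FC weights
w2 = rng.standard_normal((16, 16)) * 0.1
out = feature_selection(fused, w1, w2)
print(out.shape)  # (6, 6, 16)
```

Because the fifth group of coefficients lies in (0, 1), each channel of the fused map is scaled down in proportion to its learned importance; the spatial layout is untouched.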
Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below.

Fig. 1 is a schematic flowchart of a semantic segmentation method for RGB-D images provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of the neural network provided by an embodiment of the present invention;

Fig. 3(a) is the grayscale rendering of an RGB image; Fig. 3(b) is the depth image corresponding to the RGB image rendered in Fig. 3(a); Fig. 3(c) is the grayscale rendering of the target label image obtained by semantically segmenting the RGB image of Fig. 3(a) together with the depth image of Fig. 3(b);

Fig. 4 is a schematic structural diagram of a semantic segmentation device for RGB-D images provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of embodiments

The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings.

To achieve effective semantic segmentation of RGB-D images, embodiments of the present invention provide a semantic segmentation method, device, electronic equipment, and storage medium for RGB-D images.

It should be noted that the execution subject of the semantic segmentation method for RGB-D images provided by embodiments of the present invention may be a semantic segmentation device for RGB-D images, which may run on electronic equipment. The device may be a plug-in in an image processing tool, or a program independent of any image processing tool; it is of course not limited to these.

A semantic segmentation method for RGB-D images provided by embodiments of the present invention is introduced first.

As shown in Fig. 1, a semantic segmentation method for RGB-D images provided by an embodiment of the present invention may include the following steps:
S101: obtain an RGB-D image to be semantically segmented.

Here, the RGB-D image includes an RGB image and a depth image corresponding to the RGB image.

In an embodiment of the present invention, the RGB-D image may be captured by an RGB-D camera. For example, the RGB-D image captured in real time may be obtained from the shooting module of the RGB-D camera, or an RGB-D image previously captured by an RGB-D camera and stored may be obtained from a preset location; of course, the way of obtaining the RGB-D image to be semantically segmented is not limited to these. A so-called RGB-D camera is a camera in the prior art that can capture an RGB image and a depth image simultaneously.
S102: input the RGB image and the depth image contained in the RGB-D image into a pre-trained neural network to obtain the target label image corresponding to the RGB-D image.

Referring to Fig. 2, which is a schematic structural diagram of the neural network provided by an embodiment of the present invention: functionally, the neural network includes a branch network group, a feature fusion network layer 120, and an output network layer 130 connected in sequence. The branch network group includes two branch network layers 110 as parallel branches; each branch network layer 110 is a feature extraction layer that extracts features from its input image. The feature fusion network layer 120 fuses the feature maps output by the branch network layers 110. The output network layer 130 is a collective name for the remaining structure of the neural network apart from the branch network group and the feature fusion network layer 120; it may be understood that the output network layer 130 can include multiple network sublayers.

It should be noted that either branch network layer, connected in sequence with the subsequent feature fusion network layer and output network layer, may constitute an FCN (Fully Convolutional Network). The fully convolutional network was proposed in 2014 by Long et al. of the University of California, Berkeley, and is a network structure widely used in the semantic segmentation field today.
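The overall data flow just described (two parallel branch layers, feature fusion by channel concatenation, then an output layer) can be sketched as follows. This is a toy numpy illustration with 1x1-conv stand-ins and arbitrary shapes, not the patented FCN.

```python
import numpy as np

def branch(x, ws):
    """One branch network layer: a stack of stand-in conv modules
    (1x1 convolution as a per-pixel linear map, followed by ReLU)."""
    for w in ws:
        x = np.maximum(x @ w, 0.0)
    return x

def segment(rgb, depth, rgb_ws, depth_ws, w_out):
    """Two parallel branch layers, channel-concatenation fusion, then an
    output layer producing per-pixel class labels. All shapes and the
    1x1-conv stand-ins are illustrative assumptions."""
    f_rgb = branch(rgb, rgb_ws)      # (H, W, C) features from the RGB branch
    f_depth = branch(depth, depth_ws)  # (H, W, C) features from the depth branch
    fused = np.concatenate([f_rgb, f_depth], axis=-1)  # (H, W, 2C)
    scores = fused @ w_out           # (H, W, n_classes) per-pixel class scores
    return scores.argmax(axis=-1)    # per-pixel class labels

rng = np.random.default_rng(2)
H, W, C, K = 5, 5, 8, 4
rgb = rng.standard_normal((H, W, 3))    # three-channel RGB input
depth = rng.standard_normal((H, W, 1))  # single-channel depth input
rgb_ws = [rng.standard_normal((3, C)) * 0.1]
depth_ws = [rng.standard_normal((1, C)) * 0.1]
w_out = rng.standard_normal((2 * C, K)) * 0.1
labels = segment(rgb, depth, rgb_ws, depth_ws, w_out)
print(labels.shape)  # (5, 5)
```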
The neural network is trained on sample RGB-D images and their corresponding sample label images; each sample RGB-D image includes a sample RGB image and a sample depth image. For clarity of layout, the training process of the neural network is introduced further below.

As shown in Fig. 2, in an embodiment of the present invention, inputting the RGB image and the depth image contained in the RGB-D image into the pre-trained neural network may include: inputting the RGB image into one branch network layer of the neural network's branch network group, and inputting the depth image into the other branch network layer of that group.

According to the working principle of the neural network, in an embodiment of the present invention, the target label image output by the neural network is the semantic segmentation result corresponding to the RGB image contained in the RGB-D image.

In an embodiment of the present invention, the semantic segmentation may consist of labeling objects of different categories in the RGB image with different values, each value being displayed with a corresponding color. Thus, objects of different categories in the target label image can have different colors.

To aid understanding, the visual differences between an RGB image, a depth image, and a target label image are illustrated with reference to the drawings. Fig. 3(a) shows the grayscale rendering of an RGB image, Fig. 3(b) is the corresponding depth image, and Fig. 3(c) is the grayscale rendering of the target label image obtained by applying the method provided by an embodiment of the present invention to the RGB image of Fig. 3(a) and the depth image of Fig. 3(b). It can be understood that in the original target label image, before grayscale conversion, objects of different categories have different colors, which intuitively conveys the semantic segmentation result.
In the solutions provided by embodiments of the present invention, the RGB image and the depth image contained in the RGB-D image are input into a pre-trained neural network to obtain the target label image corresponding to the RGB-D image, the target label image being the semantic segmentation result of the RGB image contained in the RGB-D image. Therefore, the solutions provided by embodiments of the present invention achieve effective semantic segmentation of RGB-D images using a neural network.
The training process of the neural network in an embodiment of the present invention is briefly introduced below. The training process may include the following steps.

First step: determine an initial neural network.

The initial neural network includes a branch network group, a feature fusion network layer, and an output network layer connected in sequence; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that extracts features from its input image.

It should be noted that, in an embodiment of the present invention, the initial weights of the initial neural network may be existing pre-trained weights. Moreover, as a preferred mode, corresponding weights may be trained in advance for the branch network layer that receives the depth image and used as that branch network layer's initial weights; this makes the initial weights of the network better targeted and improves the training effect of the initial neural network.

Second step: obtain sample RGB-D images and the sample label images corresponding to the sample RGB-D images.

In an embodiment of the present invention, multiple sample RGB-D images and their corresponding sample label images may be obtained, so as to improve the training effect of the initial neural network in the subsequent training. For example, in one training pass, 8 groups of sample RGB-D images and the 8 corresponding groups of sample label images may be obtained.

Each sample RGB-D image includes a sample RGB image and a sample depth image, and the sample label image corresponding to any sample RGB-D image is the semantic segmentation result of the sample RGB image contained in that sample RGB-D image. It should be noted that the sample label images may be produced by manual annotation, though they are of course not limited to this.

Third step: train the initial neural network with the sample RGB-D images and their corresponding sample label images to obtain the neural network.

In this step, the sample RGB image is first input into one branch network layer of the initial neural network's branch network group, the sample depth image is input into the other branch network layer of that group, and the corresponding sample label image serves as the ground truth. The following steps are then performed:

1) Train the initial neural network on the sample RGB image and the sample depth image to obtain a training result.

2) Compare the training result with the corresponding ground truth to obtain an output result.

3) Calculate the value of the loss function Loss of the initial neural network from the output result.

4) Adjust the parameters of the initial neural network according to the value of Loss, and repeat steps 1)-3) until the value of Loss satisfies a convergence condition, that is, until the value of Loss reaches a minimum. At this point, the training of the initial neural network is complete, and the trained neural network is obtained.
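Steps 1)-4) form a standard supervised training loop. The toy numpy sketch below mirrors only the loop itself, substituting a per-pixel linear classifier with softmax cross-entropy for the two-branch network; all sizes, the learning rate, and the random data are arbitrary assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy stand-in for steps 1)-4): a per-pixel linear classifier on 4-channel
# "RGB-D" pixels, trained with softmax cross-entropy by gradient descent.
rng = np.random.default_rng(3)
n_pix, n_ch, n_cls = 64, 4, 3
x = rng.standard_normal((n_pix, n_ch))   # sample RGB-D pixels
y = rng.integers(0, n_cls, size=n_pix)   # sample labels (ground truth)
w = np.zeros((n_ch, n_cls))              # "initial neural network" parameters

losses = []
for step in range(200):
    p = softmax(x @ w)                               # 1) training result
    loss = -np.log(p[np.arange(n_pix), y]).mean()    # 2)-3) compare with truth, loss
    losses.append(loss)
    grad = p.copy()
    grad[np.arange(n_pix), y] -= 1.0                 # d(loss)/d(logits)
    w -= 0.1 * (x.T @ grad) / n_pix                  # 4) adjust parameters
print(round(losses[0], 3), round(losses[-1], 3))     # initial vs final loss
```

A real implementation would stop on a convergence condition on Loss rather than a fixed step count, but the shape of the loop is the same.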
It should be noted that methods that use neural networks for some form of semantic segmentation of RGB-D images already exist in the prior art, their purpose mostly being to improve RGB-D semantic segmentation accuracy. For example, one prior art proceeds as follows: the number of input channels of a fully convolutional network is increased by one, the sample depth image is merged with the sample RGB image into a four-channel input, and the fully convolutional network is trained on this input to obtain a trained fully convolutional network; the trained network is then used to compute the semantic segmentation result for an input RGB image and its corresponding depth image. It can be understood that during neural network training the initial weights are very important. In this prior art, the initial weights are determined by loading weights pre-trained on a large-scale dataset (such as ImageNet). However, since these large datasets are all RGB datasets with three input channels, there are no specially suited weights for the depth image in the added fourth channel. Experimental data show that the fully convolutional network trained by this prior art yields no obvious improvement in RGB-D semantic segmentation accuracy.

In order to reuse the weights pre-trained on RGB datasets, other researchers have proposed another prior art based on the FCN. Its procedure is as follows: the depth image is converted, using the HHA depth-image encoding method proposed by S. Gupta, into three channels of horizontal disparity, height above ground, and angle of the surface normal. The depth image and the RGB image are then each input into a separate fully convolutional network, and feature fusion is performed at the end of the fully convolutional networks. The fusion mode adds the probability maps of the two networks, where each probability map is obtained by passing the feature maps output by the network through an activation function; the final semantic segmentation result is derived from the summed probability map. Experimental results show that this prior art brings some improvement in segmentation accuracy. However, it has two disadvantages. a) The HHA encoding method emphasizes only the complementary information between the channels of the data and ignores the independent components of each channel, which is a limitation; moreover, the spatial information represented by the three channels after HHA encoding is essentially different from the color and texture information represented by the three RGB channels. Furthermore, the fusion mode of adding feature maps in this prior art destroys the features that each fully convolutional network extracts under its own modality, that is, it destroys the respective features of the RGB image and the depth image. b) This prior art requires a large amount of preprocessing, namely the HHA encoding work, which consumes considerable computing resources and makes real-time semantic segmentation impossible.

In an embodiment of the present invention, the inventors constructed the neural network on the basis of the fully convolutional network through research. In implementation, the RGB image is input into the first branch network layer of the neural network's branch network group, it being understood that the RGB image is input in three-channel form; the depth image is input into the second branch network layer of the branch network group in single-channel form. Moreover, during the training of the neural network, corresponding weights are provided for the depth image. Therefore, the solutions provided by embodiments of the present invention can avoid the problem of the first prior art described above, in which RGB-D semantic segmentation accuracy is not obviously improved because no weights suited to the depth image are available.

Furthermore, the inventors also found in the course of their research that, as can be seen from the features extracted from RGB images and depth images, the two kinds of feature maps have an obvious complementary relationship; directly summing the feature maps destroys this complementary relationship and weakens the independent characteristics of the two modalities.

Therefore, in an embodiment of the present invention, the fusion mode of the feature fusion network layer may optionally be: concatenating, channel by channel, the feature maps output by the two branch network layers. This feature-stacking mode preserves the original feature information of both the RGB image and the depth image. It can thus be seen that, compared with the second prior art, the solutions provided by embodiments of the present invention need no HHA encoding work and use a different feature-map fusion mode, and can therefore solve the problems of the second prior art described above.
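The difference between the two fusion modes can be seen in a few lines of numpy: addition collapses the two modalities into one map, while channel concatenation keeps both sets of feature maps intact for later layers to mix as they learn. The shapes below are illustrative.

```python
import numpy as np

# Fusing by addition collapses the two modalities: from a + b alone the
# per-modality values cannot be recovered. Channel concatenation keeps both,
# and lets the next layer learn any mixing (including the sum) itself.
rng = np.random.default_rng(4)
f_rgb = rng.standard_normal((2, 2, 3))    # feature maps from the RGB branch
f_depth = rng.standard_normal((2, 2, 3))  # feature maps from the depth branch

added = f_rgb + f_depth                   # addition: still 3 channels, info merged
stacked = np.concatenate([f_rgb, f_depth], axis=-1)  # concatenation: 6 channels

print(added.shape, stacked.shape)         # (2, 2, 3) (2, 2, 6)
# The original maps are still present verbatim in the stacked version:
print(np.array_equal(stacked[..., :3], f_rgb),
      np.array_equal(stacked[..., 3:], f_depth))  # True True
```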
By determining the fusion position of the depth-image features within the neural network and the fusion mode, the method provided by embodiments of the present invention can use the spatial geometric information in the depth image to assist the RGB image in achieving end-to-end semantic segmentation and improving segmentation accuracy.

The beneficial effects of the solutions provided by embodiments of the present invention, compared with the first and second prior arts, are illustrated below with experimental data. Referring to Table 1, which compares the experimental results of an embodiment of the present invention with those of the two prior arts: it can be seen from Table 1 that the solution provided by an embodiment of the present invention is higher than both prior arts in pixel accuracy, mean pixel accuracy, and intersection-over-union. Those skilled in the art will appreciate that these three indicators can all be used to characterize segmentation accuracy. Therefore, compared with the two prior arts, the solution provided by embodiments of the present invention has higher segmentation accuracy.
Table 1
Optionally, in an embodiment of the present invention, each branch network layer includes three serially connected convolution modules.
According to the structure of the fully convolutional network, each convolution module includes two convolutional sublayers and one pooling sublayer.
In an embodiment of the present invention, the number of modules in each branch network layer was determined by the inventors through repeated experiments. It will be appreciated that the number of modules in each branch network layer determines the position of the feature fusion network layer in the neural network, that is, the fusion position of the feature maps of the RGB image and the depth image. Before constructing the neural network, the inventors determined the fusion position as follows: the RGB image was input to one fully convolutional network and the depth image to another, and the feature maps extracted at different positions of the two fully convolutional networks were compared. The comparison shows that, in the feature maps of the RGB image and of the depth image extracted by the third convolution module, the features of both modalities still belong to the category of low-level features such as corners, edges, and planes.
Starting from the feature maps of the RGB image extracted by the fifth convolution block, however, high-level abstract features autonomously extracted and combined by the network begin to appear, and the RGB feature maps at this stage are locally activated, indicating that the RGB feature extractor in the upper layers of the network is sensitive only to objects conforming to certain class-specific feature patterns and is no longer sensitive to global point, line, and surface features. In contrast, the feature maps of the depth image extracted by the fifth convolution block still show obvious contour boundary features, which still belong to the category of low-level features.
From this comparison it can be concluded that the upper layers of the fully convolutional network cannot effectively extract features of the depth image, and that the pooling layers reduce the resolution of the depth feature maps, further causing loss of detail in the depth features. Therefore, feature fusion should be performed in the lower layers of the network, so as to preserve the feature details of the depth image while avoiding unnecessary convolution operations.
Therefore, in an embodiment of the present invention, each branch network layer may include three serially connected convolution modules; that is, the position after the third convolution module of the fully convolutional network is determined as the fusion position.
Of course, in embodiments of the present invention, each branch network layer may also include two or four convolution modules, which is likewise reasonable; three convolution modules is the preferred choice after comparison, and for other numbers of convolution modules the effect of semantic segmentation on the RGB-D image, for example its accuracy, may decrease.
Of course, if the fully convolutional network in the embodiment of the present invention is replaced with another neural network, a middle position of that neural network may be chosen as the position of the feature fusion network layer; in this way, the structure of a neural network for semantic segmentation of RGB-D images can easily be determined accordingly.
Optionally, in an embodiment of the present invention, the input of each target convolution module in the first branch network layer includes: the output of the preceding convolution module of that target convolution module, and the output of the convolution module in the second branch network layer at the position corresponding to that preceding convolution module;
wherein the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first convolution module.
It will be appreciated that, compared with the structure of the neural network shown in Fig. 2, the neural network provided by the embodiment of the present invention can fuse the feature map output by each target convolution module in the second branch network layer with the feature map output by the target convolution module at the corresponding position in the first branch network layer, so that the features of the depth image cooperate with the features of the RGB image to the greatest extent, achieving effective semantic segmentation of the RGB-D image.
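The data flow of these cross-branch connections can be sketched as follows. This is a minimal illustration under stated assumptions: the convolution modules are replaced by identity placeholder functions, and combining the two inputs by concatenation is an assumption of the sketch (the patent text only says the target module's input includes both outputs):

```python
import numpy as np

def module(x):
    # Stand-in for a convolution module; a real module would convolve
    # and pool. Identity keeps the data flow visible.
    return x

def forward(rgb, depth, n_modules=3):
    r, d = module(rgb), module(depth)   # first modules take a single input
    for _ in range(1, n_modules):
        d = module(d)                   # depth branch advances independently
        # Each target module in the RGB branch receives the previous RGB
        # output together with the corresponding depth-branch output.
        r = module(np.concatenate([r, d], axis=-1))
    return r, d

rgb = np.random.rand(4, 4, 8)
depth = np.random.rand(4, 4, 8)
r_out, d_out = forward(rgb, depth)
print(r_out.shape, d_out.shape)  # (4, 4, 24) (4, 4, 8)
```

The RGB branch thus accumulates depth features at every stage, while the depth branch remains an independent feature extractor.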
Optionally, in an embodiment of the present invention, the feature fusion network layer and the output network layer are connected through a feature selection network layer;
It will be appreciated that, compared with the structure of the neural network shown in Fig. 2, the neural network in the embodiment of the present invention adds a feature selection network layer between the feature fusion network layer and the output network layer; that is, in the embodiment of the present invention, the inventors modified the structure of the existing fully convolutional network. The feature selection network layer includes a sequentially connected pooling sublayer, first fully connected sublayer, and second fully connected sublayer; the number of neurons of each fully connected sublayer equals the number of channels of the fused feature map.
The pooling sublayer is configured to: perform a max-pooling calculation on the fused feature map output by the feature fusion network layer to obtain a max-pooling result, and take this result as the first group of penalty coefficients.
It should be noted that the feature maps of the RGB image and the depth image each have size H × W × C, where C is the number of channels of each feature map. It will be appreciated that the fused feature map then has size H × W × 2C. The max-pooling calculation on the fused feature map output by the feature fusion network layer determines one maximum value over the H × W positions of each channel, so the resulting max-pooling result is a 1 × 1 × 2C array.
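This global max-pooling step can be sketched in a few lines (array sizes here are hypothetical):

```python
import numpy as np

# The fused feature map has size H x W x 2C; global max pooling takes
# the maximum over the H x W positions of each channel, yielding a
# 1 x 1 x 2C array that serves as the first group of penalty coefficients.
H, W, C = 4, 4, 8
fused = np.random.rand(H, W, 2 * C)

coeff1 = fused.max(axis=(0, 1)).reshape(1, 1, 2 * C)
print(coeff1.shape)  # (1, 1, 16)
```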
The first fully connected sublayer is configured to: compute the first group of penalty coefficients with the weights of the neurons of the first fully connected sublayer to obtain a first calculation result, take the first calculation result as the second group of penalty coefficients, and normalize the values of the second group of penalty coefficients with a sigmoid activation function to obtain the third group of penalty coefficients.
It will be appreciated that the first group of penalty coefficients consists of 2C values and that the first fully connected sublayer has 2C neurons; each neuron has a corresponding weight for each value of the first group of penalty coefficients, so the first calculation result also consists of 2C values. The computation of the first calculation result can be exemplified as: Y1 = X1·W1,1 + X2·W2,1 + ... + X2C·W2C,1, where X1 to X2C are the max-pooling results, Y1 is the first value of the first calculation result, and W1,1 to W2C,1 are the weights on the connections between the max-pooling results and the first neuron of the first fully connected sublayer. The other values of the first calculation result are computed similarly to the first value, and details are not repeated here.
After the first calculation result is obtained, it is taken as the second group of penalty coefficients, whose values are then normalized to between 0 and 1 by the sigmoid activation function to obtain the third group of penalty coefficients.
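The computation of one fully connected sublayer, through the sigmoid normalization, can be sketched as follows (the random weights are an assumption; in the patent they are learned during training):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

C2 = 16                      # 2C, the channel count of the fused feature map
x = np.random.rand(C2)       # first group of penalty coefficients (max-pool result)
W = np.random.randn(C2, C2)  # hypothetical weights, one column per neuron

# First calculation result / second group of penalty coefficients:
# y[0] = x[0]*W[0,0] + x[1]*W[1,0] + ... + x[C2-1]*W[C2-1,0]
y = x @ W

coeff3 = sigmoid(y)          # third group, each value normalized into (0, 1)
print(coeff3.shape)  # (16,)
```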
The second fully connected sublayer is configured to: compute the third group of penalty coefficients with the weights of the neurons of the second fully connected sublayer to obtain a second calculation result, take the second calculation result as the fourth group of penalty coefficients, normalize the values of the fourth group of penalty coefficients with a sigmoid activation function to obtain the fifth group of penalty coefficients, and weight the fused feature map with the fifth group of penalty coefficients to obtain the first feature map.
The calculation process of the second fully connected sublayer, up to the point where it weights the fused feature map with the fifth group of penalty coefficients, is similar to that of the first fully connected sublayer, and details are not repeated here.
It should be noted that, when the neural network is trained, the weights W between the neurons of the two fully connected sublayers and their corresponding groups of penalty coefficients are continuously updated through iterative learning. Changes in the weights affect the first and second calculation results, which in turn change the values computed by the sigmoid activation function, so the weighting coefficients applied to the fused feature map also change continuously, ultimately affecting the value of the final loss function Loss. Through learning, the weights W gradually acquire an autonomous selectivity such that the value of the loss after the weighting coefficients are applied to the feature map tends to be smaller. This shows that the weighting coefficients in fact suppress features (certain channels among the 2C) with small contributions, which can be understood as follows: since useless features are suppressed, the effect of useful features is indirectly amplified. The feature selection network layer thus functions as a reward-and-punishment mechanism: features that contribute strongly to accuracy are rewarded with high weight values, while features with low contribution to accuracy are punished and suppressed with low weight values. It can be understood that adding the feature selection network layer improves the semantic segmentation accuracy of the RGB-D image.
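Putting the three sublayers together, the entire feature selection network layer can be sketched as below. This is a minimal, inference-only illustration: the weight matrices are random stand-ins for learned parameters, and the array sizes are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_selection(fused, W1, W2):
    """Sketch of the feature selection network layer: global max pooling,
    two fully connected sublayers with sigmoid normalization, then
    channel-wise weighting of the fused feature map."""
    c1 = fused.max(axis=(0, 1))   # pooling sublayer -> 2C penalty coefficients
    c3 = sigmoid(c1 @ W1)         # first FC sublayer + sigmoid -> third group
    c5 = sigmoid(c3 @ W2)         # second FC sublayer + sigmoid -> fifth group
    return fused * c5             # weighted fused map = first feature map

H, W, C2 = 4, 4, 16               # C2 = 2C channels after fusion
fused = np.random.rand(H, W, C2)
rng = np.random.default_rng(0)    # random weights stand in for learned ones
out = feature_selection(fused,
                        rng.standard_normal((C2, C2)),
                        rng.standard_normal((C2, C2)))
print(out.shape)  # (4, 4, 16)
```

Because every coefficient in the fifth group lies in (0, 1), each channel of the fused map is attenuated in proportion to its learned contribution, which is exactly the reward-and-punishment behavior described above.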
Corresponding to the above method embodiment, an embodiment of the present invention further provides a semantic segmentation apparatus for RGB-D images. As shown in Fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain an RGB-D image to be semantically segmented, the RGB-D image including a three-channel RGB image and a depth image corresponding to the RGB image;
a computing module 402, configured to input the RGB image and the depth image included in the RGB-D image into a pre-trained neural network to obtain a target identification image corresponding to the RGB-D image; wherein the RGB image is input to one branch network layer in a branch network group of the neural network, and the depth image is input to the other branch network layer in the branch network group; wherein the neural network includes the sequentially connected branch network group, a feature fusion network layer, and an output network layer; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that performs feature extraction on its input image; the neural network is trained from sample RGB-D images and their corresponding sample identification images, each sample RGB-D image including a sample RGB image and a sample depth image, and the sample identification image corresponding to any sample RGB-D image being: the semantic segmentation result corresponding to the sample RGB image included in that sample RGB-D image.
Optionally, in an embodiment of the present invention, each branch network layer includes three serially connected convolution modules.
Optionally, in an embodiment of the present invention, the input of each target convolution module in the first branch network layer includes: the output of the preceding convolution module of that target convolution module, and the output of the convolution module in the second branch network layer at the position corresponding to that preceding convolution module;
wherein the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first convolution module.
Optionally, in an embodiment of the present invention, the feature fusion mode of the feature fusion network layer includes: concatenating, channel by channel, the feature maps output by the two branch network layers.
Optionally, in an embodiment of the present invention, the feature fusion network layer and the output network layer are connected through a feature selection network layer; wherein the feature selection network layer includes a sequentially connected pooling sublayer, first fully connected sublayer, and second fully connected sublayer;
the pooling sublayer is configured to: perform a max-pooling calculation on the fused feature map output by the feature fusion network layer to obtain a max-pooling result, and take this result as the first group of penalty coefficients;
the first fully connected sublayer is configured to: compute the first group of penalty coefficients with the weights of the neurons of the first fully connected sublayer to obtain a first calculation result, take the first calculation result as the second group of penalty coefficients, and normalize the values of the second group of penalty coefficients with a sigmoid activation function to obtain the third group of penalty coefficients;
the second fully connected sublayer is configured to: compute the third group of penalty coefficients with the weights of the neurons of the second fully connected sublayer to obtain a second calculation result, take the second calculation result as the fourth group of penalty coefficients, normalize the values of the fourth group of penalty coefficients with a sigmoid activation function to obtain the fifth group of penalty coefficients, and weight the fused feature map with the fifth group of penalty coefficients to obtain the first feature map.
In the scheme provided by the embodiment of the present invention, the RGB image and the depth image included in the RGB-D image are input into a pre-trained neural network to obtain the target identification image corresponding to the RGB-D image, the target identification image being: the semantic segmentation result corresponding to the RGB image included in the RGB-D image. The scheme provided by the embodiment of the present invention can therefore achieve effective semantic segmentation of RGB-D images using a neural network.
Corresponding to the above method embodiment, an embodiment of the present invention further provides an electronic device. As shown in Fig. 5, the device may include a processor 501 and a memory 502, wherein:
the memory 502 is configured to store a computer program;
the processor 501 is configured to, when executing the program stored in the memory 502, implement the steps of the semantic segmentation method for RGB-D images provided by the embodiment of the present invention.
The above memory may include a RAM (Random Access Memory) and may also include an NVM (Non-Volatile Memory), for example at least one disk memory. Optionally, the memory may also be at least one storage device located away from the above processor.
The above processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
With the above electronic device, the following can be achieved: the RGB image and the depth image included in the RGB-D image are input into a pre-trained neural network to obtain the target identification image corresponding to the RGB-D image, the target identification image being: the semantic segmentation result corresponding to the RGB image included in the RGB-D image. The scheme provided by the embodiment of the present invention can therefore achieve effective semantic segmentation of RGB-D images using a neural network.
In addition, corresponding to the semantic segmentation method for RGB-D images provided by the above embodiments, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the semantic segmentation method for RGB-D images provided by the embodiment of the present invention.
The above computer-readable storage medium stores an application program that, at runtime, executes the semantic segmentation method for RGB-D images provided by the embodiment of the present invention, and can therefore achieve: inputting the RGB image and the depth image included in the RGB-D image into a pre-trained neural network to obtain the target identification image corresponding to the RGB-D image, the target identification image being: the semantic segmentation result corresponding to the RGB image included in the RGB-D image. The scheme provided by the embodiment of the present invention can therefore achieve effective semantic segmentation of RGB-D images using a neural network.
As the electronic device and computer-readable storage medium embodiments involve substantially the same method content as the foregoing method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; identical or similar parts among the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
The foregoing describes merely alternative embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A semantic segmentation method for an RGB-D image, characterized by comprising:
obtaining an RGB-D image to be semantically segmented, the RGB-D image including a three-channel RGB image and a depth image corresponding to the RGB image;
inputting the RGB image and the depth image included in the RGB-D image into a pre-trained neural network to obtain a target identification image corresponding to the RGB-D image; wherein the RGB image is input to one branch network layer in a branch network group of the neural network, and the depth image is input to the other branch network layer in the branch network group; wherein the neural network includes the sequentially connected branch network group, a feature fusion network layer, and an output network layer; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that performs feature extraction on its input image; the neural network is trained from sample RGB-D images and their corresponding sample identification images, each sample RGB-D image including a sample RGB image and a sample depth image, and the sample identification image corresponding to any sample RGB-D image being: the semantic segmentation result corresponding to the sample RGB image included in that sample RGB-D image.
2. The method according to claim 1, wherein each branch network layer includes three serially connected convolution modules.
3. The method according to claim 2, wherein the input of each target convolution module in the first branch network layer includes: the output of the preceding convolution module of that target convolution module, and the output of the convolution module in the second branch network layer at the position corresponding to that preceding convolution module;
wherein the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first convolution module.
4. The method according to claim 1, wherein the feature fusion mode of the feature fusion network layer includes:
concatenating, channel by channel, the feature maps output by the two branch network layers.
5. The method according to claim 1, wherein the feature fusion network layer and the output network layer are connected through a feature selection network layer; wherein the feature selection network layer includes a sequentially connected pooling sublayer, first fully connected sublayer, and second fully connected sublayer;
the pooling sublayer is configured to: perform a max-pooling calculation on the fused feature map output by the feature fusion network layer to obtain a max-pooling result, and take this result as a first group of penalty coefficients;
the first fully connected sublayer is configured to: compute the first group of penalty coefficients with the weights of the neurons of the first fully connected sublayer to obtain a first calculation result, take the first calculation result as a second group of penalty coefficients, and normalize the values of the second group of penalty coefficients with a sigmoid activation function to obtain a third group of penalty coefficients;
the second fully connected sublayer is configured to: compute the third group of penalty coefficients with the weights of the neurons of the second fully connected sublayer to obtain a second calculation result, take the second calculation result as a fourth group of penalty coefficients, normalize the values of the fourth group of penalty coefficients with a sigmoid activation function to obtain a fifth group of penalty coefficients, and weight the fused feature map with the fifth group of penalty coefficients to obtain a first feature map.
6. A semantic segmentation apparatus for an RGB-D image, characterized by comprising:
an obtaining module, configured to obtain an RGB-D image to be semantically segmented, the RGB-D image including a three-channel RGB image and a depth image corresponding to the RGB image;
a computing module, configured to input the RGB image and the depth image included in the RGB-D image into a pre-trained neural network to obtain a target identification image corresponding to the RGB-D image; wherein the RGB image is input to one branch network layer in a branch network group of the neural network, and the depth image is input to the other branch network layer in the branch network group; wherein the neural network includes the sequentially connected branch network group, a feature fusion network layer, and an output network layer; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that performs feature extraction on its input image; the neural network is trained from sample RGB-D images and their corresponding sample identification images, each sample RGB-D image including a sample RGB image and a sample depth image, and the sample identification image corresponding to any sample RGB-D image being: the semantic segmentation result corresponding to the sample RGB image included in that sample RGB-D image.
7. The apparatus according to claim 6, wherein each branch network layer includes three serially connected convolution modules.
8. The apparatus according to claim 7, wherein the input of each target convolution module in the first branch network layer includes: the output of the preceding convolution module of that target convolution module, and the output of the convolution module in the second branch network layer at the position corresponding to that preceding convolution module;
wherein the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first convolution module.
9. The apparatus according to claim 6, wherein the feature fusion mode of the feature fusion network layer includes:
concatenating, channel by channel, the feature maps output by the two branch network layers.
10. The apparatus according to claim 6, wherein the feature fusion network layer and the output network layer are connected through a feature selection network layer; wherein the feature selection network layer includes a sequentially connected pooling sublayer, first fully connected sublayer, and second fully connected sublayer;
the pooling sublayer is configured to: perform a max-pooling calculation on the fused feature map output by the feature fusion network layer to obtain a max-pooling result, and take this result as a first group of penalty coefficients;
the first fully connected sublayer is configured to: compute the first group of penalty coefficients with the weights of the neurons of the first fully connected sublayer to obtain a first calculation result, take the first calculation result as a second group of penalty coefficients, and normalize the values of the second group of penalty coefficients with a sigmoid activation function to obtain a third group of penalty coefficients;
the second fully connected sublayer is configured to: compute the third group of penalty coefficients with the weights of the neurons of the second fully connected sublayer to obtain a second calculation result, take the second calculation result as a fourth group of penalty coefficients, normalize the values of the fourth group of penalty coefficients with a sigmoid activation function to obtain a fifth group of penalty coefficients, and weight the fused feature map with the fifth group of penalty coefficients to obtain a first feature map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811020264.8A CN109271990A (en) | 2018-09-03 | 2018-09-03 | A kind of semantic segmentation method and device for RGB-D image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109271990A true CN109271990A (en) | 2019-01-25 |
Family
ID=65187125
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811020264.8A Pending CN109271990A (en) | 2018-09-03 | 2018-09-03 | A kind of semantic segmentation method and device for RGB-D image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271990A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709568A (en) * | 2016-12-16 | 2017-05-24 | 北京工业大学 | RGB-D image object detection and semantic segmentation method based on deep convolution network |
US9953236B1 (en) * | 2017-03-10 | 2018-04-24 | TuSimple | System and method for semantic segmentation using dense upsampling convolution (DUC) |
CN108095683A (en) * | 2016-11-11 | 2018-06-01 | 北京羽医甘蓝信息技术有限公司 | The method and apparatus of processing eye fundus image based on deep learning |
2018-09-03: Application CN201811020264.8A filed in China; published as CN109271990A (status: Pending)
Non-Patent Citations (1)
Title |
---|
DAI JUTING et al.: "Scene semantic segmentation network based on color-depth images and deep learning", Science Technology and Engineering * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109903252B (en) * | 2019-02-27 | 2021-06-18 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN109903252A (en) * | 2019-02-27 | 2019-06-18 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN110111289A (en) * | 2019-04-28 | 2019-08-09 | 深圳市商汤科技有限公司 | A kind of image processing method and device |
CN110232418A (en) * | 2019-06-19 | 2019-09-13 | 深圳前海达闼云端智能科技有限公司 | Semantic recognition method, terminal and computer readable storage medium |
WO2020258297A1 (en) * | 2019-06-28 | 2020-12-30 | 深圳市大疆创新科技有限公司 | Image semantic segmentation method, movable platform, and storage medium |
CN110827305A (en) * | 2019-10-30 | 2020-02-21 | 中山大学 | Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment |
CN110827305B (en) * | 2019-10-30 | 2021-06-08 | 中山大学 | Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment |
CN110736465B (en) * | 2019-11-15 | 2021-01-08 | 北京云迹科技有限公司 | Navigation method, navigation device, robot and computer readable storage medium |
CN110736465A (en) * | 2019-11-15 | 2020-01-31 | 北京云迹科技有限公司 | Navigation method, navigation device, robot and computer readable storage medium |
CN111161193A (en) * | 2019-12-31 | 2020-05-15 | 深圳度影医疗科技有限公司 | Ultrasonic image quality optimization method, storage medium and terminal equipment |
CN111161193B (en) * | 2019-12-31 | 2023-09-05 | 深圳度影医疗科技有限公司 | Ultrasonic image quality optimization method, storage medium and terminal equipment |
CN111242132A (en) * | 2020-01-07 | 2020-06-05 | 广州赛特智能科技有限公司 | Outdoor road scene semantic segmentation method and device, electronic equipment and storage medium |
CN111476840A (en) * | 2020-05-14 | 2020-07-31 | 阿丘机器人科技(苏州)有限公司 | Target positioning method, device, equipment and computer readable storage medium |
CN111476840B (en) * | 2020-05-14 | 2023-08-22 | 阿丘机器人科技(苏州)有限公司 | Target positioning method, device, equipment and computer readable storage medium |
CN111738265A (en) * | 2020-05-20 | 2020-10-02 | 山东大学 | Semantic segmentation method, system, medium, and electronic device for RGB-D image |
CN112418233A (en) * | 2020-11-18 | 2021-02-26 | 北京字跳网络技术有限公司 | Image processing method, image processing device, readable medium and electronic equipment |
CN112861911A (en) * | 2021-01-10 | 2021-05-28 | 西北工业大学 | RGB-D semantic segmentation method based on depth feature selection fusion |
CN112861911B (en) * | 2021-01-10 | 2024-05-28 | 西北工业大学 | RGB-D semantic segmentation method based on depth feature selection fusion |
CN113221659B (en) * | 2021-04-13 | 2022-12-23 | 天津大学 | Double-light vehicle detection method and device based on uncertain sensing network |
CN113221659A (en) * | 2021-04-13 | 2021-08-06 | 天津大学 | Double-light vehicle detection method and device based on uncertain sensing network |
CN113393421A (en) * | 2021-05-08 | 2021-09-14 | 深圳市识农智能科技有限公司 | Fruit evaluation method and device and inspection equipment |
CN112949662A (en) * | 2021-05-13 | 2021-06-11 | 北京市商汤科技开发有限公司 | Image processing method and device, computer equipment and storage medium |
CN113971760A (en) * | 2021-10-26 | 2022-01-25 | 山东建筑大学 | High-quality quasi-dense complementary feature extraction method based on deep learning |
CN113971760B (en) * | 2021-10-26 | 2024-02-06 | 山东建筑大学 | High-quality quasi-dense complementary feature extraction method based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109271990A (en) | A kind of semantic segmentation method and device for RGB-D image | |
CN109993220B (en) | Multi-source remote sensing image classification method based on double-path attention fusion neural network | |
Chen et al. | Underwater image enhancement based on deep learning and image formation model | |
CN109800736B (en) | Road extraction method based on remote sensing image and deep learning | |
CN109584248B (en) | Infrared target instance segmentation method based on feature fusion and dense connection network | |
CN110378381B (en) | Object detection method, device and computer storage medium | |
CN108764063B (en) | Remote sensing image time-sensitive target identification system and method based on characteristic pyramid | |
CN112750140B (en) | Information mining-based disguised target image segmentation method | |
CN113065558A (en) | Lightweight small target detection method combined with attention mechanism | |
CN106897673B (en) | Retinex algorithm and convolutional neural network-based pedestrian re-identification method | |
Costea et al. | Creating roadmaps in aerial images with generative adversarial networks and smoothing-based optimization | |
CN110619638A (en) | Multi-mode fusion significance detection method based on convolution block attention module | |
CN108710863A (en) | Unmanned plane Scene Semantics dividing method based on deep learning and system | |
CN107451616A (en) | Multi-spectral remote sensing image terrain classification method based on the semi-supervised transfer learning of depth | |
CN108052881A (en) | The method and apparatus of multiclass entity object in a kind of real-time detection construction site image | |
CN105354581B (en) | The color image feature extracting method of Fusion of Color feature and convolutional neural networks | |
CN107506761A (en) | Brain image dividing method and system based on notable inquiry learning convolutional neural networks | |
CN104866868A (en) | Metal coin identification method based on deep neural network and apparatus thereof | |
Doi et al. | The effect of focal loss in semantic segmentation of high resolution aerial image | |
CN113743417B (en) | Semantic segmentation method and semantic segmentation device | |
CN110705566B (en) | Multi-mode fusion significance detection method based on spatial pyramid pool | |
CN110059728A (en) | RGB-D image vision conspicuousness detection method based on attention model | |
CN107341440A (en) | Indoor RGB D scene image recognition methods based on multitask measurement Multiple Kernel Learning | |
CN114926511A (en) | High-resolution remote sensing image change detection method based on self-supervision learning | |
CN108133235A (en) | A kind of pedestrian detection method based on neural network Analysis On Multi-scale Features figure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190125