CN109271990A - Semantic segmentation method and device for RGB-D images - Google Patents
- Publication number
- CN109271990A (application CN201811020264.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- rgb
- group
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
Embodiments of the present invention provide a semantic segmentation method and device for RGB-D images. The method includes: obtaining an RGB-D image to be semantically segmented; and inputting the RGB image and the depth image contained in the RGB-D image into a pre-trained neural network to obtain a target label image corresponding to the RGB-D image. The RGB image is input into one branch network layer of the neural network's branch network group, and the depth image is input into the other branch network layer of that group. The neural network includes a branch network group, a feature fusion network layer, and an output network layer connected in sequence, and is trained on sample RGB-D images and their corresponding sample label images, where the sample label image corresponding to any sample RGB-D image is the semantic segmentation result of the sample RGB image contained in that sample RGB-D image. With the embodiments of the present invention, effective semantic segmentation of RGB-D images can be achieved using a neural network.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a semantic segmentation method and device for RGB-D images.
Background art

In recent years, SLAM (Simultaneous Localization And Mapping) systems have developed rapidly; they are mainly used in fields such as autonomous robot localization and navigation. Specifically, a SLAM system uses RGB-D images to perform processing such as feature extraction and matching, in order to build a three-dimensional map and localize in real time. A so-called RGB-D image consists of two images: an RGB image (an image with three RGB channels) and a depth image. The depth image is similar to a grayscale image, except that each of its pixel values is the actual distance from the sensor to the object; moreover, the pixels of the RGB image and the depth image usually correspond one-to-one.

To improve the usability of the constructed three-dimensional map, researchers have proposed the concept of a semantic map based on semantic segmentation technology. Semantic segmentation refers to performing pixel-level segmentation of the content of an image and identifying the category of each object; semantic mapping thus segments the constructed three-dimensional point cloud and identifies the objects in the environment.

Given the rapid development of deep learning and the good results it has achieved in semantic segmentation in recent years, researchers working on SLAM systems hope to use deep-learning neural networks to perform semantic segmentation of RGB-D images.

Therefore, how to use a neural network to perform effective semantic segmentation of RGB-D images is a problem that urgently needs to be solved.
Summary of the invention
The purpose of embodiments of the present invention is to provide a semantic segmentation method and device for RGB-D images, so as to achieve effective semantic segmentation of RGB-D images using a neural network. The specific technical solutions are as follows.

In a first aspect, an embodiment of the present invention provides a semantic segmentation method for RGB-D images, the method including:

obtaining an RGB-D image to be semantically segmented, the RGB-D image including an RGB (three-channel) image and a depth image corresponding to the RGB image;

inputting the RGB image and the depth image contained in the RGB-D image into a pre-trained neural network to obtain a target label image corresponding to the RGB-D image, where the RGB image is input into one branch network layer of the neural network's branch network group and the depth image is input into the other branch network layer of that group. The neural network includes a branch network group, a feature fusion network layer, and an output network layer connected in sequence; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that extracts features from its input image. The neural network is trained on sample RGB-D images and their corresponding sample label images; each sample RGB-D image includes a sample RGB image and a sample depth image, and the sample label image corresponding to any sample RGB-D image is the semantic segmentation result of the sample RGB image contained in that sample RGB-D image.
Optionally, each branch network layer includes three convolution modules connected in series.
Optionally, the input of each target convolution module in the first branch network layer includes the output of the preceding convolution module and the output of the convolution module at the corresponding position in the second branch network layer;

here, the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first one.
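As a rough illustration of this optional cross-branch wiring (not the patented implementation), the numpy sketch below runs three stand-in convolution modules per branch; from the second module on, the RGB branch also receives the output of the position-matched depth-branch module. The 1x1-conv stand-ins, the element-wise sum used to combine the two inputs, and all shapes are assumptions made for the sketch.

```python
import numpy as np

def conv_module(x, w):
    """Stand-in for a convolution module: a 1x1 convolution implemented as a
    per-pixel linear map (weights `w`), followed by ReLU. A real module would
    use spatial kernels; this keeps the data flow easy to follow."""
    return np.maximum(x @ w, 0.0)

def two_branch_forward(rgb, depth, rgb_ws, depth_ws):
    """Run three conv modules per branch. From the second module on, the RGB
    branch receives the element-wise sum of its previous output and the
    position-matched depth-branch output (one plausible reading of 'input
    includes both outputs'; the patent does not fix the combination op)."""
    d = depth
    d_outs = []
    for w in depth_ws:                     # depth branch runs independently
        d = conv_module(d, w)
        d_outs.append(d)
    r = conv_module(rgb, rgb_ws[0])        # first RGB module: RGB input only
    for i, w in enumerate(rgb_ws[1:], start=1):
        r = conv_module(r + d_outs[i - 1], w)  # inject depth-branch features
    return r, d

# Toy run: 4 "pixels", 8 channels, channel count preserved by each module.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 8))
depth = rng.standard_normal((4, 8))
ws = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]
r_out, d_out = two_branch_forward(rgb, depth, ws, ws)
print(r_out.shape, d_out.shape)  # (4, 8) (4, 8)
```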
Optionally, the feature fusion performed by the feature fusion network layer includes concatenating, channel by channel, the feature maps output by the two branch network layers.
Optionally, the feature fusion network layer is connected to the output network layer through a feature selection network layer, where the feature selection network layer includes a pooling sublayer, a first fully connected sublayer, and a second fully connected sublayer connected in sequence.

The pooling sublayer performs max-pooling on the fused feature maps output by the feature fusion network layer, and takes the max-pooling result as a first group of penalty coefficients.

The first fully connected sublayer multiplies the first group of penalty coefficients by the weights of its neurons to obtain a first result, takes this result as a second group of penalty coefficients, and normalizes their values with a sigmoid activation function to obtain a third group of penalty coefficients.

The second fully connected sublayer multiplies the third group of penalty coefficients by the weights of its neurons to obtain a second result, takes this result as a fourth group of penalty coefficients, normalizes their values with a sigmoid activation function to obtain a fifth group of penalty coefficients, and weights the fused feature maps with the fifth group of penalty coefficients to obtain a first feature map.
In a second aspect, an embodiment of the present invention provides a semantic segmentation device for RGB-D images, the device including:

an obtaining module, configured to obtain an RGB-D image to be semantically segmented, the RGB-D image including an RGB (three-channel) image and a depth image corresponding to the RGB image;

a computing module, configured to input the RGB image and the depth image contained in the RGB-D image into a pre-trained neural network to obtain a target label image corresponding to the RGB-D image, where the RGB image is input into one branch network layer of the neural network's branch network group and the depth image is input into the other branch network layer of that group. The neural network includes a branch network group, a feature fusion network layer, and an output network layer connected in sequence; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that extracts features from its input image. The neural network is trained on sample RGB-D images and their corresponding sample label images; each sample RGB-D image includes a sample RGB image and a sample depth image, and the sample label image corresponding to any sample RGB-D image is the semantic segmentation result of the sample RGB image contained in that sample RGB-D image.
Optionally, each branch network layer includes three convolution modules connected in series.
Optionally, the input of each target convolution module in the first branch network layer includes the output of the preceding convolution module and the output of the convolution module at the corresponding position in the second branch network layer;

here, the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first one.
Optionally, the feature fusion performed by the feature fusion network layer includes concatenating, channel by channel, the feature maps output by the two branch network layers.
Optionally, the feature fusion network layer is connected to the output network layer through a feature selection network layer, where the feature selection network layer includes a pooling sublayer, a first fully connected sublayer, and a second fully connected sublayer connected in sequence.

The pooling sublayer performs max-pooling on the fused feature maps output by the feature fusion network layer, and takes the max-pooling result as a first group of penalty coefficients.

The first fully connected sublayer multiplies the first group of penalty coefficients by the weights of its neurons to obtain a first result, takes this result as a second group of penalty coefficients, and normalizes their values with a sigmoid activation function to obtain a third group of penalty coefficients.

The second fully connected sublayer multiplies the third group of penalty coefficients by the weights of its neurons to obtain a second result, takes this result as a fourth group of penalty coefficients, normalizes their values with a sigmoid activation function to obtain a fifth group of penalty coefficients, and weights the fused feature maps with the fifth group of penalty coefficients to obtain a first feature map.
In the solutions provided by embodiments of the present invention, the RGB image and the depth image contained in an RGB-D image are input into a pre-trained neural network to obtain the target label image corresponding to the RGB-D image, the target label image being the semantic segmentation result of the RGB image contained in the RGB-D image. Therefore, the solutions provided by embodiments of the present invention achieve effective semantic segmentation of RGB-D images using a neural network.

Of course, implementing any product or method of the present invention does not necessarily require achieving all of the above advantages simultaneously.
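The optional feature selection network layer described in the summary above amounts to a channel re-weighting block (in the spirit of squeeze-and-excitation). A minimal numpy sketch follows; the plain matrices standing in for the two fully connected sublayers, and all shapes, are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_selection(fused, w1, w2):
    """Channel re-weighting over a fused feature map of shape (H, W, C).
    The two weight matrices stand in for the fully connected sublayers."""
    coeff1 = fused.max(axis=(0, 1))  # max pooling -> 1st group of penalty coefficients, (C,)
    coeff3 = sigmoid(coeff1 @ w1)    # 1st FC sublayer + sigmoid -> 3rd group
    coeff5 = sigmoid(coeff3 @ w2)    # 2nd FC sublayer + sigmoid -> 5th group, each in (0, 1)
    return fused * coeff5            # weight the fused maps -> "first feature map"

rng = np.random.default_rng(1)
fused = rng.standard_normal((6, 6, 16))   # toy fused feature map
w1 = rng.standard_normal((16, 16)) * 0.1  # illustrative FC weights
w2 = rng.standard_normal((16, 16)) * 0.1
out = feature_selection(fused, w1, w2)
print(out.shape)  # (6, 6, 16)
```

Because the fifth group of coefficients lies in (0, 1), each channel of the fused map is scaled down in proportion to its learned importance; the spatial layout is untouched.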
Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below.

Fig. 1 is a schematic flowchart of a semantic segmentation method for RGB-D images provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of the neural network provided by an embodiment of the present invention;

Fig. 3(a) is the grayscale rendering of an RGB image; Fig. 3(b) is the depth image corresponding to the RGB image rendered in Fig. 3(a); Fig. 3(c) is the grayscale rendering of the target label image obtained by semantically segmenting the RGB image of Fig. 3(a) together with the depth image of Fig. 3(b);

Fig. 4 is a schematic structural diagram of a semantic segmentation device for RGB-D images provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of embodiments

The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings.

To achieve effective semantic segmentation of RGB-D images, embodiments of the present invention provide a semantic segmentation method, device, electronic equipment, and storage medium for RGB-D images.

It should be noted that the execution subject of the semantic segmentation method for RGB-D images provided by embodiments of the present invention may be a semantic segmentation device for RGB-D images, which may run on electronic equipment. The device may be a plug-in in an image processing tool, or a program independent of any image processing tool; it is of course not limited to these.

A semantic segmentation method for RGB-D images provided by embodiments of the present invention is introduced first.

As shown in Fig. 1, a semantic segmentation method for RGB-D images provided by an embodiment of the present invention may include the following steps:
S101: obtain an RGB-D image to be semantically segmented.

Here, the RGB-D image includes an RGB image and a depth image corresponding to the RGB image.

In an embodiment of the present invention, the RGB-D image may be captured by an RGB-D camera. For example, the RGB-D image captured in real time may be obtained from the shooting module of the RGB-D camera, or an RGB-D image previously captured by an RGB-D camera and stored may be obtained from a preset location; of course, the way of obtaining the RGB-D image to be semantically segmented is not limited to these. A so-called RGB-D camera is a camera in the prior art that can capture an RGB image and a depth image simultaneously.
S102: input the RGB image and the depth image contained in the RGB-D image into a pre-trained neural network to obtain the target label image corresponding to the RGB-D image.

Referring to Fig. 2, which is a schematic structural diagram of the neural network provided by an embodiment of the present invention: functionally, the neural network includes a branch network group, a feature fusion network layer 120, and an output network layer 130 connected in sequence. The branch network group includes two branch network layers 110 as parallel branches; each branch network layer 110 is a feature extraction layer that extracts features from its input image. The feature fusion network layer 120 fuses the feature maps output by the branch network layers 110. The output network layer 130 is a collective name for the remaining structure of the neural network apart from the branch network group and the feature fusion network layer 120; it may be understood that the output network layer 130 can include multiple network sublayers.

It should be noted that either branch network layer, connected in sequence with the subsequent feature fusion network layer and output network layer, may constitute an FCN (Fully Convolutional Network). The fully convolutional network was proposed in 2014 by Long et al. of the University of California, Berkeley, and is a network structure widely used in the semantic segmentation field today.
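The overall data flow just described (two parallel branch layers, feature fusion by channel concatenation, then an output layer) can be sketched as follows. This is a toy numpy illustration with 1x1-conv stand-ins and arbitrary shapes, not the patented FCN.

```python
import numpy as np

def branch(x, ws):
    """One branch network layer: a stack of stand-in conv modules
    (1x1 convolution as a per-pixel linear map, followed by ReLU)."""
    for w in ws:
        x = np.maximum(x @ w, 0.0)
    return x

def segment(rgb, depth, rgb_ws, depth_ws, w_out):
    """Two parallel branch layers, channel-concatenation fusion, then an
    output layer producing per-pixel class labels. All shapes and the
    1x1-conv stand-ins are illustrative assumptions."""
    f_rgb = branch(rgb, rgb_ws)      # (H, W, C) features from the RGB branch
    f_depth = branch(depth, depth_ws)  # (H, W, C) features from the depth branch
    fused = np.concatenate([f_rgb, f_depth], axis=-1)  # (H, W, 2C)
    scores = fused @ w_out           # (H, W, n_classes) per-pixel class scores
    return scores.argmax(axis=-1)    # per-pixel class labels

rng = np.random.default_rng(2)
H, W, C, K = 5, 5, 8, 4
rgb = rng.standard_normal((H, W, 3))    # three-channel RGB input
depth = rng.standard_normal((H, W, 1))  # single-channel depth input
rgb_ws = [rng.standard_normal((3, C)) * 0.1]
depth_ws = [rng.standard_normal((1, C)) * 0.1]
w_out = rng.standard_normal((2 * C, K)) * 0.1
labels = segment(rgb, depth, rgb_ws, depth_ws, w_out)
print(labels.shape)  # (5, 5)
```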
The neural network is trained on sample RGB-D images and their corresponding sample label images; each sample RGB-D image includes a sample RGB image and a sample depth image. For clarity of layout, the training process of the neural network is introduced further below.

As shown in Fig. 2, in an embodiment of the present invention, inputting the RGB image and the depth image contained in the RGB-D image into the pre-trained neural network may include: inputting the RGB image into one branch network layer of the neural network's branch network group, and inputting the depth image into the other branch network layer of that group.

According to the working principle of the neural network, in an embodiment of the present invention, the target label image output by the neural network is the semantic segmentation result corresponding to the RGB image contained in the RGB-D image.

In an embodiment of the present invention, the semantic segmentation may consist of labeling objects of different categories in the RGB image with different values, each value being displayed with a corresponding color. Thus, objects of different categories in the target label image can have different colors.

To aid understanding, the visual differences between an RGB image, a depth image, and a target label image are illustrated with reference to the drawings. Fig. 3(a) shows the grayscale rendering of an RGB image, Fig. 3(b) is the corresponding depth image, and Fig. 3(c) is the grayscale rendering of the target label image obtained by applying the method provided by an embodiment of the present invention to the RGB image of Fig. 3(a) and the depth image of Fig. 3(b). It can be understood that in the original target label image, before grayscale conversion, objects of different categories have different colors, which intuitively conveys the semantic segmentation result.
In the solutions provided by embodiments of the present invention, the RGB image and the depth image contained in the RGB-D image are input into a pre-trained neural network to obtain the target label image corresponding to the RGB-D image, the target label image being the semantic segmentation result of the RGB image contained in the RGB-D image. Therefore, the solutions provided by embodiments of the present invention achieve effective semantic segmentation of RGB-D images using a neural network.
The training process of the neural network in an embodiment of the present invention is briefly introduced below. The training process may include the following steps.

First step: determine an initial neural network.

The initial neural network includes a branch network group, a feature fusion network layer, and an output network layer connected in sequence; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that extracts features from its input image.

It should be noted that, in an embodiment of the present invention, the initial weights of the initial neural network may be existing pre-trained weights. Moreover, as a preferred mode, corresponding weights may be trained in advance for the branch network layer that receives the depth image and used as that branch network layer's initial weights; this makes the initial weights of the network better targeted and improves the training effect of the initial neural network.

Second step: obtain sample RGB-D images and the sample label images corresponding to the sample RGB-D images.

In an embodiment of the present invention, multiple sample RGB-D images and their corresponding sample label images may be obtained, so as to improve the training effect of the initial neural network in the subsequent training. For example, in one training pass, 8 groups of sample RGB-D images and the 8 corresponding groups of sample label images may be obtained.

Each sample RGB-D image includes a sample RGB image and a sample depth image, and the sample label image corresponding to any sample RGB-D image is the semantic segmentation result of the sample RGB image contained in that sample RGB-D image. It should be noted that the sample label images may be produced by manual annotation, though they are of course not limited to this.

Third step: train the initial neural network with the sample RGB-D images and their corresponding sample label images to obtain the neural network.

In this step, the sample RGB image is first input into one branch network layer of the initial neural network's branch network group, the sample depth image is input into the other branch network layer of that group, and the corresponding sample label image serves as the ground truth. The following steps are then performed:

1) Train the initial neural network on the sample RGB image and the sample depth image to obtain a training result.

2) Compare the training result with the corresponding ground truth to obtain an output result.

3) Calculate the value of the loss function Loss of the initial neural network from the output result.

4) Adjust the parameters of the initial neural network according to the value of Loss, and repeat steps 1)-3) until the value of Loss satisfies a convergence condition, that is, until the value of Loss reaches a minimum. At this point, the training of the initial neural network is complete, and the trained neural network is obtained.
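Steps 1)-4) form a standard supervised training loop. The toy numpy sketch below mirrors only the loop itself, substituting a per-pixel linear classifier with softmax cross-entropy for the two-branch network; all sizes, the learning rate, and the random data are arbitrary assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy stand-in for steps 1)-4): a per-pixel linear classifier on 4-channel
# "RGB-D" pixels, trained with softmax cross-entropy by gradient descent.
rng = np.random.default_rng(3)
n_pix, n_ch, n_cls = 64, 4, 3
x = rng.standard_normal((n_pix, n_ch))   # sample RGB-D pixels
y = rng.integers(0, n_cls, size=n_pix)   # sample labels (ground truth)
w = np.zeros((n_ch, n_cls))              # "initial neural network" parameters

losses = []
for step in range(200):
    p = softmax(x @ w)                               # 1) training result
    loss = -np.log(p[np.arange(n_pix), y]).mean()    # 2)-3) compare with truth, loss
    losses.append(loss)
    grad = p.copy()
    grad[np.arange(n_pix), y] -= 1.0                 # d(loss)/d(logits)
    w -= 0.1 * (x.T @ grad) / n_pix                  # 4) adjust parameters
print(round(losses[0], 3), round(losses[-1], 3))     # initial vs final loss
```

A real implementation would stop on a convergence condition on Loss rather than a fixed step count, but the shape of the loop is the same.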
It should be noted that methods that use neural networks for some form of semantic segmentation of RGB-D images already exist in the prior art, their purpose mostly being to improve RGB-D semantic segmentation accuracy. For example, one prior art proceeds as follows: the number of input channels of a fully convolutional network is increased by one, the sample depth image is merged with the sample RGB image into a four-channel input, and the fully convolutional network is trained on this input to obtain a trained fully convolutional network; the trained network is then used to compute the semantic segmentation result for an input RGB image and its corresponding depth image. It can be understood that during neural network training the initial weights are very important. In this prior art, the initial weights are determined by loading weights pre-trained on a large-scale dataset (such as ImageNet). However, since these large datasets are all RGB datasets with three input channels, there are no specially suited weights for the depth image in the added fourth channel. Experimental data show that the fully convolutional network trained by this prior art yields no obvious improvement in RGB-D semantic segmentation accuracy.

In order to reuse the weights pre-trained on RGB datasets, other researchers have proposed another prior art based on the FCN. Its procedure is as follows: the depth image is converted, using the HHA depth-image encoding method proposed by S. Gupta, into three channels of horizontal disparity, height above ground, and angle of the surface normal. The depth image and the RGB image are then each input into a separate fully convolutional network, and feature fusion is performed at the end of the fully convolutional networks. The fusion mode adds the probability maps of the two networks, where each probability map is obtained by passing the feature maps output by the network through an activation function; the final semantic segmentation result is derived from the summed probability map. Experimental results show that this prior art brings some improvement in segmentation accuracy. However, it has two disadvantages. a) The HHA encoding method emphasizes only the complementary information between the channels of the data and ignores the independent components of each channel, which is a limitation; moreover, the spatial information represented by the three channels after HHA encoding is essentially different from the color and texture information represented by the three RGB channels. Furthermore, the fusion mode of adding feature maps in this prior art destroys the features that each fully convolutional network extracts under its own modality, that is, it destroys the respective features of the RGB image and the depth image. b) This prior art requires a large amount of preprocessing, namely the HHA encoding work, which consumes considerable computing resources and makes real-time semantic segmentation impossible.

In an embodiment of the present invention, the inventors constructed the neural network on the basis of the fully convolutional network through research. In implementation, the RGB image is input into the first branch network layer of the neural network's branch network group, it being understood that the RGB image is input in three-channel form; the depth image is input into the second branch network layer of the branch network group in single-channel form. Moreover, during the training of the neural network, corresponding weights are provided for the depth image. Therefore, the solutions provided by embodiments of the present invention can avoid the problem of the first prior art described above, in which RGB-D semantic segmentation accuracy is not obviously improved because no weights suited to the depth image are available.

Furthermore, the inventors also found in the course of their research that, as can be seen from the features extracted from RGB images and depth images, the two kinds of feature maps have an obvious complementary relationship; directly summing the feature maps destroys this complementary relationship and weakens the independent characteristics of the two modalities.

Therefore, in an embodiment of the present invention, the fusion mode of the feature fusion network layer may optionally be: concatenating, channel by channel, the feature maps output by the two branch network layers. This feature-stacking mode preserves the original feature information of both the RGB image and the depth image. It can thus be seen that, compared with the second prior art, the solutions provided by embodiments of the present invention need no HHA encoding work and use a different feature-map fusion mode, and can therefore solve the problems of the second prior art described above.
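The difference between the two fusion modes can be seen in a few lines of numpy: addition collapses the two modalities into one map, while channel concatenation keeps both sets of feature maps intact for later layers to mix as they learn. The shapes below are illustrative.

```python
import numpy as np

# Fusing by addition collapses the two modalities: from a + b alone the
# per-modality values cannot be recovered. Channel concatenation keeps both,
# and lets the next layer learn any mixing (including the sum) itself.
rng = np.random.default_rng(4)
f_rgb = rng.standard_normal((2, 2, 3))    # feature maps from the RGB branch
f_depth = rng.standard_normal((2, 2, 3))  # feature maps from the depth branch

added = f_rgb + f_depth                   # addition: still 3 channels, info merged
stacked = np.concatenate([f_rgb, f_depth], axis=-1)  # concatenation: 6 channels

print(added.shape, stacked.shape)         # (2, 2, 3) (2, 2, 6)
# The original maps are still present verbatim in the stacked version:
print(np.array_equal(stacked[..., :3], f_rgb),
      np.array_equal(stacked[..., 3:], f_depth))  # True True
```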
By determining the fusion position of the depth-image features within the neural network and the fusion mode, the method provided by embodiments of the present invention can use the spatial geometric information in the depth image to assist the RGB image in achieving end-to-end semantic segmentation and improving segmentation accuracy.

The beneficial effects of the solutions provided by embodiments of the present invention, compared with the first and second prior arts, are illustrated below with experimental data. Referring to Table 1, which compares the experimental results of an embodiment of the present invention with those of the two prior arts: it can be seen from Table 1 that the solution provided by an embodiment of the present invention is higher than both prior arts in pixel accuracy, mean pixel accuracy, and intersection-over-union. Those skilled in the art will appreciate that these three indicators can all be used to characterize segmentation accuracy. Therefore, compared with the two prior arts, the solution provided by embodiments of the present invention has higher segmentation accuracy.
Table 1
Optionally, in an embodiment of the present invention, each branch network layer includes three serially connected convolution modules.
According to the structure of the fully convolutional network, each convolution module includes two convolutional sublayers and one pooling sublayer.
In an embodiment of the present invention, the number of modules in each branch network layer was determined by the inventors through repeated experiments. It will be appreciated that the number of modules in each branch network layer determines the position of the feature fusion network layer in the neural network, that is, the fusion position of the feature maps of the RGB image and the depth image. Before constructing the neural network, the inventors determined the fusion position as follows: the RGB image was input to one fully convolutional network and the depth image to another, and the feature maps extracted at different positions of the two fully convolutional networks were compared. The comparison shows that, in the feature maps of the RGB image and of the depth image extracted by the third convolution module, the features of both modalities still belong to the category of low-level features such as corners, edges, and planes.
Starting from the feature maps of the RGB image extracted by the fifth convolution block, however, high-level abstract features autonomously extracted and combined by the network begin to appear, and the RGB feature maps at this stage are locally activated, indicating that the RGB feature extractor in the upper layers of the network is sensitive only to objects conforming to certain class-specific feature patterns and is no longer sensitive to global point, line, and surface features. In contrast, the feature maps of the depth image extracted by the fifth convolution block still show obvious contour boundary features, which still belong to the category of low-level features.
From this comparison it can be concluded that the upper layers of the fully convolutional network cannot effectively extract features of the depth image, and that the pooling layers reduce the resolution of the depth feature maps, further causing loss of detail in the depth features. Therefore, feature fusion should be performed in the lower layers of the network, so as to preserve the feature details of the depth image while avoiding unnecessary convolution operations.
Therefore, in an embodiment of the present invention, each branch network layer may include three serially connected convolution modules; that is, the position after the third convolution module of the fully convolutional network is determined as the fusion position.
Of course, in embodiments of the present invention, each branch network layer may also include two or four convolution modules, which is likewise reasonable; three convolution modules is the preferred choice after comparison, and for other numbers of convolution modules the effect of semantic segmentation on the RGB-D image, for example its accuracy, may decrease.
Of course, if the fully convolutional network in the embodiment of the present invention is replaced with another neural network, a middle position of that neural network may be chosen as the position of the feature fusion network layer; in this way, the structure of a neural network for semantic segmentation of RGB-D images can easily be determined accordingly.
Optionally, in an embodiment of the present invention, the input of each target convolution module in the first branch network layer includes: the output of the preceding convolution module of that target convolution module, and the output of the convolution module in the second branch network layer at the position corresponding to that preceding convolution module;
wherein the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first convolution module.
It will be appreciated that, compared with the structure of the neural network shown in Fig. 2, the neural network provided by the embodiment of the present invention can fuse the feature map output by each target convolution module in the second branch network layer with the feature map output by the target convolution module at the corresponding position in the first branch network layer, so that the features of the depth image cooperate with the features of the RGB image to the greatest extent, achieving effective semantic segmentation of the RGB-D image.
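The data flow of these cross-branch connections can be sketched as follows. This is a minimal illustration under stated assumptions: the convolution modules are replaced by identity placeholder functions, and combining the two inputs by concatenation is an assumption of the sketch (the patent text only says the target module's input includes both outputs):

```python
import numpy as np

def module(x):
    # Stand-in for a convolution module; a real module would convolve
    # and pool. Identity keeps the data flow visible.
    return x

def forward(rgb, depth, n_modules=3):
    r, d = module(rgb), module(depth)   # first modules take a single input
    for _ in range(1, n_modules):
        d = module(d)                   # depth branch advances independently
        # Each target module in the RGB branch receives the previous RGB
        # output together with the corresponding depth-branch output.
        r = module(np.concatenate([r, d], axis=-1))
    return r, d

rgb = np.random.rand(4, 4, 8)
depth = np.random.rand(4, 4, 8)
r_out, d_out = forward(rgb, depth)
print(r_out.shape, d_out.shape)  # (4, 4, 24) (4, 4, 8)
```

The RGB branch thus accumulates depth features at every stage, while the depth branch remains an independent feature extractor.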
Optionally, in an embodiment of the present invention, the feature fusion network layer and the output network layer are connected through a feature selection network layer;
It will be appreciated that, compared with the structure of the neural network shown in Fig. 2, the neural network in the embodiment of the present invention adds a feature selection network layer between the feature fusion network layer and the output network layer; that is, in the embodiment of the present invention, the inventors modified the structure of the existing fully convolutional network. The feature selection network layer includes a sequentially connected pooling sublayer, first fully connected sublayer, and second fully connected sublayer; the number of neurons of each fully connected sublayer equals the number of channels of the fused feature map.
The pooling sublayer is configured to: perform a max-pooling calculation on the fused feature map output by the feature fusion network layer to obtain a max-pooling result, and take this result as the first group of penalty coefficients.
It should be noted that the feature maps of the RGB image and the depth image each have size H × W × C, where C is the number of channels of each feature map. It will be appreciated that the fused feature map then has size H × W × 2C. The max-pooling calculation on the fused feature map output by the feature fusion network layer determines one maximum value over the H × W positions of each channel, so the resulting max-pooling result is a 1 × 1 × 2C array.
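This global max-pooling step can be sketched in a few lines (array sizes here are hypothetical):

```python
import numpy as np

# The fused feature map has size H x W x 2C; global max pooling takes
# the maximum over the H x W positions of each channel, yielding a
# 1 x 1 x 2C array that serves as the first group of penalty coefficients.
H, W, C = 4, 4, 8
fused = np.random.rand(H, W, 2 * C)

coeff1 = fused.max(axis=(0, 1)).reshape(1, 1, 2 * C)
print(coeff1.shape)  # (1, 1, 16)
```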
The first fully connected sublayer is configured to: compute the first group of penalty coefficients with the weights of the neurons of the first fully connected sublayer to obtain a first calculation result, take the first calculation result as the second group of penalty coefficients, and normalize the values of the second group of penalty coefficients with a sigmoid activation function to obtain the third group of penalty coefficients.
It will be appreciated that the first group of penalty coefficients consists of 2C values and that the first fully connected sublayer has 2C neurons; each neuron has a corresponding weight for each value of the first group of penalty coefficients, so the first calculation result also consists of 2C values. The computation of the first calculation result can be exemplified as: Y1 = X1·W1,1 + X2·W2,1 + ... + X2C·W2C,1, where X1 to X2C are the max-pooling results, Y1 is the first value of the first calculation result, and W1,1 to W2C,1 are the weights on the connections between the max-pooling results and the first neuron of the first fully connected sublayer. The other values of the first calculation result are computed similarly to the first value, and details are not repeated here.
After the first calculation result is obtained, it is taken as the second group of penalty coefficients, whose values are then normalized to between 0 and 1 by the sigmoid activation function to obtain the third group of penalty coefficients.
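The computation of one fully connected sublayer, through the sigmoid normalization, can be sketched as follows (the random weights are an assumption; in the patent they are learned during training):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

C2 = 16                      # 2C, the channel count of the fused feature map
x = np.random.rand(C2)       # first group of penalty coefficients (max-pool result)
W = np.random.randn(C2, C2)  # hypothetical weights, one column per neuron

# First calculation result / second group of penalty coefficients:
# y[0] = x[0]*W[0,0] + x[1]*W[1,0] + ... + x[C2-1]*W[C2-1,0]
y = x @ W

coeff3 = sigmoid(y)          # third group, each value normalized into (0, 1)
print(coeff3.shape)  # (16,)
```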
The second fully connected sublayer is configured to: compute the third group of penalty coefficients with the weights of the neurons of the second fully connected sublayer to obtain a second calculation result, take the second calculation result as the fourth group of penalty coefficients, normalize the values of the fourth group of penalty coefficients with a sigmoid activation function to obtain the fifth group of penalty coefficients, and weight the fused feature map with the fifth group of penalty coefficients to obtain the first feature map.
The calculation process of the second fully connected sublayer, up to the point where it weights the fused feature map with the fifth group of penalty coefficients, is similar to that of the first fully connected sublayer, and details are not repeated here.
It should be noted that, when the neural network is trained, the weights W between the neurons of the two fully connected sublayers and their corresponding groups of penalty coefficients are continuously updated through iterative learning. Changes in the weights affect the first and second calculation results, which in turn change the values computed by the sigmoid activation function, so the weighting coefficients applied to the fused feature map also change continuously, ultimately affecting the value of the final loss function Loss. Through learning, the weights W gradually acquire an autonomous selectivity such that the value of the loss after the weighting coefficients are applied to the feature map tends to be smaller. This shows that the weighting coefficients in fact suppress features (certain channels among the 2C) with small contributions, which can be understood as follows: since useless features are suppressed, the effect of useful features is indirectly amplified. The feature selection network layer thus functions as a reward-and-punishment mechanism: features that contribute strongly to accuracy are rewarded with high weight values, while features with low contribution to accuracy are punished and suppressed with low weight values. It can be understood that adding the feature selection network layer improves the semantic segmentation accuracy of the RGB-D image.
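Putting the three sublayers together, the entire feature selection network layer can be sketched as below. This is a minimal, inference-only illustration: the weight matrices are random stand-ins for learned parameters, and the array sizes are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_selection(fused, W1, W2):
    """Sketch of the feature selection network layer: global max pooling,
    two fully connected sublayers with sigmoid normalization, then
    channel-wise weighting of the fused feature map."""
    c1 = fused.max(axis=(0, 1))   # pooling sublayer -> 2C penalty coefficients
    c3 = sigmoid(c1 @ W1)         # first FC sublayer + sigmoid -> third group
    c5 = sigmoid(c3 @ W2)         # second FC sublayer + sigmoid -> fifth group
    return fused * c5             # weighted fused map = first feature map

H, W, C2 = 4, 4, 16               # C2 = 2C channels after fusion
fused = np.random.rand(H, W, C2)
rng = np.random.default_rng(0)    # random weights stand in for learned ones
out = feature_selection(fused,
                        rng.standard_normal((C2, C2)),
                        rng.standard_normal((C2, C2)))
print(out.shape)  # (4, 4, 16)
```

Because every coefficient in the fifth group lies in (0, 1), each channel of the fused map is attenuated in proportion to its learned contribution, which is exactly the reward-and-punishment behavior described above.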
Corresponding to the above method embodiment, an embodiment of the present invention further provides a semantic segmentation apparatus for RGB-D images. As shown in Fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain an RGB-D image to be semantically segmented, the RGB-D image including a three-channel RGB image and a depth image corresponding to the RGB image;
a computing module 402, configured to input the RGB image and the depth image included in the RGB-D image into a pre-trained neural network to obtain a target identification image corresponding to the RGB-D image; wherein the RGB image is input to one branch network layer in a branch network group of the neural network, and the depth image is input to the other branch network layer in the branch network group; wherein the neural network includes the sequentially connected branch network group, a feature fusion network layer, and an output network layer; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that performs feature extraction on its input image; the neural network is trained from sample RGB-D images and their corresponding sample identification images, each sample RGB-D image including a sample RGB image and a sample depth image, and the sample identification image corresponding to any sample RGB-D image being: the semantic segmentation result corresponding to the sample RGB image included in that sample RGB-D image.
Optionally, in an embodiment of the present invention, each branch network layer includes three serially connected convolution modules.
Optionally, in an embodiment of the present invention, the input of each target convolution module in the first branch network layer includes: the output of the preceding convolution module of that target convolution module, and the output of the convolution module in the second branch network layer at the position corresponding to that preceding convolution module;
wherein the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first convolution module.
Optionally, in an embodiment of the present invention, the feature fusion mode of the feature fusion network layer includes: concatenating, channel by channel, the feature maps output by the two branch network layers.
Optionally, in an embodiment of the present invention, the feature fusion network layer and the output network layer are connected through a feature selection network layer; wherein the feature selection network layer includes a sequentially connected pooling sublayer, first fully connected sublayer, and second fully connected sublayer;
the pooling sublayer is configured to: perform a max-pooling calculation on the fused feature map output by the feature fusion network layer to obtain a max-pooling result, and take this result as the first group of penalty coefficients;
the first fully connected sublayer is configured to: compute the first group of penalty coefficients with the weights of the neurons of the first fully connected sublayer to obtain a first calculation result, take the first calculation result as the second group of penalty coefficients, and normalize the values of the second group of penalty coefficients with a sigmoid activation function to obtain the third group of penalty coefficients;
the second fully connected sublayer is configured to: compute the third group of penalty coefficients with the weights of the neurons of the second fully connected sublayer to obtain a second calculation result, take the second calculation result as the fourth group of penalty coefficients, normalize the values of the fourth group of penalty coefficients with a sigmoid activation function to obtain the fifth group of penalty coefficients, and weight the fused feature map with the fifth group of penalty coefficients to obtain the first feature map.
In the scheme provided by the embodiment of the present invention, the RGB image and the depth image included in the RGB-D image are input into a pre-trained neural network to obtain the target identification image corresponding to the RGB-D image, the target identification image being: the semantic segmentation result corresponding to the RGB image included in the RGB-D image. The scheme provided by the embodiment of the present invention can therefore achieve effective semantic segmentation of RGB-D images using a neural network.
Corresponding to the above method embodiment, an embodiment of the present invention further provides an electronic device. As shown in Fig. 5, the device may include a processor 501 and a memory 502, wherein:
the memory 502 is configured to store a computer program;
the processor 501 is configured to, when executing the program stored in the memory 502, implement the steps of the semantic segmentation method for RGB-D images provided by the embodiment of the present invention.
The above memory may include a RAM (Random Access Memory) and may also include an NVM (Non-Volatile Memory), for example at least one disk memory. Optionally, the memory may also be at least one storage device located away from the above processor.
The above processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
With the above electronic device, the following can be achieved: the RGB image and the depth image included in the RGB-D image are input into a pre-trained neural network to obtain the target identification image corresponding to the RGB-D image, the target identification image being: the semantic segmentation result corresponding to the RGB image included in the RGB-D image. The scheme provided by the embodiment of the present invention can therefore achieve effective semantic segmentation of RGB-D images using a neural network.
In addition, corresponding to the semantic segmentation method for RGB-D images provided by the above embodiments, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the semantic segmentation method for RGB-D images provided by the embodiment of the present invention.
The above computer-readable storage medium stores an application program that, at runtime, executes the semantic segmentation method for RGB-D images provided by the embodiment of the present invention, and can therefore achieve: inputting the RGB image and the depth image included in the RGB-D image into a pre-trained neural network to obtain the target identification image corresponding to the RGB-D image, the target identification image being: the semantic segmentation result corresponding to the RGB image included in the RGB-D image. The scheme provided by the embodiment of the present invention can therefore achieve effective semantic segmentation of RGB-D images using a neural network.
As the electronic device and computer-readable storage medium embodiments involve substantially the same method content as the foregoing method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; identical or similar parts among the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
The foregoing describes merely alternative embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A semantic segmentation method for an RGB-D image, characterized by comprising:
obtaining an RGB-D image to be semantically segmented, the RGB-D image including a three-channel RGB image and a depth image corresponding to the RGB image;
inputting the RGB image and the depth image included in the RGB-D image into a pre-trained neural network to obtain a target identification image corresponding to the RGB-D image; wherein the RGB image is input to one branch network layer in a branch network group of the neural network, and the depth image is input to the other branch network layer in the branch network group; wherein the neural network includes the sequentially connected branch network group, a feature fusion network layer, and an output network layer; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that performs feature extraction on its input image; the neural network is trained from sample RGB-D images and their corresponding sample identification images, each sample RGB-D image including a sample RGB image and a sample depth image, and the sample identification image corresponding to any sample RGB-D image being: the semantic segmentation result corresponding to the sample RGB image included in that sample RGB-D image.
2. The method according to claim 1, wherein each branch network layer includes three serially connected convolution modules.
3. The method according to claim 2, wherein the input of each target convolution module in the first branch network layer includes: the output of the preceding convolution module of that target convolution module, and the output of the convolution module in the second branch network layer at the position corresponding to that preceding convolution module;
wherein the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first convolution module.
4. The method according to claim 1, wherein the feature fusion mode of the feature fusion network layer includes:
concatenating, channel by channel, the feature maps output by the two branch network layers.
5. The method according to claim 1, wherein the feature fusion network layer and the output network layer are connected through a feature selection network layer; wherein the feature selection network layer includes a sequentially connected pooling sublayer, first fully connected sublayer, and second fully connected sublayer;
the pooling sublayer is configured to: perform a max-pooling calculation on the fused feature map output by the feature fusion network layer to obtain a max-pooling result, and take this result as a first group of penalty coefficients;
the first fully connected sublayer is configured to: compute the first group of penalty coefficients with the weights of the neurons of the first fully connected sublayer to obtain a first calculation result, take the first calculation result as a second group of penalty coefficients, and normalize the values of the second group of penalty coefficients with a sigmoid activation function to obtain a third group of penalty coefficients;
the second fully connected sublayer is configured to: compute the third group of penalty coefficients with the weights of the neurons of the second fully connected sublayer to obtain a second calculation result, take the second calculation result as a fourth group of penalty coefficients, normalize the values of the fourth group of penalty coefficients with a sigmoid activation function to obtain a fifth group of penalty coefficients, and weight the fused feature map with the fifth group of penalty coefficients to obtain a first feature map.
6. A semantic segmentation apparatus for an RGB-D image, characterized by comprising:
an obtaining module, configured to obtain an RGB-D image to be semantically segmented, the RGB-D image including a three-channel RGB image and a depth image corresponding to the RGB image;
a computing module, configured to input the RGB image and the depth image included in the RGB-D image into a pre-trained neural network to obtain a target identification image corresponding to the RGB-D image; wherein the RGB image is input to one branch network layer in a branch network group of the neural network, and the depth image is input to the other branch network layer in the branch network group; wherein the neural network includes the sequentially connected branch network group, a feature fusion network layer, and an output network layer; the branch network group includes two branch network layers as parallel branches, each branch network layer being a feature extraction layer that performs feature extraction on its input image; the neural network is trained from sample RGB-D images and their corresponding sample identification images, each sample RGB-D image including a sample RGB image and a sample depth image, and the sample identification image corresponding to any sample RGB-D image being: the semantic segmentation result corresponding to the sample RGB image included in that sample RGB-D image.
7. The apparatus according to claim 6, wherein each branch network layer includes three serially connected convolution modules.
8. The apparatus according to claim 7, wherein the input of each target convolution module in the first branch network layer includes: the output of the preceding convolution module of that target convolution module, and the output of the convolution module in the second branch network layer at the position corresponding to that preceding convolution module;
wherein the first branch network layer is the branch network layer receiving the RGB image, the second branch network layer is the branch network layer receiving the depth image, and a target convolution module is any convolution module in the first branch network layer other than the first convolution module.
9. The apparatus according to claim 6, wherein the feature fusion mode of the feature fusion network layer includes:
concatenating, channel by channel, the feature maps output by the two branch network layers.
10. The apparatus according to claim 6, wherein the feature fusion network layer and the output network layer are connected through a feature selection network layer; wherein the feature selection network layer includes a sequentially connected pooling sublayer, first fully connected sublayer, and second fully connected sublayer;
the pooling sublayer is configured to: perform a max-pooling calculation on the fused feature map output by the feature fusion network layer to obtain a max-pooling result, and take this result as a first group of penalty coefficients;
the first fully connected sublayer is configured to: compute the first group of penalty coefficients with the weights of the neurons of the first fully connected sublayer to obtain a first calculation result, take the first calculation result as a second group of penalty coefficients, and normalize the values of the second group of penalty coefficients with a sigmoid activation function to obtain a third group of penalty coefficients;
the second fully connected sublayer is configured to: compute the third group of penalty coefficients with the weights of the neurons of the second fully connected sublayer to obtain a second calculation result, take the second calculation result as a fourth group of penalty coefficients, normalize the values of the fourth group of penalty coefficients with a sigmoid activation function to obtain a fifth group of penalty coefficients, and weight the fused feature map with the fifth group of penalty coefficients to obtain a first feature map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811020264.8A CN109271990A (en) | 2018-09-03 | 2018-09-03 | A kind of semantic segmentation method and device for RGB-D image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109271990A true CN109271990A (en) | 2019-01-25 |
Family
ID=65187125
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811020264.8A Pending CN109271990A (en) | 2018-09-03 | 2018-09-03 | A kind of semantic segmentation method and device for RGB-D image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271990A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709568A (en) * | 2016-12-16 | 2017-05-24 | 北京工业大学 | RGB-D image object detection and semantic segmentation method based on deep convolution network |
US9953236B1 (en) * | 2017-03-10 | 2018-04-24 | TuSimple | System and method for semantic segmentation using dense upsampling convolution (DUC) |
CN108095683A (en) * | 2016-11-11 | 2018-06-01 | 北京羽医甘蓝信息技术有限公司 | The method and apparatus of processing eye fundus image based on deep learning |
2018-09-03: Application CN201811020264.8A filed in China; published as CN109271990A (status: Pending)
Non-Patent Citations (1)
Title |
---|
DAI JUTING et al.: "Scene semantic segmentation network based on color-depth images and deep learning", Science Technology and Engineering * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109903252B (en) * | 2019-02-27 | 2021-06-18 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN109903252A (en) * | 2019-02-27 | 2019-06-18 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN110111289A (en) * | 2019-04-28 | 2019-08-09 | 深圳市商汤科技有限公司 | A kind of image processing method and device |
CN110232418A (en) * | 2019-06-19 | 2019-09-13 | 深圳前海达闼云端智能科技有限公司 | Semantic recognition method, terminal and computer readable storage medium |
WO2020258297A1 (en) * | 2019-06-28 | 2020-12-30 | 深圳市大疆创新科技有限公司 | Image semantic segmentation method, movable platform, and storage medium |
CN110827305A (en) * | 2019-10-30 | 2020-02-21 | 中山大学 | Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment |
CN110827305B (en) * | 2019-10-30 | 2021-06-08 | 中山大学 | Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment |
CN110736465B (en) * | 2019-11-15 | 2021-01-08 | 北京云迹科技有限公司 | Navigation method, navigation device, robot and computer readable storage medium |
CN110736465A (en) * | 2019-11-15 | 2020-01-31 | 北京云迹科技有限公司 | Navigation method, navigation device, robot and computer readable storage medium |
CN111161193A (en) * | 2019-12-31 | 2020-05-15 | 深圳度影医疗科技有限公司 | Ultrasonic image quality optimization method, storage medium and terminal equipment |
CN111161193B (en) * | 2019-12-31 | 2023-09-05 | 深圳度影医疗科技有限公司 | Ultrasonic image quality optimization method, storage medium and terminal equipment |
CN111242132A (en) * | 2020-01-07 | 2020-06-05 | 广州赛特智能科技有限公司 | Outdoor road scene semantic segmentation method and device, electronic equipment and storage medium |
CN111476840A (en) * | 2020-05-14 | 2020-07-31 | 阿丘机器人科技(苏州)有限公司 | Target positioning method, device, equipment and computer readable storage medium |
CN111476840B (en) * | 2020-05-14 | 2023-08-22 | 阿丘机器人科技(苏州)有限公司 | Target positioning method, device, equipment and computer readable storage medium |
CN111738265A (en) * | 2020-05-20 | 2020-10-02 | 山东大学 | Semantic segmentation method, system, medium, and electronic device for RGB-D image |
CN112418233A (en) * | 2020-11-18 | 2021-02-26 | 北京字跳网络技术有限公司 | Image processing method, image processing device, readable medium and electronic equipment |
CN112861911A (en) * | 2021-01-10 | 2021-05-28 | 西北工业大学 | RGB-D semantic segmentation method based on depth feature selection fusion |
CN112861911B (en) * | 2021-01-10 | 2024-05-28 | 西北工业大学 | RGB-D semantic segmentation method based on depth feature selection fusion |
CN113221659B (en) * | 2021-04-13 | 2022-12-23 | 天津大学 | Double-light vehicle detection method and device based on uncertain sensing network |
CN113221659A (en) * | 2021-04-13 | 2021-08-06 | 天津大学 | Double-light vehicle detection method and device based on uncertain sensing network |
CN113393421A (en) * | 2021-05-08 | 2021-09-14 | 深圳市识农智能科技有限公司 | Fruit evaluation method and device and inspection equipment |
CN112949662A (en) * | 2021-05-13 | 2021-06-11 | 北京市商汤科技开发有限公司 | Image processing method and device, computer equipment and storage medium |
CN113971760A (en) * | 2021-10-26 | 2022-01-25 | 山东建筑大学 | High-quality quasi-dense complementary feature extraction method based on deep learning |
CN113971760B (en) * | 2021-10-26 | 2024-02-06 | 山东建筑大学 | High-quality quasi-dense complementary feature extraction method based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109271990A (en) | A kind of semantic segmentation method and device for RGB-D image | |
CN109993220B (en) | Multi-source remote sensing image classification method based on double-path attention fusion neural network | |
Chen et al. | Underwater image enhancement based on deep learning and image formation model | |
CN109800736B (en) | Road extraction method based on remote sensing image and deep learning | |
CN109584248B (en) | Infrared target instance segmentation method based on feature fusion and dense connection network | |
CN110378381B (en) | Object detection method, device and computer storage medium | |
CN108764063B (en) | Remote sensing image time-sensitive target identification system and method based on characteristic pyramid | |
CN112750140B (en) | Information mining-based disguised target image segmentation method | |
CN113065558A (en) | Lightweight small target detection method combined with attention mechanism | |
CN106897673B (en) | Retinex algorithm and convolutional neural network-based pedestrian re-identification method | |
Costea et al. | Creating roadmaps in aerial images with generative adversarial networks and smoothing-based optimization | |
CN110619638A (en) | Multi-mode fusion significance detection method based on convolution block attention module | |
CN108710863A (en) | Unmanned plane Scene Semantics dividing method based on deep learning and system | |
CN107451616A (en) | Multi-spectral remote sensing image terrain classification method based on the semi-supervised transfer learning of depth | |
CN108052881A (en) | The method and apparatus of multiclass entity object in a kind of real-time detection construction site image | |
CN105354581B (en) | The color image feature extracting method of Fusion of Color feature and convolutional neural networks | |
CN107506761A (en) | Brain image dividing method and system based on notable inquiry learning convolutional neural networks | |
CN104866868A (en) | Metal coin identification method based on deep neural network and apparatus thereof | |
Doi et al. | The effect of focal loss in semantic segmentation of high resolution aerial image | |
CN113743417B (en) | Semantic segmentation method and semantic segmentation device | |
CN110705566B (en) | Multi-mode fusion significance detection method based on spatial pyramid pool | |
CN110059728A (en) | RGB-D image vision conspicuousness detection method based on attention model | |
CN107341440A (en) | Indoor RGB D scene image recognition methods based on multitask measurement Multiple Kernel Learning | |
CN114926511A (en) | High-resolution remote sensing image change detection method based on self-supervision learning | |
CN108133235A (en) | A kind of pedestrian detection method based on neural network Analysis On Multi-scale Features figure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190125