CN108596102B - RGB-D-based indoor scene object segmentation classifier construction method - Google Patents

RGB-D-based indoor scene object segmentation classifier construction method

Publication number
CN108596102B
CN108596102B (application number CN201810382977.2A)
Authority
CN
China
Prior art keywords
rgb
network
depth
pixel
picture
Prior art date
Legal status
Active
Application number
CN201810382977.2A
Other languages
Chinese (zh)
Other versions
CN108596102A (en)
Inventor
沈旭昆
周锋
迟小羽
Current Assignee
Qingdao Research Institute Of Beihang University
Original Assignee
Qingdao Research Institute Of Beihang University
Priority date
Filing date
Publication date
Application filed by Qingdao Research Institute Of Beihang University filed Critical Qingdao Research Institute Of Beihang University
Priority to CN201810382977.2A
Publication of CN108596102A
Application granted
Publication of CN108596102B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/35 - Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/36 - Indoor scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/56 - Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an RGB-D-based indoor scene object segmentation classifier construction method. The method comprises the steps of collecting RGB modal pictures of an indoor scene and depth modal pictures at the same pose; extracting RGB modal picture features and depth modal picture features; carrying out semantic analysis on the collected RGB modal pictures and depth modal pictures and adding a corresponding category label to each pixel in the collected pictures; and connecting the extracted RGB features and depth features together and inputting them into a full convolution network embedded with an RPN module to carry out object segmentation of the indoor scene. The method can be applied to the understanding of indoor scenes; by effectively segmenting the currently captured scene, it can effectively help indoor robot navigation and indoor real-time reconstruction.

Description

RGB-D-based indoor scene object segmentation classifier construction method
Technical Field
The invention belongs to the technical field of computer applications, and particularly relates to a construction method of an indoor scene object segmentation classifier.
Background
Perception and understanding of scenes, and particularly of indoor scenes, is challenging. In outdoor scenes, labeling tasks such as segmentation and detection can be handled well using RGB images alone; for indoor scenes, however, which are complex and changeable, it is difficult to understand the scene using only RGB images. The complexity, variability, and occlusion of indoor scenes are among the research focuses in the field of scene cognition and understanding, and this problem has long been an urgent one for virtual reality, artificial intelligence, intelligent robots, and machine vision.
Object detection is a prerequisite for many advanced visual tasks, including virtual reality and augmented reality, as well as intelligent video surveillance, content-based image retrieval, and robot navigation. A large number of excellent object detection algorithms have been proposed. For example, one algorithm framework based on AdaBoost uses Haar-like wavelet features for classification and then uses a sliding-window method to locate the object to be detected in the image; this was the first target detection algorithm that could achieve real-time performance with good detection accuracy. As another example, the HOG feature combined with a support vector machine (SVM) classifier has been used for pedestrian detection. The multi-scale deformable part model (DPM) algorithm, the most influential method before deep neural networks became popular, consists of a root filter and several part filters, with the deformation between parts derived through hidden variables; it inherits the advantages of the HOG-plus-SVM classifier, but it is difficult to use because it locates objects with a sliding window and additionally requires manually specifying the number of parts and the relations between them. The best target detection algorithms and results before 2012 were based on DPM or on improved versions of DPM. After 2012, as the AlexNet deep neural network greatly surpassed the classic work of the time on the image recognition task, deep neural networks gradually came to dominate the various fields of computer vision and computer graphics. The pioneering application of deep learning to target detection is the RCNN network. This algorithm has a disadvantage similar to the classical DPM algorithm: it is very slow because regions need to be detected repeatedly. To overcome the defects of RCNN, subsequent researchers integrated the region feature extraction process into the network and proposed the RPN, which removes the need to repeatedly extract features from the picture, saves a large amount of time, and overcomes the drawback that RCNN extracts regions with the Selective Search algorithm at great computational cost.
Scene semantic segmentation is a pixel-level picture classification task: given a picture, the segmentation algorithm outputs a picture of pixel-by-pixel labels with the same size as the input. That is, each sample can be expressed as x_i ∈ R^(w×h×d) (the corresponding formula appears as an equation image in the original text), where x_i represents the ith picture, w×h represents the picture size, and d is the dimension of a picture pixel point; through the picture semantic segmentation algorithm, the output is a label map of size w×h (also given as an equation image), where c ∈ {1, 2, 3, ..., C} indicates that each pixel belongs to one of the C classes. Since the pixels in a picture are correlated, the relationships between the variables need to be considered when classifying the pixels. The sketch below illustrates this formulation.
Although object detection and semantic segmentation of scenes have each been solved fairly well, existing work mainly addresses either the problem of object positioning in indoor scenes or the problem of object semantic segmentation in indoor scenes. The former provides coarse-scale semantic information about the indoor scene: the approximate position of an object is known, but not which pixels belong to it. The latter provides finer-scale indoor semantic information, assigning a semantic label to each pixel in the indoor scene, but objects of the same class are not distinguished from each other. Thus, these two separate tasks have not yet been well integrated and cannot provide more robust information for the semantic understanding of indoor scenes.
Disclosure of Invention
To solve the problem that a single network cannot simultaneously provide both the positions and the pixel labels of indoor objects, the invention provides a refined object identification method, namely a construction method of an RGB-D-based indoor scene pixel-by-pixel object segmentation classifier. The scheme is as follows:
An RGB-D-based indoor scene object segmentation classifier construction method comprises the following steps:
Step one, acquiring an RGB (red, green and blue) modal picture and a depth modal picture of an indoor scene;
Step two, counting the types of objects contained in the RGB modal picture and the depth modal picture, and then carrying out category marking on each pixel in the picture;
Step three, inputting the collected RGB modal picture into an RGB sub-network of a full convolution network embedded with an RPN module, simultaneously inputting the collected depth modal picture into a depth sub-network of the same full convolution network, extracting the features of the RGB modal picture and the depth modal picture in parallel, and respectively obtaining the feature f_rgb output by the RGB sub-network and the feature f_depth output by the depth sub-network;
Step four, defining an RGB-D loss function, and connecting the RGB sub-network and the depth sub-network together to construct an RGB-D multi-modal network structure for training an RGB-D-based indoor scene object segmentation pixel-by-pixel classifier_rgbd;
Step five, in the network inference stage, inputting test-sample RGB-D data into the trained RGB-D multi-modal network according to data modality: the RGB sub-network extracts f_rgb from the input RGB modal picture, the depth sub-network extracts f_depth from the input depth modal picture, the two extracted modal features are spliced together and input into the pixel-by-pixel classifier_rgbd, and detection and segmentation of indoor scene objects are carried out.
Further, in the fourth step, the RGB-D loss function is defined as follows (the loss function and several of the symbols below appear only as equation images in the original text and are indicated here by placeholders):

[equation image: overall RGB-D loss function]

wherein,

[equation image: loss term of the RGB modality]

[equation image: loss term of the depth modality]

As above, λ and γ are balance factors for balancing the proportions of the RGB modal data and the depth modal data when calculating the loss; α and β are balance factors for balancing the proportions of the final calculated loss contributed by the Reg network and the Seg network; N represents the number of anchor points; when j is an anchor point, [equation image] holds, otherwise [equation image] holds; I_i represents the ith RGB training data and D_i the ith depth training data; the label l_i ∈ {0, 1, ..., C} gives one label value for each pixel in the given training data; [equation image] is the bounding-box label corresponding to the ith training data; [equation image] represents the pixel classification result obtained by computing the input ith RGB training data with the weight w and the corresponding parameter θ, where k denotes the kth pixel; [equation image] denotes the weight that maps the ith RGB training data from the classification layer to the label domain; and [equation image] represents the feature expression extracted from the layer preceding the classification layer based on the parameter [equation image].
Further, in the third step, the RGB sub-network includes two parts: the network responsible for detecting objects in the indoor scene is defined as the Reg network, and the network responsible for semantic segmentation of the indoor scene is defined as the Seg network. The process of extracting features from the input RGB modal data using the RGB sub-network is as follows. The picture is input into the Reg network, which extracts the position of each object in the indoor scene image input into the network:
C(3,64,1)-C(3,128,1)-C(3,256,1)-C(3,256,1)-C(3,512,1)-C(3,512,1)-RPN(9)-F(4096)-F(4096)
At the same time, the RGB image input into the network is input into the Seg network, which extracts the category of each pixel in the indoor scene image input into the network:
C(3,64,1)-C(3,64,1)-C(3,64,1)-C(3,128,1)-C(3,128,1)-C(3,256,1)-C(3,256,1)-C(3,256,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-ASPP(6,12,18,24)
where C denotes a convolution operation in the network; in C(k, n, s), k represents the kernel size of the convolution kernel, n represents the number of convolution kernels, and s represents the stride of the convolution kernel in the convolution operation; ASPP(d_1, d_2, d_3, d_4) denotes an atrous spatial pyramid pooling structure built from hole (dilated) convolutions, where d_i indicates the dilation amplitude of the hole convolution kernels.
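To make the ASPP(6, 12, 18, 24) notation above concrete, the following is a minimal PyTorch sketch of an atrous spatial pyramid pooling head built from hole (dilated) convolutions with those four dilation rates; the channel counts, class count, and the summation fusion are assumptions (a DeepLab-style choice) rather than the exact configuration of the invention.

import torch
import torch.nn as nn

class ASPP(nn.Module):
    # Hole-convolution spatial pyramid: parallel 3x3 convolutions with
    # different dilation rates applied to the same feature map, then summed.
    def __init__(self, in_ch, num_classes, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, num_classes, kernel_size=3,
                      padding=r, dilation=r)   # padding=r keeps the spatial size
            for r in rates
        ])

    def forward(self, x):
        # Sum the per-branch score maps (an assumed fusion rule).
        return torch.stack([b(x) for b in self.branches], dim=0).sum(dim=0)

# Usage sketch: 512-channel backbone features -> per-pixel class scores.
features = torch.randn(1, 512, 40, 30)
aspp = ASPP(in_ch=512, num_classes=21, rates=(6, 12, 18, 24))
print(aspp(features).shape)  # torch.Size([1, 21, 40, 30])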
Further, in the first step, the indoor scene is collected by using the Microsoft depth sensor Kinect, and the Kinect can be held by a hand to walk indoors at a constant speed in the collection process.
Compared with the prior art, the invention has the advantages and positive effects that:
the invention provides a novel RGB-D-based indoor scene pixel-by-pixel object segmentation classifier construction method, which can analyze the pixel-by-pixel category of objects in an acquired indoor scene picture according to acquired RGB and depth modal information, namely, the position of the objects in the indoor scene and the label of each pixel can be simultaneously output through a complete RGB-D network, and the method belongs to a multi-task network and provides finer-scale semantic understanding information for semantic understanding of the indoor scene.
In addition, the invention is a multi-task end-to-end learning network, which can optimize end to end, perfectly embed RPN network into FCN semantic segmentation network through designed loss function, and can well realize end-to-end indoor scene pixel-by-pixel object segmentation algorithm.
Detailed Description
The design concept of the invention is as follows:
the invention mainly focuses on object segmentation in an indoor scene, and in order to well solve the problem of object segmentation in the indoor scene, the problems of object detection and semantic segmentation of the scene in the indoor scene need to be solved.
For generating object bounding boxes, an RPN network was originally intended to be employed. The RPN can quickly locate the positions of objects in an indoor scene, but it can only provide the position of each type of object and cannot provide the pixel-by-pixel categories in the indoor scene.
To solve the above problem, a full convolution network with hole (dilated) convolution was then adopted. The hole convolution can segment the input image, but the convolution and pooling operations reduce the image size, whereas the segmentation output should be a score map with the same size as the input. To solve this size inconsistency caused by the network computation, the applicant adopts bilinear interpolation, as sketched below. Together, the two networks can provide the object positioning and the pixel-by-pixel segmentation required for indoor scene understanding, but because they are separate, they cannot be optimized end to end. To solve this problem, the RPN network is finally embedded into the full convolution network with hole convolution, and practice proves that this brings two benefits: first, the whole network can be optimized end to end; second, weights can be shared, since the low-level features extracted by the first several layers of the deep neural network are shareable, which allows fine-tuning.
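A brief sketch of the bilinear interpolation step mentioned above, in PyTorch: the downsampled score map produced by the convolution and pooling operations is resized back to the input resolution so that every input pixel receives a class score; the image size, class count, and downsampling factor are illustrative assumptions.

import torch
import torch.nn.functional as F

# The backbone has reduced a 480x640 input to a 60x80 score map with C classes.
C = 21
score_map = torch.randn(1, C, 60, 80)

# Bilinear interpolation restores the score map to the input size so that
# every input pixel receives a class score (and hence a label).
full_res = F.interpolate(score_map, size=(480, 640),
                         mode="bilinear", align_corners=False)
labels = full_res.argmax(dim=1)      # (1, 480, 640) pixel-by-pixel labels
print(labels.shape)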
In order that the above objects, features and advantages of the present invention can be more clearly understood, the present invention will be further described with reference to the following examples.
This embodiment provides a construction method of an RGB-D-based indoor scene pixel-by-pixel object segmentation classifier, which comprises the following steps:
Step one, acquiring an RGB (red, green and blue) modal picture and a depth modal picture of an indoor scene;
In this embodiment, the Microsoft depth sensor Kinect is mainly used to collect the indoor scene. The Kinect depth sensor can simultaneously collect RGB modal data and depth modal data from the same viewing angle, so as to construct a picture sample set. During collection, the Kinect can be held in the hand and moved through the room at a constant speed.
Step two, counting the types of objects contained in the RGB modal picture and the depth modal picture, and then carrying out category marking on each pixel in the picture;
in this embodiment, the acquired RGB-D data is mainly analyzed manually, the types of objects included in the picture are counted, and then each pixel in the picture is labeled by a category, and since the paired RGB modal picture and depth modal picture describe the same scene, the pixel label of the RGB modal picture and the pixel label of the depth modal picture are the same.
Step three, inputting the collected RGB pictures into an RGB sub-network of a full convolution network embedded with an RPN module, simultaneously inputting the collected depth modal pictures into a depth sub-network of the same full convolution network, extracting the features of the RGB modal pictures and the depth modal pictures in parallel, and respectively obtaining the feature f_rgb output by the RGB sub-network and the feature f_depth output by the depth sub-network. In this embodiment, the RGB sub-network includes two parts: the network responsible for detecting objects in the indoor scene is defined as the Reg network, and the network responsible for semantic segmentation of the indoor scene is defined as the Seg network. The specific feature extraction process is as follows. The picture is input into the Reg network:
C(3,64,1)-C(3,64,1)-C(3,128,1)-C(3,128,1)-C(3,256,1)-C(3,256,1)-C(3,256,1)
-C(3,256,1)-C(3,512,1)-C(3,512,1)-RPN(9)-F(4096)-F(4096)
At the same time, the RGB image input into the network is input into the Seg network:
C(3,64,1)-C(3,64,1)-C(3,64,1)-C(3,128,1)-C(3,128,1)-C(3,256,1)-C(3,256,1)
-C(3,256,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)
-ASPP(6,12,18,24)
where C denotes a convolution operation in the network; in C(k, n, s), k represents the kernel size of the convolution kernel, n represents the number of convolution kernels, and s represents the stride of the convolution kernel in the convolution operation; ASPP(d_1, d_2, d_3, d_4) denotes an atrous spatial pyramid pooling structure built from hole (dilated) convolutions, where d_i indicates the dilation amplitude of the hole convolution kernels. The position of each object in the indoor scene image input into the network is extracted by the above Reg network, and the category of each pixel in the input picture is extracted by the Seg network; the architecture strings are expanded into concrete layers in the sketch below.
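To show how the C(k, n, s) architecture strings above translate into layers, the following is a hedged PyTorch sketch that expands such a string into a stack of convolutions; the "same" padding, the ReLU activations, and the omission of any pooling layers are assumptions made only for illustration.

import torch
import torch.nn as nn

def build_conv_stack(spec, in_ch=3):
    # Expand a list of (k, n, s) triples, e.g. [(3, 64, 1), (3, 128, 1), ...],
    # into a sequential stack of Conv2d + ReLU layers.
    # Assumption: "same" padding (k // 2) and ReLU after every convolution.
    layers = []
    for k, n, s in spec:
        layers += [nn.Conv2d(in_ch, n, kernel_size=k, stride=s, padding=k // 2),
                   nn.ReLU(inplace=True)]
        in_ch = n
    return nn.Sequential(*layers)

# Seg-network convolutional trunk from the architecture string above
# (the ASPP(6,12,18,24) head would follow this trunk).
seg_spec = [(3, 64, 1), (3, 64, 1), (3, 64, 1), (3, 128, 1), (3, 128, 1),
            (3, 256, 1), (3, 256, 1), (3, 256, 1), (3, 512, 1), (3, 512, 1),
            (3, 512, 1), (3, 512, 1), (3, 512, 1)]
seg_trunk = build_conv_stack(seg_spec)
print(seg_trunk(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 512, 64, 64])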
Step four, defining an RGB-D loss function, and connecting the RGB sub-network and the depth sub-network together to construct an RGB-D multi-modal network structure for training the RGB-D-based indoor scene object segmentation pixel-by-pixel classifier_rgbd.
The process of constructing the RGB-D multi-modal network structure and training the pixel-by-pixel classifier_rgbd is as follows:
the RGB-D loss function is first defined as follows:
Figure BDA0001641443310000071
wherein
Figure BDA0001641443310000072
Figure BDA0001641443310000073
As above, λ and γ are a balance factor for balancing RGB mode data and depth mode dataThe proportion of loss in calculation, alpha and beta are balance factors for balancing the proportion of loss in final calculation in the Reg network and the Seg network, N represents the position number of anchor points, and when j belongs to the anchor points
Figure BDA0001641443310000081
Otherwise, the reverse is carried out
Figure BDA0001641443310000082
IiRepresenting the ith RGB training data, DiDenoted the ith depth training data, label liE {0, 1.,. C }, one label value for each pixel in the given training data,
Figure BDA0001641443310000083
given is the bounding box label corresponding to the ith training data,
Figure BDA0001641443310000084
representing a pixel classification result obtained by calculating the ith RGB training data of the input through a weight w and a corresponding parameter theta, wherein k represents the kth pixel,
Figure BDA0001641443310000085
denoted is the weight that the ith RGB training data maps from the classification layer to the label domain,
Figure BDA0001641443310000086
it is shown that the fc7 layer is the layer before the classification layer (in this embodiment, the fc7 layer before the softmax layer) based on the parameter
Figure BDA0001641443310000087
Extracted feature expressions). And updating the learning of the network through the calculated loss value so as to obtain a final indoor scene pixel-by-pixel object segmentation classifier.
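Since the loss formulas themselves are given only as equation images, the following is a hedged PyTorch sketch of how a weighted multi-task, multi-modal loss of the kind described (λ and γ balancing the RGB and depth modalities, α and β balancing the Reg and Seg branches) might be assembled; the concrete term functions (smooth L1 for regression, cross-entropy for segmentation), the tensor shapes, and all weight values are assumptions, not the patent's definition.

import torch
import torch.nn.functional as F

def rgbd_loss(reg_rgb, seg_rgb, reg_d, seg_d, box_targets, pix_labels,
              lam=1.0, gamma=1.0, alpha=1.0, beta=1.0):
    # Hedged sketch of an RGB-D multi-task loss:
    #   reg_* : bounding-box regression outputs of the Reg branch, per modality
    #   seg_* : per-pixel class scores (N, C, H, W) of the Seg branch, per modality
    #   alpha/beta weight the Reg vs. Seg terms; lam/gamma weight RGB vs. depth.
    def branch_loss(reg_out, seg_out):
        loss_reg = F.smooth_l1_loss(reg_out, box_targets)   # assumed regression term
        loss_seg = F.cross_entropy(seg_out, pix_labels)     # assumed segmentation term
        return alpha * loss_reg + beta * loss_seg
    return lam * branch_loss(reg_rgb, seg_rgb) + gamma * branch_loss(reg_d, seg_d)

# Illustrative shapes only.
box_targets = torch.randn(8, 4)
pix_labels = torch.randint(0, 21, (1, 60, 80))
loss = rgbd_loss(torch.randn(8, 4), torch.randn(1, 21, 60, 80),
                 torch.randn(8, 4), torch.randn(1, 21, 60, 80),
                 box_targets, pix_labels)
print(loss.item())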
Step five, in the network inference stage, the test-sample RGB-D data are input into the trained RGB-D multi-modal network according to data modality: the RGB sub-network extracts f_rgb from the input RGB modal picture, and the depth sub-network extracts f_depth from the input depth modal picture. The two extracted modal features are spliced together and input into the pixel-by-pixel classifier_rgbd, which carries out the detection and segmentation tasks for indoor scene objects.
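A minimal sketch of the inference-stage feature splicing described in step five: the RGB and depth feature maps are concatenated along the channel dimension and fed to a per-pixel classification head; the feature shapes and the 1x1-convolution classifier are assumptions used only to illustrate the data flow.

import torch
import torch.nn as nn

# Features extracted by the two sub-networks from the paired test pictures.
f_rgb = torch.randn(1, 512, 60, 80)    # output of the RGB sub-network (assumed shape)
f_depth = torch.randn(1, 512, 60, 80)  # output of the depth sub-network (assumed shape)

# Splice the two modal features together along the channel dimension.
f_rgbd = torch.cat([f_rgb, f_depth], dim=1)   # (1, 1024, 60, 80)

# Pixel-by-pixel classifier over the fused features (illustrative 1x1 conv head).
num_classes = 21
classifier_rgbd = nn.Conv2d(1024, num_classes, kernel_size=1)
pixel_labels = classifier_rgbd(f_rgbd).argmax(dim=1)   # (1, 60, 80)
print(pixel_labels.shape)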
The execution environment of the invention is a computer with a 4.0 GHz quad-core central processing unit and 128 GB of memory; in addition, to accelerate the training and inference of the object recognition network, four GeForce GTX 1080 Ti GPU graphics cards are used for accelerated computation. The construction program of the RGB-D indoor scene pixel-by-pixel object segmentation classifier is written in C++ and Python; other execution environments can also be used and are not described here.
The prior-art approaches based on hand-crafted features construct RGB and depth modal features and then input the obtained features into an SVM classifier; they require a strong professional background, are complex, cannot be optimized end to end, and their stage-wise optimization easily falls into local optima. Compared with these approaches, the present invention can moreover distinguish differences within a class, so its output is discriminative within a class. For example, the invention can not only separate the chair and the table in an indoor scene and output their positions, but can also separate two chairs in the indoor scene from each other.
The method can be applied to understanding of indoor scenes, and can effectively help indoor robot navigation and indoor real-time reconstruction by effectively segmenting the currently captured scenes.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in other forms. Any person skilled in the art may use the technical content disclosed above to make modified or equivalent embodiments; however, any simple modification, equivalent variation, or alteration made to the above embodiments according to the technical essence of the present invention still falls within the technical scope of the present invention.

Claims (3)

1. An RGB-D-based indoor scene object segmentation classifier construction method, characterized by comprising the following steps:
Step one, acquiring an RGB (red, green and blue) modal picture and a depth modal picture of an indoor scene;
Step two, counting the types of objects contained in the RGB modal picture and the depth modal picture, and then carrying out category marking on each pixel in the picture;
Step three, inputting the collected RGB modal picture into an RGB sub-network of a full convolution network embedded with an RPN module, simultaneously inputting the collected depth modal picture into a depth sub-network of the same full convolution network, extracting the features of the RGB modal picture and the depth modal picture in parallel, and respectively obtaining the feature f_rgb output by the RGB sub-network and the feature f_depth output by the depth sub-network;
Step four, defining an RGB-D loss function, and connecting the RGB sub-network and the depth sub-network together to construct an RGB-D multi-modal network structure for training an RGB-D-based indoor scene object segmentation pixel-by-pixel classifier_rgbd;
the RGB-D loss function is defined as follows (the loss function and several of the symbols below appear only as equation images in the original claims and are indicated here by placeholders):

[equation image: overall RGB-D loss function]

wherein,

[equation image: loss term of the RGB modality]

[equation image: loss term of the depth modality]

As above, λ and γ are balance factors for balancing the proportions of the RGB modal data and the depth modal data when calculating the loss; α and β are balance factors for balancing the proportions of the final calculated loss contributed by the Reg network and the Seg network; N represents the number of anchor points; when j is an anchor point, [equation image] holds, otherwise [equation image] holds; I_i represents the ith RGB training data and D_i the ith depth training data; the label l_i ∈ {0, 1, ..., C} gives one label value for each pixel in the given training data; [equation image] is the bounding-box label corresponding to the ith training data; [equation image] represents the pixel classification result obtained by computing the input ith RGB training data with the weight w and the corresponding parameter θ, where k denotes the kth pixel; [equation image] denotes the weight that maps the ith RGB training data from the classification layer to the label domain; and [equation image] represents the feature expression extracted from the layer preceding the classification layer based on the parameter [equation image];
Step five, in the network inference stage, inputting the test-sample RGB-D data into the trained RGB-D multi-modal network according to data modality: the RGB sub-network extracts f_rgb from the input RGB modal picture, the depth sub-network extracts f_depth from the input depth modal picture, the two extracted modal features are spliced together and input into the pixel-by-pixel classifier_rgbd, and detection and segmentation of indoor scene objects are carried out.
2. The RGB-D based indoor scene object segmentation classifier construction method as claimed in claim 1, wherein: in the third step, the RGB sub-network includes two parts, wherein the network responsible for detecting the objects in the indoor scene is defined as the Reg network, the network responsible for semantic segmentation of the indoor scene is defined as the Seg network, and the process of extracting the features of the input RGB modal data by using the RGB sub-network is as follows: inputting the picture into a Reg network, and extracting the position of each object in an indoor scene image input into the network;
[equation image: Reg network architecture, given as an image in the original claims]
simultaneously inputting the RGB image input into the network into the Seg network to extract the category of each pixel in the indoor scene image input into the network
C(3,64,1)-C(3,64,1)-C(3,64,1)-C(3,128,1)-C(3,128,1)-C(3,256,1)-C(3,256,1)-C(3,256,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-ASPP(6,12,18,24)
where C represents a convolution operation in the network; in C(k, n, s), k represents the kernel size of the convolution kernel, n represents the number of convolution kernels, and s represents the stride of the convolution kernel in the convolution operation; ASPP(d_1, d_2, d_3, d_4) denotes an atrous spatial pyramid pooling structure built from hole (dilated) convolutions, where d_1, d_2, d_3, d_4 indicate the dilation amplitudes of the hole convolution kernels.
3. The RGB-D based indoor scene object segmentation classifier construction method as claimed in claim 1, wherein: in the first step, the indoor scene is collected by using the Microsoft depth sensor Kinect, and the Kinect can be held by a hand to walk indoors at a constant speed in the collection process.
CN201810382977.2A 2018-04-26 2018-04-26 RGB-D-based indoor scene object segmentation classifier construction method Active CN108596102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810382977.2A CN108596102B (en) 2018-04-26 2018-04-26 RGB-D-based indoor scene object segmentation classifier construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810382977.2A CN108596102B (en) 2018-04-26 2018-04-26 RGB-D-based indoor scene object segmentation classifier construction method

Publications (2)

Publication Number Publication Date
CN108596102A CN108596102A (en) 2018-09-28
CN108596102B true CN108596102B (en) 2022-04-05

Family

ID=63609387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810382977.2A Active CN108596102B (en) 2018-04-26 2018-04-26 RGB-D-based indoor scene object segmentation classifier construction method

Country Status (1)

Country Link
CN (1) CN108596102B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492666B (en) * 2018-09-30 2021-07-06 北京百卓网络技术有限公司 Image recognition model training method and device and storage medium
CN109598268B (en) * 2018-11-23 2021-08-17 安徽大学 RGB-D (Red Green blue-D) significant target detection method based on single-stream deep network
CN109766822B (en) * 2019-01-07 2021-02-05 山东大学 Gesture recognition method and system based on neural network
CN110110578B (en) * 2019-02-21 2023-09-29 北京工业大学 Indoor scene semantic annotation method
CN110737941A (en) * 2019-10-12 2020-01-31 南京我爱我家信息科技有限公司 house decoration degree recognition system and method based on probability model and pixel statistical model
CN110705653A (en) * 2019-10-22 2020-01-17 Oppo广东移动通信有限公司 Image classification method, image classification device and terminal equipment
CN111506940B (en) * 2019-12-13 2022-08-12 江苏艾佳家居用品有限公司 Furniture, ornament and lamp integrated intelligent layout method based on 3D structured light
CN112818837B (en) * 2021-01-29 2022-11-11 山东大学 Aerial photography vehicle weight recognition method based on attitude correction and difficult sample perception
CN113222003B (en) * 2021-05-08 2023-08-01 北方工业大学 Construction method and system of indoor scene pixel-by-pixel semantic classifier based on RGB-D
CN114426069B (en) * 2021-12-14 2023-08-25 哈尔滨理工大学 Indoor rescue vehicle based on real-time semantic segmentation and image semantic segmentation method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867288A (en) * 2011-07-07 2013-01-09 三星电子株式会社 Depth image conversion apparatus and method
CN103226828A (en) * 2013-04-09 2013-07-31 哈尔滨工程大学 Image registration method of acoustic and visual three-dimensional imaging with underwater vehicle
CN105956532A (en) * 2016-04-25 2016-09-21 大连理工大学 Traffic scene classification method based on multi-scale convolution neural network
CN106651765A (en) * 2016-12-30 2017-05-10 深圳市唯特视科技有限公司 Method for automatically generating thumbnail by use of deep neutral network
CN106709568A (en) * 2016-12-16 2017-05-24 北京工业大学 RGB-D image object detection and semantic segmentation method based on deep convolution network
CN107169974A (en) * 2017-05-26 2017-09-15 中国科学技术大学 It is a kind of based on the image partition method for supervising full convolutional neural networks more
CN107341440A (en) * 2017-05-08 2017-11-10 西安电子科技大学昆山创新研究院 Indoor RGB D scene image recognition methods based on multitask measurement Multiple Kernel Learning
CN107403430A (en) * 2017-06-15 2017-11-28 中山大学 A kind of RGBD image, semantics dividing method
WO2018047033A1 (en) * 2016-09-07 2018-03-15 Nokia Technologies Oy Method and apparatus for facilitating stereo vision through the use of multi-layer shifting

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933755B (en) * 2014-03-18 2017-11-28 华为技术有限公司 A kind of stationary body method for reconstructing and system
CN106612427B (en) * 2016-12-29 2018-07-06 浙江工商大学 A kind of generation method of the space-time consistency depth map sequence based on convolutional neural networks
CN107622244B (en) * 2017-09-25 2020-08-28 华中科技大学 Indoor scene fine analysis method based on depth map

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867288A (en) * 2011-07-07 2013-01-09 三星电子株式会社 Depth image conversion apparatus and method
CN103226828A (en) * 2013-04-09 2013-07-31 哈尔滨工程大学 Image registration method of acoustic and visual three-dimensional imaging with underwater vehicle
CN105956532A (en) * 2016-04-25 2016-09-21 大连理工大学 Traffic scene classification method based on multi-scale convolution neural network
WO2018047033A1 (en) * 2016-09-07 2018-03-15 Nokia Technologies Oy Method and apparatus for facilitating stereo vision through the use of multi-layer shifting
CN106709568A (en) * 2016-12-16 2017-05-24 北京工业大学 RGB-D image object detection and semantic segmentation method based on deep convolution network
CN106651765A (en) * 2016-12-30 2017-05-10 深圳市唯特视科技有限公司 Method for automatically generating thumbnail by use of deep neutral network
CN107341440A (en) * 2017-05-08 2017-11-10 西安电子科技大学昆山创新研究院 Indoor RGB D scene image recognition methods based on multitask measurement Multiple Kernel Learning
CN107169974A (en) * 2017-05-26 2017-09-15 中国科学技术大学 It is a kind of based on the image partition method for supervising full convolutional neural networks more
CN107403430A (en) * 2017-06-15 2017-11-28 中山大学 A kind of RGBD image, semantics dividing method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs; Liang-Chieh Chen et al.; arXiv; 2017-05-14; pp. 1-14 *
Learning Rich Features from RGB-D Images for Object Detection and Segmentation; Saurabh Gupta et al.; European Conference on Computer Vision 2014; 2014; pp. 345-360 *
Research and Application of Indoor Object Detection Based on Hand-held Object Learning (基于手持物体学习的室内物体检测研究及应用); Qiao Leixian; China Masters' Theses Full-text Database, Information Science and Technology; 2017-10-15 (No. 10); pp. I138-231 *
A Preliminary Study of a Gaussian-Model-Based Target Recognition Method for Remote Sensing Images (基于高斯模型的遥感影像目标识别方法的初探); Li Yan et al.; Journal of System Simulation; 2009-10-23; Vol. 21, No. S1; pp. 57-60 *

Also Published As

Publication number Publication date
CN108596102A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN108596102B (en) RGB-D-based indoor scene object segmentation classifier construction method
CN109344701B (en) Kinect-based dynamic gesture recognition method
WO2021022970A1 (en) Multi-layer random forest-based part recognition method and system
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN106991370B (en) Pedestrian retrieval method based on color and depth
JP2012243313A (en) Image processing method and image processing device
CN108596256B (en) Object recognition classifier construction method based on RGB-D
CN109685045A (en) A kind of Moving Targets Based on Video Streams tracking and system
CN109977834B (en) Method and device for segmenting human hand and interactive object from depth image
CN112257665A (en) Image content recognition method, image recognition model training method, and medium
CN113052295B (en) Training method of neural network, object detection method, device and equipment
CN111353447A (en) Human skeleton behavior identification method based on graph convolution network
CN114821014A (en) Multi-mode and counterstudy-based multi-task target detection and identification method and device
CN113515655A (en) Fault identification method and device based on image classification
CN114332911A (en) Head posture detection method and device and computer equipment
CN115816460A (en) Manipulator grabbing method based on deep learning target detection and image segmentation
Ge et al. Coarse-to-fine foraminifera image segmentation through 3D and deep features
CN113658129B (en) Position extraction method combining visual saliency and line segment strength
CN105404682B (en) A kind of book retrieval method based on digital image content
Akanksha et al. A Feature Extraction Approach for Multi-Object Detection Using HoG and LTP.
CN113033386A (en) High-resolution remote sensing image-based transmission line channel hidden danger identification method and system
CN111738264A (en) Intelligent acquisition method for data of display panel of machine room equipment
CN108109125A (en) Information extracting method and device based on remote sensing images
CN114549809A (en) Gesture recognition method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant