CN108596102A - Indoor scene object segmentation classifier construction method based on RGB-D - Google Patents

Indoor scene object segmentation classifier construction method based on RGB-D

Info

Publication number
CN108596102A
CN108596102A (application number CN201810382977.2A)
Authority
CN
China
Prior art keywords
rgb
network
depth
picture
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810382977.2A
Other languages
Chinese (zh)
Other versions
CN108596102B (en)
Inventor
沈旭昆
周锋
迟小羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Research Institute Of Beihang University
Original Assignee
Qingdao Research Institute Of Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Research Institute Of Beihang University filed Critical Qingdao Research Institute Of Beihang University
Priority to CN201810382977.2A priority Critical patent/CN108596102B/en
Publication of CN108596102A publication Critical patent/CN108596102A/en
Application granted granted Critical
Publication of CN108596102B publication Critical patent/CN108596102B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/35Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/36Indoor scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour

Abstract

The invention discloses an RGB-D-based method for constructing an indoor scene object segmentation classifier. An RGB modal picture of an indoor scene and a depth modal picture captured at the same pose are acquired; features are then extracted from the RGB modal picture and the depth modal picture in turn, semantic analysis is performed on both pictures, and a corresponding class label is assigned to every pixel of the acquired pictures. The extracted RGB features and depth features are connected together and input into a fully convolutional network with an embedded RPN module, which performs object segmentation of the indoor scene. The invention can be applied to indoor scene understanding: by producing an effective semantic segmentation of the currently captured scene, it can effectively help indoor robot navigation and real-time indoor reconstruction.

Description

RGB-D-based indoor scene object segmentation classifier construction method
Technical Field
The invention belongs to the technical field of computer applications, and particularly relates to a construction method of an indoor scene object segmentation classifier.
Background
Perception and understanding of scenes, and of indoor scenes in particular, is a central problem. In outdoor scenes, labeling tasks such as segmentation and detection can be handled reasonably well with RGB images alone; indoor scenes, however, are complex and highly variable, and are therefore difficult to understand from RGB images only. The complexity, variability and occlusion of indoor scenes are among the research focuses in scene cognition and understanding, and remain an urgent problem for virtual reality, artificial intelligence, intelligent robots and machine vision.
Object detection is a prerequisite for many high-level vision, virtual reality and augmented reality tasks, such as intelligent video surveillance, content-based image retrieval, robot navigation and augmented reality. Many excellent object detection algorithms have been proposed. For example, the AdaBoost-based framework classifies Haar-like wavelet features and then localizes the object to be detected with a sliding window; it was the first detection algorithm to achieve real-time performance with good detection accuracy. Another example uses HOG features with a support vector machine (SVM) classifier for pedestrian detection. The multi-scale deformable part model (DPM), the most influential detector before deep neural networks became popular, consists of a root filter and several part filters, with the deformation between parts derived through latent variables; it inherits the advantages of the HOG-plus-SVM classifier, but it is cumbersome to use because it localizes objects with a sliding window and additionally requires the number of parts and the relations between them to be specified manually. Before 2012, the best detection algorithms and results were based on DPM or its improvements; after 2012, when the AlexNet deep neural network greatly surpassed the classic methods on image recognition, deep neural networks gradually took over the various fields of computer vision and computational graphics. The pioneering deep-learning work on object detection is the R-CNN network, which shares a drawback with the classical DPM: repeatedly extracting features for every candidate region makes it very slow. To remedy this, later researchers integrated region feature extraction into the network and proposed the RPN, which avoids repeated feature extraction over the picture, saves a large amount of time, and removes the bottleneck of R-CNN extracting regions with the time-consuming Selective Search algorithm.
Scene semantic segmentation is a picture classification task at the pixel level: given a picture, the segmentation algorithm outputs a pixel-by-pixel label map of the same size as the input. That is, each given sample is a picture x_i, where x_i denotes the ith picture, w × h denotes its size and d denotes the dimension of a picture pixel; the picture semantic segmentation algorithm outputs, for every pixel, a label c ∈ {1, 2, ..., C}, indicating which of the C classes the pixel belongs to. Since the pixels in a picture are correlated, the relationships between these variables must be taken into account when classifying the pixels.
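Written as a mapping, the task can be summarized as follows (a minimal formalization consistent with the notation above; the symbol f_seg for the segmentation network and the per-pixel posterior p are introduced here only for illustration):

f_{seg} : \mathbb{R}^{w \times h \times d} \to \{1, 2, \dots, C\}^{w \times h},
\qquad
\hat{y}_k = \arg\max_{c \in \{1, \dots, C\}} p\left(c \mid x_i, k\right),

where \hat{y}_k is the predicted class label of the kth pixel of picture x_i.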
Although object detection and scene semantic segmentation have each been addressed well in isolation, existing work mainly solves either object localization in indoor scenes or semantic segmentation of indoor scenes. The former provides coarse-scale semantic information, i.e. the approximate position of an object in the indoor scene, but does not say which pixels belong to that object; the latter provides finer-scale indoor semantic information, assigning a semantic label to every pixel, but does not distinguish between different instances of the same class. These two separate tasks have not yet been well integrated, so they cannot provide more robust information for the semantic understanding of indoor scenes.
Disclosure of Invention
To solve the problem that a single network cannot simultaneously provide both the locations and the pixel labels of indoor objects, the invention provides a refined object recognition method, namely an RGB-D-based construction method for a pixel-by-pixel indoor scene object segmentation classifier. The scheme is as follows:
An RGB-D-based indoor scene object segmentation classifier construction method comprises the following steps:
Step one, acquiring an RGB (red, green and blue) modal picture and a depth modal picture of an indoor scene;
Step two, counting the types of objects contained in the RGB modal picture and the depth modal picture, and then giving each pixel in the pictures a category label;
Step three, inputting the collected RGB modal picture into the RGB sub-network of a fully convolutional network with an embedded RPN module, and simultaneously inputting the collected depth modal picture into the depth sub-network of the same network; features are extracted from the RGB modal picture and the depth modal picture at the same time, yielding the feature f_rgb output by the RGB sub-network and the feature f_depth output by the depth sub-network;
Step four, defining an RGB-D loss function and connecting the RGB sub-network and the depth sub-network together to construct an RGB-D multi-modal network structure, which is used to train the RGB-D-based pixel-by-pixel indoor scene object segmentation classifier classifier_rgbd;
Step five, in the network inference stage, feeding the RGB-D data of a test sample into the trained RGB-D multi-modal network according to data modality, the RGB sub-network extracting f_rgb from the input RGB modal picture and the depth sub-network extracting f_depth from the input depth modal picture; the two extracted modal features are concatenated and input into the pixel-by-pixel classifier classifier_rgbd, which carries out the detection and segmentation of the indoor scene objects.
Further, in the fourth step, the RGB-D loss function is defined as follows:
wherein λ and γ are balance factors that balance the contribution of the RGB modal data and the depth modal data to the loss, and α and β are balance factors that balance the contribution of the Reg network and the Seg network to the final loss; N denotes the number of anchors, and an indicator equals 1 when j is an anchor and 0 otherwise; I_i denotes the ith RGB training sample and D_i the ith depth training sample; the label l_i ∈ {0, 1, ..., C} assigns one label value to each pixel of the given training sample, and a bounding-box label corresponds to the ith training sample; the pixel classification result for the kth pixel is computed from the input ith RGB training sample with the weights w and the corresponding parameters θ; the weights map the ith RGB training sample from the classification layer to the label domain, and the feature expression is extracted, based on the corresponding parameters, from the layer preceding the classification layer.
Further, in the third step, the RGB sub-network comprises two parts: the network responsible for detecting objects in the indoor scene is defined as the Reg network, and the network responsible for semantic segmentation of the indoor scene is defined as the Seg network. The RGB sub-network extracts features from the input RGB modal data as follows. The picture is input into the Reg network, which extracts the position of every object in the indoor scene image fed into the network:
C(3,64,1)-C(3,64,1)-C(3,128,1)-C(3,128,1)-C(3,256,1)-C(3,256,1)-C(3,512,1)-C(3,512,1)-RPN(9)-F(4096)-F(4096)
The RGB image fed into the network is simultaneously input into the Seg network, which extracts the category of every pixel in the indoor scene image:
C(3,64,1)-C(3,64,1)-C(3,64,1)-C(3,128,1)-C(3,128,1)-C(3,256,1)-C(3,256,1)
-C(3,256,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)
-ASPP(6,12,18,24)
Here C denotes a convolution operation in the network; in C(k, n, s), k is the kernel size of the convolution kernel, n is the number of convolution kernels and s is the stride with which the kernel is shifted during the convolution. ASPP(d_i) denotes an atrous spatial pyramid pooling structure built from dilated (atrous) convolutions, where d_i gives the dilation rate (and corresponding padding) of each atrous convolution.
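To make the ASPP(6,12,18,24) structure concrete, a minimal sketch in PyTorch follows (the use of PyTorch, the class name ASPP, the channel counts and the summation of branch outputs are illustrative assumptions, not taken from the patent text):

import torch
import torch.nn as nn

class ASPP(nn.Module):
    # Atrous spatial pyramid pooling: parallel dilated convolutions over the
    # same feature map; their per-pixel score maps are summed.
    def __init__(self, in_channels, num_classes, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, num_classes, kernel_size=3,
                      padding=r, dilation=r)   # padding = dilation keeps the spatial size
            for r in rates
        ])

    def forward(self, x):
        out = self.branches[0](x)
        for branch in self.branches[1:]:
            out = out + branch(x)
        return out

# Example: a 512-channel feature map from the Seg backbone, 40 illustrative classes.
scores = ASPP(512, num_classes=40)(torch.randn(1, 512, 45, 60))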
Further, in the first step, the indoor scene is captured with the Microsoft Kinect depth sensor; during acquisition the Kinect can be carried by hand and moved through the room at a constant speed.
Compared with the prior art, the invention has the advantages and positive effects that:
the invention provides a novel RGB-D-based indoor scene pixel-by-pixel object segmentation classifier construction method, which can analyze the pixel-by-pixel category of objects in an acquired indoor scene picture according to acquired RGB and depth modal information, namely, the position of the objects in the indoor scene and the label of each pixel can be simultaneously output through a complete RGB-D network, and the method belongs to a multi-task network and provides finer-scale semantic understanding information for semantic understanding of the indoor scene.
In addition, the invention is a multi-task end-to-end learning network that can be optimized end to end: through the designed loss function the RPN is embedded seamlessly into the FCN semantic segmentation network, realizing an end-to-end pixel-by-pixel indoor scene object segmentation algorithm.
Detailed Description
The design concept of the invention is as follows:
the invention mainly focuses on object segmentation in an indoor scene, and in order to well solve the problem of object segmentation in the indoor scene, the problems of object detection and semantic segmentation of the scene in the indoor scene need to be solved.
For generating object bounding boxes, an RPN network was originally intended. The RPN can quickly localize the objects in an indoor scene, but it only provides the position of each object instance and cannot provide per-pixel categories of the indoor scene;
to solve the above problem, a full convolution network with a convolution with a hole is then adopted. The input image can be segmented by a convolution with a hole, but the image size is reduced due to the input image being subjected to the operations of convolution and pooling. The output of the image segmentation is a score map with the same size as the input, and in order to solve the problem of image size inconsistency caused by network calculation, the applicant intends to adopt a bilinear interpolation method to solve the problem. The two networks can provide the problems of object positioning and pixel-by-pixel segmentation understood by indoor scenes, but due to the fact that the two networks are dispersed, end-to-end de-optimization cannot be achieved, in order to solve the problem, the RPN network is finally embedded into the full convolution network with the convolution, and practice proves that the RPN network is embedded into the full convolution network with the convolution to achieve the following effects: firstly, the whole network is optimized end to end, the second weight can be shared, and the low-level features extracted from the first layers of the deep neural network can be shared, so that fine adjustment can be carried out.
In order that the above objects, features and advantages of the present invention can be more clearly understood, the present invention will be further described with reference to the following examples.
The embodiment provides a construction method of an RGB-D-based pixel-by-pixel indoor scene object segmentation classifier, which comprises the following steps:
Step one, acquiring an RGB (red, green and blue) modal picture and a depth modal picture of an indoor scene.
In this embodiment, the Microsoft Kinect depth sensor is mainly used to collect indoor scenes; it captures RGB modal data and depth modal data simultaneously from the same viewpoint, from which the picture sample set is constructed. During acquisition the Kinect can be carried by hand and moved through the room at a constant speed.
Step two, counting the types of objects contained in the RGB modal picture and the depth modal picture, and then giving each pixel in the pictures a category label.
in this embodiment, the acquired RGB-D data is mainly analyzed manually, the types of objects included in the picture are counted, and then each pixel in the picture is labeled by a category, and since the paired RGB modal picture and depth modal picture describe the same scene, the pixel label of the RGB modal picture and the pixel label of the depth modal picture are the same.
Step three, inputting the collected RGB pictures into the RGB sub-network of a fully convolutional network with an embedded RPN module, and simultaneously inputting the collected depth modal pictures into the depth sub-network of the same network; features are extracted from the RGB modal pictures and the depth modal pictures at the same time, yielding the feature f_rgb output by the RGB sub-network and the feature f_depth output by the depth sub-network. In this embodiment the RGB sub-network comprises two parts: the network responsible for detecting objects in the indoor scene is defined as the Reg network, and the network responsible for semantic segmentation of the indoor scene is defined as the Seg network. The feature extraction proceeds as follows. The picture is input into the Reg network:
C(3,64,1)-C(3,64,1)-C(3,128,1)-C(3,128,1)-C(3,256,1)-C(3,256,1)-C(3,256,1)
-C(3,256,1)-C(3,512,1)-C(3,512,1)-RPN(9)-F(4096)-F(4096)
The RGB image fed into the network is simultaneously input into the Seg network:
C(3,64,1)-C(3,64,1)-C(3,64,1)-C(3,128,1)-C(3,128,1)-C(3,256,1)-C(3,256,1)
-C(3,256,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)
-ASPP(6,12,18,24)
Here C denotes a convolution operation in the network; in C(k, n, s), k is the kernel size of the convolution kernel, n is the number of convolution kernels and s is the stride with which the kernel is shifted during the convolution. ASPP(d_i) denotes an atrous spatial pyramid pooling structure built from dilated (atrous) convolutions, where d_i gives the dilation rate (and corresponding padding) of each atrous convolution. The Reg network above extracts the position of every object in the indoor scene image fed into the network, and the Seg network extracts the category of every pixel in that image.
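The layer notation above can be read mechanically; the following sketch (PyTorch assumed; the helper build_conv_stack and the padding choice are illustrative and not part of the patent) turns a list of C(k, n, s) entries into a convolution stack:

import torch.nn as nn

def build_conv_stack(spec, in_channels=3):
    # spec is a list of (kernel, filters, stride) tuples, i.e. the C(k, n, s) notation.
    layers = []
    for k, n, s in spec:
        layers.append(nn.Conv2d(in_channels, n, kernel_size=k, stride=s, padding=k // 2))
        layers.append(nn.ReLU(inplace=True))
        in_channels = n
    return nn.Sequential(*layers)

# First layers of the Seg branch: C(3,64,1)-C(3,64,1)-C(3,64,1)-C(3,128,1)-...
seg_stem = build_conv_stack([(3, 64, 1), (3, 64, 1), (3, 64, 1), (3, 128, 1)])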
Step four, defining an RGB-D loss function and connecting the RGB sub-network and the depth sub-network together to construct an RGB-D multi-modal network structure, which is used to train the RGB-D-based pixel-by-pixel indoor scene object segmentation classifier classifier_rgbd.
The process of constructing the RGB-D multi-modal network structure by connecting the RGB sub-network and the depth sub-network through the defined RGB-D loss function, and of training the pixel-by-pixel classifier classifier_rgbd, is as follows:
the RGB-D loss function is first defined as follows:
wherein λ and γ are balance factors that balance the contribution of the RGB modal data and the depth modal data to the loss, and α and β are balance factors that balance the contribution of the Reg network and the Seg network to the final loss; N denotes the number of anchors, and an indicator equals 1 when j is an anchor and 0 otherwise; I_i denotes the ith RGB training sample and D_i the ith depth training sample; the label l_i ∈ {0, 1, ..., C} assigns one label value to each pixel of the given training sample, and a bounding-box label corresponds to the ith training sample; the pixel classification result for the kth pixel is computed from the input ith RGB training sample with the weights w and the corresponding parameters θ; the weights map the ith RGB training sample from the classification layer to the label domain, and the feature expression is extracted, based on the parameters θ, from the layer preceding the classification layer (in this embodiment, the fc7 layer before the softmax layer). The learning of the network is updated with the computed loss value, which yields the final pixel-by-pixel indoor scene object segmentation classifier.
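As an illustration of how the balance factors weight the loss during training, a toy end-to-end training step follows (PyTorch assumed; the placeholder networks, the SmoothL1/cross-entropy choices for the Reg and Seg losses, the toy box prediction and all numeric values are illustrative assumptions, not the patent's implementation):

import torch
import torch.nn as nn

rgb_net   = nn.Conv2d(3, 40, 3, padding=1)    # placeholder for the RGB sub-network
depth_net = nn.Conv2d(1, 40, 3, padding=1)    # placeholder for the depth sub-network
seg_loss  = nn.CrossEntropyLoss()             # per-pixel classification loss (Seg)
reg_loss  = nn.SmoothL1Loss()                 # bounding-box regression loss (Reg)

lam, gamma = 1.0, 1.0                         # λ, γ: RGB vs. depth balance
alpha, beta = 1.0, 1.0                        # α, β: Reg vs. Seg balance

optimizer = torch.optim.SGD(
    list(rgb_net.parameters()) + list(depth_net.parameters()), lr=1e-3, momentum=0.9)

rgb    = torch.randn(2, 3, 64, 64)            # toy RGB batch
depth  = torch.randn(2, 1, 64, 64)            # toy depth batch at the same poses
labels = torch.randint(0, 40, (2, 64, 64))    # per-pixel class labels
boxes  = torch.randn(2, 4)                    # toy bounding-box targets

scores_rgb, scores_d = rgb_net(rgb), depth_net(depth)
boxes_rgb = scores_rgb.mean(dim=(2, 3))[:, :4]   # toy box prediction from the RGB branch
boxes_d   = scores_d.mean(dim=(2, 3))[:, :4]     # toy box prediction from the depth branch

loss = (lam   * (alpha * reg_loss(boxes_rgb, boxes) + beta * seg_loss(scores_rgb, labels)) +
        gamma * (alpha * reg_loss(boxes_d,   boxes) + beta * seg_loss(scores_d,   labels)))
optimizer.zero_grad()
loss.backward()                                  # a single end-to-end update of both sub-networks
optimizer.step()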
Step five, in the network inference stage, the RGB-D data of a test sample are fed into the trained RGB-D multi-modal network according to data modality: the RGB sub-network extracts f_rgb from the input RGB modal picture and the depth sub-network extracts f_depth from the input depth modal picture. The two extracted modal features are concatenated and input into the pixel-by-pixel classifier classifier_rgbd, which carries out the detection and segmentation of the indoor scene objects.
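A minimal sketch of this inference step (PyTorch assumed; the feature sizes and the 1x1-convolution stand-in for the trained classifier_rgbd are illustrative):

import torch
import torch.nn as nn

# Features extracted by the two trained sub-networks for one test sample
# (feature maps of the same spatial size).
f_rgb   = torch.randn(1, 512, 60, 80)
f_depth = torch.randn(1, 512, 60, 80)

# Concatenate the two modal features along the channel dimension.
f_rgbd = torch.cat([f_rgb, f_depth], dim=1)            # (1, 1024, 60, 80)

# Pixel-by-pixel classifier: here a 1x1 convolution that scores every pixel location.
classifier_rgbd = nn.Conv2d(1024, 40, kernel_size=1)   # 40 = illustrative class count
per_pixel_labels = classifier_rgbd(f_rgbd).argmax(dim=1)   # (1, 60, 80) label map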
The execution environment of the invention is a computer with a 4.0 GHz quad-core Core central processing unit and 128 GB of memory; in addition, to accelerate the training and inference of the object recognition network, four GeForce GTX 1080 Ti GPUs are used for accelerated computation. The construction program of the RGB-D pixel-by-pixel indoor scene object segmentation classifier is written in C++ and Python; other execution environments can also be used and are not described further.
Compared with prior-art approaches based on hand-crafted features, which require a strong professional background to design RGB and depth modal features that are then fed into an SVM classifier, are complex, cannot be optimized end to end, and whose stage-wise optimization easily falls into local optima, the present invention avoids these drawbacks. In addition, the invention can distinguish differences within a class, so its output is discriminative among instances of the same category. For example, the invention can not only separate the chairs and the table in an indoor scene and output their positions, but also separate two chairs in the scene from each other.
The method can be applied to the understanding of indoor scenes: by effectively performing semantic segmentation of the currently captured scene, it can effectively help indoor robot navigation and real-time indoor reconstruction.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any other form. Any modifications, equivalent variations or simple changes that a person skilled in the art may make to the disclosed embodiments according to the technical spirit of the present invention still fall within the technical scope of the present invention.

Claims (4)

1. An RGB-D-based indoor scene object segmentation classifier construction method, characterized by comprising the following steps:
step one, acquiring an RGB (red, green and blue) modal picture and a depth modal picture of an indoor scene;
step two, counting the types of objects contained in the RGB modal picture and the depth modal picture, and then giving each pixel in the pictures a category label;
step three, inputting the collected RGB modal picture into the RGB sub-network of a fully convolutional network with an embedded RPN module, and simultaneously inputting the collected depth modal picture into the depth sub-network of the same network; features are extracted from the RGB modal picture and the depth modal picture at the same time, yielding the feature f_rgb output by the RGB sub-network and the feature f_depth output by the depth sub-network;
step four, defining an RGB-D loss function and connecting the RGB sub-network and the depth sub-network together to construct an RGB-D multi-modal network structure, which is used to train the RGB-D-based pixel-by-pixel indoor scene object segmentation classifier classifier_rgbd;
step five, in the network inference stage, feeding the RGB-D data of a test sample into the trained RGB-D multi-modal network according to data modality, the RGB sub-network extracting f_rgb from the input RGB modal picture and the depth sub-network extracting f_depth from the input depth modal picture; concatenating the two extracted modal features and inputting them into the pixel-by-pixel classifier classifier_rgbd to carry out the detection and segmentation of the indoor scene objects.
2. The RGB-D based indoor scene object segmentation classifier construction method as claimed in claim 1, wherein:
in the fourth step, the RGB-D loss function is defined as follows:
wherein λ and γ are balance factors that balance the contribution of the RGB modal data and the depth modal data to the loss, and α and β are balance factors that balance the contribution of the Reg network and the Seg network to the final loss; N denotes the number of anchor positions, and an indicator equals 1 when j is an anchor and 0 otherwise; I_i denotes the ith RGB training sample and D_i the ith depth training sample; the label l_i ∈ {0, 1, ..., C} assigns one label value to each pixel of the given training sample, and a bounding-box label corresponds to the ith training sample; the pixel classification result for the kth pixel is computed from the input ith RGB training sample with the weights w and the corresponding parameters θ; the weights map the ith RGB training sample from the classification layer to the label domain, and the feature expression is extracted, based on the corresponding parameters, from the layer preceding the classification layer.
3. The RGB-D based indoor scene object segmentation classifier construction method as claimed in claim 1, wherein:
in the third step, the RGB sub-network comprises two parts: the network responsible for detecting objects in the indoor scene is defined as the Reg network, and the network responsible for semantic segmentation of the indoor scene is defined as the Seg network; the RGB sub-network extracts features from the input RGB modal data as follows: the picture is input into the Reg network, which extracts the position of every object in the indoor scene image fed into the network:
C(3,64,1)-C(3,64,1)-C(3,128,1)-C(3,128,1)-C(3,256,1)-C(3,512,1)-C(3,512,1)-RPN(9)-F(4096)
the RGB image fed into the network is simultaneously input into the Seg network, which extracts the category of every pixel in the indoor scene image:
C(3,64,1)-C(3,64,1)-C(3,64,1)-C(3,128,1)-C(3,128,1)-C(3,256,1)-C(3,256,1)-C(3,256,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-C(3,512,1)-ASPP(6,12,18,24)
Here C denotes a convolution operation in the network; in C(k, n, s), k is the kernel size of the convolution kernel, n is the number of convolution kernels and s is the stride with which the kernel is shifted during the convolution; ASPP(d_i) denotes an atrous spatial pyramid pooling structure built from dilated (atrous) convolutions, where d_i gives the dilation rate (and corresponding padding) of each atrous convolution.
4. The RGB-D based indoor scene object segmentation classifier construction method as claimed in claim 1, wherein: in the first step, the indoor scene is captured with the Microsoft Kinect depth sensor, and during acquisition the Kinect can be carried by hand and moved through the room at a constant speed.
CN201810382977.2A 2018-04-26 2018-04-26 RGB-D-based indoor scene object segmentation classifier construction method Active CN108596102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810382977.2A CN108596102B (en) 2018-04-26 2018-04-26 RGB-D-based indoor scene object segmentation classifier construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810382977.2A CN108596102B (en) 2018-04-26 2018-04-26 RGB-D-based indoor scene object segmentation classifier construction method

Publications (2)

Publication Number Publication Date
CN108596102A true CN108596102A (en) 2018-09-28
CN108596102B CN108596102B (en) 2022-04-05

Family

ID=63609387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810382977.2A Active CN108596102B (en) 2018-04-26 2018-04-26 RGB-D-based indoor scene object segmentation classifier construction method

Country Status (1)

Country Link
CN (1) CN108596102B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492666A (en) * 2018-09-30 2019-03-19 北京百卓网络技术有限公司 Image recognition model training method, device and storage medium
CN109598268A (en) * 2018-11-23 2019-04-09 安徽大学 A kind of RGB-D well-marked target detection method based on single flow depth degree network
CN109766822A (en) * 2019-01-07 2019-05-17 山东大学 Gesture identification method neural network based and system
CN110110578A (en) * 2019-02-21 2019-08-09 北京工业大学 A kind of indoor scene semanteme marking method
CN110705653A (en) * 2019-10-22 2020-01-17 Oppo广东移动通信有限公司 Image classification method, image classification device and terminal equipment
CN110737941A (en) * 2019-10-12 2020-01-31 南京我爱我家信息科技有限公司 house decoration degree recognition system and method based on probability model and pixel statistical model
CN111506940A (en) * 2019-12-13 2020-08-07 江苏艾佳家居用品有限公司 Furniture, ornament and lamp integrated intelligent local method based on 3D structured light
CN111598912A (en) * 2019-02-20 2020-08-28 北京奇虎科技有限公司 Image segmentation method and device
CN112818837A (en) * 2021-01-29 2021-05-18 山东大学 Aerial photography vehicle weight recognition method based on attitude correction and difficult sample perception
CN113222003A (en) * 2021-05-08 2021-08-06 北方工业大学 RGB-D-based indoor scene pixel-by-pixel semantic classifier construction method and system
CN114426069A (en) * 2021-12-14 2022-05-03 哈尔滨理工大学 Indoor rescue vehicle based on real-time semantic segmentation and image semantic segmentation method

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867288A (en) * 2011-07-07 2013-01-09 三星电子株式会社 Depth image conversion apparatus and method
CN103226828A (en) * 2013-04-09 2013-07-31 哈尔滨工程大学 Image registration method of acoustic and visual three-dimensional imaging with underwater vehicle
US20160350904A1 (en) * 2014-03-18 2016-12-01 Huawei Technologies Co., Ltd. Static Object Reconstruction Method and System
CN105956532A (en) * 2016-04-25 2016-09-21 大连理工大学 Traffic scene classification method based on multi-scale convolution neural network
WO2018047033A1 (en) * 2016-09-07 2018-03-15 Nokia Technologies Oy Method and apparatus for facilitating stereo vision through the use of multi-layer shifting
CN106709568A (en) * 2016-12-16 2017-05-24 北京工业大学 RGB-D image object detection and semantic segmentation method based on deep convolution network
CN106612427A (en) * 2016-12-29 2017-05-03 浙江工商大学 Method for generating spatial-temporal consistency depth map sequence based on convolution neural network
CN106651765A (en) * 2016-12-30 2017-05-10 深圳市唯特视科技有限公司 Method for automatically generating thumbnail by use of deep neutral network
CN107341440A (en) * 2017-05-08 2017-11-10 西安电子科技大学昆山创新研究院 Indoor RGB D scene image recognition methods based on multitask measurement Multiple Kernel Learning
CN107169974A (en) * 2017-05-26 2017-09-15 中国科学技术大学 It is a kind of based on the image partition method for supervising full convolutional neural networks more
CN107403430A (en) * 2017-06-15 2017-11-28 中山大学 A kind of RGBD image, semantics dividing method
CN107622244A (en) * 2017-09-25 2018-01-23 华中科技大学 A kind of indoor scene based on depth map becomes more meticulous analytic method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIANG-CHIEH CHEN等: "《DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution,and Fully Connected CRFs》", 《ARXIV》 *
SAURABH GUPTA等: "《Learning Rich Features from RGB-D Images for Object Detection and Segmentation》", 《EUROPEAN CONFERENCE ON COMPUTER VISION 2014》 *
乔雷先: "《基于手持物体学习的室内物体检测研究及应用》" (Research and Application of Indoor Object Detection Based on Hand-held Object Learning), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *
李彦 等: "《基于高斯模型的遥感影像目标识别方法的初探》" (A Preliminary Study of a Gaussian-model-based Target Recognition Method for Remote Sensing Images), 《系统仿真学报》 (Journal of System Simulation) *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492666A (en) * 2018-09-30 2019-03-19 北京百卓网络技术有限公司 Image recognition model training method, device and storage medium
CN109598268B (en) * 2018-11-23 2021-08-17 安徽大学 RGB-D (Red Green blue-D) significant target detection method based on single-stream deep network
CN109598268A (en) * 2018-11-23 2019-04-09 安徽大学 A kind of RGB-D well-marked target detection method based on single flow depth degree network
CN109766822A (en) * 2019-01-07 2019-05-17 山东大学 Gesture identification method neural network based and system
CN111598912A (en) * 2019-02-20 2020-08-28 北京奇虎科技有限公司 Image segmentation method and device
CN110110578A (en) * 2019-02-21 2019-08-09 北京工业大学 A kind of indoor scene semanteme marking method
CN110110578B (en) * 2019-02-21 2023-09-29 北京工业大学 Indoor scene semantic annotation method
CN110737941A (en) * 2019-10-12 2020-01-31 南京我爱我家信息科技有限公司 house decoration degree recognition system and method based on probability model and pixel statistical model
CN110705653A (en) * 2019-10-22 2020-01-17 Oppo广东移动通信有限公司 Image classification method, image classification device and terminal equipment
CN111506940B (en) * 2019-12-13 2022-08-12 江苏艾佳家居用品有限公司 Furniture, ornament and lamp integrated intelligent layout method based on 3D structured light
CN111506940A (en) * 2019-12-13 2020-08-07 江苏艾佳家居用品有限公司 Furniture, ornament and lamp integrated intelligent local method based on 3D structured light
CN112818837A (en) * 2021-01-29 2021-05-18 山东大学 Aerial photography vehicle weight recognition method based on attitude correction and difficult sample perception
CN112818837B (en) * 2021-01-29 2022-11-11 山东大学 Aerial photography vehicle weight recognition method based on attitude correction and difficult sample perception
CN113222003A (en) * 2021-05-08 2021-08-06 北方工业大学 RGB-D-based indoor scene pixel-by-pixel semantic classifier construction method and system
CN113222003B (en) * 2021-05-08 2023-08-01 北方工业大学 Construction method and system of indoor scene pixel-by-pixel semantic classifier based on RGB-D
CN114426069A (en) * 2021-12-14 2022-05-03 哈尔滨理工大学 Indoor rescue vehicle based on real-time semantic segmentation and image semantic segmentation method
CN114426069B (en) * 2021-12-14 2023-08-25 哈尔滨理工大学 Indoor rescue vehicle based on real-time semantic segmentation and image semantic segmentation method

Also Published As

Publication number Publication date
CN108596102B (en) 2022-04-05

Similar Documents

Publication Publication Date Title
CN108596102B (en) RGB-D-based indoor scene object segmentation classifier construction method
WO2021022970A1 (en) Multi-layer random forest-based part recognition method and system
Tao et al. Detection of power line insulator defects using aerial images analyzed with convolutional neural networks
CN109344701B (en) Kinect-based dynamic gesture recognition method
CN108717524B (en) Gesture recognition system based on double-camera mobile phone and artificial intelligence system
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN104966085B (en) A kind of remote sensing images region of interest area detecting method based on the fusion of more notable features
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
CN106991370B (en) Pedestrian retrieval method based on color and depth
CN105046206B (en) Based on the pedestrian detection method and device for moving prior information in video
CN109086754A (en) A kind of human posture recognition method based on deep learning
CN108596256B (en) Object recognition classifier construction method based on RGB-D
CN113515655A (en) Fault identification method and device based on image classification
CN111353447A (en) Human skeleton behavior identification method based on graph convolution network
CN109977834B (en) Method and device for segmenting human hand and interactive object from depth image
CN113052295B (en) Training method of neural network, object detection method, device and equipment
CN115816460A (en) Manipulator grabbing method based on deep learning target detection and image segmentation
CN112734747A (en) Target detection method and device, electronic equipment and storage medium
Ge et al. Coarse-to-fine foraminifera image segmentation through 3D and deep features
CN114332911A (en) Head posture detection method and device and computer equipment
CN113033386B (en) High-resolution remote sensing image-based transmission line channel hidden danger identification method and system
CN117037049B (en) Image content detection method and system based on YOLOv5 deep learning
CN106960188B (en) Weather image classification method and device
CN108109125A (en) Information extracting method and device based on remote sensing images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant