CN111583322A - Deep learning-based 2D image scene depth prediction and semantic segmentation method and system - Google Patents

Deep learning-based 2D image scene depth prediction and semantic segmentation method and system

Info

Publication number
CN111583322A
CN111583322A (application CN202010380353.4A)
Authority
CN
China
Prior art keywords
image
rgb
training
model
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010380353.4A
Other languages
Chinese (zh)
Inventor
Inventor not announced (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huayan Mutual Entertainment Technology Co ltd
Original Assignee
Beijing Huayan Mutual Entertainment Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huayan Mutual Entertainment Technology Co ltd filed Critical Beijing Huayan Mutual Entertainment Technology Co ltd
Priority to CN202010380353.4A priority Critical patent/CN111583322A/en
Publication of CN111583322A publication Critical patent/CN111583322A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning-based depth prediction and semantic segmentation method and system for 2D image scenes, wherein the scene depth prediction method comprises the following steps: acquiring a plurality of RGB-D images; taking each RGB-D image as a training sample and training a scene depth prediction initial model based on a convolutional neural network; performing scene depth prediction on the RGB-D images through the scene depth prediction initial model to obtain a verification result; adjusting model training parameters according to the verification result, performing update training on the depth prediction initial model with the RGB-D images as training samples, and finally training to form a scene depth prediction model; and performing scene depth prediction on an input RGB 2D image through the scene depth prediction model to obtain a scene depth prediction result. The method realizes depth prediction of 2D image scenes and semantic segmentation of scene images, with high prediction speed, high prediction precision, and high semantic segmentation accuracy.

Description

Deep learning-based 2D image scene depth prediction and semantic segmentation method and system
Technical Field
The invention relates to the technical field of deep learning and image analysis, in particular to a depth prediction and semantic segmentation method and system for a 2D image scene based on deep learning.
Background
Depth estimation methods estimate the depth of each pixel in an image to be processed and produce a global depth map of that image, which plays an important role in computer vision and computer graphics applications. However, existing depth estimation methods usually determine depth only from the position of pixels in the image: following a bottom-up principle, objects at the bottom of the image are treated as near views and objects at the top as far views, and the depth of the image is determined accordingly. Depth values estimated in this way are usually inaccurate, the resulting depth maps have a weak sense of depth hierarchy, and, most importantly, such methods cannot predict a depth map from an input color image alone.
Semantic segmentation is classification at the pixel level: pixels belonging to the same class are grouped into one class, so that different kinds of objects in an image are segmented. Segmentation maps obtained by traditional semantic segmentation methods, such as those based on random forest classifiers, are not highly accurate. Although some deep learning-based semantic segmentation methods exist, they cannot estimate accurate depth information from a color image, so the problem of low segmentation accuracy remains unsolved.
Disclosure of Invention
The invention aims to provide a deep learning-based depth prediction and semantic segmentation method and system for 2D image scenes, so as to solve the above technical problems.
In order to achieve the purpose, the invention adopts the following technical scheme:
provided is a depth prediction method of a 2D image scene based on deep learning, comprising the following steps:
acquiring a plurality of RGB-D images to form an image data set;
dividing the image data set into a training set and a verification set according to a preset division ratio;
taking each RGB-D image in a training set as a training sample, and training on the basis of a convolutional neural network to form a scene depth prediction initial model;
scene depth prediction is carried out on the RGB-D images in the verification set through the scene depth prediction initial model so as to verify the model performance of the scene depth prediction initial model and obtain a verification result;
adjusting model training parameters according to the verification result, then performing update training on the depth prediction initial model by taking the RGB-D images in the training set as training samples, and finally training to form a scene depth prediction model;
and performing scene depth prediction on the input RGB image through the scene depth prediction model, and outputting a scene depth prediction result.
As a preferred embodiment of the present invention, the training samples for training the scene depth prediction model are augmented by performing any one or more of random flipping, random cropping, scaling, and random rotation on the RGB-D image.
As a preferred scheme of the invention, the scene depth prediction model is trained based on a ResNet convolutional neural network architecture.
As a preferred scheme of the invention, a Huber regression loss function is used to verify the model performance of the scene depth prediction initial model.
The invention also provides a deep learning-based depth prediction system for 2D image scenes, which can implement the above image scene depth prediction method, the system comprising:
the image acquisition module is used for acquiring the RGB-D image from an external image database;
the image storage module is connected with the image acquisition module and used for storing the acquired RGB-D image;
the image dividing module is connected with the image storage module and is used for dividing the stored RGB-D images into a training set and a verification set according to a preset dividing proportion;
the initial model training module is connected with the image storage module and used for taking each RGB-D image in the training set as a training sample and training to form the scene depth prediction initial model;
the model performance prediction module is respectively connected with the initial model training module and the image storage module and is used for verifying the prediction performance of the scene depth prediction initial model by taking the RGB-D images in the verification set as verification samples to obtain a verification result;
the verification result display module is connected with the model performance prediction module and used for displaying the verification result to a user;
the model parameter adjusting module is connected with the verification result display module and used for providing the user with model training parameters adjusted according to the verification result;
the model updating training module is respectively connected with the image storage module, the initial model training module and the model parameter adjusting module and is used for updating and training the depth prediction initial model by taking each RGB-D image in a training set as a training sample according to the adjusted model parameter to finally train and form the scene depth prediction model;
and the scene depth prediction module is connected with the model updating training module and used for performing image scene depth prediction on the input RGB image through the scene depth prediction model.
The invention also provides a 2D image scene semantic segmentation method based on deep learning, which comprises the following steps:
inputting the RGB-D image into a feature extractor to extract an RGB feature map corresponding to the RGB image in the RGB-D image;
inputting the ground truth-value depth map corresponding to the RGB-D image into the feature extractor to extract a truth-value depth feature map corresponding to the ground truth-value depth map;
carrying out image fusion on the RGB feature map and the true value depth feature map to obtain a feature fusion map;
and performing semantic segmentation on the feature fusion graph through a pre-trained semantic segmentation model, and outputting a semantic segmentation result.
As a preferred aspect of the present invention, the method for training the semantic segmentation model includes:
acquiring the RGB-D image and the ground truth value depth map corresponding to the RGB-D image to form an image data set;
dividing the image data set into a training set and a verification set according to a preset division ratio;
taking each RGB-D image in the training set and the ground truth-value depth map corresponding to the RGB-D image as training samples, and training on the basis of a convolutional neural network to form a semantic segmentation initial model;
performing semantic segmentation on the RGB images in the verification set through the semantic segmentation initial model to verify the model performance of the semantic segmentation initial model to obtain a verification result;
and adjusting model training parameters according to the verification result, then updating and training the semantic segmentation initial model by taking the image data in the training set as a training sample, and finally training to form the semantic segmentation model.
In a preferred embodiment of the present invention, the image size of the RGB feature map or the true-value depth feature map output by the feature extractor is 160 × 128 × 64.
As a preferred scheme of the present invention, the network structure of the convolutional neural network for training the semantic segmentation model at least comprises an up-convolution layer, a first convolution layer, a second convolution layer, and an upsampling layer, wherein the first convolution layer is connected to the up-convolution layer, the second convolution layer is connected to the first convolution layer, and the upsampling layer is connected to the second convolution layer; the up-convolution layer performs an up-convolution operation on the feature fusion map, and the upsampling layer upsamples the feature map output by the second convolution layer and outputs a semantic segmentation result.
The invention also provides a deep learning-based 2D image scene semantic segmentation system, which can implement the above 2D image scene semantic segmentation method, the system comprising:
the first image acquisition module is used for acquiring and storing the RGB-D image;
the second image acquisition module is used for acquiring the ground truth value depth map corresponding to the RGB-D image;
the first image feature extraction module is connected with the first image acquisition module and is used for extracting the RGB feature map corresponding to the RGB image in the RGB-D image;
the second image feature extraction module is connected with the second image acquisition module and is used for extracting the true value depth feature map corresponding to the ground true value depth map;
the feature fusion module is respectively connected with the first image feature extraction module and the second image feature extraction module and is used for carrying out image fusion on the RGB feature map and the truth-value depth feature map to obtain a feature fusion map;
and the semantic segmentation module is connected with the feature fusion module and used for performing semantic segmentation on the feature fusion graph through a pre-trained semantic segmentation model and outputting a semantic segmentation result.
The depth prediction method and system of the invention realize depth prediction of 2D image scenes with high prediction speed and prediction precision, and can accurately obtain the depth information of an input color image. In addition, semantic segmentation of the color image based on the accurately predicted depth information greatly improves the segmentation accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the embodiments are briefly described below. It is obvious that the drawings described below show only some embodiments of the invention, and that a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a diagram illustrating steps of a deep learning-based 2D image scene depth prediction method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a depth prediction system for a deep learning based 2D image scene according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating steps of a deep learning-based 2D image scene semantic segmentation method according to an embodiment of the present invention;
FIG. 4 is a diagram of method steps for training the semantic segmentation model;
FIG. 5 is a block diagram of a deep learning based 2D image scene semantic segmentation system according to an embodiment of the present invention;
FIG. 6 is a network architecture diagram of the feature extractor extracting the RGB feature map or the true depth feature map;
fig. 7 is a schematic diagram of predicting a depth of a 2D image scene and performing semantic segmentation of the image scene.
Detailed Description
The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.
The drawings are for illustration only, are shown in schematic rather than actual form, and are not to be construed as limiting this patent; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged, or reduced and do not represent the size of an actual product; and it will be understood by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components. In the description of the present invention, terms such as "upper", "lower", "left", "right", "inner", and "outer", if used to indicate an orientation or positional relationship, are based on the orientations or positional relationships shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; such terms are therefore illustrative only and are not to be construed as limiting this patent. The specific meanings of these terms can be understood by those skilled in the art according to the specific situation.
In the description of the present invention, unless otherwise explicitly specified and limited, the term "connected" and the like, where it indicates a connection relationship between components, is to be understood broadly: for example, as a fixed, detachable, or integral connection; as a mechanical or electrical connection; as a direct connection or an indirect connection through an intervening medium; or as a connection through one or more other components or an interaction relationship between components. The specific meanings of the above terms in the present invention can be understood by those skilled in the art in specific cases.
Fig. 1 illustrates a depth prediction method for a 2D image scene based on deep learning according to an embodiment of the present invention. Referring to fig. 1, the depth prediction method for an image scene based on deep learning according to the present embodiment includes the following steps:
step S1, acquiring a plurality of RGB-D images to form an image data set; an RGB-D image is a color-depth image that actually comprises two images: an RGB color image and the depth (D) image corresponding to it;
step S2, dividing the image data set into a training set and a verification set according to a preset division ratio;
step S3, each RGB-D image in the training set is used as a training sample, and a scene depth prediction initial model is formed based on convolutional neural network training;
step S4, performing scene depth prediction on the RGB-D images in the verification set through the scene depth prediction initial model, specifically, performing scene depth prediction on the RGB images in the RGB-D images through the scene depth prediction initial model to verify the model performance of the scene depth prediction initial model and obtain a verification result;
step S5, adjusting model training parameters according to the verification result, then performing update training on the depth prediction initial model by taking the RGB-D images in the training set as training samples, and finally training to form a scene depth prediction model;
in step S6, the normal scene depth prediction model performs scene depth prediction on the input RGB image, and outputs a scene depth prediction result.
In order to ensure the diversity of the training samples, preferably, in the embodiment of the present invention, the training samples for training the scene depth prediction model are augmented by image preprocessing such as random flipping, random cropping, scaling, or random rotation of the RGB-D images, so that the trained scene depth prediction model achieves higher prediction accuracy.
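As a minimal sketch of such an augmentation step (an assumption about one possible realization, not the patented implementation), the same geometric transform must be applied to the RGB image and its depth map so that the pair stays aligned:

```python
import random
import torchvision.transforms.functional as TF

def augment_pair(rgb, depth):
    """Apply identical geometric augmentations to an RGB image and its depth
    map. Both are assumed to be PIL Images of at least 304 x 228 pixels; the
    parameter ranges below are illustrative assumptions."""
    # Random horizontal flip, applied to both images together.
    if random.random() < 0.5:
        rgb, depth = TF.hflip(rgb), TF.hflip(depth)

    # Random rotation by a small angle.
    angle = random.uniform(-5.0, 5.0)
    rgb, depth = TF.rotate(rgb, angle), TF.rotate(depth, angle)

    # Random scaling, then a random crop back to the 304 x 228 network input
    # size given in the description. (When an image is enlarged by a factor s,
    # depth values are typically divided by s as well; omitted here.)
    s = random.uniform(1.0, 1.5)
    w, h = rgb.size
    new_h, new_w = int(h * s), int(w * s)
    rgb, depth = TF.resize(rgb, [new_h, new_w]), TF.resize(depth, [new_h, new_w])
    top = random.randint(0, new_h - 228)
    left = random.randint(0, new_w - 304)
    rgb = TF.crop(rgb, top, left, 228, 304)
    depth = TF.crop(depth, top, left, 228, 304)
    return rgb, depth
```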
In step S3, in the embodiment of the present invention, the scene depth prediction model is preferably trained based on the ResNet convolutional neural network architecture. Please refer to fig. 7a for the detailed network structure of the ResNet residual network; fig. 6 shows the internal network structure of the feature extractor in fig. 7a. As can be seen from fig. 7a, the RGB image input to the convolutional neural network has an image size of 304 × 228 × 3, and the feature map output by the feature extractor has an image size of 160 × 128 × 64; this feature map is passed through a 3 × 3 convolution to output a feature map of size 160 × 128 × 1, which is then upsampled to finally output a predicted depth map of size 640 × 480.
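Restated as code, the decoding path described above (160 × 128 × 64 feature map, a 3 × 3 convolution down to one channel, then upsampling to 640 × 480) might look like the following sketch; the module name DepthHead and the bilinear upsampling mode are assumptions, and the ResNet feature extractor is taken as given:

```python
import torch.nn as nn

class DepthHead(nn.Module):
    """Illustrative decoding head matching the sizes in the description:
    (N, 64, 128, 160) feature map -> 3 x 3 conv -> (N, 1, 128, 160) ->
    upsampling -> (N, 1, 480, 640) predicted depth map."""

    def __init__(self):
        super().__init__()
        # 3 x 3 convolution reducing the 64-channel feature map to 1 channel.
        self.conv = nn.Conv2d(64, 1, kernel_size=3, padding=1)
        # Upsample the 160 x 128 map to the 640 x 480 output resolution
        # (bilinear interpolation is an assumed choice).
        self.upsample = nn.Upsample(size=(480, 640), mode="bilinear",
                                    align_corners=False)

    def forward(self, features):
        # features: (N, 64, 128, 160); PyTorch uses N x C x H x W ordering,
        # so the 160 x 128 x 64 map in the text appears here as 64 x 128 x 160.
        return self.upsample(self.conv(features))
```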
In the embodiment of the invention, a Huber regression loss function is preferably adopted to verify the model performance of the scene depth prediction initial model. Specifically, the loss between the predicted depth map and the ground truth depth map corresponding to the input RGB image is calculated through the Huber loss function, thereby verifying the model performance of the scene depth prediction initial model. Since the calculation of the Huber loss between the predicted depth map and the ground truth depth map is not itself within the scope of the claimed invention, the loss calculation process is not set forth in detail herein.
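The Huber loss itself is standard; as a reference sketch of what the verification step computes (the threshold delta = 1.0 is an assumed default, matching torch.nn.HuberLoss):

```python
import torch

def huber_loss(pred, target, delta=1.0):
    """Huber regression loss between the predicted depth map and the ground
    truth depth map: quadratic for small residuals, linear for large ones."""
    residual = torch.abs(pred - target)
    quadratic = 0.5 * residual ** 2
    linear = delta * (residual - 0.5 * delta)
    return torch.where(residual <= delta, quadratic, linear).mean()

# PyTorch 1.9+ provides the same computation as torch.nn.HuberLoss(delta=1.0).
```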
Referring to fig. 2, the present invention further provides a depth prediction system for a 2D image scene based on deep learning, which can implement the depth prediction method for an image scene, and the system includes:
the image acquisition module 1 is used for acquiring an RGB-D image from an external image database;
the image storage module 2 is connected with the image acquisition module 1 and is used for storing the acquired RGB-D image;
the image dividing module 3 is connected with the image storage module 2 and is used for dividing the stored RGB-D images into a training set and a verification set according to a preset dividing proportion;
the initial model training module 4 is connected with the image storage module 2 and used for training each RGB-D image in the training set as a training sample to form a scene depth prediction initial model;
the model performance prediction module 5 is connected with the initial model training module 4 and used for verifying the prediction performance of the scene depth prediction initial model by taking the RGB-D images in the verification set as verification samples to obtain a verification result;
the verification result display module 6 is connected with the model performance prediction module 5 and used for displaying the verification result to the user;
the model parameter adjusting module 7 is connected with the verification result displaying module 6 and used for providing the user with the model training parameters adjusted according to the verification result;
the model updating training module 8 is respectively connected with the image storage module 2, the initial model training module 4 and the model parameter adjusting module 7 and is used for updating and training the depth prediction initial model by taking each RGB-D image in the training set as a training sample according to the adjusted model parameters and finally training to form a scene depth prediction model;
and the scene depth prediction module 9 is connected with the model updating training module 8 and used for performing image scene depth prediction on the input RGB image through the scene depth prediction model.
The invention also provides a deep learning-based 2D image scene semantic segmentation method, please refer to fig. 3 and 7b, which specifically comprises the following steps:
step L1, inputting the RGB-D image into a feature extractor to extract the RGB feature map corresponding to the RGB image in the RGB-D image;
step L2, inputting the ground truth-value depth map corresponding to the RGB-D image into a feature extractor to extract a truth-value depth feature map corresponding to the ground truth-value depth map;
l3, carrying out image fusion on the RGB feature map and the true-value depth feature map to obtain a feature fusion map;
and L4, performing semantic segmentation on the feature fusion graph through a pre-trained semantic segmentation model, and outputting a semantic segmentation result.
Please refer to fig. 6 for the internal network structure of the feature extractor described in steps L1 and L2. The RGB image input to the feature extractor, or the ground truth depth map corresponding to it, has an image size of 304 × 228 × 3, and the RGB feature map, or true-value depth feature map, output by the feature extractor has an image size of 160 × 128 × 64.
In step L3, the image size of the feature fusion map formed by image fusion of the RGB feature map and the true-value depth feature map is 160 × 128 × 64.
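The description does not state which fusion operator is used; since the fused map keeps the 160 × 128 × 64 size of each input feature map, element-wise addition is one size-consistent choice (channel-wise concatenation would instead yield 128 channels). A minimal sketch under that assumption:

```python
import torch

def fuse_features(rgb_feat, depth_feat):
    """Element-wise fusion of the RGB feature map and the true-value depth
    feature map. Both are (N, 64, 128, 160) tensors, and so is the result,
    matching the 160 x 128 x 64 fusion map size in the description. Addition
    rather than concatenation is an assumption."""
    assert rgb_feat.shape == depth_feat.shape
    return rgb_feat + depth_feat
```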
In step L4, the network structure of the convolutional neural network for training the semantic segmentation model at least includes an up-convolution layer, a first convolution layer, a second convolution layer, and an upsampling layer, wherein the first convolution layer is connected to the up-convolution layer, the second convolution layer is connected to the first convolution layer, and the upsampling layer is connected to the second convolution layer; the up-convolution layer performs an up-convolution operation on the feature fusion map, and the upsampling layer upsamples the feature map output by the second convolution layer and outputs the semantic segmentation result.
In this embodiment, the convolution kernel size of the first convolution layer and the second convolution layer is preferably 3 × 3. The segmentation map output by the semantic segmentation model has a size of 640 × 480 × 38, where 38 is the number of semantic labels in the segmentation.
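Putting the named layers together (up-convolution, two 3 × 3 convolutions, upsampling to 640 × 480 × 38), a hypothetical rendering of the segmentation decoder follows; the intermediate channel width, the stride of the up-convolution, and the ReLU activations are assumptions not fixed by the description:

```python
import torch
import torch.nn as nn

class SegDecoder(nn.Module):
    """Illustrative segmentation decoder: an up-convolution on the
    160 x 128 x 64 feature fusion map, two 3 x 3 convolutions, and
    upsampling to a 640 x 480 x 38 segmentation map (38 semantic labels)."""

    def __init__(self, in_ch=64, mid_ch=64, n_classes=38):
        super().__init__()
        # Up-convolution (transposed convolution) doubling the spatial size.
        self.upconv = nn.ConvTranspose2d(in_ch, mid_ch, kernel_size=2, stride=2)
        self.conv1 = nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(mid_ch, n_classes, kernel_size=3, padding=1)
        self.upsample = nn.Upsample(size=(480, 640), mode="bilinear",
                                    align_corners=False)

    def forward(self, fused):
        x = torch.relu(self.upconv(fused))   # (N, 64, 128, 160) -> (N, 64, 256, 320)
        x = torch.relu(self.conv1(x))
        x = self.conv2(x)                    # -> (N, 38, 256, 320)
        return self.upsample(x)              # -> (N, 38, 480, 640)
```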
Referring to fig. 4, the method for training the semantic segmentation model according to the embodiment of the present invention includes the following steps:
step M1, acquiring an RGB-D image and a ground truth value depth map corresponding to the RGB-D image to form an image data set;
step M2, dividing the image data set into a training set and a verification set according to a preset division ratio;
step M3, taking each RGB-D image in the training set and the ground truth depth map corresponding to the RGB-D image as training samples, and training based on a convolutional neural network to form a semantic segmentation initial model; preferably, the semantic segmentation initial model is trained by adopting a ResNet convolutional neural network architecture;
step M4, performing semantic segmentation on the RGB images in the verification set through the semantic segmentation initial model to verify the model performance of the semantic segmentation initial model and obtain a verification result; preferably, the model performance of the semantic segmentation initial model is verified through an L2 loss function; the specific verification process is not described herein;
and step M5, adjusting model training parameters according to the verification result, then updating and training the semantic segmentation initial model by taking the image data in the training set as a training sample, and finally training to form a semantic segmentation model.
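For step M4, a sketch of an L2 verification loss over the segmentation output follows; comparing logits against one-hot labels is an assumed reading, since the description does not fix the exact formulation:

```python
import torch
import torch.nn.functional as F

def l2_validation_loss(logits, labels, n_classes=38):
    """Mean squared (L2) error between the predicted segmentation map and the
    one-hot encoded ground truth labels, as one possible reading of the L2
    loss in step M4. logits: (N, 38, H, W); labels: (N, H, W) integer map."""
    one_hot = F.one_hot(labels, n_classes).permute(0, 3, 1, 2).float()
    return torch.mean((logits - one_hot) ** 2)
```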
Referring to fig. 5, the present invention further provides a deep learning-based 2D image scene semantic segmentation system, which can implement the image scene semantic segmentation method described above, and the system includes:
the first image acquisition module 11 is used for acquiring and storing an RGB-D image;
the second image acquisition module 12 is configured to acquire a ground truth depth map corresponding to the RGB-D image;
the first image feature extraction module 13 is connected to the first image acquisition module 11, and is configured to extract an RGB feature map corresponding to an RGB image in the RGB-D image; referring to fig. 7a, the image size of the RGB feature map is 160 × 128 × 64;
the second image feature extraction module 14 is connected to the second image acquisition module 12, and is configured to extract a true-value depth feature map corresponding to the ground truth depth map; referring to fig. 7b, the image size of the true-value depth feature map is also 160 × 128 × 64;
the feature fusion module 15 is respectively connected to the first image feature extraction module 13 and the second image feature extraction module 14, and is configured to perform image fusion on the RGB feature map and the true-value depth feature map to obtain a feature fusion map; the image size of the feature fusion map is 160 × 128 × 64;
and the semantic segmentation module 16 is connected with the feature fusion module 15 and used for performing semantic segmentation on the feature fusion map through a pre-trained semantic segmentation model and outputting a semantic segmentation result.
It should be understood that the above-described embodiments are merely preferred embodiments of the invention and the technical principles applied thereto. It will be understood by those skilled in the art that various modifications, equivalents, changes, and the like can be made to the present invention. However, such variations are within the scope of the invention as long as they do not depart from the spirit of the invention. In addition, certain terms used in the specification and claims of the present application are not limiting, but are used merely for convenience of description.

Claims (10)

1. A depth prediction method for a 2D image scene based on deep learning is characterized by comprising the following steps:
acquiring a plurality of RGB-D images to form an image data set;
dividing the image data set into a training set and a verification set according to a preset division ratio;
taking each RGB-D image in a training set as a training sample, and training on the basis of a convolutional neural network to form a scene depth prediction initial model;
scene depth prediction is carried out on the RGB-D images in the verification set through the scene depth prediction initial model so as to verify the model performance of the scene depth prediction initial model and obtain a verification result;
adjusting model training parameters according to the verification result, then performing update training on the depth prediction initial model by taking the RGB-D images in the training set as training samples, and finally training to form a scene depth prediction model;
and performing scene depth prediction on the input RGB image through the scene depth prediction model, and outputting a scene depth prediction result.
2. The method of claim 1, wherein the training samples for training the scene depth prediction model are augmented by any one or more of random flipping, random cropping, scaling, or random rotation of the RGB-D image.
3. The 2D image scene depth prediction method of claim 1, wherein the scene depth prediction model is trained based on a ResNet convolutional neural network architecture.
4. The method for 2D image scene depth prediction according to claim 1, wherein a Huber regression loss function is used to verify the model performance of the scene depth prediction initial model.
5. A deep learning-based 2D image scene depth prediction system capable of implementing the image scene depth prediction method according to any one of claims 1 to 4, characterized by comprising:
the image acquisition module is used for acquiring the RGB-D image from an external image database;
the image storage module is connected with the image acquisition module and used for storing the acquired RGB-D image;
the image dividing module is connected with the image storage module and is used for dividing the stored RGB-D images into a training set and a verification set according to a preset dividing proportion;
the initial model training module is connected with the image storage module and used for taking each RGB-D image in the training set as a training sample and training to form the scene depth prediction initial model;
the model performance prediction module is respectively connected with the initial model training module and the image storage module and is used for verifying the prediction performance of the scene depth prediction initial model by taking the RGB-D images in the verification set as verification samples to obtain a verification result;
the verification result display module is connected with the model performance prediction module and used for displaying the verification result to a user;
the model parameter adjusting module is connected with the verification result display module and used for providing the user with model training parameters adjusted according to the verification result;
the model updating training module is respectively connected with the image storage module, the initial model training module and the model parameter adjusting module and is used for updating and training the depth prediction initial model by taking each RGB-D image in a training set as a training sample according to the adjusted model parameter to finally train and form the scene depth prediction model;
and the scene depth prediction module is connected with the model updating training module and used for performing image scene depth prediction on the input RGB image through the scene depth prediction model.
6. A2D image scene semantic segmentation method based on deep learning is characterized by comprising the following steps:
inputting the RGB-D image into a feature extractor to extract an RGB feature map corresponding to the RGB image in the RGB-D image;
inputting the ground truth-value depth map corresponding to the RGB-D image into the feature extractor to extract a truth-value depth feature map corresponding to the ground truth-value depth map;
carrying out image fusion on the RGB feature map and the true value depth feature map to obtain a feature fusion map;
and performing semantic segmentation on the feature fusion graph through a pre-trained semantic segmentation model, and outputting a semantic segmentation result.
7. The method for deep learning based 2D image scene semantic segmentation as claimed in claim 6, wherein the method for training the semantic segmentation model comprises:
acquiring the RGB-D image and the ground truth value depth map corresponding to the RGB-D image to form an image data set;
dividing the image data set into a training set and a verification set according to a preset division ratio;
taking each RGB-D image in the training set and the ground truth-value depth map corresponding to the RGB-D image as training samples, and training on the basis of a convolutional neural network to form a semantic segmentation initial model;
performing semantic segmentation on the RGB images in the verification set through the semantic segmentation initial model to verify the model performance of the semantic segmentation initial model to obtain a verification result;
and adjusting model training parameters according to the verification result, then updating and training the semantic segmentation initial model by taking the image data in the training set as a training sample, and finally training to form the semantic segmentation model.
8. The method according to claim 7, wherein the image size of the RGB feature map or the true-value depth feature map output by the feature extractor is 160 × 128 × 64.
9. The deep learning-based 2D image scene semantic segmentation method according to claim 6, wherein a network structure of the convolutional neural network for training the semantic segmentation model at least includes an up-convolution layer, a first convolution layer, a second convolution layer, and an upsampling layer, the first convolution layer is connected to the up-convolution layer, the second convolution layer is connected to the first convolution layer, and the upsampling layer is connected to the second convolution layer; the up-convolution layer performs an up-convolution operation on the feature fusion map, and the upsampling layer upsamples the feature map output by the second convolution layer and outputs a semantic segmentation result.
10. A deep learning-based 2D image scene semantic segmentation system capable of implementing the image scene semantic segmentation method according to any one of claims 6 to 9, characterized by comprising:
the first image acquisition module is used for acquiring and storing the RGB-D image;
the second image acquisition module is used for acquiring the ground truth value depth map corresponding to the RGB-D image;
the first image feature extraction module is connected with the first image acquisition module and is used for extracting the RGB feature map corresponding to the RGB image in the RGB-D image;
the second image feature extraction module is connected with the second image acquisition module and is used for extracting the true value depth feature map corresponding to the ground true value depth map;
the feature fusion module is respectively connected with the first image feature extraction module and the second image feature extraction module and is used for carrying out image fusion on the RGB feature map and the truth-value depth feature map to obtain a feature fusion map;
and the semantic segmentation module is connected with the feature fusion module and used for performing semantic segmentation on the feature fusion graph through a pre-trained semantic segmentation model and outputting a semantic segmentation result.
CN202010380353.4A 2020-05-09 2020-05-09 Deep learning-based 2D image scene depth prediction and semantic segmentation method and system Pending CN111583322A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010380353.4A CN111583322A (en) 2020-05-09 2020-05-09 Deep learning-based 2D image scene depth prediction and semantic segmentation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010380353.4A CN111583322A (en) 2020-05-09 2020-05-09 Deep learning-based 2D image scene depth prediction and semantic segmentation method and system

Publications (1)

Publication Number Publication Date
CN111583322A true CN111583322A (en) 2020-08-25

Family

ID=72112565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010380353.4A Pending CN111583322A (en) 2020-05-09 2020-05-09 Depth learning-based 2D image scene depth prediction and semantic segmentation method and system

Country Status (1)

Country Link
CN (1) CN111583322A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180150727A1 (en) * 2016-11-29 2018-05-31 Sap Se Object Detection in Image Data Using Depth Segmentation
CN107403430A (en) * 2017-06-15 2017-11-28 中山大学 A kind of RGBD image, semantics dividing method
CN110599533A (en) * 2019-09-20 2019-12-20 湖南大学 Rapid monocular depth estimation method suitable for embedded platform
CN111080659A (en) * 2019-12-19 2020-04-28 哈尔滨工业大学 Environmental semantic perception method based on visual information

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
代具亭 et al.: "Scene semantic segmentation network based on color-depth images and deep learning", Science Technology and Engineering *
王子羽 et al.: "Optimization of indoor scene semantic segmentation networks based on RGB-D images", Automation & Information Engineering *
袁建中 et al.: "Depth estimation of road scenes based on deep convolutional neural networks", Laser & Optoelectronics Progress *
韦鹏程 et al.: Integration and Development of Big Data Analytics and Machine Learning, 31 May 2017, University of Electronic Science and Technology Press *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990304A (en) * 2021-03-12 2021-06-18 国网智能科技股份有限公司 Semantic analysis method and system suitable for power scene
CN112990304B (en) * 2021-03-12 2024-03-12 国网智能科技股份有限公司 Semantic analysis method and system suitable for power scene
CN114022871A (en) * 2021-11-10 2022-02-08 中国民用航空飞行学院 Unmanned aerial vehicle driver fatigue detection method and system based on depth perception technology
WO2023138062A1 (en) * 2022-01-19 2023-07-27 美的集团(上海)有限公司 Image processing method and apparatus

Similar Documents

Publication Publication Date Title
US10789504B2 (en) Method and device for extracting information in histogram
CN109508681B (en) Method and device for generating human body key point detection model
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
EP3916627A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN108830285B (en) Target detection method for reinforcement learning based on fast-RCNN
CN109960742B (en) Local information searching method and device
US9042648B2 (en) Salient object segmentation
CN111583322A (en) Depth learning-based 2D image scene depth prediction and semantic segmentation method and system
CN111652869B (en) Slab void identification method, system, medium and terminal based on deep learning
WO2020101777A1 (en) Segmenting objects by refining shape priors
CN111369581A (en) Image processing method, device, equipment and storage medium
CN113128271A (en) Counterfeit detection of face images
CN110781980B (en) Training method of target detection model, target detection method and device
CN116051953A (en) Small target detection method based on selectable convolution kernel network and weighted bidirectional feature pyramid
CN113255557B (en) Deep learning-based video crowd emotion analysis method and system
CN109977834B (en) Method and device for segmenting human hand and interactive object from depth image
CN112651333B (en) Silence living body detection method, silence living body detection device, terminal equipment and storage medium
CN112101344B (en) Video text tracking method and device
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN111210417B (en) Cloth defect detection method based on convolutional neural network
CN112990213B (en) Digital multimeter character recognition system and method based on deep learning
CN116977683A (en) Object recognition method, apparatus, computer device, storage medium, and program product
CN112651351B (en) Data processing method and device
CN111325194B (en) Character recognition method, device and equipment and storage medium
CN113706636A (en) Method and device for identifying tampered image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200825