CN113128519A - Multi-modal multi-stitched RGB-D saliency object detection method - Google Patents

Multi-modal multi-stitched RGB-D saliency object detection method

Info

Publication number
CN113128519A
CN113128519A (application CN202110461176.7A; granted publication CN113128519B)
Authority
CN
China
Prior art keywords
image
depth
rgb
region
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110461176.7A
Other languages
Chinese (zh)
Other versions
CN113128519B (en)
Inventor
陈莉
赵志华
Current Assignee
Northwestern University
Original Assignee
Northwestern University
Priority date
Filing date
Publication date
Application filed by Northwestern University
Priority to CN202110461176.7A
Publication of CN113128519A
Application granted
Publication of CN113128519B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/507 Summing image-intensity values; Histogram projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-modal multi-stitched RGB-D saliency object detection method comprising the following steps. S1: divide the image into non-overlapping sub-regions and, for each image sub-region, extract RGB image color information, depth information from the Depth image, and symmetric invariant LBP features, then form a region histogram from the symmetric invariant LBP features. S2: measure the correlation among the RGB image color information, the depth information of the Depth image, and the region histogram using class-conditional mutual information entropy, and fuse the three at the score level with an adaptive score-fusion algorithm to obtain a final score for each image sub-region. S3: detect the salient objects of the image from the final score of each image sub-region. The method detects salient objects quickly and efficiently while maintaining high detection accuracy.

Description

Multi-modal multi-stitched RGB-D saliency object detection method
Technical Field
The invention relates to the field of image detection, and in particular to a multi-modal multi-stitched RGB-D saliency object detection method.
Background
Salient object detection is an important component of computer vision, and as the field continues to develop, detection methods that are both more efficient and more accurate are urgently needed.
During the development of saliency detection, various methods have emerged that exploit, for example, the color features, position information, and texture features of images. Some conventional methods use center priors, edge priors, semantic priors, and so on. However, these models often fail when the color scene in the image is very complex, when there is no significant contrast between the object and the background, or when the lighting makes similar-looking objects difficult to distinguish by these features.
Disclosure of Invention
To solve these problems, the invention provides a multi-modal multi-stitched RGB-D saliency object detection method that detects salient objects quickly and efficiently while maintaining detection accuracy.
At the same time, the complementarity of the visible-light camera and the near-infrared camera is exploited: features are simultaneously extracted from the visible-light face images with a deep learning algorithm, and a fusion algorithm then performs hierarchical fusion of the features extracted by the deep learning models, so that the strengths of the modalities complement one another.
To achieve this purpose, the invention adopts the following technical scheme.
A multi-modal multi-stitched RGB-D saliency object detection method comprises the following steps:
S1, dividing the image into non-overlapping sub-regions and, for each image sub-region, extracting RGB image color information, depth information from the Depth image, and symmetric invariant LBP features, then forming a region histogram from the symmetric invariant LBP features;
S2, measuring the correlation among the RGB image color information, the depth information of the Depth image, and the region histogram with class-conditional mutual information entropy, and fusing the three at the score level with an adaptive score-fusion algorithm to obtain a final score for each image sub-region;
S3, detecting the salient objects of the image from the final score of each image sub-region.
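The patent names "symmetric invariant LBP" features for step S1 without fixing the exact variant. As an illustration only, a center-symmetric LBP (one common symmetric choice, here with 4 opposing neighbour pairs and a 16-bin histogram) shows how a region histogram can be formed; the function name and bin layout are assumptions, not the patent's specification.

```python
import numpy as np

def cs_lbp_histogram(gray, threshold=0.0):
    """Center-symmetric LBP histogram for one grayscale sub-region.

    Hypothetical stand-in for the patent's "symmetric invariant LBP":
    each interior pixel compares 4 center-symmetric neighbour pairs,
    yielding a 4-bit code (0-15) and thus a 16-bin histogram.
    """
    g = np.asarray(gray, dtype=np.float64)
    c = g[1:-1, 1:-1]  # interior pixels only
    # four center-symmetric neighbour pairs around each interior pixel
    pairs = [
        (g[:-2, 1:-1], g[2:, 1:-1]),   # N  vs S
        (g[:-2, 2:],   g[2:, :-2]),    # NE vs SW
        (g[1:-1, 2:],  g[1:-1, :-2]),  # E  vs W
        (g[2:, 2:],    g[:-2, :-2]),   # SE vs NW
    ]
    code = np.zeros_like(c, dtype=np.int64)
    for bit, (a, b) in enumerate(pairs):
        code |= ((a - b) > threshold).astype(np.int64) << bit
    hist = np.bincount(code.ravel(), minlength=16).astype(np.float64)
    return hist / hist.sum()  # normalised 16-bin region histogram
```

Each sub-region's grayscale patch would be passed independently, and the normalised histogram becomes that region's texture descriptor alongside the color and depth information.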
Further, in step S1, the extraction of the RGB image color information, the depth information of the Depth image, and the symmetric invariant LBP features of each sub-region is implemented with a deep convolutional network.
Further, in step S1, the targets carried in the image are first detected with the Dssd_Xception_coco model, and the image sub-regions are then divided according to the detection results.
Further, each detected target is assigned its own image sub-region, and the remaining background forms one additional sub-region.
Further, the Dssd_Xception_coco model uses the DSSD object detection algorithm: an Xception neural network is pre-trained on the COCO dataset, the model is then trained on a previously prepared dataset, the parameters of the deep neural network are fine-tuned, and a detection model suitable for detecting each target carried in the image is finally obtained.
Further, in step S3, saliency object detection is performed from the final score of each image sub-region with the ResNet50 model.
Further, the method also includes recognizing the human-eye viewing angle of the image, where different viewing angles correspond to different image deflection-angle adjustment models, and the image deflection angle is adjusted with the corresponding model.
The invention has the following beneficial effects:
salient objects are detected quickly and efficiently, with high detection accuracy.
Drawings
FIG. 1 is a flowchart of the multi-modal multi-stitched RGB-D saliency object detection method according to embodiment 1 of the present invention.
FIG. 2 is a flowchart of the multi-modal multi-stitched RGB-D saliency object detection method according to embodiment 2 of the present invention.
Detailed Description
In order that the objects and advantages of the invention will be more clearly understood, the invention is further described in detail below with reference to examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
A multi-modal multi-stitched RGB-D saliency object detection method comprises the following steps:
S1, dividing the image into non-overlapping sub-regions and, for each image sub-region, extracting RGB image color information, depth information from the Depth image, and symmetric invariant LBP features, then forming a region histogram from the symmetric invariant LBP features;
S2, measuring the correlation among the RGB image color information, the depth information of the Depth image, and the region histogram with class-conditional mutual information entropy, and fusing the three at the score level with an adaptive score-fusion algorithm to obtain a final score for each image sub-region;
S3, detecting the salient objects of the image from the final score of each image sub-region with the ResNet50 model.
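The adaptive score-level fusion of step S2 is not given in closed form in the text. One plausible sketch weights each modality's sub-region scores by a scalar relevance (for instance, a class-conditional mutual-information estimate between that modality and saliency labels) and sums the min-max-normalised scores; all names and the exact weighting rule here are illustrative assumptions.

```python
import numpy as np

def adaptive_score_fusion(modality_scores, relevances):
    """Weighted score-level fusion across modalities.

    modality_scores: dict name -> per-sub-region scores (one value per region)
    relevances:      dict name -> scalar relevance, e.g. a class-conditional
                     mutual-information estimate for that modality.
    Hypothetical sketch; the patent does not spell out the fusion formula.
    """
    names = sorted(modality_scores)
    w = np.array([max(relevances[n], 0.0) for n in names])
    # normalise weights; fall back to uniform if all relevances are zero
    w = w / w.sum() if w.sum() > 0 else np.full(len(names), 1.0 / len(names))
    fused = 0.0
    for weight, n in zip(w, names):
        s = np.asarray(modality_scores[n], dtype=np.float64)
        rng = s.max() - s.min()
        # min-max normalise each modality so scores are comparable
        s = (s - s.min()) / rng if rng > 0 else np.zeros_like(s)
        fused = fused + weight * s
    return fused  # final score per image sub-region
```

A modality whose scores correlate more strongly with saliency thus contributes more to each sub-region's final score.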
In this embodiment, in step S1, the extraction of the RGB image color information, the depth information of the Depth image, and the symmetric invariant LBP features of each sub-region is implemented with a deep convolutional network.
In this embodiment, in step S1, the targets carried in the image are first detected with the Dssd_Xception_coco model, and the image sub-regions are then divided according to the detection results. Each detected target is assigned its own image sub-region, and the remaining background forms one additional sub-region. The Dssd_Xception_coco model uses the DSSD object detection algorithm: an Xception neural network is pre-trained on the COCO dataset, the model is then trained on a previously prepared dataset, the parameters of the deep neural network are fine-tuned, and a detection model suitable for detecting each target carried in the image (such as people, trees, and furniture) is finally obtained.
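The division rule above (one sub-region per detected target, one for the remaining background) can be sketched as a label map built from detector boxes. The box format and the overwrite rule for overlapping boxes are assumptions; the text does not specify how overlaps are resolved.

```python
import numpy as np

def boxes_to_subregions(h, w, boxes):
    """Assign one sub-region label per detected target; label 0 = background.

    boxes: list of (x1, y1, x2, y2) pixel boxes, assumed to come from a
    detector such as the Dssd_Xception_coco model named in the text.
    Later boxes overwrite earlier ones where they overlap (an assumption).
    """
    labels = np.zeros((h, w), dtype=np.int32)
    for i, (x1, y1, x2, y2) in enumerate(boxes, start=1):
        labels[y1:y2, x1:x2] = i  # target i gets its own sub-region
    return labels
```

Every pixel left at label 0 forms the single background sub-region, matching the "remaining background" clause.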
Example 2
A multi-modal multi-stitched RGB-D saliency object detection method comprises the following steps:
S1, recognizing the human-eye viewing angle of the image, where different viewing angles correspond to different image deflection-angle adjustment models, and adjusting the image deflection angle with the corresponding model;
S2, dividing the image into non-overlapping sub-regions and, for each image sub-region, extracting RGB image color information, depth information from the Depth image, and symmetric invariant LBP features, then forming a region histogram from the symmetric invariant LBP features;
S3, measuring the correlation among the RGB image color information, the depth information of the Depth image, and the region histogram with class-conditional mutual information entropy, and fusing the three at the score level with an adaptive score-fusion algorithm to obtain a final score for each image sub-region;
S4, detecting the salient objects of the image from the final score of each image sub-region with the ResNet50 model.
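The final step turns per-sub-region scores into a detection output; one simple way to visualise it is to paint each sub-region's final score into a pixel-wise saliency map and normalise to [0, 1]. This painting step is an illustrative assumption, not the patent's stated ResNet50 procedure.

```python
import numpy as np

def scores_to_saliency_map(labels, scores):
    """Paint each sub-region's final score into a per-pixel saliency map.

    labels: int array, one sub-region id per pixel (0 = background)
    scores: dict region id -> final fused score for that sub-region
    Hypothetical visualisation; the patent feeds the scores to ResNet50.
    """
    sal = np.zeros(labels.shape, dtype=np.float64)
    for region_id, s in scores.items():
        sal[labels == region_id] = s
    mn, mx = sal.min(), sal.max()
    # normalise to [0, 1] unless the map is constant
    return (sal - mn) / (mx - mn) if mx > mn else sal
```

Thresholding such a map at a chosen value would yield a binary salient-object mask.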
In this embodiment, in step S2, the extraction of the RGB image color information, the depth information of the Depth image, and the symmetric invariant LBP features of each sub-region is implemented with a deep convolutional network.
In this embodiment, in step S2, the targets carried in the image are first detected with the Dssd_Xception_coco model, and the image sub-regions are then divided according to the detection results; each detected target is assigned its own image sub-region, and the remaining background forms one additional sub-region. The Dssd_Xception_coco model uses the DSSD object detection algorithm: an Xception neural network is pre-trained on the COCO dataset, the model is then trained on a previously prepared dataset, the parameters of the deep neural network are fine-tuned, and a detection model suitable for detecting each target carried in the image (such as people, trees, and furniture) is finally obtained.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications also fall within its scope of protection.

Claims (7)

1. A multi-modal multi-stitched RGB-D saliency object detection method, characterized by comprising the following steps:
S1, dividing the image into non-overlapping sub-regions and, for each image sub-region, extracting RGB image color information, depth information from the Depth image, and symmetric invariant LBP features, then forming a region histogram from the symmetric invariant LBP features;
S2, measuring the correlation among the RGB image color information, the depth information of the Depth image, and the region histogram with class-conditional mutual information entropy, and fusing the three at the score level with an adaptive score-fusion algorithm to obtain a final score for each image sub-region;
S3, detecting the salient objects of the image from the final score of each image sub-region.
2. The method according to claim 1, wherein in step S1 the extraction of the RGB image color information, the depth information of the Depth image, and the symmetric invariant LBP features of each sub-region is implemented with a deep convolutional network.
3. The method according to claim 1, wherein in step S1 the targets carried in the image are first detected with the Dssd_Xception_coco model, and the image sub-regions are then divided according to the detection results.
4. The method according to claim 3, wherein each detected target is assigned its own image sub-region, and the remaining background forms one additional sub-region.
5. The method according to claim 3, wherein the Dssd_Xception_coco model uses the DSSD object detection algorithm: an Xception neural network is pre-trained on the COCO dataset, the model is then trained on a previously prepared dataset, the parameters of the deep neural network are fine-tuned, and a detection model suitable for detecting the targets carried in the image is finally obtained.
6. The method according to claim 1, wherein in step S3 saliency object detection is performed from the final score of each image sub-region with the ResNet50 model.
7. The method according to claim 1, further comprising: recognizing the human-eye viewing angle of the image, wherein different viewing angles correspond to different image deflection-angle adjustment models, and the image deflection angle is adjusted with the corresponding model.
CN202110461176.7A 2021-04-27 2021-04-27 Multi-modal multi-stitched RGB-D saliency object detection method Active CN113128519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110461176.7A CN113128519B (en) 2021-04-27 2021-04-27 Multi-modal multi-stitched RGB-D saliency object detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110461176.7A CN113128519B (en) 2021-04-27 2021-04-27 Multi-modal multi-stitched RGB-D saliency object detection method

Publications (2)

Publication Number Publication Date
CN113128519A (en) 2021-07-16
CN113128519B CN113128519B (en) 2023-08-08

Family

ID=76780202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110461176.7A Active CN113128519B (en) 2021-04-27 2021-04-27 Multi-modal multi-stitched RGB-D saliency object detection method

Country Status (1)

Country Link
CN (1) CN113128519B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015180527A1 (en) * 2014-05-26 2015-12-03 Tsinghua University Shenzhen Graduate School Image saliency detection method
CN107145892A (en) * 2017-05-24 2017-09-08 Peking University Shenzhen Graduate School Image salient object detection method based on an adaptive fusion mechanism
CN107909078A (en) * 2017-10-11 2018-04-13 Tianjin University Inter-image saliency detection method
CN108345892A (en) * 2018-01-03 2018-07-31 Shenzhen University Detection method, device, equipment and storage medium for stereo-image saliency
CN108846416A (en) * 2018-05-23 2018-11-20 Beijing Institute of New Technology Applications Extraction and processing method and system for specific images
CN111353508A (en) * 2019-12-19 2020-06-30 South China University of Technology Saliency detection method and device based on RGB-image pseudo-depth information


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PIPIT UTAMI: "A Study on Facial Expression Recognition in Assessing Teaching Skills: Datasets and Methods", Procedia Computer Science, vol. 161 *
吴建国; 邵婷; 刘政怡: "Salient object detection in RGB-D images fusing salient depth features", Journal of Electronics & Information Technology, no. 09 *
赵轩; 郭蔚; 刘京: "Stepwise superpixel aggregation and multi-modal fusion object detection in RGB-D images", Journal of Image and Graphics, no. 08 *

Also Published As

Publication number Publication date
CN113128519B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
Borji et al. Adaptive object tracking by learning background context
KR102596897B1 (en) Method of motion vector and feature vector based fake face detection and apparatus for the same
EP3916627A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
JP4755202B2 (en) Face feature detection method
Ghimire et al. A robust face detection method based on skin color and edges
Kumano et al. Pose-invariant facial expression recognition using variable-intensity templates
JP4597391B2 (en) Facial region detection apparatus and method, and computer-readable recording medium
CN111460884A (en) Multi-face recognition method based on human body tracking
CN110796101A (en) Face recognition method and system of embedded platform
CN111814655B (en) Target re-identification method, network training method thereof and related device
Li et al. Robust multiperson detection and tracking for mobile service and social robots
Mu Ear detection based on skin-color and contour information
Jabnoun et al. Visual substitution system for blind people based on SIFT description
Kerdvibulvech Human hand motion recognition using an extended particle filter
Chen et al. Fast eye detection using different color spaces
CN116386118B (en) Drama matching cosmetic system and method based on human image recognition
CN109993090B (en) Iris center positioning method based on cascade regression forest and image gray scale features
Nanda et al. A robust elliptical head tracker
CN113128519A (en) Multi-modal multi-stitched RGB-D saliency object detection method
Bagherian et al. Extract of facial feature point
Bongale et al. Implementation of 3D object recognition and tracking
CN114445691A (en) Model training method and device, electronic equipment and storage medium
WO2021056531A1 (en) Face gender recognition method, face gender classifier training method and device
Naveena et al. Partial face recognition by template matching
Sharma et al. Study and implementation of face detection algorithm using Matlab

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant