CN107886533A - Visual saliency detection method, device, equipment and storage medium for stereo images - Google Patents
Classifications
- G06T7/593 — Image analysis; depth or shape recovery from multiple images, from stereo images
- G06T7/90 — Image analysis; determination of colour characteristics
- G06T2207/10012 — Image acquisition modality: stereo images
- G06T2207/10024 — Image acquisition modality: color image
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
Abstract
The present invention is applicable to the field of computer technology and provides a visual saliency detection method, device, equipment and storage medium for stereo images. The method includes: when a visual saliency detection request for a stereo image is received, first obtaining the color information and depth information of the stereo image; then performing saliency prediction on the color information, on the depth information, and on the color and depth information jointly, to obtain a first, a second and a third saliency prediction respectively; then concatenating the first, second and third saliency predictions with a plurality of preset center-bias priors to obtain multi-channel cascade data; and finally performing multi-channel spatial-difference fusion on the multi-channel cascade data through a preset cross-channel fusion network to obtain the saliency map of the stereo image, thereby improving the accuracy of saliency detection.
Description
Technical field
The invention belongs to the field of computer technology, and in particular relates to a visual saliency detection method, device, equipment and storage medium for stereo images.
Background technology
In recent years, deep learning models such as convolutional neural networks have been widely applied to visual saliency detection and have markedly improved the performance of visual saliency models. As a result, a large number of deep-learning-based 2D visual saliency models have been proposed. Vig et al. were the first to attempt a visual saliency detection model built on convolutional neural networks, named the ensemble of deep networks (eDN). Later, Kummerer et al. proposed a saliency model that uses an existing neural network to extract deep learning features and then uses those features to compute the visual saliency of images. Srinivas et al. designed a saliency model that, owing to the spatial invariance of fully convolutional networks, uses a novel location-based convolutional network to model location-dependent patterns. Huang et al. proposed a deep-neural-network-based saliency method for narrowing the gap between model predictions and human eye-fixation behavior; the model uses image information at different scales and fine-tunes a deep neural network with objective functions based on saliency evaluation metrics. Marcella et al. further proposed a novel saliency attention model for natural images. However, all of these methods were proposed for 2D multimedia applications.
Unlike traditional 2D saliency models, only a small number of saliency models use depth maps to predict human-eye regions of interest in a 3D natural scene, fusing the resulting color and depth feature maps by linear summation to generate a final 3D saliency map. Some 3D image saliency computation models have also been proposed by extending traditional 2D visual saliency models. For example, Neil et al. proposed a stereo attention framework by extending existing attention models from 2D to the binocular domain. Zhang et al. used multiple perceptual stimuli in a stereoscopic visual attention model. To generate the final saliency of a 3D image, some models reweight 2D saliency maps with depth information. Lang et al. used eye-tracking results collected on 2D and 3D images for depth saliency analysis, in which 3D saliency maps are computed by extending previous 2D saliency detection models. Recently, Fang et al. proposed combining information such as color, luminance, texture and depth to generate the saliency map of a 3D image.
Although taking depth features into account has improved the performance of saliency detection models for stereo images, existing saliency detection models still face challenging problems in representing the content of stereo images. Traditional hand-crafted feature extraction methods struggle to extract high-level image semantic information, and traditional stereo-image saliency fusion methods cannot capture the spatial correlation between the color and depth information of a stereo image. In addition, linear fusion methods merge the extracted feature maps by simple summation, without considering their spatial differences. In summary, existing stereo-image saliency detection models lack diversified representations of image content and do not account for the spatial differences between features such as color and depth.
The content of the invention
It is an object of the present invention to provide a visual saliency detection method, device, equipment and storage medium for stereo images, aiming to solve the problem that existing stereo-image visual saliency detection methods produce inaccurate saliency detection because they lack diversified image content representations and ignore the spatial differences between color features and depth features.
In one aspect, the invention provides a visual saliency detection method for stereo images, the method including the following steps:
when a visual saliency detection request for a stereo image is received, obtaining the color information and depth information of the stereo image;
performing saliency prediction on the color information through a preset color saliency prediction network to obtain a first saliency prediction of the stereo image, performing saliency prediction on the depth information through a preset depth saliency prediction network to obtain a second saliency prediction of the stereo image, and performing saliency prediction on the color information and the depth information through a preset joint saliency prediction network to obtain a third saliency prediction of the stereo image;
concatenating the first saliency prediction, the second saliency prediction and the third saliency prediction with a plurality of preset center-bias priors to obtain multi-channel cascade data;
performing multi-channel spatial-difference fusion on the multi-channel cascade data through a preset cross-channel fusion network to obtain the saliency map of the stereo image.
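As a minimal illustration, the steps above can be sketched end to end in NumPy. The placeholder predictor and the mean-based fusion below are hypothetical stand-ins for the three trained prediction networks and the trained cross-channel fusion network; only the data flow (three predictions, then a cascade with center-bias priors, then fusion into one saliency map) follows the claimed method:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_saliency(feature_map):
    # Hypothetical stand-in for a trained prediction network:
    # any per-pixel scoring of the input channels would do for this sketch.
    return sigmoid(feature_map.mean(axis=0, keepdims=True))

def center_bias_prior(h, w, sigma):
    # One Gaussian-shaped center-bias prior map of shape (1, H, W).
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))[None]

def stereo_saliency(color, depth, sigmas=(10.0, 20.0)):
    s_c = predict_saliency(color)                                     # first prediction
    s_d = predict_saliency(depth)                                     # second prediction
    s_cd = predict_saliency(np.concatenate([color, depth], axis=0))   # third (joint)
    h, w = color.shape[1:]
    priors = [center_bias_prior(h, w, s) for s in sigmas]
    cascade = np.concatenate([s_c, s_d, s_cd] + priors, axis=0)       # multi-channel cascade
    # Stand-in for the cross-channel fusion network: trained convolutional
    # weights would replace this simple channel mean.
    return sigmoid(cascade.mean(axis=0))

color = np.random.rand(3, 32, 48)   # color information (RGB channels)
depth = np.random.rand(1, 32, 48)   # depth information (one channel)
smap = stereo_saliency(color, depth)
```

Running the sketch on a 32x48 input yields one single-channel saliency map of the same spatial size, mirroring the claimed output.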
In another aspect, the invention provides a visual saliency detection device for stereo images, the device including:
an information acquisition unit, configured to obtain the color information and depth information of a stereo image when a visual saliency detection request for the stereo image is received;
a saliency prediction unit, configured to perform saliency prediction on the color information through a preset color saliency prediction network to obtain a first saliency prediction of the stereo image, perform saliency prediction on the depth information through a preset depth saliency prediction network to obtain a second saliency prediction of the stereo image, and perform saliency prediction on the color information and the depth information through a preset joint saliency prediction network to obtain a third saliency prediction of the stereo image;
a channel concatenation unit, configured to concatenate the first saliency prediction, the second saliency prediction and the third saliency prediction with a plurality of preset center-bias priors to obtain multi-channel cascade data; and
a saliency map acquisition unit, configured to perform multi-channel spatial-difference fusion on the multi-channel cascade data through a preset cross-channel fusion network to obtain the saliency map of the stereo image.
In another aspect, the invention further provides an image detection equipment, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the visual saliency detection method for stereo images described above.
In another aspect, the invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the visual saliency detection method for stereo images described above.
The present invention first receives a visual saliency detection request for a stereo image and obtains the color information and depth information of the stereo image; then performs saliency prediction on the color information through a preset color saliency prediction network to obtain a first saliency prediction of the stereo image, performs saliency prediction on the depth information through a preset depth saliency prediction network to obtain a second saliency prediction of the stereo image, and performs saliency prediction on the color information and depth information through a preset joint saliency prediction network to obtain a third saliency prediction of the stereo image; then concatenates the first, second and third saliency predictions with a plurality of preset center-bias priors to obtain multi-channel cascade data; and finally performs multi-channel spatial-difference fusion on the multi-channel cascade data through a preset cross-channel fusion network to obtain the saliency map of the stereo image, thereby improving the accuracy of saliency detection.
Brief description of the drawings
Fig. 1 is a flowchart of the implementation of the visual saliency detection method for stereo images provided by Embodiment One of the present invention;
Fig. 2 is a schematic structural diagram of the visual saliency detection device for stereo images provided by Embodiment Two of the present invention;
Fig. 3 is a schematic structural diagram of the visual saliency detection device for stereo images provided by Embodiment Three of the present invention; and
Fig. 4 is a schematic structural diagram of the image detection equipment provided by Embodiment Four of the present invention.
Embodiment
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are merely illustrative of the present invention and are not intended to limit it.
The specific implementation of the present invention is described in detail below in conjunction with specific embodiments:
Embodiment one:
Fig. 1 shows the implementation flow of the visual saliency detection method for stereo images provided by Embodiment One of the present invention. For convenience of description, only the parts related to this embodiment of the invention are shown; the details are as follows:
In step S101, when a visual saliency detection request for a stereo image is received, the color information and depth information of the stereo image are obtained.
This embodiment of the invention applies to a visual saliency detection system for stereo images, which predicts the positions a user attends to in a 3D natural scene and generates the corresponding saliency map. In this embodiment, when a visual saliency detection request for a stereo image is received, the color information and depth information of the stereo image are obtained for the subsequent visual saliency computation. The stereo image may be contained in the visual saliency detection request, or it may be transmitted independently.
In step S102, saliency prediction is performed on the color information through a preset color saliency prediction network to obtain a first saliency prediction of the stereo image; saliency prediction is performed on the depth information through a preset depth saliency prediction network to obtain a second saliency prediction of the stereo image; and saliency prediction is performed on the color information and the depth information through a preset joint saliency prediction network to obtain a third saliency prediction of the stereo image.
In this embodiment of the invention, the color saliency prediction network includes a preset number of stacked convolutional layers, a classification layer, a linear interpolation layer and an output layer; the depth saliency prediction network likewise includes a preset number of stacked convolutional modules, a classification layer, a linear interpolation layer and an output layer; and the joint saliency prediction network includes two fully convolutional network streams, a 'Concat' layer, a classification layer, a linear interpolation layer and an output layer.
Preferably, when saliency prediction is performed on the color information through the preset color saliency prediction network, feature extraction is first performed on the color information by the preset number of convolutional layers in the network to obtain a corresponding color feature map. The color feature map is then classified by the classification layer of the network, generating a dense color saliency prediction map and thereby diversifying the representation of image content; the classification layer includes one 3x3 convolution kernel and one output channel. Finally, according to the spatial resolution of the stereo image, the dense color saliency prediction map is upsampled by the linear interpolation layer of the network, a sigmoid (cross-entropy) operation is applied to the upsampled image, and the first saliency prediction of the stereo image (the color saliency prediction) is obtained and output through the output layer, thereby diversifying the image features used to characterize the color of the stereo image.
Preferably, when saliency prediction is performed on the depth information through the preset depth saliency prediction network, feature extraction is first performed on the depth information by the preset number of convolutional layers in the network to obtain a corresponding depth feature map. The depth feature map is then classified by the classification layer of the network, generating a dense depth saliency prediction map and thereby diversifying the representation of image content; the classification layer includes one 3x3 convolution kernel and one output channel. Finally, according to the spatial resolution of the stereo image, the dense depth saliency prediction map is upsampled by the linear interpolation layer of the network, a sigmoid (cross-entropy) operation is applied to the upsampled image, and the second saliency prediction of the stereo image (the depth saliency prediction) is obtained and output through the output layer, thereby diversifying the image features used to characterize the depth of the stereo image.
Specifically, when feature extraction is performed on the color information and the depth information by the preset number of convolutional layers, each convolutional layer can extract features according to a preset formula of the form F^l = R(W^l * F^(l-1) + b^l), where W^l and b^l denote randomly initialized convolution filter parameters and n ∈ {64, 128, 256, 512} denotes the total number of filters in layer l, finally yielding the color feature map Fc or the depth feature map Fd. The linear interpolation layer then performs the sigmoid (cross-entropy) operation according to the formula Sc/d = sigmoid(↑(ωi * Fc/d + bi)), where ωi and bi denote the weight vector and bias for pixel i respectively, ↑ denotes the upsampling operation, sigmoid denotes the sigmoid (cross-entropy) function, and Sc/d is the result of applying the operation to the upsampled image.
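Under the formulas above, one prediction stream amounts to stacked 3x3 convolutions with ReLU, a single-channel 3x3 classification convolution, upsampling to the input resolution, and a sigmoid. The sketch below uses random untrained filters, small layer widths, and nearest-neighbour upsampling in place of the linear interpolation layer, so it demonstrates only the shapes and operations, not a trained predictor:

```python
import numpy as np

def conv3x3(x, w, b):
    """'Same'-padded 3x3 convolution: x is (C,H,W), w is (N,C,3,3), b is (N,)."""
    c, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            patch = xp[:, i:i + h, j:j + wd]
            out += np.tensordot(w[:, :, i, j], patch, axes=([1], [0]))
    return out + b[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
color = rng.random((3, 16, 16))   # toy color input

# Two stacked feature-extraction layers (the patent uses more, with 64..512 filters).
w1, b1 = 0.1 * rng.standard_normal((8, 3, 3, 3)), np.zeros(8)
w2, b2 = 0.1 * rng.standard_normal((8, 8, 3, 3)), np.zeros(8)
f_c = relu(conv3x3(relu(conv3x3(color, w1, b1)), w2, b2))   # feature map Fc

# Classification layer: one 3x3 kernel, one output channel -> dense prediction map.
wc, bc = 0.1 * rng.standard_normal((1, 8, 3, 3)), np.zeros(1)
dense = conv3x3(f_c, wc, bc)

# Upsample to the stereo image's resolution (nearest-neighbour here), then sigmoid.
up = dense.repeat(2, axis=1).repeat(2, axis=2)
s_c = sigmoid(up)   # first saliency prediction Sc
```

The depth stream is identical in structure, operating on the one-channel depth map instead of the color channels.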
Preferably, when saliency prediction is performed on the color information and the depth information through the preset joint saliency prediction network, feature extraction is first performed on the color information and the depth information by the two fully convolutional network streams of the network, obtaining the corresponding color feature map and depth feature map. The obtained color and depth feature maps are then cascaded into a joint color-depth feature map, which is classified by the classification layer of the network, generating a dense joint color-depth saliency prediction map and thereby diversifying the representation of image content. Finally, according to the spatial resolution of the stereo image, the dense joint saliency prediction map is upsampled by the linear interpolation layer of the network, a sigmoid (cross-entropy) operation is applied to the upsampled image, and the third saliency prediction of the stereo image (the joint color and depth saliency prediction) is obtained and output through the output layer, thereby diversifying the image features used to characterize the stereo image while realizing the computation of the spatial differences between the color features and the depth features. Here, each fully convolutional network stream consists of the preset number of stacked convolutional layers, and the classification layer includes one 3x3 convolution kernel and one output channel.
Specifically, when the obtained color feature map and depth feature map are cascaded, the 'Concat' layer first performs the feature cascade according to the formula Fc&d = Concat(Fc, Fd), obtaining the joint color-depth feature map Fc&d. The linear interpolation layer then performs the sigmoid (cross-entropy) operation according to the formula Sc&d = sigmoid(↑(ωi * Fc&d + bi)), where Sc&d is the result of applying the operation to the upsampled image.
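Under the formulas above, the joint stream's feature cascade is a channel-wise concatenation followed by the shared classification, upsampling and sigmoid steps. In this sketch the two streams' CNN feature maps are faked with random arrays, and a 1x1 channel mixing stands in for the 3x3 classification convolution:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
# Stand-ins for the color and depth feature maps Fc, Fd produced by the two
# fully convolutional network streams (8 channels each, 12x16 spatial size).
f_c = rng.random((8, 12, 16))
f_d = rng.random((8, 12, 16))

# 'Concat' layer: Fc&d = Concat(Fc, Fd) along the channel axis.
f_cd = np.concatenate([f_c, f_d], axis=0)

# Classification step collapses the joint features to one channel (1x1 channel
# mixing replaces the patent's 3x3 convolution for brevity), then upsample + sigmoid.
w = 0.1 * rng.standard_normal(f_cd.shape[0])
dense = np.tensordot(w, f_cd, axes=([0], [0]))[None]
s_cd = sigmoid(dense.repeat(2, axis=1).repeat(2, axis=2))   # third saliency prediction Sc&d
```

The concatenation doubles the channel count while leaving the spatial size untouched, which is what lets the shared classification layer see color and depth features jointly at every pixel.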
In step S103, the obtained first, second and third saliency predictions are concatenated with a plurality of preset center-bias priors to obtain multi-channel cascade data.
In this embodiment of the invention, the architecture of the preset cross-channel fusion network includes a 'Concat' layer, an input layer, two convolutional layers, a regression convolution classification layer and an output layer. The cross-channel fusion network is used to fuse the spatial differences of center-dependent patterns and visual features, thereby improving the completeness and display quality of the saliency map.
In this embodiment of the invention, because image content and capture environments differ, the center bias is diverse rather than unique. Therefore, in order to learn center-surround features, the obtained first, second and third saliency predictions and the plurality of preset center-bias priors Icb are concatenated according to the formula SIC = Concat(Sc, Sd, Sc&d, Icb), generating the n-channel cascade data SIC.
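The cascade step itself is a plain channel concatenation of the three prediction maps with the center-bias prior maps Icb. The patent does not fix the priors' exact form here; the sketch below assumes isotropic Gaussians of several widths, a common way to model a non-unique center bias:

```python
import numpy as np

def gaussian_prior(h, w, sigma):
    """One isotropic Gaussian center-bias prior map of shape (1, H, W)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))[None]

h, w = 32, 32
rng = np.random.default_rng(2)
# Stand-ins for the three saliency predictions Sc, Sd, Sc&d (one channel each).
s_c, s_d, s_cd = (rng.random((1, h, w)) for _ in range(3))

# Several priors of different widths model the diverse, non-unique center bias.
priors = [gaussian_prior(h, w, s) for s in (4.0, 8.0, 16.0)]

# SIC = Concat(Sc, Sd, Sc&d, Icb): the n-channel cascade data.
s_ic = np.concatenate([s_c, s_d, s_cd] + priors, axis=0)
```

With three predictions and three priors the cascade data has n = 6 channels; the fusion network downstream learns how to weight them per pixel.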
In step S104, multi-channel spatial-difference fusion is performed on the multi-channel cascade data through the preset cross-channel fusion network to obtain the saliency map of the stereo image.
In this embodiment of the invention, preferably, when multi-channel spatial-difference fusion is performed on the multi-channel cascade data through the preset cross-channel fusion network, the multi-channel cascade data is first input into the two convolutional layers of the network, each with 3x3 kernels, to obtain respectively the visual features and the center-bias patterns of the dense saliency prediction map. The regression convolutional layer of the network then performs a convolution-regression operation on the visual features and the center-bias patterns, computing the saliency map of the stereo image according to a formula of the form S3d = Sigmoid(ωr * R(ω * SIC + b) + br), so that by computing the spatial-difference information between the color features and the depth features, the accuracy of saliency detection is improved. Here the regression convolution classification layer includes one 3x3 convolution kernel and one output channel, Icb denotes the plurality of center-bias priors, R denotes the ReLU nonlinear operation, 'Sigmoid' is a cost function, and S3d denotes the saliency map of the stereo image.
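A sketch of the fusion stage under the description above: two parallel branches over the cascade data (one for visual features, one for center-bias patterns), the ReLU R, and a single-channel regression step with a sigmoid. The 1x1 channel mixing and random weights below are simplifications standing in for the network's trained 3x3 convolutions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mix(x, w):
    """1x1 'convolution': mixes channels of x (C,H,W) with weights w (N,C)."""
    return np.tensordot(w, x, axes=([1], [0]))

rng = np.random.default_rng(3)
s_ic = rng.random((6, 32, 32))              # multi-channel cascade data SIC

# Two parallel convolutional branches over the cascade data.
w_vis = 0.1 * rng.standard_normal((4, 6))   # visual-feature branch
w_cb = 0.1 * rng.standard_normal((4, 6))    # center-bias-pattern branch
feats = np.maximum(mix(s_ic, w_vis), 0.0)   # R(...) = ReLU
bias = np.maximum(mix(s_ic, w_cb), 0.0)

# Regression step fuses both branches into the final one-channel saliency map S3d.
w_r = 0.1 * rng.standard_normal((1, 8))
s_3d = sigmoid(mix(np.concatenate([feats, bias], axis=0), w_r))[0]
```

The output is a single-channel map at the cascade data's resolution, with values squashed into (0, 1) by the sigmoid, as the final saliency map should be.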
Embodiment two:
Fig. 2 shows the structure of the visual saliency detection device for stereo images provided by Embodiment Two of the present invention. For convenience of description, only the parts related to this embodiment of the invention are shown, including:
an information acquisition unit 21, configured to obtain the color information and depth information of a stereo image when a visual saliency detection request for the stereo image is received.
In this embodiment of the invention, when a visual saliency detection request for a stereo image is received, the information acquisition unit 21 obtains the color information and depth information of the stereo image for the subsequent visual saliency computation. The stereo image may be contained in the visual saliency detection request, or it may be transmitted independently.
a saliency prediction unit 22, configured to perform saliency prediction on the color information through a preset color saliency prediction network to obtain a first saliency prediction of the stereo image, perform saliency prediction on the depth information through a preset depth saliency prediction network to obtain a second saliency prediction of the stereo image, and perform saliency prediction on the color information and the depth information through a preset joint saliency prediction network to obtain a third saliency prediction of the stereo image.
In this embodiment of the invention, the color saliency prediction network includes a preset number of stacked convolutional layers, a classification layer, a linear interpolation layer and an output layer; the depth saliency prediction network likewise includes a preset number of stacked convolutional modules, a classification layer, a linear interpolation layer and an output layer; and the joint saliency prediction network includes two fully convolutional network streams, a 'Concat' layer, a classification layer, a linear interpolation layer and an output layer.
a channel concatenation unit 23, configured to concatenate the obtained first, second and third saliency predictions with a plurality of preset center-bias priors to obtain multi-channel cascade data.
In this embodiment of the invention, the architecture of the preset cross-channel fusion network includes a 'Concat' layer, an input layer, two convolutional layers, a regression convolution classification layer and an output layer. The cross-channel fusion network is used to fuse the spatial differences of center-dependent patterns and visual features, thereby improving the completeness and display quality of the saliency map.
In this embodiment of the invention, because image content and capture environments differ, the center bias is diverse rather than unique. Therefore, in order to learn center-surround features, the channel concatenation unit 23 concatenates the obtained first, second and third saliency predictions and the plurality of preset center-bias priors Icb according to the formula SIC = Concat(Sc, Sd, Sc&d, Icb), generating the n-channel cascade data SIC.
a saliency map acquisition unit 24, configured to perform multi-channel spatial-difference fusion on the multi-channel cascade data through the preset cross-channel fusion network to obtain the saliency map of the stereo image.
In this embodiment of the invention, when a visual saliency detection request for a stereo image is received, the information acquisition unit 21 first obtains the color information and depth information of the stereo image; the saliency prediction unit 22 then performs saliency prediction on the color information, on the depth information, and on the color and depth information jointly, obtaining the first, second and third saliency predictions; the channel concatenation unit 23 then concatenates the obtained first, second and third saliency predictions with the plurality of preset center-bias priors to obtain the multi-channel cascade data; finally, the saliency map acquisition unit 24 performs multi-channel spatial-difference fusion on the multi-channel cascade data through the preset cross-channel fusion network to obtain the saliency map of the stereo image, thereby improving the accuracy of saliency detection.
In this embodiment of the invention, each unit of the visual saliency detection device for stereo images may be realized by corresponding hardware or software units; each unit may be an independent software or hardware unit, or the units may be integrated into a single software or hardware unit, without this limiting the present invention.
Embodiment three:
Fig. 3 shows the structure of the visual saliency detection device for stereo images provided by Embodiment Three of the present invention. For convenience of description, only the parts related to this embodiment of the invention are shown, including:
an information acquisition unit 31, configured to obtain the color information and depth information of a stereo image when a visual saliency detection request for the stereo image is received.
In this embodiment of the invention, when a visual saliency detection request for a stereo image is received, the information acquisition unit 31 obtains the color information and depth information of the stereo image for the subsequent visual saliency computation. The stereo image may be contained in the visual saliency detection request, or it may be transmitted independently.
a saliency prediction unit 32, configured to perform saliency prediction on the color information through a preset color saliency prediction network to obtain a first saliency prediction of the stereo image, perform saliency prediction on the depth information through a preset depth saliency prediction network to obtain a second saliency prediction of the stereo image, and perform saliency prediction on the color information and the depth information through a preset joint saliency prediction network to obtain a third saliency prediction of the stereo image.
In this embodiment of the invention, the color saliency prediction network includes a preset number of stacked convolutional layers, a classification layer, a linear interpolation layer and an output layer; the depth saliency prediction network likewise includes a preset number of stacked convolutional modules, a classification layer, a linear interpolation layer and an output layer; and the joint saliency prediction network includes two fully convolutional network streams, a 'Concat' layer, a classification layer, a linear interpolation layer and an output layer.
Preferably, when saliency prediction is performed on the color information through the preset color saliency prediction network, feature extraction is first performed on the color information by the preset number of convolutional layers in the network to obtain a corresponding color feature map. The color feature map is then classified by the classification layer of the network, generating a dense color saliency prediction map and thereby diversifying the representation of image content; the classification layer includes one 3x3 convolution kernel and one output channel. Finally, according to the spatial resolution of the stereo image, the dense color saliency prediction map is upsampled by the linear interpolation layer of the network, a sigmoid (cross-entropy) operation is applied to the upsampled image, and the first saliency prediction of the stereo image is obtained and output through the output layer, thereby diversifying the image features used to characterize the color of the stereo image.
Preferably, when saliency prediction is performed on the depth information through the preset depth saliency prediction network, feature extraction is first performed on the depth information by the preset number of convolutional layers in the depth saliency prediction network to obtain a corresponding depth feature map. The classification layer in the depth saliency prediction network, which includes one 3x3 convolution kernel and one output channel, then classifies the depth feature map to generate a dense depth saliency prediction map, thereby enriching the representation of the image content. Finally, according to the spatial resolution of the stereoscopic image, the linear interpolation layer in the depth saliency prediction network upsamples the dense depth saliency prediction map, a cross-entropy operation is performed on the upsampled image, and the second saliency prediction of the stereoscopic image is obtained and output through the output layer, thereby enriching the image features used to characterize the depth of the stereoscopic image.
Specifically, when feature extraction is performed on the color information and the depth information by the preset number of convolutional layers, each convolutional layer may extract features according to a preset formula of the form F^l = max(0, W^l * F^(l-1) + b^l), where W^l and b^l denote randomly initialized convolution filter parameters and n ∈ {64, 128, 256, 512} denotes the total number of filters in layer l, finally yielding a color feature map F_c or a depth feature map F_d. The linear interpolation layer then performs the cross-entropy operation according to the formula S_c/d = sigmoid(↑(ω_i * F_c/d + b_i)), where ω_i and b_i respectively denote the weight vector and the bias for pixel i, ↑ denotes the upsampling operation, sigmoid denotes the cross-entropy operation, and S_c/d is the result of performing the cross-entropy operation on the upsampled image.
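The per-layer feature extraction and the upsample-then-sigmoid prediction described above can be illustrated with a minimal NumPy sketch. All helper names (`conv3x3`, `predict_stream`), the random weights, the tiny feature sizes, and the 2x nearest-neighbour upsample standing in for the linear interpolation layer are illustrative assumptions, not the patented network:

```python
import numpy as np

def conv3x3(x, w, b):
    """3x3 convolution with zero padding; x: (H, W, Cin), w: (3, 3, Cin, Cout), b: (Cout,)."""
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[-1]))
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    for i in range(H):
        for j in range(W):
            # Contract the 3x3xCin receptive field against the filter bank.
            out[i, j] = np.tensordot(xp[i:i+3, j:j+3, :], w, axes=3) + b
    return out

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_stream(x, conv_params, cls_w, cls_b, scale):
    """One saliency stream: stacked convs -> 1-channel classifier -> upsample -> sigmoid."""
    f = x
    for w, b in conv_params:                 # F^l = max(0, W^l * F^(l-1) + b^l)
        f = np.maximum(conv3x3(f, w, b), 0.0)
    s = conv3x3(f, cls_w, cls_b)             # classification layer: 3x3 kernel, 1 output channel
    s = np.repeat(np.repeat(s, scale, axis=0), scale, axis=1)  # stand-in for linear interpolation
    return sigmoid(s)                        # S_c/d = sigmoid(up(w_i * F_c/d + b_i))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))           # toy "color information" input
params = [(rng.standard_normal((3, 3, 3, 4)) * 0.1, np.zeros(4))]
cls_w = rng.standard_normal((3, 3, 4, 1)) * 0.1
s = predict_stream(x, params, cls_w, np.zeros(1), scale=2)
print(s.shape)                               # (16, 16, 1)
```

The same stream applies unchanged to the depth information, only the input and the learned filters differ.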
Preferably, when saliency prediction is performed on the color information and the depth information through the preset joint saliency prediction network, feature extraction is first performed on the color information and the depth information by the two fully convolutional network streams in the joint saliency prediction network to obtain a corresponding color feature map and depth feature map. The obtained color feature map and depth feature map are then cascaded to obtain a color-and-depth joint feature map, which the classification layer in the joint saliency prediction network classifies to generate a dense color-and-depth joint saliency prediction map, thereby enriching the representation of the image content. Finally, according to the spatial resolution of the stereoscopic image, the linear interpolation layer in the joint saliency prediction network upsamples the dense color-and-depth joint saliency prediction map, a cross-entropy operation is performed on the upsampled image, and the third saliency prediction of the stereoscopic image is obtained and output through the output layer, thereby enriching the image features characterizing the depth of the stereoscopic image while computing the spatial difference between the color features and the depth features. Each fully convolutional network stream consists of a preset number of stacked convolutional layers, and the classification layer includes one 3x3 convolution kernel and one output channel.
Specifically, when the obtained color feature map and depth feature map are cascaded, the 'Concat' layer first cascades the features according to the formula F_c&d = Concat(F_c, F_d) to obtain the color-and-depth joint feature map F_c&d. The linear interpolation layer then performs the cross-entropy operation according to the formula S_c&d = Sigmoid(↑(ω_i * F_c&d + b_i)), where S_c&d is the result of performing the cross-entropy operation on the upsampled image.
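The 'Concat' cascade and the subsequent one-channel prediction can be sketched as follows. This is a minimal sketch: the 1x1 classifier weights and the 2x nearest-neighbour upsample are simplifying stand-ins for the 3x3 classification layer and the linear interpolation layer, and all sizes are toy values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
F_c = rng.standard_normal((8, 8, 4))   # color feature map from stream 1
F_d = rng.standard_normal((8, 8, 4))   # depth feature map from stream 2

# 'Concat' layer: F_c&d = Concat(F_c, F_d) along the channel axis.
F_cd = np.concatenate([F_c, F_d], axis=-1)            # (8, 8, 8)

# One-output-channel classification (1x1 stand-in for the 3x3 kernel),
# then upsample and apply the sigmoid: S_c&d = Sigmoid(up(w_i * F_c&d + b_i)).
w, b = rng.standard_normal(F_cd.shape[-1]) * 0.1, 0.0
logits = F_cd @ w + b                                  # (8, 8)
S_cd = sigmoid(np.repeat(np.repeat(logits, 2, axis=0), 2, axis=1))
print(S_cd.shape)                                      # (16, 16)
```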
Channel cascading unit 33 is configured to cascade the obtained first saliency prediction, second saliency prediction, and third saliency prediction with a plurality of preset center-bias priors to obtain multichannel cascade data.
In an embodiment of the present invention, the architecture of the preset cross-channel fusion network includes a 'Concat' layer, an input layer, two convolutional layers, a regression convolution classification layer, and an output layer. The cross-channel fusion network fuses the center-dependence patterns with the spatial differences of the visual features, thereby improving the completeness and display quality of the saliency map.
In an embodiment of the present invention, because image content and capture environments differ, the center bias is diverse rather than unique. Therefore, in order to learn center-surround features, channel cascading unit 33 cascades the obtained first saliency prediction, second saliency prediction, and third saliency prediction with the plurality of preset center-bias priors I_cb according to the formula S_IC = Concat(S_c, S_d, S_c&d, I_cb), generating n-channel cascade data S_IC.
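The channel cascade S_IC = Concat(S_c, S_d, S_c&d, I_cb) can be illustrated as below. The Gaussian form and the three spreads of the center-bias priors are illustrative assumptions; the text only states that multiple priors are preset:

```python
import numpy as np

H, W = 16, 16
rng = np.random.default_rng(2)
S_c  = rng.random((H, W, 1))   # first saliency prediction (color)
S_d  = rng.random((H, W, 1))   # second saliency prediction (depth)
S_cd = rng.random((H, W, 1))   # third saliency prediction (joint)

def gaussian_prior(h, w, sigma):
    """Hypothetical center-bias prior: a centered 2-D Gaussian with spread sigma."""
    yy, xx = np.mgrid[0:h, 0:w]
    d2 = (yy - (h - 1) / 2) ** 2 + (xx - (w - 1) / 2) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))[..., None]

# A bank of priors with different spreads models the non-unique center bias.
I_cb = np.concatenate([gaussian_prior(H, W, s) for s in (2, 4, 8)], axis=-1)

# S_IC = Concat(S_c, S_d, S_c&d, I_cb): an n-channel cascade (here n = 6).
S_IC = np.concatenate([S_c, S_d, S_cd, I_cb], axis=-1)
print(S_IC.shape)              # (16, 16, 6)
```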
Saliency map acquiring unit 34 is configured to perform multichannel information spatial-difference fusion on the multichannel cascade data through the preset cross-channel fusion network to obtain the saliency map of the stereoscopic image.
In an embodiment of the present invention, preferably, when multichannel information spatial-difference fusion is performed on the multichannel cascade data through the preset cross-channel fusion network, the multichannel cascade data is first input into the two convolutional layers of the cross-channel fusion network, whose convolution kernel size is 3x3, to respectively obtain the visual features and the center-bias pattern of the dense saliency prediction map. The regression convolutional layer of the cross-channel fusion network then performs convolution and regression operations on the visual features and the center-bias pattern, computing the saliency map of the stereoscopic image according to a formula of the form S_3d = Sigmoid(R(ω * S_IC + b)), so that the spatial-difference information between the color features and the depth features is computed and the accuracy of saliency detection is improved. The regression convolution classification layer includes one 3x3 convolution kernel and one output channel; I_cb denotes the plurality of center-bias priors, R denotes the ReLU non-linear operation, 'Sigmoid' is a cost function, and S_3d denotes the saliency map of the stereoscopic image.
Therefore, preferably, the saliency prediction unit 32 includes:
a feature map acquiring unit 321, configured to perform feature extraction on the color information by the preset number of convolutional layers in the color saliency prediction network to obtain a corresponding color feature map;
a feature classification unit 322, configured to classify the color feature map by the classification layer in the color saliency prediction network to generate a dense color saliency prediction map, the classification layer including one 3x3 convolution kernel and one output channel;
an upsampling unit 323, configured to upsample the dense color saliency prediction map by the linear interpolation layer in the color saliency prediction network according to the spatial resolution of the stereoscopic image; and
a cross-entropy prediction unit 324, configured to perform a cross-entropy operation on the upsampled image to obtain the first saliency prediction of the stereoscopic image.
Preferably, the saliency map acquiring unit 34 includes:
a convolution filtering unit 341, configured to input the multichannel cascade data into the first convolution filter and the second convolution filter of the cross-channel fusion network, each with a convolution kernel size of 3x3, to respectively obtain the visual features and the center-bias pattern of the dense saliency prediction map; and
an acquiring subunit 342, configured to perform convolution and regression operations on the visual features and the center-bias pattern by the regression convolutional layer of the cross-channel fusion network to obtain the saliency map of the stereoscopic image, the regression convolutional layer including one 3x3 convolution kernel and one output channel.
In an embodiment of the present invention, each unit of the visual saliency detection apparatus for a stereoscopic image may be implemented by a corresponding hardware or software unit; the units may be independent software or hardware units, or may be integrated into a single software or hardware unit, without limiting the present invention thereto.
Embodiment IV:
Fig. 4 shows the structure of the image detection device provided by Embodiment IV of the present invention. For convenience of description, only the parts related to this embodiment of the present invention are shown.
The image detection device 4 of this embodiment of the present invention includes a processor 40, a memory 41, and a computer program 42 stored in the memory 41 and executable on the processor 40. When executing the computer program 42, the processor 40 implements the steps in the above embodiments of the visual saliency detection method for a stereoscopic image, such as steps S101 to S104 shown in Fig. 1. Alternatively, when executing the computer program 42, the processor 40 implements the functions of the units in the above apparatus embodiments, for example the functions of units 21 to 24 shown in Fig. 2 and units 31 to 34 shown in Fig. 3.
In an embodiment of the present invention, when the processor 40 executes the computer program 42 to implement the steps in the above embodiments of the visual saliency detection method for a stereoscopic image, a visual saliency detection request for a stereoscopic image is first received and the color information and depth information of the stereoscopic image are obtained. Saliency prediction is then performed on the color information through a preset color saliency prediction network to obtain a first saliency prediction of the stereoscopic image, on the depth information through a preset depth saliency prediction network to obtain a second saliency prediction of the stereoscopic image, and on the color information and the depth information through a preset joint saliency prediction network to obtain a third saliency prediction of the stereoscopic image. The first saliency prediction, the second saliency prediction, and the third saliency prediction are then cascaded with a plurality of preset center-bias priors to obtain multichannel cascade data, and finally multichannel information spatial-difference fusion is performed on the multichannel cascade data through a preset cross-channel fusion network to obtain the saliency map of the stereoscopic image, thereby improving the accuracy of saliency detection.
For the steps implemented when the processor 40 executes the computer program 42 in the image detection device 4, reference may be made to the description of the method in Embodiment I, which is not repeated here.
Embodiment V:
An embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in the above embodiments of the visual saliency detection method for a stereoscopic image, for example steps S101 to S104 shown in Fig. 1; or, when executed by a processor, implements the functions of the units in the above apparatus embodiments, for example the functions of units 21 to 24 shown in Fig. 2 and units 31 to 34 shown in Fig. 3.
In an embodiment of the present invention, a visual saliency detection request for a stereoscopic image is first received and the color information and depth information of the stereoscopic image are obtained. Saliency prediction is then performed on the color information through a preset color saliency prediction network to obtain a first saliency prediction of the stereoscopic image, on the depth information through a preset depth saliency prediction network to obtain a second saliency prediction of the stereoscopic image, and on the color information and the depth information through a preset joint saliency prediction network to obtain a third saliency prediction of the stereoscopic image. The first saliency prediction, the second saliency prediction, and the third saliency prediction are then cascaded with a plurality of preset center-bias priors to obtain multichannel cascade data, and finally multichannel information spatial-difference fusion is performed on the multichannel cascade data through a preset cross-channel fusion network to obtain the saliency map of the stereoscopic image, thereby improving the accuracy of saliency detection. For the visual saliency detection method for a stereoscopic image implemented when the computer program is executed by a processor, reference may further be made to the description of the method steps in the preceding method embodiments, which is not repeated here.
The computer-readable storage medium of the embodiment of the present invention may include any entity or device capable of carrying computer program code, or a recording medium, for example a memory such as a ROM/RAM, a magnetic disk, an optical disc, or a flash memory.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (10)
1. A visual saliency detection method for a stereoscopic image, characterized in that the method comprises the steps of:
when a visual saliency detection request for a stereoscopic image is received, obtaining color information and depth information of the stereoscopic image;
performing saliency prediction on the color information through a preset color saliency prediction network to obtain a first saliency prediction of the stereoscopic image, performing saliency prediction on the depth information through a preset depth saliency prediction network to obtain a second saliency prediction of the stereoscopic image, and performing saliency prediction on the color information and the depth information through a preset joint saliency prediction network to obtain a third saliency prediction of the stereoscopic image;
cascading the first saliency prediction, the second saliency prediction, and the third saliency prediction with a plurality of preset center-bias priors to obtain multichannel cascade data; and
performing multichannel information spatial-difference fusion on the multichannel cascade data through a preset cross-channel fusion network to obtain a saliency map of the stereoscopic image.
2. The method of claim 1, characterized in that the step of performing saliency prediction on the color information through the preset color saliency prediction network comprises:
performing feature extraction on the color information by a preset number of convolutional layers in the color saliency prediction network to obtain a corresponding color feature map;
classifying the color feature map by a classification layer in the color saliency prediction network to generate a dense color saliency prediction map, the classification layer including one 3x3 convolution kernel and one output channel;
upsampling the dense color saliency prediction map by a linear interpolation layer in the color saliency prediction network according to the spatial resolution of the stereoscopic image; and
performing a cross-entropy operation on the upsampled image to obtain the first saliency prediction of the stereoscopic image.
3. The method of claim 1, characterized in that the step of performing saliency prediction on the depth information through the preset depth saliency prediction network comprises:
performing feature extraction on the depth information by a preset number of convolutional layers in the depth saliency prediction network to obtain a corresponding depth feature map;
classifying the depth feature map by a classification layer in the depth saliency prediction network to generate a dense depth saliency prediction map, the classification layer including one 3x3 convolution kernel and one output channel;
upsampling the dense depth saliency prediction map by a linear interpolation layer in the depth saliency prediction network according to the spatial resolution of the stereoscopic image; and
performing a cross-entropy operation on the upsampled image to obtain the second saliency prediction of the stereoscopic image.
4. The method of claim 1, characterized in that the step of performing saliency prediction on the color information and the depth information through the preset joint saliency prediction network comprises:
performing feature extraction on the color information and the depth information by two fully convolutional network streams in the joint saliency prediction network to obtain a corresponding color feature map and depth feature map;
performing feature cascading on the obtained color feature map and depth feature map to obtain a color-and-depth joint feature map;
classifying the color-and-depth joint feature map by a classification layer in the joint saliency prediction network to generate a dense color-and-depth joint saliency prediction map, the classification layer including one 3x3 convolution kernel and one output channel;
upsampling the dense color-and-depth joint saliency prediction map by a linear interpolation layer in the joint saliency prediction network according to the spatial resolution of the stereoscopic image; and
performing a cross-entropy operation on the upsampled image to obtain the third saliency prediction of the stereoscopic image.
5. The method of claim 1, characterized in that the step of performing multichannel information spatial-difference fusion on the multichannel cascade data through the preset cross-channel fusion network comprises:
inputting the multichannel cascade data into a first convolution filter and a second convolution filter of the cross-channel fusion network, each with a convolution kernel size of 3x3, to respectively obtain visual features and a center-bias pattern of a dense saliency prediction map; and
performing convolution and regression operations on the visual features and the center-bias pattern by a regression convolutional layer of the cross-channel fusion network to obtain the saliency map of the stereoscopic image, the regression convolutional layer including one 3x3 convolution kernel and one output channel.
6. A visual saliency detection apparatus for a stereoscopic image, characterized in that the apparatus comprises:
an information acquisition unit, configured to obtain color information and depth information of a stereoscopic image when a visual saliency detection request for the stereoscopic image is received;
a saliency prediction unit, configured to perform saliency prediction on the color information through a preset color saliency prediction network to obtain a first saliency prediction of the stereoscopic image, perform saliency prediction on the depth information through a preset depth saliency prediction network to obtain a second saliency prediction of the stereoscopic image, and perform saliency prediction on the color information and the depth information through a preset joint saliency prediction network to obtain a third saliency prediction of the stereoscopic image;
a channel cascading unit, configured to cascade the first saliency prediction, the second saliency prediction, and the third saliency prediction with a plurality of preset center-bias priors to obtain multichannel cascade data; and
a saliency map acquiring unit, configured to perform multichannel information spatial-difference fusion on the multichannel cascade data through a preset cross-channel fusion network to obtain a saliency map of the stereoscopic image.
7. The apparatus of claim 6, characterized in that the saliency prediction unit comprises:
a feature map acquiring unit, configured to perform feature extraction on the color information by a preset number of convolutional layers in the color saliency prediction network to obtain a corresponding color feature map;
a feature classification unit, configured to classify the color feature map by a classification layer in the color saliency prediction network to generate a dense color saliency prediction map, the classification layer including one 3x3 convolution kernel and one output channel;
an upsampling unit, configured to upsample the dense color saliency prediction map by a linear interpolation layer in the color saliency prediction network according to the spatial resolution of the stereoscopic image; and
a cross-entropy prediction unit, configured to perform a cross-entropy operation on the upsampled image to obtain the first saliency prediction of the stereoscopic image.
8. The apparatus of claim 6, characterized in that the saliency map acquiring unit comprises:
a convolution filtering unit, configured to input the multichannel cascade data into a first convolution filter and a second convolution filter of the cross-channel fusion network, each with a convolution kernel size of 3x3, to respectively obtain visual features and a center-bias pattern of a dense saliency prediction map; and
an acquiring subunit, configured to perform convolution and regression operations on the visual features and the center-bias pattern by a regression convolutional layer of the cross-channel fusion network to obtain the saliency map of the stereoscopic image, the regression convolutional layer including one 3x3 convolution kernel and one output channel.
9. An image detection device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711014924.7A CN107886533B (en) | 2017-10-26 | 2017-10-26 | Method, device and equipment for detecting visual saliency of three-dimensional image and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107886533A true CN107886533A (en) | 2018-04-06 |
CN107886533B CN107886533B (en) | 2021-05-04 |
Family
ID=61782458
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711014924.7A Active CN107886533B (en) | 2017-10-26 | 2017-10-26 | Method, device and equipment for detecting visual saliency of three-dimensional image and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107886533B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108664967A (en) * | 2018-04-17 | 2018-10-16 | 上海交通大学 | A kind of multimedia page vision significance prediction technique and system |
CN109409435A (en) * | 2018-11-01 | 2019-03-01 | 上海大学 | A kind of depth perception conspicuousness detection method based on convolutional neural networks |
CN110942095A (en) * | 2019-11-27 | 2020-03-31 | 中国科学院自动化研究所 | Method and system for detecting salient object area |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102800086A (en) * | 2012-06-21 | 2012-11-28 | 上海海事大学 | Offshore scene significance detection method |
CN103020993A (en) * | 2012-11-28 | 2013-04-03 | 杭州电子科技大学 | Visual saliency detection method by fusing dual-channel color contrasts |
CN103996040A (en) * | 2014-05-13 | 2014-08-20 | 西北工业大学 | Bottom-up visual saliency generating method fusing local-global contrast ratio |
CN104063872A (en) * | 2014-07-04 | 2014-09-24 | 西安电子科技大学 | Method for detecting salient regions in sequence images based on improved visual attention model |
CN104574375A (en) * | 2014-12-23 | 2015-04-29 | 浙江大学 | Image significance detection method combining color and depth information |
CN104966286A (en) * | 2015-06-04 | 2015-10-07 | 电子科技大学 | 3D video saliency detection method |
CN105404888A (en) * | 2015-11-16 | 2016-03-16 | 浙江大学 | Saliency object detection method integrated with color and depth information |
CN105869173A (en) * | 2016-04-19 | 2016-08-17 | 天津大学 | Stereoscopic vision saliency detection method |
CN106157319A (en) * | 2016-07-28 | 2016-11-23 | 哈尔滨工业大学 | The significance detection method that region based on convolutional neural networks and Pixel-level merge |
CN106462771A (en) * | 2016-08-05 | 2017-02-22 | 深圳大学 | 3D image significance detection method |
CN106997478A (en) * | 2017-04-13 | 2017-08-01 | 安徽大学 | RGB D image well-marked target detection methods based on notable center priori |
US20170300788A1 (en) * | 2014-01-30 | 2017-10-19 | Hrl Laboratories, Llc | Method for object detection in digital image and video using spiking neural networks |
CN107292318A (en) * | 2017-07-21 | 2017-10-24 | 北京大学深圳研究生院 | Image significance object detection method based on center dark channel prior information |
CN107292875A (en) * | 2017-06-29 | 2017-10-24 | 西安建筑科技大学 | A kind of conspicuousness detection method based on global Local Feature Fusion |
Non-Patent Citations (4)
Title |
---|
FERREIRA L等: ""A method to compute saliency regions in 3D video based on fusion of feature maps"", 《PROCEEDINGS OF 2015 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO》 * |
YUMING FANG等: ""Saliency Detection for Stereoscopic Images"", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 * |
WU JIANGUO et al.: ""Salient Object Detection of RGB-D Images Fusing Salient Depth Features"", 《JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY》 *
XU WEI et al.: ""Salient Object Detection Using Hierarchical Prior Estimation"", 《ACTA AUTOMATICA SINICA》 *
Also Published As
Publication number | Publication date |
---|---|
CN107886533B (en) | 2021-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3510561B1 (en) | Predicting depth from image data using a statistical model | |
CN101443817B (en) | Method and device for determining correspondence, preferably for the three-dimensional reconstruction of a scene | |
US8681150B2 (en) | Method, medium, and system with 3 dimensional object modeling using multiple view points | |
KR101393621B1 (en) | Method and system for analyzing a quality of three-dimensional image | |
Feng et al. | Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications | |
CN108961327A (en) | A kind of monocular depth estimation method and its device, equipment and storage medium | |
CN110866509B (en) | Action recognition method, device, computer storage medium and computer equipment | |
CN109690620A (en) | Threedimensional model generating means and threedimensional model generation method | |
CN107301664A (en) | Improved segmented stereo matching method based on similarity measure function | |
CN111563418A (en) | Asymmetric multi-mode fusion significance detection method based on attention mechanism | |
CN103384343B (en) | Method and device for filling image holes | |
CN108345892A (en) | Stereoscopic image saliency detection method, device, equipment and storage medium | |
CN101542529A (en) | Generation of depth map for an image | |
CN105898278B (en) | Stereoscopic video saliency detection method based on binocular multi-dimensional perception features | |
CN107886533A (en) | Visual saliency detection method, device, equipment and storage medium for stereoscopic images | |
JP2014096062A (en) | Image processing method and image processing apparatus | |
CN109644280B (en) | Method for generating hierarchical depth data of scene | |
CN108986197A (en) | 3D skeleton line construction method and device | |
CN110096993A (en) | Object detection apparatus and method for binocular stereo vision | |
Jiang et al. | Quality assessment for virtual reality technology based on real scene | |
Xiao et al. | Multi-scale attention generative adversarial networks for video frame interpolation | |
CN104243970A (en) | Objective quality evaluation method for 3D rendered images based on stereoscopic visual attention mechanism and structural similarity | |
JP6965299B2 (en) | Object detectors, object detection methods, programs, and moving objects | |
CN114494611A (en) | Intelligent three-dimensional reconstruction method, device, equipment and medium based on neural basis functions | |
Northam et al. | Stereoscopic 3D image stylization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||