CN109344818B - Light field significant target detection method based on deep convolutional network - Google Patents

Light field significant target detection method based on deep convolutional network

Info

Publication number
CN109344818B
CN109344818B CN201811141315.2A
Authority
CN
China
Prior art keywords
light field
layer
image
neural network
field data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811141315.2A
Other languages
Chinese (zh)
Other versions
CN109344818A (en)
Inventor
张骏
刘亚美
刘紫薇
张钊
郑顺源
郑彤
王程
张旭东
高隽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN201811141315.2A priority Critical patent/CN109344818B/en
Publication of CN109344818A publication Critical patent/CN109344818A/en
Application granted granted Critical
Publication of CN109344818B publication Critical patent/CN109344818B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/12Details of acquisition arrangements; Constructional details thereof
    • G06V10/14Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/145Illumination specially adapted for pattern recognition, e.g. using gratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a light field salient object detection method based on a deep convolutional network, which comprises the following steps: 1, converting light field data obtained with a light field acquisition device into sub-aperture images at all view angles; 2, recombining the sub-aperture images at the different view angles into a microlens image; 3, performing data enhancement on the microlens image; 4, building a salient object detection model that takes the microlens image as input, on the basis of the pre-trained weights of the Deeplab-V2 network, and training it with the data set; and 5, performing salient object detection on the light field data to be processed with the trained salient object detection model. The method can effectively improve the accuracy of salient object detection in complex scene images.

Description

Light field significant target detection method based on deep convolutional network
Technical Field
The invention belongs to the field of computer vision and image processing and analysis, and particularly relates to a light field salient object detection method based on a deep convolutional network.
Background
Salient object detection corresponds to a perceptual capability of the human visual system. When an image is observed, the visual system rapidly locates the regions and objects of interest in it; this process of locating them is salient object detection. With the development of computer technology and the internet and the popularization of mobile intelligent devices, the number of images people acquire from the outside world has grown explosively. Salient object detection selects a small portion of the massive visual input for subsequent complex processing, such as object detection and recognition, image retrieval and image segmentation, thereby effectively reducing the computational load of a vision system. At present, salient object detection has become one of the research hot spots in the field of computer vision.
Current salient object detection methods can be classified into three categories according to the available image data: two-dimensional salient object detection, three-dimensional salient object detection and light field salient object detection.
In two-dimensional salient object detection, a traditional camera is used to obtain a two-dimensional image, and features such as color, brightness, position and texture are extracted and fused through a local or global contrast framework, using traditional or learning-based methods, so as to distinguish salient regions from non-salient ones.
Three-dimensional salient object detection uses a two-dimensional image together with the depth information of the scene. Depth information, acquired by a three-dimensional sensor, also plays an important role in the human visual system, as it reflects the distance between an object and the observer. Using depth information for salient object detection compensates for the shortcomings of traditional two-dimensional images; the final saliency map is obtained by exploiting the complementarity of color and depth, which improves detection accuracy to a certain extent.
Light field salient object detection processes light field data acquired by a light field camera. As a new computational imaging technology, light field imaging records both the position and the view-angle information of light radiation in space with a single exposure, and the acquired light field information reflects the geometry and reflectance characteristics of a natural scene. Existing methods improve salient object detection in challenging scenes by fusing saliency features computed from different light field data.
Although salient object detection methods with excellent performance have appeared in the field of computer vision, these methods still have disadvantages:
1. In two-dimensional salient object detection, because a two-dimensional image is the integral of the light projected onto the camera sensor and only contains the light intensity in a specific direction, the detection is overly sensitive to high-frequency components or noise and is easily affected by factors such as similar colors and textures between the foreground and the background or a cluttered background.
2. In three-dimensional salient object detection, the accuracy of the scene depth information depends on the depth camera, and existing depth cameras suffer from low resolution, narrow measurement range, high noise, inability to measure transmissive materials, and susceptibility to interference from sunlight and from specular reflections off smooth surfaces.
3. In three-dimensional salient object detection, feature information such as color, depth and position is processed and fused independently, without fully considering its complementarity.
4. Most salient object detection methods based on two-dimensional and three-dimensional images rely on assumptions such as the object differing markedly from the background and the background being simple; as image data grows in scale and image content becomes more complex, these methods show certain limitations.
5. In light field salient object detection, research on using light field data for salient object detection has only just started, and the currently available data sets are few and of poor image quality. Existing methods that use light field data for salient object detection rely on traditional hand-crafted saliency computation and model multiple cues such as color, depth and refocusing separately, so they suffer from insufficient feature expression and poor detection robustness.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a light field salient object detection method based on a deep convolutional network, so that the spatial information and the view-angle information of light field data can be fully utilized and the accuracy of salient object detection in complex scene images can be effectively improved.
The invention adopts the following technical scheme to solve the technical problems:
The invention relates to a light field salient object detection method based on a deep convolutional network, which is characterized by comprising the following steps:
Step 1, obtaining a microlens image I_d:
Step 1.1, acquiring a light field file with a light field device and decoding it to obtain a light field data set, denoted L = (L_1, L_2, …, L_d, …, L_D), wherein L_d represents the d-th light field data and is written as L_d(u, v, s, t); u and v represent any horizontal pixel and vertical pixel in the spatial information, and s and t represent any horizontal view angle and vertical view angle in the view-angle information; d ∈ [1, D], and D represents the total number of light field data;
Step 1.2, fixing a horizontal view angle s and a vertical view angle t, and traversing all horizontal and vertical pixels of the d-th light field data L_d(u, v, s, t) to obtain the sub-aperture image at the view angle of the t-th row and s-th column in the d-th light field data L_d(u, v, s, t), whose height and width are denoted V and U respectively, with v ∈ [1, V] and u ∈ [1, U];
Step 1.3, traversing all horizontal and vertical view angles of the d-th light field data L_d(u, v, s, t) to obtain the set N_d of sub-aperture images at all view angles of the d-th light field data, wherein s ∈ [1, S], t ∈ [1, T]; S represents the row of the maximum horizontal view angle, and T represents the column of the maximum vertical view angle;
Step 1.4, defining the number of selected view angles as m × m, and using formula (1) to select, from the set N_d of sub-aperture images at all view angles of the d-th light field data, the d-th image set M_d centered on the central view angle;
(formula (1) is reproduced as an image in the original; the two quantities marked in it are rounded down to an integer)
Step 1.5, obtaining the pixel I_d(x, y) in row x and column y of the d-th microlens image I_d according to x = (v − 1) × m + t and y = (u − 1) × m + s, thereby obtaining the d-th microlens image I_d with height H and width W, wherein x ∈ [1, H], y ∈ [1, W], H = V × m and W = U × m;
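As a concrete illustration of steps 1.2 to 1.5, the following Python sketch recombines the selected sub-aperture views into a microlens image. It is not the patent's reference code: the array layout (S, T, V, U, 3) for the decoded light field, the helper name build_microlens_image and the way the central m × m views are picked (formula (1) is only given as an image in the original) are all assumptions of the editor.

    import numpy as np

    def build_microlens_image(lf, m):
        # lf: decoded d-th light field as an array of shape (S, T, V, U, 3)
        S, T, V, U, C = lf.shape
        # step 1.4 (assumed form of formula (1)): keep the m x m views around the central view
        cs, ct = S // 2, T // 2
        half = m // 2
        views = lf[cs - half:cs + half + 1, ct - half:ct + half + 1]     # (m, m, V, U, C)
        # step 1.5: x = (v - 1) * m + t, y = (u - 1) * m + s (1-based indices in the text),
        # i.e. each m x m block of the microlens image holds all views of one spatial pixel
        micro = views.transpose(2, 1, 3, 0, 4).reshape(V * m, U * m, C)
        return micro                                                     # H = V * m, W = U * m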
Step 2, selecting the sub-aperture image of the d-th central view angle from the d-th image set M_d, and marking the salient region on this d-th central-view sub-aperture image, setting the pixels of the salient region to 1 and the pixels of the non-salient region to 0, thereby obtaining the d-th true saliency map G_d of the d-th microlens image I_d; the height and width of the d-th true saliency map G_d are V and U respectively;
Step 3, performing data enhancement on the d-th microlens image I_d to obtain the d-th enhanced microlens image set I_d′; performing geometric transformation on the d-th true saliency map G_d to obtain the d-th transformed true saliency map set G_d′;
Step 4, repeating steps 1.2 to 3 to obtain the D enhanced microlens image sets of the light field data set L, denoted I′ = (I_1′, I_2′, …, I_d′, …, I_D′), and the D transformed true saliency map sets, denoted G′ = (G_1′, G_2′, …, G_d′, …, G_D′);
Step 5, constructing the d light field data Ld(u, v, s, t) salient object detection model;
step 5.1, acquiring a Deeplab-V2 convolutional neural network of a layer c, wherein the Deeplab-V2 convolutional neural network comprises a convolutional layer, a pooling layer and a discarding layer;
step 5.2, modifying the Deeplab-V2 convolutional neural network of the layer c to obtain a modified LFnet convolutional neural network;
Step 5.2.1, adding a convolutional layer LF_conv1_1 with convolution kernel size m × m and a ReLU activation function LF_ReLU1_1 before the first layer of the Deeplab-V2 convolutional neural network;
setting the stride of the convolution kernel to m for the convolution operation of the convolutional layer LF_conv1_1;
the mathematical expression of the ReLU activation function LF_ReLU1_1 is φ(a) = max(0, a), where a represents the output of the convolutional layer LF_conv1_1 and the input to the ReLU activation function LF_ReLU1_1, and φ(a) represents the output of the ReLU activation function LF_ReLU1_1;
Step 5.2.2, adding a dropout layer after each of the other convolutional layers of the Deeplab-V2 convolutional neural network, except the convolutional layer LF_conv1_1 and the convolutional layers in the Deeplab-V2 convolutional neural network that are already connected to a dropout layer;
Step 5.2.3, setting the number of output channels of layer c − 1 of the Deeplab-V2 convolutional neural network to b, wherein b is the number of pixel classes;
Step 5.2.4, adding an upsampling layer after layer c of the Deeplab-V2 convolutional neural network, and using the upsampling layer to perform an upsampling operation on the feature map F_d(q, r, b) output by layer c of the Deeplab-V2 convolutional neural network, so as to obtain the upsampled feature map F_d′(q, r, b), wherein q, r and b respectively represent the width, height and number of channels of the feature map F_d(q, r, b);
Step 5.2.5, adding a crop layer after the upsampling layer, and using the crop layer to crop the feature map F_d′(q, r, b) according to the height V and width U of the d-th true saliency map G_d, so as to obtain the pixel-class prediction probability map F_d″(q, r, b) of the microlens image I_d;
Step 5.3, taking the enhanced microlens image sets I′ as the input of the LFnet convolutional neural network and the transformed true saliency map sets G′ as the labels, using a cross-entropy loss function and training the LFnet convolutional neural network with a gradient descent algorithm, thereby obtaining the salient object detection model of the light field data; salient object detection of the light field data is then realized with this model.
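For step 5.3, a training-loop sketch (continuing the LFNet sketch above) is shown below; the data loader, learning rate, momentum and number of epochs are placeholders chosen by the editor, since the patent only states that a cross-entropy loss and gradient descent are used.

    import torch
    import torch.nn as nn

    model = LFNet(deeplab_v2, m=9)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)   # gradient descent
    criterion = nn.CrossEntropyLoss()                                        # cross-entropy loss

    for epoch in range(num_epochs):
        for micro_img, gt_map in train_loader:        # enhanced microlens images I', labels G'
            logits = model(micro_img, out_size=gt_map.shape[-2:])
            loss = criterion(logits, gt_map.long())   # gt_map holds class indices 0 / 1 per pixel
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()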
Compared with the prior art, the invention has the beneficial effects that:
1. The invention uses a second-generation light field camera to collect light field data of complex and variable scenes; the scenes include difficulties such as salient objects of various sizes, multiple light sources, similarity between the salient objects and the background, and cluttered backgrounds, which substantially supplements current light field saliency data in terms of quantity and difficulty and improves its quality.
2. The method exploits the powerful image-processing capability of deep convolutional networks to extract image features, fuses the spatial and view-angle information of the light field data, captures the context information of the microlens image with an atrous spatial pyramid, and detects the salient objects in the image scene, thereby overcoming the inability of current two-dimensional or three-dimensional salient object detection methods to use view-angle information and improving the precision and robustness of salient object detection in complex scenes.
3. The multi-view information in the microlens image reflects the spatial geometry of the scene; feeding the microlens image directly into the convolutional neural network to realize salient object detection overcomes the separate processing of depth and color information in current light field salient object detection methods, takes both depth perception and visual saliency into account, effectively exploits the complementarity of depth and color, and improves the accuracy of salient object detection.
Drawings
FIG. 1 is a flow chart of the salient object detection method of the present invention;
FIG. 2 is a sub-aperture image obtained by the method of the present invention;
FIG. 3 is a microlens image obtained by the method of the present invention;
FIG. 4 is a partial scene and a true saliency map of a data set acquired by the method of the present invention;
FIG. 5 is a detailed process diagram of the microlens image input network model according to the method of the present invention;
FIG. 6 is a diagram of the Deeplab-V2 model used in the method of the present invention;
FIG. 7 is a comparison of salient object detection results obtained by the method of the present invention and by other light field salient object detection methods on the data set collected with the second-generation light field camera;
fig. 8 is a quantitative comparison, using the recall/precision curve as the metric, between the method of the present invention and other current light field saliency extraction methods on the data set acquired with the second-generation light field camera.
Detailed Description
In this embodiment, a light field salient object detection method based on a deep convolutional network is performed, as shown in fig. 1, according to the following steps:
Step 1, obtaining a microlens image I_d:
Step 1.1, acquiring a light field file with a light field device and decoding it to obtain a light field data set, denoted L = (L_1, L_2, …, L_d, …, L_D), wherein L_d represents the d-th light field data and is written as L_d(u, v, s, t); u and v represent any horizontal pixel and vertical pixel in the spatial information, and s and t represent any horizontal view angle and vertical view angle in the view-angle information; d ∈ [1, D], and D represents the total number of light field data;
In this embodiment, a second-generation light field camera is used to acquire the light field file, which is decoded with the Lytro Power Tools (Beta) to obtain the light field data L_d(u, v, s, t). The light field data L_d(u, v, s, t) is expressed with the two-plane parameterization: in the four-dimensional (u, v, s, t) coordinate space, each light ray corresponds to one sample of the light field, the (u, v) plane represents the spatial information and the (s, t) plane represents the view-angle information. In the experiments of the invention, 640 light field data were acquired and evenly divided into 5 folds; one fold is selected in turn as the test set and the remaining 4 folds as the training set, so D in step 1.1 denotes the training set and D = 512;
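The 5-fold rotation described above can be written compactly as follows (illustrative indexing only; the patent does not specify how the 640 samples are ordered or grouped):

    num_samples, num_folds = 640, 5
    fold_size = num_samples // num_folds                  # 128 samples per fold
    folds = [list(range(k * fold_size, (k + 1) * fold_size)) for k in range(num_folds)]
    for k in range(num_folds):
        test_ids = folds[k]                               # 128 test samples
        train_ids = [i for j, f in enumerate(folds) if j != k for i in f]   # 512 = D training samples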
Step 1.2, fixing the horizontal view angle s and the vertical view angle t, and traversing all horizontal and vertical pixels of the d-th light field data L_d(u, v, s, t) to obtain the sub-aperture image at the view angle of the t-th row and s-th column in the d-th light field data L_d(u, v, s, t), whose height and width are denoted V and U respectively, with v ∈ [1, V] and u ∈ [1, U]; in this experiment, V = 375 and U = 540;
Step 1.3, traversing all horizontal and vertical view angles of the d-th light field data L_d(u, v, s, t) to obtain the set N_d of sub-aperture images at all view angles of the d-th light field data, wherein s ∈ [1, S], t ∈ [1, T]; S represents the row of the maximum horizontal view angle, and T represents the column of the maximum vertical view angle; in this embodiment, S = 14 and T = 14; as shown in fig. 2, the left image in fig. 2 is the set of sub-aperture images at all view angles, and the right image in fig. 2 is the sub-aperture image at the view angle in row 6, column 11;
Step 1.4, defining the number of the selected visual angles as m multiplied by m, and utilizing a formula (1) to collect the sub-aperture images N under the d-th all visual anglesdTo select the d-th image set M centered on the central view angled(ii) a In specific implementation, m is 9, and 81 view images are selected in total; experiments show that more visual angles can provide more information, the performance of the obvious target detection model can be further improved, however, more visual angles consume a large amount of storage and calculation time, and the experiment difficulty is increased;
Figure BDA0001815882180000065
in the formula (1), the reaction mixture is,
Figure BDA0001815882180000066
and to
Figure BDA0001815882180000067
Taking an integer downwards;
step 1.5, obtaining the d-th microlens image I according to the x ═ v-1 × m + t, y ═ u-1 × m + sdMiddle x row and y column pixel point Id(x, y) to obtain the d-th microlens image I with height and width of H and W, respectivelydAs shown in FIG. 3, where x ∈ [1, H ]],y∈[1,W]H ═ V × m, W ═ U × m; in this embodiment, H is 3375, W is 4860, and the left image in fig. 3 is a microlens image IdAnd the right image in FIG. 3 is a microlens image IdAnd (3) partially enlarging, wherein all pixels in the grids in the partially enlarged image represent a pixel set of the same spatial information and different viewing angle information.
Step 2, selecting the sub-aperture image of the d-th central view angle from the d-th image set M_d, and marking the salient region on this d-th central-view sub-aperture image, setting the pixels of the salient region to 1 and the pixels of the non-salient region to 0, thereby obtaining the d-th true saliency map G_d of the d-th microlens image I_d; the height and width of the d-th true saliency map G_d are V and U, in this embodiment 375 and 540; as shown in fig. 4, the first and third rows in fig. 4 are microlens images, and the second and fourth rows are the true saliency maps.
Step 3, performing data enhancement on the d-th microlens image I_d to obtain the d-th enhanced microlens image set I_d′, and performing geometric transformation on the d-th true saliency map G_d to obtain the d-th transformed true saliency map set G_d′; in this embodiment, data enhancement of the d-th microlens image I_d is realized by rotation, flipping, increasing chroma, increasing contrast, increasing brightness, decreasing brightness and adding Gaussian noise; data enhancement improves the generalization ability of the salient object detection model.
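The augmentations listed above can be sketched with Pillow and NumPy as follows; the geometric transforms (rotation, flipping) are applied to both the microlens image and its true saliency map, while the photometric ones only touch the image. The enhancement factors and the noise level are illustrative values chosen by the editor, not values from the patent.

    import numpy as np
    from PIL import Image, ImageEnhance

    def augment(micro_img, gt_map):
        samples = []
        for angle in (90, 180, 270):                                          # rotation
            samples.append((micro_img.rotate(angle, expand=True),
                            gt_map.rotate(angle, expand=True)))
        samples.append((micro_img.transpose(Image.FLIP_LEFT_RIGHT),           # flipping
                        gt_map.transpose(Image.FLIP_LEFT_RIGHT)))
        samples.append((ImageEnhance.Color(micro_img).enhance(1.5), gt_map))      # increase chroma
        samples.append((ImageEnhance.Contrast(micro_img).enhance(1.5), gt_map))   # increase contrast
        samples.append((ImageEnhance.Brightness(micro_img).enhance(1.3), gt_map)) # increase brightness
        samples.append((ImageEnhance.Brightness(micro_img).enhance(0.7), gt_map)) # decrease brightness
        arr = np.asarray(micro_img, dtype=np.float32)
        noisy = np.clip(arr + np.random.normal(0.0, 10.0, arr.shape), 0, 255)     # Gaussian noise
        samples.append((Image.fromarray(noisy.astype(np.uint8)), gt_map))
        return samples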
Step 4, repeating steps 1.2 to 3 to obtain the D enhanced microlens image sets of the light field data set L, denoted I′ = (I_1′, I_2′, …, I_d′, …, I_D′), and the D transformed true saliency map sets, denoted G′ = (G_1′, G_2′, …, G_d′, …, G_D′);
Step 5, constructing the salient object detection model of the d-th light field data L_d(u, v, s, t);
Step 5.1, the Deeplab-V2 convolutional neural network is adopted; it is composed of 16 convolutional layers, 5 pooling layers, 2 dropout layers and 1 merge layer and is used for semantic segmentation; its detailed structure is shown in fig. 6; the atrous spatial pyramid contained in the Deeplab-V2 network captures the image context at multiple scales, which enables detection of salient objects at multiple scales.
Step 5.2, modifying the c-layer Deeplab-V2 convolutional neural network to obtain the modified LFnet convolutional neural network; the detailed structure of the LFnet convolutional neural network is shown in fig. 5;
Step 5.2.1, adding a convolutional layer LF_conv1_1 with convolution kernel size m × m and a ReLU activation function LF_ReLU1_1 before the first layer of the Deeplab-V2 convolutional neural network;
setting the stride of the convolution kernel to m for the convolution operation of the convolutional layer LF_conv1_1; in this implementation, m = 9; when constructing the microlens image I_d in steps 1.4 and 1.5, the number of selected view angles is 9 × 9, so in order for the network to better extract and fuse the multi-view information, the convolution kernel size of LF_conv1_1 is set to 9 × 9 with stride 9;
the mathematical expression of the ReLU activation function LF_ReLU1_1 is φ(a) = max(0, a), where a represents the output of the convolutional layer LF_conv1_1 and the input to the ReLU activation function LF_ReLU1_1, and φ(a) represents the output of the ReLU activation function LF_ReLU1_1;
Step 5.2.2, adding a dropout layer after each of the other convolutional layers of the Deeplab-V2 convolutional neural network, except the convolutional layer LF_conv1_1 and the convolutional layers in the Deeplab-V2 convolutional neural network that are already connected to a dropout layer; in this embodiment, adding the dropout layers effectively prevents overfitting and improves the generalization ability of the salient object detection model;
Step 5.2.3, setting the number of output channels of layer c − 1 of the Deeplab-V2 convolutional neural network to b, wherein b is the number of pixel classes; in this embodiment, c − 1 = 23 and b = 2; the salient object detection model classifies pixels into two classes, salient and non-salient.
Step 5.2.4, adding an upsampling layer after the layer c of the Deeplab-V2 convolutional neural network, and utilizing the upsampling layer to output a characteristic diagram F of the layer c of the Deeplab-V2 convolutional neural networkd(q, r, b) performing an upsampling operation to obtain an upsampledLater feature map Fd' (q, r, b); wherein q, r and b respectively represent a characteristic diagram FdWidth, height and number of channels of (q, r, b);
step 5.2.5, adding a shear layer after the upper sampling layer, and according to the d-th real saliency map GdLength V and width U of (d), using shear layer pair profile Fd' (q, r, b) obtaining a microlens image I by shearingdPixel class prediction probability map Fd″(q,r,b);
And 5.3, taking the enhanced microlens image set I 'as the input of the LFnet convolutional neural network, taking the transformed real significant image set G' as a label, using a cross entropy loss function, and training the LFnet convolutional neural network by using a gradient descent algorithm, so as to obtain a significant target detection model of the light field data, and realizing significant target detection of the light field data by using the significant target detection model.
The test set is processed according to steps 1.1 to 2 to obtain the microlens images of the test set, which are input into the salient object detection model to obtain the pixel-class prediction probability map F_test″(q, r, b) of the test set; the saliency map F_s″ is then extracted with formula (2), where F_test″(q, r, 2) denotes the values of the second channel of the probability map F_test″(q, r, b); finally the saliency map F_s″ is normalized to obtain the final saliency map F_s:
F_s″ = F_test″(q, r, 2)   (2)
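A short sketch of formula (2) and the normalization step: the saliency map is taken from the second (salient-class) channel of the prediction and min-max normalized to [0, 1]. Applying a softmax over the two channels first is an assumption of the editor; the variable names are hypothetical.

    import torch.nn.functional as F

    probs = F.softmax(logits, dim=1)       # logits: (1, 2, V, U) pixel-class prediction F_test''
    sal = probs[0, 1]                      # F_s'' = F_test''(q, r, 2), the salient channel
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)     # normalized final saliency map F_s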
In order to evaluate the performance of the salient object detection model obtained by the method more fairly, the training and test sets are selected in turn, and the average of the 5 test results is taken as the final index for evaluating the performance of the salient object detection model.
Fig. 7 is a qualitative comparison between the salient object detection method based on the deep convolutional network of the present invention and other current light field salient object detection methods, where Ours denotes the method of the present invention; Multi-cue denotes a light field salient object detection method based on focus, view, depth and color cues; DILF denotes a light field salient object detection method based on color, depth and background prior; WSC denotes a light field salient object detection method based on sparse coding theory; and LFS denotes a salient object detection method based on object and background modeling. All four methods were tested on the real-scene data set collected with the second-generation light field camera used in the present invention.
Table 1 is a quantitative comparison, on the data set acquired with the second-generation light field camera, between the salient object detection method based on the deep convolutional network of the present invention and other current light field salient object detection methods, using the F-measure, the WF-measure, the average precision (AP) and the mean absolute error (MAE) as metrics. The F-measure is a statistic derived from the recall/precision curve, and the closer its value is to 1, the better the salient object detection; the WF-measure is a statistic derived from the weighted recall/precision curve, and the closer its value is to 1, the better the detection; AP measures the average precision of the salient object detection results, and the closer its value is to 1, the better the detection; MAE measures the mean absolute difference between the salient object detection results and the ground truth, and the closer its value is to 0, the better the detection.
Fig. 8 is a quantitative comparison between the salient object detection method based on the deep convolutional network of the present invention and other current light field salient object detection methods, using the precision-recall (PR) curve as the metric; if one PR curve is completely "wrapped" by another PR curve, the performance of the latter is better than that of the former.
TABLE 1
Method       Ours     Multi-cue   DILF     WSC      LFS
F-measure    0.8118   0.6649      0.6395   0.6452   0.6108
WF-measure   0.7541   0.5420      0.4844   0.5946   0.3597
AP           0.9124   0.6593      0.6922   0.5960   0.6193
MAE          0.0551   0.1198      0.1390   0.1093   0.1698
As can be seen from the quantitative comparison in Table 1, the F-measure, WF-measure and AP obtained by the method of the invention are higher than those of the other light field salient object detection methods, and its MAE is lower. As can be seen from the PR curves in fig. 8, the recall/precision curve of the method of the invention lies close to the upper right corner and contains the PR curves of all the other methods; at the same recall, its probability of false detection is lower.
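For reference, two of the metrics in Table 1, MAE and F-measure, can be computed per image as sketched below. The patent does not state the beta^2 weight or the binarization threshold used for the F-measure, so the common choices here (beta^2 = 0.3, a fixed threshold) are assumptions of the editor.

    import numpy as np

    def mae(sal, gt):
        # mean absolute difference between the predicted saliency map and the ground truth
        return np.abs(sal.astype(np.float64) - gt.astype(np.float64)).mean()

    def f_measure(sal, gt, beta2=0.3, thresh=0.5):
        pred = sal >= thresh
        tp = np.logical_and(pred, gt > 0.5).sum()
        precision = tp / (pred.sum() + 1e-8)
        recall = tp / ((gt > 0.5).sum() + 1e-8)
        return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)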

Claims (1)

1. A light field salient object detection method based on a deep convolutional network is characterized by comprising the following steps:
Step 1, obtaining a microlens image I_d:
Step 1.1, acquiring a light field file with a light field device and decoding it to obtain a light field data set, denoted L = (L_1, L_2, …, L_d, …, L_D), wherein L_d represents the d-th light field data and is written as L_d(u, v, s, t); u and v represent any horizontal pixel and vertical pixel in the spatial information, and s and t represent any horizontal view angle and vertical view angle in the view-angle information; d ∈ [1, D], and D represents the total number of light field data;
Step 1.2, fixing a horizontal view angle s and a vertical view angle t, and traversing all horizontal and vertical pixels of the d-th light field data L_d(u, v, s, t) to obtain the sub-aperture image at the view angle of the s-th row and t-th column in the d-th light field data L_d(u, v, s, t), whose height and width are denoted V and U respectively, with v ∈ [1, V] and u ∈ [1, U];
Step 1.3, traversing all horizontal and vertical view angles of the d-th light field data L_d(u, v, s, t) to obtain the set N_d of sub-aperture images at all view angles of the d-th light field data, wherein s ∈ [1, S], t ∈ [1, T]; S represents the row of the maximum horizontal view angle, and T represents the column of the maximum vertical view angle;
Step 1.4, defining the number of selected view angles as m × m, and using formula (1) to select, from the set N_d of sub-aperture images at all view angles of the d-th light field data, the d-th image set M_d centered on the central view angle;
(formula (1) is reproduced as an image in the original; the two quantities marked in it are rounded down to an integer)
Step 1.5, obtaining the pixel I_d(x, y) in row x and column y of the d-th microlens image I_d according to x = (v − 1) × m + t and y = (u − 1) × m + s, thereby obtaining the d-th microlens image I_d with height H and width W, wherein x ∈ [1, H], y ∈ [1, W], H = V × m and W = U × m;
Step 2, selecting the sub-aperture image of the d-th central view angle from the d-th image set M_d, and marking the salient region on this d-th central-view sub-aperture image, setting the pixels of the salient region to 1 and the pixels of the non-salient region to 0, thereby obtaining the d-th true saliency map G_d of the d-th microlens image I_d; the height and width of the d-th true saliency map G_d are V and U respectively;
Step 3, performing data enhancement on the d-th microlens image I_d to obtain the d-th enhanced microlens image set I_d′; performing geometric transformation on the d-th true saliency map G_d to obtain the d-th transformed true saliency map set G_d′;
Step 4, repeating steps 1.2 to 3 to obtain the D enhanced microlens image sets of the light field data set L, denoted I′ = (I_1′, I_2′, …, I_d′, …, I_D′), and the D transformed true saliency map sets, denoted G′ = (G_1′, G_2′, …, G_d′, …, G_D′);
Step 5, constructing the salient object detection model of the d-th light field data L_d(u, v, s, t);
Step 5.1, acquiring a c-layer Deeplab-V2 convolutional neural network, which comprises convolutional layers, pooling layers and dropout layers;
Step 5.2, modifying the c-layer Deeplab-V2 convolutional neural network to obtain the modified LFnet convolutional neural network;
Step 5.2.1, adding a convolutional layer LF_conv1_1 with convolution kernel size m × m and a ReLU activation function LF_ReLU1_1 before the first layer of the Deeplab-V2 convolutional neural network;
setting the stride of the convolution kernel to m for the convolution operation of the convolutional layer LF_conv1_1;
the mathematical expression of the ReLU activation function LF_ReLU1_1 is φ(a) = max(0, a), where a represents the output of the convolutional layer LF_conv1_1 and the input to the ReLU activation function LF_ReLU1_1, and φ(a) represents the output of the ReLU activation function LF_ReLU1_1;
Step 5.2.2, adding a dropout layer after each of the other convolutional layers of the Deeplab-V2 convolutional neural network, except the convolutional layer LF_conv1_1 and the convolutional layers in the Deeplab-V2 convolutional neural network that are already connected to a dropout layer;
Step 5.2.3, setting the number of output channels of layer c − 1 of the Deeplab-V2 convolutional neural network to b, wherein b is the number of pixel classes;
Step 5.2.4, adding an upsampling layer after layer c of the Deeplab-V2 convolutional neural network, and using the upsampling layer to perform an upsampling operation on the feature map F_d(q, r, b) output by layer c of the Deeplab-V2 convolutional neural network, so as to obtain the upsampled feature map F_d′(q, r, b), wherein q, r and b respectively represent the width, height and number of channels of the feature map F_d(q, r, b);
Step 5.2.5, adding a crop layer after the upsampling layer, and using the crop layer to crop the feature map F_d′(q, r, b) according to the height V and width U of the d-th true saliency map G_d, so as to obtain the pixel-class prediction probability map F_d″(q, r, b) of the microlens image I_d;
Step 5.3, taking the enhanced microlens image sets I′ as the input of the LFnet convolutional neural network and the transformed true saliency map sets G′ as the labels, using a cross-entropy loss function and training the LFnet convolutional neural network with a gradient descent algorithm, thereby obtaining the salient object detection model of the light field data, and realizing salient object detection of the light field data with the salient object detection model.
CN201811141315.2A 2018-09-28 2018-09-28 Light field significant target detection method based on deep convolutional network Active CN109344818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811141315.2A CN109344818B (en) 2018-09-28 2018-09-28 Light field significant target detection method based on deep convolutional network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811141315.2A CN109344818B (en) 2018-09-28 2018-09-28 Light field significant target detection method based on deep convolutional network

Publications (2)

Publication Number Publication Date
CN109344818A CN109344818A (en) 2019-02-15
CN109344818B true CN109344818B (en) 2020-04-14

Family

ID=65307539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811141315.2A Active CN109344818B (en) 2018-09-28 2018-09-28 Light field significant target detection method based on deep convolutional network

Country Status (1)

Country Link
CN (1) CN109344818B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110441271B (en) * 2019-07-15 2020-08-28 清华大学 Light field high-resolution deconvolution method and system based on convolutional neural network
CN111369522B (en) * 2020-03-02 2022-03-15 合肥工业大学 Light field significance target detection method based on generation of deconvolution neural network
CN111445465B (en) * 2020-03-31 2023-06-16 江南大学 Method and equipment for detecting and removing snow or rain belt of light field image based on deep learning
CN111931793B (en) * 2020-08-17 2024-04-12 湖南城市学院 Method and system for extracting saliency target
CN113343822B (en) * 2021-05-31 2022-08-19 合肥工业大学 Light field saliency target detection method based on 3D convolution

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701813A (en) * 2016-01-11 2016-06-22 深圳市未来媒体技术研究院 Significance detection method of light field image
WO2018072858A1 (en) * 2016-10-18 2018-04-26 Photonic Sensors & Algorithms, S.L. Device and method for obtaining distance information from views
CN107993260A (en) * 2017-12-14 2018-05-04 浙江工商大学 A kind of light field image depth estimation method based on mixed type convolutional neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160203689A1 (en) * 2015-01-08 2016-07-14 Kenneth J. Hintz Object Displacement Detector
CN105913070B (en) * 2016-04-29 2019-04-23 合肥工业大学 A kind of multi thread conspicuousness extracting method based on light-field camera
CN106981080A (en) * 2017-02-24 2017-07-25 东华大学 Night unmanned vehicle scene depth method of estimation based on infrared image and radar data

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701813A (en) * 2016-01-11 2016-06-22 深圳市未来媒体技术研究院 Significance detection method of light field image
WO2018072858A1 (en) * 2016-10-18 2018-04-26 Photonic Sensors & Algorithms, S.L. Device and method for obtaining distance information from views
CN107993260A (en) * 2017-12-14 2018-05-04 浙江工商大学 A kind of light field image depth estimation method based on mixed type convolutional neural networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Occlusion-aware depth estimation for light field using multi-orientation EPIs;Hao Sheng等;《Pattern Recognition》;20180228;第74卷;第587-599页 *
Saliency Detection on Light Field: A Multi-Cue Approach;Jun Zhang等;《ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)》;20170831;第13卷(第3期);第32:1-32:19页 *
Research on calibration method and depth estimation of light field cameras; 王丽娟; Wanfang Data Knowledge Service Platform; 20180731; pp. 1-49 *
Research on depth estimation of light field images based on convolutional neural networks; 罗姚翔; Wanfang Data Knowledge Service Platform; 20180830; pp. 1-50 *

Also Published As

Publication number Publication date
CN109344818A (en) 2019-02-15

Similar Documents

Publication Publication Date Title
CN109344818B (en) Light field significant target detection method based on deep convolutional network
Qi et al. Volumetric and multi-view cnns for object classification on 3d data
WO2018023734A1 (en) Significance testing method for 3d image
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
CN111079584A (en) Rapid vehicle detection method based on improved YOLOv3
Feng et al. Benchmark data set and method for depth estimation from light field images
CN110827312B (en) Learning method based on cooperative visual attention neural network
Nedović et al. Stages as models of scene geometry
CN110910437B (en) Depth prediction method for complex indoor scene
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN113436210B (en) Road image segmentation method fusing context progressive sampling
CN109829924A (en) A kind of image quality evaluating method based on body feature analysis
CN113343822B (en) Light field saliency target detection method based on 3D convolution
CN110648331A (en) Detection method for medical image segmentation, medical image segmentation method and device
CN112926652A (en) Fish fine-grained image identification method based on deep learning
CN111640116A (en) Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN107392211B (en) Salient target detection method based on visual sparse cognition
CN113989343A (en) Attention mechanism-based sensor fusion depth reconstruction data driving method
CN114926826A (en) Scene text detection system
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN114612315A (en) High-resolution image missing region reconstruction method based on multi-task learning
Babu et al. An efficient image dahazing using Googlenet based convolution neural networks
Shit et al. An encoder‐decoder based CNN architecture using end to end dehaze and detection network for proper image visualization and detection
Khoshboresh-Masouleh et al. Robust building footprint extraction from big multi-sensor data using deep competition network
CN116977895A (en) Stain detection method and device for universal camera lens and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant