CN109344818A - Light field salient target detection method based on a deep convolutional network - Google Patents
- Publication number
- CN109344818A CN109344818A CN201811141315.2A CN201811141315A CN109344818A CN 109344818 A CN109344818 A CN 109344818A CN 201811141315 A CN201811141315 A CN 201811141315A CN 109344818 A CN109344818 A CN 109344818A
- Authority
- CN
- China
- Prior art keywords
- light field
- image
- layer
- target detection
- salient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/145—Illumination specially adapted for pattern recognition, e.g. using gratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a light field salient target detection method based on a deep convolutional network. The steps include: 1. converting the light field data obtained with a light field acquisition device into sub-aperture images for all viewing angles; 2. reorganizing the sub-aperture images of the different viewing angles into a micro-lens image; 3. applying data augmentation to the micro-lens images; 4. building a salient target detection model that operates on micro-lens images, starting from the pre-trained weights of the Deeplab-V2 network, and training it on the data set; 5. performing salient target detection on the light field data to be processed with the trained model. The method of the invention can effectively improve the accuracy of salient target detection in complex-scene images.
Description
Technical field
The invention belongs to the fields of computer vision and image processing and analysis, and specifically concerns a light field salient target detection method based on a deep convolutional network.
Background technique
Salient target detection mimics a perceptual capability of the human visual system. When observing an image, the visual system quickly locates the regions and targets of interest; the process of finding them is salient target detection. With the development of computer technology and the spread of the internet and mobile smart devices, the volume of images people acquire from the outside world is growing explosively. Salient target detection selects a very small part of the large amount of incoming visual information for subsequent, more expensive processing, such as object detection and recognition, image retrieval, and image segmentation, effectively reducing the computational load of a vision system. Salient target detection has therefore become one of the research hot spots in the field of computer vision.
According to the image data they can use, current salient target detection methods fall into three classes: two-dimensional salient target detection, three-dimensional salient target detection, and light field salient target detection.
Two-dimensional methods work on 2-D images captured by a conventional camera. Using handcrafted rules or learning-based models, they extract and fuse color, brightness, position, and texture features through local or global contrast frameworks, and thereby separate salient from non-salient regions.
Three-dimensional methods use the 2-D image together with the depth information of the scene to perform salient target detection. Depth, obtained from a 3-D sensor, also plays an important role in the human visual system: it reflects the distance between objects and the observer. Using depth for salient target detection compensates for the shortcomings of conventional 2-D images; color and depth complement each other in producing the final saliency map, improving the accuracy of detection to a certain extent.
Light field methods process the light field data captured by a light-field camera. Light field imaging is a new computational imaging technique: with a single exposure it simultaneously records the position and the direction of the light radiation in space, so the captured light field reflects the geometry and reflectance characteristics of the natural scene. Current conventional methods fuse saliency cues computed on different light field representations to improve the performance of salient target detection in challenging scenes.
Although some salient target detection methods with excellent performance have appeared in the field of computer vision, these methods still have shortcomings:
1. In two-dimensional methods, a 2-D image is the integral of the light projected onto the camera sensor and contains only the light intensity from a single direction. Two-dimensional salient target detection is therefore overly sensitive to high-frequency content and noise, and is easily disturbed by foregrounds that blend with the background in color and texture and by cluttered backgrounds.
2. In three-dimensional methods, the precision of the depth information of the scene depends on the depth camera. Current depth cameras suffer from low resolution, a narrow measurement range, and heavy noise; they cannot measure transmissive materials and are easily disturbed by sunlight and by reflections from smooth surfaces.
3. In three-dimensional methods, features such as color, depth, and position are processed independently and fused afterwards; their complementarity is not exploited jointly.
4. Most methods based on 2-D and 3-D images assume that there is an obvious difference between target and background and that the background is simple. As image data grows massively and image content becomes more complex, these methods show clear limitations.
5. Light field salient target detection is still at an early stage: the available data sets are few and of poor image quality. Current methods using light field data are all based on traditional handcrafted saliency features, and model cues such as color, depth, and refocusing separately, so they suffer from insufficient feature representation power and poor robustness.
Summary of the invention
To overcome the shortcomings of the prior art described above, the present invention proposes a light field salient target detection method based on a deep convolutional network, which makes full use of the spatial and angular information of the light field data and thereby effectively improves the accuracy of salient target detection in complex-scene images.
The present invention adopts the following technical scheme to solve the technical problem:
A light field salient target detection method based on a deep convolutional network according to the invention proceeds as follows:
Step 1: obtain the micro-lens image I_d.
Step 1.1: acquire light field files with a light field device and decode them to obtain the light field data set, denoted L = (L_1, L_2, …, L_d, …, L_D), where L_d denotes the d-th light field and is written L_d(u, v, s, t); u and v index any horizontal and vertical pixel of the spatial information, s and t index any horizontal and vertical viewing angle of the angular information; d ∈ [1, D], where D is the total number of light fields.
Step 1.2: fix the horizontal angle s and the vertical angle t, and traverse all horizontal and vertical pixels of the d-th light field L_d(u, v, s, t) to obtain the sub-aperture image of L_d(u, v, s, t) at the viewing angle in row t, column s; its height and width are denoted V and U, with v ∈ [1, V] and u ∈ [1, U].
Step 1.3: traverse all horizontal and vertical viewing angles of L_d(u, v, s, t) to obtain the d-th set N_d of sub-aperture images over all viewing angles, where s ∈ [1, S] and t ∈ [1, T]; S is the row of the maximum horizontal angle and T the column of the maximum vertical angle.
Step 1.4: let the number of selected viewing angles be m × m, and use formula (1) to select from the d-th set N_d the d-th image set M_d centered on the central viewing angle. The terms of formula (1) are rounded down to the nearest integer.
Step 1.5: using x = (v − 1) × m + t and y = (u − 1) × m + s, obtain the pixel I_d(x, y) in row x, column y of the d-th micro-lens image I_d, and thereby the micro-lens image I_d of height H and width W, where x ∈ [1, H], y ∈ [1, W], H = V × m, W = U × m.
Step 2: from the d-th image set M_d, select the sub-aperture image of the central viewing angle. Annotate its salient region: set the pixels of the salient region to 1 and the pixels of the non-salient region to 0, obtaining the d-th ground-truth saliency map G_d of the micro-lens image I_d; the height and width of G_d are V and U.
Step 3: apply data augmentation to the d-th micro-lens image I_d to obtain the d-th augmented micro-lens image set I′_d; apply the corresponding geometric transforms to the ground-truth map G_d to obtain the d-th set of transformed ground-truth maps G′_d.
Step 4: repeat steps 1.2 to 3 to obtain the D augmented micro-lens image sets I′ = (I′_1, I′_2, …, I′_d, …, I′_D) of the light field data set L and the D sets of transformed ground-truth maps, denoted G′ = (G′_1, G′_2, …, G′_d, …, G′_D).
Step 5: build the salient target detection model for the d-th light field L_d(u, v, s, t).
Step 5.1: obtain the c-layer Deeplab-V2 convolutional neural network, which comprises convolutional layers, pooling layers, and dropout layers.
Step 5.2: modify the c-layer Deeplab-V2 network to obtain the modified LFnet convolutional neural network.
Step 5.2.1: before the first layer of the Deeplab-V2 network, add a convolutional layer LF_conv1_1 with kernel size m × m and a ReLU activation LF_relu1_1. Set the stride of the LF_conv1_1 kernel to m during the convolution operation. The ReLU activation LF_relu1_1 is φ(a) = max(0, a), where a is the output of LF_conv1_1 and the input of LF_relu1_1, and φ(a) is the output of LF_relu1_1.
Step 5.2.2: add a dropout layer after every convolutional layer of the Deeplab-V2 network other than LF_conv1_1 and the convolutional layers already followed by a dropout layer.
Step 5.2.3: set the number of output channels of layer c − 1 of the Deeplab-V2 network to b, where b is the number of pixel classes.
Step 5.2.4: add an upsampling layer after layer c of the Deeplab-V2 network and use it to upsample the feature map F_d(q, r, b) output by layer c, obtaining the upsampled feature map F′_d(q, r, b); q, r, and b denote the width, height, and number of channels of F_d(q, r, b).
Step 5.2.5: add a crop layer after the upsampling layer; according to the height V and width U of the d-th ground-truth map G_d, crop F′_d(q, r, b) to obtain the pixel-class prediction probability map F″_d(q, r, b) of the micro-lens image I_d.
Step 5.3: take the augmented micro-lens image set I′ as the input of the LFnet network and the transformed ground-truth maps G′ as labels, compute a cross-entropy loss, and train LFnet with gradient descent to obtain the salient target detection model for light field data; salient target detection on light field data is then performed with this model.
Compared with the prior art, the beneficial effects of the present invention are:
1. The present invention uses a second-generation light-field camera to capture light field data of complex and varied scenes. These scenes contain difficult cases such as salient targets of various sizes, various light sources, targets similar to the background, and cluttered backgrounds, substantially remedying the scarcity and difficulty shortcomings of current light field saliency data and improving the quality of current light field saliency data sets.
2. The present invention extracts image features with a deep convolutional network, which is powerful in image processing, fuses the spatial and angular information of the light field data, and captures the contextual information of the micro-lens image with an atrous spatial pyramid network to detect the salient targets in the image scene. This overcomes the inability of current two- and three-dimensional salient target detection methods to use angular information, and improves the precision and robustness of salient target detection in complex-scene images.
3. The multi-view information in the micro-lens image used here reflects the spatial geometry of the scene. Feeding the micro-lens image directly into the convolutional neural network to perform salient target detection overcomes the shortcoming of current light-field salient target detection methods that process depth and color information independently: depth perception and visual saliency are considered jointly, the complementarity of depth and color is effectively exploited, and the accuracy of salient target detection is improved.
Brief description of the drawings
Fig. 1 is the workflow diagram of the salient target detection method of the invention;
Fig. 2 shows sub-aperture images obtained by the method of the invention;
Fig. 3 shows a micro-lens image obtained by the method of the invention;
Fig. 4 shows part of the scenes of the data set obtained by the method of the invention and their ground-truth saliency maps;
Fig. 5 is the detailed flow diagram of feeding the micro-lens image into the network model in the method of the invention;
Fig. 6 shows the structure of the Deeplab-V2 model used in the method of the invention;
Fig. 7 compares part of the salient target detection results of the method of the invention and other light-field salient target detection methods on the data set captured with the second-generation light-field camera;
Fig. 8 quantitatively compares the method of the invention with other current light-field saliency extraction methods on the data set captured with the second-generation light-field camera, using the recall/precision curve as the metric.
Specific embodiment
In the present embodiment, a light field salient target detection method based on a deep convolutional network, whose flow chart is shown in Fig. 1, proceeds as follows:
Step 1: obtain the micro-lens image I_d.
Step 1.1: acquire light field files with a light field device and decode them to obtain the light field data set, denoted L = (L_1, L_2, …, L_d, …, L_D), where L_d denotes the d-th light field and is written L_d(u, v, s, t); u and v index any horizontal and vertical pixel of the spatial information, s and t index any horizontal and vertical viewing angle of the angular information; d ∈ [1, D], where D is the total number of light fields.
In the present embodiment, the light field files are captured with a second-generation light-field camera and decoded with the lytro power tool beta, yielding the light field data L_d(u, v, s, t). The light field is represented with the two-plane parameterization: in the four-dimensional (u, v, s, t) coordinate space, one ray corresponds to one sample of the light field; the u, v plane is the spatial information plane and the s, t plane is the angular information plane. In the experiments of the invention, 640 light fields were captured in total and divided evenly into 5 parts; one part is used in turn as the test set and the remaining 4 parts as the training set. D in step 1.1 denotes the size of the training set, D = 512.
Step 1.2: fix the horizontal angle s and the vertical angle t, and traverse all horizontal and vertical pixels of the d-th light field L_d(u, v, s, t) to obtain the sub-aperture image of L_d(u, v, s, t) at the viewing angle in row t, column s; its height and width are denoted V and U, with v ∈ [1, V] and u ∈ [1, U]. In this experiment, V = 375 and U = 540.
Step 1.3: traverse all horizontal and vertical viewing angles of L_d(u, v, s, t) to obtain the d-th set N_d of sub-aperture images over all viewing angles, where s ∈ [1, S] and t ∈ [1, T]; S is the row of the maximum horizontal angle and T the column of the maximum vertical angle. In this implementation, S = 14 and T = 14. As shown in Fig. 2, the left image is the set of sub-aperture images of all viewing angles, and the right image is the sub-aperture image at the viewing angle in row 6, column 11.
Step 1.4: let the number of selected viewing angles be m × m, and use formula (1) to select from the d-th set N_d the d-th image set M_d centered on the central viewing angle. In this implementation, m = 9, so 81 view images are chosen. Experiments show that more viewing angles provide more information and can further improve the performance of the salient target detection model, but more viewing angles also consume a large amount of storage and computation time and increase the difficulty of the experiments.
The terms of formula (1) are rounded down to the nearest integer.
Step 1.5: using x = (v − 1) × m + t and y = (u − 1) × m + s, obtain the pixel I_d(x, y) in row x, column y of the d-th micro-lens image I_d, and thereby the micro-lens image I_d of height H and width W, as shown in Fig. 3, where x ∈ [1, H], y ∈ [1, W], H = V × m, W = U × m. In the present embodiment, H = 3375 and W = 4860. The left image of Fig. 3 is the micro-lens image I_d; the right image is an enlarged detail of it, in which all pixels inside one square represent the set of pixels carrying the same spatial information under different viewing angles.
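The pixel mapping of step 1.5, x = (v − 1)·m + t and y = (u − 1)·m + s, interleaves the m × m selected views so that each m × m block of the micro-lens image holds all views of one spatial pixel. A sketch under assumed toy sizes (the paper uses m = 9, V = 375, U = 540); the array layout and function name are illustrative, not from the patent:

```python
import numpy as np

m, V, U = 3, 4, 5
rng = np.random.default_rng(1)
views = rng.random((m, m, V, U, 3))   # views[t-1, s-1] = sub-aperture image (s, t)

def to_microlens(views):
    """Assemble the micro-lens image of step 1.5 from an m x m view block."""
    m, _, V, U, C = views.shape
    ml = np.empty((V * m, U * m, C))
    for t in range(1, m + 1):
        for s in range(1, m + 1):
            # Pixel (v, u) of view (s, t) lands at row (v-1)*m + t,
            # column (u-1)*m + s (1-based), i.e. strided slices below.
            ml[t - 1::m, s - 1::m] = views[t - 1, s - 1]
    return ml

ml = to_microlens(views)              # H x W = (V*m) x (U*m)
```

Reading the same strided slices back out recovers the original views, which is why the network can later re-separate the angular information with an m × m strided convolution.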
Step 2: from the d-th image set M_d, select the sub-aperture image of the central viewing angle. Annotate its salient region: set the pixels of the salient region to 1 and the pixels of the non-salient region to 0, obtaining the d-th ground-truth saliency map G_d of the micro-lens image I_d; the height and width of G_d are V and U. In this implementation, V = 375 and U = 540. As shown in Fig. 4, the first and third rows are micro-lens images, and the second and fourth rows are the corresponding ground-truth saliency maps.
Step 3: apply data augmentation to the d-th micro-lens image I_d to obtain the d-th augmented micro-lens image set I′_d; apply the corresponding geometric transforms to the ground-truth map G_d to obtain the d-th set of transformed ground-truth maps G′_d. In the present embodiment, the micro-lens image I_d is rotated, flipped, and processed with increased chroma, increased contrast, increased brightness, decreased brightness, and added Gaussian noise. Data augmentation improves the generalization ability of the salient target detection model.
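The augmentation of step 3 can be sketched as follows: geometric transforms are applied to both the image and its ground-truth map, while photometric changes leave the map untouched. A minimal illustration with assumed parameters (rotation set, noise level, brightness factor are choices made here, not values from the patent):

```python
import numpy as np

rng = np.random.default_rng(2)

def augment(img, gt, rng):
    """Return (image, ground truth) pairs for step 3.

    Geometric transforms (rotations, flip) change both arrays;
    photometric transforms (brightness, Gaussian noise) change only img."""
    pairs = []
    for k in (0, 1, 2, 3):                            # rotations by k*90 degrees
        pairs.append((np.rot90(img, k), np.rot90(gt, k)))
    pairs.append((np.fliplr(img), np.fliplr(gt)))     # horizontal flip
    bright = np.clip(img * 1.2, 0.0, 1.0)             # brightness increase
    noisy = np.clip(img + rng.normal(0, 0.05, img.shape), 0.0, 1.0)
    pairs += [(bright, gt), (noisy, gt)]              # photometric: gt unchanged
    return pairs

img = rng.random((6, 6, 3))
gt = (rng.random((6, 6)) > 0.5).astype(np.float32)
aug = augment(img, gt, rng)
```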
Step 4: repeat steps 1.2 to 3 to obtain the D augmented micro-lens image sets I′ = (I′_1, I′_2, …, I′_d, …, I′_D) of the light field data set L and the D sets of transformed ground-truth maps, denoted G′ = (G′_1, G′_2, …, G′_d, …, G′_D).
Step 5: build the salient target detection model for the d-th light field L_d(u, v, s, t).
Step 5.1: obtain the c-layer Deeplab-V2 convolutional neural network, which comprises convolutional layers, pooling layers, dropout layers, and a fusion layer. In this implementation, c = 24: Deeplab-V2 is a deep convolutional neural network composed of 16 convolutional layers, 5 pooling layers, 2 dropout layers, and 1 fusion layer, used for semantic segmentation; its detailed structure is shown in Fig. 6. Deeplab-V2 contains an atrous spatial pyramid structure that captures the context of the image at multiple rates, enabling salient target detection at multiple scales.
Step 5.2: modify the c-layer Deeplab-V2 network to obtain the modified LFnet convolutional neural network, whose detailed structure is shown in Fig. 5.
Step 5.2.1: before the first layer of the Deeplab-V2 network, add a convolutional layer LF_conv1_1 with kernel size m × m and a ReLU activation LF_relu1_1. Set the stride of the LF_conv1_1 kernel to m during the convolution operation. In this implementation, m = 9: since the micro-lens image I_d built in steps 1.4 and 1.5 uses 9 × 9 viewing angles, setting the kernel size of LF_conv1_1 to 9 × 9 with stride 9 lets the network better extract and fuse the multi-view information.
The ReLU activation LF_relu1_1 is φ(a) = max(0, a), where a is the output of LF_conv1_1 and the input of LF_relu1_1, and φ(a) is the output of LF_relu1_1.
Step 5.2.2: add a dropout layer after every convolutional layer of the Deeplab-V2 network other than LF_conv1_1 and the convolutional layers already followed by a dropout layer. In the present embodiment, the added dropout layers effectively prevent over-fitting while improving the generalization ability of the salient target detection model.
Step 5.2.3: set the number of output channels of layer c − 1 of the Deeplab-V2 network to b, where b is the number of pixel classes. In this implementation, c − 1 = 23 and b = 2: the salient target detection model classifies each pixel as salient or non-salient.
Step 5.2.4: add an upsampling layer after layer c of the Deeplab-V2 network and use it to upsample the feature map F_d(q, r, b) output by layer c, obtaining the upsampled feature map F′_d(q, r, b); q, r, and b denote the width, height, and number of channels of F_d(q, r, b).
Step 5.2.5: add a crop layer after the upsampling layer; according to the height V and width U of the d-th ground-truth map G_d, crop F′_d(q, r, b) to obtain the pixel-class prediction probability map F″_d(q, r, b) of the micro-lens image I_d.
Step 5.3: take the augmented micro-lens image set I′ as the input of the LFnet network and the transformed ground-truth maps G′ as labels, compute a cross-entropy loss, and train LFnet with gradient descent to obtain the salient target detection model for light field data; salient target detection on light field data is then performed with this model.
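The text says only "cross entropy"; a standard per-pixel two-class softmax cross-entropy, sketched in numpy under that assumption (the patent's exact loss layer is not specified):

```python
import numpy as np

def softmax_cross_entropy(logits, gt):
    """Mean per-pixel cross-entropy for step 5.3.

    logits: (H, W, 2) network scores for non-salient/salient;
    gt:     (H, W) boolean ground truth (True = salient)."""
    z = logits - logits.max(axis=-1, keepdims=True)     # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_p, gt[..., None].astype(int), axis=-1)
    return -picked.mean()                               # average over pixels

rng = np.random.default_rng(4)
logits = rng.normal(size=(5, 7, 2))
gt = rng.random((5, 7)) > 0.5
loss = softmax_cross_entropy(logits, gt)
```

Gradient descent then updates the LFnet weights to reduce this loss; any framework optimizer realizes that part.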
The test set is processed according to steps 1.1 to 2 to obtain its micro-lens images, which are input to the salient target detection model to obtain the pixel-class prediction probability map F″_test(q, r, b) of the test set. The saliency map F″_s is extracted with formula (2), where F″_test(q, r, 2) denotes the values of the second channel of the probability map F″_test(q, r, b); F″_s is then normalized to obtain the final saliency map F_s.
F″_s = F″_test(q, r, 2)    (2)
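Formula (2) just reads off the salient-class channel; a short sketch, where min-max scaling is assumed for the normalization step since the text only says the map is normalized:

```python
import numpy as np

def saliency_map(prob):
    """Extract and normalize the saliency map of formula (2).

    prob: (H, W, 2) pixel-class prediction probability map; channel 2
    (index 1) is the salient class. Output is min-max scaled to [0, 1]."""
    fs = prob[..., 1]                      # F''_test(q, r, 2)
    lo, hi = fs.min(), fs.max()
    return (fs - lo) / (hi - lo) if hi > lo else np.zeros_like(fs)

rng = np.random.default_rng(5)
prob = rng.random((4, 6, 2))
sal = saliency_map(prob)
```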
To evaluate the performance of the salient target detection model of the method more fairly, the training and test sets are rotated in turn, and the average over the 5 test results is taken as the final index of model performance.
Fig. 7 qualitatively compares the salient target detection method of the invention, based on a deep convolutional network, with other current light-field salient target detection methods. Ours denotes the method of the invention; Multi-cue denotes a light-field method based on the focal stack, view stream, depth, and color; DILF denotes a light-field method based on color, depth, and a background prior; WSC denotes a light-field method based on sparse coding theory; and LFS denotes a method based on modeling target and background. The four baseline methods are tested on the real-scene data set captured with the second-generation light-field camera used in the invention.
Table 1 quantitatively compares the salient target detection method of the invention, based on a deep convolutional network, with other current light-field salient target detection methods on the data set captured with the second-generation light-field camera, using F-measure, WF-measure, average precision AP, and mean absolute error MAE as metrics. F-measure is a statistic summarizing the recall/precision curve: the closer its value is to 1, the better the salient target detection. WF-measure is the weighted counterpart of that statistic, again better the closer it is to 1. AP measures the average precision of the detection results; values closer to 1 indicate better detection. MAE measures the mean absolute difference between the detection result and the ground truth; values closer to 0 indicate better detection.
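The two simplest of these metrics can be computed in a few lines. In this sketch the weighting β² = 0.3 and the 2×mean adaptive threshold are conventional choices assumed here; the text does not state which values the experiments used:

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between a saliency map and its ground truth."""
    return np.abs(sal - gt).mean()

def f_measure(sal, gt, beta2=0.3):
    """F-measure of a saliency map after adaptive binarization."""
    binary = sal >= 2.0 * sal.mean()          # adaptive threshold (assumed)
    tp = np.logical_and(binary, gt).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom > 0 else 0.0

gt = np.zeros((8, 8), bool)
gt[2:6, 2:6] = True                           # toy ground-truth square
perfect = gt.astype(float)                    # a perfect prediction
```

A perfect prediction scores MAE = 0 and F-measure = 1, matching the direction of the metrics described above.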
Fig. 8 quantitatively compares the salient target detection method of the invention, based on a deep convolutional network, with other current light-field salient target detection methods using the precision-recall (PR) curve as the metric: if one PR curve completely encloses another, the enclosing method performs better.
Table 1
Salient target detection method | Ours | Multi-cue | DILF | WSC | LFS |
F-measure | 0.8118 | 0.6649 | 0.6395 | 0.6452 | 0.6108 |
WF-measure | 0.7541 | 0.5420 | 0.4844 | 0.5946 | 0.3597 |
AP | 0.9124 | 0.6593 | 0.6922 | 0.5960 | 0.6193 |
MAE | 0.0551 | 0.1198 | 0.1390 | 0.1093 | 0.1698 |
The quantitative analysis of Table 1 shows that the F-measure, WF-measure, AP, and MAE obtained by the method of the invention are all better than those of the other light-field salient target detection methods. The PR curves of Fig. 8 show that the recall/precision curve of the method of the invention lies close to the upper-right corner and encloses the PR curves of the other methods; at equal recall, its error probability is lower.
Claims (1)
1. A light field salient target detection method based on a deep convolutional network, characterized in that it proceeds as follows:
Step 1 obtains lenticule image Id;
Step 1.1 obtains light field file using light field equipment, and is decoded to obtain light field data set and is denoted as L=(L1,
L2,…,Ld,…,LD), wherein LdIt indicates d-th of light field data, and d-th of light field data is denoted as Ld(u, v, s, t), u and v table
Show any horizontal pixel and vertical pixel in spatial information, s and t indicate any horizontal view angle and vertical visual angle in Viewing-angle information;d
The sum of ∈ [1, D], D expression light field data;
Step 1.2, fixed horizontal view angle s and vertical visual angle t, and traverse d-th of light field data LdOwn in (u, v, s, t)
Horizontal pixel and vertical pixel obtain d-th of light field data LdSub-aperture in (u, v, s, t) under t row s column visual angle
ImageAndHeight and width be denoted as V and U, v ∈ [1, V], u ∈ [1, U] respectively;
Step 1.3: traverse all horizontal view angles and vertical view angles of the light field data Ld(u, v, s, t) to obtain the set Nd of sub-aperture images under all view angles of the d-th light field data, where s ∈ [1, S], t ∈ [1, T]; S denotes the row of the maximum horizontal view angle and T denotes the column of the maximum vertical view angle;
Step 1.4: define the number of selected view angles as m × m, and use formula (1) to select, from the set Nd of sub-aperture images under all view angles of the d-th light field data, the d-th image set Md centered on the central view angle;
in formula (1), ⌊·⌋ denotes rounding down to the nearest integer;
Step 1.5: according to x = (v−1) × m + t and y = (u−1) × m + s, obtain the pixel Id(x, y) in row x, column y of the d-th microlens image Id, thereby obtaining the d-th microlens image Id with height H and width W, where x ∈ [1, H], y ∈ [1, W], H = V × m, W = U × m;
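The interleaving of steps 1.2 to 1.5 can be sketched as follows (an illustrative, zero-based NumPy layout L[v, u, t, s] is assumed for the decoded 4D light field; the claim itself uses one-based indices):

```python
import numpy as np

def microlens_image(L):
    """Assemble the microlens image from a 4D light field L[v, u, t, s]
    (spatial row v, spatial column u, vertical angle t, horizontal angle s).
    Zero-based version of x = (v-1)*m + t, y = (u-1)*m + s from step 1.5."""
    V, U, T, S = L.shape
    assert T == S, "an m x m block of view angles is assumed"
    m = T
    I = np.zeros((V * m, U * m), dtype=L.dtype)
    for v in range(V):
        for u in range(U):
            # each spatial sample contributes one m x m block of view angles
            I[v * m:(v + 1) * m, u * m:(u + 1) * m] = L[v, u]
    return I
```

The same result can be obtained in one step with `L.transpose(0, 2, 1, 3).reshape(V * m, U * m)`; the explicit loop mirrors the pixel-by-pixel wording of the claim.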
Step 2: from the d-th image set Md, select the sub-aperture image of the d-th central view angle; mark the salient region of this central-view sub-aperture image, set the pixels of the salient region to 1 and the pixels of the non-salient region to 0, and thereby obtain the d-th ground-truth saliency map Gd of the d-th microlens image Id; the height and width of Gd are V and U, respectively;
Step 3: apply data-augmentation processing to the d-th microlens image Id to obtain the d-th augmented microlens image set I′d; apply geometric-transformation processing to the d-th ground-truth saliency map Gd to obtain the d-th transformed ground-truth saliency map set G′d;
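A minimal sketch of step 3's paired augmentation is given below. Flips and a 90° rotation are assumed transforms; the claim does not list the exact set, and for microlens images the angular blocks would in practice need to be flipped consistently with the spatial grid, a subtlety this sketch ignores:

```python
import numpy as np

def augment_pair(image, mask):
    """Apply the same geometric transforms to an image and its ground-truth
    saliency map so the labels stay aligned with the augmented pixels.
    The transform set (flips, rotation) is an illustrative assumption."""
    pairs = [(image, mask)]
    pairs.append((np.fliplr(image), np.fliplr(mask)))  # horizontal flip
    pairs.append((np.flipud(image), np.flipud(mask)))  # vertical flip
    pairs.append((np.rot90(image), np.rot90(mask)))    # 90-degree rotation
    return pairs
```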
Step 4: repeat step 1.2 through step 3 to obtain the D augmented microlens image sets of the light field data set L, denoted I′ = (I′1, I′2, …, I′d, …, I′D), and the D transformed ground-truth saliency map sets, denoted G′ = (G′1, G′2, …, G′d, …, G′D);
Step 5: build the salient target detection model of the d-th light field data Ld(u, v, s, t);
Step 5.1: obtain a c-layer Deeplab-V2 convolutional neural network, the Deeplab-V2 convolutional neural network comprising convolutional layers, pooling layers and dropout layers;
Step 5.2: modify the c-layer Deeplab-V2 convolutional neural network to obtain the modified LFnet convolutional neural network;
Step 5.2.1: before the first layer of the Deeplab-V2 convolutional neural network, add a convolutional layer LF_conv1_1 with kernel size m × m and a ReLU activation function LF_relu1_1; set the stride of the convolution kernel of LF_conv1_1 to m when performing the convolution operation; the mathematical expression of the ReLU activation function LF_relu1_1 is φ(a) = max(0, a), where a denotes the output of convolutional layer LF_conv1_1 and serves as the input of LF_relu1_1, and φ(a) denotes the output of LF_relu1_1;
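The effect of step 5.2.1 is that the m × m kernel with stride m reduces each angular block of the microlens image to a single output pixel. A single-channel NumPy sketch (the real LF_conv1_1 would have learned multi-channel kernels):

```python
import numpy as np

def lf_conv1_1(I, kernel):
    """m x m convolution with stride m over a microlens image I (H x W),
    followed by ReLU -- the LF_conv1_1 / LF_relu1_1 pair of step 5.2.1.
    With stride m, each m x m angular block maps to one output pixel,
    turning the H x W microlens image into a V x U spatial feature map."""
    m = kernel.shape[0]
    H, W = I.shape
    V, U = H // m, W // m
    out = np.zeros((V, U))
    for i in range(V):
        for j in range(U):
            block = I[i * m:(i + 1) * m, j * m:(j + 1) * m]
            out[i, j] = (block * kernel).sum()
    return np.maximum(out, 0.0)  # ReLU: phi(a) = max(0, a)
```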
Step 5.2.2: add a dropout layer after each convolutional layer of the Deeplab-V2 convolutional neural network, except the convolutional layer LF_conv1_1 and the convolutional layers of the Deeplab-V2 network that are already followed by a dropout layer;
Step 5.2.3: set the number of output channels of layer c−1 of the Deeplab-V2 convolutional neural network to b, where b is the number of pixel classes;
Step 5.2.4: add an upsampling layer after layer c of the Deeplab-V2 convolutional neural network, and use the upsampling layer to perform an upsampling operation on the feature map Fd(q, r, b) output by layer c, obtaining the upsampled feature map F′d(q, r, b); here q, r and b denote the width, height and number of channels of the feature map Fd(q, r, b), respectively;
Step 5.2.5: add a crop layer after the upsampling layer, and, according to the height V and width U of the d-th ground-truth saliency map Gd, use the crop layer to crop the feature map F′d(q, r, b), obtaining the pixel-class prediction probability map F″d(q, r, b) of the microlens image Id;
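Steps 5.2.4 and 5.2.5 together resize the network output to the ground-truth size. A single-channel sketch (nearest-neighbour upsampling and top-left cropping are assumptions; the claim does not specify the interpolation or the crop offset):

```python
import numpy as np

def upsample_and_crop(F, factor, V, U):
    """Upsample feature map F by an integer factor (step 5.2.4), then crop
    a V x U window so the prediction matches the ground-truth saliency map
    size (step 5.2.5). Nearest-neighbour upsampling and a top-left crop
    are illustrative assumptions."""
    up = np.repeat(np.repeat(F, factor, axis=0), factor, axis=1)
    return up[:V, :U]
```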
Step 5.3: take the augmented microlens image set I′ as the input of the LFnet convolutional neural network and the transformed ground-truth saliency map set G′ as labels; train the LFnet convolutional neural network with a cross-entropy loss function using the gradient-descent algorithm, thereby obtaining the salient target detection model for light field data; use the salient target detection model to detect the salient targets of light field data.
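The training objective of step 5.3 can be illustrated on a toy scale: a pixelwise cross-entropy loss and one gradient-descent update of a single-weight logistic model standing in for the LFnet parameters (the real network, its optimizer schedule and its multi-class softmax are not reproduced here):

```python
import numpy as np

def cross_entropy(p, g, eps=1e-12):
    """Pixelwise binary cross-entropy between predicted saliency
    probabilities p and binary ground truth g (the loss of step 5.3)."""
    p = np.clip(p, eps, 1 - eps)
    return -(g * np.log(p) + (1 - g) * np.log(1 - p)).mean()

def sgd_step(w, x, g, lr=0.1):
    """One gradient-descent update of a per-pixel logistic model
    p = sigmoid(w * x) -- a toy stand-in for the LFnet weights."""
    p = 1.0 / (1.0 + np.exp(-w * x))
    grad = ((p - g) * x).mean()  # d(cross-entropy)/dw for this model
    return w - lr * grad
```

Repeated `sgd_step` calls drive the cross-entropy down, which is the mechanism the claim relies on at full network scale.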
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811141315.2A CN109344818B (en) | 2018-09-28 | 2018-09-28 | Light field significant target detection method based on deep convolutional network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109344818A true CN109344818A (en) | 2019-02-15 |
CN109344818B CN109344818B (en) | 2020-04-14 |
Family
ID=65307539
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811141315.2A Active CN109344818B (en) | 2018-09-28 | 2018-09-28 | Light field significant target detection method based on deep convolutional network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109344818B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105701813A (en) * | 2016-01-11 | 2016-06-22 | 深圳市未来媒体技术研究院 | Significance detection method of light field image |
US20160203689A1 (en) * | 2015-01-08 | 2016-07-14 | Kenneth J. Hintz | Object Displacement Detector |
CN105913070A (en) * | 2016-04-29 | 2016-08-31 | 合肥工业大学 | Multi-thread significance method based on light field camera |
CN106981080A (en) * | 2017-02-24 | 2017-07-25 | 东华大学 | Night unmanned vehicle scene depth method of estimation based on infrared image and radar data |
WO2018072858A1 (en) * | 2016-10-18 | 2018-04-26 | Photonic Sensors & Algorithms, S.L. | Device and method for obtaining distance information from views |
CN107993260A (en) * | 2017-12-14 | 2018-05-04 | 浙江工商大学 | A kind of light field image depth estimation method based on mixed type convolutional neural networks |
Non-Patent Citations (4)
Title |
---|
HAO SHENG et al.: "Occlusion-aware depth estimation for light field using multi-orientation EPIs", Pattern Recognition *
JUN ZHANG et al.: "Saliency Detection on Light Field: A Multi-Cue Approach", ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) *
WANG Lijuan (王丽娟): "Research on calibration methods and depth estimation of light field cameras", Wanfang Data *
LUO Yaoxiang (罗姚翔): "Research on depth estimation for light field images based on convolutional neural networks", Wanfang Data *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110441271A (en) * | 2019-07-15 | 2019-11-12 | 清华大学 | Light field high-resolution deconvolution method and system based on convolutional neural networks |
CN111369522A (en) * | 2020-03-02 | 2020-07-03 | 合肥工业大学 | Light field significance target detection method based on generation of deconvolution neural network |
CN111445465A (en) * | 2020-03-31 | 2020-07-24 | 江南大学 | Light field image snowflake or rain strip detection and removal method and device based on deep learning |
CN111931793A (en) * | 2020-08-17 | 2020-11-13 | 湖南城市学院 | Saliency target extraction method and system |
CN111931793B (en) * | 2020-08-17 | 2024-04-12 | 湖南城市学院 | Method and system for extracting saliency target |
CN113343822A (en) * | 2021-05-31 | 2021-09-03 | 合肥工业大学 | Light field saliency target detection method based on 3D convolution |
Also Published As
Publication number | Publication date |
---|---|
CN109344818B (en) | 2020-04-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109344818A (en) | A kind of light field well-marked target detection method based on depth convolutional network | |
Chen et al. | Learned feature embeddings for non-line-of-sight imaging and recognition | |
CN108549891B (en) | Multi-scale diffusion well-marked target detection method based on background Yu target priori | |
Lin et al. | Line segment extraction for large scale unorganized point clouds | |
CN108596108B (en) | Aerial remote sensing image change detection method based on triple semantic relation learning | |
Romaszko et al. | Vision-as-inverse-graphics: Obtaining a rich 3d explanation of a scene from a single image | |
CN111612807A (en) | Small target image segmentation method based on scale and edge information | |
CN110297232A (en) | Monocular distance measuring method, device and electronic equipment based on computer vision | |
Li et al. | Neulf: Efficient novel view synthesis with neural 4d light field | |
CN112784782B (en) | Three-dimensional object identification method based on multi-view double-attention network | |
Li et al. | Target detection based on dual-domain sparse reconstruction saliency in SAR images | |
CN113159232A (en) | Three-dimensional target classification and segmentation method | |
CN112990010A (en) | Point cloud data processing method and device, computer equipment and storage medium | |
CN114998566A (en) | Interpretable multi-scale infrared small and weak target detection network design method | |
Dey et al. | Mip-NeRF RGB-d: Depth assisted fast neural radiance fields | |
CN114299405A (en) | Unmanned aerial vehicle image real-time target detection method | |
CN114463736A (en) | Multi-target detection method and device based on multi-mode information fusion | |
Agresti et al. | Stereo and ToF data fusion by learning from synthetic data | |
CN106886754B (en) | Object identification method and system under a kind of three-dimensional scenic based on tri patch | |
Xu et al. | Light field distortion feature for transparent object classification | |
Chen et al. | Scene segmentation of remotely sensed images with data augmentation using U-net++ | |
Wang et al. | Buried target detection method for ground penetrating radar based on deep learning | |
CN104217430A (en) | Image significance detection method based on L1 regularization | |
CN113011359A (en) | Method for simultaneously detecting plane structure and generating plane description based on image and application | |
Balado et al. | Multi feature-rich synthetic colour to improve human visual perception of point clouds |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||