CN109344818A - Light field salient target detection method based on a deep convolutional network - Google Patents
- Publication number
- CN109344818A CN109344818A CN201811141315.2A CN201811141315A CN109344818A CN 109344818 A CN109344818 A CN 109344818A CN 201811141315 A CN201811141315 A CN 201811141315A CN 109344818 A CN109344818 A CN 109344818A
- Authority
- CN
- China
- Prior art keywords
- light field
- image
- layer
- target detection
- salient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/145—Illumination specially adapted for pattern recognition, e.g. using gratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a light field salient target detection method based on a deep convolutional network. The steps include: 1. converting the light field data obtained with a light field acquisition device into sub-aperture images for all viewing angles; 2. reorganizing the sub-aperture images of the different viewing angles into a micro-lens image; 3. applying data augmentation to the micro-lens images; 4. building a salient target detection model that operates on micro-lens images, starting from the pre-trained weights of the Deeplab-V2 network, and training it on the data set; 5. performing salient target detection on the light field data to be processed with the trained model. The method of the invention can effectively improve the accuracy of salient target detection in complex-scene images.
Description
Technical field
The invention belongs to the fields of computer vision and image processing and analysis, and specifically concerns a light field salient target detection method based on a deep convolutional network.
Background technique
Salient target detection mimics a perceptual capability of the human visual system. When observing an image, the visual system quickly locates the regions and targets of interest; the process of finding them is salient target detection. With the development of computer technology and the spread of the internet and mobile smart devices, the volume of images people acquire from the outside world is growing explosively. Salient target detection selects a very small part of the large amount of incoming visual information for subsequent, more expensive processing, such as object detection and recognition, image retrieval, and image segmentation, effectively reducing the computational load of a vision system. Salient target detection has therefore become one of the research hot spots in the field of computer vision.
According to the image data they can use, current salient target detection methods fall into three classes: two-dimensional salient target detection, three-dimensional salient target detection, and light field salient target detection.
Two-dimensional methods work on 2-D images captured by a conventional camera. Using handcrafted rules or learning-based models, they extract and fuse color, brightness, position, and texture features through local or global contrast frameworks, and thereby separate salient from non-salient regions.
Three-dimensional methods use the 2-D image together with the depth information of the scene to perform salient target detection. Depth, obtained from a 3-D sensor, also plays an important role in the human visual system: it reflects the distance between objects and the observer. Using depth for salient target detection compensates for the shortcomings of conventional 2-D images; color and depth complement each other in producing the final saliency map, improving the accuracy of detection to a certain extent.
Light field methods process the light field data captured by a light-field camera. Light field imaging is a new computational imaging technique: with a single exposure it simultaneously records the position and the direction of the light radiation in space, so the captured light field reflects the geometry and reflectance characteristics of the natural scene. Current conventional methods fuse saliency cues computed on different light field representations to improve the performance of salient target detection in challenging scenes.
Although some salient target detection methods with excellent performance have appeared in the field of computer vision, these methods still have shortcomings:
1. In two-dimensional methods, a 2-D image is the integral of the light projected onto the camera sensor and contains only the light intensity from a single direction. Two-dimensional salient target detection is therefore overly sensitive to high-frequency content and noise, and is easily disturbed by foregrounds that blend with the background in color and texture and by cluttered backgrounds.
2. In three-dimensional methods, the precision of the depth information of the scene depends on the depth camera. Current depth cameras suffer from low resolution, a narrow measurement range, and heavy noise; they cannot measure transmissive materials and are easily disturbed by sunlight and by reflections from smooth surfaces.
3. In three-dimensional methods, features such as color, depth, and position are processed independently and fused afterwards; their complementarity is not exploited jointly.
4. Most methods based on 2-D and 3-D images assume that there is an obvious difference between target and background and that the background is simple. As image data grows massively and image content becomes more complex, these methods show clear limitations.
5. Light field salient target detection is still at an early stage: the available data sets are few and of poor image quality. Current methods using light field data are all based on traditional handcrafted saliency features, and model cues such as color, depth, and refocusing separately, so they suffer from insufficient feature representation power and poor robustness.
Summary of the invention
To overcome the shortcomings of the prior art described above, the present invention proposes a light field salient target detection method based on a deep convolutional network, which makes full use of the spatial and angular information of the light field data and thereby effectively improves the accuracy of salient target detection in complex-scene images.
The present invention adopts the following technical scheme to solve the technical problem:
A light field salient target detection method based on a deep convolutional network according to the invention proceeds as follows:
Step 1: obtain the micro-lens image I_d.
Step 1.1: acquire light field files with a light field device and decode them to obtain the light field data set, denoted L = (L_1, L_2, …, L_d, …, L_D), where L_d denotes the d-th light field and is written L_d(u, v, s, t); u and v index any horizontal and vertical pixel of the spatial information, s and t index any horizontal and vertical viewing angle of the angular information; d ∈ [1, D], where D is the total number of light fields.
Step 1.2: fix the horizontal angle s and the vertical angle t, and traverse all horizontal and vertical pixels of the d-th light field L_d(u, v, s, t) to obtain the sub-aperture image of L_d(u, v, s, t) at the viewing angle in row t, column s; its height and width are denoted V and U, with v ∈ [1, V] and u ∈ [1, U].
Step 1.3: traverse all horizontal and vertical viewing angles of L_d(u, v, s, t) to obtain the d-th set N_d of sub-aperture images over all viewing angles, where s ∈ [1, S] and t ∈ [1, T]; S is the row of the maximum horizontal angle and T the column of the maximum vertical angle.
Step 1.4: let the number of selected viewing angles be m × m, and use formula (1) to select from the d-th set N_d the d-th image set M_d centered on the central viewing angle. The terms of formula (1) are rounded down to the nearest integer.
Step 1.5: using x = (v − 1) × m + t and y = (u − 1) × m + s, obtain the pixel I_d(x, y) in row x, column y of the d-th micro-lens image I_d, and thereby the micro-lens image I_d of height H and width W, where x ∈ [1, H], y ∈ [1, W], H = V × m, W = U × m.
Step 2: from the d-th image set M_d, select the sub-aperture image of the central viewing angle. Annotate its salient region: set the pixels of the salient region to 1 and the pixels of the non-salient region to 0, obtaining the d-th ground-truth saliency map G_d of the micro-lens image I_d; the height and width of G_d are V and U.
Step 3: apply data augmentation to the d-th micro-lens image I_d to obtain the d-th augmented micro-lens image set I′_d; apply the corresponding geometric transforms to the ground-truth map G_d to obtain the d-th set of transformed ground-truth maps G′_d.
Step 4: repeat steps 1.2 to 3 to obtain the D augmented micro-lens image sets I′ = (I′_1, I′_2, …, I′_d, …, I′_D) of the light field data set L and the D sets of transformed ground-truth maps, denoted G′ = (G′_1, G′_2, …, G′_d, …, G′_D).
Step 5: build the salient target detection model for the d-th light field L_d(u, v, s, t).
Step 5.1: obtain the c-layer Deeplab-V2 convolutional neural network, which comprises convolutional layers, pooling layers, and dropout layers.
Step 5.2: modify the c-layer Deeplab-V2 network to obtain the modified LFnet convolutional neural network.
Step 5.2.1: before the first layer of the Deeplab-V2 network, add a convolutional layer LF_conv1_1 with kernel size m × m and a ReLU activation LF_relu1_1. Set the stride of the LF_conv1_1 kernel to m during the convolution operation. The ReLU activation LF_relu1_1 is φ(a) = max(0, a), where a is the output of LF_conv1_1 and the input of LF_relu1_1, and φ(a) is the output of LF_relu1_1.
Step 5.2.2: add a dropout layer after every convolutional layer of the Deeplab-V2 network other than LF_conv1_1 and the convolutional layers already followed by a dropout layer.
Step 5.2.3: set the number of output channels of layer c − 1 of the Deeplab-V2 network to b, where b is the number of pixel classes.
Step 5.2.4: add an upsampling layer after layer c of the Deeplab-V2 network and use it to upsample the feature map F_d(q, r, b) output by layer c, obtaining the upsampled feature map F′_d(q, r, b); q, r, and b denote the width, height, and number of channels of F_d(q, r, b).
Step 5.2.5: add a crop layer after the upsampling layer; according to the height V and width U of the d-th ground-truth map G_d, crop F′_d(q, r, b) to obtain the pixel-class prediction probability map F″_d(q, r, b) of the micro-lens image I_d.
Step 5.3: take the augmented micro-lens image set I′ as the input of the LFnet network and the transformed ground-truth maps G′ as labels, compute a cross-entropy loss, and train LFnet with gradient descent to obtain the salient target detection model for light field data; salient target detection on light field data is then performed with this model.
Compared with the prior art, the beneficial effects of the present invention are:
1. The present invention uses a second-generation light-field camera to capture light field data of complex and varied scenes. These scenes contain difficult cases such as salient targets of various sizes, various light sources, targets similar to the background, and cluttered backgrounds, substantially remedying the scarcity and difficulty shortcomings of current light field saliency data and improving the quality of current light field saliency data sets.
2. The present invention extracts image features with a deep convolutional network, which is powerful in image processing, fuses the spatial and angular information of the light field data, and captures the contextual information of the micro-lens image with an atrous spatial pyramid network to detect the salient targets in the image scene. This overcomes the inability of current two- and three-dimensional salient target detection methods to use angular information, and improves the precision and robustness of salient target detection in complex-scene images.
3. The multi-view information in the micro-lens image used here reflects the spatial geometry of the scene. Feeding the micro-lens image directly into the convolutional neural network to perform salient target detection overcomes the shortcoming of current light-field salient target detection methods that process depth and color information independently: depth perception and visual saliency are considered jointly, the complementarity of depth and color is effectively exploited, and the accuracy of salient target detection is improved.
Brief description of the drawings
Fig. 1 is the workflow diagram of the salient target detection method of the invention;
Fig. 2 shows sub-aperture images obtained by the method of the invention;
Fig. 3 shows a micro-lens image obtained by the method of the invention;
Fig. 4 shows part of the scenes of the data set obtained by the method of the invention and their ground-truth saliency maps;
Fig. 5 is the detailed flow diagram of feeding the micro-lens image into the network model in the method of the invention;
Fig. 6 shows the structure of the Deeplab-V2 model used in the method of the invention;
Fig. 7 compares part of the salient target detection results of the method of the invention and other light-field salient target detection methods on the data set captured with the second-generation light-field camera;
Fig. 8 quantitatively compares the method of the invention with other current light-field saliency extraction methods on the data set captured with the second-generation light-field camera, using the recall/precision curve as the metric.
Specific embodiment
In the present embodiment, a light field salient target detection method based on a deep convolutional network, whose flow chart is shown in Fig. 1, proceeds as follows:
Step 1: obtain the micro-lens image I_d.
Step 1.1: acquire light field files with a light field device and decode them to obtain the light field data set, denoted L = (L_1, L_2, …, L_d, …, L_D), where L_d denotes the d-th light field and is written L_d(u, v, s, t); u and v index any horizontal and vertical pixel of the spatial information, s and t index any horizontal and vertical viewing angle of the angular information; d ∈ [1, D], where D is the total number of light fields.
In the present embodiment, the light field files are captured with a second-generation light-field camera and decoded with the lytro power tool beta, yielding the light field data L_d(u, v, s, t). The light field is represented with the two-plane parameterization: in the four-dimensional (u, v, s, t) coordinate space, one ray corresponds to one sample of the light field; the u, v plane is the spatial information plane and the s, t plane is the angular information plane. In the experiments of the invention, 640 light fields were captured in total and divided evenly into 5 parts; one part is used in turn as the test set and the remaining 4 parts as the training set. D in step 1.1 denotes the size of the training set, D = 512.
Step 1.2: fix the horizontal angle s and the vertical angle t, and traverse all horizontal and vertical pixels of the d-th light field L_d(u, v, s, t) to obtain the sub-aperture image of L_d(u, v, s, t) at the viewing angle in row t, column s; its height and width are denoted V and U, with v ∈ [1, V] and u ∈ [1, U]. In this experiment, V = 375 and U = 540.
Step 1.3: traverse all horizontal and vertical viewing angles of L_d(u, v, s, t) to obtain the d-th set N_d of sub-aperture images over all viewing angles, where s ∈ [1, S] and t ∈ [1, T]; S is the row of the maximum horizontal angle and T the column of the maximum vertical angle. In this implementation, S = 14 and T = 14. As shown in Fig. 2, the left image is the set of sub-aperture images of all viewing angles, and the right image is the sub-aperture image at the viewing angle in row 6, column 11.
Step 1.4: let the number of selected viewing angles be m × m, and use formula (1) to select from the d-th set N_d the d-th image set M_d centered on the central viewing angle. In this implementation, m = 9, so 81 view images are chosen. Experiments show that more viewing angles provide more information and can further improve the performance of the salient target detection model, but more viewing angles also consume a large amount of storage and computation time and increase the difficulty of the experiments.
The terms of formula (1) are rounded down to the nearest integer.
Step 1.5: using x = (v − 1) × m + t and y = (u − 1) × m + s, obtain the pixel I_d(x, y) in row x, column y of the d-th micro-lens image I_d, and thereby the micro-lens image I_d of height H and width W, as shown in Fig. 3, where x ∈ [1, H], y ∈ [1, W], H = V × m, W = U × m. In the present embodiment, H = 3375 and W = 4860. The left image of Fig. 3 is the micro-lens image I_d; the right image is an enlarged detail of it, in which all pixels inside one square represent the set of pixels carrying the same spatial information under different viewing angles.
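The pixel mapping of step 1.5, x = (v − 1)·m + t and y = (u − 1)·m + s, interleaves the m × m selected views so that each m × m block of the micro-lens image holds all views of one spatial pixel. A sketch under assumed toy sizes (the paper uses m = 9, V = 375, U = 540); the array layout and function name are illustrative, not from the patent:

```python
import numpy as np

m, V, U = 3, 4, 5
rng = np.random.default_rng(1)
views = rng.random((m, m, V, U, 3))   # views[t-1, s-1] = sub-aperture image (s, t)

def to_microlens(views):
    """Assemble the micro-lens image of step 1.5 from an m x m view block."""
    m, _, V, U, C = views.shape
    ml = np.empty((V * m, U * m, C))
    for t in range(1, m + 1):
        for s in range(1, m + 1):
            # Pixel (v, u) of view (s, t) lands at row (v-1)*m + t,
            # column (u-1)*m + s (1-based), i.e. strided slices below.
            ml[t - 1::m, s - 1::m] = views[t - 1, s - 1]
    return ml

ml = to_microlens(views)              # H x W = (V*m) x (U*m)
```

Reading the same strided slices back out recovers the original views, which is why the network can later re-separate the angular information with an m × m strided convolution.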
Step 2: from the d-th image set M_d, select the sub-aperture image of the central viewing angle. Annotate its salient region: set the pixels of the salient region to 1 and the pixels of the non-salient region to 0, obtaining the d-th ground-truth saliency map G_d of the micro-lens image I_d; the height and width of G_d are V and U. In this implementation, V = 375 and U = 540. As shown in Fig. 4, the first and third rows are micro-lens images, and the second and fourth rows are the corresponding ground-truth saliency maps.
Step 3: apply data augmentation to the d-th micro-lens image I_d to obtain the d-th augmented micro-lens image set I′_d; apply the corresponding geometric transforms to the ground-truth map G_d to obtain the d-th set of transformed ground-truth maps G′_d. In the present embodiment, the micro-lens image I_d is rotated, flipped, and processed with increased chroma, increased contrast, increased brightness, decreased brightness, and added Gaussian noise. Data augmentation improves the generalization ability of the salient target detection model.
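The augmentation of step 3 can be sketched as follows: geometric transforms are applied to both the image and its ground-truth map, while photometric changes leave the map untouched. A minimal illustration with assumed parameters (rotation set, noise level, brightness factor are choices made here, not values from the patent):

```python
import numpy as np

rng = np.random.default_rng(2)

def augment(img, gt, rng):
    """Return (image, ground truth) pairs for step 3.

    Geometric transforms (rotations, flip) change both arrays;
    photometric transforms (brightness, Gaussian noise) change only img."""
    pairs = []
    for k in (0, 1, 2, 3):                            # rotations by k*90 degrees
        pairs.append((np.rot90(img, k), np.rot90(gt, k)))
    pairs.append((np.fliplr(img), np.fliplr(gt)))     # horizontal flip
    bright = np.clip(img * 1.2, 0.0, 1.0)             # brightness increase
    noisy = np.clip(img + rng.normal(0, 0.05, img.shape), 0.0, 1.0)
    pairs += [(bright, gt), (noisy, gt)]              # photometric: gt unchanged
    return pairs

img = rng.random((6, 6, 3))
gt = (rng.random((6, 6)) > 0.5).astype(np.float32)
aug = augment(img, gt, rng)
```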
Step 4: repeat steps 1.2 to 3 to obtain the D augmented micro-lens image sets I′ = (I′_1, I′_2, …, I′_d, …, I′_D) of the light field data set L and the D sets of transformed ground-truth maps, denoted G′ = (G′_1, G′_2, …, G′_d, …, G′_D).
Step 5: build the salient target detection model for the d-th light field L_d(u, v, s, t).
Step 5.1: obtain the c-layer Deeplab-V2 convolutional neural network, which comprises convolutional layers, pooling layers, dropout layers, and a fusion layer. In this implementation, c = 24: Deeplab-V2 is a deep convolutional neural network composed of 16 convolutional layers, 5 pooling layers, 2 dropout layers, and 1 fusion layer, used for semantic segmentation; its detailed structure is shown in Fig. 6. Deeplab-V2 contains an atrous spatial pyramid structure that captures the context of the image at multiple rates, enabling salient target detection at multiple scales.
Step 5.2: modify the c-layer Deeplab-V2 network to obtain the modified LFnet convolutional neural network, whose detailed structure is shown in Fig. 5.
Step 5.2.1: before the first layer of the Deeplab-V2 network, add a convolutional layer LF_conv1_1 with kernel size m × m and a ReLU activation LF_relu1_1. Set the stride of the LF_conv1_1 kernel to m during the convolution operation. In this implementation, m = 9: since the micro-lens image I_d built in steps 1.4 and 1.5 uses 9 × 9 viewing angles, setting the kernel size of LF_conv1_1 to 9 × 9 with stride 9 lets the network better extract and fuse the multi-view information.
The ReLU activation LF_relu1_1 is φ(a) = max(0, a), where a is the output of LF_conv1_1 and the input of LF_relu1_1, and φ(a) is the output of LF_relu1_1.
Step 5.2.2: add a dropout layer after every convolutional layer of the Deeplab-V2 network other than LF_conv1_1 and the convolutional layers already followed by a dropout layer. In the present embodiment, the added dropout layers effectively prevent over-fitting while improving the generalization ability of the salient target detection model.
Step 5.2.3: set the number of output channels of layer c − 1 of the Deeplab-V2 network to b, where b is the number of pixel classes. In this implementation, c − 1 = 23 and b = 2: the salient target detection model classifies each pixel as salient or non-salient.
Step 5.2.4: add an upsampling layer after layer c of the Deeplab-V2 network and use it to upsample the feature map F_d(q, r, b) output by layer c, obtaining the upsampled feature map F′_d(q, r, b); q, r, and b denote the width, height, and number of channels of F_d(q, r, b).
Step 5.2.5: add a crop layer after the upsampling layer; according to the height V and width U of the d-th ground-truth map G_d, crop F′_d(q, r, b) to obtain the pixel-class prediction probability map F″_d(q, r, b) of the micro-lens image I_d.
Step 5.3: take the augmented micro-lens image set I′ as the input of the LFnet network and the transformed ground-truth maps G′ as labels, compute a cross-entropy loss, and train LFnet with gradient descent to obtain the salient target detection model for light field data; salient target detection on light field data is then performed with this model.
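The text says only "cross entropy"; a standard per-pixel two-class softmax cross-entropy, sketched in numpy under that assumption (the patent's exact loss layer is not specified):

```python
import numpy as np

def softmax_cross_entropy(logits, gt):
    """Mean per-pixel cross-entropy for step 5.3.

    logits: (H, W, 2) network scores for non-salient/salient;
    gt:     (H, W) boolean ground truth (True = salient)."""
    z = logits - logits.max(axis=-1, keepdims=True)     # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_p, gt[..., None].astype(int), axis=-1)
    return -picked.mean()                               # average over pixels

rng = np.random.default_rng(4)
logits = rng.normal(size=(5, 7, 2))
gt = rng.random((5, 7)) > 0.5
loss = softmax_cross_entropy(logits, gt)
```

Gradient descent then updates the LFnet weights to reduce this loss; any framework optimizer realizes that part.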
The test set is processed according to steps 1.1 to 2 to obtain its micro-lens images, which are input to the salient target detection model to obtain the pixel-class prediction probability map F″_test(q, r, b) of the test set. The saliency map F″_s is extracted with formula (2), where F″_test(q, r, 2) denotes the values of the second channel of the probability map F″_test(q, r, b); F″_s is then normalized to obtain the final saliency map F_s.
F″_s = F″_test(q, r, 2)    (2)
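Formula (2) just reads off the salient-class channel; a short sketch, where min-max scaling is assumed for the normalization step since the text only says the map is normalized:

```python
import numpy as np

def saliency_map(prob):
    """Extract and normalize the saliency map of formula (2).

    prob: (H, W, 2) pixel-class prediction probability map; channel 2
    (index 1) is the salient class. Output is min-max scaled to [0, 1]."""
    fs = prob[..., 1]                      # F''_test(q, r, 2)
    lo, hi = fs.min(), fs.max()
    return (fs - lo) / (hi - lo) if hi > lo else np.zeros_like(fs)

rng = np.random.default_rng(5)
prob = rng.random((4, 6, 2))
sal = saliency_map(prob)
```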
To evaluate the performance of the salient target detection model of the method more fairly, the training and test sets are rotated in turn, and the average over the 5 test results is taken as the final index of model performance.
Fig. 7 qualitatively compares the salient target detection method of the invention, based on a deep convolutional network, with other current light-field salient target detection methods. Ours denotes the method of the invention; Multi-cue denotes a light-field method based on the focal stack, view stream, depth, and color; DILF denotes a light-field method based on color, depth, and a background prior; WSC denotes a light-field method based on sparse coding theory; and LFS denotes a method based on modeling target and background. The four baseline methods are tested on the real-scene data set captured with the second-generation light-field camera used in the invention.
Table 1 quantitatively compares the salient target detection method of the invention, based on a deep convolutional network, with other current light-field salient target detection methods on the data set captured with the second-generation light-field camera, using F-measure, WF-measure, average precision AP, and mean absolute error MAE as metrics. F-measure is a statistic summarizing the recall/precision curve: the closer its value is to 1, the better the salient target detection. WF-measure is the weighted counterpart of that statistic, again better the closer it is to 1. AP measures the average precision of the detection results; values closer to 1 indicate better detection. MAE measures the mean absolute difference between the detection result and the ground truth; values closer to 0 indicate better detection.
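The two simplest of these metrics can be computed in a few lines. In this sketch the weighting β² = 0.3 and the 2×mean adaptive threshold are conventional choices assumed here; the text does not state which values the experiments used:

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between a saliency map and its ground truth."""
    return np.abs(sal - gt).mean()

def f_measure(sal, gt, beta2=0.3):
    """F-measure of a saliency map after adaptive binarization."""
    binary = sal >= 2.0 * sal.mean()          # adaptive threshold (assumed)
    tp = np.logical_and(binary, gt).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom > 0 else 0.0

gt = np.zeros((8, 8), bool)
gt[2:6, 2:6] = True                           # toy ground-truth square
perfect = gt.astype(float)                    # a perfect prediction
```

A perfect prediction scores MAE = 0 and F-measure = 1, matching the direction of the metrics described above.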
Fig. 8 quantitatively compares the salient target detection method of the invention, based on a deep convolutional network, with other current light-field salient target detection methods using the precision-recall (PR) curve as the metric: if one PR curve completely encloses another, the enclosing method performs better.
Table 1
Salient target detection method | Ours | Multi-cue | DILF | WSC | LFS |
F-measure | 0.8118 | 0.6649 | 0.6395 | 0.6452 | 0.6108 |
WF-measure | 0.7541 | 0.5420 | 0.4844 | 0.5946 | 0.3597 |
AP | 0.9124 | 0.6593 | 0.6922 | 0.5960 | 0.6193 |
MAE | 0.0551 | 0.1198 | 0.1390 | 0.1093 | 0.1698 |
The quantitative analysis of Table 1 shows that the F-measure, WF-measure, AP, and MAE obtained by the method of the invention are all better than those of the other light-field salient target detection methods. The PR curves of Fig. 8 show that the recall/precision curve of the method of the invention lies close to the upper-right corner and encloses the PR curves of the other methods; at equal recall, its error probability is lower.
Claims (1)
1. A light field salient target detection method based on a deep convolutional network, characterized in that it proceeds as follows:
Step 1 obtains lenticule image Id;
Step 1.1 obtains light field file using light field equipment, and is decoded to obtain light field data set and is denoted as L=(L1,
L2,…,Ld,…,LD), wherein LdIt indicates d-th of light field data, and d-th of light field data is denoted as Ld(u, v, s, t), u and v table
Show any horizontal pixel and vertical pixel in spatial information, s and t indicate any horizontal view angle and vertical visual angle in Viewing-angle information;d
The sum of ∈ [1, D], D expression light field data;
Step 1.2, fixed horizontal view angle s and vertical visual angle t, and traverse d-th of light field data LdOwn in (u, v, s, t)
Horizontal pixel and vertical pixel obtain d-th of light field data LdSub-aperture in (u, v, s, t) under t row s column visual angle
ImageAndHeight and width be denoted as V and U, v ∈ [1, V], u ∈ [1, U] respectively;
Step 1.3: traverse all horizontal view angles and vertical view angles of the light field data Ld(u, v, s, t) to obtain the set Nd of sub-aperture images under all view angles of the d-th light field data, where s ∈ [1, S], t ∈ [1, T]; S denotes the row of the maximum horizontal view angle and T denotes the column of the maximum vertical view angle;
Step 1.4: define the number of selected view angles as m × m, and use formula (1) to select, from the set Nd of sub-aperture images under all view angles of the d-th light field data, the d-th image set Md centered on the central view angle;
in formula (1), ⌊·⌋ denotes rounding down to the nearest integer;
Step 1.5: according to x = (v−1) × m + t and y = (u−1) × m + s, obtain the pixel Id(x, y) in row x, column y of the d-th microlens image Id, thereby obtaining the d-th microlens image Id with height H and width W, where x ∈ [1, H], y ∈ [1, W], H = V × m, W = U × m;
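The interleaving of steps 1.2 to 1.5 can be sketched as follows (an illustrative, zero-based NumPy layout L[v, u, t, s] is assumed for the decoded 4D light field; the claim itself uses one-based indices):

```python
import numpy as np

def microlens_image(L):
    """Assemble the microlens image from a 4D light field L[v, u, t, s]
    (spatial row v, spatial column u, vertical angle t, horizontal angle s).
    Zero-based version of x = (v-1)*m + t, y = (u-1)*m + s from step 1.5."""
    V, U, T, S = L.shape
    assert T == S, "an m x m block of view angles is assumed"
    m = T
    I = np.zeros((V * m, U * m), dtype=L.dtype)
    for v in range(V):
        for u in range(U):
            # each spatial sample contributes one m x m block of view angles
            I[v * m:(v + 1) * m, u * m:(u + 1) * m] = L[v, u]
    return I
```

The same result can be obtained in one step with `L.transpose(0, 2, 1, 3).reshape(V * m, U * m)`; the explicit loop mirrors the pixel-by-pixel wording of the claim.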
Step 2: from the d-th image set Md, select the sub-aperture image of the d-th central view angle; mark the salient region of this central-view sub-aperture image, set the pixels of the salient region to 1 and the pixels of the non-salient region to 0, and thereby obtain the d-th ground-truth saliency map Gd of the d-th microlens image Id; the height and width of Gd are V and U, respectively;
Step 3: apply data-augmentation processing to the d-th microlens image Id to obtain the d-th augmented microlens image set I′d; apply geometric-transformation processing to the d-th ground-truth saliency map Gd to obtain the d-th transformed ground-truth saliency map set G′d;
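A minimal sketch of step 3's paired augmentation is given below. Flips and a 90° rotation are assumed transforms; the claim does not list the exact set, and for microlens images the angular blocks would in practice need to be flipped consistently with the spatial grid, a subtlety this sketch ignores:

```python
import numpy as np

def augment_pair(image, mask):
    """Apply the same geometric transforms to an image and its ground-truth
    saliency map so the labels stay aligned with the augmented pixels.
    The transform set (flips, rotation) is an illustrative assumption."""
    pairs = [(image, mask)]
    pairs.append((np.fliplr(image), np.fliplr(mask)))  # horizontal flip
    pairs.append((np.flipud(image), np.flipud(mask)))  # vertical flip
    pairs.append((np.rot90(image), np.rot90(mask)))    # 90-degree rotation
    return pairs
```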
Step 4: repeat step 1.2 through step 3 to obtain the D augmented microlens image sets of the light field data set L, denoted I′ = (I′1, I′2, …, I′d, …, I′D), and the D transformed ground-truth saliency map sets, denoted G′ = (G′1, G′2, …, G′d, …, G′D);
Step 5: build the salient target detection model of the d-th light field data Ld(u, v, s, t);
Step 5.1: obtain a c-layer Deeplab-V2 convolutional neural network, the Deeplab-V2 convolutional neural network comprising convolutional layers, pooling layers and dropout layers;
Step 5.2: modify the c-layer Deeplab-V2 convolutional neural network to obtain the modified LFnet convolutional neural network;
Step 5.2.1: before the first layer of the Deeplab-V2 convolutional neural network, add a convolutional layer LF_conv1_1 with kernel size m × m and a ReLU activation function LF_relu1_1; set the stride of the convolution kernel of LF_conv1_1 to m when performing the convolution operation; the mathematical expression of the ReLU activation function LF_relu1_1 is φ(a) = max(0, a), where a denotes the output of convolutional layer LF_conv1_1 and serves as the input of LF_relu1_1, and φ(a) denotes the output of LF_relu1_1;
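The effect of step 5.2.1 is that the m × m kernel with stride m reduces each angular block of the microlens image to a single output pixel. A single-channel NumPy sketch (the real LF_conv1_1 would have learned multi-channel kernels):

```python
import numpy as np

def lf_conv1_1(I, kernel):
    """m x m convolution with stride m over a microlens image I (H x W),
    followed by ReLU -- the LF_conv1_1 / LF_relu1_1 pair of step 5.2.1.
    With stride m, each m x m angular block maps to one output pixel,
    turning the H x W microlens image into a V x U spatial feature map."""
    m = kernel.shape[0]
    H, W = I.shape
    V, U = H // m, W // m
    out = np.zeros((V, U))
    for i in range(V):
        for j in range(U):
            block = I[i * m:(i + 1) * m, j * m:(j + 1) * m]
            out[i, j] = (block * kernel).sum()
    return np.maximum(out, 0.0)  # ReLU: phi(a) = max(0, a)
```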
Step 5.2.2: add a dropout layer after each convolutional layer of the Deeplab-V2 convolutional neural network, except the convolutional layer LF_conv1_1 and the convolutional layers of the Deeplab-V2 network that are already followed by a dropout layer;
Step 5.2.3: set the number of output channels of layer c−1 of the Deeplab-V2 convolutional neural network to b, where b is the number of pixel classes;
Step 5.2.4: add an upsampling layer after layer c of the Deeplab-V2 convolutional neural network, and use the upsampling layer to perform an upsampling operation on the feature map Fd(q, r, b) output by layer c, obtaining the upsampled feature map F′d(q, r, b); here q, r and b denote the width, height and number of channels of the feature map Fd(q, r, b), respectively;
Step 5.2.5: add a crop layer after the upsampling layer, and, according to the height V and width U of the d-th ground-truth saliency map Gd, use the crop layer to crop the feature map F′d(q, r, b), obtaining the pixel-class prediction probability map F″d(q, r, b) of the microlens image Id;
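Steps 5.2.4 and 5.2.5 together resize the network output to the ground-truth size. A single-channel sketch (nearest-neighbour upsampling and top-left cropping are assumptions; the claim does not specify the interpolation or the crop offset):

```python
import numpy as np

def upsample_and_crop(F, factor, V, U):
    """Upsample feature map F by an integer factor (step 5.2.4), then crop
    a V x U window so the prediction matches the ground-truth saliency map
    size (step 5.2.5). Nearest-neighbour upsampling and a top-left crop
    are illustrative assumptions."""
    up = np.repeat(np.repeat(F, factor, axis=0), factor, axis=1)
    return up[:V, :U]
```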
Step 5.3: take the augmented microlens image set I′ as the input of the LFnet convolutional neural network and the transformed ground-truth saliency map set G′ as labels; train the LFnet convolutional neural network with a cross-entropy loss function using the gradient-descent algorithm, thereby obtaining the salient target detection model for light field data; use the salient target detection model to detect the salient targets of light field data.
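The training objective of step 5.3 can be illustrated on a toy scale: a pixelwise cross-entropy loss and one gradient-descent update of a single-weight logistic model standing in for the LFnet parameters (the real network, its optimizer schedule and its multi-class softmax are not reproduced here):

```python
import numpy as np

def cross_entropy(p, g, eps=1e-12):
    """Pixelwise binary cross-entropy between predicted saliency
    probabilities p and binary ground truth g (the loss of step 5.3)."""
    p = np.clip(p, eps, 1 - eps)
    return -(g * np.log(p) + (1 - g) * np.log(1 - p)).mean()

def sgd_step(w, x, g, lr=0.1):
    """One gradient-descent update of a per-pixel logistic model
    p = sigmoid(w * x) -- a toy stand-in for the LFnet weights."""
    p = 1.0 / (1.0 + np.exp(-w * x))
    grad = ((p - g) * x).mean()  # d(cross-entropy)/dw for this model
    return w - lr * grad
```

Repeated `sgd_step` calls drive the cross-entropy down, which is the mechanism the claim relies on at full network scale.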
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811141315.2A CN109344818B (en) | 2018-09-28 | 2018-09-28 | Light field significant target detection method based on deep convolutional network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109344818A true CN109344818A (en) | 2019-02-15 |
CN109344818B CN109344818B (en) | 2020-04-14 |
Family
ID=65307539
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811141315.2A Active CN109344818B (en) | 2018-09-28 | 2018-09-28 | Light field significant target detection method based on deep convolutional network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109344818B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105701813A (en) * | 2016-01-11 | 2016-06-22 | 深圳市未来媒体技术研究院 | Significance detection method of light field image |
US20160203689A1 (en) * | 2015-01-08 | 2016-07-14 | Kenneth J. Hintz | Object Displacement Detector |
CN105913070A (en) * | 2016-04-29 | 2016-08-31 | 合肥工业大学 | Multi-thread significance method based on light field camera |
CN106981080A (en) * | 2017-02-24 | 2017-07-25 | 东华大学 | Night unmanned vehicle scene depth method of estimation based on infrared image and radar data |
WO2018072858A1 (en) * | 2016-10-18 | 2018-04-26 | Photonic Sensors & Algorithms, S.L. | Device and method for obtaining distance information from views |
CN107993260A (en) * | 2017-12-14 | 2018-05-04 | 浙江工商大学 | A kind of light field image depth estimation method based on mixed type convolutional neural networks |
Non-Patent Citations (4)
Title |
---|
HAO SHENG et al.: "Occlusion-aware depth estimation for light field using multi-orientation EPIs", Pattern Recognition *
JUN ZHANG et al.: "Saliency Detection on Light Field: A Multi-Cue Approach", ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) *
WANG Lijuan (王丽娟): "Research on calibration methods and depth estimation of light field cameras", Wanfang Data *
LUO Yaoxiang (罗姚翔): "Research on depth estimation for light field images based on convolutional neural networks", Wanfang Data *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110441271A (en) * | 2019-07-15 | 2019-11-12 | 清华大学 | Light field high-resolution deconvolution method and system based on convolutional neural networks |
CN111369522A (en) * | 2020-03-02 | 2020-07-03 | 合肥工业大学 | Light field significance target detection method based on generation of deconvolution neural network |
CN111445465A (en) * | 2020-03-31 | 2020-07-24 | 江南大学 | Light field image snowflake or rain strip detection and removal method and device based on deep learning |
CN111931793A (en) * | 2020-08-17 | 2020-11-13 | 湖南城市学院 | Saliency target extraction method and system |
CN111931793B (en) * | 2020-08-17 | 2024-04-12 | 湖南城市学院 | Method and system for extracting saliency target |
CN113343822A (en) * | 2021-05-31 | 2021-09-03 | 合肥工业大学 | Light field saliency target detection method based on 3D convolution |
Also Published As
Publication number | Publication date |
---|---|
CN109344818B (en) | 2020-04-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109344818A (en) | A kind of light field well-marked target detection method based on depth convolutional network | |
Chen et al. | Learned feature embeddings for non-line-of-sight imaging and recognition | |
CN108549891B (en) | Multi-scale diffusion well-marked target detection method based on background Yu target priori | |
Lin et al. | Line segment extraction for large scale unorganized point clouds | |
CN108596108B (en) | Aerial remote sensing image change detection method based on triple semantic relation learning | |
Romaszko et al. | Vision-as-inverse-graphics: Obtaining a rich 3d explanation of a scene from a single image | |
CN111612807A (en) | Small target image segmentation method based on scale and edge information | |
CN110297232A (en) | Monocular distance measuring method, device and electronic equipment based on computer vision | |
Li et al. | Neulf: Efficient novel view synthesis with neural 4d light field | |
CN112784782B (en) | Three-dimensional object identification method based on multi-view double-attention network | |
Li et al. | Target detection based on dual-domain sparse reconstruction saliency in SAR images | |
CN113159232A (en) | Three-dimensional target classification and segmentation method | |
CN112990010A (en) | Point cloud data processing method and device, computer equipment and storage medium | |
CN114998566A (en) | Interpretable multi-scale infrared small and weak target detection network design method | |
Dey et al. | Mip-NeRF RGB-d: Depth assisted fast neural radiance fields | |
CN114299405A (en) | Unmanned aerial vehicle image real-time target detection method | |
CN114463736A (en) | Multi-target detection method and device based on multi-mode information fusion | |
Agresti et al. | Stereo and ToF data fusion by learning from synthetic data | |
CN106886754B (en) | Object identification method and system under a kind of three-dimensional scenic based on tri patch | |
Xu et al. | Light field distortion feature for transparent object classification | |
Chen et al. | Scene segmentation of remotely sensed images with data augmentation using U-net++ | |
Wang et al. | Buried target detection method for ground penetrating radar based on deep learning | |
CN104217430A (en) | Image significance detection method based on L1 regularization | |
CN113011359A (en) | Method for simultaneously detecting plane structure and generating plane description based on image and application | |
Balado et al. | Multi feature-rich synthetic colour to improve human visual perception of point clouds |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||