CN109462747A - DIBR system hole-filling method based on a generative adversarial network - Google Patents


Info

Publication number
CN109462747A
CN109462747A (application CN201811512179.3A)
Authority
CN
China
Prior art keywords
hole
image
mask
pixel
target image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811512179.3A
Other languages
Chinese (zh)
Other versions
CN109462747B (en)
Inventor
刘然
赵洋
肖迪
郑杨婷
刘亚琼
陈希
张艳珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Hengsheng Bolong Junxin Technology Co ltd
Original Assignee
"CHENGDU MEILYU TECHNOLOGY Co LTD"
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by "CHENGDU MEILYU TECHNOLOGY Co LTD" and Chongqing University
Priority to CN201811512179.3A (granted as CN109462747B)
Publication of CN109462747A
Application granted
Publication of CN109462747B
Legal status: Active


Abstract

The invention discloses a hole-filling method for DIBR systems based on a generative adversarial network. First, a batch of reference images and the corresponding depth images is collected and training samples are generated from them; a generative adversarial network is trained on these samples to obtain a generation model for DIBR hole filling. Then, for the target image to be filled, the corresponding tensor is built with the same method used for the training samples, a foreground-suppression mask is generated, and the processed input tensor is fed into the generation model. Finally, the model output is post-processed to obtain the hole-filled target image. By using a generative adversarial network, the invention extracts both low-level features and high-level semantic features, improving the consistency between filled hole regions and the remaining pixels.

Description

DIBR system hole-filling method based on a generative adversarial network
Technical field
The invention belongs to the field of 3D image transformation. More specifically, it relates to a hole-filling method for depth-image-based rendering (DIBR) systems based on a generative adversarial network.
Background art
The view synthesis engine is an important component of 3D display technology. Its view-synthesis methods fall into two main classes: stereo content generation based on 3D models, and image-based generation of stereo images; the latter commonly uses depth-image-based rendering (DIBR) algorithms.
Traditional 3D stereoscopic display needs to transmit two or more views to produce a continuous look-around effect and give viewers a sense of depth. DIBR, by contrast, can generate a stereoscopic image from a single video stream and its corresponding depth map. Views synthesized with DIBR, however, often contain holes. Holes arise mainly because the visibility of objects in the view changes with the viewpoint: a background object occluded by the foreground in the reference image may become visible in the target image, so part of the information needed for the virtual-viewpoint image (target image) is missing, producing a hole. How to fill these holes reasonably and effectively is a key problem in DIBR. In recent years researchers have proposed many hole-filling methods, which fall broadly into the following four classes:
(1) Depth-map preprocessing
The basic idea of depth-map preprocessing is to reduce or even avoid holes after view synthesis. Holes tend to appear where the depth map is discontinuous: if two neighboring pixels in the reference image have sufficiently different depth values, their displacements in the 3D image warp differ, so the two pixels are no longer adjacent in the target image. The larger the depth difference between two pixels in the reference image, the wider the hole. Depth-map preprocessing reduces holes by reducing the depth difference between neighboring pixels, i.e., the depth map is smoothed before the 3D warp. The smoothed depth map has gentler depth transitions, so the resulting holes are smaller.
Since the depth map describes the scene geometry, any change to it alters the scene geometry in the warped target image, which means geometric distortion appears in the synthesized view. Because the human eye is more sensitive to vertical distortion, Zhang et al. filter the depth image with an asymmetric Gaussian filter to weaken geometric distortion of vertical texture information in the background. Daribo et al. apply smoothing only near depth edges, so that only regions near holes are smoothed while other regions remain unchanged. Hong proposed the directional Gaussian filter (DGF): in depth-map preprocessing the depth map is first dilated and then filtered with an asymmetric Gaussian filter where holes would appear. The general problem with such methods is that they introduce geometric distortion, lowering the quality of the target image.
(2) Interpolation
The idea behind interpolation-based hole filling is fairly simple: texture near the hole is propagated into the hole region by interpolation. Interpolation-based methods mainly include horizontal interpolation and background extrapolation. Horizontal interpolation processes the hole region row by row, filling the hole pixels of each row by linear interpolation between the non-hole pixels to their left and right. Background extrapolation also fills holes row by row: it first identifies the non-hole pixels on either side of the row, then propagates the pixel value of the farther (background) pixel into the hole region. Since these interpolation methods process each row of the view independently, they can produce visual artifacts across rows.
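A minimal sketch of the horizontal-interpolation variant described above (numpy; hole runs touching the image border are filled by replication, which is an assumption, since the text does not specify border handling):

```python
import numpy as np

def horizontal_interpolate(image, mask):
    """Fill hole pixels (mask == 1) row by row with linear interpolation
    between the nearest non-hole pixels to the left and right."""
    out = image.astype(np.float64).copy()
    H, W = mask.shape
    for y in range(H):
        x = 0
        while x < W:
            if mask[y, x] != 1:
                x += 1
                continue
            start = x                       # start of a run of hole pixels
            while x < W and mask[y, x] == 1:
                x += 1
            left, right = start - 1, x      # nearest non-hole neighbours
            if left < 0 and right >= W:
                continue                    # the whole row is a hole
            if left < 0:                    # hole touches the left border
                out[y, start:x] = out[y, right]
            elif right >= W:                # hole touches the right border
                out[y, start:x] = out[y, left]
            else:                           # linear ramp between the edges
                ramp = np.linspace(out[y, left], out[y, right], x - start + 2)
                out[y, start:x] = ramp[1:-1]
    return out
```

Because every row is filled independently, two vertically adjacent holes can receive very different ramps — the row-wise artifact the text mentions.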
(3) Image inpainting
Inpainting-based hole filling relies on a larger reference region. Current inpainting methods fall into two classes: traditional patch-based methods and deep generative models based on convolutional neural networks. Because hole filling and image inpainting are similar problems, many inpainting-based hole-filling algorithms are improvements of classical inpainting algorithms, but so far deep CNN-based generative models have not been applied to DIBR hole filling. Traditional patch-based inpainting can fill static texture effectively, but its performance degrades sharply when objects are not static.
(4) Hole filling based on spatio-temporal information
The hole-filling methods above use only the spatial information of a single moment's image, whereas spatio-temporal methods fill holes using image information from different moments in the temporal domain. Hua et al. proposed a depth-video-based rendering (DVBR) method with content-adaptive behavior: large holes caused by occlusion are resolved using the temporal correlation of the scene, while small holes caused by quantization error are still filled with image-restoration techniques. Lin et al. proposed a hole-filling method based on a sprite-generation algorithm, which fuses background information (color and depth maps) across a series of consecutive frames to build a sprite model used to fill holes in the DIBR-synthesized view. However, spatio-temporal methods cannot fill all holes completely; the remaining holes still need to be filled by other hole-filling methods.
However, the studies above use only low-level features and cannot capture higher-level semantics, so their hole-filling results are unsatisfactory in complex scenes.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and provide a hole-filling method for DIBR systems based on a generative adversarial network, which extracts low-level features and high-level semantic features with the adversarial network and improves the consistency between filled hole regions and the remaining pixels.
To achieve the above object, the DIBR system hole-filling method of the present invention based on a generative adversarial network comprises the following steps:
S1: generate training samples as follows:
S1.1: collect a batch of reference images I_R^t and corresponding depth images D_t, t = 1, 2, ..., K, where K is the number of reference images;
S1.2: apply the 3D image warp to each reference image I_R^t and depth image D_t to obtain the corresponding target image I_S^t, and generate a mask mask_t of the same size as I_S^t. mask_t marks hole pixels: if a pixel of I_S^t is a hole pixel, the corresponding value in mask_t is 1; if it is a non-hole pixel, the corresponding value in mask_t is 0;
S1.3: convert each channel matrix of every reference image I_R^t and the corresponding mask mask_t into tensors of shape batch_size × M_t × N_t × 1, then concatenate them in order into a tensor of shape batch_size × M_t × N_t × (L+1) as the corresponding training sample, where batch_size is the batch size, M_t × N_t is the size of I_R^t, and L is the number of channels of I_R^t;
S2: train the generative adversarial network with the training samples generated in step S1 to obtain a generation model for DIBR hole filling;
S3: for a target image I'_S to be hole-filled, obtained from a reference image I'_R by the 3D image warp, generate a non-hole matrix M'_S and a mask mask', both of the same size as I'_S. The non-hole matrix M'_S distinguishes non-hole pixels from hole pixels in I'_S: if a pixel of I'_S is a hole pixel, the corresponding element of M'_S is set to -1, whereas if it is a non-hole pixel, the corresponding element of M'_S is set to the depth value of that pixel. The mask mask' marks hole pixels: if a pixel of I'_S is a hole pixel, the corresponding element of mask' is set to 1; if it is a non-hole pixel, the corresponding element of mask' is set to 0;
S4: perform hole detection on the non-hole matrix M'_S and dilate the detected holes, taking the resulting matrix as the foreground-suppression mask mask'_FR;
S5: with the same method as in step S1.3, convert each channel matrix of the target image I'_S and the mask mask' into tensors of shape batch_size × M' × N' × 1, where M' × N' is the size of I'_S, then concatenate them in order into a tensor R of shape batch_size × M' × N' × (L+1), and compute the input tensor input according to:
input = R ⊙ (1 - mask')
where ⊙ denotes element-wise multiplication;
then obtain the input tensor input' by applying the foreground-suppression mask mask'_FR:
input' = input ⊙ (1 - mask'_FR);
S6: feed the input tensor input' into the generation model obtained in step S2 to get the output image output, then compute the hole-filled target image I''_S according to:
I''_S = output ⊙ mask' + I'_S ⊙ (1 - mask').
With the DIBR system hole-filling method of the present invention based on a generative adversarial network, a batch of reference image samples and the corresponding depth images is first collected and training samples are generated; the generative adversarial network is trained on these samples to obtain a generation model for DIBR hole filling. Then the tensor corresponding to the target image to be filled is generated with the same method used for the training samples, a foreground-suppression mask is generated, and the processed input tensor is fed into the generation model; finally the output is post-processed to obtain the hole-filled target image. By using a generative adversarial network, the invention extracts low-level features and high-level semantic features and improves the consistency between filled hole regions and the remaining pixels.
Description of the drawings
Fig. 1 is a flowchart of a specific embodiment of the DIBR hole-filling method of the present invention based on a generative adversarial network;
Fig. 2 is a flowchart of training-sample generation in the present invention;
Fig. 3 is an example of the 3D image warp in the training-sample generation stage;
Fig. 4 is an example of the generative adversarial network structure in this embodiment;
Fig. 5 is an example of generating the target image to be filled;
Fig. 6 is an example of holes filled with foreground texture;
Fig. 7 is a flowchart of foreground-suppression-mask generation in this embodiment;
Fig. 8 is an example of hole filling with the generation model in this embodiment;
Fig. 9 is an example of target-image hole filling using frame "003" of the "Ballet" sequence as the reference image;
Fig. 10 is an example of target-image hole filling using frame "096" of the "Ballet" sequence as the reference image;
Fig. 11 is a comparison of target-image hole filling with and without the foreground-suppression mechanism in this embodiment.
Specific embodiment
A specific embodiment of the invention is described below with reference to the accompanying drawings, so that those skilled in the art can better understand the present invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they would dilute the main content of the invention.
Embodiment
Fig. 1 is a flowchart of a specific embodiment of the DIBR hole-filling method of the present invention based on a generative adversarial network. As shown in Fig. 1, the method comprises the following steps:
S101: obtain training samples:
The present invention first needs to obtain training samples. Fig. 2 is a flowchart of training-sample generation in the present invention. As shown in Fig. 2, the specific steps are:
S201: collect reference images:
Collect a batch of reference images I_R^t and corresponding depth images D_t, t = 1, 2, ..., K, where K is the number of reference images.
S202: 3D image warp:
Apply the 3D image warp to each reference image I_R^t and depth image D_t to obtain the corresponding target image I_S^t, and generate a mask mask_t of the same size as I_S^t. mask_t marks hole pixels: if a pixel of I_S^t is a hole pixel, the corresponding value in mask_t is 1; if it is a non-hole pixel, the corresponding value in mask_t is 0. Since the target image I_S^t has the same size as the reference image I_R^t, the mask mask_t also has the same size as I_R^t.
Fig. 3 is an example of the 3D image warp in the training-sample generation stage. As shown in Fig. 3, hole regions of the target image are marked white, and black regions in the mask indicate non-hole regions. The 3D image warp is a common technique; its details are not repeated here.
S203: generate training samples:
Convert each channel matrix of every reference image I_R^t and the corresponding mask mask_t into tensors of shape batch_size × M_t × N_t × 1, then concatenate them in order into a tensor of shape batch_size × M_t × N_t × (L+1) as the corresponding training sample, where batch_size is the batch size, M_t × N_t is the size of I_R^t, and L is the number of channels of I_R^t. In general a reference image has RGB channels, so L = 3, and mask_t has one channel in the present invention.
The batch size batch_size is the number of samples used to compute the gradient when training the generative adversarial network. Too small a batch leads to inefficient training or failure to converge; too large a batch exceeds the available memory; and beyond a certain size the descent direction changes very little. batch_size is therefore an important parameter; it is set to 16 in this embodiment.
As for the image size M_t × N_t, too large a size increases training complexity and memory consumption, so in practice the image size used when generating training samples can be set according to the available memory. In this embodiment each reference image is 256 × 256, so each training sample has shape 16 × 256 × 256 × 4.
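As a concrete illustration of step S203, the tensor assembly can be sketched in numpy. This is a sketch under assumed conventions (channels-last layout, masks as 2-D float arrays); the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def build_training_tensor(images, masks):
    """Stack a batch of reference images (each M x N x L) with their hole
    masks (each M x N, 1 = hole) into one tensor of shape
    batch_size x M x N x (L + 1), as described in step S203."""
    channels = []
    L = images[0].shape[2]
    for c in range(L):
        # each channel becomes a batch_size x M x N x 1 tensor
        channels.append(np.stack([img[:, :, c] for img in images])[..., None])
    # the mask is appended as the (L + 1)-th slice
    channels.append(np.stack(masks)[..., None])
    return np.concatenate(channels, axis=3)

imgs = [np.random.rand(256, 256, 3).astype(np.float32) for _ in range(16)]
msks = [np.zeros((256, 256), dtype=np.float32) for _ in range(16)]
sample = build_training_tensor(imgs, msks)
# for batch_size = 16 and 256 x 256 RGB images, sample.shape is (16, 256, 256, 4)
```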
S102: train to obtain the generation model:
Train the generative adversarial network with the training samples generated in step S101 to obtain the generation model for DIBR hole filling.
The specific structure of the generative adversarial network can be configured as needed. This embodiment designs a generative adversarial network structure following the reference "Yu, J., et al., Free-Form Image Inpainting with Gated Convolution. 2018." Fig. 4 is an example of the structure used in this embodiment. As shown in Fig. 4, the network comprises a generation model and a discrimination model. The generation model has two stages: the first stage consists of 15 convolution kernels and is trained with a reconstruction loss, giving a relatively blurry filling result; the second stage likewise consists of 15 convolution kernels and is trained with a reconstruction loss plus global and local WGAN-GP (Wasserstein GAN with gradient penalty) adversarial losses, giving the final result. The discrimination model consists of six convolution kernels; after the sixth kernel the feature map is reduced to a small fraction of H × W, where H is the height of the training-sample images and W their width, and an SN-GAN (Spectral Normalization GAN) loss is applied to each element of the feature map to train the discrimination network. Note that all convolution kernels in the architecture use the gated convolution of "Yu, J., et al., Free-Form Image Inpainting with Gated Convolution. 2018." With this architecture, a generation model of good quality can be trained relatively quickly and stably.
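The gated convolution of Yu et al. referenced above replaces a plain convolution with the element-wise product of a feature branch and a learned sigmoid gate, which lets the network soft-mask hole pixels. A minimal single-channel numpy sketch follows; the naive convolution and the weight shapes are illustrative assumptions, not the patent's trained network.

```python
import numpy as np

def conv2d(x, w, b):
    """Naive 'same' 2-D convolution, single channel in and out."""
    r = w.shape[0] // 2
    xp = np.pad(x, r)
    out = np.zeros_like(x, dtype=np.float64)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + w.shape[0], j:j + w.shape[1]] * w) + b
    return out

def gated_conv(x, w_feat, b_feat, w_gate, b_gate):
    """Gated convolution: a per-pixel gate in (0, 1) scales the feature
    response, so invalid (hole) pixels can be attenuated."""
    feat = np.tanh(conv2d(x, w_feat, b_feat))                  # feature branch
    gate = 1.0 / (1.0 + np.exp(-conv2d(x, w_gate, b_gate)))   # sigmoid gate
    return feat * gate
```

In the real network both branches are learned multi-channel convolutions; the gating idea is the same.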
S103: generate the target image to be filled:
For a target image I'_S to be hole-filled, obtained from a reference image I'_R by the 3D image warp, generate a non-hole matrix M'_S and a mask mask', both of the same size as I'_S. The non-hole matrix M'_S distinguishes non-hole pixels from hole pixels in I'_S: if a pixel of I'_S is a hole pixel, the corresponding value in M'_S is set to -1, whereas if it is a non-hole pixel, the corresponding value in M'_S is set to the depth value of that pixel. The mask mask' marks hole pixels: if a pixel of I'_S is a hole pixel, the corresponding value in mask' is set to 1; if it is a non-hole pixel, the corresponding value in mask' is set to 0. Clearly, the non-hole matrix M'_S and the mask mask' have the same size as the reference image I'_R. Fig. 5 is an example of generating the target image to be filled. As shown in Fig. 5, hole regions of the target image to be filled are marked white; gray regions of the non-hole matrix indicate non-hole regions, and black regions in the mask indicate non-hole regions.
S104: generate the foreground-suppression mask:
In the 3D image warp, holes arise mainly because background objects occluded by the foreground in the reference image become visible in the target image. In this case holes should be filled with background information, not with foreground information. However, it has been found that the generation model obtained by the training above, owing to its contextual-attention mechanism, attends to all foreground and background information around a hole when filling it, so foreground information appears in some filled hole regions and causes semantic distortion. Fig. 6 is an example of holes filled with foreground: part of the hole to the right of the female dancer is filled with skin-like texture. To solve this problem, the invention proposes a foreground-suppression mechanism that prevents foreground information from filling hole regions. It requires a foreground-suppression mask, generated as follows:
Perform hole detection on the non-hole matrix M'_S and dilate the detected holes; the resulting matrix is the foreground-suppression mask mask'_FR.
Fig. 7 is a flowchart of foreground-suppression-mask generation in this embodiment. As shown in Fig. 7, the detailed process is as follows:
(1) Detect hole edges
Scan the non-hole matrix M'_S left to right and top to bottom, recording the positions of hole edges. For a point (x, y) in M'_S, if the corresponding value m(x, y) = -1 (a hole point), m(x-1, y) ≠ -1 and m(x+1, y) = -1, then mark point (x-1, y) as the first non-hole point m_L at the left edge of the hole; if m(x, y) = -1 (a hole point), m(x-1, y) = -1 and m(x+1, y) ≠ -1, then mark point (x+1, y) as the first non-hole point m_R at the right edge of the hole. The number of consecutive hole points between the left and right edges is denoted num_holes.
(2) Detect transitions
In the non-hole matrix M'_S, the depth (disparity) value of a foreground point is greater than that of a background point. Therefore, if m_L - m_R ≥ sharp_th, where sharp_th is the transition threshold, the left-edge region of the hole is considered foreground and the right-edge region background; if m_R - m_L ≥ sharp_th, the right-edge region is considered foreground and the left-edge region background; if |m_L - m_R| < sharp_th and m_L ≥ bf_th, m_R ≥ bf_th, where bf_th is the threshold separating foreground from background, both edge regions of the hole are considered foreground; if |m_L - m_R| < sharp_th and m_L < bf_th, m_R < bf_th, both edge regions are considered background.
(3) Dilate holes
After hole-edge detection and transition detection, a hole is dilated only when its number of consecutive hole points num_holes exceeds the set threshold len_hole. If the left edge of the hole is foreground and the right edge background, only the left edge of the hole is dilated, by a preset number of pixels P; if the right edge is foreground and the left edge background, only the right edge is dilated; if both sides are foreground, each side is dilated by P points; if both sides are background, no dilation is performed.
In this embodiment the parameters sharp_th, bf_th, len_hole and P are set to 4, 110, 35 and 3, respectively.
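The per-row edge detection, transition classification and dilation above can be sketched as follows. This is a numpy sketch under the stated thresholds; it handles horizontal runs only and skips border-touching runs, which are simplifications relative to the full method.

```python
import numpy as np

def foreground_suppression_mask(M, sharp_th=4, bf_th=110, len_hole=35, P=3):
    """Build the foreground-suppression mask from a non-hole matrix M
    (hole points = -1, other points = depth value): find each horizontal
    run of hole points, classify its left/right edges as foreground or
    background from the edge depth values, and dilate long runs on the
    foreground side(s) by P pixels."""
    H, W = M.shape
    out = (M == -1).astype(np.float32)      # start from the plain hole mask
    for y in range(H):
        x = 0
        while x < W:
            if M[y, x] != -1:
                x += 1
                continue
            start = x                        # start of a hole run
            while x < W and M[y, x] == -1:
                x += 1
            if x - start <= len_hole or start == 0 or x == W:
                continue                     # short run, or run touches border
            mL, mR = M[y, start - 1], M[y, x]   # first non-hole edge points
            both_fg = abs(mL - mR) < sharp_th and mL >= bf_th and mR >= bf_th
            left_fg = (mL - mR >= sharp_th) or both_fg
            right_fg = (mR - mL >= sharp_th) or both_fg
            if left_fg:
                out[y, max(0, start - P):start] = 1.0   # dilate left edge
            if right_fg:
                out[y, x:min(W, x + P)] = 1.0           # dilate right edge
    return out
```

Pixels added by the dilation cover the foreground border, so the generation model never sees that foreground texture as context when filling the hole.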
S105: obtain the generation-model input:
With the same method as in step S203, convert each channel matrix of the target image I'_S and the mask mask' into tensors of shape batch_size × M' × N' × 1, where M' × N' is the size of I'_S, then concatenate them in order into a tensor R of shape batch_size × M' × N' × (L+1), and compute the input tensor input according to:
input = R ⊙ (1 - mask')
where ⊙ denotes element-wise multiplication, i.e., corresponding pixel values are multiplied, and 1 denotes an all-ones tensor of the same shape as mask'. Since R has shape batch_size × M' × N' × (L+1) and mask' has shape batch_size × M' × N' × 1, the formula is equivalent to multiplying each component of R along the fourth dimension element-wise by 1 - mask', yielding the input tensor input of shape batch_size × M' × N' × (L+1).
Then obtain the input tensor input' by applying the foreground-suppression mask mask'_FR:
input' = input ⊙ (1 - mask'_FR)
Similarly, this is equivalent to multiplying each component of input along the fourth dimension element-wise by 1 - mask'_FR, yielding the input tensor input' of shape batch_size × M' × N' × (L+1).
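The two masking products are plain broadcast multiplications; a small numpy sketch, with shapes assumed as in the text (toy sizes and mask placements for illustration):

```python
import numpy as np

# assumed shapes: R is batch x M x N x (L+1); masks are batch x M x N x 1
R = np.random.rand(2, 4, 4, 4)
mask = np.zeros((2, 4, 4, 1))
mask[:, 1, 2, 0] = 1.0          # one hole pixel
mask_FR = np.zeros((2, 4, 4, 1))
mask_FR[:, 1, :, 0] = 1.0       # foreground-suppressed (dilated) region

inp = R * (1.0 - mask)          # input  = R ⊙ (1 - mask'): zero hole pixels
inp2 = inp * (1.0 - mask_FR)    # input' = input ⊙ (1 - mask'_FR)
# broadcasting over the last axis multiplies every channel slice by the mask
```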
S106: hole filling:
Feed the input tensor input' into the generation model obtained in step S102 to get the output image output, then compute the hole-filled target image I''_S according to:
I''_S = output ⊙ mask' + I'_S ⊙ (1 - mask')
Fig. 8 is an example of hole filling with the generation model in this embodiment. As shown in Fig. 8, the output of the model's encode-decode pass is relatively blurry. The present invention therefore takes from output only the parts corresponding to the white (hole) regions of mask' and adds them to the non-hole parts of the target image I'_S to obtain the final result I''_S, ensuring the sharpness of the target image.
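The final compositing formula can be sketched directly; shapes and values below are illustrative:

```python
import numpy as np

def composite(output, target, mask):
    """I''_S = output ⊙ mask' + I'_S ⊙ (1 - mask'): take the generator's
    pixels only inside holes, keep the original target pixels elsewhere."""
    return output * mask + target * (1.0 - mask)

target = np.full((4, 4, 3), 0.5)            # sharp warped target image
output = np.full((4, 4, 3), 0.9)            # blurry generator output
mask = np.zeros((4, 4, 1))
mask[1, 1, 0] = 1.0                          # a single hole pixel
filled = composite(output, target, mask)
```

This is why only the hole regions inherit the generator's blur while the rest of the image keeps its original sharpness.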
From the above description it can be seen that in the present invention the generative adversarial network is trained with reference images and masks, while at filling time the generation model receives an input tensor built from the target image, the mask and the foreground-suppression mask. In practice, therefore, when holes must be filled in a sequence of target images obtained from a video sequence by the 3D image warp, training samples are generated from the images of the original video sequence; after the generative adversarial network is trained to obtain the generation model for DIBR hole filling, that model is used to fill the holes of the target-image sequence. Training samples generated this way are consistent with the parameters of the target-image sequence to be filled, so the resulting generation model achieves a better hole-filling effect than one trained on other reference images.
To better illustrate the technical effect of the present invention, experiments were run on the "Ballet" sequence. Fig. 9 is an example of target-image hole filling using frame "003" of the "Ballet" sequence as the reference image, and Fig. 10 is the corresponding example using frame "096". As shown in Figs. 9 and 10, the cam4 view is generated here from the cam5 view; the whole image and four local close-ups are shown before and after filling, with white areas marking hole regions. Both globally and locally, the present invention fills hole regions very well.
To show the technical effect of the foreground-suppression mechanism of the present invention, experiments with and without the mechanism were likewise run on the "Ballet" sequence. Fig. 11 compares target-image hole filling with and without foreground suppression in this embodiment; regions where the suppression effect is evident are marked with ellipses. As shown in Fig. 11, the present invention effectively prevents the generation model from filling holes with foreground information, yielding a more authentic target view.
Although illustrative specific embodiments of the present invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these specific embodiments. To those of ordinary skill in the art, various changes are possible as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims; all innovations that make use of the inventive concept fall within the scope of protection.

Claims (2)

1. a kind of based on the DIBR system gap filling method for generating confrontation network, which comprises the following steps:
S1: training sample is generated using following methods:
S1.1: a collection of reference picture is collectedAnd corresponding depth image Dt, t=1,2 ..., K, K expression number of reference pictures;
S1.2: by every width reference pictureWith depth image Dt3-D image transformation is carried out, corresponding target image is obtainedAnd Generation and target imageThe masking-out mask of identical sizet, masking-out masktFor marking empty pixel, if target image In pixel be empty pixel, then in masking-out masktIn corresponding pixel value be 1, if it is non-empty pixel, then exist Masking-out masktIn corresponding pixel value be 0;
S1.3: by every width reference pictureEach channel and corresponding masking-out masktIt is separately converted to batch_size × Mt× Nt× 1 tensor, then successively splicing obtains batch_size × Mt×NtThe tensor of × (L+1) as corresponding training sample, Wherein batch_size indicates batch size, Mt×NtIndicate reference pictureSize, L indicate reference picturePort number;
S2: train the generative adversarial network with the training samples generated in step S1 to obtain a generation model for DIBR system hole filling;
S3: for a target image I'_S to be hole-filled, obtained from the reference image I'_R by 3D image warping, generate a non-hole matrix M'_S and a mask mask' of the same size as the target image I'_S; the non-hole matrix M'_S distinguishes the non-hole pixels from the hole pixels in the target image I'_S: if a pixel in I'_S is a hole pixel, the corresponding element in M'_S is set to -1; otherwise, if it is a non-hole pixel, the corresponding element in M'_S is set to the depth value of that pixel; the mask mask' marks hole pixels: if a pixel in I'_S is a hole pixel, the corresponding element in mask' is set to 1; if it is a non-hole pixel, the corresponding element in mask' is set to 0;
S4: perform hole detection according to the non-hole matrix M'_S and dilate the detected holes, and take the resulting matrix as the foreground suppression mask mask'_FR;
S5: using the same method as in step S1.3, convert each channel matrix of the target image I'_S and the mask mask' into tensors of size batch_size × M' × N' × 1, where M' × N' denotes the size of the target image I'_S, then concatenate them in order to obtain a tensor R of size batch_size × M' × N' × (L+1); then compute the input tensor input according to the following formula:
input = R ⊙ (1 - mask')
where ⊙ denotes element-wise multiplication;
the input tensor input' is then obtained by processing with the foreground suppression mask mask'_FR:
input' = input ⊙ (1 - mask'_FR);
S6: feed the input tensor input' into the generation model obtained in step S2 to get the output image output, then obtain the hole-filled target image I''_S by the following formula:
I''_S = output ⊙ mask' + I'_S ⊙ (1 - mask').
2. The DIBR system hole filling method according to claim 1, characterized in that, when hole filling is to be performed on a target image sequence obtained from a video image sequence by 3D image warping, training samples are generated from the images of the original video image sequence, the generative adversarial network is trained with them to obtain the generation model for DIBR system hole filling, and that generation model is then used to perform hole filling on the target image sequence.
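Purely as an illustrative aside (this sketch is not part of the patent text), the per-image arithmetic of steps S4 to S6 can be expressed in NumPy, assuming batch_size = 1, holes marked by -1 entries in the non-hole matrix M'_S, and a placeholder `generator` standing in for the trained generation model; the function names and the k × k dilation window size are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def fg_suppression_mask(non_hole_matrix, k=5):
    """Step S4 sketch: detect holes (entries equal to -1 in M'_S) and
    dilate them with a k x k square window (binary dilation as a max filter)."""
    hole = (non_hole_matrix == -1).astype(np.float32)
    pad = k // 2
    p = np.pad(hole, pad)
    out = np.zeros_like(hole)
    for i in range(hole.shape[0]):
        for j in range(hole.shape[1]):
            out[i, j] = p[i:i + k, j:j + k].max()
    return out[..., None]  # H x W x 1, broadcastable over the channel axis

def fill_holes(target, mask, generator, fg_mask=None):
    """Steps S5-S6 sketch: zero out hole pixels, optionally suppress the
    dilated foreground region, run the generator, and keep the original
    pixels everywhere outside the holes."""
    inp = target * (1.0 - mask)                  # input  = R  elementwise (1 - mask')
    if fg_mask is not None:
        inp = inp * (1.0 - fg_mask)              # input' = input elementwise (1 - mask'_FR)
    out = generator(inp)                         # output of the generation model
    return out * mask + target * (1.0 - mask)    # I''_S = output*mask' + I'_S*(1 - mask')
```

With a constant-value stand-in generator, only the hole pixels receive generated values; every non-hole pixel of I'_S passes through unchanged, which is exactly what the final compositing formula of step S6 guarantees.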
CN201811512179.3A 2018-12-11 2018-12-11 DIBR system cavity filling method based on generation countermeasure network Active CN109462747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811512179.3A CN109462747B (en) 2018-12-11 2018-12-11 DIBR system cavity filling method based on generation countermeasure network


Publications (2)

Publication Number Publication Date
CN109462747A true CN109462747A (en) 2019-03-12
CN109462747B CN109462747B (en) 2020-06-26

Family

ID=65612970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811512179.3A Active CN109462747B (en) 2018-12-11 2018-12-11 DIBR system cavity filling method based on generation countermeasure network

Country Status (1)

Country Link
CN (1) CN109462747B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110381304A * 2019-07-23 2019-10-25 京东方科技集团股份有限公司 Model training method for repairing pictures with holes, and repair method for pictures with holes
CN110580687A (en) * 2019-08-05 2019-12-17 重庆大学 Data preprocessing method for improving filling quality of generated countermeasure network cavity
CN111091616A (en) * 2019-11-25 2020-05-01 艾瑞迈迪科技石家庄有限公司 Method and device for reconstructing three-dimensional ultrasonic image
CN111325693A (en) * 2020-02-24 2020-06-23 西安交通大学 Large-scale panoramic viewpoint synthesis method based on single-viewpoint RGB-D image
CN111968053A (en) * 2020-08-13 2020-11-20 南京邮电大学 Image restoration method based on gate-controlled convolution generation countermeasure network
CN111985608A * 2019-05-23 2020-11-24 宏达国际电子股份有限公司 Method for training a generative adversarial network and method for generating an image
CN112102284A (en) * 2020-09-14 2020-12-18 推想医疗科技股份有限公司 Marking method, training method and device of training sample of image segmentation model
CN112215151A (en) * 2020-10-13 2021-01-12 电子科技大学 Method for enhancing anti-interference capability of target detection system by using 3D (three-dimensional) antagonistic sample
CN112233127A (en) * 2020-10-15 2021-01-15 上海圭目机器人有限公司 Down-sampling method for curve splicing image

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101404777A (en) * 2008-11-06 2009-04-08 四川虹微技术有限公司 Drafting view synthesizing method based on depth image
CN102307312A (en) * 2011-08-31 2012-01-04 四川虹微技术有限公司 Method for performing hole filling on destination image generated by depth-image-based rendering (DIBR) technology
US20170188002A1 (en) * 2015-11-09 2017-06-29 The University Of Hong Kong Auxiliary data for artifacts - aware view synthesis


Non-Patent Citations (1)

Title
RAN LIU: "Hole-filling Based on Disparity Map for DIBR", KSII Transactions on Internet and Information Systems *


Also Published As

Publication number Publication date
CN109462747B (en) 2020-06-26

Similar Documents

Publication Publication Date Title
CN109462747A DIBR system hole filling method based on generative adversarial network
CN106504190B Three-dimensional video generation method based on 3D convolutional neural networks
US9137512B2 (en) Method and apparatus for estimating depth, and method and apparatus for converting 2D video to 3D video
KR101625830B1 (en) Method and device for generating a depth map
JP6094863B2 (en) Image processing apparatus, image processing method, program, integrated circuit
US20140146139A1 (en) Depth or disparity map upscaling
AU2016397878B2 (en) Display method and system for converting two-dimensional image into multi-viewpoint image
CN112543317B (en) Method for converting high-resolution monocular 2D video into binocular 3D video
WO2011033673A1 (en) Image processing apparatus
CN104112275B (en) A kind of method and device for generating viewpoint
US20110229024A1 (en) Devices and Methods for Processing Images Using Scale Space
CN104252700B (en) A kind of histogram equalization method of infrared image
CN108449596B (en) 3D stereoscopic image quality evaluation method integrating aesthetics and comfort
WO2012036176A1 (en) Reducing viewing discomfort
CN106530336B (en) Stereo matching method based on color information and graph cut theory
CN102098528A (en) Method and device for converting planar image into stereoscopic image
CN103905813A (en) DIBR hole filling method based on background extraction and partition recovery
KR20130008555A (en) Method for visualizing three-dimensional images on a 3d display device and 3d display device
US20160180514A1 (en) Image processing method and electronic device thereof
CN102496138A (en) Method for converting two-dimensional images into three-dimensional images
Litvinov et al. Incremental solid modeling from sparse structure-from-motion data with improved visual artifacts removal
Lu et al. Foreground-object-protected depth map smoothing for DIBR
Yue et al. Improved seam carving for stereo image resizing
Chen et al. An effective video stitching method
Bharathi et al. 2D-to-3D Conversion of Images Using Edge Information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210616

Address after: 20 / F, building 5, Aerospace City Center Plaza, 666 East Chang'an Street, Xi'an national civil aerospace industry base, Shaanxi 710000

Patentee after: Shaanxi Hengsheng Bolong JunXin Technology Co.,Ltd.

Address before: 1 / F, no.111-14, Xiaojiahe street, high tech Zone, Chengdu, Sichuan 610041

Patentee before: CHENGDU MEILYU SCIENCE & TECHNOLOGY Co.,Ltd.

Patentee before: Chongqing University
