CN108389217A - A kind of image synthesizing method based on gradient field mixing - Google Patents
A kind of image synthesizing method based on gradient field mixing
- Publication number
- CN108389217A (application CN201810094623.8A)
- Authority
- CN
- China
- Prior art keywords
- frame
- gradient
- video
- boundary
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/265—Mixing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The present invention provides an image synthesis method based on gradient-domain blending, comprising: inputting a source video sequence and a target video sequence; marking the target object of the source video sequence and the region of interest of the target video sequence, and obtaining the inner and outer boundaries of each frame by a single mapping using motion compensation; taking the annular region between the inner and outer boundaries of the current frame as the fusion boundary of the current frame; computing the gradient blending coefficient of the current frame from the gradient blending coefficient at each pixel position in the previous frame, using the temporal and spatial consistency of the gradient domain; locally smoothing the mean-value coordinate interpolation using a closed-form matting technique; and adding the smoothed value to the pixel value corresponding to the target object, processing frame by frame to obtain the fused video. Compared with the prior art, the present invention can extract the target object of the source video under significant changes in illumination intensity and seamlessly fuse it into the region of interest of the target video, obtaining the desired video synthesis effect.
Description
Technical Field
The invention relates to the field of computer vision and image processing, in particular to a video synthesis method based on gradient domain mixing.
Background
The video synthesis technology is to input a source video sequence and a target video sequence, segment a target object marked by a user in the source video sequence, and then synthesize the segmented target object into an area of interest of the target video sequence to obtain a fused video so as to achieve a synthesis effect expected by the user.
In the prior art, the simplest implementation of video compositing is to apply image fusion algorithms frame by frame or to use interactive commercial software. Although this approach can achieve the composite result the user desires, it requires extensive manual supervision, is time-consuming and labor-intensive, and does not maintain the temporal consistency of the video. Poisson-based synthesis algorithms achieve relatively good results, but their time complexity is too high. Algorithms based on mean-value coordinate synthesis use the GPU for parallel computation, which greatly improves speed and achieves results similar to Poisson synthesis, but color pollution may appear in the fusion region. In general, such methods tend to be unstable under illumination variation, which degrades the temporal consistency of the result.
In view of the above, how to design a new video synthesis method that improves on or eliminates these defects and makes the synthesized video more natural and harmonious is a problem to be solved in computer graphics research and industry.
Disclosure of Invention
Aiming at the defects of prior-art video synthesis methods, the invention provides a video synthesis method based on gradient-domain blending that remains robust even when the illumination intensity in the input video sequences changes greatly.
According to an aspect of the present invention, there is provided a video composition method based on gradient domain mixing, adapted to compose a target object in a source video sequence into a region of interest in a target video sequence when illumination intensity changes, the video composition method comprising:
inputting a source video sequence and a target video sequence;
marking a target object in the source video sequence and a region of interest of the target video sequence, and mapping an inner boundary and an outer boundary of each frame by using motion compensation;
determining a closed curve according to an annular region enclosed by the inner boundary and the outer boundary of the target object in the current frame, and taking the closed curve as a fusion boundary of the current frame;
calculating the gradient blending coefficient of the current frame from the gradient blending coefficient at each pixel position in the previous frame, using the temporal and spatial consistency of the gradient domain, wherein temporal consistency of the gradient domain is maintained by minimizing the second-order norm of the difference between the blending vector at each pixel position and the blending vectors at its motion-compensated points in the previous and next frames obtained by optical flow, and spatial consistency of the gradient domain is maintained by minimizing the second-order norm of the difference between the blending vector at each pixel position of the current frame and the blending vectors at its four-neighborhood pixels;
calculating the alpha matte value of each frame using a closed-form matting technique, and locally smoothing the mean-value coordinate interpolation computed by the mean-value coordinate fusion technique according to the alpha matte values to obtain an optimized interpolation value; and
adding the optimized interpolation value to the pixel value corresponding to the target object of the source video sequence, and processing frame by frame to obtain the fused output video.
In an embodiment of the foregoing, the step of generating the fusion boundary further includes: determining a closed curve within the region enclosed by the known inner and outer fusion boundaries in each frame as the target fusion boundary, and eliminating jitter of the fusion result by minimizing the difference between the source and target video pixel values at each point of the target fusion boundary.
In one embodiment, the source and target videos are blended in the gradient domain within the fusion region enclosed by the target fusion boundary, and the initial value of the corresponding gradient blending coefficient is determined by the ratio of the gradients of the source and target videos at each pixel position. The initial gradient blending vector blends the horizontal and vertical gradient components; its blending coefficient is computed by a formula (given as an equation image in the original publication) over the following quantities:
the second-order norms of the gradients of the source and target videos at pixel position i; the control parameters η1 and η2; the motion confidences of the source and target videos at pixel position i; and Cs and Ct, the magnitudes of the brightness change of the source and target videos between the initial frame and the next frame.
In one embodiment, after the initial value of the gradient blending coefficient is obtained, the gradient blending vector of the current frame is updated from the gradient blending vectors of the previous frame and the first frame according to a formula (given as an equation image in the original publication) over the following quantities:
the control parameters ω1, ω2 and ω3; Um(i), the set of four-neighborhood pixels of pixel i; Un(i), the set of pixel positions in the previous frames of the source and target videos computed from the bidirectional optical flow at pixel i; and ai, aj and ak, the gradient vectors at pixels i, j and k, respectively.
In one embodiment, the gradient blending coefficients are corrected according to the motion confidence and the inter-frame luminance difference, where the motion confidence is the convolution of the optical flow vector with a DoG (difference of Gaussians) kernel at each pixel position, and the inter-frame luminance difference is determined by the Kullback-Leibler distance between the histograms of the two frames.
In one embodiment, the region enclosed by the outer boundary of the target object is divided into a region of inconsistent texture and color and a region of consistent texture and color, and for the region of consistent texture and color the closed curve is obtained by an iterative formula (given as an equation image in the original publication) over the following quantities:
B, the set of pixels on the fusion boundary; h, the mean color difference between the source and target videos over all pixel positions; hp, the color difference between the source and target videos at pixel position p; the mean color difference at the two boundary points nearest to the motion-compensated point of pixel p in the next frame; the optical flow vectors of the source and target videos from pixel position p to the next frame; the color difference at the position reached by pixel p in the next frame according to the optical flow; hn, the mean color difference of the boundary computed for the next frame; and e, the control variable.
In one embodiment, when the closed curve is obtained by using the above iterative formula, the number of iterations is at least 25.
In an embodiment of the present invention, in the step of locally smoothing the mean-value coordinate interpolation computed by the mean-value coordinate fusion technique, a local smoothing term is added to the mean-value coordinate interpolation for optimization; the local smoothing term (given as an equation image in the original publication) involves:
μ, a control parameter; Ω, the fusion region enclosed by the fusion boundary; a normalized weight function Wx = (1 − αx) / Σ_{y∈Ω} (1 − αy), where αx and αy are the alpha matte values computed at positions x and y within the fusion region Ω; a weight S(q, x) = exp(−‖q − x‖²) based on geometric distance; and Ft(x) and Fs(x), the color values of the target and source videos at pixel position x.
With the above video synthesis method based on gradient-domain blending, a source video sequence and a target video sequence are first input; the target object of the source video sequence and the region of interest of the target video sequence are then marked, and the inner and outer boundaries of each frame are obtained by a single mapping using motion compensation; the annular region between the inner and outer boundaries of the current frame is then taken as the fusion boundary of the current frame; the gradient blending coefficient of the current frame is then computed from the gradient blending coefficient at each pixel position in the previous frame, using the temporal and spatial consistency of the gradient domain; finally, the mean-value coordinate interpolation is locally smoothed using the closed-form matting technique, the smoothed value is added to the pixel value corresponding to the target object, and the frames are processed one by one to obtain the fused video. Compared with the prior art, the method can extract the target object of the source video under significant changes in illumination intensity and seamlessly fuse it into the region of interest of the target video, obtaining the desired video synthesis effect.
Drawings
The various aspects of the present invention will become more apparent to the reader after reading the detailed description with reference to the attached drawings, wherein:
fig. 1 shows a flow chart of a video composition method based on gradient domain blending in an embodiment of the present invention.
Fig. 2 is a schematic diagram illustrating an inner boundary and an outer boundary in a source video sequence and a fused boundary calculated from the inner and outer boundaries when the video synthesis method of fig. 1 is adopted.
Detailed Description
To make the technical content disclosed in this application more detailed and complete, the technical solutions and implementation details of the embodiments of the present invention are described below with reference to the drawings.
Fig. 1 shows a flow chart of a video composition method based on gradient domain blending in an embodiment of the present invention. Fig. 2 is a schematic diagram illustrating the inner and outer boundaries in a source video sequence and the fusion boundary calculated from them when the video synthesis method of fig. 1 is adopted. In one implementation, the method was run on a computer with a 2.60 GHz CPU and 8 GB of memory, using Matlab 2014b as the software tool.
Referring to fig. 1, the video compositing method of the present invention is adapted to composite a target object in a source video sequence into a region of interest in a target video sequence when the illumination intensity changes. Specifically:
in steps S101 and S103, a source video sequence and a target video sequence are input, a target object in the source video sequence and a region of interest of the target video sequence are marked, and an inner boundary and an outer boundary of each frame are mapped once by using motion compensation.
For example, the user may mark the inner and outer boundaries of the target object of the source video sequence in the first frame. The inner boundary must contain the absolute foreground region of the target object, and the outer boundary must contain all blurred boundary regions of the target object. A region of interest in the target video sequence must also be marked, so that the target object of the source video sequence can be seamlessly cloned into it through video synthesis. In preprocessing, the foreground region of the target object is first segmented in the source video sequence using the mature GrabCut segmentation algorithm, and based on this foreground region the outer and inner boundaries of the target object in each frame are obtained by a single mapping using motion compensation. In addition, the mature Brox optical flow algorithm can be used to compute the bidirectional optical flow from the current frame to the previous and next frames.
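The "single mapping using motion compensation" above can be pictured as pushing each marked boundary point along its optical-flow vector into the next frame. A minimal sketch, assuming a dense flow field is already available (the function name and array layout are illustrative, not from the patent):

```python
import numpy as np

def propagate_boundary(boundary_pts, flow):
    """Map boundary points into the next frame by following per-pixel
    optical-flow vectors (a simple form of motion compensation).

    boundary_pts: (N, 2) integer array of (row, col) coordinates.
    flow: (H, W, 2) array of (d_row, d_col) flow vectors to the next frame.
    """
    rows, cols = boundary_pts[:, 0], boundary_pts[:, 1]
    shifted = boundary_pts + flow[rows, cols]        # follow the flow
    # snap back to the pixel grid and keep points inside the image
    h, w = flow.shape[:2]
    shifted = np.rint(shifted).astype(int)
    shifted[:, 0] = np.clip(shifted[:, 0], 0, h - 1)
    shifted[:, 1] = np.clip(shifted[:, 1], 0, w - 1)
    return shifted
```

In the full method this mapping would be applied once per frame to both the inner and the outer boundary.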
In step S105, a closed curve is determined within the annular region enclosed by the inner and outer boundaries of the target object in the current frame, and this closed curve is used as the fusion boundary of the current frame. According to an embodiment, generating the fusion boundary further comprises determining a closed curve within the region enclosed by the known inner and outer fusion boundaries in each frame as the target fusion boundary, and eliminating jitter of the fusion result by minimizing the difference between the source and target video pixel values at each point of the target fusion boundary. For example, when each frame is processed, the inner and outer boundaries of the previous frame may be mapped onto the current frame, and a closed curve determined within the annular region enclosed by them to serve as the target fusion boundary.
In step S107, the gradient blending coefficient of the current frame is computed from the gradient blending coefficient at each pixel position in the previous frame, using the temporal and spatial consistency of the gradient domain. Temporal consistency of the gradient domain is maintained by minimizing the second-order norm of the difference between the blending vector at each pixel position and the blending vectors at its motion-compensated points in the previous and next frames obtained by optical flow; spatial consistency of the gradient domain is maintained by minimizing the second-order norm of the difference between the blending vector at each pixel position of the current frame and the blending vectors at its four-neighborhood pixels.
According to a specific embodiment, the source and target videos are blended in the gradient domain within the fusion region enclosed by the target fusion boundary, and the initial value of the corresponding gradient blending coefficient is determined by the ratio of the gradients of the source and target videos at each pixel position. The initial gradient blending vector blends the horizontal and vertical gradient components; its blending coefficient is computed by a formula (given as an equation image in the original publication) over the following quantities:
the second-order norms of the gradients of the source and target videos at pixel position i; the control parameters η1 and η2; the motion confidences of the source and target videos at pixel position i; and Cs and Ct, the magnitudes of the brightness change of the source and target videos between the initial frame and the next frame.
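Since the exact coefficient formula survives only as an image in the original, the following is a hedged sketch of the stated idea: an initial blending coefficient driven by the ratio of squared gradient norms, with η1 and η2 as control parameters. The motion-confidence and brightness-change corrections are omitted here, and all names are illustrative:

```python
import numpy as np

def initial_blend_coeff(grad_s, grad_t, eta1=1.0, eta2=1.0, eps=1e-8):
    """Illustrative initial gradient-blending coefficient per pixel.

    grad_s, grad_t: (H, W, 2) horizontal/vertical gradients of the source
    and target videos. Returns an (H, W) coefficient in [0, 1].
    """
    ns = np.sum(grad_s ** 2, axis=-1)   # squared gradient norm of the source
    nt = np.sum(grad_t ** 2, axis=-1)   # squared gradient norm of the target
    # larger source gradients pull the coefficient toward 1 (trust the source)
    return (eta1 * ns) / (eta1 * ns + eta2 * nt + eps)
```

When source and target gradients are equal the coefficient sits at 0.5, i.e. the two gradient fields contribute equally.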
According to a specific embodiment, after the initial value of the gradient blending coefficient is obtained, the gradient blending vector of the current frame is updated from the gradient blending vectors of the previous frame and the first frame according to a formula (given as an equation image in the original publication) over the following quantities: the control parameters ω1, ω2 and ω3; Um(i), the set of four-neighborhood pixels of pixel i; Un(i), the set of pixel positions in the previous frames of the source and target videos computed from the bidirectional optical flow at pixel i; and ai, aj and ak, the gradient vectors at pixels i, j and k, respectively. Furthermore, the gradient blending coefficient may be corrected based on the motion confidence and the inter-frame luminance difference, where the motion confidence is the convolution of the optical flow vector with a DoG (difference of Gaussians) kernel at each pixel position, and the inter-frame luminance difference is determined by the Kullback-Leibler distance between the histograms of the two frames. When the illumination intensity of the source video changes, the reliability of its gradient decreases and the gradient blending coefficient should be reduced accordingly; conversely, when the illumination intensity of the target video changes, the gradient blending coefficient should be increased. Therefore, after the source and target video sequences are input, natural and seamless video synthesis can be achieved even if the illumination changes differently in the two videos (for example, when the illumination intensity of one video gradually increases while that of the other gradually decreases). Similarly, when the motion prediction is unreliable, the gradient blending coefficient is corrected in the same way as for illumination.
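The inter-frame luminance difference above can be computed from grey-level histograms; below is a minimal symmetrised Kullback-Leibler sketch (the bin count and smoothing constant are illustrative choices, not values from the patent):

```python
import numpy as np

def histogram_kl(frame_a, frame_b, bins=64):
    """Symmetrised Kullback-Leibler distance between the grey-level
    histograms of two frames, used as the inter-frame luminance difference."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    # add a small constant so empty bins do not produce log(0)
    pa = (ha + 1e-6) / (ha.sum() + bins * 1e-6)
    pb = (hb + 1e-6) / (hb.sum() + bins * 1e-6)
    return float(np.sum(pa * np.log(pa / pb)) + np.sum(pb * np.log(pb / pa)))
```

A value near zero indicates nearly identical luminance distributions; a large value signals an illumination change that should trigger the coefficient correction described above.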
In step S109, the alpha matte value of each frame is calculated using the closed-form matting technique, and the mean-value coordinate interpolation computed by the mean-value coordinate fusion technique is locally smoothed according to the alpha matte values to obtain an optimized interpolation value.
According to a specific embodiment, the region enclosed by the outer boundary of the target object is divided into a region of inconsistent texture and color and a region of consistent texture and color. In the region of inconsistent texture and color, the boundary is computed using matting. In the region of consistent texture and color, an energy functional is defined and the curve is obtained by iteration, so that the color difference between the source and target videos at each pixel position on the curve is minimized; this strategy minimizes the change over the object surface. In addition, taking the change of illumination intensity into account strengthens the temporal consistency of the synthesized video and yields a fusion boundary that is spatially and temporally consistent under changing illumination.
The iterative formula for obtaining the closed curve in the region of consistent texture and color (given as an equation image in the original publication) involves: B, the set of pixels on the fusion boundary; h, the mean color difference between the source and target videos over all pixel positions; hp, the color difference between the source and target videos at pixel position p; the mean color difference at the two boundary points nearest to the motion-compensated point of pixel p in the next frame; the optical flow vectors of the source and target videos from pixel position p to the next frame; the color difference at the position reached by pixel p in the next frame according to the optical flow; hn, the mean color difference of the boundary computed for the next frame; and e, the control variable. When the input video has illumination change, the control variable is set to 1; otherwise it is 0. The resulting boundary is computed using dynamic programming and refined by iteration to minimize the color differences. The right-hand side of the energy functional comprises four terms: the first term keeps the surface change of the fusion region from becoming too large and removes jitter in the composite video; the second term preserves the spatial consistency of the fusion boundary; the third term controls the temporal consistency of the fusion boundary; and the fourth term lets the video maintain good spatio-temporal consistency under varying illumination intensity. In addition, considering the influence of illumination change, when the illumination intensity changes sharply, the gradient blending vector of each pixel in the fusion region of the current frame is drawn closer to the initial gradient blending vector of the first frame, so that temporal consistency is preserved and seamless synthesis is still achieved under illumination change.
The initial boundary for the iteration is the outer boundary, and the iteration proceeds until convergence or until the number of iterations reaches 25.
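A highly simplified view of that iteration, assuming each boundary slot has a fixed candidate set inside the annulus and a scalar colour difference per pixel (the data layout, selection rule, and stopping tolerance are assumptions for illustration, not the patent's energy functional):

```python
import numpy as np

def refine_boundary(color_diff, candidates, max_iter=25, tol=1e-4):
    """For each boundary slot, repeatedly pick the candidate pixel whose
    source/target colour difference is closest to the current boundary
    mean, until the mean converges or 25 iterations are reached.

    color_diff: 1-D array of colour differences indexed by pixel id.
    candidates: list of index arrays, one candidate set per boundary slot;
    the first entry of each set is taken as the initial (outer) boundary.
    """
    boundary = np.array([c[0] for c in candidates])
    h_mean = color_diff[boundary].mean()
    for _ in range(max_iter):
        boundary = np.array(
            [c[np.argmin(np.abs(color_diff[c] - h_mean))] for c in candidates]
        )
        new_mean = color_diff[boundary].mean()
        if abs(new_mean - h_mean) < tol:
            break
        h_mean = new_mean
    return boundary
```

The real method additionally enforces spatial smoothness and the temporal terms via dynamic programming; this sketch only shows the "minimize deviation from the mean colour difference" core.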
According to a specific embodiment, in the step of locally smoothing the mean-value coordinate interpolation computed by the mean-value coordinate fusion technique, a local smoothing term is added to the mean-value coordinate interpolation for optimization; the local smoothing term (given as an equation image in the original publication) involves: μ, a control parameter; Ω, the fusion region enclosed by the fusion boundary; a normalized weight function Wx = (1 − αx) / Σ_{y∈Ω} (1 − αy), where αx and αy are the alpha matte values computed at positions x and y within the fusion region Ω; a weight S(q, x) = exp(−‖q − x‖²) based on geometric distance; and Ft(x) and Fs(x), the color values of the target and source videos at pixel position x. With this local smoothing term, the background of the blurred boundary region within the fusion region contributes more, so that the brightness of the fusion region matches the background more closely, giving a more harmonious and natural synthesis result.
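The weighting scheme just described can be sketched as follows. The exact smoothing formula is only available as an image in the original, so this reconstructs just the pieces the text does give (the normalized alpha weight Wx and the distance weight S(q, x)); the function and argument names are invented for illustration:

```python
import numpy as np

def local_smoothing_value(q, coords, alpha, diff_ts, mu=1.0):
    """Alpha- and distance-weighted average of the target-minus-source
    colour differences over the fusion region, evaluated at point q.

    coords:  (N, 2) positions x of the pixels in the fusion region Omega.
    alpha:   (N,) alpha matte values at those pixels.
    diff_ts: (N,) values of Ft(x) - Fs(x).
    """
    w = 1.0 - alpha
    w = w / w.sum()                        # W_x = (1 - a_x) / sum(1 - a_y)
    d2 = np.sum((coords - q) ** 2, axis=1)
    s = np.exp(-d2)                        # S(q, x) = exp(-||q - x||^2)
    return mu * np.sum(w * s * diff_ts)
```

Because the weight (1 − α) is largest where the matte is background-like, background pixels in the blurred boundary region dominate the average, which matches the stated intent of the term.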
In step S111, the optimized interpolation value is added to the pixel value corresponding to the target object of the source video sequence, and the frame-by-frame processing yields the fused output video.
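Step S111 amounts to a masked per-frame addition; a minimal sketch (the array names and the 8-bit clipping convention are assumptions, not from the patent):

```python
import numpy as np

def compose_frame(source, interp_opt, mask, target):
    """Inside the fusion region the output is the source pixel value plus
    the optimized interpolation value; outside it, the target frame is kept."""
    out = target.astype(np.float64).copy()
    out[mask] = source[mask].astype(np.float64) + interp_opt[mask]
    # clamp back to the displayable 8-bit range
    return np.clip(out, 0, 255).astype(np.uint8)
```

Running this over every frame of the sequence produces the fused output video.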
With the above video synthesis method based on gradient-domain blending, a source video sequence and a target video sequence are first input; the target object of the source video sequence and the region of interest of the target video sequence are then marked, and the inner and outer boundaries of each frame are obtained by a single mapping using motion compensation; the annular region between the inner and outer boundaries of the current frame is then taken as the fusion boundary of the current frame; the gradient blending coefficient of the current frame is then computed from the gradient blending coefficient at each pixel position in the previous frame, using the temporal and spatial consistency of the gradient domain; finally, the mean-value coordinate interpolation is locally smoothed using the closed-form matting technique, the smoothed value is added to the pixel value corresponding to the target object, and the frames are processed one by one to obtain the fused video. Compared with the prior art, the method can extract the target object of the source video under significant changes in illumination intensity and seamlessly fuse it into the region of interest of the target video, obtaining the desired video synthesis effect.
Hereinbefore, specific embodiments of the present invention are described with reference to the drawings. However, those skilled in the art will appreciate that various modifications and substitutions can be made to the specific embodiments of the present invention without departing from the spirit and scope of the invention. Such modifications and substitutions are intended to be included within the scope of the present invention as defined by the appended claims.
Claims (8)
1. A video synthesis method based on gradient domain mixing, adapted to synthesize a target object in a source video sequence into a region of interest in a target video sequence when illumination intensity changes, the video synthesis method comprising the steps of:
inputting a source video sequence and a target video sequence;
marking a target object in the source video sequence and a region of interest of the target video sequence, and mapping an inner boundary and an outer boundary of each frame by using motion compensation;
determining a closed curve according to an annular region enclosed by the inner boundary and the outer boundary of the target object in the current frame, and taking the closed curve as a fusion boundary of the current frame;
calculating the gradient blending coefficient of the current frame from the gradient blending coefficient at each pixel position in the previous frame, using the temporal and spatial consistency of the gradient domain, wherein temporal consistency of the gradient domain is maintained by minimizing the second-order norm of the difference between the blending vector at each pixel position and the blending vectors at its motion-compensated points in the previous and next frames obtained by optical flow, and spatial consistency of the gradient domain is maintained by minimizing the second-order norm of the difference between the blending vector at each pixel position of the current frame and the blending vectors at its four-neighborhood pixels;
calculating the alpha matte value of each frame using a closed-form matting technique, and locally smoothing the mean-value coordinate interpolation computed by the mean-value coordinate fusion technique according to the alpha matte values to obtain an optimized interpolation value; and
adding the optimized interpolation value to the corresponding pixel values of the target object in the source video sequence, and processing frame by frame to obtain the fused output video.
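The final fusion step of claim 1 can be illustrated with a minimal sketch. The function name, the precomputed `interp` array, and the clipping range are illustrative assumptions; the patent itself obtains the interpolation from mean-value coordinates, which is not reproduced here.

```python
import numpy as np

def fuse_frame(source, target, mask, interp):
    """Sketch of claim 1's final step: inside the fusion region
    (mask == 1), the output is the source pixel plus a smooth
    interpolated correction; outside, the target frame is kept.
    `interp` is assumed to hold the already-computed mean-value-
    coordinate interpolation of the boundary color differences."""
    out = target.astype(np.float64).copy()
    region = mask.astype(bool)
    out[region] = source.astype(np.float64)[region] + interp[region]
    return np.clip(out, 0.0, 255.0)  # keep values in 8-bit range
```

Processing a video then amounts to calling this once per frame with per-frame masks and interpolations.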
2. The method of claim 1, wherein generating the fusion boundary further comprises:
determining a closed curve in the region enclosed by the known inner and outer fusion boundaries of each frame as the target fusion boundary, and suppressing jitter in the fusion result by minimizing the difference between the source video and target video pixel values at each point of the target fusion boundary.
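The boundary cost that claim 2 minimizes can be sketched as a sum of squared pixel differences along a candidate curve. The function name and the squared-difference form are assumptions; the claim only requires minimizing the source/target difference along the boundary.

```python
import numpy as np

def boundary_cost(source, target, boundary_pts):
    """Sketch of claim 2: score a candidate fusion boundary by the
    summed squared difference between source and target pixel values
    along it; the curve with the smallest cost suppresses temporal
    jitter in the fusion result."""
    rows, cols = boundary_pts[:, 0], boundary_pts[:, 1]
    d = source[rows, cols].astype(np.float64) - target[rows, cols].astype(np.float64)
    return float(np.sum(d ** 2))
```

A boundary search would evaluate this cost over candidate closed curves inside the annular region and keep the minimizer.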
3. The method according to claim 2, wherein the source video and the target video are blended within the fusion region enclosed by the target fusion boundary, and the initial value of the corresponding gradient blending coefficient is determined by the ratio of the gradients of the source video and the target video at each pixel position; the initial gradient blending vector blends the horizontal and vertical gradient components, and the blending coefficient within the gradient blending vector is computed as follows:
wherein the two norm terms are the second-order norms of the gradients of the source video and the target video at pixel position i, η1 and η2 are control parameters, the two reliability terms are the motion reliabilities of the source video and the target video at pixel position i, and Cs and Ct are the brightness changes of the source video and the target video between the initial frame and the next frame, respectively.
4. The method of claim 3, wherein, after the initial values of the gradient blending coefficients are obtained, the gradient blending vector of the current frame is updated from the gradient blending vectors of the previous frame and the first frame:
wherein ω1, ω2 and ω3 are control parameters, Um(i) denotes the set of four-neighborhood pixels of pixel i, Un(i) denotes the set of pixels at the positions of pixel i in the previous frames of the source video and the target video computed by bidirectional optical flow, and ai, aj and ak denote the gradient vectors at pixel points i, j and k, respectively.
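Since the update formula itself is not reproduced in the published text, the following is a hedged sketch: a quadratic energy tying each pixel's coefficient to its initial value, to the previous/first-frame values at the same position (temporal term), and to its four-neighborhood average (spatial term), solved by Jacobi iterations. The exact weighting is an assumption mirroring ω1, ω2, ω3.

```python
import numpy as np

def update_blend_coeff(alpha_prev, alpha_first, alpha_init,
                       w1=1.0, w2=1.0, w3=1.0, iters=50):
    """Hypothetical coefficient update: fixed point of a quadratic
    energy with a data term (alpha_init), a temporal term (average of
    previous-frame and first-frame coefficients at motion-compensated
    positions, here taken at the same position for simplicity), and a
    spatial 4-neighborhood smoothness term."""
    a = alpha_init.astype(np.float64).copy()
    for _ in range(iters):
        # 4-neighborhood mean with replicated borders
        pad = np.pad(a, 1, mode="edge")
        nbr = (pad[:-2, 1:-1] + pad[2:, 1:-1]
               + pad[1:-1, :-2] + pad[1:-1, 2:]) / 4.0
        a = (w1 * alpha_init
             + w2 * (alpha_prev + alpha_first) / 2.0
             + w3 * nbr) / (w1 + w2 + w3)
    return a
```

With spatially constant inputs the update is a fixed point, which is a quick sanity check on the iteration.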
5. The method according to claim 3, wherein the gradient blending coefficients are corrected according to a motion reliability, given at each pixel position by the convolution of the optical flow vector with a DoG (Difference of Gaussians) kernel, and according to the inter-frame brightness difference, determined by the Kullback-Leibler distance between the histograms of the two frames.
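The two quantities of claim 5 can be sketched directly. The symmetrized KL form and the DoG kernel sizes/sigmas are assumptions; the patent does not specify them.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def kl_distance(hist_p, hist_q, eps=1e-10):
    """Symmetrized Kullback-Leibler distance between two frame
    histograms, used to measure the inter-frame brightness change
    (symmetrization is an assumed convention)."""
    p = hist_p / (hist_p.sum() + eps)
    q = hist_q / (hist_q.sum() + eps)
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)))
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)))
    return kl_pq + kl_qp

def gaussian_kernel(size, sigma):
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def dog_motion_reliability(flow_mag, size=7, sigma1=1.0, sigma2=2.0):
    """Convolve the optical-flow magnitude with a Difference-of-
    Gaussians kernel ('valid' windows via sliding_window_view)."""
    dog = gaussian_kernel(size, sigma1) - gaussian_kernel(size, sigma2)
    win = sliding_window_view(flow_mag, (size, size))
    return np.einsum("ijkl,kl->ij", win, dog)
```

Because the DoG kernel sums to (numerically) zero, a spatially constant flow field yields a near-zero response: only flow variation is flagged.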
6. The method according to claim 1, wherein the region enclosed by the outer boundary of the target object is divided into regions of inconsistent texture and color and regions of consistent texture and color, and for the regions of consistent texture and color the closed curve is obtained by the following iterative formula:
wherein B is the set of pixels on the fusion boundary, h is the average color difference between the source video and the target video over all pixel positions, hp is the color difference between the source video and the target video at pixel p, the next term is the average color difference at the two boundary points nearest to the motion-compensated point of pixel p in the next frame, the following two terms are the optical flow vectors of the source video and the target video from pixel p to the next frame, the next term is the color difference at the position reached by pixel p in the next frame according to the optical flow, hn is the average boundary color difference computed for the next frame, and e is a control variable.
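Because the iterative formula is not reproduced in the published text, the following keeps only the spatial part as a hedged sketch: from the annular candidate band, repeatedly recompute the mean color difference h over the current boundary set B and retain the fraction of band pixels closest to h. The retention fraction and the omission of the next-frame optical-flow terms are assumptions.

```python
import numpy as np

def select_fusion_boundary(color_diff, band_mask, frac=0.5, iters=25):
    """Hypothetical iterative boundary refinement: keep band pixels
    whose source/target color difference is closest to the current
    mean h, and iterate until stable (claim 7 suggests >= 25 passes)."""
    band = np.flatnonzero(band_mask.ravel())
    diffs = color_diff.ravel()[band]
    keep = np.ones_like(diffs, dtype=bool)
    for _ in range(iters):
        h = diffs[keep].mean()                       # current mean difference
        thresh = np.quantile(np.abs(diffs - h), frac)
        keep = np.abs(diffs - h) <= thresh           # retain closest pixels
    sel = np.zeros(color_diff.size, dtype=bool)
    sel[band[keep]] = True
    return sel.reshape(color_diff.shape)
```

Outlier pixels with a large color mismatch are progressively excluded, which is the qualitative behavior the claim's iteration aims at.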
7. The method of claim 6, wherein the closed curve is obtained using the above iterative formula with at least 25 iterations.
8. The gradient-domain-blending-based video synthesis method according to claim 1, wherein, in the step of locally smoothing the mean-value-coordinate interpolation computed by the mean-value-coordinate fusion technique, a local smoothing term is added to the mean-value-coordinate interpolation for optimization, the local smoothing term satisfying:
where μ is a control parameter, Ω is the fusion region enclosed by the fusion boundary, Wx is a normalized weight function defined as Wx = (1 − αx)/Σy∈Ω(1 − αy), αx and αy are the computed alpha matte values at x and y within the fusion region Ω, S(q, x) is a weight computed from the geometric distance, S(q, x) = exp(−‖q − x‖²), and Ft(x) and Fs(x) are the color values of the target video and the source video at pixel position x, respectively.
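The two weights defined in claim 8 can be sketched directly from their formulas; the function name and the flat (point-list) layout of the region are assumptions.

```python
import numpy as np

def smoothing_weights(alpha, points):
    """Weights of claim 8's local smoothing term:
    Wx = (1 - alpha_x) / sum_y (1 - alpha_y) over the fusion region,
    and S(q, x) = exp(-||q - x||^2). `alpha` holds the alpha matte
    values of the region pixels; `points` their (row, col) coordinates."""
    w = (1.0 - alpha) / np.sum(1.0 - alpha)            # normalized Wx
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    s = np.exp(-d2)                                    # pairwise S(q, x)
    return w, s
```

Note that Wx gives more smoothing weight to pixels with low alpha (background-dominated), while S(q, x) decays rapidly with distance, localizing the smoothing.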
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810094623.8A CN108389217A (en) | 2018-01-31 | 2018-01-31 | A kind of image synthesizing method based on gradient field mixing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108389217A true CN108389217A (en) | 2018-08-10 |
Family
ID=63074227
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810094623.8A Pending CN108389217A (en) | 2018-01-31 | 2018-01-31 | A kind of image synthesizing method based on gradient field mixing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108389217A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109816611A (en) * | 2019-01-31 | 2019-05-28 | 北京市商汤科技开发有限公司 | Video repairing method and device, electronic equipment and storage medium |
CN110659628A (en) * | 2019-10-09 | 2020-01-07 | 山东浪潮人工智能研究院有限公司 | Coal mine monitoring video decompression method and system based on deep learning |
CN111491204A (en) * | 2020-04-17 | 2020-08-04 | Oppo广东移动通信有限公司 | Video repair method, video repair device, electronic equipment and computer-readable storage medium |
CN114331899A (en) * | 2021-12-31 | 2022-04-12 | 上海宇思微电子有限公司 | Image noise reduction method and device |
US11900614B2 (en) * | 2019-04-30 | 2024-02-13 | Tencent Technology (Shenzhen) Company Limited | Video data processing method and related apparatus |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120075535A1 (en) * | 2010-09-29 | 2012-03-29 | Sharp Laboratories Of America, Inc. | Efficient motion vector field estimation |
US20140133779A1 (en) * | 2012-03-14 | 2014-05-15 | Fujitsu Limited | Image processing method, recording medium and apparatus |
CN106874949A (en) * | 2017-02-10 | 2017-06-20 | 华中科技大学 | A kind of moving platform moving target detecting method and system based on infrared image |
CN107590818A (en) * | 2017-09-06 | 2018-01-16 | 华中科技大学 | A kind of interactive video dividing method |
2018-01-31: CN application CN201810094623.8A filed; published as CN108389217A; status: Pending
Non-Patent Citations (2)
Title |
---|
Wang Haoyu: "A video image saliency detection method fusing motion features", Technology and Economic Guide, no. 11 *
Jia Zhentang, Li Shengping, He Guiming, Tian Hui: "A new video object segmentation algorithm based on motion edge detection", Journal of Computer Research and Development, no. 05 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108389217A (en) | A kind of image synthesizing method based on gradient field mixing | |
US10424087B2 (en) | Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures | |
US10573065B2 (en) | Systems and methods for automating the personalization of blendshape rigs based on performance capture data | |
Zhang et al. | Parallax-tolerant image stitching | |
Valgaerts et al. | Lightweight binocular facial performance capture under uncontrolled lighting. | |
O'Donovan et al. | Anipaint: Interactive painterly animation from video | |
Brooks et al. | Self-similarity based texture editing | |
Chen et al. | Motion-aware gradient domain video composition | |
Ma et al. | Real-time hierarchical facial performance capture | |
Zhang et al. | Enhancing underexposed photos using perceptually bidirectional similarity | |
Li et al. | Deep sketch-guided cartoon video inbetweening | |
JP4721285B2 (en) | Method and system for modifying a digital image differentially quasi-regularly from pixel to pixel | |
Lv et al. | Low-light image enhancement via deep Retinex decomposition and bilateral learning | |
Grogan et al. | User interaction for image recolouring using ℓ2 | |
Lao et al. | Flow-guided video inpainting with scene templates | |
CN111311517B (en) | Color correction optimization method based on cutout | |
Pierre et al. | Interactive video colorization within a variational framework | |
Qian et al. | Multi-scale error feedback network for low-light image enhancement | |
Ma et al. | Real-time face video swapping from a single portrait | |
KR100602739B1 (en) | Semi-automatic field based image metamorphosis using recursive control-line matching | |
GB2548088A (en) | Augmenting object features in images | |
Wang et al. | Illumination-guided video composition via gradient consistency optimization | |
Lagodzinski et al. | Digital image colorization based on probabilistic distance transformation | |
Eisemann et al. | Edge-constrained image compositing. | |
Yücer et al. | Transfusive Weights for Content-Aware Image Manipulation. |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180810 |