Disclosure of Invention
In order to solve the above technical problem, the present invention provides a video motion compensation method, apparatus and computer device, and the specific scheme is as follows:
in a first aspect, an embodiment of the present application provides a video motion compensation method, where the method includes:
extracting foreground objects and background areas of two adjacent video frames in a video frame sequence;
obtaining foreground compensation data according to the pixel point information of the foreground target of every two adjacent video frames, and obtaining background compensation data according to the pixel point coordinate transformation parameters of the background area of every two adjacent video frames;
and fusing the foreground compensation data and the background compensation data to obtain a target compensation image, and inserting the target compensation image between two adjacent video frames.
According to a specific embodiment disclosed in the application, the step of extracting the foreground object of the video frame comprises the following steps:
and extracting the foreground target from the video frame through a pre-trained foreground extraction model, wherein the foreground extraction model is a multi-scale full-convolution neural network model.
According to a specific embodiment disclosed in the present application, the step of obtaining foreground compensation data according to the pixel point information of the foreground object of every two adjacent video frames, and obtaining background compensation data according to the pixel point coordinate transformation parameter of the background area of every two adjacent video frames includes:
acquiring pixel point information and centroid coordinates of the foreground target, and compensating the foreground target by using the pixel point information and the centroid coordinates to obtain foreground compensation data;
and solving transformation parameters for affine transformation of the background area by using the characteristic point distance of the background area in every two adjacent video frames, and compensating the background area based on the transformation parameters to obtain background compensation data.
According to a specific embodiment disclosed in the present application, the pixel point information includes coordinates and pixel values of each pixel point, and the step of obtaining the pixel point information and the centroid coordinates of the foreground object includes:
acquiring coordinates and pixel values of each pixel point of the foreground target;
calculating to obtain a centroid coordinate of the foreground target according to the coordinate and the pixel value of each pixel point; wherein the formula for calculating the centroid coordinate is:

X = Σ_i (x_i · p_i) / Σ_i p_i,  Y = Σ_i (y_i · p_i) / Σ_i p_i

wherein X is the coordinate of the centroid in the x-axis direction, x_i is the coordinate of the ith pixel point of the foreground object in the x direction, p_i is the pixel value of the ith pixel point, Y is the coordinate of the centroid in the y-axis direction, and y_i is the coordinate of the ith pixel point in the y direction.
According to a specific embodiment disclosed in the present application, the step of compensating the foreground object by using the pixel point information and the centroid coordinate to obtain foreground compensation data includes:
performing Euclidean transformation based on coordinates and centroid coordinates of all edge pixel points corresponding to foreground objects in the two adjacent video frames to obtain a first Euclidean distance value;
storing each first Euclidean distance value to a corresponding coordinate array, and calculating according to all the coordinate arrays to obtain an original motion track;
smoothing the original motion trail by adopting a filter to obtain a smooth motion trail model;
and inputting the pixel point information corresponding to the foreground target into the smooth motion track model to obtain the foreground compensation data of two adjacent video frames.
According to a specific embodiment disclosed in the present application, the step of solving the transformation parameters of the affine transformation performed on the background area by using the feature point distance of the background area in each two adjacent video frames includes:
selecting N non-overlapping rectangular areas from the background area as matching areas, wherein N is a positive integer;
detecting all feature points in each matching area;
matching the feature points of the two adjacent video frames to obtain a corresponding associated feature point combination between the two adjacent video frames, wherein the associated feature point combination comprises two feature points with the same feature value between the two adjacent video frames;
selecting an optimal feature point combination from all the associated feature point combinations according to a second Euclidean distance value and a Hamming distance value corresponding to each associated feature point combination;
and solving the transformation parameters for affine transformation of the optimal characteristic point combination between two adjacent video frames.
According to a specific embodiment disclosed in the present application, the step of selecting an optimal feature point combination from all the associated feature point combinations according to the second euclidean distance value and the hamming distance value corresponding to each associated feature point combination includes:
calculating a second Euclidean distance value and a Hamming distance value between two feature points in each associated feature point combination;
determining a minimum Euclidean distance value R1 from all the second Euclidean distance values, and determining a minimum Hamming distance value R2 from all of said Hamming distance values;
and performing cluster analysis on the associated feature point combinations whose second Euclidean distance value is not more than 2R1 and whose Hamming distance value is not more than 2R2, to obtain the optimal feature point combination.
According to a specific embodiment disclosed in the present application, the step of compensating the background region based on the transformation parameter to obtain background compensation data includes:
constructing a background region compensation model according to the transformation parameters, wherein the transformation parameters comprise at least one of translation amount, scaling amount, turnover amount, rotation amount and shearing amount;
and inputting the pixel point information contained in the background area into the background area compensation model to obtain the background compensation data.
According to a specific embodiment disclosed in the present application, the step of fusing the foreground compensation data and the background compensation data to obtain a target compensation image includes:
fusing the foreground compensation data and the background compensation data in different scales by using an image pyramid model to obtain compensation images in different scales;
and selecting a target compensation image from the compensation images with different scales according to a selection instruction triggered by a user, wherein the target compensation image is any one of the compensation images with all different scales.
In a second aspect, an embodiment of the present application provides an apparatus for video motion compensation, where the apparatus includes:
the extraction module is used for extracting foreground objects and background areas of two adjacent video frames in the video frame sequence;
the compensation module is used for obtaining foreground compensation data according to the pixel point information of the foreground target of every two adjacent video frames and obtaining background compensation data according to the pixel point coordinate transformation parameters of the background area of every two adjacent video frames;
and the fusion module is used for fusing the foreground compensation data and the background compensation data to obtain a target compensation image and inserting the target compensation image between two adjacent video frames.
In a third aspect, the present application provides a computer device, which includes a processor and a memory, where the memory stores a computer program, and the computer program implements the method of any one of the embodiments of the first aspect when executed on the processor.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program that, when executed on a processor, implements the method of any one of the embodiments of the first aspect.
Compared with the prior art, the method has the following beneficial effects:
the invention provides a video motion compensation method and device and computer equipment. The video motion compensation method comprises the following steps: the method comprises the steps of firstly extracting foreground targets and background areas of two adjacent video frames in a video frame sequence, then obtaining foreground compensation data according to pixel point information of the foreground targets of every two adjacent video frames, and obtaining the background compensation data according to pixel point coordinate transformation parameters of the background areas of every two adjacent video frames. And fusing the foreground compensation data and the background compensation data to obtain a target compensation image, and inserting the target compensation image between two adjacent video frames. According to the invention, the video frame is divided into the foreground target and the background area, and the foreground target and the background area are respectively subjected to motion compensation to obtain high-quality image motion compensation data, so that the data processing process of motion compensation is optimized, the conditions of video smear and jitter are reduced, and the use comfort of smart television users is greatly improved.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present invention, are only intended to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding the existence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, a schematic flow chart of a video motion compensation method provided in an embodiment of the present application is shown in fig. 1, where the method mainly includes:
step S101, extracting foreground objects and background areas of two adjacent video frames in a video frame sequence.
In the film era, movies were shot and displayed at about 24 frames per second (24 Hz). If such content were still displayed at 24 Hz, i.e. 24 pictures per second, on a television or other terminal device, the picture would flicker. Therefore, the display frequency of televisions has been raised to 50 Hz/60 Hz, and for a movie shot at 24 Hz, frames are interpolated so that the picture is smoother when the movie is displayed on terminal equipment with a higher display frequency.
Motion compensation is a motion picture quality compensation technique commonly used in modern LCD televisions: a motion-compensated frame is inserted between every two frames of images, so that an intermittent high-speed motion picture becomes continuous and smooth.
The video motion compensation method extracts foreground objects and background areas of two adjacent video frames, and frames are intelligently and continuously inserted in the pictures through respectively estimating motion tracks of the foreground objects and the background areas, so that moving images are smooth, and the definition of the pictures is better. It should be noted that, the scheme provided by this embodiment may perform frame interpolation for compensation between every two adjacent video frames in the video, or may perform frame interpolation for compensation between partial adjacent video frames, especially between adjacent video frames related to a moving scene.
Extracting a foreground target of a video frame, wherein the step comprises the following steps:
and extracting the foreground target from the video frame through a pre-trained foreground extraction model, wherein the foreground extraction model is a multi-scale full-convolution neural network model.
In specific implementation, after extracting the foreground object of any video frame, the frame data of the rest of the video frame may be used as the background area, and the background area of the current video frame does not need to be additionally identified, and may also be extracted by other background extraction models, which is not limited herein.
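The split described above can be illustrated with a minimal sketch. Note that this is only an illustrative stand-in for the pre-trained multi-scale fully convolutional foreground extraction model: frame differencing with an assumed threshold is used here purely to show how, once foreground pixels are identified, the remaining pixels of the frame serve as the background area.

```python
import numpy as np

def foreground_mask(frame, prev_frame, threshold=25):
    """Crude foreground/background split by frame differencing.

    Illustrative stand-in for the pre-trained multi-scale fully
    convolutional foreground extraction model described above;
    the threshold value is an assumption, not from the patent.
    """
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = diff > threshold  # True where the pixel likely belongs to the foreground
    return mask

# Pixels where the mask is False form the background area of the
# current video frame, so no separate background model is needed.
```

A trained segmentation network would replace `foreground_mask` in practice; the surrounding logic (foreground pixels in, background as the complement) stays the same.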
Step S102, obtaining foreground compensation data according to the pixel point information of the foreground target of every two adjacent video frames, and obtaining background compensation data according to the pixel point coordinate transformation parameters of the background area of every two adjacent video frames.
For videos with background jitter and high-frequency moving objects, the conventional motion compensation technology cannot effectively overcome the background jitter to obtain high-quality compensation frame data, and the quality of video restoration is greatly influenced. For example, in a video of a competition of athletes at a competition field shot by a camera under the condition of shaking, the traditional global motion compensation parameters cannot effectively overcome shaking to obtain high-quality compensation data, so that the quality of the video is influenced, and the film watching experience of a user is greatly influenced.
The foreground compensation and the background compensation are to extract foreground objects and background areas of two adjacent video frames respectively, and to obtain the motion tracks of the foreground objects and the background areas of the two adjacent video frames through pixel point information in the video frames. And then calculating according to the motion trail to obtain pixel point information corresponding to the foreground target and the background area which need to be inserted between the adjacent video frames, namely respectively carrying out motion compensation of different degrees, thereby obtaining high-quality image motion compensation data.
In specific implementation, the step of obtaining foreground compensation data according to the pixel point information of the foreground target of every two adjacent video frames and obtaining background compensation data according to the pixel point coordinate transformation parameter of the background area of every two adjacent video frames includes:
acquiring pixel point information and a centroid coordinate of the foreground target, and compensating the foreground target by using the pixel point information and the centroid coordinate to obtain foreground compensation data;
and solving transformation parameters for affine transformation of the background area by using the characteristic point distance of the background area in every two adjacent video frames, and compensating the background area based on the transformation parameters to obtain background compensation data.
It should be noted that, in the scheme provided in this embodiment, there is no sequential limitation on the actions of obtaining the foreground compensation data and the background compensation data, and the actions may be processed simultaneously or sequentially.
This embodiment mainly further defines how the foreground compensation data and the background compensation data are obtained. For the foreground, a motion trajectory equation is constructed from the edge pixel point information and the centroid coordinates of the foreground object, and the foreground compensation data are generated by prediction through this motion trajectory equation. For the background, the motion vector of the background area is further calculated through the affine change between the background areas of adjacent frames, so as to perform motion compensation on the background area in the current two video frames.
The transformation parameters refer to parameters for performing affine transformation on the background areas of every two adjacent video frames. An affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates that maintains the "straightness" and "parallelism" of a two-dimensional figure. Straightness means that straight lines remain straight after the transformation and cannot be bent into arcs; parallelism means that the relative positional relationship between two-dimensional figures is kept unchanged, i.e. parallel lines remain parallel and the intersection angle of intersecting straight lines is unchanged. The affine transformation is the geometric transformation model that best fits the changes between every two adjacent video frames.
In specific implementation, the pixel information includes coordinates and pixel values of each pixel, and the step of obtaining the pixel information and the centroid coordinates of the foreground object includes:
acquiring coordinates and pixel values of each pixel point of the foreground target;
calculating to obtain a centroid coordinate of the foreground target according to the coordinate and the pixel value of each pixel point; wherein the formula for calculating the centroid coordinate is:

X = Σ_i (x_i · p_i) / Σ_i p_i,  Y = Σ_i (y_i · p_i) / Σ_i p_i

wherein X is the coordinate of the centroid in the x-axis direction, x_i is the coordinate of the ith pixel point of the foreground object in the x direction, p_i is the pixel value of the ith pixel point, Y is the coordinate of the centroid in the y-axis direction, and y_i is the coordinate of the ith pixel point in the y direction.
Specifically, the centroid of an image is also referred to as its center of gravity. The traditional concept of centroid can be extended to images by treating the pixel value of each point as the mass at that point. Since the image is 2-dimensional, the solution is to find the centroid independently in the X direction and the Y direction: for the centroid in the X direction, the pixel sums of the image on the left and right sides of the centroid are equal, and for the centroid in the Y direction, the pixel sums of the image on the upper and lower sides of the centroid are equal.
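The centroid formula above can be sketched directly (a minimal illustration; the `(x, y, p)` row layout of the input array is an assumption for this sketch):

```python
import numpy as np

def centroid(pixels):
    """Intensity-weighted centroid of a foreground object.

    `pixels` is an (N, 3) array whose rows are (x_i, y_i, p_i):
    the coordinates and pixel value of each foreground pixel point,
    matching the centroid formula above.
    """
    x, y, p = pixels[:, 0], pixels[:, 1], pixels[:, 2]
    total = p.sum()
    # X = sum(x_i * p_i) / sum(p_i), Y = sum(y_i * p_i) / sum(p_i)
    return (x * p).sum() / total, (y * p).sum() / total
```

For a uniform-intensity object the result reduces to the geometric center, as expected from the equal-pixel-sum property described above.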
In specific implementation, the step of compensating the foreground target by using the pixel point information and the centroid coordinate to obtain foreground compensation data includes:
performing Euclidean transformation based on coordinates and centroid coordinates of all edge pixel points corresponding to foreground objects in the two adjacent video frames to obtain a first Euclidean distance value;
storing each first Euclidean distance value to a corresponding coordinate array, and calculating according to all the coordinate arrays to obtain an original motion track;
smoothing the original motion trail by adopting a filter to obtain a smooth motion trail model;
and inputting the pixel point information corresponding to the foreground target into the smooth motion track model to obtain the foreground compensation data of two adjacent video frames.
Specifically, the Euclidean distance value refers to the real distance between two points in an n-dimensional space, or the natural length of a vector, i.e. the distance of the point from the origin. In two- and three-dimensional space, the Euclidean distance is the actual distance between two points. Extended to an n-dimensional space, the solving formula is:

d = √( Σ_{i=1..n} (x_i − y_i)^2 )

wherein d is the Euclidean distance value, and x_i and y_i respectively denote the ith components of the two points x and y.
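The n-dimensional Euclidean distance described above is straightforward to compute:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points in n-dimensional space.

    `a` and `b` are equal-length sequences of coordinates; the sum of
    squared component differences is accumulated and square-rooted.
    """
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
```

In two dimensions this reduces to the familiar Pythagorean distance, e.g. the points (0, 0) and (3, 4) are exactly 5 apart.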
In specific implementation, euclidean transformation is carried out on the basis of coordinates and centroid coordinates of all edge pixel points corresponding to foreground targets in two adjacent video frames, after first Euclidean distance values are obtained, each first Euclidean distance value is decomposed into a horizontal coordinate value, a vertical coordinate value and an angle value, and the horizontal coordinate value, the vertical coordinate value and the angle value are stored in corresponding coordinate arrays. And then differentiating and accumulating according to the array data to obtain an original motion trajectory curve, smoothing the original motion trajectory curve by using a filter, and filtering abnormal wave bands in the original motion trajectory curve to obtain a smooth motion trajectory model. The filtering by the filter can effectively suppress and prevent interference.
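The smoothing step can be sketched as follows. The patent only says "a filter" is used; a moving-average filter is shown here as one common, simple choice (the window size is an assumption), applied to a 1-D track such as the horizontal-coordinate array described above.

```python
import numpy as np

def smooth_trajectory(track, window=5):
    """Smooth a 1-D motion track with a moving-average filter.

    The patent does not specify the filter; a moving average is used
    here purely as an illustrative choice. Boundary samples are
    handled by edge padding so the output length equals the input.
    """
    pad = window // 2
    padded = np.pad(np.asarray(track, dtype=float), pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")
```

An isolated spike in the raw track (an "abnormal wave band") is spread out and attenuated by the averaging, which is the suppression effect described above.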
The step of solving the transformation parameters for performing affine transformation on the background area by using the feature point distance of the background area in every two adjacent video frames comprises the following steps:
selecting N non-overlapping rectangular regions in the background region as matching regions, wherein N is a positive integer;
detecting all feature points in each matching area;
matching the feature points of the two adjacent video frames to obtain a corresponding associated feature point combination between the two adjacent video frames, wherein the associated feature point combination comprises two feature points with the same feature value between the two adjacent video frames;
selecting an optimal feature point combination from all the associated feature point combinations according to a second Euclidean distance value and a Hamming distance value corresponding to each associated feature point combination;
and solving the transformation parameters for affine transformation of the optimal feature point combination between two adjacent video frames.
In specific implementation, harris corner detection and SURF key point detection can be adopted for each matching region to find all feature points in each matching region.
The hamming distance is the number of different characters at corresponding positions of two character strings with equal length, and d (x, y) can be used to represent the hamming distance between the character strings x and y. Viewed from another aspect, the hamming distance measures the minimum number of replacements required to change a string x to y by replacing a character. In other words, the hamming distance value is the number of characters that need to be replaced to convert one string to another. For example, the hamming distance between 1011101 and 1001001 is 2, the hamming distance between 2143896 and 2233796 is 3, and the hamming distance between "toned" and "roses" is 3.
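The Hamming distance definition above is a one-liner in code:

```python
def hamming(x, y):
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(a != b for a, b in zip(x, y))
```

This reproduces the worked examples in the text: d("1011101", "1001001") = 2, d("2143896", "2233796") = 3, and d("toned", "roses") = 3.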
The step of selecting an optimal feature point combination from all the associated feature point combinations according to the second euclidean distance value and the hamming distance value corresponding to each associated feature point combination includes:
calculating a second Euclidean distance value and a Hamming distance value between two feature points in each associated feature point combination;
determining a minimum Euclidean distance value R1 from all the second Euclidean distance values, and determining a minimum Hamming distance value R2 from all of the Hamming distance values;
performing cluster analysis on the associated feature point combinations whose second Euclidean distance value is not more than 2R1 and whose Hamming distance value is not more than 2R2, to obtain the optimal feature point combination.
In specific implementation, for the associated feature point combinations between two adjacent video frames, the corresponding minimum distance values are respectively selected as thresholds. When the second Euclidean distance and the Hamming distance of an associated feature point combination are both greater than twice the corresponding threshold, that combination is deleted; otherwise, it is retained. In addition, twice the threshold is only a preferred value for the screening condition; in practical applications, the screening condition may reasonably be set to any multiple of the preset threshold. Then, cluster analysis is performed on the associated feature point combinations whose second Euclidean distance value is not more than 2R1 and whose Hamming distance value is not more than 2R2 to obtain the optimal feature point combination, and the transformation parameters for affine transformation of the optimal feature point combination between the two adjacent video frames are solved.
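The screening step can be sketched as follows (a minimal illustration; representing each associated feature point combination as a `(euclidean, hamming)` distance pair is an assumption of this sketch, and the follow-on cluster analysis is not shown):

```python
def filter_combinations(combos):
    """Screen associated feature point combinations by the 2x-minimum rule.

    `combos` is a list of (euclidean, hamming) distance pairs, one per
    associated feature point combination. Combinations whose distances
    exceed twice the respective minimum (R1 for Euclidean, R2 for
    Hamming) are discarded; the survivors would then go on to cluster
    analysis to yield the optimal feature point combination.
    """
    r1 = min(e for e, _ in combos)  # minimum Euclidean distance value R1
    r2 = min(h for _, h in combos)  # minimum Hamming distance value R2
    return [(e, h) for e, h in combos if e <= 2 * r1 and h <= 2 * r2]
```

As the text notes, the factor of two is only a preferred value; any multiple could be substituted for a stricter or looser screen.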
The step of compensating the background region based on the transformation parameters to obtain background compensation data includes:
constructing a background region compensation model according to the transformation parameters, wherein the transformation parameters comprise at least one of translation amount, scaling amount, turnover amount, rotation amount and shearing amount;
and inputting the pixel point information contained in the background area into the background area compensation model to obtain the background compensation data.
Specifically, the whole background region compensation model is constructed based on the transformation parameters of the affine transformation performed on the optimal feature point combination, which can greatly reduce the amount of calculation while achieving the optimal transformation effect. The transformation parameters for affine transformation of the optimal feature point combination can be represented by the following transformation matrix:

[x']   [a_1  a_2  t_x] [x]
[y'] = [a_3  a_4  t_y] [y]
[1 ]   [ 0    0    1 ] [1]

wherein (t_x, t_y) represents the translation amount; the parameters a_1, a_2, a_3 and a_4 jointly express the scaling, flipping, rotation and shearing amounts; x and y respectively represent the abscissa and ordinate of a feature point before the affine transformation; and x' and y' respectively represent the abscissa and ordinate of the feature point after the affine transformation.
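Applying the affine transformation matrix above to a single feature point can be sketched as (parameter names follow the matrix; the tuple packing order is an assumption of this sketch):

```python
import numpy as np

def affine_warp_point(params, point):
    """Apply the 2-D affine transformation matrix above to one point.

    `params` is (a1, a2, a3, a4, tx, ty); `point` is (x, y).
    Returns the transformed coordinates (x', y') in homogeneous form.
    """
    a1, a2, a3, a4, tx, ty = params
    m = np.array([[a1, a2, tx],
                  [a3, a4, ty],
                  [0.0, 0.0, 1.0]])
    x, y = point
    xp, yp, _ = m @ np.array([x, y, 1.0])
    return xp, yp
```

With a1 = a4 = 1 and a2 = a3 = 0 the transformation reduces to a pure translation by (tx, ty), which is a convenient sanity check when fitting the parameters.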
And step S103, fusing the foreground compensation data and the background compensation data to obtain a target compensation image, and inserting the target compensation image between two adjacent video frames.
In the conventional global motion compensation technology, the whole video frame is compensated uniformly, so local regions of the compensated video frame may remain unclear. In step S103, by distinguishing the foreground object and the background area of the video frame and performing motion compensation on them to different degrees, two kinds of high-quality compensation data can be obtained, video motion compensation and image restoration can be effectively completed, the processing process of the motion compensation data is optimized, the occurrence of video smear and jitter is reduced, and the use comfort of smart television users is greatly improved.
The step of fusing the foreground compensation data and the background compensation data to obtain a target compensation image comprises:
fusing the foreground compensation data and the background compensation data in different scales by using an image pyramid model to obtain compensation images in different scales;
and selecting a target compensation image from the compensation images with different scales according to a selection instruction triggered by a user, wherein the target compensation image is any one of the compensation images with all different scales.
Referring to fig. 2, fig. 2 is a schematic diagram of a model structure of an image pyramid according to an embodiment of the present application. The image pyramid is a method for interpreting the structure of an image at multiple resolutions: N images with different resolutions are generated by performing multi-scale pixel sampling on an original image. The image with the highest resolution is placed at the bottom, and a series of images with gradually decreasing pixel counts is stacked above it in a pyramid shape, up to the top of the pyramid, which contains only one pixel. In fig. 2, the process of decreasing image resolution is represented by Level 0 → Level 1 → Level 2 → Level 3 → Level 4.
The image pyramid model is used to fuse the foreground compensation data and the background compensation data at different resolutions to obtain compensation images at different scales, and the compensation image whose fusion effect best matches the visual effect of the original video is selected from them, so that more realistic frame data can be simulated and inserted into the video sequence.
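The multi-resolution structure of FIG. 2 can be sketched as follows. This is a minimal sketch using plain 2×2 mean downsampling; a production system would typically build Gaussian/Laplacian pyramids for the actual fusion of foreground and background compensation data, and the level count is an assumption.

```python
import numpy as np

def build_pyramid(image, levels=4):
    """Build a simple image pyramid by repeated 2x2 mean downsampling.

    Each level halves the resolution of the previous one, mirroring
    the Level 0 -> Level 4 progression in FIG. 2. Mean pooling is an
    illustrative choice, not the patent's prescribed sampling method.
    """
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        img = pyramid[-1]
        h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # even crop
        img = img[:h, :w]
        down = (img[0::2, 0::2] + img[1::2, 0::2] +
                img[0::2, 1::2] + img[1::2, 1::2]) / 4.0
        pyramid.append(down)
    return pyramid
```

Starting from a 16×16 image, four levels yield resolutions 16, 8, 4, 2 and 1, i.e. the top of the pyramid is a single pixel, as described above.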
According to the video motion compensation method provided by the invention, high-quality compensation frame data can be obtained by distinguishing the foreground and the background of the video and performing motion compensation in different degrees by adopting different methods, the video motion compensation and image restoration can be efficiently completed, the processing process of motion compensation data is optimized, and the situations of video smear and jitter are reduced.
In correspondence with the above method embodiment, referring to fig. 3, the present invention further provides a video motion compensation apparatus 300, where the video motion compensation apparatus 300 includes:
an extracting module 301, configured to extract a foreground object and a background area of two adjacent video frames in a video frame sequence;
a compensation module 302, configured to obtain foreground compensation data according to pixel point information of the foreground object of every two adjacent video frames, and obtain background compensation data according to pixel point coordinate transformation parameters of the background area of every two adjacent video frames;
and a fusion module 303, configured to fuse the foreground compensation data and the background compensation data to obtain a target compensation image, and insert the target compensation image between two adjacent video frames.
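The cooperation of modules 301 to 303 can be summarized in a minimal sketch. The class and method names here are hypothetical, and the simple mask split and averaging are placeholders standing in for the extraction model and the centroid/affine compensation described above, not the disclosed algorithms.

```python
import numpy as np

class VideoMotionCompensator:
    """Sketch of apparatus 300: extract (301), compensate (302), fuse (303)."""

    def extract(self, frame, fg_mask):
        # module 301 placeholder: split a frame into foreground and background
        return frame * fg_mask, frame * (1.0 - fg_mask)

    def compensate(self, region_a, region_b):
        # module 302 placeholder: mid-point interpolation between the
        # corresponding regions of two adjacent frames
        return (region_a + region_b) / 2.0

    def fuse(self, fg_comp, bg_comp):
        # module 303 placeholder: merge compensated regions into one image
        return fg_comp + bg_comp

    def target_compensation_image(self, frame_a, frame_b, fg_mask):
        fg_a, bg_a = self.extract(frame_a, fg_mask)
        fg_b, bg_b = self.extract(frame_b, fg_mask)
        return self.fuse(self.compensate(fg_a, fg_b),
                         self.compensate(bg_a, bg_b))

comp = VideoMotionCompensator()
f0, f1 = np.zeros((4, 4)), np.ones((4, 4))
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0
mid = comp.target_compensation_image(f0, f1, mask)
```

The returned `mid` is the target compensation image that would be inserted between the two adjacent frames.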
Furthermore, a computer device is provided, the computer device comprising a processor and a memory, where the memory stores a computer program which, when executed on the processor, implements the above video motion compensation method.
Furthermore, a computer-readable storage medium is provided, in which a computer program is stored which, when executed on a processor, implements the above-described video motion compensation method.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solution of the present invention or a part of the technical solution that contributes to the prior art in essence can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.