CN110602479A - Video conversion method and system - Google Patents

Video conversion method and system

Info

Publication number
CN110602479A
Authority
CN
China
Prior art keywords
background
foreground
dimensional
depth information
video
Prior art date
Legal status
Pending
Application number
CN201910858867.3A
Other languages
Chinese (zh)
Inventor
Lin Haipeng (林海鹏)
Current Assignee
HAILIN COMPUTER TECHNOLOGY (SHENZHEN) CO LTD
Original Assignee
HAILIN COMPUTER TECHNOLOGY (SHENZHEN) CO LTD
Priority date
Filing date
Publication date
Application filed by HAILIN COMPUTER TECHNOLOGY (SHENZHEN) CO LTD filed Critical HAILIN COMPUTER TECHNOLOGY (SHENZHEN) CO LTD
Priority to CN201910858867.3A priority Critical patent/CN110602479A/en
Publication of CN110602479A publication Critical patent/CN110602479A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
        • H04N13/10: Processing, recording or transmission of stereoscopic or multi-view image signals
            • H04N13/106: Processing image signals
                • H04N13/156: Mixing image signals
        • H04N13/20: Image signal generators
            • H04N13/275: Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
            • H04N13/293: Generating mixed stereoscopic images; Generating mixed monoscopic and stereoscopic images, e.g. a stereoscopic image overlay window on a monoscopic image background

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a video conversion method and system. The method comprises: dividing an input two-dimensional video into a plurality of frame images; performing scene segmentation on the frame images to divide them into a foreground and a background; judging whether the background moves: if the background is static, classifying the background objects with a deep learning algorithm and calculating the depth information of the background from the classification result; if the background moves, establishing a global motion model from the motion information of the background and calculating the depth information of the background from the model; and performing edge detection on the foreground to obtain an accurate foreground object contour, then calculating the depth information of the foreground from the position of that contour in the background and the depth information of the background. The method can accurately segment the background objects and foreground objects in a two-dimensional video, improves the accuracy of the calculated background and foreground depth information, and thereby improves the realism of the resulting three-dimensional video.

Description

Video conversion method and system
Technical Field
The present invention relates to the field of video conversion technologies, and in particular, to a video conversion method and system.
Background
Videos watched on electronic devices such as mobile phones, tablets and televisions are generally two-dimensional. Compared with three-dimensional video, two-dimensional video gives the viewer a noticeably weaker sense of depth, which has motivated techniques for converting two-dimensional video into three-dimensional video. However, current 2D-to-3D conversion mainly relies on depth-map-based synthesis, and the depth information estimated from a two-dimensional video is often not accurate enough.
Disclosure of Invention
The invention aims to provide a video conversion method and system that can accurately calculate the depth information of a two-dimensional image, so as to improve the realism of the resulting three-dimensional video.
To achieve the above object, the invention adopts the following technical solution:
a video conversion method, comprising the steps of:
dividing an input two-dimensional video into a plurality of frame images;
performing scene segmentation on the plurality of frame images to divide them into a foreground and a background;
judging whether the background moves: if the background is static, classifying the background objects with a deep learning algorithm and calculating the depth information of the background from the classification result; if the background moves, establishing a global motion model from the motion information of the background and calculating the depth information of the background from the global motion model;
performing three-dimensional reconstruction of the background according to its depth information to obtain a three-dimensional background;
performing edge detection on the foreground to obtain an accurate foreground object contour, calculating the depth information of the foreground from the position of that contour in the background and the depth information of the background, and performing three-dimensional reconstruction of the foreground according to its depth information to obtain a three-dimensional foreground;
and synthesizing the three-dimensional background and the three-dimensional foreground into a three-dimensional video and outputting it.
The above method can accurately segment the background objects and foreground objects in a two-dimensional video, improves the accuracy of the calculated background and foreground depth information, and thereby improves the realism of the resulting three-dimensional video.
To achieve the above object, the invention further adopts the following technical solution:
a video conversion system, comprising:
a frame dividing module, for dividing an input two-dimensional video into a plurality of frame images;
a scene segmentation module, for performing scene segmentation on the plurality of frame images to divide them into a foreground and a background;
a background type judging module, for judging whether the background moves: if the background is static, classifying the background objects with a deep learning algorithm and calculating the depth information of the background from the classification result; if the background moves, establishing a global motion model from the motion information of the background and calculating the depth information of the background from the global motion model;
a background three-dimensional reconstruction module, for performing three-dimensional reconstruction of the background according to its depth information to obtain a three-dimensional background;
a foreground three-dimensional reconstruction module, for performing edge detection on the foreground to obtain an accurate foreground object contour, calculating the depth information of the foreground from the position of that contour in the background and the depth information of the background, and performing three-dimensional reconstruction of the foreground according to its depth information to obtain a three-dimensional foreground;
and a three-dimensional video output module, for synthesizing the three-dimensional background and the three-dimensional foreground into a three-dimensional video and outputting it.
Drawings
FIG. 1 is a flow chart illustrating a video conversion method according to an embodiment;
FIG. 2 is a schematic structural diagram of a video conversion system according to an embodiment.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise; "several" means at least one, e.g., one, two, etc., unless specifically limited otherwise.
Referring to fig. 1, the present embodiment provides a video conversion method including step S10, step S20, step S30, step S40, step S50, and step S60, which are detailed as follows:
in step S10, the input two-dimensional video is divided into several frame images.
In step S20, the frame images are subjected to scene segmentation and divided into a foreground and a background.
In this embodiment, the foreground refers to the main targets of interest in the two-dimensional video, including but not limited to persons, animals, vehicles, and the like; the background refers to the objects in the two-dimensional video other than those targets of interest, including but not limited to the sky, the ground, trees, buildings, and the like. Specifically, the scene segmentation may use any method commonly employed by those skilled in the art, for example a segmentation method based on color information; this embodiment is not limited in this respect.
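The description leaves the segmentation method open and only names color-based segmentation as one possibility. As an illustration only, a minimal color-distance segmentation might be sketched as follows; the function name, the tolerance value, and the synthetic frame are all assumptions, not part of the patent:

```python
import numpy as np

def segment_scene(frame, bg_color, tol=30.0):
    """Split a frame into foreground/background masks by color distance.

    A pixel whose Euclidean distance to the dominant background color
    `bg_color` is within `tol` is labelled background; everything else
    is treated as foreground. This is only one of the color-based
    segmentation methods the description alludes to.
    """
    diff = frame.astype(np.float64) - np.asarray(bg_color, dtype=np.float64)
    dist = np.linalg.norm(diff, axis=-1)          # per-pixel color distance
    background_mask = dist <= tol
    return ~background_mask, background_mask

# Synthetic 8x8 RGB frame: blue "sky" background with a red "object".
frame = np.zeros((8, 8, 3), dtype=np.uint8)
frame[:, :] = (0, 0, 200)          # background color everywhere
frame[2:5, 2:5] = (200, 0, 0)      # a 3x3 foreground object
fg, bg = segment_scene(frame, bg_color=(0, 0, 200))
print(fg.sum(), bg.sum())          # 9 foreground pixels, 55 background pixels
```

Real scenes would of course need more robust cues (texture, motion), but the mask pair is the interface the later steps consume.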
In step S30, it is determined whether the background is moving, and if the background is stationary, a deep learning algorithm is used to classify the background objects, and the depth information of the background is calculated according to the classification result; if the background moves, a global motion model is established according to the motion information of the background, and the depth information of the background is calculated according to the global motion model.
In this embodiment, for a static background, a large number of background images of the sky, the ground, buildings, trees and the like are collected, and a feature database for these background categories is built up through continuous training and learning; the background objects of an input two-dimensional video frame can then be accurately classified against this database, so that the depth information of the background is accurately calculated. For a moving background, the depth information is calculated from a global motion model, i.e. a mathematical model describing the global motion of the video frames, which is mainly produced by camera operations including, but not limited to, rotation, translation, panning, tilting, zooming, and the like.
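The patent does not state how the global motion model is parameterised or fitted. One common choice is a 2-D affine model estimated by least squares from tracked point correspondences; the sketch below (function names, the static/moving threshold, and the synthetic correspondences are all assumptions for illustration) shows that idea:

```python
import numpy as np

def fit_global_affine(src, dst):
    """Least-squares fit of a 2-D affine global motion model dst ~ A @ src + t.

    `src`/`dst` are (N, 2) arrays of tracked point positions in two
    consecutive frames. How the points are tracked is outside the
    patent text; this only sketches the model-fitting step.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    ones = np.ones((len(src), 1))
    X = np.hstack([src, ones])                     # (N, 3) design matrix
    # Solve X @ P = dst; P is 3x2 with rows [a11 a21], [a12 a22], [tx ty].
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)
    A, t = P[:2].T, P[2]
    return A, t

def background_is_static(src, dst, thresh=0.5):
    """Declare the background static when the mean tracked displacement is tiny."""
    disp = np.linalg.norm(np.asarray(dst, float) - np.asarray(src, float), axis=1)
    return float(disp.mean()) < thresh

# Synthetic correspondences: the camera translates the scene by (3, -1).
src = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]], float)
dst = src + np.array([3.0, -1.0])
A, t = fit_global_affine(src, dst)
print(np.round(A, 3))                    # close to the identity matrix
print(np.round(t, 3))                    # close to [3, -1]
print(background_is_static(src, dst))    # False: the background moves
```

With the model in hand, motion magnitude per region can be turned into relative depth (stronger parallax suggests nearer structure), which is the step the patent then feeds into reconstruction.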
In step S40, a three-dimensional background is obtained by performing three-dimensional reconstruction of the background based on the depth information of the background.
In step S50, edge detection is performed on the foreground to obtain an accurate foreground object contour, the depth information of the foreground is calculated from the position of that contour in the background and the depth information of the background, and three-dimensional reconstruction of the foreground is performed according to its depth information to obtain a three-dimensional foreground.
In this embodiment, performing edge detection on the foreground yields an accurate foreground object contour, so the foreground object can be segmented precisely and the accuracy of the calculated foreground depth information is improved.
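The text says only that "edge detection" is applied to the foreground, without naming a detector. A Sobel gradient-magnitude detector is one standard realisation; the NumPy sketch below (the threshold and all names are assumptions) shows the idea on a synthetic image:

```python
import numpy as np

def sobel_edges(gray, thresh=1.0):
    """Gradient-magnitude edge map via 3x3 Sobel filters (valid region only).

    The description only says "edge detection"; Sobel is one standard
    choice (Canny would be another). Returns a boolean map two pixels
    smaller than the input in each dimension.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    g = gray.astype(float)
    H, W = g.shape
    gx = np.zeros((H - 2, W - 2))
    gy = np.zeros((H - 2, W - 2))
    # Accumulate the correlation as nine shifted, weighted copies of the image.
    for i in range(3):
        for j in range(3):
            patch = g[i:i + H - 2, j:j + W - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    mag = np.hypot(gx, gy)
    return mag > thresh

# A bright square on a dark background: edges appear at the square border.
img = np.zeros((10, 10))
img[3:7, 3:7] = 10.0
edges = sobel_edges(img)
print(edges.any(), edges[4, 4])  # edges exist; the square interior is not an edge
```

The boolean edge map can then be traced into a closed contour, which is the "accurate foreground object contour" the step requires before computing foreground depth.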
In step S60, the three-dimensional background and the three-dimensional foreground are synthesized into a three-dimensional video, which is then output.
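Step S60 leaves the synthesis itself abstract. Since the background section names depth-map-based synthesis as the mainstream approach, a naive depth-image-based rendering (DIBR) pass can be sketched as follows; the disparity scale, the hole-filling strategy, and every name here are assumptions for illustration, not the patent's method:

```python
import numpy as np

def render_stereo(image, depth, max_disp=4):
    """Naive depth-image-based rendering: near pixels get large disparity.

    `depth` is normalised to [0, 1] with 1 = nearest. Each pixel is
    shifted left/right by half its disparity to form the two views;
    holes left by the warp are filled from the previous valid pixel on
    the row. Real DIBR pipelines handle occlusion and inpainting far
    more carefully; this only illustrates the synthesis step.
    """
    H, W = depth.shape
    disp = np.rint(depth * max_disp).astype(int)
    left = np.zeros_like(image)
    right = np.zeros_like(image)
    filled_l = np.zeros((H, W), bool)
    filled_r = np.zeros((H, W), bool)
    # Paint far-to-near (small d first) so nearer pixels overwrite correctly.
    for d in range(0, max_disp + 1):
        ys, xs = np.nonzero(disp == d)
        xl = np.clip(xs + d // 2, 0, W - 1)
        xr = np.clip(xs - (d - d // 2), 0, W - 1)
        left[ys, xl] = image[ys, xs]
        right[ys, xr] = image[ys, xs]
        filled_l[ys, xl] = True
        filled_r[ys, xr] = True
    # Crude hole filling: propagate the previous valid pixel along each row.
    for view, filled in ((left, filled_l), (right, filled_r)):
        for y in range(H):
            for x in range(1, W):
                if not filled[y, x]:
                    view[y, x] = view[y, x - 1]
    return left, right

img = np.tile(np.arange(8.0), (4, 1))            # horizontal ramp image
depth = np.zeros((4, 8))
depth[:, 3:5] = 1.0                              # a near object in the middle
left, right = render_stereo(img, depth)
print(left.shape, right.shape)
```

Running this per frame on the reconstructed background and foreground depth maps, then interleaving the left/right views, would yield the stereoscopic output the step describes.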
In summary, the video conversion method described above can accurately segment the background objects and foreground objects in a two-dimensional video, improves the accuracy of the calculated background and foreground depth information, and thereby improves the realism of the resulting three-dimensional video.
Referring to FIG. 2, the present application further provides a video conversion system comprising a frame dividing module 100, a scene segmentation module 200, a background type judging module 300, a background three-dimensional reconstruction module 400, a foreground three-dimensional reconstruction module 500, and a three-dimensional video output module 600, wherein:
the frame dividing module 100 divides an input two-dimensional video into a plurality of frame images.
The scene segmentation module 200 performs scene segmentation on the plurality of frames of images, and segments the plurality of frames of images into a foreground and a background.
The background type judging module 300 is used for judging whether the background moves or not, if the background is static, the background object classification is carried out on the background by adopting a deep learning algorithm, and the depth information of the background is calculated according to the classification result; if the background moves, a global motion model is established according to the motion information of the background, and the depth information of the background is calculated according to the global motion model.
And the background three-dimensional reconstruction module 400 is used for performing three-dimensional reconstruction on the background according to the depth information of the background to obtain a three-dimensional background.
The foreground three-dimensional reconstruction module 500 performs edge detection on the foreground to obtain an accurate foreground object contour, calculates the depth information of the foreground according to the position information of the foreground object contour in the background and the depth information of the background, and performs three-dimensional reconstruction on the foreground according to the depth information of the foreground to obtain the three-dimensional foreground.
And the three-dimensional video output module 600 synthesizes the three-dimensional background and the three-dimensional foreground into a three-dimensional video for output.
The video conversion system can accurately segment the background objects and foreground objects in a two-dimensional video, improves the accuracy of the calculated background and foreground depth information, and thereby improves the realism of the resulting three-dimensional video.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations are described, but any combination of these technical features should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present invention, and their description is specific and detailed, but should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (2)

1. A video conversion method, comprising the steps of:
dividing an input two-dimensional video into a plurality of frame images;
performing scene segmentation on the plurality of frame images to divide them into a foreground and a background;
judging whether the background moves: if the background is static, classifying the background objects with a deep learning algorithm and calculating the depth information of the background from the classification result; if the background moves, establishing a global motion model from the motion information of the background and calculating the depth information of the background from the global motion model;
performing three-dimensional reconstruction of the background according to its depth information to obtain a three-dimensional background;
performing edge detection on the foreground to obtain an accurate foreground object contour, calculating the depth information of the foreground from the position of that contour in the background and the depth information of the background, and performing three-dimensional reconstruction of the foreground according to its depth information to obtain a three-dimensional foreground;
and synthesizing the three-dimensional background and the three-dimensional foreground into a three-dimensional video and outputting it.
2. A video conversion system, comprising:
a frame dividing module, for dividing an input two-dimensional video into a plurality of frame images;
a scene segmentation module, for performing scene segmentation on the plurality of frame images to divide them into a foreground and a background;
a background type judging module, for judging whether the background moves: if the background is static, classifying the background objects with a deep learning algorithm and calculating the depth information of the background from the classification result; if the background moves, establishing a global motion model from the motion information of the background and calculating the depth information of the background from the global motion model;
a background three-dimensional reconstruction module, for performing three-dimensional reconstruction of the background according to its depth information to obtain a three-dimensional background;
a foreground three-dimensional reconstruction module, for performing edge detection on the foreground to obtain an accurate foreground object contour, calculating the depth information of the foreground from the position of that contour in the background and the depth information of the background, and performing three-dimensional reconstruction of the foreground according to its depth information to obtain a three-dimensional foreground;
and a three-dimensional video output module, for synthesizing the three-dimensional background and the three-dimensional foreground into a three-dimensional video and outputting it.
CN201910858867.3A 2019-09-11 2019-09-11 Video conversion method and system Pending CN110602479A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910858867.3A CN110602479A (en) 2019-09-11 2019-09-11 Video conversion method and system


Publications (1)

Publication Number Publication Date
CN110602479A (en) 2019-12-20

Family

ID=68858847


Country Status (1)

Country Link
CN (1) CN110602479A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101640809A (en) * 2009-08-17 2010-02-03 浙江大学 Depth extraction method of merging motion information and geometric information
CN101917636A (en) * 2010-04-13 2010-12-15 上海易维视科技有限公司 Method and system for converting two-dimensional video of complex scene into three-dimensional video
CN102223553A (en) * 2011-05-27 2011-10-19 山东大学 Method for converting two-dimensional video into three-dimensional video automatically
CN102263979A (en) * 2011-08-05 2011-11-30 清华大学 Depth map generation method and device for plane video three-dimensional conversion
CN102724532A (en) * 2012-06-19 2012-10-10 清华大学 Planar video three-dimensional conversion method and system using same
CN102724531A (en) * 2012-06-05 2012-10-10 上海易维视科技有限公司 Method and system for converting two-dimensional video into three-dimensional video


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022071875A1 (en) * 2020-09-30 2022-04-07 脸萌有限公司 Method and apparatus for converting picture into video, and device and storage medium
US11871137B2 (en) 2020-09-30 2024-01-09 Lemon Inc. Method and apparatus for converting picture into video, and device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191220