CN109246415B - Video processing method and device - Google Patents


Info

Publication number
CN109246415B
Authority
CN
China
Prior art keywords
omnidirectional
video
omnidirectional video
determining
image
Prior art date
Legal status
Active
Application number
CN201710347063.8A
Other languages
Chinese (zh)
Other versions
CN109246415A (en)
Inventor
李炜明
张文波
王再冉
刘洋
汪昊
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd
Priority to CN201710347063.8A
Publication of CN109246415A
Application granted
Publication of CN109246415B

Abstract

The embodiment of the invention provides a video processing method and a video processing device. The method comprises: acquiring a first omnidirectional video and a second omnidirectional video, wherein the first omnidirectional video and the second omnidirectional video have a stereoscopic parallax in a first direction, and the first direction is the column direction of the first omnidirectional video and the second omnidirectional video after they are unfolded according to longitude and latitude; and determining one or two third omnidirectional videos according to the first omnidirectional video and the second omnidirectional video, wherein, if one third omnidirectional video is determined, the second omnidirectional video and the third omnidirectional video have a stereoscopic parallax in a second direction, and, if two third omnidirectional videos are determined, the two third omnidirectional videos have a stereoscopic parallax in the second direction; the second direction is the row direction of the first omnidirectional video and the second omnidirectional video after they are unfolded according to longitude and latitude.

Description

Video processing method and device
Technical Field
The invention relates to the technical field of video processing, in particular to a method and a device for video processing.
Background
With the development of information technology, multimedia technology has also advanced, and Three-Dimensional (3D) omnidirectional shooting technology has emerged. 3D omnidirectional shooting technology has wide application prospects; for example, it can be applied to fields such as Virtual Reality (VR) conferences, VR live broadcasting, wearable devices, navigation systems, robots and unmanned aerial vehicles.
3D omnidirectional shooting technology is implemented by 3D omnidirectional video acquisition equipment. Current 3D omnidirectional video acquisition equipment mounts a plurality of video acquisition devices on a spherical or annular surface, as shown in fig. 1a and 1b; each video acquisition device captures video in its corresponding direction, and the videos captured in all directions are processed to obtain a 3D omnidirectional video. However, the plurality of video acquisition devices (such as cameras) in existing 3D omnidirectional video acquisition equipment are generally all arranged in the same horizontal direction, that is, they can only capture video in the horizontal direction, so more video acquisition devices have to be installed in all directions on the spherical body in order to shoot in all directions. As a result, existing 3D omnidirectional video acquisition equipment is large in volume, hard to carry and costly, is difficult to apply to the everyday scenarios of individual users, and can hardly support application scenarios such as personal live broadcast, daily life recording and motion photography, so its applicable scenarios are narrow and the user experience is poor.
Disclosure of Invention
In order to overcome the above technical problems or at least partially solve the above technical problems, the following technical solutions are proposed:
according to an aspect, an embodiment of the present invention provides a method of video processing, including:
acquiring a first omnidirectional video and a second omnidirectional video, wherein the first omnidirectional video and the second omnidirectional video have a stereoscopic parallax in a first direction, and the first direction is a corresponding column direction after the first omnidirectional video and the second omnidirectional video are spread according to the longitude and latitude;
determining one or two third omnidirectional videos according to the first omnidirectional video and the second omnidirectional video, wherein if one third omnidirectional video is determined, the second omnidirectional video and the third omnidirectional video have stereoscopic parallax in a second direction; if the two third omnidirectional videos are determined, the two third omnidirectional videos have stereoscopic parallax in the second direction; the second direction is a row direction corresponding to the first omnidirectional video and the second omnidirectional video after being unfolded according to the longitude and latitude.
Embodiments of the present invention also provide, according to another aspect, an apparatus for video processing, including:
an obtaining module, configured to obtain a first omnidirectional video and a second omnidirectional video, where the first omnidirectional video and the second omnidirectional video have a stereoscopic parallax in a first direction, and the first direction is the column direction of the first omnidirectional video and the second omnidirectional video after they are unfolded according to longitude and latitude;
a determining module, configured to determine one or two third omnidirectional videos according to the first omnidirectional video and the second omnidirectional video, where, if one third omnidirectional video is determined, the second omnidirectional video and the third omnidirectional video have a stereoscopic parallax in a second direction; if two third omnidirectional videos are determined, the two third omnidirectional videos have a stereoscopic parallax in the second direction; the second direction is the row direction of the first omnidirectional video and the second omnidirectional video after they are unfolded according to longitude and latitude.
The invention provides a video processing method and a video processing device. Compared with the prior art, the method and the device acquire two omnidirectional videos having a stereoscopic parallax in a first direction, namely a first omnidirectional video and a second omnidirectional video, and then determine a third omnidirectional video according to the first omnidirectional video and the second omnidirectional video, the second omnidirectional video and the third omnidirectional video having a stereoscopic parallax in a second direction. In other words, only two omnidirectional videos with a stereoscopic parallax in the first direction need to be acquired; by converting omnidirectional video with a stereoscopic parallax in the first direction into omnidirectional video with a stereoscopic parallax in the second direction, a third omnidirectional video lying in the same row direction as the second omnidirectional video, or two third omnidirectional videos lying in the same row direction, can be obtained. Combining the second omnidirectional video with the third omnidirectional video, or combining the two third omnidirectional videos, then makes it possible to present a three-dimensional omnidirectional video effect to the user and provides the precondition for doing so. Meanwhile, only two omnidirectional video acquisition devices are needed to complete the video acquisition, so the size and cost of the omnidirectional video acquisition equipment can be greatly reduced; based on characteristics such as portability, small size and low cost, the application scenarios in which the omnidirectional video acquisition equipment can be used are increased, thereby improving the user experience.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1a is a schematic diagram of a conventional 3D omnidirectional acquisition device;
fig. 1b is a schematic diagram of another existing 3D omnidirectional acquisition device;
FIG. 2 is a flow chart of a method of video processing according to an embodiment of the present invention;
fig. 3a is a schematic diagram of an omnidirectional video capturing device according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of an omnidirectional video apparatus comprising two video capture devices located in the same horizontal direction;
FIG. 3c is a schematic diagram of an omnidirectional video apparatus comprising a plurality of video capture devices positioned in the same vertical orientation;
fig. 3d is a schematic diagram of another omnidirectional video capture device in an embodiment of the present invention;
FIG. 4 is a diagram illustrating a method for synchronizing timestamps in an embodiment of the present invention;
fig. 5a is a schematic diagram illustrating a method for converting two omnidirectional videos located in the same vertical direction into two omnidirectional videos located in the same horizontal direction according to an embodiment of the present invention;
fig. 5b is a schematic diagram of another method for converting two omnidirectional videos located in the same vertical direction into two omnidirectional videos located in the same horizontal direction according to an embodiment of the present invention;
fig. 6 is a schematic diagram illustrating a black hole area in a virtual omnidirectional video generated in the embodiment of the present invention;
fig. 7 is a schematic diagram of a virtual omnidirectional video after hole filling processing according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a method for generating training samples according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of another training sample generation method according to an embodiment of the present invention;
fig. 10 is a schematic diagram of an apparatus for video processing according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As will be appreciated by those skilled in the art, a "terminal" as used herein includes both devices having only a wireless signal receiver without transmit capability and devices having receive and transmit hardware capable of two-way communication over a two-way communication link. Such a device may include: a cellular or other communication device with a single-line display or a multi-line display, or a cellular or other communication device without a multi-line display; a PCS (Personal Communications Service) device, which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other device having and/or including a radio frequency receiver. As used herein, a "terminal" or "terminal device" may be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. As used herein, a "terminal device" may also be a communication terminal, a web terminal, or a music/video playing terminal, such as a PDA, an MID (Mobile Internet Device) and/or a mobile phone with a music/video playing function, or a smart TV, a set-top box, etc.
Example one
An embodiment of the present invention provides a method for video processing, as shown in fig. 2, including:
step 201, a first omnidirectional video and a second omnidirectional video are obtained.
Wherein the first omnidirectional video and the second omnidirectional video have stereoscopic parallax in a first direction.
For the embodiment of the invention, when the first omnidirectional video and the second omnidirectional video are each expanded according to longitude and latitude, the direction of the line connecting the selected longitude poles coincides with the direction of the line connecting the optical centers of the two omnidirectional videos (the first omnidirectional video and the second omnidirectional video), and the latitude origins selected for the two videos coincide. In the expanded videos, the row direction corresponds to the latitude direction and the column direction corresponds to the longitude direction, and the first direction is the column direction of the first omnidirectional video and the second omnidirectional video after they are expanded according to longitude and latitude.
The first omnidirectional video can be an upper viewpoint omnidirectional video, and the second omnidirectional video can be a lower viewpoint omnidirectional video; or the first omnidirectional video is a lower viewpoint omnidirectional video, and the second omnidirectional video is an upper viewpoint omnidirectional video. The present invention is not limited to the embodiments.
In the embodiment of the present invention, the first omnidirectional video and the second omnidirectional video may be acquired by the omnidirectional video capturing device shown in fig. 3 a.
The omnidirectional video capturing device shown in fig. 3a may include two video capturing devices located in the same vertical direction, wherein the two video capturing devices located in the same vertical direction may be connected through a telescopic rod.
For the embodiment of the present invention, the omnidirectional video capturing device may also be composed of two video capturing devices located in the same horizontal direction, wherein the two video capturing devices located in the same horizontal direction may also be connected by a telescopic rod, as shown in fig. 3 b. In the embodiment of the present invention, two video capture devices located in the horizontal direction may be converted in direction to be suitable for the embodiment of the present invention.
For the embodiment of the present invention, the omnidirectional video capturing device may also include a plurality of video capturing devices located in the same vertical direction and connected by telescopic rods, in which case any two of the video capturing devices may be used as the pair applicable to the embodiment of the present invention, as shown in fig. 3 c.
For the embodiment of the present invention, the omnidirectional video capturing device may include two video capturing devices located in the same vertical direction, wherein the two video capturing devices located in the same vertical direction are embedded into the telescopic rod, as shown in fig. 3 d.
The telescopic rod can be a connecting rod of fixed length; a set of connecting rods of different lengths that are exchanged manually; a single connecting rod whose length between the omnidirectional video acquisition devices can be adjusted manually; or a single connecting rod whose length can be adjusted automatically.
For the embodiment of the invention, the omnidirectional video acquisition equipment shown in fig. 3a, 3b, 3c and 3d only needs two video acquisition equipment connected by the telescopic rod, so that the volume of the omnidirectional video acquisition equipment is greatly reduced, the cost is reduced, and based on the characteristics of portability, small size, lower cost and the like, the application scenes suitable for the omnidirectional video acquisition equipment can be increased, thereby improving the user experience.
Optionally, after step 201, the method further includes: and correcting the first omnidirectional video and the second omnidirectional video.
The step of correcting the first omnidirectional video and the second omnidirectional video may specifically include: determining position and attitude error parameters of video acquisition equipment corresponding to the first omnidirectional video and the second omnidirectional video according to the first omnidirectional video and the second omnidirectional video; determining correction parameters according to the position and attitude error parameters; and correcting the first omnidirectional video and the second omnidirectional video according to the correction parameters.
In the actual manufacturing and assembling process of the equipment, the two video acquisition devices positioned in the same vertical direction inevitably have errors in posture and orientation, so the calibration parameters corresponding to each video acquisition device need to be adjusted in order to correct the acquired first omnidirectional video and second omnidirectional video.
For the embodiment of the invention, the first omnidirectional video and the second omnidirectional video are each expanded into images. A pixel point is extracted from the expanded image of the first omnidirectional video, the corresponding pixel point is searched for in the expanded image of the second omnidirectional video, and it is determined whether the two pixel points lie in the same column. If they do not, the calibration parameters of the two video acquisition devices positioned in the same vertical direction are adjusted until the two corresponding pixel points lie in the same column.
The omnidirectional video shot by the omnidirectional video acquisition equipment can be converted from a 360-degree spherical image into a longitude-latitude expansion plane image according to a spherical longitude-latitude expansion mode. Specifically, a three-dimensional coordinate system O-XYZ is defined at the sphere center, where point O is the center of the coordinate system and X, Y, Z are three mutually perpendicular directions. Ideally, the XY plane is horizontal and Z points vertically upward along the gravity axis; in the plane image obtained by the conversion, the row coordinates of the image correspond to the angular range of -90 degrees to 90 degrees in a vertical plane of the spherical coordinate system, and the column coordinates of the image correspond to the angular range of 0 degrees to 360 degrees in the horizontal plane of the spherical coordinate system.
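The longitude-latitude expansion just described can be illustrated with a short sketch. The Python/NumPy fragment below is an illustration only, not part of the claimed method; the exact axis convention and image size are assumptions.

```python
import numpy as np

def latlong_pixel_to_direction(row, col, height, width):
    """Map a pixel of a longitude-latitude expansion image to a unit viewing
    direction in the spherical coordinate system O-XYZ described above.
    Assumed convention: row coordinates span the vertical angle -90..90
    degrees and column coordinates span the horizontal angle 0..360 degrees."""
    vert = (row / (height - 1)) * np.pi - np.pi / 2    # -90 .. 90 degrees
    horiz = (col / (width - 1)) * 2.0 * np.pi          # 0 .. 360 degrees
    x = np.cos(vert) * np.cos(horiz)
    y = np.cos(vert) * np.sin(horiz)
    z = np.sin(vert)                                   # Z points vertically up
    return np.array([x, y, z])
```

For example, under this convention latlong_pixel_to_direction(0, 0, 512, 1024) returns the direction of the lower pole of the sphere.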
For the embodiment of the invention, two omnidirectional images shot by a system at a certain moment are considered: the spherical coordinate systems of the upper viewpoint omnidirectional image and the lower viewpoint omnidirectional image are O1-X1Y1Z1 and O2-X2Y2Z2 respectively. In an ideal case, Z1 is coincident with the direction of a straight line O1O2, Z2 is coincident with the direction of Z1, X1 is parallel to X2, and Y1 is parallel to Y2. Under an ideal condition, after the two omnidirectional images are converted into longitude and latitude expansion images, the same object points in the space have the same column coordinates in the two longitude and latitude expansion images.
When it is detected that the same object point in space has different column coordinates in the two longitude-latitude expansion images, this indicates that the spherical coordinate systems of the two video acquisition devices are not aligned to the ideal state, and the spherical coordinate system of at least one of the two video acquisition devices needs to be rotated about its center to align it to the ideal state.
For example, the rotation may be expressed as angles [Ax, Ay, Az] about the three coordinate axis directions around the center, where [Ax, Ay, Az] is calculated automatically by a self-calibration method.
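A minimal sketch of how such a rotation could be applied is given below (Python/NumPy); the X-Y-Z composition order is an assumption, since the text only states a rotation about the center by [Ax, Ay, Az].

```python
import numpy as np

def rotation_from_angles(ax, ay, az):
    """Rotation matrix for the self-calibration angles [Ax, Ay, Az] (radians),
    composed about the X, Y and Z axes in that order (an assumed convention)."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx

# Correcting one image then amounts to rotating every viewing direction of its
# spherical coordinate system by this matrix before the longitude-latitude
# expansion, e.g. corrected_dir = rotation_from_angles(ax, ay, az) @ direction
```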
Optionally, the embodiment of the present invention may further include a step a (not shown in the figure), where the step a synchronizes timestamps corresponding to the first omnidirectional video and the second omnidirectional video, respectively.
Step a may be located after the step of correcting the first omnidirectional video and the second omnidirectional video, or after step 201. The present invention is not limited to the embodiments.
For the embodiment of the invention, a first characteristic pixel point is obtained in the first omnidirectional video, and a second characteristic pixel point corresponding to the first characteristic pixel point is determined in the second omnidirectional video. The motion tracks corresponding to the first characteristic pixel point and the second characteristic pixel point are then determined; feature-based sampling (for example, at trajectory turning points where the motion direction changes abruptly) is performed on the motion track of the first characteristic pixel point to obtain a first sampling point, and similar feature-based sampling is performed on the motion track of the second characteristic pixel point to obtain a second sampling point corresponding to the first sampling point. It is then determined whether the first sampling point and the second sampling point are aligned (i.e., located on the same vertical line) on the same time axis; if not, the second sampling point can be adjusted according to the time of the first sampling point on the time axis, or the first sampling point can be adjusted according to the time of the second sampling point on the time axis, so as to synchronize the timestamps corresponding to the first omnidirectional video and the second omnidirectional video respectively.
In another mode, synchronization is performed according to time in a third party terminal or a cloud server to synchronize timestamps corresponding to the first omnidirectional video and the second omnidirectional video respectively.
The specific flow of timestamp synchronization is shown in fig. 4.
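As a hedged illustration of the turning-point-based synchronization just described (not the exact flow of fig. 4), the following Python/NumPy sketch finds sharp direction changes in the two feature-pixel motion tracks and estimates the time offset between the two streams; the angle threshold, the frame-rate argument and the use of a median are assumptions.

```python
import numpy as np

def turning_points(track, angle_thresh_deg=60.0):
    """Indices where the 2D motion track of a feature pixel changes direction
    sharply (the 'trajectory turning points' mentioned above)."""
    track = np.asarray(track, dtype=float)
    v = np.diff(track, axis=0)
    idx = []
    for i in range(1, len(v)):
        denom = np.linalg.norm(v[i - 1]) * np.linalg.norm(v[i])
        if denom < 1e-9:
            continue
        cos_a = np.clip(np.dot(v[i - 1], v[i]) / denom, -1.0, 1.0)
        if np.degrees(np.arccos(cos_a)) > angle_thresh_deg:
            idx.append(i)
    return np.array(idx)

def estimate_time_offset(track1, track2, fps):
    """Median frame offset between matching turning points of the two videos,
    converted to seconds; shifting one video's timestamps by this amount
    aligns the two streams (a simplified sketch)."""
    t1, t2 = turning_points(track1), turning_points(track2)
    n = min(len(t1), len(t2))
    if n == 0:
        return 0.0
    return float(np.median(t2[:n] - t1[:n])) / fps
```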
Step 202, determining one or two third omnidirectional videos according to the first omnidirectional video and the second omnidirectional video.
If a third omnidirectional video is determined, the second omnidirectional video and the third omnidirectional video have a stereoscopic parallax in a second direction; if the two third omnidirectional videos are determined, the two third omnidirectional videos have stereoscopic parallax in the second direction; the second direction is a corresponding row direction of the first omnidirectional video and the second omnidirectional video after being unfolded according to the longitude and latitude.
For example, if the first direction is a vertical direction and the second direction is a horizontal direction, a first omnidirectional video and a second omnidirectional video having a stereoscopic parallax in the vertical direction are acquired, and one or two third omnidirectional videos are determined according to the first omnidirectional video and the second omnidirectional video, where if one third omnidirectional video is determined, the second omnidirectional video and the third omnidirectional video have a stereoscopic parallax in the horizontal direction, and if two third omnidirectional videos are determined, the two third omnidirectional videos have a stereoscopic parallax in the horizontal direction.
Wherein, the step 202 includes steps 2021-2022 (not labeled in the figure):
step 2021, determining an omnidirectional depth video according to the first omnidirectional video and the second omnidirectional video.
Specifically, step 2021 includes step 20211 (not shown):
step 20211, determining the omnidirectional deep video through the trained deep learning network according to the first omnidirectional video and the second omnidirectional video.
For the embodiment of the present invention, the step 20211 specifically includes steps 20211a, 20211b, 20211c, 20211d (not labeled in the figure), wherein,
step 20211a, based on the deep learning network, determining a pixel point in the second omnidirectional video, where the pixel point is matched with each pixel point in the first omnidirectional video.
Step 20211b, determining the depth information corresponding to each matched pair of pixel points.
Step 20211c, based on the deep learning network, performing semantic annotation on each pixel point in the second omnidirectional video.
Step 20211d, determining the omnidirectional depth video according to the depth information corresponding to each pair of matched pixel points and the semantic annotation information corresponding to each pixel point in the second omnidirectional video.
For the embodiment of the invention, the deep learning network for the omnidirectional depth video comprises: a stereo matching unit based on a Deep Neural Network (DNN), a depth image estimation unit based on stereo matching, an image semantic segmentation unit based on DNN, an object geometric model unit, a semantic depth image generation unit and an omnidirectional depth image output unit.
The depth image estimation unit based on stereo matching performs pixel matching and determines the depth information corresponding to each pair of matched pixel points. The pixel matching and the determination of the depth information corresponding to each pair of matched pixel points are specifically as follows:
in the first step, a first omni-directional image OImage1 and a second omni-directional image OImage2 which are spread in the longitude and latitude are input.
In the second step, for each pixel p1 in OImage1, the following operations are performed:
(1) for each pixel p2r in OImage2 located in the same column as p1, the image similarity value of p1 and p2r is computed and denoted as S(p1, p2r); the pixel with the largest S(p1, p2r) value among all p2r is then found and denoted as p2.
Here S(p1, p2r) = D(D1, D2r), where D is a deep neural network obtained by a deep-learning-based method;
(2) if S(p1, p2) > Ts, the distance corresponding to p1 and p2 is calculated, p1 and p2 are marked as pixels with a depth estimate, and the depth is assigned to p1, where Ts is an image similarity threshold; if S(p1, p2) < Ts, p1 and p2 are marked as pixels without a depth estimate;
(3) for each pixel p2 in OImage2 marked as having a depth estimate, the most similar pixel in OImage1 is found in the same manner; if the most similar pixel found is not p1, p2 is re-marked as a pixel without a depth estimate;
(4) an omnidirectional depth image OImageD is output, in which every pixel with a depth estimate has a pixel value equal to the depth, i.e., the distance of the object from the system.
After steps (1), (2), (3) and (4), OImageD may still contain pixels without a depth value.
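A minimal sketch of steps (1)-(4) follows (Python/NumPy). The similarity function standing in for the deep neural network D, the depth computation and the brute-force column search are placeholders and assumptions; only the control flow of the matching and the reverse consistency check mirrors the description above.

```python
import numpy as np

def match_column_stereo(img1, img2, similarity, ts, depth_from_disparity):
    """For every pixel p1 of OImage1, search the same column of OImage2 for
    the most similar pixel p2, keep the match only if the similarity exceeds
    Ts and the reverse search agrees, and write the resulting depth into
    OImageD.  `similarity(imgA, pA, imgB, pB)` stands in for the DNN D;
    `depth_from_disparity(r1, r2, c)` stands in for the geometric distance
    computation; both are assumed helpers."""
    h, w = img1.shape[:2]
    depth = np.full((h, w), np.nan)      # NaN marks "no depth estimation"
    for c in range(w):
        for r1 in range(h):
            scores = [similarity(img1, (r1, c), img2, (r2, c)) for r2 in range(h)]
            r2 = int(np.argmax(scores))
            if scores[r2] <= ts:
                continue
            # reverse check: the best match of p2 back in img1 must be p1
            back = [similarity(img2, (r2, c), img1, (rb, c)) for rb in range(h)]
            if int(np.argmax(back)) != r1:
                continue
            depth[r1, c] = depth_from_disparity(r1, r2, c)
    return depth
```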
In the DNN-based stereo matching unit, the image feature extraction model best suited to stereo image matching is obtained by learning from a large amount of stereo image training data. Specifically, the DNN model comprises multiple layers of neuron networks connected between layers by weighted edges. The input layer of the DNN model is two images, corresponding to two image windows of the same size cut from the upper viewpoint omnidirectional image and the lower viewpoint omnidirectional image respectively, and the output layer of the DNN model is a floating point number between 0 and 1. In the embodiment of the invention, when the DNN model is trained, each training sample is an image pair with a ground-truth label value, the two images of the pair being two image windows of the same size captured from an upper viewpoint omnidirectional image and a lower viewpoint omnidirectional image respectively; when the two window images correspond to the same object in space and cover the same position range, the label value is 1, otherwise the label value is 0.
The DNN-based object image segmentation unit comprises a DNN model for segmenting the image, which divides the image into different non-overlapping regions, the different regions corresponding to different objects such as people, tables, road surfaces, bicycles, etc. Specifically, the DNN model comprises multiple layers of neuron networks connected between layers by weighted edges; the input layer of the model is an image, and the output layer is an image of the same size as the input image, in which each pixel is an integer value representing the category of an object, different integer values corresponding to different object categories.
The semantic depth image generation unit generates a depth image with semantics. Specifically, based on the segmentation result obtained by the DNN image segmentation, each segmented region in the image corresponds to an object, and the three-dimensional model of that object can be retrieved from a three-dimensional model database. From the depth image OImageD obtained by the stereo-matching-based depth image estimation unit and the depth information distribution of the object, the three-dimensional posture of the object in the image can be estimated; the three-dimensional model of the object is then projected onto the image according to this posture, which yields the depth information of each pixel in the image region, while the object category information of each pixel is also known. The resulting image is therefore called a depth image with semantics.
Further, for regions whose area is too small or which have no depth estimation, and for which the semantic depth image generation unit may not be able to produce a result, nearest-neighbor interpolation is performed, filling these regions with depth estimation values from their neighborhood. An omnidirectional dense depth image in which every pixel has a depth value can thus be generated as the output of the result output unit; that is, the information finally output by the deep learning network is an omnidirectional dense depth image in which every pixel has a depth value.
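A possible sketch of this nearest-neighbor fill is given below (Python with SciPy's Euclidean distance transform; this particular implementation choice is an assumption, the text only requires some neighborhood-based interpolation).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fill_missing_depth(depth):
    """Replace NaN pixels with the depth of the nearest pixel that has a
    depth estimate."""
    missing = np.isnan(depth)
    if not missing.any():
        return depth
    # indices of the nearest valid (non-missing) pixel for every location
    _, idx = distance_transform_edt(missing, return_indices=True)
    return depth[idx[0], idx[1]]
```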
Step 2022, determining one or two third omnidirectional videos according to the second omnidirectional video and the omnidirectional depth video.
The step 2022 specifically includes steps S1-S3 (not shown), wherein,
and step S1, determining the corresponding depth information of the first pixel point in the determined omnidirectional depth video, and determining a horizontal epipolar line according to the first pixel point.
And the first pixel point is positioned in the second omnidirectional video.
And step S2, determining a second pixel point according to the depth information and the horizontal epipolar line corresponding to the first pixel point in the determined omnidirectional depth video.
And step S3, the steps S1-S2 are repeated until a third omnidirectional video is obtained.
And the third omnidirectional video consists of all the determined second pixel points.
For the embodiment of the present invention, as shown in fig. 5a, for a pixel point P2 in the left viewpoint omnidirectional video, the object point P corresponding to this pixel lies on the ray determined by the line connecting the optical center C2 of the left viewpoint omnidirectional image and the pixel P2. Using the depth value in the omnidirectional depth video, the position of point P on this ray, that is, the position of point P in three-dimensional space, is known; point P is then projected onto the "right viewpoint video acquisition device" to obtain the pixel position P3 on the image plane of the "right viewpoint video acquisition device". For example, a virtual "right viewpoint video acquisition device" C3 is defined that has the same internal imaging parameters as the left viewpoint video acquisition device, including focal length, resolution and principal point. C3 is located on a straight line that passes through C2 and is perpendicular to the plane P-C2-P2, and the distance between C3 and C2 is a set display stereoscopic baseline length, which can be equal to the average interpupillary distance of human eyes or be adjusted according to the interpupillary distance of the user; the pixel color of P3 is set equal to the pixel color of P2.
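The forward projection just described can be sketched as follows (Python/NumPy). Modelling the displacement of the virtual device C3 as a simple shift along the X axis of the left viewpoint's coordinate system and using nearest-pixel rounding are simplifying assumptions relative to the geometric construction above.

```python
import numpy as np

def reproject_to_virtual_viewpoint(depth_img, color_img, baseline):
    """Back-project every pixel of the second (left-viewpoint) omnidirectional
    image to a 3D point using its depth, express the point in the frame of a
    virtual right viewpoint displaced by `baseline`, and draw its color at the
    re-projected pixel position.  Pixels without depth are skipped; the holes
    they leave are handled by the filling step described later."""
    h, w = depth_img.shape
    virtual = np.zeros_like(color_img)
    for r in range(h):
        for c in range(w):
            d = depth_img[r, c]
            if not np.isfinite(d):
                continue
            vert = (r / (h - 1)) * np.pi - np.pi / 2
            horiz = (c / (w - 1)) * 2.0 * np.pi
            ray = np.array([np.cos(vert) * np.cos(horiz),
                            np.cos(vert) * np.sin(horiz),
                            np.sin(vert)])
            point = ray * d                                   # object point P in C2's frame
            point_v = point - np.array([baseline, 0.0, 0.0])  # same point in C3's frame
            x, y, z = point_v / np.linalg.norm(point_v)
            new_vert = np.arcsin(np.clip(z, -1.0, 1.0))
            new_horiz = np.arctan2(y, x) % (2.0 * np.pi)
            rr = int(round((new_vert + np.pi / 2) / np.pi * (h - 1)))
            cc = int(round(new_horiz / (2.0 * np.pi) * (w - 1)))
            virtual[rr, cc] = color_img[r, c]
    return virtual
```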
Step 2022 may further include steps S4-S8 (not shown), wherein,
and step S4, determining a third pixel point and depth information corresponding to the third pixel point in the omnidirectional depth video.
And the third pixel point is positioned in the second omnidirectional video.
And step S5, determining the vertical stereo parallax according to the third pixel point and the depth information corresponding to the third pixel point in the omnidirectional depth video.
And step S6, determining a horizontal stereo parallax corresponding to the vertical stereo parallax according to the vertical stereo parallax.
And step S7, obtaining a fourth pixel point according to the horizontal stereo parallax and the third pixel point.
And step S8, the steps S4-S7 are repeated until a third omnidirectional video is obtained.
And the third omnidirectional video consists of all the determined fourth pixel points.
For example, the third pixel point is denoted p2, and the depth value corresponding to p2 in the depth image is D2. The vertical stereo disparity corresponding to this pixel point is calculated as D_UD(p2) = f * B_UD / D2, where f is the focal length of the video acquisition device and B_UD is the baseline length between the upper and lower video acquisition devices. Based on the vertical stereo disparity D_UD(p2) of the pixel point, the horizontal stereo disparity is calculated as D_LR(p2) = D_UD(p2) * (B_LR / B_UD), where B_LR denotes the baseline length between the left and right stereo image pair. According to D_LR(p2), the color of pixel p2 is rendered to the corresponding position of the right viewpoint omnidirectional image. B_LR can be set to the length of the average interpupillary distance of human eyes, or adjusted according to the interpupillary distance of the user. The above steps are then repeated until the virtual omnidirectional video is obtained, as shown in fig. 5b.
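A one-function sketch of the two disparity formulas above (plain Python; the symbol names follow the text and nothing else is assumed):

```python
def vertical_to_horizontal_disparity(depth_d2, f, b_ud, b_lr):
    """D_UD(p2) = f * B_UD / D2, then D_LR(p2) = D_UD(p2) * (B_LR / B_UD).
    f is the focal length, B_UD the vertical (upper/lower) baseline length,
    B_LR the horizontal (left/right) baseline length, D2 the depth of p2."""
    d_ud = f * b_ud / depth_d2        # vertical stereo disparity of p2
    d_lr = d_ud * (b_lr / b_ud)       # corresponding horizontal disparity
    return d_lr
```

The color of p2 would then be written D_LR(p2) pixels away along the row direction of the right viewpoint omnidirectional image.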
Some black hole regions into which no pixel projects effectively may exist in the virtual omnidirectional video generated by the above method, as shown in fig. 6: the objects corresponding to these regions are visible from the observation viewpoint of the virtual omnidirectional video but invisible from the observation viewpoint of the second omnidirectional video because they are occluded by foreground objects. In order to generate a complete virtual omnidirectional video, these hole regions need to be filled with image content; the resulting filled image is shown in fig. 7.
Optionally, after the step 2022, the method may further include: and performing hole filling processing on the determined third omnidirectional video to obtain the third omnidirectional video after the hole filling processing.
For the embodiment of the present invention, since some black hole regions that do not have any pixel to generate an effective projection may exist in the determined third omnidirectional video, the third omnidirectional video needs to be subjected to hole filling processing.
The step of performing hole filling processing on the determined third omnidirectional video to obtain a hole-filled third omnidirectional video includes steps S9-S13 (not labeled in the figure):
step S9, determining a first omnidirectional image and a second omnidirectional image corresponding to the first omnidirectional image.
Wherein the first omnidirectional image belongs to a first omnidirectional video and the second omnidirectional image belongs to a second omnidirectional video.
Step S10, capturing image windows with the same size from the first omnidirectional image and the second omnidirectional image, and obtaining a first window image and a second window image respectively.
Step S11, generating a third image corresponding to the second window image based on a generative adversarial network, the first window image and the second window image.
The generative adversarial network includes an encoding network that captures high-level semantic attributes and a decoding network that captures low-level image attributes.
Step S12, determining a frame image corresponding to the generated third image in the third omnidirectional video, and performing hole filling processing on the determined frame image.
And step S13, the steps S9-S12 are circulated until the hole filling processing of each frame of image in the third omnidirectional video is completed.
The step of performing hole filling processing on the determined third omnidirectional video to obtain a hole-filled third omnidirectional video includes: determining filling strategies corresponding to each frame of image to be subjected to hole filling processing in the determined third omnidirectional video respectively; and performing hole filling processing according to a filling strategy to obtain a third omnidirectional video after the hole filling processing.
Further, the step of determining the filling policy corresponding to each frame of image to be subjected to hole filling processing in the determined third omnidirectional video may specifically include: inputting a preset number of frames preceding each frame of image to be subjected to hole filling processing in the determined third omnidirectional video into the generative adversarial network, so as to obtain the filling policy corresponding to each frame of image to be subjected to hole filling processing in the third omnidirectional video.
For the embodiment of the invention, a simplified image filling method is to select the nearest pixel from the periphery of the hole and directly copy the color of the nearest pixel into the hole.
For example, one particular method may employ the following steps (a sketch in code is given after them):
(1) a row of pixels in a hole region is selected, and the left and right boundary pixels of that row of pixels are found. The boundary pixel that is farther from the video acquisition device is determined according to the depth information, and its brightness value is assigned to all pixel values in that row of pixels.
(2) The operation of step (1) is performed on every row of every hole region in the image.
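A hedged Python/NumPy sketch of this row-by-row filling (the treatment of hole runs touching the image border is a simplifying assumption):

```python
import numpy as np

def fill_holes_row_wise(image, depth, hole_mask):
    """For each run of hole pixels in a row, take the left and right boundary
    pixels, pick the one that is farther from the capture device according to
    the depth map, and copy its value across the run."""
    filled = image.copy()
    h, w = hole_mask.shape
    for r in range(h):
        c = 0
        while c < w:
            if not hole_mask[r, c]:
                c += 1
                continue
            start = c
            while c < w and hole_mask[r, c]:
                c += 1
            end = c                              # hole run is [start, end)
            left, right = start - 1, end         # boundary pixel columns
            if left < 0 and right >= w:
                continue                         # the whole row is a hole
            if left < 0:
                src = right
            elif right >= w:
                src = left
            else:
                src = left if depth[r, left] >= depth[r, right] else right
            filled[r, start:end] = image[r, src]
    return filled
```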
The embodiment of the invention also provides a filling mode based on a deep neural network model, which adopts a network structure similar to that of a generative adversarial network (GAN).
The GAN model comprises multiple layers of neuron networks connected between layers by weighted edges. The first half of the network, close to the input layer, has a structure in which the number of neurons per layer gradually decreases; this is the encoding network, which learns features with high-level semantic attributes in the image (such as object categories and properties). The second half of the network, close to the output layer, has a structure in which the number of neurons per layer gradually increases; this is the decoding network, which learns features with low-level image attributes in the image (such as image colors and textures).
The input layer of the model is two images which respectively correspond to two image windows with the same size cut from the upper viewpoint omnidirectional image and the lower viewpoint omnidirectional image. The output layer of the model is an image with the same size as the input image, and the image is a right view point omnidirectional image corresponding to an image window in the lower view point omnidirectional image. When the method is used, an image area corresponding to a hole area in the generated right viewpoint omnidirectional image is filled into the hole area, wherein the upper viewpoint omnidirectional image belongs to the upper viewpoint omnidirectional video, and the lower viewpoint image belongs to the lower viewpoint omnidirectional video.
When the model is trained, the input of each group of training samples is an upper viewpoint omnidirectional image and a lower viewpoint omnidirectional image, and the output is a right viewpoint omnidirectional image. There are two methods for generating training samples:
in the method 1, three video acquisition devices are used for shooting training images, specifically, the three video acquisition devices are located in the same vertical direction and are arranged at the upper, lower and right positions, and the three video acquisition devices are fixed by a mechanical device, as shown in fig. 8. The upper and lower video acquisition devices form an upper and lower stereo image pair, the lower and right video acquisition devices form a left and right stereo image pair, and the device is placed in various actual world environments to shoot training images.
And 2, generating a training image through technical simulation of computer graphics, and particularly arranging three virtual video acquisition devices in a computer three-dimensional model world. The three virtual video acquisition devices are positioned in the same vertical direction, and the positions are arranged to be the upper side, the lower side and the right side. The upper and lower video capture devices form an upper and lower stereo image pair, and the lower and right video capture devices form a left and right stereo image pair, as shown in fig. 9.
For the embodiment of the present invention, when training the generative adversarial network, video training data are generated using a device or a computer simulation environment similar to those described above for the image hole filling unit, where each set of video training data includes: an upper viewpoint omnidirectional video, a lower viewpoint omnidirectional video and a right viewpoint omnidirectional video.
The embodiment of the invention maintains an image filling method set, which comprises a plurality of image filling methods, such as image-neighborhood-based filling methods and GAN-based filling methods.
The image-neighborhood-based filling methods may have various variants, for example: filling row by row and/or column by column, and using color copying and/or texture copying during filling.
The GAN-based filling methods may likewise have various variants, for example: training data for different scenes and depth distributions may be used during training, so that the trained GAN models exhibit different filling behaviors.
The embodiment of the invention further provides a video hole filling method which, similarly to a reinforcement learning method, learns a video image hole filling strategy: when the hole in each frame of image in a video sequence is filled, an optimal filling method is selected from the image filling method set according to the characteristics of the hole region images of several frames preceding that frame, so that the filled video has visual continuity in the time domain.
Specifically, let S be the feature of the hole region images of the several frames preceding the current frame, let a be a filling method from the image filling method set, and let Q(S, a) denote the estimated value of video continuity obtained by applying filling method a under feature S. r(S, a) is the immediate return of taking that action; for example, r(S, a) may be computed as the image difference obtained by comparing the image at time t, after it has been filled by method a, with the image at time t-1, where the two images are first registered using the image regions with large depth, and the color differences over the filled region are then accumulated pixel by pixel and averaged. Let v be a discount factor, 0 < v < 1.
Wherein, the steps of the learning process are as follows:
(1) initialize Q(S, a) = 0 for every combination of S and a;
(2) obtain the feature S of the current moment;
(3) repeat the following steps a)-e) until the training video ends:
a) select the method a0 that maximizes Q(S, a);
b) fill the image hole region using method a0 and calculate r(S, a0);
c) obtain the feature S' of the next moment after filling;
d) update Q(S, a0) = r(S, a0) + v · max_a{Q(S', a)};
e) let S = S'.
For the embodiment of the present invention, the holes in the video are filled by using the policy Q (S, a) obtained by the learning.
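The learning loop (1)-(3) can be sketched as a small tabular procedure (Python). `features`, `apply_fill` and `reward` are assumed callables supplied by the surrounding system, and the state representation is left abstract (it must be hashable here); this is an illustration of the update rule, not the claimed implementation.

```python
def learn_fill_policy(frames, methods, features, apply_fill, reward, v=0.9):
    """Q(S, a) starts at zero; for each frame the method a0 maximising
    Q(S, a) under the current feature S is applied, the immediate return
    r(S, a0) measuring temporal continuity is computed, and Q is updated
    with the discounted best value of the next state, as in step d)."""
    q = {}                                            # Q table, default 0
    s = features(frames, 0)
    for t in range(1, len(frames)):
        a0 = max(methods, key=lambda a: q.get((s, a), 0.0))
        apply_fill(frames, t, a0)                     # fill the holes of frame t
        r = reward(frames, t, a0)                     # e.g. based on image difference
        s_next = features(frames, t)
        best_next = max(q.get((s_next, a), 0.0) for a in methods)
        q[(s, a0)] = r + v * best_next                # Q(S, a0) = r(S, a0) + v*max_a Q(S', a)
        s = s_next
    return q
```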
For the embodiment of the present invention, in some application scenarios, for example when a user shoots video while moving, the captured first omnidirectional video and second omnidirectional video need to be processed in order to output a smoother omnidirectional video. A specific processing manner is given in step 301 (not shown in the figure), wherein,
and 301, stabilizing the second omnidirectional video and/or the determined third omnidirectional video.
For the embodiment of the present invention, step 301 may include two cases:
case 1: and if only one third omnidirectional video is generated, stabilizing the second omnidirectional video and the third omnidirectional video.
Case 2: and if two third omnidirectional videos are generated, stabilizing the generated third omnidirectional videos.
Wherein, step 301 may specifically include step 3011 (not labeled in the figure):
step 3011, rendering the second omnidirectional video and/or the determined third omnidirectional video to a video stabilization target track to obtain a stable second omnidirectional video and/or a stable third omnidirectional video.
The method for determining the video stabilization target track comprises the following steps: according to the omnidirectional depth video, determining the position information of the video acquisition equipment relative to the three-dimensional environment model at each moment during its motion; determining the three-dimensional running track of the video acquisition equipment in the world coordinate system according to that position information; and filtering the three-dimensional running track to obtain the video stabilization target track.
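As an illustration of the final filtering step only (the choice of a moving-average filter and its window length are assumptions; any low-pass filter over the three-dimensional running track would serve), a Python/NumPy sketch:

```python
import numpy as np

def stabilization_target_track(camera_positions, window=15):
    """Smooth the capture device's 3D running track in the world coordinate
    system with a moving-average filter to obtain the video stabilization
    target track."""
    positions = np.asarray(camera_positions, dtype=float)   # shape (T, 3)
    kernel = np.ones(window) / window
    smoothed = np.column_stack([
        np.convolve(positions[:, k], kernel, mode="same") for k in range(3)
    ])
    return smoothed
```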
For the embodiment of the present invention, the hole filling processing may be performed on the third omnidirectional video before step 3011, or may be performed on the third omnidirectional video after step 3011. The present invention is not limited to the embodiments.
The way of performing hole filling processing is the same as that of performing hole filling processing in the above embodiments, and is not described herein again.
The embodiment of the invention provides a video processing method. Compared with the existing approach, the embodiment of the invention acquires two omnidirectional videos having a stereoscopic parallax in a first direction, namely a first omnidirectional video and a second omnidirectional video, and then determines a third omnidirectional video according to the first omnidirectional video and the second omnidirectional video, the second omnidirectional video and the third omnidirectional video having a stereoscopic parallax in a second direction. That is, only two omnidirectional videos with a stereoscopic parallax in the first direction need to be acquired; through the conversion from omnidirectional video with a stereoscopic parallax in the first direction to omnidirectional video with a stereoscopic parallax in the second direction, a third omnidirectional video lying in the same row direction as the second omnidirectional video, or two third omnidirectional videos lying in the same row direction, can be obtained. Combining the second omnidirectional video with the third omnidirectional video, or combining the two third omnidirectional videos, then presents a three-dimensional omnidirectional video effect to the user, which provides the possibility of and precondition for such an effect. Meanwhile, only two omnidirectional video acquisition devices are needed to complete the video acquisition, so the size and cost of the omnidirectional video acquisition equipment can be greatly reduced; based on characteristics such as portability, small size and low cost, the application scenarios in which the omnidirectional video acquisition equipment can be used are increased, thereby improving the user experience.
Example two
An embodiment of the present invention provides a video processing apparatus, as shown in fig. 10, including: an obtaining module 1001 and a determining module 1002, wherein,
an obtaining module 1001 is configured to obtain a first omnidirectional video and a second omnidirectional video.
The first omnidirectional video and the second omnidirectional video have a stereoscopic parallax in a first direction, and the first direction is a column direction corresponding to the first omnidirectional video and the second omnidirectional video after being unfolded according to the longitude and latitude.
A determining module 1002, configured to determine one or two third omnidirectional videos according to the first omnidirectional video and the second omnidirectional video.
If a third omnidirectional video is determined, the second omnidirectional video and the third omnidirectional video have a stereoscopic parallax in a second direction; if the two third omnidirectional videos are determined, the two third omnidirectional videos have stereoscopic parallax in the second direction; the second direction is a corresponding row direction of the first omnidirectional video and the second omnidirectional video after being unfolded according to the longitude and latitude.
The embodiment of the invention provides a video processing device. Compared with the existing approach, the embodiment of the invention acquires two omnidirectional videos having a stereoscopic parallax in a first direction, namely a first omnidirectional video and a second omnidirectional video, and then determines a third omnidirectional video according to the first omnidirectional video and the second omnidirectional video, the second omnidirectional video and the third omnidirectional video having a stereoscopic parallax in a second direction. That is, only two omnidirectional videos with a stereoscopic parallax in the first direction need to be acquired; by converting omnidirectional video with a stereoscopic parallax in the first direction into omnidirectional video with a stereoscopic parallax in the second direction, a third omnidirectional video lying in the same row direction as the second omnidirectional video, or two third omnidirectional videos lying in the same row direction, can be obtained. Combining the second omnidirectional video with the third omnidirectional video, or combining the two third omnidirectional videos, then presents a three-dimensional omnidirectional video effect to the user, which provides the possibility of and precondition for such an effect. Meanwhile, only two omnidirectional video acquisition devices are needed to complete the video acquisition, so the size and cost of the omnidirectional video acquisition equipment can be greatly reduced; based on characteristics such as portability, small size and low cost, the application scenarios in which the omnidirectional video acquisition equipment can be used are increased, thereby improving the user experience.
The embodiment of the present invention provides a video processing apparatus, which can implement the method embodiment provided above, and for specific function implementation, reference is made to the description in the method embodiment, and details are not repeated here.
Those skilled in the art will appreciate that the invention includes apparatus relating to performing one or more of the operations herein. These devices may be specially designed and manufactured for the required purposes, or they may comprise known devices in general-purpose computers. These devices have stored therein computer programs that are selectively activated or reconfigured. Such a computer program may be stored in a device (e.g., computer) readable medium, including, but not limited to, any type of disk including floppy disks, hard disks, optical disks, CD-ROMs, and magnetic-optical disks, ROMs (Read-Only memories), RAMs (Random Access memories), EPROMs (Erasable Programmable Read-Only memories), EEPROMs (Electrically Erasable Programmable Read-Only memories), flash memories, magnetic cards, or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a bus. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
Those skilled in the art will understand that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, implement the features specified in one or more blocks of the block diagrams and/or flowcharts of the present disclosure.
Those skilled in the art will appreciate that the various operations, methods, steps, acts, or solutions discussed in the present application may be interchanged, changed, combined, or deleted. Further, the various operations, methods, and steps in the flows discussed in the present application may also be interchanged, changed, rearranged, decomposed, combined, or deleted. Further, the steps, measures, and schemes in the various operations, methods, and flows disclosed in the prior art and in the present invention may likewise be interchanged, changed, rearranged, decomposed, combined, or deleted.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (33)

1. A method of video processing, comprising:
acquiring a first omnidirectional video and a second omnidirectional video, wherein the first omnidirectional video and the second omnidirectional video have a stereoscopic parallax in a first direction, and the first direction is the column direction of the first omnidirectional video and the second omnidirectional video after they are unfolded according to longitude and latitude;
determining an omnidirectional depth video according to the first omnidirectional video and the second omnidirectional video; determining one or two third omnidirectional videos according to the second omnidirectional video and the omnidirectional depth video, wherein if one third omnidirectional video is determined, the second omnidirectional video and the third omnidirectional video have a stereoscopic parallax in a second direction; if two third omnidirectional videos are determined, the two third omnidirectional videos have a stereoscopic parallax in the second direction; and the second direction is the row direction of the first omnidirectional video and the second omnidirectional video after they are unfolded according to longitude and latitude.
2. The method of claim 1, wherein, after the step of determining one or two third omnidirectional videos, the method further comprises:
and performing hole filling processing on the determined third omnidirectional video to obtain the third omnidirectional video after the hole filling processing.
3. The method of claim 1 or 2, wherein the step of acquiring the first omnidirectional video and the second omnidirectional video further comprises:
correcting the first omnidirectional video and the second omnidirectional video.
4. The method of claim 3, wherein the step of correcting the first omni-directional video and the second omni-directional video comprises:
determining position and attitude error parameters of video acquisition equipment corresponding to the first omnidirectional video and the second omnidirectional video according to the first omnidirectional video and the second omnidirectional video;
determining correction parameters according to the position and attitude error parameters;
and correcting the first omnidirectional video and the second omnidirectional video according to the correction parameters.
5. The method of claim 4, further comprising:
and synchronizing time stamps corresponding to the first omnidirectional video and the second omnidirectional video respectively.
6. The method of claim 1, wherein, after the step of determining one or two third omnidirectional videos, the method further comprises:
and enhancing the resolution corresponding to the second omnidirectional video and/or the determined third omnidirectional video.
7. The method of claim 1, wherein the step of determining an omnidirectional depth video from the first omnidirectional video and the second omnidirectional video comprises:
and determining the omnidirectional depth video according to the first omnidirectional video and the second omnidirectional video and through a trained deep learning network.
8. The method of claim 7, wherein the step of determining the omnidirectional depth video from the first omnidirectional video and the second omnidirectional video and through a trained deep learning network comprises:
determining pixel points matched with each pixel point in the first omnidirectional video in the second omnidirectional video based on the deep learning network;
determining depth information corresponding to each matched pair of pixel points;
semantic labeling is carried out on each pixel point in the second omnidirectional video based on the deep learning network;
and determining the omnidirectional depth video according to the depth information corresponding to each pair of matched pixel points and the semantic annotation information corresponding to each pixel point in the second omnidirectional video.
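As a purely illustrative note on this claim: for two omnidirectional cameras stacked vertically, a matched pixel pair shares the same longitude, and the difference between the two latitude (elevation) angles fixes the depth by triangulation. The sketch below shows that geometry only; the trained deep learning network, the matching step, and the semantic labeling recited in the claim are not reproduced here, and the function name and baseline value are assumptions introduced for illustration.

```python
import math

def depth_from_vertical_disparity(phi_bottom, phi_top, baseline):
    """Triangulate the distance to a scene point from a matched pixel pair in
    two equirectangular images captured by vertically stacked omnidirectional
    cameras (bottom camera at the origin, top camera `baseline` above it).

    phi_bottom, phi_top: elevation angles (radians) of the matched pixels in
                         the bottom and top images (same longitude column).
    Returns the slant range from the bottom camera to the scene point.
    """
    # Both cameras see the point at the same horizontal distance d, and
    # tan(phi_bottom) - tan(phi_top) = baseline / d.
    denom = math.tan(phi_bottom) - math.tan(phi_top)
    if abs(denom) < 1e-9:            # zero disparity: point at infinity
        return float("inf")
    d = baseline / denom
    return d / math.cos(phi_bottom)  # horizontal distance -> slant range

# Example: the bottom camera sees a point 12 degrees above the horizon, the
# top camera (0.1 m higher) sees it at 10 degrees.
r = depth_from_vertical_disparity(math.radians(12), math.radians(10), 0.1)
```

Evaluating this relation for every matched pixel pair yields an omnidirectional depth image for the frame.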
9. The method of claim 1, wherein the step of determining a third omnidirectional video from the second omnidirectional video and the omnidirectional depth video comprises:
step S1, determining depth information corresponding to a first pixel point in the determined omnidirectional depth video, and determining a horizontal epipolar line according to the first pixel point, wherein the first pixel point is located in the second omnidirectional video;
step S2, determining a second pixel point according to the depth information corresponding to the first pixel point in the determined omnidirectional depth video and the horizontal epipolar line;
and S3, repeating the steps S1-S2 until a third omnidirectional video is obtained, wherein the third omnidirectional video is composed of all the determined second pixel points.
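To illustrate the loop structure of steps S1-S3 (and why the hole filling of claim 2 is needed afterwards), the sketch below forward-warps every pixel of one frame of the second omnidirectional video onto its target position. The horizontal epipolar line is approximated here by the pixel's own image row, and a first-order disparity formula is used; the exact geometry, the baseline value, and the function name are illustrative assumptions, not part of the claim.

```python
import numpy as np

def forward_warp_equirect(src, depth, baseline):
    """Sketch of steps S1-S3: each pixel of the equirectangular frame `src`
    (H x W x 3) is copied to a position on its approximated horizontal
    epipolar line (here: the same image row), at a column offset chosen from
    the pixel's depth.  Output pixels that receive no source pixel stay zero;
    these are the holes later filled per claim 2.

    depth:    H x W array of scene distances (same unit as `baseline`).
    baseline: horizontal displacement of the virtual viewpoint.
    """
    h, w, _ = src.shape
    out = np.zeros_like(src)

    # Viewing direction of every pixel in the equirectangular grid.
    phi = np.linspace(np.pi / 2, -np.pi / 2, h).reshape(h, 1)             # elevation per row
    theta = np.linspace(-np.pi, np.pi, w, endpoint=False).reshape(1, w)   # azimuth per column

    # First-order horizontal angular disparity for a sideways baseline:
    # delta_theta ~ baseline * sin(theta) / (depth * cos(phi)).
    delta = baseline * np.sin(theta) / np.maximum(depth * np.cos(phi), 1e-6)
    cols = (np.arange(w).reshape(1, w)
            + np.rint(delta * w / (2 * np.pi)).astype(int)) % w
    rows = np.repeat(np.arange(h).reshape(h, 1), w, axis=1)

    out[rows, cols] = src
    return out
```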
10. The method of claim 1, wherein the step of determining a third omnidirectional video from the second omnidirectional video and the omnidirectional depth video comprises:
step S4, determining a third pixel point and depth information corresponding to the third pixel point in the omnidirectional depth video, wherein the third pixel point is located in the second omnidirectional video;
step S5, determining a vertical stereo parallax according to the third pixel point and the depth information corresponding to the third pixel point in the omnidirectional depth video;
step S6, according to the vertical stereo parallax, determining a horizontal stereo parallax corresponding to the vertical stereo parallax;
step S7, obtaining a fourth pixel point according to the horizontal stereo parallax and the third pixel point;
and S8, repeating the steps S4-S7 until the third omnidirectional video is obtained, wherein the third omnidirectional video is composed of all the determined fourth pixel points.
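The following sketch spells out one possible reading of steps S4-S7 for a single pixel. The claim phrases step S6 as converting the vertical stereoscopic parallax into the corresponding horizontal stereoscopic parallax; since both parallaxes are determined by the same depth value, the sketch computes them side by side from that depth. The baselines b_vertical and b_horizontal, the angle conventions, and the function name are assumptions made for illustration only.

```python
import math

def vertical_to_horizontal_parallax(theta, phi, depth, b_vertical, b_horizontal):
    """Steps S4-S7 for one pixel of the second (bottom) omnidirectional image.

    theta: azimuth (radians), measured from the horizontal baseline axis.
    phi:   elevation (radians) above the horizon.
    depth: slant range from the bottom camera to the scene point.
    Returns (theta_new, phi_new, vertical_parallax, horizontal_parallax),
    where (theta_new, phi_new) is the direction of the "fourth pixel" in the
    synthesized view.
    """
    d = depth * math.cos(phi)            # horizontal distance to the point
    h = depth * math.sin(phi)            # height of the point above the camera

    # Step S5: vertical parallax w.r.t. the camera displaced upwards by b_vertical.
    phi_top = math.atan2(h - b_vertical, d)
    vertical_parallax = phi - phi_top

    # Step S6: horizontal parallax w.r.t. a virtual camera displaced sideways
    # by b_horizontal along the azimuth-zero axis.
    theta_new = math.atan2(d * math.sin(theta), d * math.cos(theta) - b_horizontal)
    horizontal_parallax = theta_new - theta

    # Step S7: elevation seen from the displaced camera (the point's height is
    # unchanged, but its horizontal distance is not).
    d_new = math.hypot(d * math.cos(theta) - b_horizontal, d * math.sin(theta))
    phi_new = math.atan2(h, d_new)

    return theta_new, phi_new, vertical_parallax, horizontal_parallax
```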
11. The method according to claim 2, wherein the step of performing hole filling processing on the determined third omnidirectional video to obtain a hole-filled third omnidirectional video comprises:
step S9, determining a first omnidirectional image and a second omnidirectional image corresponding to the first omnidirectional image, where the first omnidirectional image belongs to the first omnidirectional video and the second omnidirectional image belongs to the second omnidirectional video;
step S10, capturing image windows with the same size from the first omnidirectional image and the second omnidirectional image, and obtaining a first window image and a second window image respectively;
step S11, generating a third image corresponding to the second window image based on an adversarial generation network, wherein the adversarial generation network comprises an encoding network with high-level semantic attributes and a decoding network with low-level image attributes;
step S12, determining a frame image corresponding to the generated third image in the determined third omnidirectional video, and performing hole filling processing on the determined frame image;
and step S13, looping steps S9-S12 until the hole filling processing of each frame of image in the determined third omnidirectional video is completed.
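As an illustrative companion to steps S9-S13, the sketch below shows a minimal encoder-decoder generator and how its output could be pasted into the hole region of one frame. It is a toy, untrained model: the layer sizes, the window size, the function names, and the omission of the discriminator and of adversarial training are all simplifications not fixed by the claim.

```python
import torch
import torch.nn as nn

class WindowGenerator(nn.Module):
    """Minimal encoder-decoder generator in the spirit of step S11: the
    encoder compresses a window image to a high-level semantic code and the
    decoder expands that code back to pixel space.  Illustrative sizes only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def fill_frame_holes(frame, hole_mask, window, generator):
    """Step S12 in miniature: replace the masked pixels of `frame` with the
    generator's prediction for the corresponding window image.
    frame, window: float tensors of shape (1, 3, H, W), values in [0, 1].
    hole_mask:     bool tensor of shape (1, 1, H, W), True where data is missing.
    """
    with torch.no_grad():
        predicted = generator(window)
    return torch.where(hole_mask, predicted, frame)

# Usage sketch with random data (shape checking only; the generator is untrained,
# and the frame itself stands in for the window image).
gen = WindowGenerator()
frame = torch.rand(1, 3, 64, 128)
mask = torch.zeros(1, 1, 64, 128, dtype=torch.bool)
mask[..., :, 60:70] = True                     # a vertical band of missing pixels
filled = fill_frame_holes(frame, mask, frame, gen)
```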
12. The method according to claim 2, wherein the step of performing hole filling processing on the determined third omnidirectional video to obtain a hole-filled third omnidirectional video comprises:
determining filling strategies corresponding to each frame of image to be subjected to hole filling processing in the determined third omnidirectional video respectively;
and carrying out hole filling processing according to the filling strategy to obtain a third omnidirectional video after the hole filling processing.
13. The method according to claim 12, wherein the step of determining the filling strategy corresponding to each frame of image to be hole-filled in the determined third omni-directional video comprises:
inputting a preset number of frames preceding each frame of image to be subjected to hole filling processing in the determined third omnidirectional video into an adversarial generation network, to obtain the filling strategy corresponding to each frame of image to be subjected to hole filling processing in the third omnidirectional video.
14. The method of any one of claims 1-2, 4-13, further comprising:
and performing stabilization processing on the second omnidirectional video and/or the determined third omnidirectional video.
15. The method according to claim 14, wherein the step of stabilizing the second omnidirectional video and/or the determined third omnidirectional video comprises:
and rendering the second omnidirectional video and/or the determined third omnidirectional video to a video stabilization target track to obtain a stable second omnidirectional video and/or a stable third omnidirectional video.
16. The method of claim 15, wherein determining the manner in which the video stabilizes the target track comprises:
according to the omnidirectional depth video, determining the position information of the three-dimensional environment model corresponding to the video acquisition equipment at each moment in the motion process;
determining a three-dimensional running track of the video acquisition equipment in a world coordinate system according to the position information of the three-dimensional environment model corresponding to each moment of the video acquisition equipment in the motion process;
and filtering the three-dimensional running track to obtain the stable target track of the video.
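A small illustration of the last step of this claim: once the camera's three-dimensional running track in world coordinates has been recovered (that recovery is not shown here), the video stabilization target track can be obtained by low-pass filtering the track. The Gaussian filter and the sigma value below are one possible choice, not a choice made by the patent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def stabilization_target_track(camera_positions, sigma=8.0):
    """Low-pass filter a recovered 3D camera trajectory to obtain a smooth
    target track onto which the omnidirectional video can be re-rendered.

    camera_positions: (N, 3) array, camera position in world coordinates at
                      each time step.
    sigma:            temporal smoothing strength, in frames (example value).
    """
    positions = np.asarray(camera_positions, dtype=float)
    # Smooth each coordinate independently along the time axis.
    return gaussian_filter1d(positions, sigma=sigma, axis=0)

# Usage sketch: a jittery spiral path becomes a smooth target track.
t = np.linspace(0.0, 2.0 * np.pi, 300)
noisy = np.stack([np.cos(t), np.sin(t), 0.05 * t], axis=1)
noisy += 0.02 * np.random.randn(*noisy.shape)
smooth = stabilization_target_track(noisy)
```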
17. An apparatus for video processing, comprising:
an obtaining module, configured to obtain a first omnidirectional video and a second omnidirectional video, wherein the first omnidirectional video and the second omnidirectional video have a stereoscopic parallax in a first direction, and the first direction is the column direction of the first omnidirectional video and the second omnidirectional video after they are unfolded according to longitude and latitude;
a determining module, configured to determine an omnidirectional depth video according to the first omnidirectional video and the second omnidirectional video, and determine one or two third omnidirectional videos according to the second omnidirectional video and the omnidirectional depth video, wherein if one third omnidirectional video is determined, the second omnidirectional video and the third omnidirectional video have a stereoscopic parallax in a second direction; if two third omnidirectional videos are determined, the two third omnidirectional videos have a stereoscopic parallax in the second direction; and the second direction is the row direction of the first omnidirectional video and the second omnidirectional video after they are unfolded according to longitude and latitude.
18. The apparatus of claim 17, further comprising:
and the hole filling module is used for carrying out hole filling processing on the determined third omnidirectional video to obtain the third omnidirectional video after the hole filling processing.
19. The apparatus of any one of claims 17 or 18, further comprising:
and the correction module is used for correcting the first omnidirectional video and the second omnidirectional video.
20. The apparatus of claim 19, wherein the correction module is specifically configured to:
determining position and attitude error parameters of video acquisition equipment corresponding to the first omnidirectional video and the second omnidirectional video according to the first omnidirectional video and the second omnidirectional video; determining correction parameters according to the position and attitude error parameters; and correcting the first omnidirectional video and the second omnidirectional video according to the correction parameters.
21. The apparatus of claim 20, further comprising:
and the synchronization module is used for synchronizing the timestamps corresponding to the first omnidirectional video and the second omnidirectional video respectively.
22. The apparatus of claim 17, further comprising:
and the enhancing module is used for enhancing the resolution corresponding to the second omnidirectional video and/or the determined third omnidirectional video.
23. The apparatus of claim 17, wherein the determining module is specifically configured to:
and determining the omnidirectional depth video according to the first omnidirectional video and the second omnidirectional video and through a trained deep learning network.
24. The apparatus of claim 23, wherein the determining module is specifically configured to:
determining pixel points matched with each pixel point in the first omnidirectional video in the second omnidirectional video based on the deep learning network; determining depth information corresponding to each matched pair of pixel points; semantic labeling is carried out on each pixel point in the second omnidirectional video based on the deep learning network; and determining the omnidirectional depth video according to the depth information corresponding to each pair of matched pixel points and the semantic annotation information corresponding to each pixel point in the second omnidirectional video.
25. The apparatus of claim 17, wherein the determining module is specifically configured to:
step S1, determining depth information corresponding to a first pixel point in the determined omnidirectional depth video, and determining a horizontal epipolar line according to the first pixel point, wherein the first pixel point is located in the second omnidirectional video;
step S2, determining a second pixel point according to the depth information corresponding to the first pixel point in the determined omnidirectional depth video and the horizontal epipolar line;
and S3, repeating the steps S1-S2 until a third omnidirectional video is obtained, wherein the third omnidirectional video is composed of all the determined second pixel points.
26. The apparatus of claim 17, wherein the determining module is specifically configured to:
step S4, determining a third pixel point and depth information corresponding to the third pixel point in the omnidirectional depth video, wherein the third pixel point is located in the second omnidirectional video;
step S5, determining a vertical stereo parallax according to the third pixel point and the depth information corresponding to the third pixel point in the omnidirectional depth video;
step S6, according to the vertical stereo parallax, determining a horizontal stereo parallax corresponding to the vertical stereo parallax;
step S7, obtaining a fourth pixel point according to the horizontal stereo parallax and the third pixel point;
and S8, repeating the steps S4-S7 until the third omnidirectional video is obtained, wherein the third omnidirectional video is composed of all the determined fourth pixel points.
27. The apparatus of claim 18, wherein the hole filling module is specifically configured to:
step S9, determining a first omnidirectional image and a second omnidirectional image corresponding to the first omnidirectional image, where the first omnidirectional image belongs to the first omnidirectional video and the second omnidirectional image belongs to the second omnidirectional video;
step S10, capturing image windows with the same size from the first omnidirectional image and the second omnidirectional image, and obtaining a first window image and a second window image respectively;
step S11, generating a third image corresponding to the second window image based on an adversarial generation network, wherein the adversarial generation network comprises an encoding network with high-level semantic attributes and a decoding network with low-level image attributes;
step S12, determining a frame image corresponding to the generated third image in the determined third omnidirectional video, and performing hole filling processing on the determined frame image;
and step S13, looping steps S9-S12 until the hole filling processing of each frame of image in the determined third omnidirectional video is completed.
28. The apparatus of claim 18, wherein the hole filling module is specifically configured to:
determining filling strategies corresponding to each frame of image to be subjected to hole filling processing in the determined third omnidirectional video respectively; and carrying out hole filling processing according to the filling strategy to obtain a third omnidirectional video after the hole filling processing.
29. The apparatus of claim 28, wherein the hole filling module is specifically configured to:
inputting a preset number of frames preceding each frame of image to be subjected to hole filling processing in the determined third omnidirectional video into an adversarial generation network, to obtain the filling strategy corresponding to each frame of image to be subjected to hole filling processing in the third omnidirectional video.
30. The apparatus of any one of claims 17-18, 20-29, further comprising:
and the stabilizing module is used for stabilizing the second omnidirectional video and/or the determined third omnidirectional video.
31. The apparatus of claim 30, wherein the stabilization module is specifically configured to:
and rendering the second omnidirectional video and/or the determined third omnidirectional video to a video stabilization target track to obtain a stable second omnidirectional video and/or a stable third omnidirectional video.
32. The apparatus of claim 31, wherein the stabilization module is specifically configured to:
according to the omnidirectional depth video, determining the position information of the three-dimensional environment model corresponding to the video acquisition equipment at each moment in the motion process;
determining a three-dimensional running track of the video acquisition equipment in a world coordinate system according to the position information of the three-dimensional environment model corresponding to each moment of the video acquisition equipment in the motion process;
and filtering the three-dimensional running track to obtain the stable target track of the video.
33. An electronic device, comprising: a processor and a memory, the memory storing a program, the processor executing the program to implement the method of any of claims 1-16.
CN201710347063.8A 2017-05-16 2017-05-16 Video processing method and device Active CN109246415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710347063.8A CN109246415B (en) 2017-05-16 2017-05-16 Video processing method and device


Publications (2)

Publication Number Publication Date
CN109246415A CN109246415A (en) 2019-01-18
CN109246415B CN109246415B (en) 2021-12-03

Family

ID=65082943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710347063.8A Active CN109246415B (en) 2017-05-16 2017-05-16 Video processing method and device

Country Status (1)

Country Link
CN (1) CN109246415B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409374A (en) * 2021-07-12 2021-09-17 东南大学 Character video alignment method based on motion registration

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869201A (en) * 2016-03-25 2016-08-17 北京全景思维科技有限公司 Method and device for achieving smooth switching of panoramic views in panoramic roaming
CN106534832A (en) * 2016-11-21 2017-03-22 深圳岚锋创视网络科技有限公司 Stereoscopic image processing method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335748B (en) * 2014-08-07 2018-10-12 株式会社理光 Image characteristic extracting method and system




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant