CN114697758A - Video processing method and device and electronic equipment - Google Patents


Info

Publication number
CN114697758A
Authority
CN
China
Prior art keywords
video
video frame
target
frames
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011614555.7A
Other languages
Chinese (zh)
Other versions
CN114697758B (en)
Inventor
朱韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202011614555.7A
Priority to PCT/CN2021/129870 (WO2022142757A1)
Publication of CN114697758A
Application granted
Publication of CN114697758B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
                    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
                        • H04N13/106 Processing image signals
                            • H04N13/161 Encoding, multiplexing or demultiplexing different image signal components
                    • H04N13/30 Image reproducers
                        • H04N13/302 Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays
                • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
                    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
                        • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
                • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
                        • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                            • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
                                • H04N21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
                    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
                        • H04N21/81 Monomedia components thereof
                            • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video

Abstract

The application provides a video processing method and apparatus and an electronic device. The method includes: acquiring a first video frame sequence and a second video frame sequence corresponding respectively to two original video sources used for 3D presentation; alternately selecting, according to the identification information of the video frames, target video frames with complementary identification information from the first video frame sequence and the second video frame sequence, to obtain a first target video frame set and a second target video frame set; and cross-merging the target video frames in the two sets in the order of their identification information to obtain a target video frame sequence. By cross-merging the video frames of the two video sources, a video frame sequence with a naked-eye 3D visual effect can be obtained, from which a naked-eye 3D video file can be generated, so that a user can watch video with a 3D effect with the naked eye without relying on special equipment.

Description

Video processing method and device and electronic equipment
Technical Field
The present application relates to the technical field of software algorithm development, and in particular, to a video processing method and apparatus, and an electronic device.
Background
In the prior art, there are two ways to achieve a 3D effect. In one, the user wears 3D glasses or VR glasses, which depends on dedicated equipment. In the other, a modified player rapidly alternates two videos so that a 3D effect can be viewed directly by the human eye; however, this approach cannot use a common video player, and the rapid switching between the two videos consumes a large amount of computing resources, seriously degrading the 3D effect of high-definition, high-quality video.
Disclosure of Invention
The aim of the application is to provide a video processing method and apparatus and an electronic device, in which a video frame sequence with a naked-eye 3D visual effect is obtained by cross-merging the video frames of two video sources. A naked-eye 3D video file that can be played by a common player can then be generated, so that a user can watch video with a 3D effect with the naked eye, without depending on special equipment.
In a first aspect, an embodiment of the present application provides a video processing method. The method includes: acquiring two original video sources for 3D presentation; decoding the two original video sources to obtain a first video frame sequence and a second video frame sequence corresponding respectively to the two sources; alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and cross-merging the target video frames in the first target video frame set and the second target video frame set in the order of their identification information to obtain a target video frame sequence.
Further, after the target video frame sequence is obtained, the method further includes: encapsulating the audio frames of the original video sources together with the target video frames of the target video frame sequence to obtain a naked-eye 3D video file corresponding to the two original video sources.
Further, the identification information includes the decompression timestamp of a video frame or the sequence number corresponding to a video frame.
Further, the step of acquiring two original video sources for 3D presentation includes: capturing two original video sources for 3D presentation with a parallel binocular camera device; or computing, by an image recognition technique, two original video sources with parallax from a target video source.
Further, the step of alternately selecting the target video frame from the first video frame sequence and the second video frame sequence according to the identification information of the video frame includes: according to the sequence of the identification information of the video frames in the first video frame sequence, sequentially spacing a specified number of video frames, selecting a specified number of first target video frames from the first video frame sequence, and adding the first target video frames to the first target video frame set; and sequentially adding second target video frames, which have the same identification information as the unselected video frames in the first video frame sequence, in the second video frame sequence to a second target video frame set according to the sequence of the identification information.
Further, the step of alternately selecting the target video frame from the first video frame sequence and the second video frame sequence according to the identification information of the video frame includes: according to the identification information of the video frames, odd frames are selected from the first video frame sequence and added to the first target video frame set, and even frames are selected from the second video frame sequence and added to the second target video frame set.
Further, before the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, the method further includes: if the identification information of the video frames in the first video frame sequence and the second video frame sequence is inconsistent, adjusting the identification information of the video frames to be consistent without changing their order.
Further, the step of encapsulating the audio frames of the original video sources together with the target video frames of the target video frame sequence to obtain a naked-eye 3D video file corresponding to the two original video sources includes: taking the audio frames of either one of the original video sources as the target audio frames; and encapsulating the target audio frames and the target video frames of the target video frame sequence through a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two original video sources.
Further, the preset encoding algorithm includes one of the following: the H.264 encoding algorithm, the H.265 encoding algorithm, and the AV1 encoding algorithm.
In a second aspect, an embodiment of the present application further provides a video processing apparatus, including: a video source acquisition module, configured to acquire two original video sources for 3D presentation; a video decoding module, configured to decode the two original video sources to obtain a first video frame sequence and a second video frame sequence corresponding respectively to the two sources; a target frame selection module, configured to alternately select target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, so that the identification information of the target video frames in the first target video frame set and the second target video frame set is complementary; and a frame combination module, configured to cross-merge the target video frames in the first target video frame set and the second target video frame set in the order of the identification information of the target video frames to obtain a target video frame sequence.
In a third aspect, an embodiment of the present application further provides an electronic device, which includes a processor and a memory, where the memory stores computer-executable instructions that can be executed by the processor, and the processor executes the computer-executable instructions to implement the method of the first aspect.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of the first aspect.
In the video processing method provided by the embodiments of the application, two original video sources for 3D presentation are first decoded to obtain a first video frame sequence and a second video frame sequence corresponding respectively to the two sources. Then, according to the identification information of the video frames, target video frames with complementary identification information are alternately selected from the first and second video frame sequences to obtain a first target video frame set and a second target video frame set. Finally, the target video frames in the two sets are cross-merged in the order of their identification information to obtain a target video frame sequence. By cross-merging the video frames of the two video sources, a video frame sequence with a naked-eye 3D visual effect is obtained, from which a naked-eye 3D video file playable by a common player can be generated, so that a user can watch video with a 3D effect with the naked eye without relying on special equipment.
Drawings
To more clearly illustrate the detailed description of the present application or the technical solutions of the prior art, the drawings used in the description are briefly introduced below. The drawings described below show some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of a video processing method according to an embodiment of the present application;
fig. 2 is a flowchart of a video frame selection method according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a video cross-assembly according to an embodiment of the present application;
fig. 4 is a block diagram of a video processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the present application will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Currently, there are three ways to view video with a 3D stereoscopic effect:
the first method comprises the following steps: the stereoscopic video technology mainly adopted by the 3D film is that an image with binocular parallax is directly synthesized through parallel shooting during shooting, and fine adjustment processing is carried out at the later stage. When the parallax images are played, the two parallax images exist on a picture at the same time, and corresponding videos are obtained by the left eye and the right eye of a person at the same time through special equipment (such as polarized glasses), and the parallax images are perceived as stereo images after being processed by the brain of the person.
The second is VR stereoscopic video: two videos with parallax, generated by two parallel cameras, are played back simultaneously and separately. Using VR equipment (a head-mounted device), the left and right eyes each receive the corresponding video, and after the brain processes the parallax, the effect of stereoscopic video is perceived.
The third is the conventional naked-eye 3D technique, which modifies the player: two videos with parallax, generated by parallel shooting, are played on the player terminal device in rapid alternation. When the human eyes watch, the rapidly switched videos produce parallax, and after the brain processes it, the effect of a stereoscopic image is perceived.
Among the above approaches, the biggest problem with 3D films and VR 3D technology is the dependence on special equipment; they cannot be viewed directly by the naked eye. With the conventional naked-eye 3D technique, the player needs special modification and a common video player cannot be used; moreover, the rapid switching between the two videos consumes a large amount of computing resources, seriously affecting the 3D effect of high-definition video.
Based on this, embodiments of the present application provide a video processing method, an apparatus, and an electronic device, which obtain a video frame sequence with a naked-eye 3D visual effect by cross-merging the video frames of two video sources. From this sequence, a naked-eye 3D video file playable by a common player can be generated, so that a user can watch video with a 3D effect with the naked eye without relying on special equipment.
Fig. 1 is a flowchart of a video processing method according to an embodiment of the present application, where the video processing method specifically includes the following steps:
Step S102, acquiring two original video sources for 3D presentation.
The two original video sources for 3D presentation in the embodiments of the application may be two video sources with parallax captured by hardware, for example, generated by parallel shooting with a parallel binocular camera device; or they may be two original video sources with parallax obtained by video parallax computation in software, for example, by processing a target video source with an image recognition technique. The binocular camera device comprises two cameras shooting in parallel or a binocular camera.
Step S104, decoding the two original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding respectively to the two sources.
The decoding of the video sources can be implemented by a hardware decoder or by software decoding. After the two original video sources are decoded, a first video frame sequence and a second video frame sequence are obtained, each arranged in the order of the identification information of its video frames.
Step S106, alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, so that the identification information of the target video frames in the first target video frame set and the second target video frame set is complementary.
In this step, the identification information may be the decompression timestamp of a video frame, or the sequence number corresponding to a video frame. For example, if the first video frame sequence includes video frames A1, A2, A3, …, A20 and the second video frame sequence includes video frames B1, B2, B3, …, B20, then target video frames A1 and A2 may be selected from the first sequence, B3, B4 and B5 from the second, A6 from the first, B7 and B8 from the second, and so on until the last frame. This is an uneven alternate selection; it suffices that the identification information of the target video frames in the first target video frame set and the second target video frame set is complementary.
In yet another way, target video frame A1 is selected from the first video frame sequence, B2 from the second, A3 from the first, B4 from the second, and so on until the last frame.
The target video frames selected from the first video frame sequence form the first target video frame set, and those selected from the second video frame sequence form the second target video frame set; the target video frames in the first target video frame set and the second target video frame set are complementary.
Step S108, cross-merging the target video frames in the first target video frame set and the second target video frame set in the order of the identification information of the target video frames to obtain a target video frame sequence.
After the first target video frame set and the second target video frame set are determined, the target video frames in the two sets are cross-merged in the order of their identification information to obtain a target video frame sequence.
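The cross-merging described above can be sketched as follows. This is a minimal illustration, not the patent's actual implementation: frames are modeled as (identifier, payload) tuples, the "identification information" is a plain sequence number, and all function and variable names are assumptions.

```python
def cross_merge(first_set, second_set):
    """Cross-merge two complementary target sets by identifier order."""
    ids_a = {i for i, _ in first_set}
    ids_b = {i for i, _ in second_set}
    # Complementary sets never share an identifier.
    assert not ids_a & ids_b, "target sets must be complementary"
    return sorted(first_set + second_set, key=lambda frame: frame[0])

# The uneven alternate selection from the example above:
first_set = [(1, "A1"), (2, "A2"), (6, "A6")]
second_set = [(3, "B3"), (4, "B4"), (5, "B5"), (7, "B7"), (8, "B8")]
merged = cross_merge(first_set, second_set)
print([name for _, name in merged])
# ['A1', 'A2', 'B3', 'B4', 'B5', 'A6', 'B7', 'B8']
```

Because the selection only has to be complementary, the same merge works for both the uneven split shown here and the even odd/even split described later.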
In the video processing method provided by the embodiments of the application, two video sources generated by parallel shooting are decoded, target video frames are alternately selected in the order of the decompression timestamps of the video frames, and the target video frames are cross-merged to obtain a video frame sequence with a naked-eye 3D visual effect; the frames can then be combined into a complete naked-eye 3D video with a video encoding algorithm.
After the target video frame sequence is obtained, the method may further include the following step: encapsulating the audio frames of the original video sources together with the target video frames of the target video frame sequence to obtain a naked-eye 3D video file corresponding to the two original video sources.
In the present application, only the video frames undergo cross-merging; the audio requires no special processing. For the final encoding, the audio frames of either original video source can be used for encapsulation: once the audio frames of the original video source and the target video frames of the target video frame sequence have been encapsulated, a naked-eye 3D video file corresponding to the two original video sources is generated.
In practice, the two original video sources for 3D presentation can be obtained in various ways. The video sources used in the embodiments of the application are two video files generated by two parallel camera devices simulating binocular vision, which is also how most 3D stereoscopic videos are produced. When two parallel cameras simulating binocular vision capture video, the two sources must keep the same subject in frame at all times, have synchronized audio start and stop times, and use the same frame rate (no less than 60 frames per second, i.e., at least 60 complete pictures per second). Alternatively, the two original video sources with parallax can be computed from a target video source with an image recognition technique, i.e., obtained in software.
To improve the final viewing effect of the naked-eye 3D video file, the embodiments of the present application provide a preferred target video frame selection method, i.e., an implementation of step S106 (alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames), described with reference to the flowchart of the video frame selection method shown in Fig. 2:
step S202, according to the sequence of the identification information of the video frames in the first video frame sequence, sequentially spacing a specified number of video frames, and selecting a specified number of first target video frames from the first video frame sequence to add to the first target video frame set.
The specified number can be one or more; the number of frames skipped equals the number of frames selected, i.e., target video frames are selected alternately and evenly from the two video frame sequences, which achieves a better 3D video effect. For example, if the first video frame sequence includes video frames A1, A2, A3, …, A20 and the second includes B1, B2, B3, …, B20, then two frames may be skipped and the first target video frames A3 and A4 selected from the first sequence, another two frames skipped and A7 and A8 selected, and so on until the last frame. The selected first target video frames A3, A4, A7, A8, … are added in order to the first target video frame set.
Step S204, adding to the second target video frame set, in the order of the identification information, the second target video frames of the second video frame sequence whose identification information is the same as that of the unselected video frames of the first video frame sequence.
The second video frame sequence includes video frames B1, B2, B3, …, B20. In general, the identification information of the two original video sources captured by a binocular camera device is consistent after decoding; that is, the identification information of video frames A1 and B1 is consistent, that of A2 and B2 is consistent, and so on. If the two video sources were computed in software, the identification information of corresponding frames in the two decoded sequences may differ somewhat; in that case, the identification information of the two sequences must first be adjusted to be consistent without changing the frame order.
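The adjustment of inconsistent identification information can be sketched as follows: both decoded sequences are re-labeled with ordinal numbers while their frame order is preserved. The frame representation (timestamp, payload) and the function name are illustrative assumptions, not from the patent.

```python
def align_identifiers(seq_a, seq_b):
    """Re-label both sequences with 1-based ordinals, keeping frame order."""
    if len(seq_a) != len(seq_b):
        raise ValueError("the two sequences must have the same length")

    def relabel(seq):
        # Drop the original timestamp and use the frame's position instead.
        return [(i + 1, payload) for i, (_, payload) in enumerate(seq)]

    return relabel(seq_a), relabel(seq_b)

# Decompression timestamps of software-derived sources may differ slightly:
left = [(0.000, "A1"), (0.040, "A2"), (0.080, "A3")]
right = [(0.013, "B1"), (0.053, "B2"), (0.093, "B3")]
left_aligned, right_aligned = align_identifiers(left, right)
print([i for i, _ in left_aligned])   # [1, 2, 3]
print([i for i, _ in right_aligned])  # [1, 2, 3]
```

After this step, frames at the same position in the two sequences share the same identification information, as the selection in steps S202/S204 requires.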
In this way, once the first target video frames of the first video frame sequence have been selected, the second target video frames of the second video frame sequence are determined directly: the second target video frames whose identification information matches that of the unselected video frames of the first video frame sequence are added in order to the second target video frame set.
Continuing the example, after the first target video frames in the first target video frame set are determined to be A3, A4, A7, A8, …, the unselected video frames of the first video frame sequence are A1, A2, A5, A6, …; the video frames of the second video frame sequence with the same identification information as A1, A2, A5, A6, … are B1, B2, B5, B6, …, so B1, B2, B5, B6, … are added to the second target video frame set as the second target video frames.
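Steps S202 and S204 with the specified number set to two can be sketched as follows. Frames are again (identifier, payload) tuples, and the function name is an illustrative assumption.

```python
def pick_alternating_runs(seq, k):
    """Skip k frames, take k frames, and repeat (A3, A4, A7, A8, ... for k=2)."""
    return [frame for idx, frame in enumerate(seq) if (idx // k) % 2 == 1]

seq_a = [(i, f"A{i}") for i in range(1, 9)]
seq_b = [(i, f"B{i}") for i in range(1, 9)]

# Step S202: pick runs of two frames, separated by gaps of two frames.
first_set = pick_alternating_runs(seq_a, 2)
selected_ids = {i for i, _ in first_set}

# Step S204: take from B exactly the identifiers A did not use.
second_set = [frame for frame in seq_b if frame[0] not in selected_ids]

print([n for _, n in first_set])   # ['A3', 'A4', 'A7', 'A8']
print([n for _, n in second_set])  # ['B1', 'B2', 'B5', 'B6']
```

By construction the two sets are complementary: every identifier appears in exactly one of them.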
In a preferred embodiment, the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames can be implemented as follows (see Fig. 3): according to the identification information of the video frames, the odd frames are selected from the first video frame sequence (first video frames A1, A3, …) and added to the first target video frame set, and the even frames are selected from the second video frame sequence (second video frames B2, B4, …) and added to the second target video frame set. The odd and even frames are then cross-merged according to their identification information to obtain the target video frame sequence: first video frame A1, second video frame B2, first video frame A3, second video frame B4, …. This corresponds to the above embodiment with the specified number equal to one, i.e., one video frame is selected every other frame; this way of selecting video frames better achieves the 3D effect of the video.
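The preferred odd/even selection and merge can be sketched end to end. As before, frames are (identifier, payload) tuples and the variable names are illustrative assumptions.

```python
seq_a = [(i, f"A{i}") for i in range(1, 7)]
seq_b = [(i, f"B{i}") for i in range(1, 7)]

# Odd identifiers from the first sequence, even from the second.
odd_frames = [f for f in seq_a if f[0] % 2 == 1]   # A1, A3, A5
even_frames = [f for f in seq_b if f[0] % 2 == 0]  # B2, B4, B6

# Cross-merge by identification information.
target_sequence = sorted(odd_frames + even_frames, key=lambda f: f[0])
print([n for _, n in target_sequence])
# ['A1', 'B2', 'A3', 'B4', 'A5', 'B6']
```

The result alternates strictly between the two sources, which is what produces the parallax when the merged video is played back at the original frame rate.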
After the first target video frame set and the second target video frame set have been determined as above, the target video frames in the two sets are cross-merged according to their identification information to generate the target video frame sequence, and the audio frames of the original video source are then encapsulated with the target video frames of the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two original video sources. Specifically, the audio frames of either original video source can be used as the target audio frames; the target audio frames and the target video frames of the target video frame sequence are encapsulated through a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two original video sources. The preset encoding algorithm may be one of the H.264, H.265, and AV1 encoding algorithms.
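The encapsulation step can be illustrated in a simplified form. The actual H.264/H.265/AV1 encoding is not shown; the sketch only models what a container muxer does with the already-merged video frames and the target audio frames taken from one source, ordering packets by timestamp. All names and the packet representation are assumptions for illustration.

```python
def mux(video_frames, audio_frames):
    """Order video and audio packets by timestamp for encapsulation."""
    packets = [("video", ts, name) for ts, name in video_frames]
    packets += [("audio", ts, name) for ts, name in audio_frames]
    # Python's sort is stable, so video precedes audio at equal timestamps.
    return sorted(packets, key=lambda p: p[1])

video = [(0.00, "A1"), (0.04, "B2"), (0.08, "A3")]  # merged target frames
audio = [(0.00, "aud0"), (0.05, "aud1")]            # from either source
container = mux(video, audio)
print([(kind, name) for kind, _, name in container])
# [('video', 'A1'), ('audio', 'aud0'), ('video', 'B2'), ('audio', 'aud1'), ('video', 'A3')]
```

Since the audio is taken unchanged from one source, audio/video synchronization is preserved as long as the merged video keeps the original timestamps.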
Because the frames are re-encoded, the newly generated video file has the same frame rate as the original video sources; however, since two video sources are combined into one file, the frame rate that each individual eye effectively receives is half the frame rate of the original video source. Therefore, to reduce the choppiness perceived by each individual eye after the video sources are combined, the original video sources should be recorded at as high a frame rate as possible. The newly generated naked eye 3D video file can be played in an ordinary video player. If a person watches it with only one eye, it appears as a video with ghosting. If a person watches it with both eyes, after a short period of adaptation the brain automatically routes each of the doubled images to the corresponding eye, and through the processing of the visual perception area of the human brain the illusion of watching a three-dimensional image is finally produced, i.e., the naked eye 3D video effect is achieved.
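The halving of the per-eye frame rate is simple arithmetic, sketched here to make the recording recommendation concrete:

```python
def per_eye_frame_rate(source_fps):
    """The merged file keeps the source frame rate, but each eye only sees
    every other frame, so its effective frame rate is halved."""
    return source_fps / 2

# A 60 fps source yields a still-smooth 30 fps per eye; a 30 fps source
# drops to a choppier 15 fps per eye, which is why the embodiment recommends
# recording the original sources at as high a frame rate as possible.
print(per_eye_frame_rate(60))  # 30.0
print(per_eye_frame_rate(30))  # 15.0
```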
The video processing method provided by this embodiment of the present application is implemented purely in software: it depends neither on special viewing equipment nor on a special video player, the video file can use any common encoding format on the market, playback conditions are not restricted in any way, and a viewer can experience a 3D (three-dimensional) video effect with the naked eye.
Based on the foregoing method embodiment, an embodiment of the present application further provides a video processing apparatus. As shown in fig. 4, the apparatus includes: a video source obtaining module 402, configured to obtain two paths of original video sources for 3D presentation; a video decoding module 404, configured to decode the two paths of original video sources respectively to obtain a first video frame sequence and a second video frame sequence respectively corresponding to the two paths of original video sources; a target frame selection module 406, configured to alternately select target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and a frame combination module 408, configured to cross-merge the target video frames in the first target video frame set and the second target video frame set according to the sequence of the identification information of the target video frames to obtain a target video frame sequence.
In another possible implementation, the video processing apparatus further includes: and the audio and video packaging module is used for packaging the audio frames in the original video sources and the target video frames in the target video frame sequence to obtain naked eye 3D video files corresponding to the two original video sources.
In another possible embodiment, the identification information includes a decompression time stamp of the video frame or a sequence number corresponding to the video frame.
In another possible implementation, the video source obtaining module 402 is further configured to: acquiring two paths of original video sources for 3D presentation through binocular camera equipment; or, calculating the target video source by an image recognition technology to obtain two paths of original video sources with parallax.
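As a hedged sketch of the second branch of this module, a parallax view can in the simplest case be produced by a constant horizontal shift of each pixel row. Real image-recognition-based view synthesis would estimate per-pixel depth and shift accordingly, so the constant shift and the function name below are an illustration only:

```python
def synthesize_right_view(left_row, disparity=2, fill=0):
    """Derive one row of a second, parallax-shifted view from the matching
    row of the single source view by shifting it `disparity` pixels to the
    right; exposed pixels on the left edge are padded with `fill`."""
    return [fill] * disparity + left_row[:len(left_row) - disparity]

# One 5-pixel row of the source view, shifted by a disparity of 2 pixels.
print(synthesize_right_view([1, 2, 3, 4, 5], disparity=2))  # [0, 0, 1, 2, 3]
```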
In another possible implementation, the target frame selecting module 406 is further configured to: according to the sequence of the identification information of the video frames in the first video frame sequence, sequentially spacing a specified number of video frames, selecting a specified number of first target video frames from the first video frame sequence, and adding the first target video frames to the first target video frame set; and sequentially adding second target video frames, which have the same identification information as the unselected video frames in the first video frame sequence, in the second video frame sequence to a second target video frame set according to the sequence of the identification information.
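The interval-based selection described by this module can be sketched as follows, assuming equal-length sequences and using a hypothetical function name; with n = 1 it reduces to the odd/even case of the preceding embodiment:

```python
def split_complementary(first_seq, second_seq, n=1):
    """Take runs of `n` consecutive frames from the first sequence, skip the
    next `n`, and take the skipped positions from the second sequence
    instead, so the two target sets carry complementary identifiers."""
    assert len(first_seq) == len(second_seq)
    first_set, second_set = [], []
    for i, (f, s) in enumerate(zip(first_seq, second_seq)):
        if (i // n) % 2 == 0:    # positions belonging to a "first" run
            first_set.append(f)
        else:                    # positions the first sequence skipped
            second_set.append(s)
    return first_set, second_set

print(split_complementary(["A1", "A2", "A3", "A4"], ["B1", "B2", "B3", "B4"]))
# (['A1', 'A3'], ['B2', 'B4'])
```

With n = 2 the same function selects two frames at a time from each sequence, e.g. A1, A2 followed by B3, B4.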
In another possible implementation, the target frame selecting module 406 is further configured to: according to the identification information of the video frames, odd frames are selected from the first video frame sequence and added to the first target video frame set, and even frames are selected from the second video frame sequence and added to the second target video frame set.
In another possible embodiment, the above apparatus further comprises: and the timestamp adjusting module is used for adjusting the identification information of the video frames to be consistent on the premise of keeping the sequence unchanged if the identification information of the video frames is inconsistent in the first video frame sequence and the second video frame sequence.
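The adjustment performed by the timestamp adjusting module can be sketched as follows; this is a minimal illustration assuming equal-length sequences of numeric identifiers, with a function name of our choosing:

```python
def align_identification(first_ids, second_ids):
    """If the two decoded sequences carry different identification info
    (e.g. decompression timestamps), rewrite the second sequence's values to
    match the first, position by position, keeping the original order."""
    if first_ids == second_ids:
        return list(second_ids)
    # Order within each sequence is preserved; only the values are rewritten.
    return list(first_ids)

# The second camera started 5 ms late; its stamps are rewritten to match.
print(align_identification([0, 40, 80], [5, 45, 85]))  # [0, 40, 80]
```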
In another possible implementation, the audio/video encapsulation module is further configured to: take the audio frames in any one path of the original video sources as the target audio frames; and encapsulate the target audio frames and the target video frames in the target video frame sequence through a preset encoding algorithm to obtain the naked eye 3D video file corresponding to the two paths of original video sources.
In another possible embodiment, the preset encoding algorithm includes one of the following: h264 encoding algorithm, H265 encoding algorithm, and AV1 encoding algorithm.
The video processing apparatus provided in this embodiment of the present application has the same implementation principle and technical effect as the foregoing video processing method embodiment; for the sake of brevity, for any part not mentioned in this apparatus embodiment, reference may be made to the corresponding content in the foregoing method embodiment.
An electronic device is further provided in the embodiments of the present application, as shown in fig. 5, which is a schematic structural diagram of the electronic device, where the electronic device includes a processor 51 and a memory 50, the memory 50 stores computer-executable instructions capable of being executed by the processor 51, and the processor 51 executes the computer-executable instructions to implement the method.
In the embodiment shown in fig. 5, the electronic device further comprises a bus 52 and a communication interface 53, wherein the processor 51, the communication interface 53 and the memory 50 are connected by the bus 52.
The memory 50 may include a high-speed random access memory (RAM) and may also include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between this system's network element and at least one other network element is realized through at least one communication interface 53 (which may be wired or wireless), using the internet, a wide area network, a local area network, a metropolitan area network, or the like. The bus 52 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 52 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in fig. 5, but this does not indicate only one bus or one type of bus.
The processor 51 may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method may be completed by an integrated logic circuit of hardware in the processor 51 or by instructions in the form of software. The processor 51 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable read-only memory, or a register. The storage medium is located in the memory, and the processor 51 reads the information in the memory and completes the steps of the method of the foregoing embodiment in combination with its hardware.
Embodiments of the present application further provide a computer-readable storage medium, where computer-executable instructions are stored, and when the computer-executable instructions are called and executed by a processor, the computer-executable instructions cause the processor to implement the method, and specific implementation may refer to the foregoing method embodiments, and is not described herein again.
The computer program product of the video processing method, the video processing apparatus, and the electronic device provided in the embodiments of the present application includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the method described in the foregoing method embodiments. For specific implementation, reference may be made to the method embodiments, which are not described herein again.
Unless specifically stated otherwise, the relative steps, numerical expressions, and numerical values of the components and steps set forth in these embodiments do not limit the scope of the present application.
The functions, if implemented in software functional units and sold or used as a stand-alone product, may be stored in a non-transitory computer-readable storage medium executable by a processor. Based on such understanding, the technical solutions of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description of the present application, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present application. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application, and are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A method of video processing, the method comprising:
acquiring two paths of original video sources for 3D presentation;
decoding the two paths of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two paths of original video sources respectively;
respectively and alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, so that the identification information of the target video frames in the first target video frame set and the second target video frame set are complementary;
and according to the sequence of the identification information of the target video frames, performing cross combination on the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.
2. The method of claim 1, wherein after obtaining the sequence of target video frames, the method further comprises:
and encapsulating the audio frames in the original video source and the target video frames in the target video frame sequence to obtain two paths of naked eye 3D video files corresponding to the original video source.
3. The method of claim 1, wherein the identification information comprises a decompression timestamp of a video frame or a corresponding sequence number of the video frame.
4. The method of claim 1, wherein the step of obtaining two original video sources for 3D rendering comprises:
collecting two paths of original video sources for 3D presentation through parallel binocular camera equipment;
or,
and calculating the target video source by an image recognition technology to obtain two paths of original video sources with parallax.
5. The method of claim 1, wherein the step of alternately selecting the target video frame from the first video frame sequence and the second video frame sequence, respectively, according to the identification information of the video frame comprises:
according to the sequence of the identification information of the video frames in the first video frame sequence, sequentially spacing a specified number of video frames, selecting a first target video frame with the specified number from the first video frame sequence, and adding the first target video frame into a first target video frame set;
and sequentially adding second target video frames, which have the same identification information as the unselected video frames in the first video frame sequence, in the second video frame sequence to a second target video frame set according to the sequence of the identification information.
6. The method of claim 1, wherein the step of alternately selecting the target video frame from the first video frame sequence and the second video frame sequence, respectively, according to the identification information of the video frame comprises:
according to the identification information of the video frames, odd frames are selected from the first video frame sequence and added to a first target video frame set, and even frames are selected from the second video frame sequence and added to a second target video frame set.
7. The method of claim 1, wherein prior to the step of alternately selecting the target video frame from the first video frame sequence and the second video frame sequence, respectively, according to identification information of the video frame, the method further comprises:
if the identification information of the video frames in the first video frame sequence and the second video frame sequence is inconsistent, the identification information of the video frames is adjusted to be consistent on the premise of keeping the sequence.
8. The method according to claim 2, wherein the step of encapsulating the audio frames in the original video source and the target video frames in the target video frame sequence to obtain two paths of naked eye 3D video files corresponding to the original video source comprises:
taking an audio frame in any one path of the original video source as a target audio frame;
and packaging the target audio frame and the target video frame in the target video frame sequence through a preset coding algorithm to obtain two paths of naked eye 3D video files corresponding to the original video sources.
9. The method of claim 8, wherein the predetermined encoding algorithm comprises one of: h264 encoding algorithm, H265 encoding algorithm, and AV1 encoding algorithm.
10. A video processing apparatus, characterized in that the apparatus comprises:
the video source acquisition module is used for acquiring two paths of original video sources for 3D presentation;
the video decoding module is used for respectively decoding the two paths of original video sources to obtain a first video frame sequence and a second video frame sequence which respectively correspond to the two paths of original video sources;
a target frame selection module, configured to alternately select a target video frame from the first video frame sequence and the second video frame sequence according to identification information of video frames, so that identification information of target video frames in the obtained first target video frame set and second target video frame set are complementary;
and the frame combination module is used for performing cross combination on the target video frames in the first target video frame set and the second target video frame set according to the sequence of the identification information of the target video frames to obtain a target video frame sequence.
11. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to implement the method of any of claims 1 to 9.
12. A computer-readable storage medium having computer-executable instructions stored thereon which, when invoked and executed by a processor, cause the processor to implement the method of any of claims 1 to 9.
CN202011614555.7A 2020-12-30 2020-12-30 Video processing method and device and electronic equipment Active CN114697758B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011614555.7A CN114697758B (en) 2020-12-30 2020-12-30 Video processing method and device and electronic equipment
PCT/CN2021/129870 WO2022142757A1 (en) 2020-12-30 2021-11-10 Video processing method and apparatus, electronic device, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011614555.7A CN114697758B (en) 2020-12-30 2020-12-30 Video processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114697758A true CN114697758A (en) 2022-07-01
CN114697758B CN114697758B (en) 2023-03-31

Family

ID=82132974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011614555.7A Active CN114697758B (en) 2020-12-30 2020-12-30 Video processing method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN114697758B (en)
WO (1) WO2022142757A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116567353B (en) * 2023-07-10 2023-09-12 湖南快乐阳光互动娱乐传媒有限公司 Video delivery method and device, storage medium and electronic equipment

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101212699A (en) * 2006-12-26 2008-07-02 三星电子株式会社 Three-dimensional image display apparatus and method and system for processing three-dimensional image signal
CN103024449A (en) * 2011-09-28 2013-04-03 中国移动通信集团公司 Method for processing video frame stream, video server and terminal equipment
CN103081478A (en) * 2010-06-24 2013-05-01 电子部品研究院 Method for configuring stereoscopic moving picture file
CN104363437A (en) * 2014-11-28 2015-02-18 广东欧珀移动通信有限公司 Method and apparatus for recording stereo video
CN106303495A (en) * 2015-06-30 2017-01-04 深圳创锐思科技有限公司 The synthetic method of panoramic stereo image, device and mobile terminal thereof
CN108111833A (en) * 2016-11-24 2018-06-01 阿里巴巴集团控股有限公司 For the method, apparatus and system of stereo video coding-decoding
CN110868560A (en) * 2018-08-27 2020-03-06 青岛海信移动通信技术股份有限公司 Video recording method based on binocular camera and terminal equipment

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
KR100943950B1 (en) * 2008-07-14 2010-03-03 삼성에스디아이 주식회사 Stereoscopic image display device and driving method thereof
WO2012061549A2 (en) * 2010-11-03 2012-05-10 3Dmedia Corporation Methods, systems, and computer program products for creating three-dimensional video sequences
CN102547313A (en) * 2010-12-21 2012-07-04 北京睿为视讯技术有限公司 Three-dimensional video play system and method thereof
US20130215239A1 (en) * 2012-02-21 2013-08-22 Sen Wang 3d scene model from video
CN107872670A (en) * 2017-11-17 2018-04-03 暴风集团股份有限公司 A kind of 3D video coding-decoding methods, device, server, client and system
CN111447504B (en) * 2020-03-27 2022-05-03 北京字节跳动网络技术有限公司 Three-dimensional video processing method and device, readable storage medium and electronic equipment

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
CN101212699A (en) * 2006-12-26 2008-07-02 三星电子株式会社 Three-dimensional image display apparatus and method and system for processing three-dimensional image signal
CN103081478A (en) * 2010-06-24 2013-05-01 电子部品研究院 Method for configuring stereoscopic moving picture file
CN103024449A (en) * 2011-09-28 2013-04-03 中国移动通信集团公司 Method for processing video frame stream, video server and terminal equipment
CN104363437A (en) * 2014-11-28 2015-02-18 广东欧珀移动通信有限公司 Method and apparatus for recording stereo video
CN106303495A (en) * 2015-06-30 2017-01-04 深圳创锐思科技有限公司 The synthetic method of panoramic stereo image, device and mobile terminal thereof
CN108111833A (en) * 2016-11-24 2018-06-01 阿里巴巴集团控股有限公司 For the method, apparatus and system of stereo video coding-decoding
CN110868560A (en) * 2018-08-27 2020-03-06 青岛海信移动通信技术股份有限公司 Video recording method based on binocular camera and terminal equipment

Also Published As

Publication number Publication date
CN114697758B (en) 2023-03-31
WO2022142757A1 (en) 2022-07-07

Similar Documents

Publication Publication Date Title
CN106303573B (en) 3D video image processing method, server and client
US8218855B2 (en) Method and apparatus for receiving multiview camera parameters for stereoscopic image, and method and apparatus for transmitting multiview camera parameters for stereoscopic image
CN102918847B (en) The method and apparatus of display image
US20130169543A1 (en) Rendering Apparatuses, Display System and Methods for Rendering Multimedia Data Objects with a Function to Avoid Eye Fatigue
JP2009207136A (en) Method for processing multiple video streams, and systems for encoding and decoding video streams
US20110157163A1 (en) Image processing device and image processing method
KR20110064161A (en) Method and apparatus for encoding a stereoscopic 3d image, and display apparatus and system for displaying a stereoscopic 3d image
Battisti et al. Toward the assessment of quality of experience for asymmetric encoding in immersive media
US20110157164A1 (en) Image processing apparatus and image processing method
US9491435B2 (en) Image synchronization method and associated apparatus
US20110279647A1 (en) 3d video processing apparatus and 3d video processing method
CN114697758B (en) Video processing method and device and electronic equipment
US10553029B1 (en) Using reference-only decoding of non-viewed sections of a projected video
CN103209335A (en) Three-dimensional film playing method and system supporting high screen refresh rate
CN110198457B (en) Video playing method and device, system, storage medium, terminal and server thereof
CN103026714A (en) Image processing apparatus and method and program
CN114040184A (en) Image display method, system, storage medium and computer program product
US20140072271A1 (en) Recording apparatus, recording method, reproduction apparatus, reproduction method, program, and recording reproduction apparatus
CN104185005B (en) Image processing apparatus and image processing method
KR20200143287A (en) Method and apparatus for encoding/decoding image and recording medium for storing bitstream
Barkowsky et al. Influence of depth rendering on the quality of experience for an autostereoscopic display
KR101433082B1 (en) Video conversing and reproducing method to provide medium feeling of two-dimensional video and three-dimensional video
KR101674688B1 (en) A method for displaying a stereoscopic image and stereoscopic image playing device
JP5700998B2 (en) 3D image display apparatus and control method thereof
US10609356B1 (en) Using a temporal enhancement layer to encode and decode stereoscopic video content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant