WO2022142757A1 - Video processing method and apparatus, electronic device, and computer readable storage medium - Google Patents

Video processing method and apparatus, electronic device, and computer readable storage medium

Info

Publication number
WO2022142757A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
video frame
target
frames
frame sequence
Prior art date
Application number
PCT/CN2021/129870
Other languages
French (fr)
Chinese (zh)
Inventor
朱韬
Original Assignee
北京金山云网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京金山云网络技术有限公司 filed Critical 北京金山云网络技术有限公司
Publication of WO2022142757A1 publication Critical patent/WO2022142757A1/en

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30 Image reproducers
    • H04N13/302 Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof

Definitions

  • the present application relates to the technical field of software algorithm development, and in particular, to a video processing method, apparatus, electronic device, and computer-readable storage medium.
  • in the related art, there are two ways to achieve a 3D (three-dimensional) effect: one is for the user to wear 3D glasses or VR (virtual reality) glasses, which relies on special equipment; the other is to modify the player so that it plays a video with a 3D effect that the human eye can watch directly. The latter approach cannot use an ordinary video player, and the rapid switching between the two videos consumes a large amount of the device's computing resources, which seriously degrades the 3D effect of high-definition, high-quality video.
  • the purpose of this application is to provide a video processing method, apparatus, electronic device, and computer-readable storage medium that obtain a video frame sequence with a naked-eye 3D visual effect by cross-merging the video frames of two video sources, and can then generate a naked-eye 3D video file playable by an ordinary player, so that users can watch 3D-effect video with the naked eye without relying on special equipment.
  • an embodiment of the present application provides a video processing method, including: acquiring two channels of original video sources to be presented in 3D; decoding the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources; according to the identification information of the video frames, alternately selecting target video frames from the first video frame sequence and the second video frame sequence, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and cross-combining, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.
  • the method further includes: encapsulating the audio frames in the original video source with the target video frames in the target video frame sequence to obtain a naked-eye 3D video file corresponding to the two channels of original video sources.
  • the above-mentioned identification information includes a decompression timestamp of the video frame or a sequence number corresponding to the video frame.
  • the above-mentioned step of acquiring two channels of original video sources to be presented in 3D includes: collecting, through a parallel binocular camera device, two channels of original video sources set for 3D presentation; or performing calculation on a target video source through image recognition technology to obtain two channels of original video sources with parallax.
  • the above-mentioned step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames includes: following the order of the identification information of the video frames in the first video frame sequence, skipping a specified number of video frames each time, selecting the specified number of first target video frames from the first video frame sequence and adding them to the first target video frame set; and adding the second target video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence to the second target video frame set in the order of the identification information.
  • the above-mentioned step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames includes: according to the identification information of the video frames, selecting odd-numbered frames from the first video frame sequence and adding them to the first target video frame set, and selecting even-numbered frames from the second video frame sequence and adding them to the second target video frame set.
  • before the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, the method further includes: if the identification information of the video frames in the first video frame sequence and the second video frame sequence is inconsistent, adjusting the identification information of the video frames to be consistent without changing their order.
  • the above-mentioned step of encapsulating the audio frames in the original video source with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two channels of original video sources includes: taking the audio frames in either original video source as the target audio frames; and encapsulating the target audio frames and the target video frames in the target video frame sequence with a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • the above-mentioned preset encoding algorithm includes one of the following: an H264 encoding algorithm, an H265 encoding algorithm, and an AV1 encoding algorithm.
  • an embodiment of the present application further provides a video processing apparatus, including: a video source acquisition module configured to acquire two channels of original video sources to be presented in 3D; a video decoding module configured to decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources; a target frame selection module configured to alternately select target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and a frame combination module configured to cross-combine, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.
  • embodiments of the present application further provide an electronic device, including a processor and a memory, where the memory stores computer-executable instructions executable by the processor, and the processor executes the computer-executable instructions to implement the method of the first aspect above.
  • embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of the first aspect above.
  • FIG. 1 is a flowchart of a video processing method provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a video frame selection method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a video cross-combination provided by an embodiment of the present application.
  • FIG. 4 is a structural block diagram of a video processing apparatus provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the stereoscopic video technology mainly used in 3D movies directly synthesizes an image with binocular parallax through parallel cameras during shooting, with fine-tuning performed afterwards. During playback, two parallax images are present on the screen at the same time; through special equipment such as polarized glasses, the left and right eyes each receive the corresponding video, and the parallax is perceived as a stereoscopic image after processing by the human brain.
  • the second approach is VR stereoscopic video: two videos with parallax generated by two parallel cameras are played simultaneously, and a VR device (head-mounted device) is used so that the left and right eyes each receive their corresponding video; the resulting parallax is processed by the human brain and perceived as a stereoscopic video effect.
  • the third approach is conventional naked-eye 3D technology: the player is modified so that the two channels of video with parallax generated by parallel cameras are rapidly played in alternation on a single player terminal device; when the human eye watches directly, the rapid switching of the videos produces parallax, and the human brain perceives a stereoscopic image effect after processing the parallax.
  • the embodiments of the present application provide a video processing method, apparatus, electronic device, and computer-readable storage medium.
  • a video frame sequence with naked-eye 3D visual effects can be obtained.
  • a 3D naked-eye video file that can be played by a common player can be generated, so that the user can watch the 3D-effect video without relying on a special device.
  • FIG. 1 is a flowchart of a video processing method provided by an embodiment of the present application, and the video processing method specifically includes the following steps:
  • Step S102: acquire two channels of original video sources to be presented in 3D.
  • the two channels of original video sources to be presented in 3D in the embodiments of the present application may be two channels of video sources with parallax collected by hardware devices, for example, two channels of video sources generated by parallel shooting with a parallel binocular camera device; or they may be two channels of video sources obtained by calculating video parallax with software, for example, by performing calculation on a target video source through image recognition technology to obtain two channels of original video sources with parallax. The above binocular camera device includes two cameras shooting in parallel, or a binocular camera.
  • Step S104: decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources.
  • the above video source decoding process can be implemented by a hardware decoder or by software decoding technology. After the two channels of original video sources are decoded respectively, the first video frame sequence and the second video frame sequence are obtained, both arranged in the order of the identification information of the video frames.
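  • for illustration only (not part of the patent disclosure), a minimal decoding sketch in Python using the PyAV library might look as follows; the file names and the use of each frame's presentation timestamp (pts) as identification information are assumptions made for the example:

```python
# Minimal sketch, for illustration only. Assumes the PyAV library ("pip install av")
# and two ordinary video files; the file names are hypothetical, and the frame's
# presentation timestamp (pts) stands in for the identification information.
import av

def decode_frame_sequence(path):
    """Decode one original video source into a list of frames ordered by pts."""
    frames = []
    with av.open(path) as container:
        for frame in container.decode(video=0):   # first video stream only
            if frame.pts is not None:             # skip frames without a timestamp
                frames.append(frame)
    frames.sort(key=lambda f: f.pts)              # keep the sequence in pts order
    return frames

first_sequence = decode_frame_sequence("left_eye.mp4")    # hypothetical file names
second_sequence = decode_frame_sequence("right_eye.mp4")
```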
  • Step S106: according to the identification information of the video frames, alternately select target video frames from the first video frame sequence and the second video frame sequence, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary.
  • the above identification information may be the decompression timestamp of the video frame, or may also be the sequence number corresponding to the video frame.
  • the target video frames may be selected alternately from the first video frame sequence and the second video frame sequence either unevenly or evenly. For example, suppose the first video frame sequence includes video frames A1, A2, A3, ..., A20 and the second video frame sequence includes video frames B1, B2, B3, ..., B20. Target video frames A1 and A2 may be selected from the first video frame sequence, target video frames B3, B4 and B5 from the second video frame sequence, then target video frame A6 from the first video frame sequence, target video frames B7 and B8 from the second video frame sequence, and so on until the last frame. This is an uneven alternate selection; it is sufficient that the identification information of the target video frames in the first target video frame set and the second target video frame set is complementary.
  • alternatively, the selection may be even, for example selecting target video frame A1 from the first video frame sequence, target video frame B2 from the second video frame sequence, then A3 from the first sequence and B4 from the second, and so on until the last frame.
  • the target video frames selected from the first video frame sequence form the first target video frame set; the target video frames selected from the second video frame sequence form the second target video frame set; the target video frames in the first target video frame set and the second target video frame set are complementary.
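  • as a minimal sketch (not the patent's reference implementation), the even alternate selection into two complementary sets could be expressed in Python as follows; frames are identified here simply by their index in the aligned decoded sequences, which stands in for the decompression timestamp or sequence number:

```python
# Illustrative sketch only: even alternate selection producing two complementary sets.

def select_alternately(first_sequence, second_sequence):
    first_target_set = []   # frames taken from the first sequence (A1, A3, A5, ...)
    second_target_set = []  # frames taken from the second sequence (B2, B4, B6, ...)
    for index in range(min(len(first_sequence), len(second_sequence))):
        if index % 2 == 0:
            first_target_set.append((index, first_sequence[index]))
        else:
            second_target_set.append((index, second_sequence[index]))
    return first_target_set, second_target_set
```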
  • Step S108: cross-combine, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.
  • after the target video frames are selected alternately in the order of the decompression timestamps of the video frames and cross-merged, a video frame sequence with a naked-eye 3D visual effect is obtained, which can then be combined into a complete naked-eye 3D video through a video encoding algorithm.
  • in this way, the two channels of video with parallax are played in alternation; the viewer can watch without special equipment, and after processing by the brain, the stereoscopic effect is perceived.
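  • continuing the sketch above (illustrative only), the cross-merge of step S108 amounts to ordering the union of the two complementary sets by their identification information:

```python
# Illustrative sketch only: cross-combine the two complementary sets in the
# order of their identification information to form the target frame sequence.

def cross_combine(first_target_set, second_target_set):
    combined = sorted(first_target_set + second_target_set, key=lambda item: item[0])
    return [frame for _, frame in combined]   # e.g. A1, B2, A3, B4, ...

# Example usage with the earlier sketch:
# target_sequence = cross_combine(*select_alternately(first_sequence, second_sequence))
```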
  • after the target video frame sequence is obtained, the following step may also be included: encapsulating the audio frames in the original video source with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • the embodiment of the present application only performs cross-merging processing on the video frames and does not perform special processing on the audio; the audio frames of either original video source can be used for encapsulation, that is, the audio frames in that original video source are encapsulated together with the target video frames in the target video frame sequence, so that the naked-eye 3D video file corresponding to the two channels of original video sources can be generated.
  • the video source used in the embodiment of the present application may be a two-channel video file generated by two parallel camera devices simulating binocular vision, as in some 3D stereoscopic video production: the two video sources must always cover the same target, the start and end times of the audio must be synchronized, and the frame rates of the two channels of video must be unified (not less than 60 frames per second, i.e. 60 complete pictures are generated per second).
  • alternatively, a target video source can be processed by image recognition technology to obtain two channels of original video sources with parallax, that is, the acquisition of the video sources can also be realized by means of software.
  • the embodiment of the present application further provides an implementation of the target video frame selection method, that is, of the step in step S106 of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames; it can be implemented with reference to the flowchart of the video frame selection method shown in FIG. 2:
  • Step S202: following the order of the identification information of the video frames in the first video frame sequence, skip a specified number of video frames each time and select the specified number of first target video frames from the first video frame sequence to add to the first target video frame set.
  • the above specified number can be one or more.
  • the number of video frames skipped is the same as the number of target frames selected, that is, the target video frames are selected from the two video frame sequences alternately and evenly; this achieves a better 3D video effect. For example, if the first video frame sequence includes video frames A1, A2, A3, ..., A20 and the second video frame sequence includes video frames B1, B2, B3, ..., B20, then, with an interval of two video frames, the first target video frames A3 and A4 are selected from the first video frame sequence, then, after another two video frames, the first target video frames A7 and A8 are selected from the first video frame sequence, and so on until the last frame. The first target video frames A3, A4, A7, A8, ... selected above are added to the first target video frame set in order.
  • Step S204: add the second target video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence to the second target video frame set in the order of the identification information.
  • the second video frame sequence includes video frames B1, B2, B3, ... whose identification information corresponds one-to-one to that of the first video frame sequence, that is, the identification information of video frame A1 is consistent with that of video frame B1, the identification information of video frame A2 is consistent with that of video frame B2, and so on. If the two channels of video sources are calculated by a software method, however, there may be some differences between the identification information of the corresponding video frames of the two decoded channels; in this case, the identification information of the two channels of video frames must first be adjusted to be consistent while keeping the order unchanged.
  • once the identification information is consistent, the second target video frames in the second video frame sequence can be determined directly, that is, the video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence are taken as second target video frames and added to the second target video frame set in the order of the identification information.
  • continuing the above example, after the first target video frames in the first target video frame set are determined to be A3, A4, A7, A8, ..., the unselected video frames in the first video frame sequence are A1, A2, A5, A6, ...; the video frames in the second video frame sequence with the same identification information as A1, A2, A5, A6, ... are B1, B2, B5, B6, ...; therefore, the video frames B1, B2, B5, B6, ... are added to the second target video frame set as second target video frames.
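  • an illustrative sketch of the steps S202/S204 variant follows, assuming the two sequences are already aligned and using the frame index as identification information; with gap = 2 it reproduces the example above (A3, A4, A7, A8, ... from the first sequence and B1, B2, B5, B6, ... from the second):

```python
# Illustrative sketch only: the "specified number" variant of steps S202/S204.

def select_with_gap(first_sequence, second_sequence, gap):
    length = min(len(first_sequence), len(second_sequence))
    first_target_set, second_target_set = [], []
    take_from_first = False            # start by skipping `gap` frames of the first sequence
    start = 0
    while start < length:
        for index in range(start, min(start + gap, length)):
            if take_from_first:
                first_target_set.append((index, first_sequence[index]))
            else:
                second_target_set.append((index, second_sequence[index]))
        take_from_first = not take_from_first
        start += gap
    return first_target_set, second_target_set
```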
  • the above-mentioned step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames can also be implemented as follows. Referring to FIG. 3, according to the identification information of the video frames, odd-numbered frames, such as the first video frames A1 and A3 in the figure, are selected from the first video frame sequence and added to the first target video frame set, and even-numbered frames, such as the second video frames B2 and B4 in the figure, are selected from the second video frame sequence and added to the second target video frame set.
  • the odd-numbered frames and the even-numbered frames are then cross-merged to obtain the target video frame sequence: first video frame A1, second video frame B2, first video frame A3, second video frame B4, and so on.
  • this method corresponds to the case where the specified number in the above-mentioned embodiment is one, that is, a video frame is selected every other video frame; this selection method better realizes the 3D effect of the video.
  • the target video frames in the two sets are then cross-merged according to the identification information of the target video frames to generate the target video frame sequence, and the audio frames in the original video source are encapsulated with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • specifically, the audio frames in either original video source can be used as the target audio frames; the target audio frames and the target video frames in the target video frame sequence are encapsulated with a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • the above-mentioned preset encoding algorithm may include one of the following: H264 encoding algorithm, H265 encoding algorithm and AV1 encoding algorithm.
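  • a hedged sketch of the encapsulation step using the PyAV library and H.264 (one of the codecs listed above) is given below; the stream parameters, file handling, fixed frame rate, and the choice to remux the audio packets unchanged are assumptions made for the example rather than requirements of the patent:

```python
# Hedged sketch, assuming PyAV. Encodes the cross-merged frame sequence with H.264
# and copies the audio stream of one original source unchanged into the output file.
import av

def encapsulate(target_frames, audio_source_path, output_path, fps=60):
    out = av.open(output_path, mode="w")

    # Video: encode the cross-merged target frames (av.VideoFrame objects) as H.264.
    vstream = out.add_stream("h264", rate=fps)
    vstream.width = target_frames[0].width
    vstream.height = target_frames[0].height
    vstream.pix_fmt = "yuv420p"
    for frame in target_frames:
        frame.pts = None                      # let the encoder assign new timestamps
        for packet in vstream.encode(frame):
            out.mux(packet)
    for packet in vstream.encode():           # flush the encoder
        out.mux(packet)

    # Audio: take the audio stream of either original source and remux it as-is.
    with av.open(audio_source_path) as audio_in:
        in_audio = audio_in.streams.audio[0]
        astream = out.add_stream(template=in_audio)
        for packet in audio_in.demux(in_audio):
            if packet.dts is None:            # skip flushing packets
                continue
            packet.stream = astream
            out.mux(packet)

    out.close()
```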
  • the newly generated video file has the same frame rate as the original video sources, but since it is synthesized from two video files, the frame rate of each monocular view (the frames acting on one eye) is half the frame rate of the original video source. Therefore, to reduce the jerkiness perceived by a single eye after the video files are merged, the original video sources should be recorded at as high a frame rate as possible.
  • the newly generated naked-eye 3D video file can be played on ordinary video players. Watched with only one eye, the result looks like a ghosted video; watched with both eyes, after a short period of adaptation the brain automatically assigns the ghost images to the corresponding eye, and after processing in the visual perception area of the human brain the illusion of viewing a stereoscopic image is finally produced, that is, the naked-eye 3D video effect is realized.
  • the video processing method provided by the embodiments of the present application is implemented by a pure software algorithm and does not rely on a proprietary viewing device or a dedicated video player; the video file can use encoding formats commonly used in the market, and under all of these playback conditions the viewer can experience the 3D stereoscopic video effect with the naked eye.
  • the embodiments of the present application further provide a video processing apparatus, as shown in FIG. 4. The apparatus includes: a video source acquisition module 402 configured to acquire two channels of original video sources to be presented in 3D; a video decoding module 404 configured to decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources; a target frame selection module 406 configured to alternately select target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and a frame combination module 408 configured to cross-combine, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.
  • the above-mentioned video processing apparatus further includes: an audio and video encapsulation module 410 configured to encapsulate the audio frames in the original video source with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • the above-mentioned identification information includes a decompression timestamp of the video frame or a sequence number corresponding to the video frame.
  • the above-mentioned video source acquisition module 402 is further configured to: collect, through a binocular camera device, two channels of original video sources to be presented in 3D; or perform calculation on a target video source through image recognition technology to obtain two channels of original video sources with parallax.
  • the above-mentioned target frame selection module 406 is further configured to: following the order of the identification information of the video frames in the first video frame sequence, skip a specified number of video frames each time and select the specified number of first target video frames from the first video frame sequence to add to the first target video frame set; and add the second target video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence to the second target video frame set in the order of the identification information.
  • the target frame selection module 406 is further configured to: according to the identification information of the video frames, select odd-numbered frames from the first video frame sequence and add them to the first target video frame set, and select even-numbered frames from the second video frame sequence and add them to the second target video frame set.
  • the above-mentioned apparatus further includes: a timestamp adjustment module 412 configured to, if the identification information of the video frames in the first video frame sequence and the second video frame sequence is inconsistent, adjust the identification information of the video frames to be consistent while keeping the order unchanged.
  • the above-mentioned audio and video encapsulation module 410 is further configured to: take the audio frames in either original video source as the target audio frames; and encapsulate the target audio frames and the target video frames in the target video frame sequence with a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • the above-mentioned preset encoding algorithm includes one of the following: an H264 encoding algorithm, an H265 encoding algorithm, and an AV1 encoding algorithm.
  • the video processing apparatus provided by the embodiments of the present application has the same implementation principle and technical effects as the foregoing video processing method embodiments; for brevity, for the parts not mentioned in the apparatus embodiments, reference may be made to the corresponding content in the foregoing video processing method embodiments.
  • FIG. 5 is a schematic structural diagram of the electronic device. The electronic device includes a processor 51 and a memory 50, where the memory 50 stores computer-executable instructions that can be executed by the processor 51, and the processor 51 executes the computer-executable instructions to implement the above method.
  • the electronic device further includes a bus 52 and a communication interface 53 , wherein the processor 51 , the communication interface 53 and the memory 50 are connected through the bus 52 .
  • the memory 50 may include a high-speed random access memory (RAM, Random Access Memory), and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the communication connection between this system network element and at least one other network element is realized through at least one communication interface 53 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.
  • the bus 52 may be an ISA (Industry Standard Architecture, industry standard architecture) bus, a PCI (Peripheral Component Interconnect, peripheral component interconnect standard) bus or an EISA (Extended Industry Standard Architecture, extended industry standard architecture) bus and the like.
  • the bus 52 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one bidirectional arrow is shown in FIG. 5, but it does not mean that there is only one bus or one type of bus.
  • the processor 51 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method can be completed by a hardware integrated logic circuit in the processor 51 or an instruction in the form of software.
  • the above-mentioned processor 51 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor 51 reads the information in the memory, and completes the steps of the methods of the foregoing embodiments in combination with its hardware.
  • embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the video processing method described above.
  • the computer program product of the video processing method, apparatus, and electronic device provided by the embodiments of the present application includes a computer-readable storage medium storing program code, and the instructions included in the program code can be used to execute the methods described in the foregoing method embodiments; for specific implementations, reference may be made to the method embodiments, which are not repeated here.
  • the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium.
  • in essence, the technical solution of the present application, or the part that contributes to the related technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • in summary, a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources are obtained; according to the identification information of the video frames, target video frames with complementary identification information are alternately selected from the first video frame sequence and the second video frame sequence to obtain a first target video frame set and a second target video frame set; and the target video frames in the two sets are cross-combined in the order of the identification information of the target video frames to obtain a target video frame sequence.
  • the embodiment of the present application obtains a video frame sequence with a naked-eye 3D visual effect by cross-merging the video frames of the two video sources, and can then generate a naked-eye 3D video file that can be played by a common player, so that users can watch 3D-effect video with the naked eye without relying on special equipment.
  • the present application can be applied to the technical field of software algorithm development and provides a video processing method, apparatus, electronic device, and computer-readable storage medium; by cross-merging the video frames of two video sources, a video frame sequence with a naked-eye 3D visual effect can be obtained and used to generate a naked-eye 3D video file, so that users can watch a 3D-effect video with the naked eye without relying on special equipment.

Abstract

The present application provides a video processing method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: obtaining a first video frame sequence and a second video frame sequence respectively corresponding to two original video sources to be presented in 3D; alternately selecting target video frames having complementary identification information from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, to obtain a first target video frame set and a second target video frame set; and cross-merging the target video frames in the two sets according to the order of the identification information of the target video frames to obtain a target video frame sequence. According to the present application, by cross-merging the video frames of the two video sources, a video frame sequence having a naked-eye 3D visual effect can be obtained, so that a naked-eye 3D video file can be generated and a user can watch a video having a 3D effect with the naked eye without relying on a special device.

Description

Video Processing Method, Apparatus, Electronic Device, and Computer-Readable Storage Medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 202011614555.7, entitled "Video Processing Method, Apparatus and Electronic Equipment", filed with the Chinese Patent Office on December 30, 2020, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present application relates to the technical field of software algorithm development, and in particular, to a video processing method, apparatus, electronic device, and computer-readable storage medium.

BACKGROUND

In the related art, there are two ways to achieve a 3D (three-dimensional) effect: one is for the user to wear 3D glasses or VR (virtual reality) glasses, which relies on special equipment; the other is to modify the player so that it plays a video with a 3D effect that the human eye can watch directly. The latter approach cannot use an ordinary video player, and the rapid switching between the two videos consumes a large amount of the device's computing resources, which seriously degrades the 3D effect of high-definition, high-quality video.

SUMMARY

The purpose of the present application is to provide a video processing method, apparatus, electronic device, and computer-readable storage medium that obtain a video frame sequence with a naked-eye 3D visual effect by cross-merging the video frames of two video sources, and can then generate a naked-eye 3D video file playable by an ordinary player, so that users can watch 3D-effect video with the naked eye without relying on special equipment.
In a first aspect, an embodiment of the present application provides a video processing method, including: acquiring two channels of original video sources to be presented in 3D; decoding the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources; according to the identification information of the video frames, alternately selecting target video frames from the first video frame sequence and the second video frame sequence, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and cross-combining, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.

In one embodiment, after the target video frame set is obtained, the method further includes: encapsulating the audio frames in the original video source with the target video frames in the target video frame sequence to obtain a naked-eye 3D video file corresponding to the two channels of original video sources.

In one embodiment, the above-mentioned identification information includes a decompression timestamp of the video frame or a sequence number corresponding to the video frame.

In one embodiment, the step of acquiring two channels of original video sources to be presented in 3D includes: collecting, through a parallel binocular camera device, two channels of original video sources set for 3D presentation; or performing calculation on a target video source through image recognition technology to obtain two channels of original video sources with parallax.

In one embodiment, the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames includes: following the order of the identification information of the video frames in the first video frame sequence, skipping a specified number of video frames each time, selecting the specified number of first target video frames from the first video frame sequence and adding them to the first target video frame set; and adding the second target video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence to the second target video frame set in the order of the identification information.

In one embodiment, the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames includes: according to the identification information of the video frames, selecting odd-numbered frames from the first video frame sequence and adding them to the first target video frame set, and selecting even-numbered frames from the second video frame sequence and adding them to the second target video frame set.

In one embodiment, before the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, the method further includes: if the identification information of the video frames in the first video frame sequence and the second video frame sequence is inconsistent, adjusting the identification information of the video frames to be consistent without changing their order.

In one embodiment, the step of encapsulating the audio frames in the original video source with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two channels of original video sources includes: taking the audio frames in either original video source as the target audio frames; and encapsulating the target audio frames and the target video frames in the target video frame sequence with a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.

In one embodiment, the preset encoding algorithm includes one of the following: an H264 encoding algorithm, an H265 encoding algorithm, and an AV1 encoding algorithm.
In a second aspect, an embodiment of the present application further provides a video processing apparatus, including: a video source acquisition module configured to acquire two channels of original video sources to be presented in 3D; a video decoding module configured to decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources; a target frame selection module configured to alternately select target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and a frame combination module configured to cross-combine, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.

In a third aspect, embodiments of the present application further provide an electronic device, including a processor and a memory, where the memory stores computer-executable instructions executable by the processor, and the processor executes the computer-executable instructions to implement the method of the first aspect.

In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of the first aspect.
BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions in the specific embodiments of the present application or in the related art, the accompanying drawings required in the description of the specific embodiments or the related art are briefly introduced below. Obviously, the accompanying drawings in the following description show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without any creative effort.

FIG. 1 is a flowchart of a video processing method provided by an embodiment of the present application;

FIG. 2 is a flowchart of a video frame selection method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of video cross-combination provided by an embodiment of the present application;

FIG. 4 is a structural block diagram of a video processing apparatus provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
DETAILED DESCRIPTION

The technical solutions of the present application will be described clearly and completely below with reference to the embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.

Currently, there are three ways to watch video with a 3D stereoscopic effect:

First: the stereoscopic video technology mainly used in 3D movies directly synthesizes an image with binocular parallax through parallel cameras during shooting, with fine-tuning performed afterwards. During playback, two parallax images are present on the screen at the same time; through special equipment (such as polarized glasses), the left and right eyes each receive the corresponding video, and the parallax is perceived as a stereoscopic image after processing by the human brain.

Second: VR stereoscopic video, in which two videos with parallax generated by two parallel cameras are played simultaneously, and a VR device (head-mounted device) is used so that the left and right eyes each receive their corresponding video; the resulting parallax is processed by the human brain and perceived as a stereoscopic video effect.

Third: conventional naked-eye 3D technology, in which the player is modified so that the two channels of video with parallax generated by parallel cameras are rapidly played in alternation on a single player terminal device; when the human eye watches directly, the rapid switching of the videos produces parallax, and the human brain perceives a stereoscopic image effect after processing the parallax.

Among the above approaches, the biggest problem with 3D movies and VR 3D technology is the reliance on special equipment: the human eye cannot watch directly. For conventional naked-eye 3D technology, the player must be specially modified and an ordinary video player cannot be used, and the rapid switching between the two videos consumes a large amount of the device's computing resources, which has a serious impact on the 3D effect of high-definition, high-quality video.

Based on this, the embodiments of the present application provide a video processing method, apparatus, electronic device, and computer-readable storage medium, which obtain a video frame sequence with a naked-eye 3D visual effect by cross-merging the video frames of two video sources, and can then generate a naked-eye 3D video file playable by an ordinary player, so that users can watch 3D-effect video with the naked eye without relying on special equipment.
图1为本申请实施例提供的一种视频处理方法的流程图,该视频处理方法具体包括以下步骤:1 is a flowchart of a video processing method provided by an embodiment of the present application, and the video processing method specifically includes the following steps:
步骤S102,获取要3D呈现的两路原始视频源。Step S102, acquiring two channels of original video sources to be presented in 3D.
本申请实施例中的要3D呈现的两路原始视频源,可以是采用硬件设备采集的存在视差的两路视频源,比如,通过并联双目摄像设备进行并联摄像产生的两路视频源;或者,也可以是通过软件技术进行视频视差计算得到的两路视频源,比如,通过图像识别技术对目标视频源进行计算,得到存在视差的两路原始视频源。上述双目摄像设备包括两个并联摄像的相机或者双目相机。The two channels of original video sources to be presented in 3D in the embodiments of the present application may be two channels of video sources with parallax collected by hardware devices, for example, two channels of video sources generated by parallel photography with parallel binocular camera devices; or , or two video sources obtained by performing video disparity calculation through software technology, for example, calculating the target video source through image recognition technology to obtain two original video sources with parallax. The above binocular camera device includes two cameras or binocular cameras that take pictures in parallel.
Step S104: decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources.
The decoding of the video sources may be performed by a hardware decoder or by software decoding. After the two original video sources are decoded, the first video frame sequence and the second video frame sequence are obtained; both sequences are arranged in the order of the identification information of their video frames.
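As a non-limiting illustration of the data produced by this step, the following Python sketch models a decoded frame carrying only its identification information; decode() is a hypothetical stand-in for any hardware or software decoder and is not an API defined by this application.

```python
# Illustrative sketch only: decode() simulates a decoder that yields frames
# already ordered by their identification information (timestamp or sequence number).
from dataclasses import dataclass
from typing import List

@dataclass
class DecodedFrame:
    pts: int              # identification info: decompression timestamp or sequence number
    picture: object = None  # decoded pixel data, omitted in this sketch

def decode(path: str, frame_count: int = 20) -> List[DecodedFrame]:
    """Hypothetical stand-in for a real decoder: returns frames in pts order."""
    return [DecodedFrame(pts=i) for i in range(1, frame_count + 1)]

first_sequence = decode("source_one.mp4")   # first video frame sequence
second_sequence = decode("source_two.mp4")  # second video frame sequence
```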
Step S106: according to the identification information of the video frames, alternately select target video frames from the first video frame sequence and the second video frame sequence, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary.
In this step, the identification information may be the decompression timestamp of a video frame or the sequence number corresponding to the frame. The alternate selection from the first and second video frame sequences may be uneven or even. For example, if the first video frame sequence contains video frames A1, A2, A3, ..., A20 and the second video frame sequence contains video frames B1, B2, B3, ..., B20, then target frames A1 and A2 may be selected from the first sequence, B3, B4, and B5 from the second, then A6 from the first, B7 and B8 from the second, and so on until the last frame. This is an uneven alternate selection; the only requirement is that the identification information of the target frames in the first and second target video frame sets is complementary.
Another way is to select evenly and alternately: for example, select target frame A1 from the first video frame sequence, B2 from the second, A3 from the first, B4 from the second, and so on until the last frame.
The target video frames selected from the first video frame sequence form the first target video frame set, those selected from the second video frame sequence form the second target video frame set, and the two sets are complementary.
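The uneven alternate selection described above can be sketched in a few lines; the labels A1..A20 and B1..B20 follow the example in this application, while the run lengths (2, 3, 1, 2) are purely illustrative assumptions.

```python
first_sequence = [f"A{i}" for i in range(1, 21)]   # first video frame sequence
second_sequence = [f"B{i}" for i in range(1, 21)]  # second video frame sequence

first_target, second_target = [], []
run_lengths = [2, 3, 1, 2]   # illustrative run lengths for the uneven selection
take_from_first = True
i = 0                        # shared frame position, i.e. the identification info
r = 0
while i < len(first_sequence):
    sequence = first_sequence if take_from_first else second_sequence
    target = first_target if take_from_first else second_target
    for _ in range(run_lengths[r % len(run_lengths)]):
        if i >= len(sequence):
            break
        target.append(sequence[i])  # each position is consumed by exactly one source
        i += 1
    take_from_first = not take_from_first
    r += 1

print(first_target)   # ['A1', 'A2', 'A6', 'A9', 'A10', ...]
print(second_target)  # ['B3', 'B4', 'B5', 'B7', 'B8', ...]
```

Because every frame position is taken from exactly one of the two sequences, the identification information of the two target sets remains complementary regardless of the run lengths chosen.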
Step S108: cross-combine the target video frames in the first target video frame set and the second target video frame set in the order of their identification information to obtain a target video frame sequence.
After the first and second target video frame sets have been determined, the target frames are cross-combined according to the order of the identification information in the two sets, yielding the target video frame sequence.
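A minimal sketch of this cross-combination, assuming each target frame's label encodes its identification information, might look like this.

```python
first_target = ["A1", "A2", "A6", "A9", "A10"]    # selected from the first sequence
second_target = ["B3", "B4", "B5", "B7", "B8"]    # selected from the second sequence

def frame_index(label: str) -> int:
    """Recover the identification info (frame position) from an illustrative label."""
    return int(label[1:])

# cross-combine: one ordered pass over the union, driven only by identification info
target_sequence = sorted(first_target + second_target, key=frame_index)
print(target_sequence)  # ['A1', 'A2', 'B3', 'B4', 'B5', 'A6', 'B7', 'B8', 'A9', 'A10']
```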
In the video processing method provided by the embodiments of the present application, the two video sources produced by parallel shooting are decoded, target frames are alternately selected in the order of the frames' decompression timestamps, and the target frames are cross-merged into a video frame sequence with a naked-eye 3D visual effect. The sequence can then be combined into a single complete naked-eye 3D video by a video encoding algorithm, so that an ordinary video player alternately presents the two parallax videos. Viewers need no special equipment; after processing by the brain, a stereoscopic effect is perceived.
After the target video frame sequence is obtained, the method may further include: encapsulating audio frames from the original video sources with the target video frames in the target video frame sequence to obtain a naked-eye 3D video file corresponding to the two original video sources.
In the embodiments of the present application, only the video frames are cross-merged; the audio receives no special processing. At the final encoding stage, the audio frames of either original video source may be used for encapsulation; that is, after encapsulating the audio frames from an original source with the target video frames in the target video frame sequence, the naked-eye 3D video file corresponding to the two original video sources is generated.
In practice, the two original video sources to be presented in 3D can be obtained in several ways. The sources used in the embodiments of the present application are two video files generated by a dual parallel camera setup that simulates binocular vision, which is also the kind of source used in most 3D stereoscopic video production. The two parallel cameras capture video while simulating binocular vision; the two sources must keep the same target at all times, synchronize the audio start and end times, and use the same frame rate (no lower than 60 frames per second, i.e., 60 complete pictures generated per second). Alternatively, a target video source can be processed by image recognition to compute two original sources with parallax, i.e., the sources can be obtained purely in software.
To improve the final viewing effect of the naked-eye 3D video file, an embodiment of the present application provides the following target frame selection method for step S106, i.e., the step of alternately selecting target video frames from the first and second video frame sequences according to the identification information of the video frames; see the flowchart of the video frame selection method shown in FIG. 2.
Step S202: following the order of the identification information of the video frames in the first video frame sequence, skip a specified number of video frames at a time, and select the same specified number of first target video frames from the first video frame sequence to add to the first target video frame set.
The specified number may be one or more. The number of skipped frames equals the number of selected frames, so target frames are selected from the two sequences alternately and evenly, which yields a better 3D video effect. For example, if the first sequence contains frames A1, A2, A3, ..., A20 and the second contains B1, B2, B3, ..., B20, two frames may be skipped and first target frames A3 and A4 selected from the first sequence, then two more skipped and A7 and A8 selected, and so on until the last frame. The selected first target frames A3, A4, A7, A8, ... are added to the first target video frame set in order.
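A possible sketch of step S202 with a specified number k is given below; skipping k frames and then selecting k frames is one straightforward reading of the interval rule, not the only admissible implementation.

```python
def select_with_interval(sequence, k):
    """Return (selected, skipped) frames, alternately skipping k and selecting k."""
    selected, skipped = [], []
    for i, frame in enumerate(sequence):
        # positions 0..k-1 are skipped, k..2k-1 selected, then the pattern repeats
        if (i // k) % 2 == 1:
            selected.append(frame)
        else:
            skipped.append(frame)
    return selected, skipped

first_sequence = [f"A{i}" for i in range(1, 21)]
first_target, unselected = select_with_interval(first_sequence, k=2)
print(first_target)  # ['A3', 'A4', 'A7', 'A8', 'A11', 'A12', ...]
print(unselected)    # ['A1', 'A2', 'A5', 'A6', 'A9', 'A10', ...]
```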
Step S204: add to the second target video frame set, in the order of the identification information, the second target video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence.
The second video frame sequence contains frames B1, B2, B3, ..., B20. Generally, for two original video sources captured by a binocular camera device, the identification information of the corresponding frames in the two decoded sequences is consistent; that is, the identification information of frame A1 matches that of B1, A2 matches B2, and so on. If the two sources were instead computed in software, the identification information of the corresponding decoded frames may differ somewhat; in that case, the identification information of the two sequences must first be adjusted to be consistent while keeping the order unchanged.
In this way, once the first target video frames have been selected from the first sequence, the second target video frames in the second sequence can be determined directly: the frames in the second video frame sequence whose identification information matches the unselected frames of the first sequence are added, in the order of the identification information, to the second target video frame set.
Continuing the example above, once the first target video frames are determined to be A3, A4, A7, A8, ..., the unselected frames of the first sequence are A1, A2, A5, A6, .... The frames in the second sequence with the same identification information are B1, B2, B5, B6, ..., so frames B1, B2, B5, B6, ... are added to the second target video frame set as second target video frames.
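Step S204 can then be sketched as taking the complement of the identification information already chosen in step S202; the truncated first target set below is illustrative only.

```python
second_sequence = [f"B{i}" for i in range(1, 21)]
first_target = ["A3", "A4", "A7", "A8"]   # result of step S202 (truncated for brevity)

# identification info already consumed by the first target set
selected_indices = {int(label[1:]) for label in first_target}

# the second target set takes, in order, every frame whose identification info
# was NOT selected from the first sequence, keeping the two sets complementary
second_target = [frame for frame in second_sequence
                 if int(frame[1:]) not in selected_indices]
print(second_target[:6])  # ['B1', 'B2', 'B5', 'B6', 'B9', 'B10']
```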
In one embodiment, the step of alternately selecting target video frames from the first and second video frame sequences according to the identification information of the video frames may be implemented as follows (see FIG. 3): according to the identification information, odd-numbered frames are selected from the first sequence (the first video frames A1, A3, ... in the figure) and added to the first target video frame set, and even-numbered frames are selected from the second sequence (the second video frames B2, B4, ... in the figure) and added to the second target video frame set. The odd and even frames are then cross-merged according to their identification information to obtain the target video frame sequence: first video frame A1, second video frame B2, first video frame A3, second video frame B4, and so on. This corresponds to the case in the embodiment above where the specified number is one, i.e., one frame is selected every other frame, and this selection scheme achieves a better 3D video effect.
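For this odd/even variant, the selection and the cross-merge can be written together in a few lines; as before, the labels are placeholders for frames carrying identification information.

```python
first_sequence = [f"A{i}" for i in range(1, 11)]
second_sequence = [f"B{i}" for i in range(1, 11)]

target_sequence = []
for index in range(1, len(first_sequence) + 1):
    if index % 2 == 1:                           # odd identification info -> first sequence
        target_sequence.append(first_sequence[index - 1])
    else:                                        # even identification info -> second sequence
        target_sequence.append(second_sequence[index - 1])

print(target_sequence)  # ['A1', 'B2', 'A3', 'B4', 'A5', 'B6', 'A7', 'B8', 'A9', 'B10']
```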
After the first and second target video frame sets are determined by the above process, the target frames of the two sets are cross-merged according to their identification information to generate the target video frame sequence, and the audio frames from the original video sources are then encapsulated with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two original sources. In one embodiment, the audio frames of either original video source are taken as the target audio frames, and the target audio frames and the target video frames of the target video frame sequence are encapsulated by a preset encoding algorithm to obtain the naked-eye 3D video file. The preset encoding algorithm may include one of the H264 encoding algorithm, the H265 encoding algorithm, and the AV1 encoding algorithm.
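The final encapsulation can be outlined as follows; encode_video() and mux_audio_video() are hypothetical placeholders standing in for a real H264, H265, or AV1 encoder and a container muxer, and are not APIs named by this application.

```python
def encode_video(target_sequence, codec="h264"):
    """Hypothetical: compress the cross-combined frame sequence with the chosen codec."""
    raise NotImplementedError("replace with a real H264/H265/AV1 encoder")

def mux_audio_video(encoded_video, audio_frames, output_path):
    """Hypothetical: wrap the encoded video and one source's audio into a playable file."""
    raise NotImplementedError("replace with a real container muxer")

# Audio is taken unchanged from either original source, since only the video
# frames were cross-merged:
# encoded = encode_video(target_sequence, codec="h265")
# mux_audio_video(encoded, audio_frames_from_source_one, "naked_eye_3d.mp4")
```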
Through re-encoding, the newly generated video file has the same frame rate as the original video sources, but because it is composed from two video files, each monocular video (the part acting on one eye) has half the frame rate of the original source; for example, a 60-frame-per-second output presents 30 frames per second to each eye. Therefore, to reduce the choppiness perceived when a single eye views the merged video, the original sources should be recorded at as high a frame rate as possible. The newly generated naked-eye 3D video file can be played on an ordinary video player. Watched with one eye, it appears as a video with ghosting; watched with both eyes, after brief adaptation the brain assigns the ghosted images to the respective eyes, and after processing in the visual perception areas of the brain, the illusion of viewing a stereoscopic image is produced, i.e., the naked-eye 3D video effect is achieved.
The video processing method provided by the embodiments of the present application is implemented purely by software algorithms. It does not depend on a proprietary viewing device or a dedicated video player, the video file can use encoding formats commonly available on the market, and no playback condition is restricted; viewers can experience the 3D stereoscopic video effect with the naked eye.
Based on the above method embodiments, an embodiment of the present application further provides a video processing apparatus. As shown in FIG. 4, the apparatus includes: a video source acquisition module 402, configured to acquire two channels of original video sources to be presented in 3D; a video decoding module 404, configured to decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels; a target frame selection module 406, configured to alternately select target video frames from the first and second video frame sequences according to the identification information of the video frames, so that the identification information of the target frames in the resulting first and second target video frame sets is complementary; and a frame combination module 408, configured to cross-combine the target video frames in the first and second target video frame sets in the order of their identification information to obtain a target video frame sequence.
In another possible implementation, the video processing apparatus further includes an audio/video encapsulation module 410, configured to encapsulate audio frames from the original video sources with the target video frames in the target video frame sequence to obtain a naked-eye 3D video file corresponding to the two original video sources.
In another possible implementation, the identification information includes the decompression timestamp of a video frame or the sequence number corresponding to the video frame.
In another possible implementation, the video source acquisition module 402 is further configured to collect the two channels of original video sources to be presented in 3D with a binocular camera device, or to compute a target video source by image recognition technology to obtain two original video sources with parallax.
In another possible implementation, the target frame selection module 406 is further configured to: following the order of the identification information of the video frames in the first video frame sequence, skip a specified number of frames at a time and select the same specified number of first target video frames from the first sequence to add to the first target video frame set; and add to the second target video frame set, in the order of the identification information, the second target video frames in the second sequence that have the same identification information as the unselected frames in the first sequence.
In another possible implementation, the target frame selection module 406 is further configured to select, according to the identification information of the video frames, odd-numbered frames from the first video frame sequence to add to the first target video frame set, and even-numbered frames from the second video frame sequence to add to the second target video frame set.
In another possible implementation, the apparatus further includes a timestamp adjustment module 412, configured to adjust the identification information of the video frames to be consistent, while keeping the order unchanged, if the identification information of the video frames in the first and second video frame sequences is inconsistent.
In another possible implementation, the audio/video encapsulation module 410 is further configured to: take the audio frames of either original video source as the target audio frames; and encapsulate the target audio frames and the target video frames in the target video frame sequence by a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two original video sources.
In another possible implementation, the preset encoding algorithm includes one of the H264 encoding algorithm, the H265 encoding algorithm, and the AV1 encoding algorithm.
The implementation principles and technical effects of the video processing apparatus provided by the embodiments of the present application are the same as those of the foregoing video processing method embodiments. For brevity, for matters not mentioned in the apparatus embodiments, reference may be made to the corresponding content of the foregoing method embodiments.
An embodiment of the present application further provides an electronic device. FIG. 5 is a schematic structural diagram of the electronic device, which includes a processor 51 and a memory 50. The memory 50 stores computer-executable instructions that can be executed by the processor 51, and the processor 51 executes the computer-executable instructions to implement the above method.
In the implementation shown in FIG. 5, the electronic device further includes a bus 52 and a communication interface 53, and the processor 51, the communication interface 53, and the memory 50 are connected by the bus 52.
The memory 50 may include a high-speed random access memory (RAM) and may also include a non-volatile memory, for example at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 53 (wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, or the like may be used. The bus 52 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.
The processor 51 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 51 or by instructions in the form of software. The processor 51 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), or the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor 51 reads the information in the memory and completes the steps of the methods of the foregoing embodiments in combination with its hardware.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When invoked and executed by a processor, the computer-executable instructions cause the processor to implement the above method; for the specific implementation, reference may be made to the foregoing method embodiments, which are not repeated here.
The computer program product of the video processing method, apparatus, and electronic device provided by the embodiments of the present application includes a computer-readable storage medium storing program code. The instructions included in the program code may be configured to execute the methods described in the foregoing method embodiments; for the specific implementation, reference may be made to the method embodiments, which are not repeated here.
Unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the essence of the technical solution of the present application, or the part contributing to the related art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In the video processing method provided by the embodiments of the present application, the two channels of original video sources to be presented in 3D are first decoded to obtain the first and second video frame sequences corresponding to the two sources; then, according to the identification information of the video frames, target frames with complementary identification information are alternately selected from the first and second sequences to obtain the first and second target video frame sets; and then, in the order of the identification information of the target frames, the target frames of the two sets are cross-combined to obtain the target video frame sequence. By cross-merging the video frames of the two video sources, the embodiments of the present application obtain a video frame sequence with a naked-eye 3D visual effect, from which a naked-eye 3D video file playable on an ordinary player can be generated, so that users can watch video with a 3D effect with the naked eye and without relying on special equipment.
In the description of the present application, it should be noted that orientation or positional terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" indicate orientations or positional relationships based on those shown in the drawings; they are used only to facilitate and simplify the description of the present application, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore cannot be understood as limiting the present application. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and should not be understood as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific implementations of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features within the technical scope disclosed by the present application; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Industrial Applicability
The present application can be applied to the technical field of software algorithm development, and provides a video processing method, apparatus, electronic device, and computer-readable storage medium. By cross-merging the video frames of two video sources, a video frame sequence with a naked-eye 3D visual effect can be obtained, from which a naked-eye 3D video file can be generated, so that users can watch video with a 3D effect with the naked eye and without relying on special equipment.

Claims (12)

  1. A video processing method, comprising:
    acquiring two channels of original video sources to be presented in three dimensions;
    decoding the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence respectively corresponding to the two channels of original video sources;
    alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to identification information of video frames, so that identification information of target video frames in a resulting first target video frame set and a resulting second target video frame set is complementary; and
    cross-combining the target video frames in the first target video frame set and the second target video frame set according to an order of the identification information of the target video frames to obtain a target video frame sequence.
  2. The method according to claim 1, wherein after the target video frame sequence is obtained, the method further comprises:
    encapsulating audio frames in the original video sources with the target video frames in the target video frame sequence to obtain a naked-eye three-dimensional video file corresponding to the two channels of original video sources.
  3. The method according to claim 1 or 2, wherein the identification information comprises a decompression timestamp of a video frame or a sequence number corresponding to the video frame.
  4. The method according to any one of claims 1 to 3, wherein the step of acquiring two channels of original video sources to be presented in three dimensions comprises:
    collecting, by a parallel binocular camera device, two channels of original video sources to be presented in three dimensions;
    or,
    calculating a target video source by image recognition technology to obtain two channels of original video sources with parallax.
  5. The method according to any one of claims 1 to 4, wherein the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames comprises:
    selecting, following an order of the identification information of the video frames in the first video frame sequence and at intervals of a specified number of video frames, the specified number of first target video frames from the first video frame sequence and adding them to the first target video frame set; and
    adding, to the second target video frame set in the order of the identification information, second target video frames in the second video frame sequence that have the same identification information as unselected video frames in the first video frame sequence.
  6. The method according to any one of claims 1 to 4, wherein the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames comprises:
    selecting, according to the identification information of the video frames, odd-numbered frames from the first video frame sequence to add to the first target video frame set, and selecting even-numbered frames from the second video frame sequence to add to the second target video frame set.
  7. The method according to any one of claims 1 to 6, wherein before the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, the method further comprises:
    if the identification information of the video frames in the first video frame sequence and the second video frame sequence is inconsistent, adjusting the identification information of the video frames to be consistent while maintaining the order.
  8. The method according to claim 2, wherein the step of encapsulating the audio frames in the original video sources with the target video frames in the target video frame sequence to obtain the naked-eye three-dimensional video file corresponding to the two channels of original video sources comprises:
    taking an audio frame in either channel of the original video sources as a target audio frame; and
    encapsulating the target audio frame and the target video frames in the target video frame sequence by a preset encoding algorithm to obtain the naked-eye three-dimensional video file corresponding to the two channels of original video sources.
  9. The method according to claim 8, wherein the preset encoding algorithm comprises any one of an H264 encoding algorithm, an H265 encoding algorithm, and an AV1 encoding algorithm.
  10. A video processing apparatus, comprising:
    a video source acquisition module, configured to acquire two channels of original video sources to be presented in three dimensions;
    a video decoding module, configured to decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence respectively corresponding to the two channels of original video sources;
    a target frame selection module, configured to alternately select target video frames from the first video frame sequence and the second video frame sequence according to identification information of video frames, so that identification information of target video frames in a resulting first target video frame set and a resulting second target video frame set is complementary; and
    a frame combination module, configured to cross-combine the target video frames in the first target video frame set and the second target video frame set according to an order of the identification information of the target video frames to obtain a target video frame sequence.
  11. An electronic device, comprising a processor and a memory, wherein the memory stores computer-executable instructions executable by the processor, and the processor executes the computer-executable instructions to implement the method according to any one of claims 1 to 9.
  12. A computer-readable storage medium storing computer-executable instructions, wherein, when the computer-executable instructions are invoked and executed by a processor, the computer-executable instructions cause the processor to implement the method according to any one of claims 1 to 9.
PCT/CN2021/129870 2020-12-30 2021-11-10 Video processing method and apparatus, electronic device, and computer readable storage medium WO2022142757A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011614555.7A CN114697758B (en) 2020-12-30 2020-12-30 Video processing method and device and electronic equipment
CN202011614555.7 2020-12-30

Publications (1)

Publication Number Publication Date
WO2022142757A1 true WO2022142757A1 (en) 2022-07-07

Family

ID=82132974

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129870 WO2022142757A1 (en) 2020-12-30 2021-11-10 Video processing method and apparatus, electronic device, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN114697758B (en)
WO (1) WO2022142757A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116567353A (en) * 2023-07-10 2023-08-08 湖南快乐阳光互动娱乐传媒有限公司 Video delivery method and device, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100007722A1 (en) * 2008-07-14 2010-01-14 Ul-Je Kim Stereoscopic image display device and driving method thereof
US20120105602A1 (en) * 2010-11-03 2012-05-03 3Dmedia Corporation Methods, systems, and computer program products for creating three-dimensional video sequences
CN102547313A (en) * 2010-12-21 2012-07-04 北京睿为视讯技术有限公司 Three-dimensional video play system and method thereof
US20130215239A1 (en) * 2012-02-21 2013-08-22 Sen Wang 3d scene model from video
CN107872670A (en) * 2017-11-17 2018-04-03 暴风集团股份有限公司 A kind of 3D video coding-decoding methods, device, server, client and system
CN111447504A (en) * 2020-03-27 2020-07-24 北京字节跳动网络技术有限公司 Three-dimensional video processing method and device, readable storage medium and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080059937A (en) * 2006-12-26 2008-07-01 삼성전자주식회사 Display apparatus and processing method for 3d image and processing system for 3d image
CN103081478A (en) * 2010-06-24 2013-05-01 电子部品研究院 Method for configuring stereoscopic moving picture file
CN103024449B (en) * 2011-09-28 2015-08-19 中国移动通信集团公司 Stream of video frames processing method, video server and terminal equipment
CN104363437A (en) * 2014-11-28 2015-02-18 广东欧珀移动通信有限公司 Method and apparatus for recording stereo video
CN106303495B (en) * 2015-06-30 2018-01-16 深圳创锐思科技有限公司 Synthetic method, device and its mobile terminal of panoramic stereo image
CN108111833A (en) * 2016-11-24 2018-06-01 阿里巴巴集团控股有限公司 For the method, apparatus and system of stereo video coding-decoding
CN110868560B (en) * 2018-08-27 2022-04-15 青岛海信移动通信技术股份有限公司 Video recording method based on binocular camera and terminal equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100007722A1 (en) * 2008-07-14 2010-01-14 Ul-Je Kim Stereoscopic image display device and driving method thereof
US20120105602A1 (en) * 2010-11-03 2012-05-03 3Dmedia Corporation Methods, systems, and computer program products for creating three-dimensional video sequences
CN102547313A (en) * 2010-12-21 2012-07-04 北京睿为视讯技术有限公司 Three-dimensional video play system and method thereof
US20130215239A1 (en) * 2012-02-21 2013-08-22 Sen Wang 3d scene model from video
CN107872670A (en) * 2017-11-17 2018-04-03 暴风集团股份有限公司 A kind of 3D video coding-decoding methods, device, server, client and system
CN111447504A (en) * 2020-03-27 2020-07-24 北京字节跳动网络技术有限公司 Three-dimensional video processing method and device, readable storage medium and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116567353A (en) * 2023-07-10 2023-08-08 湖南快乐阳光互动娱乐传媒有限公司 Video delivery method and device, storage medium and electronic equipment
CN116567353B (en) * 2023-07-10 2023-09-12 湖南快乐阳光互动娱乐传媒有限公司 Video delivery method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN114697758A (en) 2022-07-01
CN114697758B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
JP5663617B2 (en) Stereo image sequence encoding method and decoding method
CN105187816B (en) Method and apparatus for the activity space of reasonable employment frame packing form
CN102918847B (en) The method and apparatus of display image
CN106303573B (en) 3D video image processing method, server and client
KR20110064161A (en) Method and apparatus for encoding a stereoscopic 3d image, and display apparatus and system for displaying a stereoscopic 3d image
JP2009207136A (en) Method for processing multiple video streams, and systems for encoding and decoding video streams
US20130038611A1 (en) Image conversion device
US20130169543A1 (en) Rendering Apparatuses, Display System and Methods for Rendering Multimedia Data Objects with a Function to Avoid Eye Fatigue
CN102223550A (en) Image processing apparatus, image processing method, and program
US20150304640A1 (en) Managing 3D Edge Effects On Autostereoscopic Displays
TWI539790B (en) Apparatus, method and software product for generating and rebuilding a video stream
Shao et al. Stereoscopic video coding with asymmetric luminance and chrominance qualities
US20110157312A1 (en) Image processing apparatus and method
US20130229409A1 (en) Image processing method and image display device according to the method
WO2014005558A1 (en) System and method for user to upload 3d video to video website
WO2022142757A1 (en) Video processing method and apparatus, electronic device, and computer readable storage medium
TWI487379B (en) Video encoding method, video encoder, video decoding method and video decoder
Coll et al. 3D TV at home: Status, challenges and solutions for delivering a high quality experience
KR101686168B1 (en) Method for constituting stereoscopic moving picture file
Aflaki et al. Simultaneous 2D and 3D perception for stereoscopic displays based on polarized or active shutter glasses
US20140072271A1 (en) Recording apparatus, recording method, reproduction apparatus, reproduction method, program, and recording reproduction apparatus
CN114040184A (en) Image display method, system, storage medium and computer program product
TW201916682A (en) Real-time 2D-to-3D conversion image processing method capable of processing 2D image in real time and converting the 2D image into 3D image without requiring complicated subsequent processing
KR101433082B1 (en) Video conversing and reproducing method to provide medium feeling of two-dimensional video and three-dimensional video
KR101674688B1 (en) A method for displaying a stereoscopic image and stereoscopic image playing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21913500

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21913500

Country of ref document: EP

Kind code of ref document: A1