WO2022142757A1 - Video processing method and apparatus, electronic device, and computer readable storage medium - Google Patents

Video processing method and apparatus, electronic device, and computer readable storage medium

Info

Publication number
WO2022142757A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
video frame
target
frames
frame sequence
Prior art date
Application number
PCT/CN2021/129870
Other languages
French (fr)
Chinese (zh)
Inventor
朱韬
Original Assignee
北京金山云网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京金山云网络技术有限公司 filed Critical 北京金山云网络技术有限公司
Publication of WO2022142757A1 publication Critical patent/WO2022142757A1/en

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30 Image reproducers
    • H04N13/302 Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof

Definitions

  • the present application relates to the technical field of software algorithm development, and in particular, to a video processing method, apparatus, electronic device, and computer-readable storage medium.
  • in the related art, there are two ways to achieve a 3D (three-dimensional) effect: one is for the user to wear 3D glasses or VR (virtual reality) glasses, which relies on special equipment; the other is to modify the player so that it plays a video with a 3D effect that the human eye can watch directly. The latter approach cannot use an ordinary video player, and the rapid switching between the two videos consumes a large amount of the device's computing resources, which seriously degrades the 3D effect of high-definition, high-quality video.
  • the purpose of this application is to provide a video processing method, apparatus, electronic device, and computer-readable storage medium that obtain a video frame sequence with a naked-eye 3D visual effect by cross-merging the video frames of two video sources, and can then generate a naked-eye 3D video file playable by an ordinary player, so that users can watch 3D-effect video with the naked eye without relying on special equipment.
  • an embodiment of the present application provides a video processing method, including: acquiring two channels of original video sources to be presented in 3D; decoding the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources; according to the identification information of the video frames, alternately selecting target video frames from the first video frame sequence and the second video frame sequence, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and cross-combining, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.
  • the method further includes: encapsulating the audio frames in the original video source with the target video frames in the target video frame sequence to obtain a naked-eye 3D video file corresponding to the two channels of original video sources.
  • the above-mentioned identification information includes a decompression timestamp of the video frame or a sequence number corresponding to the video frame.
  • the above-mentioned step of acquiring two channels of original video sources to be presented in 3D includes: collecting, through a parallel binocular camera device, two channels of original video sources set for 3D presentation; or performing calculation on a target video source through image recognition technology to obtain two channels of original video sources with parallax.
  • the above-mentioned step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames includes: following the order of the identification information of the video frames in the first video frame sequence, skipping a specified number of video frames each time, selecting the specified number of first target video frames from the first video frame sequence and adding them to the first target video frame set; and adding the second target video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence to the second target video frame set in the order of the identification information.
  • the above-mentioned step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames includes: according to the identification information of the video frames, selecting odd-numbered frames from the first video frame sequence and adding them to the first target video frame set, and selecting even-numbered frames from the second video frame sequence and adding them to the second target video frame set.
  • before the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, the method further includes: if the identification information of the video frames in the first video frame sequence and the second video frame sequence is inconsistent, adjusting the identification information of the video frames to be consistent without changing their order.
  • the above-mentioned step of encapsulating the audio frames in the original video source with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two channels of original video sources includes: taking the audio frames in either original video source as the target audio frames; and encapsulating the target audio frames and the target video frames in the target video frame sequence with a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • the above-mentioned preset encoding algorithm includes one of the following: an H264 encoding algorithm, an H265 encoding algorithm, and an AV1 encoding algorithm.
  • an embodiment of the present application further provides a video processing apparatus, including: a video source acquisition module configured to acquire two channels of original video sources to be presented in 3D; a video decoding module configured to decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources; a target frame selection module configured to alternately select target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and a frame combination module configured to cross-combine, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.
  • embodiments of the present application further provide an electronic device, including a processor and a memory, where the memory stores computer-executable instructions executable by the processor, and the processor executes the computer-executable instructions to implement the method of the first aspect above.
  • embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of the first aspect above.
  • FIG. 1 is a flowchart of a video processing method provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a video frame selection method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a video cross-combination provided by an embodiment of the present application.
  • FIG. 4 is a structural block diagram of a video processing apparatus provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the stereoscopic video technology mainly used in 3D movies directly synthesizes an image with binocular parallax through parallel cameras during shooting, with fine-tuning performed afterwards. During playback, two parallax images are present on the screen at the same time; through special equipment such as polarized glasses, the left and right eyes each receive the corresponding video, and the parallax is perceived as a stereoscopic image after processing by the human brain.
  • the second approach is VR stereoscopic video: two videos with parallax generated by two parallel cameras are played simultaneously, and a VR device (head-mounted device) is used so that the left and right eyes each receive their corresponding video; the resulting parallax is processed by the human brain and perceived as a stereoscopic video effect.
  • the third approach is conventional naked-eye 3D technology: the player is modified so that the two channels of video with parallax generated by parallel cameras are rapidly played in alternation on a single player terminal device; when the human eye watches directly, the rapid switching of the videos produces parallax, and the human brain perceives a stereoscopic image effect after processing the parallax.
  • the embodiments of the present application provide a video processing method, apparatus, electronic device, and computer-readable storage medium.
  • a video frame sequence with naked-eye 3D visual effects can be obtained.
  • a 3D naked-eye video file that can be played by a common player can be generated, so that the user can watch the 3D-effect video without relying on a special device.
  • FIG. 1 is a flowchart of a video processing method provided by an embodiment of the present application, and the video processing method specifically includes the following steps:
  • Step S102: acquire two channels of original video sources to be presented in 3D.
  • the two channels of original video sources to be presented in 3D in the embodiments of the present application may be two channels of video sources with parallax collected by hardware devices, for example, two channels of video sources generated by parallel shooting with a parallel binocular camera device; or they may be two channels of video sources obtained by calculating video parallax with software, for example, by performing calculation on a target video source through image recognition technology to obtain two channels of original video sources with parallax. The above binocular camera device includes two cameras shooting in parallel, or a binocular camera.
  • Step S104: decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources.
  • the above video source decoding process can be implemented by a hardware decoder or by software decoding technology. After the two channels of original video sources are decoded respectively, the first video frame sequence and the second video frame sequence are obtained, both arranged in the order of the identification information of the video frames.
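  • for illustration only (not part of the patent disclosure), a minimal decoding sketch in Python using the PyAV library might look as follows; the file names and the use of each frame's presentation timestamp (pts) as identification information are assumptions made for the example:

```python
# Minimal sketch, for illustration only. Assumes the PyAV library ("pip install av")
# and two ordinary video files; the file names are hypothetical, and the frame's
# presentation timestamp (pts) stands in for the identification information.
import av

def decode_frame_sequence(path):
    """Decode one original video source into a list of frames ordered by pts."""
    frames = []
    with av.open(path) as container:
        for frame in container.decode(video=0):   # first video stream only
            if frame.pts is not None:             # skip frames without a timestamp
                frames.append(frame)
    frames.sort(key=lambda f: f.pts)              # keep the sequence in pts order
    return frames

first_sequence = decode_frame_sequence("left_eye.mp4")    # hypothetical file names
second_sequence = decode_frame_sequence("right_eye.mp4")
```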
  • Step S106: according to the identification information of the video frames, alternately select target video frames from the first video frame sequence and the second video frame sequence, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary.
  • the above identification information may be the decompression timestamp of the video frame, or may also be the sequence number corresponding to the video frame.
  • the target video frames may be selected alternately from the first video frame sequence and the second video frame sequence either unevenly or evenly. For example, suppose the first video frame sequence includes video frames A1, A2, A3, ..., A20 and the second video frame sequence includes video frames B1, B2, B3, ..., B20. Target video frames A1 and A2 may be selected from the first video frame sequence, target video frames B3, B4 and B5 from the second video frame sequence, then target video frame A6 from the first video frame sequence, target video frames B7 and B8 from the second video frame sequence, and so on until the last frame. This is an uneven alternate selection; it is sufficient that the identification information of the target video frames in the first target video frame set and the second target video frame set is complementary.
  • alternatively, the selection may be even, for example selecting target video frame A1 from the first video frame sequence, target video frame B2 from the second video frame sequence, then A3 from the first sequence and B4 from the second, and so on until the last frame.
  • the target video frames selected from the first video frame sequence form the first target video frame set; the target video frames selected from the second video frame sequence form the second target video frame set; the target video frames in the first target video frame set and the second target video frame set are complementary.
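  • as a minimal sketch (not the patent's reference implementation), the even alternate selection into two complementary sets could be expressed in Python as follows; frames are identified here simply by their index in the aligned decoded sequences, which stands in for the decompression timestamp or sequence number:

```python
# Illustrative sketch only: even alternate selection producing two complementary sets.

def select_alternately(first_sequence, second_sequence):
    first_target_set = []   # frames taken from the first sequence (A1, A3, A5, ...)
    second_target_set = []  # frames taken from the second sequence (B2, B4, B6, ...)
    for index in range(min(len(first_sequence), len(second_sequence))):
        if index % 2 == 0:
            first_target_set.append((index, first_sequence[index]))
        else:
            second_target_set.append((index, second_sequence[index]))
    return first_target_set, second_target_set
```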
  • Step S108: cross-combine, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.
  • after the target video frames are selected alternately in the order of the decompression timestamps of the video frames and cross-merged, a video frame sequence with a naked-eye 3D visual effect is obtained, which can then be combined into a complete naked-eye 3D video through a video encoding algorithm.
  • in this way, the two channels of video with parallax are played in alternation; the viewer can watch without special equipment, and after processing by the brain, the stereoscopic effect is perceived.
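  • continuing the sketch above (illustrative only), the cross-merge of step S108 amounts to ordering the union of the two complementary sets by their identification information:

```python
# Illustrative sketch only: cross-combine the two complementary sets in the
# order of their identification information to form the target frame sequence.

def cross_combine(first_target_set, second_target_set):
    combined = sorted(first_target_set + second_target_set, key=lambda item: item[0])
    return [frame for _, frame in combined]   # e.g. A1, B2, A3, B4, ...

# Example usage with the earlier sketch:
# target_sequence = cross_combine(*select_alternately(first_sequence, second_sequence))
```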
  • after the target video frame sequence is obtained, the following step may also be included: encapsulating the audio frames in the original video source with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • the embodiment of the present application only performs cross-merging processing on the video frames and does not perform special processing on the audio; the audio frames of either original video source can be used for encapsulation, that is, the audio frames in that original video source are encapsulated together with the target video frames in the target video frame sequence, so that the naked-eye 3D video file corresponding to the two channels of original video sources can be generated.
  • the video source used in the embodiment of the present application may be a two-channel video file generated by two parallel camera devices simulating binocular vision, as in some 3D stereoscopic video production: the two video sources must always cover the same target, the start and end times of the audio must be synchronized, and the frame rates of the two channels of video must be unified (not less than 60 frames per second, i.e. 60 complete pictures are generated per second).
  • alternatively, a target video source can be processed by image recognition technology to obtain two channels of original video sources with parallax, that is, the acquisition of the video sources can also be realized by means of software.
  • the embodiment of the present application further provides an implementation of the target video frame selection method, that is, of the step in step S106 of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames; it can be implemented with reference to the flowchart of the video frame selection method shown in FIG. 2:
  • Step S202: following the order of the identification information of the video frames in the first video frame sequence, skip a specified number of video frames each time and select the specified number of first target video frames from the first video frame sequence to add to the first target video frame set.
  • the above specified number can be one or more.
  • the number of video frames skipped is the same as the number of target frames selected, that is, the target video frames are selected from the two video frame sequences alternately and evenly; this achieves a better 3D video effect. For example, if the first video frame sequence includes video frames A1, A2, A3, ..., A20 and the second video frame sequence includes video frames B1, B2, B3, ..., B20, then, with an interval of two video frames, the first target video frames A3 and A4 are selected from the first video frame sequence, then, after another two video frames, the first target video frames A7 and A8 are selected from the first video frame sequence, and so on until the last frame. The first target video frames A3, A4, A7, A8, ... selected above are added to the first target video frame set in order.
  • Step S204: add the second target video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence to the second target video frame set in the order of the identification information.
  • the second video frame sequence includes video frames B1, B2, B3, ... whose identification information corresponds one-to-one to that of the first video frame sequence, that is, the identification information of video frame A1 is consistent with that of video frame B1, the identification information of video frame A2 is consistent with that of video frame B2, and so on. If the two channels of video sources are calculated by a software method, however, there may be some differences between the identification information of the corresponding video frames of the two decoded channels; in this case, the identification information of the two channels of video frames must first be adjusted to be consistent while keeping the order unchanged.
  • once the identification information is consistent, the second target video frames in the second video frame sequence can be determined directly, that is, the video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence are taken as second target video frames and added to the second target video frame set in the order of the identification information.
  • continuing the above example, after the first target video frames in the first target video frame set are determined to be A3, A4, A7, A8, ..., the unselected video frames in the first video frame sequence are A1, A2, A5, A6, ...; the video frames in the second video frame sequence with the same identification information as A1, A2, A5, A6, ... are B1, B2, B5, B6, ...; therefore, the video frames B1, B2, B5, B6, ... are added to the second target video frame set as second target video frames.
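  • an illustrative sketch of the steps S202/S204 variant follows, assuming the two sequences are already aligned and using the frame index as identification information; with gap = 2 it reproduces the example above (A3, A4, A7, A8, ... from the first sequence and B1, B2, B5, B6, ... from the second):

```python
# Illustrative sketch only: the "specified number" variant of steps S202/S204.

def select_with_gap(first_sequence, second_sequence, gap):
    length = min(len(first_sequence), len(second_sequence))
    first_target_set, second_target_set = [], []
    take_from_first = False            # start by skipping `gap` frames of the first sequence
    start = 0
    while start < length:
        for index in range(start, min(start + gap, length)):
            if take_from_first:
                first_target_set.append((index, first_sequence[index]))
            else:
                second_target_set.append((index, second_sequence[index]))
        take_from_first = not take_from_first
        start += gap
    return first_target_set, second_target_set
```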
  • the above-mentioned step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames can also be implemented as follows. Referring to FIG. 3, according to the identification information of the video frames, odd-numbered frames, such as the first video frames A1 and A3 in the figure, are selected from the first video frame sequence and added to the first target video frame set, and even-numbered frames, such as the second video frames B2 and B4 in the figure, are selected from the second video frame sequence and added to the second target video frame set.
  • the odd-numbered frames and the even-numbered frames are then cross-merged to obtain the target video frame sequence: first video frame A1, second video frame B2, first video frame A3, second video frame B4, and so on.
  • this method corresponds to the case where the specified number in the above-mentioned embodiment is one, that is, a video frame is selected every other video frame; this selection method better realizes the 3D effect of the video.
  • the target video frames in the two sets are then cross-merged according to the identification information of the target video frames to generate the target video frame sequence, and the audio frames in the original video source are encapsulated with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • specifically, the audio frames in either original video source can be used as the target audio frames; the target audio frames and the target video frames in the target video frame sequence are encapsulated with a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • the above-mentioned preset encoding algorithm may include one of the following: H264 encoding algorithm, H265 encoding algorithm and AV1 encoding algorithm.
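  • a hedged sketch of the encapsulation step using the PyAV library and H.264 (one of the codecs listed above) is given below; the stream parameters, file handling, fixed frame rate, and the choice to remux the audio packets unchanged are assumptions made for the example rather than requirements of the patent:

```python
# Hedged sketch, assuming PyAV. Encodes the cross-merged frame sequence with H.264
# and copies the audio stream of one original source unchanged into the output file.
import av

def encapsulate(target_frames, audio_source_path, output_path, fps=60):
    out = av.open(output_path, mode="w")

    # Video: encode the cross-merged target frames (av.VideoFrame objects) as H.264.
    vstream = out.add_stream("h264", rate=fps)
    vstream.width = target_frames[0].width
    vstream.height = target_frames[0].height
    vstream.pix_fmt = "yuv420p"
    for frame in target_frames:
        frame.pts = None                      # let the encoder assign new timestamps
        for packet in vstream.encode(frame):
            out.mux(packet)
    for packet in vstream.encode():           # flush the encoder
        out.mux(packet)

    # Audio: take the audio stream of either original source and remux it as-is.
    with av.open(audio_source_path) as audio_in:
        in_audio = audio_in.streams.audio[0]
        astream = out.add_stream(template=in_audio)
        for packet in audio_in.demux(in_audio):
            if packet.dts is None:            # skip flushing packets
                continue
            packet.stream = astream
            out.mux(packet)

    out.close()
```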
  • the newly generated video file has the same frame rate as the original video sources, but since it is synthesized from two video files, the frame rate of each monocular view (the frames acting on one eye) is half the frame rate of the original video source. Therefore, to reduce the jerkiness perceived by a single eye after the video files are merged, the original video sources should be recorded at as high a frame rate as possible.
  • the newly generated naked-eye 3D video file can be played on ordinary video players. Watched with only one eye, the result looks like a ghosted video; watched with both eyes, after a short period of adaptation the brain automatically assigns the ghost images to the corresponding eye, and after processing in the visual perception area of the human brain the illusion of viewing a stereoscopic image is finally produced, that is, the naked-eye 3D video effect is realized.
  • the video processing method provided by the embodiments of the present application is implemented by a pure software algorithm and does not rely on a proprietary viewing device or a dedicated video player; the video file can use encoding formats commonly used in the market, and under all of these playback conditions the viewer can experience the 3D stereoscopic video effect with the naked eye.
  • the embodiments of the present application further provide a video processing apparatus, as shown in FIG. 4. The apparatus includes: a video source acquisition module 402 configured to acquire two channels of original video sources to be presented in 3D; a video decoding module 404 configured to decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources; a target frame selection module 406 configured to alternately select target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and a frame combination module 408 configured to cross-combine, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.
  • the above-mentioned video processing apparatus further includes: an audio and video encapsulation module 410 configured to encapsulate the audio frames in the original video source with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • the above-mentioned identification information includes a decompression timestamp of the video frame or a sequence number corresponding to the video frame.
  • the above-mentioned video source acquisition module 402 is further configured to: collect, through a binocular camera device, two channels of original video sources to be presented in 3D; or perform calculation on a target video source through image recognition technology to obtain two channels of original video sources with parallax.
  • the above-mentioned target frame selection module 406 is further configured to: following the order of the identification information of the video frames in the first video frame sequence, skip a specified number of video frames each time and select the specified number of first target video frames from the first video frame sequence to add to the first target video frame set; and add the second target video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence to the second target video frame set in the order of the identification information.
  • the target frame selection module 406 is further configured to: according to the identification information of the video frames, select odd-numbered frames from the first video frame sequence and add them to the first target video frame set, and select even-numbered frames from the second video frame sequence and add them to the second target video frame set.
  • the above-mentioned apparatus further includes: a timestamp adjustment module 412 configured to, if the identification information of the video frames in the first video frame sequence and the second video frame sequence is inconsistent, adjust the identification information of the video frames to be consistent while keeping the order unchanged.
  • the above-mentioned audio and video encapsulation module 410 is further configured to: take the audio frames in either original video source as the target audio frames; and encapsulate the target audio frames and the target video frames in the target video frame sequence with a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.
  • the above-mentioned preset encoding algorithm includes one of the following: an H264 encoding algorithm, an H265 encoding algorithm, and an AV1 encoding algorithm.
  • the video processing apparatus provided by the embodiments of the present application has the same implementation principle and technical effects as the foregoing video processing method embodiments; for brevity, for the parts not mentioned in the apparatus embodiments, reference may be made to the corresponding content in the foregoing video processing method embodiments.
  • FIG. 5 is a schematic structural diagram of the electronic device. The electronic device includes a processor 51 and a memory 50, where the memory 50 stores computer-executable instructions that can be executed by the processor 51, and the processor 51 executes the computer-executable instructions to implement the above method.
  • the electronic device further includes a bus 52 and a communication interface 53 , wherein the processor 51 , the communication interface 53 and the memory 50 are connected through the bus 52 .
  • the memory 50 may include a high-speed random access memory (RAM, Random Access Memory), and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the communication connection between this system network element and at least one other network element is realized through at least one communication interface 53 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.
  • the bus 52 may be an ISA (Industry Standard Architecture, industry standard architecture) bus, a PCI (Peripheral Component Interconnect, peripheral component interconnect standard) bus or an EISA (Extended Industry Standard Architecture, extended industry standard architecture) bus and the like.
  • the bus 52 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one bidirectional arrow is shown in FIG. 5, but it does not mean that there is only one bus or one type of bus.
  • the processor 51 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method can be completed by a hardware integrated logic circuit in the processor 51 or an instruction in the form of software.
  • the above-mentioned processor 51 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor 51 reads the information in the memory, and completes the steps of the methods of the foregoing embodiments in combination with its hardware.
  • embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the video processing method described above.
  • the computer program product of the video processing method, apparatus, and electronic device provided by the embodiments of the present application includes a computer-readable storage medium storing program code, and the instructions included in the program code can be used to execute the methods described in the foregoing method embodiments; for specific implementations, reference may be made to the method embodiments, which are not repeated here.
  • the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium.
  • in essence, the technical solution of the present application, or the part that contributes to the related technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • in summary, a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources are obtained; according to the identification information of the video frames, target video frames with complementary identification information are alternately selected from the first video frame sequence and the second video frame sequence to obtain a first target video frame set and a second target video frame set; and the target video frames in the two sets are cross-combined in the order of the identification information of the target video frames to obtain a target video frame sequence.
  • the embodiment of the present application obtains a video frame sequence with a naked-eye 3D visual effect by cross-merging the video frames of the two video sources, and can then generate a naked-eye 3D video file that can be played by a common player, so that users can watch 3D-effect video with the naked eye without relying on special equipment.
  • the present application can be applied to the technical field of software algorithm development and provides a video processing method, apparatus, electronic device, and computer-readable storage medium; by cross-merging the video frames of two video sources, a video frame sequence with a naked-eye 3D visual effect can be obtained and used to generate a naked-eye 3D video file, so that users can watch a 3D-effect video with the naked eye without relying on special equipment.

Abstract

The present application provides a video processing method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: obtaining a first video frame sequence and a second video frame sequence respectively corresponding to two original video sources to be presented in 3D; alternately selecting target video frames having complementary identification information from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, to obtain a first target video frame set and a second target video frame set; and cross-merging the target video frames in the two sets according to the order of the identification information of the target video frames to obtain a target video frame sequence. According to the present application, by cross-merging the video frames of the two video sources, a video frame sequence having a naked-eye 3D visual effect can be obtained, so that a naked-eye 3D video file can be generated and a user can watch a video having a 3D effect with the naked eye without relying on a special device.

Description

Video Processing Method, Apparatus, Electronic Device, and Computer-Readable Storage Medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 202011614555.7, entitled "Video Processing Method, Apparatus and Electronic Equipment", filed with the Chinese Patent Office on December 30, 2020, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present application relates to the technical field of software algorithm development, and in particular, to a video processing method, apparatus, electronic device, and computer-readable storage medium.

BACKGROUND

In the related art, there are two ways to achieve a 3D (three-dimensional) effect: one is for the user to wear 3D glasses or VR (virtual reality) glasses, which relies on special equipment; the other is to modify the player so that it plays a video with a 3D effect that the human eye can watch directly. The latter approach cannot use an ordinary video player, and the rapid switching between the two videos consumes a large amount of the device's computing resources, which seriously degrades the 3D effect of high-definition, high-quality video.

SUMMARY

The purpose of the present application is to provide a video processing method, apparatus, electronic device, and computer-readable storage medium that obtain a video frame sequence with a naked-eye 3D visual effect by cross-merging the video frames of two video sources, and can then generate a naked-eye 3D video file playable by an ordinary player, so that users can watch 3D-effect video with the naked eye without relying on special equipment.
In a first aspect, an embodiment of the present application provides a video processing method, including: acquiring two channels of original video sources to be presented in 3D; decoding the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources; according to the identification information of the video frames, alternately selecting target video frames from the first video frame sequence and the second video frame sequence, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and cross-combining, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.

In one embodiment, after the target video frame set is obtained, the method further includes: encapsulating the audio frames in the original video source with the target video frames in the target video frame sequence to obtain a naked-eye 3D video file corresponding to the two channels of original video sources.

In one embodiment, the above-mentioned identification information includes a decompression timestamp of the video frame or a sequence number corresponding to the video frame.

In one embodiment, the step of acquiring two channels of original video sources to be presented in 3D includes: collecting, through a parallel binocular camera device, two channels of original video sources set for 3D presentation; or performing calculation on a target video source through image recognition technology to obtain two channels of original video sources with parallax.

In one embodiment, the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames includes: following the order of the identification information of the video frames in the first video frame sequence, skipping a specified number of video frames each time, selecting the specified number of first target video frames from the first video frame sequence and adding them to the first target video frame set; and adding the second target video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence to the second target video frame set in the order of the identification information.

In one embodiment, the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames includes: according to the identification information of the video frames, selecting odd-numbered frames from the first video frame sequence and adding them to the first target video frame set, and selecting even-numbered frames from the second video frame sequence and adding them to the second target video frame set.

In one embodiment, before the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, the method further includes: if the identification information of the video frames in the first video frame sequence and the second video frame sequence is inconsistent, adjusting the identification information of the video frames to be consistent without changing their order.

In one embodiment, the step of encapsulating the audio frames in the original video source with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two channels of original video sources includes: taking the audio frames in either original video source as the target audio frames; and encapsulating the target audio frames and the target video frames in the target video frame sequence with a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two channels of original video sources.

In one embodiment, the preset encoding algorithm includes one of the following: an H264 encoding algorithm, an H265 encoding algorithm, and an AV1 encoding algorithm.
In a second aspect, an embodiment of the present application further provides a video processing apparatus, including: a video source acquisition module configured to acquire two channels of original video sources to be presented in 3D; a video decoding module configured to decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources; a target frame selection module configured to alternately select target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary; and a frame combination module configured to cross-combine, in the order of the identification information of the target video frames, the target video frames in the first target video frame set and the second target video frame set to obtain a target video frame sequence.

In a third aspect, embodiments of the present application further provide an electronic device, including a processor and a memory, where the memory stores computer-executable instructions executable by the processor, and the processor executes the computer-executable instructions to implement the method of the first aspect.

In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of the first aspect.
BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions in the specific embodiments of the present application or in the related art, the accompanying drawings required in the description of the specific embodiments or the related art are briefly introduced below. Obviously, the accompanying drawings in the following description show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without any creative effort.

FIG. 1 is a flowchart of a video processing method provided by an embodiment of the present application;

FIG. 2 is a flowchart of a video frame selection method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of video cross-combination provided by an embodiment of the present application;

FIG. 4 is a structural block diagram of a video processing apparatus provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
DETAILED DESCRIPTION

The technical solutions of the present application will be described clearly and completely below with reference to the embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.

Currently, there are three ways to watch video with a 3D stereoscopic effect:

First: the stereoscopic video technology mainly used in 3D movies directly synthesizes an image with binocular parallax through parallel cameras during shooting, with fine-tuning performed afterwards. During playback, two parallax images are present on the screen at the same time; through special equipment (such as polarized glasses), the left and right eyes each receive the corresponding video, and the parallax is perceived as a stereoscopic image after processing by the human brain.

Second: VR stereoscopic video, in which two videos with parallax generated by two parallel cameras are played simultaneously, and a VR device (head-mounted device) is used so that the left and right eyes each receive their corresponding video; the resulting parallax is processed by the human brain and perceived as a stereoscopic video effect.

Third: conventional naked-eye 3D technology, in which the player is modified so that the two channels of video with parallax generated by parallel cameras are rapidly played in alternation on a single player terminal device; when the human eye watches directly, the rapid switching of the videos produces parallax, and the human brain perceives a stereoscopic image effect after processing the parallax.

Among the above approaches, the biggest problem with 3D movies and VR 3D technology is the reliance on special equipment: the human eye cannot watch directly. For conventional naked-eye 3D technology, the player must be specially modified and an ordinary video player cannot be used, and the rapid switching between the two videos consumes a large amount of the device's computing resources, which has a serious impact on the 3D effect of high-definition, high-quality video.

Based on this, the embodiments of the present application provide a video processing method, apparatus, electronic device, and computer-readable storage medium, which obtain a video frame sequence with a naked-eye 3D visual effect by cross-merging the video frames of two video sources, and can then generate a naked-eye 3D video file playable by an ordinary player, so that users can watch 3D-effect video with the naked eye without relying on special equipment.
图1为本申请实施例提供的一种视频处理方法的流程图,该视频处理方法具体包括以下步骤:1 is a flowchart of a video processing method provided by an embodiment of the present application, and the video processing method specifically includes the following steps:
步骤S102,获取要3D呈现的两路原始视频源。Step S102, acquiring two channels of original video sources to be presented in 3D.
本申请实施例中的要3D呈现的两路原始视频源,可以是采用硬件设备采集的存在视差的两路视频源,比如,通过并联双目摄像设备进行并联摄像产生的两路视频源;或者,也可以是通过软件技术进行视频视差计算得到的两路视频源,比如,通过图像识别技术对目标视频源进行计算,得到存在视差的两路原始视频源。上述双目摄像设备包括两个并联摄像的相机或者双目相机。The two channels of original video sources to be presented in 3D in the embodiments of the present application may be two channels of video sources with parallax collected by hardware devices, for example, two channels of video sources generated by parallel photography with parallel binocular camera devices; or , or two video sources obtained by performing video disparity calculation through software technology, for example, calculating the target video source through image recognition technology to obtain two original video sources with parallax. The above binocular camera device includes two cameras or binocular cameras that take pictures in parallel.
Step S104: decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels of original video sources.
The decoding of the video sources may be performed by a hardware decoder or by software decoding. After the two original video sources are decoded, the first video frame sequence and the second video frame sequence are obtained; both sequences are arranged in the order of the identification information of their video frames.
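As a non-limiting illustration of the data produced by this step, the following Python sketch models a decoded frame carrying only its identification information; decode() is a hypothetical stand-in for any hardware or software decoder and is not an API defined by this application.

```python
# Illustrative sketch only: decode() simulates a decoder that yields frames
# already ordered by their identification information (timestamp or sequence number).
from dataclasses import dataclass
from typing import List

@dataclass
class DecodedFrame:
    pts: int              # identification info: decompression timestamp or sequence number
    picture: object = None  # decoded pixel data, omitted in this sketch

def decode(path: str, frame_count: int = 20) -> List[DecodedFrame]:
    """Hypothetical stand-in for a real decoder: returns frames in pts order."""
    return [DecodedFrame(pts=i) for i in range(1, frame_count + 1)]

first_sequence = decode("source_one.mp4")   # first video frame sequence
second_sequence = decode("source_two.mp4")  # second video frame sequence
```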
Step S106: according to the identification information of the video frames, alternately select target video frames from the first video frame sequence and the second video frame sequence, so that the identification information of the target video frames in the resulting first target video frame set and second target video frame set is complementary.
In this step, the identification information may be the decompression timestamp of a video frame or the sequence number corresponding to the frame. The alternate selection from the first and second video frame sequences may be uneven or even. For example, if the first video frame sequence contains video frames A1, A2, A3, ..., A20 and the second video frame sequence contains video frames B1, B2, B3, ..., B20, then target frames A1 and A2 may be selected from the first sequence, B3, B4, and B5 from the second, then A6 from the first, B7 and B8 from the second, and so on until the last frame. This is an uneven alternate selection; the only requirement is that the identification information of the target frames in the first and second target video frame sets is complementary.
Another way is to select evenly and alternately: for example, select target frame A1 from the first video frame sequence, B2 from the second, A3 from the first, B4 from the second, and so on until the last frame.
The target video frames selected from the first video frame sequence form the first target video frame set, those selected from the second video frame sequence form the second target video frame set, and the two sets are complementary.
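The uneven alternate selection described above can be sketched in a few lines; the labels A1..A20 and B1..B20 follow the example in this application, while the run lengths (2, 3, 1, 2) are purely illustrative assumptions.

```python
first_sequence = [f"A{i}" for i in range(1, 21)]   # first video frame sequence
second_sequence = [f"B{i}" for i in range(1, 21)]  # second video frame sequence

first_target, second_target = [], []
run_lengths = [2, 3, 1, 2]   # illustrative run lengths for the uneven selection
take_from_first = True
i = 0                        # shared frame position, i.e. the identification info
r = 0
while i < len(first_sequence):
    sequence = first_sequence if take_from_first else second_sequence
    target = first_target if take_from_first else second_target
    for _ in range(run_lengths[r % len(run_lengths)]):
        if i >= len(sequence):
            break
        target.append(sequence[i])  # each position is consumed by exactly one source
        i += 1
    take_from_first = not take_from_first
    r += 1

print(first_target)   # ['A1', 'A2', 'A6', 'A9', 'A10', ...]
print(second_target)  # ['B3', 'B4', 'B5', 'B7', 'B8', ...]
```

Because every frame position is taken from exactly one of the two sequences, the identification information of the two target sets remains complementary regardless of the run lengths chosen.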
Step S108: cross-combine the target video frames in the first target video frame set and the second target video frame set in the order of their identification information to obtain a target video frame sequence.
After the first and second target video frame sets have been determined, the target frames are cross-combined according to the order of the identification information in the two sets, yielding the target video frame sequence.
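A minimal sketch of this cross-combination, assuming each target frame's label encodes its identification information, might look like this.

```python
first_target = ["A1", "A2", "A6", "A9", "A10"]    # selected from the first sequence
second_target = ["B3", "B4", "B5", "B7", "B8"]    # selected from the second sequence

def frame_index(label: str) -> int:
    """Recover the identification info (frame position) from an illustrative label."""
    return int(label[1:])

# cross-combine: one ordered pass over the union, driven only by identification info
target_sequence = sorted(first_target + second_target, key=frame_index)
print(target_sequence)  # ['A1', 'A2', 'B3', 'B4', 'B5', 'A6', 'B7', 'B8', 'A9', 'A10']
```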
In the video processing method provided by the embodiments of the present application, the two video sources produced by parallel shooting are decoded, target frames are alternately selected in the order of the frames' decompression timestamps, and the target frames are cross-merged into a video frame sequence with a naked-eye 3D visual effect. The sequence can then be combined into a single complete naked-eye 3D video by a video encoding algorithm, so that an ordinary video player alternately presents the two parallax videos. Viewers need no special equipment; after processing by the brain, a stereoscopic effect is perceived.
After the target video frame sequence is obtained, the method may further include: encapsulating audio frames from the original video sources with the target video frames in the target video frame sequence to obtain a naked-eye 3D video file corresponding to the two original video sources.
In the embodiments of the present application, only the video frames are cross-merged; the audio receives no special processing. At the final encoding stage, the audio frames of either original video source may be used for encapsulation; that is, after encapsulating the audio frames from an original source with the target video frames in the target video frame sequence, the naked-eye 3D video file corresponding to the two original video sources is generated.
In practice, the two original video sources to be presented in 3D can be obtained in several ways. The sources used in the embodiments of the present application are two video files generated by a dual parallel camera setup that simulates binocular vision, which is also the kind of source used in most 3D stereoscopic video production. The two parallel cameras capture video while simulating binocular vision; the two sources must keep the same target at all times, synchronize the audio start and end times, and use the same frame rate (no lower than 60 frames per second, i.e., 60 complete pictures generated per second). Alternatively, a target video source can be processed by image recognition to compute two original sources with parallax, i.e., the sources can be obtained purely in software.
To improve the final viewing effect of the naked-eye 3D video file, an embodiment of the present application provides the following target frame selection method for step S106, i.e., the step of alternately selecting target video frames from the first and second video frame sequences according to the identification information of the video frames; see the flowchart of the video frame selection method shown in FIG. 2.
Step S202: following the order of the identification information of the video frames in the first video frame sequence, skip a specified number of video frames at a time, and select the same specified number of first target video frames from the first video frame sequence to add to the first target video frame set.
The specified number may be one or more. The number of skipped frames equals the number of selected frames, so target frames are selected from the two sequences alternately and evenly, which yields a better 3D video effect. For example, if the first sequence contains frames A1, A2, A3, ..., A20 and the second contains B1, B2, B3, ..., B20, two frames may be skipped and first target frames A3 and A4 selected from the first sequence, then two more skipped and A7 and A8 selected, and so on until the last frame. The selected first target frames A3, A4, A7, A8, ... are added to the first target video frame set in order.
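A possible sketch of step S202 with a specified number k is given below; skipping k frames and then selecting k frames is one straightforward reading of the interval rule, not the only admissible implementation.

```python
def select_with_interval(sequence, k):
    """Return (selected, skipped) frames, alternately skipping k and selecting k."""
    selected, skipped = [], []
    for i, frame in enumerate(sequence):
        # positions 0..k-1 are skipped, k..2k-1 selected, then the pattern repeats
        if (i // k) % 2 == 1:
            selected.append(frame)
        else:
            skipped.append(frame)
    return selected, skipped

first_sequence = [f"A{i}" for i in range(1, 21)]
first_target, unselected = select_with_interval(first_sequence, k=2)
print(first_target)  # ['A3', 'A4', 'A7', 'A8', 'A11', 'A12', ...]
print(unselected)    # ['A1', 'A2', 'A5', 'A6', 'A9', 'A10', ...]
```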
Step S204: add to the second target video frame set, in the order of the identification information, the second target video frames in the second video frame sequence that have the same identification information as the unselected video frames in the first video frame sequence.
The second video frame sequence contains frames B1, B2, B3, ..., B20. Generally, for two original video sources captured by a binocular camera device, the identification information of the corresponding frames in the two decoded sequences is consistent; that is, the identification information of frame A1 matches that of B1, A2 matches B2, and so on. If the two sources were instead computed in software, the identification information of the corresponding decoded frames may differ somewhat; in that case, the identification information of the two sequences must first be adjusted to be consistent while keeping the order unchanged.
In this way, once the first target video frames have been selected from the first sequence, the second target video frames in the second sequence can be determined directly: the frames in the second video frame sequence whose identification information matches the unselected frames of the first sequence are added, in the order of the identification information, to the second target video frame set.
Continuing the example above, once the first target video frames are determined to be A3, A4, A7, A8, ..., the unselected frames of the first sequence are A1, A2, A5, A6, .... The frames in the second sequence with the same identification information are B1, B2, B5, B6, ..., so frames B1, B2, B5, B6, ... are added to the second target video frame set as second target video frames.
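Step S204 can then be sketched as taking the complement of the identification information already chosen in step S202; the truncated first target set below is illustrative only.

```python
second_sequence = [f"B{i}" for i in range(1, 21)]
first_target = ["A3", "A4", "A7", "A8"]   # result of step S202 (truncated for brevity)

# identification info already consumed by the first target set
selected_indices = {int(label[1:]) for label in first_target}

# the second target set takes, in order, every frame whose identification info
# was NOT selected from the first sequence, keeping the two sets complementary
second_target = [frame for frame in second_sequence
                 if int(frame[1:]) not in selected_indices]
print(second_target[:6])  # ['B1', 'B2', 'B5', 'B6', 'B9', 'B10']
```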
In one embodiment, the step of alternately selecting target video frames from the first and second video frame sequences according to the identification information of the video frames may be implemented as follows (see FIG. 3): according to the identification information, odd-numbered frames are selected from the first sequence (the first video frames A1, A3, ... in the figure) and added to the first target video frame set, and even-numbered frames are selected from the second sequence (the second video frames B2, B4, ... in the figure) and added to the second target video frame set. The odd and even frames are then cross-merged according to their identification information to obtain the target video frame sequence: first video frame A1, second video frame B2, first video frame A3, second video frame B4, and so on. This corresponds to the case in the embodiment above where the specified number is one, i.e., one frame is selected every other frame, and this selection scheme achieves a better 3D video effect.
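For this odd/even variant, the selection and the cross-merge can be written together in a few lines; as before, the labels are placeholders for frames carrying identification information.

```python
first_sequence = [f"A{i}" for i in range(1, 11)]
second_sequence = [f"B{i}" for i in range(1, 11)]

target_sequence = []
for index in range(1, len(first_sequence) + 1):
    if index % 2 == 1:                           # odd identification info -> first sequence
        target_sequence.append(first_sequence[index - 1])
    else:                                        # even identification info -> second sequence
        target_sequence.append(second_sequence[index - 1])

print(target_sequence)  # ['A1', 'B2', 'A3', 'B4', 'A5', 'B6', 'A7', 'B8', 'A9', 'B10']
```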
After the first and second target video frame sets are determined by the above process, the target frames of the two sets are cross-merged according to their identification information to generate the target video frame sequence, and the audio frames from the original video sources are then encapsulated with the target video frames in the target video frame sequence to obtain the naked-eye 3D video file corresponding to the two original sources. In one embodiment, the audio frames of either original video source are taken as the target audio frames, and the target audio frames and the target video frames of the target video frame sequence are encapsulated by a preset encoding algorithm to obtain the naked-eye 3D video file. The preset encoding algorithm may include one of the H264 encoding algorithm, the H265 encoding algorithm, and the AV1 encoding algorithm.
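The final encapsulation can be outlined as follows; encode_video() and mux_audio_video() are hypothetical placeholders standing in for a real H264, H265, or AV1 encoder and a container muxer, and are not APIs named by this application.

```python
def encode_video(target_sequence, codec="h264"):
    """Hypothetical: compress the cross-combined frame sequence with the chosen codec."""
    raise NotImplementedError("replace with a real H264/H265/AV1 encoder")

def mux_audio_video(encoded_video, audio_frames, output_path):
    """Hypothetical: wrap the encoded video and one source's audio into a playable file."""
    raise NotImplementedError("replace with a real container muxer")

# Audio is taken unchanged from either original source, since only the video
# frames were cross-merged:
# encoded = encode_video(target_sequence, codec="h265")
# mux_audio_video(encoded, audio_frames_from_source_one, "naked_eye_3d.mp4")
```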
Through re-encoding, the newly generated video file has the same frame rate as the original video sources, but because it is composed from two video files, each monocular video (the part acting on one eye) has half the frame rate of the original source; for example, a 60-frame-per-second output presents 30 frames per second to each eye. Therefore, to reduce the choppiness perceived when a single eye views the merged video, the original sources should be recorded at as high a frame rate as possible. The newly generated naked-eye 3D video file can be played on an ordinary video player. Watched with one eye, it appears as a video with ghosting; watched with both eyes, after brief adaptation the brain assigns the ghosted images to the respective eyes, and after processing in the visual perception areas of the brain, the illusion of viewing a stereoscopic image is produced, i.e., the naked-eye 3D video effect is achieved.
The video processing method provided by the embodiments of the present application is implemented purely by software algorithms. It does not depend on a proprietary viewing device or a dedicated video player, the video file can use encoding formats commonly available on the market, and no playback condition is restricted; viewers can experience the 3D stereoscopic video effect with the naked eye.
Based on the above method embodiments, an embodiment of the present application further provides a video processing apparatus. As shown in FIG. 4, the apparatus includes: a video source acquisition module 402, configured to acquire two channels of original video sources to be presented in 3D; a video decoding module 404, configured to decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence corresponding to the two channels; a target frame selection module 406, configured to alternately select target video frames from the first and second video frame sequences according to the identification information of the video frames, so that the identification information of the target frames in the resulting first and second target video frame sets is complementary; and a frame combination module 408, configured to cross-combine the target video frames in the first and second target video frame sets in the order of their identification information to obtain a target video frame sequence.
In another possible implementation, the video processing apparatus further includes an audio/video encapsulation module 410, configured to encapsulate audio frames from the original video sources with the target video frames in the target video frame sequence to obtain a naked-eye 3D video file corresponding to the two original video sources.
In another possible implementation, the identification information includes the decompression timestamp of a video frame or the sequence number corresponding to the video frame.
In another possible implementation, the video source acquisition module 402 is further configured to collect the two channels of original video sources to be presented in 3D with a binocular camera device, or to compute a target video source by image recognition technology to obtain two original video sources with parallax.
In another possible implementation, the target frame selection module 406 is further configured to: following the order of the identification information of the video frames in the first video frame sequence, skip a specified number of frames at a time and select the same specified number of first target video frames from the first sequence to add to the first target video frame set; and add to the second target video frame set, in the order of the identification information, the second target video frames in the second sequence that have the same identification information as the unselected frames in the first sequence.
In another possible implementation, the target frame selection module 406 is further configured to select, according to the identification information of the video frames, odd-numbered frames from the first video frame sequence to add to the first target video frame set, and even-numbered frames from the second video frame sequence to add to the second target video frame set.
In another possible implementation, the apparatus further includes a timestamp adjustment module 412, configured to adjust the identification information of the video frames to be consistent, while keeping the order unchanged, if the identification information of the video frames in the first and second video frame sequences is inconsistent.
In another possible implementation, the audio/video encapsulation module 410 is further configured to: take the audio frames of either original video source as the target audio frames; and encapsulate the target audio frames and the target video frames in the target video frame sequence by a preset encoding algorithm to obtain the naked-eye 3D video file corresponding to the two original video sources.
In another possible implementation, the preset encoding algorithm includes one of the H264 encoding algorithm, the H265 encoding algorithm, and the AV1 encoding algorithm.
The implementation principles and technical effects of the video processing apparatus provided by the embodiments of the present application are the same as those of the foregoing video processing method embodiments. For brevity, for matters not mentioned in the apparatus embodiments, reference may be made to the corresponding content of the foregoing method embodiments.
An embodiment of the present application further provides an electronic device. FIG. 5 is a schematic structural diagram of the electronic device, which includes a processor 51 and a memory 50. The memory 50 stores computer-executable instructions that can be executed by the processor 51, and the processor 51 executes the computer-executable instructions to implement the above method.
In the implementation shown in FIG. 5, the electronic device further includes a bus 52 and a communication interface 53, and the processor 51, the communication interface 53, and the memory 50 are connected by the bus 52.
The memory 50 may include a high-speed random access memory (RAM) and may also include a non-volatile memory, for example at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 53 (wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, or the like may be used. The bus 52 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.
The processor 51 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 51 or by instructions in the form of software. The processor 51 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), or the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor 51 reads the information in the memory and completes the steps of the methods of the foregoing embodiments in combination with its hardware.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When invoked and executed by a processor, the computer-executable instructions cause the processor to implement the above method; for the specific implementation, reference may be made to the foregoing method embodiments, which are not repeated here.
The computer program product of the video processing method, apparatus, and electronic device provided by the embodiments of the present application includes a computer-readable storage medium storing program code. The instructions included in the program code may be configured to execute the methods described in the foregoing method embodiments; for the specific implementation, reference may be made to the method embodiments, which are not repeated here.
Unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the essence of the technical solution of the present application, or the part contributing to the related art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In the video processing method provided by the embodiments of the present application, the two channels of original video sources to be presented in 3D are first decoded to obtain the first and second video frame sequences corresponding to the two sources; then, according to the identification information of the video frames, target frames with complementary identification information are alternately selected from the first and second sequences to obtain the first and second target video frame sets; and then, in the order of the identification information of the target frames, the target frames of the two sets are cross-combined to obtain the target video frame sequence. By cross-merging the video frames of the two video sources, the embodiments of the present application obtain a video frame sequence with a naked-eye 3D visual effect, from which a naked-eye 3D video file playable on an ordinary player can be generated, so that users can watch video with a 3D effect with the naked eye and without relying on special equipment.
In the description of the present application, it should be noted that orientation or positional terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" indicate orientations or positional relationships based on those shown in the drawings; they are used only to facilitate and simplify the description of the present application, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore cannot be understood as limiting the present application. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and should not be understood as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific implementations of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features within the technical scope disclosed by the present application; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Industrial Applicability
The present application can be applied to the technical field of software algorithm development, and provides a video processing method, apparatus, electronic device, and computer-readable storage medium. By cross-merging the video frames of two video sources, a video frame sequence with a naked-eye 3D visual effect can be obtained, from which a naked-eye 3D video file can be generated, so that users can watch video with a 3D effect with the naked eye and without relying on special equipment.

Claims (12)

  1. A video processing method, comprising:
    acquiring two channels of original video sources to be presented in three dimensions;
    decoding the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence respectively corresponding to the two channels of original video sources;
    alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to identification information of video frames, so that identification information of target video frames in a resulting first target video frame set and a resulting second target video frame set is complementary; and
    cross-combining the target video frames in the first target video frame set and the second target video frame set according to an order of the identification information of the target video frames to obtain a target video frame sequence.
  2. The method according to claim 1, wherein after the target video frame sequence is obtained, the method further comprises:
    encapsulating audio frames in the original video sources with the target video frames in the target video frame sequence to obtain a naked-eye three-dimensional video file corresponding to the two channels of original video sources.
  3. The method according to claim 1 or 2, wherein the identification information comprises a decompression timestamp of a video frame or a sequence number corresponding to the video frame.
  4. The method according to any one of claims 1 to 3, wherein the step of acquiring two channels of original video sources to be presented in three dimensions comprises:
    collecting, by a parallel binocular camera device, two channels of original video sources to be presented in three dimensions;
    or,
    calculating a target video source by image recognition technology to obtain two channels of original video sources with parallax.
  5. The method according to any one of claims 1 to 4, wherein the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames comprises:
    selecting, following an order of the identification information of the video frames in the first video frame sequence and at intervals of a specified number of video frames, the specified number of first target video frames from the first video frame sequence and adding them to the first target video frame set; and
    adding, to the second target video frame set in the order of the identification information, second target video frames in the second video frame sequence that have the same identification information as unselected video frames in the first video frame sequence.
  6. The method according to any one of claims 1 to 4, wherein the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames comprises:
    selecting, according to the identification information of the video frames, odd-numbered frames from the first video frame sequence to add to the first target video frame set, and selecting even-numbered frames from the second video frame sequence to add to the second target video frame set.
  7. The method according to any one of claims 1 to 6, wherein before the step of alternately selecting target video frames from the first video frame sequence and the second video frame sequence according to the identification information of the video frames, the method further comprises:
    if the identification information of the video frames in the first video frame sequence and the second video frame sequence is inconsistent, adjusting the identification information of the video frames to be consistent while maintaining the order.
  8. The method according to claim 2, wherein the step of encapsulating the audio frames in the original video sources with the target video frames in the target video frame sequence to obtain the naked-eye three-dimensional video file corresponding to the two channels of original video sources comprises:
    taking an audio frame in either channel of the original video sources as a target audio frame; and
    encapsulating the target audio frame and the target video frames in the target video frame sequence by a preset encoding algorithm to obtain the naked-eye three-dimensional video file corresponding to the two channels of original video sources.
  9. The method according to claim 8, wherein the preset encoding algorithm comprises any one of an H264 encoding algorithm, an H265 encoding algorithm, and an AV1 encoding algorithm.
  10. A video processing apparatus, comprising:
    a video source acquisition module, configured to acquire two channels of original video sources to be presented in three dimensions;
    a video decoding module, configured to decode the two channels of original video sources respectively to obtain a first video frame sequence and a second video frame sequence respectively corresponding to the two channels of original video sources;
    a target frame selection module, configured to alternately select target video frames from the first video frame sequence and the second video frame sequence according to identification information of video frames, so that identification information of target video frames in a resulting first target video frame set and a resulting second target video frame set is complementary; and
    a frame combination module, configured to cross-combine the target video frames in the first target video frame set and the second target video frame set according to an order of the identification information of the target video frames to obtain a target video frame sequence.
  11. An electronic device, comprising a processor and a memory, wherein the memory stores computer-executable instructions executable by the processor, and the processor executes the computer-executable instructions to implement the method according to any one of claims 1 to 9.
  12. A computer-readable storage medium storing computer-executable instructions, wherein, when the computer-executable instructions are invoked and executed by a processor, the computer-executable instructions cause the processor to implement the method according to any one of claims 1 to 9.
PCT/CN2021/129870 2020-12-30 2021-11-10 Video processing method and apparatus, electronic device, and computer readable storage medium WO2022142757A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011614555.7A CN114697758B (en) 2020-12-30 2020-12-30 Video processing method and device and electronic equipment
CN202011614555.7 2020-12-30

Publications (1)

Publication Number Publication Date
WO2022142757A1 true WO2022142757A1 (en) 2022-07-07

Family

ID=82132974

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129870 WO2022142757A1 (en) 2020-12-30 2021-11-10 Video processing method and apparatus, electronic device, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN114697758B (en)
WO (1) WO2022142757A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116567353A (en) * 2023-07-10 2023-08-08 湖南快乐阳光互动娱乐传媒有限公司 Video delivery method and device, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100007722A1 (en) * 2008-07-14 2010-01-14 Ul-Je Kim Stereoscopic image display device and driving method thereof
US20120105602A1 (en) * 2010-11-03 2012-05-03 3Dmedia Corporation Methods, systems, and computer program products for creating three-dimensional video sequences
CN102547313A (en) * 2010-12-21 2012-07-04 北京睿为视讯技术有限公司 Three-dimensional video play system and method thereof
US20130215239A1 (en) * 2012-02-21 2013-08-22 Sen Wang 3d scene model from video
CN107872670A (en) * 2017-11-17 2018-04-03 暴风集团股份有限公司 A kind of 3D video coding-decoding methods, device, server, client and system
CN111447504A (en) * 2020-03-27 2020-07-24 北京字节跳动网络技术有限公司 Three-dimensional video processing method and device, readable storage medium and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080059937A (en) * 2006-12-26 2008-07-01 삼성전자주식회사 Display apparatus and processing method for 3d image and processing system for 3d image
CN103081478A (en) * 2010-06-24 2013-05-01 电子部品研究院 Method for configuring stereoscopic moving picture file
CN103024449B (en) * 2011-09-28 2015-08-19 中国移动通信集团公司 Stream of video frames processing method, video server and terminal equipment
CN104363437A (en) * 2014-11-28 2015-02-18 广东欧珀移动通信有限公司 Method and apparatus for recording stereo video
CN106303495B (en) * 2015-06-30 2018-01-16 深圳创锐思科技有限公司 Synthetic method, device and its mobile terminal of panoramic stereo image
CN108111833A (en) * 2016-11-24 2018-06-01 阿里巴巴集团控股有限公司 For the method, apparatus and system of stereo video coding-decoding
CN110868560B (en) * 2018-08-27 2022-04-15 青岛海信移动通信技术股份有限公司 Video recording method based on binocular camera and terminal equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100007722A1 (en) * 2008-07-14 2010-01-14 Ul-Je Kim Stereoscopic image display device and driving method thereof
US20120105602A1 (en) * 2010-11-03 2012-05-03 3Dmedia Corporation Methods, systems, and computer program products for creating three-dimensional video sequences
CN102547313A (en) * 2010-12-21 2012-07-04 北京睿为视讯技术有限公司 Three-dimensional video play system and method thereof
US20130215239A1 (en) * 2012-02-21 2013-08-22 Sen Wang 3d scene model from video
CN107872670A (en) * 2017-11-17 2018-04-03 暴风集团股份有限公司 A kind of 3D video coding-decoding methods, device, server, client and system
CN111447504A (en) * 2020-03-27 2020-07-24 北京字节跳动网络技术有限公司 Three-dimensional video processing method and device, readable storage medium and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116567353A (en) * 2023-07-10 2023-08-08 湖南快乐阳光互动娱乐传媒有限公司 Video delivery method and device, storage medium and electronic equipment
CN116567353B (en) * 2023-07-10 2023-09-12 湖南快乐阳光互动娱乐传媒有限公司 Video delivery method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN114697758A (en) 2022-07-01
CN114697758B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
JP5663617B2 (en) Stereo image sequence encoding method and decoding method
CN105187816B (en) Method and apparatus for the activity space of reasonable employment frame packing form
CN102918847B (en) The method and apparatus of display image
CN106303573B (en) 3D video image processing method, server and client
KR20110064161A (en) Method and apparatus for encoding a stereoscopic 3d image, and display apparatus and system for displaying a stereoscopic 3d image
JP2009207136A (en) Method for processing multiple video streams, and systems for encoding and decoding video streams
US20130038611A1 (en) Image conversion device
US20130169543A1 (en) Rendering Apparatuses, Display System and Methods for Rendering Multimedia Data Objects with a Function to Avoid Eye Fatigue
CN102223550A (en) Image processing apparatus, image processing method, and program
US20150304640A1 (en) Managing 3D Edge Effects On Autostereoscopic Displays
TWI539790B (en) Apparatus, method and software product for generating and rebuilding a video stream
Shao et al. Stereoscopic video coding with asymmetric luminance and chrominance qualities
US20110157312A1 (en) Image processing apparatus and method
US20130229409A1 (en) Image processing method and image display device according to the method
WO2014005558A1 (en) System and method for user to upload 3d video to video website
WO2022142757A1 (en) Video processing method and apparatus, electronic device, and computer readable storage medium
TWI487379B (en) Video encoding method, video encoder, video decoding method and video decoder
Coll et al. 3D TV at home: Status, challenges and solutions for delivering a high quality experience
KR101686168B1 (en) Method for constituting stereoscopic moving picture file
Aflaki et al. Simultaneous 2D and 3D perception for stereoscopic displays based on polarized or active shutter glasses
US20140072271A1 (en) Recording apparatus, recording method, reproduction apparatus, reproduction method, program, and recording reproduction apparatus
CN114040184A (en) Image display method, system, storage medium and computer program product
TW201916682A (en) Real-time 2D-to-3D conversion image processing method capable of processing 2D image in real time and converting the 2D image into 3D image without requiring complicated subsequent processing
KR101433082B1 (en) Video conversing and reproducing method to provide medium feeling of two-dimensional video and three-dimensional video
KR101674688B1 (en) A method for displaying a stereoscopic image and stereoscopic image playing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21913500

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21913500

Country of ref document: EP

Kind code of ref document: A1