WO2023029204A1 - Free viewpoint video screen splicing method, terminal, and readable storage medium - Google Patents

Free viewpoint video screen splicing method, terminal, and readable storage medium Download PDF

Info

Publication number
WO2023029204A1
WO2023029204A1 (PCT/CN2021/129039)
Authority
WO
WIPO (PCT)
Prior art keywords
video
viewpoint
target
video frame
image
Prior art date
Application number
PCT/CN2021/129039
Other languages
French (fr)
Chinese (zh)
Inventor
王荣刚
王振宇
高文
Original Assignee
北京大学深圳研究生院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学深圳研究生院 filed Critical 北京大学深圳研究生院
Publication of WO2023029204A1 publication Critical patent/WO2023029204A1/en

Links

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 - Details of television systems
    • H04N5/222 - Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 - Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 - Mixing
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 - Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 - Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281 - Reformatting operations by altering the temporal resolution, e.g. by frame skipping
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 - Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 - Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 - Structuring of content by decomposing the content in the time domain, e.g. in time segments

Definitions

  • The present application relates to the field of free viewpoints, and in particular to a free-viewpoint video picture splicing method, a terminal, and a readable storage medium.
  • Free viewpoint applications allow viewers to watch videos in the form of continuous viewpoints within a certain range.
  • The viewer can set the position and angle of the viewpoint and is no longer limited to a fixed camera angle of view.
  • Such applications often require multiple cameras shooting simultaneously to generate videos from multiple viewpoints at the same moment.
  • In some free-viewpoint applications, it is also necessary to generate depth maps corresponding to the videos from the multiple viewpoints.
  • The main purpose of this application is to provide a free-viewpoint video picture splicing method that splices the video pictures corresponding to different viewpoints at the same moment into multiple video frames and sends them to the decoding end.
  • The decoding end intercepts and displays the image corresponding to the current viewpoint from the received video frames, so that fewer pictures are spliced into each video frame, thereby improving the resolution and solving the problem of low video resolution.
  • the present application provides a method for splicing free-viewpoint video images, the method for splicing free-viewpoint video images includes the following steps:
  • each video frame group includes at least two video frames with the same time stamp
  • the step of intercepting the target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint includes:
  • according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, determining the target video frame in which the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
  • after the step of acquiring the video frame group corresponding to the target timestamp in the video sequence, the method further includes:
  • the step of acquiring the video frame group corresponding to the target time stamp in the video sequence includes:
  • the present application also provides a free-viewpoint video splicing method, which is applied to the encoding end, and the transmission method of the free-viewpoint video includes:
  • the step of splicing images with the same time stamp into at least two video frames according to the preset arrangement information includes:
  • the images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
  • the step of inputting the video sequence and the arrangement information into an encoder to generate a target video stream includes:
  • the arrangement information of the video frame is generated according to the preset arrangement information, where the arrangement information of the video frame includes the viewpoint identifier and position information of each image in the video frame; if it is determined that the video frame includes depth images, the arrangement information of the video frame also includes the viewpoint identifier and position information corresponding to each depth image;
  • the present application also provides a terminal, where the terminal is a decoding end, and the decoding end includes a memory, a processor, and a free-viewpoint video picture stitching program stored in the memory and executable on the processor;
  • when the free-viewpoint video picture stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video picture stitching method are implemented.
  • the present application also provides a terminal, where the terminal is an encoding end, and the encoding end includes a memory, a processor, and a free-viewpoint video picture stitching program stored in the memory and executable on the processor;
  • when the free-viewpoint video picture stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video picture stitching method are implemented.
  • the present application also provides a readable storage medium on which a free-viewpoint video picture stitching program is stored; when the free-viewpoint video picture stitching program is executed by a processor, the steps of the free-viewpoint video picture stitching method described in any one of the above are implemented.
  • In the technical solution of this application, the display request sent by the display terminal is received, and the target timestamp and the viewpoint identifier corresponding to the target viewpoint are obtained according to the display request;
  • the video code stream sent by the encoding end is received and decoded by a decoder to obtain a video sequence; the video frame group corresponding to the target timestamp in the video sequence is acquired, where each video frame group includes at least two video frames with the same timestamp;
  • the target image is intercepted according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, and the target image is sent to the display terminal for the display terminal to generate a display picture according to it.
  • In this way, this application splices the video pictures corresponding to different viewpoints at the same moment into multiple video frames and sends the generated video frames to the decoding end; the decoding end receives the video frames and, according to the arrangement information of the video frames and the viewpoint identifier corresponding to the current viewpoint, intercepts and displays the image corresponding to the current viewpoint from the video frames, thereby reducing the number of pictures spliced into a single video frame and achieving the purpose of increasing the resolution.
  • Fig. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application;
  • Fig. 2 is a schematic flow chart of an embodiment of the free viewpoint video picture splicing method of the present application
  • FIG. 3 is a schematic flow diagram of an embodiment of the method for splicing free viewpoint video images according to the present application
  • Fig. 4 is the first example diagram of the spliced image of an embodiment of the free viewpoint video picture splicing method of the present invention
  • Fig. 5 is a second example diagram of a spliced image according to an embodiment of the free-viewpoint video frame splicing method of the present invention.
  • Fig. 6 is a third example diagram of a spliced image according to an embodiment of the free-viewpoint video frame splicing method of the present invention.
  • FIG. 1 is a schematic diagram of a hardware operating environment of a terminal involved in the solution of the embodiment of the present application.
  • the terminal may include: a processor 1001 , such as a CPU, a network interface 1004 , a user interface 1003 , a memory 1005 , and a communication bus 1002 .
  • the communication bus 1002 is used to realize connection and communication between these components.
  • the user interface 1003 may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • Optionally, the network interface 1004 may include a standard wired interface and a wireless interface; the memory 1005 may be a non-volatile memory, such as a disk memory.
  • the memory 1005 may also be a storage device independent of the foregoing processor 1001 .
  • The structure of the terminal shown in FIG. 1 does not constitute a limitation on the terminal; it may include more or fewer components than shown, combine some components, or use a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a free-viewpoint video image splicing program.
  • the network interface 1004 is mainly used to connect to the background server and perform data communication with the background server;
  • the user interface 1003 is mainly used to connect to the client (client) and perform data communication with the client;
  • the processor 1001 can be used to call the control program of the decoding end stored in the memory 1005, and perform the following operations:
  • each video frame group includes at least two video frames with the same time stamp
  • processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
  • the viewpoint identifier corresponding to the target viewpoint determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
  • processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
  • processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
  • processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
  • processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
  • the images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
  • processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
  • the arrangement information of the video frame is generated according to the preset arrangement information, where the arrangement information of the video frame includes the viewpoint identifier and position information of each image in the video frame; if it is determined that the video frame includes depth images, the arrangement information of the video frame also includes the viewpoint identifier and position information corresponding to each depth image;
  • the method for mosaicing free-viewpoint video images is applied at the decoding end, including the following steps:
  • Step S11 receiving the display request sent by the display terminal, and obtaining the target timestamp and the viewpoint identifier corresponding to the target viewpoint according to the display request;
  • In this embodiment, the display request sent by the display terminal is received, and the time point of the picture required by the display terminal and the viewpoint identifier of the corresponding viewpoint can be obtained according to the display request. If it is determined that the display terminal needs to display the picture of a real viewpoint, the target timestamp corresponding to the picture and the viewpoint identifier corresponding to that real viewpoint are obtained according to the display request; if it is determined that the display terminal needs to display the picture of a virtual viewpoint,
  • the target timestamp corresponding to the picture and the viewpoint identifiers corresponding to the viewpoints adjacent to the virtual viewpoint are obtained according to the display request, where the viewpoint identifiers of at least two adjacent viewpoints are determined.
  • Step S12 receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence
  • the video code stream sent by the encoder is received, and the received video code stream is decoded by a decoder to obtain video sequence and arrangement information.
  • the video sequence is composed of video frame groups corresponding to different time stamps, and the arrangement information is in the sequence header of the video sequence or in the image header of the video frame.
  • Step S13 acquiring a video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
  • the video sequence is composed of video frame groups corresponding to different time stamps.
  • the target time stamp corresponding to the picture to be displayed on the display terminal can be obtained, and the video sequence is searched according to the target time stamp.
  • the video frames in the video frame group whose time stamp is the same as the target time stamp are spliced from video frames captured by cameras of various viewpoints at the time point corresponding to the target time stamp.
  • Each video frame group includes at least two video frames with the same time stamp.
  • the video frames may be spliced from pictures corresponding to multiple viewpoints, or may only have a picture corresponding to one viewpoint.
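  • As a rough, non-authoritative sketch of this lookup step (assuming decoded frames carry a timestamp field; the patent does not prescribe an implementation), the frame group could be selected as follows:

```python
# Sketch only: select the frame group for a timestamp from the decoded sequence.
# The `timestamp` attribute is an assumed field name, not taken from the patent.
def get_frame_group(video_sequence, target_timestamp):
    group = [frame for frame in video_sequence if frame.timestamp == target_timestamp]
    if len(group) < 2:
        raise ValueError("a video frame group should contain at least two same-timestamp frames")
    return group
```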
  • Step S14 intercepting the target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
  • In this embodiment, the arrangement information of the video frame group includes the arrangement information of each video frame in the group, namely the viewpoint identifiers and, for each viewpoint image, the coordinates of the image in the video frame and the corresponding width and height of the image.
  • From these coordinates and the width and height of the viewpoint image, the position and size of the target image in the target video frame are determined, and the target image is intercepted.
  • Step S15 sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
  • In this embodiment, the target image intercepted from the target video frame is sent to the display terminal; after receiving the target image, the display terminal generates a display picture according to the target image and displays it.
  • the display request sent by the display terminal is received, and the target timestamp and the viewpoint identifier corresponding to the target viewpoint are obtained according to the display request;
  • the video code stream sent by the encoding end is received and decoded by a decoder to obtain a video sequence; the video frame group corresponding to the target timestamp in the video sequence is acquired, where each video frame group includes at least two video frames with the same timestamp; the target image is intercepted according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint; and the target image is sent to the display terminal for the display terminal to generate a display picture according to it.
  • In this way, this application splices the video pictures corresponding to different viewpoints at the same moment into multiple video frames and sends the generated video frames to the decoding end; the decoding end receives the video frames and, according to the arrangement information of the video frames and the viewpoint identifier corresponding to the current viewpoint, intercepts and displays the image corresponding to the current viewpoint from the video frames, thereby reducing the number of pictures spliced into a single video frame and achieving the purpose of increasing the resolution.
  • the step S14 includes:
  • the viewpoint identifier corresponding to the target viewpoint determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
  • the arrangement information of the video frame group includes the viewpoint identifier corresponding to each viewpoint, the coordinates of the image corresponding to the viewpoint in the video frame, and the width and height of the image corresponding to the viewpoint.
  • search the arrangement information of the video frame group for the viewpoint identifier matching the viewpoint identifier corresponding to the target viewpoint, and determine the target video frame in which the target image corresponding to the target viewpoint is located and the position information of the target image in that frame, that is, the coordinates of the target image in the target video frame and the corresponding width and height of the target image.
  • Intercepting the target image according to the position information works, for example, as follows: after the position information of the target image corresponding to the target viewpoint is acquired, the coordinates of the upper-left pixel of the target image in the target video frame are found according to the coordinates in the position information; then the splicing area of the target image in the target video frame is determined according to the width and height corresponding to the target image, and the image in that splicing area, which is the target image, is intercepted.
  • the video frame where the target image corresponding to the target viewpoint is located and the splicing area in the video frame are determined through the viewpoint identification corresponding to the target viewpoint and the arrangement information of the video frame group, so as to accurately and quickly intercept the target image and send it to display side.
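  • As a minimal sketch of this interception step (the dictionary keys view_id, frame_index, x, y, w, h are illustrative names, not taken from the patent; frames are assumed to be HxWx3 NumPy arrays), the crop could be expressed as:

```python
# Sketch only: find the arrangement entry for the target viewpoint and crop the
# splicing area whose upper-left corner is (x, y) and whose size is w x h.
def intercept_target_image(frames, arrangement, target_view_id):
    for entry in arrangement:
        if entry["view_id"] == target_view_id:
            frame = frames[entry["frame_index"]]
            return frame[entry["y"]:entry["y"] + entry["h"],
                         entry["x"]:entry["x"] + entry["w"]].copy()
    raise KeyError(f"no arrangement entry for viewpoint {target_view_id}")
```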
  • In an embodiment, if it is determined that the display terminal requests the depth image to be sent, after step S13 the method further includes:
  • the decoding end needs to intercept and send the video picture and the depth image corresponding to the target viewpoint in the display request, so that the display terminal can generate and display the display picture according to the video picture and the depth image corresponding to the target viewpoint.
  • For a virtual viewpoint, the display terminal needs to determine the viewpoint identifiers corresponding to the adjacent viewpoints and send a display request to the decoding end; the decoding end obtains the viewpoint identifiers of the adjacent viewpoints according to the received display request, and determines, from these viewpoint identifiers and the arrangement information of the video frame group, the target video frames in which the target images corresponding to each adjacent viewpoint are located.
  • Since the image and the depth image corresponding to each viewpoint are spliced in the same video frame, the target image corresponding to each adjacent viewpoint and its corresponding depth image can be intercepted from the target video frame; the intercepted target images and corresponding depth images are sent to the display terminal, so that the display terminal can generate the display picture from the image and depth image corresponding to each adjacent viewpoint.
  • the step S13 includes:
  • the video frame where the target image corresponding to the target viewpoint is located can be located and the splicing area of the target image in the video frame can be determined, that is, the image in the splicing area is the target image corresponding to the target viewpoint.
  • the arrangement information can be numbered and inserted into the sequence header, and the video frames whose spliced images use that arrangement information can refer to the corresponding number, so that the arrangement information corresponding to a video frame can be found according to the number. In this way, by storing the arrangement information shared by multiple video frames in the sequence header of the video sequence, the amount of data that the decoding end needs to receive is reduced.
  • the transmission method of the free-viewpoint video is applied to the encoding end, including the following steps:
  • Step S21 acquiring images corresponding to each viewpoint and preset arrangement information
  • multiple cameras capture images corresponding to multiple viewpoints, wherein one camera can capture an image corresponding to one viewpoint, or one camera can capture an image corresponding to one viewpoint and a corresponding depth image.
  • Multiple cameras send images captured at the same time to the encoder.
  • the encoding end generates the preset arrangement information according to a preset arrangement method; each record in the preset arrangement information describes the related information of one viewpoint image or depth image, where x and y are the coordinates of the upper-left pixel of the image in the video frame, w and h are the width and height of the image, and view_id is the viewpoint identifier.
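  • One way to model such a record is sketched below; x, y, w, h and view_id are named in the text, while the class name and the frame_index field (which frame of the group the image is spliced into) are illustrative assumptions:

```python
from dataclasses import dataclass

# Minimal sketch of one record of the preset arrangement information.
@dataclass
class ArrangementEntry:
    view_id: int      # viewpoint identifier
    frame_index: int  # index of the video frame within the frame group (assumed field)
    x: int            # column of the upper-left pixel of the image in the video frame
    y: int            # row of the upper-left pixel of the image in the video frame
    w: int            # image width in the video frame
    h: int            # image height in the video frame
```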
  • Step S22 splicing images with the same time stamp into at least two video frames according to the preset arrangement information, wherein the same time stamp corresponds to different viewpoints corresponding to images in different video frames;
  • In this embodiment, the received images sent by the cameras are spliced into video frames according to the preset arrangement information; that is, each viewpoint image or depth image is resized according to the width and height given in the preset arrangement information,
  • and the resized image is stitched into the corresponding video frame at the given coordinates.
  • The timestamps of the images spliced into the same video frame are the same, and a video frame group includes at least two video frames. For example, if 27 cameras are deployed for shooting and the images captured by nine cameras are stitched into one video frame, there will be three video frames with the same timestamp.
  • Each video frame stitches the images corresponding to nine viewpoints, as shown in Fig. 4, where P1, P2...P9 are the images captured by nine cameras.
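  • A rough sketch of this splicing step, under the same assumptions as the ArrangementEntry record above (frame dimensions and the image-to-frame assignment are illustrative, not taken from the patent):

```python
import numpy as np

# Sketch only: copy same-timestamp viewpoint images into their assigned regions of
# the frames in one video frame group. `images` maps view_id to an HxWx3 array
# already scaled to (entry.h, entry.w).
def splice_frame_group(images, arrangement, frame_w, frame_h, num_frames):
    frames = [np.zeros((frame_h, frame_w, 3), dtype=np.uint8) for _ in range(num_frames)]
    for entry in arrangement:
        img = images[entry.view_id]
        frames[entry.frame_index][entry.y:entry.y + entry.h,
                                  entry.x:entry.x + entry.w] = img
    return frames

# e.g. 27 viewpoint images, 9 per frame, would yield a group of 3 frames sharing one timestamp
```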
  • Step S23 generating a video frame group according to video frames with the same time stamp, wherein the video frame group includes at least two video frames;
  • the images captured by the cameras are spliced into video frames, and the video frames with the same timestamp form a video frame group, where the video frame group includes at least two video frames with the same timestamp.
  • Step S24 generating a video sequence from the video frame groups corresponding to different time stamps according to the playback sequence, and inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream.
  • the video frame groups corresponding to different time stamps are sorted according to the playing sequence, a video sequence is generated according to the sorted multiple video frame groups, and the video sequence and the corresponding arrangement information Input the encoder to generate the target video stream.
  • Step S25: sending the target video code stream to a decoding end, so that the decoding end decodes the target video code stream to obtain the corresponding video sequence.
  • the video code stream is sent to the decoding end, so that the decoding end decodes the video code stream through a decoder to obtain the corresponding video sequence, and, according to the display
  • request sent by the display terminal, searches the video sequence for the image of the target viewpoint required by the display request and intercepts it.
  • In the technical solution of this application, the images and arrangement information corresponding to each viewpoint are obtained; according to the arrangement information, images with the same timestamp are spliced into at least two video frames, where for the same timestamp the images in different video frames correspond to different viewpoints;
  • a video frame group is generated from the video frames with the same timestamp, where the video frame group includes at least two video frames; the video frame groups corresponding to different timestamps are assembled into a video sequence according to the playback order, and the video sequence and the arrangement information are input into an encoder to generate a target video code stream. The target video code stream is sent to the decoding end for the decoding end to decode it and obtain the corresponding video sequence. In this way, by splicing the images corresponding to multiple viewpoints at the same moment into multiple video frames, the number of images spliced into one video frame is reduced, and the purpose of improving the resolution is achieved.
  • step S22 it is determined that the image corresponding to each viewpoint and the corresponding depth image are acquired, and the step S22 includes:
  • the images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
  • In this embodiment, the encoding end stitches the images with the same timestamp and the corresponding depth images according to the preset arrangement information, where the preset
  • arrangement information includes the coordinates of the upper-left pixel of each image or depth image in the video frame, the width and height of the image or depth image in the video frame, the corresponding viewpoint identifier, and the image category. The images and depth images with the same timestamp are stitched into at least two video frames, with each image and its corresponding depth image stitched into the same video frame, as shown in Figure 5 and Figure 6, where P1, P2, P3... are the images corresponding to each viewpoint,
  • and D1, D2, D3...D9 and D21, D22, D23...D30 are the depth images of the images corresponding to each viewpoint, so that the images and depth images corresponding to the viewpoints adjacent to a virtual viewpoint can be found to generate an image corresponding to the virtual viewpoint.
  • the step of inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream includes:
  • the arrangement information of the video frame is generated according to the preset arrangement information, where the arrangement information of the video frame includes the viewpoint identifier and position information of each image in the video frame; if it is determined that the video frame includes depth images, the arrangement information of the video frame also includes the viewpoint identifier and position information corresponding to each depth image;
  • video frames with the same time stamp are combined into a video frame group, and each video frame group is sorted according to the order of the time stamps corresponding to the video frame group, and finally a video sequence is generated.
  • the viewpoint identifier and position information corresponding to each image are used by the decoding end to find the splicing area of the corresponding image according to the viewpoint identifier, so as to intercept the image. If it is determined that depth images are spliced in the video frame, the arrangement information also includes the viewpoint identifier and position information corresponding to each depth image.
  • The arrangement information of the video frames may be added to the sequence header of the original video code stream or to the image header of each video frame. If it is determined that multiple video frames in the video sequence have the same arrangement information, the shared arrangement information can be numbered and added to the sequence header, and the corresponding number can be added to the corresponding video frames, so that the decoding end can identify the arrangement information of each video frame. In this way, when the arrangement information of multiple video frames in the video sequence is the same, it is stored only once; on receiving the video sequence, the decoding end reads the arrangement information in the sequence header and the number carried by each video frame, reducing the amount of data that the decoding end needs to receive.
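  • A minimal sketch of this deduplication idea (illustrative only; it does not reflect the bitstream syntax of any standard, and it reuses the ArrangementEntry fields assumed above):

```python
# Sketch only: store each distinct arrangement once in the sequence header and let
# every video frame carry only the number of the arrangement it uses.
def build_sequence_header(per_frame_arrangements):
    header = {}   # number -> shared arrangement information
    numbers = []  # one number per video frame
    seen = {}
    for arrangement in per_frame_arrangements:
        key = tuple((e.view_id, e.frame_index, e.x, e.y, e.w, e.h) for e in arrangement)
        if key not in seen:
            seen[key] = len(header)
            header[seen[key]] = arrangement
        numbers.append(seen[key])
    return header, numbers
```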
  • the present application also provides a terminal, the terminal is a decoding end, and the decoding end includes a memory, a processor, and a free-viewpoint video image stored in the memory and operable on the processor
  • a stitching program when the free-viewpoint video frame stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video frame stitching method are implemented.
  • the present application also provides a terminal, the terminal is an encoding end, and the encoding end includes a memory, a processor, and a free-viewpoint video image stored in the memory and operable on the processor
  • a stitching program when the free-viewpoint video frame stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video frame stitching method are implemented.
  • the present application also provides a readable storage medium on which a free-viewpoint video picture stitching program is stored; when the free-viewpoint video picture stitching program is executed by a processor, the steps of the free-viewpoint video picture stitching method described in any one of the above are implemented.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • In essence, or for the part that contributes to the prior art, the technical solution of the present application can be embodied in the form of a software product; the computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that enable a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A free viewpoint video screen splicing method, a terminal, and a readable storage medium. The free viewpoint video screen splicing method comprises the following steps: receiving a display request, and acquiring, according to the display request, a target timestamp and a viewpoint identifier corresponding to a target viewpoint (S11); receiving a video code stream, and decoding the video code stream by means of a decoder to obtain a video sequence (S12); acquiring a video frame group corresponding to the target timestamp in the video sequence (S13); intercepting a target image according to arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint (S14); and sending the target image to a display end (S15).

Description

自由视点视频画面拼接方法、终端及可读存储介质Free-viewpoint video screen splicing method, terminal and readable storage medium
本申请要求于2021年9月2日提交中国专利局、申请号为202111041026.7、发明名称为“自由视点视频画面拼接方法、终端及可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在申请中。This application claims the priority of the Chinese patent application submitted to the China Patent Office on September 2, 2021, with the application number 202111041026.7, and the title of the invention is "Free Viewpoint Video Screen Splicing Method, Terminal and Readable Storage Medium", the entire content of which is passed References are incorporated in the application.
技术领域technical field
本申请涉及自由视点领域,尤其涉及一种自由视点视频画面拼接方法、终端及可读存储介质。The present application relates to the field of free viewpoints, in particular to a method for mosaicing free viewpoint video images, a terminal and a readable storage medium.
背景技术Background technique
自由视点应用允许观看者在一定范围内以连续视点的形式观看视频。观看者可以设定视点的位置、角度,而不再局限于一个固定的摄像机视角。该应用往往需要多个摄像机同时拍摄,同时生成多个视点的视频,一些自由视点应用中,还需要生成多个视点的视频对应的深度图。Free viewpoint applications allow viewers to watch videos in the form of continuous viewpoints within a certain range. The viewer can set the position and angle of the viewpoint, and is no longer limited to a fixed camera angle of view. This application often requires multiple cameras to shoot at the same time and generate videos from multiple viewpoints at the same time. In some free viewpoint applications, it is also necessary to generate depth maps corresponding to videos from multiple viewpoints.
技术问题technical problem
传统自由视点应用往往使用空域拼接的方式。对于空域拼接的方式,由于编码以及终端解码播放设备支持的编解码计算能力有限,最大编解码分辨率受到限制,因此单路视频的分辨率以及支持传输的视点数之间面临严重的冲突,导致单路视频的分辨率低。Traditional free-viewpoint applications often use spatial stitching. For the way of spatial splicing, due to the limited encoding and decoding computing power supported by terminal decoding and playback equipment, the maximum encoding and decoding resolution is limited, so there is a serious conflict between the resolution of single-channel video and the number of viewpoints supported for transmission, resulting in The resolution of the single-channel video is low.
上述内容仅用于辅助理解本申请的技术方案,并不代表承认上述内容是现有技术。The above content is only used to assist in understanding the technical solution of the present application, and does not mean that the above content is admitted as prior art.
技术解决方案technical solution
本申请的主要目的在于提供一种自由视点视频画面拼接方法,旨在通过将同一时刻不同视点对应的视频画面拼接生成多个视频帧并发送至解码端,解码端接收视频帧并从所述视频帧中截取显示当前视点对应的图像,以减少每个视频帧拼接的画面,从而提高分辨率,解决视频分辨率低的问题。The main purpose of this application is to provide a method for mosaicing video frames from different viewpoints at the same time, aiming to generate multiple video frames by splicing video frames corresponding to different viewpoints at the same time and sending them to the decoding end. The image corresponding to the current viewpoint is intercepted and displayed in the frame to reduce the stitching of each video frame, thereby improving the resolution and solving the problem of low video resolution.
为了实现上述目的,本申请提供一种自由视点视频画面拼接方法,所述自由视点视频画面拼接方法包括以下步骤:In order to achieve the above object, the present application provides a method for splicing free-viewpoint video images, the method for splicing free-viewpoint video images includes the following steps:
接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;receiving a display request sent by the display terminal, and obtaining a target timestamp and a viewpoint identifier corresponding to the target viewpoint according to the display request;
接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence;
获取所述视频序列中所述的目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;Obtain the video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;Intercepting a target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。Sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
进一步地,所述根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像的步骤包括:Further, the step of intercepting the target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint includes:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识,确定所述目标视点对应的目标图像所在的目标视频帧以及所述目标视频帧中所述目标图像的位置信息;According to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
根据所述位置信息在所述目标视频帧中截取所述目标视点对应的目标图像。Intercepting a target image corresponding to the target viewpoint in the target video frame according to the position information.
进一步地,确定显示端请求发送深度图像,所述获取所述视频序列中所述目标时间戳对应的视频帧组的步骤之后,还包括:Further, after determining that the display terminal requests to send a depth image, after the step of acquiring the video frame group corresponding to the target time stamp in the video sequence, it further includes:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像与对应的深度图像;Intercepting a target image and a corresponding depth image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像与对应的深度图像发送至显示端,以供所述显示端根据所述目标图像与对应的深度图像生成显示画面。Sending the target image and the corresponding depth image to a display terminal for the display terminal to generate a display picture according to the target image and the corresponding depth image.
进一步地,所述获取所述视频序列中所述目标时间戳对应的视频帧组的步骤包括:Further, the step of acquiring the video frame group corresponding to the target time stamp in the video sequence includes:
根据所述视频序列确定所述目标时间戳对应的视频帧组,并确定序列头或者图像头中的排布信息。Determine the video frame group corresponding to the target time stamp according to the video sequence, and determine the arrangement information in the sequence header or image header.
此外,为了实现上述目的,本申请还提供一种自由视点视频画面拼接方法,应用于编码端,所述自由视点视频的传输方法包括:In addition, in order to achieve the above purpose, the present application also provides a free-viewpoint video splicing method, which is applied to the encoding end, and the transmission method of the free-viewpoint video includes:
获取各个视点对应的图像以及预设排布信息;Obtain the images corresponding to each viewpoint and the preset arrangement information;
根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧,其中,相同时间戳对应不同视频帧中的图像对应的视点不同;splicing images with the same time stamp into at least two video frames according to the preset arrangement information, wherein the same time stamp corresponds to different viewpoints corresponding to images in different video frames;
根据时间戳相同的视频帧生成视频帧组,其中,所述视频帧组至少包括两张视频帧;Generate a video frame group according to video frames with the same time stamp, wherein the video frame group includes at least two video frames;
根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列,并将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流;Generate a video sequence from the video frame groups corresponding to different time stamps according to the playback order, and input the video sequence and the preset arrangement information into an encoder to generate a target video stream;
将所述目标视频码流发送至解码端,以供所述解码端解码所述目标视频码流获取对应的视频序列。Sending the target video code stream to a decoding end for the decoding end to decode the target video code stream to obtain a corresponding video sequence.
进一步地,确定获取到各个视点对应的图像以及对应的深度图像,所述根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧的步骤包括:Further, it is determined that the image corresponding to each viewpoint and the corresponding depth image are acquired, and the step of splicing images with the same time stamp into at least two video frames according to the preset arrangement information includes:
根据所述预设排布信息将时间戳相同的图像以及对应的深度图像拼接成至少两个视频帧,其中,所述图像以及对应的深度图像拼接在同一视频帧中。The images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
进一步地,所述将所述视频序列以及所述排布信息输入编码器,生成目标视频码流的步骤包括:Further, the step of inputting the video sequence and the arrangement information into an encoder to generate a target video stream includes:
将所述视频序列输入编码器生成原始视频码流;Input the video sequence into an encoder to generate an original video code stream;
根据所述预设排布信息生成视频帧的排布信息,其中,所述视频帧的排布信息包含所述视频帧中每一图像的视点标识和位置信息,确定所述视频帧中包括深度图像,所述视频帧的排布信息还包含每一深度图像对应的视点标识和位置信息;According to the preset arrangement information, the arrangement information of the video frame is generated, wherein the arrangement information of the video frame includes the viewpoint identification and position information of each image in the video frame, and it is determined that the depth is included in the video frame image, the arrangement information of the video frame also includes the viewpoint identification and position information corresponding to each depth image;
将所述视频帧的排布信息添加至所述原始视频码流的序列头或者所述视频帧的图像头中,生成目标视频码流。Adding the arrangement information of the video frame to the sequence header of the original video code stream or the image header of the video frame to generate a target video code stream.
为了实现上述目的,本申请还提供一种终端,所述终端为解码端,所述解码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序,所述自由视点视频画面拼接程序被所述处理器执行时,实现如上所述的自由视点视频画面拼接方法的步骤。In order to achieve the above object, the present application also provides a terminal, the terminal is a decoding end, and the decoding end includes a memory, a processor, and a free-viewpoint video image stored in the memory and operable on the processor A stitching program, when the free-viewpoint video frame stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video frame stitching method are implemented.
为了实现上述目的,本申请还提供一种终端,所述终端为编码端,所述编码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序,所述自由视点视频画面拼接程序被所述处理器执行时,实现如上所述的自由视点视频画面拼接方法的步骤。In order to achieve the above object, the present application also provides a terminal, the terminal is an encoding end, and the encoding end includes a memory, a processor, and a free-viewpoint video image stored in the memory and operable on the processor A stitching program, when the free-viewpoint video frame stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video frame stitching method are implemented.
为了实现上述目的,本申请还提供一种可读存储介质,所述可读存储介质上存储有自由视点视频画面拼接程序,所述自由视点视频画面拼接程序被处理器执行时实现如上所述任一项所述的自由视点画面拼接方法的步骤。In order to achieve the above purpose, the present application also provides a readable storage medium, on which a free-viewpoint video picture stitching program is stored, and when the free-viewpoint video picture stitching program is executed by a processor, any of the above-mentioned A step of the free-viewpoint picture stitching method described in one item.
有益效果Beneficial effect
本申请的技术方案中,接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张的视频帧;根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。如此,本申请通过将同一时刻不同视点对应的视频画面拼接生成多个视频帧,并将生成的视频帧发送至解码端,解码端接收视频帧,并根据视频帧的排布信息以及当前视点对应的视点标识从所述视频帧中截取显示当前视点对应的图像,从而减少一个视频帧拼接的视频画面,以达到提高分辨率的目的。In the technical solution of the present application, the display request sent by the display terminal is received, and the target timestamp and the viewpoint identifier corresponding to the target viewpoint are obtained according to the display request; the video code stream sent by the encoding terminal is received, and the video code stream is decoded by a decoder , acquire a video sequence; acquire the video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp; according to the video frame group The layout information of the target viewpoint and the viewpoint identifier corresponding to the target viewpoint intercept the target image; and send the target image to the display terminal for the display terminal to generate a display screen according to the target image. In this way, this application generates multiple video frames by splicing video images corresponding to different viewpoints at the same time, and sends the generated video frames to the decoding end, and the decoding end receives the video frames, and according to the arrangement information of the video frames and the current viewpoint correspondence The view point identifier intercepts and displays the image corresponding to the current view point from the video frame, thereby reducing the number of spliced video frames by one video frame, so as to achieve the purpose of increasing the resolution.
附图说明Description of drawings
图1是本申请实施例方案涉及的硬件运行环境的装置结构示意图;Fig. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application;
图2是本申请自由视点视频画面拼接方法一实施例的流程示意图;Fig. 2 is a schematic flow chart of an embodiment of the free viewpoint video picture splicing method of the present application;
图3是本申请自由视点视频画面拼接方法一实施例的流程示意图;FIG. 3 is a schematic flow diagram of an embodiment of the method for splicing free viewpoint video images according to the present application;
图4是本发明自由视点视频画面拼接方法一实施例的拼接图像的第一实例图;Fig. 4 is the first example diagram of the spliced image of an embodiment of the free viewpoint video picture splicing method of the present invention;
图5是本发明自由视点视频画面拼接方法一实施例的拼接图像的第二实例图;Fig. 5 is a second example diagram of a spliced image according to an embodiment of the free-viewpoint video frame splicing method of the present invention;
图6是本发明自由视点视频画面拼接方法一实施例的拼接图像的第三实例图;Fig. 6 is a third example diagram of a spliced image according to an embodiment of the free-viewpoint video frame splicing method of the present invention;
本申请目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。The realization, functional features and advantages of the present application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.
本发明的实施方式Embodiments of the present invention
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application.
本申请的主要技术方案是:The main technical scheme of the application is:
接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;receiving a display request sent by the display terminal, and obtaining a target timestamp and a viewpoint identifier corresponding to the target viewpoint according to the display request;
接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence;
获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;Acquiring a video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;Intercepting a target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。Sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
在相关技术中,由于编码以及终端解码播放设备支持的编解码计算能力有限,最大编解码分辨率受到限制,因此单路视频的分辨率以及支持传输的视点数之间面临严重的冲突,导致单路视频的分辨率低。In related technologies, due to the limited codec computing power supported by encoding and terminal decoding and playback devices, the maximum codec resolution is limited, so there is a serious conflict between the resolution of a single video and the number of viewpoints supported for transmission, resulting in a single The resolution of the road video is low.
本申请的技术方案中,接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。如此,本申请通过将同一时刻不同视点对应的视频画面拼接生成多个视频帧,并将生成的视频帧发送至解码端,解码端接收视频帧,并根据视频帧的排布信息以及当前视点对应的视点标识从所述视频帧中截取显示当前视点对应的图像,从而减少一个视频帧拼接的视频画面,以达到提高分辨率的目的。In the technical solution of the present application, the display request sent by the display terminal is received, and the target timestamp and the viewpoint identifier corresponding to the target viewpoint are obtained according to the display request; the video code stream sent by the encoding terminal is received, and the video code stream is decoded by a decoder , acquire a video sequence; acquire the video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp; according to the video frame group The layout information and the viewpoint identifier corresponding to the target viewpoint intercept a target image; and send the target image to a display terminal for the display terminal to generate a display screen according to the target image. In this way, this application generates multiple video frames by splicing video images corresponding to different viewpoints at the same time, and sends the generated video frames to the decoding end, and the decoding end receives the video frames, and according to the arrangement information of the video frames and the current viewpoint correspondence The view point identifier intercepts and displays the image corresponding to the current view point from the video frame, thereby reducing the number of spliced video frames by one video frame, so as to achieve the purpose of increasing the resolution.
如图1所示,图1是本申请实施例方案涉及的终端的硬件运行环境示意图。As shown in FIG. 1 , FIG. 1 is a schematic diagram of a hardware operating environment of a terminal involved in the solution of the embodiment of the present application.
如图1所示,该终端可以包括:处理器1001,例如CPU,网络接口1004,用户接口1003,存储器1005,通信总线1002。其中,通信总线1002用于实现这些组件之间的连接通信。用户接口1003可以包括显示屏(Display)、输入单元比如键盘(Keyboard),可选用户接口1003还可以包括标准的有线接口、无线接口。网络接口1004可选的可以包括标准的有线接口、无线接口(如存储器(non-volatilememory),例如磁盘存储器。存储器1005可选的还可以是独立于前述处理器1001的存储装置。As shown in FIG. 1 , the terminal may include: a processor 1001 , such as a CPU, a network interface 1004 , a user interface 1003 , a memory 1005 , and a communication bus 1002 . Wherein, the communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface. Optionally, the network interface 1004 may include standard wired interfaces and wireless interfaces (such as non-volatile memory), such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the foregoing processor 1001 .
本领域技术人员可以理解,图1中示出的终端的结构并不构成对终端的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。Those skilled in the art can understand that the structure of the terminal shown in FIG. 1 does not constitute a limitation on the terminal, and may include more or less components than those shown in the figure, or combine some components, or arrange different components.
如图1所示,作为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及自由视点视频画面拼接程序。As shown in FIG. 1 , the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a free-viewpoint video image splicing program.
在图1所示的终端中,网络接口1004主要用于连接后台服务器,与后台服务器进行数据通信;用户接口1003主要用于连接客户端(用户端),与客户端进行数据通信;而处理器1001可以用于调用存储器1005中存储的解码端的控制程序,并执行以下操作:In the terminal shown in Figure 1, the network interface 1004 is mainly used to connect to the background server and perform data communication with the background server; the user interface 1003 is mainly used to connect to the client (client) and perform data communication with the client; and the processor 1001 can be used to call the control program of the decoding end stored in the memory 1005, and perform the following operations:
接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;receiving a display request sent by the display terminal, and obtaining a target timestamp and a viewpoint identifier corresponding to the target viewpoint according to the display request;
接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence;
获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;Acquiring a video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;Intercepting a target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。Sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
进一步地,处理器1001可以调用存储器1005中存储的解码端的控制程序,还执行以下操作:Further, the processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识,确定所述目标视点对应的目标图像所在的目标视频帧以及所述目标视频帧中所述目标图像的位置信息;According to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
根据所述位置信息在所述目标视频帧中截取所述目标视点对应的目标图像。Intercepting a target image corresponding to the target viewpoint in the target video frame according to the position information.
进一步地,处理器1001可以调用存储器1005中存储的解码端的控制程序,还执行以下操作:Further, the processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像与对应的深度图像;Intercepting a target image and a corresponding depth image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像与对应的深度图像发送至显示端,以供所述显示端根据所述目标图像与对应的深度图像生成显示画面。Sending the target image and the corresponding depth image to a display terminal for the display terminal to generate a display picture according to the target image and the corresponding depth image.
进一步地,处理器1001可以调用存储器1005中存储的解码端的控制程序,还执行以下操作:Further, the processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
根据所述视频序列确定所述目标时间戳对应的视频帧组,并确定序列头中或者图像头中的排布信息。Determine the video frame group corresponding to the target time stamp according to the video sequence, and determine the arrangement information in the sequence header or the image header.
进一步地,处理器1001可以调用存储器1005中存储的编码端的控制程序,还执行以下操作:Further, the processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
获取各个视点对应的图像以及预设排布信息;Obtain the images corresponding to each viewpoint and the preset arrangement information;
根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧,其中,相同时间戳对应不同视频帧中的图像对应的视点不同;splicing images with the same time stamp into at least two video frames according to the preset arrangement information, wherein the same time stamp corresponds to different viewpoints corresponding to images in different video frames;
根据时间戳相同的视频帧生成视频帧组,其中,所述视频帧组至少包括两张视频帧;Generate a video frame group according to video frames with the same time stamp, wherein the video frame group includes at least two video frames;
根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列,并将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流;Generate a video sequence from the video frame groups corresponding to different time stamps according to the playback order, and input the video sequence and the preset arrangement information into an encoder to generate a target video stream;
将所述目标视频码流发送至解码端,以供所述解码端解码所述目标视频码流获取对应的视频序列。Sending the target video code stream to a decoding end for the decoding end to decode the target video code stream to obtain a corresponding video sequence.
进一步地,处理器1001可以调用存储器1005中存储的编码端的控制程序,还执行以下操作:Further, the processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
根据所述预设排布信息将时间戳相同的图像以及对应的深度图像拼接成至少两个视频帧,其中,所述图像以及对应的深度图像拼接在同一视频帧中。The images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
进一步地,处理器1001可以调用存储器1005中存储的编码端的控制程序,还执行以下操作:Further, the processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
将所述视频序列输入编码器生成原始视频码流;Input the video sequence into an encoder to generate an original video code stream;
根据所述预设排布信息生成视频帧的排布信息，其中，所述视频帧的排布信息包含所述视频帧中每一图像的视点标识和位置信息，确定所述视频帧中包括深度图像，所述视频帧的排布信息还包含每一深度图像对应的视点标识和位置信息；generating arrangement information of the video frames according to the preset arrangement information, wherein the arrangement information of a video frame includes the viewpoint identifier and position information of each image in the video frame, and when it is determined that the video frame includes depth images, the arrangement information of the video frame further includes the viewpoint identifier and position information corresponding to each depth image;
将所述视频帧的排布信息添加至所述原始视频码流的序列头或者所述视频帧的图像头中,生成目标视频码流。Adding the arrangement information of the video frame to the sequence header of the original video code stream or the image header of the video frame to generate a target video code stream.
如图2所示，本申请一实施例中，所述自由视点视频画面拼接方法应用在解码端，包括以下步骤：As shown in FIG. 2, in an embodiment of the present application, the free-viewpoint video picture splicing method is applied at the decoding end and includes the following steps:
步骤S11,接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;Step S11, receiving the display request sent by the display terminal, and obtaining the target timestamp and the viewpoint identifier corresponding to the target viewpoint according to the display request;
在本实施例中，确定显示端发送的显示请求，根据所述显示请求可获取显示端需要的画面对应的时间点以及对应视点的视点标识。确定显示端需要显示真实视点的画面，根据所述显示请求可获取画面对应的目标时间戳以及真实视点对应的视点标识；确定显示端需要显示虚拟视点的画面，可根据所述显示请求获取画面对应的目标时间戳以及虚拟视点的相邻视点对应的视点标识，其中，至少确定两个相邻视点对应的视点标识。In this embodiment, the display request sent by the display terminal is determined, and the time point corresponding to the picture required by the display terminal and the viewpoint identifier of the corresponding viewpoint can be obtained according to the display request. If it is determined that the display terminal needs to display a picture of a real viewpoint, the target timestamp corresponding to the picture and the viewpoint identifier corresponding to the real viewpoint are obtained according to the display request; if it is determined that the display terminal needs to display a picture of a virtual viewpoint, the target timestamp corresponding to the picture and the viewpoint identifiers corresponding to the viewpoints adjacent to the virtual viewpoint are obtained according to the display request, where the viewpoint identifiers of at least two adjacent viewpoints are determined.
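For illustration only, this request parsing can be sketched in Python as follows; the request field names ("timestamp", "viewpoint", "is_virtual", "adjacent_viewpoints") are assumptions, since the embodiment does not fix a concrete request format.

```python
def parse_display_request(request: dict):
    """Minimal sketch: extract the target timestamp and the viewpoint IDs a request needs."""
    target_timestamp = request["timestamp"]          # time point of the picture to display
    if request.get("is_virtual", False):
        # A virtual viewpoint is synthesised from at least two neighbouring real viewpoints.
        view_ids = list(request["adjacent_viewpoints"])
        if len(view_ids) < 2:
            raise ValueError("a virtual viewpoint needs at least two adjacent viewpoints")
    else:
        # A real viewpoint maps directly to a single viewpoint identifier.
        view_ids = [request["viewpoint"]]
    return target_timestamp, view_ids
```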
步骤S12,接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;Step S12, receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence;
在本实施例中,接收编码端发送的视频码流,通过解码器对接收到的视频码流进行解码,获取视频序列以及排布信息。所述视频序列是由不同时间戳对应的视频帧组组成,所述排布信息在所述视频序列的序列头中或者在视频帧的图像头中。In this embodiment, the video code stream sent by the encoder is received, and the received video code stream is decoded by a decoder to obtain video sequence and arrangement information. The video sequence is composed of video frame groups corresponding to different time stamps, and the arrangement information is in the sequence header of the video sequence or in the image header of the video frame.
步骤S13,获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;Step S13, acquiring a video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
在本实施例中，所述视频序列是由不同时间戳对应的视频帧组组成，根据显示请求可获取显示端需要显示的画面对应的目标时间戳，根据所述目标时间戳在视频序列中查找时间戳与所述目标时间戳相同的视频帧组。时间戳与所述目标时间戳相同的所述视频帧组中的视频帧由各个视点的摄像机在目标时间戳对应的时刻点拍摄的视频画面拼接而成。每个视频帧组中包括时间戳相同的至少两张视频帧，所述视频帧可以是多个视点对应的画面拼接而成，也可以仅有一个视点对应的画面。In this embodiment, the video sequence is composed of video frame groups corresponding to different timestamps. The target timestamp corresponding to the picture to be displayed by the display terminal can be obtained from the display request, and the video frame group whose timestamp matches the target timestamp is searched for in the video sequence according to the target timestamp. The video frames in the video frame group whose timestamp matches the target timestamp are spliced from the video pictures captured by the cameras of the respective viewpoints at the moment corresponding to the target timestamp. Each video frame group includes at least two video frames with the same timestamp; a video frame may be spliced from pictures corresponding to multiple viewpoints, or may contain the picture corresponding to only one viewpoint.
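A minimal sketch of this lookup, assuming the decoded video sequence is held as a list of frame-group records (the field names are illustrative, not mandated by the embodiment):

```python
from dataclasses import dataclass

@dataclass
class VideoFrameGroup:
    timestamp: int
    frames: list       # decoded frames sharing this timestamp (e.g. numpy arrays)
    arrangement: list  # arrangement entries for the images spliced into these frames

def find_frame_group(video_sequence, target_timestamp):
    """Return the frame group whose timestamp matches the display request."""
    for group in video_sequence:
        if group.timestamp == target_timestamp:
            return group
    raise KeyError(f"no frame group with timestamp {target_timestamp}")
```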
步骤S14,根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;Step S14, intercepting the target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
在本实施例中，所述视频帧组的排布信息包括所述视频帧组中的每一视频帧的排布信息，所述视频帧组的排布信息包括视点标识、视点图像于视频帧的坐标以及视点图像对应的宽高。根据目标视点对应的视点标识在所述视频帧组的排布信息中查找匹配的视点标识，并根据所述视点标识确定所述目标视点对应的目标图像所在的目标视频帧，并根据排布信息中视点图像于视频帧的坐标以及视点图像对应的宽高确定在目标视频帧中目标图像所在的位置以及目标图像的大小，截取所述目标图像。In this embodiment, the arrangement information of the video frame group includes the arrangement information of each video frame in the group and contains, for each viewpoint, the viewpoint identifier, the coordinates of the viewpoint image within the video frame, and the width and height of the viewpoint image. A matching viewpoint identifier is searched for in the arrangement information of the video frame group according to the viewpoint identifier corresponding to the target viewpoint; the target video frame containing the target image corresponding to the target viewpoint is determined according to that viewpoint identifier; the position and size of the target image in the target video frame are determined according to the coordinates of the viewpoint image within the video frame and the width and height of the viewpoint image recorded in the arrangement information; and the target image is then cropped.
步骤S15,将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。Step S15, sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
在本实施例中，将从目标视频帧中截取的目标图像发送至显示端，所述显示端接收到所述目标图像后，根据所述目标图像生成显示画面，并在显示屏中显示所述显示画面。In this embodiment, the target image cropped from the target video frame is sent to the display terminal; after receiving the target image, the display terminal generates a display picture according to the target image and presents the display picture on its display screen.
综上所述，在本申请中，接收显示端发送的显示请求，根据所述显示请求获取目标时间戳以及目标视点对应的视点标识；接收编码端发送的视频码流，通过解码器解码所述视频码流，获取视频序列；获取所述视频序列中所述目标时间戳对应的视频帧组，其中，每个所述视频帧组中包括时间戳相同的至少两张视频帧；根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像；将所述目标图像发送至显示端，以供所述显示端根据所述目标图像生成显示画面。如此，本申请通过将同一时刻不同视点对应的视频画面拼接生成多个视频帧，并将生成的视频帧发送至解码端，解码端接收视频帧，并根据视频帧的排布信息以及当前视点对应的视点标识从所述视频帧中截取显示当前视点对应的图像，从而减少一个视频帧拼接的视频画面，以达到提高分辨率的目的。To sum up, in the present application, a display request sent by the display terminal is received, and a target timestamp and a viewpoint identifier corresponding to the target viewpoint are obtained according to the display request; a video code stream sent by the encoding end is received and decoded by a decoder to obtain a video sequence; the video frame group corresponding to the target timestamp in the video sequence is acquired, where each video frame group includes at least two video frames with the same timestamp; a target image is cropped according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint; and the target image is sent to the display terminal so that the display terminal generates a display picture according to the target image. In this way, the present application splices the video pictures corresponding to different viewpoints at the same moment into multiple video frames and sends the generated video frames to the decoding end; the decoding end receives the video frames and, according to the arrangement information of the video frames and the viewpoint identifier corresponding to the current viewpoint, crops from the video frames the image corresponding to the current viewpoint for display. This reduces the number of video pictures spliced into any single video frame, thereby increasing the resolution.
在本申请一实施例中,所述步骤S14包括:In an embodiment of the present application, the step S14 includes:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识,确定所述目标视点对应的目标图像所在的目标视频帧以及所述目标视频帧中所述目标图像的位置信息;According to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
根据所述位置信息在所述目标视频帧中截取所述目标视点对应的目标图像。Intercepting a target image corresponding to the target viewpoint in the target video frame according to the position information.
在本实施例中，所述视频帧组的排布信息包括各个视点对应的视点标识、视点对应图像在视频帧中的坐标以及视点对应图像的宽高。根据目标视点对应的视点标识在所述视频帧组的排布信息中查找与所述目标视点对应的视点标识匹配的视点标识，并确定所述目标视点对应的目标图像所在的目标视频帧以及在所述目标视频帧中所述目标图像的位置信息，即所述目标图像在目标视频帧中的坐标以及所述目标图像对应的宽高。根据所述位置信息截取所述目标图像，例如：在获取目标视点对应的目标图像的位置信息后，根据所述位置信息中的坐标找到所述目标图像左上角像素在目标视频帧中的坐标；在确定所述目标图像在视频帧中坐标后，根据所述目标图像对应的宽高确定所述目标图像在目标视频帧中的拼接区域，从而截取所述拼接区域中的图像，所述拼接区域中的图像为目标图像。如此，通过目标视点对应的视点标识以及视频帧组的排布信息确定所述目标视点对应的目标图像所在的视频帧以及在视频帧中的拼接区域，从而准确、快速的截取目标图像并发送至显示端。In this embodiment, the arrangement information of the video frame group includes the viewpoint identifier corresponding to each viewpoint, the coordinates of the image corresponding to each viewpoint within the video frame, and the width and height of that image. According to the viewpoint identifier corresponding to the target viewpoint, a matching viewpoint identifier is searched for in the arrangement information of the video frame group, and the target video frame containing the target image corresponding to the target viewpoint is determined together with the position information of the target image in the target video frame, namely the coordinates of the target image in the target video frame and the width and height of the target image. The target image is cropped according to the position information. For example, after the position information of the target image corresponding to the target viewpoint is obtained, the coordinates of the upper-left pixel of the target image in the target video frame are located according to the coordinates in the position information; once the coordinates of the target image in the video frame are determined, the splice region of the target image in the target video frame is determined according to the width and height of the target image, and the image in the splice region, which is the target image, is cropped. In this way, the video frame containing the target image corresponding to the target viewpoint and its splice region within that video frame are determined through the viewpoint identifier of the target viewpoint and the arrangement information of the video frame group, so that the target image can be cropped accurately and quickly and sent to the display terminal.
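The cropping step can be sketched as below; it assumes the arrangement entries are dictionaries in the (x, y, w, h, view_id) form described later for the encoder, extended with a hypothetical "frame_index" field naming which frame of the group an image was spliced into.

```python
import numpy as np

def crop_target_image(group, target_view_id):
    """Locate the splice region of the target viewpoint and cut it out of its video frame."""
    for entry in group.arrangement:
        if entry["view_id"] == target_view_id:
            frame = group.frames[entry["frame_index"]]         # target video frame
            x, y, w, h = entry["x"], entry["y"], entry["w"], entry["h"]
            return np.asarray(frame)[y:y + h, x:x + w].copy()  # image in the splice region
    raise KeyError(f"viewpoint {target_view_id} not found in this frame group")
```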
在本申请一实施例中，确定显示端请求发送深度图像，所述步骤S13之后，还包括：In an embodiment of the present application, when it is determined that the display terminal requests depth images, after step S13 the method further includes:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像与对应的深度图像;Intercepting a target image and a corresponding depth image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像与对应的深度图像发送至显示端,以供所述显示端根据所述目标图像与对应的深度图像生成显示画面。Sending the target image and the corresponding depth image to a display terminal for the display terminal to generate a display picture according to the target image and the corresponding depth image.
在本实施例中，确定显示端请求发送深度图像，需要解码端截取并发送所述显示请求中目标视点对应的视频画面以及深度图像，以供显示端根据所述目标视点对应的视频画面以及深度图像生成显示画面并在显示屏中显示。例如，确定显示端需要显示一个虚拟视点对应的画面，需要所述虚拟视点左右两侧相邻视点对应的图像以及深度图像，根据相邻视点的图像以及深度图像合成所述虚拟视点的图像画面，此时，所述显示端需要确定相邻视点对应的视点标识并向解码端发送显示请求；所述解码端根据接收到的显示请求获取相邻视点的视点标识，并根据所述相邻视点的视点标识以及视频帧组的排布信息确定各个所述相邻视点对应的目标图像所在的目标视频帧，由于各个视点对应的图像以及深度图像拼接在同一视频帧中，故可在所述目标视频帧中截取对应的相邻视点的目标图像以及对应的深度图像；将截取到的所述相邻视点对应的目标图像以及对应的深度图像发送至显示端，以供显示端根据各个相邻视点对应的图像以及深度图像生成显示图像。In this embodiment, when it is determined that the display terminal requests depth images, the decoding end needs to crop and send the video picture and the depth image corresponding to the target viewpoint in the display request, so that the display terminal can generate a display picture from the video picture and depth image corresponding to the target viewpoint and present it on the display screen. For example, when it is determined that the display terminal needs to display a picture corresponding to a virtual viewpoint, the images and depth images corresponding to the adjacent viewpoints on the left and right sides of the virtual viewpoint are required, and the picture of the virtual viewpoint is synthesized from the images and depth images of the adjacent viewpoints. In this case, the display terminal needs to determine the viewpoint identifiers corresponding to the adjacent viewpoints and send a display request to the decoding end; the decoding end obtains the viewpoint identifiers of the adjacent viewpoints from the received display request and, according to these viewpoint identifiers and the arrangement information of the video frame group, determines the target video frames containing the target images corresponding to the adjacent viewpoints. Since the image and the depth image corresponding to each viewpoint are spliced into the same video frame, the target image of each adjacent viewpoint and its corresponding depth image can be cropped from the target video frame; the cropped target images and depth images of the adjacent viewpoints are then sent to the display terminal, so that the display terminal generates the display image from the images and depth images corresponding to the adjacent viewpoints.
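A sketch of the virtual-viewpoint case, under the assumption that each arrangement entry additionally carries a "kind" field distinguishing a texture image from its depth map (the embodiment only requires that both are spliced into the same video frame):

```python
def crop_views_for_virtual_viewpoint(group, adjacent_view_ids):
    """Collect the texture image and depth map of every adjacent real viewpoint."""
    pairs = {}
    for view_id in adjacent_view_ids:
        texture, depth = None, None
        for entry in group.arrangement:
            if entry["view_id"] != view_id:
                continue
            frame = group.frames[entry["frame_index"]]
            patch = frame[entry["y"]:entry["y"] + entry["h"],
                          entry["x"]:entry["x"] + entry["w"]]
            if entry.get("kind") == "depth":
                depth = patch
            else:
                texture = patch
        pairs[view_id] = (texture, depth)  # sent to the display side for view synthesis
    return pairs
```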
在本申请一实施例中,所述步骤S13包括:In an embodiment of the present application, the step S13 includes:
根据所述视频序列确定所述目标时间戳对应的视频帧组,并确定序列头中或者图像头中的排布信息。Determine the video frame group corresponding to the target time stamp according to the video sequence, and determine the arrangement information in the sequence header or the image header.
在本实施例中，解码编码端发送的视频码流得到对应的视频序列后，查找时间戳与显示请求对应的时间戳相同的视频帧组，并获取存储于序列头或者图像头中的排布信息，通过所述排布信息以及目标视点对应的视点标识，可定位目标视点对应的目标图像所在的视频帧并确定所述目标图像在视频帧中的拼接区域，即所述拼接区域中的图像为目标视点对应的目标图像。确定视频序列中多个视频帧的排布信息相同，可将所述排布信息编号并插入序列头中，以所述排布信息拼接图像的视频帧可引用对应的编号，以根据所述编号找到视频帧对应的排布信息。如此，通过将多个视频帧相同的排布信息存储于视频序列的序列头中，降低解码端需要接收的数据量。In this embodiment, after the video code stream sent by the encoding end is decoded to obtain the corresponding video sequence, the video frame group whose timestamp matches the timestamp of the display request is searched for, and the arrangement information stored in the sequence header or in the picture header is obtained. With the arrangement information and the viewpoint identifier corresponding to the target viewpoint, the video frame containing the target image of the target viewpoint can be located and the splice region of the target image in that video frame can be determined; the image in the splice region is the target image corresponding to the target viewpoint. When it is determined that multiple video frames in the video sequence share the same arrangement information, the arrangement information can be numbered and inserted into the sequence header, and the video frames whose images were spliced with that arrangement can reference the corresponding number, so that the arrangement information of a video frame can be found from the number. In this way, by storing the arrangement information shared by multiple video frames in the sequence header of the video sequence, the amount of data that the decoding end needs to receive is reduced.
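A minimal sketch of how a decoder might resolve the arrangement information, assuming a hypothetical container in which the sequence header holds numbered arrangement tables and each picture header either embeds its own table or references a number:

```python
def resolve_arrangement(sequence_header, picture_header):
    """Return the arrangement table that applies to one decoded video frame."""
    if "arrangement" in picture_header:             # table carried in the picture header itself
        return picture_header["arrangement"]
    number = picture_header["arrangement_ref"]      # otherwise reference a numbered table
    return sequence_header["arrangements"][number]  # shared table stored once in the sequence header
```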
如图3所示，在本申请一实施例中，所述自由视点视频画面拼接方法应用于编码端，包括以下步骤：As shown in FIG. 3, in an embodiment of the present application, the free-viewpoint video picture splicing method is applied to the encoding end and includes the following steps:
步骤S21,获取各个视点对应的图像以及预设排布信息;Step S21, acquiring images corresponding to each viewpoint and preset arrangement information;
在本实施例中，多个相机拍摄得到多个视点对应的图像，其中，一个相机可拍摄得到一个视点对应的图像，或一个相机可拍摄得到一个视点对应的图像以及对应的深度图像。多个相机将同一时刻拍摄到的图像发送至编码端。编码端根据预先设定的排布方式生成预设排布信息，所述预设排布信息中的一条信息描述一个视点图像或深度图像的相关信息，所述相关信息的格式为(x, y, w, h, view_id)，其中，x、y为图像左上角像素在视频帧中的坐标，w、h为图像的宽高，view_id为视点标识。In this embodiment, multiple cameras capture images corresponding to multiple viewpoints, where one camera may capture the image corresponding to one viewpoint, or one camera may capture the image corresponding to one viewpoint together with the corresponding depth image. The cameras send the images captured at the same moment to the encoding end. The encoding end generates preset arrangement information according to a predefined arrangement scheme; each entry in the preset arrangement information describes one viewpoint image or depth image and takes the form (x, y, w, h, view_id), where x and y are the coordinates of the upper-left pixel of the image within the video frame, w and h are the width and height of the image, and view_id is the viewpoint identifier.
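For illustration, the preset arrangement information might be modelled as below; the 3x3 tiling of nine 640x360 images inside a 1920x1080 frame is an assumed example, not a layout required by the embodiment.

```python
from typing import NamedTuple

class ArrangementEntry(NamedTuple):
    x: int        # column of the image's upper-left pixel in the video frame
    y: int        # row of the image's upper-left pixel in the video frame
    w: int        # width the image occupies in the frame
    h: int        # height the image occupies in the frame
    view_id: int  # identifier of the viewpoint the image belongs to

# Example: nine 640x360 viewpoint images tiled 3x3 inside one 1920x1080 video frame.
preset_arrangement = [
    ArrangementEntry(x=(i % 3) * 640, y=(i // 3) * 360, w=640, h=360, view_id=i + 1)
    for i in range(9)
]
```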
步骤S22,根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧,其中,相同时间戳对应不同视频帧中的图像对应的视点不同;Step S22, splicing images with the same time stamp into at least two video frames according to the preset arrangement information, wherein the same time stamp corresponds to different viewpoints corresponding to images in different video frames;
在本实施例中，根据预设排布信息将接收到的相机发送的图像拼接成为视频帧，即根据预设排布信息中图像的宽和高调整对应的所述视点图像或者所述深度图像的大小，将调整后的图像根据坐标拼接至对应视频帧中。拼接至同一个视频帧中的图像对应的时间戳相同，一个视频帧组中至少包括两个视频帧。例如，部署了27台摄像机进行拍摄，若将九个摄像机拍摄的图像拼接成为一个视频帧，则时间戳相同的视频帧就有三个，其中，每个视频帧拼接九个视点对应的图像，如图4所示，其中P1、P2...P9为九个摄像头拍摄的图像。In this embodiment, the images received from the cameras are spliced into video frames according to the preset arrangement information; that is, the corresponding viewpoint image or depth image is resized according to the width and height given in the preset arrangement information, and the resized image is pasted into the corresponding video frame at the given coordinates. The images spliced into the same video frame share the same timestamp, and one video frame group includes at least two video frames. For example, when 27 cameras are deployed and the images captured by nine cameras are spliced into one video frame, there are three video frames with the same timestamp, and each video frame is spliced from the images of nine viewpoints, as shown in FIG. 4, where P1, P2, ..., P9 are the images captured by nine cameras.
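A sketch of this splicing step using the entry format above; OpenCV is used only for resizing and could be replaced by any resampling routine, and the 1080p frame size and black padding are assumptions.

```python
import numpy as np
import cv2

def splice_frames(images_by_view, arrangement_tables, frame_shape=(1080, 1920, 3)):
    """Splice same-timestamp viewpoint images into one video frame per arrangement table."""
    frames = []
    for table in arrangement_tables:                  # e.g. three tables of nine entries for 27 cameras
        frame = np.zeros(frame_shape, dtype=np.uint8)
        for entry in table:
            img = images_by_view[entry.view_id]
            img = cv2.resize(img, (entry.w, entry.h))  # adjust to the preset width and height
            frame[entry.y:entry.y + entry.h, entry.x:entry.x + entry.w] = img
        frames.append(frame)
    return frames                                      # all frames share the input images' timestamp
```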
步骤S23,根据时间戳相同的视频帧生成视频帧组,其中,所述视频帧组至少包括两张视频帧;Step S23, generating a video frame group according to video frames with the same time stamp, wherein the video frame group includes at least two video frames;
在本实施例中，在获取各个相机拍摄的图像后，将各个相机拍摄的图像拼接成为视频帧，将时间戳相同的视频帧拼接生成视频帧组，其中，所述视频帧组至少包括时间戳相同的两张视频帧。In this embodiment, after the images captured by the cameras are obtained, they are spliced into video frames, and the video frames with the same timestamp are grouped into a video frame group, where the video frame group includes at least two video frames with the same timestamp.
步骤S24,根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列,并将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流。Step S24, generating a video sequence from the video frame groups corresponding to different time stamps according to the playback sequence, and inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream.
在本实施例中，根据播放的先后顺序将不同时间戳对应的视频帧组进行排序，根据排序后的多个所述视频帧组生成视频序列，并将所述视频序列以及对应的排布信息输入编码器，生成目标视频码流。In this embodiment, the video frame groups corresponding to different timestamps are sorted according to the playback order, a video sequence is generated from the sorted video frame groups, and the video sequence together with the corresponding arrangement information is input into the encoder to generate the target video code stream.
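The grouping and ordering of steps S23 and S24 can be sketched as follows; ascending timestamp order stands in for the playback order, which in practice could come from an external playlist.

```python
from collections import defaultdict

def build_video_sequence(timestamped_frames):
    """Group (timestamp, frame) pairs into frame groups and order the groups for playback."""
    groups = defaultdict(list)
    for timestamp, frame in timestamped_frames:
        groups[timestamp].append(frame)   # frames sharing a timestamp form one frame group
    return [{"timestamp": ts, "frames": groups[ts]} for ts in sorted(groups)]
```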
步骤S25，将所述目标视频码流发送至解码端，以供所述解码端解码所述目标视频码流获取对应的视频序列。Step S25: sending the target video code stream to the decoding end, so that the decoding end decodes the target video code stream to obtain the corresponding video sequence.
在本实施例中，在编码器生成目标视频码流后，将所述视频码流发送至解码端，以供解码端通过解码器解码所述视频码流，获取对应的视频序列，并根据显示端发送的显示请求在所述视频序列中查找并截取所述显示请求需要的目标视点的图像画面。In this embodiment, after the encoder generates the target video code stream, the code stream is sent to the decoding end, so that the decoding end decodes it with a decoder to obtain the corresponding video sequence and, according to the display request sent by the display terminal, searches the video sequence for and crops the picture of the target viewpoint required by the display request.
综上所述，本申请获取各个视点对应的图像以及排布信息；根据所述排布信息将时间戳相同的图像拼接成至少两个视频帧，相同时间戳对应不同视频帧中的图像对应的视点不同；根据时间戳相同的视频帧生成视频帧组，其中，所述视频帧组至少包括两张视频帧；根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列，并将所述视频序列以及所述排布信息输入编码器，生成目标视频码流。将所述目标视频码流发送至解码端，以供所述解码端解码所述目标视频码流获取对应的视频序列。如此，通过将同一时刻的多个视点对应的图像拼接成多个视频帧，以减少一个视频帧中拼接的图像数量，达到提高分辨率的目的。To sum up, the present application obtains the images corresponding to the respective viewpoints and the arrangement information; splices images with the same timestamp into at least two video frames according to the arrangement information, where images carrying the same timestamp but placed in different video frames correspond to different viewpoints; generates video frame groups from video frames with the same timestamp, where each video frame group includes at least two video frames; generates a video sequence from the video frame groups corresponding to different timestamps according to the playback order; and inputs the video sequence and the arrangement information into the encoder to generate the target video code stream. The target video code stream is sent to the decoding end, so that the decoding end decodes it to obtain the corresponding video sequence. In this way, by splicing the images corresponding to multiple viewpoints at the same moment into multiple video frames, the number of images spliced into any single video frame is reduced, thereby increasing the resolution.
在本申请一实施例中,确定获取到各个视点对应的图像以及对应的深度图像,所述步骤S22包括:In an embodiment of the present application, it is determined that the image corresponding to each viewpoint and the corresponding depth image are acquired, and the step S22 includes:
根据所述预设排布信息将时间戳相同的图像以及对应的深度图像拼接成至少两个视频帧,其中,所述图像以及对应的深度图像拼接在同一视频帧中。The images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
在本实施例中，相机将当前时刻拍摄的图像以及深度图像发送至编码端后，编码端根据预设排布信息将时间戳相同的图像以及对应的深度图像进行拼接，其中，所述预设排布信息包括图像或深度图像左上角像素在视频帧中的坐标、图像或深度图像在视频帧中的宽高、对应的视点标识以及图像类别。将时间戳相同的图像或深度图像拼接成至少两个视频帧且将图像与对应的深度图像拼接在同一视频帧中，如图5以及图6所示，其中，P1、P2、P3...P10为视点对应的图像，D1、D2、D3....D9及D21、D22、D23...D30为各视点对应的图像的深度图像，以便查找虚拟视点的相邻视点对应的图像与深度图像，从而生成虚拟视点对应的图像。In this embodiment, after the cameras send the images and depth images captured at the current moment to the encoding end, the encoding end splices the images with the same timestamp and the corresponding depth images according to the preset arrangement information, where the preset arrangement information includes the coordinates of the upper-left pixel of the image or depth image within the video frame, the width and height the image or depth image occupies in the video frame, the corresponding viewpoint identifier, and the image category. The images and depth images with the same timestamp are spliced into at least two video frames, and each image and its corresponding depth image are spliced into the same video frame, as shown in FIG. 5 and FIG. 6, where P1, P2, P3, ..., P10 are the images corresponding to the viewpoints and D1, D2, D3, ..., D9 and D21, D22, D23, ..., D30 are the depth images of the images corresponding to the respective viewpoints, so that the images and depth images corresponding to the viewpoints adjacent to a virtual viewpoint can be found and an image corresponding to the virtual viewpoint can be generated.
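By way of example, an arrangement table that keeps a viewpoint's texture image and its depth map in the same video frame might look like this; the coordinates, sizes, and the "kind" field are illustrative assumptions.

```python
# One assumed arrangement table: texture and depth of each viewpoint sit side by side
# in the same 1920x1080 video frame.
arrangement_with_depth = [
    {"x": 0,   "y": 0,   "w": 640, "h": 360, "view_id": 1, "kind": "texture"},
    {"x": 640, "y": 0,   "w": 640, "h": 360, "view_id": 1, "kind": "depth"},
    {"x": 0,   "y": 360, "w": 640, "h": 360, "view_id": 2, "kind": "texture"},
    {"x": 640, "y": 360, "w": 640, "h": 360, "view_id": 2, "kind": "depth"},
]
```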
在本申请一实施例中,所述将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流的步骤包括:In an embodiment of the present application, the step of inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream includes:
将所述视频序列输入编码器生成原始视频码流;Input the video sequence into an encoder to generate an original video code stream;
根据所述预设排布信息生成视频帧的排布信息，其中，所述视频帧的排布信息包含所述视频帧中每一图像的视点标识和位置信息，确定所述视频帧中包括深度图像，所述视频帧的排布信息还包含每一深度图像对应的视点标识和位置信息；generating arrangement information of the video frames according to the preset arrangement information, wherein the arrangement information of a video frame includes the viewpoint identifier and position information of each image in the video frame, and when it is determined that the video frame includes depth images, the arrangement information of the video frame further includes the viewpoint identifier and position information corresponding to each depth image;
将所述视频帧的排布信息添加至所述原始视频码流的序列头或者所述视频帧的图像头中,生成目标视频码流。Adding the arrangement information of the video frame to the sequence header of the original video code stream or the image header of the video frame to generate a target video code stream.
在本实施例中，将相同时间戳的视频帧组合成视频帧组，并根据所述视频帧组对应的时间戳的先后顺序，对各个视频帧组进行排序，最后生成视频序列。将所述视频序列输入编码器生成原始视频码流，并根据预设排布信息生成各个视频帧的排布信息，其中，每个所述视频帧的排布信息包含所述视频帧中每一图像对应的视点标识以及位置信息，以供解码端根据视点标识查找到对应图像的拼接区域，从而截取图像。确定视频帧中拼接深度图像，所述排布信息中也包含每一深度图像对应的视点标识以及位置信息。所述视频帧的排布信息可添加至原视频码流的序列头或者各个视频帧的图像头中。确定视频序列中有多个视频帧的排布信息相同，可将相同的排布信息编号并添加至序列头中，并在对应的视频帧中添加对应的编号，以供解码端识别所述视频帧的排布信息。如此，在视频序列中多个视频帧排布信息相同时，解码端在接收到所述视频序列时可读取序列头中的排布信息以及各个视频帧中包含的排布信息对应的编号，降低解码端需要接收的数据量。In this embodiment, video frames with the same timestamp are combined into a video frame group, the video frame groups are sorted according to the order of their timestamps, and finally a video sequence is generated. The video sequence is input into the encoder to generate the original video code stream, and the arrangement information of each video frame is generated according to the preset arrangement information, where the arrangement information of each video frame includes the viewpoint identifier and position information corresponding to each image in the video frame, so that the decoding end can locate the splice region of the corresponding image from the viewpoint identifier and crop the image. When it is determined that depth images are spliced into the video frame, the arrangement information also includes the viewpoint identifier and position information corresponding to each depth image. The arrangement information of the video frames may be added to the sequence header of the original video code stream or to the picture header of each video frame. When it is determined that multiple video frames in the video sequence share the same arrangement information, the shared arrangement information can be numbered and added to the sequence header, and the corresponding number added to the corresponding video frames, so that the decoding end can identify the arrangement information of each video frame. In this way, when the arrangement information of multiple video frames in the video sequence is the same, the decoding end, on receiving the video sequence, can read the arrangement information in the sequence header and the number carried by each video frame, which reduces the amount of data the decoding end needs to receive.
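A sketch of the encoder-side bookkeeping described above, assuming arrangement tables are given as lists of dictionary entries; identical tables are stored once in the sequence header under a number, and each picture header carries only that number.

```python
def emit_headers(per_frame_arrangements):
    """Deduplicate identical arrangement tables into the sequence header."""
    sequence_header = {"arrangements": []}
    picture_headers = []
    seen = {}
    for table in per_frame_arrangements:
        key = tuple(tuple(sorted(entry.items())) for entry in table)  # hashable form of the table
        if key not in seen:
            seen[key] = len(sequence_header["arrangements"])
            sequence_header["arrangements"].append(table)
        picture_headers.append({"arrangement_ref": seen[key]})        # frame stores only the number
    return sequence_header, picture_headers
```

This pairs with the decoder-side resolve_arrangement sketch shown earlier: the encoder writes the numbered tables, and the decoder reads a table back via the number referenced by each frame.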
为了实现上述目的，本申请还提供一种终端，所述终端为解码端，所述解码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被所述处理器执行时，实现如上所述的自由视点视频画面拼接方法的步骤。To achieve the above object, the present application further provides a terminal, the terminal being a decoding end, where the decoding end includes a memory, a processor, and a free-viewpoint video picture splicing program stored in the memory and executable on the processor; when the free-viewpoint video picture splicing program is executed by the processor, the steps of the free-viewpoint video picture splicing method described above are implemented.
为了实现上述目的，本申请还提供一种终端，所述终端为编码端，所述编码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被所述处理器执行时，实现如上所述的自由视点视频画面拼接方法的步骤。To achieve the above object, the present application further provides a terminal, the terminal being an encoding end, where the encoding end includes a memory, a processor, and a free-viewpoint video picture splicing program stored in the memory and executable on the processor; when the free-viewpoint video picture splicing program is executed by the processor, the steps of the free-viewpoint video picture splicing method described above are implemented.
为了实现上述目的，本申请还提供一种可读存储介质，所述可读存储介质上存储有自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被处理器执行时实现如上所述任一项所述的自由视点画面拼接方法的步骤。To achieve the above object, the present application further provides a readable storage medium storing a free-viewpoint video picture splicing program; when the free-viewpoint video picture splicing program is executed by a processor, the steps of any one of the free-viewpoint video picture splicing methods described above are implemented.
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。The serial numbers of the above embodiments of the present application are for description only, and do not represent the advantages and disadvantages of the embodiments.
通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在如上所述的一个存储介质（如ROM/RAM、磁碟、光盘）中，包括若干指令用以使得一台终端设备（可以是手机，计算机，服务器，或者网络设备等）执行本申请各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
以上仅为本申请的优选实施例，并非因此限制本申请的专利范围，凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本申请的专利保护范围内。The above are only preferred embodiments of the present application and are not intended to limit its patent scope. Any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (15)

  1. 一种自由视点视频画面拼接方法，其中，应用于解码端，所述自由视点视频画面拼接方法包括：A free-viewpoint video picture splicing method, applied to a decoding end, wherein the free-viewpoint video picture splicing method comprises:
    接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;receiving a display request sent by the display terminal, and obtaining a target timestamp and a viewpoint identifier corresponding to the target viewpoint according to the display request;
    接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence;
    获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;Acquiring a video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
    根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;以及Intercepting a target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint; and
    将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。Sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
  2. 如权利要求1所述的自由视点视频画面拼接方法,其中,所述根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像的步骤包括:The free-viewpoint video picture splicing method according to claim 1, wherein the step of intercepting the target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint comprises:
    根据所述视频帧组的排布信息以及所述目标视点对应的视点标识,确定所述目标视点对应的目标图像所在的目标视频帧以及所述目标视频帧中所述目标图像的位置信息;以及According to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame; and
    根据所述位置信息在所述目标视频帧中截取所述目标视点对应的目标图像。Intercepting a target image corresponding to the target viewpoint in the target video frame according to the position information.
  3. 如权利要求2所述的自由视点视频画面拼接方法,其中,所述根据所述位置信息在所述目标视频帧中截取所述目标视点对应的目标图像的步骤包括:The free-viewpoint video picture splicing method according to claim 2, wherein the step of intercepting the target image corresponding to the target viewpoint in the target video frame according to the position information comprises:
    根据所述位置信息确定所述目标视点对应的目标图像的坐标以及宽高;以及determining the coordinates and width and height of the target image corresponding to the target viewpoint according to the position information; and
    根据所述目标图像的坐标以及宽高在所述目标视频帧中截取所述目标图像。intercepting the target image in the target video frame according to the coordinates and the width and height of the target image.
  4. 如权利要求3所述的自由视点视频画面拼接方法，其中，所述根据所述目标图像的坐标以及宽高在所述目标视频帧中截取所述目标图像的步骤包括：The free-viewpoint video picture splicing method according to claim 3, wherein the step of intercepting the target image in the target video frame according to the coordinates and the width and height of the target image comprises:
    根据所述目标图像的坐标以及宽高确定所述目标图像在目标视频帧中的拼接区域;以及Determine the splicing area of the target image in the target video frame according to the coordinates and the width and height of the target image; and
    截取所述拼接区域中的图像作为目标图像。Capture the image in the stitching area as the target image.
  5. 如权利要求1所述的自由视点视频画面拼接方法，其中，确定显示端请求发送深度图像，所述获取所述视频序列中所述目标时间戳对应的视频帧组的步骤之后，还包括：The free-viewpoint video picture splicing method according to claim 1, wherein when it is determined that the display terminal requests depth images, after the step of acquiring the video frame group corresponding to the target timestamp in the video sequence, the method further comprises:
    根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像与对应的深度图像;以及Intercepting a target image and a corresponding depth image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint; and
    将所述目标图像与对应的深度图像发送至显示端,以供所述显示端根据所述目标图像与对应的深度图像生成显示画面。Sending the target image and the corresponding depth image to a display terminal for the display terminal to generate a display picture according to the target image and the corresponding depth image.
  6. 如权利要求1所述的自由视点视频画面拼接方法,其中,所述获取所述视频序列中所述目标时间戳对应的视频帧组的步骤包括:The free-viewpoint video picture splicing method according to claim 1, wherein the step of obtaining the video frame group corresponding to the target time stamp in the video sequence comprises:
    根据所述视频序列确定所述目标时间戳对应的视频帧组,并确定序列头中或者图像头中的排布信息。Determine the video frame group corresponding to the target time stamp according to the video sequence, and determine the arrangement information in the sequence header or the image header.
  7. 如权利要求1所述的自由视点视频画面拼接方法,其中,所述根据所述显示请求获取目标时间戳以及目标视点对应的视点标识的步骤包括:The free-viewpoint video picture splicing method according to claim 1, wherein said step of obtaining the target timestamp and the viewpoint identifier corresponding to the target viewpoint according to the display request comprises:
    确定显示端需要显示真实视点的画面,根据所述显示请求获取对应的目标时间戳以及所述真实视点对应的视点标识;以及Determine that the display terminal needs to display a picture of a real viewpoint, and acquire a corresponding target timestamp and a viewpoint identifier corresponding to the real viewpoint according to the display request; and
    确定显示端需要显示虚拟视点的画面,根据所述显示请求获取对应的目标时间戳以及所述虚拟视点的相邻视点对应的视点标识,其中,至少确定两个相邻视点对应的视点标识。Determine that the display terminal needs to display the picture of the virtual viewpoint, and obtain the corresponding target time stamp and the viewpoint identifiers corresponding to the adjacent viewpoints of the virtual viewpoint according to the display request, wherein at least the viewpoint identifiers corresponding to two adjacent viewpoints are determined.
  8. 一种自由视点视频画面拼接方法，其中，应用于编码端，所述自由视点视频画面拼接方法包括：A free-viewpoint video picture splicing method, applied to an encoding end, wherein the free-viewpoint video picture splicing method comprises:
    获取各个视点对应的图像以及预设排布信息;Obtain the images corresponding to each viewpoint and the preset arrangement information;
    根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧,其中,相同时间戳对应不同视频帧中的图像对应的视点不同;splicing images with the same time stamp into at least two video frames according to the preset arrangement information, wherein the same time stamp corresponds to different viewpoints corresponding to images in different video frames;
    根据时间戳相同的视频帧生成视频帧组,其中,所述视频帧组至少包括两张视频帧;Generate a video frame group according to video frames with the same time stamp, wherein the video frame group includes at least two video frames;
    根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列,并将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流;以及generating a video sequence from the video frame groups corresponding to different time stamps according to the playback order, and inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream; and
    将所述目标视频码流发送至解码端,以供所述解码端解码所述目标视频码流获取对应的视频序列。Sending the target video code stream to a decoding end for the decoding end to decode the target video code stream to obtain a corresponding video sequence.
  9. 如权利要求8所述的自由视点视频画面拼接方法，其中，确定获取到各个视点对应的图像以及对应的深度图像，所述根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧的步骤包括：The free-viewpoint video picture splicing method according to claim 8, wherein when it is determined that the images corresponding to the respective viewpoints and the corresponding depth images are acquired, the step of splicing images with the same timestamp into at least two video frames according to the preset arrangement information comprises:
    根据所述预设排布信息将时间戳相同的图像以及对应的深度图像拼接成至少两个视频帧,其中,所述图像以及对应的深度图像拼接在同一视频帧中。The images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
  10. 如权利要求8所述的自由视点视频画面拼接方法,其中,所述将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流的步骤包括:The free-viewpoint video splicing method according to claim 8, wherein the step of inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream comprises:
    将所述视频序列输入编码器生成原始视频码流;Input the video sequence into an encoder to generate an original video code stream;
    根据所述预设排布信息生成视频帧的排布信息，其中，所述视频帧的排布信息包含所述视频帧中每一图像的视点标识和位置信息，确定所述视频帧中包括深度图像，所述视频帧的排布信息还包含每一深度图像对应的视点标识和位置信息；以及generating arrangement information of the video frames according to the preset arrangement information, wherein the arrangement information of a video frame includes the viewpoint identifier and position information of each image in the video frame, and when it is determined that the video frame includes depth images, the arrangement information of the video frame further includes the viewpoint identifier and position information corresponding to each depth image; and
    将所述视频帧的排布信息添加至所述原始视频码流的序列头或者所述视频帧的图像头中,生成目标视频码流。Adding the arrangement information of the video frame to the sequence header of the original video code stream or the image header of the video frame to generate a target video code stream.
  11. 如权利要求8所述的自由视点视频画面拼接方法,其中,所述根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧的步骤包括:The method for splicing free-viewpoint video frames according to claim 8, wherein the step of splicing images with the same time stamp into at least two video frames according to the preset arrangement information comprises:
    根据所述预设排布信息确定对应图像的宽高以及图像坐标;determining the width, height and image coordinates of the corresponding image according to the preset arrangement information;
    根据所述图像的宽高调整图像大小;以及resizing the image according to the width and height of said image; and
    根据所述图像坐标将时间戳相同的调整后的图像拼接成至少两个视频帧。Stitching the adjusted images with the same time stamp into at least two video frames according to the image coordinates.
  12. 如权利要求8所述的自由视点视频画面拼接方法,其中,所述根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列的步骤包括:The free-viewpoint video picture splicing method according to claim 8, wherein the step of generating a video sequence from the video frame groups corresponding to different time stamps according to the playing order comprises:
    根据播放的先后顺序将不同时间戳对应的视频帧组进行排序;以及Sorting the video frame groups corresponding to different time stamps according to the order of playing; and
    根据排序后的多个所述视频帧组生成视频序列。A video sequence is generated according to the sorted plurality of video frame groups.
  13. 一种终端，其中，所述终端为解码端，所述解码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被所述处理器执行时，实现如权利要求1至7中任一项所述的自由视点视频画面拼接方法的步骤。A terminal, wherein the terminal is a decoding end, the decoding end comprising a memory, a processor, and a free-viewpoint video picture splicing program stored in the memory and executable on the processor; when the free-viewpoint video picture splicing program is executed by the processor, the steps of the free-viewpoint video picture splicing method according to any one of claims 1 to 7 are implemented.
  14. 一种终端，其中，所述终端为编码端，所述编码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被所述处理器执行时，实现如权利要求8至12中任一项所述的自由视点视频画面拼接方法的步骤。A terminal, wherein the terminal is an encoding end, the encoding end comprising a memory, a processor, and a free-viewpoint video picture splicing program stored in the memory and executable on the processor; when the free-viewpoint video picture splicing program is executed by the processor, the steps of the free-viewpoint video picture splicing method according to any one of claims 8 to 12 are implemented.
  15. 一种可读存储介质，其中，所述可读存储介质上存储有自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被处理器执行时实现如权利要求1至12中任一项所述的自由视点画面拼接方法的步骤。A readable storage medium, wherein a free-viewpoint video picture splicing program is stored on the readable storage medium; when the free-viewpoint video picture splicing program is executed by a processor, the steps of the free-viewpoint video picture splicing method according to any one of claims 1 to 12 are implemented.
PCT/CN2021/129039 2021-09-02 2021-11-05 Free viewpoint video screen splicing method, terminal, and readable storage medium WO2023029204A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111041026.7 2021-09-02
CN202111041026.7A CN113905186B (en) 2021-09-02 2021-09-02 Free viewpoint video picture splicing method, terminal and readable storage medium

Publications (1)

Publication Number Publication Date
WO2023029204A1 true WO2023029204A1 (en) 2023-03-09

Family

ID=79188896

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129039 WO2023029204A1 (en) 2021-09-02 2021-11-05 Free viewpoint video screen splicing method, terminal, and readable storage medium

Country Status (2)

Country Link
CN (1) CN113905186B (en)
WO (1) WO2023029204A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117579843B (en) * 2024-01-17 2024-04-02 淘宝(中国)软件有限公司 Video coding processing method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130100114A1 (en) * 2011-10-21 2013-04-25 James D. Lynch Depth Cursor and Depth Measurement in Images
CN110012310A (en) * 2019-03-28 2019-07-12 北京大学深圳研究生院 A kind of decoding method and device based on free view-point
CN111147868A (en) * 2018-11-02 2020-05-12 广州灵派科技有限公司 Free viewpoint video guide system
CN111669567A (en) * 2019-03-07 2020-09-15 阿里巴巴集团控股有限公司 Multi-angle free visual angle video data generation method and device, medium and server
CN111866525A (en) * 2020-09-23 2020-10-30 腾讯科技(深圳)有限公司 Multi-view video playing control method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113905186A (en) 2022-01-07
CN113905186B (en) 2023-03-10

Similar Documents

Publication Publication Date Title
US20190246162A1 (en) Method and apparatus for presenting and controlling panoramic image, and storage medium
CN111937397B (en) Media data processing method and device
US9646406B2 (en) Position searching method and apparatus based on electronic map
US9485493B2 (en) Method and system for displaying multi-viewpoint images and non-transitory computer readable storage medium thereof
CN109040792B (en) Processing method for video redirection, cloud terminal and cloud desktop server
US9961334B2 (en) Simulated 3D image display method and display device
WO2021147702A1 (en) Video processing method and apparatus
WO2010028559A1 (en) Image splicing method and device
US11290752B2 (en) Method and apparatus for providing free viewpoint video
CN107040808B (en) Method and device for processing popup picture in video playing
US20130156263A1 (en) Verification method, verification device, and computer product
US10805620B2 (en) Method and apparatus for deriving composite tracks
KR20150026561A (en) Method for composing image and an electronic device thereof
CN111711859A (en) Video image processing method, system and terminal equipment
US11146799B2 (en) Method and apparatus for decoding video bitstream, method and apparatus for generating video bitstream, storage medium, and electronic device
WO2023279793A1 (en) Video playing method and apparatus
US10089510B2 (en) Display control methods and apparatuses
WO2023029204A1 (en) Free viewpoint video screen splicing method, terminal, and readable storage medium
WO2023029252A1 (en) Multi-viewpoint video data processing method, device, and storage medium
CN111225293B (en) Video data processing method and device and computer storage medium
CN112771878B (en) Method, client and server for processing media data
KR102407986B1 (en) Method and apparatus for providing broadcasting video
CN113099248B (en) Panoramic video filling method, device, equipment and storage medium
CN112153412B (en) Control method and device for switching video images, computer equipment and storage medium
US20200226716A1 (en) Network-based image processing apparatus and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21955733

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE