WO2023029204A1 - Free viewpoint video screen splicing method, terminal, and readable storage medium - Google Patents

Free viewpoint video screen splicing method, terminal, and readable storage medium Download PDF

Info

Publication number
WO2023029204A1
WO2023029204A1 (PCT/CN2021/129039)
Authority
WO
WIPO (PCT)
Prior art keywords
video
viewpoint
target
video frame
image
Prior art date
Application number
PCT/CN2021/129039
Other languages
French (fr)
Chinese (zh)
Inventor
王荣刚
王振宇
高文
Original Assignee
北京大学深圳研究生院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学深圳研究生院 filed Critical 北京大学深圳研究生院
Publication of WO2023029204A1 publication Critical patent/WO2023029204A1/en

Links

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 - Details of television systems
    • H04N5/222 - Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 - Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 - Mixing
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 - Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 - Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281 - Reformatting operations by altering the temporal resolution, e.g. by frame skipping
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 - Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 - Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 - Structuring of content by decomposing the content in the time domain, e.g. in time segments

Definitions

  • The present application relates to the field of free viewpoints, and in particular to a free-viewpoint video picture splicing method, a terminal, and a readable storage medium.
  • Free viewpoint applications allow viewers to watch videos in the form of continuous viewpoints within a certain range.
  • The viewer can set the position and angle of the viewpoint and is no longer limited to a fixed camera angle of view.
  • Such applications often require multiple cameras shooting simultaneously to generate videos from multiple viewpoints at the same moment.
  • In some free-viewpoint applications, it is also necessary to generate depth maps corresponding to the videos from the multiple viewpoints.
  • The main purpose of this application is to provide a free-viewpoint video picture splicing method that splices the video pictures corresponding to different viewpoints at the same moment into multiple video frames and sends them to the decoding end.
  • The decoding end intercepts and displays the image corresponding to the current viewpoint from the received video frames, so that fewer pictures are spliced into each video frame, thereby improving the resolution and solving the problem of low video resolution.
  • the present application provides a method for splicing free-viewpoint video images, the method for splicing free-viewpoint video images includes the following steps:
  • each video frame group includes at least two video frames with the same time stamp
  • the step of intercepting the target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint includes:
  • according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, determining the target video frame in which the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
  • after the step of acquiring the video frame group corresponding to the target timestamp in the video sequence, the method further includes:
  • the step of acquiring the video frame group corresponding to the target time stamp in the video sequence includes:
  • the present application also provides a free-viewpoint video splicing method, which is applied to the encoding end, and the transmission method of the free-viewpoint video includes:
  • the step of splicing images with the same time stamp into at least two video frames according to the preset arrangement information includes:
  • the images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
  • the step of inputting the video sequence and the arrangement information into an encoder to generate a target video stream includes:
  • the arrangement information of the video frame is generated according to the preset arrangement information, where the arrangement information of the video frame includes the viewpoint identifier and position information of each image in the video frame; if it is determined that the video frame includes depth images, the arrangement information of the video frame also includes the viewpoint identifier and position information corresponding to each depth image;
  • the present application also provides a terminal, where the terminal is a decoding end, and the decoding end includes a memory, a processor, and a free-viewpoint video picture stitching program stored in the memory and executable on the processor;
  • when the free-viewpoint video picture stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video picture stitching method are implemented.
  • the present application also provides a terminal, where the terminal is an encoding end, and the encoding end includes a memory, a processor, and a free-viewpoint video picture stitching program stored in the memory and executable on the processor;
  • when the free-viewpoint video picture stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video picture stitching method are implemented.
  • the present application also provides a readable storage medium on which a free-viewpoint video picture stitching program is stored; when the free-viewpoint video picture stitching program is executed by a processor, the steps of the free-viewpoint video picture stitching method described in any one of the above are implemented.
  • In the technical solution of this application, the display request sent by the display terminal is received, and the target timestamp and the viewpoint identifier corresponding to the target viewpoint are obtained according to the display request;
  • the video code stream sent by the encoding end is received and decoded by a decoder to obtain a video sequence; the video frame group corresponding to the target timestamp in the video sequence is acquired, where each video frame group includes at least two video frames with the same timestamp;
  • the target image is intercepted according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, and the target image is sent to the display terminal for the display terminal to generate a display picture according to it.
  • In this way, this application splices the video pictures corresponding to different viewpoints at the same moment into multiple video frames and sends the generated video frames to the decoding end; the decoding end receives the video frames and, according to the arrangement information of the video frames and the viewpoint identifier corresponding to the current viewpoint, intercepts and displays the image corresponding to the current viewpoint from the video frames, thereby reducing the number of pictures spliced into a single video frame and achieving the purpose of increasing the resolution.
  • Fig. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application;
  • Fig. 2 is a schematic flow chart of an embodiment of the free viewpoint video picture splicing method of the present application
  • FIG. 3 is a schematic flow diagram of an embodiment of the method for splicing free viewpoint video images according to the present application
  • Fig. 4 is the first example diagram of the spliced image of an embodiment of the free viewpoint video picture splicing method of the present invention
  • Fig. 5 is a second example diagram of a spliced image according to an embodiment of the free-viewpoint video frame splicing method of the present invention.
  • Fig. 6 is a third example diagram of a spliced image according to an embodiment of the free-viewpoint video frame splicing method of the present invention.
  • FIG. 1 is a schematic diagram of a hardware operating environment of a terminal involved in the solution of the embodiment of the present application.
  • the terminal may include: a processor 1001 , such as a CPU, a network interface 1004 , a user interface 1003 , a memory 1005 , and a communication bus 1002 .
  • the communication bus 1002 is used to realize connection and communication between these components.
  • the user interface 1003 may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • Optionally, the network interface 1004 may include a standard wired interface and a wireless interface; the memory 1005 may be a non-volatile memory, such as a disk memory.
  • the memory 1005 may also be a storage device independent of the foregoing processor 1001 .
  • The structure of the terminal shown in FIG. 1 does not constitute a limitation on the terminal; it may include more or fewer components than shown, combine some components, or use a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a free-viewpoint video image splicing program.
  • the network interface 1004 is mainly used to connect to the background server and perform data communication with the background server;
  • the user interface 1003 is mainly used to connect to the client (client) and perform data communication with the client;
  • the processor 1001 can be used to call the control program of the decoding end stored in the memory 1005, and perform the following operations:
  • each video frame group includes at least two video frames with the same time stamp
  • processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
  • the viewpoint identifier corresponding to the target viewpoint determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
  • processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
  • processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
  • processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
  • processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
  • the images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
  • processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
  • the arrangement information of the video frame is generated according to the preset arrangement information, where the arrangement information of the video frame includes the viewpoint identifier and position information of each image in the video frame; if it is determined that the video frame includes depth images, the arrangement information of the video frame also includes the viewpoint identifier and position information corresponding to each depth image;
  • the method for mosaicing free-viewpoint video images is applied at the decoding end, including the following steps:
  • Step S11 receiving the display request sent by the display terminal, and obtaining the target timestamp and the viewpoint identifier corresponding to the target viewpoint according to the display request;
  • In this embodiment, the display request sent by the display terminal is received, and the time point of the picture required by the display terminal and the viewpoint identifier of the corresponding viewpoint can be obtained according to the display request. If it is determined that the display terminal needs to display the picture of a real viewpoint, the target timestamp corresponding to the picture and the viewpoint identifier corresponding to that real viewpoint are obtained according to the display request; if it is determined that the display terminal needs to display the picture of a virtual viewpoint,
  • the target timestamp corresponding to the picture and the viewpoint identifiers corresponding to the viewpoints adjacent to the virtual viewpoint are obtained according to the display request, where the viewpoint identifiers of at least two adjacent viewpoints are determined.
  • Step S12 receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence
  • the video code stream sent by the encoder is received, and the received video code stream is decoded by a decoder to obtain video sequence and arrangement information.
  • the video sequence is composed of video frame groups corresponding to different time stamps, and the arrangement information is in the sequence header of the video sequence or in the image header of the video frame.
  • Step S13 acquiring a video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
  • the video sequence is composed of video frame groups corresponding to different time stamps.
  • the target time stamp corresponding to the picture to be displayed on the display terminal can be obtained, and the video sequence is searched according to the target time stamp.
  • the video frames in the video frame group whose time stamp is the same as the target time stamp are spliced from video frames captured by cameras of various viewpoints at the time point corresponding to the target time stamp.
  • Each video frame group includes at least two video frames with the same time stamp.
  • the video frames may be spliced from pictures corresponding to multiple viewpoints, or may only have a picture corresponding to one viewpoint.
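  • As a rough, non-authoritative sketch of this lookup step (assuming decoded frames carry a timestamp field; the patent does not prescribe an implementation), the frame group could be selected as follows:

```python
# Sketch only: select the frame group for a timestamp from the decoded sequence.
# The `timestamp` attribute is an assumed field name, not taken from the patent.
def get_frame_group(video_sequence, target_timestamp):
    group = [frame for frame in video_sequence if frame.timestamp == target_timestamp]
    if len(group) < 2:
        raise ValueError("a video frame group should contain at least two same-timestamp frames")
    return group
```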
  • Step S14 intercepting the target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
  • In this embodiment, the arrangement information of the video frame group includes the arrangement information of each video frame in the group, namely the viewpoint identifiers and, for each viewpoint image, the coordinates of the image in the video frame and the corresponding width and height of the image.
  • From these coordinates and the width and height of the viewpoint image, the position and size of the target image in the target video frame are determined, and the target image is intercepted.
  • Step S15 sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
  • In this embodiment, the target image intercepted from the target video frame is sent to the display terminal; after receiving the target image, the display terminal generates a display picture according to the target image and displays it.
  • the display request sent by the display terminal is received, and the target timestamp and the viewpoint identifier corresponding to the target viewpoint are obtained according to the display request;
  • the video code stream sent by the encoding end is received and decoded by a decoder to obtain a video sequence; the video frame group corresponding to the target timestamp in the video sequence is acquired, where each video frame group includes at least two video frames with the same timestamp; the target image is intercepted according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint; and the target image is sent to the display terminal for the display terminal to generate a display picture according to it.
  • In this way, this application splices the video pictures corresponding to different viewpoints at the same moment into multiple video frames and sends the generated video frames to the decoding end; the decoding end receives the video frames and, according to the arrangement information of the video frames and the viewpoint identifier corresponding to the current viewpoint, intercepts and displays the image corresponding to the current viewpoint from the video frames, thereby reducing the number of pictures spliced into a single video frame and achieving the purpose of increasing the resolution.
  • the step S14 includes:
  • the viewpoint identifier corresponding to the target viewpoint determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
  • the arrangement information of the video frame group includes the viewpoint identifier corresponding to each viewpoint, the coordinates of the image corresponding to the viewpoint in the video frame, and the width and height of the image corresponding to the viewpoint.
  • search the arrangement information of the video frame group for the viewpoint identifier matching the viewpoint identifier corresponding to the target viewpoint, and determine the target video frame in which the target image corresponding to the target viewpoint is located and the position information of the target image in that frame, that is, the coordinates of the target image in the target video frame and the corresponding width and height of the target image.
  • Intercepting the target image according to the position information works, for example, as follows: after the position information of the target image corresponding to the target viewpoint is acquired, the coordinates of the upper-left pixel of the target image in the target video frame are found according to the coordinates in the position information; then the splicing area of the target image in the target video frame is determined according to the width and height corresponding to the target image, and the image in that splicing area, which is the target image, is intercepted.
  • the video frame where the target image corresponding to the target viewpoint is located and the splicing area in the video frame are determined through the viewpoint identification corresponding to the target viewpoint and the arrangement information of the video frame group, so as to accurately and quickly intercept the target image and send it to display side.
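  • As a minimal sketch of this interception step (the dictionary keys view_id, frame_index, x, y, w, h are illustrative names, not taken from the patent; frames are assumed to be HxWx3 NumPy arrays), the crop could be expressed as:

```python
# Sketch only: find the arrangement entry for the target viewpoint and crop the
# splicing area whose upper-left corner is (x, y) and whose size is w x h.
def intercept_target_image(frames, arrangement, target_view_id):
    for entry in arrangement:
        if entry["view_id"] == target_view_id:
            frame = frames[entry["frame_index"]]
            return frame[entry["y"]:entry["y"] + entry["h"],
                         entry["x"]:entry["x"] + entry["w"]].copy()
    raise KeyError(f"no arrangement entry for viewpoint {target_view_id}")
```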
  • In an embodiment, if it is determined that the display terminal requests the depth image to be sent, after step S13 the method further includes:
  • the decoding end needs to intercept and send the video picture and the depth image corresponding to the target viewpoint in the display request, so that the display terminal can generate and display the display picture according to the video picture and the depth image corresponding to the target viewpoint.
  • For a virtual viewpoint, the display terminal needs to determine the viewpoint identifiers corresponding to the adjacent viewpoints and send a display request to the decoding end; the decoding end obtains the viewpoint identifiers of the adjacent viewpoints according to the received display request, and determines, from these viewpoint identifiers and the arrangement information of the video frame group, the target video frames in which the target images corresponding to each adjacent viewpoint are located.
  • Since the image and the depth image corresponding to each viewpoint are spliced in the same video frame, the target image corresponding to each adjacent viewpoint and its corresponding depth image can be intercepted from the target video frame; the intercepted target images and corresponding depth images are sent to the display terminal, so that the display terminal can generate the display picture from the image and depth image corresponding to each adjacent viewpoint.
  • the step S13 includes:
  • the video frame where the target image corresponding to the target viewpoint is located can be located and the splicing area of the target image in the video frame can be determined, that is, the image in the splicing area is the target image corresponding to the target viewpoint.
  • the arrangement information can be numbered and inserted into the sequence header, and the video frames whose spliced images use that arrangement information can refer to the corresponding number, so that the arrangement information corresponding to a video frame can be found according to the number. In this way, by storing the arrangement information shared by multiple video frames in the sequence header of the video sequence, the amount of data that the decoding end needs to receive is reduced.
  • the transmission method of the free-viewpoint video is applied to the encoding end, including the following steps:
  • Step S21 acquiring images corresponding to each viewpoint and preset arrangement information
  • multiple cameras capture images corresponding to multiple viewpoints, wherein one camera can capture an image corresponding to one viewpoint, or one camera can capture an image corresponding to one viewpoint and a corresponding depth image.
  • Multiple cameras send images captured at the same time to the encoder.
  • the encoding end generates the preset arrangement information according to a preset arrangement method; each record in the preset arrangement information describes the related information of one viewpoint image or depth image, where x and y are the coordinates of the upper-left pixel of the image in the video frame, w and h are the width and height of the image, and view_id is the viewpoint identifier.
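  • One way to model such a record is sketched below; x, y, w, h and view_id are named in the text, while the class name and the frame_index field (which frame of the group the image is spliced into) are illustrative assumptions:

```python
from dataclasses import dataclass

# Minimal sketch of one record of the preset arrangement information.
@dataclass
class ArrangementEntry:
    view_id: int      # viewpoint identifier
    frame_index: int  # index of the video frame within the frame group (assumed field)
    x: int            # column of the upper-left pixel of the image in the video frame
    y: int            # row of the upper-left pixel of the image in the video frame
    w: int            # image width in the video frame
    h: int            # image height in the video frame
```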
  • Step S22 splicing images with the same time stamp into at least two video frames according to the preset arrangement information, wherein the same time stamp corresponds to different viewpoints corresponding to images in different video frames;
  • In this embodiment, the received images sent by the cameras are spliced into video frames according to the preset arrangement information; that is, each viewpoint image or depth image is resized according to the width and height given in the preset arrangement information,
  • and the resized image is stitched into the corresponding video frame at the given coordinates.
  • The timestamps of the images spliced into the same video frame are the same, and a video frame group includes at least two video frames. For example, if 27 cameras are deployed for shooting and the images captured by nine cameras are stitched into one video frame, there will be three video frames with the same timestamp.
  • Each video frame stitches the images corresponding to nine viewpoints, as shown in Fig. 4, where P1, P2...P9 are the images captured by nine cameras.
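  • A rough sketch of this splicing step, under the same assumptions as the ArrangementEntry record above (frame dimensions and the image-to-frame assignment are illustrative, not taken from the patent):

```python
import numpy as np

# Sketch only: copy same-timestamp viewpoint images into their assigned regions of
# the frames in one video frame group. `images` maps view_id to an HxWx3 array
# already scaled to (entry.h, entry.w).
def splice_frame_group(images, arrangement, frame_w, frame_h, num_frames):
    frames = [np.zeros((frame_h, frame_w, 3), dtype=np.uint8) for _ in range(num_frames)]
    for entry in arrangement:
        img = images[entry.view_id]
        frames[entry.frame_index][entry.y:entry.y + entry.h,
                                  entry.x:entry.x + entry.w] = img
    return frames

# e.g. 27 viewpoint images, 9 per frame, would yield a group of 3 frames sharing one timestamp
```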
  • Step S23 generating a video frame group according to video frames with the same time stamp, wherein the video frame group includes at least two video frames;
  • the images captured by the cameras are spliced into video frames, and the video frames with the same timestamp form a video frame group, where the video frame group includes at least two video frames with the same timestamp.
  • Step S24 generating a video sequence from the video frame groups corresponding to different time stamps according to the playback sequence, and inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream.
  • the video frame groups corresponding to different time stamps are sorted according to the playing sequence, a video sequence is generated according to the sorted multiple video frame groups, and the video sequence and the corresponding arrangement information Input the encoder to generate the target video stream.
  • Step S25: sending the target video code stream to a decoding end, so that the decoding end decodes the target video code stream to obtain the corresponding video sequence.
  • the video code stream is sent to the decoding end, so that the decoding end decodes the video code stream through a decoder to obtain the corresponding video sequence, and, according to the display
  • request sent by the display terminal, searches the video sequence for the image of the target viewpoint required by the display request and intercepts it.
  • In the technical solution of this application, the images and arrangement information corresponding to each viewpoint are obtained; according to the arrangement information, images with the same timestamp are spliced into at least two video frames, where for the same timestamp the images in different video frames correspond to different viewpoints;
  • a video frame group is generated from the video frames with the same timestamp, where the video frame group includes at least two video frames; the video frame groups corresponding to different timestamps are assembled into a video sequence according to the playback order, and the video sequence and the arrangement information are input into an encoder to generate a target video code stream. The target video code stream is sent to the decoding end for the decoding end to decode it and obtain the corresponding video sequence. In this way, by splicing the images corresponding to multiple viewpoints at the same moment into multiple video frames, the number of images spliced into one video frame is reduced, and the purpose of improving the resolution is achieved.
  • step S22 it is determined that the image corresponding to each viewpoint and the corresponding depth image are acquired, and the step S22 includes:
  • the images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
  • In this embodiment, the encoding end stitches the images with the same timestamp and the corresponding depth images according to the preset arrangement information, where the preset
  • arrangement information includes the coordinates of the upper-left pixel of each image or depth image in the video frame, the width and height of the image or depth image in the video frame, the corresponding viewpoint identifier, and the image category. The images and depth images with the same timestamp are stitched into at least two video frames, with each image and its corresponding depth image stitched into the same video frame, as shown in Figure 5 and Figure 6, where P1, P2, P3... are the images corresponding to each viewpoint,
  • and D1, D2, D3...D9 and D21, D22, D23...D30 are the depth images of the images corresponding to each viewpoint, so that the images and depth images corresponding to the viewpoints adjacent to a virtual viewpoint can be found to generate an image corresponding to the virtual viewpoint.
  • the step of inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream includes:
  • the arrangement information of the video frame is generated according to the preset arrangement information, where the arrangement information of the video frame includes the viewpoint identifier and position information of each image in the video frame; if it is determined that the video frame includes depth images, the arrangement information of the video frame also includes the viewpoint identifier and position information corresponding to each depth image;
  • video frames with the same time stamp are combined into a video frame group, and each video frame group is sorted according to the order of the time stamps corresponding to the video frame group, and finally a video sequence is generated.
  • the viewpoint identifier and position information corresponding to each image are used by the decoding end to find the splicing area of the corresponding image according to the viewpoint identifier, so as to intercept the image. If it is determined that depth images are spliced in the video frame, the arrangement information also includes the viewpoint identifier and position information corresponding to each depth image.
  • The arrangement information of the video frames may be added to the sequence header of the original video code stream or to the image header of each video frame. If it is determined that multiple video frames in the video sequence have the same arrangement information, the shared arrangement information can be numbered and added to the sequence header, and the corresponding number can be added to the corresponding video frames, so that the decoding end can identify the arrangement information of each video frame. In this way, when the arrangement information of multiple video frames in the video sequence is the same, it is stored only once; on receiving the video sequence, the decoding end reads the arrangement information in the sequence header and the number carried by each video frame, reducing the amount of data that the decoding end needs to receive.
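  • A minimal sketch of this deduplication idea (illustrative only; it does not reflect the bitstream syntax of any standard, and it reuses the ArrangementEntry fields assumed above):

```python
# Sketch only: store each distinct arrangement once in the sequence header and let
# every video frame carry only the number of the arrangement it uses.
def build_sequence_header(per_frame_arrangements):
    header = {}   # number -> shared arrangement information
    numbers = []  # one number per video frame
    seen = {}
    for arrangement in per_frame_arrangements:
        key = tuple((e.view_id, e.frame_index, e.x, e.y, e.w, e.h) for e in arrangement)
        if key not in seen:
            seen[key] = len(header)
            header[seen[key]] = arrangement
        numbers.append(seen[key])
    return header, numbers
```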
  • the present application also provides a terminal, the terminal is a decoding end, and the decoding end includes a memory, a processor, and a free-viewpoint video image stored in the memory and operable on the processor
  • a stitching program when the free-viewpoint video frame stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video frame stitching method are implemented.
  • the present application also provides a terminal, the terminal is an encoding end, and the encoding end includes a memory, a processor, and a free-viewpoint video image stored in the memory and operable on the processor
  • a stitching program when the free-viewpoint video frame stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video frame stitching method are implemented.
  • the present application also provides a readable storage medium on which a free-viewpoint video picture stitching program is stored; when the free-viewpoint video picture stitching program is executed by a processor, the steps of the free-viewpoint video picture stitching method described in any one of the above are implemented.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • In essence, or for the part that contributes to the prior art, the technical solution of the present application can be embodied in the form of a software product; the computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that enable a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A free viewpoint video screen splicing method, a terminal, and a readable storage medium. The free viewpoint video screen splicing method comprises the following steps: receiving a display request, and acquiring, according to the display request, a target timestamp and a viewpoint identifier corresponding to a target viewpoint (S11); receiving a video code stream, and decoding the video code stream by means of a decoder to obtain a video sequence (S12); acquiring a video frame group corresponding to the target timestamp in the video sequence (S13); intercepting a target image according to arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint (S14); and sending the target image to a display end (S15).

Description

自由视点视频画面拼接方法、终端及可读存储介质Free-viewpoint video screen splicing method, terminal and readable storage medium
本申请要求于2021年9月2日提交中国专利局、申请号为202111041026.7、发明名称为“自由视点视频画面拼接方法、终端及可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在申请中。This application claims the priority of the Chinese patent application submitted to the China Patent Office on September 2, 2021, with the application number 202111041026.7, and the title of the invention is "Free Viewpoint Video Screen Splicing Method, Terminal and Readable Storage Medium", the entire content of which is passed References are incorporated in the application.
技术领域technical field
本申请涉及自由视点领域,尤其涉及一种自由视点视频画面拼接方法、终端及可读存储介质。The present application relates to the field of free viewpoints, in particular to a method for mosaicing free viewpoint video images, a terminal and a readable storage medium.
背景技术Background technique
自由视点应用允许观看者在一定范围内以连续视点的形式观看视频。观看者可以设定视点的位置、角度,而不再局限于一个固定的摄像机视角。该应用往往需要多个摄像机同时拍摄,同时生成多个视点的视频,一些自由视点应用中,还需要生成多个视点的视频对应的深度图。Free viewpoint applications allow viewers to watch videos in the form of continuous viewpoints within a certain range. The viewer can set the position and angle of the viewpoint, and is no longer limited to a fixed camera angle of view. This application often requires multiple cameras to shoot at the same time and generate videos from multiple viewpoints at the same time. In some free viewpoint applications, it is also necessary to generate depth maps corresponding to videos from multiple viewpoints.
技术问题technical problem
传统自由视点应用往往使用空域拼接的方式。对于空域拼接的方式,由于编码以及终端解码播放设备支持的编解码计算能力有限,最大编解码分辨率受到限制,因此单路视频的分辨率以及支持传输的视点数之间面临严重的冲突,导致单路视频的分辨率低。Traditional free-viewpoint applications often use spatial stitching. For the way of spatial splicing, due to the limited encoding and decoding computing power supported by terminal decoding and playback equipment, the maximum encoding and decoding resolution is limited, so there is a serious conflict between the resolution of single-channel video and the number of viewpoints supported for transmission, resulting in The resolution of the single-channel video is low.
上述内容仅用于辅助理解本申请的技术方案,并不代表承认上述内容是现有技术。The above content is only used to assist in understanding the technical solution of the present application, and does not mean that the above content is admitted as prior art.
技术解决方案technical solution
本申请的主要目的在于提供一种自由视点视频画面拼接方法,旨在通过将同一时刻不同视点对应的视频画面拼接生成多个视频帧并发送至解码端,解码端接收视频帧并从所述视频帧中截取显示当前视点对应的图像,以减少每个视频帧拼接的画面,从而提高分辨率,解决视频分辨率低的问题。The main purpose of this application is to provide a method for mosaicing video frames from different viewpoints at the same time, aiming to generate multiple video frames by splicing video frames corresponding to different viewpoints at the same time and sending them to the decoding end. The image corresponding to the current viewpoint is intercepted and displayed in the frame to reduce the stitching of each video frame, thereby improving the resolution and solving the problem of low video resolution.
为了实现上述目的,本申请提供一种自由视点视频画面拼接方法,所述自由视点视频画面拼接方法包括以下步骤:In order to achieve the above object, the present application provides a method for splicing free-viewpoint video images, the method for splicing free-viewpoint video images includes the following steps:
接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;receiving a display request sent by the display terminal, and obtaining a target timestamp and a viewpoint identifier corresponding to the target viewpoint according to the display request;
接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence;
获取所述视频序列中所述的目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;Obtain the video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;Intercepting a target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。Sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
进一步地,所述根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像的步骤包括:Further, the step of intercepting the target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint includes:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识,确定所述目标视点对应的目标图像所在的目标视频帧以及所述目标视频帧中所述目标图像的位置信息;According to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
根据所述位置信息在所述目标视频帧中截取所述目标视点对应的目标图像。Intercepting a target image corresponding to the target viewpoint in the target video frame according to the position information.
进一步地,确定显示端请求发送深度图像,所述获取所述视频序列中所述目标时间戳对应的视频帧组的步骤之后,还包括:Further, after determining that the display terminal requests to send a depth image, after the step of acquiring the video frame group corresponding to the target time stamp in the video sequence, it further includes:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像与对应的深度图像;Intercepting a target image and a corresponding depth image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像与对应的深度图像发送至显示端,以供所述显示端根据所述目标图像与对应的深度图像生成显示画面。Sending the target image and the corresponding depth image to a display terminal for the display terminal to generate a display picture according to the target image and the corresponding depth image.
进一步地,所述获取所述视频序列中所述目标时间戳对应的视频帧组的步骤包括:Further, the step of acquiring the video frame group corresponding to the target time stamp in the video sequence includes:
根据所述视频序列确定所述目标时间戳对应的视频帧组,并确定序列头或者图像头中的排布信息。Determine the video frame group corresponding to the target time stamp according to the video sequence, and determine the arrangement information in the sequence header or image header.
此外,为了实现上述目的,本申请还提供一种自由视点视频画面拼接方法,应用于编码端,所述自由视点视频的传输方法包括:In addition, in order to achieve the above purpose, the present application also provides a free-viewpoint video splicing method, which is applied to the encoding end, and the transmission method of the free-viewpoint video includes:
获取各个视点对应的图像以及预设排布信息;Obtain the images corresponding to each viewpoint and the preset arrangement information;
根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧,其中,相同时间戳对应不同视频帧中的图像对应的视点不同;splicing images with the same time stamp into at least two video frames according to the preset arrangement information, wherein the same time stamp corresponds to different viewpoints corresponding to images in different video frames;
根据时间戳相同的视频帧生成视频帧组,其中,所述视频帧组至少包括两张视频帧;Generate a video frame group according to video frames with the same time stamp, wherein the video frame group includes at least two video frames;
根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列,并将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流;Generate a video sequence from the video frame groups corresponding to different time stamps according to the playback order, and input the video sequence and the preset arrangement information into an encoder to generate a target video stream;
将所述目标视频码流发送至解码端,以供所述解码端解码所述目标视频码流获取对应的视频序列。Sending the target video code stream to a decoding end for the decoding end to decode the target video code stream to obtain a corresponding video sequence.
进一步地,确定获取到各个视点对应的图像以及对应的深度图像,所述根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧的步骤包括:Further, it is determined that the image corresponding to each viewpoint and the corresponding depth image are acquired, and the step of splicing images with the same time stamp into at least two video frames according to the preset arrangement information includes:
根据所述预设排布信息将时间戳相同的图像以及对应的深度图像拼接成至少两个视频帧,其中,所述图像以及对应的深度图像拼接在同一视频帧中。The images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
进一步地,所述将所述视频序列以及所述排布信息输入编码器,生成目标视频码流的步骤包括:Further, the step of inputting the video sequence and the arrangement information into an encoder to generate a target video stream includes:
将所述视频序列输入编码器生成原始视频码流;Input the video sequence into an encoder to generate an original video code stream;
根据所述预设排布信息生成视频帧的排布信息,其中,所述视频帧的排布信息包含所述视频帧中每一图像的视点标识和位置信息,确定所述视频帧中包括深度图像,所述视频帧的排布信息还包含每一深度图像对应的视点标识和位置信息;According to the preset arrangement information, the arrangement information of the video frame is generated, wherein the arrangement information of the video frame includes the viewpoint identification and position information of each image in the video frame, and it is determined that the depth is included in the video frame image, the arrangement information of the video frame also includes the viewpoint identification and position information corresponding to each depth image;
将所述视频帧的排布信息添加至所述原始视频码流的序列头或者所述视频帧的图像头中,生成目标视频码流。Adding the arrangement information of the video frame to the sequence header of the original video code stream or the image header of the video frame to generate a target video code stream.
为了实现上述目的,本申请还提供一种终端,所述终端为解码端,所述解码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序,所述自由视点视频画面拼接程序被所述处理器执行时,实现如上所述的自由视点视频画面拼接方法的步骤。In order to achieve the above object, the present application also provides a terminal, the terminal is a decoding end, and the decoding end includes a memory, a processor, and a free-viewpoint video image stored in the memory and operable on the processor A stitching program, when the free-viewpoint video frame stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video frame stitching method are implemented.
为了实现上述目的,本申请还提供一种终端,所述终端为编码端,所述编码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序,所述自由视点视频画面拼接程序被所述处理器执行时,实现如上所述的自由视点视频画面拼接方法的步骤。In order to achieve the above object, the present application also provides a terminal, the terminal is an encoding end, and the encoding end includes a memory, a processor, and a free-viewpoint video image stored in the memory and operable on the processor A stitching program, when the free-viewpoint video frame stitching program is executed by the processor, the steps of the above-mentioned free-viewpoint video frame stitching method are implemented.
为了实现上述目的,本申请还提供一种可读存储介质,所述可读存储介质上存储有自由视点视频画面拼接程序,所述自由视点视频画面拼接程序被处理器执行时实现如上所述任一项所述的自由视点画面拼接方法的步骤。In order to achieve the above purpose, the present application also provides a readable storage medium, on which a free-viewpoint video picture stitching program is stored, and when the free-viewpoint video picture stitching program is executed by a processor, any of the above-mentioned A step of the free-viewpoint picture stitching method described in one item.
有益效果Beneficial effect
本申请的技术方案中,接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张的视频帧;根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。如此,本申请通过将同一时刻不同视点对应的视频画面拼接生成多个视频帧,并将生成的视频帧发送至解码端,解码端接收视频帧,并根据视频帧的排布信息以及当前视点对应的视点标识从所述视频帧中截取显示当前视点对应的图像,从而减少一个视频帧拼接的视频画面,以达到提高分辨率的目的。In the technical solution of the present application, the display request sent by the display terminal is received, and the target timestamp and the viewpoint identifier corresponding to the target viewpoint are obtained according to the display request; the video code stream sent by the encoding terminal is received, and the video code stream is decoded by a decoder , acquire a video sequence; acquire the video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp; according to the video frame group The layout information of the target viewpoint and the viewpoint identifier corresponding to the target viewpoint intercept the target image; and send the target image to the display terminal for the display terminal to generate a display screen according to the target image. In this way, this application generates multiple video frames by splicing video images corresponding to different viewpoints at the same time, and sends the generated video frames to the decoding end, and the decoding end receives the video frames, and according to the arrangement information of the video frames and the current viewpoint correspondence The view point identifier intercepts and displays the image corresponding to the current view point from the video frame, thereby reducing the number of spliced video frames by one video frame, so as to achieve the purpose of increasing the resolution.
附图说明Description of drawings
图1是本申请实施例方案涉及的硬件运行环境的装置结构示意图;Fig. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application;
图2是本申请自由视点视频画面拼接方法一实施例的流程示意图;Fig. 2 is a schematic flow chart of an embodiment of the free viewpoint video picture splicing method of the present application;
图3是本申请自由视点视频画面拼接方法一实施例的流程示意图;FIG. 3 is a schematic flow diagram of an embodiment of the method for splicing free viewpoint video images according to the present application;
图4是本发明自由视点视频画面拼接方法一实施例的拼接图像的第一实例图;Fig. 4 is the first example diagram of the spliced image of an embodiment of the free viewpoint video picture splicing method of the present invention;
图5是本发明自由视点视频画面拼接方法一实施例的拼接图像的第二实例图;Fig. 5 is a second example diagram of a spliced image according to an embodiment of the free-viewpoint video frame splicing method of the present invention;
图6是本发明自由视点视频画面拼接方法一实施例的拼接图像的第三实例图;Fig. 6 is a third example diagram of a spliced image according to an embodiment of the free-viewpoint video frame splicing method of the present invention;
本申请目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。The realization, functional features and advantages of the present application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.
本发明的实施方式Embodiments of the present invention
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application.
本申请的主要技术方案是:The main technical scheme of the application is:
接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;receiving a display request sent by the display terminal, and obtaining a target timestamp and a viewpoint identifier corresponding to the target viewpoint according to the display request;
接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence;
获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;Acquiring a video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;Intercepting a target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。Sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
在相关技术中,由于编码以及终端解码播放设备支持的编解码计算能力有限,最大编解码分辨率受到限制,因此单路视频的分辨率以及支持传输的视点数之间面临严重的冲突,导致单路视频的分辨率低。In related technologies, due to the limited codec computing power supported by encoding and terminal decoding and playback devices, the maximum codec resolution is limited, so there is a serious conflict between the resolution of a single video and the number of viewpoints supported for transmission, resulting in a single The resolution of the road video is low.
本申请的技术方案中,接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。如此,本申请通过将同一时刻不同视点对应的视频画面拼接生成多个视频帧,并将生成的视频帧发送至解码端,解码端接收视频帧,并根据视频帧的排布信息以及当前视点对应的视点标识从所述视频帧中截取显示当前视点对应的图像,从而减少一个视频帧拼接的视频画面,以达到提高分辨率的目的。In the technical solution of the present application, the display request sent by the display terminal is received, and the target timestamp and the viewpoint identifier corresponding to the target viewpoint are obtained according to the display request; the video code stream sent by the encoding terminal is received, and the video code stream is decoded by a decoder , acquire a video sequence; acquire the video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp; according to the video frame group The layout information and the viewpoint identifier corresponding to the target viewpoint intercept a target image; and send the target image to a display terminal for the display terminal to generate a display screen according to the target image. In this way, this application generates multiple video frames by splicing video images corresponding to different viewpoints at the same time, and sends the generated video frames to the decoding end, and the decoding end receives the video frames, and according to the arrangement information of the video frames and the current viewpoint correspondence The view point identifier intercepts and displays the image corresponding to the current view point from the video frame, thereby reducing the number of spliced video frames by one video frame, so as to achieve the purpose of increasing the resolution.
如图1所示,图1是本申请实施例方案涉及的终端的硬件运行环境示意图。As shown in FIG. 1 , FIG. 1 is a schematic diagram of a hardware operating environment of a terminal involved in the solution of the embodiment of the present application.
如图1所示,该终端可以包括:处理器1001,例如CPU,网络接口1004,用户接口1003,存储器1005,通信总线1002。其中,通信总线1002用于实现这些组件之间的连接通信。用户接口1003可以包括显示屏(Display)、输入单元比如键盘(Keyboard),可选用户接口1003还可以包括标准的有线接口、无线接口。网络接口1004可选的可以包括标准的有线接口、无线接口(如存储器(non-volatilememory),例如磁盘存储器。存储器1005可选的还可以是独立于前述处理器1001的存储装置。As shown in FIG. 1 , the terminal may include: a processor 1001 , such as a CPU, a network interface 1004 , a user interface 1003 , a memory 1005 , and a communication bus 1002 . Wherein, the communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface. Optionally, the network interface 1004 may include standard wired interfaces and wireless interfaces (such as non-volatile memory), such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the foregoing processor 1001 .
本领域技术人员可以理解,图1中示出的终端的结构并不构成对终端的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。Those skilled in the art can understand that the structure of the terminal shown in FIG. 1 does not constitute a limitation on the terminal, and may include more or less components than those shown in the figure, or combine some components, or arrange different components.
如图1所示,作为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及自由视点视频画面拼接程序。As shown in FIG. 1 , the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a free-viewpoint video image splicing program.
在图1所示的终端中,网络接口1004主要用于连接后台服务器,与后台服务器进行数据通信;用户接口1003主要用于连接客户端(用户端),与客户端进行数据通信;而处理器1001可以用于调用存储器1005中存储的解码端的控制程序,并执行以下操作:In the terminal shown in Figure 1, the network interface 1004 is mainly used to connect to the background server and perform data communication with the background server; the user interface 1003 is mainly used to connect to the client (client) and perform data communication with the client; and the processor 1001 can be used to call the control program of the decoding end stored in the memory 1005, and perform the following operations:
接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;receiving a display request sent by the display terminal, and obtaining a target timestamp and a viewpoint identifier corresponding to the target viewpoint according to the display request;
接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence;
获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;Acquiring a video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;Intercepting a target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。Sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
进一步地,处理器1001可以调用存储器1005中存储的解码端的控制程序,还执行以下操作:Further, the processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识,确定所述目标视点对应的目标图像所在的目标视频帧以及所述目标视频帧中所述目标图像的位置信息;According to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
根据所述位置信息在所述目标视频帧中截取所述目标视点对应的目标图像。Intercepting a target image corresponding to the target viewpoint in the target video frame according to the position information.
进一步地,处理器1001可以调用存储器1005中存储的解码端的控制程序,还执行以下操作:Further, the processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像与对应的深度图像;Intercepting a target image and a corresponding depth image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像与对应的深度图像发送至显示端,以供所述显示端根据所述目标图像与对应的深度图像生成显示画面。Sending the target image and the corresponding depth image to a display terminal for the display terminal to generate a display picture according to the target image and the corresponding depth image.
进一步地,处理器1001可以调用存储器1005中存储的解码端的控制程序,还执行以下操作:Further, the processor 1001 can call the control program of the decoder stored in the memory 1005, and also perform the following operations:
根据所述视频序列确定所述目标时间戳对应的视频帧组,并确定序列头中或者图像头中的排布信息。Determine the video frame group corresponding to the target time stamp according to the video sequence, and determine the arrangement information in the sequence header or the image header.
进一步地,处理器1001可以调用存储器1005中存储的编码端的控制程序,还执行以下操作:Further, the processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
获取各个视点对应的图像以及预设排布信息;Obtain the images corresponding to each viewpoint and the preset arrangement information;
根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧,其中,相同时间戳对应不同视频帧中的图像对应的视点不同;splicing images with the same time stamp into at least two video frames according to the preset arrangement information, wherein the same time stamp corresponds to different viewpoints corresponding to images in different video frames;
根据时间戳相同的视频帧生成视频帧组,其中,所述视频帧组至少包括两张视频帧;Generate a video frame group according to video frames with the same time stamp, wherein the video frame group includes at least two video frames;
根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列,并将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流;Generate a video sequence from the video frame groups corresponding to different time stamps according to the playback order, and input the video sequence and the preset arrangement information into an encoder to generate a target video stream;
将所述目标视频码流发送至解码端,以供所述解码端解码所述目标视频码流获取对应的视频序列。Sending the target video code stream to a decoding end for the decoding end to decode the target video code stream to obtain a corresponding video sequence.
进一步地,处理器1001可以调用存储器1005中存储的编码端的控制程序,还执行以下操作:Further, the processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
根据所述预设排布信息将时间戳相同的图像以及对应的深度图像拼接成至少两个视频帧,其中,所述图像以及对应的深度图像拼接在同一视频帧中。The images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
进一步地,处理器1001可以调用存储器1005中存储的编码端的控制程序,还执行以下操作:Further, the processor 1001 may call the control program of the encoding end stored in the memory 1005, and also perform the following operations:
将所述视频序列输入编码器生成原始视频码流;Input the video sequence into an encoder to generate an original video code stream;
根据所述预设排布信息生成视频帧的排布信息，其中，所述视频帧的排布信息包含所述视频帧中每一图像的视点标识和位置信息，确定所述视频帧中包括深度图像，所述视频帧的排布信息还包含每一深度图像对应的视点标识和位置信息；generating arrangement information of the video frames according to the preset arrangement information, wherein the arrangement information of a video frame includes the viewpoint identifier and position information of each image in the video frame, and when it is determined that the video frame includes depth images, the arrangement information of the video frame further includes the viewpoint identifier and position information corresponding to each depth image;
将所述视频帧的排布信息添加至所述原始视频码流的序列头或者所述视频帧的图像头中,生成目标视频码流。Adding the arrangement information of the video frame to the sequence header of the original video code stream or the image header of the video frame to generate a target video code stream.
如图2所示，本申请一实施例中，所述自由视点视频画面拼接方法应用在解码端，包括以下步骤：As shown in FIG. 2, in an embodiment of the present application, the free-viewpoint video picture splicing method is applied at the decoding end and includes the following steps:
步骤S11,接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;Step S11, receiving the display request sent by the display terminal, and obtaining the target timestamp and the viewpoint identifier corresponding to the target viewpoint according to the display request;
在本实施例中，确定显示端发送的显示请求，根据所述显示请求可获取显示端需要的画面对应的时间点以及对应视点的视点标识。确定显示端需要显示真实视点的画面，根据所述显示请求可获取画面对应的目标时间戳以及真实视点对应的视点标识；确定显示端需要显示虚拟视点的画面，可根据所述显示请求获取画面对应的目标时间戳以及虚拟视点的相邻视点对应的视点标识，其中，至少确定两个相邻视点对应的视点标识。In this embodiment, the display request sent by the display terminal is determined, and the time point corresponding to the picture required by the display terminal and the viewpoint identifier of the corresponding viewpoint can be obtained according to the display request. If it is determined that the display terminal needs to display a picture of a real viewpoint, the target timestamp corresponding to the picture and the viewpoint identifier corresponding to the real viewpoint are obtained according to the display request; if it is determined that the display terminal needs to display a picture of a virtual viewpoint, the target timestamp corresponding to the picture and the viewpoint identifiers corresponding to the viewpoints adjacent to the virtual viewpoint are obtained according to the display request, where the viewpoint identifiers of at least two adjacent viewpoints are determined.
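For illustration only, this request parsing can be sketched in Python as follows; the request field names ("timestamp", "viewpoint", "is_virtual", "adjacent_viewpoints") are assumptions, since the embodiment does not fix a concrete request format.

```python
def parse_display_request(request: dict):
    """Minimal sketch: extract the target timestamp and the viewpoint IDs a request needs."""
    target_timestamp = request["timestamp"]          # time point of the picture to display
    if request.get("is_virtual", False):
        # A virtual viewpoint is synthesised from at least two neighbouring real viewpoints.
        view_ids = list(request["adjacent_viewpoints"])
        if len(view_ids) < 2:
            raise ValueError("a virtual viewpoint needs at least two adjacent viewpoints")
    else:
        # A real viewpoint maps directly to a single viewpoint identifier.
        view_ids = [request["viewpoint"]]
    return target_timestamp, view_ids
```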
步骤S12,接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;Step S12, receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence;
在本实施例中,接收编码端发送的视频码流,通过解码器对接收到的视频码流进行解码,获取视频序列以及排布信息。所述视频序列是由不同时间戳对应的视频帧组组成,所述排布信息在所述视频序列的序列头中或者在视频帧的图像头中。In this embodiment, the video code stream sent by the encoder is received, and the received video code stream is decoded by a decoder to obtain video sequence and arrangement information. The video sequence is composed of video frame groups corresponding to different time stamps, and the arrangement information is in the sequence header of the video sequence or in the image header of the video frame.
步骤S13,获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;Step S13, acquiring a video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
在本实施例中，所述视频序列是由不同时间戳对应的视频帧组组成，根据显示请求可获取显示端需要显示的画面对应的目标时间戳，根据所述目标时间戳在视频序列中查找时间戳与所述目标时间戳相同的视频帧组。时间戳与所述目标时间戳相同的所述视频帧组中的视频帧由各个视点的摄像机在目标时间戳对应的时刻点拍摄的视频画面拼接而成。每个视频帧组中包括时间戳相同的至少两张视频帧，所述视频帧可以是多个视点对应的画面拼接而成，也可以仅有一个视点对应的画面。In this embodiment, the video sequence is composed of video frame groups corresponding to different timestamps. The target timestamp corresponding to the picture to be displayed by the display terminal can be obtained from the display request, and the video frame group whose timestamp matches the target timestamp is searched for in the video sequence according to the target timestamp. The video frames in the video frame group whose timestamp matches the target timestamp are spliced from the video pictures captured by the cameras of the respective viewpoints at the moment corresponding to the target timestamp. Each video frame group includes at least two video frames with the same timestamp; a video frame may be spliced from pictures corresponding to multiple viewpoints, or may contain the picture corresponding to only one viewpoint.
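A minimal sketch of this lookup, assuming the decoded video sequence is held as a list of frame-group records (the field names are illustrative, not mandated by the embodiment):

```python
from dataclasses import dataclass

@dataclass
class VideoFrameGroup:
    timestamp: int
    frames: list       # decoded frames sharing this timestamp (e.g. numpy arrays)
    arrangement: list  # arrangement entries for the images spliced into these frames

def find_frame_group(video_sequence, target_timestamp):
    """Return the frame group whose timestamp matches the display request."""
    for group in video_sequence:
        if group.timestamp == target_timestamp:
            return group
    raise KeyError(f"no frame group with timestamp {target_timestamp}")
```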
步骤S14,根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;Step S14, intercepting the target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
在本实施例中，所述视频帧组的排布信息包括所述视频帧组中的每一视频帧的排布信息，所述视频帧组的排布信息包括视点标识、视点图像于视频帧的坐标以及视点图像对应的宽高。根据目标视点对应的视点标识在所述视频帧组的排布信息中查找匹配的视点标识，并根据所述视点标识确定所述目标视点对应的目标图像所在的目标视频帧，并根据排布信息中视点图像于视频帧的坐标以及视点图像对应的宽高确定在目标视频帧中目标图像所在的位置以及目标图像的大小，截取所述目标图像。In this embodiment, the arrangement information of the video frame group includes the arrangement information of each video frame in the group and contains, for each viewpoint, the viewpoint identifier, the coordinates of the viewpoint image within the video frame, and the width and height of the viewpoint image. A matching viewpoint identifier is searched for in the arrangement information of the video frame group according to the viewpoint identifier corresponding to the target viewpoint; the target video frame containing the target image corresponding to the target viewpoint is determined according to that viewpoint identifier; the position and size of the target image in the target video frame are determined according to the coordinates of the viewpoint image within the video frame and the width and height of the viewpoint image recorded in the arrangement information; and the target image is then cropped.
步骤S15,将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。Step S15, sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
在本实施例中，将从目标视频帧中截取的目标图像发送至显示端，所述显示端接收到所述目标图像后，根据所述目标图像生成显示画面，并在显示屏中显示所述显示画面。In this embodiment, the target image cropped from the target video frame is sent to the display terminal; after receiving the target image, the display terminal generates a display picture according to the target image and presents the display picture on its display screen.
综上所述，在本申请中，接收显示端发送的显示请求，根据所述显示请求获取目标时间戳以及目标视点对应的视点标识；接收编码端发送的视频码流，通过解码器解码所述视频码流，获取视频序列；获取所述视频序列中所述目标时间戳对应的视频帧组，其中，每个所述视频帧组中包括时间戳相同的至少两张视频帧；根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像；将所述目标图像发送至显示端，以供所述显示端根据所述目标图像生成显示画面。如此，本申请通过将同一时刻不同视点对应的视频画面拼接生成多个视频帧，并将生成的视频帧发送至解码端，解码端接收视频帧，并根据视频帧的排布信息以及当前视点对应的视点标识从所述视频帧中截取显示当前视点对应的图像，从而减少一个视频帧拼接的视频画面，以达到提高分辨率的目的。To sum up, in the present application, a display request sent by the display terminal is received, and a target timestamp and a viewpoint identifier corresponding to the target viewpoint are obtained according to the display request; a video code stream sent by the encoding end is received and decoded by a decoder to obtain a video sequence; the video frame group corresponding to the target timestamp in the video sequence is acquired, where each video frame group includes at least two video frames with the same timestamp; a target image is cropped according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint; and the target image is sent to the display terminal so that the display terminal generates a display picture according to the target image. In this way, the present application splices the video pictures corresponding to different viewpoints at the same moment into multiple video frames and sends the generated video frames to the decoding end; the decoding end receives the video frames and, according to the arrangement information of the video frames and the viewpoint identifier corresponding to the current viewpoint, crops from the video frames the image corresponding to the current viewpoint for display. This reduces the number of video pictures spliced into any single video frame, thereby increasing the resolution.
在本申请一实施例中,所述步骤S14包括:In an embodiment of the present application, the step S14 includes:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识,确定所述目标视点对应的目标图像所在的目标视频帧以及所述目标视频帧中所述目标图像的位置信息;According to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame;
根据所述位置信息在所述目标视频帧中截取所述目标视点对应的目标图像。Intercepting a target image corresponding to the target viewpoint in the target video frame according to the position information.
在本实施例中，所述视频帧组的排布信息包括各个视点对应的视点标识、视点对应图像在视频帧中的坐标以及视点对应图像的宽高。根据目标视点对应的视点标识在所述视频帧组的排布信息中查找与所述目标视点对应的视点标识匹配的视点标识，并确定所述目标视点对应的目标图像所在的目标视频帧以及在所述目标视频帧中所述目标图像的位置信息，即所述目标图像在目标视频帧中的坐标以及所述目标图像对应的宽高。根据所述位置信息截取所述目标图像，例如：在获取目标视点对应的目标图像的位置信息后，根据所述位置信息中的坐标找到所述目标图像左上角像素在目标视频帧中的坐标；在确定所述目标图像在视频帧中坐标后，根据所述目标图像对应的宽高确定所述目标图像在目标视频帧中的拼接区域，从而截取所述拼接区域中的图像，所述拼接区域中的图像为目标图像。如此，通过目标视点对应的视点标识以及视频帧组的排布信息确定所述目标视点对应的目标图像所在的视频帧以及在视频帧中的拼接区域，从而准确、快速的截取目标图像并发送至显示端。In this embodiment, the arrangement information of the video frame group includes the viewpoint identifier corresponding to each viewpoint, the coordinates of the image corresponding to each viewpoint within the video frame, and the width and height of that image. According to the viewpoint identifier corresponding to the target viewpoint, a matching viewpoint identifier is searched for in the arrangement information of the video frame group, and the target video frame containing the target image corresponding to the target viewpoint is determined together with the position information of the target image in the target video frame, namely the coordinates of the target image in the target video frame and the width and height of the target image. The target image is cropped according to the position information. For example, after the position information of the target image corresponding to the target viewpoint is obtained, the coordinates of the upper-left pixel of the target image in the target video frame are located according to the coordinates in the position information; once the coordinates of the target image in the video frame are determined, the splice region of the target image in the target video frame is determined according to the width and height of the target image, and the image in the splice region, which is the target image, is cropped. In this way, the video frame containing the target image corresponding to the target viewpoint and its splice region within that video frame are determined through the viewpoint identifier of the target viewpoint and the arrangement information of the video frame group, so that the target image can be cropped accurately and quickly and sent to the display terminal.
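The cropping step can be sketched as below; it assumes the arrangement entries are dictionaries in the (x, y, w, h, view_id) form described later for the encoder, extended with a hypothetical "frame_index" field naming which frame of the group an image was spliced into.

```python
import numpy as np

def crop_target_image(group, target_view_id):
    """Locate the splice region of the target viewpoint and cut it out of its video frame."""
    for entry in group.arrangement:
        if entry["view_id"] == target_view_id:
            frame = group.frames[entry["frame_index"]]         # target video frame
            x, y, w, h = entry["x"], entry["y"], entry["w"], entry["h"]
            return np.asarray(frame)[y:y + h, x:x + w].copy()  # image in the splice region
    raise KeyError(f"viewpoint {target_view_id} not found in this frame group")
```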
在本申请一实施例中，确定显示端请求发送深度图像，所述步骤S13之后，还包括：In an embodiment of the present application, when it is determined that the display terminal requests depth images, after step S13 the method further includes:
根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像与对应的深度图像;Intercepting a target image and a corresponding depth image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint;
将所述目标图像与对应的深度图像发送至显示端,以供所述显示端根据所述目标图像与对应的深度图像生成显示画面。Sending the target image and the corresponding depth image to a display terminal for the display terminal to generate a display picture according to the target image and the corresponding depth image.
在本实施例中，确定显示端请求发送深度图像，需要解码端截取并发送所述显示请求中目标视点对应的视频画面以及深度图像，以供显示端根据所述目标视点对应的视频画面以及深度图像生成显示画面并在显示屏中显示。例如，确定显示端需要显示一个虚拟视点对应的画面，需要所述虚拟视点左右两侧相邻视点对应的图像以及深度图像，根据相邻视点的图像以及深度图像合成所述虚拟视点的图像画面，此时，所述显示端需要确定相邻视点对应的视点标识并向解码端发送显示请求；所述解码端根据接收到的显示请求获取相邻视点的视点标识，并根据所述相邻视点的视点标识以及视频帧组的排布信息确定各个所述相邻视点对应的目标图像所在的目标视频帧，由于各个视点对应的图像以及深度图像拼接在同一视频帧中，故可在所述目标视频帧中截取对应的相邻视点的目标图像以及对应的深度图像；将截取到的所述相邻视点对应的目标图像以及对应的深度图像发送至显示端，以供显示端根据各个相邻视点对应的图像以及深度图像生成显示图像。In this embodiment, when it is determined that the display terminal requests depth images, the decoding end needs to crop and send the video picture and the depth image corresponding to the target viewpoint in the display request, so that the display terminal can generate a display picture from the video picture and depth image corresponding to the target viewpoint and present it on the display screen. For example, when it is determined that the display terminal needs to display a picture corresponding to a virtual viewpoint, the images and depth images corresponding to the adjacent viewpoints on the left and right sides of the virtual viewpoint are required, and the picture of the virtual viewpoint is synthesized from the images and depth images of the adjacent viewpoints. In this case, the display terminal needs to determine the viewpoint identifiers corresponding to the adjacent viewpoints and send a display request to the decoding end; the decoding end obtains the viewpoint identifiers of the adjacent viewpoints from the received display request and, according to these viewpoint identifiers and the arrangement information of the video frame group, determines the target video frames containing the target images corresponding to the adjacent viewpoints. Since the image and the depth image corresponding to each viewpoint are spliced into the same video frame, the target image of each adjacent viewpoint and its corresponding depth image can be cropped from the target video frame; the cropped target images and depth images of the adjacent viewpoints are then sent to the display terminal, so that the display terminal generates the display image from the images and depth images corresponding to the adjacent viewpoints.
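A sketch of the virtual-viewpoint case, under the assumption that each arrangement entry additionally carries a "kind" field distinguishing a texture image from its depth map (the embodiment only requires that both are spliced into the same video frame):

```python
def crop_views_for_virtual_viewpoint(group, adjacent_view_ids):
    """Collect the texture image and depth map of every adjacent real viewpoint."""
    pairs = {}
    for view_id in adjacent_view_ids:
        texture, depth = None, None
        for entry in group.arrangement:
            if entry["view_id"] != view_id:
                continue
            frame = group.frames[entry["frame_index"]]
            patch = frame[entry["y"]:entry["y"] + entry["h"],
                          entry["x"]:entry["x"] + entry["w"]]
            if entry.get("kind") == "depth":
                depth = patch
            else:
                texture = patch
        pairs[view_id] = (texture, depth)  # sent to the display side for view synthesis
    return pairs
```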
在本申请一实施例中,所述步骤S13包括:In an embodiment of the present application, the step S13 includes:
根据所述视频序列确定所述目标时间戳对应的视频帧组,并确定序列头中或者图像头中的排布信息。Determine the video frame group corresponding to the target time stamp according to the video sequence, and determine the arrangement information in the sequence header or the image header.
在本实施例中，解码编码端发送的视频码流得到对应的视频序列后，查找时间戳与显示请求对应的时间戳相同的视频帧组，并获取存储于序列头或者图像头中的排布信息，通过所述排布信息以及目标视点对应的视点标识，可定位目标视点对应的目标图像所在的视频帧并确定所述目标图像在视频帧中的拼接区域，即所述拼接区域中的图像为目标视点对应的目标图像。确定视频序列中多个视频帧的排布信息相同，可将所述排布信息编号并插入序列头中，以所述排布信息拼接图像的视频帧可引用对应的编号，以根据所述编号找到视频帧对应的排布信息。如此，通过将多个视频帧相同的排布信息存储于视频序列的序列头中，降低解码端需要接收的数据量。In this embodiment, after the video code stream sent by the encoding end is decoded to obtain the corresponding video sequence, the video frame group whose timestamp matches the timestamp of the display request is searched for, and the arrangement information stored in the sequence header or in the picture header is obtained. With the arrangement information and the viewpoint identifier corresponding to the target viewpoint, the video frame containing the target image of the target viewpoint can be located and the splice region of the target image in that video frame can be determined; the image in the splice region is the target image corresponding to the target viewpoint. When it is determined that multiple video frames in the video sequence share the same arrangement information, the arrangement information can be numbered and inserted into the sequence header, and the video frames whose images were spliced with that arrangement can reference the corresponding number, so that the arrangement information of a video frame can be found from the number. In this way, by storing the arrangement information shared by multiple video frames in the sequence header of the video sequence, the amount of data that the decoding end needs to receive is reduced.
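A minimal sketch of how a decoder might resolve the arrangement information, assuming a hypothetical container in which the sequence header holds numbered arrangement tables and each picture header either embeds its own table or references a number:

```python
def resolve_arrangement(sequence_header, picture_header):
    """Return the arrangement table that applies to one decoded video frame."""
    if "arrangement" in picture_header:             # table carried in the picture header itself
        return picture_header["arrangement"]
    number = picture_header["arrangement_ref"]      # otherwise reference a numbered table
    return sequence_header["arrangements"][number]  # shared table stored once in the sequence header
```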
如图3所示，在本申请一实施例中，所述自由视点视频画面拼接方法应用于编码端，包括以下步骤：As shown in FIG. 3, in an embodiment of the present application, the free-viewpoint video picture splicing method is applied to the encoding end and includes the following steps:
步骤S21,获取各个视点对应的图像以及预设排布信息;Step S21, acquiring images corresponding to each viewpoint and preset arrangement information;
在本实施例中，多个相机拍摄得到多个视点对应的图像，其中，一个相机可拍摄得到一个视点对应的图像，或一个相机可拍摄得到一个视点对应的图像以及对应的深度图像。多个相机将同一时刻拍摄到的图像发送至编码端。编码端根据预先设定的排布方式生成预设排布信息，所述预设排布信息中的一条信息描述一个视点图像或深度图像的相关信息，所述相关信息的格式为(x, y, w, h, view_id)，其中，x、y为图像左上角像素在视频帧中的坐标，w、h为图像的宽高，view_id为视点标识。In this embodiment, multiple cameras capture images corresponding to multiple viewpoints, where one camera may capture the image corresponding to one viewpoint, or one camera may capture the image corresponding to one viewpoint together with the corresponding depth image. The cameras send the images captured at the same moment to the encoding end. The encoding end generates preset arrangement information according to a predefined arrangement scheme; each entry in the preset arrangement information describes one viewpoint image or depth image and takes the form (x, y, w, h, view_id), where x and y are the coordinates of the upper-left pixel of the image within the video frame, w and h are the width and height of the image, and view_id is the viewpoint identifier.
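For illustration, the preset arrangement information might be modelled as below; the 3x3 tiling of nine 640x360 images inside a 1920x1080 frame is an assumed example, not a layout required by the embodiment.

```python
from typing import NamedTuple

class ArrangementEntry(NamedTuple):
    x: int        # column of the image's upper-left pixel in the video frame
    y: int        # row of the image's upper-left pixel in the video frame
    w: int        # width the image occupies in the frame
    h: int        # height the image occupies in the frame
    view_id: int  # identifier of the viewpoint the image belongs to

# Example: nine 640x360 viewpoint images tiled 3x3 inside one 1920x1080 video frame.
preset_arrangement = [
    ArrangementEntry(x=(i % 3) * 640, y=(i // 3) * 360, w=640, h=360, view_id=i + 1)
    for i in range(9)
]
```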
步骤S22,根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧,其中,相同时间戳对应不同视频帧中的图像对应的视点不同;Step S22, splicing images with the same time stamp into at least two video frames according to the preset arrangement information, wherein the same time stamp corresponds to different viewpoints corresponding to images in different video frames;
在本实施例中，根据预设排布信息将接收到的相机发送的图像拼接成为视频帧，即根据预设排布信息中图像的宽和高调整对应的所述视点图像或者所述深度图像的大小，将调整后的图像根据坐标拼接至对应视频帧中。拼接至同一个视频帧中的图像对应的时间戳相同，一个视频帧组中至少包括两个视频帧。例如，部署了27台摄像机进行拍摄，若将九个摄像机拍摄的图像拼接成为一个视频帧，则时间戳相同的视频帧就有三个，其中，每个视频帧拼接九个视点对应的图像，如图4所示，其中P1、P2...P9为九个摄像头拍摄的图像。In this embodiment, the images received from the cameras are spliced into video frames according to the preset arrangement information; that is, the corresponding viewpoint image or depth image is resized according to the width and height given in the preset arrangement information, and the resized image is pasted into the corresponding video frame at the given coordinates. The images spliced into the same video frame share the same timestamp, and one video frame group includes at least two video frames. For example, when 27 cameras are deployed and the images captured by nine cameras are spliced into one video frame, there are three video frames with the same timestamp, and each video frame is spliced from the images of nine viewpoints, as shown in FIG. 4, where P1, P2, ..., P9 are the images captured by nine cameras.
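A sketch of this splicing step using the entry format above; OpenCV is used only for resizing and could be replaced by any resampling routine, and the 1080p frame size and black padding are assumptions.

```python
import numpy as np
import cv2

def splice_frames(images_by_view, arrangement_tables, frame_shape=(1080, 1920, 3)):
    """Splice same-timestamp viewpoint images into one video frame per arrangement table."""
    frames = []
    for table in arrangement_tables:                  # e.g. three tables of nine entries for 27 cameras
        frame = np.zeros(frame_shape, dtype=np.uint8)
        for entry in table:
            img = images_by_view[entry.view_id]
            img = cv2.resize(img, (entry.w, entry.h))  # adjust to the preset width and height
            frame[entry.y:entry.y + entry.h, entry.x:entry.x + entry.w] = img
        frames.append(frame)
    return frames                                      # all frames share the input images' timestamp
```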
步骤S23,根据时间戳相同的视频帧生成视频帧组,其中,所述视频帧组至少包括两张视频帧;Step S23, generating a video frame group according to video frames with the same time stamp, wherein the video frame group includes at least two video frames;
在本实施例中，在获取各个相机拍摄的图像后，将各个相机拍摄的图像拼接成为视频帧，将时间戳相同的视频帧拼接生成视频帧组，其中，所述视频帧组至少包括时间戳相同的两张视频帧。In this embodiment, after the images captured by the cameras are obtained, they are spliced into video frames, and the video frames with the same timestamp are grouped into a video frame group, where the video frame group includes at least two video frames with the same timestamp.
步骤S24,根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列,并将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流。Step S24, generating a video sequence from the video frame groups corresponding to different time stamps according to the playback sequence, and inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream.
在本实施例中，根据播放的先后顺序将不同时间戳对应的视频帧组进行排序，根据排序后的多个所述视频帧组生成视频序列，并将所述视频序列以及对应的排布信息输入编码器，生成目标视频码流。In this embodiment, the video frame groups corresponding to different timestamps are sorted according to the playback order, a video sequence is generated from the sorted video frame groups, and the video sequence together with the corresponding arrangement information is input into the encoder to generate the target video code stream.
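The grouping and ordering of steps S23 and S24 can be sketched as follows; ascending timestamp order stands in for the playback order, which in practice could come from an external playlist.

```python
from collections import defaultdict

def build_video_sequence(timestamped_frames):
    """Group (timestamp, frame) pairs into frame groups and order the groups for playback."""
    groups = defaultdict(list)
    for timestamp, frame in timestamped_frames:
        groups[timestamp].append(frame)   # frames sharing a timestamp form one frame group
    return [{"timestamp": ts, "frames": groups[ts]} for ts in sorted(groups)]
```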
步骤S25，将所述目标视频码流发送至解码端，以供所述解码端解码所述目标视频码流获取对应的视频序列。Step S25: sending the target video code stream to the decoding end, so that the decoding end decodes the target video code stream to obtain the corresponding video sequence.
在本实施例中，在编码器生成目标视频码流后，将所述视频码流发送至解码端，以供解码端通过解码器解码所述视频码流，获取对应的视频序列，并根据显示端发送的显示请求在所述视频序列中查找并截取所述显示请求需要的目标视点的图像画面。In this embodiment, after the encoder generates the target video code stream, the code stream is sent to the decoding end, so that the decoding end decodes it with a decoder to obtain the corresponding video sequence and, according to the display request sent by the display terminal, searches the video sequence for and crops the picture of the target viewpoint required by the display request.
综上所述，本申请获取各个视点对应的图像以及排布信息；根据所述排布信息将时间戳相同的图像拼接成至少两个视频帧，相同时间戳对应不同视频帧中的图像对应的视点不同；根据时间戳相同的视频帧生成视频帧组，其中，所述视频帧组至少包括两张视频帧；根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列，并将所述视频序列以及所述排布信息输入编码器，生成目标视频码流。将所述目标视频码流发送至解码端，以供所述解码端解码所述目标视频码流获取对应的视频序列。如此，通过将同一时刻的多个视点对应的图像拼接成多个视频帧，以减少一个视频帧中拼接的图像数量，达到提高分辨率的目的。To sum up, the present application obtains the images corresponding to the respective viewpoints and the arrangement information; splices images with the same timestamp into at least two video frames according to the arrangement information, where images carrying the same timestamp but placed in different video frames correspond to different viewpoints; generates video frame groups from video frames with the same timestamp, where each video frame group includes at least two video frames; generates a video sequence from the video frame groups corresponding to different timestamps according to the playback order; and inputs the video sequence and the arrangement information into the encoder to generate the target video code stream. The target video code stream is sent to the decoding end, so that the decoding end decodes it to obtain the corresponding video sequence. In this way, by splicing the images corresponding to multiple viewpoints at the same moment into multiple video frames, the number of images spliced into any single video frame is reduced, thereby increasing the resolution.
在本申请一实施例中,确定获取到各个视点对应的图像以及对应的深度图像,所述步骤S22包括:In an embodiment of the present application, it is determined that the image corresponding to each viewpoint and the corresponding depth image are acquired, and the step S22 includes:
根据所述预设排布信息将时间戳相同的图像以及对应的深度图像拼接成至少两个视频帧,其中,所述图像以及对应的深度图像拼接在同一视频帧中。The images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
在本实施例中，相机将当前时刻拍摄的图像以及深度图像发送至编码端后，编码端根据预设排布信息将时间戳相同的图像以及对应的深度图像进行拼接，其中，所述预设排布信息包括图像或深度图像左上角像素在视频帧中的坐标、图像或深度图像在视频帧中的宽高、对应的视点标识以及图像类别。将时间戳相同的图像或深度图像拼接成至少两个视频帧且将图像与对应的深度图像拼接在同一视频帧中，如图5以及图6所示，其中，P1、P2、P3...P10为视点对应的图像，D1、D2、D3....D9及D21、D22、D23...D30为各视点对应的图像的深度图像，以便查找虚拟视点的相邻视点对应的图像与深度图像，从而生成虚拟视点对应的图像。In this embodiment, after the cameras send the images and depth images captured at the current moment to the encoding end, the encoding end splices the images with the same timestamp and the corresponding depth images according to the preset arrangement information, where the preset arrangement information includes the coordinates of the upper-left pixel of the image or depth image within the video frame, the width and height the image or depth image occupies in the video frame, the corresponding viewpoint identifier, and the image category. The images and depth images with the same timestamp are spliced into at least two video frames, and each image and its corresponding depth image are spliced into the same video frame, as shown in FIG. 5 and FIG. 6, where P1, P2, P3, ..., P10 are the images corresponding to the viewpoints and D1, D2, D3, ..., D9 and D21, D22, D23, ..., D30 are the depth images of the images corresponding to the respective viewpoints, so that the images and depth images corresponding to the viewpoints adjacent to a virtual viewpoint can be found and an image corresponding to the virtual viewpoint can be generated.
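By way of example, an arrangement table that keeps a viewpoint's texture image and its depth map in the same video frame might look like this; the coordinates, sizes, and the "kind" field are illustrative assumptions.

```python
# One assumed arrangement table: texture and depth of each viewpoint sit side by side
# in the same 1920x1080 video frame.
arrangement_with_depth = [
    {"x": 0,   "y": 0,   "w": 640, "h": 360, "view_id": 1, "kind": "texture"},
    {"x": 640, "y": 0,   "w": 640, "h": 360, "view_id": 1, "kind": "depth"},
    {"x": 0,   "y": 360, "w": 640, "h": 360, "view_id": 2, "kind": "texture"},
    {"x": 640, "y": 360, "w": 640, "h": 360, "view_id": 2, "kind": "depth"},
]
```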
在本申请一实施例中,所述将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流的步骤包括:In an embodiment of the present application, the step of inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream includes:
将所述视频序列输入编码器生成原始视频码流;Input the video sequence into an encoder to generate an original video code stream;
根据所述预设排布信息生成视频帧的排布信息，其中，所述视频帧的排布信息包含所述视频帧中每一图像的视点标识和位置信息，确定所述视频帧中包括深度图像，所述视频帧的排布信息还包含每一深度图像对应的视点标识和位置信息；generating arrangement information of the video frames according to the preset arrangement information, wherein the arrangement information of a video frame includes the viewpoint identifier and position information of each image in the video frame, and when it is determined that the video frame includes depth images, the arrangement information of the video frame further includes the viewpoint identifier and position information corresponding to each depth image;
将所述视频帧的排布信息添加至所述原始视频码流的序列头或者所述视频帧的图像头中,生成目标视频码流。Adding the arrangement information of the video frame to the sequence header of the original video code stream or the image header of the video frame to generate a target video code stream.
在本实施例中，将相同时间戳的视频帧组合成视频帧组，并根据所述视频帧组对应的时间戳的先后顺序，对各个视频帧组进行排序，最后生成视频序列。将所述视频序列输入编码器生成原始视频码流，并根据预设排布信息生成各个视频帧的排布信息，其中，每个所述视频帧的排布信息包含所述视频帧中每一图像对应的视点标识以及位置信息，以供解码端根据视点标识查找到对应图像的拼接区域，从而截取图像。确定视频帧中拼接深度图像，所述排布信息中也包含每一深度图像对应的视点标识以及位置信息。所述视频帧的排布信息可添加至原视频码流的序列头或者各个视频帧的图像头中。确定视频序列中有多个视频帧的排布信息相同，可将相同的排布信息编号并添加至序列头中，并在对应的视频帧中添加对应的编号，以供解码端识别所述视频帧的排布信息。如此，在视频序列中多个视频帧排布信息相同时，解码端在接收到所述视频序列时可读取序列头中的排布信息以及各个视频帧中包含的排布信息对应的编号，降低解码端需要接收的数据量。In this embodiment, video frames with the same timestamp are combined into a video frame group, the video frame groups are sorted according to the order of their timestamps, and finally a video sequence is generated. The video sequence is input into the encoder to generate the original video code stream, and the arrangement information of each video frame is generated according to the preset arrangement information, where the arrangement information of each video frame includes the viewpoint identifier and position information corresponding to each image in the video frame, so that the decoding end can locate the splice region of the corresponding image from the viewpoint identifier and crop the image. When it is determined that depth images are spliced into the video frame, the arrangement information also includes the viewpoint identifier and position information corresponding to each depth image. The arrangement information of the video frames may be added to the sequence header of the original video code stream or to the picture header of each video frame. When it is determined that multiple video frames in the video sequence share the same arrangement information, the shared arrangement information can be numbered and added to the sequence header, and the corresponding number added to the corresponding video frames, so that the decoding end can identify the arrangement information of each video frame. In this way, when the arrangement information of multiple video frames in the video sequence is the same, the decoding end, on receiving the video sequence, can read the arrangement information in the sequence header and the number carried by each video frame, which reduces the amount of data the decoding end needs to receive.
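A sketch of the encoder-side bookkeeping described above, assuming arrangement tables are given as lists of dictionary entries; identical tables are stored once in the sequence header under a number, and each picture header carries only that number.

```python
def emit_headers(per_frame_arrangements):
    """Deduplicate identical arrangement tables into the sequence header."""
    sequence_header = {"arrangements": []}
    picture_headers = []
    seen = {}
    for table in per_frame_arrangements:
        key = tuple(tuple(sorted(entry.items())) for entry in table)  # hashable form of the table
        if key not in seen:
            seen[key] = len(sequence_header["arrangements"])
            sequence_header["arrangements"].append(table)
        picture_headers.append({"arrangement_ref": seen[key]})        # frame stores only the number
    return sequence_header, picture_headers
```

This pairs with the decoder-side resolve_arrangement sketch shown earlier: the encoder writes the numbered tables, and the decoder reads a table back via the number referenced by each frame.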
为了实现上述目的，本申请还提供一种终端，所述终端为解码端，所述解码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被所述处理器执行时，实现如上所述的自由视点视频画面拼接方法的步骤。To achieve the above object, the present application further provides a terminal, the terminal being a decoding end, where the decoding end includes a memory, a processor, and a free-viewpoint video picture splicing program stored in the memory and executable on the processor; when the free-viewpoint video picture splicing program is executed by the processor, the steps of the free-viewpoint video picture splicing method described above are implemented.
为了实现上述目的，本申请还提供一种终端，所述终端为编码端，所述编码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被所述处理器执行时，实现如上所述的自由视点视频画面拼接方法的步骤。To achieve the above object, the present application further provides a terminal, the terminal being an encoding end, where the encoding end includes a memory, a processor, and a free-viewpoint video picture splicing program stored in the memory and executable on the processor; when the free-viewpoint video picture splicing program is executed by the processor, the steps of the free-viewpoint video picture splicing method described above are implemented.
为了实现上述目的，本申请还提供一种可读存储介质，所述可读存储介质上存储有自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被处理器执行时实现如上所述任一项所述的自由视点画面拼接方法的步骤。To achieve the above object, the present application further provides a readable storage medium storing a free-viewpoint video picture splicing program; when the free-viewpoint video picture splicing program is executed by a processor, the steps of any one of the free-viewpoint video picture splicing methods described above are implemented.
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。The serial numbers of the above embodiments of the present application are for description only, and do not represent the advantages and disadvantages of the embodiments.
通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在如上所述的一个存储介质（如ROM/RAM、磁碟、光盘）中，包括若干指令用以使得一台终端设备（可以是手机，计算机，服务器，或者网络设备等）执行本申请各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
以上仅为本申请的优选实施例，并非因此限制本申请的专利范围，凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本申请的专利保护范围内。The above are only preferred embodiments of the present application and are not intended to limit its patent scope. Any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (15)

  1. 一种自由视点视频画面拼接方法，其中，应用于解码端，所述自由视点视频画面拼接方法包括：A free-viewpoint video picture splicing method, applied to a decoding end, wherein the free-viewpoint video picture splicing method comprises:
    接收显示端发送的显示请求,根据所述显示请求获取目标时间戳以及目标视点对应的视点标识;receiving a display request sent by the display terminal, and obtaining a target timestamp and a viewpoint identifier corresponding to the target viewpoint according to the display request;
    接收编码端发送的视频码流,通过解码器解码所述视频码流,获取视频序列;receiving the video code stream sent by the encoding end, decoding the video code stream through a decoder, and obtaining a video sequence;
    获取所述视频序列中所述目标时间戳对应的视频帧组,其中,每个所述视频帧组中包括时间戳相同的至少两张视频帧;Acquiring a video frame group corresponding to the target time stamp in the video sequence, wherein each video frame group includes at least two video frames with the same time stamp;
    根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像;以及Intercepting a target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint; and
    将所述目标图像发送至显示端,以供所述显示端根据所述目标图像生成显示画面。Sending the target image to a display terminal for the display terminal to generate a display screen according to the target image.
  2. 如权利要求1所述的自由视点视频画面拼接方法,其中,所述根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像的步骤包括:The free-viewpoint video picture splicing method according to claim 1, wherein the step of intercepting the target image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint comprises:
    根据所述视频帧组的排布信息以及所述目标视点对应的视点标识,确定所述目标视点对应的目标图像所在的目标视频帧以及所述目标视频帧中所述目标图像的位置信息;以及According to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint, determine the target video frame where the target image corresponding to the target viewpoint is located and the position information of the target image in the target video frame; and
    根据所述位置信息在所述目标视频帧中截取所述目标视点对应的目标图像。Intercepting a target image corresponding to the target viewpoint in the target video frame according to the position information.
  3. 如权利要求2所述的自由视点视频画面拼接方法,其中,所述根据所述位置信息在所述目标视频帧中截取所述目标视点对应的目标图像的步骤包括:The free-viewpoint video picture splicing method according to claim 2, wherein the step of intercepting the target image corresponding to the target viewpoint in the target video frame according to the position information comprises:
    根据所述位置信息确定所述目标视点对应的目标图像的坐标以及宽高;以及determining the coordinates and width and height of the target image corresponding to the target viewpoint according to the position information; and
    根据所述目标图像的坐标以及宽高在所述目标视频帧中截取所述目标图像。intercepting the target image in the target video frame according to the coordinates and the width and height of the target image.
  4. 如权利要求3所述的自由视点视频画面拼接方法，其中，所述根据所述目标图像的坐标以及宽高在所述目标视频帧中截取所述目标图像的步骤包括：The free-viewpoint video picture splicing method according to claim 3, wherein the step of intercepting the target image in the target video frame according to the coordinates and the width and height of the target image comprises:
    根据所述目标图像的坐标以及宽高确定所述目标图像在目标视频帧中的拼接区域;以及Determine the splicing area of the target image in the target video frame according to the coordinates and the width and height of the target image; and
    截取所述拼接区域中的图像作为目标图像。Capture the image in the stitching area as the target image.
  5. 如权利要求1所述的自由视点视频画面拼接方法，其中，确定显示端请求发送深度图像，所述获取所述视频序列中所述目标时间戳对应的视频帧组的步骤之后，还包括：The free-viewpoint video picture splicing method according to claim 1, wherein when it is determined that the display terminal requests depth images, after the step of acquiring the video frame group corresponding to the target timestamp in the video sequence, the method further comprises:
    根据所述视频帧组的排布信息以及所述目标视点对应的视点标识截取目标图像与对应的深度图像;以及Intercepting a target image and a corresponding depth image according to the arrangement information of the video frame group and the viewpoint identifier corresponding to the target viewpoint; and
    将所述目标图像与对应的深度图像发送至显示端,以供所述显示端根据所述目标图像与对应的深度图像生成显示画面。Sending the target image and the corresponding depth image to a display terminal for the display terminal to generate a display picture according to the target image and the corresponding depth image.
  6. 如权利要求1所述的自由视点视频画面拼接方法,其中,所述获取所述视频序列中所述目标时间戳对应的视频帧组的步骤包括:The free-viewpoint video picture splicing method according to claim 1, wherein the step of obtaining the video frame group corresponding to the target time stamp in the video sequence comprises:
    根据所述视频序列确定所述目标时间戳对应的视频帧组,并确定序列头中或者图像头中的排布信息。Determine the video frame group corresponding to the target time stamp according to the video sequence, and determine the arrangement information in the sequence header or the image header.
  7. 如权利要求1所述的自由视点视频画面拼接方法,其中,所述根据所述显示请求获取目标时间戳以及目标视点对应的视点标识的步骤包括:The free-viewpoint video picture splicing method according to claim 1, wherein said step of obtaining the target timestamp and the viewpoint identifier corresponding to the target viewpoint according to the display request comprises:
    确定显示端需要显示真实视点的画面,根据所述显示请求获取对应的目标时间戳以及所述真实视点对应的视点标识;以及Determine that the display terminal needs to display a picture of a real viewpoint, and acquire a corresponding target timestamp and a viewpoint identifier corresponding to the real viewpoint according to the display request; and
    确定显示端需要显示虚拟视点的画面,根据所述显示请求获取对应的目标时间戳以及所述虚拟视点的相邻视点对应的视点标识,其中,至少确定两个相邻视点对应的视点标识。Determine that the display terminal needs to display the picture of the virtual viewpoint, and obtain the corresponding target time stamp and the viewpoint identifiers corresponding to the adjacent viewpoints of the virtual viewpoint according to the display request, wherein at least the viewpoint identifiers corresponding to two adjacent viewpoints are determined.
  8. 一种自由视点视频画面拼接方法，其中，应用于编码端，所述自由视点视频画面拼接方法包括：A free-viewpoint video picture splicing method, applied to an encoding end, wherein the free-viewpoint video picture splicing method comprises:
    获取各个视点对应的图像以及预设排布信息;Obtain the images corresponding to each viewpoint and the preset arrangement information;
    根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧,其中,相同时间戳对应不同视频帧中的图像对应的视点不同;splicing images with the same time stamp into at least two video frames according to the preset arrangement information, wherein the same time stamp corresponds to different viewpoints corresponding to images in different video frames;
    根据时间戳相同的视频帧生成视频帧组,其中,所述视频帧组至少包括两张视频帧;Generate a video frame group according to video frames with the same time stamp, wherein the video frame group includes at least two video frames;
    根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列,并将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流;以及generating a video sequence from the video frame groups corresponding to different time stamps according to the playback order, and inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream; and
    将所述目标视频码流发送至解码端,以供所述解码端解码所述目标视频码流获取对应的视频序列。Sending the target video code stream to a decoding end for the decoding end to decode the target video code stream to obtain a corresponding video sequence.
  9. 如权利要求8所述的自由视点视频画面拼接方法，其中，确定获取到各个视点对应的图像以及对应的深度图像，所述根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧的步骤包括：The free-viewpoint video picture splicing method according to claim 8, wherein when it is determined that the images corresponding to the respective viewpoints and the corresponding depth images are acquired, the step of splicing images with the same timestamp into at least two video frames according to the preset arrangement information comprises:
    根据所述预设排布信息将时间戳相同的图像以及对应的深度图像拼接成至少两个视频帧,其中,所述图像以及对应的深度图像拼接在同一视频帧中。The images with the same time stamp and the corresponding depth images are spliced into at least two video frames according to the preset arrangement information, wherein the images and the corresponding depth images are spliced in the same video frame.
  10. 如权利要求8所述的自由视点视频画面拼接方法,其中,所述将所述视频序列以及所述预设排布信息输入编码器,生成目标视频码流的步骤包括:The free-viewpoint video splicing method according to claim 8, wherein the step of inputting the video sequence and the preset arrangement information into an encoder to generate a target video stream comprises:
    将所述视频序列输入编码器生成原始视频码流;Input the video sequence into an encoder to generate an original video code stream;
    根据所述预设排布信息生成视频帧的排布信息，其中，所述视频帧的排布信息包含所述视频帧中每一图像的视点标识和位置信息，确定所述视频帧中包括深度图像，所述视频帧的排布信息还包含每一深度图像对应的视点标识和位置信息；以及generating arrangement information of the video frames according to the preset arrangement information, wherein the arrangement information of a video frame includes the viewpoint identifier and position information of each image in the video frame, and when it is determined that the video frame includes depth images, the arrangement information of the video frame further includes the viewpoint identifier and position information corresponding to each depth image; and
    将所述视频帧的排布信息添加至所述原始视频码流的序列头或者所述视频帧的图像头中,生成目标视频码流。Adding the arrangement information of the video frame to the sequence header of the original video code stream or the image header of the video frame to generate a target video code stream.
  11. 如权利要求8所述的自由视点视频画面拼接方法,其中,所述根据所述预设排布信息将时间戳相同的图像拼接成至少两个视频帧的步骤包括:The method for splicing free-viewpoint video frames according to claim 8, wherein the step of splicing images with the same time stamp into at least two video frames according to the preset arrangement information comprises:
    根据所述预设排布信息确定对应图像的宽高以及图像坐标;determining the width, height and image coordinates of the corresponding image according to the preset arrangement information;
    根据所述图像的宽高调整图像大小;以及resizing the image according to the width and height of said image; and
    根据所述图像坐标将时间戳相同的调整后的图像拼接成至少两个视频帧。Stitching the adjusted images with the same time stamp into at least two video frames according to the image coordinates.
  12. 如权利要求8所述的自由视点视频画面拼接方法,其中,所述根据播放顺序将不同时间戳对应的所述视频帧组生成视频序列的步骤包括:The free-viewpoint video picture splicing method according to claim 8, wherein the step of generating a video sequence from the video frame groups corresponding to different time stamps according to the playing order comprises:
    根据播放的先后顺序将不同时间戳对应的视频帧组进行排序;以及Sorting the video frame groups corresponding to different time stamps according to the order of playing; and
    根据排序后的多个所述视频帧组生成视频序列。A video sequence is generated according to the sorted plurality of video frame groups.
  13. 一种终端，其中，所述终端为解码端，所述解码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被所述处理器执行时，实现如权利要求1至7中任一项所述的自由视点视频画面拼接方法的步骤。A terminal, wherein the terminal is a decoding end, the decoding end comprising a memory, a processor, and a free-viewpoint video picture splicing program stored in the memory and executable on the processor; when the free-viewpoint video picture splicing program is executed by the processor, the steps of the free-viewpoint video picture splicing method according to any one of claims 1 to 7 are implemented.
  14. 一种终端，其中，所述终端为编码端，所述编码端包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被所述处理器执行时，实现如权利要求8至12中任一项所述的自由视点视频画面拼接方法的步骤。A terminal, wherein the terminal is an encoding end, the encoding end comprising a memory, a processor, and a free-viewpoint video picture splicing program stored in the memory and executable on the processor; when the free-viewpoint video picture splicing program is executed by the processor, the steps of the free-viewpoint video picture splicing method according to any one of claims 8 to 12 are implemented.
  15. 一种可读存储介质，其中，所述可读存储介质上存储有自由视点视频画面拼接程序，所述自由视点视频画面拼接程序被处理器执行时实现如权利要求1至12中任一项所述的自由视点画面拼接方法的步骤。A readable storage medium, wherein a free-viewpoint video picture splicing program is stored on the readable storage medium; when the free-viewpoint video picture splicing program is executed by a processor, the steps of the free-viewpoint video picture splicing method according to any one of claims 1 to 12 are implemented.
PCT/CN2021/129039 2021-09-02 2021-11-05 Free viewpoint video screen splicing method, terminal, and readable storage medium WO2023029204A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111041026.7 2021-09-02
CN202111041026.7A CN113905186B (en) 2021-09-02 2021-09-02 Free viewpoint video picture splicing method, terminal and readable storage medium

Publications (1)

Publication Number Publication Date
WO2023029204A1 true WO2023029204A1 (en) 2023-03-09

Family

ID=79188896

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129039 WO2023029204A1 (en) 2021-09-02 2021-11-05 Free viewpoint video screen splicing method, terminal, and readable storage medium

Country Status (2)

Country Link
CN (1) CN113905186B (en)
WO (1) WO2023029204A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117579843B (en) * 2024-01-17 2024-04-02 淘宝(中国)软件有限公司 Video coding processing method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130100114A1 (en) * 2011-10-21 2013-04-25 James D. Lynch Depth Cursor and Depth Measurement in Images
CN110012310A (en) * 2019-03-28 2019-07-12 北京大学深圳研究生院 A kind of decoding method and device based on free view-point
CN111147868A (en) * 2018-11-02 2020-05-12 广州灵派科技有限公司 Free viewpoint video guide system
CN111669567A (en) * 2019-03-07 2020-09-15 阿里巴巴集团控股有限公司 Multi-angle free visual angle video data generation method and device, medium and server
CN111866525A (en) * 2020-09-23 2020-10-30 腾讯科技(深圳)有限公司 Multi-view video playing control method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113905186A (en) 2022-01-07
CN113905186B (en) 2023-03-10

Similar Documents

Publication Publication Date Title
US20190246162A1 (en) Method and apparatus for presenting and controlling panoramic image, and storage medium
CN111937397B (en) Media data processing method and device
US9646406B2 (en) Position searching method and apparatus based on electronic map
US9485493B2 (en) Method and system for displaying multi-viewpoint images and non-transitory computer readable storage medium thereof
CN109040792B (en) Processing method for video redirection, cloud terminal and cloud desktop server
US9961334B2 (en) Simulated 3D image display method and display device
WO2021147702A1 (en) Video processing method and apparatus
WO2010028559A1 (en) Image splicing method and device
US11290752B2 (en) Method and apparatus for providing free viewpoint video
CN107040808B (en) Method and device for processing popup picture in video playing
US20130156263A1 (en) Verification method, verification device, and computer product
US10805620B2 (en) Method and apparatus for deriving composite tracks
KR20150026561A (en) Method for composing image and an electronic device thereof
CN111711859A (en) Video image processing method, system and terminal equipment
US11146799B2 (en) Method and apparatus for decoding video bitstream, method and apparatus for generating video bitstream, storage medium, and electronic device
WO2023279793A1 (en) Video playing method and apparatus
US10089510B2 (en) Display control methods and apparatuses
WO2023029204A1 (en) Free viewpoint video screen splicing method, terminal, and readable storage medium
WO2023029252A1 (en) Multi-viewpoint video data processing method, device, and storage medium
CN111225293B (en) Video data processing method and device and computer storage medium
CN112771878B (en) Method, client and server for processing media data
KR102407986B1 (en) Method and apparatus for providing broadcasting video
CN113099248B (en) Panoramic video filling method, device, equipment and storage medium
CN112153412B (en) Control method and device for switching video images, computer equipment and storage medium
US20200226716A1 (en) Network-based image processing apparatus and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21955733

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE