WO2023029252A1

WO2023029252A1 - Multi-viewpoint video data processing method, device, and storage medium

Info

Publication number: WO2023029252A1
Application number: PCT/CN2021/134319
Authority: WO
Inventors: Wang Ronggang (王荣刚); Wang Zhenyu (王振宇); Gao Wen (高文)
Original assignee: Peking University Shenzhen Graduate School (北京大学深圳研究生院)
Priority date: 2021-09-02
Filing date: 2021-11-30
Publication date: 2023-03-09
Also published as: CN113949884A

Abstract

The present application discloses a multi-viewpoint video data processing method, a device, and a storage medium. The method comprises: when the viewpoint of a display device is switched, the display device sends the target viewpoint it has switched to to a decoding device; the decoding device provides the display device with the image of the secondary viewpoint corresponding to the target viewpoint, cropped from a secondary image frame; when a picture switching condition is satisfied, the display device sends a picture-switching-condition-satisfied instruction to the decoding device; and, according to that instruction, the decoding device provides the display device with an image in the main image frame sequence corresponding to the target viewpoint and/or the image of the secondary viewpoint corresponding to the target viewpoint, cropped from the secondary image frame.

Description

Multi-viewpoint video data processing method, device and storage medium

Related Application

This application claims priority to Chinese patent application No. 202111035779.7, entitled "Multi-Viewpoint Video Data Processing Method, Device and Storage Medium" and filed with the China Patent Office on September 2, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the technical field of video processing, and in particular to a multi-viewpoint video data processing method, device and storage medium.

Background Art

Free-viewpoint technology allows a video to be watched from freely chosen viewpoints. Current free-viewpoint applications built on this technology let viewers watch a video from continuously varying viewpoints within a certain range. The viewer can set the position and angle of the viewpoint and is no longer limited to the fixed angle of a single camera, which enables 360° free-angle viewing.

Current free-viewpoint applications often use spatial stitching to splice the single-channel videos of multiple viewpoints into one video. When the user switches viewpoints in the free-viewpoint application, the application extracts the single-channel video corresponding to the new viewpoint from the stitched video and displays it to the user. However, after spatial stitching, the resolution of each viewpoint's single-channel video decreases, so the pictures available to the free-viewpoint application for display have insufficient resolution, and the finally generated viewpoint pictures are of low resolution.
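The resolution loss described above is simple arithmetic: if all viewpoints share one frame of fixed size, each viewpoint gets only its tile's share of the pixels. The sketch below assumes a uniform grid and an illustrative 3840 x 2160 frame holding nine viewpoints; neither figure comes from the source.

```python
def per_view_resolution(frame_width: int, frame_height: int, rows: int, cols: int) -> tuple:
    """Width and height of each tile when rows * cols views share one fixed-size frame."""
    return frame_width // cols, frame_height // rows


# Illustrative numbers: nine viewpoints tiled 3 x 3 into one 3840 x 2160 frame.
tile_w, tile_h = per_view_resolution(3840, 2160, rows=3, cols=3)
share = (tile_w * tile_h) / (3840 * 2160)
print(tile_w, tile_h)   # 1280 720
print(f"{share:.3f}")   # 0.111 -> each viewpoint keeps about 1/9 of the pixels
```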

Summary

Embodiments of the present application provide a multi-viewpoint video data processing method, device, and storage medium, aiming to solve the technical problem that, after the single-channel videos of multiple viewpoints are spliced by spatial stitching, the picture resolution available for displaying each viewpoint is insufficient, so that the finally generated viewpoint pictures have low resolution.

An embodiment of the present application provides a multi-viewpoint video data processing method applied to a decoding device. The method includes:

When a display instruction sent by a display device is received, acquiring the current viewpoint corresponding to the display instruction;

Sending, to the display device, the first video data required by the display device to display the current viewpoint picture corresponding to the current viewpoint, where the first video data includes images in the main image frame sequence received via the main transmission path corresponding to the current viewpoint and/or the image of the secondary viewpoint corresponding to the current viewpoint, cropped from a secondary image frame in the secondary image frame sequence received via the secondary transmission path; the images in the main image frame sequence and the images in the secondary image frames each include a viewpoint picture and/or a viewpoint depth map picture, and the resolution of the images in the main image frame sequence is greater than the resolution of the images in the secondary image frames;

When a viewpoint switching instruction sent by the display device is received, acquiring the target viewpoint corresponding to the viewpoint switching instruction;

Sending, to the display device, the second video data required by the display device to display the first target viewpoint picture corresponding to the target viewpoint, where the second video data includes the image of the secondary viewpoint corresponding to the target viewpoint, cropped from a secondary image frame in the secondary image frame sequence;

When a picture-switching-condition-satisfied instruction sent by the display device is received, sending, to the display device, the third video data required by the display device to display the second target viewpoint picture corresponding to the target viewpoint, where the third video data includes images in the main image frame sequence received via the main transmission path corresponding to the target viewpoint and/or the image of the secondary viewpoint corresponding to the target viewpoint, cropped from a secondary image frame in the secondary image frame sequence.
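The three decoder-side steps above amount to a small state machine that decides, per event from the display device, whether the display is fed from the main path or from a cropped secondary tile. The following is a minimal sketch that considers real viewpoints only; the class and method names are hypothetical and nothing here is the applicant's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DecoderSession:
    """Hypothetical decoder-side state for one display device.

    `source` records what currently feeds the display: "main" (full-resolution
    per-viewpoint sequence) or "secondary" (low-resolution tile from the stitched frame).
    """
    current_viewpoint: Optional[str] = None
    source: Optional[str] = None
    sent: list = field(default_factory=list)

    def on_display_instruction(self, viewpoint: str) -> None:
        # First video data: the current viewpoint's main sequence is being received.
        self.current_viewpoint = viewpoint
        self.source = "main"
        self.sent.append(("first", viewpoint, self.source))

    def on_viewpoint_switch(self, target: str) -> None:
        # Second video data: only the secondary tile is available immediately.
        self.current_viewpoint = target
        self.source = "secondary"
        self.sent.append(("second", target, self.source))

    def on_switching_condition_satisfied(self) -> None:
        # Third video data: the target viewpoint's main sequence is now available.
        if self.current_viewpoint is None:
            raise RuntimeError("no viewpoint has been selected yet")
        self.source = "main"
        self.sent.append(("third", self.current_viewpoint, self.source))


session = DecoderSession()
session.on_display_instruction("P2")
session.on_viewpoint_switch("P4")
session.on_switching_condition_satisfied()
print(session.sent)
# [('first', 'P2', 'main'), ('second', 'P4', 'secondary'), ('third', 'P4', 'main')]
```

The low-resolution tile bridges the gap between the switch and the moment the target's main sequence arrives, which is why the source can claim display continuity during switching.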

In an embodiment, before the step of sending, to the display device, the first video data required to display the current viewpoint picture corresponding to the current viewpoint, the method further includes:

Acquiring the viewpoint identifier of the secondary viewpoint corresponding to the current viewpoint and the arrangement information of the secondary image frame sequence;

Determining, according to the arrangement information and the viewpoint identifier, the position information of the image of the secondary viewpoint corresponding to the current viewpoint within the secondary image frame;

Cropping the image corresponding to the position information from a secondary image frame in the secondary image frame sequence.
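The lookup-then-crop steps above can be sketched as follows. The arrangement information is assumed, for illustration only, to map each viewpoint identifier to an `(x, y, width, height)` rectangle, and a frame is modelled as a list of pixel rows; the source does not fix either representation.

```python
from typing import Dict, List, Tuple

# Hypothetical arrangement information: viewpoint identifier -> (x, y, width, height)
# of that viewpoint's tile inside the stitched secondary image frame.
Arrangement = Dict[str, Tuple[int, int, int, int]]
Frame = List[List[int]]  # rows of pixel values, frame[y][x]


def locate_tile(arrangement: Arrangement, viewpoint_id: str) -> Tuple[int, int, int, int]:
    """Determine the position information of a secondary viewpoint's image."""
    try:
        return arrangement[viewpoint_id]
    except KeyError:
        raise KeyError(f"viewpoint {viewpoint_id!r} is not in the secondary image frame") from None


def crop_tile(frame: Frame, arrangement: Arrangement, viewpoint_id: str) -> Frame:
    """Cut the secondary viewpoint's image out of one secondary image frame."""
    x, y, w, h = locate_tile(arrangement, viewpoint_id)
    return [row[x:x + w] for row in frame[y:y + h]]


# A toy 4x4 stitched frame holding four 2x2 tiles, P1..P4 in row-major order.
frame = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
arrangement = {"P1": (0, 0, 2, 2), "P2": (2, 0, 2, 2), "P3": (0, 2, 2, 2), "P4": (2, 2, 2, 2)}
print(crop_tile(frame, arrangement, "P4"))  # [[4, 4], [4, 4]]
```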

In an embodiment, the step of sending, to the display device, the first video data required to display the current viewpoint picture corresponding to the current viewpoint includes:

Judging whether the current viewpoint is a virtual viewpoint; if it is not, sending the images in the main image frame sequence received via the main transmission path corresponding to the current viewpoint to the display device.

In an embodiment, after the step of judging whether the current viewpoint is a virtual viewpoint, the method further includes:

If the current viewpoint is a virtual viewpoint, sending to the display device the images in the main image frame sequence received via the main transmission path corresponding to the current viewpoint, together with the images of the secondary viewpoints corresponding to the current viewpoint cropped from the secondary image frames in the secondary image frame sequence received via the secondary transmission path, where the secondary viewpoints corresponding to the current viewpoint include the secondary viewpoints adjacent to the current viewpoint.
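The real/virtual branch above can be sketched as a function that returns which path and viewpoint each piece of the first video data comes from. This assumes, purely for illustration, that real viewpoints P1 to P9 sit at integer positions along the camera arc and a virtual viewpoint is any position in between; the source also does not say which real viewpoint's main path "corresponds" to a virtual one, so the nearest-neighbour rule below is an assumption.

```python
import bisect
from typing import List, Tuple

# Hypothetical model: real viewpoints P1..P9 sit at integer positions 1..9 along the
# camera arc; any non-integer position is a virtual viewpoint between two cameras.
REAL_POSITIONS = list(range(1, 10))


def adjacent_real(position: float) -> List[int]:
    """Real viewpoints immediately on either side of a position (one at the ends)."""
    i = bisect.bisect_left(REAL_POSITIONS, position)
    return REAL_POSITIONS[max(i - 1, 0):i + 1]


def first_video_data(position: float) -> List[Tuple[str, str]]:
    """Sources for the current viewpoint picture as (path, viewpoint id) pairs."""
    if position in REAL_POSITIONS:
        # Real viewpoint: its own main image frame sequence is enough.
        return [("main", f"P{int(position)}")]
    neighbours = adjacent_real(position)
    # Assumption (not specified in the source): the virtual viewpoint's main path is
    # that of its nearest real neighbour, with ties going to the lower index.
    nearest = min(neighbours, key=lambda p: abs(p - position))
    return [("main", f"P{nearest}")] + [("secondary", f"P{p}") for p in neighbours]


print(first_video_data(2))    # [('main', 'P2')]
print(first_video_data(2.5))  # [('main', 'P2'), ('secondary', 'P2'), ('secondary', 'P3')]
```

The adjacent secondary tiles give a renderer the neighbouring views it would need to synthesize the in-between picture, which is presumably why the source sends them for virtual viewpoints only.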

In an embodiment, the step of sending, to the display device, the second video data required to display the first target viewpoint picture corresponding to the target viewpoint includes:

Judging whether the target viewpoint is a virtual viewpoint; if it is not, sending to the display device the image of the secondary viewpoint identical to the target viewpoint, cropped from a secondary image frame in the secondary image frame sequence.

In an embodiment, where the target viewpoint is not a virtual viewpoint, after the image of the identical secondary viewpoint cropped from a secondary image frame in the secondary image frame sequence has been sent to the display device, the step of sending the third video data required by the display device to display the second target viewpoint picture corresponding to the target viewpoint, upon receipt of the picture-switching-condition-satisfied instruction from the display device, includes:

When the picture-switching-condition-satisfied instruction sent by the display device is received, sending the images in the main image frame sequence received via the main transmission path corresponding to the target viewpoint to the display device.

In an embodiment, after the step of judging whether the target viewpoint is a virtual viewpoint, the method further includes:

If the target viewpoint is a virtual viewpoint, sending to the display device the images of the secondary viewpoints adjacent to the target viewpoint, cropped from a secondary image frame in the secondary image frame sequence.

In an embodiment, where the target viewpoint is a virtual viewpoint, after the images of the adjacent secondary viewpoints cropped from a secondary image frame in the secondary image frame sequence have been sent to the display device, the step of sending the third video data required by the display device to display the second target viewpoint picture corresponding to the target viewpoint, upon receipt of the picture-switching-condition-satisfied instruction from the display device, includes:

When the picture-switching-condition-satisfied instruction sent by the display device is received, sending to the display device the images in the main image frame sequence received via the main transmission path corresponding to the target viewpoint, together with the images of the secondary viewpoints adjacent to the target viewpoint cropped from the secondary image frames in the secondary image frame sequence.
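The four target-viewpoint cases above (real or virtual, before or after the picture switching condition is satisfied) reduce to one decision function. As in a first-video-data sketch, real viewpoints are assumed, hypothetically, to lie at integer positions 1 to 9, and the rule that picks a virtual viewpoint's main path is an assumption the source does not state.

```python
import bisect
from typing import List, Tuple

REAL_POSITIONS = list(range(1, 10))  # hypothetical: real viewpoints P1..P9 at 1..9


def adjacent_real(position: float) -> List[int]:
    """Real viewpoints immediately on either side of a position."""
    i = bisect.bisect_left(REAL_POSITIONS, position)
    return REAL_POSITIONS[max(i - 1, 0):i + 1]


def target_video_data(position: float, condition_met: bool) -> List[Tuple[str, str]]:
    """Second video data (condition not yet met) or third video data (condition met)."""
    if position in REAL_POSITIONS:
        vp = f"P{int(position)}"
        # Real target: secondary tile first, then the target's own main sequence.
        return [("main", vp)] if condition_met else [("secondary", vp)]
    neighbours = adjacent_real(position)
    adjacent = [("secondary", f"P{p}") for p in neighbours]
    if not condition_met:
        return adjacent
    # Assumption (not in the source): a virtual target's main path is that of its
    # nearest real neighbour, with ties going to the lower index.
    nearest = min(neighbours, key=lambda p: abs(p - position))
    return [("main", f"P{nearest}")] + adjacent


for pos in (4, 4.5):
    print(pos, target_video_data(pos, False), target_video_data(pos, True))
```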

In addition, the present application also provides a multi-viewpoint video data processing method applied to an encoding device. The method includes:

Acquiring the images of the respective viewpoints captured by the cameras, where different cameras capture images of different viewpoints, and each image includes at least one of a viewpoint picture and a viewpoint depth map picture;

Stitching the images of the viewpoints, and encoding the stitched images according to shooting time and a first resolution to generate a secondary image frame sequence;

Encoding the images of each viewpoint according to shooting time and a second resolution to generate a main image frame sequence for each viewpoint, where the resolution of the images in the main image frame sequences is greater than the resolution of each viewpoint's image within the stitched images;

When a viewpoint selection instruction sent by the decoding device is received, acquiring the viewpoint selected by the decoding device according to the viewpoint selection instruction;

Transmitting the main image frame sequence of the viewpoint selected by the decoding device to the decoding device via that viewpoint's main transmission path, while transmitting the secondary image frame sequence to the decoding device via the secondary transmission path.
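The transmission step above can be sketched as a routing function: of all per-viewpoint main sequences, only the selected viewpoint's goes out on its main path, while the single secondary sequence always goes out on the secondary path. The container type and the string placeholders for encoded frames are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EncoderOutput:
    # Strings stand in for encoded frames in this sketch.
    main_sequences: Dict[str, List[str]]  # one main image frame sequence per viewpoint
    secondary_sequence: List[str]         # the single stitched secondary sequence


def on_viewpoint_selection(output: EncoderOutput, selected: str) -> Dict[str, List[str]]:
    """Map each active transmission path to the sequence it carries."""
    if selected not in output.main_sequences:
        raise ValueError(f"unknown viewpoint {selected!r}")
    return {
        f"main/{selected}": output.main_sequences[selected],  # that viewpoint's main path
        "secondary": output.secondary_sequence,               # shared secondary path
    }


output = EncoderOutput(
    main_sequences={f"P{n}": [f"P{n}@t0", f"P{n}@t1"] for n in range(1, 10)},
    secondary_sequence=["stitched@t0", "stitched@t1"],
)
print(on_viewpoint_selection(output, "P2"))
# {'main/P2': ['P2@t0', 'P2@t1'], 'secondary': ['stitched@t0', 'stitched@t1']}
```

Sending one main sequence rather than all nine keeps the bandwidth per decoder close to one full-resolution stream plus one stitched stream.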

In an embodiment, the step of stitching the images of the viewpoints and encoding the stitched images according to shooting time and the first resolution to generate the secondary image frame sequence includes:

Stitching the images of the viewpoints in a preset arrangement to generate a stitched image and the arrangement information of each viewpoint's image within the stitched image, the arrangement information including at least the viewpoint identifier of each viewpoint and the position information of each viewpoint's image within the stitched image;

Sorting the stitched images by shooting time to generate a stitched image sequence;

Encoding the stitched image sequence according to the first resolution, and marking the encoded stitched image sequence with the arrangement information to obtain the secondary image frame sequence.
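The stitching and sorting steps above can be sketched as follows. The preset arrangement is assumed here to be a row-major grid (FIGS. 12 and 13 show preset arrangements whose exact layout may differ), viewpoint identifiers are assumed to have the form `P<n>`, images are lists of pixel rows, and the actual encoding at the first resolution is omitted.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]                             # rows of pixel values, image[y][x]
Arrangement = Dict[str, Tuple[int, int, int, int]]  # id -> (x, y, width, height)


def stitch_grid(images: Dict[str, Image], cols: int) -> Tuple[Image, Arrangement]:
    """Tile equally sized viewpoint images row-major; return (stitched, arrangement)."""
    ids = sorted(images, key=lambda vid: int(vid[1:]))  # "P1", "P2", ... numerically
    h, w = len(images[ids[0]]), len(images[ids[0]][0])
    rows = -(-len(ids) // cols)  # ceiling division
    stitched = [[0] * (w * cols) for _ in range(h * rows)]
    arrangement: Arrangement = {}
    for k, vid in enumerate(ids):
        x, y = (k % cols) * w, (k // cols) * h
        for dy, row in enumerate(images[vid]):
            stitched[y + dy][x:x + w] = row  # copy one row of the tile into place
        arrangement[vid] = (x, y, w, h)
    return stitched, arrangement


def stitched_sequence(captures: Dict[float, Dict[str, Image]], cols: int) -> List[Tuple[Image, Arrangement]]:
    """Stitch the images shot at each moment, ordered by shooting time."""
    return [stitch_grid(captures[t], cols) for t in sorted(captures)]


images = {f"P{n}": [[n, n], [n, n]] for n in range(1, 5)}
stitched, arrangement = stitch_grid(images, cols=2)
print(stitched)           # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(arrangement["P4"])  # (2, 2, 2, 2)
```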

In an embodiment, the step of marking the encoded stitched image sequence with the arrangement information to obtain the secondary image frame sequence includes:

Inserting the arrangement information into the sequence header of the encoded stitched image sequence to obtain the secondary image frame sequence.

In an embodiment, the step of marking the encoded stitched image sequence with the arrangement information to obtain the secondary image frame sequence further includes:

Inserting the arrangement information into each stitched image in the encoded stitched image sequence to obtain the secondary image frame sequence.
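The two marking options above differ only in where the arrangement information is written: once ahead of the whole sequence, or in front of every stitched image. The sketch below uses a made-up big-endian binary layout (entry count, then identifier length, identifier, x, y, width, height per tile) to show both; a real codec would carry such data in its own syntax, for example a user-data or SEI message, and frame boundaries would be delimited by the codec rather than by plain concatenation.

```python
import struct
from typing import Dict, List, Tuple

Arrangement = Dict[str, Tuple[int, int, int, int]]


def pack_arrangement(arrangement: Arrangement) -> bytes:
    """Hypothetical layout: count, then (id length, id, x, y, w, h) per tile."""
    out = bytearray(struct.pack(">H", len(arrangement)))
    for vid, (x, y, w, h) in arrangement.items():
        name = vid.encode("utf-8")
        out += struct.pack(">B", len(name)) + name + struct.pack(">4H", x, y, w, h)
    return bytes(out)


def unpack_arrangement(data: bytes, offset: int = 0) -> Tuple[Arrangement, int]:
    """Parse pack_arrangement output; return (arrangement, offset just past it)."""
    (count,) = struct.unpack_from(">H", data, offset)
    offset += 2
    arrangement: Arrangement = {}
    for _ in range(count):
        (n,) = struct.unpack_from(">B", data, offset)
        offset += 1
        vid = data[offset:offset + n].decode("utf-8")
        offset += n
        arrangement[vid] = struct.unpack_from(">4H", data, offset)
        offset += 8
    return arrangement, offset


def mark_in_sequence_header(frames: List[bytes], arrangement: Arrangement) -> bytes:
    # Option 1: the arrangement is written once, ahead of all encoded frames.
    return pack_arrangement(arrangement) + b"".join(frames)


def mark_every_frame(frames: List[bytes], arrangement: Arrangement) -> List[bytes]:
    # Option 2: each encoded stitched image carries its own copy of the arrangement.
    header = pack_arrangement(arrangement)
    return [header + f for f in frames]


arrangement = {"P1": (0, 0, 640, 360), "P2": (640, 0, 640, 360)}
stream = mark_in_sequence_header([b"frame0", b"frame1"], arrangement)
parsed, offset = unpack_arrangement(stream)
print(parsed == arrangement, stream[offset:])  # True b'frame0frame1'
```

The per-sequence header costs fewer bytes, while per-frame marking lets a decoder that joins mid-stream locate tiles without waiting for the next sequence header.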

In addition, to achieve the above object, the present application also provides a decoding device comprising a memory, a processor, and a multi-viewpoint video data processing program stored in the memory and executable on the processor, where the program, when executed by the processor, implements the steps of the above multi-viewpoint video data processing method.

In addition, to achieve the above object, the present application also provides an encoding device comprising a memory, a processor, and a multi-viewpoint video data processing program stored in the memory and executable on the processor, where the program, when executed by the processor, implements the steps of the above multi-viewpoint video data processing method.

In addition, to achieve the above object, the present application also provides a storage medium storing a multi-viewpoint video data processing program which, when executed by a processor, implements the steps of the above multi-viewpoint video data processing method.

The technical solutions of the multi-viewpoint video data processing method, device, and storage medium provided in the embodiments of the present application have at least the following technical effects or advantages:

When the viewpoint of the display device is switched, the display device sends the target viewpoint to the decoding device, and the decoding device provides the display device with the image of the secondary viewpoint corresponding to the target viewpoint, cropped from a secondary image frame in the secondary image frame sequence, so that the display device displays a low-resolution target viewpoint picture. When the picture switching condition is met, the display device sends a picture-switching-condition-satisfied instruction to the decoding device, and the decoding device, according to that instruction, provides the display device with the images in the main image frame sequence corresponding to the target viewpoint and/or the image of the secondary viewpoint corresponding to the target viewpoint cropped from the secondary image frame, so that the display device displays a high-resolution target viewpoint picture. This solves the technical problem that, after the single-channel videos of multiple viewpoints are spliced, the picture resolution available for displaying each viewpoint is insufficient and the finally generated viewpoint pictures are therefore of low resolution. The solution not only achieves zero delay in video display when switching viewpoints, but also quickly restores the video from low resolution to high resolution after the switch, ensuring clarity over long viewing sessions.

Description of Drawings

FIG. 1 is a schematic structural diagram of the hardware operating environment involved in the solutions of the embodiments of the present application;

FIG. 2 is a schematic flowchart of the first embodiment of the multi-viewpoint video data processing method of the present application;

FIG. 3 is a schematic diagram of the camera arrangement during multi-viewpoint video shooting;

FIG. 4 is a schematic flowchart of the second embodiment of the multi-viewpoint video data processing method of the present application;

FIG. 5 is a schematic flowchart of the third embodiment of the multi-viewpoint video data processing method of the present application;

FIG. 6 is a schematic diagram in which the current viewpoint is a real viewpoint;

FIG. 7 is a schematic diagram in which the current viewpoint is a virtual viewpoint;

FIG. 8 is a schematic flowchart of the fourth embodiment of the multi-viewpoint video data processing method of the present application;

FIG. 9 is a schematic diagram in which the target viewpoint is a real viewpoint;

FIG. 10 is a schematic diagram in which the target viewpoint is a virtual viewpoint;

FIG. 11 is a schematic flowchart of the fifth embodiment of the multi-viewpoint video data processing method of the present application;

FIG. 12 is a schematic diagram of a preset arrangement;

FIG. 13 is a schematic diagram of another preset arrangement.

Detailed Description

To better understand the above technical solutions, exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.

FIG. 1 is a schematic structural diagram of the hardware operating environment involved in the solutions of the embodiments of the present application.

It should be noted that FIG. 1 is a schematic structural diagram of a hardware operating environment of a decoding device or an encoding device.

As shown in FIG. 1, the decoding device or encoding device may include a processor 1001 (such as a CPU), a memory 1005, a user interface 1003, a network interface 1004, and a communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. In an embodiment, the network interface 1004 may include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. In an embodiment, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.

Those skilled in the art will understand that the structure of the decoding device or encoding device shown in FIG. 1 does not limit the decoding device or encoding device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

As shown in FIG. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and a multi-viewpoint video data processing program. The operating system is a program that manages and controls the hardware and software resources of the decoding device or encoding device and supports the operation of the multi-viewpoint video data processing program and other software or programs.

In the decoding device or encoding device shown in FIG. 1, the user interface 1003 is mainly used to connect to a terminal and exchange data with it; the network interface 1004 is mainly used to connect to a background server and exchange data with it; and the processor 1001 may be used to call the multi-viewpoint video data processing program stored in the memory 1005.

In this embodiment, the decoding device or encoding device includes a memory 1005, a processor 1001, and a multi-viewpoint video data processing program stored in the memory 1005 and executable on the processor, wherein:

Applied to a decoding device, when the processor 1001 invokes the multi-viewpoint video data processing program stored in the memory 1005, the following operations are performed:

Applied to a coding device, when the processor 1001 invokes the multi-viewpoint video data processing program stored in the memory 1005, the following operations are performed:

The embodiments of the present application provide embodiments of the multi-viewpoint video data processing method. It should be noted that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one shown here.

As shown in FIG. 2, in the first embodiment of the present application, the multi-viewpoint video data processing method is applied to a decoding device and includes the following steps:

Step S210: When a display instruction sent by the display device is received, obtain a current viewpoint corresponding to the display instruction.

In this embodiment, the display device refers to a device on which a multi-viewpoint video playback application is installed, such as a smartphone, tablet, smart TV, or computer. Through this application, the user can select different viewpoints on the display device from which to watch a video. The video may be a live video, such as a live broadcast of a basketball or football match, or a recorded video, such as a recorded badminton broadcast.

In this embodiment, a live video, for example a live broadcast of a basketball game, is taken as an example. When shooting a live basketball game, several cameras need to be arranged around the venue, each responsible for capturing the game from one angle; the image captured by each camera from its angle is the image of one viewpoint. Each image includes at least one of a viewpoint picture and a viewpoint depth map picture. As shown in FIG. 3, 1 to 9 denote viewpoints P1 to P9, and each viewpoint has a corresponding camera, namely camera P1 to camera P9. These nine cameras shoot the basketball game, each responsible for the images of one viewpoint: the image captured by camera P1 is the image of viewpoint P1, the image captured by camera P2 is the image of viewpoint P2, and so on, up to camera P9, whose image is the image of viewpoint P9.

After cameras P1 to P9 capture the images of viewpoints P1 to P9, the encoding device encodes these images. The encoding device first stitches the images of the viewpoints in a preset arrangement, generating a stitched image and the arrangement information of each viewpoint's image within the stitched image. Each stitched image is composed of the images captured at the same moment from viewpoints P1 to P9; in other words, the stitched image is one large image divided into nine small images. The arrangement information includes at least the viewpoint identifier of each viewpoint and the position information of each viewpoint's image within the stitched image. The viewpoint identifier indicates which viewpoint each small image belongs to; for example, if the viewpoint identifier of a small image is P9, that image corresponds to viewpoint P9. The position information indicates where each small image is located within the large image. After the stitched images and their arrangement information are generated, the stitched images are sorted by shooting time to generate a stitched image sequence. Suppose n stitched images, image 1, image 2, image 3, ..., image n, are generated in order of shooting time; image 1 to image n, in that order, form the stitched image sequence. The stitched image sequence is then encoded at the preset first resolution, and the encoded stitched image sequence is marked with the arrangement information to obtain the secondary image frame sequence.

Further, while the secondary image frame sequence is being generated, the images of each of viewpoints P1 to P9 are also encoded separately according to shooting time and a preset second resolution to generate a main image frame sequence for each viewpoint. That is, the images captured from viewpoint P1 are encoded separately according to shooting time and the second resolution to generate the main image frame sequence of viewpoint P1, the images captured from viewpoint P2 are encoded in the same way to generate the main image frame sequence of viewpoint P2, and so on up to viewpoint P9, so that nine main image frame sequences are generated. Each main image frame of a viewpoint is an encoded image. The first resolution denotes the total resolution of the stitched image, and the resolution of each viewpoint's image within the stitched image is lower than the second resolution; that is, each viewpoint's image within the stitched image has a lower resolution than the images in that viewpoint's main image frame sequence.

Further, after the encoding device generates the secondary image frame sequence and the main image frame sequence of each viewpoint, it acquires the viewpoint selected by the decoding device according to the viewpoint selection instruction received from the decoding device, then transmits the main image frame sequence of that viewpoint to the decoding device via the viewpoint's main transmission path and transmits the secondary image frame sequence to the decoding device via the secondary transmission path. In other words, the encoding device transmits each viewpoint's main image frame sequence to the decoding device over an independent transmission path, and transmits the secondary image frame sequence to the decoding device over another independent transmission path. For ease of understanding, an independent transmission path carrying a viewpoint's main image frame sequence is called a main transmission path, and the independent transmission path carrying the secondary image frame sequence is called the secondary transmission path. With nine viewpoints there are nine main transmission paths and one secondary transmission path: each main transmission path carries the main image frame sequence of its viewpoint (for example, the main transmission path of viewpoint P1 carries the main image frame sequence of viewpoint P1), and the secondary transmission path carries the secondary image frame sequence.

The viewpoint selection instruction is generated by the decoding device according to either a display instruction or a picture-switching-condition-satisfied instruction sent by the display device. For example, suppose the display device currently needs to display the viewpoint picture of viewpoint P2. The display device generates a display instruction containing P2 and sends it to the decoding device. Having obtained P2 from the display instruction, the decoding device generates a viewpoint selection instruction containing P2 and sends it to the encoding device. From this instruction the encoding device learns that P2 is the viewpoint selected by the decoding device, so it transmits the main image frame sequence of viewpoint P2 to the decoding device via the main transmission path of P2, while transmitting the secondary image frame sequence to the decoding device via the secondary transmission path.

As another example, the user switches from the current viewpoint P2 to the target viewpoint P4 through the display device, so the viewpoint picture of P4 is the picture to be displayed. When the display device determines that the picture switching condition for viewpoint P4 is satisfied, it generates a picture-switching-condition-satisfied instruction containing P4 and sends it to the decoding device. Having obtained P4 from this instruction, the decoding device generates a viewpoint selection instruction containing P4 and sends it to the encoding device. From this instruction the encoding device learns that P4 is the viewpoint selected by the decoding device, and transmits the main image frame sequence of viewpoint P4 to the decoding device via the main transmission path of P4.
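The two examples above imply a simple rule on the decoder side: a display instruction or a picture-switching-condition-satisfied instruction triggers a viewpoint selection instruction to the encoder, while a plain viewpoint switch does not. The sketch below replays the P2 to P4 example under that reading; the message names and dictionary format are hypothetical, since the source describes the messages but not their encoding.

```python
from typing import List, Optional, Tuple

# Hypothetical message names; the source describes these messages but not their format.
SELECTION_TRIGGERS = {"display", "switching_condition_satisfied"}


def viewpoint_selection(message: str, viewpoint: str) -> Optional[dict]:
    """Decoder side: the viewpoint selection instruction to send to the encoder, if any."""
    if message in SELECTION_TRIGGERS:
        return {"type": "viewpoint_selection", "viewpoint": viewpoint}
    # A plain viewpoint switch is served from the secondary path already being received.
    return None


def replay(messages: List[Tuple[str, str]]) -> List[str]:
    """Which main transmission path the encoder opens after each display-device message."""
    log = []
    for message, viewpoint in messages:
        selection = viewpoint_selection(message, viewpoint)
        if selection is not None:
            log.append(f"main/{selection['viewpoint']}")
    return log


# The P2 -> P4 example from the text.
print(replay([("display", "P2"), ("viewpoint_switch", "P4"),
              ("switching_condition_satisfied", "P4")]))  # ['main/P2', 'main/P4']
```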

Specifically, if the user opens the multi-viewpoint video playback application on the display device and starts to watch the live video of a basketball match, the multi-viewpoint video playback application generally takes the default viewpoint as the current viewpoint, and displays the current viewpoint picture corresponding to the current viewpoint (basketball race video screen). Before the display device needs to display the current viewpoint picture corresponding to the current viewpoint, it generates a display instruction including the current viewpoint and sends the display instruction including the current viewpoint to the decoding device. When the decoding device receives the display instruction sent by the display device, it acquires the display instruction The current viewpoint for the command. For example, if the default viewpoint is the P2 viewpoint, then the P2 viewpoint is the current viewpoint, and the display device needs to display the viewpoint picture corresponding to the P2 viewpoint. After the decoding device receives the display instruction, the viewpoint obtained according to the display instruction is the P2 viewpoint.

Step S220: Sending the first video data required by the display device to display the current viewpoint picture corresponding to the current viewpoint to the display device.

After obtaining the current viewpoint from the display instruction, the decoding device generates a viewpoint selection instruction that includes the current viewpoint and sends it to the encoding device. The encoding device obtains the current viewpoint from the received instruction, transmits the main image frame sequence of the current viewpoint to the decoding device through the main transmission path corresponding to the current viewpoint, and at the same time transmits the secondary image frame sequence to the decoding device through the secondary transmission path. The decoding device therefore receives both the main image frame sequence of the current viewpoint (over the main transmission path) and the secondary image frame sequence (over the secondary transmission path). From the main image frame sequence of the current viewpoint and/or the secondary image frame sequence, it obtains the first video data that the display device requires to display the current viewpoint picture, and sends the first video data to the display device.
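The stream routing described above can be sketched as follows. This is a minimal illustration only: the class names EncodingDevice and DecodingDevice, the dictionary layout and the string frame labels are hypothetical stand-ins for real transmission paths and encoded frames.

```python
from dataclasses import dataclass


@dataclass
class EncodingDevice:
    # Hypothetical model: viewpoint id -> main (high-resolution) image frame
    # sequence, plus one stitched secondary (low-resolution) sequence.
    main_sequences: dict
    secondary_sequence: list

    def handle_viewpoint_selection(self, viewpoint: str) -> dict:
        # Only the selected viewpoint's main sequence travels over its main
        # transmission path; the secondary path always carries every viewpoint.
        return {
            "main_path": self.main_sequences[viewpoint],
            "secondary_path": self.secondary_sequence,
        }


class DecodingDevice:
    def __init__(self, encoder: EncodingDevice):
        self.encoder = encoder
        self.streams = {}

    def handle_display_instruction(self, current_viewpoint: str) -> dict:
        # Turn the display instruction into a viewpoint selection instruction
        # and keep both received streams for building the first video data.
        self.streams = self.encoder.handle_viewpoint_selection(current_viewpoint)
        return self.streams


encoder = EncodingDevice(
    main_sequences={"P1": ["P1-hi-0", "P1-hi-1"], "P2": ["P2-hi-0", "P2-hi-1"]},
    secondary_sequence=["stitched-0", "stitched-1"],
)
decoder = DecodingDevice(encoder)
streams = decoder.handle_display_instruction("P2")
print(streams["main_path"])       # ['P2-hi-0', 'P2-hi-1']
print(streams["secondary_path"])  # ['stitched-0', 'stitched-1']
```

In a real system each path would be a network stream rather than a list; the sketch only shows that the main path follows the viewpoint selection instruction while the secondary path is sent unconditionally.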

Specifically, the first video data includes the images in the main image frame sequence that the decoding device receives over the main transmission path corresponding to the current viewpoint, and/or the image of the secondary viewpoint corresponding to the current viewpoint, cropped from a secondary image frame in the secondary image frame sequence received over the secondary transmission path. Images in the main image frame sequence and images in the secondary image frames both include viewpoint pictures and/or viewpoint depth map pictures, and the images in the main image frame sequence have a higher resolution than those in the secondary image frames. A secondary image frame is a stitched image in the secondary image frame sequence that contains the images of multiple viewpoints; if it is stitched from the images of viewpoints P1 to P9, it contains the images of P1 to P9. The viewpoint corresponding to each image in a secondary image frame is called a secondary viewpoint. For example, if the current viewpoint is P1, the P1 image in the secondary image frame is the image of the secondary viewpoint corresponding to the current viewpoint, and the decoding device crops this P1 image from a secondary image frame in the sequence.

After receiving the first video data, the display device displays the current viewpoint picture corresponding to the current viewpoint according to the first video data. The current viewpoint picture the user sees at this point has a higher resolution; that is, the user watches the live basketball game at a higher resolution.

Step S230: When a viewpoint switching instruction sent by the display device is received, obtain the target viewpoint corresponding to the viewpoint switching instruction.

Step S240: Send, to the display device, the second video data that the display device requires to display the first target viewpoint picture corresponding to the target viewpoint.

In this embodiment, the multi-viewpoint video playback application supports viewpoint switching: the user can select a target viewpoint in the application to watch the live basketball game from that viewpoint. Specifically, the user selects the target viewpoint on the video playback interface of the display device. Once the display device obtains the target viewpoint, it determines that a viewpoint switch is required and that it needs the second video data to display the first target viewpoint picture corresponding to the target viewpoint. The display device therefore generates a viewpoint switching instruction from the target viewpoint and sends it to the decoding device, which receives the instruction and obtains the target viewpoint from it. For example, if the current viewpoint is P1 and the user selects P2, the target viewpoint that the decoding device obtains from the viewpoint switching instruction is P2.

After obtaining the target viewpoint, the decoding device crops the image of the secondary viewpoint corresponding to the target viewpoint from a secondary image frame in the secondary image frame sequence to obtain the second video data, and sends the second video data to the display device, which displays the first target viewpoint picture corresponding to the target viewpoint according to it. For example, if the target viewpoint is P2, the second video data includes the image of the secondary viewpoint corresponding to P2 cropped from the secondary image frame; if that image is image F2, the display device displays image F2. Because the image of each viewpoint in a secondary image frame has a lower resolution than the images in that viewpoint's main image frame sequence, the live basketball picture the user sees from viewpoint P2 while image F2 is displayed has a lower resolution than the picture previously seen from viewpoint P1.

Step S250: When a picture switching condition satisfaction instruction sent by the display device is received, send, to the display device, the third video data that the display device requires to display the second target viewpoint picture corresponding to the target viewpoint.

After obtaining the target viewpoint, the decoding device sends a viewpoint selection instruction that includes the target viewpoint to the encoding device. The encoding device obtains the target viewpoint from the received instruction, transmits the main image frame sequence of the target viewpoint to the decoding device through the main transmission path corresponding to the target viewpoint, and also transmits the secondary image frame sequence through the secondary transmission path. The decoding device thus receives the main image frame sequence of the target viewpoint over its main transmission path and the secondary image frame sequence over the secondary transmission path. From the main image frame sequence of the target viewpoint and/or the secondary image frame sequence, it then obtains the third video data that the display device requires to display the second target viewpoint picture corresponding to the target viewpoint. In other words, the decoding device prepares the third video data in advance.

In this embodiment, the second video data used to display the first target viewpoint picture is cropped from a secondary image frame in the secondary image frame sequence, so the first target viewpoint picture has a relatively low resolution and relatively poor image quality. After displaying the first target viewpoint picture for a period of time, the display device resumes displaying a higher-resolution viewpoint picture for the target viewpoint, and as long as the viewpoint does not change, it keeps displaying the higher-resolution picture.

Resuming the higher-resolution viewpoint picture means displaying the second target viewpoint picture from the third video data. The third video data includes the images in the main image frame sequence received from the encoding end over the main transmission path corresponding to the target viewpoint and/or the image of the secondary viewpoint corresponding to the target viewpoint cropped from a secondary image frame in the secondary image frame sequence. Any secondary viewpoint image in the third video data differs from the image contained in the second video data. The decoding device sends the second video data to the display device before the third video data, and sends the third video data only when it receives the picture switching condition satisfaction instruction from the display device.

The moment at which the display device resumes the higher-resolution viewpoint picture is determined by the picture switching condition. The condition can be understood as follows: once the display device has finished displaying the second video data for the first target viewpoint picture, the third video data for the second target viewpoint picture is displayed from the next moment onward, and that next moment is when the second target viewpoint picture is displayed. For example, if the second video data for the first target viewpoint picture runs up to 10 min 00 s, then at 10 min 01 s the display continues seamlessly from the first target viewpoint picture with the third video data for the second target viewpoint picture.
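The timing in this example is simple arithmetic. The sketch below assumes, as the example does, a one-second display granularity; in practice the step would more likely be one frame interval. The helper names are hypothetical.

```python
def to_seconds(mm_ss: str) -> int:
    # "10:00" -> 600
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mm_ss(total_seconds: int) -> str:
    # 601 -> "10:01"
    return f"{total_seconds // 60:02d}:{total_seconds % 60:02d}"


def third_video_data_start(second_data_end: str, step_seconds: int = 1) -> str:
    # The picture switching condition is met once the last instant covered by
    # the second video data has been shown; the third video data continues
    # from the next display instant so that playback has no gap.
    return to_mm_ss(to_seconds(second_data_end) + step_seconds)


print(third_video_data_start("10:00"))  # 10:01
```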

When the display device determines that the picture switching condition is satisfied, it generates a picture switching condition satisfaction instruction and sends it to the decoding device. The decoding device obtains the third video data according to this instruction and sends it to the display device, which displays the second target viewpoint picture corresponding to the target viewpoint according to the third video data. The display device thus switches back from the lower-resolution first target viewpoint picture to the higher-resolution second target viewpoint picture, and the user again sees a higher-resolution picture of the live basketball game. As long as the viewpoint does not change, the display device keeps displaying the higher-resolution viewpoint picture.

Based on steps S210 to S250 above, the behavior of the display device in this embodiment is illustrated by the following example:

For example, when the user first opens the multi-viewpoint video playback application to watch a live basketball game, the default current viewpoint is P1, and the display device displays the viewpoint picture for P1 according to the first video data sent by the decoding device. This picture has a relatively high resolution, so the user sees the game at a relatively high resolution. If the user then switches from P1 to P2, the display device displays the first target viewpoint picture for P2 according to the second video data sent by the decoding device; this picture has a relatively low resolution, so the user sees the game at a lower resolution. After the second video data has been displayed, the display device continues with the second target viewpoint picture for P2 according to the third video data sent by the decoding device. This picture again has a relatively high resolution, so the live video the user sees is restored from low to high resolution. If the viewpoint P2 is not switched afterwards, the display device keeps showing the user the higher-resolution live video of the game.
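The sequence of pictures in this example can be traced with a small state model. This is a hypothetical sketch of the display-side behavior for real viewpoints only: the DisplaySession class and its method names are illustrative, and "main" and "secondary" merely record which image frame sequence the displayed picture comes from.

```python
class DisplaySession:
    """Hypothetical model of which picture the display device shows (S210-S250)."""

    def __init__(self, default_viewpoint: str):
        self.viewpoint = default_viewpoint
        self.source = "main"  # first video data: main image frame sequence

    def switch_viewpoint(self, target: str) -> None:
        # S230/S240: show the low-resolution crop from the secondary frame at once.
        self.viewpoint = target
        self.source = "secondary"

    def switching_condition_met(self) -> None:
        # S250: resume the high-resolution main sequence for the same viewpoint.
        self.source = "main"

    def describe(self) -> str:
        resolution = "high" if self.source == "main" else "low"
        return f"{self.viewpoint} ({resolution} resolution)"


session = DisplaySession("P1")
trace = [session.describe()]
session.switch_viewpoint("P2")
trace.append(session.describe())
session.switching_condition_met()
trace.append(session.describe())
print(trace)
# ['P1 (high resolution)', 'P2 (low resolution)', 'P2 (high resolution)']
```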

With the above technical solution, this embodiment displays video with zero delay when viewpoints are switched and, when live or recorded video is watched, quickly restores the picture from low to high resolution after the switch, so that the video stays clear over long viewing periods.

As shown in Figure 4, the second embodiment of the present application is based on the first embodiment and further includes the following steps before step S210:

Step S110: Obtain the viewpoint identifier of the secondary viewpoint corresponding to the current viewpoint and the arrangement information of the secondary image frame sequence.

Step S120: Determine the position information of the secondary viewpoint image corresponding to the current viewpoint in the secondary image frame according to the arrangement information and the viewpoint identifier.

Step S130: Extract the image corresponding to the position information from a secondary image frame in the secondary image frame sequence.

Before generating the secondary image frame sequence, the encoding device stitches the images of all viewpoints together in a preset arrangement, producing a stitched image together with arrangement information that records where each viewpoint's image lies in the stitched image. It then encodes the stitched image sequence according to capture time and a preset first resolution, and marks the encoded stitched image sequence with the arrangement information to obtain the secondary image frame sequence. The arrangement information includes at least the viewpoint identifier of each viewpoint and the position information of each viewpoint's image in the stitched image.

The arrangement information can be inserted in one of two ways. If it is inserted into the sequence header of the encoded stitched image sequence, the decoding device only needs to read the arrangement information in the sequence header to obtain the position of every secondary viewpoint image in every secondary image frame of the sequence. If it is instead inserted into each stitched image of the encoded sequence, the decoding device must read the arrangement information of each secondary image frame to obtain the positions of the secondary viewpoint images in that frame.
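The two placements of the arrangement information can be contrasted with the sketch below. The container classes and field names are hypothetical and stand in for an encoded bitstream; each arrangement entry is written as an (x, y, w, h, view_id) tuple.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Entry = Tuple[int, int, int, int, str]  # (x, y, w, h, view_id)


@dataclass
class StitchedFrame:
    pixels: list
    arrangement: Optional[List[Entry]] = None  # set only for per-frame placement


@dataclass
class SecondarySequence:
    frames: List[StitchedFrame]
    header_arrangement: Optional[List[Entry]] = None  # set for header placement


def arrangement_for(seq: SecondarySequence, index: int) -> List[Entry]:
    # Header placement: one read serves every secondary image frame.
    if seq.header_arrangement is not None:
        return seq.header_arrangement
    # Per-frame placement: each stitched frame must be read individually.
    info = seq.frames[index].arrangement
    if info is None:
        raise ValueError(f"frame {index} carries no arrangement information")
    return info


layout = [(0, 0, 640, 360, "P1"), (640, 0, 640, 360, "P2")]
header_seq = SecondarySequence(frames=[StitchedFrame([]), StitchedFrame([])],
                               header_arrangement=layout)
per_frame_seq = SecondarySequence(frames=[StitchedFrame([], layout),
                                          StitchedFrame([], layout)])
print(arrangement_for(header_seq, 1) == arrangement_for(per_frame_seq, 1))  # True
```

Header placement avoids re-reading the layout for each frame; per-frame placement presumably trades that cost for the ability to carry the layout with every frame.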

When the image of each viewpoint contains only a viewpoint picture, the arrangement information has the format {x, y, w, h, view_id}, where x and y are the coordinates of the top-left pixel of the viewpoint picture in the stitched image, w and h are the width and height of the viewpoint picture, and view_id is the identifier of the viewpoint to which the picture belongs; x, y, w and h together form the position information. When the image of each viewpoint contains both a viewpoint picture and a viewpoint depth map picture, the format is {x, y, w, h, view_id, is_depth}, where x and y are the coordinates of the top-left pixel of the viewpoint picture or viewpoint depth map picture in the stitched image, w and h are its width and height, view_id is its viewpoint identifier, and is_depth indicates whether the picture is a viewpoint depth map picture; again, x, y, w and h form the position information. Because the viewpoint images are arranged in the same way in the stitched image and in the secondary image frame, the stitched image and the secondary image frame have the same layout and carry the same arrangement information.
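One way to model the two record formats is a single dataclass whose is_depth field defaults to False when the five-field form is used. This is a sketch under that assumption; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrangementEntry:
    x: int                  # column of the top-left pixel in the stitched image
    y: int                  # row of the top-left pixel in the stitched image
    w: int                  # width of the viewpoint (depth map) picture
    h: int                  # height of the viewpoint (depth map) picture
    view_id: str            # viewpoint identifier, e.g. "P2"
    is_depth: bool = False  # True when the picture is a viewpoint depth map picture

    @classmethod
    def from_record(cls, record: tuple) -> "ArrangementEntry":
        # Accept {x, y, w, h, view_id} or {x, y, w, h, view_id, is_depth}.
        if len(record) == 5:
            return cls(*record)
        if len(record) == 6:
            x, y, w, h, view_id, is_depth = record
            return cls(x, y, w, h, view_id, bool(is_depth))
        raise ValueError(f"expected 5 or 6 fields, got {len(record)}")

    def position(self) -> tuple:
        # x, y, w and h together form the position information.
        return (self.x, self.y, self.w, self.h)


picture = ArrangementEntry.from_record((0, 0, 640, 360, "P1"))
depth = ArrangementEntry.from_record((640, 0, 640, 360, "P1", 1))
print(picture.is_depth, depth.is_depth, depth.position())
# False True (640, 0, 640, 360)
```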

In this embodiment, to obtain the image of the secondary viewpoint corresponding to the current viewpoint, the decoding device obtains the viewpoint identifier of that secondary viewpoint and the arrangement information of the secondary image frame sequence. For example, if the current viewpoint is P2, the corresponding viewpoint identifier is P2, and the viewpoint identified as P2 in the secondary image frame is the secondary viewpoint corresponding to P2. The decoding device looks up the entry in the arrangement information whose viewpoint identifier matches that of the current viewpoint, uses this entry to determine the position of the corresponding secondary viewpoint image in the secondary image frame, and then crops the image at that position from a secondary image frame in the secondary image frame sequence.

For example, if the viewpoint identifier of the current viewpoint is P2, the decoding device finds the entry whose viewpoint identifier is also P2 in the arrangement information, determines from it the secondary viewpoint image corresponding to the current viewpoint, and crops the P2 image out of the secondary image frame according to the arrangement information.
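Steps S110 to S130 amount to a table lookup followed by a rectangular crop. The sketch below assumes the stitched secondary image frame is available as a list of pixel rows and that each arrangement entry is an (x, y, w, h, view_id, is_depth) tuple; the function names locate and crop are hypothetical.

```python
def locate(arrangement, view_id, is_depth=False):
    # S120: the entry whose viewpoint identifier (and picture type) matches
    # gives the position information of the secondary viewpoint image.
    for x, y, w, h, entry_id, entry_is_depth in arrangement:
        if entry_id == view_id and entry_is_depth == is_depth:
            return x, y, w, h
    raise KeyError(f"{view_id} is not present in the secondary image frame")


def crop(frame, position):
    # S130: cut out the w x h region whose top-left pixel is at (x, y).
    x, y, w, h = position
    return [row[x:x + w] for row in frame[y:y + h]]


# A toy 2 x 6 secondary image frame holding three 2 x 2 viewpoint pictures.
frame = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
]
arrangement = [
    (0, 0, 2, 2, "P1", False),
    (2, 0, 2, 2, "P2", False),
    (4, 0, 2, 2, "P3", False),
]
print(crop(frame, locate(arrangement, "P2")))  # [[3, 4], [9, 10]]
```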

As shown in FIG. 5, in the third embodiment of the present application, based on the first embodiment, step S220 includes the following steps:

Step S221: Determine whether the current viewpoint is a virtual viewpoint; if so, execute step S223, otherwise execute step S222.

Step S222: Send the images in the main image frame sequence received over the main transmission path corresponding to the current viewpoint to the display device.

Step S223: Send, to the display device, the images in the main image frame sequence received over the main transmission path corresponding to the current viewpoint, together with the images of the secondary viewpoints corresponding to the current viewpoint cropped from the secondary image frames in the secondary image frame sequence received over the secondary transmission path.

In this embodiment, when the user opens the multi-viewpoint video playback application on the display device and starts watching the live basketball game, the current (default) viewpoint of the application may be either a virtual viewpoint or a real viewpoint. A real viewpoint is one that physically exists: the viewpoint of each camera is a real viewpoint, so viewpoints P1 to P9 of cameras P1 to P9 are all real. A virtual viewpoint does not physically exist and lies between two adjacent real viewpoints; for example, a viewpoint between P1 and P2 is virtual. If the current viewpoint is real, the display device directly displays the images in the main image frame sequence of the current viewpoint. If the current viewpoint is virtual, the display device first synthesizes the current viewpoint picture and then displays the images in the main image frame sequence of the real viewpoint closest to the current viewpoint. In this case the secondary viewpoints corresponding to the current viewpoint are the secondary viewpoints adjacent to it, their images include both viewpoint pictures and viewpoint depth map pictures, and the current viewpoint picture is synthesized from those viewpoint pictures and viewpoint depth map pictures.
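The description does not say how the display device synthesizes a virtual viewpoint picture from the adjacent viewpoint pictures and depth map pictures. A common technique for this is depth-image-based rendering (DIBR); the one-scanline sketch below uses that swapped-in technique, not the patented method. It assumes rectified, horizontally aligned cameras, depth already expressed as disparity in pixels between the two adjacent real viewpoints, and a virtual viewpoint at fraction alpha of the way from the left to the right viewpoint. A full picture would apply the same steps to every row.

```python
def warp_row(colors, disparities, shift_scale):
    # Forward-warp one scanline; a pixel moves by shift_scale * disparity.
    width = len(colors)
    out = [None] * width
    nearest = [float("-inf")] * width  # larger disparity = closer to the camera
    for x, (color, disparity) in enumerate(zip(colors, disparities)):
        target = round(x + shift_scale * disparity)
        if 0 <= target < width and disparity > nearest[target]:
            nearest[target] = disparity  # keep the closest surface (z-buffer)
            out[target] = color
    return out


def synthesize_row(left, left_disp, right, right_disp, alpha):
    # alpha = 0 at the left real viewpoint, 1 at the right one.
    from_left = warp_row(left, left_disp, -alpha)
    from_right = warp_row(right, right_disp, 1.0 - alpha)
    row = []
    for a, b in zip(from_left, from_right):
        if a is not None and b is not None:
            row.append((1 - alpha) * a + alpha * b)  # blend, nearer view weighs more
        else:
            row.append(a if a is not None else b)  # fill holes from the other view
    return row


# Toy scene with constant disparity 2: the right view is the left view shifted.
left = [10, 20, 30, 40, 50, 60]
right = [30, 40, 50, 60, 70, 80]
print(synthesize_row(left, [2] * 6, right, [2] * 6, 0.5))
# [20, 30.0, 40.0, 50.0, 60.0, 70]
```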

Specifically, after receiving the display instruction sent by the display device, the decoding device obtains the current viewpoint from it and determines whether the current viewpoint is a virtual viewpoint. If the current viewpoint is a real viewpoint, the decoding device sends the images in the main image frame sequence received over the main transmission path corresponding to the current viewpoint to the display device as the first video data, and the display device displays the current viewpoint picture according to the first video data. If the current viewpoint is a virtual viewpoint, the decoding device sends, as the first video data, the images in the main image frame sequence received over the main transmission path corresponding to the current viewpoint together with the images of the secondary viewpoints corresponding to the current viewpoint cropped from the secondary image frames received over the secondary transmission path. The display device synthesizes the current viewpoint picture from the viewpoint pictures and viewpoint depth map pictures in those secondary viewpoint images and then displays the current viewpoint picture, that is, the images in the main image frame sequence of the current viewpoint. Here, the displayed main image frame sequence is that of the real viewpoint closest to the current viewpoint, and the current viewpoint picture that the user sees when it is displayed from the first video data has a higher resolution.
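The decision in steps S221 to S223 can be sketched as below, assuming (hypothetically) that viewpoints are numbered by their position along the camera line, so that P1 to P9 sit at positions 1 to 9 and a virtual viewpoint has a fractional position. How ties are broken when a virtual viewpoint is exactly midway is not specified; the sketch picks the left neighbour.

```python
import math


def first_video_data(position: float) -> dict:
    # S221: a viewpoint at an integer position is a real (camera) viewpoint.
    if float(position).is_integer():
        # S222: only the main image frame sequence of that viewpoint is sent.
        return {"main": f"P{int(position)}", "secondary_crops": []}
    # S223: a virtual viewpoint also needs the adjacent secondary viewpoint
    # images (viewpoint pictures plus depth maps) for synthesis.
    left, right = math.floor(position), math.ceil(position)
    nearest = left if position - left <= right - position else right
    return {"main": f"P{nearest}", "secondary_crops": [f"P{left}", f"P{right}"]}


print(first_video_data(5))    # {'main': 'P5', 'secondary_crops': []}
print(first_video_data(5.4))  # {'main': 'P5', 'secondary_crops': ['P5', 'P6']}
```

Position 5 corresponds to the Figure 6 case, and a position between 5 and 6 that is nearer P5 corresponds to the Figure 7 case.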

For example, as shown in Figure 6, 1 to 9 denote viewpoints P1 to P9, and viewpoint A denotes the default viewpoint, i.e., the current viewpoint. In Figure 6, viewpoint A coincides with P5, so A is the real viewpoint P5. The decoding device therefore sends the images in the main image frame sequence received over the main transmission path for P5 to the display device as the first video data, and the display device displays the images in the main image frame sequence of P5. As shown in Figure 7, viewpoint A again denotes the default (current) viewpoint but lies between P5 and P6, so A is a virtual viewpoint. The decoding device then sends, as the first video data, the images in the main image frame sequence received over the main transmission path for P5 together with the images of P5 and P6 cropped from the secondary image frames received over the secondary transmission path. The display device synthesizes the current viewpoint picture from the viewpoint pictures and viewpoint depth map pictures in the P5 and P6 images and then displays the images in the main image frame sequence of P5.

As shown in FIG. 8, in the fourth embodiment of the present application, based on the first embodiment, step S240 includes the following steps:

Step S241: Determine whether the target viewpoint is a virtual viewpoint; if so, execute step S243, otherwise execute step S242.

Step S242: Send, to the display device, the image of the secondary viewpoint identical to the target viewpoint, cropped from a secondary image frame in the secondary image frame sequence.

Step S243: Send, to the display device, the images of the secondary viewpoints adjacent to the target viewpoint, cropped from a secondary image frame in the secondary image frame sequence.

In this embodiment, the target viewpoint that the user selects on the video playback interface of the display device may be either a virtual viewpoint or a real viewpoint. If the target viewpoint is real, the display device directly displays the image of the secondary viewpoint identical to the target viewpoint in the secondary image frame. If the target viewpoint is virtual, the display device synthesizes the viewpoint picture of the target viewpoint from the viewpoint pictures and viewpoint depth map pictures in the images of the secondary viewpoints adjacent to the target viewpoint, and then displays the image of the secondary viewpoint closest to the target viewpoint.

Specifically, after receiving the viewpoint switching instruction sent by the display device, the decoding device obtains the target viewpoint from it and determines whether the target viewpoint is a virtual viewpoint. If the target viewpoint is real, the decoding device sends the image of the secondary viewpoint identical to the target viewpoint, cropped from a secondary image frame in the secondary image frame sequence, to the display device as the second video data, and the display device displays that image. If the target viewpoint is virtual, the decoding device sends the images of the secondary viewpoints adjacent to the target viewpoint, cropped from a secondary image frame, to the display device as the second video data. The display device synthesizes the first target viewpoint picture corresponding to the target viewpoint from the viewpoint pictures and viewpoint depth map pictures in those images, and then displays the image of the secondary viewpoint closest to the target viewpoint according to the second video data. Note that the first target viewpoint picture displayed by the display device has a lower resolution than the current viewpoint picture, so the viewpoint picture the user sees at this stage has a relatively low resolution.
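Under the same hypothetical numbering (P1 to P9 at positions 1 to 9, virtual viewpoints at fractional positions), the choice of second video data in steps S241 to S243 can be sketched as follows. Unlike the first video data, nothing comes from a main image frame sequence: everything is cropped from the secondary image frame. The tie-breaking rule is again an assumption.

```python
import math


def second_video_data(position: float) -> dict:
    # S241: integer positions are real viewpoints.
    if float(position).is_integer():
        view = f"P{int(position)}"
        # S242: send the crop of the identical secondary viewpoint and show it.
        return {"secondary_crops": [view], "shown": view}
    # S243: send the crops of both adjacent secondary viewpoints; the display
    # device synthesizes the target picture and shows the nearest one.
    left, right = math.floor(position), math.ceil(position)
    nearest = left if position - left <= right - position else right
    return {"secondary_crops": [f"P{left}", f"P{right}"], "shown": f"P{nearest}"}


print(second_video_data(6))    # {'secondary_crops': ['P6'], 'shown': 'P6'}
print(second_video_data(6.7))  # {'secondary_crops': ['P6', 'P7'], 'shown': 'P7'}
```

Positions 6 and 6.7 stand in for viewpoints B and C in FIG. 9 and FIG. 10; the value 6.7 is only an illustrative position nearer P7.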

For example, as shown in FIG. 9, viewpoint B denotes the target viewpoint and coincides with P6, so B is the real viewpoint P6. The decoding device sends the image of P6 cropped from a secondary image frame in the secondary image frame sequence to the display device as the second video data, and the display device displays this cropped P6 image. As shown in FIG. 10, viewpoint C denotes the target viewpoint and lies between P6 and P7, so C is a virtual viewpoint. The decoding device sends the images of P6 and P7 cropped from a secondary image frame to the display device as the second video data. The display device synthesizes the first target viewpoint picture corresponding to C from the viewpoint pictures and viewpoint depth map pictures in the P6 and P7 images; since P7 is the viewpoint closest to C, the display device displays the P7 image according to the second video data.

Further, based on the fourth embodiment, when the target viewpoint is not a virtual viewpoint, after the image of the secondary viewpoint identical to the target viewpoint (cropped from a secondary image frame in the secondary image frame sequence) has been sent to the display device, the step of sending, upon receiving the picture switching condition satisfaction instruction from the display device, the third video data required to display the second target viewpoint picture corresponding to the target viewpoint includes the following:

Specifically, if the target viewpoint is a real viewpoint, the decoding device sends the images in the main image frame sequence received over the main transmission path corresponding to the target viewpoint to the display device as the third video data, and the display device displays the second target viewpoint picture corresponding to the target viewpoint according to the third video data. The viewpoint picture the user sees at this point has a relatively high resolution: the display device has switched back from the lower-resolution first target viewpoint picture to the higher-resolution second target viewpoint picture, and the user again sees a higher-resolution picture.

For example, as shown in FIG. 9, viewpoint B denotes the target viewpoint and coincides with P6, so B is the real viewpoint P6. The decoding device sends the images in the main image frame sequence of P6 to the display device as the third video data, and the display device displays the second target viewpoint picture for P6, i.e., the images in the main image frame sequence of P6, which now have a relatively high resolution.

Further, based on the fourth embodiment, where the target viewpoint is a virtual viewpoint and the images of the slave viewpoints adjacent to the target viewpoint, intercepted from a slave image frame in the slave image frame sequence, have already been sent to the display device, the subsequent step of sending to the display device, upon receiving the screen switching condition satisfaction instruction from the display device, the third video data required by the display device to display the second target viewpoint picture corresponding to the target viewpoint further includes:

Specifically, if the target viewpoint is a virtual viewpoint, the decoding device sends to the display device, as the third video data, the images in the main image frame sequence received over the main transmission path corresponding to the target viewpoint (for a virtual viewpoint, the path of the real viewpoint closest to it), together with the images of the secondary viewpoints adjacent to the target viewpoint intercepted from the secondary image frames in the secondary image frame sequence. The display device displays the second target viewpoint picture corresponding to the target viewpoint according to the third video data. As in the real-viewpoint case, the display device switches from the lower-resolution first target viewpoint picture back to the higher-resolution second target viewpoint picture, and the user again sees a high-resolution viewpoint picture.

For example, as shown in FIG. 10, viewpoint C represents the target viewpoint. Viewpoint C falls between viewpoints P6 and P7, so it is a virtual viewpoint, and viewpoint P7 is the real viewpoint closest to it. The decoding device sends to the display device, as the third video data, the images in the main image frame sequence received over the main transmission path corresponding to viewpoint P7 and the images of viewpoints P6 and P7 intercepted from the secondary image frames in the secondary image frame sequence. The display device synthesizes the second target viewpoint picture corresponding to viewpoint C from the viewpoint pictures and viewpoint depth map pictures in the intercepted P6 and P7 images, and displays the second target viewpoint picture corresponding to viewpoint C according to the third video data, that is, the images in the main image frame sequence of viewpoint P7 are displayed. The second target viewpoint picture corresponding to viewpoint C displayed by the display device at this point has a relatively high resolution.
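The real/virtual branching in the two cases above can be sketched as follows. This is a minimal illustration under assumptions the source does not state: viewpoints P1-P9 are modeled as integer positions 1-9 on a single line, a target at a non-integer position is treated as a virtual viewpoint, its adjacent slave viewpoints are the two integer positions around it, and the main sequence sent is that of the nearest real viewpoint (as in the viewpoint C example). The names `ThirdVideoData` and `third_video_data` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

NUM_VIEWS = 9  # assumed: real viewpoints P1..P9 sit at positions 1..9


@dataclass(frozen=True)
class ThirdVideoData:
    main_view: int                # viewpoint whose main (high-resolution) sequence is sent
    slave_views: Tuple[int, ...]  # viewpoints whose images are cut from the slave frame


def third_video_data(position: float) -> ThirdVideoData:
    """Decide what the decoding device sends once the switching condition is met."""
    if not 1 <= position <= NUM_VIEWS:
        raise ValueError("target viewpoint outside the camera range")
    if position == int(position):
        # Real viewpoint: the main image frame sequence of that viewpoint suffices.
        return ThirdVideoData(main_view=int(position), slave_views=())
    # Virtual viewpoint: both adjacent slave images (for synthesis) plus the
    # main sequence of whichever adjacent real viewpoint is closer.
    left = math.floor(position)
    right = left + 1
    nearest = left if position - left < right - position else right  # ties go right
    return ThirdVideoData(main_view=nearest, slave_views=(left, right))


# Viewpoint B (= P6) and viewpoint C (between P6 and P7, closer to P7)
print(third_video_data(6))    # ThirdVideoData(main_view=6, slave_views=())
print(third_video_data(6.7))  # ThirdVideoData(main_view=7, slave_views=(6, 7))
```

A real system would identify viewpoints by identifier rather than by a position on a line; the numeric model is only a convenient way to express "between" and "closest".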

As shown in FIG. 11, in the fifth embodiment of the present application, the multi-viewpoint video data processing method is applied to an encoding device and includes the following steps:

Step S310: Obtain images of each viewpoint captured by each camera.

In this embodiment, several cameras are arranged at the shooting site in advance. Each camera shoots from one angle, and the images it captures are the images of one viewpoint; that is, different cameras capture images corresponding to different viewpoints. The captured video may be a live video, such as a live basketball or football broadcast, or a recorded video, such as a recorded badminton broadcast.

This embodiment takes a live video, for example a live broadcast of a basketball game, as an example. When a basketball game is broadcast live, several cameras are arranged around the venue, each shooting the game from one angle, and the images captured by each camera are the images of one viewpoint. As shown in FIG. 3, 1-9 represent viewpoints P1-P9 respectively, and each viewpoint has a corresponding camera, namely cameras P1-P9, the nine cameras shooting this game. The images captured by camera P1 are the images of viewpoint P1, the images captured by camera P2 are the images of viewpoint P2, and so on up to camera P9 and viewpoint P9. The image corresponding to each viewpoint includes at least one of a viewpoint picture and a viewpoint depth map picture. A viewpoint depth map picture, also called a range image, is an image whose pixel values are the distances (depths) from the camera to each point in the scene. Specifically, the encoding device acquires the images of each viewpoint captured by the cameras, and each of these images includes at least one of a viewpoint picture and a viewpoint depth map picture.

Step S320: Stitch the images of each viewpoint, and encode the stitched images according to the shooting time and the first resolution to generate a secondary image frame sequence.

Specifically, step S320 includes:

Stitching the images of each viewpoint in a preset arrangement to generate a stitched image and the arrangement information of the images of each viewpoint in the stitched image;

the arrangement information includes at least the viewpoint identifier of each viewpoint and the position information of the image of each viewpoint in the stitched image;

In this embodiment, after acquiring the images of viewpoints P1-P9 captured by cameras P1-P9, the encoding device encodes them. It first stitches the images of the viewpoints together in a preset arrangement, generating a stitched image and the arrangement information of the images of each viewpoint in the stitched image. Each stitched image is composed of the images taken at viewpoints P1-P9 at the same moment; in other words, the stitched image is one large image divided into 9 small images.
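A minimal sketch of the stitching step, assuming all viewpoint images have the same size and are tiled row-major into a 3 × 3 grid. The actual preset arrangements are those in FIG. 12 and FIG. 13, which may differ from this grid. Images are plain lists of pixel rows so the example has no dependencies, and `stitch_grid` is a hypothetical name. It emits one `{x, y, w, h, view_id}` record per viewpoint image, matching the arrangement-information format this embodiment uses for viewpoint pictures.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # rows of pixel values (grayscale, for illustration only)


def stitch_grid(images: Dict[str, Image], cols: int = 3) -> Tuple[Image, List[dict]]:
    """Tile same-sized viewpoint images row-major into one stitched image.

    Returns the stitched image and its arrangement information, one
    {x, y, w, h, view_id} record per viewpoint image.
    """
    view_ids = list(images)  # insertion order defines the tiling order
    h = len(images[view_ids[0]])
    w = len(images[view_ids[0]][0])
    if any(len(img) != h or any(len(row) != w for row in img) for img in images.values()):
        raise ValueError("all viewpoint images must share one size")
    rows = -(-len(view_ids) // cols)  # ceiling division
    stitched = [[0] * (w * cols) for _ in range(h * rows)]
    layout = []
    for k, view_id in enumerate(view_ids):
        x, y = (k % cols) * w, (k // cols) * h  # upper-left pixel of this tile
        for r, row in enumerate(images[view_id]):
            stitched[y + r][x:x + w] = row
        layout.append({"x": x, "y": y, "w": w, "h": h, "view_id": view_id})
    return stitched, layout


# Nine 2x2 images captured at the same moment; the image of Pi is filled with value i.
views = {f"P{i}": [[i, i], [i, i]] for i in range(1, 10)}
stitched, layout = stitch_grid(views)
print(layout[4])  # {'x': 2, 'y': 2, 'w': 2, 'h': 2, 'view_id': 'P5'}
```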

The arrangement information includes at least the viewpoint identifier of each viewpoint and the position information of the image of each viewpoint in the stitched image. The preset arrangement is shown in FIG. 12 and FIG. 13: if the image of each viewpoint includes only a viewpoint picture, the images of the viewpoints are arranged in the stitched image as shown in FIG. 12; if the image of each viewpoint includes both a viewpoint picture and a viewpoint depth map picture, they are arranged as shown in FIG. 13.

When the image of each viewpoint includes a viewpoint picture, the arrangement information has the format {x, y, w, h, view_id}, where x and y are the coordinates, in the stitched image, of the upper-left pixel of the viewpoint picture; w and h are the width and height of the viewpoint picture; and view_id is the viewpoint identifier of the viewpoint picture. Here x, y, w and h form the position information. When the image of each viewpoint includes a viewpoint picture and a viewpoint depth map picture, the arrangement information has the format {x, y, w, h, view_id, is_depth}, where x and y are the coordinates, in the stitched image, of the upper-left pixel of the viewpoint picture or viewpoint depth map picture; w and h are its width and height; view_id is its viewpoint identifier; and is_depth marks whether the picture is a viewpoint depth map picture. Again, x, y, w and h form the position information. The viewpoint identifier indicates which viewpoint each image in the stitched image belongs to (for example, an image whose viewpoint identifier is P9 is the image of viewpoint P9), and the position information indicates where in the stitched image that image is located.
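Given arrangement information in either format, a decoder can look up an entry by viewpoint identifier (and `is_depth`, when present) and cut the corresponding region out of a stitched frame. The sketch below assumes entries are dictionaries keyed by the field names above and that a missing `is_depth` means a viewpoint picture; `find_entry` and `crop` are hypothetical helpers, not part of the described method.

```python
from typing import List


def find_entry(layout: List[dict], view_id: str, is_depth: bool = False) -> dict:
    """Return the arrangement entry for one viewpoint picture or depth map."""
    for entry in layout:
        # Entries in the {x, y, w, h, view_id} format carry no is_depth flag.
        if entry["view_id"] == view_id and bool(entry.get("is_depth", 0)) == is_depth:
            return entry
    raise KeyError(f"no {'depth map' if is_depth else 'picture'} for {view_id}")


def crop(stitched: List[List[int]], entry: dict) -> List[List[int]]:
    """Cut the w x h region whose upper-left pixel is (x, y) out of the stitched image."""
    x, y, w, h = entry["x"], entry["y"], entry["w"], entry["h"]
    return [row[x:x + w] for row in stitched[y:y + h]]


# A 2x4 stitched image holding P1's viewpoint picture (left) and depth map (right).
stitched = [[1, 2, 3, 4],
            [5, 6, 7, 8]]
layout = [{"x": 0, "y": 0, "w": 2, "h": 2, "view_id": "P1", "is_depth": 0},
          {"x": 2, "y": 0, "w": 2, "h": 2, "view_id": "P1", "is_depth": 1}]
depth = crop(stitched, find_entry(layout, "P1", is_depth=True))  # [[3, 4], [7, 8]]
```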

After generating the stitched images and the arrangement information, the encoding device sorts the stitched images by shooting time to generate a stitched image sequence. For example, if n stitched images, image 1, image 2, image 3, ..., image n, are generated in order of shooting time, then image 1 to image n arranged in that order form the stitched image sequence. The encoding device then encodes the stitched image sequence at the preset first resolution and marks the encoded stitched image sequence with the arrangement information to obtain the secondary image frame sequence.
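The sorting step can be sketched as grouping captured images by shooting time (one group per stitched image) and ordering the groups chronologically. This assumes synchronized cameras, so that every camera tags a given moment with the same comparable timestamp; `group_by_shooting_time` is a hypothetical name, and the encoding itself is omitted.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Capture = Tuple[float, str, Any]  # (shooting time, view_id, image)


def group_by_shooting_time(captures: Iterable[Capture]) -> List[Tuple[float, Dict[str, Any]]]:
    """Return (time, {view_id: image}) groups in shooting-time order.

    Each group holds the images to be stitched into one stitched image;
    the list order is the order of the stitched image sequence.
    """
    groups: Dict[float, Dict[str, Any]] = defaultdict(dict)
    for t, view_id, image in captures:
        groups[t][view_id] = image
    return [(t, groups[t]) for t in sorted(groups)]


captures = [(2.0, "P1", "a2"), (1.0, "P2", "b1"), (1.0, "P1", "a1"), (2.0, "P2", "b2")]
sequence = group_by_shooting_time(captures)
# [(1.0, {'P2': 'b1', 'P1': 'a1'}), (2.0, {'P1': 'a2', 'P2': 'b2'})]
```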

Further, marking the encoded stitched image sequence with the arrangement information to obtain the slave image frame sequence includes either inserting the arrangement information into the sequence header of the encoded stitched image sequence, or inserting the arrangement information into each stitched image in the encoded stitched image sequence. If the arrangement information is in the sequence header, the decoding device only needs to read the sequence header of the slave image frame sequence to obtain the position of each slave viewpoint's image in every slave image frame. If the arrangement information is in each stitched image, the decoding device needs to read the arrangement information of each slave image frame to obtain the positions of the slave viewpoint images in that frame. In either case, marking the encoded stitched image sequence with the arrangement information lets the decoding device intercept the required images from the stitched images directly, which improves interception efficiency.
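The two marking options can be contrasted with a hypothetical in-memory container. `SlaveFrame`, `SlaveSequence` and `layout_for` are illustrative names only; they model where the arrangement information lives, not any real codec's header or bitstream syntax.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaveFrame:
    payload: bytes                       # coded stitched image (opaque here)
    layout: Optional[List[dict]] = None  # option 2: per-frame arrangement info


@dataclass
class SlaveSequence:
    frames: List[SlaveFrame]
    header_layout: Optional[List[dict]] = None  # option 1: sequence-header arrangement info


def layout_for(seq: SlaveSequence, index: int) -> List[dict]:
    """Arrangement info that applies to frame `index` of the slave sequence."""
    if seq.header_layout is not None:
        return seq.header_layout  # read once, valid for every frame
    layout = seq.frames[index].layout
    if layout is None:
        raise ValueError(f"frame {index} carries no arrangement information")
    return layout


entry = {"x": 0, "y": 0, "w": 2, "h": 2, "view_id": "P1"}
by_header = SlaveSequence(frames=[SlaveFrame(b"f0"), SlaveFrame(b"f1")], header_layout=[entry])
by_frame = SlaveSequence(frames=[SlaveFrame(b"f0", [entry]), SlaveFrame(b"f1", [entry])])
```

As a design note (an inference, not stated in the source): the header option is parsed once but fixes one arrangement for the whole sequence, while the per-frame option costs a lookup per frame but would let the arrangement vary from frame to frame.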

Step S330: Encode the image of each viewpoint according to the shooting time and the second resolution, and generate a main image frame sequence of each viewpoint.

In this embodiment, while generating the slave image frame sequence, the encoding device also encodes the images of each of viewpoints P1-P9 separately, according to the shooting time and the preset second resolution, to generate a main image frame sequence for each viewpoint: the images shot at viewpoint P1 are encoded into the main image frame sequence of viewpoint P1, the images shot at viewpoint P2 into the main image frame sequence of viewpoint P2, and so on up to viewpoint P9, yielding 9 main image frame sequences.

Specifically, step S330 includes:

Sorting the images of each of the viewpoints according to the shooting time to generate an image sequence of each of the viewpoints;

The image sequence of each viewpoint is encoded according to the second resolution to obtain the main image frame sequence.

Assume that, in order of shooting time, the encoding device has obtained n images for each of viewpoints P1-P9. Taking viewpoint P1 as an example, sorting its n images by shooting time gives the image sequence of viewpoint P1: image 1, image 2, image 3, ..., image n. Encoding this image sequence at the preset second resolution yields the main image frame sequence of viewpoint P1. The n images of each of viewpoints P2-P9 are encoded in the same way, which is not repeated here.
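Building each viewpoint's image sequence can be sketched in the same style, assuming each capture is tagged with its shooting time and viewpoint identifier; `per_view_sequences` is a hypothetical name and the second-resolution encoding is not shown.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Capture = Tuple[float, str, Any]  # (shooting time, view_id, image)


def per_view_sequences(captures: Iterable[Capture]) -> Dict[str, List[Any]]:
    """Build each viewpoint's image sequence, ordered by shooting time.

    Each returned list is what would be encoded at the second resolution
    into that viewpoint's main image frame sequence.
    """
    by_view: Dict[str, List[Tuple[float, Any]]] = defaultdict(list)
    for t, view_id, image in captures:
        by_view[view_id].append((t, image))
    # Sort on the timestamp only, so images never need to be comparable.
    return {v: [img for _, img in sorted(items, key=lambda p: p[0])]
            for v, items in by_view.items()}


captures = [(2.0, "P1", "a2"), (1.0, "P2", "b1"), (1.0, "P1", "a1"), (2.0, "P2", "b2")]
sequences = per_view_sequences(captures)  # {'P1': ['a1', 'a2'], 'P2': ['b1', 'b2']}
```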

Note that in this embodiment, the first resolution is the total resolution of the stitched image, and the resolution of each viewpoint's image within the stitched image is smaller than the second resolution; that is, the images of the viewpoints in the stitched image have a lower resolution than the images in each viewpoint's main image frame sequence.
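As a worked illustration with hypothetical numbers (the source fixes neither the resolutions nor the grid shape): if the stitched image is encoded at a first resolution of 3840 × 2160 and holds the nine viewpoint pictures in a 3 × 3 grid, each tile is 1280 × 720, which is below an assumed second resolution of 1920 × 1080. The helpers below check that condition; comparing width and height separately is one reasonable reading of "smaller".

```python
from typing import Tuple

Resolution = Tuple[int, int]  # (width, height) in pixels


def tile_resolution(first: Resolution, grid: Tuple[int, int]) -> Resolution:
    """Resolution of one viewpoint image in a stitched image split into cols x rows tiles."""
    (width, height), (cols, rows) = first, grid
    return width // cols, height // rows


def tiles_smaller_than_main(first: Resolution, grid: Tuple[int, int], second: Resolution) -> bool:
    """True if every tile is smaller than the main sequence resolution in both dimensions."""
    tile_w, tile_h = tile_resolution(first, grid)
    return tile_w < second[0] and tile_h < second[1]


print(tile_resolution((3840, 2160), (3, 3)))                        # (1280, 720)
print(tiles_smaller_than_main((3840, 2160), (3, 3), (1920, 1080)))  # True
```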

Further, the first image in each viewpoint's image sequence is encoded as an I frame. An I frame (intra-coded picture) is, in MPEG-style video compression, typically the first frame of each GOP (group of pictures); it is moderately compressed, can be decoded on its own as a complete image, and serves as a reference point for random access. In multi-viewpoint video, viewpoint switching can only take place at I frames, so this embodiment encodes the first image of each viewpoint's image sequence as an I frame. Because every viewpoint's main image frame sequence begins with an I frame, it can be switched to whenever the viewpoint changes, which makes it possible to switch to and display the high-resolution main image frame sequence of the target viewpoint. After the user switches the viewpoint on the display device side, the display device can thus quickly resume displaying high-resolution viewpoint pictures based on the secondary image frame sequence and the per-viewpoint main image frame sequences provided by the encoding device, keeping the video sharp for the user over long viewing periods.
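Because decoding of a viewpoint's main image frame sequence can only start at an I frame, the earliest frame at which a switch to that sequence can take effect depends on where the I frames fall. The source only states that the first image of each sequence is an I frame; the sketch below additionally assumes, for illustration, time-aligned sequences with an I frame recurring every `gop_size` frames. `next_switch_frame` is a hypothetical name.

```python
def next_switch_frame(request_frame: int, gop_size: int) -> int:
    """Earliest frame index >= request_frame that is an I frame.

    Assumes I frames at indices 0, gop_size, 2 * gop_size, ... in every
    viewpoint's main image frame sequence.
    """
    if gop_size <= 0 or request_frame < 0:
        raise ValueError("gop_size must be positive and request_frame non-negative")
    return -(-request_frame // gop_size) * gop_size  # round up to a GOP boundary


print([next_switch_frame(f, 30) for f in (0, 1, 30, 31)])  # [0, 30, 30, 60]
```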

Step S340: When receiving the viewpoint selection instruction sent by the decoding device, acquire the viewpoint selected by the decoding device according to the viewpoint selection instruction.

Step S350: Transmit the main image frame sequence of the viewpoint selected by the decoding device to the decoding device over that viewpoint's main transmission path, and simultaneously transmit the secondary image frame sequence to the decoding device over the secondary transmission path.

In this embodiment, after generating the secondary image frame sequence and the main image frame sequence of each viewpoint, the encoding device determines the viewpoint selected by the decoding device from the received viewpoint selection instruction, transmits the main image frame sequence of that viewpoint to the decoding device over the viewpoint's main transmission path, and transmits the secondary image frame sequence to the decoding device over the secondary transmission path.

The encoding device transmits the main image frame sequence of each viewpoint to the decoding device over an independent transmission path, and simultaneously transmits the secondary image frame sequence to the decoding device over another independent transmission path. For ease of understanding, the independent path carrying a viewpoint's main image frame sequence is called a main transmission path, and the independent path carrying the secondary image frame sequence is called the secondary transmission path. With 9 viewpoints there are 9 main transmission paths and 1 secondary transmission path: each main transmission path carries the main image frame sequence of its viewpoint (for example, the main transmission path of viewpoint P1 carries the main image frame sequence of viewpoint P1), and the secondary transmission path carries the secondary image frame sequence.

With the above technical solution, this embodiment supplies the secondary image frame sequence and the main image frame sequence of each viewpoint that the display device requires, so that after the user switches the viewpoint on the display device side, the display device can quickly resume displaying high-resolution viewpoint pictures from the provided secondary image frame sequence and per-viewpoint main image frame sequences, keeping the video sharp for the user over long viewing periods.

Steps S310-S350 of the encoding device are illustrated by the following example:

For example, suppose the images of viewpoints P1-P9 captured by 9 cameras are encoded into the main image frame sequences F1, F2, ..., F9 of viewpoints P1-P9 and into the secondary image frame sequence F0, where each stitched image in F0 contains the images f1-f9 of viewpoints P1-P9. The main transmission path of viewpoint P1 is path 1, that of viewpoint P2 is path 2, ..., that of viewpoint P9 is path 9, and the secondary transmission path is path 0. If the decoding device selects viewpoint P1, the encoding device transmits main image frame sequence F1 to the decoding device over path 1; if it selects viewpoint P5, the encoding device transmits F5 over path 5; if it selects a virtual viewpoint lying between viewpoints P3 and P4 and closest to P4, the encoding device transmits F4 over path 4. In every case, the encoding device also transmits the secondary image frame sequence F0 to the decoding device over path 0.
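The routing in this example can be reproduced with a short sketch. It assumes, as an illustration only, that a selected viewpoint is given as a position in 1-9, with non-integer positions denoting virtual viewpoints; the sequence and path names F0-F9 and 0-9 come from the example, while `paths_for_selection` is a hypothetical name.

```python
import math
from typing import Dict, Tuple

NUM_VIEWS = 9
SECONDARY_PATH = 0  # path 0 carries the secondary image frame sequence F0


def paths_for_selection(position: float) -> Dict[str, Tuple[str, int]]:
    """Map the decoding device's selected viewpoint to (sequence, path) pairs.

    Real viewpoint Pk -> main sequence Fk on path k; a virtual viewpoint uses
    the nearest real viewpoint. F0 always goes out on path 0.
    """
    if not 1 <= position <= NUM_VIEWS:
        raise ValueError("selected viewpoint outside P1..P9")
    if position == int(position):
        k = int(position)
    else:
        left = math.floor(position)
        k = left if position - left < left + 1 - position else left + 1
    return {"main": (f"F{k}", k), "secondary": ("F0", SECONDARY_PATH)}


print(paths_for_selection(3.8))  # {'main': ('F4', 4), 'secondary': ('F0', 0)}
```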

The serial numbers of the above embodiments of the present application are for description only and do not indicate that any embodiment is better or worse than another.

It should be noted that, herein, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises that element.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the embodiments of the present application.

The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included in the patent protection scope of the present application.

Claims

A method for processing multi-viewpoint video data, applied to a decoding device, the method comprising:

When receiving a display instruction sent by a display device, acquiring a current viewpoint corresponding to the display instruction;

Sending to the display device the first video data required by the display device to display the current viewpoint picture corresponding to the current viewpoint, the first video data including the images in the main image frame sequence received over the main transmission path corresponding to the current viewpoint and/or the image of the secondary viewpoint corresponding to the current viewpoint, intercepted from a secondary image frame in the secondary image frame sequence received over the secondary transmission path; the images in the main image frame sequence and the images in the secondary image frames each include a viewpoint picture and/or a viewpoint depth map picture, and the resolution of the images in the main image frame sequence is greater than the resolution of the images in the secondary image frames;

When receiving the viewpoint switching instruction sent by the display device, acquiring a target viewpoint corresponding to the viewpoint switching instruction;

Sending to the display device the second video data required by the display device to display the first target viewpoint picture corresponding to the target viewpoint, the second video data including the image of the secondary viewpoint corresponding to the target viewpoint, intercepted from a secondary image frame in the secondary image frame sequence;

When receiving the screen switching condition satisfaction instruction sent by the display device, sending the third video data required by the display device to display the second target viewpoint picture corresponding to the target viewpoint to the display device; the third video The data includes an image in the main image frame sequence received by the main transmission path corresponding to the target viewpoint and/or an image of the secondary viewpoint corresponding to the target viewpoint intercepted from a secondary image frame in the secondary image frame sequence .
The method according to claim 1, wherein, before the step of sending the first video data required by the display device to display the current viewpoint picture corresponding to the current viewpoint to the display device, further comprising:

Acquire the viewpoint identifier of the secondary viewpoint corresponding to the current viewpoint and the arrangement information of the secondary image frame sequence;

determining the position information of the secondary viewpoint image corresponding to the current viewpoint in the secondary image frame according to the arrangement information and the viewpoint identifier;

An image corresponding to the location information is intercepted from a secondary image frame in the sequence of secondary image frames.
The method according to claim 1, wherein the step of sending the first video data required by the display device to display the current viewpoint picture corresponding to the current viewpoint to the display device comprises:

When the current viewpoint is not a virtual viewpoint, sending the images in the main image frame sequence received over the main transmission path corresponding to the current viewpoint to the display device.
The method according to claim 3, wherein, after the step of judging whether the current viewpoint is a virtual viewpoint, it further comprises:

When the current viewpoint is a virtual viewpoint, sending to the display device the images in the main image frame sequence received over the main transmission path corresponding to the current viewpoint and the images of the secondary viewpoints corresponding to the current viewpoint, intercepted from the secondary image frames in the secondary image frame sequence received over the secondary transmission path; the secondary viewpoints corresponding to the current viewpoint include the secondary viewpoints adjacent to the current viewpoint.
The method according to claim 1, wherein the step of sending the second video data required by the display device to display the first target viewpoint picture corresponding to the target viewpoint to the display device comprises:

When the target viewpoint is not a virtual viewpoint, sending to the display device the image of the slave viewpoint identical to the target viewpoint, intercepted from a slave image frame in the slave image frame sequence.
The method according to claim 5, wherein, after the step of sending to the display device, when the target viewpoint is not a virtual viewpoint, the image of the slave viewpoint identical to the target viewpoint intercepted from a slave image frame in the slave image frame sequence, the step of sending to the display device, upon receiving the screen switching condition satisfaction instruction sent by the display device, the third video data required by the display device to display the second target viewpoint picture corresponding to the target viewpoint comprises:

When the screen switching condition satisfaction instruction sent by the display device is received, the images in the main image frame sequence received by the main transmission path corresponding to the target viewpoint are sent to the display device.
The method according to claim 5, wherein, after the step of judging whether the target viewpoint is a virtual viewpoint, it further comprises:

When the target viewpoint is a virtual viewpoint, sending to the display device the images of the slave viewpoints adjacent to the target viewpoint, intercepted from a slave image frame in the slave image frame sequence.
The method according to claim 7, wherein, after the step of sending to the display device, when the target viewpoint is a virtual viewpoint, the images of the slave viewpoints adjacent to the target viewpoint intercepted from a slave image frame in the slave image frame sequence, the step of sending to the display device, upon receiving the screen switching condition satisfaction instruction sent by the display device, the third video data required by the display device to display the second target viewpoint picture corresponding to the target viewpoint further comprises:

When the screen switching condition satisfaction instruction sent by the display device is received, sending to the display device the images in the main image frame sequence received over the main transmission path corresponding to the target viewpoint, together with the images of the slave viewpoints adjacent to the target viewpoint intercepted from the slave image frames in the slave image frame sequence.
A method for processing multi-viewpoint video data, applied to an encoding device, the method comprising:

Acquiring images of various viewpoints taken by each camera, where different cameras take images corresponding to different viewpoints, and the images include at least one of a viewpoint picture and a viewpoint depth map picture;

Stitching the images of each viewpoint, and encoding the stitched images according to the shooting time and the first resolution to generate a secondary image frame sequence;

Encoding the images of each viewpoint according to the shooting time and the second resolution to generate a main image frame sequence of each viewpoint, the resolution of the images in the main image frame sequence being greater than the resolution of the images in the stitched images;

When receiving the viewpoint selection instruction sent by the decoding device, acquire the viewpoint selected by the decoding device according to the viewpoint selection instruction;

The main image frame sequence of the viewpoint selected by the decoding device is transmitted to the decoding device through the main transmission path of the viewpoint, and at the same time, the secondary image frame sequence is transmitted to the decoding device through the secondary transmission path.
The method according to claim 9, wherein said splicing the images of each said viewpoint, and encoding the spliced images according to the shooting time and the first resolution, and the step of generating a sequence of secondary image frames comprises:

Stitching the images of each of the viewpoints in a preset arrangement manner to generate a spliced image and arrangement information of the images of each of the viewpoints in the spliced image, the arrangement information at least including each of the viewpoints The viewpoint identification and the position information of the image of each viewpoint in the spliced image;

Sorting the stitched images according to the shooting time to generate a stitched image sequence;

Encoding the spliced image sequence according to the first resolution, and marking the encoded spliced image sequence by using the arrangement information, to obtain the secondary image frame sequence.
The method according to claim 10, wherein the step of marking the coded sequence of stitched images by using the arrangement information, and obtaining the sequence of secondary image frames comprises:

The arrangement information is inserted into the sequence header of the coded spliced image sequence to obtain the secondary image frame sequence.
The method according to claim 10, wherein the step of marking the encoded sequence of stitched images by using the arrangement information, and obtaining the sequence of secondary image frames further comprises:

Inserting the arrangement information into each stitched image in the encoded stitched image sequence to obtain the slave image frame sequence.
A decoding device, comprising: a memory, a processor, and a multi-viewpoint video data processing program stored in the memory and executable on the processor, wherein the multi-viewpoint video data processing program, when executed by the processor, implements the steps of the method for processing multi-viewpoint video data according to any one of claims 1-8.
An encoding device, comprising: a memory, a processor, and a multi-viewpoint video data processing program stored in the memory and executable on the processor, wherein the multi-viewpoint video data processing program, when executed by the processor, implements the steps of the method for processing multi-viewpoint video data according to any one of claims 9-12.
A storage medium, storing a multi-viewpoint video data processing program which, when executed by a processor, implements the steps of the method for processing multi-viewpoint video data according to any one of claims 1-12.