WO2023029207A1

WO2023029207A1 - Video data processing method, decoding device, encoding device, and storage medium

Info

Publication number: WO2023029207A1
Application number: PCT/CN2021/129225
Authority: WO
Inventors: 王荣刚; 王振宇; 高文
Original assignee: 北京大学深圳研究生院
Priority date: 2021-09-02
Filing date: 2021-11-08
Publication date: 2023-03-09
Also published as: CN113900572A

Abstract

Disclosed in the present application are a video data processing method, a decoding device, an encoding device, and a storage medium. The method comprises: when a generation and display instruction is received, obtaining a current viewpoint; intercepting, from a video frame received by a transmission path corresponding to the current viewpoint, an image required for generating an image of the current viewpoint; when a viewpoint switching instruction is received, obtaining a target viewpoint, and intercepting, from the video frame received by the transmission path corresponding to the current viewpoint, an image required for generating an image of the target viewpoint; and when a switching condition is satisfied, intercepting, from a video frame received by a transmission path corresponding to the target viewpoint, the image required for generating the image of the target viewpoint and displaying same.

Description

Video data processing method, decoding device, encoding device and storage medium

Related application

This application claims priority to Chinese patent application No. 202111040999.9, filed on September 2, 2021 and entitled "Video Data Processing Method, Decoding Device, Encoding Device, and Storage Medium", which is hereby incorporated by reference in its entirety.

Technical field

The present application relates to the technical field of video data processing, and in particular to a video data processing method, a decoding device, an encoding device and a storage medium.

Background art

Free-viewpoint technology allows a video to be viewed from a freely chosen viewpoint. Current free-viewpoint applications let viewers watch a video through continuous viewpoints within a certain range. The viewer can set the position and angle of the viewpoint and is no longer limited to the viewing angle of a fixed camera, so that the video can be watched from a 360° free viewing angle.

Current free-viewpoint applications often use a spatial stitching method to splice the single-channel videos of multiple viewpoints together. When the user switches viewpoints in the free-viewpoint application, the application takes the single-channel video corresponding to the switched viewpoint from the stitched video and displays it to the user. However, after the single-channel videos of multiple viewpoints are spliced using the spatial stitching method, the resolution of the single-channel video of each viewpoint decreases. The picture resolution available to the free-viewpoint application is therefore insufficient, and the resolution of the finally generated viewpoint picture is low.

Technical problem

The embodiments of the present application provide a video data processing method, a decoding device, an encoding device, and a storage medium, aiming to solve the technical problem that, after the single-channel videos of multiple viewpoints are spliced using the spatial stitching method, the picture resolution available for display by the free-viewpoint application is insufficient, which in turn reduces the resolution of the finally generated viewpoint picture.

Technical solution

An embodiment of the present application provides a video data processing method applied to a decoding device. The video data processing method includes:

When receiving the viewpoint generation and display instruction sent by the display device, acquiring the current viewpoint of the display device according to the viewpoint generation and display instruction;

Intercepting the image required to generate the image of the current viewpoint from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint, and sending that image to the display device to generate the current viewpoint picture;

When a viewpoint switching instruction sent by the display device is received, acquiring the target viewpoint corresponding to the viewpoint switching instruction, intercepting the image required to generate the image of the target viewpoint from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint, and sending that image to the display device to generate the target viewpoint picture;

When the switching condition is satisfied, intercepting the image required to generate the image of the target viewpoint from a video frame of the image frame sequence received over the transmission path corresponding to the target viewpoint, and sending that image to the display device to generate the target viewpoint picture.

In an embodiment, the step of intercepting the image required to generate the image of the target viewpoint from the video frame of the image frame sequence received by the transmission path corresponding to the current viewpoint includes:

Acquiring the viewpoint identifier corresponding to the target viewpoint and the arrangement information of the video frames;

Determining, according to the arrangement information and the viewpoint identifier, the position information of the image required for generating the image of the target viewpoint in the video frame;

Intercepting an image required to generate an image of a target viewpoint corresponding to the position information from a video frame of the sequence of image frames received by the transmission path corresponding to the current viewpoint.

In an embodiment, the image required for generating the image of the current viewpoint and the image required for generating the image of the target viewpoint each include at least one of a viewpoint frame or a viewpoint depth map frame, and the resolution of the picture corresponding to the current viewpoint is greater than the resolution of the picture corresponding to the target viewpoint.

In an embodiment, the switching condition includes at least one of the following:

The time stamp of the video frame corresponding to the currently displayed image is the same as the time stamp of the video frame in the transmission path corresponding to the target viewpoint;

The time stamp of the video frame of the image frame sequence received from the transmission path corresponding to the current viewpoint reaches a preset time point.

An embodiment of the present application provides a video data processing method applied to an encoding device. The video data processing method includes:

Acquiring images of the viewpoints captured by the cameras, different cameras capturing images corresponding to different viewpoints, wherein each viewpoint is used as a main viewpoint to generate a first image, the viewpoints other than the main viewpoint are used as secondary viewpoints corresponding to that main viewpoint to generate second images of the secondary viewpoints, and each image comprises at least one of a viewpoint frame or a viewpoint depth map frame;

Splicing the first image corresponding to each main viewpoint with the second images of the secondary viewpoints corresponding to that main viewpoint to obtain a video frame corresponding to the main viewpoint, and encoding the spliced video frames corresponding to the main viewpoint according to the shooting time to generate a corresponding image frame sequence, wherein the resolution of the first image is greater than the resolution of the second image;

When the decoding device receives the viewpoint generation and display instruction sent by the display device and acquires the current viewpoint of the display device according to that instruction, transmitting the image frame sequence corresponding to the current viewpoint to the decoding device over the transmission path corresponding to the current viewpoint.

In an embodiment, the step of splicing the first image corresponding to each main viewpoint with the second images of the secondary viewpoints corresponding to that main viewpoint to obtain a video frame corresponding to the main viewpoint, and encoding the spliced video frames corresponding to the main viewpoint according to the shooting time to generate a corresponding image frame sequence, includes:

Splicing the first image corresponding to each main viewpoint with the second images of the secondary viewpoints corresponding to that main viewpoint to obtain a video frame corresponding to the main viewpoint;

Sorting the video frames corresponding to the main viewpoint according to the shooting time to generate a spliced image sequence;

Encoding the spliced image sequence to obtain an image frame sequence corresponding to each of the main viewpoints, wherein the first frame image in the image frame sequence corresponding to each of the main viewpoints is encoded as an I frame.

In an embodiment, the step of encoding the spliced image sequence to obtain an image frame sequence corresponding to each of the main viewpoints includes:

Obtaining the arrangement information of the video frames corresponding to the main viewpoint, the arrangement information at least including the viewpoint identification of each viewpoint and the position information of the image of each viewpoint in the video frame corresponding to the main viewpoint;

Encoding the spliced image sequence, and inserting the arrangement information into a sequence header of the encoded spliced image sequence to obtain an image frame sequence corresponding to the main viewpoint.
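The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify a header syntax, so the length-prefixed JSON layout, the `ViewRect` type, and the function names `pack_sequence_header` and `unpack_sequence_header` are hypothetical, and `encoded_payload` merely stands in for the output of a real video encoder.

```python
import json
import struct
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ViewRect:
    """Arrangement entry for one viewpoint image inside a spliced video frame."""
    viewpoint_id: int  # viewpoint identifier (viewpoint number)
    x: int             # left edge of the image in the spliced frame, in pixels
    y: int             # top edge of the image in the spliced frame, in pixels
    width: int
    height: int


def pack_sequence_header(rects, encoded_payload: bytes) -> bytes:
    """Prepend the arrangement information to an already-encoded sequence.

    Hypothetical layout: 4-byte big-endian header length, JSON header, payload.
    """
    header = json.dumps([asdict(r) for r in rects]).encode("utf-8")
    return struct.pack(">I", len(header)) + header + encoded_payload


def unpack_sequence_header(data: bytes):
    """Recover the arrangement information and the encoded payload."""
    (header_len,) = struct.unpack(">I", data[:4])
    entries = json.loads(data[4:4 + header_len].decode("utf-8"))
    rects = [ViewRect(**entry) for entry in entries]
    return rects, data[4 + header_len:]
```

Carrying the arrangement information once per sequence, rather than per frame, matches the text's statement that it is inserted into the sequence header; a decoder can then read it before cropping any frame of that sequence.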

In addition, in order to achieve the above purpose, the present application also provides a decoding device, the decoding device includes:

The first receiving module is configured to acquire the current viewpoint of the display device according to the viewpoint generation display instruction when receiving the viewpoint generation display instruction sent by the display device;

The first sending module is configured to intercept the image of the current viewpoint from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint, and send the image of the current viewpoint to the display device to generate the current viewpoint picture;

The second receiving module is configured to, when a viewpoint switching instruction sent by the display device is received, acquire the target viewpoint corresponding to the viewpoint switching instruction, intercept the image required for generating the image of the target viewpoint from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint, and send that image to the display device to generate the target viewpoint picture, wherein the image required for generating the image of the current viewpoint and the image required for generating the image of the target viewpoint each include at least one of a viewpoint picture or a viewpoint depth map picture, and the resolution of the picture corresponding to the current viewpoint is greater than the resolution of the picture corresponding to the target viewpoint;

The second sending module is configured to, when the switching condition is met, intercept the image required for generating the image of the target viewpoint from a video frame of the image frame sequence received over the transmission path corresponding to the target viewpoint, and send that image to the display device to generate the current viewpoint picture.

In addition, in order to achieve the above purpose, the present application also provides an encoding device, the encoding device includes:

The image acquisition module is configured to acquire images of the viewpoints captured by the cameras, different cameras capturing images corresponding to different viewpoints, wherein each viewpoint is used as a main viewpoint to generate a first image, the viewpoints other than the main viewpoint are used as secondary viewpoints corresponding to that main viewpoint to generate second images of the secondary viewpoints, and each image includes at least one of a viewpoint frame or a viewpoint depth map frame;

The splicing and coding module is configured to splice the first image corresponding to each main viewpoint with the second images of the secondary viewpoints corresponding to that main viewpoint to obtain a video frame corresponding to the main viewpoint, and to encode the spliced video frames corresponding to the main viewpoint according to the shooting time to generate a corresponding image frame sequence, wherein the resolution of the first image is greater than the resolution of the second image;

The data transmission module is configured to transmit the image frame sequence corresponding to the current viewpoint to the decoding device over the transmission path corresponding to the current viewpoint.

In addition, in order to achieve the above object, the present application also provides a smart device, which includes: a memory, a processor, and a video data processing program stored in the memory and operable on the processor. When the video data processing program is executed by the processor, the steps of the above video data processing method are realized.

In addition, to achieve the above object, the present application also provides a storage medium, the storage medium stores a video data processing program, and when the video data processing program is executed by a processor, the steps of the above video data processing method are implemented.

Beneficial effect

In the technical solution of the video data processing method, decoding device, encoding device, and storage medium provided in the embodiments of the present application, when a viewpoint generation and display instruction sent by the display device is received, the current viewpoint of the display device is acquired according to that instruction; the image required to generate the image of the current viewpoint is intercepted from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint and sent to the display device to generate the current viewpoint picture; when a viewpoint switching instruction sent by the display device is received, the target viewpoint corresponding to that instruction is acquired, and the image required to generate the image of the target viewpoint is intercepted from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint and sent to the display device to generate the target viewpoint picture; and when the switching condition is met, the image required to generate the image of the target viewpoint is intercepted from a video frame of the image frame sequence received over the transmission path corresponding to the target viewpoint and sent to the display device to generate the target viewpoint picture. This solves the technical problem that, after the single-channel videos of multiple viewpoints are spliced using the spatial stitching method, the picture resolution available for display by the free-viewpoint application is insufficient and the resolution of the finally generated viewpoint picture is reduced, and it improves the display effect of the viewpoint picture.

Description of drawings

Fig. 1 is a schematic flow chart of the first embodiment of the video data processing method of the present application;

Fig. 2 is a schematic flow chart of the second embodiment of the video data processing method of the present application;

Fig. 3 is a schematic flow chart of the third embodiment of the video data processing method of the present application;

Fig. 4 is a schematic flow chart of the fourth embodiment of the video data processing method of the present application;

Fig. 5 is a schematic flow chart of the fifth embodiment of the video data processing method of the present application;

Fig. 6 is a schematic diagram of the video frame switching of the present application;

Fig. 7 is a schematic diagram of the arrangement of video frames of the present application;

Fig. 8 is a schematic flow chart of multi-viewpoint video data in the decoding device of the present application;

Fig. 9 is a schematic flow chart of multi-viewpoint video data in the encoding device of the present application.

The realization of the object of the present application, functional characteristics and advantages will be further described in conjunction with the embodiments, with reference to the accompanying drawings. The above-mentioned accompanying drawings are only a diagram of an embodiment, rather than the entirety of the present application.

Embodiments of the present invention

In order to better understand the above-mentioned technical solutions, exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

The embodiments of the present application provide an embodiment of the video data processing method. It should be noted that although a logical sequence is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from the one shown here.

As shown in Figure 1, in the first embodiment of the present application, the video data processing method of the present application includes the following steps:

Step S110, when receiving the viewpoint generation and display instruction sent by the display device, acquire the current viewpoint of the display device according to the viewpoint generation and display instruction;

Step S120, intercepting the image required to generate the image of the current viewpoint from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint, and sending that image to the display device to generate the current viewpoint picture;

Step S130, when a viewpoint switching instruction sent by the display device is received, acquiring the target viewpoint corresponding to the viewpoint switching instruction, intercepting the image required to generate the image of the target viewpoint from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint, and sending that image to the display device to generate the target viewpoint picture;

Step S140, when the switching condition is met, intercepting the image required to generate the image of the target viewpoint from a video frame of the image frame sequence received over the transmission path corresponding to the target viewpoint, and sending that image to the display device to generate the target viewpoint picture.
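As a reading aid, steps S110 to S140 can be viewed as a small state machine that tracks two things: which transmission path the decoding device is reading, and which viewpoint image it crops out for the display device. The sketch below is a hypothetical illustration; the class `DecoderState` and its method names do not come from the application, and viewpoints are represented simply as integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecoderState:
    """Tracks which transmission path is read and which viewpoint is shown."""
    path_viewpoint: Optional[int] = None     # main viewpoint of the path being received
    display_viewpoint: Optional[int] = None  # viewpoint whose image goes to the display
    pending_target: Optional[int] = None     # target viewpoint awaiting the path switch

    def on_generate_display(self, current_viewpoint: int) -> None:
        # S110/S120: read the path of the current viewpoint and show that viewpoint.
        self.path_viewpoint = current_viewpoint
        self.display_viewpoint = current_viewpoint
        self.pending_target = None

    def on_switch(self, target_viewpoint: int) -> None:
        # S130: keep reading the current path, but crop the target viewpoint's
        # (lower-resolution) image out of its video frames.
        self.display_viewpoint = target_viewpoint
        self.pending_target = target_viewpoint

    def on_switching_condition_met(self) -> None:
        # S140: move to the target viewpoint's own path (higher resolution).
        if self.pending_target is not None:
            self.path_viewpoint = self.pending_target
            self.pending_target = None

    def source(self):
        """Return (path to read frames from, viewpoint image to crop)."""
        return self.path_viewpoint, self.display_viewpoint
```

For example, after `on_generate_display(2)` and `on_switch(3)`, `source()` returns `(2, 3)`: viewpoint 3 is shown immediately, but still from the frames of path 2. Only after `on_switching_condition_met()` does it return `(3, 3)`.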

In this embodiment, the free-viewpoint application allows the viewer to watch the video through continuous viewpoints within a certain range. The viewer can set the position and angle of the viewpoint and is no longer limited to the viewing angle of a fixed camera. Such an application usually requires multiple cameras to shoot simultaneously and to generate video pictures of multiple viewpoints at the same time. In the live-viewing scenario, the image corresponding to the current viewpoint is intercepted in real time from the video frame corresponding to the current viewpoint for viewing. In the on-demand scenario, the video frame corresponding to the current viewpoint at the current moment is obtained from the image frame sequence, and the image corresponding to the current viewpoint is intercepted from it for viewing. However, splicing the single-channel videos of multiple viewpoints using the spatial stitching method leads to the technical problem that the resolution of the single-channel video of each viewpoint displayed by the free-viewpoint application decreases. The present application therefore designs a video data processing method that ensures the free viewpoint can be switched with zero delay while the main viewpoint still provides a higher picture resolution.

In this embodiment, multiple cameras can be deployed. The image collected by each camera is the image corresponding to one viewpoint, and the images collected by different cameras are spliced together: one of the viewpoints serves as the main viewpoint and the other viewpoints serve as secondary viewpoints. The video frames transmitted in each transmission path are obtained by splicing the image of the main viewpoint corresponding to that transmission path with the images of the secondary viewpoints other than the main viewpoint, and then encoding the result. That is, each video frame is formed by splicing the images of the main viewpoint and the secondary viewpoints, and the resolution of the image corresponding to the main viewpoint is greater than the resolution of the image corresponding to a secondary viewpoint.
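One way to picture the per-path frames described above is to compute, for each transmission path, where every viewpoint image sits in the spliced frame. The sketch below is an assumption-laden illustration: the application does not fix a particular layout or downscaling factor, so the "main image on top, downscaled secondary images in rows underneath" arrangement, the `scale` parameter, and the function names are all hypothetical.

```python
def build_layout(main_id, viewpoint_ids, width, height, scale=4):
    """Return {viewpoint_id: (x, y, w, h)} for one spliced video frame.

    Assumed layout: the main viewpoint keeps full resolution at the top-left;
    every secondary viewpoint is downscaled by `scale` in each dimension and
    placed in rows underneath, left to right.
    """
    sub_w, sub_h = width // scale, height // scale
    per_row = width // sub_w  # number of secondary images that fit in one row
    layout = {main_id: (0, 0, width, height)}
    secondaries = [v for v in viewpoint_ids if v != main_id]
    for i, vid in enumerate(secondaries):
        row, col = divmod(i, per_row)
        layout[vid] = (col * sub_w, height + row * sub_h, sub_w, sub_h)
    return layout


def layouts_for_all_paths(viewpoint_ids, width, height, scale=4):
    """One layout per transmission path: each viewpoint is the main one once."""
    return {v: build_layout(v, viewpoint_ids, width, height, scale)
            for v in viewpoint_ids}
```

With ten viewpoints, `layouts_for_all_paths` yields ten layouts, one per transmission path. In each of them exactly one viewpoint occupies the full-resolution slot and the other nine occupy reduced slots, which is the property the paragraph above relies on.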

In this embodiment, the current viewpoint is the main viewpoint. When the decoding device receives the viewpoint generation and display instruction sent by the display device, it acquires the current viewpoint of the display device according to that instruction. Specifically, the decoding device parses the viewpoint generation and display instruction to obtain the current viewpoint of the display device corresponding to the instruction. After obtaining the current viewpoint, it intercepts the image required to generate the image of the current viewpoint from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint, and sends that image to the display device to generate the current viewpoint picture, so that the video picture currently being watched is a high-resolution picture.

In this embodiment, specifically, before the image required to generate the image of the current viewpoint is intercepted, the arrangement of the viewpoints in the video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint must be determined. The viewpoints in the video frame are arranged according to a preset arrangement method. Using this arrangement method, the images corresponding to the viewpoints P1-P10 in Fig. 7 can be spliced into the same video frame, and the arrangement information of each viewpoint in the video frame is generated. The arrangement information of each viewpoint includes the coordinates, in the video frame, of the pixel at the upper left corner of the corresponding viewpoint image, the width and height of the corresponding viewpoint image, the viewpoint number corresponding to the image, and so on. After the arrangement of the viewpoints in the video frame has been determined, the viewpoint identifier corresponding to the main viewpoint and the arrangement information of the video frame are acquired, the position information of the image corresponding to the main viewpoint in the video frame is determined according to the arrangement information and the viewpoint identifier, and the image required to generate the image of the current viewpoint at that position is intercepted from the video frame received over the transmission path corresponding to the current viewpoint. The image required to generate the image of the current viewpoint may be a viewpoint picture corresponding to the current viewpoint, or a viewpoint depth map picture corresponding to a virtual viewpoint; a virtual viewpoint is a fictitious viewpoint located between the viewpoints of two cameras.

In this embodiment, after the viewpoint switching instruction sent by the display device is received, the target viewpoint corresponding to the viewpoint switching instruction is acquired. At this time the switch to the target viewpoint cannot take place immediately, and the video frame of the transmission path corresponding to the target viewpoint cannot yet be obtained. Therefore, the image required to generate the image of the target viewpoint is intercepted from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint. The image required to generate the image of the target viewpoint may be the viewpoint picture corresponding to the target viewpoint, or a viewpoint depth map picture corresponding to a virtual viewpoint.

It should be emphasized that the image required for the image of the target viewpoint is located in the same video frame as the image required for the image of the current viewpoint, namely the video frame received over the transmission path corresponding to the current viewpoint. The target viewpoint is one of the secondary viewpoints, and the resolution of the picture corresponding to the current viewpoint is greater than the resolution of the picture corresponding to the target viewpoint. This is because the switching process involves a delay: the video frame from the transmission path of the target viewpoint to be switched to cannot be obtained immediately, so the picture displayed during this period is at low resolution. Therefore, the present application first intercepts, from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint, the image required to generate the image of the target viewpoint to be switched to, and displays it at low resolution.

In this embodiment, when the switching condition is satisfied, the image required to generate the image of the target viewpoint is intercepted from a video frame of the image frame sequence received over the transmission path corresponding to the target viewpoint, and that image is sent to the display device so that the display device generates the image of the target viewpoint. At this point the image of the target viewpoint changes from low-resolution display to high-resolution display. The switching condition includes at least one of the following: the time stamp of the video frame corresponding to the currently displayed image is the same as the time stamp of the video frame in the transmission path corresponding to the target viewpoint; or the time stamp of the video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint reaches a preset time point, which can be set according to the actual situation. It should be emphasized that the transmission path corresponding to the target viewpoint is not the same path as the transmission path corresponding to the current viewpoint, and that the image frame sequence received over the transmission path corresponding to the target viewpoint is not the same image frame sequence as the one received over the transmission path corresponding to the current viewpoint.
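The "at least one of" test described above can be written as a small predicate. This is a minimal sketch under assumptions: time stamps are treated as plain numbers in a common unit, "reaches a preset time point" is interpreted as greater than or equal to, and the function and parameter names are hypothetical.

```python
def switching_condition_met(displayed_ts, target_path_ts, current_path_ts,
                            preset_ts=None):
    """Return True when at least one of the two listed conditions holds.

    displayed_ts    -- time stamp of the video frame behind the displayed image
    target_path_ts  -- time stamp of the frame available on the target path,
                       or None if nothing has arrived on that path yet
    current_path_ts -- time stamp of the latest frame from the current path
    preset_ts       -- optional preset time point (assumed, e.g. a period boundary)
    """
    # Condition 1: the target path offers a frame with the displayed time stamp.
    timestamps_match = (target_path_ts is not None
                        and displayed_ts == target_path_ts)
    # Condition 2: the current path has reached the preset time point.
    reached_preset = preset_ts is not None and current_path_ts >= preset_ts
    return timestamps_match or reached_preset
```

Until the predicate returns true, the decoding device keeps cropping the target viewpoint's low-resolution image from the current path; once it returns true, it starts cropping from the target viewpoint's own path.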

In this embodiment, taking Figure 6 as an example, P3 represents the third viewpoint, P3_2 is the second time period of viewpoint No. 3, and P3_3 is the third time period of viewpoint No. 3. Going from P3_2 to P3_3 is normal playback: after one time period has been played, the content of the next time period is played, and the viewpoint is not switched. Suppose instead that a viewpoint switch occurs during the playback of P2_1, for example a switch to viewpoint A when playback is halfway through. At that moment no high-resolution picture of viewpoint A is available, so for the remaining time of the period corresponding to P2_1 only the low-resolution picture of viewpoint A can be obtained from the merged stream. When playback reaches the time period corresponding to P3_2, the high-resolution picture of viewpoint A can be obtained from P3_2, so from P3_2 onward the high-resolution picture of viewpoint A is displayed.
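The timing in this example reduces to simple arithmetic if one assumes, as the application does not state explicitly, that all time periods have the same fixed length and that the high-resolution stream can only be joined at the start of a period. Under those assumptions, a hypothetical helper `high_res_start` gives the moment at which the display changes from low to high resolution:

```python
def high_res_start(switch_time, segment_duration):
    """Start time of the first time period that begins after the switch request.

    Until this time the target viewpoint is shown at low resolution from the
    stream already being played; from this time on its own stream is used.
    """
    completed = int(switch_time // segment_duration)  # periods fully elapsed
    return (completed + 1) * segment_duration
```

For instance, with 10-second periods and a switch requested 5 seconds into the first period, the low-resolution picture is shown for the remaining 5 seconds and the high-resolution picture starts at the 10-second mark.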

In this embodiment, according to the above technical solution, when the viewpoint generation and display instruction sent by the display device is received, the current viewpoint of the display device is acquired according to that instruction; the image required to generate the image of the current viewpoint is intercepted from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint and sent to the display device to generate the current viewpoint picture; when the viewpoint switching instruction sent by the display device is received, the target viewpoint corresponding to that instruction is acquired, and the image required to generate the image of the target viewpoint is intercepted from a video frame of the image frame sequence received over the transmission path corresponding to the current viewpoint and sent to the display device to generate the target viewpoint picture, wherein the image required for generating the image of the current viewpoint and the image required for generating the image of the target viewpoint each include at least one of a viewpoint picture or a viewpoint depth map picture, and the resolution of the picture corresponding to the current viewpoint is greater than the resolution of the picture corresponding to the target viewpoint; and when the switching condition is satisfied, the image required to generate the image of the target viewpoint is intercepted from a video frame of the image frame sequence received over the transmission path corresponding to the target viewpoint and sent to the display device to generate the target viewpoint picture. This solves the technical problem that, after the single-channel videos of multiple viewpoints are spliced using the spatial stitching method, the picture resolution available for display by the free-viewpoint application is insufficient and the resolution of the finally generated viewpoint picture is reduced, and it improves the display effect of the viewpoint picture.

Figure 2 shows the second embodiment of the present application. Based on step S130 of the first embodiment, the second embodiment of the present application includes the following steps:

Step S131, acquiring the viewpoint identifier corresponding to the target viewpoint and the arrangement information of the video frames;

Step S132, determining the position information of the image in the video frame required for generating the image of the target viewpoint according to the arrangement information and the viewpoint identifier;

Step S133, intercepting the image required to generate the image of the target viewpoint corresponding to the position information from a video frame of the sequence of image frames received by the transmission path corresponding to the current viewpoint.

In this embodiment, the viewpoint identifier is a viewpoint number, that is, the number corresponding to each viewpoint. The arrangement information of the video frame is generated by arranging the viewpoints in the video frame according to a preset arrangement method. Specifically, using the arrangement method, the images corresponding to viewpoints P1-P10 in FIG. 7 can be spliced into the same video frame, and the arrangement information of each viewpoint in the video frame is generated. The arrangement information of each viewpoint includes the coordinates of the upper-left pixel of the corresponding viewpoint image in the video frame, the width and height of that image, the viewpoint number corresponding to the image, and other information. According to the arrangement information, the image corresponding to a viewpoint can be intercepted from the video frame and displayed. For example, in this embodiment, the image corresponding to the target viewpoint is intercepted from the video frame received through the transmission path corresponding to the current viewpoint as follows: the viewpoint identifier corresponding to the target viewpoint and the arrangement information of the video frame are acquired; the position information, in the video frame, of the image corresponding to the target viewpoint is determined according to the arrangement information and the viewpoint identifier; and the image corresponding to the target viewpoint at that position is intercepted from the video frame received through the transmission path corresponding to the current viewpoint for display.
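Steps S131-S133 can be illustrated with a minimal Python sketch. It assumes the decoded video frame is available as a list of pixel rows and that the arrangement information has already been parsed into records; the names `ViewRegion`, `locate_view` and `crop_view` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ViewRegion:
    """One arrangement entry (hypothetical record layout)."""
    view_id: int   # viewpoint number
    x: int         # column of the upper-left pixel in the video frame
    y: int         # row of the upper-left pixel in the video frame
    width: int
    height: int


def locate_view(arrangement: List[ViewRegion], view_id: int) -> ViewRegion:
    """S132: find the position information for a viewpoint identifier."""
    for region in arrangement:
        if region.view_id == view_id:
            return region
    raise KeyError(f"viewpoint {view_id} is not present in this video frame")


def crop_view(frame: list, region: ViewRegion) -> list:
    """S133: cut the sub-image described by `region` out of `frame`."""
    rows = frame[region.y:region.y + region.height]
    return [row[region.x:region.x + region.width] for row in rows]


# Toy 4x6 frame whose "pixels" record their own (row, column) position.
frame = [[(r, c) for c in range(6)] for r in range(4)]
arrangement = [
    ViewRegion(2, 0, 0, 4, 4),   # main viewpoint, larger region
    ViewRegion(1, 4, 0, 2, 2),   # secondary viewpoint
    ViewRegion(3, 4, 2, 2, 2),   # secondary viewpoint
]
target_image = crop_view(frame, locate_view(arrangement, 3))
```

The same lookup serves both the current viewpoint and the target viewpoint; only the viewpoint identifier passed to `locate_view` differs.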

According to the above technical solution, this embodiment acquires the viewpoint identifier corresponding to the target viewpoint and the arrangement information of the video frame; determines, according to the arrangement information and the viewpoint identifier, the position information in the video frame of the image required to generate the image of the target viewpoint; and intercepts the image required to generate the image of the target viewpoint at that position from the video frame of the image frame sequence received through the transmission path corresponding to the current viewpoint. By these technical means, low-resolution display of the image corresponding to the target viewpoint is realized.

Figure 3 shows the third embodiment of the present application. The third embodiment of the present application includes the following steps:

Step S210, acquiring images of each viewpoint captured by each camera, wherein different cameras capture images corresponding to different viewpoints;

Step S220, splicing the first image corresponding to each main viewpoint and the second image of the secondary viewpoint corresponding to the main viewpoint to obtain the video frame corresponding to the main viewpoint, and encoding the spliced video frames corresponding to the main viewpoint according to the shooting time to generate a corresponding sequence of image frames;

Step S230, when the decoding device receives the viewpoint generation and display instruction sent by the display device and acquires the current viewpoint of the display device according to that instruction, transmitting the sequence of image frames corresponding to the current viewpoint to the decoding device through the transmission path corresponding to the current viewpoint.

In this embodiment, multiple cameras can be deployed, and the number of cameras can be set according to the actual situation. The images of each viewpoint captured by each camera are acquired, with different cameras capturing images corresponding to different viewpoints. The images may be the viewpoint pictures corresponding to the viewpoints, or the viewpoint depth map pictures of the virtual viewpoints corresponding to the viewpoints. Each viewpoint is used in turn as the main viewpoint to generate the first image, and the viewpoints other than the main viewpoint are used as the secondary viewpoints corresponding to that main viewpoint to generate the second images. For example, 10 cameras can be deployed to shoot video around a shooting focus; P1-P10 are the images captured by the respective cameras, that is, the images of viewpoints 1-10.

In this embodiment, the images collected by different cameras are spliced, and the video frames corresponding to the main viewpoint are encoded and then sent to the display terminal for display. The image collected by each camera is the image corresponding to one viewpoint; one of the viewpoints is the main viewpoint, and the other viewpoints are secondary viewpoints. The video frame transmitted in each transmission path is obtained by encoding the image spliced from the main viewpoint corresponding to that transmission path and the secondary viewpoints other than the main viewpoint. That is, the video frame is obtained by splicing the images of the main viewpoint and the secondary viewpoints, and the resolution of the image corresponding to the main viewpoint is greater than the resolution of the images corresponding to the secondary viewpoints.

In this embodiment, the first image corresponding to each main viewpoint and the second images of the secondary viewpoints corresponding to the main viewpoint are spliced to obtain the video frame corresponding to the main viewpoint, and the video frames corresponding to the main viewpoint are fed, in order of shooting time, to a general-purpose HEVC encoder, which encodes them to generate the corresponding image frame sequence, wherein the resolution of the first image is greater than the resolution of the second image. For example, as shown in Figure 7, when P2 is the main viewpoint, the resolution corresponding to P2 is 2880*1620 and the resolution corresponding to each secondary viewpoint is 960*540; that is, when P2 is used as the main viewpoint, the resolution of the P2 viewpoint is greater than that of the secondary viewpoints. In the encoding process, the encoder needs to use each of the 10 images in turn as the main viewpoint to generate the video frames of 10 transmission paths; each transmission-path video frame is composed of the image corresponding to the main viewpoint and the images of the other 9 secondary viewpoints. For example, Figure 7 shows a video frame with the P2 viewpoint as the main viewpoint: P1 and P3-P10 are the secondary viewpoints, and the images of all viewpoints are spliced to obtain the video frame corresponding to the P2 main viewpoint. The splicing method when another viewpoint is used as the main viewpoint is the same as that for the P2 main viewpoint and is not repeated here.
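The per-main-viewpoint splicing can be sketched as a layout computation. The text gives only the two resolutions (2880*1620 for the main viewpoint and 960*540 for each secondary viewpoint); the placement used below (main image at the top left, three secondary images in a column to its right, the remaining six in rows underneath) is an assumption made for illustration and is not taken from Figure 7. `Region`, `build_arrangement` and `frame_size` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Region:
    view_id: int
    x: int
    y: int
    width: int
    height: int


def build_arrangement(main_id: int,
                      view_ids: Iterable[int],
                      main_size: Tuple[int, int] = (2880, 1620),
                      sub_size: Tuple[int, int] = (960, 540)) -> List[Region]:
    """Place the main view at (0, 0) and pack secondary views around it.

    Assumed packing: fill a column right of the main view first, then
    fill full-width rows below the main view.
    """
    main_w, main_h = main_size
    sub_w, sub_h = sub_size
    regions = [Region(main_id, 0, 0, main_w, main_h)]
    secondary = [v for v in view_ids if v != main_id]
    per_column = main_h // sub_h            # secondary views beside the main view
    per_row = (main_w + sub_w) // sub_w     # secondary views per row below it
    for i, view_id in enumerate(secondary):
        if i < per_column:
            x, y = main_w, i * sub_h
        else:
            j = i - per_column
            x, y = (j % per_row) * sub_w, main_h + (j // per_row) * sub_h
        regions.append(Region(view_id, x, y, sub_w, sub_h))
    return regions


def frame_size(regions: List[Region]) -> Tuple[int, int]:
    """Smallest frame (width, height) that contains every region."""
    return (max(r.x + r.width for r in regions),
            max(r.y + r.height for r in regions))


# One arrangement per transmission path: each of the 10 viewpoints is main once.
arrangements = {m: build_arrangement(m, range(1, 11)) for m in range(1, 11)}
p2_layout = arrangements[2]
```

Under this assumed packing, every transmission path carries a frame of the same size, and only the viewpoint occupying the large region changes from path to path.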

In this embodiment, the image collected by each camera is the image corresponding to one viewpoint; one of the viewpoints is taken as the main viewpoint and the other viewpoints as secondary viewpoints. The video frame transmitted in each transmission path is obtained by encoding the image spliced from the main viewpoint corresponding to that transmission path and the secondary viewpoints other than the main viewpoint, and the resolution of the image corresponding to the main viewpoint is greater than the resolution of the image corresponding to a secondary viewpoint; that is, the resolution of the first image is greater than the resolution of the second image. After the encoding is completed, when the decoding device receives the viewpoint generation and display instruction sent by the display device and acquires the current viewpoint of the display device according to that instruction, the image frame sequence corresponding to the current viewpoint is transmitted to the decoding device through the transmission path corresponding to the current viewpoint. For example, after the main viewpoint of the display device is acquired according to the viewpoint generation and display instruction, the image frame sequence corresponding to that main viewpoint is transmitted to the decoding device through the transmission path corresponding to that main viewpoint.

In this embodiment, according to the above technical solution, the images of each viewpoint captured by each camera are acquired, with different cameras capturing images corresponding to different viewpoints, wherein each viewpoint is used as the main viewpoint to generate the first image, the viewpoints other than the main viewpoint are used as the secondary viewpoints corresponding to the main viewpoint to generate the second images, and each image includes at least one of a viewpoint picture or a viewpoint depth map picture. The first image corresponding to each main viewpoint and the second images of the secondary viewpoints corresponding to the main viewpoint are spliced to obtain the video frame corresponding to the main viewpoint, and the spliced video frames corresponding to the main viewpoint are encoded according to the shooting time to generate a corresponding image frame sequence, wherein the resolution of the first image is greater than the resolution of the second image. When the decoding device receives the viewpoint generation and display instruction sent by the display device and acquires the current viewpoint of the display device according to that instruction, the image frame sequence corresponding to the current viewpoint is transmitted to the decoding device through the transmission path corresponding to the current viewpoint. By these technical means, the images corresponding to different viewpoints are encoded.

Figure 4 shows the fourth embodiment of the present application. Based on step S220 of the third embodiment, the fourth embodiment of the present application includes the following steps:

Step S221, splicing the first image corresponding to each main viewpoint and the second image of the secondary viewpoint corresponding to the main viewpoint to obtain a video frame corresponding to the main viewpoint;

Step S222, sorting the video frames corresponding to the main viewpoint according to the shooting time to generate a spliced image sequence;

Step S223, encoding the spliced image sequence to obtain an image frame sequence corresponding to each of the main viewpoints.

In this embodiment, in the on-demand viewing scenario, the video frames being watched are actually historical video frames. The first image corresponding to each main viewpoint and the second images of the secondary viewpoints corresponding to the main viewpoint are spliced to obtain the video frames corresponding to the main viewpoint, and the video frames corresponding to the main viewpoint are sorted according to the shooting time to generate a spliced image sequence corresponding to each main viewpoint. The spliced image sequence is then encoded to obtain an image frame sequence corresponding to each main viewpoint, wherein the first frame image in the image frame sequence corresponding to each main viewpoint is encoded as an I frame.
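The grouping and sorting in steps S221-S222 can be sketched as follows. The sketch assumes each spliced video frame arrives tagged with its main viewpoint number and a shooting time; the tuple layout and the name `build_sequences` are hypothetical, and the actual encoding of step S223 is left out.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def build_sequences(
    stitched_frames: Iterable[Tuple[int, float, Any]]
) -> Dict[int, List[Any]]:
    """Group spliced frames by main viewpoint and order them by shooting time.

    Each input item is (main_view_id, shooting_time, frame).
    """
    grouped = defaultdict(list)
    for main_id, shot_at, frame in stitched_frames:
        grouped[main_id].append((shot_at, frame))
    return {
        main_id: [frame for _, frame in sorted(items, key=lambda item: item[0])]
        for main_id, items in grouped.items()
    }


# Frames may arrive out of order; strings stand in for image data.
incoming = [(2, 0.04, "b2"), (1, 0.00, "a1"), (2, 0.00, "a2"), (1, 0.04, "b1")]
sequences = build_sequences(incoming)
```

Each resulting list is the spliced image sequence for one main viewpoint and would then be handed to the encoder for that transmission path.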

In this embodiment, for multiple code streams, viewpoint switching can only be performed at an I frame. The start frame is an I frame, also known as a key frame. An I frame is an intra-coded picture; it is usually the first frame of each image frame sequence, is moderately compressed, serves as a reference point for random access, and can be regarded as a complete image. During encoding, the code stream can be divided into slices at 1-second intervals, with a start frame inserted every 1 second, so that each code stream slice is 1 second long and begins with an I frame. The application encodes the first frame image in the image frame sequence corresponding to each main viewpoint as an I frame so that, when a viewpoint switching instruction is received, the video frame corresponding to the main viewpoint indicated by the viewpoint switching instruction can be obtained starting from the start frame.
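The effect of the 1-second slicing on switching latency can be sketched as follows. The frame rate is an assumption (the text specifies only the 1-second slice length), the label "P" is a placeholder for any non-I frame, and both function names are hypothetical.

```python
from typing import List


def frame_types(num_frames: int, fps: int, slice_seconds: int = 1) -> List[str]:
    """Mark the first frame of every slice as "I"; all other frames as "P"."""
    frames_per_slice = fps * slice_seconds
    return ["I" if i % frames_per_slice == 0 else "P" for i in range(num_frames)]


def next_switch_frame(request_index: int, fps: int, slice_seconds: int = 1) -> int:
    """Index of the first I frame at or after the frame where a switch is requested."""
    frames_per_slice = fps * slice_seconds
    slices_needed = -(-request_index // frames_per_slice)  # ceiling division
    return slices_needed * frames_per_slice


pattern = frame_types(6, fps=3)
switch_at = next_switch_frame(26, fps=25)
```

With 1-second slices, a switch request waits at most one second for the next I frame in the target code stream; during that wait the low-resolution image of the target viewpoint from the current transmission path can be shown instead.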

In this embodiment, according to the above technical solution, the first image corresponding to each main viewpoint and the second image of the secondary viewpoint corresponding to the main viewpoint are spliced to obtain the video frame corresponding to the main viewpoint; the video frames corresponding to the main viewpoint are sorted according to the shooting time to generate a spliced image sequence; and the spliced image sequence is encoded to obtain the image frame sequence corresponding to each main viewpoint, wherein the first frame image in the image frame sequence corresponding to each main viewpoint is encoded as an I frame. By these technical means, an image frame sequence corresponding to the main viewpoint is generated.

Figure 5 shows the fifth embodiment of the present application. Based on step S223 of the fourth embodiment, the fifth embodiment of the present application includes the following steps:

Step S2231, acquiring the arrangement information of the video frame corresponding to the main viewpoint, the arrangement information at least including the viewpoint identifier of each viewpoint and the position information of the image of each viewpoint in the video frame corresponding to the main viewpoint;

Step S2232, encoding the spliced image sequence, and inserting the arrangement information into a sequence header of the encoded spliced image sequence to obtain an image frame sequence corresponding to the main viewpoint.

In this embodiment, the viewpoint identifier is a viewpoint number, that is, the number corresponding to each viewpoint. The arrangement information of the video frame corresponding to the main viewpoint is generated by arranging the viewpoints in the video frame according to a preset arrangement method, which can be set according to the actual situation. Specifically, the first image corresponding to each main viewpoint and the second images of all the secondary viewpoints corresponding to the main viewpoint are spliced to obtain the video frames corresponding to the main viewpoint; the video frames corresponding to the main viewpoint are sorted according to the shooting time to generate a spliced image sequence; and the spliced image sequence is encoded. At the same time, the arrangement information of the video frames corresponding to the main viewpoint is acquired and inserted into the sequence header of the encoded spliced image sequence, so as to obtain the image frame sequence corresponding to the main viewpoint.

In this embodiment, the viewpoints in the video frame corresponding to the main viewpoint are arranged according to a preset arrangement method: the images corresponding to the P1-P10 viewpoints in the figure are spliced to obtain the video frame corresponding to the main viewpoint, and the arrangement information of each viewpoint in that video frame is generated. The arrangement information at least includes the viewpoint identifier of each viewpoint and the position information of the image of each viewpoint in the video frame corresponding to the main viewpoint. The arrangement information is written, as user extension information, into the image header of the video frame corresponding to the main viewpoint.
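One way to carry the arrangement information as a compact binary payload is sketched below. The byte layout (a 16-bit entry count followed by five little-endian 16-bit fields per viewpoint) is purely an assumption for illustration; the application does not specify the syntax of the extension data, and `pack_arrangement` and `unpack_arrangement` are hypothetical names.

```python
import struct
from typing import List, Tuple

# (view_id, x, y, width, height); every field must fit in an unsigned 16-bit value.
Entry = Tuple[int, int, int, int, int]

_ENTRY_FORMAT = "<5H"
_ENTRY_SIZE = struct.calcsize(_ENTRY_FORMAT)   # 10 bytes per viewpoint
_COUNT_FORMAT = "<H"
_COUNT_SIZE = struct.calcsize(_COUNT_FORMAT)   # 2 bytes


def pack_arrangement(entries: List[Entry]) -> bytes:
    """Serialise arrangement entries into an assumed header payload."""
    payload = struct.pack(_COUNT_FORMAT, len(entries))
    for entry in entries:
        payload += struct.pack(_ENTRY_FORMAT, *entry)
    return payload


def unpack_arrangement(payload: bytes) -> List[Entry]:
    """Recover the arrangement entries written by pack_arrangement."""
    (count,) = struct.unpack_from(_COUNT_FORMAT, payload, 0)
    return [
        struct.unpack_from(_ENTRY_FORMAT, payload, _COUNT_SIZE + _ENTRY_SIZE * i)
        for i in range(count)
    ]


entries = [(2, 0, 0, 2880, 1620), (1, 2880, 0, 960, 540), (3, 2880, 540, 960, 540)]
payload = pack_arrangement(entries)
restored = unpack_arrangement(payload)
```

A decoder that reads this payload from the header can feed the restored entries directly into the position lookup used when intercepting a viewpoint image.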

According to the above technical solution in this embodiment, the arrangement information of the video frames corresponding to the main viewpoint is acquired, the arrangement information at least including the viewpoint identifier of each viewpoint and the position information of the image of each viewpoint in the video frame corresponding to the main viewpoint; the spliced image sequence is encoded, and the arrangement information is inserted into the sequence header of the encoded spliced image sequence to obtain the image frame sequence corresponding to the main viewpoint. By these technical means, an image frame sequence corresponding to the main viewpoint is generated.

Based on the same inventive concept, an embodiment of the present application also provides a decoding device. As shown in Figure 8, the decoding device includes a first receiving module 10, a first sending module 20, a second receiving module 30 and a second sending module 40;

Wherein, the first receiving module 10 is configured to acquire the current viewpoint of the display device according to the viewpoint generation and display instruction when receiving the viewpoint generation and display instruction sent by the display device;

The first sending module 20 is configured to intercept the image of the current viewpoint from the video frame of the image frame sequence received by the transmission path corresponding to the current viewpoint, and send the image of the current viewpoint to the display device to generate the current viewpoint picture;

The second receiving module 30 is configured to, when receiving the viewpoint switching instruction sent by the display device, acquire the target viewpoint corresponding to the viewpoint switching instruction, intercept the image required to generate the image of the target viewpoint from the video frame of the image frame sequence received through the transmission path corresponding to the current viewpoint, and send that image to the display device to generate a target viewpoint picture, wherein the image required to generate the image of the current viewpoint and the image required to generate the image of the target viewpoint each include at least one of a viewpoint frame or a viewpoint depth map frame, and the resolution of the frame corresponding to the current viewpoint is greater than the resolution of the frame corresponding to the target viewpoint;

The second sending module 40 is configured to, when the switching condition is met, intercept the image required to generate the image of the target viewpoint from the video frame of the image frame sequence received through the transmission path corresponding to the target viewpoint, and send that image to the display device to generate the picture of the target viewpoint.

In this application, by using the above-mentioned decoding device, the video stream is decoded to obtain corresponding viewpoint pictures.
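The cooperation of the four modules can be sketched as a small state machine that decides, for each displayed frame, which transmission path to read and which viewpoint region to intercept from it. This is a simplified illustration under the assumption that the switching condition is evaluated elsewhere and passed in as a boolean; `ViewpointSwitcher` and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ViewpointSwitcher:
    current: int                    # viewpoint whose transmission path is in use
    pending: Optional[int] = None   # requested target viewpoint, if any

    def request_switch(self, target: int) -> None:
        """Handle a viewpoint switching instruction from the display device."""
        if target != self.current:
            self.pending = target

    def source_for_frame(self, switch_condition_met: bool) -> Tuple[int, int]:
        """Return (path_viewpoint, displayed_viewpoint) for the next frame."""
        if self.pending is None:
            # Steady state: high-resolution region of the current path.
            return self.current, self.current
        if switch_condition_met:
            # Move to the target viewpoint's path; it becomes the current one.
            self.current, self.pending = self.pending, None
            return self.current, self.current
        # Transition: low-resolution target region inside the current path.
        return self.current, self.pending


switcher = ViewpointSwitcher(current=2)
before = switcher.source_for_frame(False)
switcher.request_switch(5)
during = switcher.source_for_frame(False)
after = switcher.source_for_frame(True)
```

During the transition the display already shows the target viewpoint, only at the lower secondary-viewpoint resolution, until frames from the target viewpoint's own transmission path take over.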

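Based on the same inventive concept, an embodiment of the present application also provides an encoding device. As shown in the corresponding figure, the encoding device includes an image acquisition module 50, a splicing and encoding module 60 and a data transmission module 70;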

Wherein, the image acquisition module 50 is configured to acquire the images of each viewpoint captured by each camera, with different cameras capturing images corresponding to different viewpoints, wherein each viewpoint is used as the main viewpoint to generate the first image, the viewpoints other than the main viewpoint are used as the secondary viewpoints corresponding to the main viewpoint to generate the second images of the secondary viewpoints, and each image includes at least one of a viewpoint frame or a viewpoint depth map frame;

The splicing and encoding module 60 is configured to splice the first image corresponding to each main viewpoint and the second image of the secondary viewpoint corresponding to the main viewpoint to obtain a video frame corresponding to the main viewpoint, and encode the spliced video frames corresponding to the main viewpoint according to the shooting time to generate a corresponding image frame sequence, wherein the resolution of the first image is greater than the resolution of the second image;

The data transmission module 70 is configured to, when the decoding device receives the viewpoint generation and display instruction sent by the display device and acquires the current viewpoint of the display device according to that instruction, transmit the image frame sequence corresponding to the current viewpoint to the decoding device through the transmission path corresponding to the current viewpoint.

In the present application, by adopting the above-mentioned encoding device, the viewpoint pictures are encoded to obtain corresponding video streams.

Based on the same inventive concept, an embodiment of the present application also provides a storage medium storing a video data processing program. When the video data processing program is executed by a processor, the steps of each of the above video data processing methods are implemented and the same technical effects can be achieved; to avoid repetition, details are not repeated here.

Since the storage medium provided by the embodiment of the present application is the storage medium used to implement the method of the embodiment of the present application, those skilled in the art can, based on the method introduced in the embodiment of the present application, understand the specific structure and variations of the storage medium, so they are not repeated here. All storage media used in the methods of the embodiments of the present application fall within the intended protection scope of the present application.

Those skilled in the art should understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowchart and/or block diagram, and combinations of flows and/or blocks in the flowchart and/or block diagram, can be realized by computer program instructions. These computer program instructions may be provided to a general purpose computer, special purpose computer, embedded processor, or processor of other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, causing a sequence of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

It should be noted that, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, and third, etc. does not indicate any order. These words can be interpreted as names.

While preferred embodiments of the present application have been described, additional changes and modifications to these embodiments can be made by those skilled in the art once the basic inventive concept is appreciated. Therefore, the appended claims are intended to be construed to cover the preferred embodiment and all changes and modifications which fall within the scope of the application.

Obviously, those skilled in the art can make various changes and modifications to the application without departing from the spirit and scope of the application. In this way, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include these modifications and variations.

Claims

A video data processing method, wherein it is applied to a decoding device; the video data processing method includes:

When receiving the viewpoint generation and display instruction sent by the display device, acquiring the current viewpoint of the display device according to the viewpoint generation and display instruction;

Intercepting the image required to generate the image of the current viewpoint from the video frame of the image frame sequence received by the transmission path corresponding to the current viewpoint, and sending the image required to generate the image of the current viewpoint to the display device to generate the current viewpoint picture;

When the viewpoint switching instruction sent by the display device is received, acquiring the target viewpoint corresponding to the viewpoint switching instruction, intercepting the image required to generate the image of the target viewpoint from the video frame of the image frame sequence received by the transmission path corresponding to the current viewpoint, and sending the image required to generate the image of the target viewpoint to the display device to generate the target viewpoint picture;

When the switching condition is satisfied, intercepting the image required to generate the image of the target viewpoint from the video frame of the image frame sequence received by the transmission path corresponding to the target viewpoint, and sending the image required to generate the image of the target viewpoint to the display device to generate the image of the target viewpoint.
The method according to claim 1, wherein the step of intercepting the image required to generate the image of the target viewpoint from the video frame of the sequence of image frames received by the transmission path corresponding to the current viewpoint comprises:

Acquiring the viewpoint identifier corresponding to the target viewpoint and the arrangement information of the video frames;

determining, according to the arrangement information and the viewpoint identifier, the position information of the image required for generating the image of the target viewpoint in the video frame;

Intercepting an image required to generate an image of a target viewpoint corresponding to the position information from a video frame of the sequence of image frames received by the transmission path corresponding to the current viewpoint.
The method according to claim 1, wherein the image required for generating the image of the current viewpoint and the image required for generating the image of the target viewpoint each include at least one of a viewpoint frame or a viewpoint depth map frame.
The method according to claim 1, wherein the resolution of the picture corresponding to the current viewpoint is greater than the resolution of the picture corresponding to the target viewpoint.
The method according to claim 1, wherein the switching condition comprises at least one of the following:

The time stamp of the video frame corresponding to the currently displayed image is the same as the time stamp of the video frame in the transmission path corresponding to the target viewpoint;

The time stamp of the video frame of the image frame sequence received from the transmission path corresponding to the current viewpoint reaches a preset time point.
A video data processing method, wherein it is applied to an encoding device; the video data processing method includes:

Obtain images of various viewpoints captured by each camera, and different cameras capture images corresponding to different viewpoints, wherein each viewpoint is used as a main viewpoint to generate a first image, and viewpoints other than the main viewpoint are used as corresponding to the main viewpoint generating said second image from a viewpoint from a viewpoint, said image comprising at least one of a viewpoint frame or a viewpoint depth map frame;

The first image corresponding to each main viewpoint and the second image corresponding to the main viewpoint from the viewpoint are spliced to obtain a video frame corresponding to the main viewpoint, and the spliced video frame corresponding to the main viewpoint is performed according to the shooting time. encoding to generate a corresponding image frame sequence, wherein the resolution of the first image is greater than the resolution of the second image;

When the decoding device receives the viewpoint generation and display instruction sent by the display device, after acquiring the current viewpoint of the display device according to the viewpoint generation and display instruction, the image frame sequence corresponding to the current viewpoint is transmitted by the corresponding to the current viewpoint The path is transmitted to the decoding device.
The method according to claim 6, wherein the first image corresponding to each main viewpoint and the second image corresponding to the main viewpoint are spliced to obtain a video frame corresponding to the main viewpoint, and according to the shooting time The step of encoding the spliced video frames corresponding to the main viewpoint to generate a corresponding sequence of image frames includes:

Splicing the first image corresponding to each main viewpoint and the second image of the secondary viewpoint corresponding to the main viewpoint to obtain a video frame corresponding to the main viewpoint;

Sorting the video frames corresponding to the main viewpoint according to the shooting time to generate a spliced image sequence;

Encoding is performed on the spliced image sequence to obtain an image frame sequence corresponding to each of the main viewpoints.
The method according to claim 7, wherein the first image frame in the sequence of image frames corresponding to each main viewpoint is encoded as an I frame.
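As a rough illustration of the ordering and frame-type rule in the two preceding claims, the sketch below sorts spliced frames by shooting time and marks the first frame of the sequence as an I frame. The `SplicedFrame` type and its field names are hypothetical, and labelling every later frame "P" is an assumption made only for the example; the claims fix the type of the first frame only, and no real encoder is invoked here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SplicedFrame:
    main_viewpoint: int     # viewpoint whose image is stored at full resolution
    shooting_time_ms: int   # capture time shared by all images in the frame


def build_sequence(frames: Iterable[SplicedFrame]) -> List[Tuple[str, SplicedFrame]]:
    """Sort the spliced frames by shooting time and tag the first one as an
    I frame, so the sequence can be decoded without earlier reference frames."""
    ordered = sorted(frames, key=lambda f: f.shooting_time_ms)
    # "P" for the remaining frames is an assumption for illustration.
    return [("I" if i == 0 else "P", f) for i, f in enumerate(ordered)]
```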
The method according to claim 7, wherein the step of encoding the spliced image sequence to obtain an image frame sequence corresponding to each of the main viewpoints comprises:

Obtaining the arrangement information of the video frames corresponding to the main viewpoint, the arrangement information at least including the viewpoint identification of each viewpoint and the position information of the image of each viewpoint in the video frame corresponding to the main viewpoint;

Encoding the spliced image sequence, and inserting the arrangement information into a sequence header of the encoded spliced image sequence to obtain an image frame sequence corresponding to the main viewpoint.
The method according to claim 9, wherein the position information of the image of each viewpoint in the video frame corresponding to the main viewpoint includes at least: the coordinates, in the video frame, of the pixel at the upper left corner of the image of the viewpoint, and the width and height of the image of the viewpoint.
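The arrangement information described above can be sketched as a small record per viewpoint that is serialized ahead of the encoded data and later used to cut an image out of a decoded frame. The byte layout below (a 16-bit count followed by five 16-bit little-endian fields per entry) and the names `ViewRegion`, `pack_arrangement`, `unpack_arrangement`, and `crop` are assumptions for illustration; the claims do not define the sequence header format.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ViewRegion:
    viewpoint_id: int  # viewpoint identification
    x: int             # column of the upper-left pixel in the video frame
    y: int             # row of the upper-left pixel in the video frame
    width: int
    height: int


_COUNT = struct.Struct("<H")   # number of entries (assumed layout)
_ENTRY = struct.Struct("<5H")  # viewpoint_id, x, y, width, height


def pack_arrangement(regions: List[ViewRegion]) -> bytes:
    """Serialize the arrangement information into header bytes."""
    out = _COUNT.pack(len(regions))
    for r in regions:
        out += _ENTRY.pack(r.viewpoint_id, r.x, r.y, r.width, r.height)
    return out


def unpack_arrangement(header: bytes) -> List[ViewRegion]:
    """Recover the arrangement information from header bytes."""
    (count,) = _COUNT.unpack_from(header, 0)
    return [
        ViewRegion(*_ENTRY.unpack_from(header, _COUNT.size + i * _ENTRY.size))
        for i in range(count)
    ]


def crop(frame: List[List[int]], region: ViewRegion) -> List[List[int]]:
    """Cut the image of one viewpoint out of a decoded video frame."""
    rows = frame[region.y:region.y + region.height]
    return [row[region.x:region.x + region.width] for row in rows]
```

With the upper-left coordinates plus width and height, a decoder can locate any viewpoint image without knowing how the encoder chose the layout.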
A decoding device, wherein the decoding device includes:

The first receiving module is configured to acquire the current viewpoint of the display device according to the viewpoint generation display instruction when receiving the viewpoint generation display instruction sent by the display device;

The first sending module is configured to intercept the image required for generating the image of the current viewpoint from the video frame of the image frame sequence received through the transmission path corresponding to the current viewpoint, and send the intercepted image to the display device to generate the current viewpoint picture;

The second receiving module is configured to, when receiving the viewpoint switching instruction sent by the display device, acquire the target viewpoint corresponding to the viewpoint switching instruction, intercept the image required for generating the image of the target viewpoint from the video frame of the image frame sequence received through the transmission path corresponding to the current viewpoint, and send the image required for generating the image of the target viewpoint to the display device to generate a target viewpoint picture, wherein both the image required for generating the image of the current viewpoint and the image required for generating the image of the target viewpoint include at least one of a viewpoint picture or a viewpoint depth map picture, and the resolution of the picture corresponding to the current viewpoint is greater than the resolution of the picture corresponding to the target viewpoint;

The second sending module is configured to, when the switching condition is met, intercept the image required for generating the image of the target viewpoint from the video frame of the image frame sequence received through the transmission path corresponding to the target viewpoint, and send the image required for generating the image of the target viewpoint to the display device to generate the current viewpoint picture.
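The interplay of the four modules can be sketched as a small state holder that decides, for each frame, which transmission path to read from and which viewpoint to cut out of the frame. The class `ViewSwitcher` and its method names are hypothetical, and the switching condition is assumed here to be "the frame time stamp reaches a preset time point"; this is a sketch under those assumptions, not the device's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ViewSwitcher:
    current: int                        # viewpoint whose transmission path is in use
    target: Optional[int] = None        # pending target viewpoint, if any
    switch_at_ms: Optional[int] = None  # assumed preset time point for the switch

    def request_switch(self, target: int, switch_at_ms: int) -> None:
        """Handle a viewpoint switching instruction from the display device."""
        self.target = target
        self.switch_at_ms = switch_at_ms

    def select(self, timestamp_ms: int) -> Tuple[int, int]:
        """Return (path_viewpoint, displayed_viewpoint) for a frame.

        Before the switching condition is met, the target viewpoint is cut
        (at lower resolution) from the current viewpoint's frame; afterwards
        the target viewpoint's own path is used and it becomes current."""
        if self.target is not None and timestamp_ms >= self.switch_at_ms:
            self.current, self.target, self.switch_at_ms = self.target, None, None
        displayed = self.target if self.target is not None else self.current
        return self.current, displayed
```

For example, with viewpoint 1 current and a switch to viewpoint 2 requested for time 100, a frame at time 40 is read from path 1 but displays viewpoint 2, while a frame at time 100 is read from path 2.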
An encoding device, wherein the encoding device includes:

The image acquisition module is configured to acquire images of the viewpoints captured by the cameras, where different cameras capture images corresponding to different viewpoints, wherein each viewpoint is used as a main viewpoint to generate a first image, and the viewpoints other than the main viewpoint are used as secondary viewpoints corresponding to the main viewpoint to generate second images of the secondary viewpoints, each image including at least one of a viewpoint frame or a viewpoint depth map frame;

The splicing and encoding module is configured to splice the first image corresponding to each main viewpoint and the second image of the secondary viewpoint corresponding to the main viewpoint to obtain a video frame corresponding to the main viewpoint, and to encode the spliced video frames corresponding to the main viewpoint according to the shooting time to generate a corresponding image frame sequence, wherein the resolution of the first image is greater than the resolution of the second image;

The data transmission module is configured to transmit the image frame sequence corresponding to the current viewpoint to the decoding device through the transmission path corresponding to the current viewpoint.
A storage medium, wherein a video data processing program is stored thereon, and when the video data processing program is executed by a processor, the steps of the video data processing method according to any one of claims 1-10 are realized.