CN117014723A - Video data transmission method, terminal, network device, system and electronic device - Google Patents


Info

Publication number
CN117014723A
Authority
CN
China
Prior art keywords
target
frame
frame data
header information
view
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210441397.2A
Other languages
Chinese (zh)
Inventor
何应腾
陈嘉敏
陈金悬
唐弘毅
郝源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Guangdong Hong Kong Macao Greater Bay Area Guangdong Innovation Research Institute Co ltd
China Mobile Communications Group Co Ltd
China Mobile Group Guangdong Co Ltd
Original Assignee
China Mobile Guangdong Hong Kong Macao Greater Bay Area Guangdong Innovation Research Institute Co ltd
China Mobile Communications Group Co Ltd
China Mobile Group Guangdong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Guangdong Hong Kong Macao Greater Bay Area Guangdong Innovation Research Institute Co ltd, China Mobile Communications Group Co Ltd, China Mobile Group Guangdong Co Ltd filed Critical China Mobile Guangdong Hong Kong Macao Greater Bay Area Guangdong Innovation Research Institute Co ltd
Priority claimed from CN202210441397.2A
Publication of CN117014723A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement

Abstract

The application relates to the field of communications technology and provides a video data transmission method, a terminal, a network device, a system, and an electronic device. The method comprises the following steps: determining a target view angle; determining, based on index header information, a target storage position of the frame data corresponding to the target view angle in a target video file, wherein the target video file comprises the index header information and multi-view video data, the multi-view video data comprises frame data corresponding to each of two or more view angles, and the index header information is used for characterizing the storage positions, in the target video file, of the frame data corresponding to each of the two or more view angles; sending a first request message requesting the frame data corresponding to the target view angle, wherein the first request message carries the target storage position; and receiving the frame data corresponding to the target view angle. According to the embodiments of the application, only the frame data corresponding to the target view angle sent by the network device is acquired, and the frame data corresponding to all view angles need not be downloaded, so that degradation of video quality can be avoided.

Description

Video data transmission method, terminal, network device, system and electronic device
Technical Field
The present application relates to the field of communications technologies, and in particular, to a video data transmission method, a terminal, a network device, a system, and an electronic device.
Background
With the popularization of the fifth-generation mobile communication technology (5th Generation Mobile Communication Technology, 5G) and the wide deployment of high-performance set-top boxes (STBs) and mobile terminal devices, video applications are developing towards a better experience; higher resolution and immersive panoramic interaction are the mainstream of video application technology development. The free-view/multi-view technology supports free switching of the user's view angle, overcoming the limitation of conventional video viewing, in which only one view is transmitted and the view angle cannot be selected independently.
In the related art, the video pictures of all view angles are downsampled and stitched into one large multi-view picture, which is then compression-encoded for transmission, and the playback end displays the downsampled single-view picture. The more view angles there are, the fewer pixels can be allocated to a single view in the stitched picture and the lower the single-view resolution, so single-view video quality is poor.
Disclosure of Invention
The embodiments of the application provide a video data transmission method, a terminal, a network device, a system, and an electronic device, to solve the technical problem in the related art that the more view angles there are, the fewer pixels can be allocated to a single view in the stitched picture and the lower the single-view resolution, resulting in poor single-view video quality.
In a first aspect, an embodiment of the present application provides a video data transmission method, applied to a terminal, including:
determining a target viewing angle;
determining a target storage position of frame data corresponding to the target view angle in a target video file based on index header information, wherein the target video file comprises the index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
sending a first request message requesting the frame data corresponding to the target view angle, wherein the first request message carries the target storage position;
and receiving the frame data corresponding to the target view angle.
In one embodiment, the data structure corresponding to the multi-view video data is a first target table structure, each cell in the first target table structure stores one frame data, a target row in the first target table structure is used for storing frame data corresponding to the two or more views respectively under a target frame time, the frame time corresponding to the first target row is earlier than the frame time corresponding to a second target row, the first target row and the second target row are any two adjacent rows in the first target table structure, and the first target row is a previous row of the second target row;
The index header information comprises one or more pieces of frame position information, the data structure corresponding to the index header information is a second target table structure, each cell in the second target table structure stores one piece of frame position information, the number of rows and columns of the second target table structure is the same as that of rows and columns of the first target table structure, the target frame position information is used for representing the storage position of target frame data in the target video file, the target frame position information is any one of the one or more pieces of frame position information, and the row number of the target frame in the second target table structure is the same as that of the target frame data in the first target table structure.
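The two parallel table structures described above can be sketched as follows. This is an illustrative sketch only: the fixed 8-byte big-endian offsets and the row-major ordering are assumptions for concreteness, as the text does not fix an encoding for the frame position information.

```python
import struct

def build_index(frame_sizes, data_start):
    """frame_sizes[row][col] is the byte length of the frame at that
    frame time (row) and view angle (col). Returns (index_bytes,
    offsets), where offsets[row][col] is the absolute byte offset of
    that frame within the target video file."""
    offsets, cursor = [], data_start
    for row in frame_sizes:                  # one row per frame time
        offsets.append([])
        for size in row:                     # one column per view angle
            offsets[-1].append(cursor)
            cursor += size
    index_bytes = b"".join(struct.pack(">Q", o)
                           for row in offsets for o in row)
    return index_bytes, offsets

def lookup(index_bytes, num_views, row, col):
    """Recover the stored offset for frame (row, col): because both
    tables share the same row/column layout, the entry's position in
    the index follows directly from the frame's position."""
    return struct.unpack_from(">Q", index_bytes,
                              (row * num_views + col) * 8)[0]
```

Because the index table mirrors the frame table cell for cell, the terminal can locate any frame of any view with a single arithmetic step, which is what makes the later per-view byte-range requests possible.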
In one embodiment, the determining, based on the index header information, the target storage location of the frame data corresponding to the target view in the target video file includes:
transmitting a second request message for requesting the index header information;
receiving the index header information;
determining, based on the index header information, the target view-angle number, and the target frame number, all or part of the frame data between a first target frame and a second target frame, among the frame data corresponding to the target view angle, as the frame data to be acquired, and determining the storage position of the frame data to be acquired in the target video file as the target storage position;
The target view-angle number is the number of the two or more view angles, and the number of frames corresponding to each of the two or more view angles is the target frame number; the first target frame and the second target frame belong to the frame data corresponding to the target view angle, the frame time of the first target frame is earlier than or equal to the frame time of a target playing frame, the frame time of the second target frame is later than or equal to the frame time of the target playing frame, and the target playing frame is the frame corresponding to the target view angle at the target playing time.
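A minimal sketch of the frame-range selection in this embodiment, assuming the terminal knows the frame times (e.g., derived from a fixed frame rate, which is an illustrative assumption):

```python
def target_frame_range(frame_times, play_time):
    """Return (first, second): the index of the latest frame whose
    time is earlier than or equal to play_time (the first target
    frame) and of the earliest frame whose time is later than or
    equal to play_time (the second target frame). frame_times must
    be sorted ascending and bracket play_time."""
    first = max(i for i, t in enumerate(frame_times) if t <= play_time)
    second = min(i for i, t in enumerate(frame_times) if t >= play_time)
    return first, second
```

When the target playing time coincides exactly with a frame time, the first and second target frames are the same frame, so the range collapses to a single frame to acquire.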
In one embodiment, the target video file further includes header information, where the header information is used to characterize the byte length of the index header information; before determining all or part of the frame data between the first target frame and the second target frame as the frame data to be acquired and determining the storage position of the frame data to be acquired in the target video file as the target storage position, the method further includes:
Transmitting a third request message for requesting the header information;
receiving the header information;
determining a byte length of the index header information based on the header information;
the target frame number is determined based on the byte length of the index header information.
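The step of deriving the target frame number from the byte length of the index header information can be sketched as follows, under the assumption (not fixed by the text) that every piece of frame position information occupies the same number of bytes:

```python
def target_frame_number(index_byte_length, num_views, entry_size=8):
    """Per-view frame count implied by the index size: the index holds
    one fixed-width entry per (frame time, view angle) cell, so
    frames = index_byte_length / (num_views * entry_size).
    entry_size=8 is an illustrative assumption."""
    entries, rem = divmod(index_byte_length, entry_size)
    if rem or entries % num_views:
        raise ValueError("index length is not a whole number of rows")
    return entries // num_views
```

This is why the terminal first requests only the small header information: knowing the index byte length and the view count is enough to recover the frame count without downloading any frame data.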
In a second aspect, an embodiment of the present application provides a video data transmission method, applied to a network device, including:
receiving a first request message, wherein the first request message carries a target storage position; the target storage position is used for representing the storage position of frame data corresponding to a target view angle in a target video file, the target video file is stored in network equipment, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
and sending frame data corresponding to the target visual angle based on the target storage position.
In one embodiment, the data structure corresponding to the multi-view video data is a first target table structure, each cell in the first target table structure stores one frame data, a target row in the first target table structure is used for storing frame data corresponding to the two or more views respectively under a target frame time, the frame time corresponding to the first target row is earlier than the frame time corresponding to a second target row, the first target row and the second target row are any two adjacent rows in the first target table structure, and the first target row is a previous row of the second target row;
The index header information comprises one or more pieces of frame position information, the data structure corresponding to the index header information is a second target table structure, each cell in the second target table structure stores one piece of frame position information, the number of rows and columns of the second target table structure is the same as that of rows and columns of the first target table structure, the target frame position information is used for representing the storage position of target frame data in the target video file, the target frame position information is any one of the one or more pieces of frame position information, and the row number of the target frame in the second target table structure is the same as that of the target frame data in the first target table structure.
In one embodiment, before said receiving the first request message, the method further comprises:
acquiring frame data corresponding to the two or more visual angles respectively;
acquiring the multi-view video data based on the first target table structure and frame data respectively corresponding to the two or more views;
acquiring the index header information based on the multi-view video data and the second target table structure;
acquiring the header information based on the index header information;
And acquiring the target video file based on the header information, the index header information and the multi-view video data.
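The server-side assembly steps above might look like the following sketch; the 4-byte header field and the 8-byte offset entries are assumed encodings chosen for illustration, not specified in the text.

```python
import struct

def pack_target_video_file(frames):
    """frames[row][col] is the encoded frame for (frame time, view).
    Layout: header information (4 bytes holding the byte length of
    the index header information), then the index header information
    (one 8-byte offset per frame), then the multi-view video data
    in row-major order."""
    rows, cols = len(frames), len(frames[0])
    index_len = rows * cols * 8
    cursor = 4 + index_len                    # frame data starts here
    offsets = []
    for row in frames:
        for frame in row:
            offsets.append(cursor)
            cursor += len(frame)
    return (struct.pack(">I", index_len)
            + b"".join(struct.pack(">Q", o) for o in offsets)
            + b"".join(f for row in frames for f in row))
```

Note the assembly order mirrors the acquisition order in the method: the header is computed from the index, and the index from the frame layout, so a client reading the file front to back learns exactly what it needs before touching frame data.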
In a third aspect, an embodiment of the present application provides a terminal, including a memory, a transceiver, and a processor;
a memory for storing a computer program; a transceiver for transceiving data under control of the processor; a processor for reading the computer program in the memory and performing the following operations:
determining a target viewing angle;
determining a target storage position of frame data corresponding to the target view angle in a target video file based on index header information, wherein the target video file comprises the index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
sending a first request message requesting the frame data corresponding to the target view angle, wherein the first request message carries the target storage position;
and receiving the frame data corresponding to the target view angle.
In a fourth aspect, an embodiment of the present application provides a network device, including a memory, a transceiver, and a processor;
a memory for storing a computer program; a transceiver for transceiving data under control of the processor; a processor for reading the computer program in the memory and performing the following operations:
receiving a first request message, wherein the first request message carries a target storage position; the target storage position is used for characterizing the storage position, in a target video file, of the frame data corresponding to a target view angle, the target video file is stored in the network device, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to each of two or more view angles, and the index header information is used for characterizing the storage positions, in the target video file, of the frame data corresponding to each of the two or more view angles;
and sending frame data corresponding to the target visual angle based on the target storage position.
In a fifth aspect, an embodiment of the present application provides a video data transmission system, including a terminal and a network device, wherein the network device stores a target video file, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to each of two or more view angles, and the index header information is used for characterizing the storage positions, in the target video file, of the frame data corresponding to each of the two or more view angles;
The terminal is used for: determining a target viewing angle;
determining a target storage position of frame data corresponding to the target visual angle in the target video file based on the index header information;
sending, to the network device, a first request message requesting the frame data corresponding to the target view angle, wherein the first request message carries the target storage position;
the network device is configured to: receive the first request message;
and send, based on the target storage position, the frame data corresponding to the target view angle to the terminal;
the terminal is further configured to receive the frame data corresponding to the target view angle.
In a sixth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory storing a computer program, where the processor implements the video data transmission method according to the first aspect or the second aspect when executing the program.
According to the video data transmission method, the terminal, the network device, the system, and the electronic device provided by the embodiments of the application, by determining the target storage position of the frame data corresponding to the target view angle in the target video file, only the frame data corresponding to the target view angle needs to be acquired from the network device; the frame data corresponding to all view angles need not be downloaded, the video data of multiple view angles need not be stitched, the resolution of the single-view picture is not reduced, and degradation of single-view video quality is thereby avoided.
Drawings
In order to more clearly illustrate the application or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a video data transmission method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a data structure of a video file according to an embodiment of the present application;
FIG. 3 is a second flowchart of a video data transmission method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a video data transmission device according to an embodiment of the present application;
fig. 5 is a second schematic structural diagram of a video data transmission device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a network device according to an embodiment of the present application;
fig. 8 is a schematic diagram of a video data transmission system according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to facilitate a clearer understanding of various embodiments of the present application, some relevant background knowledge is first presented as follows.
When the free-view/multi-view technology is applied, transmitting video of multiple angles occupies a large bandwidth; excessive bandwidth occupation easily causes packet loss or congestion, and places high processing demands on the terminal. During view switching, the videos of the multiple angles may not be synchronized in time, so that the user clearly perceives temporal staggering between pictures.
In the related art, the video pictures of multiple views are stitched into one large multi-view picture, i.e., a high-resolution picture is divided into n grid cells, each cell storing the picture of one view. Taking 4K stitching as an example, the single-view pictures of 9 views (9 cells) are each downsampled to 720P, the nine 720P pictures are stitched into one 4K picture and transmitted to the playback end, and the playback end decodes the 4K picture to obtain the nine 720P single-view pictures; whether or not the user is looking at a given view, the data of every view is transmitted (with 16 views, every view would be downsampled to 540P). The playback end displays the downsampled single-view picture, and the lower its resolution, the worse the video quality. With limited bandwidth and device decoding capability, it is difficult to expand capacity with ultra-high resolutions, so only the range and number of views can be limited, which increases the view-angle difference between camera positions.
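The per-view resolution loss in this related-art scheme follows directly from the tiling arithmetic. A sketch, assuming the square-ish n-grid division described above:

```python
import math

def per_view_resolution(canvas_w, canvas_h, num_views):
    """Pixels left for one view when num_views pictures are tiled
    side by side on a single canvas: the grid needs
    ceil(sqrt(num_views)) cells per side, so each added row or
    column of cameras shrinks every view."""
    side = math.ceil(math.sqrt(num_views))
    return canvas_w // side, canvas_h // side
```

With a 4K (3840x2160) canvas, 9 views leave each view 1280x720 (720P), and 16 views leave only 960x540 (540P), matching the degradation described above.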
As can be seen, the freeview/multiview technology in the related art has the following drawbacks:
(1) Pictures at each frame time are stitched and then transmitted; the more view angles are captured, the fewer pixels can be allocated to a single view in the stitched picture, the lower the resolution, and the worse the video quality;
(2) With limited bandwidth and device decoding capability, it is difficult to stitch multi-view video at higher resolutions such as 8K or 16K;
(3) To guarantee single-view video quality, only the number of views, the range of views, or the view-angle difference between camera positions can be limited.
To overcome the above drawbacks, the present application provides a video data transmission method, a terminal, a network device, a system, and an electronic device, which avoid degradation of single-view video quality by acquiring only the frame data corresponding to the target view angle from the network device.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Fig. 1 is one of flow diagrams of a video data transmission method according to an embodiment of the present application, and as shown in fig. 1, an embodiment of the present application provides a video data transmission method, which is applied to a terminal and may include:
step 101, determining a target visual angle;
specifically, frame data corresponding to the target view angle may be stored in a target video file of the network device, and in order to determine a target storage location of the frame data corresponding to the target view angle in the target video file, the target view angle may be determined first.
Alternatively, the two or more view angles may be numbered, that is, each of the two or more view angles corresponds to a view-angle number. When the target video file is played for the first time, the view with the smallest view-angle number may be determined as the target view angle, or the view with the largest view-angle number may be determined as the target view angle, or one of the two or more views may be determined as the target view angle based on a preset configuration.
Alternatively, the terminal may receive an input from a user, the input may be an input from which the user determines the target viewing angle, and in response thereto, the target viewing angle may be determined.
Alternatively, the terminal may receive an input from the user, the input may be an input from which the user switches the viewing angle, and further determine, in response to the input, that the switched viewing angle is the target viewing angle.
Step 102, determining a target storage position of frame data corresponding to the target view angle in a target video file based on index header information, wherein the target video file comprises the index header information and multi-view video data, the multi-view video data comprises two or more frame data respectively corresponding to two or more view angles, and the index header information is used for representing the storage position of the frame data respectively corresponding to the two or more view angles in the target video file;
specifically, the index header information may be used to characterize storage locations of frame data corresponding to two or more perspectives in the target video file, and after determining the target perspectives, a target storage location of frame data corresponding to the target perspectives in the target video file may be determined based on the index header information.
Alternatively, the terminal may acquire index header information based on the history storage information, and may determine the target storage location based on the index header information.
Optionally, the terminal may initiate a request to the network device, obtain index header information sent by the network device, and further determine the target storage location based on the index header information.
Step 103, a first request message for requesting frame data corresponding to the target view angle is sent, wherein the first request message carries the target storage position;
Specifically, in order to acquire the frame data corresponding to the target view angle, after determining the target storage location, a first request message may be sent to the network device, and the first request message carries the target storage location, so that after the network device receives the first request message, the frame data corresponding to the target view angle may be sent to the terminal based on the target storage location.
Step 104, receiving frame data corresponding to the target view angle.
Specifically, after the first request message is sent, frame data corresponding to the target view angle sent by the network device may be obtained, and then the frame data corresponding to the target view angle may be decoded and played for display.
It can be understood that, in the process of distributing and playing multi-view video, the target storage position of the frame data corresponding to the target view angle in the target video file can be determined, and the frame data corresponding to the target view angle sent by the network device can then be obtained, without video stitching and without carrying the video data of multiple views in ultra-high-resolution pictures such as 8K or 16K. The transmitted picture is not affected by the number of captured view angles (increasing the number of views does not affect the resolution or video quality of a single view), and n-view video distribution and playback at the same resolution can be realized as long as one video stream can be transmitted.
It can be understood that, in the process of playing the video frame (frame data) corresponding to the target view angle, the terminal can calculate the data position by itself and acquire and play the data, and does not need to interact with the network device frequently.
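As one concrete (assumed) realization of the first request message, the target storage position computed from the index could be carried as an HTTP byte-range header, so that an ordinary HTTP file server can return exactly the target view's frames. HTTP is an illustrative transport choice; the text only says the request carries the target storage position.

```python
def first_request_headers(offsets, sizes):
    """Render the target storage position (absolute byte offsets and
    lengths of the frames to be acquired) as an HTTP Range header
    with inclusive byte ranges, one span per requested frame."""
    spans = ["%d-%d" % (o, o + n - 1) for o, n in zip(offsets, sizes)]
    return {"Range": "bytes=" + ",".join(spans)}
```

Under this realization, the terminal computes the offsets locally from the index header information and issues the request without any further negotiation, consistent with the reduced interaction noted above.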
According to the video data transmission method provided by the embodiments of the application, by determining the target storage position of the frame data corresponding to the target view angle in the target video file, the frame data corresponding to the target view angle can be obtained without downloading the frame data corresponding to all view angles and without stitching the video data of multiple view angles; the resolution of the single-view picture is not reduced, and degradation of single-view video quality is thereby avoided.
Optionally, the data structure corresponding to the multi-view video data is a first target table structure, each cell in the first target table structure stores one frame data, a target row in the first target table structure is used for storing frame data corresponding to the two or more views under a target frame time, the frame time corresponding to the first target row is earlier than the frame time corresponding to a second target row, the first target row and the second target row are any two adjacent rows in the first target table structure, and the first target row is a previous row of the second target row;
The index header information comprises one or more pieces of frame position information, the data structure corresponding to the index header information is a second target table structure, each cell in the second target table structure stores one piece of frame position information, the number of rows and columns of the second target table structure is the same as that of rows and columns of the first target table structure, the target frame position information is used for representing the storage position of target frame data in the target video file, the target frame position information is any one of the one or more pieces of frame position information, and the row number of the target frame in the second target table structure is the same as that of the target frame data in the first target table structure.
Specifically, the number of rows and columns of the first target table structure and the number of rows and columns of the second target table structure may be kept the same, and further, based on the target frame position information in the second target table structure, the storage position of the target frame data in the first target table structure in the target video file may be determined, that is, the storage positions of the frame data corresponding to two or more view angles in the target video file may be obtained by analyzing the index header information, and the target storage position of the frame data corresponding to the target view angle in the target video file may be determined.
Alternatively, the target frame position information may be a target offset, which may represent an offset byte number of the target frame data with respect to a first byte of the target video file, and the byte numbers respectively corresponding to the one or more frame position information may be the same.
Optionally, the terminal may learn the number of rows and columns of the first target table structure and the number of columns and rows of the second target table structure based on a preset configuration.
Optionally, the terminal may send a request message for requesting rank information of the first target table structure or rank information of the second target table structure to the network device, so as to obtain the rank information sent by the network device, and further obtain the number of rows and columns of the first target table structure and the number of ranks of the second target table structure.
Optionally, after learning the number of rows and columns of the first target table structure and the number of rows and columns of the second target table structure, the frame position information corresponding to the target view angle in the second target table structure may be determined, and then, based on the frame position information corresponding to the target view angle, the position of the frame data corresponding to the target view angle in the first target table structure may be determined, that is, the target storage position of the frame data corresponding to the target view angle in the target video file may be determined.
Therefore, by keeping the number of rows and columns of the first target table structure identical to the number of rows and columns of the second target table structure, the index header information can be analyzed to obtain the storage positions, in the target video file, of the frame data respectively corresponding to the two or more view angles, and the target storage position of the frame data corresponding to the target view angle in the target video file can be determined, so that the frame data corresponding to the target view angle sent by the network device can be obtained; the frame data corresponding to all view angles does not need to be downloaded, the video data corresponding to a plurality of view angles does not need to be spliced, the resolution of a single view angle picture is not reduced, and further the video quality of a single view angle is prevented from being deteriorated.
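As a minimal sketch (with hypothetical table sizes and offset values), the correspondence between the two table structures can be illustrated as follows: because both tables have the same number of rows and columns, the cell at a given (row, column) of the second target table structure directly locates the frame stored at the same (row, column) of the first target table structure.

```python
# Hypothetical example: 2 frame times (rows) x 3 view angles (columns).
# Each cell of the second target table structure holds one piece of frame
# position information (here, a tail offset in bytes within the video file).
position_table = [
    [110, 230, 350],   # row 1: frame time 1, view angles 1..3
    [470, 590, 710],   # row 2: frame time 2, view angles 1..3
]

def locate_frame(row, col):
    """The same (row, col) in the second table locates the frame data cell
    of the first table, i.e. its storage position in the video file."""
    return position_table[row - 1][col - 1]

print(locate_frame(2, 3))   # → 710 (frame 2 of view angle 3)
```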
Optionally, the determining, based on the index header information, the target storage location of the frame data corresponding to the target view in the target video file includes:
transmitting a second request message for requesting the index header information;
receiving the index header information;
based on the index header information, the target view angle number and the target frame number, determining all or part of frame data between a first target frame and a second target frame as frame data to be acquired in frame data corresponding to the target view angle, and determining a storage position of the frame data to be acquired in the target video file as the target storage position;
The target view angle number is the number of the two or more view angles, and the target frame number is the number of frames corresponding to each of the two or more view angles; the first target frame and the second target frame are frame data corresponding to the target view angle, the frame time corresponding to the first target frame is earlier than or equal to the frame time corresponding to the target playing frame data, the frame time corresponding to the second target frame is later than or equal to the frame time corresponding to the target playing frame data, and the target playing frame data is the frame data corresponding to the target view angle at the target playing time.
Specifically, after determining the target view angle, in order to determine the target storage position of the frame data corresponding to the target view angle in the target video file, a second request message for requesting the index header information may be initiated to the network device, and then the index header information sent by the network device may be obtained;
specifically, the number of target view angles may be the same as the number of columns of the first target table structure (and the same as the number of columns of the second target table structure), and the number of target frames may be the same as the number of rows of the first target table structure (and the same as the number of rows of the second target table structure), so that all or part of frame data between the first target frame and the second target frame may be determined as frame data to be acquired in frame data corresponding to the target view angles based on the index header information, the number of target view angles, and the number of target frames;
Specifically, after determining the frame data to be acquired, frame position information corresponding to the frame data to be acquired in the second target table structure may be determined based on the index header information, the target view angle number and the target frame number, and a position of the frame data to be acquired in the first target table structure may be determined based on the frame position information corresponding to the frame data to be acquired, that is, a storage position of the frame data to be acquired in the target video file may be determined, and then the storage position of the frame data to be acquired in the target video file may be used as the target storage position.
Alternatively, the target playing time may be a time when the user determines to start playing the video frame of the target video file; the target playing time may be a current time determined based on a clock of the terminal in a process of playing a video frame of the target video file; the target playing time may also be a time when the user determines to continue playing the video frame of the target video file after the playing is paused.
Optionally, the frame time corresponding to the first target frame, the frame time corresponding to the second target frame and the frame time corresponding to the target playing frame data are determined based on the time sequence among all the frame data corresponding to the target view angle.
For example, the target view may correspond to 10 frame data, 10 frame times may be determined based on a time sequence order among the 10 frame data, the first frame data of the target view may correspond to 1 st frame time (an earliest frame time among the 10 frame times), the last frame data of the target view may correspond to 10 th frame time (a latest frame time among the 10 frame times), and so on, it may be determined what frame time the other frame data of the target view corresponds to.
Optionally, in the case that the terminal plays the target video file for the first time, the target playing time may be a time when playing the target video file starts, it may be determined that the target playing frame data is first frame data corresponding to the target view angle, it may be determined that the first target frame is also first frame data corresponding to the target view angle, and it may be determined that the second target frame is last frame data corresponding to the target view angle.
Optionally, in the process of playing the video frames of the target video file, the target playing time may be a current time determined based on a clock of the terminal, the target playing frame data may be determined to be frame data corresponding to the video frames played by the terminal at the target playing time, the first target frame may be determined so that the frame time corresponding to the first target frame is earlier than or equal to the frame time corresponding to the target playing frame, and the second target frame may be determined so that the frame time corresponding to the second target frame is later than or equal to the frame time corresponding to the target playing frame.
Optionally, in the case that the user determines to continue playing the video frame of the target video file after the pause playing, the target playing time may be a time when the user determines to continue playing the video frame of the target video file, the target playing frame data may be determined to be frame data corresponding to the video frame played by the terminal at the target playing time, the first target frame may be determined such that the frame time corresponding to the first target frame is earlier than or equal to the frame time corresponding to the target playing frame, and the second target frame may be determined such that the frame time corresponding to the second target frame is later than or equal to the frame time corresponding to the target playing frame.
Therefore, the second request message can be sent to the network device to obtain the index header information, the storage position of the frame data to be obtained in the target video file can be determined to be the target storage position based on the index header information, the target view angle number and the target frame number, the frame data corresponding to the target view angle sent by the network device can be obtained, the frame data corresponding to all view angles does not need to be downloaded, the video data corresponding to a plurality of view angles does not need to be spliced, the resolution of a single view angle picture does not need to be reduced, and further the video quality of a single view angle is prevented from being deteriorated.
Optionally, the determining that all or part of the frame data between the first target frame and the second target frame is frame data to be acquired includes:
determining a target playing mode, wherein the target playing mode is a frame-by-frame playing mode or a frame-skip playing mode;
and determining that partial frame data between the first target frame and the second target frame is the frame data to be acquired based on the index header information, the target view angle number and the target frame number under the condition that the target play mode is the frame skip play mode.
Specifically, in order to determine frame data to be acquired, a target play mode may be determined, and in the case that the target play mode is a skip frame play mode, partial frame data between a first target frame and a second target frame may be determined to be played, and further, based on index header information, a target view angle number and a target frame number, partial frame data between the first target frame and the second target frame may be determined to be frame data to be acquired;
specifically, after determining the frame data to be acquired, frame position information corresponding to the frame data to be acquired in the second target table structure may be determined based on the index header information, the target view angle number and the target frame number, and a position of the frame data to be acquired in the first target table structure may be determined based on the frame position information corresponding to the frame data to be acquired, that is, a storage position of the frame data to be acquired in the target video file may be determined, and then the storage position of the frame data to be acquired in the target video file may be used as the target storage position.
Therefore, the second request message can be sent to the network device to obtain the index header information, the storage position of the frame data to be obtained in the target video file can be determined to be the target storage position based on the index header information, the target view angle number and the target frame number, the frame data corresponding to the target view angle sent by the network device can be obtained, the frame data corresponding to all view angles does not need to be downloaded, the video data corresponding to a plurality of view angles does not need to be spliced, the resolution of a single view angle picture does not need to be reduced, and further the video quality of a single view angle is prevented from being deteriorated.
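The selection step can be sketched as follows (function and parameter names are illustrative; the skip interval is assumed to be configurable, since the interval may be arbitrarily defined):

```python
def select_frames(first, second, mode, step=2):
    """Return the frame numbers to acquire between the first target frame
    and the second target frame (inclusive). In frame-by-frame mode all
    frames are selected; in frame-skip mode only every `step`-th frame is,
    i.e. partial frame data between the two target frames."""
    if mode == "skip":
        return list(range(first, second + 1, step))
    return list(range(first, second + 1))

print(select_frames(1, 10, "skip", step=3))   # → [1, 4, 7, 10]
```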
Optionally, the target video file further includes header information, where the header information is used to characterize a byte length of the index header information, and before determining that all or part of frame data between a first target frame and a second target frame is to be acquired frame data in the frame data corresponding to the target view based on the index header information, the target view number, and the target frame number, the method further includes:
transmitting a third request message for requesting the header information;
Receiving the header information;
determining a byte length of the index header information based on the header information;
the target frame number is determined based on the byte length of the index header information.
Specifically, in order to determine the target frame number, before determining the frame data to be acquired, a third request message for requesting header information may be sent to the network device, and thus header information may be acquired, and further the byte length of the index header information may be determined, and thus the target frame number may be determined.
Optionally, fig. 2 is a schematic diagram of a data structure of a video file according to an embodiment of the present application. As shown in fig. 2, a two-dimensional table (matrix) may be used to represent the time sequence relationship and the spatial position relationship among the independent frames carried by the frame sequence of each video stream in a multi-view scene: the rows of the table form the horizontal axis and may represent the arrangement of the spatial positions of the cameras at different view angles, and the columns of the table form the vertical axis and may represent the time sequence relationship among the frames of the video stream at a certain view angle; the frames in the same row store the frame data of all view angles at a certain time, and the frames in the same column store all the frame data of a certain view angle in time sequence.
Alternatively, as shown in fig. 2, the data structure corresponding to the target video file may be a two-dimensional table-type data structure, the target video file may include three parts, the first part of the target video file may be header information, the second part of the target video file may be index header information, and the third part of the target video file may be multi-view video data.
Alternatively, as shown in fig. 2, the version number field in the first portion may represent the version of the data structure shown in fig. 2, which may be used for version compatibility, and the subsequent iteration version may make certain adjustments to the data structure, and the index header field in the first portion may represent the length of the second portion (index header information) of the next table.
Alternatively, as shown in fig. 2, the second portion of the target video file starts from the second line of the table (beginning at the 4th Byte), and its length is indicated by the index-header-information-length field, that is, the length of the entire index header information, which may be (2×n×m) Bytes, where n represents the number of view angles and m represents the maximum number of frames of the time sequence stored in the table shown in fig. 2. As shown in fig. 2, the index header information is also a two-dimensional table, the size of each field may be fixed to 2 Bytes, and the value of a field may represent the tail offset of the y-th frame of view angle x in the table structure shown in fig. 2, so that the start and end offsets of any frame can be calculated.
Alternatively, as shown in fig. 2, the 2-Byte field for the y-th frame of view angle x stores the tail offset value of the y-th frame of view angle x, which is also the head offset value of the y-th frame of view angle (x+1). The head offset of the 1st frame of view angle 1 is the position where the index header information ends, that is, the (2×n×m+1+2)-th Byte.
For example, the start-stop position of the y-th frame of the view x can be obtained by:
finding out the tail offset of the y-th frame of the view angle x in the index head information, namely the tail position corresponding to the y-th frame of the view angle x;
finding out tail offset of the y-th frame of the view angle (x-1) in the index header information, namely the head position corresponding to the y-th frame of the view angle x;
if x=1, the tail offset of the (y-1)-th frame of view angle n needs to be found instead, which is the head position corresponding to the y-th frame of view angle x; if x=1 and y=1 (the first frame of the first view angle), the head offset is (2×n×m+1+2) Bytes.
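The steps above can be sketched as follows (a sketch only; it assumes the index entries are held in a flat list ordered by frame time and then by view angle, matching the storage order described above):

```python
def frame_span(x, y, n, m, tails):
    """Return (head, tail) byte offsets of the y-th frame of view angle x.
    `tails` is the flat list of the n*m tail offsets from the index header,
    ordered frame-time-major (frame 1 of views 1..n, then frame 2, ...)."""
    data_start = 2 * n * m + 1 + 2          # end of the index header
    idx = (y - 1) * n + (x - 1)
    tail = tails[idx]
    if x == 1 and y == 1:                   # first frame of first view angle
        head = data_start
    else:                                   # covers both rules above: tail of
        head = tails[idx - 1]               # view (x-1), or frame (y-1) of view n
    return head, tail

# n=2 view angles, m=2 frames, each frame 10 bytes; data starts at byte 11
print(frame_span(1, 2, 2, 2, [21, 31, 41, 51]))   # → (31, 41)
```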
Alternatively, as shown in fig. 2, for the third portion of the target video file, from the end of the index header information to the end of the table, it may be used to store specific frame data.
Optionally, the terminal may read the target video file over the network using the hypertext transfer protocol (HyperText Transfer Protocol, HTTP); the HTTP protocol supports offset positioning and segmented reading of a file (for example, through Range requests), which matches the requirements of the data structure.
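A minimal sketch of such a segmented read using Python's standard library (the URL is hypothetical; offset positioning is expressed through the standard HTTP `Range` header):

```python
import urllib.request

def build_range_request(url, start, end):
    """Build an HTTP GET requesting only bytes [start, end] of the file."""
    return urllib.request.Request(
        url, headers={"Range": f"bytes={start}-{end}"})

def read_segment(url, start, end):
    """Read one segment (e.g. the index header or a single frame)."""
    with urllib.request.urlopen(build_range_request(url, start, end)) as resp:
        return resp.read()

req = build_range_request("http://example.com/multiview/segment.bin", 0, 10)
print(req.get_header("Range"))   # → bytes=0-10
```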
Alternatively, in the process of acquiring video data through the HTTP download method, the terminal may read the index header information in advance according to the configuration.
Alternatively, if the terminal knows, from the preset configuration information it holds, that there are video streams of n view angles and that each video stream stores m frames in the file segment, it may first read the index header information, that is, Bytes 0 to (2×n×m+1+2), when sending the HTTP request (as shown in fig. 2), so that the stored offset of each video frame can be calculated.
Optionally, if the terminal only knows from the preset configuration information it holds that there are video streams of n view angles, but cannot know how many frames each video stream stores in the file segment, it may first read the 3 Bytes of the header information (as shown in fig. 2), obtain the index header length value from the index header length field of the header information, and then use the formula m = index header length/(2×n) to learn that each video stream stores m frames in the file segment, so that the stored offset of each video frame can be calculated.
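This computation can be sketched as follows (the byte order inside the 2-Byte length field is not specified in the description, so big-endian is assumed here for illustration):

```python
import struct

def parse_header(header_bytes, n):
    """Parse the 3-Byte header (1-Byte version + 2-Byte index header length)
    and derive m, the number of frames per view angle in the file segment,
    via m = index header length / (2 * n)."""
    version, index_len = struct.unpack(">BH", header_bytes)
    return version, index_len // (2 * n)

# e.g. n = 4 view angles, m = 30 frames -> index header length = 240 Bytes
print(parse_header(struct.pack(">BH", 1, 240), 4))   # → (1, 30)
```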
Optionally, the terminal may obtain the index header information of a certain video file through the HTTP GET request, then start playing from the 1 st frame of the view angle 1, if the user does not have the operation of switching the view angle, then play the frames according to the time sequence of the frames of the view angle, specifically, may obtain the 2 nd frame of the view angle 1, the 3 rd frame of the view angle 1 according to the above-mentioned positioning calculation mode of the frame offset, and so on until the m-th frame of the view angle 1 is played in sequence.
It can be appreciated that the HTTP GET method can be used to locate and segment read the offset of the file according to the data offset, and only the corresponding frame data can be acquired without downloading video data of all views.
Optionally, each view angle may be numbered, and the terminal may receive an input of switching the view angle from the user during video frame playing. When the user performs an operation of switching the playing view angle (for example, sliding the playing picture left and right or up and down), the new view angle number is the current view angle number plus 1 when switching to the right, or the current view angle number minus 1 when switching to the left. According to the tail offset of the currently playing frame, the terminal may obtain the start offset and the end position of the frame with the same frame number in the new view angle (the same moment, a picture at a different position), read the frame data corresponding to the offset of the new view angle through HTTP GET, decode the frame data and display it on the screen, thereby completing one view angle switching action.
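A sketch of the view-number update (the helper name is hypothetical; the description does not state what happens at the edge view angles, so this sketch simply clamps the number to the range [1, n]):

```python
def next_view(current, n, direction):
    """View angle number after one switch: +1 for right, -1 for left.
    The frame number stays the same (same moment, different position).
    Clamping at the first/last view angle is an assumption of this sketch."""
    if direction == "right":
        return min(current + 1, n)
    if direction == "left":
        return max(current - 1, 1)
    return current

print(next_view(2, 4, "right"))   # → 3
```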
Optionally, during video frame playing, the terminal may receive an input of a forward fast-forward playing operation or a backward fast-rewind playing operation, that is, frames are skipped in time sequence. For example, if the y-th frame of view angle x is currently playing, the next frame after fast forward/fast rewind is the (y+10)-th/(y-10)-th frame or the (y+20)-th/(y-20)-th frame of view angle x; whether to skip 10 frames or 20 frames, the interval may be arbitrarily defined. After determining which frame of which view angle to play, the terminal may obtain the offset of the corresponding frame through the above calculation method, and read the data of the corresponding frame for playing.
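The frame-skip step can be sketched as follows (the interval is freely definable; clamping the result to [1, m] is an assumption made here so that a seek never leaves the file segment):

```python
def seek(y, m, interval, forward=True):
    """Frame number after a fast-forward (+interval) or rewind (-interval)
    within the same view angle, clamped to the valid range [1, m]."""
    target = y + interval if forward else y - interval
    return max(1, min(target, m))

print(seek(15, 100, 10))                  # → 25
print(seek(5, 100, 10, forward=False))    # → 1
```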
Therefore, the index header information can be obtained by sending the second request message to the network device, the header information can be obtained by sending the third request message to the network device, the target frame number can be determined, the storage position of the frame data to be obtained in the target video file can be determined to be the target storage position based on the index header information, the target view angle number and the target frame number, the frame data corresponding to the target view angle sent by the network device can be obtained, the frame data corresponding to all view angles does not need to be downloaded, the video data corresponding to a plurality of view angles does not need to be spliced, the resolution of a single view angle picture does not need to be reduced, and the video quality of a single view angle is prevented from being deteriorated.
Fig. 3 is a second flowchart of a video data transmission method according to an embodiment of the present application, as shown in fig. 3, where the video data transmission method is applied to a network device and may include:
step 301, receiving a first request message, where the first request message carries a target storage location; the target storage position is used for representing the storage position of frame data corresponding to a target view angle in a target video file, the target video file is stored in network equipment, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
Specifically, in order to send frame data corresponding to the target view angle to the terminal, a first request message sent by the terminal may be acquired, where the first request message carries a target storage location, and further a target storage location of the frame data corresponding to the target view angle in the target video file may be determined based on the target storage location.
Step 302, based on the target storage location, sending frame data corresponding to the target view angle.
Specifically, after the target storage location is acquired, the storage location of the frame data corresponding to the target view angle in the target video file may be determined based on the target storage location, and then the frame data corresponding to the target view angle may be extracted and sent to the terminal.
It can be understood that in the process of multi-view video distribution and playing, the terminal can determine the target storage position of the frame data corresponding to the target view angle in the target video file, so as to obtain the frame data corresponding to the target view angle sent by the network device, without performing video stitching, and without loading the video data of multiple view angles by means of ultra-high resolution pictures such as 8K and 16K; the transmitted pictures are not affected by the number of shooting view angles (an increase in the number of view angles does not affect the resolution and video quality of a single view angle), and as long as one video stream can be transmitted, distribution and playing of n-view video at the same resolution can be realized.
It can be understood that, in the process of playing the video frame (frame data) corresponding to the target view angle, the terminal can calculate the data position by itself and acquire and play the data, and does not need to interact with the network device frequently.
According to the video data transmission method provided by the embodiment of the application, the network device can determine the target storage position by receiving the first request message, so that the frame data corresponding to the target view angle in the target video file can be determined, and the frame data corresponding to the target view angle can be sent to the terminal; the frame data corresponding to all view angles does not need to be downloaded, the video data corresponding to a plurality of view angles does not need to be spliced, the resolution of a single view angle picture is not reduced, and further the video quality of a single view angle is prevented from being deteriorated.
Optionally, the data structure corresponding to the multi-view video data is a first target table structure, each cell in the first target table structure stores one frame data, a target row in the first target table structure is used for storing frame data corresponding to the two or more views under a target frame time, the frame time corresponding to the first target row is earlier than the frame time corresponding to a second target row, the first target row and the second target row are any two adjacent rows in the first target table structure, and the first target row is a previous row of the second target row;
The index header information comprises one or more pieces of frame position information, the data structure corresponding to the index header information is a second target table structure, each cell in the second target table structure stores one piece of frame position information, the number of rows and columns of the second target table structure is the same as that of rows and columns of the first target table structure, the target frame position information is used for representing the storage position of target frame data in the target video file, the target frame position information is any one of the one or more pieces of frame position information, and the row number of the target frame in the second target table structure is the same as that of the target frame data in the first target table structure.
Specifically, the number of rows and columns of the first target table structure and the number of rows and columns of the second target table structure may be kept the same, and further, based on the target frame position information in the second target table structure, the storage position of the target frame data in the first target table structure in the target video file may be determined, that is, the storage positions of the frame data corresponding to two or more view angles in the target video file may be obtained by analyzing the index header information, and the target storage position of the frame data corresponding to the target view angle in the target video file may be determined.
Alternatively, the target frame position information may be a target offset, which may represent an offset byte number of the target frame data with respect to a first byte of the target video file, and the byte numbers respectively corresponding to the one or more frame position information may be the same.
Therefore, by keeping the row number of the first target table structure and the column number of the second target table structure the same, the terminal analyzes the index header information, so that the target storage position of the frame data corresponding to the target view angle in the target video file can be determined, the network device can determine the target storage position by receiving the first request message, further, the frame data corresponding to the target view angle in the target video file can be determined, further, the frame data corresponding to the target view angle can be sent to the terminal, downloading of the frame data corresponding to all view angles is not needed, and further, splicing of the video data corresponding to a plurality of view angles is not needed, the resolution of the single view angle picture can not be reduced, and further, the video quality of the single view angle can be prevented from being deteriorated.
Optionally, before the receiving the first request message, the method further includes:
receiving a second request message sent by the terminal and used for requesting the index header information;
And sending the index head information to the terminal.
Specifically, in order for the terminal to acquire the index header information, the second request message for requesting the index header information transmitted by the terminal may be received before the first request message is received, and the index header information may be further transmitted to the terminal.
Therefore, the network device receives the first request message to determine the target storage position, so that the frame data corresponding to the target view angle in the target video file can be determined, the frame data corresponding to the target view angle can be sent to the terminal, the frame data corresponding to all view angles do not need to be downloaded, the video data corresponding to a plurality of view angles do not need to be spliced, the resolution of a single view angle picture is not reduced, and further the video quality of a single view angle is prevented from being deteriorated.
Optionally, the target video file further includes header information, where the header information is used to characterize a byte length of the index header information, and before the receiving the first request message, the method further includes:
receiving a third request message sent by the terminal and used for requesting the header information;
and sending the header information to the terminal.
Specifically, in order for the terminal to acquire the header information, a third request message for requesting the header information, which is transmitted by the terminal, may be received before the first request message is received, and the header information may be further transmitted to the terminal.
Therefore, by keeping the row number of the first target table structure and the column number of the second target table structure the same, the terminal analyzes the index header information, so that the target storage position of the frame data corresponding to the target view angle in the target video file can be determined, the network device can determine the target storage position by receiving the first request message, further, the frame data corresponding to the target view angle in the target video file can be determined, further, the frame data corresponding to the target view angle can be sent to the terminal, downloading of the frame data corresponding to all view angles is not needed, and further, splicing of the video data corresponding to a plurality of view angles is not needed, the resolution of the single view angle picture can not be reduced, and further, the video quality of the single view angle can be prevented from being deteriorated.
Optionally, before the receiving the first request message, the method further includes:
acquiring frame data corresponding to the two or more visual angles respectively;
acquiring the multi-view video data based on the first target table structure and frame data respectively corresponding to the two or more views;
acquiring the index header information based on the multi-view video data and the second target table structure;
acquiring the header information based on the index header information;
And acquiring the target video file based on the header information, the index header information and the multi-view video data.
Specifically, in order to acquire the target video file, before receiving the first request message, frame data corresponding to two or more views respectively may be acquired, and then, based on the first target table structure, the frame data corresponding to two or more views respectively may be stored, so as to acquire multi-view video data, and then, based on the multi-view video data and the second target table structure, index header information may be acquired, and further, based on the index header information, header information may be acquired, and then, the target video file may be acquired.
Optionally, the data structure corresponding to the target video file may be a two-dimensional table data structure, that is, a data structure for organizing and storing the video data of multiple views/free views, which may be used for file storage, video streaming, and the like. As shown in fig. 2, a two-dimensional table (matrix) may represent the time-sequence relationship and the spatial-position relationship between the independent frames carried by the frame sequence of each video stream in a multi-view scene. The horizontal axis of the table represents the arrangement of the spatial positions of the cameras at different view angles, and the vertical axis represents the time sequence of the frames of the video stream at a given view angle: the frames in one row store the picture data of all view angles at one instant, and the frames in one column store all the frame data, in time order, of one view angle.
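As a concrete illustration of the layout just described, the table can be modeled as a list of rows, with one row per frame time and one column per camera view. This is a minimal sketch; the variable names and placeholder frame bytes are illustrative, not from the patent.

```python
# Minimal sketch of the two-dimensional table: row index = frame time,
# column index = camera view, matching the layout described for fig. 2.
n_views, n_frames = 3, 4   # 3 camera positions, 4 frame times (illustrative)

# Each cell holds one encoded frame; placeholder bytes stand in for real data.
table = [[f"frame(t={t}, view={v})".encode() for v in range(n_views)]
         for t in range(n_frames)]

# Frames in the same row: every view's picture at one instant.
all_views_at_t0 = table[0]
# Frames in the same column: one view's frames in time order.
view1_over_time = [row[1] for row in table]
```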
Alternatively, as shown in fig. 2, the data structure corresponding to the target video file may be a two-dimensional table data structure, and the target video file may include three parts: the first part may be the header information, the second part may be the index header information, and the third part may be the multi-view video data.
Optionally, the video streams of the n view angles may be read frame by frame (their start time stamps need to be consistent); that is, starting from a certain moment, the frames of each view angle are read into a storage device (such as a memory or a hard disk), and the frame sequence of each view angle is organized in time order into a two-dimensional logical table. According to the size of each frame and the table structure shown in fig. 2, the offset of each field in the third part of the table is calculated and stored into the structure of the second part of the table (first placed in memory); finally the first part of the table is generated, and the three parts are output to the file in sequence, thereby realizing the storage of the multi-view data.
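The assembly steps above (read frames, compute third-part offsets, fill the second part, generate the first part, output all three parts in sequence) can be sketched as follows. The concrete byte layout here — an 8-byte header holding the index length, and a 16-byte (offset, length) cell per index entry — is an assumption for illustration; the patent does not fix these sizes.

```python
import struct

ENTRY = struct.Struct("<QQ")  # assumed index cell: (offset, length), little-endian

def build_file(table):
    """Assemble [header | index | frame data] from a list of rows of frames.

    `table` is a two-dimensional frame table: one row per frame time,
    one column per view, as in the structure described above.
    """
    n_rows, n_cols = len(table), len(table[0])
    index_len = n_rows * n_cols * ENTRY.size
    header = struct.pack("<Q", index_len)   # part 1: byte length of part 2
    data_start = len(header) + index_len    # offsets count from file start
    index, body = bytearray(), bytearray()
    for row in table:                       # row = one frame time
        for frame in row:                   # column = one view
            index += ENTRY.pack(data_start + len(body), len(frame))
            body += frame
    return header + bytes(index) + bytes(body)
```

With such a layout, a terminal can fetch the first part, learn the index length, fetch the index, and then request individual frames by byte range.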
Alternatively, all the video data of the n view angles may be stored in one file; the video data may also be stored in segments, for example frames 1-100 in a first file segment (each file segment organized according to the data structure shown in fig. 2), frames 101-200 in a second file segment, and so on, until all the video data have been stored.
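The segmented-storage option can be sketched as a small helper that yields the 1-based frame ranges per file segment; the 100-frame segment length mirrors the example above and is not mandated by the patent.

```python
def segment_ranges(total_frames, seg_len=100):
    # Returns 1-based inclusive (first, last) frame numbers per file segment,
    # e.g. (1, 100), (101, 200), ..., with a shorter final segment if needed.
    return [(start, min(start + seg_len - 1, total_frames))
            for start in range(1, total_frames + 1, seg_len)]
```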
Alternatively, the target video file may be named following the HTTP-based adaptive bitrate streaming protocol (HTTP Live Streaming, HLS) or another streaming naming convention; for example, the file slice name may end with .ts or .fmp4, which allows a content delivery network (Content Delivery Network, CDN) to deliver the file without any modification to the CDN.
Therefore, the network device can process the frame data corresponding to each of the two or more view angles to obtain the target video file, determine the frame data corresponding to the target view angle in the target video file, and send that frame data to the terminal. The frame data corresponding to all view angles need not be downloaded and the video data corresponding to a plurality of view angles need not be spliced, so the resolution of the single-view-angle picture is not reduced and the video quality of the single view angle is prevented from deteriorating.
The video data transmission device provided by the embodiment of the present application is described below; the video data transmission device described below and the video data transmission method described above correspond to each other and may be cross-referenced.
Fig. 4 is a schematic structural diagram of a video data transmission device according to an embodiment of the present application, where, as shown in fig. 4, the device is applied to a terminal, and includes: a first determining module 401, a second determining module 402, a first transmitting module 403 and a first receiving module 404, wherein:
A first determining module 401, configured to determine a target viewing angle;
a second determining module 402, configured to determine, based on index header information, a target storage location of frame data corresponding to the target view in a target video file, where the target video file includes the index header information and multi-view video data, the multi-view video data includes frame data corresponding to two or more views, and the index header information is used to characterize a storage location of the frame data corresponding to the two or more views in the target video file;
a first sending module 403, configured to send a first request message for requesting frame data corresponding to the target view, where the first request message carries the target storage location;
the first receiving module 404 is configured to receive frame data corresponding to the target view angle.
According to the video data transmission device provided by the embodiment of the application, the frame data corresponding to the target view angle can be obtained by determining the target storage position, in the target video file, of the frame data corresponding to the target view angle; the frame data corresponding to all view angles need not be downloaded and the video data corresponding to a plurality of view angles need not be spliced, so the resolution of the single-view-angle picture is not reduced and the video quality of the single view angle is prevented from deteriorating.
Fig. 5 is a second schematic structural diagram of a video data transmission apparatus according to an embodiment of the present application, as shown in fig. 5, where the apparatus is applied to a network device, and includes: a second receiving module 501 and a second transmitting module 502, wherein:
a second receiving module 501, configured to receive a first request message, where the first request message carries a target storage location; the target storage position is used for representing a target storage position of frame data corresponding to a target view angle in a target video file, the target video file is stored in network equipment, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing storage positions of the frame data corresponding to the two or more view angles respectively in the target video file;
and a second sending module 502, configured to send frame data corresponding to the target view angle based on the target storage location.
The terminal according to the embodiment of the application can be a device for providing voice and/or data connectivity for a user, a handheld device with a wireless connection function, or other processing devices connected to a wireless modem, etc. The names of the terminal devices may also be different in different systems, for example in a 5G system, the terminal devices may be referred to as User Equipment (UE).
The network device according to the embodiment of the present application may be a base station, where the base station may include a plurality of cells for providing services for the terminal. A base station may also be called an access point or may be a device in an access network that communicates over the air-interface, through one or more sectors, with wireless terminal devices, or other names, depending on the particular application.
Fig. 6 is a schematic structural diagram of a terminal provided in an embodiment of the present application, and referring to fig. 6, an embodiment of the present application further provides a terminal, which may include: a memory 610, a transceiver 620, and a processor 630;
the memory 610 is used for storing a computer program; a transceiver 620 for transceiving data under the control of the processor 630; a processor 630 for reading the computer program in the memory 610 and performing the following operations:
determining a target viewing angle;
determining a target storage position of frame data corresponding to the target view angle in a target video file based on index header information, wherein the target video file comprises the index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
sending a first request message for requesting frame data corresponding to the target visual angle, wherein the first request message carries the target storage position;
and receiving frame data corresponding to the target visual angle.
In fig. 6, the bus architecture may comprise any number of interconnected buses and bridges, linking together one or more processors, represented by the processor 630, and various circuits of memory, represented by the memory 610. The bus architecture may also link together various other circuits, such as peripheral devices, voltage regulators and power management circuits, which are well known in the art and therefore are not further described herein. The bus interface provides an interface. The transceiver 620 may be a plurality of elements, i.e., comprise a transmitter and a receiver, providing a unit for communicating with various other apparatuses over a transmission medium. For different user devices, the user interface 640 may also be an interface capable of externally or internally connecting the required devices.
The processor 630 is responsible for managing the bus architecture and general processing, and the memory 610 may store data used by the processor 630 in performing operations.
Processor 630 is operative to perform any of the methods provided in embodiments of the present application in accordance with the obtained executable instructions by invoking a computer program stored in memory 610. The processor and the memory may also be physically separate.
Optionally, the processor 630 is further configured to perform the following operations:
optionally, the data structure corresponding to the multi-view video data is a first target table structure, each cell in the first target table structure stores one frame data, a target row in the first target table structure is used for storing frame data corresponding to the two or more views under a target frame time, the frame time corresponding to the first target row is earlier than the frame time corresponding to a second target row, the first target row and the second target row are any two adjacent rows in the first target table structure, and the first target row is a previous row of the second target row;
the index header information comprises one or more pieces of frame position information, the data structure corresponding to the index header information is a second target table structure, and each cell in the second target table structure stores one piece of frame position information; the numbers of rows and columns of the second target table structure are the same as those of the first target table structure; target frame position information is used for representing the storage position of target frame data in the target video file, the target frame position information is any one of the one or more pieces of frame position information, and the row and column numbers of the target frame position information in the second target table structure are the same as the row and column numbers of the target frame data in the first target table structure.
Optionally, the determining, based on the index header information, the target storage location of the frame data corresponding to the target view in the target video file includes:
transmitting a second request message for requesting the index header information;
receiving the index header information;
based on the index header information, the target view angle number and the target frame number, determining all or part of frame data between a first target frame and a second target frame as frame data to be acquired in frame data corresponding to the target view angle, and determining a storage position of the frame data to be acquired in the target video file as the target storage position;
the target view angle number is the number of the two or more view angles, and the target frame number is the number of frames corresponding to each of the two or more view angles; the first target frame and the second target frame are frame data corresponding to the target view angle, the frame time corresponding to the first target frame is earlier than or equal to the frame time corresponding to a target playing frame, the frame time corresponding to the second target frame is later than or equal to the frame time corresponding to the target playing frame, and the target playing frame is the frame data corresponding to the target view angle at the target playing time.
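On the terminal side, the lookup described above reduces to indexing into the second-part table. The sketch below assumes an illustrative fixed 16-byte (offset, length) entry per index cell, which the patent does not specify; cells mirror the data table row-major, one row per frame time and one column per view.

```python
import struct

ENTRY = struct.Struct("<QQ")  # assumed index cell: (offset, length)

def locate_frame(index_bytes, n_views, frame_time, view):
    """Return the (offset, length) byte range of one frame in the file.

    Index cells are laid out row-major: row = frame time, column = view,
    so the flat cell number is frame_time * n_views + view.
    """
    cell = frame_time * n_views + view
    return ENTRY.unpack_from(index_bytes, cell * ENTRY.size)
```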
Optionally, the determining that all or part of the frame data between the first target frame and the second target frame is frame data to be acquired includes:
determining a target playing mode, wherein the target playing mode is a frame-by-frame playing mode or a frame-skip playing mode;
and determining that partial frame data between the first target frame and the second target frame is the frame data to be acquired based on the index header information, the target view angle number and the target frame number under the condition that the target play mode is the frame skip play mode.
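The choice between the two play modes can be sketched as follows; the `step` parameter for the frame-skip mode is illustrative, not specified by the patent.

```python
def frames_to_fetch(first_frame, second_frame, skip_mode, step=2):
    # Frame-by-frame mode fetches every frame between the first and second
    # target frames; frame-skip mode fetches only every `step`-th frame.
    stride = step if skip_mode else 1
    return list(range(first_frame, second_frame + 1, stride))
```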
Optionally, the target video file further includes header information, where the header information is used to characterize a byte length of the index header information, and before determining that all or part of frame data between a first target frame and a second target frame is to be acquired frame data in the frame data corresponding to the target view based on the index header information, the target view number, and the target frame number, the operations further include:
transmitting a third request message for requesting the header information;
Receiving the header information;
determining a byte length of the index header information based on the header information;
the target frame number is determined based on the byte length of the index header information.
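The last step above — deriving the target frame number from the byte length of the index header — can be sketched as below, again assuming an illustrative fixed entry size of 16 bytes per index cell.

```python
def target_frame_number(index_byte_length, n_views, entry_size=16):
    # The index has one fixed-size cell per (frame time, view) pair, so the
    # number of frames per view is total cells divided by the view count.
    total_cells = index_byte_length // entry_size
    return total_cells // n_views
```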
Fig. 7 is a schematic structural diagram of a network device according to an embodiment of the present application, and referring to fig. 7, an embodiment of the present application further provides a network device, which may include: memory 710, transceiver 720, and processor 730;
the memory 710 is used to store computer programs; a transceiver 720 for receiving and transmitting data under the control of the processor 730; a processor 730 for reading the computer program in the memory 710 and performing the following operations:
receiving a first request message, wherein the first request message carries a target storage position; the target storage position is used for representing the storage position of frame data corresponding to a target view angle in a target video file, the target video file is stored in network equipment, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
And sending frame data corresponding to the target visual angle based on the target storage position.
Wherein in fig. 7, a bus architecture may comprise any number of interconnected buses and bridges, and in particular one or more processors represented by processor 730 and various circuits of memory represented by memory 710, linked together. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators, power management circuits, etc., which are well known in the art and, therefore, will not be described further herein. The bus interface provides an interface. Transceiver 720 may be a number of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 730 is responsible for managing the bus architecture and general processing, and the memory 710 may store data used by the processor 730 in performing operations.
Optionally, the processor 730 is further configured to:
optionally, the data structure corresponding to the multi-view video data is a first target table structure, each cell in the first target table structure stores one frame data, a target row in the first target table structure is used for storing frame data corresponding to the two or more views under a target frame time, the frame time corresponding to the first target row is earlier than the frame time corresponding to a second target row, the first target row and the second target row are any two adjacent rows in the first target table structure, and the first target row is a previous row of the second target row;
The index header information comprises one or more pieces of frame position information, the data structure corresponding to the index header information is a second target table structure, and each cell in the second target table structure stores one piece of frame position information; the numbers of rows and columns of the second target table structure are the same as those of the first target table structure; target frame position information is used for representing the storage position of target frame data in the target video file, the target frame position information is any one of the one or more pieces of frame position information, and the row and column numbers of the target frame position information in the second target table structure are the same as the row and column numbers of the target frame data in the first target table structure.
Optionally, before the receiving the first request message, the operations further include:
receiving a second request message sent by the terminal and used for requesting the index header information;
and sending the index head information to the terminal.
Optionally, the target video file further includes header information, the header information being used to characterize a byte length of the index header information, and before the receiving the first request message, the operations further include:
receiving a third request message sent by the terminal and used for requesting the header information;
And sending the header information to the terminal.
Optionally, before the receiving the first request message, the operations further include:
acquiring frame data corresponding to the two or more visual angles respectively;
acquiring the multi-view video data based on the first target table structure and frame data respectively corresponding to the two or more views;
acquiring the index header information based on the multi-view video data and the second target table structure;
acquiring the header information based on the index header information;
and acquiring the target video file based on the header information, the index header information and the multi-view video data.
It should be noted that, the terminal and the network device provided in the embodiments of the present application can implement all the method steps implemented in the embodiments of the method and achieve the same technical effects, and detailed descriptions of the same parts and beneficial effects as those of the embodiments of the method in the embodiments are omitted herein.
Fig. 8 is a schematic diagram of a video data transmission system according to an embodiment of the present application, as shown in fig. 8, where the system includes: terminal 801 and network equipment 802, network equipment stores target video file, target video file includes index header information and multiview video data, multiview video data includes two or more frame data that the view corresponds respectively, index header information is used for representing the storage position in target video file of frame data that the view corresponds respectively more than two, wherein:
The terminal 801 is configured to: determining a target viewing angle;
determining a target storage position of frame data corresponding to the target visual angle in the target video file based on the index header information;
a first request message for requesting frame data corresponding to the target visual angle is sent to the network equipment, wherein the first request message carries the target storage position;
the network device 802 is configured to: receiving the first request message;
based on the target storage position, sending frame data corresponding to the target visual angle to the terminal;
the terminal 801 is further configured to receive frame data corresponding to the target view angle.
According to the video data transmission system provided by the embodiment of the application, the frame data corresponding to the target visual angle can be obtained by determining the target storage position of the frame data corresponding to the target visual angle in the target video file, the frame data corresponding to all visual angles is not required to be downloaded, the video data corresponding to a plurality of visual angles are not required to be spliced, the resolution of a single visual angle picture is not reduced, and further the video quality of a single visual angle is prevented from being deteriorated.
Optionally, the data structure corresponding to the multi-view video data is a first target table structure, each cell in the first target table structure stores one frame data, a target row in the first target table structure is used for storing frame data corresponding to the two or more views under a target frame time, the frame time corresponding to the first target row is earlier than the frame time corresponding to a second target row, the first target row and the second target row are any two adjacent rows in the first target table structure, and the first target row is a previous row of the second target row;
The index header information comprises one or more pieces of frame position information, the data structure corresponding to the index header information is a second target table structure, and each cell in the second target table structure stores one piece of frame position information; the numbers of rows and columns of the second target table structure are the same as those of the first target table structure; target frame position information is used for representing the storage position of target frame data in the target video file, the target frame position information is any one of the one or more pieces of frame position information, and the row and column numbers of the target frame position information in the second target table structure are the same as the row and column numbers of the target frame data in the first target table structure.
According to the video data transmission system provided by the embodiment of the application, the frame data corresponding to the target visual angle can be obtained by determining the target storage position of the frame data corresponding to the target visual angle in the target video file, the frame data corresponding to all visual angles is not required to be downloaded, the video data corresponding to a plurality of visual angles are not required to be spliced, the resolution of a single visual angle picture is not reduced, and further the video quality of a single visual angle is prevented from being deteriorated.
It should be noted that, in the video data transmission system provided by the embodiment of the present application, all the method steps implemented by the method embodiment can be implemented, and the same technical effects can be achieved, and the details of the same parts and beneficial effects as those of the method embodiment in the embodiment are not described in detail herein.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 9, the electronic device may include: a processor 910, a communication interface 920, a memory 930, and a communication bus 940, where the processor 910, the communication interface 920, and the memory 930 communicate with each other via the communication bus 940. The processor 910 may call a computer program in the memory 930 to perform the steps of the video data transmission method, for example including:
determining a target viewing angle;
determining a target storage position of frame data corresponding to the target view angle in a target video file based on index header information, wherein the target video file comprises the index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
A first request message for requesting frame data corresponding to the target visual angle is sent, wherein the first request message carries the target storage position;
receiving frame data corresponding to the target visual angle; or
Receiving a first request message, wherein the first request message carries a target storage position; the target storage position is used for representing the storage position of frame data corresponding to a target view angle in a target video file, the target video file is stored in network equipment, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
and sending frame data corresponding to the target visual angle based on the target storage position.
Further, the logic instructions in the memory 930 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, embodiments of the present application further provide a computer program product, where the computer program product includes a computer program, where the computer program may be stored on a non-transitory computer readable storage medium, where the computer program when executed by a processor is capable of executing the steps of the video data transmission method provided in the foregoing embodiments, for example, including:
determining a target viewing angle;
determining a target storage position of frame data corresponding to the target view angle in a target video file based on index header information, wherein the target video file comprises the index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
a first request message for requesting frame data corresponding to the target visual angle is sent, wherein the first request message carries the target storage position;
receiving frame data corresponding to the target visual angle; or
Receiving a first request message, wherein the first request message carries a target storage position; the target storage position is used for representing the storage position of frame data corresponding to a target view angle in a target video file, the target video file is stored in network equipment, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
And sending frame data corresponding to the target visual angle based on the target storage position.
In another aspect, embodiments of the present application further provide a processor-readable storage medium storing a computer program for causing a processor to execute the steps of the method provided in the above embodiments, for example, including:
determining a target viewing angle;
determining a target storage position of frame data corresponding to the target view angle in a target video file based on index header information, wherein the target video file comprises the index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
a first request message for requesting frame data corresponding to the target visual angle is sent, wherein the first request message carries the target storage position;
receiving frame data corresponding to the target visual angle; or
Receiving a first request message, wherein the first request message carries a target storage position; the target storage position is used for representing the storage position of frame data corresponding to a target view angle in a target video file, the target video file is stored in network equipment, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
And sending frame data corresponding to the target visual angle based on the target storage position.
The processor-readable storage medium may be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic storage (e.g., floppy disks, hard disks, magnetic tapes, magneto-optical disks (MO), etc.), optical storage (e.g., CD, DVD, BD, HVD, etc.), and semiconductor storage (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), Solid State Disk (SSD), etc.).
The apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the technical solution above, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. A video data transmission method, applied to a terminal, comprising:
determining a target viewing angle;
determining a target storage position of frame data corresponding to the target view angle in a target video file based on index header information, wherein the target video file comprises the index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
sending a first request message for requesting the frame data corresponding to the target view angle, wherein the first request message carries the target storage position;
and receiving the frame data corresponding to the target view angle.
2. The video data transmission method according to claim 1, wherein a data structure corresponding to the multi-view video data is a first target table structure, each cell in the first target table structure stores one piece of frame data, a target row in the first target table structure is used for storing the frame data corresponding to each of the two or more view angles at a target frame time, a frame time corresponding to a first target row is earlier than a frame time corresponding to a second target row, the first target row and the second target row are any two adjacent rows in the first target table structure, and the first target row is the row immediately preceding the second target row;
the index header information comprises one or more pieces of frame position information, a data structure corresponding to the index header information is a second target table structure, each cell in the second target table structure stores one piece of frame position information, the numbers of rows and columns of the second target table structure are the same as those of the first target table structure, target frame position information is used for representing the storage position of target frame data in the target video file, the target frame position information is any one of the one or more pieces of frame position information, and the row and column numbers of the target frame position information in the second target table structure are the same as the row and column numbers of the target frame data in the first target table structure.
3. The video data transmission method according to claim 2, wherein the determining, based on the index header information, a target storage position of the frame data corresponding to the target view angle in the target video file comprises:
transmitting a second request message for requesting the index header information;
receiving the index header information;
based on the index header information, a target view angle number, and a target frame number, determining all or part of the frame data between a first target frame and a second target frame, among the frame data corresponding to the target view angle, as frame data to be acquired, and determining the storage position of the frame data to be acquired in the target video file as the target storage position;
wherein the target view angle number is the number of the two or more view angles, and the target frame number is the number of frames corresponding to each of the two or more view angles; the first target frame and the second target frame are frame data corresponding to the target view angle, the frame time corresponding to the first target frame is earlier than or equal to the frame time corresponding to a target playing frame, the frame time corresponding to the second target frame is later than or equal to the frame time corresponding to the target playing frame, and the target playing frame is the frame data corresponding to the target view angle at the target playing time.
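Claim 3 selects, within the target view's column, the frames bracketing the target playing frame. A sketch using the index grid of claim 2, with the simplifying assumptions that one row corresponds to one frame time and that the bracket spans the rows immediately before and after the playing row; all names are illustrative:

```python
def frames_to_fetch(index, view_col, play_row):
    """Return storage positions for the target view's frames between the
    first target frame (frame time <= playing time) and the second target
    frame (frame time >= playing time).

    index:    grid of (offset, length) entries, index[row][view].
    view_col: column of the target view angle.
    play_row: row of the target playing frame.
    """
    first = max(play_row - 1, 0)                # first target frame row
    second = min(play_row + 1, len(index) - 1)  # second target frame row
    return [index[r][view_col] for r in range(first, second + 1)]

index = [[(0, 3), (3, 4)], [(7, 5), (12, 2)], [(14, 3), (17, 4)]]
print(frames_to_fetch(index, 1, 1))  # [(3, 4), (12, 2), (17, 4)]
```

Fetching a small bracket around the playing frame lets the terminal switch view angles without downloading the other views' frame data.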
4. The video data transmission method according to claim 3, wherein the target video file further includes header information for characterizing a byte length of the index header information, and before the determining of the frame data to be acquired and of the target storage position based on the index header information, the target view angle number, and the target frame number, the method further comprises:
transmitting a third request message for requesting the header information;
receiving the header information;
determining a byte length of the index header information based on the header information;
and determining the target frame number based on the byte length of the index header information.
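Claim 4 derives the target frame number from the index header's byte length. This works if each frame position entry has a fixed size, so the row count falls out of simple division; the entry size below is an assumption for illustration:

```python
ENTRY_BYTES = 8  # assumed fixed size of one frame position entry

def target_frame_number(index_header_bytes: int, view_angle_number: int) -> int:
    """Derive the number of frame times (rows of the index grid) from the
    index header's byte length, given the number of view angles (columns)
    and a fixed entry size: rows = bytes / (entry_size * columns)."""
    return index_header_bytes // (ENTRY_BYTES * view_angle_number)

# 480 bytes of index, 3 view angles -> 480 / (8 * 3) = 20 frame times.
print(target_frame_number(480, 3))  # 20
```

So the terminal needs only two small preliminary requests, for the header and then the index header, before it can address any frame in the file.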
5. A video data transmission method, applied to a network device, comprising:
receiving a first request message, wherein the first request message carries a target storage position; the target storage position is used for representing the storage position of frame data corresponding to a target view angle in a target video file, the target video file is stored in network equipment, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
and sending the frame data corresponding to the target view angle based on the target storage position.
6. The video data transmission method according to claim 5, wherein a data structure corresponding to the multi-view video data is a first target table structure, each cell in the first target table structure stores one piece of frame data, a target row in the first target table structure is used for storing the frame data corresponding to each of the two or more view angles at a target frame time, a frame time corresponding to a first target row is earlier than a frame time corresponding to a second target row, the first target row and the second target row are any two adjacent rows in the first target table structure, and the first target row is the row immediately preceding the second target row;
the index header information comprises one or more pieces of frame position information, a data structure corresponding to the index header information is a second target table structure, each cell in the second target table structure stores one piece of frame position information, the numbers of rows and columns of the second target table structure are the same as those of the first target table structure, target frame position information is used for representing the storage position of target frame data in the target video file, the target frame position information is any one of the one or more pieces of frame position information, and the row and column numbers of the target frame position information in the second target table structure are the same as the row and column numbers of the target frame data in the first target table structure.
7. The video data transmission method according to claim 6, wherein the target video file further comprises header information for characterizing a byte length of the index header information, and before receiving the first request message, the method further comprises:
acquiring the frame data corresponding to each of the two or more view angles;
acquiring the multi-view video data based on the first target table structure and frame data respectively corresponding to the two or more views;
acquiring the index header information based on the multi-view video data and the second target table structure;
acquiring the header information based on the index header information;
and acquiring the target video file based on the header information, the index header information and the multi-view video data.
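The assembly order in claim 7 (frames, then index, then header, then the concatenated file) can be sketched end to end. The byte layout here — a 4-byte header holding the index's byte length, and 8-byte big-endian (offset, length) entries — is an assumption for illustration, not specified by the patent:

```python
import struct

def assemble_video_file(frame_table):
    """Assemble a target video file as header | index header | video data,
    following claim 7: build the multi-view payload from the first table
    structure, the index from the second, then the header from the index.
    """
    # Multi-view video data: frames concatenated row-major.
    payload = b"".join(f for row in frame_table for f in row)
    # Index header: one fixed-size (offset, length) entry per frame.
    index, offset = b"", 0
    for row in frame_table:
        for frame in row:
            index += struct.pack(">II", offset, len(frame))
            offset += len(frame)
    # Header: the byte length of the index header (claim 4's header info).
    header = struct.pack(">I", len(index))
    return header + index + payload

blob = assemble_video_file([[b"a", b"bb"], [b"ccc", b"d"]])
print(len(blob))  # 4 (header) + 32 (4 entries * 8 bytes) + 7 (frames) = 43
```

A terminal reading this file would first fetch the 4-byte header, learn the index is 32 bytes, fetch it, and from then on address any frame directly.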
8. A terminal comprising a memory, a transceiver, and a processor;
a memory for storing a computer program; a transceiver for transceiving data under control of the processor; a processor for reading the computer program in the memory and performing the following operations:
determining a target viewing angle;
determining a target storage position of frame data corresponding to the target view angle in a target video file based on index header information, wherein the target video file comprises the index header information and multi-view video data, the multi-view video data comprises frame data corresponding to two or more view angles respectively, and the index header information is used for representing the storage position of the frame data corresponding to the two or more view angles respectively in the target video file;
sending a first request message for requesting the frame data corresponding to the target view angle, wherein the first request message carries the target storage position;
and receiving frame data corresponding to the target visual angle.
9. A network device comprising a memory, a transceiver, and a processor;
a memory for storing a computer program; a transceiver for transceiving data under control of the processor; a processor for reading the computer program in the memory and performing the following operations:
receiving a first request message, wherein the first request message carries a target storage position; the target storage position is used for representing the storage position of frame data corresponding to a target view angle in a target video file, the target video file is stored in the network device, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to each of two or more view angles, and the index header information is used for representing the storage positions of the frame data corresponding to each of the two or more view angles in the target video file;
and sending the frame data corresponding to the target view angle based on the target storage position.
10. A video data transmission system, comprising a terminal and a network device, wherein the network device stores a target video file, the target video file comprises index header information and multi-view video data, the multi-view video data comprises frame data corresponding to each of two or more view angles, and the index header information is used for representing the storage positions of the frame data corresponding to each of the two or more view angles in the target video file; wherein:
the terminal is used for: determining a target viewing angle;
determining a target storage position of frame data corresponding to the target view angle in the target video file based on the index header information;
and sending a first request message for requesting the frame data corresponding to the target view angle to the network device, wherein the first request message carries the target storage position;
the network device is configured to: receiving the first request message;
and sending, based on the target storage position, the frame data corresponding to the target view angle to the terminal;
the terminal is further used for receiving the frame data corresponding to the target view angle.
11. The video data transmission system according to claim 10, wherein a data structure corresponding to the multi-view video data is a first target table structure, each cell in the first target table structure stores one piece of the frame data, a target row in the first target table structure is used for storing the frame data corresponding to each of the two or more view angles at a target frame time, a frame time corresponding to a first target row is earlier than a frame time corresponding to a second target row, the first target row and the second target row are any two adjacent rows in the first target table structure, and the first target row is the row immediately preceding the second target row;
the index header information comprises one or more pieces of frame position information, a data structure corresponding to the index header information is a second target table structure, each cell in the second target table structure stores one piece of frame position information, the numbers of rows and columns of the second target table structure are the same as those of the first target table structure, target frame position information is used for representing the storage position of target frame data in the target video file, the target frame position information is any one of the one or more pieces of frame position information, and the row and column numbers of the target frame position information in the second target table structure are the same as the row and column numbers of the target frame data in the first target table structure.
12. An electronic device comprising a processor and a memory storing a computer program, characterized in that the processor implements the video data transmission method of any one of claims 1 to 4 or the video data transmission method of any one of claims 5 to 7 when executing the computer program.
CN202210441397.2A 2022-04-25 2022-04-25 Video data transmission method, terminal, network device, system and electronic device Pending CN117014723A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210441397.2A CN117014723A (en) 2022-04-25 2022-04-25 Video data transmission method, terminal, network device, system and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210441397.2A CN117014723A (en) 2022-04-25 2022-04-25 Video data transmission method, terminal, network device, system and electronic device

Publications (1)

Publication Number Publication Date
CN117014723A true CN117014723A (en) 2023-11-07

Family

ID=88564034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210441397.2A Pending CN117014723A (en) 2022-04-25 2022-04-25 Video data transmission method, terminal, network device, system and electronic device

Country Status (1)

Country Link
CN (1) CN117014723A (en)

Similar Documents

Publication Publication Date Title
CN107040794A (en) Video broadcasting method, server, virtual reality device and panoramic virtual reality play system
US20210400233A1 (en) Encoding method, image encoder and image transmission system
US20150208103A1 (en) System and Method for Enabling User Control of Live Video Stream(s)
WO2021147702A1 (en) Video processing method and apparatus
CN111416989A (en) Video live broadcast method and system and electronic equipment
CN110035331B (en) Media information processing method and device
WO2020228482A1 (en) Video processing method, apparatus and system
CN109792544A (en) Method and apparatus for spreading defeated panoramic video
US10701461B2 (en) Video Processing Method, Terminal and Server
TWI786572B (en) Immersive media providing method and acquiring method, device, equipment and storage medium
CN105577645A (en) Agent-based HLS client-end device and realization method thereof
CN111601151A (en) Method, device, medium and equipment for reviewing hundred million-level pixel video
CN111447457A (en) Live video processing method and device and storage medium
CN113542896B (en) Video live broadcast method, equipment and medium of free view angle
CN114245153A (en) Slicing method, device, equipment and readable storage medium
CN108574881B (en) Projection type recommendation method, server and client
CN117014723A (en) Video data transmission method, terminal, network device, system and electronic device
CN114938461A (en) Video processing method, device and equipment and readable storage medium
KR20170045633A (en) Method for providing panoramic video service, panoramic video providing server and system
CN114513658B (en) Video loading method, device, equipment and medium
CN117579843B (en) Video coding processing method and electronic equipment
JP2022501846A (en) Encoders and methods for encoding tile-based immersive video
CN112203101B (en) Remote video live broadcast method and device and electronic equipment
WO2022206168A1 (en) Video production method and system
CN114666565B (en) Multi-view video playing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination