CN111225293A - Video data processing method and device and computer storage medium - Google Patents


Info

Publication number
CN111225293A
CN111225293A (application CN201811406127.8A)
Authority
CN
China
Prior art keywords
tile
data
information
video data
decoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811406127.8A
Other languages
Chinese (zh)
Other versions
CN111225293B (en)
Inventor
程来顺
王兴尚
王磊
艾万勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanechips Technology Co Ltd
Original Assignee
Sanechips Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sanechips Technology Co Ltd filed Critical Sanechips Technology Co Ltd
Priority to CN201811406127.8A priority Critical patent/CN111225293B/en
Publication of CN111225293A publication Critical patent/CN111225293A/en
Application granted granted Critical
Publication of CN111225293B publication Critical patent/CN111225293B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H04N 21/8586 Linking data to content, e.g. by linking a URL to a video object, by using a URL
    • H04N 19/176 Adaptive coding in which the coding unit is an image region that is a block, e.g. a macroblock
    • H04N 19/436 Implementation details or hardware specially adapted for video compression or decompression, using parallelised computational arrangements
    • H04N 21/23406 Processing of video elementary streams involving management of a server-side video buffer
    • H04N 21/4122 Peripherals receiving signals from specially adapted client devices, e.g. an additional display device such as a video projector
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs

Abstract

Embodiments of the invention disclose a video data processing method, a device, and a computer storage medium. The method is applied to a video playback terminal and includes: determining block (tile) information corresponding to visual position information based on that visual position information; extracting, according to the tile information, the tile data corresponding to it from pre-cached tile stream data, where the tile stream data is obtained by tile-dividing panoramic video data to be played; storing the extracted tile data in a queue to be decoded; and outputting decoded video data by decoding the tile data in the queue. The method and device ensure that, with multi-tile stream data, the set-top box can play the video data corresponding to different tiles as the user's visual position changes, thereby meeting the user's requirements for the set-top box's VR function.

Description

Video data processing method and device and computer storage medium
Technical Field
The present invention relates to the field of video technologies, and in particular, to a method and an apparatus for processing video data, and a computer storage medium.
Background
With the increasing maturity of Virtual Reality (VR) services, operators want to promote VR panoramic video playback in Internet Protocol Television (IPTV) applications to drive the rapid development of the IPTV industry. Panoramic video, also called 360-degree video, lets viewers watch dynamic video from any angle across a full 360 degrees, producing a truly immersive experience. This requires much higher resolutions, typically 8K, 12K, or above. However, existing set-top box decoders can only handle video at 4K resolution or lower and cannot decode 8K, 12K, or even higher-resolution video; from the decoding point of view, it is therefore difficult for a set-top box to achieve a VR playback effect.
A multi-tile streaming mode (in which a frame of video data is divided into multiple blocks, or tiles) has been proposed, whereby 8K, 12K, or even higher-resolution panoramic video is block-coded and encapsulated so that a set-top box can meet the requirements of high-resolution VR video playback. However, research shows that when a conventional playback scheme is used to play a VR video on a set-top box, only the picture of the tile corresponding to the top-left corner of the VR video frame is displayed, which still fails to meet the user's requirements for the VR function.
Disclosure of Invention
The main objective of the invention is to provide a video data processing method, a video data processing device, and a computer storage medium that ensure multi-tile stream data can be played in a set-top box as video data corresponding to different tiles, following changes in the user's visual position, thereby meeting the user's requirements for the set-top box's VR function.
To achieve this objective, the technical solution of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a video data processing method, where the method is applied to a video playing terminal, and the method includes:
determining block tile information corresponding to visual position information based on the visual position information;
extracting tile data corresponding to the tile information from pre-cached tile stream data according to the tile information; the tile stream data is obtained by tile dividing panoramic video data to be played;
storing the extracted tile data to a queue to be decoded;
and outputting the decoded video data by decoding the tile data in the queue to be decoded.
In the foregoing aspect, before the determining, based on the visual location information, block tile information corresponding to the visual location information, the method further includes:
acquiring visual parameter information; the visual parameter information represents the parameter information required to reflect the user's visual position;
determining whether the visual parameter information is valid;
and when the visual parameter information is valid, obtaining the visual position information based on a preset visual algorithm.
In the foregoing solution, before the extracting tile data corresponding to the tile information from pre-cached tile stream data according to the tile information, the method further includes:
receiving a Uniform Resource Locator (URL) related to panoramic video data to be played;
according to the URL, establishing communication connection with a network server, and acquiring the panoramic video data to be played from the network server;
identifying whether the panoramic video data to be played is tile stream data;
and when the panoramic video data to be played is tile stream data, caching the tile stream data.
In the foregoing solution, after the obtaining the panoramic video data to be played from the network server, the method further includes:
determining the total number of tiles and resolution information corresponding to the panoramic video data to be played based on the analysis of the panoramic video data to be played;
calculating the maximum tile value supported by the video playing terminal based on the total number of tiles and the resolution information;
creating queues to be decoded with the same number as the maximum tile value; and based on a concurrency processing principle, carrying out concurrent processing on the queues to be decoded with the same number as the maximum tile value.
In the above scheme, when the to-be-played panoramic video data is tile stream data, caching the tile stream data includes:
when the panoramic video data to be played is tile stream data, acquiring a corresponding relation between the tile data and tile information; the tile information at least comprises coordinate information of tile data, width information of the tile data and height information of the tile data;
caching the tile stream data and the corresponding relation between the tile data and the tile information.
In the foregoing aspect, the extracting tile data corresponding to the tile information from pre-cached tile stream data according to the tile information includes:
and extracting tile data corresponding to the tile information from the tile stream data according to the tile information and the corresponding relation between the tile data and the tile information.
In the above scheme, the storing the extracted tile data to a queue to be decoded includes:
correspondingly storing the extracted tile data to K queues to be decoded; wherein the value of K is the maximum tile value;
correspondingly, the outputting the decoded video data by decoding the tile data in the queue to be decoded includes:
respectively decoding the tile data in the K queues to be decoded to obtain K decoded video data;
and synthesizing the K decoded video data, and outputting the synthesized video data.
In a second aspect, an embodiment of the present invention provides a video data processing apparatus, where the video data processing apparatus is applied to a video playback terminal, and the video data processing apparatus includes: a determination unit, an extraction unit, a storage unit and a decoding unit, wherein,
the determining unit is configured to determine block tile information corresponding to visual position information based on the visual position information;
the extracting unit is configured to extract tile data corresponding to the tile information from pre-cached tile stream data according to the tile information; the tile stream data is obtained by tile dividing panoramic video data to be played;
the storage unit is configured to store the extracted tile data to a queue to be decoded;
and the decoding unit is configured to output decoded video data through decoding processing of the tile data in the queue to be decoded.
In the above-mentioned aspect, the video data processing apparatus further includes an acquisition unit and a judgment unit, wherein,
the acquisition unit is configured to acquire visual parameter information; the visual parameter information represents parameter information required by reflecting the visual position of a user;
the judging unit is configured to judge whether the visual parameter information is valid;
the obtaining unit is further configured to obtain the visual position information based on a preset visual algorithm when the visual parameter information is valid.
In the above-described aspect, the video data processing apparatus further includes a receiving unit and an identifying unit, wherein,
the receiving unit is configured to receive a Uniform Resource Locator (URL) related to panoramic video data to be played;
the acquisition unit is also configured to establish communication connection with a network server according to the URL and acquire the panoramic video data to be played from the network server;
the identification unit is configured to identify whether the panoramic video data to be played is tile stream data;
the storage unit is further configured to buffer the tile stream data when the to-be-played panoramic video data is the tile stream data.
In the above-described aspect, the video data processing apparatus further includes a calculation unit and a creation unit, wherein,
the determining unit is further configured to determine the total number of tiles and resolution information corresponding to the panoramic video data to be played based on analysis of the panoramic video data to be played;
the calculating unit is configured to calculate a maximum tile value supported by the video playing terminal based on the total number of tiles and the resolution information;
the creating unit is configured to create queues to be decoded, equal in number to the maximum tile value, and, based on a concurrency processing principle, to process these queues concurrently.
In the above scheme, the obtaining unit is further configured to obtain a corresponding relationship between tile data and tile information when the to-be-played panoramic video data is tile stream data; the tile information at least comprises coordinate information of tile data, width information of the tile data and height information of the tile data;
the storage unit is further configured to cache the tile stream data and the corresponding relationship between the tile data and the tile information.
In the above aspect, the extracting unit is configured to extract tile data corresponding to the tile information from the tile stream data based on the tile information and a correspondence between the tile data and the tile information.
In the above-described aspect, the video data processing apparatus further includes a synthesizing unit, wherein,
the storage unit is configured to correspondingly store the extracted tile data to K queues to be decoded; wherein the value of K is the maximum tile value;
the decoding unit is configured to decode the tile data in the K queues to be decoded respectively to obtain K decoded video data;
and the synthesizing unit is configured to synthesize the K decoded video data and output the synthesized video data.
In a third aspect, an embodiment of the present invention provides a video data processing apparatus, including: a memory and a processor; wherein,
the memory for storing a computer program operable on the processor;
the processor, when executing the computer program, is configured to perform the steps of the method according to any of the first aspects.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium storing a video data processing program, which when executed by at least one processor implements the steps of the method according to any one of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a video playback terminal, where the video playback terminal includes at least the video data processing apparatus according to any one of the second aspect or the third aspect.
Embodiments of the invention provide a video data processing method, a video data processing device, and a computer storage medium. The method is applied to a video playback terminal. First, block (tile) information corresponding to visual position information is determined based on that visual position information; tile data corresponding to the tile information is then extracted from pre-cached tile stream data, the tile stream data being obtained by tile-dividing the panoramic video data to be played; the extracted tile data is stored in a queue to be decoded; and finally decoded video data is output by decoding the tile data in the queue. In this way, while preserving picture clarity, multi-tile stream data can be played in the set-top box as video data corresponding to different tiles, following the user's visual position changes, thereby meeting the user's requirements for the set-top box's VR function.
Drawings
Fig. 1 is a schematic structural diagram of a video playing system according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a video data processing method according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a method for determining tile information according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a method for extracting and saving tile data according to an embodiment of the present invention;
fig. 5 is a schematic flowchart of a method for decoding tile data according to an embodiment of the present invention;
fig. 6 is a detailed flowchart of a video data processing method according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present invention;
fig. 8 is a schematic diagram illustrating a specific hardware structure of a video data processing apparatus according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a video playing terminal according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to fig. 1, a schematic diagram of a component structure of a video playing system 10 according to an embodiment of the present invention is shown; as shown in fig. 1, the video playback system 10 includes a video playback terminal 101 and a web server 102; the network server 102 is in communication connection with one or more video playing terminals 101 through a network, so as to implement data communication or interaction between the two terminals. Here, the network server 102 may be a file server, a streaming server, a database server, or the like, and is used for providing various video sources to the video playback terminal 101 for downloading or online playback, and the like; the video playing terminal 101 may be a set-top box, or an electronic device with a playing function, such as a tablet computer, a palmtop computer, a smart phone, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), and the like; in the embodiment of the present invention, this is not particularly limited.
Based on the video playing system 10 shown in fig. 1, the following describes embodiments of the present invention in detail with reference to the accompanying drawings.
Example one
Referring to fig. 2, a video data processing method according to an embodiment of the present invention is shown, where the method may include:
s201: determining block tile information corresponding to visual position information based on the visual position information;
s202: extracting tile data corresponding to the tile information from pre-cached tile stream data according to the tile information; the tile stream data is obtained by tile dividing panoramic video data to be played;
s203: storing the extracted tile data to a queue to be decoded;
s204: and outputting the decoded video data by decoding the tile data in the queue to be decoded.
It should be noted that tile stream data is obtained by tile-dividing the panoramic video data to be played. Suppose the tile stream data includes M x N tile data items; that is, the coding arrangement of the tile stream data is a grid of M rows and N columns, where M and N are positive integers greater than or equal to 1. In the embodiment of the present invention, M may be 6 and N may be 8, so the tile stream data includes 48 tile data items. In practical applications, however, the values of M and N are set according to the actual situation, and the embodiment of the present invention does not specifically limit them.
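The M rows by N columns grid described above can be sketched as a simple index mapping. The helper name `tile_grid_position` and the left-to-right, top-to-bottom numbering are assumptions of this sketch; the patent does not fix a numbering order.

```python
def tile_grid_position(tile_index, rows, cols):
    """Map a linear tile index to its (row, col) cell in a rows x cols grid.

    Tiles are assumed to be numbered left-to-right, top-to-bottom,
    which is a common convention but not stated in the text.
    """
    if not (0 <= tile_index < rows * cols):
        raise ValueError("tile index out of range")
    return divmod(tile_index, cols)

# The example layout from the text: M = 6 rows, N = 8 columns -> 48 tiles.
rows, cols = 6, 8
print(tile_grid_position(0, rows, cols))   # top-left tile -> (0, 0)
print(tile_grid_position(47, rows, cols))  # bottom-right tile -> (5, 7)
```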
In the embodiment of the invention, the method is applied to a video playback terminal. First, block (tile) information corresponding to visual position information is determined based on that visual position information; tile data corresponding to the tile information is extracted from pre-cached tile stream data; the extracted tile data is stored in a queue to be decoded; and finally decoded video data is output by decoding the tile data in the queue. In this way, while preserving picture clarity, multi-tile stream data can be played in the set-top box as video data corresponding to different tiles, following the user's visual position changes, thereby meeting the user's requirements for the set-top box's VR function.
With respect to the technical solution shown in fig. 2, in a possible implementation manner, before determining, based on visual location information, block tile information corresponding to the visual location information, the method further includes:
acquiring visual parameter information; the visual parameter information represents parameter information required by reflecting the visual position of a user;
judging whether the visual parameter information is effective or not;
and when the visual parameter information is effective, obtaining the visual position information based on a preset visual algorithm.
The visual parameter information refers to data acquired by the VR headset through its sensors. Generally, it comprises 9 parameters divided into 3 groups: a group for the three directions x/y/z acquired by a gyroscope sensor, a group for the three directions x/y/z acquired by a magnetometer, and a group for the three directions x/y/z acquired by an angular velocity sensor. Together, these three groups of data reflect whether the user's visual position has changed.
It should be further noted that the preset visual algorithm represents a dedicated supporting algorithm for calculating the visual position information. For example, it may be a filter-based algorithm such as the Sparse Extended Information Filter (SEIF), Kalman Filter (KF), Extended Kalman Filter (EKF), or Unscented Kalman Filter (UKF), or an algorithm fusing monocular vision with odometry. In practical applications, the specific choice is made according to the actual situation, and the embodiment of the present invention is not particularly limited.
In this way, if the obtained visual parameter information is valid, the visual position information can be determined, so that the tile information corresponding to it can be obtained and video data corresponding to different tiles can subsequently be played as the user's visual position changes.
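The validity check on the three sensor groups described above can be sketched as follows. The patent does not define what makes the parameters "valid"; treating validity as "three finite x/y/z triples" is an assumption of this sketch, and `visual_params_valid` is a hypothetical helper name.

```python
import math

def visual_params_valid(gyro, magnetometer, angular_velocity):
    """Check the three 3-axis sensor groups (9 values total) are usable.

    Each argument is one x/y/z triple as described in the text; the
    finiteness criterion is an assumption made for this sketch.
    """
    for group in (gyro, magnetometer, angular_velocity):
        if len(group) != 3 or any(not math.isfinite(v) for v in group):
            return False
    return True

print(visual_params_valid((0.1, 0.0, 9.8), (30.0, -5.0, 12.0), (0.0, 0.0, 0.0)))  # True
print(visual_params_valid((0.1, float("nan"), 9.8), (0, 0, 0), (0, 0, 0)))        # False
```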
As for the technical solution shown in fig. 2, in a possible implementation manner, before extracting tile data corresponding to the tile information from pre-cached tile stream data according to the tile information, the method further includes:
receiving a Uniform Resource Locator (URL) related to panoramic video data to be played;
according to the URL, establishing communication connection with a network server, and acquiring the panoramic video data to be played from the network server;
identifying whether the panoramic video data to be played is tile stream data;
and when the panoramic video data to be played is tile stream data, caching the tile stream data.
It should be noted that a Uniform Resource Locator (URL) is used to indicate a playing data source; the playing data source can be a resource file stored in a local disk or a resource file on a network server; this is not particularly limited in the embodiments of the present invention.
In the foregoing implementation manner, specifically, when the to-be-played panoramic video data is tile stream data, caching the tile stream data includes:
when the panoramic video data to be played is tile stream data, acquiring a corresponding relation between the tile data and tile information; the tile information at least comprises coordinate information of tile data, width information of the tile data and height information of the tile data;
caching the tile stream data and the corresponding relation between the tile data and the tile information.
In the foregoing specific implementation manner, more specifically, the extracting tile data corresponding to the tile information from pre-cached tile stream data according to the tile information includes:
and extracting tile data corresponding to the tile information from the tile stream data according to the tile information and the corresponding relation between the tile data and the tile information.
For example, take playback of a resource file on a network server. A URL related to the panoramic video data to be played is first received; a communication connection with the network server is then established according to the URL, and the panoramic video data to be played is acquired from the server. The header information of that data is parsed to identify whether it is tile stream data; when it is, the correspondence between tile data and tile information is further acquired. At present, operators package panoramic video data in MP4 files and store tile information such as each tile's coordinates, width, and height in an extended box (for example, an sgpd box type) of the MP4 file, so the tile information can be parsed out of the header information and saved. After the tile stream data and the correspondence between tile data and tile information have been cached, the tile data corresponding to given tile information can be extracted from the tile stream data according to the determined tile information and that correspondence.
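The cached correspondence and the extraction step can be sketched as below. `TileCache`, `add_tile`, and `extract` are hypothetical names; a real player would populate the mapping by parsing the MP4 header box the text mentions, whereas here it is supplied directly.

```python
from collections import namedtuple

# Each tile's information: coordinates plus width and height, as listed in the text.
TileInfo = namedtuple("TileInfo", "x y width height")

class TileCache:
    """Hypothetical cache holding tile stream data and the tile-data/tile-info map."""

    def __init__(self):
        self.info_by_id = {}   # tile id -> TileInfo
        self.data_by_id = {}   # tile id -> encoded tile payload (bytes)

    def add_tile(self, tile_id, info, payload):
        self.info_by_id[tile_id] = info
        self.data_by_id[tile_id] = payload

    def extract(self, tile_ids):
        """Return the cached payloads for the tiles covering the current view."""
        return {tid: self.data_by_id[tid] for tid in tile_ids if tid in self.data_by_id}

cache = TileCache()
cache.add_tile(0, TileInfo(0, 0, 960, 720), b"\x00\x01")
cache.add_tile(1, TileInfo(960, 0, 960, 720), b"\x00\x02")
print(sorted(cache.extract([0, 1, 5]).keys()))  # only tiles 0 and 1 are cached -> [0, 1]
```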
It can be understood that after the panoramic video data to be played has been acquired from the network server, the total number of tiles and the resolution information corresponding to it can be further determined in order to create multiple threads for parallel processing. Therefore, in the foregoing implementation manner, after the acquiring of the panoramic video data to be played from the network server, the method further includes:
determining the total number of tiles and resolution information corresponding to the panoramic video data to be played based on the analysis of the panoramic video data to be played;
calculating the maximum tile value supported by the video playing terminal based on the total number of tiles and the resolution information;
creating queues to be decoded with the same number as the maximum tile value; and based on a concurrency processing principle, carrying out concurrent processing on the queues to be decoded with the same number as the maximum tile value.
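The queue creation and concurrent processing just described can be sketched as follows. The round-robin assignment of tiles to queues and the `decode` callable (standing in for the terminal's decoder) are assumptions of this sketch, not details given in the patent.

```python
import queue
import threading

def run_decode_queues(k, tile_batches, decode):
    """Create k to-be-decoded queues and drain them with k concurrent workers."""
    queues = [queue.Queue() for _ in range(k)]
    for i, tile in enumerate(tile_batches):
        queues[i % k].put(tile)          # round-robin fill (an assumption)

    results = [[] for _ in range(k)]

    def worker(idx):
        q = queues[idx]
        while not q.empty():             # safe: all puts happen before workers start
            results[idx].append(decode(q.get()))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Toy run: 8 tiles spread over 4 queues, 'decoding' multiplies by 10.
out = run_decode_queues(4, list(range(8)), decode=lambda t: t * 10)
print(sorted(sum(out, [])))  # [0, 10, 20, 30, 40, 50, 60, 70]
```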
In the foregoing specific implementation manner, more specifically, the storing the extracted tile data to a queue to be decoded includes:
correspondingly storing the extracted tile data to K queues to be decoded; wherein the value of K is the maximum tile value;
correspondingly, the outputting the decoded video data by decoding the tile data in the queue to be decoded includes:
respectively decoding the tile data in the K queues to be decoded to obtain K decoded video data;
and synthesizing the K decoded video data, and outputting the synthesized video data.
It should be noted that, in order to increase the speed of video data processing, multithread parallel processing may be adopted, that is, multiple queues to be decoded are created, so that the continuity of video playing can be ensured; here, the specific number of the plurality of queues to be decoded may be calculated according to the total number of tiles and the resolution information.
For example, taking the playing of a resource file on a network server and continuing the above example: after the panoramic video data to be played is acquired from the network server, the total number of tiles and the resolution information corresponding to that data can be determined by analyzing it. Assume the total number of tiles obtained is 48 and the resolution information is 8K. If the decoding capability of the video playing terminal is 4K resolution, the maximum tile value supported by the video playing terminal can be calculated to be 24. Based on a concurrent processing principle, 24 queues to be decoded can then be created, so that the extracted tile data is stored correspondingly into the 24 queues to be decoded; the tile data in the 24 queues to be decoded are decoded respectively to obtain 24 pieces of decoded video data; the 24 pieces of decoded video data are synthesized, and the synthesized video data is finally output.
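The worked example above can be sketched as follows. This is an illustrative sketch only: the patent does not state the exact formula for the maximum tile value, so the function below assumes it scales linearly with the ratio of the terminal's decodable resolution to the source resolution, which reproduces the 48-tile / 8K / 4K → 24 example.

```python
def max_supported_tiles(total_tiles, source_width, decode_width):
    """Estimate how many tiles the terminal can decode concurrently.

    Assumption (not from the patent): the supported tile count scales
    linearly with the width ratio of decode capability to source.
    """
    return (total_tiles * decode_width) // source_width

# The example above: 48 tiles in an 8K (7680-wide) source, terminal
# decoding capability of 4K (3840-wide) -> 24 tiles, hence 24 queues.
k = max_supported_tiles(48, 7680, 3840)
print(k)  # 24
```

The result `k` then fixes both the number of queues to be decoded and the number of decoders used later.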
It should be further noted that, during the playing of the panoramic video data, a change in the display vision of the VR headset drives a change in the visual position information, and the tile information corresponding to the current visual position is determined by analyzing that information. The corresponding tile data can then be extracted according to the tile information, stored in a queue to be decoded, and decoded; this is a repeated process. In this way, different video pictures can be seen as the visual position changes during playing, and the VR effect of video playing can be realized on a video playing terminal (such as a set top box).
This embodiment provides a video data processing method applied to a video playing terminal. The method includes: first determining, based on visual position information, the block (tile) information corresponding to that visual position information; extracting, according to the tile information, the tile data corresponding to it from pre-cached tile stream data; storing the extracted tile data to a queue to be decoded; and finally outputting decoded video data by decoding the tile data in the queue to be decoded. In this way, with multi-tile stream data, the set top box can play the video data corresponding to different tile blocks as the user's visual position changes, which satisfies the user's requirement for a VR function on the set top box and brings a good VR experience.
Example two
Based on the same inventive concept of the foregoing embodiment, referring to fig. 3, which shows a flow of a method for determining tile information provided in the embodiment of the present invention, where a set top box is taken as an example of a video playing terminal, and a streaming media server is taken as an example of a network server, the flow may include:
s301: acquiring visual parameter information through a display visual change of the VR headset; wherein the visual parameter information comprises 3 sets of data;
s302: judging whether the visual parameter information is valid;
s303: when the visual parameter information is valid, obtaining the visual position information based on a preset visual algorithm;
s304: when the visual parameter information is invalid, returning to step S301 to wait for the next acquisition of the visual parameter information;
s305: and determining tile information corresponding to the visual position information according to the visual position information.
It should be noted that a display visual change of the VR headset may be referred to simply as a head-display position change. When the head-display position changes, the judgment in step S302 is made against a preset variation range, which represents the valid range for visual parameter information generated by a display visual change of the VR headset. If the visual parameter information falls within the preset variation range (i.e. it does not exceed that range), it is valid, step S303 is executed, and the tile information corresponding to the visual position information can then be determined. If the visual parameter information falls outside the preset variation range (i.e. it exceeds that range), it is invalid, and step S304 is executed: the flow returns to step S301 to wait for the next acquisition of visual parameter information.
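Steps S301 to S304 can be sketched as a simple acquire-and-validate loop. This is a hypothetical sketch: the bounds of the preset variation range, the parameter layout, and the stand-in "visual algorithm" are all assumptions — the patent only states that the parameter information is valid when it stays within a preset range.

```python
# Assumed bounds for the preset variation range (e.g. degrees of rotation).
PRESET_RANGE = (-180.0, 180.0)

def is_valid(params):
    """S302: every parameter must lie within the preset variation range."""
    lo, hi = PRESET_RANGE
    return all(lo <= p <= hi for p in params)

def acquire_visual_position(read_sensor, to_position):
    """Loop until a valid sample arrives (S301-S302), then map it to a
    visual position with the preset visual algorithm (S303); an invalid
    sample simply waits for the next acquisition (S304)."""
    while True:
        params = read_sensor()          # S301
        if is_valid(params):            # S302
            return to_position(params)  # S303
        # S304: invalid -> loop back and acquire again
```

Here `read_sensor` and `to_position` are placeholders for the headset sensor read-out and the preset visual algorithm.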
It can be understood that, if the display vision of the VR headset changes and the acquired visual parameter information is valid, the corresponding tile information can be obtained from that change, so that the corresponding tile data can subsequently be extracted according to the tile information.
Referring to fig. 4, which shows a flow of a method for extracting and storing tile data according to an embodiment of the present invention, where a set top box is taken as an example of a video playing terminal, and a streaming media server is taken as an example of a network server, the flow may include:
s401: establishing communication connection with a streaming media server, and acquiring tile stream data of a panoramic video to be played from the streaming media server;
s402: caching the tile stream data to a stream receiving buffer;
s403: reading the tile stream data from the stream receiving buffer and analyzing;
s404: extracting tile data corresponding to the tile information from the analyzed tile stream data according to the tile information obtained in step S305;
s405: and creating K queues to be decoded, and respectively storing the extracted tile data to the corresponding queues to be decoded.
It should be noted that, after the set-top box establishes a communication connection with the streaming media server, the set-top box sends a URL request to the streaming media server; the URL request carries a specific token, so that the set top box can acquire the required tile stream data from the streaming media server; and then the tile stream data is written into a stream receiving buffer through a stream receiving module.
It should be further noted that, in order to increase the speed of video data processing, multithreaded parallel processing may be adopted; that is, multiple queues to be decoded are created. Assuming the maximum tile value supported by the set top box is K, K queues to be decoded are created, such as tile pkt queue_0, tile pkt queue_1, tile pkt queue_2, …, tile pkt queue_i, …, tile pkt queue_K-1. In this way, the extracted tile data can be stored into the corresponding queues respectively, so that subsequent decoders can each extract tile data from their own queue for decoding.
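Step S405 can be sketched as below. The routing rule (mapping a tile index onto one of the K queues) is an assumption; the patent only states that each extracted tile packet is stored "to the corresponding" queue to be decoded.

```python
from queue import Queue

def create_pkt_queues(k):
    """Create K queues to be decoded: tile pkt queue_0 ... queue_K-1."""
    return [Queue() for _ in range(k)]

def store_tile_pkt(queues, tile_index, pkt):
    """Route an extracted tile packet to its queue.

    Assumption: tiles map onto queues by index modulo K; the patent does
    not specify the mapping.
    """
    queues[tile_index % len(queues)].put(pkt)

queues = create_pkt_queues(4)
store_tile_pkt(queues, 5, b"tile-5-es-data")  # lands in queue_1 (5 % 4)
```

Thread-safe `queue.Queue` instances are used so that producer (stream parsing) and consumers (decoders) can run concurrently.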
Referring to fig. 5, which shows a flow of a method for decoding tile data according to an embodiment of the present invention, where a set top box is taken as an example of a video playing terminal, and a streaming media server is taken as an example of a network server, the flow may include:
s501: extracting tile data from corresponding queues to be decoded through K decoders respectively for decoding processing to obtain K decoded video data;
s502: and synthesizing the K decoded video data, and outputting the synthesized video data by an image output module.
It should be noted that K pieces of decoded video data can be obtained through the decoding processing of K decoders (for example, decoder 1, decoder 2, …, decoder i, …, decoder K shown in fig. 5); the K pieces of decoded video data are then synthesized, and the synthesized video data is finally output and displayed by the image output module.
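Steps S501 and S502 can be sketched with worker threads standing in for the K hardware decoders, each consuming its own input, after which the K decoded outputs are synthesized. The `decode` and `synthesize` bodies here are placeholders, not the set top box's actual decoder or image output module.

```python
from concurrent.futures import ThreadPoolExecutor

def decode(pkt):
    # Placeholder for decoder i; a real decoder would produce a frame.
    return pkt.upper()

def synthesize(frames):
    # Placeholder for the synthesis step feeding the image output module.
    return b"|".join(frames)

def decode_and_compose(tile_pkts):
    """S501: K decoders work in parallel; S502: outputs are synthesized."""
    k = len(tile_pkts)
    with ThreadPoolExecutor(max_workers=k) as pool:
        decoded = list(pool.map(decode, tile_pkts))  # preserves tile order
    return synthesize(decoded)

out = decode_and_compose([b"a", b"b", b"c"])
print(out)  # b'A|B|C'
```

`ThreadPoolExecutor.map` preserves input order, mirroring the requirement that the K decoded outputs be composited in their tile positions.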
Through the above, the specific implementation of this embodiment has been elaborated in detail. It can be seen that, with the technical scheme of this embodiment, multi-tile stream data allows the set top box to play the video data corresponding to different tile blocks as the user's visual position changes, which satisfies the user's requirement for a VR function on the set top box and brings a good VR experience.
EXAMPLE III
Based on the same inventive concept of the foregoing embodiment, referring to fig. 6, a detailed flow of a video data processing method provided in the embodiment of the present invention is shown, where a set top box is taken as an example of a video playing terminal, and a streaming media server is taken as an example of a network server, and the detailed flow may include:
s601: starting playing;
s602: issuing a playing URL through an upper player;
s603: creating a stream receiving module, and writing VR data to be played into a stream receiving buffer by the stream receiving module;
s604: analyzing and identifying tile stream data;
s605: acquiring the total quantity of tiles and resolution information;
s606: calculating the maximum tile value supported by the set top box;
s607: creating a same number of pkt queues as the maximum tile value;
s608: acquiring visual position information;
s609: determining tile information corresponding to the visual position information according to the visual position information;
s610: extracting tile data corresponding to the tile information according to the tile information;
s611: storing the extracted tile data to a corresponding pkt queue;
s612: the synthesized video data is output by the image output module through decoding processing of the K decoders.
It should be noted that steps S601 to S607 are the playback initialization stage, steps S608 to S611 are the data extraction and storage stage, and step S612 is the decoding output stage.
It should be further noted that, in the play initialization stage, after playing starts, the upper player first issues a play URL, and a communication connection with the streaming media server is established according to that URL so as to obtain the VR data to be played from the streaming media server. A stream receiving module is created, and it writes the VR data to be played into a stream receiving buffer. The data is then analyzed to identify whether it is tile stream data; when it is, further analysis yields two key parameters: the total number of tiles and the resolution information. From these two parameters, the maximum tile value supported by the set top box is determined, and a number of pkt queues equal to the maximum tile value is created; the data is then processed concurrently based on the multithreaded concurrent processing principle. After the play initialization stage is completed, the data extraction and storage stage is entered. Visual parameter information is acquired by the VR headset through a sensor and generally carries 9 parameters. When the visual parameter information is valid, the visual position information can be obtained based on a preset visual algorithm, and from it the tile information to be played can be determined. The tile data corresponding to that tile information is extracted from the stream receiving buffer and analyzed to obtain tile pkts (where pkt denotes the ES storage form of video data), which are stored into the corresponding pkt queues (such as tile pkt queue_0, tile pkt queue_1, tile pkt queue_2, …, tile pkt queue_i, …, tile pkt queue_K-1). Finally, K decoders (such as decoder 1, decoder 2, …, decoder i, …, decoder K) extract tile data from their corresponding pkt queues for decoding, the K pieces of decoded video data obtained by the K decoders are synthesized, and the synthesized video data is output through the image output module, thereby forming the VR effect.
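The three stages of the detailed flow can be condensed into a toy end-to-end sketch. Every helper name and stub behaviour below is hypothetical; only the ordering of the stages (initialization S601-S607, extraction/storage S608-S611, decode/output S612) follows fig. 6, and the linear tile-count formula is the same assumption noted earlier.

```python
from queue import Queue

def init_stage(total_tiles, src_width, dec_width):
    """S601-S607: derive the maximum tile value K (assumed linear in the
    width ratio) and create K pkt queues."""
    k = (total_tiles * dec_width) // src_width
    return k, [Queue() for _ in range(k)]

def extract_stage(queues, tile_pkts):
    """S608-S611: store each extracted tile pkt into a pkt queue
    (index-modulo routing is an assumption)."""
    for i, pkt in enumerate(tile_pkts):
        queues[i % len(queues)].put(pkt)

def output_stage(queues):
    """S612: drain the queues (standing in for the K decoders) and
    synthesize one output frame."""
    return b"".join(q.get() for q in queues if not q.empty())

k, queues = init_stage(48, 7680, 3840)   # -> K = 24 queues
extract_stage(queues, [b"t0", b"t1", b"t2"])
frame = output_stage(queues)             # b"t0t1t2"
```

The sketch deliberately omits the stream receiving buffer and the sensor path, which the surrounding text already describes.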
Through the above, the specific implementation of this embodiment has been elaborated in detail. It can be seen that, with the technical scheme of this embodiment, multi-tile stream data allows the set top box to play the video data corresponding to different tile blocks as the user's visual position changes, which satisfies the user's requirement for a VR function on the set top box and brings a good VR experience.
Example four
Based on the same inventive concept of the foregoing embodiment, referring to fig. 7, which shows the composition of a video data processing apparatus 70 provided by the embodiment of the present invention, the video data processing apparatus 70 may include: a determination unit 701, an extraction unit 702, a storage unit 703 and a decoding unit 704, wherein,
the determining unit 701 is configured to determine block tile information corresponding to visual position information based on the visual position information;
the extracting unit 702 is configured to extract tile data corresponding to the tile information from pre-cached tile stream data according to the tile information; the tile stream data is obtained by tile dividing panoramic video data to be played;
the storage unit 703 is configured to store the extracted tile data to a queue to be decoded;
the decoding unit 704 is configured to output decoded video data by decoding the tile data in the queue to be decoded.
In the above solution, the video data processing apparatus 70 further comprises an obtaining unit 705 and a judging unit 706, wherein,
the obtaining unit 705 is configured to obtain visual parameter information; the visual parameter information represents parameter information required by reflecting the visual position of a user;
the judging unit 706 is configured to judge whether the visual parameter information is valid;
the obtaining unit 705 is further configured to obtain the visual position information based on a preset visual algorithm when the visual parameter information is valid.
In the above scheme, the video data processing apparatus 70 further comprises a receiving unit 707 and a recognition unit 708, wherein,
the receiving unit 707 is configured to receive a uniform resource locator URL related to the panoramic video data to be played;
the obtaining unit 705 is further configured to establish a communication connection with a network server according to the URL, and obtain the panoramic video data to be played from the network server;
the identifying unit 708 is configured to identify whether the panoramic video data to be played is tile stream data;
the storage unit 703 is further configured to buffer tile stream data when the to-be-played panoramic video data is the tile stream data.
In the above solution, the video data processing apparatus 70 further comprises a computing unit 709 and a creating unit 710, wherein,
the determining unit 701 is further configured to determine, based on the analysis of the to-be-played panoramic video data, a total number of tiles and resolution information corresponding to the to-be-played panoramic video data;
the calculating unit 709 is configured to calculate a maximum tile value supported by the video playing terminal based on the total number of tiles and the resolution information;
the creating unit 710 is configured to create queues to be decoded, the number of which is equal to the maximum tile value; and, based on a concurrency processing principle, to carry out concurrent processing on the queues to be decoded.
In the above solution, the obtaining unit 705 is further configured to obtain a corresponding relationship between tile data and tile information when the to-be-played panoramic video data is tile stream data; the tile information at least comprises coordinate information of tile data, width information of the tile data and height information of the tile data;
the storage unit 703 is further configured to cache the tile stream data and the corresponding relationship between the tile data and the tile information.
In the above-described aspect, the extracting unit 702 is configured to extract tile data corresponding to the tile information from the tile stream data according to the tile information and a correspondence between the tile data and the tile information.
In the above scheme, the video data processing apparatus 70 further includes a synthesizing unit 711, wherein,
the storage unit 703 is configured to correspondingly store the extracted tile data to K queues to be decoded; wherein the value of K is the maximum tile value;
the decoding unit 704 is configured to decode the tile data in the K queues to be decoded respectively to obtain K decoded video data;
the synthesizing unit 711 is configured to perform synthesizing processing on the K decoded video data, and output the synthesized video data.
It is understood that, in this embodiment, a "unit" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a module, or it may be non-modular. Moreover, the components in this embodiment may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional module.
Based on such understanding, the technical solution of this embodiment, in essence, or the part thereof contributing to the prior art, or the whole or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Accordingly, the present embodiment provides a computer storage medium storing a video data processing program that, when executed by at least one processor, implements the steps of the method of one of the preceding embodiments.
Based on the above-mentioned composition of the video data processing apparatus 70 and the computer storage medium, referring to fig. 8, a specific hardware structure of the video data processing apparatus 70 provided by the embodiment of the present invention is shown, which may include: a network interface 801, a memory 802, and a processor 803; the various components are coupled together by a bus system 804. It is understood that the bus system 804 is used to enable communications among the components. The bus system 804 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 804 in FIG. 8. The network interface 801 is used for receiving and sending signals in the process of receiving and sending information with other external network elements;
a memory 802 for storing a computer program capable of running on the processor 803;
a processor 803 for executing, when running the computer program, the following:
determining block tile information corresponding to visual position information based on the visual position information;
extracting tile data corresponding to the tile information from pre-cached tile stream data according to the tile information; the tile stream data is obtained by tile dividing panoramic video data to be played;
storing the extracted tile data to a queue to be decoded;
and outputting the decoded video data by decoding the tile data in the queue to be decoded.
It will be appreciated that the memory 802 in embodiments of the invention may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM). The memory 802 of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The processor 803 may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 803 or by instructions in the form of software. The processor 803 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and so on. The steps of the method disclosed in connection with the embodiments of the present invention may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM or EPROM, or a register. The storage medium is located in the memory 802; the processor 803 reads the information in the memory 802 and completes the steps of the above method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the Processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Optionally, as another embodiment, the processor 803 is further configured to execute the steps of the method of the first embodiment when running the computer program.
Referring to fig. 9, a schematic diagram of a composition structure of a video playback terminal 90 according to an embodiment of the present invention is shown; the video playing terminal 90 at least includes any one of the video data processing devices 70 in the foregoing embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (17)

1. A video data processing method is applied to a video playing terminal and comprises the following steps:
determining block tile information corresponding to visual position information based on the visual position information;
extracting tile data corresponding to the tile information from pre-cached tile stream data according to the tile information; the tile stream data is obtained by tile dividing panoramic video data to be played;
storing the extracted tile data to a queue to be decoded;
and outputting the decoded video data by decoding the tile data in the queue to be decoded.
2. The method of claim 1, wherein before the determining block tile information corresponding to the visual location information based on the visual location information, the method further comprises:
acquiring visual parameter information; the visual parameter information represents parameter information required by reflecting the visual position of a user;
judging whether the visual parameter information is effective or not;
and when the visual parameter information is effective, obtaining the visual position information based on a preset visual algorithm.
3. The method according to claim 1, wherein before the extracting tile data corresponding to the tile information from pre-cached tile stream data according to the tile information, the method further comprises:
receiving a Uniform Resource Locator (URL) related to panoramic video data to be played;
according to the URL, establishing communication connection with a network server, and acquiring the panoramic video data to be played from the network server;
identifying whether the panoramic video data to be played is tile stream data;
and when the panoramic video data to be played is tile stream data, caching the tile stream data.
4. The method according to claim 3, wherein after the obtaining the panoramic video data to be played from the network server, the method further comprises:
determining the total number of tiles and resolution information corresponding to the panoramic video data to be played based on the analysis of the panoramic video data to be played;
calculating the maximum tile value supported by the video playing terminal based on the total number of tiles and the resolution information;
creating queues to be decoded with the same number as the maximum tile value; and based on a concurrency processing principle, carrying out concurrent processing on the queues to be decoded with the same number as the maximum tile value.
5. The method according to claim 3, wherein the buffering the tile stream data when the to-be-played panoramic video data is tile stream data comprises:
when the panoramic video data to be played is tile stream data, acquiring a corresponding relation between the tile data and tile information; the tile information at least comprises coordinate information of tile data, width information of the tile data and height information of the tile data;
caching the tile stream data and the corresponding relation between the tile data and the tile information.
6. The method according to claim 5, wherein the extracting tile data corresponding to the tile information from pre-cached tile stream data according to the tile information comprises:
and extracting tile data corresponding to the tile information from the tile stream data according to the tile information and the corresponding relation between the tile data and the tile information.
7. The method of claim 4, wherein saving the extracted tile data to a queue to be decoded comprises:
correspondingly storing the extracted tile data to K queues to be decoded; wherein the value of K is the maximum tile value;
correspondingly, the outputting the decoded video data by decoding the tile data in the queue to be decoded includes:
respectively decoding the tile data in the K queues to be decoded to obtain K decoded video data;
and synthesizing the K decoded video data, and outputting the synthesized video data.
8. A video data processing apparatus applied to a video playback terminal, the video data processing apparatus comprising: a determination unit, an extraction unit, a storage unit and a decoding unit, wherein,
the determining unit is configured to determine block tile information corresponding to visual position information based on the visual position information;
the extracting unit is configured to extract tile data corresponding to the tile information from pre-cached tile stream data according to the tile information; the tile stream data is obtained by tile dividing panoramic video data to be played;
the storage unit is configured to store the extracted tile data to a queue to be decoded;
and the decoding unit is configured to output decoded video data through decoding processing of the tile data in the queue to be decoded.
9. The video data processing apparatus according to claim 8, further comprising an acquisition unit and a judgment unit, wherein,
the acquisition unit is configured to acquire visual parameter information; the visual parameter information represents parameter information required by reflecting the visual position of a user;
the judging unit is configured to judge whether the visual parameter information is valid;
the obtaining unit is further configured to obtain the visual position information based on a preset visual algorithm when the visual parameter information is valid.
10. The video data processing apparatus according to claim 9, wherein said video data processing apparatus further comprises a receiving unit and an identifying unit, wherein,
the receiving unit is configured to receive a Uniform Resource Locator (URL) related to panoramic video data to be played;
the acquisition unit is also configured to establish communication connection with a network server according to the URL and acquire the panoramic video data to be played from the network server;
the identification unit is configured to identify whether the panoramic video data to be played is tile stream data;
the storage unit is further configured to buffer the tile stream data when the to-be-played panoramic video data is the tile stream data.
11. The video data processing apparatus according to claim 10, further comprising a calculation unit and a creation unit, wherein,
the determining unit is further configured to determine the total number of tiles and resolution information corresponding to the panoramic video data to be played based on analysis of the panoramic video data to be played;
the calculating unit is configured to calculate a maximum tile value supported by the video playing terminal based on the total number of tiles and the resolution information;
the creating unit is configured to create queues to be decoded, the number of which is equal to the maximum tile value; and to concurrently process, based on a concurrency processing principle, the queues to be decoded.
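Not part of the claims, but as an illustration of claim 11: the patent does not give the formula for the "maximum tile value supported by the video playing terminal", so the sketch below assumes the terminal advertises a decoder pixel-throughput budget, divides it by the per-tile pixel load derived from the total tile count and resolution, and caps the result at the total number of tiles. One queue to be decoded is then created per supported tile.

```python
# Illustrative sketch; the budget-based formula is an assumption, not the patent's.
from collections import deque

def max_supported_tiles(total_tiles, frame_width, frame_height,
                        decoder_pixel_budget):
    """Estimate how many tiles the terminal can decode concurrently."""
    pixels_per_tile = (frame_width * frame_height) // total_tiles
    return max(1, min(total_tiles, decoder_pixel_budget // pixels_per_tile))

def create_decode_queues(k):
    """Create K queues to be decoded, one per concurrently decoded tile."""
    return [deque() for _ in range(k)]
```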
12. The video data processing apparatus according to claim 10, wherein the acquiring unit is further configured to acquire a correspondence between tile data and tile information when the panoramic video data to be played is tile stream data; the tile information at least comprises coordinate information of tile data, width information of the tile data and height information of the tile data;
the storage unit is further configured to cache the tile stream data and the corresponding relationship between the tile data and the tile information.
13. The video data processing apparatus according to claim 12, wherein the extracting unit is configured to extract tile data corresponding to the tile information from the tile stream data based on the tile information and the correspondence between the tile data and the tile information.
14. The video data processing apparatus according to claim 11, wherein said video data processing apparatus further comprises a composition unit, wherein,
the storage unit is configured to correspondingly store the extracted tile data to K queues to be decoded; wherein the value of K is the maximum tile value;
the decoding unit is configured to decode the tile data in the K queues to be decoded respectively to obtain K decoded video data;
and the synthesizing unit is configured to synthesize the K decoded video data and output the synthesized video data.
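Not part of the claims, but as an illustration of the synthesis step of claim 14: each decoded tile carries its coordinate and size information, so composing the output frame amounts to placing each tile's pixel block at its coordinates in a full-resolution canvas. The nested-list pixel model is an assumption of this sketch; a real terminal would composite GPU surfaces.

```python
# Illustrative compositor; pixel representation is a simplifying assumption.
def compose_frame(frame_w, frame_h, decoded_tiles):
    """decoded_tiles: iterable of (x, y, w, h, pixels), where pixels[row][col]
    holds the decoded samples of one tile. Returns the composed frame."""
    canvas = [[0] * frame_w for _ in range(frame_h)]
    for x, y, w, h, pixels in decoded_tiles:
        for row in range(h):
            # copy one tile row into its place in the output frame
            canvas[y + row][x:x + w] = pixels[row]
    return canvas
```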
15. A video data processing apparatus, characterized in that the video data processing apparatus comprises: a memory and a processor; wherein,
the memory is configured to store a computer program operable on the processor;
the processor is configured to perform, when executing the computer program, the steps of the method of any one of claims 1 to 7.
16. A computer storage medium, characterized in that the computer storage medium stores a video data processing program which, when executed by at least one processor, implements the steps of the method according to any one of claims 1 to 7.
17. A video playback terminal characterized in that it comprises at least a video data processing apparatus according to any one of claims 8 to 15.
CN201811406127.8A 2018-11-23 2018-11-23 Video data processing method and device and computer storage medium Active CN111225293B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811406127.8A CN111225293B (en) 2018-11-23 2018-11-23 Video data processing method and device and computer storage medium


Publications (2)

Publication Number Publication Date
CN111225293A true CN111225293A (en) 2020-06-02
CN111225293B CN111225293B (en) 2023-10-03

Family

ID=70827077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811406127.8A Active CN111225293B (en) 2018-11-23 2018-11-23 Video data processing method and device and computer storage medium

Country Status (1)

Country Link
CN (1) CN111225293B (en)



Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1162830A2 (en) * 2000-06-07 2001-12-12 Be Here Corporation Method and apparatus for electronically distributing motion panoramic images
WO2001095513A1 (en) * 2000-06-09 2001-12-13 Imove Inc. Streaming panoramic video
US20020021353A1 (en) * 2000-06-09 2002-02-21 Denies Mark Streaming panoramic video
EP2824883A1 (en) * 2013-07-12 2015-01-14 Alcatel Lucent A video client and video server for panoramic video consumption
CN105245838A (en) * 2015-09-29 2016-01-13 成都虚拟世界科技有限公司 Panoramic video playing method and player
CN105323552A (en) * 2015-10-26 2016-02-10 北京时代拓灵科技有限公司 Method and system for playing panoramic video
CN106060570A (en) * 2016-06-30 2016-10-26 北京奇艺世纪科技有限公司 Panoramic video image playing and coding method and device
CN106101847A (en) * 2016-07-12 2016-11-09 三星电子(中国)研发中心 The method and system of panoramic video alternating transmission
CN106658011A (en) * 2016-12-09 2017-05-10 深圳市云宙多媒体技术有限公司 Panoramic video coding and decoding methods and devices
CN106657972A (en) * 2016-12-30 2017-05-10 深圳超多维科技有限公司 Video playing control method and device
CN108632631A (en) * 2017-03-16 2018-10-09 华为技术有限公司 The method for down loading and device of video slicing in a kind of panoramic video
CN108668138A (en) * 2017-03-28 2018-10-16 华为技术有限公司 A kind of method for downloading video and user terminal
CN107147624A (en) * 2017-04-24 2017-09-08 珠海全志科技股份有限公司 Panoramic picture processing method, display device and playback equipment
CN108810600A (en) * 2017-04-28 2018-11-13 华为技术有限公司 A kind of switching method of video scene, client and server
CN108777809A (en) * 2018-04-11 2018-11-09 中国科学院信息工程研究所 A kind of panoramic video fragment mobile network caching method and system, panoramic video method for down loading

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113766235A (en) * 2021-08-30 2021-12-07 聚好看科技股份有限公司 Panoramic video transmission method and device
CN113766235B (en) * 2021-08-30 2023-10-17 聚好看科技股份有限公司 Panoramic video transmission method and equipment
CN114374869A (en) * 2022-01-05 2022-04-19 烽火通信科技股份有限公司 Panoramic video playing method and device and computer storage medium
CN114374869B (en) * 2022-01-05 2023-05-23 烽火通信科技股份有限公司 Panoramic video playing method and device and computer storage medium


Similar Documents

Publication Publication Date Title
US10229651B2 (en) Variable refresh rate video capture and playback
JP6410918B2 (en) System and method for use in playback of panoramic video content
JP6984841B2 (en) Image processing method, terminal and server
EP3913924B1 (en) 360-degree panoramic video playing method, apparatus, and system
EP3734979A1 (en) Video transmission method, client, and server
US10616551B2 (en) Method and system for constructing view from multiple video streams
EP3595323A1 (en) Video playing method for synchronously displaying ar information
CN111225293A (en) Video data processing method and device and computer storage medium
CN113891117A (en) Immersion medium data processing method, device, equipment and readable storage medium
CN110933461B (en) Image processing method, device, system, network equipment, terminal and storage medium
US20230026014A1 (en) Video processing device and manifest file for video streaming
KR101085718B1 (en) System and method for offering augmented reality using server-side distributed image processing
CN111885417B (en) VR video playing method, device, equipment and storage medium
CN109931923B (en) Navigation guidance diagram generation method and device
CN109348132B (en) Panoramic shooting method and device
CN114157875B (en) VR panoramic video preprocessing method, VR panoramic video preprocessing equipment and VR panoramic video storage medium
US11902603B2 (en) Methods and systems for utilizing live embedded tracking data within a live sports video stream
EP4290866A1 (en) Media file encapsulation method and apparatus, media file decapsulation method and apparatus, device and storage medium
US20240137588A1 (en) Methods and systems for utilizing live embedded tracking data within a live sports video stream
CN111200759B (en) Playing control method, device, terminal and storage medium of panoramic video
CN111200754B (en) Panoramic video playing method and device, terminal and storage medium
CN111091848B (en) Method and device for predicting head posture
CN111586261B (en) Target video processing method and device and electronic equipment
EP3470974A1 (en) Selection of animated viewing angle in an immersive virtual environment
CN113947670A (en) Information display method and device, electronic equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant