CN112153401B - Video processing method, communication device and readable storage medium - Google Patents


Info

Publication number
CN112153401B
CN112153401B (application CN202011002978.3A)
Authority
CN
China
Prior art keywords
video stream
fov
target video
pts
terminal device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011002978.3A
Other languages
Chinese (zh)
Other versions
CN112153401A (en)
Inventor
金晶
王琦
李康敬
陶嘉伟
潘兴浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Video Technology Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Video Technology Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Video Technology Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202011002978.3A
Publication of CN112153401A
Application granted
Publication of CN112153401B
Legal status: Active

Classifications

    • H04N 21/2187: Live feed
    • H04N 19/114: Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H04N 21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 21/23602: Multiplexing isochronously with the video sync, e.g. according to bit-parallel or bit-serial interface formats, as SDI
    • H04N 21/2362: Generation or processing of Service Information [SI]
    • H04N 21/2393: Interfacing the upstream path of the transmission network involving handling client requests
    • H04N 21/4345: Extraction or processing of SI, e.g. extracting service information from an MPEG stream
    • H04N 21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/6437: Real-time Transport Protocol [RTP]
    • H04N 21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments

Abstract

The invention provides a video processing method, a communication device and a readable storage medium, to solve the problem that existing VR user view-angle data synchronization methods increase the computational load of VR terminal devices. The method comprises: receiving a first request sent by a first terminal device, wherein the first request is used for requesting a first target video stream of a first FOV, and the first FOV is the field angle corresponding to a second terminal device; receiving first FOV track information of a second target video stream sent by the second terminal device; obtaining the first target video stream of the first FOV according to the first FOV track information; and transmitting the first target video stream of the first FOV to the first terminal device. The first terminal device can thus obtain the video stream of the field angle of the second terminal device, while the second terminal device only needs to transmit the first FOV track information to the network device and does not need to compute and transmit a live data stream, which effectively reduces the computational load of the VR terminal device.

Description

Video processing method, communication device and readable storage medium
Technical Field
Embodiments of the invention relate to the technical field of multimedia communication, and in particular to a video processing method, a communication device and a readable storage medium.
Background
In a live Virtual Reality (VR) scene, when another device wants to see the live data stream of the moving view of one user's terminal, the rendered data is generally captured directly on that terminal's VR device through a video image mapping algorithm such as the spherical-texture-to-plane mapping method, compressed again with an algorithm such as H264/H265, pushed to a back-end server through the Real-Time Messaging Protocol (RTMP) or another protocol, and the media stream is then sent to the other terminals synchronously. Alternatively, the terminal VR device synchronizes the rendered data directly: it compresses the audio and video data again with an algorithm such as H264/H265 and sends them to the other devices for playback over an IP, Bluetooth or High-Definition Multimedia Interface (HDMI) connection.
Such VR user view-angle data synchronization methods greatly increase the computational load of the VR terminal device and, for the same battery capacity, shorten the battery life of the VR device.
Disclosure of Invention
The embodiment of the invention provides a video processing method, a communication device and a readable storage medium, to solve the problem that existing VR user view-angle data synchronization methods increase the computational load of VR terminal devices.
In a first aspect, an embodiment of the present invention provides a video processing method, applied to a network device, including:
receiving a first request sent by a first terminal device, wherein the first request is used for requesting a first target video stream of a first field angle FOV, and the first FOV is the field angle corresponding to a second terminal device;
receiving first FOV track information of a second target video stream sent by the second terminal device, wherein the first target video stream is at least part of the second target video stream;
obtaining the first target video stream of the first FOV according to the first FOV track information;
and transmitting the first target video stream of the first FOV to the first terminal device.
Optionally, the first FOV track information includes: an identifier of the second target video stream, an identifier of the second terminal device, and FOV coordinate information corresponding to the display time stamps (PTS) of N image frames, where N is the total number of image frames in the second target video stream.
Optionally, before receiving the first request sent by the first terminal device, the method further includes:
acquiring a second target video stream;
and acquiring each group of pictures (GOP) in the second target video stream, the correspondence between each GOP and the display time stamps (PTS) of its image frames, and the correspondence between the PTS and the recording time Tr of each image frame in the second target video stream.
Optionally, the obtaining, according to first FOV track information of a second target video stream sent by the second terminal device, a first target video stream of the first FOV includes:
determining a starting PTS of the first target video stream of the first FOV;
in the second target video stream, sequentially reading the group-of-pictures data corresponding to each image frame, from the starting PTS up to the ending PTS;
restoring the image data of each image frame in the group-of-pictures data according to the PTS of each image frame in the second target video stream;
and processing the image data of each image frame according to the first FOV track information to obtain the first target video stream of the first FOV.
Optionally, in a case that the first request is used to request to acquire a first target video stream of a first FOV in real time, the starting PTS is a starting PTS of a latest GOP, the ending PTS is a last PTS in the second target video stream, and the latest GOP is a GOP corresponding to a latest PTS in first FOV track information uploaded by the second terminal device;
or, in a case that the first request is for requesting to acquire a target video stream of a first field angle FOV which is played from a first time, the starting PTS is a PTS corresponding to the first time in the second target video stream, and the ending PTS is the last PTS in the second target video stream;
or, in a case that the first request is for requesting to acquire a target video stream of a first field angle FOV between a second time and a third time, the starting PTS is a PTS corresponding to the second time in the second target video stream, and the ending PTS is a PTS corresponding to the third time in the second target video stream.
Optionally, the sending the first target video stream of the first FOV to the first terminal device includes:
and sending the first target video stream of the first FOV to the first terminal device through a Content Delivery Network (CDN).
According to another aspect of the present invention, there is provided a video processing method applied to a terminal device, including:
acquiring a second target video stream;
analyzing the second target video stream to obtain first FOV track information of the second target video stream;
and uploading the first FOV track information to a network device.
Optionally, the analyzing the second target video stream to obtain the first FOV track information of the second target video stream includes:
analyzing the second target video stream to obtain the PTS of each image frame;
acquiring FOV coordinate information corresponding to the PTS of each image frame;
and obtaining first FOV track information of the second target video stream according to the FOV coordinate information corresponding to the PTS of each image frame.
Optionally, the first FOV track information includes: and the identification of the second target video stream, the identification of the second terminal equipment and FOV coordinate information corresponding to the PTS of the N image frames, wherein N is the total number of the image frames in the second target video stream.
Optionally, the uploading the first FOV track information to a network device includes:
uploading the first FOV track information to the network device in a preset byte stream structure;
wherein the preset byte stream structure comprises data header information and a set of data body information, the data header information comprises the identifier of the second target video stream and the identifier of the second terminal device, and each piece of data body information comprises the PTS of an image frame and the FOV coordinate information corresponding to that PTS.
In accordance with another aspect of the present invention, there is provided a network device, including: a processor, a memory and a computer program stored on said memory and executable on said processor, said computer program realizing the steps of the video processing method as described above when executed by said processor.
According to still another aspect of the present invention, there is provided a terminal device including: a processor, a memory and a computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the video processing method as described above.
According to a further aspect of the present invention, a computer-readable storage medium is provided, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the video processing method as set forth above.
In the embodiment of the invention, the first target video stream of the first FOV can be obtained according to the first FOV track information of the second target video stream sent by the second terminal device, and the first target video stream is at least a part of the second target video stream, so that the first terminal device can obtain the video stream of the field angle of the second terminal device.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor.
Fig. 1 is a schematic flow chart of a video processing method according to an embodiment of the present invention;
FIG. 2 shows a schematic coordinate diagram of a first FOV in an embodiment of the invention;
fig. 3 is a schematic diagram illustrating interaction between a server and a terminal device according to an embodiment of the present invention;
FIG. 4 is a second flowchart illustrating a video processing method according to an embodiment of the invention;
FIG. 5 is a diagram illustrating header information in an embodiment of the invention;
FIG. 6 is a diagram illustrating data body information in an embodiment of the invention;
FIG. 7 is a diagram illustrating header information and body information in an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention;
fig. 9 is a schematic diagram illustrating an implementation structure of a network device according to an embodiment of the present invention;
fig. 10 is a second schematic structural diagram of a video processing apparatus according to an embodiment of the invention;
fig. 11 is a schematic diagram of an implementation structure of a terminal device according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments. In the following description, specific details are provided, such as specific configurations and components, merely to facilitate a thorough understanding of embodiments of the invention. It will therefore be apparent to those skilled in the art that various changes and modifications can be made in the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention. In addition, the terms "system" and "network" are often used interchangeably herein.
As shown in fig. 1, an embodiment of the present invention provides a video processing method, which is applied to a network device, where the network device may specifically be a server, and the method includes the following steps:
step 101: receiving a first request sent by a first terminal device, wherein the first request is used for requesting to acquire a first target video stream of a first field angle FOV, and the first FOV is a field angle corresponding to a second terminal device.
In this step, the first terminal device may send the first request to the network device through the CDN, so as to obtain a first target video stream of the FOV of the second terminal device.
Step 102: receiving first FOV track information of a second target video stream sent by the second terminal device, wherein the first target video stream is at least a partial video stream of the second target video stream.
The first FOV track information indicates position information, such as coordinate information, of the image frames in the second target video stream under the field angle of the second terminal device.
The first target video stream may be the whole of the second target video stream, that is, the first target video stream is the same as the second target video stream; or it may be a partial video stream of the second target video stream, for example the video stream within a certain time period of the second target video stream.
The first target video stream and the second target video stream have the same identification.
It should be noted that there is no restriction on the order between the step 102 and the step 101, that is, the step 102 may be executed first and then the step 101 is executed, or the step 101 may be executed first and then the step 102 is executed.
Step 103: and obtaining a first target video stream of the first FOV according to the first FOV track information.
Here, the network device obtains the first target video stream of the first FOV by combining the second target video stream according to the first FOV track information sent by the second terminal device.
Step 104: transmitting the first target video stream of the first FOV to the first terminal device.
It should be noted that, in the embodiment of the present invention, the first target video stream may be transmitted in real time, for example, a part of video data of the first target video stream is obtained according to the first FOV track information, and then the obtained part of video data is sent to the first terminal device, without obtaining a complete first target video stream and then sending the complete first target video stream to the first terminal device.
According to the video processing method provided by the embodiment of the invention, the first target video stream of the first FOV can be obtained according to the first FOV track information of the second target video stream sent by the second terminal device, and the first target video stream is at least part of the second target video stream, so that the first terminal device can obtain the video stream of the field angle of the second terminal device.
Optionally, the first FOV track information includes: an identifier of the second target video stream, an identifier of the second terminal device, and FOV coordinate information corresponding to the display time stamps (PTS) of N image frames, where N is the total number of image frames in the second target video stream.
In the embodiment of the invention, the second terminal device plays the second target video stream through the VR player, renders a corresponding picture in an FOV mode, and simultaneously constructs a corresponding relation among the identifier of the second target video stream, the identifier of the second terminal device and FOV coordinate information corresponding to the display time stamps PTS of the N image frames to obtain the first FOV track information.
Specifically, the VR player of the second terminal device obtains the playing address of the second target video stream (for example, a target URL that returns a ts list, after which the player keeps requesting ts segment data), thereby obtaining the back-end VR live real-time data stream (the second target video stream), and parses the PTS of each frame (I frame, B frame and P frame), T_pts(m). At the same time it restores the FOV view information from the viewing angle of the spherical-texture-to-plane mapping method. As shown in fig. 2, after decoding it obtains, for time T_pts(m), the left position of the image shift Vp_x(T_pts(m)) (the distance from the left edge of the full stream picture), the bottom position Vp_y(T_pts(m)) (the distance from the bottom edge of the full stream picture), the FOV view height Vp_h(T_pts(m)) (the height of the FOV picture marked in fig. 2, i.e. the distance between the bottom edge and the top edge of the first FOV picture) and the FOV view width Vp_w(T_pts(m)) (the width of the first FOV picture marked in fig. 2, i.e. the distance between the left edge and the right edge of the first FOV picture). It also records the playing frame m at that moment, and constructs the correspondence between ID (the identifier of the second terminal device), S (the identifier of the second target video stream) and the FOV coordinate information for the PTS of each image frame, obtaining the first FOV track information:
Qp(ID, S, n) = { (T_pts(m), Vp_x(T_pts(m)), Vp_y(T_pts(m)), Vp_h(T_pts(m)), Vp_w(T_pts(m))) | m = 1, 2, ..., n }
The second terminal device uploads this to the back-end server as a byte stream. After a long-lived connection is established between the second terminal device and the back-end server, S and SID are pushed first, and then the Qp-structured byte stream is pushed continuously in real time, sending the first FOV track information to the server. The server parses the S and SID user attributes, then binds the relation between the subsequent spliced stream and S and SID on the basis of the long-lived connection, and builds a new local index, whose relation is as follows:
the data structure is stored under the file name S_SID, and
S_SID -> { (T_pts(m), Vp_x(T_pts(m)), Vp_y(T_pts(m)), Vp_h(T_pts(m)), Vp_w(T_pts(m))) | m = 1, 2, ..., n }
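As an illustrative sketch only (not part of the patent text), the Qp record above can be modeled as follows; the Python class and field names are assumptions chosen for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FovSample:
    """FOV coordinates of one image frame, keyed by its PTS T_pts(m)."""
    vp_x: float  # left position: distance from the left edge of the full picture
    vp_y: float  # bottom position: distance from the bottom edge of the full picture
    vp_h: float  # FOV view height
    vp_w: float  # FOV view width

@dataclass
class FovTrack:
    """Qp(ID, S, n): the first FOV track information."""
    sid: str     # ID: identifier of the second terminal device
    stream: str  # S: identifier of the second target video stream
    samples: Dict[int, FovSample] = field(default_factory=dict)  # T_pts(m) -> coordinates

    def record(self, t_pts: int, x: float, y: float, h: float, w: float) -> None:
        """Called once per rendered frame m while the VR player plays the stream."""
        self.samples[t_pts] = FovSample(x, y, h, w)
```

A player calling record(...) once per rendered frame produces exactly the (T_pts(m), Vp_x, Vp_y, Vp_h, Vp_w) tuples of the Qp set.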
optionally, in this embodiment of the present invention, before receiving the first request sent by the first terminal device, the method further includes:
acquiring a second target video stream;
and acquiring each group of pictures (GOP) in the second target video stream, the correspondence between each GOP and the display time stamps (PTS) of its image frames, and the correspondence between the PTS and the recording time Tr of each image frame in the second target video stream.
In this embodiment of the application, the network device (in the cloud) collects and records the VR live stream in real time. Specifically, taking the real-time live stream of the VR live environment as input, the server re-partitions and re-encodes the video, packs it, and synthesizes one ultra-high-definition VR live stream. The VR real-time live stream is time-synchronized with an NTP server. The VR live recording server parses the data set D of each GOP and stores it, and at the same time parses the name S of the live stream, the PTS time T_pts of each frame, and the recording time Tr of each frame. The data set D contains the data of each GOP (a group of I, B and P frames). D maps each frame to the set of all frame data of the GOP that contains it, and this GOP frame set is denoted A, so A = D(T_pts); that is, the group of media data of the GOP at time T_pts(i) is A(T_pts(i)). All media of the GOP's I, B and P frames are recorded in the set A. Because each frame of data in a GOP has a different PTS, the following relation is established:
D(T_pts(x0)) = D(T_pts(x0+1)) = D(T_pts(x0+2)) = ... = D(T_pts(x0+x)) = A(T_pts(x0));
where T_pts(x0) represents the PTS of the first frame of the GOP and T_pts(x0+x) represents the PTS of the last frame of the GOP. That is, from any T_pts the corresponding media data A can be obtained; every image frame in a GOP corresponds to the same group of media data. For example, the media data corresponding to each image frame in GOP0 is A0.
In the embodiment of the invention, the relation between Tr(x) and T_pts(x) is also recorded, and relation II, the VR live recording set, is constructed:
Qr(S, n) = { (Tr(i), T_pts(i)) | i = 1, 2, ..., n }
where: i denotes the picture frame corresponding to the current recording time, counting from 1 at the start of recording; n denotes that there are n recorded picture frames, and n grows as recording proceeds; Qr(S, n) denotes the recorded set of n picture frames of stream S; T_pts(i) is the PTS of the i-th frame, and from T_pts(i) the media data A can be derived according to relation I above.
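A minimal sketch of relations I and II follows, reusing the FovTrack sketch above; the class names (GopMedia, RecordingIndex) and the choice of containers are assumptions for illustration, not structures defined by the patent:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class GopMedia:
    """A(T_pts(x0)): the media data of one GOP (a group of I, B and P frames)."""
    start_pts: int
    frames: Dict[int, bytes]  # T_pts of each frame in the GOP -> encoded frame payload

class RecordingIndex:
    """Relations I and II for one recorded live stream S."""

    def __init__(self, stream: str):
        self.stream = stream                   # S: name of the live stream
        self.d: Dict[int, GopMedia] = {}       # relation I: any frame PTS -> its GOP set A
        self.qr: List[Tuple[float, int]] = []  # relation II Qr(S, n): (Tr(i), T_pts(i))

    def add_gop(self, gop: GopMedia, rec_time: Dict[int, float]) -> None:
        """Store one parsed GOP: every frame PTS in it maps to the same set A."""
        for t_pts in sorted(gop.frames):
            self.d[t_pts] = gop                # D(T_pts(x0)) = ... = A(T_pts(x0))
            self.qr.append((rec_time[t_pts], t_pts))
```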
Optionally, the obtaining, according to first FOV track information of a second target video stream sent by the second terminal device, a first target video stream of the first FOV includes:
determining a starting PTS of the first target video stream of the first FOV;
in the second target video stream, sequentially reading the group-of-pictures data corresponding to each image frame, from the starting PTS up to the ending PTS;
restoring the image data of each image frame in the group-of-pictures data according to the PTS of each image frame in the second target video stream;
and processing the image data of each image frame according to the first FOV track information to obtain the first target video stream of the first FOV.
In the embodiment of the invention, according to the first request, the network device determines the starting PTS of the first target video stream of the first FOV. Then, according to relation I above, it sequentially reads, in the second target video stream, the group-of-pictures data corresponding to each image frame from the starting PTS up to the ending PTS, and restores the image data of each image frame in the group-of-pictures data according to the PTS of each frame. The audio data is extracted from the original file's audio data. The image data is processed according to the first FOV track information, for example by the spherical-texture-to-plane mapping method, and the corresponding FOV data is cropped out. After a new GOP is buffered (by default its size is the same as the size of the corresponding original GOP), the new GOP is compressed again with H264/H265, and an RTMP stream is output in real time at PTS intervals and accelerated to the first terminal device through a CDN network.
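The pipeline just described can be sketched as follows, reusing RecordingIndex and FovTrack from the sketches above. The helper callables (decode_frame, crop_fov, encode_gop, push_rtmp) stand in for the media framework, and the way the next GOP's starting PTS is derived is likewise an assumption:

```python
from typing import Any, Callable, List

def build_fov_stream(index: "RecordingIndex", track: "FovTrack",
                     start_pts: int, end_pts: int,
                     decode_frame: Callable[["GopMedia", int], Any],
                     crop_fov: Callable[[Any, "FovSample"], Any],
                     encode_gop: Callable[[List[Any]], bytes],
                     push_rtmp: Callable[[bytes], None]) -> None:
    """Read GOP data from start_pts up to end_pts, restore each frame,
    crop the FOV region per the track info, re-encode and emit the new GOPs."""
    pts = start_pts
    while pts <= end_pts and pts in index.d:
        gop = index.d[pts]                    # the GOP set A containing this PTS
        clipped: List[Any] = []
        for t_pts in sorted(gop.frames):
            if not start_pts <= t_pts <= end_pts:
                continue
            image = decode_frame(gop, t_pts)  # restore the frame image from A by its PTS
            clipped.append(crop_fov(image, track.samples[t_pts]))
        # the new GOP keeps the size of the corresponding original GOP by default;
        # it is re-compressed (e.g. H264/H265) and emitted at PTS intervals
        push_rtmp(encode_gop(clipped))
        pts = max(gop.frames) + 1             # assumed: the next GOP starts at the next PTS
```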
Optionally, when the first request is used to request to obtain a first target video stream of a first FOV in real time, the starting PTS is a starting PTS of a latest GOP, the ending PTS is a last PTS in the second target video stream, and the latest GOP is a GOP corresponding to a latest PTS in first FOV track information uploaded by the second terminal device;
or, in a case that the first request is for requesting to acquire a target video stream of a first field angle FOV which is played from a first time, the starting PTS is a PTS corresponding to the first time in the second target video stream, and the ending PTS is the last PTS in the second target video stream;
or, in a case that the first request is for requesting to acquire a target video stream of a first field angle FOV between a second time and a third time, the starting PTS is a PTS corresponding to the second time in the second target video stream, and the ending PTS is a PTS corresponding to the third time in the second target video stream.
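The three cases above can be sketched as follows. The text leaves it ambiguous whether the requested times live on the recording timeline or the PTS timeline; this sketch assumes the PTS timeline, matching the inequalities given for scenes 2 and 3 below, and all names are illustrative:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

def pts_for_time(qr: List[Tuple[float, int]], t: float, end: bool = False) -> int:
    """Take the left value T_pts(i) where T_pts(i) <= t < T_pts(i+1);
    for an end time, take the right value T_pts(i+1)."""
    pts_list = [p for _, p in qr]              # T_pts(i) in recording order
    i = max(bisect_right(pts_list, t) - 1, 0)
    if end and i + 1 < len(pts_list):
        return pts_list[i + 1]
    return pts_list[i]

def select_pts_range(kind: str, index: "RecordingIndex", track: "FovTrack",
                     t1: Optional[float] = None, t2: Optional[float] = None,
                     t3: Optional[float] = None) -> Tuple[int, int]:
    last_pts = index.qr[-1][1]                 # last PTS in the second target video stream
    if kind == "realtime":                     # case 1: latest uploaded GOP onward
        latest_gop = index.d[max(track.samples)]
        return latest_gop.start_pts, last_pts
    if kind == "from_time":                    # case 2: from the first time onward
        return pts_for_time(index.qr, t1), last_pts
    if kind == "interval":                     # case 3: second time to third time
        return pts_for_time(index.qr, t2), pts_for_time(index.qr, t3, end=True)
    raise ValueError(kind)
```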
Optionally, in this embodiment of the application, the sending the first target video stream of the first FOV to the first terminal device includes:
and sending the first target video stream of the first FOV to the first terminal device through a Content Delivery Network (CDN).
Here, the CDN may distribute the first target video stream of the first FOV to the plurality of terminal apparatuses, thereby providing a capability of distributing the synchronized video stream to the plurality of terminal apparatuses.
The following describes a video processing method according to an embodiment of the present invention with reference to a specific application scenario.
As shown in fig. 3, assume that a user K (ID 123) sends an original request for a video stream whose name S is live1. According to the request, the network device constructs the media data A of each GOP of stream S and the PTS of each frame. User K acquires the VR live stream S through HLS. User K renders the view corresponding to the first FOV on his own VR device and uploads live1 and 123, the PTS information of each frame, and the first FOV coordinate information; the network device keeps recording the FOV coordinate information and PTS information of the S stream requested by user K. When other users (e.g. users K1, K2, ..., Kn) initiate a request, the network device provides the live1 stream of the first FOV to them based on the coordinate information of the first FOV and so on. After CDN acceleration, the other users obtain the stream information corresponding to S and SID of user K and render, in real time and according to the PTS, the motion track exhibited by user SID.
Scene 1: the K1 user requests a stream of view trajectories for the K user FOVs.
From the first request, the server parses the stream name S (live1 in this example) and the second terminal device ID, i.e. the SID (123 in this example). According to the user view track set Qp(n) recorded by the VR recording server, the server continuously reads stream S and the SID through the live1_123 index. The server side obtains the starting PTS time of the latest GOP, looks up the GOP information and the view coordinate position information corresponding to that PTS in the live1_123 index file, and reads and parses the A media data of each frame, frame by frame. From the A media data it restores the media data of the corresponding frame according to the PTS, extracts the audio data from the original file's audio data, and crops out the data corresponding to the FOV according to the spherical-texture-to-plane mapping method and the like. After a new GOP1 is buffered (by default the size of GOP1 is the same as the original GOP), it is compressed again with H264/H265, and an RTMP stream is output in real time at PTS intervals and accelerated to user K1 through the CDN.
Scene 2: the K2 user requests the K user to specify a time to begin viewing the user's FOV data.
According to the first request, the server parses the stream name S (live1 in this example), the ID of the second terminal device, i.e. the SID (123 in this example), and the play start time T1. It acquires the corresponding PTS from the Qr(S, n) data according to live1 and T1: if T_pts(i) <= T1 < T_pts(i+1), the left value T_pts(i) is taken.
According to T_pts(i) and the live1_123 index file, the recording server looks up the GOP information and view coordinate position information corresponding to the PTS, starts reading the A media data frame by frame and parses it, restores the media data of the corresponding frame according to the PTS, extracts the audio data from the original file's audio data, and crops the data corresponding to the FOV according to the spherical-texture-to-plane mapping method and the like. After the new GOP1 is buffered (by default the same size as the original GOP), it is compressed again with H264/H265, and the RTMP stream is output to user K2 in real time at PTS intervals through the CDN network.
Scene 3: the K3 user requests the K user to view the FOV data for the user for a specified period of time.
According to the first request, the streaming media server parses the stream name S (live1 in this example), the SID (123 in this example), the start playing time t1 and the end playing time t2. It first obtains the corresponding PTS1 and PTS2 from the Qr(S, n) data according to live1, t1 and t2:
if T_pts(i) <= t1 < T_pts(i+1), the left value T_pts(i) is taken;
if T_pts(j) <= t2 < T_pts(j+1), the right value T_pts(j+1) is taken.
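A toy check of the left-value/right-value rule above, reusing the pts_for_time sketch given earlier (all numbers invented):

```python
# Qr(S, n) as (Tr(i), T_pts(i)) pairs; both columns are invented for illustration.
qr = [(10.00, 1000), (10.04, 1040), (10.08, 1080), (10.12, 1120)]
assert pts_for_time(qr, 1050) == 1040            # start time t1: left value T_pts(i)
assert pts_for_time(qr, 1050, end=True) == 1080  # end time t2: right value T_pts(j+1)
```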
According to T_pts(i), T_pts(j+1) and the live1_123 index file, the recording server locates the starting GOP information, the ending GOP information and the view coordinate position information corresponding to the PTS, starts reading the A media data frame by frame and parses it, restores the media data of the corresponding frame according to the PTS, extracts the audio data from the original file's audio data, and crops the data of the corresponding FOV according to the spherical-texture-to-plane mapping method and the like. After the new GOP1 is buffered (by default the same size as the original GOP), it is compressed again with H264/H265, and the RTMP stream is output to user K3 in real time at PTS intervals through the CDN network, until the ending GOP information has been read, at which point the output ends.
In the embodiment of the application, the capability of the server back end is fully utilized for real-time FOV live sharing, which reduces the upload bandwidth of the second terminal device and the consumption caused by its secondary encoding computation. Handling this uniformly in the cloud in real time reduces the amount of computation on the terminal device side.
According to the video processing method provided by the embodiment of the invention, the first target video stream of the first FOV can be obtained according to the first FOV track information of the second target video stream sent by the second terminal device, and the first target video stream is at least part of the second target video stream, so that the first terminal device can obtain the video stream of the field angle of the second terminal device.
As shown in fig. 4, an embodiment of the present application further provides a video processing method, which is applied to a terminal device, where the terminal device may specifically be the second terminal device, and the method includes:
step 401: a second target video stream is obtained.
Here, the second target video stream is the VR live stream collected and recorded by the network device in real time. The network device sends the collected and recorded VR live stream to the second terminal device.
Step 402: and analyzing the second target video stream to obtain first FOV track information of the second target video stream.
The VR player of the second terminal device obtains the playing address of the second target video stream (for example, a target URL that returns a ts list, after which the player keeps requesting ts segment data), thereby obtaining the back-end VR live real-time data stream (the second target video stream), and parses the PTS of each frame (I frame, B frame and P frame), T_pts(m). At the same time it restores the FOV view information from the viewing angle of the spherical-texture-to-plane mapping method; after decoding it obtains, for that time, the left position of the image shift Vp_x(T_pts(m)), the bottom position Vp_y(T_pts(m)), the FOV view height Vp_h(T_pts(m)) and the FOV view width Vp_w(T_pts(m)), records the playing frame m at that moment, and constructs the correspondence between ID (the identifier of the second terminal device), S (the identifier of the second target video stream) and the FOV coordinate information for the PTS of each image frame, obtaining the first FOV track information:
Qp(ID, S, n) = { (T_pts(m), Vp_x(T_pts(m)), Vp_y(T_pts(m)), Vp_h(T_pts(m)), Vp_w(T_pts(m))) | m = 1, 2, ..., n }
In this step, when the second terminal device plays the video stream at the first field angle, the user ID's request for the S stream address generates the Qp(ID, S, n) set in real time, so as to form the first FOV track information.
Step 403: and uploading the first FOV track information to a network device.
The network device may be a cloud server. Here, the first FOV track information may be uploaded to the network device in a byte stream.
According to the video processing method, the first FOV track information of the second target video stream is uploaded to the network equipment, and the network equipment can obtain the first target video stream of the first FOV according to the first FOV track information, so that the first terminal equipment can obtain the video stream of the field angle of the second terminal equipment.
Optionally, the analyzing the second target video stream to obtain the first FOV track information of the second target video stream includes:
analyzing the second target video stream to obtain the PTS of each image frame;
acquiring FOV coordinate information corresponding to the PTS of each image frame;
and obtaining first FOV track information of the second target video stream according to the FOV coordinate information corresponding to the PTS of each image frame.
In the embodiment of the invention, the second terminal device plays the second target video stream through the VR player, renders a corresponding picture in an FOV mode, and simultaneously constructs a corresponding relation among the identifier of the second target video stream, the identifier of the second terminal device and FOV coordinate information corresponding to the display time stamps PTS of the N image frames to obtain the first FOV track information.
Specifically, the VR player of the second terminal device obtains the playing address of the second target video stream (for example, a target URL that returns a ts list, after which the player keeps requesting ts segment data), thereby obtaining the back-end VR live real-time data stream (the second target video stream), and parses the PTS of each frame (I frame, B frame and P frame), T_pts(m). At the same time it restores the FOV view information from the viewing angle of the spherical-texture-to-plane mapping method. As shown in fig. 2, after decoding it obtains, for time T_pts(m), the left position of the image shift Vp_x(T_pts(m)) (the distance from the left edge of the full stream picture), the bottom position Vp_y(T_pts(m)) (the distance from the bottom edge of the full stream picture), the FOV view height Vp_h(T_pts(m)) (the height of the FOV picture marked in fig. 2, i.e. the distance between the bottom edge and the top edge of the first FOV picture) and the FOV view width Vp_w(T_pts(m)) (the width of the first FOV picture marked in fig. 2, i.e. the distance between the left edge and the right edge of the first FOV picture). It also records the playing frame m at that moment, and constructs the correspondence between ID (the identifier of the second terminal device), S (the identifier of the second target video stream) and the FOV coordinate information for the PTS of each image frame, obtaining the first FOV track information:
Qp(ID, S, n) = { (T_pts(m), Vp_x(T_pts(m)), Vp_y(T_pts(m)), Vp_h(T_pts(m)), Vp_w(T_pts(m))) | m = 1, 2, ..., n }
optionally, the first FOV track information includes: and the identification of the second target video stream, the identification of the second terminal equipment and FOV coordinate information corresponding to the PTS of the N image frames, wherein N is the total number of the image frames in the second target video stream.
The first FOV track information is described in detail in the embodiment on the network device side, and is not described herein again.
Optionally, the uploading the first FOV track information to a network device includes:
uploading the first FOV track information to network equipment according to a preset byte stream structure;
wherein the preset byte stream structure comprises data header information and a set of data body information, the data header information comprises the identifier of the second target video stream and the identifier of the second terminal device, and each piece of data body information comprises the PTS of an image frame and the FOV coordinate information corresponding to that PTS.
As shown in fig. 5, the data header information includes: an FOV structure flag bit, an extension flag bit, the total length, the SID, and the stream name (S). As shown in fig. 6, the data body information includes: FOV structure flag bits, extension flag bits, the current packet length, the timestamp Tp, the frame PTS, Vx, Vy, the view height (i.e. Vp_h(T_pts(m))) and the view width (Vp_w(T_pts(m))). Fig. 7 shows the combination of the header information and a set of body information; the header information appears once every 1 second.
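A sketch of this byte-stream layout follows. Figs. 5 to 7 name the fields but not their widths, so the field sizes, flag values and the length prefixes on the SID and stream name used here are assumptions for illustration only:

```python
import struct

def pack_header(sid: str, stream: str) -> bytes:
    """Data header: FOV structure flag, extension flag, total length, SID, stream name S."""
    sid_b, s_b = sid.encode("utf-8"), stream.encode("utf-8")
    body = struct.pack("!B", len(sid_b)) + sid_b + struct.pack("!B", len(s_b)) + s_b
    # flag values (0x01/0x00) and the 4-byte total length are assumed widths
    return struct.pack("!BBI", 0x01, 0x00, 6 + len(body)) + body

def pack_body(tp: float, pts: int, vx: float, vy: float,
              vp_h: float, vp_w: float) -> bytes:
    """Data body: flags, current packet length, timestamp Tp, frame PTS,
    Vx, Vy, view height Vp_h and view width Vp_w."""
    payload = struct.pack("!dQffff", tp, pts, vx, vy, vp_h, vp_w)
    return struct.pack("!BBI", 0x02, 0x00, 6 + len(payload)) + payload

# One header followed by one body record (values invented):
chunk = pack_header("123", "live1") + pack_body(10.0, 1000, 0.1, 0.2, 720.0, 1280.0)
```

In this sketch a header would be re-emitted roughly once per second on the long-lived connection, followed by the body records produced since the previous header, as fig. 7 describes.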
The video processing method of the embodiment of the application constructs a specific index format based on the user's view coordinates, the PTS of the I, B and P frames in the view stream, and so on, which is derived from the real-time stream and recorded in real time by the back-end server. After a VR display terminal receives a VR panoramic video display instruction, the server can provide the FOV service recorded in real time according to the characteristics of the request. This gives a low-latency solution in which a specific user's view, collected and recorded in the cloud, is distributed to the audio and video acquisition, transmission and distribution layer for multiple users; it differs from the original transmission mode and relieves the pressure on the communication network.
As shown in fig. 8, an embodiment of the present application further provides a video processing apparatus, applied to a network device, including:
a first receiving module 801, configured to receive a first request sent by a first terminal device, where the first request is used to request to acquire a first target video stream of a first field angle FOV, where the first field angle FOV is a field angle corresponding to a second terminal device;
a second receiving module 802, configured to receive first FOV track information of a second target video stream sent by the second terminal device, where the first target video stream is at least a partial video stream of the second target video stream;
a first obtaining module 803, configured to obtain a first target video stream of the first FOV according to the first FOV track information;
a first sending module 804, configured to send the first target video stream of the first FOV to the first terminal device.
In the video processing apparatus according to the embodiment of the present application, the first FOV track information includes: an identifier of the second target video stream, an identifier of the second terminal device, and FOV coordinate information corresponding to the display time stamps (PTS) of N image frames, where N is the total number of image frames in the second target video stream.
The video processing apparatus according to the embodiment of the present application further includes:
the second obtaining module is used for obtaining a second target video stream before the first receiving module receives the first request sent by the first terminal equipment;
and the third acquisition module is used for acquiring each group of pictures (GOP) in the second target video stream, the correspondence between each GOP and the display time stamps (PTS) of its image frames, and the correspondence between the PTS and the recording time Tr of each image frame in the second target video stream.
In the video processing apparatus according to the embodiment of the present application, the first obtaining module 803 includes:
a determining submodule for determining a starting PTS of a first target video stream of the first FOV;
the reading sub-module is used for, in the second target video stream, sequentially reading the group-of-pictures data corresponding to each image frame, from the starting PTS up to the ending PTS;
the restoring submodule is used for restoring and obtaining image data of each image frame in the image group data according to the PTS of each image frame in the second target video stream;
and the first acquisition submodule is used for processing the image data of each image frame according to the first FOV track information to obtain a first target video stream of the first FOV.
In the video processing apparatus according to the embodiment of the application, when the first request is used to request to acquire a first target video stream of a first FOV in real time, the starting PTS is a starting PTS of a latest GOP, the ending PTS is a last PTS in the second target video stream, and the latest GOP is a GOP corresponding to a latest PTS in first FOV track information uploaded by a second terminal device;
or, in a case that the first request is for requesting to acquire a target video stream of a first field angle FOV which is played from a first time, the starting PTS is a PTS corresponding to the first time in the second target video stream, and the ending PTS is the last PTS in the second target video stream;
or, in a case that the first request is for requesting to acquire a target video stream of a first field angle FOV between a second time and a third time, the starting PTS is a PTS corresponding to the second time in the second target video stream, and the ending PTS is a PTS corresponding to the third time in the second target video stream.
In the video processing apparatus in the embodiment of the present application, the first sending module is configured to send the first target video stream of the first FOV to the first terminal device through a content delivery network CDN.
It should be noted that the apparatus is an apparatus corresponding to the video processing method applied to the network device side, and all implementation manners in the method embodiments are applicable to the embodiment of the apparatus, and the same technical effect can be achieved.
According to the video processing device in the embodiment of the application, the first target video stream of the first FOV can be obtained according to the first FOV track information of the second target video stream sent by the second terminal device, and the first target video stream is at least part of the second target video stream, so that the first terminal device can obtain the video stream of the field angle of the second terminal device.
As shown in fig. 9, an embodiment of the present application further provides a network device, optionally, the network device is a cloud server, and the network device includes: a transceiver 903, a processor 901, a memory 902 and a computer program stored on the memory 902 and executable on the processor 901, the processor 901 implementing the steps of the video processing method described above when executing the computer program. Specifically, the transceiver 903 is configured to receive a first request sent by a first terminal device, where the first request is used to request to acquire a first target video stream of a first field angle FOV, where the first field angle FOV is a field angle corresponding to a second terminal device; receiving first FOV track information of a second target video stream sent by the second terminal equipment, wherein the first target video stream is at least part of the second target video stream; the processor 901 is configured to obtain a first target video stream of the first FOV according to the first FOV track information; the transceiver 903 is configured to transmit the first target video stream of the first FOV to the first terminal device.
Optionally, the first FOV track information includes: and the identification of the second target video stream, the identification of the second terminal equipment and FOV coordinate information corresponding to display time stamps PTS of N image frames, wherein N is the total number of the image frames in the second target video stream.
Optionally, the processor 901 is further configured to: acquire a second target video stream; and acquire each group of pictures (GOP) in the second target video stream, the correspondence between each GOP and the display time stamps (PTS) of its image frames, and the correspondence between the PTS and the recording time Tr of each image frame in the second target video stream.
Optionally, the processor 901 is further configured to: determine a starting PTS of the first target video stream of the first FOV; in the second target video stream, sequentially read the group-of-pictures data corresponding to each image frame, from the starting PTS up to the ending PTS; restore the image data of each image frame in the group-of-pictures data according to the PTS of each image frame in the second target video stream; and process the image data of each image frame according to the first FOV track information to obtain the first target video stream of the first FOV.
Optionally, in a case that the first request is used to request a first target video stream of the first FOV in real time, the starting PTS is the starting PTS of the latest GOP, the ending PTS is the last PTS in the second target video stream, and the latest GOP is the GOP corresponding to the latest PTS in the first FOV track information uploaded by the second terminal device;
or, in a case that the first request is used to request a target video stream of the first field angle FOV played from a first time, the starting PTS is the PTS corresponding to the first time in the second target video stream, and the ending PTS is the last PTS in the second target video stream;
or, in a case that the first request is used to request a target video stream of the first field angle FOV between a second time and a third time, the starting PTS is the PTS corresponding to the second time in the second target video stream, and the ending PTS is the PTS corresponding to the third time in the second target video stream.
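The three alternatives above can be sketched as follows; request.kind, pts_at (mapping a time to a PTS via the recording-time correspondence), and start_of_gop are assumptions introduced for the example.

```python
def select_pts_range(request, track, pts_list, pts_at, start_of_gop):
    """Choose the starting and ending PTS for the three request types (sketch)."""
    if request.kind == "real_time":
        latest_pts = track.entries[-1][0]  # latest PTS in the uploaded track info
        return start_of_gop(latest_pts), pts_list[-1]
    if request.kind == "from_time":        # play back from a first time onwards
        return pts_at(request.t1), pts_list[-1]
    # otherwise: between a second time and a third time
    return pts_at(request.t2), pts_at(request.t3)
```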
Optionally, the processor 901 is further configured to send the first target video stream of the first FOV to the first terminal device through a content delivery network (CDN).
The bus architecture may include any number of interconnected buses and bridges, with one or more processors, represented by the processor 901, and various circuits of memory, represented by the memory 902, linked together. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 903 may be a number of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 901 is responsible for managing the bus architecture and general processing, and the memory 902 may store data used by the processor 901 in performing operations.
Those skilled in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a computer program instructing the associated hardware, where the computer program includes instructions for performing some or all of the steps of the above methods; the computer program may be stored in a readable storage medium, which may be any form of storage medium.
As shown in fig. 10, an embodiment of the present application further provides a video processing apparatus, which is applied to a terminal device, and includes:
a fourth obtaining module 1001, configured to obtain a second target video stream;
a fifth obtaining module 1002, configured to analyze the second target video stream to obtain first FOV track information of the second target video stream;
an uploading module 1003, configured to upload the first FOV track information to a network device.
In the video processing apparatus according to the embodiment of the present application, the fifth obtaining module includes:
the analysis submodule is used for analyzing the second target video stream to obtain the PTS of each image frame;
the second acquisition submodule is used for acquiring FOV coordinate information corresponding to the PTS of each image frame;
and the third acquisition submodule is used for acquiring first FOV track information of the second target video stream according to the FOV coordinate information corresponding to the PTS of each image frame.
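A terminal-side sketch of these three submodules is given below, reusing the assumed FovTrackInfo structure from the earlier sketch; frames and fov_of are hypothetical stand-ins for the parsing and FOV lookup steps.

```python
def build_fov_track(stream_id, device_id, frames, fov_of):
    """Parse the PTS of each image frame and attach its FOV coordinates (sketch).

    frames is assumed to yield parsed frame metadata with a .pts field, and
    fov_of(pts) is an assumed lookup of the FOV coordinates at that PTS.
    """
    entries = [(frame.pts, fov_of(frame.pts)) for frame in frames]
    return FovTrackInfo(stream_id, device_id, entries)
```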
In the video processing apparatus according to the embodiment of the present application, the first FOV track information includes: an identification of the second target video stream, an identification of the second terminal device, and FOV coordinate information corresponding to the PTSs of N image frames, where N is the total number of image frames in the second target video stream.
In the video processing apparatus according to the embodiment of the application, the uploading module is configured to upload the first FOV track information to a network device according to a preset byte stream structure;
the preset byte stream structure includes data header information and a set of data body information, where the data header information includes an identification of the second target video stream and an identification of the second terminal device, and each piece of data body information includes the PTS of one image frame and the FOV coordinate information corresponding to that PTS.
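One possible serialization of this header-plus-body byte stream is sketched below. The field widths, delimiters, and byte order are illustrative assumptions; the patent specifies only that the header carries the two identifications and that each body entry carries one frame's PTS plus its FOV coordinate information.

```python
import struct


def pack_fov_track(track):
    """Serialize the first FOV track information into a byte stream (sketch)."""
    # Header: the two identifications, NUL-terminated (delimiter is assumed).
    header = (track.stream_id.encode("utf-8") + b"\x00"
              + track.device_id.encode("utf-8") + b"\x00")
    # Body: one entry per image frame, an 8-byte PTS plus four 32-bit floats
    # of FOV coordinate information (widths and byte order are assumed).
    body = bytearray()
    for pts, (x, y, w, h) in track.entries:
        body += struct.pack(">Q4f", pts, x, y, w, h)
    return header + bytes(body)
```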
It should be noted that this apparatus corresponds to the video processing method applied to the terminal device; all implementations described in the method embodiments are applicable to this apparatus embodiment and can achieve the same technical effects.
According to the video processing apparatus of this embodiment, the first FOV track information of the second target video stream is uploaded to the network device, and the network device can obtain the first target video stream of the first FOV according to the first FOV track information, so that the first terminal device can obtain the video stream of the field angle of the second terminal device. In this method, the second terminal device only needs to transmit the first FOV track information to the network device, without computing and transmitting the live video data stream itself, which effectively reduces the computing load on the VR terminal device.
As shown in fig. 11, an embodiment of the present application further provides a terminal device, where the terminal device is a second terminal device, and the terminal device includes: a transceiver 1104, a processor 1101, a memory 1103, and a computer program stored on the memory 1103 and executable on the processor 1101, where the processor 1101 implements the steps of the video processing method described above when executing the computer program. In particular, the transceiver 1104 is configured to obtain a second target video stream; the processor 1101 is configured to parse the second target video stream to obtain first FOV track information of the second target video stream; and the transceiver 1104 is configured to upload the first FOV track information to a network device.
Optionally, the processor 1101 is further configured to:
analyzing the second target video stream to obtain the PTS of each image frame;
acquiring FOV coordinate information corresponding to the PTS of each image frame;
and obtaining first FOV track information of the second target video stream according to the FOV coordinate information corresponding to the PTS of each image frame.
Optionally, the first FOV track information includes: an identification of the second target video stream, an identification of the second terminal device, and FOV coordinate information corresponding to the PTSs of N image frames, where N is the total number of image frames in the second target video stream.
Optionally, the transceiver 1104 is configured to upload the first FOV track information to a network device according to a preset byte stream structure;
the preset byte stream structure includes data header information and a set of data body information, where the data header information includes an identification of the second target video stream and an identification of the second terminal device, and each piece of data body information includes the PTS of one image frame and the FOV coordinate information corresponding to that PTS.
It is noted that, in FIG. 11, the bus architecture may include any number of interconnected buses and bridges, with one or more processors, represented by the processor 1101, and various circuits of memory, represented by the memory 1103, linked together. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface 1102 provides an interface. The transceiver 1104 may be a number of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. For different terminals, the user interface 1105 may also be an interface capable of connecting the desired devices, including but not limited to a keypad, a display, a speaker, a microphone, a joystick, and the like. The processor 1101 is responsible for managing the bus architecture and general processing, and the memory 1103 may store data used by the processor 1101 in performing operations.
Those skilled in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a computer program instructing the associated hardware, where the computer program includes instructions for performing some or all of the steps of the above methods; the computer program may be stored in a readable storage medium, which may be any form of storage medium.
In addition, a computer-readable storage medium is provided in a specific embodiment of the present invention, and a computer program is stored thereon, and when the computer program is executed by a processor, the steps in the video processing method are implemented, and the same technical effects can be achieved.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute some of the steps of the methods according to various embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
While the foregoing is directed to the preferred embodiment of the present invention, it will be appreciated by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (9)

1. A video processing method, applied to a network device, characterized by comprising the following steps:
receiving a first request sent by a first terminal device, wherein the first request is used for requesting to acquire a first target video stream of a first field angle FOV, and the first FOV is a field angle corresponding to a second terminal device;
receiving first FOV track information of a second target video stream sent by the second terminal device, wherein the first target video stream is at least part of the second target video stream, and the first FOV track information includes: an identification of the second target video stream, an identification of the second terminal device, and FOV coordinate information corresponding to display time stamps (PTS) of N image frames, where N is the total number of image frames in the second target video stream; and obtaining a first target video stream of the first FOV according to the first FOV track information;
transmitting a first target video stream of the first FOV to the first terminal device;
wherein the obtaining a first target video stream of the first FOV according to the first FOV track information includes:
determining a starting PTS of a first target video stream of the first FOV;
in the second target video stream, sequentially reading image group data corresponding to each image frame from the starting PTS until the ending PTS;
restoring the image data of each image frame in the image group data according to the PTS of each image frame in the second target video stream;
and processing the image data of each image frame according to the first FOV track information to obtain a first target video stream of the first FOV.
2. The video processing method according to claim 1, wherein before receiving the first request sent by the first terminal device, the method further comprises:
acquiring a second target video stream;
and acquiring each group of pictures (GOP) data in the second target video stream, a correspondence between each GOP data and the display time stamps (PTS) of the image frames in the second target video stream, and a correspondence between the PTS and the recording time Tr of each image frame in the second target video stream.
3. The video processing method according to claim 1,
in a case that the first request is used to request a first target video stream of the first FOV in real time, the starting PTS is the starting PTS of the latest GOP, the ending PTS is the last PTS in the second target video stream, and the latest GOP is the GOP corresponding to the latest PTS in the first FOV track information uploaded by the second terminal device;
or, in a case that the first request is used to request a target video stream of the first field angle FOV played from a first time, the starting PTS is the PTS corresponding to the first time in the second target video stream, and the ending PTS is the last PTS in the second target video stream;
or, in a case that the first request is used to request a target video stream of the first field angle FOV between a second time and a third time, the starting PTS is the PTS corresponding to the second time in the second target video stream, and the ending PTS is the PTS corresponding to the third time in the second target video stream.
4. The video processing method according to claim 1, wherein said transmitting the first target video stream of the first FOV to the first terminal device comprises:
and sending the first target video stream of the first FOV to the first terminal device through a Content Delivery Network (CDN).
5. A video processing method, applied to a terminal device, characterized by comprising the following steps:
acquiring a second target video stream;
analyzing the second target video stream to obtain first FOV track information of the second target video stream, wherein the first FOV track information includes: an identification of the second target video stream, an identification of a second terminal device, and FOV coordinate information corresponding to the display time stamps (PTS) of N image frames, where N is the total number of image frames in the second target video stream;
uploading the first FOV track information to a network device;
wherein the analyzing the second target video stream to obtain the first FOV track information of the second target video stream includes:
analyzing the second target video stream to obtain the PTS of each image frame;
acquiring FOV coordinate information corresponding to the PTS of each image frame;
and obtaining first FOV track information of the second target video stream according to the FOV coordinate information corresponding to the PTS of each image frame.
6. The video processing method of claim 5, wherein uploading the first FOV trajectory information to a network device comprises:
uploading the first FOV track information to network equipment according to a preset byte stream structure;
the preset byte stream structure includes data header information and a set of data body information, where the data header information includes an identification of the second target video stream and an identification of the second terminal device, and each piece of data body information includes the PTS of one image frame and the FOV coordinate information corresponding to that PTS.
7. A network device, comprising: processor, memory and a computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the video processing method according to any one of claims 1 to 4.
8. A terminal device, comprising: processor, memory and a computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the video processing method according to any one of claims 5 to 6.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the video processing method according to any one of claims 1 to 4 or the steps of the video processing method according to any one of claims 5 to 6.
CN202011002978.3A 2020-09-22 2020-09-22 Video processing method, communication device and readable storage medium Active CN112153401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011002978.3A CN112153401B (en) 2020-09-22 2020-09-22 Video processing method, communication device and readable storage medium

Publications (2)

Publication Number Publication Date
CN112153401A CN112153401A (en) 2020-12-29
CN112153401B (en) 2022-09-06

Family

ID=73896176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011002978.3A Active CN112153401B (en) 2020-09-22 2020-09-22 Video processing method, communication device and readable storage medium

Country Status (1)

Country Link
CN (1) CN112153401B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115278193A (en) * 2021-04-30 2022-11-01 中国移动通信集团河北有限公司 Panoramic video distribution method, device, equipment and computer storage medium
CN113992976B (en) * 2021-10-19 2023-10-20 咪咕视讯科技有限公司 Video playing method, device, equipment and computer storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106572359A (en) * 2016-10-27 2017-04-19 乐视控股(北京)有限公司 Method and device for synchronously playing panoramic video on multiple terminals
CN110149542A (en) * 2018-02-13 2019-08-20 华为技术有限公司 Transfer control method
CN110163943A (en) * 2018-11-21 2019-08-23 深圳市腾讯信息技术有限公司 The rendering method and device of image, storage medium, electronic device
WO2020043104A1 (en) * 2018-08-30 2020-03-05 华为技术有限公司 Video screen projection method, device, computer equipment and storage medium
CN111432223A (en) * 2020-04-21 2020-07-17 烽火通信科技股份有限公司 Method, terminal and system for realizing multi-view video transmission and playing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3535644B1 (en) * 2016-11-04 2023-02-22 Koninklijke KPN N.V. Streaming virtual reality video
CN110784740A (en) * 2019-11-25 2020-02-11 北京三体云时代科技有限公司 Video processing method, device, server and readable storage medium


Also Published As

Publication number Publication date
CN112153401A (en) 2020-12-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant