CN110662119A - Video splicing method and device

Info

Publication number: CN110662119A
Application number: CN201810714820.5A
Authority: CN (China)
Prior art keywords: video, video stream, spliced, field angle, network
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 薛永革
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by: Huawei Technologies Co Ltd
Priority: CN201810714820.5A
Related application: PCT/CN2019/093651 (published as WO2020001610A1)
Publication: CN110662119A

Classifications

    (All classifications are under H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD].)

    • H04N21/816 - Monomedia components thereof involving special video data, e.g. 3D video
    • H04N21/218 - Source of audio or video content, e.g. local disk arrays
    • H04N21/21805 - Source of audio or video content enabling multiple viewpoints, e.g. using a plurality of cameras
    • H04N21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424 - Processing of video elementary streams involving splicing one content stream with another, e.g. for inserting or substituting an advertisement
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; client middleware
    • H04N21/4302 - Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 - Processing of video elementary streams involving splicing one content stream with another, e.g. for substituting a video clip
    • H04N21/81 - Monomedia components thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Information Transfer Between Computers (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a video splicing method and apparatus. The method includes: a capture device sends video splicing information to a network device and, after the network device has received the video splicing information, sends multiple un-spliced video streams to the network device, so that the network device splices the multiple un-spliced video streams according to the video splicing information. With this method, the capture device does not need to splice the multiple video streams itself; the network device performs the splicing, which effectively improves splicing efficiency and thereby reduces transmission delay.

Description

Video splicing method and device
Technical Field
The present application relates to the field of communications technologies, and in particular, to a video splicing method and apparatus.
Background
Virtual reality (VR) technology is a computer simulation technology for creating and experiencing a virtual world: a computer generates a simulated environment into which the user is immersed. VR technology mainly covers the simulated environment, perception, natural skills, and sensing devices. The simulated environment is a real-time, dynamic, three-dimensional realistic image generated by a computer. Perception means that an ideal VR system should provide all the senses a person has: in addition to the visual perception generated by computer graphics, it includes hearing, touch, force feedback, and motion, and even smell and taste. Natural skills refer to a person's head rotation, eye movement, gestures, and other human actions; the computer processes data corresponding to the participant's actions, responds to the user's input in real time, and feeds the results back to the user's senses. Sensing devices are three-dimensional interaction devices that collect the user's motion and feed it back to the computer simulation system as input.
Visual perception plays an extremely important role in VR, and the most basic VR systems address virtual vision first. A basic VR system therefore needs to achieve three things: first, block the person's original visual input; second, fill the entire field of vision with the light of the virtual image; third, interact with the image well enough to convince the brain.
Panoramic video extends traditional video technology to achieve VR immersion. Panoramic video, also called 360-degree video, is obtained by shooting an environment with multiple cameras to produce multiple video streams and then combining those streams through synchronization, splicing, and related techniques. Unlike traditional video, in which the viewer passively watches the picture within the field of view (FOV) chosen by the photographer, panoramic video lets the user actively look in any direction (up, down, left, or right) around the shooting point, giving a true sense of presence without the limits of time, space, or region.
However, panoramic video places high demands on transmission delay in real-time communication scenarios, and current streaming media technology cannot meet these delay requirements.
Disclosure of Invention
In view of this, embodiments of the present application provide a video splicing method and apparatus for reducing the transmission delay of panoramic video.
In a first aspect, an embodiment of the present application provides a video splicing method, including:
the capture device sends video splicing information to a network device; and after determining that the network device has received the video splicing information, sends multiple un-spliced video streams to the network device, where the video splicing information is used to splice the multiple un-spliced video streams.
With this method, because the capture device sends the video splicing information to the network device, the capture device does not need to splice the multiple video streams in the subsequent process; the network device splices them according to the video splicing information.
In one possible design, the video splicing information includes identifiers of the multiple un-spliced video streams, synchronization information among the multiple video streams, and the camera calibration parameters corresponding to each of the video streams.
In one possible design, the capture device sends the video splicing information to the network device through a terminal device.
In one possible design, the capture device sends the multiple un-spliced video streams to the network device in one of the following ways:
the capture device sends the multiple un-spliced video streams to the network device through the terminal device; or
the capture device receives address information of the network device from the terminal device and sends the multiple un-spliced video streams to the network device according to that address information.
In a second aspect, an embodiment of the present application provides a video splicing method, where the method includes:
the network device receives video splicing information sent by the capture device;
and the network device receives the multiple un-spliced video streams sent by the capture device and splices them according to the video splicing information.
With this method, the capture device does not need to splice the multiple video streams; the network device splices them instead. Because the processing capability of the network device is stronger than that of the capture device, this effectively improves splicing efficiency and thereby reduces transmission delay.
In one possible design, the method further includes:
the network device receives a request message from the presentation device, where the request message indicates the user's current field of view;
the network device processes the spliced video stream according to the user's current field of view to obtain a first video stream, and sends the first video stream to the presentation device, where the field of view corresponding to the first video stream is the user's current field of view.
Here, the request message may be a specific RTCP message extended by the embodiments of the present application. Because the user's field of view may change frequently, the presentation device sends the request message to the network device so that the network device can promptly deliver the video stream corresponding to the changed field of view, improving user experience. It should be understood that the presentation device is not limited to sending the request message only when the field of view changes; other scenarios are also possible.
In one possible design, the method further includes:
the network device obtains a second video stream from the spliced video stream and sends the second video stream to the presentation device, where the field of view corresponding to the second video stream is larger than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream.
In one possible design, the request message further includes an identifier of the first video stream; the identifier of the first video stream is a synchronization source (SSRC) identifier.
In one possible design, the network device is a media server deployed in an edge data center near the presentation device.
This effectively reduces both the transmission delay of the request message sent by the presentation device to the network device and the transmission delay of the updated video stream that the network device returns in response, further improving user experience.
In a third aspect, an embodiment of the present application provides a video splicing method, where the method includes:
the network device receives a request message sent by the presentation device, where the request message indicates the user's current field of view;
the network device obtains a spliced video stream and processes it according to the user's current field of view to obtain a first video stream, where the field of view corresponding to the first video stream is the user's current field of view;
the network device sends the first video stream to the presentation device.
Here, the request message may be a specific RTCP message extended by the embodiments of the present application. Because the user's field of view may change frequently, the presentation device sends the request message to the network device so that the network device can promptly deliver the video stream corresponding to the changed field of view, improving user experience. It should be understood that the presentation device is not limited to sending the request message only when the field of view changes; other scenarios are also possible.
In one possible design, the method further includes:
the network device obtains a second video stream from the spliced video stream and sends the second video stream to the presentation device, where the field of view corresponding to the second video stream is larger than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream.
In one possible design, the request message further includes an identifier of the first video stream; the identifier of the first video stream is a synchronization source (SSRC) identifier.
In a fourth aspect, an embodiment of the present application provides a video splicing method, where the method includes:
the presentation device sends a request message to a network device, where the request message indicates the user's current field of view;
and the presentation device receives a first video stream returned by the network device in response to the request message and plays the first video stream, where the field of view corresponding to the first video stream is the user's current field of view.
In this way, by sending the request message, the presentation device enables the network device to promptly deliver the video stream corresponding to the user's current field of view, improving user experience.
In one possible design, before the presentation device sends the request message to the network device, the method further includes:
the presentation device determines that the user's field of view has changed.
In one possible design, the method further includes:
the presentation device receives a second video stream sent by the network device, where the field of view corresponding to the second video stream is larger than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream;
after the presentation device determines that the user's field of view has changed, the method further includes:
the presentation device processes the second video stream according to the user's current field of view to obtain a third video stream, where the field of view corresponding to the third video stream is the user's current field of view;
and the presentation device plays the third video stream.
In this way, after determining that the user's field of view has changed, the presentation device can send a request message to the network device and, at the same time, process the second video stream according to the current field of view to obtain and play a third video stream whose corresponding field of view is the user's current field of view; later, once it receives the video stream that the network device returns in response to the request message, it starts playing that stream. Thus, after the field of view changes, the presentation device first plays a lower-quality third video stream derived from the second video stream, responding to the change in time; after receiving the request message, the network device promptly adjusts the video stream to the user's current field of view and feeds it back for playback. Because of the persistence of human vision, the user may not perceive, or only barely perceive, this brief delay while watching the video, so user experience is effectively improved.
In a fifth aspect, an embodiment of the present application provides a video splicing method, where the method includes:
the terminal device receives video splicing information sent by the capture device;
and the terminal device sends the video splicing information to a network device, where the video splicing information is used to splice multiple un-spliced video streams.
In one possible design, the video splicing information includes identifiers of the multiple un-spliced video streams, synchronization information among the multiple video streams, and the camera calibration parameters corresponding to each of the video streams.
In one possible design, after the terminal device sends the video splicing information to the network device, the method further includes:
the terminal device receives the multiple un-spliced video streams sent by the capture device and sends them to the network device.
In a sixth aspect, an embodiment of the present application provides an apparatus. The apparatus may be a capture device, a network device, a presentation device, or a terminal device, or may be a chip disposed in one of those devices. The apparatus has the functionality to implement the method described in any of the possible designs of the first to fifth aspects above. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the functions.
In a seventh aspect, an embodiment of the present application provides an apparatus, including: a processor and a memory; the memory is configured to store computer executable instructions that, when executed by the processor, cause the apparatus to perform the method as set forth in any of the various possible designs of the first to fifth aspects.
In an eighth aspect, the present embodiments also provide a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method according to any one of the possible designs of the first aspect to the fifth aspect.
In a ninth aspect, the present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method as set forth in the various possible designs of any one of the first to fifth aspects above.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
FIG. 1a is a schematic diagram of a system architecture applicable to the embodiments of the present application;
FIG. 1b is a schematic diagram of another system architecture according to an embodiment of the present application;
FIG. 1c is a schematic diagram of another system architecture according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a video splicing method according to Embodiment 1 of the present application;
FIG. 3 is a schematic flowchart of a video splicing method according to Embodiment 2 of the present application;
FIG. 4 is a schematic flowchart of a video splicing method according to Embodiment 3 of the present application;
FIG. 5 is a schematic flowchart of a method for updating a video stream according to Embodiment 4 of the present application;
FIG. 6 is a schematic flowchart of a method for updating a video stream according to Embodiment 5 of the present application;
FIG. 7 is a schematic structural diagram of an apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of another apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
In streaming media technology, transmission is usually based on the hypertext transfer protocol (HTTP), for example HTTP live streaming (HLS), with a content delivery network (CDN) distributing content as file fragments. Producing and fragmenting the files for the CDN typically adds a delay of 5-10 s, so each additional layer of CDN delivery adds another 5-10 s. As a result, large-scale panoramic video live broadcast currently incurs a delay of 10-15 s.
However, the end-to-end delay of real-time communication (RTC) needs to be less than 400 ms to be meaningful, and from the perspective of user experience it should ideally be below 300 ms, or even below 200 ms. From a low-latency standpoint, therefore, real-time communication applications for panoramic video cannot be built on streaming media technology.
To reduce the delay of streaming media as much as possible, the live broadcast field has introduced stream pushing over the real-time messaging protocol (RTMP), while also flattening the CDN hierarchy and reducing transcoding operations on the CDN (such as multi-bitrate transcoding). In this way the latency can be brought down to 2-5 s (without a CDN, or with only one CDN layer). However, because video inside RTMP is encapsulated in the Flash Video (FLV) format, the encapsulation still causes second-level delay even though the fragments are smaller than HLS file fragments.
Based on this, the embodiments of the present application provide a video splicing method for reducing the transmission delay of panoramic video.
FIG. 1a is a schematic diagram of a system architecture applicable to the embodiments of the present application. As shown in FIG. 1a, the system architecture includes: a capture device 101, a presentation device 102, and a network device 103.
The capture device 101 and the presentation device 102 may have a network access function. Specifically, the capture device 101 may establish a communication connection with the network device 103 through network access, such as a wireless or wired network connection, without specific limitation; likewise, the presentation device 102 may establish a communication connection with the network device 103 through network access. In this way, the capture device 101 and the presentation device 102 can exchange signaling (i.e., signaling plane communication) via the network device 103.
The capture device 101 may also have a function of capturing streaming media data (video data and/or audio data are collectively referred to as streaming media data); for example, a panoramic camera may be disposed in the capture device 101 to capture video data. Correspondingly, the presentation device 102 may have a function of playing audio and/or video for the user; for example, a VR head-mounted display may be provided in the presentation device 102 to play panoramic video for the user. In this way, the capture device 101 and the presentation device 102 can transfer streaming media data (i.e., media plane communication) via the network device 103.
FIG. 1b is a schematic diagram of another system architecture according to an embodiment of the present application. As shown in FIG. 1b, the system architecture includes: a capture device 101, a presentation device 102, a first media server 1031, a core network element 1032, and a second media server 1033.
The capture device 101 and the presentation device 102 may have a network access function. Specifically, the capture device 101 may establish a communication connection with the first media server 1031 through network access, such as a wireless or wired network connection, without specific limitation; likewise, the presentation device 102 may establish a communication connection with the second media server 1033 through network access. In this way, the capture device 101 and the presentation device 102 can exchange signaling (i.e., signaling plane communication) via the first media server 1031, the core network element 1032, and the second media server 1033.
The capture device 101 may also have a function of capturing streaming media data; correspondingly, the presentation device 102 may have the capability to play audio and/or video for the user. A media server is the core system of a streaming media application and the key platform through which operators provide video services to users; its main functions are to cache, schedule, and transmit streaming media data. Specifically, the main function of the media server on the capture side (such as the first media server 1031) is to obtain streaming media data from the capture device 101 via a streaming media protocol and transmit it to the media server on the presentation side; the main function of the media server on the presentation side (such as the second media server 1033) is to receive streaming media data from the media server on the capture side via a streaming media protocol and transmit it to the presentation device 102 for playback. That is, the capture device 101 and the presentation device 102 can transfer streaming media data (i.e., media plane communication) via the first media server 1031 and the second media server 1033.
The core network element 1032 is mainly responsible for signaling control during a call session. In this application, the core network element may receive signaling from the first media server 1031 and forward it to the second media server 1033. The core network element 1032 may be a third-party application control platform or the operator's own device.
FIG. 1c is a schematic diagram of another system architecture according to an embodiment of the present application. As shown in FIG. 1c, the system architecture includes: a capture device 101, a first terminal device 104, a presentation device 102, a second terminal device 105, a first media server 1031, a core network element 1032, and a second media server 1033.
The capture device 101 may establish a communication connection with the first terminal device 104, such as a wired connection or a wireless-fidelity (Wi-Fi) connection; likewise, the presentation device 102 may establish a communication connection, such as a wired or Wi-Fi connection, with the second terminal device 105. The first terminal device 104 and the second terminal device 105 have a network access function. Specifically, the first terminal device 104 may establish a communication connection with the first media server 1031 through network access, such as a wireless or wired network connection, without specific limitation; likewise, the second terminal device 105 may establish a communication connection with the second media server 1033 through network access. In this way, the capture device 101 can access the network through the first terminal device 104 and the presentation device 102 can access the network through the second terminal device 105, and they can then exchange signaling (i.e., signaling plane communication) via the first media server 1031, the core network element 1032, and the second media server 1033. The core network element is mainly configured to forward signaling between the first media server 1031 and the second media server 1033. Note that this application does not limit the specific connection manner between the devices.
The capture device 101 may have a function of capturing streaming media data; correspondingly, the presentation device 102 may have the capability to play audio and/or video for the user. The capture device 101 and the presentation device 102 can transfer streaming media data (i.e., media plane communication) via the first media server 1031 and the second media server 1033.
Further, the capture device 101 may be a panoramic camera, and the presentation device 102 may be a VR head-mounted display.
The core network element 1032 is mainly responsible for signaling control during a call session. In this application, the core network element may receive signaling from the first media server 1031 and forward it to the second media server 1033. The core network element 1032 may be a third-party application control platform or the operator's own device.
A terminal device (for example, the first terminal device 104 or the second terminal device 105) in the embodiments of the present application is a device with a wireless transceiving function, and may specifically be a mobile phone, a tablet computer (Pad), a computer with a wireless transceiving function, or a wireless terminal in industrial control, self driving, remote medical, smart grid, transportation safety, smart city, smart home, and so on. The embodiments of the present application do not limit the application scenario. A terminal device may also sometimes be referred to as user equipment (UE), an access terminal device, a UE unit, a UE station, a mobile station, a remote terminal device, a mobile device, a UE terminal device, a wireless communication device, a UE agent, a UE apparatus, or the like.
For the system architectures illustrated in FIG. 1a, FIG. 1b, and FIG. 1c, note that: (1) the network device 103 in FIG. 1a may be a collective term for the media server 1031, the media server 1033, and the core network element 1032 illustrated in FIG. 1b and FIG. 1c; (2) in the system architectures illustrated in FIG. 1a and FIG. 1b, the capture device 101 and the presentation device 102 have a network access function, whereas in the system architecture illustrated in FIG. 1c they may lack one and instead access the network through the first terminal device 104 and the second terminal device 105, respectively.
The communication systems to which the system architectures of FIG. 1a, FIG. 1b, and FIG. 1c apply include, but are not limited to, the 5G New Radio (NR) communication system and the IP multimedia subsystem (IMS).
Taking the system architecture illustrated in FIG. 1c as an example: if the architecture is applied to a 5G NR system, the media server may specifically be a multimedia function (MMF) network element or a multimedia resource function processor (MRFP), and the core network element may specifically be an application function (AF) network element; if the architecture is applied to the IMS, the media server may specifically be a session border controller (SBC), and the core network element may specifically be a call session control function (CSCF) network element.
To facilitate understanding, a few concepts used in the embodiments of the present application are briefly described below.
(1) According to 3GPP 26.919, section 5.4.5, after a panoramic camera captures multiple video streams, the streams need to be spliced into a panoramic video stream, which is then transmitted to the VR head-mounted display. In-camera splicing is a key technical point for panoramic cameras, but splicing the panoramic video stream inside the camera introduces significant delay (current engineering analysis puts the in-camera splicing delay at about 200 ms). While 200 ms is negligible against the second-level end-to-end latency of existing streaming scenarios, in real-time communication every 50 ms of latency reduction requires a major technical advance. The embodiments of the present application therefore introduce real-time splicing on the network side: after the panoramic camera captures multiple video streams, it may skip the splicing step and transmit the captured streams directly to the network device, which performs the splicing. Considering that cost factors limit the performance of the capture device (or panoramic camera), while a network device can serve multiple capture devices and has far stronger performance, splicing on the network device markedly improves splicing efficiency and reduces splicing delay, thereby reducing the transmission delay of real-time communication.
Further, in the above system architectures, the transmission protocol used for signaling plane communication between the devices may be the session initiation protocol (SIP), and the transmission protocol used for media plane communication may be the real-time transport protocol (RTP) / real-time transport control protocol (RTCP); in addition, the capture device illustrated in FIG. 1c may use the real-time streaming protocol (RTSP) to interwork with the first terminal device. In other words, video data is transmitted over RTP, video quality is controlled through RTCP, and video control (e.g., fast forward, rewind) is provided through RTSP. As a specific implementation, the embodiments of the present application realize network-side real-time splicing by extending the communication protocols used between the devices (e.g., adding signaling or extending fields in existing signaling).
(2) While the network device transmits the video stream corresponding to the user's field of view to the presentation device, the user's field of view may change (for example, when the user turns their head while watching video with a VR head-mounted display). If the video stream corresponding to the changed field of view is not delivered in time, user experience suffers. Therefore, in the embodiments of the present application, after determining that the user's field of view has changed, the presentation device may send a specific RTCP message (a message extended by the embodiments of the present application) to the network device to indicate the user's current field of view (i.e., the changed field of view). The network device can then process the spliced video stream according to the current field of view, obtain the corresponding video stream, and transmit it to the presentation device. In this way, the video stream corresponding to the changed field of view is delivered promptly, improving user experience.
Note that the methods described in (1) and (2) may be applied separately or in combination, without specific limitation; the following embodiments mainly describe the combined case.
The following describes the video splicing method provided by the embodiments of the present application with reference to specific embodiments.
Embodiment 1
Embodiment 1 is described mainly based on the system architecture illustrated in FIG. 1a.
FIG. 2 is a schematic flowchart of a video splicing method according to Embodiment 1 of the present application. As shown in FIG. 2, the method includes:
Step 201: The capture device sends video splicing information to the network device.
Here, the video splicing information is used to splice multiple un-spliced video streams. In one example, the video splicing information may include identifiers of the un-spliced video streams, synchronization information among the video streams, and the camera calibration parameters corresponding to each video stream.
For example, if the camera used by the capture device to capture video streams is a 4-lens camera, the identifiers of the multiple video streams may include the identifiers of the 4 video streams captured by the 4 lenses, respectively: video stream 1 (11111), video stream 2 (22222), video stream 3 (33333), and video stream 4 (44444). If the 4 video streams are subsequently spliced into a panorama, the synchronization information among the multiple video streams may indicate that the 4 streams need to be synchronized; the embodiments of the present application do not limit the specific content of the synchronization information or the synchronization procedure. The camera calibration parameters corresponding to each video stream may include one or more variables; Table 1 gives examples of variables that the camera calibration parameters may include.
Table 1: Examples of variables that the camera calibration parameters may include

Variable number | Variable | Expression
1 | Image width and height | width, height
2 | Crop circle information | cropx, cropy, cropw, croph
3 | Field of view | v
4 | Attitude information (three rotation angles) | y, r, p
5 | Translation amount | d, e
6 | Trimming amount | g, t
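For illustration, the video splicing information described above can be modeled as a simple data structure. The sketch below is an assumption of this description, not the patent's actual encoding; the field names mirror Table 1 and the example stream identifiers given earlier.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class CameraCalibration:
    # Variables from Table 1; names are illustrative.
    width: int                       # image width
    height: int                      # image height
    crop: Tuple[int, int, int, int]  # crop circle: cropx, cropy, cropw, croph
    fov_v: float                     # field of view (v)
    rotation: Tuple[float, float, float]  # attitude: rotation angles y, r, p
    translation: Tuple[float, float]      # translation amount d, e
    trim: Tuple[float, float]             # trimming amount g, t

@dataclass
class VideoSplicingInfo:
    stream_ids: List[str]         # identifiers of the un-spliced streams
    sync_groups: List[List[str]]  # streams that must be synchronized together
    calibration: Dict[str, CameraCalibration]  # per-stream calibration

# Example mirroring the 4-lens camera described above.
ids = ["11111", "22222", "33333", "44444"]
info = VideoSplicingInfo(
    stream_ids=ids,
    sync_groups=[ids],  # all 4 streams are spliced into one panorama
    calibration={i: CameraCalibration(3840, 2160, (0, 0, 2160, 2160), 195.0,
                                      (0.0, 0.0, 0.0), (0.0, 0.0), (0.0, 0.0))
                 for i in ids},
)
```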
In a possible implementation, the capture device may send a first call request message (specifically, an INVITE message) to the network device, where the first call request message carries the video splicing information. Because the existing INVITE message does not support carrying video splicing information, the embodiments of the present application extend the session description protocol (SDP) so that the video splicing information can be sent to the network device in the INVITE (SIP) message.
Any piece of the video splicing information may be carried in several possible ways. For example, Table 2 illustrates one possible extension for the identifiers of the multiple un-spliced video streams and the synchronization information among them: the extended SDP multi-stream field carries the identifiers of the un-spliced video streams, and the extended SDP stream synchronization field carries the synchronization information among the streams. Note that Table 2 shows the stream synchronization field only for the case where 2 video streams need to be synchronized; the cases of 4 or more streams to be synchronized can be handled by analogy and are not repeated here.
Table 2: Field examples for SDP extensions
Further, in the embodiments of the present application, the network device also needs to support the extended SDP, so that it can parse the extended SDP and obtain the video splicing information after receiving the INVITE message from the capture device.
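Because Table 2 is reproduced only as an image in the published text, the exact attribute syntax is not available here. The following sketch merely illustrates what an SDP offer carrying the splicing information might look like; the a=multi-stream and a=stream-synchronization attribute names and value formats are assumptions, not the patent's defined fields.

```python
def build_extended_sdp(stream_ids, sync_pairs):
    """Builds an illustrative SDP body carrying video splicing information.

    The a=multi-stream and a=stream-synchronization attributes are
    hypothetical stand-ins for the extended fields of Table 2.
    """
    lines = [
        "v=0",
        "o=capture 2890844526 2890844526 IN IP4 192.0.2.10",
        "s=panoramic-capture",
        "t=0 0",
        "m=video 49170 RTP/AVP 96",
        "a=rtpmap:96 H264/90000",
        # identifiers of the multiple un-spliced video streams
        "a=multi-stream:" + ",".join(stream_ids),
    ]
    # synchronization information between streams; Table 2 shows the
    # two-stream case, larger groups follow by analogy
    for first, second in sync_pairs:
        lines.append(f"a=stream-synchronization:{first} {second}")
    return "\r\n".join(lines) + "\r\n"

print(build_extended_sdp(["11111", "22222", "33333", "44444"],
                         [("11111", "22222"), ("33333", "44444")]))
```

On the receiving side, the network device would need a matching parser for these attributes, which is what "extending the SDP" on the network device amounts to.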
Step 202: After receiving the video splicing information sent by the capture device, the network device sends video playing information to the presentation device. The video playing information may indicate the format of the video stream that the network device transmits to the presentation device, for example the omnidirectional media format (OMAF) or a tiled VR format.
Here, the network device may send a second call request message to the presentation device, where the second call request message includes the video playing information.
Step 203: The presentation device receives the video playing information and returns a first response message (for example, a 183 message) to the network device. The first response message may carry the media plane address information of the presentation device to enable subsequent media plane communication between the network device and the presentation device.
Step 204: After receiving the first response message, the network device sends a second response message (for example, a 183 message) to the capture device. The second response message may carry the media plane address information of the network device to enable subsequent media plane communication between the capture device and the network device.
Thus, through steps 201 to 204, the capture device and the presentation device complete signaling plane communication and establish a communication connection. Note that steps 201 to 204 only sketch the signaling plane flow; a specific implementation may involve other steps, which the embodiments of the present application do not limit. In a possible implementation, the signaling plane flow may be the same as in the prior art, with only the content carried in the transmitted signaling differing, for example the video splicing information carried in the first call request message sent by the capture device to the network device. The embodiments of the present application therefore only need to extend the communication protocols between the devices and fit into the existing communication procedure, which gives them strong applicability and makes them easy to implement.
Step 205: After determining that the network device has received the video splicing information, the capture device sends the multiple un-spliced video streams to the network device.
Here, the capture device may determine in various ways that the network device has received the video splicing information. For example, if the capture device sent the video splicing information in the first call request message, it may determine that the network device has received it after receiving the corresponding call response message (200 OK) returned by the network device.
Further, the capture device may send the multiple un-spliced video streams to the network device according to the media plane address information of the network device carried in the second response message.
Step 206: The network device receives the multiple un-spliced video streams, splices them according to the video splicing information, and processes the spliced video stream according to the user's current field of view to obtain a first video stream, where the field of view corresponding to the first video stream is the user's current field of view.
Optionally, the network device may also obtain a second video stream from the spliced video stream, where the field of view corresponding to the second video stream is larger than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream. In one example, the second video stream is a panoramic video stream.
Note that the network device may obtain the first video stream from the spliced panoramic video stream in various ways; for example, the network device may crop the panoramic video stream according to the user's current field of view to obtain the video stream corresponding to that field of view (i.e., the first video stream). The specific implementation is not limited. Likewise, the network device may obtain the second video stream from the spliced video stream in various ways, without specific limitation.
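As one concrete, non-limiting illustration of the cropping mentioned above: if the spliced panorama uses an equirectangular projection, the pixel region for the user's current field of view can be computed directly from the azimuth and elevation ranges. The projection choice and the plain rectangular crop are assumptions of this sketch; a real implementation would typically reproject the viewport rather than crop.

```python
def fov_crop_rect(pano_w, pano_h, center_az, center_el, az_range, el_range):
    """Maps a field of view (in degrees) to a pixel rectangle in an
    equirectangular panorama of size pano_w x pano_h.

    Azimuth spans [-180, 180) across the width and elevation [-90, 90]
    across the height; horizontal wrap-around is ignored for simplicity.
    Returns (x, y, w, h).
    """
    px_per_deg_x = pano_w / 360.0
    px_per_deg_y = pano_h / 180.0
    x = (center_az - az_range / 2.0 + 180.0) * px_per_deg_x
    y = (90.0 - (center_el + el_range / 2.0)) * px_per_deg_y
    return (int(x), int(y),
            int(az_range * px_per_deg_x), int(el_range * px_per_deg_y))

# A 90x90 degree viewport looking straight ahead in a 3840x1920 panorama:
print(fov_crop_rect(3840, 1920, center_az=0.0, center_el=0.0,
                    az_range=90.0, el_range=90.0))   # (1440, 480, 960, 960)
```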
Step 207: The network device sends the first video stream to the presentation device.
Optionally, if the network device also generated a second video stream in step 206, it may send the second video stream to the presentation device at the same time or separately. Further, the first video stream may carry the identifier of the first video stream, and the second video stream may carry the identifier of the second video stream, so that after receiving both streams the presentation device can distinguish them by their identifiers. The identifiers of the first and second video streams may be represented using different synchronization source (SSRC) fields; that is, both identifiers may be SSRC identifiers.
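A minimal sketch of how the presentation device could separate the two streams by SSRC, assuming the SSRC values were signaled in advance (for example in the SDP exchange); the RTP fixed header carries the SSRC in bytes 8-11.

```python
import struct

FIRST_STREAM_SSRC = 0x0000199F   # example values only; in practice the
SECOND_STREAM_SSRC = 0x00002F44  # SSRCs are signaled, e.g. via SDP

def route_rtp_packet(packet: bytes) -> str:
    """Routes an incoming RTP packet to a decoder queue by its SSRC."""
    if len(packet) < 12:
        raise ValueError("too short to be an RTP packet")
    ssrc = struct.unpack_from("!I", packet, 8)[0]
    if ssrc == FIRST_STREAM_SSRC:
        return "fov-stream-decoder"   # first video stream (user's current FOV)
    if ssrc == SECOND_STREAM_SSRC:
        return "panorama-decoder"     # second, lower-quality panoramic stream
    return "discard"
```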
In the embodiments of the present application, the network device may obtain the user's current field of view in various ways before processing the spliced video stream according to it. Two scenarios are described below.
Scenario 1: during media plane communication
In one possible implementation, the presentation device monitors the user's field of view in real time and, after determining that it has changed, sends a request message (whose name may be defined as Refresh FOV) to the network device, where the request message indicates the user's current field of view. The network device thus obtains the user's current field of view upon receiving the request message.
Further, the request message may be an RTCP message and may include the user's current field-of-view information. In one example, this information may include the center azimuth corresponding to the user's current field of view, the center elevation, the azimuth range, the elevation range, and the center tilt angle, which the embodiments of the present application do not limit. The request message may further include the identifier of the first video stream, so that after receiving the request message sent by the presentation device, the network device can determine from the carried synchronization source identifier that the video stream to be updated is the first video stream.
Table 3 shows an example of the key fields of the request message.
Table 3: key field examples for request messages
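Since Table 3 appears only as an image in the published text, the exact packet layout is not reproducible here. The sketch below shows one plausible way to encode the Refresh FOV request as an RTCP APP packet (payload type 204); the four-character name "RFOV", the field order, and the 16.16 fixed-point angle encoding are all assumptions of this sketch.

```python
import struct

RTCP_APP = 204  # RTCP APP packet type

def build_refresh_fov(sender_ssrc, target_ssrc,
                      center_az, center_el, az_range, el_range, tilt):
    """Encodes a hypothetical Refresh FOV request as an RTCP APP packet.

    Angles are in degrees, carried as 16.16 fixed point; target_ssrc is
    the SSRC identifier of the first video stream to be updated.
    """
    def fx(deg):  # degrees -> unsigned 16.16 fixed point
        return int(round(deg * 65536)) & 0xFFFFFFFF

    body = b"RFOV" + struct.pack("!6I", target_ssrc, fx(center_az),
                                 fx(center_el), fx(az_range),
                                 fx(el_range), fx(tilt))
    length_words = (8 + len(body)) // 4 - 1  # RTCP length in 32-bit words - 1
    header = struct.pack("!BBHI", 0x80, RTCP_APP, length_words, sender_ssrc)
    return header + body

pkt = build_refresh_fov(0x1234, 0x199F, center_az=30.0, center_el=-10.0,
                        az_range=90.0, el_range=90.0, tilt=0.0)
print(len(pkt), "bytes")  # 36 bytes
```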
Note that the user's field of view may change many times during media plane communication; each time it changes, the presentation device may send the request message to the network device to indicate the user's current field of view.
Scenario 2: initial stage of media plane communication
A default field of view is preset in the network device. After splicing the multiple un-spliced video streams, the network device can obtain a low-definition panoramic video stream (corresponding to the second video stream) from the spliced stream and process the spliced stream based on the default field of view to obtain a fourth video stream (corresponding to the first video stream), where the field of view corresponding to the fourth video stream is the default field of view; the network device then sends the low-definition panoramic video stream and the fourth video stream to the presentation device. Correspondingly, after receiving them, the presentation device may send a request message to the network device if it determines that the user's current field of view differs from the default field of view, where the request message indicates the user's current field of view. The network device thus obtains the user's field of view upon receiving the request message. In other possible implementations, the presentation device may also actively send the request message to the network device during the initial stage of media plane communication.
The key fields of the request message in Scenario 2 may be the same as those in Scenario 1 and are not repeated here.
In step 208, the rendering device receives the first video stream and plays the first video stream.
Here, if the network device sends the first video stream and the second video stream to the presentation device in step 207, the presentation device may receive the first video stream and the second video stream accordingly, and identify the first video stream according to the identifier of the first video stream and the identifier of the second video stream, so as to play the first video stream.
According to the description, the acquisition equipment sends the video splicing information to the network equipment, so that the acquisition equipment does not need to splice a plurality of video streams in the subsequent process, and the network equipment splices the plurality of video streams according to the video splicing information.
In the embodiment of the application, in the situation where the network device sends the first video stream and the second video stream to the presentation device, after determining that the field angle of the user has changed, the presentation device may send a request message to the network device and, in parallel, process the second video stream according to the current field angle of the user to obtain a third video stream, where the field angle corresponding to the third video stream is the current field angle of the user, and then play the third video stream; subsequently, after receiving the video stream corresponding to the current field angle of the user that the network device returns according to the request message, the presentation device starts to play that video stream. In this way, after the field angle of the user changes, the presentation device obtains a third video stream of lower video quality from the second video stream and plays it, so as to respond to the change of field angle in time; after receiving the request message, the network device can update the video stream corresponding to the current field angle of the user in time and feed it back to the presentation device for playing. Owing to the persistence of vision of the human eye, the user may not perceive, or may only faintly perceive, this short delay while watching the video, so that the user experience can be effectively improved.
For the manner in which the presentation device processes the second video stream to obtain the third video stream, refer to the manner in which the network device processes the spliced video stream to obtain the first video stream; this is not specifically limited.
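As a rough illustration of this fallback behavior, the following Python sketch shows one way a presentation device might be structured; the class, the method names, and the crop_to_viewport helper are assumptions of this sketch rather than the application's API.

```python
class PresentationDevice:
    """Sketch of the field-angle fallback described above (assumed API)."""

    def __init__(self, network):
        self.network = network        # carries the extended RTCP request
        self.awaiting_update = False

    def on_field_angle_change(self, new_fov, panorama_frame):
        # 1) Ask the network device for a high-quality first video
        #    stream at the new field angle.
        self.network.send_viewport_request(new_fov)
        self.awaiting_update = True
        # 2) Bridge the gap locally: derive the third video stream by
        #    cropping the low-quality panoramic (second) stream.
        self.play(crop_to_viewport(panorama_frame, new_fov))

    def on_updated_first_stream(self, frame):
        # The updated first video stream has arrived; stop using the
        # locally derived third video stream.
        self.awaiting_update = False
        self.play(frame)

    def play(self, frame):
        ...  # hand the frame to the display pipeline

def crop_to_viewport(panorama_frame, fov):
    ...  # placeholder: reproject the panorama to the given field angle
```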
The network device involved in the above step flow may be a collective term for the first media server, the second media server, and the core network element. Embodiment one mainly describes, based on fig. 1a, a process in which the network device performs video splicing; in the system architectures illustrated in fig. 1b and fig. 1c, the entity performing video splicing may be the first media server or the second media server.
In the second and third embodiments below, the description takes the second media server as the entity performing video splicing.
Example two
In the second embodiment, the embodiment of the present application is mainly described based on the system architecture illustrated in fig. 1b.
Fig. 3 is a schematic flowchart of a video splicing method according to the second embodiment of the present application. As shown in fig. 3, the method includes:
Step 301, the acquisition device sends video splicing information to the first media server.
Here, the acquisition device may send a first call request message (specifically, an invite message) to the first media server, where the first call request message carries the video splicing information.
Step 302, after receiving the video splicing information sent by the acquisition device, the first media server forwards the video splicing information to a core network element.
Here, the first media server may send a second call request message (specifically, an invite message) to the core network element, where the second call request message carries the video splicing information.
Step 303, after receiving the video splicing information sent by the first media server, the network element of the core network forwards the video splicing information to the second media server.
Here, the core network element may send a third call request message (specifically, an invite message) to the second media server, where the third call request message carries the video splicing information.
In steps 301 to 303, considering that the existing invite message does not support carrying the video splicing information, the embodiment of the present application may extend the SDP so that the video splicing information can be carried in the invite message; for specific ways, refer to the description in embodiment one, which is not repeated here.
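Purely as a sketch of what such an SDP extension could look like, the invite body might be built as follows. The attribute names splice-group, sync-ts, and camera-calib are invented here for illustration; the application only states that the SDP is extended, not the exact syntax.

```python
def build_splicing_sdp(streams):
    """Build an SDP body whose media sections carry hypothetical
    attributes for the three constituents of the video splicing
    information: stream identifier, synchronization info, and camera
    calibration parameters."""
    lines = ["v=0",
             "o=- 0 0 IN IP4 192.0.2.1",
             "s=multi-camera capture",
             "t=0 0"]
    for s in streams:
        lines += [
            f"m=video {s['port']} RTP/AVP 96",
            "a=rtpmap:96 H264/90000",
            f"a=splice-group:{s['stream_id']}",   # identifier of the stream
            f"a=sync-ts:{s['sync_timestamp']}",   # inter-stream sync info
            "a=camera-calib:" + ",".join(str(v) for v in s["calibration"]),
        ]
    return "\r\n".join(lines) + "\r\n"

sdp = build_splicing_sdp([
    {"port": 5004, "stream_id": 1, "sync_timestamp": 900000,
     "calibration": [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]},
    {"port": 5006, "stream_id": 2, "sync_timestamp": 900000,
     "calibration": [0.9, 0.1, 120.0, -0.1, 0.9, 0.0, 0.0, 0.0, 1.0]},
])
```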
Step 304, after receiving the video splicing information sent by the network element of the core network, the second media server sends video playing information to the presentation device.
Here, the second media server may send a fourth call request message (specifically, an invite message) to the presentation device, where the fourth call request message carries the video playing information.
The video playing information may be used to indicate the format of the video stream transmitted by the network device to the presentation device; for example, the format may be OMAF or Tiled VR.
In step 305, after receiving the video playing information, the presentation device sends a first response message (specifically, a 183 message) to the second media server.
Here, the first response message may carry a media plane address of the presentation device, so as to facilitate subsequent media plane communication between the second media server and the presentation device.
Step 306, after receiving the first response message, the second media server sends a second response message to the core network element. Here, the second response message may carry a media plane address of the second media server.
Step 307, after receiving the second response message sent by the second media server, the network element of the core network sends a third response message to the first media server. Here, the third response message may carry a media plane address of the second media server, so as to facilitate subsequent media plane communication between the first media server and the second media server.
It should be noted that the core network element may not participate in the media plane communication, and is mainly used for forwarding the signaling between the first media server and the second media server in the signaling plane communication.
Step 308, after receiving the third response message sent by the core network element, the first media server sends a fourth response message to the acquisition device.
In this embodiment, compared to steps 201 to 204 illustrated in fig. 2, steps 301 to 308 specifically illustrate a signaling transmission process in which the first media server, the second media server, and the core network element participate; for other contents, refer to the description of steps 201 to 204, which is not repeated here.
Step 309, the acquisition device sends the plurality of video streams that are not spliced to the first media server.
In step 310, the first media server receives the plurality of video streams, not yet spliced, sent by the acquisition device, and sends the plurality of video streams to the second media server.
Step 311, the second media server splices the plurality of video streams that are not spliced according to the video splicing information, and processes the spliced video stream according to the current field angle of the user to obtain a first video stream, where the field angle corresponding to the first video stream is the current field angle of the user.
Optionally, the second media server may further obtain a second video stream according to the spliced video stream, where a field angle corresponding to the second video stream is greater than the current field angle of the user, and a video quality of the second video stream is lower than a video quality of the first video stream. In one example, the second video stream is a panoramic video stream.
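The application does not prescribe a stitching algorithm for step 311. As a minimal sketch, assuming the camera calibration parameters reduce to one planar homography per camera (a simplification; real calibration would also include intrinsics and distortion), the splicing and the derivation of the low-quality second stream could look like this:

```python
import cv2
import numpy as np

def stitch_frames(frames, homographies, canvas_size):
    """Warp each time-synchronized camera frame into a shared panorama
    canvas using its calibration homography. canvas_size is (width,
    height); overlaps are naively overwritten, with no blending."""
    panorama = np.zeros((canvas_size[1], canvas_size[0], 3), np.uint8)
    for frame, H in zip(frames, homographies):
        warped = cv2.warpPerspective(frame, H, canvas_size)
        mask = warped.sum(axis=2) > 0   # pixels actually covered by this camera
        panorama[mask] = warped[mask]
    return panorama

def make_second_stream_frame(panorama, scale=0.25):
    """Low-definition panoramic frame for the second video stream."""
    return cv2.resize(panorama, None, fx=scale, fy=scale)
```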
Step 312, the second media server sends the first video stream to the presentation device.
Optionally, if the second media server also generates a second video stream in step 311, the first video stream and the second video stream may be transmitted to the presentation device simultaneously.
In step 313, the presentation device receives the first video stream and plays the first video stream.
Here, if the second media server sends the first video stream and the second video stream to the presentation device in step 312, the presentation device may receive the first video stream and the second video stream accordingly, and identify the first video stream according to the identifier of the first video stream and the identifier of the second video stream, so as to play the first video stream.
In this embodiment, compared to steps 205 to 208 illustrated in fig. 2, steps 309 to 313 specifically illustrate a video stream transmission process in which the first media server and the second media server participate; for other contents, refer to the description of steps 205 to 208, which is not repeated here.
In the second embodiment, the SDP extension needs to be performed on the acquisition device, the first media server, the core network element, and the second media server, so that the SIP messages transmitted among the acquisition device, the first media server, the core network element, and the second media server can carry the video splicing information.
As can be seen from the above description, because the acquisition device sends the video splicing information to the second media server, the acquisition device does not need to splice the plurality of video streams in the subsequent process; instead, the second media server splices the plurality of video streams according to the video splicing information.
Example three
In the third embodiment, the embodiment of the present application is mainly described based on the system architecture illustrated in fig. 1c.
Fig. 4 is a schematic flowchart corresponding to a video splicing method provided in the third embodiment of the present application. As shown in fig. 4, the method includes:
400a, the first terminal device and the acquisition device establish a connection.
In a possible implementation manner, the first terminal device and the acquisition device may perform automatic pairing through an existing near field device pairing technology, so as to establish a connection, and the first terminal device configures an RTSP address for the acquisition device.
400b, the second terminal device establishes a connection with the rendering device.
In a possible implementation, the second terminal device and the rendering device may establish a connection via Wi-Fi or a Universal Serial Bus (USB).
In step 401, the first terminal device sends a description request message (DESCRIBE request) to the acquisition device.
Here, the description request message is used to request the media initialization description information of the acquisition device.
Step 402, the acquisition device returns a description response message (DESCRIBE response) to the first terminal device according to the description request message.
Here, the description response message may be a 200 OK message and may carry the media initialization description information. In the embodiment of the application, the description response message may also carry the video splicing information. Considering that the existing description response message does not support carrying the video splicing information, the embodiment of the application may extend the SDP so that the video splicing information can be sent to the first terminal device through the description response message.
In the embodiment of the present application, for specific description of video splicing information, reference may be made to the description in the first embodiment, and details are not described here again. Further, the implementation manner of extending the SDP may be the same as that of extending the SDP in the first embodiment, and details are not described here.
By performing SDP extension on the acquisition device and the first terminal device, the video splicing information can be carried in the RTSP messages transmitted between the acquisition device and the first terminal device.
Step 403, after receiving the description response message, the first terminal device sends a first call request message (specifically, invite message) to the first media server, where the first call request message carries video splicing information.
In step 404, after receiving the first call request message sent by the first terminal device, the first media server sends a second call request message (specifically, an invite message) to the core network element, where the second call request message carries video splicing information.
In step 405, after receiving the second call request message sent by the first media server, the core network element sends a third call request message (which may be an invite message) to the second media server, where the third call request message carries video splicing information.
In steps 403 to 405, considering that the existing invite message does not support carrying the video splicing information, the embodiment of the present application may extend the SDP so that the video splicing information can be carried in the invite message; for specific ways, refer to the description in embodiment one, which is not repeated here.
By carrying out SDP extension on the first terminal equipment, the first media server, the core network element and the second media server, the SIP messages transmitted among the first terminal equipment, the first media server, the core network element and the second media server can carry video splicing information.
In step 406, after receiving the third call request message sent by the core network element, the second media server sends a fourth call request message (specifically, invite message) to the second terminal device, where the fourth call request message carries video playing information.
The video playing information may be used to indicate the format of the video stream transmitted by the network device to the presentation device; for example, the format may be OMAF or Tiled VR.
In step 407, after receiving the fourth call request message, the second terminal device sends a first response message (specifically, a 183 message) to the second media server.
Here, the first response message may carry a media plane address of the second terminal device, so as to facilitate subsequent media plane communication between the second media server and the second terminal device.
In step 408, after receiving the first response message sent by the second terminal device, the second media server sends a second response message to the core network element.
Here, the second response message may carry a media plane address of the second media server.
In step 409, after receiving the second response message sent by the second media server, the network element of the core network sends a third response message to the first media server.
Here, the third response message may carry a media plane address of the second media server, so as to facilitate subsequent media plane communication between the first media server and the second media server.
It should be noted that the core network element may not participate in the media plane communication, and is mainly used for forwarding the signaling between the first media server and the second media server in the signaling plane communication.
In step 410, after receiving the third response message sent by the core network element, the first media server sends a fourth response message to the first terminal device.
Here, the fourth response message may carry a media plane address of the first media server, so as to facilitate subsequent media plane communication between the first terminal device and the first media server.
Step 411, after receiving the fourth response message, the first terminal device sends a setup request message (SETUP request) to the acquisition device.
Here, the setup request message may be used to set the attributes and transmission mode of the session and to prompt the acquisition device to establish the session. The setup request message may carry transport address information (to facilitate communication after the session is established). In a possible implementation manner, the transport address information may include the media plane address of the first terminal device; further, the first terminal device may establish a correspondence between the media plane address of the first terminal device and the media plane address of the first media server.
In step 412, the acquisition device returns a setup response message (SETUP response) to the first terminal device.
Here, the setup response message may be a 200 OK message, used to establish a session with the first terminal device and to return a session identifier and session-related information.
In step 413, the first terminal device sends a play request message (PLAY request) to the acquisition device.
Here, the play request message is used to request playing, that is, to request the acquisition device to send the video streams.
Step 414, the acquisition device sends the plurality of video streams that are not spliced to the first terminal device according to the play request message.
In step 415, the first terminal device sends the plurality of video streams that are not spliced to the first media server.
Here, the first terminal device may send the plurality of video streams that are not spliced to the first media server according to the correspondence between the media plane address of the first terminal device and the media plane address of the first media server. For example, if the media plane address 1a of the first terminal device corresponds to the media plane address 1b of the first media server, the first terminal device may correspondingly send the video stream received through the address 1a to the address 1b.
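A minimal sketch of this correspondence on the first terminal device follows; the addresses, ports, and function names are invented examples.

```python
import socket

# Hypothetical forwarding table: each local media plane address maps to
# a media plane address of the first media server (the "address 1a ->
# address 1b" correspondence in the example above).
FORWARDING = {
    ("192.0.2.10", 5004): ("198.51.100.20", 6004),   # address 1a -> 1b
    ("192.0.2.10", 5006): ("198.51.100.20", 6006),
}

def forward_rtp(local_addr, packet, sock):
    """Relay an unspliced video stream packet received on a local media
    plane address to the corresponding media server address."""
    server_addr = FORWARDING[local_addr]
    sock.sendto(packet, server_addr)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# forward_rtp(("192.0.2.10", 5004), received_packet, sock)
```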
It should be noted that the above flow is described by taking an example in which the transmission address information includes the media plane address of the first terminal device. In another possible implementation manner, the transmission address information includes a media plane address of the first media server, so that after receiving the play request message sent by the first terminal device, the acquisition device can directly send the plurality of video streams that are not spliced to the first media server without being forwarded to the first media server by the first terminal device, thereby effectively improving transmission efficiency and reducing resource consumption of the first terminal device.
In step 416, the first media server receives the plurality of video streams that are not spliced and sends them to the second media server.
Step 417, the second media server receives the plurality of video streams that are not spliced, splices them according to the video splicing information, and processes the spliced video stream according to the current field angle of the user to obtain a first video stream, where the field angle corresponding to the first video stream is the current field angle of the user.
Optionally, the second media server may further obtain a second video stream according to the spliced video stream, where a field angle corresponding to the second video stream is greater than the current field angle of the user, and a video quality of the second video stream is lower than a video quality of the first video stream. In one example, the second video stream is a panoramic video stream.
In the initial stage of media plane communication between the second media server and the second terminal device, in a possible implementation manner, a default field angle is preset in the second media server. After the second media server splices the plurality of video streams that are not spliced, it can obtain a low-definition panoramic video stream (corresponding to the second video stream) from the spliced video stream, and process the spliced video stream based on the default field angle to obtain a fourth video stream (corresponding to the first video stream), where the field angle corresponding to the fourth video stream is the default field angle; the second media server then sends the low-definition panoramic video stream and the fourth video stream to the presentation device. Correspondingly, after receiving the low-definition panoramic video stream and the fourth video stream, the presentation device may send a request message to the second media server if it determines that the current field angle of the user differs from the default field angle, where the request message is used to indicate the current field angle of the user. In this way, after receiving the request message, the second media server can obtain the current field angle of the user and process the spliced video stream according to it to obtain the first video stream.
In other possible implementation manners, the presentation device may also actively send the request message to the second media server in the initial stage of the media plane communication; likewise, after receiving the request message, the second media server can obtain the current field angle of the user and process the spliced video stream according to it to obtain the first video stream.
In the process of media plane communication between the second media server and the second terminal device, the presentation device can monitor the field angle of the user in real time and send a request message to the second media server after determining that the field angle of the user has changed.
It should be noted that the request message mentioned above may be a specific RTCP message extended in the embodiments of the present application, and specific reference may be made to the description in the first embodiment, which is not described herein again.
In step 418, the second media server sends the first video stream to the second terminal device.
Optionally, if the second media server also generates a second video stream in step 417, the second video stream may be sent to the second terminal device together with the first video stream or separately.
Step 419, the second terminal device receives the first video stream and plays the first video stream through the presentation device.
Here, if the second media server sends the first video stream and the second video stream to the second terminal device in step 418, the second terminal device may receive the first video stream and the second video stream accordingly, identify the first video stream according to the identifier of the first video stream and the identifier of the second video stream, and then play the first video stream through the presentation device.
It should be noted that the above flow is only an exemplary illustration, and in a specific implementation, some steps may be added on the basis, or some of the steps may be deleted, or some of the steps may be replaced, and the specific implementation is not limited. For example, with respect to step 401 and step 402, if there is another way to obtain the media initialization description information, step 401 and step 402 may not be executed.
For the first to third embodiments, it should be noted that the method in the first embodiment, in which the presentation device sends a request message to the network device to update the first video stream, may be applied to various scenarios; for example, it may also be applied to a scenario in which video splicing is performed by the acquisition device. Further, the network device here may be the second media server in the second embodiment and the third embodiment, and the second media server may be deployed in an edge data center of the presentation device in the second embodiment or an edge data center of the second terminal device in the third embodiment, for example as Mobile Edge Computing (MEC) in a 5G NR scenario. In this way, both the transmission delay of the request message sent by the presentation device or the second terminal device to the second media server and the transmission delay of the updated video stream returned by the second media server according to the request message can be effectively reduced, so that the user experience can be further improved.
In the embodiment of the application, it is considered that the field angle of the user may change frequently during media plane communication; if the video stream corresponding to the field angle of the user is not updated in a timely and effective manner, the user experience is poor. Based on this, the embodiment of the present application further provides a method for updating a video stream, which is used to update the video stream corresponding to the field angle of the user in time when the field angle of the user changes, so as to improve user experience. This is described in detail below with reference to embodiment four and embodiment five.
Example four
In the fourth embodiment, a method for updating a video stream is mainly described based on the system architecture illustrated in fig. 1a.
Fig. 5 is a flowchart illustrating a method for updating a video stream according to an embodiment of the present application, as shown in fig. 5, including:
step 501, the presentation device sends a request message to the network device, where the request message is used to indicate the current angle of view of the user.
Here, the request message may be a specific RTCP message extended in the embodiment of the present application. The request message may include the current field angle information of the user; in one example, the current field angle information of the user may include the center azimuth angle, the center elevation angle, the azimuth angle range, the elevation angle range, and the center tilt angle corresponding to the current field angle of the user, which is not limited in this embodiment of the application.
In a possible implementation manner, in the process of playing a video, the presentation device may send a request message to the network device if it is determined that the angle of view of the user changes. In this case, before step 501, the method may further include:
In step 500a, the network device sends a first video stream to the presentation device, where the field angle corresponding to the first video stream is the current field angle of the user (the field angle before the change). Optionally, the network device may also send a second video stream to the presentation device, where the field angle corresponding to the second video stream is larger than the current field angle of the user; in one example, the second video stream may be a panoramic video stream. In this case, the first video stream may carry an identifier of the first video stream, and the second video stream may carry an identifier of the second video stream. Further, the video quality of the second video stream may be lower than the video quality of the first video stream, in order to reduce the consumption of network transmission resources.
In step 500b, the presentation device receives the first video stream and plays the first video stream. Here, if the network device sends the first video stream and the second video stream to the presentation device in step 500a, the presentation device may receive the first video stream and the second video stream accordingly, and identify the first video stream according to the identifier of the first video stream and the identifier of the second video stream, so as to play the first video stream.
Further, in a situation where the network device sends the first video stream and the second video stream to the presentation device, the request message may further include an identifier of the first video stream, and thus, after receiving the request message sent by the presentation device, the network device may determine, according to the synchronization source identifier carried in the request message, that the video stream that needs to be updated is the first video stream.
In the embodiment of the application, the presentation device may further process the second video stream according to the current field angle of the user to obtain a third video stream, where the field angle corresponding to the third video stream is the changed field angle, and then play the third video stream. In this way, after the field angle of the user changes, the presentation device obtains the third video stream of lower video quality from the second video stream and plays it, so as to respond to the change of field angle in time. Subsequently, after receiving the video stream corresponding to the changed field angle that the network device returns according to the request message, the presentation device starts to play that video stream.
In other possible implementations, the presentation device may also send the request message to the network device triggered by other situations, which is not limited specifically.
Step 502, the network device receives a request message sent by the presentation device, acquires a spliced video stream, and processes the spliced video stream according to a current field angle of the user to obtain a first video stream, where the field angle corresponding to the first video stream is the current field angle of the user (the field angle after change).
Here, there are many possible implementations of the network device obtaining the spliced video stream. One possible implementation manner is that after the acquisition device acquires the plurality of video streams that are not spliced, the acquisition device splices the plurality of video streams that are not spliced according to the video splicing information, and sends the spliced video streams to the network device, so that the network device can acquire the spliced video streams. Another possible implementation manner is that the acquisition device sends the video splicing information to the network device in the signaling communication process, and directly sends the plurality of video streams that are not spliced to the network device, so that the network device can splice the plurality of received video streams that are not spliced according to the video splicing information, and further obtain the spliced video streams.
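As one hedged illustration of the processing in step 502, assuming the spliced video stream is an equirectangular panorama covering 360 by 180 degrees, the region covering the user's field angle could be cut out as follows. A real implementation would reproject to a rectilinear view; a plain crop is shown to keep the sketch short, and all values are examples.

```python
import numpy as np

def extract_field_angle(equirect: np.ndarray, center_az: float,
                        center_el: float, az_range: float,
                        el_range: float) -> np.ndarray:
    """Crop the part of a stitched equirectangular panorama that covers
    the given field angle (angles in degrees)."""
    h, w = equirect.shape[:2]
    # Map angles to pixel coordinates (azimuth wraps at +/-180 degrees).
    x0 = int((center_az - az_range / 2 + 180) / 360 * w) % w
    x1 = int((center_az + az_range / 2 + 180) / 360 * w) % w
    y0 = int(np.clip((90 - (center_el + el_range / 2)) / 180 * h, 0, h))
    y1 = int(np.clip((90 - (center_el - el_range / 2)) / 180 * h, 0, h))
    if x0 < x1:
        return equirect[y0:y1, x0:x1]
    # Field angle wraps around the 180/-180 degree seam.
    return np.concatenate((equirect[y0:y1, x0:], equirect[y0:y1, :x1]), axis=1)

# Example: 4K panorama, 90x60 degree viewport centered at (30, 10).
view = extract_field_angle(np.zeros((2048, 4096, 3), np.uint8), 30, 10, 90, 60)
```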
Optionally, the network device may also obtain the second video stream from the spliced video stream.
In step 503, the network device sends the first video stream (video stream corresponding to the changed angle of view) to the presentation device.
Optionally, if the network device also generates a second video stream in step 502, the second video stream may be sent to the presentation device simultaneously or separately.
Step 504, the presentation device receives the first video stream returned by the network device according to the request message, and plays the first video stream.
Here, if the network device sends the first video stream and the second video stream to the presentation device in step 503, the presentation device may receive the first video stream and the second video stream accordingly, and identify the first video stream according to the identifier of the first video stream and the identifier of the second video stream, so as to play the first video stream.
In the embodiment of the application, the presentation device can notify the network device of the current angle of view of the user by expanding a specific RTCP message, so that the network device can update the video stream corresponding to the current angle of view of the user in time and feed the video stream back to the presentation device for playing, thereby effectively improving the user experience.
It should be noted that the method for updating a video stream provided in the embodiment of the present application may also be applied to the system architectures illustrated in fig. 1b and fig. 1c, and the specific implementation of the method may refer to the foregoing description. The following mainly takes the system architecture illustrated in fig. 1c as an example, and specifically describes, in combination with embodiment five, a process of updating a video stream in a situation where the field angle of a user changes.
Example five
In the fifth embodiment, a method for updating a video stream is mainly described based on the system architecture illustrated in fig. 1c.
Fig. 6 is a flowchart illustrating a method for updating a video stream according to an embodiment of the present application, as shown in fig. 6, including:
step 601, the second media server sends a first video stream and a second video stream to the second terminal device, where the first video stream may carry an identifier of the first video stream, and the second video stream may carry an identifier of the second video stream.
Here, the identity of the first video stream and the identity of the second video stream may be represented using different SSRC fields. The field angle corresponding to the first video stream is the current field angle of the user (the field angle before change), and the field angle corresponding to the second video stream is larger than the current field angle of the user. In one example, the second video stream may be a panoramic video stream.
In the embodiment of the present application, the video quality of the second video stream may be lower than that of the first video stream, so as to reduce consumption of network transmission resources.
Step 602, the second terminal device receives the first video stream and the second video stream, identifies the first video stream according to the identifier of the first video stream and the identifier of the second video stream, and then plays the first video stream through the presentation device.
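A small sketch of the stream identification in step 602 follows, assuming plain RTP packets in which the SSRC occupies bytes 8 to 11 of the fixed header; the routing callbacks are hypothetical.

```python
import struct

def ssrc_of(rtp_packet: bytes) -> int:
    """The SSRC occupies bytes 8..11 of the fixed RTP header."""
    return struct.unpack("!I", rtp_packet[8:12])[0]

def route_packet(rtp_packet: bytes, first_ssrc: int, second_ssrc: int,
                 play, buffer_panorama):
    """Distinguish the two streams by SSRC: play the first (field-angle)
    stream, keep the second (panoramic) stream for fallback rendering."""
    ssrc = ssrc_of(rtp_packet)
    if ssrc == first_ssrc:
        play(rtp_packet)
    elif ssrc == second_ssrc:
        buffer_panorama(rtp_packet)
```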
Step 603, the second terminal device determines that the field angle of the user changes.
Here, in the process of the presentation device playing the video, the second terminal device may monitor the field angle of the user in real time, so as to identify in time whether the field angle of the user changes; alternatively, the second terminal device may monitor the field angle of the user periodically. The monitoring manner may be set according to actual needs and is not limited in the embodiment of the application.
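Both monitoring variants can be captured in one loop, as sketched below; read_pose and send_request are hypothetical callables, and the 1-degree change threshold is an assumption of the sketch.

```python
import time

ANGLE_EPSILON = 1.0   # degrees; treat smaller jitter as "no change"

def monitor_field_angle(read_pose, send_request, period=None):
    """Watch the user's field angle and send a request message when it
    changes. period=None approximates real-time monitoring (tight loop);
    a positive period gives the periodic variant mentioned above."""
    last = read_pose()   # dict of angle names to degrees
    while True:
        current = read_pose()
        if any(abs(current[k] - last[k]) > ANGLE_EPSILON for k in current):
            send_request(current)
            last = current
        if period:
            time.sleep(period)
```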
In step 604a, the second terminal device sends a request message to the second media server, where the request message is used to indicate the current field angle (in this case, the changed field angle) of the user.
Further, the request message may be a specific RTCP message extended in the embodiment of the present application. The request message may include the current field angle information of the user; in one example, the current field angle information of the user may include the center azimuth angle, the center elevation angle, the azimuth angle range, the elevation angle range, and the center tilt angle corresponding to the current field angle of the user, which is not limited in this embodiment of the application. The request message may further include an identifier of the first video stream, so that after receiving the request message sent by the second terminal device, the second media server can determine, according to the synchronization source (SSRC) identifier carried in the request message, that the video stream to be updated is the first video stream.
Step 604b, the second terminal device processes the second video stream according to the changed field angle to obtain a third video stream, and plays the third video stream through the presentation device. The field angle corresponding to the third video stream is the changed field angle.
Step 605, the second media server receives the request message sent by the second terminal device, acquires the spliced video stream, and processes the spliced video stream according to the changed angle of view to obtain the first video stream (at this time, the updated video stream, that is, the video stream corresponding to the changed angle of view). And the second media server can also obtain a second video stream according to the spliced video stream.
Here, there are many possible implementations for the second media server to obtain the spliced video stream. One possible implementation manner is that after the acquisition device acquires the plurality of video streams that are not spliced, the acquisition device splices the plurality of video streams that are not spliced according to the video splicing information, and sends the spliced video streams to the second media server, so that the second media server can acquire the spliced video streams. Another possible implementation manner is that the acquisition device sends the video splicing information to the second media server in the signaling communication process, and directly sends the plurality of video streams that are not spliced to the second media server, so that the second media server can splice the plurality of received video streams that are not spliced according to the video splicing information, and further obtain the spliced video stream. It will be appreciated that there are other possible implementations, such as splicing multiple un-spliced video streams by a first media server and sending the spliced video streams to a second media server, which are not listed here.
In step 606, the second media server sends the first video stream (the video stream corresponding to the changed field angle) and the second video stream to the second terminal device.
In step 607, the second terminal device receives the first video stream (the video stream corresponding to the changed field angle) and the second video stream, and plays the first video stream (that is, stops generating and playing the third video stream and starts playing the first video stream).
Further, the second media server may be deployed in an edge data center of the second terminal device, such as an MEC in a 5G NR scenario; therefore, the transmission delay of the second terminal device sending the request message to the second media server can be effectively reduced, and the transmission delay of the second media server returning the updated video stream to the second terminal device according to the request message can be effectively reduced, so that the user experience can be further improved.
In this manner, after the field angle of the user changes, the presentation device obtains a third video stream of lower video quality from the second video stream and plays it, so as to respond to the change of field angle in time; after receiving the request message, the network device can update the video stream corresponding to the current field angle of the user in time and feed it back to the presentation device for playing. Owing to the persistence of vision of the human eye, the user may not perceive, or may only faintly perceive, this short delay while watching the video, so that the user experience can be effectively improved.
It should be noted that the step numbering in the first to fifth embodiments is merely an example of an execution flow and does not constitute a limitation on the execution sequence of the steps; for example, step 604a and step 604b may be executed simultaneously.
It is to be understood that, in order to realize the corresponding functions, each device in the above embodiments may include a corresponding hardware structure and/or software module for performing each function. Those skilled in the art will readily appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Where an integrated unit is used, fig. 7 shows a possible exemplary block diagram of an apparatus involved in an embodiment of the invention; the apparatus 700 may exist in the form of software. The apparatus 700 may include: a processing unit 702 and a communication unit 703. As an implementation manner, the communication unit 703 may include a receiving unit and a transmitting unit. The processing unit 702 is configured to control and manage the operations of the apparatus 700. The communication unit 703 is used to support communication between the apparatus 700 and other devices. The apparatus 700 may further comprise a storage unit 701 for storing program code and data of the apparatus 700.
The processing unit 702 may be a processor or a controller, such as a general-purpose central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor may also be a combination of computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication unit 703 may be a communication interface, a transceiver circuit, or the like, where "communication interface" is a general term and, in a specific implementation, may include a plurality of interfaces. The storage unit 701 may be a memory.
The apparatus 700 may be an acquisition device according to the present application, or may also be a chip in an acquisition device. The processing unit 702 may enable the apparatus 700 to perform the actions of the acquisition device in the above method examples. The communication unit 703 may support communication between the apparatus 700 and a network device, for example, the communication unit 703 is used to support the apparatus 700 to perform step 201 and step 205 in fig. 2.
Specifically, the communication unit 703 may be configured to send video splicing information to a network device, and send a plurality of video streams that are not spliced to the network device after the processing unit 702 determines that the network device receives the video splicing information, where the video splicing information is used to splice the plurality of video streams that are not spliced.
In a possible implementation manner, the video splicing information includes an identifier of the plurality of video streams that are not spliced, synchronization information among the plurality of video streams, and camera calibration parameters respectively corresponding to the plurality of video streams.
In a possible implementation manner, the communication unit 703 sends video splicing information to a network device, specifically: and the video splicing information is sent to the network equipment through the terminal equipment.
In a possible implementation manner, the communication unit 703 sends a plurality of video streams that are not spliced to the network device, specifically: sending, by the terminal device, a plurality of video streams that are not spliced to the network device; or receiving address information of the network device sent by the terminal device, and sending the plurality of video streams which are not spliced to the network device according to the address information of the network device.
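For reference, the video splicing information handled by these units (stream identifiers, synchronization information, and camera calibration parameters, as enumerated above) might be modeled as a simple container; the field types below are assumptions, since the application does not fix an encoding.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class VideoSplicingInfo:
    """Hypothetical container for the video splicing information."""
    stream_ids: List[str]                       # identifiers of the unspliced streams
    sync_info: Dict[str, int]                   # e.g. per-stream capture timestamps
    camera_calibration: Dict[str, List[float]]  # per-stream calibration parameters
```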
The apparatus 700 may also be a network device referred to in this application, or may also be a chip in a network device. The processing unit 702 may enable the apparatus 700 to perform the actions of the network device in the above method examples. The communication unit 703 may support communication between the apparatus 700 and other devices (such as an acquisition device or a presentation device), for example, the communication unit 703 is used to support the apparatus 700 to perform step 202, step 204, step 206, step 207, and the like in fig. 2.
Specifically, the communication unit 703 may be configured to receive video splicing information sent by a collection device, and receive multiple video streams sent by the collection device that are not spliced; the processing unit 702 may be configured to splice the plurality of video streams according to the video splicing information.
In a possible implementation manner, the communication unit 703 is further configured to: receiving a request message from a presentation device, wherein the request message is used for indicating the current field angle of a user; the processing unit 702 is further configured to: processing the spliced video stream according to the current field angle of the user to obtain a first video stream; the communication unit 703 is further configured to: sending the first video stream to the presentation device; and the corresponding field angle of the first video stream is the current field angle of the user.
In one possible implementation manner, the processing unit 702 is further configured to: obtaining a second video stream according to the spliced video stream;
the communication unit is further configured to: and sending the second video stream to the presentation device, wherein the corresponding field angle of the second video stream is larger than the current field angle of the user, and the video quality of the second video stream is lower than that of the first video stream.
In a possible implementation manner, the request message further includes an identifier of the first video stream; the identification of the first video stream is a synchronization source SSRC identifier.
In one possible implementation, the network device is a media server, and the media server is deployed in an edge data center of the rendering device.
Referring to fig. 8, a schematic diagram of an apparatus provided in the present application is shown, where the apparatus may be the above-mentioned acquisition device, network device, presentation device or terminal device, or may also be a chip disposed in the acquisition device, network device, presentation device or terminal device. The apparatus 800 comprises: a processor 802, a communication interface 803, and a memory 801. Optionally, the apparatus 800 may also include a bus 804. The communication interface 803, the processor 802, and the memory 801 may be connected to each other via a communication line 804; the communication line 804 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication lines 804 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
The processor 802 may be a CPU, microprocessor, ASIC, or one or more integrated circuits configured to control the execution of programs in accordance with the teachings of the present application.
The communication interface 803 may be any device, such as a transceiver, for communicating with other devices or communication networks, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), a wired access network, etc.
The memory 801 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be separate and coupled to the processor via the communication line 804, or may be integrated with the processor.
The memory 801 is used for storing computer-executable instructions for executing the present application, and is controlled by the processor 802 to execute the instructions. The processor 802 is configured to execute computer-executable instructions stored in the memory 801 to implement the methods provided by the above-described embodiments of the present application.
Optionally, the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention occur, in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., solid state disk (SSD)), or the like.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (21)

1. A method for video stitching, the method comprising:
the acquisition equipment sends video splicing information to the network equipment;
and after the acquisition equipment determines that the network equipment receives the video splicing information, sending a plurality of video streams which are not spliced to the network equipment, wherein the video splicing information is used for splicing the plurality of video streams which are not spliced.
2. The method of claim 1, wherein the video splicing information comprises an identification of the plurality of video streams that are not spliced, synchronization information between the plurality of video streams, and camera calibration parameters corresponding to the plurality of video streams respectively.
3. The method of claim 1 or 2, wherein the capture device sends video stitching information to a network device, comprising:
and the acquisition equipment transmits the video splicing information to the network equipment through terminal equipment.
4. The method of claim 3, wherein the capture device sends the plurality of video streams that are not spliced to the network device, comprising:
the acquisition equipment sends a plurality of video streams which are not spliced to the network equipment through the terminal equipment; or,
and the acquisition equipment receives the address information of the network equipment sent by the terminal equipment and sends the plurality of video streams which are not spliced to the network equipment according to the address information of the network equipment.
5. A method for video stitching, the method comprising:
the network equipment receives video splicing information sent by the acquisition equipment;
and the network equipment receives the plurality of video streams which are not spliced and sent by the acquisition equipment, and splices the plurality of video streams according to the video splicing information.
6. The method of claim 5, further comprising:
the network equipment receives a request message from the presentation equipment, wherein the request message is used for indicating the current field angle of a user;
the network equipment processes the spliced video stream according to the current field angle of the user to obtain a first video stream, and sends the first video stream to the presentation equipment; and the corresponding field angle of the first video stream is the current field angle of the user.
7. The method of claim 6, further comprising:
the network device obtains a second video stream according to the spliced video stream, and sends the second video stream to the presentation device, wherein the field angle corresponding to the second video stream is larger than the current field angle of the user, and the video quality of the second video stream is lower than that of the first video stream.
8. The method according to claim 6 or 7, wherein the request message further comprises an identification of the first video stream; the identification of the first video stream is a synchronization source SSRC identifier.
9. The method of any of claims 5 to 8, wherein the network device is a media server deployed in an edge data center of the rendering device.
10. A capture device, comprising:
a communication unit, configured to send video splicing information to a network device and, after a processing unit determines that the network device has received the video splicing information, to send a plurality of unspliced video streams to the network device, wherein the video splicing information is used to splice the plurality of unspliced video streams.
11. The capture device of claim 10, wherein the video splicing information comprises identifiers of the plurality of unspliced video streams, synchronization information among the plurality of video streams, and camera calibration parameters respectively corresponding to the plurality of video streams.
12. The capture device of claim 10 or 11, wherein, in sending the video splicing information to the network device, the communication unit is specifically configured to:
send the video splicing information to the network device through a terminal device.
13. The capture device of claim 12, wherein, in sending the plurality of unspliced video streams to the network device, the communication unit is specifically configured to:
send the plurality of unspliced video streams to the network device through the terminal device; or
receive address information of the network device sent by the terminal device, and send the plurality of unspliced video streams to the network device according to the address information of the network device.
14. A network device, comprising:
a communication unit, configured to receive video splicing information sent by a capture device and to receive a plurality of unspliced video streams sent by the capture device; and
a processing unit, configured to splice the plurality of video streams according to the video splicing information.
15. The network device of claim 14, wherein the communication unit is further configured to receive a request message from a presentation device, the request message indicating a current field angle of a user;
the processing unit is further configured to process the spliced video stream according to the current field angle of the user to obtain a first video stream; and
the communication unit is further configured to send the first video stream to the presentation device, wherein a field angle corresponding to the first video stream is the current field angle of the user.
16. The network device of claim 15, wherein the processing unit is further configured to obtain a second video stream according to the spliced video stream; and
the communication unit is further configured to send the second video stream to the presentation device, wherein a field angle corresponding to the second video stream is larger than the current field angle of the user, and the video quality of the second video stream is lower than that of the first video stream.
17. The network device of claim 15 or 16, wherein the request message further comprises an identifier of the first video stream, and the identifier of the first video stream is a synchronization source (SSRC) identifier.
18. The network device of any one of claims 14 to 17, wherein the network device is a media server deployed in an edge data center of the presentation device.
19. A capture device, comprising:
a memory, configured to store a software program; and
a processor, configured to execute the software program in the memory to cause the capture device to perform the method of any one of claims 1 to 4.
20. A network device, comprising:
a memory, configured to store a software program; and
a processor, configured to execute the software program in the memory to cause the network device to perform the method of any one of claims 5 to 9.
21. A computer storage medium, wherein the storage medium stores a software program which, when executed by one or more processors, implements the method of any one of claims 1 to 9.
CN201810714820.5A 2018-06-29 2018-06-29 Video splicing method and device Pending CN110662119A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810714820.5A CN110662119A (en) 2018-06-29 2018-06-29 Video splicing method and device
PCT/CN2019/093651 WO2020001610A1 (en) 2018-06-29 2019-06-28 Method and device for joining video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810714820.5A CN110662119A (en) 2018-06-29 2018-06-29 Video splicing method and device

Publications (1)

Publication Number Publication Date
CN110662119A (en) 2020-01-07

Family

ID=68985413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810714820.5A Pending CN110662119A (en) 2018-06-29 2018-06-29 Video splicing method and device

Country Status (2)

Country Link
CN (1) CN110662119A (en)
WO (1) WO2020001610A1 (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101146231A * 2007-07-03 2008-03-19 Zhejiang University Method for generating panoramic video from multi-view video streams
US9838687B1 (en) * 2011-12-02 2017-12-05 Amazon Technologies, Inc. Apparatus and method for panoramic video hosting with reduced bandwidth streaming
CN107205122A * 2017-08-03 2017-09-26 Harbin Yishe Technology Co., Ltd. Multi-resolution panoramic video live camera system and method
CN107707830B * 2017-10-27 2020-06-16 Qingdao Yishe Technology Co., Ltd. Panoramic video playing and photographing system based on one-way communication

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104602129A * 2015-01-27 2015-05-06 Samsung Electronics (China) R&D Center Playing method and system of interactive multi-view video
CN106550239A * 2015-09-22 2017-03-29 Beijing Tongbu Technology Co., Ltd. 360-degree panoramic video live broadcast system and implementation method thereof
CN105898139A * 2015-12-23 2016-08-24 Leshi Zhixin Electronic Technology (Tianjin) Co., Ltd. Panoramic video production method and device and panoramic video play method and device
CN108024094A * 2016-11-04 2018-05-11 Avago Technologies General IP (Singapore) Pte. Ltd. 360-degree video recording and playback with object tracking
CN107529064A * 2017-09-04 2017-12-29 Beijing Institute of Technology Adaptive encoding method based on VR terminal feedback

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111556076A * 2020-05-15 2020-08-18 Hangzhou Jiucheng Network Technology Co., Ltd. Method for multi-path network real-time video transmission
CN111556076B * 2020-05-15 2020-12-29 Hangzhou Jiucheng Network Technology Co., Ltd. Method for multi-path network real-time video transmission
CN113978410A * 2021-11-11 2022-01-28 Nanjing Desay SV Automotive Electronics Co., Ltd. Mobile terminal interconnection method and system based on vehicle-mounted surround-view camera
CN114222162A * 2021-12-07 2022-03-22 Zhejiang Dahua Technology Co., Ltd. Video processing method and device, computer equipment and storage medium
CN114222162B * 2021-12-07 2024-04-12 Zhejiang Dahua Technology Co., Ltd. Video processing method and device, computer equipment and storage medium
WO2024012295A1 * 2022-07-14 2024-01-18 Douyin Vision Co., Ltd. Video transmission method, apparatus, system, device, and medium

Also Published As

Publication number Publication date
WO2020001610A1 (en) 2020-01-02

Similar Documents

Publication Publication Date Title
CN111818359B (en) Processing method and device for live interactive video, electronic equipment and server
US11282283B2 (en) System and method of predicting field of view for immersive video streaming
US11496532B2 (en) Offering media services through network edge
US8982179B2 (en) Apparatus and method for modification of telecommunication video content
WO2020001610A1 (en) Method and device for joining video
CN104012106A (en) Aligning videos representing different viewpoints
JP2022536182A (en) System and method for synchronizing data streams
US11924397B2 (en) Generation and distribution of immersive media content from streams captured via distributed mobile devices
US20220408157A1 (en) System and method for multi-user digital interactive experience
US20150244756A1 (en) Method, Apparatus and System for Determining Terminal That is to Share Real-Time Video
CN108810600B (en) Video scene switching method, client and server
US11509961B2 (en) Automatic rating of crowd-stream caller video
WO2017193830A1 (en) Video switching method, device and system, and storage medium
US11483533B2 (en) System and method for social immersive content rendering
CN108667871B (en) Transmission method and device based on P2P
CN110267093A Live video push method, device, storage medium, terminal and live streaming mirror
WO2016188197A1 (en) Picture processing method, sending method, processing apparatus and sending apparatus
CN109842792A (en) Video broadcasting method, device, system and storage medium
GB2567136A (en) Moving between spatially limited video content and omnidirectional video content
EP3182643A1 (en) Media control method and device
CN107431844A (en) For providing method, system and the equipment of live data stream to content presenting device
US20210320810A1 (en) Volumetric conversational services using network edge
CN114071170B (en) Network live broadcast interaction method and device
KR20170085781A System for providing and booking virtual reality video based on wired and wireless communication network
CN108965959A VR video playing and capturing methods, mobile phone, PC device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200107)