WO2020001610A1 - Video splicing method and device - Google Patents

Video splicing method and device

Info

Publication number
WO2020001610A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
video stream
network device
user
view
Application number
PCT/CN2019/093651
Other languages
English (en)
French (fr)
Inventor
薛永革
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2020001610A1

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/81 — Monomedia components thereof
    • H04N 21/816 — Monomedia components thereof involving special video data, e.g. 3D video
    • H04N 21/218 — Source of audio or video content, e.g. local disk arrays
    • H04N 21/21805 — Source of audio or video content enabling multiple viewpoints, e.g. using a plurality of cameras
    • H04N 21/234 — Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/23424 — Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N 21/43 — Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; client middleware
    • H04N 21/4302 — Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/44 — Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44016 — Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip

Definitions

  • the present application relates to the field of communication technologies, and in particular, to a video splicing method and device.
  • VR technology is a computer simulation system that can create and experience a virtual world.
  • a computer-generated simulation environment is used to immerse users in the environment.
  • VR technology includes simulation environment, perception, natural skills and sensing equipment.
  • the simulation environment is a computer-generated, real-time and dynamic 3D stereo realistic image.
  • Perception means that an ideal VR system should provide all the perceptions a person has: in addition to the visual perception generated by computer graphics technology, there are also hearing, touch, force, motion and other perceptions, and even smell and taste.
  • Natural skills refer to the movement of a person's head, eyes, gestures, or other human behaviors.
  • The computer processes data corresponding to the participant's actions, responds to the user's input in real time, and feeds the results back to the user's sense organs.
  • the sensing device refers to a three-dimensional interactive device that can collect a user's action and feed the action as an input to a computer simulation system.
  • Visual sense plays an extremely important role in VR.
  • The most basic problem a VR system must solve is virtual visual perception. To this end, a basic VR system must first achieve the following three points: first, block the person's original visual input; second, occupy the entire field of vision with virtual image light; third, interact with the image to achieve the effect of deceiving the brain.
  • Panoramic video extends traditional video technology to achieve the purpose of VR immersion.
  • Panoramic video is also called 360-degree video. It is obtained by using multiple cameras to capture the environment, obtaining multiple video streams, and then synthesizing the multiple video streams through synchronization, stitching and other technologies.
  • Panoramic video allows users to actively and interactively view, from the shooting point, 360 degrees up, down, left, and right; watching dynamic video from any direction gives users a truly immersive feeling without being limited by time, space and geography.
  • embodiments of the present application provide a video stitching method and device, which are used to reduce the transmission delay of panoramic video.
  • an embodiment of the present application provides a video stitching method, including:
  • The acquisition device sends video splicing information to a network device; and after determining that the network device has received the video splicing information, it sends a plurality of unspliced video streams to the network device, where the video splicing information is used to splice the unspliced multiple video streams.
  • the video splicing information is sent by the capture device to the network device.
  • In this way, the capture device does not need to splice the multiple video streams in the subsequent process; instead, the network device splices the multiple video streams according to the video splicing information. Since the processing capability of the network device is stronger than that of the acquisition device, this can effectively improve the efficiency of video splicing, thereby reducing transmission delay.
  • The video stitching information includes identifiers of the unspliced multiple video streams, synchronization information between the multiple video streams, and camera calibration parameters corresponding to each of the multiple video streams.
  • the collecting device sends video splicing information to a network device, including:
  • the collection device sends the video splicing information to the network device through a terminal device.
  • the acquisition device sending the unspliced multiple video streams to the network device includes:
  • the acquisition device receives address information of the network device sent by the terminal device, and sends the unspliced multiple video streams to the network device according to the address information of the network device.
  • an embodiment of the present application provides a video splicing method, where the method includes:
  • the network device receives the video splicing information sent by the acquisition device
  • the network device receives the unspliced multiple video streams sent by the collection device, and splices the multiple video streams according to the video splicing information.
  • the capture device does not need to splice multiple video streams, but the network device splices multiple video streams. Since the processing capability of the network device is stronger than that of the capture device, it can effectively improve the efficiency of video splicing. This reduces transmission delay.
  • the method further includes:
  • The network device processes the spliced video stream according to the user's current field of view to obtain a first video stream, and sends the first video stream to the presentation device; the field of view corresponding to the first video stream is the current field of view of the user.
  • The request message may be a specific RTCP message extended in the embodiments of the present application. Since the user's field of view may change frequently, the presentation device sends a request message to the network device, so that the network device can transmit the video stream corresponding to the changed field of view in a timely manner, thereby improving the user experience. It can be understood that the embodiments of the present application are not limited to the presentation device sending the request message to the network device in a scene where the viewing angle changes; other scenes are also possible.
  • the method further includes:
  • The network device obtains a second video stream according to the spliced video stream and sends the second video stream to the presentation device; the field of view corresponding to the second video stream is greater than the current field of view of the user, and the video quality of the second video stream is lower than the video quality of the first video stream.
  • the request message further includes an identifier of the first video stream; the identifier of the first video stream is a synchronization source SSRC identifier.
  • the network device is a media server, and the media server is deployed in an edge data center of the presentation device.
  • In this way, the transmission delay of the presentation device sending the request message to the network device, and the transmission delay of the network device returning the updated video stream to the presentation device according to the request message, can be effectively reduced, which can further improve the user experience.
  • an embodiment of the present application provides a video splicing method, where the method includes:
  • a network device receives a request message sent by a presentation device, the request message is used to indicate a current field of view angle of the user;
  • the network device obtains the spliced video stream, and processes the spliced video stream according to the user's current field of view to obtain a first video stream, and the field of view corresponding to the first video stream is the user's current field of view.
  • the network device sends the first video stream to the presentation device.
  • The request message may be a specific RTCP message extended in the embodiments of the present application. Since the user's field of view may change frequently, the presentation device sends a request message to the network device, so that the network device can transmit the video stream corresponding to the changed field of view in a timely manner, thereby improving the user experience. It can be understood that the embodiments of the present application are not limited to the presentation device sending the request message to the network device in a scene where the viewing angle changes; other scenes are also possible.
  • the method further includes:
  • The network device obtains a second video stream according to the spliced video stream and sends the second video stream to the presentation device; the field of view corresponding to the second video stream is greater than the current field of view of the user, and the video quality of the second video stream is lower than the video quality of the first video stream.
  • the request message further includes an identifier of the first video stream; the identifier of the first video stream is a synchronization source SSRC identifier.
  • an embodiment of the present application provides a video stitching method, where the method includes:
  • the rendering device sends a request message to the network device, where the request message is used to indicate the current field of view of the user;
  • The rendering device receives the first video stream returned by the network device according to the request message and plays the first video stream, where the field of view corresponding to the first video stream is the current field of view of the user.
  • the presenting device sends a request message to the network device, so that the network device can timely transmit the video stream corresponding to the user's current field of view to the presenting device according to the current field of view of the user, thereby improving the user experience.
  • Before the rendering device sends the request message to the network device, the method further includes:
  • the presentation device determines that the viewing angle of the user has changed.
  • the method further includes:
  • the rendering device receives a second video stream sent by the network device, where a field of view corresponding to the second video stream is greater than the current field of view of the user, and a video quality of the second video stream is lower than the video quality of the first video stream;
  • the method further includes:
  • the rendering device processes the second video stream according to the current field of view of the user to obtain a third video stream, where the field of view corresponding to the third video stream is the current field of view of the user;
  • the rendering device plays the third video stream.
  • After the presentation device determines that the user's field of view has changed, it can send a request message to the network device, and first process the second video stream according to the user's current field of view to obtain a third video stream, where the field of view corresponding to the third video stream is the current field of view of the user, and then play the third video stream; subsequently, after receiving the video stream corresponding to the user's current field of view returned by the network device according to the request message, the presentation device starts playing that video stream.
  • That is, the rendering device first obtains and plays a third video stream with lower video quality according to the second video stream, so as to respond to the change of field of view in time; after the network device receives the request message, the video stream corresponding to the user's current field of view can be adjusted in time and fed back to the presentation device for playback. Due to the visual latency of the human eye, the user may not perceive this short delay while watching the video, which can effectively improve the user experience.
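  • As a purely illustrative sketch of this client-side behavior (the network, player and crop interfaces are assumptions, not defined by this publication), the presentation-device logic might look like the following Python:

```python
def on_fov_change(new_fov, network, player, low_quality_panorama, crop):
    """Sketch of the presentation device's reaction to a field-of-view change."""
    # 1. Send the request message (Refresh FOV) indicating the new field of view.
    network.send_refresh_fov(new_fov)
    # 2. Bridge the round trip: crop the low-quality second video stream
    #    locally to the new field of view and play it (the third video stream).
    player.play(crop(low_quality_panorama, new_fov))
    # 3. Switch to the high-quality first video stream once the network
    #    device returns it; the eye's visual latency masks the brief gap.
    first_stream = network.receive_updated_stream(timeout=0.2)
    if first_stream is not None:
        player.play(first_stream)
```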
  • an embodiment of the present application provides a video splicing method, where the method includes:
  • the terminal device receives the video splicing information sent by the acquisition device;
  • the terminal device sends the video splicing information to a network device, and the video splicing information is used to splice multiple video streams that are not spliced.
  • The video stitching information includes identifiers of the unspliced multiple video streams, synchronization information between the multiple video streams, and camera calibration parameters corresponding to each of the multiple video streams.
  • the method further includes:
  • the terminal device receives the unspliced multiple video streams sent by the collection device, and sends the unspliced multiple video streams to the network device.
  • an embodiment of the present application provides a device, which may be a collection device, a network device, a presentation device, or a terminal device, or a chip provided in the collection device, a network device, a presentation device, or a terminal device.
  • the device has the function of implementing the method described in the various possible designs of any one of the first to fifth aspects. This function can be realized by hardware, and can also be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules or units corresponding to the above functions.
  • An embodiment of the present application provides a device including a processor and a memory; the memory is configured to store computer-executable instructions, and when the device runs, the processor executes the computer-executable instructions stored in the memory, so that the device performs the method according to the various possible designs of any one of the first to fifth aspects described above.
  • An embodiment of the present application further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the methods described in any possible design of any one of the first to fifth aspects described above.
  • The present application also provides a computer program product including instructions that, when run on a computer, cause the computer to perform the method described in the various possible designs of any one of the first to fifth aspects.
  • FIG. 1a is a schematic diagram of a system architecture applicable to an embodiment of the present application.
  • FIG. 1b is another schematic diagram of a system architecture according to an embodiment of the present application.
  • FIG. 1c is another schematic diagram of a system architecture according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a video splicing method provided in Embodiment 1 of this application;
  • FIG. 3 is a schematic flowchart of a video splicing method provided in Embodiment 2 of the present application.
  • FIG. 4 is a schematic flowchart of a video stitching method provided in Embodiment 3 of the present application.
  • FIG. 5 is a schematic flowchart of a method for updating a video stream according to a fourth embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a method for updating a video stream provided by Embodiment 5 of the present application;
  • FIG. 7 is a schematic structural diagram of a device according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of another device according to an embodiment of the present application.
  • HTTP: hypertext transfer protocol
  • HLS: hypertext transfer protocol live streaming
  • CDN: content distribution network
  • CDN file recording and fragmentation operations generally introduce a delay of 5-10 s, so each CDN layer introduced for distribution adds a delay on the order of 5-10 s. Therefore, current large-scale panoramic video live broadcasts incur a delay of 10-15 s.
  • For real-time communication (RTC), the end-to-end delay needs to be less than 400 ms to be meaningful; from the perspective of improving the user experience, it needs to be less than 300 ms (or even less than 200 ms). Therefore, from the perspective of low latency, it is not yet possible to build a real-time panoramic video communication application based on streaming media technology.
  • Therefore, a real-time messaging protocol (RTMP) push method has been introduced in the live broadcasting field, while reducing the number of CDN levels and the transcoding operations on the CDN (such as multi-bitrate file transcoding).
  • the delay can be 2-5 seconds (without using a CDN or using only one layer of CDN).
  • However, the video carried inside RTMP is packaged in the flash video (FLV) format; although FLV fragments are smaller than the file fragments of HLS, the packaging still introduces a second-level delay.
  • an embodiment of the present application provides a video splicing method for reducing the transmission delay of panoramic video.
  • FIG. 1a is a schematic diagram of a system architecture applicable to an embodiment of the present application.
  • the system architecture includes: a collection device 101, a presentation device 102, and a network device 103.
  • The collection device 101 and the presentation device 102 may have a network access function. Specifically, the collection device 101 may establish a communication connection with the network device 103 through a network access method, such as a wireless network connection or a wired network connection, which is not specifically limited. Similarly, the presentation device 102 may also establish a communication connection with the network device 103 through a network access method. In this way, the collection device 101 and the presentation device 102 can perform signaling transmission (that is, signaling plane communication) through the network device 103.
  • The collecting device 101 may also have a function of collecting streaming media data (video data and/or audio data may be collectively referred to as streaming media data).
  • the collecting device 101 may be provided with a panoramic camera, and then collect the video data through the panoramic camera; correspondingly
  • The rendering device 102 may also have a function of playing audio and/or video for the user.
  • the rendering device 102 may be provided with a VR headset, which plays a panoramic video for the user through the VR headset. In this way, the collection device 101 and the presentation device 102 can perform streaming media data transmission (that is, media plane communication) through the network device 103.
  • FIG. 1b is a schematic diagram of another system architecture according to an embodiment of the present application.
  • The system architecture includes: a collection device 101, a presentation device 102, a first media server 1031, a core network element 1032, and a second media server 1033.
  • The collection device 101 and the presentation device 102 may have a network access function. Specifically, the collection device 101 may establish a communication connection with the first media server 1031 through a network access method, such as a wireless network connection or a wired network connection, which is not specifically limited. Similarly, the presentation device 102 may also establish a communication connection with the second media server 1033 through a network access method. In this way, the collection device 101 and the presentation device 102 can perform signaling transmission (that is, signaling plane communication) through the first media server 1031, the core network element 1032, and the second media server 1033.
  • The capturing device 101 may also have a function of collecting streaming media data; accordingly, the rendering device 102 may also have a function of playing audio and/or video for a user.
  • the media server is the core system of the streaming media application and the key platform for operators to provide video services to users.
  • the main function of the media server is to cache, schedule and transmit streaming media data.
  • The main function of the media server on the collection side (such as the first media server 1031) is to obtain streaming media data from the collection device 101 through a streaming media protocol and transmit the streaming media data to the media server on the presentation side; the main function of the media server on the presentation side (such as the second media server 1033) is to receive the streaming media data from the collection-side media server through the streaming media protocol and transmit the streaming media data to the presentation device 102 for playback. That is, the collection device 101 and the presentation device 102 can perform streaming media data transmission (that is, media plane communication) through the first media server 1031 and the second media server 1033.
  • the core network element 1032 is mainly responsible for signaling control during a call session.
  • the core network element may receive the signaling from the first media server 1031 and forward it to the second media server 1033.
  • the core network element 1032 may be a third-party application control platform or may be an operator's own device.
  • FIG. 1c is a schematic diagram of another system architecture according to an embodiment of the present application.
  • The system architecture includes: a collection device 101, a first terminal device 104, a presentation device 102, a second terminal device 105, a first media server 1031, a core network element 1032, and a second media server 1033.
  • the collection device 101 may establish a communication connection with the first terminal device 104, such as a wired connection or a wireless-fidelity (Wi-Fi) connection; similarly, the presentation device 102 may also establish a communication connection with the second terminal device 105, Such as a wired connection or a Wi-Fi connection.
  • the first terminal device 104 and the second terminal device 105 have a network access function.
  • The first terminal device 104 may establish a communication connection with the media server 1031 through a network access method, such as a wireless network connection or a wired network connection, which is not specifically limited. Similarly, the second terminal device 105 may also establish a communication connection with the media server 1033 through a network access method.
  • In this way, the collection device 101 can access the network through the first terminal device 104, and the presentation device 102 can access the network through the second terminal device 105, and they can then perform signaling transmission (that is, signaling plane communication) through the first media server 1031, the core network element 1032, and the second media server 1033.
  • the core network element is mainly used to implement signaling forwarding between the first media server 1031 and the second media server 1033. It should be noted that the present application does not limit the specific connection manner between the various devices.
  • The collection device 101 may have a function of collecting streaming media data; accordingly, the presentation device 102 may also have a function of playing audio and/or video for a user.
  • the collection device 101 and the presentation device 102 may perform streaming media data transmission (that is, media plane communication) through the media server 1031 and the media server 1033.
  • the capture device 101 may be a panoramic camera
  • the presentation device 102 may be a VR headset.
  • the core network element 1032 is mainly responsible for signaling control during a call session.
  • the core network element may receive the signaling from the first media server 1031 and forward it to the second media server 1033.
  • the core network element 1032 may be a third-party application control platform or may be an operator's own device.
  • The terminal device (such as the first terminal device 104 or the second terminal device 105) in the embodiments of the present application is a device with a wireless transmitting and receiving function, and may specifically be a mobile phone, a tablet computer, a computer with a wireless transceiver function, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical care, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, and the like.
  • the embodiment of the present application does not limit the application scenario.
  • A terminal device may also be referred to as user equipment (UE), an access terminal device, a UE unit, a UE station, a mobile station, a remote station, a remote terminal device, a mobile device, a UE terminal device, a terminal device, a wireless communication device, a UE agent, a UE device, or the like.
  • The network device 103 in FIG. 1a may be the media server 1031, the media server 1033, and the core network element 1032 shown in FIG. 1b and FIG. 1c.
  • In the system architectures shown in FIG. 1a and FIG. 1b, the acquisition device 101 and the presentation device 102 have a network access function, while in the system architecture shown in FIG. 1c, the acquisition device 101 and the presentation device 102 may not have a network access function and can instead access the network through the first terminal device 104 and the second terminal device 105, respectively.
  • the communication systems applicable to the system architectures shown in the above FIG. 1a, FIG. 1b, and FIG. 1c include, but are not limited to, a 5G new radio (NR) communication system and an IP multimedia system (IMS).
  • The media server may specifically be a multimedia function (MMF) network element or a multimedia resource function processor (MRFP).
  • the core network element may be an application function (AF) element.
  • The media server may specifically be a session border controller (SBC), and the core network element may specifically be a call session control function (CSCF) network element.
  • the panoramic camera collects multiple video streams
  • the multiple video streams need to be stitched to obtain the panoramic video stream, and then transmitted to the VR headset.
  • In-camera stitching in the panoramic camera is a key technical issue, and stitching the panoramic video stream inside the camera brings significant delay (according to current engineering analysis, the in-camera stitching link delay is about 200 ms).
  • A 200 ms delay is not significant relative to a second-level end-to-end delay, but in real-time communication, every 50 ms of delay reduction requires significant technical changes. Based on this, the embodiments of the present application introduce real-time stitching on the network side.
  • That is, the panoramic camera (the capture device) does not perform video stream stitching itself, but directly transmits the collected multiple video streams to the network device, which splices the multiple video streams. Since the network device can serve multiple capture devices, its performance is much stronger than that of a capture device; this can therefore obviously improve the splicing efficiency and reduce the splicing delay, thereby achieving the purpose of reducing the transmission delay of real-time communication.
  • In the embodiments of the present application, the transmission protocol used for signaling plane communication between the various devices may be the session initiation protocol (SIP), and the transmission protocol used for media plane communication may be the real-time transport protocol (RTP) / real-time transport control protocol (RTCP).
  • In addition, the acquisition device illustrated in FIG. 1c can use the real-time streaming protocol (RTSP) to interact with the first terminal device.
  • That is, video data is transmitted through RTP, video quality is controlled through RTCP, and video control (such as fast forward and rewind) is provided through RTSP.
  • Based on this, the embodiments of the present application extend the communication protocols used between the various devices (such as adding signaling or extending fields in existing signaling) to achieve real-time splicing on the network side.
  • Further, considering that the user's field of view may change (for example, when a user rotates his head while watching a video with a VR headset, the user's field of view changes), if the video stream corresponding to the changed field of view is not transmitted in a timely manner, the user experience may be poor.
  • Therefore, in the embodiments of the present application, the presentation device may send a specific RTCP message (an extended message defined in the embodiments of the present application) to the network device, where the specific RTCP message is used to indicate the user's current field of view (that is, the changed field of view). After receiving the message, the network device can process the spliced video stream according to the user's current field of view to obtain the video stream corresponding to the user's current field of view and transmit it to the rendering device. In this way, the video stream corresponding to the changed field of view can be transmitted in time, thereby improving the user experience.
  • FIG. 2 is a schematic flowchart of a video stitching method provided in Embodiment 1 of the present application. As shown in Figure 2, the method includes:
  • Step 201 The acquisition device sends video splicing information to a network device.
  • the video splicing information is used to splice multiple unspliced video streams.
  • The video splicing information may include identifiers of the unspliced multiple video streams, synchronization information between the multiple video streams, and camera calibration parameters corresponding to the multiple video streams.
  • Taking a 4-mesh camera as an example, the identifiers of the multiple video streams may include the identifiers of the 4 video streams collected by the 4-mesh camera, respectively: the identifier of video stream 1 (11111), the identifier of video stream 2 (22222), the identifier of video stream 3 (33333), and the identifier of video stream 4 (44444).
  • The synchronization information between the multiple video streams may indicate that the 4 video streams need to be synchronized; the specific content of the synchronization information and the implementation of synchronization are not limited in the embodiments of the present application.
  • The camera calibration parameters corresponding to each video stream may include one or more variables; Table 1 shows examples of the variables that the camera calibration parameters may include.
  • Table 1: Examples of variables in the camera calibration parameters

    No.  Variable                                     Expression
    1    Image width and height                       width, height
    2    Crop circle information                      cropx, cropy, cropw, croph
    3    Field of view                                v
    4    Posture information (three rotation angles)  y, r, p
    5    Translation                                  d, e
    6    Trim amount                                  g, t
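  • As an illustration only, the video splicing information and the Table 1 variables could be modeled as follows (a minimal Python sketch; the field names are assumptions derived from Table 1, not a normative format). For the 4-mesh camera example above, stream_ids would be ["11111", "22222", "33333", "44444"], with all four streams in one synchronization group:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CameraCalibration:
    """Per-stream calibration variables following Table 1 (names assumed)."""
    width: int                         # 1: image width
    height: int                        # 1: image height
    crop: Tuple[int, int, int, int]    # 2: crop circle (cropx, cropy, cropw, croph)
    fov: float                         # 3: field of view (v)
    yaw: float                         # 4: rotation angle y
    roll: float                        # 4: rotation angle r
    pitch: float                       # 4: rotation angle p
    translation: Tuple[float, float]   # 5: translation (d, e)
    trim: Tuple[float, float]          # 6: trim amount (g, t)

@dataclass
class VideoSplicingInfo:
    stream_ids: List[str]                  # identifiers of the unspliced streams
    sync_groups: List[List[str]]           # groups of streams to be synchronized
    calibrations: List[CameraCalibration]  # one entry per video stream
```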
  • the collection device may send a first call request message (specifically, an invite message) to the network device, and the first call request message carries video splicing information.
  • Considering that the existing call request message does not support carrying video splicing information, the embodiments of the present application extend the session description protocol (SDP) to achieve the purpose of sending the video splicing information to the network device through an invite message (a SIP message).
  • Table 2 illustrates one possible extension. The information carried in the multi-stream field of the SDP extension is the identifiers of the multiple unspliced video streams, and the information carried in the stream synchronization field of the SDP extension is the synchronization information between the multiple video streams. It should be noted that Table 2 only shows the stream synchronization fields for the case where two video streams need to be synchronized; when four or more video streams need to be synchronized, the implementation can be referred to by analogy, and details are not repeated here.
  • Accordingly, the embodiments of the present application also extend SDP on the network device, so that the network device has the ability to parse the extended SDP and can thus parse and obtain the video splicing information after receiving the invite message from the collection device.
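  • The following is a hypothetical sketch of how such an extended SDP offer could be constructed; the a=multi-stream and a=stream-synchronization attribute names stand in for the Table 2 fields and are not the exact syntax defined there:

```python
def build_splicing_sdp(stream_ids, sync_pairs):
    """Build an SDP body carrying video splicing information (illustrative)."""
    lines = [
        "v=0",
        "o=- 0 0 IN IP4 192.0.2.1",   # example origin address
        "s=panoramic-capture",
        "t=0 0",
    ]
    for sid in stream_ids:
        lines += [
            "m=video 5004 RTP/AVP 96",        # one m-line per unspliced stream
            "a=rtpmap:96 H264/90000",
            f"a=multi-stream:{sid}",          # identifier of this unspliced stream
        ]
    for a, b in sync_pairs:
        lines.append(f"a=stream-synchronization:{a} {b}")  # streams to synchronize
    return "\r\n".join(lines) + "\r\n"

offer = build_splicing_sdp(["11111", "22222", "33333", "44444"],
                           [("11111", "22222"), ("33333", "44444")])
```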
  • Step 202 After receiving the video splicing information sent by the collection device, the network device sends video playback information to the presentation device.
  • the video playback information may be used to indicate the format of the video stream transmitted by the network device to the presentation device, for example, it may be an omnidirectional media format (OMAF) or a Tiled VR format.
  • the network device may send a second call request message to the presentation device, and the second call request message includes the video playback information.
  • Step 203 The rendering device receives the video playback information and returns a first response message (for example, it may be a 183 message) to the network device.
  • the first response message may carry media plane address information of the rendering device, so as to facilitate subsequent media plane communication between the network device and the rendering device.
  • Step 204 After receiving the first response message, the network device sends a second response message (for example, it may be a 183 message) to the collection device.
  • the second response message may carry media plane address information of the network device, so as to facilitate subsequent media plane communication between the collecting device and the network device.
  • At this point, the signaling plane communication between the acquisition device and the presentation device is completed, and a communication connection is established.
  • the above steps 201 to 204 simply indicate the signaling plane communication process, and other steps may be involved in the specific implementation, which is not limited in this embodiment of the present application.
  • The above step flow of the signaling plane communication may be the same as the step flow of signaling plane communication in the prior art, but the content carried in the signaling transmitted between the devices is different from that in the prior art; for example, the first call request message sent by the collection device to the network device may carry video splicing information.
  • the embodiments of the present application only need to expand the communication protocol between various devices, and can be applied to the existing communication process, so that it has strong applicability and is relatively easy to implement.
  • Step 205 After the acquisition device determines that the network device receives the video splicing information, it sends multiple video streams that are not spliced to the network device.
  • There are many ways in which the acquisition device may determine that the network device has received the video splicing information. For example, if the acquisition device sends the video splicing information to the network device through the first call request message, the acquisition device may determine that the network device has received the video splicing information after receiving the call response message (200 OK) for the first call request message returned by the network device.
  • the collecting device may send the unspliced multiple video streams to the network device according to the media plane address information of the network device carried in the second response message.
  • Step 206 The network device receives the multiple video streams that are not stitched, and stitches the multiple video streams that are not stitched according to the video stitching information, and processes the stitched video streams according to the user's current field of view to obtain the first video stream.
  • the field of view corresponding to the first video stream is the current field of view of the user.
  • Further, the network device may also obtain a second video stream according to the spliced video stream, where the field of view corresponding to the second video stream is greater than the current field of view of the user, and the video quality of the second video stream is lower than the video quality of the first video stream.
  • the second video stream is a panoramic video stream.
  • There are many specific implementation manners in which the network device may obtain the first video stream according to the spliced panoramic video stream. For example, the network device crops the panoramic video stream according to the current field of view of the user to obtain the video stream corresponding to the current field of view of the user (that is, the first video stream); this is not specifically limited.
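  • A minimal cropping sketch, assuming an equirectangular panorama and a simple rectangular longitude/latitude window (a production system would perform a proper perspective reprojection):

```python
import numpy as np

def crop_fov(panorama, yaw_deg, pitch_deg, h_fov_deg, v_fov_deg):
    """Crop a viewport from an equirectangular panorama (H x W x 3 array)."""
    h, w = panorama.shape[:2]
    cx = int((yaw_deg % 360.0) / 360.0 * w)        # view centre, x (longitude)
    cy = int((90.0 - pitch_deg) / 180.0 * h)       # view centre, y (latitude)
    half_w = int(h_fov_deg / 360.0 * w / 2)
    half_h = int(v_fov_deg / 180.0 * h / 2)
    cols = np.arange(cx - half_w, cx + half_w) % w                 # wrap at 360°
    rows = np.clip(np.arange(cy - half_h, cy + half_h), 0, h - 1)  # clamp poles
    return panorama[np.ix_(rows, cols)]
```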
  • Step 207 The network device sends a first video stream to the presentation device.
  • the second video stream may be sent to the presentation device simultaneously or separately.
  • the first video stream may carry the identifier of the first video stream
  • the second video stream may carry the identifier of the second video stream.
  • the first video stream and the second video stream can be distinguished according to the identification.
  • The identifier of the first video stream and the identifier of the second video stream may be represented by different synchronization source (SSRC) fields; that is, both the identifier of the first video stream and the identifier of the second video stream may be SSRC identifiers.
  • the current field of view of the user can be obtained in various ways. The following two scenarios are described respectively.
  • Scenario 1: the user's field of view changes during media plane communication.
  • In this scenario, the presentation device can monitor the user's field of view in real time and, after determining that the user's field of view has changed, send a request message (the message name may be defined as Refresh FOV) to the network device to indicate the current field of view of the user; in this way, after receiving the request message, the network device can obtain the current field of view of the user.
  • the request message may be an RTCP message, and the request message may include the current field of view information of the user.
  • The current field of view information of the user may include the center azimuth corresponding to the user's current field of view, the center elevation corresponding to the user's current field of view, the azimuth range corresponding to the user's current field of view, the elevation range corresponding to the user's current field of view, and the center tilt angle corresponding to the user's current field of view; this is not restricted.
  • Further, the request message may also include the identifier of the first video stream. After receiving the request message sent by the rendering device, the network device may determine, according to the synchronization source identifier carried in the request message, that the video stream to be updated is the first video stream.
  • Table 3: Examples of key fields of the request message
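  • As an illustration, a Refresh FOV message could be encoded as a standard RTCP APP packet (payload type 204, the usual container for application-defined RTCP extensions); the payload layout below (SSRC of the stream to update plus five angles in hundredths of a degree) is an assumption, not the key-field format of Table 3:

```python
import struct

def build_refresh_fov(ssrc_sender, ssrc_target, azimuth, elevation,
                      azimuth_range, elevation_range, tilt):
    """Pack a hypothetical 'Refresh FOV' request as an RTCP APP packet."""
    angles = struct.pack("!5i", *(int(a * 100) for a in
                                  (azimuth, elevation, azimuth_range,
                                   elevation_range, tilt)))
    payload = struct.pack("!I", ssrc_target) + angles    # stream to update + FOV
    length = (8 + 4 + len(payload)) // 4 - 1             # RTCP length field
    header = struct.pack("!BBH", 0x80, 204, length)      # V=2, subtype=0, PT=APP
    return header + struct.pack("!I", ssrc_sender) + b"RFOV" + payload
```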
  • Scenario 2: at the initial stage of media plane communication, the presentation device may send the request message to the network device to indicate the current field of view of the user.
  • A possible implementation manner is that a default field of view is preset in the network device. In the initial stage of media plane communication, the network device can obtain a low-resolution panoramic video stream (corresponding to the second video stream) according to the spliced video streams, and first process the spliced video stream based on the default field of view to obtain a fourth video stream (corresponding to the first video stream), where the field of view corresponding to the fourth video stream is the default field of view; the network device then sends the low-resolution panoramic video stream and the fourth video stream to the rendering device.
  • After determining the current field of view of the user, the rendering device may send a request message to the network device to indicate the user's current field of view; in this way, after receiving the request message, the network device can obtain the current field of view of the user.
  • That is, the presentation device may also actively send a request message to the network device during the initial stage of media plane communication.
  • The key fields of the request message in scenario 2 may be the same as the key fields of the request message in scenario 1, and details are not described herein again.
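  • A server-side sketch of scenario 2 follows (all parameter interfaces are assumptions): the network device streams a low-resolution panorama plus a viewport cropped at the default field of view, and updates the viewport whenever a Refresh FOV request arrives:

```python
def serve_media_session(stitcher, sender, default_fov, crop, downscale):
    """Network-device media loop for scenario 2 (illustrative only)."""
    fov = default_fov                                 # preset default field of view
    while True:
        panorama = stitcher.next_stitched_frame()     # spliced video stream
        # Low-resolution panorama = the second video stream.
        sender.send(downscale(panorama), stream_id="panorama-low")
        # Viewport at the default/current FOV = the fourth (later first) stream.
        sender.send(crop(panorama, fov), stream_id="viewport")
        # A Refresh FOV request updates the viewport for subsequent frames.
        request = sender.poll_refresh_fov()
        if request is not None:
            fov = request.fov
```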
  • Step 208 The rendering device receives the first video stream and plays the first video stream.
  • Specifically, if the network device sends both the first video stream and the second video stream, the rendering device may receive the first video stream and the second video stream, identify the first video stream according to the identifier of the first video stream and the identifier of the second video stream, and then play the first video stream.
  • the video splicing information is sent by the capture device to the network device.
  • In this way, the capture device does not need to splice the multiple video streams in the subsequent process; instead, the network device splices the multiple video streams according to the video splicing information. Because the processing capability of the network device is stronger than that of the acquisition device, this can effectively improve the efficiency of video splicing and reduce transmission delay.
  • Further, the presentation device may send a request message to the network device after determining that the field of view of the user has changed, and first process the second video stream according to the user's current field of view to obtain a third video stream, where the field of view corresponding to the third video stream is the current field of view of the user, and then play the third video stream; subsequently, after receiving the video stream corresponding to the user's current field of view returned by the network device according to the request message, the presentation device can play that video stream.
  • That is, the rendering device first obtains and plays a third video stream with lower video quality according to the second video stream, so as to respond to the change of field of view in time; after the network device receives the request message, the video stream corresponding to the user's current field of view can be adjusted in time and fed back to the presentation device for playback. Due to the visual latency of the human eye, the user may not perceive this short delay while watching the video, which can effectively improve the user experience.
  • In the foregoing steps, the rendering device processes the second video stream to obtain the third video stream, and the network device processes the spliced video stream to obtain the first video stream. The network device involved in the foregoing steps may specifically comprise a first media server, a second media server, and a core network element.
  • Embodiment 1 mainly describes, based on FIG. 1a, the process of video splicing performed by a network device. In the system architectures shown in FIG. 1b and FIG. 1c, the entity performing video splicing may be the first media server or the second media server.
  • FIG. 3 is a schematic flowchart of a video stitching method provided in Embodiment 2 of the present application. As shown in Figure 3, the method includes:
  • Step 301 The acquisition device sends video splicing information to the first media server.
  • the collection device may send a first call request message (specifically, an invite message) to the first media server, and the first call request message carries video splicing information.
  • Step 302 After receiving the video stitching information sent by the collection device, the first media server forwards the video stitching information to a core network element.
  • the first media server may send a second call request message (specifically, an invite message) to the core network element, and the second call request message carries video splicing information.
  • Step 303 After receiving the video splicing information sent by the first media server, the core network element forwards the video splicing information to the second media server.
  • the core network element may send a third call request message (specifically, an invite message) to the second media server, and the third call request message carries video splicing information.
  • It should be noted that this embodiment of the present application can extend SDP to achieve the purpose of carrying the video splicing information through the invite message; for specific methods, see the description in Embodiment 1, which is not repeated here.
  • Step 304 After receiving the video splicing information sent by the core network element, the second media server sends video playback information to the presentation device.
  • the second media server may send a fourth call request message (specifically, an invite message) to the second terminal device, and the fourth call request message carries video playback information.
  • the video playback information can be used to indicate the format of the video stream transmitted by the network device to the presentation device, for example, it can be in the OMAF or Tiled VR format.
  • Step 305 After receiving the video playback information, the rendering device sends a first response message (specifically, a 183 message) to the second media server.
  • The first response message may carry the media plane address of the second terminal device, so as to facilitate subsequent media plane communication between the second media server and the second terminal device.
  • Step 306 After receiving the first response message, the second media server sends a second response message to the core network element.
  • the second response message may carry the media plane address of the second media server.
  • Step 307 After receiving the second response message sent by the second media server, the core network element sends a third response message to the first media server.
  • The third response message may carry the media plane address of the second media server, so as to facilitate subsequent media plane communication between the first media server and the second media server.
  • The core network element may not participate in the media plane communication; it is mainly used to forward signaling between the first media server and the second media server in signaling plane communication.
  • Step 308 After receiving the third response message sent by the core network element, the first media server sends a fourth response message to the collection device.
  • The above steps 301 to 308 specifically illustrate the signaling transmission process in which the first media server, the second media server, and the core network element participate; for other content, refer to the description of steps 201 to 204 above, which is not repeated here.
  • Step 309 The acquisition device sends a plurality of unspliced video streams to the first media server.
  • Step 310 The first media server receives the multiple unspliced video streams sent by the collection device, and sends them to the second media server.
  • Step 311 The second media server splices the unspliced multiple video streams according to the video splicing information, and processes the spliced video stream according to the user's current field of view to obtain a first video stream, where the field of view corresponding to the first video stream is the current field of view of the user.
  • Further, the second media server may also obtain a second video stream according to the spliced video stream, where the field of view corresponding to the second video stream is greater than the current field of view of the user, and the video quality of the second video stream is lower than the video quality of the first video stream.
  • the second video stream is a panoramic video stream.
  • Step 312 The second media server sends the first video stream to the presentation device.
  • Optionally, if the second media server also generates a second video stream in step 311, the first video stream and the second video stream may be sent to the presentation device at the same time.
  • Step 313 The presentation device receives the first video stream and plays the first video stream.
  • Here, if the second media server sends the first video stream and the second video stream in step 312, the presentation device can correspondingly receive the first video stream and the second video stream, identify the first video stream according to the identifier of the first video stream and the identifier of the second video stream, and then play the first video stream.
  • Compared with steps 205 to 208 shown in FIG. 2, steps 309 to 313 above specifically illustrate the video stream transmission process in which the first media server and the second media server participate; for all other content, refer to the description of steps 205 to 208 above, which is not repeated here.
  • In Embodiment 2, the SDP extension needs to be performed on the capture device, the first media server, the core network element, and the second media server, so that the SIP messages transmitted among the capture device, the first media server, the core network element, and the second media server can carry video splicing information.
  • According to the above description, the capture device sends the video splicing information to the second media server, so that in the subsequent process the capture device does not need to splice the multiple video streams; the second media server splices them according to the video splicing information. Because the processing capability of the second media server is stronger than that of the capture device, this can effectively improve the efficiency of video splicing and reduce the transmission delay.
  • Embodiment 3: the following describes the embodiment of this application mainly based on the system architecture shown in FIG. 1c.
  • FIG. 4 is a schematic flowchart of the video splicing method provided in Embodiment 3 of this application. As shown in FIG. 4, the method includes:
  • Step 400a The first terminal device establishes a connection with the capture device.
  • In a possible implementation, the first terminal device and the capture device can be automatically paired using an existing near-field device pairing technology to establish the connection, and the first terminal device configures an RTSP address for the capture device.
  • Step 400b The second terminal device establishes a connection with the presentation device.
  • In a possible implementation, the second terminal device and the presentation device may establish the connection through Wi-Fi or a universal serial bus (USB).
  • Step 401 The first terminal device sends a description request message (DESCRIBE request) to the capture device.
  • Here, the description request message is used to request the media initialization description information of the capture device.
  • Step 402 The capture device returns a description response message (DESCRIBE response) to the first terminal device according to the description request message.
  • the description response message may be a 200 OK message.
  • the description response message may carry media initialization description information.
  • In this embodiment of this application, the description response message may also carry video splicing information. Considering that the existing description response message does not support carrying video splicing information, this embodiment of this application can extend SDP to send the video splicing information to the first terminal device through the description response message. For the specific description of the video splicing information and for the implementation of the SDP extension, see the introduction in Embodiment 1, which is not repeated here.
  • By performing the SDP extension on the capture device and the first terminal device, the RTSP messages transmitted between the capture device and the first terminal device can carry video splicing information.
  • Step 403 After receiving the description response message, the first terminal device sends a first call request message (specifically, an invite message) to the first media server, where the first call request message carries the video splicing information.
  • Step 404 After receiving the first call request message sent by the first terminal device, the first media server sends a second call request message (specifically, an invite message) to the core network element, where the second call request message carries the video splicing information.
  • Step 405 After receiving the second call request message sent by the first media server, the core network element sends a third call request message (specifically, an invite message) to the second media server, where the third call request message carries the video splicing information.
  • For steps 403 to 405, considering that the existing invite message does not support carrying video splicing information, this embodiment of this application may extend SDP so that the invite message carries the video splicing information; for the specific method, see the description in Embodiment 1, which is not repeated here.
  • By performing the SDP extension on the first terminal device, the first media server, the core network element, and the second media server, the SIP messages transmitted among the first terminal device, the first media server, the core network element, and the second media server can carry video splicing information.
  • Step 406 After receiving the third call request message sent by the core network element, the second media server sends a fourth call request message (specifically, an invite message) to the second terminal device, where the fourth call request message carries video playback information.
  • the video playback information can be used to indicate the format of the video stream transmitted by the network device to the presentation device, for example, it can be in the OMAF or Tiled VR format.
  • Step 407 After receiving the fourth call request message, the second terminal device sends a first response message (specifically, a 183 message) to the second media server.
  • Here, the first response message may carry the media plane address of the second terminal device, so as to facilitate subsequent media plane communication between the second media server and the second terminal device.
  • Step 408 After receiving the first response message sent by the second terminal device, the second media server sends a second response message to the core network element.
  • the second response message may carry the media plane address of the second media server.
  • Step 409 After receiving the second response message sent by the second media server, the core network element sends a third response message to the first media server.
  • Here, the third response message may carry the media plane address of the second media server, so as to facilitate subsequent media plane communication between the first media server and the second media server.
  • It should be noted that the core network element may not participate in media plane communication; it is mainly used to forward signaling between the first media server and the second media server in signaling plane communication.
  • Step 410 After receiving the third response message sent by the core network element, the first media server sends a fourth response message to the first terminal device.
  • Here, the fourth response message may carry the media plane address of the first media server, so as to facilitate subsequent media plane communication between the first terminal device and the first media server.
  • Step 411 After receiving the fourth response message, the first terminal device sends a setup request message (SETUP request) to the capture device.
  • Here, the setup request message can be used to set the attributes and transmission mode of the session, remind the capture device to establish the session, and so on. The setup request message may carry transmission address information (to facilitate communication after the session is established). In a possible implementation, the transmission address information may include the media plane address of the first terminal device; further, the first terminal device may establish a correspondence between the media plane address of the first terminal device and the media plane address of the first media server.
  • Step 412 The capture device returns a setup response message (SETUP response) to the first terminal device.
  • Here, the setup response message may be a 200 OK message, used to establish a session with the first terminal device and return a session identifier and session-related information.
  • Step 413 The first terminal device sends a play request message (PLAY request) to the capture device.
  • Here, the play request message is used to request playback, that is, to request the capture device to send the video streams.
  • Step 414 The capture device sends the multiple unspliced video streams to the first terminal device according to the play request message.
  • Step 415 The first terminal device sends the multiple unspliced video streams to the first media server.
  • Here, the first terminal device may send the multiple unspliced video streams to the first media server according to the correspondence between the media plane address of the first terminal device and the media plane address of the first media server. For example, if media plane address 1a of the first terminal device corresponds to media plane address 1b of the first media server, the first terminal device may forward the video streams received through address 1a to address 1b.
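  • The address correspondence described above can be pictured as a simple relay; the following sketch (all addresses and ports are illustrative assumptions) forwards RTP packets received on a media plane address of the first terminal device to the mapped media plane address of the first media server:
```python
# Sketch of the address correspondence in step 415: packets received on a local
# media plane address (1a) are relayed unchanged to the mapped first-media-server
# address (1b). Addresses and ports are illustrative.
import socket

ADDRESS_MAP = {("0.0.0.0", 5004): ("198.51.100.20", 6004)}   # 1a -> 1b

def forward(local_addr):
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(local_addr)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    remote = ADDRESS_MAP[local_addr]
    while True:                          # blocking relay loop
        packet, _ = rx.recvfrom(2048)    # RTP packet from the capture device
        tx.sendto(packet, remote)        # forwarded to the first media server

forward(("0.0.0.0", 5004))
```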
  • It should be noted that the above procedure is described by using an example in which the transmission address information includes the media plane address of the first terminal device. In another possible implementation, the transmission address information includes the media plane address of the first media server; in this case, after receiving the play request message sent by the first terminal device, the capture device may directly send the multiple unspliced video streams to the first media server without forwarding by the first terminal device, which can effectively improve transmission efficiency and reduce the resource consumption of the first terminal device.
  • Step 416 The first media server receives the unspliced multiple video streams and sends the unspliced multiple video streams to the second media server.
  • Step 417 The second media server receives the multiple unspliced video streams, splices them according to the video splicing information, and processes the spliced video stream according to the user's current field of view to obtain a first video stream; the field of view corresponding to the first video stream is the user's current field of view.
  • Optionally, the second media server may also obtain a second video stream from the spliced video stream, where the field of view corresponding to the second video stream is greater than the user's current field of view and the video quality of the second video stream is lower than that of the first video stream. In one example, the second video stream is a panoramic video stream.
  • In the initial stage of media plane communication between the second media server and the second terminal device, in one possible implementation the second media server is preset with a default field of view. After splicing the multiple video streams, the second media server can obtain a low-resolution panoramic video stream (corresponding to the second video stream) from the spliced video stream and first process the spliced video stream based on the default field of view to obtain a fourth video stream (corresponding to the first video stream), where the field of view corresponding to the fourth video stream is the default field of view; the second media server then sends the low-resolution panoramic video stream and the fourth video stream to the presentation device.
  • Correspondingly, after receiving the low-resolution panoramic video stream and the fourth video stream, if the presentation device determines that the user's current field of view differs from the default field of view, it may send a request message to the second media server, where the request message is used to indicate the user's current field of view. In this way, after receiving the request message, the second media server can obtain the user's current field of view and process the spliced video stream according to the user's current field of view to obtain the first video stream.
  • In other possible implementations, the presentation device may also actively send a request message to the second media server in the initial stage of media plane communication; after receiving the request message, the second media server can likewise obtain the user's current field of view and process the spliced video stream according to the user's current field of view to obtain the first video stream.
  • During media plane communication between the second media server and the second terminal device, the presentation device can monitor the user's field of view in real time and send a request message to the second media server after determining that the user's field of view has changed; after receiving the request message, the second media server can obtain the user's current field of view and process the spliced video stream according to the user's current field of view to obtain the first video stream.
  • It should be noted that the request message involved above may be a specific RTCP message extended in this embodiment of this application; for details, see the description in Embodiment 1, which is not repeated here.
  • Step 418 The second media server sends the first video stream to the second terminal device.
  • Optionally, if the second media server also generates a second video stream in step 417, the second video stream may be sent to the presentation device simultaneously or separately.
  • Step 419 The second terminal device receives the first video stream and plays the first video stream through the presentation device.
  • Here, if the second media server sends the first video stream and the second video stream in step 418, the second terminal device can correspondingly receive the first video stream and the second video stream, identify the first video stream according to the identifier of the first video stream and the identifier of the second video stream, and then play the first video stream through the presentation device.
  • It should be noted that the above procedure is only an example; in specific implementations, steps may be added on this basis, some of the above steps may be deleted, or some of the above steps may be replaced, without specific limitation. For example, if the media initialization description information can be obtained in another way, step 401 and step 402 may not be performed.
  • With regard to Embodiments 1 to 3 above, it should be noted that the method in Embodiment 1 of updating the first video stream by having the presentation device send a request message to the network device is applicable to many scenarios; for example, it can also be applied to scenarios in which video splicing is performed by the capture device. Further, the network device here may be the second media server in Embodiments 2 and 3, and the second media server may be deployed in an edge data center of the presentation device in Embodiment 2 or of the second terminal device in Embodiment 3, for example, mobile edge computing (MEC) in a 5G NR scenario. In this way, the transmission delay of the request message sent by the presentation device or the second terminal device to the second media server, and the transmission delay of the updated video stream returned by the second media server to the presentation device or the second terminal device according to the request message, can be effectively reduced, thereby further improving the user experience.
  • In the embodiments of this application, it is considered that the user's field of view may change frequently during media plane communication; if the video stream corresponding to the user's field of view is not updated in a timely and effective manner at such times, the user experience deteriorates. Based on this, the embodiments of this application further provide a method for updating a video stream, used to update the video stream corresponding to the user's field of view in time when the user's field of view changes, thereby improving the user experience. Embodiment 4 and Embodiment 5 are described in detail below.
  • Embodiment 4: the method for updating a video stream is described below mainly based on the system architecture shown in FIG. 1a.
  • FIG. 5 is a schematic flowchart of the method for updating a video stream provided in Embodiment 4 of this application. As shown in FIG. 5, the method includes:
  • Step 501 The presentation device sends a request message to the network device, where the request message is used to indicate the user's current field of view.
  • the request message may be a specific RTCP message extended in the embodiment of the present application.
  • the request message may include the current field of view information of the user.
  • In one example, the user's current field of view information may include the center azimuth corresponding to the user's current field of view, the center elevation corresponding to the user's current field of view, the azimuth range corresponding to the user's current field of view, the elevation range corresponding to the user's current field of view, and the center tilt angle corresponding to the user's current field of view, which is not limited in this embodiment of this application.
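  • Since the concrete layout of this extended RTCP message is given in the original publication only as a table image, the following sketch uses an assumed encoding, an RTCP APP packet (payload type 204) named "RFOV" that carries the SSRC of the first video stream and the five field-of-view values as 32-bit fixed-point angles:
```python
# Sketch of packing the request message ("Refresh FOV") described above.
# The APP-packet name "RFOV" and the 16.16 fixed-point angle encoding are
# assumptions; only PT=204 (APP) is standard RTCP.
import struct

def build_refresh_fov(ssrc, center_az, center_el, az_range, el_range, center_tilt):
    angles = (center_az, center_el, az_range, el_range, center_tilt)
    payload = struct.pack("!I", ssrc)                 # stream to update
    payload += b"RFOV"                                # assumed APP packet name
    payload += struct.pack("!5i", *(int(a * 65536) for a in angles))
    # RTCP header: V=2, P=0, subtype=0, PT=204 (APP), length in 32-bit words - 1
    header = struct.pack("!BBH", 0x80, 204, len(payload) // 4)
    return header + payload

pkt = build_refresh_fov(ssrc=0x1234ABCD, center_az=30.0, center_el=-10.0,
                        az_range=90.0, el_range=90.0, center_tilt=0.0)
```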
  • In a possible implementation, when the presentation device determines during video playback that the user's field of view has changed, it may send the request message to the network device.
  • Before this, the method may further include:
  • Step 500a The network device sends a first video stream to the presentation device, where the field of view corresponding to the first video stream is the user's current field of view (the field of view before the change).
  • the network device may further send a second video stream to the presentation device, and the field of view corresponding to the second video stream is greater than the current field of view of the user.
  • the second video stream may be a panoramic video stream.
  • Further, the identifier of the first video stream may be carried in the first video stream, and the identifier of the second video stream may be carried in the second video stream.
  • the video quality of the second video stream may be lower than the video quality of the first video stream, so as to reduce the consumption of network transmission resources.
  • Step 500b The presentation device receives the first video stream and plays the first video stream.
  • Here, if the network device sends the first video stream and the second video stream, the presentation device may receive the first video stream and the second video stream, identify the first video stream according to the identifier of the first video stream and the identifier of the second video stream, and then play the first video stream.
  • the request message may further include an identifier of the first video stream.
  • In this way, after receiving the request message sent by the presentation device, the network device can determine, according to the synchronization source identifier carried in the request message, that the video stream to be updated is the first video stream.
  • After determining that the user's field of view has changed, the presentation device may further process the second video stream according to the user's current field of view to obtain a third video stream, where the field of view corresponding to the third video stream is the changed field of view, and play the third video stream. In this way, after the user's field of view changes, the presentation device first obtains and plays the lower-quality third video stream from the second video stream, so as to respond to the change of field of view in time; subsequently, after receiving the video stream corresponding to the changed field of view returned by the network device according to the request message, it can start playing that video stream.
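  • The fallback behavior described above can be sketched as follows; extract_viewport stands in for real re-projection, and the point is the control flow, not the rendering:
```python
# Sketch of the fallback logic: after a field-of-view change, the presentation
# device renders a viewport from the locally held panoramic (second) stream
# until the updated first video stream arrives from the network device.
def extract_viewport(panorama_frame, fov):
    return panorama_frame                  # placeholder for re-projection/crop

class Player:
    def __init__(self):
        self.fov = (0.0, 0.0)
        self.updated_stream_ready = False  # set when the new first stream arrives

    def on_fov_change(self, new_fov, send_request, panorama_frames):
        self.fov = new_fov
        self.updated_stream_ready = False
        send_request(new_fov)              # the extended RTCP request message
        # Third video stream: lower quality, derived locally, played at once.
        for frame in panorama_frames:
            if self.updated_stream_ready:
                break                      # switch to the returned first stream
            self.display(extract_viewport(frame, new_fov))

    def display(self, frame):
        pass                               # hand the frame to the renderer
```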
  • It can be understood that the presentation device may also send the request message to the network device when triggered by other situations, without specific limitation.
  • Step 502 The network device receives the request message sent by the presentation device, obtains the spliced video stream, and processes the spliced video stream according to the user's current field of view to obtain a first video stream; the field of view corresponding to the first video stream is the user's current field of view (the field of view after the change).
  • There are multiple specific implementations for the network device to obtain the spliced video stream. One possible implementation is that, after capturing the multiple unspliced video streams, the capture device splices them according to the video splicing information and sends the spliced video stream to the network device; in this way, the network device obtains the spliced video stream.
  • Another possible implementation is that the capture device sends the video splicing information to the network device during signaling plane communication and directly sends the multiple unspliced video streams to the network device; in this way, the network device can splice the received unspliced video streams according to the video splicing information to obtain the spliced video stream.
  • the network device may also obtain a second video stream according to the spliced video stream.
  • Step 503 The network device sends the first video stream (the video stream corresponding to the changed field of view) to the presentation device.
  • Optionally, if the network device also generates a second video stream in step 502, the second video stream may be sent to the presentation device simultaneously or separately.
  • Step 504 The presentation device receives the first video stream returned by the network device according to the request message and plays the first video stream.
  • Here, if the network device sends the first video stream and the second video stream, the presentation device may receive the first video stream and the second video stream, identify the first video stream according to the identifier of the first video stream and the identifier of the second video stream, and then play the first video stream.
  • According to the above description, by sending the request message, the presentation device can notify the network device of the user's current field of view, so that the network device can update and obtain the video stream corresponding to the user's current field of view in time and feed it back to the presentation device for playback, thereby effectively improving the user experience.
  • It can be understood that the method for updating a video stream provided in this embodiment of this application may also be applied to the system architectures shown in FIG. 1b and FIG. 1c. The following takes the system architecture shown in FIG. 1c as an example and, with reference to Embodiment 5, describes in detail the process of updating the video stream when the user's field of view changes.
  • Embodiment 5: the method for updating a video stream is described below mainly based on the system architecture shown in FIG. 1c.
  • FIG. 6 is a schematic flowchart of the method for updating a video stream provided in Embodiment 5 of this application. As shown in FIG. 6, the method includes:
  • Step 601 The second media server sends a first video stream and a second video stream to the second terminal device.
  • Here, the first video stream may carry the identifier of the first video stream, and the second video stream may carry the identifier of the second video stream.
  • the identifier of the first video stream and the identifier of the second video stream may be represented using different SSRC fields.
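  • As an illustration of distinguishing the two streams by SSRC (the SSRC occupies bytes 8-11 of the fixed RTP header; the two SSRC values below are assumptions):
```python
# Sketch of SSRC-based routing on the receiving side: the first video stream
# is queued for playback, the panoramic second stream is buffered for the
# field-of-view-change fallback.
import struct

FIRST_STREAM_SSRC = 0x1234ABCD             # FOV-specific stream (assumed value)
SECOND_STREAM_SSRC = 0x5678EF01            # panoramic stream (assumed value)

def route_rtp(packet, play_queue, fallback_buffer):
    ssrc = struct.unpack_from("!I", packet, 8)[0]   # bytes 8-11 of RTP header
    if ssrc == FIRST_STREAM_SSRC:
        play_queue.append(packet)          # played directly (step 602)
    elif ssrc == SECOND_STREAM_SSRC:
        fallback_buffer.append(packet)     # kept for step 604b
```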
  • the field of view corresponding to the first video stream is the user's current field of view (the field of view angle before the change), and the field of view corresponding to the second video stream is greater than the user's current field of view.
  • the second video stream may be a panoramic video stream.
  • the video quality of the second video stream may be lower than the video quality of the first video stream, so as to reduce the consumption of network transmission resources.
  • Step 602 The second terminal device receives the first video stream and the second video stream, identifies the first video stream according to the identifier of the first video stream and the identifier of the second video stream, and then plays the first video stream through the presentation device.
  • Step 603 The second terminal device determines that the user's field of view has changed.
  • Here, the second terminal device can monitor the user's field of view in real time, so as to identify a change in the user's field of view promptly; alternatively, the second terminal device can monitor the user's field of view periodically. This may be set according to actual needs and is not limited in this embodiment of this application.
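  • A minimal sketch of such monitoring, assuming a stubbed head-pose read, an illustrative polling period, and an illustrative change threshold:
```python
# Sketch of the field-of-view monitoring in step 603: poll the headset pose and
# treat a change beyond a small threshold as "the user's field of view changed".
import time

THRESHOLD_DEG = 2.0                        # assumed change threshold
PERIOD_S = 0.05                            # ~20 Hz polling, assumed

def read_head_pose():                      # stub for the real sensor API
    return (0.0, 0.0)                      # (azimuth, elevation) in degrees

def monitor(on_change):
    last = read_head_pose()
    while True:
        time.sleep(PERIOD_S)
        cur = read_head_pose()
        if any(abs(c - l) > THRESHOLD_DEG for c, l in zip(cur, last)):
            on_change(cur)                 # e.g. send the request message (604a)
            last = cur
```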
  • Step 604a The second terminal device sends a request message to the second media server, where the request message is used to indicate the user's current field of view (in this case, the changed field of view).
  • the request message may be a specific RTCP message extended in the embodiment of the present application.
  • the request message may include the current field of view information of the user.
  • In one example, the user's current field of view information may include the center azimuth corresponding to the user's current field of view, the center elevation corresponding to the user's current field of view, the azimuth range corresponding to the user's current field of view, the elevation range corresponding to the user's current field of view, and the center tilt angle corresponding to the user's current field of view, which is not limited in this embodiment of this application.
  • The request message may also include the identifier of the first video stream. In this way, after receiving the request message sent by the second terminal device, the second media server can determine, according to the synchronization source identifier carried in the request message, that the video stream to be updated is the first video stream.
  • Step 604b The second terminal device processes the second video stream according to the changed field of view to obtain a third video stream and plays the third video stream through the presentation device.
  • the field of view corresponding to the third video stream is a changed field of view.
  • Step 605 The second media server receives the request message sent by the second terminal device, obtains the spliced video stream, and processes the spliced video stream according to the changed field of view to obtain the first video stream (at this time the updated video stream, that is, the video stream corresponding to the changed field of view). In addition, the second media server may also obtain the second video stream from the spliced video stream.
  • There are multiple specific implementations for the second media server to obtain the spliced video stream. One possible implementation is that, after capturing the multiple unspliced video streams, the capture device splices them according to the video splicing information and sends the spliced video stream to the second media server; in this way, the second media server obtains the spliced video stream. Another possible implementation is that the capture device sends the video splicing information to the second media server during signaling plane communication and directly sends the multiple unspliced video streams to the second media server; the second media server can then splice the received unspliced video streams according to the video splicing information to obtain the spliced video stream. Understandably, there are other possible implementations, for example, the first media server splices the multiple unspliced video streams and sends the spliced video stream to the second media server; these are not enumerated here one by one.
  • Step 606 The second media server sends the first video stream (the video stream corresponding to the changed field of view) and the second video stream to the second terminal device.
  • Step 607 The second terminal device receives the first video stream (the video stream corresponding to the changed field of view) and the second video stream and plays the first video stream (that is, it stops obtaining and playing the third video stream and starts playing the first video stream).
  • Further, the second media server may be deployed in an edge data center of the second terminal device, for example, a MEC in a 5G NR scenario; in this way, the transmission delay of the request message sent by the second terminal device to the second media server, and the transmission delay of the updated video stream returned by the second media server to the second terminal device according to the request message, can be effectively reduced, further improving the user experience.
  • In Embodiments 4 and 5, after the user's field of view changes, the presentation device first obtains and plays a lower-quality third video stream from the second video stream, so as to respond to the change of field of view in time; after receiving the request message, the network device can adjust the video stream corresponding to the user's current field of view in time and feed it back to the presentation device for playback. Owing to the visual persistence of the human eye, the user may have no perception, or no obvious perception, of this brief delay while watching the video, which can effectively improve the user experience.
  • It should be noted that the step numbers mentioned in Embodiment 1 to Embodiment 5 are only examples of the execution flow and do not constitute a specific limitation on the execution sequence of the steps; for example, step 604a and step 604b may be performed at the same time.
  • each device in the foregoing embodiments may include a hardware structure and/or a software module corresponding to each function.
  • Those skilled in the art should easily realize that the present invention can be implemented in the form of hardware or a combination of hardware and computer software by combining the units and algorithm steps of each example described in the embodiments disclosed herein. Whether a certain function is performed by hardware or computer software-driven hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the present invention.
  • FIG. 7 shows a possible exemplary block diagram of a device involved in the embodiment of the present invention, and the device 700 may exist in the form of software.
  • the apparatus 700 may include a processing unit 702 and a communication unit 703.
  • the communication unit 703 may include a receiving unit and a sending unit.
  • the processing unit 702 is configured to control and manage the operations of the device 700.
  • the communication unit 703 is configured to support communication between the apparatus 700 and other devices.
  • the device 700 may further include a storage unit 701 for storing program code and data of the device 700.
  • the processing unit 702 may be a processor or a controller.
  • Specifically, the processing unit 702 may be a general-purpose central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with this disclosure.
  • The processor may also be a combination implementing a computing function, for example, a combination including one or more microprocessors or a combination of a DSP and a microprocessor.
  • The communication unit 703 may be a communication interface, a transceiver, a transceiver circuit, or the like, where the communication interface is a collective term; in specific implementation, the communication interface may include multiple interfaces.
  • the storage unit 701 may be a memory.
  • The apparatus 700 may be the capture device involved in this application, or may be a chip in the capture device.
  • The processing unit 702 may support the apparatus 700 in performing the actions of the capture device in the method examples above.
  • the communication unit 703 may support communication between the apparatus 700 and a network device.
  • the communication unit 703 is configured to support the apparatus 700 to perform steps 201 and 205 in FIG. 2.
  • For example, the communication unit 703 may be configured to send video splicing information to a network device and, after the processing unit 702 determines that the network device has received the video splicing information, send multiple unspliced video streams to the network device, where the video splicing information is used to splice the multiple unspliced video streams.
  • In a possible design, the video splicing information includes identifiers of the multiple unspliced video streams, synchronization information between the multiple video streams, and camera calibration parameters respectively corresponding to the multiple video streams.
  • In a possible design, the communication unit 703 sending the video splicing information to the network device is specifically: sending the video splicing information to the network device through a terminal device.
  • In a possible design, the communication unit 703 sending the multiple unspliced video streams to the network device is specifically: sending the multiple unspliced video streams to the network device through the terminal device; or receiving the address information of the network device sent by the terminal device and sending the multiple unspliced video streams to the network device according to the address information of the network device.
  • the apparatus 700 may also be a network device involved in this application, or may also be a chip in a network device.
  • the processing unit 702 may support the apparatus 700 to perform the actions of the network device in the foregoing method examples.
  • The communication unit 703 may support communication between the apparatus 700 and other devices (such as a capture device or a presentation device); for example, the communication unit 703 is used to support the apparatus 700 in performing steps 202, 204, 206, and 207 in FIG. 2.
  • For example, the communication unit 703 may be configured to receive video splicing information sent by a capture device and to receive multiple unspliced video streams sent by the capture device, and the processing unit 702 may be configured to splice the multiple video streams according to the video splicing information.
  • In a possible design, the communication unit 703 is further configured to receive a request message from a presentation device, where the request message is used to indicate the user's current field of view; the processing unit 702 is further configured to process the spliced video stream according to the user's current field of view to obtain a first video stream; and the communication unit 703 is further configured to send the first video stream to the presentation device, where the field of view corresponding to the first video stream is the user's current field of view.
  • the processing unit 702 is further configured to obtain a second video stream according to the spliced video stream;
  • the communication unit is further configured to send the second video stream to the presentation device, where the field of view corresponding to the second video stream is greater than the user's current field of view and the video quality of the second video stream is lower than that of the first video stream.
  • the request message further includes an identifier of the first video stream; the identifier of the first video stream is a synchronization source SSRC identifier.
  • In a possible design, the network device is a media server, and the media server is deployed in an edge data center of the presentation device.
  • As shown in FIG. 8, this application further provides an apparatus. The apparatus may be the above-mentioned capture device, network device, presentation device, or terminal device, or may be a chip disposed in the capture device, network device, presentation device, or terminal device.
  • the device 800 includes: a processor 802, a communication interface 803, and a memory 801.
  • the device 800 may further include a bus 804.
  • the communication interface 803, the processor 802, and the memory 801 may be connected to each other through a communication line 804.
  • The communication line 804 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the communication line 804 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 8, but it does not mean that there is only one bus or one type of bus.
  • the processor 802 may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the solution of the present application.
  • The communication interface 803 uses any apparatus such as a transceiver to communicate with other devices or communication networks, such as an Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or a wired access network.
  • The memory 801 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a Blu-ray disc, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the communication line 804, or may be integrated with the processor.
  • The memory 801 is configured to store computer-executable instructions for executing the solutions of this application, and the processor 802 controls the execution. The processor 802 is configured to execute the computer-executable instructions stored in the memory 801, so as to implement the methods provided in the foregoing embodiments of this application.
  • the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner.
  • The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media.
  • The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or the other programmable device to produce computer-implemented processing, and the instructions executed on the computer or the other programmable device provide steps for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.

Abstract

This application discloses a video splicing method and apparatus. The method includes: a capture device sends video splicing information to a network device, and after determining that the network device has received the video splicing information, sends multiple unspliced video streams to the network device, so that the network device splices the unspliced video streams according to the video splicing information. With this method, the capture device does not need to splice the multiple video streams; instead, the network device splices them, which can effectively improve the efficiency of video splicing and thereby reduce the transmission delay.

Description

Video Splicing Method and Apparatus
This application claims priority to Chinese Patent Application No. 201810714820.5, filed with the China National Intellectual Property Administration on June 29, 2018 and entitled "Video Splicing Method and Apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communications technologies, and in particular, to a video splicing method and apparatus.
Background
Virtual reality (VR) technology is a computer simulation system in which virtual worlds can be created and experienced: a computer generates a simulated environment into which the user is immersed. VR technology mainly involves the simulated environment, perception, natural skills, and sensing devices. The simulated environment consists of real-time, dynamic, three-dimensional, lifelike images generated by a computer. Perception means that ideal VR should offer all the forms of perception a human has: in addition to the visual perception generated by computer graphics, it also includes hearing, touch, force, motion, and even smell and taste. Natural skills refer to head rotation, eye movement, gestures, and other human actions; the computer processes data matching the participant's actions, responds to the user's input in real time, and feeds the results back to the user's senses. Sensing devices are three-dimensional interactive devices that capture the user's actions and feed them back to the computer simulation system as input.
Visual perception occupies an extremely important position in VR. The first thing the most basic VR system must solve is virtual visual perception. To this end, a basic VR system must first accomplish three things: first, block the person's original visual input; second, fill the entire field of vision with light from virtual images; and third, interact with the images so as to convince the brain.
Panoramic video extends traditional video technology to achieve VR immersion. Panoramic video, also called 360-degree video, is obtained by shooting the environment with multiple cameras to obtain multiple video streams and then combining those streams through synchronization, splicing, and other techniques. Unlike traditional video, in which the viewer can only passively watch the picture and shots of the fixed field of view (FOV) chosen by the photographer, panoramic video lets the user watch dynamic video interactively from any position 360 degrees around the shooting point, giving the user a truly immersive feeling free of the limits of time, space, and region.
However, real-time communication scenarios place high requirements on the transmission delay of panoramic video, and current streaming media technology cannot yet meet this delay requirement.
Summary
In view of this, embodiments of this application provide a video splicing method and apparatus, which are used to reduce the transmission delay of panoramic video.
According to a first aspect, an embodiment of this application provides a video splicing method, including:
a capture device sends video splicing information to a network device; and after determining that the network device has received the video splicing information, the capture device sends multiple unspliced video streams to the network device, where the video splicing information is used to splice the multiple unspliced video streams.
With the above method, the capture device sends the video splicing information to the network device, so that in the subsequent process the capture device does not need to splice the multiple video streams; instead, the network device splices the multiple video streams according to the video splicing information. Because the processing capability of the network device is stronger than that of the capture device, this can effectively improve the efficiency of video splicing and thereby reduce the transmission delay.
In a possible design, the video splicing information includes identifiers of the multiple unspliced video streams, synchronization information between the multiple video streams, and camera calibration parameters respectively corresponding to the multiple video streams.
In a possible design, the capture device sending the video splicing information to the network device includes: the capture device sending the video splicing information to the network device through a terminal device.
In a possible design, the capture device sending the multiple unspliced video streams to the network device includes:
the capture device sending the multiple unspliced video streams to the network device through the terminal device; or
the capture device receiving address information of the network device sent by the terminal device, and sending the multiple unspliced video streams to the network device according to the address information of the network device.
According to a second aspect, an embodiment of this application provides a video splicing method, including:
a network device receives video splicing information sent by a capture device; and
the network device receives multiple unspliced video streams sent by the capture device and splices the multiple video streams according to the video splicing information.
With the above method, the capture device does not need to splice the multiple video streams; the network device splices them instead. Because the processing capability of the network device is stronger than that of the capture device, this can effectively improve the efficiency of video splicing and thereby reduce the transmission delay.
In a possible design, the method further includes:
the network device receives a request message from a presentation device, where the request message is used to indicate the user's current field of view; and
the network device processes the spliced video stream according to the user's current field of view to obtain a first video stream and sends the first video stream to the presentation device, where the field of view corresponding to the first video stream is the user's current field of view.
Here, the request message may be a specific RTCP message extended in this embodiment of this application. Because the user's field of view may change frequently, having the presentation device send the request message to the network device enables the network device to transmit, in time, the video stream corresponding to the changed field of view, improving the user experience. It can be understood that this embodiment of this application is not limited to the presentation device sending the request message in scenarios where the field of view changes; other scenarios are also possible.
In a possible design, the method further includes:
the network device obtains a second video stream from the spliced video stream and sends the second video stream to the presentation device, where the field of view corresponding to the second video stream is greater than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream.
In a possible design, the request message further includes an identifier of the first video stream; the identifier of the first video stream is a synchronization source (SSRC) identifier.
In a possible design, the network device is a media server, and the media server is deployed in an edge data center of the presentation device.
In this way, the transmission delay of the request message sent by the presentation device to the network device, and the transmission delay of the updated video stream returned by the network device to the presentation device according to the request message, can be effectively reduced, further improving the user experience.
According to a third aspect, an embodiment of this application provides a video splicing method, including:
a network device receives a request message sent by a presentation device, where the request message is used to indicate the user's current field of view;
the network device obtains a spliced video stream and processes the spliced video stream according to the user's current field of view to obtain a first video stream, where the field of view corresponding to the first video stream is the user's current field of view; and
the network device sends the first video stream to the presentation device.
Here, the request message may be a specific RTCP message extended in this embodiment of this application. Because the user's field of view may change frequently, having the presentation device send the request message to the network device enables the network device to transmit, in time, the video stream corresponding to the changed field of view, improving the user experience. It can be understood that this embodiment of this application is not limited to the presentation device sending the request message in scenarios where the field of view changes; other scenarios are also possible.
In a possible design, the method further includes:
the network device obtains a second video stream from the spliced video stream and sends the second video stream to the presentation device, where the field of view corresponding to the second video stream is greater than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream.
In a possible design, the request message further includes an identifier of the first video stream; the identifier of the first video stream is a synchronization source (SSRC) identifier.
According to a fourth aspect, an embodiment of this application provides a video splicing method, including:
a presentation device sends a request message to a network device, where the request message is used to indicate the user's current field of view; and
the presentation device receives a first video stream returned by the network device according to the request message and plays the first video stream, where the field of view corresponding to the first video stream is the user's current field of view.
In this way, by having the presentation device send the request message to the network device, the network device can transmit, in time and according to the user's current field of view, the corresponding video stream to the presentation device, improving the user experience.
In a possible design, before the presentation device sends the request message to the network device, the method further includes:
the presentation device determines that the user's field of view has changed.
In a possible design, the method further includes:
the presentation device receives a second video stream sent by the network device, where the field of view corresponding to the second video stream is greater than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream; and
after the presentation device determines that the user's field of view has changed, the method further includes:
the presentation device processes the second video stream according to the user's current field of view to obtain a third video stream, where the field of view corresponding to the third video stream is the user's current field of view; and
the presentation device plays the third video stream.
In the above manner, after determining that the user's field of view has changed, the presentation device may send the request message to the network device and first process the second video stream according to the user's current field of view to obtain and play the third video stream, whose field of view is the user's current field of view; subsequently, after receiving the video stream corresponding to the user's current field of view returned by the network device according to the request message, it can start playing that video stream. Thus, after the user's field of view changes, the presentation device first derives a lower-quality third video stream from the second video stream and plays it, so as to respond to the change in time; after receiving the request message, the network device can adjust the video stream corresponding to the user's current field of view in time and feed it back to the presentation device for playback. Owing to the visual persistence of the human eye, the user may have no perception, or no obvious perception, of this brief delay while watching the video, which can effectively improve the user experience.
According to a fifth aspect, an embodiment of this application provides a video splicing method, including:
a terminal device receives video splicing information sent by a capture device; and
the terminal device sends the video splicing information to a network device, where the video splicing information is used to splice multiple unspliced video streams.
In a possible design, the video splicing information includes identifiers of the multiple unspliced video streams, synchronization information between the multiple video streams, and camera calibration parameters respectively corresponding to the multiple video streams.
In a possible design, after the terminal device sends the video splicing information to the network device, the method further includes:
the terminal device receives the multiple unspliced video streams sent by the capture device and sends them to the network device.
According to a sixth aspect, an embodiment of this application provides an apparatus. The apparatus may be a capture device, a network device, a presentation device, or a terminal device, or may be a chip disposed in a capture device, network device, presentation device, or terminal device. The apparatus has functions for implementing the methods described in the various possible designs of any one of the first to fifth aspects. These functions may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules or units corresponding to the above functions.
According to a seventh aspect, an embodiment of this application provides an apparatus including a processor and a memory, where the memory is configured to store computer-executable instructions, and when the apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the apparatus performs the methods described in the various possible designs of any one of the first to fifth aspects.
According to an eighth aspect, an embodiment of this application further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the methods described in the various possible designs of any one of the first to fifth aspects.
According to a ninth aspect, this application further provides a computer program product including instructions that, when run on a computer, cause the computer to perform the methods described in the various possible designs of any one of the first to fifth aspects.
These and other aspects of this application will be clearer and easier to understand from the description of the following embodiments.
Brief Description of the Drawings
FIG. 1a is a schematic diagram of a system architecture to which an embodiment of this application applies;
FIG. 1b is a schematic diagram of another system architecture provided in an embodiment of this application;
FIG. 1c is a schematic diagram of another system architecture provided in an embodiment of this application;
FIG. 2 is a schematic flowchart of the video splicing method provided in Embodiment 1 of this application;
FIG. 3 is a schematic flowchart of the video splicing method provided in Embodiment 2 of this application;
FIG. 4 is a schematic flowchart of the video splicing method provided in Embodiment 3 of this application;
FIG. 5 is a schematic flowchart of the method for updating a video stream provided in Embodiment 4 of this application;
FIG. 6 is a schematic flowchart of the method for updating a video stream provided in Embodiment 5 of this application;
FIG. 7 is a schematic structural diagram of an apparatus provided in an embodiment of this application;
FIG. 8 is a schematic structural diagram of another apparatus provided in an embodiment of this application.
Detailed Description
The following describes the embodiments of this application in detail with reference to the accompanying drawings.
In streaming media technology, transmission technologies based on the hypertext transfer protocol (HTTP), such as HTTP live streaming (HLS), are usually adopted, and content is distributed in file segments based on a content delivery network (CDN). The file-creation and segmentation operations of a CDN generally introduce a delay of 5-10 s, so each added layer of CDN distribution accumulates another 5-10 s of delay. As a result, current larger-scale panoramic video live broadcasts incur a delay of 10-15 s.
However, the end-to-end delay of real-time communication (RTC) is meaningful only below 400 ms, and from the perspective of improving user experience it needs to be below 300 ms (or even below 200 ms). Therefore, from a low-delay standpoint, real-time communication applications for panoramic video cannot yet be built on streaming media technology.
To minimize the delay of streaming media technology, the live broadcast field has introduced push streaming based on the real-time messaging protocol (RTMP) while reducing CDN layers and transcoding operations on the CDN (such as multi-bitrate file transcoding). In this way, the delay can be kept to 2-5 seconds (with no CDN or only one CDN layer). However, video inside RTMP is encapsulated in the flash video (FLV) streaming format; although FLV units are smaller than HLS file segments, the encapsulation still introduces a second-level delay.
On this basis, embodiments of this application provide a video splicing method for reducing the transmission delay of panoramic video.
FIG. 1a is a schematic diagram of a system architecture to which an embodiment of this application applies. As shown in FIG. 1a, the system architecture includes a capture device 101, a presentation device 102, and a network device 103.
The capture device 101 and the presentation device 102 may have a network access function. Specifically, the capture device 101 may establish a communication connection with the network device 103 through network access, such as a wireless or wired network connection, without specific limitation; likewise, the presentation device 102 may establish a communication connection with the network device 103 through network access. In this way, the capture device 101 and the presentation device 102 can perform signaling transmission (that is, signaling plane communication) through the network device 103.
The capture device 101 may also have a function of capturing streaming media data (video data and/or audio data may collectively be called streaming media data); for example, the capture device 101 may be provided with a panoramic camera through which video data is captured. Correspondingly, the presentation device 102 may have a function of playing audio and/or video for the user; for example, the presentation device 102 may be provided with a VR headset through which panoramic video is played for the user. In this way, the capture device 101 and the presentation device 102 can transmit streaming media data (that is, perform media plane communication) through the network device 103.
FIG. 1b is a schematic diagram of another system architecture provided in an embodiment of this application. As shown in FIG. 1b, the system architecture includes a capture device 101, a presentation device 102, a first media server 1031, a core network element 1032, and a second media server 1033.
The capture device 101 and the presentation device 102 may have a network access function. Specifically, the capture device 101 may establish a communication connection with the first media server 1031 through network access, such as a wireless or wired network connection, without specific limitation; likewise, the presentation device 102 may establish a communication connection with the second media server 1033 through network access. In this way, the capture device 101 and the presentation device 102 can perform signaling transmission (that is, signaling plane communication) through the first media server 1031, the core network element 1032, and the second media server 1033.
The capture device 101 may also have a function of capturing streaming media data; correspondingly, the presentation device 102 may have a function of playing audio and/or video for the user. A media server is the core system of a streaming media application and the key platform through which operators provide video services to users; its main functions are buffering, scheduling, and transmitting streaming media data. Further, the main function of the capture-side media server (such as the first media server 1031) is to obtain streaming media data from the capture device 101 through a streaming media protocol and transmit it to the presentation-side media server; the main function of the presentation-side media server (such as the second media server 1033) is to receive streaming media data from the capture-side media server through a streaming media protocol and transmit it to the presentation device 102 for playback. In other words, the capture device 101 and the presentation device 102 can transmit streaming media data (that is, perform media plane communication) through the first media server 1031 and the second media server 1033.
The core network element 1032 is mainly responsible for signaling control in the call session process. In this application, the core network element may receive signaling from the first media server 1031 and forward it to the second media server 1033. Further, the core network element 1032 may be a third-party application control platform or the operator's own device.
FIG. 1c is a schematic diagram of another system architecture provided in an embodiment of this application. As shown in FIG. 1c, the system architecture includes a capture device 101, a first terminal device 104, a presentation device 102, a second terminal device 105, a first media server 1031, a core network element 1032, and a second media server 1033.
The capture device 101 may establish a communication connection with the first terminal device 104, such as a wired connection or a wireless-fidelity (Wi-Fi) connection; likewise, the presentation device 102 may establish a communication connection with the second terminal device 105, such as a wired or Wi-Fi connection. The first terminal device 104 and the second terminal device 105 have a network access function. Specifically, the first terminal device 104 may establish a communication connection with the media server 1031 through network access, such as a wireless or wired network connection, without specific limitation; likewise, the second terminal device 105 may establish a communication connection with the media server 1033 through network access. In this way, the capture device 101 can access the network through the first terminal device 104, the presentation device 102 can access the network through the second terminal device 105, and signaling is then transmitted (that is, signaling plane communication is performed) through the first media server 1031, the core network element 1032, and the second media server 1033, where the core network element is mainly used to forward signaling between the first media server 1031 and the second media server 1033. It should be noted that this application does not limit the specific connection modes between the devices.
The capture device 101 may have a function of capturing streaming media data; correspondingly, the presentation device 102 may have a function of playing audio and/or video for the user. The capture device 101 and the presentation device 102 can transmit streaming media data (that is, perform media plane communication) through the media server 1031 and the media server 1033.
Further, the capture device 101 may be a panoramic camera, and the presentation device 102 may be a VR headset.
The core network element 1032 is mainly responsible for signaling control in the call session process. In this application, the core network element may receive signaling from the first media server 1031 and forward it to the second media server 1033. Further, the core network element 1032 may be a third-party application control platform or the operator's own device.
A terminal device in the embodiments of this application (such as the first terminal device 104 or the second terminal device 105) is a device with a wireless transceiver function, and may specifically be a mobile phone, a tablet (Pad), a computer with a wireless transceiver function, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, or the like. The embodiments of this application do not limit the application scenario. A terminal device may sometimes also be called user equipment (UE), an access terminal device, a UE unit, a UE station, a mobile station, a remote station, a remote terminal device, a mobile device, a UE terminal device, a terminal device, a wireless communication device, a UE agent, a UE apparatus, or the like.
Regarding the system architectures shown in FIG. 1a, FIG. 1b, and FIG. 1c above, it should be noted that: (1) the network device 103 in FIG. 1a may be a collective term for the media server 1031, the media server 1033, and the core network element 1032 shown in FIG. 1b and FIG. 1c; and (2) in the system architectures shown in FIG. 1a and FIG. 1b, the capture device 101 and the presentation device 102 have a network access function, whereas in the system architecture shown in FIG. 1c, the capture device 101 and the presentation device 102 may not have a network access function and may access the network through the first terminal device 104 and the second terminal device 105, respectively.
Communication systems to which the system architectures shown in FIG. 1a, FIG. 1b, and FIG. 1c apply include but are not limited to a 5G new radio (NR) communication system and an IP multimedia system (IMS).
Taking the system architecture shown in FIG. 1c as an example: if the system architecture applies to a 5G NR system, the media server may specifically be a multimedia function (MMF) network element or a multimedia resource function processor (MRFP), and the core network element may specifically be an application function (AF) network element; if the system architecture applies to an IMS, the media server may specifically be a session border controller (SBC), and the core network element may specifically be a call session control function (CSCF) network element.
To facilitate understanding of the solutions in the embodiments of this application, several concepts are briefly introduced below.
(1) According to 3GPP 26.919 section 5.4.5, after a panoramic camera captures multiple video streams, the streams need to be spliced into a panoramic video stream and then transmitted to the VR headset. In-camera splicing in the panoramic camera is a key technical point, but in-camera splicing of the panoramic video stream introduces an obvious delay (current engineering analysis puts the in-camera splicing delay at around 200 ms). In existing streaming media scenarios, 200 ms is insignificant against second-level end-to-end delays, but in real-time communication every 50 ms reduction in delay requires a major technical change. Based on this, the embodiments of this application introduce network-side real-time splicing: after capturing multiple video streams, the panoramic camera may skip splicing and directly transmit the captured streams to the network device, which splices them. Considering that a capture device (or panoramic camera) is usually performance-limited for cost reasons, whereas a network device can serve multiple capture devices and has far stronger performance, splicing on the network device can clearly improve splicing efficiency and reduce splicing delay, thereby reducing the transmission delay of real-time communication.
Further, in the above system architectures, the transmission protocol used for signaling plane communication between the devices may be the session initiation protocol (SIP), and the transmission protocol used for media plane communication may be the real-time transport protocol (RTP)/real-time transport control protocol (RTCP); further, the capture device shown in FIG. 1c may use the real time streaming protocol (RTSP) to interact and cooperate with the first terminal device. This can be understood as follows: video data is transmitted through RTP, video quality is controlled through RTCP, and video control (such as fast forward and rewind) is provided through RTSP. As a specific example, on this basis the embodiments of this application extend the communication protocols used between the devices (for example, adding signaling or extending fields in signaling) to implement network-side real-time splicing.
(2) While the network device transmits the video stream corresponding to the user's field of view to the presentation device, the user's field of view may change (for example, when the user turns their head while watching video with a VR headset, the user's field of view changes). If the video stream corresponding to the changed field of view is not transmitted in time at that point, the user experience may suffer. Based on this, in the embodiments of this application, after determining that the user's field of view has changed, the presentation device may send a specific RTCP message (a message extended in the embodiments of this application) to the network device, where the specific RTCP message is used to indicate the user's current field of view (that is, the changed field of view); the network device can then process the spliced video stream according to the user's current field of view to obtain the corresponding video stream and transmit it to the presentation device. In this way, the video stream corresponding to the changed field of view can be transmitted in time, improving the user experience.
It should be noted that the methods introduced in (1) and (2) in the embodiments of this application may be applied separately or in combination, without specific limitation; the subsequent embodiments mainly describe the combined application of the two.
The following describes the video splicing method provided in the embodiments of this application with reference to specific embodiments.
Embodiment 1
In Embodiment 1, this embodiment of this application is described mainly based on the system architecture shown in FIG. 1a.
FIG. 2 is a schematic flowchart of the video splicing method provided in Embodiment 1 of this application. As shown in FIG. 2, the method includes:
Step 201: The capture device sends video splicing information to the network device.
Here, the video splicing information is used to splice multiple unspliced video streams. In one example, the video splicing information may include identifiers of the multiple unspliced video streams, synchronization information between the multiple video streams, and camera calibration parameters respectively corresponding to the multiple video streams.
For example, if the camera used by the capture device to capture video streams is a four-lens camera, the identifiers of the multiple video streams may include the identifiers of the four video streams captured by the four-lens camera: the identifier of video stream 1 (11111), the identifier of video stream 2 (22222), the identifier of video stream 3 (33333), and the identifier of video stream 4 (44444). If the four video streams are later spliced into a panorama, the synchronization information between the multiple video streams may indicate that the four streams need to be synchronized; this embodiment of this application does not limit the specific content of the synchronization information or how synchronization is implemented. The camera calibration parameters corresponding to each video stream may include one or more variables; Table 1 shows examples of the variables that the camera calibration parameters corresponding to each video stream may include.
Table 1: Examples of variables that the camera calibration parameters may include

Variable No. | Variable                                | Expression
1            | Image width and height                  | width, height
2            | Crop circle information                 | cropx, cropy, cropw, croph
3            | Field of view                           | v
4            | Attitude information (3 rotation angles)| y, r, p
5            | Translation                             | d, e
6            | Trim                                    | g, t
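As one way the capture device might group the Table 1 variables per camera when serializing them into the video splicing information, the following sketch defines a simple container; the field grouping follows Table 1, while the dictionary encoding and the sample values are assumptions for illustration:
```python
# Sketch of a per-camera calibration container following Table 1.
from dataclasses import dataclass, asdict

@dataclass
class CameraCalibration:
    width: int; height: int                          # 1: image width/height
    cropx: int; cropy: int; cropw: int; croph: int   # 2: crop circle information
    v: float                                         # 3: field of view
    y: float; r: float; p: float                     # 4: attitude (3 rotation angles)
    d: float; e: float                               # 5: translation
    g: float; t: float                               # 6: trim

cal = CameraCalibration(1920, 1080, 0, 0, 1920, 1080,
                        190.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
splicing_entry = {"stream_id": "11111", "calibration": asdict(cal)}
```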
In a possible implementation, the capture device may send a first call request message (specifically, an invite message) to the network device, where the first call request message carries the video splicing information. Considering that the existing invite message does not support carrying video splicing information, this embodiment of this application may extend the session description protocol (SDP) to send the video splicing information to the network device through the invite message (a SIP message).
For each item of the video splicing information, there are multiple possible extension approaches. For example, for the identifiers of the multiple unspliced video streams and the synchronization information between the streams, Table 2 shows one possible extension: the information carried in the extended SDP multi-stream field is the identifiers of the unspliced video streams, and the information carried in the extended SDP stream synchronization field is the synchronization information between the streams. It should be noted that Table 2 only illustrates the stream synchronization field for the case of two streams to be synchronized; when four or more streams need to be synchronized, the same pattern applies and is not repeated here.
Table 2: Examples of extended SDP fields [reproduced in the original publication only as an image: Figure PCTCN2019093651-appb-000001]
Further, in this embodiment of this application the SDP extension is also required on the network device, so that the network device can parse the extended SDP and, after receiving the invite message from the capture device, obtain the video splicing information by parsing.
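Because the concrete extended fields of Table 2 are not reproduced here, the following sketch illustrates the idea with assumed attribute names ("a=multi-stream", "a=stream-sync", "a=camera-calib"): it composes an SDP body carrying the stream identifiers, pairwise synchronization information, and per-camera calibration parameters:
```python
# Sketch of an extended SDP body for the invite message; attribute names are
# assumptions standing in for the extended fields of Table 2.
def build_extended_sdp(stream_ids, calibrations):
    lines = ["v=0", "s=panorama-capture",
             "a=multi-stream:" + ",".join(stream_ids)]       # stream identifiers
    for a, b in zip(stream_ids, stream_ids[1:]):
        lines.append(f"a=stream-sync:{a} {b}")               # pairwise sync info
    for sid, cal in calibrations.items():
        params = ";".join(f"{k}={v}" for k, v in cal.items())
        lines.append(f"a=camera-calib:{sid} {params}")       # per-camera params
    return "\r\n".join(lines) + "\r\n"

sdp = build_extended_sdp(["11111", "22222", "33333", "44444"],
                         {"11111": {"width": 1920, "height": 1080, "v": 190.0}})
```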
Step 202: After receiving the video splicing information sent by the capture device, the network device sends video playback information to the presentation device, where the video playback information may be used to indicate the format of the video stream transmitted by the network device to the presentation device, for example, the omnidirectional media format (OMAF) or the Tiled VR format.
Here, the network device may send a second call request message to the presentation device, where the second call request message includes the video playback information.
Step 203: The presentation device receives the video playback information and returns a first response message (for example, a 183 message) to the network device. Here, the first response message may carry the media plane address information of the presentation device, to facilitate subsequent media plane communication between the network device and the presentation device.
Step 204: After receiving the first response message, the network device sends a second response message (for example, a 183 message) to the capture device. Here, the second response message may carry the media plane address information of the network device, to facilitate subsequent media plane communication between the capture device and the network device.
Thus, through steps 201 to 204, the capture device and the presentation device complete signaling plane communication and establish a communication connection. It should be noted that steps 201 to 204 only briefly illustrate the signaling plane communication procedure; specific implementations may involve other steps, which this embodiment of this application does not limit. In a possible implementation, the signaling plane procedure may be the same as the prior-art signaling plane procedure, with the difference from the prior art lying in the content carried in the signaling transmitted between the devices; for example, the first call request message sent by the capture device to the network device may carry the video splicing information. In this way, this embodiment of this application only needs to extend the communication protocols between the devices to fit the existing communication procedure, and therefore has strong applicability and is convenient to implement.
Step 205: After determining that the network device has received the video splicing information, the capture device sends the multiple unspliced video streams to the network device.
Here, there are multiple specific ways for the capture device to determine that the network device has received the video splicing information. For example, if the capture device sends the video splicing information to the network device through the first call request message, the capture device may determine that the network device has received the video splicing information after receiving the call answer message (200 OK) for the first call request message returned by the network device.
Further, the capture device may send the multiple unspliced video streams to the network device according to the media plane address information of the network device carried in the second response message.
Step 206: The network device receives the multiple unspliced video streams, splices them according to the video splicing information, and processes the spliced video stream according to the user's current field of view to obtain a first video stream, where the field of view corresponding to the first video stream is the user's current field of view.
Optionally, the network device may also obtain a second video stream from the spliced video stream, where the field of view corresponding to the second video stream is greater than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream. In one example, the second video stream is a panoramic video stream.
It should be noted that, in this embodiment of this application, there are multiple specific ways for the network device to obtain the first video stream from the spliced panoramic video stream; for example, the network device crops the panoramic video stream according to the user's current field of view to obtain the video stream corresponding to the user's current field of view (that is, the first video stream), without specific limitation. There are likewise multiple specific ways for the network device to obtain the second video stream from the spliced video stream, without specific limitation.
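The following is a highly simplified sketch of the network-side splicing in step 206: frames are aligned by a shared timestamp (standing in for the synchronization information) and composed side by side; real splicing would warp and blend each frame using its camera calibration parameters rather than simply concatenating:
```python
# Highly simplified splicing sketch: pick the latest timestamp shared by all
# streams and compose the four views into one panoramic frame.
import numpy as np

def splice(frames_by_stream, stream_order):
    # frames_by_stream: {stream_id: {timestamp: HxWx3 ndarray}}
    common_ts = set.intersection(*(set(f) for f in frames_by_stream.values()))
    ts = max(common_ts)                       # latest timestamp all streams share
    views = [frames_by_stream[sid][ts] for sid in stream_order]
    return np.concatenate(views, axis=1)      # side-by-side panorama

streams = {sid: {100: np.zeros((480, 640, 3), np.uint8)}
           for sid in ("11111", "22222", "33333", "44444")}
panorama = splice(streams, ("11111", "22222", "33333", "44444"))  # 480 x 2560
```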
Step 207: The network device sends the first video stream to the presentation device.
Optionally, if the network device also generates a second video stream in step 206, it may send the second video stream to the presentation device simultaneously or separately. Further, the first video stream may carry the identifier of the first video stream, and the second video stream may carry the identifier of the second video stream, so that after receiving them the presentation device can distinguish the two streams by their identifiers. The identifier of the first video stream and the identifier of the second video stream may be represented by different synchronization source (SSRC) fields; in other words, both identifiers may be SSRC identifiers.
In this embodiment of this application, before processing the spliced video stream according to the user's current field of view, the network device can obtain the user's current field of view in multiple ways; the following describes two scenarios.
Scenario 1: during media plane communication
One possible implementation is that the presentation device monitors the user's field of view in real time and, after determining that the user's field of view has changed, sends a request message to the network device (the message name may be defined as Refresh FOV), where the request message is used to indicate the user's current field of view; in this way, after receiving the request message, the network device obtains the user's current field of view.
Further, the request message may be an RTCP message and may include the user's current field of view information. In one example, the user's current field of view information may include the center azimuth corresponding to the user's current field of view, the center elevation corresponding to the user's current field of view, the azimuth range corresponding to the user's current field of view, the elevation range corresponding to the user's current field of view, and the center tilt angle corresponding to the user's current field of view, which this embodiment of this application does not limit. The request message may also include the identifier of the first video stream, so that after receiving the request message sent by the presentation device, the network device can determine, according to the synchronization source identifier carried in the request message, that the video stream to be updated is the first video stream.
Table 3 shows examples of the key fields of the request message.
Table 3: Examples of key fields of the request message [reproduced in the original publication only as an image: Figure PCTCN2019093651-appb-000002]
It should be noted that the user's field of view may change many times during media plane communication; each time it changes, the presentation device may send the request message to the network device to indicate the user's current field of view.
Scenario 2: initial stage of media plane communication
One possible implementation is that the network device is preset with a default field of view. After splicing the multiple unspliced video streams, the network device can obtain a low-resolution panoramic video stream (corresponding to the second video stream) from the spliced video stream and first process the spliced video stream based on the default field of view to obtain a fourth video stream (corresponding to the first video stream), where the field of view corresponding to the fourth video stream is the default field of view; the network device then sends the low-resolution panoramic video stream and the fourth video stream to the presentation device. Correspondingly, after receiving the low-resolution panoramic video stream and the fourth video stream, if the presentation device determines that the user's current field of view differs from the default field of view, it may send a request message to the network device to indicate the user's current field of view; after receiving the request message, the network device obtains the user's field of view. In other possible implementations, the presentation device may also actively send the request message to the network device in the initial stage of media plane communication.
It should be noted that the key fields of the request message in Scenario 2 may be the same as the key fields of the request message in Scenario 1 and are not repeated here.
Step 208: The presentation device receives the first video stream and plays the first video stream.
Here, if the network device sends the first video stream and the second video stream to the presentation device in step 207, the presentation device can correspondingly receive both, identify the first video stream according to the identifier of the first video stream and the identifier of the second video stream, and then play the first video stream.
According to the above description, the capture device sends the video splicing information to the network device, so that in the subsequent process the capture device does not need to splice the multiple video streams; the network device splices them according to the video splicing information. Because the processing capability of the network device is stronger than that of the capture device, this can effectively improve the efficiency of video splicing and reduce the transmission delay.
In this embodiment of this application, where the network device sends the first video stream and the second video stream to the presentation device, after determining that the user's field of view has changed the presentation device may send a request message to the network device and first process the second video stream according to the user's current field of view to obtain and play a third video stream whose field of view is the user's current field of view; subsequently, after receiving the video stream corresponding to the user's current field of view returned by the network device according to the request message, it can start playing that video stream. Thus, after the user's field of view changes, the presentation device first derives a lower-quality third video stream from the second video stream and plays it, so as to respond to the change in time; after receiving the request message, the network device can adjust the video stream corresponding to the user's current field of view in time and feed it back to the presentation device for playback. Owing to the visual persistence of the human eye, the user may have no perception, or no obvious perception, of this brief delay while watching the video, which can effectively improve the user experience.
The implementation by which the presentation device processes the second video stream to obtain the third video stream may refer to the implementation by which the network device processes the spliced video stream to obtain the first video stream, without specific limitation.
The network device involved in the above steps may be a collective term for the first media server, the second media server, and the core network element. Embodiment 1 describes the video splicing process performed by the network device mainly based on FIG. 1a; in the system architectures shown in FIG. 1b and FIG. 1c, the entity performing video splicing may be the first media server or the second media server. Embodiments 2 and 3, described above, take the case in which the entity performing video splicing is the second media server as an example.
实施例二
在实施例二中,主要基于图1b所示意的系统架构对本申请实施例进行描述。
图3为本申请实施例二提供的视频拼接方法对应的流程示意图。如图3所示,该方法包括:
步骤301,采集设备向第一媒体服务器发送视频拼接信息。
此处,采集设备可以向第一媒体服务器发送第一呼叫请求消息(具体可以为invite消息),第一呼叫请求消息中携带视频拼接信息。
步骤302,第一媒体服务器接收到采集设备发送的视频拼接信息后,将视频拼接信息转发给核心网网元。
此处,第一媒体服务器可以向核心网网元发送第二呼叫请求消息(具体可以为invite消息),第二呼叫请求消息中携带视频拼接信息。
步骤303,核心网网元接收到第一媒体服务器发送的视频拼接信息后,将视频拼接信息转发给第二媒体服务器。
此处,核心网网元可以向第二媒体服务器发送第三呼叫请求消息(具体可以为invite消息),第三呼叫请求消息中携带视频拼接信息。
针对于步骤301至步骤303,考虑到现有的invite消息中并不支持携带视频拼接信息,本申请实施例可以对SDP进行扩展,来实现通过invite消息携带视频拼接信息的目的,具体方式可以参见实施例一中的描述,此处不再赘述。
步骤304,第二媒体服务器接收到核心网网元发送的视频拼接信息后,向呈现设备发送视频播放信息。
此处,第二媒体服务器可以向第二终端设备发送第四呼叫请求消息(具体可以为invite消息),第四呼叫请求消息中携带视频播放信息。
其中,视频播放信息可以用于指示网络设备向呈现设备传输的视频流的格式,比如可以为OMAF或者Tiled VR格式。
步骤305,呈现设备接收到视频播放信息后,向第二媒体服务器发送第一响应消息(具体可以为183消息)。
此处,第一响应消息中可以携带第二终端设备的媒体面地址,以便于后续第二媒体服务和第二终端设备进行媒体面通信。
步骤306,第二媒体服务器接收到第一响应消息后,向核心网网元发送第二响应消息。此处,第二响应消息中可以携带第二媒体服务器的媒体面地址。
步骤307,核心网网元接收到第二媒体服务器发送的第二响应消息后,向第一媒体服务器发送第三响应消息。此处,第三响应消息中可以携带第二媒体服务器的媒体面地址,以便于后续第一媒体服务器和第二媒体服务进行媒体面通信。
需要说明的是,核心网网元可以不参与媒体面通信,其主要用于在信令面通信中转发第一媒体服务器和第二媒体服务器之间的信令。
步骤308,第一媒体服务器接收到核心网网元发送的第三响应消息后,向采集设备发送第四响应消息。
本申请实施例中,上述步骤301至步骤308相比于图2中所示意的步骤201至步骤204来说,具体示意出了由第一媒体服务器、第二媒体服务器和核心网网元参与的信令传输过程,除此之外的其它内容,均可参照上述步骤201至步骤204的描述,此处不再赘述。
步骤309,采集设备向第一媒体服务器发送未拼接的多个视频流。
步骤310,第一媒体服务器接收采集设备发送的未拼接的多个视频流,并将其发送给第二媒体服务器。
步骤311,第二媒体服务器根据视频拼接信息,对未拼接的多个视频流进行拼接,以及根据用户当前的视场角对拼接后的视频流进行处理得到第一视频流,第一视频流对应的视场角为所述用户当前的视场角。
可选地,第二媒体服务器还可以根据拼接后的视频流得到第二视频流,所述第二视频流对应的视场角大于所述用户当前的视场角,所述第二视频流的视频质量低于所述第一视频流的视频质量。在一个示例中,所述第二视频流为全景视频流。
步骤312,第二媒体服务器将第一视频流发送给呈现设备。
可选地,若步骤311中网络设备还生成有第二视频流,则可以向呈现设备同时发送第 一视频流和第二视频流。
步骤313,呈现设备接收第一视频流,并对第一视频流进行播放。
此处,若步骤312中第二媒体服务器向呈现设备发送第一视频流和第二视频流,则相应地,呈现设备可接收第一视频流和第二视频流,并根据第一视频流的标识和第二视频流的标识识别出第一视频流,进而对第一视频流进行播放。
本申请实施例中,上述步骤309至步骤313相比于图2中所示意的步骤205至步骤208来说,具体示意出了第一媒体服务器和第二媒体服务器参与的视频流传输过程,除此之外的其它内容,均可参照上述步骤205至步骤208的描述,此处不再赘述。
实施例二中,需要对采集设备、第一媒体服务器、核心网网元和第二媒体服务器进行SDP的扩展,从而使得采集设备、第一媒体服务器、核心网网元和第二媒体服务器之间传输的SIP消息中可以携带视频拼接信息。
根据上述描述可知,由采集设备将视频拼接信息发送给第二媒体服务器,如此,在后续过程中采集设备可以无需对多个视频流进行拼接,而由第二媒体服务器根据视频拼接信息对多个视频流进行拼接,由于第二媒体服务器的处理能力比采集设备的处理能力强,从而能够有效提高视频拼接的效率,降低传输时延。
实施例三
Embodiment 3 is described mainly based on the system architecture illustrated in FIG. 1c.
FIG. 4 is a flowchart of the video stitching method according to Embodiment 3 of this application. As shown in FIG. 4, the method includes the following steps.
Step 400a: The first terminal device establishes a connection with the capture device.
In one possible implementation, the first terminal device and the capture device can pair automatically using an existing near-field device pairing technique to establish the connection, and the first terminal device configures an RTSP address for the capture device.
Step 400b: The second terminal device establishes a connection with the presentation device.
In one possible implementation, the second terminal device and the presentation device can connect over Wi-Fi or universal serial bus (USB).
Step 401: The first terminal device sends a DESCRIBE request to the capture device.
Here, the DESCRIBE request requests the capture device's media initialization description information.
Step 402: The capture device returns a DESCRIBE response to the first terminal device according to the DESCRIBE request.
Here, the DESCRIBE response can be a 200 OK message and can carry the media initialization description information. In this embodiment of this application, the DESCRIBE response can additionally carry the video stitching information. Since existing DESCRIBE responses do not support carrying video stitching information, this embodiment of this application can extend the SDP so that the DESCRIBE response can deliver the video stitching information to the first terminal device.
In this embodiment of this application, for the specific description of the video stitching information, see Embodiment 1; it is not repeated here. Further, the SDP extension can be implemented in the same way as in Embodiment 1, which is likewise not repeated here.
By applying the SDP extension to the capture device and the first terminal device, the RTSP messages transmitted between the capture device and the first terminal device can carry the video stitching information.
Step 403: After receiving the DESCRIBE response, the first terminal device sends a first call request message (specifically, an invite message) to the first media server, the first call request message carrying the video stitching information.
Step 404: After receiving the first call request message from the first terminal device, the first media server sends a second call request message (specifically, an invite message) to the core network element, the second call request message carrying the video stitching information.
Step 405: After receiving the second call request message from the first media server, the core network element sends a third call request message (specifically, an invite message) to the second media server, the third call request message carrying the video stitching information.
Regarding steps 403 to 405, since existing invite messages do not support carrying video stitching information, this embodiment of this application can extend the SDP so that the invite message can carry it; for the specific approach, see the description in Embodiment 1, which is not repeated here.
By applying the SDP extension to the first terminal device, the first media server, the core network element, and the second media server, the SIP messages transmitted among them can carry the video stitching information.
Step 406: After receiving the third call request message from the core network element, the second media server sends a fourth call request message (specifically, an invite message) to the second terminal device, the fourth call request message carrying video playback information.
The video playback information can indicate the format of the video stream that the network device transmits to the presentation device, for example OMAF or a tiled VR format.
Step 407: After receiving the fourth call request message, the second terminal device sends a first response message (specifically, a 183 message) to the second media server.
Here, the first response message can carry the second terminal device's media-plane address, to facilitate subsequent media-plane communication between the second media server and the second terminal device.
Step 408: After receiving the first response message from the second terminal device, the second media server sends a second response message to the core network element.
Here, the second response message can carry the second media server's media-plane address.
Step 409: After receiving the second response message from the second media server, the core network element sends a third response message to the first media server.
Here, the third response message can carry the second media server's media-plane address, to facilitate subsequent media-plane communication between the first media server and the second media server.
It should be noted that the core network element need not take part in media-plane communication; it mainly forwards signaling between the first media server and the second media server on the signaling plane.
Step 410: After receiving the third response message from the core network element, the first media server sends a fourth response message to the first terminal device.
Here, the fourth response message can carry the first media server's media-plane address, to facilitate subsequent media-plane communication between the first terminal device and the first media server.
Step 411: After receiving the fourth response message, the first terminal device sends a SETUP request to the capture device.
Here, the SETUP request can be used to set the session's attributes and transport mode and to prompt the capture device to establish the session. The SETUP request can carry transport address information (for communication once the session is established); in one possible implementation, the transport address information can include the first terminal device's media-plane address, and the first terminal device can further establish a correspondence between its own media-plane address and the first media server's media-plane address.
Step 412: The capture device returns a SETUP response to the first terminal device.
Here, the SETUP response can be a 200 OK message, used to establish the session with the first terminal device and to return a session identifier and session-related information.
Step 413: The first terminal device sends a PLAY request to the capture device.
Here, the PLAY request requests playback, that is, it requests the capture device to send the video streams.
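Purely for orientation, the exchange of steps 401 to 413 can be pictured as the following raw RTSP messages; the URL, ports, and session identifier are illustrative placeholders, not values prescribed by this application.

```python
# Sketch of the RTSP handshake in steps 401-413 (all values are placeholders).
RTSP_URL = "rtsp://192.0.2.20/panorama"   # hypothetical capture-device address

describe_request = (                       # step 401
    f"DESCRIBE {RTSP_URL} RTSP/1.0\r\n"
    "CSeq: 1\r\n"
    "Accept: application/sdp\r\n\r\n"
)
# The 200 OK DESCRIBE response (step 402) carries the media initialization
# description plus the extended SDP with the video stitching information.

setup_request = (                          # step 411
    f"SETUP {RTSP_URL}/trackID=0 RTSP/1.0\r\n"
    "CSeq: 2\r\n"
    # Transport address information: here, the first terminal device's
    # media-plane port pair.
    "Transport: RTP/AVP;unicast;client_port=5000-5001\r\n\r\n"
)

play_request = (                           # step 413
    f"PLAY {RTSP_URL} RTSP/1.0\r\n"
    "CSeq: 3\r\n"
    "Session: 12345678\r\n\r\n"            # id returned in the SETUP response
)
```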
Step 414: The capture device sends the multiple unstitched video streams to the first terminal device according to the PLAY request.
Step 415: The first terminal device sends the multiple unstitched video streams to the first media server.
Here, the first terminal device can forward the multiple unstitched video streams to the first media server according to the correspondence between its own media-plane address and the first media server's media-plane address. For example, if the first terminal device's media-plane address 1a corresponds to the first media server's media-plane address 1b, the first terminal device forwards the video streams received at address 1a to address 1b, as in the sketch below.
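A minimal sketch of this relay, with illustrative addresses, follows.

```python
# Sketch of step 415: forward datagrams received at the first terminal
# device's media-plane address to the mapped first-media-server address.
import socket

ADDRESS_MAP = {
    ("10.0.0.1", 5000): ("203.0.113.5", 6000),  # address 1a -> address 1b
}

def relay_once(sock: socket.socket) -> None:
    """Receive one datagram and forward it according to the address mapping."""
    data, _src = sock.recvfrom(65535)
    local = sock.getsockname()   # the local media-plane address that received it
    dest = ADDRESS_MAP.get(local)
    if dest is not None:
        sock.sendto(data, dest)
```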
It should be noted that the above procedure is described taking the case where the transport address information includes the first terminal device's media-plane address. In another possible implementation, the transport address information includes the first media server's media-plane address; in that case, after receiving the PLAY request from the first terminal device, the capture device can send the multiple unstitched video streams directly to the first media server without relaying through the first terminal device, which effectively improves transmission efficiency and reduces the first terminal device's resource consumption.
Step 416: The first media server receives the multiple unstitched video streams and sends them to the second media server.
Step 417: The second media server receives the multiple unstitched video streams, stitches them according to the video stitching information, and processes the stitched video stream according to the user's current field of view to obtain a first video stream, where the field of view corresponding to the first video stream is the user's current field of view.
Optionally, the second media server can further derive a second video stream from the stitched video stream, where the field of view corresponding to the second video stream is larger than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream. In one example, the second video stream is a panoramic video stream.
At the initial stage of media-plane communication between the second media server and the second terminal device, in one possible implementation, a default field of view is preconfigured on the second media server. After stitching the multiple unstitched video streams, the second media server can derive a low-definition panoramic video stream (corresponding to the second video stream) from the stitched video stream, and first process the stitched video stream based on the default field of view to obtain a fourth video stream (corresponding to the first video stream), where the field of view corresponding to the fourth video stream is the default field of view; the second media server then sends the low-definition panoramic video stream and the fourth video stream to the presentation device. Accordingly, after receiving the low-definition panoramic video stream and the fourth video stream, if the presentation device determines that the user's current field of view differs from the default field of view, it can send a request message to the second media server, the request message indicating the user's current field of view. After receiving the request message, the second media server can obtain the user's current field of view and process the stitched video stream accordingly to obtain the first video stream.
In other possible implementations, the presentation device can also proactively send a request message to the second media server at the initial stage of media-plane communication; after receiving it, the second media server can obtain the user's current field of view and process the stitched video stream accordingly to obtain the first video stream.
During media-plane communication between the second media server and the second terminal device, the presentation device can monitor the user's field of view in real time and, upon determining that it has changed, send a request message to the second media server; after receiving the request message, the second media server can obtain the user's current field of view and process the stitched video stream accordingly to obtain the first video stream.
It should be noted that the request message involved above can be a specific RTCP message extended in this embodiment of this application; see the description in Embodiment 1, which is not repeated here.
Step 418: The second media server sends the first video stream to the second terminal device.
Optionally, if a second video stream is also generated in step 417, it can be sent together with the first video stream or separately.
Step 419: The second terminal device receives the first video stream and plays it through the presentation device.
Here, if the second media server sends both the first video stream and the second video stream to the second terminal device in step 418, the second terminal device can accordingly receive both streams, identify the first video stream based on the identifiers of the two streams, and then play the first video stream through the presentation device.
It should be noted that the above procedure is merely exemplary; in specific implementations, steps can be added to, deleted from, or substituted in the above flow, which is not specifically limited. For example, if the media initialization description information can be obtained by other means, steps 401 and 402 need not be performed.
Regarding Embodiments 1 to 3, it should be noted that the method of Embodiment 1, in which the presentation device sends a request message to the network device to update the first video stream, is applicable to many scenarios; for example, it also applies where the capture device performs the video stitching. Further, the network device here can be the second media server of Embodiments 2 and 3, and the second media server can be deployed in an edge data center of the presentation device of Embodiment 2 or of the second terminal device of Embodiment 3, for example mobile edge computing (MEC) in a 5G NR scenario. This effectively reduces the transmission latency both of the request message sent by the presentation device or second terminal device to the second media server and of the updated video stream returned by the second media server in response, further improving user experience.
In this embodiment of this application, considering that the user's field of view may change frequently during media-plane communication, user experience suffers if the video stream corresponding to the user's field of view is not updated promptly and effectively. On this basis, this embodiment of this application further provides a method for updating a video stream, used to update the video stream corresponding to the user's field of view promptly when it changes, improving user experience. Embodiments 4 and 5 below describe this in detail.
Embodiment 4
Embodiment 4 describes the method for updating a video stream mainly based on the system architecture illustrated in FIG. 1a.
FIG. 5 is a flowchart of the method for updating a video stream according to an embodiment of this application. As shown in FIG. 5, the method includes the following steps.
Step 501: The presentation device sends a request message to the network device, the request message indicating the user's current field of view.
Here, the request message can be a specific RTCP message extended in this embodiment of this application. The request message can include the user's current field-of-view information; in one example, this information can include the center azimuth, center elevation, azimuth range, elevation range, and center tilt corresponding to the user's current field of view, which is not limited in this embodiment of this application.
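One way to picture such a message is as an RTCP APP packet; the sketch below is an assumption for illustration only. The 4-character name "FOVR", the field order, and the fixed-point angle encoding (units of 2^-16 degree, a convention similar to OMAF's sphere-region angles) are not specified by this application.

```python
# Sketch of a hypothetical extended RTCP APP packet carrying the user's
# current field of view. Layout, name, and encoding are assumptions.
import struct

def build_fov_request(ssrc: int, az_c: float, el_c: float,
                      az_range: float, el_range: float, tilt_c: float) -> bytes:
    name = b"FOVR"  # hypothetical 4-character APP-packet name
    # Five angles as signed 32-bit fixed point, 1/65536-degree units.
    payload = struct.pack("!5i", *(int(v * 65536) for v in
                                   (az_c, el_c, az_range, el_range, tilt_c)))
    # The SSRC identifies the stream to update (here, the first video stream).
    body = struct.pack("!I", ssrc) + name + payload
    length_words = (4 + len(body)) // 4 - 1                 # RTCP length field
    header = struct.pack("!BBH", 0x80, 204, length_words)   # V=2, PT=204 (APP)
    return header + body

packet = build_fov_request(0x1234ABCD, 30.0, 0.0, 90.0, 60.0, 0.0)
```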
In one possible implementation, while playing video, the presentation device can send the request message to the network device upon determining that the user's field of view has changed. In this case, before step 501, the method can further include the following steps.
Step 500a: The network device sends a first video stream to the presentation device, where the field of view corresponding to the first video stream is the user's current field of view (the field of view before the change). Optionally, the network device can also send a second video stream to the presentation device, where the field of view corresponding to the second video stream is larger than the user's current field of view; in one example, the second video stream can be a panoramic video stream. In this case, the first video stream can carry the identifier of the first video stream, and the second video stream can carry the identifier of the second video stream. Further, the video quality of the second video stream can be lower than that of the first video stream, so as to reduce the consumption of network transmission resources.
Step 500b: The presentation device receives the first video stream and plays it. Here, if the network device sends both the first video stream and the second video stream to the presentation device in step 500a, the presentation device can accordingly receive both streams, identify the first video stream based on the identifiers of the two streams, and then play the first video stream.
Further, in the case where the network device sends both the first video stream and the second video stream to the presentation device, the request message can also include the identifier of the first video stream; after receiving the request message from the presentation device, the network device can determine from the synchronization source identifier carried in the message that the stream to be updated is the first video stream.
In this embodiment of this application, the presentation device can also first process the second video stream according to the user's current field of view to obtain a third video stream, where the field of view corresponding to the third video stream is the changed field of view, and then play the third video stream. In this way, after the user's field of view changes, the presentation device first derives the lower-quality third video stream from the second video stream and plays it, so as to respond to the change promptly. Later, upon receiving the video stream corresponding to the changed field of view that the network device returns in response to the request message, it can start playing that stream.
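How the presentation device cuts the third video stream out of the second is left open by this application; assuming the second video stream is an equirectangular panorama, one frame-level sketch is:

```python
# Sketch: crop the viewport covered by the changed field of view out of an
# equirectangular panorama frame (assumed projection; angles in degrees).
import numpy as np

def crop_viewport(frame: np.ndarray, az_c: float, el_c: float,
                  az_range: float, el_range: float) -> np.ndarray:
    """frame: H x W x 3 equirectangular image spanning 360 x 180 degrees."""
    h, w = frame.shape[:2]
    x0 = int((az_c - az_range / 2 + 180) / 360 * w) % w
    x1 = int((az_c + az_range / 2 + 180) / 360 * w) % w
    y0 = max(0, int((90 - (el_c + el_range / 2)) / 180 * h))
    y1 = min(h, int((90 - (el_c - el_range / 2)) / 180 * h))
    if x0 < x1:
        return frame[y0:y1, x0:x1]
    # The viewport wraps around the +/-180-degree seam of the panorama.
    return np.concatenate([frame[y0:y1, x0:], frame[y0:y1, :x1]], axis=1)
```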
In other possible implementations, the presentation device can also send the request message to the network device under other triggering conditions, which is not specifically limited.
Step 502: The network device receives the request message from the presentation device, obtains the stitched video stream, and processes the stitched video stream according to the user's current field of view to obtain the first video stream, where the field of view corresponding to the first video stream is the user's current field of view (the changed field of view).
Here, the network device can obtain the stitched video stream in multiple possible ways. In one possible implementation, after capturing the multiple unstitched video streams, the capture device stitches them according to the video stitching information and sends the stitched video stream to the network device, so that the network device obtains the stitched video stream. In another possible implementation, the capture device sends the video stitching information to the network device during signaling communication and sends the multiple unstitched video streams to the network device directly, so that the network device can stitch the received unstitched video streams according to the video stitching information to obtain the stitched video stream.
Optionally, the network device can further derive a second video stream from the stitched video stream.
Step 503: The network device sends the first video stream (the video stream corresponding to the changed field of view) to the presentation device.
Optionally, if the network device also generated a second video stream in step 502, it can send the second video stream together with the first video stream or separately.
Step 504: The presentation device receives the first video stream returned by the network device in response to the request message and plays it.
Here, if the network device sends both the first video stream and the second video stream to the presentation device in step 503, the presentation device can accordingly receive both streams, identify the first video stream based on the identifiers of the two streams, and then play the first video stream.
In this embodiment of this application, by extending a specific RTCP message, the presentation device can notify the network device of the user's current field of view, making it easy for the network device to promptly update the video stream corresponding to the user's current field of view and return it to the presentation device for playback, effectively improving user experience.
It should be noted that the method for updating a video stream provided in this embodiment of this application also applies to the system architectures illustrated in FIG. 1b and FIG. 1c, and its specific implementation can follow the above description. Taking the system architecture illustrated in FIG. 1c as the main example, Embodiment 5 below describes in detail the process of updating the video stream when the user's field of view changes.
Embodiment 5
Embodiment 5 describes the method for updating a video stream mainly based on the system architecture illustrated in FIG. 1c.
FIG. 6 is a flowchart of the method for updating a video stream according to an embodiment of this application. As shown in FIG. 6, the method includes the following steps.
Step 601: The second media server sends a first video stream and a second video stream to the second terminal device, where the first video stream can carry the identifier of the first video stream and the second video stream can carry the identifier of the second video stream.
Here, the identifiers of the first and second video streams can be represented by different SSRC fields. The field of view corresponding to the first video stream is the user's current field of view (the field of view before the change), and the field of view corresponding to the second video stream is larger than the user's current field of view. In one example, the second video stream can be a panoramic video stream.
In this embodiment of this application, the video quality of the second video stream can be lower than that of the first video stream, so as to reduce the consumption of network transmission resources.
Step 602: The second terminal device receives the first video stream and the second video stream, identifies the first video stream based on the identifiers of the two streams, and then plays the first video stream through the presentation device.
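Since the two streams differ in their SSRC, the separation in step 602 can be sketched as follows; the SSRC values here are illustrative and would in practice be learned from the signaling.

```python
# Sketch of step 602: demultiplex incoming RTP packets by SSRC
# (bytes 8-11 of the RTP header) and route each stream appropriately.
import struct

FIRST_STREAM_SSRC = 0x1234ABCD    # illustrative identifier of the first stream
SECOND_STREAM_SSRC = 0x5678EF01   # illustrative identifier of the second stream

def classify_rtp(packet: bytes) -> str:
    ssrc = struct.unpack_from("!I", packet, 8)[0]
    if ssrc == FIRST_STREAM_SSRC:
        return "first"    # viewport stream: decode and play on the presentation device
    if ssrc == SECOND_STREAM_SSRC:
        return "second"   # panorama stream: keep for local viewport cuts (step 604b)
    return "unknown"
```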
Step 603: The second terminal device determines that the user's field of view has changed.
Here, while the presentation device is playing video, the second terminal device can monitor the user's field of view in real time so as to promptly detect whether it has changed; alternatively, the second terminal device can monitor the user's field of view periodically. This can be configured according to actual needs and is not limited in this embodiment of this application.
Step 604a: The second terminal device sends a request message to the second media server, the request message indicating the user's current field of view (here, the changed field of view).
Further, the request message can be a specific RTCP message extended in this embodiment of this application. The request message can include the user's current field-of-view information; in one example, this information can include the center azimuth, center elevation, azimuth range, elevation range, and center tilt corresponding to the user's current field of view, which is not limited in this embodiment of this application. The request message can also include the identifier of the first video stream; after receiving the request message from the second terminal device, the second media server can determine from the synchronization source identifier carried in the message that the stream to be updated is the first video stream.
Step 604b: The second terminal device processes the second video stream according to the changed field of view to obtain a third video stream, and plays the third video stream through the presentation device, where the field of view corresponding to the third video stream is the changed field of view.
Step 605: The second media server receives the request message from the second terminal device, obtains the stitched video stream, and processes the stitched video stream according to the changed field of view to obtain the first video stream (here, the updated video stream, that is, the video stream corresponding to the changed field of view). The second media server can also derive the second video stream from the stitched video stream.
Here, the second media server can obtain the stitched video stream in multiple possible ways. In one possible implementation, after capturing the multiple unstitched video streams, the capture device stitches them according to the video stitching information and sends the stitched video stream to the second media server, so that the second media server obtains the stitched video stream. In another possible implementation, the capture device sends the video stitching information to the second media server during signaling communication and sends the multiple unstitched video streams to the second media server directly, so that the second media server can stitch the received unstitched video streams according to the video stitching information to obtain the stitched video stream. It can be understood that other implementations exist, for example the first media server stitching the multiple unstitched video streams and sending the stitched video stream to the second media server; these are not enumerated here one by one.
Step 606: The second media server sends the first video stream (the video stream corresponding to the changed field of view) and the second video stream to the second terminal device.
Step 607: The second terminal device receives the first video stream (the video stream corresponding to the changed field of view) and the second video stream, and plays the first video stream (that is, it stops obtaining and playing the third video stream and starts playing the first video stream).
Further, the second media server can be deployed in an edge data center of the second terminal device, for example MEC in a 5G NR scenario; this effectively reduces the transmission latency both of the request message sent by the second terminal device to the second media server and of the updated video stream returned by the second media server in response, further improving user experience.
With the above approach, after the user's field of view changes, the presentation device first derives the lower-quality third video stream from the second video stream and plays it, so as to respond to the change promptly; meanwhile, after receiving the request message, the network device can promptly adjust the video stream to match the user's current field of view and return it to the presentation device for playback. Owing to the persistence of human vision, the user may perceive little or none of this brief delay while watching, which effectively improves user experience.
It should be noted that the step numbers in Embodiments 1 to 5 are merely one example of the execution flow and do not specifically limit the execution order of the steps; for example, steps 604a and 604b can be performed simultaneously.
It can be understood that, to implement the corresponding functions, each device in the above embodiments can include hardware structures and/or software modules corresponding to those functions. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. Skilled professionals can use different methods to implement the described functions for each particular application, but such implementations should not be regarded as going beyond the scope of the present invention.
When integrated units are used, FIG. 7 shows a possible exemplary block diagram of the apparatus involved in an embodiment of the present invention; the apparatus 700 can exist in software form. The apparatus 700 can include a processing unit 702 and a communication unit 703. In one implementation, the communication unit 703 can include a receiving unit and a sending unit. The processing unit 702 controls and manages the actions of the apparatus 700. The communication unit 703 supports communication between the apparatus 700 and other devices. The apparatus 700 can further include a storage unit 701 for storing program code and data of the apparatus 700.
The processing unit 702 can be a processor or a controller, for example a general-purpose central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of the present invention. The processor can also be a combination implementing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication unit 703 can be a communication interface, a transceiver, a transceiver circuit, or the like, where "communication interface" is a collective term and, in specific implementations, can include multiple interfaces. The storage unit 701 can be a memory.
The apparatus 700 can be the capture device involved in this application, or a chip in the capture device. The processing unit 702 can support the apparatus 700 in performing the actions of the capture device in the method examples above. The communication unit 703 can support communication between the apparatus 700 and the network device; for example, the communication unit 703 supports the apparatus 700 in performing steps 201 and 205 in FIG. 2.
Specifically, the communication unit 703 can be configured to send video stitching information to the network device and, after the processing unit 702 determines that the network device has received the video stitching information, to send multiple unstitched video streams to the network device, where the video stitching information is used to stitch the multiple unstitched video streams.
In one possible implementation, the video stitching information includes the identifiers of the multiple unstitched video streams, synchronization information among the multiple video streams, and the camera calibration parameters corresponding respectively to the multiple video streams.
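As a reading aid only, these three elements can be pictured as one structure; the field names and the shape of the calibration parameters below are assumptions, since this application does not fix a concrete encoding.

```python
# Sketch of the contents of the video stitching information (assumed fields).
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class VideoStitchingInfo:
    stream_ids: List[str]    # identifiers of the unstitched video streams
    sync_info: str           # synchronization info, e.g. a shared timestamp base
    # Per-stream camera calibration parameters,
    # e.g. {"cam0": {"yaw": 0.0, "pitch": 0.0, "fov": 120.0}}.
    calibration: Dict[str, Dict[str, float]] = field(default_factory=dict)
```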
In one possible implementation, the communication unit 703 sends the video stitching information to the network device specifically by sending the video stitching information to the network device via a terminal device.
In one possible implementation, the communication unit 703 sends the multiple unstitched video streams to the network device specifically by sending the multiple unstitched video streams to the network device via the terminal device, or by receiving the network device's address information from the terminal device and sending the multiple unstitched video streams to the network device according to that address information.
The apparatus 700 can also be the network device involved in this application, or a chip in the network device. The processing unit 702 can support the apparatus 700 in performing the actions of the network device in the method examples above. The communication unit 703 can support communication between the apparatus 700 and other devices (such as the capture device or the presentation device); for example, the communication unit 703 supports the apparatus 700 in performing steps 202, 204, 206, and 207 in FIG. 2.
Specifically, the communication unit 703 can be configured to receive the video stitching information sent by the capture device and to receive the multiple unstitched video streams sent by the capture device; the processing unit 702 can be configured to stitch the multiple video streams according to the video stitching information.
In one possible implementation, the communication unit 703 is further configured to receive a request message from the presentation device, the request message indicating the user's current field of view; the processing unit 702 is further configured to process the stitched video stream according to the user's current field of view to obtain a first video stream; and the communication unit 703 is further configured to send the first video stream to the presentation device, where the field of view corresponding to the first video stream is the user's current field of view.
In one possible implementation, the processing unit 702 is further configured to derive a second video stream from the stitched video stream;
and the communication unit is further configured to send the second video stream to the presentation device, where the field of view corresponding to the second video stream is larger than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream.
In one possible implementation, the request message further includes the identifier of the first video stream; the identifier of the first video stream is a synchronization source (SSRC) identifier.
In one possible implementation, the network device is a media server, and the media server is deployed in an edge data center of the presentation device.
Referring to FIG. 8, a schematic diagram of an apparatus provided in this application, the apparatus can be the above capture device, network device, presentation device, or terminal device, or a chip disposed in the capture device, network device, presentation device, or terminal device. The apparatus 800 includes a processor 802, a communication interface 803, and a memory 801. Optionally, the apparatus 800 can further include a bus 804. The communication interface 803, the processor 802, and the memory 801 can be interconnected through the communication line 804; the communication line 804 can be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The communication line 804 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in FIG. 8, but this does not mean there is only one bus or only one type of bus.
The processor 802 can be a CPU, a microprocessor, an ASIC, or one or more integrated circuits configured to control the execution of the programs of the solutions of this application.
The communication interface 803 uses any transceiver-like apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or a wired access network.
The memory 801 can be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory can exist independently and be connected to the processor through the communication line 804, or can be integrated with the processor.
The memory 801 is configured to store computer-executable instructions for executing the solutions of this application, with execution controlled by the processor 802. The processor 802 is configured to execute the computer-executable instructions stored in the memory 801, thereby implementing the methods provided in the above embodiments of this application.
Optionally, the computer-executable instructions in the embodiments of this application can also be called application program code, which is not specifically limited in the embodiments of this application.
The above embodiments can be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the embodiments can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are produced in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave). The computer-readable storage medium can be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium can be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to this application. It should be understood that each procedure and/or block in the flowcharts and/or block diagrams, and combinations of procedures and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and variations to this application without departing from its spirit and scope. Thus, if these modifications and variations of this application fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include these changes and variations.

Claims (22)

  1. A video stitching method, wherein the method comprises:
    a capture device sending video stitching information to a network device; and
    after the capture device determines that the network device has received the video stitching information, the capture device sending multiple unstitched video streams to the network device, wherein the video stitching information is used to stitch the multiple unstitched video streams.
  2. The method according to claim 1, wherein the video stitching information comprises identifiers of the multiple unstitched video streams, synchronization information among the multiple video streams, and camera calibration parameters corresponding respectively to the multiple video streams.
  3. The method according to claim 1 or 2, wherein the capture device sending video stitching information to a network device comprises:
    the capture device sending the video stitching information to the network device via a terminal device.
  4. The method according to claim 3, wherein the capture device sending multiple unstitched video streams to the network device comprises:
    the capture device sending the multiple unstitched video streams to the network device via the terminal device; or
    the capture device receiving address information of the network device sent by the terminal device, and sending the multiple unstitched video streams to the network device according to the address information of the network device.
  5. A video stitching method, wherein the method comprises:
    a network device receiving video stitching information sent by a capture device; and
    the network device receiving multiple unstitched video streams sent by the capture device, and stitching the multiple video streams according to the video stitching information.
  6. The method according to claim 5, wherein the method further comprises:
    the network device receiving a request message from a presentation device, the request message indicating a user's current field of view; and
    the network device processing the stitched video stream according to the user's current field of view to obtain a first video stream, and sending the first video stream to the presentation device, wherein the field of view corresponding to the first video stream is the user's current field of view.
  7. The method according to claim 6, wherein the method further comprises:
    the network device deriving a second video stream from the stitched video stream, and sending the second video stream to the presentation device, wherein the field of view corresponding to the second video stream is larger than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream.
  8. The method according to claim 6 or 7, wherein the request message further comprises an identifier of the first video stream, the identifier of the first video stream being a synchronization source (SSRC) identifier.
  9. The method according to any one of claims 5 to 8, wherein the network device is a media server, and the media server is deployed in an edge data center of the presentation device.
  10. A capture device, wherein the capture device comprises:
    a communication unit, configured to send video stitching information to a network device and, after a processing unit determines that the network device has received the video stitching information, to send multiple unstitched video streams to the network device, wherein the video stitching information is used to stitch the multiple unstitched video streams.
  11. The capture device according to claim 10, wherein the video stitching information comprises identifiers of the multiple unstitched video streams, synchronization information among the multiple video streams, and camera calibration parameters corresponding respectively to the multiple video streams.
  12. The capture device according to claim 10 or 11, wherein the communication unit sends the video stitching information to the network device specifically by:
    sending the video stitching information to the network device via a terminal device.
  13. The capture device according to claim 12, wherein the communication unit sends the multiple unstitched video streams to the network device specifically by:
    sending the multiple unstitched video streams to the network device via the terminal device; or
    receiving address information of the network device sent by the terminal device, and sending the multiple unstitched video streams to the network device according to the address information of the network device.
  14. A network device, wherein the network device comprises:
    a communication unit, configured to receive video stitching information sent by a capture device and to receive multiple unstitched video streams sent by the capture device; and
    a processing unit, configured to stitch the multiple video streams according to the video stitching information.
  15. The network device according to claim 14, wherein the communication unit is further configured to receive a request message from a presentation device, the request message indicating a user's current field of view;
    the processing unit is further configured to process the stitched video stream according to the user's current field of view to obtain a first video stream; and
    the communication unit is further configured to send the first video stream to the presentation device, wherein the field of view corresponding to the first video stream is the user's current field of view.
  16. The network device according to claim 15, wherein the processing unit is further configured to derive a second video stream from the stitched video stream; and
    the communication unit is further configured to send the second video stream to the presentation device, wherein the field of view corresponding to the second video stream is larger than the user's current field of view, and the video quality of the second video stream is lower than that of the first video stream.
  17. The network device according to claim 15 or 16, wherein the request message further comprises an identifier of the first video stream, the identifier of the first video stream being a synchronization source (SSRC) identifier.
  18. The network device according to any one of claims 14 to 17, wherein the network device is a media server, and the media server is deployed in an edge data center of the presentation device.
  19. A capture device, wherein the capture device comprises:
    a memory, configured to store a software program; and
    a processor, configured to execute the software program in the memory, so that the capture device performs the method according to any one of claims 1 to 4.
  20. A network device, wherein the network device comprises:
    a memory, configured to store a software program; and
    a processor, configured to execute the software program in the memory, so that the network device performs the method according to any one of claims 5 to 9.
  21. A computer storage medium, wherein the storage medium stores a software program that, when executed by one or more processors, implements the method according to any one of claims 1 to 9.
  22. A computer program product comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 9.