WO2021088691A1 - Augmented reality (AR) communication system and AR-based communication method - Google Patents

Augmented reality (AR) communication system and AR-based communication method

Info

Publication number
WO2021088691A1
WO2021088691A1 (PCT/CN2020/124168)
Authority
WO
WIPO (PCT)
Prior art keywords
media
sbc
terminal device
server
media server
Prior art date
Application number
PCT/CN2020/124168
Other languages
English (en)
French (fr)
Inventor
Gao Yang
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021088691A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/141: Systems for two-way working between two video terminals, e.g. videophone
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47: End-user applications
    • H04N21/478: Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788: Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working

Definitions

  • the embodiments of the present application relate to the field of communication technology, and in particular to an augmented reality (AR) communication system and an AR-based communication method.
  • voice over long term evolution (VoLTE) is an end-to-end voice solution carried entirely over IP on the 4th generation (4G) network. VoLTE shortens connection waiting time when users communicate and delivers higher voice and video call quality.
  • augmented reality is a technology that ingeniously integrates virtual information with the real world. Drawing on multimedia, three-dimensional modeling, real-time tracking and registration, intelligent interaction, sensing, and other techniques, it simulates virtual information such as text, images, 3D models, music, and video, and applies it to the real world, so that the two kinds of information complement each other and the real world is thereby "augmented". Augmented reality technology can not only effectively present the content of the real world but also display virtual information content. There is currently no effective way to integrate AR into voice and video calls.
  • the embodiments of the present application provide an augmented reality communication system and an AR-based communication method, providing a way to integrate AR into voice and video calls and thereby improving user experience.
  • the AR communication system may include a first AR media server and a first session border controller (SBC). The first SBC is used to receive a first media stream from a first terminal device and send the received first media stream to the first AR media server; the first AR media server is configured to perform media enhancement processing on the received upstream media stream, where the upstream media stream includes the first media stream.
  • the AR communication system also includes an application server.
  • the application server is used to interact with the terminal device and the AR media server.
  • the application server is configured to receive an AR interface operation instruction from the first terminal device and send the operation instruction to the first AR media server; the first AR media server is specifically configured to perform media enhancement processing on the received media stream according to the AR interface operation instruction.
  • the application server is deployed at the central node in the system.
  • an auxiliary transmission channel is established between the first AR media server and the first terminal device;
  • the first AR media server is further configured to receive the auxiliary media stream from the first terminal device through the auxiliary transmission channel, and perform media enhancement processing on the auxiliary media stream and the first media stream.
  • for auxiliary media streams with high real-time requirements, establishing an auxiliary transmission channel between the first AR media server and the first terminal device can reduce transmission delay and improve user experience.
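To make the channel concrete, here is a minimal sketch of pushing latency-sensitive auxiliary data over such a channel. It assumes a plain UDP transport, and AR_MEDIA_SERVER, send_pose_samples, and the pose fields are all hypothetical names; the text does not mandate any particular transport.

```python
# Minimal sketch, assuming a plain UDP auxiliary channel (the transport is not
# mandated by the text); AR_MEDIA_SERVER and the pose fields are hypothetical.
import json
import socket
import time

AR_MEDIA_SERVER = ("192.0.2.10", 40000)  # placeholder address of the AR media enabler

def send_pose_samples(samples):
    """Push latency-sensitive spatial pose data over the auxiliary channel."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for pose in samples:
        # One datagram per sample keeps end-to-end delay low: no batching and
        # no retransmission, so a late sample is simply superseded by a newer one.
        sock.sendto(json.dumps(pose).encode("utf-8"), AR_MEDIA_SERVER)
        time.sleep(1 / 30)  # ~30 Hz, roughly one sample per video frame
    sock.close()

send_pose_samples([{"t": i, "pos": [0.0, 1.5, 0.0], "quat": [0, 0, 0, 1]} for i in range(3)])
```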
  • a control interface can be deployed between the application server and the AR media server to transmit operation instructions from the terminal device.
  • a data interface can also be deployed, which can be used to transmit data with lower real-time requirements.
  • the auxiliary media stream includes one or more of point cloud data, spatial data, user-view video, or virtual model.
  • point cloud data, spatial data, user-view videos or virtual models can also be sent from the terminal device to the AR media server through the application server.
  • the application server is also used to send the virtual model to the AR media server; the AR media server is also used to perform media enhancement processing on the virtual model and the first media stream.
  • the upstream media stream of the AR media server may include the virtual model and the first media stream.
  • the AR communication system also includes a second SBC. The second SBC is deployed at a second edge node in the system and is used to manage a second terminal device. The second SBC is also used to receive a second media stream from the second terminal device and send the second media stream to the first AR media server; the first AR media server is also used to receive the second media stream and perform media enhancement processing on the first media stream and the second media stream.
  • the AR media server can perform media enhancement processing on the media streams of the terminal devices on both sides of the call.
  • at least two media processing instances can be deployed in the AR media server to perform media enhancement processing in response to requests from different terminal devices.
  • the AR communication system further includes a second SBC, which is deployed at a second edge node in the system and is used to manage the second terminal device. The first AR media server is also used to send the media stream that has undergone media enhancement processing to the second SBC; the second SBC is used to send the media stream from the first AR media server to the second terminal device.
  • the first SBC and the first AR media server are deployed at the first edge node in the system.
  • the AR media server is deployed at the edge node, relatively closer to the terminal device user, which can reduce transmission delay and improve user experience.
  • the first SBC is deployed at the first edge node in the system
  • the first AR media server is deployed at the central node in the system; deploying the AR media server at the central node reduces the number of AR media server deployments and lowers costs.
  • a second AR media server and a second SBC are also deployed in the system; the first SBC and the second AR media server are deployed at the first edge node in the system, and the second SBC and the first AR media server are deployed at the second edge node in the system. The first SBC is used to send the first media stream from the first terminal device managed by the first SBC to the first AR media server through the second AR media server; the second AR media server is configured to receive the first media stream from the first SBC and forward it to the first AR media server.
  • a second AR media server and a second SBC are also deployed in the AR communication system; the first SBC and the first AR media server are deployed at a first edge node in the system, and the second SBC and the second AR media server are deployed at a second edge node in the system. A media stream channel exists between the second AR media server and the second SBC, and between the second AR media server and the first AR media server.
  • the second SBC is used to receive the second media stream from the second terminal device and send the received second media stream to the second AR media server; the second AR media server is used to perform media enhancement processing on the received second media stream.
  • the second AR media server may send the media stream after media enhancement processing to the first terminal device through the first AR media server.
  • the second SBC is used to receive the second media stream from the second terminal device, and send the received second media stream to the second AR media server.
  • the second AR media server sends the second media stream to the first AR media server, so that the first AR media server performs media enhancement processing according to the first media stream and the second media stream.
  • after receiving the media stream from the first terminal device, the first AR media server may send it to the second AR media server, and the second AR media server performs media enhancement processing.
  • the first AR media server is further configured to send the first media stream after media enhancement processing to the second SBC corresponding to the second terminal device.
  • an embodiment of the present application provides an augmented-reality-based communication method applied to an AR communication system, where the AR communication system includes a first session border controller (SBC) and a first AR media server. The method includes: the first SBC receives a first media stream from a first terminal device and sends the received first media stream to the first AR media server; the first AR media server performs media enhancement processing on the received first media stream.
  • the AR communication system further includes an application server
  • the method further includes: the application server receives an AR interface operation instruction from the first terminal device and sends the operation instruction to the first AR media server; correspondingly, the first AR media server performing media enhancement processing on the received first media stream includes: the first AR media server performs media enhancement processing on the received first media stream according to the AR interface operation instruction.
  • an auxiliary transmission channel is established between the first AR media server and the first terminal device, and the method further includes: the first AR media server receives the auxiliary media stream from the first terminal device through the auxiliary transmission channel; the first AR media server performing media enhancement processing on the received first media stream includes: the first AR media server performs media enhancement processing on the auxiliary media stream and the first media stream.
  • the auxiliary media stream includes one or more of point cloud data, spatial data, user-view video, or virtual model.
  • the method further includes: the application server sends a virtual model to the AR media server; the first AR media server performing media enhancement processing on the received first media stream includes: the first AR media server performs media enhancement processing on the virtual model and the first media stream.
  • the AR communication system further includes a second SBC
  • the second SBC is used to manage the second terminal device
  • the method further includes: the second SBC receives the second media stream from the second terminal device managed by the second SBC, and sends the second media stream to the first AR media server;
  • the first AR media server is further configured to receive the second media stream;
  • the first AR media server performs media enhancement processing on the received first media stream, including: the first AR media server performs media enhancement processing on the first media stream and the second media stream.
  • the AR communication system further includes a second SBC
  • the second SBC is used to manage the second terminal device
  • the method further includes: the first AR media server sends the media stream after media enhancement processing to the second SBC; the second SBC sends the media stream from the first AR media server to the second terminal device.
  • a second AR media server and a second SBC are also deployed in the system; the second SBC receives the second media stream from the second terminal device and sends the received second media stream to the second AR media server; the second AR media server performs media enhancement processing on the received second media stream.
  • the method further includes: the first AR media server sending the first media stream after the media enhancement processing to the second SBC corresponding to the second terminal device.
  • FIG. 1 is a schematic diagram of a possible AR communication system architecture in an embodiment of this application
  • FIG. 2 is a schematic diagram of another possible AR communication system architecture in an embodiment of this application.
  • FIG. 3 is a schematic diagram of another possible AR communication system architecture in an embodiment of this application.
  • FIG. 4 is a schematic diagram of a display interface of a possible terminal device in an embodiment of this application.
  • FIG. 5 is a schematic diagram of another possible AR communication system architecture in an embodiment of this application.
  • FIG. 6 is a schematic diagram of another possible AR communication system architecture in an embodiment of this application.
  • FIG. 7 is a schematic diagram of input and output of an AR media server in an embodiment of the application.
  • FIG. 8 is a schematic diagram of the input and output of the AR media server in Example 1 of the embodiments of the application;
  • FIG. 9 is a schematic diagram of the input and output of the AR media server in Example 2 of the embodiment of the application.
  • FIG. 10 is a schematic diagram of input and output of an AR media server in Example 3 of an embodiment of the application.
  • FIG. 11 is a schematic diagram of input and output of another AR media server in Example 3 of an embodiment of the application.
  • FIG. 12 is a schematic diagram of the input and output of the AR media server in Example 4 of the embodiment of the application;
  • FIG. 13 is a schematic flowchart of a possible AR-based communication method in an embodiment of this application.
  • FIG. 14A is a schematic flowchart of another possible AR-based communication method in an embodiment of this application.
  • FIG. 14B is a schematic flowchart of another possible AR-based communication method in an embodiment of this application.
  • FIG. 15 is a schematic flowchart of another possible AR-based communication method in an embodiment of this application.
  • FIG. 16 is a schematic diagram of a method for a terminal device to trigger an AR video enhancement process in an embodiment of the application
  • FIG. 17 is a schematic diagram of a process of establishing an auxiliary transmission channel between a terminal device and an AR media server in an embodiment of the application.
  • the present application provides an AR-based communication system and an AR-based communication method, providing an implementation for integrating AR into voice and video calls and thereby improving user experience.
  • voice and video calls may use, but are not limited to, VoLTE; the method can also be applied to voice and video calls provided by future technologies.
  • the communication system includes one or more session border controllers (SBC) and one or more AR media servers.
  • the AR media server may also be called an AR media enabler.
  • two terminal devices can conduct voice and video calls through the communication system, and during a voice and video call the AR media enabler performs media enhancement processing on the media streams generated during the call.
  • the AR media enabler has strong image processing and data computation capabilities, and can use AR technology to perform logical operations, image rendering, virtual scene synthesis, and other operations on the received media streams.
  • the AR media server can be deployed in the form of a container service.
  • the AR media server can also be implemented by one or more virtual machines.
  • the AR media server can also include one or more processors, or be implemented by one or more computers, such as a super multi-core computer, a computer deployed with a graphics processing unit (GPU) cluster, a large distributed computer, or a cluster of computers with pooled hardware resources.
  • SBC is used to manage or control the session of the terminal device.
  • the SBC includes a signaling plane function and a media plane function. For example, it can be used to receive media streams from terminal devices under its management, and send the media streams received from the terminal devices to the AR media server.
  • the AR media server is used to perform media enhancement processing on the received upstream media stream to obtain a downstream video stream.
  • the downstream video stream can be sent by the AR media server to the corresponding terminal device through the SBC.
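As an illustration of the upstream-in, downstream-out role described above, the sketch below stands in for the AR media server's processing loop. The enhancement itself is a placeholder box blur; the actual media enhancement (beauty, rendering, scene synthesis) is not specified by the text, and all names are invented.

```python
# Sketch of the enhancement step only. Assumptions: frames arrive as RGB numpy
# arrays; the "enhancement" is a stand-in 3x3 box blur, not the patented method.
import numpy as np

def enhance(frame: np.ndarray) -> np.ndarray:
    """Toy media enhancement: a 3x3 box blur standing in for, e.g., a beauty filter."""
    h, w = frame.shape[:2]
    padded = np.pad(frame.astype(np.float32), ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(frame, dtype=np.float32)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out += padded[dy:dy + h, dx:dx + w]  # sum the 9 shifted windows
    return (out / 9).astype(frame.dtype)

def handle_upstream(frames):
    """Upstream media stream in, downstream media stream out (toward the SBC)."""
    return [enhance(f) for f in frames]

downstream = handle_upstream([np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)])
print(downstream[0].shape)  # (4, 4, 3): same geometry, enhanced content
```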
  • the terminal device may be a device equipped with a camera and a video call function.
  • the terminal device may be a wearable device (such as an electronic watch), and the terminal device may also be a device such as a mobile phone or a tablet computer.
  • the embodiments of the present application do not impose special restrictions on the specific form of the terminal device.
  • two SBCs are taken as an example, namely, a first SBC and a second SBC.
  • the first SBC is used to manage the first terminal device
  • the second SBC is used to manage the second terminal device.
  • different terminal devices can also be managed by the same SBC.
  • the third terminal device described in FIG. 1 is managed by the first SBC.
  • the first SBC is used to receive the first media stream from the first terminal device and send the received first media stream to the AR media server.
  • the AR media server performs media enhancement processing on the received upstream media stream, and the upstream media stream includes the first media stream.
  • the AR media server performs media enhancement processing on the upstream media stream to obtain the downstream media stream
  • the AR media server sends the downstream media stream to the second SBC
  • the second SBC sends it to the second terminal device.
  • the communication system may also include an application server.
  • the application server is used to establish an AR video call triggered by the terminal device. For example, taking an AR video call between the first terminal device and the second terminal device as an example, the application server receives an AR interface operation instruction from the first terminal device and sends the AR interface operation instruction to the AR media server; the AR interface operation instruction indicates an operation performed by the user on the AR interface displayed by the first terminal device. The AR media server is specifically configured to perform media enhancement processing on the received upstream media stream according to the AR interface operation instruction.
  • the application server may include a media plug-in service function, and may also be referred to as a plug-in server.
  • the application server also includes an application service function (application service, AS).
  • the media plug-in service function is used to interact with the terminal device, receive the AR interface operation instruction triggered from the terminal device, and send the AR interface operation instruction to the application service function.
  • the application service function is used to interact with the AR media server, and send the AR interface operation instructions sent by the media plug-in service function to the AR media server.
  • when the AR media server performs media enhancement processing on the received upstream media stream, it does so according to the AR interface operation instruction.
  • media plug-in service function and AS can be deployed separately or combined during deployment.
  • the media plug-in service function and AS can be implemented through one device, or through one or more virtual machines.
  • the AR interface operation instruction may be an instruction used to indicate a processing method that does not have strict real-time requirements, such as a beauty operation.
  • for example, according to such an instruction, the AR media server performs beauty processing on the face in each video frame of the received media stream.
  • the AR interface operation instructions may also include instructions reflecting the user's real-time operations, such as model operation instructions for rotating or zooming a model.
  • for ease of description, instructions that indicate a processing method (such as a beauty operation) are referred to as non-real-time operation instructions, and instructions that reflect the user's real-time operations are referred to as real-time operation instructions.
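A toy dispatcher can make the distinction concrete: non-real-time instructions set a persistent processing mode, while real-time instructions are consumed on the next frame. All operation names below are hypothetical.

```python
# Illustrative dispatcher; the operation names are hypothetical. Non-real-time
# instructions set a persistent mode, real-time ones apply to the next frame.
from collections import deque

REALTIME_OPS = {"rotate_model", "zoom_model", "move_model", "annotate_space"}
NON_REALTIME_OPS = {"enable_beauty", "apply_sticker"}

class InstructionRouter:
    def __init__(self):
        self.active_modes = set()  # long-lived processing methods (e.g., beauty)
        self.pending = deque()     # latency-sensitive, consumed on the next frame

    def submit(self, op, **params):
        if op in REALTIME_OPS:
            self.pending.append((op, params))
        elif op in NON_REALTIME_OPS:
            self.active_modes.add(op)
        else:
            raise ValueError(f"unknown AR interface operation: {op}")

router = InstructionRouter()
router.submit("enable_beauty")
router.submit("rotate_model", axis="y", degrees=15)
print(router.active_modes, list(router.pending))
```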
  • the application server can be deployed at the central node in the system.
  • the first SBC is deployed at the first edge node in the system
  • the AR media server is deployed at the central node in the system.
  • the second SBC is deployed at the second edge node of the system.
  • the edge node is closer to the users of the terminal device, and provides edge computing services, forwarding services, etc. for these users, reducing response delay and bandwidth cost, and reducing the pressure on the central node.
  • the central node and the edge node may be deployed on the cloud.
  • the central node may be referred to as the central cloud
  • the edge node may be referred to as the edge cloud.
  • the edge node may also be a mobile edge computing (Mobile Edge Computing, MEC) node.
  • an IP multimedia subsystem (IMS) core network (IMS core) can also be deployed in the central cloud.
  • the IMS core may include a call session control function (CSCF) and a home subscriber server (home subscriber server, HSS).
  • CSCF is the call control center of IMS core, which implements user access, authentication, session routing, and service triggering functions on the IP transmission platform.
  • the CSCF may include one or more of the serving-call session control function (S-CSCF), the proxy-CSCF (P-CSCF), or the interrogating-CSCF (I-CSCF).
  • HSS is used to record the user's subscription data (such as user information, business data).
  • the SBC provides border control functions between the access network and the IMS core network and between IMS core networks, and can provide functions such as access control, quality of service control, and firewall traversal.
  • an AR control (which may be referred to as an end-side plugin) can be deployed in the terminal device.
  • the AR control is used for message interaction with the media plug-in service function on the network side.
  • the AR control can also establish an auxiliary transmission channel with the AR media enabler.
  • the auxiliary transmission channel is used for the first terminal device to send the auxiliary media stream to the AR media enabler.
  • the auxiliary media stream may include one or more of point cloud data, spatial data (may also be referred to as spatial pose data), user-view video, or virtual model.
  • Point cloud data refers to data recorded in the form of points. Each point can include spatial location information, as well as color information or reflection intensity information. Spatial data can also be called geometric data.
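As a sketch, one point-cloud sample as described above could be modeled like this (field names are assumed, not from the text):

```python
# One way to model a point-cloud sample as described above (field names assumed).
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class CloudPoint:
    x: float
    y: float
    z: float                                    # spatial location
    rgb: Optional[Tuple[int, int, int]] = None  # optional color information
    intensity: Optional[float] = None           # or reflection intensity

print(CloudPoint(0.12, -0.40, 1.95, rgb=(200, 180, 170)))
```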
  • the virtual model may include one or more of a virtual portrait model, a virtual object model, material images (such as stickers or cartoon avatars), or a virtual animation model.
  • the user's perspective video may be a video captured by the user through the rear camera of the terminal device, or a video captured by the front camera of the terminal device.
  • the terminal device in the embodiment of the present application may establish different auxiliary transmission channels according to the type of the auxiliary media stream to be transmitted. For example, when point cloud data needs to be transmitted, auxiliary transmission channel 1 is established, and when AR spatial data needs to be transmitted, auxiliary transmission channel 2 is established.
  • the auxiliary transmission channel 2 used to transmit AR spatial data may be referred to as an Action channel, or other names may also be used, which is not limited in the embodiment of the present application.
  • the terminal device can also transmit different types of auxiliary media streams through an auxiliary transmission channel.
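A minimal registry illustrates the channel-per-type idea; the "Action" name follows the text above, while the other channel names are placeholders.

```python
# Hypothetical mapping of auxiliary media types to named channels; "Action" for
# spatial data follows the text above, the other channel names are placeholders.
AUX_CHANNELS = {
    "point_cloud": "aux-channel-1",
    "spatial_data": "Action",       # AR spatial (pose) data
    "user_view_video": "aux-channel-3",
    "virtual_model": "aux-channel-4",
}

def channel_for(media_type: str) -> str:
    return AUX_CHANNELS[media_type]

print(channel_for("spatial_data"))  # -> "Action"
```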
  • the user interface of the AR control may be used as a floating window superimposed on the VoLTE call interface during the AR video call of the terminal device.
  • window 402 displays the image of the user at the opposite end of the call, window 401 displays the image of the user at the local end, and window 403 is the user interface of the AR control.
  • the AR media server is deployed in the first edge node of the system (such as the first edge cloud).
  • the AR media server can be deployed independently of the SBC, that is, as a separate device, or it can be deployed together with the SBC, in which case a single device realizes both the AR media server function and the SBC function.
  • an AR media server is deployed in the edge cloud corresponding to each of the two terminal devices in a call.
  • the AR media server deployed in the first edge cloud is called the first AR media server
  • the AR media server deployed in the second edge cloud is called the second AR media server.
  • if the media stream that has undergone media enhancement processing by the first AR media server needs to be sent to the second terminal device, it is first sent to the second AR media server, and then sent by the second AR media server to the second terminal device through the second SBC.
  • edge nodes corresponding to different terminal devices may be the same or different.
  • the central nodes corresponding to different terminal devices may be the same or different.
  • the AR media server corresponding to different terminal devices may be different or the same.
  • the CSCFs corresponding to different terminals may be the same or different.
  • the AR media server performs media enhancement processing on the received media stream.
  • the media stream input to the AR media server is referred to as an upstream media stream
  • the output media stream is referred to as a downstream media stream.
  • the upstream media stream received by the AR media server may include the first media stream from the first terminal device.
  • the upstream media stream may also include one or more of the following: auxiliary media stream and virtual model.
  • the input of the AR media server may also include an AR interface operation instruction, and the AR interface operation instruction may instruct the user to perform an operation on the AR interface displayed by the first terminal device.
  • the output of the AR media server may include the downstream media stream of the first terminal device, and/or the downstream media stream of the second terminal device.
  • the auxiliary media stream may be sent by the first terminal device to the AR media server through the auxiliary transmission channel.
  • the auxiliary media stream may include one or more of point cloud data, spatial data, user-view video, or virtual model.
  • the virtual model is generated by the terminal device and sent to the AR media server through the auxiliary transmission channel.
  • the terminal device may not have the ability to generate a virtual model.
  • in this case, the virtual model can be generated by the application server and sent to the AR media server.
  • the input and output of the AR media server may be different.
  • the composition and flow direction of the media streams of the AR media server are exemplarily described below in conjunction with application scenarios, taking an AR video call between the first terminal device of user 1 and the second terminal device of user 2 as an example.
  • Example 1: one-way AR enhancement scenarios, such as beauty filters, stickers, super-resolution, and expression-driven calls. This scenario can be applied when both ends of the call support AR, and also when only one side supports AR.
  • the first terminal device needs to perform AR processing during a video call with the second terminal device.
  • the input of the AR media server includes the first media stream of the first terminal device.
  • the first media stream is sent by the first terminal device to the AR media server through the first SBC.
  • the first media stream may include a video collected by the first terminal device through a camera, and may also include a voice collected by a microphone.
  • the AR media server performs media enhancement processing on the first media stream and then outputs the downstream media stream of the second terminal device.
  • the first terminal device of user 1 may also display the image of user 1 itself on the basis of displaying the image of user 2, such as window 401 and window 402 shown in FIG. 4.
  • when the AR media server sends the first media stream after media enhancement processing to the second terminal device, it may also send it to the first terminal device.
  • taking a beauty operation as an example, the AR control of the first terminal device sends the beauty operation instruction to the application server, the application server sends the beauty operation instruction to the AR media server, and the AR media server performs a beauty operation on the faces included in the video images of the received first media stream.
  • a material library is deployed in the application service function in the application server.
  • the material library can include various materials, such as different styles of stickers, and emoticon avatars with different expressions (such as cute cats, funny faces), or virtual portrait models of different styles, and so on.
  • the input of the AR media server also includes the material images from the application server.
  • the AR control of the first terminal device sends an AR interface operation instruction to the application server in response to the prop (such as a virtual portrait) selected by the user, and the AR interface operation instruction is used to indicate the virtual portrait selected by the user.
  • after the application server receives the AR interface operation instruction, it can send the virtual portrait model from the material library to the AR media server; the AR media server obtains data such as user 1's facial expressions and actions from the received first media stream, renders the virtual portrait model accordingly, and sends the rendered media stream to the second terminal device.
  • Example 1 may be applicable to an architecture where the AR media server is deployed on a central node, and may also be applicable to an architecture where the AR media server is deployed on an edge node.
  • when AR media servers are deployed at edge nodes, the AR media server corresponding to the first terminal device (the first AR media server) performs the media enhancement processing; the enhanced media stream is first sent to the AR media server corresponding to the second terminal device (the second AR media server), and the second AR media server sends it to the second terminal device through the second SBC.
  • Example 2: operation-interactive call scenarios, such as advertising marketing and distance education. This scenario can be applied when both ends of the call support AR, and also when only one side supports AR.
  • the input of the AR media server includes the first media stream of the first terminal device.
  • the first media stream is sent by the first terminal device to the AR media server through the first SBC.
  • the input of the AR media server also includes real-time operation instructions, such as model rotation, model movement or model scaling, space annotation and other operations.
  • the real-time operation instruction may be generated by user 1's operation, that is, sent by the first terminal device to the AR media server through the application server; it may also be generated by user 2's operation, that is, sent by the second terminal device through the application server.
  • the AR media server may include at least two media processing instances; taking two as an example, they are media processing instance 1 and media processing instance 2. Media processing instance 1 is used to perform media enhancement processing on the first media stream of the first terminal device; its input may include the first media stream and real-time operation instructions, and the media stream after media enhancement processing is sent to the first terminal device through the first SBC.
  • media processing instance 2 is used to perform media enhancement processing on the second media stream of the second terminal device; its input may include the second media stream and real-time operation instructions, and the media stream after media enhancement processing is sent to the second terminal device through the second SBC.
  • the real-time operation instructions input to media processing instance 1 and media processing instance 2 may be the same, for example, both from the first terminal device or both from the second terminal device; they may also be different, for example, the real-time operation instructions input to media processing instance 1 come from the first terminal device while those input to media processing instance 2 come from the second terminal device.
  • when AR media servers are deployed at edge nodes, the first AR media server may perform media enhancement processing on the first media stream of the first terminal device, and the second AR media server may perform media enhancement processing on the second media stream of the second terminal device.
  • the input of the first AR media server may include a first media stream and a real-time operation instruction, and the media stream after the media enhancement processing of the first AR media server is sent to the first terminal device through the first SBC.
  • the input of the second AR media server may include the second media stream and real-time operation instructions, and the media stream after the media enhancement processing of the second AR media server is sent to the second terminal device through the second SBC.
  • the real-time operation instructions input to the first AR media server and the second AR media server may be the same, for example, both from the first terminal device or both from the second terminal device; they may also be different, for example, the real-time operation instructions input to the first AR media server come from the first terminal device while those input to the second AR media server come from the second terminal device.
  • the input of the AR media server may also include a virtual model, which may be sent by the application server to the AR media server. It should be noted that the virtual model need not be transmitted in real time; it can be transmitted once by the application server. In a scenario where the terminal device provides the virtual model, the terminal device may send the virtual model to the AR media server through the application server, or the AR control on the terminal device may send it to the AR media server through the auxiliary transmission channel.
  • taking a second-hand housing transaction scenario as an example, the housing provider corresponds to the second terminal device and the house buyer corresponds to the first terminal device.
  • the first terminal device sends the first media stream (as a background stream) of the house-buying user to the AR media server through the first SBC.
  • the first media stream can be collected by a rear camera on the first terminal device.
  • the AR plug-in of the first terminal device sends the house buyer's model operation instructions for operating the second-hand house model to the AR media server through the application server.
  • the AR media server obtains the spatial pose data of the house buyer's perspective from the first media stream, renders the second-hand house model according to that spatial pose data, superimposes the rendered second-hand house model on the house buyer's background stream, and sends the result to the first terminal device.
  • the second media stream for the second terminal device may adopt a similar manner to the processing manner for the first media stream, and the description will not be repeated here.
  • the spatial pose data may be sent by the AR control of the first terminal device to the AR media server through the auxiliary transmission channel.
  • Example 3: image-interactive call scenarios, such as an AR holographic call.
  • both ends of the call support AR.
  • at least two media processing instances can be deployed in the AR media server; taking two as an example, see FIG. 10, they are media processing instance 1 and media processing instance 2.
  • the input and output of media processing example 1 and the input and output of media processing example 2 are shown in FIG. 10.
  • the input of media processing instance 1 includes the first media stream and the second auxiliary media stream. The first media stream can be sent by the first terminal device to media processing instance 1 through the first SBC, and the second auxiliary media stream can be sent by the AR control on the second terminal device to media processing instance 1 through the auxiliary transmission channel.
  • the input of media processing instance 2 includes the second media stream and the first auxiliary media stream. The second media stream can be sent by the second terminal device to media processing instance 2 through the second SBC, and the first auxiliary media stream can be sent by the AR control on the first terminal device to media processing instance 2 through the auxiliary transmission channel.
  • the first media stream and the first auxiliary media stream may be collected by the first terminal device through the front camera and the rear camera respectively, and the second media stream and the second auxiliary media stream may likewise be collected by the front camera and the rear camera of the second terminal device respectively.
  • the first media stream includes the image of the environment where user 1 of the first terminal device is located, and the first auxiliary media stream includes the portrait image of user 1; the second media stream includes the image of the environment where user 2 of the second terminal device is located, and the second auxiliary media stream includes the portrait image of user 2.
  • the first auxiliary media stream of user 1 is input to media processing instance 2, which obtains user 1's real-time expression and action data from it and drives user 1's virtual model; media processing instance 2 uses the second media stream of user 2 as the background stream, obtains the spatial pose data of user 2's perspective from the background stream, renders user 1's virtual model according to that spatial pose data, and superimposes it on the second media stream as the downstream video stream of the second terminal device.
  • the second auxiliary media stream of user 2 is input to media processing instance 1, which obtains user 2's real-time expression and action data from it and drives user 2's virtual model; media processing instance 1 uses the first media stream of user 1 as the background stream, obtains the spatial pose data of user 1's perspective from the background stream, renders user 2's virtual model according to that spatial pose data, and superimposes it on the first media stream as the downstream video stream of the first terminal device.
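The cross-feeding between the two instances is the key wiring here; the stub sketch below shows only that wiring. Every function body is a placeholder, since the text specifies the data flow, not the algorithms.

```python
# Data-flow sketch of the holographic call; every function body is a stub.
def drive_model(aux_stream, model):
    """Extract the remote user's real-time expressions/motion and animate their avatar."""
    return {"model": model, "animated_by": aux_stream}

def render_into(background_stream, animated_model):
    """Estimate the local viewing pose from the background stream and composite."""
    return {"background": background_stream, "overlay": animated_model}

# Instance 1 serves terminal 1: user 2's avatar is rendered into user 1's scene.
downlink_1 = render_into("media_stream_1", drive_model("aux_stream_2", "avatar_user_2"))
# Instance 2 serves terminal 2: user 1's avatar is rendered into user 2's scene.
downlink_2 = render_into("media_stream_2", drive_model("aux_stream_1", "avatar_user_1"))
print(downlink_1, downlink_2, sep="\n")
```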
  • the input and output of the first AR media server and the second AR media server are shown in FIG. 11.
  • the processing methods of the first AR media server and the second AR media server are similar to the processing methods of the aforementioned media processing example 1 and media processing example 2, and will not be repeated here.
  • Example 4: virtual-real superposition call scenarios, such as remote guidance. This scenario can be applied when both ends of the call support AR, and also when only one side supports AR.
  • the input of the AR media server includes the first media stream of the first terminal device and the auxiliary media stream (including point cloud data) of the first terminal device.
  • a depth camera may be configured on the first terminal device to obtain point cloud data, and the point cloud data is used to generate a depth map of the captured picture, such as an RGB-D (red, green, blue plus depth) image.
  • the first media stream of the first terminal device is input to the AR media server, and the first media stream is used as the background stream.
  • the AR media server recognizes the spatial positions of objects in the background stream with higher accuracy based on the point cloud data; after recognizing an object, it superimposes a virtual model, a label, or the like on the background stream, and the output of the AR media server is used as the downlink video stream of the first terminal device and the second terminal device.
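To see why the depth information helps localization, here is a back-projection sketch; FX, FY, CX, CY are assumed pinhole intrinsics, not values from the text. With a depth map, the pixel where an object is recognized maps to a metrically correct 3D anchor for the virtual model.

```python
# Back-projection sketch with hypothetical camera intrinsics.
import numpy as np

FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0  # assumed pinhole intrinsics

def backproject(u: int, v: int, depth_m: float) -> np.ndarray:
    """Pixel (u, v) plus depth -> 3D point in the camera frame (meters)."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return np.array([x, y, depth_m])

depth_map = np.full((480, 640), 2.0, dtype=np.float32)  # stand-in RGB-D depth plane
u, v = 400, 200                                         # pixel of a recognized object
print("anchor the virtual model at", backproject(u, v, float(depth_map[v, u])))
```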
  • FIG. 13 is a schematic flowchart of the AR-based communication method provided by this embodiment of the present application. Take the first terminal device triggering the AR video enhancement process to the second terminal device as an example.
  • the first terminal device triggers a call request to the application server through the first SBC.
  • the application server sends a first session creation request to the AR media server.
  • the first session creation request is used to request the creation of a first media session with the first SBC corresponding to the first terminal device.
  • the first session creation request carries the SDP information of the first SBC, such as the address information of the SBC, the type of the media stream, and the supported media parameters.
  • the first session creation request may be an INVITE message.
  • when receiving the first session creation request, the AR media server sends a first session creation response to the application server.
  • the first session creation response is used to indicate that the first media session is successfully created.
  • the first session creation response carries the first media description protocol (SDP) information of the AR media server; the first SDP information is used to describe the parameters of the media stream channel for creating the first media session between the first SBC and the AR media server, such as the address information of the AR media server, the type of the media stream, and the supported media parameters.
  • the first session creation response may be 200 OK.
  • the AR media server receives the second session creation request sent by the application server.
  • the second session creation request is used to request the creation of a second media session with the second SBC.
  • the second session creation request may be an INVITE message.
  • the second session creation request may carry a service indication.
  • the service indication is used to indicate the media processing and media flow direction required for this session.
  • the service indication may be a service identification (ServiceID).
  • the service indication may also be called the AR service indication.
  • the content indicated by the service indication differs across application scenarios; that is, in different application scenarios the media enhancement processing provided by the AR media server is different, and the flow direction of the media stream may also be different.
  • the second session creation request carries an association indication.
  • the association indication may be indicated by the call identifier (for example, CallID) of the second session creation request.
  • the association indication is used to associate the first media session with the second media session.
  • in this way, an association is established between the media stream channel between the first SBC and the AR media server and the media stream channel between the AR media server and the second SBC. It can also be said that the media stream of the first terminal device forwarded by the first SBC needs to pass through the AR media server before reaching the second SBC to which the second terminal device belongs.
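A toy association table conveys the idea: the association indication (here a CallID key) ties the two media sessions together so that media entering on one leg leaves on the other. The structure is illustrative, not the patent's data model.

```python
# Toy association table keyed by CallID (illustrative structure only).
sessions = {}

def create_session(call_id: str, leg: str, peer: dict):
    sessions.setdefault(call_id, {})[leg] = peer

def route(call_id: str, media_from_leg: str) -> dict:
    other = "second" if media_from_leg == "first" else "first"
    return sessions[call_id][other]  # where the enhanced stream is sent next

create_session("call-42", "first", {"addr": "sbc1.example", "port": 30000})
create_session("call-42", "second", {"addr": "sbc2.example", "port": 31000})
print(route("call-42", "first"))  # media from the first leg exits toward the second SBC
```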
  • the AR media server sends a second session creation response to the application server.
  • the second session creation response is used to indicate that the second media session is successfully created.
  • the second session creation response carries the second SDP information of the AR media server.
  • the second media description protocol SDP information is used to describe the parameters of the media stream channel used to create the second media session between the second SBC and the AR media server.
  • the second session creation response may be a 183 message.
  • the association relationship between the first interface and the second interface on the AR media server may be established.
  • the first interface is for receiving the media stream sent by the first SBC
  • the second interface is for sending the media stream to the second SBC.
  • the first interface and the second interface may be physical interfaces or physical sub-interfaces, and may also be logical interfaces or logical sub-interfaces.
  • the application server sends a call request to the second terminal device through the second SBC.
  • the application server may bring the second SDP of the AR media server to the second SBC in the call request.
  • the first case is that the AR media server is introduced during VoLTE call establishment, that is, the media sessions between the AR media server and the SBCs need to be established during call setup. For example, the first case can be adopted when the first terminal device initiates the AR video enhancement process at the time the original call is established.
  • the second situation is that the original call does not need to perform AR media enhancement (for example, the original call is only an audio call), and the AR media enhancement process is triggered during the call.
  • FIG. 14A uses the same AR media server at both ends of the call as an example.
  • the first terminal device sends a call request 1 to the first SBC.
  • the call request 1 carries media description protocol (session description protocol, SDP) information of the first terminal device.
  • the call request 1 may, but is not limited to, use a session initiation protocol (session initiation protocol, SIP), and may also use other types of transmission protocols, which is not limited in this application.
  • the SDP of the aforementioned terminal may include parameters such as address information, type of media stream, and supported codec format.
  • SDP is used for media plane negotiation between two session entities and for reaching agreement. It belongs to the signaling language family and can be described in text (character) form. SDP may include one or more of the following: session ID, session version, session time, the IP address and port of the local media stream, and description information of the media stream (such as media type, transport protocol, and media format).
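For concreteness, here is a minimal SDP body of this general shape, with a toy extraction of the fields mentioned above; all values are placeholders, not taken from the patent.

```python
# A minimal SDP body (placeholder values) and a toy field extraction.
SDP = """v=0
o=- 1234 1 IN IP4 192.0.2.1
s=AR call
c=IN IP4 192.0.2.1
t=0 0
m=video 30000 RTP/AVP 96
a=rtpmap:96 H264/90000
"""

def parse_sdp(text: str) -> dict:
    info = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition("=")
        if key == "c":
            info["address"] = value.split()[-1]  # IP where media should be sent
        elif key == "m":
            media, port, proto, fmt = value.split()[:4]
            info.update(media=media, port=int(port), transport=proto, payload=fmt)
    return info

print(parse_sdp(SDP))
```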
  • the SDP information of the first terminal device is used to describe the parameters of the media stream channel that creates the media session between the first terminal device and the first SBC.
  • after receiving call request 1, the first SBC replaces the SDP information of the first terminal device in call request 1 with the SDP information of the first SBC to obtain call request 2, and sends call request 2 to the S-CSCF.
  • S1403 After receiving the call request 2, the S-CSCF forwards the call request 2 to the application server.
  • the S-CSCF determines that the first terminal device has subscribed to the AR media enhancement service according to the subscription data of the first terminal device, and then forwards the call request 2 to the application server.
  • the application server is used to provide AR media enhancement services.
  • the application server replaces the SDP information of the first SBC in the call request 2 with the second SDP information of the AR media server to obtain the call request 3, and sends the call request 3 to the S-CSCF.
  • the S-CSCF forwards the call request 3 to the second SBC.
  • the second SBC may determine that the previous hop of the media stream channel is the AR media server according to the second SDP information of the AR media server.
  • the second SBC replaces the second SDP information of the AR media server in the call request 3 with the SDP information of the second SBC to obtain the call request 4, and sends the call request 4 to the second terminal device.
  • the second terminal device sends a call response 4 (corresponding to the call request 4) to the second SBC, and the call response 4 may carry the SDP information of the second terminal device.
  • after receiving call response 4, the second SBC sends call response 3 (corresponding to call request 3) to the S-CSCF.
  • the call response 3 may carry the SDP information of the second SBC.
  • S1413 After receiving the call response 3, the S-CSCF forwards the call response 3 to the application server.
  • after receiving call response 3, the application server sends the SDP information of the second SBC to the AR media server. After receiving the SDP information of the second SBC, the AR media server may determine that the next hop of the media stream channel is the second SBC.
  • the application server sends a call response 2 (a response corresponding to the call request 2) to the S-CSCF.
  • the call response 2 can carry the second SDP information of the AR media enabler.
  • S1416 The S-CSCF forwards the call response 2 to the first SBC.
  • after receiving call response 2, the first SBC sends call response 1 to the first terminal device.
  • the call response 1 carries the SDP information of the first SBC.
  • call response 1 to call response 4 may adopt the 183 message type.
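The flow in S1401-S1417 is essentially repeated SDP rewriting: each hop that anchors media replaces the connection address in the offer with its own before forwarding. A sketch with placeholder addresses:

```python
# Sketch of the SDP swapping in the call flow (addresses are placeholders).
def replace_connection(sdp: str, new_addr: str) -> str:
    """Swap the connection address so media is pulled through this hop."""
    return "\n".join(
        f"c=IN IP4 {new_addr}" if line.startswith("c=") else line
        for line in sdp.splitlines()
    )

offer = "v=0\nc=IN IP4 10.0.0.2\nm=video 30000 RTP/AVP 96"  # from the first terminal
offer = replace_connection(offer, "198.51.100.1")  # first SBC substitutes its SDP
offer = replace_connection(offer, "203.0.113.5")   # application server inserts the AR media server
print(offer)  # the offer the far side eventually answers toward
```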
  • FIG. 14B is a schematic diagram of the flow of the AR-based communication method provided in this embodiment of the application, taking the first terminal device triggering the AR video enhancement flow to the second terminal device as an example.
  • Figure 14B takes as an example the two ends of the call corresponding to different AR media servers, and the two ends of the call corresponding to different application servers.
  • the first application server sends a session creation request 1 to the first AR media server.
  • session creation request 1 is used to request the creation of a first media session with the first SBC corresponding to the first terminal device.
  • the session creation request 1 carries the SDP information of the first SBC.
  • session creation request 1 may be an INVITE message.
  • Session creation response 1 is used to indicate that the first media session is successfully created.
  • session creation response 1 carries the first media description protocol (SDP) information of the first AR media server; the first SDP information is used to describe the parameters of the media stream channel for creating the first media session between the first SBC and the first AR media server.
  • session creation response 1 may be 200 OK.
  • Session creation request 2 is used to request the creation of a second media session with the second SBC.
  • the session creation request 2 may be an INVITE message.
  • the session creation request 2 may carry a service indication.
  • the second session creation request carries the first association indication.
  • the first association indication may be indicated by the call identification (for example, CallID) of the session creation request 2.
  • the first association indication is used to associate the first media session with the second media session.
  • S1407a The first AR media server sends a session creation response 2 to the first application server.
  • Session creation response 2 is used to indicate that the second media session is successfully created.
  • the second session creation response carries the second SDP information of the AR media server.
  • the second media description protocol SDP information is used to describe the parameters of the media stream channel used to create the second media session between the second SBC and the first AR media server.
  • S1408a: The first application server replaces the SDP information of the first SBC in call request 2 with the second SDP information of the first AR media server to obtain call request 3, and sends call request 3 to S-CSCF1.
  • S1409a: S-CSCF1 forwards call request 3 to S-CSCF2.
  • S1410a: S-CSCF2 forwards call request 3 to the second application server.
  • S1411a: The second application server sends session creation request 3 to the second AR media server. Session creation request 3 requests creation of a third media session with the first AR media server and carries the second SDP information of the first AR media server.
  • S1412a: On receiving session creation request 3, the second AR media server sends session creation response 3 to the second application server. Session creation response 3 indicates that the third media session was created successfully and carries the first SDP information of the second AR media server, which describes the parameters of the media stream channel of the third media session between the first AR media server and the second AR media server.
  • S1413a: The second AR media server receives session creation request 4 from the second application server. Session creation request 4 requests creation of a fourth media session with the second SBC and may be an INVITE message. Session creation request 4 may carry a service indication, and carries a second association indication, which may be conveyed by the call identifier (for example, the CallID) of session creation request 4. The second association indication associates the third media session with the fourth media session.
  • S1414a: The second AR media server sends session creation response 4 to the second application server. Session creation response 4 indicates that the fourth media session was created successfully and carries the second SDP information of the second AR media server, which describes the parameters of the media stream channel of the fourth media session between the second SBC and the second AR media server. A sketch of how sessions sharing one association indication can be tracked follows.
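The association indications in S1406a and S1413a pair the two legs that an AR media server must bridge. The following is a minimal sketch, under assumed data structures, of a registry that groups media sessions by their call identifier; none of these names come from the patent.

```python
# Hypothetical bookkeeping inside an AR media server: media sessions
# created with the same association indication (here, a CallID string)
# are grouped, so media arriving on one leg can be enhanced and then
# forwarded on the other leg of the same call.
import uuid
from collections import defaultdict

class MediaSessionRegistry:
    def __init__(self) -> None:
        self._by_call: dict[str, list[dict]] = defaultdict(list)

    def create_session(self, call_id: str, peer_sdp: str) -> dict:
        session = {
            "session_id": str(uuid.uuid4()),
            "call_id": call_id,
            "peer_sdp": peer_sdp,  # describes the far end of this leg
        }
        self._by_call[call_id].append(session)
        return session

    def bridged_legs(self, call_id: str) -> list[dict]:
        # Legs sharing one CallID form a single enhanced media path.
        return self._by_call[call_id]

registry = MediaSessionRegistry()
call_id = "callid-12345@example.invalid"  # the association indication
registry.create_session(call_id, "sdp-of-first-sbc")
registry.create_session(call_id, "sdp-of-second-sbc")
assert len(registry.bridged_legs(call_id)) == 2
```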
  • S1415a: The second application server replaces the second SDP information of the first AR media server in call request 3 with the second SDP information of the second AR media server to obtain call request 4, and sends call request 4 to S-CSCF2.
  • S1416a: S-CSCF2 forwards call request 4 to the second SBC. From the second SDP information of the second AR media server, the second SBC can determine that the previous hop of the media stream channel is the second AR media server.
  • S1417a: The second SBC replaces the second SDP information of the second AR media server in call request 4 with its own SDP information to obtain call request 5, and sends call request 5 to the second terminal device.
  • S1418a: The second terminal device sends call response 5 to the second SBC; call response 5 may carry the SDP information of the second terminal device.
  • S1419a: After receiving call response 5, the second SBC sends call response 4 to S-CSCF2; call response 4 may carry the SDP information of the second SBC.
  • S1420a: After receiving call response 4, S-CSCF2 forwards it to the second application server.
  • S1421a: After receiving call response 4, the second application server sends the SDP information of the second SBC to the second AR media server, which can then determine that the next hop of the media stream channel is the second SBC.
  • S1422a: The second application server sends call response 3 to S-CSCF2; call response 3 may carry the first SDP information of the second AR media server.
  • S1423a: S-CSCF2 sends call response 3 to S-CSCF1.
  • S1424a: S-CSCF1 sends call response 3 to the first application server.
  • S1425a: The first application server sends the first SDP information of the second AR media server to the first AR media server, which can then determine that the next hop of the media stream channel is the second AR media server.
  • S1426a: The first application server sends call response 2 to the first SBC; call response 2 carries the first SDP information of the first AR media server.
  • S1427a: After receiving call response 2, the first SBC sends call response 1 to the first terminal device; call response 1 carries the first SDP information of the first SBC.
  • Call response 1 through call response 4 may use the 183 message type. A sketch of the next-hop bookkeeping in S1421a and S1425a follows.
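As a minimal illustration of that bookkeeping, the sketch below parses the peer's advertised media endpoint out of its SDP and records it as the forwarding target for a leg. The SDP fragment and all names are assumptions for illustration only.

```python
def parse_media_endpoint(sdp: str) -> tuple:
    # Extract the advertised IP and video port from a (simplified) SDP body.
    ip, port = None, None
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            ip = line[len("c=IN IP4 "):]
        elif line.startswith("m=video "):
            port = int(line.split()[1])
    if ip is None or port is None:
        raise ValueError("SDP lacks a video endpoint")
    return ip, port

# Forwarding table: media-session leg -> where downlink media must go.
next_hop = {}

# S1425a: the first AR media server learns the second AR media server's
# first SDP information (values are illustrative).
ar2_sdp = "c=IN IP4 10.0.2.30\r\nm=video 44000 RTP/AVP 96\r\n"
next_hop["leg-toward-second-terminal"] = parse_media_endpoint(ar2_sdp)
print(next_hop)  # {'leg-toward-second-terminal': ('10.0.2.30', 44000)}
```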
  • The following details the AR video communication flow in the second case, shown in FIG. 15.
  • The VoLTE call has been established, and the media streams do not yet pass through the AR media server. During the call between the first terminal device and the second terminal device, the AR media enhancement flow may be triggered by either the first or the second terminal device.
  • The following takes the first terminal device triggering the AR media enhancement flow through the AR control as an example. As in FIG. 14A, both ends of the call are assumed to correspond to the same AR media server, or an AR media server is deployed at one end of the call.
  • S1501: When the AR control on the first terminal device triggers an AR video enhancement request, the application server receives that request.
  • S1502: The application server sends AR video call re-request (re-INVITE) 1 to the S-CSCF. AR video call re-request 1 indicates that the first terminal device initiates an AR video call, and may carry identification information of the first terminal device, such as its SIP address or uniform resource locator (URL).
  • S1503: The S-CSCF forwards AR video call re-request 1 to the first SBC. The re-request may be a Re-INVITE message. From the identification information of the first terminal device, the S-CSCF can determine that the SBC to which the first terminal device belongs is the first SBC.
  • S1504: The first SBC sends AR video call re-request 2 to the first terminal device.
  • S1505: The first terminal device sends AR video call response 2 to the first SBC; AR video call response 2 carries the session description protocol (SDP) information of the first terminal device.
  • S1506: After receiving AR video call response 2, the first SBC sends AR video call response 1 to the S-CSCF; AR video call response 1 carries the SDP information of the first SBC.
  • S1507: After receiving AR video call response 1, the S-CSCF forwards it to the application server.
  • AR video call response 1 and AR video call response 2 may use a 200 OK message.
  • Steps S1508-S1511 parallel S1302-S1305 and are not repeated here.
  • S1512: The application server sends AR video call re-request 3 to the S-CSCF; AR video call re-request 3 carries the second SDP information of the AR media server.
  • S1513: The S-CSCF forwards AR video call re-request 3 to the second SBC. From the second SDP information of the AR media server, the second SBC can determine that the previous hop of the media stream channel is the AR media server.
  • S1514: The second SBC replaces the second SDP information of the AR media server in AR video call re-request 3 with its own SDP information to obtain AR video call re-request 4, and sends AR video call re-request 4 to the second terminal device.
  • S1515: The second terminal device sends AR video call response 4 to the second SBC; AR video call response 4 may carry the SDP information of the second terminal device.
  • S1516: After receiving AR video call response 4, the second SBC sends AR video call response 3 to the S-CSCF; AR video call response 3 may carry the SDP information of the second SBC.
  • S1517: After receiving AR video call response 3, the S-CSCF forwards it to the application server.
  • S1518: After receiving AR video call response 3, the application server sends the SDP information of the second SBC to the AR media server, which can then determine that the next hop of the media stream channel is the second SBC.
  • AR video call response 3 and AR video call response 4 may use a 200 OK message.
  • S1519: The application server sends AR video call confirmation 1 to the S-CSCF; AR video call confirmation 1 may carry the second SDP information of the AR media server.
  • S1520: The S-CSCF forwards AR video call confirmation 1 to the first SBC.
  • S1521: After receiving AR video call confirmation 1, the first SBC sends AR video call confirmation 2 to the first terminal device; AR video call confirmation 2 carries the SDP information of the first SBC.
  • AR video call confirmation 1 and AR video call confirmation 2 may use an acknowledgment (ACK) message. A sketch of how such a re-INVITE might be composed follows.
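The mid-call upgrade hinges on re-negotiating the established dialog. Below is a minimal sketch of composing a SIP re-INVITE like the one in S1502; the header values and URIs are placeholders, and a real deployment would reuse the dialog's actual Call-ID, tags, and CSeq rather than these assumed values.

```python
def build_reinvite(call_id: str, from_uri: str, to_uri: str,
                   cseq: int, sdp: str) -> str:
    # Compose a (simplified) SIP re-INVITE reusing the dialog's Call-ID;
    # an incremented CSeq marks it as a re-negotiation of the same dialog.
    headers = [
        f"INVITE {to_uri} SIP/2.0",
        f"Call-ID: {call_id}",
        f"From: <{from_uri}>;tag=as-tag-1",
        f"To: <{to_uri}>;tag=ue-tag-1",
        f"CSeq: {cseq} INVITE",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp)}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + sdp

print(build_reinvite("callid-12345@example.invalid",
                     "sip:as@example.invalid",
                     "sip:first-terminal@example.invalid",
                     2, "v=0\r\n"))
```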
  • In one possible implementation, an AR control is deployed on the first terminal device, and the AR video enhancement request it triggers in S1501 can be produced through the following flow, shown in FIG. 16.
  • When the first terminal device triggers the AR video enhancement flow, it starts the AR control, for example by pulling it up through a broadcast event. The user interface of the AR control can be superimposed on the call interface as a floating window, as shown for example in FIG. 4.
  • S1601: The user interface of the AR control may include an AR enhancement start button. On receiving the user's first operation on the start button, the AR control triggers the AR video enhancement request. The AR control has a communication connection, established through a UX or UI interface, with the media plugin service function in the application server.
  • S1602: The AR control sends the AR video enhancement request to the media plugin service function.
  • S1603: The media plugin service function sends the AR video enhancement request to the application service function.
  • S1604: The application service function triggers the AR video enhancement flow, for example by executing S1502. A sketch of the request in S1602 follows.
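As an illustration only, the following sketches the AR control posting the enhancement request of S1602 to the media plugin service function over HTTP. The endpoint URL and JSON fields are invented for the example, since the patent does not specify the encoding of this interface.

```python
import json
from urllib import request

def trigger_ar_enhancement(plugin_url: str, device_id: str) -> int:
    # POST the enhancement request to the media plugin service function,
    # which relays it to the application service function (S1602/S1603).
    body = json.dumps({
        "type": "ar-video-enhancement-request",
        "device": device_id,
    }).encode()
    req = request.Request(plugin_url, data=body,
                          headers={"Content-Type": "application/json"},
                          method="POST")
    with request.urlopen(req, timeout=5) as resp:
        return resp.status

# Example call against a hypothetical endpoint:
# trigger_ar_enhancement("https://plugin.example.invalid/ar-enhance", "ue-1")
```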
  • The following details the flow for establishing an auxiliary media channel between a terminal device and the AR media server. As shown in FIG. 17, the first terminal device and the second terminal device have already established the AR-enhanced video call, and the flow is described for establishing an auxiliary media channel between the first terminal device and the AR media server.
  • S1701: When the AR control of the first terminal device determines that an auxiliary media stream needs to be transmitted, it initiates a request to establish the auxiliary transmission channel. For example, if the user, through the AR control, triggers opening of the depth camera used to obtain point cloud data, the AR control determines that an auxiliary media stream needs to be transmitted. As another example, if an application used to generate AR spatial data is triggered through the AR control, the AR control likewise determines that an auxiliary media stream needs to be transmitted.
  • S1702: The AR control sends the establishment request to the media plugin service function in the application server, carrying the address the first terminal device will use to send the auxiliary media stream.
  • S1703: The media plugin service function sends the establishment request to the application service function.
  • S1704: The application service function sends the establishment request to the AR media server.
  • S1705: The AR media server sends an establishment response to the application service function; the response may carry the address the AR media server will use to receive the auxiliary media stream.
  • S1706: The application service function sends the establishment response to the media plugin service function.
  • S1707: The media plugin service function forwards the establishment response to the AR control of the first terminal device. The auxiliary transmission channel between the AR control and the AR media server is then established: its head end is the AR control and its tail end is the AR media server. The AR control then obtains the auxiliary media stream and sends it to the AR media server, using the sending address on the first terminal device and the receiving address on the AR media server. A sketch of this final step follows.
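Once the two addresses have been exchanged, streaming over the channel can be as simple as the sketch below, which frames point-cloud chunks over UDP. The transport, framing, and addresses are assumptions; the patent leaves the auxiliary channel's protocol open.

```python
import socket
import struct

def send_auxiliary_frames(local_addr, server_addr, frames):
    # Stream auxiliary media chunks (e.g. point-cloud frames) from the
    # terminal's sending address to the AR media server's receiving
    # address, prefixing each datagram with a sequence number and length.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(local_addr)
    try:
        for seq, payload in enumerate(frames):
            header = struct.pack("!IH", seq, len(payload))
            sock.sendto(header + payload, server_addr)
    finally:
        sock.close()

# Addresses exchanged in S1702/S1705 (illustrative values):
send_auxiliary_frames(("0.0.0.0", 50000), ("127.0.0.1", 50002),
                      [b"point-cloud-chunk-0", b"point-cloud-chunk-1"])
```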
  • The terms "system" and "network" are often used interchangeably herein.
  • The term "and/or" herein merely describes an association relationship between associated objects and covers three cases: for example, "A and/or B" can mean that A exists alone, that A and B both exist, or that B exists alone.
  • The character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
  • In this application, "at least one" means one or more, that is, one, two, three, or more; "multiple" means two or more, that is, two, three, or more.
  • "At least one of the following items" or a similar expression refers to any combination of those items, including a single item or any combination of multiple items. For example, at least one of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be singular or plural.
  • Words such as "first" and "second" are used only to distinguish what is described and should not be understood as indicating or implying relative importance or order.
  • "B corresponding to A" means that B is associated with A and that B can be determined according to A; however, determining B based on A does not mean that B is determined based on A alone, and B may also be determined based on A and/or other information.
  • The terms "including" and "having" in the embodiments, claims, and drawings of this application are not exclusive. For example, a process, method, system, product, or device that includes a series of steps or modules is not limited to the listed steps or modules, and may also include unlisted steps or modules.
  • The processor in the embodiments of this application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general-purpose processor may be a microprocessor or any conventional processor.
  • The method steps in the embodiments of this application may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a network device or a terminal device. The processor and the storage medium may also exist as discrete components in the network device or the terminal device.
  • The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions; when the computer program or instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are executed in whole or in part.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer program or instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server integrating one or more usable media. The usable medium may be a magnetic medium such as a floppy disk, a hard disk, or a magnetic tape; an optical medium such as a DVD; or a semiconductor medium such as a solid state disk (SSD).


Abstract

This application discloses an augmented reality (AR) communication system and an AR-based communication method, providing a way to integrate AR into voice and video calls and thereby improve user experience. An AR media server is deployed in the communication system, and media stream channels are opened between the AR media server and the SBCs to which the terminal devices on both sides of the call belong, so that the media streams transmitted between the terminal devices on the two sides of the call reach the AR media server after leaving the SBCs. The AR media server then performs media enhancement processing, integrating AR processing into the video call.


Claims (21)

  1. An augmented reality (AR) system, comprising a first AR media server and a first session border controller (SBC), wherein:
    the first SBC is configured to receive a first media stream from a first terminal device and send the received first media stream to the first AR media server; and
    the first AR media server is configured to perform media enhancement processing on the received first media stream.
  2. The system according to claim 1, wherein the system further comprises:
    an application server, configured to receive an AR interface operation indication from the first terminal device and send the operation indication to the first AR media server; and
    the first AR media server is specifically configured to perform media enhancement processing on the received first media stream according to the AR interface operation indication.
  3. The system according to claim 2, wherein the application server is deployed at a central node of the system.
  4. The system according to claim 2 or 3, wherein an auxiliary transmission channel is established between the first AR media server and the first terminal device; and
    the first AR media server is further configured to receive an auxiliary media stream from the first terminal device through the auxiliary transmission channel, and to perform media enhancement processing on the auxiliary media stream and the first media stream.
  5. The system according to claim 4, wherein the auxiliary media stream comprises one or more of point cloud data, spatial data, user-view video, or a virtual model.
  6. The system according to any one of claims 2 to 5, wherein:
    the application server is further configured to send a virtual model to the AR media server; and
    the AR media server is further configured to perform media enhancement processing on the virtual model and the first media stream.
  7. The system according to any one of claims 1 to 6, further comprising a second SBC, deployed at a second edge node of the system and configured to manage a second terminal device, wherein:
    the second SBC is further configured to receive a second media stream from the second terminal device and send the second media stream to the first AR media server; and
    the first AR media server is further configured to receive the second media stream and perform media enhancement processing on the first media stream and the second media stream.
  8. The system according to any one of claims 1 to 6, wherein the system further comprises a second SBC, deployed at a second edge node of the system and configured to manage the second terminal device;
    the first AR media server is further configured to send the media stream obtained after media enhancement processing to the second SBC; and
    the second SBC is configured to send the media stream from the first AR media server to the second terminal device.
  9. The system according to any one of claims 1 to 8, wherein the first SBC and the first AR media server are deployed at a first edge node of the system.
  10. The system according to any one of claims 1 to 8, wherein the first SBC is deployed at a first edge node of the system, and the first AR media server is deployed at a central node of the system.
  11. The system according to any one of claims 1 to 6, wherein a second AR media server and a second SBC are further deployed in the system; the first SBC and the first AR media server are deployed at a first edge node of the system, and the second SBC and the second AR media server are deployed at a second edge node of the system;
    the second SBC is configured to receive a second media stream from a second terminal device and send the received second media stream to the second AR media server; and
    the second AR media server is configured to perform media enhancement processing on the received second media stream.
  12. The system according to any one of claims 1 to 11, wherein:
    the first AR media server is further configured to send the first media stream obtained after media enhancement processing to a second SBC corresponding to a second terminal device.
  13. An AR-based communication method, applied to an AR communication system comprising a first session border controller (SBC) and a first augmented reality (AR) media server, the method comprising:
    receiving, by the first SBC, a first media stream from a first terminal device, and sending the received first media stream to the first AR media server; and
    performing, by the first AR media server, media enhancement processing on the received first media stream.
  14. The method according to claim 13, wherein the AR communication system further comprises an application server, and the method further comprises:
    receiving, by the application server, an AR interface operation indication from the first terminal device, and sending the operation indication to the first AR media server; and
    the performing, by the first AR media server, of media enhancement processing on the received first media stream comprises:
    performing, by the first AR media server, media enhancement processing on the received first media stream according to the AR interface operation indication.
  15. The method according to claim 13 or 14, wherein an auxiliary transmission channel is established between the first AR media server and the first terminal device, and the method further comprises:
    receiving, by the first AR media server, an auxiliary media stream from the first terminal device through the auxiliary transmission channel; and
    the performing, by the first AR media server, of media enhancement processing on the received first media stream comprises:
    performing, by the first AR media server, media enhancement processing on the auxiliary media stream and the first media stream.
  16. The method according to claim 15, wherein the auxiliary media stream comprises one or more of point cloud data, spatial data, user-view video, or a virtual model.
  17. The method according to any one of claims 14 to 16, wherein the method further comprises:
    sending, by the application server, a virtual model to the AR media server; and
    the performing, by the first AR media server, of media enhancement processing on the received first media stream comprises:
    performing, by the first AR media server, media enhancement processing on the virtual model and the first media stream.
  18. The method according to any one of claims 13 to 17, wherein the AR communication system further comprises a second SBC, configured to manage a second terminal device, and the method further comprises:
    receiving, by the second SBC, a second media stream from the second terminal device managed by the second SBC, and sending the second media stream to the first AR media server;
    receiving, by the first AR media server, the second media stream; and
    the performing, by the first AR media server, of media enhancement processing on the received first media stream comprises:
    performing, by the first AR media server, media enhancement processing on the first media stream and the second media stream.
  19. The method according to any one of claims 13 to 17, wherein the AR communication system further comprises a second SBC, configured to manage a second terminal device, and the method further comprises:
    sending, by the first AR media server, the media stream obtained after media enhancement processing to the second SBC; and
    sending, by the second SBC, the media stream from the first AR media server to the second terminal device.
  20. The method according to any one of claims 13 to 17, wherein a second AR media server and a second SBC are further deployed in the system, and the method further comprises:
    receiving, by the second SBC, a second media stream from a second terminal device, and sending the received second media stream to the second AR media server; and
    performing, by the second AR media server, media enhancement processing on the received second media stream.
  21. The method according to any one of claims 13 to 20, further comprising:
    sending, by the first AR media server, the first media stream obtained after media enhancement processing to a second SBC corresponding to a second terminal device.
PCT/CN2020/124168 2019-11-08 2020-10-27 Augmented reality (AR) communication system and AR-based communication method WO2021088691A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911089878.6 2019-11-08
CN201911089878.6A CN112788273B (zh) 2019-11-08 Augmented reality (AR) communication system and AR-based communication method

Publications (1)

Publication Number Publication Date
WO2021088691A1 true WO2021088691A1 (zh) 2021-05-14

Family

ID=75748546

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124168 WO2021088691A1 (zh) 2019-11-08 2020-10-27 Augmented reality (AR) communication system and AR-based communication method

Country Status (2)

Country Link
CN (1) CN112788273B (zh)
WO (1) WO2021088691A1 (zh)


Also Published As

Publication number Publication date
CN112788273B (zh) 2022-12-02
CN112788273A (zh) 2021-05-11


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20885775; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20885775; Country of ref document: EP; Kind code of ref document: A1)