WO2023185648A1 - Communication method, device and system - Google Patents

Communication method, device and system

Info

Publication number
WO2023185648A1
Authority
WO
WIPO (PCT)
Prior art keywords
call
video
call terminal
media
transmission channel
Prior art date
Application number
PCT/CN2023/083485
Other languages
English (en)
French (fr)
Inventor
黄顺炎
冯军辉
张春河
庄乃峰
王坤
徐长月
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023185648A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/10 Architectures or entities
    • H04L65/1016 IP multimedia subsystem [IMS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Definitions

  • The embodiments of the present application relate to the field of communication technology, and in particular to a communication method, device and system.
  • Embodiments of the present application provide a communication method, device and system that add a marking function for video pictures on top of the existing call function, so that during a call a user can show the other party the marks made on a target object in the video picture.
  • Embodiments of the present application provide a communication method that is executed by a call terminal.
  • The method includes: establishing a video call media transmission channel, and transmitting the call video stream between the call terminal and the peer call terminal through the video call media transmission channel; presenting a first video picture on the call interface of the call terminal based on target media data; detecting the user's marking operation on a target object in the first video picture and generating mark trace data, where the mark trace data describes the mark trace produced by the marking operation; and transmitting marked media data to the peer call terminal through the video call media transmission channel, so that the peer call terminal presents, based on the marked media data, a second video picture on its call interface, where the second video picture contains the mark trace.
  • The call terminal can transmit the marked media data over the existing video call media transmission channel, without spending extra time establishing a transmission channel dedicated to marked media data and without occupying additional port resources of the terminals (both the call terminal and the peer call terminal). This saves the port resources that marking a target object in the video picture during a call would otherwise occupy.
  • The embodiment of the present application does not limit the form of the user's marking operation on the first video picture.
  • The marking operation may be a touch operation on the first video picture, or a scribing (line-drawing) operation on the screen, such as drawing lines of different forms in the first video picture to mark the target object. For example, the user may encircle the target object with a rectangular box, circle, triangle or irregular closed shape, or mark it with a solid line, dotted line, arrow line or another special symbol (such as a five-pointed star drawn next to the target object).
  • A mark trace corresponding to the specific behavior of the marking operation is formed, and the user may use any manner that is capable of marking the target object.
  • The mark trace data includes, but is not limited to, parameters that can indicate the mark trace, such as its timestamp, color, shape and position (for example, the coordinates of each point on the mark trace).
  • For example, if the mark trace in a certain video picture is a red circle, the mark trace data includes data indicating the video picture (such as the timestamp or identification information of the video picture), data indicating that the color of the mark trace is red, and data indicating that the mark trace is a circle.
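As an illustration of the fields listed above, the following is a minimal sketch of how mark trace data could be represented and serialized. The class name `MarkTrace`, its field names, the normalised coordinates and the JSON encoding are all assumptions made for this example; the application does not prescribe a concrete format.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Tuple


@dataclass
class MarkTrace:
    """Hypothetical container for mark trace data (not defined by the source)."""
    frame_timestamp_ms: int   # identifies the video picture that was marked
    color: str                # e.g. "red"
    shape: str                # e.g. "circle", "rectangle", "freehand"
    # Points on the trace as normalised (x, y) pairs in [0, 1]; an assumption.
    points: List[Tuple[float, float]] = field(default_factory=list)

    def to_json(self) -> str:
        # Tuples become JSON arrays during serialisation.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(payload: str) -> "MarkTrace":
        raw = json.loads(payload)
        raw["points"] = [tuple(p) for p in raw["points"]]
        return MarkTrace(**raw)


# A red circle approximated by four points on its circumference.
trace = MarkTrace(
    frame_timestamp_ms=120_040,
    color="red",
    shape="circle",
    points=[(0.50, 0.40), (0.60, 0.50), (0.50, 0.60), (0.40, 0.50)],
)
restored = MarkTrace.from_json(trace.to_json())
```

Carrying the frame timestamp alongside the geometry lets the receiver associate the trace with the correct video picture before overlaying it.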
  • The target media data includes data corresponding to a first video frame, which is received from the peer call terminal through the video call media transmission channel and is used to present the first video picture. Presenting the first video picture on the call interface of the call terminal based on the target media data includes: decoding the data corresponding to the first video frame to present the first video picture on the call interface.
  • In one implementation, the media server first receives video stream data (the call video stream) from the peer call terminal through the second video call media transmission channel, where the video stream data includes the data corresponding to the first video frame. The media server then sends the video stream data to the call terminal through the first video call media transmission channel. The call terminal obtains the data corresponding to the first video frame from the video stream data and, based on that data, presents the first video picture on its call interface.
  • In another implementation, the media server first receives video stream data (the call video stream) from the peer call terminal through the second video call media transmission channel, where the video stream data includes the data corresponding to the first video frame. The media server then obtains the data corresponding to the first video frame from the video stream data and sends it to the call terminal through the first video call media transmission channel. The call terminal then presents the first video picture on its call interface based on the data corresponding to the first video frame.
  • Alternatively, the target media data includes data corresponding to a target image. The target image is stored locally in the call terminal and is used to present the first video picture. In this case, presenting the first video picture includes: decoding the data corresponding to the target image to present the first video picture on the call interface.
  • The marked media data includes data corresponding to a second video frame, where the second video frame is used to present a second video picture in which the mark trace is embedded; or, the marked media data includes the mark trace data.
  • The communication method provided by the embodiment of the present application further includes: stopping transmission of the call video stream through the video call media transmission channel, so that the marked media data can be transmitted through the video call media transmission channel.
  • The video call media transmission channel includes a first video call media transmission channel between the call terminal and the media server, and a second video call media transmission channel between the peer call terminal and the media server.
  • In this case, transmitting the marked media data to the peer call terminal through the video call media transmission channel includes: transmitting first marked media data to the media server through the first video call media transmission channel, to trigger the media server to transmit second marked media data to the peer call terminal through the second video call media transmission channel.
  • Both the first marked media data and the second marked media data include data corresponding to the second video frame; or, the first marked media data and the second marked media data are both the mark trace data; or, the first marked media data is the mark trace data, and the second marked media data is data corresponding to the second video frame.
  • In the first case, the first marked media data and the second marked media data are the same, and both include data corresponding to the second video frame.
  • Before transmitting the marked media data to the media server, the call terminal superimposes the mark trace on the first video picture presented on its call interface to form the second video picture, and sends the data corresponding to the second video frame (which is used to present the second video picture) to the media server. The media server then sends the data corresponding to the second video frame to the peer call terminal. After receiving this data, the peer call terminal can present the second video picture on its call interface.
  • In the second case, the first marked media data and the second marked media data are the same, and both are the mark trace data.
  • The call terminal sends the mark trace data to the media server, and the media server forwards it to the peer call terminal. The peer call terminal then superimposes the mark trace data on the target media data (which comes from the video content shot by the peer call terminal) to obtain the data corresponding to the second video frame, and presents the second video picture on its call interface based on that data.
  • In the third case, the first marked media data is different from the second marked media data: the first marked media data is the mark trace data, and the second marked media data is data corresponding to the second video frame.
  • The call terminal sends the mark trace data to the media server. The media server superimposes the mark trace data on the target media data (which the media server obtains from the peer call terminal) to obtain the data corresponding to the second video frame, and sends that data to the peer call terminal, so that the peer call terminal can present the second video picture on its call interface based on the data corresponding to the second video frame.
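The three relay cases described above can be summarised in a short sketch of the media server's forwarding decision. The names `RelayMode`, `relay_marked_media` and the `composite` callback are hypothetical and chosen only for illustration; payloads are modelled as plain strings rather than real encoded media.

```python
from enum import Enum
from typing import Callable


class RelayMode(Enum):
    FRAME_IN_FRAME_OUT = 1   # terminal already embedded the trace in the second video frame
    TRACE_IN_TRACE_OUT = 2   # server forwards the trace; the peer terminal overlays it
    TRACE_IN_FRAME_OUT = 3   # server overlays the trace and sends the second video frame


def relay_marked_media(
    mode: RelayMode,
    first_marked: str,
    target_media: str,
    composite: Callable[[str, str], str],
) -> str:
    """Return the second marked media data to send on the second channel."""
    if mode is RelayMode.TRACE_IN_FRAME_OUT:
        # Only in this case does the server itself superimpose trace and target media.
        return composite(target_media, first_marked)
    # In the other two cases the first and second marked media data are the same.
    return first_marked


def toy_composite(frame: str, trace: str) -> str:
    # Stand-in for real video compositing; an assumption for the example.
    return f"{frame}+{trace}"


out_frame = relay_marked_media(RelayMode.FRAME_IN_FRAME_OUT, "frame2", "frame1", toy_composite)
out_trace = relay_marked_media(RelayMode.TRACE_IN_TRACE_OUT, "trace", "frame1", toy_composite)
out_mixed = relay_marked_media(RelayMode.TRACE_IN_FRAME_OUT, "trace", "frame1", toy_composite)
```

Which case applies determines where the compositing cost falls: on the call terminal, on the peer call terminal, or on the media server.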
  • Alternatively, the video call media transmission channel is a direct video call media transmission channel between the call terminal and the peer call terminal. In this case, transmitting the marked media data to the peer call terminal through the video call media transmission channel includes: transmitting the marked media data to the peer call terminal through the direct video call media transmission channel.
  • In one case, the marked media data includes data corresponding to the second video frame.
  • The call terminal superimposes the mark trace on the first video picture presented on its call interface to form the second video picture, and sends the data corresponding to the second video frame (which is used to present the second video picture) to the peer call terminal. After receiving this data, the peer call terminal can present the second video picture on its call interface.
  • In another case, the marked media data is the mark trace data.
  • After obtaining the mark trace data, the call terminal sends it to the peer call terminal. The peer call terminal then superimposes the mark trace data on the target media data (which comes from the video content shot by the peer call terminal) to obtain the data corresponding to the second video frame, and presents the second video picture on its call interface based on that data.
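The superimposition step performed by the receiving side can be sketched as follows. This is a simplified illustration under stated assumptions: a frame is modelled as a grid of RGB tuples, the trace as normalised (x, y) points, and only the individual points are painted (a real implementation would draw connected strokes on decoded video frames). The function name `overlay_trace` is hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]


def overlay_trace(frame: Frame, points: Sequence[Tuple[float, float]], color: Pixel) -> Frame:
    """Return a copy of `frame` with each trace point painted in `color`."""
    height, width = len(frame), len(frame[0])
    marked = [row[:] for row in frame]  # leave the original (first) picture untouched
    for x, y in points:
        # Map normalised coordinates to pixel indices, clamped to the frame bounds.
        col = min(width - 1, max(0, int(x * width)))
        row = min(height - 1, max(0, int(y * height)))
        marked[row][col] = color
    return marked


BLACK: Pixel = (0, 0, 0)
RED: Pixel = (255, 0, 0)

first_picture: Frame = [[BLACK for _ in range(4)] for _ in range(4)]
second_picture = overlay_trace(first_picture, [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)], RED)
```

Using normalised coordinates is one way to keep the trace aligned when the two terminals render the picture at different resolutions; the source does not state how positions are encoded.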
  • The communication method provided by the embodiment of the present application further includes: receiving transmission channel indication information from the media server, where the transmission channel indication information instructs the call terminal to transmit the marked media data through the video call media transmission channel.
  • The transmission channel indication information can be carried in a SIP message.
  • The media server can indicate explicitly (that is, by sending the transmission channel indication information) that the marked media data is to be transmitted through the video call media transmission channel.
  • The media server can also indicate implicitly that the marked media data is to be transmitted through the video call media transmission channel.
  • The communication method provided by the embodiment of the present application further includes: sending a video picture marking application to the media server, where the video picture marking application includes the identification of the peer call terminal, and the identification is used to apply for a marking operation on the video picture corresponding to the peer call terminal; and receiving, from the media server, the video content sent by the peer call terminal, to present the first video picture.
  • There may be multiple peer call terminals in a call with the call terminal. During such a call, the call terminal may apply to transmit marked media data to a particular peer call terminal.
  • The communication method provided by the embodiment of the present application further includes: confirming that the call terminal has the resources required to perform a marking operation on the first video picture.
  • The state of the call terminal itself may change. For example, the current network signal of the call terminal may be poor, or the terminal may be on a 2G/3G network whose bandwidth is insufficient to support the marking operation. In such cases the call terminal does not have the resources required to perform the marking operation on the first video picture.
  • Confirming that the call terminal has the resources required to perform a marking operation on the first video picture includes: receiving a SIP message from the media server, where the SIP message includes a marking operation confirmation identifier that is used to confirm whether the call terminal has the resources required to perform the marking operation on the first video picture; and sending a response message to the SIP message to the media server, where the response message includes a marking operation response identifier that indicates that the call terminal has the resources required to perform the marking operation on the first video picture.
  • The marking operation confirmation identifier can be carried in a header field of the SIP message; or, where the SIP message includes the SDP information of the media server, the marking operation confirmation identifier can be carried in the SDP information of the media server.
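The confirmation exchange described above can be sketched as plain-text SIP messages. The header names `X-Mark-Confirm` and `X-Mark-Response`, the use of the INFO method, the 512 kbit/s threshold and the "no" value returned when resources are lacking are all assumptions made for this example; the application only states that the identifiers may be carried in a SIP header field or in SDP. The messages below are deliberately incomplete (no Via, From, To, Call-ID or CSeq).

```python
from typing import Dict, Tuple

CONFIRM_HEADER = "X-Mark-Confirm"     # hypothetical header carrying the confirmation identifier
RESPONSE_HEADER = "X-Mark-Response"   # hypothetical header carrying the response identifier


def build_confirm_request(terminal_uri: str) -> str:
    """Media server side: ask whether the terminal can perform a marking operation."""
    lines = [f"INFO {terminal_uri} SIP/2.0", f"{CONFIRM_HEADER}: query", "Content-Length: 0", "", ""]
    return "\r\n".join(lines)


def parse_headers(message: str) -> Tuple[str, Dict[str, str]]:
    """Split a message into its start line and a lower-cased header map."""
    head = message.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers


def has_marking_resources(network: str, bandwidth_kbps: int, min_kbps: int = 512) -> bool:
    # Mirrors the examples in the text: 2G/3G or too little bandwidth means no resources.
    return network not in {"2G", "3G"} and bandwidth_kbps >= min_kbps


def build_confirm_response(request: str, network: str, bandwidth_kbps: int) -> str:
    """Call terminal side: answer the confirmation query."""
    _, headers = parse_headers(request)
    if CONFIRM_HEADER.lower() not in headers:
        raise ValueError("request carries no marking operation confirmation identifier")
    answer = "yes" if has_marking_resources(network, bandwidth_kbps) else "no"
    lines = ["SIP/2.0 200 OK", f"{RESPONSE_HEADER}: {answer}", "Content-Length: 0", "", ""]
    return "\r\n".join(lines)


request = build_confirm_request("sip:alice@example.com")
ok_response = build_confirm_response(request, network="4G", bandwidth_kbps=2000)
weak_response = build_confirm_response(request, network="3G", bandwidth_kbps=2000)
```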
  • Embodiments of the present application provide a communication method that is executed by a media server.
  • The method includes: establishing a first video call media transmission channel and a second video call media transmission channel, where the first video call media transmission channel is a video call media transmission channel between the media server and the call terminal, and the second video call media transmission channel is a video call media transmission channel between the media server and the peer call terminal; and transmitting the call video stream between the call terminal and the peer call terminal through the first video call media transmission channel and the second video call media transmission channel, so as to implement the video call service between the call terminal and the peer call terminal, where the call video stream includes the video content shot by the call terminal or the peer call terminal.
  • The method further includes: receiving first marked media data from the call terminal through the first video call media transmission channel, where the first marked media data is used to present a second video picture on the call interface of the peer call terminal. The second video picture contains a mark trace, which is generated by the user's marking operation on a target object in a first video picture presented on the call interface of the call terminal; the first video picture is a video picture presented on the call interface of the call terminal based on target media data.
  • The method further includes: transmitting second marked media data to the peer call terminal through the second video call media transmission channel, where the second marked media data is used to present the second video picture on the call interface of the peer call terminal.
  • The media server serves as the transmission intermediary between the call terminal and the peer call terminal and can transmit the marked media data (including the first marked media data and the second marked media data) over the existing channels, so there is no need to spend extra time establishing a transmission channel dedicated to marked media data or to occupy additional port resources of the terminals (both the call terminal and the peer call terminal). This saves the port resources that marking a target object in the video picture during a call would otherwise occupy.
  • The target media data includes data corresponding to a first video frame, which the call terminal receives from the peer call terminal through the first video call media transmission channel and which is used to present the first video picture.
  • Alternatively, the target media data includes data corresponding to a target image, which is stored locally in the call terminal and is used to present the first video picture.
  • Both the first marked media data and the second marked media data include data corresponding to a second video frame, where the second video frame is used to present the second video picture in which the mark trace is embedded; or, the first marked media data and the second marked media data are both the mark trace data; or, the first marked media data is the mark trace data, and the second marked media data is data corresponding to the second video frame.
  • The communication method provided by the embodiment of the present application further includes: sending first transmission channel indication information to the call terminal, where the first transmission channel indication information instructs the call terminal to transmit the first marked media data through the first video call media transmission channel.
  • The communication method provided by the embodiment of the present application further includes: sending second transmission channel indication information to the peer call terminal, where the second transmission channel indication information indicates to the peer call terminal that the second marked media data is transmitted through the second video call media transmission channel.
  • The communication method provided by the embodiment of the present application further includes: stopping transmission of the call video stream through the first video call media transmission channel.
  • The communication method provided by the embodiment of the present application further includes: stopping transmission of the call video stream through the second video call media transmission channel.
  • The communication method provided by the embodiment of the present application further includes: receiving a video picture marking application from the call terminal, where the video picture marking application includes an identification of the peer call terminal, and the identification is used to apply for a marking operation on the video picture corresponding to the peer call terminal; sending a video picture marking request to the peer call terminal, where the video picture marking request is used to request a marking operation on the first video picture; and receiving a response message to the video picture marking request from the peer call terminal, where the response message indicates that the peer call terminal agrees to the marking operation on the first video picture.
  • The video picture marking request may also request that a voice call be converted into a video call, in which case the response message to the video picture marking request also indicates that the peer call terminal agrees to convert the voice call into a video call.
  • Where the first video picture comes from video content that the call terminal receives from the peer call terminal, the response message to the video picture marking request also indicates that the peer call terminal agrees to the media server capturing the call video stream that the peer call terminal sends to the media server.
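The application, request and response exchange handled by the media server can be sketched as follows. The message classes, the `ask_peer` callbacks and the function `handle_marking_application` are hypothetical names for illustration; real signalling would be carried in SIP messages rather than Python objects.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class MarkingApplication:
    applicant_id: str
    peer_id: str             # identification of the peer call terminal whose picture is to be marked


@dataclass
class MarkingRequest:
    applicant_id: str
    upgrade_to_video: bool   # set when the ongoing call is voice-only


@dataclass
class MarkingResponse:
    agreed: bool


PeerHandler = Callable[[MarkingRequest], MarkingResponse]


def handle_marking_application(
    application: MarkingApplication,
    peers: Dict[str, PeerHandler],
    call_is_voice_only: bool,
) -> bool:
    """Forward the application to the identified peer and report whether it agreed."""
    ask_peer = peers.get(application.peer_id)
    if ask_peer is None:
        return False  # the identification does not match any peer in this call
    response = ask_peer(MarkingRequest(application.applicant_id, call_is_voice_only))
    return response.agreed


# Two peers in a multi-party call: one accepts marking requests, one declines.
peers: Dict[str, PeerHandler] = {
    "peer-1": lambda req: MarkingResponse(agreed=True),
    "peer-2": lambda req: MarkingResponse(agreed=False),
}
accepted = handle_marking_application(MarkingApplication("caller", "peer-1"), peers, False)
declined = handle_marking_application(MarkingApplication("caller", "peer-2"), peers, True)
unknown = handle_marking_application(MarkingApplication("caller", "peer-9"), peers, False)
```

Keying the lookup on the peer identification reflects the multi-party scenario, where the application must single out one of several peer call terminals.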
  • The communication method provided by the embodiment of the present application further includes: confirming that the call terminal has the resources required to perform a marking operation on the first video picture.
  • Confirming that the call terminal has the resources required to perform a marking operation on the first video picture includes: sending a SIP message to the call terminal, where the SIP message includes a marking operation confirmation identifier that is used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture; and receiving a response message to the SIP message from the call terminal, where the response message includes a marking operation response identifier that indicates that the call terminal has the resources required to perform a marking operation on the first video picture.
  • Embodiments of the present application provide a call terminal, including: a processing module, a generating module and a sending module.
  • The processing module is used to establish a video call media transmission channel and to control the call terminal to transmit the call video stream between the call terminal and the peer call terminal through the video call media transmission channel, so as to implement the video call service between the call terminal and the peer call terminal; and to present a first video picture on the call interface of the call terminal based on target media data.
  • The generating module is used to detect the user's marking operation on a target object in the first video picture and to generate mark trace data, where the mark trace data describes the mark trace produced by the marking operation.
  • The sending module is used to transmit marked media data to the peer call terminal through the video call media transmission channel, so that the peer call terminal presents a second video picture on its call interface based on the marked media data, where the second video picture includes the mark trace.
  • The target media data includes data corresponding to a first video frame, which is received from the peer call terminal through the video call media transmission channel and is used to present the first video picture; the processing module is specifically configured to decode the data corresponding to the first video frame to present the first video picture on the call interface.
  • Alternatively, the target media data includes data corresponding to a target image, which is stored locally in the call terminal and is used to present the first video picture; the processing module is specifically configured to decode the data corresponding to the target image to present the first video picture on the call interface.
  • The marked media data includes data corresponding to a second video frame, where the second video frame is used to present a second video picture in which the mark trace is embedded; or, the marked media data includes the mark trace data.
  • The processing module is also configured to control the call terminal to stop transmitting the call video stream through the video call media transmission channel.
  • The video call media transmission channel includes a first video call media transmission channel between the call terminal and the media server, and a second video call media transmission channel between the peer call terminal and the media server. The sending module is specifically configured to transmit first marked media data to the media server through the first video call media transmission channel, to trigger the media server to transmit second marked media data to the peer call terminal through the second video call media transmission channel.
  • Both the first marked media data and the second marked media data include data corresponding to the second video frame; or, the first marked media data and the second marked media data are both the mark trace data; or, the first marked media data is the mark trace data, and the second marked media data is the data corresponding to the second video frame.
  • Alternatively, the video call media transmission channel is a direct video call media transmission channel between the call terminal and the peer call terminal; the sending module is specifically configured to transmit the marked media data to the peer call terminal through the direct video call media transmission channel.
  • The call terminal further includes a receiving module. The receiving module is configured to receive transmission channel indication information from the media server, where the transmission channel indication information instructs the call terminal to transmit the marked media data through the video call media transmission channel.
  • The sending module is also configured to send a video picture marking application to the media server, where the video picture marking application includes the identification of the peer call terminal, and the identification is used to apply for a marking operation on the video picture corresponding to the peer call terminal.
  • The receiving module is further configured to receive a Session Initiation Protocol (SIP) message from the media server, where the SIP message includes a marking operation confirmation identifier that is used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture.
  • The sending module is also used to send a response message to the SIP message to the media server, where the response message includes a marking operation response identifier that indicates that the call terminal has the resources required to perform a marking operation on the first video picture.
  • Embodiments of the present application provide a media server, including: a processing module, a receiving module and a sending module.
  • The processing module is used to establish a first video call media transmission channel and a second video call media transmission channel, where the first video call media transmission channel is the video call media transmission channel between the media server and the call terminal, and the second video call media transmission channel is the video call media transmission channel between the media server and the peer call terminal; and to control the receiving module or the sending module to transmit the call video stream between the call terminal and the peer call terminal through the first video call media transmission channel and the second video call media transmission channel, so as to implement the video call service between the call terminal and the peer call terminal.
  • The receiving module is configured to receive first marked media data from the call terminal through the first video call media transmission channel, where the first marked media data is used to present a second video picture on the call interface of the peer call terminal. The second video picture contains a mark trace, which is generated by the user's marking operation on a target object in a first video picture presented on the call interface of the call terminal; the first video picture is a video picture presented on the call interface of the call terminal based on target media data.
  • The sending module is used to transmit second marked media data to the peer call terminal through the second video call media transmission channel, where the second marked media data is used to present the second video picture on the call interface of the peer call terminal.
  • The target media data includes data corresponding to a first video frame, which the call terminal receives from the peer call terminal through the first video call media transmission channel and which is used to present the first video picture.
  • Alternatively, the target media data includes data corresponding to a target image, which is stored locally in the call terminal and is used to present the first video picture.
  • Both the first marked media data and the second marked media data include data corresponding to a second video frame, where the second video frame is used to present the second video picture in which the mark trace is embedded; or, the first marked media data and the second marked media data are both the mark trace data; or, the first marked media data is the mark trace data, and the second marked media data is data corresponding to the second video frame.
  • The sending module is further configured to send first transmission channel indication information to the call terminal, where the first transmission channel indication information instructs the call terminal to transmit the first marked media data through the video call media transmission channel; and to send second transmission channel indication information to the peer call terminal, where the second transmission channel indication information indicates to the peer call terminal that the second marked media data is transmitted through the video call media transmission channel.
  • the processing module is further configured to control the media server to stop transmitting the call video stream through the first video call media transmission channel; and to control the media server to stop transmitting the call video stream through the second video call media transmission channel.
  • the video call media transmission channel transmits the call video stream.
  • the receiving module is further configured to receive a video picture marking application from the call terminal, where the video picture marking application includes an identifier of the peer call terminal, and the identifier is used to apply for performing a marking operation on the video picture corresponding to the peer call terminal;
  • the sending module is further configured to send a video picture marking request to the peer call terminal, where the video picture marking request is used to request that a marking operation be performed on the video picture.
  • the receiving module is further configured to receive a response message for the video picture marking request from the peer call terminal, where the response message indicates that the peer call terminal agrees to the marking operation on the video picture.
  • the sending module is further configured to send a Session Initiation Protocol (SIP) message to the call terminal, where the SIP message includes a marking operation confirmation identifier, and the marking operation confirmation identifier is used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture;
  • the receiving module is further configured to receive a response message to the SIP message from the call terminal, where the response message includes a marking operation response identifier, and the marking operation response identifier indicates that the call terminal has the resources required to perform a marking operation on the first video picture.
  • embodiments of the present application provide a call terminal, including a memory and at least one processor connected to the memory.
  • the memory is used to store computer program code.
  • the computer program code includes computer instructions. When the computer instructions are executed by the at least one processor, the call terminal is caused to execute the method described in any one of the first aspect and its possible implementations.
  • embodiments of the present application provide a media server, including a memory and at least one processor connected to the memory.
  • the memory is used to store computer program code.
  • the computer program code includes computer instructions. When the computer instructions are executed by the at least one processor, the media server is caused to execute the method described in any one of the second aspect and its possible implementations.
  • embodiments of the present application provide a computer-readable storage medium that includes computer instructions.
  • when the computer instructions run on a call terminal, the call terminal is caused to execute the method described in any one of the first aspect and its possible implementations.
  • embodiments of the present application provide a computer-readable storage medium that includes computer instructions.
  • when the computer instructions run on a media server, the media server is caused to execute the method described in any one of the second aspect and its possible implementations.
  • embodiments of the present application provide a computer program product.
  • when the computer program product is run on a computer, the method described in any one of the first aspect and its possible implementations is executed.
  • embodiments of the present application provide a computer program product.
  • when the computer program product is run on a computer, the method described in any one of the second aspect and its possible implementations is executed.
  • embodiments of the present application provide a chip including a memory and a processor.
  • Memory is used to store computer instructions.
  • the processor is configured to call and run the computer instructions from the memory, so that the call terminal executes the method described in any one of the first aspect and its possible implementations.
  • embodiments of the present application provide a chip including a memory and a processor.
  • Memory is used to store computer instructions.
  • the processor is configured to call and run the computer instructions from the memory, so that the media server performs the method described in any one of the second aspect and its possible implementations.
  • embodiments of the present application provide a communication system, including a call terminal and a media server.
  • the call terminal performs the method described in any one of the first aspect and its possible implementations
  • the media server performs the method described in any one of the second aspect and its possible implementations.
  • Figure 1 is a schematic architectural diagram of a communication system in a manual customer service service scenario provided by an embodiment of the present application
  • Figure 2 is a schematic flow chart of a voice call provided by an embodiment of the present application.
  • Figure 3 is a schematic flow chart of a video call provided by an embodiment of the present application.
  • Figure 4A is a hardware schematic diagram of a mobile phone provided by an embodiment of the present application.
  • Figure 4B is a software structure block diagram of a mobile phone provided by an embodiment of the present application.
  • FIG. 5 is a hardware schematic diagram of a server provided by an embodiment of the present application.
  • Figure 6 is a first schematic diagram of a communication method provided by an embodiment of the present application.
  • Figure 7 is a second schematic diagram of a communication method provided by an embodiment of the present application.
  • Figure 8 is a schematic structural diagram of a call terminal provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of another call terminal provided by an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a media server provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of another media server provided by an embodiment of the present application.
  • A and/or B can mean three situations: A exists alone, A and B exist simultaneously, and B exists alone.
  • first and second in the description and claims of the embodiments of this application are used to distinguish different objects, rather than to describe a specific order of objects.
  • for example, the first video call media transmission channel and the second video call media transmission channel are named so as to distinguish different video call media transmission channels, rather than to describe a specific order of the video call media transmission channels.
  • multiple call terminals refer to two or more call terminals.
  • voice calls or video calls between users can be realized through terminals.
  • user 1 can dial the number (for example, the phone number) of terminal 2 held by user 2 through terminal 1 held by user 1. After user 2 responds through terminal 2, terminal 1 and terminal 2 can establish a call connection, so that user 1 and user 2 can have a voice call. For example, terminal 1 collects user 1's voice and sends the collected voice to terminal 2.
  • Terminal 2 collects user 2's voice and sends the collected voice to terminal 1.
  • one of the terminals can be called a call terminal, and the opposite end of the call with the call terminal is called a peer call terminal.
  • the above-mentioned terminal 1 is a call terminal
  • the terminal 2 is the peer call terminal.
  • one call terminal may have multiple counterpart call terminals.
  • for example, when three terminals (such as terminal 1, terminal 2, and terminal 3) conduct a three-party call and terminal 1 is the call terminal, terminal 2 and terminal 3 are both peer call terminals of terminal 1.
  • during a voice call, the media stream transmitted between the call terminal and the peer call terminal is a voice stream.
  • a voice call media transmission channel (which may also be referred to as a voice stream transmission channel for short) is established between the call terminal and the peer call terminal.
  • the call terminal and the peer call terminal can transmit the voice stream based on the voice stream transmission channel. For example, the call terminal sends the voice stream of its user, collected by the microphone of the call terminal, to the peer call terminal through the voice stream transmission channel, and the peer call terminal sends the voice stream of its user, collected by the microphone of the peer call terminal, to the call terminal through the voice stream transmission channel.
  • during a video call, the media streams transmitted between the call terminal and the peer call terminal include a voice stream and a video stream, and the voice stream and the video stream use different transmission channels.
  • the call terminal and the peer call terminal can transmit the voice stream based on the voice stream transmission channel.
  • the call terminal and the peer call terminal can transmit the video stream based on the video stream transmission channel.
  • for example, the call terminal sends the voice stream of its user, collected by the microphone of the call terminal, to the peer call terminal through the voice stream transmission channel, and the peer call terminal sends the voice stream of its user, collected by the microphone of the peer call terminal, to the call terminal through the voice stream transmission channel;
  • the call terminal sends the video stream of its user, collected by the camera of the call terminal, to the peer call terminal through the video stream transmission channel, and the peer call terminal sends the video stream of its user, collected by the camera of the peer call terminal, to the call terminal through the video stream transmission channel.
  • Video marking is a video processing technology in which one or more objects in a video picture are marked in order to highlight the objects the user is paying attention to.
  • for example, if a video frame includes people, animals and buildings, and the objects the user pays attention to are the animals, a video marking tool can be used to mark the animals in the video picture.
  • for example, a rectangular frame can be drawn around the animal, or other marks (such as arrows or curves) can be used to identify the animal.
  • Video marking is widely used in daily life and brings considerable convenience in scenarios such as remote teaching, home appliance repair, broadband repair, and remote damage assessment.
  • one user can mark the target object in the video picture and then send the marked media data (which is used to present a video picture containing the marking trace) to the other user, so that the other user can more easily notice the relevant information in the video picture containing the marking trace.
  • for example, when a home appliance fails, the user (or consumer) can contact the after-sales service agency of the home appliance, for example by calling the after-sales service phone number to report the fault, so that the agency can help the user solve it.
  • in the process of solving the fault, the user can describe the fault in words. If the verbal description is not clear, in most cases the after-sales agency needs to send staff on site to troubleshoot and repair the fault, which is relatively inefficient.
  • a more efficient way to complete after-sales service is for the user to send an image or video showing the fault condition of the home appliance to the after-sales service agency. After taking the image or video of the home appliance, the user can perform a marking operation on the target object in the image or video to mark the fault point, for example, marking a button or indicator light of the home appliance; the marked media data is then sent to the after-sales service agency, and the staff of the agency can quickly learn the fault point of the home appliance from the marked media data, so that the user's problem can be solved in a targeted and efficient manner.
  • when one user needs to mark the target object in the video picture, the user performs the marking operation on the target object, marking trace data is generated, and marked media data is sent to the other user; the marked media data is used to present a video picture containing the marking trace.
  • a transmission channel dedicated to marked media data is established, through a server, between the video marking APP on the call terminal and the video marking APP on the peer call terminal, and the call terminal and the peer call terminal then transmit the marked media data through this transmission channel.
  • in one case, the call terminal uploads the video picture to the video marking APP on the call terminal and, after the target object in the video picture is marked in the video marking APP, sends the marked media data to the peer call terminal through the above transmission channel.
  • in another case, the peer call terminal uploads the video picture to the video marking APP on the peer call terminal and, after the target object in the video picture is marked in the video marking APP, sends the marked media data to the call terminal through the above transmission channel.
  • this transmission channel is different from the above-mentioned voice stream transmission channel and video stream transmission channel.
  • if the target object in the video picture is marked during a call between the call terminal and the peer call terminal, extra time is needed to establish a transmission channel for the marked media data, and the call terminal or the peer call terminal also needs to use additional ports to transmit the marked media data. For example, if video marking is performed during a voice call, the voice stream is transmitted between the call terminal and the peer call terminal based on the voice stream transmission channel, while the marked media data is transmitted between them based on a transmission channel established specifically for marked media data.
  • for another example, if the target object in the video picture is marked during a video call, the video stream collected by the camera is transmitted between the call terminal and the peer call terminal based on the video stream transmission channel, while the marked media data is transmitted between them based on the transmission channel established specifically for marked media data.
  • embodiments of the present application provide a communication method, device and system. The communication method can be applied to calls between terminals.
  • the call terminal, media server and peer call terminal in the communication system interact to establish a video call media transmission channel for transmitting the call video stream, and the call video stream is transmitted between the call terminal and the peer call terminal through the video call media transmission channel to implement the video call service between them; the call video stream includes video content shot by the call terminal or the peer call terminal. After that, the first video picture is presented on the call interface of the call terminal based on the target media data; the user's marking operation on the target object in the first video picture is detected, and marking trace data is generated, where the marking trace data describes the marking trace produced by the marking operation.
  • the marked media data is then transmitted to the peer call terminal through the established video call media transmission channel, so that the peer call terminal presents the second video picture on its call interface based on the marked media data; the second video picture contains the marking trace.
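As a rough illustration of the marking trace data described above, the following Python sketch models one trace as a shape plus a list of points and serializes it so that it could be carried over the already established video call media transmission channel. The `MarkTrace` class, its field names, and the JSON encoding are assumptions made purely for illustration; the application does not specify a data format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass
class MarkTrace:
    """Hypothetical description of one marking trace drawn on a video picture."""
    shape: str                         # e.g. "rectangle", "arrow", "curve"
    points: List[Tuple[float, float]]  # coordinates normalized to [0, 1]
    frame_id: int                      # video frame on which the trace was drawn


def encode_trace(trace: MarkTrace) -> bytes:
    """Serialize the trace so it can be sent as marked media data."""
    return json.dumps(asdict(trace)).encode("utf-8")


def decode_trace(payload: bytes) -> MarkTrace:
    """Rebuild the trace on the receiving side."""
    raw = json.loads(payload.decode("utf-8"))
    points = [tuple(p) for p in raw["points"]]  # JSON lists back to tuples
    return MarkTrace(raw["shape"], points, raw["frame_id"])


# A rectangle drawn around a target object, given by two opposite corners.
trace = MarkTrace("rectangle", [(0.2, 0.3), (0.6, 0.8)], frame_id=42)
restored = decode_trace(encode_trace(trace))
```

In this sketch the receiver would overlay `restored` on the corresponding video frame; alternatively, as the text notes, the sender or the media server could embed the trace into the video frame itself.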
  • the communication method provided by the embodiment of the present application can be applied to the process of a call between two call terminals, or can also be applied to the process of a call between multiple call terminals.
  • the communication method provided by the embodiments of this application can be applied to video conferencing scenarios, customer service scenarios, and so on.
  • the customer service scenario is the scenario in which the user calls the customer service center (which can also be called a customer service system).
  • the call between the user and the customer service center can address some of the user's service needs, for example customer service for electronic products, insurance, and mobile communications.
  • users can dial the number of the customer service center to establish a call connection with the customer service center.
  • most customer service processes are as follows: the user dials the number of the customer service center (makes a call), and after the customer service center responds, it pushes some prompt content.
  • the prompt content is pre-stored by the customer service center.
  • the user can select the required service option based on the prompt content.
  • the customer service center then provides targeted services to the user based on the selected service option (for example, responding to the service option selected by the user and solving the problem raised by the user).
  • the manual customer service scenario refers to the scenario in which the user talks to a staff member of the customer service center during the call with the customer service center; that is, after the user selects manual service according to the prompt content pushed by the customer service center, the customer service center continues by calling a staff member of the customer service center (specifically, by calling the terminal held by the staff member through its number).
  • the customer service center may be referred to as customer service or the customer service system.
  • the staff of the customer service center can be referred to as customer service personnel.
  • the user calls the customer service system through the call terminal held by the user (which can be referred to as the user call terminal).
  • the customer service staff talks to the user call terminal through the call terminal held by the customer service staff (which can be referred to as the customer service call terminal for short).
  • the customer service staff can apply for video marking. Specifically, after the customer service staff marks the target object in the video picture, the marked media data (used to present the video picture containing the marking trace to the user) is sent to the user call terminal. In this way, the user can obtain the solution provided by the customer service staff from the video picture containing the marking trace presented on the user call terminal, which can effectively solve the user's problem and improve service quality.
  • the embodiments of the present application take the manual customer service scenario as an example to describe the communication method. It can be understood that in the manual customer service scenario, the user call terminal initiates a call, and after the customer service call terminal responds, the two parties can mark the target object in the video picture during the call.
  • the call initiated by the user call terminal may be a voice call or a video call.
  • if the call is a voice call, the voice stream transmission channel is first established through media resource negotiation. If a video picture marking application is initiated during the voice call, media resource renegotiation is also required to establish a video stream transmission channel and convert the voice call into a video call; the marked media data is then transmitted based on the video stream transmission channel corresponding to the video call.
  • if the call is a video call, a video stream transmission channel is established through media resource negotiation, and the marked media data is then transmitted based on that video stream transmission channel.
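The two cases above can be summarized in a small Python sketch that lists what still has to happen before marked media data can be sent, depending on the type of the ongoing call. The function name and the step strings are hypothetical and only paraphrase the description; they are not part of the application.

```python
def steps_before_marking(call_type: str) -> list:
    """Return the remaining steps needed to send marked media data.

    call_type is "voice" or "video" (labels assumed for this sketch).
    """
    if call_type == "voice":
        # Only a voice stream transmission channel exists so far, so the
        # call must first be upgraded to a video call.
        return [
            "renegotiate media resources",
            "establish video stream transmission channel",
            "transmit marked media data over video stream transmission channel",
        ]
    if call_type == "video":
        # The video stream transmission channel was already negotiated
        # when the call was set up and can be reused directly.
        return ["transmit marked media data over video stream transmission channel"]
    raise ValueError(f"unknown call type: {call_type!r}")


voice_steps = steps_before_marking("voice")
video_steps = steps_before_marking("video")
```

In both cases the final step reuses the video stream transmission channel of the call rather than a separately established channel.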
  • the communication system corresponding to the customer service scenario can be regarded as a conference control system.
  • the communication system involves the access network, the IP multimedia subsystem (i.e. IMS, including the 4G/5G core network and the IMS core network), the customer service platform (also called the customer service system), the business system, and so on.
  • the communication system specifically includes: user call terminal 101, access network equipment 102, IP multimedia subsystem 103, customer service platform 104, business system 105, and customer service call terminal 106.
  • the IP multimedia subsystem 103 includes a core network (which can be a 4G core network and/or a 5G core network) and an IMS core network.
  • the 4G core network includes gateway devices (such as the S-GW and P-GW).
  • the 5G core network includes user plane function (UPF) network elements, access and mobility management function (AMF) network elements, etc.
  • the IMS core network includes the session border controller (SBC), the proxy call session control function (P-CSCF) device, the interrogating call session control function (I-CSCF) network element, and the serving call session control function (S-CSCF) network element.
  • the customer service platform 104 includes a media server.
  • SBC: used to provide secure access and media processing.
  • P-CSCF: the entry node device through which user call terminals access the IMS core network; it is mainly responsible for proxying signaling and messages.
  • I-CSCF: the unified entry node device of the IMS core network; it is responsible for assigning and querying the S-CSCF with which the user registers.
  • S-CSCF: the central node device of the IMS core network; it is mainly used for user registration, authentication control, session routing and service trigger control, and it maintains session status information.
  • the customer service platform 104 includes a control server (which can also be called a signaling server) and a media server.
  • the signaling server is mainly responsible for signaling negotiation and processing, and for controlling the user call terminal and the customer service call terminal to join or exit the call.
  • the media server is mainly responsible for audio and video processing and playback, call venue application and release, audio encoding and decoding, video encoding and decoding, and mixed coding processing.
  • the functions of the media server and the control server can be integrated in one server.
  • the communication method provided by the embodiments of the present application is described using the example in which the functions of the media server and the control server are both integrated in the media server.
  • Business system: responsible for triggering different business processes based on the calling party (such as the user call terminal) and the called number. The businesses can include but are not limited to video calls, video advertisements, video shows, etc.
  • the voice call process is described below to facilitate understanding of the voice call process in the customer service scenario.
  • the voice call process includes:
  • S201. The user call terminal sends an invitation (invite) message to the media server through the IMS network element.
  • IMS includes the network elements of the 4G/5G core network (including gateway equipment/user plane function network elements) and the SBC/P-CSCF and I-CSCF/S-CSCF network elements of the IMS core network.
  • these network elements in IMS may be collectively referred to as IMS network elements.
  • the user call terminal sending the invitation message to the media server through the IMS network element specifically means that, according to the architecture shown in Figure 1, the user call terminal sends the invitation message to the media server sequentially through the network elements of the 4G/5G core network, the SBC/P-CSCF network element, and the I-CSCF/S-CSCF network element.
  • the IMS network element is used to transparently transmit messages between the user call terminal and the media server, and does not process the messages.
  • in the following embodiments, messages or information sent or received through the IMS network element are handled in the same way as the invitation message in S201: the IMS network elements transparently transmit the messages or information. This will not be explained again for each step.
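The transparent forwarding described above can be pictured with the following Python sketch, in which each hop on the path relays the message unchanged. This follows the simplified description in the text (the IMS network elements do not process the message); the hop names, the function name, and the example message are assumptions made for illustration, and a real SIP deployment involves more per-hop handling than this.

```python
# Hops between the user call terminal and the media server, in the order
# given in the text (names are labels for this sketch only).
IMS_PATH = ["4G/5G core network", "SBC/P-CSCF", "I-CSCF/S-CSCF"]


def forward_transparently(message: str, hops=None):
    """Relay a message hop by hop without modifying it.

    Returns the message as delivered and the list of hops traversed.
    """
    hops = IMS_PATH if hops is None else hops
    trail = []
    for hop in hops:
        trail.append(hop)  # the hop only passes the message along
    return message, trail


invite = "INVITE sip:service@example.com SIP/2.0"  # placeholder request line
delivered, trail = forward_transparently(invite)
```

The same path, traversed in the opposite direction, carries messages from the media server back to the user call terminal.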
  • the user call terminal executes the above S201.
  • the customer service can be the customer service of a certain communication operator or the customer service of an Internet operator (such as customer service for banking services, customer service for insurance services, etc.), etc. This application does not limit the type of customer service.
  • the customer service call terminal in the embodiment of the present application refers to the call terminal corresponding to the customer service personnel in the customer service system, and the customer service call terminal is part of the customer service system.
  • the user dials the customer service system through the user call terminal, and after the customer service system responds, the media server in the customer service platform plays audio prompt content related to the user's business to prompt the user to select the corresponding service according to actual needs.
  • the media server in the customer service system continues to call the customer service call terminal. This will be understood in conjunction with the relevant steps of the following embodiments.
  • the user call terminal sends the invitation message through the session initiation protocol (SIP); in other words, the invitation message is sent as a SIP message. The invitation message carries the session description protocol (SDP) information of the user call terminal.
  • the SDP information includes the address information of the user call terminal, audio port information, and the audio codec format.
  • the SDP information is used to negotiate media resources with the media server in order to establish, between the user call terminal and the media server, the voice call media transmission channel used to transmit the call voice stream.
  • the address information of the device may be the IP address of the device.
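To make the SDP information more concrete, the following Python sketch builds a minimal audio-only SDP body containing the three items named above: the address, the audio port, and the audio codec format. The function name, the example address and port, and the choice of PCMU (static RTP payload type 0) are assumptions for illustration; the application does not state which codec or values are used.

```python
def build_audio_sdp(ip: str, port: int, payload_type: int = 0,
                    codec: str = "PCMU", clock_rate: int = 8000) -> str:
    """Build a minimal audio-only SDP description (illustrative sketch)."""
    lines = [
        "v=0",                                      # protocol version
        f"o=- 0 0 IN IP4 {ip}",                     # origin of the session
        "s=-",                                      # session name (unused)
        f"c=IN IP4 {ip}",                           # address information
        "t=0 0",                                    # unbounded session time
        f"m=audio {port} RTP/AVP {payload_type}",   # audio port information
        f"a=rtpmap:{payload_type} {codec}/{clock_rate}",  # audio codec format
    ]
    return "\r\n".join(lines) + "\r\n"


# 192.0.2.10 is an address from a range reserved for documentation.
sdp = build_audio_sdp("192.0.2.10", 49170)
```

The resulting text would be carried in the body of the SIP invitation message; the media server answers with its own SDP information in the same form.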
  • S202. The media server sends a ringing message to the user call terminal.
  • the ringing message indicates that the customer service call dialed by the user is being connected. At this time, the user call terminal is in a ringing state, waiting for a response from the customer service system (i.e., off-hook).
  • the ringing message can be an 18x series message, for example a 181 message (Call Is Being Forwarded, indicating that the call is being forwarded) or a 183 message (Session Progress, indicating the progress of establishing the dialog).
  • the ringing message carries the SDP information of the media server, which includes the IP address of the media server, audio port information, and the audio codec format.
  • the SDP information is used to negotiate media resources with the user call terminal in order to establish, between the user call terminal and the media server, the voice call media transmission channel used to transmit the call voice stream.
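The 18x series mentioned above belongs to the SIP provisional (1xx) responses. The short Python sketch below lists the 18x codes with the reason phrases defined for SIP and shows one way a terminal might recognize a provisional response that keeps it waiting; the helper name `is_ringing_phase` is hypothetical.

```python
# SIP 18x provisional responses and their standard reason phrases.
SIP_18X = {
    180: "Ringing",
    181: "Call Is Being Forwarded",
    182: "Queued",
    183: "Session Progress",
}


def is_ringing_phase(status_code: int) -> bool:
    """Return True for an 18x response received before the final answer."""
    return status_code in SIP_18X


def describe(status_code: int) -> str:
    """Format a status line fragment such as '183 Session Progress'."""
    return f"{status_code} {SIP_18X[status_code]}"


forwarded = describe(181)
progress = describe(183)
```

A final 200 response, by contrast, is outside this range and indicates that the call has been answered.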
  • S203. The media server sends a response message to the user call terminal through the IMS network element.
  • the user call terminal waits for the customer service system to respond (i.e., waits for the call to be connected). During this process, the user can hear a "beep...beep" waiting tone or a ring tone.
  • when the customer service system responds, the call is connected and the media server executes the above S203.
  • the IMS network element is used to transparently transmit the response message.
  • the media server can then play audio prompt content related to the user's business. Specifically, the media server sends the audio prompt content to the user call terminal over the voice call media transmission channel established above, and the audio prompt content can prompt the user to select different service contents as needed.
  • the voice call is a voice call made by a user to a communications operator
  • the audio prompt content may include:
  • the audio prompt content can also include some advertisements, publicity, etc.
  • the audio prompt content is related to the specific application scenario. This application does not limit the audio prompt content.
  • the media server detects the operation of selecting manual service and assigns a customer service staff member to the user (that is, selects a corresponding customer service call terminal for the user call terminal), and then the media server executes the following S204.
  • S204. The media server sends an invitation message to the customer service call terminal.
  • the invitation message is used to call the customer service call terminal to conduct a voice call with the user call terminal.
  • the invitation message includes the SDP information of the media server.
  • the SDP information of the media server includes the IP address of the media server, audio port information, and the audio codec format.
  • the SDP information is used to negotiate media resources with the customer service call terminal to establish a voice call media transmission channel between the customer service call terminal and the media server for transmitting the call voice stream.
  • S205. The customer service call terminal sends a response message to the media server.
  • after the customer service call terminal sends the response message, the customer service call terminal joins the call with the user call terminal.
  • the response message includes the SDP information of the customer service call terminal.
  • the SDP information of the customer service call terminal includes the IP address of the customer service call terminal, audio port information, and the audio codec format.
  • the SDP information is used to negotiate media resources with the media server to establish a voice call media transmission channel between the customer service call terminal and the media server for transmitting the call voice stream.
  • because the customer service call terminal is a new device in the customer service system that communicates with the user call terminal, media resource negotiation needs to be carried out again in the subsequent process to realize communication between the user call terminal and the customer service call terminal. That is, the media server renegotiates the media resources with the user call terminal (refer to S206-S207), and the media server renegotiates the media resources with the customer service call terminal (refer to S208-S209). Through the media resource renegotiation, the voice stream transmission channel (i.e., the voice call media transmission channel) can be established.
  • the voice call media transmission channel established through S206-S209 is a transmission channel that requires a media server as a medium, that is, an indirect voice call media transmission channel.
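  • the voice call media transmission channel includes the voice call media transmission channel between the user call terminal and the media server, and the voice call media transmission channel between the media server and the customer service call terminal.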
  • the media server sends a reinvite message to the user call terminal through the IMS network element.
  • the re-invite message is used to renegotiate media resources with the user call terminal to establish a voice call media transmission channel between the user call terminal and the media server.
  • the re-invite message includes the SDP information of the media server, and the SDP information of the media server includes the media server's IP address, audio port information, and audio codec format.
  • the user call terminal sends a response message to the media server through the IMS network element.
  • the response message includes the SDP information of the user's calling terminal.
  • the SDP information of the user's calling terminal includes the IP address, audio port information and audio codec format of the user's calling terminal.
  • the user call terminal can obtain the SDP information of the media server, and the media server can also obtain the SDP information of the user call terminal. In this way, the voice call media transmission channel between the user call terminal and the media server is established.
  • the media server sends a reinvite message to the customer service call terminal.
  • the re-invite message is used to renegotiate media resources with the customer service call terminal to establish a voice call media transmission channel between the customer service call terminal and the media server.
  • the re-invite message includes the SDP information of the media server, and the SDP information of the media server includes the media server's IP address, audio port information, and audio codec format.
  • the customer service call terminal sends a response message to the media server.
  • the response message includes the SDP information of the customer service call terminal.
  • the SDP information of the customer service call terminal includes the IP address, audio port information and audio codec format of the customer service call terminal.
  • the customer service call terminal can obtain the SDP information of the media server, and the media server can also obtain the SDP information of the customer service call terminal. In this way, the voice call media transmission channel between the customer service call terminal and the media server is established.
  • this voice call media transmission channel is used to transmit the call voice stream between the customer service call terminal and the user call terminal.
  • for example, when the user call terminal sends the call voice stream to the customer service call terminal, the user call terminal first sends the call voice stream to the media server over the voice call media transmission channel between the user call terminal and the media server, and the media server then sends the received call voice stream to the customer service call terminal over the voice call media transmission channel between the media server and the customer service call terminal.
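The indirect channel can be pictured as a relay with two legs. The following is a minimal in-memory sketch, not the media server's actual implementation; the class `MediaServerRelay`, its methods, and the terminal names are hypothetical, and real media would travel as RTP packets rather than Python byte strings.

```python
class MediaServerRelay:
    """Toy model of a media server that forwards a call stream between two legs."""

    def __init__(self):
        self.inbox = {}  # terminal name -> packets delivered to that terminal
        self.peer = {}   # terminal name -> name of the terminal on the other leg

    def add_leg(self, name):
        # One leg corresponds to one negotiated channel between a terminal
        # and the media server.
        self.inbox[name] = []

    def bridge(self, a, b):
        # Pair the two legs so that media arriving on one is sent out on the other.
        self.peer[a] = b
        self.peer[b] = a

    def receive(self, sender, packet):
        # The sender only ever talks to the media server; the server forwards.
        self.inbox[self.peer[sender]].append(packet)


relay = MediaServerRelay()
relay.add_leg("user")
relay.add_leg("agent")
relay.bridge("user", "agent")

relay.receive("user", b"voice-frame-1")   # user call terminal -> media server -> agent
relay.receive("agent", b"voice-frame-2")  # agent -> media server -> user call terminal
```

Neither terminal needs to know the other's address in this arrangement, which is why each leg is negotiated separately with the media server's own SDP information.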
  • a voice call media transmission channel directly used to transmit the call voice stream between the user call terminal and the customer service call terminal can also be established through media resource negotiation.
  • the voice call media transmission channel directly used to transmit the call voice stream is a channel that does not require a media server as a transfer device, and is not necessarily a direct connection channel between the user call terminal and the customer service call terminal.
  • the above-mentioned S206-S209 can be replaced by S206'-S210'.
  • the media server sends a reinvite message to the user call terminal through the IMS network element.
  • the re-invite message is used to renegotiate media resources with the user's call terminal.
  • the re-invite message includes the SDP information of the media server.
  • the SDP information of the media server includes the IP address of the media server, audio port information, and audio codec format.
  • the user call terminal sends a response message to the media server through the IMS network element.
  • the response message includes the SDP information of the user's calling terminal.
  • the SDP information of the user's calling terminal includes the IP address, audio port information and audio codec format of the user's calling terminal.
  • the media server sends a reinvite message to the customer service call terminal.
  • the re-invite message is used to renegotiate media resources with the customer service call terminal.
  • the re-invite message includes the SDP information of the user's call terminal.
  • the SDP information of the user call terminal includes the IP address, audio port information, and audio codec format of the user call terminal.
  • the customer service call terminal sends a response message to the media server.
  • the response message includes the SDP information of the customer service call terminal.
  • the SDP information of the customer service call terminal includes the IP address, audio port information and audio codec format of the customer service call terminal.
  • the media server sends a response message carrying the SDP information of the customer service call terminal to the user call terminal.
  • the user call terminal can obtain the SDP information of the customer service call terminal, and the customer service call terminal can obtain the SDP information of the user call terminal; that is, a voice call media transmission channel between the user call terminal and the customer service call terminal is established. Based on the established voice call media transmission channel, the user call terminal and the customer service call terminal can communicate directly without the media server forwarding the call voice stream.
  • the user call terminal can send the call voice stream directly to the customer service call terminal over the voice call media transmission channel between the user call terminal and the customer service call terminal, and the customer service call terminal can likewise send the call voice stream directly to the user call terminal over this channel.
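The SDP exchange behind the direct channel can be sketched as follows. This is a simplified model under stated assumptions: SIP signaling is reduced to method calls, SDP bodies are opaque strings, and the names `Terminal`, `on_reinvite`, `on_response`, and `negotiate_direct` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Terminal:
    name: str
    local_sdp: str                    # this terminal's own SDP information
    remote_sdp: Optional[str] = None  # SDP information of whoever it sends media to

    def on_reinvite(self, offered_sdp: str) -> str:
        # Store the SDP carried in the re-invite and answer with our own SDP.
        self.remote_sdp = offered_sdp
        return self.local_sdp

    def on_response(self, sdp: str) -> None:
        # A later response replaces the previously stored remote SDP.
        self.remote_sdp = sdp


def negotiate_direct(server_sdp: str, user: Terminal, agent: Terminal) -> None:
    """Media server acting as a signaling intermediary, in the order described above."""
    # Re-invite to the user call terminal carries the media server's SDP;
    # the response carries the user call terminal's SDP.
    user_sdp = user.on_reinvite(server_sdp)
    # Re-invite to the customer service call terminal carries the USER's SDP;
    # the response carries the customer service call terminal's SDP.
    agent_sdp = agent.on_reinvite(user_sdp)
    # The media server passes the customer service call terminal's SDP to the user.
    user.on_response(agent_sdp)


user = Terminal("user", "sdp-of-user")
agent = Terminal("agent", "sdp-of-agent")
negotiate_direct("sdp-of-media-server", user, agent)
```

After the exchange each terminal holds the other's SDP information, so the media server stays in the signaling path but drops out of the media path.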
  • the following takes a video call as an example to describe the call process.
  • the video call process is similar to the above-mentioned voice call process.
  • the video call process includes:
  • the user call terminal sends an invitation (invite) message to the media server through the IMS network element.
  • the invitation message carries the SDP information of the user's calling terminal.
  • the SDP information includes the IP address of the user's calling terminal, audio port information, audio codec format, video port information and video codec format.
  • the SDP information is used to negotiate media resources with the media server to establish a voice call media transmission channel for transmitting the call voice stream and a video call media transmission channel for transmitting the call video stream between the user call terminal and the media server. It is understandable that the video call process involves the transmission of both call voice streams and call video streams. Therefore, compared with the voice call process, the SDP information in the video call process also needs to include video port information and video codec formats.
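The difference between the voice-call and video-call SDP information can be sketched as one extra media section. The function names `build_av_sdp` and `media_kinds`, and the example ports, payload types, and codecs, are assumptions made for illustration only.

```python
def build_av_sdp(ip, audio, video):
    """Build a minimal SDP body with one audio and one video media section.

    `audio` and `video` are (port, payload_type, codec) tuples.
    """
    lines = ["v=0", f"o=- 0 0 IN IP4 {ip}", "s=-", f"c=IN IP4 {ip}", "t=0 0"]
    for kind, (port, pt, codec) in (("audio", audio), ("video", video)):
        lines.append(f"m={kind} {port} RTP/AVP {pt}")  # port information
        lines.append(f"a=rtpmap:{pt} {codec}")         # codec format
    return "\r\n".join(lines) + "\r\n"


def media_kinds(sdp):
    """List the media types described in an SDP body, in order of appearance."""
    return [line.split()[0][2:] for line in sdp.splitlines() if line.startswith("m=")]


# Hypothetical SDP information of a user call terminal starting a video call.
sdp = build_av_sdp(
    "192.0.2.20",
    audio=(40000, 8, "PCMA/8000"),
    video=(40002, 96, "H264/90000"),
)
kinds = media_kinds(sdp)
```

Each media section leads to its own transmission channel, which matches the text's distinction between the voice call media transmission channel and the video call media transmission channel.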
  • the media server sends a ringing message to the user's call terminal.
  • the ringing message carries the SDP information of the media server, which includes the IP address, audio port information, audio codec format, video port information, and video codec format of the media server.
  • the SDP information is used to negotiate media resources with the user call terminal to establish a voice call media transmission channel for transmitting call voice streams and a video call media transmission channel for transmitting call video streams between the user call terminal and the media server.
  • the media server sends a response message to the user's call terminal through the IMS network element.
  • when the user initiates a video call, after the customer service system answers the call from the user call terminal, the media server can play video prompt content related to the user's service. Specifically, the media server sends the video prompt content to the user call terminal over the voice call media transmission channel and video call media transmission channel established above, and the video prompt content can prompt the user to select different services as needed.
  • when the user follows the video prompt content and selects manual service, the media server detects the operation of selecting manual service and assigns a customer service agent to the user (that is, selects a corresponding customer service call terminal for the user call terminal), and then the media server executes the following S304.
  • the media server sends an invitation message to the customer service call terminal.
  • the invitation message is used to invite the customer service call terminal to the video call with the user call terminal.
  • the invitation message includes the SDP information of the media server.
  • the SDP information of the media server includes the IP address of the media server, audio port information, audio codec format, video port information, and video codec format.
  • the SDP information is used to negotiate media resources with the customer service call terminal to establish a voice call media transmission channel for transmitting the call voice stream and a video call media transmission channel for transmitting the call video stream between the customer service call terminal and the media server.
  • the customer service call terminal sends a response message to the media server.
  • after the customer service call terminal sends the response message, the customer service call terminal joins the video call with the user call terminal.
  • the response message includes the SDP information of the customer service call terminal.
  • the SDP information of the customer service call terminal includes the IP address of the customer service call terminal, audio port information, audio codec format, video port information, and video codec format.
  • the SDP information is used to negotiate media resources with the media server to establish a voice call media transmission channel for transmitting the call voice stream and a video call media transmission channel for transmitting the call video stream between the customer service call terminal and the media server.
  • because the customer service call terminal is a new device in the customer service system that communicates with the user call terminal, media resource negotiation needs to be carried out again in the subsequent process to realize communication between the user call terminal and the customer service call terminal. That is, the media server and the user call terminal perform media resource renegotiation (refer to S306-S307), and the media server and the customer service call terminal perform media resource renegotiation (refer to S308-S309). Through media resource renegotiation, a voice call media transmission channel and a video call media transmission channel can be established.
  • the voice call media transmission channel and video call media transmission channel established through S306-S309 are transmission channels that require the media server as an intermediary, that is, an indirect voice call media transmission channel and an indirect video call media transmission channel.
  • the voice call media transmission channel includes the voice call media transmission channel between the user call terminal and the media server, and the voice call media transmission channel between the media server and the customer service call terminal.
  • the video call media transmission channel includes the video call media transmission channel between the user call terminal and the media server, and the video call media transmission channel between the media server and the customer service call terminal.
  • the media server sends a reinvite message to the user call terminal through the IMS network element.
  • the re-invite message is used to renegotiate media resources with the user call terminal to establish a voice call media transmission channel between the user call terminal and the media server, and a video call media transmission channel between the user call terminal and the media server.
  • the re-invite message includes the SDP information of the media server.
  • the SDP information of the media server includes the IP address, audio port information, audio codec format, video port information, and video codec format of the media server.
  • the user call terminal sends a response message to the media server through the IMS network element.
  • the response message includes the SDP information of the user's calling terminal.
  • the SDP information of the user's calling terminal includes the IP address of the user's calling terminal, audio port information, audio codec format, video port information, and video codec format.
  • the user call terminal can obtain the SDP information of the media server, and the media server can also obtain the SDP information of the user call terminal. In this way, the voice call media transmission channel between the user call terminal and the media server, and the video call media transmission channel between the user call terminal and the media server, are established.
  • the media server sends a reinvite message to the customer service call terminal.
  • the re-invite message is used to renegotiate media resources with the customer service call terminal to establish a voice call media transmission channel between the customer service call terminal and the media server, and a video call media transmission channel between the customer service call terminal and the media server.
  • the re-invite message includes the SDP information of the media server.
  • the SDP information of the media server includes the IP address of the media server, audio port information, audio codec format, video port information, and video codec format.
  • the customer service call terminal sends a response message to the media server.
  • the response message includes the SDP information of the customer service call terminal.
  • the SDP information of the customer service call terminal includes the IP address of the customer service call terminal, audio port information, audio codec format, video port information, and video codec format.
  • the customer service call terminal can obtain the SDP information of the media server, and the media server can also obtain the SDP information of the customer service call terminal. In this way, the voice call media transmission channel between the customer service call terminal and the media server, and the video call media transmission channel between the customer service call terminal and the media server, are established.
  • the voice call media transmission channel established through the above S306-S309 (including the voice call media transmission channel between the user call terminal and the media server, and the voice call media transmission channel between the customer service call terminal and the media server) is used to transmit the call voice stream between the customer service call terminal and the user call terminal.
  • the video call media transmission channel established through the above S306-S309 (including the video call media transmission channel between the user call terminal and the media server, and the video call media transmission channel between the customer service call terminal and the media server) is used to transmit the call video stream between the customer service call terminal and the user call terminal.
  • a voice call media transmission channel directly used to transmit the call voice stream and a video call media transmission channel directly used to transmit the call video stream between the user call terminal and the customer service call terminal can also be established through media resource negotiation. In that case, the above S306-S309 can be replaced by S306'-S310'.
  • the media server sends a reinvite message to the user call terminal through the IMS network element.
  • the re-invite message is used to renegotiate media resources with the user's call terminal.
  • the re-invite message includes the SDP information of the media server.
  • the SDP information of the media server includes the IP address of the media server, audio port information, audio codec format, video port information, and video codec format.
  • the user call terminal sends a response message to the media server through the IMS network element.
  • the response message includes the SDP information of the user's calling terminal.
  • the SDP information of the user's calling terminal includes the IP address of the user's calling terminal, audio port information, audio codec format, video port information, and video codec format.
  • the media server sends a reinvite message to the customer service call terminal.
  • the re-invite message is used to renegotiate media resources with the customer service call terminal.
  • the re-invite message includes the SDP information of the user call terminal.
  • the SDP information of the user call terminal includes the IP address, audio port information, audio codec format, video port information, and video codec format of the user call terminal.
  • the customer service call terminal sends a response message to the media server.
  • the response message includes the SDP information of the customer service call terminal.
  • the SDP information of the customer service call terminal includes the IP address of the customer service call terminal, audio port information, audio codec format, video port information, and video codec format.
  • the media server sends a response message carrying the SDP information of the customer service call terminal to the user call terminal.
  • all SDP information in this media negotiation process includes the device’s video port information and video codec format.
  • the user call terminal can obtain the SDP information of the customer service call terminal, and the customer service call terminal can obtain the SDP information of the user call terminal; that is, a direct voice call media transmission channel and a direct video call media transmission channel between the user call terminal and the customer service call terminal are established. Based on the established direct voice call media transmission channel and video call media transmission channel, the media server does not need to forward the call voice stream and call video stream when the user call terminal and the customer service call terminal communicate.
  • the above-mentioned user call terminal is the call terminal and the customer service call terminal is the opposite-end call terminal, or the customer service call terminal is the call terminal and the user call terminal is the opposite-end call terminal. Which is which depends on the actual situation, and the embodiments of this application do not limit this.
  • the above-mentioned call terminal may be an electronic device such as a mobile phone, a tablet computer, or an ultra-mobile personal computer (UMPC).
  • it can also be another electronic device such as a desktop device, laptop device, handheld device, wearable device, smart home device, or vehicle-mounted device, for example a netbook, smart watch, smart camera, or personal digital assistant (PDA).
  • the embodiments of this application do not limit the specific type and structure of the call terminal.
  • FIG. 4A is a schematic diagram of the hardware structure of a mobile phone 400 provided by an embodiment of the present application.
  • the mobile phone 400 includes a processor 410, an external memory interface 420, an internal memory 421, a universal serial bus (USB) interface 430, a charging management module 440, a power management module 441, a battery 442, an antenna 1, an antenna 2, a mobile communication module 450, a wireless communication module 460, an audio module 470, a speaker 470A, a receiver 470B, a microphone 470C, a headphone interface 470D, a sensor module 480, a button 490, a motor 491, an indicator 492, a camera 493, a display screen 494, a subscriber identification module (SIM) card interface 495, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the mobile phone 400.
  • the mobile phone 400 may include more or fewer components than shown in the figures, or some components may be combined, some components may be separated, or some components may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 410 may include one or more processing units.
  • for example, the processor 410 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • different processing units can be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the mobile phone 400.
  • the controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • the processor 410 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 410 is a cache memory. This memory may hold instructions or data that the processor 410 has just used or uses repeatedly. If the processor 410 needs the instructions or data again, it can fetch them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 410, and thus improves system efficiency.
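The benefit of keeping recently used data close to the processor can be shown with a toy software model. The application does not specify a replacement policy; the least-recently-used eviction below, the class name `SmallCache`, and the dictionary standing in for slower memory are all assumptions made for illustration.

```python
from collections import OrderedDict


class SmallCache:
    """Toy cache in front of a slower backing store, with LRU eviction (assumed)."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing      # stands in for slower memory
        self.lines = OrderedDict()  # address -> cached value, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        if addr in self.lines:
            # Recently used data is served from the cache: no slow access needed.
            self.hits += 1
            self.lines.move_to_end(addr)
            return self.lines[addr]
        # Miss: fetch from the backing store and keep a copy for next time.
        self.misses += 1
        value = self.backing[addr]
        self.lines[addr] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return value


memory = {0: "instr-a", 1: "instr-b", 2: "instr-c"}
cache = SmallCache(capacity=2, backing=memory)
values = [cache.read(a) for a in (0, 0, 1, 2, 0)]
```

In this access pattern only the second read of address 0 is a hit; by the final read, address 0 has already been evicted to make room, which shows why cache capacity matters as much as the caching itself.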
  • the charging management module 440 is used to receive charging input from the charger. While charging the battery 442, the charging management module 440 can also provide power to the electronic device through the power management module 441.
  • the power management module 441 is used to connect the battery 442, the charging management module 440 and the processor 410.
  • the power management module 441 receives input from the battery 442 and/or the charging management module 440, and supplies power to the processor 410, internal memory 421, external memory, display screen 494, camera 493, wireless communication module 460, etc.
  • the power management module 441 can also be used to monitor battery capacity, battery cycle times, battery health status (leakage, impedance) and other parameters.
  • the power management module 441 may also be provided in the processor 410.
  • the power management module 441 and the charging management module 440 can also be provided in the same device.
  • the wireless communication function of the mobile phone 400 can be realized through the antenna 1, the antenna 2, the mobile communication module 450, the wireless communication module 460, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • the mobile communication module 450 can provide wireless communication solutions including 2G/3G/4G/5G applied to the mobile phone 400.
  • the mobile communication module 450 can receive electromagnetic waves from the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 450 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 450 may be disposed in the processor 410 .
  • at least part of the functional modules of the mobile communication module 450 and at least part of the modules of the processor 410 may be provided in the same device.
  • the wireless communication module 460 can provide wireless communication solutions applied to the mobile phone 400, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the wireless communication module 460 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 460 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 410 .
  • the wireless communication module 460 can also receive the signal to be sent from the processor 410, frequency modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the mobile phone 400 is coupled to the mobile communication module 450, and the antenna 2 is coupled to the wireless communication module 460, so that the mobile phone 400 can communicate with the network and other devices through wireless communication technology.
  • the mobile phone 400 implements the display function through the GPU, the display screen 494, and the application processor.
  • the GPU is an image processing microprocessor and is connected to the display screen 494 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 410 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display 494 is used to display images, videos, etc.
  • Display 494 includes a display panel.
  • the display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diode (QLED), etc.
  • the mobile phone 400 may include 1 or N display screens 494, where N is a positive integer greater than 1.
  • the mobile phone 400 can realize the shooting function through the ISP, camera 493, video codec, GPU, display screen 494 and application processor.
  • the ISP is used to process data fed back by the camera 493, which is used to capture still images or videos.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals (such as audio signals, etc.).
  • Video codecs are used to compress or decompress digital video.
  • the handset 400 may support one or more video codecs.
  • the mobile phone 400 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
  • the external memory interface 420 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the mobile phone 400.
  • the external memory card communicates with the processor 410 through the external memory interface 420 to implement the data storage function, for example, saving files such as music and videos in the external memory card.
  • Internal memory 421 may be used to store computer executable program code, which includes instructions.
  • the processor 410 executes instructions stored in the internal memory 421 to execute various functional applications and data processing of the mobile phone 400 .
  • the internal memory 421 may include a program storage area and a data storage area.
  • the program storage area can store an operating system and an application program required for at least one function (such as a sound playback function or an image playback function).
  • the data storage area can store data created during the use of the mobile phone 400 (such as audio data and the phone book).
  • the internal memory 421 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc.
  • the mobile phone 400 can implement audio functions, such as music playback and recording, through the audio module 470, the speaker 470A, the receiver 470B, the microphone 470C, the headphone interface 470D, and the application processor.
  • the audio module 470 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals. Audio module 470 may also be used to encode and decode audio signals. In some embodiments, the audio module 470 may be disposed in the processor 410, or some functional modules of the audio module 470 may be disposed in the processor 410.
  • the speaker 470A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the mobile phone 400 can play music or hands-free calls through the speaker 470A.
  • the receiver 470B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be heard by bringing the receiver 470B close to the human ear.
  • Microphone 470C also known as “microphone” and “microphone”, is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak close to the microphone 470C with the human mouth and input the sound signal to the microphone 470C.
  • the mobile phone 400 can be provided with at least one microphone 470C. In other embodiments, the mobile phone 400 can be provided with two microphones 470C, which in addition to collecting sound signals, can also implement a noise reduction function. In other embodiments, the mobile phone 400 can also be equipped with three, four or more microphones 470C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions, etc.
  • Headphone interface 470D is used to connect wired headphones.
  • the buttons 490 include a power button, a volume button, etc.
  • the mobile phone 400 can receive key input and generate key signal input related to user settings and function control of the mobile phone 400 .
  • Motor 491 can produce vibration prompts. Motor 491 can be used for vibration prompts for incoming calls and can also be used for touch vibration feedback.
  • the indicator 492 may be an indicator light, which may be used to indicate charging status, power changes, or may be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 495 is used to connect a SIM card.
  • the SIM card can be connected to and separated from the mobile phone 400 by inserting it into the SIM card interface 495 or pulling it out from the SIM card interface 495 .
  • the above-mentioned mobile phone 400 can perform some or all of the steps in the embodiment of the present application. These steps or operations are only examples, and the mobile phone 400 can also perform other operations or variations of various operations. In addition, various steps may be performed in a different order than those presented in the embodiments of the present application, and it may not be necessary to perform all operations in the embodiments of the present application.
  • Each embodiment of the present application can be implemented individually or in any combination, which is not limited by this application.
  • the communication method provided by the embodiment of the present application can be applied to a call terminal with a hardware structure as shown in Figure 4A or a call terminal with a similar structure. Or it can also be applied to call terminals with other structures, which is not limited in the embodiments of the present application.
  • this application takes the call terminal as a mobile phone 400 as an example to introduce the system architecture of the call terminal provided by this application.
  • the system architecture of the mobile phone 400 can adopt a layered architecture, event-driven architecture, microkernel architecture, microservice architecture, or cloud architecture.
  • The embodiment of this application takes an Android system with a layered architecture as an example to illustrate the software structure of the mobile phone 400.
  • Figure 4B is a software structure block diagram of the call terminal according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has clear roles and division of labor.
  • the layers communicate through software interfaces.
  • The Android system is divided into four layers, from top to bottom: the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
  • the application layer can include a series of application packages, as shown in Figure 4B.
  • the application package can include camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message and other applications.
  • the call application in the application layer of the mobile phone 400 can be used to make voice calls or video calls with other call terminals.
  • the call application is an application that the mobile phone 400 already has when it leaves the factory, and does not require the user to perform installation, configuration and other operations.
  • The function by which the call terminal and the peer call terminal mark the target object in the video picture during a voice call or video call is based on the call applications on the call terminal and the peer call terminal. It can also be considered that the call terminal and the peer call terminal in the embodiment of the present application are specifically the call applications on the call terminal and the peer call terminal.
  • the application framework layer provides an application programming interface (API) and programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, content provider, view system, phone manager, resource manager, notification manager, etc.
  • the window manager is used to manage window programs.
  • the window manager can obtain the display size, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make this data accessible to applications. Said data can include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, etc.
  • a view system can be used to build applications.
  • the display interface can be composed of one or more views. For example, a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the call terminal, such as call status management (including connecting, hanging up, etc.).
  • the resource manager provides various resources to applications, such as localized strings, icons, pictures, layout files, video files, etc.
  • the notification manager allows applications to display notification information in the status bar, which can be used to convey notification-type messages and can automatically disappear after a short stay without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • The notification manager can also present notifications that appear in the status bar at the top of the system in the form of charts or scroll bar text, such as notifications for applications running in the background, or notifications that appear on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a beep sounds, the electronic device vibrates, or the indicator light flashes.
  • Android runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
  • The core library contains two parts: one is the functions that the Java language needs to call, and the other is the core library of Android.
  • the application layer and application framework layer run in virtual machines.
  • The virtual machine executes the Java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform object life cycle management, stack management, thread management, security and exception management, and garbage collection and other functions.
  • System libraries can include multiple functional modules, for example: a surface manager, media libraries, 3D graphics processing libraries (for example, OpenGL ES), and 2D graphics engines (for example, SGL).
  • the surface manager is used to manage the display subsystem and provides the integration of 2D and 3D layers for multiple applications.
  • The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, composition, and layer processing.
  • 2D Graphics Engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • The workflow of the software and hardware of the mobile phone 400 is described below by way of example with reference to a photo-capture scenario.
  • When a touch operation is received, the corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into raw input events (including touch coordinates, timestamps of touch operations, and other information). Raw input events are stored in the kernel layer.
  • the application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation as a touch click operation and the control corresponding to the click operation as a camera application icon control as an example, the camera application calls the interface of the application framework layer to start the camera application, and then starts the camera driver by calling the kernel layer. Camera 493 captures still images or video.
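  • The input path described above (the kernel layer produces a raw input event, and the application framework layer identifies the control hit by the event) can be illustrated with a minimal sketch. The class names, the rectangular hit test, and the callback are assumptions made for illustration; they are not Android framework APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RawInputEvent:
    """Raw input event as produced by the kernel layer (touch coordinates, timestamp)."""
    x: int
    y: int
    timestamp_ms: int


@dataclass
class Control:
    """Hypothetical on-screen control with a rectangular hit area."""
    name: str
    bounds: Tuple[int, int, int, int]  # left, top, right, bottom
    on_click: Callable[[], str]

    def contains(self, x: int, y: int) -> bool:
        left, top, right, bottom = self.bounds
        return left <= x < right and top <= y < bottom


def find_control(controls: List[Control], event: RawInputEvent) -> Optional[Control]:
    """Framework-layer step: identify the control that corresponds to the event."""
    for control in controls:
        if control.contains(event.x, event.y):
            return control
    return None


def dispatch(controls: List[Control], event: RawInputEvent) -> Optional[str]:
    """Invoke the click handler of the control under the touch point, if any."""
    control = find_control(controls, event)
    return control.on_click() if control else None
```

  • In this sketch, a click that lands on a hypothetical camera icon control would trigger the handler that stands in for starting the camera application and the camera driver.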
  • the media server in the above communication system may be a server in the form of hardware or a server in the form of software.
  • Taking a server in the form of hardware as an example, as shown in FIG. 5 , an embodiment of the present application provides a media server 500 .
  • the media server 500 includes at least one processor 501 and a memory 502 .
  • the processor 501 includes one or more central processing units (CPUs).
  • the CPU is a single-core CPU (single-CPU) or a multi-core CPU (multi-CPU).
  • Memory 502 includes, but is not limited to, random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, or optical memory, etc.
  • the memory 502 stores operating system code.
  • the processor 501 implements the method in the above embodiment by reading instructions stored in the memory 502, or the processor 501 implements the method in the above embodiment by using internally stored instructions.
  • the memory 502 stores instructions for implementing the communication method provided by the embodiment of the present application.
  • After the program code stored in the memory 502 is read by the at least one processor 501, the media server 500 performs the following operations: establishing a first video call media transmission channel and a second video call media transmission channel, and transmitting the call video stream between the call terminal and the peer call terminal through the first video call media transmission channel and the second video call media transmission channel, to implement the video call service between the call terminal and the peer call terminal; receiving the video picture marking application from the call terminal; and receiving first tagged media data from the call terminal through the first video call media transmission channel, and transmitting second tagged media data to the peer call terminal through the second video call media transmission channel. The first tagged media data and the second tagged media data are both used to present the second video picture on the call interface of the peer call terminal.
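  • The relay role of the media server described above can be sketched as follows. This is a minimal in-memory illustration, not the implementation of the media server 500: the `Channel` queues stand in for the two video call media transmission channels, and the sketch assumes that the second tagged media data is simply the first tagged media data forwarded unchanged, which the text permits but does not require.

```python
from collections import deque


class Channel:
    """Hypothetical in-memory stand-in for a video call media transmission channel."""

    def __init__(self, name):
        self.name = name
        self.to_server = deque()    # data travelling from a terminal to the media server
        self.to_terminal = deque()  # data travelling from the media server to a terminal


class MediaServerSketch:
    def __init__(self):
        self.first_channel = Channel("first")    # call terminal <-> media server
        self.second_channel = Channel("second")  # peer call terminal <-> media server
        self.marking_requested = False
        self.peer_id = None

    def receive_marking_application(self, peer_id):
        """Record the video picture marking application from the call terminal."""
        self.marking_requested = True
        self.peer_id = peer_id

    def relay_tagged_media(self):
        """Forward tagged media data from the first channel to the second channel."""
        if not self.marking_requested:
            return 0
        relayed = 0
        while self.first_channel.to_server:
            first_tagged = self.first_channel.to_server.popleft()
            second_tagged = first_tagged  # assumption: forwarded without conversion
            self.second_channel.to_terminal.append(second_tagged)
            relayed += 1
        return relayed
```

  • Nothing is relayed until a marking application has been received, which mirrors the order of operations listed above.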
  • the media server 500 shown in FIG. 5 also includes a network interface 503.
  • the network interface 503 is a wired interface, such as a fiber distributed data interface (FDDI) or a gigabit ethernet (GE) interface.
  • network interface 503 is a wireless interface.
  • the network interface 503 is used to receive messages (such as SIP messages, etc.).
  • the network interface 503 is used to receive a call video stream or a call voice stream.
  • The memory 502 is used to store the audio stream or video stream received by the network interface 503, and the at least one processor 501 further performs the method described in the above method embodiment based on the information stored in the memory 502. For more details on how the processor 501 implements the above functions, please refer to the descriptions in the previous method embodiments, which will not be repeated here.
  • the media server 500 also includes a bus 504.
  • the above-mentioned processor 501 and the memory 502 are usually connected to each other through the bus 504, or are connected to each other in other ways.
  • the media server 500 also includes an input and output interface 505.
  • the input and output interface 505 is used to connect with an input device and receive instructions input by the user through the input device.
  • Input devices include but are not limited to keyboards, touch screens, microphones, etc.
  • the input and output interface 505 is also used to connect to an output device and output the processing results of the processor 501.
  • Output devices include but are not limited to monitors, printers, etc.
  • Embodiments of the present application provide a communication method, which can be applied to call terminals (including the call terminal and the peer call terminal of the call terminal) and a media server having the hardware structure shown in Figure 5 above, and the communication method is implemented through the interaction of these devices.
  • the number of call terminals is one, and the number of peer call terminals is at least one.
  • the device that initiates the call is the user call terminal, and the called device is the customer service call terminal.
  • the customer service call terminal can mark the target object in the video screen, and transmit the video screen and the generated mark trace data to the user call terminal.
  • the customer service call terminal is the call terminal, and the user call terminal is the peer call terminal.
  • the communication method provided by the embodiment of the present application is described in detail below. As shown in Figure 6, the communication method provided by the embodiment of the present application includes the following steps.
  • the above-mentioned video call media transmission channel is established by the interaction between the call terminal, the media server and the peer call terminal.
  • the video call media transmission channel is used to transmit the video call service between the call terminal and the peer call terminal.
  • the call video stream between the two parties contains the video content captured by the call terminal or the peer call terminal.
  • the call terminal and the opposite end call terminal are relative concepts. Among the two terminals participating in the call, any one terminal can be the call terminal, and the other terminal can be the opposite end call terminal.
  • the video call media transmission channel established by the call terminal, the peer call terminal and the media server can be an indirect video call media transmission channel.
  • The above-mentioned establishment of the video call media transmission channel specifically includes: establishing a first video call media transmission channel, and establishing a second video call media transmission channel.
  • the first video call media transmission channel is the video call media transmission channel between the call terminal and the media server.
  • The second video call media transmission channel is the video call media transmission channel between the peer call terminal and the media server. In the embodiment of the present application, reference may be made to the description of S206-S209 in the above embodiment for the establishment process of the video call media transmission channel, which will not be described again here.
  • the video call media transmission channel established by the call terminal, the peer call terminal and the media server can also be a direct video call media transmission channel.
  • the video call media transmission channel includes the video call media transmission channel between the call terminal and the peer call terminal.
  • For the specific process of establishing this video call media transmission channel, please refer to the description of S206'-S210' in the above embodiment.
  • the call terminal sends a video screen marking application to the media server.
  • the media server receives the video picture marking application from the call terminal.
  • The video picture marking application includes an identifier of the peer call terminal, which is used to apply for a marking operation on the video picture corresponding to the peer call terminal (referred to as the first video picture in the following embodiment). It can be understood that there may be multiple peer call terminals with which the call terminal communicates. During the call between the call terminal and multiple peer call terminals, the call terminal may apply to transmit, to a particular peer call terminal, the marked media data generated after a marking operation is performed on the target object in the first video picture.
  • the video screen corresponding to the peer call terminal may be the video content captured by the peer call terminal.
  • the video screen corresponding to the peer call terminal may also be the video content stored in the storage device of the call terminal.
  • the video content is selected by the call terminal.
  • the calling party i.e., the call terminal
  • The media server in the customer service system calls the peer call terminal, and after the peer call terminal responds, the call terminal, the peer call terminal and the media server interact to establish a video call media transmission channel.
  • S601 is executed first, and then S602 is executed.
  • The media server in the customer service system calls the peer call terminal, and after the peer call terminal responds, the call terminal, the peer call terminal and the media server interact to establish a voice call media transmission channel.
  • the call terminal, the peer call terminal and the media server interact to establish a video call media transmission channel.
  • S602 is executed first, and then S601 is executed.
  • the media server sends a video screen marking request to the peer call terminal.
  • the peer call terminal receives a video picture marking request from the media server.
  • the video picture marking request is used to request a marking operation on the first video picture.
  • the peer call terminal sends a response message of the video screen marking request to the media server.
  • the media server receives the response message of the video picture marking request from the peer call terminal.
  • the response message is used to indicate that the peer call terminal agrees to perform the marking operation on the first video picture.
  • the above video screen marking request also indicates a request to convert the voice call into a video call
  • the response message of the video screen marking request also indicates that the peer call terminal agrees to convert the voice call into a video call.
  • The first video picture comes from the video content received by the call terminal from the peer call terminal, and the response message of the video picture marking request also indicates that the peer call terminal agrees to the media server capturing the call video stream sent by the peer call terminal to the media server.
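  • The exchange described above (the marking application from the call terminal, the marking request to the peer call terminal, and the response from the peer call terminal) can be sketched as a simple handshake. The message classes and field names below are hypothetical and chosen only for illustration; the text does not define a concrete message format at this point.

```python
from dataclasses import dataclass


@dataclass
class MarkingApplication:
    """Video picture marking application sent by the call terminal."""
    peer_id: str  # identifier of the peer call terminal


@dataclass
class MarkingRequest:
    """Video picture marking request sent by the media server to the peer."""
    upgrade_voice_to_video: bool = False


@dataclass
class MarkingResponse:
    """Response of the peer call terminal to the marking request."""
    agrees_to_marking: bool
    agrees_to_video_upgrade: bool = False


class PeerTerminal:
    """Hypothetical peer call terminal that accepts or rejects every request."""

    def __init__(self, accept: bool = True):
        self.accept = accept

    def on_marking_request(self, request: MarkingRequest) -> MarkingResponse:
        return MarkingResponse(
            agrees_to_marking=self.accept,
            agrees_to_video_upgrade=self.accept and request.upgrade_voice_to_video,
        )


def handle_application(application, peers, call_is_voice_only: bool) -> bool:
    """Media server side: forward the request and evaluate the peer's response."""
    peer = peers[application.peer_id]
    request = MarkingRequest(upgrade_voice_to_video=call_is_voice_only)
    response = peer.on_marking_request(request)
    if not response.agrees_to_marking:
        return False
    if call_is_voice_only and not response.agrees_to_video_upgrade:
        return False
    return True
```

  • The peer identifier in the application selects which peer call terminal receives the request, which matters when the call terminal is in a call with several peers.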
  • The media server interacts with the call terminal to confirm that the call terminal has the resources required to perform a marking operation on the first video picture (S605-S606 below).
  • The status of the call terminal itself may change. For example, the current network signal of the call terminal may be poor, or the call terminal may be on a 2G/3G network whose bandwidth is not enough to support the marking operation, or the video call media transmission channel may be unavailable, or it may be inconvenient for the user of the call terminal to perform the marking operation. In these cases, the call terminal does not have the resources required to perform a marking operation on the first video picture.
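  • The conditions listed above can be read as a simple check that the call terminal performs before answering the media server. The sketch below is an assumption-laden illustration: the `TerminalStatus` fields and the 512 kbit/s bandwidth threshold are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

# Assumed minimum bandwidth for the marking operation (illustrative value only).
MIN_BANDWIDTH_KBPS = 512


@dataclass
class TerminalStatus:
    network_type: str              # e.g. "2G", "3G", "4G", "5G", "WLAN"
    bandwidth_kbps: int            # currently available bandwidth
    video_channel_available: bool  # video call media transmission channel usable
    user_available: bool           # the user is able to perform the marking operation


def has_marking_resources(status: TerminalStatus,
                          min_bandwidth_kbps: int = MIN_BANDWIDTH_KBPS) -> bool:
    """Return True only if none of the blocking conditions in the text applies."""
    if status.network_type in ("2G", "3G"):
        return False
    if status.bandwidth_kbps < min_bandwidth_kbps:
        return False
    if not status.video_channel_available:
        return False
    return status.user_available
```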
  • the media server sends a SIP message to the call terminal.
  • the call terminal receives the SIP message from the media server.
  • the SIP message includes a marking operation confirmation identifier, which is used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture.
  • the call terminal sends a response message of the SIP message to the media server.
  • the media server receives the response message of the SIP message from the call terminal.
  • the response message includes a mark operation response identifier, and the mark operation response identifier is used to indicate that the call terminal has the resources required to perform a mark operation on the first video picture.
  • the marking operation confirmation identifier in the SIP message may be carried in the header field of the SIP message.
  • the marking operation confirmation identifier can be carried in the header field of the SIP message in the following two ways.
  • the first carrying method carrying the tag operation confirmation identifier (recorded as Tag) in the Contact extension field of the SIP message.
  • the second carrying method carrying the tag operation confirmation identifier (Tag) in the Supported extension field of the SIP message
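  • The two carrying methods can be illustrated by building and inspecting SIP header lines as plain text. The tag value `x-video-mark` is a hypothetical placeholder, since the text only calls the identifier "Tag"; the helper functions are likewise assumptions for illustration and are not part of any SIP library.

```python
MARK_TAG = "x-video-mark"  # hypothetical value of the marking operation confirmation identifier


def contact_header(uri: str, tag: str = MARK_TAG) -> str:
    """First carrying method: the tag as an extension parameter of the Contact header."""
    return f"Contact: <{uri}>;{tag}"


def supported_header(existing_tags, tag: str = MARK_TAG) -> str:
    """Second carrying method: the tag as an additional option tag in Supported."""
    return "Supported: " + ", ".join([*existing_tags, tag])


def has_mark_tag(header_lines, tag: str = MARK_TAG) -> bool:
    """Check whether either carrying method is present in the given header lines."""
    for line in header_lines:
        name, _, value = line.partition(":")
        name = name.strip().lower()
        if name == "contact":
            # Parameters follow the address, separated by semicolons.
            params = [p.strip() for p in value.split(";")[1:]]
            if tag in params:
                return True
        elif name == "supported":
            if tag in [t.strip() for t in value.split(",")]:
                return True
    return False
```

  • The call terminal could answer in the same way, carrying its marking operation response identifier in the corresponding header of the response message.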
  • the SIP message in S605 above also includes the SDP information of the media server.
  • The SDP information of the media server includes the address information (such as IP address), audio port information, audio codec format, video port information, and video codec format of the media server.
  • the response message of the SIP message also includes the SDP information of the calling terminal.
  • The SDP information of the calling terminal includes the address information (such as IP address), audio port information, audio codec format, video port information, and video codec format of the calling terminal.
  • Based on the SDP information of the media server in the SIP message and the SDP information of the call terminal in the response message of the SIP message in S605-S606, the media server and the call terminal can perform media resource negotiation, and a video call media transmission channel (i.e., the first video call media transmission channel) between the call terminal and the media server is established.
  • The media resource negotiation between the media server and the call terminal can be performed in steps S605-S606.
  • The media server interacts with the peer call terminal to perform media resource negotiation (for example, the media server sends the peer call terminal a SIP message that includes the SDP information of the media server, and the peer call terminal sends the media server a response message to the SIP message that includes the SDP information of the peer call terminal), thereby establishing a video call media transmission channel (i.e., a second video call media transmission channel).
  • the above marking operation confirmation identifier may also be carried in the SDP information of the media server.
  • the video port information for transmitting the tagged media data may be further indicated in the extension field of the SDP information. Specifically, it can be instructed to use the video port that transmits the call video stream to transmit the tagged media data.
  • the fields of the SDP information are illustrated below.
  • m=video 12082 RTP/AVP 114 113 (indicates that the video port that transmits the call video stream is used to transmit the marked media data)
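  • The media description line shown above follows the standard SDP layout of media type, port, transport protocol, and a list of payload formats. A minimal sketch of parsing such a line is given below; the `MediaLine` class and the function name are illustrative assumptions, not part of the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MediaLine:
    media: str          # e.g. "video"
    port: int           # port used for the media stream
    proto: str          # e.g. "RTP/AVP"
    formats: List[str]  # payload type numbers, kept as strings


def parse_media_line(line: str) -> MediaLine:
    """Parse an SDP media description of the form 'm=<media> <port> <proto> <fmt> ...'."""
    line = line.strip()
    if not line.startswith("m="):
        raise ValueError("not an SDP media description line")
    fields = line[2:].split()
    if len(fields) < 4:
        raise ValueError("media description needs media, port, proto and a format")
    media, port, proto, *formats = fields
    # The port field may carry a port count as '<port>/<count>'; keep the base port.
    return MediaLine(media, int(port.split("/")[0]), proto, formats)
```

  • For the example line, the parsed port is the same video port that carries the call video stream, so a receiver that sees the marking indication would expect the marked media data on that port.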
  • the SDP information can indicate whether the video call is a one-way video call or a two-way video call.
  • The one-way video call can transmit only the call video stream of the call terminal and does not transmit the video stream of the peer call terminal.
  • The call terminal sends the video content captured by the call terminal to the peer call terminal, and the video content captured by the call terminal is displayed on the peer call terminal, but the peer call terminal either does not capture video content or does not send the video content it captures to the call terminal; that is, the video content captured by the peer call terminal is not displayed on the call terminal.
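  • The text does not state how the SDP information distinguishes one-way from two-way video calls. One common way, assumed here purely for illustration, is the standard SDP direction attributes (`a=sendrecv`, `a=sendonly`, `a=recvonly`, `a=inactive`) inside the video media section. The sketch only inspects media-level attributes and ignores session-level ones.

```python
DIRECTIONS = {"sendrecv", "sendonly", "recvonly", "inactive"}


def video_direction(sdp_text: str):
    """Return the direction of the video section, or None if there is no video section."""
    in_video = False
    found_video = False
    direction = "sendrecv"  # SDP default when no direction attribute is present
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            in_video = line.startswith("m=video")
            found_video = found_video or in_video
        elif in_video and line.startswith("a=") and line[2:] in DIRECTIONS:
            direction = line[2:]
    return direction if found_video else None


def is_one_way(sdp_text: str) -> bool:
    """A video section that only sends or only receives describes a one-way video call."""
    return video_direction(sdp_text) in ("sendonly", "recvonly")
```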
  • the call terminal presents the first video picture on the call interface of the call terminal based on the target media data.
  • the above-mentioned call interface is the interface presented on the call terminal during the video call between the call terminal and the opposite call terminal.
  • The call interface can be a call window, a dialog box, etc., which is not limited in the embodiments of this application.
  • The target media data may be data received by the call terminal from the peer call terminal, or the target media data may be data obtained locally by the call terminal.
  • the target media data includes data corresponding to the first video frame, which is received from the peer call terminal through the video call media transmission channel and is used to present the first video picture.
  • the above S607 specifically includes S6071.
  • the call terminal decodes the data corresponding to the first video frame to present the first video picture on the call interface of the call terminal.
  • The media server first receives video stream data from the peer call terminal through the second video call media transmission channel (the video stream data is the call video stream), and the video stream data includes data corresponding to the first video frame. The media server then sends the video stream data to the call terminal through the first video call media transmission channel. The call terminal then obtains the data corresponding to the first video frame from the video stream data and, based on that data, presents the first video picture on the call interface of the call terminal.
  • The media server first receives video stream data from the peer call terminal through the second video call media transmission channel (the video stream data is the call video stream), and the video stream data includes data corresponding to the first video frame. The media server then obtains the data corresponding to the first video frame from the video stream data and sends it to the call terminal through the first video call media transmission channel. The call terminal can then present the first video picture on the call interface of the call terminal based on the data corresponding to the first video frame.
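  • The two delivery options above differ only in where the first video frame is extracted: at the call terminal (the server forwards the whole stream) or at the media server (the server forwards only that frame). The sketch below models a stream as a hypothetical mapping from frame identifiers to encoded frame bytes and reports how many bytes would cross the first video call media transmission channel in each case.

```python
from typing import Dict, Tuple

# Hypothetical stream model: frame identifier -> encoded frame data.
Stream = Dict[int, bytes]


def terminal_extracts(stream: Stream, frame_id: int) -> Tuple[bytes, int]:
    """Option 1: the server forwards the whole stream; the terminal picks the frame."""
    forwarded = dict(stream)  # everything crosses the first channel
    sent_bytes = sum(len(data) for data in forwarded.values())
    return forwarded[frame_id], sent_bytes


def server_extracts(stream: Stream, frame_id: int) -> Tuple[bytes, int]:
    """Option 2: the server picks the frame and forwards only that frame."""
    frame = stream[frame_id]
    return frame, len(frame)
```

  • Both options yield the same frame data at the call terminal; the second sends less data over the first channel in this simplified model.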
  • the target media data includes data corresponding to the target image.
  • the target image is stored locally on the call terminal and is used to present the first video picture.
  • the above S607 specifically includes S6072.
  • the call terminal decodes the data corresponding to the target image to present the first video picture on the call interface of the call terminal.
  • the call terminal detects the user's marking operation on the target object in the first video screen, and generates marking trace data.
  • the marking trace data is used to describe the marking traces generated by the marking operation.
  • The marking traces are digital traces or virtual traces.
  • The marking operation may be a tapping operation on the first video screen, or it may be a writing operation on the first whiteboard screen, for example, drawing lines in different forms on the first video screen to mark the target object, such as encircling the target object with a rectangular box, circle, triangle or irregular closed shape, or marking the target object with a solid line, dotted line, arrow line or other special symbol (such as drawing a five-pointed star next to the target object).
  • the embodiment of the present application does not limit the specific form of the marking operation. Any operation that can mark the target object can be regarded as a marking operation.
  • a marking trace corresponding to the specific behavior of the marking operation can be formed, and the user can mark the target object in any way that can mark the target object.
  • The above-mentioned mark trace data includes but is not limited to the time stamp, color, shape, position (such as the coordinates of each point on the mark trace and other position parameters), and other information that can indicate the above-mentioned mark trace.
  • For example, if the mark trace in a certain video picture is a red circle, the mark trace data includes data indicating the video picture (such as the timestamp or identification information of the video picture), data indicating that the color of the mark trace is red, and data indicating that the shape of the mark trace is a circle.
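  • The mark trace data described above (time stamp, color, shape, position) can be represented as a small record that is serialized for transmission. The field names and the JSON encoding below are assumptions for illustration; the application does not prescribe a data format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class MarkTrace:
    frame_timestamp_ms: int             # identifies the video picture being marked
    color: str                          # e.g. "red"
    shape: str                          # e.g. "circle"
    points: List[Tuple[float, float]]   # coordinates of points on the mark trace


def encode(trace: MarkTrace) -> bytes:
    """Serialize the mark trace data (JSON is an assumed wire format)."""
    return json.dumps(asdict(trace), sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> MarkTrace:
    """Rebuild the mark trace data on the receiving side."""
    raw = json.loads(payload.decode("utf-8"))
    raw["points"] = [tuple(point) for point in raw["points"]]
    return MarkTrace(**raw)
```

  • A red circle on the picture with timestamp 1000 ms would then be `MarkTrace(1000, "red", "circle", [...])`, with the point list describing where the circle lies.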
  • The user of the call terminal can use the video marking tool on the call terminal and invoke the function items in the video marking tool to mark the target object.
  • the video marking tool is the system software that comes with the call terminal.
  • the media server sends transmission channel indication information to the call terminal.
  • the call terminal receives the transmission channel indication information from the media server.
  • the transmission channel indication information is used to instruct the call terminal to transmit marked media data through the video call media transmission channel.
  • The marked media data is used to present a second video picture on the call interface of the peer call terminal. The second video picture contains the above marking traces; that is, after the peer call terminal receives the marked media data, the peer call terminal can present the second video picture on its call interface based on the marked media data.
  • the transmission channel indication information may be carried in a SIP message.
  • the SIP message may be the same message as the SIP message in S605 above, or may be a different SIP message, which is not limited in the embodiment of this application.
  • the above-mentioned media server can instruct the transmission of marked media data through the video call media transmission channel through the explicit instruction method described in S608 (i.e., sending transmission channel indication information).
  • the media server can also use the implicit instruction method.
  • For example, the SIP message in S605 carries the SDP information of the media server, and the response message of the SIP message carries the SDP information of the call terminal, so as to negotiate (or indicate) that the marked media data is transmitted through the video call media transmission channel established based on this pair of SDP information (the SDP information of the media server and the SDP information of the call terminal). It can be understood that the video call media transmission channel was originally used to transmit the call video stream.
  • the call terminal transmits the marked media data to the opposite call terminal through the video call media transmission channel, so that the opposite call terminal presents the second video picture on the call interface of the opposite call terminal based on the marked media data.
  • the above-mentioned video call media transmission channel includes a first video call media transmission channel between the call terminal and the media server, and a second video call media transmission channel between the peer call terminal and the media server; that is, the video call media transmission channel is an indirect transmission channel, and the above S610 is specifically implemented through S6101-S6102.
  • the call terminal transmits the first marked media data to the media server through the first video call media transmission channel.
  • the media server receives the first marked media data from the call terminal through the first video call media transmission channel.
  • the first marked media data is used to present the second video picture on the peer call terminal.
  • the media server sends the first transmission channel indication information to the call terminal.
  • the first transmission channel indication information instructs the call terminal to transmit the first marked media data through the first video call media transmission channel.
  • the media server transmits the second marked media data to the opposite call terminal through the second video call media transmission channel.
  • the peer call terminal receives the second marked media data from the media server through the second video call media transmission channel.
  • the second marked media data is also used to present the second video picture on the peer call terminal.
  • the above-mentioned first marked media data and second marked media data may be the same type of data, or may be different types of data.
  • before executing S609, the media server also sends second transmission channel indication information to the opposite end call terminal, and the second transmission channel indication information instructs the opposite end call terminal to transmit the second marked media data through the second video call media transmission channel.
  • the first marked media data and the second marked media data are the same, and both include data corresponding to the second video frame.
  • before the call terminal transmits the marked media data to the media server, the call terminal superimposes the above-mentioned mark traces on the first video picture presented on its call interface to form the second video picture, and sends the data corresponding to the second video frame, which is used to present the second video picture, to the media server. The media server then sends the data corresponding to the second video frame to the peer call terminal. In this way, after receiving the data corresponding to the second video frame, the peer call terminal can present the second video picture on its call interface.
  • the first marked media data and the second marked media data are the same, and both are marked trace data.
  • the call terminal sends the marked trace data to the media server, and the media server forwards the marked trace data to the peer call terminal. The peer call terminal then superimposes the target media data (which comes from the video content shot by the peer call terminal) and the marked trace data to obtain the data corresponding to the second video frame and, based on that data, presents the second video picture on the call interface of the peer call terminal.
  • the first marked media data and the second marked media data are different.
  • the first marked media data is marked trace data, and the second marked media data is data corresponding to the second video frame.
  • the call terminal sends the marked trace data to the media server, and the media server superimposes the target media data (which the media server obtains from the opposite end call terminal) and the marked trace data to obtain the data corresponding to the second video frame. The media server then sends the data corresponding to the second video frame to the opposite end call terminal, so that the opposite end call terminal can present the second video picture on its call interface based on that data.
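The three variants above differ only in where the mark traces are superimposed on the picture: at the call terminal, at the peer call terminal, or at the media server. The following is a minimal sketch of that difference, not the implementation described in this application. It assumes a toy model in which a video frame is a list of strings and mark trace data is a list of (row, column) points; the names `Variant`, `overlay`, `terminal_send`, `server_relay` and `peer_present` are hypothetical.

```python
from enum import Enum


class Variant(Enum):
    FRAME_BOTH_LEGS = 1   # first and second marked media data: second-video-frame data
    TRACE_BOTH_LEGS = 2   # first and second marked media data: mark trace data
    SERVER_OVERLAY = 3    # first: mark trace data; second: second-video-frame data


def overlay(frame, trace):
    """Return a copy of `frame` with every (row, col) in `trace` drawn as 'X'."""
    rows = [list(line) for line in frame]
    for r, c in trace:
        if 0 <= r < len(rows) and 0 <= c < len(rows[r]):
            rows[r][c] = "X"
    return ["".join(line) for line in rows]


def terminal_send(variant, first_picture, trace):
    """Call terminal: build the first marked media data (S6101)."""
    if variant is Variant.FRAME_BOTH_LEGS:
        return overlay(first_picture, trace)
    return trace


def server_relay(variant, first_marked, target_frame):
    """Media server: derive the second marked media data (S6102)."""
    if variant is Variant.SERVER_OVERLAY:
        return overlay(target_frame, first_marked)
    return first_marked


def peer_present(variant, second_marked, target_frame):
    """Peer call terminal: obtain the second video picture."""
    if variant is Variant.TRACE_BOTH_LEGS:
        return overlay(target_frame, second_marked)
    return second_marked
```

In this toy model all three variants yield the same second video picture for the same frame and trace; what changes is which device performs the superimposition and which kind of data crosses each leg of the channel.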
  • the above-mentioned video call media transmission channel includes a direct video call media transmission channel between the call terminal and the opposite end call terminal; that is, the video call media transmission channel is a direct transmission channel. In this case, the above-mentioned S610 is specifically implemented through S6101', i.e., S6101-S6102 are replaced with S6101'.
  • the call terminal transmits the marked media data to the opposite call terminal through the direct video call media transmission channel between the call terminal and the opposite call terminal.
  • the above-mentioned marked media data includes data corresponding to the second video frame.
  • the call terminal superimposes the above-mentioned mark traces on the first video picture presented on its call interface to form the second video picture, and sends the data corresponding to the second video frame, which is used to present the second video picture, to the peer call terminal. In this way, after the peer call terminal receives the data corresponding to the second video frame, the second video picture can be presented on the call interface of the peer call terminal.
  • the above-mentioned marked media data is marked trace data.
  • after the call terminal obtains the marked trace data, the call terminal sends the marked trace data to the peer call terminal. The peer call terminal then superimposes the target media data (which comes from the video content shot by the peer call terminal) and the marked trace data to obtain the data corresponding to the second video frame and, based on that data, presents the second video picture on the call interface of the peer call terminal.
  • the communication method provided by the embodiment of the present application further includes S611.
  • the call terminal stops transmitting the call video stream through the video call media transmission channel.
  • the call terminal and the peer call terminal stop transmitting the video content shot by the call terminal or the peer call terminal (i.e., the call video stream) through the video call media transmission channel, so that the marked media data can be transmitted through the video call media transmission channel.
  • the call terminal stopping transmitting the call video stream through the video call media transmission channel specifically includes: before the first marked media data from the call terminal is received through the first video call media transmission channel, stopping transmitting the call video stream through the first video call media transmission channel; and before the second marked media data is transmitted to the opposite call terminal through the second video call media transmission channel, stopping transmitting the call video stream through the second video call media transmission channel.
  • when the media server serves as the transmission medium between the call terminal and the opposite call terminal, after the media server obtains the data corresponding to the second video frame used to present the second video picture, the media server also mixes and encodes the video content shot by the call terminal and the data corresponding to the above-mentioned second video frame (which can be called screen mixing processing), and then sends the mixed screen video stream to the peer call terminal.
  • a mixed screen of the second video picture and the content shot by the call terminal can then be displayed on the counterpart call terminal.
  • for example, the picture shot by the call terminal is displayed in the first area of the call interface of the counterpart call terminal, and the second video picture is displayed in the second area of the call interface of the counterpart call terminal.
  • the call terminal can also cancel the video marking and restore the original voice or video call. Specifically, the call terminal closes the video marking tool on the call terminal and sends a video marking end notification message to the media server to notify the media server that the video marking process is completed. Subsequently, the voice or video call is resumed among the call terminal, the media server and the opposite end call terminal; that is, the video call media transmission channel is no longer used to transmit marked media data, and its original function is restored, i.e., the call video stream is transmitted through the video call media transmission channel.
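The reuse of a single channel described above (call video stream, then marked media data, then call video stream again) can be pictured as a small state machine. The sketch below is only an illustration under assumptions: `Mode`, `VideoCallChannel` and its methods are hypothetical names, and rejecting a payload of the wrong kind is a modelling choice rather than a behaviour stated in this application.

```python
from enum import Enum


class Mode(Enum):
    CALL_VIDEO = "call video stream"
    MARKED_MEDIA = "marked media data"


class VideoCallChannel:
    """One already-negotiated channel whose payload type changes over time."""

    def __init__(self):
        self.mode = Mode.CALL_VIDEO
        self.sent = []

    def start_marking(self):
        # Corresponds to S611: stop the call video stream before marked media flows.
        self.mode = Mode.MARKED_MEDIA

    def end_marking(self):
        # After the video marking end notification, the original function is restored.
        self.mode = Mode.CALL_VIDEO

    def send(self, kind, payload):
        # Assumption: a payload that does not match the current mode is refused.
        if kind is not self.mode:
            raise RuntimeError(f"channel currently carries {self.mode.value}")
        self.sent.append((kind, payload))
```

No second channel or additional port is created at any point, which is the resource saving the embodiment aims at.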
  • the call terminal can perform a marking operation on the target object in the video picture presented on the call terminal and then send the obtained marked media data to the peer call terminal. Likewise, the peer call terminal can mark the target object in the video picture presented on the peer call terminal and then send the obtained marked media data to the call terminal. The process in which the peer call terminal performs the marking operation and transmits marked media data is similar to the above-mentioned process for the call terminal.
  • the call terminal, the peer call terminal and the media server in the communication system interact to establish the video call media transmission channel, and transmit the call video stream between the call terminal and the opposite end call terminal through the video call media transmission channel, so as to realize the video call service between the call terminal and the opposite end call terminal. The call video stream contains the video content shot by the call terminal or the opposite end call terminal.
  • then, the call terminal presents the first video picture on its call interface based on the target media data, detects the user's marking operation on the target object in the first video picture, generates mark trace data, and transmits marked media data to the peer call terminal through the video call media transmission channel. The marked media data is used to present a second video picture on the call interface of the peer call terminal, and the second video picture contains the above-mentioned mark traces.
  • in this way, the call terminal, the media server and the peer call terminal can transmit marked media data over the existing video call media transmission channel, without spending extra time establishing a dedicated transmission channel for marked media data and without occupying additional port resources of the terminals (including the call terminal and the peer call terminal). This saves the port resources occupied when marking the target object in the video picture during a call.
  • the call terminal and the opposite end call terminal can be terminals with different roles.
  • the call terminal can be a customer service call terminal, and the opposite end call terminal can be a user call terminal; or the call terminal is the user call terminal, and the peer call terminal is the customer service call terminal.
  • the user's call terminal that initiates the call can initiate a video call or a voice call.
  • the communication method provided by the embodiment of the present application is described in detail below from the perspective of interaction between the devices. As shown in Figure 7, the communication method provided by the embodiment of the present application includes the following steps.
  • the user call terminal sends an invitation (invite) message to the media server through the IMS network element.
  • the media server sends a ringing message to the user's call terminal through the IMS network element.
  • the media server sends a response message to the user's call terminal through the IMS network element.
  • the media server plays audio prompt content related to the user's business to prompt the user to select different service content (such as selecting manual service) according to needs.
  • the media server detects the operation of selecting manual service, and the media server assigns a customer service staff to the user (that is, selects a corresponding customer service call terminal for the user's call terminal) .
  • the media server sends an invitation message to the customer service call terminal.
  • the customer service call terminal sends a response message to the media server.
  • the media server sends a reinvite message to the user call terminal through the IMS network element.
  • the reinvite message includes the SDP information of the media server.
  • the SDP information of the media server includes the address information (such as IP address), audio port information, and audio codec format of the media server.
  • the user call terminal sends a response message to the media server through the IMS network element.
  • the response message includes the SDP information of the user's calling terminal.
  • the SDP information of the user's calling terminal includes the address information (such as IP address), audio port information and audio codec format of the user's calling terminal.
  • S708 The media server sends a reinvite message to the customer service call terminal.
  • the reinvite message includes the SDP information of the media server.
  • for a description of the SDP information of the media server, refer to S706.
  • the customer service call terminal sends a response message to the media server.
  • the response message includes the SDP information of the customer service call terminal.
  • the SDP information of the customer service call terminal includes the address information (such as IP address), audio port information, and audio codec format of the customer service call terminal.
  • the above-mentioned S706-S709 is a process in which the call terminal, the peer call terminal and the media server interact to establish an indirect voice call media transmission channel between the user call terminal and the customer service call terminal through media resource negotiation.
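The indirect voice call channel of S706-S709 consists of two legs, each negotiated from one pair of SDP information. The following is a rough sketch under assumptions: `AudioSdp`, `Leg` and `negotiate_leg` are hypothetical names, the addresses are documentation-range placeholders, and choosing the first codec common to both sides is a simplification of offer/answer negotiation rather than a rule stated in this application.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AudioSdp:
    address: str                    # address information, e.g. an IP address
    audio_port: int                 # audio port information
    audio_codecs: Tuple[str, ...]   # audio codec formats, in order of preference


@dataclass(frozen=True)
class Leg:
    server: Tuple[str, int]
    terminal: Tuple[str, int]
    codec: str


def negotiate_leg(offer: AudioSdp, answer: AudioSdp) -> Optional[Leg]:
    """Pick the first codec of the offer that the answer also lists."""
    for codec in offer.audio_codecs:
        if codec in answer.audio_codecs:
            return Leg((offer.address, offer.audio_port),
                       (answer.address, answer.audio_port), codec)
    return None  # no common codec: this leg cannot be established


server_to_user = AudioSdp("192.0.2.10", 40000, ("AMR-WB", "PCMA"))
server_to_agent = AudioSdp("192.0.2.10", 40004, ("AMR-WB", "PCMA"))
user = AudioSdp("198.51.100.7", 50000, ("PCMA",))
agent = AudioSdp("203.0.113.5", 50002, ("AMR-WB", "PCMA"))

user_leg = negotiate_leg(server_to_user, user)      # S706-S707
agent_leg = negotiate_leg(server_to_agent, agent)   # S708-S709
indirect_channel = (user_leg, agent_leg)
```

Because the media server terminates both legs, the two legs may end up with different codecs, as in this example.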
  • S701-S709 is a process in which the user call terminal calls the customer service call terminal and establishes a voice call media transmission channel.
  • for the content carried in the messages of each step of S701-S709, refer to the above detailed description of S201-S209, which is not repeated here.
  • the customer service call terminal sends a video screen marking application to the media server.
  • the media server receives the video screen marking application sent by the customer service call terminal.
  • the video picture marking application includes an identification of the user's call terminal, which is used to apply for a marking operation on the video picture corresponding to the user's call terminal (hereinafter referred to as the first video picture).
  • the media server sends a video screen marking request to the user call terminal.
  • the user call terminal receives the video picture marking request from the media server.
  • the video picture marking request is used to request a marking operation on the first video picture.
  • the user call terminal sends a response message of the video screen marking request to the media server.
  • the media server receives the response message of the video picture marking request from the user call terminal.
  • the response message is used to instruct the user's call terminal to agree to the marking operation on the first video picture.
  • the above-mentioned video picture marking request also indicates a request to convert the voice call into a video call, and the response message of the video picture marking request also indicates that the user call terminal agrees to convert the voice call into a video call.
  • the target media data used to present the first video picture comes from data received by the customer service call terminal from the user call terminal, and the response message of the video picture marking request also indicates that the user call terminal agrees to the media server capturing the call video stream sent by the user call terminal to the media server.
  • the above-mentioned user call terminal initiates a voice call, and the voice call media transmission channel is established through steps S706-S709 in the above embodiment. Marking the target object in the video picture requires marked media data (the marked media data is used to present the second video picture and belongs to a video stream), but during a voice call only the call voice stream can be transmitted, not a video stream. Therefore, after the media server receives the video picture marking application, the media server triggers the establishment of a video call media transmission channel; that is, the voice call is converted into a video call to establish a video call media transmission channel capable of transmitting a video stream, and the marked media data is transmitted through that channel.
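The decision the media server faces after a video picture marking application can be sketched as follows. This is an assumption-laden illustration: the function name, the action strings and the explicit "reject" branch are hypothetical, since the text above only describes the case where the user call terminal agrees.

```python
def handle_marking_application(call_media, user_agrees):
    """Media-server decision after a video picture marking application.

    call_media: set of media kinds currently negotiated, e.g. {"audio"}.
    Returns (action, resulting media set).
    """
    if not user_agrees:
        # Assumption: without consent the call continues unchanged.
        return "reject", set(call_media)
    if "video" in call_media:
        # A video call media transmission channel already exists and is reused.
        return "reuse_video_channel", set(call_media)
    # Voice call: renegotiate so that a video call media transmission channel exists.
    return "upgrade_to_video", set(call_media) | {"video"}
```

For a call that started as a voice call, `handle_marking_application({"audio"}, True)` returns the upgrade action together with a media set that now contains video.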
  • the video call media transmission channel includes the video call media transmission channel between the user call terminal and the media server (corresponding to the second video call media transmission channel in the above embodiment) and the video call media transmission channel between the customer service call terminal and the media server (corresponding to the first video call media transmission channel in the above embodiment). The first video call media transmission channel and the second video call media transmission channel exist as a pair and together form the transmission channel through which the customer service call terminal communicates with the user call terminal.
  • the above-mentioned S711 may be implemented with a re-invite message, which carries the SDP information of the media server (including the address information of the media server, audio port information, audio codec format, video port information and video codec format); in that case, S712 is a response message to the re-invite message.
  • the response message carries the SDP information of the user call terminal, which includes the address information of the user call terminal.
  • the establishment process of the first video call media transmission channel is as follows S713-S714.
  • the media server sends a SIP message to the customer service call terminal.
  • the customer service call terminal receives the SIP message from the media server.
  • the SIP message includes the marking operation confirmation identifier and the SDP information of the media server.
  • the marking operation confirmation identifier is used to confirm whether the call terminal has the resources required to mark the first video picture.
  • the SDP information of the media server includes the address information of the media server, audio port information, audio codec format, video port information and video codec format. It should be noted that, unlike the SDP information in the media resource negotiation message in the voice call scenario (which includes device address information, audio port information and audio codec format), the SDP information in the media resource negotiation message in the video call scenario also includes the device's video port information and video codec format.
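The difference between the voice-call SDP and the video-call SDP can be made concrete with a minimal SDP body. The sketch below assumes the standard SDP line types (`v=`, `o=`, `s=`, `c=`, `t=`, `m=`, `a=rtpmap`); the function name `build_sdp`, the dynamic payload types 96 and 97, and the example codec strings are illustrative choices, not values taken from this application.

```python
def build_sdp(address, audio_port, audio_codec, video_port=None, video_codec=None):
    """Build a minimal SDP body.

    audio_codec / video_codec use the rtpmap form "<name>/<clock rate>",
    e.g. "AMR-WB/16000" or "H264/90000". Payload types 96 and 97 are
    dynamic payload types picked for illustration only.
    """
    if (video_port is None) != (video_codec is None):
        raise ValueError("video port and video codec must be given together")
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {address}",
        "s=-",
        f"c=IN IP4 {address}",               # address information
        "t=0 0",
        f"m=audio {audio_port} RTP/AVP 96",  # audio port information
        f"a=rtpmap:96 {audio_codec}",        # audio codec format
    ]
    if video_port is not None:
        lines += [
            f"m=video {video_port} RTP/AVP 97",  # video port information
            f"a=rtpmap:97 {video_codec}",        # video codec format
        ]
    return "\r\n".join(lines) + "\r\n"
```

Calling `build_sdp("192.0.2.10", 40000, "AMR-WB/16000")` gives the voice-call form, and adding `40002, "H264/90000"` appends the video media section that the video call scenario requires.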
  • the customer service call terminal sends a response message of the SIP message to the media server.
  • the media server receives the response message of the SIP message from the customer service call terminal.
  • the response message of the SIP message includes a marking operation response identifier and the SDP information of the customer service call terminal.
  • the marking operation response identifier is used to indicate that the call terminal has the resources required to perform a marking operation on the first video picture.
  • the SDP information of the customer service call terminal includes the address information of the customer service call terminal, audio port information, audio codec format, video port information and video codec format.
  • the SDP information of the media server in the SIP message and the SDP information of the customer service call terminal in the response message of the SIP message are used to negotiate media resources and establish the first video call media transmission channel between the media server and the customer service call terminal.
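The exchange in S713-S714 combines a capability check with media negotiation. The sketch below models the two messages as plain dictionaries; the field names `mark_op_confirm` and `mark_op_response` are hypothetical stand-ins for the marking operation confirmation and response identifiers, not real SIP header names, and the negative case (terminal lacks resources) is an assumption added for illustration.

```python
def build_confirm(server_sdp):
    """Media server: the SIP message of S713 (hypothetical field names)."""
    return {"mark_op_confirm": True, "sdp": server_sdp}


def build_response(request, has_resources, terminal_sdp):
    """Customer service call terminal: the response message of S714."""
    response = {"sdp": terminal_sdp}
    if request.get("mark_op_confirm"):
        # Stand-in for the marking operation response identifier.
        response["mark_op_response"] = has_resources
    return response


def channel_ready(request, response):
    """Media server: proceed only if resources are confirmed and both SDPs are present."""
    return (bool(response.get("mark_op_response"))
            and bool(request.get("sdp"))
            and bool(response.get("sdp")))
```

When `channel_ready` returns true, the pair of SDP information carried by the two messages is what establishes the first video call media transmission channel.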
  • the customer service call terminal presents the first video picture on the call interface of the customer service call terminal based on the target media data.
  • the customer service call terminal detects the user's marking operation on the target object in the first video screen, and generates marking trace data.
  • the marking trace data is used to describe the marking traces generated by the marking operation.
  • the media server sends transmission channel indication information to the customer service call terminal.
  • the customer service call terminal receives the transmission channel indication information from the media server.
  • the transmission channel indication information is used to instruct the customer service call terminal to transmit marked media data through the video call media transmission channel.
  • the marked media data is used to present a second video picture on the call interface of the peer call terminal, and the second video picture contains the above-mentioned mark traces.
  • the transmission channel for transmitting marked media data is the video call media transmission channel. Based on the video call media transmission channel, the customer service call terminal can transmit marked media data to the user call terminal.
  • the customer service call terminal transmits the marked media data to the user call terminal through the video call media transmission channel, so that the user call terminal presents the second video picture on the call interface of the user call terminal based on the marked media data.
  • the above-mentioned user call terminal can also perform a marking operation on the target object in the video screen.
  • the user call terminal performs a marking operation on the target object in the video picture of the video content received from the customer service call terminal, and sends the marked media data to the customer service call terminal.
  • embodiments of the present application provide a call terminal, which can be divided into functional modules according to the above method examples.
  • for example, each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of modules in the embodiments of this application is schematic and is only a logical function division; in actual implementation, there may be other division methods.
  • FIG. 8 shows a possible structural diagram of the call terminal involved in the above embodiment.
  • the call terminal includes a processing module 801, a generating module 802 and a sending module 803.
  • the processing module 801 is used to establish a video call media transmission channel, and to control the call terminal to transmit the call video stream between the call terminal and the opposite end call terminal through the video call media transmission channel, so as to implement the video call service between the call terminal and the opposite end call terminal, for example, executing S601 in the above method embodiment.
  • the processing module 801 is also configured to present the first video picture on the call interface of the call terminal based on the target media data, for example, perform S607 and S715 in the above method embodiment.
  • the generation module 802 is used to detect the user's marking operation on the target object in the first video frame, and generate marking trace data.
  • the marking trace data is used to describe the marking traces generated by the marking operation, for example, executing S608 and S716 in the above method embodiment.
  • the sending module 803 is configured to transmit marked media data to the opposite end call terminal through the video call media transmission channel, so that the opposite end call terminal presents a second video picture on the call interface of the opposite end call terminal based on the marked media data, and the second video picture includes the above-mentioned marking traces, for example, executing S610 and S718 in the above method embodiment.
  • the target media data includes data corresponding to a first video frame, which is received from the peer call terminal through the video call media transmission channel and used to present the first video picture; the processing module 801 is specifically configured to decode the data corresponding to the first video frame to present the first video picture on the call interface, for example, executing S6071 in the above method embodiment.
  • the target media data includes data corresponding to the target image, which is stored locally in the call terminal and used to present the first video picture; the processing module 801 is specifically configured to decode the data corresponding to the target image to present the first video picture on the call interface, for example, executing S6072 in the above method embodiment.
  • the above-mentioned processing module 801 is also used to control the call terminal to stop transmitting the call video stream through the video call media transmission channel, for example, executing S611 in the above method embodiment.
  • the video call media transmission channel includes a first video call media transmission channel between the call terminal and the media server, and a second video call media transmission channel between the peer call terminal and the media server.
  • the above-mentioned sending module 803 is specifically used to transmit the first tagged media data to the media server through the first video call media transmission channel to trigger the media server to transmit the second tagged media data to the opposite call terminal through the second video call media transmission channel.
  • Both the first marked media data and the second marked media data are data used to present the second video picture, for example, perform S6101 and S6102 in the above method embodiment.
  • the video call media transmission channel is a direct video call media transmission channel between the call terminal and the opposite call terminal.
  • the above-mentioned sending module 803 is specifically used to transmit the marked media data to the opposite call terminal through the direct video call media transmission channel, for example, executing S6101' in the above method embodiment.
  • the call terminal provided by the embodiment of the present application also includes a receiving module 804, which is configured to receive transmission channel indication information from the media server.
  • the transmission channel indication information instructs the call terminal to transmit the marked media data through the video call media transmission channel, for example, executing S609 and S717 in the above method embodiment.
  • the above-mentioned sending module 803 is also used to send a video picture marking application to the media server.
  • the video picture marking application includes the identification of the opposite end call terminal and is used to apply for a marking operation on the video picture corresponding to the opposite end call terminal, for example, executing S602 and S710 in the above method embodiment.
  • the above-mentioned receiving module 804 is also used to receive a SIP message from the media server.
  • the SIP message includes a marking operation confirmation identifier, which is used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture, for example, executing S605 and S713 in the above method embodiment.
  • the sending module 803 is also used to send a response message of the SIP message to the media server.
  • the response message includes a marking operation response identifier.
  • the marking operation response identifier is used to indicate that the call terminal has the resources required to perform a marking operation on the first video picture, for example, executing S606 and S714 in the above method embodiment.
  • Each module of the above-mentioned call terminal can also be used to perform other actions in the above-mentioned method embodiment. All relevant content of each step involved in the above-mentioned method embodiment can be quoted from the functional description of the corresponding functional module, and will not be described again here.
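The module division of Figure 8 can be pictured as one class with one method per functional module. This is a toy sketch only: the class and method names are hypothetical, the channel is modelled as a plain list, and requiring an explicit transmission channel indication before sending is an assumption (the embodiments above also allow an implicit indication through SDP negotiation).

```python
class CallTerminal:
    """Toy model of Figure 8: one method per functional module."""

    def __init__(self, channel):
        self.channel = channel          # stands in for the video call media transmission channel
        self.first_picture = None
        self.use_video_channel = False

    def process(self, target_media_data):       # processing module 801
        # Present the first video picture based on the target media data.
        self.first_picture = target_media_data
        return self.first_picture

    def generate(self, touch_points):           # generating module 802
        # Turn the detected marking operation into mark trace data.
        return {"trace": list(touch_points)}

    def receive_indication(self, indication):   # receiving module 804
        self.use_video_channel = indication == "video_call_channel"

    def send(self, marked_media_data):          # sending module 803
        if not self.use_video_channel:
            raise RuntimeError("no transmission channel indication received")
        self.channel.append(marked_media_data)
```

In the structure of Figure 9, `receive_indication` and `send` would both belong to the communication module 902, while `process` and `generate` would be handled by the processing module 901.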
  • FIG. 9 shows another possible structural diagram of the call terminal involved in the above embodiment.
  • the call terminal provided by the embodiment of the present application may include: a processing module 901 and a communication module 902.
  • the processing module 901 can be used to control and manage the actions of the call terminal.
  • the processing module 901 can be used to support the call terminal to perform S601, S607 (including S6071 or S6072), S608, S611, S715, S716, and/or other processes for the techniques described herein.
  • the communication module 902 can be used to support the communication between the call terminal and other network entities.
  • the communication module 902 integrates the functions of the above-mentioned sending module 803 and the receiving module 804.
  • the communication module 902 can be used to support the call terminal to perform the above method embodiments.
  • the call terminal may also include a storage module 903 for storing the program code and data of the call terminal, such as received video content video frames, mark trace data, etc.
  • the processing module 901 may be a processor, for example, the processor may be the processor 410 in Figure 4A.
  • the communication module 902 may be a transceiver, a transceiver circuit or a communication interface, such as the mobile communication module 450 and/or the wireless communication module 460 in Figure 4A.
  • the storage module 903 may be a memory, such as the internal memory 421 in Figure 4A.
  • Embodiments of the present application provide a media server, which can be divided into functional modules according to the above method examples.
  • For example, one functional module can be defined for each function, or two or more functions can be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. It should be noted that the division of modules in the embodiment of the present invention is schematic and is only a logical function division. In actual implementation, there may be other division methods.
  • FIG. 10 shows a possible structural diagram of the media server involved in the above embodiment.
  • the media server includes a processing module 1001, a receiving module 1002 and a sending module 1003.
  • The processing module 1001 is used to establish a first video call media transmission channel and a second video call media transmission channel.
  • The first video call media transmission channel is the video call media transmission channel between the media server and the call terminal.
  • The second video call media transmission channel is the video call media transmission channel between the media server and the peer call terminal.
  • The processing module 1001 also controls the receiving module or the sending module to transmit the call video stream between the call terminal and the peer call terminal through the first video call media transmission channel and the second video call media transmission channel, so as to implement the video call service between the call terminal and the peer call terminal.
  • The call video stream includes video content captured by the call terminal or the peer call terminal.
  • For example, the processing module 1001 performs S601 in the above method embodiment.
  • The receiving module 1002 is configured to receive first marked media data from the call terminal through the first video call media transmission channel.
  • The first marked media data is used to present a second video picture on the call interface of the peer call terminal.
  • The second video picture contains a mark trace.
  • The mark trace is produced by the user's marking operation on a target object in the first video picture presented on the call interface of the call terminal.
  • The first video picture is a video picture presented on the call interface of the call terminal based on the target data. For example, the receiving module 1002 performs S6101 in the above method embodiment.
  • The sending module 1003 is configured to transmit second marked media data to the peer call terminal through the second video call media transmission channel.
  • The second marked media data is used to present the second video picture on the call interface of the peer call terminal. For example, the sending module 1003 performs S6102 in the above method embodiment.
  • The sending module 1003 is also configured to send first transmission channel indication information to the call terminal, where the first transmission channel indication information instructs the call terminal to transmit the first marked media data through the video call media transmission channel, for example, by performing S609 and S717 in the above method embodiment.
  • The sending module 1003 is also configured to send second transmission channel indication information to the peer call terminal, where the second transmission channel indication information instructs the peer call terminal to transmit the second marked media data through the video call media transmission channel.
  • the above processing module 1001 is also used to control the media server to stop transmitting the call video stream through the first video call media transmission channel, for example, executing S611 in the above method embodiment. And the processing module 1001 is also used to control the media server to stop transmitting the call video stream through the second video call media transmission channel.
  • The receiving module 1002 is also configured to receive a video picture marking application from the call terminal.
  • The video picture marking application includes an identifier of the peer call terminal, and the identifier is used to apply for a marking operation on the video picture corresponding to the peer call terminal, for example, by performing S602 and S710 in the above method embodiment.
  • the above-mentioned sending module 1003 is also used to send a video picture marking request to the opposite end call terminal.
  • the video picture marking request is used to request a marking operation on the first video picture, for example, perform S603 and S711 in the above method embodiment.
  • The receiving module 1002 is also configured to receive a response message to the video picture marking request from the peer call terminal.
  • The response message indicates that the peer call terminal agrees to the marking operation on the first video picture, for example, by performing S604 and S712 in the above method embodiment.
  • The sending module 1003 is also configured to send a SIP message to the call terminal.
  • The SIP message includes a marking operation confirmation identifier.
  • The marking operation confirmation identifier is used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture, for example, by performing S605 and S713 in the above method embodiment.
  • The receiving module 1002 is also used to receive a response message to the SIP message from the call terminal.
  • The response message includes a marking operation response identifier, which indicates that the call terminal has the resources required to perform a marking operation on the first video picture, for example, by performing S606 and S714 in the above method embodiment.
  • Each module of the above-mentioned media server can also be used to perform other actions in the above-mentioned method embodiment. All relevant content of each step involved in the above-mentioned method embodiment can be quoted from the functional description of the corresponding functional module, which will not be described again here.
  • FIG. 11 shows another possible structural diagram of the media server involved in the above embodiment.
  • the media server provided by the embodiment of the present application may include: a processing module 1101 and a communication module 1102.
  • the processing module 1101 can be used to control and manage the actions of the media server.
  • The processing module 1101 can be used to support the media server in performing S601 and S611 in the above method embodiment, and/or other processes for the techniques described herein.
  • the communication module 1102 can be used to support communication between the media server and other network entities.
  • the communication module 1102 integrates the functions of the above-mentioned receiving module 1002 and the sending module 1003.
  • the communication module 1102 can be used to support the media server in executing the above method.
  • the media server may also include a storage module 1103 for storing the program code and data of the media server.
  • the processing module 1101 may be a processor, for example, the processor may be the processor 501 in FIG. 5 .
  • The communication module 1102 may be a transceiver, a transceiver circuit, a network interface, etc., such as the network interface 503 in FIG. 5.
  • the storage module 1103 may be a memory, such as the memory 502 in FIG. 5 .
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • The available media may be magnetic media (such as floppy disks, magnetic disks, tapes), optical media (such as digital video discs (DVD)), or semiconductor media (such as solid state drives (SSD)), etc.
  • The disclosed systems, devices, and methods can be implemented in other ways.
  • For example, the device embodiments described above are only illustrative.
  • The division of modules or units is only a logical function division.
  • In actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • Each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the application.
  • the aforementioned storage media include: flash memory, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)

Abstract

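A communication method, apparatus, and system, relating to the field of communication technologies, that enable a user during a call to show the other party the marks the user has made on a target object in a video picture. The method includes: establishing a video call media transmission channel, and transmitting the call video stream between a call terminal and a peer call terminal through the video call media transmission channel, so as to implement a video call service between the call terminal and the peer call terminal; presenting a first video picture on the call interface of the call terminal based on target media data; detecting a marking operation performed by the user on a target object in the first video picture, and generating mark trace data that describes the mark trace produced by the marking operation; and transmitting marked media data to the peer call terminal through the video call media transmission channel, so that the peer call terminal presents, based on the marked media data, a second video picture on its call interface, the second video picture containing the mark trace.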

Description

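A communication method, apparatus, and system
This application claims priority to Chinese Patent Application No. 202210334450.9, filed with the China National Intellectual Property Administration on March 31, 2022 and entitled "A communication method, apparatus, and system", which is incorporated herein by reference in its entirety.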
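Technical Field
The embodiments of this application relate to the field of communication technologies, and in particular to a communication method, apparatus, and system.
Background
In scenarios such as two-party or multi-party calls (for example, multi-party video conferences), communication efficiency still leaves room for improvement. If a call participant could mark the video picture received from the other party, so that the other party can see the mark traces made on objects in the video picture, the parties would gain a means of communication beyond voice and video and could communicate more conveniently and effectively.
How to add a video picture marking function on top of existing call functions is a problem to be solved.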
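Summary
The embodiments of this application provide a communication method, apparatus, and system that add a video picture marking function on top of existing call functions, so that a user can, during a call, show the other party the marks the user has made on a target object in a video picture.
To achieve the above objective, the embodiments of this application adopt the following technical solutions: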
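In a first aspect, an embodiment of this application provides a communication method performed by a call terminal. The method includes: establishing a video call media transmission channel, and transmitting the call video stream between the call terminal and a peer call terminal through the video call media transmission channel, so as to implement a video call service between the call terminal and the peer call terminal, where the call video stream contains video content captured by the call terminal or the peer call terminal; then presenting a first video picture on the call interface of the call terminal based on target media data; detecting a marking operation performed by the user on a target object in the first video picture and generating mark trace data, which describes the mark trace produced by the marking operation; and transmitting marked media data to the peer call terminal through the video call media transmission channel, so that the peer call terminal presents a second video picture on its call interface based on the marked media data, the second video picture containing the mark trace.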
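With the technical solution provided in the embodiments of this application, the call terminal can transmit marked media data over the existing video call media transmission channel. No extra time is spent establishing a transmission channel dedicated to marked media data, and no additional port resources are occupied on the terminals (both the call terminal and the peer call terminal). This saves the port resources that would otherwise be occupied when a target object in a video picture is marked during a call.
Furthermore, compared with existing communication methods, the technical solution provided in the embodiments of this application does not require a video marking app to be installed on the user call terminal or the customer service call terminal. Operators therefore do not need to carry out complex related operations and are not required to have advanced operating skills.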
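In a possible implementation, the embodiments of this application do not limit the form of the user's marking operation on the first video picture. For example, the marking operation may be a tap on the first video picture, or a drawing operation on the first video picture, such as drawing lines of different forms on the first video picture to mark the target object: circling the target object with a rectangular frame, a circle, a triangle, or an irregular closed shape, or marking it with a solid line, a dashed line, an arrow, or another special symbol (for example, a five-pointed star drawn next to the target object). It can be understood that after the user performs a marking operation on the target object in the first video picture, a mark trace corresponding to the specific marking action is formed, and the user may mark the target object in any way that identifies it. The mark trace data includes, but is not limited to, data indicating the timestamp, color, shape, and position (for example, position parameters such as the coordinates of the points on the mark trace) of the mark trace. For example, if the mark trace in a video picture is a red circle, the mark trace data includes data indicating the video picture (such as its timestamp or identification information), data indicating that the color of the mark trace is red, data indicating that the shape of the mark trace is a circle, and data indicating the center coordinates and radius of the mark trace.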
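The red-circle example above can be sketched as a small data structure. This is only an illustration: the class name `MarkTrace`, its field names, and the JSON encoding are assumptions made here and are not specified by the application.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass
class MarkTrace:
    """Hypothetical container for mark trace data (all field names are illustrative)."""
    frame_timestamp_ms: int             # identifies the video picture that was marked
    color: str                          # e.g. "red"
    shape: str                          # e.g. "circle", "rectangle", "polyline"
    points: List[Tuple[float, float]]   # position parameters (the center for a circle)
    radius: float = 0.0                 # only meaningful when shape == "circle"

    def to_json(self) -> str:
        # Serialize so the trace can be carried as marked media data.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "MarkTrace":
        raw = json.loads(text)
        raw["points"] = [tuple(p) for p in raw["points"]]  # JSON lists -> tuples
        return MarkTrace(**raw)


red_circle = MarkTrace(frame_timestamp_ms=120_000, color="red",
                       shape="circle", points=[(320.0, 240.0)], radius=50.0)
restored = MarkTrace.from_json(red_circle.to_json())
```

A receiver that gets the serialized form back into a `MarkTrace` has everything the paragraph lists (which picture, color, shape, center, radius) and can redraw the trace locally.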
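In a possible implementation, the target media data contains data corresponding to a first video frame, where the first video frame is received from the peer call terminal through the video call media transmission channel and is used to present the first video picture. Presenting the first video picture on the call interface of the call terminal based on the target media data includes: decoding the data corresponding to the first video frame to present the first video picture on the call interface.
Optionally, the media server first receives video stream data (that is, the call video stream) from the peer call terminal through the second video call media transmission channel, where the video stream data includes the data corresponding to the first video frame. The media server then sends the video stream data to the call terminal through the first video call media transmission channel, and the call terminal obtains the data corresponding to the first video frame from the video stream data and presents the first video picture on its call interface based on that data.
Optionally, the media server first receives video stream data (that is, the call video stream) from the peer call terminal through the second video call media transmission channel, where the video stream data includes the data corresponding to the first video frame. The media server then extracts the data corresponding to the first video frame from the video stream data and sends it to the call terminal through the first video call media transmission channel, and the call terminal presents the first video picture on its call interface based on that data.
Alternatively, in a possible implementation, the target media data contains data corresponding to a target image, where the target image is stored locally on the call terminal and is used to present the first video picture. Presenting the first video picture based on the target media data includes: decoding the data corresponding to the target image to present the first video picture on the call interface.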
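In a possible implementation, the marked media data contains data corresponding to a second video frame, where the second video frame is used to present the second video picture with the mark trace embedded in it; alternatively, the marked media data contains the mark trace data.
In a possible implementation, the communication method provided in the embodiments of this application further includes: stopping transmission of the call video stream through the video call media transmission channel, so that the marked media data can be transmitted through that video call media transmission channel.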
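The idea of stopping the call video stream so that the same channel can carry marked media data can be sketched with a toy model. The class `VideoCallChannel`, its method names, and the port number are hypothetical; a real terminal would do this on top of its RTP stack.

```python
class VideoCallChannel:
    """Toy model of one already-negotiated video call media transmission channel."""

    def __init__(self, local_port: int):
        self.local_port = local_port      # port negotiated when the call was set up
        self.carrying = "call_video"      # what the channel currently transports
        self.sent = []                    # (port, kind, payload) records

    def stop_call_video(self) -> None:
        # Stop the call video stream so the same channel can carry marked media data.
        self.carrying = "marked_media"

    def send(self, kind: str, payload: bytes) -> None:
        if kind != self.carrying:
            raise RuntimeError(f"channel is carrying {self.carrying}, not {kind}")
        self.sent.append((self.local_port, kind, payload))


channel = VideoCallChannel(local_port=50002)
channel.send("call_video", b"camera-frame")
channel.stop_call_video()
channel.send("marked_media", b"frame-with-trace")
ports_used = {port for port, _, _ in channel.sent}
```

Both kinds of payload leave through the same port, which is the point of the scheme: no second channel and no extra port are needed for the marking data.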
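In a possible implementation, the video call media transmission channel includes a first video call media transmission channel between the call terminal and a media server, and a second video call media transmission channel between the peer call terminal and the media server. Transmitting marked media data to the peer call terminal through the video call media transmission channel includes: transmitting first marked media data to the media server through the first video call media transmission channel, to trigger the media server to transmit second marked media data to the peer call terminal through the second video call media transmission channel. Here, the first marked media data and the second marked media data both contain the data corresponding to the second video frame; or the first marked media data and the second marked media data are both the mark trace data; or the first marked media data is the mark trace data and the second marked media data is the data corresponding to the second video frame.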
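In the first case, the first marked media data and the second marked media data are the same, and both contain the data corresponding to the second video frame. Before transmitting the marked media data to the media server, the call terminal superimposes the mark trace on the first video picture presented on its call interface to form the second video picture, and sends the data corresponding to the second video frame (used to present the second video picture) to the media server. The media server then sends that data to the peer call terminal, which can present the second video picture on its call interface once it has received the data.
In the second case, the first marked media data and the second marked media data are the same, and both are the mark trace data. After obtaining the mark trace data, the call terminal sends it to the media server, and the media server forwards it to the peer call terminal. The peer call terminal then superimposes the mark trace data on the target media data (which comes from the video content captured by the peer call terminal) to obtain the data corresponding to the second video frame, and presents the second video picture on its call interface based on that data.
In the third case, the first marked media data and the second marked media data are different: the first marked media data is the mark trace data, and the second marked media data is the data corresponding to the second video frame. After obtaining the mark trace data, the call terminal sends it to the media server. The media server superimposes the mark trace data on the target media data (which the media server obtained from the peer call terminal) to obtain the data corresponding to the second video frame, and sends that data to the peer call terminal, which can then present the second video picture on its call interface based on it.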
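The three cases above differ only in where the mark trace gets superimposed on the frame. A minimal sketch of the media server's side follows; the enum `RelayMode`, the dictionary layout of frames and traces, and the `overlay` stand-in are all assumptions for illustration, not the application's data formats.

```python
from enum import Enum


class RelayMode(Enum):
    FRAME_TO_FRAME = 1   # call terminal already embedded the trace in the second video frame
    TRACE_TO_TRACE = 2   # peer call terminal superimposes the trace itself
    TRACE_TO_FRAME = 3   # media server superimposes the trace before forwarding


def overlay(frame: dict, trace: dict) -> dict:
    """Stand-in for real video compositing: attach the trace to a copy of the frame."""
    return {"kind": "frame", "pixels": frame["pixels"], "traces": [trace]}


def relay(mode: RelayMode, first_marked: dict, cached_peer_frame: dict = None) -> dict:
    """Return the second marked media data that the server sends to the peer terminal."""
    if mode in (RelayMode.FRAME_TO_FRAME, RelayMode.TRACE_TO_TRACE):
        return first_marked                      # forwarded unchanged
    if cached_peer_frame is None:
        raise ValueError("the server needs the peer's frame to composite the trace")
    return overlay(cached_peer_frame, first_marked)
```

In the first two modes the server is a plain forwarder; only the third mode requires the server to have kept the target media data it received from the peer call terminal.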
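In a possible implementation, the video call media transmission channel is a direct video call media transmission channel between the call terminal and the peer call terminal. Transmitting marked media data to the peer call terminal through the video call media transmission channel includes: transmitting the marked media data to the peer call terminal through the direct video call media transmission channel.
In one implementation, the marked media data contains the data corresponding to the second video frame. In this case, the call terminal superimposes the mark trace on the first video picture presented on its call interface to form the second video picture, and sends the data corresponding to the second video frame (used to present the second video picture) to the peer call terminal, which can present the second video picture on its call interface once it has received the data.
In another implementation, the marked media data is the mark trace data. In this case, after obtaining the mark trace data, the call terminal sends it to the peer call terminal. The peer call terminal superimposes the mark trace data on the target media data (which comes from the video content captured by the peer call terminal) to obtain the data corresponding to the second video frame, and presents the second video picture on its call interface based on that data.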
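The superimposing step performed by the peer call terminal in the second implementation can be illustrated on a tiny character "frame". This is a toy rasterizer under assumed names (`overlay_circle`, a frame as a list of rows); a real terminal would composite onto decoded video frames.

```python
import math


def overlay_circle(frame, cx, cy, radius, color):
    """Return a copy of `frame` (a list of pixel rows) with a circle outline drawn on it."""
    out = [row[:] for row in frame]          # leave the original frame untouched
    height, width = len(out), len(out[0])
    steps = 360                              # one sample per degree is enough for a sketch
    for i in range(steps):
        angle = 2 * math.pi * i / steps
        x = int(round(cx + radius * math.cos(angle)))
        y = int(round(cy + radius * math.sin(angle)))
        if 0 <= x < width and 0 <= y < height:
            out[y][x] = color
    return out


frame = [["."] * 9 for _ in range(9)]        # stands in for the locally captured picture
marked = overlay_circle(frame, cx=4, cy=4, radius=3, color="R")
```

Because only the circle's center, radius, and color are needed, the mark trace data sent over the channel can be far smaller than a re-encoded video frame.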
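In a possible implementation, the communication method provided in the embodiments of this application further includes: receiving transmission channel indication information from the media server, where the transmission channel indication information instructs the call terminal to transmit the marked media data through the video call media transmission channel. The transmission channel indication information may be carried in a SIP message.
In the embodiments of this application, transmission of the marked media data through the video call media transmission channel may be indicated explicitly (that is, by sending transmission channel indication information). In some cases, the media server may also indicate it implicitly.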
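In a possible implementation, the communication method provided in the embodiments of this application further includes: sending a video picture marking application to the media server, where the video picture marking application includes an identifier of the peer call terminal and the identifier is used to apply for a marking operation on the video picture corresponding to the peer call terminal; and receiving, from the media server, the video content sent by the peer call terminal, so as to present the first video picture. It can be understood that a call terminal may be in a call with multiple peer call terminals, and during such a call it can apply to transmit marked media data to a particular one of them.
In a possible implementation, the communication method provided in the embodiments of this application further includes: confirming that the call terminal has the resources required to perform a marking operation on the first video picture. In the embodiments of this application, the state of the call terminal may change after it sends the video picture marking application. For example, the call terminal's current network signal may be poor, or it may be on a 2G/3G network whose bandwidth is insufficient to support the marking operation, or the video call media transmission channel may be unavailable, or the user of the call terminal may be unable to carry out the marking operation at that moment. In these cases, the call terminal does not have the resources required to perform a marking operation on the first video picture.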
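The conditions listed above can be read as a simple local check. The sketch below is an assumption-laden illustration: the `TerminalState` fields and the 512 kbps threshold are invented here, since the application gives no concrete criteria or numbers.

```python
from dataclasses import dataclass

MIN_BANDWIDTH_KBPS = 512  # assumed threshold; the source gives no number


@dataclass
class TerminalState:
    network: str                   # "2G", "3G", "4G", "5G", "WiFi", ...
    bandwidth_kbps: int
    video_channel_available: bool
    user_ready: bool               # False if the user cannot mark right now


def has_marking_resources(state: TerminalState) -> bool:
    """Return True only if none of the blocking conditions from the text applies."""
    if state.network in ("2G", "3G"):
        return False
    if state.bandwidth_kbps < MIN_BANDWIDTH_KBPS:
        return False
    return state.video_channel_available and state.user_ready
```

The result of such a check is what the terminal would report back when the media server asks whether it is ready to mark.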
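In a possible implementation, confirming that the call terminal has the resources required to perform a marking operation on the first video picture includes: receiving a SIP message from the media server, where the SIP message includes a marking operation confirmation identifier used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture; and sending a response message to the SIP message to the media server, where the response message includes a marking operation response identifier indicating that the call terminal has the resources required to perform a marking operation on the first video picture. Optionally, the marking operation confirmation identifier may be carried in a header field of the SIP message; or, when the SIP message includes the SDP information of the media server, the marking operation confirmation identifier may be carried in the SDP information of the media server.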
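The header-field variant of this exchange might look like the sketch below. The header names `X-Mark-Confirm` and `X-Mark-Ack`, their values, and the choice of the INFO method are purely hypothetical; the application does not name them, and mandatory SIP headers (Via, From, To, Call-ID, CSeq) are omitted for brevity.

```python
def build_sip_message(start_line: str, headers: dict, body: str = "") -> str:
    """Assemble a (simplified) SIP message: start line, headers, blank line, body."""
    lines = [start_line] + [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_headers(message: str) -> dict:
    head, _, _ = message.partition("\r\n\r\n")
    headers = {}
    for line in head.split("\r\n")[1:]:       # skip the start line
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


def answer_mark_confirm(request: str, has_resources: bool) -> str:
    """Terminal side: answer the server's confirmation query with a response identifier."""
    if parse_headers(request).get("X-Mark-Confirm") != "query":
        return build_sip_message("SIP/2.0 200 OK", {})
    ack = "ready" if has_resources else "not-ready"
    return build_sip_message("SIP/2.0 200 OK", {"X-Mark-Ack": ack})


request = build_sip_message("INFO sip:terminal@example.com SIP/2.0",
                            {"X-Mark-Confirm": "query"})
response = answer_mark_confirm(request, has_resources=True)
```

Carrying the identifier in the SDP body instead would follow the same pattern, with the flag placed in an attribute line of the body rather than in a header.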
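In a second aspect, an embodiment of this application provides a communication method performed by a media server. The method includes: establishing a first video call media transmission channel and a second video call media transmission channel, where the first video call media transmission channel is the video call media transmission channel between the media server and a call terminal and the second video call media transmission channel is the video call media transmission channel between the media server and a peer call terminal, and transmitting the call video stream between the call terminal and the peer call terminal through the first and second video call media transmission channels, so as to implement a video call service between the call terminal and the peer call terminal, where the call video stream contains video content captured by the call terminal or the peer call terminal; receiving first marked media data from the call terminal through the first video call media transmission channel, where the first marked media data is used to present a second video picture on the call interface of the peer call terminal, the second video picture contains a mark trace, the mark trace is produced by the user's marking operation on a target object in a first video picture presented on the call interface of the call terminal, and the first video picture is a video picture presented on the call interface of the call terminal based on target data; and transmitting second marked media data to the peer call terminal through the second video call media transmission channel, where the second marked media data is used to present the second video picture on the call interface of the peer call terminal.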
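With the technical solution provided in the embodiments of this application, the media server, acting as the intermediary between the call terminal and the peer call terminal, can transmit marked media data (including the first marked media data and the second marked media data) over the existing video call media transmission channels. No extra time is spent establishing a transmission channel dedicated to marked media data, and no additional port resources are occupied on the terminals (both the call terminal and the peer call terminal). This saves the port resources that would otherwise be occupied when a target object in a video picture is marked during a call.
Furthermore, compared with existing communication methods, the technical solution provided in the embodiments of this application does not require a video marking app to be installed on the user call terminal or the customer service call terminal. Operators therefore do not need to carry out complex related operations and are not required to have advanced operating skills.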
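In a possible implementation, the target media data contains data corresponding to a first video frame, where the first video frame is received by the call terminal from the peer call terminal through the first video call media transmission channel and is used to present the first video picture.
Alternatively, in a possible implementation, the target media data contains data corresponding to a target image, where the target image is stored locally on the call terminal and is used to present the first video picture.
In a possible implementation, the first marked media data and the second marked media data both contain data corresponding to a second video frame, where the second video frame is used to present the second video picture with the mark trace embedded in it; or the first marked media data and the second marked media data are both the mark trace data; or the first marked media data is the mark trace data and the second marked media data is the data corresponding to the second video frame.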
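In a possible implementation, the communication method provided in the embodiments of this application further includes: sending first transmission channel indication information to the call terminal, where the first transmission channel indication information instructs the call terminal to transmit the first marked media data through the first video call media transmission channel.
In a possible implementation, the communication method provided in the embodiments of this application further includes: sending second transmission channel indication information to the peer call terminal, where the second transmission channel indication information instructs the peer call terminal to transmit the second marked media data through the second video call media transmission channel.
In a possible implementation, the communication method provided in the embodiments of this application further includes: stopping transmission of the call video stream through the first video call media transmission channel.
In a possible implementation, the communication method provided in the embodiments of this application further includes: stopping transmission of the call video stream through the second video call media transmission channel.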
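In a possible implementation, the communication method provided in the embodiments of this application further includes: receiving a video picture marking application from the call terminal, where the video picture marking application includes an identifier of the peer call terminal and the identifier is used to apply for a marking operation on the video picture corresponding to the peer call terminal; sending a video picture marking request to the peer call terminal, where the video picture marking request is used to request a marking operation on the first video picture; and receiving a response message to the video picture marking request from the peer call terminal, where the response message indicates that the peer call terminal agrees to the marking operation on the first video picture.
Optionally, if the caller initiated a voice call, the video picture marking request also requests that the voice call be converted to a video call, and the response message to the video picture marking request also indicates that the peer call terminal agrees to convert the voice call to a video call.
Optionally, the first video picture comes from video content that the call terminal receives from the peer call terminal, and the response message to the video picture marking request also indicates that the peer call terminal agrees to let the media server capture the call video stream that the peer call terminal sends to the media server.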
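The application, request, and response steps above can be sketched from the media server's point of view. Everything here is hypothetical scaffolding: the class name `MarkingBroker`, the callback-per-peer model, and the log format are assumptions, and real signaling would use SIP messages rather than function calls.

```python
class MarkingBroker:
    """Toy media-server logic for the marking application / request / response steps."""

    def __init__(self, peers: dict):
        self.peers = peers     # peer identifier -> callable deciding whether to agree
        self.log = []

    def handle_mark_application(self, applicant_id: str, peer_id: str) -> bool:
        # The application names the peer whose video picture is to be marked.
        if peer_id not in self.peers:
            return False
        self.log.append(("request", peer_id))            # video picture marking request
        agreed = bool(self.peers[peer_id](applicant_id))
        self.log.append(("response", peer_id, agreed))   # response message from the peer
        return agreed


broker = MarkingBroker({"terminal-2": lambda applicant: True,
                        "terminal-3": lambda applicant: False})
ok = broker.handle_mark_application("terminal-1", "terminal-2")
refused = broker.handle_mark_application("terminal-1", "terminal-3")
```

Including the peer identifier in the application is what lets the server pick one target in a multi-party call, as described for the first aspect.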
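In a possible implementation, the communication method provided in the embodiments of this application further includes: confirming that the call terminal has the resources required to perform a marking operation on the first video picture.
In a possible implementation, confirming that the call terminal has the resources required to perform a marking operation on the first video picture includes: sending a SIP message to the call terminal, where the SIP message includes a marking operation confirmation identifier used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture; and receiving a response message to the SIP message from the call terminal, where the response message includes a marking operation response identifier indicating that the call terminal has the resources required to perform a marking operation on the first video picture.
For the related content and technical effects of the second aspect, refer to the content and technical effects described in the first aspect and any of its possible implementations.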
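In a third aspect, an embodiment of this application provides a call terminal, including a processing module, a generating module, and a sending module. The processing module is configured to establish a video call media transmission channel, and to control the call terminal to transmit the call video stream between the call terminal and a peer call terminal through the video call media transmission channel, so as to implement a video call service between the call terminal and the peer call terminal; and to present a first video picture on the call interface of the call terminal based on target media data. The generating module is configured to detect a marking operation performed by the user on a target object in the first video picture and generate mark trace data, which describes the mark trace produced by the marking operation. The sending module is configured to transmit marked media data to the peer call terminal through the video call media transmission channel, so that the peer call terminal presents a second video picture on its call interface based on the marked media data, the second video picture containing the mark trace.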
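In a possible implementation, the target media data contains data corresponding to a first video frame, where the first video frame is received from the peer call terminal through the video call media transmission channel and is used to present the first video picture; the processing module is specifically configured to decode the data corresponding to the first video frame to present the first video picture on the call interface.
Alternatively, in a possible implementation, the target media data contains data corresponding to a target image, where the target image is stored locally on the call terminal and is used to present the first video picture; the processing module is specifically configured to decode the data corresponding to the target image to present the first video picture on the call interface.
In a possible implementation, the marked media data contains data corresponding to a second video frame, where the second video frame is used to present the second video picture with the mark trace embedded in it; alternatively, the marked media data contains the mark trace data.
In a possible implementation, the processing module is further configured to control the call terminal to stop transmitting the call video stream through the video call media transmission channel.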
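In a possible implementation, the video call media transmission channel includes a first video call media transmission channel between the call terminal and a media server, and a second video call media transmission channel between the peer call terminal and the media server; the sending module is specifically configured to transmit first marked media data to the media server through the first video call media transmission channel, to trigger the media server to transmit second marked media data to the peer call terminal through the second video call media transmission channel.
In a possible implementation, the first marked media data and the second marked media data both contain the data corresponding to the second video frame; or the first marked media data and the second marked media data are both the mark trace data; or the first marked media data is the mark trace data and the second marked media data is the data corresponding to the second video frame.
In a possible implementation, the video call media transmission channel is a direct video call media transmission channel between the call terminal and the peer call terminal; the sending module is specifically configured to transmit the marked media data to the peer call terminal through the direct video call media transmission channel.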
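In a possible implementation, the call terminal further includes a receiving module, which is configured to receive transmission channel indication information from the media server, where the transmission channel indication information instructs the call terminal to transmit the marked media data through the video call media transmission channel.
In a possible implementation, the sending module is further configured to send a video picture marking application to the media server, where the video picture marking application includes an identifier of the peer call terminal and the identifier is used to apply for a marking operation on the video picture corresponding to the peer call terminal.
In a possible implementation, the receiving module is further configured to receive a Session Initiation Protocol (SIP) message from the media server, where the SIP message includes a marking operation confirmation identifier used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture; the sending module is further configured to send a response message to the SIP message to the media server, where the response message includes a marking operation response identifier indicating that the call terminal has the resources required to perform a marking operation on the video picture.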
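In a fourth aspect, an embodiment of this application provides a media server, including a processing module, a receiving module, and a sending module. The processing module is configured to establish a first video call media transmission channel and a second video call media transmission channel, where the first video call media transmission channel is the video call media transmission channel between the media server and a call terminal and the second video call media transmission channel is the video call media transmission channel between the media server and a peer call terminal; and to control the receiving module or the sending module to transmit the call video stream between the call terminal and the peer call terminal through the first and second video call media transmission channels, so as to implement a video call service between the call terminal and the peer call terminal. The receiving module is configured to receive first marked media data from the call terminal through the first video call media transmission channel, where the first marked media data is used to present a second video picture on the call interface of the peer call terminal, the second video picture contains a mark trace, the mark trace is produced by the user's marking operation on a target object in a first video picture presented on the call interface of the call terminal, and the first video picture is a video picture presented on the call interface of the call terminal based on target data. The sending module is configured to transmit second marked media data to the peer call terminal through the second video call media transmission channel, where the second marked media data is used to present the second video picture on the call interface of the peer call terminal.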
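In a possible implementation, the target media data contains data corresponding to a first video frame, where the first video frame is received by the call terminal from the peer call terminal through the first video call media transmission channel and is used to present the first video picture.
Alternatively, in a possible implementation, the target media data contains data corresponding to a target image, where the target image is stored locally on the call terminal and is used to present the first video picture.
In a possible implementation, the first marked media data and the second marked media data both contain data corresponding to a second video frame, where the second video frame is used to present the second video picture with the mark trace embedded in it; or the first marked media data and the second marked media data are both the mark trace data; or the first marked media data is the mark trace data and the second marked media data is the data corresponding to the second video frame.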
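In a possible implementation, the sending module is further configured to send first transmission channel indication information to the call terminal, where the first transmission channel indication information instructs the call terminal to transmit the first marked media data through the video call media transmission channel; and to send second transmission channel indication information to the peer call terminal, where the second transmission channel indication information instructs the peer call terminal to transmit the second marked media data through the video call media transmission channel.
In a possible implementation, the processing module is further configured to control the media server to stop transmitting the call video stream through the first video call media transmission channel, and to control the media server to stop transmitting the call video stream through the second video call media transmission channel.
In a possible implementation, the receiving module is further configured to receive a video picture marking application from the call terminal, where the video picture marking application includes an identifier of the peer call terminal and the identifier is used to apply for a marking operation on the video picture corresponding to the peer call terminal; the sending module is further configured to send a video picture marking request to the peer call terminal, where the video picture marking request is used to request a marking operation on the video picture; and the receiving module is further configured to receive a response message to the video picture marking request from the peer call terminal, where the response message indicates that the peer call terminal agrees to the marking operation on the video picture.
In a possible implementation, the sending module is further configured to send a Session Initiation Protocol (SIP) message to the call terminal, where the SIP message includes a marking operation confirmation identifier used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture; the receiving module is further configured to receive a response message to the SIP message from the call terminal, where the response message includes a marking operation response identifier indicating that the call terminal has the resources required to perform a marking operation on the first video picture.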
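In a fifth aspect, an embodiment of this application provides a call terminal, including a memory and at least one processor connected to the memory. The memory is used to store computer program code that includes computer instructions. When the computer instructions are executed by the at least one processor, the call terminal performs the method according to the first aspect or any of its possible implementations.
In a sixth aspect, an embodiment of this application provides a media server, including a memory and at least one processor connected to the memory. The memory is used to store computer program code that includes computer instructions. When the computer instructions are executed by the at least one processor, the media server performs the method according to the second aspect or any of its possible implementations.
In a seventh aspect, an embodiment of this application provides a computer-readable storage medium including computer instructions. When the computer instructions run on a call terminal, the call terminal performs the method according to the first aspect or any of its possible implementations.
In an eighth aspect, an embodiment of this application provides a computer-readable storage medium including computer instructions. When the computer instructions run on a media server, the media server performs the method according to the second aspect or any of its possible implementations.
In a ninth aspect, an embodiment of this application provides a computer program product. When the computer program product runs on a computer, the method according to the first aspect or any of its possible implementations is performed.
In a tenth aspect, an embodiment of this application provides a computer program product. When the computer program product runs on a computer, the method according to the second aspect or any of its possible implementations is performed.
In an eleventh aspect, an embodiment of this application provides a chip, including a memory and a processor. The memory is used to store computer instructions. The processor is used to call the computer instructions from the memory and run them, so that a call terminal performs the method according to the first aspect or any of its possible implementations.
In a twelfth aspect, an embodiment of this application provides a chip, including a memory and a processor. The memory is used to store computer instructions. The processor is used to call the computer instructions from the memory and run them, so that a media server performs the method according to the second aspect or any of its possible implementations.
In a thirteenth aspect, an embodiment of this application provides a communication system, including a call terminal and a media server. The call terminal performs the method according to the first aspect or any of its possible implementations, and the media server performs the method according to the second aspect or any of its possible implementations.
It can be understood that, for the beneficial effects of the technical solutions of the third to thirteenth aspects and their corresponding possible implementations, reference may be made to the technical effects of the first and second aspects and their corresponding possible implementations, which are not repeated here.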
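Brief Description of the Drawings
FIG. 1 is a schematic architecture diagram of a communication system in a human customer service scenario according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a voice call according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a video call according to an embodiment of this application;
FIG. 4A is a schematic hardware diagram of a mobile phone according to an embodiment of this application;
FIG. 4B is a block diagram of the software structure of a mobile phone according to an embodiment of this application;
FIG. 5 is a schematic hardware diagram of a server according to an embodiment of this application;
FIG. 6 is a first schematic diagram of a communication method according to an embodiment of this application;
FIG. 7 is a second schematic diagram of a communication method according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a call terminal according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of another call terminal according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a media server according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of another media server according to an embodiment of this application.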
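Detailed Description
The term "and/or" in this document merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B can mean: A alone, both A and B, or B alone.
The terms "first", "second", and so on in the description and claims of the embodiments of this application are used to distinguish different objects, not to describe a specific order of objects. For example, the first video call media transmission channel and the second video call media transmission channel are so named to distinguish different video call media transmission channels, not to describe a specific order of video call media transmission channels.
In the embodiments of this application, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as more preferred or advantageous than other embodiments or designs. Rather, such words are intended to present the related concepts in a concrete manner.
In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more. For example, multiple call terminals means two or more call terminals.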
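At present, users can make voice calls or video calls with each other through terminals over mobile networks. Taking a voice call as an example, user 1 can use terminal 1 to dial the number (for example, the phone number) of terminal 2 held by user 2. After user 2 answers on terminal 2, terminal 1 and terminal 2 establish a call connection, and user 1 and user 2 can talk. For example, terminal 1 captures user 1's voice and sends it to terminal 2, and terminal 2 captures user 2's voice and sends it to terminal 1.
It can be understood that in a call between terminals, one of the terminals can be called the call terminal and the other end of the call can be called the peer call terminal. For example, if terminal 1 above is the call terminal, terminal 2 is the peer call terminal. Optionally, a call terminal may have multiple peer call terminals. For example, in a three-party session among terminal 1, terminal 2, and terminal 3, where terminal 1 is the call terminal, terminal 2 and terminal 3 are both peer call terminals.
The following takes two terminals (one call terminal and one peer call terminal) as an example and briefly introduces how media streams are transmitted during a voice call and during a video call between the call terminal and the peer call terminal.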
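During a voice call between the call terminal and the peer call terminal, the media stream transmitted between them is a voice stream. After the call terminal and the peer call terminal establish a voice call media transmission channel (also simply called a voice stream transmission channel), they can transmit voice streams over it. For example, the call terminal sends the voice stream of its user, captured by its microphone, to the peer call terminal through the voice stream transmission channel, and the peer call terminal sends the voice stream of the peer user, captured by its microphone, to the call terminal through the voice stream transmission channel.
During a video call between the call terminal and the peer call terminal, the media streams transmitted between them include a voice stream and a video stream, and the two streams use different transmission channels. Specifically, after the voice call media transmission channel (the voice stream transmission channel) and the video call media transmission channel (also simply called the video stream transmission channel) are established, the call terminal and the peer call terminal transmit voice streams over the voice stream transmission channel and video streams over the video stream transmission channel. For example, each terminal sends the voice stream of its user, captured by its microphone, to the other terminal through the voice stream transmission channel, and sends the video stream of its user, captured by its camera, to the other terminal through the video stream transmission channel.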
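Video marking is a video processing technique in which one or more objects in a video picture are marked, so that the objects the user cares about are identified. For example, if a video picture contains people, animals, and buildings and the user is interested in the animals, a video marking tool can be used to mark the animals in the video picture, for example by enclosing them in a rectangular frame or by identifying them with another mark (such as an arrow or a curve).
Video marking is widely used in daily life and brings many conveniences. In scenarios such as remote teaching, home appliance repair, broadband repair, and remote damage assessment, one user can mark a target object in a video picture and then send marked media data (used to present a video picture containing the mark trace) to another user, so that the other user can more easily notice the information of interest in the video picture containing the mark trace.
Taking home appliance repair as an example, when an appliance fails, the user (or consumer) can contact the appliance's after-sales service organization, for example by calling the after-sales service number, to report the fault and get help resolving it. While the fault is being resolved, the user can describe it verbally. If the verbal description is unclear, the after-sales organization usually has to send staff on site to diagnose and repair the fault, which is inefficient. A more efficient way to complete the after-sales service is as follows: the user sends an image or video to the after-sales service organization to show the appliance's fault, and after capturing the image or video the user can mark the target object in it to identify the fault point, for example a particular button or indicator light of the appliance. The user then sends the marked media data to the after-sales service organization, whose staff can quickly identify the fault point from the marked media data and resolve the user's problem in a targeted and efficient way.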
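During a voice call or video call between two users, if one user needs to mark a target object in a video picture, that user marks the target object, mark trace data is generated, and marked media data is then sent to the other user. The marked media data is used to present a video picture containing the mark trace. Taking the transmission of the video picture and mark trace data between the call terminal and the peer call terminal as an example, one current implementation is as follows: an application (app) for video marking must be installed on both the call terminal and the peer call terminal, a transmission channel dedicated to marked media data is established through a server between the video marking apps on the two terminals, and the two terminals transmit marked media data over that channel. For example, in one case, the call terminal uploads the video picture to its video marking app, the target object in the video picture is marked in the app, and the marked media data is sent to the peer call terminal through the dedicated channel. In another case, the peer call terminal uploads the video picture to its video marking app, the target object in the video picture is marked in the app, and the marked media data is sent to the call terminal through the dedicated channel.
When a target object in a video picture is marked and marked media data is transmitted in this way during a voice call or video call between two users, a transmission channel dedicated to marked media data has to be established between the call terminal and the peer call terminal. This channel is different from the voice stream transmission channel and the video stream transmission channel described above.
It can be understood that if a target object in a video picture is marked during a call between the call terminal and the peer call terminal, extra time is needed to establish the transmission channel for the marked media data, and the call terminal or the peer call terminal also needs to use an additional port to transmit the marked media data. For example, if video marking is performed during a voice call, the two terminals transmit voice streams over the voice stream transmission channel and transmit marked media data over the separately established dedicated channel. As another example, if a target object in a video picture is marked during a video call, the two terminals transmit the video stream captured by the camera over the video stream transmission channel and transmit marked media data over the separately established dedicated channel. In summary, establishing a transmission channel dedicated to marked media data takes extra time, and transmitting the marked media data occupies additional port resources on the terminals.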
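To address the problem in the prior art that establishing a transmission channel dedicated to marked media data takes extra time and occupies additional port resources on the terminals, the embodiments of this application provide a communication method, apparatus, and system. The communication method can be applied during a call between terminals. The call terminal, the media server, and the peer call terminal in the communication system interact to establish a video call media transmission channel for transmitting the call video stream, and the call video stream between the call terminal and the peer call terminal is transmitted through the video call media transmission channel to implement the video call service between them, where the call video stream contains video content captured by the call terminal or the peer call terminal. A first video picture is then presented on the call interface of the call terminal based on target media data. A marking operation performed by the user on a target object in the first video picture is detected and mark trace data is generated, which describes the mark trace produced by the marking operation. Marked media data is then transmitted to the peer call terminal through the established video call media transmission channel, so that the peer call terminal presents a second video picture containing the mark trace on its call interface based on the marked media data. With the technical solution provided in the embodiments of this application, the call terminal, the media server, and the peer call terminal can transmit marked media data over the existing video call media transmission channel, without spending extra time establishing a transmission channel dedicated to marked media data and without occupying additional port resources on the call terminal.
Furthermore, compared with existing communication methods, the technical solution provided in the embodiments of this application does not require a video marking app to be installed on the terminals. Operators therefore do not need to carry out complex related operations and are not required to have advanced operating skills.
Optionally, the communication method provided in the embodiments of this application can be applied to a call between two call terminals or to a call among multiple call terminals.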
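Optionally, the communication method provided in the embodiments of this application can be applied to video conference scenarios, customer service scenarios, and so on. A customer service scenario is one in which a user is in a call with a customer service center (also called a customer service system) to have some service need addressed. Customer service for electronic products, insurance, mobile communications, and similar businesses all involve customer service scenarios. Generally, the user can dial the customer service center's number to establish a call connection with it. In practice, most customer service interactions currently proceed as follows: the user dials the customer service center's number (places a call); after the customer service center answers, it first pushes (that is, plays) some prompt content to the user (the prompt content is pre-stored by the customer service center); the user selects the desired service option according to the prompt content; and the customer service center serves the user according to the selected option (for example, by answering questions about the selected option and resolving the problem the user raised).
In addition, the user can choose human service according to the pushed prompt content, in which case the human customer service scenario begins. The human customer service scenario is one in which, during the call between the user and the customer service center, the user talks with a staff member of the customer service center. That is, after the user chooses human service according to the prompt content pushed by the customer service center, the customer service center goes on to call one of its staff members (specifically, it calls the terminal held by the staff member using that terminal's number). In the following embodiments, the customer service center may be referred to simply as customer service or the customer service system, and a staff member of the customer service center may be referred to simply as a customer service agent. Taking the human customer service scenario as an example, the user calls the customer service system from the call terminal held by the user (simply called the user call terminal). When the call is transferred to human service and a customer service agent answers, the call terminal held by the agent (simply called the customer service call terminal) is in a call with the user call terminal. In the embodiments of this application, after the call between the customer service call terminal and the user call terminal has started, the customer service agent can apply for video marking while guiding the user on the problem the user has raised. Specifically, after the agent marks a target object in a video picture, marked media data (used to present a video picture containing the mark trace to the user) is sent to the user call terminal. The user can then learn the solution offered by the agent from the video picture containing the mark trace presented on the user call terminal, so that the user's problem is resolved efficiently and service quality is improved.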
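It should be noted that the embodiments of this application describe the provided communication method using the human customer service scenario as an example. It can be understood that in this scenario the user call terminal initiates the call, and after the customer service call terminal answers, a target object in a video picture can be marked during the call between the two.
Optionally, in the embodiments of this application, the call initiated by the user call terminal may be a voice call or a video call. Note that when the user initiates a voice call and requests a human agent, after the human agent in the customer service system answers, a voice stream transmission channel is first established through media resource negotiation. If a video picture marking application is initiated during the voice call, media resource renegotiation is also required to establish a video stream transmission channel, so that the voice call is converted into a video call; the marked media data is then transmitted over the video stream transmission channel of the video call. When the user initiates a video call, after the human agent in the customer service system answers, a video stream transmission channel is established through media resource negotiation, and the marked media data is then transmitted over that video stream transmission channel.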
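The communication system for the customer service scenario can be regarded as a conference control system. It involves the access network, the IP multimedia subsystem (IMS, including the 4G/5G core network and the IMS core network), the customer service platform (also called the customer service system), the business system, and so on. The architecture of the communication system in the customer service scenario is introduced below. As shown in FIG. 1, the communication system specifically includes: a user call terminal 101, an access network device 102, an IP multimedia subsystem 103, a customer service platform 104, a business system 105, and a customer service call terminal 106. The IP multimedia subsystem 103 includes a core network (which may be a 4G core network and/or a 5G core network) and the IMS core network. It can be understood that the 4G core network includes gateway devices (such as the S-GW and P-GW), the 5G core network includes network elements such as the user plane function (UPF) and the access and mobility management function (AMF), and the IMS core network includes the session border controller (SBC), the proxy call session control function (P-CSCF), the interrogating call session control function (I-CSCF), and the serving call session control function (S-CSCF). The customer service platform 104 includes a media server.
SBC: provides secure access and media processing.
P-CSCF: the entry node through which the user call terminal accesses the IMS core network; mainly responsible for proxying signaling and messages.
I-CSCF: the unified initial entry node of the IMS core network; responsible for assigning and querying the S-CSCF with which a user registers.
S-CSCF: the central node of the IMS core network; mainly used for user registration, authentication control, session routing, and service trigger control, and for maintaining session state information.
Media server: in a conventional communication system for the customer service scenario, the customer service platform 104 includes a control server (also called a signaling server) and a media server. The signaling server is mainly responsible for negotiating and processing signaling and for controlling the user call terminal and the customer service call terminal joining or leaving a call. The media server is mainly responsible for audio and video processing and playback, requesting and releasing call venues, audio encoding and decoding, video encoding and decoding, and mixed encoding. In some implementations, the functions of the media server and the control server can be integrated in one server. The embodiments of this application describe the provided communication method using the case where the functions of both servers are integrated in the media server.
Business system: responsible for deciding which business process to trigger based on, for example, the numbers of the calling party (such as the user call terminal) and the called party. The different businesses may include, but are not limited to, video calls, video advertisements, and enterprise video show (企视秀).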
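With reference to the communication system architecture shown in FIG. 1, and assuming the user call terminal has accessed the access network and established a session through the 4G or 5G core network and the IMS core network, the voice call flow is described below as an example, to help explain the voice call flow in the customer service scenario. Referring to FIG. 2, the voice call flow includes:
S201. The user call terminal sends an invite message to the media server through the IMS network elements.
Specifically, with reference to the architecture diagram in FIG. 1, the IMS includes the network elements of the 4G/5G core network (including gateway devices and user plane function network elements) and the SBC/P-CSCF and I-CSCF/S-CSCF network elements of the IMS core network. In the embodiments of this application, these network elements are collectively referred to as IMS network elements. Sending the invite message to the media server through the IMS network elements specifically means that the user call terminal sends the invite message to the media server, following the architecture in FIG. 1, via the 4G/5G core network elements, the SBC/P-CSCF network elements, and the I-CSCF/S-CSCF network elements in turn. Note that the IMS network elements transparently pass messages between the user call terminal and the media server and do not process them.
It should be noted that in the following embodiments, any message or information sent or received through the IMS network elements is handled in the same way as the invite message in S201: the IMS network elements pass it through transparently. This is not repeated for each step below.
It can be understood that the user call terminal performs S201 after the user dials the customer service access code (which can be understood as the phone number of the customer service system) on the user call terminal. For example, the customer service may be that of a telecom operator or of an Internet-based service provider (such as customer service for banking or insurance). This application does not limit the type of customer service.
It should be noted that the customer service call terminal in the embodiments of this application is the call terminal of a customer service agent in the customer service system, and it is part of the customer service system. It can be understood that the user calls the customer service system from the user call terminal, and after the customer service system answers, the media server in the customer service platform plays audio prompt content related to the user's business to prompt the user to select a service according to actual needs. When the user selects human service, the media server in the customer service system goes on to call the customer service call terminal. See the related steps in the embodiments below for details.
In the embodiments of this application, the user call terminal sends the invite message using the Session Initiation Protocol (SIP); in other words, the invite message is sent as a SIP message. The invite message carries the Session Description Protocol (SDP) information of the user call terminal, which includes the address information, audio port information, and audio codec format of the user call terminal. The SDP information is used for media resource negotiation with the media server, so as to establish a voice call media transmission channel between the user call terminal and the media server for transmitting the call voice stream. In the embodiments of this application, the address information of a device may be the device's IP address.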
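An SDP body carrying the three items named above (address, audio port, audio codec format) might look like the output of the sketch below. The function name `build_audio_sdp`, the example IP address, the port, and the choice of the PCMA codec are assumptions for illustration; only the SDP line syntax itself is standard.

```python
def build_audio_sdp(ip: str, audio_port: int,
                    payload_type: int = 8, codec: str = "PCMA/8000") -> str:
    """Build a minimal SDP description with one audio media line."""
    return "\r\n".join([
        "v=0",                                           # protocol version
        f"o=- 0 0 IN IP4 {ip}",                          # origin (session id/version simplified)
        "s=-",                                           # session name
        f"c=IN IP4 {ip}",                                # address information
        "t=0 0",                                         # unbounded session time
        f"m=audio {audio_port} RTP/AVP {payload_type}",  # audio port information
        f"a=rtpmap:{payload_type} {codec}",              # audio codec format
    ]) + "\r\n"


offer = build_audio_sdp("192.0.2.10", 50000)
```

The answering side returns a description of the same shape with its own address and port, and once both sides hold each other's description the voice call media transmission channel is considered established.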
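S202. The media server sends a ringing message to the user call terminal.
The ringing message indicates that the customer service number the user dialed is being connected. At this point the user call terminal is in the ringing state, waiting for the customer service system to answer (that is, to go off-hook). The ringing message may be an 18x-series message, such as a 181 message (Call Is Being Forwarded, indicating that the call is being forwarded) or a 183 message (Session Progress, indicating the progress of establishing the dialog). The ringing message carries the SDP information of the media server, which includes the IP address, audio port information, and audio codec format of the media server. The SDP information is used for media resource negotiation with the user call terminal, so as to establish a voice call media transmission channel between the user call terminal and the media server for transmitting the call voice stream.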
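S203. The media server sends an answer message to the user call terminal through the IMS network elements.
It can be understood that after the media server sends the ringing message to the user call terminal, the user call terminal waits for the customer service system to answer (that is, waits to be connected). During this time the user hears a ringback tone or a ringback melody. When the customer service system answers, the call is connected and the media server performs S203.
Likewise, the IMS network elements pass the answer message through transparently.
In the embodiments of this application, after the customer service system answers the call from the user call terminal, the media server can play audio prompt content related to the user's business. Specifically, the media server sends the audio prompt content to the user call terminal over the voice call media transmission channel established above. The audio prompt content can prompt the user to select among different services as needed. For example, if the voice call is a call to a telecom operator, the audio prompt content may include:
Press "1" for call charges and data usage, "2" for broadband services, "3" for top-up services, "4" for service inquiries and handling, "5" for password services, "6" for group services, "0" for human service, and so on. Optionally, the audio prompt content may also include advertisements, promotions, and similar content. The audio prompt content depends on the specific application scenario and is not limited by this application.
When the user acts on the audio prompt content and selects human service, the media server detects the selection, assigns a customer service agent to the user (that is, selects a corresponding customer service call terminal for the user call terminal), and then performs S204 below.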
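S204. The media server sends an invite message to the customer service call terminal.
The invite message is used to call the customer service call terminal into the voice call with the user call terminal. It includes the SDP information of the media server, which includes the IP address, audio port information, and audio codec format of the media server. The SDP information is used for media resource negotiation with the customer service call terminal, so as to establish a voice call media transmission channel between the customer service call terminal and the media server for transmitting the call voice stream.
S205. The customer service call terminal sends an answer message to the media server.
After sending the answer message, the customer service call terminal has joined the call with the user call terminal. The answer message includes the SDP information of the customer service call terminal, which includes the IP address, audio port information, and audio codec format of the customer service call terminal. The SDP information is used for media resource negotiation with the media server, so as to establish a voice call media transmission channel between the customer service call terminal and the media server for transmitting the call voice stream.
It can be understood that because the customer service call terminal is a new device in the customer service system that talks to the user call terminal, media resource negotiation has to be performed again in the subsequent flow so that the user call terminal and the customer service call terminal can communicate. That is, the media server renegotiates media resources with the user call terminal (see S206-S207) and with the customer service call terminal (see S208-S209), and this renegotiation establishes the voice stream transmission channel (the voice call media transmission channel). Note that the voice call media transmission channel established through S206-S209 requires the media server as an intermediary, that is, it is an indirect voice call media transmission channel. It consists of the voice call media transmission channel between the user call terminal and the media server and the voice call media transmission channel between the media server and the customer service call terminal.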
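S206. The media server sends a reinvite message to the user call terminal through the IMS network elements.
The reinvite message is used to renegotiate media resources with the user call terminal, so as to establish the voice call media transmission channel between the user call terminal and the media server. It includes the SDP information of the media server, which includes the IP address, audio port information, and audio codec format of the media server.
S207. The user call terminal sends an answer message to the media server through the IMS network elements.
The answer message includes the SDP information of the user call terminal, which includes the IP address, audio port information, and audio codec format of the user call terminal.
Through the media resource negotiation described in S206-S207, the user call terminal obtains the SDP information of the media server and the media server obtains the SDP information of the user call terminal. The voice call media transmission channel between the user call terminal and the media server is thereby established.
S208. The media server sends a reinvite message to the customer service call terminal.
The reinvite message is used to renegotiate media resources with the customer service call terminal, so as to establish the voice call media transmission channel between the customer service call terminal and the media server. It includes the SDP information of the media server, which includes the IP address, audio port information, and audio codec format of the media server.
S209. The customer service call terminal sends an answer message to the media server.
The answer message includes the SDP information of the customer service call terminal, which includes the IP address, audio port information, and audio codec format of the customer service call terminal.
Through the media resource negotiation described in S208-S209, the customer service call terminal obtains the SDP information of the media server and the media server obtains the SDP information of the customer service call terminal. The voice call media transmission channel between the customer service call terminal and the media server is thereby established.
It can be understood that the voice call media transmission channel established through S206-S209 (consisting of the channel between the user call terminal and the media server and the channel between the customer service call terminal and the media server) is used to transmit the call voice stream between the customer service call terminal and the user call terminal. For example, when the user call terminal sends a call voice stream to the customer service call terminal, it first sends the stream to the media server over the voice call media transmission channel between the user call terminal and the media server, and the media server then sends the received stream to the customer service call terminal over the voice call media transmission channel between the media server and the customer service call terminal.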
可选地,在有些情况下,通过媒体资源协商也可以建立用户通话终端与客服通话终端之间的直接用于传输通话语音流的语音通话媒体传输通道,应注意,用户通话终端与客服通话终端之间的直接用于传输通话语音流的语音通话媒体传输通道是不需要媒体服务器作为中转设备的通道,不一定是用户通话终端与客服通话终端的直连通道。在这种情况下,上述的S206-S209可以替换为S206'-S210'。
S206'、媒体服务器通过IMS网元向用户通话终端发送重邀请(reinvite)消息。
该重邀请消息用于与用户通话终端进行媒体资源重协商,该重邀请消息包括媒体服务器的SDP信息,媒体服务器的SDP信息包括媒体服务器的IP地址、音频端口信息以及音频编解码格式。
S207'、用户通话终端通过IMS网元向媒体服务器发送应答消息。
该应答消息中包括用户通话终端的SDP信息,该用户通话终端的SDP信息包括用户通话终端的IP地址、音频端口信息以及音频编解码格式。
S208'、媒体服务器向客服通话终端发送重邀请(reinvite)消息。
该重邀请消息用于与客服通话终端进行媒体资源重协商,该重邀请消息中包括用户通话终端的SDP信息,该用户通话终端的SDP信息包括用户通话终端的IP地址、音频端口信息以及音频编解码格式。
S209'、客服通话终端向媒体服务器发送应答消息。
该应答消息中包括客服通话终端的SDP信息,该客服通话终端的SDP信息包括该客服通话终端的IP地址、音频端口信息以及音频编解码格式。
S210'、媒体服务器向用户通话终端发送携带客服通话终端的SDP信息的应答消息。
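Through the media resource negotiation in S206'-S210', the user call terminal obtains the SDP information of the customer service call terminal, and the customer service call terminal obtains the SDP information of the user call terminal; that is, a voice call media transmission channel is established between the user call terminal and the customer service call terminal. Over this channel the two terminals can communicate directly, and the media server no longer needs to forward the call voice stream. For example, the user call terminal can send the call voice stream directly to the customer service call terminal over the voice call media transmission channel between the user call terminal and the customer service call terminal; likewise, the customer service call terminal can send the call voice stream directly to the user call terminal over the same channel.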
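The difference between the indirect negotiation (S206-S209) and the direct one (S206'-S210') comes down to whose SDP each terminal ends up holding. A minimal sketch, with SDP reduced to plain dictionaries and a hypothetical function name `negotiate`:

```python
def negotiate(user_sdp, agent_sdp, server_sdp, direct):
    """Return the remote SDP each terminal holds after negotiation (sketch)."""
    if direct:
        # S208' carries the user's SDP to the agent terminal, and
        # S210' carries the agent's SDP back to the user terminal.
        return {"user_remote": agent_sdp, "agent_remote": user_sdp}
    # S206-S209: both terminals only ever see the media server's SDP,
    # so all media flows through the media server.
    return {"user_remote": server_sdp, "agent_remote": server_sdp}
```

In both cases the SIP signalling still passes through the media server; only the media path changes.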
结合图1所示的通信系统的架构,在用户通话终端接入接入网以及4G核心网或者5G核心网,以及IMS核心网的基础上,以视频通话为例,对视频通话的流程进行描述,以便于理解客服场景中视频通话的流程,该视频通话的流程与上述语音通话的流程类似,视频通话的流程中的相关内容可以参考语音通话的流程中的描述。参考图3,视频通话的流程包括:
S301、用户通话终端通过IMS网元向媒体服务器发送邀请(invite)消息。
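The invite message carries the SDP information of the user call terminal, which includes the terminal's IP address, audio port information, audio codec format, video port information, and video codec format. This SDP information is used for media resource negotiation with the media server, to establish between the user call terminal and the media server a voice call media transmission channel for the call voice stream and a video call media transmission channel for the call video stream. Because a video call carries both a call voice stream and a call video stream, the SDP information in a video call must, compared with a voice call, additionally include video port information and a video codec format.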
S302、媒体服务器向用户通话终端发送振铃消息。
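The ringing message carries the SDP information of the media server, which includes the media server's IP address, audio port information, audio codec format, video port information, and video codec format. This SDP information is used for media resource negotiation with the user call terminal, to establish between the user call terminal and the media server a voice call media transmission channel for the call voice stream and a video call media transmission channel for the call video stream.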
S303、媒体服务器通过IMS网元向用户通话终端发送应答消息。
本申请实施例中,在用户发起的是视频通话的情况下,客服系统对用户通话终端的呼叫进行应答之后,媒体服务器可以播放与用户的业务相关的视频提示内容,具体的,媒体服务器基于上述建立的语音通话媒体传输通道和视频通话媒体传输通道向用户通话终端发送该视频提示内容,该视频提示内容可以提示用户根据需求选择不同的服务内容。
当用户在该视频提示内容的提示下进行操作,选择了人工服务时,媒体服务器检测到选择人工服务的操作之后,媒体服务器为该用户分配一个客服人员(即为用户通话终端选择一个对应的客服通话终端),然后媒体服务器执行下述S304。
S304、媒体服务器向客服通话终端发送邀请(invite)消息。
该邀请消息用于呼叫客服通话终端与用户通话终端的视频会话,该邀请消息中包括媒体服务器的SDP信息,该媒体服务器的SDP信息包括媒体服务器的IP地址、音频端口信息、音频编解码格式、视频端口信息以及视频编解码格式。该SDP信息用于与客服通话终端进行媒体资源协商,以建立客服通话终端与媒体服务器之间的用于传输通话语音流的语音通话媒体传输通道和用于传输通话视频流的视频通话媒体传输通道。
S305、客服通话终端向媒体服务器发送应答消息。
客服通话终端发送该应答消息之后,该客服通话终端即加入与用户通话终端的视频通话,该应答消息中包括客服通话终端的SDP信息,该客服通话终端的SDP信息包括客服通话终端的IP地址、音频端口信息、音频编解码格式、视频端口信息以及视频编解码格式。该SDP信息用于与媒体服务器进行媒体资源协商,以建立客服通话终端与媒体服务器之间的用于传输通话语音流的语音通话媒体传输通道和用于传输通话视频流的视频通话媒体传输通道。
可以理解的是,由于客服通话终端是客服系统中的与用户通话终端通话的新的设备,因此,在后续流程中,为实现用户通话终端与客服通话终端进行通信,需要重新进行媒体资源协商,即媒体服务器与用户通话终端进行媒体资源重协商(参考S306-S307),媒体服务器与客服通话终端进行媒体资源重协商(参考S308-S309)。通过媒体资源重协商可以建立语音通话媒体传输通道和视频通话媒体传输通道。应注意,通过S306-S309建立的语音通话媒体传输通道和视频通话媒体传输通道是需要媒体服务器作为媒介的传输通道,即间接的语音通话媒体传输通道和间接的视频通话媒体传输通道,该语音通话媒体传输通道包括用户通话终端与媒体服务器之间的语音通话媒体传输通道,以及媒体服务器和客服通话终端之间的语音通话媒体传输通道,该视频通话媒体传输通道包括用户通话终端与媒体服务器之间的视频通话媒体传输通道, 以及媒体服务器和客服通话终端之间的视频通话媒体传输通道。
S306、媒体服务器通过IMS网元向用户通话终端发送重邀请(reinvite)消息。
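The re-invite message is used for media resource renegotiation with the user call terminal, to establish the voice call media transmission channel and the video call media transmission channel between the user call terminal and the media server. The re-invite message includes the SDP information of the media server, which includes the media server's IP address, audio port information, audio codec format, video port information, and video codec format.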
S307、用户通话终端通过IMS网元向媒体服务器发送应答消息。
该应答消息中包括用户通话终端的SDP信息,用户通话终端的SDP信息包括用户通话终端的IP地址、音频端口信息、音频编解码格式、视频端口信息以及视频编解码格式。
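Through the media resource negotiation described in S306-S307, the user call terminal obtains the SDP information of the media server, and the media server obtains the SDP information of the user call terminal. In this way, the voice call media transmission channel and the video call media transmission channel between the user call terminal and the media server are established.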
S308、媒体服务器向客服通话终端发送重邀请(reinvite)消息。
该重邀请消息用于与客服通话终端进行媒体资源重协商,以建立客服通话终端与媒体服务器之间的语音通话媒体传输通道,以及客服通话终端与媒体服务器之间的视频通话媒体传输通道,该重邀请消息包括媒体服务器的SDP信息,媒体服务器的SDP信息包括媒体服务器的IP地址、音频端口信息、音频编解码格式、视频端口信息以及视频编解码格式。
S309、客服通话终端向媒体服务器发送应答消息。
该应答消息中包括客服通话终端的SDP信息,客服通话终端的SDP信息包括客服通话终端的IP地址、音频端口信息、音频编解码格式、视频端口信息以及视频编解码格式。
通过S308-S309描述的媒体资源协商过程,客服通话终端可以获得媒体服务器的SDP信息,媒体服务器也可以获得客服通话终端的SDP信息,如此,建立了客服通话终端与媒体服务器之间的语音通话媒体传输通道,以及客服通话终端与媒体服务器之间的视频通话媒体传输通道。
可以理解的是,通过上述S306-S309建立的语音通话媒体传输通道(包括用户通话终端与媒体服务器之间的语音通话媒体传输通道,以及客服通话终端与媒体服务器之间的语音通话媒体传输通道),该语音通话媒体传输通道用于传输客服通话终端与用户通话终端之间的通话语音流;通过上述S306-S309建立的视频通话媒体传输通道(包括用户通话终端与媒体服务器之间的视频通话媒体传输通道,以及客服通话终端与媒体服务器之间的视频通话媒体传输通道),该视频通话媒体传输通道用于传输客服通话终端与用户通话终端之间的通话视频流。
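As in the voice call procedure, in some cases media resource negotiation can optionally also establish, directly between the user call terminal and the customer service call terminal, a voice call media transmission channel for the call voice stream and a video call media transmission channel for the call video stream. In that case, S306-S309 above can be replaced with S306'-S310'.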
S306'、媒体服务器通过IMS网元向用户通话终端发送重邀请(reinvite)消息。
该重邀请消息用于与用户通话终端进行媒体资源重协商,该重邀请消息包括媒体服务器的SDP信息,媒体服务器的SDP信息包括媒体服务器的IP地址、音频端口信息、音频编解码格式、视频端口信息以及视频编解码格式。
S307'、用户通话终端通过IMS网元向媒体服务器发送应答消息。
该应答消息中包括用户通话终端的SDP信息,用户通话终端的SDP信息包括用户通话终端的IP地址、音频端口信息、音频编解码格式、视频端口信息以及视频编解码格式。
S308'、媒体服务器向客服通话终端发送重邀请(reinvite)消息。
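The re-invite message is used for media resource renegotiation with the customer service call terminal. The re-invite message includes the SDP information of the user call terminal, which includes the user call terminal's IP address, audio port information, audio codec format, video port information, and video codec format.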
S309'、客服通话终端向媒体服务器发送应答消息。
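The answer message includes the SDP information of the customer service call terminal, which includes the customer service call terminal's IP address, audio port information, audio codec format, video port information, and video codec format.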
S310'、媒体服务器向用户通话终端发送携带客服通话终端SDP信息的应答消息。
综上,与语音通话流程不同的是,该媒体协商过程中的所有的SDP信息中均包括设备的视频端口信息和视频编解码格式。
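The contrast between voice and video SDP information can be made concrete with a small builder. This is a sketch under stated assumptions: the function name `build_sdp` is hypothetical, the `o=` and `s=` lines use placeholder values, and only the fields named in the text (IP address, ports, codec formats) are emitted.

```python
def build_sdp(ip, audio_port, audio_codecs, video_port=None, video_codecs=None):
    """Build a minimal SDP body.

    audio_codecs / video_codecs are lists of (payload_type, "NAME/clock")
    tuples. The video section is emitted only for a video call.
    """
    # Placeholder session-level lines; real values are deployment-specific.
    lines = ["v=0", f"o=- 0 0 IN IP4 {ip}", "s=-", f"c=IN IP4 {ip}", "t=0 0"]

    def media(kind, port, codecs):
        payload_types = " ".join(str(pt) for pt, _ in codecs)
        section = [f"m={kind} {port} RTP/AVP {payload_types}"]
        section += [f"a=rtpmap:{pt} {name}" for pt, name in codecs]
        return section

    lines += media("audio", audio_port, audio_codecs)
    if video_port is not None and video_codecs:
        # Only a video call adds the video port and video codec format.
        lines += media("video", video_port, video_codecs)
    return "\r\n".join(lines) + "\r\n"
```

Calling it without the video arguments gives the voice call form used in S201-S209; adding `video_port` and `video_codecs` gives the form used throughout S301-S310'.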
通过S306'-S310'的媒体资源协商过程,用户通话终端可以获得客服通话终端的SDP信息,客服通话终端可以获得用户通话终端的SDP信息,即建立了用户通话终端和客服通话终端之间的直接的语音通话媒体传输通道和视频通话媒体传输通道。基于建立的直接的语音通话媒体传输通道和视频通话媒体传输通道,用户通话终端和客服通话终端之间通信时无需媒体服务器再转发通话语音流和通话视频流。
可选地,上述用户通话终端为通话终端,客服通话终端为对端通话终端,或者,客服通话终端为通话终端,用户通话终端为对端通话终端,具体根据实际情况确定,本申请实施例不做限定。
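In the embodiments of the present application, the call terminals (the call terminal and the peer call terminal) may be electronic devices such as mobile phones, tablet computers, or ultra-mobile personal computers (UMPC). They may also be other desktop, laptop, handheld, wearable, smart home, or vehicle-mounted electronic devices, such as netbooks, smart watches, smart cameras, or personal digital assistants (PDA). The embodiments of the present application do not limit the specific type or structure of the call terminal.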
以通话终端为手机为例,图4A为本申请实施例提供的一种手机400的硬件结构示意图,该手机400包括处理器410,外部存储器接口420,内部存储器421,通用串行总线(universal serial bus,USB)接口430,充电管理模块440,电源管理模块441,电池442,天线1,天线2,移动通信模块450,无线通信模块460,音频模块470,扬声器470A,受话器470B,麦克风470C,耳机接口470D,传感器模块480,按键490, 马达491,指示器492,摄像头493,显示屏494,以及用户标识模块(subscriber identification module,SIM)卡接口495等。
可以理解的是,本申请实施例示意的结构并不构成对手机400的具体限定。在本申请另一些实施例中,手机400可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器410可以包括一个或多个处理单元,例如:处理器410可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是手机400的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器410中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器410中的存储器为高速缓冲存储器。该存储器可以保存处理器410刚用过或循环使用的指令或数据。如果处理器410需要再次使用该指令或数据,可从存储器中直接调用。避免了重复存取,减少了处理器410的等待时间,因而提高了系统的效率。
充电管理模块440用于从充电器接收充电输入。充电管理模块440为电池442充电的同时,还可以通过电源管理模块441为电子设备供电。
电源管理模块441用于连接电池442,充电管理模块440与处理器410。电源管理模块441接收电池442和/或充电管理模块440的输入,为处理器410,内部存储器421,外部存储器,显示屏494,摄像头493,和无线通信模块460等供电。电源管理模块441还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块441也可以设置于处理器410中。在另一些实施例中,电源管理模块441和充电管理模块440也可以设置于同一个器件中。
手机400的无线通信功能可以通过天线1,天线2,移动通信模块450,无线通信模块460,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。
移动通信模块450可以提供应用在手机400上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块450可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块450还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块450的至少部分功能模块可以被设置于处理器410中。在一些实施例中,移动通信模块450的至少部分功能模块可以与处理器410的至少部分模块被设置在同一个器件中。
无线通信模块460可以提供应用在手机400上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation, FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块460可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块460经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器410。无线通信模块460还可以从处理器410接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,手机400的天线1和移动通信模块450耦合,天线2和无线通信模块460耦合,使得手机400可以通过无线通信技术与网络以及其他设备通信。
手机400通过GPU,显示屏494,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏494和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器410可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
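The display screen 494 is used to display images, video, and the like. The display screen 494 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (flex light-emitting diode, FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), and so on. In some embodiments, the mobile phone 400 may include 1 or N display screens 494, where N is a positive integer greater than 1.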
手机400可以通过ISP,摄像头493,视频编解码器,GPU,显示屏494以及应用处理器等实现拍摄功能。
ISP用于处理摄像头493反馈的数据,摄像头493用于捕获静态图像或视频。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号(如音频信号等)。
视频编解码器用于对数字视频压缩或解压缩。手机400可以支持一种或多种视频编解码器。这样,手机400可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
外部存储器接口420可以用于连接外部存储卡,例如Micro SD卡,实现扩展手机400的存储能力。外部存储卡通过外部存储器接口420与处理器410通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器421可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器410通过运行存储在内部存储器421的指令,从而执行手机400的各种功能应用以及数据处理。内部存储器421可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储手机400使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器421可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
手机400可以通过音频模块470,扬声器470A,受话器470B,麦克风470C,耳机接口470D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块470用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块470还可以用于对音频信号编码和解码。在一些实施例中,音频模块470可以设置于处理器410中,或将音频模块470的部分功能模块设置于处理器410中。
扬声器470A,也称“喇叭”,用于将音频电信号转换为声音信号。手机400可以通过扬声器470A收听音乐,或收听免提通话。
受话器470B,也称“听筒”,用于将音频电信号转换成声音信号。当手机400接听电话或语音信息时,可以通过将受话器470B靠近人耳接听语音。
麦克风470C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风470C发声,将声音信号输入到麦克风470C。手机400可以设置至少一个麦克风470C。在另一些实施例中,手机400可以设置两个麦克风470C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,手机400还可以设置三个,四个或更多麦克风470C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口470D用于连接有线耳机。
按键490包括开机键,音量键等。手机400可以接收按键输入,产生与手机400的用户设置以及功能控制有关的键信号输入。
马达491可以产生振动提示。马达491可以用于来电振动提示,也可以用于触摸振动反馈。
指示器492可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口495用于连接SIM卡。SIM卡可以通过插入SIM卡接口495,或从SIM卡接口495拔出,实现和手机400的接触和分离。
可以理解的,本申请实施例中,上述手机400可以执行本申请实施例中的部分或全部步骤,这些步骤或操作仅是示例,手机400还可以执行其它操作或者各种操作的变形。此外,各个步骤可以按照本申请实施例呈现的不同的顺序来执行,并且有可能并非要执行本申请实施例中的全部操作。本申请各实施例可以单独实施,也可以任意组合实施,本申请对此不作限定。
本申请实施例提供的通信方法可以应用于具有如图4A所示硬件结构的通话终端或者具有类似结构的通话终端。或者还可以应用于其他结构的通话终端中,本申请实施例对此不作限定。
在对通话终端的硬件结构进行介绍之后,本申请这里以通话终端为手机400为例,对本申请提供的通话终端的系统架构进行介绍。手机400的系统架构可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本申请实施例以分层架构的系统为例,示例性说明手机400的软件结构。图4B是本申请实施例的通话终端的软件结构框图。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包,如图4B所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。
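In the embodiments of the present application, the call application (the call APP) in the application layer of the mobile phone 400 can be used to make voice calls or video calls with other call terminals. The call application is already present when the mobile phone 400 leaves the factory; the user does not need to install or configure it.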
可以理解的是,本申请实施例提供的通信方法中,通话终端和对端通话终端在语音通话或视频通话过程中对视频画面中的目标对象进行标记操作的功能是基于通话终端和对端通话终端上的通话应用程序实现的。也可以认为,本申请实施例中的通话终端和对端通话终端具体为通话终端或对端通话终端上的通话应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图4B所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
其中,窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。电话管理器用于提供通话终端的通信功能,例如通话状态的管理(包括接通,挂断等)。资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
Android runtime包括核心库和虚拟机,Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
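The surface manager manages the display subsystem and provides fusion of 2D and 3D layers for multiple applications. The media libraries support playback and recording of a variety of common audio and video formats, as well as still image files. The media libraries can support multiple audio and video encoding formats, for example MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG. The three-dimensional graphics processing library implements three-dimensional drawing, image rendering, compositing, and layer processing. The 2D graphics engine is a drawing engine for 2D drawing.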
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
下面结合捕获拍照场景,示例性说明手机400软件以及硬件的工作流程。
当手机400触摸传感器接收到触摸操作,相应的硬件中断被发给内核层。内核层将触摸操作加工成原始输入事件(包括触摸坐标,触摸操作的时间戳等信息)。原始输入事件被存储在内核层。应用程序框架层从内核层获取原始输入事件,识别该输入事件所对应的控件。以该触摸操作是触摸单击操作,该单击操作所对应的控件为相机应用图标的控件为例,相机应用调用应用框架层的接口,启动相机应用,进而通过调用内核层启动摄像头驱动,通过摄像头493捕获静态图像或视频。
本申请实施例中,上述通信系统中的媒体服务器可以为硬件形态的服务器,也可以是软件形态的服务器。以硬件形态的服务器为例,如图5所示,本申请实施例提供一种媒体服务器500,该媒体服务器500包括至少一个处理器501和存储器502。
其中,处理器501包括一个或多个中央处理器(central processing unit,CPU)。该CPU为单核CPU(single-CPU)或多核CPU(multi-CPU)。
存储器502包括但不限于是随机存取存储器(random access memory,RAM)、只读存储器(read only memory,ROM)、可擦除可编程只读存储器(erasable programmable read-only memory,EPROM)、快闪存储器、或光存储器等。存储器502中保存有操作系统的代码。
可选地,处理器501通过读取存储器502中保存的指令实现上述实施例中的方法,或者,处理器501通过内部存储的指令实现上述实施例中的方法。在处理器501通过读取存储器502中保存的指令实现上述实施例中的方法的情况下,存储器502中保存实现本申请实施例提供的通信方法的指令。
存储器502中存储的程序代码被至少一个处理器501读取后,媒体服务器500执行以下操作:建立第一视频通话媒体传输通道和第二视频通话媒体传输通道,且通过第一视频通话媒体传输通道和第二视频通话媒体传输通道传输通话终端与对端通话终端之间的通话视频流,以实现通话终端与对端通话终端之间的视频通话业务;从通话终端接收视频画面标记申请;并且通过第一视频通话媒体传输通道从通话终端接收第一标记媒体数据,以及通过所述第二视频通话媒体传输通道向所述对端通话终端传输第二标记媒体数据,该第一标记媒体数据和第二标记媒体数据均用于在对端通话终端的通话界面呈现第二视频画面。
可选地,图5所示的媒体服务器500还包括网络接口503。网络接口503是有线接口,例如光纤分布式数据接口(fiber distributed data interface,FDDI)、千兆以太网(gigabit ethernet,GE)接口。或者,网络接口503是无线接口。网络接口503用于接收消息(例如SIP消息等)。或者,网络接口503用于接收通话视频流或通话语音流。
存储器502用于存储网络接口503接收到的音频流或视频流,至少一个处理器501 进一步根据存储器502保存的这些信息来执行上述方法实施例所描述的方法。处理器501实现上述功能的更多细节请参考前面各个方法实施例中的描述,在这里不再重复。
可选地,媒体服务器500还包括总线504,上述处理器501、存储器502通常通过总线504相互连接,或采用其他方式相互连接。
可选地,媒体服务器500还包括输入输出接口505,输入输出接口505用于与输入设备连接,接收用户通过输入设备输入的指令。输入设备包括但不限于键盘、触摸屏、麦克风等等。输入输出接口505还用于与输出设备连接,输出处理器501的处理结果。输出设备包括但不限于显示器、打印机等等。
结合上述实施例的相关描述,本申请实施例提供一种通信方法,该方法可以应用于通信系统中的具有上述图4A所示的硬件结构和上述图4B所示的系统架构的通话终端(包括通话终端和该通话终端的对端通话终端)、具有上述图5所示的硬件结构的媒体服务器中实现,通过各个设备的交互实现通信方法。
需要说明的是,本申请实施例中,通话终端的数量为一个,对端通话终端的数量为至少一个。在人工客服的服务场景中,发起通话的设备是用户通话终端,被呼叫的设备是客服通话终端,在一个用户通话终端与一个客服通话终端通话的过程中,也可以邀请一个或多个第三方通话终端加入该通话。其中,客服通话终端可以对视频画面中的目标对象进行标记操作,将视频画面和生成的标记痕迹数据传输至用户通话终端,那么客服通话终端即为通话终端,用户通话终端为对端通话终端。
根据上述实施例的内容,可知,在客服场景中,在人工服务的阶段,即客服人员与用户(即客服人员所持有的客服通话终端与用户所持有的用户通话终端)通话的过程中,客服人员或用户均可以对视频画面中的目标对象进行标记操作。下面对本申请实施例提供的通信方法进行详细描述,如图6所示,本申请实施例提供的通信方法包括如下步骤。
S601、建立视频通话媒体传输通道,且通过该视频通话媒体传输通道传输通话终端与对端通话终端之间的通话视频流,以实现通话终端与对端通话终端之间的视频通话业务。
本申请实施例中,上述视频通话媒体传输通道是通话终端、媒体服务器以及对端通话终端进行交互建立的,该视频通话媒体传输通道用于传输视频通话业务中的通话终端与对端通话终端之间的通话视频流,该通话视频流包含通话终端或对端通话终端拍摄的视频内容。可以理解的是,本申请实施例中,通话终端和对端通话终端是相对的概念,参与通话的两个终端中,任意一个终端可以为通话终端,则另一个终端为对端通话终端。
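Optionally, the video call media transmission channel established by the call terminal, the peer call terminal, and the media server may be an indirect video call media transmission channel. Establishing the video call media transmission channel then specifically includes establishing a first video call media transmission channel and a second video call media transmission channel, where the first video call media transmission channel is the video call media transmission channel between the call terminal and the media server, and the second video call media transmission channel is the video call media transmission channel between the peer call terminal and the media server. For the establishment procedure, refer to the description of S306-S309 in the foregoing embodiment; details are not repeated here.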
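Optionally, the video call media transmission channel established by the call terminal, the peer call terminal, and the media server may also be a direct video call media transmission channel, that is, a video call media transmission channel between the call terminal and the peer call terminal. For the specific procedure, refer to the description of S306'-S310' in the foregoing embodiment.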
S602、通话终端向媒体服务器发送视频画面标记申请。相应地,媒体服务器从通话终端接收视频画面标记申请。
该视频画面标记申请中包括对端通话终端的标识,该标识用于申请对对端通话终端对应的视频画面(以下实施例中称为第一视频画面)进行标记操作。可以理解的是,与通话终端通话的对端通话终端可能包括多个,通话终端与多个对端通话终端通话的过程中,可以申请向某一个对端通话终端传输对第一视频画面中的目标对象进行标记操作之后而生成的标记媒体数据。
上述对端通话终端对应的视频画面可以是对端通话终端捕获的视频内容,对端通话终端对应的视频画面也可以是通话终端的存储装置中存储的视频内容,该视频内容是通话终端选择的即将发送给对端通话终端的视频内容。
可选地,若呼叫方(即通话终端)发起的是视频通话,通话终端根据视频提示内容选择人工服务之后,客服系统中的媒体服务器呼叫对端通话终端,并且在对端通话终端应答之后,通话终端、对端通话终端以及媒体服务器进行交互建立视频通话媒体传输通道。在这种情况下,先执行S601,后执行S602。
可选地,若呼叫方发起的是语音通话,通话终端根据音频提示内容选择人工服务之后,客服系统中的媒体服务器呼叫对端通话终端,并且在对端通话终端应答之后,通话终端、对端通话终端以及媒体服务器进行交互建立语音通话媒体传输通道,待媒体服务器接收到视频画面标记申请之后,通话终端、对端通话终端以及媒体服务器进行交互建立视频通话媒体传输通道。在这种情况下,先执行S602,后执行S601。
S603、媒体服务器向对端通话终端发送视频画面标记请求。相应地,对端通话终端从媒体服务器接收视频画面标记请求。
该视频画面标记请求用于请求对第一视频画面进行标记操作。
S604、对端通话终端向媒体服务器发送视频画面标记请求的响应消息。相应地,媒体服务器从对端通话终端接收视频画面标记请求的响应消息。
该响应消息用于指示对端通话终端同意对第一视频画面进行标记操作。
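Optionally, if the caller initiated a voice call, the video picture marking request also requests that the voice call be converted into a video call, and the response message to the video picture marking request also indicates that the peer call terminal agrees to convert the voice call into a video call.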
可选地,第一视频画面来自通话终端从对端通话终端接收的视频内容,视频画面标记请求的响应消息还指示对端通话终端同意媒体服务器捕获该对端通话终端发送给媒体服务器的通话视频流。
本申请实施例中,在通话终端对视频画面中的目标对象进行标记操作之前,媒体服务器与通话终端交互以确认通话终端具备对第一视频画面进行标记操作所需的资源(下述S605-S606)。通话终端发出视频画面标记申请之后,通话终端本身的状态可能会发生变化,例如通话终端当前网络信号可能较差,或者处于2G/3G网络,其带宽 不足以支持该通话终端进行标记操作,或者视频通话媒体传输通道不可用,或者通话终端对应的用户不方便实施标记操作等等,在这些情况下,通话终端不具备对第一视频画面进行标记操作所需的资源。
S605、媒体服务器向通话终端发送SIP消息。相应地,通话终端从媒体服务器接收SIP消息。
SIP消息中包括标记操作确认标识,该标记操作确认标识用于确认通话终端是否具备对第一视频画面进行标记操作所需的资源。
S606、通话终端向媒体服务器发送SIP消息的响应消息。相应地,媒体服务器从通话终端接收SIP消息的响应消息。
该响应消息中包括标记操作应答标识,该标记操作应答标识用于指示通话终端具备对第一视频画面进行标记操作所需的资源。
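The check that the call terminal makes before answering S605 can be sketched from the conditions the text lists (poor signal, a 2G/3G network, an unavailable video channel, a user who cannot mark right now). The function name and the way each condition is represented are assumptions; the application does not specify how the terminal evaluates them.

```python
def marking_resources_ok(network, signal_ok, video_channel_up, user_ready):
    """Decide whether to return the marking operation response identifier.

    network: a label such as "2G", "3G", "4G", "5G" (illustrative values).
    """
    if network in {"2G", "3G"}:
        # The text names 2G/3G as lacking the bandwidth for marking.
        return False
    return signal_ok and video_channel_up and user_ready
```

A terminal would include the response identifier in its S606 reply only when this returns `True`.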
本申请实施例中,上述SIP消息中的标记操作确认标识可以携带在SIP消息的头域中。
可选地,标记操作确认标识在SIP消息的头域中有以下两种携带方式。
第一种携带方式:在SIP消息的Contact的扩展字段中携带标记操作确认标识(记为Tag)。
以INVITE sip:02033296999@gd.ctcims.cn SIP/2.0为例,
Contact头域为:
<sip:172.27.10.10:5060;transport=udp;zte-did=26-3-20481-3629-12-890-3302;zte-uid=200001+861892222222;Hpt=8e48_16;CxtId=4;TRC=ffffffff-ffffffff>;audio;video;Tag;+g.3gpp.mid-call;+g.3gpp.srvcc-alerting;+g.3gpp.ps2cs-srvcc-orig-pre-alerting;+g.3gpp.icsi-ref="urn%3Aurn-7%3A3gpp-service.ims.icsi.mmtel";
Max-Forwards:64.
第二种携带方式:在SIP消息的Supported扩展字段中携带标记操作确认标识(Tag)
Supported:100rel,histinfo,precondition,timer,Tag.
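Both ways of carrying the identifier can be detected with plain string handling. The sketch below is an assumption-laden illustration, not a SIP parser: it treats the identifier as the literal token `Tag` used in the examples, looks only at the `Supported` and `Contact` headers, and ignores header folding and compact header forms.

```python
def has_marking_tag(sip_message, token="Tag"):
    """Return True if the token appears in Supported or as a Contact parameter."""
    for line in sip_message.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        name = name.strip().lower()
        if name == "supported":
            # Second way: a comma-separated option list.
            if token in (v.strip() for v in value.split(",")):
                return True
        elif name == "contact":
            # First way: parameters that follow the closing '>' of the URI.
            params = value.rpartition(">")[2].split(";")
            if token in (p.strip() for p in params):
                return True
    return False
```

The match is case-sensitive, so the `tag=` parameter of the From and To headers is not confused with the identifier.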
可选地,当呼叫方发起的通话是语音通话时,上述S605中的SIP消息中还包括媒体服务器的SDP信息,该媒体服务器的SDP信息包括媒体服务器的地址信息(例如IP地址)、音频端口信息、音频编解码格式、视频端口信息以及视频编解码格式。
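When the SIP message includes the SDP information of the media server, the response message to the SIP message also includes the SDP information of the call terminal, which includes the call terminal's address information (for example, an IP address), audio port information, audio codec format, video port information, and video codec format.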
本申请实施例中,当呼叫方发起的通话是语音通话时,在媒体服务器接收到通话终端发送的视频标记申请之后,媒体服务器与通话终端可以根据S605-S606中SIP消息和SIP消息的响应消息中的媒体服务器的SDP信息和通话终端的SDP信息进行媒体资源协商,建立通话终端与媒体服务器之间的视频通话媒体传输通道(即第一视频通话媒体传输通道)。在这种情况下,S602中的通话终端、对端通话终端以及媒体服务器进行交互以建立视频通话媒体传输通道的过程中,媒体服务器与通话终端之间进行媒体资源协商的步骤可以用步骤S605-S606进行替换。
可选地,媒体服务器与对端通话终端进行交互(例如媒体服务器向对端通话终端 发送SIP消息,该SIP消息中包括媒体服务器的SDP信息,对端通话终端向媒体服务器发送SIP消息的响应消息,该SIP消息的响应消息中包括对端通话终端的SDP信息),进行媒体资源协商,建立对端通话终端与媒体服务器之间的视频通话媒体传输通道(即第二视频通话媒体传输通道)。
可选地,在SIP消息中包括媒体服务器的SDP信息的情况下,上述标记操作确认标识也可以携带在媒体服务器的SDP信息中。
在SDP信息中携带标记操作确认标识(Tag)的情况下,可以在SDP信息的扩展字段中进一步指示传输标记媒体数据的视频端口信息。具体的,可以指示使用传输通话视频流的视频端口传输标记媒体数据,下面对SDP信息的字段进行示意。
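a=sendrecv; indicates a two-way video call
a=sendonly/recvonly; indicates a one-way video call
a=Tag; the marking operation confirmation identifier
m=video 12082 RTP/AVP 114 113; indicates that the video port used for the call video stream is used to transmit the marked media data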
v=0
o=HuaWeiUAP9600 12 12 IN IP4 10.137.2.167
s=Sip Call
c=IN IP4 10.137.2.176//IP地址
t=0 0
m=audio 12080 RTP/AVP 104 103 102 101 8 0 18 96 97//音频端口号
b=AS:41
b=RS:600
b=RR:2000
a=rtpmap:104 AMR-WB/16000/1//音频编解码
a=fmtp:104 mode-change-capability=2;max-red=0
a=rtpmap:103 AMR-WB/16000/1
a=fmtp:103 octet-align=1;mode-change-capability=2;max-red=0
a=rtpmap:102 AMR/8000/1
a=fmtp:102 mode-change-capability=2;max-red=0
a=rtpmap:101 AMR/8000/1
a=fmtp:101 octet-align=1;mode-change-capability=2;max-red=0
a=rtpmap:96 telephone-event/16000
a=fmtp:96 0-15
a=rtpmap:97 telephone-event/8000
a=fmtp:97 0-15
a=curr:qos local none
a=curr:qos remote none
a=des:qos mandatory local sendrecv
a=des:qos optional remote sendrecv
a=sendrecv
a=maxptime:240
a=ptime:20
m=video 12082 RTP/AVP 114 113//视频端口号
b=AS:2154
b=RS:8000
b=RR:6000
a=rtpmap:114 H264/90000//视频编解码
a=fmtp:114 profile-level-id=42C01F;sprop-parameter-sets=Z0LAH9oC0ChoBtChNQ==,aM4G4g==;packetization-mode=1;sar-understood=16;sar-supported=1
a=imageattr:114 send [x=720,y=1280] recv [x=720,y=1280]
a=rtpmap:113 H264/90000
a=fmtp:113 profile-level-id=42C01F;sprop-parameter-sets=Z0LAH9oC0ChoBtChNQ==,aM4G4g==;packetization-mode=0;sar-understood=16;sar-supported=1
a=imageattr:113 send [x=720,y=1280] recv [x=720,y=1280]
a=curr:qos local none
a=curr:qos remote none
a=des:qos mandatory local sendrecv
a=des:qos optional remote sendrecv
a=rtcp-fb:* nack
a=rtcp-fb:* nack pli
a=rtcp-fb:* ccm fir
a=rtcp-fb:* ccm tmmbr
a=sendrecv
a=Tag
a=tcap:1 RTP/AVPF
a=pcfg:1 t=1
a=extmap:2 urn:3gpp:video-orientation.
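A receiver of the SDP information above needs only a few of its fields to decide where the marked media data will travel. The following sketch extracts them; `summarize_sdp` is a hypothetical name, the `a=Tag` attribute is the identifier proposed in this application rather than a standard SDP attribute, and the `//` stripping exists only to tolerate the inline annotations used in the listing.

```python
def summarize_sdp(sdp):
    """Extract connection IP, per-media port and direction, and a=Tag."""
    summary = {"ip": None, "media": {}, "tag": False}
    current = None
    for raw in sdp.splitlines():
        line = raw.split("//")[0].strip()  # drop listing annotations
        if line.startswith("c=IN IP4 "):
            summary["ip"] = line.split()[2]
        elif line.startswith("m="):
            kind, port = line[2:].split()[:2]
            # sendrecv is the SDP default when no direction is given.
            current = summary["media"].setdefault(
                kind, {"port": int(port), "direction": "sendrecv"}
            )
        elif line == "a=Tag":
            summary["tag"] = True
        elif line in ("a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"):
            if current is not None:
                current["direction"] = line[2:]
    return summary
```

When `tag` is true and a `video` entry is present, the video port in that entry is the one that would carry the marked media data.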
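It can be understood from the description of the SDP information that the SDP information can indicate whether the video call is a one-way or a two-way video call. Taking the call terminal and the peer call terminal as an example, a one-way video call may transmit only the call video stream of the call terminal and not the video stream of the peer call terminal. For example, the call terminal sends the video content it captures to the peer call terminal, and the peer call terminal displays that content, while the peer call terminal either captures no video content or does not send the content it captures to the call terminal; that is, the call terminal does not display video content captured by the peer call terminal.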
S607、通话终端基于目标媒体数据在通话终端的通话界面呈现第一视频画面。
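In the embodiments of the present application, the call interface is the interface presented on the call terminal during the video call between the call terminal and the peer call terminal. The call interface may be a call window, a dialog box, or the like, which is not limited in the embodiments of the present application.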
可选地,本申请实施例中,上述目标媒体数据可以是通话终端从对端通话终端接收的数据,目标媒体数据也可以是该通话终端从该通话终端本地获取的数据。
一种实现方式中,目标媒体数据包含第一视频帧对应的数据,该第一视频帧为通过视频通话媒体传输通道从对端通话终端接收的且用于呈现第一视频画面。在这种情况下,上述S607具体包括S6071。
S6071、通话终端解码第一视频帧对应的数据以在该通话终端的通话界面呈现第一视频画面。
可选地,媒体服务器通过第二视频通话媒体传输通道先从对端通话终端接收视频流数据(该视频流数据即为通话视频流),该视频流数据中包括第一视频帧对应的数据,然后媒体服务器通过第一视频通话媒体传输通道向通话终端发送该视频流数据,进而通话终端从该视频流数据中获取第一视频帧对应的数据,并基于该第一视频帧对应的数据在通话终端的通话界面呈现第一视频画面。
可选地,媒体服务器通过第二视频通话媒体传输通道先从对端通话终端接收视频流数据(该视频流数据即为通话视频流),该视频流数据中包括第一视频帧对应的数据,然后媒体服务器从该视频流数据中获取第一视频帧对应的数据,然后媒体服务器通过第一视频通话媒体传输通道向通话终端发送第一视频帧对应的数据,进而通话终端可以基于该第一视频帧对应的数据在通话终端的通话界面呈现第一视频画面。
另一种实现方式中,目标媒体数据包含目标图像对应的数据,该目标图像为通话终端本地存储的且用于呈现第一视频画面。在这种情况下,上述S607具体包括S6072。
S6072、通话终端解码目标图像对应的数据以在该通话终端的通话界面呈现第一视频画面。
S608、通话终端检测用户对第一视频画面中的目标对象的标记操作,生成标记痕迹数据,该标记痕迹数据用于描述标记操作所产生的标记痕迹。
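It can be understood that, in the embodiments of the present application, the marking trace is a digitized or virtual trace.
In the embodiments of the present application, the marking operation may be a tap on the first video picture or a drawing operation on the first video picture, for example drawing lines of different forms on the first video picture to mark out the target object: circling the target object with a rectangle, a circle, a triangle, or an irregular closed shape, or marking it with a solid line, a dashed line, an arrow, or another special symbol (for example, drawing a five-pointed star next to the target object). The embodiments of the present application do not limit the specific form of the marking operation; any operation that can mark out the target object can be regarded as a marking operation.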
可以理解的,用户对第一视频画面中的目标对象进行标记操作之后,可以形成与标记操作的具体行为对应的标记痕迹,且用户可以通过任何能够标记出目标对象的方式来对目标对象标记。上述标记痕迹数据包括但不限于可以指示上述标记痕迹的时间戳、颜色、形状、位置(例如标记痕迹上的各个点的坐标等位置参数)等。示例性的,某一视频画面中的标记痕迹为红色的圆圈,标记痕迹数据包括指示视频画面的数据(例如视频画面的时间戳或标识信息)、指示标记痕迹的颜色为红色的数据、指示标记痕迹的形状为圆圈的数据、以及指示标记痕迹的圆心坐标和半径的数据。
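The red-circle example above suggests a compact record for the marking trace data. The sketch below is one possible shape, not a format defined by the application: the field names, the use of JSON, and the millisecond timestamp are all assumptions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MarkTrace:
    """Illustrative container for the fields named in the text."""
    frame_timestamp_ms: int  # identifies the video picture that was marked
    color: str
    shape: str
    points: list             # for a circle: [[center_x, center_y]]
    radius: float = 0.0      # used only by circular traces


def encode(trace):
    """Serialize a trace to bytes for transmission."""
    return json.dumps(asdict(trace), separators=(",", ":")).encode("utf-8")


def decode(payload):
    """Rebuild a trace from bytes produced by encode()."""
    return MarkTrace(**json.loads(payload.decode("utf-8")))
```

A record like `MarkTrace(1200, "red", "circle", [[360, 640]], 50.0)` is far smaller than an encoded video frame, which is relevant when the trace data itself is sent instead of a rendered second video frame.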
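In the embodiments of the present application, the call terminal may use a video marking tool on the call terminal and invoke the functions of that tool to perform the marking operation on the target object. The video marking tool is system software that comes with the call terminal.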
S609、媒体服务器向通话终端发送传输通道指示信息。相应地,通话终端从媒体服务器接收传输通道指示信息。
该传输通道指示信息用于指示通话终端通过视频通话媒体传输通道传输标记媒体数据。该标记媒体数据用于在对端通话终端的通话界面呈现第二视频画面,第二视频画面包含上述标记痕迹,即对端通话终端接收到该标记媒体数据之后,对端通话终端可以基于该标记媒体数据在对端通话终端的通话界面呈现第二视频画面。
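Optionally, the transmission channel indication information may be carried in a SIP message. That SIP message may be the same message as the SIP message in S605 above, or it may be a different SIP message, which is not limited in the embodiments of the present application.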
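The media server may use the explicit indication described in S609 above (that is, sending transmission channel indication information) to indicate that the marked media data is to be transmitted over the video call media transmission channel. In some cases, the media server may instead indicate this implicitly. For example, the SIP message in S605 carries the SDP information of the media server and the response message to that SIP message carries the SDP information of the call terminal, so as to negotiate (or indicate) that the video call media transmission channel established from this pair of SDP information (the SDP information of the media server and the SDP information of the call terminal) is used to transmit the marked media data. It can be understood that this video call media transmission channel was originally used to transmit the call video stream.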
S610、通话终端通过视频通话媒体传输通道向对端通话终端传输标记媒体数据,以使对端通话终端基于该标记媒体数据在该对端通话终端的通话界面呈现第二视频画面。
在一种实现方式中,上述视频通话媒体传输通道包括通话终端与媒体服务器之间的第一视频通话媒体传输通道,以及对端通话终端与媒体服务器之间的第二视频通话媒体传输通道,即视频通话媒体传输通道为间接的传输通道,则上述S610具体通过S6101-S6102实现。
S6101、通话终端通过第一视频通话媒体传输通道向媒体服务器传输第一标记媒体数据。相应地,媒体服务器通过该第一视频通话媒体传输通道从通话终端接收第一标记媒体数据。
可以理解的是,第一标记媒体数据用于在对端通话终端上呈现第二视频画面。
需要说明的是,对于S6101,上述S609中,媒体服务器向通话终端发送的是第一传输通道指示信息,该第一传输通道指示信息指示通话终端通过第一视频通话媒体传输通道传输第一标记媒体数据。
S6102、媒体服务器通过第二视频通话媒体传输通道向对端通话终端传输第二标记媒体数据。相应地,对端通话终端通过第二视频通话媒体传输通道从媒体服务器接收第二标记媒体数据。
可以理解的是,第二标记媒体数据也是用于在对端通话终端上呈现第二视频画面。上述的第一标记媒体数据与第二标记媒体数据可能是同一种类数据,也可能是不同种类的数据。
需要说明的是,对于S6102,在执行S609之前,媒体服务器还向对端通话终端发送第二传输通道指示信息,该第二传输通道指示信息指示对端通话终端通过第二视频 通话媒体传输通道传输第二标记媒体数据。
一种实现方式中,上述第一标记媒体数据和第二标记媒体数据相同,二者均包含所述第二视频帧对应的数据。在这种情况下,通话终端向媒体服务器传输标记媒体数据之前,通话终端将在该通话终端的通话界面呈现的第一视频画面上叠加呈现上述标记痕迹,形成第二视频画面,并将用于呈现第二视频画面的第二视频帧对应的数据发送至媒体服务器,进而媒体服务器将该第二视频帧对应的数据送至对端通话终端,如此,对端通话终端接收到该第二视频帧对应的数据之后,即可在该对端通话终端的通话界面呈现该第二视频画面。
另一种实现方式中,上述第一标记媒体数据和第二标记媒体数据相同,二者均为标记痕迹数据。在这种情况下,通话终端获得标记痕迹数据之后,通话终端将标记痕迹数据发送至媒体服务器,媒体服务器将该标记媒体数据转发至对端通话终端,进而对端通话终端将目标媒体数据(该目标媒体数据来自该对端通话终端拍摄的视频内容)和标记痕迹数据进行叠加得到第二视频帧对应的数据,并基于该第二视频帧对应的数据在该对端通话终端的通话界面呈现第二视频画面。
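In yet another implementation, the first marked media data and the second marked media data are different: the first marked media data is the marking trace data, and the second marked media data is the data corresponding to the second video frame. In this case, after the call terminal obtains the marking trace data, it sends the marking trace data to the media server. The media server superimposes the target media data (which the media server obtained from the peer call terminal) and the marking trace data to obtain the data corresponding to the second video frame, and sends that data to the peer call terminal, so that the peer call terminal can present the second video picture on its call interface based on the data corresponding to the second video frame.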
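The three implementations above differ only in which device superimposes the marking trace on the frame. The sketch below shows that choice and a toy superimposition on a grayscale frame stored as a list of rows. Everything here is illustrative: the function names are hypothetical, and a real device would draw onto decoded video frames rather than nested lists.

```python
def overlay_site(first_kind, second_kind):
    """Map the kinds of first/second marked media data to the overlaying device."""
    sites = {
        ("frame", "frame"): "call terminal",
        ("trace", "trace"): "peer call terminal",
        ("trace", "frame"): "media server",
    }
    return sites[(first_kind, second_kind)]


def overlay_circle(frame, cx, cy, radius, value=255):
    """Return a copy of frame with a one-pixel-wide circle outline drawn."""
    out = [row[:] for row in frame]
    for y, row in enumerate(out):
        for x in range(len(row)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            # Keep pixels whose distance from the center is within half a
            # pixel of the radius.
            if (radius - 0.5) ** 2 <= d2 <= (radius + 0.5) ** 2:
                row[x] = value
    return out
```

Sending the trace rather than the rendered frame moves the drawing work to the media server or the peer call terminal, at the cost of requiring that device to hold the matching target media data.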
在另一种实现方式中,上述视频通话媒体传输通道包括通话终端与对端通话终端之间直接的视频通话媒体传输通道,即视频通话媒体传输通道为直接的传输通道,则上述S610具体通过S6101'实现,即S6101-S6102被替换为S6101'。
S6101'、通话终端通过该通话终端与对端通话终端之间直接的视频通话媒体传输通道向对端通话终端传输标记媒体数据。
在一种实现方式中,上述标记媒体数据包含所述第二视频帧对应的数据。在这种情况下,通话终端将在该通话终端的通话界面呈现的第一视频画面上叠加呈现上述标记痕迹,形成第二视频画面,并将用于呈现第二视频画面的第二视频帧对应的数据发送至对端通话终端,如此,对端通话终端接收到该第二视频帧对应的数据之后,即可在该对端通话终端的通话界面呈现该第二视频画面。
在另一种实现方式中,上述标记媒体数据为标记痕迹数据。在这种情况下,通话终端获得标记痕迹数据之后,通话终端将标记痕迹数据发送至对端通话终端,进而对端通话终端将目标媒体数据(该目标媒体数据来自该对端通话终端拍摄的视频内容)和标记痕迹数据进行叠加得到第二视频帧对应的数据,并基于该第二视频帧对应的数据在该对端通话终端的通话界面呈现第二视频画面。
可选地,如图6所示,在通过视频通话媒体传输通道向对端通话终端传输标记媒体数据(即S610)之前,本申请实施例提供的通信方法还包括S611。
S611、通话终端停止通过视频通话媒体传输通道传输通话视频流。
本申请实施例提供的通信方法中,通话终端与对端通话终端之间停止通过视频通话媒体传输通道传输通话终端或对端通话终端拍摄的视频内容(即通话视频流),如此可以通过该视频通话媒体传输通道传输标记媒体数据。
可选地,通话终端停止通过视频通话媒体传输通道传输通话视频流具体包括:在通过第一视频通话媒体传输通道从通话终端接收第一标记媒体数据之前,停止通过第一视频通话媒体传输通道传输通话视频流;且在通过第二视频通话媒体传输通道向对端通话终端传输第二标记媒体数据之前,停止通过第二视频通话媒体传输通道传输所述通话视频流。
可选地,当媒体服务器作为通话终端和对端通话终端之间的传输媒介时,媒体服务器获取到用于呈现第二视频画面的第二视频帧对应的数据之后,媒体服务器还对通话终端拍摄的视频内容和上述的第二视频帧对应的数据进行混合编码处理(可以称为混屏处理),然后将混屏后的视频流发送至对端通话终端。如此,在对端通话终端上可以显示第二视频画面和通话终端的拍摄的内容混屏后的画面,例如,在对端通话终端的通话界面的第一区域显示通话终端拍摄的画面,在对端通话终端的通话界面的第二区域显示第二视频画面。
可以理解的是,本申请实施例中,通话终端也可以取消视频标记,恢复原本的语音或者视频通话。具体的,通话终端关闭该通话终端上的视频标记工具,并且通话终端向媒体服务器发送视频标记结束通知消息,以通知媒体服务器视频标记过程结束,如此,后续的,通话终端、媒体服务器以及对端通话终端之间恢复语音或者视频通话,也就是说,不再使用视频通话媒体传输通道传输标记媒体数据,而恢复该视频通话媒体传输通道原本的功能,即通过视频通话媒体传输通道传输通话视频流。
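Because one video call media transmission channel carries either the call video stream or the marked media data, its use can be modelled as a two-state switch. The class below is a hypothetical sketch of that bookkeeping; the state names and the choice to reject the wrong kind of payload are assumptions made for illustration.

```python
class VideoChannel:
    """Tracks what the single negotiated video channel currently carries."""

    def __init__(self):
        self.mode = "call_video"

    def start_marking(self):
        # Corresponds to S611 followed by S610: stop the call video stream,
        # then reuse the channel for marked media data.
        self.mode = "marked_media"

    def end_marking(self):
        # After the video marking end notification, the channel returns to
        # its original function.
        self.mode = "call_video"

    def send(self, kind, payload):
        if kind != self.mode:
            raise RuntimeError(f"channel is carrying {self.mode}, not {kind}")
        return (kind, payload)
```

No second port is opened at any point, which is the port-saving property the application attributes to reusing the existing channel.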
可选地,本申请实施例中,在通话终端与对端通话终端通话的过程中,通话终端可以对该通话终端上呈现的视频画面中的目标对象进行标记操作,然后将得到的标记媒体数据发送至对端通话终端,对端通话终端也可以对该对端通话终端上呈现的视频画面中的目标对象进行标记操作,然后将得到的标记媒体数据发送至通话终端,该对端通话终端进行标记操作并传输标记媒体数据的过程与上述通话终端进行标记操作并传输标记媒体数据的过程类似。
综上所述,本申请实施例提供的通信方法,通信系统中的通话终端、对端通话终端以及媒体服务器之间进行交互建立该视频通话媒体传输通道,且通过视频通话媒体传输通道传输所述通话终端与对端通话终端之间的通话视频流,以实现通话终端与对端通话终端之间的视频通话业务,该通话视频流包含通话终端或对端通话终端拍摄的视频内容;然后,通话终端基于目标媒体数据在通话终端的通话界面呈现第一视频画面,并检测用户对该第一视频画面中的目标对象的标记操作,生成标记痕迹数据,并且通过视频通话媒体传输通道向对端通话终端传输标记媒体数据,该标记媒体数据用于在所述对端通话终端的通话界面呈现第二视频画面,该第二视频画面包含上述标记痕迹。通过本申请实施例提供的技术方案,通话终端、媒体服务器以及对端通话终端之间可以基于现有的视频通话媒体传输通道传输标记媒体数据,无需花费额外的时间建立专用于传输标记媒体数据的传输通道,并且无需占用终端(包括通话终端和对端通话终端)额外的端口资源,如此,能够节省通话过程中对视频画面中的目标对象进 行标记时占用的端口资源。
进一步的,与现有的通信方法相比,本申请实施例提供的技术方案中,无需在用户通话终端和客服通话终端上安装视频标记APP,如此,也无需操作人员进行复杂的相关操作,不要求操作人员具有较高的操作技能。
可以理解的是,在人工客服的服务场景中,通话终端和对端通话终端可以为不同角色的终端,例如,通话终端可以为客服通话终端,对端通话终端可以为用户通话终端;或者通话终端为用户通话终端,对端通话终端为客服通话终端。
基于上述实施例的相关描述可知,发起通话的用户通话终端可以发起视频通话,也可以发起语音通话。下面,以上述通话终端为客服通话终端,对端通话终端为一个用户通话终端,并且用户通话终端发起的通话为语音通话为例,从各个设备的交互角度对本申请实施例提供的通信方法进行详细的描述。如图7所示,本申请实施例提供的通信方法包括如下步骤。
S701、用户通话终端通过IMS网元向媒体服务器发送邀请(invite)消息。
S702、媒体服务器通过IMS网元向用户通话终端发送振铃消息。
S703、媒体服务器通过IMS网元向用户通话终端发送应答消息。
可以理解的是,客服系统对用户通话终端的呼叫进行应答之后,媒体服务器播放与用户的业务相关的音频提示内容,以提示用户根据需求选择不同的服务内容(例如选择人工服务),当用户在该音频提示内容的提示下进行操作,选择了人工服务时,媒体服务器检测到选择人工服务的操作之后,媒体服务器为该用户分配一个客服人员(即为用户通话终端选择一个对应的客服通话终端)。
S704、媒体服务器向客服通话终端发送邀请(invite)消息。
S705、客服通话终端向媒体服务器发送应答消息。
S706、媒体服务器通过IMS网元向用户通话终端发送重邀请(reinvite)消息。
该重邀请(reinvite)消息中包括媒体服务器的SDP信息,媒体服务器的SDP信息包括媒体服务器的地址信息(例如IP地址)、音频端口信息以及音频编解码格式。
S707、用户通话终端通过IMS网元向媒体服务器发送应答消息。
该应答消息中包括用户通话终端SDP信息,用户通话终端SDP信息包括用户通话终端的地址信息(例如IP地址)、音频端口信息以及音频编解码格式。
S708、媒体服务器向客服通话终端发送重邀请(reinvite)消息。
该重邀请(reinvite)消息中包括媒体服务器的SDP信息,关于媒体服务器的SDP信息的描述可以参考S706。
S709、客服通话终端向媒体服务器发送应答消息。
该应答消息中包括客服通话终端的SDP信息,客服通话终端的SDP信息包括客服通话终端的地址信息(例如IP地址)、音频端口信息以及音频编解码格式。
上述S706-S709是通话终端、对端通话终端以及媒体服务器进行交互,通过媒体资源协商建立用户通话终端与客服通话终端的间接的语音通话媒体传输通道的过程。
可以理解的是,S701-S709是用户通话终端呼叫客服通话终端,并且建立语音通话媒体传输通道的过程。关于S701-S709的各个步骤的消息中所携带的内容可以参考上述对于S201-S209的详细描述,此处不再赘述。
S710、客服通话终端向媒体服务器发送视频画面标记申请。相应地,媒体服务器接收客服通话终端发送的视频画面标记申请。
该视频画面标记申请中包括用户通话终端的标识,该标识用于申请对用户通话终端对应的视频画面(以下称为第一视频画面)进行标记操作。
S711、媒体服务器向用户通话终端发送视频画面标记请求。相应地,用户通话终端从媒体服务器接收视频画面标记请求。
该视频画面标记请求用于请求对第一视频画面进行标记操作。
S712、用户通话终端向媒体服务器发送视频画面标记请求的响应消息。相应地,媒体服务器从用户通话终端接收视频画面标记请求的响应消息。
该响应消息用于指示用户通话终端同意对第一视频画面进行标记操作。
本申请实施例中,上述视频画面标记请求还指示请求将语音通话转换视频通话,该视频画面标记请求的响应消息还指示用户通话终端同意将语音通话转为视频通话。当用于呈现第一视频画面的目标媒体数据来自客服通话终端从用户通话终端接收的数据时,视频画面标记请求的响应消息还指示用户通话终端同意媒体服务器捕获该用户通话终端发送给媒体服务器的通话视频流。
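It should be noted that the user call terminal initiated a voice call, and the channel established through S706-S709 above is a voice call media transmission channel. Performing a marking operation on a target object in the video picture produces marked media data (used to present the second video picture, which belongs to a video stream), whereas a voice call can carry only the call voice stream and not a video stream. Therefore, after the media server receives the video picture marking application, it triggers establishment of a video call media transmission channel; that is, the voice call is converted into a video call so that a channel capable of carrying a video stream is established, and that channel is used to transmit the marked media data.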
可以理解的是,视频通话媒体传输通道包括用户通话终端与媒体服务器之间的视频通话媒体传输通道(对应上述实施例中的第二视频通话媒体传输通道),以及客服通话终端与媒体服务器之间的视频通话媒体传输通道(对应上述实施例中的第一视频通话媒体传输通道),第一视频通话媒体传输通道和第二视频通话媒体传输通道成对存在,是用于客服通话终端与用户通话终端进行通信的传输通道。
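In the embodiments of the present application, the message in S711 may be a re-invite message that carries the SDP information of the media server (including the media server's address information, audio port information, audio codec format, video port information, and video codec format). In that case, the message in S712 is the response to that re-invite message and carries the SDP information of the user call terminal (including the user call terminal's address information (for example, an IP address), audio port information, audio codec format, video port information, and video codec format). Media resource negotiation is thus performed using the media server's SDP information in the re-invite message and the user call terminal's SDP information in its response, establishing the second video call media transmission channel between the user call terminal and the media server.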
第一视频通话媒体传输通道的建立过程如下S713-S714。
S713、媒体服务器向客服通话终端发送SIP消息。相应地,客服通话终端从媒体服务器接收SIP消息。
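The SIP message includes a marking operation confirmation identifier and the SDP information of the media server. The marking operation confirmation identifier is used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture. The SDP information of the media server includes the media server's address information, audio port information, audio codec format, video port information, and video codec format. Note that, unlike the SDP information in the media resource negotiation messages of a voice call (which includes a device's address information, audio port information, and audio codec format), the SDP information in the media resource negotiation messages of a video call also includes the device's video port information and video codec format.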
S714、客服通话终端向媒体服务器发送SIP消息的响应消息。相应地,媒体服务器从客服通话终端接收SIP消息的响应消息。
该SIP消息的响应消息中包括标记操作应答标识和客服通话终端的SDP信息,该标记操作应答标识用于指示通话终端具备对第一视频画面进行标记操作所需的资源,客服通话终端的SDP信息包括客服通话终端的地址信息、音频端口信息、音频编解码格式、视频端口信息以及视频编解码格式。
上述SIP消息中的媒体服务器的SDP信息和该SIP消息的应答消息中的客服通话终端的SDP信息用于进行媒体资源协商,建立媒体服务器与客服通话终端之间的第一视频通话媒体传输通道。
S715、客服通话终端基于目标媒体数据在该客服通话终端的通话界面呈现第一视频画面。
S716、客服通话终端检测用户对第一视频画面中的目标对象的标记操作,生成标记痕迹数据。
该标记痕迹数据用于描述标记操作所产生的标记痕迹。
S717、媒体服务器向客服通话终端发送传输通道指示信息。相应地,客服通话终端从媒体服务器接收该传输通道指示信息。
该传输通道指示信息用于指示客服通话终端通过视频通话媒体传输通道传输标记媒体数据。该标记媒体数据用于在对端通话终端的通话界面呈现第二视频画面,第二视频画面包含上述标记痕迹。
至此,用于传输标记媒体数据的传输通道已建立,该用于标记媒体数据的传输通道是视频通话媒体传输通道,基于该视频通话媒体传输通道,客服通话终端可以向用户通话终端传输标记媒体数据。
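S718. The customer service call terminal transmits the marked media data to the user call terminal over the video call media transmission channel, so that the user call terminal presents the second video picture on its call interface based on the marked media data.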
关于S701-S718的其他细节的描述可以参考上述实施例的相关描述,此处不再赘述。
可选地,上述用户通话终端也可以进行对视频画面中的目标对象进行标记操作,例如,用户通话终端对从客服通话终端接收的视频内容的视频画面中的目标对象进行标记操作,并将标记媒体数据发送至客服通话终端。
相应地,本申请实施例提供一种通话终端,根据上述方法示例可以对该通话终端进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现, 也可以采用软件功能模块的形式实现。需要说明的是,本发明实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
在采用对应各个功能划分各个功能模块的情况下,图8示出上述实施例中所涉及的通话终端的一种可能的结构示意图。如图8所示,该通话终端包括处理模块801、生成模块802以及发送模块803。处理模块801用于建立视频通话媒体传输通道,且控制通话终端通过视频通话媒体传输通道传输通话终端与对端通话终端之间的通话视频流,以实现通话终端与对端通话终端之间的视频通话业务,例如执行上述方法实施例中的S601。处理模块801还用于基于目标媒体数据在通话终端的通话界面呈现第一视频画面,例如执行上述方法实施例中的S607、S715。生成模块802用于检测用户对第一视频画面中的目标对象的标记操作,生成标记痕迹数据,该标记痕迹数据用于描述标记操作所产生的标记痕迹,例如执行上述方法实施例中的S608、S716。发送模块803用于通过视频通话媒体传输通道向对端通话终端传输标记媒体数据,以使对端通话终端基于标记媒体数据在对端通话终端的通话界面呈现第二视频画面,第二视频画面包含上述标记痕迹,例如执行上述方法实施例中的S610、S718。
可选地,目标媒体数据包含第一视频帧对应的数据,第一视频帧为通过所述视频通话媒体传输通道从所述对端通话终端接收的且用于呈现所述第一视频画面;处理模块801具体用于解码第一视频帧对应的数据以在通话界面呈现所述第一视频画面,例如执行上述方法实施例中的S6071。
可选地,目标媒体数据包含目标图像对应的数据,目标图像为通话终端本地存储的且用于呈现第一视频画面;处理模块801具体用于解码目标图像对应的数据以在通话界面呈现第一视频画面,例如执行上述方法实施例中的S6072。
可选地,上述处理模块801还用于控制通话终端停止通过视频通话媒体传输通道传输通话视频流,例如执行上述方法实施例中的S611。
可选地,视频通话媒体传输通道包括通话终端与媒体服务器之间的第一视频通话媒体传输通道,以及对端通话终端与媒体服务器之间的第二视频通话媒体传输通道。上述发送模块803具体用于通过第一视频通话媒体传输通道向媒体服务器传输第一标记媒体数据,以触发媒体服务器通过第二视频通话媒体传输通道向对端通话终端传输第二标记媒体数据,第一标记媒体数据和第二标记媒体数据均是用于呈现第二视频画面的数据,例如执行上述方法实施例中的S6101、S6102。
可选地,视频通话媒体传输通道是通话终端与对端通话终端之间直接的视频通话媒体传输通道,上述发送模块803具体用于通过直接的视频通话媒体传输通道向对端通话终端传输标记媒体数据,例如执行上述方法实施例中的S6101'。
可选地,本申请实施例提供的通话终端还包括接收模块804,该接收模块804用于从媒体服务器接收传输通道指示信息,该传输通道指示信息指示通话终端通过视频通话媒体传输通道传输标记媒体数据,例如执行上述方法实施例中的S609、S717。
可选地,上述发送模块803还用于向媒体服务器发送视频画面标记申请,该视频画面标记申请中包括对端通话终端的标识,该标识用于申请对对端通话终端对应的视频画面进行标记操作,例如执行上述方法实施例中的S602、S710。
可选地,上述接收模块804还用于从媒体服务器接收SIP消息,该SIP消息中包 括标记操作确认标识,该标记操作确认标识用于确认通话终端是否具备对第一视频画面进行标记操作所需的资源,例如执行上述方法实施例中的S605、S713;发送模块803还用于向媒体服务器发送SIP消息的响应消息,该响应消息中包括标记操作应答标识,该标记操作应答标识用于指示通话终端具备对第一视频画面进行标记操作所需的资源,例如执行上述方法实施例中的S606、S714。
上述通话终端的各个模块还可以用于执行上述方法实施例中的其他动作,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
在采用集成的单元的情况下,图9示出了上述实施例中所涉及的通话终端的另一种可能的结构示意图。如图9所示,本申请实施例提供的通话终端可以包括:处理模块901和通信模块902。处理模块901可以用于对该通话终端的动作进行控制管理,例如,处理模块901可以用于支持该通话终端执行上述方法实施例中的S601、S607(包括S6071或S6072)、S608、S611、S715、S716,和/或用于本文所描述的技术的其它过程。通信模块902可以用于支持该通话终端与其他网络实体的通信,通信模块902集成了上述发送模块803和接收模块804的功能,该通信模块902可以用于支持该通话终端执行上述方法实施例中的S602、S605、S606、S609、S610(包括S6101-S6102)、S6101'、S710、S713、S714、S717、S718。可选地,如图9所示,该通话终端还可以包括存储模块903,用于存储该通话终端的程序代码和数据,例如接收到的视频内容视频画面、标记痕迹数据等。
其中,处理模块901可以是处理器,例如处理器可以为图4A中的处理器410。通信模块902可以是收发器、收发电路或通信接口等,例如图4A中的移动通信模块450和/或无线通信模块460,存储模块903可以是存储器,例如图4A中的内部存储器421。
上述通话终端包含的模块实现上述功能的更多细节请参考前面各个方法实施例中的描述,在这里不再重复。
相应地,本申请实施例提供一种媒体服务器,根据上述方法示例可以对该媒体服务器进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本发明实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
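Where each function is assigned its own functional module, FIG. 10 shows a possible schematic structure of the media server involved in the foregoing embodiments. As shown in FIG. 10, the media server includes a processing module 1001, a receiving module 1002, and a sending module 1003. The processing module 1001 is used to establish a first video call media transmission channel and a second video call media transmission channel, where the first video call media transmission channel is the video call media transmission channel between the media server and the call terminal, and the second video call media transmission channel is the video call media transmission channel between the media server and the peer call terminal; and to control the receiving module or the sending module to transmit, over the first and second video call media transmission channels, the call video stream between the call terminal and the peer call terminal, so as to implement the video call service between them, where the call video stream contains video content captured by the call terminal or the peer call terminal; for example, it performs S601 in the foregoing method embodiment. The receiving module 1002 is used to receive first marked media data from the call terminal over the first video call media transmission channel. The first marked media data is used to present a second video picture on the call interface of the peer call terminal; the second video picture contains a marking trace produced by a user's marking operation on a target object in the first video picture presented on the call interface of the call terminal, and the first video picture is presented on that call interface based on target media data; for example, the receiving module performs S6101 in the foregoing method embodiment. The sending module 1003 is used to transmit second marked media data to the peer call terminal over the second video call media transmission channel, where the second marked media data is used to present the second video picture on the call interface of the peer call terminal; for example, it performs S6102 in the foregoing method embodiment.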
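Optionally, the sending module 1003 is used to send first transmission channel indication information to the call terminal, where the first transmission channel indication information instructs the call terminal to transmit the first marked media data over the video call media transmission channel; for example, it performs S609 and S717 in the foregoing method embodiments. The sending module 1003 is also used to send second transmission channel indication information to the peer call terminal, where the second transmission channel indication information instructs the peer call terminal to transmit the second marked media data over the video call media transmission channel.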
可选地,上述处理模块1001还用于控制媒体服务器停止通过第一视频通话媒体传输通道传输通话视频流,例如执行上述方法实施例中的S611。并且处理模块1001还用于控制媒体服务器停止通过第二视频通话媒体传输通道传输通话视频流。
可选地,接收模块1002还用于从通话终端接收视频画面标记申请,该视频画面标记申请中包括对端通话终端的标识,该标识用于申请对对端通话终端对应的视频画面进行标记操作,例如执行上述方法实施例中的S602、S710。
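Optionally, the sending module 1003 is also used to send a video picture marking request to the peer call terminal, where the video picture marking request requests a marking operation on the first video picture; for example, it performs S603 and S711 in the foregoing method embodiments. The receiving module 1002 is also used to receive, from the peer call terminal, a response message to the video picture marking request, where the response message indicates that the peer call terminal agrees to the marking operation on the first video picture; for example, it performs S604 and S712 in the foregoing method embodiments.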
可选地,发送模块1003还用于向通话终端发送SIP消息,该SIP消息中包括标记操作确认标识,标记操作确认标识用于确认通话终端是否具备对视频画面进行标记操作所需的资源,例如执行上述方法实施例中的S605、S713;接收模块1002还用于从通话终端接收SIP消息的响应消息,该响应消息中包括标记操作应答标识,标记操作应答标识用于指示通话终端具备对视频画面进行标记操作所需的资源,例如执行上述方法实施例中的S606、S714。
上述媒体服务器的各个模块还可以用于执行上述方法实施例中的其他动作,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
在采用集成的单元的情况下,图11示出了上述实施例中所涉及的媒体服务器的另一种可能的结构示意图。如图11所示,本申请实施例提供的媒体服务器可以包括:处理模块1101和通信模块1102。处理模块1101可以用于对该媒体服务器的动作进行控制管理,例如,处理模块1101可以用于支持该媒体服务器执行上述方法实施例中的S601、S611,和/或用于本文所描述的技术的其它过程。通信模块1102可以用于支持该媒体服务器与其他网络实体的通信,通信模块1102集成了上述接收模块1002和发送模块1003的功能,该通信模块1102可以用于支持该媒体服务器执行上述方法实施 例中的S602、S603、S604、S605、S606、S609、S6101、S6102、S710、S711、S712、S713、S714、S717。可选地,如图11所示,该媒体服务器还可以包括存储模块1103,用于存储该媒体服务器的程序代码和数据。
其中,处理模块1101可以是处理器,例如处理器可以为图5中的处理器501。通信模块1102可以是收发器、收发电路或网络接口等,例如图5中的网络接口503,存储模块1103可以是存储器,例如图5中的存储器502。
上述媒体服务器包含的模块实现上述功能的更多细节请参考前面各个方法实施例中的描述,在这里不再重复。
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。
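The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, magnetic disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state drive (SSD)).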
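From the above description of the implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional modules is used as an example. In practice, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the systems, apparatuses, and units described above, refer to the corresponding processes in the foregoing method embodiments; they are not repeated here.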
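In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules or units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.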
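The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.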
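In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.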
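If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.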
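The above are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.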

Claims (41)

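  1. A communication method, characterized in that the method is performed by a call terminal and comprises:
    establishing a video call media transmission channel, and transmitting a call video stream between the call terminal and a peer call terminal through the video call media transmission channel, so as to implement a video call service between the call terminal and the peer call terminal;
    presenting a first video picture on a call interface of the call terminal based on target media data;
    detecting a marking operation performed by a user on a target object in the first video picture, and generating mark trace data, wherein the mark trace data describes a mark trace produced by the marking operation;
    transmitting mark media data to the peer call terminal through the video call media transmission channel, so that the peer call terminal presents a second video picture on a call interface of the peer call terminal based on the mark media data, wherein the second video picture contains the mark trace.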
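  2. The method according to claim 1, characterized in that:
    the target media data contains data corresponding to a first video frame, the first video frame being received from the peer call terminal through the video call media transmission channel and used to present the first video picture, and the presenting a first video picture on a call interface of the call terminal based on target media data comprises: decoding the data corresponding to the first video frame to present the first video picture on the call interface; or
    the target media data contains data corresponding to a target image, the target image being stored locally on the call terminal and used to present the first video picture, and the presenting a first video picture based on target media data comprises: decoding the data corresponding to the target image to present the first video picture on the call interface.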
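  3. The method according to claim 1 or 2, characterized in that:
    the mark media data contains data corresponding to a second video frame, the second video frame being used to present the second video picture in which the mark trace is embedded; or
    the mark media data contains the mark trace data.
  4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
    stopping transmission of the call video stream through the video call media transmission channel.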
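  5. The method according to any one of claims 1 to 4, characterized in that:
    the video call media transmission channel comprises a first video call media transmission channel between the call terminal and a media server, and a second video call media transmission channel between the peer call terminal and the media server;
    the transmitting mark media data to the peer call terminal through the video call media transmission channel comprises: transmitting first mark media data to the media server through the first video call media transmission channel, so as to trigger the media server to transmit second mark media data to the peer call terminal through the second video call media transmission channel.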
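  6. The method according to claim 5, characterized in that:
    the first mark media data and the second mark media data both contain the data corresponding to the second video frame; or the first mark media data and the second mark media data are both the mark trace data; or
    the first mark media data is the mark trace data, and the second mark media data is the data corresponding to the second video frame.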
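  7. The method according to any one of claims 1 to 5, characterized in that:
    the video call media transmission channel is a direct video call media transmission channel between the call terminal and the peer call terminal;
    the transmitting mark media data to the peer call terminal through the video call media transmission channel comprises:
    transmitting the mark media data to the peer call terminal through the direct video call media transmission channel.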
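  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    receiving transmission channel indication information from the media server, wherein the transmission channel indication information instructs the call terminal to transmit the mark media data through the video call media transmission channel.
  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    sending a video picture marking application to the media server, wherein the video picture marking application includes an identifier of the peer call terminal, and the identifier is used to apply for performing a marking operation on the video picture corresponding to the peer call terminal.
  10. The method according to any one of claims 1 to 9, characterized in that the method further comprises:
    confirming that the call terminal has the resources required to perform a marking operation on the first video picture.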
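  11. The method according to claim 10, characterized in that the confirming that the call terminal has the resources required to perform a marking operation on the first video picture comprises:
    receiving a Session Initiation Protocol (SIP) message from the media server, wherein the SIP message includes a marking operation confirmation identifier, and the marking operation confirmation identifier is used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture;
    sending a response message to the SIP message to the media server, wherein the response message includes a marking operation response identifier, and the marking operation response identifier indicates that the call terminal has the resources required to perform a marking operation on the first video picture.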
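  12. A communication method, characterized in that the method is performed by a media server and comprises:
    establishing a first video call media transmission channel and a second video call media transmission channel, wherein the first video call media transmission channel is a video call media transmission channel between the media server and a call terminal, and the second video call media transmission channel is a video call media transmission channel between the media server and a peer call terminal; and transmitting a call video stream between the call terminal and the peer call terminal through the first video call media transmission channel and the second video call media transmission channel, so as to implement a video call service between the call terminal and the peer call terminal;
    receiving first mark media data from the call terminal through the first video call media transmission channel, wherein the first mark media data is used to present a second video picture on a call interface of the peer call terminal, the second video picture contains a mark trace, the mark trace is produced by a marking operation performed by a user on a target object in a first video picture presented on a call interface of the call terminal, and the first video picture is a video picture presented on the call interface of the call terminal based on target data;
    transmitting second mark media data to the peer call terminal through the second video call media transmission channel, wherein the second mark media data is used to present the second video picture on the call interface of the peer call terminal.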
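  13. The method according to claim 12, characterized in that:
    the target media data contains data corresponding to a first video frame, the first video frame being received by the call terminal from the peer call terminal through the first video call media transmission channel and used to present the first video picture; or
    the target media data contains data corresponding to a target image, the target image being stored locally on the call terminal and used to present the first video picture.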
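  14. The method according to claim 12 or 13, characterized in that:
    the first mark media data and the second mark media data both contain data corresponding to a second video frame, the second video frame being used to present the second video picture in which the mark trace is embedded; or
    the first mark media data and the second mark media data are both the mark trace data; or
    the first mark media data is the mark trace data, and the second mark media data is the data corresponding to the second video frame.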
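  15. The method according to any one of claims 12 to 14, characterized in that the method further comprises:
    sending first transmission channel indication information to the call terminal, wherein the first transmission channel indication information instructs the call terminal to transmit the first mark media data through the first video call media transmission channel;
    sending second transmission channel indication information to the peer call terminal, wherein the second transmission channel indication information instructs the peer call terminal to transmit the second mark media data through the second video call media transmission channel.
  16. The method according to any one of claims 12 to 15, characterized in that the method further comprises:
    stopping transmission of the call video stream through the first video call media transmission channel;
    stopping transmission of the call video stream through the second video call media transmission channel.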
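  17. The method according to any one of claims 12 to 16, characterized in that the method further comprises:
    receiving a video picture marking application from the call terminal, wherein the video picture marking application includes an identifier of the peer call terminal, and the identifier is used to apply for performing a marking operation on the video picture corresponding to the peer call terminal;
    sending a video picture marking request to the peer call terminal, wherein the video picture marking request is used to request a marking operation on the first video picture;
    receiving a response message to the video picture marking request from the peer call terminal, wherein the response message indicates that the peer call terminal agrees to the marking operation on the first video picture.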
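  18. The method according to any one of claims 12 to 17, characterized in that the method further comprises:
    confirming that the call terminal has the resources required to perform a marking operation on the first video picture.
  19. The method according to claim 18, characterized in that the confirming that the call terminal has the resources required to perform a marking operation on the first video picture comprises:
    sending a Session Initiation Protocol (SIP) message to the call terminal, wherein the SIP message includes a marking operation confirmation identifier, and the marking operation confirmation identifier is used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture;
    receiving a response message to the SIP message from the call terminal, wherein the response message includes a marking operation response identifier, and the marking operation response identifier indicates that the call terminal has the resources required to perform a marking operation on the first video picture.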
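  20. A call terminal, characterized by comprising a processing module, a generating module, and a sending module, wherein:
    the processing module is configured to establish a video call media transmission channel, and to control the call terminal to transmit a call video stream between the call terminal and a peer call terminal through the video call media transmission channel, so as to implement a video call service between the call terminal and the peer call terminal; and to present a first video picture on a call interface of the call terminal based on target media data;
    the generating module is configured to detect a marking operation performed by a user on a target object in the first video picture and to generate mark trace data, wherein the mark trace data describes a mark trace produced by the marking operation;
    the sending module is configured to transmit mark media data to the peer call terminal through the video call media transmission channel, so that the peer call terminal presents a second video picture on a call interface of the peer call terminal based on the mark media data, wherein the second video picture contains the mark trace.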
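  21. The call terminal according to claim 20, characterized in that:
    the target media data contains data corresponding to a first video frame, the first video frame being received from the peer call terminal through the video call media transmission channel and used to present the first video picture;
    the processing module is specifically configured to decode the data corresponding to the first video frame to present the first video picture on the call interface; or
    the target media data contains data corresponding to a target image, the target image being stored locally on the call terminal and used to present the first video picture;
    the processing module is specifically configured to decode the data corresponding to the target image to present the first video picture on the call interface.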
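  22. The call terminal according to claim 20 or 21, characterized in that:
    the mark media data contains data corresponding to a second video frame, the second video frame being used to present the second video picture in which the mark trace is embedded; or
    the mark media data contains the mark trace data.
  23. The call terminal according to any one of claims 20 to 22, characterized in that:
    the processing module is further configured to control the call terminal to stop transmitting the call video stream through the video call media transmission channel.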
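  24. The call terminal according to any one of claims 20 to 23, characterized in that:
    the video call media transmission channel comprises a first video call media transmission channel between the call terminal and a media server, and a second video call media transmission channel between the peer call terminal and the media server;
    the sending module is specifically configured to transmit first mark media data to the media server through the first video call media transmission channel, so as to trigger the media server to transmit second mark media data to the peer call terminal through the second video call media transmission channel.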
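  25. The call terminal according to claim 24, characterized in that:
    the first mark media data and the second mark media data both contain the data corresponding to the second video frame; or
    the first mark media data and the second mark media data are both the mark trace data; or
    the first mark media data is the mark trace data, and the second mark media data is the data corresponding to the second video frame.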
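  26. The call terminal according to any one of claims 20 to 25, characterized in that:
    the video call media transmission channel is a direct video call media transmission channel between the call terminal and the peer call terminal;
    the sending module is specifically configured to transmit the mark media data to the peer call terminal through the direct video call media transmission channel.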
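  27. The call terminal according to any one of claims 20 to 26, characterized in that the call terminal further comprises a receiving module;
    the receiving module is configured to receive transmission channel indication information from the media server, wherein the transmission channel indication information instructs the call terminal to transmit the mark media data through the video call media transmission channel.
  28. The call terminal according to any one of claims 20 to 27, characterized in that:
    the sending module is further configured to send a video picture marking application to the media server, wherein the video picture marking application includes an identifier of the peer call terminal, and the identifier is used to apply for performing a marking operation on the video picture corresponding to the peer call terminal.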
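  29. The call terminal according to any one of claims 20 to 28, characterized in that:
    the receiving module is further configured to receive a Session Initiation Protocol (SIP) message from the media server, wherein the SIP message includes a marking operation confirmation identifier, and the marking operation confirmation identifier is used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture;
    the sending module is further configured to send a response message to the SIP message to the media server, wherein the response message includes a marking operation response identifier, and the marking operation response identifier indicates that the call terminal has the resources required to perform a marking operation on the video picture.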
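  30. A media server, characterized by comprising a processing module, a receiving module, and a sending module, wherein:
    the processing module is configured to establish a first video call media transmission channel and a second video call media transmission channel, wherein the first video call media transmission channel is a video call media transmission channel between the media server and a call terminal, and the second video call media transmission channel is a video call media transmission channel between the media server and a peer call terminal; and to control the receiving module or the sending module to transmit a call video stream between the call terminal and the peer call terminal through the first video call media transmission channel and the second video call media transmission channel, so as to implement a video call service between the call terminal and the peer call terminal;
    the receiving module is configured to receive first mark media data from the call terminal through the first video call media transmission channel, wherein the first mark media data is used to present a second video picture on a call interface of the peer call terminal, the second video picture contains a mark trace, the mark trace is produced by a marking operation performed by a user on a target object in a first video picture presented on a call interface of the call terminal, and the first video picture is a video picture presented on the call interface of the call terminal based on target data;
    the sending module is configured to transmit second mark media data to the peer call terminal through the second video call media transmission channel, wherein the second mark media data is used to present the second video picture on the call interface of the peer call terminal.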
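  31. The media server according to claim 30, characterized in that:
    the target media data contains data corresponding to a first video frame, the first video frame being received by the call terminal from the peer call terminal through the first video call media transmission channel and used to present the first video picture; or
    the target media data contains data corresponding to a target image, the target image being stored locally on the call terminal and used to present the first video picture.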
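  32. The media server according to claim 30 or 31, characterized in that:
    the first mark media data and the second mark media data both contain data corresponding to a second video frame, the second video frame being used to present the second video picture in which the mark trace is embedded; or
    the first mark media data and the second mark media data are both the mark trace data; or
    the first mark media data is the mark trace data, and the second mark media data is the data corresponding to the second video frame.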
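  33. The media server according to any one of claims 30 to 32, characterized in that:
    the sending module is further configured to send first transmission channel indication information to the call terminal, wherein the first transmission channel indication information instructs the call terminal to transmit the first mark media data through the video call media transmission channel; and to send second transmission channel indication information to the peer call terminal, wherein the second transmission channel indication information instructs the peer call terminal to transmit the second mark media data through the video call media transmission channel.
  34. The media server according to any one of claims 30 to 33, characterized in that:
    the processing module is further configured to control the media server to stop transmitting the call video stream through the first video call media transmission channel, and to control the media server to stop transmitting the call video stream through the second video call media transmission channel.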
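  35. The media server according to any one of claims 30 to 34, characterized in that:
    the receiving module is further configured to receive a video picture marking application from the call terminal, wherein the video picture marking application includes an identifier of the peer call terminal, and the identifier is used to apply for performing a marking operation on the video picture corresponding to the peer call terminal;
    the sending module is further configured to send a video picture marking request to the peer call terminal, wherein the video picture marking request is used to request a marking operation on the video picture;
    the receiving module is further configured to receive a response message to the video picture marking request from the peer call terminal, wherein the response message indicates that the peer call terminal agrees to the marking operation on the video picture.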
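  36. The media server according to any one of claims 30 to 35, characterized in that:
    the sending module is further configured to send a Session Initiation Protocol (SIP) message to the call terminal, wherein the SIP message includes a marking operation confirmation identifier, and the marking operation confirmation identifier is used to confirm whether the call terminal has the resources required to perform a marking operation on the first video picture;
    the receiving module is further configured to receive a response message to the SIP message from the call terminal, wherein the response message includes a marking operation response identifier, and the marking operation response identifier indicates that the call terminal has the resources required to perform a marking operation on the first video picture.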
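  37. A call terminal, characterized by comprising a memory and at least one processor connected to the memory, wherein the memory is configured to store computer program code, the computer program code includes computer instructions, and when the computer instructions are executed by the at least one processor, the call terminal is caused to perform the method according to any one of claims 1 to 11.
  38. A media server, characterized by comprising a memory and at least one processor connected to the memory, wherein the memory is configured to store computer program code, the computer program code includes computer instructions, and when the computer instructions are executed by the at least one processor, the media server is caused to perform the method according to any one of claims 12 to 19.
  39. A computer-readable storage medium, characterized by comprising computer instructions, wherein when the computer instructions are run on a call terminal, the call terminal is caused to perform the method according to any one of claims 1 to 11.
  40. A computer-readable storage medium, characterized by comprising computer instructions, wherein when the computer instructions are run on a server, the server is caused to perform the method according to any one of claims 12 to 19.
  41. A communication system, characterized by comprising a call terminal and a media server, wherein the call terminal performs the method according to any one of claims 1 to 11, and the media server performs the method according to any one of claims 12 to 19.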
PCT/CN2023/083485 2022-03-31 2023-03-23 Communication method, apparatus, and system WO2023185648A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210334450.9 2022-03-31
CN202210334450.9A CN116939139A (zh) 2022-03-31 2022-03-31 Communication method, apparatus, and system

Publications (1)

Publication Number Publication Date
WO2023185648A1 true WO2023185648A1 (zh) 2023-10-05

Family

ID=88199309

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/083485 WO2023185648A1 (zh) 2022-03-31 2023-03-23 Communication method, apparatus, and system

Country Status (2)

Country Link
CN (1) CN116939139A (zh)
WO (1) WO2023185648A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
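WO2017162012A1 (zh) * 2016-03-21 2017-09-28 中兴通讯股份有限公司 Multi-party conference system, and method and apparatus for implementing a multi-party conference
CN107835464A (zh) * 2017-09-28 2018-03-23 努比亚技术有限公司 Video call window picture processing method, terminal, and computer-readable storage medium
CN108206807A (zh) * 2016-12-16 2018-06-26 展讯通信(上海)有限公司 Method and apparatus for sharing information during a call, and mobile terminal
CN108259510A (zh) * 2018-02-27 2018-07-06 惠州Tcl移动通信有限公司 Method and system for real-time transmission control of media data, and storage medium
CN113891031A (zh) * 2021-11-11 2022-01-04 西安医疗指南者信息技术服务有限公司 Video call method and apparatus for medical procedures, and storage medium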

Also Published As

Publication number Publication date
CN116939139A (zh) 2023-10-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23778017

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023778017

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2023778017

Country of ref document: EP

Effective date: 20240327