CN113973230A - Call processing method, device and system - Google Patents

Call processing method, device and system

Info

Publication number
CN113973230A
Authority
CN
China
Prior art keywords
terminal
media
video media
rtp
video
Prior art date
Legal status
Pending
Application number
CN202010710150.7A
Other languages
Chinese (zh)
Inventor
吕梁
高海
丁镜程
李冠英
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010710150.7A
Publication of CN113973230A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone


Abstract

The embodiment of the application discloses a call processing method, a device and a system, wherein the method comprises the following steps: the media server receives a call request, negotiates video media resources with a first terminal, and sends a real-time transport protocol (RTP) stream to the first terminal based on the result of that negotiation. The RTP stream comprises a first RTP message of a first video media and a second RTP message of a second video media; the first RTP message comprises a synchronization source (SSRC) identifier of the RTP stream and an SSRC identifier of the first video media, and the second RTP message comprises the SSRC identifier of the RTP stream and an SSRC identifier of the second video media. The first video media and the second video media do not need to be spliced in advance, and the media server can instruct the first terminal to simultaneously play the first video media and the second video media on the screen through the RTP stream, so that the computing resources required by video splicing are saved.

Description

Call processing method, device and system
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method, an apparatus, and a system for call processing.
Background
With the deployment of 4th generation (4G) and 5th generation (5G) wireless communication systems, wireless networks can provide high-definition voice calls and video calls, as well as customized media services such as video color ring back tone and video color vibration tone. With the increase of network bandwidth, users have new requirements on the video experience during mobile communication. For example, when a video color ring or advertisement is played, multiple sub-screens can be displayed simultaneously to play different video contents, so as to increase the amount of information presented during playing and give users more choice.
In order to realize multi-screen color ring video playing, in the prior art, a plurality of sub-videos of the called user are combined into one video file when the video source is produced. After the calling user initiates the SIP call, the calling domain and the called domain perform media negotiation, and the negotiated playing resolution is the same as that of the spliced video file. After the media negotiation is completed, the MRS in the called domain packages the spliced video file frame by frame into an RTP stream and sends the RTP stream to the terminal of the calling user; the terminal of the calling user receives the video media RTP messages, parses the video stream, and displays the spliced video in its video playing window. In this way the calling user achieves multi-screen viewing.
However, the videos for multi-screen watching by the calling user need to be spliced in advance. Splicing involves transcoding, which is a computation-intensive operation; if the number of video materials is large, a large amount of CPU capacity is needed for parallel processing, and the cost in computing resources is high.
Disclosure of Invention
The embodiment of the application provides a call processing method, a call processing device and a call processing system, which can instruct a calling terminal or a called terminal to simultaneously play a first video media and a second video media on the screen without the network splicing the first video media and the second video media in advance, which is favorable for saving the computing resources required by video splicing and for the development of multi-screen video media playing services.
In a first aspect, an embodiment of the present application provides a call processing method, which includes the following steps. The media server receives a call request, wherein the call request is used for the calling terminal to initiate a call to the called terminal. And then, the media server performs video media resource negotiation with the calling terminal or the called terminal. For convenience of description, in the embodiments of the present application, a negotiation object of a media server is referred to as a first terminal. After the media server and the first terminal complete the negotiation of the video media resources, the media server sends a real-time transport protocol (RTP) stream to the first terminal based on the negotiation result of the negotiation of the video media resources, wherein the RTP stream comprises RTP messages of at least two video media. For convenience of description, in the embodiment of the present application, the RTP stream includes a first RTP packet of a first video media and a second RTP packet of a second video media as an example for illustration. The first RTP message comprises a synchronization source SSRC identification of the RTP stream and an SSRC identification of the first video media, and the second RTP message comprises an SSRC identification of the RTP stream and an SSRC identification of the second video media.
In the method, after the first terminal receives the RTP stream, the SSRC identifier of the RTP stream indicates that the first RTP packet and the second RTP packet are RTP packets in the RTP stream, the SSRC identifier of the first video media indicates that the first RTP packet is a packet of the first video media, and the SSRC identifier of the second video media indicates that the second RTP packet is a packet of the second video media. The first terminal can determine the first RTP message of the first video media from the RTP stream according to the SSRC identification of the first video media, and determine the second RTP message of the second video media from the RTP stream according to the SSRC identification of the second video media. The first terminal can play the first video media after analyzing and decoding the first RTP message; the first terminal can play the second video media after analyzing and decoding the second RTP message. Therefore, the network does not need to splice the first video media and the second video media in advance, and instructs the first terminal to simultaneously play the first video media and the second video media on the screen by carrying the RTP messages of the first video media and the second video media in the same RTP stream, which is beneficial to saving the computing resources required by video splicing and to the development of multi-screen video media playing services.
In the prior art, an RTP stream only includes an RTP packet of a video media, and the RTP packet in the RTP stream only includes an SSRC identifier of the RTP stream; in this embodiment, the RTP stream includes RTP packets of at least two video media, and the RTP packets in the RTP stream include an SSRC identifier of the RTP stream and an SSRC identifier of the video media. In order to facilitate distinguishing the RTP stream related to the embodiment of the present application from the existing RTP stream, the existing RTP stream is referred to as a single SSRC-identified RTP stream, and the RTP stream related to the embodiment of the present application is referred to as a multiple SSRC-identified RTP stream. Supporting sending or receiving multiple SSRC identified RTP streams, and/or supporting video media resource negotiation for multiple SSRC identified RTP streams, is referred to as supporting multiple SSRC identification mechanism.
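To make the multiple SSRC identification mechanism concrete, the following is a minimal Python sketch of how an RTP packet could carry both the SSRC identifier of the RTP stream (the standard header field) and the SSRC identifier of one video media. The placement of the per-media SSRC and sub-sequence number in a header extension, the extension profile value, and all function names are illustrative assumptions; the embodiments only require that both identifiers be present in the packet.

```python
import struct

def build_multi_ssrc_rtp_packet(stream_ssrc: int, media_ssrc: int,
                                seq: int, sub_seq: int, timestamp: int,
                                payload: bytes, payload_type: int = 96) -> bytes:
    """Pack a minimal RTP packet carrying the stream SSRC (standard header
    field) plus a per-media sub-SSRC and sub-sequence number in a one-entry
    header extension (layout assumed for illustration only)."""
    version, extension = 2, 1                       # X bit set: extension present
    first_byte = (version << 6) | (extension << 4)
    second_byte = payload_type & 0x7F
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, stream_ssrc)
    # Extension: 16-bit profile id (made up), length = 2 words, then the
    # SSRC identifier of the video media and its sub-sequence number.
    ext = struct.pack("!HHII", 0x1000, 2, media_ssrc, sub_seq & 0xFFFFFFFF)
    return header + ext + payload

# Two packets of the same RTP stream, belonging to two different video media:
pkt1 = build_multi_ssrc_rtp_packet(0x5555, 0x1111, seq=1, sub_seq=1,
                                   timestamp=0, payload=b"frame of video 1")
pkt2 = build_multi_ssrc_rtp_packet(0x5555, 0x2222, seq=2, sub_seq=1,
                                   timestamp=0, payload=b"frame of video 2")
```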
In one possible design, the negotiation of the video media resource between the media server and the first terminal specifically includes the following contents. The media server sends an update message to the first terminal, the update message indicating that the media server supports a multiple SSRC identification mechanism. Then, the media server receives a response message from the first terminal, optionally, the response message carries a negotiation result of the video media resource negotiation, which may also be called media capability information of the first terminal, and the response message indicates whether the first terminal supports the multiple SSRC identification mechanism, so that the media server determines whether the first terminal supports the multiple SSRC identification mechanism according to the response message, thereby facilitating the media server to send an RTP stream with multiple SSRC identifications to the first terminal based on the multiple SSRC identification mechanism supported by the first terminal, and avoiding a failure of the first terminal to play multiple video media.
In one possible design, the update message carries media capability information of the media server (e.g., SDP information of the media server). In one possible design, the update message includes first indication information indicating the SSRC identification of the first video media and the SSRC identification of the second video media. The first terminal may determine, according to the first indication information, information of the RTP stream to be received, such as the number of video media in the RTP stream and the SSRC identifier of each video media, so that the first terminal can determine, from the received RTP stream, an RTP packet of each video media, and thus the first terminal can correctly play the first video media and the second video media according to the RTP stream.
In one possible design, the first indication information corresponds to the same video m-line in the update message. By negotiating the first video media and the second video media in the same video m-line of the update message, compatibility with the existing scenario of negotiating a single video media is maintained, which makes the mechanism easy to popularize.
In one possible design, the first indication information further indicates a playing parameter of the first video medium and a playing parameter of the second video medium. After receiving the update message, the first terminal may determine the playing parameter of the first video media according to the SSRC identifier of the first video media. After receiving the RTP stream, the first terminal may determine a first RTP packet of the first video media according to the SSRC identifier of the first video media. And then, the first terminal can analyze and decode the first RTP message and play the first video media according to the playing parameters of the first video media. Similarly, after receiving the update message, the first terminal may determine the playing parameters of the second video media according to the SSRC identifier of the second video media. After receiving the RTP stream, the first terminal may determine a second RTP packet of the second video media according to the SSRC identifier of the second video media. And then, the first terminal can analyze and decode the second RTP message and play the second video media according to the playing parameters of the second video media.
The first indication information in the update message can transmit the playing parameters of each video media to the first terminal, so that the playing parameters of each video media at the first terminal can be controlled conveniently, the richness of playing of the multiple video media is improved, and the user experience is improved.
In one possible design, the playing parameter of the first video medium indicates a display position of the first video medium at the first terminal.
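As an illustration of how such playing parameters could be used at the first terminal, the following sketch keeps a mapping from the SSRC identifier of each video media to a display rectangle; the SSRC values and the (x, y, width, height) representation are assumptions made only for this example.

```python
from typing import Dict, Tuple

# Hypothetical result of the negotiation (first indication information):
# per-media SSRC identifier -> display rectangle (x, y, width, height).
play_params: Dict[int, Tuple[int, int, int, int]] = {
    0x1111: (0, 0, 640, 360),     # first video media: upper sub-screen
    0x2222: (0, 360, 640, 360),   # second video media: lower sub-screen
}

def sub_window_for(media_ssrc: int) -> Tuple[int, int, int, int]:
    """Return the sub-window in which the terminal plays the given video media."""
    return play_params[media_ssrc]
```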
In a possible design, the first RTP packet further includes a first order identifier, and the first order identifier of the first RTP packet indicates an order of the first RTP packet in the first video media, so that the first terminal can determine whether the first video media has lost packets. Similarly, the second RTP packet further includes a first order identifier, where the first order identifier of the second RTP packet indicates an order of the second RTP packet in the second video media, so that the first terminal can determine whether the second video media has lost packets.
In one possible design, after the media server sends the RTP stream to the first terminal, the media server receives a handover request from the first terminal, the handover request requesting handover of the first video media. And then, the media server stops sending the first RTP message of the first video media to the first terminal, and sends a third RTP message of a third video media to the first terminal, wherein the third RTP message comprises the SSRC identification of the RTP stream and the SSRC identification of the third video media. The SSRC identification of the third video media is the same as the SSRC identification of the first video media, so that the first terminal can continuously determine the third RTP message of the third video media from the RTP stream according to the SSRC identification of the first video media, analyze and decode the third RTP message, and continuously play the third video media according to the playing parameters and the like of the first video media, thereby realizing the switching of the first video media into the third video media on the first terminal, increasing the selectivity of users to the video media and improving the user experience.
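A minimal server-side sketch of this switching behaviour is shown below: when the first video media is switched, the packets of the third video media reuse the sub-SSRC of the first video media, so the terminal keeps decoding into the same sub-window. The class and method names, and the modelling of media content as iterators of encoded frames, are assumptions for illustration only.

```python
from typing import Dict, Iterable, Iterator

class MultiSsrcSender:
    """Sends one RTP stream that multiplexes several video media by sub-SSRC."""

    def __init__(self, stream_ssrc: int):
        self.stream_ssrc = stream_ssrc
        self.active: Dict[int, Iterator[bytes]] = {}   # media_ssrc -> encoded frames

    def play(self, media_ssrc: int, frames: Iterable[bytes]) -> None:
        """Start sending a video media under the given sub-SSRC."""
        self.active[media_ssrc] = iter(frames)

    def switch(self, media_ssrc: int, new_frames: Iterable[bytes]) -> None:
        """Replace the content behind an existing sub-SSRC: the old media stops,
        the new (third) media continues under the same sub-SSRC identifier."""
        self.active[media_ssrc] = iter(new_frames)
```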
In a second aspect, the present application provides a call processing method, where the method is applied to a calling terminal or a called terminal, and for simplifying the description, a terminal to which the method is applied is referred to as a first terminal. The method comprises the following steps: if the first terminal is a calling terminal, the first terminal sends a call request to the media server. If the first terminal is a called terminal, the first terminal receives a call request from the media server. The call request is used for the calling terminal to initiate a call to the called terminal. The first terminal and the media server carry out video media resource negotiation. After the first terminal and the media server finish the negotiation of the video media resources, the first terminal receives an RTP stream from the media server based on the negotiation result of the negotiation of the video media resources, wherein the RTP stream comprises RTP messages of at least two video media. For convenience of description, in the embodiment of the present application, the RTP stream includes a first RTP packet of a first video media and a second RTP packet of a second video media as an example for illustration. The first RTP message comprises a synchronization source SSRC identification of the RTP stream and an SSRC identification of the first video media, and the second RTP message comprises an SSRC identification of the RTP stream and an SSRC identification of the second video media.
And after receiving the RTP stream, the first terminal respectively plays a first video media and a second video media according to the RTP stream. For example, the first terminal determines, according to the SSRC identifier of the RTP stream, that the first RTP packet and the second RTP packet are RTP packets in the RTP stream. And the first terminal determines a first RTP message of the first video media from the RTP stream according to the SSRC identifier of the first video media, and can play the first video media after analyzing and decoding the first RTP message. And the first terminal determines a second RTP message of the second video media from the RTP stream according to the SSRC identifier of the second video media, and can play the second video media after analyzing and decoding the second RTP message.
In the method, the network does not need to splice the first video media and the second video media in advance, and the first terminal can simultaneously play the first video media and the second video media on the screen by carrying the RTP messages of the first video media and the second video media in the same RTP stream, so that the method is favorable for saving the computing resources required by video splicing and for developing the multi-screen video media playing service.
In the prior art, an RTP stream only includes an RTP packet of a video media, and the RTP packet in the RTP stream only includes an SSRC identifier of the RTP stream; in this embodiment, the RTP stream includes RTP packets of at least two video media, and the RTP packets in the RTP stream include an SSRC identifier of the RTP stream and an SSRC identifier of the video media. In order to facilitate distinguishing the RTP stream related to the embodiment of the present application from the existing RTP stream, the existing RTP stream is referred to as a single SSRC-identified RTP stream, and the RTP stream related to the embodiment of the present application is referred to as a multiple SSRC-identified RTP stream. Supporting sending or receiving multiple SSRC identified RTP streams, and/or supporting video media resource negotiation for multiple SSRC identified RTP streams, is referred to as supporting multiple SSRC identification mechanism.
In one possible design, the negotiation of the video media resource between the first terminal and the media server specifically includes the following contents. The first terminal receives an update message from the media server indicating that the media server supports a multiple SSRC identification mechanism. Then, the first terminal sends a response message to the media server, optionally, the response message carries a negotiation result of the video media resource negotiation, which may also be called media capability information of the first terminal, and also indicates that the first terminal supports a multiple SSRC identification mechanism, so that the media server determines whether the first terminal supports the multiple SSRC identification mechanism according to the response message, thereby facilitating the media server to send an RTP stream with multiple SSRC identifications to the first terminal based on the multiple SSRC identification mechanism supported by the first terminal, and avoiding a failure of the first terminal in playing multiple video media.
In one possible design, the update message carries media capability information of the media server, such as Session Description Protocol (SDP) information of the media server. In one possible design, the update message includes first indication information indicating the SSRC identification of the first video media and the SSRC identification of the second video media. On the premise that the first terminal supports a multi-SSRC identification mechanism, the first terminal is favorable for determining information of RTP streams to be received in advance according to the first indication information, such as the number of video media in the RTP streams and the SSRC identification of each video media, so that the first terminal can determine RTP messages of each video media from the received RTP streams conveniently, and the first terminal can play the first video media and the second video media correctly according to the RTP streams conveniently.
In one possible design, the first indication information corresponds to the same video m-line in the update message. By negotiating the first video media and the second video media in the same video m-line of the update message, compatibility with the existing scenario of negotiating a single video media is maintained, which makes the mechanism easy to popularize.
In one possible design, the first indication information further indicates a playing parameter of the first video medium and a playing parameter of the second video medium. The playing parameter of the first video media indicates the display position of the first video media at the first terminal. After receiving the update message, the first terminal may determine the playing parameter of the first video media according to the SSRC identifier of the first video media. After receiving the RTP stream, the first terminal may determine a first RTP packet of the first video media according to the SSRC identifier of the first video media. And then, the first terminal can analyze and decode the first RTP message and play the first video media according to the playing parameters of the first video media. Similarly, after receiving the update message, the first terminal may determine the playing parameters of the second video media according to the SSRC identifier of the second video media. After receiving the RTP stream, the first terminal may determine a second RTP packet of the second video media according to the SSRC identifier of the second video media. And then, the first terminal can analyze and decode the second RTP message and play the second video media according to the playing parameters of the second video media.
The first indication information in the update message can transmit the playing parameters of each video media to the first terminal, so that the playing parameters of each video media at the first terminal can be controlled conveniently, the richness of playing of the multiple video media is improved, and the user experience is improved.
In one possible design, the playing parameter of the first video medium is used to indicate a display position of the first video medium on the first terminal.
In a possible design, the first RTP packet further includes a first sequence identifier, where the first sequence identifier of the first RTP packet is used to indicate a sequence of the first RTP packet in the first video media, so that the first terminal can determine whether the first video media has lost a packet. Similarly, the second RTP packet further includes a first sequence identifier, where the first sequence identifier of the second RTP packet is used to indicate an order of the second RTP packet in the second video media, so that the first terminal can determine whether the second video media loses packets.
In one possible design, after the first terminal plays the first video media and the second video media respectively according to the RTP stream, the first terminal sends a switching request to the media server, where the switching request is used to request to switch the first video media. And then, the first terminal receives a third RTP message of a third video media from the media server, wherein the third RTP message comprises an SSRC (steady state streaming) identifier of the RTP stream and an SSRC identifier of the third video media, and then the third video media is played according to the third RTP message. The SSRC identification of the third video media is the same as the SSRC identification of the first video media, so that the first terminal can continuously determine the third RTP message of the third video media from the RTP stream according to the SSRC identification of the first video media, analyze and decode the third RTP message, and continuously play the third video media according to the playing parameters and the like of the first video media, thereby realizing the switching of the first video media into the third video media on the first terminal, increasing the selectivity of users to the video media and improving the user experience.
In a third aspect, an embodiment of the present application provides a call processing method, including: a Media Resource Server (MRS) receives a first play request from an Application Server (AS), and then sends a real-time transport protocol RTP stream to a first terminal according to the first play request, where the RTP stream includes a first RTP packet of a first video media and a second RTP packet of a second video media, the first RTP packet includes a synchronization source SSRC identifier of the RTP stream and an SSRC identifier of the first video media, and the second RTP packet includes the SSRC identifier of the RTP stream and the SSRC identifier of the second video media.
In the method, the network does not need to splice the first video media and the second video media in advance, and the first terminal is instructed to play the first video media and the second video media on the screen by carrying the RTP messages of the first video media and the second video media in the same RTP stream, so that the method is favorable for saving the computing resources required by video splicing and for developing the multi-screen video media playing service.
In one possible design, the MRS performs a video media resource negotiation with the first terminal through the AS before the MRS receives the first play request from the AS. The MRS sending the RTP stream to the first terminal according to the first play request comprises: MRS sends RTP stream to the first terminal according to the negotiation result of video media resource negotiation and the first play request.
In one possible design, the negotiation of the video media resource between the MRS and the first terminal through the AS specifically includes the following. The MRS sends an update message to the first terminal through the AS, the update message indicating that the MRS and/or the AS support a multiple SSRC identification mechanism. Thereafter, the MRS receives a response message from the first terminal through the AS, the response message indicating that the first terminal supports the multiple SSRC identification mechanism. Therefore, the MRS can conveniently send the RTP stream with the multiple SSRC identifications to the first terminal based on the first terminal supporting the multiple SSRC identification mechanism, and the failure of playing the video media caused by the fact that the MRS also sends the RTP stream with the multiple SSRC identifications to the first terminal when the first terminal does not support the multiple SSRC identification mechanism is avoided. Optionally, the update message carries media capability information of the MRS; the response message carries a negotiation result of the video media resource negotiation, which may also be called media capability information of the first terminal.
In one possible design, before the MRS performs the video media resource negotiation with the first terminal through the AS, the MRS receives a request message from the AS, where the request message is sent by the AS in response to a received call request, where the call request is used by the calling terminal to initiate a call to the called terminal. The request message indicates that the calling user or the called user has subscribed to the multi-video media playing service. As an optional way, the request message specifically indicates that the MRS performs a video media resource negotiation with the first terminal by using a multiple SSRC mechanism.
In one possible design, the update message carries media capability information for the MRS (e.g., SDP information for the MRS). In one possible design, the update message includes first indication information indicating the SSRC identification of the first video media and the SSRC identification of the second video media. Therefore, the first terminal determines the information of the RTP stream to be received, such as the number of the video media in the RTP stream and the SSRC identifier of each video media, according to the first indication information, which is convenient for the first terminal to determine the RTP message of each video media from the received RTP stream, and thus the first terminal can correctly play the first video media and the second video media according to the RTP stream.
In one possible design, the first indication information corresponds to the same video m-line in the update message. By negotiating the first video media and the second video media in the same video m-line of the update message, compatibility with the existing scenario of negotiating a single video media is maintained, which makes the mechanism easy to popularize.
In one possible design, the first indication information further indicates a playing parameter of the first video medium and a playing parameter of the second video medium. After receiving the update message, the first terminal may determine the playing parameter of the first video media according to the SSRC identifier of the first video media. After receiving the RTP stream, the first terminal may determine a first RTP packet of the first video media according to the SSRC identifier of the first video media. And then, the first terminal can analyze and decode the first RTP message and play the first video media according to the playing parameters of the first video media. Similarly, after receiving the update message, the first terminal may determine the playing parameters of the second video media according to the SSRC identifier of the second video media. After receiving the RTP stream, the first terminal may determine a second RTP packet of the second video media according to the SSRC identifier of the second video media. And then, the first terminal can analyze and decode the second RTP message and play the second video media according to the playing parameters of the second video media.
The first indication information in the update message can transmit the playing parameters of each video media to the first terminal, so that the playing parameters of each video media at the first terminal can be controlled conveniently, the richness of playing of the multiple video media is improved, and the user experience is improved.
In one possible design, the playing parameter of the first video medium is used to indicate a display position of the first video medium on the first terminal.
In a possible design, the first RTP packet further includes a first sequence identifier, where the first sequence identifier of the first RTP packet is used to indicate a sequence of the first RTP packet in the first video media, so that the first terminal can determine whether the first video media has lost a packet. Similarly, the second RTP packet further includes a first sequence identifier, where the first sequence identifier of the second RTP packet is used to indicate an order of the second RTP packet in the second video media, so that the first terminal can determine whether the second video media loses packets.
In one possible design, after the MRS sends the RTP stream to the first terminal, the MRS receives a stop play request and a second play request from the AS, the stop play request indicating that the first video media is stopped to be played, and the second play request indicating that the third video media is played. The stop playing request and the second playing request may be the same message or may be two different messages. And then, the MRS stops sending the first RTP packet of the first video media to the first terminal according to the play stop request, and sends the third RTP packet of the third video media to the first terminal according to the second play request, where the third RTP packet includes the SSRC identifier of the RTP stream and the SSRC identifier of the third video media. The SSRC identification of the third video media is the same as the SSRC identification of the first video media, so that the first terminal can continuously determine the third RTP message of the third video media from the RTP stream according to the SSRC identification of the first video media, analyze and decode the third RTP message, and continuously play the third video media according to the playing parameters and the like of the first video media, thereby realizing the switching of the first video media into the third video media on the first terminal, increasing the selectivity of users to the video media and improving the user experience.
In a fourth aspect, an embodiment of the present application provides a media server, which has a function of implementing any one of the methods in the first aspect. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more units corresponding to the above functions, such as a storage unit, a transmission unit, a reception unit, or a processing unit.
In one possible design, the media server is configured to include at least one processor and a memory, the memory storing program code, the processor calling the program code to perform some or all of the steps of any of the methods of the first aspect. The media server may also include a communication interface for communicating with other devices.
In a fifth aspect, an embodiment of the present application provides a terminal having a function of implementing any one of the methods in the second aspect. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more units corresponding to the above functions, such as a storage unit, a transmission unit, a reception unit, or a processing unit.
In one possible design, the terminal is structured to include at least one processor and a memory, the memory storing program code, the processor calling the program code to perform part or all of the steps of any of the methods of the second aspect. The terminal may further comprise a communication interface for communicating with other devices.
In a sixth aspect, an embodiment of the present application provides a Media Resource Server (MRS), which has a function of implementing any one of the methods in the third aspect. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more units corresponding to the above functions, such as a storage unit, a transmission unit, a reception unit, or a processing unit.
In one possible design, the MRS is structured to include at least one processor and a memory, the memory storing program code, the processor invoking the program code to perform some or all of the steps of any of the methods of the third aspect. The MRS may further comprise a communication interface for communicating with other devices.
In a seventh aspect, an embodiment of the present application provides a call processing system, including the media server in the fourth aspect and the terminal in the fifth aspect, which is not described herein again.
In an eighth aspect, an embodiment of the present application provides a call processing system, including the terminal in the fifth aspect and the Media Resource Server (MRS) in the sixth aspect, which are not described herein again. The system also includes an application server AS.
In a ninth aspect, embodiments of the present application provide a computer storage medium storing program code, wherein the program code includes instructions for performing some or all of the steps of any of the methods of the first, second or third aspects.
In a tenth aspect, embodiments of the present application provide a computer program product which, when run on a computer, causes the computer to perform some or all of the steps of any of the methods of the first, second or third aspects.
Drawings
Fig. 1 is a system architecture diagram applied in a VoLTE network according to an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram illustrating one possible embodiment of a call processing method of the present application;
FIG. 3A is a schematic flow chart diagram illustrating another possible embodiment of a call processing method of the present application;
FIG. 3B is a schematic diagram of an interface for the calling terminal to play RTP stream;
fig. 3C is an interface diagram of the RTP stream after the calling terminal plays the switch;
FIG. 4 is a schematic flow chart diagram illustrating another possible embodiment of a call processing method of the present application;
FIG. 5 is a schematic flow chart diagram illustrating another possible embodiment of a call processing method of the present application;
FIG. 6 is a schematic diagram of a media server according to one possible embodiment of the present application;
FIG. 7 is a schematic diagram of one possible embodiment of a terminal of the present application;
fig. 8 is a schematic diagram of one possible embodiment of an MRS of the present application;
FIG. 9 is a schematic diagram of one possible embodiment of a computer device of the present application.
Detailed Description
In order to make the purpose, technical solution and advantages of the present application more clear, embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The embodiment of the application can be applied to 4th generation (4G) or 5th generation (5G) mobile communication network architectures or future networks. For convenience of description, the system architecture and the method flow of the scheme are described below by taking a 4G network architecture as an example.
As shown in fig. 1, a system architecture diagram for applying the embodiment of the present application in a VoLTE network includes: the system comprises a calling terminal, a called terminal, a wireless access network, an Internet Protocol (IP) multimedia subsystem (IMS) domain network of a calling side and a called side.
The IMS domains of the calling side and the called side may include an IMS domain core network and an Evolved Packet Core (EPC). The IMS domain core network comprises: a serving-call session control function (S-CSCF), an interrogating-call session control function (I-CSCF), a proxy-call session control function (P-CSCF), a Home Subscriber Server (HSS), a Session Border Controller (SBC), and the like. The I-CSCF may be collocated with the S-CSCF and may be referred to as "I/S-CSCF" for short. The SBC and P-CSCF may be collocated, and may be referred to as "SBC/P-CSCF" for short. The EPC may include a packet data network gateway (PGW), a Serving Gateway (SGW), and a Mobility Management Entity (MME). The PGW and the SGW may be combined together, and may be referred to as "S/P-GW" for short.
The above network elements are all corresponding network elements in the wireless communication network in the prior art, and are not described in detail here, but only briefly described. For example: the S-CSCF may be used for registration, authentication control, session routing and service trigger control of the user and to maintain session state information. The I-CSCF may be used for assignment and querying of S-CSCFs for user registrations. The P-CSCF may be used as a proxy for signaling and messages. The HSS may be used to store subscriber subscription information and location information. SBCs may provide secure access and media handling. The MME is the core device of the EPC network. The SGW may be used for connection of the IMS core network to the wireless network, and the PGW may be used for connection of the IMS core network to the IP network.
The IMS domain core network of the calling side and/or the called side further includes a media server. The media server provides media playing of videos for the calling and/or called users. The media server may include an Application Server (AS) and a Media Resource Server (MRS). The AS and the MRS can be combined or deployed in different physical devices. The AS processes the signaling messages, and the MRS provides the video streams for the calling party and/or the called party.
The media server may be a color ring server, which provides media playing of videos for the calling user. The color ring server may include a Customized Alerting Tone (CAT) AS and an MRS. The media server may also be a color vibration server, which provides media playing of videos for the called user. The color vibration server may include a Customized Ringing Signal (CRS) AS and an MRS. In addition, the media server may provide both the media playing of videos for the calling user and the media playing of videos for the called user, for example when the color ring server and the color vibration server are combined into one media server.
The calling terminal and the called terminal are devices with wireless transceiving functions. They can be deployed on land, indoors or outdoors, handheld or vehicle-mounted; they can also be deployed on the water surface (such as on a ship) or in the air (such as on an airplane, a balloon or a satellite). Specifically, the terminal device may be a terminal device capable of accessing a mobile network, a mobile phone, a tablet computer (pad), a computer with a wireless transceiving function, a virtual reality (VR) terminal, an augmented reality (AR) terminal, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in remote medical, a wireless terminal in smart grid, a wireless terminal in transportation safety, a wireless terminal in smart city, a wireless terminal in smart home, and the like.
It should be noted that the above description does not limit the system architecture of the embodiment of the present application, which includes but is not limited to that shown in fig. 1.
As an alternative, the media server may also not be located in the IMS domain core network of the calling or called party, but independent from the calling and called IMS domain core networks.
As an optional way, the embodiment of the present application may also be a scenario of a user of a VoLTE network and a user of another network (such as an IMS network, a fixed network, a switching network, and the like). For example, in the embodiment of the present application, the calling or called user is a VoLTE user, and the opposite user is a user of another network.
The following describes the embodiments of the present application with reference to specific examples. For simplicity and understanding, some network elements through which the signaling interaction passes in the figure are not shown, such as S/P-GW, SBC/P-CSCF, I/S-CSCF, etc.
Fig. 2 is a schematic flow chart of one possible embodiment of the call processing method of the present application. The method may be applied to the system shown in fig. 1, and may also be applied to other communication scenarios, and the embodiment of the present application is not limited herein. The media server in the embodiment of the application can be in a calling domain or a called domain; the first terminal may refer to a calling terminal or a called terminal. The following describes specific steps of the embodiment of the call processing method corresponding to fig. 2.
201. The media server receives the call request from the first terminal or forwards the call request to the first terminal.
The calling terminal initiates a call to the called terminal, which may be a video call or an audio call. The call request for initiating the call is, for example, an INVITE message.
The call request includes Session Description Protocol (SDP) information of the calling terminal, and the SDP information of the calling terminal includes, but is not limited to, a call type, a media type supported by the calling party, and a code. The call request is used for the calling terminal and the called terminal to carry out conversation media negotiation.
After the calling terminal sends the call request, the media server may receive the call request from the calling terminal and forward the call request to the called terminal. For simplicity of description, in the embodiment corresponding to fig. 2, the first terminal may refer to a calling terminal or a called terminal. When the first terminal is a calling terminal, step 201 specifically means that the media server receives a call request sent by the first terminal; when the first terminal is a called terminal, step 201 specifically refers to the media server forwarding the received call request to the first terminal.
202. And the media server performs video media resource negotiation with the first terminal.
The media server and the first terminal carry out video media resource negotiation, the negotiation result of the video media resource negotiation includes but is not limited to negotiation IP address, media format, port number, media transmission type and/or audio and video media coding mode, and the negotiation result indicates the method for the media server to send RTP stream.
Possible embodiments of step 202 are illustrated later by step 2021 and step 2022.
203. The media server sends the RTP stream to the first terminal.
Based on the negotiation result of the video media resource negotiation, the media server sends a path of RTP stream to the first terminal. The source IP address, source port, destination IP address, destination port and transport layer protocol of any two RTP messages in one path of RTP flow are all the same. It should be understood that the IP address, port, etc. information of the RTP stream is determined based on the negotiation result of the video media resource negotiation in 202.
The RTP stream includes RTP packets for at least two video media. Illustratively, the video media refers to video color ring back tone or video advertisement, etc., and the plurality of video media may include video color ring back tone and/or video advertisement, etc. For convenience of description, in the embodiment of the present application, the RTP stream includes a first RTP packet of a first video media and a second RTP packet of a second video media as an example for illustration. The first RTP packet includes an SSRC identification of the RTP stream and an SSRC identification of the first video media, and the second RTP packet includes an SSRC identification of the RTP stream and an SSRC identification of the second video media.
The existing RTP packet includes an SSRC identifier, that is, an SSRC identifier of the RTP stream, so that the SSRC identifier of the RTP stream in the RTP packet related to the embodiment of the present application may be referred to as an SSRC identifier of the RTP packet or an original SSRC identifier, and the SSRC identifier of the newly added video media may be referred to as a sub-SSRC identifier of the RTP packet or an newly added SSRC identifier. For example, the SSRC identification of the first video media is referred to as the sub SSRC identification of the first RTP packet, and the SSRC identification of the second video media is referred to as the sub SSRC identification of the second RTP packet. As an example, assume that the RTP stream includes the following packets: RTP1-a, RTP2-b, RTP2-c and RTP1-d, wherein the SSRC identifications of RTP1-a, RTP2-b, RTP2-c and RTP1-d are the same and are the SSRC identifications of the RTP stream; the sub SSRC identifications of the RTP1-a and the RTP1-d are the same and are both SSRC identifications of the first video media; the sub-SSRC identifications of RTP2-b and RTP2-c are the same and are both SSRC identifications of the second video media.
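The grouping of packets by sub-SSRC in this example can be sketched as follows. Packets are modelled here as (stream SSRC, media SSRC, payload) tuples and the SSRC values are made up; the actual extraction of the two identifiers from the RTP header and its extension is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def demux_by_media(packets: Iterable[Tuple[int, int, bytes]],
                   stream_ssrc: int) -> Dict[int, List[bytes]]:
    """Group the payloads of one multi-SSRC RTP stream by video media."""
    per_media: Dict[int, List[bytes]] = defaultdict(list)
    for pkt_stream_ssrc, media_ssrc, payload in packets:
        if pkt_stream_ssrc != stream_ssrc:        # not part of this RTP stream
            continue
        per_media[media_ssrc].append(payload)
    return per_media

STREAM, MEDIA1, MEDIA2 = 0x5555, 0x1111, 0x2222
rtp_stream = [(STREAM, MEDIA1, b"RTP1-a"), (STREAM, MEDIA2, b"RTP2-b"),
              (STREAM, MEDIA2, b"RTP2-c"), (STREAM, MEDIA1, b"RTP1-d")]
grouped = demux_by_media(rtp_stream, STREAM)
assert grouped[MEDIA1] == [b"RTP1-a", b"RTP1-d"]
assert grouped[MEDIA2] == [b"RTP2-b", b"RTP2-c"]
```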
204. And the first terminal respectively plays the first video media and the second video media according to the RTP stream.
After receiving RTP streams of a plurality of video media, the first terminal plays a first video media and a second video media respectively according to the RTP streams.
As an optional manner, the first terminal determines, according to the SSRC identifier of the RTP stream, that the first RTP packet and the second RTP packet are RTP packets in the RTP stream. The SSRC identification of the first video media is different from the SSRC identification of the second video media; the first terminal determines that the first RTP message corresponds to the first video media according to the SSRC identification of the first video media in the first RTP message, and determines that the second RTP message corresponds to the second video media according to the SSRC identification of the second video media in the second RTP message. Specifically, the first terminal determines a first RTP packet of the first video media from the RTP stream according to the SSRC identifier of the first video media, and can play the first video media after analyzing and decoding the first RTP packet. The first terminal determines a second RTP message of the second video media from the RTP stream according to the SSRC identifier of the second video media, and can play the second video media after analyzing and decoding the second RTP message. Thus, the user of the first terminal can view the first video media and the second video media simultaneously through the first terminal.
In a possible implementation manner, the first RTP packet further includes a first order identifier, and the first order identifier of the first RTP packet indicates an order of the first RTP packet in the first video media. After the first terminal identifies the multiple RTP packets of the first video media according to the SSRC identifier of the first video media, it may determine whether the first video media loses packets according to the first sequence identifier of each RTP packet in the multiple RTP packets, and may sequence each RTP packet in the multiple RTP packets. Similarly, the second RTP packet further includes a first order identifier, where the first order identifier of the second RTP packet indicates an order of the second RTP packet in the second video media. After the first terminal identifies the multiple RTP packets of the second video media according to the SSRC identifier of the second video media, the first terminal may determine whether the second video media loses packets according to the first sequence identifier of each RTP packet in the multiple RTP packets, and sequence each RTP packet in the multiple RTP packets.
In a possible implementation manner, the first RTP packet further includes a second order identifier, and the second order identifier of the first RTP packet indicates an order of the first RTP packet in the RTP stream. Similarly, the second RTP packet further includes a second order identifier, and the second order identifier of the second RTP packet indicates an order of the second RTP packet in the RTP stream. And the first terminal judges whether the RTP stream loses packets or not according to the second sequence identification of each RTP message in the RTP stream.
The existing RTP packet generally includes an order identifier, that is, the second order identifier, and the first order identifier is an order identifier newly added to the RTP packet in the embodiment of the present application, so the second order identifier in the RTP packet may also be referred to as the order identifier or original order identifier of the RTP packet, and the first order identifier in the RTP packet may be referred to as the sub-order identifier or added order identifier of the RTP packet. For example, based on the foregoing example, it is assumed that the RTP packets in the RTP stream, in sending order, are: RTP1-a, RTP2-b, RTP2-c and RTP1-d, where the second order identifiers (or order identifiers) of RTP1-a, RTP2-b, RTP2-c and RTP1-d are 1, 2, 3 and 4 in sequence, and the first order identifiers (or sub-order identifiers) of RTP1-a, RTP2-b, RTP2-c and RTP1-d are 1, 1, 2 and 2 in sequence.
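Packet-loss detection per video media from the first order (sub-order) identifiers can be sketched as follows; this simplified version ignores sequence-number wrap-around and reordering, which a real terminal would have to handle.

```python
from typing import List

def missing_sub_sequences(received: List[int]) -> List[int]:
    """Return the sub-order identifiers missing between the smallest and the
    largest identifier received for one video media."""
    seen = sorted(set(received))
    missing: List[int] = []
    for prev, cur in zip(seen, seen[1:]):
        missing.extend(range(prev + 1, cur))
    return missing

# First video media in the example above: sub-order identifiers 1 and 2 arrive,
# so nothing is lost; if only 1 and 3 had arrived, packet 2 would be reported lost.
assert missing_sub_sequences([1, 2]) == []
assert missing_sub_sequences([1, 3]) == [2]
```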
In the embodiment of the application, the network does not need to splice the first video media and the second video media in advance, and the first terminal can simultaneously play the first video media and the second video media on the screen according to the RTP stream, so that the method and the device are beneficial to saving the computing resources required by video splicing, and further are beneficial to the development of multi-screen video media playing services.
Furthermore, in the embodiment of the present application, one path of RTP stream carries the first RTP packet of the first video media and the second RTP packet of the second video media, and the receiving ports of the same path of RTP stream are the same, so that the embodiment of the present application transmits at least two video media through one path of RTP stream, which is beneficial to reducing the processing complexity of video transmission and saving port resources.
The existing single RTP stream corresponds to a single video media, the RTP message in the single RTP stream only comprises one SSRC identifier, and the SSRC identifiers of the RTP messages are the same. In contrast, the single RTP stream referred to in the embodiment of the present application corresponds to multiple video media, and the RTP packet in the single RTP stream includes two SSRC identifiers. Wherein, the SSRC identification of the RTP stream can be understood by referring to the SSRC identification in the existing RTP stream; the SSRC identification of the video media is used for indicating the video media corresponding to the RTP message. And since a single RTP stream corresponds to multiple video media (e.g., n video media), the single RTP stream includes SSRC identification of the multiple video media (e.g., SSRC identification of the n video media). For the sake of convenience of distinction, the existing RTP stream is referred to as a single SSRC-identified RTP stream, and the RTP stream related to the embodiment of the present application is referred to as a multiple SSRC-identified RTP stream. Supporting sending or receiving multiple SSRC identified RTP streams, and/or supporting video media resource negotiation for multiple SSRC identified RTP streams, is referred to as supporting multiple SSRC identification mechanism.
In step 202, the media server negotiates with the first terminal about the video media resource, and as an optional manner, the media server may negotiate with the first terminal about whether the multiple SSRC identification mechanism is supported in the process. Illustratively, with continued reference to fig. 2, step 202 specifically includes step 2021 and step 2022.
2021. The media server sends an update message to the first terminal.
After receiving the call request, the media server sends an update message to the first terminal, where the update message carries media capability information of the media server (e.g., SDP information of the media server). The update message also indicates that the media server supports a multiple SSRC identification mechanism.
2022. The first terminal sends a response message to the media server.
After receiving the update message, the first terminal determines a negotiation result of the video media resource negotiation according to the media capability information of the media server carried in the update message and the capability of the first terminal itself. It should be understood that the negotiation result of the video media resource negotiation may also be called the media capability information of the first terminal. The first terminal then sends a response message to the media server, and the response message carries the negotiation result of the video media resource negotiation.
Optionally, the response message indicates that the first terminal supports the multiple SSRC identification mechanism, so that the media server determines that the first terminal supports the multiple SSRC identification mechanism according to the response message, thereby facilitating the media server to further send the RTP stream with multiple SSRC identifications to the first terminal.
Illustratively, the update message includes a field "multi-pack-mode=1", which indicates that the media server supports the multiple SSRC identification mechanism, so that the first terminal determines, according to this field, that the media server supports the multiple SSRC identification mechanism. If the first terminal supports the multiple SSRC identification mechanism, the response message includes a field "multi-pack-mode=1", which indicates that the first terminal supports the multiple SSRC identification mechanism. After receiving the response message, the media server determines, according to this field in the response message, that the first terminal supports the multiple SSRC identification mechanism, and then performs step 203 to send the RTP stream with multiple SSRC identifiers to the first terminal.
If the first terminal does not support the multiple SSRC identification mechanism, the response message indicates that the first terminal does not support the multiple SSRC identification mechanism; illustratively, the response message includes "multi-pack-mode=0", or the response message does not include any "multi-pack-mode" related field. In order to avoid playing errors or failures after the first terminal receives the RTP stream of multiple video media, the media server does not execute step 203 when the response message carries "multi-pack-mode=0". As an alternative, the media server sends, to the first terminal according to the response message, an RTP stream encapsulated according to the single SSRC identification mechanism, for example an RTP stream carrying only the first video media.
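As a minimal sketch of this decision, assuming the negotiated capability is carried as a "multi-pack-mode" line in the SDP of the response message (the exact attribute syntax is an assumption for illustration), the media-server-side logic could look roughly as follows:

def terminal_supports_multi_ssrc(sdp_answer: str) -> bool:
    # Look for the "multi-pack-mode" field in the SDP carried by the response message;
    # a missing field is treated the same as "multi-pack-mode=0".
    for line in sdp_answer.splitlines():
        if "multi-pack-mode" in line:
            return line.strip().endswith("=1")
    return False

def choose_stream_mode(sdp_answer: str) -> str:
    # Step 203 is only executed when the first terminal supports the mechanism;
    # otherwise fall back to a single-SSRC RTP stream (e.g. only the first video media).
    return "multi-ssrc" if terminal_supports_multi_ssrc(sdp_answer) else "single-ssrc"

print(choose_stream_mode("a=multi-pack-mode=1"))   # multi-ssrc
print(choose_stream_mode("a=multi-pack-mode=0"))   # single-ssrc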
As can be seen, in the method according to the embodiment of the present application, by executing step 2021 and step 2022, the media server determines whether the first terminal supports the multiple SSRC identification mechanism according to the response message, so that the media server is convenient to send the RTP stream with the multiple SSRC identifications to the first terminal based on the multiple SSRC identification mechanism supported by the first terminal, thereby avoiding the failure of playing the multiple video media by the first terminal, which is beneficial to improving the success rate of playing the multiple video media and improving the user experience.
Further, as an optional manner, the SDP information carried in the update message involved in step 2021 includes first indication information, where the first indication information indicates the SSRC identifier of the first video media and the SSRC identifier of the second video media. On the premise that the first terminal supports the multi-SSRC identification mechanism, the first terminal may determine, according to the first indication information, information of the RTP streams to be received, such as the number of video media in the RTP streams and the SSRC identification of each video media, so that the first terminal can determine, in step 204, the RTP packet of each video media from the received RTP streams, and thus the first terminal can correctly play the first video media and the second video media according to the RTP streams.
Optionally, the first indication information further indicates a playing parameter of the first video media and a playing parameter of the second video media. In one possible design, the playing parameter of the first video media indicates the display position of the first video media on the first terminal. After receiving the update message, the first terminal may determine the playing parameter corresponding to the first video media according to the SSRC identifier of the first video media. After receiving the RTP stream, the first terminal may determine the first RTP packets of the first video media according to the SSRC identifier of the first video media; the first terminal can then parse and decode the first RTP packets and play the first video media according to the playing parameter of the first video media. Similarly, after receiving the update message, the first terminal may determine the playing parameter corresponding to the second video media according to the SSRC identifier of the second video media; after receiving the RTP stream, the first terminal may determine the second RTP packets of the second video media according to the SSRC identifier of the second video media, and then parse and decode the second RTP packets and play the second video media according to the playing parameter of the second video media.
The first indication information in the update message can transmit the playing parameters of each video media to the first terminal, so that the playing parameters of each video media at the first terminal can be controlled conveniently, the richness and flexibility of playing the multi-video media are improved, and the user experience is improved.
As an alternative, the first indication information corresponds to the same video m line in the SDP information.
As an optional way, the first indication information is further used for indicating that the media server supports a multiple SSRC identification mechanism. Illustratively, the SDP information carried by the update message includes the following contents:
m=audio 20100 RTP/AVP 120
a=rtpmap:120 amr_wb/16000
m=video 20102 RTP/AVP 97
a=rtpmap:97 H264/90000
a=fmtp:97 profile_level_id=42C01E
a=multi_SSRC:12345 showmode=2*1 location=1:1
a=multi_SSRC:12346 showmode=2*1 location=2:1
where "m ═ audio 20100RTP/AVP 120" is an audio m line, "m ═ audio 20100RTP/AVP 120" is a video m line, "a ═ rtpmap:120amr _ wb/16000" corresponds to the audio m line "m ═ audio 20100RTP/AVP 120", and the contents after m ═ video 20102RTP/AVP 97 "correspond to the same video m line, that is, both correspond to" m ═ audio 20100RTP/AVP 120 ".
The first indication information includes "a=multi_SSRC:12345 showmode=2*1 location=1:1" and "a=multi_SSRC:12346 showmode=2*1 location=2:1". After receiving the update message, the first terminal determines, according to the first indication information, the information of the RTP stream with multiple SSRC identifiers that will be sent to it. Specifically, the RTP stream carries the RTP packets of two video media, and the first terminal plays the two video media in two windows, one above the other, in the same column of the screen. The SSRC identifier of one of the two video media (referred to as video media 1) is 12345, and the first terminal plays video media 1 in the upper window; the SSRC identifier of the other video media (referred to as video media 2) is 12346, and the first terminal plays video media 2 in the lower window.
As an alternative, the SDP information further includes "multi-pack-mode=1"; for example, this field is located after "m=video 20102 RTP/AVP 97" and also corresponds to the video m line, indicating that the media server supports the multiple SSRC identification mechanism. The first terminal determines, according to "multi-pack-mode=1" in the SDP information, that the media server supports the multiple SSRC identification mechanism.
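As an illustrative sketch, assuming an SDP body like the example above (the "a=" prefix of the multi-pack-mode line and the attribute grammar used by the regular expression simply mirror the example and are assumptions), the first terminal could extract the information it needs, namely whether the media server supports the multiple SSRC identification mechanism and the SSRC identifier, showmode and location of each video media, roughly as follows:

import re

def parse_video_media_description(sdp: str):
    # Multiple SSRC identification mechanism supported by the media server?
    multi_pack_mode = any("multi-pack-mode=1" in line for line in sdp.splitlines())
    media = {}
    for line in sdp.splitlines():
        m = re.match(r"a=multi_SSRC:(\d+)\s+showmode=(\S+)\s+location=(\S+)", line)
        if m:
            media[int(m.group(1))] = {"showmode": m.group(2), "location": m.group(3)}
    return multi_pack_mode, media

sdp_example = "\n".join([
    "m=video 20102 RTP/AVP 97",
    "a=rtpmap:97 H264/90000",
    "a=fmtp:97 profile_level_id=42C01E",
    "a=multi-pack-mode=1",
    "a=multi_SSRC:12345 showmode=2*1 location=1:1",
    "a=multi_SSRC:12346 showmode=2*1 location=2:1",
])

print(parse_video_media_description(sdp_example))
# (True, {12345: {'showmode': '2*1', 'location': '1:1'},
#         12346: {'showmode': '2*1', 'location': '2:1'}})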
The video media resource negotiation process has been introduced above. The SDP content is extended on the basis of the existing SDP negotiation framework, and negotiating a plurality of video media in the same video m line remains compatible with the existing scenario of negotiating a single video media; from the point of view of negotiation, the extension mechanism adopted by the present application is simple and effective. If the negotiation were instead performed through a plurality of video m lines, the network elements that transmit the SDP information would need to be upgraded to support negotiation over a plurality of video m lines. Compared with that alternative, in the scheme provided by the embodiment of the present application the SDP used in the video media resource negotiation process is small in scale, the negotiation only involves modification of the media server and the first terminal while other network elements transparently transmit the SDP, the implementation is simple, and fewer network elements need to be modified.
In order to improve the flexibility of watching video color ring tones, on the basis of the above scheme, the present application further provides a video media switching scheme: while watching a plurality of video color ring tones on the first terminal, the user can switch the video media being played at any time through interaction with the terminal. Optionally, after step 204, steps 205 to 207 are further included.
205. The first terminal sends a handover request to the media server.
While watching the plurality of video media played on the first terminal, the user of the first terminal may trigger the first terminal to switch a certain video media through an operation on the first terminal, for example by clicking the display position of the first video media on the screen or by a sliding operation at the display position of the first video media on the screen. After detecting the operation of the user, the first terminal sends a switching request for the first video media to the media server. The switching request indicates that the first video media is to be switched, and the switching request may carry information of the first video media, such as the SSRC identifier of the first video media and/or its playing parameter.
206. And the media server sends a third RTP message of a third video media to the first terminal.
After receiving a switching request of a first video media, a media server determines an SSRC identifier of the first video media, stops continuously sending a first RTP message of the first video media, and sends a third RTP message of a third video media to a first terminal, wherein the third RTP message comprises the SSRC identifier of the RTP stream and the SSRC identifier of the third video media, and the SSRC identifier of the third video media is the same as the SSRC identifier of the first video media.
207. And the first terminal plays the third video media according to the third RTP message.
After receiving the RTP stream, the first terminal determines a third RTP packet of the third video media from the RTP stream according to the SSRC identifier of the third video media, and performs operations such as parsing and decoding on the third RTP packet to obtain a video frame of the third video media, thereby playing the third video media. Since the SSRC identifier of the third video media is the same as the SSRC identifier of the first video media, the first terminal can play the third video media according to the playing parameters of the first video media. For example, the third video media is played in the upper window of the first terminal.
Fig. 3A is a schematic flow chart diagram of another possible embodiment of the call processing method of the present application. Fig. 3A depicts the method of the present application in exemplary signaling. The method shown in fig. 3A is an example of the method shown in fig. 2, and thus some explanations of the method shown in fig. 3A may refer to the method shown in fig. 2. The following takes a video media as a video polyphonic ringtone as an example, and introduces specific steps of the embodiment of the call processing method corresponding to fig. 3A.
301-302. The calling terminal sends a call request to the called terminal through the media server.
Specifically, the media server receives the call request sent by the calling terminal and forwards the call request to the called terminal.
The call request carries SDP information (such as SDPo1) of the calling terminal, and is used for the calling terminal to perform session media negotiation with the called terminal. The call request is specifically an INVITE message.
303-304. The called terminal sends a call response to the calling terminal through the media server.
Specifically, the media server receives a call response sent by the called terminal, and forwards the call response to the calling terminal.
The call response carries the SDP negotiation result (such as SDPa1) between the called terminal and the calling terminal, which may also be called the SDP information of the called terminal. The SDP information is used for the called terminal and the calling terminal to carry out session media negotiation.
In particular, the call response may be a 183 message indicating that the called terminal has received the call request.
If the session media negotiation employs a resource reservation mechanism, steps 305 to 308 are also included. It should be understood that if the session media negotiation does not employ a resource reservation mechanism, steps 305 to 308 may be omitted.
305-306. The calling terminal sends an update message to the called terminal through the media server.
Specifically, the media server receives an update message sent by the calling terminal, and forwards the update message to the called terminal.
Wherein, the update message carries SDP information (such as SDPo2) of the calling terminal, indicating that the calling terminal has completed resource reservation. For example, the UPDATE message is UPDATE (SDPo 2).
307-308. The called terminal sends 200 OK(SDPa2) to the calling terminal.
Specifically, the media server receives the 200 OK(SDPa2) sent by the called terminal and forwards it to the calling terminal. The 200 OK message indicates that the called terminal has completed the resource reservation.
309. The called terminal sends 180(ALERTING) to the media server.
Wherein 180(ALERTING) is a ringing message indicating that the called terminal has rung.
310. The media server sends an UPDATE (SDPo3) to the calling terminal.
The UPDATE message (UPDATE (SDPo3)) carries SDP information (SDPo3) of the media server, and is used for the media server to perform video media resource negotiation with the calling terminal. The update message indicates that the media server supports a multiple SSRC identification mechanism. The SDPo3 includes first indication information indicating SSRC identification of a plurality of video color ring tones. For an explanation of the UPDATE message (UPDATE (SDPo3)), see step 2021 in detail. Assume that the first indication information indicates that the SSRC identifier of the video ring back tone 1 is "12345" and the SSRC identifier of the video ring back tone 2 is "12346".
311. The calling terminal sends a 200(SDPa3) message to the media server.
The response message (200(SDPa3)) carries the SDP information (SDPa3) of the calling terminal, that is, the result of the video media resource negotiation between the calling terminal and the media server. The response message also indicates that the calling terminal supports the multiple SSRC identification mechanism. For an explanation of the response message (200(SDPa3)), see step 2022. Illustratively, the SDPa3 includes the field "multi-pack-mode=1".
312. The media server sends 180(ALERTING) to the calling terminal via the I/S-CSCF.
Wherein, the 180 message is a ringing message indicating that the called terminal has rung.
313. The media server sends the RTP stream to the calling terminal.
The media server starts the video color ring playing, encapsulates the video frames of the video color ring in a path of RTP stream, and sends the RTP stream to the calling terminal, which refers to step 203.
The embodiment of the present application is described by taking as an example the case where the video is played after the ringing message is received; it should be understood that the present application is also applicable to other scenarios, for example scenarios in which step 313 is executed before step 312.
As an optional mode, the video color ring to which an RTP packet belongs is identified by an RTP extension header in the RTP packet. Illustratively, the RTP packet header of the RTP packets in the RTP stream includes the contents shown in Table 1.
TABLE 1
(Fields of the RTP packet header: V (2 bits), P (1 bit), X (1 bit), CC (4 bits), M (1 bit), PT (7 bits), Sequence number (16 bits), Timestamp (32 bits), SSRC identifier (32 bits), CSRC list.)
The meaning of each field in the RTP header is described below.
"V" indicates the version number of the RTP protocol, the field being 2 bits (bit) in length. "P" refers to padding bits, and the field is 1 bit long and is typically not padded, i.e., typically P is 0. "X" refers to an extension bit, the length of the field is 1 bit, if the RTP packet has an extension header before the payload, the field is set to 1, and in this embodiment, the RTP packet has an extension header before the payload, so the field is set to 1. "CC" is a counter (CSRC count) of the special cell, indicating the number of CSCR identifiers, and the field length is 4 bits. The "M" indicates a flag bit, the field length is 1 bit, if the current packet is the last packet of the video frame, then M is 1, and in the rest cases M is 0. "PT" refers to a payload type (payload type), and the field has a length of 7 bits and indicates a codec type. The "Sequence number" refers to the serial number of the current message, the length of the field is 16 bits, the value of the field is increased by 1 when sending a message, and the receiver can not only detect the loss of the message by using the serial number, but also Sequence the message. The "Sequence number" may be understood by referring to the second Sequence identifier of the first RTP packet in the embodiment corresponding to fig. 2. "Timestamp" indicates the sampling time of the first sampling point of the audio-video frame, and the field length is 32 bits. "SSRC id" refers to RTP stream synchronization source id, and is used to indicate a path of RTP stream, and the length of the field is 32 bits. The "SSRC identification" can be understood with reference to the SSRC identification of the RTP stream in the corresponding embodiment of fig. 2. "CSRC list" which identifies all sources in the current message that contribute to the payload.
The content of the RTP extension header of the RTP packet in the RTP stream is described as follows, and exemplarily, as shown in table 2 below.
TABLE 2
(Fields of the RTP extension header: defined by profile (16 bits), length (16 bits), P (1 bit), M (1 bit), PT (7 bits), Reserve, sub Sequence number (16 bits), sub Timestamp (32 bits), sub SSRC identifier (32 bits).)
The meaning of each field in the RTP extension header is described below.
"defined of profile" is used to identify the extension header, and the field is 16 bits long. "length" refers to the length of the extension header, and the field length is 16 bits. "P", "M", "PT" and "Reserve" have the same meaning as the corresponding fields in Table 1, and are not described herein again. The sub Sequence number is a sub Sequence number, which refers to the Sequence number of the current message in the video media, the field length is 16 bits, and the sub Sequence number of each video media is counted separately. The "sub Sequence number" can be understood with reference to the first order identification in the corresponding embodiment of fig. 2. The "sub Timestamp" indicates Timestamp information of the RTP packet, and the meaning of the Timestamp information can be understood with reference to "Timestamp" in table 1, and the field length is 32 bits. The 'sub SSRC identifier' refers to a synchronization source identifier of the video media, and is used for uniquely identifying the video media, and the field length is 32 bits, by which the video media to which the message belongs can be determined. The "sub SSRC identification" can be understood by referring to the SSRC identification of the video media (e.g. the SSRC identification of the first video media, the SSRC identification of the second video media) in the embodiment corresponding to fig. 2.
Referring to table 1 and table 2, the RTP packets in the RTP stream are labeled by the SSRC identifier of the RTP stream and the SSRC identifiers of the video color ring tones negotiated in step 310. Each RTP message header in the RTP stream adopts Sequence number counting, so that whether packet loss exists in the RTP stream or not is conveniently judged, and the RTP extension header adopts sub-Sequence number counting, so that a single video media is conveniently and correctly played. Assume that the "sub SSRC id" in the RTP message of the video color ring 1 is 12345 according to the negotiation result of step 310.
The encapsulated RTP stream has the following characteristics: the "Sequence number" in the RTP packet header is globally continuous and can be used to judge whether the RTP stream has lost packets; the "SSRC identifier" in the RTP packet header identifies that the RTP packets belong to the same RTP stream; the other fields in the RTP packet header (such as "M", "P" and "Timestamp") are not used. The "sub Sequence number" in the RTP extension header is continuous within the same video media, the "sub SSRC identifier" in the RTP extension header is used to determine the RTP packets belonging to the same video media, the "M" in the RTP extension header is used to determine the end of a video frame, the "P" in the RTP extension header is used to determine whether padding is present, and the "sub Timestamp" in the RTP extension header marks the timestamp of the video frame.
314. And the calling terminal plays the video color ring 1 and the video color ring 2.
After receiving the RTP stream, the calling terminal parses the RTP packet header, judges whether packets have been lost according to the Sequence number, and judges whether the received RTP packets belong to the same RTP stream according to the SSRC identifier. Then the calling terminal parses the RTP extension header, determines the RTP packets of the same video color ring according to the sub SSRC identifier, judges the order of each RTP packet within its video media according to the sub Sequence number, and calculates the frame rate and assembles complete frames according to the sub Timestamp. After determining a plurality of RTP packets of the same video media, the calling terminal sorts these RTP packets according to the sub Sequence number. The calling terminal plays each video media at the display position corresponding to its SSRC identifier carried in the SDP information of step 310. Specifically, the calling terminal parses the payload of the RTP packets, and can determine the characteristics of the video, such as the profile, level, resolution and frame rate, from the Picture Parameter Set (PPS) and Sequence Parameter Set (SPS) frames, and can parse the I, P and B frame types and their contents, where an I frame is an intra-coded frame, a P frame is a forward prediction frame, and a B frame is a bidirectional interpolation frame. The calling terminal sends the payload of the RTP packets of each video media to a decoder for decoding, obtains the original image information, and displays the obtained images at the display position corresponding to the SSRC identifier carried in the SDP information of step 310.
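The receive-side processing just described can be summarized, under the same illustrative assumptions as the sketches above (decoding and rendering are outside the scope of the sketch), roughly as follows:

from collections import defaultdict

def demultiplex_rtp_stream(parsed_packets, stream_ssrc, play_params):
    # parsed_packets: dicts with "seq", "ssrc", "sub_ssrc", "sub_seq" and "payload",
    # as produced by the header and extension-header parsing sketches above.
    # play_params: {SSRC identifier of a video media: its location from the SDP, e.g. "1:1"}.
    pkts = [p for p in parsed_packets if p["ssrc"] == stream_ssrc]   # same RTP stream only
    seqs = sorted(p["seq"] for p in pkts)
    lost = any(b - a != 1 for a, b in zip(seqs, seqs[1:]))           # stream-level loss check

    per_media = defaultdict(list)
    for p in pkts:
        per_media[p["sub_ssrc"]].append(p)

    media_streams = {}
    for sub_ssrc, plist in per_media.items():
        plist.sort(key=lambda p: p["sub_seq"])                       # per-media ordering
        media_streams[sub_ssrc] = {
            "location": play_params.get(sub_ssrc),                   # where to display it
            "payloads": [p["payload"] for p in plist],               # handed to the decoder
        }
    return lost, media_streams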
The embodiment of the present application is illustrated by taking an RTP stream carrying the RTP packets of video color ring 1 and video color ring 2 as an example. Fig. 3B is a schematic interface diagram of the calling terminal playing the RTP stream; the display content in the schematic diagram (such as the information displayed on the screen and the position and order of that information) is only illustrative and does not limit the content actually displayed on the screen.
315. The calling terminal sends a switching request to the media server.
The switching request may specifically be an INFO notification message, where the INFO notification message indicates to switch the video color ring.
While watching the plurality of video color ring tones being played, the user of the calling terminal may trigger the calling terminal to send a switching request to the media server through an operation (such as clicking or sliding) on the calling terminal. Taking fig. 3B as an example, suppose that the user slides the screen at the display position of video color ring 1; after detecting this operation, the calling terminal generates a switching request for switching video color ring 1. The switching request may be implemented using the Media Server Markup Language (MSML).
For example, a switching location="1:1" field in the MSML body instructs the media server to switch the video color ring displayed in the first row and the first column. Alternatively, the switching location field may be replaced by switching SSRC="12345" to instruct the media server to switch video color ring 1, which is identified by the SSRC identifier "12345".
316. The media server sends a response message to the handover request to the calling terminal.
The response message may specifically be a 200ok (info) message indicating that the handover was successful.
After receiving the switching instruction, the media server stops playing the video color ring 1, starts playing the video color ring 3, and sends a 200OK (INFO) message to the calling terminal to inform the calling terminal of successful switching.
317. And the media server sends the RTP message of the video color ring 3 to the calling terminal.
According to the multiple SSRC identification mechanism, the media server encapsulates the video frames of video color ring 3 using the SSRC identifier of the RTP stream and the SSRC identifier of video color ring 1 (that is, the SSRC identifier of video color ring 3 is the same as that of video color ring 1), obtains the RTP packets of video color ring 3, and sends the RTP packets of video color ring 3 in the RTP stream.
Optionally, step 317 may be performed before step 316, or step 316 and step 317 may be performed simultaneously, which is not limited by the timing sequence.
318. And the calling terminal plays the video color ring 3 according to the playing parameter of the video color ring 1.
And the calling terminal plays the video color ring 3 according to the playing parameter of the video color ring 1 and plays the video color ring 2 according to the playing parameter of the video color ring 2. Fig. 3C is a schematic interface diagram of the RTP stream after the calling terminal plays the switch.
The interaction process between the media server and the first terminal is described above through the corresponding embodiments of fig. 2 and fig. 3A, and the interaction process between the AS and the MRS in the communication scenario where the AS and the MRS are separately arranged is described below by way of example. Fig. 4 is a schematic flow chart of another possible embodiment of the call processing method of the present application. The method may be applied to the system shown in fig. 1, and may also be applied to other communication scenarios, and the embodiment of the present application is not limited herein. In the embodiment of the application, the AS and the MRS are separately arranged, and the AS and the MRS can be in a calling domain or a called domain; the first terminal may refer to a calling terminal or a called terminal. The following describes specific steps of the embodiment of the call processing method corresponding to fig. 4.
401. The AS receives the call request from the first terminal or forwards the call request to the first terminal.
Step 401 can be understood with reference to step 201, and is not described herein again.
402. The AS sends a request message to the MRS.
After receiving the call request, the AS sends a request message to the MRS to indicate that the calling user or the called user has subscribed to the multi-video media playing service. As an optional way, the request message specifically indicates that the MRS should perform video media resource negotiation with the first terminal using the multiple SSRC mechanism. Illustratively, if the called user has subscribed to the service, the AS sends a request message to the MRS and notifies the MRS, by means of an extended SIP X header field, to perform video media resource negotiation with the first terminal using the multiple SSRC mechanism. An example of the X header field extension is as follows:
X-SUPPORT-multiple-SSRC identification: showmode=n*m
Wherein, the "X-SUPPORT-MULTI-SSRC identifier" indicates that multiple SSRC identifier negotiation is adopted, and the "brown ═ n × m" indicates that the negotiated number of SSRC identifiers is n × m.
403. MRS carries out video media resource negotiation with the first terminal through AS.
404. The AS sends a first play request to the MRS.
The first play request instructs the MRS to send the multi-SSRC identified RTP stream to the first terminal.
405. MRS sends RTP stream to the first terminal according to the first play request.
406. And the first terminal plays the first video media and the second video media according to the received RTP stream.
Step 405 and step 406 may be understood with reference to step 203 and step 204, respectively, and will not be described herein.
Steps 401 to 403 are optional steps, and the embodiment of the present application is not limited to performing steps 401 to 403 before step 404. If step 403 is executed, optionally, in step 405, the MRS sends an RTP stream to the first terminal according to the result of the negotiation of the video media resource and the first play request.
In step 403, the MRS performs video media resource negotiation with the first terminal through the AS. As an optional manner, referring to fig. 4, step 403 specifically includes steps 4031 to 4035.
4031-4032. The MRS sends an update message to the first terminal through the AS.
The update message indicates that the MRS supports the multiple SSRC identification mechanism, which may specifically refer to the related description about the update message in step 2021, and is not described herein again.
4033-4034. The first terminal sends a response message to the MRS through the AS.
After receiving the update message, the first terminal sends a response message for the update message to the MRS through the AS according to its own capability. For the response message, reference may be made to the related description of the response message in step 2022, which is not repeated here.
Optionally, after step 404, steps 407 to 414 are also included.
407. The first terminal sends a handover request for the first video media to the AS.
Step 407 may be understood with reference to step 205, which is not described herein.
408. The AS sends a stop play request to the MRS.
Wherein the stop play request is for instructing the MRS to stop playing the first video media.
409. MRS stops sending RTP message of first video media to first terminal.
410. The MRS sends a response message to the AS for the stop play request.
Wherein the response message to the stop play request is used to inform the AS that the playing of the first video media has been stopped.
411. The AS sends a second play request to the MRS.
Wherein the second play request is used to instruct the MRS to play the third video media.
412. The MRS transmits a response message for the second play request to the AS.
Wherein the response message to the second play request is used to inform the AS that the third video media is ready to be played.
413. And the AS sends a response message aiming at the switching request to the first terminal.
Wherein the response message to the handover request is used to inform that the first terminal is ready to handover the first video media.
414. MRS sends the third RTP message of the third video media to the first terminal.
415. And the first terminal plays the third video media according to the third RTP message.
Step 414 and step 415 can be understood with reference to step 206 and step 207, and are not described herein.
Optionally, step 412 may be performed after step 414, or step 412 may be performed simultaneously with step 414.
In the embodiment of the present application, step 410, step 412 and step 413 are optional steps.
Fig. 5 is a schematic flow chart of another possible embodiment of the call processing method of the present application. Fig. 5 describes the method of the present application with exemplary signaling, and fig. 5 illustrates that the AS and the MRS in the media server are separately configured, so that the method may be applied to the system illustrated in fig. 1, and of course, may also be applied to other communication scenarios, and the embodiments of the present application are not limited herein. The method shown in fig. 5 is an example of the method shown in fig. 4, and thus some explanations of the method shown in fig. 5 may refer to the method shown in fig. 4. The following takes a video media as a video polyphonic ringtone as an example, and introduces specific steps of the embodiment of the call processing method corresponding to fig. 5.
501-502. The calling terminal sends a call request, such as INVITE(SDPo1), to the called terminal through the AS.
503-504. The called terminal sends 183(SDPa1) to the calling terminal through the AS.
505-506. The calling terminal sends UPDATE(SDPo2) to the called terminal through the AS.
507-508. The called terminal sends UPDATE(SDPa2) to the calling terminal through the AS.
509. The called terminal sends 180(ALERTING) to the AS.
In steps 501 to 509, the interaction process between the AS and the calling terminal and the called terminal may refer to the interaction process between the media server and the calling terminal and the called terminal in steps 301 to 309, which is not described herein again.
510. The AS sends a request message, e.g., INVITE, to the MRS.
The AS judges whether the called user has subscribed to the multi-video color ring playing service. If the user has subscribed to the service, the AS sends an INVITE message to the MRS; the INVITE message does not carry SDP, and the MRS is notified, by means of an extended SIP X header field, to use multi-SSRC identifier negotiation. An example of the X header field extension is as follows:
X-SUPPORT-multiple-SSRC identification: showmode=3*1
511. The MRS sends 200(SDPa3) to the AS.
The MRS determines whether the current SIP call negotiates video and whether the called user has subscribed to the multi-video color ring playing service. If so, the MRS generates three SSRC identifiers, for example "12345", "12346" and "12347", and sends a 200 message carrying SDP information, that is, 200(SDPa3), to the AS. For the SDP information, reference may be made to the SDP information in step 310.
512. The AS sends an UPDATE message, e.g. UPDATE (SDPa3), to the calling terminal.
The update message carries the SDP information in the message 200 in step 511.
513. The calling terminal sends a response message, e.g. 200(SDPo3), to the AS.
Step 513 may be understood with reference to step 311, and will not be described herein.
514. The AS sends ACK(SDPo3) to the MRS.
Wherein, the ACK (SDPo3) carries the SDP information in the response message in step 513, and the MRS determines whether the calling terminal supports receiving the RTP stream encapsulated by the multiple SSRC identifier mechanism according to the SDP information.
515. The AS sends an INFO message to the MRS.
After receiving the response message, if the AS determines from the message that the calling terminal supports receiving RTP streams with multiple SSRC identifiers, the AS sends an INFO message to the MRS; the INFO message is used to control the MRS to play a plurality of video color rings.
The INFO message may carry media control operations of various protocols, for example operations expressed in MSML.
516: the AS sends 180(ALERTING) to the calling terminal.
The 180 message carries ringing information (ALERTING) for indicating that the called terminal rings.
517. The MRS sends 200 (response) to the AS.
And after receiving the INFO message, the MRS sends a 200 message to the AS, wherein the 200 message carries response information used for indicating that the MRS has received the INFO message.
518. The MRS sends an RTP stream to the calling terminal.
519. And the calling terminal plays a plurality of video media according to the RTP stream.
Step 518 and step 519 are understood with reference to step 313 and step 314 and will not be described in detail here.
520. The calling terminal sends a switching request, e.g., INFO (switching), to the AS.
INFO (switching) can be implemented by extending MSML.
For example, a switching location="1:1" field in the MSML body instructs the media server to switch the video color ring displayed in the first row and the first column.
Step 520 may be understood with reference to step 315, which is not described herein.
521. The AS sends a stop play request, e.g. INFO (stop), to the MRS.
Wherein INFO (stop) is used to instruct the MRS to stop playing the video color ring displayed in the first row and the first column, and may likewise be expressed in extended MSML.
For example, a stop element carrying showmode="3*1" and location="1:1" in the MSML body instructs the media server to stop playing the video color ring displayed in the first row and the first column, namely video color ring 1.
522. MRS stops sending RTP message of video color ring 1 to calling terminal.
523. MRS sends a response message to the AS for the stop play request, e.g. 200 (stop);
where 200 (stop) is used to inform the AS that video media 1 has stopped playing.
524. AS sends play request, e.g. INFO (play), to MRS;
Wherein INFO (play) is used to instruct the MRS to play another video color ring in place of video color ring 1, and may likewise be expressed in extended MSML.
For example, a video element carrying showmode="3*1", filename="y:/video/202001/sub-6.3gp" and location="1:1" in the MSML body instructs the MRS to play, in the first row and the first column, the video color ring located at the path "y:/video/202001/sub-6.3gp"; for convenience of description, this video color ring is referred to as video color ring 3.
525. MRS sends a response message to the AS to the play request, e.g. 200 (play);
where 200 (play) is used to inform the AS that video media 3 is ready to be played.
526. The AS sends a response message indicating successful switching, e.g., 200(INFO), to the calling terminal.
Wherein 200(INFO) is used to inform the calling terminal that video color ring 1 is ready to be switched.
527. MRS sends the third RTP message of video color ring 3 to the calling terminal.
528. And the first terminal plays the video color ring 3 according to the third RTP message.
Step 527 to step 528 can refer to step 316 to step 318, which are not described herein again.
The foregoing describes a call processing method provided in an embodiment of the present application, and the following describes an apparatus provided in the embodiment of the present application. Fig. 6 is a schematic diagram of a media server according to one possible embodiment of the present application. Referring to fig. 6, the media server includes a call processing unit 601, a media negotiation unit 602, and a media transmission unit 603. The call processing unit 601 is configured to receive a call request, where the call request is used for a calling terminal to initiate a call to a called terminal. The media negotiation unit 602 is configured to perform video media resource negotiation with a first terminal, where the first terminal is a calling terminal or a called terminal. The media sending unit 603 is configured to send a real-time transport protocol RTP stream to the first terminal based on a negotiation result of the video media resource negotiation performed by the media negotiation unit and the first terminal. The RTP stream comprises a first RTP message of a first video media and a second RTP message of a second video media, the first RTP message comprises a synchronous source SSRC identification of the RTP stream and an SSRC identification of the first video media, and the second RTP message comprises an SSRC identification of the RTP stream and an SSRC identification of the second video media. These units may also be used to implement the related functions in any of the foregoing embodiments in fig. 2 and fig. 3A, and are not described herein again.
As an optional manner, the media negotiation unit 602 is specifically configured to: sending an update message to the first terminal, the update message indicating that the media server supports a multiple SSRC identification mechanism; receiving a response message from the first terminal, the response message indicating that the first terminal supports the multiple SSRC identification mechanism.
As an alternative, the update message includes first indication information indicating the SSRC identification of the first video media and the SSRC identification of the second video media.
As an alternative, the first indication information corresponds to the same video m line in the update message.
As an optional manner, the first RTP packet further includes a first order identifier, and the first order identifier of the first RTP packet indicates an order of the first RTP packet in the first video media.
As an optional manner, the media server further includes a media switching unit 604, where the media switching unit 604 is configured to: after a media sending unit 603 sends an RTP stream to a first terminal, receiving a switching request from the first terminal, where the switching request is used to request to switch the first video media; and sending a third RTP message of a third video media to the first terminal, wherein the third RTP message comprises an SSRC identifier of the RTP stream and an SSRC identifier of the third video media, and the SSRC identifier of the third video media is the same as the SSRC identifier of the first video media.
An embodiment of the present application further provides a terminal, and fig. 7 is a schematic diagram of a possible embodiment of the terminal of the present application. Referring to fig. 7, the terminal includes a call processing unit 701, a media negotiation unit 702, a media receiving unit 703, and a media playing unit 704. The call processing unit 701 is configured to send a call request to a media server, or receive a call request from the media server, where the call request is used for a calling terminal to initiate a call to a called terminal. The media negotiation unit 702 is configured to perform video media resource negotiation with a media server. The media receiving unit 703 is configured to receive an RTP stream from the media server based on a negotiation result of a video media resource negotiation with the media server, where the RTP stream includes a first RTP packet of a first video media and a second RTP packet of a second video media, the first RTP packet includes a synchronization source SSRC identifier of the RTP stream and an SSRC identifier of the first video media, and the second RTP packet includes the SSRC identifier of the RTP stream and the SSRC identifier of the second video media. The media playing unit 704 is configured to play the first video media and the second video media according to the RTP stream.
As an optional manner, the media negotiation unit 702 is specifically configured to: receiving an update message from the media server, the update message indicating that the media server supports a multiple SSRC identification mechanism; sending a response message to the media server, the response message indicating that the first terminal supports the multiple SSRC identification mechanism.
As an alternative, the update message includes first indication information indicating the SSRC identification of the first video media and the SSRC identification of the second video media.
As an optional manner, the media playing unit 704 is specifically configured to play the first video media according to the playing parameter corresponding to the SSRC identifier of the first video media, and play the second video media according to the playing parameter corresponding to the SSRC identifier of the second video media.
As an alternative, the first indication information corresponds to the same video m line in the update message.
As an optional manner, the first RTP packet further includes a first order identifier, where the first order identifier indicates an order of the first RTP packet in the first video media.
As an optional manner, the terminal further includes a media switching unit 705, where the media switching unit 705 is configured to: after the media playing unit 704 plays the first video media and the second video media respectively, sending a switching request to the media server, where the switching request is used to request to switch the first video media; receiving a third RTP packet of a third video media from the media server, where the third RTP packet includes an SSRC identifier of the RTP stream and an SSRC identifier of the third video media, and the SSRC identifier of the third video media is the same as the SSRC identifier of the first video media; and playing the third video media according to the third RTP message.
These units implement the related functions of the first terminal in the foregoing embodiment of fig. 2, or implement the related functions of the calling terminal in the embodiment of fig. 3A, which are not described herein again.
An embodiment of the present application further provides an MRS, and fig. 8 is a schematic diagram of a possible embodiment of the MRS of the present application. Referring to fig. 8, the MRS includes a reception unit 801 and a media transmission unit 802. The receiving unit 801 is configured to receive a first play request from an AS. The media sending unit 802 is configured to send an RTP stream to the first terminal. The first terminal is a calling terminal or a called terminal. The RTP stream comprises a first RTP message of a first video media and a second RTP message of a second video media, the first RTP message comprises a synchronous source SSRC identification of the RTP stream and an SSRC identification of the first video media, and the second RTP message comprises an SSRC identification of the RTP stream and an SSRC identification of the second video media.
AS an optional manner, the MRS further includes a media negotiation unit 803, and the media negotiation unit 803 performs video media resource negotiation with the first terminal through the AS before the receiving unit 801 receives the play request from the AS. The media sending unit 802 is specifically configured to send an RTP stream to the first terminal according to the negotiation result of the video media resource negotiation and the first play request.
As an optional manner, the media negotiation unit 803 is specifically configured to: the MRS sends an update message to the first terminal through the AS, the update message indicating that the MRS and/or the AS support a multiple SSRC identification mechanism. Thereafter, the MRS receives a response message from the first terminal through the AS, the response message indicating that the first terminal supports the multiple SSRC identification mechanism.
AS an optional manner, the receiving unit 801 is further configured to receive a request message from the AS before the media negotiating unit 803 performs video media resource negotiation with the first terminal through the AS, where the request message is sent by the AS in response to a call request, where the call request is used by the calling terminal to initiate a call to the called terminal. The request message indicates that the calling user or the called user has subscribed to the multi-video media playing service. As an optional way, the request message specifically indicates that the MRS performs a video media resource negotiation with the first terminal by using a multiple SSRC mechanism.
As an option, the update message carries media capability information of the MRS (e.g., SDP information of the MRS). In one possible design, the update message includes first indication information indicating the SSRC identification of the first video media and the SSRC identification of the second video media.
As an alternative, the first indication information corresponds to the same video m line in the update message.
As an alternative, the first indication information further indicates a playing parameter of the first video medium and a playing parameter of the second video medium.
As an alternative, the playing parameter of the first video media is used to indicate the display position of the first video media at the first terminal.
As an optional manner, the first RTP packet further includes a first order identifier, and the first order identifier of the first RTP packet is used to indicate an order of the first RTP packet in the first video media.
AS an optional manner, the MRS further includes a media switching unit 804, where the media switching unit 804 is configured to receive a stop playing request and a second playing request from the AS after the media sending unit 802 sends the RTP stream to the first terminal, where the stop playing request indicates to stop playing the first video media, and the second playing request indicates to play the third video media. Then, the media switching unit 804 stops sending the first RTP packet of the first video media to the first terminal according to the play stop request, and sends the third RTP packet of the third video media to the first terminal according to the second play request, where the third RTP packet includes the SSRC identifier of the RTP stream and the SSRC identifier of the third video media. Wherein the SSRC identification of the third video media is the same as the SSRC identification of the first video media.
These units implement the related functions of the MRS in the foregoing embodiments of fig. 4 or fig. 5, and are not described here again.
In the embodiments of the present application, the media server and the terminal are presented in the form of functional units. As used herein, a "unit" may refer to an ASIC, a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other devices that provide the described functionality. In a simple embodiment, one skilled in the art can appreciate that the media server and the terminal can be implemented using a processor, memory, and a communication interface.
The media server or the terminal of the embodiment of the present application can also be implemented by way of a computer device (or system) in fig. 9. Fig. 9 is a schematic diagram of a computer device according to an embodiment of the present invention. The computer device includes at least one processor 901, a communication bus 902 and memory 903, and may also include at least one communication interface 904 and I/O interfaces 905.
The processor 901 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solution of the present application.
The communication bus 902 may include a path that transfers information between the aforementioned components. The communication interface 904 uses any transceiver-like apparatus to communicate with other devices or communication networks, such as Ethernet, a Radio Access Network (RAN) or a Wireless Local Area Network (WLAN).
The Memory 903 may be a Read-Only Memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these. The memory may be self-contained and coupled to the processor via a bus. The memory may also be integral to the processor.
The memory 903 is used for storing application program codes for executing the scheme of the application, and the processor controls the execution. The processor is configured to execute application program code stored in the memory.
In a specific implementation, the processor 901 may include one or more CPUs, and each CPU may be a single-core (single-Core) processor or a multi-core (multi-Core) processor. The processor 901 herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In particular implementations, the computer device may also include an input/output (I/O) interface 905, as one embodiment. For example, the output device may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device may be a mouse, a keyboard, a touch screen device or a sensing device, etc.
The computer device may be a general-purpose computer device or a special-purpose computer device. In a specific implementation, the computer device may be a desktop computer, a laptop computer, a web server, a Personal Digital Assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, or a device with a structure similar to that in fig. 9. The embodiment of the present application does not limit the type of the computer device.
Each network element in fig. 1 may be the device shown in fig. 9, with one or more software modules stored in the memory. The terminal or the media server or the AS or the MRS may implement software modules by program codes in the processor and the memory, and perform the methods performed by the corresponding devices in the above embodiments.
Embodiments of the present application further provide a computer-readable storage medium for storing computer software instructions for the device (terminal or media server) shown in fig. 9, which contains a program designed to execute the above method embodiments. The above method can be implemented by executing a stored program.
The embodiment of the present application further provides a system for call processing, where the system includes a terminal and a media server, where the terminal may perform any step performed by the first terminal in the embodiment of fig. 2 or the calling terminal in the embodiment of fig. 3A; the media server may perform any step performed by the media server in the embodiment of fig. 2 or fig. 3A, which is not described again in this embodiment of the present application.
The embodiment of the present application further provides a system for call processing, where the system includes a terminal, an AS, and an MRS, where the terminal may perform any step performed by the first terminal in the embodiment of fig. 4 or the calling terminal in the embodiment of fig. 5; the AS and the MRS may respectively execute any step executed by the AS and the MRS in the embodiment of fig. 4 or fig. 5, and the embodiments of the present application are not described again.
While the present application has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other module may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Those of skill in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps described in the disclosure herein may be implemented as hardware, software, firmware, or any combination thereof. If implemented in software, the functions described in the various illustrative logical blocks, modules, and steps may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium, such as a data storage medium, or any communication medium including a medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of means for performing the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit, in conjunction with suitable software and/or firmware, or provided by an interoperating hardware unit (including one or more processors as described above).
The above description is only an exemplary embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (25)

1. A call processing method applied to a media server includes:
receiving a call request, wherein the call request is used for a calling terminal to initiate a call to a called terminal;
performing video media resource negotiation with a first terminal, wherein the first terminal is the calling terminal or the called terminal;
and sending a real-time transport protocol (RTP) stream to the first terminal based on a negotiation result of video media resource negotiation with the first terminal, wherein the RTP stream comprises a first RTP message of a first video media and a second RTP message of a second video media, the first RTP message comprises a synchronization source SSRC identifier of the RTP stream and an SSRC identifier of the first video media, and the second RTP message comprises the SSRC identifier of the RTP stream and the SSRC identifier of the second video media.
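For a concrete picture of the packet layout recited in claim 1, the following Python sketch builds two RTP packets of one stream, each carrying the SSRC identifier of the RTP stream together with the SSRC identifier of an individual video media. The claim does not fix how the per-media SSRC is packed; the sketch assumes it travels as a single CSRC entry, and all identifier values, names, and the independent per-media sequence numbers (compare claim 5) are illustrative assumptions rather than the patented encoding.

```python
import struct

def build_rtp_packet(stream_ssrc: int, media_ssrc: int, seq: int,
                     timestamp: int, payload: bytes,
                     payload_type: int = 96) -> bytes:
    # RTP fixed header: V=2, P=0, X=0, CC=1 (one CSRC entry follows).
    # Assumption: the SSRC field carries the SSRC of the RTP stream and the
    # single CSRC entry carries the SSRC of the individual video media.
    byte0 = (2 << 6) | 1          # version 2, no padding, no extension, CC=1
    byte1 = payload_type & 0x7F   # marker bit 0
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, stream_ssrc & 0xFFFFFFFF)
    csrc = struct.pack("!I", media_ssrc & 0xFFFFFFFF)
    return header + csrc + payload

# Two packets of the same RTP stream: same stream SSRC, different media SSRC,
# each with its own per-media sequence number.
first_pkt = build_rtp_packet(0x11111111, 0xAAAA0001, seq=1,
                             timestamp=90000, payload=b"\x00" * 16)
second_pkt = build_rtp_packet(0x11111111, 0xBBBB0002, seq=1,
                              timestamp=90000, payload=b"\x00" * 16)
```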
2. The method of claim 1, wherein negotiating the video media resource with the first terminal comprises:
sending an update message to the first terminal, the update message indicating that the media server supports a multiple SSRC identification mechanism;
receiving a response message from the first terminal, the response message indicating that the first terminal supports the multiple SSRC identification mechanism.
3. The method of claim 2, wherein the update message comprises first indication information indicating the SSRC identification of the first video media and the SSRC identification of the second video media.
4. The method of claim 3, wherein the first indication information corresponds to a same m-line of video in the update message.
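As a rough illustration of the negotiation in claims 2 to 4, the sketch below assembles an SDP body for the update message in which the SSRC identifiers of the first and second video media are announced under the same video m-line. The attribute layout (RFC 5576 style a=ssrc lines), addresses, port, payload type, and cname labels are assumptions made for illustration; the patent does not prescribe this exact SDP syntax.

```python
def build_update_sdp(port: int, first_media_ssrc: int,
                     second_media_ssrc: int) -> str:
    # Both per-media SSRC identifiers are announced under one video m-line
    # (claim 4); the two a=ssrc lines serve here as the first indication
    # information of the multiple SSRC identification mechanism (claims 2-3).
    lines = [
        "v=0",
        "o=mediaserver 0 0 IN IP4 192.0.2.10",
        "s=-",
        "t=0 0",
        f"m=video {port} RTP/AVP 96",
        "c=IN IP4 192.0.2.10",
        "a=rtpmap:96 H264/90000",
        f"a=ssrc:{first_media_ssrc} cname:first-video-media",
        f"a=ssrc:{second_media_ssrc} cname:second-video-media",
        "a=sendonly",
    ]
    return "\r\n".join(lines) + "\r\n"

offer = build_update_sdp(20000, 0xAAAA0001, 0xBBBB0002)
```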
5. The method of any of claims 1-4, wherein the first RTP packet further comprises a first order identification, the first order identification of the first RTP packet indicating an order of the first RTP packet in the first video media.
6. The method according to any of claims 1 to 5, wherein after said sending an RTP stream to the first terminal, the method further comprises:
receiving a switching request from the first terminal, wherein the switching request is used for requesting to switch the first video media;
and sending a third RTP message of a third video media to the first terminal, wherein the third RTP message comprises an SSRC identifier of the RTP stream and an SSRC identifier of the third video media, and the SSRC identifier of the third video media is the same as the SSRC identifier of the first video media.
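The switching behaviour of claim 6 can be pictured with the minimal sketch below: when the first terminal requests to switch the first video media, the media server starts sending a third video media whose SSRC identifier equals that previously used for the first video media, so the receiver can replace the picture in place without renegotiating the stream. Class and method names are hypothetical and not taken from the patent.

```python
class MediaSender:
    def __init__(self, stream_ssrc: int):
        self.stream_ssrc = stream_ssrc
        # media label -> [media SSRC, next per-media sequence number]
        self.media = {"first": [0xAAAA0001, 0], "second": [0xBBBB0002, 0]}

    def switch_first_media(self, new_label: str) -> None:
        # The third video media reuses the SSRC identifier assigned to the
        # first video media, so the terminal keeps rendering that screen
        # position without a new media negotiation.
        ssrc_and_seq = self.media.pop("first")
        self.media[new_label] = ssrc_and_seq

sender = MediaSender(stream_ssrc=0x11111111)
sender.switch_first_media("third")
assert sender.media["third"][0] == 0xAAAA0001
```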
7. A call processing method is applied to a first terminal, wherein the first terminal is a calling terminal or a called terminal, and comprises the following steps:
sending a call request to a media server, or receiving a call request from the media server, wherein the call request is used for the calling terminal to initiate a call to the called terminal;
performing video media resource negotiation with the media server;
receiving an RTP stream from the media server based on a negotiation result of video media resource negotiation with the media server, wherein the RTP stream comprises a first RTP message of a first video media and a second RTP message of a second video media, the first RTP message comprises a synchronization source SSRC identification of the RTP stream and an SSRC identification of the first video media, and the second RTP message comprises an SSRC identification of the RTP stream and an SSRC identification of the second video media;
and respectively playing the first video media and the second video media according to the RTP stream.
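On the terminal side (claim 7), received packets of the single RTP stream must be separated per video media before the first and second video media can be played at the same time. The following sketch demultiplexes packets by the per-media SSRC, again assuming the CSRC packing used in the earlier sketch; a real terminal would follow whatever packing was agreed during the video media resource negotiation.

```python
import struct
from collections import defaultdict

def demux_by_media_ssrc(packets):
    # Group packets of one RTP stream by the SSRC of the individual video
    # media, so each media can be decoded and played in parallel.
    per_media = defaultdict(list)
    for pkt in packets:
        byte0, _, seq, _, stream_ssrc = struct.unpack("!BBHII", pkt[:12])
        cc = byte0 & 0x0F  # number of CSRC entries in the header
        if cc >= 1:
            (media_ssrc,) = struct.unpack("!I", pkt[12:16])
            payload = pkt[12 + 4 * cc:]
            per_media[media_ssrc].append((seq, payload))
    # One ordered list of (per-media sequence number, payload) per video media.
    return per_media
```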
8. The method of claim 7, wherein the negotiating video media assets with the media server comprises:
receiving an update message from the media server, the update message indicating that the media server supports a multiple SSRC identification mechanism;
sending a response message to the media server, the response message indicating that the first terminal supports the multiple SSRC identification mechanism.
9. The method of claim 8, wherein the update message comprises first indication information indicating the SSRC identification of the first video media and the SSRC identification of the second video media.
10. The method of claim 9, wherein the first indication information corresponds to a same m-line of video in the update message.
11. The method of any of claims 7-10, wherein the first RTP packet further comprises a first order identifier, wherein the first order identifier indicates an order of the first RTP packet in the first video media.
12. The method of any of claims 7-11, wherein after said playing said first video media and said second video media, respectively, according to said RTP stream, said method further comprises:
sending a switching request to the media server, wherein the switching request is used for requesting to switch the first video media;
receiving a third RTP packet of a third video media from the media server, where the third RTP packet includes an SSRC identifier of the RTP stream and an SSRC identifier of the third video media, and the SSRC identifier of the third video media is the same as the SSRC identifier of the first video media;
and playing the third video media according to the third RTP message.
13. A media server, comprising:
the system comprises a call processing unit, a call processing unit and a call processing unit, wherein the call processing unit is used for receiving a call request, and the call request is used for a calling terminal to initiate a call to a called terminal;
a media negotiation unit, configured to perform video media resource negotiation with a first terminal, where the first terminal is the calling terminal or the called terminal;
a media sending unit, configured to send a real-time transport protocol RTP stream to the first terminal based on a negotiation result of a video media resource negotiation with the first terminal, where the RTP stream includes a first RTP packet of a first video media and a second RTP packet of a second video media, the first RTP packet includes a synchronization source SSRC identifier of the RTP stream and an SSRC identifier of the first video media, and the second RTP packet includes the SSRC identifier of the RTP stream and an SSRC identifier of the second video media.
14. The media server according to claim 13, wherein the media negotiation unit is specifically configured to:
sending an update message to the first terminal, the update message indicating that the media server supports a multiple SSRC identification mechanism;
receiving a response message from the first terminal, the response message indicating that the first terminal supports the multiple SSRC identification mechanism.
15. The media server of claim 14, wherein the update message comprises first indication information indicating the SSRC identification of the first video media and the SSRC identification of the second video media.
16. The media server of claim 15, wherein the first indication information corresponds to a same m-line of video in the update message.
17. The media server of any of claims 13 to 16, wherein the first RTP packet further comprises a first order identification, the first order identification of the first RTP packet indicating an order of the first RTP packet in the first video media.
18. The media server according to any of claims 13 to 17, characterized in that the media server further comprises a media switching unit configured to:
after the RTP stream is sent to the first terminal, receiving a switching request from the first terminal, wherein the switching request is used for requesting to switch the first video media;
and sending a third RTP message of a third video media to the first terminal, wherein the third RTP message comprises an SSRC identifier of the RTP stream and an SSRC identifier of the third video media, and the SSRC identifier of the third video media is the same as the SSRC identifier of the first video media.
19. A terminal, which is a calling terminal or a called terminal, comprising:
a call processing unit, configured to send a call request to a media server, or receive a call request from the media server, where the call request is used for the calling terminal to initiate a call to the called terminal;
the media negotiation unit is used for carrying out video media resource negotiation with the media server;
a media receiving unit, configured to receive, based on a negotiation result of a video media resource negotiation with the media server, an RTP stream from the media server, where the RTP stream includes a first RTP packet of a first video media and a second RTP packet of a second video media, the first RTP packet includes a synchronization source SSRC identifier of the RTP stream and an SSRC identifier of the first video media, and the second RTP packet includes the SSRC identifier of the RTP stream and the SSRC identifier of the second video media;
and the media playing unit is used for respectively playing the first video media and the second video media according to the RTP stream.
20. The terminal according to claim 19, wherein the media negotiation unit is specifically configured to:
receiving an update message from the media server, the update message indicating that the media server supports a multiple SSRC identification mechanism;
sending a response message to the media server, the response message indicating that the terminal supports the multiple SSRC identification mechanism.
21. The terminal of claim 20, wherein the update message comprises first indication information indicating the SSRC identification of the first video media and the SSRC identification of the second video media.
22. The terminal of claim 21, wherein the first indication information corresponds to a same m-line of video in the update message.
23. The terminal of any of claims 19 to 22, wherein the first RTP packet further comprises a first order identifier, the first order identifier indicating an order of the first RTP packet in the first video media.
24. The terminal according to any of claims 19 to 23, characterized in that the terminal further comprises a media switching unit configured to:
after the first video media and the second video media are respectively played according to the RTP stream, sending a switching request to the media server, wherein the switching request is used for requesting to switch the first video media;
receiving a third RTP packet of a third video media from the media server, where the third RTP packet includes an SSRC identifier of the RTP stream and an SSRC identifier of the third video media, and the SSRC identifier of the third video media is the same as the SSRC identifier of the first video media;
and playing the third video media according to the third RTP message.
25. A call processing system comprising a media server according to any of claims 13 to 18 and a terminal according to any of claims 19 to 24.
CN202010710150.7A 2020-07-22 2020-07-22 Call processing method, device and system Pending CN113973230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010710150.7A CN113973230A (en) 2020-07-22 2020-07-22 Call processing method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010710150.7A CN113973230A (en) 2020-07-22 2020-07-22 Call processing method, device and system

Publications (1)

Publication Number Publication Date
CN113973230A true CN113973230A (en) 2022-01-25

Family

ID=79584900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010710150.7A Pending CN113973230A (en) 2020-07-22 2020-07-22 Call processing method, device and system

Country Status (1)

Country Link
CN (1) CN113973230A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115209170A (en) * 2022-06-28 2022-10-18 联通沃音乐文化有限公司 Live video color ring visual angle switching playing method based on user behavior
CN115209170B (en) * 2022-06-28 2023-12-26 联通沃音乐文化有限公司 Live video color ring visual angle switching playing method based on user behavior

Similar Documents

Publication Publication Date Title
CN110971766B (en) Method and apparatus for call processing
US10432692B2 (en) Streaming with coordination of video orientation (CVO)
US11924372B2 (en) Call processing method and device
US10015437B2 (en) Supporting transport diversity and time-shifted buffers for media streaming over a network
US20060256748A1 (en) System and method for interworking between IMS network and H.323 network
WO2020215818A1 (en) Call method and system and related apparatus
CN101543015A (en) System and method for enabling fast switching between PSSE channels
WO2012075968A1 (en) Method, system and device for acquiring parameter set value of video code stream in transcoding service
CN111385419B (en) Call processing method and device
CN101754002B (en) Video monitoring system and realization method for dual-stream monitoring front end thereof
CN104219479A (en) Video communication service processing method and system
CN113973230A (en) Call processing method, device and system
CN102231734A (en) Method, device and system for realizing audio transcoding of TTS (Text To Speech)
US9509726B2 (en) Optimizing call bearer path using session initiation protocol proxy
US20220240128A1 (en) Call Processing Method and System and Related Apparatus
CN113132923B (en) Method, system and related device for processing call
US11575716B2 (en) Apparatuses and methods for providing reliable delivery of application data
CN115580659A (en) Abnormal network service recovery method and device, electronic equipment and server
EP4114026A1 (en) Video transmission method and system, and related device and storage medium
US11265357B2 (en) AV1 codec for real-time video communication
WO2020192435A1 (en) Multimedia customized ringing signal and color ring back tone playback method, and application server
CN108809911A (en) The method, apparatus and storage medium of two-stage dialing are realized in VoLTE networks
KR20170018603A (en) Method for providing of streamming service and apparatus for the same
CN115695382A (en) Communication method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination