CN108632681B

CN108632681B - Method, server and terminal for playing media stream

Info

Publication number: CN108632681B
Application number: CN201710172615.6A
Authority: CN
Inventors: 杨生飞; 王赵淮; 王伟; 姜立科; 曹阳
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2017-03-21
Filing date: 2017-03-21
Publication date: 2020-04-03
Anticipated expiration: 2037-03-21
Also published as: CN108632681A; WO2018171567A1

Abstract

The application discloses a method, a server, and a terminal for playing a media stream, and belongs to the field of communications. The method includes: a terminal sends a server an acquisition request carrying the fragment identifier of a video stream fragment, where the video stream fragment belongs to a video stream that the server has sent and the terminal has received; the server obtains the start time of the video stream according to the fragment identifier; the server obtains the audio stream corresponding to the video stream according to the start time; the server sends the audio stream to the terminal; and the terminal receives the audio stream and plays the video stream and the audio stream. This solves the problem in the related art that a terminal cannot play sound, or plays sound that does not match the video picture, and ensures that the sound played by the terminal is consistent with the video picture.

Description

Method, server and terminal for playing media stream

Technical Field

The present application relates to the field of communications, and in particular, to a method, a server, and a terminal for playing a media stream.

Background

In an Internet Protocol Television (IPTV) system, a user may watch television programs on a terminal such as a television, a mobile phone, or a tablet computer. While watching a program on one channel, the user often switches to another channel. For convenience of explanation, the channel being switched to is referred to as the target channel; the terminal may obtain and play the media stream of the target channel as follows.

Adaptive-bitrate scenarios have recently emerged in which a terminal can switch to a target channel. In such a scenario, the video of each channel is available as a high-bitrate video stream and a low-bitrate video stream, and a fast channel switching (FCC) server caches both video streams of every channel as sent by the video source, so that when the terminal switches to the target channel, the FCC server can choose which video stream to provide to the terminal as required. The switching procedure is as follows: after receiving a play request from the terminal for switching to the target channel, the FCC server first sends the low-bitrate video stream of the target channel; after sending it for a period of time, the FCC server sends the high-bitrate video stream of the target channel instead, to improve the picture quality on the terminal, and notifies the terminal to join the multicast group of the target channel. The terminal plays the high-bitrate video stream sent by the FCC server, joins the multicast group of the target channel, and, after receiving the audio stream and video stream sent to the multicast group, plays them.

In the adaptive-bitrate scenario above, the FCC server sends the terminal a video stream but no audio stream, so the terminal may be unable to play sound, or the sound it plays may not match the video picture.

Disclosure of Invention

To solve the problem in the prior art that a terminal may be unable to play sound, or may play sound inconsistent with the video picture, embodiments of the present application provide a method, a server, and a terminal for playing a media stream. The technical solutions are as follows:

In a first aspect, a method for playing a media stream is provided. The method includes: a server receives an acquisition request sent by a terminal, where the acquisition request carries the fragment identifier of a video stream fragment in a video stream that the server has sent to the terminal; the server determines, according to the fragment identifier, the start time of the video stream sent to the terminal, and obtains the audio stream corresponding to the video stream according to the start time; and the server sends the audio stream to the terminal, so that the terminal plays the video stream and the audio stream. Because the server uses the fragment identifier in the acquisition request to determine an audio stream with the same start time as the video stream already sent to the terminal, audio and video stay synchronized when the terminal plays both streams, which solves the problem in the related art that the terminal cannot play sound or plays sound inconsistent with the video picture.

In a possible implementation of the first aspect, the video stream sent by the server to the terminal includes description information of at least one audio stream, from which the terminal selects the description information of one audio stream that matches its own capability; the acquisition request that the terminal sends to the server therefore also includes that description information. After receiving the acquisition request, the server determines the start time of the video stream sent to the terminal according to the fragment identifier in the request, determines one audio stream according to the description information in the request, and obtains, from the determined audio stream, the audio corresponding to the video stream according to the start time. In this way the terminal obtains, and plays, an audio stream that matches its own capability.
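The server-side lookup in this implementation can be sketched in Python. This is a minimal sketch under assumptions: slices are modelled by a hypothetical `Slice` record holding a fragment identifier and a start time, cached audio streams are keyed by a hypothetical `AudioDescription` record, and all names are illustrative rather than taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Slice:
    """Hypothetical slice record: fragment identifier plus start time."""
    fragment_id: int
    start_time: int  # assumed media timestamp in arbitrary ticks
    payload: bytes


@dataclass(frozen=True)
class AudioDescription:
    """Hypothetical description information (language, stream id, code rate)."""
    language: str
    stream_id: str
    bitrate_kbps: int


def audio_for_request(video_slices: List[Slice],
                      audio_streams: Dict[AudioDescription, List[Slice]],
                      fragment_id: int,
                      description: AudioDescription) -> List[Slice]:
    """Return the audio slices starting at the requested video start time."""
    # 1. Map the fragment identifier in the request to a start time.
    start_time = next((s.start_time for s in video_slices
                       if s.fragment_id == fragment_id), None)
    if start_time is None:
        raise KeyError(f"unknown fragment identifier {fragment_id}")
    # 2. Pick the audio stream named by the description information.
    audio = audio_streams[description]
    # 3. Send audio from the same start time onward, keeping A/V aligned.
    return [s for s in audio if s.start_time >= start_time]
```

A linear scan keeps the sketch short; a server caching many slices would more likely index slices by fragment identifier.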

In a possible implementation of the first aspect, before receiving the acquisition request carrying the fragment identifier, the server sends the terminal a Real-time Transport Control Protocol (RTCP) packet whose fragment timestamp field carries the fragment identifier. This allows the terminal to send the server an acquisition request carrying the fragment identifier, requesting the audio stream that corresponds to the video stream the terminal has received.
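RFC 3550 defines an application-defined RTCP packet type (APP, packet type 204) that can carry application-dependent data in 32-bit units. Because the layout of the fragment timestamp field shown in Fig. 6 is not reproduced in this text, the sketch below assumes, purely for illustration, that the fragment identifier travels as one 32-bit value in an APP packet with the hypothetical name `FRAG`; the packet actually used in the patent may differ.

```python
import struct

RTCP_APP = 204          # RTCP APP packet type (RFC 3550)
_LAYOUT = "!BBHI4sI"    # V/P/subtype, PT, length, SSRC, name, fragment id


def build_fragment_rtcp(ssrc: int, fragment_id: int, name: bytes = b"FRAG") -> bytes:
    """Pack a fragment identifier into an RTCP APP packet (hypothetical layout)."""
    if len(name) != 4:
        raise ValueError("APP name must be exactly 4 ASCII bytes")
    first_byte = 2 << 6      # version 2, no padding, subtype 0
    length_words = 3         # 16-byte packet: 16 / 4 - 1, as RFC 3550 requires
    return struct.pack(_LAYOUT, first_byte, RTCP_APP, length_words,
                       ssrc, name, fragment_id)


def parse_fragment_rtcp(packet: bytes) -> int:
    """Return the fragment identifier from a packet built as above."""
    if len(packet) < struct.calcsize(_LAYOUT):
        raise ValueError("packet too short")
    first_byte, pt, _length, _ssrc, _name, fragment_id = struct.unpack(
        _LAYOUT, packet[:struct.calcsize(_LAYOUT)])
    if first_byte >> 6 != 2 or pt != RTCP_APP:
        raise ValueError("not an RTCP APP packet")
    return fragment_id
```

The terminal-side parse in the second aspect is the inverse of the same assumed layout.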

In a possible implementation of the first aspect, the video stream sent by the server to the terminal is a first video stream with a first code rate, and the method further includes: when the server stops sending the first video stream and starts sending the terminal a second video stream with a second code rate, the first code rate being lower than the second code rate, the server sends the terminal a notification message carrying the sequence number of the last data packet of the first video stream and the sequence number of the first data packet of the second video stream. After receiving the last data packet of the first video stream, the terminal receives the first data packet of the second video stream, and the two sequence numbers are not consecutive. Without the notification, the terminal could mistakenly conclude that packets were lost and stop playing the second video stream; with it, the terminal can determine that the last data packet and the first data packet are consecutive, and therefore plays the second video stream normally starting from its first data packet.

In a possible implementation of the first aspect, the notification message sent by the server to the terminal is a Server terminal notification (SCN) message that includes an old sequence number field and a new sequence number field: the old sequence number field carries the sequence number of the last data packet of the first video stream sent by the server to the terminal, and the new sequence number field carries the sequence number of the first data packet of the second video stream sent by the server to the terminal.
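Fig. 7 shows the actual SCN layout, which is not reproduced in this text. As a sketch under stated assumptions only, the two fields can be modelled as 16-bit values, since RTP sequence numbers are 16 bits wide; the 4-byte network-order layout and the `ScnMessage` name are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ScnMessage:
    """Hypothetical model of the SCN payload (old and new sequence numbers)."""
    old_seq: int  # last data packet of the first video stream
    new_seq: int  # first data packet of the second video stream

    def __post_init__(self) -> None:
        for value in (self.old_seq, self.new_seq):
            if not 0 <= value <= 0xFFFF:
                raise ValueError("RTP sequence numbers are 16-bit values")

    def pack(self) -> bytes:
        # Assumed layout: two unsigned 16-bit fields in network byte order.
        return struct.pack("!HH", self.old_seq, self.new_seq)

    @classmethod
    def unpack(cls, data: bytes) -> "ScnMessage":
        old_seq, new_seq = struct.unpack("!HH", data[:4])
        return cls(old_seq, new_seq)
```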

In a second aspect, a method for playing a media stream is provided. The method includes: the terminal sends the server an acquisition request carrying the fragment identifier of a video stream fragment in the video stream it has received, so that the server sends, according to the acquisition request, the audio stream corresponding to the video stream, where the audio stream has the same start time as the video stream; and the terminal receives the audio stream sent by the server and plays the received video stream and audio stream. Because the video stream and the audio stream have the same start time, audio and video stay synchronized when the terminal plays them together, which solves the problem in the related art that the terminal cannot play sound or plays sound inconsistent with the video picture.

In a possible implementation of the second aspect, the video stream the terminal receives from the server includes description information of at least one audio stream; the terminal selects from it the description information of one audio stream that matches its own capability and includes that description information in the acquisition request sent to the server, so as to obtain from the server, and play, an audio stream that matches its capability.
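The text does not define "capability" further. A minimal sketch, assuming (hypothetically) that it means an ordered list of preferred languages plus a maximum audio code rate the terminal can decode, using the three kinds of description information named in the embodiment (language, stream identifier, code rate):

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AudioDescription:
    language: str      # playback language, e.g. "zh" or "en"
    stream_id: str     # audio stream identifier
    bitrate_kbps: int  # code rate of the audio stream


def select_audio(descriptions: Sequence[AudioDescription],
                 preferred_languages: Sequence[str],
                 max_bitrate_kbps: int) -> Optional[AudioDescription]:
    """Pick the description that best fits the terminal's assumed capability."""
    playable = [d for d in descriptions if d.bitrate_kbps <= max_bitrate_kbps]
    for lang in preferred_languages:            # honour preference order
        candidates = [d for d in playable if d.language == lang]
        if candidates:
            # Highest code rate the terminal can still decode.
            return max(candidates, key=lambda d: d.bitrate_kbps)
    return playable[0] if playable else None    # fall back to any playable stream
```

The selected description would then be copied into the acquisition request alongside the fragment identifier.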

In a possible implementation of the second aspect, before sending the acquisition request carrying the fragment identifier, the terminal receives an RTCP packet sent by the server, obtains from the packet's fragment timestamp field the fragment identifier of a video stream fragment in the video stream it has received, and adds the fragment identifier to the acquisition request, so that the server, after receiving the request, sends the terminal the audio stream corresponding to that video stream.

In a possible implementation of the second aspect, the video stream received by the terminal is a first video stream with a first code rate; the terminal further receives a second video stream with a second code rate sent by the server, the second code rate being greater than the first code rate. The terminal also receives a notification message from the server carrying two sequence numbers: the sequence number of the last data packet of the first video stream and the sequence number of the first data packet of the second video stream; and the terminal plays the received second video stream according to the notification message. After the last data packet of the first video stream, the terminal receives the first data packet of the second video stream, whose sequence number is not consecutive with that of the last data packet. The notification message lets the terminal determine that the two packets are nevertheless consecutive, so it plays the second video stream normally starting from that first data packet.
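The terminal-side check can be sketched as a sequence-continuity test that treats the notified (old, new) pair as consecutive. This assumes 16-bit RTP sequence numbers with wrap-around; the function names and the tuple form of the notification content are illustrative only.

```python
from typing import Iterable, List, Optional, Tuple


def is_consecutive(prev_seq: int, seq: int,
                   scn: Optional[Tuple[int, int]] = None) -> bool:
    """True if `seq` directly follows `prev_seq`.

    `scn` is the (old_seq, new_seq) pair from the notification message; the
    jump it announces is treated as continuous rather than as packet loss.
    """
    if scn is not None and (prev_seq, seq) == scn:
        return True
    return seq == (prev_seq + 1) & 0xFFFF  # RTP sequence numbers wrap at 2**16


def find_gaps(seqs: Iterable[int],
              scn: Optional[Tuple[int, int]] = None) -> List[Tuple[int, int]]:
    """Return the (prev, next) pairs the terminal would treat as packet loss."""
    gaps: List[Tuple[int, int]] = []
    prev: Optional[int] = None
    for seq in seqs:
        if prev is not None and not is_consecutive(prev, seq, scn):
            gaps.append((prev, seq))
        prev = seq
    return gaps
```

For example, a stream switching from packet 1005 of the first video stream to packet 52 of the second reports one gap without the notification and none with it.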

In a possible implementation manner of the second aspect, the notification message sent by the server and received by the terminal is an SCN message, where the SCN message includes an old sequence number field and a new sequence number field, the old sequence number field carries a sequence number of a last data packet of a first video stream received by the terminal from the server, and the new sequence number field carries a sequence number of a first data packet of a second video stream received by the terminal from the server.

In a third aspect, an apparatus for playing a media stream is provided, where the apparatus includes at least one unit, and the at least one unit is configured to implement the method for playing a media stream provided in the first aspect or any one of the possible implementation manners of the first aspect.

In a fourth aspect, an apparatus for playing a media stream is provided, where the apparatus includes at least one unit, and the at least one unit is configured to implement the method for playing a media stream provided in the second aspect or any possible implementation manner of the second aspect.

In a fifth aspect, a server is provided, which includes: a processor and a network port, wherein the processor is configured to implement the method for playing a media stream provided by the first aspect or any one of the possible implementation manners of the first aspect by executing instructions.

In a sixth aspect, a terminal is provided, which includes: a processor and a network port, wherein the processor is configured to implement the method for playing a media stream provided by the second aspect or any one of the possible implementations of the second aspect by executing instructions.

In a seventh aspect, a system for playing a media stream is provided, where the system includes the server provided in the third aspect or the fifth aspect, and the terminal provided in the fourth aspect or the sixth aspect.

In an eighth aspect, a computer-readable medium is provided, which stores instructions for implementing the method for playing a media stream provided by the first aspect or any possible implementation of the first aspect, or stores instructions for implementing the method for playing a media stream provided by the second aspect or any possible implementation of the second aspect.

Drawings

Fig. 1 is a schematic structural diagram of an IPTV system according to an exemplary embodiment of the present invention;

Fig. 2 is a schematic diagram of video stream slices and audio stream slices according to an exemplary embodiment of the present invention;

Fig. 3 is a block diagram of a server according to an exemplary embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a terminal according to an exemplary embodiment of the present invention;

Fig. 5 is a flowchart of a method for playing a media stream according to an exemplary embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an RTCP packet according to an exemplary embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an SCN packet according to an exemplary embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a server according to another exemplary embodiment of the present invention;

Fig. 9 is a schematic structural diagram of a terminal according to another exemplary embodiment of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Referring to fig. 1, a schematic structural diagram of an IPTV system according to an exemplary embodiment of the present invention is shown. The IPTV system includes: an origin server 110, a coding server 120, a multicast server 130, an FCC server 140, and a terminal 150.

The origin server 110 may be a server, a server cluster composed of several servers, or a cloud computing service center. The origin server 110 stores a media stream for each of at least one channel; the media stream includes a second video stream and an audio stream corresponding to the second video stream, and may further include a subtitle stream and the like. The origin server 110 is connected to the coding server 120 through a wired or wireless network and can transmit the media stream of each channel to the coding server 120. The channels may include television channels, live channels, carousel channels, and the like.

Typically, the media streams on the origin server 110 are produced in advance and uploaded to it by a technician. When creating a media stream, the technician may create a second video stream and at least one audio stream corresponding to it, and set description information for each audio stream; the created second video stream includes this description information. The description information of an audio stream may include its playback language, stream identifier, code rate, and similar information. For example, when creating second video stream A, the technician also creates two audio streams for it, one in Chinese and one in English, and second video stream A includes the description information of both audio streams.

The second video stream produced by the technician generally has a high bitrate and therefore high definition. For convenience of description, the bitrate of the second video stream is referred to as the second bitrate.

The coding server 120 receives the media stream of each channel from the origin server 110. For each channel, the coding server 120 uses an Adaptive Bit Rate (ABR) encoding technique to transcode the second video stream in the channel's media stream down to at least one first video stream, where the first code rates of the first video streams differ from one another and are all lower than the second code rate of the second video stream. Each channel therefore corresponds to one second video stream, at least one first video stream, at least one audio stream, and so on. Because each first video stream is obtained by transcoding the second video stream to a lower code rate, its definition is lower than that of the second video stream.

Optionally, the coding server 120 stores at least one preset first code rate and transcodes the second video stream according to the stored first code rates to obtain the at least one first video stream.
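The constraint on the preset first code rates (distinct from one another and all below the second code rate) can be sketched as a filter over a preset list. The kbps values in the usage below are hypothetical, and the transcoding step itself is outside the scope of this sketch.

```python
from typing import Iterable, List


def first_code_rates(second_kbps: int, presets_kbps: Iterable[int]) -> List[int]:
    """Return the distinct preset first code rates below the second code rate,
    highest first; each would drive one ABR transcoding of the second stream."""
    return sorted({r for r in presets_kbps if 0 < r < second_kbps}, reverse=True)
```

With a second code rate of 8000 kbps and presets of 1000, 4000, 8000, 2000, 4000, and 12000 kbps, only 4000, 2000, and 1000 kbps survive.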

The coding server 120 is connected to the multicast server 130 through a wired or wireless network and sends the media stream of each channel to the multicast server 130. Besides the second video stream and the at least one audio stream received from the origin server 110, the media stream sent to the multicast server 130 includes the at least one first video stream obtained by transcoding the second video stream.

The multicast server 130 may be a server, a server cluster composed of several servers, or a cloud computing service center. The multicast server 130 is connected to the FCC server 140 and the terminal 150 through a wired network or a wireless network, respectively.

The multicast server 130 receives the media stream of each channel from the coding server 120 and forwards it to the FCC server 140 in real time. The multicast server 130 also maintains a multicast group for each channel and sends the channel's media stream to the terminals 150 in that multicast group.

The FCC server 140 may be a server, a server cluster composed of several servers, or a cloud computing service center. The FCC server 140 is connected to the terminal 150 through a wired network or a wireless network.

The terminal 150 may be a smart TV, a set-top box, a smartphone, a tablet computer, a laptop computer, a desktop computer, or the like. When it starts up or switches channels, the terminal 150 sends the FCC server 140 a play request carrying the channel identifier of the target channel, requesting the FCC server 140 to send the media stream of the target channel.

The FCC server 140 is configured to: after receiving a play request from the terminal 150, send the terminal 150, according to the channel identifier of the target channel carried in the request, the first video stream of the target channel and the fragment identifiers of the video stream fragments in the first video stream; receive an acquisition request carrying a fragment identifier from the terminal 150; and obtain the audio stream corresponding to the first video stream according to the fragment identifier and send it to the terminal 150.

The terminal 150 is configured to receive the first video stream and the fragment identifier sent by the FCC server 140, send an acquisition request carrying the fragment identifier to the FCC server 140, receive an audio stream corresponding to the first video stream sent by the FCC server 140, and play the first video stream and the audio stream. Since the first bitrate of the first video stream is lower than the second bitrate, the terminal 150 can play the first video stream quickly.

The FCC server 140 may send the terminal 150 the second video stream of the target channel at some point after it starts sending the audio stream. Because the second bitrate of the second video stream is higher than the first bitrate of the first video stream, this improves the quality of the video picture played by the terminal 150.

The terminal 150 is further configured to: receive and play the second video stream of the target channel sent by the FCC server 140; apply to join the multicast group of the target channel on the multicast server 130 and receive the second video stream of the target channel sent by the multicast server 130; and, once the second video stream being played from the FCC server 140 coincides with the second video stream sent by the multicast server 130 (that is, both deliver the same content), play the second video stream sent by the multicast server 130.
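The handover rule in the last step can be sketched as a small state machine. The sketch assumes, for simplicity, that every packet carries a unique timestamp and that the two streams "coincide" once the multicast stream has delivered the packet most recently played from the FCC server; `SourceSwitcher` and its methods are hypothetical names.

```python
from typing import List, Optional, Tuple


class SourceSwitcher:
    """Hypothetical terminal logic for handing over from FCC unicast to multicast."""

    def __init__(self) -> None:
        self.source = "fcc"
        self.last_fcc_ts: Optional[int] = None
        self.buffer: List[Tuple[int, str]] = []  # multicast packets held while on FCC

    def on_fcc(self, ts: int, payload: str) -> List[str]:
        """Return the payloads to play after an FCC packet arrives."""
        if self.source != "fcc":
            return []                          # after handover, FCC packets are dropped
        self.last_fcc_ts = ts
        return [payload] + self._try_switch()

    def on_multicast(self, ts: int, payload: str) -> List[str]:
        """Return the payloads to play after a multicast packet arrives."""
        if self.source == "multicast":
            return [payload]
        self.buffer.append((ts, payload))
        return self._try_switch()

    def _try_switch(self) -> List[str]:
        # Switch only once the multicast stream contains the packet just played
        # from FCC, i.e. the two second video streams have converged.
        if self.last_fcc_ts not in {ts for ts, _ in self.buffer}:
            return []
        self.source = "multicast"
        pending = [p for ts, p in self.buffer if ts > self.last_fcc_ts]
        self.buffer.clear()
        return pending
```

Buffered multicast packets at or before the switch point are discarded, so no content is played twice.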

Note that, when receiving the media stream of a channel from the origin server 110, the coding server 120 may further divide the second video stream in the media stream into slices, so that the second video stream consists of multiple video stream slices, and generate a fragment identifier for each slice. Likewise, each first video stream obtained by the coding server 120 by transcoding the second video stream consists of video stream slices.

Each video stream slice in the second video stream corresponds to a video stream slice in the first video stream. A slice in the second video stream and its corresponding slice in the first video stream have the same start time, video content, and fragment identifier, but differ in definition and in the number of data packets: the slice in the second video stream contains more data packets than the corresponding slice in the first video stream. The first frame of each video stream slice, in both the second and the first video stream, is a key frame for decoding the video stream. The data packets may be RTP packets.

Referring to Fig. 2, take four consecutive video stream slices S11, S12, S13, and S14 in the first video stream as an example: slice S11 corresponds to slice S21 in the second video stream, and the fragment identifiers of S11 and S21 are both "1"; S12 corresponds to S22, and both have fragment identifier "2"; S13 corresponds to S23, and both have fragment identifier "3"; and S14 corresponds to S24, and both have fragment identifier "4".

Each video stream slice in the second video stream contains more data packets than the corresponding video stream slice in the first video stream. For example, still referring to Fig. 2, slice S11 in the first video stream contains 2 data packets, fewer than the 4 contained in slice S21.
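The slice correspondence described above can be expressed as a small consistency check. Only the packet counts of S11 (2) and S21 (4) are given in the text, so the usage below is limited to that pair; the start time is a placeholder value, and `VideoSlice` and `pair_slices` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VideoSlice:
    name: str
    fragment_id: int
    start_time: int    # placeholder tick value; Fig. 2 gives no absolute times
    packet_count: int


def pair_slices(first: List[VideoSlice],
                second: List[VideoSlice]) -> Dict[int, Tuple[VideoSlice, VideoSlice]]:
    """Pair first- and second-stream slices by fragment identifier and check
    the stated invariants (same start time, more packets in the second)."""
    by_id = {s.fragment_id: s for s in second}
    pairs: Dict[int, Tuple[VideoSlice, VideoSlice]] = {}
    for low in first:
        high = by_id.get(low.fragment_id)
        if high is None:
            raise ValueError(f"{low.name} has no counterpart in the second stream")
        if low.start_time != high.start_time:
            raise ValueError(f"{low.name} and {high.name} start at different times")
        if low.packet_count >= high.packet_count:
            raise ValueError(f"{high.name} should contain more packets than {low.name}")
        pairs[low.fragment_id] = (low, high)
    return pairs
```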

In addition, the coding server 120 may divide the audio stream in the media stream into multiple audio stream slices and generate a fragment identifier for each audio stream slice. Each video stream slice in the first and second video streams corresponds to an audio stream slice in the audio stream, and a video stream slice has the same start time as its corresponding audio stream slice.

After slicing the first video stream, the second video stream, and the audio stream in the media stream, the coding server 120 sends the media stream to the multicast server 130.

Note also that, after receiving the media stream of a channel from the coding server 120, the multicast server 130 may add a timestamp to each video stream slice in the first video stream, each video stream slice in the second video stream, and each audio stream slice in the audio stream. A video stream slice in the first video stream has the same timestamp as its corresponding slice in the second video stream and as its corresponding audio stream slice in the audio stream.

Still referring to Fig. 2, video stream slices S11 and S21 and audio stream slice S31 all carry the same timestamp: the video content of S11 and S21 is the same, and the audio content of S31 corresponds to that video content.
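The timestamp rule can be sketched by deriving one timestamp per fragment identifier and reusing it across the three streams, so that S11, S21, and S31 end up with the same value. The fixed slice duration, the playback-order list, and the function name are assumptions made for illustration; the text does not say how the multicast server computes timestamps.

```python
from typing import Dict, List


def stamp_slices(reference_ids: List[int],
                 streams: Dict[str, List[int]],
                 start: int,
                 slice_duration: int) -> Dict[str, Dict[int, int]]:
    """Give every slice a timestamp so that slices sharing a fragment
    identifier share a timestamp.

    reference_ids: fragment identifiers in playback order (assumed).
    streams: stream name -> fragment identifiers of that stream's slices.
    slice_duration: assumed constant slice length in timestamp ticks.
    """
    ts_by_fragment = {fid: start + i * slice_duration
                      for i, fid in enumerate(reference_ids)}
    return {name: {fid: ts_by_fragment[fid] for fid in ids}
            for name, ids in streams.items()}
```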

After adding timestamps to each video stream slice in the first and second video streams and to each audio stream slice in the audio stream, the multicast server 130 sends the media stream to the FCC server 140 and the terminal 150.

Referring to fig. 3, a schematic structural diagram of a server according to an exemplary embodiment of the present invention is shown, where the server may be the FCC server 140 in the embodiment shown in fig. 1, and the server includes: a processor 31, a network interface 32, a cache 33, and a memory 34.

The processor 31 includes one or more processing cores, and the processor 31 executes various functional applications and data processing by executing software programs and modules.

The network interface 32 may be multiple, with some of the network interfaces 32 being used to communicate with the multicast server 130 and some of the network interfaces 32 being used to communicate with the terminals 150.

The cache 33 is connected to the processor 31 for caching the media stream of the at least one channel received from the multicast server 130, and may be connected to the network interface 32 via a bus.

The memory 34 is connected to the processor 31, for example through a bus, and may be used to store software programs and modules.

The memory 34 may store an application module 35 required for at least one function, and the application module 35 may include a sending module 351, a processing module 352, a receiving module 353, and the like. The processor 31 performs the steps executed by the FCC server in Fig. 5 by running the sending module 351, the processing module 352, and the receiving module 353; for details, refer to the description of Fig. 5.

The memory 34 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.

Those skilled in the art will appreciate that the configuration of the server shown in FIG. 3 does not constitute a limitation of the server and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.

Referring to fig. 4, a schematic structural diagram of a terminal according to an exemplary embodiment of the present invention is shown. The terminal may be the terminal 150 in the embodiment shown in fig. 1. The terminal includes: a processor 41, a network interface 42, and a memory 43.

The processor 41 includes one or more processing cores, and the processor 41 executes various functional applications and data processing by running software programs and modules.

The network interface 42 may be multiple, wherein a portion of the network interface 42 is used for communicating with the multicast server 130, and a portion of the network interface 42 is used for communicating with the FCC server 140.

The memory 43 is connected to the processor 41, for example, the memory 43 may be connected to the processor 41 through a bus; the memory 43 may be used to store software programs and modules.

The memory 43 may store an application module 44 required for at least one function, and the application module 44 may include a sending module 441, a processing module 442, a receiving module 443, and the like. The processor 41 performs the steps executed by the terminal in Fig. 5 by running the sending module 441, the processing module 442, and the receiving module 443; for details, refer to the description of Fig. 5.

The memory 43 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as SRAM, EEPROM, EPROM, PROM, ROM, magnetic memory, flash memory, a magnetic disk, or an optical disc.

Those skilled in the art will appreciate that the configuration of the terminal shown in fig. 4 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

Referring to Fig. 5, a method for playing a media stream according to an exemplary embodiment of the present invention is shown. The method is applied to the IPTV system shown in Fig. 1 and may include the following steps:

Step 501, the terminal sends a play request carrying the channel identifier of a target channel to an FCC server.

The terminal may send the play request to the FCC server in either of the following cases:

First, when the terminal needs to switch from the original channel currently being played to the target channel, it sends the play request to the FCC server.

While the terminal is playing the media stream of the original channel, the user can select the target channel to switch to. When the terminal detects the selected target channel, it acquires the channel identifier of the target channel and sends a play request carrying that channel identifier to the FCC server.

When the terminal detects the selected target channel, it also disconnects from the multicast server so as to stop receiving the media stream of the original channel sent by the multicast server.

Second, when the terminal starts up, it takes a preset channel as the target channel and sends a play request to the FCC server to request to join the target channel.

The preset channel may be a default channel set before the terminal leaves the factory, or the channel that the terminal was playing when it was last turned off. Therefore, at the current start-up the terminal can acquire the channel identifier of the default channel, or of the channel played when it was last turned off, use the acquired channel identifier as the channel identifier of the target channel, and send a play request carrying that channel identifier to the FCC server.

Step 502, the FCC server receives a play request sent by the terminal, and sends a first video stream of a target channel and a fragment identifier of a video stream fragment included in the first video stream to the terminal according to a channel identifier of the target channel carried in the play request.

The FCC server receives and caches the media stream of each channel sent by the multicast server in real time, wherein the media stream of each channel comprises a first video stream, a second video stream, at least one audio stream and the like.

This step may be implemented as follows: the FCC server receives the play request sent by the terminal and, according to the channel identifier of the target channel carried in the play request, determines the first video stream of the target channel that it has cached. From that cached first video stream, the FCC server obtains a key frame whose time is before the current time, and then obtains the portion of the first video stream from the time of that key frame up to the current time. The FCC server sends the obtained first video stream and a play response to the terminal, where the play response carries the fragment identifier of a video stream fragment included in the obtained first video stream.

The FCC server may obtain the key frame in various ways. For example, it may obtain, from the cached first video stream of the target channel, the key frame that is before the current time and closest to it, or it may randomly select any key frame before the current time.

When sending the obtained first video stream, the FCC server may send it at a rate greater than the rate at which it receives the first video stream of the target channel. For example, in an implementation, the FCC server determines the rate at which it is currently receiving the first video stream of the target channel, calculates a sending rate from a preset first multiple and that receiving rate, and sends the obtained first video stream to the terminal at the sending rate. The first multiple is greater than 1, for example 1.4, 1.5, or 1.6, and the sending rate is obtained by multiplying the receiving rate by the first multiple.
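As a minimal sketch of this rate rule, the Python below estimates a receiving rate from a byte count over a time window and multiplies it by a configured multiple. The function names, the byte-count-over-window measurement, and the bit/s unit are assumptions for illustration; the text only states that the sending rate equals the receiving rate multiplied by the first multiple.

```python
def measure_receive_rate(bytes_received: int, window_seconds: float) -> float:
    """Estimate a receiving rate in bit/s from bytes counted over a window (assumed method)."""
    if window_seconds <= 0:
        raise ValueError("measurement window must be positive")
    return bytes_received * 8 / window_seconds


def sending_rate(receive_rate_bps: float, multiple: float) -> float:
    """Sending rate = receiving rate x multiple.

    A multiple greater than 1 (for example a first multiple of 1.5) makes
    the FCC server send faster than it receives the stream.
    """
    if receive_rate_bps < 0:
        raise ValueError("receiving rate cannot be negative")
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return receive_rate_bps * multiple


# Hypothetical numbers: 1,000,000 bytes received in 2 s, first multiple 1.5.
rate_in = measure_receive_rate(1_000_000, 2.0)   # 4,000,000 bit/s
rate_out = sending_rate(rate_in, 1.5)            # 6,000,000 bit/s
```

The same multiplication applies to the second multiple used for the audio stream and to the third multiple (less than 1) used for the second video stream; only the configured value changes.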

The FCC server may send the obtained first video stream and the play response in several orders. In one, it sends the play response first and then the obtained first video stream; in another, it sends the obtained first video stream and the play response to the terminal concurrently. In the concurrent case, the FCC server may send the play response either at the moment it starts sending the obtained first video stream or at some point after that.

The fragment identifier carried in the play response is the fragment identifier of the nth video stream fragment included in the obtained first video stream, where n is a positive integer. Typically n is 1, although it may also be 2 or 3.
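The key-frame selection and the choice of the nth fragment identifier can be sketched as follows, using the "closest key frame before the current time" option. This is an illustration under stated assumptions: `CachedFrame`, its fields, the seconds-based timestamps, and a cache sorted by timestamp are all hypothetical; the text does not describe the FCC server's cache layout.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CachedFrame:
    timestamp: float      # seconds on the channel clock (assumed unit)
    is_key_frame: bool
    fragment_id: int      # identifier of the video stream fragment holding the frame


def burst_frames(cache: Sequence[CachedFrame], now: float) -> List[CachedFrame]:
    """Frames from the latest key frame at or before `now` up to `now`.

    Assumes `cache` is sorted by timestamp.
    """
    start = -1
    for i, frame in enumerate(cache):
        if frame.timestamp > now:
            break
        if frame.is_key_frame:
            start = i
    if start < 0:
        raise LookupError("no cached key frame at or before the current time")
    return [f for f in cache[start:] if f.timestamp <= now]


def nth_fragment_id(sent: Sequence[CachedFrame], n: int = 1) -> int:
    """Fragment identifier of the n-th fragment (1-based) among the frames sent."""
    order: List[int] = []
    for frame in sent:
        if not order or order[-1] != frame.fragment_id:
            order.append(frame.fragment_id)
    if not 1 <= n <= len(order):
        raise ValueError("n is out of range for the frames sent")
    return order[n - 1]
```

With n = 1 the play response would carry the identifier of the fragment containing the chosen key frame.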

The play response may be a real-time transport control protocol (RTCP) packet. Referring to fig. 6, the RTCP packet includes a default/adaptive bitrate (Fault bit/Adaptive bit) field, an identifier of the current unicast synchronization source (SSRC of current single Burst) field, a user network protocol address (User IP) field, a user RTCP port (User RTCP Port) field, a Reserved field, an identification (repetition ID) field, a Bandwidth field, and an extended Fragment Time Stamp field. The Fragment Time Stamp field carries the fragment identifier of a video stream fragment included in the obtained first video stream.
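The sketch below shows how a fragment identifier could be appended to and read back from an RTCP play response. The text does not specify the width or position of the Fragment Time Stamp field, so this assumes a single 32-bit unsigned word in network byte order placed at the end of the packet; only the RTCP length-field rule (32-bit words minus one, in bytes 2 and 3) comes from RFC 3550.

```python
import struct

# Assumption: the extended Fragment Time Stamp field is one 32-bit unsigned
# word in network byte order, appended as the last word of the play-response
# RTCP packet. The patent text does not give its width or position.
_FRAGMENT_TS = struct.Struct("!I")


def append_fragment_timestamp(rtcp_packet: bytes, fragment_id: int) -> bytes:
    """Append the fragment identifier and update the RTCP length field."""
    if len(rtcp_packet) < 4 or len(rtcp_packet) % 4:
        raise ValueError("an RTCP packet is a whole number of 32-bit words")
    out = bytearray(rtcp_packet)
    out += _FRAGMENT_TS.pack(fragment_id)
    # RFC 3550: bytes 2-3 hold the packet length in 32-bit words minus one.
    struct.pack_into("!H", out, 2, len(out) // 4 - 1)
    return bytes(out)


def read_fragment_timestamp(rtcp_packet: bytes) -> int:
    """Read the fragment identifier from the assumed trailing field."""
    if len(rtcp_packet) < 4 + _FRAGMENT_TS.size:
        raise ValueError("packet too short to carry a Fragment Time Stamp field")
    offset = len(rtcp_packet) - _FRAGMENT_TS.size
    (fragment_id,) = _FRAGMENT_TS.unpack_from(rtcp_packet, offset)
    return fragment_id
```

A real deployment would follow whatever field layout fig. 6 defines; only the pack/unpack pattern carries over.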

In step 503, the terminal receives the first video stream and the fragment identifier of the video stream fragment sent by the FCC server, and sends an acquisition request carrying the fragment identifier to the FCC server.

This step may be implemented as follows: the terminal receives the first video stream and the play response sent by the FCC server, extracts from the play response the fragment identifier of the video stream fragment included in the first video stream, and sends an acquisition request carrying that fragment identifier to the FCC server.

The first video stream includes description information of at least one audio stream. According to its own capability information, the terminal may select the description information of one audio stream from the description information of the at least one audio stream, in which case the acquisition request may further carry the selected description information.

The capability information of the terminal may include a bandwidth size of the terminal, a decoding capability of the terminal, and/or hardware resources of the terminal, etc.

If the first video stream includes description information of only one audio stream, or if the audio stream selected by the terminal is a default audio stream agreed upon in advance between the terminal and the FCC server, the acquisition request may or may not carry the description information of that audio stream.
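A minimal sketch of the terminal-side selection follows. The text names bandwidth, decoding capability, and hardware resources as possible inputs but gives no selection policy, so the rule used here (highest-bitrate audio stream whose codec the terminal can decode and whose bitrate fits the available bandwidth) is an assumption, as are `AudioDescription`, `Capability`, and the dictionary form of the acquisition request.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence, Union


@dataclass(frozen=True)
class AudioDescription:
    audio_id: str            # hypothetical identifier of one audio stream
    codec: str
    bitrate_kbps: int


@dataclass(frozen=True)
class Capability:
    bandwidth_kbps: int      # bandwidth available for the audio stream (assumed)
    codecs: FrozenSet[str]   # codecs the terminal can decode


def choose_audio(descs: Sequence[AudioDescription], cap: Capability) -> Optional[AudioDescription]:
    """Pick the highest-bitrate audio stream the terminal can decode and carry."""
    usable = [d for d in descs
              if d.codec in cap.codecs and d.bitrate_kbps <= cap.bandwidth_kbps]
    return max(usable, key=lambda d: d.bitrate_kbps, default=None)


def acquisition_request(fragment_id: int,
                        descs: Sequence[AudioDescription],
                        cap: Capability) -> Dict[str, Union[int, str]]:
    """Build a hypothetical acquisition request carrying the fragment identifier."""
    request: Dict[str, Union[int, str]] = {"fragment_id": fragment_id}
    if len(descs) > 1:  # with a single description the field may be omitted
        chosen = choose_audio(descs, cap)
        if chosen is None:
            raise LookupError("no audio stream matches the terminal capability")
        request["audio_id"] = chosen.audio_id
    return request
```

The pre-agreed default-stream case is not modeled here; a terminal using it could simply omit `audio_id` as in the single-description branch.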

In step 504, the FCC server receives the acquisition request sent by the terminal, and determines the start time of the first video stream that the FCC server has sent to the terminal according to the fragment identifier in the acquisition request.

This step may be implemented as follows: the FCC server receives the acquisition request sent by the terminal and extracts the fragment identifier it carries, which is the fragment identifier of the nth video stream fragment included in the first video stream sent to the terminal. From the extracted fragment identifier, the FCC server determines the fragment identifier of the first video stream fragment included in that first video stream, obtains the timestamp of the first video stream fragment according to its fragment identifier, and takes that timestamp as the start time of the first video stream sent to the terminal.

If n is 1, the extracted fragment identifier is itself the fragment identifier of the first video stream fragment. If n is greater than 1, the fragment identifier of the first video stream fragment is calculated from the extracted fragment identifier and the value of n.

Generally, the fragment identifiers of the video stream fragments in the first video stream increase consecutively, and the identifiers of two adjacent video stream fragments differ by 1. Therefore, if the fragment identifier extracted from the acquisition request is m, the fragment identifier of the first video stream fragment in the first video stream is m-(n-1), that is, m-n+1. For example, if m is 5 and n is 3, the first video stream fragment has the fragment identifier 3.
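The calculation can be sketched as below. If the nth fragment has identifier m and adjacent identifiers differ by exactly 1, the first fragment is m-(n-1), which reduces to m when n is 1, as required by the n = 1 case. The mapping from fragment identifier to timestamp stands in for the FCC server's cache and is a hypothetical structure.

```python
from typing import Mapping


def first_fragment_id(m: int, n: int) -> int:
    """Identifier of the first fragment sent, given that the n-th one is m.

    Assumes identifiers of adjacent fragments differ by exactly 1.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return m - (n - 1)


def stream_start_time(m: int, n: int, fragment_timestamps: Mapping[int, float]) -> float:
    """Start time of the first video stream = timestamp of its first fragment."""
    first = first_fragment_id(m, n)
    if first not in fragment_timestamps:
        raise LookupError(f"fragment {first} is not in the FCC server cache")
    return fragment_timestamps[first]


# Hypothetical cache: fragments 3, 4, 5 start at 30 s, 32 s, 34 s.
start = stream_start_time(5, 3, {3: 30.0, 4: 32.0, 5: 34.0})  # 30.0
```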

Step 505, the FCC server obtains an audio stream corresponding to the first video stream according to the start time, and sends the audio stream to the terminal.

When the acquisition request received by the FCC server does not carry description information of an audio stream, one of two situations applies. In the first, the first video stream includes description information of only one audio stream, that is, the target channel corresponds to only one audio stream. In the second, the terminal has selected the default audio stream agreed upon in advance with the FCC server.

In this case, the step may be implemented as follows. In the first situation, the FCC server determines the single audio stream of the target channel that it has cached, obtains from it the portion from the start time to the current time, and sends that portion to the terminal. In the second situation, the FCC server determines, from the at least one audio stream of the target channel that it has cached, the default audio stream corresponding to the description information of the default audio stream, obtains from it the portion from the start time to the current time, and sends that portion to the terminal.

When the acquisition request received by the FCC server carries description information of an audio stream, this step may be implemented as follows: the FCC server determines, from the at least one audio stream of the target channel that it has cached, the audio stream corresponding to the description information, obtains from that audio stream the portion from the start time to the current time, and sends that portion to the terminal.
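The three cases (description carried in the request, single audio stream, pre-agreed default) can be sketched as one selection function. `AudioPacket`, the string keys of the cache, and timestamps sharing the video clock are assumptions for illustration; the text does not say how the FCC server indexes its cached audio streams.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional, Sequence


@dataclass(frozen=True)
class AudioPacket:
    timestamp: float   # seconds on the same clock as the video (assumed)
    payload: bytes


def audio_for_terminal(cache: Mapping[str, Sequence[AudioPacket]],
                       requested_id: Optional[str],
                       default_id: str,
                       start: float,
                       now: float) -> List[AudioPacket]:
    """Audio packets of the chosen stream from the start time to the current time.

    `cache` maps a hypothetical audio-description key to the cached audio
    stream of the target channel.
    """
    if requested_id is not None:      # description carried in the request
        key = requested_id
    elif len(cache) == 1:             # the channel has only one audio stream
        key = next(iter(cache))
    else:                             # pre-agreed default audio stream
        key = default_id
    if key not in cache:
        raise LookupError(f"no cached audio stream for {key!r}")
    return [p for p in cache[key] if start <= p.timestamp <= now]
```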

When sending the obtained audio stream, the FCC server may send it at a rate greater than the rate at which it receives the audio stream. For example, in an implementation, the FCC server determines the rate at which it is currently receiving the audio stream from the multicast server, calculates a sending rate from a preset second multiple and that receiving rate, and sends the audio stream to the terminal at the sending rate. The second multiple is greater than 1, for example 1.4, 1.5, or 1.6, and the sending rate is obtained by multiplying the receiving rate by the second multiple. The second multiple may or may not be equal to the first multiple.

In step 506, the terminal receives the audio stream sent by the FCC server, and provides the received video stream and audio stream to a player on the terminal for playing.

Because the audio stream and the video stream received by the terminal have the same start time, the terminal can keep the sound and the video picture of the target channel synchronized by playing the audio stream and the video stream together.

In step 507, when the FCC server stops sending the first video stream to the terminal and starts sending the second video stream of the target channel to the terminal, it sends a notification message to the terminal, where the notification message carries the sequence number of the last data packet of the first video stream and the sequence number of the first data packet of the second video stream.

After the FCC server starts sending the first video stream of the target channel to the terminal, it keeps sending the first video stream to the terminal while also receiving the first video stream from the multicast server. When the data packet of the first video stream that the FCC server is currently sending to the terminal is the same as the data packet of the first video stream that it is currently receiving from the multicast server, the FCC server determines the video stream fragment to which the data packet currently being sent belongs, and stops sending the first video stream to the terminal once that video stream fragment has been sent completely. At the same time, the FCC server starts sending the second video stream of the target channel to the terminal, where the first video stream fragment of the second video stream is the fragment that follows the determined video stream fragment.

For example, referring to fig. 2, assume that the data packet of the first video stream that the FCC server is currently sending to the terminal is the data packet with sequence number 4, and that the data packet of the first video stream that the FCC server is currently receiving is also the data packet with sequence number 4. The FCC server determines that this data packet belongs to video stream fragment S12, whose fragment identifier is 2. When the FCC server finishes sending video stream fragment S12, it sends the second video stream to the terminal. The first video stream fragment of the second video stream is S23, whose fragment identifier is 3, that is, the video stream fragment following S12 (fragment identifier 2).
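The switch-point rule can be sketched as follows: find the fragment that contains the caught-up packet, finish it, and continue with the next fragment identifier in the second video stream. The fragment-to-packet mappings are hypothetical, and the example data only loosely follows fig. 2 (packets 4 and 9 and fragments 2 and 3 come from the text; the other sequence numbers are made up).

```python
from typing import Dict, List, Tuple


def switch_point(first_stream: Dict[int, List[int]],
                 second_stream: Dict[int, List[int]],
                 caught_up_seq: int) -> Tuple[int, int]:
    """Return (last seq of the first stream, first seq of the second stream).

    first_stream / second_stream map fragment identifier -> ordered packet
    sequence numbers; caught_up_seq is the packet being sent to the terminal
    when it equals the packet currently received from the multicast server.
    """
    for fragment_id, seqs in first_stream.items():
        if caught_up_seq in seqs:
            # Finish the current fragment, then continue with the next one.
            next_fragment = fragment_id + 1
            if next_fragment not in second_stream:
                raise LookupError(f"fragment {next_fragment} not cached in the second stream")
            return seqs[-1], second_stream[next_fragment][0]
    raise LookupError(f"packet {caught_up_seq} not found in the first stream")


# Loosely after fig. 2: fragment 2 (S12) ends with packet 4, fragment 3 (S23) starts with 9.
first = {1: [1, 2], 2: [3, 4]}
second = {3: [9, 10]}
```

The returned pair is exactly what the notification message in step 507 carries.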

Because the sequence number of the last data packet of the first video stream and that of the first data packet of the second video stream are not consecutive, the terminal could mistakenly judge that data packets have been lost. To solve this problem, when the FCC server starts sending the second video stream of the target channel to the terminal, it sends a notification message carrying these two sequence numbers, informing the terminal that the last data packet of the first video stream and the first data packet of the second video stream are consecutive data packets.

For example, still referring to fig. 2, the sequence number of the last data packet in video stream fragment S12 is 4. After the FCC server finishes sending that data packet, it sends the second video stream to the terminal, whose first video stream fragment is S23, and the sequence number of the first data packet in S23 is 9. The terminal therefore receives the data packet with sequence number 9 immediately after the data packet with sequence number 4. Because the terminal also receives the notification message carrying sequence numbers 4 and 9, it can determine from the message that these two data packets are consecutive and that no data packets have been lost.
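The terminal-side check can be sketched as a gap detector that accepts the one jump announced by the notification message. It assumes RTP-style 16-bit sequence numbers that wrap around; the text does not state the sequence-number width, and the function name is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

SEQ_MOD = 1 << 16  # assumed RTP-style 16-bit sequence numbers that wrap


def reported_gaps(seqs: Iterable[int],
                  notified: Optional[Tuple[int, int]] = None) -> List[Tuple[int, int]]:
    """Pairs (previous, current) that the terminal would treat as packet loss.

    `notified` holds (last seq of first stream, first seq of second stream)
    from the notification message; that jump is accepted as continuous.
    """
    gaps: List[Tuple[int, int]] = []
    prev: Optional[int] = None
    for seq in seqs:
        if prev is not None and seq != (prev + 1) % SEQ_MOD and (prev, seq) != notified:
            gaps.append((prev, seq))
        prev = seq
    return gaps
```

Without the notification, the jump from 4 to 9 in the fig. 2 example is reported as a loss; with `notified=(4, 9)` it is not.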

The FCC server, when transmitting the second video stream, may transmit the second video stream at a lower rate than it receives the second video stream. For example, in implementation, the FCC server determines a receiving rate of the second video stream of the target channel currently received by the FCC server, calculates a sending rate according to a preset third multiple and the receiving rate, and sends the obtained second video stream to the terminal according to the sending rate. The third multiple is smaller than 1, for example, the third multiple can be equal to 0.6, 0.7, 0.8, and the magnitude of the sending rate is obtained by multiplying the receiving rate by the third multiple.

Optionally, the notification message may be an SCN message, referring to fig. 7, where the SCN message includes a Final Adaptive code rate (Final Adaptive binary) field, an extended old Sequence Number (Last Sequence Number) field, and a new Sequence Number (First Sequence Number) field, where the old Sequence Number field is used to carry the Sequence Number of the Last data packet, and the new Sequence Number field is used to carry the Sequence Number of the First data packet.
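A sketch of encoding and decoding the two sequence-number fields of the SCN message follows. Fig. 7 is not reproduced here, so the layout is an assumption: the old (Last) and new (First) Sequence Number fields are taken to be two adjacent 16-bit unsigned integers in network byte order, and the Final Adaptive field and any header are omitted.

```python
import struct
from typing import Tuple

# Assumption: the old (Last) and new (First) Sequence Number fields are two
# adjacent 16-bit unsigned integers in network byte order. Other SCN fields,
# such as Final Adaptive, are left out of this sketch.
_SCN_SEQ_FIELDS = struct.Struct("!HH")


def pack_scn_seq_fields(last_seq: int, first_seq: int) -> bytes:
    """Encode the old and new sequence number fields (struct.error if out of range)."""
    return _SCN_SEQ_FIELDS.pack(last_seq, first_seq)


def unpack_scn_seq_fields(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode (last seq of first stream, first seq of second stream)."""
    last_seq, first_seq = _SCN_SEQ_FIELDS.unpack_from(data, offset)
    return last_seq, first_seq
```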

Step 508, the terminal receives the notification message sent by the FCC server, plays, according to the notification message, the second video stream sent by the FCC server, joins the multicast group of the target channel, and receives the second video stream and the audio stream sent by the multicast server.

The terminal receives the notification message sent by the FCC server as well as the second video stream sent by the FCC server, and obtains the sequence number of the last data packet of the first video stream and the sequence number of the first data packet of the second video stream carried in the notification message. Based on these two sequence numbers, the terminal determines that the last data packet of the first video stream and the first data packet of the second video stream are two consecutive data packets, and then plays the second video stream starting from its first data packet.

While playing the second video stream, the terminal also sends a join request carrying the channel identifier of the target channel to the multicast server. The multicast server receives the join request, adds the terminal to the multicast group of the target channel according to the channel identifier, and then sends the second video stream and the audio stream of the target channel to the terminal. After starting to play the second video stream sent by the FCC server, the terminal receives and caches the second video stream and the audio stream sent by the multicast server.

In step 509, when the data packet of the second video stream currently being played by the terminal is the same as the first data packet of the second video stream sent by the multicast server, the terminal plays the second video stream and the audio stream sent by the multicast server.
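The hand-over in step 509 can be simulated as below: the terminal plays FCC packets until the packet about to be played matches the first packet it has buffered from the multicast server, then continues from the multicast buffer. Identifying "the same data packet" by a shared sequence number is an assumption, and the function name and return shape are hypothetical.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def play_until_switch(fcc_seqs: Iterable[int],
                      multicast_buffer: Sequence[int]) -> Tuple[List[Tuple[str, int]], Optional[int]]:
    """Play FCC packets until one equals the first buffered multicast packet,
    then play from the multicast buffer.

    Returns the (source, seq) play order and the seq at which the switch
    happened, or None if no switch occurred.
    """
    played: List[Tuple[str, int]] = []
    first_multicast = multicast_buffer[0] if multicast_buffer else None
    for seq in fcc_seqs:
        if seq == first_multicast:
            played.extend(("multicast", s) for s in multicast_buffer)
            return played, seq
        played.append(("fcc", seq))
    return played, None
```

After the switch, the terminal no longer needs the FCC stream, which is why it may then disconnect from the FCC server.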

Once the terminal plays the second video stream and the audio stream sent by the multicast server, it has completed the switch from the original channel to the target channel, or has completed joining the target channel after start-up. At this point, the terminal may also disconnect from the FCC server to stop receiving the second video stream and the audio stream of the target channel sent by the FCC server.

In summary, in the method for playing a media stream provided by the embodiments of the present invention, the terminal sends an acquisition request to the FCC server, where the acquisition request carries the fragment identifier of a video stream fragment in the video stream that the terminal has received from the FCC server. The FCC server determines, according to the fragment identifier, the start time of the video stream sent to the terminal, obtains the audio stream corresponding to the video stream according to that start time, and sends the audio stream to the terminal, so that the audio stream and the video stream have the same start time. Because the terminal receives from the FCC server a video stream and an audio stream with the same start time, audio and video stay synchronized when the terminal plays them together. This solves the problem in the related art that the terminal cannot play sound or plays sound inconsistent with the video picture, and ensures that the sound played by the terminal is consistent with the video picture.

The video stream sent by the server to the terminal includes description information of at least one audio stream, so the terminal can select the description information of one audio stream that matches its capability and include it in the acquisition request sent to the server. The terminal can therefore obtain and play an audio stream that matches its own capability.

The server sends an RTCP packet to the terminal with the fragment identifier carried in its Fragment Time Stamp field, so that the terminal can send an acquisition request carrying the fragment identifier to the server and obtain from the server the audio stream corresponding to the video stream it has received.

When the server stops sending the first video stream to the terminal and starts sending the second video stream with the second code rate, it sends the terminal a notification message carrying the sequence number of the last data packet of the first video stream and the sequence number of the first data packet of the second video stream. From this message the terminal determines that the last data packet and the first data packet are consecutive, which prevents the terminal from mistakenly judging that data packets have been lost and allows it to play the second video stream normally from its first data packet.

Referring to fig. 8, a server 800 provided by another exemplary embodiment of the present invention is shown. The server 800 may be the FCC server in the embodiments shown in fig. 1, fig. 3 and/or fig. 5, and includes a sending unit 810, a processing unit 820, and a receiving unit 830.

A sending unit 810, configured to perform a function of at least one of the steps 502, 505, and 507.

A processing unit 820, configured to perform a function of at least one of the steps 504 and 505.

A receiving unit 830, configured to perform a function of at least one of the steps 502 and 504.

For details, refer to the method embodiments described above.

It should be noted that, when the server provided in the above embodiment provides the terminal with the media stream of the target channel that the terminal requests to play, the division into the above functional modules is merely an example. In practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the server may be divided into different functional modules to complete all or some of the functions described above. In addition, the server provided in the above embodiment and the method embodiments for playing a media stream belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.

Referring to fig. 9, a terminal 900 provided by another exemplary embodiment of the present invention is shown. The terminal 900 may be the terminal 150 in the embodiments shown in fig. 1, fig. 4 and/or fig. 5, and includes a sending unit 910, a processing unit 920, and a receiving unit 930.

A sending unit 910, configured to perform a function of at least one of the steps 501, 503, and 508.

A processing unit 920, configured to perform the function of at least one of step 506, step 508, and step 509.

A receiving unit 930 configured to perform a function of at least one of the step 503, the step 506, and the step 508.

It should be noted that, when the terminal provided in the above embodiment plays a media stream, the division into the above functional modules is merely an example. In practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the terminal may be divided into different functional modules to complete all or some of the functions described above. In addition, the terminal provided in the above embodiment and the method embodiments for playing a media stream belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method of playing a media stream, the method comprising:

a server receives an acquisition request which is sent by a terminal and carries a fragment identifier of a video stream fragment, wherein the video stream fragment is a video stream fragment contained in a video stream sent to the terminal by the server;

the server acquires the starting time of the video stream according to the fragment identification of the video stream fragment;

the server acquires an audio stream corresponding to the video stream according to the starting time;

the server sends the audio stream to the terminal so that the terminal plays the video stream and the audio stream;

the video stream sent to the terminal by the server is a first video stream with a first code rate;

the method further comprises the following steps:

when the server stops sending the first video stream to the terminal and sends a second video stream with a second code rate to the terminal, the server sends a notification message to the terminal, wherein the notification message carries a sequence number of a last data packet of the first video stream and a sequence number of a first data packet of the second video stream, and the first code rate is smaller than the second code rate.

2. The method of claim 1, wherein the video stream includes description information of at least one audio stream, and the acquisition request further carries description information of one audio stream;

wherein the obtaining, by the server, of the audio stream corresponding to the video stream according to the starting time comprises:

the server determines the audio stream according to the description information of the audio stream;

and the server acquires the audio stream corresponding to the video stream from the audio stream according to the starting time.

3. The method according to claim 1 or 2, wherein before the server receives an acquisition request carrying segment identifiers of video stream segments sent by a terminal, the method further comprises:

the server sends a real-time transport control protocol (RTCP) packet to the terminal, wherein a fragment timestamp field of the RTCP packet carries the fragment identifier of the video stream fragment.

4. The method of claim 1, wherein the notification message is an SCN packet sent by the server to the terminal, the SCN packet comprising an old sequence number field and a new sequence number field, the old sequence number field carrying the sequence number of the last data packet, and the new sequence number field carrying the sequence number of the first data packet.

5. A method of channel switching, the method comprising:

a terminal sends an acquisition request carrying fragment identification of a video stream fragment to a server, wherein the video stream fragment is a video stream fragment contained in a video stream sent by the server and received by the terminal;

the terminal receives an audio stream sent by the server according to the fragment identifier, and the starting time of the audio stream is the same as the starting time of the video stream;

the terminal plays the video stream and the audio stream;

the video stream received by the terminal is a first video stream with a first code rate;

the method further comprises the following steps:

the terminal receives a second video stream with a second code rate and a notification message, wherein the second code rate is greater than the first code rate, and the notification message carries a sequence number of a last data packet of the first video stream and a sequence number of a first data packet of the second video stream;

and the terminal plays the second video stream according to the notification message.

6. The method of claim 5, wherein the video stream includes description information of at least one audio stream;

wherein the sending, by the terminal, of the acquisition request carrying the fragment identifier of the video stream fragment to the server comprises:

the terminal selects the description information of one audio stream from the description information of the at least one audio stream;

and the terminal sends an acquisition request carrying the fragment identification of the video stream fragment and the description information of the audio stream to the server.

7. The method according to claim 5 or 6, further comprising, before the terminal sends the acquisition request carrying the fragment identifier of the video stream fragment to the server:

the terminal receives a real-time transport control protocol (RTCP) packet sent by the server, and acquires the fragment identifier of the video stream fragment carried in a fragment timestamp field of the RTCP packet.

8. The method of claim 5, wherein the notification message is an SCN packet sent by the server to the terminal, the SCN packet comprising an old sequence number field and a new sequence number field, the old sequence number field carrying the sequence number of the last data packet, and the new sequence number field carrying the sequence number of the first data packet.

9. A server, characterized in that the server comprises:

a receiving unit, configured to receive an acquisition request carrying a fragment identifier of a video stream fragment sent by a terminal, where the video stream fragment is a video stream fragment included in a video stream sent by the server to the terminal;

a processing unit, configured to acquire the starting time of the video stream according to the fragment identifier of the video stream fragment, and to acquire an audio stream corresponding to the video stream according to the starting time;

a sending unit, configured to send the audio stream to the terminal, so that the terminal plays the video stream and the audio stream;

the sending unit is further configured to send a notification message to the terminal when the sending of the first video stream to the terminal is stopped and a second video stream with a second bitrate is sent to the terminal, where the notification message carries a sequence number of a last data packet of the first video stream and a sequence number of a first data packet of the second video stream, and the first bitrate is smaller than the second bitrate.

10. The server according to claim 9, wherein the video stream includes description information of at least one audio stream, and the acquisition request further carries description information of one audio stream;

the processing unit is further configured to:

determine the audio stream according to the description information of the audio stream; and

acquire, from the determined audio stream according to the start time, the audio stream corresponding to the video stream.
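The two steps of claim 10 (pick the audio stream named in the request, then cut it at the video start time) can be sketched as below. This assumes the description information is a simple string such as a language tag and that each track is a timestamp-sorted list of frames; `AudioTrack` and `acquire_audio` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class AudioTrack:
    description: str  # assumed to be a language tag such as "eng"; the patent leaves it open
    frames: list[tuple[int, bytes]] = field(default_factory=list)  # (timestamp, payload), sorted

def acquire_audio(tracks: list[AudioTrack], description: str, start_time: int) -> list[tuple[int, bytes]]:
    # Step 1: determine the audio stream named by the description in the request.
    track = next((t for t in tracks if t.description == description), None)
    if track is None:
        raise LookupError(f"no audio stream described as {description!r}")
    # Step 2: keep only the part of that stream starting at the video start time.
    return [(ts, data) for ts, data in track.frames if ts >= start_time]

tracks = [
    AudioTrack("eng", [(0, b"e0"), (100, b"e1")]),
    AudioTrack("chi", [(0, b"c0"), (100, b"c1")]),
]
print(acquire_audio(tracks, "chi", start_time=100))  # [(100, b'c1')]
```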

11. The server according to claim 9 or 10, wherein the sending unit is further configured to send a real-time transport control protocol (RTCP) packet to the terminal, where a fragment timestamp field of the RTCP packet carries the fragment identifier of the video stream fragment.

12. The server according to claim 9, wherein the notification message is an SCN packet sent by the server to the terminal, the SCN packet comprising an old sequence number field and a new sequence number field, where the old sequence number field carries the sequence number of the last data packet and the new sequence number field carries the sequence number of the first data packet.

13. A terminal, characterized in that the terminal comprises:

a sending unit, configured to send an acquisition request carrying a fragment identifier of a video stream fragment to a server, where the video stream fragment is a video stream fragment included in a video stream sent by the server and received by the terminal;

a receiving unit, configured to receive an audio stream sent by the server according to the fragment identifier, where a start time of the audio stream is the same as a start time of the video stream;

a processing unit, configured to play the video stream and the audio stream;

the received video stream is a first video stream with a first bitrate;

the receiving unit is further configured to receive a second video stream with a second bitrate and a notification message, where the second bitrate is higher than the first bitrate, and the notification message carries a sequence number of a last data packet of the first video stream and a sequence number of a first data packet of the second video stream;

the processing unit is further configured to play the second video stream according to the notification message.
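Claim 13 states only that the terminal plays the second video stream according to the notification message. One plausible reading, sketched here under that assumption, is that the terminal plays the first stream up to and including the packet numbered by the old sequence number, then continues with the second stream from the packet numbered by the new sequence number. The comparison tolerates 16-bit wraparound, matching the RTP sequence-number width; `splice` and `seq_not_after` are hypothetical names, and packets are assumed to arrive in order.

```python
def seq_not_after(a: int, b: int) -> bool:
    """True if 16-bit sequence number a is at or before b, tolerating wraparound."""
    return ((b - a) & 0xFFFF) < 0x8000

def splice(first, second, old_seq: int, new_seq: int):
    """Return packets in play order across the bitrate switch.

    first/second: lists of (sequence_number, payload) for the first and second video
    streams (hypothetical representation). The first stream is played up to and
    including old_seq, and the second stream from new_seq onward.
    """
    played = [p for p in first if seq_not_after(p[0], old_seq)]
    played += [p for p in second if seq_not_after(new_seq, p[0])]
    return played

first = [(65534, "a"), (65535, "b"), (0, "c"), (1, "d")]  # wraps past 65535
second = [(500, "x"), (501, "y"), (502, "z")]
print([payload for _, payload in splice(first, second, old_seq=0, new_seq=501)])
# ['a', 'b', 'c', 'y', 'z']
```

Because the two streams have independent sequence-number spaces, the terminal cannot infer the switch point from numbering alone; the notification supplies both boundaries explicitly.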

14. The terminal of claim 13, wherein the video stream includes description information of at least one audio stream;

the processing unit is further configured to select the description information of one audio stream from the description information of the at least one audio stream;

the sending unit is further configured to send, to the server, an acquisition request carrying the fragment identifier of the video stream fragment and the description information of the audio stream.
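The terminal side of claim 14 can be sketched as a selection step followed by building the request. This assumes the description information is a language tag, that the terminal prefers the user's languages in order and otherwise falls back to the first advertised stream, and that the request is JSON; none of these choices, nor the function names, come from the patent.

```python
import json

def choose_audio_description(available: list[str], preferred: list[str]) -> str:
    """Pick one audio description, preferring the user's languages in order (assumed policy)."""
    if not available:
        raise ValueError("the video stream advertises no audio stream")
    for wanted in preferred:
        if wanted in available:
            return wanted
    return available[0]  # fall back to the first advertised audio stream

def build_acquisition_request(fragment_id: int, audio_description: str) -> bytes:
    # Hypothetical JSON encoding; the patent does not define the request format.
    return json.dumps({"fragment_id": fragment_id, "audio": audio_description}).encode("utf-8")

choice = choose_audio_description(["chi", "eng"], preferred=["fra", "eng"])
print(build_acquisition_request(42, choice))  # b'{"fragment_id": 42, "audio": "eng"}'
```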

15. The terminal according to claim 13 or 14, wherein the receiving unit is further configured to receive a real-time transport control protocol (RTCP) packet sent by the server, and to obtain the fragment identifier of the video stream fragment carried in a fragment timestamp field of the RTCP packet.

16. The terminal of claim 13, wherein the notification message is an SCN packet sent by the server to the terminal, the SCN packet comprising an old sequence number field and a new sequence number field, where the old sequence number field carries the sequence number of the last data packet and the new sequence number field carries the sequence number of the first data packet.