WO2021143479A1 - Media stream transmission method and system - Google Patents

Media stream transmission method and system

Info

Publication number
WO2021143479A1
Authority
WO
WIPO (PCT)
Prior art keywords: media, target, frame, media stream, parameter
Application number
PCT/CN2020/138855
Other languages
English (en)
French (fr)
Inventor
周超 (Zhou Chao)
Original Assignee
北京达佳互联信息技术有限公司 (Beijing Dajia Internet Information Technology Co., Ltd.)
Application filed by 北京达佳互联信息技术有限公司 (Beijing Dajia Internet Information Technology Co., Ltd.)
Priority to EP20913499.8A (published as EP3968647A4)
Publication of WO2021143479A1
Priority to US17/542,841 (published as US20220095002A1)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60: Network streaming of media packets
    • H04L65/61: Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
    • H04L65/612: Network streaming for one-way streaming services, for unicast
    • H04L65/613: Network streaming for one-way streaming services, for the control of the source by the destination
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; operations thereof
    • H04N21/21: Server components or server architectures
    • H04N21/218: Source of audio or video content, e.g. local disk arrays
    • H04N21/2183: Cache memory
    • H04N21/2187: Live feed
    • H04N21/23: Processing of content or additional data; elementary server operations; server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/23439: Reformatting operations for generating different versions
    • H04N21/238: Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; processing of multiplex streams
    • H04N21/23805: Controlling the feeding rate to the network, e.g. by controlling the video pump
    • H04N21/239: Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
    • H04N21/2393: Interfacing the upstream path involving handling client requests
    • H04N21/24: Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H04N21/2402: Monitoring of the downstream path of the transmission network, e.g. bandwidth available
    • H04N21/25: Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/262: Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N21/26258: Content scheduling for generating a list of items to be played back in a given order, e.g. playlist, or scheduling item distribution according to such list
    • H04N21/266: Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/2662: Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44004: Processing of video elementary streams involving video buffer management, e.g. video decoder buffer or video display buffer
    • H04N21/60: Network structure or processes for video distribution between server and client or between remote clients; control signalling between clients, server and network components; transmission of management data between server and client
    • H04N21/63: Control signaling related to video distribution between client, server and network components; network processes for video distribution between server and clients or between remote clients; communication protocols; addressing
    • H04N21/64: Addressing
    • H04N21/6402: Address allocation for clients
    • H04N21/647: Control signaling between network components and server or clients, e.g. controlling the quality of the video stream, dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load
    • H04N21/64746: Control signals issued by the network directed to the server or the client
    • H04N21/64761: Control signals issued by the network directed to the server
    • H04N21/64769: Control signals issued by the network directed to the server for rate control
    • H04N21/65: Transmission of management data between client and server
    • H04N21/658: Transmission by the client directed to the server
    • H04N21/6581: Reference data, e.g. a movie identifier for ordering a movie or a product identifier in a home shopping application
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; content per se
    • H04N21/83: Generation or processing of protective or descriptive data associated with content; content structuring
    • H04N21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455: Structuring of content involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • H04N21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N21/85: Assembly of content; generation of multimedia applications
    • H04N21/854: Content authoring
    • H04N21/8547: Content authoring involving timestamps for synchronizing content

Definitions

  • the present disclosure relates to the field of network technology, and in particular to a media stream transmission method and system.
  • Fragmentation-based media transmission methods include the common DASH (Dynamic Adaptive Streaming over HTTP, an HTTP-based adaptive streaming standard formulated by MPEG, the Moving Picture Experts Group), HLS (HTTP Live Streaming, an HTTP-based adaptive streaming standard developed by Apple), and others. However, the delay of fragment-based media transmission is relatively high.
  • the embodiments of the present disclosure provide a media stream transmission method and system.
  • the technical solutions are as follows:
  • a media stream transmission method, applied to a terminal, which includes: in response to a frame acquisition instruction for a media stream, determining the target address information of the media stream at a target code rate from the address information of the media stream at multiple code rates; determining the start position, in the media stream, of the media frame to be obtained corresponding to the target code rate; and sending a frame acquisition request carrying the target address information and the start position to the server,
  • where the frame acquisition request is used to instruct the server to return, at the target code rate, the media frames starting from the start position in the media stream.
  • a media stream transmission method, applied to a server, which includes: receiving a frame acquisition request carrying the target address information of the media stream at a target code rate and the start position, in the media stream, of the media frame to be obtained corresponding to the target code rate; in response to the frame acquisition request, obtaining the media frames starting from the start position from the address corresponding to the target address information; and transmitting, at the target code rate, the media frames starting from the start position to the terminal.
  • a media streaming system including a terminal and a server, where the terminal is used to execute the above media streaming method applied to the terminal, and the server is used to execute the above media streaming method applied to the server.
  • a media stream transmission device, applied to a terminal, which includes a determining module and a sending module; the determining module is used to determine, in response to a frame acquisition instruction for a media stream, the target address information of the media stream at a target code rate from the address information of the media stream at multiple code rates, and is further used to determine the start position, in the media stream, of the media frame to be obtained corresponding to the target code rate;
  • the sending module is used to send a frame acquisition request carrying the target address information and the start position to the server, the frame acquisition request being used to instruct the server to return, at the target code rate, the media frames starting from the start position in the media stream.
  • a media stream transmission device, applied to a server, which includes a receiving module, an obtaining module, and a transmission module; the receiving module is configured to receive a frame acquisition request carrying the target address information of the media stream at a target code rate and the start position, in the media stream, of the media frame to be obtained corresponding to the target code rate; the obtaining module is configured to obtain, in response to the frame acquisition request, the media frames starting from the start position from the address corresponding to the target address information; the transmission module is configured to transmit, at the target code rate, the media frames starting from the start position to the terminal.
  • an electronic device including one or more processors and one or more memories for storing instructions executable by the one or more processors, where the one or more processors are configured to execute the instructions to implement the following steps: in response to a frame acquisition instruction for a media stream, determining the target address information of the media stream at a target code rate from the address information of the media stream at multiple code rates; determining the start position, in the media stream, of the media frame to be obtained corresponding to the target code rate; and sending a frame acquisition request carrying the target address information and the start position to the server,
  • the frame acquisition request being used to instruct the server to return, at the target code rate, the media frames starting from the start position in the media stream.
  • an electronic device including one or more processors and one or more memories for storing instructions executable by the one or more processors, where the one or more processors are configured to execute the instructions to implement the following steps: receiving a frame acquisition request carrying the target address information of the media stream at a target code rate and the start position, in the media stream, of the media frame to be obtained corresponding to the target code rate; in response to the frame acquisition request, obtaining the media frames starting from the start position from the address corresponding to the target address information; and transmitting, at the target code rate, the media frames starting from the start position to the terminal.
  • a storage medium, where, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device can execute the following steps: in response to a frame acquisition instruction for a media stream, determining the target address information of the media stream at a target code rate from the address information of the media stream at multiple code rates; determining the start position, in the media stream, of the media frame to be obtained corresponding to the target code rate; and sending a frame acquisition request carrying the target address information and the start position to the server, the frame acquisition request being used to instruct the server to return, at the target code rate, the media frames starting from the start position in the media stream.
  • a storage medium, where, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device can perform the following steps: receiving a frame acquisition request carrying the target address information of the media stream at a target code rate and the start position, in the media stream, of the media frame to be obtained corresponding to the target code rate; in response to the frame acquisition request, obtaining the media frames starting from the start position from the address corresponding to the target address information; and transmitting, at the target code rate, the media frames starting from the start position to the terminal.
  • a computer program product including one or more instructions, when the one or more instructions are executed by a processor of an electronic device, the electronic device can execute the aforementioned media stream Transmission method.
  • Fig. 1 is a schematic diagram showing an implementation environment of a media stream transmission method according to an exemplary embodiment
  • FIG. 2 is a schematic diagram of a FAS framework provided by an embodiment of the present disclosure
  • Fig. 3 is a flow chart showing a method for transmitting a media stream according to an exemplary embodiment
  • Fig. 4 is a flow chart showing a method for transmitting a media stream according to an exemplary embodiment
  • Fig. 5 is an interaction flow chart showing a method for media stream transmission according to an exemplary embodiment
  • Fig. 6 is a schematic diagram showing a code rate switching process according to an exemplary embodiment
  • FIG. 7 is a schematic diagram of a principle for determining a target timestamp provided by an embodiment of the present disclosure.
  • Fig. 8 is a block diagram showing a media stream transmission device according to an exemplary embodiment
  • Fig. 9 is a block diagram showing a media stream transmission device according to an exemplary embodiment
  • FIG. 10 is a block diagram of a terminal provided by an embodiment of the present disclosure.
  • Fig. 11 is a block diagram of a server provided by an embodiment of the present disclosure.
  • the user information involved in this disclosure is information authorized by the user or fully authorized by all parties.
  • FLV (Flash Video) is a streaming media format that emerged with the introduction of Flash MX (an animation production tool). Because FLV files are extremely small and load quickly, the format made watching video files online (that is, browsing videos online) practical; its emergence effectively solved the problem that SWF files (a Flash-specific format) exported after importing video files into Flash were too large to be used well on the Internet.
  • Streaming media uses streaming transmission, a technology and process in which a series of media frames is compressed and sent over the network as resource packets, so that the media stream is transmitted and can be watched in real time.
  • With this technology the resource packets are sent like flowing water; without it, the entire media file would have to be downloaded before use, so the media stream could only be watched offline.
  • Streaming can transmit live media streams or media streams pre-stored on the server. When audience users watch these media streams, the streams are played by dedicated playback software after reaching the audience terminals.
  • FAS (FLV Adaptive Streaming) is an FLV-based adaptive streaming media transmission standard.
  • FAS is a streaming resource transmission standard (or resource transmission protocol) proposed in the present disclosure; it differs from traditional fragment-based media transmission.
  • The FAS standard achieves frame-level media streaming: the server does not need to wait for a complete video segment to arrive before sending a resource packet to the terminal. Instead, a target timestamp is determined after the terminal's frame acquisition request is parsed. If the target timestamp is less than zero, all media frames that have been buffered starting from the target timestamp are packaged and sent to the terminal (without fragmentation); after that, or if the target timestamp is greater than or equal to zero, or if a real-time stream exists in addition to the buffered media frames, the media frames of the media stream are sent to the terminal frame by frame (see the sketch below).
  • the target code rate is specified in the frame acquisition request.
  • When the code rate needs to change, the code rate to be switched to is adaptively determined and the frame acquisition request corresponding to that code rate is re-sent, thereby adaptively adjusting the bit rate of the media stream.
  • the FAS standard can realize frame-level transmission.
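  • For illustration only, the following Python sketch shows one way a FAS-style server could act on the parsed target timestamp; the frame layout, the interpretation of a negative value as an offset from the newest buffered frame, and all names are assumptions, not part of the disclosure.

      from typing import Callable, Iterable, List

      class Frame:
          """A media frame with a millisecond timestamp (assumed layout)."""
          def __init__(self, ts: int, payload: bytes):
              self.ts = ts
              self.payload = payload

      def dispatch(target_ts: int, buffered: List[Frame],
                   live: Iterable[Frame], send: Callable[[Frame], None]) -> None:
          if target_ts < 0:
              # Assumption: a negative target timestamp addresses the buffer
              # relative to its newest frame; all matching cached frames are
              # packaged and sent as one batch, without fragmentation.
              start = (buffered[-1].ts + target_ts) if buffered else 0
              for frame in buffered:
                  if frame.ts >= start:
                      send(frame)
          # Afterwards (or when target_ts >= 0, or a real-time stream exists
          # beyond the buffered frames), frames go out one by one.
          for frame in live:
              send(frame)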
  • Live broadcast: the media stream is recorded in real time. The host user "pushes" the media stream to the server through the host terminal (push based on the streaming transmission method).
  • The media stream is "pulled" from the server to the audience terminal (pull based on the streaming transmission method), and the audience terminal decodes and plays the media stream, achieving real-time video playback.
  • On-demand, also known as Video On Demand (VOD).
  • The server can provide the media stream specified by the audience user according to that user's requirements: the audience terminal sends an on-demand request to the server, and the server finds the media stream specified by the request and sends it to the audience terminal. That is, the audience user can selectively play a specific media stream.
  • With on-demand content the user can arbitrarily control the playback progress, while with live broadcast the user cannot: the pace of the live content depends on the real-time progress of the host user's broadcast.
  • Fig. 1 is a schematic diagram showing an implementation environment involved in a media stream transmission method according to an exemplary embodiment.
  • the implementation environment includes at least one terminal and a server, which will be described in detail below:
  • the at least one terminal is used for media stream transmission, and each terminal is equipped with a media codec component and a media playback component.
  • The media codec component is used to receive media streams (such as fragmented resource packets or frame-level transmitted media frames) and decode them, and the media playback component is used to play the decoded media stream.
  • the at least one terminal is divided into a host terminal and a viewer terminal.
  • the host terminal corresponds to the host user
  • the viewer terminal corresponds to the viewer user.
  • The same terminal is the host terminal when its user is recording a live broadcast, and the viewer terminal when its user is watching a live broadcast.
  • the at least one terminal and the server are connected through a wired network or a wireless network.
  • The server is used to provide the media streams to be transmitted, and is at least one of a single server, multiple servers, a cloud computing platform, or a virtualization center.
  • Optionally, the server is responsible for the main computing work and the at least one terminal for the secondary computing work; or the server is responsible for the secondary computing work and the at least one terminal for the main computing work; or a distributed computing architecture is used between the at least one terminal and the server for collaborative computing.
  • the server is a clustered CDN (Content Delivery Network, content delivery network) server.
  • the CDN server includes a central platform and edge servers deployed in various places.
  • The central platform's functional modules for load balancing, content distribution, scheduling, and so on enable the user's terminal to obtain the required content (that is, the media stream) from a nearby local edge server, reducing network congestion and improving the response speed and hit rate of terminal access.
  • In other words, the CDN server adds a caching mechanism between the terminal and the central platform; the caching mechanism consists of edge servers (such as web servers) deployed in different geographic locations.
  • Depending on the distance between the terminal and the edge servers, the central platform dispatches the edge server closest to the terminal to serve it, which distributes content to the terminal more effectively.
  • the media streams involved in the embodiments of the present disclosure include, but are not limited to: at least one of video resources, audio resources, image resources, or text resources.
  • the embodiments of the present disclosure do not specifically limit the types of media streams.
  • the media stream is a live video stream of a network host, or a historical on-demand video pre-stored on the server, or a live audio stream of a radio host, or a historical on-demand audio pre-stored on the server.
  • The device type of each of the at least one terminal includes, but is not limited to, at least one of: a TV, a smart phone, a smart speaker, a vehicle-mounted terminal, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop portable computer, or a desktop computer.
  • the terminal includes a smart phone as an example.
  • the number of the aforementioned at least one terminal is only one, or the number of the at least one terminal is tens or hundreds, or more.
  • the embodiments of the present disclosure do not limit the quantity and device type of at least one terminal.
  • FIG. 2 is a schematic diagram of a FAS framework provided by an embodiment of the present disclosure. Please refer to FIG. 2.
  • The embodiment of the present disclosure provides a FAS (streaming-based multi-rate adaptive) framework.
  • a terminal 101 and a server 102 perform multimedia resource transmission through the FAS protocol.
  • An application (also known as a FAS client) is installed on the terminal 101 and is used to browse multimedia resources.
  • For example, the application is a short video application, a live broadcast application, a video-on-demand application, a social application, a shopping application, or the like; the embodiments of the present disclosure do not specifically limit the type of application.
  • The user starts the application on the terminal, and the application displays a resource push interface (such as the home page or a function interface of the application).
  • The resource push interface includes abbreviated information of at least one multimedia resource, the abbreviated information including at least one of a title, a profile, a publisher, a poster, a trailer, or highlights. In response to the user's touch operation on the abbreviated information of any multimedia resource, the terminal jumps from the resource push interface to a resource playback interface that includes a play option for that multimedia resource. In response to the user's touch operation on the play option, the terminal downloads the media presentation description (MPD) file of the multimedia resource from the server, determines, based on the description file, the target address information of the multimedia resource at the target bit rate, and sends a frame acquisition request (or FAS request) carrying the target address information to the server. The server processes the frame acquisition request based on certain specifications (the processing specifications of the FAS request), locates the requested multimedia resource, and returns the corresponding media frames.
  • In a live broadcast scenario, the media stream requested by the terminal is usually the live video stream that the host user pushes to the server in real time. After receiving the host user's live video stream, the server transcodes it to obtain live video streams at different bit rates; different address information is assigned to the streams at different bit rates and recorded in the media description file, so that frame acquisition requests carrying different address information are answered with streams at the corresponding bit rates.
  • In the FAS framework, a mechanism for adaptively adjusting the code rate is provided.
  • The terminal adaptively determines a code rate to be switched to that matches the current network bandwidth. For example, when the code rate needs to be switched, the terminal disconnects the media stream transmission link at the current code rate, sends the server a frame acquisition request carrying the address information of the code rate to be switched, and establishes a media stream transmission link based on that code rate. Alternatively, the terminal does not disconnect the transmission link at the current code rate but directly re-initiates the frame acquisition request carrying the address information to be switched and establishes a transmission link based on the new code rate (used to transmit the new media stream); the original media stream serves as a backup stream, and if transmission of the new media stream is abnormal, the backup stream continues to play. The bit rate of the media stream is thus dynamically adjusted during playback (see the sketch below).
  • frame-level media stream transmission can be achieved without the need for fragmented transmission of multimedia resources.
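  • A minimal sketch of the second switching strategy described above, in which the old link is kept as a backup stream; the open_stream callable and all names here are assumptions, not part of the disclosure.

      class RatePlayer:
          """Switch code rates without dropping the old link; the original
          media stream is kept as a backup stream."""

          def __init__(self, open_stream):
              # open_stream is an assumed callable: address -> stream object.
              self.open_stream = open_stream
              self.current = None   # link at the current code rate
              self.backup = None    # original stream kept as backup

          def switch(self, switch_address: str) -> None:
              # Re-initiate the frame acquisition request carrying the address
              # information to be switched; the old link becomes the backup.
              new_stream = self.open_stream(switch_address)
              self.backup, self.current = self.current, new_stream

          def on_new_stream_error(self) -> None:
              # If transmission of the new media stream is abnormal, continue
              # playing from the backup stream.
              if self.backup is not None:
                  self.current, self.backup = self.backup, None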
  • Fig. 3 is a flowchart showing a method for transmitting a media stream according to an exemplary embodiment. The method is applied to a terminal and includes the following steps.
  • the target address information of the media stream of the target code rate is determined from the address information of the media stream of multiple code rates.
  • the address information of the media stream of the multiple code rates is stored in the media description file of the media stream.
  • Optionally, the target address information of the media stream at the target code rate is determined from the address information of the media streams at multiple code rates included in the media description file of the media stream.
  • the starting position of the media frame to be acquired corresponding to the target bit rate in the media stream is determined.
  • A frame acquisition request carrying the target address information and the start position is sent to the server; the frame acquisition request is used to instruct the server to return, at the target bit rate, the media frames starting from the start position in the media stream (one possible request layout is sketched below).
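  • By way of illustration, a frame acquisition request can be pictured as the stream address plus a query string; the parameter name "spts" and the example URL are hypothetical stand-ins, since the disclosure does not fix a wire format.

      from urllib.parse import urlencode

      def build_frame_request(target_address: str, start_position: int) -> str:
          # target_address: address information of the media stream at the
          # target code rate, taken from the media description file.
          # "spts" is a hypothetical name for the start-position parameter.
          return f"{target_address}?{urlencode({'spts': start_position})}"

      # Example (illustrative URL):
      # build_frame_request("https://example.com/live/stream_720p.flv", 0)
      # -> "https://example.com/live/stream_720p.flv?spts=0"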
  • Optionally, determining the start position, in the media stream, of the media frame to be acquired corresponding to the target bit rate includes: determining, as the start position, the position of the media frame generated at the operation time of the playback operation; or determining, as the start position, the position in the media stream of the media frame selected in the frame acquisition instruction; or determining, as the start position, the position of the first media frame of the media stream.
  • the frame acquisition instruction is triggered when the playback status information of the media stream satisfies the code rate switching condition.
  • Optionally, determining the target address information of the media stream at the target code rate in response to the frame acquisition instruction includes: when playing any media frame in the media stream, obtaining the playback status information of the media stream; and, in response to the playback status information meeting the code rate switching condition, determining the target address information of the media stream at the target code rate from the address information of the media stream at multiple code rates.
  • Determining the start position, in the media stream, of the media frame to be obtained corresponding to the target code rate then includes: determining that start position according to the position of the any media frame in the media stream.
  • Optionally, determining the target address information of the media stream at the target code rate from the address information of the media streams at multiple code rates included in the media description file, in response to the playback status information meeting the code rate switching condition, includes: in response to the playback status information meeting the code rate switching condition, determining the target code rate according to the playback status information and the current code rate; and, in response to the target code rate not being equal to the current code rate, determining the target address information of the media stream at the target code rate from the address information of the media streams at multiple code rates included in the media description file.
  • Optionally, the playback status information includes a first buffer amount, which is the amount of the media stream currently buffered but not yet played. Determining the target bit rate according to the playback status information and the current bit rate in response to the playback status information meeting the code rate switching condition includes: in response to the first buffer amount being greater than a first buffer amount threshold or smaller than a second buffer amount threshold, determining the target bit rate according to the playback status information and the current code rate, where the second buffer amount threshold is less than the first buffer amount threshold.
  • Optionally, determining the target code rate according to the playback status information and the current code rate includes: obtaining multiple candidate code rates; obtaining the second buffer amount corresponding to each candidate code rate according to the relationship between the multiple candidate code rates and the current code rate, the playback status information, and the position of the any media frame within the media frame group where it is located; and determining the target code rate from the multiple candidate code rates according to the relationship between the second buffer amount corresponding to each candidate code rate and the first or second buffer amount threshold. The second buffer amount corresponding to each candidate code rate is the amount of the media stream buffered but not yet played at the end of the transmission of the media frame group where the any media frame is located, after the code rate has been switched to that candidate code rate.
  • Optionally, the frame acquisition request further includes at least one of a first extended parameter or a second extended parameter; the first extended parameter is used to indicate whether the media frames to be pulled are audio frames (that is, whether a pure audio stream is requested), and the second extended parameter is used to indicate that the media frames in the media stream are to be transmitted starting from the target timestamp indicated by the second extended parameter.
  • Optionally, the media description file includes a version number and a media description set, where the version number includes at least one of the version number of the media description file or the version number of the resource transmission standard, and the media description set includes multiple pieces of media description meta information. Each piece of media description meta information corresponds to a media stream at one bit rate and includes the length of the picture group and the attribute information of the media stream at that bit rate (an illustrative layout is sketched below).
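  • The fields just enumerated can be pictured as a small structured document; the concrete key names and values below are illustrative assumptions, since the disclosure names the fields but not their spelling.

      # Hypothetical media description file: a version number plus one piece
      # of media description meta information per available code rate.
      media_description = {
          "version": "1.0",                # file / transmission-standard version
          "adaptationSet": [               # the media description set
              {
                  "gopDuration": 2000,     # length of the picture group (ms)
                  "bitrate": 1000,         # attribute information (kbps)
                  "url": "https://example.com/live/stream_1000.flv",
              },
              {
                  "gopDuration": 2000,
                  "bitrate": 4000,
                  "url": "https://example.com/live/stream_4000.flv",
              },
          ],
      }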
  • Fig. 4 is a flow chart showing a method for transmitting a media stream according to an exemplary embodiment. The method is applied to a server and includes the following steps.
  • a frame acquisition request is received, and the frame acquisition request carries the target address information of the media stream of the target code rate and the starting position of the media frame to be obtained corresponding to the target code rate in the media stream.
  • Optionally, obtaining the media frames starting from the start position from the address corresponding to the target address information includes: determining a target timestamp based on the start position; and determining and acquiring, based on the target timestamp, the media frames starting from the start position.
  • determining the target timestamp based on the starting position includes: determining the target timestamp based on the audio parameter and the pull position parameter.
  • Optionally, determining the target timestamp based on the audio parameter and the pull position parameter includes: based on the pull position parameter taking its default value and the audio parameter being the default value or false, determining as the target timestamp the value obtained by subtracting the absolute value of the default value of the pull position parameter from the maximum timestamp; or, based on the pull position parameter taking its default value and the audio parameter being true, determining as the target timestamp the value obtained by subtracting the absolute value of the default value of the pull position parameter from the maximum audio timestamp; or, based on the pull position parameter being equal to 0 and the audio parameter being the default value or false, determining the maximum timestamp as the target timestamp; or, based on the pull position parameter being equal to 0 and the audio parameter being true, determining the maximum audio timestamp as the target timestamp; or, based on the pull position parameter being less than 0 and the audio parameter being the default value or false, determining as the target timestamp the value obtained by subtracting the absolute value of the pull position parameter from the maximum timestamp (see the sketch below).
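  • The branches above reduce to a small function; a minimal sketch, assuming None stands for "default value" and assuming an illustrative default of -500 ms for the pull position parameter.

      DEFAULT_PULL_POS = -500  # assumed default value of the pull position parameter

      def target_timestamp(pull_pos, audio_only, max_ts, max_audio_ts,
                           default=DEFAULT_PULL_POS):
          # pull_pos: pull position parameter (None stands for "default value").
          # audio_only: audio parameter (None stands for "default value").
          # The reference timestamp is the maximum audio timestamp when the
          # audio parameter is true, else the maximum (media) timestamp.
          base = max_audio_ts if audio_only else max_ts
          if pull_pos is None:             # parameter left at its default value
              return base - abs(default)
          if pull_pos == 0:
              return base
          if pull_pos < 0:
              return base - abs(pull_pos)
          # Positive values are not covered by the branches above; passing
          # them through unchanged is an assumption.
          return pull_pos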
  • Optionally, the method further includes: determining that timestamp rollback has occurred in the buffer area based on the timestamps of the media frames in the media frame sequence in the buffer area increasing non-monotonically; or determining that no timestamp rollback has occurred in the buffer area based on those timestamps increasing monotonically, where the media frame sequence is a sequence composed of multiple media frames buffered in the buffer area.
  • Optionally, the method further includes: determining that the media frame sequence increases non-monotonically based on the buffer area containing video resources and the timestamps of the key frames in the key frame sequence increasing non-monotonically, the key frame sequence being a sequence composed of multiple buffered key frames; or determining that the media frame sequence increases non-monotonically based on the buffer area not containing video resources and the timestamps of the audio frames in the audio frame sequence increasing non-monotonically, the audio frame sequence being a sequence composed of multiple buffered audio frames (see the sketch below).
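  • A minimal sketch of the rollback check as just described; function names and the list-of-timestamps representation are assumptions.

      def non_monotonic(timestamps):
          """True when the timestamp sequence does not increase monotonically."""
          return any(later < earlier
                     for earlier, later in zip(timestamps, timestamps[1:]))

      def rollback_occurred(buffer_has_video, key_frame_ts, audio_frame_ts):
          # Per the rules above: inspect key-frame timestamps when the buffer
          # contains video resources, audio-frame timestamps otherwise.
          seq = key_frame_ts if buffer_has_video else audio_frame_ts
          return non_monotonic(seq)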
  • Optionally, determining and acquiring, based on the target timestamp, the media frames starting from the start position includes: based on a target media frame existing in the current valid buffer area, determining that the target media frame is the media frame at the start position; or, based on no target media frame existing in the current valid buffer area, entering a waiting state until a target media frame is written into the current valid buffer area and then determining that it is the media frame at the start position; or, based on no target media frame existing in the current valid buffer area and the difference between the target timestamp and the maximum timestamp being greater than a timeout threshold, sending pull failure information. In each case, the timestamp of the target media frame is greater than or equal to the target timestamp and closest to the target timestamp (see the sketch below).
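  • A minimal sketch of the lookup just described, assuming the buffer is a shared list of frames ordered by timestamp and that polling stands in for whatever wake-up mechanism an implementation would use.

      import time

      def find_start_frame(buffer, target_ts, max_ts, timeout_threshold,
                           poll_interval=0.01):
          # buffer: the current valid buffer area, frames ordered by .ts.
          # The target media frame is the first frame whose timestamp is
          # greater than or equal to target_ts (hence closest to it).
          if target_ts - max_ts > timeout_threshold:
              raise TimeoutError("pull failure: target timestamp beyond timeout")
          while True:
              for frame in buffer:
                  if frame.ts >= target_ts:
                      return frame
              time.sleep(poll_interval)   # wait until the frame is written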
  • The method embodiments provided by the embodiments of the present disclosure have been introduced above from the terminal side and the server side separately; another method embodiment below exemplarily describes the disclosure from the perspective of interaction between the terminal and the server.
  • Fig. 5 is a flowchart showing a method for transmitting a media stream according to an exemplary embodiment. Referring to Fig. 5, the method includes the following steps.
  • a media stream transmission method is provided.
  • A terminal receives a frame acquisition instruction and, in response, determines the target address information of the media stream at the target bit rate and the start position of the media frame to be acquired; a request is then sent to the server to instruct it to send the corresponding media frames at the target bit rate, realizing frame-level transmission of the media stream.
  • the frame acquisition instruction includes two triggering methods, and the two triggering methods correspond to different application scenarios.
  • In one case, the frame acquisition instruction is triggered when the playback status information of the media stream meets the code rate switching condition. That is, when it is determined from the playback status information that the code rate needs to be switched, the frame acquisition instruction is triggered, and a request is sent to the server again to request the media frames at the switched target code rate. See S51 and S52 for the content corresponding to this trigger mode.
  • the frame acquisition instruction is triggered by a play operation of the media stream.
  • The play operation is the user's first play operation on the media stream, or an operation of resuming playback after a pause.
  • the terminal provides the user with a bit rate selection list, and the user selects a bit rate as the target bit rate in the bit rate selection list.
  • the user manually clicks any value in the code rate selection list, and accordingly, the terminal determines the code rate corresponding to the value as the target code rate.
  • The user's play operation on the media stream can occur while the terminal is playing the media stream; accordingly, the user manually switches the transmission bit rate at any time while using the terminal to obtain the media stream.
  • The play operation can also occur before the terminal obtains the media stream for the first time; accordingly, before starting to obtain the media stream with the terminal, the user determines the target code rate in the code rate selection list provided by the terminal, and the terminal responds to the correspondingly triggered frame acquisition instruction for the media stream.
  • Alternatively, the target code rate is a default value, which is not limited in the embodiments of the present disclosure.
  • the terminal and the server are connected through a wired network or a wireless network, and the server is used to provide media streams with multiple bit rates.
  • The media stream includes a number of media frames; a media frame is an audio frame or an image frame, and the media frames are obtained by sampling the original media resource.
  • The terminal continuously obtains the media frames of the media stream from the server and then plays them, realizing the transmission and playback of the media stream.
  • the terminal obtains the playback status information of the media stream, and the playback status information is used to determine whether it is necessary to switch the transmission bit rate of the media stream.
  • An application is installed on the terminal, and the application is used to browse media streams.
  • The application includes at least one of a short video application, a live broadcast application, a video-on-demand application, a social application, or a shopping application; the embodiments of the present disclosure do not specifically limit the type of application.
  • the media streams involved in the embodiments of the present disclosure include, but are not limited to: at least one of video resources, audio resources, image resources, or text resources.
  • the embodiments of the present disclosure do not specifically limit the types of media streams.
  • the media stream is a live video stream of a network host, or a historical on-demand video pre-stored on the server, or a live audio stream of a radio host, or a historical on-demand audio pre-stored on the server.
  • the user starts an application on the terminal, and the application displays a resource push interface.
  • the resource push interface is the homepage or function interface of the application.
  • the embodiment of the present disclosure does not specifically limit the type of the resource push interface.
  • the resource pushing interface includes abbreviated information of at least one media stream, and the abbreviated information includes at least one of a title, a brief introduction, a poster, a trailer, or a highlight segment of the media stream.
  • the terminal responds to the playback status information meeting the code rate switching condition, and determines the target bit rate according to the playback status information and the current bit rate.
  • the terminal when the terminal is playing the media stream, it obtains the playback status information of the media stream.
  • The playback status information is used to determine whether the code rate of the media stream needs to be switched.
  • When it does, the terminal determines and switches to a target code rate that can optimize the playback effect of the media stream.
  • The code rate is adjusted to the code rate corresponding to the current network bandwidth, and the terminal's playback status information is also taken into account to dynamically select the best target code rate for playback, so as to achieve a trade-off among the stall rate, definition, and smoothness of the media stream.
  • After the terminal determines the target code rate, it obtains the address information of the media stream corresponding to the target code rate, that is, the target address information.
  • the target address information of the media stream with the target bit rate is determined, which provides a basis for the subsequent transmission of the media stream according to the target address information.
  • the address information of the media stream with multiple bit rates is stored in the media description file of the media stream. Accordingly, the terminal determines from the address information of the media stream with multiple bit rates included in the media description file The target address information of the media stream with the target bitrate.
  • the playing state information includes a first buffering amount
  • the first buffering amount is a buffering amount currently buffered for the media stream and not being played.
  • In response to the first buffer amount being greater than a first buffer amount threshold or smaller than a second buffer amount threshold, the terminal determines the target bit rate according to the playback status information and the current bit rate, where the second buffer amount threshold is less than the first buffer amount threshold.
  • After obtaining media frames of the media stream, the terminal stores them in the buffer; when they need to be played, the buffered media frames are decoded and played in chronological order.
  • the first buffer amount is measured by the duration of the buffered and unplayed media stream. For example, if the terminal has buffered for 1000 milliseconds (ms) and played for 400ms, the first buffer amount is 600ms.
  • the target bit rate is determined according to the playback status information and the current bit rate, including two cases:
  • Case 1 In response to the first buffer amount being greater than the first buffer amount threshold, the terminal determines the target code rate according to the playback status information and the current bit rate, and the target bit rate is greater than or equal to the current bit rate.
  • The first buffer amount being greater than the first buffer amount threshold indicates that the terminal's buffered-but-unplayed amount of the media stream is more than enough to ensure smooth playback, so a higher download code rate can be considered.
  • Case 2 In response to the first buffer amount being less than the second buffer amount threshold, the terminal determines the target code rate according to the playback status information and the current bit rate, and the target bit rate is less than or equal to the current bit rate.
  • The second buffer amount threshold is less than the first buffer amount threshold; the first and second buffer amount thresholds are preset or temporarily set buffer amount thresholds.
  • When the first buffer amount is less than the second buffer amount threshold, the buffered-but-unplayed amount may not sustain smooth playback. The lower the bit rate of the media stream, the more of the media stream the terminal can buffer in the same time and the more smoothly the media stream plays, so reducing the download bit rate of the media stream is considered.
  • When the first buffer amount is less than or equal to the first buffer amount threshold and greater than or equal to the second buffer amount threshold, the buffered-but-unplayed amount meets (and may just meet) the playback requirements of the media stream, and the download bit rate of the media stream is left unchanged.
  • the first buffer amount threshold is represented by q h
  • the second buffer amount threshold is represented by q l
  • the first buffer amount is represented by q c .
  • When q_c > q_h, the media stream playback has a very low probability of stalling; at this time, consider increasing the bit rate of the media stream.
  • When q_c < q_l, the media stream playback has a high probability of stalling; at this time, consider reducing the bit rate of the media stream.
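  • As an illustration of this comparison, a minimal sketch follows; the function and variable names are ours, not the disclosure's:

```python
def switching_decision(q_c, q_h, q_l):
    """Compare the first buffer amount q_c against the two thresholds."""
    if q_c > q_h:
        return "consider increasing the bit rate"  # stalling is very unlikely
    if q_c < q_l:
        return "consider reducing the bit rate"    # stalling is likely
    return "keep the current bit rate"             # buffer just meets playback needs
```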
  • The first buffer amount is compared with the first buffer amount threshold and the second buffer amount threshold to determine whether to switch the code rate of the media stream, so the current playback effect of the media stream can be learned quickly.
  • When the first buffer amount is greater than the first buffer amount threshold or smaller than the second buffer amount threshold, the terminal performs the code rate switching step and adjusts the code rate adaptively to optimize the playback effect. However, even after the bit rate is switched, receiving the media stream at the target bit rate may not guarantee normal playback of the media stream.
  • Therefore, the second buffer amount introduced below is also compared with the two thresholds, to determine whether playback of the media stream will actually be improved after the bit rate is switched.
  • the process of determining the target code rate is to judge the playback effects corresponding to multiple candidate code rates, so as to obtain the candidate code rate with the best playback effect from the multiple candidate code rates as the target code rate.
  • The terminal obtains multiple candidate bit rates and, according to the relationship between the multiple candidate bit rates and the current bit rate, the playback status information, and the position of any media frame of the media stream in the media frame group in which that media frame is located, obtains a second buffer amount corresponding to each candidate bit rate; the target bit rate is then determined from the multiple candidate bit rates according to the relationship between the second buffer amount corresponding to each candidate bit rate and the first buffer amount threshold or the second buffer amount threshold.
  • media streams of multiple code rates are cached in the server, and the multiple candidate code rates are multiple code rates of the media streams that the server can provide.
  • The media stream includes a plurality of media frame groups. The length of each media frame group is preset according to business requirements or temporarily set by a technician, which is not limited in the present disclosure.
  • Each media frame group includes multiple media frames, and the multiple media frames are arranged in chronological order. The position of any media frame in the media stream in the media frame group where the any media frame is located is represented by the time required for the terminal to play from the first frame of the media frame group to the any media frame.
  • the second buffer amount corresponding to each candidate bit rate represents the length of time that the buffered but unplayed media stream can be played after the bit rate is switched to the candidate bit rate and the terminal transmits to the media frame group where any media frame is located.
  • media streams of n types of code rates are cached in the server, and multiple candidate code rates include r 1 , r 2 ,...r n .
  • the length of the media frame group in which any media frame is located is D
  • the time required to play from the first frame of the media frame group to any media frame is d
  • That is, d represents the position of the media frame within the media frame group in which it is located.
  • q n is used to represent the second buffer amount corresponding to the nth candidate code rate.
  • D and d are positive numbers
  • n is a positive integer.
  • According to the playback status information and the position of any media frame of the media stream in the media frame group in which it is located, the second buffer amount corresponding to each candidate bit rate is obtained; the target bit rate is then determined according to the relationship between the second buffer amount corresponding to each candidate bit rate and the first and second buffer amount thresholds.
  • This provides a method of determining the target code rate and a basis for realizing code rate switching by the terminal.
  • The process of acquiring the second buffer amount is as follows: according to the position of any media frame of the media stream in the media frame group in which it is located, the terminal acquires the buffer increase of the media stream at the end of transmission of that media frame group; according to the relationship between the multiple candidate bit rates and the current bit rate, the terminal determines the buffer position from which buffering of the media frame group continues and, based on the multiple candidate bit rates, determines the playback amount of the media stream from the current moment to the end of transmission of the media frame group in which the media frame is located; the second buffer amount corresponding to each candidate bit rate is then obtained according to the first buffer amount (currently buffered and unplayed, included in the playback status information), the buffer increase, and the playback amount.
  • The position of any media frame in the media frame group in which it is located is represented by the time required for the terminal to play from the first frame of that group to the media frame, for example in milliseconds.
  • The additional time for which the newly buffered media stream can be played is the buffer increase.
  • While buffering, the terminal is still playing the cached media stream; the duration of the media stream played by the terminal during this period is the playback amount.
  • the terminal can obtain the second buffer amount corresponding to each candidate bit rate according to the first buffer amount, buffer increase amount, and playback amount.
  • the second buffer amount corresponding to the multiple candidate code rates can indicate the length of time that the buffered and unplayed media stream can be played when the terminal switches to the multiple candidate code rates to complete the transmission of the currently transmitting media frame group.
  • q_c is the first buffer amount, q_b is the playback amount, and q_n is the second buffer amount corresponding to the nth candidate code rate.
  • D is the length of the media frame group in which the media frame is located, d is the time required for playing from the first frame of that group to the media frame, and D − d represents the buffer increase.
  • Combining these quantities, the second buffer amount satisfies q_n = q_c + (D − d) − q_b.
  • In this way, the second buffer amount of each candidate bit rate can be obtained; this provides a method of obtaining the second buffer amount, and the relationship of the second buffer amount with the first buffer amount threshold and the second buffer amount threshold provides a basis for determining the target bit rate.
  • In some embodiments, the current network status information is also consulted to determine the time required for the media frame group in which the media frame is located to finish buffering, and to determine how much of the buffer will be played during this period.
  • The terminal determines the buffer position from which buffering of the media frame group continues, obtains the current network status information, and, according to the current network status information, the buffer position, the length of the media frame group, and the multiple candidate code rates, determines the playback amount of the media stream, for each candidate code rate, during the process up to the end of transmission of the media frame group.
  • The above-mentioned playback amount is the duration of the media stream played during the time period in which the terminal obtains the media stream corresponding to the buffer increase. It can be seen that the playback amount is related to the speed at which the terminal obtains that part of the media stream, that is, to the terminal's network status. Correspondingly, the terminal obtains the current network status information, which includes the average bandwidth of the terminal within a period of time close to the current moment.
  • The buffer position is related to the relationship between the candidate bit rate and the current bit rate. When the candidate bit rate is the same as the current bit rate, the terminal does not need to switch the bit rate, and it continues to buffer the media frame group in which the media frame is located from the next frame after that media frame.
  • When the candidate bit rate is different from the current bit rate, the terminal starts from the first frame of the media frame group in which the media frame is located and buffers that media frame group again.
  • Based on the obtained current network status information, the length of the media frame group, and the multiple candidate code rates, the terminal can determine, for each candidate code rate, the playback amount during the process up to the end of transmission of the media frame group.
  • r_c is the current code rate, r_n is the nth candidate code rate, q_n is the second buffer amount corresponding to the nth candidate code rate, and q_c is the first buffer amount.
  • D is the length of the media frame group in which the media frame is located, d is the time required to play from the first frame of that group to the media frame, and B is the average bandwidth of the terminal within a period of time close to the current moment; D − d represents the buffer increase.
  • When the candidate bit rate equals the current bit rate, the terminal continues buffering from the next media frame, and (D − d) · r_c · 8 / B represents the playback amount, so that q_n = q_c + (D − d) − (D − d) · r_c · 8 / B.
  • When the candidate bit rate differs from the current bit rate, the terminal starts from the first frame of the media frame group in which the media frame is located and buffers that group again, and D · r_n · 8 / B indicates the playback amount, so that q_n = q_c + (D − d) − D · r_n · 8 / B.
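  • The two cases can be sketched as follows, assuming the unit conventions implied by the formulas above (durations in milliseconds, rates r_c and r_n in bytes per millisecond, bandwidth B in bits per millisecond); the function and variable names are ours:

```python
def second_buffer_amount(q_c, D, d, r_c, r_n, B):
    """Estimate q_n, the buffered-but-unplayed duration when the terminal
    finishes transmitting the current media frame group at candidate rate r_n.

    Durations (q_c, D, d) in ms; r_c, r_n in bytes/ms; B in bits/ms,
    matching the factor of 8 in the formulas above.
    """
    if r_n == r_c:
        # no switch: continue from the next frame, downloading the rest of the group
        playback = (D - d) * r_c * 8 / B
    else:
        # switch: re-download the whole group of length D at the candidate rate
        playback = D * r_n * 8 / B
    buffer_increase = D - d
    return q_c + buffer_increase - playback
```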
  • The average bandwidth B of the terminal within a period of time close to the current moment is obtained from the downloaded data volume and the length of that period, that is, B = S / T, where S is the data volume of the media stream downloaded by the terminal within the period of time close to the current moment and T is the length of that period, for example 500 milliseconds.
  • In this way, the current network status information is obtained and, combined with the length of the media frame group and the multiple candidate bit rates, the playback amount of the media stream at the end of transmission of the media frame group in which the media frame is located is determined for each candidate bit rate; this provides a method of obtaining the playback amount, so that the terminal can obtain the second buffer amount.
  • In response to at least one second buffer amount among the second buffer amounts corresponding to the multiple candidate code rates being greater than the first buffer amount threshold, the terminal determines the largest candidate code rate among the candidate code rates corresponding to that at least one second buffer amount as the target code rate.
  • The terminal also determines the current bit rate as the target bit rate in response to the second buffer amounts corresponding to the multiple candidate bit rates not including a second buffer amount larger than the first buffer amount threshold.
  • Recall that in this case the first buffer amount is greater than the first buffer amount threshold; that is to say, the amount the terminal has currently buffered and not played is sufficient to ensure smooth playback of the media stream.
  • Among the candidate code rates corresponding to the at least one second buffer amount, the terminal determines the largest candidate code rate as the target code rate, and this candidate code rate is greater than the current code rate.
  • If the second buffer amounts corresponding to the multiple candidate bit rates do not include a second buffer amount greater than the first buffer amount threshold, none of the candidate bit rates can ensure normal playback of the media stream while also enhancing the definition of the stream, so the terminal determines the current bit rate as the target bit rate and continues to buffer the media stream at the current bit rate.
  • the current code rate is represented by r c
  • the target code rate is represented by r
  • the nth candidate code rate is represented by r n
  • the second buffer amount corresponding to the nth candidate code rate is represented by q n
  • With this notation: if there exists an r_n satisfying q_n > q_h, the largest such candidate code rate is determined as the target code rate r; otherwise, the current code rate r_c is determined as the target code rate.
  • In some embodiments, when at least one second buffer amount among the second buffer amounts corresponding to the multiple candidate code rates is greater than the first buffer amount threshold, the terminal instead determines, among the candidate code rates that correspond to the at least one second buffer amount and are greater than the current code rate, the smallest candidate code rate as the target code rate.
  • In the other case, the terminal responds to at least one of the second buffer amounts corresponding to the multiple candidate code rates being greater than the second buffer amount threshold by determining the largest candidate code rate, among the candidate code rates corresponding to that at least one second buffer amount, as the target code rate.
  • The terminal also responds to the second buffer amounts corresponding to the multiple candidate code rates not including a second buffer amount greater than the second buffer amount threshold by determining the candidate code rate corresponding to the largest second buffer amount as the target code rate.
  • Recall that in this case the first buffer amount is less than the second buffer amount threshold; that is to say, the amount the terminal has currently buffered and not played cannot guarantee smooth playback of the media stream.
  • When at least one of the second buffer amounts corresponding to the multiple candidate bit rates is greater than the second buffer amount threshold, the at least one candidate bit rate corresponding to those second buffer amounts can ensure normal playback of the media stream; accordingly, the terminal determines the largest of those candidate code rates as the target code rate. Otherwise, the candidate code rate corresponding to the largest second buffer amount is determined as the target code rate.
  • the current code rate is represented by r c
  • the target code rate is represented by r
  • the nth candidate code rate is represented by r n
  • the second buffer amount corresponding to the nth candidate code rate is represented by q n
  • The second buffer amount threshold is expressed by q_l. If no r_n satisfies q_n > q_l, the r_n corresponding to the largest q_n is taken as the target code rate. If there exist r_n satisfying q_n > q_l, the largest r_n satisfying q_n > q_l is taken as the target code rate.
  • In this way, the largest candidate bit rate that satisfies the threshold condition is determined as the target bit rate, or, by comparing the second buffer amounts corresponding to the multiple candidate bit rates, the candidate bit rate corresponding to the largest second buffer amount is determined as the target bit rate.
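  • Combining the two cases, the selection logic described above can be sketched as follows; the names are ours, and q_of maps each candidate bit rate to its second buffer amount q_n:

```python
def choose_target_rate(q_c, q_h, q_l, r_c, candidates, q_of):
    """Pick the target bit rate from the candidates per the rules above."""
    if q_c > q_h:
        # increase case: largest candidate whose q_n still exceeds q_h,
        # otherwise keep the current rate
        ok = [r for r in candidates if q_of[r] > q_h]
        return max(ok) if ok else r_c
    if q_c < q_l:
        # decrease case: largest candidate whose q_n exceeds q_l, otherwise
        # the candidate with the largest q_n
        ok = [r for r in candidates if q_of[r] > q_l]
        return max(ok) if ok else max(candidates, key=lambda r: q_of[r])
    return r_c  # between the thresholds: no switch
```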
  • In some embodiments, the target code rate is also a preset code rate, or the code rate indicated in a code rate selection instruction; for example, the user selects to switch the code rate and specifies the target code rate.
  • the terminal executes the steps described in S53 below according to the target code rate, and the present disclosure does not limit the manner of obtaining the target code rate.
  • In some embodiments, the playback status information includes at least one of the stall information or the frame loss rate during playback of the media stream. Accordingly, the playback status information meets the code rate switching condition when any one of the stall information or the frame loss rate meets the code rate switching condition.
  • The stall information includes at least one of the number of stalls within a target time period of playing the media stream, the time of the last stall, or the duration of the last stall.
  • The code rate switching condition includes multiple situations, such as: the number of stalls is greater than a count threshold, the duration between the last stall time and the current moment is less than an interval threshold, or the last stall duration is greater than a duration threshold. In these cases, the terminal considers reducing the bit rate.
  • The code rate switching condition also includes: the number of stalls is less than the count threshold, the duration between the last stall time and the current moment is greater than the interval threshold, and the last stall duration is less than the duration threshold. In these cases, the terminal considers increasing the bit rate.
  • The code rate switching condition further includes the frame loss rate being greater than a first frame loss rate threshold, or the frame loss rate being less than a second frame loss rate threshold, where the second frame loss rate threshold is less than the first frame loss rate threshold.
  • The frame loss rate is also the frame loss rate within a target time period, for example the frame loss rate in the past minute; the media stream transmission situation during this period is judged by the frame loss rate over a period of time, to determine whether the bit rate needs to be adjusted.
  • In this way, a method is provided for determining, according to the stall information or the frame loss rate, whether the code rate switching condition is currently met, so that the terminal can switch the bit rate of the media stream according to more judgment conditions.
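  • As an illustration, a down-switch check over the stall information and frame loss rate could look like the following sketch; the threshold names and the reading that any one condition suffices are our assumptions:

```python
def should_switch_down(stall_count, ms_since_last_stall, last_stall_ms,
                       frame_loss_rate, cfg):
    """True if any stall or frame-loss condition for reducing the bit rate holds."""
    return (stall_count > cfg["max_stalls"]
            or ms_since_last_stall < cfg["min_gap_ms"]
            or last_stall_ms > cfg["max_stall_ms"]
            or frame_loss_rate > cfg["loss_rate_high"])
```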
  • the step of determining the target bit rate by the terminal according to the playback status information and the current bit rate in the foregoing S52 is an optional step in the embodiment of the present disclosure.
  • In some embodiments, the target code rate is a preset code rate. Accordingly, when the playback status information satisfies the code rate switching condition, the terminal directly determines the preset code rate as the target code rate and then triggers the frame acquisition instruction to execute the following S53.
  • the present disclosure does not limit the method for determining the target code rate.
  • the terminal receives a frame acquisition instruction for the media stream.
  • In the first trigger mode, the frame acquisition instruction is triggered when the playback status information meets the code rate switching condition; in the second trigger mode, the frame acquisition instruction is triggered by a play operation on the media stream.
  • the target code rate is determined in different ways.
  • the target code rate is determined to be the code rate to be switched.
  • the relationship between the target code rate and the current code rate may be different.
  • If the target code rate is different from the current code rate, that is, if the above determination process decides to switch the code rate, the terminal triggers the frame acquisition instruction and executes the subsequent request sending steps.
  • If the target bit rate is the same as the current bit rate, that is, if the above determination process decides not to switch the bit rate, the terminal keeps the current bit rate and continues to receive media frames of the media stream from the server. Accordingly, the terminal either does not trigger the frame acquisition instruction, or the frame acquisition instruction is triggered and, after receiving it, the terminal discards it without responding; in either case, the terminal does not need to perform the subsequent request sending steps.
  • the target code rate is the code rate selected by the user, and is also the default code rate, which is not limited in the embodiment of the present disclosure.
  • the terminal determines the target address information of the media stream with the target bitrate from the address information of the media stream with multiple bitrates included in the media description file of the media stream in response to the frame acquisition instruction for the media stream.
  • the terminal determines the target address information of the media stream of the target bitrate from the address information of the media stream of multiple bitrates in response to the frame acquisition instruction for the media stream.
  • The embodiments take the case in which the address information of the media streams with multiple bit rates is stored in the media description file as an example.
  • The address information of the media streams with multiple bit rates can also be stored elsewhere; the terminal then obtains the address information of the media streams with multiple bit rates from that location and determines the address information of the media stream with the target bit rate from it.
  • the server may form media streams with multiple bitrates.
  • The server allocates different address information to media streams with different bit rates and records the address information of the media streams with the various bit rates in the media description file; the terminal downloads the media description file of the media stream from the server and, based on the media description file, determines the address information of the media streams with different bit rates.
  • In some embodiments, the terminal uses the target code rate as an index into the media description file, queries the media description meta-information corresponding to the media stream of the target code rate, and extracts the target address information from the attribute information of that media description meta-information.
  • The above-mentioned media description file is a data file provided by the server to the terminal based on business requirements and pre-configured by the server according to those requirements. It provides the terminal with a set of data and business-related descriptions of the streaming media service, ensuring that the terminal obtains the information necessary for resource download, decoding, playback, and rendering.
  • The media description file includes descriptions of the encoded, transmittable media streams and the corresponding meta-information, so that the terminal can construct a frame acquisition request (FAS request) based on the media description file; the server then responds to the frame acquisition request according to the processing specifications of the FAS standard and provides the streaming media service to the terminal.
  • In some embodiments, the media description file is a file in JSON (JavaScript Object Notation) format; of course, it can also be a file in another format, and the embodiment of the present disclosure does not specifically limit the format of the media description file.
  • the media description file includes the version number (@version) and the media description set (@adaptationSet), which are described in detail below:
  • The version number includes at least one of the version number of the media description file or the version number of the resource transmission standard (the FAS standard); for example, the version number includes only the version number of the FAS standard, only the version number of the media description file, or a combination of the version number of the media description file and the version number of the FAS standard.
  • the media description set is used to represent the meta-information of the media stream.
  • the media description set includes multiple media description meta-information.
  • Each media description meta-information corresponds to a media stream with a bit rate.
  • the description meta information includes the length of the group of pictures (@gopDuration) and attribute information (@representation) of the media stream of the bit rate corresponding to the media description meta information.
  • the length of the Group Of Pictures here refers to the distance between two key frames.
  • A key frame refers to an intra-coded picture (I frame) in the video coding sequence; the encoding and decoding of an I frame do not need to refer to other image frames and can be realized using only the information of the frame itself.
  • In contrast, the encoding and decoding of a predictive-coded picture (P frame) and a bidirectionally predicted picture (B frame) need to refer to other image frames and cannot be completed using only the information of the frame itself.
  • The group of pictures is the media frame group in which the media frame in S52 is located. A media stream may include only an audio stream, or both an audio stream and a video stream: if the media stream includes a video stream, the media frame group is the group of pictures; if the media stream includes only an audio stream, the media frame group is an audio frame group.
  • In some embodiments, each attribute information includes the identification information of the media stream, the encoding method of the media stream, the bit rate supported by the media stream, and the address information of the media stream with that bit rate.
  • Identification information: refers to the unique identifier of each media stream; the identification information is allocated by the server.
  • Coding method: refers to the codec standard that the media stream complies with, such as H.263, H.264, H.265, MPEG, etc.
  • Bit rate supported by the media stream: refers to the number of data bits transmitted per unit time during resource transmission, also known as the code rate.
  • Taking audio resources as an example: the higher the bit rate, the less the audio resource is compressed, the smaller the sound quality loss, and the closer the sound quality is to the sound source (the better the sound quality).
  • Video resources are similar to audio resources; however, since a video resource is assembled from image resources and audio resources, when calculating its bit rate, the corresponding image resources must be added in addition to the audio resources.
  • Address information (@url) of the media stream with a certain bit rate: after the server transcodes the media stream and obtains the media stream with this bit rate, it externally provides the URL (Uniform Resource Locator) or domain name of the media stream with this bit rate.
  • each attribute information further includes at least one of the quality type of the media stream, the hidden option of the media stream, the first adaptive function option, or the default playback function option.
  • Quality type: includes quality evaluation indicators such as the resolution or frame rate of the media stream.
  • Media stream hidden option: used to indicate whether the media stream is displayed. Based on the setting being true, the media stream of the corresponding bit rate is not visible: the user cannot manually select it, and it can only be selected through the adaptive function. Based on the setting being false, the media stream of the corresponding bit rate is displayed: besides being selectable through the adaptive function, it can also be manually selected by the user.
  • The adaptive function involved in this application refers to the function by which the terminal dynamically adjusts the bit rate of the played media stream according to the current network bandwidth situation, which will not be described in detail later.
  • the first adaptive function option (@enableAdaptive): used to indicate whether the media stream is visible to the adaptive function, based on the setting to true, it means that the media stream of the corresponding bit rate is visible to the adaptive function, and the media stream of the corresponding bit rate can Selected by the adaptive function, based on the setting as false, it means that the media stream of the corresponding bit rate is not visible to the adaptive function, and the media stream of the corresponding bit rate cannot be selected by the adaptive function.
  • Default playback function option (@defaultSelect): used to indicate whether the media stream of the corresponding bit rate is played by default at the start of broadcasting. Based on the setting being true, the media stream of the corresponding bit rate is played by default when broadcasting starts; based on the setting being false, it is not played by default. Because the media player component cannot play media streams of two bit rates by default (there would be a playback conflict), in the attribute information of all media description meta-information, the default playback function option (@defaultSelect) of at most one media stream with a certain bit rate is true.
  • the media description file in addition to the version number and the media description set, also includes at least one of a service type, a second adaptive function option, or a third adaptive function option.
  • Service type (@type): used to specify the service type of the media stream, including at least one of live or on-demand; for example, a setting of "dynamic" means live broadcast and a setting of "static" means on-demand. When not specified, "dynamic" is used as the default value.
  • the second adaptive function option (@hideAuto): used to indicate whether to turn on the adaptive function, based on the setting to true, it means turning off the adaptive function, and the adaptive option is not displayed, based on the setting to false, it means turning on the adaptive function , And the adaptive option is displayed. If it is not specified, the default value is "false”.
  • The third adaptive function option (@autoDefaultSelect): used to indicate whether the adaptive function is turned on by default at the start of broadcasting. Based on the setting being true, playback is based on the adaptive function by default when starting to play (starting the broadcast); based on the setting being false, playback is not based on the adaptive function by default when starting to play, that is, the adaptive function is turned off by default at broadcast start. It should be noted that the third adaptive function option here is the premise of the above default playback function option: only when the third adaptive function option is set to false (the adaptive function is turned off by default at broadcast start) will the default playback function option take effect.
  • the media stream with the bit rate corresponding to @defaultSelect set to true will be played by default when the broadcast is started.
  • the media stream with the most suitable bit rate for the current network bandwidth will be selected according to the adaptive function when starting the broadcast.
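  • Putting the fields above together, a media description file of this kind might look like the following sketch; the @-tagged key names come from the description above, while the nesting layout, the remaining key names (id, codec, bitRate, qualityType, hiddenOption), and all values are illustrative assumptions:

```json
{
  "version": "1.0",
  "type": "dynamic",
  "hideAuto": false,
  "autoDefaultSelect": true,
  "adaptationSet": [
    {
      "gopDuration": 2000,
      "representation": {
        "id": "stream_720p",
        "codec": "H.264",
        "bitRate": 1500,
        "url": "https://stream.example.com/live/stream_1500k.flv",
        "qualityType": "720p",
        "hiddenOption": false,
        "enableAdaptive": true,
        "defaultSelect": false
      }
    }
  ]
}
```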
  • the terminal determines the starting position of the media frame to be acquired corresponding to the target bit rate in the media stream.
  • As described above, the frame acquisition instruction has two trigger modes.
  • In different trigger modes, the media frame to be acquired corresponding to the target bit rate may be different, and the starting position of the media frame to be acquired in the media stream may also be different.
  • In the first trigger mode, the target code rate may be different from the current code rate.
  • After the bit rate is switched, the position at which the terminal starts to download media frames in the media stream corresponding to the target bit rate may be different from the position of the media frame currently being received in the media stream corresponding to the current bit rate.
  • The starting position of the media frame to be obtained in the media stream corresponding to the target bit rate is the position, in the media stream, of the first media frame to be obtained after the terminal switches the bit rate to the target bit rate.
  • The terminal can determine the starting position of the media frame to be acquired corresponding to the target bit rate according to the position of any media frame in the media stream. In response to the target bit rate being equal to the current bit rate, the terminal discards the playback status information and no longer determines the target position; in response to the target bit rate not being equal to the current bit rate, the terminal determines the position in the media stream of the first media frame of the media frame group in which the media frame is located as the starting position.
  • If the target bit rate is the same as the current bit rate, the bit rate at which the terminal transmits the media stream has not changed, so the terminal continues to transmit media frames from the next media frame after the current one based on the current bit rate; accordingly, the terminal discards the acquired playback status information.
  • In the media streams with different bit rates cached by the server, the key media frames are strictly aligned; that is, in media streams with different bit rates, the position of the first media frame of the corresponding media frame group is the same.
  • If the target bit rate is different from the current bit rate, the bit rate has changed; since the bit rate within the media frame group in which the media frame is located needs to be consistent, the media frames of that media frame group are retransmitted according to the target bit rate.
  • the terminal starts to transmit the media frame based on the target bit rate from the position of the first media frame in the media frame group where the any media frame is located.
  • the media stream currently being transmitted by the terminal includes two media frame groups, and each media frame group includes multiple media frames, and each media frame corresponds to a time stamp.
  • The media frames in the first media frame group are arranged in timestamp order as [1000, 2000, 3000, 4000, 5000, 6000], and the media frames in the second media frame group are arranged in timestamp order as [7000, 8000, 9000, 10000, 11000, 12000].
  • Take the media frame with timestamp 8000, which the terminal is receiving in the second media frame group, as an example.
  • If the terminal receives the media frame with timestamp 8000 and the determined target bit rate is the same as the current bit rate, no bit rate switching is needed and no frame acquisition request is sent to the server; the terminal continues to receive media frames starting from the media frame with timestamp 9000. If the terminal receives the media frame with timestamp 8000 and the determined target code rate is different from the current code rate, it switches the code rate and sends a frame acquisition request to the server; to keep the code rate of the media frame group consistent, it re-acquires the media frames of the group starting from the media frame with timestamp 7000. The foregoing process re-determines, according to the position of any media frame in the media stream, the starting position of the media frame to be obtained in the media stream corresponding to the target bit rate.
  • The terminal starts to transmit media frames based on the target bit rate from the position of the first media frame of the media frame group in which the media frame is located. Accordingly, media streams with different bit rates may exist in the terminal at the same time; when playing the media stream, the terminal gives priority to the media stream with the higher bit rate.
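  • The rule in this example can be sketched as follows (a minimal illustration; the function and variable names are ours):

```python
def start_timestamp(group_first_ts, next_ts, target_rate, current_rate):
    """Timestamp from which to resume pulling after the rate decision.

    Same rate: continue from the next frame (9000 in the example above).
    Different rate: restart the current media frame group from its first
    frame (7000 above) so the whole group is carried at one bit rate.
    """
    return next_ts if target_rate == current_rate else group_first_ts
```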
  • the media frame to be acquired corresponding to the target bit rate may include multiple situations, and three possible situations are provided below.
  • Case 1 The terminal determines the position of the media frame generated during the operation time of the playback operation in the media stream as the starting position.
  • the user wants to watch a live broadcast of a certain anchor, and then performs a playback operation of the media stream, such as clicking the live broadcast room link of the anchor to enter the live broadcast room of the anchor.
  • the terminal uses the position of the media frame being generated at the current time in the live stream as the starting position.
  • Case 2 The terminal determines the position of the media frame selected in the frame acquisition instruction in the media stream as the starting position.
  • For example, if the user wants to watch the video from the 15th second, he performs the playback operation and controls the media stream to start playing at the 15th second; the terminal takes the position, in the video, of the media frame corresponding to the 15th second as the starting position.
  • Case 3 The terminal determines the position of the first media frame of the media stream as the starting position.
  • the user wants to watch a certain video, and then performs a playback operation on the video, and the terminal determines the position of the first media frame of the video as the starting position.
  • the above-mentioned playback operation occurs before the terminal obtains the media stream for the first time, and also occurs during the process of the terminal playing the media stream.
  • the terminal determines the position of the media frame corresponding to the operation time as the starting position according to the operation time of the playback operation, so as to ensure that the user obtains the media stream after the operation time.
  • the terminal will download the media description file again every time the user clicks the play option, and determine the position of the first media frame of the media stream as the starting position.
  • the terminal also determines the position in the media stream of the media frame selected in the frame acquisition instruction as the starting position, which is not limited in the embodiment of the present disclosure.
  • the terminal sends a frame acquisition request carrying the target address information and the starting position to the server.
  • the frame acquisition request is used to instruct the server to transmit the media frames starting from the starting position in the media stream at the target bit rate.
  • After the terminal obtains the target address information and the starting position, it generates a frame acquisition request carrying the target address information and the starting position, and then sends the frame acquisition request (also called a FAS request) to the server.
  • In some embodiments, in addition to the target address information (@url), the frame acquisition request also includes an extended parameter (@extParam), which is used to specify different request methods to achieve different functions.
  • The extended parameter includes at least one of a first extended parameter or a second extended parameter, which are described in detail below:
  • The first extended parameter is an audio parameter used to indicate whether the requested media frames are audio frames. Based on the setting being true, the media frames pulled by the terminal are audio frames, that is, only the pure audio stream is pulled; based on the setting being false, the media frames pulled by the terminal are audio and video frames, that is, the audio stream and the video picture stream are pulled. If not specified, "false" is used as the default value.
  • In some embodiments, the terminal obtains the type of the media stream, sets the first extended parameter to "false" or the default value based on the type of the media stream being video, and sets the first extended parameter to "true" based on the type of the media stream being audio.
  • The terminal may also detect the type of the application: based on the type of the application being a video application, the first extended parameter is set to "false" or the default value, and based on the type of the application being an audio application, the first extended parameter is set to "true".
  • the second extension parameter belongs to a pull position parameter and is used to indicate that the media frame of the media stream is transmitted from the target timestamp indicated by the second extension parameter.
  • The data type of the second extended parameter is int64_t; of course, it can also be another data type, and the embodiment of the present disclosure does not specifically limit the data type of the second extended parameter.
  • In some embodiments, the second extended parameter is specified in the frame acquisition request; based on the second extended parameter not being specified in the frame acquisition request, the server configures the default value of the second extended parameter.
  • When the second extended parameter is equal to 0, the target timestamp pts is the timestamp of the key frame or audio frame closest to the current moment: when pulling audio frames (pure audio mode), the terminal starts to pull the media stream from the latest audio frame, and when pulling audio and video frames (non-pure audio mode), the terminal starts to pull the media stream from the latest video I frame.
  • When the second extended parameter is less than 0, the target timestamp is smaller than the current moment, and the media frames include the media frames already buffered starting from the target timestamp; that is, the terminal pulls a media stream whose buffered length corresponds to the absolute value of the second extended parameter.
  • In some embodiments, the terminal determines the second extended parameter according to the service type (@type) field in the media description file. Based on the service type being "dynamic" (live) and the user not specifying a playback progress, the terminal sets the second extended parameter to 0 so that the user can watch the latest live video stream in real time. Based on the service type being "dynamic" (live) and the user specifying a playback progress, the terminal sets the second extended parameter to the timestamp corresponding to that progress (the target timestamp), so that pulling conveniently starts from the point the user specified.
  • Based on the service type being "static" (on-demand) and the user not specifying a playback progress, the terminal detects the historical playback progress at which the media stream was last closed and sets the second extended parameter to the timestamp corresponding to that historical progress (the target timestamp), so that the user conveniently continues from where viewing stopped; based on the user viewing the media stream for the first time, with no historical playback progress to query, the terminal sets the second extended parameter to the timestamp of the first media frame (the target timestamp). Based on the service type being "static" (on-demand) and the user specifying a playback progress, the terminal sets the second extended parameter to the timestamp corresponding to that progress (the target timestamp), so that pulling conveniently starts from the point the user specified.
  • The format of the frame acquisition request can be visually represented as "url&extParam", that is, the url address of the media stream with the target bit rate plus the extension field.
  • the server can follow the FAS For the specified processing specifications, to respond to the frame acquisition request, refer to the following S57.
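  • Concretely, assembling a FAS request under the "url&extParam" convention might look like this sketch, where the host, path, and parameter values are illustrative only:

```python
# Hypothetical values; @url would come from the media description file.
target_url = "https://stream.example.com/live/stream_1500k.flv"
ext_param = "onlyAudio=false&fasSpts=0"          # @extParam fields
fas_request = f"{target_url}&{ext_param}"        # "url&extParam" as described above
```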
  • the server receives and responds to the frame acquisition request, and acquires the media frame starting from the start position from the address corresponding to the target address information.
  • The server parses the frame acquisition request to obtain the target address information and the starting position. Based on the target address information and the starting position, the server locates the media stream of the target code rate in the resource library, locates the media frame corresponding to the starting position, and obtains the media frames starting from that media frame.
  • the server determines the target timestamp based on the starting position, and then determines and obtains the media frame starting from the starting position based on the target timestamp.
  • each media frame in the media stream corresponds to a timestamp
  • The server locates the media frame at the starting position based on the starting position, determines the target timestamp according to the timestamp of that media frame, and then locates, by the target timestamp, the media frame from which the server can start to transmit the media stream to the terminal.
  • In some embodiments, the starting position is given by the pull position parameter.
  • In this case, the above process of determining the target timestamp is: the server determines the target timestamp based on the audio parameter and the pull position parameter.
  • The pull position parameter (@fasSpts) is used to indicate the specific frame from which the server sends the media stream; the data type of the pull position parameter is int64_t, though it can also be another data type, and the embodiment of the present disclosure does not specifically limit the data type of the pull position parameter. In the frame acquisition request, the pull position parameter may be equal to 0, greater than 0, less than 0, or defaulted; the different values correspond to different processing logic of the server, which is described in detail in S57 below.
  • The server parses the frame acquisition request to obtain the pull position parameter.
  • Based on the terminal specifying the pull position parameter in the frame acquisition request, the server directly parses the @fasSpts field of the frame acquisition request to obtain the pull position parameter.
  • Based on the terminal not specifying the pull position parameter in the frame acquisition request, the server configures the pull position parameter as a default value (defaultSpts).
  • The default value here is configured by the server according to the business scenario. For example, in the live broadcast business scenario, defaultSpts is set to 0; in the on-demand business scenario, defaultSpts is set to the PTS (Presentation Time Stamp) of the historical media frame at which the last viewing ended, and, based on the PTS of the historical media frame not being recorded in the cache, defaultSpts is set to the PTS of the first media frame.
  • the audio parameter (@onlyAudio) is used to indicate the pull mode of the media stream. Based on the setting as true, it means that the media frame transmitted by the server to the terminal is an audio frame, which is commonly referred to as "pure audio mode”. Based on the setting as false, it means that the media frames transmitted by the server to the terminal are audio and video frames, commonly known as "non-pure audio mode".
  • the audio parameters are true, false or default, and different values correspond to different processing logics of the server, which will be described in detail in S57 below.
  • The server parses the frame acquisition request to obtain the audio parameter.
  • Based on the terminal specifying the audio parameter in the frame acquisition request, the server directly parses the @onlyAudio field of the frame acquisition request to obtain the audio parameter.
  • the server configures the audio parameter as the default value.
  • the terminal does not specify the audio parameter in the frame acquisition request, and the server configures the default value for it.
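  • Server-side resolution of the two fields, including the defaulting behavior just described, can be sketched as follows; the dict-based request representation and all names are assumptions:

```python
def resolve_request_params(fields, scenario, last_viewed_pts=None, first_pts=0):
    """Extract @onlyAudio and @fasSpts from a parsed FAS request, applying
    server-side defaults when a field is absent (scenario: 'live' or 'on-demand')."""
    only_audio = fields.get("onlyAudio", "false") == "true"  # default: non-pure-audio mode
    if "fasSpts" in fields:
        fas_spts = int(fields["fasSpts"])                    # int64 per the description
    elif scenario == "live":
        fas_spts = 0                                         # live: defaultSpts = 0
    else:
        # on-demand: resume from the last viewing, else the first media frame
        fas_spts = last_viewed_pts if last_viewed_pts is not None else first_pts
    return only_audio, fas_spts
```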
  • the server before determining the target timestamp, the server refreshes the current effective buffer area by executing the following S57A-S57B:
  • S57A: Based on the media frame sequence in the buffer area increasing non-monotonically, the server determines that a timestamp rollback has occurred in the buffer area; based on the media frame sequence in the buffer area increasing monotonically, the server determines that no timestamp rollback has occurred in the buffer area.
  • the media frame sequence is a sequence composed of multiple media frames buffered in the buffer area.
  • The above-mentioned timestamp rollback phenomenon means that the media frames in the buffer area are not stored in the order of monotonically increasing timestamps; at this time, there are redundant media frames in the buffer area. This phenomenon usually occurs in live broadcast business scenarios.
  • When the host terminal pushes the stream to the server, due to network fluctuations, delays, and the like, a media frame sent earlier may arrive at the server later, causing the timestamps of the media frames in the media frame sequence in the buffer area to increase non-monotonically, that is, causing the timestamp rollback phenomenon.
  • In addition, to avoid the problem of packet loss, the host terminal usually sends each media frame multiple times; this redundant multiple-transmission mechanism also causes the timestamps of the media frames in the media frame sequence in the buffer area to increase non-monotonically, causing timestamp rollback.
  • When determining whether the timestamps of the media frames in the media frame sequence increase non-monotonically, the server only needs to start from the media frame with the smallest timestamp and, following the storage order of the media frame sequence in the buffer area, check whether any media frame has a timestamp greater than that of the next media frame; based on any media frame having a timestamp greater than the timestamp of the next media frame, the server determines that the timestamps of the media frames in the sequence increase non-monotonically and that a timestamp rollback has occurred in the buffer area.
  • the timestamps of the media frames in the media frame sequence in the buffer area are [1001,1002,1003,1004,1005...], and the timestamps of the omitted parts of the media frames are increasing.
  • In this case, the timestamps of the media frames in the media frame sequence increase monotonically, and no timestamp rollback has occurred in the buffer area.
  • the timestamps of the media frames in the media frame sequence in the buffer area are [1001,1002,1003,1001,1002,1003,1004...], and the timestamps of the omitted parts of the media frames are increasing.
  • In some embodiments, video resources and audio resources are treated separately. For video resources, when judging whether the timestamps of the media frames in the media frame sequence increase non-monotonically, only the key frames (I frames) of the video resource are considered, that is, whether the timestamps of the key frames in the key frame sequence increase non-monotonically; for audio resources, whether the timestamps of the audio frames in the audio frame sequence of the audio resource increase non-monotonically is considered.
  • The key frame sequence is a sequence composed of the multiple key frames buffered in the buffer area, and the audio frame sequence is a sequence composed of the multiple audio frames buffered in the buffer area.
  • The timestamp rollback phenomenon may occur more than once; that is, the timestamps of the media frames in the media frame sequence are divided into multiple monotonically increasing stages.
  • Within each stage the timestamps increase monotonically, but the timestamps of media frames in different stages increase non-monotonically; at this time, there are many redundant and invalid media frames in the buffer area.
  • Based on this, the server determines the current effective buffer area within the buffer area by executing the following S57B.
  • S57B: The server determines the media frames included in the last monotonically increasing stage as the resources in the current effective buffer area.
  • The server determines the first media frame of the last monotonically increasing stage in the media frame sequence, and determines all media frames from that first media frame to the media frame with the largest timestamp (equivalent to the latest media frame) as the current effective buffer area, thereby ensuring that the timestamps of the media frames in the current effective buffer area increase monotonically.
  • Assuming the timestamps of the media frames in the media frame sequence in the buffer area are [1001, 1002, 1003, 1001, 1002, 1003, 1004, ...], with the timestamps of the omitted media frames increasing, a timestamp rollback has occurred in the buffer area; it can be seen that the first media frame of the last monotonically increasing stage is the 4th media frame, so all media frames from the 4th media frame to the latest media frame are determined as the current effective buffer area.
  • Assuming the timestamps of the media frames in the media frame sequence in the buffer area are [1001, 1002, 1003, 1001, 1002, 1003, 1001, ...], with the timestamps of the omitted media frames increasing, a timestamp rollback has occurred in the buffer area; it can be seen that the first media frame of the last monotonically increasing stage is the 7th media frame, so all media frames from the 7th media frame to the latest media frame are determined as the current effective buffer area.
  • In some embodiments, video resources and audio resources are treated separately: based on the buffer area including video resources, the server uses the I frames of the video resource as calculation points and takes all media frames from the first key frame of the last monotonically increasing stage to the latest video frame as the current effective buffer area, where the timestamp of the latest video frame is denoted latestVideoPts; based on the buffer area not including video resources, the server uses audio frames as calculation points and takes all media frames from the first audio frame of the last monotonically increasing stage to the latest audio frame as the current effective buffer area, where the timestamp of the latest audio frame is denoted latestAudioPts.
  • In some embodiments, the operation of updating the current effective buffer area is triggered periodically or manually by a technician; of course, it can also be performed each time a frame acquisition request is received, a method called "passive triggering".
  • the embodiment of the present disclosure does not specifically limit the trigger condition for updating the currently valid buffer area.
  • the time stamp rollback phenomenon in the buffer area can be detected in time, and the current effective buffer area can be updated based on the time stamp rollback phenomenon to avoid abnormalities in the subsequent transmission of media frames.
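  • Rollback detection and the effective-buffer update can be sketched together as follows (a minimal illustration over a list of timestamps; names are ours):

```python
def current_effective_buffer(pts):
    """Return the suffix of pts belonging to the last monotonically
    increasing stage; if no rollback occurred, the whole sequence."""
    start = 0
    for i in range(1, len(pts)):
        if pts[i] <= pts[i - 1]:   # timestamp rollback (or duplicate) detected here
            start = i
    return pts[start:]

# With the first example above, the effective area starts at the 4th frame:
assert current_effective_buffer([1001, 1002, 1003, 1001, 1002, 1003, 1004]) \
       == [1001, 1002, 1003, 1004]
```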
  • FIG. 7 is a schematic diagram of a principle for determining a target timestamp according to an embodiment of the present disclosure. Referring to FIG. 7, the server has different processing logic for different values of the pull position parameter and the audio parameter. Since the value of the pull position parameter is divided into four types (defaulted, equal to 0, less than 0, and greater than 0), the four situations are described separately below.
  • Case 1: the @fasSpts field in the frame acquisition request is left at its default value (the field is absent). Throughout the following, the maximum timestamp refers to the maximum video timestamp latestVideoPts when the current effective buffer includes video resources, and to the maximum audio timestamp latestAudioPts otherwise.
  • When the @onlyAudio audio parameter is at its default value or is false, the server determines the target timestamp by subtracting the absolute value of the default value of the pull position parameter from the maximum timestamp; that is, based on the current effective buffer including video resources, the value obtained from latestVideoPts − |defaultSpts| is determined as the target timestamp, and otherwise the value obtained from latestAudioPts − |defaultSpts| is determined as the target timestamp.
  • When the audio parameter is true, the server's processing rule is as follows: the server determines the value obtained from latestAudioPts − |defaultSpts| as the target timestamp.
  • Case 2: the @fasSpts field carries a value equal to 0 (@fasSpts = 0). When the audio parameter is at its default value or is false, the server determines the maximum timestamp as the target timestamp: based on the current effective buffer including video resources, the server determines latestVideoPts as the target timestamp; based on the current effective buffer not including video resources, the server determines latestAudioPts as the target timestamp. When the audio parameter is true, the server determines latestAudioPts as the target timestamp.
  • Case 3: the @fasSpts field carries a value less than 0 (@fasSpts < 0). When the audio parameter is at its default value or is false, the server determines the value obtained by subtracting the absolute value of the pull position parameter from the maximum timestamp as the target timestamp: based on the current effective buffer including video resources, the server determines latestVideoPts − |@fasSpts| as the target timestamp; otherwise the server determines latestAudioPts − |@fasSpts| as the target timestamp. When the audio parameter is true, the server's processing rule is as follows: the server determines latestAudioPts − |@fasSpts| as the target timestamp.
  • Case 4: the @fasSpts field carries a value greater than 0 (@fasSpts > 0). When the audio parameter is at its default value or is false and a timestamp rollback has occurred in the buffer, the server determines the maximum timestamp as the target timestamp: a) based on the current effective buffer including video resources, the server determines latestVideoPts as the target timestamp; b) based on the current effective buffer not including video resources, the server determines latestAudioPts as the target timestamp. When the audio parameter is true and a timestamp rollback has occurred in the buffer, the server determines latestAudioPts as the target timestamp. When no timestamp rollback has occurred in the buffer, the server determines @fasSpts itself as the target timestamp.
  • Through the above four cases, the server executes the processing logic corresponding to each value of the pull position parameter and thereby determines the target timestamp, which is then used to determine the media frame from which transmission starts, that is, the media frame starting from the starting position. The sketch below summarizes the four cases.
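A compact sketch of the four cases, assuming a sentinel for an absent @fasSpts field and a server-configured defaultSpts; names other than latestVideoPts, latestAudioPts and defaultSpts are illustrative:

```python
DEFAULT = object()  # sentinel: the @fasSpts field is absent from the request

def target_timestamp(fas_spts, only_audio, latest_video_pts, latest_audio_pts,
                     has_video, rollback, default_spts):
    # Maximum timestamp: latestVideoPts when video is buffered and the
    # request is not audio-only, latestAudioPts otherwise.
    latest = latest_audio_pts if (only_audio or not has_video) else latest_video_pts
    if fas_spts is DEFAULT:          # case 1: field left at its default
        return latest - abs(default_spts)
    if fas_spts == 0:                # case 2: start from the newest frame
        return latest
    if fas_spts < 0:                 # case 3: pull |@fasSpts| worth of history
        return latest - abs(fas_spts)
    # case 4 (@fasSpts > 0): an absolute position, unless a rollback occurred
    return latest if rollback else fas_spts
```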
  • In some embodiments, the server determines the media frame starting from the starting position in the following ways.
  • Method 1: the server determines the media frame whose timestamp in the current effective buffer is closest to the target timestamp as the media frame starting from the starting position. Based on the current effective buffer including video resources, the key frame (I-frame) of the video resource whose timestamp is closest to the target timestamp is determined as the media frame starting from the starting position; based on the current effective buffer not including video resources, the audio frame whose timestamp is closest to the target timestamp is determined as the media frame starting from the starting position. When the audio parameter is true, the server directly determines the audio frame whose timestamp is closest to the target timestamp as the media frame starting from the starting position.
  • In some embodiments, the process includes the following exemplary scenarios. 1) When @fasSpts is at its default value, the target timestamp is latestVideoPts − |defaultSpts| (or latestAudioPts − |defaultSpts| when no video resource is included or the audio parameter is true), and the server takes the I-frame (or audio frame) whose PTS is closest to the target timestamp as the media frame starting from the starting position. 2) When @fasSpts = 0, the target timestamp is latestVideoPts and the server takes the I-frame whose PTS is closest to latestVideoPts as the media frame starting from the starting position; based on the current effective buffer not including video resources, the target timestamp is latestAudioPts and the server takes the audio frame whose PTS is closest to latestAudioPts as the media frame starting from the starting position. 3) When @fasSpts < 0, the target timestamp is latestVideoPts − |@fasSpts| (or latestAudioPts − |@fasSpts|), and the server takes the I-frame (or audio frame) whose PTS is closest to that value as the media frame starting from the starting position. 4) When @fasSpts > 0 with @onlyAudio at its default value or false, and a timestamp rollback has occurred in the buffer (see example 1) of case 4 above), the target timestamp is latestVideoPts and the server takes the I-frame whose PTS is closest to latestVideoPts (the latest I-frame) as the media frame starting from the starting position; based on the current effective buffer not including video resources, the target timestamp is latestAudioPts and the server takes the audio frame whose PTS is closest to latestAudioPts (the latest audio frame) as the media frame starting from the starting position. For the remaining combinations, the server likewise uses Method 1 to determine the media frame whose timestamp in the current effective buffer is closest to the target timestamp as the media frame starting from the starting position; they are not enumerated one by one here.
  • In some embodiments, when @fasSpts > 0, in addition to Method 1 above, the server also uses the following Method 2 to determine the media frame.
  • Method 2: based on a target media frame existing in the current effective buffer, the server determines the target media frame as the media frame starting from the starting position, where the timestamp of the target media frame is greater than or equal to the target timestamp and closest to the target timestamp. Based on the current effective buffer including video resources, the target media frame refers to an I-frame of the video resource; based on the current effective buffer not including video resources, the target media frame refers to an audio frame. When the audio parameter is true, the target media frame likewise refers to an audio frame.
  • In some embodiments, the process includes the following exemplary scenario: @fasSpts > 0 with @onlyAudio at its default value or false, and no timestamp rollback has occurred in the buffer (see example 3) of case 4 above). The target timestamp is @fasSpts. Based on the current effective buffer including video resources, the server starts from the I-frame with the smallest PTS and traverses frame by frame in the direction of increasing PTS until the first I-frame with PTS ≥ @fasSpts (the target media frame) is found, indicating that a target media frame exists in the current effective buffer, and the server determines that target media frame as the media frame starting from the starting position. Based on the current effective buffer not including video resources, the server starts from the audio frame with the smallest PTS and traverses frame by frame in the direction of increasing PTS until the first audio frame with PTS ≥ @fasSpts (the target media frame) is found, and the server determines that target media frame as the media frame starting from the starting position.
  • Method 2 above describes how the server determines the media frame starting from the starting position when the target media frame can be found in the current effective buffer. However, in some embodiments the target media frame cannot be found in the current effective buffer. This situation usually occurs in live-broadcast business scenarios: the frame acquisition request specifying @fasSpts sent by the audience terminal arrives at the server first, while the media frame corresponding to @fasSpts (a live video frame) is still in transit in the stream-pushing stage. In this case the server determines the media frame starting from the starting position through the following Method 3.
  • Method 3: based on no target media frame existing in the current effective buffer, the server enters a waiting state until the target media frame is written into the current effective buffer, and then determines the target media frame as the media frame starting from the starting position. The timestamp of the target media frame is greater than or equal to the target timestamp and closest to the target timestamp. Based on the current effective buffer including video resources, the target media frame refers to an I-frame of the video resource; based on the current effective buffer not including video resources, the target media frame refers to an audio frame. When the audio parameter is true, the target media frame likewise refers to an audio frame.
  • For example, the target timestamp is @fasSpts. Based on the current effective buffer including video resources, the server starts from the I-frame with the smallest PTS and traverses frame by frame in the direction of increasing PTS; if after traversing all I-frames no I-frame satisfying PTS ≥ @fasSpts (the target media frame) is found, the target media frame does not exist in the current effective buffer, so the server enters the waiting state, waits for the first I-frame with PTS ≥ @fasSpts (the target media frame) to be written into the current effective buffer, and determines that target media frame as the media frame starting from the starting position. Based on the current effective buffer not including video resources, the server starts from the audio frame with the smallest PTS and traverses frame by frame in the direction of increasing PTS; if after traversing all audio frames no audio frame satisfying PTS ≥ @fasSpts (the target media frame) is found, the server enters the waiting state, waits for the first audio frame with PTS ≥ @fasSpts to be written into the current effective buffer, and determines that target media frame as the media frame starting from the starting position.
  • When the audio parameter is true, the target timestamp is likewise @fasSpts: the server starts from the audio frame with the smallest PTS and traverses frame by frame in the direction of increasing PTS; if after traversing all audio frames no audio frame satisfying PTS ≥ @fasSpts (the target media frame) is found, there is no target media frame in the current effective buffer, so the server enters the waiting state, waits for the first audio frame with PTS ≥ @fasSpts (the target media frame) to be written into the current effective buffer, and determines it as the media frame starting from the starting position. Methods 1 to 3 are summarized in the sketch below.
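A minimal sketch of Methods 1 to 3, assuming frames is a PTS-ascending list of candidate frames (I-frames when video is buffered, audio frames otherwise); the Frame type and function names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    pts: int

def find_start_frame(frames, target_pts, absolute):
    """Pick the frame from which transmission starts."""
    if not frames:
        return None
    if not absolute:
        # Method 1: the frame whose PTS is closest to the target timestamp.
        return min(frames, key=lambda f: abs(f.pts - target_pts))
    # Method 2 (used when @fasSpts > 0 and no rollback occurred): the first
    # frame with PTS >= target, i.e. the smallest such PTS.
    for frame in frames:
        if frame.pts >= target_pts:
            return frame
    # Method 3: the target frame has not been written into the effective
    # buffer yet (typical for live streams); the caller should wait for it.
    return None
```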
  • Method 3 above describes how the server determines the media frame starting from the starting position when the target media frame cannot yet be found in the current effective buffer. However, such a frame acquisition request may also be caused by an abnormal condition, for example when the @fasSpts carried in the frame acquisition request is an abnormally large value. Handling it with Method 3 alone would cause a long waiting time: these frame acquisition requests would enter a blocked waiting state, occupying the server's processing resources and causing a great loss to server performance. For this reason, the server also sets a timeout threshold and uses the following Method 4 to decide, based on the timeout threshold, whether pull failure information needs to be returned. Method 4 is described in detail below.
  • Method 4: based on no target media frame existing in the current effective buffer and the difference between the target timestamp and the maximum timestamp being greater than the timeout threshold, the server sends pull failure information, where the timestamp of the target media frame is greater than or equal to the target timestamp and closest to the target timestamp. Based on the current effective buffer including video resources, the maximum timestamp is the maximum video timestamp latestVideoPts; based on the current effective buffer not including video resources, the maximum timestamp is the maximum audio timestamp latestAudioPts. When the audio parameter is true, the maximum timestamp is the maximum audio timestamp latestAudioPts.
  • The timeout threshold is any value greater than or equal to 0; it is a value preset by the server and can also be configured by technicians based on business scenarios. The embodiments of the present disclosure do not specifically limit how the timeout threshold is obtained. In some embodiments, the following exemplary scenarios are included:
  • The target timestamp is @fasSpts. Based on the current effective buffer including video resources, the server starts from the I-frame with the smallest PTS and traverses frame by frame in the direction of increasing PTS; if after traversing all I-frames no I-frame satisfying PTS ≥ @fasSpts (the target media frame) is found, the server determines whether the difference between @fasSpts and latestVideoPts is greater than timeoutPTS. Based on @fasSpts − latestVideoPts > timeoutPTS, the server sends pull failure information to the terminal; based on @fasSpts − latestVideoPts ≤ timeoutPTS, the server enters the waiting state, which corresponds to the operation performed in the corresponding case of Method 3 above. Based on the current effective buffer not including video resources, the server starts from the audio frame with the smallest PTS and traverses frame by frame in the direction of increasing PTS; if after traversing all audio frames no audio frame satisfying PTS ≥ @fasSpts (the target media frame) is found, the server determines whether the difference between @fasSpts and latestAudioPts is greater than timeoutPTS. Based on @fasSpts − latestAudioPts > timeoutPTS, the server sends pull failure information to the terminal; based on @fasSpts − latestAudioPts ≤ timeoutPTS, the server enters the waiting state, again corresponding to Method 3.
  • When the audio parameter is true, the target timestamp is likewise @fasSpts: the server starts from the audio frame with the smallest PTS and traverses frame by frame in the direction of increasing PTS; if after traversing all audio frames no audio frame satisfying PTS ≥ @fasSpts (the target media frame) is found, the server determines whether the difference between @fasSpts and latestAudioPts is greater than timeoutPTS, sending pull failure information to the terminal based on @fasSpts − latestAudioPts > timeoutPTS and entering the waiting state based on @fasSpts − latestAudioPts ≤ timeoutPTS.
  • Combining Method 3 and Method 4 provides the exception-handling logic for the case where @fasSpts > 0 and no target media frame exists in the current effective buffer: based on the difference between the target timestamp and the maximum timestamp being less than or equal to the timeout threshold, the server enters the waiting state through Method 3 (waiting processing mode) until the target media frame arrives and is determined as the media frame starting from the starting position; based on the difference being greater than the timeout threshold, the server sends pull failure information through Method 4 (error handling mode). In the latter case the server determines that the frame acquisition request is erroneous and directly returns the pull failure information to the terminal, for example in the form of an error code. This decision is sketched below.
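A short sketch of the Method 3 / Method 4 decision, with timeout_pts standing for timeoutPTS; the function name and return values are illustrative:

```python
def handle_missing_target(fas_spts, latest_pts, timeout_pts):
    """Called when no buffered frame has PTS >= fas_spts."""
    if fas_spts - latest_pts > timeout_pts:
        return "pull_failure"  # method 4: reply with an error code
    return "wait"              # method 3: block until the frame is written
```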
  • Through the above process, the server determines the media frame starting from the starting position of the media stream based on the pull position parameter. Precisely because the pull position parameter is carried in the frame acquisition request, the server can conveniently determine, while responding to the request, from which media frame to start transmitting at the target bitrate, which improves the flexibility of the resource transmission process. Furthermore, in scenarios where dynamic bitrate switching is required, it suffices to replace the address information (the @url field) and the pull position parameter (the @fasSpts field) carried in the frame acquisition request, so that transmission starts from any specified media frame at the new bitrate, realizing adaptive bitrate switching.
  • The server transmits the media frames starting from the starting position to the terminal at the target bitrate. That is, after obtaining the media frame starting from the starting position, the server transmits the media frames from that position onward to the terminal at the target bitrate. In this process the server continuously sends media frames to the terminal like a stream, which is vividly called "media streaming".
  • In some embodiments, the target address information is a domain name. The terminal sends the frame acquisition request to the central platform of the CDN server, and the central platform calls a DNS (Domain Name System) resolution library to parse the domain name and obtain the CNAME (alias) record corresponding to the domain name, then resolves the CNAME record again based on the geographic location information of the terminal to obtain the IP (Internet Protocol) address of the edge server closest to the terminal. The central platform then directs the frame acquisition request to that edge server, and the edge server responds to the frame acquisition request by providing the terminal with the media frames of the multimedia resource at the target bitrate, so that the terminal can access the multimedia resource at the target bitrate from a nearby node.
  • In addition, the embodiments of the present disclosure provide an internal back-to-origin mechanism of the CDN server. When the edge server cannot provide the multimedia resource specified by the frame acquisition request, the edge server pulls the media stream back from an upper-level node device: the edge server sends a back-to-origin pull request to the upper-level node device, the upper-level node device returns the corresponding media stream to the edge server in response to the back-to-origin pull request, and the edge server sends the corresponding media stream to the terminal.
  • As for how the edge server obtains the back-to-origin pull request: based on the frame acquisition request sent by the terminal carrying the @fasSpts field, the edge server directly determines the frame acquisition request as the back-to-origin pull request and forwards it to the upper-level node device. Based on the @fasSpts field being at its default (absent) in the frame acquisition request sent by the terminal, the edge server configures the default value defaultSpts for the @fasSpts field, embeds the @fasSpts field in the frame acquisition request, and sets the value stored in the @fasSpts field to defaultSpts, thereby obtaining the back-to-origin pull request. In some embodiments, the upper-level node device is a third-party origin server, in which case the back-to-origin pull request must carry the @fasSpts field, as in the sketch below.
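A minimal sketch of the back-to-origin rule above; the request is modeled as a dict of query fields, and the concrete defaultSpts value is an illustrative assumption:

```python
def to_origin_request(frame_request, default_spts=-500):
    """Build the back-to-origin pull request from the terminal's request."""
    origin_request = dict(frame_request)
    if "fasSpts" not in origin_request:
        # Field left at its default: inject the configured defaultSpts so
        # that a third-party origin server always receives @fasSpts.
        origin_request["fasSpts"] = default_spts
    return origin_request
```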
  • Fig. 8 is a block diagram of a media stream transmission apparatus according to an exemplary embodiment, applied to a terminal. The apparatus includes a determining module 801 and a sending module 802. The determining module 801 is configured to, in response to a frame acquisition instruction for a media stream, determine the target address information of the media stream at a target bitrate from the address information of the media stream at multiple bitrates.
  • The determining module 801 is further configured to determine the starting position, in the media stream, of the media frames to be acquired corresponding to the target bitrate.
  • The sending module 802 is configured to send a frame acquisition request carrying the target address information and the starting position to the server, the frame acquisition request being used to instruct the server to return, at the target bitrate, the media frames in the media stream starting from the starting position.
  • In some embodiments, the determining module 801 is configured to determine, as the starting position, the position in the media stream of the media frame generated at the operation time of the play operation; or to determine, as the starting position, the position in the media stream of the media frame selected in the frame acquisition instruction; or to determine the position of the first media frame of the media stream as the starting position.
  • In some embodiments, the frame acquisition instruction is triggered when the playback status information of the media stream meets a bitrate switching condition.
  • In some embodiments, the apparatus further includes an obtaining module configured to obtain the playback status information of the media stream when any media frame in the media stream is received. The determining module 801 is further configured to determine, in response to the playback status information meeting the bitrate switching condition, the target address information of the media stream at the target bitrate from the address information of the media stream at multiple bitrates, and to determine the starting position of the media frames to be acquired corresponding to the target bitrate according to the position of the any media frame in the media stream.
  • In some embodiments, the determining module 801 is configured to determine a target bitrate according to the playback status information and the current bitrate in response to the playback status information meeting the bitrate switching condition, and, in response to the target bitrate being unequal to the current bitrate, determine the target address information of the media stream at the target bitrate from the address information of the media streams at multiple bitrates.
  • In some embodiments, the playback status information includes a first buffer amount, where the first buffer amount is the amount currently buffered for the media stream but not yet played. The determining module 801 is configured to determine the target bitrate according to the playback status information and the current bitrate in response to the first buffer amount being greater than a first buffer amount threshold or smaller than a second buffer amount threshold, where the second buffer amount threshold is less than the first buffer amount threshold.
  • In some embodiments, the determining module 801 is configured to: obtain multiple candidate bitrates; obtain, according to the relationship between the multiple candidate bitrates and the current bitrate, the playback status information, and the position of the any media frame within the media frame group where it is located, a second buffer amount corresponding to each candidate bitrate; and determine a target bitrate from the multiple candidate bitrates according to the relationship between the second buffer amount corresponding to each candidate bitrate and the first buffer amount threshold or the second buffer amount threshold. The second buffer amount corresponding to each candidate bitrate is the amount that will have been buffered but not played for the media stream at the end of transmission of the media frame group in which the any media frame is located, after the bitrate is switched to that candidate bitrate.
  • In some embodiments, the frame acquisition request further includes at least one of a first extended parameter or a second extended parameter, where the first extended parameter is used to indicate whether the media frames are audio frames, and the second extended parameter is used to indicate that the media frames in the media stream are transmitted starting from the target timestamp indicated by the second extended parameter.
  • In some embodiments, the address information of the media stream at the multiple bitrates is stored in the media description file of the media stream. The media description file includes a version number and a media description set, where the version number includes at least one of the version number of the media description file or the version number of the resource transmission standard, and the media description set includes multiple pieces of media description meta-information, each corresponding to the media stream at one bitrate and each including the group-of-pictures length and attribute information of the media stream at the bitrate corresponding to that media description meta-information.
  • It should be noted that, when the media stream transmission apparatus provided in the above embodiment transmits media streams, the division into the above functional modules is merely used as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the media stream transmission apparatus provided in the foregoing embodiment and the embodiments of the media stream transmission method belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
  • Fig. 9 is a block diagram of a media stream transmission apparatus according to an exemplary embodiment, applied to a server. The apparatus includes a receiving module 901, an acquiring module 902 and a transmission module 903. The receiving module 901 is configured to receive a frame acquisition request, the frame acquisition request carrying the target address information of the media stream at a target bitrate and the starting position, in the media stream, of the media frames to be acquired corresponding to the target bitrate.
  • The acquiring module 902 is configured to acquire, in response to the frame acquisition request, the media frames starting from the starting position from the address corresponding to the target address information.
  • The transmission module 903 is configured to transmit the media frames starting from the starting position to the terminal at the target bitrate.
  • the acquiring module 902 is configured to determine a target timestamp based on the starting position. Based on the target timestamp, a media frame starting from the starting position is determined and acquired.
  • the acquisition module 902 is configured to determine the target timestamp based on the audio parameter and the pull position parameter.
  • In some embodiments, the acquiring module 902 is configured to: based on the pull position parameter being at its default value and the audio parameter being at its default value or false, determine the value obtained by subtracting the absolute value of the default value of the pull position parameter from the maximum timestamp as the target timestamp; or, based on the pull position parameter being at its default value and the audio parameter being true, determine the value obtained by subtracting the absolute value of the default value of the pull position parameter from the maximum audio timestamp as the target timestamp; or, based on the pull position parameter being equal to 0 and the audio parameter being at its default value or false, determine the maximum timestamp as the target timestamp; or, based on the pull position parameter being equal to 0 and the audio parameter being true, determine the maximum audio timestamp as the target timestamp; or, based on the pull position parameter being less than 0 and the audio parameter being at its default value or false, determine the value obtained by subtracting the absolute value of the pull position parameter from the maximum timestamp as the target timestamp; or, based on the pull position parameter being less than 0 and the audio parameter being true, determine the value obtained by subtracting the absolute value of the pull position parameter from the maximum audio timestamp as the target timestamp; or, based on the pull position parameter being greater than 0 and the audio parameter being at its default value or false, determine the maximum timestamp as the target timestamp when a timestamp rollback occurs in the buffer; or, based on the pull position parameter being greater than 0 and the audio parameter being true, determine the maximum audio timestamp as the target timestamp when a timestamp rollback occurs in the buffer; or, based on the pull position parameter being greater than 0 and no timestamp rollback occurring in the buffer, determine the pull position parameter as the target timestamp.
  • In some embodiments, the acquiring module 902 is further configured to: determine that a timestamp rollback has occurred in the buffer based on the timestamps of the media frames in the buffered media frame sequence being non-monotonically increasing; and determine that no timestamp rollback has occurred in the buffer based on the timestamps not being non-monotonically increasing, where the media frame sequence is the sequence composed of the multiple media frames buffered in the buffer.
  • In some embodiments, the acquiring module 902 is configured to: based on the buffer including video resources, determine that the media frame sequence is non-monotonically increasing when the timestamps of the key frames in the key frame sequence are non-monotonically increasing, where the key frame sequence is the sequence composed of the multiple buffered key frames; and, based on the buffer not including video resources, determine that the media frame sequence is non-monotonically increasing when the timestamps of the audio frames in the audio frame sequence are non-monotonically increasing, where the audio frame sequence is the sequence composed of the multiple buffered audio frames.
  • In some embodiments, the acquiring module 902 is configured to: based on a target media frame existing in the current effective buffer, determine the target media frame as the media frame starting from the starting position, where the timestamp of the target media frame is greater than or equal to the target timestamp and closest to the target timestamp; or, based on no target media frame existing in the current effective buffer, enter a waiting state until the target media frame is written into the current effective buffer and determine the target media frame as the media frame starting from the starting position; or, based on no target media frame existing in the current effective buffer and the difference between the target timestamp and the maximum timestamp being greater than the timeout threshold, send pull failure information, where the timestamp of the target media frame is greater than or equal to the target timestamp and closest to the target timestamp.
  • Likewise, when the media stream transmission apparatus provided in the above embodiment transmits media streams, the division into the above functional modules is merely used as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the media stream transmission apparatus provided in the foregoing embodiment and the embodiments of the media stream transmission method belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
  • FIG. 10 shows a structural block diagram of a terminal 1000 provided by an exemplary embodiment of the present disclosure. The terminal 1000 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer or a desktop computer. The terminal 1000 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal or other names.
  • the terminal 1000 includes: one or more processors 1001 and one or more memories 1002.
  • The processor 1001 includes one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1001 is implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a co-processor, where the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit), and the co-processor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1001 is integrated with a GPU (Graphics Processing Unit), and the GPU is responsible for rendering and drawing the content that needs to be displayed on the display screen. In some embodiments, the processor 1001 further includes an AI (Artificial Intelligence) processor, and the AI processor is used to process computing operations related to machine learning.
  • the memory 1002 includes one or more computer-readable storage media, which are non-transitory.
  • the memory 1002 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • The non-transitory computer-readable storage medium in the memory 1002 is used to store at least one instruction, and the at least one instruction is executed by the processor 1001 to implement the media stream transmission method provided by the method embodiments of the present disclosure.
  • the terminal 1000 optionally further includes: a peripheral device interface 1003 and at least one peripheral device.
  • the processor 1001, the memory 1002, and the peripheral device interface 1003 are connected by a bus or signal line.
  • Each peripheral device is connected to the peripheral device interface 1003 through a bus, a signal line or a circuit board.
  • the peripheral device includes at least one of a radio frequency circuit 1004, a touch display screen 1005, a camera component 1006, an audio circuit 1007, a positioning component 1008, and a power supply 1009.
  • the peripheral device interface 1003 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1001 and the memory 1002.
  • In some embodiments, the processor 1001, the memory 1002 and the peripheral device interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002 and the peripheral device interface 1003 are implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 1004 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 1004 communicates with a communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 1004 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • The radio frequency circuit 1004 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on.
  • the radio frequency circuit 1004 communicates with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: World Wide Web, Metropolitan Area Network, Intranet, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area network and/or WiFi (Wireless Fidelity, wireless fidelity) network.
  • the radio frequency circuit 1004 further includes a circuit related to NFC (Near Field Communication), which is not limited in the present disclosure.
  • the display screen 1005 is used to display a UI (User Interface, user interface).
  • the UI includes graphics, text, icons, videos, and any combination of them.
  • the display screen 1005 also has the ability to collect touch signals on or above the surface of the display screen 1005.
  • the touch signal is input to the processor 1001 as a control signal for processing.
  • the display screen 1005 is also used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • In some embodiments there is one display screen 1005, provided on the front panel of the terminal 1000; in other embodiments there are at least two display screens 1005, provided respectively on different surfaces of the terminal 1000 or in a folded design; in still other embodiments, the display screen 1005 is a flexible display screen provided on a curved surface or a folding surface of the terminal 1000.
  • the display screen 1005 is also configured as a non-rectangular irregular pattern, that is, a special-shaped screen.
  • the display screen 1005 is made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
  • the camera assembly 1006 is used to capture images or videos.
  • the camera assembly 1006 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • the camera assembly 1006 further includes a flash.
  • The flash is a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, used for light compensation under different color temperatures.
  • the audio circuit 1007 includes a microphone and a speaker.
  • The microphone is used to collect sound waves from the user and the environment and convert the sound waves into electrical signals that are input to the processor 1001 for processing, or input to the radio frequency circuit 1004 to implement voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, provided respectively at different parts of the terminal 1000. In some embodiments, the microphone is an array microphone or an omnidirectional acquisition microphone.
  • the speaker is used to convert the electrical signal from the processor 1001 or the radio frequency circuit 1004 into sound waves.
  • The speaker may be a traditional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as distance measurement.
  • the audio circuit 1007 also includes a headphone jack.
  • the positioning component 1008 is used to locate the current geographic location of the terminal 1000 to implement navigation or LBS (Location Based Service, location-based service).
  • The positioning component 1008 is a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • the power supply 1009 is used to supply power to various components in the terminal 1000.
  • the power source 1009 is alternating current, direct current, disposable batteries or rechargeable batteries.
  • the rechargeable battery is a wired rechargeable battery or a wireless rechargeable battery.
  • a wired rechargeable battery is a battery charged through a wired line
  • a wireless rechargeable battery is a battery charged through a wireless coil.
  • the rechargeable battery is also used to support fast charging technology.
  • the terminal 1000 further includes one or more sensors 1010.
  • the one or more sensors 1010 include, but are not limited to: an acceleration sensor 1011, a gyroscope sensor 1012, a pressure sensor 1013, a fingerprint sensor 1014, an optical sensor 1015, and a proximity sensor 1016.
  • the acceleration sensor 1011 detects the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 1000. For example, the acceleration sensor 1011 is used to detect the components of gravitational acceleration on three coordinate axes.
  • the processor 1001 controls the touch screen 1005 to display the user interface in a horizontal view or a vertical view according to the gravity acceleration signal collected by the acceleration sensor 1011.
  • the acceleration sensor 1011 is also used for the collection of game or user motion data.
  • the gyroscope sensor 1012 detects the body direction and rotation angle of the terminal 1000, and the gyroscope sensor 1012 and the acceleration sensor 1011 cooperate to collect the user's 3D actions on the terminal 1000.
  • the processor 1001 implements the following functions according to the data collected by the gyroscope sensor 1012: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 1013 is arranged on the side frame of the terminal 1000 and/or the lower layer of the touch screen 1005.
  • the processor 1001 performs left and right hand recognition or quick operation according to the holding signal collected by the pressure sensor 1013.
  • the processor 1001 controls the operability controls on the UI interface according to the user's pressure operation on the touch display screen 1005.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 1014 is used to collect the user's fingerprint.
  • the processor 1001 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 identifies the user's identity according to the collected fingerprint.
  • the processor 1001 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings.
  • the fingerprint sensor 1014 is provided on the front, back or side of the terminal 1000. When a physical button or a manufacturer logo is provided on the terminal 1000, the fingerprint sensor 1014 is integrated with the physical button or the manufacturer logo.
  • the optical sensor 1015 is used to collect the ambient light intensity.
  • the processor 1001 controls the display brightness of the touch display screen 1005 according to the ambient light intensity collected by the optical sensor 1015. When the ambient light intensity is high, the display brightness of the touch display screen 1005 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1005 is decreased. In another embodiment, the processor 1001 also dynamically adjusts the shooting parameters of the camera assembly 1006 according to the ambient light intensity collected by the optical sensor 1015.
  • the proximity sensor 1016 also called a distance sensor, is usually arranged on the front panel of the terminal 1000.
  • the proximity sensor 1016 is used to collect the distance between the user and the front of the terminal 1000.
  • When the proximity sensor 1016 detects that the distance between the user and the front of the terminal 1000 gradually decreases, the processor 1001 controls the touch display screen 1005 to switch from the bright-screen state to the off-screen state; when the proximity sensor 1016 detects that the distance between the user and the front of the terminal 1000 gradually increases, the processor 1001 controls the touch display screen 1005 to switch from the off-screen state to the bright-screen state.
  • The structure shown in FIG. 10 does not constitute a limitation on the terminal 1000; the terminal may include more or fewer components than shown, combine some components, or adopt a different component arrangement.
  • FIG. 11 is a schematic structural diagram of a server provided by an embodiment of the present disclosure.
  • The server 1100 may vary considerably depending on configuration or performance, and includes one or more processors (central processing units, CPU) 1101 and one or more memories 1102, where at least one instruction is stored in the memory 1102 and is loaded and executed by the processor 1101 to implement the media stream transmission method provided by each of the foregoing method embodiments.
  • the server also has components such as a wired or wireless network interface and an input/output interface for input and output.
  • the server also includes other components for implementing device functions, which will not be repeated here.
  • The above-mentioned terminal and server are electronic devices that include one or more processors and one or more memories for storing instructions executable by the one or more processors, where the one or more processors are configured to execute the instructions to implement the method steps of the media stream transmission method shown in the foregoing embodiments.
  • In an exemplary embodiment, a storage medium including instructions is also provided, such as a memory including instructions. The features and applications described in the foregoing embodiments can be executed by instructions stored in the storage medium: when these instructions are executed by one or more computing or processing units (for example, one or more processors, cores of the processors, or other processing units), they cause the processing units to perform the actions indicated in the instructions, namely the method operations of the media stream transmission method shown in the foregoing embodiments.
  • In some embodiments, the storage medium is a non-transitory computer-readable storage medium. For example, the non-transitory computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • The embodiments of the present disclosure also provide a computer program product including one or more instructions; when the one or more instructions are executed by the processor of an electronic device, the electronic device can execute the above media stream transmission method.

Abstract

The present disclosure relates to a media stream transmission method and system, and belongs to the field of network technology. The media stream transmission method may include: when acquiring a media stream, determining target address information at a target bitrate from address information of the media stream at multiple bitrates, then determining the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate, and sending to a server a frame acquisition request carrying the target address information and the starting position, so as to instruct the server to return, at the target bitrate, the media frames in the media stream starting from the starting position.

Description

Media stream transmission method and system
This application claims priority to the Chinese patent application No. 2020100548308, filed on January 17, 2020 and entitled "Media stream transmission method, system, apparatus, device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of network technology, and in particular to a media stream transmission method and system.
Background
With the development of media transmission technology, users can browse audio and video resources on terminals anytime and anywhere. At present, when a server transmits audio and video resources to a terminal (commonly known as the "stream-pulling stage"), a segment-based media transmission mode is adopted.
Segment-based media transmission modes include the common DASH (Dynamic Adaptive Streaming over HTTP, the HTTP-based adaptive streaming standard formulated by MPEG, where MPEG stands for Moving Picture Experts Group) and HLS (HTTP Live Streaming, the HTTP-based adaptive streaming standard formulated by Apple). However, segment-based media transmission has relatively high latency.
Summary
The embodiments of the present disclosure provide a media stream transmission method and system. The technical solutions are as follows:
According to a first aspect of the embodiments of the present disclosure, a media stream transmission method is provided, applied to a terminal and including: in response to a frame acquisition instruction for a media stream, determining target address information of the media stream at a target bitrate from address information of the media stream at multiple bitrates; determining the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate; and sending to a server a frame acquisition request carrying the target address information and the starting position, the frame acquisition request being used to instruct the server to return, at the target bitrate, the media frames in the media stream starting from the starting position.
According to a second aspect of the embodiments of the present disclosure, a media stream transmission method is provided, applied to a server and including: receiving a frame acquisition request carrying target address information of a media stream at a target bitrate and the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate; in response to the frame acquisition request, acquiring the media frames starting from the starting position from the address corresponding to the target address information; and transmitting, at the target bitrate, the media frames starting from the starting position to a terminal.
According to a third aspect of the embodiments of the present disclosure, a media stream transmission system is provided, including a terminal and a server, the terminal being configured to execute the above media stream transmission method applied to a terminal, and the server being configured to execute the above media stream transmission method applied to a server.
According to a fourth aspect of the embodiments of the present disclosure, a media stream transmission apparatus is provided, applied to a terminal and including a determining module and a sending module, where the determining module is configured to, in response to a frame acquisition instruction for a media stream, determine target address information of the media stream at a target bitrate from address information of the media stream at multiple bitrates; the determining module is further configured to determine the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate; and the sending module is configured to send to a server a frame acquisition request carrying the target address information and the starting position, the frame acquisition request being used to instruct the server to return, at the target bitrate, the media frames in the media stream starting from the starting position.
According to a fifth aspect of the embodiments of the present disclosure, a media stream transmission apparatus is provided, applied to a server and including a receiving module, an acquiring module and a transmission module, where the receiving module is configured to receive a frame acquisition request carrying target address information of a media stream at a target bitrate and the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate; the acquiring module is configured to, in response to the frame acquisition request, acquire the media frames starting from the starting position from the address corresponding to the target address information; and the transmission module is configured to transmit, at the target bitrate, the media frames starting from the starting position to a terminal.
According to a sixth aspect of the embodiments of the present disclosure, an electronic device is provided, including: one or more processors; and one or more memories for storing instructions executable by the one or more processors, where the one or more processors are configured to execute the instructions to implement the following steps: in response to a frame acquisition instruction for a media stream, determining target address information of the media stream at a target bitrate from address information of the media stream at multiple bitrates; determining the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate; and sending to a server a frame acquisition request carrying the target address information and the starting position, the frame acquisition request being used to instruct the server to return, at the target bitrate, the media frames in the media stream starting from the starting position.
According to a seventh aspect of the embodiments of the present disclosure, an electronic device is provided, including: one or more processors; and one or more memories for storing instructions executable by the one or more processors, where the one or more processors are configured to execute the instructions to implement the following steps: receiving a frame acquisition request carrying target address information of a media stream at a target bitrate and the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate; in response to the frame acquisition request, acquiring the media frames starting from the starting position from the address corresponding to the target address information; and transmitting, at the target bitrate, the media frames starting from the starting position to a terminal.
According to an eighth aspect of the embodiments of the present disclosure, a storage medium is provided; when the instructions in the storage medium are executed by the processor of an electronic device, the electronic device is enabled to execute the following steps: in response to a frame acquisition instruction for a media stream, determining target address information of the media stream at a target bitrate from address information of the media stream at multiple bitrates; determining the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate; and sending to a server a frame acquisition request carrying the target address information and the starting position, the frame acquisition request being used to instruct the server to return, at the target bitrate, the media frames in the media stream starting from the starting position.
According to a ninth aspect of the embodiments of the present disclosure, a storage medium is provided; when the instructions in the storage medium are executed by the processor of an electronic device, the electronic device is enabled to execute the following steps: receiving a frame acquisition request carrying target address information of a media stream at a target bitrate and the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate; in response to the frame acquisition request, acquiring the media frames starting from the starting position from the address corresponding to the target address information; and transmitting, at the target bitrate, the media frames starting from the starting position to a terminal.
According to a tenth aspect of the embodiments of the present disclosure, a computer program product is provided, including one or more instructions; when the one or more instructions are executed by the processor of an electronic device, the electronic device is enabled to execute the above media stream transmission method.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an implementation environment of a media stream transmission method according to an exemplary embodiment;
FIG. 2 is a schematic diagram of the principle of a FAS framework provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of a media stream transmission method according to an exemplary embodiment;
FIG. 4 is a flowchart of a media stream transmission method according to an exemplary embodiment;
FIG. 5 is an interaction flowchart of a media stream transmission method according to an exemplary embodiment;
FIG. 6 is a schematic diagram of a bitrate switching process according to an exemplary embodiment;
FIG. 7 is a schematic diagram of the principle of determining a target timestamp provided by an embodiment of the present disclosure;
FIG. 8 is a block diagram of a media stream transmission apparatus according to an exemplary embodiment;
FIG. 9 is a block diagram of a media stream transmission apparatus according to an exemplary embodiment;
FIG. 10 is a block diagram of a terminal provided by an embodiment of the present disclosure;
FIG. 11 is a block diagram of a server provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions and advantages of the present disclosure clearer, the embodiments of the present disclosure are described in further detail below with reference to the drawings.
To enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings.
It should be noted that the terms "first", "second" and the like in the specification and claims of the present disclosure and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that the data so used are interchangeable where appropriate, so that the embodiments of the present disclosure described herein can be implemented in orders other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The user information involved in the present disclosure is information authorized by the user or fully authorized by all parties.
The terms involved in the present disclosure are explained below.
1. FLV (Flash Video)
FLV is a streaming media format that developed along with the release of Flash MX (an animation production tool). Because the files it produces are extremely small and load extremely fast, it makes watching video files over the network (that is, browsing video online) possible. Its emergence effectively solved the problem that, after video files were imported into Flash, the exported SWF files (a dedicated Flash file format) were so large that they could not be used well on the network.
2. Streaming Media
Streaming media adopts a streaming transmission method, which refers to a technology and process in which a series of media streams are compressed and resource packets are sent over the network, so that the media streams are transmitted on the network in real time for viewing. This technology enables resource packets to be sent like flowing water; without it, the entire media file would have to be downloaded before use, so the media stream could only be watched offline. Streaming transmission can deliver live media streams or media streams pre-stored on a server. When audience users watch these media streams, the media streams are played by specific playback software after reaching the audience users' audience terminals.
3. FAS (FLV Adaptive Streaming, the FLV-based adaptive streaming standard)
FAS is a streaming resource transmission standard (or resource transmission protocol) proposed in the present disclosure. Unlike traditional segment-based media transmission modes, the FAS standard achieves frame-level media stream transmission: the server does not need to wait until a complete video segment arrives before sending resource packets to the terminal. Instead, after parsing the terminal's frame acquisition request, the server determines a target timestamp; based on the target timestamp being less than zero, the server packages all the media frames already cached from the target timestamp onward and sends them to the terminal (without segmentation); afterwards, based on the target timestamp being greater than or equal to zero, or a real-time stream existing in addition to the cached media frames, the server sends the media frames of the media stream to the terminal frame by frame. It should be noted that the frame acquisition request specifies a target bitrate; when the terminal's own network bandwidth changes, the terminal adaptively adjusts the bitrate to switch to and resends a frame acquisition request corresponding to that bitrate, thereby adaptively adjusting the bitrate of the media stream. The FAS standard can achieve frame-level transmission. An illustrative request is sketched below.
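For intuition, an illustrative FAS frame acquisition request is sketched below; the host, path and exact query syntax are assumptions, since the description only names the @url and @fasSpts fields (and the @onlyAudio extended parameter):

```python
# fasSpts=-2000 would ask the server to package roughly the last 2000 ms of
# cached media frames and then continue frame by frame with the live stream.
request_url = "https://edge.example.com/live/stream_720p.flv?fasSpts=-2000&onlyAudio=false"
```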
4. Live Streaming and Video on Demand
Live streaming: the media stream is recorded in real time. The anchor user "pushes" the media stream (that is, pushes it in a streaming manner) to the server through the anchor terminal. After the audience user triggers entry into the anchor user's live interface on the audience terminal, the media stream is "pulled" (that is, pulled in a streaming manner) from the server to the audience terminal, which decodes and plays the media stream, thereby playing the video in real time.
Video on demand: also known as Video On Demand (VOD). The media stream is pre-stored on the server, and the server can provide the media stream specified by the audience user according to the audience user's requirements. The audience terminal sends an on-demand request to the server; after the server finds the media stream specified by the on-demand request, it sends the media stream to the audience terminal. In other words, the audience user can selectively play a specific media stream.
Intuitively, the playback progress of on-demand content can be controlled arbitrarily, which is not so for live streaming, where the playback pace of the content depends on the anchor user's real-time live progress.
The implementation environment of the embodiments of the present disclosure is described below by way of example.
FIG. 1 is a schematic diagram of the implementation environment involved in a media stream transmission method according to an exemplary embodiment. Referring to FIG. 1, the implementation environment includes at least one terminal and a server, detailed as follows:
The at least one terminal is used for media stream transmission. A media codec component and a media playback component are installed on each terminal; the media codec component is used to decode the media stream after receiving it (for example, resource packets in segment transmission or media frames in frame-level transmission), and the media playback component is used to play the media stream after decoding it.
According to user identity, the at least one terminal is divided into anchor terminals and audience terminals, where an anchor terminal corresponds to an anchor user and an audience terminal corresponds to an audience user. It should be noted that the same terminal can be both an anchor terminal and an audience terminal; for example, the terminal is an anchor terminal when the user records a live broadcast and an audience terminal when the user watches a live broadcast.
The at least one terminal and the server are connected through a wired network or a wireless network.
The server is used to provide the media stream to be transmitted, and includes at least one of a server, multiple servers, a cloud computing platform or a virtualization center. In some embodiments, the server undertakes the primary computing work and the at least one terminal undertakes the secondary computing work; or the server undertakes the secondary computing work and the at least one terminal undertakes the primary computing work; or the at least one terminal and the server perform collaborative computing using a distributed computing architecture.
In an exemplary scenario, the server is a clustered CDN (Content Delivery Network) server, which includes a central platform and edge servers deployed in various places. Through functional modules of the central platform such as load balancing, content distribution and scheduling, the terminal where the user is located can rely on a local edge server to obtain the required content (that is, the media stream) nearby, thereby reducing network congestion and improving the response speed and hit rate of terminal access.
In other words, the CDN server adds a caching mechanism between the terminal and the central platform, namely edge servers (such as WEB servers) deployed in different geographic locations. In performance optimization, the central platform schedules, according to the distance between the terminal and the edge servers, the edge server closest to the terminal to serve the terminal, which distributes content to the terminal more effectively.
The media streams involved in the embodiments of the present disclosure include, but are not limited to, at least one of video resources, audio resources, image resources or text resources; the embodiments of the present disclosure do not specifically limit the type of the media stream. For example, the media stream is the live video stream of a network anchor, or a historical on-demand video pre-stored on the server, or the live audio stream of a radio anchor, or historical on-demand audio pre-stored on the server.
In some embodiments, the device type of each of the at least one terminal includes, but is not limited to, at least one of a television, a smartphone, a smart speaker, an in-vehicle terminal, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer or a desktop computer. The following embodiments are illustrated with the terminal including a smartphone.
Those skilled in the art will know that the number of the at least one terminal may be only one, or several tens or hundreds, or more. The embodiments of the present disclosure do not limit the number or device types of the at least one terminal.
The implementation environment of the embodiments of the present disclosure has been introduced above; the method flow of the embodiments of the present disclosure is described below by way of example.
FIG. 2 is a schematic diagram of the principle of a FAS framework provided by an embodiment of the present disclosure. Referring to FIG. 2, an embodiment of the present disclosure provides a FAS (streaming-based multi-bitrate adaptive) framework, within which multimedia resources are transmitted between at least one terminal 101 and a server 102 through the FAS protocol.
Taking any terminal as an example, an application (also called a FAS client) is installed on the terminal and is used to browse multimedia resources; for example, the application is a short-video application, a live-streaming application, a video-on-demand application, a social application or a shopping application, and the embodiments of the present disclosure do not specifically limit the type of the application.
The user starts the application on the terminal, and the terminal displays a resource push interface (for example, the home page or a functional interface of the application). The resource push interface includes thumbnail information of at least one multimedia resource, the thumbnail information including at least one of a title, a synopsis, a publisher, a poster, a trailer or a highlight clip. In response to the user's touch operation on the thumbnail information of any multimedia resource, the terminal jumps from the resource push interface to a resource playback interface, which includes a playback option for the multimedia resource. In response to the user's touch operation on the playback option, the terminal downloads the media description file (Media Presentation Description, MPD) of the multimedia resource from the server, determines the target address information of the multimedia resource at the target bitrate based on the media description file, and sends a frame acquisition request (also called a FAS request) carrying the target address information to the server, so that the server processes the frame acquisition request based on certain specifications (the processing specifications for FAS requests). After locating the media frames of the multimedia resource (consecutive media frames constitute the media stream), the server returns the media frames of the multimedia resource to the terminal at the target bitrate (that is, returns the media stream to the terminal at the target bitrate). After receiving the media stream, the terminal calls the media codec component to decode the media stream to obtain the decoded media stream, and calls the media playback component to play the decoded media stream.
In some live-streaming scenarios, the media stream requested by the terminal is usually the live video stream pushed to the server in real time by the anchor user. In this case, after receiving the anchor user's live video stream, the server transcodes the live video stream to obtain live video streams at multiple bitrates, assigns different address information to the live video streams at different bitrates, and records them in the media description file, so that it can return the corresponding live video streams at different bitrates for frame acquisition requests carrying different address information.
Furthermore, a mechanism for adaptively adjusting the bitrate is provided: when the terminal's current network bandwidth changes, the terminal adaptively adjusts the bitrate to switch to so as to match the current network bandwidth. For example, when the bitrate needs to be switched, the terminal disconnects the media stream transmission link at the current bitrate, sends to the server a frame acquisition request carrying the to-be-switched address information corresponding to the to-be-switched bitrate, and establishes a media stream transmission link based on the to-be-switched bitrate. Alternatively, the terminal does not disconnect the media stream transmission link at the current bitrate, but directly re-initiates a frame acquisition request carrying the to-be-switched address information and establishes a media stream transmission link based on the to-be-switched bitrate (for transmitting the new media stream), keeping the original media stream as a backup stream; once the new media stream suffers a transmission anomaly, the backup stream continues to be played, dynamically adjusting the bitrate of the media stream during playback. The switch is sketched below.
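A minimal sketch of the client-side switch described above; open_stream and the MPD accessor are hypothetical helpers, not part of the specification:

```python
def switch_bitrate(mpd, target_kbps, resume_pts, open_stream):
    """Re-issue the frame acquisition request at the new bitrate, resuming
    from resume_pts via the @fasSpts field."""
    new_url = mpd[target_kbps]                       # @url of the target bitrate
    return open_stream(new_url, fas_spts=resume_pts)
```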
Within the above FAS framework, frame-level media stream transmission can be achieved without transmitting the multimedia resources in segments.
FIG. 3 is a flowchart of a media stream transmission method according to an exemplary embodiment. The method is applied to a terminal and includes the following steps.
In S31, in response to a frame acquisition instruction for a media stream, the target address information of the media stream at a target bitrate is determined from the address information of the media stream at multiple bitrates.
In some embodiments, the address information of the media stream at the multiple bitrates is stored in the media description file of the media stream. Correspondingly, in S31, in response to the frame acquisition instruction for the media stream, the target address information of the media stream at the target bitrate is determined from the address information of the media stream at multiple bitrates included in the media description file of the media stream.
In S32, the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate is determined.
In S33, a frame acquisition request carrying the target address information and the starting position is sent to the server, the frame acquisition request being used to instruct the server to return, at the target bitrate, the media frames in the media stream starting from the starting position.
In some embodiments, based on the frame acquisition instruction being triggered by a play operation on the media stream, determining the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate includes: determining, as the starting position, the position in the media stream of the media frame generated at the operation time of the play operation; or determining, as the starting position, the position in the media stream of the media frame selected in the frame acquisition instruction; or determining the position of the first media frame of the media stream as the starting position.
In some embodiments, the frame acquisition instruction is triggered when the playback status information of the media stream meets a bitrate switching condition.
In some embodiments, in response to the frame acquisition instruction for the media stream, determining the target address information of the media stream at the target bitrate from the address information of the media stream at multiple bitrates includes: when any media frame in the media stream is received, obtaining the playback status information of the media stream; and in response to the playback status information meeting the bitrate switching condition, determining the target address information of the media stream at the target bitrate from the address information of the media stream at multiple bitrates.
Determining the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate includes: determining, according to the position of the any media frame in the media stream, the starting position in the media stream of the media frames to be acquired corresponding to the target bitrate.
In some embodiments, in response to the playback status information meeting the bitrate switching condition, determining the target address information of the media stream at the target bitrate from the address information of the media stream at multiple bitrates included in the media description file of the media stream includes: in response to the playback status information meeting the bitrate switching condition, determining the target bitrate according to the playback status information and the current bitrate; and in response to the target bitrate being unequal to the current bitrate, determining the target address information of the media stream at the target bitrate from the address information of the media stream at multiple bitrates included in the media description file of the media stream.
In some embodiments, the playback status information includes a first buffer amount, the first buffer amount being the amount currently buffered for the media stream but not yet played. In response to the playback status information meeting the bitrate switching condition, determining the target bitrate according to the playback status information and the current bitrate includes: in response to the first buffer amount being greater than a first buffer amount threshold or the first buffer amount being smaller than a second buffer amount threshold, determining the target bitrate according to the playback status information and the current bitrate, where the second buffer amount threshold is less than the first buffer amount threshold.
In some embodiments, determining the target bitrate according to the playback status information and the current bitrate includes: obtaining multiple candidate bitrates; obtaining a second buffer amount corresponding to each candidate bitrate according to the relationship between the multiple candidate bitrates and the current bitrate, the playback status information, and the position of the any media frame within the media frame group where it is located; and determining the target bitrate from the multiple candidate bitrates according to the relationship between the second buffer amount corresponding to each candidate bitrate and the first buffer amount threshold or the second buffer amount threshold, where the second buffer amount corresponding to each candidate bitrate is the amount that will have been buffered but not played for the media stream at the end of transmission of the media frame group in which the any media frame is located, after the bitrate is switched to that candidate bitrate. A sketch of this evaluation follows.
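A minimal sketch of the candidate evaluation in the preceding paragraph. The buffer model is an assumption (the remainder of the current frame group is downloaded at the candidate bitrate over the measured bandwidth while playback drains the buffer in real time); all names are illustrative:

```python
def second_buffer_ms(first_buffer_ms, remaining_group_ms, candidate_kbps, bandwidth_kbps):
    """Estimated buffered-but-unplayed amount when the current frame group
    finishes transmitting at the candidate bitrate."""
    download_ms = remaining_group_ms * candidate_kbps / bandwidth_kbps
    return first_buffer_ms + remaining_group_ms - download_ms

def pick_target_bitrate(candidates, first_buffer_ms, remaining_group_ms,
                        bandwidth_kbps, low_threshold_ms):
    # Keep the candidates whose estimated second buffer stays above the low
    # threshold, and prefer the highest such bitrate.
    viable = [c for c in candidates
              if second_buffer_ms(first_buffer_ms, remaining_group_ms,
                                  c, bandwidth_kbps) > low_threshold_ms]
    return max(viable) if viable else min(candidates)
```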
In some embodiments, the frame acquisition request further includes at least one of a first extended parameter or a second extended parameter, the first extended parameter being used to indicate whether the media frames are audio frames, and the second extended parameter being used to indicate that the media frames in the media stream are transmitted starting from the target timestamp indicated by the second extended parameter.
In some embodiments, the media description file includes a version number and a media description set, where the version number includes at least one of the version number of the media description file or the version number of the resource transmission standard, and the media description set includes multiple pieces of media description meta-information, each piece corresponding to the media stream at one bitrate and each including the group-of-pictures length and attribute information of the media stream at the bitrate corresponding to that media description meta-information. An illustrative file is sketched below.
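An illustrative media description file consistent with the fields named above; the concrete field names, values and serialization are assumptions, since the description does not fix them:

```python
media_description_file = {
    "version": "1.0",                 # version of the file / transmission standard
    "adaptationSet": [                # the media description set
        {"bitrate": 540,  "gopDuration": 2000, "url": "/live/stream_540p.flv"},
        {"bitrate": 720,  "gopDuration": 2000, "url": "/live/stream_720p.flv"},
        {"bitrate": 1080, "gopDuration": 2000, "url": "/live/stream_1080p.flv"},
    ],
}
```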
图4是根据一示例性实施例示出的一种媒体流传输方法的流程图,该方法应用于服务器,包括以下步骤。
在S41中,接收帧获取请求,该帧获取请求携带有目标码率的媒体流的目标地址信息和该目标码率对应的待获取媒体帧在该媒体流中的起始位置。
在S42中,响应于该帧获取请求,从该目标地址信息对应的地址,获取从该起始位置开始的媒体帧。
在S43中,以该目标码率,向终端传输从该起始位置开始的媒体帧。
在一些实施例中,该从该目标地址信息对应的地址,获取从该起始位置开始的媒体帧,包括:基于该起始位置,确定目标时间戳;基于该目标时间戳,确定并获取从该起始位置开始的媒体帧。
在一些实施例中,基于该起始位置为拉取位置参数,该基于该起始位置,确定目标时间戳,包括:基于音频参数和拉取位置参数,确定目标时间戳。
在一些实施例中,该基于该音频参数和该拉取位置参数,确定目标时间戳包括:基于该拉取位置参数为默认值,且该音频参数为默认值或该音频参数为假,将最大时间戳减去该拉取位置参数的默认值的绝对值所得的数值确定为该目标时间戳;或,基于该拉取位置参数为默认值,且该音频参数为真,将最大音频时间戳减去该拉取位置参数的默认值的绝对值所得的数值确定为该目标时间戳;或,基于该拉取位置参数等于0,且该音频参数为默认值或该音频参数为假,将最大时间戳确定为该目标时间戳;或,基于该拉取位置参数等于0,且该音频参数为真,将最大音频时间戳确定为该目标时间戳;或,基于该拉取位置参数小于0,且该音频参数为默认值或该音频参数为假,将最大时间戳减去该拉取位置参数的绝对值所得的数值确定为该目标时间戳;或,基于该拉取位置参数小于0,且该音频参数为真,将最大音频时间戳减去该拉取位置参数的绝对值所得的数值确定为该目标时间戳;或,基于该拉取位置参数大于0,且该音频参数为默认值或该音频参数为假,在缓存区中发生时间戳回退时,将最大时间戳确定为该目标时间戳;或,基于该拉取位置参数大于0,且该音频参数为真,在缓存区中发生时间戳回退时,将最大音频时间戳确定为该目标时间戳;或,基于该拉取位置参数大于0,且缓存区中未发生时间戳回退时,将该拉取位置参数确定为该目标时间戳。
在一些实施例中,该方法还包括:基于缓存区中媒体帧序列中媒体帧的时间戳呈非单调递增,确定该缓存区发生时间戳回退;基于缓存区中媒体帧序列中媒体帧的时间戳不是呈非单调递增,确定该缓存区未发生时间戳回退,所述媒体帧序列为所述缓存区缓存的多个媒体帧所组成的序列。
在一些实施例中,该方法还包括:基于该缓存区中包括视频资源且在关键帧序列中关键帧的时间戳呈非单调递增时,确定该媒体帧序列呈非单调递增,所述关键帧序列为缓存的多个关键帧组成的序列;基于该缓存区中不包括视频资源且在音频帧序列中音频帧的时间戳呈非单调递增时,确定该媒体帧序列呈非单调递增,所述音频帧序列为缓存的多个音频帧组成的序列。
在一些实施例中,该基于该目标时间戳,确定并获取从该起始位置开始的媒体帧,包括:基于当前有效缓存区中存在目标媒体帧,确定该目标媒体帧为该从该起始位置开始的媒体帧,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳;或,基于该当前有效缓存区中不存在目标媒体帧,进入等待状态,直到该目标媒体帧写入该当前有效缓存区时,确定该目标媒体帧为该从该起始位置开始的媒体帧,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳;或,基于该当前有效缓存区中不存在目标媒体帧,且该目标时间戳与最大时间戳之间的差值大于超时阈值,发送拉取失败信息,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳。
以上从终端和服务器单侧介绍了本公开实施例提供的方法实施例,以下通过另一个方法实施例从终端和服务器之间进行交互的角度出发,对本公开实施例进行示例性说明。
图5是根据一示例性实施例示出的一种媒体流传输方法的流程图,参见图5,该方法包括以下步骤。
在S51中,终端接收媒体流中的任一媒体帧时,获取媒体流的播放状态信息。
在本公开实施例中,提供了一种媒体流传输方法,终端接收帧获取指令,响应于该帧获取指令,确定目标码率的媒体流的目标地址信息以及待获取媒体帧的起始位置,从而向服务器发送请求以指示服务器按照目标码率发送相应的媒体帧,以实现媒体流的帧级传输。
其中,该帧获取指令包括两种触发方式,该两种触发方式对应的应用场景不同,在第一种触发方式中,该帧获取指令在所述媒体流的播放状态信息满足码率切换条件时触发。也即是,在根据该媒体流的播放状态信息确定需要切换码率时,触发该帧获取指令,重新向服务器发送请求,以请求切换后的目标码率的媒体帧。这种触发方式对应的内容参见S51和S52。
在第二种触发方式中,该帧获取指令由对媒体流的播放操作触发,例如,该播放操作是用户首次对媒体流进行的播放操作,也可以为播放暂停后重新启动播放的操作。终端向用户提供码率选择列表,用户在码率选择列表中,选择一个码率作为目标码率。用户手动点击码率选择列表中任一数值,相应地,终端将该数值对应的码率确定为目标码率。用户对媒体流的播放操作可以发生在终端播放媒体流的过程中,相应地,用户在使用终端获取媒体流的过程中,随时手动切换传输码率。该播放操作也可以发生在终端第一次获取到媒体流之前,相应地,用户使用终端开始获取媒体流之前,先在终端提供的码率选择列表中确定目标码率,终端再响应对应触发的媒体流的帧获取指令。当然,该目标码率还可以为一个默认值,本公开实施例对此不作限定。
下面针对上述第一种触发方式进行详细说明。
其中,终端与服务器通过有线网络或无线网络相连,该服务器用于提供多种码率的媒体流。媒体流包括若干媒体帧,该媒体帧可以为音频帧,也可以为图像帧,该若干媒体帧通过对原始媒体资源采样得到。终端从服务器源源不断地获取媒体流中的若干媒体帧,再播放获取的媒体帧,实现媒体流的传输和播放。终端在接收该媒体流中的任一媒体帧时,获取该媒体流的播放状态信息,该播放状态信息用于判断是否需要切换媒体流的传输码率。
终端上安装有应用程序,该应用程序用于浏览媒体流,例如,该应用程序包括短视频应用、直播应用、视频点播应用、社交应用或者购物应用中至少一项,本公开实施例不对应用程序的类型进行具体限定。
本公开实施例所涉及的媒体流,包括但不限于:视频资源、音频资源、图像资源或者文本资源中至少一项,本公开实施例不对媒体流的类型进行具体限定。比如,该媒体流为网络主播的直播视频流,或者为预存在服务器上的历史点播视频,或者为电台主播的直播音频流,或者为预存在服务器上的历史点播音频。
在上述过程中,用户在终端上启动应用程序,该应用程序显示资源推送界面,例如该资源推送界面是应用程序的首页或者功能界面,本公开实施例不对资源推送界面的类型进行具体限定。在该资源推送界面中包括至少一个媒体流的缩略信息,该缩略信息包括媒体流的标题、简介、海报、预告片或者精彩片段中至少一项。用户在浏览资源推送界面的过程中,点击感兴趣的媒体流的缩略信息,响应于用户对该媒体流的缩略信息的触控操作,终端从资源推送界面跳转至资源播放界面。
在S52中,终端响应于播放状态信息符合码率切换条件,根据播放状态信息以及当前码率,确定目标码率。
其中,终端在播放媒体流时,获取媒体流的播放状态信息,该播放状态信息用于判断是否需要切换媒体流的码率,当播放状态信息符合码率切换条件时,终端对此作出响应,确定并将码率切换为可优化媒体流的播放效果的目标码率。
通过自适应功能,将码率调整为与当前的网络带宽信息对应的码率,在进行自适应调整的过程中,除了当前的网络带宽信息之外,还结合终端的播放状态信息,动态选择播放效果最佳的目标码率,从而在媒体流的卡顿率、清晰度以及平滑性之间取得折中。
在一些实施例中,终端确定目标码率后,获取该目标码率对应的媒体流的地址信息,即目标地址信息,相应地,终端触发帧获取指令,该帧获取指令用于指示终端从多种码率的媒体流的地址信息中,确定目标码率的媒体流的目标地址信息,为后续根据目标地址信息传输媒体流提供基础。
在一些实施例中,该多种码率的媒体流的地址信息存储于媒体流的媒体描述文件中,相应地,终端从媒体描述文件包括的多种码率的媒体流的地址信息中,确定目标码率的媒体流的目标地址信息。
下面对该码率切换判断和目标码率的获取过程进行详细说明。
在一些实施例中,播放状态信息包括第一缓存量,第一缓存量为当前对媒体流已缓存且未播放的缓存量。终端响应于第一缓存量大于第一缓存量阈值或第一缓存量小于第二缓存量阈值,根据播放状态信息以及当前码率,确定目标码率,第二缓存量阈值小于第一缓存量阈值。
其中,终端在得到媒体流的媒体帧后,将得到的媒体帧存入缓存,等到需要播放媒体帧时,对缓存的媒体帧进行解码并按时间顺序播放。该第一缓存量通过已缓存且未播放的媒体流的时长来度量。例如,终端已缓存1000毫秒(ms),播放了400ms,则第一缓存量为600ms。
针对该码率切换条件的不同,上述播放状态信息符合码率切换条件时,根据播放状态信息以及当前码率,确定目标码率,包括两种情况:
情况一、响应于第一缓存量大于第一缓存量阈值,终端根据播放状态信息以及当前码率,确定目标码率,该目标码率大于或等于当前码率。
在情况一中,第一缓存量大于第一缓存量阈值,说明当前终端对媒体流已缓存且未播放的缓存量能够保证媒体流的流畅播放,而媒体流的码率越大,媒体流越清晰,因此考虑增大媒体流的下载码率。
情况二、响应于第一缓存量小于第二缓存量阈值,终端根据播放状态信息以及当前码率,确定目标码率,该目标码率小于或等于当前码率。
其中,第二缓存量阈值小于第一缓存量阈值,该第一缓存量阈值和该第二缓存量阈值可以是预先设定的缓存量阈值,也可以是临时设定的缓存量阈值。
在情况二中,第一缓存量小于第二缓存量阈值时,代表当前终端对媒体流已缓存且未播放的缓存量可能无法保证媒体流的流畅播放,而媒体流的码率越小,在相同时间内,终端能够缓存的媒体流越多,增加了缓存也就能够使媒体流的播放更加流畅,因此考虑减小媒体流的下载码率。
第一缓存量与第一缓存量阈值、第二缓存量阈值之间的关系除了上述两种情况中所示的关系外,还具有一种可能情况:第一缓存量小于或等于第一缓存量阈值,且大于或等于第二缓存量阈值。此时说明当前终端对媒体流已缓存且未播放的缓存量符合媒体流的播放要求,且可能刚好符合要求,此时不改变媒体流的下载码率。
例如,第一缓存量阈值用q_h表示,第二缓存量阈值用q_l表示,第一缓存量用q_c表示。当q_c>q_h时,媒体流播放发生卡顿的概率很小,此时考虑增大媒体流的码率。当q_c<q_l时,媒体流播放发生卡顿的概率很大,此时考虑降低媒体流的码率。
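为便于理解上述判断逻辑,以下给出一段示意性的Python草图(其中的函数名与返回值约定均为本文说明而设的假设,并非本公开的规范实现):

```python
# 示意性草图:根据第一缓存量 q_c 与两个阈值 q_h、q_l 判断码率调整方向。
def check_switch_direction(q_c: float, q_h: float, q_l: float) -> str:
    """q_c 为已缓存且未播放的时长(毫秒),q_h > q_l 为两个缓存量阈值。"""
    if q_c > q_h:
        return "up"    # 缓存充足,卡顿概率小,考虑增大码率
    if q_c < q_l:
        return "down"  # 缓存不足,卡顿概率大,考虑降低码率
    return "keep"      # 介于两阈值之间,维持当前码率
```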
在一些实施例中,通过两个阈值的设置,将第一缓存量与第一缓存量阈值和第二缓存量阈值进行对比,以此来判断是否对媒体流的码率进行切换,快速获知媒体流当前的播放效果。该第一缓存量大于第一缓存量阈值或小于第二缓存量阈值时,终端执行码率切换的步骤,并通过自适应调整码率以优化播放效果。但考虑到切换码率后,终端以目标码率接收媒体流不一定能够保证媒体流的正常播放,在确定目标码率时,通过将第二缓存量与两个阈值进行对比,以此来判断码率切换后媒体流的播放情况是否会得到改善。
在一些实施例中,该目标码率的确定过程为对多个候选码率对应的播放效果进行判断,从而从多个候选码率中获取播放效果最好的候选码率作为目标码率。终端获取多个候选码率,根据多个候选码率与当前码率之间的关系、播放状态信息以及媒体流中任一媒体帧在该任一媒体帧所在媒体帧组中的位置,获取每个候选码率对应的第二缓存量,根据每个候选码率对应的第二缓存量与第一缓存量阈值或第二缓存量阈值的关系,从多个候选码率中,确定目标码率。
其中,服务器中缓存有多种码率的媒体流,该多个候选码率即为服务器能够提供的媒体流的多种码率。媒体流包括多个媒体帧组,每个媒体帧组的长度根据业务需求进行预设,也由技术人员临时设置,本公开对此不做限定。每个媒体帧组包括多个媒体帧,多个媒体帧按时间顺序进行排列。媒体流中任一媒体帧在该任一媒体帧所在媒体帧组中的位置通过终端从该媒体帧组的第一帧播放至该任一媒体帧所需的时长来表示。每个候选码率对应的第二缓存量代表将码率切换至该候选码率后,终端对任一媒体帧所在媒体帧组传输结束时,已缓存但未播放的媒体流能够播放的时长。通过将每个候选码率对应的第二缓存量与第一缓存量阈值和第二缓存量阈值进行对比,能够判断码率切换后的播放效果,以此筛选出作为目标码率的候选码率。
例如,服务器中缓存有n种码率的媒体流,多个候选码率包括r_1,r_2,…,r_n。该任一媒体帧所在的媒体帧组的长度为D,从该媒体帧组的第一帧播放至该任一媒体帧所需的时长为d,用d表示该任一媒体帧在媒体帧组中的位置。另外,用q_n表示第n个候选码率对应的第二缓存量。其中,D、d为正数,n为正整数。
通过获取多个候选码率,并根据多个候选码率与当前码率之间的关系、播放状态信息以及媒体流中任一媒体帧在该任一媒体帧所在媒体帧组中的位置,获取每个候选码率对应的第二缓存量,再根据每个码率对应的第二缓存量与第一缓存量阈值和第二缓存量阈值之间的关系确定目标码率,提供了一种确定目标码率的方法,为实现终端的码率切换提供了基础。
在一些实施例中,该第二缓存量的获取过程为:终端根据媒体流中任一媒体帧在任一媒体帧所在媒体帧组中的位置,获取任一媒体帧所在媒体帧组传输结束时对媒体流的缓存增加量,根据多个候选码率与当前码率之间的关系对应的继续缓存任一媒体帧所在媒体帧组时的缓存位置,确定当前到基于多个候选码率对任一媒体帧所在媒体帧组传输结束时的过程中媒体流的播放量,根据播放状态信息包括的当前对媒体流已缓存且未播放的第一缓存量、缓存增加量和播放量,获取每个候选码率对应的第二缓存量。
其中,任一媒体帧在该任一媒体帧所在媒体帧组中的位置通过终端从该媒体帧组的第一帧播放至该任一媒体帧所需的时长来表示,例如,单位为ms。终端从获取该任一媒体帧的时刻,到对该任一媒体帧所在的媒体帧组的传输结束的时刻之间的时间段内,缓存的媒体流能够播放的时间即为缓存增加量。而终端获取该缓存增加量对应的媒体流是需要时间的,获取该缓存增加量对应的媒体流时,终端仍然在播放缓存的媒体流,在这段时间内,终端播放的媒体流的时长即为播放量。终端能够根据第一缓存量、缓存增加量和播放量,获取每个候选码率对应的第二缓存量。多个候选码率对应的第二缓存量能够表示终端切换至多个候选码率完成对当前正在传输的媒体帧组的传输时,已缓存且未播放的媒体流能够播放的时长。
例如,可以用以下公式表示第二缓存量:
q_n = q_c + D - d - q_b
其中,q_c为第一缓存量,q_b为播放量,q_n为第n个候选码率对应的第二缓存量,D为该任一媒体帧所在的媒体帧组的长度,d为从该媒体帧组的第一帧播放至该任一媒体帧所需的时长,相应的,D-d表示缓存增加量。
通过获取缓存增加量和播放量,再结合获取的第一缓存量,能够得到每个候选码率的第二缓存量,提供了一种获得第二缓存量的方法,为后续根据第二缓存量与第一缓存量阈值和第二缓存量阈值的关系确定目标码率提供了基础。
在一些实施例中,在获取播放量时,还参考当前网络状态信息,来判断缓存完该任一媒体帧所在媒体帧组所需时间,并判断这段时间会播放多少缓存量。终端根据多个候选码率与当前码率之间的关系,确定继续缓存任一媒体帧所在媒体帧组时的缓存位置,获取当前网络状态信息,根据当前网络状态信息、缓存位置、媒体帧组的长度以及多个候选码率,确定当前到基于多个候选码率对任一媒体帧所在媒体帧组传输结束时的过程中媒体流的播放量。
其中,上述播放量为终端获取缓存增加量对应的媒体流的时间段内,播放的媒体流的时长,可见该播放量与终端获取缓存增加量对应的媒体流的速度有关,即与终端的网络状态有关。相应的,终端获取当前网络信息,该当前网络信息包括与当前时刻相近的一段时长内,终端的平均带宽。终端继续缓存任一媒体帧所在媒体帧组时的缓存位置与候选码率和当前码率之间的关系有关,当候选码率与当前码率相同时,终端无需切换码率,从该任一媒体帧的下一帧继续缓存该任一媒体帧所在的媒体帧组。当候选码率与当前码率不同时,为了防止终端对获取的媒体帧进行解码的过程中发生错误,终端从该任一媒体帧所在的媒体帧组的第一帧开始,缓存该任一媒体帧所在的媒体帧组。确定了终端继续缓存该任一媒体帧所在的媒体帧组时的缓存位置后,终端能够结合获取的当前网络状态信息,媒体帧组的长度以及多个候选码率,确定多个候选码率对该媒体帧组传输结束时的过程中的播放量。
例如,可以用以下公式表示第二缓存量:
当r_n=r_c时,q_n = q_c + D - d - (D-d)*r_c*8/B
当r_n≠r_c时,q_n = q_c + D - d - D*r_n*8/B
其中,r_c为当前码率,r_n为第n个候选码率,q_n为第n个候选码率对应的第二缓存量,q_c为第一缓存量,D为该任一媒体帧所在的媒体帧组的长度,d为从该媒体帧组的第一帧播放至该任一媒体帧所需的时长,B为终端在与当前时刻相近的一段时长内的平均带宽,D-d表示缓存增加量。当候选码率与当前码率相同时,终端无需切换码率,从该任一媒体帧的下一帧继续缓存该任一媒体帧所在的媒体帧组,相应的,(D-d)*r_c*8/B表示播放量。当候选码率与当前码率不同时,终端从该任一媒体帧所在的媒体帧组的第一帧开始,缓存该任一媒体帧所在的媒体帧组,相应的,D*r_n*8/B表示播放量。
其中,对于平均带宽,还可以利用以下公式得到终端在与当前时刻相近的一段时长内的平均带宽B:
B=S*8/T
其中,S为终端在与当前时刻相近的一段时长内下载的媒体流的数据量,T为与当前时刻相近的一段时长,例如,T取500毫秒。
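结合上述公式,可以用如下示意性的Python草图估算每个候选码率对应的第二缓存量(函数名与单位约定均为本示例的假设,例如假设q_c、D、d以毫秒计,码率r以字节/毫秒计,带宽B以比特/毫秒计):

```python
# 示意性草图:按上文公式估算候选码率 r_n 对应的第二缓存量 q_n。
def estimate_q_n(q_c: float, D: float, d: float,
                 r_c: float, r_n: float, B: float) -> float:
    if r_n == r_c:
        # 码率不变:从该帧的下一帧继续缓存,只需下载剩余的 D-d 部分
        playback = (D - d) * r_c * 8 / B
    else:
        # 码率切换:从媒体帧组第一帧重新下载整个长度为 D 的帧组
        playback = D * r_n * 8 / B
    return q_c + (D - d) - playback

# 按 B = S*8/T 估算平均带宽,S 为时长 T 内下载的数据量(字节)
def estimate_bandwidth(S: float, T: float) -> float:
    return S * 8 / T
```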
通过确定继续缓存任一媒体帧所在的媒体帧组时的缓存位置,获取当前网络状态信息,结合媒体帧组的长度和多个候选码率,确定当前到基于多个候选码率对任一媒体帧所在媒体帧组传输结束时的过程中媒体流的播放量,提供了一种得到播放量的方法,使得终端能够得到第二缓存量。
在一些实施例中,对于上述情况一,终端响应于多个候选码率对应的第二缓存量中至少一个第二缓存量大于第一缓存量阈值,将该至少一个第二缓存量对应的候选码率中,最大的候选码率确定为目标码率。终端也可以响应于多个候选码率对应的第二缓存量中不包括大于第一缓存量阈值的第二缓存量,将当前码率确定为目标码率。
其中,对于上述情况一,第一缓存量大于第一缓存量阈值,也就是说,当前终端对媒体流已缓存且未播放的缓存量能够保证媒体流的流畅播放,此时考虑增大媒体流的下载码率。
当多个候选码率对应的第二缓存量中至少一个第二缓存量大于第一缓存量阈值时,该至少一个第二缓存量对应的至少一个候选码率能够保证媒体流的正常播放,相应的,终端将至少一个第二缓存量对应的候选码率中,最大的候选码率确定为目标码率,且该候选码率大于当前码率。
当多个候选码率对应的第二缓存量中不包括大于第一缓存量阈值的第二缓存量时,说明多个候选码率中,没有能够在保证媒体流的正常播放的同时,增强媒体流的清晰度的码率,因此终端将当前码率确定为目标码率,以当前码率继续缓存媒体流。
例如,当前码率用r_c表示,目标码率用r表示,第n个候选码率用r_n表示,第n个候选码率对应的第二缓存量用q_n表示,第一缓存量阈值用q_h表示。如果在满足r_n>r_c的候选码率中,不存在q_n>q_h的情况,则取r=r_c。如果在满足r_n>r_c的候选码率中,存在q_n>q_h的情况,则取满足q_n>q_h的r_n中最大的r_n作为目标码率。
通过将至少一个大于第一缓存量阈值的第二缓存量对应的候选码率中,最大的候选码率确定为目标码率,或将当前码率确定为目标码率,能够在保证媒体流流畅播放的前提下,尽可能提升媒体流的清晰度。
在一些实施例中,对于上述情况一,当多个候选码率对应的第二缓存量中至少一个第二缓存量大于第一缓存量阈值时,终端将该至少一个第二缓存量对应的大于当前码率的候选码率中,最小的候选码率确定为目标码率。
其中,该至少一个第二缓存量对应的大于当前码率的候选码率中的每个候选码率均在保证媒体流能够正常播放的同时,增强媒体流的清晰度,选择其中最小的码率作为目标码率。
在一些实施例中,对于上述情况二,终端响应于多个候选码率对应的第二缓存量中至少一个第二缓存量大于第二缓存量阈值,将至少一个第二缓存量对应的候选码率中,最大的候选码率确定为目标码率。终端也可以响应于多个候选码率对应的第二缓存量中不包括大于第二缓存量阈值的第二缓存量,将多个候选码率对应的第二缓存量中,最大的第二缓存量对应的候选码率确定为目标码率。
其中,对于上述情况二,第一缓存量小于第二缓存量阈值,也就是说,当前终端对媒体流已缓存且未播放的缓存量不能保证媒体流的流畅播放,此时考虑减小媒体流的下载码率。当多个候选码率对应的第二缓存量中至少一个第二缓存量大于第二缓存量阈值时,该至少一个第二缓存量对应的至少一个候选码率能够保证媒体流的正常播放,相应的,终端将至少一个第二缓存量对应的候选码率中,最大的候选码率确定为目标码率。当多个候选码率对应的第二缓存量中不包括大于第二缓存量阈值的第二缓存量时,说明多个候选码率中,没有能够保证媒体流的正常播放的码率,因此终端将多个候选码率对应的第二缓存量中,最大的第二缓存量对应的候选码率确定为目标码率。
例如,当前码率用r_c表示,目标码率用r表示,第n个候选码率用r_n表示,第n个候选码率对应的第二缓存量用q_n表示,第二缓存量阈值用q_l表示。如果对于任意的r_n,均不存在q_n≥q_l的情况,则取最大的q_n对应的r_n作为目标码率。如果存在满足q_n≥q_l的r_n,则取其中最大的r_n作为目标码率。
通过将至少一个大于第二缓存量阈值的第二缓存量对应的候选码率中,最大的候选码率确定为目标码率,或将多个候选码率对应的第二缓存量中,最大的第二缓存量对应的候选码率确定为目标码率,能够在缓存量不足时尽可能保证媒体流的流畅播放。
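将情况一与情况二的选择规则合在一起,可以得到如下示意性的Python草图(仅为按上文规则整理的一种假设性实现,q_of表示候选码率到其第二缓存量的映射):

```python
# 示意性草图:从多个候选码率中确定目标码率。
def select_target_bitrate(candidates, q_of, r_c, q_c, q_h, q_l):
    """candidates 为候选码率列表;q_of[r] 为候选码率 r 对应的第二缓存量。"""
    if q_c > q_h:
        # 情况一:缓存充足,在大于当前码率且 q_n>q_h 的候选中取最大者
        ups = [r for r in candidates if r > r_c and q_of[r] > q_h]
        return max(ups) if ups else r_c
    if q_c < q_l:
        # 情况二:缓存不足,在 q_n>=q_l 的候选中取最大者,
        # 否则取第二缓存量最大的候选码率
        ok = [r for r in candidates if q_of[r] >= q_l]
        if ok:
            return max(ok)
        return max(candidates, key=lambda r: q_of[r])
    return r_c  # 两阈值之间,维持当前码率
```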
需要说明的是,上述S52为本公开实施例的可选步骤,在一些实施例中,目标码率也可以为预设的码率,或者为码率选择指令中所指示的码率,例如,用户选择切换码率,并指定目标码率。相应地,终端根据该目标码率,执行下述S53所述的步骤,本公开对获取目标码率的方式不做限定。
上述均以播放状态信息包括第一缓存量为例进行说明,在另一些实施例中,播放状态信息包括媒体流播放过程中的卡顿信息或丢帧率中至少一项,相应地,播放状态信息符合码率切换条件为卡顿信息或丢帧率中任一项满足码率切换条件。
其中,卡顿信息包括播放媒体流的目标时间段内的卡顿次数、上一次卡顿的时间或上一次的卡顿时长中至少一项。相应地,码率切换条件包括多种情况,例如:卡顿次数大于次数阈值,上一次卡顿时间与当前时刻之间的时长小于间隔阈值和卡顿时长大于时长阈值,或者上一次卡顿时长大于时长阈值等,在这些情况下,终端考虑降低码率。当然,码率切换条件也包括卡顿次数小于次数阈值,上一次卡顿时间与当前时刻之间的时长大于间隔阈值和卡顿时长小于时长阈值,上一次卡顿时长小于时长阈值,在这些情况下,终端考虑提高码率。
对于丢帧率,相应地,码率切换条件包括丢帧率大于第一丢帧率阈值,码率切换条件也可以为丢帧率小于第二丢帧率阈值,第二丢帧率阈值小于第一丢帧率阈值。在一些实施例中,该丢帧率还可以为目标时间段内的丢帧率,例如,过去一分钟内的丢帧率,通过一段时间内的丢帧率来判断这段时间的媒体流传输情况,从而来判断是否需要调整码率。
通过播放状态信息包括的卡顿信息或丢帧率,提供了一种根据卡顿信息或丢帧率,确定当前是否符合码率切换条件的方法,使得终端能够根据更多的判断条件切换媒体流的码率。
需要说明的是,上述S52中终端根据播放状态信息以及当前码率,确定目标码率的步骤为本公开实施例的可选步骤。在一些实施例中,目标码率为预设的码率,相应地,当播放状 态信息满足码率切换条件时,终端直接将预设的码率确定为目标码率,进而触发帧获取指令,以执行下述S53。本公开对确定目标码率的方法不做限定。
在S53中,终端接收对媒体流的帧获取指令。
通过上述内容可知,该帧获取指令有两种触发方式:第一种触发方式中,帧获取指令在播放状态信息满足码率切换条件时触发,或,第二种触发方式中,帧获取指令由对媒体流的播放操作触发。
上述两种情况中,目标码率的确定方式不同,在第一种触发方式中,由于满足码率切换条件,该目标码率为确定的待切换的码率。在一些实施例中,该目标码率与当前码率的关系可能不同,例如,该目标码率与当前码率不同,也即是,通过上述确定过程,确定切换码率,则终端触发该帧获取指令,并执行后续的请求发送步骤。又例如,该目标码率与当前码率相同,也即是,通过上述确定过程,确定不切换码率,则终端保持当前码率继续从服务器接收媒体流的媒体帧,相应地,终端不触发该帧获取指令,或者触发该帧获取指令,终端接收到该帧获取指令后,丢弃该帧获取指令,而不响应。当然,终端也就无需执行后续的请求发送步骤。
在第二种触发方式中,在播放该媒体流时,该目标码率可以为用户选定的码率,也可以为默认码率,本公开实施例对此不作限定。
在S54中,终端响应于对媒体流的帧获取指令,从媒体流的媒体描述文件包括的多种码率的媒体流的地址信息中,确定目标码率的媒体流的目标地址信息。
该S54即为终端响应于对媒体流的帧获取指令,从多种码率的媒体流的地址信息中,确定目标码率的媒体流的目标地址信息的过程。在此仅以该多种码率的媒体流的地址信息存储于媒体描述文件中为例进行说明,该多种码率的媒体流的地址信息也可以存储于其他地方,终端可以从其他地方获取到多种码率的媒体流的地址信息,进而从中确定出目标码率的媒体流的地址信息。
其中,服务器在对媒体流进行转码之后,可能会形成多种码率的媒体流,此时服务器为不同码率的媒体流分配不同的地址信息,将各种码率的媒体流的地址信息均记录在媒体描述文件中,终端从服务器中下载该媒体流的媒体描述文件,并基于该媒体描述文件,确定不同码率的媒体流的地址信息。终端在确定目标码率之后,在媒体描述文件中以目标码率为索引,查询得到与目标码率的媒体流对应的媒体描述元信息,在该媒体描述元信息的属性信息中提取出目标地址信息。
上述媒体描述文件,是由服务器基于业务需求提供给终端的数据文件,由服务器按照业务需求进行预先配置,用于向终端提供流媒体服务的一组数据的集合以及业务相关的描述,能够保证终端获取到进行资源下载、解码、播放渲染时所需要的必要信息,媒体描述文件包括已编码并可传输的媒体流以及相应的元信息描述,使得终端能够基于媒体描述文件来构建帧获取请求(FAS请求),从而由服务器根据FAS标准的处理规范来响应帧获取请求,向终端提供流媒体服务。
在一些实施例中,该媒体描述文件可以是JSON(JavaScript Object Notation,JS对象简谱)格式的文件,当然也可以是其他格式的文件,本公开实施例不对媒体描述文件的格式进行具体限定。该媒体描述文件包括版本号(@version)和媒体描述集合(@adaptationSet),下面进行详述:
在一些实施例中,由于媒体描述文件本身可能会由于转码方式的变换而产生不同的版本,而FAS标准也会随着技术的发展而进行版本更迭,因此该版本号包括该媒体描述文件的版本号或者资源传输标准(FAS标准)的版本号中至少一项,比如,该版本号仅包括FAS标准的版本号,或者仅包括媒体描述文件的版本号,或者该版本号还可以是媒体描述文件与FAS标准的版本号之间的组合。
在一些实施例中,该媒体描述集合用于表示媒体流的元信息,该媒体描述集合包括多个媒体描述元信息,每个媒体描述元信息对应于一种码率的媒体流,每个媒体描述元信息包括该媒体描述元信息所对应码率的媒体流的画面组长度(@gopDuration)以及属性信息(@representation)。
这里的画面组(Group Of Pictures,GOP)长度是指两个关键帧之间的距离,关键帧是指视频编码序列中的帧内编码图像帧(Intra-coded picture,也称为“I帧”),I帧的编解码不需要参考其他图像帧,仅利用本帧信息即可实现,而相对地,P帧(Predictive-coded picture,预测编码图像帧)和B帧(Bidirectionally predicted picture,双向预测编码图像帧)的编解码均需要参考其他图像帧,仅利用本帧信息无法完成编解码。也即是上述S52中媒体帧所在媒体帧组,由于该媒体流中可能仅包括音频流,也可能包括音频流和视频流,如果该媒体流中包括视频流,该媒体帧组则为该画面组,如果该媒体流中仅包括音频流,该媒体帧组则为音频帧组。
在一些实施例中,对每个媒体描述元信息所包括的属性信息(也即每个属性信息)而言,每个属性信息包括媒体流的标识信息、媒体流的编码方式、媒体流所支持的码率以及该码率的媒体流的地址信息。
标识信息(@id):指每个媒体流独一无二的标识符,标识信息由服务器进行分配。
编码方式(@codec):指媒体流遵从的编解码标准,例如H.263、H.264、H.265、MPEG等。
媒体流所支持的码率(@bitrate):指资源传输时单位时间内传送的数据位数,也称为比特率,以音频资源为例,码率越高,则音频资源被压缩的比例越小,音质损失越小,那么与音源的音质就越接近(音质越好),视频资源与音频资源类似,但由于视频资源由图像资源和音频资源组装而成,因此在计算码率时除了音频资源之外还要加上对应的图像资源。
某种码率的媒体流的地址信息(@url):指服务器在针对媒体流进行转码,得到该码率的媒体流之后,对外提供该码率的媒体流的URL(Uniform Resource Locator,统一资源定位符)或域名(Domain Name)。
在一些实施例中,每个属性信息还包括媒体流的质量类型、媒体流的隐藏选项、第一自适应功能选项或者默认播放功能选项中至少一项。
质量类型(@qualityType):包括媒体流的分辨率或者帧率等质量评价指标。
媒体流的隐藏选项(@hiden):用于表示媒体流是否外显,基于设定为true,表示对应码率的媒体流不外显,此时用户无法手动选择对应码率的媒体流,只能通过自适应功能来选中该码率的媒体流,基于设定为false,表示对应码率的媒体流外显,此时除了能够通过自适应功能选中该码率的媒体流之外,用户还能够手动选择对应码率的媒体流。需要说明的是,本申请所涉及的自适应功能,是指终端根据当前的网络带宽情况对所播放的媒体流进行动态码率调整的功能,后文不做赘述。
第一自适应功能选项(@enableAdaptive):用于表示媒体流是否相对于自适应功能可见,基于设定为true,表示对应码率的媒体流对于自适应功能可见,对应码率的媒体流能够被自适应功能选中,基于设定为false,表示对应码率的媒体流对于自适应功能不可见,对应码 率的媒体流不能被自适应功能选中。
默认播放功能选项(@defaultSelect):用于表示是否在启播时默认播放对应码率的媒体流,基于设定为true,表示启播时默认播放对应码率的媒体流,基于设定为false,表示启播时不默认播放对应码率的媒体流,由于媒体播放组件无法默认播放两种码率的媒体流(存在播放冲突),因此,在所有媒体描述元信息的属性信息中,最多只能出现一个码率的媒体流的默认播放功能选项(@defaultSelect)为true。
在一些实施例中,除了版本号和媒体描述集合之外,该媒体描述文件还包括服务类型、第二自适应功能选项或者第三自适应功能选项中至少一项。
服务类型(@type):用于指定媒体流的业务类型,包括直播或者点播中至少一项,比如,设定为“dynamic”时表示直播,设定为“static”时表示点播,基于不做规定时,将“dynamic”作为默认值。
第二自适应功能选项(@hideAuto):用于表示是否打开自适应功能,基于设定为true,代表关闭自适应功能,且不显示自适应选项,基于设定为false,代表开启自适应功能,且显示自适应选项,基于不做规定时,将“false”作为默认值。
第三自适应功能选项(@autoDefaultSelect):用于表示是否在启播时默认打开自适应功能,基于设定为true,代表在开始播放(启播)时默认基于自适应功能播放,基于设定为false,代表在开始播放时默认不基于自适应功能播放,即启播时默认关闭自适应功能。需要说明的是,这里的第三自适应功能选项是上述默认播放功能选项的前提,也即是,只有在第三自适应功能选项设置为false(启播时默认关闭自适应功能)时,默认播放功能选项才会有效,这时在启播时会默认播放@defaultSelect设置为true所对应码率的媒体流。基于第三自适应功能选项设置为true,那么在启播时会根据自适应功能选中最适合当前网络带宽情况的码率的媒体流。
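作为参考,下面以Python字典的形式给出一个按上述字段拼出的假设性媒体描述文件示例(字段取值与层级组织方式均为说明而设,并非FAS标准的规范样例):

```python
import json

# 示意性草图:一个假设性的媒体描述文件,字段名对应上文的 @version 等记号。
media_description = {
    "version": "1.0",           # @version:媒体描述文件或FAS标准的版本号
    "type": "dynamic",          # @type:dynamic 表示直播,static 表示点播
    "hideAuto": False,          # @hideAuto:false 表示开启并显示自适应选项
    "autoDefaultSelect": True,  # @autoDefaultSelect:启播时默认开启自适应
    "adaptationSet": [          # @adaptationSet:媒体描述集合
        {
            "gopDuration": 2000,  # @gopDuration:画面组长度
            "representation": {   # 属性信息
                "id": 1,                        # @id:媒体流标识信息
                "codec": "H.264",               # @codec:编码方式
                "bitrate": 1500,                # @bitrate:媒体流所支持的码率
                "qualityType": "720p",          # @qualityType:质量类型
                "url": "https://example.com/stream_1500",  # @url:地址信息
                "hiden": False,                 # @hiden:是否外显
                "enableAdaptive": True,         # @enableAdaptive:对自适应可见
                "defaultSelect": True,          # @defaultSelect:启播默认播放
            },
        },
    ],
}

print(json.dumps(media_description, ensure_ascii=False, indent=2))
```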
在S55中,终端确定目标码率对应的待获取媒体帧在媒体流中的起始位置。
其中,与该帧获取指令包括两种触发方式对应,在两种触发方式中,目标码率对应的待获取媒体帧可能不同,该待获取媒体帧在媒体流中的起始位置则可能也不同。
对于第一种触发方式中帧获取指令在媒体流的播放状态信息满足码率切换条件时触发的情况,终端确定了优化媒体流的播放效果的目标码率后,由于目标码率可能与当前码率不同,根据终端对媒体帧解码时的参考关系,终端切换码率后在目标码率对应的媒体流中开始下载媒体帧的位置,可能与在当前码率对应的媒体流中该任一媒体帧的位置不同。该目标码率对应的待获取媒体帧在媒体流中的起始位置,即为终端将码率切换至目标码率后,开始获取的媒体帧在媒体流中的位置。
在一些实施例中,终端能够根据任一媒体帧在媒体流中的位置,确定目标码率对应的待获取媒体帧在媒体流中的起始位置。终端能够响应于目标码率与当前码率相等,丢弃播放状态信息,而不再确定起始位置。终端也可以响应于目标码率与当前码率不相等,将该任一媒体帧所在媒体帧组中的第一个媒体帧在媒体流中的位置确定为起始位置。
其中,如果目标码率与当前码率相同,则终端传输媒体流的码率没有发生变化,因此继续基于当前码率,从该任一媒体帧的下一个媒体帧开始传输媒体帧,相应地,终端丢弃获取的播放状态信息。在对媒体流进行转码时,媒体流中的关键媒体帧是严格对齐的,也就是说,不同码率的媒体流中,相对应的媒体帧组中的第一媒体帧的位置是相同的。如果目标码率与当前码率不相同,码率发生了变化,该任一媒体帧所在媒体帧组的码率需要保持一致,因而, 重新按照目标码率传输该媒体帧组的媒体帧。终端从该任一媒体帧所在的媒体帧组中的第一媒体帧所在的位置,基于目标码率开始传输媒体帧。
例如,如图6所示,终端当前正在传输的媒体流包括两个媒体帧组,每个媒体帧组中包括多个媒体帧,每个媒体帧均对应于一个时间戳。第一个媒体帧组中的媒体帧按时间戳的排列顺序为[1000,2000,3000,4000,5000,6000],第二个媒体帧组中的媒体帧按时间戳的排列顺序为[7000,8000,9000,10000,11000,12000]。以终端正在接收第二媒体帧组中,时间戳为8000的媒体帧为例进行说明,如果终端接收到时间戳为8000的媒体帧后,确定出的目标码率与当前码率相同,则无需进行码率切换,也无需向服务器发送帧获取请求,继续接收从时间戳为9000的媒体帧开始的媒体帧。如果终端接收到时间戳为8000的媒体帧后,确定出的目标码率与当前码率不同,则进行码率切换,向服务器发送帧获取请求,并为了使媒体帧组的码率保持一致,从时间戳为7000的媒体帧开始重新获取该媒体帧组中的媒体帧。上述过程根据任一媒体帧在媒体流中的位置,重新确定了目标码率对应的待获取媒体帧在媒体流中的起始位置。
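该起始位置的确定逻辑可以用如下示意性的Python草图表达(函数名与“返回None表示无需重新请求”的约定均为本示例的假设):

```python
# 示意性草图:根据目标码率与当前码率的关系确定起始位置。
def start_position(gop, target_rate, current_rate):
    """gop 为当前媒体帧组内各帧的时间戳列表(升序)。
    返回 None 表示码率不变,继续接收下一帧即可。"""
    if target_rate == current_rate:
        return None    # 不切换:从该帧的下一帧继续传输
    return gop[0]      # 切换:从该帧所在帧组的第一帧重新传输

gop = [7000, 8000, 9000, 10000, 11000, 12000]  # 终端正在接收其中 8000 的帧
assert start_position(gop, 2.0, 2.0) is None   # 码率相同,继续收 9000
assert start_position(gop, 1.0, 2.0) == 7000   # 码率不同,从 7000 重取
```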
需要说明的是,终端从该任一媒体帧所在的媒体帧组中的第一媒体帧所在的位置,基于目标码率开始传输媒体帧,相应地,终端中可能同时存在不同码率的媒体流,终端播放媒体流时,优先播放高码率的媒体流。
对于帧获取指令由对媒体流的播放操作触发的情况,该目标码率对应的待获取媒体帧可能包括多种情况,下面提供了三种可能情况。
情况一、终端将媒体流中播放操作的操作时间产生的媒体帧所在位置确定为起始位置。
例如,在直播场景中,用户想要观看某个主播直播,则进行对该媒体流的播放操作,如点击该主播的直播房间链接,进入该主播的直播房间。终端将当前时间正产生的媒体帧在直播流中的位置作为该起始位置。
情况二、终端将帧获取指令中所选定的媒体帧在媒体流中的位置确定为起始位置。
例如,在点播场景中,用户想要从视频的第15秒开始观看,则进行对该视频的播放操作,控制该媒体流从15秒开始播放,终端则将15秒对应的媒体帧在视频中的位置作为该起始位置。
情况三、终端将媒体流的第一个媒体帧所在位置确定为起始位置。
例如,在点播场景中,用户想要观看某个视频,则对视频进行播放操作,终端将该视频的第一个媒体帧所在位置确定为起始位置。
上述播放操作可以发生在终端第一次获取到媒体流之前,也可以发生在终端播放媒体流的过程中。终端根据播放操作的操作时间,将该操作时间对应的媒体帧所在的位置确定为起始位置,保证用户获得该操作时间之后的媒体流。当然,由于媒体描述文件可能会发生变化造成版本更迭,那么终端也可以每当用户点击播放选项时,均重新下载一次媒体描述文件,将媒体流的第一个媒体帧所在位置确定为起始位置。终端还可以将帧获取指令中所选定的媒体帧在媒体流中的位置确定为起始位置,本公开实施例对此不做限定。
在S56中,终端向服务器发送携带有目标地址信息和起始位置的帧获取请求,帧获取请求用于指示服务器以目标码率传输媒体流中从起始位置开始的媒体帧。
其中,终端获取目标地址信息和起始位置后,生成携带该目标地址信息和起始位置的帧获取请求,进而向服务器发送携带目标地址信息的帧获取请求(或称为FAS请求)。
在一些实施例中,除了目标地址信息(@url)之外,该帧获取请求还包括扩展参数(@extParam),该扩展参数用于指定不同的请求方式,从而实现不同的功能,该扩展参数包括第一扩展参数或者第二扩展参数中至少一项,下面进行详述:
第一扩展参数(@onlyAudio)属于一种音频参数,用于表示该媒体帧是否为音频帧,基于设定为true,表示终端拉取的媒体帧为音频帧,也即只拉取纯音频流。基于设定为false,表示终端拉取的媒体帧为音视频帧,也即拉取音频流和视频画面流,基于不做规定时,将“false”作为默认值。
在一个示例性场景中,终端获取媒体流的类型,基于媒体流的类型为视频,将第一扩展参数置为“false”或者默认值,基于媒体流的类型为音频,将第一扩展参数置为“true”。
在一些实施例中,终端还检测应用程序的类型,基于应用程序的类型为视频应用,将第一扩展参数置为“false”或者默认值,基于应用程序的类型为音频应用,将第一扩展参数置为“true”。
第二扩展参数(@fasSpts)属于一种拉取位置参数,用于表示从该第二扩展参数所指示的目标时间戳开始传输该媒体流的媒体帧,在一些实施例中,该第二扩展参数的数据类型为int64_t类型,当然,也可以为其他数据类型,本公开实施例不对第二扩展参数的数据类型进行具体限定。可以在帧获取请求中指定第二扩展参数,基于帧获取请求中未指定第二扩展参数,那么由服务器来配置第二扩展参数的默认值。
针对第二扩展参数的不同取值,分别讨论对应取值下的媒体帧拉取情况:
1)基于该第二扩展参数大于零(@fasSpts>0),此时该目标时间戳pts大于当前时刻,那么终端将从pts等于@fasSpts的媒体帧(未来的某个时刻)开始拉取媒体流。
2)基于该第二扩展参数等于零(@fasSpts=0),此时该目标时间戳pts为距离当前时刻最接近的关键帧或音频帧的时间戳,在拉取音频帧(纯音频模式)时,终端从最新的音频帧开始拉取媒体流,或者,在拉取音视频帧(非纯音频模式)时,终端从最新的视频I帧开始拉取媒体流。
3)基于该第二扩展参数小于零(@fasSpts<0),此时该目标时间戳小于当前时刻,且该媒体帧包括从该目标时间戳开始已缓存的媒体帧,也即是,终端拉取缓存长度为|@fasSpts|毫秒的媒体流。
在一些实施例中,终端根据媒体描述文件中的服务类型(@type)字段来确定第二扩展参数,基于查询到服务类型为“dynamic”(直播)且用户未指定播放进度,终端将第二扩展参数置为0,以便于用户能够实时观看到最新的直播视频流;基于查询到服务类型为“dynamic”(直播)且用户指定了播放进度,终端将第二扩展参数置为播放进度所对应的时间戳(目标时间戳),从而能够方便地根据用户所指定的起点开始拉取媒体流;基于查询到服务类型为“static”(点播)且用户未指定播放进度,终端检测媒体流在上一次关闭时的历史播放进度,将第二扩展参数置为该历史播放进度所对应的时间戳(目标时间戳),从而能够方便用户从上一次观看到的进度开始继续进行观看,需要说明的是,基于用户首次观看媒体流,此时查询不到任何历史播放进度,终端将第二扩展参数置为首个媒体帧的时间戳(目标时间戳);基于查询到服务类型为“static”(点播)且用户指定了播放进度,终端将第二扩展参数置为播放进度所对应的时间戳(目标时间戳),从而能够方便地根据用户所指定的起点开始拉取媒体流。
对帧获取请求而言,可以认为其格式为目标码率的媒体流的url地址加上扩展字段,形象地表示为“url&extParam”,在FAS标准中,服务器在接收到帧获取请求之后,能够按照FAS所规定的处理规范,对帧获取请求进行响应处理,参考下述S57。
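按照“url&extParam”的格式,帧获取请求的拼接可以用如下示意性的Python草图表达(扩展字段的具体拼接语法为本示例的假设):

```python
# 示意性草图:拼接携带扩展参数的帧获取请求(FAS请求)。
def build_fas_request(url: str, only_audio=None, fas_spts=None) -> str:
    params = []
    if only_audio is not None:
        params.append(f"onlyAudio={'true' if only_audio else 'false'}")
    if fas_spts is not None:
        params.append(f"fasSpts={fas_spts}")  # int64_t 型的拉取位置参数
    return url if not params else url + "&" + "&".join(params)

# 用法示例:直播场景下从最新帧开始拉流(fasSpts=0)
req = build_fas_request("https://example.com/stream_1500", fas_spts=0)
```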
在S57中,服务器接收并响应帧获取请求,从目标地址信息对应的地址,获取从起始位置开始的媒体帧。
其中,服务器在接收到帧获取请求之后,解析该帧获取请求,得到目标地址信息和起始位置,服务器基于目标地址信息和起始位置,从资源库中定位到目标码率的媒体流中,起始位置对应的媒体帧,获取从该媒体帧开始的媒体帧。
在一些实施例中,服务器基于起始位置,确定目标时间戳,再基于目标时间戳,确定并获取从起始位置开始的媒体帧。其中,媒体流中的每个媒体帧均对应一个时间戳,服务器基于起始位置,定位到起始位置的媒体帧,进而根据该媒体帧的时间戳确定目标时间戳,再通过目标时间戳定位到开始传输的媒体帧,进而,服务器能够从该媒体帧开始向终端传输媒体流。
在一些实施例中,起始位置为拉取位置参数,上述确定目标时间戳的过程为:服务器基于音频参数和拉取位置参数,确定目标时间戳。
其中,该拉取位置参数(@fasSpts)用于指示服务器具体从哪帧开始发送媒体流,拉取位置参数的数据类型为int64_t类型,当然,也为其他数据类型,本公开实施例不对拉取位置参数的数据类型进行具体限定。在帧获取请求中,拉取位置参数等于0、大于0、小于0或者缺省,在不同的取值情况下会对应于服务器不同的处理逻辑,将在下述S57中进行详述。
在一些实施例中,基于该帧获取请求携带拉取位置参数,服务器解析该帧获取请求得到该拉取位置参数,这种情况下终端在帧获取请求中指定了拉取位置参数,服务器直接对帧获取请求的@fasSpts字段进行解析,得到拉取位置参数。
在一些实施例中,基于该帧获取请求缺省拉取位置参数,服务器将该拉取位置参数配置为默认值,这种情况下终端并未在帧获取请求中指定拉取位置参数,那么服务器为其配置默认值,令@fasSpts=defaultSpts。这里的默认值由服务器根据业务场景自行配置,比如,在直播业务场景下,将defaultSpts设置为0,在点播业务场景下,将defaultSpts设置为上一次结束观看时历史媒体帧的PTS(Presentation Time Stamp,显示时间戳),基于缓存中未记录历史媒体帧的PTS,那么将defaultSpts设置为首个媒体帧的PTS。
其中,该音频参数(@onlyAudio)用于指示媒体流的拉取模式,基于设定为true,表示服务器传输至终端的媒体帧为音频帧,俗称为“纯音频模式”。基于设定为false,表示服务器传输至终端的媒体帧为音视频帧,俗称为“非纯音频模式”。在帧获取请求中,音频参数为真、假或者缺省,在不同的取值情况下会对应于服务器不同的处理逻辑,将在下述S57中进行详述。
在一些实施例中,基于该帧获取请求携带音频参数,服务器解析该帧获取请求得到该音频参数,这种情况下终端在帧获取请求中指定了音频参数,服务器直接对帧获取请求的@onlyAudio字段进行解析,得到音频参数。
在一些实施例中,基于该帧获取请求缺省音频参数,服务器将该音频参数配置为默认值,这种情况下终端并未在帧获取请求中指定音频参数,那么服务器为其配置默认值。这里的默认值由服务器根据业务场景自行配置,比如,在提供视频业务时,将默认值设置为假,也即令@onlyAudio=false,或者,在仅提供音频业务时,将默认值设置为真,也即令@onlyAudio=true。需要说明的是,在本公开实施例中,仅以默认值为假(false)为例进行说明,根据默认值的不同,服务器的处理逻辑进行适应性调整,后文不做赘述。
在一些实施例中,在确定目标时间戳之前,服务器通过执行下述S57A-S57B来刷新当前有效缓存区:
S57A、基于缓存区中媒体帧序列中媒体帧的时间戳呈非单调递增,服务器确定该缓存区发生时间戳回退。基于缓存区中的媒体帧序列呈单调递增,那么服务器确定该缓存区未发生时间戳回退。
其中,媒体帧序列为缓存区缓存的多个媒体帧所组成的序列。上述时间戳回退现象是指缓存区内的媒体帧并非按照时间戳单调递增的顺序进行存放,此时缓存区中存在冗余的媒体帧,这种现象通常容易发生在直播业务场景中,主播终端推流到服务器的过程中,由于网络波动、延时等原因,先发送的媒体帧有可能反而较晚到达服务器,致使缓存区内媒体帧序列中媒体帧的时间戳呈非单调递增,引发时间戳回退现象,另外,为了避免丢包问题,主播终端通常还会将各个媒体帧进行多次发送,这种冗余多发机制也会致使缓存区内媒体帧序列中媒体帧的时间戳呈非单调递增,引发时间戳回退现象。
在确定媒体帧序列中媒体帧的时间戳是否呈非单调递增时,服务器只需要从时间戳最小的媒体帧开始,按照缓存区内媒体帧序列的存放顺序,遍历是否存在媒体帧的时间戳大于下一媒体帧的时间戳,基于存在任一媒体帧的时间戳大于下一媒体帧的时间戳,确定媒体帧序列中媒体帧的时间戳呈非单调递增,确定缓存区发生时间戳回退。基于所有媒体帧的时间戳均小于或等于下一媒体帧的时间戳,确定媒体帧序列中媒体帧的时间戳呈单调递增,确定缓存区未发生时间戳回退。
例如,假设缓存区内媒体帧序列中媒体帧的时间戳分别为[1001,1002,1003,1004,1005…],省略部分的媒体帧的时间戳呈递增,此时媒体帧序列中媒体帧的时间戳呈单调递增,缓存区未发生时间戳回退现象。又比如,假设缓存区内媒体帧序列中媒体帧的时间戳分别为[1001,1002,1003,1001,1002,1003,1004…],省略部分的媒体帧的时间戳呈递增,此时由于第3个媒体帧的时间戳(PTS_3=1003)大于第4个媒体帧的时间戳(PTS_4=1001),媒体帧序列中媒体帧的时间戳呈非单调递增,缓存区发生时间戳回退现象。
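上述遍历判断可以用如下示意性的Python草图表达(对包含视频资源的缓存区应传入关键帧的时间戳序列,否则传入音频帧的时间戳序列):

```python
# 示意性草图:判断时间戳序列是否呈非单调递增,即是否发生时间戳回退。
def has_pts_rollback(pts_seq) -> bool:
    """pts_seq 按缓存区内媒体帧的存放顺序给出;只要存在某一帧的
    时间戳大于下一帧的时间戳,即判定发生时间戳回退。"""
    return any(a > b for a, b in zip(pts_seq, pts_seq[1:]))

assert has_pts_rollback([1001, 1002, 1003, 1004]) is False
assert has_pts_rollback([1001, 1002, 1003, 1001, 1002]) is True
```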
在一些实施例中,对视频资源和音频资源进行分别讨论:对视频资源而言,判断媒体帧序列中媒体帧的时间戳是否呈非单调递增时,仅考虑视频资源的关键帧(I帧)序列中关键帧的时间戳是否呈非单调递增;对音频资源而言,判断媒体帧序列中媒体帧的时间戳是否呈非单调递增时,考虑音频资源的音频帧序列中音频帧的时间戳是否呈非单调递增。
也即是说,基于该缓存区中包括视频资源且在关键帧序列中关键帧的时间戳呈非单调递增时,确定该媒体帧序列呈非单调递增,其中,该关键帧序列为缓存区中已缓存的多个关键帧所构成的序列;基于该缓存区中不包括视频资源且在音频帧序列中音频帧的时间戳呈非单调递增时,确定该媒体帧序列呈非单调递增,其中,该音频帧序列为缓存区中已缓存的多个音频帧所构成的序列。
这是由于I帧的编解码不需要参考其他图像帧,仅利用本帧信息即可实现,而相对地,P帧(Predictive-coded picture,预测编码图像帧)和B帧(Bidirectionally predicted picture,双向预测编码图像帧)的编解码均需要参考其他图像帧,仅利用本帧信息无法完成编解码。对视频资源而言,是在I帧解码完成之后,基于I帧来进行P帧和B帧的解码,那么即使各个I帧对应的P帧和B帧的时间戳呈非单调递增,只要保证I帧序列(仅考虑I帧的PTS序列)中I帧的时间戳呈单调递增,那么认为缓存区未发生时间戳回退,反之,一旦I帧序列中I帧的时间戳呈非单调递增,那么确定缓存区发生时间戳回退。当然,如果缓存区里没有视频资源,那么直接对所有音频帧的PTS序列进行遍历判断即可,这里不做赘述。
在一些实施例中,由于时间戳回退现象可能不止发生一次,也即是说,在媒体帧序列中媒体帧的时间戳里划分出多个单调递增阶段,在每个阶段内部的媒体帧的时间戳呈单调递增,但是在不同阶段之间的媒体帧的时间戳呈非单调递增,这时缓存区中存在很多冗余无效的媒体帧,服务器通过执行下述S57B在缓存区中确定当前有效缓存区。
S57B、服务器将最后一个单调递增阶段所包含的各个媒体帧确定为当前有效缓存区内的资源。
在上述过程中,服务器从媒体帧序列中确定最后一个单调递增阶段中首个媒体帧,将媒体帧序列中从上述首个媒体帧开始到具有最大时间戳的媒体帧(相当于最新的媒体帧)之间的所有媒体帧确定为当前有效缓存区,这样保证当前有效缓存区内的媒体帧的时间戳呈单调递增。
例如,假设缓存区内媒体帧序列中媒体帧的时间戳分别为[1001,1002,1003,1001,1002,1003,1004…],省略部分的媒体帧的时间戳呈递增,此时缓存区发生时间戳回退,可以看出最后一个单调递增阶段的首个媒体帧为第4个媒体帧,那么将从第4个媒体帧开始到最新的媒体帧之间的所有媒体帧确定为当前有效缓存区。又比如,假设缓存区内媒体帧序列中媒体帧的时间戳分别为[1001,1002,1003,1001,1002,1003,1001…],省略部分的媒体帧的时间戳呈递增,缓存区发生时间戳回退,可以看出最后一个单调递增阶段的首个媒体帧为第7个媒体帧,那么将从第7个媒体帧开始到最新的媒体帧之间的所有媒体帧确定为当前有效缓存区。
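确定最后一个单调递增阶段的过程可以用如下示意性的Python草图表达(以时间戳序列近似表示缓存区内的媒体帧):

```python
# 示意性草图:取媒体帧序列中最后一个单调递增阶段作为当前有效缓存区。
def current_valid_buffer(pts_seq):
    start = 0
    for i in range(1, len(pts_seq)):
        if pts_seq[i - 1] > pts_seq[i]:
            start = i  # 发生回退,最后一个单调递增阶段从此处重新起算
    return pts_seq[start:]

assert current_valid_buffer(
    [1001, 1002, 1003, 1001, 1002, 1003, 1004]
) == [1001, 1002, 1003, 1004]
```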
在一些实施例中,对视频资源和音频资源进行分别讨论:基于缓存区内包括视频资源,对视频资源而言,服务器以视频资源的I帧作为计算点,从最后一个单调递增阶段的首个关键帧到最新的视频帧之间的所有媒体帧作为当前有效缓存区,其中,最新的视频帧的时间戳表示为latestVideoPts;基于缓存区内不包括视频资源,对音频资源而言,服务器以音频帧作为计算点,从最后一个单调递增阶段的首个音频帧到最新的音频帧之间的所有媒体帧作为当前有效缓存区,其中,最新的音频帧的时间戳表示为latestAudioPts。
在一些实施例中,更新当前有效缓存区的操作可以是定时触发的,也可以由技术人员手动触发,当然,还可以每当接收到帧获取请求时进行一次更新,这种方式称为“被动触发”,本公开实施例不对更新当前有效缓存区的触发条件进行具体限定。
通过上述S57A-S57B,能够及时发现缓存区内的时间戳回退现象,并针对时间戳回退现象进行处理,更新当前有效缓存区,避免在后续传输媒体帧的过程中出现异常。
图7是本公开实施例提供的一种确定目标时间戳的原理性示意图,请参考图7,示出了服务器在不同拉取位置参数以及音频参数的取值情况下,分别具有不同的处理逻辑,以下,将对服务器的处理逻辑进行介绍,由于拉取位置参数的取值情况分为四种:默认值、等于0、小于0以及大于0,下面针对这四种情况进行分别说明。
情况一、拉取位置参数为默认值
1):基于拉取位置参数为默认值,且音频参数为默认值或音频参数为假,服务器将最大时间戳减去该拉取位置参数的默认值的绝对值所得的数值确定为目标时间戳。
其中,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳latestVideoPts;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳latestAudioPts。
上述过程是指帧获取请求中@fasSpts(拉取位置参数)缺省的情况下,服务器会为拉取位置参数配置默认值,令@fasSpts=defaultSpts。此时,如果帧获取请求中@onlyAudio(音频参数)也缺省,服务器会为音频参数配置默认值(音频参数的默认值为false),令@onlyAudio=false,或者,帧获取请求自身的@onlyAudio字段携带false值,也即帧获取请求指定@onlyAudio=false,此时服务器的处理规则如下:
基于当前有效缓存区中包括视频资源,服务器将latestVideoPts–|defaultSpts|所得的数值确定为目标时间戳;基于当前有效缓存区中不包括视频资源,服务器将latestAudioPts–|defaultSpts|所得的数值确定为目标时间戳。
2):基于拉取位置参数为默认值,且音频参数为真,将最大音频时间戳减去该拉取位置参数的默认值的绝对值所得的数值确定为目标时间戳。
上述过程是指帧获取请求中@fasSpts(拉取位置参数)缺省的情况下,服务器会为拉取位置参数配置默认值,令@fasSpts=defaultSpts。此时,如果帧获取请求的@onlyAudio字段携带true值,也即帧获取请求指定@onlyAudio=true(纯音频模式,仅传输音频流),此时服务器的处理规则如下:服务器将latestAudioPts–|defaultSpts|所得的数值确定为目标时间戳。
情况二、拉取位置参数等于0
1):基于拉取位置参数等于0,且音频参数为默认值或音频参数为假,将最大时间戳确定为目标时间戳。
其中,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳latestVideoPts;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳latestAudioPts。
上述过程是指帧获取请求中@fasSpts字段携带0值(@fasSpts=0)的情况下,此时,如果帧获取请求中@onlyAudio(音频参数)也缺省,服务器会为音频参数配置默认值(音频参数的默认值为false),令@onlyAudio=false,或者,帧获取请求中@onlyAudio字段携带false值(帧获取请求指定@onlyAudio=false),此时服务器的处理规则如下:
基于当前有效缓存区中包括视频资源,服务器将latestVideoPts确定为目标时间戳;基于当前有效缓存区中不包括视频资源,服务器将latestAudioPts确定为目标时间戳。
2):基于拉取位置参数等于0,且音频参数为真,将最大音频时间戳确定为目标时间戳。
上述过程是指帧获取请求中@fasSpts字段携带0值(@fasSpts=0)的情况下,如果帧获取请求中@onlyAudio字段携带true值(帧获取请求指定@onlyAudio=true),也即是纯音频模式、仅传输音频流,此时服务器的处理规则如下:服务器将latestAudioPts确定为目标时间戳。
情况三、拉取位置参数小于0
1):基于拉取位置参数小于0,且音频参数为默认值或音频参数为假,将最大时间戳减去该拉取位置参数的绝对值所得的数值确定为目标时间戳。
其中,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳latestVideoPts;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳latestAudioPts。
上述过程是指帧获取请求中@fasSpts字段携带小于0的值(@fasSpts<0)的情况下,此时,如果帧获取请求中@onlyAudio(音频参数)也缺省,服务器会为音频参数配置默认值(音频参数的默认值为false),令@onlyAudio=false,或者,帧获取请求中@onlyAudio字段携带false值(帧获取请求指定@onlyAudio=false),此时服务器的处理规则如下:
基于当前有效缓存区中包括视频资源,服务器将latestVideoPts-|@fasSpts|确定为目标时间戳;基于当前有效缓存区中不包括视频资源,服务器将latestAudioPts-|@fasSpts|确定为目标时间戳。
2):基于拉取位置参数小于0,且音频参数为真,将最大音频时间戳减去该拉取位置参数的绝对值所得的数值确定为目标时间戳。
上述过程是指帧获取请求中@fasSpts字段携带小于0的值(@fasSpts<0)的情况下,此时,如果帧获取请求中@onlyAudio字段携带true值(帧获取请求指定@onlyAudio=true),也即是纯音频模式、仅传输音频流,此时服务器的处理规则如下:服务器将latestAudioPts-|@fasSpts|确定为目标时间戳。
情况四、拉取位置参数大于0
1):基于拉取位置参数大于0,且音频参数为默认值或音频参数为假,在缓存区中发生时间戳回退时,将最大时间戳确定为目标时间戳。
其中,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳latestVideoPts;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳latestAudioPts。
上述过程是指帧获取请求中@fasSpts字段携带大于0的值(@fasSpts>0)的情况下,此时,如果帧获取请求中@onlyAudio(音频参数)也缺省,服务器会为音频参数配置默认值(音频参数的默认值为false),令@onlyAudio=false,或者,帧获取请求中@onlyAudio字段携带false值(帧获取请求指定@onlyAudio=false),此时服务器的处理规则如下:
在缓存区中发生时间戳回退时,a)基于当前有效缓存区中包括视频资源,服务器将latestVideoPts确定为目标时间戳;b)基于当前有效缓存区中不包括视频资源,服务器将latestAudioPts确定为目标时间戳。
2):基于拉取位置参数大于0,且音频参数为真,在缓存区中发生时间戳回退时,将最大音频时间戳确定为目标时间戳。
上述过程是指帧获取请求中@fasSpts字段携带大于0的值(@fasSpts>0)的情况下,此时,如果帧获取请求中@onlyAudio字段携带true值(帧获取请求指定@onlyAudio=true),也即是纯音频模式、仅传输音频流,服务器的处理规则如下:服务器将latestAudioPts确定为目标时间戳。
3):基于拉取位置参数大于0,且音频参数为默认值或音频参数为假,在缓存区中未发生时间戳回退时,将该拉取位置参数确定为目标时间戳。
上述过程是指帧获取请求中@fasSpts字段携带大于0的值(@fasSpts>0)的情况下,此时,如果帧获取请求中@onlyAudio(音频参数)也缺省,服务器会为音频参数配置默认值(音频参数的默认值为false),令@onlyAudio=false,或者,帧获取请求中@onlyAudio字段携带false值(帧获取请求指定@onlyAudio=false),此时服务器的处理规则如下:在缓存区中未发生时间戳回退时,服务器将@fasSpts确定为目标时间戳。
4):基于拉取位置参数大于0,且音频参数为真,在缓存区中未发生时间戳回退时,将该拉取位置参数确定为目标时间戳。
上述过程是指帧获取请求中@fasSpts字段携带大于0的值(@fasSpts>0)的情况下,此时,如果帧获取请求中@onlyAudio字段携带true值(帧获取请求指定@onlyAudio=true),也即是纯音频模式、仅传输音频流,服务器的处理规则如下:在缓存区中未发生时间戳回退 时,服务器将@fasSpts确定为目标时间戳。
结合上述情况3)和4)的讨论可以看出,基于拉取位置参数大于0(@fasSpts>0),且缓存区中未发生时间戳回退时,不论音频参数为真、为假还是默认值,服务器均将拉取位置参数确定为目标时间戳。
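上述四种取值情况的处理逻辑可以归纳为如下示意性的Python草图(以None表示参数缺省,参数名均为本示例的假设):

```python
# 示意性草图:按拉取位置参数与音频参数的取值确定目标时间戳。
def target_timestamp(fas_spts, default_spts, rollback,
                     latest_video_pts, latest_audio_pts, only_audio):
    """fas_spts 为 None 表示拉取位置参数缺省;rollback 表示缓存区
    是否发生时间戳回退;无视频资源时 latest_video_pts 传 None。"""
    latest = latest_audio_pts if (only_audio or latest_video_pts is None) \
             else latest_video_pts
    if fas_spts is None:   # 情况一:缺省,按默认值的绝对值回退
        return latest - abs(default_spts)
    if fas_spts == 0:      # 情况二:等于0,取最新帧
        return latest
    if fas_spts < 0:       # 情况三:小于0,回看|fasSpts|毫秒
        return latest - abs(fas_spts)
    # 情况四:大于0,发生回退时取最新帧,否则直接取 fasSpts
    return latest if rollback else fas_spts
```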
在上述各个情况中,服务器判断是否发生时间戳回退的操作参见上述S57A,服务器更新当前有效缓存区的操作参见上述S57B,这里不做赘述。
在上述基础上,服务器在拉取位置参数的不同取值情况下,均能够执行对应的处理逻辑,从而确定出目标时间戳,该目标时间戳用于确定媒体流的从起始位置开始的媒体帧。
在一些实施例中,服务器确定目标时间戳后,通过下述方式一确定从起始位置开始的媒体帧:
方式一、服务器确定当前有效缓存区中时间戳最接近该目标时间戳的媒体帧为从起始位置开始的媒体帧。
在一些实施例中,在音频参数缺省或音频参数为假的情况下,基于当前有效缓存区中包括视频资源,将视频资源中时间戳最接近该目标时间戳的关键帧(I帧)确定为从起始位置开始的媒体帧;基于当前有效缓存区中不包括视频资源,将时间戳最接近该目标时间戳的音频帧确定为从起始位置开始的媒体帧。
在一些实施例中,在音频参数为真的情况下,服务器直接将时间戳最接近该目标时间戳的音频帧确定为从起始位置开始的媒体帧。该过程包括下述几种示例性场景:
A):@fasSpts=defaultSpts,@onlyAudio缺省或@onlyAudio=false时,请参考上述情况一中的示例1),基于当前有效缓存区中包括视频资源,目标时间戳为latestVideoPts–|defaultSpts|,服务器将PTS最接近latestVideoPts–|defaultSpts|的I帧作为从起始位置开始的媒体帧;此外,基于当前有效缓存区中不包括视频资源,目标时间戳为latestAudioPts–|defaultSpts|,服务器将PTS最接近latestAudioPts–|defaultSpts|的音频帧作为从起始位置开始的媒体帧。
B):@fasSpts=defaultSpts,@onlyAudio=true时,请参考上述情况一中的示例2),目标时间戳为latestAudioPts–|defaultSpts|,服务器将PTS最接近latestAudioPts–|defaultSpts|的音频帧作为从起始位置开始的媒体帧。
C):@fasSpts=0,@onlyAudio缺省或@onlyAudio=false时,请参考上述情况二中的示例1),基于当前有效缓存区中包括视频资源,目标时间戳为latestVideoPts,服务器将PTS最接近latestVideoPts的I帧作为从起始位置开始的媒体帧;基于当前有效缓存区中不包括视频资源,目标时间戳为latestAudioPts,服务器将PTS最接近latestAudioPts的音频帧作为从起始位置开始的媒体帧。
D):@fasSpts=0,@onlyAudio=true时,请参考上述情况二中的示例2),目标时间戳为latestAudioPts,服务器将PTS最接近latestAudioPts的音频帧作为从起始位置开始的媒体帧。
E):@fasSpts<0,@onlyAudio缺省或@onlyAudio=false时,请参考上述情况三中的示例1),基于当前有效缓存区中包括视频资源,目标时间戳为latestVideoPts-|@fasSpts|,服务器将PTS最接近latestVideoPts-|@fasSpts|的I帧作为从起始位置开始的媒体帧;反之,基于当前有效缓存区中不包括视频资源,目标时间戳为latestAudioPts-|@fasSpts|,服务器将PTS最接近latestAudioPts-|@fasSpts|的音频帧作为从起始位置开始的媒体帧。
F):@fasSpts<0,@onlyAudio=true时,请参考上述情况三中的示例2),目标时间戳为latestAudioPts-|@fasSpts|,服务器将PTS最接近latestAudioPts-|@fasSpts|的音频帧作为从起始位置开始的媒体帧。
G):@fasSpts>0,@onlyAudio缺省或@onlyAudio=false,缓存区中发生时间戳回退时,请参考上述情况四中的示例1),基于当前有效缓存区中包括视频资源,目标时间戳为latestVideoPts,服务器将PTS最接近latestVideoPts的I帧(最新的I帧)作为从起始位置开始的媒体帧;基于当前有效缓存区中不包括视频资源,目标时间戳为latestAudioPts,服务器将PTS最接近latestAudioPts的音频帧(最新的音频帧)作为从起始位置开始的媒体帧。
H):@fasSpts>0,@onlyAudio=true,缓存区中发生时间戳回退时,请参考上述情况四中的示例2),目标时间戳为latestAudioPts,服务器将PTS最接近latestAudioPts的音频帧(最新的音频帧)作为从起始位置开始的媒体帧。
以此类推,在@fasSpts>0时,针对上述情况四中的其余示例性讨论,在确定目标时间戳之后,服务器也通过上述方式一,将当前有效缓存区中时间戳最接近该目标时间戳的媒体帧确定为从起始位置开始的媒体帧,这里不进行一一枚举。
在一些实施例中,在@fasSpts>0时,除了上述方式一之外,服务器还通过下述方式二来确定媒体帧:
方式二、基于该当前有效缓存区中存在目标媒体帧,服务器确定该目标媒体帧为从起始位置开始的媒体帧,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳。
在一些实施例中,在音频参数缺省或音频参数为假的情况下,基于当前有效缓存区中包括视频资源,目标媒体帧是指视频资源内的I帧;基于当前有效缓存区中不包括视频资源,目标媒体帧是指音频帧。
在一些实施例中,在音频参数为真的情况下,目标媒体帧是指音频帧。该过程包括下述几种示例性场景:
I):@fasSpts>0,@onlyAudio缺省或@onlyAudio=false,缓存区中未发生时间戳回退时,请参考上述情况四中的示例3),此时目标时间戳为@fasSpts,基于当前有效缓存区中包括视频资源,服务器从PTS最小的I帧开始,沿着PTS增大的方向逐个遍历,直到查询到第一个PTS≥@fasSpts的I帧(目标媒体帧),说明当前有效缓存区中存在目标媒体帧,服务器将上述目标媒体帧确定为从起始位置开始的媒体帧;基于当前有效缓存区内不包括视频资源,服务器从PTS最小的音频帧开始,沿着PTS增大的方向逐个遍历,直到查询到第一个PTS≥@fasSpts的音频帧(目标媒体帧),说明当前有效缓存区中存在目标媒体帧,服务器将上述目标媒体帧确定为从起始位置开始的媒体帧。
J):@fasSpts>0,@onlyAudio=true,缓存区中未发生时间戳回退时,请参考上述情况四中的示例4),此时目标时间戳为@fasSpts,服务器从PTS最小的音频帧开始,沿着PTS增大的方向逐个遍历,直到查询到第一个PTS≥@fasSpts的音频帧(目标媒体帧),说明当前有效缓存区中存在目标媒体帧,服务器将上述目标媒体帧确定为从起始位置开始的媒体帧。
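方式二的查找逻辑可以用如下示意性的Python草图表达(传入当前有效缓存区内关键帧或音频帧的升序时间戳序列,均为本示例的假设):

```python
# 示意性草图(方式二):查找时间戳大于或等于目标时间戳且最接近
# 目标时间戳的目标媒体帧。
def find_target_frame(valid_pts, target_pts):
    for pts in valid_pts:   # 沿 PTS 增大的方向逐个遍历
        if pts >= target_pts:
            return pts
    return None             # 查询不到,交由等待/超时逻辑处理
```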
上述方式二中,提供了在当前有效缓存区中能够查询到目标媒体帧时,服务器如何确定从起始位置开始的媒体帧,然而,在一些实施例中,有可能在当前有效缓存区内并未查询到目标媒体帧,这种情况通常会出现在直播业务场景中,观众终端所指定拉取@fasSpts的帧获取请求先到达了服务器,而@fasSpts所对应的媒体帧(直播视频帧)还在推流阶段的传输过程中,此时服务器还通过下述方式三来确定从起始位置开始的媒体帧。
方式三、基于该当前有效缓存区中不存在目标媒体帧,服务器进入等待状态,直到该目标媒体帧写入该当前有效缓存区时,确定该目标媒体帧为从起始位置开始的媒体帧,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳。
在一些实施例中,在音频参数缺省或音频参数为假的情况下,基于当前有效缓存区中包括视频资源,目标媒体帧是指视频资源内的I帧;基于当前有效缓存区中不包括视频资源,目标媒体帧是指音频帧。
在一些实施例中,在音频参数为真的情况下,目标媒体帧是指音频帧。
具体地,包括下述几种示例性场景:
K):@fasSpts>0,@onlyAudio缺省或@onlyAudio=false,缓存区中未发生时间戳回退时,请参考上述情况四中的示例3),此时目标时间戳为@fasSpts,基于当前有效缓存区中包括视频资源,服务器从PTS最小的I帧开始,沿着PTS增大的方向逐个遍历,如果遍历了所有的I帧之后查询不到满足PTS≥@fasSpts的I帧(目标媒体帧),说明当前有效缓存区中不存在目标媒体帧,服务器进入等待状态,等待第一个PTS≥@fasSpts的I帧(目标媒体帧)被写入当前有效缓存区时,确定目标媒体帧为从起始位置开始的媒体帧;基于当前有效缓存区内不包括视频资源,服务器从PTS最小的音频帧开始,沿着PTS增大的方向逐个遍历,如果遍历了所有的音频帧之后查询不到满足PTS≥@fasSpts的音频帧(目标媒体帧),说明当前有效缓存区中不存在目标媒体帧,服务器进入等待状态,等待第一个PTS≥@fasSpts的音频帧(目标媒体帧)被写入当前有效缓存区时,确定目标媒体帧为从起始位置开始的媒体帧。
L):@fasSpts>0,@onlyAudio=true,缓存区中未发生时间戳回退时,请参考上述情况四中的示例4),此时目标时间戳为@fasSpts,服务器从PTS最小的音频帧开始,沿着PTS增大的方向逐个遍历,如果遍历了所有的音频帧之后查询不到满足PTS≥@fasSpts的音频帧(目标媒体帧),说明当前有效缓存区中不存在目标媒体帧,服务器进入等待状态,等待第一个PTS≥@fasSpts的音频帧(目标媒体帧)被写入当前有效缓存区时,确定目标媒体帧为从起始位置开始的媒体帧。
上述方式三中,提供了在当前有效缓存区中查询不到目标媒体帧时,服务器如何确定从起始位置开始的媒体帧,在一些实施例中,有可能会由于异常情况的出现,导致帧获取请求中携带的@fasSpts是一个较大的异常值,若基于上述方式三进行处理,会导致很长的等待时间,在大数据场景下如果存在并发的帧获取请求发生异常情况,这些帧获取请求都会进入一个阻塞的等待状态,占用服务器的处理资源,那么会对服务器的性能造成极大的损失。
有鉴于此,服务器还可以设置一个超时阈值,从而通过下述方式四,基于超时阈值来确定是否需要返回拉取失败信息。下面对方式四进行详述。
方式四、基于该当前有效缓存区中不存在目标媒体帧,且目标时间戳与最大时间戳之间的差值大于超时阈值,服务器发送拉取失败信息,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳。
在一些实施例中,在音频参数缺省或音频参数为假的情况下,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳latestVideoPts;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳latestAudioPts。
在一些实施例中,在音频参数为真的情况下,该最大时间戳为最大音频时间戳 latestAudioPts。
假设超时阈值为timeoutPTS,超时阈值可以是任一大于或等于0的数值,可以是一个服务器预设的数值,也可以由技术人员基于业务场景进行个性化的配置,本公开实施例不对超时阈值的获取方式进行具体限定,在一些实施例中,包括下述几种示例性场景:
M):@fasSpts>0,@onlyAudio缺省或@onlyAudio=false,缓存区中未发生时间戳回退时,请参考上述情况四中的示例3),此时目标时间戳为@fasSpts,基于当前有效缓存区中包括视频资源,服务器从PTS最小的I帧开始,沿着PTS增大的方向逐个遍历,如果遍历了所有的I帧之后查询不到满足PTS≥@fasSpts的I帧(目标媒体帧),说明当前有效缓存区中不存在目标媒体帧,服务器判断@fasSpts与latestVideoPts之间的差值是否大于timeoutPTS,基于@fasSpts–latestVideoPts>timeoutPTS,服务器向终端发送拉取失败信息。基于@fasSpts–latestVideoPts≤timeoutPTS,服务器进入等待状态,也即是对应于上述方式三中示例K)对应情况下所执行的操作;基于当前有效缓存区内不包括视频资源,服务器从PTS最小的音频帧开始,沿着PTS增大的方向逐个遍历,如果遍历了所有的音频帧之后查询不到满足PTS≥@fasSpts的音频帧(目标媒体帧),说明当前有效缓存区中不存在目标媒体帧,服务器判断@fasSpts与latestAudioPts之间的差值是否大于timeoutPTS,基于@fasSpts–latestAudioPts>timeoutPTS,服务器向终端发送拉取失败信息。基于@fasSpts–latestAudioPts≤timeoutPTS,服务器进入等待状态,也即是对应于上述方式三中示例K)对应情况下所执行的操作。
N):@fasSpts>0,@onlyAudio=true,缓存区中未发生时间戳回退时,请参考上述情况四中的示例4),此时目标时间戳为@fasSpts,服务器从PTS最小的音频帧开始,沿着PTS增大的方向逐个遍历,如果遍历了所有的音频帧之后查询不到满足PTS≥@fasSpts的音频帧(目标媒体帧),说明当前有效缓存区中不存在目标媒体帧,服务器判断@fasSpts与latestAudioPts之间的差值是否大于timeoutPTS,基于@fasSpts–latestAudioPts>timeoutPTS,服务器向终端发送拉取失败信息。基于@fasSpts–latestAudioPts≤timeoutPTS,服务器进入等待状态,也即是对应于上述方式三中示例K)对应情况下所执行的操作。
将上述方式三和方式四相结合,提供一种在@fasSpts>0且当前有效缓存区中不存在目标媒体帧时的异常处理逻辑,基于目标时间戳与最大时间戳之间的差值小于或等于超时阈值,服务器通过方式三进入等待状态(等待处理模式),直到目标媒体帧到达时,将目标媒体帧确定为从起始位置开始的媒体帧。基于目标时间戳与最大时间戳之间的差值大于超时阈值,服务器通过方式四发送拉取失败信息(错误处理模式),这时服务器判定帧获取请求出错,因此直接向终端返回拉取失败信息,该拉取失败信息可以是一个错误码的形式。
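等待处理模式与错误处理模式之间的选择可以用如下示意性的Python草图表达(返回值仅为示意):

```python
# 示意性草图:目标媒体帧尚未到达时,按超时阈值选择处理模式。
def resolve_or_fail(target_pts, latest_pts, timeout_pts):
    if target_pts - latest_pts > timeout_pts:
        return "pull_failed"  # 错误处理模式:返回拉取失败信息(错误码)
    return "wait"             # 等待处理模式:等待目标媒体帧写入缓存区
```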
在上述步骤中,服务器基于该媒体流的拉取位置参数,确定该媒体流的从起始位置开始的媒体帧,正是由于在帧获取请求中携带了拉取位置参数,使得服务器能够方便地确定在响应该帧获取请求的过程中,到底从哪个媒体帧开始以目标码率进行媒体帧的传输,提高了资源传输过程的灵活性,进一步地,在需要动态码率切换的场景下,只需要在帧获取请求中更换携带的地址信息(@url字段)以及拉取位置参数(@fasSpts字段),就能实现从任一指定的起始位置开始以新的码率进行媒体帧的传输,实现自适应的码率切换。
在S58中,服务器以目标码率,向终端传输从起始位置开始的媒体帧。
服务器获取从起始位置开始的媒体帧后,以目标码率,向终端传输从起始位置开始的媒体帧,在该过程中,服务器像流水一样源源不断地向终端发送媒体帧,形象地称为“媒体流传输”。
在一个示例性场景中,基于服务器为CDN服务器,那么该目标地址信息是一个域名,终端向CDN服务器的中心平台发送帧获取请求,中心平台调用DNS(Domain Name System,域名系统,本质上是一个域名解析库)对域名进行解析,得到域名对应的CNAME(别名)记录,基于终端的地理位置信息对CNAME记录再次进行解析,得到一个距离终端最近的边缘服务器的IP(Internet Protocol,网际互连协议)地址,这时中心平台将帧获取请求导向至上述边缘服务器,由边缘服务器响应于帧获取请求,以目标码率向终端提供多媒体资源的媒体帧,从而能够使得终端就近访问目标码率的多媒体资源。
在一些实施例中,本公开实施例提供一种CDN服务器内部回源机制,在CDN系统中,有可能边缘服务器中无法提供帧获取请求所指定的多媒体资源,此时边缘服务器向上级节点设备回源拉取媒体流。
那么边缘服务器向上级节点设备发送回源拉取请求,上级节点设备响应于回源拉取请求,向边缘服务器返回对应的媒体流,再由边缘服务器向终端发送对应的媒体流。
在上述过程中,边缘服务器在获取回源拉取请求时,基于终端发送的帧获取请求中携带@fasSpts字段,边缘服务器直接将帧获取请求确定为回源拉取请求,将回源拉取请求转发至上级节点设备,反之,基于终端发送的帧获取请求中缺省@fasSpts字段,边缘服务器需要为@fasSpts字段配置默认值defaultSpts,进而在帧获取请求中嵌入@fasSpts字段,将@fasSpts字段内所存储的数值置为defaultSpts,得到回源拉取请求。
在一些实施例中,该上级节点设备可以是第三方源站服务器,此时回源拉取请求必须携带@fasSpts字段,在一些实施例中,该上级节点设备也可以是CDN系统内部的节点服务器(比如中心平台或者分布式数据库系统的节点设备),基于帧获取请求中携带@fasSpts字段,那么按照@fasSpts字段的实际值进行回源,否则,依据默认值@fasSpts=defaultSpts进行回源,本公开实施例不对边缘服务器的回源方式进行具体限定。
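上述回源时为@fasSpts填充默认值的逻辑可以用如下示意性的Python草图表达(字段检测与拼接方式均为本示例的假设):

```python
# 示意性草图:边缘服务器构造回源拉取请求时为 @fasSpts 填默认值。
def build_origin_request(fas_request: str, default_spts: int) -> str:
    if "fasSpts=" in fas_request:
        return fas_request  # 已携带 @fasSpts 字段,原样转发
    return f"{fas_request}&fasSpts={default_spts}"  # 缺省时嵌入默认值
```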
以上介绍了本公开实施例提供的方法实施例,以下对本公开实施例提供的虚拟装置进行示例性说明。
图8是根据一示例性实施例示出的一种媒体流传输装置的框图,应用于终端,该装置包括确定模块801和发送模块802;其中,确定模块801,用于响应于对媒体流的帧获取指令,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息。所述确定模块801,还用于确定所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置。发送模块802,用于向服务器发送携带有所述目标地址信息和起始位置的帧获取请求,所述帧获取请求用于指示所述服务器以所述目标码率返回所述媒体流中从所述起始位置开始的媒体帧。
在一些实施例中,基于所述帧获取指令由对媒体流的播放操作触发,所述确定模块801用于:将所述媒体流中所述播放操作的操作时间产生的媒体帧所在位置确定为所述起始位置。或,将所述帧获取指令中所选定的媒体帧在所述媒体流中的位置确定为所述起始位置。或,将所述媒体流的第一个媒体帧所在位置确定为所述起始位置。
在一些实施例中,所述帧获取指令在所述媒体流的播放状态信息满足码率切换条件时触发。
在一些实施例中,所述装置还包括:获取模块,用于在接收所述媒体流中任一媒体帧时,获取所述媒体流的播放状态信息。所述确定模块801还用于响应于所述播放状态信息符合所述码率切换条件,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息。所述确定模块801用于根据所述任一媒体帧在所述媒体流中的位置,确定所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置。
在一些实施例中,所述确定模块801用于:响应于所述播放状态信息符合所述码率切换条件,根据所述播放状态信息和当前码率,确定目标码率。响应于所述目标码率与所述当前码率不相等,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息。
在一些实施例中,所述播放状态信息包括第一缓存量,所述第一缓存量为当前对所述媒体流已缓存且未播放的缓存量。所述确定模块801用于:响应于所述第一缓存量大于第一缓存量阈值或所述第一缓存量小于第二缓存量阈值,根据所述播放状态信息以及当前码率,确定目标码率,其中所述第二缓存量阈值小于所述第一缓存量阈值。
在一些实施例中,所述确定模块801用于:获取多个候选码率。根据所述多个候选码率与所述当前码率之间的关系、所述播放状态信息以及所述媒体流中所述任一媒体帧在所述任一媒体帧所在媒体组中的位置,获取每个候选码率对应的第二缓存量。根据所述每个候选码率对应的第二缓存量与第一缓存量阈值或第二缓存量阈值的关系,从所述多个候选码率中,确定目标码率。其中,所述每个候选码率对应的第二缓存量为将码率切换至候选码率后,所述任一媒体帧所在媒体组传输结束时对所述媒体流已缓存但未播放的缓存量。
在一些实施例中,所述帧获取请求还包括第一扩展参数或者第二扩展参数中至少一项,所述第一扩展参数用于表示所述媒体帧是否为音频帧,所述第二扩展参数用于表示从所述第二扩展参数所指示的目标时间戳开始传输所述媒体流中的媒体帧。
在一些实施例中,所述多种码率的所述媒体流的地址信息存储于所述媒体流的媒体描述文件中。
在一些实施例中,所述媒体描述文件包括版本号和媒体描述集合,其中,所述版本号包括所述媒体描述文件的版本号或者资源传输标准的版本号中至少一项,所述媒体描述集合包括多个媒体描述元信息,每个媒体描述元信息对应于一种码率的媒体流,每个媒体描述元信息包括所述媒体描述元信息所对应码率的媒体流的画面组长度以及属性信息。
需要说明的是:上述实施例提供的媒体流传输装置在传输媒体流时,仅以上述各功能模块的划分进行举例说明,实际应用中,根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的媒体流传输装置与媒体流传输方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图9是根据一示例性实施例示出的一种媒体流传输装置的框图,应用于服务器,该装置包括接收模块901、获取模块902和传输模块903;其中,接收模块901,用于接收帧获取请求,所述帧获取请求携带有目标码率的媒体流的目标地址信息和所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置。获取模块902,用于响应于所述帧获取请求,从所述目标地址信息对应的地址,获取从所述起始位置开始的媒体帧。传输模块903,用于以所述目标码率,向终端传输从所述起始位置开始的媒体帧。
在一些实施例中,所述获取模块902用于:基于所述起始位置,确定目标时间戳。基于所述目标时间戳,确定并获取从所述起始位置开始的媒体帧。
在一些实施例中,基于所述起始位置为拉取位置参数,所述获取模块902用于基于音频参数和拉取位置参数,确定目标时间戳。
在一些实施例中,所述获取模块902用于:基于所述拉取位置参数为默认值,且所述音频参数为默认值或所述音频参数为假,将最大时间戳减去所述拉取位置参数的默认值的绝对值所得的数值确定为所述目标时间戳。或,基于所述拉取位置参数为默认值,且所述音频参数为真,将最大音频时间戳减去所述拉取位置参数的默认值的绝对值所得的数值确定为所述目标时间戳。或,基于所述拉取位置参数等于0,且所述音频参数为默认值或所述音频参数为假,将最大时间戳确定为所述目标时间戳。或,基于所述拉取位置参数等于0,且所述音频参数为真,将最大音频时间戳确定为所述目标时间戳。或,基于所述拉取位置参数小于0,且所述音频参数为默认值或所述音频参数为假,将最大时间戳减去所述拉取位置参数的绝对值所得的数值确定为所述目标时间戳。或,基于所述拉取位置参数小于0,且所述音频参数为真,将最大音频时间戳减去所述拉取位置参数的绝对值所得的数值确定为所述目标时间戳。或,基于所述拉取位置参数大于0,且所述音频参数为默认值或所述音频参数为假,在缓存区中发生时间戳回退时,将最大时间戳确定为所述目标时间戳。或,基于所述拉取位置参数大于0,且所述音频参数为真,在缓存区中发生时间戳回退时,将最大音频时间戳确定为所述目标时间戳。或,基于所述拉取位置参数大于0,且缓存区中未发生时间戳回退时,将所述拉取位置参数确定为所述目标时间戳。
在一些实施例中,所述获取模块902,还用于:基于缓存区中媒体帧序列中媒体帧的时间戳呈非单调递增,确定所述缓存区发生时间戳回退。基于缓存区中媒体帧序列中媒体帧的时间戳不是呈非单调递增,确定所述缓存区未发生时间戳回退,所述媒体帧序列为所述缓存区缓存的多个媒体帧所组成的序列。
在一些实施例中,所述获取模块902用于:基于所述缓存区中包括视频资源,在关键帧序列中关键帧的时间戳呈非单调递增时,确定所述媒体帧序列呈非单调递增,所述关键帧序列为缓存的多个关键帧组成的序列。基于所述缓存区中不包括视频资源,在音频帧序列中音频帧的时间戳呈非单调递增时,确定所述媒体帧序列呈非单调递增,所述音频帧序列为缓存的多个音频帧组成的序列。
在一些实施例中,所述获取模块902用于:基于当前有效缓存区中存在目标媒体帧,确定所述目标媒体帧为所述从所述起始位置开始的媒体帧,所述目标媒体帧的时间戳大于或等于所述目标时间戳且最接近所述目标时间戳。或,基于所述当前有效缓存区中不存在目标媒体帧,进入等待状态,直到所述目标媒体帧写入所述当前有效缓存区时,确定所述目标媒体帧为所述从所述起始位置开始的媒体帧,所述目标媒体帧的时间戳大于或等于所述目标时间戳且最接近所述目标时间戳。或,基于所述当前有效缓存区中不存在目标媒体帧,且所述目标时间戳与最大时间戳之间的差值大于超时阈值,发送拉取失败信息,所述目标媒体帧的时间戳大于或等于所述目标时间戳且最接近所述目标时间戳。
需要说明的是:上述实施例提供的媒体流传输装置在传输媒体流时,仅以上述各功能模块的划分进行举例说明,实际应用中,根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的媒体流传输装置与媒体流传输方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
以上介绍了本公开实施例提供的虚拟装置,以下对本公开实施例提供的硬件装置进行示例性说明。
图10示出了本公开一个示例性实施例提供的终端1000的结构框图。该终端1000是:智能手机、平板电脑、MP3(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)播放器、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端1000还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,终端1000包括有:一个或多个处理器1001和一个或多个存储器1002。
处理器1001包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器1001采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器1001也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器1001可以集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器1001还包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器1002包括一个或多个计算机可读存储介质,该计算机可读存储介质是非暂态的。存储器1002还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器1002中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器1001所执行以实现本公开中方法实施例提供的媒体流传输方法。
在一些实施例中,终端1000还可选包括有:外围设备接口1003和至少一个外围设备。处理器1001、存储器1002和外围设备接口1003之间通过总线或信号线相连。各个外围设备通过总线、信号线或电路板与外围设备接口1003相连。在一些实施例中,外围设备包括:射频电路1004、触摸显示屏1005、摄像头组件1006、音频电路1007、定位组件1008和电源1009中的至少一种。
外围设备接口1003可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器1001和存储器1002。在一些实施例中,处理器1001、存储器1002和外围设备接口1003被集成在同一芯片或电路板上;在一些其他实施例中,处理器1001、存储器1002和外围设备接口1003中的任意一个或两个在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路1004用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路1004通过电磁信号与通信网络以及其他通信设备进行通信。射频电路1004将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。在一些实施例中,射频电路1004包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路1004通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:万维网、城域网、内联网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路1004还包括NFC(Near Field Communication,近距离无线通信)有关的电路,本公开对此不加以限定。
显示屏1005用于显示UI(User Interface,用户界面)。该UI包括图形、文本、图标、视频及其它们的任意组合。当显示屏1005是触摸显示屏时,显示屏1005还具有采集在显示屏1005的表面或表面上方的触摸信号的能力。该触摸信号作为控制信号输入至处理器1001进行处理。此时,显示屏1005还用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏1005为一个,设置在终端1000的前面板;在另一些实施例中,显示屏1005为至少两个,分别设置在终端1000的不同表面或呈折叠设计;在再一些实施例中,显示屏1005是柔性显示屏,设置在终端1000的弯曲表面上或折叠面上。甚至,显示屏1005还可以设置成非矩形的不规则图形,也即异形屏。显示屏1005采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件1006用于采集图像或视频。在一些实施例中,摄像头组件1006包括前置摄像头和后置摄像头。通常,前置摄像头设置在终端的前面板,后置摄像头设置在终端的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其它融合拍摄功能。在一些实施例中,摄像头组件1006还包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,用于不同色温下的光线补偿。
音频电路1007包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器1001进行处理,或者输入至射频电路1004以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在终端1000的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器1001或射频电路1004的电信号转换为声波。扬声器可以是传统的薄膜扬声器,也可以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅将电信号转换为人类可听见的声波,也将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路1007还包括耳机插孔。
定位组件1008用于定位终端1000的当前地理位置,以实现导航或LBS(Location Based Service,基于位置的服务)。定位组件1008是基于美国的GPS(Global Positioning System,全球定位系统)、中国的北斗系统或俄罗斯的伽利略系统的定位组件。
电源1009用于为终端1000中的各个组件进行供电。电源1009是交流电、直流电、一次性电池或可充电电池。当电源1009包括可充电电池时,该可充电电池是有线充电电池或无线充电电池。有线充电电池是通过有线线路充电的电池,无线充电电池是通过无线线圈充电的电池。该可充电电池还用于支持快充技术。
在一些实施例中,终端1000还包括有一个或多个传感器1010。该一个或多个传感器1010包括但不限于:加速度传感器1011、陀螺仪传感器1012、压力传感器1013、指纹传感器1014、光学传感器1015以及接近传感器1016。
加速度传感器1011检测以终端1000建立的坐标系的三个坐标轴上的加速度大小。比如,加速度传感器1011用于检测重力加速度在三个坐标轴上的分量。处理器1001根据加速度传感器1011采集的重力加速度信号,控制触摸显示屏1005以横向视图或纵向视图进行用户界面的显示。加速度传感器1011还用于游戏或者用户的运动数据的采集。
陀螺仪传感器1012检测终端1000的机体方向及转动角度,陀螺仪传感器1012与加速度传感器1011协同采集用户对终端1000的3D动作。处理器1001根据陀螺仪传感器1012采集的数据,实现如下功能:动作感应(比如根据用户的倾斜操作来改变UI)、拍摄时的图像稳定、游戏控制以及惯性导航。
压力传感器1013设置在终端1000的侧边框和/或触摸显示屏1005的下层。当压力传感器1013设置在终端1000的侧边框时,检测用户对终端1000的握持信号,由处理器1001根据压力传感器1013采集的握持信号进行左右手识别或快捷操作。当压力传感器1013设置在触摸显示屏1005的下层时,由处理器1001根据用户对触摸显示屏1005的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
指纹传感器1014用于采集用户的指纹,由处理器1001根据指纹传感器1014采集到的指纹识别用户的身份,或者,由指纹传感器1014根据采集到的指纹识别用户的身份。在识别出用户的身份为可信身份时,由处理器1001授权该用户执行相关的敏感操作,该敏感操作包括解锁屏幕、查看加密信息、下载软件、支付及更改设置等。指纹传感器1014被设置在终端1000的正面、背面或侧面。当终端1000上设置有物理按键或厂商Logo时,指纹传感器1014与物理按键或厂商Logo集成在一起。
光学传感器1015用于采集环境光强度。在一个实施例中,处理器1001根据光学传感器1015采集的环境光强度,控制触摸显示屏1005的显示亮度。当环境光强度较高时,调高触摸显示屏1005的显示亮度;当环境光强度较低时,调低触摸显示屏1005的显示亮度。在另一个实施例中,处理器1001还根据光学传感器1015采集的环境光强度,动态调整摄像头组件1006的拍摄参数。
接近传感器1016,也称距离传感器,通常设置在终端1000的前面板。接近传感器1016用于采集用户与终端1000的正面之间的距离。在一个实施例中,当接近传感器1016检测到用户与终端1000的正面之间的距离逐渐变小时,由处理器1001控制触摸显示屏1005从亮屏状态切换为息屏状态;当接近传感器1016检测到用户与终端1000的正面之间的距离逐渐变大时,由处理器1001控制触摸显示屏1005从息屏状态切换为亮屏状态。
本领域技术人员理解,图10中示出的结构并不构成对终端1000的限定,终端1000可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
图11是本公开实施例提供的一种服务器的结构示意图,该服务器1100可因配置或性能不同而产生比较大的差异,包括一个或一个以上处理器(central processing units,CPU)1101和一个或一个以上的存储器1102,其中,该存储器1102中存储有至少一条指令,该至少一条指令由该处理器1101加载并执行以实现上述各个方法实施例提供的媒体流传输方法。当然,该服务器还具有有线或无线网络接口以及输入输出接口等部件,以便进行输入输出,该服务器还包括其他用于实现设备功能的部件,在此不做赘述。
上述终端和服务器为电子设备,该电子设备包括一个或多个处理器;用于存储所述一个或多个处理器可执行指令的一个或多个存储器;其中,所述一个或多个处理器被配置为执行所述指令,以实现上述各个实施例中所示的媒体流传输方法的方法步骤。在一些实施例中,上述各个实施例中所描述的特征和应用都可以由记录在一种该存储器中的存储介质中存储的指令执行操作。这些指令被一个或多个处理器执行时,能够使得一个或多个处理器执行指令中所指示的动作。
在示例性实施例中,还提供了一种包括指令的存储介质,例如包括指令的存储器。上述指令可由电子设备的处理器执行,以完成上述各个实施例中所示的媒体流传输方法的方法操作。在一些实施例中,上述各个实施例所描述的特征和应用都可以由记录在一种存储介质中的指令执行操作。当这些指令被一个或多个计算或处理单元(例如,一个或多个处理器、处理器的内核或其他处理单元)执行时,它们导致处理单元执行指令中指示的动作。在一些实施例中,存储介质可以是非临时性计算机可读存储介质,例如,该非临时性计算机可读存储介质可以是只读存储器(Read-Only Memory,简称:ROM)、随机存取存储器(Random Access Memory,简称:RAM)、只读光盘(Compact Disc Read-Only Memory,简称:CD-ROM)、磁带、软盘和光数据存储设备等。
本公开实施例还提供一种计算机程序产品,包括一条或多条指令,该一条或多条指令由电子设备的处理器执行时,使得电子设备能够执行上述媒体流传输方法。
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本公开的其它实施方案。本公开旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由下面的权利要求指出。
应当理解的是,本公开并不局限于上面已经描述并在附图中示出的精确结构,并且在不脱离其范围进行各种修改和改变。本公开的范围仅由所附的权利要求来限制。

Claims (36)

  1. 一种媒体流传输方法,应用于终端,所述方法包括:
    响应于对媒体流的帧获取指令,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息;
    确定所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置;
    向服务器发送携带有所述目标地址信息和起始位置的帧获取请求,所述帧获取请求用于指示所述服务器以所述目标码率返回所述媒体流中从所述起始位置开始的媒体帧。
  2. 根据权利要求1所述的方法,基于所述帧获取指令由对媒体流的播放操作触发,所述确定所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置,包括:
    将所述媒体流中所述播放操作的操作时间产生的媒体帧所在位置确定为所述起始位置;或,
    将所述帧获取指令中所选定的媒体帧在所述媒体流中的位置确定为所述起始位置;或,
    将所述媒体流的第一个媒体帧所在位置确定为所述起始位置。
  3. 根据权利要求1所述的方法,所述响应于对媒体流的帧获取指令,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息,包括:
    在接收所述媒体流中任一媒体帧时,获取所述媒体流的播放状态信息;
    响应于所述播放状态信息符合所述码率切换条件,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息;
    所述确定所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置,包括:
    根据所述任一媒体帧在所述媒体流中的位置,确定所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置。
  4. 根据权利要求3所述的方法,所述响应于所述播放状态信息符合所述码率切换条件,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息,包括:
    响应于所述播放状态信息符合所述码率切换条件,根据所述播放状态信息和当前码率,确定目标码率;
    响应于所述目标码率与所述当前码率不相等,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息。
  5. 根据权利要求4所述的方法,所述播放状态信息包括第一缓存量,所述第一缓存量为当前对所述媒体流已缓存且未播放的缓存量;
    所述响应于所述播放状态信息符合码率切换条件,根据所述播放状态信息以及当前码率,确定目标码率,包括:
    响应于所述第一缓存量大于第一缓存量阈值或所述第一缓存量小于第二缓存量阈值,根据所述播放状态信息以及当前码率,确定目标码率,其中所述第二缓存量阈值小于所述第一缓存量阈值。
  6. 根据权利要求4所述的方法,所述根据所述播放状态信息以及当前码率,确定目标码率,包括:
    获取多个候选码率;
    根据所述多个候选码率与所述当前码率之间的关系、所述播放状态信息以及所述媒体流中所述任一媒体帧在所述任一媒体帧所在媒体帧组中的位置,获取每个候选码率对应的第二缓存量;
    根据所述每个候选码率对应的第二缓存量与第一缓存量阈值或第二缓存量阈值的关系,从所述多个候选码率中,确定目标码率;
    其中,所述每个候选码率对应的第二缓存量为将码率切换至候选码率后,所述任一媒体帧所在媒体帧组传输结束时对所述媒体流已缓存但未播放的缓存量。
  7. 根据权利要求1所述的方法,所述帧获取请求还包括第一扩展参数或者第二扩展参数中至少一项,所述第一扩展参数用于表示所述媒体帧是否为音频帧,所述第二扩展参数用于表示从所述第二扩展参数所指示的目标时间戳开始传输所述媒体流中的媒体帧。
  8. 根据权利要求1所述的方法,所述多种码率的所述媒体流的地址信息存储于所述媒体流的媒体描述文件中。
  9. 根据权利要求8所述的方法,所述媒体描述文件包括版本号和媒体描述集合,其中,所述版本号包括所述媒体描述文件的版本号或者资源传输标准的版本号中至少一项,所述媒体描述集合包括多个媒体描述元信息,每个媒体描述元信息对应于一种码率的媒体流,每个媒体描述元信息包括所述媒体描述元信息所对应码率的媒体流的画面组长度以及属性信息。
  10. 一种媒体流传输方法,应用于服务器,所述方法包括:
    接收帧获取请求,所述帧获取请求携带有目标码率的媒体流的目标地址信息和所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置;
    响应于所述帧获取请求,从所述目标地址信息对应的地址,获取从所述起始位置开始的媒体帧;
    以所述目标码率,向终端传输从所述起始位置开始的媒体帧。
  11. 根据权利要求10所述的方法,所述从所述目标地址信息对应的地址,获取从所述起始位置开始的媒体帧,包括:
    基于所述起始位置,确定目标时间戳;
    基于所述目标时间戳,确定并获取从所述起始位置开始的媒体帧。
  12. 根据权利要求11所述的方法,基于所述起始位置为拉取位置参数,所述基于所述起始位置,确定目标时间戳,包括:基于音频参数和拉取位置参数,确定目标时间戳。
  13. 根据权利要求12所述的方法,所述基于所述音频参数和所述拉取位置参数,确定目标时间戳包括:
    基于所述拉取位置参数为默认值,且所述音频参数为默认值或所述音频参数为假,将最大时间戳减去所述拉取位置参数的默认值的绝对值所得的数值确定为所述目标时间戳;或,
    基于所述拉取位置参数为默认值,且所述音频参数为真,将最大音频时间戳减去所述拉取位置参数的默认值的绝对值所得的数值确定为所述目标时间戳;或,
    基于所述拉取位置参数等于0,且所述音频参数为默认值或所述音频参数为假,将最大时间戳确定为所述目标时间戳;或,
    基于所述拉取位置参数等于0,且所述音频参数为真,将最大音频时间戳确定为所述目标时间戳;或,
    基于所述拉取位置参数小于0,且所述音频参数为默认值或所述音频参数为假,将最大时间戳减去所述拉取位置参数的绝对值所得的数值确定为所述目标时间戳;或,
    基于所述拉取位置参数小于0,且所述音频参数为真,将最大音频时间戳减去所述拉取位置参数的绝对值所得的数值确定为所述目标时间戳;或,
    基于所述拉取位置参数大于0,且所述音频参数为默认值或所述音频参数为假,在缓存区中发生时间戳回退时,将最大时间戳确定为所述目标时间戳;或,
    基于所述拉取位置参数大于0,且所述音频参数为真,在缓存区中发生时间戳回退时,将最大音频时间戳确定为所述目标时间戳;或,
    基于所述拉取位置参数大于0,且缓存区中未发生时间戳回退时,将所述拉取位置参数确定为所述目标时间戳。
  14. 根据权利要求13所述的方法,所述方法还包括:
    基于缓存区中媒体帧序列中媒体帧的时间戳呈非单调递增,确定所述缓存区发生时间戳回退;
    基于缓存区中媒体帧序列中媒体帧的时间戳不是呈非单调递增,确定所述缓存区未发生时间戳回退,所述媒体帧序列为所述缓存区缓存的多个媒体帧所组成的序列。
  15. 根据权利要求14所述的方法,所述方法还包括:
    基于所述缓存区中包括视频资源且在关键帧序列中关键帧的时间戳呈非单调递增时,确定所述媒体帧序列呈非单调递增,所述关键帧序列为缓存的多个关键帧组成的序列;
    基于所述缓存区中不包括视频资源且在音频帧序列中音频帧的时间戳呈非单调递增时,确定所述媒体帧序列呈非单调递增,所述音频帧序列为缓存的多个音频帧组成的序列。
  16. 根据权利要求11所述的方法,所述基于所述目标时间戳,确定并获取从所述起始位置开始的媒体帧,包括:
    基于当前有效缓存区中存在目标媒体帧,确定所述目标媒体帧为所述从所述起始位置开始的媒体帧,所述目标媒体帧的时间戳大于或等于所述目标时间戳且最接近所述目标时间戳;或,
    基于所述当前有效缓存区中不存在目标媒体帧,进入等待状态,直到所述目标媒体帧写入所述当前有效缓存区时,确定所述目标媒体帧为所述从所述起始位置开始的媒体帧,所述目标媒体帧的时间戳大于或等于所述目标时间戳且最接近所述目标时间戳;或,
    基于所述当前有效缓存区中不存在目标媒体帧且所述目标时间戳与最大时间戳之间的差值大于超时阈值,发送拉取失败信息,所述目标媒体帧的时间戳大于或等于所述目标时间戳且最接近所述目标时间戳。
  17. 一种媒体流传输装置,应用于终端,包括:
    确定模块,用于响应于对媒体流的帧获取指令,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息;
    所述确定模块,还用于确定所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置;
    发送模块,用于向服务器发送携带有所述目标地址信息和起始位置的帧获取请求,所述帧获取请求用于指示所述服务器以所述目标码率返回所述媒体流中从所述起始位置开始的媒体帧。
  18. 根据权利要求17所述的装置,基于所述帧获取指令由对媒体流的播放操作触发,所述确定模块用于:
    将所述媒体流中所述播放操作的操作时间产生的媒体帧所在位置确定为所述起始位置;或,
    将所述帧获取指令中所选定的媒体帧在所述媒体流中的位置确定为所述起始位置;或,
    将所述媒体流的第一个媒体帧所在位置确定为所述起始位置。
  19. 根据权利要求17所述的装置,所述装置还包括:
    获取模块,用于在接收所述媒体流中任一媒体帧时,获取所述媒体流的播放状态信息;
    所述确定模块还用于响应于所述播放状态信息符合所述码率切换条件,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息;
    所述确定模块用于根据所述任一媒体帧在所述媒体流中的位置,确定所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置。
  20. 根据权利要求19所述的装置,所述确定模块用于:
    响应于所述播放状态信息符合所述码率切换条件,根据所述播放状态信息和当前码率,确定目标码率;
    响应于所述目标码率与所述当前码率不相等,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息。
  21. 根据权利要求20所述的装置,所述播放状态信息包括第一缓存量,所述第一缓存量为当前对所述媒体流已缓存且未播放的缓存量;
    所述确定模块用于:
    响应于所述第一缓存量大于第一缓存量阈值或所述第一缓存量小于第二缓存量阈值,根据所述播放状态信息以及当前码率,确定目标码率,其中所述第二缓存量阈值小于所述第一缓存量阈值。
  22. 根据权利要求20所述的装置,所述确定模块用于:
    获取多个候选码率;
    根据所述多个候选码率与所述当前码率之间的关系、所述播放状态信息以及所述媒体流中所述任一媒体帧在所述任一媒体帧所在媒体组中的位置,获取每个候选码率对应的第二缓存量;
    根据所述每个候选码率对应的第二缓存量与第一缓存量阈值或第二缓存量阈值的关系,从所述多个候选码率中,确定目标码率;
    其中,所述每个候选码率对应的第二缓存量为将码率切换至候选码率后,所述任一媒体帧所在媒体组传输结束时对所述媒体流已缓存但未播放的缓存量。
  23. 根据权利要求17所述的装置,所述帧获取请求还包括第一扩展参数或者第二扩展参数中至少一项,所述第一扩展参数用于表示所述媒体帧是否为音频帧,所述第二扩展参数用于表示从所述第二扩展参数所指示的目标时间戳开始传输所述媒体流中的媒体帧。
  24. 根据权利要求17所述的装置,所述多种码率的所述媒体流的地址信息存储于所述媒体流的媒体描述文件中。
  25. 根据权利要求24所述的装置,所述媒体描述文件包括版本号和媒体描述集合,其中,所述版本号包括所述媒体描述文件的版本号或者资源传输标准的版本号中至少一项,所述媒体描述集合包括多个媒体描述元信息,每个媒体描述元信息对应于一种码率的媒体流,每个媒体描述元信息包括所述媒体描述元信息所对应码率的媒体流的画面组长度以及属性信息。
  26. 一种媒体流传输装置,应用于服务器,包括:
    接收模块,用于接收帧获取请求,所述帧获取请求携带有目标码率的媒体流的目标地址信息和所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置;
    获取模块,用于响应于所述帧获取请求,从所述目标地址信息对应的地址,获取从所述起始位置开始的媒体帧;
    传输模块,用于以所述目标码率,向终端传输从所述起始位置开始的媒体帧。
  27. 根据权利要求26所述的装置,所述获取模块用于:
    基于所述起始位置,确定目标时间戳;
    基于所述目标时间戳,确定并获取从所述起始位置开始的媒体帧。
  28. 根据权利要求27所述的装置,基于所述起始位置为拉取位置参数,所述获取模块用于基于音频参数和拉取位置参数,确定目标时间戳。
  29. 根据权利要求28所述的装置,所述获取模块用于:
    基于所述拉取位置参数为默认值,且所述音频参数为默认值或所述音频参数为假,将最大时间戳减去所述拉取位置参数的默认值的绝对值所得的数值确定为所述目标时间戳;或,
    基于所述拉取位置参数为默认值,且所述音频参数为真,将最大音频时间戳减去所述拉取位置参数的默认值的绝对值所得的数值确定为所述目标时间戳;或,
    基于所述拉取位置参数等于0,且所述音频参数为默认值或所述音频参数为假,将最大时间戳确定为所述目标时间戳;或,
    基于所述拉取位置参数等于0,且所述音频参数为真,将最大音频时间戳确定为所述目标时间戳;或,
    基于所述拉取位置参数小于0,且所述音频参数为默认值或所述音频参数为假,将最大时间戳减去所述拉取位置参数的绝对值所得的数值确定为所述目标时间戳;或,
    基于所述拉取位置参数小于0,且所述音频参数为真,将最大音频时间戳减去所述拉取位置参数的绝对值所得的数值确定为所述目标时间戳;或,
    基于所述拉取位置参数大于0,且所述音频参数为默认值或所述音频参数为假,在缓存区中发生时间戳回退时,将最大时间戳确定为所述目标时间戳;或,
    基于所述拉取位置参数大于0,且所述音频参数为真,在缓存区中发生时间戳回退时,将最大音频时间戳确定为所述目标时间戳;或,
    基于所述拉取位置参数大于0,且缓存区中未发生时间戳回退时,将所述拉取位置参数确定为所述目标时间戳。
  30. 根据权利要求29所述的装置,所述获取模块,还用于:
    基于缓存区中媒体帧序列中媒体帧的时间戳呈非单调递增,确定所述缓存区发生时间戳回退;
    基于缓存区中媒体帧序列中媒体帧的时间戳不是呈非单调递增,确定所述缓存区未发生时间戳回退,所述媒体帧序列为所述缓存区缓存的多个媒体帧所组成的序列。
  31. 根据权利要求30所述的装置,所述获取模块用于:
    基于所述缓存区中包括视频资源且在关键帧序列中关键帧的时间戳呈非单调递增时,确定所述媒体帧序列呈非单调递增,所述关键帧序列为缓存的多个关键帧组成的序列;
    基于所述缓存区中不包括视频资源且在音频帧序列中音频帧的时间戳呈非单调递增时,确定所述媒体帧序列呈非单调递增,所述音频帧序列为缓存的多个音频帧组成的序列。
  32. 根据权利要求27所述的装置,所述获取模块用于:
    基于当前有效缓存区中存在目标媒体帧,确定所述目标媒体帧为所述从所述起始位置开始的媒体帧,所述目标媒体帧的时间戳大于或等于所述目标时间戳且最接近所述目标时间戳;或,
    基于所述当前有效缓存区中不存在目标媒体帧,进入等待状态,直到所述目标媒体帧写入所述当前有效缓存区时,确定所述目标媒体帧为所述从所述起始位置开始的媒体帧,所述目标媒体帧的时间戳大于或等于所述目标时间戳且最接近所述目标时间戳;或,
    基于所述当前有效缓存区中不存在目标媒体帧,且所述目标时间戳与最大时间戳之间的差值大于超时阈值,发送拉取失败信息,所述目标媒体帧的时间戳大于或等于所述目标时间戳且最接近所述目标时间戳。
  33. 一种电子设备,包括:
    一个或多个处理器;
    用于存储所述一个或多个处理器可执行指令的一个或多个存储器;
    其中,所述一个或多个处理器被配置为执行所述指令,以实现如下步骤:
    响应于对媒体流的帧获取指令,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息;
    确定所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置;
    向服务器发送携带有所述目标地址信息和起始位置的帧获取请求,所述帧获取请求用于指示所述服务器以所述目标码率返回所述媒体流中从所述起始位置开始的媒体帧。
  34. 一种电子设备,包括:
    一个或多个处理器;
    用于存储所述一个或多个处理器可执行指令的一个或多个存储器;
    其中,所述一个或多个处理器被配置为执行所述指令,以实现如下步骤:
    接收帧获取请求,所述帧获取请求携带有目标码率的媒体流的目标地址信息和所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置;
    响应于所述帧获取请求,从所述目标地址信息对应的地址,获取从所述起始位置开始的媒体帧;
    以所述目标码率,向终端传输从所述起始位置开始的媒体帧。
  35. 一种存储介质,当所述存储介质中的指令由电子设备的处理器执行时,使得所述电子设备能够执行如下步骤:
    响应于对媒体流的帧获取指令,从多种码率的所述媒体流的地址信息中,确定目标码率的所述媒体流的目标地址信息;
    确定所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置;
    向服务器发送携带有所述目标地址信息和起始位置的帧获取请求,所述帧获取请求用于指示所述服务器以所述目标码率返回所述媒体流中从所述起始位置开始的媒体帧。
  36. 一种存储介质,当所述存储介质中的指令由电子设备的处理器执行时,使得所述电子设备能够执行如下步骤:
    接收帧获取请求,所述帧获取请求携带有目标码率的媒体流的目标地址信息和所述目标码率对应的待获取媒体帧在所述媒体流中的起始位置;
    响应于所述帧获取请求,从所述目标地址信息对应的地址,获取从所述起始位置开始的媒体帧;
    以所述目标码率,向终端传输从所述起始位置开始的媒体帧。
PCT/CN2020/138855 2020-01-17 2020-12-24 媒体流传输方法及系统 WO2021143479A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20913499.8A EP3968647A4 (en) 2020-01-17 2020-12-24 METHOD AND SYSTEM FOR TRANSMISSION OF A MEDIA STREAM
US17/542,841 US20220095002A1 (en) 2020-01-17 2021-12-06 Method for transmitting media stream, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010054830.8A CN113141514B (zh) 2020-01-17 2020-01-17 媒体流传输方法、系统、装置、设备及存储介质
CN202010054830.8 2020-01-17

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/542,841 Continuation US20220095002A1 (en) 2020-01-17 2021-12-06 Method for transmitting media stream, and electronic device

Publications (1)

Publication Number Publication Date
WO2021143479A1 true WO2021143479A1 (zh) 2021-07-22

Family

ID=76809532

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/138855 WO2021143479A1 (zh) 2020-01-17 2020-12-24 媒体流传输方法及系统

Country Status (4)

Country Link
US (1) US20220095002A1 (zh)
EP (1) EP3968647A4 (zh)
CN (1) CN113141514B (zh)
WO (1) WO2021143479A1 (zh)

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN115174569A (zh) * 2022-06-27 2022-10-11 普联技术有限公司 一种视频流传输的控制方法、装置、服务器及存储介质
CN115361543A (zh) * 2022-10-21 2022-11-18 武汉光谷信息技术股份有限公司 一种基于arm架构的异构数据融合与推流方法、系统

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
US11451606B2 (en) * 2020-12-16 2022-09-20 Grass Valley Limited System and method for moving media content over a network
CN113905257A (zh) * 2021-09-29 2022-01-07 北京字节跳动网络技术有限公司 视频码率切换方法、装置、电子设备及存储介质
CN115086300B (zh) * 2022-06-16 2023-09-08 乐视云网络技术(北京)有限公司 一种视频文件调度方法和装置
CN116684515B (zh) * 2022-09-27 2024-04-12 荣耀终端有限公司 流媒体视频的seek处理方法、电子设备及存储介质
CN117560525B (zh) * 2024-01-11 2024-04-19 腾讯科技(深圳)有限公司 一种数据处理方法、装置、设备以及可读存储介质

Citations (6)

Publication number Priority date Publication date Assignee Title
US20080145025A1 (en) * 2006-12-13 2008-06-19 General Instrument Corporation Method and System for Selecting Media Content
CN101771492A (zh) * 2008-12-29 2010-07-07 华为技术有限公司 调整流媒体码率的方法和装置
CN102333083A (zh) * 2011-08-24 2012-01-25 中兴通讯股份有限公司 一种传输数据的方法和系统
CN103369355A (zh) * 2012-04-10 2013-10-23 华为技术有限公司 一种在线媒体数据转换的方法、播放视频方法及相应装置
CN103974147A (zh) * 2014-03-07 2014-08-06 北京邮电大学 一种基于mpeg-dash协议的带有码率切换控制和静态摘要技术的在线视频播控系统
CN108184152A (zh) * 2018-01-03 2018-06-19 湖北大学 一种dash传输系统两阶段客户端码率选择方法

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN102957672A (zh) * 2011-08-25 2013-03-06 中国电信股份有限公司 自适应播放flv媒体流的方法、客户端和系统
US10666961B2 (en) * 2016-01-08 2020-05-26 Qualcomm Incorporated Determining media delivery event locations for media transport
JP2017157903A (ja) * 2016-02-29 2017-09-07 富士ゼロックス株式会社 情報処理装置
CN110545491B (zh) * 2018-05-29 2021-08-10 北京字节跳动网络技术有限公司 一种媒体文件的网络播放方法、装置及存储介质
CN109040801B (zh) * 2018-07-19 2019-07-09 北京达佳互联信息技术有限公司 媒体码率自适应方法、装置、计算机设备及存储介质
CN109218759A (zh) * 2018-09-27 2019-01-15 广州酷狗计算机科技有限公司 推送媒体流的方法、装置、服务器及存储介质
CN110636346B (zh) * 2019-09-19 2021-08-03 北京达佳互联信息技术有限公司 一种码率自适应切换方法、装置、电子设备及存储介质

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
US20080145025A1 (en) * 2006-12-13 2008-06-19 General Instrument Corporation Method and System for Selecting Media Content
CN101771492A (zh) * 2008-12-29 2010-07-07 华为技术有限公司 调整流媒体码率的方法和装置
CN102333083A (zh) * 2011-08-24 2012-01-25 中兴通讯股份有限公司 一种传输数据的方法和系统
CN103369355A (zh) * 2012-04-10 2013-10-23 华为技术有限公司 一种在线媒体数据转换的方法、播放视频方法及相应装置
CN103974147A (zh) * 2014-03-07 2014-08-06 北京邮电大学 一种基于mpeg-dash协议的带有码率切换控制和静态摘要技术的在线视频播控系统
CN108184152A (zh) * 2018-01-03 2018-06-19 湖北大学 一种dash传输系统两阶段客户端码率选择方法

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN115174569A (zh) * 2022-06-27 2022-10-11 普联技术有限公司 一种视频流传输的控制方法、装置、服务器及存储介质
CN115174569B (zh) * 2022-06-27 2024-03-19 普联技术有限公司 一种视频流传输的控制方法、装置、服务器及存储介质
CN115361543A (zh) * 2022-10-21 2022-11-18 武汉光谷信息技术股份有限公司 一种基于arm架构的异构数据融合与推流方法、系统

Also Published As

Publication number Publication date
CN113141514B (zh) 2022-07-22
EP3968647A1 (en) 2022-03-16
EP3968647A4 (en) 2022-11-23
US20220095002A1 (en) 2022-03-24
CN113141514A (zh) 2021-07-20

Similar Documents

Publication Publication Date Title
WO2021143479A1 (zh) 媒体流传输方法及系统
US11366632B2 (en) User interface for screencast applications
CN113141524B (zh) 资源传输方法、装置、终端及存储介质
US10423320B2 (en) Graphical user interface for navigating a video
WO2021143386A1 (zh) 资源传输方法及终端
US20190149885A1 (en) Thumbnail preview after a seek request within a video
CN111866433B (zh) 视频源切换方法、播放方法、装置、设备和存储介质
US9756373B2 (en) Content streaming and broadcasting
WO2017075956A1 (zh) 内容投射方法及移动终端
KR101593780B1 (ko) 상이한 디바이스들에 걸친 콘텐츠의 끊김 없는 네비게이션을 위한 방법 및 시스템
WO2021143360A1 (zh) 资源传输方法及计算机设备
US9137497B2 (en) Method and system for video stream personalization
US20220095020A1 (en) Method for switching a bit rate, and electronic device
US10237195B1 (en) IP video playback
US20220174356A1 (en) Method for determining bandwidth, terminal, and storage medium
US20150032900A1 (en) System for seamlessly switching between a cloud-rendered application and a full-screen video sourced from a content server
CN116264619A (zh) 资源处理方法、装置、服务器、终端、系统及存储介质
CN113794936B (zh) 一种精彩瞬间生成方法、装置、系统、设备和介质
CN113794836B (zh) 一种子弹时间视频生成方法、装置、系统、设备和介质
US20240073415A1 (en) Encoding Method, Electronic Device, Communication System, Storage Medium, and Program Product
CN115604496A (zh) 一种显示设备、直播切台方法及存储介质
CN115834966A (zh) 一种视频播放方法、装置、设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20913499

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020913499

Country of ref document: EP

Effective date: 20211207

NENP Non-entry into the national phase

Ref country code: DE