WO2021143360A1 - Resource transmission method and computer device - Google Patents

Resource transmission method and computer device

Info

Publication number
WO2021143360A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
audio
timestamp
buffer area
target
Prior art date
Application number
PCT/CN2020/131552
Other languages
English (en)
French (fr)
Inventor
周超
Original Assignee
北京达佳互联信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司
Priority to EP20913803.1A (EP3941070A4)
Publication of WO2021143360A1
Priority to US17/517,973 (US20220060532A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04N21/238 Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
    • H04L65/80 Responding to QoS
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • H04L65/612 Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio for unicast
    • H04L65/613 Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio for the control of the source by the destination
    • H04N21/23439 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements for generating different versions
    • H04N21/2387 Stream processing in response to a playback request from an end-user, e.g. for trick-play
    • H04N21/2393 Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests involving handling client requests
    • H04N21/26258 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists for generating a list of items to be played back in a given order, e.g. playlist, or scheduling item distribution according to such list
    • H04N21/44209 Monitoring of downstream path of the transmission network originating from a server, e.g. bandwidth variations of a wireless network
    • H04N21/47217 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H04N21/6587 Control parameters, e.g. trick play commands, viewpoint selection

Definitions

  • the present disclosure relates to the field of communication technology, and in particular to a resource transmission method and computer equipment.
  • Fragment-based media transmission methods include the common DASH (Dynamic Adaptive Streaming over HTTP, an HTTP-based adaptive streaming media transmission standard formulated by MPEG, the Moving Picture Experts Group) and HLS (HTTP Live Streaming, an HTTP-based adaptive streaming media transmission standard developed by Apple).
  • The server divides an audio and video resource into a series of audio and video fragments, and each fragment can be transcoded into different bit rates.
  • When playing the audio and video resource, the terminal accesses the URLs of the individual audio and video fragments into which the resource is divided.
  • Different audio and video fragments can correspond to the same or different bit rates, so the terminal can conveniently switch between versions of the resource with different bit rates. This process is also called adaptive adjustment of the bit rate based on the terminal's own bandwidth.
  • the present disclosure provides a resource transmission method and computer equipment.
  • the technical solutions of the present disclosure are as follows:
  • a resource transmission method, including: in response to a frame acquisition request of a multimedia resource, acquiring a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of the media frames of the multimedia resource, and the pull position parameter is used to indicate the initial pull position of the media frames of the multimedia resource;
  • determining the start frame of the multimedia resource based on the pull position parameter of the multimedia resource; and
  • sending the media frames of the multimedia resource starting from the start frame, where the timestamp of each media frame is greater than or equal to the timestamp of the start frame.
  • a resource transmission device, including: an acquisition unit configured to acquire, in response to a frame acquisition request of a multimedia resource, a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of the media frames of the multimedia resource, and the pull position parameter is used to indicate the initial pull position of the media frames of the multimedia resource;
  • a first determining unit configured to determine the start frame of the multimedia resource based on the pull position parameter of the multimedia resource; and
  • a sending unit configured to send the media frames of the multimedia resource starting from the start frame, where the timestamp of each media frame is greater than or equal to the timestamp of the start frame.
  • a computer device, including: one or more processors; and one or more memories for storing instructions executable by the one or more processors; where the one or more processors are configured to perform the following operations: in response to a frame acquisition request of a multimedia resource, acquiring a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of the media frames of the multimedia resource, and the pull position parameter is used to indicate the initial pull position of the media frames of the multimedia resource; determining the start frame of the multimedia resource based on the pull position parameter of the multimedia resource; and sending the media frames of the multimedia resource starting from the start frame, where the timestamp of each media frame is greater than or equal to the timestamp of the start frame.
  • a storage medium, where, when at least one instruction in the storage medium is executed by one or more processors of a computer device, the computer device can perform the following operations: in response to a frame acquisition request of a multimedia resource, acquiring a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of the media frames of the multimedia resource, and the pull position parameter is used to indicate the initial pull position of the media frames of the multimedia resource; determining the start frame of the multimedia resource based on the pull position parameter of the multimedia resource; and sending the media frames of the multimedia resource starting from the start frame, where the timestamp of each media frame is greater than or equal to the timestamp of the start frame.
  • a computer program product, including one or more instructions that can be executed by one or more processors of a computer device, so that the computer device can execute the resource transmission method of the foregoing aspect.
  • Fig. 1 is a schematic diagram showing an implementation environment of a resource transmission method according to an embodiment
  • FIG. 2 is a schematic diagram of a FAS framework provided by an embodiment of the present disclosure
  • Fig. 3 is a flow chart showing a method for resource transmission according to an embodiment
  • Fig. 4 is an interaction flowchart of a resource transmission method according to an embodiment
  • FIG. 5 is a schematic diagram of a principle for determining a target timestamp provided by an embodiment of the present disclosure
  • Fig. 6 is a block diagram showing a logical structure of a resource transmission device according to an embodiment
  • Fig. 7 is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
  • the user information involved in the present disclosure may be information authorized by the user or fully authorized by all parties.
  • FLV is a streaming media format
  • The FLV streaming media format is a video format that developed with the introduction of Flash MX (an animation production software). Because FLV files are extremely small and load quickly, the format makes it possible to watch video files online. Its emergence effectively solved the problem that the SWF file (a file format specific to Flash) exported after a video file is imported into Flash is too large to be used well on the Internet.
  • Streaming media uses a streaming transmission method: a series of multimedia resources is compressed and sent as resource packages over the network, so that the multimedia resources are transmitted in real time on the Internet for viewing.
  • This technology sends the resource packages like flowing water. Without it, the entire media file must be downloaded before use, so the multimedia resources can only be watched offline.
  • Streaming can transmit on-site multimedia resources or multimedia resources pre-stored on the server. When viewer users are watching these multimedia resources, the multimedia resources can be played by specific playback software after being delivered to the viewer terminal of the viewer user.
  • FAS (FLV Adaptive Streaming): an FLV-based adaptive streaming media transmission standard.
  • FAS is a streaming resource transmission standard (also called a resource transmission protocol) proposed in this disclosure. Unlike the traditional fragment-based media transmission method, the FAS standard can achieve frame-level multimedia resource transmission. The server does not need to wait for a complete video fragment to arrive before sending a resource package to the terminal. Instead, after parsing the terminal's frame acquisition request, the server determines the pull position parameter, determines the start frame of the multimedia resource according to the pull position parameter, and sends the media frames to the terminal frame by frame starting from the start frame.
  • Each frame acquisition request can correspond to a certain code rate. When the terminal's own network bandwidth changes, the terminal can adaptively adjust the code rate and re-send a frame acquisition request corresponding to the adjusted code rate.
  • The FAS standard can realize frame-level transmission and reduce end-to-end delay. A new frame acquisition request needs to be sent only when the code rate is switched, which greatly reduces the number of requests and the communication overhead of the resource transmission process.
  • Live broadcast: multimedia resources are recorded in real time.
  • The host user "pushes" the media stream (push based on the streaming transmission method) to the server through the host terminal.
  • The media stream is then "pulled" from the server (pull based on the streaming transmission method) to the audience terminal, which decodes and plays the multimedia resources, thereby achieving real-time video playback.
  • On-demand: also known as Video On Demand (VOD).
  • Multimedia resources are pre-stored on the server, and the server can provide the multimedia resources specified by the audience user according to the requirements of the audience user.
  • The audience terminal sends an on-demand request to the server. After finding the multimedia resource specified by the on-demand request, the server sends it to the audience terminal, so the audience user can selectively play a specific multimedia resource.
  • The playback progress of on-demand content can be controlled arbitrarily, whereas that of live broadcast content cannot: the progress of live broadcast content depends on the real-time live broadcast progress of the host user.
  • Fig. 1 is a schematic diagram showing an implementation environment of a resource transmission method according to an embodiment.
  • the implementation environment may include at least one terminal 101 and a server 102, where the server 102 is also a kind of computer equipment.
  • the terminal 101 is used for multimedia resource transmission, and each terminal may be equipped with a media codec component and a media playback component.
  • The media codec component decodes a multimedia resource after receiving it (for example, resource packages transmitted in fragments or media frames transmitted at frame level), and the media playback component plays the multimedia resource after it is decoded.
  • The terminal 101 can be divided into a host terminal and a viewer terminal: the host terminal corresponds to the host user, and the viewer terminal corresponds to the viewer user.
  • The same terminal can be either a host terminal or a viewer terminal. For example, the terminal is the host terminal when the user is recording a live broadcast, and the viewer terminal when the user is watching a live broadcast.
  • the terminal 101 and the server 102 may be connected through a wired network or a wireless network.
  • the server 102 is used to provide multimedia resources to be transmitted, and the server 102 may include at least one of a server, multiple servers, a cloud computing platform, or a virtualization center.
  • The server 102 can take on the main calculation work while the terminal 101 takes on the secondary calculation work; or the server 102 can take on the secondary calculation work while the terminal 101 takes on the main calculation work; or the terminal 101 and the server 102 can perform collaborative computing using a distributed computing architecture.
  • The server 102 may be a clustered CDN (Content Delivery Network) server.
  • The CDN server includes a central platform and edge servers deployed in various places. Through functional modules of the central platform such as load balancing, content distribution, and scheduling, the terminal where the user is located can obtain the required content (i.e., multimedia resources) nearby from a local edge server.
  • The CDN server adds a caching mechanism between the terminal and the central platform. The caching mechanism consists of edge servers (such as WEB servers) deployed in different geographic locations.
  • Based on the distance between the terminal and the edge servers, the central platform dispatches the edge server closest to the terminal to serve the terminal, which distributes content to the terminal more effectively.
  • the multimedia resources involved in the embodiments of the present disclosure include, but are not limited to: at least one of video resources, audio resources, image resources, or text resources.
  • the embodiments of the present disclosure do not specifically limit the types of multimedia resources.
  • the multimedia resource is a live video stream of a network host, or a historical on-demand video pre-stored on the server, or a live audio stream of a radio host, or a historical on-demand audio pre-stored on the server.
  • The device types of the terminal 101 include, but are not limited to, at least one of: a TV, a smartphone, a smart speaker, a vehicle-mounted terminal, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer.
  • The following description takes a terminal 101 that includes a smartphone as an example.
  • the number of the foregoing terminal 101 may be only one, or the number of the terminal 101 may be tens or hundreds, or more.
  • the embodiments of the present disclosure do not limit the number of terminals 101 and device types.
  • FIG. 2 is a schematic diagram of a FAS framework provided by an embodiment of the present disclosure. Referring to FIG. 2, an embodiment of the present disclosure provides a FAS (streaming-based multi-rate adaptive) framework in which multimedia resources are transmitted between the terminal 101 and the server 102 through the FAS protocol.
  • An application (also known as a FAS client) can be installed on the terminal, and the application is used to browse multimedia resources.
  • The application can be a short video application, a live broadcast application, a video-on-demand application, a social application, a shopping application, etc. The embodiments of the present disclosure do not specifically limit the type of application.
  • The user can start the application on the terminal, and the application displays a resource push interface (such as the home page or a function interface of the application).
  • The resource push interface includes abbreviated information of at least one multimedia resource. The abbreviated information includes at least one of a title, an introduction, release information, a poster, a trailer, or a highlight clip.
  • In response to a user operation on any piece of abbreviated information, the terminal can jump from the resource push interface to a resource playing interface, which includes a playback option for the multimedia resource.
  • The terminal downloads the media presentation description (MPD) file of the multimedia resource from the server, and determines the address information of the multimedia resource based on the MPD.
  • The server may form multimedia resources with multiple code rates. The server can allocate different address information to the multimedia resources with different code rates and record the address information of the multimedia resources at each code rate in the MPD.
  • After the terminal downloads the MPD, it can send frame acquisition requests carrying different address information to the server at different times, and the server returns the media frames of the corresponding multimedia resources at the corresponding code rates.
  • When it starts playing, the terminal determines the target code rate of the multimedia resource to be requested this time, looks up in the MPD the target address information of the multimedia resource with the target code rate, and sends the server a frame acquisition request carrying the target address information. In this way, the frame acquisition request specifies the target code rate of the multimedia resource that the terminal wants to request this time, and the server returns the media frames of the multimedia resource at the target code rate.
  • When the current network bandwidth of the terminal fluctuates, the terminal can, based on an adaptive strategy, determine a to-be-switched code rate that matches the current network bandwidth, and look up in the MPD the to-be-switched address information of the multimedia resource with that code rate. The terminal can disconnect the media stream transmission link of the current code rate and send the server a frame acquisition request carrying the to-be-switched address information. The server then returns the media frames of the multimedia resource at the to-be-switched code rate, and a media stream transmission link based on the to-be-switched code rate is established.
  • Alternatively, the terminal may keep the media stream transmission link of the current code rate, directly initiate a new frame acquisition request carrying the to-be-switched address information, and establish a media stream transmission link based on the to-be-switched code rate (for transmitting the new media stream). The original media stream then serves as a backup stream: once the transmission of the new media stream becomes abnormal, playback can continue from the backup stream.
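  • The MPD lookup and code rate switch described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the MPD is modelled as a plain mapping from code rate to address information, the URLs are placeholders, and the adaptive strategy (pick the highest code rate that fits the measured bandwidth) is an assumption, since the text does not specify one.

```python
from typing import Dict

# Hypothetical MPD content: code rate (kbps) -> address information (URL).
# Both the rates and the URLs are illustrative placeholders.
MPD: Dict[int, str] = {
    500: "https://example.com/live/stream_500.flv",
    1000: "https://example.com/live/stream_1000.flv",
    2000: "https://example.com/live/stream_2000.flv",
}


def select_code_rate(mpd: Dict[int, str], bandwidth_kbps: float) -> int:
    """Assumed strategy: highest code rate not above the bandwidth, else the lowest."""
    affordable = [rate for rate in mpd if rate <= bandwidth_kbps]
    return max(affordable) if affordable else min(mpd)


def address_for_bandwidth(mpd: Dict[int, str], bandwidth_kbps: float) -> str:
    """Look up the address information to carry in the next frame acquisition request."""
    return mpd[select_code_rate(mpd, bandwidth_kbps)]
```

  • With this sketch, a terminal whose bandwidth drops from 2500 kbps to 1500 kbps would move from the 2000 kbps address to the 1000 kbps address and send one new frame acquisition request for it.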
  • Fig. 3 is a flow chart showing a method for resource transmission according to an embodiment.
  • the method for resource transmission is applied to a computer device.
  • the computer device is an example of a server in the FAS framework involved in the foregoing implementation environment.
  • the server obtains a pull position parameter of the multimedia resource in response to a frame acquisition request of the multimedia resource.
  • the frame acquisition request is used to request the transmission of a media frame of the multimedia resource
  • the pull position parameter is used to indicate the initial pull position of the media frame of the multimedia resource.
  • the server determines the start frame of the multimedia resource based on the pull position parameter of the multimedia resource.
  • the server sends the media frame of the multimedia resource from the start frame, wherein the time stamp of the media frame is greater than or equal to the time stamp of the start frame.
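  • The sending step can be illustrated with a short sketch. The frame representation (a timestamp in milliseconds paired with a payload) and the function name are assumptions made for illustration; the only rule taken from the text is that every frame sent has a timestamp greater than or equal to that of the start frame.

```python
from typing import Iterable, Iterator, Tuple

# A media frame is modelled as (timestamp_ms, payload); this shape is an
# assumption for illustration only.
Frame = Tuple[int, bytes]


def frames_from_start(buffered: Iterable[Frame], start_ts: int) -> Iterator[Frame]:
    """Yield buffered frames whose timestamp is >= the start frame's timestamp."""
    for frame in buffered:
        if frame[0] >= start_ts:
            yield frame
```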
  • determining the start frame of the multimedia resource includes:
  • the start frame of the multimedia resource is determined.
  • determining the target timestamp includes:
  • the target timestamp is the maximum timestamp minus the absolute value of the default value of the pull position parameter
  • the target timestamp is the maximum audio timestamp minus the absolute value of the default value of the pull position parameter
  • the target timestamp is the maximum timestamp
  • the target timestamp is the maximum audio timestamp
  • the target timestamp is the maximum timestamp minus the absolute value of the pull position parameter
  • the target timestamp is the maximum audio timestamp minus the absolute value of the pull position parameter
  • the target time stamp is the maximum time stamp
  • the target time stamp is the maximum audio time stamp
  • when the pull position parameter is greater than 0 and no timestamp rollback occurs in the buffer area, the target timestamp is determined to be the pull position parameter.
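  • One possible reading of the target timestamp rules is sketched below. The extracted list above gives the outcomes but omits most of the conditions, so the mapping used here is an assumption: a missing parameter uses the default value, 0 means the newest frame, a negative value means an offset back from the newest frame, and a positive value is used directly unless a timestamp rollback occurred. The default of -2000 ms is hypothetical. The caller passes the maximum audio timestamp as `max_timestamp` in the cases where the text refers to the maximum audio timestamp.

```python
from typing import Optional

DEFAULT_PULL_POSITION = -2000  # hypothetical default value, in milliseconds


def target_timestamp(pull_position: Optional[int],
                     max_timestamp: int,
                     rollback_in_buffer: bool,
                     default: int = DEFAULT_PULL_POSITION) -> int:
    """Assumed mapping from the pull position parameter to the target timestamp."""
    if pull_position is None:
        # Assumed condition: the parameter takes its default value.
        return max_timestamp - abs(default)
    if pull_position == 0:
        # Assumed condition: start from the newest frame.
        return max_timestamp
    if pull_position < 0:
        # Assumed condition: offset back from the newest frame.
        return max_timestamp - abs(pull_position)
    if rollback_in_buffer:
        # Positive value but timestamps rolled back: fall back to the newest frame.
        return max_timestamp
    # Stated in the text: positive value and no rollback.
    return pull_position
```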
  • the determining the start frame of the multimedia resource based on the target time stamp includes:
  • a waiting state is entered until the target media frame is written into the current valid buffer area, whereupon the start frame is determined to be the target media frame.
  • the method further includes:
  • the pull failure information is sent.
  • the method further includes:
  • when the buffer area does not include video resources and the timestamps of the audio frames in the audio frame sequence are non-monotonically increasing, the media frame sequence is determined to be non-monotonically increasing, where the audio frame sequence is a sequence of multiple audio frames buffered in the buffer area.
  • the method further includes:
  • the media frames included in the last monotonically increasing stage are determined as resources in the current valid buffer area.
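  • The rollback check and the selection of the current valid buffer area can be sketched as follows. The sketch works on a list of timestamps only and treats a rollback as any timestamp smaller than its predecessor; that definition and the function names are assumptions, since the text does not say whether equal timestamps count as a rollback.

```python
from typing import List


def has_rollback(timestamps: List[int]) -> bool:
    """True if any timestamp is smaller than the one before it (assumed definition)."""
    return any(b < a for a, b in zip(timestamps, timestamps[1:]))


def last_increasing_stage(timestamps: List[int]) -> List[int]:
    """Return the trailing run of timestamps after the most recent rollback."""
    start = 0
    for i in range(1, len(timestamps)):
        if timestamps[i] < timestamps[i - 1]:
            start = i  # a rollback begins a new stage at index i
    return timestamps[start:]
```

  • For a buffer whose timestamps restart after a stream interruption, only the frames written after the restart would be treated as the current valid buffer area.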
  • determining the start frame of the multimedia resource includes:
  • the start frame is the media frame whose timestamp is closest to the target timestamp in the current valid buffer area.
  • when the current valid buffer area includes a video resource, the maximum timestamp is the maximum video timestamp; when the current valid buffer area does not include a video resource, the maximum timestamp is the maximum audio timestamp.
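  • A minimal sketch of these two rules follows. The `MediaFrame` type and both function names are hypothetical. Ties between two equally close frames are resolved in favor of the earlier frame in the buffer, which is a choice made here and not stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaFrame:
    timestamp: int          # milliseconds
    is_video: bool = True   # False for an audio frame


def max_timestamp(valid_buffer: List[MediaFrame]) -> Optional[int]:
    """Maximum video timestamp if video is present, otherwise maximum audio timestamp."""
    video = [f.timestamp for f in valid_buffer if f.is_video]
    candidates = video or [f.timestamp for f in valid_buffer]
    return max(candidates) if candidates else None


def closest_frame(valid_buffer: List[MediaFrame], target_ts: int) -> Optional[MediaFrame]:
    """Frame in the valid buffer whose timestamp is closest to the target timestamp."""
    if not valid_buffer:
        return None
    return min(valid_buffer, key=lambda f: abs(f.timestamp - target_ts))
```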
  • acquiring the pull position parameter of the multimedia resource includes:
  • the frame acquisition request is parsed to obtain the pull position parameter
  • the default pull position parameter is configured, and the pull position parameter is configured as a default value.
  • sending the media frame of the multimedia resource from the start frame includes:
  • the address information of the multimedia resource is obtained by parsing
  • the media frame of the multimedia resource indicated by the address information is sent from the start frame.
  • Fig. 4 is an interaction flowchart of a resource transmission method according to an embodiment.
  • the resource transmission method can be applied to the FAS framework involved in the foregoing implementation environment.
  • the embodiment includes the following content.
  • the terminal sends a frame acquisition request of the multimedia resource to the server, and the frame acquisition request is used to request the transmission of the media frame of the multimedia resource.
  • an application program may be installed on the terminal, and the application program is used to browse multimedia resources.
  • the application program may include at least one of a short video application, a live broadcast application, a video-on-demand application, a social application, or a shopping application. The example does not specifically limit the type of application.
  • the multimedia resources involved in the embodiments of the present disclosure include, but are not limited to: at least one of video resources, audio resources, image resources, or text resources.
  • the embodiments of the present disclosure do not specifically limit the types of multimedia resources.
  • the multimedia resource is a live video stream of a network host, or a historical on-demand video pre-stored on the server, or a live audio stream of a radio host, or a historical on-demand audio pre-stored on the server.
  • the user can start an application on the terminal, and the application displays a resource push interface.
  • the resource push interface may be the homepage or function interface of the application.
  • the embodiments of the present disclosure do not specifically limit the type of the resource push interface.
  • the resource pushing interface may include abbreviated information of at least one multimedia resource, and the abbreviated information includes at least one of a title, a brief introduction, a poster, a trailer, or a highlight segment of the multimedia resource.
  • the user can click on the abbreviated information of the multimedia resource of interest.
  • the terminal can jump from the resource push interface to the resource play interface.
  • the resource play interface may include a play area and a comment area, the play area may include the play options of the multimedia resource, and the comment area may include other users' viewing comments on the multimedia resource.
  • the terminal responds to the user's touch operation on the play option by downloading the MPD of the multimedia resource from the server. The terminal then determines the target bit rate, obtains from the MPD the target address information of the multimedia resource with the target bit rate, generates a frame acquisition request (FAS request) carrying the target address information, and sends that frame acquisition request to the server.
  • the MPD file format may be JSON (JavaScript Object Notation, JS object notation) or other script formats.
  • the embodiment of the present disclosure does not specifically limit the MPD file format.
  • the MPD file may include a version number (@version) and a media description set (@adaptationSet), and may also include at least one of a service type (@type), a function option (@hideAuto), or a function option (@autoDefaultSelect) used to indicate whether to turn on the adaptive function by default when playback starts.
  • the version number may include at least one of the version number of the media description file or the version number of the resource transmission standard (FAS standard).
  • the media description set is used to represent the meta-information of the multimedia resource.
  • the media description set may include multiple media description meta-information.
  • each media description meta-information corresponds to a multimedia resource of one bit rate, and may include the length of the group of pictures (@gopDuration) and the attribute information (@representation) of the multimedia resource of the bit rate corresponding to that media description meta-information.
  • the length of a group of pictures (Group Of Pictures, GOP) refers to the distance between two key frames (intra-coded pictures, also referred to as "I frames").
  • each attribute information can include the identification information of the multimedia resource (@id, a unique identifier), the encoding method of the multimedia resource (@codec, the codec standard to be complied with), the bit rate supported by the multimedia resource (@bitrate, the number of data bits transmitted per unit of time during resource transmission), and the address information of the multimedia resource of that bit rate (@url, the URL or domain name provided by the multimedia resource of a certain bit rate).
  • each attribute information can also include at least one of the following: the quality type of the multimedia resource (@qualityType, including quality evaluation indicators such as resolution and frame rate); the hidden option of the multimedia resource (@hiden, used to indicate whether the multimedia resource of a certain bit rate is visible, that is, whether the user can manually select the multimedia resource of that bit rate); the function option used to indicate whether the multimedia resource is visible to the adaptive function (@enableAdaptive, whether the adaptive function can select the multimedia resource of a certain bit rate); or the default playback function option (@defaultSelect, whether to play the multimedia resource of a certain bit rate by default when playback starts).
  • the service type is used to specify the service type of the multimedia resource, including at least one of live broadcast or on-demand.
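  • The MPD fields described above can be illustrated with a minimal sketch. The field names (version, type, adaptationSet, gopDuration, representation, id, codec, bitrate, url) come from the text; the JSON nesting, the omission of the "@" prefix in keys, the sample values, the example.com URLs, and the helper `url_for_bitrate` are assumptions made only for illustration.

```python
import json

# Hypothetical MPD: field names follow the text; nesting and values are assumed.
MPD_JSON = """
{
  "version": "1.0",
  "type": "live",
  "adaptationSet": [
    {"gopDuration": 2000,
     "representation": {"id": 1, "codec": "avc1", "bitrate": 500,
                        "url": "https://example.com/live/stream_500.flv"}},
    {"gopDuration": 2000,
     "representation": {"id": 2, "codec": "avc1", "bitrate": 1000,
                        "url": "https://example.com/live/stream_1000.flv"}}
  ]
}
"""

def url_for_bitrate(mpd, target_bitrate):
    """Return the address information (url) of the resource with the target bit rate."""
    for meta in mpd["adaptationSet"]:
        representation = meta["representation"]
        if representation["bitrate"] == target_bitrate:
            return representation["url"]
    return None  # no media description meta-information matches this bit rate

mpd = json.loads(MPD_JSON)
print(url_for_bitrate(mpd, 1000))
```

  • The terminal would then place the returned address information into the frame acquisition request.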
  • the terminal can provide the user with a code rate selection list.
  • when the user clicks on any value in the code rate selection list, the generation of a code rate selection instruction carrying that value is triggered, and the terminal, in response to the code rate selection instruction, determines the value carried by the instruction as the target code rate.
  • the terminal can also adjust the target bit rate to the bit rate corresponding to the current network bandwidth information through the adaptive function.
  • the target bitrate with the best playback effect can be dynamically selected based on the playback status information of the terminal.
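  • The text does not specify how the adaptive function maps bandwidth to a bit rate. As one possible sketch, the hypothetical helper below picks the highest available bit rate that does not exceed the measured bandwidth, and falls back to the lowest bit rate otherwise. The function name, the units, and the rule itself are assumptions.

```python
def select_target_bitrate(available_kbps, bandwidth_kbps):
    """Pick the highest bit rate not exceeding the bandwidth (assumed heuristic)."""
    candidates = [rate for rate in available_kbps if rate <= bandwidth_kbps]
    # If every bit rate exceeds the bandwidth, fall back to the lowest one.
    return max(candidates) if candidates else min(available_kbps)

print(select_target_bitrate([500, 1000, 2000], 1500))
```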
  • the frame acquisition request may also carry at least one of audio parameters or pull position parameters, which are introduced in the following 402 and 403, respectively.
  • the aforementioned frame acquisition request may not carry audio parameters and pull position parameters.
  • both parameters are defaulted, and the server will allocate and configure default values for the two parameters, which will be described in detail in 404 below.
  • the server obtains the pull position parameter of the multimedia resource in response to the frame acquisition request of the multimedia resource, and the pull position parameter is used to indicate the initial pull position of the media frame of the multimedia resource.
  • the pull position parameter (@fasSpts) is used to indicate the specific frame from which the server sends the media stream.
  • the data type of the pull position parameter can be int64_t, or another data type. The embodiments of the present disclosure do not specifically limit the data type of the pull position parameter.
  • in the frame acquisition request, the pull position parameter can be equal to 0, greater than 0, less than 0, or default. Different values correspond to different processing logic of the server, which will be described in detail in 404 below.
  • the server may parse the frame acquisition request to obtain the pull position parameter.
  • when the terminal specifies the pull position parameter in the frame acquisition request, the server may directly parse the @fasSpts field of the frame acquisition request to obtain the pull position parameter.
  • the server configures the pull position parameter as a default value.
  • the default value here can be configured by the server according to the business scenario. For example, in the live broadcast business scenario, defaultSpts can be set to 0, and in the on-demand business scenario, defaultSpts can be set to the PTS (Presentation Time Stamp) of the historical media frame. If the PTS of the historical media frame is not recorded in the cache, defaultSpts is set to the PTS of the first media frame.
  • the server obtains the audio parameter of the multimedia resource, and the audio parameter is used to indicate whether the media frame of the multimedia resource is an audio frame.
  • the audio parameter (@onlyAudio) is used to indicate the pull mode of the media stream. If set to true, the media frames transmitted by the server to the terminal are audio frames, commonly known as "pure audio mode"; if set to false, the media frames transmitted by the server to the terminal are audio and video frames, commonly known as "non-pure audio mode".
  • the audio parameter can be true, false or default, and different values will correspond to different processing logic of the server, which will be described in detail in 404 below.
  • the server can parse the frame acquisition request to obtain the audio parameters.
  • when the terminal specifies the audio parameter in the frame acquisition request, the server can directly parse the @onlyAudio field of the frame acquisition request to obtain the audio parameter.
  • the server configures the audio parameter as the default value.
  • the terminal does not specify the audio parameter in the frame acquisition request, and the server configures the default value for it.
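  • Parsing the two parameters can be sketched as follows. The text does not state how the frame acquisition request is encoded; this sketch assumes, purely for illustration, that @fasSpts and @onlyAudio travel as URL query parameters named `fasSpts` and `onlyAudio`. A missing parameter is returned as `None` so that the server can later apply its default.

```python
from urllib.parse import urlparse, parse_qs

def parse_frame_request(url):
    """Return (fas_spts, only_audio); None means the parameter was defaulted."""
    query = parse_qs(urlparse(url).query)
    fas_spts = int(query["fasSpts"][0]) if "fasSpts" in query else None
    only_audio = None
    if "onlyAudio" in query:
        only_audio = query["onlyAudio"][0].lower() == "true"
    return fas_spts, only_audio

# Hypothetical request URL; the host and path are placeholders.
print(parse_frame_request("https://example.com/live/stream.flv?fasSpts=-2000&onlyAudio=true"))
```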
  • the server determines the target timestamp based on the audio parameter and the pull position parameter.
  • the server may refresh the current valid cache area by executing the following 404A-404B:
  • the server determines that the time stamp rollback occurs in the buffer area.
  • the server can determine that the time stamp rollback has not occurred in the buffer area.
  • the media frame sequence is a sequence composed of multiple media frames that have been buffered in the buffer area.
  • the above-mentioned timestamp rollback phenomenon means that the media frames in the buffer area are not stored in order of monotonically increasing timestamps, so the buffer area contains redundant media frames. This phenomenon usually occurs in live broadcast business scenarios. When the terminal pushes the stream to the server, network fluctuations, delays, and similar factors may cause a media frame sent earlier to arrive at the server later, so that the timestamps of the media frames in the media frame sequence in the buffer area increase non-monotonically, causing timestamp rollback. In addition, to avoid packet loss, the host terminal usually sends each media frame multiple times. This redundant transmission mechanism also causes the timestamps of the media frames in the media frame sequence in the buffer area to increase non-monotonically, causing timestamp rollback.
  • when determining whether the timestamps of the media frames in the media frame sequence increase non-monotonically, the server only needs to start from the media frame with the smallest timestamp and traverse the media frame sequence in its storage order in the buffer area, checking whether any media frame has a timestamp greater than that of the next media frame. If so, the server determines that the timestamps of the media frames in the media frame sequence increase non-monotonically and that timestamp rollback has occurred in the buffer area.
  • the timestamps of the media frames in the media frame sequence in the buffer area are [1001,1002,1003,1004,1005...], and the timestamps of the omitted parts of the media frames are increasing.
  • the timestamps of the media frames in the media frame sequence increase monotonically, and there is no timestamp rollback phenomenon in the buffer area.
  • the timestamps of the media frames in the media frame sequence in the buffer area are [1001,1002,1003,1001,1002,1003,1004...], and the timestamps of the omitted parts of the media frames are increasing.
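  • The rollback check described above (some media frame has a timestamp greater than that of the next media frame in storage order) can be sketched as follows. The function name is hypothetical, and the sample lists reuse the visible part of the timestamps from the two examples.

```python
def has_timestamp_rollback(timestamps):
    """True if any frame's timestamp is greater than the next frame's timestamp."""
    return any(cur > nxt for cur, nxt in zip(timestamps, timestamps[1:]))

print(has_timestamp_rollback([1001, 1002, 1003, 1004, 1005]))
print(has_timestamp_rollback([1001, 1002, 1003, 1001, 1002, 1003, 1004]))
```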
  • video resources and audio resources can be discussed separately. For video resources, when judging whether the timestamps of the media frames in the media frame sequence increase non-monotonically, it is sufficient to consider only whether the timestamps of the key frames in the key frame (I frame) sequence of the video resource increase non-monotonically. For audio resources, when judging whether the timestamps of the media frames in the media frame sequence increase non-monotonically, the server can consider whether the timestamps of the audio frames in the audio frame sequence of the audio resource increase non-monotonically.
  • the media frame sequence is non-monotonously increasing, where the key frame sequence is in the buffer area.
  • the audio frame sequence is a sequence composed of multiple audio frames buffered in the buffer area.
  • the coding and decoding of the I frame does not need to refer to other image frames, and can be achieved only by using the information of this frame.
  • P frame: predictive-coded picture.
  • B frame: bidirectionally predicted picture.
  • the encoding and decoding of predictively encoded image frames all need to refer to other image frames, and the encoding and decoding cannot be completed by using only the information of this frame.
  • the P and B frames are decoded based on the I frame.
  • the timestamp rollback phenomenon may occur more than once; that is, the timestamps of the media frames in the media frame sequence can be divided into multiple monotonically increasing stages. The timestamps of the media frames within each stage increase monotonically, but the timestamps of media frames across different stages do not.
  • the server can determine the current valid buffer area within the buffer area by executing the following 404B.
  • the server determines each media frame included in the last monotonically increasing stage as a resource in the current effective buffer area.
  • the server determines the first media frame of the last monotonically increasing stage in the media frame sequence, and determines all media frames from that first media frame to the media frame with the largest timestamp (that is, the latest media frame) as the current effective buffer area, which guarantees that the timestamps of the media frames in the current effective buffer area increase monotonically.
  • the timestamps of the media frames in the media frame sequence in the buffer area are [1001,1002,1003,1001,1002,1003,1004...], and the timestamps of the omitted media frames increase.
  • timestamp rollback has occurred in the buffer area, and the first media frame of the last monotonically increasing stage is the 4th media frame, so all media frames from the 4th media frame to the latest media frame are determined as the current effective buffer area.
  • the timestamps of the media frames in the media frame sequence in the buffer area are [1001,1002,1003,1001,1002,1003,1001...]
  • the timestamps of the omitted media frames increase. Timestamp rollback has occurred in the buffer area, and the first media frame of the last monotonically increasing stage is the 7th media frame, so all media frames from the 7th media frame to the latest media frame are determined as the current effective buffer area.
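  • Locating the first media frame of the last monotonically increasing stage can be sketched as a single pass over the stored timestamps. The function name is hypothetical and the returned index is 0-based, so index 3 corresponds to the 4th media frame and index 6 to the 7th.

```python
def current_valid_start(timestamps):
    """Return the 0-based index of the first frame in the last monotonically increasing stage."""
    start = 0
    for i in range(1, len(timestamps)):
        if timestamps[i - 1] > timestamps[i]:
            start = i  # a rollback occurred here, so a new stage begins at frame i
    return start

buffered = [1001, 1002, 1003, 1001, 1002, 1003, 1004]
start = current_valid_start(buffered)
print(start, buffered[start:])
```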
  • video resources and audio resources can be discussed separately. If video resources are included in the buffer area, the server can use the I frames of the video resource as calculation points and take all media frames from the first key frame of the last monotonically increasing stage to the latest video frame as the current effective buffer area, where the timestamp of the latest video frame can be expressed as latestVideoPts. If the buffer area does not include video resources, the server can use the audio frames as calculation points and take all media frames from the first audio frame of the last monotonically increasing stage to the latest audio frame as the current effective buffer area, where the timestamp of the latest audio frame can be expressed as latestAudioPts.
  • the operation of updating the current valid buffer area can be triggered periodically or manually by a technician. It can also be performed each time a frame acquisition request is received, which is called "passive triggering". The embodiments of the present disclosure do not specifically limit the trigger condition for updating the current valid buffer area.
  • FIG. 5 is a schematic diagram of a principle for determining a target timestamp provided by an embodiment of the present disclosure. Please refer to FIG. 5, which shows that the server has different processing logics under different pull position parameters and audio parameter values. In the following, the processing logic of the server will be introduced. Since the value of the pull position parameter can be divided into four types: default value, equal to 0, less than 0, and greater than 0, the following four situations will be described separately.
  • the server determines the target timestamp by subtracting the absolute value of the default value of the pull position parameter from the maximum timestamp.
  • the maximum time stamp is the maximum video timestamp latestVideoPts; based on the video resource not included in the current valid buffer area, the maximum time stamp is the maximum audio timestamp latestAudioPts.
  • the server's processing rules are as follows:
  • the server determines the value obtained from latestVideoPts - |defaultSpts| as the target timestamp.
  • the processing rules of the server at this time are as follows: the server determines the value obtained from latestAudioPts - |defaultSpts| as the target timestamp.
  • the maximum time stamp is the maximum video timestamp latestVideoPts; based on the video resource not included in the current valid buffer area, the maximum time stamp is the maximum audio timestamp latestAudioPts.
  • the server determines the latestVideoPts as the target timestamp; based on the fact that no video resource is included in the current valid buffer area, the server determines the latestAudioPts as the target timestamp.
  • the maximum time stamp is the maximum video timestamp latestVideoPts; based on the video resource not included in the current valid buffer area, the maximum time stamp is the maximum audio timestamp latestAudioPts.
  • the above process refers to the case where the @fasSpts field in the frame acquisition request carries a value less than 0 (@fasSpts < 0).
  • the server determines latestVideoPts - |@fasSpts| as the target timestamp.
  • the above process refers to the case where the @fasSpts field in the frame acquisition request carries a value less than 0 (@fasSpts < 0).
  • the server's processing rules are as follows: the server determines latestAudioPts - |@fasSpts| as the target timestamp.
  • the maximum time stamp is the maximum video timestamp latestVideoPts; based on the video resource not included in the current valid buffer area, the maximum time stamp is the maximum audio timestamp latestAudioPts.
  • the above process refers to the case where the @fasSpts field in the frame acquisition request carries a value greater than 0 (@fasSpts>0).
  • the server determines latestVideoPts as the target timestamp; based on the current effective buffer area not including video resources, the server determines latestAudioPts as the target timestamp.
  • the above process refers to the case where the @fasSpts field in the frame acquisition request carries a value greater than 0 (@fasSpts>0).
  • the server's processing rules are as follows: the server determines the latestAudioPts as the target timestamp.
  • the pull position parameter is determined as the target time stamp.
  • the above process refers to the case where the @fasSpts field in the frame acquisition request carries a value greater than 0 (@fasSpts>0).
  • the pull position parameter is determined as the target timestamp.
  • the above process refers to the case where the @fasSpts field in the frame acquisition request carries a value greater than 0 (@fasSpts>0).
  • the server's processing rules are as follows: when the time stamp rollback does not occur in the buffer area, the server will determine @fasSpts as the target time stamp.
  • the operation of the server to determine whether a time stamp rollback occurs can be referred to above 404A, and the operation of the server to update the current valid buffer area can be referred to above 404B, which will not be repeated here.
  • the server can execute the corresponding processing logic when pulling different values of the position parameter, thereby determining the target timestamp.
  • the target timestamp is used to determine the starting frame of the multimedia resource in the following 405.
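  • The four cases of 404 can be condensed into one sketch. The function and parameter names are hypothetical; `fas_spts=None` stands for a defaulted @fasSpts, and the maximum timestamp is taken as latestAudioPts in pure audio mode or when the current valid buffer area holds no video, and as latestVideoPts otherwise, as described above.

```python
def determine_target_timestamp(fas_spts, only_audio, has_video, rollback,
                               latest_video_pts, latest_audio_pts, default_spts):
    """Sketch of the target timestamp rules for the four values of @fasSpts."""
    use_audio = only_audio or not has_video
    max_ts = latest_audio_pts if use_audio else latest_video_pts
    if fas_spts is None:              # case 1: @fasSpts defaulted
        return max_ts - abs(default_spts)
    if fas_spts == 0:                 # case 2: @fasSpts = 0
        return max_ts
    if fas_spts < 0:                  # case 3: @fasSpts < 0
        return max_ts - abs(fas_spts)
    # case 4: @fasSpts > 0; fall back to the maximum timestamp on rollback
    return max_ts if rollback else fas_spts

# Example with assumed values: pull from 2000 ms behind the latest video frame.
print(determine_target_timestamp(-2000, False, True, False, 5000, 4990, 0))
```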
  • the server determines the start frame of the multimedia resource based on the target timestamp.
  • the server may determine the starting frame in the following way:
  • Manner 1: the server determines the media frame whose time stamp is closest to the target time stamp in the current effective buffer area as the start frame.
  • the key frame (I frame) of the video resource whose time stamp is closest to the target time stamp is determined as the starting frame; based on the fact that no video resources are included in the current effective buffer area, the audio frame whose time stamp is closest to the target time stamp is determined as the starting frame.
  • the server may directly determine the audio frame whose time stamp is closest to the target time stamp as the starting frame.
  • the method for determining the starting frame includes:
  • the target timestamp is latestVideoPts - |defaultSpts|
  • the server takes the I frame whose PTS is closest to latestVideoPts - |defaultSpts| as the starting frame
  • based on the current effective buffer area including video resources, the target timestamp is latestVideoPts, and the server takes the I frame whose PTS is closest to latestVideoPts as the starting frame; based on the current effective buffer area not including video resources, the target timestamp is latestAudioPts, and the server takes the audio frame whose PTS is closest to latestAudioPts as the starting frame.
  • @fasSpts < 0, @onlyAudio default or @onlyAudio false: refer to example 1) in case 3 of 404 above. Based on the current effective buffer area including video resources, the target timestamp is latestVideoPts - |@fasSpts|
  • @fasSpts > 0, @onlyAudio default or @onlyAudio false, and timestamp rollback has occurred in the buffer area: refer to example 1) in case 4 of 404 above. Based on the current effective buffer area including video resources, the target timestamp is latestVideoPts, and the server takes the I frame whose PTS is closest to latestVideoPts (the latest I frame) as the starting frame; based on the current effective buffer area not including video resources, the target timestamp is latestAudioPts, and the server takes the audio frame whose PTS is closest to latestAudioPts (the latest audio frame) as the starting frame.
  • in the remaining cases, the server can also use Manner 1 above to determine the media frame whose timestamp is closest to the target timestamp in the current effective buffer area as the starting frame; these cases are not enumerated here.
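  • Manner 1 can be sketched as a nearest-timestamp search over the candidate frames: I frames when the current valid buffer area contains video and pure audio mode is off, audio frames otherwise. The `Frame` type, the `kind` labels, and the function names are assumptions for illustration; the sketch also assumes at least one candidate frame exists.

```python
from typing import NamedTuple

class Frame(NamedTuple):
    pts: int
    kind: str  # assumed labels: "I", "P", "B" or "audio"

def candidate_frames(frames, only_audio=False):
    """I frames if video is present and not in pure audio mode, else audio frames."""
    has_video = any(f.kind != "audio" for f in frames)
    wanted = "audio" if only_audio or not has_video else "I"
    return [f for f in frames if f.kind == wanted]

def closest_start_frame(frames, target_ts, only_audio=False):
    """Manner 1: the candidate whose PTS is closest to the target timestamp."""
    return min(candidate_frames(frames, only_audio),
               key=lambda f: abs(f.pts - target_ts))

frames = [Frame(1000, "I"), Frame(1020, "audio"), Frame(1040, "P"),
          Frame(3000, "I"), Frame(3020, "audio")]
print(closest_start_frame(frames, 2900))
```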
  • the server may also determine the starting frame in the following Manner 2:
  • Manner 2: based on the presence of the target media frame in the current effective buffer area, the server determines the target media frame as the starting frame, where the time stamp of the target media frame is greater than or equal to the target time stamp and is closest to the target time stamp.
  • the target media frame refers to the I frame in the video resource; based on the fact that the current valid buffer area does not include video resources, the target media frame refers to the audio frame.
  • the target media frame refers to an audio frame.
  • the method for determining the starting frame includes:
  • the server can start from the I frame with the smallest PTS and traverse one by one in the direction of increasing PTS until the first I frame (target media frame) with PTS ≥ @fasSpts is found.
  • the server determines the above target media frame as the start frame; based on the current effective buffer area not including video resources, the server can start from the audio frame with the smallest PTS and traverse one by one in the direction of increasing PTS until the first audio frame (target media frame) with PTS ≥ @fasSpts is found, indicating that there is a target media frame in the current effective buffer area, and the server determines that target media frame as the starting frame.
  • the target time stamp is @fasSpts
  • the server can start from the audio frame with the smallest PTS and traverse one by one in the direction of increasing PTS until the first audio frame (target media frame) with PTS ≥ @fasSpts is found, indicating that there is a target media frame in the current effective buffer area, and the server determines that target media frame as the starting frame.
  • the server determines the starting frame when the target media frame can be queried in the current valid buffer area.
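  • The traversal in Manner 2 can be sketched as a forward scan over the PTS values of the candidate frames (I frames or audio frames) in the current valid buffer area, which are already in increasing order. The function name is hypothetical, and `None` signals that no target media frame exists yet.

```python
def first_frame_at_or_after(candidate_pts, target_ts):
    """Manner 2: first candidate with PTS >= target, scanning in increasing PTS order."""
    for pts in candidate_pts:
        if pts >= target_ts:
            return pts
    return None  # no target media frame in the current valid buffer area

print(first_frame_at_or_after([1000, 3000, 5000], 2500))
```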
  • the target media frame may not be found in the current valid buffer area. This situation usually occurs in live broadcast business scenarios.
  • the frame acquisition request in which the audience terminal specifies @fasSpts arrives at the server first, while the media frame (live video frame) corresponding to @fasSpts is still in the streaming stage.
  • the server can also determine the starting frame through the following Manner 3 at this time.
  • Manner 3: based on the fact that there is no target media frame in the current valid buffer area, the server enters a waiting state until the target media frame is written into the current valid buffer area, and then determines the target media frame as the starting frame. The timestamp of the target media frame is greater than or equal to the target timestamp and is closest to the target timestamp.
  • the target media frame refers to the I frame in the video resource; based on the fact that the current valid buffer area does not include video resources, the target media frame refers to the audio frame.
  • the target media frame refers to an audio frame.
  • the method for determining the starting frame includes:
  • the server can start from the I frame with the smallest PTS and traverse one by one in the direction of increasing PTS.
  • if, after traversing all I frames, no I frame satisfying PTS ≥ @fasSpts (target media frame) is found, there is no target media frame in the current effective buffer area. The server enters the waiting state, waits for the first I frame (target media frame) with PTS ≥ @fasSpts to be written into the current effective buffer area, and determines that target media frame as the starting frame.
  • based on the fact that no video resources are included in the current effective buffer area, the server can start from the audio frame with the smallest PTS and traverse one by one in the direction of increasing PTS. If, after traversing all audio frames, no audio frame satisfying PTS ≥ @fasSpts (target media frame) is found, there is no target media frame in the current effective buffer area. The server enters the waiting state, waits for the first audio frame (target media frame) with PTS ≥ @fasSpts to be written into the current effective buffer area, and determines that target media frame as the starting frame.
  • the target time stamp is @fasSpts
  • the server can start from the audio frame with the smallest PTS and traverse one by one in the direction of increasing PTS. If, after traversing all audio frames, no audio frame satisfying PTS ≥ @fasSpts (target media frame) is found, there is no target media frame in the current effective buffer area. The server enters the waiting state, waits for the first audio frame (target media frame) with PTS ≥ @fasSpts to be written into the current valid buffer area, and then determines that target media frame as the start frame.
  • the server determines the starting frame when the target media frame cannot be found in the current valid buffer area.
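  • The waiting state in Manner 3 can be sketched in an event-driven form: a pending pull records the target timestamp and is resolved by the first candidate frame written with PTS ≥ @fasSpts. The class and method names are hypothetical; a real server might instead block on a condition variable, which the text does not specify.

```python
class PendingPull:
    """A frame acquisition request waiting for its target media frame."""

    def __init__(self, target_ts):
        self.target_ts = target_ts
        self.start_pts = None  # stays None while the request is waiting

    def on_frame_written(self, pts):
        """Call for each candidate frame written to the current valid buffer area.

        Returns True once the start frame has been determined.
        """
        if self.start_pts is None and pts >= self.target_ts:
            self.start_pts = pts
        return self.start_pts is not None

pull = PendingPull(target_ts=5000)
for pts in (4000, 4800, 5200, 6000):
    if pull.on_frame_written(pts):
        break
print(pull.start_pts)
```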
  • abnormal conditions may cause the @fasSpts carried in the frame acquisition request to be a large outlier. Processing such a request with Manner 3 above would lead to a long wait. In a big data scenario, if concurrent frame acquisition requests are abnormal, these requests all enter a blocked waiting state and occupy the processing resources of the server, which greatly degrades server performance.
  • the server may also set a timeout threshold, and use Manner 4 below to determine, based on the timeout threshold, whether to return pull failure information.
  • Manner 4 is described in detail below.
  • Manner 4: based on the fact that there is no target media frame in the current valid buffer area, and the difference between the target timestamp and the maximum timestamp is greater than the timeout threshold, the server sends pull failure information. The timestamp of the target media frame is greater than or equal to the target timestamp and is closest to the target timestamp.
  • the maximum time stamp is the maximum video timestamp latestVideoPts; based on the current valid buffer area not including video resources, the maximum timestamp is the maximum audio timestamp latestAudioPts.
  • the maximum timestamp is the maximum audio timestamp latestAudioPts.
  • the timeout threshold can be any value greater than or equal to 0.
  • the timeout threshold can be a value preset by the server, or it can be personalized by a technician based on business scenarios.
  • The embodiments of the present disclosure do not specifically limit how the timeout threshold is obtained. For example:
  • the server can start from the I frame with the smallest PTS and traverse one by one in the direction of increasing PTS.
  • The server judges whether the difference between @fasSpts and latestVideoPts is greater than timeoutPTS. If @fasSpts - latestVideoPts > timeoutPTS, the server sends a pull failure message to the terminal; otherwise, if @fasSpts - latestVideoPts ≤ timeoutPTS, the server can enter the waiting state, which corresponds to the operation performed in example K) of method 3. If the current valid buffer area does not include video resources, the server can start from the audio frame with the smallest PTS and traverse frame by frame in the direction of increasing PTS.
  • The server can determine whether the difference between @fasSpts and latestAudioPts is greater than timeoutPTS. If @fasSpts - latestAudioPts > timeoutPTS, the server sends a pull failure message to the terminal; otherwise, if @fasSpts - latestAudioPts ≤ timeoutPTS, the server can enter the waiting state, which corresponds to the operation performed in example K) of method 3.
  • the target time stamp is @fasSpts
  • The server can start from the audio frame with the smallest PTS and traverse frame by frame in the direction of increasing PTS. If, after traversing all audio frames, no audio frame satisfying PTS ≥ @fasSpts (the target media frame) is found, there is no target media frame in the current valid buffer area.
  • The server can determine whether the difference between @fasSpts and latestAudioPts is greater than timeoutPTS. If @fasSpts - latestAudioPts > timeoutPTS, the server sends a pull failure message to the terminal; otherwise, if @fasSpts - latestAudioPts ≤ timeoutPTS, the server can enter the waiting state, which corresponds to the operation performed in example L) of method 3.
  • Combining method 3 and method 4 above provides exception handling logic for the case where @fasSpts > 0 and the target media frame does not exist in the current valid buffer area. If the difference between the target timestamp and the maximum timestamp is less than or equal to the timeout threshold, the server enters the waiting state through method 3 (waiting processing mode) and determines the target media frame as the start frame when it arrives. Otherwise, if the difference is greater than the timeout threshold, the server sends pull failure information through method 4 (error processing mode): the server determines that the frame acquisition request is erroneous and directly returns the pull failure information to the terminal.
  • The pull failure information can take the form of an error code.
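The choice between waiting (method 3) and returning pull failure information (method 4) can be sketched as a single comparison. This is a minimal illustration only: the function and enum names are hypothetical, and `latest_pts` stands for whichever maximum timestamp applies (latestVideoPts or latestAudioPts).

```python
from enum import Enum


class MissingFrameAction(Enum):
    WAIT = "wait"          # method 3: block until the target media frame is written
    FAIL = "pull_failure"  # method 4: return pull failure information


def action_when_target_missing(fas_spts: int, latest_pts: int,
                               timeout_pts: int) -> MissingFrameAction:
    """Decide what to do when no buffered frame has PTS >= fas_spts.

    Hypothetical helper: the names are assumptions, the rule itself is the
    comparison of (@fasSpts - maximum timestamp) against the timeout threshold.
    """
    if timeout_pts < 0:
        raise ValueError("the timeout threshold must be greater than or equal to 0")
    if fas_spts - latest_pts > timeout_pts:
        return MissingFrameAction.FAIL
    return MissingFrameAction.WAIT
```

With a threshold of 1000, a request only 500 ahead of the newest frame waits, while a request 89000 ahead is treated as an outlier and fails immediately instead of blocking a server worker.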
  • The server determines the start frame of the multimedia resource based on the pull position parameter of the multimedia resource. Furthermore, in scenarios that require dynamic bit rate switching, it is only necessary to replace the address information (@url field) and the pull position parameter (@fasSpts field) carried in the frame acquisition request to transmit media frames at a new bit rate starting from any specified start frame.
  • the server sends the media frame of the multimedia resource to the terminal from the start frame, where the timestamp of the media frame is greater than or equal to the timestamp of the start frame.
  • the server may parse and obtain the address information of the multimedia resource based on the frame acquisition request, and start sending the media frame of the multimedia resource indicated by the address information from the start frame.
  • Since the address information carried in the frame acquisition request corresponds to the target bit rate, the server can send the media stream at the target bit rate from the start frame.
  • The server can continuously send media frames to the terminal like flowing water, which can be vividly called "media streaming".
  • the target address information can be a domain name
  • The terminal can send the frame acquisition request to the central platform of the CDN server. The central platform calls DNS (Domain Name System, which is essentially a domain name resolution library) to resolve the domain name and obtain the CNAME (alias) record corresponding to the domain name. The CNAME record is then resolved again to obtain the IP (Internet Protocol) address of the edge server closest to the terminal.
  • the central platform directs the frame acquisition request to the above-mentioned edge server, and the edge server responds to the frame acquisition request to provide the terminal with the media frame of the multimedia resource at the target bit rate.
  • the embodiments of the present disclosure provide an internal back-to-source mechanism of the CDN server.
  • When the edge server cannot provide the multimedia resource specified by the frame acquisition request, the edge server can pull the media stream back to source from an upper-level node device.
  • The edge server can send a back-to-source pull request to the upper-level node device. The upper-level node device returns the corresponding media stream to the edge server in response to the back-to-source pull request, and the edge server then sends the media stream to the terminal.
  • When the edge server builds the back-to-source pull request, if the frame acquisition request sent by the terminal carries the @fasSpts field, the edge server can use the frame acquisition request directly as the back-to-source pull request and forward it to the upper-level node device.
  • Otherwise, if the frame acquisition request omits the @fasSpts field, the edge server configures the default value defaultSpts for the @fasSpts field, embeds the @fasSpts field in the frame acquisition request, and sets the value stored in the field to defaultSpts, thereby obtaining the back-to-source pull request.
  • the upper-level node device may be a third-party origin server. In this case, the return-to-origin pull request must carry the @fasSpts field.
  • the embodiment of the present disclosure does not specifically limit the return-to-source manner of the edge server.
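As an illustration of how an edge server might derive the back-to-source pull request, the sketch below assumes that the frame acquisition request is an HTTP URL and that @fasSpts travels as a query parameter named `fasSpts`. Both are assumptions made for the example; the text does not fix the wire format, and `DEFAULT_SPTS` and the function name are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_SPTS = 0  # assumed value of defaultSpts; the real default is set by the server


def build_back_to_source_url(frame_request_url: str,
                             default_spts: int = DEFAULT_SPTS) -> str:
    """Return the URL to forward to the upper-level node device.

    If the request already carries fasSpts it is forwarded unchanged in
    substance; otherwise fasSpts=default_spts is embedded first.
    """
    parts = urlsplit(frame_request_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "fasSpts" for key, _ in query):
        query.append(("fasSpts", str(default_spts)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Always embedding the field means the upper-level node device (for example a third-party origin server) never has to guess a default of its own.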
  • the terminal receives the media frame of the multimedia resource, and plays the media frame of the multimedia resource.
  • The terminal can store the media frames in the buffer area, call the media codec component to decode the media frames to obtain decoded media frames, and call the media playback component to play the media frames in the buffer area in ascending order of PTS.
  • the terminal can determine the encoding method of the multimedia resource from the @codec field of the media description file, and determine the corresponding decoding method according to the encoding method, so as to decode the media frame according to the determined decoding method.
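Playing buffered frames in ascending PTS order can be sketched with a priority queue. This is only an illustration of the ordering requirement on the terminal side: the class name is hypothetical, and decoding is left out because it depends on the codec named in the @codec field.

```python
import heapq
from typing import List, Tuple


class PlaybackBuffer:
    """Hypothetical terminal-side buffer that releases frames in PTS order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, bytes]] = []
        self._seq = 0  # tie-breaker so frames with equal PTS keep arrival order

    def push(self, pts: int, payload: bytes) -> None:
        heapq.heappush(self._heap, (pts, self._seq, payload))
        self._seq += 1

    def pop_next(self) -> Tuple[int, bytes]:
        """Remove and return the buffered frame with the smallest PTS."""
        pts, _, payload = heapq.heappop(self._heap)
        return pts, payload

    def __len__(self) -> int:
        return len(self._heap)
```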
  • Fig. 6 is a block diagram showing a logical structure of a resource transmission device according to an embodiment.
  • the device includes an acquiring unit 601, a first determining unit 602, and a sending unit 603, which will be described below.
  • the acquiring unit 601 is configured to acquire, in response to a frame acquisition request for a multimedia resource, a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of media frames of the multimedia resource, and the pull position parameter indicates the initial pull position of the media frames of the multimedia resource;
  • the first determining unit 602 is configured to determine the start frame of the multimedia resource based on the pull position parameter of the multimedia resource;
  • the sending unit 603 is configured to send the media frame of the multimedia resource from the start frame, wherein the time stamp of the media frame is greater than or equal to the time stamp of the start frame.
  • the acquiring unit 601 is further configured to perform: acquiring an audio parameter of the multimedia resource, where the audio parameter is used to indicate whether the media frame is an audio frame;
  • the first determining unit 602 includes:
  • the first determining subunit is configured to determine the target timestamp based on the audio parameter and the pull position parameter;
  • the second determining subunit is configured to determine the start frame of the multimedia resource based on the target timestamp.
  • the first determining subunit is configured to execute:
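  • if the pull position parameter is the default value and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp minus the absolute value of the default value of the pull position parameter;
  • if the pull position parameter is the default value and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp minus the absolute value of the default value of the pull position parameter;
  • if the pull position parameter is equal to 0 and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp;
  • if the pull position parameter is equal to 0 and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp;
  • if the pull position parameter is less than 0 and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp minus the absolute value of the pull position parameter;
  • if the pull position parameter is less than 0 and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp minus the absolute value of the pull position parameter;
  • if the pull position parameter is greater than 0, the audio parameter is the default value or false, and a timestamp rollback occurs in the buffer area, determining that the target timestamp is the maximum timestamp;
  • if the pull position parameter is greater than 0, the audio parameter is true, and a timestamp rollback occurs in the buffer area, determining that the target timestamp is the maximum audio timestamp;
  • if the pull position parameter is greater than 0 and no timestamp rollback occurs in the buffer area, determining that the target timestamp is the pull position parameter.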
  • the second determining subunit is configured to execute:
  • if the target media frame exists in the current valid buffer area, determining that the start frame is the target media frame, where the timestamp of the target media frame is greater than or equal to the target timestamp and closest to the target timestamp;
  • if the target media frame does not exist in the current valid buffer area, entering the waiting state until the target media frame is written into the current valid buffer area, and then determining that the start frame is the target media frame.
  • the sending unit 603 is further configured to execute:
  • if the target media frame does not exist in the current valid buffer area and the difference between the target timestamp and the maximum timestamp is greater than the timeout threshold, sending the pull failure information.
  • the device further includes:
  • the second determining unit is configured to: if the timestamps of the media frames in the media frame sequence in the buffer area increase non-monotonically, determine that a timestamp rollback occurs in the buffer area; if the timestamps of the media frames in the media frame sequence in the buffer area increase monotonically, determine that no timestamp rollback occurs in the buffer area, where the media frame sequence is the sequence formed by the multiple media frames buffered in the buffer area.
  • the second determining unit is further configured to execute:
  • if the buffer area includes video resources and the timestamps of the key frames in the key frame sequence increase non-monotonically, determining that the media frame sequence increases non-monotonically, where the key frame sequence is the sequence formed by the multiple key frames buffered in the buffer area;
  • if the buffer area does not include video resources and the timestamps of the audio frames in the audio frame sequence increase non-monotonically, determining that the media frame sequence increases non-monotonically, where the audio frame sequence is the sequence formed by the multiple audio frames buffered in the buffer area.
  • the device further includes:
  • the third determining unit is configured to determine each media frame included in the last monotonically increasing stage as a resource in the current effective buffer area.
  • the second determining subunit is configured to execute:
  • determining that the start frame is the media frame whose timestamp is closest to the target timestamp in the current valid buffer area.
  • if the audio parameter is the default value or false: when the current valid buffer area includes video resources, the maximum timestamp is the maximum video timestamp; when the current valid buffer area does not include video resources, the maximum timestamp is the maximum audio timestamp.
  • the acquiring unit 601 is configured to execute:
  • if the frame acquisition request carries the pull position parameter, parsing the frame acquisition request to obtain the pull position parameter;
  • if the frame acquisition request omits the pull position parameter, configuring the pull position parameter as a default value.
  • the sending unit 603 is configured to execute:
  • parsing the frame acquisition request to obtain the address information of the multimedia resource;
  • sending, from the start frame, the media frames of the multimedia resource indicated by the address information.
  • FIG. 7 is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
  • the computer device may be a server in the FAS framework.
  • the computer device 700 may vary considerably depending on its configuration or performance, and may include one or more processors (Central Processing Units, CPU) 701 and one or more memories 702, wherein at least one program code is stored in the memory 702, and the at least one program code is loaded and executed by the processor 701 to implement the resource transmission method provided by each of the above embodiments.
  • the computer device 700 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output.
  • the computer device 700 may also include other components for implementing device functions, which will not be repeated here.
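  • the computer device includes one or more processors, and one or more memories for storing instructions executable by the one or more processors, wherein the one or more processors are configured to execute the instructions to implement the following operations:
  • in response to a frame acquisition request for a multimedia resource, acquiring a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of media frames of the multimedia resource, and the pull position parameter indicates the initial pull position of the media frames of the multimedia resource;
  • determining the start frame of the multimedia resource based on the pull position parameter of the multimedia resource;
  • sending the media frames of the multimedia resource from the start frame, where the timestamp of each media frame is greater than or equal to the timestamp of the start frame.
  • the one or more processors are configured to execute the instructions to implement the following operations:
  • acquiring an audio parameter of the multimedia resource, where the audio parameter indicates whether the media frames are audio frames;
  • determining the target timestamp based on the audio parameter and the pull position parameter;
  • determining the start frame of the multimedia resource based on the target timestamp.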
  • the one or more processors are configured to execute the instructions to implement the following operations:
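  • if the pull position parameter is the default value and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp minus the absolute value of the default value of the pull position parameter;
  • if the pull position parameter is the default value and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp minus the absolute value of the default value of the pull position parameter;
  • if the pull position parameter is equal to 0 and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp;
  • if the pull position parameter is equal to 0 and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp;
  • if the pull position parameter is less than 0 and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp minus the absolute value of the pull position parameter;
  • if the pull position parameter is less than 0 and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp minus the absolute value of the pull position parameter;
  • if the pull position parameter is greater than 0, the audio parameter is the default value or false, and a timestamp rollback occurs in the buffer area, determining that the target timestamp is the maximum timestamp;
  • if the pull position parameter is greater than 0, the audio parameter is true, and a timestamp rollback occurs in the buffer area, determining that the target timestamp is the maximum audio timestamp;
  • if the pull position parameter is greater than 0 and no timestamp rollback occurs in the buffer area, determining that the target timestamp is the pull position parameter.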
  • the one or more processors are configured to execute the instruction to implement the following operations:
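  • if the target media frame exists in the current valid buffer area, determining that the start frame is the target media frame, where the timestamp of the target media frame is greater than or equal to the target timestamp and closest to the target timestamp;
  • if the target media frame does not exist in the current valid buffer area, entering the waiting state until the target media frame is written into the current valid buffer area, and then determining that the start frame is the target media frame.
  • the one or more processors are further configured to execute the instructions to implement the following operations:
  • if the target media frame does not exist in the current valid buffer area and the difference between the target timestamp and the maximum timestamp is greater than the timeout threshold, sending the pull failure information.
  • the one or more processors are further configured to execute the instructions to implement the following operations:
  • if the timestamps of the media frames in the media frame sequence in the buffer area increase non-monotonically, determining that a timestamp rollback occurs in the buffer area; if they increase monotonically, determining that no timestamp rollback occurs in the buffer area, where the media frame sequence is the sequence formed by the multiple media frames buffered in the buffer area.
  • the one or more processors are further configured to execute the instructions to implement the following operations:
  • if the buffer area includes video resources and the timestamps of the key frames in the key frame sequence increase non-monotonically, determining that the media frame sequence increases non-monotonically, where the key frame sequence is the sequence formed by the multiple key frames buffered in the buffer area;
  • if the buffer area does not include video resources and the timestamps of the audio frames in the audio frame sequence increase non-monotonically, determining that the media frame sequence increases non-monotonically, where the audio frame sequence is the sequence formed by the multiple audio frames buffered in the buffer area.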
  • the one or more processors are further configured to execute the instructions to implement the following operations:
  • the media frames included in the last monotonically increasing stage are determined as resources in the current effective buffer area.
  • the one or more processors are configured to execute the instructions to implement the following operations:
  • determining that the start frame is the media frame whose timestamp is closest to the target timestamp in the current valid buffer area.
  • if the audio parameter is the default value or false: when the current valid buffer area includes video resources, the maximum timestamp is the maximum video timestamp; when the current valid buffer area does not include video resources, the maximum timestamp is the maximum audio timestamp.
  • the one or more processors are configured to execute the instructions to implement the following operations:
  • if the frame acquisition request carries the pull position parameter, parsing the frame acquisition request to obtain the pull position parameter;
  • if the frame acquisition request omits the pull position parameter, configuring the pull position parameter as a default value.
  • the one or more processors are configured to execute the instructions to implement the following operations:
  • parsing the frame acquisition request to obtain the address information of the multimedia resource;
  • sending, from the start frame, the media frames of the multimedia resource indicated by the address information.
  • A storage medium including at least one instruction is also provided, for example a memory including at least one instruction, where the at least one instruction can be executed by a processor in a computer device to complete the resource transmission method in the foregoing embodiments.
  • the aforementioned storage medium may be a non-transitory computer-readable storage medium.
  • the non-transitory computer-readable storage medium may include ROM (Read-Only Memory), RAM (Random-Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage devices, and the like.
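  • when at least one instruction in the storage medium is executed by one or more processors of the computer device, the computer device is enabled to perform the following operations:
  • in response to a frame acquisition request for a multimedia resource, acquiring a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of media frames of the multimedia resource, and the pull position parameter indicates the initial pull position of the media frames of the multimedia resource;
  • determining the start frame of the multimedia resource based on the pull position parameter of the multimedia resource;
  • sending the media frames of the multimedia resource from the start frame, where the timestamp of each media frame is greater than or equal to the timestamp of the start frame.
  • one or more processors of the computer device are used to perform the following operations:
  • acquiring an audio parameter of the multimedia resource, where the audio parameter indicates whether the media frames are audio frames;
  • determining the target timestamp based on the audio parameter and the pull position parameter;
  • determining the start frame of the multimedia resource based on the target timestamp.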
  • one or more processors of the computer device are used to perform the following operations:
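  • if the pull position parameter is the default value and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp minus the absolute value of the default value of the pull position parameter;
  • if the pull position parameter is the default value and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp minus the absolute value of the default value of the pull position parameter;
  • if the pull position parameter is equal to 0 and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp;
  • if the pull position parameter is equal to 0 and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp;
  • if the pull position parameter is less than 0 and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp minus the absolute value of the pull position parameter;
  • if the pull position parameter is less than 0 and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp minus the absolute value of the pull position parameter;
  • if the pull position parameter is greater than 0, the audio parameter is the default value or false, and a timestamp rollback occurs in the buffer area, determining that the target timestamp is the maximum timestamp;
  • if the pull position parameter is greater than 0, the audio parameter is true, and a timestamp rollback occurs in the buffer area, determining that the target timestamp is the maximum audio timestamp;
  • if the pull position parameter is greater than 0 and no timestamp rollback occurs in the buffer area, determining that the target timestamp is the pull position parameter.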
  • one or more processors of the computer device are configured to perform the following operations:
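  • if the target media frame exists in the current valid buffer area, determining that the start frame is the target media frame, where the timestamp of the target media frame is greater than or equal to the target timestamp and closest to the target timestamp;
  • if the target media frame does not exist in the current valid buffer area, entering the waiting state until the target media frame is written into the current valid buffer area, and then determining that the start frame is the target media frame.
  • one or more processors of the computer device are further configured to perform the following operations:
  • if the target media frame does not exist in the current valid buffer area and the difference between the target timestamp and the maximum timestamp is greater than the timeout threshold, sending the pull failure information.
  • one or more processors of the computer device are further configured to perform the following operations:
  • if the timestamps of the media frames in the media frame sequence in the buffer area increase non-monotonically, determining that a timestamp rollback occurs in the buffer area; if they increase monotonically, determining that no timestamp rollback occurs in the buffer area, where the media frame sequence is the sequence formed by the multiple media frames buffered in the buffer area.
  • one or more processors of the computer device are further configured to perform the following operations:
  • if the buffer area includes video resources and the timestamps of the key frames in the key frame sequence increase non-monotonically, determining that the media frame sequence increases non-monotonically, where the key frame sequence is the sequence formed by the multiple key frames buffered in the buffer area;
  • if the buffer area does not include video resources and the timestamps of the audio frames in the audio frame sequence increase non-monotonically, determining that the media frame sequence increases non-monotonically, where the audio frame sequence is the sequence formed by the multiple audio frames buffered in the buffer area.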
  • one or more processors of the computer device are further configured to perform the following operations:
  • the media frames included in the last monotonically increasing stage are determined as resources in the current effective buffer area.
  • one or more processors of the computer device are used to perform the following operations:
  • determining that the start frame is the media frame whose timestamp is closest to the target timestamp in the current valid buffer area.
  • if the audio parameter is the default value or false: when the current valid buffer area includes video resources, the maximum timestamp is the maximum video timestamp; when the current valid buffer area does not include video resources, the maximum timestamp is the maximum audio timestamp.
  • one or more processors of the computer device are used to perform the following operations:
  • if the frame acquisition request carries the pull position parameter, parsing the frame acquisition request to obtain the pull position parameter;
  • if the frame acquisition request omits the pull position parameter, configuring the pull position parameter as a default value.
  • one or more processors of the computer device are used to perform the following operations:
  • parsing the frame acquisition request to obtain the address information of the multimedia resource;
  • sending, from the start frame, the media frames of the multimedia resource indicated by the address information.
  • A computer program product is also provided, which includes one or more instructions that can be executed by a processor of a computer device to complete the resource transmission method provided in each of the foregoing embodiments.

Abstract

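The present disclosure relates to a resource transmission method and a computer device, and belongs to the field of communication technologies. In the present disclosure, in response to a frame acquisition request for a multimedia resource, a pull position parameter of the multimedia resource is acquired; a start frame of the multimedia resource is determined based on the pull position parameter; and media frames of the multimedia resource are sent starting from the start frame.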

Description

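Resource transmission method and computer device
This application claims priority to Chinese Patent Application No. 202010054760.6, filed on January 17, 2020 and entitled "Resource transmission method, apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of communication technologies, and in particular to a resource transmission method and a computer device.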
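Background
With the development of communication technologies, users can browse audio and video resources on terminals anytime and anywhere. At present, when a server transmits audio and video resources to a terminal (commonly known as the "stream pulling stage"), a segment-based media transmission method can be used.
Segment-based media transmission methods include the common DASH (Dynamic Adaptive Streaming over HTTP, an HTTP-based adaptive streaming media transmission standard formulated by MPEG, the Moving Picture Experts Group) and HLS (HTTP Live Streaming, an HTTP-based adaptive streaming media transmission standard formulated by Apple Inc.). The server splits an audio/video resource into audio/video segments, and each segment can be transcoded into different bit rates. When playing the resource, the terminal accesses the URL of each segment in turn. Different segments can correspond to the same or different bit rates, so the terminal can conveniently switch between versions of the resource at different bit rates. This process is also called adaptively adjusting the bit rate based on the terminal's own bandwidth conditions.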
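Summary
The present disclosure provides a resource transmission method and a computer device. The technical solutions of the present disclosure are as follows:
According to one aspect of the embodiments of the present disclosure, a resource transmission method is provided, including: in response to a frame acquisition request for a multimedia resource, acquiring a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of media frames of the multimedia resource, and the pull position parameter indicates the initial pull position of the media frames of the multimedia resource; determining a start frame of the multimedia resource based on the pull position parameter of the multimedia resource; and sending the media frames of the multimedia resource from the start frame, where the timestamp of each media frame is greater than or equal to the timestamp of the start frame.
According to another aspect of the embodiments of the present disclosure, a resource transmission apparatus is provided, including: an acquiring unit configured to acquire, in response to a frame acquisition request for a multimedia resource, a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of media frames of the multimedia resource, and the pull position parameter indicates the initial pull position of the media frames of the multimedia resource; a first determining unit configured to determine a start frame of the multimedia resource based on the pull position parameter of the multimedia resource; and a sending unit configured to send the media frames of the multimedia resource from the start frame, where the timestamp of each media frame is greater than or equal to the timestamp of the start frame.
According to another aspect of the embodiments of the present disclosure, a computer device is provided, including: one or more processors; and one or more memories for storing instructions executable by the one or more processors; where the one or more processors are configured to perform the following operations: in response to a frame acquisition request for a multimedia resource, acquiring a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of media frames of the multimedia resource, and the pull position parameter indicates the initial pull position of the media frames of the multimedia resource; determining a start frame of the multimedia resource based on the pull position parameter of the multimedia resource; and sending the media frames of the multimedia resource from the start frame, where the timestamp of each media frame is greater than or equal to the timestamp of the start frame.
According to another aspect of the embodiments of the present disclosure, a storage medium is provided. When at least one instruction in the storage medium is executed by one or more processors of a computer device, the computer device is enabled to perform the following operations: in response to a frame acquisition request for a multimedia resource, acquiring a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of media frames of the multimedia resource, and the pull position parameter indicates the initial pull position of the media frames of the multimedia resource; determining a start frame of the multimedia resource based on the pull position parameter of the multimedia resource; and sending the media frames of the multimedia resource from the start frame, where the timestamp of each media frame is greater than or equal to the timestamp of the start frame.
According to another aspect of the embodiments of the present disclosure, a computer program product is provided, including one or more instructions that can be executed by one or more processors of a computer device, so that the computer device can perform the resource transmission method of the above aspect.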
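Brief Description of the Drawings
Fig. 1 is a schematic diagram of an implementation environment of a resource transmission method according to an embodiment;
Fig. 2 is a schematic diagram of the principle of a FAS framework provided by an embodiment of the present disclosure;
Fig. 3 is a flowchart of a resource transmission method according to an embodiment;
Fig. 4 is an interaction flowchart of a resource transmission method according to an embodiment;
Fig. 5 is a schematic diagram of the principle of determining a target timestamp provided by an embodiment of the present disclosure;
Fig. 6 is a block diagram of the logical structure of a resource transmission apparatus according to an embodiment;
Fig. 7 is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.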
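Detailed Description
The terms "first", "second", and the like in the specification, claims, and above drawings of the present disclosure are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way can be interchanged where appropriate, so that the embodiments of the present disclosure described herein can be implemented in an order other than those illustrated or described herein.
The user information involved in the present disclosure may be information authorized by the user or fully authorized by all parties.
The terms involved in the present disclosure are explained below.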
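1. FLV (Flash Video)
FLV is a streaming media format that developed with the launch of Flash MX (an animation production software). Because the files it produces are extremely small and load extremely quickly, it makes watching video files over the network (that is, browsing videos online) possible. Its appearance effectively solved the problem that SWF (a dedicated Flash file format) files exported after importing video files into Flash are too large to be used well on the network.
2. Streaming Media
Streaming media uses a streaming transmission method. It refers to the technology and process of compressing a series of multimedia resources and sending resource packets over the network, so that multimedia resources are transmitted over the network in real time for viewing. This technology allows resource packets to be sent like flowing water; without it, the entire media file must be downloaded before use, so the multimedia resource can only be watched offline. Streaming can deliver live multimedia resources or multimedia resources pre-stored on a server. When audience users watch these multimedia resources, the resources can be played by specific playback software after they reach the audience users' audience terminals.
3. FAS (FLV Adaptive Streaming, an FLV-based adaptive streaming media transmission standard)
FAS is a streaming resource transmission standard (also called a resource transmission protocol) proposed in the present disclosure. Unlike traditional segment-based media transmission methods, the FAS standard achieves frame-level multimedia resource transmission. The server does not need to wait for a complete video segment to arrive before sending resource packets to the terminal. Instead, after parsing the terminal's frame acquisition request, the server determines the pull position parameter, determines the start frame of the multimedia resource according to the pull position parameter, and sends the media frames of the multimedia resource to the terminal frame by frame starting from the start frame. It should be noted that each frame acquisition request can correspond to a certain bit rate. When the terminal's network bandwidth changes, the terminal can adaptively adjust the bit rate and resend a frame acquisition request corresponding to the adjusted bit rate, thereby adaptively adjusting the bit rate of the multimedia resource. The FAS standard achieves frame-level transmission and reduces end-to-end delay. A new frame acquisition request needs to be sent only when the bit rate is switched, which greatly reduces the number of requests and the communication overhead of resource transmission.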
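4. Live broadcast and on-demand
Live broadcast: the multimedia resource is recorded in real time. The anchor user "pushes" the media stream (that is, pushes it by streaming transmission) to the server through the anchor terminal. After an audience user triggers entry into the anchor user's live broadcast interface on the audience terminal, the media stream is "pulled" (that is, pulled by streaming transmission) from the server to the audience terminal, which decodes and plays the multimedia resource, so that the video is played in real time.
On-demand: also called Video On Demand (VOD). The multimedia resources are pre-stored on the server, and the server can provide the multimedia resource specified by the audience user according to the audience user's request. In some embodiments, the audience terminal sends an on-demand request to the server; after finding the multimedia resource specified by the on-demand request, the server sends the multimedia resource to the audience terminal. In other words, the audience user can selectively play a specific multimedia resource.
The playback progress of on-demand content can be controlled arbitrarily, whereas this is not the case for live broadcast: the playback speed of live content depends on the real-time live broadcast progress of the anchor user.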
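Fig. 1 is a schematic diagram of an implementation environment of a resource transmission method according to an embodiment. Referring to Fig. 1, the implementation environment may include at least one terminal 101 and a server 102, where the server 102 is a computer device. Details are as follows:
In some embodiments, the terminal 101 is used for multimedia resource transmission. A media codec component and a media playback component can be installed on each terminal. The media codec component decodes multimedia resources after they are received (for example, resource packets transmitted in segments or media frames transmitted at frame level), and the media playback component plays the multimedia resources after they are decoded.
According to user identity, terminals 101 can be divided into anchor terminals and audience terminals. An anchor terminal corresponds to an anchor user, and an audience terminal corresponds to an audience user. It should be noted that the same terminal can be either an anchor terminal or an audience terminal. For example, the terminal is an anchor terminal when the user records a live broadcast, and an audience terminal when the user watches a live broadcast.
The terminal 101 and the server 102 can be connected via a wired or wireless network.
In some embodiments, the server 102 provides the multimedia resources to be transmitted. The server 102 may include at least one of a single server, multiple servers, a cloud computing platform, or a virtualization center. In some embodiments, the server 102 undertakes the primary computing work and the terminal 101 undertakes the secondary computing work; or the server 102 undertakes the secondary computing work and the terminal 101 undertakes the primary computing work; or the terminal 101 and the server 102 perform collaborative computing using a distributed computing architecture.
In some embodiments, the server 102 may be a clustered CDN (Content Delivery Network) server. The CDN server includes a central platform and edge servers deployed in various locations. Through functional modules of the central platform such as load balancing, content distribution, and scheduling, the user's terminal can rely on a local edge server to obtain the required content (that is, multimedia resources) from nearby.
The CDN server adds a caching mechanism between the terminal and the central platform. This caching mechanism consists of edge servers (such as web servers) deployed in different geographic locations. For performance optimization, the central platform schedules the edge server closest to the terminal, based on the distance between the terminal and the edge servers, to serve the terminal, so that content can be delivered to the terminal more effectively.
The multimedia resources involved in the embodiments of the present disclosure include, but are not limited to, at least one of video resources, audio resources, image resources, or text resources. The embodiments of the present disclosure do not specifically limit the type of multimedia resource. For example, the multimedia resource is a live video stream of a network anchor, a historical on-demand video pre-stored on the server, a live audio stream of a radio anchor, or historical on-demand audio pre-stored on the server.
In some embodiments, the device type of the terminal 101 includes, but is not limited to, at least one of a television, a smartphone, a smart speaker, a vehicle-mounted terminal, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer. In the following embodiments, the terminal 101 including a smartphone is taken as an example.
Those skilled in the art will appreciate that there may be only one terminal 101, or dozens, hundreds, or more. The embodiments of the present disclosure do not limit the number or device type of the terminals 101.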
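Fig. 2 is a schematic diagram of the principle of a FAS framework provided by an embodiment of the present disclosure. Referring to Fig. 2, an embodiment of the present disclosure provides a FAS (streaming-based multi-bit-rate adaptive) framework, in which the terminal 101 and the server 102 transmit multimedia resources through the FAS protocol.
Taking any terminal as an example, an application (also called a FAS client) can be installed on the terminal for browsing multimedia resources. For example, the application may be a short video application, a live broadcast application, a video-on-demand application, a social application, a shopping application, and so on. The embodiments of the present disclosure do not specifically limit the type of application.
The user can start the application on the terminal to display a resource push interface (for example, the home page or a function interface of the application). The resource push interface includes abbreviated information of at least one multimedia resource, and the abbreviated information includes at least one of a title, an introduction, a publisher, a poster, a trailer, or a highlight clip. In response to the user's touch operation on the abbreviated information of any multimedia resource, the terminal can jump from the resource push interface to a resource playback interface, which includes a playback option for the multimedia resource. In response to the user's touch operation on the playback option, the terminal downloads the media description file (Media Presentation Description, MPD) of the multimedia resource from the server, determines the address information of the multimedia resource based on the media description file, and sends a frame acquisition request (also called a FAS request) carrying the address information to the server. The server parses and responds to the frame acquisition request according to the FAS request processing specification provided by the embodiments of the present disclosure. After locating the media frames of the multimedia resource (consecutive media frames form a media stream), the server returns the media frames of the multimedia resource to the terminal (that is, returns the media stream to the terminal). After receiving the media stream, the terminal calls the media codec component to decode the media stream and calls the media playback component to play the decoded media stream.
It should be noted that after the server transcodes the multimedia resource, multimedia resources at multiple bit rates may be formed. In this case, the server can assign different address information to the multimedia resources at different bit rates and record the address information of the multimedia resources at each bit rate in the MPD. After downloading the MPD, the terminal can send frame acquisition requests carrying different address information to the server at different times, and the server then returns the media frames of the corresponding multimedia resource at different bit rates.
In the above process, different address information specifies different bit rates, and different pull position parameters specify different initial pull positions of the multimedia resource. Therefore, once the initial pull position and the bit rate are specified in the frame acquisition request (if omitted, the server configures default values), if the bit rate needs to be switched during playback, the terminal only needs to send a new frame acquisition request. The server can then send the media stream to the terminal at another bit rate starting from the start frame at any time; that is, the terminal can dynamically pull a media stream at another bit rate starting from any start frame.
In some embodiments, when starting playback, the terminal determines the target bit rate of the multimedia resource to be requested, looks up the target address information of the multimedia resource at the target bit rate in the MPD, and sends a frame acquisition request carrying the target address information to the server. The frame acquisition request thereby specifies the target bit rate of the multimedia resource that the terminal wants to request, and the server returns the media frames of the multimedia resource at the target bit rate.
In the above scenario, when the current network bandwidth of the terminal fluctuates, the terminal can, based on an adaptive strategy, adaptively select a to-be-switched bit rate that matches the current network bandwidth, and look up the to-be-switched address information of the multimedia resource at that bit rate in the MPD. The terminal can disconnect the media stream transmission link at the current bit rate and send a frame acquisition request carrying the to-be-switched address information to the server. The server returns the media frames of the multimedia resource at the to-be-switched bit rate, and a media stream transmission link based on the to-be-switched bit rate is established.
In some embodiments, the terminal may instead keep the media stream transmission link at the current bit rate, directly initiate a new frame acquisition request carrying the to-be-switched address information, establish a media stream transmission link based on the to-be-switched bit rate (for transmitting the new media stream), and use the original media stream as a backup stream. If the new media stream encounters a transmission abnormality, the terminal can continue to play the backup stream.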
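Fig. 3 is a flowchart of a resource transmission method according to an embodiment. The resource transmission method is applied to a computer device, and is described by taking the computer device being the server in the FAS framework of the above implementation environment as an example.
In 301, in response to a frame acquisition request for a multimedia resource, the server acquires a pull position parameter of the multimedia resource, where the frame acquisition request is used to request transmission of media frames of the multimedia resource, and the pull position parameter indicates the initial pull position of the media frames of the multimedia resource.
In 302, the server determines the start frame of the multimedia resource based on the pull position parameter of the multimedia resource.
In 303, the server sends the media frames of the multimedia resource from the start frame, where the timestamp of each media frame is greater than or equal to the timestamp of the start frame.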
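In some embodiments, determining the start frame of the multimedia resource based on the pull position parameter of the multimedia resource includes:
acquiring an audio parameter of the multimedia resource, where the audio parameter indicates whether the media frames are audio frames;
determining a target timestamp based on the audio parameter and the pull position parameter;
determining the start frame of the multimedia resource based on the target timestamp.
In some embodiments, determining the target timestamp based on the audio parameter and the pull position parameter includes:
if the pull position parameter is the default value and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp minus the absolute value of the default value of the pull position parameter;
if the pull position parameter is the default value and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp minus the absolute value of the default value of the pull position parameter;
if the pull position parameter is equal to 0 and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp;
if the pull position parameter is equal to 0 and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp;
if the pull position parameter is less than 0 and the audio parameter is the default value or false, determining that the target timestamp is the maximum timestamp minus the absolute value of the pull position parameter;
if the pull position parameter is less than 0 and the audio parameter is true, determining that the target timestamp is the maximum audio timestamp minus the absolute value of the pull position parameter;
if the pull position parameter is greater than 0, the audio parameter is the default value or false, and a timestamp rollback occurs in the buffer area, determining that the target timestamp is the maximum timestamp;
if the pull position parameter is greater than 0, the audio parameter is true, and a timestamp rollback occurs in the buffer area, determining that the target timestamp is the maximum audio timestamp;
if the pull position parameter is greater than 0 and no timestamp rollback occurs in the buffer area, determining that the target timestamp is the pull position parameter.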
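The nine cases for the target timestamp collapse into a small decision function. The sketch below is illustrative only: the function and argument names are assumptions, `None` stands for a parameter omitted from the request (the default case), and `latest_pts` is whichever maximum timestamp applies when the audio parameter is default or false.

```python
from typing import Optional


def target_timestamp(fas_spts: Optional[int],
                     only_audio: Optional[bool],
                     latest_pts: int,
                     latest_audio_pts: int,
                     rollback: bool,
                     default_spts: int = 0) -> int:
    """Map (pull position parameter, audio parameter) to the target timestamp.

    fas_spts=None means the request omitted the pull position parameter;
    only_audio=None means the audio parameter was omitted (treated as false).
    """
    # Audio parameter true -> maximum audio timestamp; default/false -> maximum timestamp.
    base = latest_audio_pts if only_audio else latest_pts
    if fas_spts is None:              # pull position parameter is the default value
        return base - abs(default_spts)
    if fas_spts == 0:                 # start from the newest frame
        return base
    if fas_spts < 0:                  # start |fas_spts| before the newest frame
        return base - abs(fas_spts)
    # fas_spts > 0: an absolute position, usable only if no rollback occurred
    return base if rollback else fas_spts
```

For example, with a newest timestamp of 10000, a request carrying a pull position parameter of -2000 targets 8000, while a positive value such as 7000 is used as-is unless the buffer area has seen a timestamp rollback.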
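In some embodiments, if the pull position parameter is greater than 0 and no timestamp rollback occurs in the buffer area, determining the start frame of the multimedia resource based on the target timestamp includes:
if a target media frame exists in the current valid buffer area, determining that the start frame is the target media frame, where the timestamp of the target media frame is greater than or equal to the target timestamp and closest to the target timestamp;
if the target media frame does not exist in the current valid buffer area, entering a waiting state until the target media frame is written into the current valid buffer area, and then determining that the start frame is the target media frame.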
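Finding the target media frame (the first frame whose timestamp is greater than or equal to the target timestamp) can be sketched as follows. Note the swapped-in technique: this example uses a binary search from Python's `bisect` module, which relies on the current valid buffer area being sorted by PTS, rather than a frame-by-frame traversal. The function name is hypothetical, and `valid_pts` would hold key frame timestamps when video is present or audio frame timestamps otherwise.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def find_target_frame_index(valid_pts: Sequence[int], target_pts: int) -> Optional[int]:
    """Return the index of the first PTS >= target_pts, or None if absent.

    valid_pts must be sorted in ascending order (the current valid buffer
    area is monotonically increasing by construction). None means the
    caller has to wait for the frame or report pull failure.
    """
    index = bisect_left(valid_pts, target_pts)
    return index if index < len(valid_pts) else None
```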
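In some embodiments, the method further includes:
if the target media frame does not exist in the current valid buffer area and the difference between the target timestamp and the maximum timestamp is greater than a timeout threshold, sending pull failure information.
In some embodiments, if the pull position parameter is greater than 0, the method further includes:
if the timestamps of the media frames in the media frame sequence in the buffer area increase non-monotonically, determining that a timestamp rollback occurs in the buffer area;
if the timestamps of the media frames in the media frame sequence in the buffer area increase monotonically, determining that no timestamp rollback occurs in the buffer area, where the media frame sequence is the sequence formed by the multiple media frames buffered in the buffer area.
In some embodiments, the method further includes:
if the buffer area includes video resources and the timestamps of the key frames in the key frame sequence increase non-monotonically, determining that the media frame sequence increases non-monotonically, where the key frame sequence is the sequence formed by the multiple key frames buffered in the buffer area;
if the buffer area does not include video resources and the timestamps of the audio frames in the audio frame sequence increase non-monotonically, determining that the media frame sequence increases non-monotonically, where the audio frame sequence is the sequence formed by the multiple audio frames buffered in the buffer area.
In some embodiments, the method further includes:
determining the media frames included in the last monotonically increasing stage as the resources in the current valid buffer area.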
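Rollback detection and the extraction of the current valid buffer area can be sketched over a plain list of timestamps (key frame timestamps when the buffer area contains video, audio frame timestamps otherwise). The function names are hypothetical, and one assumption is made explicit: a timestamp that is strictly smaller than its predecessor counts as a rollback, while equal timestamps do not.

```python
from typing import List, Sequence


def has_timestamp_rollback(pts_sequence: Sequence[int]) -> bool:
    """True if any timestamp is smaller than the one buffered before it."""
    return any(later < earlier
               for earlier, later in zip(pts_sequence, pts_sequence[1:]))


def current_valid_buffer(pts_sequence: Sequence[int]) -> List[int]:
    """Return the last monotonically increasing stage of the sequence."""
    start = 0
    for i in range(1, len(pts_sequence)):
        if pts_sequence[i] < pts_sequence[i - 1]:
            start = i  # a rollback begins a new stage at position i
    return list(pts_sequence[start:])
```

For the sequence 1001, 1002, 1003, 1001, 1002 a rollback is detected and only the trailing 1001, 1002 remain valid, so frames written before the rollback are never chosen as the start frame.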
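In some embodiments, determining the start frame of the multimedia resource based on the target timestamp includes:
determining that the start frame is the media frame whose timestamp is closest to the target timestamp in the current valid buffer area.
In some embodiments, if the audio parameter is the default value or false: when the current valid buffer area includes video resources, the maximum timestamp is the maximum video timestamp; when the current valid buffer area does not include video resources, the maximum timestamp is the maximum audio timestamp.
In some embodiments, acquiring the pull position parameter of the multimedia resource in response to the frame acquisition request for the multimedia resource includes:
if the frame acquisition request carries the pull position parameter, parsing the frame acquisition request to obtain the pull position parameter;
if the frame acquisition request omits the pull position parameter, configuring the pull position parameter as a default value.
In some embodiments, sending the media frames of the multimedia resource from the start frame includes:
parsing the frame acquisition request to obtain the address information of the multimedia resource;
sending, from the start frame, the media frames of the multimedia resource indicated by the address information.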
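Fig. 4 is an interaction flowchart of a resource transmission method according to an embodiment. The resource transmission method can be applied to the FAS framework of the above implementation environment, and the embodiment includes the following content.
In 401, the terminal sends a frame acquisition request for a multimedia resource to the server, where the frame acquisition request is used to request transmission of media frames of the multimedia resource.
An application for browsing multimedia resources can be installed on the terminal. For example, the application may include at least one of a short video application, a live broadcast application, a video-on-demand application, a social application, or a shopping application. The embodiments of the present disclosure do not specifically limit the type of application.
The multimedia resources involved in the embodiments of the present disclosure include, but are not limited to, at least one of video resources, audio resources, image resources, or text resources. The embodiments of the present disclosure do not specifically limit the type of multimedia resource. For example, the multimedia resource is a live video stream of a network anchor, a historical on-demand video pre-stored on the server, a live audio stream of a radio anchor, or historical on-demand audio pre-stored on the server.
In some embodiments, the user can start the application on the terminal, and the application displays a resource push interface. For example, the resource push interface may be the home page or a function interface of the application; the embodiments of the present disclosure do not specifically limit the type of resource push interface. The resource push interface may include abbreviated information of at least one multimedia resource, and the abbreviated information includes at least one of the title, introduction, poster, trailer, or highlight clip of the multimedia resource. While browsing the resource push interface, the user can click the abbreviated information of a multimedia resource of interest. In response to the user's touch operation on the abbreviated information of the multimedia resource, the terminal can jump from the resource push interface to the resource playback interface.
The resource playback interface may include a playback area and a comment area. The playback area may include a playback option for the multimedia resource, and the comment area may include other users' viewing comments on the multimedia resource. When the user wants to watch the multimedia resource, the user can click the playback option in the resource playback interface. In response to the user's touch operation on the playback option, the terminal downloads the MPD of the multimedia resource from the server, determines the target bit rate, obtains the target address information of the multimedia resource at the target bit rate from the MPD, generates a frame acquisition request (FAS request) carrying the target address information, and sends it to the server.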
在一些实施例中,MPD文件格式可以为JSON(JavaScript Object Notation,JS对象简谱),也可以为其他脚本格式,本公开实施例不对MPD文件格式进行具体限定。
在一些实施例中,MPD文件中可以包括版本号(@version)和媒体描述集合(@adaptationSet),还可以包括服务类型(@type)、用于表示是否打开自适应功能的功能选项(@hideAuto)或者用于表示是否在启播时默认打开自适应功能的功能选项(@autoDefaultSelect)中至少一项,本公开实施例不对MPD文件承载的内容进行具体限定。
其中,版本号可以包括该媒体描述文件的版本号或者资源传输标准(FAS标准)的版本号中至少一项。
其中,该媒体描述集合用于表示多媒体资源的元信息,该媒体描述集合可以包括多个媒体描述元信息,每个媒体描述元信息对应于一种码率的多媒体资源,每个媒体描述元信息可以包括该媒体描述元信息所对应码率的多媒体资源的画面组长度(@gopDuration)以及属性信息(@representation)。
The group of pictures (Group Of Pictures, GOP) length is the distance between two adjacent key frames (Intra-coded picture, also called an "I frame").
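Each piece of attribute information may include the identification information of the multimedia resource (@id, a unique identifier), the encoding method of the multimedia resource (@codec, the codec standard it follows), the bitrate supported by the multimedia resource (@bitrate, the number of data bits transmitted per unit time during resource transmission), and the address information of the multimedia resource at that bitrate (@url, the URL or domain name exposed for the multimedia resource at a given bitrate; URL stands for Uniform Resource Locator). Each piece of attribute information may further include at least one of: the quality type of the multimedia resource (@qualityType, covering quality indicators such as resolution and frame rate), the hidden option of the multimedia resource (@hiden, indicating whether the multimedia resource at a given bitrate is exposed, that is, whether the user can manually select the multimedia resource at that bitrate), a function option indicating whether the multimedia resource is visible to the adaptive function (@enableAdaptive, that is, whether the adaptive function can select the multimedia resource at a given bitrate), or a default playback option (@defaultSelect, that is, whether the multimedia resource at a given bitrate is played by default at startup).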
其中,服务类型用于指定多媒体资源的业务类型,包括直播或者点播中至少一项。
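As a rough illustration of the MPD fields described above, the following sketch loads a JSON media description and looks up the address information for a target bitrate. The field names follow the text with the "@" prefix dropped, which is an assumption about the on-the-wire spelling; the nesting, the sample values, the URLs, and the helper `find_target_url` are hypothetical and are not taken from the disclosure.

```python
import json
from typing import Optional

# Hypothetical MPD content; only the field names come from the description.
MPD_TEXT = """
{
  "version": "1.0",
  "type": "live",
  "hideAuto": false,
  "autoDefaultSelect": true,
  "adaptationSet": [
    {"gopDuration": 2000,
     "representation": {"id": 1, "codec": "avc1", "bitrate": 500,
                        "url": "https://example.com/live/stream_500.flv",
                        "qualityType": "SD", "hiden": false,
                        "enableAdaptive": true, "defaultSelect": true}},
    {"gopDuration": 2000,
     "representation": {"id": 2, "codec": "avc1", "bitrate": 2000,
                        "url": "https://example.com/live/stream_2000.flv",
                        "qualityType": "HD", "hiden": false,
                        "enableAdaptive": true, "defaultSelect": false}}
  ]
}
"""


def find_target_url(mpd: dict, target_bitrate: int) -> Optional[str]:
    """Return the @url of the representation whose @bitrate matches, if any."""
    for entry in mpd["adaptationSet"]:
        representation = entry["representation"]
        if representation["bitrate"] == target_bitrate:
            return representation["url"]
    return None


mpd = json.loads(MPD_TEXT)
target_url = find_target_url(mpd, 2000)
```

The terminal would then place `target_url` in the frame acquisition request as the target address information.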
在确定目标码率时,终端可以向用户提供码率选择列表,用户在点击码率选择列表中任一数值时,触发生成携带该数值的码率选择指令,终端响应于码率选择指令,将该码率选择指令所携带的数值确定为目标码率。
在一些实施例中,终端还可以通过自适应功能,将目标码率调整为与当前的网络带宽信息对应的码率,在进行自适应调整的过程中,除了当前的网络带宽信息之外,还可以结合终端的播放状态信息,动态选择播放效果最佳的目标码率。
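In some embodiments, besides the target address information, the frame acquisition request may carry at least one of an audio parameter or a pull position parameter; the pull position parameter is described in 402 below and the audio parameter in 403 below. The frame acquisition request may also carry neither parameter. In that case both parameters are defaulted, and the server configures a default value for each of them, as detailed in 404 below.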
在402中,服务器响应于多媒体资源的帧获取请求,获取该多媒体资源的拉取位置参数,该拉取位置参数用于表示该多媒体资源的媒体帧的起始拉取位置。
其中,该拉取位置参数(@fasSpts)用于指示服务器具体从哪帧开始发送媒体流,拉取位置参数的数据类型可以为int64_t类型,当然,也可以为其他数据类型,本公开实施例不对拉取位置参数的数据类型进行具体限定。在帧获取请求中,拉取位置参数可以等于0、大于0、小于0或者缺省,在不同的取值情况下会对应于服务器不同的处理逻辑,将在下述404中进行详述。
在一些实施例中,基于该帧获取请求携带拉取位置参数,服务器可以解析该帧获取请求得到该拉取位置参数,这种情况下终端在帧获取请求中指定了拉取位置参数,服务器可以直接对帧获取请求的@fasSpts字段进行解析,得到拉取位置参数。
在一些实施例中,基于该帧获取请求缺省拉取位置参数,服务器将该拉取位置参数配置为默认值,这种情况下终端并未在帧获取请求中指定拉取位置参数,那么服务器为其配置默认值,令@fasSpts=defaultSpts。这里的默认值可以由服务器根据业务场景自行配置,比如,在直播业务场景下,可以将defaultSpts设置为0,在点播业务场景下,可以将defaultSpts设置为上一次结束观看时历史媒体帧的PTS(Presentation Time Stamp,显示时间戳),若缓存中未记录历史媒体帧的PTS,那么将defaultSpts设置为首个媒体帧的PTS。
在403中,服务器获取该多媒体资源的音频参数,该音频参数用于表示该多媒体资源的媒体帧是否为音频帧。
其中,该音频参数(@onlyAudio)用于指示媒体流的拉取模式,若设定为true,表示服务器传输至终端的媒体帧为音频帧,俗称为“纯音频模式”,否则,若设定为false,表示服务器传输至终端的媒体帧为音视频帧,俗称为“非纯音频模式”。在帧获取请求中,音频参数可以为真、假或者缺省,在不同的取值情况下会对应于服务器不同的处理逻辑,将在下述404中进行详述。
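In some embodiments, when the frame acquisition request carries the audio parameter, the server may parse the frame acquisition request to obtain the audio parameter. In this case the terminal has specified the audio parameter in the frame acquisition request, and the server can obtain it by directly parsing the @onlyAudio field of the request.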
在一些实施例中,基于该帧获取请求缺省音频参数,服务器将该音频参数配置为默认值,这种情况下终端并未在帧获取请求中指定音频参数,那么服务器为其配置默认值。这里的默认值可以由服务器根据业务场景自行配置,比如,在提供视频业务时,将默认值设置为假,也即令@onlyAudio=false,或者,在仅提供音频业务时,将默认值设置为真,也即令@onlyAudio=true。需要说明的是,在本公开实施例中,仅以默认值为假(false)为例进行说明,根据默认值的不同,服务端的处理逻辑可以进行适应性调整,后文不做赘述。
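The parsing and defaulting described in 402 and 403 can be sketched as follows. This is a minimal sketch that assumes the two fields travel as URL query parameters named `fasSpts` and `onlyAudio`; the disclosure names the fields but not their encoding, so the query-string form, the function name, and the chosen defaults (`DEFAULT_SPTS = 0`, as in the live streaming example, and `DEFAULT_ONLY_AUDIO = False`) are assumptions.

```python
from urllib.parse import urlparse, parse_qs

DEFAULT_SPTS = 0            # assumed default for a live streaming scenario
DEFAULT_ONLY_AUDIO = False  # the text uses false as the default


def parse_frame_request(url: str):
    """Return (fas_spts, spts_is_default, only_audio) for a request URL."""
    query = parse_qs(urlparse(url).query)

    if "fasSpts" in query:
        fas_spts, spts_is_default = int(query["fasSpts"][0]), False
    else:
        # Field is defaulted: the server configures @fasSpts = defaultSpts.
        fas_spts, spts_is_default = DEFAULT_SPTS, True

    if "onlyAudio" in query:
        only_audio = query["onlyAudio"][0].lower() == "true"
    else:
        only_audio = DEFAULT_ONLY_AUDIO

    return fas_spts, spts_is_default, only_audio
```

Returning `spts_is_default` separately keeps the "defaulted" case distinguishable from an explicit `fasSpts=0`, which the later case analysis treats differently.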
在404中,服务器基于该音频参数和该拉取位置参数,确定目标时间戳。
在一些实施例中,在确定目标时间戳之前,服务器可以通过执行下述404A-404B来刷新当前有效缓存区:
404A、基于缓存区中的媒体帧序列中媒体帧的时间戳呈非单调递增,服务器确定该缓存区发生时间戳回退。
否则,基于缓存区中的媒体帧序列中媒体帧的时间戳呈单调递增,那么服务器可以确定该缓存区未发生时间戳回退。其中,媒体帧序列为缓存区中已缓存的多个媒体帧所构成的序列。
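Timestamp rollback means that the media frames in the buffer area are not stored in monotonically increasing timestamp order, so the buffer area contains redundant media frames. The phenomenon tends to occur in live streaming scenarios. While the streamer's terminal pushes the stream to the server, network fluctuation, delay, and similar causes may make a media frame that was sent earlier arrive at the server later, so the timestamps in the media frame sequence in the buffer area become non-monotonically increasing and timestamp rollback occurs. In addition, to avoid packet loss, the streamer's terminal usually sends each media frame multiple times, and this redundant multi-send mechanism likewise makes the timestamps in the media frame sequence non-monotonically increasing and causes timestamp rollback.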
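To determine whether the timestamps in the media frame sequence are non-monotonically increasing, the server only needs to walk the media frame sequence in its storage order in the buffer area, starting from the media frame with the smallest timestamp, and check whether any media frame has a timestamp greater than that of the next media frame. If any media frame has a timestamp greater than that of the next media frame, the timestamps in the media frame sequence are non-monotonically increasing and timestamp rollback has occurred in the buffer area. Otherwise, if the timestamp of every media frame is less than or equal to that of the next media frame, the timestamps in the media frame sequence are monotonically increasing and no timestamp rollback has occurred in the buffer area.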
例如,假设缓存区内媒体帧序列中媒体帧的时间戳分别为[1001,1002,1003,1004,1005…],省略部分的媒体帧的时间戳呈递增,此时媒体帧序列中媒体帧的时间戳呈单调递增,缓存区未发生时间戳回退现象。又比如,假设缓存区内媒体帧序列中媒体帧的时间戳分别为[1001,1002,1003,1001,1002,1003,1004…],省略部分的媒体帧的时间戳呈递增,此时由于第3个媒体帧的时间戳(PTS 3=1003)大于第4个媒体帧的时间戳(PTS 4=1001),媒体帧序列中媒体帧的时间戳呈非单调递增,缓存区发生时间戳回退现象。
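The pairwise check in 404A can be sketched in a few lines. The function name `has_timestamp_rollback` is hypothetical; the rule itself (rollback if any frame's timestamp is greater than the next frame's) is the one stated above, and the two sample sequences are the ones used in the examples.

```python
def has_timestamp_rollback(pts_sequence):
    """True if any timestamp is greater than the one stored right after it."""
    return any(cur > nxt for cur, nxt in zip(pts_sequence, pts_sequence[1:]))


no_rollback = has_timestamp_rollback([1001, 1002, 1003, 1004, 1005])
rollback = has_timestamp_rollback([1001, 1002, 1003, 1001, 1002, 1003, 1004])
```

Equal neighbouring timestamps do not count as rollback here, matching the "less than or equal to" condition in the text.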
在一些实施例中,可以对视频资源和音频资源进行分别讨论:对视频资源而言,判断媒体帧序列中媒体帧的时间戳是否呈非单调递增时,可以仅考虑视频资源的关键帧(I帧)序列中关键帧的时间戳是否呈非单调递增;对音频资源而言,判断媒体帧序列中媒体帧的时间戳是否呈非单调递增时,可以考虑音频资源的音频帧序列中音频帧的时间戳是否呈非单调递增。
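In other words, when the buffer area includes a video resource, the media frame sequence is determined to be non-monotonically increasing when the timestamps of the key frames in the key frame sequence are non-monotonically increasing, where the key frame sequence is the sequence formed by the multiple key frames cached in the buffer area; when the buffer area does not include a video resource, the media frame sequence is determined to be non-monotonically increasing when the timestamps of the audio frames in the audio frame sequence are non-monotonically increasing, where the audio frame sequence is the sequence formed by the multiple audio frames cached in the buffer area.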
这是由于I帧的编解码不需要参考其他图像帧,仅利用本帧信息即可实现,而相对地,P帧(Predictive-coded picture,预测编码图像帧)和B帧(Bidirectionally predicted picture,双向预测编码图像帧)的编解码均需要参考其他图像帧,仅利用本帧信息无法完成编解码。对视频资源而言,是在I帧解码完成之后,基于I帧来进行P帧和B帧的解码,那么即使各个I帧对应的P帧和B帧呈非单调递增,只要保证I帧序列(仅考虑I帧的PTS序列)呈单调递增,那么可以认为缓存区未发生时间戳回退,反之,一旦I帧序列呈非单调递增,那么可以确定缓存区发生时间戳回退。当然,如果缓存区里没有视频资源,那么直接对所有音频帧的PTS序列进行遍历判断即可,这里不做赘述。
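The video/audio distinction above can be sketched by filtering the buffered frames before running the monotonicity check. The `Frame` type and its `kind` labels ("I", "P", "B", "audio") are hypothetical stand-ins for whatever frame metadata the server keeps.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    pts: int
    kind: str  # hypothetical labels: "I", "P", "B" or "audio"


def _non_monotonic(pts_sequence):
    return any(cur > nxt for cur, nxt in zip(pts_sequence, pts_sequence[1:]))


def buffer_rolled_back(frames):
    """Check only I frames if video is present, otherwise only audio frames."""
    has_video = any(f.kind in ("I", "P", "B") for f in frames)
    wanted = "I" if has_video else "audio"
    return _non_monotonic([f.pts for f in frames if f.kind == wanted])


# P/B timestamps are out of order, but the I frame sequence still increases.
video_ok = buffer_rolled_back(
    [Frame(1000, "I"), Frame(1040, "P"), Frame(1020, "B"), Frame(2000, "I")]
)
video_bad = buffer_rolled_back([Frame(2000, "I"), Frame(1000, "I")])
audio_bad = buffer_rolled_back([Frame(10, "audio"), Frame(20, "audio"), Frame(15, "audio")])
```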
在一些实施例中,由于时间戳回退现象可能不止发生一次,也即是说,在媒体帧序列中媒体帧的时间戳里可以划分出多个单调递增阶段,在每个阶段内部的媒体帧的时间戳呈单调递增,但是在不同阶段之间的媒体帧的时间戳呈非单调递增,这时缓存区中存在很多冗余无效的媒体帧,服务器可以通过执行下述404B在缓存区中确定当前有效缓存区。
404B、服务器将最后一个单调递增阶段所包含的各个媒体帧确定为当前有效缓存区内的资源。
在上述过程中,服务器从媒体帧序列中确定最后一个单调递增阶段中首个媒体帧,将媒体帧序列中从上述首个媒体帧开始到具有最大时间戳的媒体帧(相当于最新的媒体帧)之间的所有媒体帧确定为当前有效缓存区,这样可以保证当前有效缓存区内的媒体帧呈单调递增。
例如,假设缓存区内媒体帧序列中媒体帧的时间戳分别为[1001,1002,1003,1001,1002,1003,1004…],省略部分的媒体帧的时间戳呈递增,此时缓存区发生时间戳回退,可以看出最后一个单调递增阶段的首个媒体帧为第4个媒体帧,那么将从第4个媒体帧开始到最新的媒体帧之间的所有媒体帧确定为当前有效缓存区。又比如,假设缓存区内媒体帧序列中媒体帧的时间戳分别为[1001,1002,1003,1001,1002,1003,1001…],省略部分的媒体帧的时间戳呈递增,缓存区发生时间戳回退,可以看出最后一个单调递增阶段的首个媒体帧为第7个媒体帧,那么将从第7个媒体帧开始到最新的媒体帧之间的所有媒体帧确定为当前有效缓存区。
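Step 404B can be sketched as finding the index at which the last monotonically increasing phase begins and keeping everything from there on. The function name and the choice to return the start index together with the slice are illustrative only.

```python
def current_valid_buffer(pts_sequence):
    """Return (start_index, frames) of the last monotonically increasing phase."""
    start = 0
    for i in range(1, len(pts_sequence)):
        if pts_sequence[i - 1] > pts_sequence[i]:
            start = i  # a rollback happened here; a new phase begins at i
    return start, pts_sequence[start:]


first_example = current_valid_buffer([1001, 1002, 1003, 1001, 1002, 1003, 1004])
second_example = current_valid_buffer([1001, 1002, 1003, 1001, 1002, 1003, 1001])
```

With zero-based indices, a start index of 3 corresponds to the 4th media frame in the first example and 6 to the 7th media frame in the second.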
在一些实施例中,可以对视频资源和音频资源进行分别讨论:若缓存区内包括视频资源,对视频资源而言,服务器可以以视频资源的I帧作为计算点,从最后一个单调递增阶段的首个关键帧到最新的视频帧之间的所有媒体帧作为当前有效缓存区,其中,最新的视频帧的时间戳可以表示为latestVideoPts;若缓存区内不包括视频资源,对音频资源而言,服务器可以以音频帧作为计算点,从最后一个单调递增阶段的首个音频帧到最新的音频帧之间的所有媒体帧作为当前有效缓存区,其中,最新的音频帧的时间戳可以表示为latestAudioPts。
在一些实施例中,更新当前有效缓存区的操作可以是定时触发的,也可以由技术人员手动触发,当然,还可以每当接收到帧获取请求时进行一次更新,这种方式称为“被动触发”,本公开实施例不对更新当前有效缓存区的触发条件进行具体限定。
图5是本公开实施例提供的一种确定目标时间戳的原理性示意图,请参考图5,示出了服务器在不同拉取位置参数以及音频参数的取值情况下,分别具有不同的处理逻辑,以下,将对服务器的处理逻辑进行介绍,由于拉取位置参数的取值情况可以分为四种:默认值、等于0、小于0以及大于0,下面针对这四种情况进行分别说明。
情况一、拉取位置参数为默认值
1):基于拉取位置参数为默认值,且音频参数为默认值或音频参数为假,服务器将最大时间戳减去该拉取位置参数的默认值的绝对值所得的数值确定为目标时间戳。
其中,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳latestVideoPts;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳latestAudioPts。
上述过程是指帧获取请求中@fasSpts(拉取位置参数)缺省的情况下,服务器会为拉取位置参数配置默认值,令@fasSpts=defaultSpts。此时,如果帧获取请求中@onlyAudio(音频参数)也缺省,服务器会为音频参数配置默认值(音频参数的默认值为false),令@onlyAudio=false,或者,帧获取请求自身的@onlyAudio字段携带false值,也即帧获取请求指定@onlyAudio=false,此时服务器的处理规则如下:
基于当前有效缓存区中包括视频资源,服务器将latestVideoPts–|defaultSpts|所得的数值确定为目标时间戳;基于当前有效缓存区中不包括视频资源,服务器将latestAudioPts–|defaultSpts|所得的数值确定为目标时间戳。
2):基于拉取位置参数为默认值,且音频参数为真,将最大音频时间戳减去该拉取位置参数的默认值的绝对值所得的数值确定为目标时间戳。
上述过程是指帧获取请求中@fasSpts(拉取位置参数)缺省的情况下,服务器会为拉取位置参数配置默认值,令@fasSpts=defaultSpts。此时,如果帧获取请求的@onlyAudio字段携带true值,也即帧获取请求指定@onlyAudio=true(纯音频模式,仅传输音频流),此时服务器的处理规则如下:服务器将latestAudioPts–|defaultSpts|所得的数值确定为目标时间戳。
情况二、拉取位置参数等于0
1):基于拉取位置参数等于0,且音频参数为默认值或音频参数为假,将最大时间戳确定为目标时间戳。
其中,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳latestVideoPts;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳latestAudioPts。
上述过程是指帧获取请求中@fasSpts字段携带0值(@fasSpts=0)的情况下,此时,如果帧获取请求中@onlyAudio(音频参数)也缺省,服务器会为音频参数配置默认值(音频参数的默认值为false),令@onlyAudio=false,或者,帧获取请求中@onlyAudio字段携带false值(帧获取请求指定@onlyAudio=false),此时服务器的处理规则如下:
基于当前有效缓存区中包括视频资源,服务器将latestVideoPts确定为目标时间戳;基于当前有效缓存区中不包括视频资源,服务器将latestAudioPts确定为目标时间戳。
2):基于拉取位置参数等于0,且音频参数为真,将最大音频时间戳确定为目标时间戳。
上述过程是指帧获取请求中@fasSpts字段携带0值(@fasSpts=0)的情况下,如果帧获取请求中@onlyAudio字段携带true值(帧获取请求指定@onlyAudio=true),也即是纯音频模式、仅传输音频流,此时服务器的处理规则如下:服务器将latestAudioPts确定为目标时间戳。
情况三、拉取位置参数小于0
1):基于拉取位置参数小于0,且音频参数为默认值或音频参数为假,将最大时间戳减去该拉取位置参数的绝对值所得的数值确定为目标时间戳。
其中,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳latestVideoPts;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳latestAudioPts。
上述过程是指帧获取请求中@fasSpts字段携带小于0的值(@fasSpts<0)的情况下,此时,如果帧获取请求中@onlyAudio(音频参数)也缺省,服务器会为音频参数配置默认值(音频参数的默认值为false),令@onlyAudio=false,或者,帧获取请求中@onlyAudio字段携带false值(帧获取请求指定@onlyAudio=false),此时服务器的处理规则如下:
基于当前有效缓存区中包括视频资源,服务器将latestVideoPts-|@fasSpts|确定为目标时间戳;基于当前有效缓存区中不包括视频资源,服务器将latestAudioPts-|@fasSpts|确定为目标时间戳。
2):基于拉取位置参数小于0,且音频参数为真,将最大音频时间戳减去该拉取位置参数的绝对值所得的数值确定为目标时间戳。
上述过程是指帧获取请求中@fasSpts字段携带小于0的值(@fasSpts<0)的情况下,此时,如果帧获取请求中@onlyAudio字段携带true值(帧获取请求指定@onlyAudio=true),也即是纯音频模式、仅传输音频流,此时服务器的处理规则如下:服务器将latestAudioPts-|@fasSpts|确定为目标时间戳。
情况四、拉取位置参数大于0
1):基于拉取位置参数大于0,且音频参数为默认值或音频参数为假,在缓存区中发生时间戳回退时,将最大时间戳确定为目标时间戳。
其中,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳latestVideoPts;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳latestAudioPts。
上述过程是指帧获取请求中@fasSpts字段携带大于0的值(@fasSpts>0)的情况下,此时,如果帧获取请求中@onlyAudio(音频参数)也缺省,服务器会为音频参数配置默认值(音频参数的默认值为false),令@onlyAudio=false,或者,帧获取请求中@onlyAudio字段携带false值(帧获取请求指定@onlyAudio=false),此时服务器的处理规则如下:
在缓存区中发生时间戳回退时,a)基于当前有效缓存区中包括视频资源,服务器将latestVideoPts确定为目标时间戳;b)基于当前有效缓存区中不包括视频资源,服务器将latestAudioPts确定为目标时间戳。
2):基于拉取位置参数大于0,且音频参数为真,在缓存区中发生时间戳回退时,将最大音频时间戳确定为目标时间戳。
上述过程是指帧获取请求中@fasSpts字段携带大于0的值(@fasSpts>0)的情况下,此时,如果帧获取请求中@onlyAudio字段携带true值(帧获取请求指定@onlyAudio=true),也即是纯音频模式、仅传输音频流,服务器的处理规则如下:服务器将latestAudioPts确定为目标时间戳。
3):基于拉取位置参数大于0,且音频参数为默认值或音频参数为假,在缓存区中未发生时间戳回退时,将该拉取位置参数确定为目标时间戳。
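The above refers to the case where the @fasSpts field of the frame acquisition request carries a value greater than 0 (@fasSpts>0). If the @onlyAudio field (audio parameter) of the request is also defaulted, the server configures the default value for the audio parameter (the default value of the audio parameter is false), setting @onlyAudio=false; alternatively, the @onlyAudio field of the request carries the value false (the request specifies @onlyAudio=false). The server's processing rule is then as follows: when no timestamp rollback has occurred in the buffer area, the server determines @fasSpts as the target timestamp.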
4):基于拉取位置参数大于0,且音频参数为真,在缓存区中未发生时间戳回退时,将该拉取位置参数确定为目标时间戳。
上述过程是指帧获取请求中@fasSpts字段携带大于0的值(@fasSpts>0)的情况下,此时,如果帧获取请求中@onlyAudio字段携带true值(帧获取请求指定@onlyAudio=true),也即是纯音频模式、仅传输音频流,服务器的处理规则如下:在缓存区中未发生时间戳回退时,服务器将@fasSpts确定为目标时间戳。
针对上述情况3)和4)的讨论,可以看出,基于拉取位置参数大于0(@fasSpts>0),且缓存区中未发生时间戳回退时,不论音频参数为真、为假还是默认值,服务器均将拉取位置参数确定为目标时间戳。
在上述各个情况中,服务器判断是否发生时间戳回退的操作可以参见上述404A,服务器更新当前有效缓存区的操作可以参见上述404B,这里不做赘述。
在上述基础上,服务器在拉取位置参数的不同取值情况下,均能够执行对应的处理逻辑,从而确定出目标时间戳,该目标时间戳用于在下述405中确定多媒体资源的起始帧。
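The four cases of 404 can be condensed into one function. This is a sketch under stated assumptions: the parameter names are hypothetical, `spts_is_default` marks a request in which @fasSpts was defaulted, and the caller is assumed to have already computed `rolled_back` (404A) and the latest timestamps of the current valid buffer area (404B).

```python
def target_timestamp(fas_spts, spts_is_default, only_audio, has_video,
                     latest_video_pts, latest_audio_pts, rolled_back):
    """Pick the target timestamp following cases one to four of 404."""
    # Maximum timestamp: audio in pure-audio mode or when no video is cached.
    if only_audio or not has_video:
        latest = latest_audio_pts
    else:
        latest = latest_video_pts

    if spts_is_default:            # case one: latest - |defaultSpts|
        return latest - abs(fas_spts)
    if fas_spts == 0:              # case two: latest
        return latest
    if fas_spts < 0:               # case three: latest - |@fasSpts|
        return latest - abs(fas_spts)
    # case four: @fasSpts > 0
    return latest if rolled_back else fas_spts
```

For example, with latestVideoPts = 5000 and latestAudioPts = 4990, a request with @fasSpts = -2000 resolves to 3000 in the default mode and to 2990 in pure-audio mode.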
在405中,服务器基于该目标时间戳,确定该多媒体资源的起始帧。
在一些实施例中,服务器可以通过下述方式一确定起始帧:
方式一、服务器将当前有效缓存区中时间戳最接近该目标时间戳的媒体帧确定为起始帧。
在一些实施例中,在音频参数缺省或音频参数为假的情况下,基于当前有效缓存区中包括视频资源,将视频资源中时间戳最接近该目标时间戳的关键帧(I帧)确定为起始帧;基于当前有效缓存区中不包括视频资源,将时间戳最接近该目标时间戳的音频帧确定为起始帧。
在一些实施例中,在音频参数为真的情况下,服务器可以直接将时间戳最接近该目标时间戳的音频帧确定为起始帧。
在一些实施例中,起始帧的确定方式包括:
A):@fasSpts=defaultSpts,@onlyAudio缺省或@onlyAudio=false时,请参考上述404情况一中的示例1),基于当前有效缓存区中包括视频资源,目标时间戳为latestVideoPts–|defaultSpts|,服务器将PTS最接近latestVideoPts–|defaultSpts|的I帧作为起始帧;此外,基于当前有效缓存区中不包括视频资源,目标时间戳为latestAudioPts–|defaultSpts|,服务器将PTS最接近latestAudioPts–|defaultSpts|的音频帧作为起始帧。
B):@fasSpts=defaultSpts,@onlyAudio=true时,请参考上述404情况一中的示例2),目标时间戳为latestAudioPts–|defaultSpts|,服务器将PTS最接近latestAudioPts–|defaultSpts|的音频帧作为起始帧。
C):@fasSpts=0,@onlyAudio缺省或@onlyAudio=false时,请参考上述404情况二中的示例1),基于当前有效缓存区中包括视频资源,目标时间戳为latestVideoPts,服务器将PTS最接近latestVideoPts的I帧作为起始帧;基于当前有效缓存区中不包括视频资源,目标时间戳为latestAudioPts,服务器将PTS最接近latestAudioPts的音频帧作为起始帧。
D):@fasSpts=0,@onlyAudio=true时,请参考上述404情况二中的示例2),目标时间戳为latestAudioPts,服务器将PTS最接近latestAudioPts的音频帧作为起始帧。
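E): @fasSpts<0, and @onlyAudio is defaulted or @onlyAudio=false. Refer to example 1) of case three in 404 above. When the current valid buffer area includes a video resource, the target timestamp is latestVideoPts-|@fasSpts|, and the server takes the I frame whose PTS is closest to latestVideoPts-|@fasSpts| as the start frame; conversely, when the current valid buffer area does not include a video resource, the target timestamp is latestAudioPts-|@fasSpts|, and the server takes the audio frame whose PTS is closest to latestAudioPts-|@fasSpts| as the start frame.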
F):@fasSpts<0,@onlyAudio=true时,请参考上述404情况三中的示例2),目标时间戳为latestAudioPts-|@fasSpts|,服务器可以将PTS最接近latestAudioPts-|@fasSpts|的音频帧作为起始帧。
G):@fasSpts>0,@onlyAudio缺省或@onlyAudio=false,缓存区中发生时间戳回退时,请参考上述404情况四中的示例1),基于当前有效缓存区中包括视频资源,目标时间戳为latestVideoPts,服务器将PTS最接近latestVideoPts的I帧(最新的I帧)作为起始帧;基于当前有效缓存区中不包括视频资源,目标时间戳为latestAudioPts,服务器将PTS最接近latestAudioPts的音频帧(最新的音频帧)作为起始帧。
H):@fasSpts>0,@onlyAudio=true,缓存区中发生时间戳回退时,请参考上述404情况四中的示例2),目标时间戳为latestAudioPts,服务器将PTS最接近latestAudioPts的音频帧(最新的音频帧)作为起始帧。
以此类推,在@fasSpts>0时,针对上述404情况四中的其余讨论,在确定目标时间戳之后,服务器也可以通过上述方式一,将当前有效缓存区中时间戳最接近该目标时间戳的媒体帧确定为起始帧,这里不进行一一枚举。
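Method one amounts to a nearest-value search over the candidate frames (I frames when video is present and the mode is not pure audio, audio frames otherwise). The sketch below works on a plain list of candidate PTS values; the function name and the tie-breaking behaviour (the earlier candidate wins, which the text does not specify) are assumptions.

```python
def closest_frame(candidate_pts, target):
    """Return the candidate PTS closest to the target timestamp, or None."""
    if not candidate_pts:
        return None
    return min(candidate_pts, key=lambda pts: abs(pts - target))


i_frame_pts = [1000, 3000, 5000]
start_for_offset = closest_frame(i_frame_pts, 3900)   # e.g. latestVideoPts - |@fasSpts|
start_for_latest = closest_frame(i_frame_pts, 5000)   # e.g. @fasSpts = 0
```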
In some embodiments, when @fasSpts>0, in addition to method one above, the server may also determine the start frame through method two below:
方式二、基于该当前有效缓存区中存在目标媒体帧,服务器将该目标媒体帧确定为起始帧,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳。
在一些实施例中,在音频参数缺省或音频参数为假的情况下,基于当前有效缓存区中包括视频资源,目标媒体帧是指视频资源内的I帧;基于当前有效缓存区中不包括视频资源,目标媒体帧是指音频帧。
在一些实施例中,在音频参数为真的情况下,目标媒体帧是指音频帧。
在一些实施例中,起始帧的确定方式包括:
I):@fasSpts>0,@onlyAudio缺省或@onlyAudio=false,缓存区中未发生时间戳回退时,请参考上述404情况四中的示例3),此时目标时间戳为@fasSpts,基于当前有效缓存区中包括视频资源,服务器可以从PTS最小的I帧开始,沿着PTS增大的方向逐个遍历,直到查询到第一个PTS≥@fasSpts的I帧(目标媒体帧),说明当前有效缓存区中存在目标媒体帧,服务器将上述目标媒体帧确定为起始帧;基于当前有效缓存区内不包括视频资源,服务器可以从PTS最小的音频帧开始,沿着PTS增大的方向逐个遍历,直到查询到第一个PTS≥@fasSpts的音频帧(目标媒体帧),说明当前有效缓存区中存在目标媒体帧,服务器将上述目标媒体帧确定为起始帧。
J):@fasSpts>0,@onlyAudio=true,缓存区中未发生时间戳回退时,请参考上述404情况四中的示例4),此时目标时间戳为@fasSpts,服务器可以从PTS最小的音频帧开始,沿着PTS增大的方向逐个遍历,直到查询到第一个PTS≥@fasSpts的音频帧(目标媒体帧),说明当前有效缓存区中存在目标媒体帧,服务器将上述目标媒体帧确定为起始帧。
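The traversal used by method two in examples I) and J) can be sketched as a linear scan in increasing PTS order that stops at the first candidate with PTS ≥ @fasSpts. The function name is hypothetical, and the candidate list is assumed to be already restricted to I frames or audio frames of the current valid buffer area.

```python
def first_frame_at_or_after(candidate_pts, fas_spts):
    """Return the first candidate PTS >= fas_spts, or None if none exists."""
    for pts in candidate_pts:  # candidates are in increasing PTS order
        if pts >= fas_spts:
            return pts
    return None


found = first_frame_at_or_after([1000, 3000, 5000], 3900)
missing = first_frame_at_or_after([1000, 3000, 5000], 6000)
```

Because the current valid buffer area is monotonically increasing, the first match is also the candidate closest to the target among those at or after it.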
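Method two above describes how the server determines the start frame when the target media frame can be found in the current valid buffer area. In some embodiments, however, the target media frame may not be found in the current valid buffer area. This usually happens in live streaming scenarios: the frame acquisition request in which the viewer's terminal specifies pulling from @fasSpts reaches the server first, while the media frame (live video frame) corresponding to @fasSpts is still in transit in the stream pushing stage. In this case the server may determine the start frame through method three below.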
方式三、基于该当前有效缓存区中不存在目标媒体帧,服务器进入等待状态,直到该目标媒体帧写入该当前有效缓存区时,将该目标媒体帧确定为起始帧,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳。
在一些实施例中,在音频参数缺省或音频参数为假的情况下,基于当前有效缓存区中包括视频资源,目标媒体帧是指视频资源内的I帧;基于当前有效缓存区中不包括视频资源,目标媒体帧是指音频帧。
在一些实施例中,在音频参数为真的情况下,目标媒体帧是指音频帧。
在一些实施例中,起始帧的确定方式包括:
K):@fasSpts>0,@onlyAudio缺省或@onlyAudio=false,缓存区中未发生时间戳回退时,请参考上述404情况四中的示例3),此时目标时间戳为@fasSpts,基于当前有效缓存区中包括视频资源,服务器可以从PTS最小的I帧开始,沿着PTS增大的方向逐个遍历,如果遍历了所有的I帧之后查询不到满足PTS≥@fasSpts的I帧(目标媒体帧),说明当前有效缓存区中不存在目标媒体帧,服务器进入等待状态,等待第一个PTS≥@fasSpts的I帧(目标媒体帧)被写入当前有效缓存区时,将目标媒体帧确定为起始帧;基于当前有效缓存区内不包括视频资源,服务器可以从PTS最小的音频帧开始,沿着PTS增大的方向逐个遍历,如果遍历了所有的音频帧之后查询不到满足PTS≥@fasSpts的音频帧(目标媒体帧),说明当前有效缓存区中不存在目标媒体帧,服务器进入等待状态,等待第一个PTS≥@fasSpts的音频帧(目标媒体帧)被写入当前有效缓存区时,将目标媒体帧确定为起始帧。
L):@fasSpts>0,@onlyAudio=true,缓存区中未发生时间戳回退时,请参考上述404情况四中的示例4),此时目标时间戳为@fasSpts,服务器可以从PTS最小的音频帧开始,沿着PTS增大的方向逐个遍历,如果遍历了所有的音频帧之后查询不到满足PTS≥@fasSpts的音频帧(目标媒体帧),说明当前有效缓存区中不存在目标媒体帧,服务器进入等待状态,等待第一个PTS≥@fasSpts的音频帧(目标媒体帧)被写入当前有效缓存区时,将目标媒体帧确定为起始帧。
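Method three above describes how the server determines the start frame when the target media frame cannot be found in the current valid buffer area. In some embodiments, an abnormal situation may cause the @fasSpts carried in the frame acquisition request to be an abnormally large value, and handling such a request with method three would lead to a very long wait. In big-data scenarios, if many concurrent frame acquisition requests are abnormal in this way, they all enter a blocked waiting state and occupy the server's processing resources, which severely degrades server performance.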
有鉴于此,服务器还可以设置一个超时阈值,从而通过下述方式四,基于超时阈值来确定是否需要返回拉取失败信息,下面对方式四进行详述。
方式四、基于该当前有效缓存区中不存在目标媒体帧,且目标时间戳与最大时间戳之间的差值大于超时阈值,服务器发送拉取失败信息,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳。
在一些实施例中,在音频参数缺省或音频参数为假的情况下,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳latestVideoPts;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳latestAudioPts。
在一些实施例中,在音频参数为真的情况下,该最大时间戳为最大音频时间戳latestAudioPts。
假设超时阈值为timeoutPTS,超时阈值可以是任一大于或等于0的数值,超时阈值可以是一个服务器预设的数值,也可以由技术人员基于业务场景进行个性化的配置,本公开实施例不对超时阈值的获取方式进行具体限定,例如:
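M): @fasSpts>0, @onlyAudio is defaulted or @onlyAudio=false, and no timestamp rollback has occurred in the buffer area. Refer to example 3) of case four in 404 above; the target timestamp is @fasSpts. a) When the current valid buffer area includes a video resource, the server may start from the I frame with the smallest PTS and traverse the I frames one by one in the direction of increasing PTS. If no I frame satisfying PTS≥@fasSpts (the target media frame) is found after all I frames have been traversed, the target media frame does not exist in the current valid buffer area, and the server checks whether the difference between @fasSpts and latestVideoPts is greater than timeoutPTS: if @fasSpts-latestVideoPts>timeoutPTS, the server sends pull failure information to the terminal; otherwise, if @fasSpts-latestVideoPts≤timeoutPTS, the server may enter the waiting state, which corresponds to the operation performed in the matching case of example K) in method three above. b) When the current valid buffer area does not include a video resource, the server may start from the audio frame with the smallest PTS and traverse the audio frames one by one in the direction of increasing PTS. If no audio frame satisfying PTS≥@fasSpts (the target media frame) is found after all audio frames have been traversed, the target media frame does not exist in the current valid buffer area, and the server may check whether the difference between @fasSpts and latestAudioPts is greater than timeoutPTS: if @fasSpts-latestAudioPts>timeoutPTS, the server sends pull failure information to the terminal; otherwise, if @fasSpts-latestAudioPts≤timeoutPTS, the server may enter the waiting state, which corresponds to the operation performed in the matching case of example K) in method three above.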
N):@fasSpts>0,@onlyAudio=true,缓存区中未发生时间戳回退时,请参考上述404情况四中的示例4),此时目标时间戳为@fasSpts,服务器可以从PTS最小的音频帧开始,沿着PTS增大的方向逐个遍历,如果遍历了所有的音频帧之后查询不到满足PTS≥@fasSpts的音频帧(目标媒体帧),说明当前有效缓存区中不存在目标媒体帧,服务器可以判断@fasSpts与latestAudioPts之间的差值是否大于timeoutPTS,若@fasSpts–latestAudioPts>timeoutPTS,服务器向终端发送拉取失败信息,否则,若@fasSpts–latestAudioPts≤timeoutPTS,服务器可以进入等待状态,也即是对应于上述方式三中示例L)对应情况下所执行的操作。
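Combining method three and method four above provides exception handling logic for the case where @fasSpts>0 and the target media frame does not exist in the current valid buffer area. When the difference between the target timestamp and the maximum timestamp is less than or equal to the timeout threshold, the server enters the waiting state through method three (wait handling mode) and, once the target media frame arrives, determines it as the start frame. Otherwise, when the difference between the target timestamp and the maximum timestamp is greater than the timeout threshold, the server sends pull failure information through method four (error handling mode); here the server judges the frame acquisition request to be erroneous and therefore returns pull failure information directly to the terminal, and the pull failure information may take the form of an error code.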
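The combined decision among methods two, three, and four can be sketched as a single function that either returns a start frame, asks the caller to wait, or reports failure. The `Action` enum and the function name are hypothetical; `timeout_pts` plays the role of timeoutPTS, and `latest_pts` is latestVideoPts or latestAudioPts as chosen by the rules above. How the server actually blocks while waiting is left out of the sketch.

```python
from enum import Enum


class Action(Enum):
    START = "start"  # target media frame found (method two)
    WAIT = "wait"    # wait for the frame to be written (method three)
    FAIL = "fail"    # send pull failure information (method four)


def resolve_start(candidate_pts, fas_spts, latest_pts, timeout_pts):
    """Return (action, start_pts) for @fasSpts > 0 without timestamp rollback."""
    for pts in candidate_pts:  # increasing PTS order
        if pts >= fas_spts:
            return Action.START, pts
    if fas_spts - latest_pts > timeout_pts:
        return Action.FAIL, None
    return Action.WAIT, None


cached = [1000, 3000, 5000]
hit = resolve_start(cached, 4000, 5000, 2000)
wait = resolve_start(cached, 6000, 5000, 2000)
fail = resolve_start(cached, 9000, 5000, 2000)
```

Keeping the failure check ahead of the wait means an abnormally large @fasSpts is rejected immediately instead of occupying a blocked request.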
在上述403-405中,服务器基于该多媒体资源的拉取位置参数,确定该多媒体资源的起始帧,进一步地,在需要动态码率切换的场景下,只需要在帧获取请求中更换携带的地址信息(@url字段)以及拉取位置参数(@fasSpts字段),就可以实现从任一个指定的起始帧开始以新的码率进行媒体帧的传输。
在406中,服务器从该起始帧开始向终端发送该多媒体资源的媒体帧,其中,该媒体帧的时间戳大于或等于该起始帧的时间戳。
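In 406 above, the server may parse the frame acquisition request to obtain the address information of the multimedia resource and send, starting from the start frame, the media frames of the multimedia resource indicated by the address information. In some embodiments, because the address information carried in the frame acquisition request corresponds to the target bitrate, the server can send the media stream at the target bitrate starting from the start frame.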
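In the above process, the server can send media frames to the terminal continuously, like flowing water, which can be vividly called "media stream transmission".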
在一些实施例中,基于服务器为CDN服务器,那么该目标地址信息可以是一个域名,终端可以向CDN服务器的中心平台发送帧获取请求,中心平台调用DNS(Domain Name System,域名系统,本质上是一个域名解析库)对域名进行解析,可以得到域名对应的CNAME(别名)记录,基于终端的地理位置信息对CNAME记录再次进行解析,可以得到一个距离终端最近的边缘服务器的IP(Internet Protocol,网际互连协议)地址,这时中心平台将帧获取请求导向至上述边缘服务器,由边缘服务器响应于帧获取请求,以目标码率向终端提供多媒体资源的媒体帧。
在一些实施例中,本公开实施例提供一种CDN服务器内部回源机制,在CDN系统中,有可能边缘服务器中无法提供帧获取请求所指定的多媒体资源,此时边缘服务器可以向上级节点设备回源拉取媒体流。
那么边缘服务器可以向上级节点设备发送回源拉取请求,上级节点设备响应于回源拉取请求,向边缘服务器返回对应的媒体流,再由边缘服务器向终端发送对应的媒体流。
在上述过程中,边缘服务器在获取回源拉取请求时,基于终端发送的帧获取请求中携带@fasSpts字段,边缘服务器可以直接将帧获取请求确定为回源拉取请求,将回源拉取请求转发至上级节点设备,反之,基于终端发送的帧获取请求中缺省@fasSpts字段,边缘服务器需要为@fasSpts字段配置默认值defaultSpts,进而在帧获取请求嵌入@fasSpts字段,将@fasSpts字段内所存储的数值置为defaultSpts,得到回源拉取请求。
在一些实施例中,该上级节点设备可以是第三方源站服务器,此时回源拉取请求必须携带@fasSpts字段,在一些实施例中,该上级节点设备也可以是CDN系统内部的节点服务器(比如中心平台或者分布式数据库系统的节点设备),基于帧获取请求中携带@fasSpts字段,那么可以按照@fasSpts字段的实际值进行回源,否则,依据默认值@fasSpts=defaultSpts进行回源,本公开实施例不对边缘服务器的回源方式进行具体限定。
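The way an edge server might derive a back-to-source pull request from a frame acquisition request can be sketched as below. It assumes, as an illustration only, that @fasSpts is carried as a URL query parameter named `fasSpts`; the function name and the example URLs are hypothetical.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode


def build_back_to_source_request(frame_request_url: str, default_spts: int) -> str:
    """Forward the request as-is if it carries fasSpts, else embed the default."""
    parts = urlparse(frame_request_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if not any(name == "fasSpts" for name, _ in params):
        # Field is defaulted: embed @fasSpts = defaultSpts before going upstream.
        params.append(("fasSpts", str(default_spts)))
    return urlunparse(parts._replace(query=urlencode(params)))


forwarded = build_back_to_source_request("https://example.com/live/s.flv?fasSpts=-2000", 0)
filled = build_back_to_source_request("https://example.com/live/s.flv?onlyAudio=true", 0)
```

Always materializing the field keeps the request valid for a third-party origin server, which the text says must receive @fasSpts.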
在407中,终端接收多媒体资源的媒体帧,播放多媒体资源的媒体帧。
在上述过程中,基于终端接收到多媒体资源的媒体帧(连续接收到的媒体帧即可构成媒体流),为了保证播放流畅性,终端可以将该媒体帧存入缓存区中,调用媒体编解码组件对媒体帧进行解码,得到解码后的媒体帧,调用媒体播放组件按照PTS从小到大的顺序来对缓存区内的媒体帧进行播放。
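The terminal-side behaviour in 407 (cache received frames, then play them in increasing PTS order) can be sketched with a min-heap keyed on PTS. The `PlaybackBuffer` class is hypothetical and leaves out decoding and rendering; payloads are assumed to be bytes.

```python
import heapq


class PlaybackBuffer:
    """Holds decoded frames and hands them out in increasing PTS order."""

    def __init__(self):
        self._heap = []

    def push(self, pts: int, payload: bytes) -> None:
        heapq.heappush(self._heap, (pts, payload))

    def pop_next(self):
        """Return the (pts, payload) pair with the smallest PTS, or None."""
        return heapq.heappop(self._heap) if self._heap else None


buffer = PlaybackBuffer()
for pts in (1040, 1000, 1020):
    buffer.push(pts, b"frame")
play_order = [buffer.pop_next()[0] for _ in range(3)]
```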
在解码过程中,终端可以从媒体描述文件的@codec字段中确定多媒体资源的编码方式,根据编码方式确定对应的解码方式,从而按照确定的解码方式对媒体帧进行解码。
图6是根据一实施例示出的一种资源传输装置的逻辑结构框图。参照图6,该装置包括获取单元601、第一确定单元602以及发送单元603,下面进行介绍。
获取单元601,被配置为执行响应于多媒体资源的帧获取请求,获取该多媒体资源的拉取位置参数,该帧获取请求用于请求传输该多媒体资源的媒体帧,该拉取位置参数用于表示该多媒体资源的媒体帧的起始拉取位置;
第一确定单元602,被配置为执行基于该多媒体资源的拉取位置参数,确定该多媒体资源的起始帧;
发送单元603,被配置为执行从该起始帧开始发送该多媒体资源的媒体帧,其中,该媒体帧的时间戳大于或等于该起始帧的时间戳。
在一些实施例中,该获取单元601还被配置为执行:获取该多媒体资源的音频参数,该音频参数用于表示该媒体帧是否为音频帧;
基于图6的装置组成,该第一确定单元602包括:
第一确定子单元,被配置为执行基于该音频参数和该拉取位置参数,确定目标时间戳;
第二确定子单元,被配置为执行基于该目标时间戳,确定该多媒体资源的起始帧。
在一些实施例中,该第一确定子单元被配置为执行:
基于该拉取位置参数为默认值,且该音频参数为默认值或该音频参数为假,确定该目标时间戳为最大时间戳减去该拉取位置参数的默认值的绝对值所得的数值;
基于该拉取位置参数为默认值,且该音频参数为真,确定该目标时间戳为最大音频时间戳减去该拉取位置参数的默认值的绝对值所得的数值;
基于该拉取位置参数等于0,且该音频参数为默认值或该音频参数为假,确定该目标时间戳为最大时间戳;
基于该拉取位置参数等于0,且该音频参数为真,确定该目标时间戳为最大音频时间戳;
基于该拉取位置参数小于0,且该音频参数为默认值或该音频参数为假,确定该目标时间戳为最大时间戳减去该拉取位置参数的绝对值所得的数值;
基于该拉取位置参数小于0,且该音频参数为真,确定该目标时间戳为最大音频时间戳减去该拉取位置参数的绝对值所得的数值;
基于该拉取位置参数大于0,且该音频参数为默认值或该音频参数为假,在缓存区中发生时间戳回退时,确定该目标时间戳为最大时间戳;
基于该拉取位置参数大于0,且该音频参数为真,在缓存区中发生时间戳回退时,确定该目标时间戳为最大音频时间戳;
基于该拉取位置参数大于0,且缓存区中未发生时间戳回退时,确定该目标时间戳为该拉取位置参数。
在一些实施例中,基于该拉取位置参数大于0且缓存区中未发生时间戳回退,该第二确定子单元被配置为执行:
基于当前有效缓存区中存在目标媒体帧,确定该起始帧为该目标媒体帧,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳;
基于该当前有效缓存区中不存在该目标媒体帧,进入等待状态,直到该目标媒体帧写入该当前有效缓存区时,确定该起始帧为该目标媒体帧。
在一些实施例中,该发送单元603还被配置为执行:
基于该当前有效缓存区中不存在该目标媒体帧,且该目标时间戳与最大时间戳之间的差值大于超时阈值,发送拉取失败信息。
在一些实施例中,基于该拉取位置参数大于0,该装置还包括:
第二确定单元,被配置为执行基于缓存区中的媒体帧序列中媒体帧的时间戳呈非单调递增,确定该缓存区发生时间戳回退;基于缓存区中的媒体帧序列中媒体帧的时间戳呈单调递增,确定该缓存区未发生时间戳回退,其中,该媒体帧序列为该缓存区中已缓存的多个媒体帧所构成的序列。
在一些实施例中,该第二确定单元还被配置为执行:
基于该缓存区中包括视频资源,在关键帧序列中关键帧的时间戳呈非单调递增时,确定该媒体帧序列呈非单调递增,其中,该关键帧序列为该缓存区中已缓存的多个关键帧所构成的序列;
基于该缓存区中不包括视频资源,在音频帧序列中音频帧的时间戳呈非单调递增时,确定该媒体帧序列呈非单调递增,其中,该音频帧序列为该缓存区中已缓存的多个音频帧所构成的序列。
在一些实施例中,基于图6的装置组成,该装置还包括:
第三确定单元,被配置为执行将最后一个单调递增阶段所包含的各个媒体帧确定为当前有效缓存区内的资源。
在一些实施例中,该第二确定子单元被配置为执行:
确定该起始帧为当前有效缓存区中时间戳最接近该目标时间戳的媒体帧。
在一些实施例中,基于该音频参数为默认值或该音频参数为假,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳。
在一些实施例中,该获取单元601被配置为执行:
基于该帧获取请求携带拉取位置参数,解析该帧获取请求得到该拉取位置参数;
基于该帧获取请求缺省拉取位置参数,将该拉取位置参数配置为默认值。
在一些实施例中,该发送单元603被配置为执行:
基于该帧获取请求,解析得到该多媒体资源的地址信息;
从该起始帧开始发送该地址信息所指示的多媒体资源的媒体帧。
图7是本公开实施例提供的一种计算机设备的结构示意图,该计算机设备可以是FAS框架中的服务器,该计算机设备700可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上处理器(Central Processing Units,CPU)701和一个或一个以上的存储器702,其中,该存储器702中存储有至少一条程序代码,该至少一条程序代码由该处理器701加载并执行以实现上述各个实施例提供的资源传输方法。当然,该计算机设备700还可以具有有线或无线网络接口、键盘以及输入输出接口等部件,以便进行输入输出,该计算机设备700还可以包括其他用于实现设备功能的部件,在此不做赘述。
在一些实施例中,该计算机设备包括一个或多个处理器,和用于存储该一个或多个处理器可执行指令的一个或多个存储器,其中,该一个或多个处理器被配置为执行该指令,以实现如下操作:
响应于多媒体资源的帧获取请求,获取该多媒体资源的拉取位置参数,该帧获取请求用于请求传输该多媒体资源的媒体帧,该拉取位置参数用于表示该多媒体资源的媒体帧的起始拉取位置;
基于该多媒体资源的拉取位置参数,确定该多媒体资源的起始帧;
从该起始帧开始发送该多媒体资源的媒体帧,其中,该媒体帧的时间戳大于或等于该起始帧的时间戳。
在一些实施例中,该一个或多个处理器被配置为执行该指令,以实现如下操作:
获取该多媒体资源的音频参数,该音频参数用于表示该媒体帧是否为音频帧;
基于该音频参数和该拉取位置参数,确定目标时间戳;
基于该目标时间戳,确定该多媒体资源的起始帧。
在一些实施例中,该一个或多个处理器被配置为执行该指令,以实现如下操作:
基于该拉取位置参数为默认值,且该音频参数为默认值或该音频参数为假,确定该目标时间戳为最大时间戳减去该拉取位置参数的默认值的绝对值所得的数值;
基于该拉取位置参数为默认值,且该音频参数为真,确定该目标时间戳为最大音频时间戳减去该拉取位置参数的默认值的绝对值所得的数值;
基于该拉取位置参数等于0,且该音频参数为默认值或该音频参数为假,确定该目标时间戳为最大时间戳;
基于该拉取位置参数等于0,且该音频参数为真,确定该目标时间戳为最大音频时间戳;
基于该拉取位置参数小于0,且该音频参数为默认值或该音频参数为假,确定该目标时间戳为最大时间戳减去该拉取位置参数的绝对值所得的数值;
基于该拉取位置参数小于0,且该音频参数为真,确定该目标时间戳为最大音频时间戳减去该拉取位置参数的绝对值所得的数值;
基于该拉取位置参数大于0,且该音频参数为默认值或该音频参数为假,在缓存区中发生时间戳回退时,确定该目标时间戳为最大时间戳;
基于该拉取位置参数大于0,且该音频参数为真,在缓存区中发生时间戳回退时,确定该目标时间戳为最大音频时间戳;
基于该拉取位置参数大于0,且缓存区中未发生时间戳回退时,确定该目标时间戳为该拉取位置参数。
在一些实施例中,基于该拉取位置参数大于0且缓存区中未发生时间戳回退,该一个或多个处理器被配置为执行该指令,以实现如下操作:
基于当前有效缓存区中存在目标媒体帧,确定该起始帧为该目标媒体帧,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳;
基于该当前有效缓存区中不存在该目标媒体帧,进入等待状态,直到该目标媒体帧写入该当前有效缓存区时,确定该起始帧为该目标媒体帧。
在一些实施例中,该一个或多个处理器还被配置为执行该指令,以实现如下操作:
基于该当前有效缓存区中不存在该目标媒体帧,且该目标时间戳与最大时间戳之间的差值大于超时阈值,发送拉取失败信息。
在一些实施例中,基于该拉取位置参数大于0,该一个或多个处理器还被配置为执行该指令,以实现如下操作:
基于缓存区中的媒体帧序列中媒体帧的时间戳呈非单调递增,确定该缓存区发生时间戳回退;
基于缓存区中的媒体帧序列中媒体帧的时间戳呈单调递增,确定该缓存区未发生时间戳回退,其中,该媒体帧序列为该缓存区中已缓存的多个媒体帧所构成的序列。
在一些实施例中,该一个或多个处理器还被配置为执行该指令,以实现如下操作:
基于该缓存区中包括视频资源,在关键帧序列中关键帧的时间戳呈非单调递增时,确定该媒体帧序列呈非单调递增,其中,该关键帧序列为该缓存区中已缓存的多个关键帧所构成的序列;
基于该缓存区中不包括视频资源,在音频帧序列中音频帧的时间戳呈非单调递增时,确定该媒体帧序列呈非单调递增,其中,该音频帧序列为该缓存区中已缓存的多个音频帧所构成的序列。
在一些实施例中,该一个或多个处理器还被配置为执行该指令,以实现如下操作:
将最后一个单调递增阶段所包含的各个媒体帧确定为当前有效缓存区内的资源。
在一些实施例中,该一个或多个处理器被配置为执行该指令,以实现如下操作:
确定该起始帧为当前有效缓存区中时间戳最接近该目标时间戳的媒体帧。
在一些实施例中,基于该音频参数为默认值或该音频参数为假,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳。
在一些实施例中,该一个或多个处理器被配置为执行该指令,以实现如下操作:
基于该帧获取请求携带拉取位置参数,解析该帧获取请求得到该拉取位置参数;
基于该帧获取请求缺省拉取位置参数,将该拉取位置参数配置为默认值。
在一些实施例中,该一个或多个处理器被配置为执行该指令,以实现如下操作:
基于该帧获取请求,解析得到该多媒体资源的地址信息;
从该起始帧开始发送该地址信息所指示的多媒体资源的媒体帧。
在一些实施例中,还提供了一种包括至少一条指令的存储介质,例如包括至少一条指令的存储器,上述至少一条指令可由计算机设备中的处理器执行以完成上述实施例中资源传输方法。在一些实施例中,上述存储介质可以是非临时性计算机可读存储介质,例如,该非临时性计算机可读存储介质可以包括ROM(Read-Only Memory,只读存储器)、RAM(Random-Access Memory,随机存取存储器)、CD-ROM(Compact Disc Read-Only Memory,只读光盘)、磁带、软盘和光数据存储设备等。
在一些实施例中,当该存储介质中的至少一条指令由计算机设备的一个或多个处理器执行时,使得计算机设备能够执行如下操作:
响应于多媒体资源的帧获取请求,获取该多媒体资源的拉取位置参数,该帧获取请求用于请求传输该多媒体资源的媒体帧,该拉取位置参数用于表示该多媒体资源的媒体帧的起始拉取位置;
基于该多媒体资源的拉取位置参数,确定该多媒体资源的起始帧;
从该起始帧开始发送该多媒体资源的媒体帧,其中,该媒体帧的时间戳大于或等于该起始帧的时间戳。
在一些实施例中,该计算机设备的一个或多个处理器用于执行如下操作:
获取该多媒体资源的音频参数,该音频参数用于表示该媒体帧是否为音频帧;
基于该音频参数和该拉取位置参数,确定目标时间戳;
基于该目标时间戳,确定该多媒体资源的起始帧。
在一些实施例中,该计算机设备的一个或多个处理器用于执行如下操作:
基于该拉取位置参数为默认值,且该音频参数为默认值或该音频参数为假,确定该目标时间戳为最大时间戳减去该拉取位置参数的默认值的绝对值所得的数值;
基于该拉取位置参数为默认值,且该音频参数为真,确定该目标时间戳为最大音频时间戳减去该拉取位置参数的默认值的绝对值所得的数值;
基于该拉取位置参数等于0,且该音频参数为默认值或该音频参数为假,确定该目标时间戳为最大时间戳;
基于该拉取位置参数等于0,且该音频参数为真,确定该目标时间戳为最大音频时间戳;
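When the pull position parameter is less than 0 and the audio parameter is the default value or false, determining the target timestamp as the value obtained by subtracting the absolute value of the pull position parameter from the maximum timestamp;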
基于该拉取位置参数小于0,且该音频参数为真,确定该目标时间戳为最大音频时间戳减去该拉取位置参数的绝对值所得的数值;
基于该拉取位置参数大于0,且该音频参数为默认值或该音频参数为假,在缓存区中发生时间戳回退时,确定该目标时间戳为最大时间戳;
基于该拉取位置参数大于0,且该音频参数为真,在缓存区中发生时间戳回退时,确定该目标时间戳为最大音频时间戳;
基于该拉取位置参数大于0,且缓存区中未发生时间戳回退时,确定该目标时间戳为该拉取位置参数。
在一些实施例中,基于该拉取位置参数大于0且缓存区中未发生时间戳回退,该计算机设备的一个或多个处理器用于执行如下操作:
基于当前有效缓存区中存在目标媒体帧,确定该起始帧为该目标媒体帧,该目标媒体帧的时间戳大于或等于该目标时间戳且最接近该目标时间戳;
基于该当前有效缓存区中不存在该目标媒体帧,进入等待状态,直到该目标媒体帧写入该当前有效缓存区时,确定该起始帧为该目标媒体帧。
在一些实施例中,该计算机设备的一个或多个处理器还用于执行如下操作:
基于该当前有效缓存区中不存在该目标媒体帧,且该目标时间戳与最大时间戳之间的差值大于超时阈值,发送拉取失败信息。
在一些实施例中,基于该拉取位置参数大于0,该计算机设备的一个或多个处理器还用于执行如下操作:
基于缓存区中的媒体帧序列中媒体帧的时间戳呈非单调递增,确定该缓存区发生时间戳回退;
基于缓存区中的媒体帧序列中媒体帧的时间戳呈单调递增,确定该缓存区未发生时间戳回退,其中,该媒体帧序列为该缓存区中已缓存的多个媒体帧所构成的序列。
在一些实施例中,该计算机设备的一个或多个处理器还用于执行如下操作:
基于该缓存区中包括视频资源,在关键帧序列中关键帧的时间戳呈非单调递增时,确定该媒体帧序列呈非单调递增,其中,该关键帧序列为该缓存区中已缓存的多个关键帧所构成的序列;
基于该缓存区中不包括视频资源,在音频帧序列中音频帧的时间戳呈非单调递增时,确定该媒体帧序列呈非单调递增,其中,该音频帧序列为该缓存区中已缓存的多个音频帧所构成的序列。
在一些实施例中,该计算机设备的一个或多个处理器还用于执行如下操作:
将最后一个单调递增阶段所包含的各个媒体帧确定为当前有效缓存区内的资源。
在一些实施例中,该计算机设备的一个或多个处理器用于执行如下操作:
确定该起始帧为当前有效缓存区中时间戳最接近该目标时间戳的媒体帧。
在一些实施例中,基于该音频参数为默认值或该音频参数为假,基于当前有效缓存区中包括视频资源,该最大时间戳为最大视频时间戳;基于当前有效缓存区中不包括视频资源,该最大时间戳为最大音频时间戳。
在一些实施例中,该计算机设备的一个或多个处理器用于执行如下操作:
基于该帧获取请求携带拉取位置参数,解析该帧获取请求得到该拉取位置参数;
基于该帧获取请求缺省拉取位置参数,将该拉取位置参数配置为默认值。
在一些实施例中,该计算机设备的一个或多个处理器用于执行如下操作:
基于该帧获取请求,解析得到该多媒体资源的地址信息;
从该起始帧开始发送该地址信息所指示的多媒体资源的媒体帧。
在一些实施例中,还提供了一种计算机程序产品,包括一条或多条指令,该一条或多条指令可以由计算机设备的处理器执行,以完成上述各个实施例提供的资源传输方法。

Claims (25)

  1. 一种资源传输方法,包括:
    响应于多媒体资源的帧获取请求,获取所述多媒体资源的拉取位置参数,所述帧获取请求用于请求传输所述多媒体资源的媒体帧,所述拉取位置参数用于表示所述多媒体资源的媒体帧的起始拉取位置;
    基于所述多媒体资源的拉取位置参数,确定所述多媒体资源的起始帧;
    从所述起始帧开始发送所述多媒体资源的媒体帧,其中,所述媒体帧的时间戳大于或等于所述起始帧的时间戳。
  2. 根据权利要求1所述的资源传输方法,所述基于所述多媒体资源的拉取位置参数,确定所述多媒体资源的起始帧包括:
    获取所述多媒体资源的音频参数,所述音频参数用于表示所述媒体帧是否为音频帧;
    基于所述音频参数和所述拉取位置参数,确定目标时间戳;
    基于所述目标时间戳,确定所述多媒体资源的起始帧。
  3. 根据权利要求2所述的资源传输方法,所述基于所述音频参数和所述拉取位置参数,确定目标时间戳包括:
    基于所述拉取位置参数为默认值,且所述音频参数为默认值或所述音频参数为假,确定所述目标时间戳为最大时间戳减去所述拉取位置参数的默认值的绝对值所得的数值;
    基于所述拉取位置参数为默认值,且所述音频参数为真,确定所述目标时间戳为最大音频时间戳减去所述拉取位置参数的默认值的绝对值所得的数值;
    基于所述拉取位置参数等于0,且所述音频参数为默认值或所述音频参数为假,确定所述目标时间戳为最大时间戳;
    基于所述拉取位置参数等于0,且所述音频参数为真,确定所述目标时间戳为最大音频时间戳;
    基于所述拉取位置参数小于0,且所述音频参数为默认值或所述音频参数为假,确定所述目标时间戳为最大时间戳减去所述拉取位置参数的绝对值所得的数值;
    基于所述拉取位置参数小于0,且所述音频参数为真,确定所述目标时间戳为最大音频时间戳减去所述拉取位置参数的绝对值所得的数值;
    基于所述拉取位置参数大于0,且所述音频参数为默认值或所述音频参数为假,在缓存区中发生时间戳回退时,确定所述目标时间戳为最大时间戳;
    基于所述拉取位置参数大于0,且所述音频参数为真,在缓存区中发生时间戳回退时,确定所述目标时间戳为最大音频时间戳;
    基于所述拉取位置参数大于0,且缓存区中未发生时间戳回退时,确定所述目标时间戳为所述拉取位置参数。
  4. 根据权利要求2所述的资源传输方法,基于所述拉取位置参数大于0且缓存区中未发生时间戳回退,所述基于所述目标时间戳,确定所述多媒体资源的起始帧包括:
    基于当前有效缓存区中存在目标媒体帧,确定所述起始帧为所述目标媒体帧,所述目标媒体帧的时间戳大于或等于所述目标时间戳且最接近所述目标时间戳;
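    when the target media frame does not exist in the current valid buffer area, entering a waiting state until the target media frame is written into the current valid buffer area, and determining the start frame as the target media frame.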
  5. 根据权利要求4所述的资源传输方法,所述方法还包括:
    基于所述当前有效缓存区中不存在所述目标媒体帧,且所述目标时间戳与最大时间戳之间的差值大于超时阈值,发送拉取失败信息。
  6. 根据权利要求2所述的资源传输方法,基于所述拉取位置参数大于0,所述方法还包括:
    基于缓存区中的媒体帧序列中媒体帧的时间戳呈非单调递增,确定所述缓存区发生时间戳回退;
    基于缓存区中的媒体帧序列中媒体帧的时间戳呈单调递增,确定所述缓存区未发生时间戳回退,其中,所述媒体帧序列为所述缓存区中已缓存的多个媒体帧所构成的序列。
  7. 根据权利要求6所述的资源传输方法,所述方法还包括:
    基于所述缓存区中包括视频资源,在关键帧序列中关键帧的时间戳呈非单调递增时,确定所述媒体帧序列呈非单调递增,其中,所述关键帧序列为所述缓存区中已缓存的多个关键帧所构成的序列;
    基于所述缓存区中不包括视频资源,在音频帧序列中音频帧的时间戳呈非单调递增时,确定所述媒体帧序列呈非单调递增,其中,所述音频帧序列为所述缓存区中已缓存的多个音频帧所构成的序列。
  8. 根据权利要求6所述的资源传输方法,所述方法还包括:
    将最后一个单调递增阶段所包含的各个媒体帧确定为当前有效缓存区内的资源。
  9. 根据权利要求2所述的资源传输方法,所述基于所述目标时间戳,确定所述多媒体资源的起始帧包括:
    确定所述起始帧为当前有效缓存区中时间戳最接近所述目标时间戳的媒体帧。
  10. 根据权利要求2所述的资源传输方法,基于所述音频参数为默认值或所述音频参数为假,基于当前有效缓存区中包括视频资源,所述最大时间戳为最大视频时间戳;基于当前有效缓存区中不包括视频资源,所述最大时间戳为最大音频时间戳。
  11. 根据权利要求1所述的资源传输方法,所述响应于多媒体资源的帧获取请求,获取所述多媒体资源的拉取位置参数包括:
    基于所述帧获取请求携带拉取位置参数,解析所述帧获取请求得到所述拉取位置参数;
    基于所述帧获取请求缺省拉取位置参数,将所述拉取位置参数配置为默认值。
  12. 根据权利要求1所述的资源传输方法,所述从所述起始帧开始发送所述多媒体资源的媒体帧包括:
    基于所述帧获取请求,解析得到所述多媒体资源的地址信息;
    从所述起始帧开始发送所述地址信息所指示的多媒体资源的媒体帧。
  13. 一种计算机设备,包括:
    一个或多个处理器;
    用于存储所述一个或多个处理器可执行指令的一个或多个存储器;
    其中,所述一个或多个处理器被配置为执行所述指令,以实现如下操作:
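    in response to a frame acquisition request for a multimedia resource, obtaining a pull position parameter of the multimedia resource, the frame acquisition request being used to request transmission of media frames of the multimedia resource, and the pull position parameter indicating the start pull position of the media frames of the multimedia resource;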
    基于所述多媒体资源的拉取位置参数,确定所述多媒体资源的起始帧;
    从所述起始帧开始发送所述多媒体资源的媒体帧,其中,所述媒体帧的时间戳大于或等于所述起始帧的时间戳。
  14. 根据权利要求13所述的计算机设备,所述一个或多个处理器被配置为执行所述指令,以实现如下操作:
    获取所述多媒体资源的音频参数,所述音频参数用于表示所述媒体帧是否为音频帧;
    基于所述音频参数和所述拉取位置参数,确定目标时间戳;
    基于所述目标时间戳,确定所述多媒体资源的起始帧。
  15. 根据权利要求14所述的计算机设备,所述一个或多个处理器被配置为执行所述指令,以实现如下操作:
    基于所述拉取位置参数为默认值,且所述音频参数为默认值或所述音频参数为假,确定所述目标时间戳为最大时间戳减去所述拉取位置参数的默认值的绝对值所得的数值;
    基于所述拉取位置参数为默认值,且所述音频参数为真,确定所述目标时间戳为最大音频时间戳减去所述拉取位置参数的默认值的绝对值所得的数值;
    基于所述拉取位置参数等于0,且所述音频参数为默认值或所述音频参数为假,确定所述目标时间戳为最大时间戳;
    基于所述拉取位置参数等于0,且所述音频参数为真,确定所述目标时间戳为最大音频时间戳;
    基于所述拉取位置参数小于0,且所述音频参数为默认值或所述音频参数为假,确定所述目标时间戳为最大时间戳减去所述拉取位置参数的绝对值所得的数值;
    基于所述拉取位置参数小于0,且所述音频参数为真,确定所述目标时间戳为最大音频时间戳减去所述拉取位置参数的绝对值所得的数值;
    基于所述拉取位置参数大于0,且所述音频参数为默认值或所述音频参数为假,在缓存区中发生时间戳回退时,确定所述目标时间戳为最大时间戳;
    基于所述拉取位置参数大于0,且所述音频参数为真,在缓存区中发生时间戳回退时,确定所述目标时间戳为最大音频时间戳;
    基于所述拉取位置参数大于0,且缓存区中未发生时间戳回退时,确定所述目标时间戳为所述拉取位置参数。
  16. 根据权利要求14所述的计算机设备,基于所述拉取位置参数大于0且缓存区中未发生时间戳回退,所述一个或多个处理器被配置为执行所述指令,以实现如下操作:
    基于当前有效缓存区中存在目标媒体帧,确定所述起始帧为所述目标媒体帧,所述目标媒体帧的时间戳大于或等于所述目标时间戳且最接近所述目标时间戳;
    基于所述当前有效缓存区中不存在所述目标媒体帧,进入等待状态,直到所述目标媒体帧写入所述当前有效缓存区时,确定所述起始帧为所述目标媒体帧。
  17. 根据权利要求16所述的计算机设备,所述一个或多个处理器还被配置为执行所述指令,以实现如下操作:
    基于所述当前有效缓存区中不存在所述目标媒体帧,且所述目标时间戳与最大时间戳之间的差值大于超时阈值,发送拉取失败信息。
  18. 根据权利要求14所述的计算机设备,基于所述拉取位置参数大于0,所述一个或多个处理器还被配置为执行所述指令,以实现如下操作:
    基于缓存区中的媒体帧序列中媒体帧的时间戳呈非单调递增,确定所述缓存区发生时间戳回退;
    基于缓存区中的媒体帧序列中媒体帧的时间戳呈单调递增,确定所述缓存区未发生时间戳回退,其中,所述媒体帧序列为所述缓存区中已缓存的多个媒体帧所构成的序列。
  19. 根据权利要求18所述的计算机设备,所述一个或多个处理器还被配置为执行所述指令,以实现如下操作:
    基于所述缓存区中包括视频资源,在关键帧序列中关键帧的时间戳呈非单调递增时,确定所述媒体帧序列呈非单调递增,其中,所述关键帧序列为所述缓存区中已缓存的多个关键帧所构成的序列;
    基于所述缓存区中不包括视频资源,在音频帧序列中音频帧的时间戳呈非单调递增时,确定所述媒体帧序列呈非单调递增,其中,所述音频帧序列为所述缓存区中已缓存的多个音频帧所构成的序列。
  20. 根据权利要求18所述的计算机设备,所述一个或多个处理器还被配置为执行所述指令,以实现如下操作:
    将最后一个单调递增阶段所包含的各个媒体帧确定为当前有效缓存区内的资源。
  21. 根据权利要求14所述的计算机设备,所述一个或多个处理器被配置为执行所述指令,以实现如下操作:
    确定所述起始帧为当前有效缓存区中时间戳最接近所述目标时间戳的媒体帧。
  22. 根据权利要求14所述的计算机设备,基于所述音频参数为默认值或所述音频参数为假,基于当前有效缓存区中包括视频资源,所述最大时间戳为最大视频时间戳;基于当前有效缓存区中不包括视频资源,所述最大时间戳为最大音频时间戳。
  23. 根据权利要求13所述的计算机设备,所述一个或多个处理器被配置为执行所述指令,以实现如下操作:
    基于所述帧获取请求携带拉取位置参数,解析所述帧获取请求得到所述拉取位置参数;
    基于所述帧获取请求缺省拉取位置参数,将所述拉取位置参数配置为默认值。
  24. 根据权利要求13所述的计算机设备,所述一个或多个处理器被配置为执行所述指令,以实现如下操作:
    基于所述帧获取请求,解析得到所述多媒体资源的地址信息;
    从所述起始帧开始发送所述地址信息所指示的多媒体资源的媒体帧。
  25. 一种存储介质,当所述存储介质中的至少一条指令由计算机设备的一个或多个处理器执行时,使得计算机设备能够执行如下操作:
    响应于多媒体资源的帧获取请求,获取所述多媒体资源的拉取位置参数,所述帧获取请求用于请求传输所述多媒体资源的媒体帧,所述拉取位置参数用于表示所述多媒体资源的媒体帧的起始拉取位置;
    基于所述多媒体资源的拉取位置参数,确定所述多媒体资源的起始帧;
    从所述起始帧开始发送所述多媒体资源的媒体帧,其中,所述媒体帧的时间戳大于或等于所述起始帧的时间戳。
PCT/CN2020/131552 2020-01-17 2020-11-25 资源传输方法及计算机设备 WO2021143360A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20913803.1A EP3941070A4 (en) 2020-01-17 2020-11-25 RESOURCE TRANSFER METHOD AND COMPUTER DEVICE
US17/517,973 US20220060532A1 (en) 2020-01-17 2021-11-03 Method for transmitting resources and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010054760.6 2020-01-17
CN202010054760.6A CN113141522B (zh) 2020-01-17 2020-01-17 资源传输方法、装置、计算机设备及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/517,973 Continuation US20220060532A1 (en) 2020-01-17 2021-11-03 Method for transmitting resources and electronic device

Publications (1)

Publication Number Publication Date
WO2021143360A1 true WO2021143360A1 (zh) 2021-07-22

Family

ID=76809554

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/131552 WO2021143360A1 (zh) 2020-01-17 2020-11-25 资源传输方法及计算机设备

Country Status (4)

Country Link
US (1) US20220060532A1 (zh)
EP (1) EP3941070A4 (zh)
CN (1) CN113141522B (zh)
WO (1) WO2021143360A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113596499B (zh) * 2021-07-29 2023-03-24 北京达佳互联信息技术有限公司 直播数据处理方法、装置、计算机设备及介质
CN114697303B (zh) * 2022-03-16 2023-11-03 北京金山云网络技术有限公司 一种多媒体数据处理方法、装置、电子设备及存储介质
CN116489342B (zh) * 2023-06-20 2023-09-15 中央广播电视总台 确定编码延时的方法、装置、及电子设备、存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108540819A (zh) * 2018-04-12 2018-09-14 腾讯科技(深圳)有限公司 直播数据处理方法、装置、计算机设备和存储介质
CN108737908A (zh) * 2018-05-21 2018-11-02 腾讯科技(深圳)有限公司 一种媒体播放方法、装置及存储介质
CN110072123A (zh) * 2018-01-24 2019-07-30 中兴通讯股份有限公司 一种视频的恢复播放方法、视频播放终端及服务器
US20190342585A1 (en) * 2017-02-19 2019-11-07 Jonathan James Valliere System and Method for Intelligent Delivery of Segmented Media Streams

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2158747B1 (en) * 2007-06-20 2016-11-23 Telefonaktiebolaget LM Ericsson (publ) Method and arrangement for improved media session management
US20110096828A1 (en) * 2009-09-22 2011-04-28 Qualcomm Incorporated Enhanced block-request streaming using scalable encoding
CN102761524B (zh) * 2011-04-27 2017-06-23 中兴通讯股份有限公司 一种流媒体存储、播放方法及相应系统
CN102957672A (zh) * 2011-08-25 2013-03-06 中国电信股份有限公司 自适应播放flv媒体流的方法、客户端和系统
US9905269B2 (en) * 2014-11-06 2018-02-27 Adobe Systems Incorporated Multimedia content duration manipulation
CN106686445B (zh) * 2015-11-05 2019-06-11 北京中广上洋科技股份有限公司 对多媒体文件进行按需跳转的方法
US10638192B2 (en) * 2017-06-19 2020-04-28 Wangsu Science & Technology Co., Ltd. Live streaming quick start method and system
US20190104326A1 (en) * 2017-10-03 2019-04-04 Qualcomm Incorporated Content source description for immersive media data
CN110545491B (zh) * 2018-05-29 2021-08-10 北京字节跳动网络技术有限公司 一种媒体文件的网络播放方法、装置及存储介质
US11082752B2 (en) * 2018-07-19 2021-08-03 Netflix, Inc. Shot-based view files for trick play mode in a network-based video delivery system
CN110493324A (zh) * 2019-07-29 2019-11-22 咪咕视讯科技有限公司 下载方法、下载器及计算机可读存储介质

Also Published As

Publication number Publication date
US20220060532A1 (en) 2022-02-24
EP3941070A1 (en) 2022-01-19
EP3941070A4 (en) 2022-06-01
CN113141522B (zh) 2022-09-20
CN113141522A (zh) 2021-07-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20913803

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 20913803.1

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2020913803

Country of ref document: EP

Effective date: 20211014

NENP Non-entry into the national phase

Ref country code: DE