WO2022002070A1

WO2022002070A1 - Adaptive real-time delivery method for media stream, and server

Info

Publication number: WO2022002070A1
Application number: PCT/CN2021/103196
Authority: WO
Inventors: 姜红旗; 辛振涛; 姜红艳; 申素辉
Original assignee: 北京开广信息技术有限公司
Priority date: 2020-06-30
Filing date: 2021-06-29
Publication date: 2022-01-06
Also published as: CN113873343B; CN113873343A

Abstract

Disclosed in the present application are an adaptive real-time delivery method for a media stream, and a server. The method comprises: receiving a media segment request sent by a client, wherein the media segment request carries at least one pulling command; generating a media segment according to the media segment request, comprising: for each pulling command in the media segment request, selecting a target media stream to be transmitted, selecting at least one target media sub-stream to be transmitted in the target media stream, determining a candidate media unit to be transmitted in the target media sub-stream, and encapsulating the candidate media unit determined by each pulling command into the media segment; and sending the media segment to the client. According to embodiments of the present application, the media units of the selected sub-streams can be combined in real time according to the request of the client to generate a media segment, thereby simplifying the synchronous transmission between the sub-streams while reducing the storage overhead on the server, and adaptive real-time transmission of various multi-sub-stream media streams is supported uniformly.

Description

Adaptive real-time delivery method and server for media stream

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority of the Chinese patent application number "202010614997.5", filed by Beijing Kaiguang Information Technology Co., Ltd. on June 30, 2020, with the invention titled "Adaptive Real-time Delivery Method and Server for Media Streams".

TECHNICAL FIELD

The present application relates to the technical field of digital information transmission, and in particular, to an adaptive real-time delivery method and server of a media stream.

BACKGROUND

With the rapid development of the Internet, especially the mobile Internet, real-time transmission of multimedia information such as audio, video, and images over the Internet has become a basic requirement for many applications (such as webcasting, real-time monitoring, and video conferencing). To meet this demand, various real-time streaming technologies have been proposed. Three types are currently in wide use: RTP (Real-time Transport Protocol) / RTSP (Real Time Streaming Protocol), RTMP (Real Time Messaging Protocol), and HTTP (HyperText Transfer Protocol) adaptive streaming, or HAS (HTTP Adaptive Streaming). HTTP adaptive streaming includes several schemes: HLS (HTTP Live Streaming) proposed by Apple, Smooth Streaming proposed by Microsoft, HDS (HTTP Dynamic Streaming) proposed by Adobe, and DASH (Dynamic Adaptive Streaming over HTTP) proposed by MPEG.

The common feature of the above HTTP adaptive streaming schemes is that the media stream is cut into short media segments (2 s to 10 s), and an index file or manifest file describing these media segments is generated at the same time (such as the m3u8 playlist in HLS or the MPD file in DASH). The segments and the manifest are then stored on web servers. The client obtains the URLs (Uniform Resource Locators) of the media segments by accessing the playlist or manifest file, and can then download and play the media segments one by one over HTTP. The main difference between these schemes lies in the encapsulation format of the media segments and the format of the manifest file.

Compared with RTP/RTSP and RTMP, HTTP adaptive streaming is easy to deploy on common web servers and fits the existing Internet infrastructure, including CDNs, caches, firewalls, and NATs, so it can support large-scale user access. In addition, by providing media segments at multiple bit rates, it lets the client select segments with a suitable bit rate according to network conditions and terminal capabilities, thereby achieving bit rate adaptation. HTTP adaptive streaming has therefore become the mainstream way of delivering real-time streaming media on the Internet.

With the development of multimedia technology, media streams have become increasingly complex. The earliest media streams usually included only one audio stream and/or one video stream, but in the future a media stream transmitted over the Internet may include dozens of media sub-streams. This shows in the following aspects: 1) Multiple types of media sub-streams: the same scene can generate media sub-streams of many types, including video, audio, subtitles, pictures, auxiliary information, and data, and these media sub-streams need to be transmitted together; 2) Multi-bit-rate encoding: to adapt to varying network bandwidth and to the processing capabilities of different terminals, the same video stream can be encoded into multiple sub-streams with different resolutions, frame rates, and bit rates, and an audio stream can be encoded into multiple sub-streams with different languages, sampling rates, and bit rates; 3) Multi-view video: to obtain a more realistic video experience, the same scene generates multiple video sub-streams from different viewpoints, as in 3D video or free-viewpoint video; 4) Multi-channel audio: to obtain an immersive audio experience, the same scene is sampled from different positions to generate multiple audio sub-streams; 5) Scalable Video Coding (SVC): to adapt to varying network bandwidth, one video is encoded into a base layer and several enhancement layers. Further, any combination of the above aspects (e.g., multi-view video with multi-rate coding or scalable coding for each view) results in a very large number of media sub-streams in a media stream.

In the related art, when the various HTTP adaptive streaming protocols transmit media streams containing multiple sub-streams as described above, they all need to pre-segment the media streams and generate corresponding manifest files (such as the M3U8 file in HLS or the MPD file in DASH). There are two pre-segmentation schemes:

Scheme 1, combined sub-stream segmentation: video sub-stream segments and audio sub-stream segments covering the same time range are encapsulated in the same media segment, which corresponds to one HTTP URL. The client needs only one request to obtain the corresponding video and audio segments, which ensures synchronization between the sub-streams and simplifies processing at the receiving end. However, as the number of video sub-streams and audio sub-streams grows, the number of combinations of video and audio sub-streams increases rapidly, and each combination generates a new segment. This leads to repeated storage of the video and audio sub-streams on the server and increases the server's storage overhead.

Scheme 2, independent sub-stream segmentation: each sub-stream is segmented independently, while time alignment is maintained between the segments of different sub-streams, and each sub-stream segment corresponds to one URL. With this approach, the client can request the segments of each sub-stream as needed, and the server does not need to store combined segments. However, the client must submit multiple requests to obtain the segments of different sub-streams, which increases transmission overhead and makes synchronization harder. In addition, when each sub-stream is pre-segmented, synchronization between the segments of different sub-streams must be strictly ensured, which makes segmentation more complex.
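
The trade-off between the two schemes can be made concrete with a small count. The sketch below is illustrative only: it assumes that every combined segment pairs exactly one video sub-stream with one audio sub-stream, and the function names and the sub-stream counts are hypothetical rather than taken from the application.

```python
def stored_segments_per_interval(num_video: int, num_audio: int) -> dict:
    """Count the segments a server keeps for one time interval."""
    return {
        # Scheme 1: one combined segment per (video, audio) pairing.
        "combined": num_video * num_audio,
        # Scheme 2: one independent segment per sub-stream.
        "independent": num_video + num_audio,
    }


def requests_per_interval(num_selected_substreams: int) -> dict:
    """Count the client requests needed to fetch one time interval."""
    return {
        # Scheme 1: the chosen combination arrives in a single segment.
        "combined": 1,
        # Scheme 2: one request per selected sub-stream.
        "independent": num_selected_substreams,
    }


# Assumed example: 6 video renditions and 4 audio renditions,
# with the client playing one video and one audio sub-stream.
storage = stored_segments_per_interval(num_video=6, num_audio=4)
requests = requests_per_interval(num_selected_substreams=2)
print(storage, requests)
```

Under these assumptions, Scheme 1 stores the product of the sub-stream counts while Scheme 2 stores only their sum but needs one request per selected sub-stream, which is the tension described above.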

In addition, the above HTTP adaptive streaming schemes have another problem: in order to support real-time transmission, the server needs to continuously update its manifest file, and the client needs to obtain the manifest file before it can obtain the URL of the latest media segment. Since the manifest file reaches the client only after some delay, the manifest file obtained by the client may not reflect the latest media segments generated on the server, which affects the real-time transmission performance of the media stream. When the number of sub-streams or sub-stream combinations in the media stream reaches dozens, the manifest file becomes very complicated, further increasing the transmission overhead and the processing overhead of the client receiving the media stream.

To sum up, the HTTP adaptive streaming transmission scheme based on pre-segmentation and manifest file is not suitable for adaptive real-time delivery of media streams containing many sub-streams, and a new delivery method needs to be designed for it.

SUMMARY OF THE INVENTION

The present application aims to solve one of the technical problems in the related art at least to a certain extent.

Therefore, the first object of the present application is to propose an adaptive real-time delivery method for media streams, which simplifies synchronous transmission between sub-streams while reducing the storage overhead on the server, and supports adaptive real-time delivery of various types of multi-sub-stream media streams (e.g., those using multi-rate coding, multi-view, multi-channel, or scalable coding).

The second object of the present application is to propose an adaptive real-time delivery server for media streams.

The third object of the present application is to propose a computer device.

The fourth object of the present application is to provide a non-transitory computer-readable storage medium.

In order to achieve the above objects, an embodiment of the present application proposes an adaptive real-time delivery method for a media stream. The media stream includes at least one media sub-stream, and each media sub-stream is a sequence of media units generated in real time on a server, wherein each media sub-stream is associated with a sub-stream number, and each media unit is associated with a generation time and/or a sequence number indicating the order in which the media unit is generated in the media sub-stream. The method includes the following steps: receiving a media segment request sent by a client, wherein the media segment request carries at least one pull command, the pull command carries either no control parameter or at least one control parameter, and the control parameters include a first-type parameter indicating the target media stream to be transmitted, a second-type parameter indicating the target media sub-stream to be transmitted, and a third-type parameter indicating the candidate media units to be transmitted; generating a media segment according to the media segment request, wherein, for each pull command in the media segment request, the target media stream to be transmitted is selected, at least one target media sub-stream to be transmitted in the target media stream is selected, the candidate media units to be transmitted in the target media sub-stream are determined, and the candidate media units determined by each pull command are encapsulated into the media segment; and sending the media segment to the client.

The adaptive real-time delivery method for media streams according to the embodiment of the present application can combine the media units of the sub-streams in any way requested by the client, generate the media segment in real time, and deliver it to the client. First, the server only needs to store the media units of each sub-stream and does not need to generate segments for the various sub-stream combinations in advance, which reduces the server's storage requirements. At the same time, synchronization processing on the client is simplified: with a single request the client obtains a combined segment of the sub-streams for the same time period, so synchronous reception of the sub-streams is easy to ensure. Second, the client can dynamically adjust the target media sub-streams in the media segment request according to application needs and network conditions, so that adaptive delivery of various types of multi-sub-stream media streams (such as those using multi-rate coding, multi-view, multi-channel, or scalable coding) is supported uniformly. Finally, since each media segment is generated in response to the client's request, no manifest file is required regardless of how many sub-streams the media stream includes, and the client does not need to request and parse a manifest file, which avoids the complexity that manifest files introduce. Therefore, the real-time transmission delay and transmission overhead of the media stream can be effectively reduced.

In order to achieve the above objects, an embodiment of the present application proposes an adaptive real-time delivery server for a media stream. The media stream includes at least one media sub-stream, and each media sub-stream is a sequence of media units generated in real time on the server, wherein each media sub-stream is associated with a sub-stream number, and each media unit is associated with a generation time and/or a sequence number indicating the order in which the media unit is generated in the media sub-stream. The server includes: a client interface component, used to receive a media segment request sent by the client, wherein the media segment request carries at least one pull command, the pull command carries either no control parameter or at least one control parameter, and the control parameters include a first-type parameter indicating the target media stream to be transmitted, a second-type parameter indicating the target media sub-stream to be transmitted, and a third-type parameter indicating the candidate media units to be transmitted; a media segment generating component, used to generate a media segment according to the media segment request, wherein, for each pull command in the media segment request, the target media stream to be transmitted is selected, at least one target media sub-stream to be transmitted in the target media stream is selected, the candidate media units to be transmitted in the target media sub-stream are determined, and the candidate media units determined by each pull command are encapsulated into the media segment; and a media segment sending component, used to send the generated media segment to the client.

The adaptive real-time delivery server for media streams according to the embodiment of the present application can combine the media units of the sub-streams in any way requested by the client, generate media segments in real time, and deliver them to the client. First, the server only needs to store the media units of each sub-stream and does not need to generate segments for the various sub-stream combinations in advance, which reduces the server's storage requirements. At the same time, synchronization processing on the client is simplified: with a single request the client obtains a combined segment of the sub-streams for the same time period, so synchronous reception of the sub-streams is easy to ensure. Second, the client can dynamically adjust the target media sub-streams in the media segment request according to application needs and network conditions, so that adaptive delivery of various types of multi-sub-stream media streams (such as those using multi-rate coding, multi-view, multi-channel, or scalable coding) is supported uniformly. Finally, since each media segment is generated in response to the client's request, no manifest file is required regardless of how many sub-streams the media stream includes, and the client does not need to request and parse a manifest file, which avoids the complexity that manifest files introduce. Therefore, the real-time transmission delay and transmission overhead of the media stream can be effectively reduced.

An embodiment of the present application provides a computer device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are configured to perform the adaptive real-time delivery method for media streams described in the above embodiments.

Embodiments of the present application provide a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to execute the adaptive real-time delivery method for media streams described in the foregoing embodiments.

Additional aspects and advantages of the present application will be set forth, in part, in the following description, and in part will be apparent from the following description, or learned by practice of the present application.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, wherein:

FIG. 1 is a schematic diagram of a processing process of a method for adaptive real-time delivery of media streams according to an embodiment of the present application;

FIG. 2 is a schematic diagram of an adaptive real-time transmission process of a media stream according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a substream pattern according to an embodiment of the present application;

FIG. 4 is a schematic diagram of media substream description information (including multi-rate coding substreams) according to an embodiment of the present application;

FIG. 5 is a schematic diagram of media substream description information (including multi-view video substreams) according to an embodiment of the present application;

FIG. 6 is a schematic diagram of media substream description information (including scalable coding substreams) according to an embodiment of the present application;

FIG. 7 is a schematic diagram of an adaptive real-time transmission process of a media stream according to an embodiment of the present application;

FIG. 8 is a schematic diagram of an adaptive real-time transmission process of a media stream according to an embodiment of the present application;

FIG. 9 is a schematic diagram of candidate media unit encapsulation under different media unit sorting modes according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of an adaptive real-time delivery server for media streams according to an embodiment of the present application;

FIG. 11 is a schematic structural diagram of an adaptive real-time delivery server for media streams according to a specific embodiment of the present application;

FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

DETAILED DESCRIPTION

The following describes in detail the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary, and are intended to be used to explain the present application, but should not be construed as a limitation to the present application.

On the Internet, it is often necessary to transfer various real-time audio streams, video streams, or data streams from one network node to another. These network nodes include various terminals, such as PCs, mobile phones, and tablet computers, as well as various application servers, such as video servers and audio servers. Here, the transmitted audio streams, video streams, or data streams are collectively referred to as media streams. The delivery process of a media stream can be described by a general client-server model: the server delivers the generated media stream to the client in real time. The server and the client here are logical functional entities: the server is the functional entity that sends a media stream, and the client is the functional entity that receives it. Servers and clients can exist on any network node.

In many application scenarios, multiple audio streams, video streams, or data streams need to be transmitted synchronously. For example, a live media stream of a concert includes at least one video stream and at least one audio stream. Once any one or more of multi-rate coding, multi-view coding, multi-channel coding, and scalable coding is adopted, the live stream contains multiple data streams, video streams, or audio streams. Here, all synchronously transmitted video streams, audio streams, or data streams in a media stream to be transmitted are referred to as media sub-streams of the media stream.

Each delivered media substream is a sequence of media units generated in real time on the server. The media unit can be chosen to suit each media substream. When the media substream is a byte stream generated in real time, one byte can be chosen as the media unit; when the media substream is an audio or video stream obtained by real-time sampling, a raw audio frame or video frame can be chosen as the media unit; when the media substream is an audio or video stream sampled and encoded in real time, an encoded audio frame, an encoded video frame, or an Access Unit can be chosen as the media unit; when the media substream is an audio or video stream that has been sampled, encoded, and encapsulated in real time, an encapsulated transport packet (such as an RTP packet or a PES/PS/TS packet) can be chosen as the media unit; when the media substream is an audio or video stream that has been sampled, encoded, encapsulated, and pre-segmented in real time, a media segment (such as a TS segment used in HLS or an fMP4 segment used in DASH) can be chosen as the media unit.

Each media unit can be associated with a generation time, which is usually a timestamp. Each media unit may also be associated with a sequence number, which may be used to indicate the order in which the media units are generated in the media substream. When the sequence number is used to indicate the generation order, its meaning needs to be defined according to the specific media unit. When the media unit is one byte, the sequence number of the media unit is the byte sequence number; when the media unit is an audio frame or a video frame, the sequence number is the frame sequence number; when the media unit is a transport packet, the sequence number is the packet sequence number; when the media unit is a stream segment, the sequence number is the segment sequence number (such as the Media Sequence of each TS segment in HLS). The media units of a media substream can be associated with both a sequence number indicating the generation order and a generation time. For example, when the media substream is an RTP packet stream, the RTP header has a Sequence Number field indicating the order of the RTP packets and a Timestamp field indicating the generation time of the media data encapsulated in the RTP packet.

In order to identify different media substreams in a media stream, the media substreams need to be numbered: each media substream is associated with a unique substream number. Generally speaking, when a media stream includes N media substreams, the corresponding substreams are numbered 1, 2, . . . , N. The generation time and/or sequence number of each media substream may be used to describe the generation sequence of each media unit.
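
The relationship between media streams, numbered substreams, and media units can be sketched as a small data model. This is a minimal illustration, not the structure used by the application: the class names, the field names, the millisecond time unit, and the choice to start sequence numbers at 1 are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MediaUnit:
    payload: bytes
    gen_time_ms: Optional[int] = None  # generation time (assumed unit: ms)
    seq: Optional[int] = None          # generation order within the substream


@dataclass
class MediaSubstream:
    number: int                        # unique substream number within the stream
    units: List[MediaUnit] = field(default_factory=list)

    def append(self, payload: bytes, gen_time_ms: int) -> MediaUnit:
        """Store a newly generated unit with the next sequence number."""
        unit = MediaUnit(payload, gen_time_ms, seq=len(self.units) + 1)
        self.units.append(unit)
        return unit


@dataclass
class MediaStream:
    stream_id: str
    substreams: Dict[int, MediaSubstream] = field(default_factory=dict)

    def add_substream(self) -> MediaSubstream:
        """Number substreams 1, 2, ..., N in the order they are added."""
        number = len(self.substreams) + 1
        substream = MediaSubstream(number)
        self.substreams[number] = substream
        return substream


stream = MediaStream("601")
video = stream.add_substream()
audio = stream.add_substream()
first = video.append(b"frame-0", gen_time_ms=31000)
second = video.append(b"frame-1", gen_time_ms=31040)
```

Here each unit carries both a generation time and a sequence number, although the text allows either one on its own.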

The generation time of the media units of different media substreams may be synchronous timing or independent timing. When independent timing is adopted, the generation times of different media sub-streams are derived from asynchronous clocks. Therefore, it is necessary to separately record the corresponding relationship between the generation times of these different media sub-streams. When synchronous timing is used, the generation times of different media sub-streams are derived from the same reference clock, and the synchronization relationship of media units in different media sub-streams can be known through the generation times. For the sake of simplicity, it is assumed that in all the following embodiments, the generation time of all media substreams in one media stream uses the same reference clock on the server, which corresponds to the same time line, such as Greenwich Mean Time.

In this embodiment of the present application, a media stream includes at least one media substream, wherein each media substream may be of any type, such as an audio stream, a video stream, or a subtitle stream, and each media substream may also adopt any transport encapsulation type, such as an RTP packet stream or an MPEG2-TS stream. When the media substream is an RTP packet stream, the media unit is an RTP packet, the Sequence Number of the RTP packet is the sequence number of the media unit, and the Timestamp of the RTP packet is the timestamp of the media unit. When the media substream is an MPEG2-TS stream, a method similar to HLS/DASH can be used to divide the TS stream into TS segments of a fixed duration (such as about 1 second), and each TS segment is regarded as a media unit. Each TS segment may include multiple media frames. The segments are numbered in the order of generation, and this number serves as the sequence number of the media unit; the timestamp of the first media frame in each segment indicates the generation time of the segment.
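
The fixed-duration segmentation described for MPEG2-TS streams can be sketched as follows. This is a simplified illustration under stated assumptions: frames are modeled as hypothetical `(timestamp_ms, data)` pairs rather than real TS packets, segments are cut purely on elapsed time (a real segmenter would typically also cut at key frames), and the trailing partial segment is emitted even though a live server would keep it open.

```python
from typing import List, NamedTuple, Tuple

Frame = Tuple[int, bytes]  # (timestamp in ms, frame data); assumed representation


class Segment(NamedTuple):
    seq: int             # segment number, in order of generation
    gen_time_ms: int     # timestamp of the first frame in the segment
    frames: List[Frame]


def segment_frames(frames: List[Frame], duration_ms: int = 1000,
                   first_seq: int = 1) -> List[Segment]:
    """Group frames into segments of roughly duration_ms each."""
    segments: List[Segment] = []
    current: List[Frame] = []
    for ts, data in frames:
        # Close the open segment once it spans the target duration.
        if current and ts - current[0][0] >= duration_ms:
            segments.append(Segment(first_seq + len(segments), current[0][0], current))
            current = []
        current.append((ts, data))
    if current:
        segments.append(Segment(first_seq + len(segments), current[0][0], current))
    return segments


frames = [(0, b"a"), (400, b"b"), (800, b"c"), (1000, b"d"), (1500, b"e"), (2100, b"f")]
segments = segment_frames(frames)
```

Each resulting segment plays the role of one media unit: its `seq` is the media unit's sequence number and its `gen_time_ms` is the media unit's generation time.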

In traditional real-time streaming media protocols such as RTP or RTMP, a server push approach is used: once a new media unit is available on the server, it is actively sent to the client. The method of the embodiments of the present application, like the various HTTP adaptive streaming schemes (such as HLS and MPEG-DASH), uses a client pull approach. The difference is that in existing HTTP adaptive streaming schemes, the client requests pre-segmented segments according to a manifest file, and each segment is identified by a URL, whereas in the embodiments of the present application the media segment is not pre-segmented but is generated by the server in real time according to the client's request, so the client can control the content of the media segment.

Specifically, FIG. 1 is a schematic diagram of a processing process of a method for adaptive real-time delivery of a media stream provided by an embodiment of the present application.

As shown in FIG. 1, the media stream includes at least one media substream, and each media substream is a sequence of media units generated in real time on the server, wherein each media substream is associated with a substream number, and each media unit is associated with a generation time and/or a sequence number indicating the order in which the media unit is generated in the media substream. The adaptive real-time delivery method for the media stream comprises the following steps:

In step S101, a media segment request sent by the client is received, wherein the media segment request carries at least one pull command, the pull command carries either no control parameter or at least one control parameter, and the control parameters include a first-type parameter indicating the target media stream to be transmitted, a second-type parameter indicating the target media sub-stream to be transmitted, and a third-type parameter indicating the candidate media units to be transmitted.

In some examples, the control parameters that can be used as first-type parameters include but are not limited to: media stream identifier, media stream name, and program identifier; the control parameters that can be used as second-type parameters include but are not limited to: substream list, substream pattern, substream type, and substream priority; the control parameters that can be used as third-type parameters include but are not limited to: start sequence number, start time, maximum time offset, unit type, and unit priority. It should be understood by those skilled in the art that new control parameters can also be defined as needed in a particular implementation.

In some cases, a media segment request may carry one or more pull commands, and these pull commands all carry respective control parameters, or a pull command may not carry any control parameters. Additionally, new commands other than pull commands can be defined as needed for further implementation.

In some embodiments, the media segment request may be submitted using any network transmission protocol, such as HTTP, TCP, or UDP. When HTTP is used to submit the media segment request, either the HTTP GET method or the HTTP POST method can be used.

When the pull command in the media segment request carries control parameters, certain encapsulation rules need to be used to encapsulate the pull command and its control parameters into a string or byte stream, and then send it to the server. For example, when using HTTP-GET to send a media segment request, the command and its control parameters can be encapsulated in the URL as strings.

An example of a media segment request using HTTP-GET is as follows:

Media segment request with a pull command (without control parameters):

GET "http://www.xxx-server.com/msreq?cmd=PULL"

Media segment request with a pull command (with 1 control parameter):

GET "http://www.xxx-server.com/msreq?cmd=PULL&streamID=601"

GET "http://www.xxx-server.com/msreq?cmd=PULL&substreamList=2,3"

GET "http://www.xxx-server.com/msreq?cmd=PULL&substreamPattern=01100100"

GET "http://www.xxx-server.com/msreq?cmd=PULL&seqBegin=1003"

GET "http://www.xxx-server.com/msreq?cmd=PULL&timeBegin=31000"

GET "http://www.xxx-server.com/msreq?cmd=PULL&maxTimeOffset=1000"

A media segment request carrying a pull command (with multiple control parameters) (split the long URL string into multiple lines for easy display):

GET "http://www.xxx-server.com/msreq?cmd=PULL&streamID=601&timeBegin=32000"

GET "http://www.xxx-server.com/msreq?cmd=PULL&substreamList=1,2&seqBegin=1005"

GET "http://www.xxx-server.com/msreq?
cmd=PULL&substreamList=3&seqBegin=1005&unitType=3"

GET "http://www.xxx-server.com/msreq?
cmd=PULL&substreamPattern=1101&timeBegin=32000&unitPrio=1"

Media segment request with multiple pull commands: (split long URL strings into multiple lines for easy display)

GET "http://www.xxx-server.com/msreq?

cmd=PULL&streamID=601&substreamList=1&timeBegin=32000&

cmd=PULL&streamID=601&substreamList=2&seqBegin=1005"

GET "http://www.xxx-server.com/msreq?

cmd=PULL&streamID=601&substreamPattern=0100001&timeBegin=32000&

cmd=PULL&streamID=601&substreamList=2&seqBegin=1005"

GET "http://www.xxx-server.com/msreq?

cmd=PULL&streamID=601&substreamPattern=0100001&timeBegin=32000&

cmd=PULL&streamID=602&substreamPattern=0011&timeBegin=32000"

In the URLs of the above requests, the parameter names streamID, substreamList, substreamPattern, seqBegin, timeBegin, maxTimeOffset, unitType, and unitPrio respectively represent the media stream ID, substream list, substream pattern, start sequence number, start time, maximum time offset, unit type, and unit priority.

The server side can use a web server to receive the media segment request from the above client, extract the corresponding command and its control parameters from the requested URL, and classify the control parameters carried by each pull command: if it is a media stream identifier or media stream name, this parameter is the first type parameter; if it is a substream list or substream pattern, this parameter is the second type parameter; if it is one of the following parameters: start sequence number, start time, maximum time offset, unit type, unit priority, then this parameter is the third type of parameter.
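The extraction and classification described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: it assumes that every `cmd=PULL` in the query string starts a new pull command and that the parameters following it belong to that command. The function name `parse_pull_commands` and the result layout are hypothetical, and only the parameter names that appear in the example URLs are classified.

```python
from urllib.parse import urlsplit, parse_qsl

# Parameter names taken from the example URLs; the grouping follows the text.
FIRST_TYPE = {"streamID"}
SECOND_TYPE = {"substreamList", "substreamPattern"}
THIRD_TYPE = {"seqBegin", "timeBegin", "maxTimeOffset", "unitType", "unitPrio"}


def parse_pull_commands(url):
    """Split a request URL into pull commands and classify their parameters."""
    commands = []
    # parse_qsl keeps the order of the parameters, which is needed here
    # because a repeated cmd=PULL marks the start of the next command.
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if name == "cmd":
            if value != "PULL":
                raise ValueError("unsupported command: " + value)
            commands.append({"first": {}, "second": {}, "third": {}})
        elif not commands:
            raise ValueError("parameter appears before any command: " + name)
        elif name in FIRST_TYPE:
            commands[-1]["first"][name] = value
        elif name in SECOND_TYPE:
            commands[-1]["second"][name] = value
        elif name in THIRD_TYPE:
            commands[-1]["third"][name] = value
        else:
            raise ValueError("unknown control parameter: " + name)
    return commands
```

Applied to the first multi-command example above, this sketch yields two commands, the first with `timeBegin` and the second with `seqBegin` as its third-type parameter.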

In step S102, a media segment is generated according to the media segment request, wherein, for each pull command in the media segment request, a target media stream to be transmitted is selected, at least one target media substream to be transmitted is selected in the target media stream, the candidate media units to be transmitted are determined in the target media substream, and the candidate media units determined by each pull command are encapsulated into the media segment.

Specifically, generating the media segment according to the media segment request can be further divided into sub-steps S1021 to S1024. For each pull command in the media segment request, step S1021 selects the target media stream to be transmitted, step S1022 selects the target media substream in that target media stream according to the second type parameter, and step S1023 determines the candidate media units to be transmitted in that target media substream according to the third type parameter. Step S1024 then encapsulates the candidate media units determined by all pull commands into the media segment.

In some instances, in step S1021, the target media stream to be transmitted may be selected according to the media stream identifier or the media stream name; in step S1022, the target media substream may be selected according to parameters such as the substream list and substream pattern; in step S1023, the candidate media units can be determined according to parameters such as the start sequence number, start time, and maximum time offset; and in step S1024, one or more media units can be encapsulated into a media segment using a self-defined encapsulation protocol. For example, a simple encapsulation protocol is as follows: a media segment consists of a segment header and a segment payload, and the segment payload is formed by concatenating several media units. The segment header indicates the starting position and length of each media unit. When the media units do not carry a generation time or sequence number, the sequence number and/or generation time of each media unit shall also be indicated in the segment header, and when the media units do not carry a substream number, the substream number of each media unit shall also be indicated in the segment header.
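The simple encapsulation protocol described above can be illustrated as follows. The byte layout is an assumption made for this sketch and is not specified by the application: a 16-bit unit count, followed by a 32-bit starting position and a 32-bit length for each media unit, followed by the concatenated payload. The function names `pack_segment` and `unpack_segment` are hypothetical.

```python
import struct


def pack_segment(units):
    """Build a segment: header (count, then offset/length pairs) plus payload."""
    header = struct.pack("!H", len(units))
    offset = 0
    for unit in units:
        # The starting position is counted from the beginning of the payload.
        header += struct.pack("!II", offset, len(unit))
        offset += len(unit)
    return header + b"".join(units)


def unpack_segment(segment):
    """Recover the media units from a segment built by pack_segment."""
    (count,) = struct.unpack_from("!H", segment, 0)
    payload_start = 2 + 8 * count
    units = []
    for index in range(count):
        offset, length = struct.unpack_from("!II", segment, 2 + 8 * index)
        begin = payload_start + offset
        units.append(segment[begin:begin + length])
    return units
```

A header that also carried the sequence number, generation time, or substream number of each unit would extend each per-unit entry in the same manner.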

In step S103, the media segment is sent to the client.

Specifically, the server can select an appropriate method to send the media segment to the client according to the protocol used by the client's media segment request. For example, when the received media segment request uses the HTTP GET method, the generated media segment can be sent in the HTTP response message by placing the media segment in the entity body of the response; if the media segment request is received through an established TCP connection, the generated media segment can be sent to the client directly through that TCP connection.

When the server receives continuous media segment requests from the client, the server continues to generate new media segments according to the client's requests. These new media segments encapsulate the media units of the selected target media substreams that have been newly generated and are waiting to be sent to the client. The client can parse these media segments to recover the media units of each target media substream in the real-time media stream. This process is shown in FIG. 2. The client can continuously adjust the control parameters carried by the pull command in the media segment request according to application needs or network transmission conditions, for example by changing the second type of parameters (such as the media substream list) and the third type of parameters (such as the start time, maximum time offset, and unit priority), to ensure the continuity and real-time performance of media stream delivery from server to client and its adaptability to dynamic network conditions.

Because media segments are generated on demand, the method of the embodiments of the present application no longer requires pre-segmentation or manifest files, and thus does not require the client to receive and process manifest files, which reduces transmission delay and saves overhead. At the same time, the client can arbitrarily combine media units from different media substreams through media segment requests, and needs only one request to obtain the required media units of each media substream, which makes it easy to ensure synchronous reception of different media substreams. Finally, by adjusting at any time the media substreams and candidate media units to be received, the method can better meet the needs of terminal applications and adapt to changes in network bandwidth, thereby supporting adaptive delivery of media streams that use multi-substream encoding (such as multi-channel coding or scalable coding).

It should be understood that the settings of the above steps S101, S102 and S103 are only for the convenience of description, and are not used to limit the execution sequence of the method. During specific implementation, each step may correspond to a functional entity that can run independently and interact with the others.

The above is a detailed description of the first embodiment, and the second embodiment will be described in detail below. In the following embodiments, an example will be given to illustrate how the server generates the media segment according to the media segment request.

Optionally, in an embodiment of the present application, generating the media segment according to the media segment request includes: if the pull command does not carry the first type parameter, the target media stream to be transmitted is the media stream specified by default; if the pull command does not carry the second type parameter, the target media substream to be transmitted is at least one media substream specified by default in the target media stream; if the pull command does not carry the third type parameter, the candidate media units include the media units specified by default in the target media substream. The media units specified by default are all the media units in the target media substream whose sequence number interval from the latest media unit is less than a first preset value, or all the media units in the target media substream whose generation time interval from the latest media unit is less than a second preset value. Both the first preset value and the second preset value are obtained according to the target media substream.

Specifically, when there is only one media stream to be transmitted in the server, the pull command sent by the client does not need to carry the first type of parameters, and that media stream is the selected target media stream. When there are multiple media streams to be transmitted in the server, one of them can be designated as the default media stream. When the pull command sent by the client does not carry any first-type parameters, the default media stream is selected as the target media stream.

In actual execution, the media substreams contained in any target media stream may be of various kinds. For example, the target media stream may contain different types of media substreams: video stream, audio stream, subtitle stream, additional information stream, picture stream, and so on. Media substreams of the same type may have different bit rates: for video streams, there may be media substreams corresponding to different resolutions and frame rates; for audio streams, there may be media substreams corresponding to different sampling rates. A video stream of a given type and bit rate may contain multiple coding layers (for example, when scalable video coding, SVC, is used), and these coding layers correspond to different priorities. Usually, the server should select, among all media substreams, one or more media substreams that suit most terminal displays and can be transmitted normally under most network bandwidth conditions as the default media substreams of the target media stream. When the pull command sent by the client does not carry any second-type parameters, these default media substreams are selected as the target media substreams.

For any target media substream, in order to ensure the real-time transmission of the media stream, one or more media units newly generated by the target media substream should be transmitted first. When the pull command sent by the client does not carry any third-type parameters, the server may use the default specified media units as candidate media units. These default specified media units are all media units in the target media substream whose sequence number interval from the latest media unit is less than the first preset value, or all media units in the target media substream whose generation time interval from the latest media unit is less than the second preset value. When the pull command includes multiple target media substreams and does not carry any third-type parameters, the first preset value or the second preset value set for each target media substream shall ensure that the target media substreams are sent synchronously.
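The two default selection rules can be sketched as follows. The `MediaUnit` type, its field names, and the millisecond time unit are assumptions made for illustration; the comparison against the latest media unit follows the "less than the preset value" wording above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaUnit:
    seq: int       # sequence number within the substream
    time_ms: int   # generation time in milliseconds (assumed unit)


def default_units_by_seq(units, first_preset):
    """Units whose sequence-number distance to the latest unit is < first_preset."""
    if not units:
        return []
    latest = max(u.seq for u in units)
    return [u for u in units if latest - u.seq < first_preset]


def default_units_by_time(units, second_preset_ms):
    """Units whose generation-time distance to the latest unit is < second_preset_ms."""
    if not units:
        return []
    latest = max(u.time_ms for u in units)
    return [u for u in units if latest - u.time_ms < second_preset_ms]
```

With a first preset value of 3, for example, the rule selects the latest media unit and the two units before it.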

It can be understood from the description of the relevant embodiments that, with the above implementation, even when a pull command in the media segment request sent by the client does not carry a certain type of control parameter, the server can select the default target media stream, target media substream, or candidate media units, generate the media segment, and send it to the client.

FIG. 2 is a schematic diagram of a real-time transmission process of a media stream according to an embodiment of the present application. The server contains only one media stream S1. When the server receives a media segment request MS_REQ1, because MS_REQ1 contains only one pull command and that pull command does not carry any parameters, the target media stream selected by the server is the default media stream S1, and the selected target media substreams are the default media substreams 1 and 4. For media substream 1 the first preset value is 3, and for media substream 4 the first preset value is 4. The server therefore determines the candidate media units of media substream 1 and media substream 4 respectively, encapsulates them into the first media segment MS1, and returns MS1 to the client.

Embodiment 3. In the following embodiment, how the server selects the target media substream to be transmitted according to the second type of parameters will be described. The second type of parameters given in the embodiments of the present application include two types:

1) Substream list

The substream list directly gives the number of the target media substream. For example: substreamList=1,4 can be used to indicate that the selected target media substreams are substream 1 and substream 4.

2) Substream pattern

The substream pattern is an N-bit bit stream, where N is the number of media substreams contained in the target media stream. Each bit of the substream pattern is associated with a specific media substream of the target media stream and indicates whether that media substream is a target media substream to be transmitted. FIG. 3 shows an example of a substream pattern (N=8). The substream pattern is the bit stream 01101000, and from left to right the bits correspond to substream 1 to substream 8. A bit value of 1 indicates that the associated substream is a target media substream, so the substream pattern above selects three target media substreams: substream 2, substream 3 and substream 5.

Usually, in a specific implementation, when the number of target media substreams is small or the encapsulation order of the target media substreams needs to be specified, the substream list is recommended for representing the target media substreams; when the number of target media substreams is large and the specified substreams use the same clock, the substream pattern is recommended.

In other implementations, in addition to the substream number, the characteristics of a substream can also be defined as second-type parameters. These characteristics include: substream type, substream priority, viewpoint number, channel number, video resolution, and so on. One or more substream characteristic parameters can be used to indicate the conditions that the target media substream needs to meet, and the server selects the final target media substreams accordingly.
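The mapping between a substream pattern and substream numbers can be sketched as follows, assuming the pattern is carried as a text string of 0 and 1 characters, as in the URL examples. The function names are hypothetical.

```python
def substreams_from_pattern(pattern):
    """Return the substream numbers (1-based, left to right) whose bit is 1."""
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of 0 and 1")
    return [number for number, bit in enumerate(pattern, start=1) if bit == "1"]


def pattern_from_substreams(numbers, total):
    """Build the N-bit pattern that selects the given substream numbers."""
    chosen = set(numbers)
    if any(n < 1 or n > total for n in chosen):
        raise ValueError("substream number out of range")
    return "".join("1" if n in chosen else "0" for n in range(1, total + 1))
```

For the pattern 01101000 given above, `substreams_from_pattern` returns substreams 2, 3 and 5.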

Embodiment 4. In the following embodiments, an example will be given to illustrate how the server transmits the sub-stream related information to the client.

In a specific implementation, the client needs to specify the target media substream through the second type of parameters. This presupposes that the client knows which media substreams are included in the current media stream and what their characteristics are, so that the client can select the target media substream to be transmitted according to application requirements and network transmission conditions.

This descriptive information about the media substreams in the media stream can be provided by the server application layer to the client application layer. The client can obtain this information in a way that is independent of the current transmission process (such as submitting additional request messages or using a third-party server), or can obtain it directly from the server during the transmission process. In the embodiment of the present application, a method for directly encapsulating the media substream description information into a media segment and transmitting it to the client is proposed.

The minimum information contained in the media substream description information is: which media substreams are included in the current media stream. If the numbers of the media substreams are consecutively numbered from 1 to N, the media substream description information only needs to include the number N of the media substreams to obtain the numbers of all the media substreams. When the media stream adopts various multi-substream encoding, more substream feature information will be introduced into the media substream description information:

1) Media component identification

The media component identifier is used to indicate different ways of obtaining information in a media stream. For example, the video information collected by different cameras in a live broadcast corresponds to different components. Each media substream is associated with a media component, but the same media component can correspond to multiple media substreams. For example, a video captured from the same viewpoint can be represented by multiple media substreams encoded at different bit rates.

2) Substream type

The types of media substreams include but are not limited to: video, audio, picture, subtitle, and mixed types. A mixed type refers to a media substream that contains multiple types of media units; for example, a substream may contain both video and audio.

3) Substream bit rate

When a media substream has a constant bit rate (CBR), the substream bit rate indicates the bit rate of the media substream; when a media substream has a variable bit rate (VBR), the substream bit rate indicates the average bit rate of the substream over a period of time.

4) Substream priority

The priority of the media substream, used to indicate the importance of different media substreams in the transmission process.

5) Coding level

When the media stream adopts scalable coding, such as Scalable Video Coding (SVC), the media stream will generate multiple levels of coding streams, including a base layer and multiple enhancement layers, and each media substream corresponds to a coding level.

6) Viewpoint identification

When the media stream adopts multi-view encoding such as 3D video, the media stream will generate multiple encoded streams of different viewpoints, and each media substream corresponds to a viewpoint. When multiple viewpoints are jointly encoded into one media substream, there may be multiple viewpoint identifiers in one media substream.

7) Video resolution

When the type of a media substream is video, the video resolution used for encoding.

8) Video frame rate

When the type of a media substream is video, the frame rate used for video coding.

9) Channel identification

When the media stream adopts multi-channel encoding, the media stream will generate encoded data on multiple channels respectively. Several channels can form a channel group for multi-channel joint encoding. Each media substream corresponds to one or more channel identifiers.

10) Audio sample rate

When the media substream is an audio stream, the sampling rate used for encoding.

11) Language Type

When the media substream is an audio stream containing vocals, the language of the vocals.

During specific implementation, each media stream can customize its own media substream description information according to the actual situation. Examples of media substream description information in three application scenarios are given in FIG. 4 to FIG. 6. In FIG. 4, substream 1, substream 2 and substream 3 are substreams of the same media content encoded at three different bit rates (the media component identifiers are all 10). In FIG. 5, substream 1, substream 2 and substream 3 correspond to three different viewpoints of the same media content, and substream 4 and substream 5 correspond to two different channels of the same media content. In FIG. 6, substreams 1 to 4 correspond to a base layer and three enhancement layers of the same media content (the media component identifiers are all 30) when scalable video coding is used. After receiving the media segment, the client parses the media substream description information from it, and then selects the target media substreams to be transmitted in real time according to the actual needs of the service layer, terminal performance and network conditions, thereby supporting adaptive delivery of media streams with various multi-substream encodings.

The media substream description information of a media stream generally remains unchanged, so it is not necessary to encapsulate it in every media segment. Generally speaking, when the server receives the first media segment request from the client, it can encapsulate the media substream description information in the first returned media segment, and need not encapsulate it again in subsequent media segments.
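As an illustration of how a client might use such description information, the following sketch models a few of the fields listed above and picks one substream of a media component for a given bandwidth estimate. The `SubstreamInfo` type, the bit rate values, and the selection policy (highest bit rate that fits, otherwise the lowest) are all assumptions for this example; the application does not prescribe a particular policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstreamInfo:
    number: int         # substream number
    component_id: int   # media component identifier
    kind: str           # substream type, e.g. "video" or "audio"
    bitrate_kbps: int   # substream bit rate (hypothetical unit and values)


def pick_substream(infos, component_id, bandwidth_kbps):
    """Pick the number of one substream of a component for the given bandwidth."""
    same_component = [s for s in infos if s.component_id == component_id]
    if not same_component:
        return None
    fitting = [s for s in same_component if s.bitrate_kbps <= bandwidth_kbps]
    if fitting:
        # Highest bit rate that still fits the estimated bandwidth.
        return max(fitting, key=lambda s: s.bitrate_kbps).number
    # Nothing fits: fall back to the lowest bit rate.
    return min(same_component, key=lambda s: s.bitrate_kbps).number


# Three bit rates of the same content, in the spirit of the FIG. 4 scenario.
description = [
    SubstreamInfo(1, 10, "video", 500),
    SubstreamInfo(2, 10, "video", 1500),
    SubstreamInfo(3, 10, "video", 4000),
]
```

The selected number can then be sent to the server as a substream list in the next media segment request.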

Embodiment 5. In the following embodiments, an example will be given to illustrate how the server determines the candidate media unit to be transmitted through the third type of parameters.

Optionally, in an embodiment of the present application, generating the media segment according to the media segment request further includes: if the pull command carries at least one third-type parameter, wherein each third-type parameter corresponds to at least one constraint condition on the candidate media units, then the candidate media units to be transmitted include all media units in each target media substream that simultaneously satisfy all the constraints corresponding to the third-type parameters.

Several third-type parameters are given below, as well as the constraints corresponding to each third-type parameter:

1) Start sequence number

The constraint condition corresponding to the start sequence number is: if the start sequence number is valid, the sequence number of the candidate media unit is after the start sequence number or equal to the start sequence number.

2) Start time

The constraint condition corresponding to the start time is: if the start time is valid, the generation time of the candidate media unit is after the start time.

3) Maximum time offset

The constraint condition corresponding to the maximum time offset is: if the maximum time offset is valid, in the target media substream, the generation time interval between the candidate media unit and the latest media unit is less than the maximum time offset.

Whether a third-type parameter is valid or invalid depends on whether its value is within a specified range. Taking the start sequence number as an example, its value cannot exceed the sequence number of the current latest media unit. On the other hand, to ensure real-time performance, its value cannot be earlier than the sequence number of an existing media unit. A start sequence number within this range is valid. If a third-type parameter is invalid, it is treated as if the parameter were not carried. When all the third-type parameters are invalid, the candidate media units to be transmitted in the target media substream are the default specified media units.

In addition, during specific implementation, each pull command may carry one or more of the third-type parameters. The pull command may also carry other self-defined third-type parameters. For example, other third-type parameters, such as media unit type, minimum priority, and priority range, can be defined based on the characteristics of the media unit and used as constraints on the media units.

It should be further pointed out that when only one target media substream is selected according to the second type parameter, it is only necessary to judge whether the media units in that target media substream satisfy the constraints corresponding to the third-type parameters. However, if multiple target media substreams are selected according to the second type of parameters, then in order for the above constraints based on sequence numbers or generation time to act on multiple target media substreams at the same time, these target media substreams should use synchronized numbering or use the same clock for timing. Synchronized numbering means that, on the server, every time a specified time period elapses, all media units generated by each target media substream within that time period are associated with the same new sequence number. The specified time period may be of fixed or variable length, and may be preset or dynamically determined according to the actual generation of the media units. With synchronized numbering, the sequence number of a media unit indicates not only the generation order of the media units within each media substream, but also the synchronization relationship between the media units in different target media substreams.
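The constraint evaluation described in this embodiment, including the fallback to the default media units, can be sketched for a single target media substream as follows. The validity range of the start sequence number follows the text; the validity ranges used for the start time and the maximum time offset are assumptions, since the text only illustrates the start sequence number. The `MediaUnit` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaUnit:
    seq: int       # sequence number within the substream
    time_ms: int   # generation time in milliseconds (assumed unit)


def select_candidates(units, first_preset, seq_begin=None,
                      time_begin=None, max_time_offset=None):
    """Return the units of one substream that satisfy every valid constraint."""
    if not units:
        return []
    oldest_seq = min(u.seq for u in units)
    latest_seq = max(u.seq for u in units)
    oldest_time = min(u.time_ms for u in units)
    latest_time = max(u.time_ms for u in units)

    constraints = []
    # Start sequence number: valid only within the range of existing units.
    if seq_begin is not None and oldest_seq <= seq_begin <= latest_seq:
        constraints.append(lambda u: u.seq >= seq_begin)
    # Start time: validity range assumed analogous to the sequence number.
    if time_begin is not None and oldest_time <= time_begin <= latest_time:
        constraints.append(lambda u: u.time_ms > time_begin)
    # Maximum time offset: assumed valid when it is positive.
    if max_time_offset is not None and max_time_offset > 0:
        constraints.append(lambda u: latest_time - u.time_ms < max_time_offset)

    if not constraints:
        # No valid third-type parameter: use the default specified units.
        return [u for u in units if latest_seq - u.seq < first_preset]
    return [u for u in units if all(check(u) for check in constraints)]
```

An invalid parameter is simply skipped, which matches the rule that an invalid third-type parameter is treated as not carried.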
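A minimal sketch of synchronized numbering with a fixed-length period follows. The fixed period, the shared clock origin `epoch_ms`, and the function name are assumptions for illustration; the text also allows variable-length and dynamically determined periods, which this sketch does not cover.

```python
def sync_sequence_number(generation_time_ms, period_ms, epoch_ms=0, first_seq=1):
    """Map a generation time to the sequence number of its time period.

    All media units generated within the same period, in any of the
    synchronously numbered substreams, receive the same sequence number.
    """
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    if generation_time_ms < epoch_ms:
        raise ValueError("generation time precedes the clock origin")
    return first_seq + (generation_time_ms - epoch_ms) // period_ms
```

With a 40 ms period, a unit of one substream generated at 1000 ms and a unit of another substream generated at 1015 ms share one sequence number, while a unit generated at 1040 ms receives the next one.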

FIG. 2 shows a real-time delivery process of a media stream. The client requests the media data of the target media stream S1, which is the default media stream on the server. The target media stream includes four media substreams, where substream 1, substream 2 and substream 3 are three synchronously numbered media substreams (for example, three video streams encoded at different bit rates), and substream 4 uses independent numbering (for example, an independently encoded audio stream). The default specified media substreams are substream 1 and substream 4. Since substream 1 and substream 4 are not numbered synchronously, after the client receives a media segment, the sequence numbers of the latest media units of substream 1 and substream 4 are different. Therefore, to continue receiving new media units, the client needs to carry two pull commands in the media segment request. Each pull command carries a different media substream list and the corresponding start sequence number, which respectively specify the target media substream and the media units to be sent, so that continuous reception of substream 1 and of substream 4 can each be guaranteed.

The target media stream in FIG. 7 is similar to that in FIG. 2, except that the client actively requests the media data of substream 1 and substream 2 (for example, substream 1, substream 2 and substream 3 are respectively the base layer and two enhancement layers of scalable video coding). In this case, since substream 1 and substream 2 are numbered synchronously, the client uses only one pull command when submitting a media segment request. The substream list carried by the pull command includes two target media substreams, substream 1 and substream 2, and the start sequence number carried by the pull command indicates the candidate media units in substream 1 and substream 2 at the same time.

The target media stream in FIG. 8 is similar to that in FIG. 2 and FIG. 7. The difference is that the client simultaneously requests the synchronized media data of three substreams (substream 1, substream 2 and substream 4). Although substream 4 is not numbered synchronously with substreams 1 and 2, the generation times of all substreams in the target media stream S1 use the same reference clock. Therefore, the client can still use only one pull command in the media segment request to pull the three media substreams simultaneously. In this case, three target media substreams are specified in the substream list carried by the pull command, and the start time carried by the pull command is the latest generation time among the media units currently received by the client. This start time ensures that all newly generated media units to be sent are continuously encapsulated into media segments and sent to the client.

In this embodiment, the client can receive media streams in real time by continuously submitting media segment requests, and can adapt to changes in application requirements and network status by adjusting the target media substream list. As shown in FIG. 2, substream 1 and substream 4 are received by default at the beginning. When the application layer only needs to receive substream 4, or detects that the network bandwidth has decreased, the target media substreams can be modified in MS_REQ4 to include only substream 4, and the client then automatically switches to receiving only the media units of substream 4.
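The requests discussed for FIG. 2 and FIG. 8 can be assembled on the client as in the following sketch. The helper name `build_request_url` is hypothetical, the URL form follows the HTTP-GET examples given earlier, and the sequence numbers and start time in the usage lines are made-up values.

```python
from urllib.parse import urlencode


def build_request_url(base_url, pull_commands):
    """Build a media segment request URL from a list of pull commands.

    Each pull command is a dict of control parameters; cmd=PULL is
    emitted before the parameters of every command.
    """
    pairs = []
    for params in pull_commands:
        pairs.append(("cmd", "PULL"))
        pairs.extend(params.items())
    # Keep commas readable so substream lists look like the examples.
    return base_url + "?" + urlencode(pairs, safe=",")


# Independently numbered substreams: one pull command per substream.
two_commands = build_request_url(
    "http://www.xxx-server.com/msreq",
    [{"substreamList": "1", "seqBegin": 28},
     {"substreamList": "4", "seqBegin": 63}],
)

# Substreams sharing a reference clock: one pull command with a start time.
one_command = build_request_url(
    "http://www.xxx-server.com/msreq",
    [{"substreamList": "1,2,4", "timeBegin": 32000}],
)
```

The first URL carries a separate start sequence number for each substream, while the second relies on the common clock and needs only one start time.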

Embodiment 6. In the following embodiment, the processing procedure when the server encapsulates the candidate media units into a media segment will be described.

Further, in an embodiment of the present application, encapsulating the candidate media units determined by each pull command into media segments includes: encapsulating the candidate media units into the media segment according to the order in which the pull commands appear in the media segment request, wherein if the parameters carried by a pull command include a unit sorting method, the determined candidate media units are sorted according to that unit sorting method and then encapsulated into the media segment, and if a pull command does not carry a unit sorting method, the determined candidate media units are sorted according to the default sorting method and then encapsulated into the media segment.

The embodiment of the present application provides six basic unit sorting methods:

1) Time forward (TIME_FORWARD)

The candidate media units are sorted according to the generation time of the candidate media units, and the earlier the candidate media units are generated, the earlier they are encapsulated into the media segment.

2) Time reverse (TIME_BACKWARD)

The candidate media units are sorted in reverse order of their generation time, and the candidate media units generated later are encapsulated into the media segment first.

3) Serial number forward (SEQ_FORWARD)

The candidate media units are sorted according to their sequence numbers, and the candidate media units with earlier sequence numbers are encapsulated into the media segment first.

4) Serial number reverse (SEQ_BACKWARD)

The candidate media units are sorted in reverse order of their sequence numbers, and the candidate media units with later sequence numbers are encapsulated into the media segment first.

5) Substream numbering order (SSNO_ORDER)

When there are multiple target media sub-streams, the candidate media units of each sub-stream are encapsulated in sequence according to the order of the sub-stream numbers.

6) Substream list order (SSLIST_ORDER)

When there are multiple target media substreams and these target media substreams are defined by the substream list parameter (SubStreamList), the candidate media units of the multiple substreams are encapsulated sequentially according to the order in which the substream numbers appear in the substream list.

The unit sorting method can also be a cascade of the above basic sorting methods, such as SSLIST_ORDER+SEQ_BACKWARD. The meaning of this cascade is that first, the candidate media units are sorted according to the first basic sorting method, and the candidates with the same position after sorting are sorted. The media units are ordered according to the second basic ordering, and so on until the ordering is complete. Regardless of the basic sorting method or the cascading sorting method, if there are still candidate media units with the same position after sorting, the candidate media units with the same position are sorted according to the default sorting method.
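A cascade of basic sorting methods maps naturally onto a composite sort key, as in the following sketch. The `MediaUnit` type and the function name are hypothetical, and the default tie-break used here (time forward, then substream number) is an assumption based on the defaults mentioned in the FIG. 9 discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaUnit:
    substream: int  # substream number
    seq: int        # sequence number
    time_ms: int    # generation time in milliseconds (assumed unit)


def sort_units(units, order="", substream_list=()):
    """Sort candidate units by a cascade such as "SSLIST_ORDER+SEQ_BACKWARD"."""
    positions = {number: index for index, number in enumerate(substream_list)}
    basic = {
        "TIME_FORWARD": lambda u: u.time_ms,
        "TIME_BACKWARD": lambda u: -u.time_ms,
        "SEQ_FORWARD": lambda u: u.seq,
        "SEQ_BACKWARD": lambda u: -u.seq,
        "SSNO_ORDER": lambda u: u.substream,
        "SSLIST_ORDER": lambda u: positions[u.substream],
    }
    keys = [basic[name] for name in order.split("+") if name]
    # Remaining ties fall back to the assumed default ordering.
    keys += [basic["TIME_FORWARD"], basic["SSNO_ORDER"]]
    return sorted(units, key=lambda u: tuple(key(u) for key in keys))
```

Each basic method contributes one component of the key tuple, so a later method only decides between units that the earlier methods leave in the same position.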

9 shows the process of encapsulating candidate media units into media segments under different media unit sorting methods, where the media segment MS3 to be generated contains the same candidate media unit, but corresponds to different media segment requests, and the final media The order in which cells are encapsulated into media segments also varies.

Sorting method 1: The media segment request consists of two pull commands. The target media substream of the first pull command is substream 4, and the target media substreams of the second pull command are substream 1 and substream 2. Therefore, according to the order of the pull commands, the candidate media units of substream 4 are firstly encapsulated into the media segment. Since the first pull command does not specify any unit sorting method, the default sorting method, that is, the time forward is used to encapsulate Candidate media units D58 to D62; then, since the unit sorting method carried by the second pull command is time reverse (TIME_BACKWARD), the media units encapsulated by substream 1 and substream 2 according to time reverse are A27/B27 in turn, A26/B26, A25/B25. Media units with the same location are sorted according to the size of their substream numbers by default. Therefore, the packaging sequence of the final candidate media units is shown in Sorting Mode 1 in FIG. 9 .

Sorting method 2: The media segment request includes only one pull command, and the unit sorting method carried by the pull command is a cascade of two basic sorting methods: SSLIST_ORDER+SEQ_FORWARD. The first basic sorting method is substream list order (SSLIST_ORDER), which indicates that the candidate media units of the substreams are encapsulated in the order of the substream numbers in the substream list (SubStreamList=4, 1, 2); that is, the candidate media units of substream 4 are encapsulated first, then those of substream 1, and then those of substream 2. The second basic sorting method is sequence number forward (SEQ_FORWARD); that is, candidate media units belonging to the same substream are sorted by sequence number from front to back. The resulting encapsulation order of the candidate media units is shown as sorting method 2 in FIG. 9.

Sorting method 3: The media segment request includes only one pull command, and the unit sorting method carried by the pull command is a cascade of two basic sorting methods: SSNO_ORDER+SEQ_BACKWARD. The first basic sorting method is substream number order (SSNO_ORDER), which indicates that the candidate media units of the substreams are encapsulated in ascending order of substream number; that is, the candidate media units of substream 1 are encapsulated first, then those of substream 2, and then those of substream 3. The second basic sorting method is sequence number reverse (SEQ_BACKWARD); that is, candidate media units belonging to the same substream are sorted by sequence number from back to front. The resulting encapsulation order of the candidate media units is shown as sorting method 3 in FIG. 9.

Sorting method 4: The media segment request includes only one pull command, and the pull command carries only one unit sorting method, TIME_FORWARD; that is, all candidate media units are sorted from earliest to latest by generation time. The resulting encapsulation order of the candidate media units is shown as sorting method 4 in FIG. 9.
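The cascaded sorting described above can be pictured as a composite sort key. The following Python sketch is illustrative only: the `MediaUnit` class, the `sort_units` function, the sample values, and the assumed default order (time forward, then substream number) are hypothetical and are not defined by this application; only the method names such as SSLIST_ORDER and SEQ_FORWARD come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaUnit:
    substream: int  # substream number
    seq: int        # sequence number within the substream
    time: float     # generation time on the server clock


def sort_units(units, method="", substream_list=()):
    """Sort candidate units by a '+'-joined cascade of basic sorting methods."""
    # Position of each substream number in the request's substream list.
    rank = {ss: i for i, ss in enumerate(substream_list)}
    basic = {
        "TIME_FORWARD": lambda u: u.time,
        "TIME_BACKWARD": lambda u: -u.time,
        "SEQ_FORWARD": lambda u: u.seq,
        "SEQ_BACKWARD": lambda u: -u.seq,
        "SSNO_ORDER": lambda u: u.substream,
        "SSLIST_ORDER": lambda u: rank.get(u.substream, len(rank)),
    }
    names = [name for name in method.split("+") if name]
    # Assumption: remaining ties fall back to time forward, then substream number.
    keys = [basic[name] for name in names]
    keys += [basic["TIME_FORWARD"], basic["SSNO_ORDER"]]
    return sorted(units, key=lambda u: tuple(k(u) for k in keys))


# Hypothetical sample data: two units each from substreams 1, 2 and 4.
units = [
    MediaUnit(1, 25, 25.0), MediaUnit(1, 26, 26.0),
    MediaUnit(2, 25, 25.0), MediaUnit(2, 26, 26.0),
    MediaUnit(4, 58, 25.2), MediaUnit(4, 59, 25.6),
]
by_list = sort_units(units, "SSLIST_ORDER+SEQ_FORWARD", substream_list=(4, 1, 2))
by_time_reverse = sort_units(units, "TIME_BACKWARD")
print([(u.substream, u.seq) for u in by_list])
```

Building one tuple key per unit keeps each basic method independent, so a new method such as a priority-based order could be added as one more dictionary entry.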

Of course, this embodiment does not preclude defining new unit sorting methods. For example, when each media unit is associated with a priority, the candidate media units can be sorted by unit priority, and a new unit sorting method can be defined: high-priority units first (HIGH_PRIOR_FIRST). When each media substream is also associated with a priority, the multiple target media substreams in the same pull command can be sorted by substream priority, and a new unit sorting method can be defined: substream priority order (SS_PRIOR_ORDER). In addition, when generating a media segment, the candidate media units determined by the pull commands need not be encapsulated in the order in which the pull commands appear in the media segment request. For example, all candidate media units can be sorted together and encapsulated into the media segment without distinguishing between pull commands.

Because the order in which media units are encapsulated into the media segment is controlled by the pull commands and the unit sorting methods, specific candidate media units of specific substreams can be sent preferentially when the network transmission bandwidth is insufficient. For example, high-priority media substreams can be sent first: when a video substream and an audio substream are transmitted at the same time, audio transmission can be guaranteed first; when base-layer and enhancement-layer code streams are transmitted at the same time, the candidate media units of the base layer are sent preferentially. In scenarios with high real-time requirements, the most recently generated candidate media units can be delivered first to improve the user experience.

According to the adaptive real-time delivery method for a media stream proposed by the embodiments of the present application, the media units of the substreams can be combined arbitrarily according to the client's request, and the media segment is generated in real time and delivered to the client. First, the server only needs to store media units per substream and does not need to generate segments for the various substream combinations in advance, which reduces the server's storage requirements. At the same time, synchronization processing on the client is simplified: with a single request, the client obtains a combined segment of the substreams for the same time period, which makes it easy to receive the substreams synchronously. Second, the client can dynamically adjust the target media substreams in the media segment request according to application needs and network conditions, so that adaptive delivery of various types of multi-substream media streams (such as multi-rate coding, multi-view, multi-channel, and scalable coding) is supported uniformly. Finally, since each media segment is generated in response to a client request, no manifest file is required no matter how many substreams the media stream includes, and the client does not need to request and parse a manifest file, which avoids the complexity that manifest files introduce. Therefore, the real-time transmission delay and transmission overhead of the media stream can be effectively reduced.

Next, the adaptive real-time delivery server for media streams proposed according to the embodiments of the present application will be described with reference to the accompanying drawings.

FIG. 10 is a schematic structural diagram of an adaptive real-time delivery server for media streams according to an embodiment of the present application.

As shown in FIG. 10, the media stream includes at least one media substream, and each media substream is a sequence of media units generated in real time on the server. Each media substream is associated with a substream number, and each media unit is associated with a generation time and/or a sequence number indicating the order in which the media unit is generated in the media substream. The server 10 includes a client interface component 100, a media segment generating component 200, and a media segment sending component 300.

The client interface component 100 is configured to receive a media segment request sent by the client, wherein the media segment request carries at least one pull command, and each pull command carries no control parameter or at least one control parameter. The control parameters include a first type of parameter indicating the target media stream to be transmitted, a second type of parameter indicating the target media substream to be transmitted, and a third type of parameter indicating the candidate media units to be transmitted. The media segment generating component 200 is configured to generate a media segment according to the media segment request: first, for each pull command in the media segment request, it selects the target media stream to be transmitted, selects at least one target media substream to be transmitted in the target media stream, and determines the candidate media units to be transmitted in each target media substream; then, it encapsulates the candidate media units determined by each pull command into the media segment. The media segment sending component 300 is configured to send the generated media segment to the client. The server 10 in this embodiment of the present application can arbitrarily combine the media units of the substreams according to the client's request, generate media segments in real time, and return them to the client, thereby reducing storage overhead on the server, simplifying synchronous transmission between substreams, and effectively reducing media stream transmission delay and overhead.

Specifically, the client interface component 100 is used to receive a media segment request from a client. The media segment request can contain one or more pull commands, and each pull command can carry zero, one, or more control parameters. The control parameters fall into the following categories: the first type of parameter indicates the target media stream to be transmitted; the second type of parameter indicates the target media substream to be transmitted in the target media stream; the third type of parameter indicates the candidate media units to be transmitted in the target media substream. The client interface component 100 can use any specified protocol to receive the media segment request. For example, when the HTTP protocol is used, the client interface component 100 can be a Web server that receives media segment requests over HTTP; when a TCP-based protocol is used, the client interface component 100 is a TCP server that provides a fixed service port.

The media segment generating component 200 is configured to generate the required media segment according to the client's media segment request. It obtains the media segment request from the client interface component 100 and parses out the pull commands and their control parameters. It then selects the target media stream to be transmitted according to the first type of parameter, selects the target media substreams to be transmitted according to the second type of parameter, and determines the candidate media units to be transmitted in each target media substream according to the third type of parameter. Finally, it extracts the candidate media units determined by each pull command from the media stream storage unit, encapsulates them into a media segment, and passes the media segment directly to the media segment sending component 300 for sending.

Further, as shown in FIG. 11, the server 10 in this embodiment of the present application further includes at least one media stream real-time generating component, which generates one or more media streams in real time by itself or receives them in real time from other servers. The media stream includes at least one media substream, and each media substream is a sequence of media units generated in real time on the server. Each media substream is associated with a substream number, and each media unit is associated with a generation time and/or a sequence number, the sequence number indicating the generation order of the media units in the media substream.

Specifically, the media stream real-time generating component includes one or more media substream real-time generating components, each of which includes one or more processing steps for generating a media substream in real time. The processing steps include, but are not limited to, real-time acquisition of media signals, encoding and compression, transmission encapsulation, and pre-segmentation. In addition, a media substream real-time generating component can also receive media streams from other devices in real time, or convert existing media stream files on a server into media unit sequences generated in real time.

Optionally, in an embodiment of the present application, the media segment generating component 200 is further configured such that: when the pull command does not carry the first type of parameter, the target media stream to be transmitted is a media stream specified by default; when the pull command does not carry the second type of parameter, the target media substream to be transmitted is at least one media substream specified by default in the target media stream; and when the pull command does not carry the third type of parameter, the candidate media units include the media units specified by default in the target media substream. The media units specified by default are all media units in the target media substream whose sequence number interval from the latest media unit is less than a first preset value, or all media units in the target media substream whose generation time interval from the latest media unit is less than a second preset value. Both the first preset value and the second preset value are obtained according to the target media substream.
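As a rough illustration of the default selection of candidate media units, the following Python sketch keeps only the units close to the latest unit of one target media substream. The function name `default_candidates`, the tuple layout, and the parameter names `max_seq_gap` and `max_time_gap` (standing in for the first and second preset values) are assumptions made for this example and are not defined by the application.

```python
def default_candidates(units, max_seq_gap=None, max_time_gap=None):
    """Select default units from one target substream.

    units: list of (sequence_number, generation_time) tuples.
    max_seq_gap stands in for the first preset value,
    max_time_gap for the second preset value (both hypothetical names).
    """
    if not units:
        return []
    latest_seq = max(seq for seq, _ in units)
    latest_time = max(gen_time for _, gen_time in units)
    if max_seq_gap is not None:
        # Keep units whose sequence number interval from the latest unit is small enough.
        return [u for u in units if latest_seq - u[0] < max_seq_gap]
    if max_time_gap is not None:
        # Keep units whose generation time interval from the latest unit is small enough.
        return [u for u in units if latest_time - u[1] < max_time_gap]
    raise ValueError("one of max_seq_gap or max_time_gap must be given")


# Hypothetical substream with four units generated one second apart.
stored = [(10, 1.0), (11, 2.0), (12, 3.0), (13, 4.0)]
print(default_candidates(stored, max_seq_gap=2))
print(default_candidates(stored, max_time_gap=2.5))
```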

Optionally, in an embodiment of the present application, the second type of parameter includes a sub-stream list, and the sub-stream list includes the substream number of at least one target media sub-stream.

Optionally, in an embodiment of the present application, the second type of parameter includes a sub-stream pattern, and the sub-stream pattern is an N-bit bit stream, where N is the number of media sub-streams included in the target media stream, and the sub-stream Each bit of the pattern is associated with a specific media substream of the target media stream and is used to indicate whether the specific media substream is a target media substream to be transmitted.
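A sub-stream pattern of this kind can be decoded with a simple bit test. The sketch below is a minimal illustration under stated assumptions: the pattern is written as a string of '0' and '1' characters, and the i-th character is assumed to correspond to the i-th entry of the stream's list of substream numbers. The application does not specify this encoding or bit order, and the function name `substreams_from_pattern` is hypothetical.

```python
def substreams_from_pattern(pattern, substream_numbers):
    """Return the substream numbers whose bit is set in the pattern.

    pattern: string of N characters, each '0' or '1' (assumed encoding).
    substream_numbers: the N substream numbers of the target media stream,
    in the order assumed to match the bits of the pattern.
    """
    if len(pattern) != len(substream_numbers):
        raise ValueError("pattern must have one bit per media substream")
    if set(pattern) - {"0", "1"}:
        raise ValueError("pattern may only contain '0' and '1'")
    return [number for bit, number in zip(pattern, substream_numbers) if bit == "1"]


# Hypothetical stream with four substreams; the pattern selects substreams 1, 2 and 4.
selected = substreams_from_pattern("1101", [1, 2, 3, 4])
print(selected)
```

Compared with a sub-stream list, a pattern has a fixed length of N bits, but it cannot express an ordering of the selected substreams.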

Optionally, in an embodiment of the present application, the media segment generating component 200 is further configured to encapsulate media substream description information into the media segment, where the media substream description information includes at least one entry, each entry corresponds to one media substream of the media stream, and each entry contains at least one field: the media substream number.

Optionally, in an embodiment of the present application, each entry further includes at least one of the following fields: media component identifier, sub-stream type, sub-stream bit rate, sub-stream priority, coding level, viewpoint identifier, video resolution, video frame rate, channel identification, audio sample rate, language type.

Optionally, in an embodiment of the present application, the media segment generating component 200 is further configured such that, when the pull command carries at least one parameter of the third type, each parameter of the third type corresponds to at least one constraint on the candidate media units, and the candidate media units to be transmitted include all media units in each target media substream that simultaneously satisfy all the constraints corresponding to the parameters of the third type.

Optionally, in an embodiment of the present application, the media units in the target media substreams use synchronized sequence numbers: each time a specified time period elapses, all media units generated by each target media substream within that time period are associated with the same new sequence number. The third type of parameter includes a start sequence number, and the constraint corresponding to the start sequence number is: if the start sequence number is valid, the sequence number of the candidate media unit is equal to or after the start sequence number.

Optionally, in an embodiment of the present application, the generation times of the media units in all the target media substreams are derived from the same clock on the server, the third type of parameter includes a start time, and the constraint corresponding to the start time is: if the start time is valid, the generation time of the candidate media unit is after the start time.

Optionally, in an embodiment of the present application, the third type of parameter includes a maximum time offset, and the constraint corresponding to the maximum time offset is: if the maximum time offset is valid, then in the target media substream, the generation time interval between the candidate media unit and the latest media unit is less than the maximum time offset.
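The constraints attached to the third type of parameter (start sequence number, start time, maximum time offset) can be read as filters that a candidate media unit must pass simultaneously. The Python sketch below illustrates this for a single target media substream. The names `Unit` and `select_candidates` are hypothetical, and treating a parameter value of `None` as "not valid" is an assumption, since the application does not define how validity is signalled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    seq: int     # sequence number
    time: float  # generation time on the server clock


def select_candidates(substream_units, start_seq=None, start_time=None,
                      max_time_offset=None):
    """Keep the units of one target substream that satisfy every valid constraint."""
    if not substream_units:
        return []
    latest_time = max(u.time for u in substream_units)
    selected = []
    for u in substream_units:
        # Start sequence number: unit must be at or after the start sequence number.
        if start_seq is not None and u.seq < start_seq:
            continue
        # Start time: unit must be generated after the start time.
        if start_time is not None and u.time <= start_time:
            continue
        # Maximum time offset: interval to the latest unit must be smaller than the offset.
        if max_time_offset is not None and latest_time - u.time >= max_time_offset:
            continue
        selected.append(u)
    return selected


# Hypothetical substream with four units generated one second apart.
stored_units = [Unit(25, 10.0), Unit(26, 11.0), Unit(27, 12.0), Unit(28, 13.0)]
recent = select_candidates(stored_units, start_seq=26, max_time_offset=2.0)
print([u.seq for u in recent])
```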

Optionally, in an embodiment of the present application, the media segment generating component 200 is further configured to encapsulate the candidate media units determined by each pull command into the media segment according to the order in which the pull commands appear in the media segment request. If the parameters carried by a pull command include a unit sorting method, the candidate media units determined by that pull command are sorted according to the unit sorting method and then encapsulated into the media segment. If the pull command does not carry a unit sorting method, the candidate media units determined by that pull command are sorted according to the default sorting method and then encapsulated into the media segment.
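The per-command encapsulation order can be sketched as a loop over the pull commands in request order. The example below is a simplified illustration, not the application's implementation: the function `build_segment`, the tuple layout, the boolean flag standing in for TIME_BACKWARD, and the sample values are all hypothetical, and the default order is assumed to be time forward with ties broken by substream number.

```python
def build_segment(pull_results):
    """Concatenate candidate units in the order of the pull commands.

    pull_results: one (units, time_backward) pair per pull command, in request order.
    units: list of (substream_number, sequence_number, generation_time) tuples.
    time_backward: True if the command carries TIME_BACKWARD, False for the
    assumed default order (time forward).
    """
    segment = []
    for units, time_backward in pull_results:
        if time_backward:
            # Newest first; ties broken by substream number.
            ordered = sorted(units, key=lambda u: (-u[2], u[0]))
        else:
            # Oldest first; ties broken by substream number.
            ordered = sorted(units, key=lambda u: (u[2], u[0]))
        segment.extend(ordered)
    return segment


# Hypothetical request: command 1 pulls substream 4 (default order),
# command 2 pulls substreams 1 and 2 in reverse time order.
first = [(4, 59, 25.5), (4, 58, 25.0)]
second = [(1, 25, 25.0), (2, 25, 25.0), (1, 26, 26.0), (2, 26, 26.0)]
segment = build_segment([(first, False), (second, True)])
print(segment)
```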

Optionally, in an embodiment of the present application, the unit sorting method is a cascade of one or more basic sorting methods, and the basic sorting methods include the following types: time forward sorting, time reverse sorting, sequence number forward sorting, sequence number reverse sorting, substream number order sorting, and substream list order sorting.

Additionally, clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship to each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and VPS services.

It should be noted that, the foregoing explanation of the embodiment of the method for adaptive real-time delivery of media streams is also applicable to the adaptive real-time delivery server of media streams in this embodiment, and details are not repeated here.

According to the adaptive real-time delivery server for a media stream proposed by the embodiments of the present application, the media units of the substreams can be combined arbitrarily according to the client's request, and the media segment is generated in real time and delivered to the client. First, the server only needs to store media units per substream and does not need to generate segments for the various substream combinations in advance, which reduces the server's storage requirements. At the same time, synchronization processing on the client is simplified: with a single request, the client obtains a combined segment of the substreams for the same time period, which makes it easy to receive the substreams synchronously. Second, the client can dynamically adjust the target media substreams in the media segment request according to application needs and network conditions, so that adaptive delivery of various types of multi-substream media streams (such as multi-rate coding, multi-view, multi-channel, and scalable coding) is supported uniformly. Finally, since each media segment is generated in response to a client request, no manifest file is required no matter how many substreams the media stream includes, and the client does not need to request and parse a manifest file, which avoids the complexity that manifest files introduce. Therefore, the real-time transmission delay and transmission overhead of the media stream can be effectively reduced.

FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. The electronic device may include:

A memory 1201, a processor 1202, and a computer program stored on the memory 1201 and executable on the processor 1202.

When the processor 1202 executes the program, the adaptive real-time delivery method of the media stream provided in the above embodiment is implemented.

Further, the electronic device also includes:

A communication interface 1203, used for communication between the memory 1201 and the processor 1202.

The memory 1201 is used to store computer programs that can be executed on the processor 1202.

The memory 1201 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.

If the memory 1201, the processor 1202 and the communication interface 1203 are implemented independently, the communication interface 1203, the memory 1201 and the processor 1202 can be connected to each other through a bus and communicate with each other over it. The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in FIG. 12, but this does not mean that there is only one bus or one type of bus.

Optionally, in terms of specific implementation, if the memory 1201, the processor 1202 and the communication interface 1203 are integrated on one chip, the memory 1201, the processor 1202 and the communication interface 1203 can communicate with each other through an internal interface.

The processor 1202 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.

This embodiment also provides a computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the above-mentioned adaptive real-time delivery method of a media stream is implemented.

In the description of this specification, description with reference to the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples" means that specific features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N of the embodiments or examples. Furthermore, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of the different embodiments or examples, provided that they do not conflict with each other.

In addition, the terms "first" and "second" are only used for descriptive purposes, and should not be construed as indicating or implying relative importance or implying the number of indicated technical features. Thus, a feature delimited with "first", "second" may expressly or implicitly include at least one of that feature. In the description of the present application, "N" means at least two, such as two, three, etc., unless otherwise expressly and specifically defined.

Any process or method description in a flowchart or otherwise described herein may be understood to represent a module, segment, or portion of code comprising one or N executable instructions for implementing custom logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes alternative implementations in which the functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.

The logic and/or steps represented in flowcharts or otherwise described herein, for example, an ordered listing of executable instructions for implementing the logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) with one or N wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the paper or other medium can be optically scanned, for example, and then edited, interpreted, or otherwise processed in a suitable manner as necessary to obtain the program electronically, which is then stored in computer memory.

It should be understood that various parts of this application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the N steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and so on.

Those skilled in the art can understand that all or part of the steps of the methods of the above embodiments can be completed by instructing relevant hardware through a program, the program can be stored in a computer-readable storage medium, and the program, when executed, performs one or a combination of the steps of the method embodiments.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules. If the integrated modules are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.

The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although the embodiments of the present application have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limitations on the present application. The above embodiments are subject to changes, modifications, substitutions, and variations within the scope of the present application.

Claims

An adaptive real-time delivery method for a media stream, wherein the media stream includes at least one media substream, and each media substream is a sequence of media units generated in real time on a server, wherein each media substream is associated with a substream number, and each media unit is associated with a generation time and/or a sequence number indicating the order in which the media unit is generated in the media substream, and the method includes the following steps:

Receive a media segment request sent by the client, wherein the media segment request carries at least one pull command, the pull command carries no control parameter or at least one control parameter, and the control parameter includes a first type of parameter indicating the target media stream to be transmitted, a second type of parameter indicating the target media substream to be transmitted, and a third type of parameter indicating the candidate media units to be transmitted;

A media segment is generated according to the media segment request, wherein, for each pull command in the media segment request, the target media stream to be transmitted is selected, at least one target media substream to be transmitted is selected in the target media stream, the candidate media units to be transmitted in the target media substream are determined, and the candidate media units determined by each pull command are encapsulated into the media segment;

The media segment is sent to the client.
The method according to claim 1, wherein the generating a media segment according to the media segment request comprises:

If the pull command does not carry the first type parameter, the target media stream to be transmitted is a default specified media stream;

If the pull command does not carry the second type of parameter, the target media substream to be transmitted is at least one media substream specified by default in the target media stream;

If the pull command does not carry the third type of parameter, the candidate media units include the media units specified by default in the target media substream, and the media units specified by default are all media units in the target media substream whose sequence number interval from the latest media unit is less than a first preset value, or all media units in the target media substream whose generation time interval from the latest media unit is less than a second preset value, wherein both the first preset value and the second preset value are obtained according to the target media substream.
The method according to claim 1 or 2, wherein the second type of parameter comprises a sub-stream list, and the sub-stream list includes the number of at least one target media sub-stream.
The method according to claim 1, wherein the second type parameter comprises a sub-stream pattern, and the sub-stream pattern is an N-bit bit stream, wherein N is the number of media sub-streams included in the target media stream, and each bit of the sub-stream pattern is associated with a specific media sub-stream of the target media stream and is used to indicate whether the specific media sub-stream is a target media sub-stream to be transmitted.
The method according to claim 1, wherein the generating a media segment according to the media segment request further comprises:

Encapsulate the media substream description information into the media segment, where the media substream description information includes at least one entry, wherein each entry corresponds to a media substream of the media stream and includes at least one field: the media substream number.
The method according to claim 5, wherein each entry further comprises at least one of the following fields: media component identifier, substream type, substream bit rate, substream priority, coding level, viewpoint identifier, video resolution, video frame rate, channel identification, audio sample rate, language type.
The method according to claim 1, wherein the generating a media segment according to the media segment request further comprises:

If the pull command carries at least one parameter of the third type, each parameter of the third type corresponds to at least one constraint condition on the candidate media units, and the candidate media units to be transmitted include all media units in each target media substream that simultaneously satisfy all the constraint conditions corresponding to the third type of parameters.
The method according to claim 7, wherein the media units in the target media sub-streams adopt synchronized numbering, wherein each time a specified time period elapses, all media units generated in each target media sub-stream during that time period are associated with the same new sequence number; the third type of parameter includes a start sequence number, and the constraint condition corresponding to the start sequence number is:

If the start sequence number is valid, the sequence number of the candidate media unit is after the start sequence number or equal to the start sequence number.
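A minimal Python sketch of synchronized numbering and the start sequence number constraint follows. It assumes, for illustration only, that the shared sequence number is derived by dividing the elapsed time since a common epoch by the specified time period; the application only requires that units generated in the same period share the same new sequence number. The function names and the tuple layout are hypothetical.

```python
def number_units(units, period, epoch=0.0):
    """Attach a synchronized sequence number to each unit.

    units: iterable of (substream_number, generation_time) tuples.
    Returns a list of (substream_number, generation_time, sequence_number).
    """
    if period <= 0:
        raise ValueError("period must be positive")
    numbered = []
    for substream, gen_time in units:
        # Units of any substream generated within the same period get the same number.
        seq = int((gen_time - epoch) // period)
        numbered.append((substream, gen_time, seq))
    return numbered


def select_from_start_seq(numbered, start_seq):
    """Keep units whose sequence number is equal to or after start_seq."""
    return [u for u in numbered if u[2] >= start_seq]
```

Because the numbering is shared, a client can request the same start sequence number for every target substream and receive units that cover the same time period in each of them.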
The method according to claim 7, wherein the generation times of the media units in all the target media sub-streams come from the same clock on the server, the third type of parameter includes a start time, and the constraint condition corresponding to the start time is:

If the start time is valid, the generation time of the candidate media unit is after the start time.
The method according to claim 7, wherein the third type of parameter comprises a maximum time offset, and the constraint condition corresponding to the maximum time offset is:

If the maximum time offset is valid, in the target media substream, the generation time interval between the candidate media unit and the latest media unit is smaller than the maximum time offset.
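The constraint conditions for the start sequence number, the start time and the maximum time offset can be combined as in the following Python sketch. It assumes that a parameter set to `None` is treated as not valid, which is an illustrative convention rather than something the application specifies; `MediaUnit` and `select_candidates` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaUnit:
    substream: int    # substream number
    seq: int          # sequence number
    gen_time: float   # generation time in seconds (server clock)


def select_candidates(units, start_seq=None, start_time=None, max_time_offset=None):
    """Return the units of ONE target substream that satisfy all valid constraints."""
    if not units:
        return []
    latest_time = max(u.gen_time for u in units)
    selected = []
    for u in units:
        # Start sequence number: at or after the start sequence number.
        if start_seq is not None and u.seq < start_seq:
            continue
        # Start time: generated strictly after the start time.
        if start_time is not None and u.gen_time <= start_time:
            continue
        # Maximum time offset: gap to the latest unit is smaller than the offset.
        if max_time_offset is not None and latest_time - u.gen_time >= max_time_offset:
            continue
        selected.append(u)
    return selected
```

A unit is kept only if it passes every constraint that is valid, matching the requirement that candidate media units simultaneously satisfy all constraint conditions.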
The method according to claim 1, wherein the encapsulating the candidate media units determined by each pull command into the media segment comprises:

According to the order in which each pull command appears in the media segment request, the candidate media units determined by each pull command are encapsulated into the media segment, wherein, if the parameters carried by a pull command include a unit sorting mode, the candidate media units determined by that pull command are sorted according to the unit sorting mode and then encapsulated into the media segment; if the parameters do not include a unit sorting mode, the candidate media units determined by that pull command are sorted according to a default sorting mode and then encapsulated into the media segment.
The method according to claim 11, wherein the unit sorting mode is a cascade of one or more basic sorting modes, and the basic sorting modes include the following types: time forward sorting, time reverse sorting, sequence number forward sorting, sequence number reverse sorting, substream number order sorting, and substream list order sorting.
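A cascade of basic sorting modes can be sketched in Python as a multi-key sort. The sketch assumes that the first mode in the cascade is the most significant key and that later modes break ties; the application does not state the precedence, and the mode names and the `sort_units` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaUnit:
    substream: int    # substream number
    seq: int          # sequence number
    gen_time: float   # generation time in seconds


def sort_units(units, modes, substream_list=()):
    """Sort units by a cascade of basic modes; the first mode is most significant."""
    # Position of each substream number in the requested substream list.
    rank = {number: i for i, number in enumerate(substream_list)}
    basic = {
        "time_forward": (lambda u: u.gen_time, False),
        "time_reverse": (lambda u: u.gen_time, True),
        "seq_forward": (lambda u: u.seq, False),
        "seq_reverse": (lambda u: u.seq, True),
        "substream_number": (lambda u: u.substream, False),
        "substream_list": (lambda u: rank.get(u.substream, len(rank)), False),
    }
    result = list(units)
    # Python's sort is stable, so applying the keys from least to most
    # significant yields the cascaded order.
    for mode in reversed(modes):
        key, reverse = basic[mode]
        result.sort(key=key, reverse=reverse)
    return result
```

For example, the cascade `["seq_forward", "substream_number"]` interleaves the substreams period by period, while `["substream_list", "seq_reverse"]` groups units by substream in the order of the requested sub-stream list and places the newest units of each substream first.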
An adaptive real-time delivery server for media streams, characterized in that the media stream includes at least one media sub-stream, and each media sub-stream is a sequence of media units generated in real time on the server, wherein each media sub-stream is associated with a substream number, and each media unit is associated with a generation time and/or a sequence number indicating the order in which the media unit is generated in the media substream, and the server includes:

A client interface component, configured to receive a media segment request sent by a client, wherein the media segment request carries at least one pull command, the pull command carries no control parameter or at least one control parameter, and the control parameters include a first type of parameter indicating the target media stream to be transmitted, a second type of parameter indicating the target media substream to be transmitted, and a third type of parameter indicating the candidate media units to be transmitted;

A media segment generation component, configured to generate a media segment according to the media segment request, wherein, for each pull command in the media segment request, the target media stream to be transmitted is selected, at least one target media sub-stream to be transmitted is selected in the target media stream, the candidate media units to be transmitted are determined in the target media sub-stream, and the candidate media units determined by each pull command are encapsulated into the media segment;

A media segment sending component, configured to send the generated media segment to the client.
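The cooperation of the three components can be illustrated with a compact Python sketch of segment generation. It is a sketch under stated assumptions, not the implementation of the application: the classes `PullCommand` and `SegmentServer`, the in-memory layout of streams, the choice of all substreams as the default, and the sequence number window used as the default are all hypothetical, and network reception and sending are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MediaUnit:
    substream: int
    seq: int
    gen_time: float


@dataclass
class PullCommand:
    stream: Optional[str] = None            # first type of parameter
    substreams: Optional[List[int]] = None  # second type of parameter
    start_seq: Optional[int] = None         # third type of parameter


class SegmentServer:
    def __init__(self, streams, default_stream, default_window=2):
        # streams: {stream_id: {substream_number: [MediaUnit, ...]}}
        self.streams = streams
        self.default_stream = default_stream
        self.default_window = default_window  # assumed first preset value

    def generate_segment(self, commands):
        """Process pull commands in request order and collect candidate units."""
        segment = []
        for cmd in commands:
            stream_id = cmd.stream if cmd.stream is not None else self.default_stream
            stream = self.streams[stream_id]
            # Assumed default: every substream of the target media stream.
            numbers = cmd.substreams if cmd.substreams is not None else sorted(stream)
            for number in numbers:
                units = stream[number]
                if not units:
                    continue
                if cmd.start_seq is not None:
                    picked = [u for u in units if u.seq >= cmd.start_seq]
                else:
                    latest = max(u.seq for u in units)
                    picked = [u for u in units if latest - u.seq < self.default_window]
                segment.extend(picked)
        return segment
```

Because the segment is assembled per request from the units currently held for each substream, no pre-combined segments for particular substream combinations need to be stored on the server.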
The server according to claim 13, wherein the media segment generating component is further configured such that: when the pull command does not carry the first type of parameter, the target media stream to be transmitted is a media stream specified by default; when the pull command does not carry the second type of parameter, the target media substream to be transmitted is at least one media substream specified by default in the target media stream; and when the pull command does not carry the third type of parameter, the candidate media units include the default specified media units in the target media substream, where the default specified media units are all media units in the target media substream whose sequence number interval from the latest media unit is less than a first preset value, or all media units in the target media substream whose generation time interval from the latest media unit is less than a second preset value; both the first preset value and the second preset value are obtained according to the target media substream.
The server according to claim 13 or 14, wherein the second type of parameter comprises a sub-stream list, and the sub-stream list includes the number of at least one target media sub-stream.
The server according to claim 13, wherein the second type of parameter comprises a sub-stream pattern, and the sub-stream pattern is an N-bit bit stream, wherein N is the number of media sub-streams included in the target media stream, and each bit of the sub-stream pattern is associated with a specific media sub-stream of the target media stream and is used to indicate whether that media sub-stream is a target media sub-stream to be transmitted.
The server according to claim 13, wherein the media segment generating component is further configured to encapsulate media substream description information into the media segment, the media substream description information comprising at least one entry, wherein each entry corresponds to a media substream of the media stream and includes at least the following field: media substream number.
The server according to claim 17, wherein each entry further comprises at least one of the following fields: media component identifier, substream type, substream bit rate, substream priority, coding level, viewpoint identifier, video resolution, video frame rate, channel identification, audio sample rate, language type.
The server according to claim 13, wherein the media segment generating component is further configured such that, when the pull command carries at least one parameter of the third type, each parameter of the third type corresponds to at least one constraint condition on the candidate media units, and the candidate media units to be transmitted include all media units in each target media substream that simultaneously satisfy all the constraint conditions corresponding to the third type of parameters.
The server according to claim 19, wherein the media units in the target media sub-streams adopt synchronized numbering, wherein each time a specified time period elapses, all media units generated in each target media sub-stream during that time period are associated with the same new sequence number; the third type of parameter includes a start sequence number, and the constraint condition corresponding to the start sequence number is:

If the start sequence number is valid, the sequence number of the candidate media unit is after the start sequence number or equal to the start sequence number.
The server according to claim 19, wherein the generation times of the media units in all the target media sub-streams come from the same clock on the server, the third type of parameter includes a start time, and the constraint condition corresponding to the start time is:

If the start time is valid, the generation time of the candidate media unit is after the start time.
The server according to claim 19, wherein the third type of parameter comprises a maximum time offset, and the constraint condition corresponding to the maximum time offset is:

If the maximum time offset is valid, in the target media substream, the generation time interval between the candidate media unit and the latest media unit is smaller than the maximum time offset.
The server according to claim 13, wherein the media segment generation component is further configured to encapsulate, according to the order in which each pull command appears in the media segment request, the candidate media units determined by each pull command into the media segment, wherein, if the parameters carried by a pull command include a unit sorting mode, the candidate media units determined by that pull command are sorted according to the unit sorting mode and then encapsulated into the media segment; if the pull command does not carry a unit sorting mode, the candidate media units determined by that pull command are sorted according to a default sorting mode and then encapsulated into the media segment.
The server according to claim 23, wherein the unit sorting mode is a cascade of one or more basic sorting modes, and the basic sorting modes include the following types: time forward sorting, time reverse sorting, sequence number forward sorting, sequence number reverse sorting, substream number order sorting, and substream list order sorting.
A computer device, characterized by comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the adaptive real-time delivery method for a media stream according to any one of claims 1-12.
A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the program is executed by a processor to implement the adaptive real-time delivery method for a media stream according to any one of claims 1-12.