CN113873343B - Self-adaptive real-time delivery method of media stream and server - Google Patents


Info

Publication number
CN113873343B
CN113873343B (application number CN202010614997.5A)
Authority
CN
China
Prior art keywords
media
stream
sub
target
streams
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010614997.5A
Other languages
Chinese (zh)
Other versions
CN113873343A (en)
Inventor
姜红旗
辛振涛
姜红艳
申素辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kaiguang Information Technology Co., Ltd.
Original Assignee
Beijing Kaiguang Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kaiguang Information Technology Co., Ltd.
Priority to CN202010614997.5A
Priority to PCT/CN2021/103196 (published as WO2022002070A1)
Publication of CN113873343A
Application granted
Publication of CN113873343B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/63Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643Communication protocols
    • H04N21/6437Real-time Transport Protocol [RTP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/858Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/858Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
    • H04N21/8586Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot by using a URL

Abstract

The application discloses an adaptive real-time delivery method for a media stream, and a server. The method comprises the following steps: receiving a media segment request sent by a client, the request carrying at least one pull command; generating a media segment according to the request, wherein for each pull command in the request a target media stream to be transmitted is selected, at least one target media sub-stream within that stream is selected, the candidate media units to be transmitted in each target sub-stream are determined, and the candidate media units determined by all pull commands are packaged into the media segment; and sending the media segment to the client. According to the embodiments of the application, the media units of the selected sub-streams can be combined in real time into media segments driven by client requests, which reduces storage overhead on the server, simplifies synchronized transmission of the sub-streams, and uniformly supports adaptive real-time delivery of many kinds of multi-sub-stream media streams.

Description

Self-adaptive real-time delivery method of media stream and server
Technical Field
The present application relates to the field of digital information transmission technologies, and in particular, to a method and a server for adaptive real-time delivery of a media stream.
Background
With the rapid development of the internet, especially the mobile internet, real-time delivery of multimedia information such as audio, video, and images over the internet has become a basic requirement of many applications (such as live webcasting, real-time monitoring, and video conferencing). To meet this requirement, various streaming-media real-time transmission technologies have been proposed; those in wide use today fall into three main categories: RTP/RTSP (Real-time Transport Protocol / Real Time Streaming Protocol), RTMP (Real-Time Messaging Protocol), and HTTP Adaptive Streaming (HAS). HTTP adaptive streaming itself includes multiple schemes: Apple's HLS (HTTP Live Streaming), Microsoft's Smooth Streaming, Adobe's HDS (HTTP Dynamic Streaming), and the MPEG organization's DASH (Dynamic Adaptive Streaming over HTTP).
The common feature of the HTTP adaptive streaming schemes is to cut a media stream into short media segments (2 s to 10 s) and, at the same time, generate an index or manifest file describing those segments (for example, an m3u8 playlist in HLS or an MPD file in DASH), both of which are stored on ordinary Web servers. The client obtains the URL (Uniform Resource Locator) of each media segment by accessing the playlist or manifest file, then downloads the media segments one by one over HTTP and plays them. The main difference between these schemes lies in the encapsulation format of the media segments and the format of the manifest file.
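The manifest-driven workflow described above can be sketched as follows. This is a minimal, illustrative parser for an m3u8-style playlist, not part of the claimed method; the playlist content and base URL are hypothetical, and real HLS playlists carry many more tags:

```python
def parse_m3u8(playlist_text, base_url):
    """Extract media-segment URLs from a minimal m3u8 playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines that are not tags (not starting with '#') name segments.
        if line and not line.startswith("#"):
            urls.append(base_url + line)
    return urls

playlist = """#EXTM3U
#EXT-X-TARGETDURATION:4
#EXTINF:4.0,
seg_001.ts
#EXTINF:4.0,
seg_002.ts
"""
segment_urls = parse_m3u8(playlist, "https://example.com/live/")
```

The client would then fetch each URL in `segment_urls` over HTTP, re-fetching the playlist periodically in the live case, which is exactly the update loop criticized later in this description.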
Compared with RTP/RTSP and RTMP, HTTP adaptive streaming is easy to deploy on ordinary Web servers and fits the existing infrastructure of the Internet, including CDNs, caches, firewalls, NATs, and the like, so it can support large-scale user access. Meanwhile, by providing media segments at several bit rates, it lets the client select segments with a suitable bit rate according to network conditions and terminal capability, thereby achieving bit-rate adaptation. HTTP adaptive streaming has therefore become the mainstream way of delivering real-time streaming media on the internet.
With the development of multimedia technology, the form of media streams has become more and more complex. The earliest media streams typically contained only one audio and/or one video sub-stream, but a media stream transmitted over the internet may now comprise dozens of media sub-streams, arising in the following ways: 1) multiple types of media sub-streams can be generated from the same scene, including video, audio, subtitles, pictures, auxiliary information, data, and so on, and these sub-streams need to be mixed together for transmission; 2) to match network bandwidth and the processing capacity of different terminals, the same video may be encoded into several sub-streams differing in resolution, frame rate, and bit rate, and the same audio into several sub-streams differing in language, sampling rate, and bit rate; 3) multi-view video, where one scene generates multiple video sub-streams from different viewpoints, such as 3D video or free-viewpoint video, for a more realistic video experience; 4) multi-channel audio, where the same scene is sampled from different locations to produce multiple audio sub-streams for an immersive audio experience; 5) Scalable Video Coding (SVC), where one video is encoded into a base layer and several enhancement layers so that transmission can adapt to the network bandwidth. Further, any combination of the above (for example, multi-view video with multi-rate or scalable coding for each view) produces a staggering number of media sub-streams and media streams.
In the related art, when the various HTTP adaptive streaming protocols are used to transmit such multi-sub-stream media streams, the streams must be pre-segmented and a corresponding manifest file generated (for example, an m3u8 playlist in HLS or an MPD file in DASH). There are two pre-segmentation schemes:
Scheme 1: combined sub-stream segmentation, i.e., the video and audio sub-stream fragments of the same time range are encapsulated in the same media segment, which corresponds to one HTTP URL. The client needs only one request to obtain the corresponding video and audio, which guarantees sub-stream synchronization and simplifies processing at the receiving end. However, once the number of video and audio sub-streams grows, the number of different video/audio combinations rises quickly, and every combination produces a new segment; the sub-streams are thus stored repeatedly on the server, increasing its storage overhead.
Scheme 2: independent sub-stream segmentation, i.e., each sub-stream is segmented on its own, while time alignment between the segments of different sub-streams is maintained, and each sub-stream segment corresponds to one URL. With independent segmentation the client can request the segments of each sub-stream as needed, and the server need not store combined segments. But the client must submit multiple requests to obtain the segments of different sub-streams, which increases transmission overhead and the difficulty of synchronization; moreover, synchronization between the segments of different sub-streams must be strictly guaranteed at pre-segmentation time, which makes the pre-segmentation of each sub-stream more complex.
Furthermore, the above HTTP adaptive streaming schemes have another problem: to support real-time delivery, the server must continuously update the manifest file, and the client must fetch the manifest file first to learn the URL of the latest media segment. Because the manifest file takes some time to reach the client, the copy the client holds cannot reflect the media segments currently being generated on the server, which hurts the real-time performance of the media stream. When the number of sub-streams or combinations in a media stream reaches several tens, the manifest file becomes very complex, further increasing the transmission overhead and the client's processing overhead in receiving the media stream.
In summary, HTTP adaptive streaming based on pre-segmentation and manifest files is not suitable for adaptive real-time delivery of media streams containing many sub-streams, and a new delivery method urgently needs to be designed.
Summary of the Application
The present application is directed to solving, at least in part, one of the technical problems in the related art.
To this end, a first object of the present application is to propose an adaptive real-time delivery method for media streams that reduces storage overhead on the server, simplifies synchronized transmission of the sub-streams, and uniformly supports adaptive real-time delivery of various kinds of multi-sub-stream media streams (e.g., multi-rate coding, multi-view, multi-channel, or scalable coding).
A second object of the present application is to propose an adaptive real-time delivery server for media streams.
A third object of the present application is to propose a computer device.
A fourth object of the present application is to propose a non-transitory computer-readable storage medium.
To achieve the above objects, an embodiment of the first aspect of the present application provides an adaptive real-time delivery method for a media stream, where the media stream comprises at least one media sub-stream and each media sub-stream is a sequence of media units generated in real time on a server; each media sub-stream is associated with a sub-stream number, and each media unit is associated with a generation time and/or a sequence number indicating its generation order within the sub-stream. The method comprises the following steps: receiving a media segment request sent by a client, the request carrying at least one pull command, where a pull command carries zero or more control parameters, the control parameters comprising a first-type parameter indicating the target media stream to be transmitted, a second-type parameter indicating the target media sub-streams to be transmitted, and a third-type parameter indicating the candidate media units to be transmitted; generating a media segment according to the request, wherein for each pull command the target media stream is selected, at least one target media sub-stream within it is selected, the candidate media units to be transmitted in each target sub-stream are determined, and the candidate media units determined by all pull commands are encapsulated into the media segment; and sending the media segment to the client.
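As a rough illustration of such a request, the sketch below serializes a list of pull commands into one request URL. The `pull(...)` query encoding and the parameter names (`stream`, `substreams`, `max_offset`) are assumptions for illustration only; the application does not fix a wire format:

```python
def build_segment_request(base_url, pull_commands):
    """Serialize pull commands into a single media-segment request URL.

    Each pull command is a dict of optional control parameters; keys with
    value None are treated as absent (server-side defaults then apply).
    """
    parts = []
    for cmd in pull_commands:
        fields = ";".join(f"{k}={v}" for k, v in cmd.items() if v is not None)
        parts.append("pull(" + fields + ")")
    return base_url + "?" + "&".join(parts)
```

For instance, one request could carry two pull commands: one selecting a video sub-stream and one selecting an audio sub-stream of the same target stream, so the client still obtains both in a single round trip.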
According to the adaptive real-time delivery method of the embodiments of the application, the media units of the sub-streams can be combined arbitrarily according to client requests, media segments are generated in real time, and the segments are delivered to the client. First, the server only needs to store media units per sub-stream and need not generate segments for every sub-stream combination in advance, which reduces the server's storage requirement and simplifies synchronization at the client: a single request yields the combined segment of several sub-streams over the same time period, so synchronized reception of the sub-streams is easy to guarantee. Second, the client can dynamically adjust the target media sub-streams in its media segment requests according to application needs and network conditions, so adaptive delivery of various kinds of multi-sub-stream media streams (such as multi-rate coding, multi-view, multi-channel, or scalable coding) is supported uniformly. Finally, because every media segment is generated on demand by a client request, no manifest file is needed regardless of how many sub-streams the media stream comprises, and the client need not request or parse one; this significantly reduces the transmission and processing overhead of complex manifest files and thus the real-time delivery latency and transmission overhead of the media stream.
In addition, the adaptive real-time delivery method of a media stream according to the above embodiment of the present application may further have the following additional technical features:
Optionally, in an embodiment of the present application, generating a media segment according to the media segment request includes: if the pull command does not carry a first-type parameter, the target media stream to be transmitted is a default specified media stream; if the pull command does not carry a second-type parameter, the target media sub-streams to be transmitted are at least one default specified media sub-stream of the target media stream; if the pull command does not carry a third-type parameter, the candidate media units are the default specified media units of each target media sub-stream, namely the media units whose sequence-number gap to the latest media unit of the sub-stream is smaller than a first preset value, or whose generation-time gap to the latest media unit of the sub-stream is smaller than a second preset value, the first and second preset values both being derived from the target media sub-stream.
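A minimal sketch of this default selection, assuming each sub-stream is held as a list of (sequence number, generation time, payload) tuples ordered oldest-first; for simplicity the preset gaps are passed in as arguments rather than derived from the sub-stream as the embodiment describes:

```python
def default_candidates(units, max_seq_gap=None, max_time_gap=None):
    """Pick the default media units of one sub-stream.

    Returns the units whose sequence number (or generation time) lies
    within the given gap of the latest unit; exactly one gap is expected.
    `units` is a list of (seq, gen_time_ms, payload) tuples, oldest first.
    """
    if not units:
        return []
    latest_seq, latest_time, _ = units[-1]
    if max_seq_gap is not None:
        # Sequence-number window relative to the newest unit.
        return [u for u in units if latest_seq - u[0] < max_seq_gap]
    # Generation-time window relative to the newest unit.
    return [u for u in units if latest_time - u[1] < max_time_gap]
```

This keeps the freshest tail of the sub-stream, which matches the real-time intent: a client that supplies no third-type parameter simply receives the most recent units.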
Optionally, in an embodiment of the present application, the second type of parameter includes a sub-stream list, and the sub-stream list includes a number of at least one target media sub-stream.
Optionally, in an embodiment of the present application, the second-type parameter includes a sub-stream pattern, which is an N-bit bit string, where N is the number of media sub-streams contained in the target media stream; each bit of the pattern is associated with a specific media sub-stream of the target media stream and indicates whether that sub-stream is a target media sub-stream to be transmitted.
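For example, a sub-stream pattern could be decoded as below, under the assumption that bit i (reading left to right) maps to sub-stream number i; the exact bit-to-sub-stream mapping is left open by the text:

```python
def decode_substream_pattern(pattern_bits):
    """Return the numbers of the sub-streams selected by an N-bit pattern.

    `pattern_bits` is a string of '0'/'1' characters, one per media
    sub-stream of the target media stream; a '1' at position i selects
    sub-stream number i (an assumed mapping for illustration).
    """
    return [i for i, bit in enumerate(pattern_bits) if bit == "1"]

# A media stream with 5 sub-streams; request sub-streams 0, 2 and 3:
selected = decode_substream_pattern("10110")
```

Compared with a sub-stream list, the pattern has fixed length N bits regardless of how many sub-streams are selected, which keeps the pull command compact when many sub-streams are requested at once.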
Optionally, in an embodiment of the present application, generating a media segment according to the media segment request further includes: encapsulating media sub-stream description information into the media segment, the media sub-stream description information including at least one entry, wherein each entry corresponds to one media sub-stream of the media stream and includes at least one field: the media sub-stream number.
Optionally, in an embodiment of the present application, each entry further includes at least one of the following fields: media component identification, substream type, substream rate, substream priority, coding hierarchy, viewpoint identification, video resolution, video frame rate, channel identification, audio sampling rate, language type.
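Such an entry might be modeled as follows. Only the sub-stream number is mandatory; the optional field names here are illustrative renderings of the fields listed above, not identifiers defined by the application:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SubStreamEntry:
    """One entry of the media sub-stream description information."""
    number: int                                # media sub-stream number (mandatory)
    component_id: Optional[str] = None         # media component identification
    substream_type: Optional[str] = None       # e.g. "video", "audio", "subtitle"
    bitrate_kbps: Optional[int] = None         # sub-stream rate
    priority: Optional[int] = None             # sub-stream priority
    coding_layer: Optional[int] = None         # coding hierarchy (e.g. SVC layer)
    view_id: Optional[str] = None              # viewpoint identification
    resolution: Optional[str] = None           # video resolution, e.g. "1920x1080"
    frame_rate: Optional[float] = None         # video frame rate
    channel_id: Optional[str] = None           # audio channel identification
    sample_rate_hz: Optional[int] = None       # audio sampling rate
    language: Optional[str] = None             # language type
```

With this description carried inside the media segment itself, the client can learn what sub-streams exist and pick targets for its next pull commands without any separate manifest file.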
Optionally, in an embodiment of the present application, generating a media segment according to the media segment request further includes: if the pull command carries at least one third-type parameter, each third-type parameter corresponds to at least one constraint on candidate media units, and the candidate media units to be transmitted are all media units of each target media sub-stream that simultaneously satisfy all the constraints corresponding to the third-type parameters.
Optionally, in an embodiment of the present application, the media units of the target media sub-streams use synchronized numbering: each time a specified time period elapses, all media units generated by every target media sub-stream within that period are associated with the same new sequence number. The third-type parameter includes a start sequence number, whose constraint is: if the start sequence number is valid, the sequence number of a candidate media unit follows or equals the start sequence number.
Optionally, in an embodiment of the present application, the generation times of the media units in all the target media sub-streams are derived from the same clock on the server, the third type of parameter includes a start time, and the constraint condition corresponding to the start time is: if the start time is valid, the generation time of the candidate media unit is after the start time.
Optionally, in an embodiment of the present application, the third type of parameter includes a maximum time offset, and a constraint condition corresponding to the maximum time offset is: if the maximum time offset is valid, the generation time interval of the candidate media unit and the latest media unit in the target media sub-stream is less than the maximum time offset.
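The three constraints above compose as a conjunction over each target sub-stream's units. A sketch under these assumptions: units are (sequence number, generation time) pairs ordered oldest-first, and a parameter passed as None is "invalid", so its constraint is skipped:

```python
def filter_candidates(units, start_seq=None, start_time=None, max_offset=None):
    """Keep the units of one sub-stream that satisfy all valid constraints.

    start_seq   : keep units with sequence number >= start_seq
    start_time  : keep units generated strictly after start_time
    max_offset  : keep units whose gap to the newest unit's generation
                  time is less than max_offset
    """
    if not units:
        return []
    latest_time = units[-1][1]
    selected = []
    for seq, t in units:
        if start_seq is not None and seq < start_seq:
            continue
        if start_time is not None and t <= start_time:
            continue
        if max_offset is not None and latest_time - t >= max_offset:
            continue
        selected.append((seq, t))
    return selected
```

In practice a client would set `start_seq` (or `start_time`) to just past the last unit it already received, so consecutive media segment requests pick up exactly where the previous segment ended.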
Optionally, in an embodiment of the present application, encapsulating the candidate media units determined by the respective pull commands into the media segment includes: encapsulating the candidate media units of each pull command into the media segment in the order of the pull commands within the media segment request, wherein if the parameters carried by a pull command include a unit ordering mode, the candidate media units determined by that command are sorted according to that mode before encapsulation, and otherwise they are sorted according to a default ordering mode before encapsulation.
Optionally, in an embodiment of the present application, the unit ordering mode is a cascade of one or more basic orderings, where the basic orderings include the following categories: forward by time, reverse by time, forward by sequence number, reverse by sequence number, by sub-stream number order, and by sub-stream list order.
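A cascade of basic orderings can be realized with repeated stable sorts, applying the least significant ordering first. The mode names below are shorthand for the categories listed above, and the dict representation of a media unit is assumed for illustration:

```python
def sort_units(units, order):
    """Sort media units by a cascade of basic orderings.

    `units` are dicts with 'substream', 'seq' and 'time' keys; `order`
    lists basic-mode names from most to least significant.
    """
    basic = {
        "time_fwd": (lambda u: u["time"], False),
        "time_rev": (lambda u: u["time"], True),
        "seq_fwd": (lambda u: u["seq"], False),
        "seq_rev": (lambda u: u["seq"], True),
        "substream": (lambda u: u["substream"], False),
    }
    result = list(units)
    # Python's sort is stable, so sorting by the least significant key
    # first and the most significant key last yields the cascade.
    for mode in reversed(order):
        key, rev = basic[mode]
        result.sort(key=key, reverse=rev)
    return result
```

For example, the cascade `["seq_fwd", "substream"]` orders units by sequence number and breaks ties by sub-stream number, i.e. it interleaves the sub-streams period by period, which is a natural layout for synchronized playback.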
To achieve the above objects, an embodiment of the second aspect of the present application provides an adaptive real-time delivery server for a media stream, where the media stream comprises at least one media sub-stream and each media sub-stream is a sequence of media units generated in real time on the server; each media sub-stream is associated with a sub-stream number, and each media unit is associated with a generation time and/or a sequence number indicating its generation order within the sub-stream. The server includes: a client interface component, configured to receive a media segment request sent by a client, the request carrying at least one pull command, where a pull command carries zero or more control parameters, the control parameters comprising a first-type parameter indicating the target media stream to be transmitted, a second-type parameter indicating the target media sub-streams to be transmitted, and a third-type parameter indicating the candidate media units to be transmitted; a media segment generating component, configured to generate a media segment according to the request, wherein for each pull command the target media stream is selected, at least one target media sub-stream within it is selected, the candidate media units to be transmitted in each target sub-stream are determined, and the candidate media units determined by all pull commands are encapsulated into the media segment; and a media segment sending component, configured to send the generated media segment to the client.
The adaptive real-time delivery server of the embodiments of the application can combine the media units of the sub-streams arbitrarily according to client requests, generate media segments in real time, and deliver them to the client. First, the server only needs to store media units per sub-stream and need not generate segments for every sub-stream combination in advance, which reduces the server's storage requirement and simplifies synchronization at the client: a single request yields the combined segment of several sub-streams over the same time period, so synchronized reception of the sub-streams is easy to guarantee. Second, the client can dynamically adjust the target media sub-streams in its media segment requests according to application needs and network conditions, so adaptive delivery of various kinds of multi-sub-stream media streams (such as multi-rate coding, multi-view, multi-channel, or scalable coding) is supported uniformly. Finally, because every media segment is generated on demand by a client request, no manifest file is needed regardless of how many sub-streams the media stream comprises, and the client need not request or parse one; this significantly reduces the transmission and processing overhead of complex manifest files and thus the real-time delivery latency and transmission overhead of the media stream.
In addition, the adaptive real-time delivery server for media streams according to the above-mentioned embodiment of the present application may further have the following additional technical features:
Optionally, in an embodiment of the present application, the media segment generating component is further configured to: when the pull command does not carry a first-type parameter, take a default specified media stream as the target media stream to be transmitted; when the pull command does not carry a second-type parameter, take at least one default specified media sub-stream of the target media stream as the target media sub-streams to be transmitted; and when the pull command does not carry a third-type parameter, take as candidate media units the default specified media units of each target media sub-stream, namely the media units whose sequence-number gap to the latest media unit of the sub-stream is smaller than a first preset value, or whose generation-time gap to the latest media unit of the sub-stream is smaller than a second preset value, the first and second preset values both being derived from the target media sub-stream.
Optionally, in an embodiment of the present application, the second type of parameter includes a sub-stream list, and the sub-stream list includes a number of at least one target media sub-stream.
Optionally, in an embodiment of the present application, the second-type parameter includes a sub-stream pattern, which is an N-bit bit string, where N is the number of media sub-streams contained in the target media stream; each bit of the pattern is associated with a specific media sub-stream of the target media stream and indicates whether that sub-stream is a target media sub-stream to be transmitted.
Optionally, in an embodiment of the present application, the media segment generating component is further configured to encapsulate media sub-stream description information into the media segment, the media sub-stream description information including at least one entry, where each entry corresponds to one media sub-stream of the media stream and includes at least one field: the media sub-stream number.
Optionally, in an embodiment of the present application, each entry further includes at least one of the following fields: media component identification, substream type, substream rate, substream priority, coding hierarchy, viewpoint identification, video resolution, video frame rate, channel identification, audio sampling rate, language type.
Optionally, in an embodiment of the present application, the media segment generating component is further configured to, when the pull command carries at least one of the third type of parameters, where each third type of parameter corresponds to at least one constraint condition on the candidate media units, determine that the candidate media units to be transmitted include all media units in each target media sub-stream that simultaneously satisfy all constraint conditions corresponding to the carried third type of parameters.
Optionally, in an embodiment of the present application, the media units in the target media sub-streams adopt synchronous numbering, where each time a specified time period elapses, all media units generated by each target media sub-stream within that time period are associated with the same new sequence number; the third type of parameter includes a start sequence number, and the constraint condition corresponding to the start sequence number is: if the start sequence number is valid, the sequence number of a candidate media unit is subsequent to or equal to the start sequence number.
Optionally, in an embodiment of the present application, the generation times of the media units in all the target media sub-streams are derived from the same clock on the server, the third type of parameter includes a start time, and the constraint condition corresponding to the start time is: if the start time is valid, the generation time of the candidate media unit is after the start time.
Optionally, in an embodiment of the present application, the third type of parameter includes a maximum time offset, and a constraint condition corresponding to the maximum time offset is: if the maximum time offset is valid, the generation time interval of the candidate media unit and the latest media unit in the target media sub-stream is less than the maximum time offset.
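Taken together, the start sequence number, start time, and maximum time offset act as filters over the media units of a target sub-stream: an invalid (absent) parameter imposes no constraint, and a unit must pass every valid constraint to remain a candidate. A minimal sketch in Python, using a hypothetical `MediaUnit` record (the field and parameter names are illustrative, not part of the protocol):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MediaUnit:
    seq: int        # sequence number within its media sub-stream
    gen_time: int   # generation time on the server clock (ms)

def filter_candidates(units: List[MediaUnit],
                      seq_begin: Optional[int] = None,
                      time_begin: Optional[int] = None,
                      max_time_offset: Optional[int] = None) -> List[MediaUnit]:
    """Keep only the media units satisfying every valid third-type constraint."""
    latest = max((u.gen_time for u in units), default=0)
    kept = []
    for u in units:
        if seq_begin is not None and u.seq < seq_begin:
            continue  # seqBegin: unit must be at or after the start sequence number
        if time_begin is not None and u.gen_time <= time_begin:
            continue  # timeBegin: unit must be generated strictly after the start time
        if max_time_offset is not None and latest - u.gen_time >= max_time_offset:
            continue  # maxTimeOffset: interval to the newest unit must stay below the offset
        kept.append(u)
    return kept
```

Because each constraint is independent, a pull command carrying several third-type parameters simply intersects the corresponding filters.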
Optionally, in an embodiment of the present application, the media segment generating component is further configured to package, to the media segment, the candidate media units determined by each pull command according to an order in which each pull command appears in the media segment request, where if a parameter carried by any one pull command includes a unit sorting manner, the candidate media units determined by the pull command are sorted according to the unit sorting manner and then packaged to the media segment, and if the parameter carried by any one pull command does not include the unit sorting manner, the candidate media units determined by the pull command are sorted according to a default sorting manner and then packaged to the media segment.
Optionally, in an embodiment of the present application, the unit sorting manner is a cascade of one or more of the following basic sorting manners: time forward ordering, time reverse ordering, sequence number forward ordering, sequence number reverse ordering, sub-stream number sequence ordering, sub-stream list sequence ordering.
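Such a cascade can be realized with a stable sort applied from the last basic manner to the first, so the first manner in the cascade becomes the primary key. An illustrative sketch (the dictionary field names and manner labels are assumptions, not protocol-defined):

```python
def cascade_sort(units, manners):
    """Sort media units by a cascade of basic sorting manners.

    units: list of dicts with keys "sub" (sub-stream number),
           "seq" (sequence number), and "time" (generation time).
    manners: list of manner names, highest precedence first.
    """
    keyfuncs = {
        "time_fwd": (lambda u: u["time"], False),
        "time_rev": (lambda u: u["time"], True),
        "seq_fwd":  (lambda u: u["seq"], False),
        "seq_rev":  (lambda u: u["seq"], True),
        "substream": (lambda u: u["sub"], False),  # sub-stream number sequence ordering
    }
    out = list(units)
    # Python's sort is stable, so applying the manners in reverse order
    # leaves the first manner as the dominant ordering criterion.
    for name in reversed(manners):
        key, rev = keyfuncs[name]
        out.sort(key=key, reverse=rev)
    return out
```

For example, the cascade `["substream", "seq_fwd"]` groups units by sub-stream number and orders each group by ascending sequence number.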
An embodiment of a third aspect of the present application provides a computer device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform a method of adaptive real-time delivery of a media stream as described in the above embodiments.
A fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for adaptive real-time delivery of a media stream according to the foregoing embodiment.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic processing diagram of a method for adaptive real-time delivery of a media stream according to an embodiment of the present application;
FIG. 2 is a diagram illustrating an adaptive real-time delivery process of a media stream according to an embodiment of the present application;
fig. 3 is a schematic diagram of a sub-stream pattern according to an embodiment of the present application;
fig. 4 is a schematic diagram of media sub-stream description information (including multi-rate coded sub-streams) according to an embodiment of the present application;
fig. 5 is a diagram of media sub-stream description information (including multi-view video sub-streams) according to an embodiment of the present application;
fig. 6 is a diagram of media sub-stream description information (including scalable coded sub-streams) according to an embodiment of the present application;
FIG. 7 is a diagram illustrating an adaptive real-time delivery process of a media stream according to an embodiment of the present application;
FIG. 8 is a diagram illustrating an adaptive real-time delivery process of a media stream according to an embodiment of the present application;
FIG. 9 is a schematic diagram of candidate media unit packages for different media unit ranks in accordance with one embodiment of the present application;
FIG. 10 is a block diagram of an adaptive real-time delivery server for media streaming according to an embodiment of the application;
FIG. 11 is a block diagram of an adaptive real-time delivery server for media streaming according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
In the internet, it is often necessary to transmit, in real time, various audio streams, video streams or data streams generated in real time from one network node to another. The network nodes include various terminals such as PCs, mobile phones and tablet computers, as well as various application servers such as video servers and audio servers; the transmitted audio streams, video streams or data streams are collectively referred to as media streams. The delivery process of a media stream can be described by a general client-server model: the server delivers the generated media stream to the client in real time. Here, the server and the client refer to logical functional entities, where the server is the functional entity that sends a media stream and the client is the functional entity that receives a media stream. The server and client may reside on any network node.
In many application scenarios, there are multiple audio streams, video streams or data streams that need to be transmitted synchronously; for example, a live media stream of a concert includes at least one video stream and at least one audio stream. Once any one or more of multi-rate coding/multi-view coding/multi-channel coding/scalable coding is adopted, multiple data streams/video streams/audio streams will exist in the live stream. All synchronously transmitted video streams, audio streams or data streams in a live media stream to be transmitted are referred to herein as media sub-streams of the media stream.
Each transmitted media sub-stream is a sequence of media units generated in real time on the server. For different media sub-streams, the corresponding media units may be chosen independently. When the media sub-stream is a byte stream generated in real time, a byte can be selected as a media unit; when the media sub-stream is an audio stream or a video stream obtained through real-time sampling, an original audio frame or a video frame can be selected as a media unit; when the media sub-stream is an audio stream or a video stream that is sampled and encoded in real time, an encoded audio frame, an encoded video frame, or an Access Unit may be selected as a media unit; when the media sub-stream is an audio stream or a video stream that is sampled, encoded, and encapsulated in real time, the encapsulated transport packets (e.g., RTP packets, PES/PS/TS packets, etc.) may be selected as media units; when the media sub-stream is a real-time sampled, encoded, encapsulated, and pre-segmented audio or video stream, a segmented media segment (e.g., a TS-format segment as used in the HLS protocol, or an fMP4-format segment as used in the DASH protocol) may be selected as a media unit.
Each media unit may be associated with a generation time, which is typically a timestamp. Each media unit may also be associated with a sequence number that indicates the order in which the media units are generated in the media sub-stream. When sequence numbers are used to indicate the order in which media units are generated, the meaning of the sequence number needs to be defined in terms of the specific media unit. When the media unit is a byte, the sequence number of the media unit is a byte sequence number; when the media units are audio frames or video frames, the sequence numbers of the media units are frame sequence numbers; when the media unit is a transport packet, the sequence number of the media unit is a packet sequence number; when the media unit is a stream segment, the sequence number of the media unit is the segment sequence number (e.g., the Media Sequence number of each TS segment in HLS). A media sub-stream may be associated with both a sequence number indicating the generation order and a generation time; for example, when the media sub-stream is an RTP packet stream, the RTP header has both a packet Sequence Number field to indicate the order of the RTP packets and a Timestamp field to indicate the generation time of the media data encapsulated in the RTP packet.
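For the RTP case, both fields sit at fixed offsets in the 12-byte fixed RTP header defined by RFC 3550 (Sequence Number in bytes 2-3, Timestamp in bytes 4-7, big-endian), so a receiver can read a media unit's sequence number and generation time directly; a sketch:

```python
import struct

def rtp_seq_and_timestamp(packet: bytes):
    """Read the Sequence Number and Timestamp fields from the fixed
    12-byte RTP header (RFC 3550): bytes 2-3 and bytes 4-7, big-endian."""
    if len(packet) < 12:
        raise ValueError("truncated RTP header")
    seq, ts = struct.unpack_from("!HI", packet, 2)
    return seq, ts
```

The `!` format prefix selects network (big-endian) byte order with no padding, matching the on-the-wire header layout.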
To identify different media sub-streams in one media stream, the media sub-streams need to be numbered: each media sub-stream is associated with a unique sub-stream number. Generally, when a media stream includes N media sub-streams, the corresponding sub-streams are numbered 1, 2, …, N. Each media sub-stream may use a generation time and/or sequence number to describe the generation order of each media unit.
The generation time of the media units of the different media sub-streams may be timed synchronously or independently. When independent clocking is used, the generation times of the different media sub-streams are derived from asynchronous clocks, and therefore, the correspondence of the generation times between the different media sub-streams needs to be recorded separately. When synchronous timing is adopted, the generation time of different media sub-streams is derived from the same reference clock, and the synchronous relation of the media units in the different media sub-streams can be known through the generation time. For simplicity, it is assumed that in all embodiments described below, the generation times of all media sub-streams in a media stream use the same reference clock on the server, corresponding to the same timeline, such as greenwich time.
In an embodiment of the present application, a media stream includes at least one media sub-stream, where each media sub-stream may be of any type, such as an audio stream, a video stream, or a subtitle stream, and each media sub-stream may also be of any type of transport encapsulation, such as an RTP packet stream or an MPEG2-TS stream. When the media sub-stream is an RTP packet stream, the media unit is an RTP packet, the Sequence Number (Sequence Number) of the RTP packet is the Sequence Number of the media unit, and the Timestamp (Timestamp) of the RTP packet is the Timestamp of the media unit. When the media sub-stream is an MPEG2-TS stream, the TS stream may be divided into TS segments of fixed duration (e.g., about 1 second) in a manner similar to HLS/DASH, each TS segment serves as a media unit, each TS segment may include a plurality of media frames, and then the segments are sequenced in the generation order as media unit sequence numbers, and the timestamp of the first media frame included in each segment indicates the generation time of the segment.
In a conventional real-time streaming media protocol such as RTP or RTMP, a server push method is adopted: the server actively sends each new media unit to the client as soon as it is generated. The method of the embodiment of the present application is similar to various HTTP adaptive streaming schemes (such as HLS and MPEG-DASH) in that it adopts a client pull method; the difference is that in the existing HTTP adaptive streaming schemes, a client requests or pulls pre-segmented fragments according to a manifest file, and each fragment can be identified by one URL, whereas in the embodiment of the present application, a media segment is not pre-segmented but is generated by the server in real time according to the request of the client, and the client can control the content of the media segment.
Specifically, fig. 1 is a schematic processing procedure diagram of a method for adaptive real-time delivery of a media stream according to an embodiment of the present application.
As shown in fig. 1, a media stream includes at least one media substream, each media substream is a sequence of media units generated in real time on a server, where each media substream is associated with a substream number, and each media unit is associated with a generation time and/or a sequence number indicating a generation order of the media units in the media substream, and the adaptive real-time delivery method of the media stream includes the following steps:
in step S101, a media segment request sent by a client is received, where the media segment request carries at least one pull command, the pull command carries zero or more control parameters, and the control parameters include a first type of parameter indicating the target media stream to be transmitted, a second type of parameter indicating the target media sub-streams to be transmitted, and a third type of parameter indicating the candidate media units to be transmitted.
In some examples, the control parameters that may be the first type of parameters include, but are not limited to: media stream identification, media stream name, program identification, etc.; control parameters that may be considered as a second type of parameter include, but are not limited to: a substream list, a substream pattern, a substream type, a substream priority, etc.; control parameters that may be used as a third type of parameter include, but are not limited to: start sequence number, start time, maximum time offset, unit type, unit priority, etc. It will be appreciated by those skilled in the art that new control parameters may also be defined as required by further implementation.
In some cases, a media segment request may carry one or more pull commands, which all carry respective control parameters, or a pull command may not carry any control parameters. In addition, new commands other than the pull command may be defined as needed for further implementation.
Of course, in some embodiments, the media segment request may be submitted using any network transport protocol, such as the common HTTP protocol, TCP protocol, UDP protocol, or the like. When the media segment request is submitted by adopting an HTTP protocol, an HTTP-GET mode or an HTTP-POST mode can also be adopted.
When the pull command in the media segment request carries the control parameters, the pull command and the control parameters thereof need to be encapsulated into a character string or a byte stream by adopting a certain encapsulation rule, and then the character string or the byte stream is sent to the server. For example, when HTTP-GET is used to send media segment requests, the commands and their control parameters may be encapsulated as strings into URLs.
An example of a media segment request with HTTP-GET is as follows:
media segment request carrying a pull command (without control parameters):
GET "http://www.xxx-server.com/msreq?cmd=PULL"
media segment request carrying a pull command (with 1 control parameter):
GET "http://www.xxx-server.com/msreq?cmd=PULL&streamID=601"
GET "http://www.xxx-server.com/msreq?cmd=PULL&substreamList=2,3"
GET "http://www.xxx-server.com/msreq?cmd=PULL&substreamPattern=01100100"
GET "http://www.xxx-server.com/msreq?cmd=PULL&seqBegin=1003"
GET "http://www.xxx-server.com/msreq?cmd=PULL&timeBegin=31000"
GET "http://www.xxx-server.com/msreq?cmd=PULL&maxTimeOffset=1000"
media segment request carrying one pull command (with multiple control parameters; longer URL strings are split into multiple lines for ease of display):
GET "http://www.xxx-server.com/msreq?cmd=PULL&streamID=601&timeBegin=32000"
GET "http://www.xxx-server.com/msreq?cmd=PULL&substreamList=1,2&seqBegin=1005"
GET "http://www.xxx-server.com/msreq?cmd=PULL&substreamList=3&seqBegin=1005&unitType=3"
GET "http://www.xxx-server.com/msreq?cmd=PULL&substreamPattern=1101&timeBegin=32000&unitPrio=1"
media segment request carrying multiple pull commands (longer URL strings are split into multiple lines for ease of display):
GET "http://www.xxx-server.com/msreq?cmd=PULL&streamID=601&substreamList=1&timeBegin=32000&cmd=PULL&streamID=601&substreamList=2&seqBegin=1005"
GET "http://www.xxx-server.com/msreq?cmd=PULL&streamID=601&substreamPattern=0100001&timeBegin=32000&cmd=PULL&streamID=601&substreamList=2&seqBegin=1005"
GET "http://www.xxx-server.com/msreq?cmd=PULL&streamID=601&substreamPattern=0100001&timeBegin=32000&cmd=PULL&streamID=602&substreamPattern=0011&timeBegin=32000"
In the URL of the request, the parameter names streamID, substreamList, substreamPattern, seqBegin, timeBegin, maxTimeOffset, unitType, and unitPrio respectively represent the media stream identifier, the sub-stream list, the sub-stream pattern, the start sequence number, the start time, the maximum time offset, the unit type, and the unit priority.
The server may receive the media segment request of the client by using a Web server, extract the corresponding command and its control parameters from the URL of the request, and classify the control parameters carried by each pull command: if a parameter is the media stream identifier or the media stream name, it is a first type parameter; if a parameter is the sub-stream list or the sub-stream pattern, it is a second type parameter; and if a parameter is one of the following: the start sequence number, the start time, the maximum time offset, the unit type, or the unit priority, it is a third type parameter.
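This extraction-and-classification step can be sketched as follows. The parameter-name sets mirror the URL examples above, and splitting the query string on `cmd=PULL` assumes the multi-command encoding shown in those examples; both are illustrative choices, not a definitive implementation:

```python
FIRST_TYPE = {"streamID", "streamName"}
SECOND_TYPE = {"substreamList", "substreamPattern"}
THIRD_TYPE = {"seqBegin", "timeBegin", "maxTimeOffset", "unitType", "unitPrio"}

def parse_media_segment_request(query: str):
    """Split a query string into pull commands and classify each control parameter."""
    commands = []
    parts = query.split("cmd=PULL")
    # parts[0] is whatever precedes the first pull command (usually empty);
    # each later part holds the control parameters of one pull command.
    for chunk in parts[1:]:
        params = {"first": {}, "second": {}, "third": {}}
        for pair in filter(None, chunk.strip("&").split("&")):
            name, _, value = pair.partition("=")
            if name in FIRST_TYPE:
                params["first"][name] = value
            elif name in SECOND_TYPE:
                params["second"][name] = value
            elif name in THIRD_TYPE:
                params["third"][name] = value
        commands.append(params)
    return commands
```

A pull command with no parameters simply yields three empty groups, which triggers the default selections described in the next embodiment.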
In step S102, a media segment is generated according to the media segment request, wherein for each pull command in the media segment request, a target media stream to be transmitted is selected, at least one target media sub-stream to be transmitted in the target media stream is selected, candidate media units to be transmitted in the target media sub-streams are determined, and the candidate media units determined by each pull command are encapsulated into the media segment.
Specifically, the media segment is generated according to the media segment request, and this step can be further divided into several sub-steps S1021 to S1024: firstly, for each pull command in the media segment request, step S1021 selects a target media stream to be transmitted according to a first type parameter, step S1022 selects a target media sub-stream in the target media stream according to a second type parameter, step S1023 determines candidate media units to be transmitted in the target media sub-stream according to a third type parameter, and step S1024 encapsulates the candidate media units determined in all the pull commands into media segments.
In some examples, in step S1021, the target media stream to be transmitted may be selected according to the media stream identifier or the media stream name; in step S1022, the target media sub-streams may be selected according to parameters such as the sub-stream list or the sub-stream pattern; in step S1023, candidate media units may be determined according to parameters such as the start sequence number, the start time, and the maximum time offset; and in step S1024, one or more media units may be encapsulated into a media segment by using a customized encapsulation protocol. For example, a simple encapsulation protocol is as follows: the media segment is composed of a segment header and a segment payload; the segment payload is formed by concatenating a plurality of media units; the segment header indicates the start position and length of each media unit; when the media units do not carry a generation time or sequence number, the segment header should also indicate the sequence number and/or generation time of each media unit; and when the media units do not carry sub-stream numbers, the segment header should also indicate the sub-stream number of each media unit.
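The simple encapsulation protocol just described — a segment header recording per-unit metadata, followed by the concatenated unit payloads — might be sketched like this (the exact field widths and the unit-count field are illustrative assumptions, not defined by the embodiment):

```python
import struct

def pack_media_segment(units):
    """units: list of (substream_no, seq, payload_bytes).

    Segment = header + payload. The header starts with a 2-byte unit count,
    then one 7-byte record per unit: sub-stream number (1 byte),
    sequence number (4 bytes), payload length (2 bytes)."""
    header = struct.pack("!H", len(units))
    payload = b""
    for sub, seq, data in units:
        header += struct.pack("!BIH", sub, seq, len(data))
        payload += data
    return header + payload

def unpack_media_segment(segment):
    """Recover the (substream_no, seq, payload) triples from a segment."""
    (count,) = struct.unpack_from("!H", segment, 0)
    meta_off = 2
    data_off = 2 + count * 7  # payload begins right after the header
    units = []
    for _ in range(count):
        sub, seq, length = struct.unpack_from("!BIH", segment, meta_off)
        meta_off += 7
        units.append((sub, seq, segment[data_off:data_off + length]))
        data_off += length
    return units
```

Because the header carries the sub-stream number and sequence number of every unit, the client can demultiplex the segment back into per-sub-stream unit sequences without any manifest.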
In step S103, the media segment is sent to the client.
In particular, the server may select an appropriate way to transmit the media segment to the client according to a protocol used by the media segment request of the client, for example, when the received media segment request adopts HTTP GET, the generated media segment may be transmitted through an HTTP GET response message: placing the media segment into an entity body of the HTTP response message; if the media segment request is received over some established TCP connection, the generated media segment may be sent directly to the client over the TCP connection.
When the server receives continuous media segment requests from the client, it continuously generates new media segments according to those requests. These new media segments encapsulate the media units recently generated in the selected target media sub-streams and waiting to be sent to the client; the client parses the media segments and can thereby recover the media units of each target media sub-stream in the real-time media stream. This process is shown in fig. 2. The client may continuously adjust the control parameters carried by the pull command in the media segment request according to application needs or network transmission conditions, for example by changing the second type of parameters (media sub-stream list, etc.) and the third type of parameters (start time, maximum time offset, unit priority, etc.), so as to ensure the continuity and real-time delivery of the media stream from the server to the client and its adaptability to a dynamic network.
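On the client side, driving this loop amounts to rebuilding the request URL with updated control parameters after parsing each received segment, e.g. advancing timeBegin past the newest unit already received. A sketch of the URL construction (the `/msreq` path and parameter names follow the earlier URL examples and are illustrative assumptions):

```python
def next_request_url(base, stream_id, substream_list, last_time):
    """Build the next media segment request, pulling only units generated
    after the newest unit already received (via timeBegin)."""
    return (f"{base}/msreq?cmd=PULL&streamID={stream_id}"
            f"&substreamList={','.join(map(str, substream_list))}"
            f"&timeBegin={last_time}")
```

A client adapting to falling bandwidth could, for instance, shrink `substream_list` to drop a high-rate sub-stream before issuing the next request.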
Due to the adoption of the mode of instantly generating the media segments, the method of the embodiment of the application does not need pre-segmentation and manifest files, and does not need a client to receive and process the manifest files, thereby reducing transmission delay and saving cost. Meanwhile, the client can arbitrarily combine the media units in different media sub-streams through the media segment request, and the required media units of each media sub-stream can be obtained only by one-time request, so that synchronous receiving of different media sub-streams is easily ensured. Finally, by adjusting the media substreams and the candidate media units to be received at any time, the method better meets the application requirements of the terminal and adapts to the change of network bandwidth, and can support the adaptive transmission of various media streams adopting multi-substream coding (such as multi-rate coding/multi-view coding/multi-channel coding/scalable coding).
It should be understood that the setting of each step S101, step S102 and step S103 is only for convenience of description, and is not used to limit the execution order of the method. In particular implementation, each step may correspond to a functional entity that can operate independently and interact with each other.
The above is a detailed explanation of embodiment 1, and embodiment 2 will be described in detail below, and in the following embodiments, an example will be made of how the server generates a media segment according to a media segment request.
Optionally, in an embodiment of the present application, generating a media segment according to a media segment request includes: if the pull command does not carry the first type of parameters, the target media stream to be transmitted is the default specified media stream; if the pull command does not carry the second type of parameters, the target media sub-stream to be transmitted is at least one media sub-stream specified by default in the target media stream; and if the pull command does not carry the third type of parameters, the candidate media units include the default specified media units in the target media sub-stream, where the default specified media units are all media units in the target media sub-stream whose sequence-number interval from the latest media unit is smaller than a first preset value, or all media units whose generation-time interval from the latest media unit is smaller than a second preset value, and the first preset value and the second preset value are both obtained according to the target media sub-stream.
Specifically, when only one media stream to be transmitted exists in the server, the pull command sent by the client does not need to carry the first type of parameters, and the media stream is the selected target media stream; when a plurality of media streams to be transmitted exist in the server, one of the media streams can be designated as a default media stream, and when a pull command sent by the client does not carry any first-type parameter, the default media stream is selected as a target media stream.
In actual implementation, the media sub-streams contained in any one target media stream may be varied. For example, a target media stream may contain different types of media sub-streams: video streams, audio streams, subtitle streams, additional information streams, picture streams, and the like. It may contain different bit rates for the same type of media sub-stream, such as media sub-streams corresponding to different resolutions and frame rates for video, and media sub-streams corresponding to different sampling rates for audio. A video stream of the same type and bit rate may also include multiple coding layers (e.g., using Scalable Video Coding (SVC)), where these different coding layers correspond to different priorities. In general, the server should select, among all media sub-streams, one or more media sub-streams suitable for most terminal displays and for normal transmission under most network bandwidth conditions as the default media sub-streams of the target media stream, and select these default media sub-streams as the target media sub-streams when the client does not carry any parameter of the second type.
For any one of the target media sub-streams, in order to guarantee real-time delivery of the media stream, the newly generated media units of the target media sub-stream should be delivered first. When the pull command sent by the client does not carry any third type parameter, the server may take the default specified media units as the candidate media units. These default specified media units are all media units in the target media sub-stream whose sequence-number interval from the newest media unit is less than a first preset value, or all media units whose generation-time interval from the newest media unit is less than a second preset value. When the pull command involves a plurality of target media sub-streams and does not carry any third type parameter, the first preset value or second preset value set for each target media sub-stream should ensure the transmission synchronization of the target media sub-streams.
Based on the description of the relevant embodiments, it can be understood that, with the above implementation manner, even when a pull command in a media segment request sent by a client does not carry a certain type of control parameter, the server can still select the default specified target media stream/target media sub-streams/candidate media units and generate a media segment to send to the client. Fig. 2 is a schematic diagram of a real-time transmission process of a media stream according to an embodiment of the present application, where the server contains only one media stream S1. When the server receives a media segment request MS_REQ1, because MS_REQ1 includes only one pull command and the pull command does not carry any parameter, the target media stream selected by the server is the default media stream S1, and the selected target media sub-streams are the default media sub-streams 1 and 4. For media sub-stream 1 the first preset value is 3, and for media sub-stream 4 the first preset value is 4; accordingly, the server determines the candidate media units of media sub-stream 1 and media sub-stream 4, encapsulates them into the first media segment MS1, and returns MS1 to the client.
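The default selection in this example can be sketched as follows; the per-sub-stream sequence numbers and preset values are illustrative:

```python
def default_candidates(substream_units, first_preset):
    """substream_units: mapping sub-stream number -> list of sequence numbers,
    oldest to newest. first_preset: sub-stream number -> first preset value.

    Returns, per sub-stream, the units whose sequence-number distance from
    the newest unit is smaller than that sub-stream's first preset value."""
    selected = {}
    for sub, seqs in substream_units.items():
        newest = seqs[-1]
        limit = first_preset[sub]
        selected[sub] = [s for s in seqs if newest - s < limit]
    return selected
```

With a first preset value of 3 for sub-stream 1 and 4 for sub-stream 4, this yields the newest 3 and newest 4 units respectively, mirroring the fig. 2 scenario.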
Embodiment 3, in the following embodiments, a description will be given of how the server selects the target media sub-stream to be transmitted according to the second type of parameters. The second type of parameters given in the embodiments of the present application includes two types:
1) Sub-stream list
The substream list directly gives the number of the target media substream. For example: substreamList =1,4 may be used to indicate that the selected target media substream is substream 1 and substream 4.
2) Sub-stream pattern
The sub-stream pattern is an N-bit stream, where N is the number of media sub-streams included in the target media stream, and each bit of the sub-stream pattern is associated with a specific media sub-stream of the target media stream and is used to indicate whether that media sub-stream is a target media sub-stream to be transmitted. Fig. 3 shows an example of a sub-stream pattern (N = 8): the sub-stream pattern is the bit stream 01101000, and each bit from left to right corresponds to sub-stream 1 through sub-stream 8, where a bit value of 1 indicates that the associated sub-stream is a target media sub-stream. This sub-stream pattern therefore selects three target media sub-streams: sub-stream 2, sub-stream 3 and sub-stream 5.
Generally, in implementation, when the number of target media sub-streams is small or the packaging order of the target media sub-streams needs to be indicated, it is recommended to use a sub-stream list to represent the target media sub-streams; when the number of target media substreams is large and the same clock timing is used between substreams, it is recommended to use a substream pattern.
In other embodiments, in addition to the number of the sub-stream as the second type parameter, the characteristics of the sub-stream may be defined as the second type parameter. The characteristics of these sub-streams include: the substream type, substream priority, viewpoint number, channel number, video resolution, etc. may indicate the conditions that the target media substream needs to satisfy by one or more substream characteristic parameters, and the final target media substream is selected by the server.
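Both second-type parameter forms ultimately resolve to the same thing — a set of target sub-stream numbers. A sketch of that resolution (function and parameter names are illustrative):

```python
def select_substreams(n, substream_list=None, substream_pattern=None):
    """Resolve the target media sub-streams of a media stream with n
    sub-streams (numbered 1..n) from either a sub-stream list or a
    sub-stream pattern parameter."""
    if substream_list is not None:
        # sub-stream list: comma-separated sub-stream numbers, order preserved
        return [int(x) for x in substream_list.split(",")]
    if substream_pattern is not None:
        # sub-stream pattern: bit i (left to right) corresponds to sub-stream i+1
        return [i + 1 for i, bit in enumerate(substream_pattern[:n]) if bit == "1"]
    return []  # no second-type parameter: caller falls back to the defaults
```

Note the list form additionally conveys an ordering, which matters when it is also used to control packaging order.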
Embodiment 4, in the following embodiments, an example will be given of how the server transfers the information related to the sub-streams to the client.
In a specific implementation, the client indicates the target media sub-streams through the second type of parameters. The precondition is that the client knows which media sub-streams the current media stream includes and what characteristics those sub-streams have; only then can the client select the target media sub-streams to be transmitted according to the application requirements and the network transmission conditions.
The description information about the media sub-streams in the media stream can be provided to the client application layer by the server application layer, and the client can obtain this information by a method independent of the current transmission process (e.g. submitting an additional request message or through a third-party server), or can obtain this information directly from the server during the transmission process.
At a minimum, the media sub-stream description information indicates which media sub-streams the current media stream contains. If the media sub-streams are numbered consecutively from 1 to N, the numbers of all media sub-streams can be derived simply by including the count N in the media sub-stream description information. When the media stream adopts various multi-sub-stream codings, more sub-stream characteristic information may be introduced into the media sub-stream description information:
1) Media component identification
A media component identifier distinguishes different ways of acquiring information within a media stream; for example, video captured by different cameras at a live broadcast site corresponds to different components. Each media sub-stream is associated with one media component, but the same media component may correspond to multiple media sub-streams; for example, video captured from the same viewpoint may be represented by multiple media sub-streams encoded at different bit rates.
2) Sub-stream types
Types of media sub-streams include, but are not limited to: video, audio, picture, subtitle, etc., or mixed types; the mixed type means that one media sub-stream contains multiple types of media units, for example, a sub-stream may contain both video and audio.
3) Sub-stream code rate
When a media sub-stream uses a constant bit rate (CBR), the sub-stream code rate indicates that fixed rate; if a media sub-stream uses a variable bit rate (VBR), the sub-stream code rate indicates the average rate of the sub-stream over a period of time.
4) Sub-stream priority
The priority of the media sub-streams is used to indicate the importance of different media sub-streams in the transmission process.
5) Coding hierarchy
When a media stream employs scalable coding, such as Scalable Video Coding (SVC), the media stream generates multiple levels of coded streams, including a base layer and a plurality of enhancement layers, with each layer corresponding to one media sub-stream.
6) Viewpoint identification
When the media stream is encoded using multiple views, such as 3D video, the media stream may generate multiple encoded streams of different views, where each media sub-stream corresponds to a view. When multiple views are jointly encoded into one media sub-stream, then there may be multiple view identifications for one media sub-stream.
7) Video resolution
When the type of a media sub-stream is video, this field gives the video resolution adopted when the media sub-stream was encoded.
8) Video frame rate
When the type of a media sub-stream is video, this field gives the frame rate adopted when the media sub-stream was encoded.
9) Channel identification
When the media stream adopts multi-channel coding, it may generate coded data on multiple sound channels separately, or several sound channels may form a channel group for joint multi-channel coding; each media sub-stream corresponds to one or more channel identifications.
10) Audio sampling rate
When a media sub-stream is an audio stream, this field gives the sampling rate at which it is encoded.
11) Language type
When a media sub-stream is an audio stream containing human voice, this field gives the language used by the human voice.
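The fields above can be sketched as a record type. The following Python rendering is illustrative only; the sub-stream number is the sole mandatory field, and the field names are assumptions of this sketch rather than part of the description format defined by the embodiments:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SubStreamDescription:
    # Mandatory field: the media sub-stream number.
    number: int
    # Optional characteristic fields (illustrative names).
    component_id: Optional[int] = None   # media component identification
    stream_type: Optional[str] = None    # "video", "audio", "subtitle", "mixed", ...
    bitrate_kbps: Optional[int] = None   # CBR rate, or VBR average over a period
    priority: Optional[int] = None       # sub-stream priority
    coding_layer: Optional[str] = None   # e.g. "base" or "enh1" for SVC
    view_ids: Optional[List[int]] = None # one or more viewpoint identifications
    resolution: Optional[str] = None     # video resolution
    frame_rate: Optional[float] = None   # video frame rate
    channel_ids: Optional[List[int]] = None  # channel identifications
    sample_rate: Optional[int] = None    # audio sampling rate in Hz
    language: Optional[str] = None       # language of contained human voice
```

The Fig. 4 scenario, for instance, would be three entries sharing component identifier 10 but carrying three different sub-stream code rates.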
In specific implementations, each media stream may customize its own media sub-stream description information according to actual conditions. Figs. 4 to 6 show examples of the media sub-stream description information in three application scenarios: in Fig. 4, sub-stream 1, sub-stream 2, and sub-stream 3 are coded sub-streams of the same media content at three different bit rates (the media component identifiers are all 10); in Fig. 5, sub-stream 1, sub-stream 2, and sub-stream 3 correspond to three different viewpoints of the same media content, while sub-stream 4 and sub-stream 5 correspond to two different sound channels of the same media content; in Fig. 6, sub-streams 1 to 4 correspond to the base layer and three enhancement layers of the same media content (the media component identifiers are all 30) under scalable video coding. After receiving a media segment, the client parses the media sub-stream description information from it and can then select, in real time, the target media sub-streams to be transmitted according to the actual needs of the service layer, the capability of the terminal, and the network conditions, thereby supporting adaptive transmission of various multi-sub-stream coded media streams.
The media sub-stream description information of a media stream is typically kept constant and therefore does not need to be encapsulated in every media segment. Generally speaking, when the server receives the first media segment request from the client, the server may encapsulate the media sub-stream description information in the returned first media segment, and may not encapsulate the media sub-stream description information in the subsequent media segments.
Embodiment 5, in the following embodiments, an example will be given of how the server determines candidate media units to be transmitted by means of a third type of parameter.
Optionally, in an embodiment of the present application, generating a media segment according to the media segment request further includes: if the pull command carries at least one third type parameter, wherein each third type parameter corresponds to at least one constraint condition of the candidate media unit, the candidate media units to be transmitted include all media units in each target media sub-stream which simultaneously satisfy all constraint conditions corresponding to the third type parameters.
A plurality of third-class parameters and the constraint condition corresponding to each third-class parameter are given as follows:
1) Starting sequence number
The constraint conditions corresponding to the starting sequence number are as follows: if the start sequence number is valid, the sequence number of the candidate media unit is after the start sequence number or equal to the start sequence number.
2) Starting time
The constraint conditions corresponding to the starting time are as follows: if the start time is valid, the generation time of the candidate unit is after the start time.
3) Maximum time offset
The constraint conditions corresponding to the maximum time offset are as follows: if the maximum time offset is valid, then the generation time interval of the candidate media unit and the newest media unit in the target media sub-stream is less than the maximum time offset.
Whether a third-type parameter is valid or invalid refers to whether its value lies within a specified range. Taking the start sequence number as an example: its value cannot exceed the sequence number of the current newest media unit; on the other hand, to ensure real-time performance, it cannot be earlier than the sequence number of some existing media unit. A start sequence number within this range is valid. If a third-type parameter is invalid, the effect is the same as not carrying that parameter at all; when all third-type parameters are invalid, the candidate media units to be transmitted in the target media sub-streams are the default specified media units.
In addition, in a specific implementation each pull command may carry one or more of the above third-type parameters, and is not limited to them: other self-defined third-type parameters, such as a media unit type, a minimum priority, or a priority range, may be defined according to the characteristics of the media units and used as additional constraints on the media units.
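A minimal sketch of applying the three named constraints to one target media sub-stream; media units are modeled as (sequence number, generation time) pairs, and a parameter value of None stands for an absent or invalid third-type parameter:

```python
def select_candidates(units, start_seq=None, start_time=None,
                      max_time_offset=None):
    """Filter one target media sub-stream's units by the third-type
    parameter constraints: sequence number >= start_seq, generation
    time after start_time, and generation-time distance from the
    newest unit strictly below max_time_offset."""
    if not units:
        return []
    newest_time = max(t for _, t in units)
    result = []
    for seq, t in units:
        if start_seq is not None and seq < start_seq:
            continue
        if start_time is not None and t <= start_time:
            continue
        if max_time_offset is not None and newest_time - t >= max_time_offset:
            continue
        result.append((seq, t))
    return result
```

With all parameters None, the function returns every unit; a real server would instead fall back to the default specified media units described in Embodiment 5.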
It should be further noted that, when only one target media sub-stream is selected according to the second-type parameters, it is only necessary to determine whether the media units in that sub-stream satisfy the constraint conditions corresponding to the various third-type parameters. However, if multiple target media sub-streams are selected, then for the above sequence-number or generation-time constraints to act on all of them simultaneously, the target media sub-streams should be synchronously numbered or timed by the same clock. Here, synchronous numbering means: on the server, every time a specified time period elapses, all media units generated within that period in each target media sub-stream are associated with the same new sequence number. The specified time period may be fixed or variable, and may be preset or dynamically determined according to the actual generation of media units. With synchronous numbering, a media unit's sequence number indicates not only the generation order of media units within each media sub-stream but also the synchronization relationship among media units in different target media sub-streams.
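The synchronous-numbering rule can be sketched as follows, assuming a fixed specified time period; units of every sub-stream falling into the same period window receive the same new sequence number, so equal sequence numbers across sub-streams imply the units were generated in the same window:

```python
def assign_sync_numbers(unit_times, period):
    """Synchronous numbering sketch. `unit_times` maps sub-stream
    number -> list of generation times; returns sub-stream number ->
    list of (sequence number, generation time). Sequence numbers start
    at 1 for the first period window."""
    numbered = {}
    for ss, times in unit_times.items():
        numbered[ss] = [(int(t // period) + 1, t) for t in times]
    return numbered
```

In the sketch, units at t=0.1 s (sub-stream 1) and t=0.5 s (sub-stream 2) both fall in the first 1-second window and therefore share sequence number 1.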
Fig. 2 shows a real-time delivery process of a media stream. The client requests media data of a target media stream S1, which is the default media stream on the server and includes 4 media sub-streams: sub-stream 1, sub-stream 2, and sub-stream 3 are three synchronously numbered media sub-streams (e.g., three video streams encoded at different bit rates), sub-stream 4 is numbered independently of the other sub-streams (e.g., an independently encoded audio stream), and the default specified media sub-streams are sub-stream 1 and sub-stream 4. Because sub-stream 1 and sub-stream 4 are not synchronously numbered, the sequence numbers of their newest media units obtained by the client after receiving a media segment differ. Therefore, when the client wants to continue receiving new media units, the media segment request needs to carry two pull commands, each with a different media sub-stream list and corresponding start sequence number, to specify the target media sub-stream and the characteristics of the media units to be sent; this ensures continuous reception of sub-stream 1 and sub-stream 4 respectively.
The target media stream in Fig. 7 is similar to that in Fig. 2, except that the client actively requests media data of sub-streams 1 and 2 (e.g., sub-streams 1, 2, and 3 correspond to the base layer and two enhancement layers of scalable video coding, respectively). In this case, since sub-stream 1 and sub-stream 2 are synchronously numbered, the client uses only one pull command when submitting a media segment request; the sub-stream list carried by the pull command includes the two target media sub-streams, and the single start sequence number it carries indicates the candidate media units in both sub-stream 1 and sub-stream 2.
The target media stream in Fig. 8 is similar to those in Fig. 2 and Fig. 7, except that the client simultaneously requests synchronized media data of three sub-streams (sub-stream 1, sub-stream 2, and sub-stream 4). Although sub-stream 4 is not synchronously numbered with sub-streams 1 and 2, the generation times of all sub-streams in the target media stream S1 use the same reference clock, so the client can still pull the three media sub-streams with only one pull command in the media segment request. In this case, the sub-stream list carried by the pull command specifies the three target media sub-streams, and the start time it carries is the latest generation time among the media units the client has received; based on this start time, the server can guarantee that all newly generated media units to be sent are continuously encapsulated into media segments and delivered to the client.
In this embodiment, the client can receive the media stream in real time by continuously submitting media segment requests, and can adapt to changes in application requirements and network state by adjusting the target media sub-stream list. As in Fig. 2, the client initially receives sub-stream 1 and sub-stream 4 by default; when the application layer only needs sub-stream 4, or a reduction in network bandwidth is detected, the client modifies the target media sub-streams in MS_REQ4 to include only sub-stream 4, and thereby automatically switches to receiving only the media units of sub-stream 4.
Embodiment 6, in the following embodiments, a description will be given of a processing procedure when the server encapsulates the candidate media units into media segments.
Further, in an embodiment of the present application, encapsulating the candidate media units determined by the respective pull commands into media segments includes: encapsulating the candidate media units determined by each pull command into the media segment in the order in which the pull commands appear in the media segment request. If the parameters carried by a pull command include a unit ordering mode, the determined candidate media units are sorted according to that unit ordering mode before being encapsulated into the media segment; if a pull command carries no unit ordering mode, the determined candidate media units are sorted according to a default ordering mode before being encapsulated.
The embodiment of the present application provides six basic unit ordering modes:
1) Time forward (TIME_FORWARD)
Candidate media units are ordered by generation time; units generated earlier are encapsulated into the media segment earlier.
2) Time backward (TIME_BACKWARD)
Candidate media units are reverse-ordered by generation time; units generated later are encapsulated into the media segment earlier.
3) Sequence number forward (SEQ_FORWARD)
Candidate media units are ordered by sequence number; units with smaller sequence numbers are encapsulated into the media segment earlier.
4) Sequence number backward (SEQ_BACKWARD)
Candidate media units are reverse-ordered by sequence number; units with larger sequence numbers are encapsulated into the media segment earlier.
5) Sub-stream number order (SSNO_ORDER)
When there are multiple target media sub-streams, the candidate media units of each sub-stream are encapsulated sequentially in ascending order of sub-stream number.
6) Sub-stream list order (SSLIST_ORDER)
When multiple target media sub-streams are defined by a sub-stream list parameter (SubStreamList), the candidate media units of the sub-streams are encapsulated sequentially in the order in which their sub-stream numbers appear in the sub-stream list.
The unit ordering mode may also be a cascade of the above basic modes, such as SSLIST_ORDER + SEQ_BACKWARD: the candidate media units are first sorted according to the first basic mode, units that tie under that mode are then sorted according to the second basic mode, and so on until the ordering is complete. Whether a basic mode or a cascaded mode is used, any candidate media units that still tie after sorting are ordered according to the default ordering mode.
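The six basic modes and their cascades can be sketched as composed sort keys. The mode names follow the text above, while the unit representation (dicts with 'ss', 'seq', 'time' keys) and the default tie-break (sub-stream number, then time forward) are assumptions of this sketch:

```python
def order_units(units, modes, substream_list=None):
    """Sort candidate media units by a cascade of basic ordering modes.
    Each unit is a dict with 'ss' (sub-stream number), 'seq' (sequence
    number) and 'time' (generation time)."""
    def key(u):
        parts = []
        for mode in modes:
            if mode == "TIME_FORWARD":
                parts.append(u["time"])
            elif mode == "TIME_BACKWARD":
                parts.append(-u["time"])
            elif mode == "SEQ_FORWARD":
                parts.append(u["seq"])
            elif mode == "SEQ_BACKWARD":
                parts.append(-u["seq"])
            elif mode == "SSNO_ORDER":
                parts.append(u["ss"])
            elif mode == "SSLIST_ORDER":
                parts.append(substream_list.index(u["ss"]))
        # Default tie-break: sub-stream number, then time forward.
        parts.extend([u["ss"], u["time"]])
        return tuple(parts)
    return sorted(units, key=key)
```

Cascading works because Python compares key tuples element by element: the first basic mode dominates, and later modes only break ties, mirroring the cascade rule described above.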
Fig. 9 shows the process of encapsulating the candidate media units into media segments under different unit ordering modes: the media segment MS3 to be generated contains the same candidate media units in each case, but the final order in which the media units are encapsulated into the media segment differs according to the different media segment requests.
Ordering mode 1: the media segment request consists of two pull commands; the target media sub-stream of the first pull command is sub-stream 4, and the target media sub-streams of the second are sub-streams 1 and 2. Following the order of the pull commands, the candidate media units of sub-stream 4 are encapsulated into the media segment first; since the first pull command indicates no unit ordering mode, its candidate media units D58-D62 are encapsulated in the default order, i.e., time forward. Then, since the unit ordering mode carried by the second pull command is time backward (TIME_BACKWARD), the media units of sub-stream 1 and sub-stream 2 are encapsulated in reverse time order: A27/B27, A26/B26, and A25/B25. Media units that tie are ordered by default according to their sub-stream numbers, so the final encapsulation order of the candidate media units is as shown in ordering mode 1 of Fig. 9.
Ordering mode 2: the media segment request includes only one pull command, whose unit ordering mode is a cascade of two basic modes: SSLIST_ORDER + SEQ_FORWARD. The first basic mode is sub-stream list order (SSLIST_ORDER), which indicates that the candidate media units of each sub-stream are encapsulated in the order of the sub-stream numbers in the sub-stream list (SubStreamList=4,1,2): the candidate media units of sub-stream 4 first, then those of sub-stream 1, then those of sub-stream 2. The second basic mode is sequence number forward (SEQ_FORWARD): candidate media units belonging to the same sub-stream are sorted by ascending sequence number. The final encapsulation order of the candidate media units is as shown in ordering mode 2 of Fig. 9.
Ordering mode 3: the media segment request includes only one pull command, whose unit ordering mode is a cascade of two basic modes: SSNO_ORDER + SEQ_BACKWARD. The first basic mode is sub-stream number order (SSNO_ORDER), which indicates that the candidate media units of each sub-stream are encapsulated in ascending order of sub-stream number: the candidate media units of sub-stream 1 first, then those of sub-stream 2, then those of sub-stream 4. The second basic mode is sequence number backward (SEQ_BACKWARD): candidate media units belonging to the same sub-stream are sorted by descending sequence number. The final encapsulation order of the candidate media units is as shown in ordering mode 3 of Fig. 9.
Ordering mode 4: the media segment request includes only one pull command, which carries a single unit ordering mode: TIME_FORWARD. All candidate media units are sorted from earliest to latest generation time, and the final encapsulation order of the candidate media units is as shown in ordering mode 4 of Fig. 9.
Of course, this embodiment does not preclude defining new unit ordering modes. For example, when each media unit is associated with a priority, candidate media units may be ordered by unit priority, defining a new mode: high-priority unit first (HIGH_PRIOR_FIRST). When each media sub-stream is also associated with a priority, the multiple target media sub-streams in the same pull command may be ordered by sub-stream priority, defining another new mode: sub-stream priority order (SS_PRIOR_ORDER). In addition, when generating a media segment, the candidate media units determined by the pull commands need not be encapsulated in the order in which the pull commands appear in the media segment request; for example, all candidate media units may be sorted and encapsulated into the media segment without distinguishing which pull command determined them.
Controlling the order in which media units are encapsulated into media segments through pull commands and unit ordering modes makes it possible to guarantee preferential transmission of specific candidate media units of specific sub-streams when network bandwidth is insufficient. For example, a high-priority media sub-stream can ensure that audio is transmitted first when video and audio sub-streams are transmitted simultaneously; when base-layer and enhancement-layer streams are transmitted simultaneously, the candidate media units of the base layer can be sent first; and in scenarios with high real-time requirements, the transmission of newly generated candidate media units can be prioritized, thereby improving user experience.
According to the adaptive real-time delivery method of the media stream provided by the embodiment of the application, the media units of the sub-streams can be combined arbitrarily according to the request of the client, the media segments are generated in real time, and the media segments are delivered to the client. Firstly, the server only needs to store the media units according to the sub-streams, and does not need to generate fragments of various sub-stream combinations in advance, so that the storage requirement of the server is reduced, the synchronous processing of the client is simplified, the client can obtain the combined fragments of the sub-streams in the same time period only through one request, and the synchronous receiving of the sub-streams is easy to ensure. Secondly, the client can dynamically adjust the target media sub-stream in the media segment request according to the application needs and the network conditions, so that adaptive transmission of various types of multi-sub-stream media streams (such as multi-rate coding/multi-view/multi-channel/scalable coding) can be uniformly supported. Finally, each media segment is generated by the request trigger of the client, no matter how many sub-streams the media stream comprises, no manifest file is needed any more, and the client does not need to request and analyze the manifest file, which significantly reduces the transmission overhead and the processing overhead brought by the complex manifest file, thereby effectively reducing the real-time transmission delay and the transmission overhead of the media stream.
An adaptive real-time delivery server for media streams according to an embodiment of the present application is described next with reference to the accompanying drawings.
Fig. 10 is a schematic structural diagram of an adaptive real-time delivery server for media streaming according to an embodiment of the present application.
As shown in fig. 10, the media stream includes at least one media sub-stream, each media sub-stream is a sequence of media units generated in real time on the server, wherein each media sub-stream is associated with a sub-stream number, each media unit is associated with a generation time and/or a sequence number indicating a generation sequence of the media units in the media sub-stream, and the server 10 includes: a client interface component 100, a media segment generation component 200 and a media segment transmission component 300.
The client interface component 100 is configured to receive a media segment request sent by a client, where the media segment request carries at least one pull command, each pull command carries zero or more control parameters, and the control parameters include a first-type parameter indicating the target media stream to be transmitted, a second-type parameter indicating the target media sub-streams to be transmitted, and a third-type parameter indicating the candidate media units to be transmitted. The media segment generating component 200 is configured to generate media segments according to the media segment request: for each pull command in the media segment request, it selects the target media stream to be transmitted, selects at least one target media sub-stream to be transmitted in the target media stream, and determines the candidate media units to be transmitted in each target media sub-stream; it then encapsulates the candidate media units determined by each pull command into a media segment. The media segment sending component 300 is configured to send the generated media segment to the client.
The server 10 according to the embodiment of the present application may arbitrarily combine the media units of the sub-streams according to the request of the client, generate the media segments in real time, and then return the media segments to the client, thereby simplifying the synchronous transmission among the sub-streams while reducing the storage overhead on the server, and effectively reducing the transmission delay and overhead of the media stream.
In particular, the client interface component 100 is used to receive a client's media segment request. The media segment request may carry one or more pull commands, and each pull command may carry zero, one, or more control parameters. The control parameters fall into the following categories: first-type parameters, second-type parameters, and third-type parameters. A first-type parameter indicates the target media stream to be transmitted; a second-type parameter indicates the target media sub-streams to be transmitted in the target media stream; a third-type parameter indicates the candidate media units to be transmitted in the target media sub-streams. The client interface component 100 may employ any specified protocol for receiving media segment requests; for example, when the HTTP protocol is employed, the client interface component 100 may be a Web server that can receive any media segment request over HTTP, and when the TCP protocol is used, the client interface component is a TCP server providing a fixed service port.
The media segment generation component 200 is configured to generate the required media segments according to the client's media segment requests. It obtains a media segment request from the client interface component 100, parses the pull commands and control parameters it carries, selects the target media stream to be transmitted according to the first-type parameters, selects the target media sub-streams to be transmitted according to the second-type parameters, and determines the candidate media units to be transmitted in each target media sub-stream according to the third-type parameters; finally, it extracts the candidate media units determined by each pull command from the media stream storage unit, encapsulates them into a media segment, and delivers the media segment directly to the media segment sending component 300 for transmission.
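An illustrative, heavily simplified sketch of this pipeline (the data shapes and key names are invented for illustration, and only a start-sequence-number constraint is modeled among the third-type parameters):

```python
def generate_media_segment(streams, pull_commands, defaults):
    """Sketch of the media segment generation component: for every pull
    command, select the target stream, the target sub-streams, and the
    candidate units, then concatenate them in pull-command order.
    `streams` maps stream id -> {sub-stream number -> [(seq, payload)]};
    each pull command is a dict that may carry 'stream', 'substreams',
    and 'start_seq'; absent parameters fall back to `defaults`."""
    segment = []
    for cmd in pull_commands:
        stream_id = cmd.get("stream", defaults["stream"])       # first-type
        substreams = cmd.get("substreams", defaults["substreams"])  # second-type
        start_seq = cmd.get("start_seq")                        # third-type
        for ss in substreams:
            for seq, payload in streams[stream_id][ss]:
                if start_seq is None or seq >= start_seq:
                    segment.append((ss, seq, payload))
    return segment
```

An empty pull command thus yields the default specified media stream and sub-streams, matching the fallback behavior described for component 200.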
Further, as shown in fig. 11, the server 10 of the embodiment of the present application further includes at least one media stream real-time generation component, configured to generate by itself or receive in real time one or more media streams from other servers; the media stream comprises at least one media sub-stream, each media sub-stream being a sequence of media units generated in real time on the server; each media sub-stream is associated with a sub-stream number, each media unit is associated with a generation time and/or a sequence number, and the sequence number is used for indicating the generation sequence of the media units in the media sub-streams;
specifically, the real-time media stream generating component includes one or more real-time media sub-stream generating components, each of which includes one or more processing steps for real-time generation of media sub-streams, for example, the processing steps include but are not limited to: real-time acquisition, encoding compression, transmission encapsulation and pre-segmentation of media signals. In addition, the media sub-stream real-time generation component can also receive media streams from other devices in real time or convert media stream files existing on a server into a media unit sequence generated in real time.
Optionally, in an embodiment of the present application, the media segment generating component 200 is further configured as follows: when the pull command does not carry a first-type parameter, the target media stream to be transmitted is the default specified media stream; when the pull command does not carry a second-type parameter, the target media sub-streams to be transmitted are the at least one default specified media sub-stream in the target media stream; and when the pull command does not carry a third-type parameter, the candidate media units are the default specified media units in the target media sub-streams. The default specified media units are those media units in a target media sub-stream whose sequence-number interval from the newest media unit is smaller than a first preset value, or whose generation-time interval from the newest media unit is smaller than a second preset value, where both preset values are determined according to the target media sub-stream.
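A sketch of this default-specified-media-units rule, assuming units are (sequence number, generation time) pairs and the applicable preset value is supplied per target media sub-stream:

```python
def default_candidate_units(units, first_preset=None, second_preset=None):
    """Return the default specified media units of one target media
    sub-stream: units whose sequence-number gap from the newest unit is
    below `first_preset`, or whose generation-time gap from the newest
    unit is below `second_preset` (whichever preset is supplied)."""
    if not units:
        return []
    newest_seq = max(s for s, _ in units)
    newest_time = max(t for _, t in units)
    if first_preset is not None:
        return [(s, t) for s, t in units if newest_seq - s < first_preset]
    if second_preset is not None:
        return [(s, t) for s, t in units if newest_time - t < second_preset]
    return units
```

Keeping the window anchored to the newest unit ensures that a pull command with no third-type parameters still receives recent, real-time media data.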
Optionally, in an embodiment of the present application, the second type of parameter includes a substream list, the substream list containing a number of at least one target media substream.
Optionally, in an embodiment of the present application, the second type of parameter includes a substream pattern, where the substream pattern is an N-bit bitstream, where N is the number of media substreams included in the target media stream, and each bit of the substream pattern is associated with a specific media substream of the target media stream and is used for indicating whether the specific media substream is a target media substream to be transmitted.
Optionally, in an embodiment of the present application, the media segment generating component 200 is further configured to encapsulate media sub-stream description information into the media segment, where the media sub-stream description information includes at least one entry, where each entry corresponds to one media sub-stream of the media stream and includes at least one field: the media sub-streams are numbered.
Optionally, in an embodiment of the present application, each entry further includes at least one of the following fields: media component identification, substream type, substream rate, substream priority, coding hierarchy, viewpoint identification, video resolution, video frame rate, channel identification, audio sampling rate, language type.
Optionally, in an embodiment of the present application, the media segment generating component 200 is further configured such that, when the pull command carries at least one third-type parameter, each third-type parameter corresponds to at least one constraint condition on candidate media units, and the candidate media units to be transmitted include all media units in each target media sub-stream that simultaneously satisfy all constraint conditions corresponding to the third-type parameters.
Optionally, in an embodiment of the present application, media units in the target media sub-streams are numbered synchronously: each time a specified time period elapses, all media units generated by each target media sub-stream within that period are associated with the same new sequence number. The third type of parameter includes a start sequence number, and the constraint condition corresponding to the start sequence number is: if the start sequence number is valid, the sequence number of the candidate media unit is after, or equal to, the start sequence number.
Optionally, in an embodiment of the present application, the generation times of the media units in all the target media sub-streams are derived from the same clock on the server, the third type of parameter includes a start time, and the constraint condition corresponding to the start time is: if the start time is valid, the generation time of the candidate media unit is after the start time.
Optionally, in an embodiment of the present application, the third type of parameter includes a maximum time offset, and a constraint condition corresponding to the maximum time offset is: if the maximum time offset is valid, the generation time interval of the candidate media unit and the newest media unit in the target media sub-stream is less than the maximum time offset.
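The three constraint types above (start sequence number, start time, maximum time offset) are applied conjunctively: a media unit is a candidate only if it satisfies every valid constraint. A sketch under assumptions — the `MediaUnit` model and function names are hypothetical, not from the patent:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MediaUnit:
    seq: int          # synchronous sequence number
    gen_time: float   # generation time from the server's clock

def select_candidates(units: List[MediaUnit],
                      start_seq: Optional[int] = None,
                      start_time: Optional[float] = None,
                      max_offset: Optional[float] = None) -> List[MediaUnit]:
    """Return the units that simultaneously satisfy every valid constraint."""
    latest = max(u.gen_time for u in units)
    out = []
    for u in units:
        if start_seq is not None and u.seq < start_seq:
            continue  # sequence number must be at or after the start sequence number
        if start_time is not None and u.gen_time <= start_time:
            continue  # generation time must be after the start time
        if max_offset is not None and latest - u.gen_time >= max_offset:
            continue  # interval to the newest unit must be below the maximum offset
        out.append(u)
    return out

units = [MediaUnit(1, 0.0), MediaUnit(2, 1.0), MediaUnit(3, 2.0)]
print([u.seq for u in select_candidates(units, start_seq=2)])     # [2, 3]
print([u.seq for u in select_candidates(units, max_offset=1.5)])  # [2, 3]
```

An invalid (absent) parameter simply imposes no constraint, matching the "if ... is valid" wording of the embodiments.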
Optionally, in an embodiment of the present application, the media segment generating component 200 is further configured to encapsulate the candidate media units determined by each pull command into the media segment according to the order in which each pull command appears in the media segment request, where if the parameters carried by any pull command include a unit sorting manner, the candidate media units determined by that pull command are sorted according to the unit sorting manner before being encapsulated into the media segment, and if no unit sorting manner is carried, they are sorted according to a default sorting manner before being encapsulated into the media segment.
Optionally, in an embodiment of the present application, the unit sorting manner is a cascade of one or more of the following basic sorting manners: time forward ordering, time reverse ordering, sequence number forward ordering, sequence number reverse ordering, substream number sequence ordering, substream list sequence ordering.
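Such a cascade can be realized by letting each basic ordering contribute one component of a compound sort key, applied in cascade order. A minimal sketch with a hypothetical tuple representation of a media unit (the names and tuple layout are assumptions for the example):

```python
def cascade_sort(units, orderings):
    """Sort media units by a cascade of basic orderings.

    Each unit is a (substream_number, sequence_number, generation_time)
    tuple; earlier orderings in the cascade take priority over later ones.
    """
    key_funcs = {
        "time_forward": lambda u: u[2],
        "time_reverse": lambda u: -u[2],
        "seq_forward":  lambda u: u[1],
        "seq_reverse":  lambda u: -u[1],
        "substream":    lambda u: u[0],
    }
    return sorted(units, key=lambda u: tuple(key_funcs[o](u) for o in orderings))

units = [(1, 2, 20.0), (0, 2, 20.0), (0, 1, 10.0)]
# Primary key: sequence number forward; tie-break: sub-stream number.
print(cascade_sort(units, ["seq_forward", "substream"]))
# [(0, 1, 10.0), (0, 2, 20.0), (1, 2, 20.0)]
```

Sub-stream list sequence ordering would follow the same pattern, with the key being each unit's position in the request's sub-stream list rather than its sub-stream number.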
In addition, the client and the server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers that have a client-server relationship with each other. The server may be a cloud server (also called a cloud computing server or cloud host), a host product in a cloud computing service system, which overcomes the drawbacks of high management difficulty and weak service scalability found in traditional physical hosts and VPS services.
It should be noted that the foregoing explanation on the embodiment of the adaptive real-time delivery method for a media stream is also applicable to the adaptive real-time delivery server for a media stream in this embodiment, and details are not described here again.
According to the adaptive real-time delivery server of a media stream provided by the embodiments of the application, media units of the sub-streams can be combined arbitrarily according to the client's request, media segments are generated in real time, and the media segments are delivered to the client. Firstly, the server only needs to store media units per sub-stream and does not need to generate segments for the various sub-stream combinations in advance, which reduces the storage requirement of the server; meanwhile, synchronization processing at the client is simplified, since the client can obtain the combined segment of the sub-streams for the same time period with a single request, making synchronized reception of the sub-streams easy to ensure. Secondly, the client can dynamically adjust the target media sub-streams in the media segment request according to application needs and network conditions, so that adaptive transmission of various types of multi-sub-stream media streams (such as multi-rate coding, multi-view, multi-channel, or scalable coding) can be supported uniformly. Finally, each media segment is generated when triggered by a client request; no matter how many sub-streams the media stream comprises, no manifest file is needed, and the client does not need to request or parse a manifest file, which significantly reduces the transmission and processing overhead brought by complex manifest files, thereby effectively reducing the real-time transmission delay and transmission overhead of the media stream.
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
a memory 1201, a processor 1202, and a computer program stored on the memory 1201 and executable on the processor 1202.
The processor 1202, when executing the program, implements the adaptive real-time delivery method of media streams provided in the above-described embodiments.
Further, the electronic device further includes:
a communication interface 1203 for communication between the memory 1201 and the processor 1202.
A memory 1201 for storing computer programs executable on the processor 1202.
The memory 1201 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk storage device.
If the memory 1201, the processor 1202, and the communication interface 1203 are implemented independently, the communication interface 1203, the memory 1201, and the processor 1202 may be connected to each other by a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 12, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 1201, the processor 1202, and the communication interface 1203 are integrated on one chip, the memory 1201, the processor 1202, and the communication interface 1203 may complete mutual communication through an internal interface.
Processor 1202 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The present embodiment also provides a computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method for adaptive real-time delivery of a media stream as above.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, those skilled in the art may combine the different embodiments or examples described in this specification, and features thereof, provided they are not mutually inconsistent.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing steps of a custom logic function or process. The scope of the preferred embodiments of the present application includes alternate implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art of the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium could even be paper or another suitable medium on which the program is printed, as the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having appropriate combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the above method embodiments may be implemented by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If implemented as a software functional module and sold or used as a separate product, the integrated module may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are exemplary and should not be construed as limiting the present application and that changes, modifications, substitutions and alterations in the above embodiments may be made by those of ordinary skill in the art within the scope of the present application.

Claims (22)

1. An adaptive real-time delivery method for a media stream, wherein the media stream includes at least one media sub-stream, each media sub-stream is a sequence of media units generated in real-time on a server, wherein each media sub-stream is associated with a sub-stream number, each media unit is associated with a generation time and/or a sequence number indicating a generation sequence of the media unit in the media sub-stream, and when the media stream includes multiple media sub-streams, the multiple media sub-streams are any media types that need to be synchronously transmitted, where the media types include an audio stream, a video stream, and a data stream, the method includes the following steps:
receiving a media segment request sent by a client, wherein the media segment request carries at least one pull command, the pull command does not carry or carries at least one control parameter, the control parameter comprises a first type parameter indicating a target media stream to be transmitted, a second type parameter indicating a target media sub-stream to be transmitted and a third type parameter indicating a candidate media unit to be transmitted, the second type parameter comprises a sub-stream list, the sub-stream list comprises the number of one or more target media sub-streams, and when the media segment request carries a plurality of pull commands, the first type parameters carried by the plurality of pull commands are the same or different;
generating media segments according to the media segment requests, wherein for each pull command in the media segment requests, the target media stream to be transmitted is selected, at least one target media sub-stream to be transmitted in the target media stream is selected, candidate media units to be transmitted in the target media sub-streams are determined, and the candidate media units determined by each pull command are packaged into the media segments, wherein when a second type of parameter carried by the pull command comprises a sub-stream list, the candidate media units of a plurality of sub-streams are sequentially packaged according to the sequence of the sub-stream numbers appearing in the sub-stream list, and when the pull command carries at least one third type of parameter, wherein each third type of parameter corresponds to at least one constraint condition of the candidate media unit, the candidate media units determined by the pull command comprise all media units which simultaneously meet all constraint conditions corresponding to the third type of parameter in each target media sub-stream; and
and sending the media segment to the client.
2. The method of claim 1, wherein generating a media segment from the media segment request comprises:
if the pull command does not carry the first type of parameters, the target media stream to be transmitted is a default specified media stream;
if the pull command does not carry the second type of parameters, the target media sub-stream to be transmitted is at least one media sub-stream specified by default in the target media stream;
if the pull command does not carry the third type of parameters, the candidate media units comprise default specified media units in the target media substream, the default specified media units are media units of which the sequence number intervals between all the media units and the latest media unit in the target media substream are smaller than a first preset value, or media units of which the generation time intervals between all the media units and the latest media unit in the target media substream are smaller than a second preset value, and the first preset value and the second preset value are obtained according to the target media substream.
3. The method of claim 1, wherein the second type of parameter comprises a sub-stream pattern, and the sub-stream pattern is an N-bit bitstream, where N is the number of media sub-streams included in the target media stream, and each bit of the sub-stream pattern is associated with a specific media sub-stream of the target media stream and is used for indicating whether the specific media sub-stream is a target media sub-stream to be transmitted.
4. The method of claim 1, wherein generating media segments according to the media segment requests further comprises:
encapsulating media sub-stream description information into the media segment, the media sub-stream description information including at least one entry, wherein each entry corresponds to one media sub-stream of the media stream and includes at least the following field: the media sub-stream number.
5. The method of claim 4, wherein each entry further comprises at least one of the following fields: media component identification, substream type, substream rate, substream priority, coding hierarchy, viewpoint identification, video resolution, video frame rate, channel identification, audio sampling rate, language type.
6. The method of claim 1, wherein media units in the target media substreams are numbered synchronously, wherein each time a specified time period passes, all media units generated in the specified time period in each target media substream are associated with a same new sequence number, the third type of parameter includes a start sequence number, and the constraint condition corresponding to the start sequence number is:
if the start sequence number is valid, the sequence number of the candidate media unit is subsequent to the start sequence number or equal to the start sequence number.
7. The method of claim 1, wherein the generation time of the media units in all the target media sub-streams is derived from the same clock on the server, the third type of parameter includes a start time, and the constraint condition for the start time is:
if the start time is valid, the generation time of the candidate media unit is after the start time.
8. The method according to claim 1, wherein the third type of parameter includes a maximum time offset, and the constraint condition corresponding to the maximum time offset is:
if the maximum time offset is valid, the generation time interval of the candidate media unit and the latest media unit in the target media sub-stream is less than the maximum time offset.
9. The method of claim 1, wherein encapsulating the candidate media units determined by the respective pull commands into the media segment comprises:
and packaging the candidate media units determined by each pull command to the media segments according to the sequence of the pull command in the media segment request, wherein if the parameters carried by any one pull command comprise a unit sequencing mode, the candidate media units determined by the pull command are sequenced according to the unit sequencing mode and then packaged to the media segments, and if the parameters do not carry the unit sequencing mode, the candidate media units determined by the pull command are sequenced according to a default sequencing mode and then packaged to the media segments.
10. The method of claim 9, wherein the unit ordering manner is a cascade of one or more of the following basic orderings: time forward ordering, time reverse ordering, sequence number forward ordering, sequence number reverse ordering, substream number sequence ordering.
11. An adaptive real-time delivery server of a media stream, wherein the media stream includes at least one media sub-stream, each media sub-stream is a sequence of media units generated in real-time on the server, wherein each media sub-stream is associated with a sub-stream number, each media unit is associated with a generation time and/or a sequence number indicating a generation sequence of the media unit in the media sub-stream, when the media stream includes multiple media sub-streams, the multiple media sub-streams are any media types that need to be synchronously transmitted, and the media types include an audio stream, a video stream and a data stream, the server includes:
a client interface component, configured to receive a media segment request sent by a client, where the media segment request carries at least one pull command, and the pull command does not carry or carries at least one control parameter, where the control parameter includes a first type parameter indicating a target media stream to be transmitted, a second type parameter indicating a target media sub-stream to be transmitted, and a third type parameter indicating a candidate media unit to be transmitted, where the second type parameter includes a sub-stream list, the sub-stream list includes numbers of one or more target media sub-streams, and when the media segment request carries multiple pull commands, the first type parameters carried by the multiple pull commands are the same or different;
a media segment generating component, configured to generate a media segment according to the media segment request, where, for each pull command in the media segment request, the target media stream to be transmitted is selected, at least one target media sub-stream to be transmitted in the target media stream is selected, candidate media units to be transmitted in the target media sub-stream are determined, and the candidate media units determined by each pull command are encapsulated into the media segment, where when a second-class parameter carried by the pull command includes a sub-stream list, candidate media units of multiple sub-streams are sequentially encapsulated according to an order in which respective sub-stream numbers appear in the sub-stream list, and when the pull command carries at least one third-class parameter, where each third-class parameter corresponds to at least one constraint condition of the candidate media unit, and the candidate media units determined by the pull command include all media units in each target media sub-stream that simultaneously satisfy all constraint conditions corresponding to the third-class parameter;
and the media segment sending component is used for sending the generated media segments to the client.
12. The server according to claim 11, wherein the media segment generating component is further configured to: when the pull command does not carry the first type of parameter, determine the target media stream to be transmitted as a default specified media stream; when the pull command does not carry the second type of parameter, determine the target media sub-stream to be transmitted as at least one default specified media sub-stream in the target media stream; and when the pull command does not carry the third type of parameter, determine the candidate media units as the default specified media units in the target media sub-stream, where the default specified media units are those media units in the target media sub-stream whose sequence-number interval from the latest media unit is smaller than a first preset value, or those whose generation-time interval from the latest media unit is smaller than a second preset value, the first preset value and the second preset value both being obtained according to the target media sub-stream.
13. The server according to claim 11, wherein the second type of parameter comprises a sub-stream pattern, the sub-stream pattern is an N-bit bitstream, where N is the number of media sub-streams included in the target media stream, and each bit of the sub-stream pattern is associated with a specific media sub-stream of the target media stream and is used for indicating whether the specific media sub-stream is a target media sub-stream to be transmitted.
14. The server according to claim 11, wherein the media segment generation component is further configured to encapsulate media sub-stream description information into the media segments, the media sub-stream description information comprising at least one entry, wherein each entry corresponds to one media sub-stream of the media stream and comprises at least the following field: the media sub-stream number.
15. The server of claim 14, wherein each entry further comprises at least one of the following fields: media component identification, substream type, substream rate, substream priority, coding hierarchy, viewpoint identification, video resolution, video frame rate, channel identification, audio sampling rate, language type.
16. The server according to claim 11, wherein media units in the target media sub-streams are numbered synchronously, wherein each time a specified time period elapses, all media units generated by each target media sub-stream in the specified time period are associated with a same new sequence number, the third type of parameter includes a start sequence number, and the constraint condition corresponding to the start sequence number is:
if the start sequence number is valid, the sequence number of the candidate media unit is subsequent to the start sequence number or equal to the start sequence number.
17. The server according to claim 11, wherein the generation times of the media units in all the target media sub-streams are derived from the same clock on the server, the third type of parameter includes a start time, and the constraint conditions corresponding to the start time are:
if the start time is valid, the generation time of the candidate media unit is after the start time.
18. The server according to claim 11, wherein the third type of parameter includes a maximum time offset, and the constraint condition corresponding to the maximum time offset is:
if the maximum time offset is valid, the generation time interval of the candidate media unit and the latest media unit in the target media sub-stream is less than the maximum time offset.
19. The server of claim 11, wherein the media segment generation component is further configured to encapsulate the candidate media units determined by each pull command into the media segment according to an order in which each pull command appears in the media segment request, wherein if a parameter carried by any one pull command includes a unit ordering manner, the candidate media units determined by the pull command are ordered according to the unit ordering manner and then encapsulated into the media segment, and if the unit ordering manner is not carried, the candidate media units determined by the pull command are ordered according to a default ordering manner and then encapsulated into the media segment.
20. The server according to claim 19, wherein the unit ordering manner is a cascade of one or more of the following basic orderings: time forward ordering, time reverse ordering, sequence number forward ordering, sequence number reverse ordering, substream number sequence ordering.
21. A computer device, comprising: memory, processor and computer program stored on the memory and executable on the processor, the processor executing the program to implement the method of adaptive real-time delivery of a media stream according to any of claims 1-10.
22. A non-transitory computer-readable storage medium, having stored thereon a computer program for execution by a processor for implementing a method for adaptive real-time delivery of a media stream according to any of claims 1-10.
CN202010614997.5A 2020-06-30 2020-06-30 Self-adaptive real-time delivery method of media stream and server Active CN113873343B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010614997.5A CN113873343B (en) 2020-06-30 2020-06-30 Self-adaptive real-time delivery method of media stream and server
PCT/CN2021/103196 WO2022002070A1 (en) 2020-06-30 2021-06-29 Adaptive real-time delivery method for media stream, and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010614997.5A CN113873343B (en) 2020-06-30 2020-06-30 Self-adaptive real-time delivery method of media stream and server

Publications (2)

Publication Number Publication Date
CN113873343A CN113873343A (en) 2021-12-31
CN113873343B true CN113873343B (en) 2023-02-24

Family

ID=78981470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010614997.5A Active CN113873343B (en) 2020-06-30 2020-06-30 Self-adaptive real-time delivery method of media stream and server

Country Status (2)

Country Link
CN (1) CN113873343B (en)
WO (1) WO2022002070A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102857478A (en) * 2011-06-30 2013-01-02 华为技术有限公司 Method and device for controlling media data
CN110545492A (en) * 2018-09-05 2019-12-06 北京开广信息技术有限公司 real-time delivery method and server of media stream
CN110881018A (en) * 2018-09-05 2020-03-13 北京开广信息技术有限公司 Real-time receiving method and client of media stream
CN111193684A (en) * 2018-11-14 2020-05-22 北京开广信息技术有限公司 Real-time delivery method and server of media stream
CN111193686A (en) * 2018-11-14 2020-05-22 北京开广信息技术有限公司 Media stream delivery method and server

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9143543B2 (en) * 2012-11-30 2015-09-22 Google Technology Holdings LLC Method and system for multi-streaming multimedia data
WO2018073317A1 (en) * 2016-10-18 2018-04-26 Expway A method for transmitting content to mobile user devices


Also Published As

Publication number Publication date
WO2022002070A1 (en) 2022-01-06
CN113873343A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN107409234B (en) Streaming based on file format using DASH format based on LCT
TWI668982B (en) Method and server device for transport interface for multimedia and file transport, and computer-readable storage medium for recording related instructions thereon
KR102110627B1 (en) Methods and devices for bandwidth allocation in adaptive bitrate streaming
US10659502B2 (en) Multicast streaming
US11310302B2 (en) Method and apparatus for streaming dash content over broadcast channels
US20150181003A1 (en) Method and apparatus for transmitting and receiving packets in hybrid transmission service of mmt
US20170127147A1 (en) Multicast streaming
US11284135B2 (en) Communication apparatus, communication data generation method, and communication data processing method
JP2018532334A (en) Deadline signaling for streaming media data
JP2015136060A (en) Communication device, communication data generation method, and communication data processing method
US10887646B2 (en) Live streaming with multiple remote commentators
CN112771877A (en) Service description for streaming media data
US11457051B2 (en) Streaming media data processing method, processing system and storage server
CN111629283B (en) Multi-stream media gateway service system and method
CN113905257A (en) Video code rate switching method and device, electronic equipment and storage medium
CN110086797B (en) Real-time receiving method of media stream, client, computer device and storage medium
JP2015136057A (en) Communication device, communication data generation method, and communication data processing method
KR102176404B1 (en) Communication apparatus, communication data generation method, and communication data processing method
CN112104885B (en) System and method for accelerating M3U8 initial playing speed in live broadcasting
US10609111B2 (en) Client-driven, ABR flow rate shaping
CN113873343B (en) Self-adaptive real-time delivery method of media stream and server
CN110602555B (en) Video transcoding method and device
CN111193686A (en) Media stream delivery method and server
CN110545492B (en) Real-time delivery method and server of media stream
KR20130040144A (en) Packet transmission apparatus and method, and packet reception apparatus and method in mmt system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Jiang Hongqi
Inventor after: Jiang Hongyan
Inventor after: Shen Suhui

Inventor before: Jiang Hongqi
Inventor before: Xin Zhentao
Inventor before: Jiang Hongyan
Inventor before: Shen Suhui