WO2024104080A1 - Method, apparatus, storage medium and electronic device for transmitting a media data stream - Google Patents

Method, apparatus, storage medium and electronic device for transmitting a media data stream

Info

Publication number
WO2024104080A1
WO2024104080A1 (PCT/CN2023/127001, CN2023127001W)
Authority
WO
WIPO (PCT)
Prior art keywords
rtp
data stream
media data
segment
media
Prior art date
Application number
PCT/CN2023/127001
Other languages
English (en)
French (fr)
Inventor
谭志华
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2024104080A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/60: Network streaming of media packets
    • H04L 65/65: Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/60: Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N 21/63: Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STBs; Communication protocols; Addressing
    • H04N 21/643: Communication protocols
    • H04N 21/6437: Real-time Transport Protocol [RTP]

Definitions

  • WebRTC (Web Real-Time Communication) brings real-time communication, including audio and video calls, into web browsers.
  • WebRTC implements web-based voice and video calls, with the goal of providing real-time communication capability on the web without plug-ins.
  • WebRTC relies mainly on the RTP protocol (Real-Time Transport Protocol) for real-time audio and video communication.
  • The header of the RTP protocol can be extended to meet additional needs, but in the related art the RTP header extension mainly extends data frames of the data stream itself.
  • The purpose of the present application is to provide a method, apparatus, storage medium and electronic device for transmitting media data streams, which avoids adding an SDP negotiation process and a new RTP stream whenever a new media data stream is needed, thereby improving the transmission efficiency and convenience of media data streams.
  • a method for transmitting a media data stream which is executed in a server, and the method includes: obtaining a first data segment from a first media data stream as an RTP payload; obtaining a second data segment from a second media data stream; adding the second data segment to an RTP extension header; and generating an RTP data packet including the RTP extension header and the RTP payload.
  • a method for transmitting a media data stream which is executed in a terminal device, and the method includes: obtaining an RTP data packet, an RTP payload and an RTP extension header of the RTP data packet, the RTP payload including a first data segment of a first media data stream, and the RTP extension header including a second data segment of a second media data stream; parsing the first data segment of the first media data stream from the RTP payload of the RTP data packet; parsing the second data segment of the second media data stream from the RTP extension header of the RTP data packet.
  • a media data stream synchronization device which includes: an acquisition module, used to acquire an extended data packet corresponding to an extended media data stream; a sending module, used to fill the extended data packet into an RTP extended header, form an RTP data packet based on the filled RTP extended header and the RTP payload carrying the original media data stream, and send the RTP data packet to a terminal device.
  • a device for transmitting a media data stream comprising:
  • a receiving module which obtains an RTP data packet, an RTP payload and an RTP extension header of the RTP data packet, wherein the RTP payload includes a first data segment of a first media data stream, and the RTP extension header includes a second data segment of a second media data stream;
  • a parsing module is used to parse the RTP payload of the RTP data packet to obtain the first data segment of the first media data stream; and to parse the RTP extension header of the RTP data packet to obtain the second data segment of the second media data stream.
  • a computer-readable storage medium on which a computer program is stored.
  • the computer program is executed by a processor, the method in the above technical solution is implemented.
  • an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the method in the above technical solution by executing the executable instructions.
  • a computer program product includes computer instructions.
  • When the computer instructions are executed on a computer, the computer executes the method in the above technical solution.
  • FIG. 1 schematically shows the structure of a system architecture to which the method for transmitting a media data stream in an embodiment of the present application is applied.
  • FIG. 2 schematically shows a flowchart of the steps of a method for transmitting a media data stream in an embodiment of the present application.
  • FIG. 3 schematically shows a structural diagram of inserting an extended media data stream into an original media data stream according to the time correspondence between the original media data stream and the extended media data stream in an embodiment of the present application.
  • FIG. 4 schematically shows the structure of the fixed header of the real-time transport protocol RTP in an embodiment of the present application.
  • FIG. 5A schematically shows the structure of the one-byte-header extension header of the RTP protocol in an embodiment of the present application.
  • FIG. 5B schematically shows the structure of the two-byte-header extension header of the RTP protocol in an embodiment of the present application.
  • FIG. 6 shows a schematic diagram of a first media data stream and a second media data stream.
  • FIG. 7 schematically shows the structure of the underlying protocol stack of the WebRTC media data stream in an embodiment of the present application.
  • FIG. 8 schematically shows a flowchart of a method for transmitting a media data stream.
  • FIG. 9 schematically shows a structural block diagram of an apparatus for transmitting a media data stream in an embodiment of the present application.
  • FIG. 10 schematically shows a block diagram of the computer system structure of an electronic device suitable for implementing an embodiment of the present application.
  • RTP: Real-time Transport Protocol.
  • UDP: User Datagram Protocol, a connectionless transport-layer protocol that provides a simple, transaction-oriented, unreliable message transmission service.
  • the real-time transport protocol RTP is usually used to package and transmit audio and video data.
  • However, when the RTP protocol is used to package and transmit audio and video data, only one media data stream can be packaged and transmitted per RTP stream.
  • In the related art, the RTP extension header can only carry a small amount of extension data, and that data describes frames of the media data stream itself; it cannot carry data with a large volume.
  • In other words, the extension data is extension information of the video stream itself, and the RTP extension header cannot be used to carry data such as subtitle streams, interactive text, or background music that needs to be displayed synchronously with the video stream.
  • As a result, this way of transmitting media data streams suffers from cumbersome steps, large delay, poor synchronization, and poor user experience, and offers little flexibility for dynamically adding or removing a data stream. The related technology therefore can no longer serve live broadcast, video conferencing, P2P and other scenarios that require tight synchronization of media data streams.
  • In addition, many types of data streams cannot be synchronized, for example the extended interactive supplementary data streams of the audio and video streams themselves, such as metadata streams.
  • an embodiment of the present application proposes a method for transmitting media data streams.
  • The method for transmitting media data streams in the present application can be applied to any live-broadcast or audio/video call scenario, such as video conferencing, video calls, interactive live broadcasts, e-commerce live broadcasts, etc. While achieving synchronization, there is no need to redefine a new media data stream and the many related fields; an independent media data stream can be constructed on the basis of the existing RTP stream, so that one RTP stream can transmit two media data streams at the same time.
  • FIG1 shows a block diagram of an exemplary system architecture applying the technical solution of the present application.
  • the system architecture 100 may include a terminal device 101, a server 102, and a network 103.
  • the terminal device 101 may be any electronic device with a display screen or a display screen and a voice playback device, such as a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart TV, a smart car terminal, etc.
  • the terminal device 101 may be used to receive two media data streams transmitted simultaneously through the same RTP data packet.
  • the two media data streams may be both video data, both audio data, both subtitle stream data, or any two of video data, audio data, and subtitle stream data.
  • the terminal device 101 may render and display the media data streams on the display screen.
  • the server 102 may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services.
  • the network 103 may be a communication medium of various connection types capable of providing a communication link between the terminal device 101 and the server 102, for example, a wired communication link or a wireless communication link.
  • the system architecture in the embodiment of the present application can have any number of terminal devices, networks and servers.
  • the server can be a server group composed of multiple server devices.
  • the technical solution provided in the embodiment of the present application can be applied to the terminal device 101.
  • the terminal device 101 first performs SDP negotiation with the server 102 to ensure that the underlying code supports the functions required by the application layer, and then transmits the media data stream between the terminal device 101 and the server 102.
  • An RTP data packet containing a first data segment of the original media data stream and a second data segment of the extended media data stream can be generated based on the RTP protocol. The RTP data packet is then encapsulated with a preset transmission protocol (such as UDP) to form a target data packet, and the target data packet is sent to the terminal device 101 through the network 103. The terminal device 101 obtains the RTP data packet by parsing the target data packet, obtains the first data segment of the original media data stream and the second data segment of the extended media data stream by parsing the RTP data packet, obtains the two different media data streams by decoding the first data segment and the second data segment, and then renders and synchronously presents the original media data stream and the extended media data stream.
  • the RTP protocol header can be extended and customized fields can be used to achieve RTP stream multiplexing, so that an independent media data stream can be constructed based on one media data stream, thereby achieving simultaneous transmission of two media data streams through one RTP stream, without having to perform SDP negotiation again for the newly added media data stream, and being compatible with the WebRTC standard.
  • The system architecture may differ slightly depending on the application scenario. For example, in a P2P scenario there may be multiple terminal devices but no server; that is, a terminal device acts as both terminal and server. Although the system architectures differ, the method of synchronously transmitting two media data streams through RTP header extension stream multiplexing is the same.
  • the server 102 in the present application may be a cloud server that provides cloud computing services, that is, the present application involves cloud storage and cloud computing technologies.
  • Cloud storage is a new concept extended and developed from the concept of cloud computing.
  • a distributed cloud storage system (hereinafter referred to as storage system) refers to a storage system that uses cluster applications, grid technology, and distributed storage file systems to bring together a large number of different types of storage devices (storage devices are also called storage nodes) in the network through application software or application interfaces to work together and provide external data storage and business access functions.
  • the storage method of the storage system is to create a logical volume.
  • physical storage space is allocated to each logical volume.
  • the physical storage space may be composed of disks of a storage device or several storage devices.
  • the client stores data on a logical volume, that is, the data is stored on the file system.
  • the file system divides the data into many parts, each of which is an object.
  • the object contains not only data but also additional information such as a data identifier (ID).
  • the file system writes each object to the physical storage space of the logical volume, and the file system records the storage location information of each object.
  • the file system can allow the client to access the data according to the storage location information of each object.
  • the process of the storage system allocating physical storage space to logical volumes is as follows: according to the capacity estimation of the objects stored in the logical volumes (this estimation often has a large margin relative to the capacity of the actual objects to be stored) and the groups of independent redundant disk arrays (RAID, Redundant Array of Independent Disks), the physical storage space is pre-divided into stripes.
  • One logical volume can be understood as a stripe, thereby allocating physical storage space to the logical volume.
  • Cloud computing is a computing model that distributes computing tasks across a resource pool consisting of a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed.
  • the network that provides resources is called a "cloud". From the user's perspective, the resources in the "cloud" are infinitely scalable and can be accessed at any time, used on demand, expanded at any time, and paid for on a per-use basis.
  • a cloud computing resource pool (referred to as a cloud platform, generally referred to as an IaaS (Infrastructure as a Service) platform) will be established, and various types of virtual resources will be deployed in the resource pool for external customers to choose to use.
  • the cloud computing resource pool mainly includes: computing devices (virtualized machines, including operating systems), storage devices, and network devices.
  • the PaaS (Platform as a Service) layer can be deployed on the IaaS (Infrastructure as a Service) layer, and the SaaS (Software as a Service) layer can be deployed on the PaaS layer. SaaS can also be deployed directly on IaaS.
  • PaaS is a platform for software operation, such as databases, web containers, etc. SaaS is a variety of business software, such as web portals, SMS mass senders, etc. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
  • FIG. 2 shows a flowchart of a method for transmitting a media data stream in an embodiment of the present application; the method can be executed by a server, for example.
  • the server can be, for example, the server 102 in FIG1 .
  • the method 200 for transmitting a media data stream in an embodiment of the present application can mainly include the following steps S210 to S240.
  • Step S210 Obtain a first data segment from a first media data stream as an RTP payload.
  • the first media data stream refers to a media data stream transmitted by an RTP payload of an RTP stream (i.e., a sequence of multiple RTP data packets), and may also be referred to as an original media data stream.
  • a first data segment is a slice of the first media data stream.
  • Step S220 Acquire a second data segment from a second media data stream.
  • the second media data stream may also be referred to as an extended media data stream.
  • Step S230 Add the second data segment to the RTP extension header.
  • a second data segment is a fragment of the second media data stream.
  • Step S240 Generate an RTP data packet including the RTP extension header and the RTP payload.
  • the RTP data packet generated according to method 200 can be used to transmit two media data streams simultaneously.
  • The present application realizes multiplexing of one RTP stream and carries two different media data streams in the same RTP stream. This avoids creating a separate media data stream protocol stack (i.e., a separate RTP stream) for each media data stream and avoids a separate SDP negotiation based on the Session Description Protocol for each media data stream, thereby simplifying the transmission steps and improving the transmission efficiency and convenience of the media stream data.
  • The method for transmitting the media data stream in the embodiment of the present application only needs to perform RTP header extension customization at the RTP sending layer and corresponding parsing and assembly at the RTP receiving layer to realize synchronous transmission of a dynamic data stream; it is compatible with the WebRTC standard and can dynamically and flexibly add media data streams.
  • By transmitting a first data segment and a second data segment in the same RTP data packet, the two data segments are kept synchronized, so the transmission of the two media data streams is also synchronized.
  • Step S210 may fragment the first media data stream to obtain a plurality of segments, each of which is taken as a first data segment.
  • For a first data segment obtained in step S210, step S220 may obtain from the second media data stream a data segment whose collection period is the same as that of the first data segment, as the second data segment.
  • Alternatively, for a first data segment obtained in step S210, step S220 may obtain from the second media data stream a data segment whose display period is the same as that of the first data segment, as the second data segment.
  • In other words, the embodiment of the present application may transmit two data segments that are time-aligned (i.e., having the same collection period or display period) in the same RTP data packet.
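  • As a minimal illustration (not part of the patent text), the following Python sketch pairs time-aligned segments of the two streams by their collection period; the Segment type and field names are assumptions made for this example.

```python
# Illustrative sketch: pair time-aligned segments of the original (first) stream and
# the extended (second) stream so that each pair can share one RTP data packet.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Segment:
    data: bytes
    period: int  # collection (or display) period index this segment belongs to

def pair_segments(first_stream: List[Segment],
                  second_stream: List[Segment]) -> List[Tuple[Segment, Optional[Segment]]]:
    """For each first-stream segment, pick the second-stream segment with the same period."""
    by_period = {seg.period: seg for seg in second_stream}
    return [(seg, by_period.get(seg.period)) for seg in first_stream]
```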
  • the first media data stream that needs to be obtained from the address corresponding to the media device or the uniform resource identifier is defined as the original media data stream.
  • the second media data stream that needs to be displayed synchronously with the original media data stream is defined as the extended media data stream.
  • the data stream corresponding to the video recorded by the anchor during the live broadcast is the original media data stream
  • the data stream corresponding to media objects such as the pictures, subtitles, and background music played in the live broadcast screen is the extended media data stream
  • the data stream corresponding to the text information, pictures, etc. sent by the audience when interacting with the anchor is the extended media data stream.
  • the type of extended media data stream will also be different depending on the scene.
  • the extended media data stream may include interactive subtitles, etc.
  • the extended media data stream may include subtitles, pictures, etc.
  • the extended media data stream may include audio, pictures, etc.
  • the extended media data stream can also be other types of objects, and the embodiment of the present application does not specifically limit this.
  • The media data stream is transmitted in the form of data packets. Therefore, when preparing to transmit a media data stream, it needs to be packetized (also called grouping or slicing) to form corresponding data segments in sequence. Accordingly, in the embodiment of the present application, to achieve synchronous transmission of the extended media data stream, the acquired extended media data stream is packetized to obtain multiple second data segments; the original media data stream is likewise acquired and packetized to form multiple corresponding first data segments.
  • To transmit the two media data streams (the original media data stream and the extended media data stream) synchronously through RTP stream multiplexing, and to synchronously display the objects corresponding to both streams on the display screen of the terminal device, the extended media data stream is first aligned according to the time correspondence between the original media data stream and the extended media data stream (for example, the correspondence between sampling times or display times), and is then packetized to form the second data segments.
  • The original media data stream and the extended media data stream are independent of each other, and they do not affect each other when being packetized.
  • Since the extended media data stream is aligned according to its time correspondence with the original media data stream, and the two streams are sent to the terminal device through the same RTP stream, the terminal device can synchronously obtain the time-aligned first data segment and second data segment and render and synchronously present them.
  • FIG3 schematically shows an interface diagram for aligning the first data segment of the first media data stream and the second data segment of the second media data stream according to the time correspondence between the first media data stream and the second media data stream in a live broadcast scene.
  • the live broadcast data stream is the first media data stream.
  • background music is inserted, and the background music is the extended media data stream.
  • the background music ends at the end of the 10th minute.
  • the first data segment where the image frame of the 5th minute period in the host's live broadcast video is located can be aligned with the second data segment where the starting frame of the background music is located, and the first data segment corresponding to the image frame of the 10th minute period in the host's live broadcast video can be aligned with the second data segment where the ending frame of the background music is located.
  • the terminal device can simultaneously obtain the first data segment corresponding to the host's live broadcast video and the second data segment corresponding to the background music, and can also synchronously play the background music while displaying the screen of the 5th minute to the 10th minute of the live broadcast video, so as to realize the synchronous transmission and synchronous presentation of the first media data stream and the second media data stream.
  • the second media data stream is a dynamic data stream and also a customized data stream.
  • the second media data stream is independent of the first media data stream and can be directly added to the RTP extension header.
  • the second media data stream is a media data stream that is highly correlated with the first media data stream. Specifically, it can be a synchronous substream or an interactive substream of the first media data stream.
  • the synchronous substream is a media data stream that has the same producer as the first media data stream.
  • the audio is the first media data stream
  • the background sound is the synchronous substream
  • the interactive substream is a media data stream that has a different producer from the first media data stream.
  • the media data stream generated by the host is the first media data stream
  • the audio or subtitle stream generated by the interactor during the interaction is the interactive substream.
  • The negotiation is based on the Session Description Protocol (SDP) and is carried out between the terminal device and the server through an offer/answer exchange.
  • the terminal device sends an SDP proposal to the server through the network.
  • the server determines whether to accept or reject. When it is determined to accept, a response is sent to the terminal device through the network. When it is determined to reject, a rejection is sent to the terminal device through the network.
  • After the server sends a response to the terminal device, it confirms the functional support involved in the SDP proposal. The transmission of the media data stream can then be carried out based on the RTP protocol.
  • the SDP negotiation mainly negotiates the specific information of the media data stream.
  • the extension header of RTP is also negotiated.
  • the specific content of the SDP negotiation is illustrated as follows:
  • <media>: media type, such as audio, video, etc.
  • <proto>: transmission protocol, such as UDP/RTP, which means using UDP to transmit RTP data packets
  • <fmt list>: media formats, i.e. the list of data payload types.
  • the SDP layer describes the RTP extension header as follows:
  • value represents the extension header identifier
  • direction indicates the transmission direction, which can be sendonly, recvonly, sendrecv, inactive.
  • the default value is sendrecv.
  • URI indicates the URI of the extension header.
  • the communicating parties can use the URI to indicate the meaning of the extension header so that both parties can understand it.
  • Extension attributes represent other media data stream information, such as stream identifiers and other complex descriptions.
  • the parameters such as the extension header identifier and URI corresponding to the second media data stream can also be described through the SDP proposal, so that the functions used in the subsequent transmission of the second media data stream can be supported by the underlying code, and when an independent second media data stream is added on the basis of the existing first media data stream, there is no need to perform SDP negotiation for the newly added second media data stream.
  • only one SDP negotiation is required in the present application to realize the synchronous transmission of two independent media data streams based on RTP stream multiplexing.
  • method 200 may receive a proposal generated based on a session description protocol SDP and sent by a terminal device.
  • the proposal includes a media description and extended information.
  • the extended information is used to represent the description information related to the transmission of the second media data stream in the RTP extension header, for example, including an extension header identifier and URI for transmitting the second media data stream.
  • the extended information may include extmap:<value>["/"<direction>] <URI> <extension attributes>.
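  • A minimal sketch of how such a proposal might describe the extension header at the SDP layer is given below; the extension header identifier 5 and the URI are illustrative values chosen for this example only, not values defined by this application or by any standard.

```
m=video 9 UDP/TLS/RTP/SAVPF 96
a=sendrecv
a=extmap:5 urn:example:extended-media-stream
```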
  • the method 200 includes generating a response to the proposal.
  • the response may include, for example, an indication of the server related to the acceptance of the SDP proposal, such as a confirmation of support for the functions involved in the SDP proposal.
  • the method 200 may also send a response to the terminal device to complete the SDP negotiation with the terminal device.
  • an RTP data packet can be generated based on the first data fragment and the second data fragment, and the RTP data packet can be sent to the terminal device.
  • In the related art, one RTP stream can only transmit one media data stream (that is, one RTP data packet in one RTP stream can only carry data segments of one media data stream).
  • The present application extends the RTP header and uses the RTP extension header within the same RTP stream to transmit the data segments of a second media data stream.
  • the header of an RTP data packet (i.e., an RTP message) includes a fixed header and an extended header.
  • FIG. 4 schematically shows the structure of the fixed header of an RTP data packet.
  • CC is the CSRC counter, which occupies 4 bits and indicates the number of CSRC identifiers;
  • M is the marker bit, which occupies 1 bit and has different meanings for different payloads: for video it marks the end of a frame, and for audio it marks the beginning of a session;
  • PT is the payload type;
  • the timestamp occupies 32 bits and must use a 90 kHz clock frequency (90000 in the program);
  • the timestamp reflects the sampling instant of the first octet of the RTP data packet;
  • the receiver uses the timestamp to calculate delay and delay jitter and to perform synchronization control;
  • the timing of the data packet can be obtained according to the timestamp of the RTP packet; the synchronization source (SSRC) identifier occupies 32 bits and is used to identify the synchronization source;
  • the synchronization source refers to the source that generates the media stream;
  • each CSRC identifier occupies 32 bits, and there can be 0 to 15 CSRCs; the CSRC list identifies the contributing sources contained in the RTP data packet payload.
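  • The following Python sketch packs the 12-byte RTP fixed header described above according to the RFC 3550 layout; it is an illustrative helper, not the application's implementation, and performs no validation.

```python
import struct

def pack_rtp_fixed_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                          marker: bool = False, csrc_list=()) -> bytes:
    version = 2                    # V: RTP version, 2 bits
    padding = 0                    # P: padding flag, 1 bit
    extension = 1                  # X: header-extension flag, 1 bit (set because we add one)
    cc = len(csrc_list)            # CC: number of CSRC identifiers, 4 bits
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | cc
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M flag plus 7-bit payload type
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    for csrc in csrc_list:
        header += struct.pack("!I", csrc)                # each CSRC occupies 32 bits
    return header
```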
  • FIG. 5A schematically shows the structure of the one-byte-header extension header.
  • The one-byte-header extension header includes the extension flag 0xBEDE, the total extension length, and, for that length, the extension element identifier ID of each extension element, the corresponding payload length L, and the payload data.
  • FIG. 5B schematically shows the structure of the two-byte-header extension header.
  • The two-byte-header extension header includes the extension flag 0x100 and the appbits field, followed by the total extension data length and, for that length, the extension element identifier ID of each extension element, the corresponding payload length L, and the payload data.
  • The header formats and data lengths of the two extension methods are different. Since the purpose of this application is for an RTP data packet to independently carry an extended media data stream while carrying the original media data stream, the functional parameter fields required by the extended media data stream during transmission must be customized. There is no customizable field in the one-byte-header extension header, whereas the appbits field in the two-byte-header extension header depends on the application, can be defined with any value or meaning, and can be used to carry application data.
  • Such application-layer data is data that is not supported by the standard, and the embodiments of the present application can customize the data to be filled in.
  • The appbits field is regarded as a special extension value assigned to local identifier 256. If no extension is specified for local identifier 256 through configuration or signaling, the sender should set the appbits field to all zeros and the receiver must ignore this field.
  • Therefore, to realize RTP media data stream multiplexing, the sender customizes the appbits field in the two-byte-header extension header so that the underlying code supports the extended media data stream added to the RTP data packet.
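  • The sketch below illustrates, under the same assumptions, how a two-byte-header extension carrying one second data segment might be packed: the 16-bit "defined by profile" field holds the 0x100 pattern in its upper 12 bits and the appbits value in its lower 4 bits, followed by an 8-bit element ID, an 8-bit length, the segment data, and zero padding up to a 32-bit boundary. The element ID is an illustrative value.

```python
import struct

def pack_two_byte_extension(ext_id: int, segment: bytes, appbits: int) -> bytes:
    profile = (0x100 << 4) | (appbits & 0x0F)                   # "defined by profile" field
    body = struct.pack("!BB", ext_id, len(segment)) + segment   # 8-bit id, 8-bit length, data
    body += b"\x00" * ((-len(body)) % 4)                        # pad to a 32-bit boundary
    return struct.pack("!HH", profile, len(body) // 4) + body   # length counted in 32-bit words
```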
  • the first data segment can be filled into the RTP message as the RTP payload
  • the second data segment can be filled into the RTP extended header
  • an RTP data packet carrying the first data segment of the original media data stream and the second data segment of the extended media data stream is generated.
  • the RTP data packet can be sent to the terminal device so that the terminal device can render and synchronously present according to the original media data stream and the extended media data stream by receiving each RTP data packet.
  • the RTP data packet includes a header and an RTP payload, wherein the header includes an RTP fixed header and an RTP extended header.
  • After receiving multiple RTP data packets, the terminal device needs to extract the second data segment from each RTP data packet to obtain a complete second media data stream.
  • the embodiment of the present application can mark the starting segment and the ending segment respectively.
  • a start field indicating that the second data segment is the starting segment is set in the RTP extension header of the RTP data packet.
  • an end field indicating that the second data segment is the ending segment is set in the RTP extension header.
  • the appbits field in the RTP extension header is used to indicate the position of the second data segment in the second media data stream.
  • the value range of the position of the second data fragment includes the position of the starting fragment, the position of the ending fragment, and the position between the starting fragment and the ending fragment.
  • the appbits field includes 4 bits.
  • Setting the appbits field in the RTP extension header to the starting field includes: setting the first bit of the appbits field to 1, and setting the remaining bits to 0. That is, the appbits field is set to 1000.
  • setting the appbits field in the RTP extension header to the ending field includes: setting the second bit of the appbits field to 1, and setting the remaining bits to 0, that is, setting the appbits field to 0100.
  • For second data segments between the starting segment and the ending segment, the appbits field in the RTP extension header carrying the second data segment is set to 0000.
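  • A trivial sketch of choosing the appbits value for a second data segment under this convention (the function name is an illustrative assumption):

```python
def appbits_for(index: int, total: int) -> int:
    """Return the appbits marker for the segment at position `index` of `total` segments."""
    if index == 0:
        return 0b1000        # start field: first bit 1, remaining bits 0
    if index == total - 1:
        return 0b0100        # end field: second bit 1, remaining bits 0
    return 0b0000            # intermediate segment
```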
  • the data in the RTP data packet can be extracted and assembled to obtain an extended media data stream.
  • the starting segment and the ending segment are determined according to the starting field and the ending field.
  • the starting segment and the ending segment, as well as the second data segment between the two, can form a complete extended media data stream.
  • FIG6 shows a schematic diagram of the first media data stream and the second media data stream.
  • As shown in FIG. 6, the extended media data stream is segmented into multiple segments, among which there are a start segment ts1 and an end segment ts6; the start segment ts1 is marked with the start field 1000 and the end segment ts6 is marked with the end field 0100. That is to say, all segments from the start segment ts1 to the end segment ts6 constitute a complete second media data stream. The original media data stream is likewise segmented into multiple segments, such as segments TS1 to TS9.
  • segment TS1 is the first first data segment in the original media data stream
  • start segment ts1 is aligned with the segment TS2, so the start segment ts1 and the segment TS2 can be transmitted by the same RTP data packet.
  • the end segment ts6 is aligned with the segment TS8, so the end segment ts6 and the segment TS8 can be transmitted by the same RTP data packet.
  • the second data segments that constitute the extended media data stream can be marked according to the custom rules of the appbits field.
  • the server can first segment the extended media data stream.
  • a frame of an object (audio, image, etc.) can be encoded into one or more second data segments.
  • Different second data segments can be marked in the process of generating an RTP data packet. Since the second data segments in the second media data stream are arranged in sequence, when generating an RTP data packet, the start segment and the end segment can be marked.
  • the specific marking method is to mark in the appbits field.
  • the appbits field is set as the start field in the RTP extension header containing the start segment, that is, the first bit of the appbits field is marked as 1, and the other bits are marked as 0.
  • the appbits field is set to the end field in the RTP extension header containing the end segment, that is, the second bit of the appbits field is marked as 1, and the other bits are marked as 0.
  • For the other second data segments, the appbits field in the RTP extension header can be set to 0000. In this way, the terminal device can determine the beginning of the second media data stream when the start field is parsed from an RTP extension header.
  • Similarly, when the end field is parsed from an RTP extension header, the end of the second media data stream can be determined.
  • The size of the RTP header and of the whole RTP data packet is limited; exemplarily, the header of the RTP data packet does not exceed 255 bytes, and the total size of the RTP data packet does not exceed 1200 bytes. Therefore, to carry the second data segments of the extended media data stream in the RTP extension header, the fragment size of the extended media data stream must be set according to the extension size allowed by the header of the RTP data packet, and the second media data stream is then fragmented and transmitted according to that fragment size.
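  • A rough sketch of deriving a fragment size from these limits is shown below; the 16-byte allowance for the fixed header and extension bookkeeping is an assumption made purely for illustration.

```python
RTP_HEADER_LIMIT = 255        # maximum RTP header size in bytes (example limit above)
FIXED_AND_EXT_OVERHEAD = 16   # fixed header (12 bytes) plus extension bookkeeping (assumed)

def max_second_segment_size() -> int:
    # The second data segment travels in the extension header, so it must fit within the
    # header limit after the fixed header and extension bookkeeping are subtracted.
    return RTP_HEADER_LIMIT - FIXED_AND_EXT_OVERHEAD

def split_extended_stream(stream: bytes) -> list:
    """Fragment the extended media data stream into second data segments of that size."""
    size = max_second_segment_size()
    return [stream[i:i + size] for i in range(0, len(stream), size)]
```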
  • When an RTP data packet has been generated, it can be sent to a terminal device so that the terminal device synchronously presents the original media data stream and the extended media data stream.
  • Specifically, the RTP data packet can be encapsulated according to a preset transmission protocol (such as UDP) to generate a target data packet (such as a UDP data packet) corresponding to that protocol, and the target data packet is then sent to the terminal device so that the terminal device can obtain the required data from it.
  • The preset transmission protocol can specifically be the UDP protocol.
  • Although UDP is a connectionless transport-layer protocol that provides only a simple, transaction-oriented, unreliable message transmission service, it improves the timeliness of data transmission, reduces delay, and improves user experience. Therefore, the UDP protocol is usually used as the preset transmission protocol; other transmission protocols may of course also be used, and the embodiments of the present application do not specifically limit this.
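  • A minimal sketch of encapsulating an already-built RTP data packet in a UDP datagram and sending it; the destination address and port below are placeholders, not values from the application.

```python
import socket

def send_rtp_over_udp(rtp_packet: bytes, host: str = "192.0.2.10", port: int = 5004) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # connectionless UDP socket
    try:
        sock.sendto(rtp_packet, (host, port))                 # the UDP payload is the RTP packet
    finally:
        sock.close()
```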
  • FIG7 schematically shows a schematic diagram of the structure of the underlying protocol stack of the WebRTC media data stream.
  • the underlying protocol stack of the WebRTC media data stream is, from top to bottom, a media data stream layer 701, an SRTP layer 702, a DTLS layer 703, and a UDP layer 704.
  • the SRTP layer 702 is in the transport layer, and is mainly used to process the media data stream in the media data stream layer 701 to generate RTP data packets.
  • the DTLS layer 703 is a data packet transport layer security protocol layer, and is used to ensure the security of the RTP data packet during transmission, and to ensure that the RTP data packet is transmitted through an encrypted channel.
  • The UDP layer 704 is in the transport layer. After the RTP data packet reaches the UDP layer, it is further encapsulated according to the UDP protocol to form a UDP data packet containing the RTP data packet, and the UDP data packet is sent to the terminal device. The terminal device parses the UDP data packet to obtain the RTP data packet, parses the RTP data packet to obtain the first data segment of the original media data stream and the second data segment of the extended media data stream, and obtains the required original and extended media data streams by collecting multiple first data segments and multiple second data segments.
  • Fig. 8 schematically shows a flow chart of a method 800 for transmitting a media data stream.
  • the method 800 is executed in a terminal device.
  • an RTP data packet is obtained.
  • the RTP data packet includes an RTP payload and an RTP extension header.
  • the RTP payload includes a first data segment of a first media data stream.
  • the RTP extension header includes a second data segment of a second media data stream.
  • the RTP data packet is obtained by parsing the target data packet.
  • the target data packet is, for example, a UDP data packet.
  • In step S802, a first data segment of the first media data stream is parsed from the RTP payload of the RTP data packet.
  • In step S803, the second data segment of the second media data stream is parsed from the RTP extension header of the RTP data packet.
  • the terminal device can obtain the data segments of the two media data streams from the same RTP data packet.
  • the embodiment of the present application can obtain each second data segment from each RTP data packet, and the sequence from the start segment to the end segment obtained can constitute a complete second media data stream.
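  • A parsing sketch for the receiving side is given below; it mirrors the packing sketches above and makes the same layout assumptions (two-byte header extension, appbits convention), so it is illustrative rather than the terminal device's actual implementation.

```python
import struct

def parse_rtp_packet(packet: bytes):
    """Return (first_segment, second_segment, is_start, is_end) from one RTP data packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = byte0 & 0x0F                                  # number of CSRC identifiers
    has_ext = bool(byte0 & 0x10)                       # X flag: header extension present
    offset = 12 + 4 * cc
    second_segment, appbits = None, None
    if has_ext:
        profile, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        appbits = profile & 0x0F                       # application-defined bits (two-byte form)
        ext_data = packet[offset + 4: offset + 4 + 4 * ext_words]
        ext_id, length = struct.unpack("!BB", ext_data[:2])
        second_segment = ext_data[2:2 + length]        # second data segment from the extension
        offset += 4 + 4 * ext_words
    first_segment = packet[offset:]                    # RTP payload carries the first data segment
    return first_segment, second_segment, appbits == 0b1000, appbits == 0b0100
```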
  • The terminal device may synchronously present the first data segment and the second data segment obtained from the same RTP data packet.
  • When the terminal device parses the RTP data packet, it further parses the target field of the RTP extension header.
  • the target field is used to indicate the position of the second data segment in the second media data stream.
  • When the target field is the start field, the terminal device determines that the second data segment is the start segment of the second media data stream.
  • When the target field is the end field, the terminal device determines that the second data segment is the end segment of the second media data stream.
  • The start segment and the end segment are used to determine the start and the end of the second media data stream respectively.
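  • Building on the per-packet parsing sketch above, the following illustrative helper collects the second data segments from the start segment to the end segment to reassemble the second media data stream; the class and attribute names are assumptions.

```python
from typing import List, Optional

class SecondStreamAssembler:
    """Accumulates second data segments between the start field and the end field."""
    def __init__(self) -> None:
        self._segments: List[bytes] = []
        self._collecting = False
        self.completed: Optional[bytes] = None    # set once the end segment has arrived

    def feed(self, segment: Optional[bytes], is_start: bool, is_end: bool) -> None:
        if segment is None:                       # this packet carried no extension segment
            return
        if is_start:                              # start field (appbits 1000) opens the stream
            self._segments = []
            self._collecting = True
        if self._collecting:
            self._segments.append(segment)
        if is_end and self._collecting:           # end field (appbits 0100) closes the stream
            self.completed = b"".join(self._segments)
            self._collecting = False
```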
  • the terminal device may perform an SDP negotiation process. For example, the terminal device may send a proposal generated based on a session description protocol SDP.
  • the proposal includes a media description and extended information.
  • the media description is used to represent description information related to the transmission of the first media data stream
  • the extended information is used to represent description information related to the transmission of the second media data stream in an RTP extension header.
  • the terminal device may receive a response from the server to the proposal to complete the SDP negotiation with the server.
  • the terminal device parses the RTP data packet to obtain the first data segment of the original media data stream and the second data segment of the extended media data stream, and renders and displays them according to each data segment.
  • the time correspondence between the original media data stream and the extended media data stream can be specifically the time point when the extended media data stream is inserted into the original media data stream.
  • For example, if the extended media data stream is inserted at the beginning of the 5th minute of the original media data stream and ends at the end of the 10th minute, then during rendering the original media data stream before the 5th minute is rendered and displayed first.
  • When rendering reaches the beginning of the 5th minute, rendering of the extended media data stream starts, and the original media data stream of the 5th-minute period is displayed simultaneously with the 1st minute of the extended media data stream, until the synchronous rendering and display of the original media data stream between the 5th and 10th minutes and of the whole extended media data stream is completed; finally, the remaining original media data stream is rendered and displayed.
  • the method for transmitting media data streams in the embodiments of the present application can be applied to any scenario involving real-time audio and video communication, for example, it can be applied to interactive live broadcast, e-commerce live broadcast, video live broadcast, video conferencing, video communication, P2P and other scenarios that require low latency.
  • The method for transmitting media data streams in the present application can also synchronize data streams that the related technology cannot synchronize, such as the extended interactive supplementary metadata streams of the audio and video data streams themselves.
  • Below, taking a one-on-one online classroom as an example, the method for transmitting media data streams in the embodiments of the present application is specifically described.
  • a one-on-one classroom is a face-to-face teaching between a teacher and a student through live broadcasting.
  • In such a classroom there are data streams such as the courseware content that needs to be displayed while the teacher is lecturing, the subtitles corresponding to the teacher's lecture content, the students' answers to the teacher's questions, and the questions raised by the students; all of these types of data streams are related.
  • the subtitles need to be synchronized with what the teacher says, the courseware content needs to be synchronized with the teacher's lecture content, the students' answers should closely follow the teacher's questions, the students' questions should fall within the teacher's question-answering time range, and so on.
  • The delayed arrival of any one or more of these data streams will affect the effect of the live broadcast. Therefore, to ensure teaching effectiveness, low latency during the live broadcast is the key thing to guarantee.
  • the data stream collected by the image acquisition device such as the camera is the original media data stream in this application
  • the dynamic data streams such as the pictures, subtitles, students' answers and questions that record the courseware content are the extended media data streams in the embodiments of this application.
  • the system architecture corresponding to the one-to-one scenario includes a teacher terminal, a student terminal and a server.
  • the teacher terminal and the student terminal are provided with a built-in or external image acquisition device.
  • the image acquisition device can specifically be a camera, a video recorder and the like.
  • the teacher starts teaching, the camera connected to the teacher terminal starts shooting video to generate a live data stream.
  • As the classroom content progresses, courseware content pictures related to the real-time teaching content need to be displayed in the interface. Since the courseware content pictures and the live data stream are two data streams that are independent of each other during transmission, it must be ensured that when the teacher talks about a courseware content picture, that picture is also displayed synchronously on the teacher terminal and the student terminal.
  • the server can obtain the live data stream generated by the camera in real time, and generate multiple first data segments corresponding to the live data stream by subpackaging the live data stream.
  • the server can also receive the extended media data stream containing the courseware content picture.
  • the server can insert the second data segment generated by subpackaging the extended media data stream at the time point when the courseware content picture needs to be displayed according to the time correspondence between the courseware picture and the live video, and can also set the corresponding start field and end field for the second data segment, and add the start field and end field corresponding to the second data segment to the second data segment.
  • the second data segment can be filled into the RTP extension header, and then an RTP data packet is formed according to the filled RTP extension header and the RTP payload carrying the original media data stream, wherein the original media data stream in the RTP payload exists in the form of the first data segment, and then the RTP data packet is encapsulated based on the UDP transmission protocol to generate a UDP data packet, and finally the UDP data packet is sent to the teacher terminal and the student terminal, so that the teacher terminal and the student terminal synchronously display the live video stream and the courseware content picture.
  • When the student terminal receives a UDP data packet, it can parse the UDP data packet to obtain the RTP data packet therein, and then parse the RTP data packet to obtain a first data segment corresponding to the teacher's live broadcast picture and a second data segment corresponding to the courseware content. The first data segments can then be decoded to obtain the pieces of the teacher's live broadcast picture, and these pieces are sorted and spliced according to their timestamps to obtain the data stream corresponding to the teacher's live broadcast picture. At the same time, the second data segments are decoded to obtain the pieces and the target fields therein, where the target fields include the start field and the end field.
  • According to the start field and the end field, the target segments corresponding to the courseware content picture can be determined, and the target segments are sorted and spliced according to their timestamps to obtain the data stream corresponding to the courseware content picture. Finally, the two data streams are rendered and displayed according to the time correspondence between the courseware content picture and the teacher's live broadcast picture, so that the teacher's live broadcast picture and the courseware content picture that need to be displayed synchronously are shown in the display interface.
  • the method for transmitting media data streams in the embodiment of the present application can also be applied to other scenarios, such as interactive live broadcast scenarios, where the host can interact with the audience, the host can interact with other hosts, and so on.
  • In such a scenario, the server can obtain the media data stream corresponding to the host's live broadcast picture and, at the same time, obtain the media data stream of the audience or other hosts interacting with the host, such as interactive text, interactive video, and interactive audio. The server then packetizes the media data stream of the host's live broadcast picture and the interactive media data stream to form first data segments and second data segments, generates RTP data packets from them, encapsulates the RTP data packets into target data packets, and sends the target data packets to the terminal devices.
  • the target data packet can be, for example, a UDP data packet, etc.
  • After receiving the target data packet, the terminal device can parse it to obtain the RTP data packet, and then parse the RTP data packet to obtain the first data segment and the second data segment therein. The first data segments are decoded to obtain the pieces corresponding to the host's live broadcast picture, and the live media data stream is formed from these pieces. At the same time, the second data segments are decoded to obtain the pieces and the target fields therein, the target fields including the start field and the end field; the obtained start field and end field determine the target segments corresponding to the interactive media data stream, the interactive media data stream is formed from the target segments, and finally the live media data stream and the interactive media data stream are rendered and displayed according to their time correspondence.
  • To sum up, the method for transmitting a media data stream in the present application obtains multiple second data segments corresponding to the extended media data stream, fills the second data segments into the RTP extension header, forms an RTP data packet from the filled RTP extension header and the RTP payload carrying the original media data stream, and finally sends the RTP data packet to the terminal device.
  • On the one hand, the method realizes multiplexing of the RTP media data stream and carries two different media data streams in the same RTP data packet, avoiding the creation of separate protocol stacks for different media data streams and avoiding multiple negotiations based on the Session Description Protocol, which reduces the transmission steps of media data streams and improves transmission efficiency. On the other hand, the method only needs RTP header extension customization at the RTP sending layer and corresponding parsing and assembly at the RTP receiving layer to realize synchronous transmission of a dynamic data stream.
  • The synchronous transmission method is simple, compatible with the WebRTC standard, and can dynamically and flexibly add media data streams; furthermore, it realizes transmission synchronization of the two media data streams, avoiding the problem of asynchrony between different media data streams.
  • FIG. 9 schematically shows a block diagram of the structure of the device for transmitting a media data stream provided in an embodiment of the present application.
  • the device 900 includes: an acquisition module 910 and a sending module 920, specifically:
  • the acquisition module 910 is configured to acquire a first data segment from the first media data stream as an RTP payload; and acquire a second data segment from the second media data stream.
  • the sending module 920 is used to add the second data segment to the RTP extension header; and generate an RTP data packet including the RTP extension header and the RTP payload.
  • the sending module 920 can also send the RTP data packet to the terminal device.
  • FIG. 10 schematically shows a block diagram of a computer system of an electronic device for implementing an embodiment of the present application.
  • The electronic device may be the terminal device 101 or the server 102 shown in FIG. 1.
  • The computer system 1000 includes a central processing unit 1001 (CPU), which can perform various appropriate actions and processes according to a program stored in a read-only memory 1002 (ROM) or a program loaded from a storage section 1008 into a random access memory 1003 (RAM).
  • Various programs and data required for system operation are also stored in the random access memory 1003.
  • the central processing unit 1001, the read-only memory 1002 and the random access memory 1003 are connected to each other through a bus 1004.
  • An input/output interface 1005 (I/O interface) is also connected to the bus 1004.
  • the following components are connected to the input/output interface 1005: an input section 1006 including a keyboard, a mouse, etc.; an output section 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 1008 including a hard disk, etc.; and a communication section 1009 including a network interface card such as a local area network card, a modem, etc.
  • the communication section 1009 performs communication processing via a network such as the Internet.
  • a drive 1010 is also connected to the input/output interface 1005 as needed.
  • a removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 1010 as needed, so that a computer program read therefrom is installed into the storage section 1008 as needed.
  • the process described in each method flow chart can be implemented as a computer software program.
  • an embodiment of the present application includes a computer program product, which includes a computer program carried on a computer readable medium, and the computer program contains a program code for executing the method shown in the flow chart.
  • The computer program can be downloaded and installed from a network through the communication section 1009, and/or installed from the removable medium 1011.
  • When the computer program is executed by the central processing unit 1001, the various functions defined in the system of the present application are executed.
  • The computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • The computer-readable medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • More specific examples of the computer-readable medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • A computer-readable medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.
  • a computer readable signal medium can include a data signal propagated in a baseband or as part of a carrier wave, which carries a computer readable program code.
  • This propagated data signal can take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and may send, propagate, or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus, or device.
  • the program code contained on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
  • Each box in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function.
  • It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the accompanying drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each box in the block diagram or flow chart, and the combination of the boxes in the block diagram or flow chart can be implemented with a dedicated hardware-based system that performs a specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the technical solution according to the implementation methods of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network, and includes several instructions to enable an electronic device to execute the method according to the implementation methods of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present application belongs to the technical field of data transmission and relates to a method, an apparatus, a storage medium, and an electronic device for transmitting media data streams. The method includes: obtaining a first data segment from a first media data stream as an RTP payload; obtaining a second data segment from a second media data stream; adding the second data segment to an RTP extension header; and generating an RTP data packet containing the RTP extension header and the RTP payload. The present application can implement RTP stream multiplexing, transmitting two media data streams through the RTP data packets of a single RTP stream, avoiding repeated SDP negotiation and improving the efficiency and convenience of media data stream transmission.

Description

传输媒体数据流的方法、装置、存储介质及电子设备
本申请要求于2022年11月15日提交中国专利局、申请号为202211428364.0、申请名称为“媒体数据流同步方法、装置、计算机可读介质以及电子设备”的中国专利申请的优先权,该申请的全部内容通过引用结合在本申请中。
技术领域
WebRTC(Web Real-Time Communication,web实时通信技术),就是在web浏览器里面引入实时通信,包括音视频通话等。WebRTC实现了基于网页的语音对话或视频通话,目的是无插件实现web端的实时通信的能力。
WebRTC主要是基于RTP协议(Real-Time Transport Protocol,实时传输协议)进行实时音视频通信的,RTP协议的头部可以扩展以满足更多的需求,但是RTP头扩展主要是对数据流的一些数据帧进行扩展的,对于扩展的数据量比较大时,则无法通过简单的RTP头扩展实现,而是需要另外利用WebRTC协商增加一个新的数据流来传输数据,这就使得动态增加或者减少一个数据流的灵活性很差的问题。
发明内容
本申请的目的在于提供一种传输媒体数据流的方法、装置、存储介质及电子设备,有助于避免在需要新增一路媒体数据流时需要新增一个SDP协商过程和新增一路RTP流的麻烦,从而提高媒体数据流传输效率和传输方便性。
根据本申请实施例的一个方面,提供一种传输媒体数据流的方法,在服务器中执行,所述方法包括:从第一媒体数据流获取第一数据片段,作为RTP负载;从第二媒体数据流获取第二数据片段;将所述第二数据片段添加至RTP扩展头部中;生成包含所述RTP扩展头部和所述RTP负载的RTP数据包。
根据本申请实施例的一个方面,提供一种传输媒体数据流的方法,在终端设备中执行,所述方法包括:获取RTP数据包,所述RTP数据包RTP负载和RTP扩展头部,所述RTP负载包括第一媒体数据流的第一数据片段,所述RTP扩展头部包括第二媒体数据流的第二数据片段;从RTP数据包的RTP负载中解析出所述第一媒体数据流的第一数据片段;从RTP数据包的RTP扩展头部中解析出所述第二媒体数据流的第二数据片段。
根据本申请实施例的一个方面,提供一种媒体数据流同步装置,该装置包括:获取模块,用于获取与扩展媒体数据流对应的扩展数据包;发送模块,用于将所述扩展数据包填充至RTP扩展头部中,基于填充后的所述RTP扩展头部和携带有原始媒体数据流的RTP负载形成RTP数据包,并将所述RTP数据包发送至终端设备。
根据本申请实施例的一个方面,提供一种传输媒体数据流的装置,所述装置包括:
接收模块,获取RTP数据包,所述RTP数据包RTP负载和RTP扩展头部,所述RTP负载包括第一媒体数据流的第一数据片段,所述RTP扩展头部包括第二媒体数据流的第二数据片段;
解析模块,从RTP数据包的RTP负载中解析出所述第一媒体数据流的第一数据片段;从RTP数据包的RTP扩展头部中解析出所述第二媒体数据流的第二数据片段。
根据本申请实施例的一个方面,提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现如以上技术方案中的方法。
根据本申请实施例的一个方面,提供一种电子设备,该电子设备包括:处理器;以及存储器,用于存储所述处理器的可执行指令;其中,所述处理器被配置为经由执行所述可执行指令来执行如以上技术方案中的方法。
根据本申请实施例的一个方面,提供一种计算机程序产品,该计算机程序产品包括计算机指令,当所述计算机指令在计算机上运行时,使得所述计算机执行如以上技术方案中的方法。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本申请的实施例,并与说明书一起用于解释本申请的原理。显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1示意性地示出了应用本申请实施例中的传输媒体数据流的方法的系统架构的结构示意图。
图2示意性地示出了本申请实施例中传输媒体数据流的方法的步骤流程示意图。
图3示意性地示出了本申请实施例中根据原始媒体数据流和扩展媒体数据流的时间对应关系将扩展媒体数据流插入到原始媒体数据流中的结构示意图。
图4示意性地示出了本申请实施例中的实时传输协议RTP协议的固定头部的结构示意图。
图5A示意性地示出了本申请实施例中的实时传输协议RTP协议one-byte header扩展头的结构示意图。
图5B示意性地示出了本申请实施例中的实时传输协议RTP协议two-byte header扩展头的结构示意图。
图6示出了第一媒体数据流和第二媒体数据流的示意图。
图7示意性示出了本申请实施例中的WebRTC媒体数据流底层协议栈的结构示意图。
图8示意性示出了传输媒体数据流的方法的流程示意图。
图9示意性地示出了本申请实施例中传输媒体数据流的装置的结构框图。
图10示意性示出了适于用来实现本申请实施例的电子设备的计算机系统结构框图。
具体实施方式
现在将参考附图更全面地描述示例实施方式。然而,示例实施方式能够以多种形式实施,且不应被理解为限于在此阐述的范例;相反,提供这些实施方式使得本申请将更加全面和完整,并将示例实施方式的构思全面地传达给本领域的技术人员。
此外,所描述的特征、结构或特性可以以任何合适的方式结合在一个或更多实施例中。在下面的描述中,提供许多具体细节从而给出对本申请的实施例的充分理解。然而,本领域技术人员将意识到,可以实践本申请的技术方案而没有特定细节中的一个或更多,或者可以采用其它的方法、组元、装置、步骤等。在其它情况下,不详细示出或描述公知方法、装置、实现或者操作以避免模糊本申请的各方面。
附图中所示的方框图仅仅是功能实体,不一定必须与物理上独立的实体相对应。即,可以采用软件形式来实现这些功能实体,或在一个或多个硬件模块或集成电路中实现这些功能实体,或在不同网络和/或处理器装置和/或微控制器装置中实现这些功能实体。
附图中所示的流程图仅是示例性说明,不是必须包括所有的内容和操作/步骤,也不是必须按所描述的顺序执行。例如,有的操作/步骤还可以分解,而有的操作/步骤可以合并或部分合并,因此实际执行的顺序有可能根据实际情况改变。
在对本申请实施例中的传输媒体数据流的方法进行详细说明之前,先对本申请涉及的技术名词进行解释。
1.WebRTC:Web Real-Time communication,web实时通信技术,W3C和IETF标准,应用于音视频实时通信。
2.RTP:Real-time Transport Protocol,实时传输协议,是一个网络传输协议,它是由IETF的多媒体传输工作小组1996年在RFC 1889中公布的。
3.SDP:Session Description Protocol,一种会话描述协议。
4.DTLS:Datagram Transport Layer Security,数据包传输层安全性协议。
5.UDP:User Datagram Protocol,一种无连接的传输层协议,提供面向事务的简单不可靠信息传送服务。
在本申请的相关技术中,在基于WebRTC进行音视频实时通信时,通常采用实时传输协议RTP协议负责音视频数据的封包和传输,但是采用RTP协议进行音视频数据封包和传输时,只能针对一个媒体数据流进行封包和传输,虽然RTP协议的头部可以扩展,可用于对媒体数据流进行扩展,但是 RTP扩展头部只能实现对一些数据帧的扩展,数据量较小,并且该数据帧是对应于媒体数据流本身的数据帧,无法对数据量较大的数据数据进行扩展,例如传输的媒体数据流为视频流,那么扩展的数据是与视频流本身对应的扩展信息,而无法通过RTP扩展头部扩展需要与视频流同步显示的字幕流、互动文字、背景音乐等类型的数据。
如果需要对新增的大量数据进行传输,就需要利用WebRTC协商增加一个新的数据流来传输数据。也就是说,针对想要传输的不同的媒体数据流,首先需要基于会话描述协议SDP进行相应规则地协商过程(即SDP协商),需要多定义一路m=<media><port><proto><fmt list>的媒体数据流,同时需要定义这路媒体数据流对应的很多的字段,然后再基于RTP协议进行数据的封装和传输。但是这样的媒体数据流传输方式存在步骤繁琐、延时大、同步效果差、用户体验差的问题,并且动态增加或减少一个数据流的灵活性差,因此相关技术已无法适用于直播、视频会议、P2P等对媒体数据流的同步性要求比较高的场景。并且,很多类型的数据流无法实现同步,例如音视频流本身的扩展互动补充数据流,如元数据metadata流等等,是做不到同步的。
针对本领域的相关技术,本申请实施例提出了一种传输媒体数据流的方法,本申请中的传输媒体数据流的方法可以应用于任意的直播以及音视频通话场景,例如视频会议、视频通话、互动直播、电商直播,等等,并且在实现同步的同时,无需重新定义一路新的媒体数据流以及相关的很多字段,只需要在已有的RTP流的基础上,就可以构建一个独立的媒体数据流,实现一路RTP流可以同时传输两路的媒体数据流。
图1示出了应用本申请技术方案的示例性系统架构框图。
如图1所示,系统架构100可以包括终端设备101、服务器102和网络103。其中,终端设备101可以是诸如智能手机、平板电脑、笔记本电脑、台式电脑、智能电视、智能车载终端等各种具有显示屏幕或者具有显示屏幕和语音播放装置的电子设备,终端设备101可以用来接收通过同一路RTP数据包同时传输的两个媒体数据流,该两个媒体数据流可以都是视频数据,可以都是音频数据,可以都是字幕流数据,也可以是视频数据、音频数据和字幕流数据中的任意两个,当两个媒体数据流都是视频数据或者字幕流数据时,终端设备101可以将媒体数据流渲染显示在显示屏幕上,当两个媒体数据流都是音频数据时,可以通过语音播放装置进行播放,当两个媒体数据是视频数据、音频数据和字幕流数据中的任意两个时,可以将视频数据或者字幕流数据渲染显示在显示屏幕上同时通过语音播放装置播放音频信息。服务器102可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云计算服务的云服务器。网络103可以是能够在客户端101和服务器102之间提供通信链路的各种连接类型的通信介质,例如可以是有线通信链路或者无线通信链路。
根据实现需要,本申请实施例中的系统架构可以具有任意数目的终端设备、网络和服务器。例如,服务器可以是由多个服务器设备组成的服务器群组。另外,本申请实施例提供的技术方案可以应用于终端设备101中。
在本申请的一个实施例中,终端设备101首先与服务器102进行SDP协商,以保证底层代码对应用层所需功能的支持,然后再在终端设备101和服务器102之间进行媒体数据流的传输。在传输媒体数据流的时候,可以基于RTP协议生成包含原始媒体数据流的第一数据片段和扩展媒体数据流的第二数据片段的RTP数据包,然后通过预设传输协议(例如UDP等)对RTP数据包进行封装形成目标数据包,并通过网络103将目标数据包发送至终端设备101,以使终端设备101通过解析目标数据包获取RTP数据包,通过解析RTP数据包获取原始媒体数据流的第一数据片段和扩展媒体数据流的第二数据片段,进而通过解码第一数据片段和第二数据片段以获取两路不同的媒体数据流,并根据原始媒体数据流和扩展媒体数据流进行渲染和同步呈现。在本申请的实施例中,可以对RTP协议进行头部扩展并自定义字段以实现RTP流复用,这样就可以在一路媒体数据流的基础上再构建一个独立的媒体数据流,实现通过一路RTP流对两路媒体数据流的同时传输,而且不用为新增的媒体数据流进行再一次的SDP协商,并且兼容WebRTC标准。
在本申请的一个实施例中,根据应用场景的不同,系统架构会存在些许差异,例如在P2P场景中,可能存在多个终端设备,但是没有服务器,也就是说,终端设备既是终端也是服务器,等等。虽然系统架构存在差异,但是采用RTP头部扩展的流复用方法进行两路媒体数据流同步传输的方式是相同的。
在本申请的一个实施例中,本申请中的服务器102可以是提供云计算服务的云服务器,也就是说,本申请涉及云存储和云计算技术。
云存储(cloud storage)是在云计算概念上延伸和发展出来的一个新的概念,分布式云存储系统(以下简称存储系统)是指通过集群应用、网格技术以及分布存储文件系统等功能,将网络中大量各种不同类型的存储设备(存储设备也称之为存储结点)通过应用软件或应用接口集合起来协同工作,共同对外提供数据存储和业务访问功能的一个存储系统。
目前,存储系统的存储方法为:创建逻辑卷,在创建逻辑卷时,就为每个逻辑卷分配物理存储空间,该物理存储空间可能是某个存储设备或者某几个存储设备的磁盘组成。客户端在某一逻辑卷上存储数据,也就是将数据存储在文件系统上,文件系统将数据分成许多部分,每一部分是一个对象,对象不仅包含数据而且还包含数据标识(ID,ID entity)等额外的信息,文件系统将每个对象分别写入该逻辑卷的物理存储空间,且文件系统会记录每个对象的存储位置信息,从而当客户端请求访问数据时,文件系统能够根据每个对象的存储位置信息让客户端对数据进行访问。
存储系统为逻辑卷分配物理存储空间的过程,具体为:按照对存储于逻辑卷的对象的容量估量(该估量往往相对于实际要存储的对象的容量有很大余量)和独立冗余磁盘阵列(RAID,Redundant Array of Independent Disk)的组别,预先将物理存储空间划分成分条,一个逻辑卷可以理解为一个分条,从而为逻辑卷分配了物理存储空间。
云计算(cloud computing)是一种计算模式,它将计算任务分布在大量计算机构成的资源池上,使各种应用系统能够根据需要获取计算力、存储空间和信息服务。提供资源的网络被称为“云”。“云”中的资源在使用者看来是可以无限扩展的,并且可以随时获取,按需使用,随时扩展,按使用付费。
作为云计算的基础能力提供商,会建立云计算资源池(简称云平台,一般称为IaaS(Infrastructure as a Service,基础设施即服务)平台,在资源池中部署多种类型的虚拟资源,供外部客户选择使用。云计算资源池中主要包括:计算设备(为虚拟化机器,包含操作系统)、存储设备、网络设备。
按照逻辑功能划分,在IaaS(Infrastructure as a Service,基础设施即服务)层上可以部署PaaS(Platform as a Service,平台即服务)层,PaaS层之上再部署SaaS(Software as a Service,软件即服务)层,也可以直接将SaaS部署在IaaS上。PaaS为软件运行的平台,如数据库、web容器等。SaaS为各式各样的业务软件,如web门户网站、短信群发器等。一般来说,SaaS和PaaS相对于IaaS是上层。
下面结合具体实施方式对本申请提供的传输媒体数据流的方法、媒体数据流同步装置、计算机可读介质以及电子设备等技术方案做出详细说明。
图2示出了本申请一个实施例中的传输媒体数据流的方法的流程图,该传输媒体数据流的方法例如可以由服务器执行。该服务器例如可以是图1中的服务器102。如图2所示,本申请实施例中的传输媒体数据流的方法200主要可以包括如下的步骤S210至步骤S240。
步骤S210:从第一媒体数据流获取第一数据片段,作为RTP负载。第一媒体数据流是指由一路RTP流(即包括多个RTP数据包的一个序列)的RTP负载传输的媒体数据流,也可以称为原始媒体数据流。一个第一数据片段是第一媒体数据流的一个分片。
步骤S220:从第二媒体数据流获取第二数据片段。第二媒体数据流也可以称为扩展媒体数据流。
步骤S230:将所述第二数据片段添加至RTP扩展头部中。一个第二数据片段是第二媒体数据流的一个分片。
步骤S240:生成包含所述RTP扩展头部和所述RTP负载的RTP数据包。这里,根据方法200生成的RTP数据包可以用于同时传输两路媒体数据流。
本申请一方面能够实现对一路RTP流的复用,在同一个RTP流中携带两个不同的媒体数据流,避免了针对不同的媒体数据流分别创建媒体数据流协议栈(即分别创建一个RTP流),并避免了针对不同的媒体数据流分别进行基于会话描述协议的SDP协商,从而简化了媒体数据流的传输步骤,提高了媒体流数据的传输效率和传输方便性;另一方面,本申请实施例中的传输媒体数据流的方法只需要在RTP业务发送层进行RTP头扩展定制,在RTP业务接收层进行对应的解析和组装,就可以实现一个动态数据流的同步传输,并且与WebRTC标准兼容,可以动态灵活地增加媒体数据流;再一方面,在同一个RTP数据包中传输一个第一数据片段和一个第二数据片段,可以实现这两个数据片段的同步 传输,进而能够实现两个媒体数据流的传输同步性,避免了不同媒体数据流之间的不同步的问题。
在一个实施例中,步骤S210可以将第一媒体数据流进行分片,以得到多个分片,每个分片被作为一个第一数据片段。步骤S210针对一个第一数据片段,可以从第二媒体数据流获取采集时段与第一数据片段的采集时段相同的数据片段,作为一个第二数据片段。在一个实施例中,步骤S210针对一个第一数据片段,可以从第二媒体数据流获取显示时段与第一数据片段的显示时段相同的数据片段,作为一个第二数据片段。这样,本申请实施例可以在同一个RTP数据包中传输时间对齐(即采集时段或者显示时段相同)的两种数据片段。
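The alignment by capture period described in the preceding paragraph can be sketched as follows (Python; the (start, end, data) slice model is an assumption made only for illustration):

    def pair_segments(first_segments, second_segments):
        # Match each first-stream slice with the second-stream slice that has the
        # same capture period; slices are (start_ms, end_ms, data) tuples.
        by_period = {(start, end): data for start, end, data in second_segments}
        return [((start, end, data), by_period.get((start, end)))  # None if nothing aligns
                for start, end, data in first_segments]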
在本申请的一个实施例中,将需要从媒体设备或者统一资源标识符对应的地址处获取的第一媒体数据流定义为原始媒体数据流。将需要与原始媒体数据流同步显示的第二媒体数据流定义为扩展媒体数据流。例如在直播时,主播进行直播的过程中所录制的视频对应的数据流即为原始媒体数据流,而在直播过程中,在直播画面中显示的图片、字幕以及播放的背景音乐等媒体对象对应的数据流即为扩展媒体数据流。又例如,观众与主播进行互动时发送的文字信息、图片等内容对应的数据流是扩展媒体数据流。在本申请的实施例中,根据场景的不同,扩展媒体数据流的类型也会不同。例如当场景为互动直播场景时,扩展媒体数据流可以包括互动字幕等。当场景为电商直播场景时,扩展媒体数据流可以包括字幕、图片等。当场景为视频聊天时,扩展媒体数据流可以包括音频、图片等等。当然扩展媒体数据流还可以是其它类型的对象,本申请实施例对此不作具体限定。
在本申请的一个实施例中,媒体数据流传输时都是以数据包的形式传输的,因此在准备传输媒体数据流时,需要对媒体数据流进行分包(也可以称为分组或者分片),依次形成对应的数据片段,相应地,本申请实施例中想要实现对扩展媒体数据流的同步传输,就需要对获取的扩展媒体数据流进行分包以获取与扩展媒体数据流对应的多个第二数据片段。并且在获取扩展媒体数据流的同时还需要获取原始媒体数据流,并对原始媒体数据流进行分包形成对应的多个第一数据片段。
在本申请的一个实施例中,为了通过RTP流复用实现原始媒体数据流和扩展媒体数据流这两路媒体数据流的同步传输,并在终端设备的显示屏幕中同步显示原始媒体数据流和扩展媒体数据流对应的对象,在生成第二数据片段时,首先需要将扩展媒体数据流根据原始媒体数据流和扩展媒体数据流之间的时间对应关系(例如关于采样时间或者显示时间的对应关系)对齐,然后再对扩展媒体数据流进行分包形成第二数据片段。在本申请的实施例中,原始媒体数据流和扩展媒体数据流是相互独立的,并且在对原始媒体数据流和扩展媒体数据流进行分包时也是相互独立、互不影响的。另外,由于扩展媒体数据流是根据其与原始媒体数据流的时间对应关系对齐的,并且扩展媒体数据流和原始媒体数据流是通过同一个RTP流发送至终端设备的,因此终端设备能够同步获取时间对齐的第一数据片段和第二数据片段,并根据第一数据片段和第二数据片段进行渲染和同步呈现。
图3示意性示出了直播场景中根据第一媒体数据流和第二媒体数据流的时间对应关系,将第一媒体数据流的第一数据片段和第二媒体数据流的第二数据片段对齐的界面示意图,如图3所示,在直播过程中,直播数据流即为第一媒体数据流,当主播直播到第5分钟开始时,开始插入背景音乐,该背景音乐即为扩展媒体数据流,一直到第10分钟结束时结束背景音乐的播放,那么在传输第一数据片段和第二数据片段时,可以将主播直播视频中第5分钟的时段的图像帧所处的第一数据片段与背景音乐的起始帧所处的第二数据片段对齐,将主播直播视频中第10分钟的时段的图像帧所对应的第一数据片段与背景音乐的结束帧所处的第二数据片段对齐。这样,终端设备能够同时获取主播直播视频对应的第一数据片段和背景音乐对应的第二数据片段,并在显示直播视频的第5分钟-第10分钟的画面的同时也能够同步播放该背景音乐,实现第一媒体数据流与第二媒体数据流的同步传输和同步呈现。
在本申请的一个实施例中,第二媒体数据流为动态数据流,同时也是定制数据流。第二媒体数据流不依赖于第一媒体数据流,可以直接添加到RTP扩展头部中。在本申请的实施例中,第二媒体数据流是与第一媒体数据流相关性很强的媒体数据流,具体地,可以是第一媒体数据流的同步子流或者是互动子流。其中,同步子流为与第一媒体数据流具有相同生产者的媒体数据流,例如音频制作者在音频中添加一个背景音,那么音频就是第一媒体数据流,背景音就是同步子流;互动子流为与第一媒体数据流具有不同生产者的媒体数据流,例如在互动直播时,主播直播生成的媒体数据流为第一媒体数据流,而互动者在互动时产生的音频或者字幕流等则为互动子流,当然还存在一些其它类型的同步子 流和互动子流,本申请实施例在此不再赘述。
在本申请的一个实施例中,在获取第一数据片段和第二数据片段之前,需要建立终端设备和服务器之间的通信连接,并且就媒体数据流传输时的参数及规则进行协商,以保证底层代码对应用层数据传输时所需的功能支持。在本申请的实施例中,协商是基于会话描述协议SDP实现的,通过终端设备和服务器进行提议/应答(offer/answer)实现。具体地,终端设备通过网络向服务器发送SDP提议。服务器接收到该提议后,确定是接受还是拒绝。当确定接受后,通过网络向终端设备发送应答。当确定拒绝后,通过网络向终端设备发送拒绝。考虑到要实现媒体数据流的传输,服务器接受终端设备发送的提议是必须的,因此在本申请实施例中不需要考虑服务器拒绝提议的情况。服务器向终端设备发送应答后,便确认了SDP提议中所涉及的功能支持。进一步,可以基于RTP协议进行媒体数据流的传输。
在本申请的一个实施例中,SDP协商主要就媒体数据流的具体信息进行协商,当RTP存在扩展头时,还会对RTP的扩展头进行协商。接下来,对SDP协商的具体内容进行示例说明:
SDP提议中对第一媒体数据流的描述为:
m=<media><port><proto><fmt list>
其中,<media>媒体类型,比如audio、video等等;
<port>端口;
<proto>传输协议,比如UDP/RTP,表示用UDP传输RTP数据包;
<fmt list>媒体格式,数据负载类型列表。
SDP层对RTP扩展头部的描述为:
a=extmap:<value>[“/”<direction>]<URI><extension attributes>
其中,value表示扩展头标识;
direction表示传输方向,可选sendonly、recvonly、sendrecv、inactive,默认值为sendrecv;
URI表示扩展头的URI,通信双方可以通过URI标明扩展头的含义让双方都能理解;
extension attributes表示其它的媒体数据流信息,比如流标识等复杂描述。
在本申请的一个实施例中,在进行SDP协商时,可以将第二媒体数据流对应的扩展头标识和URI等参数也都通过SDP提议进行描述,这样可以保证后续第二媒体数据流传输过程中所使用到的功能都能得到底层代码的支持,并且后续在已有的第一媒体数据流的基础上添加一路独立的第二媒体数据流时,无需再针对新添加的第二媒体数据流进行SDP协商。也就是说,本申请中只需要进行一次SDP协商,就可以基于RTP流复用实现两路独立的媒体数据流的同步传输。
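Purely as an illustration of the single negotiation described above (the port, payload type, extension identifier, and URI below are placeholders, not values defined by this application), the offer could carry lines of the form:

    m=video 50000 UDP/RTP 96
    a=extmap:5 urn:example:extended-media-stream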
在一个实施例中,方法200可以接收终端设备发送的基于会话描述协议SDP生成的提议。提议包括媒体描述和扩展信息。媒体描述用于表示与传输所述第一媒体数据流有关的描述信息,描述信息可以包括与传输媒体数据有关的功能支持的参数。例如,包括上文中提到的m=<media><port><proto><fmt list>。扩展信息用于表示与在RTP扩展头部中传输所述第二媒体数据流有关的描述信息,例如包括用于传输所述第二媒体数据流的扩展头标识和URI。例如,扩展信息可以包括extmap:<value>[“/”<direction>]<URI><extension attributes>。
另外,方法200包括生成对提议的应答。应答例如可以包括服务器与接手SDP提议相关的指示,例如为对SDP提议中所涉及的功能支持的确认。
方法200还可以将应答发送至终端设备,以完成与终端设备之间的SDP协商。
在本申请的一个实施例中,在完成SDP协商,并获取第一数据片段和第二数据片段后,可以基于第一数据片段和第二数据片段生成RTP数据包,并将RTP数据包发送至终端设备。通常情况下,一路RTP流只能传输一路媒体数据流(即,一路RTP流中一个RTP数据包只能传输一路媒体数据流的数据片段),为了实现RTP流的复用,实现同时传输两路媒体数据流的功能,本申请对RTP头部进行扩展,进而利用一路RTP流中的RTP扩展头部来传输一路媒体数据流的数据片段。
RTP数据包(即RTP报文)的头部包括固定头部和扩展头部,图4示意性示出了RTP数据包的固定头部的结构示意图,如图4所示,固定头部中包含多个标志位,其中V表示RTP协议的版本号,占2位,当前协议版本号为2;P为填充标志位,占1位,如果P=1,则在RTP数据包的尾部填充一个或多个额外的八位组,它们不是有效载荷的一部分;X为扩展标志,占1位,如果X=1,则在RTP 报头后跟有一个扩展报头;CC为CSRC计数器,占4位,指示CSRC标识符个数;M为标志位,占1位,不同的有效载荷有不同的含义,对于视频,标记一帧的结束,对于音频,标记会话的开始;PT(payload type)为有效荷载类型,占7位,用于说明RTP数据包中有效载荷的类型,如GSM音频、JPEM图像等,在流媒体中大部分是用来区分音频流和视频流,这样便于终端设备进行解析;序列号(sequence number)占16位,用于标识发送者所发送的RTP数据包的序列号,每发送一个报文,序列号增1,这个字段当下层的承载协议用UDP的时候,网络状况不好的时候可以用来检查丢包,当出现网络抖动的情况可以用来对数据进行重新排序,序列号的初始值是随机的,同时音频包和视频包的sequence是分别计数的;时戳(Timestamp)占32位,必须使用90kHZ时钟频率(程序中的90000),时戳反映了RTP数据包的第一个八位组的采样时刻,接受者使用时戳来计算延迟和延迟抖动,并进行同步控制,可以根据RTP包的时间戳来获得数据包的时序;同步信源(SSRC)标识符占32位,用于标识同步信源,同步信源是指产生媒体流的信源,它通过RTP报头中的一个32为数字SSRC标识符来标识,而不依赖网络地址,接收者将根据SSRC标识符来区分不同的信源,进行RTP数据包的分组;提供信源(CSRC)标识符,每个CSRC标识符占32位,可以有0~15个CSRC,每个CSRC标识了包含在RTP数据包有效载荷中的所有提供信源。
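As a small worked example of the 90 kHz timestamp rule mentioned above (a sketch; a session-relative capture time is assumed and wrap-around is handled only by masking):

    def rtp_timestamp(capture_time_s, clock_rate=90000):
        # 32-bit RTP timestamp derived from the capture time at the 90 kHz clock
        return int(capture_time_s * clock_rate) & 0xFFFFFFFF

    # A frame captured 2.5 s into the session maps to timestamp 225000.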
当扩展标志位X=1,则在RTP头部后跟有一个扩展头部,该扩展头部可以用来传输一些其它的必要信息。扩展头部的扩展方式有两种,一种是one-byte header扩展,一种是two-byte header扩展。图5A示意性示出了one-byte header扩展头部的结构示意图,如图5A所示,one-byte header扩展头部包括扩展标志0XBEDE,扩展头总长度length以及与扩展头总长度length对应的多个扩展头的扩展头标识ID、对应的负载长度L以及负载data。图5B示意性示出了two-byte header扩展头部的结构示意图,如图5B所示,two-byte header扩展头部包括扩展标志0x100和appbits字段,进一步地,还可以包括扩展头部总数据长度length以及与扩展头总数据长度length对应的多个扩展头元素的扩展头元素标识ID、对应的负载长度L以及负载data。
通过分析图5A和图5B所示的one-byte header扩展头部和two-byte header扩展头部的结构,可以发现,两种扩展方式的报头格式以及数据长度均不同,由于本申请的目的是RTP数据包在携带原始媒体数据流的同时还能独立携带一路扩展媒体数据流,这就需要对扩展媒体数据流在传输过程中所需的功能参数字段进行自定义,one-byte header扩展头中不存在可以进行自定义的字段,而two-byte header扩展头中的appbits字段取决于应用程序,可定义为任何值或含义,并且可以用来填充应用层级别的数据,应用层级别的数据为不是标准支持的数据,本申请实施例可以自定义所要填充的数据,通常情况下,出于发信号的目的,appbits字段被视为分配给本地标识符256的特殊扩展值,如果未通过配置或发信号为此本地标识符256指定扩展,则发送方应将appbits字段设置为所有0,接收方必须忽略该字段,但是在本申请的实施例中,为了实现RTP媒体数据流复用,就需要对two-byte header扩展头中的appbits字段进行自定义,以使底层代码能够实现对添加到RTP数据包中的扩展媒体数据流的功能支持。
在本申请的一个实施例中,可以将第一数据片段填充至RTP报文中作为RTP负载,将第二数据片段填充至RTP扩展头部中,进而根据填充后的RTP扩展头部和携带有原始媒体数据流的RTP负载,生成携带有原始媒体数据流的第一数据片段和扩展媒体数据流的第二数据片段的RTP数据包。在此基础上,可以将RTP数据包发送至终端设备,以使终端设备通过接收各个RTP数据包,能够根据原始媒体数据流和扩展媒体数据流进行渲染和同步呈现。该RTP数据包包括报头和RTP负载,其中报头包括RTP固定头部和RTP扩展头部。
在本申请的一个实施例中,终端设备在接收到多个RTP数据包后,需要从各个RTP数据包中提取第二数据片段,以得到完成的第二媒体数据流。为了确定在采集顺序(或者显示顺序)上的第二媒体数据流的起始片段(即第二媒体数据流的首个第二数据片段)和结束片段(即第二媒体数据流的最后一个第二数据片段),本申请实施例可以对起始片段和结束片段分别进行标记。
在一个实施例中,在一个RTP数据包中第二数据片段为第二媒体数据流的起始片段的情况下,在该RTP数据包的RTP扩展头部中设置表示第二数据片段为起始片段的起始字段。在第二数据片段为第二媒体数据流的结束片段的情况下,在所述RTP扩展头部中设置表示第二数据片段为结束片段的结束字段。例如,RTP扩展头部中appbits字段用于表示第二数据片段在所述第二媒体数据流中的位置, 这里,第二数据片段的位置的取值范围包括起始片段的位置、结束片段的位置,和处于起始片段和结束片段之间的位置。例如,appbits字段包括4位。将RTP扩展头部中的appbits字段设置为起始字段包括:将appbits字段的第1位设置1,其余位设置为0。即,将appbits字段设置为1000。另外,将所述RTP扩展头部中的appbits字段设置为结束字段包括:将appbits字段的第2位设置1,其余位设置为0,即将appbits字段设置为0100。另外,在一个第二数据片段处于起始片段和结束片段之间,携带该第二数据片段的RTP扩展头部中的appbits字段设置为0000。
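A sketch of this marking rule (Python; the values follow the 4-bit assignments stated above, i.e. 1000 for the start segment, 0100 for the end segment, 0000 in between):

    START_FIELD = 0b1000   # appbits for the first (start) second data segment
    END_FIELD   = 0b0100   # appbits for the last (end) second data segment
    MIDDLE      = 0b0000   # appbits for every segment between start and end

    def appbits_for(index, total):
        # appbits value for the index-th of total second data segments; how a
        # single-segment stream would be marked is not specified above, so that
        # case is not handled in this sketch.
        if index == 0:
            return START_FIELD
        if index == total - 1:
            return END_FIELD
        return MIDDLE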
在一些实施例中,可以对RTP数据包中的数据进行抽取组装,以得到扩展媒体数据流。例如,根据起始字段和结束字段,确定起始片段和结束片段。起始片段和结束片段,以及二者之间的第二数据片段可以组成完整的扩展媒体数据流。
图6示出了第一媒体数据流和第二媒体数据流的示意图,如图6所示,扩展媒体数据流经过分片后形成多个片段,该些片段中存在起始片段ts1和结束片段ts6,并且对起始片段ts1标记有起始字段1000,对结束片段ts6标记有结束字段0100。也就是说,起始片段ts1-结束片段ts6中的所有片段构成完整的一路第二媒体数据流;原始媒体数据流经分片后也形成了多个片段,例如片段TS1至TS9。其中片段TS1为原始媒体数据流中首个第一数据片段,其中,起始片段ts1与片段TS2对齐,因此,起始片段ts1和片段TS2可以由同一个RTP数据包传输。类似地,结束片段ts2与片段TS8可以由同一个RTP数据包传输。
在一些实施例中,当终端设备或者服务器完成对appbits字段中关于扩展媒体数据流的开始字段和结束字段的定义后,在服务器获取扩展媒体数据流并对扩展媒体数据流进行分片的过程中,就可以根据关于appbits字段的自定义规则对组成扩展媒体数据流的第二数据片段进行标记。具体而言,服务器在对扩展媒体数据流进行编码时,首先可以对扩展媒体数据流进行分片,分片时一帧对象(音频、图像,等等)可以编码成一个或者多个第二数据片段,在生成RTP数据包的过程中可以对不同第二数据片段进行标记,由于第二媒体数据流中的第二数据片段是按序排列的,因此在生成RTP数据包时,可以对起始片段和结束片段进行标记即可,而具体的标记方法就是在appbits字段中进行标记。以上述实施例中的标记规则为例,当扩展媒体数据流开始时,在包含起始片段的RTP扩展头部中将appbits字段设置为起始字段,即将appbits字段的第一位标记为1,其它位标记为0。当扩展媒体数据流结束时,在包含结束片段的RTP扩展头部中将appbits字段设置为结束字段,即将appbits字段的第二位标记为1,其它位标记为0。而对于包含第二媒体数据流中处于起始字段和结束字段之间的任意第二数据片段,包含该第二数据片段的RTP扩展头部中可以将appbits字段设置0000。这样,终端设备可以在从一个RTP扩展头部中解析出起始字段时,确定第二媒体数据流的开始。在从一个RTP扩展头部中解析出结束字段时,可以确定第二媒体数据流的结束。
在本申请的一个实施例中,由于RTP数据包的包头和RTP数据包的大小是有限的,示例性地,RTP数据包的包头大小不超过255个字节,RTP数据包的总大小不超过1200个字节,因此为了通过RTP扩展头部携带扩展媒体数据流对应的第二数据片段,需要按照RTP数据包的包头的扩展大小来设定扩展媒体数据流的分片大小,进而根据该分片大小对第二媒体数据流进行分片和传输。
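Under the example limits quoted above (header at most 255 bytes, packet at most 1200 bytes), the slice size available to the second media data stream can be estimated roughly as follows (a sketch; CSRC entries, SRTP overhead, and word padding are ignored):

    def max_second_segment_size(first_segment_size,
                                total_limit=1200,     # example packet budget quoted above
                                header_limit=255,     # example header budget quoted above
                                fixed_header=12,      # RTP fixed header
                                ext_overhead=4 + 2):  # profile/length words + element ID/length
        by_header = header_limit - fixed_header - ext_overhead
        by_packet = total_limit - fixed_header - ext_overhead - first_segment_size
        return max(0, min(by_header, by_packet, 255))  # 255: two-byte element data cap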
在本申请的一个实施例中,在生成RTP数据包时,可以将RTP数据包发送至终端设备,以便终端设备同步呈现原始媒体数据流和扩展媒体数据流。在生成RTP数据包后,可以根据预设传输协议(例如UDP等)对RTP数据包进行封装生成与预设传输协议对应的目标数据包(例如UDP数据包),然后再将该目标数据包发送至终端设备,以使终端设备从中获取所需的数据。该预设传输协议具体可以是UDP协议,由于UDP是一种无连接的传输层协议,虽然提供面向事务的简单不可靠信息的传送服务,但是能够提高数据传输的时效性,减少延时,提高用户体验,因此通常采用UDP协议作为预设传输协议,当然还可以是其它传输协议,本申请实施例对此不做具体限定。
图7示意性示出了WebRTC媒体数据流底层协议栈的结构示意图,如图7所示,WebRTC媒体数据流底层协议栈由上至下依次为媒体数据流层701、SRTP层702、DTLS层703和UDP层704,其中SRTP层702处于传输层,主要是对媒体数据流层701中的媒体数据流进行处理生成RTP数据包;DTLS层703为数据包传输层安全性协议层,用于对RTP数据包传输过程中的安全进行保障,保证RTP数据包是通过加密的信道进行传输的;UDP层704处于传输层,RTP数据包到达UDP层后,还 需要根据UDP协议进行封装,形成包含RTP数据包的UDP数据包,然后再将该UDP数据包发送至终端设备,以便终端设备对UDP数据包进行解析获取RTP数据包,进而对RTP数据包进行解析获取与原始媒体数据流对应的第一数据片段和与扩展媒体数据流对应的第二数据片段,并通过获取多个第一数据片段和多个第二数据片段获取所需的原始媒体数据流和扩展媒体数据流。
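Continuing the sketch, the finished RTP packet is handed to UDP; the DTLS/SRTP protection layers shown in Figure 7 are omitted here, and the address and port are placeholders:

    import socket

    def send_over_udp(rtp_packet: bytes, address=("198.51.100.10", 50000)):
        # In WebRTC the packet would first be SRTP-protected under DTLS-negotiated keys;
        # this sketch sends the raw RTP bytes only to show the UDP encapsulation step.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.sendto(rtp_packet, address)
        finally:
            sock.close()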
图8示意性示出了传输媒体数据流的方法800的流程示意图。方法800在终端设备中执行。
如图8所示,在步骤S801中,获取RTP数据包。RTP数据包RTP负载和RTP扩展头部。RTP负载包括第一媒体数据流的第一数据片段。RTP扩展头部包括第二媒体数据流的第二数据片段。在一个实施例中,通过对目标数据包进行解析,以获取RTP数据包。目标数据包例如为一个UDP数据包。
在步骤S802中,从RTP数据包的RTP负载中解析出第一媒体数据流的第一数据片段。
在步骤S803中,从RTP数据包的RTP扩展头部中解析出第二媒体数据流的第二数据片段。这样,终端设备可以从同一个RTP数据包中获取两路媒体数据流各自的数据片段。通过步骤S803,本申请实施例可以从各个RTP数据包中获取各个第二数据片段,获取的起始片段至结束片段的序列可以组成完整的第二媒体数据流。
进一步,终端设备可以同步呈现从同一个RTP数据包括中得到的第一数据片段和第二数据片段。
在一些实施例中,在终端设备解析RTP数据包时,终端设备进一步解析所述RTP扩展头部的目标字段。目标字段用于表示第二数据片段在所述第二媒体数据流中的位置。
在目标字段为起始字段时,终端设备可以确定第二数据片段为第二媒体数据流的起始片段。
在目标字段为结束字段时,终端设备确定第二数据片段为第二媒体数据流的结束片段。其中,起始片段和结束片段用于分别确定第二媒体数据流的开始和结束。
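Building on the start and end fields above, a receiver-side collector could reassemble the second media data stream roughly as follows (a sketch that assumes in-order delivery; reordering and loss handling are left out):

    class SecondStreamCollector:
        START, END = 0b1000, 0b0100

        def __init__(self):
            self.segments = []
            self.active = False

        def on_segment(self, appbits, data):
            # Feed each second data segment parsed from an RTP packet; returns the
            # complete second media data stream once the end field has been seen.
            if appbits == self.START:
                self.segments = [data]
                self.active = True
                return None
            if not self.active:
                return None
            self.segments.append(data)
            if appbits == self.END:
                self.active = False
                return b"".join(self.segments)
            return None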
在一些实施例中,终端设备可以执行SDP协商过程。例如,终端设备可以发送基于会话描述协议SDP生成的提议。提议包括媒体描述和扩展信息。媒体描述用于表示与传输所述第一媒体数据流有关的描述信息,所述扩展信息用于表示与在RTP扩展头部中传输所述第二媒体数据流有关的描述信息。
另外,终端设备可以接收服务器对提议的应答,以完成与服务器之间的SDP协商。
由于在生成RTP数据包之前已经就原始媒体数据流和扩展媒体数据流所需的功能配置进行了SDP协商,因此终端设备对RTP数据包进行解析获取其中的原始媒体数据流的第一数据片段和扩展媒体数据流中的第二数据片段,并根据各数据片段进行渲染和显示。
在本申请的一个实施例中,在对原始媒体数据流和扩展媒体数据流进行渲染时,需要根据原始媒体数据流和扩展媒体数据流之间的时间对应关系进行渲染,这样才可以保证原始媒体数据流和扩展媒体数据流的同步显示。其中,原始媒体数据流和扩展媒体数据流之间的时间对应关系具体可以为扩展媒体数据流插入到原始媒体数据流中的时间点,例如在原始媒体数据流播放的第5分钟开始时插入了扩展媒体数据流,在第10分钟结束时扩展媒体数据流播放结束,那么在渲染时,首先渲染第5分钟之前的原始媒体数据流并显示,当渲染到第5分钟开始时,开始渲染扩展媒体数据流,并将第5分钟时段内的原始媒体数据流和第1分钟内的扩展媒体数据流同时显示,直至完成对第5分钟-第10分钟之间的原始媒体数据流和所有扩展媒体数据流的同步渲染和同步显示,最后再针对剩余的原始媒体数据流进行渲染和显示即可。
本申请实施例中的传输媒体数据流的方法可以应用于任意涉及到音视频实时通信的场景,例如可以应用到需要实现低延迟的互动直播、电商直播、视频直播、视频会议、视频通信、P2P等等场景中,同时,本申请中的传输媒体数据流的方法还可以对无法进行同步的数据流进行同步,比如对音视频数据流本身的扩展互动补充流metadata数据流进行同步,等等。接下来,以基于直播的一对一课堂的场景为例,对本申请实施例中的传输媒体数据流的方法进行具体说明。
随着直播的广泛普及,逐渐出现了基于直播的在线课堂,例如基于直播的一对一课堂,一对一课堂就是一个老师和一个学生通过直播进行面对面授课,在直播过程中,存在多种类型的数据流,例如老师讲课时需要显示的课件内容、与老师讲课内容对应的字幕、学生对老师所提问题的回答、学生提出的问题等等,各种类型的数据流都是相关的,比如字幕需要与老师说的话同步、课件内容需要与老师授课内容同步、学生对老师所提问题的回答应当紧接着老师的问题、学生提出的问题应当在老师的答疑时间范围内等等,任意一种或多种类型的数据流的延时到达都会影响直播的效果,因此为了保证 教学效果,直播过程中的低时延是重点需要保证的。
直播过程中,通过摄像器等图像采集装置采集的数据流即为本申请中的原始媒体数据流,记录有课件内容的图片、字幕、学生的回答以及问题这些动态数据流等即为本申请实施例中的扩展媒体数据流,通过采用本申请中的传输媒体数据流的方法可以实现需要同步的原始媒体数据流和扩展媒体数据流的同步传输、同步渲染和同步显示。接下来,以课件内容图片作为扩展媒体数据流为例对本申请中的传输媒体数据流的方法进行详细说明。
与该一对一场景对应的系统架构包括老师终端、学生终端和服务器,老师终端和学生终端中设置有内置或外设的图像采集装置,图像采集装置具体可以是摄像器、录像器等装置,当老师开始授课时,与老师终端连接的摄像器开始拍摄视频生成直播数据流,随着课堂内容的进行,需要在界面中显示与实时授课内容相关的课件内容图片,由于课件内容图片与直播数据流是两路数据流,传输时是相互独立的,因此需要保证在老师讲到该课件内容图片的时候,该课件内容图片也同步显示在老师终端和学生终端中。
在直播过程中,服务器可以实时获取摄像器拍摄生成的直播数据流,通过对直播数据流进行分包生成与直播数据流对应的多个第一数据片段,在老师打开老师终端中存储的课件文件并选择投屏后,服务器也可以接收到包含课件内容图片的扩展媒体数据流,服务器可以根据课件图片与直播视频的时间对应关系,在需要展示课件内容图片的时间点插入通过对扩展媒体数据流进行分包生成的第二数据片段,以及针对第二数据片段还可以设置与其对应的起始字段和结束字段,并将与第二数据片段对应的起始字段和结束字段添加至第二数据片段中,接着可以将第二数据片段填充至RTP扩展头部中,然后根据填充后的RTP扩展头部和携带有原始媒体数据流的RTP负载形成RTP数据包,其中RTP负载中原始媒体数据流是以第一数据片段的形式存在的,接着基于UDP传输协议对RTP数据包进行封装以生成UDP数据包,最后将UDP数据包发送至老师终端和学生终端,以便老师终端和学生终端同步显示直播视频流和课件内容图片。
以学生终端为例,当学生终端接收到UDP数据包后,可以解析该UDP数据包以获取其中的RTP数据包,然后对RTP数据包进行解析以获取与老师直播画面对应的第一数据片段和与课件内容对应的第二数据片段,接着可以对第一数据片段进行解码,并从各个第一数据片段中获取与老师直播画面对应的片段,进而将该些片段按照时间戳进行排序和拼接以获取老师直播画面对应的数据流,同时对第二数据片段进行解码以获取其中的片段和目标字段,该目标字段包括起始字段和结束字段,接着根据所获取的起始字段和结束字段可以确定与课件内容图片对应的目标片段,进而将该些目标片段按照时间戳进行排序和拼接以获取课件内容图片对应的数据流,最后根据课件内容图片和老师直播画面的时间对应关系对两路数据流进行渲染和显示,以在显示界面中显示需要同步显示的老师直播画面和课件内容图片。
如上所述,本申请实施例中的传输媒体数据流的方法还可以应用于其它场景中,例如互动直播场景等场景中,主播可以跟观众进行互动,主播可以跟其他主播进行互动等等,在这种场景下,服务器可以获取与主播的直播画面对应的媒体数据流,同时获取观众或者其他主播与该主播进行互动的媒体数据流,例如可以是互动文字信息、互动视频、互动音频等等,然后根据主播的直播画面对应的媒体数据流和互动的媒体数据流进行分包,以形成第一数据片段和第二数据片段,在对互动的媒体数据流进行分包形成第二数据片段时,针对每一个第二数据片段标记其中的起始片段对应的起始字段和结束片段对应的结束字段,并将起始字段和结束字段添加至第二数据片段中,然后将第二数据片段填充至RTP扩展头部中,接着根据填充后的RTP扩展头部和携带有原始媒体数据流的RTP负载生成RTP数据包,最后根据预设传输协议对RTP数据包进行封装,生成目标数据包,并将该目标数据包发送至所有观众的终端设备,以及主播的终端设备,该目标数据包例如可以是UDP数据包等等。接收到目标数据包后,可以解析目标数据包以获取RTP数据包,接着解析RTP数据包以获取其中的第一数据片段和第二数据片段,然后解码第一数据片段获取与主播直播画面对应的片段,并根据该些片段形成直播媒体数据流,同时解码第二数据片段获取其中的片段和目标字段,该字段信息包括起始字段和结束字段,接着所获取的起始字段和结束字段确定与互动媒体数据流对应的目标分片,进而根据目标分片可以形成互动媒体数据流,最后根据时间对应关系渲染直播媒体数据流和互动媒体数据流并显示即可。
本申请中的传输媒体数据流的方法,通过获取与扩展媒体数据流对应的多个第二数据片段,接着将第二数据片段填充至RTP扩展头部中,并根据填充后的所述RTP扩展头部和携带有原始媒体数据流的RTP负载形成RTP数据包,最后将所述RTP数据包发送至终端设备。本申请实施例中的传输媒体数据流的方法,一方面能够实现对RTP媒体数据流的复用,在同一个RTP数据包中携带两个不同的媒体数据流,避免了针对不同的媒体数据流分别创建媒体数据流协议栈,并基于会话描述协议进行多次的协商,减少了媒体数据流的传输步骤,提高了传输效率;另一方面,本申请实施例中的传输媒体数据流的方法只需要在RTP业务发送层进行RTP头扩展定制,在RTP业务接收层进行对应的解析和组装,就可以实现一个动态数据流的同步传输,同步传输方法简单,并且与WebRTC标准兼容,可以动态灵活地增加媒体数据流;再一方面,能够实现两个媒体数据流的传输同步性,避免了不同媒体数据流之间的不同步问题。
应当注意,尽管在附图中以特定顺序描述了本申请中方法的各个步骤,但是,这并非要求或者暗示必须按照该特定顺序来执行这些步骤,或是必须执行全部所示的步骤才能实现期望的结果。附加的或备选的,可以省略某些步骤,将多个步骤合并为一个步骤执行,以及/或者将一个步骤分解为多个步骤执行等。
以下介绍本申请的装置实施例,可以用于执行本申请上述实施例中的传输媒体数据流的方法。图9示意性地示出了本申请实施例提供的传输媒体数据流的装置的结构框图。如图9所示,装置900包括:获取模块910和发送模块920,具体地:
获取模块910,用于从第一媒体数据流获取第一数据片段,作为RTP负载;从第二媒体数据流获取第二数据片段。
发送模块920,用于将所述第二数据片段添加至RTP扩展头部中;生成包含所述RTP扩展头部和所述RTP负载的RTP数据包。
另外,发送模块920,还可以将RTP数据包发送到终端设备。
本申请各实施例中提供的装置的具体细节已经在对应的方法实施例中进行了详细的描述,此处不再赘述。
图10示意性地示出了用于实现本申请实施例的电子设备的计算机系统结构框图,该电子设备可以是如图1中所示的终端设备101和服务器102。
需要说明的是,图10示出的电子设备的计算机系统1000仅是一个示例,不应对本申请实施例的功能和使用范围带来任何限制。
如图10所示,计算机系统1000包括中央处理器1001(Central Processing Unit,CPU),其可以根据存储在只读存储器1002(Read-Only Memory,ROM)中的程序或者从存储部分1008加载到随机访问存储器1003(Random Access Memory,RAM)中的程序而执行各种适当的动作和处理。在随机访问存储器1003中,还存储有系统操作所需的各种程序和数据。中央处理器1001、在只读存储器1002以及随机访问存储器1003通过总线1004彼此相连。输入/输出接口1005(Input/Output接口,即I/O接口)也连接至总线1004。
在一些实施例中,以下部件连接至输入/输出接口1005:包括键盘、鼠标等的输入部分1006;包括诸如阴极射线管(Cathode Ray Tube,CRT)、液晶显示器(Liquid Crystal Display,LCD)等以及扬声器等的输出部分1007;包括硬盘等的存储部分1008;以及包括诸如局域网卡、调制解调器等的网络接口卡的通信部分1009。通信部分1009经由诸如因特网的网络执行通信处理。驱动器1010也根据需要连接至输入/输出接口1005。可拆卸介质1011,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器1010上,以便于从其上读出的计算机程序根据需要被安装入存储部分1008。
特别地,根据本申请的实施例,各个方法流程图中所描述的过程可以被实现为计算机软件程序。例如,本申请的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分1009从网络上被下载和安装,和/或从可拆卸介质1011被安装。在该计算机程序被中央处理器1001执行时,执行本申请的系统中限定的各种功能。
需要说明的是,本申请实施例所示的计算机可读介质可以是计算机可读信号介质或者计算机可读 介质或者是上述两者的任意组合。计算机可读介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、闪存、光纤、便携式紧凑磁盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本申请中,计算机可读介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本申请中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:无线、有线等等,或者上述的任意合适的组合。
附图中的流程图和框图,图示了按照本申请各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,上述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图或流程图中的每个方框、以及框图或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
应当注意,尽管在上文详细描述中提及了用于动作执行的设备的若干模块或者单元,但是这种划分并非强制性的。实际上,根据本申请的实施方式,上文描述的两个或更多模块或者单元的特征和功能可以在一个模块或者单元中具体化。反之,上文描述的一个模块或者单元的特征和功能可以进一步划分为由多个模块或者单元来具体化。
通过以上的实施方式的描述,本领域的技术人员易于理解,这里描述的示例实施方式可以通过软件实现,也可以通过软件结合必要的硬件的方式来实现。因此,根据本申请实施方式的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中或网络上,包括若干指令以使得一台电子设备执行根据本申请实施方式的方法。
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本申请的其它实施方案。本申请旨在涵盖本申请的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本申请的一般性原理并包括本申请未公开的本技术领域中的公知常识或惯用技术手段。
应当理解的是,本申请并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本申请的范围仅由所附的权利要求来限制。

Claims (16)

  1. 一种传输媒体数据流的方法,在服务器中执行,所述方法包括:
    从第一媒体数据流获取第一数据片段,作为RTP负载;
    从第二媒体数据流获取第二数据片段;
    将所述第二数据片段添加至RTP扩展头部中;
    生成包含所述RTP扩展头部和所述RTP负载的RTP数据包。
  2. 根据权利要求1所述的方法,其中,所述方法还包括:将所述RTP数据包发送至终端设备。
  3. 根据权利要求1所述的方法,其中,所述方法还包括:
    在所述第二数据片段为所述第二媒体数据流的起始片段的情况下,在所述RTP扩展头部中设置表示所述第二数据片段为所述起始片段的起始字段;
    在所述第二数据片段为所述第二媒体数据流的结束片段的情况下,在所述RTP扩展头部中设置表示所述第二数据片段为所述结束片段的结束字段。
  4. 根据权利要求3所述的方法,其中,
    在所述RTP扩展头部中设置表示所述第二数据片段为所述起始片段的起始字段,包括:将所述RTP扩展头部中的appbits字段设置为所述起始字段;
    所述在所述RTP扩展头部中设置表示所述第二数据片段为结束片段的结束字段,包括:将所述RTP扩展头部中的appbits字段设置为所述结束字段。
  5. 根据权利要求4所述的方法,其中,所述appbits字段包括4位;
    将所述RTP扩展头部中的appbits字段设置为所述起始字段包括:将所述appbits字段的第1位设置1,其余位设置为0;
    将所述RTP扩展头部中的appbits字段设置为所述结束字段包括:将所述appbits字段的第2位设置1,其余位设置为0。
  6. 根据权利要求1所述的方法,其中,所述方法还包括:
    接收终端设备发送的基于会话描述协议SDP生成的提议,所述提议包括媒体描述和扩展信息,所述媒体描述用于表示与传输所述第一媒体数据流有关的描述信息,所述扩展信息用于表示与在RTP扩展头部中传输所述第二媒体数据流有关的描述信息;
    生成对所述提议的应答;
    将所述应答发送至所述终端设备,以完成与所述终端设备之间的SDP协商。
  7. 根据权利要求6所述的方法,其中,所述扩展信息包括:用于传输所述第二媒体数据流的扩展头标识和URI。
  8. 根据权利要求1所述的方法,其中,
    从第一媒体数据流获取第一数据片段,作为RTP负载,包括:
    将所述第一媒体数据流进行分片,以得到多个分片,每个分片被作为一个第一数据片段;
    所述从第二媒体数据流获取第二数据片段,包括:
    针对一个所述第一数据片段,从所述第二媒体数据流获取采集时段与所述第一数据片段的采集时段相同的数据片段,作为一个第二数据片段。
  9. 一种传输媒体数据流的方法,在终端设备中执行,所述方法包括:
    获取RTP数据包,所述RTP数据包RTP负载和RTP扩展头部,所述RTP负载包括第一媒体数据流的第一数据片段,所述RTP扩展头部包括第二媒体数据流的第二数据片段;
    从RTP数据包的RTP负载中解析出所述第一媒体数据流的第一数据片段;
    从RTP数据包的RTP扩展头部中解析出所述第二媒体数据流的第二数据片段。
  10. 根据权利要求9所述的方法,还包括:
    解析所述RTP扩展头部的目标字段,所述目标字段用于表示所述第二数据片段在所述第二媒体数据流中的位置;
    在所述目标字段为起始字段时,确定所述第二数据片段为所述第二媒体数据流的起始片段;
    在所述目标字段为结束字段时,确定所述第二数据片段为所述第二媒体数据流的结束片段;
    其中,所述起始片段和所述结束片段用于分别确定所述第二媒体数据流的开始和结束。
  11. 根据权利要求9所述的方法,还包括:
    发送基于会话描述协议SDP生成的提议,所述提议包括媒体描述和扩展信息,所述媒体描述用于表示与传输所述第一媒体数据流有关的描述信息,所述扩展信息用于表示与在RTP扩展头部中传输所述第二媒体数据流有关的描述信息;
    接收服务器对所述提议的应答,以完成与服务器之间的SDP协商。
  12. 一种传输媒体数据流的装置,包括:
    获取模块,用于:
    从第一媒体数据流获取第一数据片段,作为RTP负载;
    从第二媒体数据流获取第二数据片段;
    发送模块,用于:
    将所述第二数据片段添加至RTP扩展头部中;
    生成包含所述RTP扩展头部和所述RTP负载的RTP数据包。
  13. 一种传输媒体数据流的装置,所述装置包括:
    接收模块,获取RTP数据包,所述RTP数据包RTP负载和RTP扩展头部,所述RTP负载包括第一媒体数据流的第一数据片段,所述RTP扩展头部包括第二媒体数据流的第二数据片段;
    解析模块,从RTP数据包的RTP负载中解析出所述第一媒体数据流的第一数据片段;从RTP数据包的RTP扩展头部中解析出所述第二媒体数据流的第二数据片段。
  14. 一种计算机可读存储介质,存储有计算机程序,该计算机程序被处理器执行时实现权利要求1至11中任意一项所述的方法。
  15. 一种电子设备,包括:
    处理器;以及
    存储器,用于存储指令;
    其中,所述处理器执行所述存储器存储的指令用于实现权利要求1至11中任意一项所述的方法。
  16. 一种计算机程序产品,所述计算机程序产品包括计算机指令,当所述计算机指令在计算机上运行时,使得所述计算机执行权利要求1至11中任意一项所述的方法。
PCT/CN2023/127001 2022-11-15 2023-10-27 传输媒体数据流的方法、装置、存储介质及电子设备 WO2024104080A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211428364.0A CN118055103A (zh) 2022-11-15 2022-11-15 媒体数据流同步方法、装置、计算机可读介质及电子设备
CN202211428364.0 2022-11-15

Publications (1)

Publication Number Publication Date
WO2024104080A1 true WO2024104080A1 (zh) 2024-05-23

Family

ID=91050776

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/127001 WO2024104080A1 (zh) 2022-11-15 2023-10-27 传输媒体数据流的方法、装置、存储介质及电子设备

Country Status (2)

Country Link
CN (1) CN118055103A (zh)
WO (1) WO2024104080A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102868937A (zh) * 2011-07-08 2013-01-09 中兴通讯股份有限公司 多媒体数据的传输方法及系统
CN104737515A (zh) * 2012-07-27 2015-06-24 高通股份有限公司 在rtp会话中递送时间同步的任意数据
US20180109388A1 (en) * 2016-10-19 2018-04-19 Qualcomm Incorporated Methods for Header Extension Preservation, Security, Authentication, and Protocol Translation for RTP over MPRTP
KR20180050983A (ko) * 2016-11-07 2018-05-16 한국전자통신연구원 Rtp 패킷 전송 방법 및 장치
CN112714335A (zh) * 2019-10-24 2021-04-27 中兴通讯股份有限公司 直播媒体流录制方法、系统及计算机可读存储介质

Also Published As

Publication number Publication date
CN118055103A (zh) 2024-05-17

Similar Documents

Publication Publication Date Title
EP3518486B1 (en) Data transmission method and apparatus, and electronic device
US6580756B1 (en) Data transmission method, data transmission system, data receiving method, and data receiving apparatus
CN110870282B (zh) 使用网络内容的文件轨处理媒体数据
EP3902272A1 (en) Audio and video pushing method and audio and video stream pushing client based on webrtc protocol
US20150181003A1 (en) Method and apparatus for transmitting and receiving packets in hybrid transmission service of mmt
US20130212231A1 (en) Method, apparatus and system for dynamic media content insertion based on http streaming
KR101959260B1 (ko) Mmt 시스템을 위한 미디어 데이터 전송 장치 및 방법, 그리고 미디어 데이터 수신 장치 및 방법
CN113661692B (zh) 接收媒体数据的方法、装置和非易失性计算机可读存储介质
WO2020248649A1 (zh) 音视频数据同步播放方法、装置、系统、电子设备及介质
CN110996160B (zh) 视频处理方法、装置、电子设备及计算机可读取存储介质
CN111669645B (zh) 视频的播放方法、装置、电子设备及存储介质
CN108882010A (zh) 一种多屏播放的方法及系统
CN205230019U (zh) 一种实现多屏间视频无缝切换的系统
CN108810575B (zh) 一种发送目标视频的方法和装置
CN106303754A (zh) 一种音频数据播放方法及装置
WO2024104080A1 (zh) 传输媒体数据流的方法、装置、存储介质及电子设备
WO2023231478A1 (zh) 音视频共享方法、设备及计算机可读存储介质
US9338485B2 (en) Method and apparatus for distributing a multimedia content
WO2014036873A1 (zh) 一种传输流的共享方法
CN110996181A (zh) 一种多源内容数据统一封装方法
KR101405865B1 (ko) 셋탑박스 화면 가상화 방법 및 시스템
CN112188256B (zh) 信息处理方法、信息提供方法、装置、电子设备及存储介质
KR100640918B1 (ko) 인터넷 스트리밍 서비스를 위한 스트림 파일 제작 방법
CN114448955B (zh) 一种数字音频网络传输方法、装置、设备及存储介质
JP2004228850A (ja) 受信再生方法、受信再生装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23890516

Country of ref document: EP

Kind code of ref document: A1