WO2009137972A1 - Method and system for co-stream transmission of video and audio, and corresponding receiving method and device - Google Patents

Method and system for co-stream transmission of video and audio, and corresponding receiving method and device Download PDF

Info

Publication number
WO2009137972A1
WO2009137972A1 PCT/CN2008/072681 CN2008072681W WO2009137972A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
audio
unit
media frame
time
Prior art date
Application number
PCT/CN2008/072681
Other languages
English (en)
French (fr)
Inventor
Liu Zhiqiang
Zhang Jianqiang
Peng Ming
Original Assignee
ZTE Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation
Publication of WO2009137972A1

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4341Demultiplexing of audio and video streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/2368Multiplexing of audio and video streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/432Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N21/4325Content retrieval operation from a local storage medium, e.g. hard-disk by playing back content from the storage medium
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4331Caching operations, e.g. of an advertisement for later insertion during playback
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream

Definitions

  • The present invention relates to the field of mobile multimedia broadcasting or mobile TV, and in particular to a method and system for co-stream transmission of the video and audio of a mobile multimedia broadcast.
  • Mobile multimedia broadcasting is a multimedia playback technology that has emerged in recent years. With a handheld terminal, TV can be watched even while moving at high speed.
  • The terminal receives the program guide over a wireless protocol, selects a channel the user is entitled to watch, and receives the multimedia data of the selected channel, so that TV can be watched on the mobile terminal.
  • The over-the-air data transmitted by the system is divided into channels.
  • The data of each channel comprises three types, video, audio, and data, which must be transmitted separately by multiplexing; the terminal receives the relevant data and achieves normal TV playback through its player.
  • Two media streaming standards are currently common. The first is RTP (Real-time Transport Protocol).
  • This method can transmit the video stream and the audio stream separately by opening multiple RTP channels; however, synchronized transmission between the video stream and the audio stream is difficult to control.
  • The second is TS (Transport Stream).
  • The TS protocol is part of the MPEG (Moving Picture Experts Group) standards. It transmits video and audio in fixed 188-byte packets and distinguishes video from audio by the PID (Program Identifier) field, so that video and audio can be carried in one TS stream. However, each TS packet is small and the terminal must parse hierarchically: to obtain one complete video or audio frame, a large number of TS packets must be buffered, and the parsing logic is relatively complex.
  • The technical problem to be solved by the present invention is to provide a method and system for co-stream transmission of video and audio that simplify audio/video synchronization control, reduce parsing complexity, and improve the user experience.
  • To solve this problem, the present invention provides a method for co-stream transmission of video and audio, comprising the following steps:
  • The transmitting device buffers the input video and audio data by time slice period and sorts them in play time order; it encapsulates the sorted video and audio data of one time slice period, in order, as video units in the video segment and audio units in the audio segment of one media frame, writes the play time information of the video and audio units into the media frame, and, once encapsulation is complete, transmits the media frame over the broadcast channel to the receiving end device.
  • the method further includes:
  • After obtaining a media frame from the broadcast channel, the receiving device parses out the video units, audio units, and play time information, calculates the play time of each video and audio unit, decodes the video and audio units in order, and plays them at the corresponding play times.
  • The play time information includes an initial play time and relative play times. The initial play time is the earliest play time among all video and audio units in the time slice period; a unit's relative play time is its play time minus the initial play time.
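The relation between initial and relative play times amounts to a min-then-subtract on the sender and an add on the receiver. A minimal sketch (the function names and millisecond unit are illustrative, not specified by the patent):

```python
def split_play_times(unit_play_times):
    """Sender side: derive the frame's initial play time (the earliest unit
    play time) and each unit's relative play time (unit time minus initial)."""
    initial = min(unit_play_times)
    relative = [t - initial for t in unit_play_times]
    return initial, relative

def restore_play_times(initial, relative):
    """Receiver side: recover absolute play times by adding the initial
    play time back to each relative play time."""
    return [initial + r for r in relative]

# Play times of units captured in one time slice (milliseconds, illustrative).
times = [40, 0, 80, 23]
initial, rel = split_play_times(times)
assert initial == 0 and rel == [40, 0, 80, 23]
assert restore_play_times(initial, rel) == times
```

Because every relative time is measured from the earliest unit in the slice, the receiver needs only one absolute timestamp per media frame to schedule all of its units.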
  • the media frame includes a media frame header, a video segment, and an audio segment
  • the video segment further includes a video segment header and one or more video units
  • the audio segment further includes an audio segment header and one or more audio units
  • the generated video unit and audio unit are both variable length.
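The frame layout described in the bullets above (header, then a video segment and an audio segment of variable-length units) can be sketched as follows; the field names are hypothetical, since the patent fixes the structure but not a concrete encoding:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Unit:
    relative_play_time: int  # recorded in the enclosing segment's header
    payload: bytes           # variable-length encoded video or audio data

@dataclass
class Segment:
    units: List[Unit] = field(default_factory=list)

    def header_times(self) -> List[int]:
        # The segment header carries the relative play time of every unit.
        return [u.relative_play_time for u in self.units]

@dataclass
class MediaFrame:
    initial_play_time: int   # media frame header: earliest unit play time
    video: Segment = field(default_factory=Segment)
    audio: Segment = field(default_factory=Segment)
```

Because units are variable length, each one can hold a complete encoded frame, which is what lets the receiver skip the TS-style reassembly buffering.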
  • the input video data and audio data are a video code stream and an audio code stream.
  • the system for video and audio co-stream transmission includes a transmitting end device and a receiving end device, and the transmitting end device includes a media frame sending module, where the receiving end device includes a media frame receiving module and a media frame parsing module, where:
  • The sending end device further includes an audio/video sorting module and a media frame encapsulation module, where: the audio/video sorting module is configured to buffer the input video and audio data by time slice period, sort them in play time order, and send them to the media frame encapsulation module;
  • the media frame encapsulation module is configured to encapsulate the sorted video and audio data of one time slice period, in order, as video units in the video segment and audio units in the audio segment of one media frame, write the play time information of each video and audio unit into the media frame, and send the encapsulated media frame to the media frame sending module;
  • the media frame sending module is configured to send the encapsulated media frame to the receiving end device;
  • the receiving end device further includes an audio/video decoding module and an audio/video playback module, where: the media frame receiving module is configured to receive the broadcast media frame stream sent by the sending end device and forward it to the media frame parsing module;
  • the media frame parsing module is configured to parse the video and audio units from the media frame and send them to the audio/video decoding module, and to parse the initial play time and relative play time information from the media frame and send it to the audio/video playback module;
  • the video and audio decoding module is configured to decode video and audio encoded data of the video unit and the audio unit into video and audio data that can be played, and send the video and audio data to the video and audio playback module;
  • the audio/video playback module is configured to calculate the play time of the corresponding video and audio data from the received initial play time and the relative play time of each video and audio unit, and to play the data in time order.
  • the media frame encapsulation module encapsulates the media frame header, the video segment, and the audio segment into a media frame
  • the initial playback time is filled in the media frame header
  • the video segment header and one or more video units are encapsulated as a video segment
  • the audio segment header and one or more audio units are encapsulated into audio segments
  • the relative play time of each video unit is filled into the header of the corresponding video segment, and the relative play time of each audio unit is filled into the header of the corresponding audio segment.
  • the input video data and audio data buffered by the video and audio sorting module are a video code stream and an audio code stream.
  • With this arrangement, the video and audio data of the same time period are encapsulated in one media frame in chronological order and sent from the front end, which simplifies audio/video synchronization control at the receiving end, reduces buffering time at the receiving end, lowers parsing complexity, and improves the user experience.
  • Another technical problem to be solved by the present invention is to provide a method for realizing video/audio co-streaming at a receiving end device, and a corresponding receiving end device, which achieve synchronized playback of co-streamed video and audio, simplify audio/video synchronization control at the receiving end, and reduce parsing complexity.
  • To this end, the present invention provides a method for a receiving end device to realize co-stream transmission of video and audio, comprising the following steps:
  • After obtaining a media frame from the broadcast channel, the receiving device parses out the video units, audio units, and play time information, calculates the play time of each video and audio unit, decodes the video and audio units in order, and plays them at the corresponding play times.
  • the media frame received by the receiving device includes a media frame header, a video segment, and an audio segment
  • The receiving device parses the initial play time from the media frame header, parses the video units from the video segment and the audio units from the audio segment, and parses the relative play time of each video unit from the video segment header and of each audio unit from the audio segment header.
  • The play time of the video data is the initial play time plus the relative play time of the video unit, and the play time of the audio data is the initial play time plus the relative play time of the audio unit.
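Putting the receiver-side steps above together, a simplified sketch (the dictionary keys and the heap-based merge are assumptions for illustration; real units would pass through the terminal's hardware decoder):

```python
import heapq

def schedule_playback(frame):
    """Compute each unit's absolute play time as the frame header's initial
    play time plus the unit's relative play time, then yield video and audio
    units merged in play-time order."""
    entries = []
    for kind, segment in (("video", frame["video"]), ("audio", frame["audio"])):
        for unit in segment:
            play_time = frame["initial_play_time"] + unit["relative_play_time"]
            entries.append((play_time, kind, unit["payload"]))
    # A heap pops entries in chronological order for playback.
    heapq.heapify(entries)
    while entries:
        yield heapq.heappop(entries)

frame = {
    "initial_play_time": 1000,
    "video": [{"relative_play_time": 0, "payload": b"v0"},
              {"relative_play_time": 40, "payload": b"v1"}],
    "audio": [{"relative_play_time": 20, "payload": b"a0"}],
}
order = [(t, k) for t, k, _ in schedule_playback(frame)]
assert order == [(1000, "video"), (1020, "audio"), (1040, "video")]
```

Since all units of one time slice arrive in a single frame, synchronization reduces to this one local sort, with no cross-channel clock recovery as in RTP.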
  • the receiving end device comprises a media frame receiving module and an audio and video playing module, wherein:
  • the receiving end device further includes a media frame parsing module and a video and audio decoding module, where the media frame receiving module is configured to receive a broadcast media frame stream sent by the sending end device, and forward the packet to the media frame parsing module;
  • the media frame parsing module is configured to parse the video unit and the audio unit from the media frame, send the video unit to the video and audio decoding module, and parse the initial play time and the relative play time information from the media frame, and send the information to the video and audio play module. ;
  • the video and audio decoding module is configured to decode video and audio encoded data of the video unit and the audio unit into video and audio data that can be played, and send the video and audio data to the video and audio playback module;
  • the audio/video playback module is configured to calculate the play time of the corresponding video and audio data from the received initial play time and the relative play time of each audio and video unit, and to play the data in time order.
  • the media frame received by the receiving device includes a media frame header, a video segment, and an audio segment
  • the receiving device parses the initial play time from the media frame header, parses the video units from the video segment and the audio units from the audio segment, and parses the relative play time of each video unit from the video segment header and of each audio unit from the audio segment header.
  • the play time of the video data is the initial play time plus the relative play time of the video unit, and the play time of the audio data is the initial play time plus the relative play time of the audio unit.
  • The method for realizing video/audio co-streaming at the receiving end device and the corresponding receiving end device proposed by the present invention can correctly parse and play video and audio data of the same time period that are sent in chronological order within one media frame, simplifying terminal audio/video synchronization control, reducing terminal buffering time, lowering parsing complexity, and improving the user experience.
  • FIG. 1 is a system structural diagram of video and audio co-stream transmission of a mobile multimedia broadcast in the present invention
  • FIG. 2 is a schematic structural diagram of a media frame in the present invention
  • FIG. 3 is a broadcasting flowchart of a multimedia broadcast front end according to an embodiment of the present invention.
  • FIG. 4 is a playback flowchart of a multimedia broadcast terminal according to an embodiment of the present invention.
  • The present invention provides a system for co-stream transmission of video and audio which, as shown in FIG. 1, includes the sending end device and the receiving end device of a mobile multimedia broadcasting system.
  • the transmitting device includes an audio and video sequencing module, a media frame encapsulating module, and a media frame sending module
  • the receiving device includes a media frame receiving module, a media frame parsing module, an audio and video decoding module, and an audio and video playing module;
  • The audio/video sorting module is configured to receive the data input stream, buffer the video and audio (also abbreviated below as audio/video) data of one time slice period, sort the buffered audio/video data in chronological order, and send the sorted data to the media frame encapsulation module;
  • the input stream is a media stream that includes a video and audio stream.
  • the media frame encapsulation module is configured to package the time-sorted video and audio data, as video units and audio units respectively, in order into the video segment and audio segment of one media frame, i.e. as video units in the video segment and audio units in the audio segment of the same media frame, and to send the encapsulated media frame to the media frame sending module;
  • the media frame includes a media frame header, a video segment, and an audio segment, as shown in FIG. 2;
  • the media frame header includes information such as a media frame start play time, a video stream parameter, and an audio stream parameter;
  • the video segment is composed of a video segment header and one or more video units; the video segment header contains the video relative play times, and the video units are variable length;
  • the audio segment is composed of an audio segment header and one or more audio units; the audio segment header contains the audio relative play times, and the audio units are variable length.
  • the media frame sending module is configured to send the encapsulated media frame to the receiving end device as a broadcast media frame stream;
  • the media frame receiving module is configured to receive a broadcast media stream sent by the sending end device, and forward the same to the media frame parsing module;
  • the media frame parsing module is configured to parse the video and audio unit from the media frame, and send the video and audio unit to the video and audio decoding module, and simultaneously parse the initial playing time and the relative playing time information from the media frame, and send the information to the video and audio playing module;
  • the video and audio decoding module is configured to receive the parsed video and audio unit, decode the specified video and audio encoded data into video and audio data that can be played by the terminal hardware, and send the decoded video and audio data to the video and audio playing module;
  • the audio/video playback module is configured to receive the audio/video data decoded by the audio/video decoding module, calculate the play time of each piece of audio/video data from the received initial play time and the relative play time of each audio/video unit, and present (play) the data on the terminal in chronological order.
  • the invention also provides a method for video and audio co-stream transmission:
  • The mobile multimedia broadcast can transmit one broadcast channel frame structure of data in a fixed time slice; the fixed time slice can be, but is not limited to, 1 second, and other time values are possible. Here, 1 second is assumed as the time slice period.
  • The sending device, e.g. a mobile multimedia broadcast front end, receives a data input stream.
  • The input stream is a media stream containing video and audio code streams, i.e. an audio/video data stream.
  • If the time slice period has arrived, step 304 is performed; if it has not, execution continues at step 301.
  • The time-sorted video data and audio data are sequentially encapsulated as video units of the video segment and audio units of the audio segment in the same media frame, and the relative play times of the video and audio units are calculated;
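The sort-and-encapsulate step for one time slice might look like this (a sketch under the 1-second time slice assumption; the tuple layout and dictionary keys are illustrative, not from the patent):

```python
def encapsulate_slice(buffered_units):
    """Sort one time slice's worth of (kind, play_time, payload) tuples by
    play time and pack them into a single media frame carrying the initial
    play time plus per-unit relative play times."""
    buffered_units.sort(key=lambda u: u[1])          # sort by play time
    initial = buffered_units[0][1]                   # earliest unit time
    frame = {"initial_play_time": initial, "video": [], "audio": []}
    for kind, play_time, payload in buffered_units:
        frame[kind].append({"relative_play_time": play_time - initial,
                            "payload": payload})
    return frame

# One second of buffered input (times in milliseconds, illustrative).
units = [("audio", 1020, b"a0"), ("video", 1000, b"v0"), ("video", 1040, b"v1")]
frame = encapsulate_slice(units)
assert frame["initial_play_time"] == 1000
assert [v["relative_play_time"] for v in frame["video"]] == [0, 40]
assert [a["relative_play_time"] for a in frame["audio"]] == [20]
```

The resulting frame is what step 306 would hand to the broadcast channel; the receiver's parse step simply inverts this packing.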
  • a receiving end device such as a mobile multimedia broadcast terminal, acquires a media frame from a broadcast channel every second;
  • The audio and video units of the media frame are placed into the decoder in order for decoding.
  • Since the media frame contains the video and audio data of the same time period, synchronization control is simpler and easier than with the RTP method; and since the video and audio units are variable length, each unit can carry complete video or audio data, so the multimedia broadcast receiver no longer needs large buffers and complex parsing to recover complete audio/video data. This parsing method is more convenient than TS parsing.

Description

Method and system for co-stream transmission of video and audio, and corresponding receiving method and device
TECHNICAL FIELD
The present invention relates to the field of mobile multimedia broadcasting or mobile TV, and in particular to a method and system for co-stream transmission of the video and audio of a mobile multimedia broadcast.
BACKGROUND
Mobile multimedia broadcasting is a multimedia playback technology that has emerged in recent years. With a handheld terminal, TV can be watched even while moving at high speed. The terminal receives the program guide over a wireless protocol, selects a channel the user is entitled to watch, and receives the multimedia data of the selected channel, thereby watching TV on the mobile terminal.
The over-the-air data transmitted by the system is divided into channels. The data of each channel comprises three types, video, audio, and data, which must be transmitted separately by multiplexing; the terminal receives the relevant data and achieves normal TV playback through its player.
Two media streaming standards are currently common:
The first is RTP (Real-time Transport Protocol). This method can transmit the video stream and the audio stream separately by opening multiple RTP channels; however, synchronized transmission between the video stream and the audio stream is difficult to control.
The second is TS (Transport Stream). The TS protocol is part of the MPEG (Moving Picture Experts Group) standards; it transmits video and audio in fixed 188-byte packets and distinguishes video from audio by the PID (Program Identifier) field, so that video and audio can be carried in one TS stream. However, each TS packet is small and the terminal must parse hierarchically: to obtain one complete video or audio frame, a large number of TS packets must be buffered, and the parsing logic is relatively complex.
SUMMARY OF THE INVENTION
The technical problem to be solved by the present invention is to provide a method and system for co-stream transmission of video and audio that simplify terminal audio/video synchronization control, reduce parsing complexity, and improve the user experience. To solve this problem, the present invention provides a method for co-stream transmission of video and audio, comprising the following steps:
The sending end device buffers the input video and audio data by time slice period and sorts them in play time order; it encapsulates the sorted video and audio data of one time slice period as video units in the video segment and audio units in the audio segment of one media frame, writes the play time information of the video and audio units into the media frame, and, once encapsulation is complete, transmits the media frame over the broadcast channel to the receiving end device.
Further, the method also comprises: after obtaining a media frame from the broadcast channel, the receiving end device parses out the video units, audio units, and play time information, calculates the play time of each video and audio unit, decodes the video and audio units in order, and plays them at the corresponding play times.
Further, the play time information includes an initial play time and relative play times; the initial play time is the earliest play time among all video and audio units in the time slice period, and a unit's relative play time is its play time minus the initial play time.
Further, the media frame includes a media frame header, a video segment, and an audio segment; the video segment in turn includes a video segment header and one or more video units, and the audio segment includes an audio segment header and one or more audio units. When the play time information is written into the media frame, the initial play time is filled into the media frame header, and the relative play time of each video and audio unit is filled into the corresponding video or audio segment header.
Further, the generated video and audio units are both variable length.
Further, the input video and audio data are a video code stream and an audio code stream.
The system for co-stream transmission of video and audio provided by the present invention includes a sending end device and a receiving end device. The sending end device includes a media frame sending module; the receiving end device includes a media frame receiving module and a media frame parsing module, wherein:
The sending end device further includes an audio/video sorting module and a media frame encapsulation module, wherein: the audio/video sorting module buffers the input video and audio data by time slice period, sorts them in play time order, and sends them to the media frame encapsulation module; the media frame encapsulation module encapsulates the sorted video and audio data of one time slice period, in order, as video units in the video segment and audio units in the audio segment of one media frame, writes the play time information of each video and audio unit into the media frame, and sends the encapsulated media frame to the media frame sending module.
The media frame sending module sends the encapsulated media frame to the receiving end device. The receiving end device further includes an audio/video decoding module and an audio/video playback module, wherein: the media frame receiving module receives the broadcast media frame stream sent by the sending end device and forwards it to the media frame parsing module.
The media frame parsing module parses the video and audio units from the media frame and sends them to the audio/video decoding module; it also parses the initial play time and relative play time information from the media frame and sends that to the audio/video playback module.
The audio/video decoding module decodes the encoded data of the video and audio units into playable audio/video data and sends it to the audio/video playback module.
The audio/video playback module calculates the play time of the corresponding video and audio data from the received initial play time and the relative play time of each video and audio unit, and plays the data in time order.
Further, when the media frame encapsulation module packs the media frame header, video segment, and audio segment into a media frame, it fills the initial play time into the media frame header, packs the video segment header and one or more video units into the video segment, packs the audio segment header and one or more audio units into the audio segment, fills the relative play time of each video unit into the header of the corresponding video segment, and fills the relative play time of each audio unit into the header of the corresponding audio segment.
Further, the input video and audio data buffered by the audio/video sorting module are a video code stream and an audio code stream.
With the method and system for co-stream transmission of the video and audio of a mobile multimedia broadcast proposed by the present invention, the video and audio data of the same time period are encapsulated in one media frame in chronological order and sent at the front end, which simplifies audio/video synchronization control at the receiving end, reduces buffering time at the receiving end, lowers parsing complexity, and improves the user experience.
Another technical problem to be solved by the present invention is to provide a method for realizing video/audio co-streaming at a receiving end device, and a corresponding receiving end device, which achieve synchronized playback of co-streamed video and audio, simplify audio/video synchronization control at the receiving end, and reduce parsing complexity.
To solve this problem, the present invention provides a method for a receiving end device to realize co-stream transmission of video and audio, comprising the following steps:
After obtaining a media frame from the broadcast channel, the receiving end device parses out the video units, audio units, and play time information, calculates the play time of each video and audio unit, decodes the video and audio units in order, and plays them at the corresponding play times.
Further, the media frame received by the receiving end device includes a media frame header, a video segment, and an audio segment. The receiving end device parses the initial play time from the media frame header, parses the video units from the video segment and the audio units from the audio segment, and parses the relative play time of each video unit from the video segment header and of each audio unit from the audio segment header; the play time of the video data is the initial play time plus the relative play time of the video unit, and the play time of the audio data is the initial play time plus the relative play time of the audio unit.
The receiving end device provided by the present invention includes a media frame receiving module and an audio/video playback module, wherein:
The receiving end device further includes a media frame parsing module and an audio/video decoding module; the media frame receiving module receives the broadcast media frame stream sent by the sending end device and forwards it to the media frame parsing module.
The media frame parsing module parses the video and audio units from the media frame and sends them to the audio/video decoding module; it also parses the initial play time and relative play time information from the media frame and sends that to the audio/video playback module.
The audio/video decoding module decodes the encoded data of the video and audio units into playable audio/video data and sends it to the audio/video playback module.
The audio/video playback module calculates the play time of the corresponding video and audio data from the received initial play time and the relative play time of each audio and video unit, and plays the data in time order.
Further, the media frame received by the receiving end device includes a media frame header, a video segment, and an audio segment; the receiving end device parses the initial play time from the media frame header, parses the video units from the video segment and the audio units from the audio segment, parses the relative play time of the video units from the video segment header and of each audio unit from the audio segment header, obtains the video data play time as the initial play time plus the video unit's relative play time, and obtains the audio data play time as the initial play time plus the audio unit's relative play time.
With the method for realizing video/audio co-streaming at the receiving end device and the corresponding receiving end device proposed by the present invention, video and audio data of the same time period that are encapsulated in one media frame and sent in chronological order can be correctly parsed and played, which simplifies terminal audio/video synchronization control, reduces terminal buffering time, lowers parsing complexity, and improves the user experience.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a structural diagram of the system for co-stream transmission of the video and audio of a mobile multimedia broadcast according to the present invention; FIG. 2 is a schematic structural diagram of a media frame according to the present invention;
FIG. 3 is a broadcasting flowchart of a multimedia broadcast front end according to an embodiment of the present invention;
FIG. 4 is a playback flowchart of a multimedia broadcast terminal according to an embodiment of the present invention.
PREFERRED EMBODIMENTS OF THE INVENTION
The technical solution of the present invention is described in more detail below with reference to the drawings and embodiments. The present invention provides a system for co-stream transmission of video and audio which, as shown in FIG. 1, includes the sending end device and the receiving end device of a mobile multimedia broadcast system. The sending end device includes an audio/video sorting module, a media frame encapsulation module, and a media frame sending module; the receiving end device includes a media frame receiving module, a media frame parsing module, an audio/video decoding module, and an audio/video playback module, wherein:
The audio/video sorting module receives the data input stream, buffers the video and audio (also abbreviated below as audio/video) data of one time slice period, sorts the buffered audio/video data in chronological order, and sends the sorted data to the media frame encapsulation module.
The input stream is a media stream containing video and audio code streams.
The media frame encapsulation module packages the time-sorted audio/video data, as video units and audio units respectively, in order into the video segment and audio segment of one media frame, i.e. as video units in the video segment and audio units in the audio segment of the same media frame, and sends the encapsulated media frame to the media frame sending module.
The media frame includes a media frame header, a video segment, and an audio segment, as shown in FIG. 2. The media frame header contains the media frame initial play time, video stream parameters, audio stream parameters, and similar information. The video segment consists of a video segment header and one or more video units; the segment header contains the video relative play times, and the video units are variable length. The audio segment consists of an audio segment header and one or more audio units; the segment header contains the audio relative play times, and the audio units are variable length.
The media frame sending module sends the encapsulated media frames to the receiving end device as a broadcast media frame stream.
The media frame receiving module receives the broadcast media stream sent by the sending end device and forwards it to the media frame parsing module.
The media frame parsing module parses the audio/video units from the media frame and sends them to the audio/video decoding module; it also parses the initial play time and relative play time information from the media frame and sends that to the audio/video playback module.
The audio/video decoding module receives the parsed audio/video units, decodes the specified encoded audio/video data into audio/video data that the terminal hardware can play, and sends the decoded data to the audio/video playback module.
The audio/video playback module receives the audio/video data decoded by the audio/video decoding module, calculates the play time of each piece of audio/video data from the received initial play time and the relative play time of each audio/video unit, and presents (plays) the data on the terminal in time order.
The present invention also provides a method for co-stream transmission of video and audio:
A mobile multimedia broadcast can send one broadcast channel frame structure of data in a fixed time slice. The fixed time slice can be, but is not limited to, 1 second; other time values are possible. Here, 1 second is assumed as the time slice period.
On the mobile multimedia broadcast sending end device, as shown in FIG. 3, the steps are as follows:
301: The sending end device, e.g. a mobile multimedia broadcast front end, receives the data input stream.
The input stream is a media stream containing video and audio code streams, i.e. an audio/video data stream.
302: The input audio/video data is placed into the audio/video buffer.
303: It is judged whether the time slice period has arrived.
If the time slice period has arrived, step 304 is performed; if not, execution continues at step 301.
304: The buffered audio/video data is sorted in play time order.
305: The time-sorted video data and audio data are encapsulated, in order, as video units of the video segment and audio units of the audio segment in one media frame, and the relative play times of the video and audio units are calculated.
The earliest play time among all video and audio units in the time slice period is filled into the initial play time field of the media frame header. From the unit play times, the initial play time, and the formula: relative play time of a video or audio unit = play time of the unit - initial play time of the media frame, the relative play times of the video and audio units are calculated and written into the video segment header and the audio segment header respectively.
306: The media frame is transmitted over the broadcast channel to the receiving end device.
On the mobile multimedia broadcast receiving end device, as shown in FIG. 4, the steps are as follows:
401: The receiving end device, e.g. a mobile multimedia broadcast terminal, obtains one media frame from the broadcast channel every second.
402: The obtained media frame is parsed to obtain the audio/video units together with their initial play time and relative play time information.
403: The audio/video units of the media frame are placed into the decoder in order for decoding.
404: The play times of the audio/video data are calculated, and the data is played at those times.
The play time of the audio/video data is obtained by the formula: play time of video or audio data = initial play time of the media frame + relative play time of the video or audio unit.
In summary, because a media frame contains the video and audio data of the same time period, synchronization control is simpler and easier than with the RTP method; and because the audio/video units are variable length, each unit can carry complete video or audio data, so the multimedia broadcast receiving end no longer needs large buffers and complex parsing to recover complete audio/video data. This parsing method is simpler than TS parsing.

Claims

CLAIMS
1. A method for co-stream transmission of video and audio, comprising the following steps:
a sending end device buffers the input video and audio data by time slice period and sorts them in play time order; encapsulates the sorted video and audio data of one time slice period as video units in the video segment and audio units in the audio segment of one media frame; writes the play time information of the video and audio units into the media frame; and, once encapsulation is complete, transmits the media frame over the broadcast channel to a receiving end device.
2. The method of claim 1, further comprising:
after obtaining a media frame from the broadcast channel, the receiving end device parses out the video units, audio units, and play time information, calculates the play time of each video and audio unit, decodes the video and audio units in order, and plays them at the corresponding play times.
3. The method of claim 2, wherein:
the play time information includes an initial play time and relative play times; the initial play time is the earliest play time among all video and audio units in the time slice period, and a unit's relative play time is its play time minus the initial play time.
4. The method of claim 3, wherein:
the media frame includes a media frame header, a video segment, and an audio segment; the video segment in turn includes a video segment header and one or more video units, and the audio segment includes an audio segment header and one or more audio units; when the play time information is written into the media frame, the initial play time is filled into the media frame header, and the relative play time of each video and audio unit is filled into the corresponding video or audio segment header.
5. The method of claim 1, wherein the generated video and audio units are both variable length.
6. The method of claim 1, wherein the input video and audio data are a video code stream and an audio code stream.
7. A method for realizing video/audio co-streaming at a receiving end device, comprising the following steps: after obtaining a media frame from the broadcast channel, the receiving end device parses out the video units, audio units, and play time information, calculates the play time of each video and audio unit, decodes the video and audio units in order, and plays them at the corresponding play times.
8. The method of claim 7, wherein:
the media frame received by the receiving end device includes a media frame header, a video segment, and an audio segment; the receiving end device parses the initial play time from the media frame header, parses the video units from the video segment and the audio units from the audio segment, and parses the relative play time of each video unit from the video segment header and of each audio unit from the audio segment header; the play time of the video data is the initial play time plus the relative play time of the video unit, and the play time of the audio data is the initial play time plus the relative play time of the audio unit.
9. A system for co-stream transmission of video and audio, including a sending end device and a receiving end device, the sending end device including a media frame sending module and the receiving end device including a media frame receiving module and a media frame parsing module, characterized in that:
the sending end device further includes an audio/video sorting module and a media frame encapsulation module, wherein: the audio/video sorting module buffers the input video and audio data by time slice period, sorts them in play time order, and sends them to the media frame encapsulation module; the media frame encapsulation module encapsulates the sorted video and audio data of one time slice period, in order, as video units in the video segment and audio units in the audio segment of one media frame, writes the play time information of each video and audio unit into the media frame, and sends the encapsulated media frame to the media frame sending module;
the media frame sending module sends the encapsulated media frame to the receiving end device; the receiving end device further includes an audio/video decoding module and an audio/video playback module, wherein: the media frame receiving module receives the broadcast media frame stream sent by the sending end device and forwards it to the media frame parsing module;
the media frame parsing module parses the video and audio units from the media frame and sends them to the audio/video decoding module, and parses the initial play time and relative play time information from the media frame and sends it to the audio/video playback module; the audio/video decoding module decodes the encoded data of the video and audio units into playable audio/video data and sends it to the audio/video playback module;
the audio/video playback module calculates the play time of the corresponding video and audio data from the received initial play time and the relative play time of each video and audio unit, and plays the data in time order.
10. The system of claim 9, wherein:
when the media frame encapsulation module packs the media frame header, video segment, and audio segment into a media frame, it fills the initial play time into the media frame header, packs the video segment header and one or more video units into the video segment, packs the audio segment header and one or more audio units into the audio segment, fills the relative play time of each video unit into the header of the corresponding video segment, and fills the relative play time of each audio unit into the header of the corresponding audio segment.
11. The system of claim 9 or 10, wherein the input video and audio data buffered by the audio/video sorting module are a video code stream and an audio code stream.
12. A receiving end device, including a media frame receiving module and an audio/video playback module, characterized in that:
the receiving end device further includes a media frame parsing module and an audio/video decoding module, wherein: the media frame receiving module receives the broadcast media frame stream sent by a sending end device and forwards it to the media frame parsing module;
the media frame parsing module parses the video and audio units from the media frame and sends them to the audio/video decoding module, and parses the initial play time and relative play time information from the media frame and sends it to the audio/video playback module;
the audio/video decoding module decodes the encoded data of the video and audio units into playable audio/video data and sends it to the audio/video playback module;
the audio/video playback module calculates the play time of the corresponding video and audio data from the received initial play time and the relative play time of each audio and video unit, and plays the data in time order.
13. The receiving end device of claim 12, wherein: the media frame received by the receiving end device includes a media frame header, a video segment, and an audio segment; the receiving end device parses the initial play time from the media frame header, parses the video units from the video segment and the audio units from the audio segment, parses the relative play time of the video units from the video segment header and of each audio unit from the audio segment header, obtains the video data play time as the initial play time plus the video unit's relative play time, and obtains the audio data play time as the initial play time plus the audio unit's relative play time.
PCT/CN2008/072681 2008-05-13 2008-10-14 Method and system for co-stream transmission of video and audio, and corresponding receiving method and device WO2009137972A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200810081867.9 2008-05-13
CN2008100818679A CN101272499B (zh) 2008-05-13 2008-05-13 Method and system for co-stream transmission of video and audio

Publications (1)

Publication Number Publication Date
WO2009137972A1 (zh) 2009-11-19

Family

ID=40006145

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/072681 WO2009137972A1 (zh) 2008-05-13 2008-10-14 视音频同流传输的方法、系统及相应的接收方法和设备

Country Status (2)

Country Link
CN (1) CN101272499B (zh)
WO (1) WO2009137972A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111092898A (zh) * 2019-12-24 2020-05-01 Huawei Device Co., Ltd. Packet transmission method and related device
CN113347468A (zh) * 2021-04-21 2021-09-03 深圳市乐美客视云科技有限公司 Ethernet-frame-based audio and video transmission method and device, and storage medium

Families Citing this family (9)

Publication number Priority date Publication date Assignee Title
CN101272499B (zh) * 2008-05-13 2010-08-18 ZTE Corporation Method and system for transmitting video and audio in the same stream
CN101686236B (zh) * 2008-09-27 2012-07-25 China Mobile Communications Corporation Synchronization method and device for concurrent associated services
CN101389010B (zh) * 2008-10-13 2012-02-29 ZTE Corporation Player and playing method
CN101533655B (zh) * 2008-12-19 2011-06-15 徐清华 Method for playing multiple TS-format video files on a high-definition media player
CN102307179A (zh) * 2011-04-21 2012-01-04 广东电子工业研究院有限公司 Loongson-based streaming media decoding method
CN102510488B (zh) * 2011-11-04 2015-11-11 播思通讯技术(北京)有限公司 Method and device for audio/video synchronization using broadcast characteristics
US9671998B2 * 2014-12-31 2017-06-06 Qualcomm Incorporated Synchronised control
CN110944003B (zh) * 2019-12-06 2022-03-29 北京数码视讯软件技术发展有限公司 File transmission method and electronic device
CN112764709B (zh) * 2021-01-07 2021-09-21 北京创世云科技股份有限公司 Sound card data processing method and device, and electronic device

Citations (5)

Publication number Priority date Publication date Assignee Title
CN1960485A (zh) * 2006-08-29 2007-05-09 ZTE Corporation Method for synchronized playback of video and audio in mobile multimedia broadcasting
CN1972454A (zh) * 2006-11-30 2007-05-30 ZTE Corporation Method for encapsulating real-time streams in mobile multimedia broadcasting
CN1972407A (zh) * 2006-11-30 2007-05-30 ZTE Corporation Method for synchronized playback of video and audio in mobile multimedia broadcasting
CN101272500A (zh) * 2008-05-14 2008-09-24 ZTE Corporation Method and system for transmitting video and audio data streams
CN101272499A (zh) * 2008-05-13 2008-09-24 ZTE Corporation Method and system for transmitting video and audio in the same stream


Cited By (4)

Publication number Priority date Publication date Assignee Title
CN111092898A (zh) * 2019-12-24 2020-05-01 Huawei Device Co., Ltd. Packet transmission method and related device
CN111092898B (zh) * 2019-12-24 2022-05-10 Huawei Device Co., Ltd. Packet transmission method and related device
CN113347468A (zh) * 2021-04-21 2021-09-03 深圳市乐美客视云科技有限公司 Ethernet-frame-based audio and video transmission method and device, and storage medium
CN113347468B (zh) * 2021-04-21 2023-01-13 深圳市乐美客视云科技有限公司 Ethernet-frame-based audio and video transmission method and device, and storage medium

Also Published As

Publication number Publication date
CN101272499B (zh) 2010-08-18
CN101272499A (zh) 2008-09-24

Similar Documents

Publication Publication Date Title
WO2009137972A1 (zh) Method and system for transmitting video and audio in the same stream, and corresponding receiving method and device
US20200029130A1 Method and apparatus for configuring content in a broadcast system
JP4423263B2 (ja) Transmission method and apparatus for mobile terminals
CN101889451B (zh) System and method for reducing media stream latency through independent decoder clocks
RU2530731C2 (ru) Method and device for enabling fast channel switching with limited DVB receiver memory
US20070174880A1 Method, apparatus, and system of fast channel hopping between encoded video streams
WO2008028367A1 (fr) Method for implementing multimedia audio tracks in a mobile multimedia broadcasting system
CN101151829A (zh) Buffering in streaming
WO2005043784A1 (ja) Receiving device and receiving method for a broadcast wave in which multiple services are multiplexed
CN101179736B (zh) Method for converting a transport stream program into a China Mobile Multimedia Broadcasting program
CA2792106C (en) Method and system for inhibiting audio-video synchronization delay
JP2009512265A (ja) System and method for controlling video data transmission over a network
CN101207822A (zh) Method for audio and video synchronization in a streaming media terminal
CN1972408A (zh) Data transmission method for a mobile multimedia broadcasting system
CN101272200B (zh) Method and system for synchronized buffering of multimedia streams
WO2008022501A1 (fr) Method for mobile multimedia broadcasting of multiple video streams
WO2008031293A1 (en) A method for quickly playing the multimedia broadcast channels
KR100640467B1 IP streaming apparatus capable of smoothing multimedia streams
CN100479529C (zh) Conversion method for a broadcast network multiplexing protocol
JP5428734B2 (ja) Network device, information processing apparatus, stream switching method, information processing method, program, and content distribution system
WO2008022499A1 (fr) Method for packetizing a real-time mobile multimedia broadcast stream
CN101179738B (zh) Method for converting a transport stream to the China Mobile Multimedia Broadcasting multiplexing protocol
CN101102485A (зh) Audio and video synchronization apparatus and method for a mobile terminal
WO2012070447A1 (ja) Video signal output method and video information playback device
CN1960520B (зh) Method for delivering auxiliary data in mobile multimedia broadcasting

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 08874244

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 08874244

Country of ref document: EP

Kind code of ref document: A1