WO2012034442A1 - System and method for realizing synchronous transmission and reception of scalable video coding service


Info

Publication number
WO2012034442A1
WO2012034442A1 (PCT/CN2011/076622)
Authority
WO
WIPO (PCT)
Prior art keywords
code stream
stream
media
layer code
unit
Prior art date
Application number
PCT/CN2011/076622
Other languages
French (fr)
Chinese (zh)
Inventor
童登金
谢文军
戴志军
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2012034442A1 publication Critical patent/WO2012034442A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/242 Synchronization processes, e.g. processing of PCR [Program Clock References]
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234327 Reformatting operations by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components, e.g. transmitting basic layer and enhancement layers over different transmission paths; Communication protocols; Addressing
    • H04N21/633 Control signals issued by server directed to the network components or client
    • H04N21/6332 Control signals issued by server directed to the client
    • H04N7/00 Television systems
    • H04N7/24 Systems for the transmission of television signals using pulse code modulation

Definitions

  • the present invention relates to transmission technology in mobile multimedia broadcasting systems, and more particularly to a system and method for implementing synchronous transmission and reception of a Scalable Video Coding (SVC) service in the China Mobile Multimedia Broadcasting (CMMB) system.
  • SVC Scalable Video Coding
  • CMMB China Mobile Multimedia Broadcasting System
  • the China Mobile Multimedia Broadcasting System Standard specifies the frame structure, channel coding and modulation of the broadcast channel transmission signal of the mobile multimedia broadcasting system within the frequency range of the broadcasting service.
  • the CMMB standard "Mobile Multimedia Broadcasting Part 2: Multiplexing" specifies that multiplex subframes are used to encapsulate streaming media such as video and audio for transmission.
  • Scalable Video Coding is a layered video coding method.
  • the encoder encodes the video content source to generate a multi-layer code stream; the base layer code stream can be decoded on its own.
  • an enhancement layer code stream contains additional information that improves the quality of the lower-layer code stream, and must be decoded together with the lower layers, including the base layer.
  • SVC technology can provide scalable services, differentiate services with different quality of service, and adapt to the capabilities of a wide range of terminals. Because of these advantages, it is desirable to transmit SVC services in the CMMB system.
  • patent application No. 200910088679.3, "Classification Transmission and Reception Method and Apparatus in Mobile Multimedia Broadcasting System", gives a method for implementing SVC transmission in CMMB.
  • in that method, the base layer code stream and the enhancement layer code stream of the SVC video service are transmitted in layers: each code stream is encapsulated, according to the layer to which it belongs, in a different multiplex subframe of the broadcast channel frame, and the location information of the multiplex subframe carrying each layer of the video stream is also encapsulated in the broadcast channel frame and transmitted to the receiving end.
  • the terminal monitors the location information of the multiplex subframes in which the layers of the video stream are located in the broadcast channel frame, receives either the base layer code stream alone or the base layer code stream together with its corresponding enhancement layer code stream according to its own video stream processing capability, decodes what it received, and outputs the video data of the base layer code stream, or of the base layer code stream combined with the enhancement layer code stream.
  • since the SVC layers are separated and transmitted in different multiplex subframes, synchronization and coordination between the layers is a problem that must be solved.
  • at the receiving end, the coding units of the different layers that belong to the same access unit (for example, the same video frame) must be combined before video decoding and presentation.
  • the layered coding units must therefore be synchronized so that only coding units of the same access unit take part in the merge; a successful merge in turn ensures that the subsequent decoding succeeds.
  • the technical problem to be solved by the present invention is to provide a system and method for synchronous transmission and reception of SVC video when SVC layered services are transmitted in CMMB, ensuring synchronization between the SVC layered data carried separately in different multiplex subframes, and thereby the normal operation of the SVC service in CMMB.
  • the present invention provides a sending method, including:
  • the video service is encoded to generate multiple media streams; the media streams are encapsulated, in units of media units, in different multiplex subframes in a broadcast channel frame; each multiplex subframe also carries the mobile multimedia broadcast timestamp of each media unit encapsulated in it; the mobile multimedia broadcast timestamps of the media streams at the same time instant are synchronized; the location information of the multiplex subframes in which the media streams are located is encapsulated in the broadcast channel frame; and the broadcast channel frame is sent to the receiving terminal. The media streams comprise a base layer code stream generated by encoding the video stream of the video service and its corresponding enhancement layer code stream, or additionally the audio stream of the video service. Synchronous transmission of the scalable video coding service in the mobile multimedia broadcast is thereby implemented.
  • the foregoing method may further have the following feature: the media stream is encapsulated into the multiplex subframe according to the following manner:
  • RTP real-time transport protocol
  • RTCP real-time transport control protocol
  • the foregoing method may further have the following feature: the RTP timestamp is converted into a mobile multimedia broadcast timestamp as follows: for each media unit, the RTP timestamp of the RTP packet in which it is located is extracted and combined with the time information of the RTCP packets transmitted on the RTCP code stream corresponding to its RTP code stream to calculate the NTP time of the media unit; the NTP time is then converted into the mobile multimedia broadcast timestamp.
  • the foregoing method may further have the following feature: mobile multimedia broadcast timestamp synchronization of the media streams at the same sampling time means that the difference between the mobile multimedia broadcast timestamp values of the media streams at the same sampling time is within a preset timestamp tolerance value range.
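The two conversions in the features above (RTP timestamp to NTP time via the RTCP packets of the same code stream, then NTP time to a mobile multimedia broadcast timestamp) and the tolerance-based synchronization check can be sketched as follows. This is an illustrative sketch, not part of the patent: the function names and sample values are assumptions; only the RTCP-based mapping, the multiplication by the broadcast time scale, and the tolerance comparison come from the description (the 22,500 time scale appears later in the text).

```python
# Illustrative sketch; function names and sample values are assumptions.
CMMB_TIME_SCALE = 22_500  # CMMB time units per second (from the description)

def rtp_to_ntp(rtp_ts: int, sr_rtp_ts: int, sr_ntp_time: float,
               clock_rate: int) -> float:
    """Map an RTP timestamp to NTP time using the (RTP, NTP) timestamp pair
    carried in the most recent RTCP sender report of the same code stream."""
    return sr_ntp_time + (rtp_ts - sr_rtp_ts) / clock_rate

def ntp_to_cmmb_timestamp(ntp_seconds: float) -> int:
    """Convert NTP time to a mobile multimedia broadcast timestamp by
    multiplying it by the broadcast time scale."""
    return int(ntp_seconds * CMMB_TIME_SCALE)

def streams_synchronized(ts_a: int, ts_b: int, tolerance: int) -> bool:
    """Media units sampled at the same instant are synchronized when their
    broadcast timestamps differ by no more than the preset tolerance value."""
    return abs(ts_a - ts_b) <= tolerance

# Example: base layer and enhancement layer units of the same sampling instant.
ntp_base = rtp_to_ntp(rtp_ts=90_000, sr_rtp_ts=0, sr_ntp_time=100.0,
                      clock_rate=90_000)            # 101.0 s
base_ts = ntp_to_cmmb_timestamp(ntp_base)           # 2272500
enh_ts = ntp_to_cmmb_timestamp(ntp_base + 0.0001)   # a 2-tick drift
print(streams_synchronized(base_ts, enh_ts, tolerance=5))  # True
```

In this sketch the tolerance value is a free parameter, matching the text's "preset timestamp tolerance value range" without assuming a particular number.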
  • the present invention also provides a transmitting system, where the system includes an encoding device and a front-end transmitting device, where:
  • the encoding device is configured to: encode a video service to generate multiple media streams; the media streams comprise a base layer code stream generated by encoding the video stream of the video service and its corresponding enhancement layer code stream, or additionally the audio stream of the video service;
  • the front-end transmitting device is configured to: encapsulate the media streams, in units of media units, in different multiplex subframes in a broadcast channel frame; encapsulate in each multiplex subframe the mobile multimedia broadcast timestamp of each media unit encapsulated in it, with the mobile multimedia broadcast timestamps of the media streams at the same time instant synchronized; encapsulate the location information of the multiplex subframes in which the media streams are located in the broadcast channel frame; and send the broadcast channel frame to the receiving terminal, thereby implementing synchronous transmission of the scalable video coding service in the mobile multimedia broadcast.
  • the above system may further have the following features, the encoding device comprising an encoding unit and a packaging unit, wherein:
  • the encoding unit is configured to: encode a video service to generate multiple media streams;
  • the encapsulating unit is configured to: encapsulate the multiple media streams into multiple real-time transport protocol (RTP) code streams, where each RTP code stream is accompanied by a real-time transport control protocol (RTCP) code stream, and the RTCP code stream carries the network time protocol (NTP) time information used to synchronize the media streams;
  • RTP real-time transport protocol
  • RTCP real-time transport control protocol
  • NTP network time protocol
  • the front-end transmitting device includes a first encapsulating unit, a second encapsulating unit, a converting unit, a third encapsulating unit, and a sending unit, where:
  • the first encapsulating unit is configured to: extract media streams encapsulated in the RTP code stream, and encapsulate the media streams in different multiplex sub-frames in a broadcast channel frame in units of media units;
  • the second encapsulating unit is configured to: encapsulate location information of a multiplex subframe in which the media stream is located in the broadcast channel frame;
  • the converting unit is configured to: convert the RTP timestamp of the media stream into an NTP time, and then convert the NTP time into a mobile multimedia broadcast timestamp under the unified time reference;
  • the third encapsulating unit is configured to: encapsulate the mobile multimedia broadcast timestamp into a multiplex subframe in which the corresponding media unit is located; and the sending unit is configured to: send the broadcast channel frame to the receiving terminal.
  • the above system may further have the following features, the conversion unit includes a first conversion unit and a second conversion unit, wherein:
  • the first converting unit is configured to: for each media unit, extract the RTP timestamp of the RTP packet in which it is located and combine it with the time information of the RTCP packets transmitted on the RTCP code stream corresponding to its RTP code stream to calculate the NTP time of the media unit;
  • the second converting unit is configured to: multiply the NTP time of the media unit by the mobile multimedia broadcast time scale to obtain the mobile multimedia broadcast timestamp of the media unit.
  • the above system may also have the following feature: mobile multimedia broadcast timestamp synchronization of the media streams at the same sampling time means that the difference between the mobile multimedia broadcast timestamp values of the media streams at the same sampling time is within a preset timestamp tolerance value range.
  • the invention also provides a receiving method, comprising:
  • the receiving terminal receives the base layer code stream alone, or the base layer code stream and the corresponding enhancement layer code stream, according to its own video stream processing capability;
  • the layer code streams are aligned and merged according to the mobile multimedia broadcast timestamp, the base layer code stream and the corresponding enhancement layer code stream are decoded, and the video data of the base layer code stream, or of the base layer code stream combined with the enhancement layer code stream, is output; thereby implementing reception of the scalable video coding service in the mobile multimedia broadcast.
  • the foregoing method may further have the following feature: the receiving the base layer code stream and the corresponding enhanced layer code stream, and aligning the layer code streams according to the mobile multimedia broadcast timestamp comprises:
  • the receiving terminal receives the base layer code stream and the enhancement layer code stream and stores them in a buffer;
  • base layer code stream data belonging to a video access unit is extracted from the buffer, the enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data is extracted, and the two are merged as data of the same video access unit.
  • the foregoing method may further have the following feature: enhancement layer code stream data synchronized with the mobile multimedia broadcast timestamp of the base layer code stream data refers to enhancement layer code stream data whose mobile multimedia broadcast timestamp differs from that of the base layer code stream data by no more than the preset timestamp tolerance value.
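The receiver-side behaviour described in the features above (buffer both layers, then pair each base layer access unit with the enhancement layer data whose broadcast timestamp falls within the tolerance) can be sketched as follows. This is an illustrative sketch, not part of the patent; the buffer layout, function name, and sample data are assumptions.

```python
# Illustrative sketch; buffer layout, names, and sample data are assumptions.

def merge_access_units(base_units, enh_units, tolerance):
    """Pair each buffered base layer unit with the enhancement layer data
    whose CMMB broadcast timestamp differs by no more than the preset
    tolerance; unmatched base units are kept for base-layer-only decoding."""
    merged = []
    for b_ts, b_data in base_units:
        match = next(((e_ts, e_data) for e_ts, e_data in enh_units
                      if abs(e_ts - b_ts) <= tolerance), None)
        if match is not None:
            merged.append((b_ts, b_data + match[1]))  # same video access unit
        else:
            merged.append((b_ts, b_data))             # base layer only
    return merged

base = [(225000, b"base-au-1:"), (225750, b"base-au-2:")]
enh = [(225002, b"enh-au-1")]
print(merge_access_units(base, enh, tolerance=5))
# [(225000, b'base-au-1:enh-au-1'), (225750, b'base-au-2:')]
```

Leaving unmatched base units decodable on their own mirrors the SVC property that the base layer can be decoded independently.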
  • the invention also provides a receiving device, comprising:
  • a monitoring unit configured to: monitor location information of a multiplex subframe in which each layer of the video stream in the video service in the broadcast channel frame is located;
  • a receiving unit configured to: receive a base layer code stream according to its own video stream processing capability, or receive a base layer code stream and a corresponding enhancement layer code stream;
  • an alignment and merging unit configured to: align the layer code streams according to the mobile multimedia broadcast timestamp and merge them;
  • a decoding unit configured to: decode the base layer code stream and the corresponding enhancement layer code stream, and output video data combined by the base layer code stream and the enhancement layer code stream; thereby implementing scalable video coding service reception in the mobile multimedia broadcast .
  • the foregoing apparatus may further have the following feature, the receiving unit is further configured to: store the received base layer code stream and the enhancement layer code stream in a buffer;
  • the alignment and merging unit is configured to: extract base layer code stream data belonging to a video access unit from the buffer, extract, based on the mobile multimedia broadcast timestamp of the base layer code stream data, the enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with it, and merge the two as data of the same video access unit.
  • the foregoing apparatus may further have the following feature: extracting the enhancement layer code stream data synchronized with the mobile multimedia broadcast timestamp of the base layer code stream data means extracting the enhancement layer code stream data whose mobile multimedia broadcast timestamp differs from that of the base layer code stream data by no more than the preset timestamp tolerance value.
  • in the present invention, synchronized CMMB broadcast timestamps are set in the different SVC layered data to ensure synchronization between the different layers of data.
  • FIG. 1 is a schematic structural diagram of a CMMB channel frame;
  • FIG. 2 is a schematic diagram of the system of the present invention;
  • FIG. 3 is a schematic structural diagram of a broadcast channel frame according to the present invention;
  • FIG. 4 is a schematic diagram of terminal processing functions according to the present invention.
  • the base layer video unit and the enhancement layer video unit participating in the synthesis must be synchronized.
  • Preferred embodiments of the invention. The basic idea of the embodiments of the present invention is to achieve synchronization of the SVC service by applying synchronized CMMB timestamps to the audio stream, the base layer code stream, and the corresponding enhancement layer code stream at the same time instant.
  • An embodiment of the present invention provides a method for implementing synchronous transmission of a scalable video coding service in a mobile multimedia broadcast, including:
  • the video service is encoded to generate multiple media streams; the media streams are encapsulated, in units of media units, in different multiplex subframes in a broadcast channel frame; each multiplex subframe also carries the mobile multimedia broadcast timestamp of each media unit encapsulated in it; the mobile multimedia broadcast timestamps of the media streams at the same time instant are synchronized; the location information of the multiplex subframes in which the media streams are located is encapsulated in the broadcast channel frame; and the broadcast channel frame is sent to the receiving terminal. The media streams comprise a base layer code stream generated by encoding the video stream and its corresponding enhancement layer code stream, or additionally the audio stream.
  • the media stream is encapsulated into the multiplex subframe according to the following manner:
  • RTP real-time transport protocol
  • RTCP real-time transport control protocol
  • the RTP timestamp is converted into a mobile multimedia broadcast timestamp as follows: for each media unit, the RTP timestamp of the RTP packet in which it is located is extracted and combined with the time information of the RTCP packets transmitted on the RTCP code stream corresponding to its RTP code stream to calculate the NTP time of the media unit;
  • the mobile multimedia broadcast timestamp of a media unit includes two parts: a start play time and a relative play time corresponding to the media unit. The start play time of every media unit within the same multiplex subframe is the same.
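The two-part timestamp just described, one shared start play time per multiplex subframe plus a per-unit relative play time, can be sketched as follows. This is an illustrative sketch: the decomposition into the two parts is from the text, while the function names and the choice of the earliest unit timestamp as the start play time are assumptions.

```python
# Illustrative sketch; names and the min() choice are assumptions.

def encapsulate_subframe(unit_timestamps):
    """Decompose each unit's CMMB broadcast timestamp into the subframe-wide
    start play time plus a per-unit relative play time."""
    start_play_time = min(unit_timestamps)
    relative = [ts - start_play_time for ts in unit_timestamps]
    return start_play_time, relative

def recover_timestamps(start_play_time, relative):
    """Receiver side: rebuild each unit's full broadcast timestamp from the
    shared start play time and the per-unit relative play times."""
    return [start_play_time + r for r in relative]

start, rel = encapsulate_subframe([225000, 225750, 226500])
print(start, rel)                      # 225000 [0, 750, 1500]
print(recover_timestamps(start, rel))  # [225000, 225750, 226500]
```

Sharing one start play time per subframe keeps the per-unit fields small while preserving a full-resolution timestamp for every unit.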
  • mobile multimedia broadcast timestamp synchronization of the media streams at the same sampling time means that the difference between the mobile multimedia broadcast timestamp values of the media streams at the same sampling time is within a preset timestamp tolerance value range.
  • An embodiment of the present invention further provides a system for implementing synchronous transmission of a scalable video coding service in a mobile multimedia broadcast, where the system includes an encoding device and a front-end transmitting device, where:
  • the encoding device is configured to encode a video service to generate multiple media streams;
  • the media streams comprise a base layer code stream generated by encoding the video stream and its corresponding enhancement layer code stream, or additionally the audio stream;
  • the front-end transmitting device is configured to encapsulate the media streams, in units of media units, in different multiplex subframes in a broadcast channel frame; encapsulate in each multiplex subframe the mobile multimedia broadcast timestamp of each media unit encapsulated in it, with the mobile multimedia broadcast timestamps of the media streams at the same time instant synchronized; encapsulate the location information of the multiplex subframes in which the media streams are located in the broadcast channel frame; and send the broadcast channel frame to the receiving terminal.
  • the encoding device includes an encoding unit and a packaging unit, where:
  • the coding unit is configured to encode a video service to generate multiple media streams;
  • the encapsulating unit is configured to encapsulate the multiple media streams into multiple real-time transport protocol (RTP) code streams, where each RTP code stream is accompanied by a real-time transport control protocol (RTCP) code stream, and the RTCP code stream carries the network time protocol (NTP) time information used to synchronize the media streams;
  • RTP real-time transport protocol
  • RTCP real-time transport control protocol
  • NTP network time protocol
  • the front-end transmitting device includes a first encapsulating unit, a second encapsulating unit, a converting unit, a third encapsulating unit, and a sending unit, where:
  • a first encapsulating unit configured to extract the media streams encapsulated in the RTP code streams and encapsulate them, in units of media units, in different multiplex subframes in a broadcast channel frame;
  • a second encapsulating unit configured to encapsulate the location information of the multiplex subframes in which the media streams are located in the broadcast channel frame;
  • the converting unit is configured to convert an RTP timestamp of the media stream into an NTP time, and then convert the NTP time into a mobile multimedia broadcast timestamp under a unified time reference;
  • the third encapsulating unit is configured to encapsulate the mobile multimedia broadcast timestamp into a multiplex subframe in which the corresponding media unit is located, and the sending unit is configured to send the broadcast channel frame to the receiving terminal.
  • the converting unit includes a first converting unit and a second converting unit, where: the first converting unit is configured to: for each media unit, extract the RTP timestamp of the RTP packet in which it is located and combine it with the time information of the RTCP packets transmitted on the RTCP code stream corresponding to its RTP code stream to calculate the NTP time of the media unit;
  • the second converting unit is configured to multiply the NTP time of the media unit by a mobile multimedia broadcast time scale to obtain a mobile multimedia broadcast timestamp of the media unit.
  • An embodiment of the present invention further provides a method for implementing reception of a scalable video coding service in a mobile multimedia broadcast, including:
  • the receiving terminal receives the base layer code stream alone, or the base layer code stream and the corresponding enhancement layer code stream, according to its own video stream processing capability;
  • the layer code streams are aligned and merged according to the mobile multimedia broadcast timestamp, the base layer code stream and the corresponding enhancement layer code stream are decoded, and the video data of the base layer code stream combined with the enhancement layer code stream is output.
  • the receiving the base layer code stream and the corresponding enhancement layer code stream, and aligning the layer code streams according to the mobile multimedia broadcast timestamp includes:
  • the receiving terminal receives the base layer code stream and the enhancement layer code stream and stores them in a buffer;
  • base layer code stream data belonging to a video access unit is extracted from the buffer, the enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data is extracted, and the two are merged as data of the same video access unit.
  • enhancement layer code stream data synchronized with the mobile multimedia broadcast timestamp of the base layer code stream data refers to enhancement layer code stream data whose mobile multimedia broadcast timestamp differs from that of the base layer code stream data by no more than the preset timestamp tolerance value.
  • An embodiment of the present invention further provides a device for implementing a scalable video coding service in a mobile multimedia broadcast, including:
  • a monitoring unit which monitors location information of a multiplex subframe in which each layer of the video stream in the video service in the broadcast channel frame is located;
  • a receiving unit that receives the base layer code stream alone, or the base layer code stream and the corresponding enhancement layer code stream, according to its own video stream processing capability;
  • an alignment and merging unit configured to align the layer code streams according to the mobile multimedia broadcast timestamp; and a decoding unit that decodes the base layer code stream and the corresponding enhancement layer code stream and outputs the video data of the base layer code stream combined with the enhancement layer code stream.
  • the receiving unit is further configured to store the received base layer code stream and the enhancement layer code stream in a buffer;
  • the alignment and merging unit is configured to extract base layer code stream data belonging to a video access unit from the buffer, extract, based on the mobile multimedia broadcast timestamp of the base layer code stream data, the enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with it, and merge the two as data of the same video access unit.
  • extracting the enhancement layer code stream data synchronized with the mobile multimedia broadcast timestamp of the base layer code stream data means extracting the enhancement layer code stream data whose mobile multimedia broadcast timestamp differs from that of the base layer code stream data by no more than the preset timestamp tolerance value.
  • An embodiment of the present invention provides a system for implementing multiple hierarchical synchronization of a scalable video coding service, including an encoding device, a front-end transmitting device, and a terminal, where:
  • the encoding device is configured to encode a video source to generate an SVC code stream comprising a base layer and a plurality of enhancement layers; the layered SVC code streams are encapsulated into multiple RTP (Real-time Transport Protocol) code streams, and each RTP code stream is accompanied by an RTCP (Real-time Transport Control Protocol) code stream. The RTCP code streams are used to ensure that the audio and video services of the base layer, and the layered SVC services, are synchronized at NTP (Network Time Protocol) time, that is, that the base layer code stream and its corresponding enhancement layer code streams at the same instant of NTP time are synchronized; if there is an audio stream, the audio stream at the same instant is likewise synchronized with the base layer code stream and the corresponding enhancement layer code streams at NTP time.
  • the front-end transmitting device is configured to receive the RTP code streams and RTCP code streams sent by the encoding device, extract the SVC base layer code stream and enhancement layer code streams from the RTP code streams, and encapsulate the SVC base layer code stream and enhancement layer code streams, according to the layer to which they belong, into different multiplex subframes of the broadcast channel frame. The multiplex subframes of the broadcast channel frame also carry the CMMB broadcast timestamps of the video units encapsulated in them, and the location information of the multiplex subframes in which the layers of the video stream are located is encapsulated in the broadcast channel frame.
  • the RTP timestamps of the media data are converted into CMMB broadcast timestamps under a unified time reference, the synchronization of the SVC layered services on the CMMB broadcast timestamp is controlled, and the CMMB broadcast timestamps are broadcast together with the media data to be sent.
  • the terminal is configured to listen for the location information of the multiplex subframes in which the layers of the video stream in the video service are located in the broadcast channel frame, receive the base layer code stream alone, or the base layer code stream and the corresponding enhancement layer code streams, according to its own video stream processing capability, synchronize the received streams according to the CMMB broadcast timestamps, and then perform video decoding.
  • the embodiment of the invention further provides a method for implementing multiple hierarchical synchronization of a scalable video coding service, including:
  • the encoding device encapsulates the layered SVC code streams generated by encoding into multiple RTP code streams and sends an accompanying RTCP code stream for each RTP code stream;
  • the RTCP code streams ensure that the base layer SVC data, the enhancement layer SVC data, and the audio data at the same time instant are synchronized at NTP time;
  • the front-end transmitting device receives the RTP code streams and RTCP code streams sent by the encoding device, extracts the encapsulated SVC service data from the RTP code streams, encapsulates the SVC base layer code stream and enhancement layer code streams, according to the layer to which they belong, into different multiplex subframes of the broadcast channel frame, and encapsulates the location information of the multiplex subframes in which the layers of the video stream are located in the broadcast channel frame.
  • the RTP timestamp of the media data is converted into a CMMB broadcast timestamp under a unified time reference, the synchronization of the SVC layered services on the CMMB broadcast timestamp is controlled, and the CMMB broadcast timestamp is broadcast together with the media data.
  • the terminal monitors the location information of the multiplex subframes carrying each layer code stream of the video service in the broadcast channel frame and, according to its own video stream processing capability, receives the base layer code stream alone or the base layer code stream together with the corresponding enhancement layer code streams; coding units of different layers are aligned by their CMMB broadcast timestamps, merged after synchronization, and then video decoding is performed.
  • the SVC layered services are synchronized on the CMMB broadcast timestamp, as follows:
  • for each video unit in the received RTP and RTCP code streams, the RTP timestamp of the RTP packet carrying the unit is taken, and the NTP time corresponding to the video unit is calculated from the time information of the RTCP packets transmitted in the RTCP code stream that accompanies that RTP code stream; the NTP time is then multiplied by the CMMB time scale to obtain the CMMB broadcast timestamp of the video unit.
  • a start play time is taken for the multiplex subframe into which the timestamped media units are encapsulated, the CMMB broadcast timestamp of each video unit is decomposed into two parts, the start play time and the relative play time corresponding to each unit, and the start play time and each unit's relative play time are encapsulated into the multiplex subframe.
  • the CMMB time scale described in step B1 indicates the number of CMMB time units per second. According to the CMMB standard Mobile Multimedia Broadcasting Part 2: Multiplexing, the CMMB time scale is 22,500, and the maximum length of a CMMB broadcast timestamp specified by the standard is 32 bits.
  • the CMMB broadcast timestamp is similar to the RTP timestamp and is a relative timestamp.
  • the initial play time of a multiplex subframe indicates the play time reference of each audio unit or video unit in that subframe, and the relative play time of each audio unit or video unit indicates the offset of that unit's play time from the initial play time; the relative timing of the playback of the audio and video units within one multiplex subframe can be determined from the initial play time and the relative play times.
  • the timestamps of different multiplex subframe streams lack a common reference, and the timestamps of the SVC traffic transmitted in the multiplex subframe streams lack a starting reference, so that synchronization between the multiplex subframe streams is not possible.
  • the CMMB broadcast timestamp corresponding to a video unit is obtained by directly multiplying its NTP time by the CMMB time scale; because video units sampled at the same instant have the same NTP time, video units of the same sampling time are guaranteed to have synchronized CMMB broadcast timestamps.
  • the CMMB broadcast timestamp can be jointly represented by the initial play time and the relative play time.
  • the start play time in the same multiplex sub-frame is the same, and the start play time of different multiplex sub-frames can be the same or different.
  • the CMMB broadcast timestamp start value of the SVC service may be determined from the NTP time of the first video unit in the first multiplex subframe processed (another time may also be used as the CMMB broadcast timestamp start value), and the CMMB broadcast timestamps of subsequent video units are determined by the conversion method.
  • a common timestamp reference is thus established for the SVC service transmitted over the multiplex subframe streams, and each video unit in the multiplex subframes of the same service determines its play time offset relative to the CMMB broadcast timestamp start value from the CMMB broadcast timestamp converted by the above method.
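As an illustrative sketch of the conversion and decomposition just described (the constants follow the standard, but the function names and the 32-bit wrap handling are assumptions, not the CMMB wire format):

```python
CMMB_TIME_SCALE = 22_500   # CMMB time units per second (Part 2: Multiplexing)
TS_MASK = (1 << 32) - 1    # a CMMB broadcast timestamp is at most 32 bits

def ntp_to_cmmb(ntp_seconds):
    """Convert an NTP time (in seconds) to a CMMB broadcast timestamp by
    directly multiplying it by the CMMB time scale."""
    return int(ntp_seconds * CMMB_TIME_SCALE) & TS_MASK

def decompose(cmmb_ts, start_play_time):
    """Split a unit's CMMB broadcast timestamp into the subframe's start
    play time and the unit's relative play time (offset from that start)."""
    return start_play_time, (cmmb_ts - start_play_time) & TS_MASK

# Video units sampled at the same instant receive identical CMMB broadcast
# timestamps, even when carried in different multiplex subframes:
ntp = 100.5                # hypothetical NTP time of one sampling instant
assert ntp_to_cmmb(ntp) == 2_261_250         # 100.5 s x 22,500
start, relative = decompose(ntp_to_cmmb(ntp), ntp_to_cmmb(100.0))
assert (start, relative) == (2_250_000, 11_250)
```

Because the conversion depends only on the NTP time and the fixed time scale, the subframes of different layers can apply it independently and still produce matching timestamps.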
  • the invention ensures that SVC layered service data of the same NTP time point have a unified CMMB broadcast timestamp, thereby providing a means of guaranteeing synchronization between the different layers of the SVC.
  • each multiplex subframe corresponding to each layer of the SVC can be processed independently, and there is no coupling dependency between them.
  • the time stamp of the audio data is processed in the same manner as the video data.
  • there may be a single buffer or one buffer per layer; the buffer should be large enough to accommodate the transmission time difference between the layered data. Since, for various reasons, SVC layered data of the same sampling instant cannot be guaranteed to arrive at exactly the same time, a threshold Td is set for the reception time difference of the layered service data, and the buffer should be able to hold all layers of SVC data within the Td time range.
  • the data to be merged is taken from the buffer and aligned for merging: the base layer data belonging to a video access unit is taken first and, using the timestamp of that base layer data as the reference, the enhancement layer data whose timestamp corresponds to it is taken and merged with it as data of the same video access unit.
  • in step B2, for various reasons, the time information carried in the RTP and RTCP streams sent by the encoding device may not guarantee that base layer data, enhancement layer data, and audio data of the same sampling time are exactly consistent in NTP time; the CMMB broadcast timestamps converted at the front-end device may therefore show a small deviation, so a timestamp tolerance value is also set.
  • FIG. 2 is a schematic diagram of the system of the present invention.
  • the encoding device encodes the SVC video into a base layer code stream and at least one enhancement layer code stream; the base layer code stream can be decoded independently, while an enhancement layer code stream contains additional information for improving the quality of the lower layer code streams and must be decoded together with the lower layers, including the base layer.
  • the video data is based on the video access unit, and a typical video access unit is a video frame.
  • a video access unit at a specific time point can be encoded into multiple layers of data, and the data of the multiple layers can be split across multiple channels for transmission; the receiving terminal can simultaneously receive multiple channels as needed, and the data of the multiple layers received is combined by video access unit and decoded.
  • the encoding device is an H.264 SVC encoder; the basic unit of its encoded output video data is the Network Abstraction Layer Unit (NALU), and multiple NALUs of the same time point form a video access unit.
  • the broadcast service includes one SVC video and one audio channel. The SVC video uses spatial layering and is encoded as a QVGA (Quarter VGA, 320×240 pixel image) base video stream and a VGA (Video Graphics Array, 640×480 pixel image) enhancement video stream; the audio is encoded as an audio stream.
  • the encoding device encapsulates the generated base layer code stream, enhancement layer code stream, and audio stream each into an RTP code stream, and an RTCP code stream is sent along with each.
  • the timestamps of the RTP streams need not depend on each other; each stream has its own RTP time scale and initial timestamp, where the time scale indicates the number of time units of the medium per second. RTCP is used to guarantee that base layer SVC data, enhancement layer SVC data, and audio data of the same sampling instant are synchronized in NTP time. For example, the audio time scale is 48000 and the video time scale is 90000.
  • for a given sampling instant, the base layer data is encapsulated in an RTP packet of the base stream whose RTP timestamp may be denoted Tbase; the enhancement layer data is encapsulated in an RTP packet of the enhancement stream whose RTP timestamp may be denoted Text; and the RTP timestamp of the audio RTP packet at that instant may be denoted Taudio. Each RTP stream is accompanied by its corresponding RTCP stream.
  • the SR (sender report) in the RTCP packets carries a reference NTP time and the corresponding reference RTP timestamp, and it should be guaranteed that base layer SVC data, enhancement layer SVC data, and audio data of the same sampling instant are consistent in NTP time.
  • "consistent in NTP time" means that the NTP times corresponding to Tbase, Text, and Taudio are approximately equal; a deviation value is allowed and can be set as needed.
  • FIG. 3 is a schematic diagram of the structure of a broadcast channel frame according to the present invention.
  • multiplex frame 0 occupies time slot 0, and time slot 1 to time slot 39 are used to transmit services.
  • the front-end transmitting device configures the video base layer stream, audio, and data information of video service S into multiplex frame 1, occupying time slot 1 to time slot 4, with multiplex subframe number 1; the video enhancement layer code stream is configured into multiplex frame 2, occupying time slot 5 to time slot 6, also with multiplex subframe number 1. No other services are transmitted in multiplex frame 1 or multiplex frame 2.
  • description information of the multiplex frame positions carrying each layer code stream (the base layer code stream and its corresponding enhancement layer code stream) is added to the service control information and the ESG (Electronic Service Guide) information; this description information indicates that video service S comprises two multiplex subframes: multiplex subframe 1 of multiplex frame 1 carries the service's base layer code stream data, and multiplex subframe 1 of multiplex frame 2 carries the enhancement layer code stream data.
  • the front-end transmitting device obtains the base layer code stream V1 of video service S and encapsulates it into multiplex subframe 1 of multiplex frame 1, and acquires the enhancement layer code stream V2 of video service S and encapsulates it into multiplex subframe 1 of multiplex frame 2.
  • the front-end transmitting device acquires the audio code stream and data section information of video service S and adds them to multiplex subframe 1 of multiplex frame 1, that is, they are carried in the same multiplex subframe as the base layer code stream.
  • the description information of the multiplex frame positions of the layer code streams is carried in multiplex frame 0 of the broadcast channel frame, indicating to the receiving terminal the multiplex frame position of each layer code stream so as to facilitate reception of video service S.
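The description information above can be pictured as a small table mapping each layer of service S to its multiplex frame and subframe. A hypothetical sketch (the record fields are illustrative, not the actual service control information or ESG syntax):

```python
from dataclasses import dataclass

@dataclass
class LayerLocation:
    """Hypothetical record of where one layer code stream of service S is
    carried; the field names are illustrative, not the CMMB syntax."""
    layer: str              # "base" or "enhancement"
    multiplex_frame: int
    multiplex_subframe: int

# Description information for video service S as laid out above:
service_s = [
    LayerLocation("base",        multiplex_frame=1, multiplex_subframe=1),
    LayerLocation("enhancement", multiplex_frame=2, multiplex_subframe=1),
]

# A terminal that can only decode the base layer tunes just multiplex
# frame 1; a more capable terminal tunes every listed frame:
base_only = [loc.multiplex_frame for loc in service_s if loc.layer == "base"]
assert base_only == [1]
```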
  • when audio and video data are encapsulated into a multiplex subframe, they are encapsulated in units of audio units or video units.
  • each audio unit or video unit has a corresponding CMMB broadcast timestamp.
  • the timestamp is a combination of the initial play time of the multiplex subframe and the relative play time corresponding to each unit.
  • the initial play time and the relative play time corresponding to each unit are also encapsulated in the multiplex subframe and transmitted together with the media data.
  • the RTP timestamp of the received media data needs to be converted into a CMMB broadcast timestamp, and in this process, the synchronization of the SVC hierarchical services on the CMMB broadcast timestamp is controlled at the same time.
  • the SVC layered services are synchronized on the CMMB broadcast timestamp, as follows:
  • A. Receive the input RTP and RTCP streams; for each video unit, take the RTP timestamp of the RTP packet in which it is located and calculate the NTP time of the video unit from the relevant time information of the RTCP packets transmitted in the RTCP code stream corresponding to its RTP stream.
  • B. Multiply the NTP time by the CMMB time scale to obtain the CMMB broadcast timestamp of the video unit.
  • For the multiplex subframe, take a start play time and decompose the CMMB broadcast timestamp of each video unit into two parts, the start play time and the relative play time corresponding to each unit, and encapsulate the start play time and each unit's relative play time into the multiplex subframe.
  • the calculation of the NTP time corresponding to the video unit in step A can use the following method:
  • NTP time = (RTP timestamp − reference RTP timestamp) / timescale + reference NTP time    formula (1)
  • the timescale is the number of time units of the medium in 1 second, and different media can use different time units: for RTP video, a 90000 Hz clock is commonly used as the time unit, so the number of clock ticks in 1 second, 90000, is the timescale of the video; for audio, the sampling period is commonly used as the time unit, so the timescale is the number of samples in 1 second, e.g. with a sampling rate of 48000 samples per second the timescale of the audio is 48000.
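A minimal sketch of formula (1), using hypothetical RTCP sender-report reference pairs; the numbers are illustrative, chosen so that one sampling instant 0.5 s after each stream's reference maps to the same NTP time despite the streams' differing timescales and initial timestamps:

```python
def rtp_to_ntp(rtp_ts, ref_rtp_ts, ref_ntp, timescale):
    """Formula (1): NTP time = (RTP timestamp - reference RTP timestamp)
    / timescale + reference NTP time. The reference pair comes from the
    stream's RTCP sender reports (SR)."""
    return (rtp_ts - ref_rtp_ts) / timescale + ref_ntp

# Hypothetical SR reference pairs (reference RTP timestamp, reference NTP
# time 200.0 s) for the three streams; the same sampling instant, 0.5 s
# after the reference, yields the same NTP time in every stream:
tbase  = rtp_to_ntp(10_000 + 45_000, 10_000, 200.0, 90_000)  # base video
text   = rtp_to_ntp(70_000 + 45_000, 70_000, 200.0, 90_000)  # enhancement
taudio = rtp_to_ntp( 3_000 + 24_000,  3_000, 200.0, 48_000)  # audio
assert tbase == text == taudio == 200.5
```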
  • the CMMB time scale described in step B indicates the number of CMMB time units per second, according to the CMMB standard, Mobile Multimedia Broadcasting Part 2: Multiplexing, which is 22,500.
  • each multiplex subframe corresponding to each layer of the SVC can be processed independently, and there is no coupling dependency between them.
  • the CMMB standard does not specify how the starting value of a service's CMMB broadcast timestamp is determined (the start play time in a multiplex subframe applies to that subframe rather than to the service), so there is no existing method of guaranteeing synchronization between SVC layered data transmitted in different multiplex subframes.
  • the method converts the CMMB broadcast timestamp corresponding to an SVC video unit by directly multiplying its NTP time by the CMMB time scale, determines the service's CMMB broadcast timestamp start value from the NTP time of the first SVC video unit in the first multiplex subframe processed, and determines the CMMB broadcast timestamps of subsequent SVC video units by the same conversion method.
  • at a given sampling instant the base layer data RTP timestamp is Tbase, the enhancement layer data RTP timestamp is Text, and the audio RTP timestamp is Taudio; the RTCP messages sent by the encoder ensure that the NTP times corresponding to Tbase, Text, and Taudio are consistent (equal or approximately equal, with a small offset value allowed).
  • consequently, when the base layer data, enhancement layer data, and audio data of the same synchronization point are encapsulated into multiplex subframes, their CMMB broadcast timestamps also correspond (equal or approximately equal, with a small offset value allowed), even though they are encapsulated in different multiplex subframes and the multiplex subframes are processed independently.
  • the method can ensure that SVC service data of the same time point is unified in the CMMB broadcast timestamp, thereby providing a means of guaranteeing synchronization between the different layers of the SVC.
  • as shown in the figure, when receiving, the terminal monitors multiplex frame 0 in the broadcast channel frame and receives the control information and ESG information so as to correctly receive the multimedia broadcast service, and it also monitors the location information of the multiplex subframes carrying each layer code stream of the SVC service.
  • according to its own needs, such as its video stream processing capability or the network transmission status, the terminal determines whether to receive only the base layer code stream or the base layer code stream together with the corresponding enhancement layer code streams.
  • suppose the terminal is a netbook that can process VGA video; it then receives both the QVGA base stream and the VGA enhancement stream for processing, and decodes and displays VGA video.
  • after receiving the base stream and the enhancement stream transmitted in different multiplex subframes, the terminal extracts the video units from the multiplex subframes, parses out the H.264 basic coding units (NALUs), aligns and synchronizes the NALUs belonging to different layers according to their CMMB broadcast timestamps, merges the NALUs belonging to the same video access unit, and then performs video decoding.
  • Each video unit has a corresponding CMMB broadcast timestamp, which is a combination of the initial play time in the multiplex subframe and the relative play time corresponding to each video unit.
  • the NALU data contained in the video unit is placed in a buffer along with its corresponding time stamp.
  • separate buffers can be used for the base layer and the enhancement layer. Since, for various reasons, SVC layered data of the same sampling instant cannot be guaranteed to arrive at exactly the same time, a threshold Td is set for the reception time difference of the layered service data; that is, for data belonging to the same video access unit, the time difference between the earliest received data and the latest received data is allowed to be up to Td, so the buffer should be able to hold all layers of SVC data within the Td time range;
  • the terminal takes a video access unit from the foregoing buffer, aligns and merges the NALU data of the different layers of that video access unit, and sends the result to the decoder.
  • the first fetch operation can start once the buffer holds no less than the aforementioned Td time span of data.
  • the base layer data belonging to a video access unit is taken first; using the timestamp of that base layer data as the reference, the enhancement layer data whose timestamp corresponds to it is taken and merged with the base layer data as data of the same video access unit, and the combined data is sent to the decoder for decoding as a complete video access unit.
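A minimal sketch of this merge step, assuming per-layer buffers of (timestamp, NALU) pairs and an illustrative tolerance (the names and the tolerance of 10 CMMB time units are assumptions, not values from the standard):

```python
TOLERANCE = 10  # assumed timestamp tolerance, in CMMB time units

def merge_access_unit(base_units, enh_units, tolerance=TOLERANCE):
    """Take one base layer unit and, using its timestamp as the reference,
    pull the enhancement layer NALUs whose timestamps match it within the
    tolerance; together they form one video access unit for the decoder."""
    base_ts, base_nalus = base_units.pop(0)
    matched = [n for ts, n in enh_units if abs(ts - base_ts) <= tolerance]
    enh_units[:] = [(ts, n) for ts, n in enh_units
                    if abs(ts - base_ts) > tolerance]
    return base_nalus + matched   # merged NALUs of the same access unit

# Buffered (timestamp, NALU) pairs per layer, as received from the
# independently processed multiplex subframes (values are illustrative):
base_buf = [(2_261_250, ["base-NALU"])]
enh_buf  = [(2_261_252, "enh-NALU-a"), (2_283_750, "enh-NALU-b")]
au = merge_access_unit(base_buf, enh_buf)
assert au == ["base-NALU", "enh-NALU-a"]   # next frame's NALU stays buffered
```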
  • in step B, for various reasons, the time information carried in the RTP and RTCP streams sent by the encoding device may not guarantee that base layer data, enhancement layer data, and audio data of the same sampling time are exactly consistent in NTP time; the CMMB broadcast timestamps converted at the front-end device may therefore show a small deviation, so a timestamp tolerance value is also set.
  • the embodiments of the present invention are not only applicable to the CMMB system, but also to other mobile multimedia broadcast systems.
  • the above is only the preferred embodiment of the present invention and is not intended to limit the scope of the present invention.
  • synchronous CMMB broadcast time stamps are set in different layered SVC data to ensure synchronization between different hierarchical data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A transmission method is provided in the present invention. The method includes that: a video service is coded to generate multi-path media streams which are respectively encapsulated in different multiplex sub-frames in a broadcast channel frame by taking a media unit as a unit; a mobile multi-media broadcast timestamp of each media unit encapsulated in the multiplex sub-frames is encapsulated; the mobile multi-media broadcast timestamps of the media streams at the same sampling time are synchronized; the position information of the multiplex sub-frames in which the media streams are located is encapsulated in the broadcast channel frame and is transmitted to a terminal; the media streams comprise a base layer code stream and enhancement layer code streams corresponding to the base layer code stream generated by coding video stream of the video service, and may also comprise an audio stream of the video service; and thereby synchronous transmission of a scalable video coding service is realized in the mobile multi-media broadcast. A corresponding transmission system is also provided in the present invention. A reception method and a device of the scalable video coding service, which align the received media data according to the mobile multi-media broadcast timestamp, are also provided in the present invention.

Description

System and Method for Realizing Synchronous Transmission and Reception of a Scalable Video Coding Service

Technical Field
The present invention relates to transmission technology in a mobile multimedia broadcasting system, and in particular to a system and method for realizing synchronous transmission and reception of a Scalable Video Coding (SVC) service when the SVC service is transmitted in the China Mobile Multimedia Broadcasting (CMMB) system.

Background Art
With the development of communication technology, mobile multimedia broadcasting technology is more and more widely applied. At present, the China Mobile Multimedia Broadcasting system standard specifies the frame structure, channel coding, and modulation of the broadcast channel transmission signal of the mobile multimedia broadcasting system within the broadcast service frequency range. The CMMB standard Mobile Multimedia Broadcasting Part 2: Multiplexing specifies that multiplex subframes are used to encapsulate streaming media data such as video and audio for transmission.
Scalable Video Coding (SVC) is a layered video coding method. The encoder encodes the video content source to generate code streams of multiple layers; the base layer code stream can be decoded independently, while an enhancement layer code stream contains additional information for improving the quality of the lower layer code streams and must be decoded together with the lower layers, including the base layer. SVC technology can provide scalable services, realize differentiated services with different quality of service, and adapt to the capabilities of various terminals; it has many advantages, so it is necessary to transmit SVC services in the CMMB system.
Patent application No. 200910088679.3, "Layered transmission and reception method and apparatus in a mobile multimedia broadcasting system", gives a method for implementing SVC transmission in CMMB. According to that method, the base layer code stream and the enhancement layer code streams of an SVC video service can be transmitted in layers: the base layer code stream and the enhancement layer code streams are encapsulated, according to the layer to which they belong, in different multiplex subframes of the broadcast channel frame, and the location information of the multiplex subframes carrying each layer code stream of the video stream is encapsulated in the broadcast channel frame and sent to the receiving end. The terminal monitors the location information of the multiplex subframes carrying each layer code stream of the video service in the broadcast channel frame; the receiving terminal receives, according to its own video stream processing capability, the base layer code stream alone or together with the corresponding enhancement layer code streams, decodes them, and outputs the video data of the base layer code stream or the video data obtained by merging the base layer code stream with the enhancement layer code streams.
The SVC layers are transmitted separately in different multiplex subframes, so synchronization and coordination of the layers is a problem that must be solved. In particular, when the multiplex subframes carrying the layer code streams belong to different multiplex frames, how to perform synchronization is a problem that must be solved.
In an SVC video, an access unit at a specific time point, such as a video frame, can be encoded into multiple layers, which are transmitted separately in different multiplex subframes. On the terminal side, the coding units of different layers transmitted in different multiplex subframes need to be merged, e.g. the multiple layers of the same video frame are combined, and then video decoding and presentation are performed. During merging, the layered coding units must be synchronized to guarantee that the coding units taking part in the merge belong to the same access unit; only a successful merge can guarantee successful subsequent decoding and presentation.
What the present invention needs to solve is how to guarantee synchronization among the multiple SVC layers described above when SVC layered services are transmitted in CMMB.

Summary of the Invention
The technical problem to be solved by the present invention is to provide a system and method for realizing synchronous transmission and reception of SVC video when SVC layered services are transmitted in CMMB, guaranteeing synchronization between SVC layered data transmitted separately in different multiplex subframes and thus ensuring the normal operation of SVC services in CMMB.
In order to solve the above problems, the present invention provides a transmission method, including:
encoding a video service to generate multiple media streams; encapsulating the media streams, in units of media units, in different multiplex subframes of a broadcast channel frame, the multiplex subframes also carrying the mobile multimedia broadcast timestamp of each media unit encapsulated therein, with the mobile multimedia broadcast timestamps of the media streams at the same sampling time synchronized; encapsulating the location information of the multiplex subframes carrying the media streams in the broadcast channel frame; and sending the broadcast channel frame to a receiving terminal. The media streams include a base layer code stream generated by encoding the video stream of the video service and its corresponding enhancement layer code streams, or additionally include an audio stream of the video service; synchronous transmission of a scalable video coding service in mobile multimedia broadcasting is thereby realized.
Preferably, the above method may also have the following feature: the media streams are encapsulated into the multiplex subframes as follows:
the media streams are encapsulated into multiple Real-time Transport Protocol (RTP) code streams, where each RTP code stream is accompanied by a Real-time Transport Control Protocol (RTCP) code stream, and the RTCP code streams guarantee Network Time Protocol (NTP) time synchronization of the media streams at the same sampling time;
the media streams encapsulated in the RTP code streams are extracted and encapsulated, in units of media units, in different multiplex subframes of the broadcast channel frame; the RTP timestamps of the media streams are converted into NTP time and then into mobile multimedia broadcast timestamps under a unified time reference, and the mobile multimedia broadcast timestamps are encapsulated into the multiplex subframes in which their corresponding media units are located.
Preferably, the above method may also have the following feature: the RTP timestamp is converted into a mobile multimedia broadcast timestamp as follows: for each media unit, the RTP timestamp of the RTP packet in which it is located is taken, and the NTP time of the media unit is calculated from the time information of the RTCP packets transmitted in the RTCP code stream corresponding to its RTP code stream;
the NTP time of the media unit is multiplied by the mobile multimedia broadcast time scale to obtain the mobile multimedia broadcast timestamp of the media unit.
Preferably, the above method may also have the following feature: synchronization of the mobile multimedia broadcast timestamps of the media streams at the same sampling time means that the difference between the mobile multimedia broadcast timestamp values of the media streams at the same sampling time is within a preset timestamp tolerance range.

The present invention also provides a transmission system, the system including an encoding device and a front-end transmitting device, wherein:
the encoding device is configured to encode a video service to generate multiple media streams, the media streams including a base layer code stream generated by encoding the video stream of the video service and its corresponding enhancement layer code streams, or additionally including an audio stream of the video service;
所述前端发送设备设置为: 将所述媒体流以媒体单元为单位分别封装在 广播信道帧中的不同复用子帧中, 将该复用子帧中封装的各媒体单元的移动 多媒体广播时间戳封装在所述复用子帧中, 且同一釆样时刻的媒体流的移动 多媒体广播时间戳同步, 并将所述媒体流所在的复用子帧的位置信息封装于 所述广播信道帧中, 将所述广播信道帧发送至接收终端; 从而在移动多媒体 广播中实现可伸缩视频编码业务同步发送。 The front-end transmitting device is configured to: separately package the media stream in units of media units In a different multiplex subframe in the broadcast channel frame, the mobile multimedia broadcast timestamp of each media unit encapsulated in the multiplex subframe is encapsulated in the multiplex subframe, and the media stream moves at the same time The multimedia broadcast timestamp is synchronized, and the location information of the multiplex subframe in which the media stream is located is encapsulated in the broadcast channel frame, and the broadcast channel frame is sent to the receiving terminal; thereby implementing scalability in the mobile multimedia broadcast The video encoding service is sent synchronously.
Preferably, the above system may further have the following feature: the encoding device includes an encoding unit and an encapsulation unit, wherein:

the encoding unit is configured to encode a video service to generate multiple media streams;

the encapsulation unit is configured to encapsulate the multiple media streams into multiple Real-time Transport Protocol (RTP) streams, where each RTP stream is accompanied by a Real-time Transport Control Protocol (RTCP) stream, and the RTCP streams ensure Network Time Protocol (NTP) time synchronization of the media streams at the same sampling instant;

the front-end transmitting device includes a first encapsulation unit, a second encapsulation unit, a conversion unit, a third encapsulation unit, and a transmitting unit, wherein:

the first encapsulation unit is configured to extract the media streams encapsulated in the RTP streams and encapsulate the media streams, in units of media units, into different multiplex subframes of a broadcast channel frame;

the second encapsulation unit is configured to encapsulate the location information of the multiplex subframes in which the media streams are located into the broadcast channel frame;

the conversion unit is configured to convert the RTP timestamps of the media streams into NTP time, then convert the NTP time into mobile multimedia broadcast timestamps under a unified time reference;

the third encapsulation unit is configured to encapsulate each mobile multimedia broadcast timestamp into the multiplex subframe in which its corresponding media unit is located; and the transmitting unit is configured to transmit the broadcast channel frame to the receiving terminal.

Preferably, the above system may further have the following feature: the conversion unit includes a first conversion unit and a second conversion unit, wherein:

the first conversion unit is configured to, for each media unit, extract the RTP timestamp of the RTP packet in which it is located and calculate the NTP time of the media unit from it together with the time information carried in the RTCP packets of the RTCP stream corresponding to the RTP stream; and the second conversion unit is configured to multiply the NTP time of the media unit by the mobile multimedia broadcast time scale to obtain the mobile multimedia broadcast timestamp of the media unit.

Preferably, the above system may further have the following feature: synchronization of the mobile multimedia broadcast timestamps of the media streams at the same sampling instant means that the differences between the mobile multimedia broadcast timestamp values of the media streams at the same sampling instant are within a preset timestamp tolerance range.

The present invention also provides a receiving method, including:
a receiving terminal monitors, in broadcast channel frames, the location information of the multiplex subframes in which the code streams of the layers of the video stream of a video service are located;

the receiving terminal receives the base layer code stream, or receives the base layer code stream and the corresponding enhancement layer code stream, according to its own video stream processing capability;

when the base layer code stream and the corresponding enhancement layer code stream are received, the code streams of the layers are aligned and merged according to their mobile multimedia broadcast timestamps, the base layer code stream and the corresponding enhancement layer code stream are then decoded, and the video data obtained by merging the base layer code stream and the enhancement layer code stream is output, thereby achieving reception of the scalable video coding service in mobile multimedia broadcast.

Preferably, the above method may further have the following feature: receiving the base layer code stream and the corresponding enhancement layer code stream and aligning and merging the code streams of the layers according to their mobile multimedia broadcast timestamps includes:

the receiving terminal receives the base layer code stream and the enhancement layer code stream and stores them in a buffer;

base layer code stream data belonging to one video access unit is taken out of the buffer, and, taking the mobile multimedia broadcast timestamp of the base layer code stream data as a reference, enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data is taken out and merged with it as data of the same video access unit.

Preferably, the above method may further have the following feature: the enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data refers to enhancement layer code stream data whose mobile multimedia broadcast timestamp differs from that of the base layer code stream data by a value within the preset timestamp tolerance range.
The present invention also provides a receiving apparatus, including:

a monitoring unit, configured to monitor, in broadcast channel frames, the location information of the multiplex subframes in which the code streams of the layers of the video stream of a video service are located; a receiving unit, configured to receive the base layer code stream, or receive the base layer code stream and the corresponding enhancement layer code stream, according to its own video stream processing capability;

an alignment and merging unit, configured to align and merge the code streams of the layers according to their mobile multimedia broadcast timestamps; and

a decoding unit, configured to decode the base layer code stream and the corresponding enhancement layer code stream and output the video data obtained by merging the base layer code stream and the enhancement layer code stream, thereby achieving reception of the scalable video coding service in mobile multimedia broadcast.

Preferably, the above apparatus may further have the following feature: the receiving unit is further configured to store the received base layer code stream and enhancement layer code stream in a buffer;

the alignment and merging unit is configured to take base layer code stream data belonging to one video access unit out of the buffer, and, taking the mobile multimedia broadcast timestamp of the base layer code stream data as a reference, take out enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data and merge it with the base layer code stream data as data of the same video access unit.

Preferably, the above apparatus may further have the following feature: taking out the enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data refers to taking out enhancement layer code stream data whose mobile multimedia broadcast timestamp differs from that of the base layer code stream data by a value within the preset timestamp tolerance range.
In the method of the embodiments of the present invention, synchronized CMMB broadcast timestamps are applied to SVC data of different layers, ensuring synchronization among the data of the different layers.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a schematic diagram of the structure of a CMMB channel frame;

Fig. 2 is a schematic diagram of the system of the present invention;

Fig. 3 is a schematic diagram of the structure of a broadcast channel frame involved in the present invention;

Fig. 4 is a schematic diagram of the terminal processing functions involved in the present invention; as shown in the figure, when an SVC video frame is synthesized, the base layer video units and the enhancement layer video units participating in the synthesis must remain synchronized.

PREFERRED EMBODIMENTS OF THE PRESENT INVENTION

The basic idea of the embodiments of the present invention is to apply synchronized CMMB timestamps to the audio stream, the base layer code stream, and the corresponding enhancement layer code stream at the same sampling instant, thereby achieving SVC service synchronization.
An embodiment of the present invention provides a method for achieving synchronous transmission of a scalable video coding service in mobile multimedia broadcast, including:

encoding a video service to generate multiple media streams, encapsulating the multiple media streams, in units of media units, into different multiplex subframes of a broadcast channel frame, where each multiplex subframe also carries the mobile multimedia broadcast timestamps of the media units encapsulated in it and the mobile multimedia broadcast timestamps of the media streams at the same sampling instant are synchronized, encapsulating the location information of the multiplex subframes in which the media streams are located into the broadcast channel frame, and transmitting the broadcast channel frame to a receiving terminal; the media streams include a base layer code stream generated by encoding the video stream and its corresponding enhancement layer code stream, or include an audio stream, a base layer code stream generated by encoding the video stream, and its corresponding enhancement layer code stream.

The media streams are encapsulated into the multiplex subframes as follows:

the multiple media streams are encapsulated into multiple Real-time Transport Protocol (RTP) streams, where each RTP stream is accompanied by a Real-time Transport Control Protocol (RTCP) stream, and the RTCP streams ensure Network Time Protocol (NTP) time synchronization of the media streams at the same sampling instant;

the media streams encapsulated in the RTP streams are extracted and encapsulated, in units of media units, into different multiplex subframes of the broadcast channel frame; the RTP timestamps of the media streams are converted into NTP time, the NTP time is then converted into mobile multimedia broadcast timestamps under a unified time reference, and each mobile multimedia broadcast timestamp is encapsulated into the multiplex subframe in which its corresponding media unit is located.

An RTP timestamp is converted into a mobile multimedia broadcast timestamp as follows: for each media unit, the RTP timestamp of the RTP packet in which it is located is extracted, and the NTP time of the media unit is calculated from it together with the time information carried in the RTCP packets of the RTCP stream corresponding to the RTP stream;

the NTP time of the media unit is multiplied by the mobile multimedia broadcast time scale to obtain the mobile multimedia broadcast timestamp of the media unit.

The mobile multimedia broadcast timestamp of a media unit consists of two parts: a start play time and a relative play time corresponding to each media unit. Within the same multiplex subframe, the start play time of all media units is the same.

Synchronization of the mobile multimedia broadcast timestamps of the media streams at the same sampling instant means that the differences between the mobile multimedia broadcast timestamp values of the media streams at the same sampling instant are within a preset timestamp tolerance range.
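As an illustrative sketch (not part of the patent text), this synchronization condition amounts to a simple comparison of timestamp values; the concrete tolerance value below is a hypothetical choice, since the document leaves the preset tolerance unspecified:

```python
# Hypothetical sketch of the timestamp-synchronization test described above.
# TIMESTAMP_TOLERANCE is an assumed configurable value, not one given in the text;
# 225 CMMB time units correspond to 10 ms at a time scale of 22500 units per second.
TIMESTAMP_TOLERANCE = 225

def timestamps_synchronized(ts_a, ts_b, tolerance=TIMESTAMP_TOLERANCE):
    """Two media units sampled at the same instant count as synchronized
    when their mobile multimedia broadcast timestamps differ by no more
    than the preset tolerance."""
    return abs(ts_a - ts_b) <= tolerance

# Base layer unit at timestamp 450000, enhancement layer unit at 450090:
# the 90-unit difference is within tolerance, so they are synchronized.
assert timestamps_synchronized(450000, 450090)
assert not timestamps_synchronized(450000, 451000)
```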
An embodiment of the present invention further provides a system for achieving synchronous transmission of a scalable video coding service in mobile multimedia broadcast, the system including an encoding device and a front-end transmitting device, wherein:

the encoding device is used to encode a video service to generate multiple media streams; the media streams include a base layer code stream generated by encoding the video stream and its corresponding enhancement layer code stream, or include an audio stream, a base layer code stream generated by encoding the video stream, and its corresponding enhancement layer code stream;

the front-end transmitting device is used to encapsulate the media streams, in units of media units, into different multiplex subframes of a broadcast channel frame, encapsulate the mobile multimedia broadcast timestamps of the media units carried in each multiplex subframe into that multiplex subframe, the mobile multimedia broadcast timestamps of the media streams at the same sampling instant being synchronized, encapsulate the location information of the multiplex subframes in which the media streams are located into the broadcast channel frame, and transmit the broadcast channel frame to a receiving terminal.

The encoding device includes an encoding unit and an encapsulation unit, wherein:

the encoding unit is used to encode a video service to generate multiple media streams;

the encapsulation unit is used to encapsulate the multiple media streams into multiple Real-time Transport Protocol (RTP) streams, where each RTP stream is accompanied by a Real-time Transport Control Protocol (RTCP) stream, and the RTCP streams ensure Network Time Protocol (NTP) time synchronization of the media streams at the same sampling instant;

the front-end transmitting device includes a first encapsulation unit, a second encapsulation unit, a conversion unit, a third encapsulation unit, and a transmitting unit, wherein:

the first encapsulation unit is used to extract the media streams encapsulated in the RTP streams and encapsulate the media streams, in units of media units, into different multiplex subframes of the broadcast channel frame; the second encapsulation unit is used to encapsulate the location information of the multiplex subframes in which the media streams are located into the broadcast channel frame;

the conversion unit is used to convert the RTP timestamps of the media streams into NTP time, then convert the NTP time into mobile multimedia broadcast timestamps under a unified time reference; the third encapsulation unit is used to encapsulate each mobile multimedia broadcast timestamp into the multiplex subframe in which its corresponding media unit is located; and the transmitting unit is used to transmit the broadcast channel frame to the receiving terminal.

The conversion unit includes a first conversion unit and a second conversion unit, wherein: the first conversion unit is used to, for each media unit, extract the RTP timestamp of the RTP packet in which it is located and calculate the NTP time of the media unit from it together with the time information carried in the RTCP packets of the RTCP stream corresponding to the RTP stream;

and the second conversion unit is used to multiply the NTP time of the media unit by the mobile multimedia broadcast time scale to obtain the mobile multimedia broadcast timestamp of the media unit.
An embodiment of the present invention further provides a method for receiving a scalable video coding service in mobile multimedia broadcast, including:

a receiving terminal monitors, in broadcast channel frames, the location information of the multiplex subframes in which the code streams of the layers of the video stream of a video service are located;

the receiving terminal receives the base layer code stream, or receives the base layer code stream and the corresponding enhancement layer code stream, according to its own video stream processing capability;

when the base layer code stream and the corresponding enhancement layer code stream are received, the code streams of the layers are aligned and merged according to their mobile multimedia broadcast timestamps, the base layer code stream and the corresponding enhancement layer code stream are then decoded, and the video data obtained by merging the base layer code stream and the enhancement layer code stream is output.

Receiving the base layer code stream and the corresponding enhancement layer code stream and aligning and merging the code streams of the layers according to their mobile multimedia broadcast timestamps includes:

the receiving terminal receives the base layer code stream and the enhancement layer code stream and stores them in a buffer;

base layer code stream data belonging to one video access unit is taken out of the buffer, and, taking the mobile multimedia broadcast timestamp of the base layer code stream data as a reference, enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data is taken out and merged with it as data of the same video access unit.

The enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data refers to enhancement layer code stream data whose mobile multimedia broadcast timestamp differs from that of the base layer code stream data by a value within the preset timestamp tolerance range.

An embodiment of the present invention further provides an apparatus for receiving a scalable video coding service in mobile multimedia broadcast, including:
a monitoring unit, which monitors, in broadcast channel frames, the location information of the multiplex subframes in which the code streams of the layers of the video stream of a video service are located;

a receiving unit, which receives the base layer code stream, or receives the base layer code stream and the corresponding enhancement layer code stream, according to its own video stream processing capability;

an alignment and merging unit, which aligns and merges the code streams of the layers according to their mobile multimedia broadcast timestamps; and a decoding unit, which decodes the base layer code stream and the corresponding enhancement layer code stream and outputs the video data obtained by merging the base layer code stream and the enhancement layer code stream.

The receiving unit is further used to store the received base layer code stream and enhancement layer code stream in a buffer;

the alignment and merging unit is used to take base layer code stream data belonging to one video access unit out of the buffer, and, taking the mobile multimedia broadcast timestamp of the base layer code stream data as a reference, take out enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data and merge it with the base layer code stream data as data of the same video access unit.

Taking out the enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data refers to taking out enhancement layer code stream data whose mobile multimedia broadcast timestamp differs from that of the base layer code stream data by a value within the preset timestamp tolerance range.
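The buffering and timestamp-aligned merging performed by the receiving unit and the alignment and merging unit can be sketched as follows; the representation of a unit as a (timestamp, payload) pair and the concrete tolerance value are assumptions made for the example, not details fixed by the document:

```python
# Illustrative sketch of the receiver-side alignment and merge described above.
# TOLERANCE is a hypothetical preset timestamp tolerance value.
TOLERANCE = 225

def merge_access_unit(base_unit, enhancement_buffer, tolerance=TOLERANCE):
    """Given one base layer unit (timestamp, payload), take out of the buffer
    every enhancement layer unit whose broadcast timestamp is within the
    tolerance of the base layer timestamp, and merge them as the data of
    one video access unit."""
    matched = [u for u in enhancement_buffer
               if abs(u[0] - base_unit[0]) <= tolerance]
    for u in matched:                      # remove consumed units from the buffer
        enhancement_buffer.remove(u)
    # Access unit = base layer data followed by its enhancement layer data.
    return {"timestamp": base_unit[0],
            "layers": [base_unit[1]] + [u[1] for u in matched]}

base = (90000, b"base-AU")
enh_buf = [(90010, b"enh-AU"), (135000, b"later-enh")]
au = merge_access_unit(base, enh_buf)
assert au["layers"] == [b"base-AU", b"enh-AU"]   # synchronized unit merged
assert enh_buf == [(135000, b"later-enh")]       # later unit left in the buffer
```

If no enhancement layer unit falls within the tolerance, the access unit simply contains the base layer data alone, matching the fallback of decoding only the base layer.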
An embodiment of the present invention provides a system for achieving synchronization of the multiple layers of a scalable video coding service, including an encoding device, a front-end transmitting device, and a terminal, wherein:

the encoding device is used to encode a video source to generate an SVC code stream containing a base layer and several enhancement layers; the SVC code streams of the multiple layers are encapsulated into multiple RTP (Real-time Transport Protocol) streams and sent out, each RTP stream being accompanied by an RTCP (Real-time Transport Control Protocol) stream; the RTCP streams are used to ensure synchronization of the audio and video services within the base layer in NTP (Network Time Protocol) time and synchronization of the SVC services of the multiple layers in NTP time, that is, to ensure that the base layer code stream and the corresponding enhancement layer code streams at the same sampling instant are synchronized in NTP time and, if an audio stream exists, that the audio stream at the same sampling instant is also synchronized in NTP time with the base layer code stream and the corresponding enhancement layer code streams;

the front-end transmitting device is used to receive the RTP streams and RTCP streams sent by the encoding device, extract the SVC base layer code stream and enhancement layer code streams from the RTP streams, and encapsulate the SVC base layer code stream and enhancement layer code streams, according to the layer to which they belong and in units of video units, into different multiplex subframes of broadcast channel frames; the multiplex subframes also carry the CMMB broadcast timestamps of the video units encapsulated in them, and the location information of the multiplex subframes in which the code streams of the layers of the video stream are located is encapsulated in the broadcast channel frames. On this device, the RTP timestamps of the media data are converted into CMMB broadcast timestamps under a unified time reference, the synchronization of the SVC layered services on the CMMB broadcast timestamps is controlled, and the media data to be sent is broadcast together with its CMMB broadcast timestamps;

the terminal is used to monitor, in broadcast channel frames, the location information of the multiplex subframes in which the code streams of the layers of the video stream of a video service are located, and to receive the base layer code stream, or receive the base layer code stream and the corresponding enhancement layer code streams, according to its own video stream processing capability. The coding units of the different layers are synchronized according to their CMMB broadcast timestamps and merged, after which video decoding and presentation are performed.

An embodiment of the present invention further provides a method for achieving synchronization of the multiple layers of a scalable video coding service, including:

the encoding device encapsulates the SVC code streams of the multiple layers generated by encoding into multiple RTP streams and sends them out, each RTP stream being accompanied by an RTCP stream; the RTCP streams ensure synchronization of the base layer SVC data, the enhancement layer SVC data, and the audio data at the same sampling instant in NTP time;

the front-end transmitting device receives the RTP streams and RTCP streams sent by the encoding device, extracts the encapsulated SVC service data from the RTP streams, encapsulates the SVC base layer code stream and enhancement layer code streams, according to the layer to which they belong, into different multiplex subframes of broadcast channel frames, and encapsulates the location information of the multiplex subframes in which the code streams of the layers of the video stream are located in the broadcast channel frames. On this device, the RTP timestamps of the media data are converted into CMMB broadcast timestamps under a unified time reference, the synchronization of the SVC layered services on the CMMB broadcast timestamps is controlled, and the media data to be sent is broadcast together with its CMMB broadcast timestamps;

the terminal monitors, in broadcast channel frames, the location information of the multiplex subframes in which the code streams of the layers of the video stream of a video service are located, receives the base layer code stream, or receives the base layer code stream and the corresponding enhancement layer code streams, according to its own video stream processing capability, aligns the coding units of the different layers by their CMMB broadcast timestamps, merges them after synchronization, and then performs video decoding and presentation.
On the front-end transmitting device, the synchronization of the SVC layered services on the CMMB broadcast timestamps is performed as follows.

For each multiplex subframe corresponding to each SVC layer:

A1. Receive the input RTP and RTCP streams, and calculate the NTP time corresponding to each video unit from the RTP packet corresponding to the video unit and the time information carried in the RTCP packets of the RTCP stream corresponding to the RTP stream in which the RTP packet is located.

B1. For each video unit, multiply the above NTP time directly by the CMMB time scale, and round the result to the maximum number of bits of the CMMB broadcast timestamp to obtain the CMMB broadcast timestamp of the video unit.

For each multiplex subframe, a start play time is chosen for all the timestamped media units to be encapsulated into that multiplex subframe; the CMMB broadcast timestamp of each video unit is decomposed into two parts, the start play time of the multiplex subframe and the relative play time corresponding to each unit, and the start play time and the relative play time corresponding to each unit are encapsulated into the multiplex subframe.

The CMMB time scale in step B1 indicates the number of CMMB time units per second; according to the CMMB standard Mobile Multimedia Broadcasting Part 2: Multiplexing, the CMMB time scale is 22500. The maximum number of bits of the CMMB broadcast timestamp specified by this standard is 32.
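Steps A1 and B1, together with the decomposition into start play time and relative play time, can be sketched as below. The time scale of 22500 and the 32-bit timestamp width come from the standard cited above; the pairing of an RTP timestamp with an NTP time via an RTCP sender report, and the 90 kHz RTP clock rate, are assumptions typical of video over RTP rather than values given in this document:

```python
CMMB_TIME_SCALE = 22500   # CMMB time units per second, per the cited standard
CMMB_TS_BITS = 32         # maximum CMMB broadcast timestamp width, per the standard

def rtp_to_ntp(rtp_ts, sr_rtp_ts, sr_ntp_seconds, rtp_clock_rate=90000):
    """Step A1 (sketch): map a video unit's RTP timestamp to NTP time using
    the most recent RTCP sender report, which pairs an RTP timestamp
    (sr_rtp_ts) with an NTP time (sr_ntp_seconds). The 90 kHz clock rate
    is an assumed value for video streams."""
    return sr_ntp_seconds + (rtp_ts - sr_rtp_ts) / rtp_clock_rate

def ntp_to_cmmb(ntp_seconds):
    """Step B1: multiply the NTP time by the CMMB time scale and truncate
    the result to the maximum timestamp width."""
    return int(ntp_seconds * CMMB_TIME_SCALE) & ((1 << CMMB_TS_BITS) - 1)

def decompose(cmmb_ts, start_play_time):
    """Split a unit's CMMB broadcast timestamp into the subframe's start
    play time plus the unit's relative play time."""
    return start_play_time, cmmb_ts - start_play_time

# Two layers sampled at the same instant share an NTP time, and therefore
# receive the same CMMB timestamp, even though their RTP timestamps differ.
ntp_base = rtp_to_ntp(180000, 90000, 1000.0)
ntp_enh = rtp_to_ntp(360000, 270000, 1000.0)
assert ntp_base == ntp_enh == 1001.0
assert ntp_to_cmmb(ntp_base) == ntp_to_cmmb(ntp_enh) == 22522500
start, rel = decompose(ntp_to_cmmb(ntp_base), 22500000)
assert (start, rel) == (22500000, 22500)
```

The key design point illustrated here is that the CMMB timestamp is derived from the shared NTP timeline rather than from each stream's independent RTP timeline, which is what gives the otherwise unrelated multiplex subframe streams a common reference.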
The CMMB broadcast timestamp, like the RTP timestamp, is a relative timestamp. The start play time within a multiplex subframe gives the playback time reference for the audio and video units in that subframe, and each unit's relative play time gives its offset from that reference; together they determine the relative playback timing of the units within one multiplex subframe. However, when a service such as SVC is carried over multiple multiplex subframe streams, the CMMB broadcast timestamp is a relative timestamp valid only within a single multiplex subframe stream: the subframes of different streams lack a common timestamp reference, there is no starting reference for the timestamps of an SVC service carried over multiple subframe streams, and the streams therefore cannot be synchronized with one another.
In embodiments of the present invention, the CMMB broadcast timestamp of a video unit is obtained by directly multiplying its NTP time by the CMMB time scale. Since video units sampled at the same instant share the same NTP time, their CMMB broadcast timestamps are guaranteed to be synchronized. The CMMB broadcast timestamp can be represented jointly by a start play time and a relative play time; the start play time is the same within one multiplex subframe, while different multiplex subframes may have the same or different start play times.

In embodiments of the present invention, the starting value of the CMMB broadcast timestamp of the SVC service may be determined from the NTP time of the first video unit in the first multiplex subframe processed (another time may also be used as the starting value), and the CMMB broadcast timestamps of subsequent video units are then determined by the conversion method above.

In this easily implemented way, the embodiments establish a common timestamp reference for an SVC service carried over multiple multiplex subframe streams: for each video unit in the multiplex subframes of the same service, the offset of its converted CMMB broadcast timestamp from the starting value determines the unit's playback time offset relative to that starting value. The invention thus guarantees that SVC layered service data sampled at the same NTP instant carry unified CMMB broadcast timestamps, providing a means of synchronization between the different SVC layers.
With this method, each multiplex subframe corresponding to each SVC layer can be processed independently, with no coupling between them.

When the SVC layered streams carry audio data in addition to video data, the timestamps of the audio data are processed in the same way as those of the video data.
On the terminal, the layered SVC services are synchronized as follows:

A2. Put the received SVC layered service data, together with its timestamps, into a buffer; there may be a single buffer or one per layer. The buffer must be large enough to absorb the transmission time difference between the layers' data: since, for various reasons, SVC layer data sampled at the same instant cannot be guaranteed to arrive simultaneously, let Td be the threshold on the difference between the arrival times of the layers' data; the buffer must then hold the data of all SVC layers over a Td time span.

B2. At specific moments, such as decoding instants at a fixed interval, take the data to be merged from the buffer, then align and merge it: first take the base layer data belonging to one video access unit; then, using the base layer data's timestamp as the reference, take the enhancement layer data whose timestamp corresponds to it, and merge the two as data of the same video access unit.

In step B2, the timing information carried by the RTP and RTCP streams emitted by the encoding device may, for various reasons, fail to guarantee that the base layer data, enhancement layer data and audio data sampled at the same instant have exactly identical NTP times. The CMMB broadcast timestamps converted at the head end may then differ slightly, so the terminal also applies a timestamp tolerance when looking up the enhancement layer data corresponding to a base layer data timestamp.
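The tolerance-based lookup described above can be illustrated as follows; the function name and the tolerance value are assumptions for illustration only.

```python
TOLERANCE = 100  # timestamp tolerance in CMMB time units; value is illustrative

def match_enhancement(base_ts, enhancement_units, tolerance=TOLERANCE):
    """Return the enhancement-layer units whose CMMB broadcast timestamp
    lies within `tolerance` of the base-layer timestamp base_ts."""
    return [u for u in enhancement_units if abs(u["ts"] - base_ts) <= tolerance]

# A unit 10 CMMB ticks away is accepted as part of the same access unit;
# a unit a full second (22500 ticks) away is not.
enh = [{"ts": 44990, "data": b"E0"}, {"ts": 67500, "data": b"E1"}]
same_access_unit = match_enhancement(45000, enh)
```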
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below through embodiments.

Fig. 2 is a schematic diagram of the system of the present invention.
The encoding device encodes the SVC video into a base layer code stream and at least one enhancement layer code stream. The base layer code stream can be decoded on its own; an enhancement layer code stream contains additional information that improves the quality of the lower-layer streams and must be decoded together with the lower layers, including the base layer.

On the timeline, video data is organized in video access units; a typical video access unit is one video frame. In SVC mode, a video access unit at a given time point may be encoded into data of several layers, and the data of those layers may be split across multiple transport channels; the receiving terminal may, as needed, receive several of the transported layers simultaneously, merge them by the video access unit they belong to, and decode and render the result. In this embodiment the encoding device is assumed to be an H.264 SVC encoder whose basic encoded output unit is the Network Abstraction Layer Unit (NALU); several NALUs at the same time point form one video access unit.
For ease of description, this embodiment assumes that the broadcast service contains SVC video and one audio channel. The SVC video uses the spatial scalability mode and is encoded into one QVGA (Quarter VGA, 320×240 pixels) base stream and one VGA (Video Graphics Array, 640×480 pixels) enhancement video stream; the audio is encoded as one audio stream. Note that the described method applies equally to multiple enhancement streams and to other SVC coding modes.
The encoding device encapsulates the base layer code stream, the enhancement layer code stream and the audio stream each into its own RTP stream and sends them out, each accompanied by one RTCP stream. The timestamps of the RTP streams need not depend on one another; each stream has its own RTP time scale and initial timestamp, the time scale being the number of media time units per second. RTCP is used to guarantee that base layer SVC data, enhancement layer SVC data and audio data sampled at the same instant are synchronized in NTP time. For example, the audio time scale may be 48000 and the video time scale 90000. For a given video access unit, after the base layer data is packed into an RTP packet of the base stream its RTP timestamp may be Tbase; after the enhancement layer data is packed into an RTP packet of the enhancement stream its RTP timestamp may be Text; and the audio RTP packet at the synchronized instant may have timestamp Taudio. The RTCP stream corresponding to each of these RTP streams carries, in the SR (sender report) of its RTCP packets, a reference NTP time and the corresponding reference RTP timestamp, and must guarantee that base layer SVC data, enhancement layer SVC data and audio data sampled at the same instant are synchronized in NTP time, i.e. the NTP times corresponding to Tbase, Text and Taudio should agree. For implementation flexibility, "should agree" here may mean that the NTP times corresponding to Tbase, Text and Taudio are approximately equal, with a permitted deviation chosen as needed.
The head-end transmitting device receives the RTP and RTCP streams sent by the encoding device, extracts the encapsulated SVC service data from the RTP streams, encapsulates the base layer and enhancement layer code streams into different multiplex subframes of the broadcast channel frame according to the layer they belong to, and at the same time encapsulates the location information of the multiplex subframes carrying each layer code stream of the video stream into the broadcast channel frame.
Fig. 3 is a schematic diagram of the structure of the broadcast channel frame of the present invention. As shown in Fig. 3, a frequency point F carries 40 time slots, of which slot 0 (multiplex frame 0) is used to transmit control information and slots 1 through 39 are used to transmit service data. The head-end transmitting device places the video base layer code stream, the audio and the data information of video service S into multiplex frame 1, occupying slots 1 through 4, with multiplex subframe number 1. The video enhancement layer code stream is placed into multiplex frame 2, occupying slots 5 and 6, also with multiplex subframe number 1. No other services are carried in multiplex frames 1 and 2. Description information giving the multiplex frame positions carrying each layer code stream (the base layer code stream and its corresponding enhancement layer code streams) is added to the service control information and the Electronic Service Guide (ESG) information; it states that video service S comprises two multiplex subframes: subframe 1 of multiplex frame 1 carries the service's base layer code stream data, and subframe 1 of multiplex frame 2 carries the enhancement layer code stream data.
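The example layout above (slots 1 through 4 for the base layer plus audio, slots 5 and 6 for the enhancement layer) could be described by a structure like the following. The field names are illustrative only and do not follow the actual CMMB control-information syntax.

```python
# Hypothetical sketch of the per-service description carried in
# multiplex frame 0 / ESG for video service S of the example.
service_s = {
    "service": "S",
    "streams": [
        {"layer": "base + audio + data", "multiplex_frame": 1,
         "subframe": 1, "slots": [1, 2, 3, 4]},
        {"layer": "enhancement", "multiplex_frame": 2,
         "subframe": 1, "slots": [5, 6]},
    ],
}

# A QVGA-only terminal tunes just the base-layer entry; a VGA-capable
# terminal tunes both entries listed for the service.
base_slots = [s["slots"] for s in service_s["streams"]
              if s["layer"].startswith("base")]
```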
The head-end transmitting device takes the base layer code stream V1 of video service S and packs it into multiplex subframe 1 of multiplex frame 1, and takes the enhancement layer code stream V2 of service S and packs it into multiplex subframe 1 of multiplex frame 2. It takes the audio code stream and data segment information of service S and adds them to multiplex subframe 1 of multiplex frame 1, i.e. they are carried in the same multiplex subframe as the base layer code stream. The description information giving the multiplex frame positions of the layer code streams is carried in multiplex frame 0 of the broadcast channel frame, indicating to receiving terminals the multiplex frame position of each layer code stream so that they can receive video service S.

When audio and video data are packed into a multiplex subframe, packing is done one audio unit or video unit at a time. Each audio or video unit has a corresponding CMMB broadcast timestamp which, per the CMMB standard, is composed of the subframe's start play time and the unit's relative play time; the start play time and each unit's relative play time are also packed into the multiplex subframe and sent together with the media data.

On the head-end transmitting device, the RTP timestamps of the received media data must be converted into CMMB broadcast timestamps, and during this conversion the synchronization of the SVC layered services on the CMMB broadcast timestamp at the same sampling instant is enforced.
On the head-end transmitting device, the layered SVC services are synchronized on the CMMB broadcast timestamp as follows.

For each multiplex subframe corresponding to each SVC layer:

A. Receive the input RTP and RTCP streams. For each video unit, take the RTP timestamp of the RTP packet that carries it and, using the timing information in the RTCP packets of the RTCP stream corresponding to that RTP stream, compute the NTP time of the video unit.

B. Multiply the NTP time directly by the CMMB time scale to obtain the video unit's CMMB broadcast timestamp. For the multiplex subframe, take one start play time, decompose each video unit's CMMB broadcast timestamp into the start play time plus the unit's relative play time, and pack the start play time and each unit's relative play time into the multiplex subframe.
The NTP time of a video unit in step A may be computed as follows:

(1) For each video unit, take the RTP timestamp of the RTP packet in which it is carried, and take the reference NTP time and the corresponding reference RTP timestamp carried in the SR (sender report) of the corresponding RTCP stream;
(2) Subtract the reference RTP timestamp from the corresponding RTCP stream from the RTP timestamp in the RTP packet, and divide the difference by the timescale to obtain an offset in absolute time; add this offset to the reference NTP time to obtain the NTP time of the RTP packet, which is the NTP time of the video unit:

NTP time = (RTP timestamp - reference RTP timestamp) / timescale + reference NTP time    (formula 1)
The timescale is the number of media time units per second and may differ between media types. For RTP video a 90000 Hz clock is commonly used as the time unit, so the 90000 clock ticks per second are the video timescale; for audio the sampling rate is commonly used as the time unit, so at a sampling rate of 48000 samples per second the audio timescale is 48000.

The CMMB time scale in step B is the number of CMMB time units per second; according to the CMMB standard "Mobile Multimedia Broadcasting Part 2: Multiplexing", it is 22500.
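Formula (1) and the timescale examples above can be sketched as follows; the function name is an assumption.

```python
def rtp_to_ntp(rtp_ts: int, ref_rtp_ts: int, ref_ntp: float, timescale: int) -> float:
    """Formula (1): NTP time = (RTP timestamp - reference RTP timestamp)
    / timescale + reference NTP time, where the reference pair comes from
    the RTCP sender report of the same RTP stream."""
    return (rtp_ts - ref_rtp_ts) / timescale + ref_ntp

# Video at a 90000 Hz clock: a unit 90000 ticks after the sender-report
# reference plays exactly one second after the reference NTP time.
ntp = rtp_to_ntp(rtp_ts=180000, ref_rtp_ts=90000, ref_ntp=1000.0, timescale=90000)
assert ntp == 1001.0
```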
With this method, each multiplex subframe corresponding to each SVC layer can be processed independently, with no coupling between them. The CMMB standard does not specify how the starting value of a service's CMMB broadcast timestamp is determined (the start play time in a multiplex subframe is per subframe, not per service), and there is no existing mechanism guaranteeing synchronization between SVC layered data carried over multiple multiplex subframes. This method converts the NTP time of each SVC video unit into its CMMB broadcast timestamp by direct multiplication with the CMMB time scale, fixes the service's CMMB broadcast timestamp starting value from the NTP time of the first SVC video unit of the first multiplex subframe processed, and determines the CMMB broadcast timestamps of all subsequent SVC video units by the same conversion. In this embodiment, for a given video access unit whose base layer RTP timestamp is Tbase, whose enhancement layer RTP timestamp is Text, and whose synchronized audio RTP timestamp is Taudio, the RTCP packets emitted by the encoder guarantee that the NTP times corresponding to Tbase, Text and Taudio agree (equal or approximately equal, with a small permitted deviation). After processing by the described method, the base layer data, enhancement layer data and synchronized audio data of the video stream carry consistent CMMB broadcast timestamps (equal or approximately equal, with a small permitted deviation) when packed into the multiplex subframes, even though they are packed into different subframes that are each processed independently. The method thus guarantees that SVC layered service data at the same time point carry unified CMMB broadcast timestamps, providing a means of synchronizing the different SVC layers.

As shown in Fig. 4, on reception the terminal monitors multiplex frame 0 of the broadcast channel frame and receives the control information and ESG information needed to receive the multimedia broadcast service correctly, including the location information of the multiplex subframes carrying each layer code stream of the SVC service. According to its own needs, such as its video stream processing capability or the network transmission conditions, the terminal decides to receive the base layer code stream alone or the base layer code stream plus the corresponding enhancement layer code stream. In this embodiment the terminal is assumed to be a netbook able to handle VGA video; it receives both the QVGA base stream and the VGA enhancement stream for processing, and decodes and renders VGA video. After receiving the base and enhancement streams from their different multiplex subframes, the terminal extracts the video units from the subframes, parses out the H.264 basic coding units (NALUs), aligns and synchronizes the NALUs of the different layers by their CMMB broadcast timestamps, merges the NALUs belonging to the same video access unit, and then decodes and renders the video.
On the terminal, the layered SVC services are synchronized as follows:

A. Every video unit has a corresponding CMMB broadcast timestamp, composed of the start play time of its multiplex subframe and the unit's relative play time. Put the NALU data contained in each video unit, together with its timestamp, into a buffer; the base layer and the enhancement layer may each have their own buffer. Since, for various reasons, SVC layer data sampled at the same instant cannot be guaranteed to arrive simultaneously, let Td be the threshold on the arrival time difference of the layers' data, i.e. the maximum allowed gap between the earliest and the latest arriving data belonging to the same video access unit; the buffer must then hold the data of all SVC layers over a Td time span.

B. At a fixed interval, the terminal takes one video access unit from the buffers, aligns and merges the NALU data of its different layers, and feeds the result to the decoder. To keep the decoder running smoothly, the first read may be deferred until the buffers hold no less than the Td time span of tolerance data described above. First the base layer data belonging to one video access unit is taken; then, using the base layer data's timestamp as the reference, the corresponding enhancement layer data is taken, treated as data of the same video access unit, and merged with the base layer data; the merged data is fed to the decoder for decoding as one complete video access unit.
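The buffering and merge loop of steps A and B can be sketched as follows. This is an illustrative simplification (function names and the tolerance value are assumed) that omits the initial Td pre-buffering and audio handling.

```python
from collections import deque

TOLERANCE = 100  # CMMB timestamp tolerance; value is illustrative

def pop_access_unit(base_buf: deque, enh_buf: deque, tol: int = TOLERANCE):
    """Take one complete video access unit: the next base-layer NALUs plus
    the enhancement-layer NALUs whose timestamp matches within `tol`.
    Enhancement units older than the accepted window are stale and are
    silently discarded."""
    if not base_buf:
        return None
    ts, nalus = base_buf.popleft()
    merged = list(nalus)
    while enh_buf and enh_buf[0][0] <= ts + tol:
        ets, enalus = enh_buf.popleft()
        if abs(ets - ts) <= tol:
            merged.extend(enalus)  # same access unit: base NALUs first
    return ts, merged

base = deque([(22500, [b"B0"])])
enh = deque([(22450, [b"E0"]), (45000, [b"E1"])])
ts, access_unit = pop_access_unit(base, enh)
```

The merged list, base NALUs first, is what would be handed to the SVC decoder as one access unit.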
In step B, the timing information carried by the RTP and RTCP streams emitted by the encoding device may, for various reasons, fail to guarantee that the base layer data, enhancement layer data and audio data sampled at the same instant have exactly identical NTP times. The CMMB broadcast timestamps converted at the head end may then differ slightly, so the terminal also applies a timestamp tolerance when looking up the enhancement layer data corresponding to a base layer data timestamp.
The embodiments of the present invention apply not only to the CMMB system but also to other mobile multimedia broadcast systems. The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection.
Industrial applicability

The method of the embodiments of the present invention applies synchronized CMMB broadcast timestamps to the SVC data of the different layers, guaranteeing synchronization between the layered data.

Claims
1. A transmission method, comprising:

encoding a video service to generate multiple media streams; encapsulating the multiple media streams, in units of media units, into different multiplex subframes of a broadcast channel frame, the multiplex subframes also carrying the mobile multimedia broadcast timestamp of each media unit encapsulated in them, with the mobile multimedia broadcast timestamps of media streams at the same sampling instant synchronized; encapsulating the location information of the multiplex subframes in which the media streams are located into the broadcast channel frame; and sending the broadcast channel frame to a receiving terminal; wherein the media streams comprise a base layer code stream and corresponding enhancement layer code stream generated by encoding the video stream of the video service, or comprise an audio stream of the video service together with the base layer code stream and corresponding enhancement layer code stream generated by encoding the video stream of the video service; thereby achieving synchronized transmission of a scalable video coding service in a mobile multimedia broadcast.
2. The method of claim 1, wherein the media streams are encapsulated into the multiplex subframes as follows:

encapsulating the multiple media streams into multiple Real-time Transport Protocol (RTP) streams, wherein each RTP stream is accompanied by one Real-time Transport Control Protocol (RTCP) stream, and the RTCP streams guarantee Network Time Protocol (NTP) time synchronization of media streams at the same sampling instant;

extracting the media streams encapsulated in the RTP streams, encapsulating the media streams, in units of media units, into different multiplex subframes of the broadcast channel frame, converting the RTP timestamps of the media streams into NTP times, then converting the NTP times into mobile multimedia broadcast timestamps under a unified time reference, and encapsulating each mobile multimedia broadcast timestamp into the multiplex subframe in which its corresponding media unit is located.
3. The method of claim 2, wherein RTP timestamps are converted into mobile multimedia broadcast timestamps as follows: for each media unit, taking the RTP timestamp of the RTP packet in which it is located and, using the timing information in the RTCP packets of the RTCP stream corresponding to that RTP stream, computing the NTP time of the media unit;

multiplying the NTP time of the media unit by the mobile multimedia broadcast time scale to obtain the mobile multimedia broadcast timestamp of the media unit.
4. The method of claim 1, wherein the synchronization of the mobile multimedia broadcast timestamps of media streams at the same sampling instant means that the difference between the mobile multimedia broadcast timestamp values of media streams at the same sampling instant lies within a preset timestamp tolerance.
5. A transmission system, comprising an encoding device and a head-end transmitting device, wherein:

the encoding device is configured to encode a video service to generate multiple media streams, the media streams comprising a base layer code stream and corresponding enhancement layer code stream generated by encoding the video stream of the video service, or comprising an audio stream of the video service together with the base layer code stream and corresponding enhancement layer code stream generated by encoding the video stream of the video service;

the head-end transmitting device is configured to: encapsulate the media streams, in units of media units, into different multiplex subframes of a broadcast channel frame; encapsulate into the multiplex subframes the mobile multimedia broadcast timestamp of each media unit encapsulated in them, with the mobile multimedia broadcast timestamps of media streams at the same sampling instant synchronized; encapsulate the location information of the multiplex subframes in which the media streams are located into the broadcast channel frame; and send the broadcast channel frame to a receiving terminal;

thereby achieving synchronized transmission of a scalable video coding service in a mobile multimedia broadcast.
6. The system according to claim 5, wherein
the encoding device comprises an encoding unit and an encapsulating unit, wherein:
the encoding unit is configured to encode a video service to generate multiple media streams;
the encapsulating unit is configured to encapsulate the multiple media streams into multiple Real-time Transport Protocol (RTP) code streams, wherein each RTP code stream is accompanied by a Real-time Transport Control Protocol (RTCP) code stream, and the RTCP code streams guarantee Network Time Protocol (NTP) time synchronization of the media streams at the same sampling instant;
the front-end transmitting device comprises a first encapsulating unit, a second encapsulating unit, a converting unit, a third encapsulating unit, and a sending unit, wherein:
the first encapsulating unit is configured to extract the media streams encapsulated in the RTP code streams and encapsulate them, in units of media units, in different multiplex subframes of a broadcast channel frame;
the second encapsulating unit is configured to encapsulate the location information of the multiplex subframes carrying the media streams in the broadcast channel frame;
the converting unit is configured to convert the RTP timestamps of the media streams into NTP time, and then convert the NTP time into mobile multimedia broadcast timestamps under a unified time reference; the third encapsulating unit is configured to encapsulate each mobile multimedia broadcast timestamp into the multiplex subframe in which its corresponding media unit is located; the sending unit is configured to send the broadcast channel frame to the receiving terminal.
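The transmitter-side layout of claims 5 and 6 — one multiplex subframe per stream, with the subframe positions recorded in the broadcast channel frame — can be illustrated with a minimal Python sketch. The data structures and names here (MultiplexSubframe, build_broadcast_frame, a dict-based frame) are hypothetical simplifications for illustration, not the actual CMMB subframe syntax.

```python
from dataclasses import dataclass, field

@dataclass
class MultiplexSubframe:
    """One multiplex subframe: the media units of a single stream,
    each unit paired with its mobile multimedia broadcast timestamp.
    (A simplified stand-in for the real broadcast subframe layout.)"""
    stream_id: str
    units: list = field(default_factory=list)  # [(mmb_timestamp, payload), ...]

def build_broadcast_frame(streams):
    """Place each stream (base layer, enhancement layer, audio) in its
    own multiplex subframe, and record each subframe's position so a
    receiver can locate the layers it wants (claims 5 and 9)."""
    subframes = []
    positions = {}
    for stream_id, units in streams.items():
        positions[stream_id] = len(subframes)   # location information
        subframes.append(MultiplexSubframe(stream_id, list(units)))
    return {"positions": positions, "subframes": subframes}
```

A terminal that can only decode the base layer would read positions["base"] and skip the enhancement subframe entirely, which is the selective-reception behavior the receiving claims rely on.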
7. The system according to claim 6, wherein
the converting unit comprises a first converting unit and a second converting unit, wherein:
the first converting unit is configured to, for each media unit, take the RTP timestamp of the RTP packet carrying the media unit and, combining it with the time information of the RTCP packets transmitted in the RTCP code stream corresponding to that RTP code stream, calculate the NTP time of the media unit;
the second converting unit is configured to multiply the NTP time of the media unit by the mobile multimedia broadcast timescale to obtain the mobile multimedia broadcast timestamp of the media unit.
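The two-step conversion of claim 7 can be sketched in a few lines of Python: the first step is the standard RTP-to-NTP mapping using the NTP/RTP timestamp pair carried in an RTCP Sender Report (RFC 3550), and the second is the multiplication of claims 3 and 7. The function names and the timescale value used in the example are assumptions for illustration; the patent does not fix a particular timescale here.

```python
def rtp_to_ntp(rtp_ts, sr_ntp, sr_rtp, clock_rate):
    """Map an RTP timestamp to NTP time (seconds) using the most recent
    RTCP Sender Report, which pairs an NTP time (sr_ntp) with the RTP
    timestamp at that instant (sr_rtp). 32-bit RTP timestamp wrap-around
    is ignored in this sketch."""
    return sr_ntp + (rtp_ts - sr_rtp) / clock_rate

def ntp_to_mmb(ntp_seconds, timescale):
    """Second converting unit: multiply NTP time by the mobile multimedia
    broadcast timescale to get the broadcast timestamp (integer ticks)."""
    return int(ntp_seconds * timescale)
```

Because every layer's RTCP stream is anchored to the same NTP wall clock, units sampled at the same instant map to (nearly) equal broadcast timestamps regardless of which encoder produced them, which is what makes the receiver-side alignment possible.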
8. The system according to claim 5, 6 or 7, wherein the synchronization of the mobile multimedia broadcast timestamps of media streams at the same sampling instant means that the difference between the mobile multimedia broadcast timestamp values of the media streams at the same sampling instant is within a preset timestamp tolerance range.
9. A receiving method, comprising:
a receiving terminal monitoring, in a broadcast channel frame, the location information of the multiplex subframes in which the code streams of each layer of the video stream of a video service are located;
the receiving terminal receiving, according to its own video stream processing capability, the base layer code stream, or the base layer code stream and the corresponding enhancement layer code stream;
when receiving the base layer code stream and the corresponding enhancement layer code stream, aligning and merging the code streams of each layer according to their mobile multimedia broadcast timestamps, then decoding the base layer code stream and the corresponding enhancement layer code stream, and outputting the video data obtained by merging the base layer code stream and the enhancement layer code stream; thereby implementing scalable video coding service reception in mobile multimedia broadcasting.
10. The method according to claim 9, wherein receiving the base layer code stream and the corresponding enhancement layer code stream and aligning and merging the code streams of each layer according to the mobile multimedia broadcast timestamps comprises:
the receiving terminal storing the received base layer code stream and enhancement layer code stream in a buffer;
taking base layer code stream data belonging to one video access unit out of the buffer and, taking the mobile multimedia broadcast timestamp of the base layer code stream data as a reference, taking out the enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data, and merging them as data of the same video access unit.
11. The method according to claim 10, wherein
the enhancement layer code stream data synchronized with the mobile multimedia broadcast timestamp of the base layer code stream data refers to enhancement layer code stream data whose mobile multimedia broadcast timestamp differs from that of the base layer code stream data by a value within the preset timestamp tolerance range.
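The buffer-and-merge procedure of claims 10 and 11 amounts to a tolerance-windowed join on timestamps: for each base-layer access unit, pick the enhancement-layer data whose timestamp lies within the tolerance, else fall back to base-layer-only decoding. A minimal Python sketch, with illustrative names and byte-concatenation standing in for real access-unit merging:

```python
def merge_access_units(base_units, enh_units, tolerance):
    """base_units / enh_units: lists of (mmb_timestamp, payload) drawn
    from the receive buffers. For each base-layer unit, attach the first
    enhancement-layer unit whose timestamp differs by at most `tolerance`
    ticks; otherwise the base layer is decoded on its own."""
    merged = []
    for b_ts, b_data in base_units:
        match = next((e_data for e_ts, e_data in enh_units
                      if abs(e_ts - b_ts) <= tolerance), None)
        merged.append((b_ts, b_data if match is None else b_data + match))
    return merged
```

The tolerance absorbs the small timestamp differences left over from independent encoding and clock conversion of the two layers; a unit with no in-tolerance partner degrades gracefully to base-layer quality rather than stalling the decoder.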
12. A receiving apparatus, comprising:
a monitoring unit, configured to monitor, in a broadcast channel frame, the location information of the multiplex subframes in which the code streams of each layer of the video stream of a video service are located;
a receiving unit, configured to receive, according to its own video stream processing capability, the base layer code stream, or the base layer code stream and the corresponding enhancement layer code stream;
an alignment and merging unit, configured to align and merge the code streams of each layer according to their mobile multimedia broadcast timestamps;
a decoding unit, configured to decode the base layer code stream and the corresponding enhancement layer code stream and output the video data obtained by merging the base layer code stream and the enhancement layer code stream;
thereby implementing scalable video coding service reception in mobile multimedia broadcasting.
13. The apparatus according to claim 12, wherein
the receiving unit is further configured to store the received base layer code stream and enhancement layer code stream in a buffer;
the alignment and merging unit is configured to take base layer code stream data belonging to one video access unit out of the buffer and, taking the mobile multimedia broadcast timestamp of the base layer code stream data as a reference, take out the enhancement layer code stream data whose mobile multimedia broadcast timestamp is synchronized with that of the base layer code stream data, and merge them as data of the same video access unit.
14. The apparatus according to claim 13, wherein
taking out the enhancement layer code stream data synchronized with the mobile multimedia broadcast timestamp of the base layer code stream data refers to taking out enhancement layer code stream data whose mobile multimedia broadcast timestamp differs from that of the base layer code stream data by a value within the preset timestamp tolerance range.
PCT/CN2011/076622 2010-09-17 2011-06-30 System and method for realizing synchronous transmission and reception of scalable video coding service WO2012034442A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010290881.7A CN101951506B (en) 2010-09-17 2010-09-17 System and method for realizing synchronous transmitting and receiving of scalable video coding service
CN201010290881.7 2010-09-17

Publications (1)

Publication Number Publication Date
WO2012034442A1 true WO2012034442A1 (en) 2012-03-22

Family

ID=43454846

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/076622 WO2012034442A1 (en) 2010-09-17 2011-06-30 System and method for realizing synchronous transmission and reception of scalable video coding service

Country Status (2)

Country Link
CN (1) CN101951506B (en)
WO (1) WO2012034442A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101951506B (en) * 2010-09-17 2014-03-12 中兴通讯股份有限公司 System and method for realizing synchronous transmitting and receiving of scalable video coding service
CN102480634B (en) * 2010-11-24 2015-12-16 中兴通讯股份有限公司 The method, apparatus and system that in Mobile Multimedia Broadcasting, classified service is synchronous
CN102510488B (en) * 2011-11-04 2015-11-11 播思通讯技术(北京)有限公司 A kind of utilize broadcast characteristic to carry out audio-visual synchronization method and device
CN102665108A (en) * 2012-04-10 2012-09-12 中国联合网络通信集团有限公司 Processing method, processing device and processing system of mobile video service
CN102761776B (en) * 2012-08-01 2015-01-14 重庆大学 Video and audio synchronizing method of P2PVoD (peer-to-peer video on demand) system based on SVC (scalable video coding)
CN106303673B (en) 2015-06-04 2021-01-22 中兴通讯股份有限公司 Code stream alignment and synchronization processing method, transmitting and receiving terminal and communication system
CN106507112B (en) * 2015-09-07 2020-05-12 中兴通讯股份有限公司 Code stream processing method, device and system
CN105611222B (en) * 2015-12-25 2019-03-15 北京紫荆视通科技有限公司 Audio data processing method, device, controlled device and system
CN106231317A (en) * 2016-09-29 2016-12-14 三星电子(中国)研发中心 Video processing, coding/decoding method and device, VR terminal, audio/video player system
CN112564837B (en) * 2019-09-25 2022-05-06 杭州海康威视数字技术股份有限公司 Multi-path data flow synchronization method and multi-path data flow synchronization step-by-step transmission system
CN112825513B (en) * 2019-11-21 2023-08-22 深圳市中兴微电子技术有限公司 Method, device, equipment and storage medium for transmitting multipath data
CN112383816A (en) * 2020-11-03 2021-02-19 广州长嘉电子有限公司 ATSC system signal analysis method and system based on android system intervention

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101179736A (en) * 2006-11-08 2008-05-14 中兴通讯股份有限公司 Method for converting transmission stream program to China mobile multimedia broadcasting program
CN101394555A (en) * 2008-10-24 2009-03-25 清华大学 High error tolerant low time delay video transmission method and device suitable for deep space communication
CN101742246A (en) * 2009-12-01 2010-06-16 中广传播有限公司 System and method for realizing interactive service of mobile multimedia broadcast
CN101951506A (en) * 2010-09-17 2011-01-19 中兴通讯股份有限公司 System and method for realizing synchronous transmitting and receiving of scalable video coding service

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN1868213B (en) * 2003-09-02 2010-05-26 索尼株式会社 Content receiving apparatus, video/audio output timing control method, and content providing system
EP1860884A1 (en) * 2006-05-26 2007-11-28 BRITISH TELECOMMUNICATIONS public limited company Video processing
CN101359974B (en) * 2007-07-31 2012-09-19 北京新岸线移动通信技术有限公司 High-efficient source adaptation method suitable for LDPC block coding in T-MMB system

Also Published As

Publication number Publication date
CN101951506A (en) 2011-01-19
CN101951506B (en) 2014-03-12

Similar Documents

Publication Publication Date Title
WO2012034442A1 (en) System and method for realizing synchronous transmission and reception of scalable video coding service
US8009742B2 (en) Method and system for retransmitting internet protocol packet for terrestrial digital multimedia broadcasting service
KR101639358B1 (en) Transmission apparatus and method, and reception apparatus and method for providing 3d service using the content and additional image seperately transmitted with the reference image transmitted in real time
JP5543590B2 (en) Hierarchical transmission method, hierarchical reception method, hierarchical transmission device, and hierarchical reception device in mobile multimedia broadcasting system
US8396082B2 (en) Time-interleaved simulcast for tune-in reduction
US8422564B2 (en) Method and apparatus for transmitting/receiving enhanced media data in digital multimedia broadcasting system
CN101895750B (en) Set-top box and PC-oriented real-time streaming media server and working method
CN102685588A (en) Decoder and method at the decoder for synchronizing rendering of contents received through different networks
KR20130120422A (en) Method and apparatus for tranmiting and receiving data multimedia transfer system
CN102356619A (en) Modified stream synchronization
US20080301742A1 (en) Time-interleaved simulcast for tune-in reduction
WO2009114557A1 (en) System and method for recovering the decoding order of layered media in packet-based communication
EP2276192A2 (en) Method and apparatus for transmitting/receiving multi - channel audio signals using super frame
WO2012079402A1 (en) Method, system and apparatus for transmitting multimedia data
CN100479529C (en) Conversion method of multiplexing protocols in broadcast network
WO2012034441A1 (en) Method and system for achieving scalable video coding service cooperation transmission
EP1855402A1 (en) Transmission, reception and synchronisation of two data streams
WO2012068898A1 (en) Method, apparatus and system for synchronizing tiered service in mobile multimedia broadcasting
CN112272316B (en) Multi-transmission code stream synchronous UDP distribution method and system based on video display timestamp
KR20130056829A (en) Transmitter/receiver for 3dtv broadcasting, and method for controlling the same
KR101745652B1 (en) Broadcasting transmitter and receiver for rapid receiving of decoding information, and method thereof
KR100881312B1 (en) Apparatus and Method for encoding/decoding multi-channel audio signal, and IPTV thereof
KR20080023902A (en) Internet protocol packet re-transporting apparatus for digital multimedia broadcasting service
KR100950771B1 (en) Apparatus and method for transmitting and receiving the broadcasting signal
CN101179737B (en) Method of converting compound protocol in multimedia broadcasting network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 11824512; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 11824512; Country of ref document: EP; Kind code of ref document: A1)