WO2009104869A1 - Method and apparatus for SVC video and AAC audio synchronization using NPT

Publication number: WO2009104869A1
Authority: WIPO (PCT)
Application number: PCT/KR2008/007859
Other languages: French (fr)
Inventors: Soon-Heung Jung, Jeong Ju Yoo, Jin Woo Hong, Kwang-Deok Seo, Wonsup Chi
Applicant: Electronics and Telecommunications Research Institute
Priority applications: KR10-2008-0015154; KR10-2008-0025042 (granted as KR100916505B1)
Related application: US 12/735,828 (granted as US8675727B2)

Classifications (CPC, leaf classes)

    • H04N21/2368: Multiplexing of audio and video streams
    • H04N21/4307: Synchronising display of multiple content streams, e.g. synchronisation of audio and video output
    • H04N21/4341: Demultiplexing of audio and video streams
    • H04N21/6437: Real-time Transport Protocol [RTP]
    • H04N21/64792: Controlling the complexity of the content stream, e.g. by dropping packets
    • H04N21/8455: Structuring of content involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • H04N21/8547: Content authoring involving timestamps for synchronizing content

Abstract

A method of supporting synchronization of Scalable Video Coding (SVC) information and Advanced Audio Coding (AAC) information using a Normal Play Time (NPT), the method including: receiving video information using a decoding apparatus; receiving audio information using the decoding apparatus; calculating the NPT of the video information using a Real-time Transport Protocol (RTP) time stamp included in the received video information; calculating the NPT of the audio information using the RTP time stamp included in the received audio information; comparing the NPT of the video information and the NPT of the audio information to calculate a difference value; determining whether the calculated difference value is included in a specific synchronization region; and outputting the audio information and the video information when the calculated difference value is determined to be included in the specific synchronization region.

Description

Description

METHOD AND APPARATUS FOR SVC VIDEO AND AAC AUDIO SYNCHRONIZATION USING NPT

Technical Field

[1] The present invention relates to a method and apparatus for supporting synchronization of Scalable Video Coding (SVC) information and Advanced Audio Coding (AAC) information using a Normal Play Time (NPT), and more particularly, to a method and apparatus for supporting synchronization with respect to video and audio using an NPT induced from time stamp information to be recorded in a header of a Real-time Transport Protocol (RTP) packet when performing RTP packetization of the SVC information and the AAC information and transmitting the SVC information and the AAC information in an Internet Protocol (IP) network such as the Internet.

[2] This work was supported by the IT R&D program of MIC/IITA [2005-S-103-03, Development of Ubiquitous Content Access Technology for Convergence of Broadcasting and Communications].

Background Art

[3] Generally, a Real-time Transport Protocol (RTP) packet is used for transmitting media data in order to transmit video/audio using an Internet Protocol (IP) network, and an RTP Control Protocol (RTCP) packet is used for secondarily cooperating with the RTP packet.

[4] In particular, one of the important functions of the RTCP packet is to provide media synchronization information. Since video and audio are different media, the media sampling rates used to acquire an access unit, the unit of RTP packetization, differ from each other.

[5] Accordingly, the video and the audio need to be transmitted using separate RTP sessions. The information used for synchronization in the RTP header is the "time stamp" field, whose value is independently generated for each video/audio access unit based on the sampling rate of the video or the audio.

[6] Since the video and the audio generate their "time stamp" values independently, synchronization between the video and the audio cannot be performed using the "time stamp" information alone. Accordingly, time information to which both the video stream and the audio stream may commonly refer is required for providing synchronization between the video and the audio.

[7] A method of providing the common time information uses an RTCP Sender Report (SR) packet. Its "Normal Play Time (NPT) time stamp" field provides the common time information to which the video and the audio commonly refer, and its "RTP time stamp" field records the RTP time stamp of the video or the audio corresponding to the "NPT time stamp".

[8] Accordingly, the RTP time stamp values by which synchronization between the video and the audio is performed may be estimated through the medium of the "NPT time stamp". An RTCP session is created for each of the video session and the audio session, and its traffic is kept within 5% of the total traffic. Each time an RTCP packet is periodically transmitted, the RTP time stamp of each medium corresponding to the NPT time stamp is recorded in the packet and transmitted, enabling a receiver to acquire the information required for synchronization.
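For contrast with the method proposed later in this document, the legacy SR-based mapping of paragraphs [7] and [8] can be sketched as follows. This is an illustrative reconstruction only: the function and parameter names are mine, and the common ("NPT") time stamp is treated simply as seconds on the shared time line.

```python
def common_time(rtp_ts, sr_common_ts, sr_rtp_ts, clock_rate):
    """Map an RTP time stamp of one medium onto the common time line.

    sr_common_ts : common ("NPT") time stamp from the latest SR, in seconds
    sr_rtp_ts    : RTP time stamp recorded in the same SR for this medium
    clock_rate   : RTP clock rate of the medium (e.g. 90000 for video)
    """
    return sr_common_ts + (rtp_ts - sr_rtp_ts) / clock_rate

# The video SR said: common time 10.0 s corresponds to RTP time stamp 900000.
# A frame stamped 990000 therefore plays at common time 11.0 s.
t = common_time(990000, 10.0, 900000, 90000)
```

A receiver must run this mapping per medium, which is why each session needs its own periodic SR packets, the overhead the invention removes.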

[9] As described above, since the legacy media synchronization method requires both the "time stamp" information of the RTP packet and the periodic transmission of RTCP SR packets providing the NPT time stamp value, its processing is complex.

[10] In particular, when the amount of traffic on the network is excessive, network congestion may worsen due to the RTCP SR packet transmissions.

Disclosure of Invention

Technical Problem

[11] The present invention provides a method of calculating a Normal Play Time (NPT) of audio information and video information using a Real-time Transport Protocol (RTP) time stamp.

[12] The present invention also provides a method of inducing an NPT from a time stamp value with respect to a received video and a received audio to provide synchronization between two media.

[13] The present invention also provides a method of inducing an NPT using only an RTP time stamp by eliminating a separate need for transmitting and processing an RTP Control Protocol Sender Report (RTCP SR) packet of video information and audio information.

[14] The present invention also provides a method of reducing the number of User Datagram Protocol (UDP) ports required for transmitting RTCP packets, and of reducing the amount of control traffic entering the network, since RTCP packet transmission becomes unnecessary.

Technical Solution

[15] According to an aspect of the present invention, there is provided a method of extracting a Normal Play Time (NPT) of Scalable Video Coding (SVC) information using a Real-time Transport Protocol (RTP) time stamp, the method including: receiving video information from a decoding apparatus; extracting a specific output screen RTP time stamp of the video information; calculating a difference value by subtracting a first output screen RTP time stamp of the video information from the extracted specific output screen RTP time stamp of the video information; and defining, as the NPT of the video information, a value calculated by dividing the difference value by a sampling rate with respect to an access unit of the video information.

[16] According to another aspect of the present invention, there is provided a method of extracting an NPT of Advanced Audio Coding (AAC) information using an RTP time stamp, the method including: receiving audio information from a decoding apparatus; extracting a specific output screen RTP time stamp of the audio information; calculating a difference value by subtracting a first output screen RTP time stamp of the audio information from the extracted specific output screen RTP time stamp of the audio information; and defining, as the NPT of the audio information, a value calculated by dividing the difference value by a sampling rate with respect to an access unit of the audio information.

[17] According to still another aspect of the present invention, there is provided a method of supporting synchronization of SVC information and AAC information using an NPT, the method including: receiving video information using a decoding apparatus; receiving audio information using the decoding apparatus; calculating the NPT of the video information using an RTP time stamp included in the received video information; calculating the NPT of the audio information using the RTP time stamp included in the received audio information; comparing the NPT of the video information and the NPT of the audio information to calculate a difference value; determining whether the calculated difference value is included in a specific synchronization region; and outputting the audio information and the video information when the calculated difference value is determined to be included in the specific synchronization region.

[18] In an aspect of the present invention, the method further includes: determining a display interval of the video information and the audio information to adjust the display interval between screens of the video information when the calculated difference value is determined to be excluded from the specific synchronization region.

[19] According to yet another aspect of the present invention, there is provided an apparatus for supporting synchronization of SVC information and AAC information using an NPT, the apparatus including: an information receiving unit to receive video information and audio information using a decoding apparatus; a video information analysis unit to calculate the NPT of the video information using an RTP time stamp included in the received video information; an audio information analysis unit to calculate the NPT of the audio information using the RTP time stamp included in the received audio information; a calculation unit to compare the NPT of the video information and the NPT of the audio information to calculate a difference value; a determination unit to determine whether the calculated difference value is included in a specific synchronization region; and an output unit to output the audio information and the video information when the calculated difference value is determined to be included in the specific synchronization region.

[20] In an aspect of the present invention, the apparatus further includes: a display interval adjustment unit to determine a display interval of the video information and the audio information to adjust the display interval between screens of the video information when the calculated difference value is determined to be excluded from the specific synchronization region.

Brief Description of the Drawings

[21] FIG. 1 illustrates a process of playing a single audio frame as Pulse Code Modulation (PCM) data and inputting and outputting the PCM data to a wave-out buffer after the single audio frame is decoded according to an exemplary embodiment of the present invention;

[22] FIG. 2 is a flowchart illustrating a method of extracting a Normal Play Time (NPT) of Scalable Video Coding (SVC) information using a Real-time Transport Protocol (RTP) time stamp according to an exemplary embodiment of the present invention;

[23] FIG. 3 is a flowchart illustrating a method of extracting an NPT of Advanced Audio Coding (AAC) information using an RTP time stamp according to an exemplary embodiment of the present invention;

[24] FIG. 4 is a block diagram illustrating a configuration of an apparatus for supporting synchronization of SVC information and AAC information using an NPT according to an exemplary embodiment of the present invention;

[25] FIG. 5 is a block diagram illustrating a synchronization algorithm of video information and audio information using an NPT according to an exemplary embodiment of the present invention;

[26] FIG. 6 is a flowchart illustrating NPT processing for synchronization of audio information and video information according to an exemplary embodiment of the present invention; and

[27] FIG. 7 is a flowchart illustrating a method of supporting synchronization of SVC information and AAC information using an NPT according to an exemplary embodiment of the present invention.

Mode for the Invention

[28] Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.

[29] An apparatus for supporting synchronization according to an exemplary embodiment of the present invention is based on an apparatus for synchronizing video information and audio information to process a piece of media information.

[30] An exemplary embodiment of the present invention uses a Normal Play Time (NPT) acquired from a Real-time Transport Protocol (RTP) time stamp in order to match synchronization of Scalable Video Coding (SVC) information and Advanced Audio Coding (AAC) information.

[31] Accordingly, an exemplary embodiment of the present invention discloses a method of inducing each NPT using only the RTP time stamp included in the SVC information and the AAC information, and the method is described below with reference to related Equations.

[32] First, a method of inducing the NPT of the SVC information (hereinafter, referred to as 'video information') is disclosed.

[33] In an exemplary embodiment of the present invention, NPT_V^k, corresponding to the NPT of the k-th video screen at which the video information received from a decoding apparatus is outputted to a display apparatus, may be induced from RTP time stamp information by Equation 1:

[34] [Equation 1]

[35] NPT_V^k = (RTPT_V^k - RTPT_V^1) / SR_V

[36] where RTPT_V^1 denotes the RTP time stamp of the first output screen (an Instantaneous Decoding Refresh (IDR) picture), RTPT_V^k denotes the RTP time stamp of the k-th output screen, and SR_V denotes the sampling rate with respect to an access unit of video in the transmitter.

[37] 90 kHz is generally applied as SR_V for the video information; however, SR_V is not limited to this value, and the time stamp value for each screen is generated based on SR_V.

[38] Since the output unit of the video information is a discrete individual screen, the NPT may easily be acquired for each output screen as described above.
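As a sketch, Equation 1 reduces to a one-line function; the 30 frames/s example below is mine, not taken from the text.

```python
def npt_video(rtpt_k, rtpt_first, sr_v=90000):
    """Equation 1: NPT of the k-th output screen, in seconds.

    rtpt_k     : RTP time stamp of the k-th output screen
    rtpt_first : RTP time stamp of the first output screen (the IDR picture)
    sr_v       : video sampling rate; 90 kHz is typical but not required
    """
    return (rtpt_k - rtpt_first) / sr_v

# At 30 frames/s the RTP time stamp advances 3000 ticks per screen, so a
# screen 30 steps after the IDR picture has an NPT of 1.0 second.
npt = npt_video(100000 + 30 * 3000, 100000)
```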

[39] However, since the output unit of the audio information corresponds to a consecutive Pulse Code Modulation (PCM) data block, the output unit may not be classified and the NPT may not be directly acquired. In order to solve the above-described problem, an exemplary embodiment of the present invention discloses a method of acquiring the NPT with respect to the audio information using the size of the wave-out buffer remaining before the PCM data is outputted.

[40] FIG. 1 illustrates a process of playing a single audio frame as PCM data and inputting and outputting the PCM data to a wave-out buffer after the single audio frame is decoded according to an exemplary embodiment of the present invention.

[41] As illustrated in FIG. 1, the audio information is played as PCM data that is inputted to and outputted from the wave-out buffer: the audio compression data extracted from an RTP packet for each frame is periodically decoded into PCM data, and the decoded PCM data is consecutively stored in the wave-out buffer.

[42] A PCM data block stored in the wave-out buffer is transmitted to an output device, and is outputted to a speaker by a device driver. The size of the wave-out buffer is always set to a constant value l_buff for audio output at a continuously constant speed.

[43] An exemplary embodiment of the present invention may estimate RTPT_A^s, corresponding to the RTP time stamp of the s-th PCM data block to be outputted from the wave-out buffer, based on the above-described process of processing audio data, using Equation 2:

[44] [Equation 2]

[45] RTPT_A^s = RTPT_A^n - l_buff x SR_A

[46] where RTPT_A^n denotes the RTP time stamp value of the n-th PCM data inputted into the wave-out buffer at the time when RTPT_A^s is calculated, and SR_A denotes the sampling rate with respect to a frame corresponding to a basic access unit of audio.

[47] A frequency of up to a maximum of 48 kHz may be applied to the AAC information; however, the AAC information is not limited to this value.

[48] Accordingly, NPT_A^s, corresponding to the NPT of the s-th PCM data block to be directly outputted to the speaker, may be calculated using Equation 3:

[49] [Equation 3]

[50] NPT_A^s = (RTPT_A^s - RTPT_A^1) / SR_A

[51] where RTPT_A^1 denotes the time stamp value of the PCM data block that is first outputted.
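Equations 2 and 3 can be sketched together. One assumption here is mine: l_buff is expressed in seconds of buffered audio, so that l_buff x SR_A converts it into time stamp ticks, consistent with the multiplication described in the text.

```python
def rtpt_audio_block(rtpt_n, l_buff, sr_a):
    """Equation 2: RTP time stamp of the s-th PCM block about to be output.

    rtpt_n : RTP time stamp of the n-th PCM data put into the wave-out buffer
    l_buff : wave-out buffer size, assumed here to be in seconds of audio
    sr_a   : audio sampling rate (up to 48 kHz for AAC)
    """
    return rtpt_n - l_buff * sr_a

def npt_audio(rtpt_s, rtpt_first, sr_a):
    """Equation 3: NPT of the s-th PCM block, in seconds."""
    return (rtpt_s - rtpt_first) / sr_a

# A 0.5 s buffer at 48 kHz: the block leaving the buffer is 24000 ticks
# older than the newest PCM data entering it.
npt = npt_audio(rtpt_audio_block(480000, 0.5, 48000), 0, 48000)
```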

[52] As described above, an exemplary embodiment of the present invention discloses a method of extracting the NPT of each piece of information using the RTP time stamp of the video information and the audio information. Referring to FIGS. 2 and 3, a method of acquiring the NPT of the video information and the NPT of the audio information is described below for each operation.

[53] First, a method of extracting an NPT of SVC information using an RTP time stamp according to an exemplary embodiment of the present invention is described.

[54] FIG. 2 is a flowchart illustrating a method of extracting an NPT of SVC information using an RTP time stamp according to an exemplary embodiment of the present invention.

[55] First, in operation S210, the method receives video information from a decoding apparatus.

[56] An exemplary embodiment of the present invention receives SVC information, which is currently widely used and stable, as the video information; however, an exemplary embodiment of the present invention is not limited to the SVC information.

[57] In operation S220, the method subsequently extracts a specific output screen RTP time stamp of the video information.

[58] In operation S230, the method subsequently calculates a difference value by subtracting a first output screen RTP time stamp of the video information from the extracted specific output screen RTP time stamp of the video information.

[59] In operation S240, the method subsequently defines, as the NPT of the video information, a value calculated by dividing the difference value by a sampling rate with respect to an access unit of the video information.

[60] Each operation may be performed based on the above-described Equation 1.

[61] A method of extracting an NPT of AAC information using an RTP time stamp according to an exemplary embodiment of the present invention is also described.

[62] FIG. 3 is a flowchart illustrating a method of extracting an NPT of AAC information using an RTP time stamp according to an exemplary embodiment of the present invention.

[63] In operation S310, the method receives audio information from a decoding apparatus.

[64] An exemplary embodiment of the present invention receives AAC information, which is currently widely used and stable, as the audio information; however, an exemplary embodiment of the present invention is not limited to the AAC information.

[65] In operation S320, the method subsequently extracts a specific output screen RTP time stamp of the audio information.

[66] The specific output screen RTP time stamp of the audio information corresponds to a value of subtracting, from an RTP time stamp value of specific PCM data inputted into a wave-out buffer at a time when the specific output screen RTP time stamp of the audio information is calculated, a value calculated by multiplying a wave-out buffer value and a sampling rate with respect to a basic access unit of the audio information.

[67] In operation S330, the method subsequently calculates a difference value by subtracting a first output screen RTP time stamp of the audio information from the extracted specific output screen RTP time stamp of the audio information.

[68] In operation S340, the method subsequently defines, as the NPT of the audio in- formation, a value calculated by dividing the difference value by the sampling rate with respect to an access unit of the audio information.

[69] Each operation may be performed based on the above-described Equations 2 and 3.

[70] As described above, an exemplary embodiment of the present invention may calculate the NPT using the RTP time stamp, and may provide an optimized synchronization algorithm using the calculated NPT with respect to the video information and the audio information.

[71] A basic synchronization principle according to an exemplary embodiment of the present invention may compare an NPT of a video screen to be outputted and the NPT of audio PCM data to be outputted simultaneously with the screen to adjust a display interval of the video screen.

[72] Since the audio information is more important than the video information, an exemplary embodiment of the present invention may provide an apparatus that compares the NPT of the video information and the NPT of the audio information to adjust the video display speed, so that the audio information is continuously outputted regardless of the video information and the video is synchronized to the outputted audio information.

[73] A configuration of an apparatus for supporting synchronization of SVC information and AAC information using an NPT according to an exemplary embodiment of the present invention is sequentially described with reference to FIG. 4.

[74] FIG. 4 is a block diagram illustrating a configuration of an apparatus for supporting synchronization of SVC information and AAC information using an NPT according to an exemplary embodiment of the present invention.

[75] First, an information receiving unit 110 receives video information and audio information using a decoding apparatus.

[76] As described above, the video information may correspond to the SVC information, and the audio information may correspond to the AAC information, however, the video information and the audio information are not limited to the above-described information formats.
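The unit structure listed above can be sketched as a minimal class. All names, the default sampling rates, and the state-keeping of the first time stamps are illustrative assumptions for this sketch, not the patent's implementation.

```python
class SyncApparatus:
    """Minimal sketch of the apparatus unit structure of FIG. 4."""

    def __init__(self, eta, sr_v=90000, sr_a=48000):
        self.eta = eta            # half-width of the in-sync region, seconds
        self.sr_v = sr_v          # video sampling rate (Equation 1)
        self.sr_a = sr_a          # audio sampling rate (Equations 2 and 3)
        self.first_v = None       # RTP time stamp of the first output screen
        self.first_a = None       # RTP time stamp of the first PCM data block

    def npt_v(self, rtpt_k):
        # Video information analysis unit (120): Equation 1.
        if self.first_v is None:
            self.first_v = rtpt_k
        return (rtpt_k - self.first_v) / self.sr_v

    def npt_a(self, rtpt_s):
        # Audio information analysis unit (130): Equation 3.
        if self.first_a is None:
            self.first_a = rtpt_s
        return (rtpt_s - self.first_a) / self.sr_a

    def in_sync(self, npt_v, npt_a):
        # Calculation unit (140) and determination unit (150): is the
        # difference value inside the synchronization region?
        return abs(npt_v - npt_a) <= self.eta
```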

[77] An exemplary embodiment of the present invention subsequently analyzes the received video information and the received audio information to calculate the NPT of each piece of information, and a calculation process is described below with reference to FIG. 5.

[78] FIG. 5 is a block diagram illustrating a synchronization algorithm of video information and audio information using an NPT according to an exemplary embodiment of the present invention.

[79] A video information analysis unit 120 calculates the NPT of the video information using an RTP time stamp included in the received video information.

[80] The video information analysis unit 120 according to an exemplary embodiment of the present invention may define, as the NPT of the video information, a value of dividing a value calculated by subtracting a first output screen RTP time stamp of the video information from a specific output screen RTP time stamp of the video information by a sampling rate with respect to an access unit of the video information.

[81] In the case of the video information, an exemplary embodiment of the present invention extracts RTPT_V^k, corresponding to the time stamp of each screen, from a received RTP packet, and finds RTPT_V^k for each output screen according to the screen display sequence, based on screen sequence reordering considering B-pictures. An exemplary embodiment of the present invention may calculate NPT_V^k, corresponding to the NPT of the k-th output screen, using the above-described Equation 1 based on RTPT_V^k.

[82] An audio information analysis unit 130 calculates the NPT of the audio information using the RTP time stamp included in the received audio information.

[83] The audio information analysis unit 130 according to an exemplary embodiment of the present invention may define, as the NPT of the audio information, a value of dividing a value calculated by subtracting a first output screen RTP time stamp of the audio information from a specific output screen RTP time stamp of the audio information by a sampling rate with respect to an access unit of the audio information.

[84] The specific output screen RTP time stamp of the audio information may be defined as a value of subtracting, from an RTP time stamp value of specific PCM data inputted into a wave-out buffer at a time when the specific output screen RTP time stamp of the audio information is calculated, a value calculated by multiplying a wave-out buffer value and the sampling rate with respect to a basic access unit of the audio information.

[85] In the case of the audio information, an exemplary embodiment of the present invention performs AAC decoding on each audio frame, carried in an RTP packet as it arrives, to restore the PCM data.

[86] An exemplary embodiment of the present invention may extract RTPT_A^n, corresponding to the RTP time stamp of the sequentially arriving audio frames, from the RTP packet header, simultaneously with the above-described process. An exemplary embodiment of the present invention may extract RTPT_A^s, corresponding to the time stamp of the PCM data block to be outputted, using the above-described Equation 2 based on RTPT_A^n.

[87] An exemplary embodiment of the present invention may calculate NPT_A^s, corresponding to the NPT of the PCM data block to be outputted, using the above-described Equation 3.
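A sketch of Equations 2 and 3 follows. The helper names are hypothetical, and the 1024-sample AAC access unit and 48 kHz sampling rate are assumptions, not values fixed by the disclosure:

```python
AAC_FRAME_SAMPLES = 1024  # samples per AAC access unit (assumed)

def audio_output_ts(rtp_ts_n, buffered_frames, samples_per_au=AAC_FRAME_SAMPLES):
    """Equation 2 (sketch): the time stamp RTPT_A^s of the PCM block being
    output equals the RTP time stamp of the PCM data most recently queued
    into the wave-out buffer, minus the buffer occupancy multiplied by the
    sampling rate per basic access unit."""
    return rtp_ts_n - buffered_frames * samples_per_au

def audio_npt(rtp_ts_s, rtp_ts_first, sampling_rate=48000):
    """Equation 3 (sketch): NPT of the PCM data block to be output."""
    return (rtp_ts_s - rtp_ts_first) / sampling_rate

# With 3 AAC frames still queued, the block being heard lags the newest data
ts_s = audio_output_ts(rtp_ts_n=48000 + 10 * 1024, buffered_frames=3)
assert audio_npt(ts_s, 48000) == 7 * 1024 / 48000
```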

[88] When a screen to be outputted is assumed as a k-th screen and a PCM data block of audio to be synchronized with the screen is assumed as an s-th PCM data block, an exemplary embodiment of the present invention compares NPT_V^k and NPT_A^s, and adjusts a display interval of a video screen to match synchronization.

[89] A synchronization process of the video information and the audio information according to an exemplary embodiment of the present invention is described below in detail.

[90] First, a calculation unit 140 compares the NPT of the video information and the NPT of the audio information to calculate a difference value.

[91] The calculation unit 140 may define, as the difference value, a value of subtracting an NPT value of the audio information from the NPT value of the video information.

[92] The difference value T_s between NPT_V^k and NPT_A^s to be used for the NPT comparing may be acquired by Equation 4:

[93] [Equation 4]

[94] T_s = NPT_V^k - NPT_A^s

[95] A determination unit 150 determines whether the calculated difference value is included in a specific synchronization region. When the calculated difference value is determined to be included in the specific synchronization region, an output unit 160 outputs the audio information and the video information.

[96] When T_s is within η, corresponding to the established synchronization region (an in-sync region), synchronization is determined to be matched, and the video screen is displayed at display intervals based on a screen rate established by a Terminal Identifier (TID).

[97] However, when the calculated difference value is determined by the determination unit 150 to be excluded from the specific synchronization region, an exemplary embodiment of the present invention determines a display interval of the video information and the audio information to adjust the display interval between screens of the video information using a display interval adjustment unit 170.

[98] When T_s is outside η, the display interval adjustment unit 170 according to an exemplary embodiment of the present invention may determine whether the video information is in an output state faster or slower than the audio information, in order to adjust the display interval between screens of the video information.
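The decision described in paragraphs [95] through [98] can be sketched as follows. The helper is hypothetical, and the in-sync half-width η of 80 ms is an assumed tuning value, not one the disclosure fixes:

```python
def sync_state(npt_video, npt_audio, eta=0.08):
    """Classify the synchronization state from the NPT difference.
    T_s = NPT_V^k - NPT_A^s (Equation 4); within +/- eta the streams
    are in sync, otherwise the video leads or lags the audio."""
    t_s = npt_video - npt_audio            # Equation 4
    if abs(t_s) <= eta:
        return "in-sync"                   # keep the TID-based display interval
    return "video-fast" if t_s > 0 else "video-slow"

assert sync_state(10.00, 10.05) == "in-sync"
assert sync_state(10.30, 10.00) == "video-fast"
assert sync_state(9.70, 10.00) == "video-slow"
```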

[99] FIG. 6 is a flowchart illustrating NPT processing for synchronization of audio information and video information according to an exemplary embodiment of the present invention.

[100] A display interval value between screens of the video information may be defined as a value of summing a value calculated by dividing 1000 by an established screen rate and a screen interval size adjustment parameter.

[101] The display interval of the video information I_f may be calculated from a predetermined screen rate f_r in accordance with Equation 5:

[102] [Equation 5]

[103] I_f = 1000 / f_r (ms)

[104] The screen interval size adjustment parameter may be defined as a value of multiplying the difference value calculated by comparing the NPT of the video information and the NPT of the audio information, and a scale factor.

[105] When T_s is outside η, the size of the screen interval size adjustment parameter δ may be determined by a scale factor s_f, and may be represented as Equation 6:

[106] [Equation 6]

[107] δ = T_s · s_f (ms)

[108] When synchronization is not matched, s_f adjusts the convergence speed for matching synchronization again; a value of about 0.05 to 0.1 was verified by experiment to be appropriate.

[109] In an exemplary embodiment of the present invention, I_f', corresponding to the screen display interval adjusted by δ, may be calculated in accordance with Equation 7:

[110] [Equation 7]

[111] I_f' = I_f + δ (ms)
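Equations 5 through 7 combine into a short interval computation. This is an illustrative sketch; it assumes T_s is expressed in milliseconds, matching the (ms) unit of Equation 6:

```python
def adjusted_interval_ms(t_s_ms, frame_rate, scale_factor=0.05):
    """I_f = 1000 / f_r (Equation 5), delta = T_s * s_f (Equation 6),
    and I_f' = I_f + delta (Equation 7).  Positive T_s (video ahead of
    audio) stretches the screen interval; negative T_s shrinks it."""
    i_f = 1000.0 / frame_rate        # Equation 5: nominal interval in ms
    delta = t_s_ms * scale_factor    # Equation 6: adjustment term
    return i_f + delta               # Equation 7: adjusted interval

# Video 200 ms ahead at 25 screens/s: each interval grows from 40 ms to 50 ms
assert adjusted_interval_ms(200.0, 25) == 50.0
```

With the scale factor in the 0.05 to 0.1 range noted in paragraph [108], each screen absorbs only 5 to 10 percent of the remaining offset, so playback converges gradually rather than jumping.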

[112] As described above, an exemplary embodiment of the present invention may provide the method of supporting synchronization of the video information and the audio information using the NPT induced from time stamp information to be recorded in a header of an RTP packet when performing RTP packetization of the video information and the audio information in an Internet Protocol (IP) network and transmitting the video information and the audio information.

[113] The method is sequentially described based on a functional aspect of a configuration of an apparatus for supporting synchronization of SVC information and AAC information using an NPT with reference to FIG. 7.

[114] Since the method is applied corresponding to a method of using the apparatus for supporting synchronization of the SVC information and the AAC information using the NPT, all functional factors of the apparatus are included. Accordingly, detailed description thereof is omitted and the method is briefly described.

[115] FIG. 7 is a flowchart illustrating a method of supporting synchronization of SVC information and AAC information using an NPT according to an exemplary embodiment of the present invention.

[116] First, in operation S710, the information receiving unit 110 receives video information using a decoding apparatus.

[117] In operation S720, the information receiving unit 110 subsequently receives audio information using the decoding apparatus.

[118] In operation S730, the video information analysis unit 120 calculates the NPT of the video information using an RTP time stamp included in the received video information.

[119] Operation S730 corresponds to an operation of defining, as the NPT of the video information, a value calculated by dividing a value of subtracting a first output screen RTP time stamp of the video information from a specific output screen RTP time stamp of the video information by a sampling rate with respect to an access unit of the video information.

[120] In operation S740, the audio information analysis unit 130 calculates the NPT of the audio information using the RTP time stamp included in the received audio information.

[121] Operation S740 corresponds to an operation of defining, as the NPT of the audio information, a value calculated by dividing a value of subtracting a first output screen RTP time stamp of the audio information from a specific output screen RTP time stamp of the audio information by a sampling rate with respect to an access unit of the audio information.

[122] The specific output screen RTP time stamp of the audio information corresponds to a value of subtracting, from an RTP time stamp value of specific PCM data inputted into a wave-out buffer at a time when the specific output screen RTP time stamp of the audio information is calculated, a value calculated by multiplying a wave-out buffer value and the sampling rate with respect to a basic access unit of the audio information.

[123] In operation S750, the calculation unit 140 subsequently compares the NPT of the video information and the NPT of the audio information to calculate a difference value.

[124] Operation S750 corresponds to an operation of defining, as the difference value, a value of subtracting an NPT value of the audio information from the NPT value of the video information.

[125] In operation S760, the determination unit 150 subsequently determines whether the calculated difference value is included in a specific synchronization region.

[126] In operation S770, the output unit 160 outputs the audio information and the video information when the calculated difference value is determined to be included in the specific synchronization region.

[127] In operation S770, an exemplary embodiment of the present invention outputs the audio information and the video information at screen display intervals based on a screen rate established by a TID of the video information and the audio information.

[128] However, in operation S780, when the calculated difference value is determined to be excluded from the specific synchronization region, the display interval adjustment unit 170 determines a display interval of the video information and the audio information to adjust the display interval between screens of the video information.

[129] A display interval value between screens of the video information may be defined as a value of summing a value calculated by dividing 1000 by an established screen rate and a screen interval size adjustment parameter, and the screen interval size adjustment parameter is defined as a value of multiplying the difference value calculated by comparing the NPT of the video information and the NPT of the audio information, and a scale factor.
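Operations S730 through S780 can be strung together as one playback step. The dictionary keys and default constants below are assumptions made for this illustration, not names from the disclosure:

```python
def playback_step(video, audio, eta_ms=80.0, scale_factor=0.05):
    """One pass through S730-S780 (sketch): derive both NPTs from RTP
    time stamps, compare them, and return the display interval (ms)
    to use for the next video screen."""
    # S730: NPT of the video screen
    npt_v = (video["rtp_ts"] - video["first_rtp_ts"]) / video["clock_rate"]
    # S740: NPT of the PCM block, after the wave-out buffer correction
    rtp_ts_s = audio["rtp_ts"] - audio["buffered_au"] * audio["samples_per_au"]
    npt_a = (rtp_ts_s - audio["first_rtp_ts"]) / audio["sampling_rate"]
    # S750: difference value, in milliseconds
    t_s_ms = (npt_v - npt_a) * 1000.0
    # S760/S770: inside the in-sync region, keep the nominal interval
    interval = 1000.0 / video["frame_rate"]
    # S780: outside it, adjust the interval to converge back into sync
    if abs(t_s_ms) > eta_ms:
        interval += t_s_ms * scale_factor
    return interval

video = {"rtp_ts": 180000, "first_rtp_ts": 90000, "clock_rate": 90000,
         "frame_rate": 25}
audio = {"rtp_ts": 96000, "first_rtp_ts": 48000, "buffered_au": 0,
         "samples_per_au": 1024, "sampling_rate": 48000}
assert playback_step(video, audio) == 40.0   # both at NPT 1.0 s: in sync
```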

[130] The exemplary embodiments according to the present invention may be recorded in computer-readable media including program instructions to implement various operations embodied by a computer. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. The media and program instructions may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well-known and available to those having skill in the computer software arts. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks and DVDs; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described embodiments of the present invention.

[131] According to the present invention, it is possible to calculate an NPT of audio information and video information using an RTP time stamp.

[132] Also, according to the present invention, it is possible to induce an NPT from a time stamp value with respect to a received video and a received audio to provide synchronization between two media.

[133] Also, according to the present invention, it is possible to induce an NPT using only an RTP time stamp by eliminating a separate need for transmitting and processing an RTP Control Protocol Sender Report (RTCP SR) packet of video information and audio information.

[134] Also, according to the present invention, it is possible to reduce a number of User Datagram Protocol (UDP) ports required for transmitting an RTCP packet, and to reduce an amount of control traffic coming into a network since RTCP packet transmission is unnecessary.

[135] Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

[1] A method of extracting a Normal Play Time (NPT) of Scalable Video Coding
(SVC) information using a Real-time Transport Protocol (RTP) time stamp, the method comprising: receiving video information from a decoding apparatus; extracting a specific output screen RTP time stamp of the video information; calculating a difference value by subtracting a first output screen RTP time stamp of the video information from the extracted specific output screen RTP time stamp of the video information; and defining, as the NPT of the video information, a value calculated by dividing the difference value by a sampling rate with respect to an access unit of the video information.
[2] The method of claim 1, wherein the video information corresponds to the SVC information.
[3] A method of extracting an NPT of Advanced Audio Coding (AAC) information using an RTP time stamp, the method comprising: receiving audio information from a decoding apparatus; extracting a specific output screen RTP time stamp of the audio information; calculating a difference value by subtracting a first output screen RTP time stamp of the audio information from the extracted specific output screen RTP time stamp of the audio information; and defining, as the NPT of the audio information, a value calculated by dividing the difference value by a sampling rate with respect to an access unit of the audio information.
[4] The method of claim 3, wherein the audio information corresponds to the AAC information.
[5] The method of claim 3, wherein the specific output screen RTP time stamp of the audio information corresponds to a value of subtracting, from an RTP time stamp value of specific Pulse Code Modulation (PCM) data inputted into a wave-out buffer at a time when the specific output screen RTP time stamp of the audio information is calculated, a value calculated by multiplying a wave-out buffer value and the sampling rate with respect to a basic access unit of the audio information.
[6] A method of supporting synchronization of SVC information and AAC information using an NPT, the method comprising: receiving video information using a decoding apparatus; receiving audio information using the decoding apparatus; calculating the NPT of the video information using an RTP time stamp included in the received video information; calculating the NPT of the audio information using the RTP time stamp included in the received audio information; comparing the NPT of the video information and the NPT of the audio information to calculate a difference value; determining whether the calculated difference value is included in a specific synchronization region; and outputting the audio information and the video information when the calculated difference value is determined to be included in the specific synchronization region.
[7] The method of claim 6, wherein the video information corresponds to the SVC information and the audio information corresponds to the AAC information.
[8] The method of claim 6, wherein the calculating of the NPT of the video information defines, as the NPT of the video information, a value calculated by dividing a value of subtracting a first output screen RTP time stamp of the video information from a specific output screen RTP time stamp of the video information by a sampling rate with respect to an access unit of the video information.
[9] The method of claim 6, wherein the calculating of the NPT of the audio information defines, as the NPT of the audio information, a value calculated by dividing a value of subtracting a first output screen RTP time stamp of the audio information from a specific output screen RTP time stamp of the audio information by a sampling rate with respect to an access unit of the audio information.
[10] The method of claim 9, wherein the specific output screen RTP time stamp of the audio information corresponds to a value of subtracting, from an RTP time stamp value of specific PCM data inputted into a wave-out buffer at a time when the specific output screen RTP time stamp of the audio information is calculated, a value calculated by multiplying a wave-out buffer value and the sampling rate with respect to a basic access unit of the audio information.
[11] The method of claim 6, wherein the comparing defines, as the difference value, a value of subtracting an NPT value of the audio information from the NPT value of the video information.
[12] The method of claim 6, wherein the outputting outputs the audio information and the video information at screen display intervals based on a screen rate established by a Terminal Identifier (TID) of the video information and the audio information.
[13] The method of claim 6, further comprising: determining a display interval of the video information and the audio information to adjust the display interval between screens of the video information when the calculated difference value is determined to be excluded from the specific synchronization region.
[14] The method of claim 13, wherein a display interval value between screens of the video information corresponds to a value of summing a value calculated by dividing 1000 by an established screen rate and a screen interval size adjustment parameter.
[15] The method of claim 14, wherein the screen interval size adjustment parameter is defined as a value of multiplying the difference value calculated by comparing the NPT of the video information and the NPT of the audio information, and a scale factor.
[16] An apparatus for supporting synchronization of SVC information and AAC information using an NPT, the apparatus comprising: an information receiving unit to receive video information and audio information using a decoding apparatus; a video information analysis unit to calculate the NPT of the video information using an RTP time stamp included in the received video information; an audio information analysis unit to calculate the NPT of the audio information using the RTP time stamp included in the received audio information; a calculation unit to compare the NPT of the video information and the NPT of the audio information to calculate a difference value; a determination unit to determine whether the calculated difference value is included in a specific synchronization region; and an output unit to output the audio information and the video information when the calculated difference value is determined to be included in the specific synchronization region.
[17] The apparatus of claim 16, wherein the video information corresponds to the
SVC information and the audio information corresponds to the AAC information.
[18] The apparatus of claim 16, wherein the video information analysis unit defines, as the NPT of the video information, a value of dividing a value calculated by subtracting a first output screen RTP time stamp of the video information from a specific output screen RTP time stamp of the video information by a sampling rate with respect to an access unit of the video information.
[19] The apparatus of claim 16, wherein the audio information analysis unit defines, as the NPT of the audio information, a value of dividing a value calculated by subtracting a first output screen RTP time stamp of the audio information from a specific output screen RTP time stamp of the audio information by a sampling rate with respect to an access unit of the audio information.
[20] The apparatus of claim 19, wherein the specific output screen RTP time stamp of the audio information corresponds to a value of subtracting, from an RTP time stamp value of specific PCM data inputted into a wave-out buffer at a time when the specific output screen RTP time stamp of the audio information is calculated, a value calculated by multiplying a wave-out buffer value and the sampling rate with respect to a basic access unit of the audio information.
[21] The apparatus of claim 16, wherein the calculation unit defines, as the difference value, a value of subtracting an NPT value of the audio information from the NPT value of the video information.
[22] The apparatus of claim 16, wherein the output unit outputs the audio information and the video information at screen display intervals based on a screen rate established by a TID of the video information and the audio information.
[23] The apparatus of claim 16, further comprising: a display interval adjustment unit to determine a display interval of the video information and the audio information to adjust the display interval between screens of the video information when the calculated difference value is determined to be excluded from the specific synchronization region.
[24] The apparatus of claim 23, wherein a display interval value between screens of the video information corresponds to a value of summing a value calculated by dividing 1000 by an established screen rate and a screen interval size adjustment parameter.
[25] The apparatus of claim 24, wherein the screen interval size adjustment parameter is defined as a value of multiplying the difference value calculated by comparing the NPT of the video information and the NPT of the audio information, and a scale factor.
PCT/KR2008/007859 2008-02-20 2008-12-31 Method and apparatus for svc video and aac audio synchronization using npt WO2009104869A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR20080015154 2008-02-20
KR10-2008-0015154 2008-02-20
KR10-2008-0025042 2008-03-18
KR20080025042A KR100916505B1 (en) 2008-02-20 2008-03-18 Method and apparatus for svc video and aac audio synchronization using ntp

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/735,828 US8675727B2 (en) 2008-02-20 2008-12-31 Method and apparatus for SVC video and AAC audio synchronization using NPT

Publications (1)

Publication Number Publication Date
WO2009104869A1 true WO2009104869A1 (en) 2009-08-27

Family

ID=40985714

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2008/007859 WO2009104869A1 (en) 2008-02-20 2008-12-31 Method and apparatus for svc video and aac audio synchronization using npt

Country Status (1)

Country Link
WO (1) WO2009104869A1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050085289A (en) * 2002-12-04 2005-08-29 코닌클리즈케 필립스 일렉트로닉스 엔.브이. Method of automatically testing audio/video synchronization
US20050259947A1 (en) * 2004-05-07 2005-11-24 Nokia Corporation Refined quality feedback in streaming services
US20060184790A1 (en) * 2004-03-26 2006-08-17 Microsoft Corporation Protecting elementary stream content


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016201888A1 (en) * 2015-06-17 2016-12-22 小米科技有限责任公司 Multimedia file playing method and apparatus
US9961393B2 (en) 2015-06-17 2018-05-01 Xiaomi Inc. Method and device for playing multimedia file


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08872503

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 12735828

Country of ref document: US

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08872503

Country of ref document: EP

Kind code of ref document: A1