JP4390666B2

JP4390666B2 - Method and apparatus for decoding and reproducing compressed video data and compressed audio data

Info

Publication number: JP4390666B2
Application number: JP2004266201A
Authority: JP
Inventors: 直也川上
Original assignee: NEC Electronics Corp
Current assignee: NEC Electronics Corp
Priority date: 2004-09-14
Filing date: 2004-09-14
Publication date: 2009-12-24
Anticipated expiration: 2024-09-14
Also published as: JP2006086553A

Description

The present invention relates to a method and an apparatus for decoding and reproducing compressed video data and compressed audio data, and in particular to a method for reproducing video data and audio data in synchronization when playback starts partway through a stream.

MPEG1 is a scheme for compressing and encoding digitized video and audio data and multiplexing them into a single data stream. The data structure of an MPEG1 system stream is described with reference to FIG. 3. The video data stream and the audio data stream compressed and encoded by the MPEG1 scheme are called elementary streams. These elementary streams are divided into units called packets and multiplexed. A packet header is added to each packet. The packet header describes a stream identification code that identifies the data stream, a PTS (Presentation Time Stamp), which is time information for synchronized playback, and, for image data, a DTS (Decoding Time Stamp), which is time information indicating the decoding order.

In addition, a unit called a pack is formed from a set of an arbitrary number of packets. A pack header is added to each pack, and it describes a pack start code, reference clock information called the SCR (System Clock Reference), and so on. In the MPEG1 system, a reference time called the STC (System Time Clock) is defined for synchronized playback of video and audio, and an MPEG1 decoding and playback apparatus has a 90 kHz STC. A conventional MPEG1 decoding and playback apparatus refers to the SCR in the pack header and reproduces the encoder's reference time in its STC, thereby establishing the reference time needed to play video and audio in synchronization.

As shown in FIG. 3, a system header can also be added after the pack header; it describes, for example, the bit rate of each elementary stream. A single stream formed in this way by multiplexing multiple elementary streams, such as video and audio, in units of packs is called an MPEG1 system stream.

Next, the configuration of a conventional MPEG1 decoding and playback apparatus 40, which plays video data and audio data from an MPEG1 system stream in synchronization, is described with reference to FIG. 4. For decoding and playback, the MPEG1 system stream output from the transfer device 41 is input to the DEMUX 42. The transfer device 41 extracts the MPEG1 system stream from a storage medium such as a CD-ROM or hard disk and outputs it. The DEMUX 42 is a demultiplexer that, based on the stream identification code in each packet header, separates the input MPEG1 system stream into a video data stream (video elementary stream, hereinafter video ES) and an audio data stream (audio elementary stream, hereinafter audio ES), and outputs them to the video decoder 44 and the audio decoder 45, respectively.
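The separation performed by the DEMUX 42 can be sketched as follows. This is a minimal illustration, not the apparatus itself: packets are modeled as already-parsed `(stream_id, payload)` pairs, the function name `demux` is hypothetical, and the stream ID ranges are the ones commonly used in MPEG system streams (an assumption here, since the description does not list them).

```python
# Assumed stream ID ranges (not stated in the description).
VIDEO_IDS = range(0xE0, 0xF0)  # video streams
AUDIO_IDS = range(0xC0, 0xE0)  # audio streams


def demux(packets):
    """Split (stream_id, payload) pairs into a video ES and an audio ES."""
    video_es, audio_es = [], []
    for stream_id, payload in packets:
        if stream_id in VIDEO_IDS:
            video_es.append(payload)
        elif stream_id in AUDIO_IDS:
            audio_es.append(payload)
        # Packets of any other stream are ignored in this sketch.
    return b"".join(video_es), b"".join(audio_es)


packets = [(0xE0, b"v1"), (0xC0, b"a1"), (0xBE, b"pad"), (0xE0, b"v2")]
video_es, audio_es = demux(packets)
```

Real pack and packet parsing (start codes, length fields, time stamps) is deliberately left out to keep the routing step visible.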

The DEMUX 42 also outputs the SCR value extracted from the pack header to the STC generation unit 43. The STC generation unit 43 adjusts its own STC to match the received SCR value and distributes the STC to the video decoder 44 and the audio decoder 45. Specifically, the SCR gives the reference time at encoding as a counter value in 90 kHz units, and the STC generation unit 43 reproduces that reference time by setting its own STC counter to the received SCR.
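As a rough sketch of this behavior, the STC can be modeled as a counter that is loaded with each received SCR and free-runs at 90 kHz in between. The class and method names below are hypothetical, and a hardware implementation would typically use a clock-recovery loop rather than a direct load.

```python
from dataclasses import dataclass

STC_HZ = 90_000  # 90 kHz system time clock, as stated in the description


@dataclass
class StcGenerator:
    """Toy model of the STC generation unit 43 (names are illustrative)."""

    counter: int = 0

    def load_scr(self, scr: int) -> None:
        # Match the local counter to the SCR taken from the pack header.
        self.counter = scr

    def advance(self, seconds: float) -> None:
        # Free-run at 90 kHz between SCR updates.
        self.counter += round(seconds * STC_HZ)

    def seconds(self) -> float:
        # Current reference time expressed in seconds.
        return self.counter / STC_HZ


stc = StcGenerator()
stc.load_scr(900_000)  # SCR corresponding to 10 s
stc.advance(0.5)       # half a second passes
```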

The video decoder 44 decodes the video ES received from the DEMUX 42 and outputs a video signal. The received video ES is first stored in the input buffer 441. The DTS/PTS comparison unit 443 obtains the DTS and PTS from the video ES stored in the input buffer 441 and compares them with the reference time STC distributed from the STC generation unit 43. The DTS/PTS comparison unit 443 then instructs the video decoding unit 442 to decode a picture in the video ES when its DTS matches the STC, and to output the decoded video signal when its PTS matches the STC. The video decoding unit 442 obtains the video ES from the input buffer 441, decodes it at the timing indicated by the DTS/PTS comparison unit 443, and outputs the decoded video signal.
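The comparison against the STC can be sketched as a simple event generator. The function name, the dictionary keys, and the time stamp values below are illustrative assumptions; the sketch only shows that decoding is triggered when DTS equals STC and output when PTS equals STC.

```python
def video_events(pictures, stc_values):
    """Return (stc, action, name) tuples in the order they would be issued."""
    events = []
    for stc in stc_values:
        # Decode every picture whose DTS matches the current STC.
        for pic in pictures:
            if pic["dts"] == stc:
                events.append((stc, "decode", pic["name"]))
        # Output every picture whose PTS matches the current STC.
        for pic in pictures:
            if pic["pts"] == stc:
                events.append((stc, "output", pic["name"]))
    return events


# Illustrative values: I1 is decoded at 102 and shown at 103.
pictures = [
    {"name": "I1", "dts": 102, "pts": 103},
    {"name": "P2", "dts": 103, "pts": 106},
]
events = video_events(pictures, range(100, 108))
```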

The audio decoder 45 decodes the audio ES received from the DEMUX 42 and outputs an audio signal. The received audio ES is first stored in the input buffer 451. The PTS comparison unit 453 obtains the PTS from the audio ES stored in the input buffer 451 and compares it with the STC distributed from the STC generation unit 43. The PTS comparison unit 453 then instructs the audio decoding unit 452 to output the decoded audio signal when the PTS matches the STC. The audio decoding unit 452 obtains the audio ES from the input buffer 451, decodes it, and outputs the decoded audio signal at the timing indicated by the PTS comparison unit 453.

As described above, the conventional MPEG1 decoding and playback apparatus 40 achieves synchronized playback of video and audio by determining the output timing of the video data and audio data according to the PTS contained in the packet header. Even when playback starts partway through an MPEG1 system stream (hereinafter, mid-stream playback), video and audio can be synchronized (hereinafter, AV synchronization) by using the PTS. AV synchronization using the PTS during mid-stream playback is described below with reference to FIGS. 5 and 6.

FIG. 5 is a timing chart showing AV synchronization during mid-stream playback. The horizontal axis of FIG. 5 indicates the STC value, which the STC generation unit 43 sets to match the SCR value obtained from the pack header and then distributes. FIG. 5(a) shows the timing at which the video ES is input, via the transfer device 41 and the DEMUX 42, to the input buffer 441 of the video decoder 44 during mid-stream playback. I1, P1, B1, and so on in the figure denote frames that are I pictures, P pictures, and B pictures, respectively. FIGS. 5(b) and 5(c) show the PTS and DTS values assigned to each picture of the video ES shown in FIG. 5(a).

Here, I picture, P picture, and B picture denote different methods of encoding a video frame (picture). In the MPEG1 scheme, each picture making up the video ES is encoded as an I picture, a P picture, or a B picture. An I picture is encoded by intra-frame coding, which uses only the information within its own frame (picture), and can be decoded without the information of any other picture. A P picture encodes the difference obtained by forward inter-frame (inter-picture) prediction from a past I picture, so decoding it requires the information of that past I picture. A B picture is obtained by inter-picture predictive coding from two pictures, one past and one future, so decoding it requires the information of a future P picture in addition to that of a past I picture.

FIG. 5(f) shows the timing at which the video ES of FIG. 5(a) is decoded in the video decoding unit 442 according to the DTS values of FIG. 5(c). The first picture to be decoded is picture I1; pictures P1, B1, and B2, whose DTS values are 99 to 101, are not decoded. This is because the past I picture needed to decode pictures P1, B1, and B2 does not exist in the video ES played from mid-stream, so these inter-frame predictive-coded pictures cannot be decoded. Decoding therefore starts from picture I1, which can be decoded on its own. Pictures P1, B1, and B2, which cannot be decoded, are discarded from the input buffer 441 and the video decoding unit 442.

FIG. 5(g) shows the timing at which the decoded video signal is output from the video decoding unit 442 according to the PTS values of FIG. 5(b). As described above, pictures P1, B1, and B2 cannot be decoded, so output starts from the video signal corresponding to I1.

FIG. 5(d) shows the timing at which the audio ES is input to the input buffer 451 of the audio decoder 45, and FIG. 5(e) shows the PTS values corresponding to the audio ES of FIG. 5(d). FIG. 5(h) shows the timing at which the audio decoding unit 452 decodes the audio ES of FIG. 5(d) according to the PTS values of FIG. 5(e) and outputs the decoded audio signal. Audio output could begin with A1 at STC value 100, but normally it begins at STC value 103, when video output becomes possible, with the audio signal A4 that has the same PTS value as the video signal I1, so that video and audio playback start at the same time.
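The FIG. 5 scenario can be traced with a short sketch. The function names and dictionary keys are hypothetical, and only the values named in the description are used (I1 output at 103, audio frames A1 to A4 covering 100 to 103); the PTS values of the discarded pictures are not needed and are left as `None`.

```python
def first_i_picture(pictures):
    """Return the first I picture in arrival order, or None."""
    for pic in pictures:
        if pic["type"] == "I":
            return pic
    return None


def audio_start_frame(audio_frames, video_pts):
    """Return the audio frame whose PTS equals the given video PTS."""
    for frame in audio_frames:
        if frame["pts"] == video_pts:
            return frame
    return None


pictures = [
    {"name": "P1", "type": "P", "pts": None},  # undecodable, discarded
    {"name": "B1", "type": "B", "pts": None},  # undecodable, discarded
    {"name": "B2", "type": "B", "pts": None},  # undecodable, discarded
    {"name": "I1", "type": "I", "pts": 103},
]
audio_frames = [{"name": f"A{i}", "pts": 99 + i} for i in range(1, 6)]

start_video = first_i_picture(pictures)
start_audio = audio_start_frame(audio_frames, start_video["pts"])
```

This lookup is only possible because I1 carries a PTS, which is exactly the precondition the later part of the description removes.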

FIG. 6 is a flowchart showing the procedure of AV synchronization using the PTS. In step S1, the PTS comparison unit 453 of the audio decoder 45 obtains the audio PTS attached to the first packet of the audio ES played from mid-stream. In step S2, the video decoding unit 442 of the video decoder 44 sequentially reads pictures from the video ES stored in the input buffer 441. The video decoding unit 442 determines whether the picture it has read is an I picture (step S3). A picture that is not an I picture cannot be decoded, as described above, so it is discarded and the next picture is read. If the picture is an I picture, the DTS/PTS comparison unit 443 obtains the PTS and DTS assigned to that I picture (step S4).

In step S5, the PTS comparison unit 453 of the audio decoder 45 compares the STC value distributed from the STC generation unit 43 with the PTS value obtained in step S1 and, when the two match, instructs the audio decoding unit 452 to output the decoded audio signal. In step S6, the audio decoding unit 452 decodes the audio ES and outputs the audio signal at the timing indicated by the PTS comparison unit 453.

In step S7, the DTS/PTS comparison unit 443 compares the STC value distributed from the STC generation unit 43 with the PTS and DTS values obtained in step S4 and, when they match, instructs the video decoding unit 442 to decode the picture and to output the decoded video signal. In step S8, the video decoding unit 442 decodes the picture and outputs the video signal at the timing indicated by the DTS/PTS comparison unit 443.

As described above, the conventional MPEG1 decoding and playback apparatus can achieve AV synchronization even in mid-stream playback by starting output of the video signal and the audio signal when the video PTS and the audio PTS match the value of the STC counter. Decoding and playback apparatuses that perform mid-stream playback by establishing AV synchronization with the PTS in this way are disclosed, for example, in Patent Documents 1 and 2.
Patent Document 1: Japanese Patent Laid-Open No. 7-170490
Patent Document 2: Japanese Patent Laid-Open No. 9-219838

As described above, the AV synchronization performed in the conventional MPEG1 decoding and playback apparatus relies on the PTS. However, the MPEG1 standard defined in ISO/IEC 11172-1 does not require a video PTS on every picture: when pictures other than the one at the head of the MPEG1 system stream are packetized, omitting the video PTS is permitted. Without a video PTS, AV synchronization by the PTS-based method described above is not possible. Consequently, when decoding and playback of an MPEG1 system stream without video PTS begins, not only can AV synchronization not be achieved, but the playback start timings of video and audio cannot be aligned either, so video and audio output cannot be started simultaneously. Specifically, because playback timing cannot be determined from a match between PTS and STC, video output starts from the video signal corresponding to the first I picture obtained, and audio output starts from the audio signal corresponding to the head of the audio ES. The two output start times then differ by a time lag equal to the playback time of the pictures that are discarded without being decoded before the first I picture is obtained.

The decoding and reproducing method according to the present invention decodes and reproduces video data and audio data from a compressed audio data stream and a compressed video data stream composed of at least first pictures, each obtained by intra-frame coding an image frame, and second pictures, each obtained by inter-frame predictive coding an image frame. The method detects the initial first picture as viewed from the head of the compressed video data stream; calculates the playback time that would be taken to play the stream from its head up to the second picture located immediately before the initial first picture; starts video output from the video data obtained by decoding the initial first picture; and starts audio output from the audio data located that playback time after the head of the audio data stream obtained by decoding the compressed audio data stream.

The decoding and reproducing apparatus according to the present invention comprises a video decoder that decodes and reproduces video data from a compressed video data stream composed of at least first pictures, each obtained by intra-frame coding an image frame, and second pictures, each obtained by inter-frame predictive coding an image frame, and an audio decoder that decodes and reproduces audio data from a compressed audio data stream. The video decoder starts video output from the video data obtained by decoding the initial first picture as viewed from the head of the compressed video data stream. The audio decoder skips, from the head of the audio data stream obtained by decoding the compressed audio data stream, the audio data corresponding to the playback time of all pictures located before the initial first picture, and starts audio output from the audio data that follows.

According to the decoding and reproducing method or apparatus of the present invention, the output start timing of the audio signal can be delayed by a time corresponding to the playback time of the pictures located before the initial first picture. Moreover, because pictures located before the initial first picture cannot be decoded, there is a period at the start of audio playback for which no playable video signal exists. The method or apparatus of the present invention can start playback with the audio signal for that period removed. As a result, even when no video PTS is detected, the playback start timing of the audio signal can be matched to that of the video signal.

According to the present invention, the playback start timings of video and audio can be matched even when playback time information such as the PTS is not attached to the compressed and encoded video stream.

Embodiment 1 of the Invention

FIG. 1 shows the configuration of an MPEG1 decoding and playback apparatus 10 according to Embodiment 1 of the present invention. The MPEG1 decoding and playback apparatus 10 has, in its video decoder 14, a picture counter 144 for counting the number of discarded pictures and, in its audio decoder 15, an offset calculation unit 154 that calculates the offset time for audio signal output from the number of discarded pictures accumulated in the picture counter 144. The functions of the transfer device 41, the DEMUX 42, and the STC generation unit 43 of the MPEG1 decoding and playback apparatus 10 are the same as those of the conventional MPEG1 decoding and playback apparatus 40, so their description is omitted.

The video decoder 14 decodes the video ES received from the DEMUX 42 and outputs a video signal. The functions of the input buffer 441 and the DTS/PTS comparison unit 443 are the same as those of the conventional MPEG1 decoding and playback apparatus 40 shown in FIG. 4.

Like the video decoding unit 442 of the conventional MPEG1 decoding and playback apparatus 40, the video decoding unit 142 obtains pictures from the input buffer 441 and, as instructed by the DTS/PTS comparison unit 443, decodes them in the decoding order indicated by the DTS and outputs the decoded video signal in the playback order indicated by the PTS. In addition, when no PTS can be obtained and the DTS/PTS comparison unit 443 therefore gives no playback timing instruction, the video decoding unit 142 plays pictures sequentially starting from the first I picture.

Like the conventional video decoding unit 442, the video decoding unit 142 discards P pictures and B pictures until it obtains the first I picture in the video ES. It additionally notifies the picture counter 144 each time it discards a picture. The picture counter 144 counts the number of pictures discarded before the video decoding unit 142 detects the first I picture in the video ES, incrementing the count each time it is notified of a discard.
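The interaction between the video decoding unit 142 and the picture counter 144 can be sketched as follows. The class and function names are hypothetical, and pictures are reduced to their type letters for brevity.

```python
class PictureCounter:
    """Toy model of the picture counter 144 (names are illustrative)."""

    def __init__(self):
        self.count = 0

    def notify_discard(self):
        # Called once for every picture discarded before the first I picture.
        self.count += 1


def find_first_i(picture_types, counter):
    """Discard pictures until the first I picture; return its index or None."""
    for index, ptype in enumerate(picture_types):
        if ptype == "I":
            return index
        counter.notify_discard()
    return None


counter = PictureCounter()
first_i_index = find_first_i(["P", "B", "B", "I", "B"], counter)
```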

The audio decoder 15 decodes the audio ES received from the DEMUX 42 and outputs an audio signal. The functions of the input buffer 451 and the PTS comparison unit 453 are the same as those of the conventional MPEG1 decoding and playback apparatus 40.

The offset calculation unit 154 refers to the number of discarded pictures held by the picture counter 144 and calculates the playback time $T_P$ corresponding to that number of pictures. The playback time $T_P$ is calculated by equation (1):

$$T_P = \mathrm{PIC\_CNT} \times T_F \tag{1}$$

Here, PIC_CNT is the number of discarded pictures and $T_F$ is the playback time per picture. The value of $T_F$ is determined by the television signal format that the video ES follows: 1/30 second for NTSC and 1/24 second for PAL.
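Equation (1) is a single multiplication; a minimal sketch follows. The function name is hypothetical, and exact fractions are used here only to avoid floating-point rounding in the example. The frame period is passed in as a parameter rather than fixed.

```python
from fractions import Fraction


def discarded_playback_time(pic_cnt, frame_period):
    """Equation (1): T_P = PIC_CNT * T_F, in seconds."""
    return pic_cnt * frame_period


# Three discarded pictures at 1/30 second per picture.
t_p = discarded_playback_time(3, Fraction(1, 30))
```

With these inputs the offset is one tenth of a second, which is the amount of audio that has to be skipped at the start.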

Like the audio decoding unit 452 of the conventional MPEG1 decoding and playback apparatus 40, the audio decoding unit 152 obtains the audio ES from the input buffer 451 and, as instructed by the PTS comparison unit 453, outputs the decoded audio signal at the playback timing indicated by the PTS. In addition, the audio decoding unit 152 discards, from the head of the audio signal to be played, audio data amounting to the playback time $T_P$ calculated by the offset calculation unit 154 for the discarded pictures, so that output starts at a correspondingly delayed point.
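One way to picture the discard step is on decoded PCM samples, as sketched below. Working at sample granularity, the function name, and the sample rate are all assumptions for illustration; the description does not say at which granularity the audio data is dropped.

```python
from fractions import Fraction


def skip_leading_audio(samples, sample_rate_hz, t_p):
    """Drop the first t_p seconds of decoded audio and return the rest."""
    n_skip = int(t_p * sample_rate_hz)  # whole samples covered by T_P
    return samples[n_skip:]


one_second = list(range(44_100))  # stand-in for one second of PCM samples
remaining = skip_leading_audio(one_second, 44_100, Fraction(1, 10))
```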

With this configuration, the playback start timings of video and audio can be matched even when no PTS is attached to the video stream.

Next, the processing flow for AV synchronization and playback start synchronization in the MPEG1 decoding and playback apparatus 10 of this embodiment is described with reference to FIG. 2. When a video PTS is attached to the picture and can be obtained, AV synchronization is the same as the conventional processing described with FIG. 6, so the corresponding steps carry the same step labels as in FIG. 6.

Steps S1 to S3 are the same as in the conventional processing described with FIG. 6. In step S22, when a picture is discarded because the picture obtained in step S3 is a P picture or a B picture, the number of discarded pictures held by the picture counter 144 is incremented.

In step S21, the DTS/PTS comparison unit 443 determines whether a video PTS is attached to the first I picture obtained in step S3. If a video PTS is attached, steps S4 to S8 are performed; these are the same as the conventional steps described with FIG. 6, so their description is omitted.

If it is determined in step S21 that no video PTS is attached to the first I picture, steps S23 to S25 are performed. In step S23, the offset calculation unit 154 calculates the playback time $T_P$ corresponding to the number of discarded pictures using equation (1). In step S24, the audio decoding unit 152 discards audio data amounting to the playback time $T_P$ from the head of the audio ES. Finally, in step S25, the video decoding unit 142 starts outputting the video signal from the first I picture, and the audio decoding unit 152 starts outputting the audio signal from the first data remaining after the discard.
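The branch structure of FIG. 2 can be summarized in one function. This is a sketch under assumptions: the function name, the dictionary keys, the return format, and the sample-based skip count are all illustrative, and the PTS branch (steps S4 to S8) is only reported, not simulated.

```python
from fractions import Fraction


def plan_start(pictures, sample_rate_hz, frame_period):
    """Return (mode, index of first I picture, audio samples to skip)."""
    pic_cnt = 0
    first_i = None
    for pic in pictures:                 # S2, S3: read pictures in order
        if pic["type"] == "I":
            first_i = pic
            break
        pic_cnt += 1                     # S22: count each discarded picture
    if first_i is None:
        raise ValueError("no I picture in the video ES")
    if first_i["pts"] is not None:       # S21: video PTS available
        return ("pts", pic_cnt, 0)       # continue with S4 to S8
    t_p = pic_cnt * frame_period         # S23: equation (1)
    n_skip = int(t_p * sample_rate_hz)   # S24: audio to discard
    return ("offset", pic_cnt, n_skip)   # S25: start both outputs


no_pts = [
    {"type": "P", "pts": None},
    {"type": "B", "pts": None},
    {"type": "B", "pts": None},
    {"type": "I", "pts": None},
]
plan = plan_start(no_pts, 48_000, Fraction(1, 30))
```

Keeping the PTS check ahead of the offset calculation mirrors the flowchart: the offset path is a fallback used only when the first I picture has no PTS.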

With this processing, the output start timing of the audio signal can be delayed by a time corresponding to the playback time of the pictures discarded in the video decoder 14, and playback can start with the audio signal for the period that has no video signal removed. As a result, even when no video PTS is detected, the playback start timings of the video signal and the audio signal can be matched.

The above description covers a decoding and playback apparatus that can match the playback start timings of the video signal and the audio signal when a video PTS cannot be obtained from the video ES while decoding and playing an MPEG1 system stream. The present invention is not limited to MPEG1 decoding and playback apparatuses, however. It is equally applicable to an MPEG2 decoding and playback apparatus that decodes and plays an MPEG2 program stream, when the PTS indicating picture playback timing cannot be obtained from a PES (Packetized Elementary Stream) packet.

FIG. 1 is a block diagram of the MPEG1 decoding and playback apparatus according to the present invention.
FIG. 2 is a flowchart showing the AV synchronization processing according to the present invention.
FIG. 3 is a diagram showing the data structure of an MPEG1 system stream.
FIG. 4 is a block diagram of a conventional MPEG1 decoding and playback apparatus.
FIG. 5 is a timing chart for explaining AV synchronization.
FIG. 6 is a flowchart showing conventional AV synchronization processing.

Explanation of symbols

10 MPEG1 decoding and playback apparatus
14 Video decoder
15 Audio decoder
144 Picture counter
154 Offset calculation unit

Claims

1. A decoding and reproducing method for decoding and reproducing video data and audio data from a compressed audio data stream and a compressed video data stream composed of at least first pictures, each obtained by intra-frame coding an image frame, and second pictures, each obtained by inter-frame predictive coding an image frame, the method comprising:
detecting the initial first picture as viewed from the head of the compressed video data stream;
calculating a playback time that would be taken if all pictures whose display order precedes the initial first picture in the compressed video data stream were played back;
starting video output from video data obtained by decoding the initial first picture; and
starting audio output from the audio data located the playback time after the head of an audio data stream obtained by decoding the compressed audio data stream.

2. The decoding and reproducing method according to claim 1, wherein the playback time is calculated based on the number of pictures from the picture physically located at the head of the compressed video data stream to the picture physically located immediately before the initial first picture, and a predetermined playback time per picture.

3. A decoding and reproducing apparatus comprising:
a video decoder that decodes and reproduces video data from a compressed video data stream composed of at least first pictures, each obtained by intra-frame coding an image frame, and second pictures, each obtained by inter-frame predictive coding an image frame; and
an audio decoder that decodes and reproduces audio data from a compressed audio data stream,
wherein the video decoder starts video output from video data obtained by decoding the initial first picture as viewed from the head of the compressed video data stream, and
the audio decoder skips, from the head of the audio data stream obtained by decoding the compressed audio data stream, audio data corresponding to the playback time of all pictures whose display order precedes the initial first picture, and starts audio output from the audio data following the audio data corresponding to the playback time.

4. The decoding and reproducing apparatus according to claim 3, further comprising calculating means for calculating the playback time,
wherein the audio decoder starts audio output from the audio data located the playback time calculated by the calculating means after the head of the audio data stream obtained by decoding the compressed audio data stream.

5. The decoding and reproducing apparatus according to claim 4, wherein the calculating means counts the number of pictures physically located before the initial first picture as viewed from the head of the compressed video data stream, and calculates the playback time based on the counted number of pictures and a predetermined playback time per picture.