JP6399189B2

JP6399189B2 - Video coding method

Info

Publication number: JP6399189B2
Application number: JP2017197857A
Authority: JP
Inventors: 数井　君彦; 君彦数井; 純平小山; 智史島田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-10-11
Filing date: 2017-10-11
Publication date: 2018-10-03
Anticipated expiration: 2032-10-01
Also published as: JP2018026867A

Description

The present invention relates, for example, to a video encoding method that makes it possible to edit encoded video data without decoding it.

Video data generally has a very large data volume. A device that handles video data therefore compresses it by encoding before transmitting it to another device or storing it in a storage device. Widely used video coding standards include Moving Picture Experts Group phase 2 (MPEG-2), MPEG-4, and H.264 MPEG-4 Advanced Video Coding (MPEG-4 AVC/H.264), all established by the International Standardization Organization/International Electrotechnical Commission (ISO/IEC).

These coding standards employ an inter-coding method, which encodes a picture using information from both that picture and the pictures before and after it, and an intra-coding method, which encodes a picture using only the information it contains. The inter-coding method uses three picture types: the intra-coded picture (I picture), the forward-predicted picture (P picture), which is generally predicted from a past picture, and the bidirectionally predicted picture (B picture), which is generally predicted from both past and future pictures.

In general, a picture or block encoded by the inter-coding method has a smaller code amount than one encoded by the intra-coding method. As a result, the code amount varies from picture to picture within a sequence depending on the coding mode selected. Likewise, the code amount varies from block to block within a picture depending on the coding mode selected.
Therefore, so that a data stream containing the encoded video can be transmitted at a constant transmission rate even when the code amount fluctuates over time, a transmit buffer for the data stream is provided in the transmitting device and a receive buffer for the data stream is provided in the receiving device.

MPEG-2 and MPEG-4 AVC/H.264 each define the behavior of the receive buffer in an idealized video decoding device, called the Video Buffering Verifier (VBV) and the Coded Picture Buffer (CPB), respectively. For convenience, the idealized video decoding device is referred to below simply as the ideal decoder. The ideal decoder is defined to perform instantaneous decoding, in which decoding takes zero time. For example, Patent Document 1 discloses a method of controlling a video encoding device with respect to the VBV.

The video encoding device controls the code amount so that the receive buffer of the ideal decoder neither overflows nor underflows, that is, so that the data for a picture is guaranteed to be in the receive buffer at the time the ideal decoder decodes that picture.
A receive buffer underflow occurs when, with the video encoding device sending the encoded video data stream at a constant transmission rate, the data needed to decode a picture has not been fully delivered by the time the video decoding device is supposed to decode and display that picture. In other words, an underflow means that the data needed to decode a picture is not present in the receive buffer of the video decoding device. The video decoding device then cannot perform decoding, and a frame skip occurs.

To be able to decode without causing a receive buffer underflow, the video decoding device displays each picture after delaying the stream by a predetermined time from its reception time.
As described above, the ideal decoder is defined to complete decoding instantaneously, with a processing time of zero. Therefore, if t(i) is the time at which the i-th picture is input to the video encoding device and tr(i) is the time at which the ideal decoder decodes the i-th picture, the earliest time at which that picture can be displayed equals tr(i). Because the picture display period {t(i+1)-t(i)} equals {tr(i+1)-tr(i)} for every picture, the decoding time tr(i) is the input time t(i) delayed by a fixed time dly, that is, tr(i) = t(i) + dly. The video encoding device must therefore finish delivering the data needed for decoding to the receive buffer of the video decoding device by time tr(i).

The state of the receive buffer is described with reference to FIG. 1. In FIG. 1, the horizontal axis represents time and the vertical axis represents the occupancy of the receive buffer. The solid-line graph 100 shows the buffer occupancy at each point in time.
In the receive buffer, the occupancy is replenished at a predetermined transmission rate, and at the decoding time of each picture the data used to decode that picture is removed from the buffer. The data of the i-th picture starts entering the receive buffer at time at(i), and the last data of the i-th picture enters at time ft(i). The ideal decoder completes decoding of the i-th picture at time tr(i), at which point the i-th picture becomes displayable. However, when B pictures are present, pictures are reordered (the coding order differs from the display order), so the actual display time of the i-th picture may be later than tr(i).
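
The buffer behavior described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not a procedure taken from any standard: the function name `check_receive_buffer`, the bit rate, the picture sizes, the buffer size, and the simplification that each picture starts arriving the moment the previous one has fully arrived (at(i) = ft(i-1)) are all hypothetical. The sketch uses tr(i) = t(i) + dly and reports an underflow whenever ft(i) is later than tr(i).

```python
from fractions import Fraction


def check_receive_buffer(picture_bits, rate, tr, buffer_size):
    """Simulate an idealized receive buffer fed at a constant rate.

    picture_bits[i]: coded size of picture i in bits (hypothetical values)
    rate:            transmission rate in bits per second
    tr[i]:           decoding time of picture i in seconds
    buffer_size:     receive buffer capacity in bits
    Returns a list of (i, at, ft, status) tuples.
    """
    total = sum(picture_bits)
    preceding_bits = 0  # bits of all pictures before picture i
    results = []
    for i, bits in enumerate(picture_bits):
        at = Fraction(preceding_bits, rate)          # first bit of picture i arrives
        ft = Fraction(preceding_bits + bits, rate)   # last bit of picture i arrives
        # Bits delivered by tr[i], minus the bits of earlier pictures that
        # have already been removed for decoding.
        occupancy = min(total, rate * tr[i]) - preceding_bits
        if ft > tr[i]:
            status = "underflow"   # data incomplete at decoding time
        elif occupancy > buffer_size:
            status = "overflow"
        else:
            status = "ok"
        results.append((i, at, ft, status))
        preceding_bits += bits
    return results


# Assumed example: 25 pictures per second, 1 Mbit/s channel, dly = 0.2 s.
period = Fraction(1, 25)
dly = Fraction(1, 5)
t = [i * period for i in range(4)]   # input times t(i)
tr = [ti + dly for ti in t]          # decoding times tr(i) = t(i) + dly
results = check_receive_buffer([120_000, 20_000, 20_000, 200_000],
                               rate=1_000_000, tr=tr, buffer_size=300_000)
for i, at, ft, status in results:
    print(i, float(at), float(ft), status)
```

In this assumed example the large fourth picture finishes arriving at 0.36 s, after its decoding time of 0.32 s, so an encoder performing rate control would have to reduce its code amount or rely on a longer delay.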

The way MPEG-4 AVC/H.264 describes the decoding time and display time of each picture is explained in detail below.

In MPEG-4 AVC/H.264, supplemental information that is not directly related to decoding pixels is described in SEI (Supplemental Enhancement Information) messages. Several dozen SEI types are defined, and the type is identified by the payloadType parameter. SEI is attached to each picture.

A picture at which decoding can be restarted, that is, a picture that can be decoded without any past pictures (generally an I picture), carries a BPSEI (Buffering Period SEI), which is one type of SEI. The BPSEI contains the parameter InitialCpbRemovalDelay. InitialCpbRemovalDelay represents the difference between the time at which the first bit of the BPSEI-carrying picture arrives at the receive buffer and the decoding time of that picture. This difference is expressed in units of a 90 kHz clock.
The decoding time tr(0) of the first picture is InitialCpbRemovalDelay ÷ 90,000 [sec] later than at(0), the time (taken as 0) at which the first bit of the encoded video data arrives at the video decoding device.

In addition, a PTSEI (Picture Timing SEI), another type of SEI, is generally attached to each picture. The PTSEI contains the parameters CpbRemovalDelay and DpbOutputDelay. CpbRemovalDelay represents the difference between the decoding time of the immediately preceding BPSEI-carrying picture and the decoding time of the picture carrying the PTSEI. DpbOutputDelay represents the difference between the decoding time of the picture carrying the PTSEI and the display time of that picture. Both differences are expressed in units of one field picture interval. Consequently, when a picture is a frame, CpbRemovalDelay and DpbOutputDelay are multiples of 2.

The decoding time tr(i) of the second and subsequent pictures is tc * CpbRemovalDelay(i) [sec] later than the decoding time tr(0) of the first picture, where CpbRemovalDelay(i) is the CpbRemovalDelay attached to the i-th picture. tc is the time interval between pictures [sec]; for 29.97 Hz progressive video, for example, tc is 1001/60000.

The display time of each picture, including pictures carrying a BPSEI, is tc * DpbOutputDelay(i) later than tr(i), where DpbOutputDelay(i) is the DpbOutputDelay attached to the i-th picture. In other words, from time tr(0) onward, pictures are decoded and displayed at times that are integer multiples of tc.
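
The timing relations above can be written out as a short worked example. This is a sketch under simplifying assumptions: a single buffering period that starts at the first picture (so every CpbRemovalDelay is measured from tr(0)), at(0) = 0, and illustrative parameter values. The function names `decoding_time` and `display_time` are hypothetical and are not part of any standard or library.

```python
from fractions import Fraction

CLOCK_HZ = 90_000  # InitialCpbRemovalDelay is expressed in 90 kHz clock ticks


def decoding_time(initial_cpb_removal_delay, cpb_removal_delay, tc):
    """tr(i) = InitialCpbRemovalDelay / 90000 + tc * CpbRemovalDelay(i)."""
    tr0 = Fraction(initial_cpb_removal_delay, CLOCK_HZ)
    return tr0 + tc * cpb_removal_delay


def display_time(tr, dpb_output_delay, tc):
    """Display time = tr(i) + tc * DpbOutputDelay(i)."""
    return tr + tc * dpb_output_delay


# tc for 29.97 Hz progressive video, as stated in the text.
tc = Fraction(1001, 60000)

# Assumed sample values: InitialCpbRemovalDelay = 45000 ticks (0.5 s),
# CpbRemovalDelay = 10 and DpbOutputDelay = 4 for some picture i.
tr0 = decoding_time(45_000, 0, tc)
tri = decoding_time(45_000, 10, tc)
shown = display_time(tri, 4, tc)
print(float(tr0), float(tri), float(shown))
```

Exact rational arithmetic (`fractions.Fraction`) is used here so that the 1001/60000 interval does not accumulate floating-point rounding error.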

Depending on how the video data is used, the encoded video may be edited. Editing encoded video means dividing the encoded video data into smaller pieces and joining them to produce different encoded video data. For example, inserting another program (such as a commercial) into an encoded video being broadcast (splicing) is one such editing operation.

When editing inter-frame predictive coded video, an inter-coded picture in particular cannot be decoded correctly on its own. To join two sets of encoded video data at an arbitrary picture position, a device that edits encoded video data therefore first decodes both sets, joins them picture by picture in decoded form, and then re-encodes the joined video data.

However, re-encoding is laborious, so, especially in real-time processing such as splicing, it is common to restrict the positions at which data can be joined and to edit the encoded video data directly without re-encoding. When encoded video data is edited without re-encoding, the first picture of the encoded video data that is joined on the temporally later side must be an I picture. In addition, the encoded video data joined on the later side is restricted to a so-called closed GOP structure, in which no picture following the leading I picture refers to a picture temporally earlier than that I picture. As a result, every picture from the leading I picture onward in the later encoded video data can be decoded correctly after the two sets are joined at the chosen splice point.

Because the closed GOP structure has lower coding efficiency than the non-closed GOP structure, the non-closed GOP structure is sometimes used instead. In that case, some pictures immediately following the leading I picture after the splice point are not decoded correctly, but because these pictures are displayed before the leading I picture, it does not matter if they cannot be displayed. The video decoding device therefore generally masks the pictures that could not be decoded correctly, for example by freezing the display after showing the last picture of the temporally earlier encoded video data.

In the conventional technique, even when inter-frame predictive coded video data is edited without re-encoding, header information is also modified so that no inconsistency arises between the two joined sets of encoded video data.
For example, in MPEG-4 AVC/H.264, POC (Picture Order Count) and FrameNum are carried in the slice header to express the temporal relationship between pictures and to identify reference pictures. POC indicates the relative display order of pictures. FrameNum is a value that increases by 1 each time a reference picture appears in the encoded video. Because POC and FrameNum must be continuous across the two joined sets of encoded video data, every POC and FrameNum in the later set must be rewritten.

In contrast, the scheme disclosed in Non-Patent Document 1 introduces a new method of identifying reference pictures, so FrameNum has been abolished. Furthermore, the POC value of the first picture of the later encoded video data need not be continuous with the earlier encoded video data, so the slice header does not have to be changed. In addition to the IDR (Instantaneous Decoding Refresh) picture defined in MPEG-4 AVC/H.264, the scheme disclosed in Non-Patent Document 1 newly introduces the following picture types: the CRA (Clean Random Access) picture, BLA (Broken Link Access) picture, TFD (Tagged For Discard) picture, DLP (Decodable Leading Picture) picture, and TP (Trailing Picture) picture.

Of these, the CRA picture and the BLA picture are both random access points. A picture that serves as a random access point does not refer to any other picture when it is decoded. When the video decoding device starts decoding at a CRA picture or a BLA picture, all pictures other than the TFD pictures immediately following that CRA or BLA picture can be decoded correctly.

A TFD picture appears immediately after a CRA picture or a BLA picture and refers to a picture that precedes the CRA or BLA picture both in time and in decoding order. In a non-closed GOP structure conforming to MPEG-2, the B pictures immediately following the I picture at the head of a GOP correspond to TFD pictures.

A BLA picture arises from an editing operation on encoded video data. Of the two sets of encoded video data being joined, the first picture of the later set is generally a CRA picture; when this CRA picture ends up in the middle of the joined encoded video data, its type is changed from CRA to BLA. In the scheme disclosed in Non-Patent Document 1, the POC values are allowed to be discontinuous where a BLA picture appears. Moreover, the TFD pictures immediately following this BLA picture cannot be decoded correctly no matter where decoding of the joined encoded video data starts, because the pictures they refer to have been lost. The video encoding device may therefore delete from the encoded video data the TFD pictures that follow the leading BLA picture of the later set.

A DLP picture, like a TFD picture, appears immediately after a CRA picture or a BLA picture. Unlike a TFD picture, a DLP picture does not refer to any picture that precedes the CRA or BLA picture in time and in decoding order. A DLP picture can therefore be decoded correctly even when decoding starts at the CRA or BLA picture.

A TP picture appears later in decoding order than the CRA or BLA picture and its TFD and DLP pictures, and is also temporally later than the CRA or BLA picture. A TP picture can therefore be decoded correctly even when decoding starts at the CRA or BLA picture.
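
The roles of these picture types at a splice point can be illustrated with a toy model. The sketch below treats a coded stream as nothing more than a list of picture types in decoding order; the `PicType` enum and the `splice` function are hypothetical names introduced for this example, and real bitstream editing operates on NAL units rather than on such a list. It changes a leading CRA picture of the later stream to a BLA picture and drops the TFD pictures associated with it, which the text above describes as optional.

```python
from enum import Enum


class PicType(Enum):
    IDR = "IDR"
    CRA = "CRA"
    BLA = "BLA"
    TFD = "TFD"
    DLP = "DLP"
    TP = "TP"


RANDOM_ACCESS = (PicType.IDR, PicType.CRA, PicType.BLA)
LEADING = (PicType.TFD, PicType.DLP)


def splice(first, second):
    """Join two streams (lists of PicType in decoding order) without re-encoding."""
    if not second or second[0] not in RANDOM_ACCESS:
        raise ValueError("the later stream must start at a random access picture")
    head = second[0]
    if head is PicType.CRA:
        head = PicType.BLA  # the CRA picture now sits in the middle of the stream
    kept = [head]
    in_leading_run = True  # still inside the leading pictures of the head picture
    for pic in second[1:]:
        if pic not in LEADING:
            in_leading_run = False
        if in_leading_run and pic is PicType.TFD:
            continue  # its reference pictures were lost at the splice point
        kept.append(pic)
    return list(first) + kept


T = PicType
joined = splice([T.IDR, T.TP, T.TP],
                [T.CRA, T.TFD, T.DLP, T.TFD, T.TP, T.CRA, T.TFD, T.TP])
print([p.value for p in joined])
```

Only the TFD pictures belonging to the first picture of the later stream are dropped in this sketch; TFD pictures after a later CRA picture are kept because the pictures they refer to are still present in the joined stream.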

Patent Document 1: JP 2003-179938 A

Non-Patent Document 1: JCTVC-J1003, "High-Efficiency Video Coding (HEVC) text specification Draft 8", Joint Collaborative Team on Video Coding of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, July 2012

In the scheme disclosed in Non-Patent Document 1, as in MPEG-4 AVC/H.264, the decoding time and display time of each encoded picture are determined using the parameters InitialCpbRemovalDelay, CpbRemovalDelay, and DpbOutputDelay. When two sets of encoded video data are joined, the CpbRemovalDelay and DpbOutputDelay of the pictures after the splice point must be corrected to appropriate values so that decoding and display continue without interruption across the splice point.

Specifically, the video encoding device or the video decoding device must correct the CpbRemovalDelay of the leading CRA picture of the later encoded video data on the basis of the number of pictures that follow the last BPSEI-carrying picture in the earlier encoded video data. Furthermore, to guarantee continuity of the CPB, the video encoding device or the video decoding device increases the value of CpbRemovalDelay. In addition, when TFD pictures are removed from the later encoded video data, the video encoding device or the video decoding device must change the CpbRemovalDelay of the pictures decoded after the removed TFD pictures, as well as the DpbOutputDelay of the leading CRA picture at the splice point and of the pictures decoded before the removed TFD pictures.
Thus, even in the scheme disclosed in Non-Patent Document 1, the contents of the PTSEI still have to be changed in an editing operation that joins two sets of encoded video data.

Accordingly, an object of this specification is to provide a video encoding method that enables continuous decoding and display when two sets of inter-frame predictive coded video data are joined, without changing the parameters in the headers of the original encoded video data.

According to one embodiment, a video encoding method is provided that generates encoded joined video data by joining first video data and second video data, each encoded by an inter-frame prediction scheme. The method includes obtaining correction information for the decoding delay and the display delay, and attaching that correction information to the joined video data. The correction information allows the video decoding device to continuously decode and display the pictures of the encoded second video data from the leading encoded picture onward, where the leading encoded picture is the picture joined immediately after the encoded first video data, even when one or more pictures later in coding order than the leading encoded picture are removed. The correction information is calculated, for each picture to be removed, on the basis of the decoding interval between that picture and the picture immediately preceding it in decoding order.

The objects and advantages of the invention are realized and attained by means of the elements and combinations particularly pointed out in the claims.
It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention as claimed.

The video encoding method disclosed in this specification enables continuous decoding and display when two sets of inter-frame predictive coded video data are joined, without changing the parameters in the headers of the original encoded video data.

FIG. 1 is a diagram showing the relationship between the transition of the receive buffer occupancy and the display time.
FIG. 2 is a diagram showing the relationship between the display order and decoding order of the pictures contained in video data and the decoding delay and display delay.
FIG. 3 is a diagram explaining the decoding delay and display delay of the pictures after the splice point when two sets of encoded video data are joined.
FIG. 4 is a diagram explaining the data structure of one picture of the encoded video according to the first embodiment.
FIG. 5 is a schematic block diagram of the video encoding device according to the first embodiment.
FIG. 6 is an operation flowchart of the video encoding process according to the first embodiment.
FIG. 7 is an operation flowchart of the video editing process according to the first embodiment.
FIG. 8 is a schematic block diagram of the video decoding device according to the first embodiment.
FIG. 9 is an operation flowchart of the video decoding process according to the first embodiment.
FIG. 10 is a diagram explaining the decoding delay and display delay of the pictures after the splice point when two sets of encoded video data are joined, according to the second embodiment.
FIG. 11 is a diagram explaining the data structure of one picture of the encoded video according to the second embodiment.
FIG. 12 is a block diagram of a computer that operates as the video encoding device or the video decoding device by running a computer program that implements the functions of the units of the video encoding device or the video decoding device according to any of the embodiments or their modifications.

Video encoding devices and video decoding devices according to various embodiments are described below with reference to the drawings. When joining two sets of encoded video data without decoding them, the video encoding device calculates values used to correct the parameters representing the decoding time and display time of the pictures after the splice point, and adds those values to the header information of the pictures after the splice point. In this way, the video encoding device makes it unnecessary to rewrite the parameters in the headers of the original encoded video data even when two sets of encoded video data are joined.

In the embodiments, a picture is a frame. However, a picture is not limited to a frame and may be a field. A frame is one still image in the video data, whereas a field is a still image obtained by extracting only the odd-numbered lines or only the even-numbered lines of a frame. The encoded video may be color video or monochrome video.

First, with reference to FIG. 2, the values of the picture decoding delay CpbRemovalDelay and the display delay DpbOutputDelay in the first embodiment are described using one picture coding structure as an example.
In FIG. 2, the picture coding structure 201, which is one example of a picture coding structure, contains multiple pictures. Each block in the picture coding structure 201 represents one picture. Of the two characters shown inside the block for each picture, the letter on the left indicates the coding mode applied to that picture: I, P, and B denote an I picture, a P picture, and a B picture, respectively. The number on the right indicates the order in which the picture is input to the video encoding device. The input order matches the order in which pictures are output from the video decoding device. Each arrow above the picture coding structure 201 indicates the reference picture referred to by a picture encoded with forward frame prediction. For example, picture P4 refers to picture I0, which precedes it. Similarly, each arrow below the picture coding structure 201 indicates the reference picture referred to by a picture encoded with backward frame prediction. For example, picture B2 refers to picture P4, which follows it.

Below the picture coding structure 201, the decoding order 202 in which the video decoding apparatus decodes the pictures contained in the picture coding structure 201 is shown. Each block in the decoding order 202 represents one picture and, as in the picture coding structure 201, the characters in each block indicate the coding mode and the order of input to the video encoding apparatus. The decoding order 202 is the same as the order in which the video encoding apparatus encodes the pictures. The arrows above and below the decoding order 202 indicate, respectively, the reference pictures to which pictures coded by forward frame prediction refer and the reference pictures to which pictures coded by backward frame prediction refer.

In the decoding order 202, BPSEI is attached to each picture that has "BPSEI" written below it. In this example, BPSEI is attached to every I picture. That is, for every I picture, the parameter InitialCpbRemovalDelay is described, which represents the difference between the time at which the first bit of the I picture arrives at the receive buffer and the time at which the I picture is decoded.

Block row 203, shown below the decoding order 202, indicates the values of CpbRemovalDelay and DpbOutputDelay contained in the PTSEI attached to each picture. Each block in the upper row of block row 203 gives the CpbRemovalDelay value of the picture located directly above it in the decoding order 202, and each block in the lower row gives the DpbOutputDelay value of that picture. CpbRemovalDelay represents the position of a picture in coding order, counted from the most recent preceding picture (in coding order) to which BPSEI is attached. For example, picture P8 is the fifth picture in coding order counted from picture I0. In the present embodiment, each picture is a frame and the inter-picture time interval tc is expressed in field units, so the CpbRemovalDelay of picture P8 is 10 (= 5 * 2).
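The counting rule for CpbRemovalDelay can be sketched as follows. This is only an illustration of the rule as described for FIG. 2: the function name, the list of picture labels, and the convention that the very first picture has a delay of 0 are assumptions and do not come from the specification.

```python
FIELDS_PER_FRAME = 2  # tc is in field units and each picture is a frame


def cpb_removal_delays(coding_order, bpsei_pictures):
    """Return {picture: CpbRemovalDelay} for pictures listed in coding order.

    Each delay is the distance, in field units, from the most recent
    preceding picture that carries BPSEI. The first picture is assumed
    to carry BPSEI and is given a delay of 0 (an assumption).
    """
    delays = {}
    anchor = 0  # index of the most recent preceding BPSEI picture
    for index, name in enumerate(coding_order):
        delays[name] = (index - anchor) * FIELDS_PER_FRAME
        if name in bpsei_pictures:
            anchor = index  # later pictures are counted from here
    return delays


# Coding order of the first pictures of structure 201 (I pictures carry BPSEI).
coding_order = ["I0", "P4", "B2", "B1", "B3", "P8", "B6", "B5", "B7"]
delays = cpb_removal_delays(coding_order, bpsei_pictures={"I0"})
print(delays["P8"])  # 10, matching the value given for picture P8
```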

DpbOutputDelay, on the other hand, is the display delay that the video decoding apparatus needs in order to output the pictures consecutively and in the correct order. For example, the DpbOutputDelay of picture P4 is 10. This is the delay required to correctly display picture B1, for which the difference between the input order and the coding order in the video encoding apparatus is largest. That is, picture B1 is decoded two picture times after picture P4 is decoded, so picture P4 must be displayed three picture times after the earliest time at which picture B1 can be displayed, namely the decoding time of picture B1. The difference between the decoding time and the display time of picture P4 is therefore five picture times, and because tc is expressed in field units, DpbOutputDelay is 10.
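One way to reproduce this reasoning is to shift the whole display timeline by the smallest amount that lets every picture be displayed no earlier than it is decoded. The sketch below is an assumed formulation, not the procedure of the specification; it reads the input position from the digits of each picture label and reproduces the value 10 stated for picture P4.

```python
def dpb_output_delays(decoding_order, fields_per_frame=2):
    """Return {picture: DpbOutputDelay} in field units.

    Assumes one picture is decoded and one is displayed per picture time,
    and that the label digits (e.g. "P4" -> 4) give the input/display position.
    """
    display_pos = {name: int(name[1:]) for name in decoding_order}
    decode_pos = {name: i for i, name in enumerate(decoding_order)}
    # Smallest common shift so that no picture is shown before it is decoded.
    shift = max(decode_pos[n] - display_pos[n] for n in decoding_order)
    return {
        n: (display_pos[n] + shift - decode_pos[n]) * fields_per_frame
        for n in decoding_order
    }


decoding_order = ["I0", "P4", "B2", "B1", "B3", "P8", "B6", "B5", "B7"]
dpb = dpb_output_delays(decoding_order)
print(dpb["P4"])  # 10: five picture times between decoding and display
print(dpb["B1"])  # 0: B1 is displayed as soon as it is decoded
```

Picture B1 determines the shift here because it has the largest gap between decoding position and input position, which corresponds to the role the text assigns to it.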

Next, with reference to FIG. 3, a description is given of the values that the decoding delay CpbRemovalDelay and the display delay DpbOutputDelay must take in the pictures of the encoded video data following the splice point when two sets of encoded video data are spliced, so that the decoding delay and the display delay remain consistent before and after the splice point.

Each block in the first encoded video data 301, which is spliced ahead of the splice point, represents one picture and, as in FIG. 2, the characters in each block indicate the coding mode and the order of input to the video encoding apparatus. In this example, the coding structure of the first encoded video data 301 is identical to the coding structure 201 shown in FIG. 2.

In this example, the second encoded video data 302 is spliced immediately after the last picture B15 of the first encoded video data. In the second encoded video data 302 as well, each block represents one picture, and the characters in each block indicate the coding mode and the order of input to the video encoding apparatus. The arrows above the second encoded video data 302 indicate the reference pictures to which pictures B70, B69, and B71 refer in forward frame prediction, and the arrows below it indicate the reference pictures to which they refer in backward frame prediction. The coding structure of the second encoded video data 302 is identical to the coding structure 201 shown in FIG. 2, except for pictures B70, B69, and B71.

The coding order of pictures B70, B69, and B71 is the same as that of the bidirectionally predicted pictures in the coding structure 201 shown in FIG. 2. Their reference pictures, however, differ from those of the bidirectionally predicted pictures in the coding structure 201. Pictures B70 and B71 refer only to a picture with a later display time, namely picture I72. Picture B69, on the other hand, refers only to a picture with an earlier display time, namely picture I68. This situation arises, for example, when there is a scene change between pictures B69 and B70. Because the image changes greatly at the scene change boundary, a bidirectionally predicted picture located near the boundary refers only to pictures on its own side of the boundary, which gives better prediction efficiency. In this example, B69 is a TFD picture, and B70 and B71 are DLP pictures. Of the second encoded video data 302, picture I72 and the pictures that follow it are spliced after picture B15 of the first encoded video data. The method of Non-Patent Document 1 imposes two constraints: the display time of a TFD picture must be earlier than that of a DLP picture, and a DLP picture must not be referenced by a TP picture.

Block row 303, shown below the second encoded video data 302, indicates the values of the decoding delay CpbRemovalDelay and the display delay DpbOutputDelay contained in the PTSEI attached to each picture of the second encoded video data 302. Each block in the upper row of block row 303 gives the CpbRemovalDelay value of the picture of the second encoded video data 302 located directly above it, and each block in the lower row gives the DpbOutputDelay value of that picture.

Below block row 303, the combined encoded video data 304 obtained by splicing the first encoded video data 301 and the second encoded video data 302 is shown. In this example, the combined encoded video data 304 does not contain picture B67 of the second encoded video data 302 or any encoded picture that precedes B67 in coding order. Picture B69 is a TFD picture that refers to picture I68, an encoded picture that precedes picture I72 in coding order. When the data is spliced at picture I72, picture B69 can therefore no longer be reproduced correctly, so picture B69 is removed at the time of splicing. However, picture B69 may instead be left in the combined encoded video data without being removed. Pictures B70 and B71 are DLP pictures that do not refer to any encoded picture preceding picture I72 in coding order, and they can be reproduced correctly. Because pictures B70 and B71 are not referenced by picture P76 or any later picture, removing them together with TFD picture B69 would not affect the reproduction of picture P76 and later pictures.

Block row 305 indicates the values of the decoding delay CpbRemovalDelay and the display delay DpbOutputDelay that pictures I72, B70, B71, P76, B74, B73, and B75 should have in the combined encoded video data 304. Each block in the upper row of block row 305 gives the CpbRemovalDelay value of the picture of the combined encoded video data 304 located directly above it, and each block in the lower row gives the DpbOutputDelay value of that picture.

After splicing, the decoding delay CpbRemovalDelay of picture I72 must match the encoded picture interval measured from picture I12, the immediately preceding picture that has BPSEI. In this example, picture I72 is the eighth picture in coding order counted from picture I12, so its CpbRemovalDelay is 16 (= 8 * 2). The display delay DpbOutputDelay of picture I72 must also be corrected so that picture B73, which is decoded after picture I72, is displayed correctly. The DpbOutputDelay value of picture I72 differs before and after picture B69 is removed. After the removal, the DpbOutputDelay value decreases by the decoding interval of each removed picture that follows I72 in decoding order, where the decoding interval of a removed picture is the difference between its decoding time and the decoding time of the picture immediately preceding it in decoding order. In this example, picture B69 is the removed picture, and its decoding interval (the difference between the decoding time of B69 and that of B70, the picture immediately preceding it in decoding order) is 2, so the DpbOutputDelay of picture I72 becomes 2. Similarly, the DpbOutputDelay of picture B70 decreases by the decoding interval of the removed picture that follows B70 in decoding order, that is, by 2, and becomes 2.

Furthermore, the decoding delay CpbRemovalDelay of each of pictures B71, P76, B74, B73, and B75 differs before and after picture B69 is removed. After the removal, the CpbRemovalDelay of each of these pictures equals its value before the removal minus the decoding interval of the removed picture that precedes it in decoding order. In this example, the CpbRemovalDelay values of pictures B71, P76, B74, B73, and B75 become 4, 6, 8, 10, and 12, respectively, each obtained by subtracting the decoding interval of TFD picture B69, which is 2, from the original value. For DLP picture B70, no removed picture precedes it in decoding order, so its CpbRemovalDelay is unchanged even when picture B69 is removed. The DpbOutputDelay values of pictures P76, B74, B73, and B75 are also unchanged. In addition, for pictures that follow, in input order, the picture that becomes the first CRA picture after the splice, neither CpbRemovalDelay nor DpbOutputDelay needs to be corrected.
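The corrections around the splice point can be checked with a short sketch. The original CpbRemovalDelay values of the second stream (2, 4, 6, ... counted from I72) are inferred from the corrected values quoted above rather than stated directly, and the function and variable names are hypothetical.

```python
def adjusted_cpb_removal_delays(pictures, removed):
    """pictures: (name, CpbRemovalDelay) pairs in decoding order, starting
    with the splice-point picture, whose own delay is treated as 0 when
    computing decoding intervals. Returns corrected delays of kept pictures
    (the splice-point picture itself is handled separately)."""
    adjusted = {}
    removed_interval_sum = 0
    previous_delay = 0  # splice-point picture counts as 0
    for name, delay in pictures[1:]:
        interval = delay - previous_delay  # decoding interval of this picture
        previous_delay = delay
        if name in removed:
            removed_interval_sum += interval
        else:
            adjusted[name] = delay - removed_interval_sum
    return adjusted


# Second stream from the splice point onward; values before removal (inferred).
second = [("I72", 0), ("B70", 2), ("B69", 4), ("B71", 6),
          ("P76", 8), ("B74", 10), ("B73", 12), ("B75", 14)]
corrected = adjusted_cpb_removal_delays(second, removed={"B69"})
print(corrected)  # B70 stays 2; B71, P76, B74, B73, B75 become 4, 6, 8, 10, 12

# The splice-point picture is counted from the last BPSEI picture of stream 1.
first_tail = ["I12", "B10", "B9", "B11", "P16", "B14", "B13", "B15"]
i72_delay = len(first_tail) * 2  # I72 is the 8th picture after I12
print(i72_delay)  # 16
```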

As described above, when two sets of encoded video data are spliced, the decoding delay CpbRemovalDelay and the display delay DpbOutputDelay of some of the pictures in the encoded video data following the splice point must be corrected at the time of decoding. In the present embodiment, therefore, parameters are added to the header of the encoded video data so that the video decoding apparatus can rewrite CpbRemovalDelay and DpbOutputDelay to appropriate values when it decodes the combined encoded video data, without the video encoding apparatus having to modify the CpbRemovalDelay and DpbOutputDelay values of the pictures contained in the original encoded video data before splicing.

With reference to FIG. 4, the structure of the encoded video data according to the first embodiment is described. This structure contains the parameters that can be used to rewrite the decoding delay CpbRemovalDelay and the display delay DpbOutputDelay to appropriate values.

As shown in FIG. 4, the data structure 400 of one picture contains six kinds of Network Abstraction Layer (NAL) units 410 to 415. Each of the NAL units 410 to 415 conforms to the NAL unit defined in the method of Non-Patent Document 1 and in MPEG-4 AVC/H.264. A header NUH 420 is attached to each NAL unit. The header NUH 420 contains a NalUnitType field that indicates the type of the NAL unit. A NalUnitType of '1' or '2' indicates a TP picture. A NalUnitType of '7' indicates a BLA picture from which decoding can be restarted and which may be immediately followed by a TFD picture and a DLP picture. A NalUnitType of '8' indicates a BLA picture from which decoding can be restarted and which may be immediately followed by a DLP picture. A NalUnitType of '9' indicates a BLA picture from which decoding can be restarted and which is immediately followed by neither a TFD picture nor a DLP picture. A NalUnitType of '12' indicates a CRA picture from which decoding can be restarted. A NalUnitType of '13' indicates a DLP picture. A NalUnitType of '14' indicates a TFD picture.

The NalUnitType value of each picture may be set to values other than those given above.
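For reference, the NalUnitType values listed above can be collected into an enumeration. The member names and the helper functions are hypothetical; only the numeric values and their meanings come from the description, which also notes that other values may be used.

```python
from enum import IntEnum


class NalUnitType(IntEnum):
    TP_1 = 1              # TP picture
    TP_2 = 2              # TP picture
    BLA_TFD_DLP = 7       # BLA; TFD and DLP pictures may follow immediately
    BLA_DLP = 8           # BLA; DLP pictures may follow immediately
    BLA_NO_LEADING = 9    # BLA; neither TFD nor DLP pictures follow
    CRA = 12              # CRA picture
    DLP = 13              # DLP picture
    TFD = 14              # TFD picture


def is_bla(nal_unit_type):
    """True for the three BLA variants."""
    return nal_unit_type in (NalUnitType.BLA_TFD_DLP,
                             NalUnitType.BLA_DLP,
                             NalUnitType.BLA_NO_LEADING)


def is_leading(nal_unit_type):
    """True for DLP and TFD pictures, which follow a CRA or BLA picture."""
    return nal_unit_type in (NalUnitType.DLP, NalUnitType.TFD)


print(NalUnitType(12).name)          # CRA
print(is_bla(NalUnitType(7)))        # True
print(is_leading(NalUnitType.TP_1))  # False
```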

Each NAL unit is described below.
The NAL unit 410 is a delimiter (DELIM) NAL unit and indicates a picture boundary.
The NAL unit 411 is a sequence parameter set (SPS) NAL unit, which describes parameters common to the entire sequence of the encoded video. The SPS NAL unit 411 is attached to a picture from which decoding can be restarted.
The NAL unit 412 is a picture parameter set (PPS) NAL unit, which describes parameters common to a plurality of encoded pictures. The PPS NAL unit 412 is attached to a picture from which decoding can be restarted and may also be attached to other pictures.
The NAL unit 413 is a BPSEI NAL unit and is attached only to a picture from which decoding can be restarted. In the present embodiment, parameters that the video decoding apparatus uses to correct the decoding delay and the display delay of the pictures following the splice point are added to the BPSEI NAL unit 413.
The NAL unit 414 is a PTSEI NAL unit and is attached to every picture.
The NAL unit 415 is a slice (SLICE) NAL unit, which is the substance of the encoded picture.

The BPSEI NAL unit 413 according to the present embodiment contains (N+1) pairs of an InitialCpbRemovalDelay field and an InitialCpbRemovalDelayOffset field, where N is an integer equal to or greater than 0. These fields may be defined in the same way as in the method of Non-Patent Document 1 and in MPEG-4 AVC/H.264.

There are a plurality of InitialCpbRemovalDelay and InitialCpbRemovalDelayOffset fields so that values suited to each of the (N+1) different bit rates at which the encoded bitstream may be transmitted can be described. InitialCpbRemovalDelayOffset represents the difference between the time at which the video encoding apparatus completes encoding of the first picture and the time at which transmission of the encoded picture data to the video decoding apparatus starts.

The PTSEI NAL unit 414 contains a decoding delay CpbRemovalDelay field, a display delay DpbOutputDelay field, and a NumRemovedTfds field. The NumRemovedTfds field is one example of correction information used to correct the decoding delay and the display delay. The NumRemovedTfds field describes the sum of the decoding intervals of the pictures removed between the picture to which this PTSEI is attached and the next BPSEI-attached picture in decoding order. The decoding interval of a picture is defined as the CpbRemovalDelay field value of the PTSEI attached to that picture minus the CpbRemovalDelay field value of the PTSEI attached to the picture immediately preceding it in decoding order. When the immediately preceding picture in decoding order is a BLA picture, the CpbRemovalDelay field value of the PTSEI attached to the BLA picture is treated as 0. When the encoded bitstream is first generated, the NumRemovedTfds field value is set to 0.
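The PTSEI fields and the decoding-interval definition can be written down as a small data structure. This is a minimal sketch: the class name, attribute names, and function signature are assumptions, and the example values are those given for pictures I72 and B70 after splicing in FIG. 3.

```python
from dataclasses import dataclass


@dataclass
class Ptsei:
    cpb_removal_delay: int     # decoding delay CpbRemovalDelay
    dpb_output_delay: int      # display delay DpbOutputDelay
    num_removed_tfds: int = 0  # 0 when the bitstream is first generated


def decoding_interval(current, previous, previous_is_bla):
    """Decoding interval of `current`: its CpbRemovalDelay minus that of the
    picture immediately preceding it in decoding order. If that preceding
    picture is a BLA picture, its CpbRemovalDelay is treated as 0."""
    previous_delay = 0 if previous_is_bla else previous.cpb_removal_delay
    return current.cpb_removal_delay - previous_delay


# After splicing, I72 (a BLA picture) carries CpbRemovalDelay 16 and B70 carries 2.
i72 = Ptsei(cpb_removal_delay=16, dpb_output_delay=2)
b70 = Ptsei(cpb_removal_delay=2, dpb_output_delay=2)
interval_b70 = decoding_interval(b70, i72, previous_is_bla=True)
print(interval_b70)  # 2: the BLA picture's delay of 16 is ignored
```

Treating the BLA picture's value as 0 matters here because its CpbRemovalDelay is counted from a picture in the other stream, so subtracting it directly would give a meaningless interval.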

FIG. 5 is a schematic configuration diagram of the video encoding apparatus according to the first embodiment. The video encoding apparatus 1 includes a control unit 11, an encoding control unit 12, a picture encoding unit 13, a splice point identification information processing unit 14, and a data splicing unit 15. Each of these units is mounted in the video encoding apparatus 1 as a separate circuit. Alternatively, these units may be mounted in the video encoding apparatus 1 as a single integrated circuit on which the circuits implementing the functions of the respective units are integrated. As a further alternative, these units may be functional modules implemented by a computer program executed on a processor included in the video encoding apparatus 1.

The control unit 11 controls the processing of each unit of the video encoding apparatus 1 when video data is encoded or when an editing operation is performed on encoded video data. For example, the control unit 11 determines the GOP structure and other settings to be applied to the video data to be encoded, based on the nature of the video data, such as the positions of scene changes, and on the reproduction quality and compression ratio required of the encoded video data. The control unit 11 then notifies the encoding control unit 12 of the GOP structure and other settings.

First, the video encoding process for encoding video data is described. This process uses the encoding control unit 12 and the picture encoding unit 13.

In accordance with the GOP structure notified by the control unit 11, the encoding control unit 12 determines the coding order and the coding mode of each picture (for example, one of the intra coding mode, the forward prediction mode, and the bidirectional prediction mode). The encoding control unit 12 then determines the CRA picture insertion interval, the number of pictures reordered at the time of encoding, and the maximum display delay, according to the coding mode of each picture, its position in the GOP structure, and so on. In the example shown in FIG. 2, the CRA picture insertion interval is 12, the number of reordered pictures is 2, and the maximum display delay is 5. The encoding control unit 12 further generates the header information of each picture according to these values.

For example, when the picture type is an I picture (CRA picture), that is, a picture encoded without reference to any other picture, and the picture is not the first picture of the encoded video data, the encoding control unit 12 sets the NalUnitType in the NUH 720 of each slice of the picture to '12'. The NalUnitType in the NUH 720 of each slice of the first picture of the encoded video data is set to '10' (IDR picture). When the number of reordered pictures is 1 or more, the encoding control unit 12 sets NalUnitType to '14' (TFD picture) for a picture that immediately follows a CRA picture and refers to a picture earlier than the CRA picture in both decoding order and display order. For a picture that immediately follows a CRA picture, has a display time earlier than that of the CRA picture, and does not refer to any picture earlier than the CRA picture in decoding order and display order, the encoding control unit 12 sets NalUnitType to '13' (DLP picture). For all other pictures, the encoding control unit 12 sets NalUnitType to '1' or '2' (TP picture).
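The assignment rules above can be summarized as a decision function. This is a sketch under assumptions: the flag names are hypothetical, "immediately follows a CRA picture" is reduced to a boolean supplied by the caller, and the choice of '1' rather than '2' for TP pictures is arbitrary because the text does not say how the two are distinguished.

```python
IDR, CRA, DLP, TFD, TP = 10, 12, 13, 14, 1  # TP may also be 2


def assign_nal_unit_type(is_intra, is_first_picture=False, follows_cra=False,
                         displayed_before_cra=False,
                         refs_picture_before_cra=False, reorder_count=2):
    """Return the NalUnitType for one picture according to the rules above."""
    if is_intra:
        # First picture of the stream is IDR, any other I picture is CRA.
        return IDR if is_first_picture else CRA
    if follows_cra and reorder_count >= 1 and refs_picture_before_cra:
        return TFD  # depends on a picture that precedes the CRA picture
    if follows_cra and displayed_before_cra and not refs_picture_before_cra:
        return DLP  # shown before the CRA picture but decodable from it
    return TP


# Pictures of the second stream in FIG. 3 (flags chosen to match the text).
types = {
    "I72": assign_nal_unit_type(is_intra=True),
    "B70": assign_nal_unit_type(False, follows_cra=True,
                                displayed_before_cra=True),
    "B69": assign_nal_unit_type(False, follows_cra=True,
                                displayed_before_cra=True,
                                refs_picture_before_cra=True),
    "P76": assign_nal_unit_type(False),
}
print(types)  # {'I72': 12, 'B70': 13, 'B69': 14, 'P76': 1}
```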

The encoding control unit 12 notifies the picture encoding unit 13 of the NalUnitType in the NUH 720 of each slice header of the picture to be encoded. The encoding control unit 12 also derives the decoding delay CpbRemovalDelay and the display delay DpbOutputDelay in the PTSEI of each picture from the picture prediction structure, as shown in FIG. 2, and notifies the picture encoding unit 13 of these values.

Furthermore, when the NalUnitType in the NUH 720 of each slice of a picture is '10' or '12', the encoding control unit 12 attaches BPSEI to that picture.

For each picture, the encoding control unit 12 notifies the picture encoding unit 13 of the coding mode and the header information of the picture and instructs it to encode the picture.

In accordance with the instruction from the encoding control unit 12, the picture encoding unit 13 encodes the picture in the specified coding mode, conforming to any video coding scheme capable of inter-frame predictive coding. The video coding scheme to which the picture encoding unit 13 conforms can be, for example, MPEG-4 AVC/H.264 or MPEG-2. The picture encoding unit 13 then stores the encoded video data containing the encoded pictures in a storage unit (not shown).

Next, the editing process performed when two sets of encoded video data are spliced is described. This process uses the splice point identification information processing unit 14 and the data splicing unit 15.

The splice point identification information processing unit 14 reads from a storage unit (not shown) two sets of encoded video data selected, for example, via a user interface unit (not shown). Then, in accordance with an externally applied control signal (not shown), the splice point identification information processing unit 14 identifies the first picture at the splice point in the second encoded video data, that is, the set that is spliced temporally after the other. The external control signal specifies, for example, a number of encoded pictures counted from the beginning of the second encoded video data, and the splice point identification information processing unit 14 takes as the splice point, for example, the latest CRA picture located at or before that number of encoded pictures.
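The splice point search can be sketched as a backward scan for a CRA picture. The representation of the stream as a list of NalUnitType values in coding order, the function name, and the reading of the control signal as a count of pictures from the beginning (so that the first `picture_count` pictures are candidates) are all assumptions.

```python
CRA = 12  # NalUnitType of a CRA picture


def find_splice_point(nal_unit_types, picture_count):
    """Return the index of the latest CRA picture among the first
    `picture_count` pictures (coding order), or None if there is none."""
    last = min(picture_count, len(nal_unit_types)) - 1
    for index in range(last, -1, -1):  # scan backward from the given position
        if nal_unit_types[index] == CRA:
            return index
    return None


# Illustrative stream: IDR, TP, TP, CRA, DLP, TFD, DLP, TP, CRA, TP
stream = [10, 1, 1, 12, 13, 14, 13, 1, 12, 1]
print(find_splice_point(stream, 7))  # 3: the CRA picture at index 3
print(find_splice_point(stream, 9))  # 8: a later CRA picture is now in range
print(find_splice_point(stream, 3))  # None: no CRA picture that early
```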

When the reordering count of the CRA picture at the splice point is 1 or more, the splice point identification information processing unit 14 rewrites the NalUnitType of each slice of that picture from '12' to '7', which denotes a BLA picture that may be followed by TFD pictures. This NalUnitType value therefore indicates that two sets of encoded moving image data were spliced at this point and that one or more encoded pictures later than the splice-point BLA picture in both encoding order and decoding order have been removed. The splice point identification information processing unit 14 also outputs all pictures from the splice-point CRA picture onward in the second encoded moving image data to the data combining unit 15, and instructs it to remove the TFD pictures that immediately follow the splice-point CRA picture. When the reordering count is 0, the splice point identification information processing unit 14 rewrites the NalUnitType of each slice of the splice-point CRA picture from '12' to '9', which denotes a BLA picture that is not immediately followed by TFD or DLP pictures.
Next, the splice point identification information processing unit 14 calculates the decoding interval of each TFD picture to be removed, and increases the NumRemovedTfds field of the non-TFD picture immediately preceding that TFD picture by the decoding interval of the removed TFD pictures that follow the non-TFD picture. When all pictures have the same decoding interval, the NumRemovedTfds field of a non-TFD picture ends up equal to the number, counted in fields, of removed pictures that follow it in decoding order. The splice point identification information processing unit 14 then updates the NumRemovedTfds field of the PTSEI attached to each picture of the second encoded moving image data that precedes a removed TFD picture in decoding order.
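The NalUnitType rewrite described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rewrite_splice_point_nal_type` and the constant names are introduced here and do not appear in the description, while the numeric values 12, 7, and 9 are the ones quoted in the text.

```python
# NalUnitType values quoted in the description; the names are hypothetical.
NAL_CRA = 12                 # CRA picture
NAL_BLA_MAY_HAVE_TFD = 7     # BLA picture that may be followed by TFD pictures
NAL_BLA_NO_LEADING = 9       # BLA picture not followed by TFD or DLP pictures


def rewrite_splice_point_nal_type(nal_unit_type, reorder_count):
    """Return the NalUnitType to write into the slices of the splice-point picture."""
    if nal_unit_type != NAL_CRA:
        raise ValueError("the splice-point picture is expected to be a CRA picture")
    if reorder_count >= 1:
        # Leading pictures may exist, so TFD pictures may follow the BLA picture.
        return NAL_BLA_MAY_HAVE_TFD
    # Reordering count 0: no TFD or DLP picture follows immediately.
    return NAL_BLA_NO_LEADING
```

A caller would apply the returned value to the NUH of every slice of the splice-point picture before handing the stream to the data combining unit.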

The data combining unit 15 appends the second encoded moving image data output from the splice point identification information processing unit 14 to the first encoded moving image data, which is placed temporally before the splice point. In doing so, the data combining unit 15 removes the TFD pictures that immediately follow the first picture of the second encoded moving image data and whose correct decoding is not guaranteed. The data combining unit 15 may also treat DLP pictures as TFD pictures and remove them.
The data combining unit 15 then stores the spliced encoded moving image data, obtained by combining the first and second encoded moving image data, in a storage unit (not shown).

FIG. 6 is an operation flowchart of the moving image encoding process executed by the moving image encoding apparatus according to the first embodiment. The moving image encoding apparatus 1 encodes the entire moving image sequence to be encoded in accordance with this flowchart.
Before encoding of the sequence starts, the picture prediction structure, such as the GOP structure, is determined, for example by the control unit 11 (step S101). The picture prediction structure is then reported to the encoding control unit 12.

Based on the reported picture prediction structure, the position of the picture to be encoded relative to the start of the moving image data, and similar information, the encoding control unit 12 determines the encoding mode to apply to the picture to be encoded and generates the header information of that picture (step S102).

After step S102, the encoding control unit 12 passes the data of the picture to be encoded, its encoding mode, and its header information to the encoding unit 13. The encoding unit 13 encodes the picture according to the encoding mode and header information it received, and attaches the header information to the encoded picture data (step S103).

The control unit 11 then determines whether any unencoded picture remains in the moving image sequence (step S104). If an unencoded picture remains (Yes in step S104), the control unit 11 takes the next unencoded picture as the picture to be encoded and repeats the processing from step S102.
If no unencoded picture remains (No in step S104), the control unit 11 ends the encoding process.

FIG. 7 is an operation flowchart of the moving image editing process executed by the moving image encoding apparatus according to the first embodiment. In this example, DLP pictures are not removed; only TFD pictures are removed.
The splice point identification information processing unit 14 initializes a list L[] for the TFD and DLP pictures that are not removed, and initializes to 2 a variable m that represents the number of such non-removed pictures plus 2 (step S201). If no TFD picture appears after the last DLP picture in decoding order, the variable m may instead represent just the number of TFD and DLP pictures that are not removed.

Next, the splice point identification information processing unit 14 sequentially reads from the storage unit (not shown) the encoded pictures up to the splice point in the first encoded moving image data, which is placed before the splice point (step S202).
The splice point identification information processing unit 14 also sequentially reads from the storage unit (not shown) the encoded pictures from the splice point onward in the second encoded moving image data, which is placed after the splice point (step S203). For the first CRA picture read from the second encoded moving image data, the splice point identification information processing unit 14 then rewrites the NalUnitType value in the NUH of each slice to the value that denotes a BLA picture (step S204).

Next, the splice point identification information processing unit 14 determines whether the NalUnitType of the next picture in decoding order is '14', that is, whether that picture is a TFD picture (step S205). If it is a TFD picture (Yes in step S205), the splice point identification information processing unit 14 outputs an instruction to remove the TFD picture to the data combining unit 15. It also adds the decoding interval of that TFD picture, that is, the difference between the PTSEI CpbRemovalDelay values of the TFD picture and of the picture immediately preceding it in decoding order, to each of the 0th through m-th entries of the list L[] (step S206). The splice point identification information processing unit 14 then returns to step S205 and evaluates the NalUnitType of the next picture.

If the picture is not a TFD picture (No in step S205), the splice point identification information processing unit 14 determines whether the NalUnitType of the next picture in decoding order is '13', that is, whether that picture is a DLP picture (step S207). If it is a DLP picture (Yes in step S207), the splice point identification information processing unit 14 increments the variable m by 1 (step S208) and then repeats the processing from step S205. If the next picture in decoding order is not a DLP picture (No in step S207), it is neither a TFD picture nor a DLP picture and is therefore a TP picture. No TFD picture follows a TP picture in decoding order. The splice point identification information processing unit 14 therefore updates the NumRemovedTfds field of the PTSEI of the BLA picture and of each DLP picture based on the list L[] (step S209). Specifically, for the non-TFD pictures up to the m-th one counted from the BLA picture in decoding order, it sets the NumRemovedTfds field of the PTSEI of the k-th picture to L[k]. The splice point identification information processing unit 14 then outputs the BLA picture and all subsequent pictures to the data combining unit 15.
The data combining unit 15 appends the pictures of the second encoded moving image data, from the BLA picture onward, to the last picture of the first encoded moving image data before the splice point. In doing so, the data combining unit 15 removes the TFD pictures that the splice point identification information processing unit 14 has instructed it to remove.
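Steps S205 to S209 above can be sketched as follows. This is a hedged illustration, not the apparatus itself. It uses the simplified counting that the text permits, with m starting at 0 and counting only the DLP pictures kept so far, whereas the flowchart initializes m to 2. The dictionary keys, the function name `edit_second_stream`, and the sample stream are assumptions. The sample decoding order and delay values are chosen only to be consistent with the NumRemovedTfds values quoted later for FIG. 10, and delays are expressed relative to the BLA picture.

```python
NAL_DLP = 13  # DLP picture, value quoted in the text
NAL_TFD = 14  # TFD picture, value quoted in the text


def edit_second_stream(pictures):
    """Drop TFD pictures after the BLA picture and fill in NumRemovedTfds.

    `pictures` is the second stream in decoding order, starting with the
    picture already rewritten to a BLA picture. Each entry is a dict with the
    hypothetical keys "name", "nal_unit_type", and "cpb_removal_delay".
    """
    removed_sums = [0]   # list L[]: entry 0 belongs to the BLA picture
    m = 0                # simplified counting: DLP pictures kept so far
    kept = [pictures[0]]
    prev_delay = pictures[0]["cpb_removal_delay"]
    index = 1
    while index < len(pictures):
        pic = pictures[index]
        if pic["nal_unit_type"] == NAL_TFD:      # step S205: Yes
            interval = pic["cpb_removal_delay"] - prev_delay
            for k in range(m + 1):               # step S206
                removed_sums[k] += interval
        elif pic["nal_unit_type"] == NAL_DLP:    # step S207: Yes
            m += 1                               # step S208
            removed_sums.append(0)
            kept.append(pic)
        else:                                    # first TP picture ends the scan
            break
        prev_delay = pic["cpb_removal_delay"]
        index += 1
    # Step S209: the k-th kept picture receives L[k]; later pictures get 0.
    result = [dict(p, num_removed_tfds=removed_sums[k]) for k, p in enumerate(kept)]
    result += [dict(p, num_removed_tfds=0) for p in pictures[index:]]
    return result


# Assumed sample stream (NalUnitType 1 is a placeholder for a TP picture).
sample = [
    {"name": "I8", "nal_unit_type": 7, "cpb_removal_delay": 0},
    {"name": "B4", "nal_unit_type": NAL_TFD, "cpb_removal_delay": 2},
    {"name": "B2", "nal_unit_type": NAL_TFD, "cpb_removal_delay": 4},
    {"name": "B1", "nal_unit_type": NAL_TFD, "cpb_removal_delay": 6},
    {"name": "B3", "nal_unit_type": NAL_TFD, "cpb_removal_delay": 8},
    {"name": "B6", "nal_unit_type": NAL_DLP, "cpb_removal_delay": 10},
    {"name": "B5", "nal_unit_type": NAL_TFD, "cpb_removal_delay": 12},
    {"name": "B7", "nal_unit_type": NAL_DLP, "cpb_removal_delay": 14},
    {"name": "P12", "nal_unit_type": 1, "cpb_removal_delay": 16},
]
edited = edit_second_stream(sample)
```

With this input the kept pictures are I8, B6, B7, and P12, and their NumRemovedTfds values are 10, 2, 0, and 0.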

Next, a moving image decoding apparatus that decodes encoded moving image data encoded or edited by the moving image encoding apparatus 1 according to the first embodiment is described.

FIG. 8 is a schematic configuration diagram of the moving image decoding apparatus according to the first embodiment. The moving image decoding apparatus 2 includes a control unit 21, a header information analysis unit 22, a picture decoding/display time determination unit 23, a picture decoding unit 24, and a frame memory 25. Each of these units is implemented in the moving image decoding apparatus 2 as a separate circuit. Alternatively, the units may be implemented as a single integrated circuit that integrates the circuits realizing their functions, or as functional modules realized by a computer program executed on a processor of the moving image decoding apparatus 2.

The control unit 21 controls the processing of each unit of the moving image decoding apparatus 2 when encoded moving image data is decoded.

The header information analysis unit 22 analyzes the header information of the encoded moving image data and outputs the parameters needed to determine picture decoding and display times to the picture decoding/display time determination unit 23. These parameters include, for example, the NalUnitType of each picture and the CpbRemovalDelay, DpbOutputDelay, and NumRemovedTfds values contained in the PTSEI.

The picture decoding/display time determination unit 23 checks the NUH of the slices of the picture to be decoded, as passed from the header information analysis unit 22. If the NalUnitType in the NUH is '7', '8', or '9', the picture decoding/display time determination unit 23 determines that the picture to be decoded is a BLA picture.

When the picture to be decoded is a BLA picture, the picture decoding/display time determination unit 23 uses, as the decoding delay CpbRemovalDelay of the BLA picture, the value obtained as follows instead of the CpbRemovalDelay in the PTSEI attached to the BLA picture.

The picture decoding/display time determination unit 23 computes the sum A of the decoding intervals of the pictures from the picture following the most recent BPSEI-attached picture before the BLA picture up to the BLA picture itself, and sets the decoding delay CpbRemovalDelay of the BLA picture to A. When all pictures have the same decoding interval, the picture decoding/display time determination unit 23 may instead use the number of pictures, counted in fields, over the same range as the decoding delay CpbRemovalDelay of the BLA picture.
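The computation of the sum A can be sketched as below. This is a minimal illustration under assumptions: each picture is a dict with the hypothetical keys "has_bpsei" and "decoding_interval" (the interval between decoding of the previous picture and this one), and the function name is introduced here for illustration only.

```python
def bla_cpb_removal_delay(pictures, bla_index):
    """Sum the decoding intervals from the picture after the previous
    BPSEI-attached picture up to and including the BLA picture."""
    total = 0
    # Walk backwards from the BLA picture until the previous BPSEI picture.
    for i in range(bla_index, -1, -1):
        if i != bla_index and pictures[i]["has_bpsei"]:
            return total
        total += pictures[i]["decoding_interval"]
    raise ValueError("no BPSEI-attached picture precedes the BLA picture")
```

For example, if three pictures lie between the previous BPSEI-attached picture and the BLA picture and every decoding interval is 2, the function returns 8, which matches the field-count shortcut mentioned for equal intervals.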

The picture decoding/display time determination unit 23 also checks the NumRemovedTfds field of the PTSEI of the BLA picture. If NumRemovedTfds is not zero, the picture decoding/display time determination unit 23 determines that the TFD pictures immediately following the BLA picture have been removed, and takes the value obtained by subtracting NumRemovedTfds from the display delay DpbOutputDelay of the BLA picture as the corrected display delay DpbOutputDelay of the BLA picture.

The picture decoding/display time determination unit 23 further performs the following processing on each picture that follows the BLA picture in decoding order, up to the next BPSEI-attached picture.
For the decoding delay CpbRemovalDelay of the picture being processed, the picture decoding/display time determination unit 23 takes the difference between the NumRemovedTfds in the PTSEI attached to the BLA picture and the NumRemovedTfds in the PTSEI attached to the picture being processed. This difference corresponds to the sum of the decoding intervals of the removed TFD pictures that lay between the BLA picture and the picture being processed. The original CpbRemovalDelay minus this difference is the corrected decoding delay CpbRemovalDelay. For the display delay DpbOutputDelay of the picture being processed, the original DpbOutputDelay minus the NumRemovedTfds in the PTSEI attached to that picture is the corrected display delay DpbOutputDelay.
For a TP picture, the picture decoding/display time determination unit 23 takes the original decoding delay CpbRemovalDelay of the picture minus the NumRemovedTfds in the PTSEI attached to the BLA picture as the corrected CpbRemovalDelay.
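The correction rules above reduce to two subtractions per picture, sketched here in Python. The function and parameter names are assumptions made for illustration. The sample values for the BLA picture and the DLP picture are the ones quoted for pictures I8 and B6 in the FIG. 10 walk-through later in the description; the TP values are made up.

```python
def correct_bla_display_delay(dpb_output_delay, bla_num_removed_tfds):
    """Corrected DpbOutputDelay of the BLA picture itself."""
    return dpb_output_delay - bla_num_removed_tfds


def correct_following_picture(cpb_removal_delay, dpb_output_delay,
                              num_removed_tfds, bla_num_removed_tfds):
    """Corrected (CpbRemovalDelay, DpbOutputDelay) of a picture after the BLA.

    For a TP picture num_removed_tfds is 0, so the decoding delay shrinks by
    the BLA picture's NumRemovedTfds and the display delay is unchanged.
    """
    corrected_cpb = cpb_removal_delay - (bla_num_removed_tfds - num_removed_tfds)
    corrected_dpb = dpb_output_delay - num_removed_tfds
    return corrected_cpb, corrected_dpb


bla_dpb = correct_bla_display_delay(20, 10)            # picture I8
dlp_delays = correct_following_picture(10, 6, 2, 10)   # picture B6
tp_delays = correct_following_picture(14, 8, 0, 10)    # made-up TP values
```

Keeping the per-picture NumRemovedTfds separate from the BLA picture's value is what lets the decoder undo the removal without the editor rewriting any CpbRemovalDelay or DpbOutputDelay field.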

For all other pictures, the picture decoding/display time determination unit 23 uses the CpbRemovalDelay and DpbOutputDelay in the PTSEI attached to the picture as its decoding delay CpbRemovalDelay and display delay DpbOutputDelay, respectively.

The picture decoding/display time determination unit 23 determines the decoding time of each picture from the decoding delay CpbRemovalDelay obtained above, and issues a decoding instruction to the picture decoding unit 24 at that time. Likewise, it determines the display time of each picture from the display delay DpbOutputDelay obtained above, and issues a display instruction to the frame memory 25 at that time.

On receiving a decoding instruction for the picture to be decoded, the picture decoding unit 24 decodes that picture using the reference pictures stored in the frame memory 25, and stores the decoded picture in the frame memory 25. The picture decoding unit 24 performs decoding in accordance with the same encoding scheme as that used by the encoding unit of the moving image encoding apparatus 1.

The frame memory 25 stores decoded pictures. It outputs a decoded picture to the picture decoding unit 24 as a reference picture for pictures decoded later. It also outputs pictures to a display unit (not shown) in accordance with the display instructions received from the picture decoding/display time determination unit 23.

FIG. 9 is an operation flowchart of the moving image decoding process executed by the moving image decoding apparatus according to the first embodiment. The moving image decoding apparatus 2 decodes the entire moving image sequence to be decoded in accordance with this flowchart.

Before decoding of the sequence starts, the control unit 21 initializes a variable flag to 0 (step S301). The variable flag indicates whether the current picture is a non-BLA picture whose CpbRemovalDelay and DpbOutputDelay need correction. If flag is '1', CpbRemovalDelay and DpbOutputDelay must be corrected; if flag is '0', no correction is needed.

Next, the header information analysis unit 22 analyzes the header information of the picture to be decoded and outputs the parameters needed to determine the picture decoding and display times to the picture decoding/display time determination unit 23 (step S302).
The picture decoding/display time determination unit 23 determines whether the variable flag is '1' (step S303).
If flag is '1' (Yes in step S303), the picture decoding/display time determination unit 23 corrects the decoding delay CpbRemovalDelay of the picture to be decoded, which is a non-BLA picture, using the NumRemovedTfds of that picture and the NumRemovedTfds of the most recent BLA picture (step S304). It also corrects the display delay DpbOutputDelay of the picture to be decoded using the NumRemovedTfds of that picture.

After step S304, or if flag is '0' in step S303 (No in step S303), the picture decoding/display time determination unit 23 determines whether a BPSEI is attached to the picture to be decoded (step S305).

If a BPSEI is attached to the picture to be decoded (Yes in step S305), the picture decoding/display time determination unit 23 determines whether the picture is a BLA picture (step S306). If it is not a BLA picture (No in step S306), the picture decoding/display time determination unit 23 resets the variable flag to 0 (step S307).

If the picture to be decoded is a BLA picture (Yes in step S306), the picture decoding/display time determination unit 23 corrects the decoding delay CpbRemovalDelay and the display delay DpbOutputDelay of the picture and sets the variable flag to 1 (step S308). As described above, it sets the decoding delay CpbRemovalDelay of the BLA picture to the sum of the decoding intervals of the pictures from the picture following the most recent BPSEI-attached picture up to the BLA picture, and sets the display delay DpbOutputDelay of the BLA picture to its original value minus NumRemovedTfds.

After step S307 or S308, or if no BPSEI is attached to the picture to be decoded in step S305 (No in step S305), the control unit 21 determines whether any undecoded picture remains in the encoded moving image data (step S309). If an undecoded picture remains (Yes in step S309), the control unit 21 returns the processing to step S302, and the processing from step S302 is repeated with the next picture in decoding order as the picture to be decoded.
If no undecoded picture remains (No in step S309), the control unit 21 ends the moving image decoding process.
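The loop of FIG. 9 can be sketched as follows. This is an illustrative reading of steps S301 to S309, not the apparatus itself. The dictionary keys, the function name `determine_delays`, and the running `interval_sum` are assumptions, and each picture is assumed to carry a known decoding interval, as in the equal-interval case mentioned in the text. Actual decoding and display are omitted.

```python
BLA_TYPES = {7, 8, 9}  # NalUnitType values that denote a BLA picture in the text


def determine_delays(pictures):
    """Return (name, CpbRemovalDelay, DpbOutputDelay) per picture after correction."""
    flag = 0            # step S301
    bla_removed = 0     # NumRemovedTfds of the most recent BLA picture
    interval_sum = 0    # decoding intervals since the last BPSEI-attached picture
    results = []
    for pic in pictures:                          # step S302 (headers already parsed)
        cpb, dpb = pic["cpb"], pic["dpb"]
        removed = pic.get("num_removed_tfds", 0)
        interval_sum += pic["interval"]
        if flag == 1:                             # step S303: Yes, so step S304
            cpb -= bla_removed - removed
            dpb -= removed
        if pic["has_bpsei"]:                      # step S305
            if pic["nal_unit_type"] in BLA_TYPES:  # step S306: Yes, so step S308
                cpb = interval_sum
                dpb = pic["dpb"] - removed
                bla_removed = removed
                flag = 1
            else:                                 # step S307
                flag = 0
            interval_sum = 0
        results.append((pic["name"], cpb, dpb))
    return results                                # step S309: No more pictures


# Assumed spliced stream: three pictures of a first stream, then a BLA picture
# whose stored CpbRemovalDelay (99) is stale and must be replaced.
spliced = [
    {"name": "I0", "nal_unit_type": 12, "has_bpsei": True, "cpb": 0, "dpb": 4, "interval": 0},
    {"name": "P1", "nal_unit_type": 1, "has_bpsei": False, "cpb": 2, "dpb": 4, "interval": 2},
    {"name": "P2", "nal_unit_type": 1, "has_bpsei": False, "cpb": 4, "dpb": 4, "interval": 2},
    {"name": "I8", "nal_unit_type": 7, "has_bpsei": True, "cpb": 99, "dpb": 20,
     "interval": 2, "num_removed_tfds": 10},
    {"name": "B6", "nal_unit_type": 13, "has_bpsei": False, "cpb": 10, "dpb": 6,
     "interval": 2, "num_removed_tfds": 2},
    {"name": "B7", "nal_unit_type": 13, "has_bpsei": False, "cpb": 14, "dpb": 8,
     "interval": 2, "num_removed_tfds": 0},
]
delays = determine_delays(spliced)
```

In this sketch the BLA picture receives the decoding delay 6 (three intervals of 2) and the display delay 10, and the two following pictures are corrected to (2, 4) and (4, 8).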

A specific example of the method of deriving NumRemovedTfds and of correcting CpbRemovalDelay and DpbOutputDelay described above is given with reference to FIG. 10.
Each block in the first encoded moving image data 1001, which is placed before the splice point, represents one picture. As in FIG. 2, the characters in each block indicate the encoding mode and the order of input to the moving image encoding apparatus.

In this example, the second encoded moving image data 1002 is spliced immediately after the last picture B11 of the first encoded moving image data. In the second encoded moving image data 1002 as well, each block represents one picture, and the characters in each block indicate the encoding mode and the order of input to the moving image encoding apparatus. The arrows above the second encoded moving image data 1002 indicate the reference pictures that pictures B4 to B7 refer to in forward frame prediction. The arrows below the second encoded moving image data 1002 indicate the reference pictures that pictures B4 to B7 refer to in backward frame prediction.

In the second encoded moving image data 1002, pictures B4, B2, B1, B3, and B5 are TFD pictures, as labeled below the second encoded moving image data 1002. Pictures B6 and B7 are DLP pictures.

The block sequence 1003 shown below the second encoded moving image data 1002 gives the values of the decoding delay CpbRemovalDelay and the display delay DpbOutputDelay contained in the PTSEI attached to each picture of the second encoded moving image data 1002. Each block in the upper row of the block sequence 1003 shows the decoding delay CpbRemovalDelay of the picture of the second encoded moving image data 1002 located directly above it. Each block in the lower row shows the display delay DpbOutputDelay of the picture located directly above it.

Below the block sequence 1003 is the spliced encoded moving image data 1004, in which the first encoded moving image data 1001 and the second encoded moving image data 1002 have been combined. In this example, the TFD pictures B4, B2, B1, B3, and B5 of the second encoded moving image data 1002 have been removed from the spliced encoded moving image data 1004.

Below the spliced encoded moving image data 1004, the NumRemovedTfds values 1005 of the spliced encoded moving image data 1004 are shown. The NumRemovedTfds of the BLA picture I8 holds the sum of the decoding intervals of the removed TFD pictures (B4, B2, B1, B3, and B5) that follow I8 in decoding order. In this example that is 10, the number of removed pictures after I8 counted in fields. The NumRemovedTfds of the DLP picture B6 holds the sum of the decoding intervals of the removed TFD picture (B5) that follows B6 in decoding order. In this example that is 2, the number of removed pictures after B6 counted in fields. From picture B7 onward, no removed TFD picture follows in decoding order, so NumRemovedTfds remains 0.

The block sequence 1006 below the NumRemovedTfds values 1005 gives the decoding delay CpbRemovalDelay and the display delay DpbOutputDelay of the spliced encoded moving image data 1004 after correction based on NumRemovedTfds. Each block in the upper row of the block sequence 1006 shows the corrected decoding delay CpbRemovalDelay of the picture shown above it, and each block in the lower row shows the corrected display delay DpbOutputDelay of that picture.

For the BLA picture I8, the corrected display delay DpbOutputDelay is 10, obtained by subtracting the NumRemovedTfds value 10 from the original display delay DpbOutputDelay of 20. As a result, the corrected display delay of picture I8, like its original value, can be expressed as the difference between the decoding time and the display time of picture I8, taking as reference the display time of picture B9, which has the largest reordering count among the pictures following I8.

For the DLP picture B6, the corrected decoding delay CpbRemovalDelay is 2, obtained by subtracting from the original value 10 the difference 8 between the NumRemovedTfds of picture I8 (10) and the NumRemovedTfds of picture B6 (2). The corrected display delay DpbOutputDelay of picture B6 is 4, obtained by subtracting the NumRemovedTfds of picture B6 (2) from the original value 6. For picture B7 and the pictures after it, NumRemovedTfds is 0, so the corrected decoding delay CpbRemovalDelay is the original value minus the NumRemovedTfds of picture I8. The display delay DpbOutputDelay of picture B7 and the pictures after it does not change.

As described above, when two or more sets of encoded moving image data are spliced without being decoded, the moving image encoding apparatus according to this embodiment only needs to record in the encoded moving image data the parameters for correcting the decoding delay and display delay, which are determined by the number of pictures removed by the splicing. It does not need to rewrite the decoding delay and display delay parameters determined at encoding time. The moving image decoding apparatus according to this embodiment can use these correction parameters to correct the decoding delay and display delay of each picture of the spliced encoded moving image data, and can therefore decode and display each picture at the appropriate time.

Next, a second embodiment will be described. The second embodiment differs from the first embodiment in the structure of the encoded moving image data.

The structure of the encoded moving image data in the second embodiment will be described with reference to FIG. 11. As in the structure of an encoded picture according to the first embodiment shown in FIG. 4, the data structure 1100 of one picture includes six types of NAL units 1110 to 1115. Of these, the BPSEI 1113 and the PTSEI 1114 differ from the BPSEI 413 and the PTSEI 414 shown in FIG. 4. The DELIM 1110, SPS 1111, PPS 1112, SLICE 1115, and NUH 1120, on the other hand, are equivalent to the DELIM 410, SPS 411, PPS 412, SLICE 415, and NUH 420 shown in FIG. 4, respectively.

The BPSEI 1113 includes a NumEntries field. This field holds the value obtained by adding 1 to a variable m, where m is the number of TFD pictures and DLP pictures located between the BLA picture and the next CRA picture that were not deleted at the time of combining, plus 2. The BPSEI 1113 further includes NumEntries AltCpbRemovalDelayOffset fields and NumEntries AltDpbOutputDelayOffset fields. The NumEntries field, the AltCpbRemovalDelayOffset fields, and the AltDpbOutputDelayOffset fields are another example of correction information for correcting the decoding delay and the display delay.
The PTSEI 1114, on the other hand, unlike the PTSEI 414, does not include a NumRemovedTfds field.
When the NumEntries field is 0, the moving picture decoding apparatus does not need to change the values of CpbRemovalDelay and DpbOutputDelay for the picture to which the BPSEI is attached or for the subsequent pictures (up to the next picture to which a BPSEI is attached). When NumEntries is not zero, the moving picture decoding apparatus calculates the corrected decoding delay CpbRemovalDelay of the k-th picture in decoding order, counted from the picture to which the BPSEI is attached, by subtracting AltCpbRemovalDelayOffset[k] from the original value of CpbRemovalDelay. Similarly, the moving picture decoding apparatus calculates the corrected display delay DpbOutputDelay by subtracting AltDpbOutputDelayOffset[k] from the original value of DpbOutputDelay.

As described above, the SEI that stores the correction values for the CpbRemovalDelay field and the DpbOutputDelay field differs from that of the first embodiment. Consequently, the moving picture coding apparatus according to the second embodiment differs from that of the first embodiment in the processing of the connection point identification information processing unit 14. The processing of the connection point identification information processing unit 14 is therefore described below.

The connection point identification information processing unit 14 stores, in the NumEntries field, the value obtained by adding 1 to the variable m calculated according to the operation flowchart of the moving image editing process in FIG. 7. It further stores the value L[0] - L[k] in the k-th AltCpbRemovalDelayOffset field (k = [0, m-1]) and the value L[k] in the k-th AltDpbOutputDelayOffset field.
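As a minimal sketch of how these two sets of fields could be filled, assume the values L[k] produced by the FIG. 7 flowchart are already available as a Python list holding exactly the entries to be written. The function name and the sample values are illustrative and do not come from the specification.

```python
def build_offset_fields(L):
    """Build the offset fields from the per-picture values L[k].

    Returns (alt_cpb_removal_delay_offset, alt_dpb_output_delay_offset),
    one entry per element of L.
    """
    # k-th AltCpbRemovalDelayOffset field: L[0] - L[k]
    alt_cpb = [L[0] - L[k] for k in range(len(L))]
    # k-th AltDpbOutputDelayOffset field: L[k]
    alt_dpb = [L[k] for k in range(len(L))]
    return alt_cpb, alt_dpb


# Made-up input values, for illustration only.
alt_cpb, alt_dpb = build_offset_fields([10, 2, 0])
print(alt_cpb)  # [0, 8, 10]
print(alt_dpb)  # [10, 2, 0]
```

Because the first entry is L[0] - L[0], the decoding delay offset at index 0 is always zero in this sketch.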

Next, the operation of the moving picture decoding apparatus according to the second embodiment will be described. The moving picture decoding apparatus according to the second embodiment has the same configuration as the moving picture decoding apparatus according to the first embodiment. However, the processing of the picture decoding/display time determination unit 23 differs from that of the first embodiment. The processing of the picture decoding/display time determination unit 23 is therefore described below.
Only when the NumEntries field of the BPSEI of the BPSEI-attached picture immediately preceding the picture to be decoded is not zero does the picture decoding/display time determination unit 23 correct the values of the decoding delay CpbRemovalDelay and the display delay DpbOutputDelay in the PTSEI of the picture to be decoded, as follows.

Let k (k = 0, 1, 2, ...) denote the position in decoding order counted from the immediately preceding BPSEI-attached picture (in this case, the BLA picture). When k is equal to or greater than NumEntries, the picture decoding/display time determination unit 23 takes, as the corrected value of CpbRemovalDelay for the k-th picture, the original value of the decoding delay CpbRemovalDelay minus the value of AltCpbRemovalDelayOffset[NumEntries-1]. When k is smaller than NumEntries, the picture decoding/display time determination unit 23 takes, as the corrected value of CpbRemovalDelay for the k-th picture, the original value of the decoding delay CpbRemovalDelay minus the value of AltCpbRemovalDelayOffset[k], and takes, as the corrected value of DpbOutputDelay, the original value of the display delay DpbOutputDelay minus the value of AltDpbOutputDelayOffset[k].
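The decoder-side rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the apparatus itself: the function name is hypothetical, the two offset lists stand for the AltCpbRemovalDelayOffset and AltDpbOutputDelayOffset fields of the BPSEI, NumEntries is taken to be the length of those lists, and the sample values are made up.

```python
def correct_delays_second_embodiment(k, cpb_removal_delay, dpb_output_delay,
                                     alt_cpb_offsets, alt_dpb_offsets):
    """Return the corrected (CpbRemovalDelay, DpbOutputDelay) of the k-th picture.

    k counts pictures in decoding order from the preceding BPSEI-attached
    picture, which itself has k = 0.
    """
    num_entries = len(alt_cpb_offsets)  # assumption: equals the NumEntries field
    if num_entries == 0:
        # NumEntries == 0: both delays are left unchanged.
        return cpb_removal_delay, dpb_output_delay
    if k >= num_entries:
        # Past the last entry: reuse the last decoding delay offset;
        # the display delay is not corrected.
        return cpb_removal_delay - alt_cpb_offsets[num_entries - 1], dpb_output_delay
    # Within the table: apply the k-th offset to each delay.
    return (cpb_removal_delay - alt_cpb_offsets[k],
            dpb_output_delay - alt_dpb_offsets[k])


alt_cpb = [0, 8, 10]   # made-up offsets
alt_dpb = [10, 2, 0]
print(correct_delays_second_embodiment(1, 10, 6, alt_cpb, alt_dpb))  # (2, 4)
print(correct_delays_second_embodiment(5, 20, 3, alt_cpb, alt_dpb))  # (10, 3)
```

Reusing the last entry for all later pictures means the table only needs one entry per picture whose offsets differ, plus a final entry that applies from then on.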

FIG. 12 is a configuration diagram of a computer that operates as a moving picture coding apparatus or a moving picture decoding apparatus by running a computer program that implements the functions of the units of the moving picture coding apparatus or the moving picture decoding apparatus according to any of the above embodiments or their modifications.

The computer 100 includes a user interface unit 101, a communication interface unit 102, a storage unit 103, a storage medium access device 104, and a processor 105. The processor 105 is connected to the user interface unit 101, the communication interface unit 102, the storage unit 103, and the storage medium access device 104 via, for example, a bus.

The user interface unit 101 includes, for example, an input device such as a keyboard and a mouse, and a display device such as a liquid crystal display. Alternatively, the user interface unit 101 may include a device in which the input device and the display device are integrated, such as a touch panel display. In response to a user operation, for example, the user interface unit 101 outputs to the processor 105 an operation signal that selects the moving image data to be encoded, the encoded moving image data to be edited, or the encoded moving image data to be decoded. The user interface unit 101 may also display decoded moving image data received from the processor 105.

The communication interface unit 102 may include a communication interface, and its control circuit, for connecting the computer 100 to a device that generates moving image data, for example, a video camera. Such a communication interface can be, for example, Universal Serial Bus (USB).

The communication interface unit 102 may further include a communication interface, and its control circuit, for connecting to a communication network that conforms to a communication standard such as Ethernet (registered trademark).

In this case, the communication interface unit 102 acquires the moving image data to be encoded, the encoded moving image data to be edited, or the encoded moving image data to be decoded from another device connected to the communication network, and passes that data to the processor 105. The communication interface unit 102 may also output encoded moving image data, combined encoded moving image data, or decoded moving image data received from the processor 105 to another device via the communication network.

The storage unit 103 includes, for example, a readable and writable semiconductor memory and a read-only semiconductor memory. The storage unit 103 stores the computer program for the moving image encoding process or the moving image decoding process that is executed on the processor 105, together with data generated during or as a result of these processes.

The storage medium access device 104 is a device that accesses a storage medium 106 such as a magnetic disk, a semiconductor memory card, or an optical storage medium. For example, the storage medium access device 104 reads from the storage medium 106 a computer program for the moving image encoding process or the moving image decoding process to be executed on the processor 105, and passes it to the processor 105.

The processor 105 encodes moving image data by executing the computer program for the moving image encoding process according to any of the above embodiments or their modifications. Alternatively, the processor 105 generates combined encoded moving image data by combining two encoded moving image data streams. The processor 105 then stores the generated combined encoded moving image data in the storage unit 103 or outputs it to another device via the communication interface unit 102. The processor 105 also decodes encoded moving image data by executing the computer program for the moving image decoding process according to any of the above embodiments or their modifications. The processor 105 then stores the decoded moving image data in the storage unit 103, displays it on the user interface unit 101, or outputs it to another device via the communication interface unit 102.

A computer program that, when executed on a computer, implements the functions of the units of the moving picture coding apparatus and the moving picture decoding apparatus according to the above embodiments or their modifications may be distributed in a form recorded on a recording medium such as a semiconductor memory or an optical recording medium. Such recording media do not, however, include carrier waves.

The moving picture coding apparatus and the moving picture decoding apparatus according to the above embodiments or their modifications are used in a variety of applications. For example, they are incorporated in a video camera, a video transmission device, a video reception device, a videophone system, a computer, or a mobile phone.

All examples and specific terms recited herein are intended for instructional purposes, to help the reader understand the present invention and the concepts contributed by the inventor to furthering the art. They are to be construed as not being limited to the configuration of any example in this specification, or to such specifically recited examples and conditions, that relate to showing the superiority or inferiority of the present invention. Although embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations can be made to them without departing from the spirit and scope of the present invention.

1 Moving picture coding apparatus
11 Control unit
12 Encoding control unit
13 Picture encoding unit
14 Connection point identification information processing unit
15 Data combining unit
2 Moving picture decoding apparatus
21 Control unit
22 Header information analysis unit
23 Picture decoding/display time determination unit
24 Picture decoding unit
25 Frame memory

Claims

A moving image encoding method for generating combined encoded moving image data by combining first moving image data and second moving image data, each encoded by inter-frame predictive coding, the method including: obtaining correction information for a decoding delay and a display delay such that, even when one or more pictures that are later in encoding order than a first encoded picture of the second moving image data, the first encoded picture being the picture combined immediately after the first moving image data, have been removed from the pictures included in the second moving image data, a moving picture decoding apparatus can continuously decode and display the first encoded picture and the subsequent pictures in the second moving image data; and adding the correction information to the combined moving image data, wherein the correction information is calculated based on, for each removed picture, the decoding interval between that removed picture and the picture immediately preceding it in decoding order.

The moving image encoding method according to claim 1, wherein the correction information is obtained for the first encoded picture and for a picture whose decoding time is later, and whose display time is earlier, than that of the first encoded picture, and the value of the correction information corresponds to the sum, taken over each picture that is later in decoding order in the combined moving image data than the picture for which the correction information is obtained and that has been removed from the second moving image data, of the decoding interval between that removed picture and the picture immediately preceding it in decoding order.