JP2015180038A - Image encoder, image decoder, image processing system, image encoding method and image decoding method

Info

Publication number: JP2015180038A
Application number: JP2014220796A
Authority: JP
Inventors: Kengo Terada; Toshiro Sasai; Tetsushi Yoshikawa
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2014-03-18
Filing date: 2014-10-29
Publication date: 2015-10-08
Anticipated expiration: 2034-10-29
Also published as: JP6509523B2

Abstract

PROBLEM TO BE SOLVED: To provide an image encoding device that can refer to an appropriate reference image in inter prediction. SOLUTION: An image encoding device that encodes a plurality of display target images constituting a video by using inter prediction includes: an acquisition unit 71 that acquires a reference-only image, which is an image different from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and which is used only for reference in the inter prediction; and an encoding unit 72 that encodes one or more display target images among the plurality of display target images by referring to the reference-only image as a reference image in the inter prediction.

Description

The present invention relates to an image encoding device and the like that encode a plurality of display target images constituting a video by using inter prediction.

As a technique relating to an image encoding method for encoding an image (including a moving image) or an image decoding method for decoding an image, there is the technique described in Non-Patent Document 1.

As a technique relating to an image encoding method that uses a background image, there is the technique described in Patent Document 1.

Patent Document 1: Japanese Patent Laid-Open No. 10-23423

Non-Patent Document 1: Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 12th Meeting: Geneva, CH, 14-23 Jan. 2013, JCTVC-L1003_v34.doc, High Efficiency Video Coding (HEVC) text specification draft 10 (for FDIS & Last Call), http://phenix.it-sudparis.eu/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip

However, an image encoding device or the like according to the related art may be unable to refer to an appropriate reference image in inter prediction (inter-picture prediction).

Therefore, the present invention provides an image encoding device and the like that can refer to an appropriate reference image in inter prediction.

An image encoding device according to an aspect of the present invention is an image encoding device that encodes a plurality of display target images constituting a video by using inter prediction, the device including: an acquisition unit that acquires a reference-only image, which is an image different from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and which is used only for reference in the inter prediction; and an encoding unit that encodes one or more display target images among the plurality of display target images by referring to the reference-only image as a reference image in the inter prediction.

Note that these general or specific aspects may be implemented as a system, a device, a method, an integrated circuit, a computer program, or a non-transitory recording medium such as a computer-readable CD-ROM, or as any combination of a system, a device, a method, an integrated circuit, a computer program, and a recording medium.

The image encoding device and the like according to an aspect of the present invention can refer to an appropriate reference image in inter prediction.

FIG. 1 is a diagram illustrating the configuration of an image processing system according to Embodiment 1.
FIG. 2 is a diagram illustrating the processing flow of the image processing system according to Embodiment 1.
FIG. 3 is a diagram illustrating a code string according to Embodiment 1.
FIG. 4 is a diagram illustrating the flow of background image generation processing according to Embodiment 1.
FIG. 5 is a diagram illustrating a background image according to Embodiment 1.
FIG. 6 is a diagram illustrating another background image according to Embodiment 1.
FIG. 7 is a diagram illustrating the flow of background image selection processing according to Embodiment 1.
FIG. 8 is a diagram illustrating the flow of background image update processing according to Embodiment 1.
FIG. 9 is a diagram mainly illustrating the configuration of the processing unit of an encoder according to Embodiment 1.
FIG. 10 is a diagram illustrating the flow of encoding processing according to Embodiment 1.
FIG. 11 is a diagram illustrating scaling processing according to Embodiment 1.
FIG. 12 is a diagram illustrating conversion processing according to Embodiment 1.
FIG. 13 is a diagram illustrating an entire vector according to Embodiment 1.
FIG. 14 is a diagram illustrating a variation of the scaling processing according to Embodiment 1.
FIG. 15 is a diagram illustrating integer-pixel precision and fractional-pixel precision according to Embodiment 1.
FIG. 16 is a diagram illustrating a variation of the code string according to Embodiment 1.
FIG. 17 is a diagram illustrating the configuration of an image processing system according to Embodiment 2.
FIG. 18 is a diagram illustrating the processing flow of the image processing system according to Embodiment 2.
FIG. 19 is a diagram mainly illustrating the configuration of the processing unit of a decoder according to Embodiment 2.
FIG. 20 is a diagram illustrating the flow of decoding processing according to Embodiment 2.
FIG. 21 is a diagram illustrating the configuration of an image processing system according to Embodiment 3.
FIG. 22 is a diagram illustrating the processing flow of the operation of the image processing system according to Embodiment 3.
FIG. 23 is an overall configuration diagram of a content supply system that implements a content distribution service.
FIG. 24 is an overall configuration diagram of a digital broadcasting system.
FIG. 25 is a block diagram illustrating a configuration example of a television.
FIG. 26 is a block diagram illustrating a configuration example of an information reproducing/recording unit that reads and writes information from and to a recording medium that is an optical disc.
FIG. 27 is a diagram illustrating a structure example of a recording medium that is an optical disc.
FIG. 28A is a diagram illustrating an example of a mobile phone.
FIG. 28B is a block diagram illustrating a configuration example of a mobile phone.
FIG. 29 is a diagram illustrating the structure of multiplexed data.
FIG. 30 is a diagram schematically illustrating how each stream is multiplexed in the multiplexed data.
FIG. 31 is a diagram illustrating in more detail how a video stream is stored in a PES packet sequence.
FIG. 32 is a diagram illustrating the structure of TS packets and source packets in the multiplexed data.
FIG. 33 is a diagram illustrating the data structure of a PMT.
FIG. 34 is a diagram illustrating the internal structure of multiplexed data information.
FIG. 35 is a diagram illustrating the internal structure of stream attribute information.
FIG. 36 is a diagram illustrating steps for identifying video data.
FIG. 37 is a block diagram illustrating a configuration example of an integrated circuit that implements the moving picture encoding method and the moving picture decoding method according to each embodiment.
FIG. 38 is a diagram illustrating a configuration for switching the drive frequency.
FIG. 39 is a diagram illustrating steps for identifying video data and switching the drive frequency.
FIG. 40 is a diagram illustrating an example of a lookup table in which video data standards are associated with drive frequencies.
FIG. 41A is a diagram illustrating an example of a configuration in which a module of a signal processing unit is shared.
FIG. 41B is a diagram illustrating another example of a configuration in which a module of a signal processing unit is shared.

(Underlying Knowledge Forming the Basis of the Present Invention)
The present inventors have found a problem with the image encoding method for encoding an image, or the image decoding method for decoding an image, described in the "Background Art" section. The problem is described in detail below.

In recent years, digital video equipment has advanced remarkably, and there are more and more opportunities to compress and encode a video signal (a plurality of pictures arranged in time series) input from a video camera, a television tuner, or the like, and to record it on a recording medium such as a DVD or a hard disk. A standard called H.264/AVC (MPEG-4 AVC) exists as an image coding standard, and a standard called HEVC (High Efficiency Video Coding) (Non-Patent Document 1) is being studied as the next-generation standard.

Meanwhile, Patent Document 1 discloses a technique that increases coding efficiency by storing a background image over a long term and using the stored background image as a reference image in inter prediction.

The HEVC standard (Non-Patent Document 1) has a mechanism called a long-term reference image, to which the technique described in Patent Document 1 can be applied. A decoded image designated as a long-term reference image is kept in the frame memory over a long term. Subsequent decoding can therefore refer to that decoded image over a long term.

However, an appropriate reference image may not exist in the video. In such a case, it is difficult to refer to an appropriate reference image in inter prediction, and coding efficiency may therefore decrease.

For example, when images captured by a camera capable of panning, tilting, and zooming are encoded, coding efficiency may not improve even if the techniques of Non-Patent Document 1 and Patent Document 1 are used. Specifically, the background changes greatly with panning, tilting, or zooming. Therefore, even if one background image is stored as a long-term reference image, the encoding target image captured after panning, tilting, or zooming may not match the background image. As a result, prediction accuracy may not improve, and coding efficiency may not improve.

For example, an image encoding device according to an aspect of the present invention is an image encoding device that encodes a plurality of display target images constituting a video by using inter prediction, the device including: an acquisition unit that acquires a reference-only image, which is an image different from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and which is used only for reference in the inter prediction; and an encoding unit that encodes one or more display target images among the plurality of display target images by referring to the reference-only image as a reference image in the inter prediction.

This allows the image encoding device to refer, in inter prediction, to a reference-only image that is different from the display target images and the like. The image encoding device can therefore refer to an appropriate reference image in inter prediction.

Further, for example, the acquisition unit may acquire the reference-only image that is larger than each of the plurality of display target images.

This allows the image encoding device to encode an image by referring to a reference-only image that includes the background corresponding to, for example, panning, tilting, or zooming.

Further, for example, the acquisition unit may acquire the reference-only image into which a plurality of captured images, which are images obtained by shooting, are integrated.

This allows the image encoding device to refer to a reference-only image into which, for example, a plurality of images obtained through panning, tilting, zooming, and the like, or a plurality of images obtained by a plurality of cameras, are integrated. The image encoding device can therefore refer to a more appropriate reference image.
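As an illustration of integrating several captured images into one larger reference-only image, the following is a minimal sketch and not the method of the embodiments. It assumes that each captured image already comes with a known pixel offset on a common canvas, and that overlapping pixels are simply averaged; the function name `integrate_images` and the offset representation are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of 8-bit luma samples


def integrate_images(canvas_w: int, canvas_h: int,
                     placed: List[Tuple[Image, int, int]]) -> Image:
    """Paste each captured image at its (x, y) offset on one canvas.

    Assumption: offsets are known in advance (for example from pan/tilt
    angles). Overlapping pixels are averaged; uncovered pixels stay 0.
    """
    total = [[0] * canvas_w for _ in range(canvas_h)]
    count = [[0] * canvas_w for _ in range(canvas_h)]
    for img, ox, oy in placed:
        for r, row in enumerate(img):
            for c, value in enumerate(row):
                y, x = oy + r, ox + c
                if 0 <= y < canvas_h and 0 <= x < canvas_w:
                    total[y][x] += value
                    count[y][x] += 1
    return [[total[y][x] // count[y][x] if count[y][x] else 0
             for x in range(canvas_w)]
            for y in range(canvas_h)]


# Two 2x2 captures that overlap by one column on a 3x2 canvas.
left = [[10, 10], [10, 10]]
right = [[30, 30], [30, 30]]
reference_only = integrate_images(3, 2, [(left, 0, 0), (right, 1, 0)])
```

The resulting canvas is wider than either capture, which is what lets a later pan stay inside the reference-only image.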

Further, for example, the acquisition unit may acquire the reference-only image before the first display target image in encoding order among the plurality of display target images is encoded.

This allows the image encoding device to prepare for encoding the video in advance and to encode the video smoothly.

Further, for example, the acquisition unit may acquire the reference-only image partially or entirely by receiving the reference-only image partially or entirely from an image management device, and the encoding unit may encode the one or more display target images by referring to the partially or entirely acquired reference-only image.

This allows the image encoding device to acquire, from the image management device, an appropriate reference-only image for inter prediction.

Further, for example, the acquisition unit may acquire, each as the reference-only image, a plurality of reference-only images including a first reference-only image corresponding to a first shooting situation and a second reference-only image corresponding to a second shooting situation, and the encoding unit may encode the one or more display target images by referring to the first reference-only image as the reference-only image when the shooting situation of the video is the first shooting situation, and by referring to the second reference-only image as the reference-only image when the shooting situation of the video is the second shooting situation.

This allows the image encoding device to switch among the plurality of reference-only images according to the shooting situation of the video.
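Switching by shooting situation can be pictured as a simple lookup. The sketch below is only illustrative: the situation labels ("day", "night") and the image identifiers are hypothetical, since this passage does not specify how a shooting situation is represented.

```python
from typing import Dict, Optional


def select_reference_only_image(references: Dict[str, str],
                                situation: str) -> Optional[str]:
    """Return the identifier of the reference-only image registered for
    the given shooting situation, or None if none is registered."""
    return references.get(situation)


# Hypothetical registry: one reference-only image per shooting situation.
references = {"day": "background_day", "night": "background_night"}
chosen = select_reference_only_image(references, "night")
```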

Further, for example, the acquisition unit may further update the reference-only image by using one or more reconstructed images among the plurality of reconstructed images of the plurality of display target images, and the encoding unit may encode the one or more display target images by referring to the updated reference-only image.

This allows the image encoding device to update the reference-only image appropriately according to the video.
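One way to picture such an update is a weighted blend of a reconstructed image into the corresponding region of the reference-only image. This is a sketch under stated assumptions, not the update rule of the embodiments: the offset of the region is assumed to be known, and the blend weight of 1/4 is an arbitrary illustrative value.

```python
from typing import List

Image = List[List[int]]


def update_reference(reference: Image, recon: Image, ox: int, oy: int,
                     weight_num: int = 1, weight_den: int = 4) -> Image:
    """Blend a reconstructed image into the reference-only image.

    The region starts at column ox, row oy of the reference. Each covered
    pixel becomes old * (1 - w) + new * w with w = weight_num / weight_den
    (integer arithmetic, rounded down). The input reference is not modified.
    """
    out = [row[:] for row in reference]
    for r, row in enumerate(recon):
        for c, value in enumerate(row):
            y, x = oy + r, ox + c
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                old = out[y][x]
                out[y][x] = (old * (weight_den - weight_num)
                             + value * weight_num) // weight_den
    return out


reference = [[100, 100, 100]]
updated = update_reference(reference, [[200]], ox=1, oy=0)
```

A small weight makes the reference-only image follow slow changes in the background while damping transient foreground objects.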

Further, for example, when encoding an encoding target image among the one or more display target images, the encoding unit may convert the reference-only image so that the reference-only image corresponds to the encoding target image, and may refer to the converted reference-only image as the reference image.

This allows the image encoding device to refer to the reference-only image converted in accordance with the encoding target image. The image encoding device can therefore refer to a more appropriate reference image in inter prediction.

Further, for example, the encoding unit may scale the reference-only image so that the size of a subject in the reference-only image corresponds to the size of the subject in the encoding target image, and may refer to the scaled reference-only image as the reference image.

This allows the image encoding device to refer to the reference-only image scaled in accordance with the encoding target image. The image encoding device can therefore refer to a more appropriate reference image in inter prediction.

Further, for example, the encoding unit may scale the reference-only image by using shooting information of each of the reference-only image and the encoding target image, or by using positions of feature points in each of the reference-only image and the encoding target image.

This allows the image encoding device to scale the reference-only image appropriately based on the shooting information or the like.
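The feature-point variant of the scaling can be sketched as follows. The sketch assumes that two matched feature points are available in both images and that the ratio of their distances gives the scale factor; nearest-neighbour resampling is used only to keep the example short. Both function names are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[int]]
Point = Tuple[float, float]


def scale_from_feature_points(ref_pts: Tuple[Point, Point],
                              tgt_pts: Tuple[Point, Point]) -> float:
    """Scale factor that maps the subject size in the reference-only image
    to the subject size in the encoding target image."""
    (rx0, ry0), (rx1, ry1) = ref_pts
    (tx0, ty0), (tx1, ty1) = tgt_pts
    ref_dist = math.hypot(rx1 - rx0, ry1 - ry0)
    tgt_dist = math.hypot(tx1 - tx0, ty1 - ty0)
    if ref_dist == 0:
        raise ValueError("reference feature points must be distinct")
    return tgt_dist / ref_dist


def scale_nearest(image: Image, factor: float) -> Image:
    """Resample an image by the given factor with nearest-neighbour lookup."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [[image[min(src_h - 1, int(y / factor))]
                  [min(src_w - 1, int(x / factor))]
             for x in range(dst_w)]
            for y in range(dst_h)]


# The same two feature points are twice as far apart in the target image.
factor = scale_from_feature_points(((0, 0), (3, 4)), ((0, 0), (6, 8)))
scaled = scale_nearest([[1, 2], [3, 4]], factor)
```

When shooting information is used instead, the factor could be derived from, for example, a ratio of zoom settings; that mapping is not specified in this passage.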

Further, for example, the encoding unit may scale the reference-only image in accordance with the precision of the motion vectors used in the inter prediction.

This allows the image encoding device to scale the reference-only image so that, for example, the information of the fractional pixels pointed to by motion vectors is preserved.

Further, for example, the encoding unit may further encode a conversion parameter, which is a parameter used to convert the reference-only image.

This allows an image decoding device to convert the reference-only image in the same way as the image encoding device.

Further, for example, the encoding unit may further encode an entire vector that points to the position of the region, in the reference-only image, to which an encoding target image among the one or more display target images corresponds.

This allows the image encoding device to encode information indicating the region of the reference-only image that is used for inter prediction. An image decoding device can therefore use the same region for inter prediction.

Further, for example, the encoding unit may calculate the entire vector by using shooting information of each of the reference-only image and the encoding target image, or by using positions of feature points in each of the reference-only image and the encoding target image, and may encode the calculated entire vector.

This allows the image encoding device to calculate the region of the reference-only image that is used for inter prediction.
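The vector that points to the region of the reference-only image corresponding to the encoding target image can be estimated from matched feature points. The sketch below assumes a pure translation between the two images (no scaling or rotation) and averages the per-point displacements; the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]
Point = Tuple[int, int]


def entire_vector(ref_pts: List[Point], tgt_pts: List[Point]) -> Tuple[int, int]:
    """Average displacement from target coordinates to reference coordinates,
    i.e. the position of the target's top-left corner in the reference."""
    if not ref_pts or len(ref_pts) != len(tgt_pts):
        raise ValueError("need the same non-zero number of matched points")
    n = len(ref_pts)
    dx = sum(r[0] - t[0] for r, t in zip(ref_pts, tgt_pts))
    dy = sum(r[1] - t[1] for r, t in zip(ref_pts, tgt_pts))
    return round(dx / n), round(dy / n)


def crop_region(reference: Image, vector: Tuple[int, int],
                width: int, height: int) -> Image:
    """Cut out the region of the reference that the vector points to."""
    gx, gy = vector
    return [row[gx:gx + width] for row in reference[gy:gy + height]]


# A 6x4 reference whose sample value encodes its own (row, column).
reference = [[r * 10 + c for c in range(6)] for r in range(4)]
vec = entire_vector([(5, 3), (7, 4)], [(2, 1), (4, 2)])
region = crop_region(reference, vec, width=2, height=2)
```

Because the vector is signalled in the code string, a decoder can cut out the same region without repeating the feature-point search.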

Further, for example, the encoding unit may encode the one or more display target images to generate a code string that includes the one or more display target images separately from a code string that includes the reference-only image.

This allows the image encoding device to acquire the reference-only image at an appropriate timing, separately from the video.

Further, for example, the encoding unit may further encode the reference-only image as a non-display image.

This allows the image encoding device to encode the reference-only image in a manner that distinguishes it from the display target images.

Further, for example, an image decoding device according to an aspect of the present invention may be an image decoding device that decodes a plurality of display target images constituting a video by using inter prediction, the device including: an acquisition unit that acquires a reference-only image, which is an image different from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and which is used only for reference in the inter prediction; and a decoding unit that decodes one or more display target images among the plurality of display target images by referring to the reference-only image as a reference image in the inter prediction.

This allows the image decoding device to refer, in inter prediction, to a reference-only image that is different from the display target images and the like. The image decoding device can therefore refer to an appropriate reference image in inter prediction.

Further, for example, an image processing system according to an aspect of the present invention may be an image processing system that encodes and decodes a plurality of display target images constituting a video by using inter prediction, the system including: an image management device that acquires a reference-only image, which is an image different from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and which is used only for reference in the inter prediction; an image encoding device that encodes the plurality of display target images by using the inter prediction; and an image decoding device that decodes the plurality of display target images by using the inter prediction. The image encoding device includes: a first acquisition unit that acquires, from the image management device, the reference-only image acquired by the image management device; and an encoding unit that encodes one or more display target images among the plurality of display target images by referring to the reference-only image acquired by the first acquisition unit as a reference image in the inter prediction. The image decoding device includes: a second acquisition unit that acquires, from the image management device, the reference-only image acquired by the image management device; and a decoding unit that decodes one or more display target images among the plurality of display target images by referring to the reference-only image acquired by the second acquisition unit as a reference image in the inter prediction.

This allows the image encoding device and the image decoding device in the image processing system to refer, in inter prediction, to a reference-only image that is different from the display target images and the like. The image encoding device and the image decoding device in the image processing system can therefore refer to an appropriate reference image in inter prediction.

Note that these general or specific aspects may be implemented as a system, a device, a method, an integrated circuit, a computer program, or a non-transitory recording medium such as a computer-readable CD-ROM, or as any combination of a system, a device, a method, an integrated circuit, a computer program, or a recording medium.

Hereinafter, embodiments will be described in detail with reference to the drawings. Each of the embodiments described below shows a general or specific example. The numerical values, shapes, materials, constituent elements, arrangement positions and connection forms of the constituent elements, steps, order of the steps, and the like shown in the following embodiments are examples and are not intended to limit the present invention. Among the constituent elements in the following embodiments, those not recited in the independent claims, which represent the broadest concepts, are described as optional constituent elements.

In the following description, a decoded image and a decoded block may mean a reconstructed image and a reconstructed block, respectively. The decoded image of an image x means the image x that has been decoded. Similarly, the reconstructed image of an image x means the image x that has been reconstructed.

(Embodiment 1)
<Overall configuration>
FIG. 1 is a diagram illustrating the configuration of the image processing system according to the present embodiment. The image processing system 10 illustrated in FIG. 1 includes a server 20, encoders 30a, 30b, and 30c, cameras 35a, 35b, and 35c, and the like. Although FIG. 1 shows three encoders 30a, 30b, and 30c, the number of encoders may be one, two, or four or more. Similarly, the number of cameras may be one, two, or four or more.

The server 20 includes a background image database 21, a control unit 22, a processing unit 23, and a communication unit 24. The background image database 21 is a database for accumulating background images. The control unit 22 controls the operation of each constituent element of the server 20. The processing unit 23 performs information processing. The operations of the server 20 are basically performed by the processing unit 23. The communication unit 24 communicates with the encoders 30a, 30b, 30c, and the like. The communication unit 24 may communicate with external devices via the Internet.

The server 20 may further include a storage unit. The background image database 21 may be included in the storage unit of the server 20.

The encoder 30a includes a storage unit 31a, a control unit 32a, a processing unit 33a, and a communication unit 34a. The storage unit 31a stores images from the camera 35a, encoded images, and the like. The control unit 32a controls the operation of each constituent element of the encoder 30a. The processing unit 33a performs information processing. The operations of the encoder 30a are basically performed by the processing unit 33a. In particular, the processing unit 33a encodes input images from the camera 35a. The communication unit 34a communicates with the server 20.

The encoder 30b includes a storage unit 31b, a control unit 32b, a processing unit 33b, and a communication unit 34b. The encoder 30c includes a storage unit 31c, a control unit 32c, a processing unit 33c, and a communication unit 34c. These constituent elements are the same as the corresponding constituent elements of the encoder 30a. The encoder 30a encodes images obtained from the camera 35a, the encoder 30b encodes images obtained from the camera 35b, and the encoder 30c encodes images obtained from the camera 35c.

For example, the encoder 30a encodes an input image from the camera 35a and accumulates the encoded input image in the server 20. Specifically, an image captured by the camera 35a is input to the encoder 30a as an input image. The encoder 30a encodes the input image in the processing unit 33a and transmits the encoded input image to the server 20.

The server 20 also transmits a background image in the background image database 21 to the encoder 30a. The encoder 30a encodes the input image by using the background image transmitted from the server 20. The server 20 also acquires various kinds of information from the Internet.

なお、ここでは、エンコーダ３０ａの構成、エンコーダ３０ａの動作、および、サーバ２０とエンコーダ３０ａとの間で行われる動作が、主に示されている。 Here, the configuration of the encoder 30a, the operation of the encoder 30a, and the operation performed between the server 20 and the encoder 30a are mainly shown.

The configuration of the encoder 30b, the operation of the encoder 30b, and the operations performed between the server 20 and the encoder 30b are the same as the configuration of the encoder 30a, the operation of the encoder 30a, and the operations performed between the server 20 and the encoder 30a. The same applies to the configuration of the encoder 30c, the operation of the encoder 30c, and the operations performed between the server 20 and the encoder 30c.

<Operation (overall)>
Next, the overall encoding flow will be described with reference to FIG. 2. FIG. 2 is a diagram showing a processing flow of the image processing system 10 shown in FIG. 1.

First, the server 20 and the encoder 30a exchange data to generate background images (S201 and S210). Then, the server 20 and the encoder 30a select a plurality of background images to be used for encoding (S202 and S211). Details will be described later.

Next, the server 20 transmits an encoding start request to the encoder 30a (S203). The encoder 30a receives the start request (S212).

Next, the encoder 30a determines the background image to be used for encoding (S213). Here, the encoder 30a determines one of the plurality of background images selected in the selection process (S211) as the background image to be used. Specifically, the encoder 30a uses time information indicating the time at which the image was captured to determine the background image. The current time, at which encoding is performed, may be used as the capture time. For example, the encoder 30a uses a first background image at time t and a second background image at time t + 1.

Next, when the background image determined in the determination process (S213) differs from the previously used background image, the encoder 30a encodes the background image and transmits the code string of the background image to the server 20 (S214 and S215).
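
The determination and switching steps (S213 to S215) can be sketched as follows. This is only an illustration under stated assumptions: the rule "use the background image whose associated time is closest to the capture time" and the names `choose_background` and `backgrounds_to_send` are hypothetical and do not appear in the embodiment.

```python
def choose_background(candidates, capture_time):
    """Pick a background for the given capture time (S213).

    candidates: list of (associated_time, background_id) pairs.
    Assumption: the closest associated time wins.
    """
    return min(candidates, key=lambda c: abs(c[0] - capture_time))[1]


def backgrounds_to_send(candidates, capture_times):
    """Return (time, background_id) pairs for which the background
    would be encoded and transmitted, i.e. only when it switches
    from the previously used one (S214 and S215)."""
    sent = []
    previous = None
    for t in capture_times:
        current = choose_background(candidates, t)
        if current != previous:
            sent.append((t, current))
            previous = current
    return sent
```

With two candidates associated with times 0 and 10, a background would be transmitted only at the first frame and at the frame where the closer candidate changes, not for every camera image.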

Next, the encoder 30a encodes the camera image with reference to the background image. The encoder 30a then transmits the code string of the camera image to the server 20 (S216). Details will be described later.

The server 20 receives the code string of the background image and the code string of the camera image from the encoder 30a (S204). The server 20 concatenates these code strings, for example as shown in FIG. 3, and stores them in a memory (storage unit) in the server 20.

In FIG. 3, I(x) denotes an image encoded by intra coding (intra prediction), P(x) denotes an image encoded by unidirectional reference coding (unidirectional inter prediction), and B(x) denotes an image encoded by bidirectional reference coding (bidirectional inter prediction). Here, x indicates the encoding order. The arrows between images in FIG. 3 indicate reference relationships. For example, P(1) is encoded using I(0) as a reference image, and B(3) is encoded using I(0), P(1), and B(2) as reference images.

I(0), P(7), and I(t) are background images. Because they are referenced over a long period, they are encoded as long-term reference images.

Next, the encoder 30a updates the background image that was used (S217). At this time, the encoder 30a updates the background image using the decoded camera image obtained by decoding the encoded camera image included in the code string generated in the encoding process (S216). For example, a moving object included in the background image may hide part of the background, so that the background image does not contain the entire background. The encoder 30a therefore updates the background image using the camera image.

Specifically, a moving object, for example, does not stay in one place but moves to various places. The background pixel values can therefore be identified from the average of the pixel values over a plurality of camera images. Accordingly, the encoder 30a calculates, for each pixel, the average of the pixel values of the used background image and the decoded camera image. By updating the background image with the per-pixel averages, the encoder 30a can remove the moving object from the background image in a pseudo manner.
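
The per-pixel averaging described above can be sketched as follows. The function name `update_background`, the representation of an image as a list of rows of integer luminance values, and the use of integer division for rounding are assumptions made for illustration.

```python
def update_background(background, decoded_frame):
    """Replace each background pixel with the average of the
    background pixel and the co-located decoded camera pixel.

    Both images are lists of rows of integers with identical size.
    Assumption: the average is rounded down (integer division).
    """
    return [
        [(b + d) // 2 for b, d in zip(bg_row, frame_row)]
        for bg_row, frame_row in zip(background, decoded_frame)
    ]
```

If a moving object briefly raises one pixel from 100 to 200 and then leaves, repeated updates pull that pixel back toward 100 (150, 125, 112, and so on), which is the sense in which the object is removed in a pseudo manner.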

The background also changes over time. The background image may be updated so as to follow such changes. For example, the background image may be updated to follow a background that becomes darker as night approaches. Here too, the background image may be updated using the decoded camera image. Specifically, as described above, the average calculated for each pixel may be applied to the pixel values of the background image.

Next, the server 20 updates the background image (S205). Here, unlike the update process (S217) in the encoder 30a, the server 20 updates the background image in the background image database 21 with the camera image. If a certain time has elapsed since the previous transmission of a background image to the encoder 30a, the server 20 transmits the current background image to the encoder 30a (S207). When a background image is transmitted from the server 20 (Yes in S218), the encoder 30a receives the background image transmitted from the server 20 (S219).

The server 20 repeats the above processing (S204 to S207) until it receives an encoding stop request from the user (No in S208). When the server 20 receives the stop request (Yes in S208), it transmits an encoding stop request to the encoder 30a and ends the processing (S209). The encoder 30a repeats the above processing (S213 to S219) until it receives the encoding stop request from the server 20 (No in S220). When the encoder 30a receives the stop request (Yes in S220), it ends the processing.

<Operation (background image creation)>
Next, the flow of the background image generation process (S201 and S210) will be described with reference to FIG. 4. FIG. 4 is a diagram showing the flow of the generation process (S201 and S210) shown in FIG. 2.

First, the server 20 acquires the date, the time, and the weather from the Internet (S301). The server 20 also acquires camera images and camera information from the encoder 30a and the other encoders 30b and 30c (S302 and S308). Here, the camera information indicates control data such as the camera installation position, the pan/tilt angle (an angle corresponding to at least one of pan and tilt), and the zoom magnification. The camera information is also referred to as shooting information.

Next, the server 20 selects the background image to be created, using the date, the time, the weather at the camera installation position, the camera images, and the camera information (S303). More specifically, the server 20 uses these items as search keys to select one background image from the background image database 21 as the background image to be created. Alternatively, the server 20 may select, by image matching, the background image that best matches the camera image as the background image to be created.

Next, the server 20 creates the background image using the camera information corresponding to the plurality of camera images and the image feature points of each of the plurality of camera images (S304). Here, the server 20 creates a background image that is larger than each camera image, like a panoramic image.

The server 20 repeats the above processing (S301 to S304) at regular time intervals until it receives a background image creation stop request from the user. When the server 20 receives the stop request, it transmits a background image creation stop request to the encoder 30a and ends the processing (S305, S306, and S307). The encoder 30a repeats the transmission process (S308) at regular time intervals until it receives the background image creation stop request from the server 20. When the encoder 30a receives the stop request, it ends the processing (S309 and S310).

Through the above operation, a large background image is generated from a plurality of images obtained by panning, tilting, zooming, and the like of the camera 35a. Examples are shown in FIGS. 5 and 6. For example, as shown in FIG. 5, a plurality of camera images are acquired by panning, tilting, zooming, and the like at 10:00 on March 15, a sunny day. A large background image is then generated from these camera images.

More specifically, a background image like a panoramic image is generated from a plurality of images obtained by panning and tilting. In addition, a sharper background image with higher resolution is generated from images obtained by zooming.

The resolution of the background image may be matched to that of the image captured with the greatest zoom-in. The resolution of the other images, which were captured with less zoom-in, may then be adjusted by enlarging them. The server 20 may create the background image by roughly estimating the position of each of the plurality of camera images relative to the background image using the pan/tilt angle or the zoom magnification, and then joining the images with high accuracy using the image feature points.
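
The resolution adjustment can be illustrated with a small sketch. It assumes, purely for illustration, that the required enlargement of each image is the ratio of the largest zoom magnification to that image's zoom magnification, that the ratio is an integer, and that enlargement is done by nearest-neighbor pixel repetition. The names `scale_factors` and `enlarge` are hypothetical.

```python
def scale_factors(zoom_magnifications):
    """Enlargement factor needed to bring each image up to the
    resolution of the most zoomed-in image (assumed proportional
    to the zoom magnification)."""
    top = max(zoom_magnifications)
    return [top / z for z in zoom_magnifications]


def enlarge(image, factor):
    """Nearest-neighbor enlargement of a list-of-rows image by an
    integer factor: every pixel becomes a factor x factor square."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

A practical system would more likely use an interpolating resampler and non-integer factors; the sketch only shows that images taken with less zoom are enlarged before they are merged.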

Further, as shown in FIG. 6, the server 20 generates another background image from camera images captured at a different date and time and in different weather. That is, the server 20 creates a plurality of background images corresponding to a plurality of situations.

<Operation (background image selection)>
Next, the flow of the background image selection process (S202 and S211) will be described with reference to FIG. 7. FIG. 7 is a diagram showing the flow of the selection process (S202 and S211) shown in FIG. 2.

First, the server 20 acquires the date, the time, and the weather from the Internet (S401). The server 20 also acquires a camera image and camera information from the encoder 30a (S402 and S407).

The server 20 then selects the best-matching background image using the date, the time, the weather at the camera installation position, the camera image, and the camera information (pan/tilt angle and zoom magnification) (S403). The server 20 may select the background image by image matching using the camera image. The server 20 then transmits the selected background image to the encoder 30a (S404). The encoder 30a receives the background image transmitted from the server 20 (S408).

Then, the server 20 advances the time used in the background image selection process (S403) by a fixed amount (S405). The background image selection process (S403), the transmission process (S404), the reception process (S408), and the time change process (S405) are repeated until the specified number of background images have been transmitted and received (S406 and S409).

As a result, the encoder 30a receives, from among the plurality of background images that match the weather at the camera installation position, the camera image, and the camera information, the background images corresponding to time t, time t + α, time t + α × 2, ..., and time t + α × m.
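
The repeated selection with an advancing time (S403 to S406) can be sketched as follows. The database layout (a list of dictionaries with `name`, `weather`, and `time` fields), the cost function that prefers matching weather and then the closest time, and the function names are all assumptions for illustration; the embodiment does not specify how the search keys are weighted.

```python
def select_background(database, weather, time_minutes):
    """Select the best-matching entry (S403).

    Assumption: an entry with matching weather always beats one
    without, and ties are broken by the closest time of day.
    """
    def cost(entry):
        weather_penalty = 0 if entry["weather"] == weather else 1
        return (weather_penalty, abs(entry["time"] - time_minutes))
    return min(database, key=cost)


def select_sequence(database, weather, t, alpha, m):
    """Names of the backgrounds selected for t, t + alpha, ...,
    t + alpha * m, advancing the time after each selection (S405)."""
    return [
        select_background(database, weather, t + alpha * k)["name"]
        for k in range(m + 1)
    ]
```

The same background may be selected for several consecutive times when the database holds no closer entry.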

<Operation (background image update)>
Next, the flow of the background image update process (S205) in the server 20 will be described with reference to FIG. 8. FIG. 8 is a diagram showing the flow of the update process (S205) shown in FIG. 2.

First, the server 20 obtains a decoded camera image by decoding the encoded camera image in the code string (S501).

Next, the server 20 acquires the date, the time, and the weather from the Internet (S502). The server 20 selects a background image using the information acquired from the Internet, the decoded camera image, and the camera information (S503).

The server 20 updates the background image using the camera information and the image feature points of the camera image, in the same manner as in the background image creation process (S304) (S504). Specifically, in the update process, the server 20 may update each pixel value of the background image with the average of the pixel value of the existing background image and the pixel value of the new camera image, or with a weighted average of the two.

The server 20 may also recognize a subject (object) in the camera image and update only the background region excluding the region of the subject. In doing so, the server 20 may recognize a moving subject and update only the background region excluding the region of the moving subject.
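
The weighted-average update restricted to the background region can be sketched as follows. The mask is assumed to be given (how the subject is recognized is outside this sketch), and the name `update_masked` and the `weight` parameter, which is the weight of the new camera image, are hypothetical.

```python
def update_masked(background, frame, subject_mask, weight=0.5):
    """Blend the new camera frame into the background, skipping
    pixels that belong to a recognized subject.

    background, frame: lists of rows of numbers, same size.
    subject_mask: lists of rows of booleans, True where a subject
                  (for example a moving object) was recognized.
    weight: weight of the new frame; 0.5 gives a plain average.
    """
    updated = []
    for bg_row, frame_row, mask_row in zip(background, frame, subject_mask):
        updated.append([
            b if is_subject else (1 - weight) * b + weight * f
            for b, f, is_subject in zip(bg_row, frame_row, mask_row)
        ])
    return updated
```

A small `weight` makes the background follow slow changes such as lighting while reacting little to a single frame.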

<Encoding configuration>
FIG. 9 is a diagram mainly showing the configuration of the processing unit 33a of the encoder 30a shown in FIG. 1. The processing unit 33a includes a dividing unit 41, a subtraction unit 42, a transform unit 43, a variable length coding unit 44, an inverse transform unit 45, an addition unit 46, a frame memory 47, and a prediction unit 48. The frame memory 47 may be included in the storage unit 31a.

The dividing unit 41 divides a camera image or a background image into a plurality of blocks. The subtraction unit 42 outputs a difference block by subtracting a prediction block from a block obtained by the division. The transform unit 43 performs a frequency transform on the difference block and outputs coefficients. The variable length coding unit 44 performs variable length coding on the coefficients. The inverse transform unit 45 performs an inverse frequency transform on the coefficients and outputs a difference block.

The addition unit 46 generates a decoded block (reconstructed block) by adding the prediction block and the difference block. The frame memory 47 stores images composed of decoded blocks. A background image may be stored in the frame memory 47 directly, without passing through the dividing unit 41 and the like. The prediction unit 48 generates a prediction block using the block obtained by the division and an image stored in the frame memory 47.

<Operation (encoding)>
Next, the flow of the camera image encoding process (S216) in the encoder 30a will be described with reference to FIG. 10. FIG. 10 is a diagram showing the flow of the encoding process (S216) shown in FIG. 2.

First, the encoder 30a scales the background image and encodes the scaling parameter used in the scaling process (S701). Here, the encoder 30a scales the background image so that the resolution of the background image matches the resolution of the current camera image (encoding target image).

An example of the scaling process is shown in FIG. 11. FIG. 11 is a diagram showing the scaling process (S701) shown in FIG. 10.

For example, the resolution of the background image corresponds to the resolution of the image captured with the greatest zoom-in. Basically, therefore, the resolution of the background image is higher than that of the encoding target image. The encoder 30a therefore scales the background image so that it can be used as a reference image for the encoding target image, thereby matching the resolution of the background image to that of the encoding target image.

For example, in the scaling, the encoder 30a matches image feature points using SIFT or SURF, which are used in image recognition and the like, and resizes the background image so that the size of a subject in the background image equals the size of the same subject in the encoding target image.
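
One way a scaling parameter could be derived from matched feature points is sketched below. The sketch assumes that feature matching (for example with SIFT or SURF) has already produced corresponding point lists, and it estimates the scale as the median ratio of pairwise point distances. This estimator and the name `estimate_scale` are assumptions for illustration, not the method stated in the embodiment.

```python
import math
from itertools import combinations
from statistics import median


def estimate_scale(background_points, target_points):
    """Estimate the factor by which the background image should be
    resized so that subjects match the encoding target image.

    background_points[i] and target_points[i] are (x, y) positions
    of the same matched feature in the two images.
    """
    ratios = []
    for i, j in combinations(range(len(background_points)), 2):
        d_bg = math.dist(background_points[i], background_points[j])
        d_target = math.dist(target_points[i], target_points[j])
        if d_bg > 0:
            ratios.append(d_target / d_bg)
    return median(ratios)
```

A result of 0.5 would mean that the background image is resized to half its width and height before it is used as a reference image. The median is used so that a few wrong matches do not dominate the estimate.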

Next, the encoder 30a converts the background image (image processing) and encodes the conversion parameters used in the conversion process (S702). In this way, the encoder 30a adapts the background image to the current camera image.

An example of the conversion process is shown in FIG. 12. FIG. 12 is a diagram showing the conversion process (S702) shown in FIG. 10.

The angle of a single background image into which a plurality of camera images obtained from the plurality of cameras 35a, 35b, and 35c have been integrated does not necessarily match the angle of the encoding target image obtained from the camera 35a. In addition, depending on the weather or the lighting, the overall luminance of the background image may differ from that of the encoding target image. The encoder 30a therefore performs projective transformation, luminance conversion, and the like on the background image so that the subject in the background image matches the subject in the encoding target image.

As in the scaling process (S701), the encoder 30a matches image feature points using SIFT or SURF and calculates the conversion parameters.
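
As an illustration of what luminance conversion parameters might look like, the sketch below fits a gain and an offset by least squares from luminance samples taken at corresponding positions in the two images. The linear model, the 8-bit clamping, and the names `fit_luminance` and `apply_luminance` are assumptions; the embodiment only states that luminance conversion is performed and that the parameters are encoded.

```python
def fit_luminance(background_samples, target_samples):
    """Least-squares fit of target = gain * background + offset.

    Returns (gain, offset). Falls back to a pure offset when the
    background samples are all identical.
    """
    n = len(background_samples)
    mean_b = sum(background_samples) / n
    mean_t = sum(target_samples) / n
    var_b = sum((b - mean_b) ** 2 for b in background_samples)
    if var_b == 0:
        return 1.0, mean_t - mean_b
    cov = sum(
        (b - mean_b) * (t - mean_t)
        for b, t in zip(background_samples, target_samples)
    )
    gain = cov / var_b
    return gain, mean_t - gain * mean_b


def apply_luminance(samples, gain, offset):
    """Apply the conversion and clamp to the assumed 8-bit range."""
    return [min(255, max(0, round(gain * s + offset))) for s in samples]
```

In this sketch only the two numbers `gain` and `offset` would need to be encoded as conversion parameters, so that a decoder could apply the same conversion to its copy of the background image.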

Next, the encoder 30a calculates an entire vector and encodes the calculated entire vector (S703). The entire vector represents the displacement between the background image and the encoding target image. In other words, it indicates the position of the encoding target image relative to the background image.

An example of the entire vector is shown in FIG. 13. FIG. 13 is a diagram showing the entire vector calculated in the calculation process (S703) of FIG. 10.

The background image is generated from a plurality of camera images obtained by panning and tilting. Basically, therefore, the image size of the background image is larger than that of the encoding target image. The encoder 30a therefore calculates the displacement between the background image and the encoding target image as the entire vector. By encoding the motion vector of each block using the entire vector as a base, the encoder 30a can reduce the code amount of the motion vectors.

For example, the same motion vector may be encoded for every block of the encoding target image. Using the entire vector reduces the code amount in such a case.

As in the scaling process (S701), the encoder 30a matches image feature points using SIFT or SURF and calculates the entire vector.

Next, in the encoder 30a, the dividing unit 41 divides the encoding target image into a plurality of blocks (S704). The prediction unit 48 generates a prediction block for the processing target block (S705). The subtraction unit 42 generates a difference block between the block to be encoded and the prediction block (S706). The transform unit 43 then performs a frequency transform on the difference block to generate transform coefficients (S707).

Next, in the encoder 30a, the variable length coding unit 44 performs variable length coding on the transform coefficients (S708). The inverse transform unit 45 performs an inverse frequency transform on the transform coefficients (S709). The addition unit 46 adds the block obtained by the inverse frequency transform and the prediction block to generate a decoded block (S710).
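
The data flow through steps S706, S707, S709, and S710 can be sketched for a single four-sample block. As a stand-in for the frequency transform, the sketch uses a 4-point Hadamard transform, which is an assumption chosen because it is exactly invertible with integers; variable length coding (S708) is omitted, and the names `hadamard4` and `encode_block` are hypothetical.

```python
def hadamard4(v):
    """4-point Hadamard transform. Applying it twice yields the
    input multiplied by 4, so it also serves as its own inverse
    up to that factor."""
    a, b, c, d = v
    return [a + b + c + d, a - b + c - d, a + b - c - d, a - b - c + d]


def encode_block(block, prediction):
    """Return (coefficients, decoded_block) for one block."""
    # S706: difference block = block to be encoded - prediction block
    residual = [b - p for b, p in zip(block, prediction)]
    # S707: frequency transform of the difference block
    coefficients = hadamard4(residual)
    # S709: inverse transform (exact, since hadamard4 twice = 4x)
    restored = [c // 4 for c in hadamard4(coefficients)]
    # S710: decoded block = prediction block + restored difference
    decoded = [p + r for p, r in zip(prediction, restored)]
    return coefficients, decoded
```

Because no lossy step is included in this sketch, the decoded block equals the input block. The point of the local decoding loop is that the encoder stores in the frame memory the same reconstructed image that a decoder would obtain.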

Note that, when generating a prediction block, the prediction unit 48 refers to the background image, a decoded image (reference image), or a decoded block in the same image. In addition, using the entire vector calculated in the calculation process (S703) as a base, the encoder 30a encodes, as the motion vector, the difference between the vector used to generate the prediction block and the entire vector. As in the example of FIG. 13, the vector obtained by adding the entire vector to the encoded motion vector is used as the vector for generating the prediction block.
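
The relationship between the entire vector, the encoded motion vector, and the vector used for prediction can be written as a short sketch. Vectors are represented as (x, y) tuples, and the function names are hypothetical.

```python
def to_coded_mv(prediction_vector, entire_vector):
    """Encoder side: the motion vector that is encoded is the
    difference between the vector used for prediction and the
    entire vector."""
    return (
        prediction_vector[0] - entire_vector[0],
        prediction_vector[1] - entire_vector[1],
    )


def to_prediction_vector(coded_mv, entire_vector):
    """Decoder side: adding the entire vector back to the encoded
    motion vector gives the vector used to generate the
    prediction block."""
    return (coded_mv[0] + entire_vector[0], coded_mv[1] + entire_vector[1])
```

When every block points at almost the same offset into the large background image, the encoded motion vectors become small values near (0, 0), which is where the reduction in code amount comes from.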

The encoder 30a repeats the block encoding process (S705 to S710) until all blocks have been encoded (S711).

<Effect>
As described above, in the present embodiment, a wide-area, high-resolution background image is used. This makes it possible to improve the encoding efficiency, particularly when the camera 35a pans, tilts, or zooms.

More specifically, the image processing system 10 generates a background image in advance from a plurality of images captured while the camera 35a or the like pans, tilts, or zooms. The image processing system 10 then uses the generated background image as a reference image for encoding camera images.

As a result, the background region of the camera image to be encoded is likely to be contained in the background image even when panning, tilting, or zooming is performed. The image processing system 10 can therefore allocate a larger code amount to moving objects in the camera image to be encoded.

In addition, a background image that contains no moving objects can be prepared by capturing images in advance. A higher-quality background image can therefore be used.

Further, by updating the background image using a decoded image (reconstructed image) of an already encoded image, the image processing system 10 can keep the background image matched to the current camera image. That is, the image processing system 10 can bring the background image closer to the encoding target image and improve the prediction accuracy.

For example, when the background image includes a moving object, part of the background is hidden. A single background image may therefore not contain the entire background. By updating the background image using a plurality of camera images, the image processing system 10 can fill in the part of the background missing from the background image with a camera image that contains that part. The background may also change over time. For example, the background image may be updated when the background becomes darker as night approaches.

By updating the background image, the image processing system 10 can also quickly follow changes in the background, for example when the arrangement of chairs changes. Furthermore, by updating the background image using the decoded image, the image processing system 10 can reduce the number of times the background image is transmitted and received between the server 20 and the encoder 30a.

The image processing system 10 may also use the same background image update process in the encoder 30a and the decoder. In that case, the updated background image need not be included in the code string, so the overall code amount is reduced.

また、エンコーダ３０ａが背景画像を更新することにより、サーバ２０は、多数の様々な背景画像を保持しなくてもよい。したがって、サーバ２０におけるメモリ（記憶部）の容量の削減が可能である。 Further, because the encoder 30a updates the background image, the server 20 does not have to hold a large number of various background images. Therefore, the capacity of the memory (storage unit) in the server 20 can be reduced.

また、画像処理システム１０は、複数のカメラから得られた複数の画像で背景画像を生成する。これにより、動物体によって背景の一部が隠れている場合でも、背景の全体を示す背景画像の作成が可能である。具体的には、カメラ３５ａで得られた画像に人の後ろの背景が写っていない場合において、カメラ３５ｂで得られた画像に人の後ろの背景が写っている可能性がある。そのような場合、画像処理システム１０は、カメラ３５ｂで得られた画像を用いて人の後ろの背景を含む背景画像を生成してもよい。 Further, the image processing system 10 generates a background image with a plurality of images obtained from a plurality of cameras. Thereby, even when a part of the background is hidden by the moving object, a background image showing the entire background can be created. Specifically, when the background behind the person is not reflected in the image obtained with the camera 35a, the background behind the person may be reflected in the image obtained with the camera 35b. In such a case, the image processing system 10 may generate a background image including the background behind the person using the image obtained by the camera 35b.

また、画像処理システム１０は、複数の撮影状況（複数の季節、複数の時刻、および、複数の天気）に対応する複数の背景画像を作成してもよい。例えば、画像処理システム１０は、背景画像を撮影状況に応じて適応的に切り替える。これにより、符号化効率が向上する。 Further, the image processing system 10 may create a plurality of background images corresponding to a plurality of shooting situations (a plurality of seasons, a plurality of times, and a plurality of weathers). For example, the image processing system 10 adaptively switches the background image according to the shooting situation. Thereby, encoding efficiency improves.

具体的には、画像処理システム１０は、図５および図６のように時刻または天気に従って、作成対象の背景画像を切り替える。また、画像処理システム１０は、図７のフローで示したように、符号化時点の撮影状況およびカメラ画像から、最も適切な背景画像を選択する。これにより、画像処理システム１０は、符号化対象画像に背景画像を近づけることができる。そして、これにより、画像処理システム１０は、予測誤差を低減することができる。 Specifically, the image processing system 10 switches the background image to be created according to the time or the weather, as shown in FIGS. 5 and 6. Further, as shown in the flow of FIG. 7, the image processing system 10 selects the most appropriate background image based on the shooting situation at the time of encoding and the camera image. As a result, the image processing system 10 can bring the background image closer to the encoding target image and can thereby reduce the prediction error.

また、季節、時刻または天気に応じて背景画像が変化する可能性は高い。例えば、冬の１７時頃の背景は暗いが、夏の１７時頃の背景はそれほど暗くない。また、晴れ、くもり、雨および雪などの天気によっても背景は変わる。そのため、画像処理システム１０は、符号化時点（撮影時点）の撮影状況で複数の背景画像を切り替えることにより、符号化効率を向上させることができる。 In addition, there is a high possibility that the background image changes according to the season, time, or weather. For example, the background around 17:00 in winter is dark, but the background around 17:00 in summer is not so dark. The background also changes depending on the weather such as sunny, cloudy, rain and snow. Therefore, the image processing system 10 can improve the encoding efficiency by switching a plurality of background images in the shooting state at the time of encoding (shooting time).
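
The switching among background images by shooting situation can be sketched as follows. This is only an illustrative sketch, not the selection procedure of FIG. 7: the function name `select_background`, the tag keys (`season`, `hour`, `weather`), and the rule of counting matching tags are all assumptions introduced here.

```python
def select_background(conditions, candidates):
    """Pick the candidate whose tags agree with the most shooting conditions.

    conditions : dict describing the situation at encoding time (hypothetical keys)
    candidates : list of (tags, name) pairs, where tags is a dict with the same keys
    On a tie, the earliest candidate in the list is returned.
    """
    def score(tags):
        # Count how many condition fields this background was created under.
        return sum(1 for key, value in conditions.items() if tags.get(key) == value)

    best_tags, best_name = max(candidates, key=lambda c: score(c[0]))
    return best_name


candidates = [
    ({"season": "summer", "hour": 17, "weather": "sunny"}, "bg_summer_17"),
    ({"season": "winter", "hour": 17, "weather": "sunny"}, "bg_winter_17"),
    ({"season": "winter", "hour": 17, "weather": "snow"}, "bg_winter_17_snow"),
]
now = {"season": "winter", "hour": 17, "weather": "snow"}
chosen = select_background(now, candidates)
```

A real system would presumably also compare the candidates against the camera image itself, as the text describes; that step is omitted here.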

また、画像処理システム１０は、図１２のように背景画像を変換し、変換された背景画像を参照画像として使用する。これにより、背景画像が符号化対象画像に近づき、予測誤差が低減される。また、これにより、背景画像を保存するためのフレームメモリ４７の容量の削減が可能である。 Further, the image processing system 10 converts a background image as shown in FIG. 12, and uses the converted background image as a reference image. As a result, the background image approaches the encoding target image, and the prediction error is reduced. Thereby, the capacity of the frame memory 47 for storing the background image can be reduced.

予測誤差は、多数の背景画像を用意することで、各背景画像を変換することなく、低減されてもよい。しかし、この場合、大きな容量を有するフレームメモリ４７が利用される。また、この場合、背景画像を選択するための処理量も大きい。また、背景画像の切り替えに応じて、サーバ２０とエンコーダ３０ａ等との間で背景画像の送受信が発生する。そのため、この場合、通信量も大きい。また、背景画像の切り替えに応じて、背景画像の符号化が行われる。そのため、この場合、符号化の処理量も大きく、符号列の符号量も大きい。 The prediction error may instead be reduced by preparing a large number of background images, without converting each background image. In this case, however, a frame memory 47 having a large capacity is used, and the processing amount for selecting a background image is also large. In addition, each switch of the background image causes a background image to be transmitted and received between the server 20 and the encoder 30a or the like, so the communication amount is also large. Each switch of the background image also causes a background image to be encoded, so the encoding processing amount and the code amount of the code string are large as well.

そこで、背景画像の変換により、フレームメモリ４７の容量の削減、処理量の削減、通信量の削減、および、符号量の削減が可能である。 Therefore, by converting the background image, it is possible to reduce the capacity of the frame memory 47, the processing amount, the communication amount, and the code amount.

また、サーバ２０に背景画像が保存されることにより、エンコーダ３０ａに大きな容量を有するフレームメモリ４７が利用されなくてもよい。つまり、エンコーダ３０ａ、３０ｂ、３０ｃは、サーバ２０における背景画像を共有することができる。また、各カメラ画像がサーバ２０に集められることで、動物体の影響が抑制され、広範囲かつ高解像度の背景画像の生成が可能である。 Further, since the background image is stored in the server 20, the frame memory 47 having a large capacity in the encoder 30a may not be used. That is, the encoders 30a, 30b, and 30c can share the background image in the server 20. Also, by collecting the camera images in the server 20, the influence of the moving object is suppressed, and a wide range and high resolution background image can be generated.

また、サーバ２０は、複数の背景画像をエンコーダ３０ａに事前に送信する。そして、エンコーダ３０ａは、サーバ２０から事前に送信された複数の背景画像のうち、時刻に応じて、使用対象の背景画像を切り替える。これにより、符号化中においてサーバ２０とエンコーダ３０ａとの間の通信量が軽減され、通信負荷が分散される。 The server 20 transmits a plurality of background images to the encoder 30a in advance. Then, the encoder 30 a switches the background image to be used among the plurality of background images transmitted in advance from the server 20 according to the time. As a result, the amount of communication between the server 20 and the encoder 30a is reduced during encoding, and the communication load is distributed.

なお、本実施の形態では、符号化の前に事前に背景画像が作成される。背景画像が作成されるタイミングは、これに限られない。背景画像は事前に作成されなくてもよい。その場合、エンコーダ３０ａは、カメラ画像を符号化しつつ、背景画像を作成（更新）する（Ｓ２１７）。サーバ２０は、エンコーダ３０ａでの背景画像の作成（更新）に応じて、背景画像データベース２１における背景画像を作成（更新）する（Ｓ２０５）。 In the present embodiment, a background image is created in advance before encoding. The timing at which the background image is created is not limited to this. The background image may not be created in advance. In that case, the encoder 30a creates (updates) the background image while encoding the camera image (S217). The server 20 creates (updates) a background image in the background image database 21 according to the creation (update) of the background image by the encoder 30a (S205).

これにより、画像処理システム１０は、背景画像を事前に作成することなく、すぐにカメラ画像の符号化を実行することができる。したがって、符号化の遅延が抑制される。 Thereby, the image processing system 10 can immediately perform encoding of a camera image without creating a background image in advance. Therefore, encoding delay is suppressed.

また、背景画像の作成、選択および更新の際に、カメラ情報とカメラ画像との両方が用いられてもよいし、いずれか一方のみが用いられてもよい。 Further, when creating, selecting, and updating the background image, both the camera information and the camera image may be used, or only one of them may be used.

例えば、サーバ２０またはエンコーダ３０ａは、カメラ情報（パンチルト角度およびズーム倍率）のみに基づいて、背景画像においてカメラ画像が対応する位置を特定してもよい。すなわち、カメラ情報のみに基づいて、高精度の背景画像の作成、選択および更新の処理が可能な場合がある。また、背景画像の作成、選択および更新の処理を画像特徴点で行うことが困難なカメラ画像に対して、これらの処理がカメラ情報で行われてもよい。 For example, the server 20 or the encoder 30a may specify the position corresponding to the camera image in the background image based only on the camera information (pan tilt angle and zoom magnification). That is, there are cases where a highly accurate background image can be created, selected, and updated based only on camera information. In addition, for a camera image in which it is difficult to perform background image creation, selection, and update processing using image feature points, these processing may be performed using camera information.

逆に、カメラ画像の画像特徴点のみを用いてこれらの処理が行われてもよい。その場合、パンチルト角度およびズーム倍率等のカメラ情報が、カメラ３５ａから取得されなくてもよい。したがって、構成が簡素化され、通信量も削減される。また、サーバ２０またはエンコーダ３０ａは、パンチルト角度およびズーム倍率等のカメラ情報に含まれる誤差の大きさにかかわらず、カメラ画像の画像特徴点のみを用いて、背景画像の作成、選択および更新の処理を行ってもよい。 Conversely, these processes may be performed using only the image feature points of the camera image. In this case, camera information such as the pan/tilt angle and the zoom magnification need not be acquired from the camera 35a. Therefore, the configuration is simplified and the amount of communication is reduced. In addition, the server 20 or the encoder 30a may create, select, and update the background image using only the image feature points of the camera image, regardless of the magnitude of the error included in camera information such as the pan/tilt angle and the zoom magnification.

また、サーバ２０は、背景画像をエンコーダ３０ａへ送信する。ここで、サーバ２０は、背景画像の全体を送信しなくてもよい。サーバ２０は、背景画像の全体のうち、エンコーダ３０ａで使用される可能性を有する部分のみを送信してもよい。例えば、サーバ２０は、カメラ３５ａのパンおよびチルトの可動域（可能な撮影範囲）、および、カメラ３５ａの可能なズーム倍率に基づいて、背景画像の一部分のみをエンコーダ３０ａに送信してもよい。 In addition, the server 20 transmits the background image to the encoder 30a. Here, the server 20 may not transmit the entire background image. The server 20 may transmit only a part of the entire background image that has a possibility of being used by the encoder 30a. For example, the server 20 may transmit only a part of the background image to the encoder 30a based on the pan and tilt movable range (possible shooting range) of the camera 35a and the possible zoom magnification of the camera 35a.

これにより、サーバ２０とエンコーダ３０ａとの間の通信量が減り、エンコーダ３０ａのフレームメモリ４７の容量の削減が可能である。 As a result, the amount of communication between the server 20 and the encoder 30a is reduced, and the capacity of the frame memory 47 of the encoder 30a can be reduced.

また、エンコーダ３０ａは、図１１のように背景画像の被写体のサイズと符号化対象画像の被写体のサイズとが一致するように、背景画像をスケーリングする。さらに、エンコーダ３０ａは、動きベクトルに対して許容されている精度に適合するように、スケーリングの比率を変えてもよい。図１４および図１５を例に説明する。 Further, the encoder 30a scales the background image so that the size of the subject in the background image matches the size of the subject in the encoding target image, as shown in FIG. 11. Furthermore, the encoder 30a may change the scaling ratio so as to match the accuracy allowed for the motion vector. This is described below using FIGS. 14 and 15 as examples.

図１４は、図１１に示されたスケーリング処理の変形例を示す図である。具体的には、図１４は、ＨＥＶＣ（非特許文献１）の規定のように、動きベクトルが１／４画素精度まで許容されている場合におけるスケーリング処理の例を示す。この場合、エンコーダ３０ａは、背景画像の被写体が、縦方向および横方向に、符号化対象画像の被写体の４倍になるように、背景画像をスケーリングする。 FIG. 14 is a diagram showing a modification of the scaling process shown in FIG. Specifically, FIG. 14 shows an example of the scaling process in the case where the motion vector is allowed up to ¼ pixel accuracy as defined in HEVC (Non-Patent Document 1). In this case, the encoder 30a scales the background image so that the subject of the background image is four times the subject of the encoding target image in the vertical and horizontal directions.

つまり、図１４の符号化対象画像において右から２番目の木の横幅が３８４画素である場合、背景画像において同じ被写体である右から２番目の木の横幅が１５３６画素（３８４画素×４）に一致するように、スケーリングが行われる。そして、エンコーダ３０ａは、予測ブロックの生成処理において、４画素の間隔で背景画像の各画素を参照する。 That is, when the width of the second tree from the right in the encoding target image in FIG. 14 is 384 pixels, the width of the second tree from the right, which is the same subject in the background image, is 1536 pixels (384 pixels × 4). Scaling is done to match. Then, the encoder 30a refers to each pixel of the background image at intervals of 4 pixels in the prediction block generation process.

図１５は、本実施の形態における整数画素精度および小数画素精度を示す図である。図１５に、スケーリング後の背景画像の各画素が示されている。また、図１５に、整数画素精度の動きベクトルで参照される画素と、小数画素精度の動きベクトルで参照される画素とが示されている。 FIG. 15 is a diagram showing integer pixel accuracy and decimal pixel accuracy in the present embodiment. FIG. 15 shows each pixel of the background image after scaling. FIG. 15 shows pixels referred to by motion vectors with integer pixel accuracy and pixels referred to by motion vectors with decimal pixel accuracy.

ＨＥＶＣ（非特許文献１）では、小数画素精度の動きベクトルが用いられる場合、整数画素精度の画素値に対してフィルタ処理を行うことにより小数画素精度の画素値が推定される。この方法では、推定された画素値と本来の画素値との間に誤差が生じる。 In HEVC (Non-Patent Document 1), when a motion vector with decimal pixel accuracy is used, a pixel value with decimal pixel accuracy is estimated by performing filter processing on pixel values with integer pixel accuracy. In this method, an error occurs between the estimated pixel value and the original pixel value.

しかし、図１４の例では、エンコーダ３０ａは、ズームインで得られたカメラ画像から生成された高解像度の背景画像を用いて小数画素精度の画素値を生成する。そのため、生成された画素値と本来の画素値との間に誤差が生じにくい。したがって、エンコーダ３０ａは、予測誤差を低減することができる。 However, in the example of FIG. 14, the encoder 30 a generates a pixel value with decimal pixel accuracy using a high-resolution background image generated from a camera image obtained by zooming in. Therefore, an error hardly occurs between the generated pixel value and the original pixel value. Therefore, the encoder 30a can reduce the prediction error.
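
The idea of reading fractional-pixel values directly from a 4x-scaled background image, instead of interpolating them, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `predict_block`, the list-of-lists image layout, and the expression of the motion vector in quarter-pixel units are introduced here for illustration and are not taken from the specification.

```python
SCALE = 4  # quarter-pixel accuracy: the background is 4x the target in each direction


def predict_block(background, x, y, mv_x_q, mv_y_q, width, height):
    """Build a prediction block from a background image scaled by SCALE.

    background     : 2-D list of pixel values, already scaled by SCALE
    x, y           : top-left corner of the block, in integer pixels of the target image
    mv_x_q, mv_y_q : motion vector in quarter-pixel units (assumed representation)
    """
    block = []
    for j in range(height):
        row = []
        for i in range(width):
            # One pixel step in the target image is SCALE pixels in the background,
            # so neighbouring prediction samples are read at intervals of 4 pixels.
            bx = (x + i) * SCALE + mv_x_q
            by = (y + j) * SCALE + mv_y_q
            row.append(background[by][bx])
        block.append(row)
    return block


# Toy background whose value encodes its own position (row * 100 + column).
background = [[r * 100 + c for c in range(32)] for r in range(32)]
block = predict_block(background, x=1, y=1, mv_x_q=1, mv_y_q=2, width=2, height=2)
```

With a quarter-pixel vector of (1, 2), the samples come from real background pixels at columns 5 and 9 and rows 6 and 10, so no interpolation filter is involved. The same factor gives the tree width in the example of FIG. 14: 384 × 4 = 1536 pixels.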

なお、エンコーダ３０ａは、図１１のスケーリング処理と、図１４のスケーリング処理とを切り替えてもよい。例えば、エンコーダ３０ａは、フレームメモリ４７の容量の削減のため、図１１のスケーリング処理を用いることにより、図１４のスケーリング処理で用いられる画像サイズよりも小さい画像サイズを用いてもよい。また、エンコーダ３０ａは、予測誤差を小さくするため（符号化効率を向上するため）、小数画素精度の予測確度が高くなるように、図１４のスケーリング処理を用いてもよい。 The encoder 30a may switch between the scaling process of FIG. 11 and the scaling process of FIG. For example, the encoder 30a may use an image size smaller than the image size used in the scaling process of FIG. 14 by using the scaling process of FIG. 11 in order to reduce the capacity of the frame memory 47. Further, the encoder 30a may use the scaling process of FIG. 14 so as to increase the prediction accuracy with decimal pixel accuracy in order to reduce the prediction error (in order to improve the encoding efficiency).

また、エンコーダ３０ａは、画像特徴点を用いて、背景画像のスケーリング、背景画像の変換、および、全体ベクトルの算出などの処理を行ってもよいし、パンチルト角度およびズーム倍率等のカメラ情報を用いて、これらの処理を行ってもよい。例えば、エンコーダ３０ａは、画像特徴点を用いてこれらの処理を行うことが困難な画像に対して、カメラ情報を用いてこれらの処理を行ってもよい。また、カメラ３５ａの構成の簡素化のため、エンコーダ３０ａは、画像特徴点のみを用いてこれらの処理を行ってもよい。 The encoder 30a may perform processes such as scaling the background image, converting the background image, and calculating the overall vector using image feature points, or may perform these processes using camera information such as the pan/tilt angle and the zoom magnification. For example, the encoder 30a may use camera information for an image on which these processes are difficult to perform using image feature points. Alternatively, to simplify the configuration of the camera 35a, the encoder 30a may perform these processes using only the image feature points.

また、画像間のマッチングには、画像特徴点の代わりに、画素の差分絶対値和が用いられてもよい。画素の差分絶対値和が最も小さい２つの画像が互いに最も適合していると想定される。これにより、マッチングの処理が簡素化され、ＳＩＭＤ演算装置において並列処理を行うことが可能である。 In addition, for the matching between images, the sum of absolute differences of pixels may be used instead of the image feature points. It is assumed that the two images having the smallest sum of absolute differences of pixels are most suitable for each other. This simplifies the matching process and enables parallel processing in the SIMD arithmetic device.
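
Matching by the sum of absolute differences (SAD) can be sketched as follows. The function names `sad` and `best_match` and the list-of-lists image layout are assumptions made for this illustration; the sketch assumes images of equal size.

```python
def sad(image_a, image_b):
    """Sum of absolute differences between two equally sized 2-D images."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    )


def best_match(target, candidates):
    """Return the index of the candidate image with the smallest SAD to the target."""
    return min(range(len(candidates)), key=lambda k: sad(target, candidates[k]))


target = [[10, 20], [30, 40]]
candidates = [
    [[0, 0], [0, 0]],      # SAD = 100
    [[11, 19], [30, 42]],  # SAD = 4
    [[10, 20], [30, 50]],  # SAD = 10
]
index = best_match(target, candidates)
```

Each pixel contributes one subtraction, one absolute value, and one addition, independently of the other pixels, which is why the computation maps well onto SIMD hardware.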

また、図５および図６の例では、複数のカメラ画像から背景画像が作成される。サーバ２０は、１つの背景画像の生成に、同じカメラ３５ａから得られた画像だけでなく、別のカメラ３５ｂ、３５ｃから得られた複数の画像も用いてもよい。例えば、サーバ２０は、別のカメラ３５ｂで別の角度から撮影によって得られた画像に対して射影変換を行い、射影変換が行われた画像をカメラ３５ａに対する背景画像の生成に用いてもよい。これにより、サーバ２０は、動物体によって隠れていた背景部分を取得することでき、動物体の影響が抑制された背景画像を取得することができる。 In the examples of FIGS. 5 and 6, a background image is created from a plurality of camera images. The server 20 may use not only an image obtained from the same camera 35a but also a plurality of images obtained from different cameras 35b and 35c for generating one background image. For example, the server 20 may perform projective transformation on an image obtained by photographing from another angle with another camera 35b, and use the image subjected to the projective transformation for generation of a background image for the camera 35a. Thereby, the server 20 can acquire the background part hidden by the moving object, and can acquire the background image in which the influence of the moving object is suppressed.

また、図３に各画像を含む符号列の構成例が示されている。図３において、第１背景画像、第２背景画像、および、第３背景画像は、同じ画像でもよい。例えば、映像が途中から再生され、符号列の途中から復号が開始される場合でも、エンコーダ３０ａは、正しく復号画像が表示されるように、定期的に長期参照画像である背景画像を符号列に挿入してもよい。 FIG. 3 shows a configuration example of a code string including each image. In FIG. 3, the first background image, the second background image, and the third background image may be the same image. For example, the encoder 30a may periodically insert the background image, which is a long-term reference image, into the code string so that decoded images are displayed correctly even when the video is played back from a midpoint and decoding starts partway through the code string.

これにより、例えば、デコーダは、Ｉ（ｔ）以降の画像を表示する際、Ｉ（ｔ）から復号を開始することができる。したがって、復号処理が削減され、表示の遅延も抑制される。 Thereby, for example, the decoder can start decoding from I (t) when displaying the image after I (t). Therefore, decoding processing is reduced and display delay is also suppressed.
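
The periodic insertion of the background image can be sketched as follows. This is an illustrative sketch only: the function names, the `(kind, time)` tuple representation of a code string, and the fixed insertion period are assumptions and do not reflect the actual layout shown in FIG. 3.

```python
def build_sequence(num_camera_images, period):
    """Return a list of (kind, time) entries with a background entry every `period` images."""
    sequence = []
    for t in range(num_camera_images):
        if t % period == 0:
            # Re-insert the long-term reference (background) image.
            sequence.append(("background", t))
        sequence.append(("camera", t))
    return sequence


def start_index(sequence, t):
    """Index of the last background entry at or before camera time t (a random-access point)."""
    index = None
    for i, (kind, time) in enumerate(sequence):
        if kind == "background" and time <= t:
            index = i
    return index


sequence = build_sequence(num_camera_images=4, period=2)
entry = start_index(sequence, 3)
```

A decoder that wants to display from time 3 can begin at the background entry inserted at time 2 rather than at the head of the sequence.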

また、図３では、１つの符号列がカメラ画像と背景画像とを含む。しかし、エンコーダ３０ａは、図１６のようにカメラ画像を含む符号列と背景画像を含む符号列とを別々に生成してもよい。これにより、１つの符号列内での画像サイズが統一される。したがって、符号化処理および復号処理が簡素化される。また、エンコーダ３０ａは、背景画像の符号列をデコーダにおけるカメラ画像の復号よりも前に、別途、デコーダに提供することができる。これにより、符号列の配信における通信負荷が分散される。 In FIG. 3, one code string includes a camera image and a background image. However, the encoder 30a may generate a code string including a camera image and a code string including a background image separately as shown in FIG. Thereby, the image size in one code string is unified. Therefore, the encoding process and the decoding process are simplified. In addition, the encoder 30a can separately provide the code string of the background image to the decoder before the decoding of the camera image in the decoder. Thereby, the communication load in the distribution of the code string is distributed.

また、上記において、エンコーダ３０ａは、背景画像を符号化して、符号化された背景画像を符号列に含める。しかし、エンコーダ３０ａは、符号化された背景画像を符号列に含めなくてもよい。符号化されたカメラ画像のみで符号列が構成されてもよい。その場合、別途、デコーダに背景画像が送信される。背景画像の送信において、高効率に符号化された背景画像が送信されてもよいし、背景画像の画素値そのものが送信されてもよい。 In the above, the encoder 30a encodes the background image and includes the encoded background image in the code string. However, the encoder 30a may not include the encoded background image in the code string. The code string may be configured only by the encoded camera image. In that case, a background image is separately transmitted to the decoder. In the transmission of the background image, the background image encoded with high efficiency may be transmitted, or the pixel value of the background image itself may be transmitted.

また、サーバ２０からエンコーダ３０ａへの背景画像の送信において、背景画像の画素値そのものが送信されてもよいし、ＨＥＶＣ（非特許文献１）またはＪＰＥＧを用いて背景画像を符号化することで得られる符号列が送信されてもよい。また、エンコーダ３０ａは、符号化処理（Ｓ２１５）で背景画像を符号化する。エンコーダ３０ａは、サーバ２０から受信した符号列そのものを符号化された背景画像として用いることにより、背景画像の符号化および送信の処理（Ｓ２１５）をスキップしてもよい。 In the transmission of the background image from the server 20 to the encoder 30a, the pixel values of the background image themselves may be transmitted, or a code string obtained by encoding the background image using HEVC (Non-Patent Document 1) or JPEG may be transmitted. The encoder 30a encodes the background image in the encoding process (S215). Alternatively, the encoder 30a may skip the process of encoding and transmitting the background image (S215) by using the code string received from the server 20 as the encoded background image as it is.

また、サーバ２０が、図３のような符号列を生成してもよいし、図１６のような符号列を生成してもよい。また、エンコーダ３０ａが、図３のような符号列を生成してもよいし、図１６のような符号列を生成してもよい。また、図３のような１つの符号列が、図１６のような複数の符号列に分離されてもよい。また、サーバ２０またはエンコーダ３０ａは、符号列をデコーダに送信してもよいし、符号列を記録媒体に格納してもよい。また、図１６のような複数の符号列は、通信媒体または記録媒体などに別々に出力されてもよい。 The server 20 may generate a code string as shown in FIG. 3 or a code string as shown in FIG. Further, the encoder 30a may generate a code string as shown in FIG. 3 or a code string as shown in FIG. Further, one code string as shown in FIG. 3 may be separated into a plurality of code strings as shown in FIG. Further, the server 20 or the encoder 30a may transmit the code string to the decoder, or store the code string in a recording medium. Also, a plurality of code strings as shown in FIG. 16 may be separately output to a communication medium or a recording medium.

また、ここでは、「背景」は、一定時間動いていない被写体を意味する。例えば、しばらく動いていない人物、または、停車している車が「背景」に含まれてもよい。 Here, “background” means a subject that has not moved for a certain period of time. For example, a person who has not moved for a while or a car that has stopped may be included in the “background”.

さらに、本実施の形態における処理は、ソフトウェアによって実行されてもよい。そして、このソフトウェアは、ダウンロード等によって配布されてもよい。また、このソフトウェアは、ＣＤ−ＲＯＭなどの記録媒体に記録され、流布されてもよい。なお、これらに関して、本明細書における他の実施の形態も同様である。 Furthermore, the processing in the present embodiment may be executed by software. This software may be distributed by downloading or the like. The software may be recorded on a recording medium such as a CD-ROM and distributed. In addition, regarding these, other embodiment in this specification is also the same.

（実施の形態２） (Embodiment 2)
＜全体構成＞ <Overall configuration>
図１７は、本実施の形態における画像処理システムの構成を示す図である。図１７に示された画像処理システム１１は、サーバ２０、および、デコーダ５０ａ、５０ｂ等を備える。図１７には、２台のデコーダ５０ａ、５０ｂが示されているが、デコーダは、１台でもよいし、３台以上でもよい。 FIG. 17 is a diagram showing a configuration of an image processing system in the present embodiment. The image processing system 11 illustrated in FIG. 17 includes a server 20 and decoders 50a and 50b. Although two decoders 50a and 50b are shown in FIG. 17, the number of decoders may be one, or three or more.

サーバ２０は、背景画像データベース２１、制御部２２、処理部２３、および、通信部２４を備える。これらの構成要素は、実施の形態１と同様である。本実施の形態において、通信部２４は、デコーダ５０ａ、５０ｂ等と通信する。 The server 20 includes a background image database 21, a control unit 22, a processing unit 23, and a communication unit 24. These components are the same as those in the first embodiment. In the present embodiment, the communication unit 24 communicates with the decoders 50a and 50b and the like.

デコーダ５０ａは、記憶部５１ａ、制御部５２ａ、処理部５３ａ、通信部５４ａ、および、表示部５５ａを備える。記憶部５１ａには、符号化された画像、および、復号された画像などが記憶される。制御部５２ａは、デコーダ５０ａにおける各構成要素の動作を制御する。処理部５３ａは、情報処理を行う。デコーダ５０ａの動作は、基本的に、処理部５３ａによって行われる。特に、処理部５３ａは、サーバ２０からの画像を復号する。通信部５４ａは、サーバ２０と通信する。 The decoder 50a includes a storage unit 51a, a control unit 52a, a processing unit 53a, a communication unit 54a, and a display unit 55a. The storage unit 51a stores an encoded image, a decoded image, and the like. The control unit 52a controls the operation of each component in the decoder 50a. The processing unit 53a performs information processing. The operation of the decoder 50a is basically performed by the processing unit 53a. In particular, the processing unit 53a decodes an image from the server 20. The communication unit 54 a communicates with the server 20.

表示部５５ａは、復号された画像を表示する。デコーダ５０ａの外部の表示装置でもよい。例えば、制御部５２ａが、復号された画像を外部の表示装置に表示してもよい。 The display unit 55a displays the decoded image. The display unit 55a may be a display device external to the decoder 50a. For example, the control unit 52a may display the decoded image on an external display device.

デコーダ５０ｂは、記憶部５１ｂ、制御部５２ｂ、処理部５３ｂ、通信部５４ｂ、および、表示部５５ｂを備える。これらは、それぞれ、デコーダ５０ａの構成要素と同様の構成要素である。 The decoder 50b includes a storage unit 51b, a control unit 52b, a processing unit 53b, a communication unit 54b, and a display unit 55b. These are the same components as the components of the decoder 50a.

例えば、サーバ２０は、背景画像データベース２１における背景画像を用いて符号化された画像をデコーダ５０ａに送信する。デコーダ５０ａは、符号化された画像を復号し、復号された画像を表示部５５ａに表示する。 For example, the server 20 transmits an image encoded using the background image in the background image database 21 to the decoder 50a. The decoder 50a decodes the encoded image and displays the decoded image on the display unit 55a.

なお、ここでは、デコーダ５０ａの構成、デコーダ５０ａの動作、および、サーバ２０とデコーダ５０ａとの間で行われる動作が、主に示されている。デコーダ５０ｂの構成、デコーダ５０ｂの動作、および、サーバ２０とデコーダ５０ｂとの間で行われる動作も、デコーダ５０ａの構成、デコーダ５０ａの動作、および、サーバ２０とデコーダ５０ａとの間で行われる動作と同様である。 Here, the configuration of the decoder 50a, the operation of the decoder 50a, and the operation performed between the server 20 and the decoder 50a are mainly shown. The configuration of the decoder 50b, the operation of the decoder 50b, and the operation performed between the server 20 and the decoder 50b are the same as the configuration of the decoder 50a, the operation of the decoder 50a, and the operation performed between the server 20 and the decoder 50a. It is the same.

＜動作（全体）＞ <Operation (overall)>
次に、図１８を参照しつつ、復号全体フローについて説明する。図１８は、図１７に示された画像処理システム１１の処理フローを示す図である。 Next, the overall decoding flow will be described with reference to FIG. 18. FIG. 18 is a diagram showing a processing flow of the image processing system 11 shown in FIG. 17.

まず、デコーダ５０ａは、サーバ２０へ復号の開始要求を送信する（Ｓ８０４）。サーバ２０は、復号の開始要求を受信する（Ｓ８０１）。 First, the decoder 50a transmits a decoding start request to the server 20 (S804). The server 20 receives the decoding start request (S801).

次に、サーバ２０は、デコーダ５０ａへ、背景画像の符号列またはカメラ画像の符号列を送信する（Ｓ８０２）。デコーダ５０ａは、符号列を受信する（Ｓ８０５）。 Next, the server 20 transmits the code sequence of the background image or the code sequence of the camera image to the decoder 50a (S802). The decoder 50a receives the code string (S805).

次に、デコーダ５０ａは、サーバ２０から受信した符号列が背景画像の符号列であるか否かを判定する（Ｓ８０６）。サーバ２０から受信した符号列が背景画像の符号列である場合（Ｓ８０６でＹｅｓ）、デコーダ５０ａは、符号列に含まれる符号化された背景画像を復号する（Ｓ８０７）。なお、背景画像は、他の画像からの参照のみに用いられる画像であり、表示されない。 Next, the decoder 50a determines whether or not the code string received from the server 20 is the code string of the background image (S806). When the code string received from the server 20 is the code string of the background image (Yes in S806), the decoder 50a decodes the encoded background image included in the code string (S807). Note that the background image is an image used only for reference from other images and is not displayed.

サーバ２０から受信した符号列が背景画像の符号列でない場合（Ｓ８０６でＮｏ）、サーバ２０から受信した符号列はカメラ画像の符号列である。この場合、デコーダ５０ａは、カメラ画像の復号に用いられる背景画像を決定する（Ｓ８０８）。具体的には、デコーダ５０ａは、復号対象のカメラ画像の時刻情報を用いて、背景画像を決定する。例えば、デコーダ５０ａは、時刻ｔにおけるカメラ画像には第１背景画像を使用対象の背景画像と決定し、時刻ｔ＋１におけるカメラ画像には第２背景画像を使用対象の背景画像と決定する。 When the code string received from the server 20 is not the code string of the background image (No in S806), the code string received from the server 20 is the code string of the camera image. In this case, the decoder 50a determines a background image used for decoding the camera image (S808). Specifically, the decoder 50a determines the background image using the time information of the camera image to be decoded. For example, the decoder 50a determines the first background image as the background image to be used for the camera image at time t, and determines the second background image as the background image to be used for the camera image at time t + 1.

そして、デコーダ５０ａは、符号列に含まれる符号化されたカメラ画像を復号する（Ｓ８０９）。そして、デコーダ５０ａは、復号されたカメラ画像を表示する（Ｓ８１０）。 Then, the decoder 50a decodes the encoded camera image included in the code string (S809). Then, the decoder 50a displays the decoded camera image (S810).

次に、デコーダ５０ａは、使用された背景画像を更新する（Ｓ８１１）。ここでは、復号処理（Ｓ８０９）で復号されたカメラ画像を用いて背景画像を更新する。例えば、背景画像に含まれる動物体で背景の一部が隠れてしまい、背景画像に背景の全てが含まれていない場合がある。そこで、デコーダ５０ａは、カメラ画像を用いて背景画像を更新する。 Next, the decoder 50a updates the used background image (S811). Here, the background image is updated using the camera image decoded in the decoding process (S809). For example, a part of the background may be hidden by the moving object included in the background image, and the background image may not include the entire background. Therefore, the decoder 50a updates the background image using the camera image.

具体的には、例えば、動物体は１つの場所に留まらずに様々な場所に移動する。そのため、複数のカメラ画像における画素の平均値で、背景の画素値を特定することが可能である。そこで、デコーダ５０ａは、使用された背景画像と、復号されたカメラ画像との間で画素毎に画素値の平均を算出する。そして、デコーダ５０ａは、画素毎に算出された平均で背景画像を更新することにより背景画像から動物体を疑似的に削除することができる。 Specifically, for example, the moving object moves to various places without staying at one place. Therefore, the background pixel value can be specified by the average value of the pixels in a plurality of camera images. Therefore, the decoder 50a calculates an average of pixel values for each pixel between the used background image and the decoded camera image. Then, the decoder 50a can artificially delete the moving object from the background image by updating the background image with the average calculated for each pixel.
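
The per-pixel averaging update can be sketched as follows. This is a minimal sketch, assuming integer pixel values and a rounded integer mean; the function name `update_background` and the rounding rule are assumptions, since the text only specifies that an average is calculated for each pixel.

```python
def update_background(background, decoded):
    """Replace each background pixel with the rounded mean of it and the decoded pixel."""
    return [
        [(b + d + 1) // 2 for b, d in zip(b_row, d_row)]
        for b_row, d_row in zip(background, decoded)
    ]


# A moving object (value 200) sits on a background whose true value is 100.
background = [[200]]
history = []
for _ in range(3):
    # Later camera images show the uncovered background at this pixel.
    background = update_background(background, [[100]])
    history.append(background[0][0])
```

Each update halves the remaining difference, so the object fades from the background image over successive camera images. Whatever rounding rule is chosen, the encoder and the decoder need to apply exactly the same one so that their reference images stay identical.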

なお、背景画像の更新処理は、エンコーダ３０ａ等と同様に行われる。これにより、符号化側で得られる更新後の背景画像と復号側で得られる更新後の背景画像とが互いに一致し、符号化側と復号側とにおける参照画像のミスマッチが抑制される。 The background image update process is performed in the same manner as the encoder 30a and the like. As a result, the updated background image obtained on the encoding side and the updated background image obtained on the decoding side match each other, and mismatch of the reference images on the encoding side and the decoding side is suppressed.

デコーダ５０ａは、ユーザから復号の停止要求を受けるまで、上記の処理（Ｓ８０５〜Ｓ８１１）を繰り返す（Ｓ８１２でＮｏ）。デコーダ５０ａは、停止要求を受けた場合（Ｓ８１２でＹｅｓ）、サーバ２０に復号の停止要求を送信し、処理を終える（Ｓ８１３）。サーバ２０は、デコーダ５０ａから復号の停止要求を受けるまで、背景画像の符号列またはカメラ画像の符号列の送信処理（Ｓ８０２）を繰り返す（Ｓ８０３でＮｏ）。サーバ２０は、停止要求を受けた場合（Ｓ８０３でＹｅｓ）、処理を終える。 The decoder 50a repeats the above processing (S805 to S811) until a decoding stop request is received from the user (No in S812). When receiving the stop request (Yes in S812), the decoder 50a transmits a stop request for decoding to the server 20, and ends the process (S813). The server 20 repeats the transmission process (S802) of the code sequence of the background image or the code sequence of the camera image until receiving a decoding stop request from the decoder 50a (No in S803). When the server 20 receives the stop request (Yes in S803), the server 20 ends the process.
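
The decoder-side loop (S805 to S812) can be sketched as follows. This is an illustrative sketch, not the decoder's actual interface: the function `run_decoder`, the `(kind, key, payload)` tuple format, and the three callbacks are hypothetical. The end of the input list stands in for the user's stop request, and `key` stands in for the background chosen from the time information in S808.

```python
def run_decoder(code_strings, decode_background, decode_camera, update_background):
    backgrounds = {}  # decoded background images, kept for reference only
    displayed = []    # decoded camera images, in display order
    for kind, key, payload in code_strings:                 # S805: receive a code string
        if kind == "background":                            # S806: background code string?
            backgrounds[key] = decode_background(payload)   # S807: decode, do not display
        else:
            background = backgrounds[key]                   # S808: choose the background
            image = decode_camera(payload, background)      # S809: decode the camera image
            displayed.append(image)                         # S810: display
            backgrounds[key] = update_background(background, image)  # S811: update
    return displayed, backgrounds


# Toy stand-ins: a "picture" is a single number and decoding adds the reference.
stream = [("background", "bg1", 10), ("camera", "bg1", 5), ("camera", "bg1", 1)]
displayed, backgrounds = run_decoder(
    stream,
    decode_background=lambda payload: payload,
    decode_camera=lambda payload, background: payload + background,
    update_background=lambda background, image: (background + image) // 2,
)
```

The second camera image is decoded against the background as updated after the first one, which is why the update rule has to match the encoder's.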

＜復号構成＞ <Decoding configuration>
図１９は、図１７に示されたデコーダ５０ａの処理部５３ａの構成を主に示す図である。処理部５３ａは、可変長復号部６１、逆変換部６５、加算部６６、フレームメモリ６７および結合部６８を備える。フレームメモリ６７は、記憶部５１ａに含まれてもよい。 FIG. 19 is a diagram mainly showing a configuration of the processing unit 53a of the decoder 50a shown in FIG. 17. The processing unit 53a includes a variable length decoding unit 61, an inverse transform unit 65, an addition unit 66, a frame memory 67, and a combining unit 68. The frame memory 67 may be included in the storage unit 51a.

可変長復号部６１は、符号列に対して可変長復号を行って係数を出力する。逆変換部６５は、係数に対して逆周波数変換を行って、処理対象ブロックと予測ブロックとの差分を示す差分ブロックを出力する。加算部６６は、差分ブロックと予測ブロックとを加算することにより、処理対象ブロックを再構成し、復号ブロックを生成する。結合部６８は、複数の復号ブロックを結合し、結合によって得られるカメラ画像を出力する。フレームメモリ６７には、結合部６８から出力されたカメラ画像が記憶される。 The variable length decoding unit 61 performs variable length decoding on the code string and outputs coefficients. The inverse transform unit 65 performs inverse frequency transform on the coefficient, and outputs a difference block indicating the difference between the processing target block and the prediction block. The adding unit 66 reconstructs the processing target block by adding the difference block and the prediction block, and generates a decoded block. The combining unit 68 combines a plurality of decoded blocks and outputs a camera image obtained by combining. In the frame memory 67, the camera image output from the combining unit 68 is stored.

＜動作（復号）＞ <Operation (decoding)>
次に、図２０を参照しつつ、デコーダ５０ａでのカメラ画像の復号処理（Ｓ８０９）のフローについて説明する。図２０は、図１８に示された復号処理（Ｓ８０９）のフローを示す図である。 Next, the flow of the camera image decoding process (S809) in the decoder 50a will be described with reference to FIG. 20. FIG. 20 is a diagram showing a flow of the decoding process (S809) shown in FIG. 18.

まず、デコーダ５０ａは、スケーリングパラメータを復号し、復号されたスケーリングパラメータに従って背景画像をスケーリングする（Ｓ９０１）。例えば、デコーダ５０ａは、図１１のように、背景画像をスケーリングすることにより、スケーリングされた背景画像を取得する。 First, the decoder 50a decodes the scaling parameter and scales the background image according to the decoded scaling parameter (S901). For example, the decoder 50a obtains a scaled background image by scaling the background image as shown in FIG.

次に、デコーダ５０ａは、変換パラメータを復号し、復号された変換パラメータに従って背景画像を変換する（Ｓ９０２）。例えば、デコーダ５０ａは、図１２のように、背景画像を変換することにより、変換された背景画像を取得する。 Next, the decoder 50a decodes the conversion parameter, and converts the background image according to the decoded conversion parameter (S902). For example, the decoder 50a acquires the converted background image by converting the background image as shown in FIG.

次に、デコーダ５０ａは、全体ベクトルを復号し、図１３のような全体ベクトルを取得する（Ｓ９０３）。 Next, the decoder 50a decodes the entire vector and acquires the entire vector as shown in FIG. 13 (S903).

Next, in the decoder 50a, the variable length decoding unit 61 performs variable length decoding on the code string (S904). The inverse transform unit 65 performs an inverse frequency transform on the coefficients obtained by the variable length decoding (S905). The adding unit 66 generates a decoded block by adding the block obtained by the inverse frequency transform and the prediction block (S906).

The decoder 50a repeats the above processing (S904 to S906) until all blocks have been decoded (No in S907). After all blocks have been decoded (Yes in S907), the combining unit 68 combines all the blocks to obtain the decoded image (S908).
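
The block loop above (S904 to S908) can be sketched as follows. This is a minimal illustration in Python, not the actual implementation of the decoder 50a. The names `combine_blocks` and `decode_picture` are hypothetical, the per-block steps S904 to S906 are supplied by the caller as a function, and the blocks are assumed to be equal-sized and stored in raster order.

```python
def combine_blocks(blocks, blocks_per_row):
    """Stitch equal-sized 2-D blocks (raster order) into one image (S908)."""
    image = []
    for start in range(0, len(blocks), blocks_per_row):
        row_blocks = blocks[start:start + blocks_per_row]
        block_height = len(row_blocks[0])
        for r in range(block_height):
            # Concatenate row r of every block in this block row.
            image.append([px for block in row_blocks for px in block[r]])
    return image


def decode_picture(coded_blocks, decode_block, blocks_per_row):
    """Repeat the per-block steps until no block remains, then combine."""
    decoded = []
    for coded in coded_blocks:      # loop continues while S907 is "No"
        decoded.append(decode_block(coded))  # stands in for S904 to S906
    return combine_blocks(decoded, blocks_per_row)  # S907 "Yes", then S908
```

For example, passing four 2x2 blocks with `blocks_per_row=2` yields a 4x4 image in which the blocks occupy the top-left, top-right, bottom-left, and bottom-right quadrants in that order.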

Note that when a prediction block is generated, the background image, a decoded image (reference image), or a decoded block in the same image is referenced. As in Embodiment 1, the decoder 50a uses the entire vector as a base. Specifically, the decoder 50a adds the entire vector to the decoded motion vector and uses the resulting vector to generate the prediction block (see FIG. 13).
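
The vector addition described above, followed by the addition in the adding unit 66, can be illustrated with the following sketch. It rests on several assumptions that are not stated in the description: vectors are integer `(dy, dx)` pairs, samples are 8-bit values clipped to 0..255, and the function names `predict_block` and `reconstruct_block` are hypothetical. No bounds checking is performed.

```python
def predict_block(reference, top, left, size, motion_vector, entire_vector):
    """Fetch a size x size prediction block from the reference image.

    The effective displacement is the decoded motion vector plus the
    entire vector, both given as (dy, dx) in integer pixels.
    """
    dy = motion_vector[0] + entire_vector[0]
    dx = motion_vector[1] + entire_vector[1]
    y0, x0 = top + dy, left + dx
    return [row[x0:x0 + size] for row in reference[y0:y0 + size]]


def reconstruct_block(residual, prediction):
    """Add the difference block to the prediction block, clipping to 8 bits."""
    return [
        [max(0, min(255, r + p)) for r, p in zip(res_row, pred_row)]
        for res_row, pred_row in zip(residual, prediction)
    ]
```

Because the entire vector already points at the region of the background image that corresponds to the current picture, the decoded motion vector only has to express a small remaining displacement.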

<Effect>
As described above, in the present embodiment, the image processing system 11 uses a wide-range, high-resolution background image. This allows the image processing system 11 to decode images that were encoded with high coding efficiency. In particular, the image processing system 11 can appropriately decode images obtained by panning, tilting, or zooming.

Further, by updating the background image using decoded camera images, the image processing system 11 can keep the background image in line with the current camera image. The background image therefore stays close to the decoding target image, which improves prediction accuracy.

For example, when the background image includes a moving object, part of the background is hidden. A single background image therefore may not contain the entire background. By updating the background image using a plurality of camera images, the image processing system 11 can fill in the parts of the background that are missing from the background image with camera images that contain those parts. The background may also change over time. For example, the background image may be updated when the background grows darker as night approaches.
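
The description does not specify how the background image is updated. As one possible illustration only, the sketch below blends a decoded camera image into the corresponding region of the background image with a fixed weight. The function name `update_background`, the blending rule, and the parameters `weight_num` and `weight_den` are all assumptions introduced here.

```python
def update_background(background, decoded, top, left, weight_num=1, weight_den=4):
    """Blend a decoded camera image into a region of the background image.

    (top, left) is the position of the camera image inside the background.
    The new sample is a weighted average, computed in integer arithmetic so
    that an encoder and a decoder running the same code get identical results.
    """
    updated = [row[:] for row in background]  # leave the input untouched
    for i, row in enumerate(decoded):
        for j, new_px in enumerate(row):
            old_px = background[top + i][left + j]
            blended = old_px * (weight_den - weight_num) + new_px * weight_num
            updated[top + i][left + j] = (blended + weight_den // 2) // weight_den
    return updated
```

A gradual blend of this kind lets the background follow slow changes such as fading daylight while limiting the influence of a moving object that appears in only a few pictures.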

Further, by updating the background image, the image processing system 11 can quickly follow changes in the background, such as when chairs are rearranged. Also, by updating the background image using decoded images, the image processing system 11 can reduce the number of times the background image is transmitted and received between the server 20 and the decoder 50a.

Further, the image processing system 11 may use a common background image update process in the encoder (for example, the encoder 30a of Embodiment 1) and the decoder 50a. In that case, the updated background image need not be included in the code string, which reduces the overall code amount.

Further, because the decoder 50a updates the background image, the server 20 does not have to hold a large number of different background images. The capacity of the memory (storage unit) in the server 20 can therefore be reduced.

Further, the image processing system 11 converts the background image as shown in FIG. 12 and uses the converted background image as a reference image. This brings the background image closer to the decoding target image and reduces the prediction error. It also makes it possible to reduce the capacity of the frame memory 67 used for storing background images.

The prediction error could instead be reduced, without converting each background image, by preparing a large number of background images. In that case, however, a frame memory 67 with a large capacity is needed. The amount of processing for selecting a background image is also large. In addition, a background image is inserted into the code string each time the background image is switched. In that case, therefore, both the code amount and the communication amount are large.

Converting the background image therefore makes it possible to reduce the capacity of the frame memory 67, the processing amount, the communication amount, and the code amount.

Further, because the background image is stored in the server 20, the decoder 50a need not use a large-capacity frame memory 67. In other words, the decoders 50a and 50b can share the background image held in the server 20.

Further, the server 20 transmits a plurality of background images to the decoder 50a in advance. The decoder 50a then switches the background image to be used, among the plurality of background images transmitted in advance from the server 20, according to the time. This reduces the amount of communication between the server 20 and the decoder 50a during decoding and distributes the communication load.
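
The time-based switching described above could, for instance, be driven by a small schedule table. The following sketch is illustrative only: the schedule format, the background identifiers, and the function name `select_background` are assumptions, and the description does not state how the time is mapped to a background image.

```python
import bisect


def select_background(schedule, hour):
    """Pick the background whose start hour is the latest one not after `hour`.

    `schedule` is a list of (start_hour, background_id) pairs sorted by
    start_hour. Hours before the first entry wrap around to the last entry,
    so a background that starts in the evening also covers the early morning.
    """
    starts = [start for start, _ in schedule]
    index = bisect.bisect_right(starts, hour) - 1  # -1 selects the last entry
    return schedule[index][1]
```

With the schedule `[(6, "day"), (18, "night")]`, hours 6 to 17 select the daytime background, and all other hours select the nighttime background, without any further communication with the server during decoding.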

Note that, in the determination process (S806), the decoder 50a determines whether the code string received from the server 20 is a code string of a background image or a code string of a camera image. In this determination, the decoder 50a may use a non-display flag that indicates whether or not to display the decoded image. The non-display flag is used, for example, when a reference image has been deleted by video editing or the like and an image cannot be decoded correctly.

An image used only for reference, such as the background image of the present embodiment, may be designated by the non-display flag as an image that is not displayed. The decoder 50a may determine that an image designated as not displayed is a background image. By using the existing non-display flag rather than a new additional flag, the image processing system 11 can suppress an increase in the code amount. Of course, the image processing system 11 may instead use a new flag indicating whether or not an image is a background image.
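
The use of the non-display flag to identify background images can be sketched as follows. The `Picture` type, its `non_display` field, and the function `handle_decoded` are hypothetical names chosen for this illustration; they do not correspond to syntax elements of any particular codec.

```python
from dataclasses import dataclass


@dataclass
class Picture:
    pixels: list
    non_display: bool  # True: the picture is never output for display


def handle_decoded(picture, display_queue, state):
    """Route a decoded picture according to its non-display flag."""
    if picture.non_display:
        # Treated as the reference-only background image.
        state["background"] = picture.pixels
    else:
        # An ordinary camera image: queue it for display.
        display_queue.append(picture.pixels)
```

Reusing one flag for both purposes means no extra bit is spent per picture, at the cost of being unable to distinguish a background image from any other picture that is marked as not displayed.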

Further, the decoder 50a scales the background image so that the size of a subject in the background image matches the size of that subject in the encoding target image (decoding target image), as shown in FIG. 11. The decoder 50a may change the scaling ratio to suit the precision permitted for motion vectors. This is described using FIG. 14 and FIG. 15 as examples.

FIG. 14 is a diagram showing a modification of the scaling process shown in FIG. 11. Specifically, FIG. 14 shows an example of the scaling process in the case where motion vectors are permitted up to 1/4-pixel precision, as specified in HEVC (Non-Patent Document 1). In this case, the decoder 50a scales the background image so that a subject in the background image is four times the size of that subject in the encoding target image (decoding target image) in both the vertical and horizontal directions.

That is, when the second tree from the right in the encoding target image (decoding target image) in FIG. 14 is 384 pixels wide, the background image is scaled so that the same subject, the second tree from the right, is 1536 pixels wide (384 pixels × 4) in the background image. In the prediction block generation process, the decoder 50a then references the pixels of the background image at 4-pixel intervals.
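
The arithmetic above can be illustrated with a short sketch. It assumes that motion vectors are expressed in quarter-pixel units as `(dy, dx)` and that block positions are given in integer pixels of the decoding target image; the names `PRECISION`, `scaled_width`, and `fetch_block` are hypothetical, and no bounds checking is performed.

```python
PRECISION = 4  # 1/4-pixel motion vectors: background scaled 4x per axis


def scaled_width(target_width, precision=PRECISION):
    """Width of a subject in the background after scaling (384 -> 1536)."""
    return target_width * precision


def fetch_block(upscaled, top, left, size, mv_quarter):
    """Read a prediction block from the 4x-scaled background image.

    (top, left) is the block position in the target image, in integer pixels.
    mv_quarter is (dy, dx) in quarter-pixel units. Samples are taken every
    PRECISION pixels, so a fractional vector lands on real stored pixels
    instead of requiring interpolation.
    """
    y0 = top * PRECISION + mv_quarter[0]
    x0 = left * PRECISION + mv_quarter[1]
    return [
        [upscaled[y0 + PRECISION * i][x0 + PRECISION * j] for j in range(size)]
        for i in range(size)
    ]
```

In this arrangement a quarter-pixel displacement in the target image corresponds to exactly one pixel in the scaled background image, which is why sampling at 4-pixel intervals suffices.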

However, in the example of FIG. 14, the decoder 50a generates pixel values at fractional-pixel precision using a high-resolution background image generated from camera images obtained by zooming in. An error is therefore unlikely to arise between the generated pixel values and the true pixel values, and the decoder 50a can reduce the prediction error.

Note that the decoder 50a may switch between the scaling process of FIG. 11 and the scaling process of FIG. 14. For example, to reduce the capacity of the frame memory 67, the decoder 50a may use the scaling process of FIG. 11 and thereby use an image size smaller than that used in the scaling process of FIG. 14. Alternatively, to reduce the prediction error (that is, to improve coding efficiency), the decoder 50a may use the scaling process of FIG. 14 so as to raise the prediction accuracy at fractional-pixel precision.

FIG. 3 shows a configuration example of a code string including the images. In FIG. 3, the first background image, the second background image, and the third background image may be the same image. For example, a background image that is a long-term reference image may be inserted into the code string periodically so that decoded images are displayed correctly even when the video is played from partway through and decoding starts in the middle of the code string. In this way, for example, when displaying images from I(t) onward, the decoder 50a can start decoding from I(t). This reduces the decoding processing and also suppresses display delay.

In FIG. 3, a single code string includes both camera images and background images. However, as shown in FIG. 16, a code string including the camera images and a code string including the background images may be generated separately. The image size within each code string is then uniform, which simplifies the encoding and decoding processes. In addition, the decoder 50a can obtain the code string of the background images separately, before decoding the code string of the camera images. This distributes the communication load in delivering the code strings.

In the above description, the encoded background image is included in the code string. However, the background image need not be included in the code string. That is, the code string may consist only of encoded camera images. In that case, the background image is transmitted to the decoder 50a separately. The background image may be transmitted in a form encoded with high efficiency, or the pixel values of the background image may be transmitted as they are.

Further, the decoder 50a may obtain, from the server 20, a code string as shown in FIG. 3 or code strings as shown in FIG. 16. The decoder 50a may also obtain a code string as shown in FIG. 3 or code strings as shown in FIG. 16 from an encoder (such as the encoder 30a) or from a recording medium. The decoder 50a may obtain the plurality of code strings shown in FIG. 16 separately from a communication medium, a recording medium, or the like.

(Embodiment 3)
The present embodiment describes the characteristic configuration and characteristic operations of the image processing systems described in Embodiment 1 and Embodiment 2. The configuration and operations described in the present embodiment basically correspond to those described in Embodiment 1 and Embodiment 2.

FIG. 21 is a diagram showing the configuration of the image processing system according to the present embodiment. The image processing system 12 shown in FIG. 21 includes an image encoding device 70, an image decoding device 80, and an image management device 90. The image encoding device 70 includes an acquisition unit 71 and an encoding unit 72. The image decoding device 80 includes an acquisition unit 81 and a decoding unit 82.

The image encoding device 70 encodes a plurality of display target images constituting a video by using inter prediction. The acquisition unit 71 acquires a reference-only image. The encoding unit 72 encodes one or more display target images among the plurality of display target images constituting the video by referring to the reference-only image as a reference image in inter prediction.

Here, the reference-only image is an image that differs from both the plurality of display target images and the plurality of reconstructed images of those display target images, and that is used exclusively for reference in inter prediction. The reference-only image is, for example, the background image described in Embodiment 1 and Embodiment 2. A display target image is, for example, a camera image described in Embodiment 1 and Embodiment 2.

The image decoding device 80 decodes a plurality of display target images constituting a video by using inter prediction. The acquisition unit 81 acquires a reference-only image. The decoding unit 82 decodes one or more display target images among the plurality of display target images constituting the video by referring to the reference-only image as a reference image in inter prediction.

The image management device 90 is an optional component of the image processing system 12. The image management device 90 acquires the reference-only image. The acquisition unit 71 of the image encoding device 70 may acquire, from the image management device 90, the reference-only image acquired by the image management device 90. Similarly, the acquisition unit 81 of the image decoding device 80 may acquire, from the image management device 90, the reference-only image acquired by the image management device 90.

The image encoding device 70 corresponds to the encoder 30a and the like described in Embodiment 1. The acquisition unit 71 and the encoding unit 72 of the image encoding device 70 correspond to the processing unit 33a and the like described in Embodiment 1. The image decoding device 80 corresponds to the decoder 50a and the like described in Embodiment 2. The acquisition unit 81 and the decoding unit 82 of the image decoding device 80 correspond to the processing unit 53a and the like described in Embodiment 2. The image management device 90 corresponds to the server 20 described in Embodiment 1 and Embodiment 2.

FIG. 22 is a diagram showing the processing flow of the operation of the image processing system 12 shown in FIG. 21.

In the image encoding device 70, the acquisition unit 71 first acquires the reference-only image (S111). The acquisition unit 71 may acquire the reference-only image from the image management device 90. In that case, the image management device 90 first acquires the reference-only image (S101), and the acquisition unit 71 then acquires that reference-only image from the image management device 90.

Next, in the image encoding device 70, the encoding unit 72 encodes one or more display target images among the plurality of display target images constituting the video by referring to the reference-only image as a reference image in inter prediction (S112).

In the image decoding device 80, the acquisition unit 81 first acquires the reference-only image (S121). The acquisition unit 81 may acquire the reference-only image from the image management device 90. In that case, the image management device 90 first acquires the reference-only image (S101), and the acquisition unit 81 then acquires that reference-only image from the image management device 90.

Next, in the image decoding device 80, the decoding unit 82 decodes one or more display target images among the plurality of display target images constituting the video by referring to the reference-only image as a reference image in inter prediction (S122).

Accordingly, the image encoding device 70 and the image decoding device 80 can refer, in inter prediction, to a reference-only image that differs from the display target images and the like. The image encoding device 70 and the image decoding device 80 can therefore refer to an appropriate reference image in inter prediction.

The reference-only image may be larger than each of the plurality of display target images. That is, the number of pixels in the reference-only image may be greater than the number of pixels in each of the plurality of display target images.

The reference-only image may be an image in which a plurality of captured images are integrated. Here, a captured image is an image obtained by shooting. The plurality of captured images may be obtained by panning, tilting, and zooming, or may be obtained from a plurality of cameras. The reference-only image may be an image obtained by integrating the plurality of captured images using the shooting information or feature points of each captured image.

The acquisition unit 71 may acquire the reference-only image before the display target image that is first in encoding order, among the plurality of display target images constituting the video, is encoded. Similarly, the acquisition unit 81 may acquire the reference-only image before the display target image that is first in decoding order, among the plurality of display target images constituting the video, is decoded.

The acquisition units 71 and 81 may acquire part or all of the reference-only image by receiving part or all of the reference-only image from the image management device 90. The encoding unit 72 may encode the one or more display target images by referring to the partially or entirely acquired reference-only image. Likewise, the decoding unit 82 may decode the one or more display target images by referring to the partially or entirely acquired reference-only image.

The acquisition units 71 and 81 may acquire, as the reference-only image, each of a plurality of reference-only images including a first reference-only image corresponding to a first shooting situation and a second reference-only image corresponding to a second shooting situation. The acquisition units 71 and 81 may selectively acquire one of the plurality of reference-only images based on the shooting situation of the video. Each shooting situation includes, for example, the time of day, the weather, or the season at the time of shooting.

For example, when the shooting situation of the video is the first shooting situation, the acquisition units 71 and 81 may acquire the first reference-only image. In this case, the encoding unit 72 may encode the one or more display target images by referring to the first reference-only image as the reference-only image, and the decoding unit 82 may decode the one or more display target images by referring to the first reference-only image as the reference-only image.

Similarly, when the shooting situation of the video is the second shooting situation, the acquisition units 71 and 81 may acquire the second reference-only image. In this case, the encoding unit 72 may encode the one or more display target images by referring to the second reference-only image as the reference-only image, and the decoding unit 82 may decode the one or more display target images by referring to the second reference-only image as the reference-only image.

The acquisition units 71 and 81 may update the reference-only image using one or more reconstructed images among the plurality of reconstructed images of the plurality of display target images. The encoding unit 72 may encode the one or more display target images by referring to the updated reference-only image. The decoding unit 82 may decode the one or more display target images by referring to the updated reference-only image.

When encoding an encoding target image among the one or more display target images, the encoding unit 72 may convert the reference-only image so that the reference-only image corresponds to the encoding target image, and may refer to the converted reference-only image as the reference image. When decoding a decoding target image among the one or more display target images, the decoding unit 82 may convert the reference-only image so that the reference-only image corresponds to the decoding target image, and may refer to the converted reference-only image as the reference image. The conversion may be a projective transformation, a luminance conversion, or scaling.
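
Two of the conversions named above, luminance conversion and scaling, can be sketched as follows; the projective transformation is omitted for brevity. The description does not define the concrete form of these conversions, so the gain-and-offset luminance model, the nearest-neighbor scaling, and the function names `convert_luminance` and `scale_nearest` are all assumptions made for this illustration. Samples are assumed to be 8-bit values, and ratios are given as integer fractions so that an encoder and a decoder obtain identical results.

```python
def convert_luminance(image, gain_num, gain_den, offset):
    """Apply sample = sample * gain_num / gain_den + offset, clipped to 0..255."""
    return [
        [max(0, min(255, (px * gain_num) // gain_den + offset)) for px in row]
        for row in image
    ]


def scale_nearest(image, ratio_num, ratio_den):
    """Scale an image by ratio_num / ratio_den using nearest-neighbor sampling."""
    height, width = len(image), len(image[0])
    out_h = height * ratio_num // ratio_den
    out_w = width * ratio_num // ratio_den
    return [
        [image[y * ratio_den // ratio_num][x * ratio_den // ratio_num]
         for x in range(out_w)]
        for y in range(out_h)
    ]
```

For instance, a background captured in daylight could be darkened with a gain below 1 before being referenced by a picture taken at dusk, and enlarged or reduced so that subjects match the size at which they appear in the current picture.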

The encoding unit 72 may scale the reference-only image so that the size of a subject in the reference-only image corresponds to the size of that subject in the encoding target image, and may refer to the scaled reference-only image as the reference image. The decoding unit 82 may scale the reference-only image so that the size of a subject in the reference-only image corresponds to the size of that subject in the decoding target image, and may refer to the scaled reference-only image as the reference image.

The encoding unit 72 may scale the reference-only image using the shooting information of the reference-only image and of the encoding target image, or using the positions of feature points in the reference-only image and in the encoding target image. The decoding unit 82 may scale the reference-only image using the shooting information of the reference-only image and of the decoding target image, or using the positions of feature points in the reference-only image and in the decoding target image. The decoding unit 82 may estimate the positions of the feature points of the decoding target image from already decoded images.

The encoding unit 72 and the decoding unit 82 may scale the reference-only image according to the precision of the motion vectors used in inter prediction.

The encoding unit 72 may encode a conversion parameter. The decoding unit 82 may decode the conversion parameter. The conversion parameter is a parameter used for converting the reference-only image. The conversion parameter may include the scaling ratio.

The encoding unit 72 may encode the entire vector. The decoding unit 82 may decode the entire vector. The entire vector is a vector that indicates the position of the region, within the reference-only image, to which the encoding target image among the one or more display target images corresponds.

The encoding unit 72 may calculate the entire vector using the shooting information of the reference-only image and of the encoding target image, or using the positions of feature points in the reference-only image and in the encoding target image, and may encode the calculated entire vector. The decoding unit 82 may decode the entire vector that was calculated using the shooting information or the feature point positions and then encoded.

The encoding unit 72 may encode the one or more display target images and generate a code string including the one or more display target images separately from a code string including the reference-only image. The decoding unit 82 may decode the one or more display target images included in a code string that is separate from the code string including the reference-only image.

The encoding unit 72 may encode the reference-only image as a non-display image. The decoding unit 82 may decode the reference-only image encoded as a non-display image. In other words, the decoding unit 82 may decode a non-display image as the reference-only image.

The image encoding device 70, the image decoding device 80, and the image management device 90 may be connected to one another via a communication network.

Note that the image processing system 12 and the image management device 90 described above may also be referred to as an image distribution system and an image distribution device, respectively. The image management device 90 may be included in the image encoding device 70 or in the image decoding device 80. The image processing system 12 and the like described above are particularly useful for systems that process video in which the background changes little, such as security camera systems and fixed-point observation camera systems.

以上の各実施の形態において、各構成要素は、例えば、ＭＰＵおよびメモリ等を含む回路によって実現される。また、各構成要素が実行する処理は、ソフトウェア（プログラム）によって実行されてもよい。当該ソフトウェアは、例えば、ＲＯＭ等の記録媒体に記録されている。そして、このようなソフトウェアは、ダウンロード等により配布されてもよいし、ＣＤ−ＲＯＭなどの記録媒体に記録して配布されてもよい。なお、各構成要素をハードウェア（専用回路）によって実現することも、当然、可能である。 In each of the embodiments described above, each component is realized by a circuit including an MPU, a memory, and the like, for example. Further, the processing executed by each component may be executed by software (program). The software is recorded on a recording medium such as a ROM. Such software may be distributed by downloading or the like, or may be distributed by being recorded on a recording medium such as a CD-ROM. Of course, each component can be realized by hardware (dedicated circuit).

つまり、上記各実施の形態において、各構成要素は、専用のハードウェアで構成されるか、各構成要素に適したソフトウェアプログラムを実行することによって実現されてもよい。各構成要素は、ＣＰＵまたはプロセッサなどのプログラム実行部が、ハードディスクまたは半導体メモリなどの記録媒体に記録されたソフトウェアプログラムを読み出して実行することによって実現されてもよい。 That is, in each of the above embodiments, each component may be configured by dedicated hardware, or may be realized by executing a software program suitable for each component. Each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.

言い換えると、画像符号化装置および画像復号装置などは、処理回路（ＰｒｏｃｅｓｓｉｎｇＣｉｒｃｕｉｔｒｙ）と、当該処理回路に電気的に接続された（当該処理回路からアクセス可能な）記憶装置（Ｓｔｏｒａｇｅ）とを備える。処理回路は、専用のハードウェアおよびプログラム実行部の少なくとも一方を含み、記憶装置を用いて処理を実行する。また、記憶装置は、処理回路がプログラム実行部を含む場合には、当該プログラム実行部により実行されるソフトウェアプログラムを記憶する。 In other words, the image encoding device, the image decoding device, and the like include processing circuitry (Processing Circuitry) and a storage device (Storage) that is electrically connected to the processing circuitry (accessible from the processing circuitry). The processing circuitry includes at least one of dedicated hardware and a program execution unit, and executes processing using the storage device. Further, when the processing circuitry includes a program execution unit, the storage device stores a software program executed by the program execution unit.

ここで、上記各実施の形態の画像符号化装置および画像復号装置などを実現するソフトウェアは、次のようなプログラムである。 Here, the software that realizes the image encoding device, the image decoding device, and the like according to each of the above embodiments is the following program.

すなわち、このプログラムは、コンピュータに、面間予測を用いて、映像を構成する複数の表示対象画像を符号化する画像符号化方法であって、前記複数の表示対象画像とも前記複数の表示対象画像の複数の再構成画像とも異なる画像であり前記面間予測において参照専用として用いられる画像である参照専用画像を取得する取得ステップと、前記参照専用画像を前記面間予測における参照画像として参照して、前記複数の表示対象画像のうち１以上の表示対象画像を符号化する符号化ステップとを含む画像符号化方法を実行させる。 That is, this program causes a computer to execute an image encoding method for encoding, by using inter prediction, a plurality of display target images constituting a video, the method including: an acquisition step of acquiring a reference-only image, which is an image different from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and which is used only for reference in the inter prediction; and an encoding step of encoding one or more display target images among the plurality of display target images by referring to the reference-only image as a reference image in the inter prediction.

また、このプログラムは、コンピュータに、面間予測を用いて、映像を構成する複数の表示対象画像を復号する画像復号方法であって、前記複数の表示対象画像とも前記複数の表示対象画像の複数の再構成画像とも異なる画像であり前記面間予測において参照専用として用いられる画像である参照専用画像を取得する取得ステップと、前記参照専用画像を前記面間予測における参照画像として参照して、前記複数の表示対象画像のうち１以上の表示対象画像を復号する復号ステップとを備える画像復号方法を実行させてもよい。 Further, this program may cause a computer to execute an image decoding method for decoding, by using inter prediction, a plurality of display target images constituting a video, the method including: an acquisition step of acquiring a reference-only image, which is an image different from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and which is used only for reference in the inter prediction; and a decoding step of decoding one or more display target images among the plurality of display target images by referring to the reference-only image as a reference image in the inter prediction.

また、各構成要素は、上述の通り、回路であってもよい。これらの回路は、全体として１つの回路を構成してもよいし、それぞれ別々の回路であってもよい。また、各構成要素は、汎用的なプロセッサで実現されてもよいし、専用のプロセッサで実現されてもよい。 Each component may be a circuit as described above. These circuits may constitute one circuit as a whole, or may be separate circuits. Each component may be realized by a general-purpose processor or a dedicated processor.

また、特定の構成要素が実行する処理を別の構成要素が実行してもよい。また、処理を実行する順番が変更されてもよいし、複数の処理が並行して実行されてもよい。また、画像符号化復号装置が、画像符号化装置および画像復号装置を備えていてもよい。 Moreover, another component may perform the process which a specific component performs. In addition, the order in which the processes are executed may be changed, or a plurality of processes may be executed in parallel. Further, the image encoding / decoding device may include an image encoding device and an image decoding device.

また、各実施の形態において説明された処理は、単一の装置（システム）を用いて実行される集中処理として実行されてもよいし、あるいは、複数の装置を用いて実行される分散処理として実行されてもよい。また、上記のプログラムを実行するコンピュータは、単数であってもよいし、複数であってもよい。すなわち、プログラムの実行において、集中処理が行われてもよいし、分散処理が行われてもよい。 Further, the processing described in each embodiment may be executed as centralized processing using a single device (system), or as distributed processing using a plurality of devices. The above program may be executed by a single computer or by a plurality of computers. That is, in executing the program, centralized processing may be performed, or distributed processing may be performed.

以上、一つまたは複数の態様に係る画像符号化装置および画像復号装置について、実施の形態に基づいて説明したが、本発明は、この実施の形態に限定されるものではない。本発明の趣旨を逸脱しない限り、当業者が思いつく各種変形を本実施の形態に施したものや、異なる実施の形態における構成要素を組み合わせて構築される形態も、一つまたは複数の態様の範囲内に含まれてもよい。 As described above, the image encoding device and the image decoding device according to one or a plurality of aspects have been described based on the embodiment, but the present invention is not limited to this embodiment. Unless it deviates from the gist of the present invention, various modifications conceived by those skilled in the art have been made in this embodiment, and forms constructed by combining components in different embodiments are also within the scope of one or more aspects. May be included.

（実施の形態４） (Embodiment 4)
上記各実施の形態で示した動画像符号化方法（画像符号化方法）または動画像復号化方法（画像復号方法）の構成を実現するためのプログラムを記憶メディアに記録することにより、上記各実施の形態で示した処理を独立したコンピュータシステムにおいて簡単に実施することが可能となる。記憶メディアは、磁気ディスク、光ディスク、光磁気ディスク、ＩＣカード、半導体メモリ等、プログラムを記録できるものであればよい。 By recording, on a storage medium, a program for realizing the configuration of the moving image encoding method (image encoding method) or the moving image decoding method (image decoding method) shown in each of the above embodiments, the processing shown in each of the above embodiments can be easily carried out on an independent computer system. The storage medium may be any medium that can record a program, such as a magnetic disk, an optical disk, a magneto-optical disk, an IC card, or a semiconductor memory.

さらにここで、上記各実施の形態で示した動画像符号化方法（画像符号化方法）や動画像復号化方法（画像復号方法）の応用例とそれを用いたシステムを説明する。当該システムは、画像符号化方法を用いた画像符号化装置、及び画像復号方法を用いた画像復号装置からなる画像符号化復号装置を有することを特徴とする。システムにおける他の構成について、場合に応じて適切に変更することができる。 Furthermore, application examples of the moving picture coding method (picture coding method) and the moving picture decoding method (picture decoding method) shown in the above embodiments and a system using the same will be described. The system has an image encoding / decoding device including an image encoding device using an image encoding method and an image decoding device using an image decoding method. Other configurations in the system can be appropriately changed according to circumstances.

図２３は、コンテンツ配信サービスを実現するコンテンツ供給システムex１００の全体構成を示す図である。通信サービスの提供エリアを所望の大きさに分割し、各セル内にそれぞれ固定無線局である基地局ex１０６、ex１０７、ex１０８、ex１０９、ex１１０が設置されている。 FIG. 23 is a diagram illustrating an overall configuration of a content supply system ex100 that implements a content distribution service. A communication service providing area is divided into desired sizes, and base stations ex106, ex107, ex108, ex109, and ex110, which are fixed wireless stations, are installed in each cell.

このコンテンツ供給システムex１００は、インターネットex１０１にインターネットサービスプロバイダex１０２および電話網ex１０４、および基地局ex１０６からex１１０を介して、コンピュータex１１１、ＰＤＡ（Personal Digital Assistant）ex１１２、カメラex１１３、携帯電話ex１１４、ゲーム機ex１１５などの各機器が接続される。 In this content supply system ex100, devices such as a computer ex111, a PDA (Personal Digital Assistant) ex112, a camera ex113, a mobile phone ex114, and a game machine ex115 are connected to the Internet ex101 via an Internet service provider ex102, a telephone network ex104, and the base stations ex106 to ex110.

しかし、コンテンツ供給システムex１００は図２３のような構成に限定されず、いずれかの要素を組合せて接続するようにしてもよい。また、固定無線局である基地局ex１０６からex１１０を介さずに、各機器が電話網ex１０４に直接接続されてもよい。また、各機器が近距離無線等を介して直接相互に接続されていてもよい。 However, the content supply system ex100 is not limited to the configuration shown in FIG. 23, and any of the elements may be combined and connected. In addition, each device may be directly connected to the telephone network ex104 without going through the base stations ex106 to ex110, which are fixed wireless stations. The devices may also be directly connected to each other via short-range wireless communication or the like.

カメラex１１３はデジタルビデオカメラ等の動画撮影が可能な機器であり、カメラex１１６はデジタルカメラ等の静止画撮影、動画撮影が可能な機器である。また、携帯電話ex１１４は、ＧＳＭ（登録商標）（Global System for Mobile Communications）方式、ＣＤＭＡ（Code Division Multiple Access）方式、Ｗ−ＣＤＭＡ（Wideband-Code Division Multiple Access）方式、若しくはＬＴＥ（Long Term Evolution）方式、ＨＳＰＡ(High Speed Packet Access)の携帯電話機、またはＰＨＳ（Personal Handyphone System）等であり、いずれでも構わない。 The camera ex113 is a device capable of shooting a moving image such as a digital video camera, and the camera ex116 is a device capable of shooting a still image and moving image such as a digital camera. The mobile phone ex114 is a GSM (registered trademark) (Global System for Mobile Communications) system, a CDMA (Code Division Multiple Access) system, a W-CDMA (Wideband-Code Division Multiple Access) system, or LTE (Long Term Evolution). It may be a system, a HSPA (High Speed Packet Access) mobile phone, a PHS (Personal Handyphone System), or the like.

コンテンツ供給システムex１００では、カメラex１１３等が基地局ex１０９、電話網ex１０４を通じてストリーミングサーバex１０３に接続されることで、ライブ配信等が可能になる。ライブ配信では、ユーザがカメラex１１３を用いて撮影するコンテンツ（例えば、音楽ライブの映像等）に対して上記各実施の形態で説明したように符号化処理を行い（即ち、本発明の一態様に係る画像符号化装置として機能する）、ストリーミングサーバex１０３に送信する。一方、ストリーミングサーバex１０３は要求のあったクライアントに対して送信されたコンテンツデータをストリーム配信する。クライアントとしては、上記符号化処理されたデータを復号化することが可能な、コンピュータex１１１、ＰＤＡex１１２、カメラex１１３、携帯電話ex１１４、ゲーム機ex１１５等がある。配信されたデータを受信した各機器では、受信したデータを復号化処理して再生する（即ち、本発明の一態様に係る画像復号装置として機能する）。 In the content supply system ex100, the camera ex113 and the like are connected to the streaming server ex103 through the base station ex109 and the telephone network ex104, thereby enabling live distribution and the like. In live distribution, content that is shot by a user using the camera ex113 (for example, music live video) is encoded as described in each of the above embodiments (that is, in one aspect of the present invention). Functions as an image encoding device), and transmits it to the streaming server ex103. On the other hand, the streaming server ex103 stream-distributes the content data transmitted to the requested client. Examples of the client include a computer ex111, a PDA ex112, a camera ex113, a mobile phone ex114, and a game machine ex115 that can decode the encoded data. Each device that receives the distributed data decodes the received data and reproduces it (that is, functions as an image decoding device according to one embodiment of the present invention).

なお、撮影したデータの符号化処理はカメラex１１３で行っても、データの送信処理をするストリーミングサーバex１０３で行ってもよいし、互いに分担して行ってもよい。同様に配信されたデータの復号化処理はクライアントで行っても、ストリーミングサーバex１０３で行ってもよいし、互いに分担して行ってもよい。また、カメラex１１３に限らず、カメラex１１６で撮影した静止画像および／または動画像データを、コンピュータex１１１を介してストリーミングサーバex１０３に送信してもよい。この場合の符号化処理はカメラex１１６、コンピュータex１１１、ストリーミングサーバex１０３のいずれで行ってもよいし、互いに分担して行ってもよい。 Note that the captured data may be encoded by the camera ex113, by the streaming server ex103 that performs the data transmission processing, or by both in a shared manner. Similarly, the distributed data may be decoded by the client, by the streaming server ex103, or by both in a shared manner. In addition to the camera ex113, still image and / or moving image data captured by the camera ex116 may be transmitted to the streaming server ex103 via the computer ex111. The encoding process in this case may be performed by any of the camera ex116, the computer ex111, and the streaming server ex103, or may be shared among them.

また、これら符号化・復号化処理は、一般的にコンピュータex１１１や各機器が有するＬＳＩex５００において処理する。ＬＳＩex５００は、ワンチップであっても複数チップからなる構成であってもよい。なお、動画像符号化・復号化用のソフトウェアをコンピュータex１１１等で読み取り可能な何らかの記録メディア（ＣＤ−ＲＯＭ、フレキシブルディスク、ハードディスクなど）に組み込み、そのソフトウェアを用いて符号化・復号化処理を行ってもよい。さらに、携帯電話ex１１４がカメラ付きである場合には、そのカメラで取得した動画データを送信してもよい。このときの動画データは携帯電話ex１１４が有するＬＳＩex５００で符号化処理されたデータである。 These encoding / decoding processes are generally performed by the computer ex111 and the LSI ex500 included in each device. The LSI ex500 may be configured as a single chip or a plurality of chips. It should be noted that moving image encoding / decoding software is incorporated into some recording medium (CD-ROM, flexible disk, hard disk, etc.) that can be read by the computer ex111 and the like, and encoding / decoding processing is performed using the software. May be. Furthermore, when the mobile phone ex114 is equipped with a camera, moving image data acquired by the camera may be transmitted. The moving image data at this time is data encoded by the LSI ex500 included in the mobile phone ex114.

また、ストリーミングサーバex１０３は複数のサーバや複数のコンピュータであって、データを分散して処理したり記録したり配信するものであってもよい。 The streaming server ex103 may be a plurality of servers or a plurality of computers, and may process, record, and distribute data in a distributed manner.

以上のようにして、コンテンツ供給システムex１００では、符号化されたデータをクライアントが受信して再生することができる。このようにコンテンツ供給システムex１００では、ユーザが送信した情報をリアルタイムでクライアントが受信して復号化し、再生することができ、特別な権利や設備を有さないユーザでも個人放送を実現できる。 As described above, in the content supply system ex100, the client can receive and reproduce the encoded data. Thus, in the content supply system ex100, the client can receive, decode, and reproduce the information transmitted by the user in real time, so that even a user who does not have special rights or equipment can realize personal broadcasting.

なお、コンテンツ供給システムex１００の例に限らず、図２４に示すように、デジタル放送用システムex２００にも、上記各実施の形態の少なくとも動画像符号化装置（画像符号化装置）または動画像復号化装置（画像復号装置）のいずれかを組み込むことができる。具体的には、放送局ex２０１では映像データに音楽データなどが多重化された多重化データが電波を介して通信または衛星ex２０２に伝送される。この映像データは上記各実施の形態で説明した動画像符号化方法により符号化されたデータである（即ち、本発明の一態様に係る画像符号化装置によって符号化されたデータである）。これを受けた放送衛星ex２０２は、放送用の電波を発信し、この電波を衛星放送の受信が可能な家庭のアンテナex２０４が受信する。受信した多重化データを、テレビ（受信機）ex３００またはセットトップボックス（ＳＴＢ）ex２１７等の装置が復号化して再生する（即ち、本発明の一態様に係る画像復号装置として機能する）。 In addition to the example of the content supply system ex100, as shown in FIG. 24, the digital broadcasting system ex200 also includes at least the moving image encoding device (image encoding device) or the moving image decoding according to each of the above embodiments. Any of the devices (image decoding devices) can be incorporated. Specifically, in the broadcast station ex201, multiplexed data obtained by multiplexing music data and the like on video data is transmitted to a communication or satellite ex202 via radio waves. This video data is data encoded by the moving image encoding method described in each of the above embodiments (that is, data encoded by the image encoding apparatus according to one aspect of the present invention). Receiving this, the broadcasting satellite ex202 transmits a radio wave for broadcasting, and this radio wave is received by a home antenna ex204 capable of receiving satellite broadcasting. The received multiplexed data is decoded and reproduced by an apparatus such as the television (receiver) ex300 or the set top box (STB) ex217 (that is, functions as an image decoding apparatus according to one embodiment of the present invention).

また、ＤＶＤ、ＢＤ等の記録メディアex２１５に記録した多重化データを読み取り復号化する、または記録メディアex２１５に映像信号を符号化し、さらに場合によっては音楽信号と多重化して書き込むリーダ／レコーダex２１８にも上記各実施の形態で示した動画像復号化装置または動画像符号化装置を実装することが可能である。この場合、再生された映像信号はモニタex２１９に表示され、多重化データが記録された記録メディアex２１５により他の装置やシステムにおいて映像信号を再生することができる。また、ケーブルテレビ用のケーブルex２０３または衛星／地上波放送のアンテナex２０４に接続されたセットトップボックスex２１７内に動画像復号化装置を実装し、これをテレビのモニタex２１９で表示してもよい。このときセットトップボックスではなく、テレビ内に動画像復号化装置を組み込んでもよい。 Also, a reader / recorder ex218 that reads and decodes multiplexed data recorded on a recording medium ex215 such as a DVD or a BD, or encodes a video signal on the recording medium ex215 and, in some cases, multiplexes and writes it with a music signal. It is possible to mount the moving picture decoding apparatus or moving picture encoding apparatus described in the above embodiments. In this case, the reproduced video signal is displayed on the monitor ex219, and the video signal can be reproduced in another device or system using the recording medium ex215 on which the multiplexed data is recorded. Alternatively, a moving picture decoding apparatus may be mounted in a set-top box ex217 connected to a cable ex203 for cable television or an antenna ex204 for satellite / terrestrial broadcasting and displayed on the monitor ex219 of the television. At this time, the moving picture decoding apparatus may be incorporated in the television instead of the set top box.

図２５は、上記各実施の形態で説明した動画像復号化方法および動画像符号化方法を用いたテレビ（受信機）ex３００を示す図である。テレビex３００は、上記放送を受信するアンテナex２０４またはケーブルex２０３等を介して映像データに音声データが多重化された多重化データを取得、または出力するチューナex３０１と、受信した多重化データを復調する、または外部に送信する多重化データに変調する変調／復調部ex３０２と、復調した多重化データを映像データと、音声データとに分離する、または信号処理部ex３０６で符号化された映像データ、音声データを多重化する多重／分離部ex３０３を備える。 FIG. 25 is a diagram illustrating a television (receiver) ex300 that uses the video decoding method and the video encoding method described in each of the above embodiments. The television ex300 obtains or outputs multiplexed data in which audio data is multiplexed with video data via the antenna ex204 or the cable ex203 that receives the broadcast, and demodulates the received multiplexed data. Alternatively, the modulation / demodulation unit ex302 that modulates multiplexed data to be transmitted to the outside, and the demodulated multiplexed data is separated into video data and audio data, or the video data and audio data encoded by the signal processing unit ex306 Is provided with a multiplexing / demultiplexing unit ex303.
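
The role of the multiplexing / demultiplexing unit ex303 can be illustrated by the following minimal sketch. The actual multiplexed data format is not defined in this passage; the sketch assumes that each packet is a (timestamp, payload) pair, that multiplexing interleaves packets in timestamp order, and that each packet is tagged with a stream identifier. The function names `multiplex` and `demultiplex` and the tags "video" and "audio" are hypothetical.

```python
def multiplex(video_packets, audio_packets):
    """Tag each (timestamp, payload) packet with its stream and interleave by timestamp."""
    tagged = [("video", ts, data) for ts, data in video_packets]
    tagged += [("audio", ts, data) for ts, data in audio_packets]
    # sorted() is stable, so packets of one stream keep their relative order.
    return sorted(tagged, key=lambda packet: packet[1])


def demultiplex(multiplexed):
    """Separate multiplexed packets back into video packets and audio packets."""
    streams = {"video": [], "audio": []}
    for stream_id, ts, data in multiplexed:
        streams[stream_id].append((ts, data))
    return streams["video"], streams["audio"]


# Hypothetical packets: timestamps in milliseconds, payloads as bytes.
video = [(0, b"v0"), (40, b"v1")]
audio = [(0, b"a0"), (20, b"a1"), (40, b"a2")]
muxed = multiplex(video, audio)
video_out, audio_out = demultiplex(muxed)
```

Demultiplexing the output of `multiplex` returns the original video and audio packet lists, which corresponds to the separation into video data and audio data described above.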

また、テレビex３００は、音声データ、映像データそれぞれを復号化する、またはそれぞれの情報を符号化する音声信号処理部ex３０４、映像信号処理部ex３０５（本発明の一態様に係る画像符号化装置または画像復号装置として機能する）を有する信号処理部ex３０６と、復号化した音声信号を出力するスピーカex３０７、復号化した映像信号を表示するディスプレイ等の表示部ex３０８を有する出力部ex３０９とを有する。さらに、テレビex３００は、ユーザ操作の入力を受け付ける操作入力部ex３１２等を有するインタフェース部ex３１７を有する。さらに、テレビex３００は、各部を統括的に制御する制御部ex３１０、各部に電力を供給する電源回路部ex３１１を有する。インタフェース部ex３１７は、操作入力部ex３１２以外に、リーダ／レコーダex２１８等の外部機器と接続されるブリッジex３１３、ＳＤカード等の記録メディアex２１６を装着可能とするためのスロット部ex３１４、ハードディスク等の外部記録メディアと接続するためのドライバex３１５、電話網と接続するモデムex３１６等を有していてもよい。なお記録メディアex２１６は、格納する不揮発性／揮発性の半導体メモリ素子により電気的に情報の記録を可能としたものである。テレビex３００の各部は同期バスを介して互いに接続されている。 The television ex300 also includes a signal processing unit ex306 having an audio signal processing unit ex304 and a video signal processing unit ex305 (which functions as the image encoding device or the image decoding device according to one aspect of the present invention) that decode audio data and video data, respectively, or encode the respective information, and an output unit ex309 having a speaker ex307 that outputs the decoded audio signal and a display unit ex308, such as a display, that displays the decoded video signal. The television ex300 further includes an interface unit ex317 having an operation input unit ex312 and the like that accepts user operation input. The television ex300 further includes a control unit ex310 that controls each unit in an integrated manner, and a power supply circuit unit ex311 that supplies power to each unit. In addition to the operation input unit ex312, the interface unit ex317 may include a bridge ex313 connected to an external device such as the reader / recorder ex218, a slot unit ex314 that allows a recording medium ex216 such as an SD card to be attached, a driver ex315 for connecting to an external recording medium such as a hard disk, a modem ex316 for connecting to a telephone network, and the like. Note that the recording medium ex216 can electrically record information using a nonvolatile / volatile semiconductor memory element that it houses. The units of the television ex300 are connected to each other via a synchronous bus.

まず、テレビex３００がアンテナex２０４等により外部から取得した多重化データを復号化し、再生する構成について説明する。テレビex３００は、リモートコントローラex２２０等からのユーザ操作を受け、ＣＰＵ等を有する制御部ex３１０の制御に基づいて、変調／復調部ex３０２で復調した多重化データを多重／分離部ex３０３で分離する。さらにテレビex３００は、分離した音声データを音声信号処理部ex３０４で復号化し、分離した映像データを映像信号処理部ex３０５で上記各実施の形態で説明した復号化方法を用いて復号化する。復号化した音声信号、映像信号は、それぞれ出力部ex３０９から外部に向けて出力される。出力する際には、音声信号と映像信号が同期して再生するよう、バッファex３１８、ex３１９等に一旦これらの信号を蓄積するとよい。また、テレビex３００は、放送等からではなく、磁気／光ディスク、ＳＤカード等の記録メディアex２１５、ex２１６から多重化データを読み出してもよい。次に、テレビex３００が音声信号や映像信号を符号化し、外部に送信または記録メディア等に書き込む構成について説明する。テレビex３００は、リモートコントローラex２２０等からのユーザ操作を受け、制御部ex３１０の制御に基づいて、音声信号処理部ex３０４で音声信号を符号化し、映像信号処理部ex３０５で映像信号を上記各実施の形態で説明した符号化方法を用いて符号化する。符号化した音声信号、映像信号は多重／分離部ex３０３で多重化され外部に出力される。多重化する際には、音声信号と映像信号が同期するように、バッファex３２０、ex３２１等に一旦これらの信号を蓄積するとよい。なお、バッファex３１８、ex３１９、ex３２０、ex３２１は図示しているように複数備えていてもよいし、１つ以上のバッファを共有する構成であってもよい。さらに、図示している以外に、例えば変調／復調部ex３０２や多重／分離部ex３０３の間等でもシステムのオーバフロー、アンダーフローを避ける緩衝材としてバッファにデータを蓄積することとしてもよい。 First, a configuration in which the television ex300 decodes and reproduces multiplexed data acquired from the outside by the antenna ex204 or the like will be described. The television ex300 receives a user operation from the remote controller ex220 or the like and, under the control of the control unit ex310 having a CPU or the like, separates the multiplexed data demodulated by the modulation / demodulation unit ex302 using the multiplexing / demultiplexing unit ex303. Furthermore, the television ex300 decodes the separated audio data with the audio signal processing unit ex304, and decodes the separated video data with the video signal processing unit ex305 using the decoding method described in each of the above embodiments. The decoded audio signal and video signal are each output from the output unit ex309 to the outside. At the time of output, these signals may be temporarily stored in the buffers ex318, ex319, and the like so that the audio signal and the video signal are reproduced in synchronization. The television ex300 may also read multiplexed data not from a broadcast or the like but from the recording media ex215 and ex216, such as a magnetic / optical disk and an SD card. Next, a configuration in which the television ex300 encodes an audio signal and a video signal and transmits them to the outside or writes them to a recording medium or the like will be described. The television ex300 receives a user operation from the remote controller ex220 or the like and, under the control of the control unit ex310, encodes the audio signal with the audio signal processing unit ex304 and encodes the video signal with the video signal processing unit ex305 using the encoding method described in each of the above embodiments. The encoded audio signal and video signal are multiplexed by the multiplexing / demultiplexing unit ex303 and output to the outside. When multiplexing, these signals may be temporarily stored in the buffers ex320, ex321, and the like so that the audio signal and the video signal are synchronized. Note that the buffers ex318, ex319, ex320, and ex321 may be provided as separate buffers as illustrated, or one or more buffers may be shared. Furthermore, in addition to what is illustrated, data may be stored in a buffer that acts as a cushion against system overflow and underflow, for example, between the modulation / demodulation unit ex302 and the multiplexing / demultiplexing unit ex303.
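
The temporary storage of signals in buffers for synchronized reproduction, as described above for the buffers ex318 and ex319, can be sketched as follows. The synchronization mechanism itself is not specified in this description; the sketch assumes that each decoded unit carries a presentation timestamp and that units are released once a playback clock reaches that timestamp. The function name `release_synchronized` and the timestamp values are hypothetical.

```python
from collections import deque


def release_synchronized(audio_buffer, video_buffer, clock):
    """Pop and return all buffered units whose timestamp is due at `clock`.

    Each buffer holds (timestamp, unit) pairs in increasing timestamp order.
    """
    released = []
    for name, buffer in (("audio", audio_buffer), ("video", video_buffer)):
        while buffer and buffer[0][0] <= clock:
            ts, unit = buffer.popleft()
            released.append((ts, name, unit))
    # Present the released units in timestamp order.
    return sorted(released, key=lambda item: item[0])


# Hypothetical buffered units with timestamps in milliseconds.
audio_buf = deque([(0, "a0"), (20, "a1"), (40, "a2")])
video_buf = deque([(0, "v0"), (40, "v1")])
due = release_synchronized(audio_buf, video_buf, clock=20)
```

Units that are not yet due remain in their buffers, so a video frame and the audio belonging to the same instant are output together even if they were decoded at different times.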

また、テレビex３００は、放送等や記録メディア等から音声データ、映像データを取得する以外に、マイクやカメラのＡＶ入力を受け付ける構成を備え、それらから取得したデータに対して符号化処理を行ってもよい。なお、ここではテレビex３００は上記の符号化処理、多重化、および外部出力ができる構成として説明したが、これらの処理を行うことはできず、上記受信、復号化処理、外部出力のみが可能な構成であってもよい。 In addition to acquiring audio data and video data from broadcasts, recording media, and the like, the television ex300 has a configuration for receiving AV input of a microphone and a camera, and performs encoding processing on the data acquired from them. Also good. Here, the television ex300 has been described as a configuration capable of the above-described encoding processing, multiplexing, and external output, but these processing cannot be performed, and only the above-described reception, decoding processing, and external output are possible. It may be a configuration.

また、リーダ／レコーダex２１８で記録メディアから多重化データを読み出す、または書き込む場合には、上記復号化処理または符号化処理はテレビex３００、リーダ／レコーダex２１８のいずれで行ってもよいし、テレビex３００とリーダ／レコーダex２１８が互いに分担して行ってもよい。 In addition, when the reader / recorder ex218 reads multiplexed data from or writes multiplexed data to a recording medium, the decoding process or the encoding process may be performed by either the television ex300 or the reader / recorder ex218, or the television ex300 and the reader / recorder ex218 may share the processing.

一例として、光ディスクからデータの読み込みまたは書き込みをする場合の情報再生／記録部ex４００の構成を図２６に示す。情報再生／記録部ex４００は、以下に説明する要素ex４０１、ex４０２、ex４０３、ex４０４、ex４０５、ex４０６、ex４０７を備える。光ヘッドex４０１は、光ディスクである記録メディアex２１５の記録面にレーザスポットを照射して情報を書き込み、記録メディアex２１５の記録面からの反射光を検出して情報を読み込む。変調記録部ex４０２は、光ヘッドex４０１に内蔵された半導体レーザを電気的に駆動し記録データに応じてレーザ光の変調を行う。再生復調部ex４０３は、光ヘッドex４０１に内蔵されたフォトディテクタにより記録面からの反射光を電気的に検出した再生信号を増幅し、記録メディアex２１５に記録された信号成分を分離して復調し、必要な情報を再生する。バッファex４０４は、記録メディアex２１５に記録するための情報および記録メディアex２１５から再生した情報を一時的に保持する。ディスクモータex４０５は記録メディアex２１５を回転させる。サーボ制御部ex４０６は、ディスクモータex４０５の回転駆動を制御しながら光ヘッドex４０１を所定の情報トラックに移動させ、レーザスポットの追従処理を行う。システム制御部ex４０７は、情報再生／記録部ex４００全体の制御を行う。上記の読み出しや書き込みの処理はシステム制御部ex４０７が、バッファex４０４に保持された各種情報を利用し、また必要に応じて新たな情報の生成・追加を行うと共に、変調記録部ex４０２、再生復調部ex４０３、サーボ制御部ex４０６を協調動作させながら、光ヘッドex４０１を通して、情報の記録再生を行うことにより実現される。システム制御部ex４０７は例えばマイクロプロセッサで構成され、読み出し書き込みのプログラムを実行することでそれらの処理を実行する。 As an example, FIG. 26 shows a configuration of an information reproducing / recording unit ex400 when data is read from or written to an optical disk. The information reproducing / recording unit ex400 includes elements ex401, ex402, ex403, ex404, ex405, ex406, and ex407 described below. The optical head ex401 writes information by irradiating a laser spot onto the recording surface of the recording medium ex215, which is an optical disk, and reads information by detecting reflected light from the recording surface of the recording medium ex215. The modulation recording unit ex402 electrically drives a semiconductor laser built into the optical head ex401 and modulates the laser beam according to the recording data. The reproduction demodulation unit ex403 amplifies the reproduction signal obtained by electrically detecting the reflected light from the recording surface with a photodetector built into the optical head ex401, separates and demodulates the signal component recorded on the recording medium ex215, and reproduces the necessary information. The buffer ex404 temporarily holds information to be recorded on the recording medium ex215 and information reproduced from the recording medium ex215. The disk motor ex405 rotates the recording medium ex215. The servo control unit ex406 moves the optical head ex401 to a predetermined information track while controlling the rotational drive of the disk motor ex405, and performs a laser spot tracking process. The system control unit ex407 controls the entire information reproducing / recording unit ex400. The reading and writing processes described above are realized by the system control unit ex407 using various types of information held in the buffer ex404, generating and adding new information as necessary, and recording and reproducing information through the optical head ex401 while operating the modulation recording unit ex402, the reproduction demodulation unit ex403, and the servo control unit ex406 in a coordinated manner. The system control unit ex407 includes, for example, a microprocessor, and executes these processes by executing a read / write program.

以上では、光ヘッドex４０１はレーザスポットを照射するとして説明したが、近接場光を用いてより高密度な記録を行う構成であってもよい。 In the above, the optical head ex401 has been described as irradiating a laser spot. However, the optical head ex401 may be configured to perform higher-density recording using near-field light.

図２７に光ディスクである記録メディアex２１５の模式図を示す。記録メディアex２１５の記録面には案内溝（グルーブ）がスパイラル状に形成され、情報トラックex２３０には、予めグルーブの形状の変化によってディスク上の絶対位置を示す番地情報が記録されている。この番地情報はデータを記録する単位である記録ブロックex２３１の位置を特定するための情報を含み、記録や再生を行う装置において情報トラックex２３０を再生し番地情報を読み取ることで記録ブロックを特定することができる。また、記録メディアex２１５は、データ記録領域ex２３３、内周領域ex２３２、外周領域ex２３４を含んでいる。ユーザデータを記録するために用いる領域がデータ記録領域ex２３３であり、データ記録領域ex２３３より内周または外周に配置されている内周領域ex２３２と外周領域ex２３４は、ユーザデータの記録以外の特定用途に用いられる。情報再生／記録部ex４００は、このような記録メディアex２１５のデータ記録領域ex２３３に対して、符号化された音声データ、映像データまたはそれらのデータを多重化した多重化データの読み書きを行う。 FIG. 27 shows a schematic diagram of a recording medium ex215 that is an optical disk. Guide grooves (grooves) are formed in a spiral shape on the recording surface of the recording medium ex215, and address information indicating the absolute position on the disc is recorded in advance on the information track ex230 by changing the shape of the groove. This address information includes information for specifying the position of the recording block ex231 that is a unit for recording data, and the recording block is specified by reproducing the information track ex230 and reading the address information in a recording or reproducing apparatus. Can do. Further, the recording medium ex215 includes a data recording area ex233, an inner peripheral area ex232, and an outer peripheral area ex234. The area used for recording user data is the data recording area ex233, and the inner circumference area ex232 and the outer circumference area ex234 arranged on the inner or outer circumference of the data recording area ex233 are used for specific purposes other than user data recording. Used. The information reproducing / recording unit ex400 reads / writes encoded audio data, video data, or multiplexed data obtained by multiplexing these data with respect to the data recording area ex233 of the recording medium ex215.

以上では、１層のＤＶＤ、ＢＤ等の光ディスクを例に挙げ説明したが、これらに限ったものではなく、多層構造であって表面以外にも記録可能な光ディスクであってもよい。また、ディスクの同じ場所にさまざまな異なる波長の色の光を用いて情報を記録したり、さまざまな角度から異なる情報の層を記録したりなど、多次元的な記録／再生を行う構造の光ディスクであってもよい。 In the above description, an optical disk such as a single-layer DVD or BD has been described as an example. However, the present invention is not limited to these, and an optical disk having a multilayer structure and capable of recording other than the surface may be used. Also, an optical disc with a multi-dimensional recording / reproducing structure, such as recording information using light of different wavelengths in the same place on the disc, or recording different layers of information from various angles. It may be.

また、デジタル放送用システムex２００において、アンテナex２０５を有する車ex２１０で衛星ex２０２等からデータを受信し、車ex２１０が有するカーナビゲーションex２１１等の表示装置に動画を再生することも可能である。なお、カーナビゲーションex２１１の構成は例えば図２５に示す構成のうち、ＧＰＳ受信部を加えた構成が考えられ、同様なことがコンピュータex１１１や携帯電話ex１１４等でも考えられる。 In the digital broadcasting system ex200, a car ex210 having an antenna ex205 can also receive data from the satellite ex202 or the like and reproduce a moving image on a display device such as the car navigation system ex211 in the car ex210. The configuration of the car navigation system ex211 may be, for example, the configuration shown in FIG. 25 with a GPS receiving unit added, and the same applies to the computer ex111, the mobile phone ex114, and the like.

図２８Ａは、上記実施の形態で説明した動画像復号化方法および動画像符号化方法を用いた携帯電話ex１１４を示す図である。携帯電話ex１１４は、基地局ex１１０との間で電波を送受信するためのアンテナex３５０、映像、静止画を撮ることが可能なカメラ部ex３６５、カメラ部ex３６５で撮像した映像、アンテナex３５０で受信した映像等が復号化されたデータを表示する液晶ディスプレイ等の表示部ex３５８を備える。携帯電話ex１１４は、さらに、操作キー部ex３６６を有する本体部、音声を出力するためのスピーカ等である音声出力部ex３５７、音声を入力するためのマイク等である音声入力部ex３５６、撮影した映像、静止画、録音した音声、または受信した映像、静止画、メール等の符号化されたデータもしくは復号化されたデータを保存するメモリ部ex３６７、又は同様にデータを保存する記録メディアとのインタフェース部であるスロット部ex３６４を備える。 FIG. 28A is a diagram illustrating the mobile phone ex114 using the video decoding method and the video encoding method described in the above embodiment. The mobile phone ex114 includes an antenna ex350 for transmitting and receiving radio waves to and from the base station ex110, a camera unit ex365 capable of capturing video and still images, and a display unit ex358, such as a liquid crystal display, that displays decoded data such as video captured by the camera unit ex365 and video received by the antenna ex350. The mobile phone ex114 further includes a main body unit having an operation key unit ex366, an audio output unit ex357 such as a speaker for outputting audio, an audio input unit ex356 such as a microphone for inputting audio, a memory unit ex367 that stores encoded or decoded data such as captured video, still images, recorded audio, received video, still images, and mail, and a slot unit ex364 that is an interface unit with a recording medium that likewise stores data.

さらに、携帯電話ex１１４の構成例について、図２８Ｂを用いて説明する。携帯電話ex１１４は、表示部ex３５８及び操作キー部ex３６６を備えた本体部の各部を統括的に制御する主制御部ex３６０に対して、電源回路部ex３６１、操作入力制御部ex３６２、映像信号処理部ex３５５、カメラインタフェース部ex３６３、ＬＣＤ（Liquid Crystal Display）制御部ex３５９、変調／復調部ex３５２、多重／分離部ex３５３、音声信号処理部ex３５４、スロット部ex３６４、メモリ部ex３６７がバスex３７０を介して互いに接続されている。 Furthermore, a configuration example of the mobile phone ex114 will be described with reference to FIG. 28B. In the mobile phone ex114, a power supply circuit unit ex361, an operation input control unit ex362, a video signal processing unit ex355, a camera interface unit ex363, an LCD (Liquid Crystal Display) control unit ex359, a modulation/demodulation unit ex352, a multiplexing/demultiplexing unit ex353, an audio signal processing unit ex354, the slot unit ex364, and the memory unit ex367 are connected via a bus ex370 to each other and to a main control unit ex360 that comprehensively controls each unit of the main body including the display unit ex358 and the operation key unit ex366.

電源回路部ex３６１は、ユーザの操作により終話及び電源キーがオン状態にされると、バッテリパックから各部に対して電力を供給することにより携帯電話ex１１４を動作可能な状態に起動する。 When the end call and the power key are turned on by a user operation, the power supply circuit ex361 starts up the mobile phone ex114 in an operable state by supplying power from the battery pack to each unit.

携帯電話ex１１４は、ＣＰＵ、ＲＯＭ、ＲＡＭ等を有する主制御部ex３６０の制御に基づいて、音声通話モード時に音声入力部ex３５６で収音した音声信号を音声信号処理部ex３５４でデジタル音声信号に変換し、これを変調／復調部ex３５２でスペクトラム拡散処理し、送信／受信部ex３５１でデジタルアナログ変換処理および周波数変換処理を施した後にアンテナex３５０を介して送信する。また携帯電話ex１１４は、音声通話モード時にアンテナex３５０を介して受信した受信データを増幅して周波数変換処理およびアナログデジタル変換処理を施し、変調／復調部ex３５２でスペクトラム逆拡散処理し、音声信号処理部ex３５４でアナログ音声信号に変換した後、これを音声出力部ex３５７から出力する。 The cellular phone ex114 converts the audio signal collected by the audio input unit ex356 in the voice call mode into a digital audio signal by the audio signal processing unit ex354 based on the control of the main control unit ex360 having a CPU, a ROM, a RAM, and the like. Then, this is subjected to spectrum spread processing by the modulation / demodulation unit ex352, digital-analog conversion processing and frequency conversion processing are performed by the transmission / reception unit ex351, and then transmitted via the antenna ex350. The mobile phone ex114 also amplifies the received data received via the antenna ex350 in the voice call mode, performs frequency conversion processing and analog-digital conversion processing, performs spectrum despreading processing by the modulation / demodulation unit ex352, and performs voice signal processing unit After being converted into an analog audio signal by ex354, this is output from the audio output unit ex357.

さらにデータ通信モード時に電子メールを送信する場合、本体部の操作キー部ex３６６等の操作によって入力された電子メールのテキストデータは操作入力制御部ex３６２を介して主制御部ex３６０に送出される。主制御部ex３６０は、テキストデータを変調／復調部ex３５２でスペクトラム拡散処理をし、送信／受信部ex３５１でデジタルアナログ変換処理および周波数変換処理を施した後にアンテナex３５０を介して基地局ex１１０へ送信する。電子メールを受信する場合は、受信したデータに対してこのほぼ逆の処理が行われ、表示部ex３５８に出力される。 Further, when an e-mail is transmitted in the data communication mode, the text data of the e-mail input by operating the operation key unit ex366 or the like of the main body is sent to the main control unit ex360 via the operation input control unit ex362. The main control unit ex360 has the text data subjected to spread spectrum processing in the modulation/demodulation unit ex352 and to digital-analog conversion processing and frequency conversion processing in the transmission/reception unit ex351, and then transmits it to the base station ex110 via the antenna ex350. When an e-mail is received, roughly the reverse of this processing is performed on the received data, and the result is output to the display unit ex358.

データ通信モード時に映像、静止画、または映像と音声を送信する場合、映像信号処理部ex３５５は、カメラ部ex３６５から供給された映像信号を上記各実施の形態で示した動画像符号化方法によって圧縮符号化し（即ち、本発明の一態様に係る画像符号化装置として機能する）、符号化された映像データを多重／分離部ex３５３に送出する。また、音声信号処理部ex３５４は、映像、静止画等をカメラ部ex３６５で撮像中に音声入力部ex３５６で収音した音声信号を符号化し、符号化された音声データを多重／分離部ex３５３に送出する。 When video, still images, or video and audio are transmitted in the data communication mode, the video signal processing unit ex355 compresses and encodes the video signal supplied from the camera unit ex365 by the moving picture encoding method described in the above embodiments (that is, it functions as an image encoding device according to an aspect of the present invention), and sends the encoded video data to the multiplexing/demultiplexing unit ex353. The audio signal processing unit ex354 encodes the audio signal picked up by the audio input unit ex356 while the camera unit ex365 is capturing video, still images, or the like, and sends the encoded audio data to the multiplexing/demultiplexing unit ex353.

多重／分離部ex３５３は、映像信号処理部ex３５５から供給された符号化された映像データと音声信号処理部ex３５４から供給された符号化された音声データを所定の方式で多重化し、その結果得られる多重化データを変調／復調部（変調／復調回路部）ex３５２でスペクトラム拡散処理をし、送信／受信部ex３５１でデジタルアナログ変換処理及び周波数変換処理を施した後にアンテナex３５０を介して送信する。 The multiplexing / demultiplexing unit ex353 multiplexes the encoded video data supplied from the video signal processing unit ex355 and the encoded audio data supplied from the audio signal processing unit ex354 by a predetermined method, and is obtained as a result. The multiplexed data is subjected to spread spectrum processing by the modulation / demodulation unit (modulation / demodulation circuit unit) ex352, digital-analog conversion processing and frequency conversion processing by the transmission / reception unit ex351, and then transmitted via the antenna ex350.

データ通信モード時にホームページ等にリンクされた動画像ファイルのデータを受信する場合、または映像およびもしくは音声が添付された電子メールを受信する場合、アンテナex３５０を介して受信された多重化データを復号化するために、多重／分離部ex３５３は、多重化データを分離することにより映像データのビットストリームと音声データのビットストリームとに分け、同期バスex３７０を介して符号化された映像データを映像信号処理部ex３５５に供給するとともに、符号化された音声データを音声信号処理部ex３５４に供給する。映像信号処理部ex３５５は、上記各実施の形態で示した動画像符号化方法に対応した動画像復号化方法によって復号化することにより映像信号を復号し（即ち、本発明の一態様に係る画像復号装置として機能する）、ＬＣＤ制御部ex３５９を介して表示部ex３５８から、例えばホームページにリンクされた動画像ファイルに含まれる映像、静止画が表示される。また音声信号処理部ex３５４は、音声信号を復号し、音声出力部ex３５７から音声が出力される。 When data of a moving image file linked to a web page or the like is received in the data communication mode, or when an e-mail with video and/or audio attached is received, the multiplexing/demultiplexing unit ex353, in order to decode the multiplexed data received via the antenna ex350, demultiplexes the multiplexed data into a bit stream of video data and a bit stream of audio data, supplies the encoded video data to the video signal processing unit ex355 via the synchronization bus ex370, and supplies the encoded audio data to the audio signal processing unit ex354. The video signal processing unit ex355 decodes the video signal by a moving picture decoding method corresponding to the moving picture encoding method described in the above embodiments (that is, it functions as an image decoding device according to an aspect of the present invention), and the video and still images included in, for example, the moving image file linked to the web page are displayed on the display unit ex358 via the LCD control unit ex359. The audio signal processing unit ex354 decodes the audio signal, and the audio is output from the audio output unit ex357.

また、上記携帯電話ex１１４等の端末は、テレビex３００と同様に、符号化器・復号化器を両方持つ送受信型端末の他に、符号化器のみの送信端末、復号化器のみの受信端末という３通りの実装形式が考えられる。さらに、デジタル放送用システムex２００において、映像データに音楽データなどが多重化された多重化データを受信、送信するとして説明したが、音声データ以外に映像に関連する文字データなどが多重化されたデータであってもよいし、多重化データではなく映像データ自体であってもよい。 As with the television ex300, a terminal such as the mobile phone ex114 can be implemented in three forms: a transmission/reception terminal having both an encoder and a decoder, a transmission terminal having only an encoder, and a reception terminal having only a decoder. Furthermore, although the digital broadcasting system ex200 has been described as receiving and transmitting multiplexed data in which music data or the like is multiplexed with video data, the data may instead be data in which character data or the like related to the video is multiplexed in addition to the audio data, or may be the video data itself rather than multiplexed data.

このように、上記各実施の形態で示した動画像符号化方法あるいは動画像復号化方法を上述したいずれの機器・システムに用いることは可能であり、そうすることで、上記各実施の形態で説明した効果を得ることができる。 As described above, the moving picture encoding method or the moving picture decoding method shown in each of the above embodiments can be used in any of the above-described devices / systems. The described effect can be obtained.

また、本発明はかかる上記実施の形態に限定されるものではなく、本発明の範囲を逸脱することなく種々の変形または修正が可能である。 Further, the present invention is not limited to the above-described embodiment, and various modifications or corrections can be made without departing from the scope of the present invention.

（実施の形態５） (Embodiment 5)
上記各実施の形態で示した動画像符号化方法または装置と、ＭＰＥＧ−２、ＭＰＥＧ４−ＡＶＣ、ＶＣ−１など異なる規格に準拠した動画像符号化方法または装置とを、必要に応じて適宜切替えることにより、映像データを生成することも可能である。 Video data can also be generated by switching, as necessary, between the moving picture encoding method or apparatus described in the above embodiments and a moving picture encoding method or apparatus compliant with a different standard such as MPEG-2, MPEG4-AVC, or VC-1.

ここで、それぞれ異なる規格に準拠する複数の映像データを生成した場合、復号する際に、それぞれの規格に対応した復号方法を選択する必要がある。しかしながら、復号する映像データが、どの規格に準拠するものであるか識別できないため、適切な復号方法を選択することができないという課題を生じる。 Here, when a plurality of video data compliant with different standards are generated, it is necessary to select a decoding method corresponding to each standard when decoding. However, since it is impossible to identify which standard the video data to be decoded complies with, there arises a problem that an appropriate decoding method cannot be selected.

この課題を解決するために、映像データに音声データなどを多重化した多重化データは、映像データがどの規格に準拠するものであるかを示す識別情報を含む構成とする。上記各実施の形態で示す動画像符号化方法または装置によって生成された映像データを含む多重化データの具体的な構成を以下説明する。多重化データは、ＭＰＥＧ−２トランスポートストリーム形式のデジタルストリームである。 In order to solve this problem, multiplexed data obtained by multiplexing audio data and the like on video data is configured to include identification information indicating which standard the video data conforms to. A specific configuration of multiplexed data including video data generated by the moving picture encoding method or apparatus shown in the above embodiments will be described below. The multiplexed data is a digital stream in the MPEG-2 transport stream format.

図２９は、多重化データの構成を示す図である。図２９に示すように多重化データは、ビデオストリーム、オーディオストリーム、プレゼンテーショングラフィックスストリーム（ＰＧ）、インタラクティブグラフィックスストリームのうち、１つ以上を多重化することで得られる。ビデオストリームは映画の主映像および副映像を、オーディオストリーム（ＩＧ）は映画の主音声部分とその主音声とミキシングする副音声を、プレゼンテーショングラフィックスストリームは、映画の字幕をそれぞれ示している。ここで主映像とは画面に表示される通常の映像を示し、副映像とは主映像の中に小さな画面で表示する映像のことである。また、インタラクティブグラフィックスストリームは、画面上にＧＵＩ部品を配置することにより作成される対話画面を示している。ビデオストリームは、上記各実施の形態で示した動画像符号化方法または装置、従来のＭＰＥＧ−２、ＭＰＥＧ４−ＡＶＣ、ＶＣ−１などの規格に準拠した動画像符号化方法または装置によって符号化されている。オーディオストリームは、ドルビーＡＣ−３、ＤｏｌｂｙＤｉｇｉｔａｌＰｌｕｓ、ＭＬＰ、ＤＴＳ、ＤＴＳ−ＨＤ、または、リニアＰＣＭのなどの方式で符号化されている。 FIG. 29 is a diagram showing the structure of the multiplexed data. As shown in FIG. 29, the multiplexed data is obtained by multiplexing one or more of a video stream, an audio stream, a presentation graphics stream (PG), and an interactive graphics stream (IG). The video stream represents the main video and sub-video of a movie, the audio stream represents the main audio of the movie and the sub-audio to be mixed with the main audio, and the presentation graphics stream represents the subtitles of the movie. Here, the main video is the normal video displayed on the screen, and the sub-video is video displayed in a small window within the main video. The interactive graphics stream represents an interactive screen created by arranging GUI components on the screen. The video stream is encoded by the moving picture encoding method or apparatus described in the above embodiments, or by a moving picture encoding method or apparatus compliant with a conventional standard such as MPEG-2, MPEG4-AVC, or VC-1. The audio stream is encoded by a method such as Dolby AC-3, Dolby Digital Plus, MLP, DTS, DTS-HD, or linear PCM.

多重化データに含まれる各ストリームはＰＩＤによって識別される。例えば、映画の映像に利用するビデオストリームには０ｘ１０１１が、オーディオストリームには０ｘ１１００から０ｘ１１１Ｆまでが、プレゼンテーショングラフィックスには０ｘ１２００から０ｘ１２１Ｆまでが、インタラクティブグラフィックスストリームには０ｘ１４００から０ｘ１４１Ｆまでが、映画の副映像に利用するビデオストリームには０ｘ１Ｂ００から０ｘ１Ｂ１Ｆまで、主音声とミキシングする副音声に利用するオーディオストリームには０ｘ１Ａ００から０ｘ１Ａ１Ｆが、それぞれ割り当てられている。 Each stream included in the multiplexed data is identified by PID. For example, 0x1011 for video streams used for movie images, 0x1100 to 0x111F for audio streams, 0x1200 to 0x121F for presentation graphics, 0x1400 to 0x141F for interactive graphics streams, 0x1B00 to 0x1B1F are assigned to video streams used for sub-pictures, and 0x1A00 to 0x1A1F are assigned to audio streams used for sub-audio mixed with the main audio.
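The PID assignment above can be sketched as a small lookup. This is only an illustration of the ranges listed in the text; the names `PID_RANGES` and `classify_pid` and the returned labels are hypothetical and do not appear in the specification.

```python
# Hypothetical table: (first PID, last PID, stream kind), taken from the ranges listed in the text.
PID_RANGES = [
    (0x1011, 0x1011, "primary video"),
    (0x1100, 0x111F, "audio"),
    (0x1200, 0x121F, "presentation graphics"),
    (0x1400, 0x141F, "interactive graphics"),
    (0x1B00, 0x1B1F, "secondary video"),
    (0x1A00, 0x1A1F, "secondary audio"),
]


def classify_pid(pid: int) -> str:
    """Return the kind of stream a PID is assigned to, or "unknown"."""
    for first, last, kind in PID_RANGES:
        if first <= pid <= last:
            return kind
    return "unknown"
```

A demultiplexer could use such a function to route each TS packet to the matching decoder buffer; PIDs outside the listed ranges (for example the PAT on PID 0) would be handled separately.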

図３０は、多重化データがどのように多重化されるかを模式的に示す図である。まず、複数のビデオフレームからなるビデオストリームex２３５、複数のオーディオフレームからなるオーディオストリームex２３８を、それぞれＰＥＳパケット列ex２３６およびex２３９に変換し、ＴＳパケットex２３７およびex２４０に変換する。同じくプレゼンテーショングラフィックスストリームex２４１およびインタラクティブグラフィックスex２４４のデータをそれぞれＰＥＳパケット列ex２４２およびex２４５に変換し、さらにＴＳパケットex２４３およびex２４６に変換する。多重化データex２４７はこれらのＴＳパケットを１本のストリームに多重化することで構成される。 FIG. 30 is a diagram schematically showing how multiplexed data is multiplexed. First, a video stream ex235 composed of a plurality of video frames and an audio stream ex238 composed of a plurality of audio frames are converted into PES packet sequences ex236 and ex239, respectively, and converted into TS packets ex237 and ex240. Similarly, the data of the presentation graphics stream ex241 and interactive graphics ex244 are converted into PES packet sequences ex242 and ex245, respectively, and further converted into TS packets ex243 and ex246. The multiplexed data ex247 is configured by multiplexing these TS packets into one stream.

図３１は、ＰＥＳパケット列に、ビデオストリームがどのように格納されるかをさらに詳しく示している。図３１における第１段目はビデオストリームのビデオフレーム列を示す。第２段目は、ＰＥＳパケット列を示す。図３１の矢印ｙｙ１，ｙｙ２，ｙｙ３，ｙｙ４に示すように、ビデオストリームにおける複数のＶｉｄｅｏＰｒｅｓｅｎｔａｔｉｏｎＵｎｉｔであるＩピクチャ、Ｂピクチャ、Ｐピクチャは、ピクチャ毎に分割され、ＰＥＳパケットのペイロードに格納される。各ＰＥＳパケットはＰＥＳヘッダを持ち、ＰＥＳヘッダには、ピクチャの表示時刻であるＰＴＳ（ＰｒｅｓｅｎｔａｔｉｏｎＴｉｍｅ−Ｓｔａｍｐ）やピクチャの復号時刻であるＤＴＳ（ＤｅｃｏｄｉｎｇＴｉｍｅ−Ｓｔａｍｐ）が格納される。 FIG. 31 shows in more detail how the video stream is stored in the PES packet sequence. The first row in FIG. 31 shows the video frame sequence of the video stream. The second row shows the PES packet sequence. As indicated by arrows yy1, yy2, yy3, and yy4 in FIG. 31, the I pictures, B pictures, and P pictures, which are the Video Presentation Units in the video stream, are divided picture by picture and stored in the payloads of the PES packets. Each PES packet has a PES header, which stores a PTS (Presentation Time-Stamp), the display time of the picture, and a DTS (Decoding Time-Stamp), the decoding time of the picture.
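As a rough illustration of how a PTS and DTS can be read back from a PES header, the following sketch assumes the usual MPEG-2 Systems layout (a 33-bit timestamp spread over five bytes with marker bits, located after the 9-byte fixed part of the header). The specification itself does not describe this bit layout, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def decode_timestamp(field: bytes) -> int:
    """Decode a 33-bit PTS or DTS from its 5-byte encoding (marker bits dropped)."""
    if len(field) != 5:
        raise ValueError("a timestamp field is exactly 5 bytes")
    return (
        (((field[0] >> 1) & 0x07) << 30)
        | (field[1] << 22)
        | ((field[2] >> 1) << 15)
        | (field[3] << 7)
        | (field[4] >> 1)
    )


def encode_timestamp(prefix: int, value: int) -> bytes:
    """Inverse of decode_timestamp; prefix is the 4-bit code that precedes the value."""
    return bytes([
        (prefix << 4) | (((value >> 30) & 0x07) << 1) | 1,
        (value >> 22) & 0xFF,
        (((value >> 15) & 0x7F) << 1) | 1,
        (value >> 7) & 0xFF,
        ((value & 0x7F) << 1) | 1,
    ])


def read_pes_timestamps(pes: bytes) -> Tuple[Optional[int], Optional[int]]:
    """Return (PTS, DTS) from a PES packet; either may be None if absent."""
    if pes[:3] != b"\x00\x00\x01":
        raise ValueError("missing PES start code prefix")
    flags = pes[7] >> 6  # PTS_DTS_flags: 0b10 = PTS only, 0b11 = PTS and DTS
    pts = decode_timestamp(pes[9:14]) if flags & 0b10 else None
    dts = decode_timestamp(pes[14:19]) if flags == 0b11 else None
    return pts, dts
```

For B pictures the decoding order differs from the display order, which is why a header may carry both values; when only a PTS is present, the DTS is taken to be equal to it.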

図３２は、多重化データに最終的に書き込まれるＴＳパケットの形式を示している。ＴＳパケットは、ストリームを識別するＰＩＤなどの情報を持つ４ＢｙｔｅのＴＳヘッダとデータを格納する１８４ＢｙｔｅのＴＳペイロードから構成される１８８Ｂｙｔｅ固定長のパケットであり、上記ＰＥＳパケットは分割されＴＳペイロードに格納される。ＢＤ−ＲＯＭの場合、ＴＳパケットには、４ＢｙｔｅのＴＰ＿Ｅｘｔｒａ＿Ｈｅａｄｅｒが付与され、１９２Ｂｙｔｅのソースパケットを構成し、多重化データに書き込まれる。ＴＰ＿Ｅｘｔｒａ＿ＨｅａｄｅｒにはＡＴＳ（Ａｒｒｉｖａｌ＿Ｔｉｍｅ＿Ｓｔａｍｐ）などの情報が記載される。ＡＴＳは当該ＴＳパケットのデコーダのＰＩＤフィルタへの転送開始時刻を示す。多重化データには図３２下段に示すようにソースパケットが並ぶこととなり、多重化データの先頭からインクリメントする番号はＳＰＮ（ソースパケットナンバー）と呼ばれる。 FIG. 32 shows the format of the TS packets that are finally written into the multiplexed data. A TS packet is a 188-byte fixed-length packet composed of a 4-byte TS header carrying information such as the PID that identifies the stream and a 184-byte TS payload that stores data; the PES packets are divided and stored in the TS payloads. In the case of a BD-ROM, a 4-byte TP_Extra_Header is added to each TS packet to form a 192-byte source packet, which is written into the multiplexed data. Information such as the ATS (Arrival_Time_Stamp) is described in the TP_Extra_Header. The ATS indicates the time at which transfer of the TS packet to the PID filter of the decoder starts. Source packets are arranged in the multiplexed data as shown in the lower part of FIG. 32, and the number incremented from the head of the multiplexed data is called the SPN (source packet number).
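The source packet layout described above can be sketched as a minimal parser. The 4 + 188 byte split and the 4-byte TS header come from the text; the sync byte 0x47 and the 13-bit PID position follow the MPEG-2 transport stream format. Treating the low 30 bits of TP_Extra_Header as the ATS, and assuming there is no adaptation field so that the payload is always 184 bytes, are assumptions made for this illustration, as are the function names.

```python
from typing import Dict, Iterator

TS_PACKET_SIZE = 188       # 4-byte TS header + 184-byte TS payload
SOURCE_PACKET_SIZE = 192   # 4-byte TP_Extra_Header + 188-byte TS packet
SYNC_BYTE = 0x47


def parse_source_packet(packet: bytes, spn: int) -> Dict[str, object]:
    """Split one 192-byte source packet into its ATS, PID and TS payload."""
    if len(packet) != SOURCE_PACKET_SIZE:
        raise ValueError("a source packet is exactly 192 bytes")
    extra_header = int.from_bytes(packet[:4], "big")
    ats = extra_header & 0x3FFFFFFF  # assumption: ATS occupies the low 30 bits
    ts = packet[4:]
    if ts[0] != SYNC_BYTE:
        raise ValueError("TS sync byte not found")
    pid = ((ts[1] & 0x1F) << 8) | ts[2]  # 13-bit PID in the TS header
    # Assumption: no adaptation field, so the payload is the remaining 184 bytes.
    return {"spn": spn, "ats": ats, "pid": pid, "payload": ts[4:]}


def iter_source_packets(data: bytes) -> Iterator[Dict[str, object]]:
    """Yield parsed source packets; the SPN counts up from the head of the data."""
    complete = len(data) // SOURCE_PACKET_SIZE
    for spn in range(complete):
        start = spn * SOURCE_PACKET_SIZE
        yield parse_source_packet(data[start:start + SOURCE_PACKET_SIZE], spn)
```

Because every source packet has the same length, the byte offset of packet number SPN is simply SPN × 192, which is what makes the SPN usable as a random-access index.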

また、多重化データに含まれるＴＳパケットには、映像・音声・字幕などの各ストリーム以外にもＰＡＴ（ＰｒｏｇｒａｍＡｓｓｏｃｉａｔｉｏｎＴａｂｌｅ）、ＰＭＴ（ＰｒｏｇｒａｍＭａｐＴａｂｌｅ）、ＰＣＲ（ＰｒｏｇｒａｍＣｌｏｃｋＲｅｆｅｒｅｎｃｅ）などがある。ＰＡＴは多重化データ中に利用されるＰＭＴのＰＩＤが何であるかを示し、ＰＡＴ自身のＰＩＤは０で登録される。ＰＭＴは、多重化データ中に含まれる映像・音声・字幕などの各ストリームのＰＩＤと各ＰＩＤに対応するストリームの属性情報を持ち、また多重化データに関する各種ディスクリプタを持つ。ディスクリプタには多重化データのコピーを許可・不許可を指示するコピーコントロール情報などがある。ＰＣＲは、ＡＴＳの時間軸であるＡＴＣ（ＡｒｒｉｖａｌＴｉｍｅＣｌｏｃｋ）とＰＴＳ・ＤＴＳの時間軸であるＳＴＣ（ＳｙｓｔｅｍＴｉｍｅＣｌｏｃｋ）の同期を取るために、そのＰＣＲパケットがデコーダに転送されるＡＴＳに対応するＳＴＣ時間の情報を持つ。 In addition to the video, audio, subtitle, and other streams, the TS packets included in the multiplexed data also carry a PAT (Program Association Table), a PMT (Program Map Table), a PCR (Program Clock Reference), and the like. The PAT indicates the PID of the PMT used in the multiplexed data, and the PID of the PAT itself is registered as 0. The PMT holds the PID of each stream, such as video, audio, and subtitles, included in the multiplexed data and the attribute information of the stream corresponding to each PID, and also holds various descriptors relating to the multiplexed data. The descriptors include copy control information indicating whether copying of the multiplexed data is permitted. To synchronize the ATC (Arrival Time Clock), which is the time axis of the ATS, with the STC (System Time Clock), which is the time axis of the PTS and DTS, the PCR holds the STC time corresponding to the ATS at which that PCR packet is transferred to the decoder.

図３３はＰＭＴのデータ構造を詳しく説明する図である。ＰＭＴの先頭には、そのＰＭＴに含まれるデータの長さなどを記したＰＭＴヘッダが配置される。その後ろには、多重化データに関するディスクリプタが複数配置される。上記コピーコントロール情報などが、ディスクリプタとして記載される。ディスクリプタの後には、多重化データに含まれる各ストリームに関するストリーム情報が複数配置される。ストリーム情報は、ストリームの圧縮コーデックなどを識別するためストリームタイプ、ストリームのＰＩＤ、ストリームの属性情報（フレームレート、アスペクト比など）が記載されたストリームディスクリプタから構成される。ストリームディスクリプタは多重化データに存在するストリームの数だけ存在する。 FIG. 33 is a diagram for explaining the data structure of the PMT in detail. A PMT header describing the length of data included in the PMT is arranged at the head of the PMT. After that, a plurality of descriptors related to multiplexed data are arranged. The copy control information and the like are described as descriptors. After the descriptor, a plurality of pieces of stream information regarding each stream included in the multiplexed data are arranged. The stream information includes a stream descriptor in which a stream type, a stream PID, and stream attribute information (frame rate, aspect ratio, etc.) are described to identify a compression codec of the stream. There are as many stream descriptors as the number of streams existing in the multiplexed data.

記録媒体などに記録する場合には、上記多重化データは、多重化データ情報ファイルと共に記録される。 When recording on a recording medium or the like, the multiplexed data is recorded together with the multiplexed data information file.

多重化データ情報ファイルは、図３４に示すように多重化データの管理情報であり、多重化データと１対１に対応し、多重化データ情報、ストリーム属性情報とエントリマップから構成される。 As shown in FIG. 34, the multiplexed data information file is management information of multiplexed data, has a one-to-one correspondence with the multiplexed data, and includes multiplexed data information, stream attribute information, and an entry map.

多重化データ情報は図３４に示すようにシステムレート、再生開始時刻、再生終了時刻から構成されている。システムレートは多重化データの、後述するシステムターゲットデコーダのＰＩＤフィルタへの最大転送レートを示す。多重化データ中に含まれるＡＴＳの間隔はシステムレート以下になるように設定されている。再生開始時刻は多重化データの先頭のビデオフレームのＰＴＳであり、再生終了時刻は多重化データの終端のビデオフレームのＰＴＳに１フレーム分の再生間隔を足したものが設定される。 As shown in FIG. 34, the multiplexed data information includes a system rate, a reproduction start time, and a reproduction end time. The system rate indicates the maximum transfer rate of multiplexed data to the PID filter of the system target decoder described later. The ATS interval included in the multiplexed data is set to be equal to or less than the system rate. The playback start time is the PTS of the first video frame of the multiplexed data, and the playback end time is set by adding the playback interval for one frame to the PTS of the video frame at the end of the multiplexed data.
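The rule for the playback end time can be illustrated with a short calculation. The sketch assumes that PTS values are counted in ticks of a 90 kHz clock, as is usual for MPEG-2 Systems timestamps; the specification does not state the clock rate, and the function name is hypothetical.

```python
from fractions import Fraction

PTS_CLOCK_HZ = 90_000  # assumption: PTS values count ticks of a 90 kHz clock


def playback_end_time(last_frame_pts: int, frame_rate: Fraction) -> int:
    """PTS of the last video frame plus the playback interval of one frame."""
    frame_interval = Fraction(PTS_CLOCK_HZ) / frame_rate  # ticks per frame
    return last_frame_pts + round(frame_interval)
```

For example, at 30000/1001 frames per second one frame lasts 3003 ticks, so a stream whose last frame has a PTS of 900000 would be given a playback end time of 903003.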

ストリーム属性情報は図３５に示すように、多重化データに含まれる各ストリームについての属性情報が、ＰＩＤ毎に登録される。属性情報はビデオストリーム、オーディオストリーム、プレゼンテーショングラフィックスストリーム、インタラクティブグラフィックスストリーム毎に異なる情報を持つ。ビデオストリーム属性情報は、そのビデオストリームがどのような圧縮コーデックで圧縮されたか、ビデオストリームを構成する個々のピクチャデータの解像度がどれだけであるか、アスペクト比はどれだけであるか、フレームレートはどれだけであるかなどの情報を持つ。オーディオストリーム属性情報は、そのオーディオストリームがどのような圧縮コーデックで圧縮されたか、そのオーディオストリームに含まれるチャンネル数は何であるか、何の言語に対応するか、サンプリング周波数がどれだけであるかなどの情報を持つ。これらの情報は、プレーヤが再生する前のデコーダの初期化などに利用される。 As shown in FIG. 35, the stream attribute information registers, for each PID, attribute information about each stream included in the multiplexed data. The attribute information differs for video streams, audio streams, presentation graphics streams, and interactive graphics streams. The video stream attribute information includes information such as the compression codec used to compress the video stream and the resolution, aspect ratio, and frame rate of the individual picture data constituting the video stream. The audio stream attribute information includes information such as the compression codec used to compress the audio stream, the number of channels included in the audio stream, the language it corresponds to, and the sampling frequency. This information is used, for example, to initialize the decoder before the player starts playback.

本実施の形態においては、上記多重化データのうち、ＰＭＴに含まれるストリームタイプを利用する。また、記録媒体に多重化データが記録されている場合には、多重化データ情報に含まれる、ビデオストリーム属性情報を利用する。具体的には、上記各実施の形態で示した動画像符号化方法または装置において、ＰＭＴに含まれるストリームタイプ、または、ビデオストリーム属性情報に対し、上記各実施の形態で示した動画像符号化方法または装置によって生成された映像データであることを示す固有の情報を設定するステップまたは手段を設ける。この構成により、上記各実施の形態で示した動画像符号化方法または装置によって生成した映像データと、他の規格に準拠する映像データとを識別することが可能になる。 In the present embodiment, among the multiplexed data, the stream type included in the PMT is used. Also, when multiplexed data is recorded on the recording medium, video stream attribute information included in the multiplexed data information is used. Specifically, in the video encoding method or apparatus shown in each of the above embodiments, the video encoding shown in each of the above embodiments for the stream type or video stream attribute information included in the PMT. There is provided a step or means for setting unique information indicating that the video data is generated by the method or apparatus. With this configuration, it is possible to discriminate between video data generated by the moving picture encoding method or apparatus described in the above embodiments and video data compliant with other standards.

また、本実施の形態における動画像復号化方法のステップを図３６に示す。ステップexＳ１００において、多重化データからＰＭＴに含まれるストリームタイプ、または、多重化データ情報に含まれるビデオストリーム属性情報を取得する。次に、ステップexＳ１０１において、ストリームタイプ、または、ビデオストリーム属性情報が上記各実施の形態で示した動画像符号化方法または装置によって生成された多重化データであることを示しているか否かを判断する。そして、ストリームタイプ、または、ビデオストリーム属性情報が上記各実施の形態で示した動画像符号化方法または装置によって生成されたものであると判断された場合には、ステップexＳ１０２において、上記各実施の形態で示した動画像復号方法により復号を行う。また、ストリームタイプ、または、ビデオストリーム属性情報が、従来のＭＰＥＧ−２、ＭＰＥＧ４−ＡＶＣ、ＶＣ−１などの規格に準拠するものであることを示している場合には、ステップexＳ１０３において、従来の規格に準拠した動画像復号方法により復号を行う。 FIG. 36 shows the steps of the moving picture decoding method according to the present embodiment. In step exS100, the stream type included in the PMT, or the video stream attribute information included in the multiplexed data information, is acquired from the multiplexed data. Next, in step exS101, it is determined whether or not the stream type or the video stream attribute information indicates that the multiplexed data was generated by the moving picture encoding method or apparatus described in the above embodiments. If it is determined that the stream type or the video stream attribute information indicates data generated by the moving picture encoding method or apparatus described in the above embodiments, decoding is performed in step exS102 by the moving picture decoding method described in the above embodiments. If the stream type or the video stream attribute information indicates conformance to a conventional standard such as MPEG-2, MPEG4-AVC, or VC-1, decoding is performed in step exS103 by a moving picture decoding method compliant with that conventional standard.
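Steps exS100 to exS103 amount to a dispatch on the identification information. The sketch below is one way to express that flow. All numeric stream type values are used purely for illustration, and `STREAM_TYPE_NEW_METHOD` in particular is a hypothetical placeholder, since the specification does not give the unique value. The decoder functions are stand-ins that only report which path was taken.

```python
# Illustrative stream type values; STREAM_TYPE_NEW_METHOD is a hypothetical placeholder.
STREAM_TYPE_NEW_METHOD = 0x99
CONVENTIONAL_STREAM_TYPES = {
    0x02: "MPEG-2",
    0x1B: "MPEG4-AVC",
    0xEA: "VC-1",
}


def decode_with_new_method(data: bytes) -> str:
    """Stand-in for the decoding method of the above embodiments (exS102)."""
    return "decoded by the method of the embodiments"


def decode_with_conventional_method(data: bytes, standard: str) -> str:
    """Stand-in for a decoder compliant with a conventional standard (exS103)."""
    return f"decoded by a {standard} decoder"


def decode_video(stream_type: int, data: bytes) -> str:
    """exS100: stream_type has already been read from the PMT or attribute information."""
    # exS101: does the identification information indicate the new method?
    if stream_type == STREAM_TYPE_NEW_METHOD:
        return decode_with_new_method(data)  # exS102
    if stream_type in CONVENTIONAL_STREAM_TYPES:
        standard = CONVENTIONAL_STREAM_TYPES[stream_type]
        return decode_with_conventional_method(data, standard)  # exS103
    raise ValueError(f"unsupported stream type 0x{stream_type:02X}")
```

Raising an error for an unrecognized value is a design choice of this sketch; the flow in FIG. 36 only distinguishes the two cases described in the text.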

このように、ストリームタイプ、または、ビデオストリーム属性情報に新たな固有値を設定することにより、復号する際に、上記各実施の形態で示した動画像復号化方法または装置で復号可能であるかを判断することができる。従って、異なる規格に準拠する多重化データが入力された場合であっても、適切な復号化方法または装置を選択することができるため、エラーを生じることなく復号することが可能となる。また、本実施の形態で示した動画像符号化方法または装置、または、動画像復号方法または装置を、上述したいずれの機器・システムに用いることも可能である。 In this way, by setting a new unique value in the stream type or the video stream attribute information, it can be determined at decoding time whether the data can be decoded by the moving picture decoding method or apparatus described in the above embodiments. Therefore, even when multiplexed data compliant with a different standard is input, an appropriate decoding method or apparatus can be selected, so that decoding can be performed without causing an error. The moving picture encoding method or apparatus, or the moving picture decoding method or apparatus, described in this embodiment can also be used in any of the devices and systems described above.

（実施の形態６） (Embodiment 6)
上記各実施の形態で示した動画像符号化方法および装置、動画像復号化方法および装置は、典型的には集積回路であるＬＳＩで実現される。一例として、図３７に１チップ化されたＬＳＩex５００の構成を示す。ＬＳＩex５００は、以下に説明する要素ex５０１、ex５０２、ex５０３、ex５０４、ex５０５、ex５０６、ex５０７、ex５０８、ex５０９を備え、各要素はバスex５１０を介して接続している。電源回路部ex５０５は電源がオン状態の場合に各部に対して電力を供給することで動作可能な状態に起動する。 The moving picture encoding method and apparatus and moving picture decoding method and apparatus described in the above embodiments are typically realized by an LSI that is an integrated circuit. As an example, FIG. 37 shows a configuration of the LSI ex500 that is made into one chip. The LSI ex500 includes elements ex501, ex502, ex503, ex504, ex505, ex506, ex507, ex508, and ex509 described below, and each element is connected via a bus ex510. The power supply circuit unit ex505 is activated to an operable state by supplying power to each unit when the power supply is on.

例えば符号化処理を行う場合には、ＬＳＩex５００は、ＣＰＵex５０２、メモリコントローラex５０３、ストリームコントローラex５０４、駆動周波数制御部ex５１２等を有する制御部ex５０１の制御に基づいて、ＡＶＩ／Ｏex５０９によりマイクex１１７やカメラex１１３等からＡＶ信号を入力する。入力されたＡＶ信号は、一旦ＳＤＲＡＭ等の外部のメモリex５１１に蓄積される。制御部ex５０１の制御に基づいて、蓄積したデータは処理量や処理速度に応じて適宜複数回に分けるなどされ信号処理部ex５０７に送られ、信号処理部ex５０７において音声信号の符号化および／または映像信号の符号化が行われる。ここで映像信号の符号化処理は上記各実施の形態で説明した符号化処理である。信号処理部ex５０７ではさらに、場合により符号化された音声データと符号化された映像データを多重化するなどの処理を行い、ストリームＩ／Ｏex５０６から外部に出力する。この出力された多重化データは、基地局ex１０７に向けて送信されたり、または記録メディアex２１５に書き込まれたりする。なお、多重化する際には同期するよう、一旦バッファex５０８にデータを蓄積するとよい。 For example, when encoding is performed, the LSI ex500 receives an AV signal from the microphone ex117, the camera ex113, or the like through the AV I/O ex509 under the control of the control unit ex501, which includes the CPU ex502, the memory controller ex503, the stream controller ex504, the drive frequency control unit ex512, and the like. The input AV signal is temporarily stored in an external memory ex511 such as an SDRAM. Under the control of the control unit ex501, the stored data is divided into portions as appropriate according to the processing amount and processing speed and sent to the signal processing unit ex507, where the audio signal and/or the video signal is encoded. Here, the encoding of the video signal is the encoding process described in the above embodiments. The signal processing unit ex507 further performs processing such as multiplexing the encoded audio data and the encoded video data as the case requires, and outputs the result from the stream I/O ex506 to the outside. The output multiplexed data is transmitted toward the base station ex107 or written to the recording medium ex215. Note that the data may be temporarily stored in the buffer ex508 so that it is synchronized when multiplexed.

なお、上記では、メモリex５１１がＬＳＩex５００の外部の構成として説明したが、ＬＳＩex５００の内部に含まれる構成であってもよい。バッファex５０８も１つに限ったものではなく、複数のバッファを備えていてもよい。また、ＬＳＩex５００は１チップ化されてもよいし、複数チップ化されてもよい。 In the above description, the memory ex511 is described as an external configuration of the LSI ex500. However, a configuration included in the LSI ex500 may be used. The number of buffers ex508 is not limited to one, and a plurality of buffers may be provided. The LSI ex500 may be made into one chip or a plurality of chips.

また、上記では、制御部ex５０１が、ＣＰＵex５０２、メモリコントローラex５０３、ストリームコントローラex５０４、駆動周波数制御部ex５１２等を有するとしているが、制御部ex５０１の構成は、この構成に限らない。例えば、信号処理部ex５０７がさらにＣＰＵを備える構成であってもよい。信号処理部ex５０７の内部にもＣＰＵを設けることにより、処理速度をより向上させることが可能になる。また、他の例として、ＣＰＵex５０２が信号処理部ex５０７、または信号処理部ex５０７の一部である例えば音声信号処理部を備える構成であってもよい。このような場合には、制御部ex５０１は、信号処理部ex５０７、またはその一部を有するＣＰＵex５０２を備える構成となる。 In the above description, the control unit ex501 includes the CPU ex502, the memory controller ex503, the stream controller ex504, the drive frequency control unit ex512, and the like, but the configuration of the control unit ex501 is not limited to this configuration. For example, the signal processing unit ex507 may further include a CPU. By providing a CPU also in the signal processing unit ex507, the processing speed can be further improved. As another example, the CPU ex502 may be configured to include a signal processing unit ex507 or, for example, an audio signal processing unit that is a part of the signal processing unit ex507. In such a case, the control unit ex501 is configured to include a signal processing unit ex507 or a CPU ex502 having a part thereof.

なお、ここでは、ＬＳＩとしたが、集積度の違いにより、ＩＣ、システムＬＳＩ、スーパーＬＳＩ、ウルトラＬＳＩと呼称されることもある。 Here, although LSI is used, it may be called IC, system LSI, super LSI, or ultra LSI depending on the degree of integration.

また、集積回路化の手法はＬＳＩに限るものではなく、専用回路または汎用プロセッサで実現してもよい。ＬＳＩ製造後に、プログラムすることが可能なＦＰＧＡ（Field Programmable Gate Array）や、ＬＳＩ内部の回路セルの接続や設定を再構成可能なリコンフィギュラブル・プロセッサを利用してもよい。このようなプログラマブル・ロジック・デバイスは、典型的には、ソフトウェア又はファームウェアを構成するプログラムを、ロードする又はメモリ等から読み込むことで、上記各実施の形態で示した動画像符号化方法、又は動画像復号化方法を実行することができる。 Further, the method of circuit integration is not limited to LSI's, and implementation using dedicated circuitry or general purpose processors is also possible. An FPGA (Field Programmable Gate Array) that can be programmed after manufacturing the LSI, or a reconfigurable processor that can reconfigure the connection and setting of circuit cells inside the LSI may be used. Such a programmable logic device typically loads or reads a program constituting software or firmware from a memory or the like, so that the moving image encoding method or the moving image described in each of the above embodiments is used. An image decoding method can be performed.

さらには、半導体技術の進歩または派生する別技術によりＬＳＩに置き換わる集積回路化の技術が登場すれば、当然、その技術を用いて機能ブロックの集積化を行ってもよい。バイオ技術の適応等が可能性としてありえる。 Furthermore, if integrated circuit technology comes out to replace LSI's as a result of the advancement of semiconductor technology or a derivative other technology, it is naturally also possible to carry out function block integration using this technology. Biotechnology can be applied.

（実施の形態７） (Embodiment 7)
上記各実施の形態で示した動画像符号化方法または装置によって生成された映像データを復号する場合、従来のＭＰＥＧ−２、ＭＰＥＧ４−ＡＶＣ、ＶＣ−１などの規格に準拠する映像データを復号する場合に比べ、処理量が増加することが考えられる。そのため、ＬＳＩex５００において、従来の規格に準拠する映像データを復号する際のＣＰＵex５０２の駆動周波数よりも高い駆動周波数に設定する必要がある。しかし、駆動周波数を高くすると、消費電力が高くなるという課題が生じる。 When video data generated by the moving picture encoding method or apparatus described in the above embodiments is decoded, the amount of processing is expected to be larger than when video data compliant with a conventional standard such as MPEG-2, MPEG4-AVC, or VC-1 is decoded. Therefore, in the LSI ex500, the drive frequency needs to be set higher than the drive frequency of the CPU ex502 used when decoding video data compliant with a conventional standard. However, raising the drive frequency causes the problem of increased power consumption.

この課題を解決するために、テレビex３００、ＬＳＩex５００などの動画像復号化装置は、映像データがどの規格に準拠するものであるかを識別し、規格に応じて駆動周波数を切替える構成とする。図３８は、本実施の形態における構成ex８００を示している。駆動周波数切替え部ex８０３は、映像データが、上記各実施の形態で示した動画像符号化方法または装置によって生成されたものである場合には、駆動周波数を高く設定する。そして、上記各実施の形態で示した動画像復号化方法を実行する復号処理部ex８０１に対し、映像データを復号するよう指示する。一方、映像データが、従来の規格に準拠する映像データである場合には、映像データが、上記各実施の形態で示した動画像符号化方法または装置によって生成されたものである場合に比べ、駆動周波数を低く設定する。そして、従来の規格に準拠する復号処理部ex８０２に対し、映像データを復号するよう指示する。 In order to solve this problem, moving picture decoding apparatuses such as the television ex300 and the LSI ex500 are configured to identify which standard the video data conforms to and switch the driving frequency according to the standard. FIG. 38 shows a configuration ex800 in the present embodiment. The drive frequency switching unit ex803 sets the drive frequency high when the video data is generated by the moving image encoding method or apparatus described in the above embodiments. Then, the decoding processing unit ex801 that executes the moving picture decoding method described in each of the above embodiments is instructed to decode the video data. On the other hand, when the video data is video data compliant with the conventional standard, compared to the case where the video data is generated by the moving picture encoding method or apparatus shown in the above embodiments, Set the drive frequency low. Then, it instructs the decoding processing unit ex802 compliant with the conventional standard to decode the video data.

More specifically, the drive frequency switching unit ex803 consists of the CPU ex502 and the drive frequency control unit ex512 in FIG. 37. The decoding processing unit ex801, which executes the moving picture decoding method described in the above embodiments, and the decoding processing unit ex802, which conforms to the conventional standard, correspond to the signal processing unit ex507 in FIG. 37. The CPU ex502 identifies which standard the video data conforms to. Based on a signal from the CPU ex502, the drive frequency control unit ex512 sets the drive frequency, and, also based on a signal from the CPU ex502, the signal processing unit ex507 decodes the video data. To identify the video data, the identification information described in Embodiment 5 may be used, for example. The identification information is not limited to that described in Embodiment 5; any information that identifies which standard the video data conforms to may be used. For example, when the standard can be identified from an external signal indicating whether the video data is used for a television or for a disk, the identification may be based on that external signal. The CPU ex502 may select the drive frequency based on, for example, a lookup table that associates video data standards with drive frequencies, as shown in FIG. 40. The lookup table is stored in the buffer ex508 or in an internal memory of the LSI, and the CPU ex502 selects the drive frequency by referring to it.
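The lookup table that associates standards with drive frequencies can be sketched as follows. This is only an illustration: the frequency values, the `Standard` labels, and the function name `select_drive_frequency` are assumptions made for the example and do not come from FIG. 40.

```python
from enum import Enum


class Standard(Enum):
    """Standards the decoder may encounter (labels are illustrative)."""
    NEW_METHOD = "new_method"  # data generated by the method of the embodiments
    MPEG2 = "MPEG-2"
    MPEG4_AVC = "MPEG4-AVC"
    VC1 = "VC-1"


# Hypothetical drive frequencies in MHz; the source gives no concrete values.
DRIVE_FREQUENCY_MHZ = {
    Standard.NEW_METHOD: 500,
    Standard.MPEG2: 350,
    Standard.MPEG4_AVC: 350,
    Standard.VC1: 350,
}


def select_drive_frequency(standard: Standard) -> int:
    """Return the drive frequency associated with the identified standard."""
    return DRIVE_FREQUENCY_MHZ[standard]
```

Keeping the association in a table rather than in branching logic means that supporting a further standard only requires a new table entry.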

FIG. 39 shows the steps for carrying out the method of the present embodiment. First, in step exS200, the signal processing unit ex507 acquires identification information from the multiplexed data. Next, in step exS201, the CPU ex502 identifies, based on the identification information, whether the video data was generated by the encoding method or apparatus described in the above embodiments. When it was, in step exS202 the CPU ex502 sends a signal for setting the drive frequency high to the drive frequency control unit ex512, and the drive frequency control unit ex512 sets a high drive frequency. On the other hand, when the identification information indicates video data conforming to a conventional standard such as MPEG-2, MPEG4-AVC, or VC-1, in step exS203 the CPU ex502 sends a signal for setting the drive frequency low to the drive frequency control unit ex512, and the drive frequency control unit ex512 sets a drive frequency lower than that used when the video data was generated by the encoding method or apparatus described in the above embodiments.
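Steps exS200 to exS203 can be sketched as a small control flow. The sketch assumes the multiplexed data is represented as a dictionary with an `identification` field and that the value `"new_method"` marks data generated by the method of the embodiments; both are hypothetical, as are the class and function names. The returned string names the decoding processing unit (ex801 or ex802) that would be instructed to decode.

```python
from dataclasses import dataclass, field

NEW_METHOD_ID = "new_method"  # hypothetical identification value
CONVENTIONAL_STANDARDS = {"MPEG-2", "MPEG4-AVC", "VC-1"}


@dataclass
class DriveFrequencyControl:
    """Stand-in for the drive frequency control unit ex512."""
    level: str = "low"
    history: list = field(default_factory=list)

    def set_level(self, level: str) -> None:
        self.level = level
        self.history.append(level)


def acquire_identification(multiplexed_data: dict) -> str:
    """Step exS200: read the identification information."""
    return multiplexed_data["identification"]


def switch_drive_frequency(multiplexed_data: dict,
                           control: DriveFrequencyControl) -> str:
    """Steps exS201 to exS203: set the frequency level and pick a decoder."""
    ident = acquire_identification(multiplexed_data)
    if ident == NEW_METHOD_ID:            # exS201: generated by the new method
        control.set_level("high")         # exS202
        return "ex801"
    if ident in CONVENTIONAL_STANDARDS:   # exS201: conventional standard
        control.set_level("low")          # exS203
        return "ex802"
    raise ValueError(f"unknown identification information: {ident!r}")
```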

Furthermore, the power saving effect can be enhanced by changing the voltage applied to the LSI ex500 or to a device including the LSI ex500 in conjunction with the switching of the drive frequency. For example, when the drive frequency is set low, the voltage applied to the LSI ex500 or to the device including the LSI ex500 may accordingly be set lower than when the drive frequency is set high.

The method of setting the drive frequency is not limited to the one described above; it suffices to set the drive frequency high when the amount of processing for decoding is large and low when it is small. For example, when the amount of processing for decoding video data conforming to the MPEG4-AVC standard is larger than that for decoding video data generated by the moving picture encoding method or apparatus described in the above embodiments, the drive frequency settings may be reversed from those described above.
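Setting the frequency from the amount of processing rather than from the standard itself can be sketched as below. The cycle-count estimate, the list of available frequency steps, and the headroom factor are all assumptions introduced for illustration; the source only states that a larger processing amount calls for a higher drive frequency.

```python
from typing import Sequence


def drive_frequency_for_load(cycles_per_frame: float,
                             frames_per_second: float,
                             available_mhz: Sequence[float],
                             headroom: float = 1.2) -> float:
    """Pick the lowest available frequency that covers the decoding load.

    cycles_per_frame is a hypothetical estimate of the CPU cycles needed
    to decode one frame; headroom adds a safety margin on top of it.
    """
    required_mhz = cycles_per_frame * frames_per_second * headroom / 1e6
    for frequency in sorted(available_mhz):
        if frequency >= required_mhz:
            return frequency
    # No step is fast enough: fall back to the highest one.
    return max(available_mhz)
```

With steps of 200, 350, and 500 MHz, a stream needing 5 million cycles per frame at 30 frames per second maps to 200 MHz, while one needing 10 million cycles per frame maps to 500 MHz, regardless of which standard produced the heavier load.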

Furthermore, the setting method is not limited to a configuration that lowers the drive frequency. For example, when the identification information indicates video data generated by the moving picture encoding method or apparatus described in the above embodiments, the voltage applied to the LSI ex500 or to a device including the LSI ex500 may be set high, and when it indicates video data conforming to a conventional standard such as MPEG-2, MPEG4-AVC, or VC-1, the voltage may be set low. As another example, when the identification information indicates video data generated by the moving picture encoding method or apparatus described in the above embodiments, the CPU ex502 may be kept running without being stopped, whereas when it indicates video data conforming to a conventional standard such as MPEG-2, MPEG4-AVC, or VC-1, the CPU ex502 may be temporarily stopped because there is spare processing capacity. Even when the identification information indicates video data generated by the moving picture encoding method or apparatus described in the above embodiments, the CPU ex502 may be temporarily stopped if there is spare processing capacity. In that case, the stop time may be set shorter than for video data conforming to a conventional standard such as MPEG-2, MPEG4-AVC, or VC-1.
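One way to picture the stop time is as the slack left in each frame period after decoding finishes. The following sketch assumes that model; the function name and the millisecond figures in the usage note are hypothetical and are not taken from the source.

```python
def pause_time_ms(frame_period_ms: float, decode_time_ms: float) -> float:
    """Return how long the CPU could be paused per frame.

    The pause is the slack between the frame period and the time spent
    decoding; it is zero when decoding uses the whole period or more.
    """
    return max(0.0, frame_period_ms - decode_time_ms)
```

Under this model, with a 40 ms frame period, a conventional stream decoded in 10 ms leaves a 30 ms pause, while a stream from the method of the embodiments decoded in 30 ms leaves only 10 ms, which matches the shorter stop time described above.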

In this way, power can be saved by switching the drive frequency according to the standard to which the video data conforms. In addition, when the LSI ex500 or a device including the LSI ex500 is driven by a battery, the power saving extends the battery life.

(Embodiment 8)
A plurality of video data conforming to different standards may be input to the devices and systems described above, such as televisions and mobile phones. To be able to decode even when video data conforming to different standards is input, the signal processing unit ex507 of the LSI ex500 needs to support the plurality of standards. However, if a separate signal processing unit ex507 is provided for each standard, the circuit scale of the LSI ex500 grows and the cost increases.

To solve this problem, a decoding processing unit for executing the moving picture decoding method described in the above embodiments and a decoding processing unit conforming to a conventional standard such as MPEG-2, MPEG4-AVC, or VC-1 are partly shared. An example of this configuration is shown as ex900 in FIG. 41A. For example, the moving picture decoding method described in the above embodiments and a moving picture decoding method conforming to the MPEG4-AVC standard have some processing in common, in processes such as entropy coding, inverse quantization, deblocking filtering, and motion compensation. One possible configuration shares the decoding processing unit ex902, which supports the MPEG4-AVC standard, for the common processing and uses a dedicated decoding processing unit ex901 for other processing that is specific to one aspect of the present invention and not supported by the MPEG4-AVC standard. In particular, because one aspect of the present invention is characterized by inter prediction, the dedicated decoding processing unit ex901 may be used for inter prediction, for example, while the decoding processing unit is shared for any or all of the other processes: entropy decoding, deblocking filtering, and inverse quantization. Alternatively, the sharing may be arranged the other way around: the decoding processing unit for executing the moving picture decoding method described in the above embodiments is shared for the common processing, and a dedicated decoding processing unit is used for processing specific to the MPEG4-AVC standard.

Another example of partly sharing processing is shown as ex1000 in FIG. 41B. This example uses a dedicated decoding processing unit ex1001 for processing specific to one aspect of the present invention, a dedicated decoding processing unit ex1002 for processing specific to another conventional standard, and a shared decoding processing unit ex1003 for processing common to the moving picture decoding method according to one aspect of the present invention and the moving picture decoding method of the other conventional standard. Here, the dedicated decoding processing units ex1001 and ex1002 need not be specialized for the processing specific to one aspect of the present invention or to the other conventional standard, and may be capable of executing other general-purpose processing. The configuration of the present embodiment can also be implemented in the LSI ex500.
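The division of work among ex1001, ex1002, and ex1003 can be sketched as a routing function. The stage names and the choice of which stages count as shared are assumptions for illustration, loosely following the processes the text lists as common; the source leaves open which of them are actually shared.

```python
# Stages assumed to be common to both decoding methods (illustrative choice).
SHARED_STAGES = {"entropy_decoding", "inverse_quantization", "deblocking_filter"}


def select_decoding_unit(stage: str, is_new_method: bool) -> str:
    """Return the reference sign of the unit that would handle a stage.

    Shared stages go to the common unit ex1003; every other stage goes to
    the dedicated unit for the method that produced the stream.
    """
    if stage in SHARED_STAGES:
        return "ex1003"
    return "ex1001" if is_new_method else "ex1002"
```

For instance, inter prediction on a stream from the method of the embodiments would be routed to ex1001, while inverse quantization would go to ex1003 for either kind of stream.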

In this way, sharing a decoding processing unit for the processing common to the moving picture decoding method according to one aspect of the present invention and the moving picture decoding method of a conventional standard reduces the circuit scale of the LSI and lowers the cost.

The present invention is applicable to, for example, a television receiver, a digital video recorder, a car navigation system, a mobile phone, a digital camera, a digital video camera, a security camera system, a fixed point observation camera system, or a content distribution system.

10, 11, 12 Image processing system
20 Server
21 Background image database
22, 32a, 32b, 32c, 52a, 52b Control unit
23, 33a, 33b, 33c, 53a, 53b Processing unit
24, 34a, 34b, 34c, 54a, 54b Communication unit
30a, 30b, 30c Encoder
31a, 31b, 31c, 51a, 51b Storage unit
35a, 35b, 35c Camera
41 Division unit
42 Subtraction unit
43 Conversion unit
44 Variable length encoding unit
45, 65 Inverse conversion unit
46, 66 Adder
47, 67 Frame memory
48 Prediction unit
50a, 50b Decoder
55a, 55b Display unit
61 Variable length decoding unit
68 Coupling unit
70 Image encoding device
71, 81 Acquisition unit
72 Encoding unit
80 Image decoding device
82 Decoding unit
90 Image management device

Claims

An image encoding device that encodes a plurality of display target images constituting a video using inter prediction, the image encoding device comprising:
an acquisition unit that acquires a reference-only image, which is an image that differs from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and is used exclusively for reference in the inter prediction; and
an encoding unit that encodes one or more display target images among the plurality of display target images by referring to the reference-only image as a reference image in the inter prediction.

The image encoding device according to claim 1, wherein the acquisition unit acquires the reference-only image that is larger than each of the plurality of display target images.

The image encoding device according to claim 1, wherein the acquisition unit acquires the reference-only image obtained by integrating a plurality of captured images, each of which is an image obtained by image capture.

The image encoding device according to any one of claims 1 to 3, wherein the acquisition unit acquires the reference-only image before the display target image that is first in encoding order among the plurality of display target images is encoded.

The image encoding device according to any one of claims 1 to 4, wherein
the acquisition unit acquires part or all of the reference-only image by receiving part or all of the reference-only image from an image management device, and the encoding unit encodes the one or more display target images by referring to the part or all of the reference-only image thus acquired.

The acquisition unit acquires, as the reference-only image, each of a plurality of reference-only images including a first reference-only image corresponding to a first shooting situation and a second reference-only image corresponding to a second shooting situation,
The encoding unit:
When the shooting situation of the video is the first shooting situation, encodes the one or more display target images by referring to the first reference-only image as the reference-only image, and
When the shooting situation of the video is the second shooting situation, encodes the one or more display target images by referring to the second reference-only image as the reference-only image. The image encoding device according to item 1.

The image encoding device according to any one of claims 1 to 6, wherein
the acquisition unit further updates the reference-only image using one or more reconstructed images among the plurality of reconstructed images of the plurality of display target images, and the encoding unit encodes the one or more display target images by referring to the updated reference-only image.

The image encoding device according to any one of claims 1 to 7, wherein, when encoding an encoding target image among the one or more display target images, the encoding unit converts the reference-only image so that the reference-only image corresponds to the encoding target image, and refers to the converted reference-only image as the reference image.

The image encoding device according to claim 8, wherein the encoding unit scales the reference-only image so that the size of a subject in the reference-only image corresponds to the size of the subject in the encoding target image, and refers to the scaled reference-only image as the reference image.

The image encoding device according to claim 9, wherein the encoding unit scales the reference-only image using shooting information of each of the reference-only image and the encoding target image, or using positions of feature points in each of the reference-only image and the encoding target image.

The image encoding device according to claim 9 or 10, wherein the encoding unit scales the reference-only image according to the accuracy of a motion vector used in the inter prediction.

The image encoding device according to any one of claims 8 to 11, wherein the encoding unit further encodes a conversion parameter that is a parameter used for conversion of the reference-only image.

The encoding unit further encodes an entire vector indicating a position of a region corresponding to the encoding target image of the one or more display target images in the reference-only image. The image encoding device according to item.

The image encoding device according to claim 13, wherein the encoding unit calculates the entire vector using shooting information of each of the reference-only image and the encoding target image, or using positions of feature points in each of the reference-only image and the encoding target image, and encodes the calculated entire vector.

The encoding unit generates, by encoding the one or more display target images, a code string including the one or more display target images separately from a code string including the reference-only image. The image encoding device according to any one of claims.

The image encoding device according to any one of claims 1 to 15, wherein the encoding unit further encodes the reference-only image as a non-display image.

An image decoding device that decodes a plurality of display target images constituting a video using inter prediction, the image decoding device comprising:
an acquisition unit that acquires a reference-only image, which is an image that differs from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and is used exclusively for reference in the inter prediction; and
a decoding unit that decodes one or more display target images among the plurality of display target images by referring to the reference-only image as a reference image in the inter prediction.

An image processing system that encodes and decodes a plurality of display target images constituting a video using inter prediction, the image processing system comprising:
an image management device that acquires a reference-only image, which is an image that differs from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and is used exclusively for reference in the inter prediction;
an image encoding device that encodes the plurality of display target images using the inter prediction; and
an image decoding device that decodes the plurality of display target images using the inter prediction,
wherein the image encoding device includes:
a first acquisition unit that acquires, from the image management device, the reference-only image acquired by the image management device; and
an encoding unit that encodes one or more display target images among the plurality of display target images by referring to the reference-only image acquired by the first acquisition unit as a reference image in the inter prediction, and
the image decoding device includes:
a second acquisition unit that acquires, from the image management device, the reference-only image acquired by the image management device; and
a decoding unit that decodes one or more display target images among the plurality of display target images by referring to the reference-only image acquired by the second acquisition unit as a reference image in the inter prediction.

An image encoding method for encoding a plurality of display target images constituting a video using inter prediction, the image encoding method comprising:
an acquisition step of acquiring a reference-only image, which is an image that differs from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and is used exclusively for reference in the inter prediction; and
an encoding step of encoding one or more display target images among the plurality of display target images by referring to the reference-only image as a reference image in the inter prediction.

An image decoding method for decoding a plurality of display target images constituting a video using inter prediction, the image decoding method comprising:
an acquisition step of acquiring a reference-only image, which is an image that differs from both the plurality of display target images and a plurality of reconstructed images of the plurality of display target images and is used exclusively for reference in the inter prediction; and
a decoding step of decoding one or more display target images among the plurality of display target images by referring to the reference-only image as a reference image in the inter prediction.