JP2017097815A

JP2017097815A - Information processing device, information processing method, and program

Info

Publication number: JP2017097815A
Application number: JP2015232527A
Authority: JP
Inventors: Yoshihiro Ishida (石田 良弘); Masateru Kitago (北郷 正輝)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-11-28
Filing date: 2015-11-28
Publication date: 2017-06-01

Abstract

PROBLEM TO BE SOLVED: To detect a specific subject appearing in a moving image at high speed on the basis of a compressed range image.
SOLUTION: Provided is an information processing device comprising: setting means for setting a warning space for detecting trespassing of a subject in a monitoring space; acquisition means for acquiring compressed data that is obtained by encoding a range image in an encoding format in which encoding per pixel and encoding per multiple pixels are mixed; estimation means for estimating whether a codeword in the compressed data that is to be decoded has been encoded once for each pixel or encoded once for multiple pixels; and determination means which, when encoding per pixel is estimated by the estimation means, decodes the codeword to be decoded, and for one pixel having distance information, determines whether there is a change to the distance information in the warning space, and when encoding per multiple pixels is estimated by the estimation means, decodes the codeword to be decoded, and for multiple pixels having distance information, determines whether there is a change to the distance information in the warning space.
SELECTED DRAWING: Figure 3

Description

The present invention relates to a method for detecting a specific subject in a moving image based on information about the distance to the subject.

Conventionally, there is known a system that uses an imaging device called a network camera to determine, from captured video, whether a suspicious person has entered the space being imaged. One way to detect such a specific subject is to use a distance image, in which the distance from the imaging device to the subject is measured. Patent Document 1 discloses a method that, when an imaging device monitors moving objects, determines whether a moving object lies within a predetermined range from the imaging device, based on the distance to the detected moving object and the distance of the moving object to be detected. In addition, an environment recognition device has been proposed that accurately recognizes significant information in the imaged environment by using video obtained from a plurality of cameras together with distance images obtained by configuring those cameras as stereo cameras (Patent Document 2).

Video obtained by imaging is often handled as compressed data, both to reduce the amount of data transmitted over a network to other information processing devices and to reduce the amount of data stored in memory. From the same point of view, it is desirable to handle distance images, which record the distance to the subject as seen from the imaging device, as compressed data as well. 3D Video Coding (hereinafter, 3DV) is being established as a standard technique for encoding distance images. 3DV generates distance images for high-quality free-viewpoint video synthesis. However, like RGB image encoding techniques such as H.264, it performs a frequency transform, so large degradation tends to occur near the edges of the distance image. A compression method that causes large degradation in some pixels can be a serious problem depending on how the distance image is used. For example, if the method disclosed in Patent Document 2 uses distance information compressed by such a method, the degradation caused by compression makes it difficult to accurately recognize significant information in the imaged environment. For distance images, it is therefore desirable to use an encoding method that can limit the maximum distortion (error) of each pixel value caused by encoding to an arbitrary value, as the known compression method called JPEG-LS (Non-Patent Document 1) does.
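The bounded-distortion property mentioned above can be illustrated with a short sketch. It assumes the near-lossless scheme of JPEG-LS, in which a prediction error is quantized with a step of 2*NEAR+1 so that the reconstruction differs from the original by at most NEAR. The function names and the parameter name `near` are chosen here for illustration and do not appear in the patent text.

```python
def quantize_error(err: int, near: int) -> int:
    """Quantize a prediction error with step 2*near+1 (near = maximum allowed error)."""
    step = 2 * near + 1
    if err >= 0:
        return (err + near) // step
    return -((near - err) // step)


def dequantize_error(q: int, near: int) -> int:
    """Reconstruct the (approximate) prediction error from its quantized value."""
    return q * (2 * near + 1)


if __name__ == "__main__":
    near = 2
    # Largest reconstruction error over a range of prediction errors.
    worst = max(
        abs(e - dequantize_error(quantize_error(e, near), near))
        for e in range(-255, 256)
    )
    print("maximum reconstruction error:", worst)
```

With `near = 0` the quantizer is the identity and the coding is lossless; larger values trade accuracy for compression while keeping the per-pixel error bounded.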

Patent Document 1: JP 11-196320 A
Patent Document 2: JP 2010-224918 A

Non-Patent Document 1: The LOCO-I Lossless Image Compression Algorithm: Principles and Standardization into JPEG-LS, IEEE Transactions on Image Processing, Vol. 9, No. 8, August 2000

To detect a specific subject by referring to compressed data obtained by encoding a distance image, the compressed data must first be decoded. Detecting the intrusion of a subject only afterwards, based on the fully decoded distance image, takes time. High-speed processing is required in particular when the frame images constituting a moving image are input sequentially and the intrusion of a subject is to be detected.

Accordingly, an object of the present invention is to detect a specific subject appearing in a moving image quickly and more appropriately, based on a compressed distance image.

To solve the above problem, an information processing apparatus according to the present invention comprises: setting means for setting, for a monitoring space in which distance information from an imaging device to a subject is obtained by imaging with the imaging device, a distance image of the initial state of the monitoring space and a warning space in which the intrusion of a subject is to be detected; acquisition means for acquiring compressed data obtained by encoding a distance image consisting of per-pixel distance information in an encoding format that mixes encoding per pixel and encoding per plurality of pixels; estimation means for estimating whether a codeword to be decoded in the compressed data is a codeword encoded per pixel or a codeword encoded per plurality of pixels; and determination means for decoding the codeword to be decoded to obtain distance information and determining, according to the result estimated by the estimation means and for the pixel positions of that distance information, whether there is a change in the distance information in the warning space.
When the estimation means estimates that the codeword was encoded per pixel, the determination means determines, for the one pixel whose distance information is obtained by decoding the codeword, whether there is a change in the distance information in the warning space, based on that distance information and the initial state of the monitoring space. When the estimation means estimates that the codeword was encoded per plurality of pixels, the determination means determines, for the plurality of pixels whose distance information is obtained by decoding the codeword, whether there is a change in the distance information in the warning space, based on that distance information and the initial state of the monitoring space.
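The per-pixel versus per-run determination summarized above can be sketched as follows. This is only an illustration under stated assumptions: decoded codewords are modeled as hypothetical tuples `("pixel", distance)` or `("run", distance, length)`, the warning space is modeled as a per-position boolean mask on one line, and a "change" is taken to be a difference from the initial-state distance larger than a tolerance. None of these data shapes or names come from the patent text.

```python
from typing import List, Sequence, Tuple


def detect_changes(
    symbols: Sequence[Tuple],
    baseline: Sequence[int],
    in_warning: Sequence[bool],
    tolerance: int = 0,
) -> List[int]:
    """Return positions on one line whose distance changed inside the warning space.

    symbols    : decoded codewords, ("pixel", d) or ("run", d, length)  (hypothetical)
    baseline   : initial-state distances for the same line
    in_warning : True where the position belongs to the warning space
    """
    hits: List[int] = []
    pos = 0
    for sym in symbols:
        if sym[0] == "pixel":
            value, count = sym[1], 1       # one codeword covers one pixel
        else:
            value, count = sym[1], sym[2]  # one codeword covers a run of pixels
        # The codeword is decoded once; the same value is checked at every
        # position it covers.
        for i in range(pos, pos + count):
            if in_warning[i] and abs(value - baseline[i]) > tolerance:
                hits.append(i)
        pos += count
    return hits


if __name__ == "__main__":
    baseline = [10, 10, 10, 10, 10, 10]
    in_warning = [False, True, True, True, True, False]
    symbols = [("pixel", 10), ("run", 7, 3), ("pixel", 10), ("pixel", 4)]
    print(detect_changes(symbols, baseline, in_warning))
```

In this sketch a run codeword is decoded once and its value reused for every pixel it covers, which is where the time saving over pixel-by-pixel decoding would come from.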

According to the present invention, a specific subject appearing in a moving image can be detected quickly and appropriately based on a compressed distance image.

FIG. 1 is a diagram showing an example of a monitoring space, a monitoring target, and imaging devices.
FIG. 2 is a diagram showing an example of a warning space in the monitoring space of FIG. 1.
FIG. 3 is a block diagram showing the functional configuration of the information processing apparatus 3.
FIG. 4 is a diagram explaining a moving image and distance images.
FIG. 5 is a diagram showing a configuration example of a computer system.
FIG. 6 is a flowchart of the processing when the functions of the information processing apparatus are realized on a computer system.
FIG. 7 is a flowchart showing details of the detection processing.
FIG. 8 is a diagram showing a configuration example of an apparatus that generates the moving image and distance image data input to the information processing apparatus.
FIG. 9 is a diagram showing a configuration example of the distance image encoding unit 880 in FIG. 8.
FIG. 10 is a diagram explaining Golomb-Rice encoding.
FIG. 11 is a diagram showing the positional relationship between a pixel of interest and its surrounding pixels.
FIG. 12 is a diagram showing the relationship between the Golomb-Rice encoding parameter k, the integer value MV, and the codeword.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. The embodiments described below are merely examples of concrete implementations of the present invention, and the present invention is not limited to the illustrated configurations.

<First embodiment>
The first embodiment describes an information processing apparatus that determines whether any subject is approaching a monitoring target in a moving image obtained by an imaging device imaging the space in which the monitoring target is placed. The imaging device images the monitoring space continuously to obtain a moving image. Here, a moving image is data consisting of a plurality of time-series frame images. The imaging device applies predetermined image processing and encoding to the imaging result, and transfers the compressed data of the moving image and the compressed data of the corresponding distance image sequence to the information processing apparatus. The information processing apparatus is a monitoring apparatus located away from the monitoring space and the imaging device; it analyzes the transferred compressed data and detects intruders into the warning space.

In FIG. 1(a), a monitoring target 12 is placed in a monitoring space 11. Two imaging devices, a main camera 100 and a sub camera 101, capture moving images of this monitoring space. When the main camera 100 and the sub camera 101 image the monitoring space 11, two image sequences are obtained, as shown in FIG. 2. One is a moving image consisting of a sequence of frame images, each of which is color image data represented in RGB. The data obtained from the main camera 100 is used as the moving image. The other is a distance image sequence, consisting of one distance image for each frame image of the moving image. Each distance image holds, for each pixel, the distance from the camera position to the subject in the corresponding frame image. Each distance image is calculated by a known method, treating the corresponding frames of the main camera 100 and the sub camera 101 as two stereo images with different parallax. Thus each frame image of the moving image has a corresponding distance image, and each pixel position in a frame image is associated with the same pixel position in its distance image.

As shown in FIG. 1(b), a camera 102 and a range sensor 103 may instead be combined to generate the distance images. In this case, the video obtained from the camera 102 is used as the moving image, and the ranging data obtained from the range sensor 103 is used as the distance image sequence.

The moving image and the distance image sequence obtained by the cameras are each encoded in a predetermined compression format and converted into compressed data of a predetermined format. FIG. 8 is a block diagram showing the logical configuration of an image processing apparatus that generates compressed data of a predetermined format from the moving images obtained from the imaging devices. In the figure, the main camera 100 and the sub camera 101 are the cameras shown in FIG. 1(a). Each camera outputs a moving image obtained by imaging the monitoring space 11 containing the monitoring target 12. The image acquisition unit 830 acquires the moving images, each consisting of a plurality of frame images, output from the main camera 100 and the sub camera 101, and outputs them frame by frame to the image processing unit 840. The image processing unit 840 treats the frame image captured by the main camera 100 and the frame image captured by the sub camera 101 as a stereo pair and generates the distance image corresponding to the frame image. Because a distance image is generated for every frame, the distance images are also output as a distance image sequence corresponding to the moving image. The frame image sequence generated by the image processing unit 840 is temporarily held in the moving image memory 850, and the distance image sequence is temporarily held in the distance image memory 860.

The moving image encoding unit 870 encodes the frame image sequence held in the moving image memory 850, converts it into compressed data, and outputs it to the output unit 890. The distance image encoding unit 880 encodes each distance image of the distance image sequence held in the distance image memory 860, converts it into compressed data, and outputs it to the output unit 890. The output unit 890 outputs the encoded frame image sequence obtained from the moving image encoding unit 870, through a communication line or the like (not shown), to the information processing apparatus that detects intruders in this embodiment. Similarly, the output unit 890 outputs the encoded distance image sequence obtained from the distance image encoding unit 880 to the information processing apparatus through a communication line or the like (not shown).

Next, a configuration example of the distance image encoding unit 880 is described with reference to FIG. 9. As noted above, using pixel values whose distance information has been badly degraded by compression can reduce the accuracy of determinations based on the distance image. It is therefore desirable to use an encoding method that can limit the maximum distortion of each pixel value caused by compression to an arbitrary value. In this embodiment, the distance image encoding unit 880 encodes distance images using the known encoding method called JPEG-LS. JPEG-LS is an encoding format that mixes two methods: one encodes the pixel value of each target pixel individually, using a technique called predictive coding, and the other encodes a sequence of consecutive pixels having the same pixel value, using a technique called run-length coding.

FIG. 9 is a block diagram showing the logical configuration of the distance image encoding unit 880 in more detail. The distance image input unit 910 sequentially inputs distance images from the distance image sequence held in the distance image memory 860. The distance image input unit 910 passes the pixel values of the pixels constituting a distance image to the mode selection unit 920 line by line, in raster scan order. The mode selection unit 920 analyzes each line of pixel values received from the distance image input unit 910 sequentially, starting from the first pixel value of the line, and determines whether each pixel should be run-length encoded. The pixel value of a pixel determined to require run-length encoding is output to the run-length encoding unit 940; otherwise, the pixel value is output to the Golomb encoding unit 930. Under the control of the mode selection unit 920, the Golomb encoding unit 930 generates a codeword by Golomb-Rice encoding the pixel individually. The run-length encoding unit 940 generates a codeword that encodes the pixel as part of a group of pixels on the same line, including its adjacent pixels. The code generation unit 950 combines the codewords generated by the Golomb encoding unit 930 with those generated by the run-length encoding unit 940 to produce the codeword sequence for one line of pixel values, and outputs it to the code output unit 960. From the one-line codeword sequences output by the code generation unit 950, the code output unit 960 sequentially forms the code data of the distance image, which consists of a plurality of lines arranged in raster scan order, and outputs it to the output unit 890.

An example in which the distance image encoding unit 880 described above is realized by software processing on a computer system is described with reference to the hardware configuration diagram of the computer system in FIG. 5 and the flowchart in FIG. 10. The CPU 501 functions as a central processing unit and controls the operation of the entire image forming system using input data and computer programs stored in the RAM 502 and the ROM 503 described below. The RAM 502 has a storage area for temporarily storing computer programs and data read from the external storage device 507 and data received from outside via the I/F unit 509. The ROM 503 is a read-only random-access memory that stores setting parameters for configuring each unit of the image forming system, a boot program, and the like. The keyboard 504 and the mouse 505 are input devices that accept instructions from the user. The display device 506, typified by a liquid crystal display, displays digital images. The external storage device 507, typified by a hard disk, is a large-capacity external storage device. The storage medium drive 508 drives a removable storage medium. The I/F unit 509 is an interface for exchanging data with external devices. All of the above units are connected to the bus 510 and exchange data via the bus 510.

FIG. 10 is a flowchart of the distance image encoding. The distance image encoding unit 880 can be realized by storing a program describing the processing flow of FIG. 10 in the RAM 502 or the ROM 503 in advance and having the CPU 501 execute it. When processing starts, in step S101 the distance image input unit 910 identifies the first distance image in time order in the distance image sequence as the first frame, and sets the topmost line of that distance image as the first line. Next, in step S102, the distance image input unit 910 determines whether the last line of the distance image being processed has been processed. If it has, the process proceeds to step S103; otherwise, it proceeds to step S105.

In step S103, it is determined whether the last frame in time order in the distance image sequence has been processed. If it has, the distance image encoding ends; otherwise, the process proceeds to step S104. In step S104, the distance image input unit 910 sets the processing target to the first line of the distance image corresponding to the next frame, and the process returns to step S102. In step S105, the distance image input unit 910 inputs the pixel values of the pixels on the line being processed, and the process proceeds to step S106. Pixel values of the distance image are input line by line and encoded sequentially in raster scan order.

In step S106, the mode selection unit 920 sets the first pixel of the pixel value sequence input in step S105 as the pixel of interest, and the process proceeds to step S107. In step S107, the mode selection unit 920 selects the encoding mode for the pixel of interest. There are two encoding modes: a pixel encoding mode and a run-length encoding mode. The selection in step S107 is explained with reference to FIG. 11, which shows the positional relationship between the pixel of interest and its surrounding pixels. Here, pixel x is the pixel of interest, that is, the pixel to be encoded. Because pixels are encoded in raster scan order, the left pixel a, the upper-left pixel c, the upper pixel b, and the upper-right pixel d have already been encoded. The run length RL is initially set to 0. The mode selection unit 920 selects the run-length mode when the run length RL is not 0 or when the surrounding pixels satisfy the run-length encoding condition. The run-length encoding condition is that the pixel values of the left pixel a, the upper-left pixel c, the upper pixel b, and the upper-right pixel d satisfy a = c, c = b, and b = d. Therefore, when the surrounding pixel values satisfy a = c, c = b, and b = d, the mode selection unit 920 selects the run-length mode. In all other cases, the mode selection unit 920 selects the pixel encoding mode. In step S108, the mode selection unit 920 determines whether the encoding mode was set to the run-length mode in step S107. If so, the process proceeds to step S109; otherwise, it proceeds to step S110.
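The mode selection rule of step S107 can be written compactly. The following is a minimal sketch of that rule only; the function name and the string labels for the two modes are chosen here for illustration.

```python
def select_mode(a: int, b: int, c: int, d: int, run_length: int) -> str:
    """Choose the encoding mode for the pixel of interest x.

    a, b, c, d : values of the left, upper, upper-left and upper-right pixels
    run_length : current run length RL (0 when no run is in progress)
    """
    # Run-length mode if a run is already in progress, or if all four
    # previously encoded neighbours have the same value.
    if run_length != 0 or (a == c and c == b and b == d):
        return "run"
    return "pixel"


if __name__ == "__main__":
    print(select_mode(5, 5, 5, 5, 0))  # flat neighbourhood
    print(select_mode(5, 6, 5, 5, 0))  # neighbours differ, no run in progress
    print(select_mode(1, 2, 3, 4, 2))  # run already in progress
```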

In step S109, the run-length encoding unit 940 performs run-length encoding. When the pixel of interest x shown in FIG. 11 has the same pixel value as the left pixel a, the run length RL is incremented by 1 and the process proceeds to step S112. When the pixel of interest x has a pixel value different from that of the left pixel a, the run length RL accumulated so far is Huffman-coded, RL is reset to 0, and the process proceeds to step S112. Run-length encoding is a known technique, and the processing is the same as in JPEG-LS, for example, so a detailed description is omitted.
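The run-length update of step S109 can be sketched as a single state transition. This sketch only tracks the run length and reports when a run ends; it returns the finished run length as a plain integer instead of entropy-coding it, and it does not model how the pixel that breaks the run is itself encoded, since the text above does not detail that. The function name is hypothetical.

```python
from typing import Optional, Tuple


def run_length_step(x: int, a: int, run_length: int) -> Tuple[int, Optional[int]]:
    """Advance the run-length state for the pixel of interest x.

    Returns (new_run_length, finished_run): finished_run is None while the
    run continues, or the length of the run that has just ended.
    """
    if x == a:
        # Same value as the left pixel: the run continues.
        return run_length + 1, None
    # Different value: the run ends; report its length and reset RL to 0.
    return 0, run_length


if __name__ == "__main__":
    print(run_length_step(7, 7, 2))  # run continues
    print(run_length_step(8, 7, 3))  # run ends
```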

In step S110, the Golomb encoding unit 930 applies a predictive transform to the pixel value of the pixel. MED (Median Edge Detection) prediction is used for the predictive transform. The prediction formula using the surrounding pixels of FIG. 11 is given by Formula (1).

Here, the prediction error of the pixel of interest is given by Formula (2).
Prediction error (Diff) = pixel value of pixel of interest x - predicted value p   (2)
When the predictive transform is completed, the process proceeds to step S111.
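Formula (1) is not reproduced in this text. As a sketch, the following assumes the standard MED predictor of LOCO-I/JPEG-LS, which uses the left pixel a, the upper pixel b, and the upper-left pixel c of FIG. 11; whether the patent's Formula (1) is identical in every detail cannot be confirmed from the text. The prediction error follows Formula (2). Function names are chosen here for illustration.

```python
def med_predict(a: int, b: int, c: int) -> int:
    """Standard MED (Median Edge Detection) predictor, assumed here.

    a: left pixel, b: upper pixel, c: upper-left pixel.
    """
    if c >= max(a, b):
        return min(a, b)   # edge detected: take the smaller neighbour
    if c <= min(a, b):
        return max(a, b)   # edge detected: take the larger neighbour
    return a + b - c       # smooth region: planar prediction


def prediction_error(x: int, a: int, b: int, c: int) -> int:
    """Formula (2): Diff = pixel value of x - predicted value p."""
    return x - med_predict(a, b, c)


if __name__ == "__main__":
    print(med_predict(10, 20, 25))
    print(med_predict(10, 20, 5))
    print(med_predict(10, 20, 15))
    print(prediction_error(17, 10, 20, 15))
```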

In step S111, the Golomb encoding unit 930 Golomb-Rice encodes the prediction error of the pixel of interest calculated in step S110. First, the prediction error (Diff) is converted into a non-negative integer value (MV). The conversion is given by Formula (3).
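Formula (3) is not reproduced in this text. One common way to map a signed prediction error to a non-negative integer, used in JPEG-LS, interleaves non-negative and negative values (0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ...). The sketch below assumes that mapping; the patent's Formula (3) may differ, and the function names are hypothetical.

```python
def map_error(diff: int) -> int:
    """Map a signed prediction error to a non-negative integer MV (assumed mapping)."""
    return 2 * diff if diff >= 0 else -2 * diff - 1


def unmap_error(mv: int) -> int:
    """Inverse of map_error: recover the signed prediction error from MV."""
    return mv // 2 if mv % 2 == 0 else -((mv + 1) // 2)


if __name__ == "__main__":
    for diff in (0, -1, 1, -2, 2):
        mv = map_error(diff)
        print(diff, "->", mv, "->", unmap_error(mv))
```

Small errors of either sign map to small values of MV, which keeps the Golomb-Rice codewords short when prediction is accurate.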

Next, the non-negative integer value (MV) is Golomb-Rice encoded using a parameter k. The Golomb-Rice encoding procedure is as follows.
(1) Express MV in binary, output as many 0s as the value obtained by shifting MV right by k bits, and then append a 1.
(2) After the bits from (1), append the lower k bits of MV.
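The two-step procedure above can be sketched as follows. Codewords are represented as strings of '0' and '1' for readability; a real encoder would pack bits. The decoder is included only to show that the code is uniquely decodable when both sides use the same k. Function names are chosen here for illustration.

```python
from typing import Tuple


def golomb_rice_encode(mv: int, k: int) -> str:
    """Encode a non-negative integer mv with Golomb-Rice parameter k."""
    quotient = mv >> k
    prefix = "0" * quotient + "1"                      # step (1): unary part
    suffix = format(mv & ((1 << k) - 1), f"0{k}b") if k > 0 else ""  # step (2)
    return prefix + suffix


def golomb_rice_decode(bits: str, k: int) -> Tuple[int, int]:
    """Decode one codeword from the start of bits; return (mv, bits consumed)."""
    quotient = bits.index("1")                         # count leading zeros
    pos = quotient + 1
    remainder = int(bits[pos:pos + k], 2) if k > 0 else 0
    return (quotient << k) | remainder, pos + k


if __name__ == "__main__":
    for k in (0, 1, 2):
        for mv in (0, 3, 5):
            code = golomb_rice_encode(mv, k)
            print(f"k={k} MV={mv} codeword={code} decoded={golomb_rice_decode(code, k)[0]}")
```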

FIG. 12 shows the relationship between the Golomb-Rice encoding parameter k, the non-negative integer value (MV), and the codeword. The structure of the Golomb-Rice code is not limited to this one. For example, the roles of 0 and 1 may be swapped, or parts (1) and (2) of the above procedure may be output in the opposite order. The method of determining the encoding parameter k is not specified here; it is sufficient that the encoding side and the decoding side can use the same parameter. When step S111 is completed, the process proceeds to step S112.

In step S112, it is determined whether encoding has been completed up to the last pixel of the line being processed. If it has, the process proceeds to step S114; otherwise, it proceeds to step S113. In step S113, the processing target is set to the next pixel on the same line, and the process returns to step S107. In step S114, the codewords obtained from the Golomb encoding unit 930 and the codewords obtained from the run-length encoding unit 940 are combined to generate the codeword sequence for the line being processed. Next, in step S115, the distance image input unit 910 sets the processing target to the first pixel of the next line, and the process returns to step S102. This concludes the example of realizing the distance image encoding unit 880 by software processing on a computer system.

The configuration of the distance information encoding unit 880 described above can also be applied to the moving image encoding unit 870. In the present embodiment, the moving image encoding unit 870 in FIG. 8 likewise encodes the frame image sequence held in the moving image memory 850 as a monochrome image for each color component, each frame being a color image represented by a plurality of components in a known color space such as RGB or YCbCr. The encoded compressed data is then output to the output unit 890.

The information processing apparatus according to the present embodiment is described next. FIG. 3 is a block diagram showing the logical configuration of the information processing apparatus 3. The information processing apparatus 3 includes a moving image acquisition unit 301, a distance image acquisition unit 302, an initial space calculation unit 303, a warning space setting unit 304, a mode estimation unit 305, a coordinate calculation unit 306, a determination unit 307, and a warning control unit 308. The moving image acquisition unit 301 sequentially acquires compressed data obtained by encoding a moving image (frame image sequence), and the distance image acquisition unit 302 sequentially acquires compressed data obtained by encoding distance images. A frame image of the moving image and its corresponding distance image are always acquired together. The initial space calculation unit 303 calculates the distance image of the monitored space in its normal state. Here, the first frame of the acquired compressed distance-image data is decoded as the initial state (no intruding object), and the distance information of each pixel is calculated as the initial state.

The warning space setting unit 304 lets the user designate, and accepts, the region of the monitored space shown in the captured moving image in which intrusion of an object is to be monitored. First, the frame image of the initial state (here, the first frame image) of the moving image held in the moving image memory 850 is displayed on the display device 506. Next, the user designates the region to be guarded on the displayed image using a pointing device 505 such as a mouse. FIG. 4(a) shows the displayed image 11, in which designated points 41 to 44 are the four points, input by the user, that define the monitoring region. The warning space setting unit 304 sets a warning space in three-dimensional space based on the region designated by the user. FIG. 4(b) shows an example in which the space represented by a quadrangular prism whose base is the user-designated region 40 is set as the warning space 45. The warning space setting unit 304 expresses the three-dimensional position (three-dimensional coordinates) of the warning space, generated from the positions of points in the distance image corresponding to the frame image and the pixel values (distance information) at those positions, and outputs it as warning space information.

Three-dimensional data can be obtained from a distance image by, for example, the method disclosed in Japanese Patent No. 3823559. The two-dimensional position of each pixel in the distance image (horizontal position x and vertical position y on the image) represents a direction within the angle of view as seen from the ranging point (observation point). The pixel value of each pixel is a depth value (z) representing the distance from the ranging point (observation point) to the subject in the direction of that pixel position. The known method above is written as a computer-executable program and executed on a computer system such as that shown in FIG. 5. As a result, the warning space setting unit 304 can generate, as the warning space information, the vertices of the base that defines the boundary of the warning space (quadrangular prism) in three-dimensional space.
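The mapping from a pixel position and depth value to a three-dimensional point can be sketched as follows. This is only an illustration under a simple pinhole camera model; the intrinsic parameters `fx`, `fy`, `cx`, `cy` and the function name are hypothetical, the pixel value is assumed to be depth along the optical axis, and the method of the cited patent may differ.

```python
from typing import NamedTuple, Tuple


class Intrinsics(NamedTuple):
    """Hypothetical pinhole parameters of the ranging camera (pixel units)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def pixel_to_3d(x: float, y: float, z: float,
                cam: Intrinsics) -> Tuple[float, float, float]:
    """Back-project pixel (x, y) with depth z into camera coordinates.

    The pixel position fixes the viewing direction and z scales along it.
    """
    X = (x - cam.cx) * z / cam.fx
    Y = (y - cam.cy) * z / cam.fy
    return (X, Y, z)
```

For a fixed depth z, X and Y are linear in the pixel coordinates, so the pixels of a constant-depth run map to evenly spaced points on a straight segment in space.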

The mode estimation unit 305 estimates whether each codeword in the acquired compressed distance-image data was encoded in the run-length mode, which encodes a plurality of pixels at a time, or in the pixel encoding mode, which encodes one pixel at a time. The coordinate calculation unit 306 generates three-dimensional coordinates in real space for pixels in the distance image. The determination unit 307 refers to the distance image and, according to the encoding mode in which the distance image was compressed, determines whether the distance information of the monitored space has changed from the initial state by more than a predetermined condition. When it determines that the distance information has changed, the determination unit 307 regards a subject as having entered the warning space. In this way, whether a subject has entered the warning space designated by the warning space setting unit 304 is determined continually.

An example in which the functions of the information processing apparatus shown in FIG. 3 are realized by software processing on a computer system configured as shown in FIG. 5 is described with reference to the flowchart of FIG. 6. Here, as with the distance information encoding unit 880 described above, the functions of the information processing apparatus shown in FIG. 3 are realized by software processing on another computer system similar to the one described earlier with reference to FIG. 5.

In FIG. 6, when processing starts, in step S601 the moving image acquisition unit 301 acquires image data capturing the initial state of the monitored space, in order to calculate the initial settings of the monitored space containing the monitoring target. Here, the compressed data of the first frame of the frame image sequence constituting the moving image is input. The image data input at this time captures a state in which no subject to be guarded against has entered the monitored space.

In step S602, the distance image acquisition unit 302 acquires the compressed data of the distance image corresponding to the frame image input by the moving image acquisition unit 301 in step S601. In step S603, the initial space calculation unit 303 decodes the compressed data acquired in step S601 and displays it on the display device 506, which allows the user to check the initial state of the monitored space. The initial space calculation unit 303 also decodes the compressed distance-image data acquired in step S602 and calculates the distance information of each pixel in the initial state.

In step S604, the warning space setting unit 304 sets a warning space in the monitored space in accordance with a user instruction. The warning space is the space in which intrusion of a subject not present in the initial state is to be detected. As described above, the warning space setting unit 304 in the present embodiment generates, as three-dimensional position (three-dimensional coordinate value) information, the vertices of the quadrangular prism whose base is defined by the designated points 41 to 44 shown in FIG. 4(b). Following the example of FIG. 4, the designated points 41 to 44 in FIG. 4(a) are first connected by four line segments forming a closed loop (point 41 to point 42, point 42 to point 43, point 43 to point 44, and point 44 to point 41) to form a quadrilateral. In general, because of measurement error, calculation error, and the like, the coordinate values of the four points 41 to 44 may not lie on a single plane as they stand. The description here assumes that the designated points 41 to 44 should lie on the same plane, that they are corrected if necessary, and that the corrected values are used as the coordinate values of points 41 to 44. As the correction method, the floor surface is estimated from the three-dimensional point cloud obtained from the distance image by a known method such as that disclosed in Japanese Unexamined Patent Application Publication No. 2007-271408, and the coordinate values of the designated points 41 to 44 are corrected on the assumption that each point lies on this floor surface. Alternatively, for example, the coordinate values of point 44 may be corrected so that point 44 also lies on the plane defined by the three points 41 to 43. The above publication also discloses conversion to a coordinate system in which the floor surface estimated from the three-dimensional point cloud is taken as the XY plane and the direction orthogonal to it as the Z axis. In the present embodiment, a representation converted to such a coordinate system may be used, with the floor surface identified by a known method such as estimation from the three-dimensional point cloud obtained from the distance image. In this way, information defining a prism composed of a base, which gives the boundary of the space to be guarded, and a group of planes orthogonal to the base is generated as warning space information. The prism is represented by, for example, the three-dimensional coordinate values of its vertices, the planes that make up the prism, and information on how the vertices of those planes are connected. The generated information is held in an area of the RAM 502 (not shown) separate from other information, and the processing of step S604 ends.
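The alternative correction mentioned above, moving point 44 onto the plane defined by points 41 to 43, can be sketched as an orthogonal projection. The helper names are hypothetical, and orthogonal projection is one plausible way to perform the correction rather than a method the description prescribes.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def project_onto_plane(p: Vec3, a: Vec3, b: Vec3, c: Vec3) -> Vec3:
    """Return the point on the plane through a, b, c that is closest to p."""
    normal = cross(sub(b, a), sub(c, a))
    norm_sq = dot(normal, normal)
    if norm_sq == 0.0:
        raise ValueError("a, b, c are collinear and do not define a plane")
    # Signed offset of p from the plane, in units of the (unnormalized) normal.
    k = dot(sub(p, a), normal) / norm_sq
    return (p[0] - k * normal[0], p[1] - k * normal[1], p[2] - k * normal[2])
```

For example, with points 41 to 43 on the plane z = 0, a slightly raised point 44 such as (0, 1, 0.05) is moved to (0, 1, 0).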

In step S605, the moving image acquisition unit 301 sequentially inputs, frame by frame as a live moving image, the moving image captured while the monitored space is being monitored. The moving image acquisition unit 301 decodes the encoded data to generate a frame image and holds it in an area of the RAM 502 (not shown) separate from other information. In step S606, one distance image of the distance image sequence, corresponding to the frame image input in step S605, is input and held in a memory area of the RAM 502 (not shown) separate from other information. As described above, the compressed distance-image data is generated by a compression method that mixes pixel-by-pixel encoding with run-by-run encoding, where a run consists of consecutive pixels on the same line that have the same pixel value.

In step S607, the monitored space at the time the input frame image was captured is analyzed to determine whether an object has intruded into the warning space. FIG. 7 is a detailed flowchart of the intrusion determination process of step S607.

First, in step S701, the mode estimation unit 305 estimates, from the pixel values of the pixels in the already decoded region, whether the data (codeword) currently to be decoded was encoded in the pixel encoding mode or in the run-length encoding mode. This estimation is explained with FIG. 11, the same figure used to explain how the encoding mode is selected. For the pixel of interest x, the surrounding pixels (left pixel a, upper-left pixel c, upper pixel b, and upper-right pixel d) have already been decoded. Using the pixel values of these decoded pixels, the same determination as in step S107 of FIG. 10, described for the encoding side, is performed. This makes it possible to estimate in which processing mode the position of interest x was encoded.
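The idea that the decoder repeats the encoder's mode decision from already decoded neighbors can be sketched as below. The criterion used here, that all four neighbors a, b, c, and d are equal, is an assumption borrowed from JPEG-LS-style coders; the actual criterion is whatever step S107 defines, and the function and mode names are hypothetical.

```python
def estimate_mode(a: int, b: int, c: int, d: int) -> str:
    """Guess the encoding mode of the pixel of interest from decoded neighbors.

    a: left, b: upper, c: upper-left, d: upper-right pixel values.
    Assumed rule: a flat neighborhood implies run-length mode.
    """
    if a == b == c == d:
        return "run_length"
    return "pixel"
```

Because the rule depends only on pixels that both sides already know, no mode flag has to be stored in the compressed data.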

If the estimation result in step S702 is the pixel encoding mode, the process proceeds to step S703; otherwise, it proceeds to step S704. In step S704, the code data of interest is decoded as having been encoded in the run-length encoding mode, and the length of the run is identified. In step S705, the coordinate calculation unit 306 obtains, from the run length obtained in step S704 and the position of the pixel of interest, the pixel positions in the distance image of both end points of the run. Combining these with the pixel values of the pixels at both end points, it obtains the three-dimensional coordinate values of both end points of the run by the known method described above.

In step S703, on the other hand, the coordinate calculation unit 306 decodes the data at the pixel position of interest as having been encoded in the pixel encoding mode. From the position of the pixel of interest in the distance image and the pixel value obtained by decoding, it obtains the three-dimensional coordinate values of the pixel position of interest by the known method described above.

Next, in step S721, when the pixel encoding mode has been estimated, the determination unit 307 performs intrusion detection for the pixel whose data was decoded in the pixel encoding mode. From the three-dimensional coordinates of the pixel of interest obtained in step S703 and the warning space information generated in step S604, it determines whether an intruding object is present at the pixel of interest. If the point given by the pixel value (distance information) of the pixel of interest lies, in three-dimensional space, inside the space defined by the warning space information, the process proceeds to step S707. If it does not lie inside that space, the process proceeds to step S716. In step S707, the determination unit 307 holds information indicating that the pixel of interest is in the warning space in an area of the RAM 502 (not shown) separate from other information.

From step S708 onward, the presence or absence of an intruding object is determined for data estimated in step S701 to have been encoded in the run-length encoding mode. First, in step S708, for the left end point of the decoded run, it is determined from the three-dimensional coordinates of the left end point and the warning space information whether the left end point lies inside the warning space in three-dimensional space. If the left end point is determined to be inside the warning space, the process proceeds to step S710. If the left end point is not inside the warning space, the process proceeds to step S709. In step S709, the determination unit 307 determines whether the right end point of the run is inside the warning space. If it is determined in step S709 that the right end point is not inside the warning space, neither end point of the run is inside the warning space, so all pixels on the run are regarded as being outside the warning space, and the process proceeds to step S716.

If the determination unit 307 determines in step S709 that the left end point of the run is not inside the warning space but the right end point is, it determines that part of the run between the right end point and the left end point is inside the warning space, and the process proceeds to step S714. In step S714, the determination unit 307 calculates the intersection of the run with a plane defining the warning space, and the process proceeds to step S715. In step S715, each pixel on the run from the intersection obtained in step S714 to the right end point of the run is determined to be inside the warning space. Information indicating that this series of pixels on the run is inside the warning space is held in an area of the RAM 502 (not shown) separate from other information. In step S710, when the left end point of the run is inside the warning space, the determination unit 307 determines whether the right end point of the run is also inside the warning space. If the right end point is also inside the warning space, all pixels on the run are determined to be inside the warning space, and the process proceeds to step S713. If the right end point is determined not to be inside the warning space, the process proceeds to step S711. In step S713, information indicating that all pixels on the run, from its left end point to its right end point, are inside the warning space is held in an area of the RAM 502 (not shown) separate from other information.

In step S711, the intersection of the run with a plane defining the warning space is calculated, and the process proceeds to step S712. In step S712, each pixel on the run from its left end point to the intersection obtained in step S711 is determined to be inside the warning space. Information indicating that this series of pixels on the run is inside the warning space is held in an area of the RAM 502 (not shown) separate from other information, and the processing of step S712 ends.
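The end-point logic of steps S708 to S715 can be sketched as follows. The sketch assumes the warning space is convex and given as half-spaces (n, d) with a point p inside when n·p <= d, and that the run's pixels are evenly spaced on the straight segment between its two end points; these representations and all names are assumptions for illustration.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[Vec3, float]  # (normal n, offset d); inside when n . p <= d


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def inside(p: Vec3, planes: List[Plane]) -> bool:
    return all(dot(n, p) <= d for n, d in planes)


def exit_parameter(p_in: Vec3, p_out: Vec3, planes: List[Plane]) -> float:
    """Parameter t in [0, 1] at which the segment p_in -> p_out leaves the space."""
    t_exit = 1.0
    for n, d in planes:
        start = dot(n, p_in)
        denom = dot(n, p_out) - start
        if denom > 0.0:  # moving toward the outside of this plane
            t_exit = min(t_exit, (d - start) / denom)
    return t_exit


def run_pixels_in_space(run_len: int, left: Vec3, right: Vec3,
                        planes: List[Plane]) -> range:
    """Indices (0 = left end) of run pixels judged to be in the warning space."""
    left_in, right_in = inside(left, planes), inside(right, planes)
    if left_in and right_in:          # S710 -> S713: whole run inside
        return range(run_len)
    if not left_in and not right_in:  # S709 -> S716: treated as wholly outside
        return range(0)
    steps = run_len - 1
    if left_in:                       # S711, S712: left end up to the crossing
        t = exit_parameter(left, right, planes)
        return range(0, int(t * steps + 1e-9) + 1)
    t = exit_parameter(right, left, planes)  # S714, S715: crossing to right end
    return range(steps - int(t * steps + 1e-9), run_len)
```

Only the two end points are tested against the warning space; the pixels in between are classified from the crossing point without decoding or testing them one by one, which is where the speed-up for run-length data comes from.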

In step S716, the determination unit 307 finalizes the result for pixels determined not to be inside the warning space and determines that there was no intrusion into the warning space.

In step S717, for each pixel determined to be inside the warning space, the determination unit 307 compares the pixel value (distance information) of that pixel in the distance image with the pixel value at the same position in the distance image generated in step S603. The initial space information generated in step S603 is the distance information for the state in which no intruding object is in the warning space. In step S717, therefore, it is determined whether the distance information of the same pixel differs greatly between the initial space information and the distance image being monitored. Encoding may have introduced an error into the distance information, bounded by a predetermined error, so the determination unit 307 determines whether the difference is equal to or greater than that predetermined error. The predetermined error is preferably set based on the allowable error set when the distance image is encoded. If the encoding parameters used to encode the distance image can be acquired, the allowable error that encoding can introduce is set as the predetermined error with reference to those parameters. If the encoding parameters cannot be acquired, the amount of error that may occur when encoding a distance image may be determined in advance.

If, between the initial space information and the distance image being monitored, the distance information of the same pixel inside the warning space differs by the predetermined error or more, it is determined that a subject not present in the initial state has entered at the pixel position determined to be inside the warning space. Otherwise, it is judged that what was detected is simply something that was already present in the warning space in the initial state. The information on pixel positions determined to be inside the warning space, held in an area of the RAM 502 (not shown), is updated so that only the pixels judged to show an intrusion remain. If any pixel determined to be inside the warning space remains as a result, it is determined that an intrusion into the monitored space has occurred; if none remains, it is determined that no intrusion has occurred. When the processing of step S717 is finished, the intrusion detection process according to the encoding mode ends.
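The comparison against the initial state can be sketched as a simple filter over the candidate pixels. The data layout (row-major lists of depth values, candidates as (x, y) pairs) and the function name are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def confirm_intrusion(candidates: Sequence[Tuple[int, int]],
                      current: Sequence[Sequence[float]],
                      initial: Sequence[Sequence[float]],
                      tolerance: float) -> List[Tuple[int, int]]:
    """Keep only candidate pixels whose depth differs from the initial state.

    A pixel is kept when the absolute difference is at least `tolerance`,
    the error that encoding is allowed to introduce.
    """
    return [(x, y) for (x, y) in candidates
            if abs(current[y][x] - initial[y][x]) >= tolerance]
```

An intrusion is reported when the returned list is non-empty; pixels that were already inside the warning space in the initial state drop out because their depth is unchanged within the tolerance.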

Known methods exist for determining whether a point in three-dimensional space lies inside or outside a closed space bounded by surfaces in that space; one is disclosed, for example, in Japanese Unexamined Patent Application Publication No. H02-100777. In step S717 above, only the distance image was used to determine that a point judged to be inside the warning space is in a state different from the initial state, but the determination may also use the difference in pixel values between the pixels at the same position in the color images. Step S607 has been described in detail above with reference to FIG. 7.

Returning to the flowchart of FIG. 6, in step S608, if it has been determined that an intrusion into the warning space occurred in the moving image being monitored, the warning control unit 308 proceeds to step S609; otherwise, it proceeds to step S610. In step S609, the warning control unit 308 executes the control that has been set for the case of an intrusion into the warning space. Here, in the frame image capturing the monitored space, the pixel values at the positions of the pixels judged in step S717 to show an intrusion are first copied to the corresponding pixel positions in a frame image area for superimposition. Although not described earlier, the frame image area for superimposition is assumed to be reserved in an area of the RAM 502 (not shown) separate from other information, and to have been initialized, before the processing of step S609 starts, to a state containing no superimposed pixel values. The image data in this frame image area for superimposition is superimposed by a known method on the initial-state environment space generated in step S603 (in which no intrusion into the warning space has yet been detected). The warning control unit 308 displays the image data of the frame image area for superimposition on the display device 506.

In step S610, it is determined whether the frame image and distance image for one frame of the monitored space, acquired in step S631 and processed in raster scan order in step S640, have been processed up to the last processing unit of the last scan line of the frame. If so, the process proceeds to step S620; if not, it returns to step S607 and starts processing the processing unit at the next position in raster scan order. In step S620, it is determined whether an instruction to end the series of processes has been given, either from outside the apparatus via the I/F unit 509 or by the operator using an input device such as the pointing device 505 (for example, a mouse) or the keyboard 504. If no end instruction has been given, the process returns to step S605 and starts processing the next frame. If an end instruction has been given, the series of monitoring processes ends.

As described above, according to the present embodiment, encoding the distance image reduces both the amount of data transferred from the imaging device to the information processing apparatus serving as the monitoring device and the amount of memory needed to store the data in the information processing apparatus. At the same time, when detecting an intruding object based on the distance image, the determination is made according to the encoding mode in which the data to be decoded was encoded, which makes the determination efficient. In particular, for compressed data encoded in the run-length mode, a plurality of pixels on a run can be judged together.

<Other Embodiments>
In the first embodiment, the warning space information is based on four points, input by the operator, that define the warning region. The designated region need not be given by four points, however. For example, the base may be a rectangle specified by inputting two points that are its diagonally opposite vertices. The base also need not be a quadrilateral; a plurality of vertices forming a polygon may be input. Alternatively, the operator may designate, among the subjects present in the monitored space in the initial state, a subject whose approach by other subjects should be guarded against, and the warning space may be set so as to contain that subject.

As the response to an intrusion, the warning control unit 308 was described as superimposing the partial image judged to show the intrusion on the initial image (background image) of the monitored space. The response is not limited to this. The apparatus may be configured to perform a predetermined action, such as sounding an alarm, displaying a warning, or reporting warning information, at the point when a pixel judged to show a genuine intrusion occurs.

The embodiment described above is based on compressed data of a distance image encoded by a scheme that mixes per-pixel encoding and per-run encoding. The invention is not limited to this, however. For example, the compressed data may have been encoded by so-called block coding, in which the image is divided into blocks and encoded block by block, in a format where blocks of several different sizes are mixed. Such a scheme may encode each region of pixels that share the same pixel value in the distance image with the largest possible block size, chosen according to the size of the region from several sizes (for example, 64×64, 32×32, ..., 4×4, 2×2, 1×1). In this case, just as the determination was made run by run in the run-length mode of the embodiment above, it suffices to determine block by block whether the block is contained in the warning space. Whether an entire block lies outside the warning space can be determined from the pixel positions and pixel values of the pixels at its four corners.
When a block cannot be judged to lie entirely outside or entirely inside the warning space because it straddles the boundary, every pixel in the block may have to be decoded and processed; even so, processing is faster for the blocks that can be judged as a whole. For a block that straddles the boundary, the inside/outside determination may also be repeated on sub-blocks of half the height and half the width. This is faster still than decoding the entire block and making the inside/outside determination pixel by pixel.
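The block-by-block determination with subdivision can be sketched as below. This is a simplified illustration, not the method of the embodiment: the warning space is assumed to be an axis-aligned box in (column, row, depth) with integer pixel bounds, so the inside/outside decision is made by interval overlap rather than by the four-corner test described above, and the names `classify` and `blocks_in_space` are hypothetical. Block sizes are assumed to be powers of two.

```python
def classify(x, y, size, depth, box):
    """Classify a uniform-depth block as 'outside', 'inside', or 'mixed'.

    box = ((x0, x1), (y0, y1), (z0, z1)): half-open column and row ranges
    and a closed depth range describing the assumed warning space.
    """
    (x0, x1), (y0, y1), (z0, z1) = box
    if not (z0 <= depth <= z1):
        return "outside"  # the shared depth rules out every pixel at once
    if x + size <= x0 or x >= x1 or y + size <= y0 or y >= y1:
        return "outside"  # no overlap in the image plane
    if x0 <= x and x + size <= x1 and y0 <= y and y + size <= y1:
        return "inside"   # the block is fully contained
    return "mixed"        # the block straddles the boundary


def blocks_in_space(x, y, size, depth, box):
    """Return the (x, y, size) sub-blocks lying entirely in the warning
    space, halving the block in each direction only where it straddles
    the boundary."""
    state = classify(x, y, size, depth, box)
    if state == "outside":
        return []
    if state == "inside":
        return [(x, y, size)]
    half = size // 2
    found = []
    for dy in (0, half):
        for dx in (0, half):
            found += blocks_in_space(x + dx, y + dy, half, depth, box)
    return found


warning_box = ((2, 4), (0, 4), (1.0, 5.0))
print(blocks_in_space(0, 0, 4, 3.0, warning_box))
```

With integer bounds a 1×1 block is never classified as mixed, so the recursion always terminates. Only the two 2×2 sub-blocks in the right half of the example block need a comparison with the initial state.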

The present invention can also be realized by supplying a program that implements one or more functions of the embodiments described above to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Claims

1. An information processing apparatus comprising:
setting means for setting, for a monitoring space in which distance information from an imaging device to a subject can be obtained by imaging with the imaging device, a distance image of the monitoring space in its initial state and a warning space for detecting intrusion of a subject;
acquisition means for acquiring compressed data obtained by encoding a distance image composed of distance information for each pixel in an encoding format in which encoding for each pixel and encoding for each of a plurality of pixels are mixed;
estimation means for estimating whether a codeword to be decoded in the compressed data is a codeword encoded for each pixel or a codeword encoded for each of a plurality of pixels; and
determination means for determining, according to the result estimated by the estimation means, whether there is a change in the distance information in the warning space at the pixel positions of the distance information obtained by decoding the codeword to be decoded,
wherein the determination means:
when the estimation means estimates that the codeword is encoded for each pixel, determines, for the one pixel having the distance information obtained by decoding the codeword to be decoded, whether there is a change in the distance information in the warning space based on the distance information and the initial state of the monitoring space; and
when the estimation means estimates that the codeword is encoded for each of a plurality of pixels, determines, for the plurality of pixels having the distance information obtained by decoding the codeword to be decoded, whether there is a change in the distance information in the warning space based on the distance information and the initial state of the monitoring space.

2. The information processing apparatus according to claim 1, further comprising control means for, when the determination means determines that there is a change in the distance information in the warning space, regarding the change as indicating that a subject has entered the warning space and issuing a warning that a subject has entered the warning space.

3. The information processing apparatus according to claim 1, wherein the encoding for each of the plurality of pixels is run-length encoding, and
when the estimation means estimates that the codeword is encoded for each of a plurality of pixels, the determination means specifies the length of the run, determines whether part or all of the run is in the warning space, and, if part or all of the run is in the warning space, determines for the pixels in the warning space whether there is a difference greater than a predetermined error between the distance information of those pixels and the distance information in the initial state.

4. The information processing apparatus according to claim 3, wherein the determination means calculates the coordinates in three-dimensional space of both end points of the run and determines whether the end points of the run are in the monitoring space.

5. The information processing apparatus according to claim 1, wherein the acquisition means further acquires, frame by frame, a moving image captured by the imaging device in the monitoring space, and
the distance image is a distance image corresponding to each frame of the moving image.

6. The information processing apparatus according to any one of claims 1 to 5, wherein the determination means detects the change by comparing the distance information of the pixels located in the warning space with the distance information in the initial state of the monitoring space and determining whether there is a difference greater than or equal to a predetermined error.

7. The information processing apparatus according to claim 1, further comprising parameter acquisition means for acquiring an encoding parameter corresponding to the compressed data,
wherein the predetermined error is set according to the parameter.

8. A computer program that, when read and executed by a computer, causes the computer to function as the information processing apparatus according to claim 1.

9. An information processing method comprising:
setting, for a monitoring space in which distance information from an imaging device to a subject can be obtained by imaging with the imaging device, distance information of the monitoring space in its initial state and a warning space for detecting intrusion of a subject;
acquiring compressed data obtained by encoding a distance image composed of distance information for each pixel in an encoding format in which encoding for each pixel and encoding for each of a plurality of pixels are mixed;
estimating whether a codeword to be decoded in the compressed data is a codeword encoded for each pixel or a codeword encoded for each of a plurality of pixels;
decoding the codeword to be decoded to obtain distance information;
when it is estimated that the codeword is encoded for each pixel, determining, for the one pixel having the distance information obtained by decoding the codeword to be decoded, whether there is a change in the distance information in the warning space based on the distance information and the initial state of the monitoring space; and
when it is estimated that the codeword is encoded for each of a plurality of pixels, determining, for the plurality of pixels having the distance information obtained by decoding the codeword to be decoded, whether there is a change in the distance information in the warning space based on the distance information and the initial state of the monitoring space.