JP6436846B2

JP6436846B2 - Moving object detection device, video decoding device, and moving object detection method

Info

Publication number: JP6436846B2
Application number: JP2015083578A
Authority: JP
Inventors: 宮澤 一之; 関口 俊一; 守屋 芳美; 峯澤 彰; 服部 亮史
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2015-04-15
Filing date: 2015-04-15
Publication date: 2018-12-12
Anticipated expiration: 2035-04-15
Also published as: JP2016208087A

Description

The present invention relates to a moving object detection device, a video decoding device, and a moving object detection method for detecting moving objects in encoded video.

Moving object detection in video is an important technology for many applications, such as video surveillance. To detect a moving object in video, the motion information of the video must first be obtained, which generally requires a large amount of computation. Real-time processing, for example, therefore requires high-performance hardware, which narrows the range of possible applications.

In addition, because raw video contains an enormous amount of information, it is generally encoded with an international standard video coding scheme such as MPEG or ITU-T H.26x and then transmitted and stored in compressed form. Consequently, detecting moving objects in encoded video requires decoding the video before the motion information can be computed, which increases the amount of computation even further.

International standard video coding schemes such as MPEG and ITU-T H.26x first divide the video into small blocks, perform prediction for each block, and transmit only the prediction residual, which reduces the amount of information. Prediction of a block to be encoded falls into two categories: intra-frame prediction, which uses only the frame to which the block belongs, and motion-compensated prediction, which searches previously encoded frames for a block similar to the block being encoded. In motion-compensated prediction, a vector representing the displacement from the block being encoded to the similar block in the other frame (the so-called "reference block") is encoded and output as part of the bitstream. This vector is called a motion vector, and when the block being encoded contains a moving object, the vector corresponds to the motion information of that object. In other words, decoding the motion vectors from an encoded video yields the motion information of the video with little computation. For example, Patent Document 1 and Patent Document 2 propose video analysis methods that use the motion vectors of encoded video.
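To make the idea of a motion vector concrete, the following Python sketch runs an exhaustive (full-search) block match with the sum of absolute differences (SAD) as the matching cost. It is an illustration only, assuming integer-pixel motion, square blocks, and small grayscale frames stored as lists of lists; real H.264/H.265 encoders use faster searches, sub-pixel refinement, and rate-distortion costs. The names `make_frame`, `sad`, and `full_search` are hypothetical. Following the text, the returned vector (dx, dy) points from the block being encoded to its reference block.

```python
def make_frame(width, height, x0, y0, size, value=255):
    """Build a black frame containing one bright size-by-size square at (x0, y0)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            frame[y][x] = value
    return frame


def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in cur and
    the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur, ref, bx, by, size, search_range):
    """Return the motion vector (dx, dy) within +/- search_range that
    minimizes SAD, skipping candidates that leave the reference frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


ref = make_frame(12, 12, 2, 2, 4)   # square at (2, 2) in the reference frame
cur = make_frame(12, 12, 5, 3, 4)   # same square moved to (5, 3)
print(full_search(cur, ref, 5, 3, 4, 4))  # (-3, -1)
```

Here the block at (5, 3) in the current frame finds an exact match (SAD 0) at (2, 2) in the reference frame, so its motion vector is (-3, -1).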

Patent Document 1: Japanese Patent Laid-Open No. 6-292203
Patent Document 2: Japanese Patent Laid-Open No. 10-75457

Motion vectors computed during motion-compensated prediction in video coding do not always represent the motion of the subject accurately, and bitstreams often contain unreliable motion vectors. As a result, when moving objects are detected by simply decoding the motion vectors from an encoded video, as in the prior art, the unreliable motion vectors act as noise and cause frequent false detections, so moving objects cannot be detected accurately.

The present invention has been made to solve the above problem, and its object is to provide a moving object detection device and a moving object detection method that improve the accuracy of moving object detection by reducing the influence of unreliable motion vectors. A further object is to provide a video decoding device that includes such a moving object detection device.

A moving object detection device according to the present invention includes: a variable length decoding unit that decodes, from a video bitstream encoded by motion-compensated prediction, the motion vector of each coding block between the frame being encoded and a reference frame; a vote value calculation unit that calculates a vote value for each of a plurality of motion vectors; a voting unit that casts each vote value calculated by the vote value calculation unit at the coordinates pointed to by the corresponding motion vector on the voting plane that corresponds to that motion vector's reference frame; a cumulative vote value storage unit that stores, for each coordinate of the voting plane, the cumulative vote value obtained by accumulating the vote values cast by the voting unit; and a moving object detection unit that detects a moving object contained in the video of the video bitstream using the cumulative vote values. The vote value calculation unit calculates the vote value of a motion vector using an index that the variable length decoding unit has decoded from the video bitstream, and assigns a larger vote value the smaller the block size of the coding block that has the motion vector.

A video decoding device according to the present invention includes the above moving object detection device and a decoded image generation unit that generates decoded images from the video bitstream.

In a moving object detection method according to the present invention, a variable length decoding unit decodes, from a video bitstream encoded by motion-compensated prediction, the motion vector of each coding block between the frame being encoded and a reference frame; a vote value calculation unit calculates a vote value for each of a plurality of motion vectors; a voting unit casts each vote value calculated by the vote value calculation unit at the coordinates pointed to by the corresponding motion vector on the voting plane that corresponds to that motion vector's reference frame; a cumulative vote value storage unit stores, for each coordinate of the voting plane, the cumulative vote value obtained by accumulating the vote values cast by the voting unit; and a moving object detection unit detects a moving object contained in the video of the video bitstream using the cumulative vote values. In this method, the vote value calculation unit calculates the vote value of a motion vector using an index that the variable length decoding unit has decoded from the video bitstream, and assigns a larger vote value the smaller the block size of the coding block that has the motion vector.

By integrating multiple motion vectors from multiple frames, the moving object detection device, video decoding device, and moving object detection method of the present invention reduce the influence of unreliable motion vectors and improve the accuracy of moving object detection.

FIG. 1 is a block diagram showing the main part of the moving object detection device according to Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram showing the frame reference structure in motion-compensated prediction.
FIG. 3 is an explanatory diagram showing the moving object in each frame.
FIG. 4 is an explanatory diagram showing the moving object and coding blocks in each frame and the coordinates pointed to by the motion vectors.
FIG. 5 is an explanatory diagram showing the voting planes in their initial state, as stored in the cumulative vote value storage unit.
FIG. 6 is an explanatory diagram showing coding block sizes in H.265.
FIG. 7 is an explanatory diagram showing how vote values are corrected with a Gaussian function.
FIG. 8 is an explanatory diagram showing voting results for block sizes of 8 × 8, 16 × 16, 32 × 32, and 64 × 64.
FIGS. 9, 10, and 11 are explanatory diagrams each showing voting by the voting unit.
FIG. 12 is an explanatory diagram showing the frame reference structure in motion-compensated prediction.
FIG. 13 is an explanatory diagram showing voting in the reverse direction by the voting unit.
FIG. 14 is an explanatory diagram showing a voting plane read from the cumulative vote value storage unit by the moving object detection unit.
FIG. 15 is an explanatory diagram showing the result of threshold processing applied to the voting plane of FIG. 14.
FIG. 16 is an explanatory diagram showing the moving object detection result output by the moving object detection unit.
FIG. 17 is a flowchart showing the operation of the moving object detection device according to Embodiment 1 of the present invention.
FIG. 18 is a block diagram showing the main part of the video decoding device according to Embodiment 2 of the present invention.
FIG. 19 is an explanatory diagram showing a decoded image produced by the video decoding device according to Embodiment 2 of the present invention.
FIGS. 20 and 21 are explanatory diagrams each showing an image in which the moving object detection result is composited onto the decoded image of FIG. 19.

Embodiment 1.
FIG. 1 is a block diagram showing the main part of the moving object detection device according to Embodiment 1.
The variable length decoding unit 11 is a processing circuit that receives as input a video bitstream containing frames encoded with motion-compensated prediction and decodes it according to the method specified by the corresponding coding scheme. Of the information contained in the video bitstream, the variable length decoding unit 11 decodes and outputs the information used for moving object detection, which includes at least the motion vectors.

The vote value calculation unit 12 is a processing circuit that receives the output of the variable length decoding unit 11 and, for each motion vector, calculates a predetermined value corresponding to the reliability of that motion vector for moving object detection. This value is hereinafter called the "vote value".

The cumulative vote value storage unit 13 is a storage element that stores two-dimensional arrays. Each array has, for example, the same size as a frame in the video bitstream; such an array is hereinafter called a "voting plane". For each coordinate of a voting plane, the cumulative vote value storage unit 13 stores the accumulated vote values of the motion vectors pointing to that coordinate. This accumulated value is hereinafter called the "cumulative vote value".

The voting unit 14 is a processing circuit that receives a motion vector output from the variable length decoding unit 11, the vote value of that motion vector calculated by the vote value calculation unit 12, and the cumulative vote value stored in the cumulative vote value storage unit 13 for the coordinates that the motion vector points to. It adds the vote value to the cumulative vote value and stores the result back in the cumulative vote value storage unit 13. This addition of vote values by the voting unit 14 is hereinafter called "voting".

The moving object detection unit 15 is a processing circuit that receives cumulative vote values from the cumulative vote value storage unit 13, detects moving objects in the frame targeted for detection by applying threshold processing, and outputs the result.
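The threshold processing performed by the moving object detection unit 15 can be sketched as follows. The text does not specify how above-threshold coordinates are grouped into objects, so this Python sketch makes an assumption: it groups them into 4-connected regions and reports one bounding box (x0, y0, x1, y1) per region. The function name and the list-of-lists plane layout are hypothetical.

```python
from collections import deque


def detect_moving_objects(plane, threshold):
    """Threshold a voting plane and return one bounding box (x0, y0, x1, y1)
    per 4-connected region whose cumulative vote values exceed threshold."""
    height, width = len(plane), len(plane[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or plane[y][x] <= threshold:
                continue
            # Breadth-first flood fill over above-threshold neighbours.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and plane[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


plane = [[0] * 8 for _ in range(6)]
plane[1][1] = plane[1][2] = 5   # strong, consistent votes
plane[4][6] = plane[5][6] = 3
plane[0][4] = 1                 # isolated weak vote (noise)
print(detect_moving_objects(plane, 2))  # [(1, 1, 2, 1), (6, 4, 6, 5)]
```

The isolated weak vote at (4, 0) stays below the threshold and is discarded, which is the noise-suppression effect the voting scheme relies on.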

The variable length decoding unit 11, the vote value calculation unit 12, the cumulative vote value storage unit 13, the voting unit 14, and the moving object detection unit 15 together constitute the moving object detection device 100.

Next, the frame reference structure in motion-compensated prediction commonly used in international standard video coding schemes such as ITU-T H.264 and H.265 is described.
Consider encoding the nine frames shown in FIG. 2. The number shown in each frame of FIG. 2 is its frame number; frames 1 to 8 are encoded with motion-compensated prediction. Frame 0 is the very first frame of the video and is encoded with intra prediction only, without motion-compensated prediction. In H.264, H.265, and similar schemes, frames can be encoded in frame-number order 0, 1, 2, and so on, but reordering the frames and making the reference structure hierarchical as shown in FIG. 2 improves compression performance. In the example of FIG. 2, frame 0 is encoded first, then frame 8, followed by frames 4, 2, 6, 1, 3, 5, and 7 in that order. The arrows in FIG. 2 indicate reference relationships: when frame 8 is encoded, frame 0 is referenced, and a motion vector from frame 8 to frame 0 is obtained for each block being encoded. When frame 4 is encoded, frames 0 and 8 are referenced; whether a block refers to frame 0, frame 8, or both can be switched block by block during encoding. In FIG. 2, references always run from the top of the hierarchy toward the bottom; that is, a frame is never predicted from a frame at a higher level of the hierarchy than itself.

In the reference structure of FIG. 2, for example, motion vectors from frames 2, 3, 5, and 6 point into frame 4, so motion vectors from multiple frames are available for a single frame. In Embodiment 1, moving objects are detected with high accuracy by integrating these motion vectors from multiple frames through a voting process.
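The coding order and reference relationships of FIG. 2 follow a dyadic (repeated halving) pattern, which the Python sketch below reproduces. It assumes, as a simplification, that each frame lying between two already-coded frames is predicted from exactly those two frames; in the actual structure a block may use either or both, so `refs` lists the frames available for reference rather than those every block uses. The function names are hypothetical.

```python
from collections import deque


def hierarchical_gop(size):
    """Return (coding_order, refs) for a dyadic hierarchical group of frames
    0..size, where refs[f] lists the frames that frame f may be predicted from."""
    order = [0, size]
    refs = {0: [], size: [0]}          # frame 0 is intra-only; last frame uses frame 0
    queue = deque([(0, size)])
    while queue:
        lo, hi = queue.popleft()
        if hi - lo < 2:
            continue
        mid = (lo + hi) // 2
        order.append(mid)
        refs[mid] = [lo, hi]           # the two enclosing, already-coded frames
        queue.append((lo, mid))
        queue.append((mid, hi))
    return order, refs


def frames_voting_into(target, refs):
    """Frames whose motion vectors may point into target (i.e. reference it)."""
    return sorted(f for f, rs in refs.items() if target in rs)


order, refs = hierarchical_gop(8)
print(order)                          # [0, 8, 4, 2, 6, 1, 3, 5, 7]
print(frames_voting_into(4, refs))    # [2, 3, 5, 6]
```

With `size=8` this yields the coding order 0, 8, 4, 2, 6, 1, 3, 5, 7 described above, and the frames voting into frame 4 are 2, 3, 5, and 6.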

The reference structure shown in FIG. 2 is repeated as a unit for subsequent frames, as shown in FIG. 12. Therefore, although frames 0 to 8 are used as the example here, the same operation is simply repeated from frame 8 onward. Embodiment 1 is not limited to the reference structure of FIG. 2 and can be applied to any reference structure.

Suppose that, in frames 2 to 6 of FIG. 2, a moving object (a car in the example of FIG. 3) moves from left to right as shown in FIG. 3. If coding blocks lie at the position of the moving object, as shown in FIG. 4, then reliable motion vectors from each of frames 2, 3, 5, and 6 to frame 4 consistently point to nearly the same coordinates in frame 4. Unreliable motion vectors that arise as noise, by contrast, are random and inconsistent, so they point to scattered coordinates in frame 4. Embodiment 1 therefore treats each motion vector's reference as a vote for the coordinates it points to, and detects regions that receive high vote values from multiple motion vectors as moving objects.
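A toy Python illustration of why voting suppresses noise: endpoints of consistent motion vectors coincide and accumulate, while noisy endpoints scatter and each collect a single vote. The coordinates are invented for illustration, and exact coincidence is a simplification; in practice reliable vectors point to nearly the same place, which is one reason to vote over a region rather than a single point.

```python
from collections import Counter


def most_voted(endpoints):
    """Treat each motion-vector endpoint as one vote and return the
    (coordinate, vote count) pair with the most votes."""
    return Counter(endpoints).most_common(1)[0]


# Hypothetical endpoints in frame 4 of vectors from frames 2, 3, 5 and 6.
consistent = [(40, 20)] * 4                      # reliable vectors agree
noise = [(5, 30), (70, 2), (22, 55), (61, 44)]   # unreliable vectors scatter
peak, count = most_voted(consistent + noise)
print(peak, count)  # (40, 20) 4
```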

Next, the operation of the moving object detection device 100 is described with reference to the flowchart of FIG. 17.
First, the variable length decoding unit 11 receives as input a video bitstream containing frames encoded with motion-compensated prediction and decodes it according to the method specified by the corresponding coding scheme (step ST100 in FIG. 17). The variable length decoding unit 11 may decode the whole bitstream or only part of it; in the latter case, it decodes only the information used by the vote value calculation unit 12 and the voting unit 14.

Voting is performed on voting planes of the same size as the frames, prepared as shown in FIG. 5. Coordinate (x, y) of voting plane i stores the cumulative vote value for coordinate (x, y) of frame i, where i is the frame number. The voting planes are stored in the cumulative vote value storage unit 13 with every coordinate initialized in advance to a value such as 0. In the example of FIG. 5, every coordinate of every plane is 0, so every plane is shown in black.

Although the voting planes are described here as being the same size as the frames, they need not be. For example, making the voting planes smaller than the frames reduces the memory required by the cumulative vote value storage unit 13 and shortens the processing time needed for voting.
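A minimal Python stand-in for the cumulative vote value storage unit 13 is sketched below, assuming one voting plane per frame number, kept as a list of lists and created on first use with every value set to 0. The optional `scale` parameter is a hypothetical way to realize smaller voting planes: each plane shrinks by that factor in both directions, and frame coordinates are divided by the same factor.

```python
class CumulativeVoteStore:
    """Hypothetical stand-in for the cumulative vote value storage unit 13."""

    def __init__(self, frame_width, frame_height, scale=1):
        # scale > 1 shrinks every plane by that factor in both directions.
        self.scale = scale
        self.width = -(-frame_width // scale)    # ceiling division
        self.height = -(-frame_height // scale)
        self.planes = {}

    def plane(self, frame_number):
        """Return the voting plane of a frame, creating it filled with 0."""
        if frame_number not in self.planes:
            self.planes[frame_number] = [
                [0.0] * self.width for _ in range(self.height)
            ]
        return self.planes[frame_number]

    def to_plane_coords(self, x, y):
        """Map frame pixel coordinates (x, y) to voting-plane coordinates."""
        return x // self.scale, y // self.scale


full = CumulativeVoteStore(64, 48)
small = CumulativeVoteStore(100, 50, scale=8)
print(len(full.plane(0)), len(full.plane(0)[0]))  # 48 64
print(small.width, small.height)                  # 13 7
```

With `scale=8`, a 100 × 50 frame needs only a 13 × 7 plane, at the cost of locating votes only to within 8 pixels.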

The first frame decoded by the variable length decoding unit 11 is frame 0. Here, frame 0 is the first encoded frame and has no other frame to reference in motion-compensated prediction, so the whole frame is intra-coded and contains no motion vectors; that is, frame 0 is an I picture. The vote value calculation unit 12 therefore first receives from the variable length decoding unit 11 the motion vectors of all coding blocks in frame 8, which is decoded after frame 0, and calculates a vote value for each motion vector (step ST101 in FIG. 17).

The vote value may always be a fixed constant, or it may vary for each motion vector based on another index. When a different vote value is set for each motion vector, it is determined, for example, according to the size of the coding block that has the motion vector. In the international standard video coding scheme ITU-T H.265, a coding block can be recursively split by a quadtree during encoding, so the block size can be adapted to the frame content, from 64 × 64 pixels down to 8 × 8 pixels. As shown in FIG. 6, the block size tends to become small in moving object regions, to improve motion vector accuracy, and large in static regions such as the background. Therefore, the smaller the coding block that has a motion vector, the more likely that motion vector belongs to a moving object (that is, the more reliable it is for moving object detection).

The vote value calculation unit 12 therefore receives the block size from the variable length decoding unit 11 together with the motion vector and uses it to determine the vote value of the motion vector. Let (m, n) be the coordinates on the voting plane corresponding to the coordinates a motion vector points to, Bx the width of the block, and By its height. The vote value v of this motion vector at voting plane coordinates (x, y) is then determined by the following equation (1).

In equation (1), Lx is the maximum width and Ly the maximum height of a coding block; Lx and Ly are, for example, 64 pixels.
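Equation (1) itself is not reproduced in this text, so the Python function below is only a hypothetical stand-in that shares its stated inputs (m, n, Bx, By, Lx, Ly) and its stated property that smaller blocks receive larger votes. It assumes the vote is (Lx / Bx) × (Ly / By) inside a Bx × By rectangle centered on (m, n) and 0 elsewhere, so a maximum-size 64 × 64 block votes 1 and an 8 × 8 block votes 64. The actual equation (1) may differ.

```python
def block_vote_value(x, y, m, n, bx, by, lx=64, ly=64):
    """HYPOTHETICAL stand-in for equation (1), not the patent's formula.

    (m, n): voting-plane point the motion vector points to.
    bx, by: width and height of the coding block.
    lx, ly: maximum coding-block width and height.
    Returns (lx / bx) * (ly / by) inside the bx-wide, by-tall rectangle
    centred on (m, n), else 0, so smaller blocks cast larger votes.
    """
    x0 = m - bx // 2
    y0 = n - by // 2
    if x0 <= x < x0 + bx and y0 <= y < y0 + by:
        return (lx / bx) * (ly / by)
    return 0.0


print(block_vote_value(10, 10, 10, 10, 8, 8))    # 64.0
print(block_vote_value(10, 10, 10, 10, 64, 64))  # 1.0
```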

Determining the vote value of a motion vector from the block size in this way gives large vote values to the motion vectors of small blocks, which are likely to belong to moving objects, and small vote values to the motion vectors of large blocks, which improves the accuracy of moving object detection based on motion vector voting.

Because the variable length decoding unit 11 can decode many parameters other than the block size from the bitstream, the vote value calculation unit 12 may instead receive such parameters and determine the vote value from a parameter other than the block size, or from a combination of parameters.

Furthermore, on the assumption that positions closer to the center of a coding block are more likely to belong to a moving object, the vote value v may be corrected with a predetermined weighting function. The Gaussian function is a well-known weighting function whose value decreases from the center outward. The correction of the vote value v by a Gaussian function is given by the following equation (2).

In equation (2), v′ is the corrected vote value, and σ_x² and σ_y² are the variances in the horizontal and vertical directions, respectively. As in the following equation (3), for example, the variance is made smaller for smaller blocks and larger for larger blocks.

In equation (3), α and β are constants, for example 0.25.
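Equations (2) and (3) are likewise not reproduced here. The sketch below assumes an unnormalized two-dimensional Gaussian centered on the point (m, n) that the motion vector points to, with standard deviations proportional to block size (σ_x = α·Bx and σ_y = β·By, with α = β = 0.25 as in the example constants). Both forms are assumptions chosen to match the described behavior: full weight at the block center, decaying outward, and decaying more slowly for larger blocks.

```python
import math


def gaussian_vote(v, x, y, m, n, bx, by, alpha=0.25, beta=0.25):
    """HYPOTHETICAL stand-in for equations (2) and (3).

    Weights the vote value v at voting-plane coordinate (x, y) by an
    unnormalized 2-D Gaussian centred on (m, n). ASSUMPTION: the standard
    deviations grow linearly with block size, sigma_x = alpha * bx and
    sigma_y = beta * by, so larger blocks spread their vote more widely.
    """
    var_x = (alpha * bx) ** 2
    var_y = (beta * by) ** 2
    weight = math.exp(-((x - m) ** 2 / (2 * var_x) + (y - n) ** 2 / (2 * var_y)))
    return v * weight
```

At the center the weight is exactly 1. Four pixels from the center of an 8 × 8 block it falls to e^-2 (about 0.135), while for a 32 × 32 block it is still about 0.88.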

FIG. 7 shows how the vote value is corrected with a Gaussian function according to the block size, and FIG. 8 shows the results of actual voting with block sizes of 8 × 8, 16 × 16, 32 × 32, and 64 × 64.

Correcting the vote value of a motion vector with a weighting function in this way increases the vote value at the block center, which is likely to belong to a moving object, and decreases it with distance from the center, which improves the accuracy of moving object detection based on motion vector voting.

Votes could be cast at a single point, such as the block center, but casting them over a region with some area makes voting more reliable. Because motion vectors may be off by a few pixels due to errors, voting over a region as in FIG. 8 reduces the influence of such offsets and yields more stable voting results.

In this way, the vote value calculation unit 12 calculates vote values for the motion vectors of all coding blocks in frame 8 and outputs the results to the voting unit 14.

The voting unit 14 receives the motion vectors decoded by the variable length decoding unit 11 and their vote values calculated by the vote value calculation unit 12, reads a voting plane from the cumulative vote value storage unit 13, and votes. Because every motion vector of frame 8 was calculated with frame 0 as the reference frame, the voting unit 14 reads voting plane 0, which corresponds to frame 0, from the cumulative vote value storage unit 13 (step ST102 in FIG. 17) and casts the vote value received from the vote value calculation unit 12 at the coordinates pointed to by the motion vector received from the variable length decoding unit 11. That is, the current vote value is added to the cumulative vote value at those coordinates of voting plane 0. FIG. 9 illustrates this, showing the result of voting on voting plane 0 with the motion vectors of three coding blocks in frame 8. When voting is complete, the voting unit 14 stores voting plane 0, with its updated cumulative vote values, in the cumulative vote value storage unit 13 (step ST103 in FIG. 17).
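The voting step can be sketched in Python as follows, assuming the vote is cast over the area of the reference block that the motion vector points to (one way to vote over a region rather than a single point) and that `plane` is the voting plane of the motion vector's reference frame. For simplicity the same value is added to every covered cell; a position-dependent weight such as a Gaussian could be substituted. Cells falling outside the plane are skipped. The function name is hypothetical.

```python
def cast_vote(plane, block_x, block_y, block_w, block_h, mv, value):
    """Add value to every cell of plane covered by the reference block that
    motion vector mv = (dx, dy) points to from the coding block whose
    top-left corner is (block_x, block_y); cells outside plane are skipped."""
    height, width = len(plane), len(plane[0])
    dx, dy = mv
    x0, y0 = block_x + dx, block_y + dy
    for y in range(max(0, y0), min(height, y0 + block_h)):
        for x in range(max(0, x0), min(width, x0 + block_w)):
            plane[y][x] += value


plane0 = [[0.0] * 16 for _ in range(16)]      # voting plane 0, all zero
cast_vote(plane0, 8, 8, 4, 4, (-6, -6), 2.0)  # a block of frame 8 -> frame 0
cast_vote(plane0, 9, 9, 4, 4, (-6, -6), 2.0)  # a neighbouring, overlapping vote
print(plane0[3][3], plane0[2][2], plane0[7][7])  # 4.0 2.0 0.0
```

Where the two reference blocks overlap, the cumulative vote value doubles, which is how agreeing motion vectors build up a peak.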

When voting, motion vectors judged to be unreliable may be excluded. The reliability of a motion vector can be judged, for example, from the interval between the frame being encoded and the reference frame. Because motion vector accuracy decreases as this interval grows, a threshold for the frame interval (hereinafter the "interval threshold") is set, and no vote is cast when the frame interval exceeds it. The frame interval may be obtained simply as the difference between the frame numbers of the two frames or, when the frame rate of the video is known, as an actual time interval computed from the frame rate.
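The interval test can be expressed as a small predicate. In this hypothetical sketch, the threshold is in frames when no frame rate is given and in seconds when `frame_rate` (frames per second) is supplied; a vector may vote only when the interval does not exceed the threshold.

```python
def within_interval(cur_frame, ref_frame, interval_threshold, frame_rate=None):
    """Return True if a motion vector between cur_frame and ref_frame may vote.

    Without frame_rate the interval is the frame-number difference; with
    frame_rate (frames per second) it is converted to seconds, and
    interval_threshold must then be given in seconds too.
    """
    interval = abs(cur_frame - ref_frame)
    if frame_rate is not None:
        interval /= frame_rate
    return interval <= interval_threshold


print(within_interval(8, 0, 4))                    # False: 8 frames apart
print(within_interval(3, 4, 4))                    # True: 1 frame apart
print(within_interval(8, 0, 0.5, frame_rate=30))   # True: about 0.27 s
```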

The reliability of a motion vector may also be judged from parameters obtained from the variable length decoding unit 11. For example, the DCT (Discrete Cosine Transform) coefficients of the block may be received from the variable length decoding unit 11, and if their energy exceeds a threshold (hereinafter the "energy threshold"), the motion vector is judged unreliable and no vote is cast. Because the energy of the DCT coefficients corresponds to the magnitude of the prediction error, a block with high DCT coefficient energy is considered to have low motion-compensated prediction accuracy and therefore an unreliable motion vector.
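The text does not define the energy of the DCT coefficients; the sketch below assumes the common definition, the sum of the squared coefficients of the block's residual. Function names and the list-of-lists layout are hypothetical, and in a real decoder the values would come from the decoded residual of the block.

```python
def coefficient_energy(coefficients):
    """ASSUMED energy measure: sum of squared transform coefficients."""
    return sum(c * c for row in coefficients for c in row)


def passes_energy_check(coefficients, energy_threshold):
    """True if the block's motion vector may vote, i.e. its residual energy
    does not exceed energy_threshold."""
    return coefficient_energy(coefficients) <= energy_threshold


residual = [[3, 0], [0, -4]]                 # hypothetical 2x2 coefficient block
print(coefficient_energy(residual))          # 25
print(passes_energy_check(residual, 20))     # False: treated as unreliable
```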

By judging motion vector reliability in this way and not voting with unreliable motion vectors, the influence of unreliable motion vectors is reduced and moving objects can be detected more accurately.

Video is not necessarily captured with a fixed camera and unchanged camera parameters such as the focal length. Camera motion and changes in camera parameters during capture produce motion vectors across the entire video (so-called "global motion"). For highly accurate moving object detection it is therefore necessary, where appropriate, to estimate the global motion and correct the motion vectors. If the motion vector of a coding block obtained from the variable length decoding unit 11 is (MVx, MVy) and the global motion estimated for that coding block is (GMVx, GMVy), the corrected motion vector (MVx′, MVy′) is given by equation (4).

When information about global motion is included in the bitstream, it may be received from the variable length decoding unit 11 and used instead. Voting after correcting the global motion caused by camera motion and changes in camera parameters enables more accurate moving object detection.
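Equation (4) itself is not reproduced in this text. Assuming it simply removes the estimated global motion from the decoded vector, that is MVx′ = MVx − GMVx and MVy′ = MVy − GMVy, the correction can be sketched as below. The per-component median used here to estimate a purely translational global motion is a hypothetical stand-in, not the estimation method of this document, and it cannot represent zoom or rotation caused by focal-length changes.

```python
from statistics import median
from typing import List, Tuple

Vector = Tuple[float, float]


def estimate_global_motion(vectors: List[Vector]) -> Vector:
    """Hypothetical translational estimate: per-component median of all block vectors."""
    return (median(v[0] for v in vectors), median(v[1] for v in vectors))


def correct_global_motion(mv: Vector, gmv: Vector) -> Vector:
    """Assumed form of equation (4): subtract the global motion from the block's vector."""
    return (mv[0] - gmv[0], mv[1] - gmv[1])
```

The median is used because a moving object usually covers a minority of blocks, so it tends to follow the background (camera) motion rather than the object.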

Next, when frame 8 is not the moving object detection target frame (NO in step ST104 of FIG. 17), the voting value calculation unit 12 receives from the variable length decoding unit 11 the motion vectors of all coding blocks in frame 4, which is decoded after frame 8, and obtains the voting value of each motion vector in the same way as for frame 8 (steps ST105 and ST101 in FIG. 17).

Because each motion vector of frame 4 is calculated with frame 0, frame 8, or both as its reference frame, the voting unit 14 reads voting plane 0 and voting plane 8, corresponding to frames 0 and 8, from the cumulative voting value storage unit 13 (step ST102 in FIG. 17), and votes the voting values received from the voting value calculation unit 12 at the coordinates indicated by the motion vectors received from the variable length decoding unit 11. The voting value of a motion vector referring to frame 0 is voted onto voting plane 0, that of a motion vector referring to frame 8 onto voting plane 8, and that of a motion vector referring to both onto both voting planes 0 and 8. This is shown in FIG. 10. After voting is completed, the voting unit 14 stores voting planes 0 and 8, whose cumulative voting values have been updated, in the cumulative voting value storage unit 13 (step ST103 in FIG. 17).
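The voting step can be sketched as below, assuming voting planes are 2-D arrays indexed by frame number and that each block carries one (reference frame, dx, dy) entry per reference, so a bi-predicted block votes on both planes. `BlockMotion`, `new_plane`, and `cast_votes` are hypothetical names; votes landing outside the plane are simply dropped in this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BlockMotion:
    x: int                              # block position on the voting-plane grid
    y: int
    refs: List[Tuple[int, int, int]]    # (reference frame, dx, dy); two entries if bi-predicted
    vote: float                         # voting value from the voting value calculation unit


def new_plane(width: int, height: int) -> List[List[float]]:
    """An empty voting plane (all cumulative voting values zero)."""
    return [[0.0] * width for _ in range(height)]


def cast_votes(planes: Dict[int, List[List[float]]], blocks: List[BlockMotion]) -> None:
    """Add each block's vote where its motion vector points, on every referenced plane."""
    for b in blocks:
        for ref, dx, dy in b.refs:
            plane = planes[ref]
            tx, ty = b.x + dx, b.y + dy
            if 0 <= ty < len(plane) and 0 <= tx < len(plane[0]):
                plane[ty][tx] += b.vote
```

For frame 4 in the example, `planes` would hold entries for frames 0 and 8 read from the storage unit, and the updated planes are written back afterwards.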

Thereafter, while voting for the moving object detection target frame is incomplete (NO in step ST104 of FIG. 17), the same processing is repeated for frames 2, 6, 1, 3, 5, and 7 (step ST105 in FIG. 17). FIG. 11 shows how voting is performed for frame 2. Although voting has been described here in encoding order, that is, decoding order, it need not be performed in this order. For example, the variable length decoding unit 11 may decode all frames from frame 0 to frame 8, after which voting is performed in display order: frame 1, 2, 3, and so on.
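The decoding order 0, 8, 4, 2, 6, 1, 3, 5, 7 follows from repeatedly bisecting the interval between already decoded frames. The sketch below generates that order together with a layer number for each frame; it assumes a dyadic hierarchy like FIG. 2 and that frames 0 and 8 are layer H = 0, an assumption about the figure rather than a statement from the text.

```python
from typing import List, Tuple


def hierarchical_order(gop_size: int) -> List[Tuple[int, int]]:
    """(frame number, layer H) pairs in decoding order for a dyadic hierarchy."""
    order = [(0, 0), (gop_size, 0)]
    intervals = [(0, gop_size)]
    layer = 1
    while intervals:
        next_intervals = []
        for lo, hi in intervals:
            if hi - lo < 2:              # no frame left between lo and hi
                continue
            mid = (lo + hi) // 2         # this frame references lo and hi
            order.append((mid, layer))
            next_intervals += [(lo, mid), (mid, hi)]
        intervals = next_intervals
        layer += 1
    return order
```

Sorting the resulting frame numbers gives the display order mentioned above.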

Voting may also be repeated in the reverse direction by inverting the sign of each motion vector. That is, after voting for frames 0 to 8 is completed, the voting unit 14 reads voting plane 0 from the cumulative voting value storage unit 13. Since the frames referring to frame 0 are frames 1, 2, 4, and 8, the voting unit 14 receives the motion vectors of frames 1, 2, 4, and 8 from the variable length decoding unit 11. It then reads voting planes 0, 1, 2, 4, and 8 from the cumulative voting value storage unit 13, inverts the sign of each motion vector of frames 1, 2, 4, and 8, and votes the cumulative voting values of voting plane 0 onto voting planes 1, 2, 4, and 8, respectively. FIG. 13 shows reverse voting from frame 0 to frame 4: the motion vector from frame 4 to frame 0 is inverted, and voting plane 4 is voted using the cumulative voting values of voting plane 0. After reverse voting is completed, the voting unit 14 stores voting planes 1, 2, 4, and 8, whose cumulative voting values have been updated, in the cumulative voting value storage unit 13.
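A sketch of one reverse-voting pass follows. It assumes each referencing frame supplies, for every block whose vector points into the reference frame, the tuple (x, y, dx, dy): the forward vote landed at (x + dx, y + dy) on the reference plane, so the inverted vector (−dx, −dy) carries the cumulative value found there back to the block at (x, y). The function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Plane = List[List[float]]


def reverse_vote(planes: Dict[int, Plane], ref_frame: int,
                 referencing: Dict[int, List[Tuple[int, int, int, int]]]) -> None:
    """Propagate cumulative values of ref_frame's plane back along inverted vectors.

    referencing maps each frame that references ref_frame to its blocks' (x, y, dx, dy).
    """
    src = planes[ref_frame]
    for target, motions in referencing.items():
        dst = planes[target]
        for x, y, dx, dy in motions:
            sx, sy = x + dx, y + dy      # where the forward vector landed on the reference plane
            if 0 <= sy < len(src) and 0 <= sx < len(src[0]):
                # the inverted vector (-dx, -dy) maps (sx, sy) back to the block at (x, y)
                dst[y][x] += src[sy][sx]
```

For the FIG. 13 case, `ref_frame` is 0 and `referencing` would contain frames 1, 2, 4, and 8.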

Performing reverse voting after the initial voting is completed in this way gives highly reliable voting values even to frames that are rarely referenced by other frames, such as frames 1, 2, 3, 5, 6, and 7.

When voting for the moving object detection target frame is completed (YES in step ST104 of FIG. 17), the moving object detection unit 15 reads the voting plane corresponding to that frame from the cumulative voting value storage unit 13 and performs threshold processing for moving object detection (step ST106 in FIG. 17). Suppose, for example, that the voting plane read from the cumulative voting value storage unit 13 is as shown in FIG. 14. The larger the final voting value in a region, the more likely that region is a moving object region. The moving object detection unit 15 therefore compares the cumulative voting value stored at each coordinate of the voting plane with a threshold (hereinafter the "voting threshold") and detects moving objects by selecting only regions whose cumulative voting value exceeds the voting threshold. FIG. 15 shows the result of threshold processing applied to FIG. 14; the regions shown in white are the detected moving objects.
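The threshold processing reduces to a per-coordinate comparison. This minimal sketch (function name hypothetical) uses a strict "greater than", matching the wording above, and returns the 1/0 mask form described later for FIG. 15.

```python
from typing import List


def binary_mask(plane: List[List[float]], voting_threshold: float) -> List[List[int]]:
    """1 where the cumulative voting value exceeds the voting threshold, else 0."""
    return [[1 if v > voting_threshold else 0 for v in row] for row in plane]
```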

The voting threshold may be stored in advance in a memory inside the moving object detection unit 15, or may be computed inside or outside the moving object detection unit 15. In the hierarchical reference structure shown in FIG. 2, the magnitude of the cumulative voting values differs by layer, so the voting threshold may be changed according to the layer. For example, if the voting threshold held in the internal memory of the moving object detection unit 15 is Th and the moving object detection target frame belongs to layer H (H = 0, 1, 2, 3 in the example of FIG. 2), the voting threshold Th / 2^H could be used. Adaptively changing the voting threshold according to the layer in the reference structure or the characteristics of the video enables more accurate moving object detection.
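The layer-dependent threshold Th / 2^H can be written directly. Halving per layer presumably reflects that frames in higher layers are referenced by fewer frames and so accumulate fewer votes; the function name is hypothetical.

```python
def layer_threshold(base_threshold: float, layer: int) -> float:
    """Voting threshold Th / 2^H for a target frame in layer H."""
    return base_threshold / (2 ** layer)
```

With Th = 16, a layer-0 frame uses 16 and a layer-3 frame (an odd-numbered frame in FIG. 2) uses 2.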

The moving object detection result may be output, for example, as a binary mask image in which moving object regions have the value 1 and all other regions the value 0, as shown in FIG. 15. Alternatively, as shown in FIG. 16, the moving object regions may be labeled to distinguish them from one another, a rectangle enclosing each region (a so-called "bounding box") defined, and the coordinates of the rectangle (for example, its upper-left corner) and its size (for example, the lengths of its two sides) output (step ST107 in FIG. 17).
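Labeling and bounding-box extraction can be sketched with a breadth-first search over 4-connected foreground cells. The choice of 4-connectivity and the (left, top, width, height) output tuple are assumptions for illustration, not the labeling method of this document.

```python
from collections import deque
from typing import List, Tuple


def bounding_boxes(mask: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """Label 4-connected regions of 1s; return (left, top, width, height) per region."""
    h, w = len(mask), len(mask[0]) if mask else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if mask[y0][x0] != 1 or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            left, top, right, bottom = x0, y0, x0, y0
            while queue:                 # flood-fill one region, tracking its extent
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right - left + 1, bottom - top + 1))
    return boxes
```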

As described above, the moving object detection device 100 of Embodiment 1 comprises: the variable length decoding unit 11, which decodes, from a video bitstream encoded by motion-compensated prediction, the motion vector of each coding block between its encoding target frame and reference frame; the voting value calculation unit 12, which calculates a voting value for each of the plurality of motion vectors; the voting unit 14, which votes the voting value calculated by the voting value calculation unit 12 at the coordinates indicated by each motion vector on the voting plane corresponding to that motion vector's reference frame; the cumulative voting value storage unit 13, which stores cumulative voting values obtained by accumulating, for each coordinate of the voting plane, the voting values cast by the voting unit 14; and the moving object detection unit 15, which detects moving objects contained in the video of the video bitstream using the cumulative voting values. Because moving objects are detected from the motion vectors of the encoded video by using multiple motion vectors across multiple frames jointly, through voting that exploits the reference relationships between motion vectors, the influence of unreliable motion vectors is reduced and moving objects can be detected with high accuracy.

The voting value calculation unit 12 calculates the voting value for each motion vector using an index decoded from the video bitstream by the variable length decoding unit 11. Specifically, for example, the voting value calculation unit 12 calculates a larger voting value the smaller the block size of the coding block having the motion vector. Determining voting values from block size makes the voting value large for motion vectors of small blocks, which are likely to belong to moving objects, and small for motion vectors of large blocks, which improves the accuracy of moving object detection based on motion vector voting.
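The exact formula is not given in this section. As a hypothetical example of a voting value that grows as the block shrinks, the sketch below makes it inversely proportional to the block area, normalized so that an assumed smallest block of 8x8 gets 1.0.

```python
def block_size_vote(width: int, height: int, smallest_area: int = 64) -> float:
    """Hypothetical voting value, inversely proportional to block area (8x8 -> 1.0)."""
    return smallest_area / (width * height)
```

A 16x16 block then votes 0.25 and a 64x64 block about 0.016, so large background blocks contribute little.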

The voting value calculation unit 12 also corrects the calculated voting value of each motion vector using a weighting function. Specifically, for example, the voting value calculation unit 12 uses a Gaussian function so that the voting value is larger for coordinates closer to the center of the coding block. Correcting the voting values with a weighting function makes the voting value large at the block center, which is likely to belong to a moving object, and progressively smaller away from the center, improving the accuracy of moving object detection based on motion vector voting.
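A minimal sketch of Gaussian weighting over one block follows. It assumes an unnormalized 2-D Gaussian centered on the block (peak weight 1, so the center keeps the full voting value); the spread `sigma` is a free, hypothetical parameter, for example half the block width.

```python
import math
from typing import List


def gaussian_weights(width: int, height: int, sigma: float) -> List[List[float]]:
    """Unnormalized 2-D Gaussian over a block: 1.0 at the center, decaying outward."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]


def weighted_votes(vote: float, width: int, height: int, sigma: float) -> List[List[float]]:
    """Spread one block's voting value over the block with Gaussian weights."""
    return [[vote * w for w in row] for row in gaussian_weights(width, height, sigma)]
```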

The coordinates of the voting plane indicate regions that each contain multiple pixels. Because motion vectors can be off by several pixels due to errors, voting on regions as shown in FIG. 8 reduces the influence of such offsets and yields more stable voting results.
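Region-based voting amounts to quantizing each target pixel position to a cell before accumulating. The sketch assumes square cells of `cell_size` pixels (8 by default, a hypothetical value) and a sparse dictionary as the voting plane.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def vote_on_cells(targets: Iterable[Tuple[int, int, float]],
                  cell_size: int = 8) -> Dict[Tuple[int, int], float]:
    """Accumulate votes per cell; targets are (px, py, vote) pixel positions pointed to."""
    plane: Dict[Tuple[int, int], float] = defaultdict(float)
    for px, py, vote in targets:
        plane[(px // cell_size, py // cell_size)] += vote
    return dict(plane)
```

Two vectors that disagree by a couple of pixels, such as those landing at (17, 30) and (19, 31), still reinforce the same cell (2, 3).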

The voting unit 14 judges the reliability of each motion vector and excludes some motion vectors from voting according to that reliability. Specifically, for example, the voting unit 14 compares the interval between each motion vector's encoding target frame and reference frame with the interval threshold and excludes motion vectors whose interval exceeds the interval threshold. Alternatively, the voting unit 14 compares the energy of the discrete cosine transform coefficients of the coding block having each motion vector with the energy threshold and excludes motion vectors whose energy exceeds the energy threshold. Judging motion vector reliability and not voting with unreliable motion vectors reduces their influence and enables more accurate moving object detection.

The voting unit 14 performs voting after correcting the global motion contained in the motion vectors. Voting after correcting the global motion caused by camera motion and changes in camera parameters enables more accurate moving object detection.

After voting the voting values calculated by the voting value calculation unit 12 at the coordinates indicated by each motion vector on the voting plane corresponding to that motion vector's reference frame, the voting unit 14 votes the cumulative voting value at each coordinate of that reference frame's voting plane at the coordinates of the voting plane indicated by the vector obtained by inverting the sign of the motion vector. Performing this reverse voting after the initial voting is completed gives highly reliable voting values even to frames that are rarely referenced by other frames, such as frames 1, 2, 3, 5, 6, and 7.

The moving object detection unit 15 determines the regions of the moving object detection target frame in which moving objects are present by comparing the cumulative voting value at each coordinate with the voting threshold. The frames in the video bitstream have a hierarchical reference structure, and the moving object detection unit 15 sets the voting threshold according to the layer of the moving object detection target frame. Adaptively changing the voting threshold according to the layer in the reference structure enables more accurate moving object detection.

Embodiment 2.
FIG. 18 is a block diagram showing the main part of the video decoding device according to Embodiment 2. With reference to FIG. 18, a video decoding device 300 that includes a moving object detection device 100 similar to that of Embodiment 1 will be described.

The variable length decoding unit 11 is a processing circuit that receives as input a video bitstream containing frames encoded using motion-compensated prediction and decodes it according to the method specified by the corresponding encoding scheme.

The inverse quantization/inverse transform unit 21 is a processing circuit that receives the DCT coefficients decoded by the variable length decoding unit 11, inversely quantizes and inversely transforms them, and outputs the prediction residual.

The addition unit 22 is an arithmetic unit that receives the prediction residual output from the inverse quantization/inverse transform unit 21 and the predicted image output from the intra prediction unit 23 or the motion compensation prediction unit 24, and adds them to generate and output a decoded image.
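The addition performed by unit 22 can be sketched per sample as below. The clipping to the valid sample range and the default 8-bit depth are assumptions modeled on common decoders, not details stated in this text.

```python
from typing import List


def reconstruct(prediction: List[List[int]], residual: List[List[int]],
                bit_depth: int = 8) -> List[List[int]]:
    """Decoded samples = prediction + residual, clipped to [0, 2^bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]
```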

The intra prediction memory 25 is a storage element that stores the decoded image output from the addition unit 22.

The loop filter unit 26 is a processing circuit that receives the decoded image output from the addition unit 22, applies loop filtering, and outputs the result.

The motion compensation prediction memory 27 is a storage element that receives the decoded image loop-filtered by the loop filter unit 26 and stores it as a reference image.

The intra prediction unit 23 is a processing circuit that reads pixel data of the decoded image stored in the intra prediction memory 25, performs intra prediction, and generates and outputs an intra-predicted image.

The motion compensation prediction unit 24 is a processing circuit that reads a reference image stored in the motion compensation prediction memory 27, performs motion-compensated prediction, and generates and outputs a motion-compensated predicted image.

The voting value calculation unit 12, cumulative voting value storage unit 13, voting unit 14, and moving object detection unit 15 are the same blocks as in Embodiment 1, so their description is omitted.

The variable length decoding unit 11, voting value calculation unit 12, cumulative voting value storage unit 13, voting unit 14, and moving object detection unit 15 constitute the moving object detection device 100. The variable length decoding unit 11, inverse quantization/inverse transform unit 21, addition unit 22, intra prediction unit 23, motion compensation prediction unit 24, intra prediction memory 25, loop filter unit 26, and motion compensation prediction memory 27 constitute the decoded image generation unit 200. The moving object detection device 100 and the decoded image generation unit 200 constitute the video decoding device 300.

Next, the operation of the video decoding device 300 will be described.
The decoded image generation unit 200 operates in the same way as a general video decoding device, so its description is omitted. Within the moving object detection device 100, the variable length decoding unit 11, voting value calculation unit 12, cumulative voting value storage unit 13, and voting unit 14 operate as in Embodiment 1, so their description is also omitted.

When voting for the moving object detection target frame is completed, the moving object detection unit 15 reads the voting plane corresponding to that frame from the cumulative voting value storage unit 13 and performs threshold processing for moving object detection: it compares the cumulative voting value stored at each coordinate of the voting plane with the voting threshold and detects moving objects by selecting only regions where the cumulative voting value exceeds the voting threshold.

The moving object detection result may be output, for example, as a binary mask image in which moving object regions have the value 1 and all other regions the value 0, as shown in FIG. 15. Alternatively, as shown in FIG. 16, the moving object regions may be labeled to distinguish them from one another, a bounding box enclosing each region defined, and the coordinates of the rectangle (for example, its upper-left corner) and its size (for example, the lengths of its two sides) output.

The moving object detection unit 15 may also receive the decoded image from the decoded image generation unit 200 and output the decoded image combined with the moving object detection result. For example, if the input decoded image is as shown in FIG. 19, the moving object detection unit 15 may generate and output the image shown in FIG. 20 by setting the pixel values of all regions other than the detected moving object regions to 0 (painting them black) while leaving the pixel values of the detected moving object regions unchanged.
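The FIG. 20 style of output can be sketched as a per-pixel selection. Since the voting plane may be coarser than the image, the sketch assumes the mask has one entry per `cell_size` x `cell_size` pixel cell (1 means pixel resolution); the function name is hypothetical.

```python
from typing import List


def mask_decoded_image(image: List[List[int]], mask: List[List[int]],
                       cell_size: int = 1) -> List[List[int]]:
    """Keep pixels inside detected regions; set all other pixels to 0 (black)."""
    return [[pix if mask[y // cell_size][x // cell_size] else 0
             for x, pix in enumerate(row)]
            for y, row in enumerate(image)]
```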

Alternatively, as shown in FIG. 21, the moving object regions may be labeled to distinguish them from one another, and a bounding box enclosing each region defined, superimposed on the decoded image, and output.
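Superimposing bounding boxes can be sketched by drawing each box outline into a copy of the decoded image. It assumes boxes in (left, top, width, height) pixel form and a single-channel image with the outline value 255; both are illustrative choices.

```python
from typing import List, Tuple


def draw_boxes(image: List[List[int]], boxes: List[Tuple[int, int, int, int]],
               value: int = 255) -> List[List[int]]:
    """Return a copy of the image with each (left, top, width, height) box outlined."""
    out = [row[:] for row in image]
    h, w = len(out), len(out[0])
    for left, top, bw, bh in boxes:
        right, bottom = left + bw - 1, top + bh - 1
        for x in range(left, right + 1):         # top and bottom edges
            for y in (top, bottom):
                if 0 <= x < w and 0 <= y < h:
                    out[y][x] = value
        for y in range(top, bottom + 1):         # left and right edges
            for x in (left, right):
                if 0 <= x < w and 0 <= y < h:
                    out[y][x] = value
    return out
```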

As described above, the video decoding device 300 of Embodiment 2 comprises the moving object detection device 100, similar to that of Embodiment 1, and the decoded image generation unit 200, which generates decoded images from the video bitstream. Because moving objects are detected from the motion vectors of the encoded video by using multiple motion vectors across multiple frames jointly, through voting that exploits the reference relationships between motion vectors, the influence of unreliable motion vectors is reduced and moving objects can be detected with high accuracy.

The moving object detection unit 15 also combines the moving object detection result with the decoded image generated by the decoded image generation unit 200 and outputs the result externally. Combining the decoded image obtained by decoding the bitstream with the moving object detection result in this way presents the result in a form that is easy for the user to understand visually.

Within the scope of the invention, the embodiments may be freely combined, any component of each embodiment may be modified, and any component of each embodiment may be omitted.

Reference Signs List: 11 variable length decoding unit, 12 voting value calculation unit, 13 cumulative voting value storage unit, 14 voting unit, 15 moving object detection unit, 21 inverse quantization/inverse transform unit, 22 addition unit, 23 intra prediction unit, 24 motion compensation prediction unit, 25 intra prediction memory, 26 loop filter unit, 27 motion compensation prediction memory, 100 moving object detection device, 200 decoded image generation unit, 300 video decoding device.

Claims

1. A moving object detection device comprising:
a variable length decoding unit that decodes, from a video bitstream encoded by motion-compensated prediction, motion vectors each between the encoding target frame of a coding block and a reference frame;
a voting value calculation unit that calculates a voting value for each of the plurality of motion vectors;
a voting unit that votes the voting value calculated by the voting value calculation unit at the coordinates indicated by each motion vector on a voting plane corresponding to the reference frame of that motion vector;
a cumulative voting value storage unit that stores a cumulative voting value obtained by accumulating, for each coordinate of the voting plane, the voting values voted by the voting unit; and
a moving object detection unit that detects a moving object contained in the video of the video bitstream using the cumulative voting value,
wherein the voting value calculation unit calculates the voting value for each motion vector using an index decoded from the video bitstream by the variable length decoding unit, and calculates a larger voting value the smaller the block size of the coding block having the motion vector.

2. The moving object detection device according to claim 1, wherein the voting value calculation unit corrects the calculated voting value for the motion vector using a weighting function.

3. The moving object detection device according to claim 2, wherein the voting value calculation unit uses a Gaussian function to make the voting value larger for a motion vector indicating coordinates closer to the center of the coding block.

4. The moving object detection device according to claim 1, wherein the coordinates of the voting plane are coordinates indicating a region containing a plurality of pixels.

5. The moving object detection device according to claim 1, wherein the voting unit judges the reliability of each of the motion vectors and excludes some of the motion vectors from voting according to the reliability.

6. The moving object detection device according to claim 5, wherein the voting unit compares the interval between the encoding target frame and the reference frame of each motion vector with an interval threshold, and excludes from voting any motion vector whose interval exceeds the interval threshold.

7. The moving object detection device according to claim 5, wherein the voting unit compares the energy of the discrete cosine transform coefficients of the coding block having each motion vector with an energy threshold, and excludes from voting any motion vector whose energy exceeds the energy threshold.

8. The moving object detection device according to claim 1, wherein the voting unit performs voting after correcting the global motion contained in the motion vectors.

9. The moving object detection device according to claim 1, wherein, after voting the voting value calculated by the voting value calculation unit at the coordinates indicated by each motion vector on the voting plane corresponding to the reference frame of that motion vector, the voting unit votes the cumulative voting value at each coordinate of the voting plane corresponding to the reference frame at the coordinates of the voting plane indicated by a vector obtained by inverting the sign of the motion vector.

10. The moving object detection device, wherein the moving object detection unit determines the region of a moving object detection target frame in which the moving object exists by comparing the cumulative voting value at each coordinate with a voting threshold.

11. The moving object detection device according to claim 10, wherein the frames contained in the video bitstream have a hierarchical reference structure, and the moving object detection unit sets the voting threshold according to the layer of the moving object detection target frame.

12. The moving object detection device according to claim 10, wherein the moving object detection unit outputs a binary mask image indicating the region in which the moving object exists.

13. The moving object detection device according to claim 10, wherein the moving object detection unit outputs the coordinates and size of a bounding box enclosing the region in which the moving object exists.

14. A video decoding device comprising:
the moving object detection device according to claim 1; and
a decoded image generation unit that generates a decoded image from the video bitstream.

15. The video decoding device according to claim 14, wherein the moving object detection unit combines the detection result of the moving object with the decoded image generated by the decoded image generation unit and outputs the result.

16. A moving object detection method comprising:
decoding, by a variable length decoding unit, motion vectors each between the encoding target frame of a coding block and a reference frame from a video bitstream encoded by motion-compensated prediction;
calculating, by a voting value calculation unit, a voting value for each of the plurality of motion vectors;
voting, by a voting unit, the voting value calculated by the voting value calculation unit at the coordinates indicated by each motion vector on a voting plane corresponding to the reference frame of that motion vector;
storing, by a cumulative voting value storage unit, a cumulative voting value obtained by accumulating, for each coordinate of the voting plane, the voting values voted by the voting unit; and
detecting, by a moving object detection unit, a moving object contained in the video of the video bitstream using the cumulative voting value,
wherein the voting value calculation unit calculates the voting value for each motion vector using an index decoded from the video bitstream by the variable length decoding unit, and calculates a larger voting value the smaller the block size of the coding block having the motion vector.