JP2006254308A

JP2006254308A - Video recording apparatus and video recording method

Info

Publication number: JP2006254308A
Application number: JP2005071005A
Authority: JP
Inventors: Masaaki Sato; 正章佐藤
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-03-14
Filing date: 2005-03-14
Publication date: 2006-09-21

Abstract

PROBLEM TO BE SOLVED: To provide a video recording apparatus and a video recording method capable of automatically recording high-resolution video information.

SOLUTION: From an input video signal, only the moving regions are extracted, and from those moving regions only the parts in which a person's face appears are extracted. The frame in which the person's face is captured best is detected, and a best-shot trigger signal is generated. The input video signal is frequency-converted into a frequency-domain video signal, which is then divided into a plurality of subband video signals. When the best-shot trigger signal is generated, all the subband video signals of the frame at that time are selected; when it is not generated, only some low-frequency-side subband video signals of that frame are selected. The selected subband video signals are compressed into subband compressed video signals, which are multiplexed and recorded in a storage 112 as a compressed video signal.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a video recording apparatus and a video recording method that compress input video and record it on a recording medium such as a hard disk.

Conventionally, some video recording apparatuses allow the resolution to be switched by a user operation (see, for example, Patent Document 1). In the moving-image and still-image recording/reproducing method disclosed in Patent Document 1, ordinary moving-image frames are recorded at a pixel size of, for example, 320 × 240, but when a high-image-quality mode button is pressed, the frame image acquired at that moment can be recorded at a pixel size of 1280 × 960.

Patent Document 1: JP 2003-125344 A (page 6, FIG. 1)

However, because a conventional video recording apparatus switches resolution only in response to a user operation, it cannot record high-resolution video information automatically.

The present invention has been made in view of these circumstances, and an object thereof is to provide a video recording apparatus and a video recording method capable of automatically recording high-resolution video information.

The above object is achieved by the following configurations and methods.

(1) A video recording apparatus that stores an input video signal in a recording medium, comprising: frequency conversion means for frequency-converting the input video signal to obtain a frequency-domain video signal; subband dividing means for dividing the frequency-domain video signal obtained by the frequency conversion means into a plurality of subband video signals; selection means for selecting, for video designated for high-resolution recording, all of the subband video signals obtained by the subband dividing means and, for video not so designated, a low-frequency-side subset of those subband video signals; compression means for compressing each subband video signal selected by the selection means to generate subband compressed video signals; multiplexing means for multiplexing, for video designated for high-resolution recording, all the subband compressed video signals selected by the selection means and compressed by the compression means and, for video not so designated, the low-frequency-side subset of subband compressed video signals selected by the selection means and compressed by the compression means; and recording control means for recording the subband compressed video signals multiplexed by the multiplexing means on the recording medium. With this configuration, high-resolution video information can be recorded automatically. Moreover, because not all of the video is recorded at high resolution and video that carries no meaning is recorded at low resolution, the recording medium can be used effectively.

(2) A video recording apparatus that stores a video signal input in units of frames in a recording medium, comprising: best-shot trigger signal generating means for performing image processing on the input video signal to detect the frame in which a person is captured best and generating a best-shot trigger signal; frequency conversion means for frequency-converting the input video signal to obtain a frequency-domain video signal; subband dividing means for dividing the frequency-domain video signal obtained by the frequency conversion means into a plurality of subband video signals; selection means for selecting, when the best-shot trigger signal is generated by the best-shot trigger signal generating means, all the subband video signals corresponding to the frame at that time and, when the best-shot trigger signal is not generated, a low-frequency-side subset of the subband video signals corresponding to the frame at that time; compression means for compressing each subband video signal selected by the selection means and outputting it as a subband compressed video signal; multiplexing means for multiplexing the subband video signals compressed by the compression means and outputting them as a compressed video signal; and recording control means for controlling recording of the compressed video signal from the multiplexing means on the recording medium.
With this configuration, only the image in which a person's face is captured best is recorded at high resolution; that is, high-resolution video information can be acquired automatically from the input video signal and recorded. Moreover, because not all of the video is recorded at high resolution and video other than the best shot is recorded at low resolution, the recording medium can be used effectively.

(3) A video recording method for storing an input video signal in a recording medium, comprising: frequency-converting the input video signal and dividing the resulting frequency-domain video signal into a plurality of subband video signals; for video designated for high-resolution recording, compressing all of the subband video signals of that video to generate subband compressed video signals, multiplexing all the resulting subband compressed video signals, and recording them on the recording medium; and, for video not designated for high-resolution recording, compressing a low-frequency-side subset of the subband video signals of that video to generate subband compressed video signals, multiplexing the resulting subset, and recording it on the recording medium. With this configuration, high-resolution video information can be recorded automatically. Moreover, because not all of the video is recorded at high resolution and video that carries no meaning is recorded at low resolution, the recording medium can be used effectively.

(4) A video recording method for receiving a video signal as input and storing it in a recording medium, comprising: extracting only the regions with motion from the input video signal; extracting, from the extracted moving regions, the parts in which a person appears; detecting the frame in which the person is captured best and generating a best-shot trigger signal; frequency-converting the input video signal into a frequency-domain video signal and dividing that signal into a plurality of subband video signals; selecting, when the best-shot trigger signal is generated, all the subband video signals corresponding to the frame at that time and, when it is not generated, a low-frequency-side subset of the subband video signals corresponding to that frame; compressing each selected subband video signal into a subband compressed video signal; and multiplexing the subband compressed video signals and recording them on the recording medium as a compressed video signal. With this configuration, the image in which a person's face is captured best can be recorded at high resolution; that is, this invention, too, can automatically acquire high-resolution video information from the input video signal and record it. Moreover, because not all of the video is recorded at high resolution and video other than the best shot is recorded at low resolution, the recording medium can be used effectively.

The present invention can provide a video recording apparatus and a video recording method that, by designating the video to be recorded at high resolution, automatically acquire high-resolution video information from the input video signal and record it.

Preferred embodiments for carrying out the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the schematic configuration of a video recording apparatus according to an embodiment of the present invention. In this figure, the video recording apparatus 100 of the present embodiment is used for surveillance in which people are recorded, and comprises a moving-region extraction unit 101, a human-region extraction unit 102, a best-shot determination unit 103, a subband division unit 104, a high-frequency buffer 105, a low-frequency buffer 106, a selection unit 107, a high-frequency compression unit 108, a low-frequency compression unit 109, a subband multiplexing unit 110, a recording control unit 111, and a storage 112.

The subband division unit 104 corresponds to the frequency conversion means and the subband dividing means; the selection unit 107 corresponds to the selection means; the high-frequency compression unit 108 and the low-frequency compression unit 109 correspond to the compression means; the subband multiplexing unit 110 corresponds to the multiplexing means; and the recording control unit 111 corresponds to the recording control means. The moving-region extraction unit 101, the human-region extraction unit 102, and the best-shot determination unit 103 together constitute the best-shot trigger signal generating means.

The moving-region extraction unit 101 receives as input a video signal from a camera (not shown), or a signal obtained by processing that video signal, detects motion in the video, extracts only the regions where motion occurred, and inputs them to the human-region extraction unit 102 as a moving-region signal. The most common technique, background subtraction, is used here to extract moving regions, but other techniques such as inter-frame differencing, optical flow, or stereo vision may be used instead. The output of the moving-region extraction unit 101 is assumed to be a binary image in which pixels with motion are "1" and pixels without motion are "0".
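A minimal sketch of background subtraction producing the binary image described above, assuming 8-bit grayscale frames given as nested lists, a fixed difference threshold of 30, and a running-average background update. The threshold, the update rule, and the function names are hypothetical; the text does not specify them.

```python
def extract_moving_region(frame, background, threshold=30):
    """Background subtraction: 1 where |frame - background| > threshold, else 0."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def update_background(background, frame, alpha=0.05):
    """Running-average background model (an assumed choice; the text names none)."""
    return [
        [(1 - alpha) * b + alpha * p for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 200, 10], [12, 180, 9]]
mask = extract_moving_region(frame, background)
print(mask)  # [[0, 1, 0], [0, 1, 0]]
```

Small fluctuations (12 vs. 10, 9 vs. 10) stay below the threshold, so only the column that actually changed is marked as moving.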

The human-region extraction unit 102 determines, from the moving-region signal input from the moving-region extraction unit 101, whether the moving region is a person, and outputs a human-region signal only when it judges that it is. In other words, the human-region extraction unit 102 operates only when there is motion in the video. When there is motion, the moving-region extraction unit 101 outputs a binary image that distinguishes regions that moved from those that did not; in this embodiment, whether a region is a person is determined by detecting the head with the elliptical Hough method. To apply this method, edge detection is performed on the binary image input from the moving-region extraction unit 101, and the elliptical Hough transform is used to judge whether the shape of the detected edges is close to an ellipse. Because a person's head appears elliptical whenever a human body is in the video, the elliptical Hough method is an effective way to judge with high accuracy whether a region is a person. An alternative to the elliptical Hough method is to detect facial structures such as the eyes and nose.

For example, boosting, a technique widely used in recent years, can detect the shading of facial features such as the eyes and nose at high speed by comparison with very coarse templates. The advantage of the elliptical Hough method over face-structure detection is that it can detect a person with high accuracy even when the facial structure cannot be clearly detected, for example when the person faces away from the camera. Because, depending on where the camera is installed, the face of a passing person does not always enter the camera's field of view, the elliptical Hough method is adopted in this embodiment. Where conditions allow the face structure to be captured reliably, such as when the camera is installed on a door, a face-structure detection method may be used instead.
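The head check can be illustrated with a simplified elliptical Hough transform. This sketch assumes axis-aligned ellipses, a small hypothetical list of candidate semi-axis pairs, 64 sampled angles, and edges taken as foreground pixels of the binary mask that touch background; all names are illustrative. A practical detector would also search over orientation and threshold the vote count before declaring a head.

```python
import math


def edge_pixels(mask):
    """Foreground pixels of a binary mask that touch background (4-neighbourhood)."""
    h, w = len(mask), len(mask[0])
    edges = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    edges.append((x, y))
                    break
    return edges


def hough_ellipse(edges, axes_candidates, n_angles=64):
    """Vote for centres of axis-aligned ellipses with candidate semi-axes (a, b).

    Returns (votes, (cx, cy), (a, b)) for the best-supported ellipse.
    """
    best = (0, None, None)
    for a, b in axes_candidates:
        acc = {}
        for x, y in edges:
            cells = set()  # each edge point votes at most once per cell
            for k in range(n_angles):
                t = 2 * math.pi * k / n_angles
                cells.add((round(x - a * math.cos(t)), round(y - b * math.sin(t))))
            for cell in cells:
                acc[cell] = acc.get(cell, 0) + 1
        if acc:
            centre, votes = max(acc.items(), key=lambda kv: kv[1])
            if votes > best[0]:
                best = (votes, centre, (a, b))
    return best


# Synthetic head outline: ellipse centred at (20, 20), semi-axes 6 (x) and 9 (y).
outline = sorted({(round(20 + 6 * math.cos(2 * math.pi * k / 64)),
                   round(20 + 9 * math.sin(2 * math.pi * k / 64))) for k in range(64)})
votes, centre, axes = hough_ellipse(outline, [(6, 9), (15, 3)])
print(centre, axes)  # (20, 20) (6, 9)
```

Every outline point votes for the true centre when the matching semi-axes are tried, so that cell collects one vote per point, while the wrong candidate spreads its votes over many cells.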

The best-shot determination unit 103 evaluates, from the human-region signal, how well the person is captured and designates the best scene. That is, the best-shot determination unit 103 selects, from a series of images, the one frame in which the person is captured best. In this embodiment, the best shot is selected on four criteria: the size of the person's face, the contrast of the face, the absence of missing parts, and the orientation of the face. The face size is calculated from the size of the template matched by the elliptical Hough method. If the match with the elliptical template is concentrated on only part of the ellipse, the face is judged to be partly missing. The face orientation is estimated from the horizontal luminance gradient across the face.

In general, a frontal face has a flat luminance gradient, whereas a face turned to the right or left produces a gradient because the back of the head is dark and the face is bright. The face contrast can be calculated from the range of luminance values in the face region. In this embodiment, a single best shot is selected by majority vote over these four criteria: rather than letting a single high-scoring criterion decide, the latest frame becomes the best shot, replacing the previous one, when a majority of the criteria (three in this example) are better than those of the best shot so far.
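The text fixes the four criteria and the three-of-four majority rule but not how each criterion is scored. The sketch below makes hypothetical choices: contrast is the luminance range of the face pixels, and frontalness uses the brightness difference between the left and right halves of the face as a proxy for the horizontal luminance gradient. The `FaceScore` fields and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class FaceScore:
    size: float          # e.g. matched ellipse-template size; larger is better
    contrast: float      # luminance range inside the face; larger is better
    completeness: float  # share of the template that is matched; higher = less missing
    frontalness: float   # 1 / (1 + |horizontal luminance gradient|); higher = more frontal


def face_contrast(face_pixels):
    """Contrast as the luminance range (max - min) of the face region."""
    return max(face_pixels) - min(face_pixels)


def frontalness(face_rows):
    """Score near 1.0 when the left and right halves of the face are equally bright."""
    diffs = []
    for row in face_rows:
        half = len(row) // 2
        left = sum(row[:half]) / half
        right = sum(row[half:]) / (len(row) - half)
        diffs.append(right - left)
    gradient = sum(diffs) / len(diffs)
    return 1.0 / (1.0 + abs(gradient))


def is_new_best(candidate, best, majority=3):
    """Replace the current best shot when a majority of the four criteria improve."""
    if best is None:
        return True
    wins = sum([
        candidate.size > best.size,
        candidate.contrast > best.contrast,
        candidate.completeness > best.completeness,
        candidate.frontalness > best.frontalness,
    ])
    return wins >= majority


print(face_contrast([40, 120, 200]))  # 160
# Bigger and more contrasty, but equally complete and less frontal: only 2 wins.
print(is_new_best(FaceScore(30, 80, 1.0, 0.9), FaceScore(25, 70, 1.0, 0.95)))  # False
```

Requiring a majority keeps a frame that excels on one measure (for example, a large but turned-away face) from displacing a more balanced earlier shot.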

The subband division unit 104 frequency-converts the input video signal into a frequency-domain video signal and divides it into a plurality of subbands; that is, it splits the input video into fixed subbands according to frequency. This embodiment assumes a wavelet transform that splits the image into two bands horizontally and two bands vertically. FIG. 2(a) shows the result of this division. Here "H" denotes the high-frequency side and "L" the low-frequency side: "LL" is the subband that is low in both the horizontal and vertical directions, "HL" is high horizontally and low vertically, "LH" is low horizontally and high vertically, and "HH" is high in both directions. Because "LL" is a reduced image one quarter the size of the original (half the width and half the height), a reduced image for thumbnail display and the like is easy to obtain, which is an advantage of subband coding. Splitting LL once more horizontally and vertically yields the seven bands shown in FIG. 2(b), in which the LLLL subband is a reduced image 1/16 the size of the original.
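The text does not name the wavelet. As an assumption, this sketch uses the simplest one, the Haar wavelet with averaging normalisation, so that LL stays at pixel scale like a thumbnail. Band names follow the text's convention (first letter horizontal, second vertical); function names are hypothetical.

```python
def haar_1d(values):
    """One level of an averaging Haar transform: returns (low, high) halves."""
    pairs = range(0, len(values), 2)
    low = [(values[i] + values[i + 1]) / 2 for i in pairs]
    high = [(values[i] - values[i + 1]) / 2 for i in pairs]
    return low, high


def _vertical(rows):
    """Apply haar_1d to every column; return (vertical-low rows, vertical-high rows)."""
    low_cols, high_cols = zip(*(haar_1d(col) for col in zip(*rows)))
    return [list(r) for r in zip(*low_cols)], [list(r) for r in zip(*high_cols)]


def haar_2d(image):
    """Split an image (list of rows, even width and height) into four subbands.

    As in the text, the first letter is the horizontal band and the second
    the vertical band, so "HL" is horizontally high, vertically low.
    """
    row_low, row_high = zip(*(haar_1d(row) for row in image))  # horizontal pass
    ll, lh = _vertical(row_low)
    hl, hh = _vertical(row_high)
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}


def two_level(image):
    """Split LL once more, giving the seven bands of FIG. 2(b)."""
    bands = haar_2d(image)
    ll = bands.pop("LL")
    bands.update({"LL" + name: band for name, band in haar_2d(ll).items()})
    return bands


image = [[8] * 8 for _ in range(8)]
bands = two_level(image)
print(sorted(bands))  # ['HH', 'HL', 'LH', 'LLHH', 'LLHL', 'LLLH', 'LLLL']
print(len(bands["LLLL"]), len(bands["LLLL"][0]))  # 2 2  (1/16 of the 8x8 area)
```

For a flat image all detail bands are zero and LLLL is a 2 × 2 copy of the average brightness, which is exactly the thumbnail property the text relies on.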

Each subband obtained in this way is temporarily stored in a buffer, because determining the best shot requires waiting until the person leaves the field of view. The time until the person leaves varies with the scene and cannot be bounded in advance, and the buffer capacity is finite, so processing is needed for the case where the maximum stay time defined by the buffer capacity is exceeded. In this embodiment, when the maximum stay time is exceeded, the best shot found up to that point is output as the confirmed best shot, and best-shot determination restarts from the next frame. The LLLL subband of FIG. 2(b) is stored in the low-frequency buffer 106, and the other six subbands are stored in the high-frequency buffer 105.

The selection unit 107 reads data from the high-frequency buffer 105 and/or the low-frequency buffer 106 according to the output of the best-shot determination unit 103. For a frame determined to be the best shot, data is read from both buffers; for a frame determined not to be the best shot, data is read only from the low-frequency buffer 106. Thus all the subbands (LLLL to HH) are output for the frame determined to be the best shot by the best-shot determination unit 103, and only the low-frequency-side subset (LLLL) is selected and output for the other frames.

The best-shot determination unit 103 outputs which of the N frames stored in the buffers so far has been confirmed as the best shot, as a zero-based index. For example, if 30 frames are stored in the high-frequency buffer 105 and the low-frequency buffer 106 and the 22nd frame is determined to be the best shot, the value "21" is output. On receiving this output, the selection unit 107 operates as follows. Frames 1 to 21 are not the best shot, so the subbands of each of these frames stored in the low-frequency buffer 106 are read out and input to the low-frequency compression unit 109, and the data of frames 1 to 21 stored in the high-frequency buffer 105 is discarded.

The 22nd frame is the best shot, so its subbands are read from both the low-frequency buffer 106 and the high-frequency buffer 105 and input to the low-frequency compression unit 109 and the high-frequency compression unit 108. Specifically, the LLLL subband stored in the low-frequency buffer 106 is input to the low-frequency compression unit 109, and the LLHL, LLLH, LLHH, HL, LH, and HH subbands stored in the high-frequency buffer 105 are input to the high-frequency compression unit 108.

Frames 23 to 30, like frames 1 to 21, are not the best shot, so their subbands stored in the low-frequency buffer 106 are likewise read out and input to the low-frequency compression unit 109.
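The selection step in the 30-frame example can be sketched as follows. The buffers are modelled as Python lists holding one dict of subbands per frame, and the function name `drain_buffers` is hypothetical; the forced flush at the maximum stay time is not modelled here.

```python
def drain_buffers(low_buffer, high_buffer, best_index):
    """Empty both buffers and decide what each buffered frame contributes.

    best_index is the zero-based buffer position reported by the best-shot
    determination (21 for the 22nd of 30 frames), or -1 if there is none.
    Returns (frame_number, low_subbands, high_subbands_or_None) per frame.
    """
    plan = []
    for i, (low, high) in enumerate(zip(low_buffer, high_buffer)):
        if i == best_index:
            plan.append((i + 1, low, high))   # both compressors receive data
        else:
            plan.append((i + 1, low, None))   # high-band data is discarded
    low_buffer.clear()
    high_buffer.clear()
    return plan


HIGH_BANDS = ("LLHL", "LLLH", "LLHH", "HL", "LH", "HH")
low_buffer = [{"LLLL": f"frame{n}-LLLL"} for n in range(1, 31)]
high_buffer = [{b: f"frame{n}-{b}" for b in HIGH_BANDS} for n in range(1, 31)]

plan = drain_buffers(low_buffer, high_buffer, best_index=21)
full_frames = [n for n, _, high in plan if high is not None]
print(full_frames)  # [22]
print(len(plan), len(low_buffer))  # 30 0
```

All 30 frames still reach the low-frequency compressor, but only frame 22 keeps its six high-frequency subbands.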

The high-frequency compression unit 108 compresses each subband input from the selection unit 107. As a result, it operates only on frames finally determined to be the best shot, compressing the six high-frequency-side subbands, and inputs the compressed subbands to the subband multiplexing unit 110. The low-frequency compression unit 109, in contrast, ends up operating on every frame, compressing the low-frequency reduced image and inputting it to the subband multiplexing unit 110. Many compression techniques exist and the present invention does not prescribe one; this embodiment assumes Huffman coding, which assigns code lengths according to how often each symbol occurs.
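The text assumes Huffman coding without further detail (quantisation, how the code table is stored, and so on). This sketch only builds Huffman code lengths from symbol frequencies and then assigns canonical codewords; the coefficient values and function names are invented for illustration.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over the symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate case: one distinct symbol
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in the subtree})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)      # two least frequent subtrees
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def canonical_codes(lengths):
    """Assign canonical prefix-free codewords from code lengths."""
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes


coefficients = [0, 0, 0, 0, 0, 1, 1, -1, 2]   # e.g. quantised high-band values
lengths = huffman_code_lengths(coefficients)
codes = canonical_codes(lengths)
print(codes)  # {0: '0', 1: '10', -1: '110', 2: '111'}
```

The most frequent value (0, typical of high-frequency wavelet bands) gets a 1-bit code, so the nine symbols take 15 bits instead of the 18 a fixed 2-bit code would need.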

The subband multiplexing unit 110 attaches additional information such as a header to the data input from the high-frequency compression unit 108 or the low-frequency compression unit 109 and inputs it to the recording control unit 111. All seven subbands are processed for the frame determined to be the best shot; for the other frames, only the low-frequency reduced image is processed. The recording control unit 111 records the data input from the subband multiplexing unit 110 in the storage 112. The type of storage is not particularly limited; a hard disk drive is assumed in this example.
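The text says only that "a header or similar" is added. The record layout below is therefore hypothetical: a frame header (frame number, subband count) followed by, for each subband, a 4-byte name, a payload length, and the compressed payload, packed with Python's `struct` module.

```python
import struct

# Hypothetical record layout:
#   frame header: frame number (uint32), subband count (uint8)
#   per subband:  4-byte ASCII name (NUL-padded), payload length (uint32), payload
FRAME_HEADER = struct.Struct(">IB")
BAND_HEADER = struct.Struct(">4sI")


def multiplex(frame_no, subbands):
    """Pack {subband name: compressed bytes} for one frame into a single record."""
    parts = [FRAME_HEADER.pack(frame_no, len(subbands))]
    for name, payload in subbands.items():
        parts.append(BAND_HEADER.pack(name.encode("ascii"), len(payload)))
        parts.append(payload)
    return b"".join(parts)


def demultiplex(record):
    """Inverse of multiplex: return (frame_no, {subband name: bytes})."""
    frame_no, count = FRAME_HEADER.unpack_from(record, 0)
    offset = FRAME_HEADER.size
    subbands = {}
    for _ in range(count):
        raw_name, length = BAND_HEADER.unpack_from(record, offset)
        offset += BAND_HEADER.size
        subbands[raw_name.rstrip(b"\0").decode("ascii")] = record[offset:offset + length]
        offset += length
    return frame_no, subbands


ordinary = multiplex(3, {"LLLL": b"ab"})                    # low-resolution frame
best = multiplex(4, {name: b"\xff" for name in
                     ("LLLL", "LLHL", "LLLH", "LLHH", "HL", "LH", "HH")})
print(len(ordinary), len(best))  # 15 68
print(demultiplex(ordinary))     # (3, {'LLLL': b'ab'})
```

Because each subband is length-prefixed, a player can pull out LLLL for a thumbnail without decoding the high-frequency bands, and ordinary frames carry only one subband's worth of overhead.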

Next, the operation of the video recording apparatus configured as above is described with reference to FIG. 3. In FIG. 3, the head of a person in the camera's field of view is drawn as a circle, and for ease of explanation only the face size is used as the best-shot criterion. Also for simplicity, a single-level division (split once horizontally and once vertically, as in FIG. 2(a)) is used.

When frame 0 is input, no person has yet appeared in the field of view, so the best-shot determination unit 103 outputs "-1" (no person). As a result, only the low-frequency LL subband of frame 0 is compressed and recorded in the storage 112.

When frame 1 is input, a person appears at the lower left. Frame 1 is divided into subbands; the low-frequency LL subband is held in the low-frequency buffer 106, and the high-frequency LH, HL, and HH subbands are held in the high-frequency buffer 105. Because there is no earlier data to compare with, frame 1 becomes the tentative best shot, and the best-shot determination unit 103 outputs "0".

When frame 2 is input, the person's face is larger, and the tentative best shot is updated to frame 2. Frame 2 is divided into subbands; the low-frequency LL subband is held in the low-frequency buffer 106, and the high-frequency LH, HL, and HH subbands are held in the high-frequency buffer 105. The best-shot determination unit 103 outputs "1". The input of frame 2 establishes that frame 1 is not the best shot, so only the LL subband of frame 1 is read from the low-frequency buffer 106, compressed, and recorded in the storage 112; the high-frequency data of frame 1, i.e., its LH, HL, and HH subbands, is discarded.

When frame 3 is input, the person's face is larger still, and the tentative best shot is updated to frame 3. Frame 3 is divided into subbands; the low-frequency LL subband is held in the low-frequency buffer 106, and the high-frequency LH, HL, and HH subbands are held in the high-frequency buffer 105. The best-shot determination unit 103 outputs "2". The input of frame 3 establishes that frame 2 is not the best shot, so only the LL subband of frame 2 is read from the low-frequency buffer 106, compressed, and recorded in the storage 112; the high-frequency data of frame 2, i.e., its LH, HL, and HH subbands, is discarded.

When frame 4 is input, the person's face is larger still, and the tentative best shot is updated to frame 4. Frame 4 is divided into subbands; the low-frequency LL subband is held in the low-frequency buffer 106, and the high-frequency LH, HL, and HH subbands are held in the high-frequency buffer 105. The best-shot determination unit 103 outputs "3". The input of frame 4 establishes that frame 3 is not the best shot, so only the LL subband of frame 3 is read from the low-frequency buffer 106, compressed, and recorded in the storage 112; the high-frequency data of frame 3, i.e., its LH, HL, and HH subbands, is discarded.

When frame 5 is input, the person's face is larger still, but part of it has moved off the screen, so the face is partly missing. The tentative best shot therefore remains frame 4. Frame 5 is divided into subbands; the low-frequency LL subband is held in the low-frequency buffer 106, and the high-frequency LH, HL, and HH subbands are held in the high-frequency buffer 105. The best-shot determination unit 103 outputs "3".

When the sixth frame is input, the person's face becomes smaller and is still partly cut off. The temporary best shot therefore remains the fourth frame. The sixth frame is divided into subbands: the low-frequency LL subband is held in the low-frequency buffer 106, and the high-frequency LH, HL, and HH subbands are held in the high-frequency buffer 105. The best shot determination unit 103 outputs “3”.

When the seventh frame is input, the face is no longer cut off, but it becomes smaller still. The temporary best shot therefore remains the fourth frame. The seventh frame is divided into subbands: the low-frequency LL subband is held in the low-frequency buffer 106, and the high-frequency LH, HL, and HH subbands are held in the high-frequency buffer 105. The best shot determination unit 103 outputs “3”.

When the eighth frame is input, the person moves off the screen, so the best shot is confirmed as the fourth frame. The best shot determination unit 103 outputs “−1”. The eighth frame is divided into subbands: the low-frequency LL subband is held in the low-frequency buffer 106, and the high-frequency LH, HL, and HH subbands are held in the high-frequency buffer 105.

With the input of the eighth frame, the maximum stay time is regarded as exceeded, and the best shot found up to this point is output as the confirmed best shot. That is, the fourth frame is taken as the best shot, and all of its subbands are compressed and recorded in the storage 112.
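The walkthrough implies a simple update rule for the temporary best shot: a larger, fully visible face replaces it, a cut-off face never does, and the best shot is confirmed when the person leaves the screen or the maximum stay time is exceeded. The sketch below encodes that reading under stated assumptions: face size is measured as a hypothetical `face_area`, the stay time is counted in frames, and `BestShotTracker` is an illustrative name rather than the actual logic of the best shot determination unit 103. It does not reproduce the unit's numeric outputs (“2”, “3”, “−1”).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaceObservation:
    face_area: Optional[int]  # None when no person/face is detected in the frame
    truncated: bool = False   # True when part of the face lies outside the frame


class BestShotTracker:
    """Hypothetical stand-in for best shot determination unit 103."""

    def __init__(self, max_stay_frames):
        self.max_stay_frames = max_stay_frames  # assumed to be counted in frames
        self._reset()

    def _reset(self):
        self.best_index = None  # current temporary best shot
        self.best_area = -1
        self.stay = 0           # frames the person has been on screen

    def update(self, index, obs):
        """Feed one frame; return the confirmed best-shot index or None."""
        if obs.face_area is None:  # person has left the screen
            return self._confirm()
        self.stay += 1
        # Only a complete (uncut) face larger than the current temporary
        # best shot replaces it.
        if not obs.truncated and obs.face_area > self.best_area:
            self.best_index, self.best_area = index, obs.face_area
        if self.stay > self.max_stay_frames:  # maximum stay time exceeded
            return self._confirm()
        return None

    def _confirm(self):
        confirmed = self.best_index
        self._reset()
        return confirmed
```

With made-up face areas that follow the narrative (growing through frame 4, cut off in frames 5 and 6, smaller in frame 7, absent in frame 8), feeding frames 1 to 8 returns `None` seven times and then `4`, matching the confirmed best shot described above.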

For the fifth to eighth frames, on the other hand, only the low-frequency subband is read out, compressed, and recorded in the storage 112, and their high-frequency data is discarded.
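The buffer handling in this walkthrough reduces to two flush operations: when a frame can no longer become the best shot, its LL subband is recorded and its high-frequency subbands are dropped; when the best shot is confirmed, it is recorded with all subbands and every other buffered frame with LL only. A minimal sketch follows, assuming plain dictionaries for the buffers and a list for the storage 112. The function names are hypothetical, and compression is omitted here.

```python
def flush_non_best(index, low_buf, high_buf, storage):
    """Frame `index` can no longer become the best shot: record LL only."""
    storage.append((index, {"LL": low_buf.pop(index)}))
    high_buf.pop(index, None)  # its LH, HL and HH data are discarded


def flush_confirmed(best_index, low_buf, high_buf, storage):
    """Best shot confirmed: record it with every subband, the rest with LL."""
    for index in sorted(low_buf):  # sorted() copies the keys, so popping is safe
        if index == best_index:
            bands = {"LL": low_buf.pop(index), **high_buf.pop(index)}
        else:
            bands = {"LL": low_buf.pop(index)}
            high_buf.pop(index, None)
        storage.append((index, bands))
```

Replaying frames 3 to 8 means calling `flush_non_best(3, ...)` when frame 4 arrives and `flush_confirmed(4, ...)` when frame 8 arrives. Frame 4 is then stored with LL, LH, HL, and HH, and frames 3 and 5 to 8 are stored with LL only. Under a one-level decomposition the LL subband holds a quarter of a frame's coefficients, so each non-best frame keeps 25% of its data before compression.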

As described above, the video recording apparatus 100 of this embodiment receives an input video signal, extracts only the regions containing motion, extracts from those moving regions only the portions in which a person's face appears, detects the frame in which the face is captured best, and generates a best shot trigger signal. It also frequency-converts the input video signal into a frequency-domain video signal and divides that signal into a plurality of subband video signals. When the best shot trigger signal is generated, all subband video signals of the current frame are selected; when it is not, only a low-frequency subset of the subband video signals of the current frame is selected. Each selected subband video signal is compressed into a subband compressed video signal, and these signals are multiplexed and recorded in the storage 112 as a compressed video signal. As a result, only the image in which the person's face is captured best is recorded at high resolution; in other words, high-resolution video information can be acquired from the input video signal and recorded automatically. Moreover, because the video is not recorded entirely at high resolution and frames other than the best shot are recorded at low resolution, the storage 112 is used efficiently.
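The selection, compression, and multiplexing steps summarized above can be sketched as follows. This is illustrative only: the patent does not specify the codec used by the compression units 108 and 109 or the stream layout produced by the subband multiplexing unit 110. Here `zlib` stands in for the compressor, and the record layout (a frame index and subband count, then a tag, length, and payload per subband) is a hypothetical format invented for the example.

```python
import struct
import zlib

BAND_IDS = {"LL": 0, "LH": 1, "HL": 2, "HH": 3}  # hypothetical subband tags
BAND_NAMES = {v: k for k, v in BAND_IDS.items()}


def select_subbands(bands, best_shot_trigger):
    """All subbands when the trigger fired for this frame, otherwise LL only."""
    if best_shot_trigger:
        return bands
    return {"LL": bands["LL"]}


def multiplex_frame(frame_index, selected):
    """Compress each selected subband (zlib as a stand-in) and pack one record."""
    out = bytearray(struct.pack(">IB", frame_index, len(selected)))  # 5-byte header
    for name, payload in selected.items():
        packed = zlib.compress(payload)
        out += struct.pack(">BI", BAND_IDS[name], len(packed)) + packed
    return bytes(out)


def demultiplex_frame(record):
    """Parse a record produced by multiplex_frame back into raw subbands."""
    frame_index, count = struct.unpack_from(">IB", record, 0)
    offset, bands = 5, {}
    for _ in range(count):
        band_id, length = struct.unpack_from(">BI", record, offset)
        offset += 5
        bands[BAND_NAMES[band_id]] = zlib.decompress(record[offset:offset + length])
        offset += length
    return frame_index, bands
```

A best-shot frame produces a record carrying all four subbands, and any other frame produces a shorter record carrying only LL. Because each subband is compressed and tagged separately, a reader can decode the LL part of every record without touching the high-frequency payloads.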

In this embodiment, which is intended for surveillance recording of people, the frame in which a person's face is captured best is recorded at high resolution and the other frames are recorded at low resolution. However, the recording target is not limited to a person and may be arbitrary. In short, it suffices to designate the video to be recorded at high resolution: the designated video is recorded at high resolution and the rest at low resolution.
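Because the designation criterion is arbitrary, the selection step can be written against a pluggable predicate instead of a face detector. The sketch below assumes a hypothetical `is_designated(frame_id)` callback and a one-level decomposition. It also reports the fraction of raw coefficients kept before compression, for a made-up 640×480 resolution that the patent does not state.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def plan_recording(frame_ids: Iterable[int],
                   is_designated: Callable[[int], bool],
                   width: int, height: int) -> Tuple[Dict[int, List[str]], float]:
    """Map each frame to the subbands it keeps; return the kept-coefficient ratio."""
    plan: Dict[int, List[str]] = {}
    stored = total = 0
    for fid in frame_ids:
        if is_designated(fid):
            plan[fid] = ["LL", "LH", "HL", "HH"]       # high resolution
            stored += width * height
        else:
            plan[fid] = ["LL"]                         # low resolution only
            stored += (width // 2) * (height // 2)     # LL is a quarter of the frame
        total += width * height
    return plan, stored / total
```

For eight frames with only frame 4 designated, `plan_recording(range(1, 9), lambda f: f == 4, 640, 480)` keeps 640·480 + 7·320·240 = 844,800 of 2,457,600 coefficients, a ratio of 0.34375, compared with 1.0 if every frame were recorded at full resolution.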

The present invention has the effect of automatically acquiring and recording high-resolution video information from an input video signal, and is useful as, for example, a video recording apparatus that compresses an input video signal and records it on a recording medium such as a hard disk.

FIG. 1 is a block diagram showing the schematic configuration of a video recording apparatus according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram for explaining the subband division performed by the video recording apparatus of FIG. 1.
FIG. 3 is a conceptual diagram for explaining the operation of the video recording apparatus of FIG. 1.

Explanation of symbols

100 Video recording apparatus
101 Moving region extraction unit
102 Human region extraction unit
103 Best shot determination unit
104 Subband division unit
105 High-frequency buffer
106 Low-frequency buffer
107 Selection unit
108 High-frequency compression unit
109 Low-frequency compression unit
110 Subband multiplexing unit
111 Recording control unit
112 Storage

Claims

A video recording apparatus for storing an input video signal in a recording medium, the apparatus comprising:
Frequency converting means for frequency-converting the input video signal to obtain a frequency-domain video signal;
Subband dividing means for dividing the frequency-domain video signal obtained by the frequency converting means into a plurality of subband video signals;
Selecting means for selecting, for video designated for high-resolution recording, all of the plurality of subband video signals obtained by the subband dividing means, and, for video not designated for high-resolution recording, only a low-frequency-side part of the plurality of subband video signals obtained by the subband dividing means;
Compression means for compressing each subband video signal selected by the selecting means to generate a subband compressed video signal;
Multiplexing means for multiplexing, for video designated for high-resolution recording, all of the subband compressed video signals selected by the selecting means and compressed by the compression means, and, for video not designated for high-resolution recording, the low-frequency-side part of the subband compressed video signals selected by the selecting means and compressed by the compression means; and
Recording control means for recording the subband compressed video signals multiplexed by the multiplexing means on the recording medium.

A video recording apparatus for storing a video signal, input in frame units, in a recording medium, the apparatus comprising:
Best shot trigger signal generating means for generating a best shot trigger signal by performing image processing on the input video signal to detect the frame in which a person is captured best;
Frequency converting means for frequency-converting the input video signal to obtain a frequency-domain video signal;
Subband dividing means for dividing the frequency-domain video signal obtained by the frequency converting means into a plurality of subband video signals;
Selecting means for selecting, when the best shot trigger signal is generated by the best shot trigger signal generating means, all subband video signals corresponding to the frame at that time, and, when the best shot trigger signal is not generated, only a low-frequency-side part of the subband video signals corresponding to the frame at that time;
Compression means for compressing each subband video signal selected by the selecting means and outputting it as a subband compressed video signal;
Multiplexing means for multiplexing the subband compressed video signals output by the compression means and outputting them as a compressed video signal; and
Recording control means for controlling recording of the compressed video signal from the multiplexing means onto the recording medium.

A video recording method for storing an input video signal in a recording medium, the method comprising:
frequency-converting the input video signal and dividing the resulting frequency-domain video signal into a plurality of subband video signals; and
for video designated for high-resolution recording, compressing all of the plurality of subband video signals for that video to generate subband compressed video signals, multiplexing all of the resulting subband compressed video signals, and recording them on the recording medium, and, for video not designated for high-resolution recording, compressing a low-frequency-side part of the plurality of subband video signals for that video to generate subband compressed video signals, multiplexing that part of the resulting subband compressed video signals, and recording it on the recording medium.

A video recording method for receiving a video signal as input and storing it in a recording medium, the method comprising:
extracting, from the input video signal, only the regions containing motion, extracting from the extracted moving regions the portions in which a person appears, and detecting the frame in which the person is captured best to generate a best shot trigger signal;
frequency-converting the input video signal into a frequency-domain video signal and dividing the frequency-domain video signal into a plurality of subband video signals;
selecting, when the best shot trigger signal is generated, all subband video signals corresponding to the current frame, and, when the best shot trigger signal is not generated, only a low-frequency-side part of the subband video signals corresponding to the current frame; and
compressing each selected subband video signal into a subband compressed video signal, multiplexing the subband compressed video signals, and recording the result on the recording medium as a compressed video signal.