JP2005109606A

JP2005109606A - Signal processing method, signal processing apparatus, recording apparatus, and reproducing apparatus

Info

Publication number: JP2005109606A
Application number: JP2003336757A
Authority: JP
Inventors: Miki Abe; 三樹阿部; Tadao Yoshida; 忠雄吉田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2003-09-29
Filing date: 2003-09-29
Publication date: 2005-04-21

Abstract

PROBLEM TO BE SOLVED: To provide a signal processing method in which the compression rate of an input image can be set according to an object in the image.

SOLUTION: An object detection block 90 detects, frame by frame, a face image included in the received image data. When the object detection block 90 detects a face image, a quantization section 74 of a data processing encode/decode section 31 decreases the quantization coefficients of the macroblocks that include the face image and increases the quantization coefficients of the other macroblocks.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a signal processing method and a signal processing apparatus suitable, for example, for recording photographed image information on a recording medium and for reproducing image information recorded on a recording medium, and to a recording apparatus and a reproducing apparatus provided with such a signal processing apparatus.

In recent years, video cameras have been proposed that can record images taken by a user on a recording medium such as an optical disc, and reproduce and display images recorded on the recording medium.

In a video camera of this kind, when an image captured by the camera block is recorded on a disc or the like, compression processing such as the MPEG (Moving Picture Experts Group) method is applied to reduce the amount of information. In the MPEG method, a motion detection block performs motion detection on the image information from the camera block. A DCT (discrete cosine transform) is then applied to the image information obtained from the motion vector detection result and the captured image. To reduce the amount of information further, a quantization block performs requantization (setting high-frequency components to 0), the blocks are reordered in a zigzag sequence starting from the block at the upper left of the one-frame screen, and run-length coding is applied to compress the information further.
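The quantization, zigzag reordering, and run-length coding steps described above can be illustrated with a minimal Python sketch. It is not the encoder of this publication: it assumes the common MPEG arrangement in which the zigzag scan runs over the already DCT-transformed coefficients of a single block, and the function names, the single step size `q`, and the `"EOB"` end marker are hypothetical choices made for illustration.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    # Cells on one anti-diagonal share row + col; the traversal direction
    # alternates from one diagonal to the next.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_length_encode(values):
    """Encode a sequence as (zero_run, value) pairs followed by an end marker."""
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1          # count zeros preceding the next non-zero value
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")       # trailing zeros are implied by the end marker
    return pairs


def encode_block(coeffs, q):
    """Quantize a square block of DCT coefficients with step q, then scan and encode."""
    n = len(coeffs)
    scanned = [round(coeffs[r][c] / q) for r, c in zigzag_order(n)]
    return run_length_encode(scanned)
```

With a coarser step `q`, more high-frequency coefficients round to zero, so the zero runs grow and fewer symbols remain to be coded.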

In such a quantization block, quantization is performed on the basis of a quantization coefficient. The smaller the quantization coefficient, the smaller the quantization noise and the higher the image quality. However, a smaller quantization coefficient also lowers the compression rate, so the amount of information after compression encoding increases. Conversely, a larger quantization coefficient raises the compression rate and reduces the amount of information after compression encoding, but the quantization noise grows and the image quality falls.
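The trade-off can be made concrete with a small Python sketch. It assumes a plain uniform quantizer with a single step size `q`, which is a simplification of a real MPEG quantizer; the function name and the sample coefficients are hypothetical.

```python
def quantize_roundtrip(coeffs, q):
    """Quantize with step q, reconstruct, and report (mean squared error, non-zero count)."""
    levels = [round(c / q) for c in coeffs]        # quantization
    restored = [level * q for level in levels]     # inverse quantization
    mse = sum((c - r) ** 2 for c, r in zip(coeffs, restored)) / len(coeffs)
    nonzero = sum(1 for level in levels if level != 0)
    return mse, nonzero


# Illustrative coefficients only; they are not taken from the publication.
coeffs = [96, 13, -9, 5, 0, 0, 0, 0]
fine = quantize_roundtrip(coeffs, 4)      # small step: less noise, more data to code
coarse = quantize_roundtrip(coeffs, 16)   # large step: more noise, less data to code
```

The mean squared error stands in for quantization noise, and the count of non-zero levels is a rough proxy for the amount of information left after coding.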

The quantization coefficient is normally calculated from information such as the motion vectors of the motion detection block. For example, when motion in the captured image is large, the image is treated as important and the quantization coefficient is reduced to raise image quality. Conversely, when motion is small, the image is treated as unimportant and the quantization coefficient is increased to reduce the amount of information. However, the importance of a captured image does not necessarily correspond to its amount of motion, so determining the quantization coefficient from motion vector information is not always appropriate.

For this reason, weighted compression coding has been proposed as a method of determining the quantization coefficient: the captured image is divided into important regions and other regions, and bits are allocated adaptively according to the importance of each part of the image.

As an index for dividing an image into important and other regions, a known technique sets a threshold on skin color to judge whether a part of the image is important, on the grounds that the human eye encounters skin color constantly and is sensitive to it. This technique makes it possible to detect regions such as faces and hands in a moving image, and perceived image quality can be improved by preferentially allocating bits to the macroblocks corresponding to those regions.
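A skin-color threshold of this kind can be sketched in Python as follows. The publication does not state a color space or threshold values, so the choices here are assumptions: the RGB-to-CbCr conversion uses the widely used full-range BT.601 (JPEG-style) coefficients, and the Cb and Cr ranges are illustrative values of the sort often quoted for this technique, not values from the source. All function names are hypothetical.

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to chroma components (full-range BT.601 coefficients)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Return True if the pixel's chroma falls inside the assumed skin-color box."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def macroblock_is_important(pixels, min_ratio=0.5):
    """Mark a macroblock important when enough of its pixels pass the skin test.

    pixels: iterable of (r, g, b) tuples; min_ratio is an assumed tuning value.
    """
    pixels = list(pixels)
    if not pixels:
        return False
    skin = sum(1 for p in pixels if is_skin(*p))
    return skin / len(pixels) >= min_ratio
```

Because the test looks only at color, any surface whose chroma falls in the same range is also classified as skin.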

In addition, Patent Document 1 below proposes a technique that identifies a speaker from the characteristics of an audio signal, or from movements of the face and mouth, and that changes the compression rate of audio and video information on the basis of information about the sections identified from the audio signal. The change is made after a predetermined period has elapsed since the audio and video information was recorded, when the used capacity of the storage medium exceeds a predetermined value, or when the free capacity falls to or below a predetermined value.

Patent Document 1: JP-A-10-214270

However, when human skin color is detected in this way to divide an image into important and other regions, improvements are needed in, for example, how regions are separated when the background color resembles skin color, and in robustness against color changes caused by lighting conditions and against differences between races.

Moreover, image separation based on skin color detection cannot handle cases such as changing the amount of information allocated to an image according to the importance of the objects it contains, or recording only a specific object.

The speaker recognition based on an audio signal described in Patent Document 1 can change the compression rate of image information, for example, frame by frame. With speaker recognition from the audio signal alone, however, it is difficult to separate the image within each frame into important regions and other regions.

The present invention has been made in view of the above points. The signal processing method of the present invention detects a specific object included in input image information and, when the specific object is detected, compresses at least the specific object included in the input image information with a changed compression rate.

The signal processing apparatus of the present invention comprises object detection means for detecting a specific object included in input image information, and image compression means for compressing at least the specific object included in the input image information with a changed compression rate when the object detection means detects the specific object.

According to the present invention as described above, a specific object included in the input image information is detected and, when it is detected, the compression rate is changed for at least that specific object. This makes it possible to vary the compression rate of the input image information according to the object.

The recording apparatus of the present invention comprises: object detection means for detecting a specific object included in input image information; storage means in which information on required objects is stored; collation means for checking whether the specific object detected by the object detection means matches a required object stored in the storage means; and recording control means for performing recording control to record the input image information containing the specific object on a recording medium when the collation result shows that the detected object matches a required object stored in the storage means.

According to the recording apparatus of the present invention as described above, the collation means checks whether the object detected from the input image information by the object detection means matches a required object stored in the storage means, and the recording control means performs recording control for recording the input image information on the recording medium on the basis of the collation result. Recording of the input image information on the recording medium can thus be controlled according to the object detected by the object detection means.

The reproducing apparatus of the present invention comprises: reading means for reading image information from a recording medium; decoding means for applying predetermined decoding processing to the image information read by the reading means and outputting it as image information; object detection means for detecting a specific object from the image information output by the decoding means; and read control means for controlling the operation of reading image information from the recording medium by the reading means on the basis of the detection result of the object detection means.

According to the reproducing apparatus of the present invention as described above, the object detection means detects an object in the image information from the recording medium, and the operation of reading image information from the recording medium by the reading means is controlled on the basis of the detection result. The read operation can thus be controlled according to the object detected by the object detection means.

As described above, the signal processing method and signal processing apparatus of the present invention detect a specific object included in input image information and, when it is detected, change the compression rate for at least that specific object. For example, image information containing an object regarded as important is compressed at a lower compression rate than the image information of the other regions before being output.

This makes it possible to reduce the total amount of image information while maintaining high image quality for the object images regarded as important.
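As a minimal Python sketch of this idea, the function below builds a per-macroblock quantizer map from a detected face rectangle, lowering the quantizer inside the face region and raising it elsewhere, as the abstract describes. The 16 x 16 macroblock size and the 1 to 31 quantizer scale range follow MPEG-2; the function name, `base_q`, `delta`, and the rectangle format `(x, y, w, h)` are hypothetical, and the publication does not specify how much the coefficients are changed.

```python
MB = 16  # MPEG-2 macroblock size in pixels


def quantizer_map(width, height, face_box, base_q=16, delta=6):
    """Return a rows x cols grid of quantizer scales for one frame.

    face_box is (x, y, w, h) in pixels, or None when no face was detected.
    """
    cols = (width + MB - 1) // MB
    rows = (height + MB - 1) // MB
    if face_box is None:
        # No object detected: every macroblock keeps the base quantizer.
        return [[base_q] * cols for _ in range(rows)]

    # Start with the coarser quantizer everywhere (fewer bits for the background).
    qmap = [[min(31, base_q + delta)] * cols for _ in range(rows)]

    x, y, w, h = face_box
    first_row, last_row = y // MB, min(rows - 1, (y + h - 1) // MB)
    first_col, last_col = x // MB, min(cols - 1, (x + w - 1) // MB)
    for r in range(first_row, last_row + 1):
        for c in range(first_col, last_col + 1):
            # Finer quantizer for every macroblock that overlaps the face.
            qmap[r][c] = max(1, base_q - delta)
    return qmap
```

Raising the quantizer outside the face is what lets the total amount of data fall even though the face macroblocks receive more bits.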

In the recording apparatus of the present invention, the collation means checks whether the object detected from the input image information by the object detection means matches a required object stored in the storage means, and the recording control means performs recording control for recording the input image information on the recording medium on the basis of the collation result. Recording can thus be controlled according to the object detected by the object detection means. For example, only input image information that contains an object regarded as important can be recorded on the recording medium, so the recording capacity of the medium can be used effectively.
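The collation and recording control flow can be sketched in Python as follows. This passage does not say how objects are represented or compared, so the sketch makes loud assumptions: each object is a numeric feature vector, matching means a Euclidean distance within `max_distance`, and all names and values are hypothetical.

```python
import math


def find_match(detected, registered, max_distance=0.5):
    """Return the name of the closest registered object within max_distance, else None.

    detected: feature vector of one detected object (assumed representation).
    registered: dict mapping a name to the stored feature vector of a required object.
    """
    best_name, best_dist = None, max_distance
    for name, reference in registered.items():
        d = math.dist(detected, reference)
        if d <= best_dist:
            best_name, best_dist = name, d
    return best_name


def should_record(detected_objects, registered, max_distance=0.5):
    """Recording control: record the frame only if some detected object matches."""
    return any(
        find_match(obj, registered, max_distance) is not None
        for obj in detected_objects
    )
```

A frame with no detected objects, or with only unregistered ones, is skipped, which is how the medium's capacity is reserved for the images that matter.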

In the reproducing apparatus of the present invention, the object detection means detects an object in the image information from the recording medium, and the operation of reading image information from the recording medium by the reading means is controlled on the basis of the detection result. Because the read operation can be controlled according to the detected object, it becomes possible, for example, to search the recording medium for image information containing a required object and read it out.

Embodiments of the present invention are described below.
In this embodiment, a video camera in which a camera device unit is integrated with a recording/reproducing device unit capable of recording and reproducing images (still or moving) and audio is taken as an example.
The description proceeds in the following order.
1. Configuration of the video camera
2. Configuration example of the encoding block
3. Configuration of the object detection block
4. Another configuration example of the encoding block
5. Configuration example of the decoding block
6. Configuration example of the speaker recognition unit

1. Configuration of the video camera

FIG. 1 is a block diagram showing a configuration example of the video camera according to this embodiment.
The lens block 1 shown in this figure includes an optical system 11 that in practice comprises, for example, an imaging lens and a diaphragm. The lens block 1 also includes, as a motor unit 12, a focus motor for making the optical system 11 perform autofocus operation, a zoom motor for moving the zoom lens in response to operation of the operation unit 7, and the like.

The camera block 2 mainly comprises circuitry for converting the image light captured through the lens block 1 into a digital image signal.
The optical image of the subject that has passed through the optical system 11 is applied to the CCD (Charge Coupled Device) 21 of the camera block 2. The CCD 21 photoelectrically converts the optical image to generate an imaging signal and supplies it to a sample hold/AGC (Automatic Gain Control) circuit 22. The sample hold/AGC circuit 22 adjusts the gain of the imaging signal output from the CCD 21 and shapes its waveform by sample-hold processing. The output of the sample hold/AGC circuit 22 is supplied to a video A/D converter 23, where it is converted into digital image data.

The signal processing timing of the CCD 21, the sample hold/AGC circuit 22, and the video A/D converter 23 is controlled by timing signals generated by a timing generator 24. The timing generator 24 receives the clock used for signal processing in the data processing encode/decode unit 31 (in the video signal processing unit 3), described later, and generates the required timing signals from this clock. The signal processing timing in the camera block 2 is thereby synchronized with the processing timing in the video signal processing unit 3.

The camera controller 25 performs the control required for the functional circuits in the camera block 2 to operate properly, and controls the lens block 1 for autofocus, automatic exposure adjustment, aperture adjustment, zoom, and the like.
In autofocus control, for example, the camera controller 25 controls the rotation angle of the focus motor on the basis of focus control information obtained according to a predetermined autofocus control method. The imaging lens is thereby driven into the just-focused state.

During recording, the video signal processing unit 3 compresses the digital image signal supplied from the camera block 2 and the digital audio signal obtained by collecting sound with the microphone 202, and supplies the compressed data to the media drive unit 4 in the subsequent stage as user recording data. It also supplies an image generated from the digital image signal supplied from the camera block 2 and a character image to the viewfinder drive unit 207 for display on the viewfinder 204.

During reproduction, the video signal processing unit 3 applies demodulation processing to the user reproduction data supplied from the media drive unit 4 (the data read from the optical disc 51), that is, to the compressed image data and audio data, and outputs the results as a reproduced image signal and a reproduced audio signal.

In this embodiment, MPEG (Moving Picture Experts Group) 2 is adopted as the compression/decompression method for moving image data, and an ATRAC (Adaptive Transform Acoustic Coding) method (ATRAC, ATRAC2, ATRAC3, etc.) is adopted as the compression/decompression method for audio data.

The data processing encode/decode unit 31 of the video signal processing unit 3 mainly performs control processing for the compression and decompression of image data and audio data in the video signal processing unit 3, and processing that governs the input and output of data passing through the video signal processing unit 3.

Control of the video signal processing unit 3 as a whole, including the data processing encode/decode unit 31, is performed by the video controller 33. The video controller 33 comprises, for example, a microcomputer, and can communicate with the camera controller 25 of the camera block 2 and with the driver controller 46 of the media drive unit 4, described later, through, for example, a bus line (not shown).

Sound collected by, for example, the microphone 202 is input to the audio compression encoder/decoder unit 32 as digital audio data through the A/D converter 64 (in the display/image/audio input/output unit 6).
The audio compression encoder/decoder unit 32 compresses the input audio data according to the ATRAC format.

In the display/image/audio input/output unit 6, the image data input to the video D/A converter 61 is converted into an analog image signal, which is branched and output to the display controller 62 and the composite signal processing circuit 63.

The display controller 62 drives the display unit 6A on the basis of the input image signal, so that the reproduced image is displayed on the display unit 6A. The display unit 6A can show not only images reproduced from the optical disc 51 but also, naturally, the images captured by the camera section consisting of the lens block 1 and the camera block 2, in almost real time.

In addition to reproduced and captured images, messages composed of letters, characters, and the like are displayed, as described above, to notify the user of required information according to the operation of the device. For such a message display, for example, under the control of the video controller 33, image data of the required letters and characters is combined with the image data to be output from the data processing encode/decode unit 31 to the video D/A converter 61, so that they appear at predetermined positions.

The composite signal processing circuit 63 converts the analog image signal supplied from the video D/A converter 61 into a composite signal and outputs it to the video output terminal T1. If, for example, an external monitor device is connected through the video output terminal T1, images reproduced by the video camera can be displayed on that device.

Also in the display/image/audio input/output unit 6, the audio data input from the audio compression encoder/decoder unit 32 to the D/A converter 65 is converted into an analog audio signal and output to the headphone/line terminal T2. The analog audio signal output from the D/A converter 65 is also branched to the speaker SP through the amplifier 66, so that reproduced sound and the like is output from the speaker SP.

The media drive unit 4 mainly encodes the recording data into a form suitable for disc recording and transmits it to the deck unit 5 during recording, and during reproduction decodes the data read from the optical disc 51 in the deck unit 5 to obtain reproduction data, which it transmits to the video signal processing unit 3.

During recording, the encoder/decoder 41 of the media drive unit 4 receives recording data (compressed image data plus compressed audio data) from the data processing encode/decode unit 31, applies encoding that conforms to the recording format of the optical disc 51, and temporarily stores the encoded data in the buffer memory 42. The data is then read out at the required timing and transmitted to the deck unit 5.

During reproduction, the digital reproduction signal read from the optical disc 51 and input through the RF signal processing circuit 44 and the binarization circuit 43 is decoded according to a predetermined format and transmitted as reproduction data to the data processing encode/decode unit 31 of the video signal processing unit 3.

In this case as well, the reproduction data is temporarily stored in the buffer memory 42 if necessary, and the data read from it at the required timing is transmitted to the data processing encode/decode unit 31. This write/read control of the buffer memory 42 is performed by the driver controller 46.
For example, even if a disturbance during reproduction of the optical disc 51 causes the servo to lose lock so that signals can no longer be read from the disc, the time-series continuity of the reproduction data can be maintained as long as the reproduction operation on the disc is restored while read data still remains in the buffer memory 42.
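The condition for uninterrupted playback can be expressed as a short Python sketch. It assumes a constant playback bitrate and no refilling of the buffer during the interruption; the function names and the numbers in the usage note are hypothetical and do not come from the publication.

```python
def buffer_cover_seconds(buffered_bytes, playback_bitrate_bps):
    """Seconds of playback that the data already in the buffer can sustain."""
    return buffered_bytes * 8 / playback_bitrate_bps


def playback_stays_continuous(buffered_bytes, playback_bitrate_bps, recovery_seconds):
    """True if disc reading resumes before the buffered data runs out."""
    return recovery_seconds <= buffer_cover_seconds(buffered_bytes, playback_bitrate_bps)
```

For instance, 1,000,000 buffered bytes at an assumed 4 Mbit/s cover 2 seconds, so a servo recovery that takes 1.5 seconds would not interrupt the output, while one that takes 3 seconds would.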

The RF signal processing circuit 44 applies the required processing to the signal read from the optical disc 51 and generates, for example, an RF signal as reproduction data and servo control signals, such as a focus error signal and a tracking error signal, for servo control of the deck unit 5. The RF signal is binarized by the binarization circuit 43 as described above and input to the encoder/decoder 41 as digital signal data.
The generated servo control signals are supplied to the servo circuit 45, which performs the required servo control in the deck unit 5 on the basis of the input signals.

The deck unit 5 consists of the mechanism for driving the optical disc 51. Although not shown here, the deck unit 5 has a mechanism that allows the optical disc 51 to be loaded and removed, so that the user can exchange discs.

In the deck unit 5, the loaded optical disc 51 is rotated by a spindle motor 52. During recording and reproduction, the optical head 53 irradiates the optical disc 51 with laser light.
During recording, the optical head 53 produces a high-level laser output to heat the recording track to the Curie temperature; during reproduction, it produces a relatively low-level laser output to detect data from the reflected light by means of the magnetic Kerr effect. For this purpose the optical head 53 carries, although not shown in detail here, a laser diode as the laser output means, an optical system consisting of a polarizing beam splitter, an objective lens, and the like, and a detector for detecting the reflected light. The objective lens of the optical head 53 is held, for example by a biaxial mechanism, so that it can be displaced in the disc radial direction and in the direction toward and away from the disc.
Although not shown, the deck unit 5 also includes a sled mechanism driven by a sled motor 54. Driving this sled mechanism moves the entire optical head 53 in the disc radial direction.

The operation unit 7 has various controls for operating the video camera. Information on the user's operations of these controls is supplied to, for example, the video controller 33. The video controller 33 supplies the camera controller 25 and the driver controller 46 with the operation information and control information needed for each part to carry out the operation the user requested.

The external interface 8 is provided so that data can be exchanged between the video camera and an external device; as shown in the figure, it is placed, for example, between the I/F (interface) terminal T3 and the video signal processing unit. The type of external interface 8 is not particularly limited here; IEEE 1394, for example, may be adopted.
For example, when an external digital imaging device is connected to the video camera of this embodiment via the I/F terminal T3, images (and audio) captured by the video camera can be recorded on the external device. Image (audio) data reproduced by the external digital imaging device can also be taken in via the external interface 8 and recorded on the optical disc 51 in accordance with the disc format. Furthermore, a file of character information, used for example for inserting captions, can also be taken in and recorded.

The timer unit 34 keeps the current date and time and outputs it to the video controller 33 on request.
The flash memory 35 is a type of non-volatile memory and stores information entered by the user about the main subjects and the photographer.

The power supply block 9 supplies each functional circuit with a power supply voltage of the required level, using DC power from a built-in battery or DC power generated from commercial AC power. The video controller 33 switches the power supply block 9 on and off in response to operation of the operation unit 7.

2. Configuration example of the encode block

Next, the configuration of the encode block 31A provided in the data processing encode/decode unit 31 shown in FIG. 1 will be described with reference to FIG. 2.
In FIG. 2, the filter unit 71 converts the color signals and resolution of the image data from the camera block 2 and outputs the image data to the subtracter 72 in units of macroblocks, each a 16 × 16 pixel block that serves as the basic processing unit of image compression. Part of the image data input to the filter unit 71 is also supplied to the motion detection unit 78.

The motion detection unit 78 uses the image memory 80 as a work area, for example under the control of the memory control unit 79, and performs motion detection in units of macroblocks over a range of several tens to several hundreds of preceding and following frames. When motion is found, it outputs the detection result to the motion compensation unit 81 as motion vector information.

Based on the detection result of the motion vector detection unit 78, the motion compensation unit 81 performs image processing such as motion compensation, for example partially shifting the encoded images of the preceding and following frames.
The subtracter 72 takes the difference between the image data from the filter unit 71 and the output of the motion compensation unit 81 and outputs it to the DCT unit (discrete cosine transform unit) 73.

The DCT unit 73 applies the DCT to the difference data from the subtracter 72 and outputs the result to the quantization unit 74.

The quantization unit 74 requantizes the data to remove high-frequency components and so reduce the data amount. In doing so, it determines the quantization coefficient for each macroblock using the face image information (object information) from the object detection block 90 and the collation information from the collation unit 101, both described later. In other words, the compression rate is determined macroblock by macroblock.
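The effect of the quantization coefficient on the data amount can be illustrated with a minimal Python sketch. It assumes plain uniform quantization with rounding to the nearest integer; the function name and the sample coefficient values are illustrative and do not come from the specification.

```python
def quantize(coeffs, q):
    """Uniformly quantize DCT coefficients with step q (round half away from zero)."""
    out = []
    for c in coeffs:
        magnitude = int(abs(c) / q + 0.5)
        out.append(magnitude if c >= 0 else -magnitude)
    return out


# Coefficients ordered from low to high frequency (illustrative values).
coeffs = [120, -37, 9, 3, -2]
fine = quantize(coeffs, 4)     # small coefficient: more detail survives
coarse = quantize(coeffs, 16)  # large coefficient: high frequencies become 0
```

With the larger step, more of the small high-frequency coefficients collapse to zero, which is what lets the later run-length stage shorten the data.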

The output of the quantization unit 74 is supplied to the scan unit 75. The scan unit 75 reorders the blocks in a zigzag, starting for example from the block at the upper left of the frame, and supplies them to the variable length coding unit 76. The variable length coding unit 76 applies run-length coding to compress the information further.
The encoded data output from the variable length coding unit 76 is supplied to the multiplexer 77, where it is multiplexed with the audio data from the audio compression encoder/decoder unit 32 and output as a bit stream.
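The scan and run-length steps can be sketched in Python as follows. This is a minimal sketch that assumes the conventional MPEG arrangement, in which the zigzag runs over the quantized coefficients inside one block (the text above describes the zigzag at block level). The function names and the "EOB" end marker are illustrative, not taken from the specification.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    # Cells on one anti-diagonal share row + col; the traversal direction
    # alternates from one diagonal to the next.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_length_encode(coeffs):
    """Encode a coefficient list as (zero_run, value) pairs plus an end marker."""
    pairs, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs


def scan_and_encode(block):
    """Zigzag-scan a square block of quantized coefficients and run-length code it."""
    return run_length_encode([block[r][c] for r, c in zigzag_order(len(block))])


block = [
    [5, 3, 0, 0],
    [2, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
encoded = scan_and_encode(block)
```

Because the zigzag visits low-frequency positions first, the zeros produced by quantization tend to form long runs that the run-length pairs represent compactly.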

The output of the quantization unit 74 is also used as reference data for taking the difference. It is therefore also supplied to a local decoder block made up of the inverse quantization unit 82 and the inverse DCT unit 83. The adder 84 adds the output of the inverse DCT unit 83 to the output of the motion compensation unit 81, and the sum is held in the image memory 86 via the memory control unit 79.

3. Configuration of the object detection block

Next, the object detection block shown in FIG. 2 will be described with reference to FIGS. 3 and 4. In this embodiment, the object detection block 90 is described as detecting a person's face image as the object.
FIG. 3 shows the configuration of the object detection block. FIG. 4 shows an example of the face image recognition sequence in the object detection block.

The object detection block 90 shown in FIG. 3 consists of an object detection unit 91, an object parameter detection unit 94, and a wavelet transform unit 99.

The object detection unit 91 is a block that detects, for example, the rough position of a face in the input image data; here it comprises a motion detection unit (Head Detector) 92 and a face image detection unit (Face Finder) 93.
The motion detection unit 92 finds the parts that move between the preceding and following frames of the input image data, and thereby detects the rough position at which a face image appears in the input image data.
The face image detection unit (Face Finder) 93 sets feature points called nodes on the facial organs in the input image data and detects image portions that appear to be faces, for example by graph matching.

The image data input to the motion detection unit 92 and the face image detection unit 93 of the object detection unit 91 is the frame-by-frame image data that is accumulated over several frames in the image memory 80 so that the motion vector detection unit 78 shown in FIG. 2 can obtain motion vectors. In other words, the object detection block 90 performs object detection using the frame image data held in the image memory 80.

If the object detection block 90 cannot detect face images in real time, the period at which it takes image data from the image memory 80 may be chosen so that face recognition is performed at discrete intervals.

The object parameter detection unit 94 is a block that detects and processes the detailed data of the face image found by the object detection unit 91. It consists of a face image detection unit (Pre Selector) 95, a face enlargement/reduction unit 96, a face area detection unit (Landmark Finder) 97, and a background erasing unit 98.

The face image detection unit 95 receives a frame-by-frame input image in which the object detection unit 91 has roughly located a face, for example the input image shown in FIG. 4(a).
The face image detection unit 95 assigns about 50 nodes (feature points) to the input image of FIG. 4(a), for example as shown in FIG. 4(b), and detects the face portion in more detail; the wavelet transform unit 99, described later, then detects feature quantities from the directionality, shading, positional relationships, and so on at each node.

The image detected by the face image detection unit 95 is supplied to the face enlargement/reduction unit 96, which enlarges or reduces it to the optimum size. The face area detection unit 97 then cuts out the face area, for example by tracking each node and fitting the graph to the appropriate position.

The background erasing unit 98 outputs, as the detected image, a face image in which everything outside the face area cut out by the face area detection unit 97 has been filled, for example, with gray. As a result, an extracted image containing only the face portion, as shown in FIG. 4(c), is output as the face image information used to recognize the face image.

When feature points called nodes are set on the facial organs in a face image, for example, the wavelet transform unit 99 detects feature quantities such as the directionality, shading, and positional relationships at each node.

Even when the input image contains several face images, for example as shown in FIG. 5, the object detection block 90 can cut out each face image area from the input image and output the data of each area as face image information.

In this way, in the video camera of this embodiment, the object detection block 90 detects whether an image captured by the camera block 2 contains a face image as an object. When the input image from the camera block 2 contains a face image, information about that face image is output as object information to the quantization unit 74 shown in FIG. 2.

When the quantization unit 74 quantizes the image data input from the DCT unit 73 on the basis of the object information from the object detection block 90, it uses, for example, a smaller quantization coefficient (a lower compression rate) for macroblocks that contain a face image. For all other macroblocks it uses a larger quantization coefficient (a higher compression rate) than for the macroblocks that contain a face image.
The quantization coefficient for macroblocks that contain a face image may be set as appropriate with image quality in mind, and the quantization coefficient for the other areas may be set with the total amount of information in mind.
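The per-macroblock selection described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the face area is given as pixel rectangles, and the values `q_face=4` and `q_other=16` are placeholders, since the text leaves the actual coefficients to be chosen from image quality and total information amount. Only the 16 × 16 macroblock size comes from the description.

```python
MB_SIZE = 16  # macroblock edge in pixels, as stated in the description


def quantizer_map(width, height, face_boxes, q_face=4, q_other=16):
    """Return one quantization coefficient per macroblock.

    face_boxes: list of (x0, y0, x1, y1) pixel rectangles, x1/y1 exclusive
    (a hypothetical format for the object information).
    """
    cols = (width + MB_SIZE - 1) // MB_SIZE
    rows = (height + MB_SIZE - 1) // MB_SIZE
    qmap = [[q_other] * cols for _ in range(rows)]
    for x0, y0, x1, y1 in face_boxes:
        # Every macroblock that the rectangle touches gets the small coefficient.
        for r in range(y0 // MB_SIZE, (y1 - 1) // MB_SIZE + 1):
            for c in range(x0 // MB_SIZE, (x1 - 1) // MB_SIZE + 1):
                if 0 <= r < rows and 0 <= c < cols:
                    qmap[r][c] = q_face
    return qmap


# A 64 x 48 frame (4 x 3 macroblocks) with one face rectangle.
qmap = quantizer_map(64, 48, [(20, 10, 40, 30)])
```

Passing several rectangles covers the case in which the frame contains more than one face.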

With this configuration, the quantization unit 74 can quantize the image data of macroblocks that contain a face image while keeping the quantization coefficient small, so the face image portion, which is regarded as important, retains high image quality.
For the image data of the other macroblocks, the larger quantization coefficient reduces the data amount. The amount of image information can thus be reduced compared with, for example, compressing the entire frame image with the small quantization coefficient.

4. Another configuration example of the encode block

Next, another configuration of the encode block 31A of the data processing encode/decode unit 31 shown in FIG. 1 will be described with reference to FIG. 6. Parts that are the same as in FIGS. 2 and 3 are not described again. The audio compression encoder/decoder unit 32 is also not shown in FIG. 6.

The encode block 31A of the data processing encode/decode unit 31 shown in FIG. 6 is provided with a database 100, which is storage means in which face images considered important are registered in advance, and a collation unit 101, which collates the face images registered in the database 100 with the face image that the object detection block 90 detects in the input image data.

The database 100 may be built, for example, on the optical disc 51, or in a memory provided within the collation unit 101.
When the database 100 is built on the optical disc 51, the optical disc 51 is normally divided, as shown in FIG. 7, into a data area 110 that stores image content and a management information area 111 that stores the address information of the data area 110, content attributes, and the like.
Face image data and the like can therefore be stored as registered images in the management information area 111 of the optical disc 51.
In that case, the registered images are read from the optical disc 51, for example when the disc is loaded, and taken into the memory in the collation unit 101.

Then, during recording for example, the registered images held in the memory of the collation unit 101 are read out and the collation unit 101 collates them with the face image information extracted by the object detection block 90. This configuration of the database 100 is only an example, and other configurations are of course possible.

The collation unit 101 compares, for example, the information obtained from each node of a detected image such as that in FIG. 8(a) with the information of a registered image stored in advance in the database 100, such as that in FIG. 8(b). The collation result is supplied to the quantization unit 74. The collation unit 101 may perform this collation between the face image information from the object detection block 90 and the face image information in the database 100 at the same period at which the object detection block 90 takes image data from the image memory 80.
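The description does not say how the node information is compared. As one possible reading, the following Python sketch treats each node's feature quantity as a numeric vector, scores each pair of corresponding nodes by cosine similarity, and declares a match when the mean score reaches a threshold. The vector representation, the similarity measure, and the threshold of 0.9 are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def faces_match(detected_nodes, registered_nodes, threshold=0.9):
    """Compare node by node and report whether the mean similarity reaches threshold."""
    sims = [cosine(d, r) for d, r in zip(detected_nodes, registered_nodes)]
    return bool(sims) and sum(sims) / len(sims) >= threshold


same = faces_match([[1, 0], [0, 2]], [[2, 0], [0, 1]])  # same directions at each node
different = faces_match([[1, 0]], [[0, 1]])             # orthogonal features
```

A boolean of this kind is the sort of collation result that could be passed on to the quantization unit 74.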

When the quantization unit 74 quantizes the image data input from the DCT unit 73 on the basis of the collation result from the collation unit 101, it uses, for example, a small quantization coefficient for frame image data that contains a face image matching one registered in the database 100, and a large quantization coefficient for all other image data.

With this configuration too, frame image data that contains a face image registered in advance in the database 100 as highly important is quantized by the quantization unit 74 with a small quantization coefficient for the whole frame, so the face image is kept at high quality.
Frame image data that contains no such face image is quantized with a larger quantization coefficient than frames that do, so the total amount of image information is reduced.

If, in addition to the collation result from the collation unit 101, the detection result of the object detection block 90 is supplied to the quantization unit 74 as indicated by the broken line, the quantization coefficient can be made small only for the specific face image portion of a frame image that contains that face, improving its image quality, while the quantization coefficient for the rest of the image is made large to reduce the amount of image information.
In this configuration, of the data from the DCT unit 73, only the face image portion that matches a face image registered in the database 100 carries a large amount of information, so the image information of the frame image as a whole can be reduced substantially.
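The difference between the frame-level and the region-level use of the collation result can be summarized in a small Python sketch. The function name, its parameters, and the coefficient values 4 and 16 are hypothetical. The sketch only encodes the two behaviors described above: with the collation result alone the whole matching frame gets the small coefficient, and with the detection result added only the matching face macroblocks do.

```python
def macroblock_quantizer(frame_has_match, mb_in_matched_face, region_mode,
                         q_small=4, q_large=16):
    """Pick the quantization coefficient for one macroblock.

    frame_has_match:    collation result for the frame (registered face found)
    mb_in_matched_face: whether this macroblock lies in the matched face area
    region_mode:        True when the detection result is also supplied
    """
    if not frame_has_match:
        return q_large          # no registered face: compress strongly
    if region_mode:
        return q_small if mb_in_matched_face else q_large
    return q_small              # frame-level mode: whole frame kept at high quality


frame_level = macroblock_quantizer(True, False, region_mode=False)
region_level_background = macroblock_quantizer(True, False, region_mode=True)
region_level_face = macroblock_quantizer(True, True, region_mode=True)
```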

If the collation result of the collation unit 101 is also supplied to the driver controller 46 of the media drive unit 4 shown in FIG. 1, the driver controller 46 can control whether or not the bit stream from the video signal processing unit 3 is recorded on the optical disc 51.

Suppose, for example, that the object detection block 90 detects face images in the images shown in FIGS. 9(a), 9(b), and 9(c), and that the collation unit 101 finds that, of these, the face image in FIG. 9(b) matches a face image registered in the database 100.

Supplying this collation result to the driver controller 46 of the media drive unit 4 shown in FIG. 1 allows the driver controller 46 to record on the optical disc 51 only the data corresponding to the image in FIG. 9(b), without recording the data corresponding to the images in FIGS. 9(a) and 9(c).

In this case, only images that contain a face image matching one registered in the database 100 are recorded on the optical disc 51.
Only frame image data containing an important face image is thus recorded, and the recording capacity of the optical disc 51 is used effectively.

If the collation result of the collation unit 101 is supplied to the driver controller 46 of the media drive unit 4 shown in FIG. 1 and, in addition, the quantization unit 74 makes the quantization coefficient small only for the specific face image portion of a frame image containing that face, as described above, then when the image in FIG. 9(b) is recorded on the optical disc 51, only the face portion is recorded at high image quality.

In this way, the optical disc 51 holds only images that contain a face image matching one registered in the database 100, and in those images only the face is at high quality. The amount of image information is reduced further, and the recording capacity of the optical disc 51 is used still more effectively.

When the database 100 is built on the optical disc 51 as in this embodiment, it becomes possible, for example, to record on that optical disc 51 only input images containing a face image that matches a registered image (face image) in the database 100.
Thus, if the user registers the face images to be kept in the database 100 on the optical disc 51, an optical disc 51 on which only images of those faces are recorded can be created.

5. Configuration example of the decode block

The description so far has dealt with the encoding performed in the data processing encode/decode unit 31 to record an image captured by the camera block 2 on the optical disc 51, but the present invention can also be applied to decoding in the data processing encode/decode unit.

FIG. 10 shows a configuration example of the decode block 31B of the data processing encode/decode unit 31.
For the decode block 31B shown in FIG. 10, the audio data read from the optical disc 51 is not described; only the video data is described.
The decode block shown in FIG. 10 has two frame memories 126 and 127, which are used alternately for I pictures and P pictures.

The bit stream (video bit string) read from the optical disc 51 in the deck unit 5 and decoded by the media drive unit 4 according to a predetermined format is input to the variable length code decoder 122 via the reception buffer 121. The variable length code decoder 122 performs the decoding that corresponds to the variable length encoding of the encode block 31A described above.

The variable length code decoder 122 separates out, among other things, the picture type Ptype, which indicates whether the picture is an I picture, a P picture, or a B picture, and the macroblock type Mtype, which indicates intra prediction (intra), backward/forward prediction (B/F), or forward/backward prediction (F/B).
The picture type Ptype is used to control the selector switch 131, and the macroblock type Mtype is used to control the selector switch 132.
The motion vectors MV separated by the variable length code decoder 122 are used by the half-pel motion compensation units 128 and 129, which operate on the outputs of the frame memories 126 and 127.

To explain the picture type Ptype briefly: each frame of the image signal to be compressed is coded as one of three kinds of image data (one frame of video data each) that differ in degree of compression. These are called the I picture (Intra Picture), the P picture (Predicted Picture), and the B picture (Bidirectionally predicted Picture).
An I picture is image data coded using intra-frame prediction only, a P picture is image data generated by forward inter-frame prediction, and a B picture is image data generated by bidirectional inter-frame prediction.

The quantized DCT coefficients (two-dimensional variable length codes) decoded by the variable length code decoder 122 are converted by the inverse quantization unit 123 and the inverse DCT unit (IDCT) 124 into difference image data. For an I picture or a P picture, the adder 125 adds the motion-compensated pixel values at the corresponding position, taken macroblock by macroblock from one of the frame memories 126 and 127, and the result is stored in the other frame memory.
While an I picture or P picture is being decoded, whichever of the frame memories 126 and 127 is not currently being written is output.

For a B picture, the half-pel motion compensation units 128 and 129, which perform motion compensation in half-pixel units, operate in parallel on the two frame memories 126 and 127, each using the motion vector provided for it. The averaging circuit unit 130 averages their outputs, and the adder 125 adds the resulting motion-compensated predicted pixel values to the difference data to produce the reproduced B picture. This picture is output as it is, without being stored anywhere.
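The averaging and addition for a B picture can be sketched in Python as follows. This sketch assumes that the two motion-compensated predictions and the decoded difference data are already available as equal-sized blocks of integers. The rounded average and the clamp to the 8-bit range 0 to 255 are assumptions that follow common MPEG practice and are not stated in the description.

```python
def reconstruct_b_block(pred_a, pred_b, residual):
    """Average two predictions, add the decoded difference, clamp to 8 bits."""
    out = []
    for a_row, b_row, r_row in zip(pred_a, pred_b, residual):
        row = []
        for a, b, r in zip(a_row, b_row, r_row):
            average = (a + b + 1) // 2               # averaging circuit (rounded)
            row.append(max(0, min(255, average + r)))  # adder, then clamp
        out.append(row)
    return out


# One row of two pixels: predictions from each frame memory plus the residual.
block = reconstruct_b_block([[100, 250]], [[103, 254]], [[-5, 10]])
```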

Thus, in the decode block shown in FIG. 10, the picture being decoded is output directly when it is a B picture, and the reference picture used for half-pel motion compensation is output when it is an I picture or a P picture. This reorders the pictures after decoding (delaying the I and P pictures).
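The reordering described above can be sketched in Python. The sketch models only the output rule: a B picture is emitted as soon as it is decoded, while an I or P picture is held back and emitted when the next I or P picture arrives. The picture labels and the function name are illustrative.

```python
def display_order(decode_order):
    """Reorder pictures from decode order to output order.

    decode_order: list of (picture_type, label) in the order they are decoded.
    """
    out, held = [], None
    for ptype, label in decode_order:
        if ptype == "B":
            out.append(label)          # B pictures are output immediately
        else:
            if held is not None:
                out.append(held)       # output the previous reference picture
            held = label               # the new I/P picture is delayed
    if held is not None:
        out.append(held)               # flush the last reference picture
    return out


stream = [("I", "I0"), ("P", "P3"), ("B", "B1"), ("B", "B2"),
          ("P", "P6"), ("B", "B4"), ("B", "B5")]
shown = display_order(stream)
```

The labels carry the display index, so the result comes out in ascending order, with each I or P picture delayed until the B pictures that precede it in display order have been shown.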

When the present invention is applied to such a decode block, the I pictures and P pictures stored in the frame memories 126 and 127 are, for example, scanned at suitable intervals and supplied to the object detection block 90.

The object detection block 90 in this case has the same configuration as described above. It detects face images in the image information of the input I pictures and P pictures and, depending on the result, outputs face image information to the driver controller 46 of the media drive unit 4.
The driver controller 46 controls the reading of data from the optical disc 51 on the basis of this face image information from the object detection block 90.

This makes it possible to realize, for example, a high-speed picture search in which the media drive unit 4 reads data from the optical disc 51 in the deck unit 5 at high speed until the object detection block 90 detects a face image.
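The search behavior can be pictured with a short Python sketch. It assumes that the recorded pictures can be indexed, that only every `step`-th picture is examined during the high-speed read, and that a caller-supplied `has_face` function stands in for the object detection block 90. All three are simplifications for illustration.

```python
def picture_search(pictures, has_face, step=8):
    """Return the index of the first sampled picture with a face, or None."""
    for index in range(0, len(pictures), step):
        if has_face(pictures[index]):
            return index   # stop the high-speed read here
    return None


# Stand-in data: picture numbers, with a face appearing from picture 20 onward.
pictures = list(range(40))
found = picture_search(pictures, lambda p: p >= 20, step=8)
```

A larger `step` searches faster but can overshoot the first picture that contains a face.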

It is also possible for the collation unit 101 to collate the face image detected by the object detection block 90 with the face images registered in advance in the database 100, and to search for and reproduce only the images that contain a face image matching one in the database 100.

In the embodiment described so far, a person is identified by collating the face image information detected by the object detection block 90 with the face image information registered in the database 100.
However, even when a person is identified in this way, the collation result may be wrong, for example when two people's faces are extremely similar.

It is therefore conceivable to perform speaker recognition using audio data in combination with the collation of face image information by the object detection block 90 described above, so as to identify a person more reliably.

6. Configuration example of speaker recognition unit

FIG. 11 shows an example of the speaker recognition block.
Such a speaker recognition block is provided, for example, in the audio compression encoder/decoder unit 32 shown in FIG. 1.
The feature parameter extraction unit 141 shown in FIG. 11 extracts feature parameters from the input audio data and supplies them to the similarity calculation unit 142.
The similarity calculation unit 142 is also supplied with feature parameters that the feature parameter extraction unit 144 has extracted from voice data registered in advance in the database 100 or the like, and it calculates a similarity from the two sets of parameters.
Based on the calculation result of the similarity calculation unit 142, the speaker recognition determination unit 143 determines whether the input matches the registered speaker and outputs the determination result as speaker information.
Accordingly, if this speaker information is supplied to the quantization unit 74 of the data processing encode/decode unit 31, the quantization unit 74 can identify a person (object) or the like by using the speaker information together with the face image information (object information) from the object detection block 90 that is used to recognize a face image.
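The three stages of FIG. 11 (feature extraction, similarity calculation, determination) can be sketched as below. The concrete features (mean absolute amplitude and zero-crossing rate), the cosine similarity, and the 0.95 threshold are placeholders chosen for brevity; the patent does not specify them, and a practical system would use richer spectral features.

```python
import math
from typing import List, Sequence


def extract_features(samples: Sequence[float]) -> List[float]:
    """Stand-in for units 141/144: two toy features (needs at least two samples)."""
    mean_abs = sum(abs(s) for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return [mean_abs, crossings / (len(samples) - 1)]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for unit 142: cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_registered_speaker(
    input_samples: Sequence[float],
    registered_samples: Sequence[float],
    threshold: float = 0.95,
) -> bool:
    """Stand-in for unit 143: threshold decision that yields the speaker information."""
    score = similarity(extract_features(input_samples), extract_features(registered_samples))
    return score >= threshold


registered_voice = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
same_voice = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
other_voice = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(is_registered_speaker(same_voice, registered_voice))
print(is_registered_speaker(other_voice, registered_voice))
```

The Boolean result plays the role of the speaker information that is passed on to the quantization unit 74.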

Note that a person can be identified more reliably if the face image information detected by the object detection block 90 is used together with the motion vector, and more reliably still if the speaker recognition information described above is also included.
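One way to picture this combination is the sketch below, which accepts an identity only when face collation and speaker recognition agree, and then assigns a finer quantizer scale to the macroblocks that contain the face, as the abstract describes. The agreement rule, the `base` and `delta` values, and the macroblock grid are assumptions, and the motion vector cue is left out for brevity.

```python
from typing import List, Optional, Set, Tuple


def confirm_person(face_match: Optional[str], speaker_match: Optional[str]) -> Optional[str]:
    """Accept an identity only when both cues name the same person (assumed rule)."""
    if face_match is not None and face_match == speaker_match:
        return face_match
    return None


def quantizer_scales(
    rows: int,
    cols: int,
    face_blocks: Set[Tuple[int, int]],
    person: Optional[str],
    base: int = 8,
    delta: int = 4,
) -> List[List[int]]:
    """Return one quantizer scale per macroblock.

    When a person is confirmed, face macroblocks get a smaller scale
    (finer quantization) and all others a larger one; otherwise every
    macroblock keeps the base scale.
    """
    if person is None:
        return [[base] * cols for _ in range(rows)]
    return [
        [base - delta if (r, c) in face_blocks else base + delta for c in range(cols)]
        for r in range(rows)
    ]


person = confirm_person("person_a", "person_a")
scales = quantizer_scales(2, 3, {(0, 1)}, person)
print(scales)
```

If the two cues disagree, `confirm_person` returns `None` and the frame is quantized uniformly at the base scale.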

In the present embodiment, the object detection block 90 has been described as detecting a face image as the object, but this is merely an example. The object detection block may be configured to detect other objects, such as the shape of an automobile, or characters.

In the present embodiment, the case where the present invention is applied to a video camera has been described as an example. However, this is merely an example, and the present invention can be applied to various recording and reproducing apparatuses other than video cameras.
Further, various discs such as CD (Compact Disc) discs, DVD (Digital Versatile Disc) discs, and Blu-Ray discs have been given as examples of the recordable optical disc 51 used as the recording medium. However, these are merely examples, and a Mini Disk, an HDD (hard disk drive), a memory card such as a flash memory, or the like can also be used as the recording medium.

FIG. 1 is a block diagram showing the configuration of a video camera according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the data processing encode unit.
FIG. 3 is a diagram showing a configuration example of the object detection block shown in FIG. 2.
FIG. 4 is a diagram showing an example of a face image recognition sequence.
FIG. 5 is a diagram showing an example of face image detection by face image recognition.
FIG. 6 is a diagram showing another configuration of the data processing encode unit.
FIG. 7 is a diagram showing an example of the database.
FIG. 8 is a diagram for explaining collation in the collation unit.
FIG. 9 is a diagram for explaining the recording operation during recording.
FIG. 10 is a diagram showing the configuration of the data processing decode unit.
FIG. 11 is a block diagram showing the configuration for speaker recognition.

Explanation of symbols

1 lens block, 2 camera block, 3 video signal processing unit, 4 media drive unit, 5 deck unit, 31 data processing encode/decode unit, 32 audio compression encoder/decoder unit, 33 video controller, 46 driver controller, 51 optical disc, 71 filter unit, 72 subtractor, 73 DCT unit, 74 quantization unit, 75 scan unit, 76 variable length coding unit, 77 multiplexer, 78 motion vector detection unit, 79, 85 memory control unit, 80, 86 image memory, 81 motion compensation unit, 82 inverse quantization unit, 83 inverse DCT unit, 84 adder, 90 object detection block, 91 object detection unit, 92 motion detection unit, 93 face image detection unit, 94 parameter detection unit, 95 face image detection unit, 96 enlargement/reduction unit, 97 face image area detection unit, 98 background elimination unit, 99 Wavelet transform unit, 100 database, 101 collation unit, 110 data area, 111 management information area, 121 reception buffer, 122 variable length code decoder, 123 inverse quantization unit, 124 IDCT, 125 adder, 126, 127 frame memory, 128, 129 half-pel motion compensation unit, 130 averaging unit, 141 feature parameter extraction unit, 142 similarity calculation unit, 143 speaker recognition determination unit, 144 feature parameter extraction unit

Claims

A signal processing method comprising:
detecting a specific object included in input image information; and
when the specific object is detected, compressing the input image information while changing the compression rate of at least the specific object included in it.

A signal processing apparatus comprising:
object detection means for detecting a specific object included in input image information; and
image compression means for compressing the input image information while changing the compression rate of at least the specific object included in it when the specific object is detected by the object detection means.

The signal processing apparatus according to claim 2, wherein, when the specific object is detected by the object detection means, the image compression means compresses the image information of the area including the specific object in the input image information at a compression rate lower than that of the image information of the other areas.

The signal processing apparatus according to claim 2, further comprising:
storage means for storing information on a required object; and
collation means for collating whether or not the specific object detected by the object detection means matches the required object stored in the storage means,
wherein, when the collation means determines that the specific object detected by the object detection means matches the required object stored in the storage means, the image compression means compresses the input image information that includes the specific object while changing the compression rate of the whole of that information.

The signal processing apparatus according to claim 4, wherein, when the collation means determines that the specific object detected by the object detection means matches the required object stored in the storage means, the image compression means compresses the image information of the area including the specific object, within the input image information that includes the specific object, at a compression rate lower than that of the image information of the other areas.

A recording apparatus comprising:
object detection means for detecting a specific object included in input image information;
storage means for storing information on a required object;
collation means for collating whether or not the specific object detected by the object detection means matches the required object stored in the storage means; and
recording control means for performing recording control so that the input image information including the specific object is recorded on a recording medium when it is determined, based on the collation result of the collation means, that the specific object detected by the object detection means matches the required object stored in the storage means.

The recording apparatus according to claim 6, further comprising image compression means for compressing, when the specific object is detected by the object detection means, the image information of the region including the specific object in the input image information at a compression rate lower than that of the image information of the other regions.

A playback apparatus comprising:
reading means for reading image information from a recording medium;
decoding means for performing predetermined decoding processing on the image information read by the reading means and outputting the result as image information;
object detection means for detecting a specific object from the image information output from the decoding means; and
read control means for controlling the operation of reading the image information from the recording medium by the reading means based on the detection result of the object detection means.

The playback apparatus according to claim 8, further comprising:
storage means for storing a required object; and
collation means for collating whether or not the specific object detected by the object detection means matches the required object stored in the storage means,
wherein the read control means controls the operation of reading the image information from the recording medium by the reading means based on the collation result of the collation means.