JP2013003392A - Sound recording apparatus

Info

Publication number: JP2013003392A
Application number: JP2011135266A
Authority: JP
Inventors: Hiroyoshi Sato; 寛祥佐藤
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2011-06-17
Filing date: 2011-06-17
Publication date: 2013-01-07

Abstract

PROBLEM TO BE SOLVED: To improve the user-friendliness of a sound recording apparatus. SOLUTION: The sound recording apparatus includes: imaging means which picks up an optical image of a subject and outputs it as an image signal; scene determination means which determines a scene on the basis of the image signal; sound recording means which collects and records sound; and setting means which sets a sound recording condition on the basis of the scene determined by the scene determination means. In one aspect of the sound recording apparatus, a recommended sound recording scene is determined on the basis of the image signal and the sound recording condition is set on the basis of that scene. The user therefore does not need to consider the recording scene and set the sound recording condition manually, so a suitable sound recording condition can be set more easily.

Description

The present invention relates to a recording apparatus, and more particularly to a recording apparatus that has an imaging function and performs suitable sound recording based on a captured image signal.

Conventionally, a technique has been disclosed in which recording setting information corresponding to a plurality of video scenes is stored in advance, the user is asked to specify a desired video recording scene when shooting video with sound, and sound recording during shooting is controlled based on the recording setting information corresponding to that video recording scene.
JP 2006-217111 A

In the conventional technique, sound recording can be controlled based on appropriate recording setting information once the user sets the desired video recording scene. In other words, sound recording cannot be controlled properly unless a video recording scene is set. Therefore, even when the user's aim is appropriate sound recording rather than video recording, the user must still select the video recording scene that corresponds to the desired recording setting information, which is troublesome.

The present invention solves the above problem, and one of its objects is to provide a recording apparatus in which suitable recording settings can be made more easily.

The recording apparatus of the present invention comprises: imaging means for capturing an optical image of a subject and outputting it as an image signal; scene determination means for determining a scene based on the image signal; recording means for collecting and recording sound; and setting means for setting recording conditions for recording based on the scene determined by the scene determination means.

According to one aspect of this recording apparatus, a recommended recording scene is determined based on the image signal, and the recording conditions are set based on that recording scene. The user therefore does not need to consider the recording scene and set the recording conditions manually, so suitable recording conditions can be set more easily.

Preferably, the recording apparatus further comprises recording control means for controlling the recording means so that recording is performed under the recording conditions set by the setting means, and changing means for changing the recording conditions set by the setting means.

According to this aspect, the recording conditions set based on the image signal can be changed, so they can be brought closer to the recording conditions the user desires.

Preferably, the recording apparatus further comprises notification means for reporting the recording conditions set by the setting means.

According to this aspect, because the recording conditions are reported, the user can recognize the conditions that have been set and can judge whether they are the desired ones.

Preferably, the scene determination means determines the scene by detecting objects of a predetermined shape in the image signal and thereby obtaining the number of objects and/or the degree of separation between objects.

According to this aspect, the number of objects and/or the degree of separation between objects can be obtained by detecting objects of a predetermined shape in the image signal, so the current scene can be determined more accurately.

Preferably, the scene determination means determines the scene by detecting the number of people and/or the degree of separation between people based on the image signal.

According to this aspect, the number of people and/or the degree of separation between people can be detected, so the current scene can be determined more accurately.

Preferably, the scene determination means determines the scene by detecting the number of people and/or the degree of separation between people based on face signals included in the image signal.

According to this aspect, the recording conditions that the setting means sets based on the scene determined by the scene determination means include a condition for collecting sound using an omnidirectional microphone or a directional microphone.

According to this aspect, because the recording conditions include a condition for collecting sound using an omnidirectional microphone or a directional microphone, more suitable recording can be performed.

The method of the present invention for setting recording conditions in a recording apparatus that collects and records sound comprises the steps of: capturing an optical image of a subject and outputting it as an image signal; determining a scene based on the image signal; and setting recording conditions for recording based on the determined scene. The program of the present invention causes a processor of an electronic device, which includes imaging means for capturing an optical image of a subject and outputting it as an image signal and recording means for collecting and recording sound, to execute the steps of: capturing an optical image of a subject and outputting it as an image signal; determining a scene based on the image signal; and setting recording conditions for recording based on the determined scene.

According to the recording apparatus of the present invention, suitable recording conditions can be set more easily.

FIG. 1 is a block diagram showing part of the circuit configuration of an IC recorder that is one embodiment of the present invention.
FIG. 2 is an example of a conceptual diagram showing the sizes used in the face recognition process.
FIG. 3 is an example of a conceptual diagram showing the faces and sizes detected by the face recognition process and the pattern recognition process for a captured image of subject A.
FIG. 4 is an example of a conceptual diagram showing the widths detected by the space recognition process for the captured image of subject A.
FIG. 5 is an example of a conceptual diagram showing the face and size detected by the face recognition process and the pattern recognition process for a captured image of subject B.
FIG. 6 is an example of a conceptual diagram showing the faces and sizes detected by the face recognition process and the pattern recognition process for a captured image of subject C.
FIG. 7 is an example of a conceptual diagram showing the widths detected by the space recognition process for the captured image of subject C.
FIG. 8 is an example of a conceptual diagram showing the faces and sizes detected by the face recognition process and the pattern recognition process for a captured image of subject D.
FIG. 9 is an example of a conceptual diagram showing the widths detected by the space recognition process for the captured image of subject D.
FIG. 10 is an example of a screen, displayed on the LCD 28, indicating the dictation scene mode.
FIG. 11 is an example of a screen, displayed on the LCD 28, indicating the conference scene mode.
FIG. 12 is an example of a screen, displayed on the LCD 28, indicating the lecture scene mode.
FIG. 13 is an example of a screen, displayed on the LCD 28, indicating the music scene mode.
FIG. 14 is a recognition table showing the relationship between the results of the face recognition process, the space recognition process, and the pattern recognition process and the recommended recording scene.
FIG. 15 is a recording scene table showing the relationship between each recording scene and the recording function parameters.
FIG. 16 is a flowchart showing an example of a task that executes the processing of the auto scene select function.

An embodiment in which the recording apparatus of the present invention is implemented as an IC recorder 10 is described below in detail with reference to the drawings.

FIG. 1 shows a block diagram of the IC recorder 10 of this embodiment. The IC recorder 10 includes a lens group 16 containing at least an optical lens, a diaphragm (not shown), a CMOS imager unit 18, a signal processing circuit 20, a CPU 22, an operation unit 24, an LCD (Liquid Crystal Display) 28, an external memory card control circuit 30, an SDRAM 32, an external memory card 34, a flash memory 36, a bus 38, a codec 40, a microphone unit 42, a speaker 44, a DSP (Digital Signal Processor) 46, and an amplifier 48.

The microphone unit 42 includes an omnidirectional microphone 42L and an omnidirectional microphone 42R. Sound collected by the microphone unit 42 is output as an analog audio signal and input to the codec 40, which is connected to the microphone unit 42. The codec 40 applies predetermined digital processing to the input analog audio signal and outputs a digital audio signal.

The bus 38 is connected to the CPU 22 and the flash memory 36. The CPU 22 controls each circuit and each unit of the IC recorder 10 by executing a program stored in the flash memory 36. The IC recorder 10 has a plurality of recording functions, and the CPU 22 executes processing based on the parameters set for those recording functions.

In addition to the CPU 22 and the flash memory 36, the CMOS imager unit 18, the signal processing circuit 20, the LCD 28, the external memory card control circuit 30, the SDRAM 32, the codec 40, and the DSP 46 are connected to the bus 38.

When the digital audio signal stored in the SDRAM 32 is to be compressed in the MP3 file format, the digital audio signal is output from the SDRAM 32 to the DSP 46. The DSP 46 applies MP3 compression to the input digital audio signal and temporarily stores the result in the SDRAM 32 as MP3-compressed audio data. The CPU 22 then controls the external memory card control circuit 30 to record the MP3-compressed audio data stored in the SDRAM 32 on the external memory card 34 as an audio file.

When the PCM format is adopted as the file format, the digital audio signal stored in the SDRAM 32 is recorded on the external memory card 34 as an audio file under the control of the external memory card control circuit 30.

The process described above, in which sound collected by the microphone unit 42 is recorded on the external memory card 34, is defined as “recording”.

To play back an audio file recorded on the external memory card 34, the audio file is read from the external memory card 34 under the control of the external memory card control circuit 30. If the read audio file contains MP3-compressed audio data, it is input to the DSP 46, decompressed, and output to the codec 40 as a decompressed digital audio signal. The codec 40 processes the decompressed digital audio signal and outputs it to the amplifier 48 as an analog playback signal. The amplifier 48 applies gain adjustment to the analog playback signal and outputs it to the speaker 44, which outputs the gain-adjusted analog playback signal as sound.

The process described above, up to the point where an audio file recorded on the external memory card 34 is output as sound from the speaker 44, is defined as “playback”.

The operation unit 24 is connected to the CPU 22 and includes: a menu key 24a for displaying various menus, such as calling up the setting screen for setting the recording function parameters; a playback button 24b for starting playback of an audio file; a recording button 24c for starting sound recording; a cursor key 24d for moving the cursor displayed on the LCD 28; a set button 24e for confirming execution of each function; a stop button 24f for stopping playback or recording; and an auto scene select button 24g for enabling the auto scene select function.

When the IC recorder 10 is powered on, it enters a recording standby state. When the menu key 24a is operated in the recording standby state, the setting screen for the recording function parameters is called up. By operating the cursor key 24d and the set button 24e, the user can set the desired recording function parameters on the setting screen. In the following description, the on/off state of each recording function is itself treated as one of its parameters.

The recording functions are now described in detail. They are: a “compression ratio” function, which sets the ratio at which audio is compressed; a “MIC sensitivity” function, which sets the sensitivity of the microphones 42L and 42R; an “ALC” function, denoting automatic level control, which automatically adjusts the amplitude of the audio signal; a “LowCut” function, denoting a low-cut filter that cuts sound below a predetermined frequency; a “peak limiter” function, denoting a peak limiter that suppresses sudden excessive input; a “self-timer” function, which starts recording once a specified time has elapsed after the recording button 24c is pressed; and a “VAS” function, denoting a voice activation system that pauses recording during silent passages.

As parameters of the “compression ratio” function, the MP3 settings 32 kbps, 64 kbps, 192 kbps, and 320 kbps and the PCM setting 48 kHz 16 bit are provided, and one of them can be selected. Here, 32 kbps denotes a bit rate of 32 kilobits of data per second, and 48 kHz 16 bit denotes a sampling frequency of 48 kHz with 16-bit quantization.
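The relationship between these settings and the amount of data produced can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the disclosure: the function names are hypothetical, and the two-channel figure for PCM is an assumption based on the two microphones 42L and 42R.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Uncompressed PCM bit rate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def megabytes_per_minute(bit_rate_kbps: float) -> float:
    """Approximate storage used by one minute of audio at the given bit rate."""
    bits_per_minute = bit_rate_kbps * 1000 * 60
    return bits_per_minute / 8 / 1_000_000


# Assumption: PCM is recorded in stereo (one channel per microphone).
settings = [
    ("MP3 32 kbps", 32.0),
    ("MP3 64 kbps", 64.0),
    ("MP3 192 kbps", 192.0),
    ("MP3 320 kbps", 320.0),
    ("PCM 48 kHz 16 bit", pcm_bit_rate_kbps(48_000, 16, channels=2)),
]

for label, kbps in settings:
    print(f"{label}: {kbps:.0f} kbps, {megabytes_per_minute(kbps):.2f} MB per minute")
```

Under that assumption the PCM setting works out to 1536 kbps, which is why the lower MP3 bit rates are attractive for long speech recordings.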

As parameters of the “MIC sensitivity” function, Low and Hi are provided, and one of them can be selected. When Low is selected the microphone sensitivity is low, and when Hi is selected the microphone sensitivity is high.

On or off can be selected as the parameter of the “ALC” function. Automatic level control is enabled when on is selected and disabled when off is selected.

On or off can be selected as the parameter of the “LowCut” function. The low-cut filter is enabled when on is selected and disabled when off is selected.

On or off can be selected as the parameter of the “peak limiter” function. The peak limiter is enabled when on is selected and disabled when off is selected.

On or off can be selected as the parameter of the “VAS” function. The voice activation system is enabled when on is selected and disabled when off is selected.

The IC recorder 10 also has a scene select function. A plurality of recording scenes are assumed in advance, and when a recording scene is specified, the recording function parameters set for that recording scene are automatically applied.

FIG. 15 shows a recording scene table listing the recording function parameters set for each recording scene. The recording scene table is stored in the flash memory 36. When the user selects the dictation scene with the scene select function, the compression ratio is set to 64 kbps, the MIC sensitivity to Low, ALC to on, LowCut to on, the peak limiter to off, the self-timer to off, and VAS to off, and a screen indicating the dictation scene, as shown in FIG. 10, is displayed on the LCD 28.

Similarly, when the conference scene is selected, the CPU 22 refers to the recording scene table, sets the recording function parameters, and causes the LCD 28 to display a screen indicating the conference scene, as shown in FIG. 11. When the lecture scene is selected, the CPU 22 refers to the recording scene table, sets the recording function parameters, and causes the LCD 28 to display a screen indicating the lecture scene, as shown in FIG. 12. When the music scene is selected, the CPU 22 refers to the recording scene table, sets the recording function parameters, and causes the LCD 28 to display a screen indicating the music scene, as shown in FIG. 13.
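The table lookup performed by the scene select function might be sketched as follows. This is an illustrative sketch only: the class, field, and function names are hypothetical, and only the dictation row is filled in because it is the only row whose values are stated in the text. The rows for the conference, lecture, and music scenes appear in FIG. 15 and are deliberately left out rather than guessed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RecordingParams:
    """One row of a recording scene table (field names are hypothetical)."""
    compression: str
    mic_sensitivity: str
    alc: bool
    low_cut: bool
    peak_limiter: bool
    self_timer: bool
    vas: bool


# Only the dictation row is given in the description; other rows are omitted.
RECORDING_SCENE_TABLE: Dict[str, RecordingParams] = {
    "dictation": RecordingParams(
        compression="64 kbps",
        mic_sensitivity="Low",
        alc=True,
        low_cut=True,
        peak_limiter=False,
        self_timer=False,
        vas=False,
    ),
}


def params_for_scene(scene: str) -> Optional[RecordingParams]:
    """Return the parameters for a scene, or None if the scene is not in the table."""
    return RECORDING_SCENE_TABLE.get(scene)


print(params_for_scene("dictation"))
```

Keeping the parameters in a table rather than in code mirrors the description, where the table is stored in the flash memory 36 and consulted by the CPU 22.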

When the recording button 24c is pressed in the recording standby state, the CPU 22 starts recording based on the parameters that have been set, and ends recording when the stop button 24f is pressed. When the playback button 24b is pressed in the recording standby state, the CPU 22 starts playback of an audio file, and ends playback when the stop button 24f is pressed.

Furthermore, when the auto scene select button 24g is pressed in the recording standby state, the CPU 22 executes the auto scene select function. The auto scene select function identifies a recommended recording scene based on a captured image obtained with the imaging function of the IC recorder 10. The recording scene table is then consulted, and the recording function parameters corresponding to the identified recording scene are set automatically.

The auto scene select function is described in detail below.

When the auto scene select button 24g is pressed, the IC recorder 10 starts imaging and acquires a captured image. Specifically, in response to an instruction from the CPU 22, the optical image of the subject is taken into the CMOS imager unit 18 through the lens group 16 and the diaphragm, which are controlled by a motor drive unit (not shown). In response to a capture pulse supplied by a timing generator (not shown) connected to the CPU 22, the CMOS imager unit 18 outputs one frame of a digital imaging signal.

The CMOS imager unit 18 amplifies the charge accumulated in each pixel, reads it out from each pixel over wiring as a signal, and applies correlated double sampling, gain adjustment, clamping, and A/D conversion to that signal. The resulting digital imaging signal has one of the R, G, or B color signals for each pixel and is temporarily stored in the SDRAM 32 via the bus 38 under the control of the CPU 22.

In this embodiment the CMOS imager unit 18 is used as the image sensor, but a CCD imager may be used instead. When a CCD imager is used, an AFE circuit that performs correlated double sampling, gain adjustment, clamping, and A/D conversion is added.

The digital imaging signal temporarily stored in the SDRAM 32 is input to the signal processing circuit 20 under the control of the CPU 22. The signal processing circuit 20 applies color separation to the input digital imaging signal and then converts it into Y, U, and V signals by YUV conversion. The digital image signal converted by the signal processing circuit 20 is stored in the SDRAM 32 again via the bus 38. The sequence described above, from the optical image of the subject through the various processing stages until it is stored in the SDRAM 32, is defined as the imaging process.
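The description does not state which conversion coefficients the signal processing circuit 20 uses for the YUV conversion. As a rough illustration only, the sketch below assumes the widely used ITU-R BT.601 luma weights and the classic analog U and V scaling factors; the function name is hypothetical and a real circuit may well use a different matrix or fixed-point arithmetic.

```python
from typing import Tuple


def rgb_to_yuv(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert one RGB pixel to Y, U, V.

    Assumption: BT.601 luma weights with U = 0.492 (B - Y) and V = 0.877 (R - Y).
    The source does not specify the coefficients actually used.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    u = 0.492 * (b - y)                    # blue color difference
    v = 0.877 * (r - y)                    # red color difference
    return y, u, v


# A neutral gray pixel carries luminance only, so U and V are close to zero.
print(rgb_to_yuv(128, 128, 128))
```

Separating luminance from the color differences is convenient for the later recognition steps, since shape matching can then be carried out largely on the Y signal; that use of Y is a common practice and is not stated in the description.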

For the one frame of digital image signal obtained in the imaging process, the CPU 22 executes processing that detects objects of a predetermined shape. More specifically, it performs a face recognition process that detects faces and a pattern recognition process that detects musical instruments and the backs of people's heads. In the face recognition process and the pattern recognition process, the digital image signal is matched against a plurality of face templates, a plurality of instrument templates, or a plurality of back-of-head templates prepared in advance. Faces, instruments, and/or backs of heads are detected by identifying where in the digital image signal a matched template is located.

For each detected face, instrument, and/or back of a head, the CPU 22 recognizes its size. Sizes are divided into six classes, and FIG. 2 illustrates the six sizes for an image size of 5 inches. The smallest size is 1, followed by 2, 3, 4, and 5, and the largest size is 6. Here size 1 is 1.2 cm high by 1 cm wide, but the sizes are not limited to this.

For convenience, the captured images A, B, C, and D shown in FIGS. 3 to 9 are assumed to have an image size of 5 inches, and the description uses the six sizes corresponding to that image size.

Next, when a plurality of faces, instruments, and/or backs of heads have been detected, the CPU 22 detects the degree of separation, which indicates how far apart they are. For each detected face, instrument, or back of a head, the width from its center to the center of the nearest other detected face, instrument, or back of a head is measured, and the largest of the measured widths is taken as the degree of separation. This processing is defined as the “space recognition process”.
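The space recognition process described above can be sketched as a nearest-neighbor computation. This is a minimal sketch under stated assumptions: the function name is hypothetical, centers are given as (x, y) coordinates in centimeters on the 5-inch image, and straight-line distance is used for the "width", which the description does not define precisely.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def max_nearest_neighbor_width(centers: Sequence[Point]) -> Optional[float]:
    """Return the largest nearest-neighbor distance among the detected objects.

    For each center, find the distance to the closest other center, then take
    the maximum of those distances. Returns None when fewer than two objects
    were detected, because no width can be measured.
    """
    if len(centers) < 2:
        return None
    nearest_widths = []
    for i, center in enumerate(centers):
        nearest = min(
            math.dist(center, other)
            for j, other in enumerate(centers)
            if j != i
        )
        nearest_widths.append(nearest)
    return max(nearest_widths)


# Hypothetical centers spaced 4 cm, 2 cm and 3 cm apart along one row.
centers = [(0.0, 0.0), (4.0, 0.0), (6.0, 0.0), (9.0, 0.0)]
print(max_nearest_neighbor_width(centers))  # 4.0
```

Taking the maximum of the nearest-neighbor distances means a single isolated object dominates the result, which is what lets the measure distinguish a tightly grouped scene from a spread-out one.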

Based on the results of the face recognition process, the pattern recognition process, and the space recognition process, the CPU 22 refers to the recognition table stored in the flash memory 36 and determines the recommended recording scene. As shown in FIG. 14, the recognition table associates each recording scene with results of the face recognition process, the pattern recognition process, and the space recognition process.

When the face recognition process finds one face of size 5 or 6 and the pattern recognition process recognizes nothing, the CPU 22 refers to the recognition table and determines that the recommended recording scene is the dictation scene.

When the face recognition process finds two or more faces with a size of any of 3 to 6, the space recognition process finds a maximum width in the range 0 to 5 cm, and the pattern recognition process recognizes no instrument, the recommended recording scene is determined to be the conference scene. In this case the conference scene is recommended whether or not the back of a head is detected.

When the face recognition process finds one face of size 1 or 2, the space recognition process finds a maximum width in the range 0 to 5 cm, and the pattern recognition process detects the back of a head but no instrument, the recommended recording scene is determined to be the lecture scene.

When the face recognition process finds one or more faces of size 1 to 6, the space recognition process finds a maximum width in the range 0 to 12 cm, and the pattern recognition process detects an instrument, the recommended recording scene is determined to be the music scene. In this case the music scene is recommended whether or not the back of a head is detected.

If the results of the face recognition process, the space recognition process, and the pattern recognition process do not match any entry in the recognition table shown in FIG. 14, a message such as “Scene not found” may be displayed on the LCD 28.
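The decision rules described for the recognition table can be sketched as a single function. This is an illustrative reading of the text, not the table of FIG. 14 itself, and it rests on several assumptions that the description leaves open: the function and parameter names are hypothetical; the conference rule is taken to require at least one face of size 3 to 6 (which is consistent with captured image A, where faces of size 2 are also present); and a missing width, as when only one object is detected, is treated as 0 cm.

```python
from typing import Optional, Sequence


def recommend_scene(
    face_sizes: Sequence[int],
    max_width_cm: Optional[float],
    instrument_found: bool,
    back_of_head_found: bool,
) -> Optional[str]:
    """Return the recommended recording scene, or None if no rule matches."""
    face_count = len(face_sizes)
    # Assumption: with fewer than two objects there is no width; treat it as 0 cm.
    width = 0.0 if max_width_cm is None else max_width_cm

    if instrument_found:
        # Music: one or more faces, sizes 1-6, width 0-12 cm, instrument detected.
        sizes_ok = all(1 <= s <= 6 for s in face_sizes)
        if face_count >= 1 and sizes_ok and 0 <= width <= 12:
            return "music"
        return None

    if face_count == 1:
        size = face_sizes[0]
        # Dictation: one large face and nothing found by pattern recognition.
        if size in (5, 6) and not back_of_head_found:
            return "dictation"
        # Lecture: one small face, back of a head detected, width 0-5 cm.
        if size in (1, 2) and back_of_head_found and 0 <= width <= 5:
            return "lecture"
        return None

    # Conference: two or more faces, assumed to need at least one of size 3-6.
    if face_count >= 2 and any(3 <= s <= 6 for s in face_sizes) and 0 <= width <= 5:
        return "conference"

    return None  # corresponds to the "Scene not found" case


print(recommend_scene([2, 2, 3, 4], 4.0, False, False))  # conference
print(recommend_scene([6], None, False, False))          # dictation
print(recommend_scene([1], 4.0, False, True))            # lecture
```

The instrument check comes first because the music rule accepts any face size and ignores the back-of-head result, so it would otherwise overlap with the other three rules.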

FIG. 3 shows a captured image A for which the recommended recording scene can be determined to be the conference scene. As a result of the face recognition process and the pattern recognition process performed by the CPU 22, as shown in FIG. 3, faces w and x of size 2, a face y of size 3, and a face z of size 4 are detected in the digital image signal of the captured image A, and no musical instrument is detected. As a result of the space recognition process, as shown in FIG. 4, a width K1 = 4 cm between face w and face x, a width K2 = 2 cm between face x and face y, and a width K3 = 3 cm between face y and face z are detected, and the maximum width is determined to be 4 cm. From these detection results, the CPU 22 refers to the recognition table of FIG. 14 and determines that the recommended recording scene is the conference scene.

FIG. 5 shows a captured image B for which the recommended recording scene can be determined to be the dictation scene. As a result of the face recognition process and the pattern recognition process performed by the CPU 22, as shown in FIG. 5, a face v of size 6 is detected in the digital image signal of the captured image B, and no musical instrument is detected. From these detection results, the CPU 22 refers to the recognition table of FIG. 14 and determines that the recommended recording scene is the dictation scene.

FIG. 6 shows a captured image C for which the recommended recording scene can be determined to be the lecture scene. As a result of the face recognition process and the pattern recognition process performed by the CPU 22, as shown in FIG. 6, a face t of size 1 and the backs of heads l, m, and n are detected in the digital image signal of the captured image C. Although face t is actually smaller than size 1, the CPU 22 assigns it the nearest of sizes 1 to 6. As a result of the space recognition process, as shown in FIG. 7, a width K4 = 4 cm between face t and back of head l, a width K5 = 2.5 cm between back of head l and back of head m, and a width K6 = 2 cm between back of head m and back of head n are detected, and the maximum width is determined to be 4 cm. From these detection results, the CPU 22 refers to the recognition table of FIG. 14 and determines that the recommended recording scene is the lecture scene.

FIG. 8 shows a captured image D for which the recommended recording scene can be determined to be the music scene. As a result of the face recognition process and the pattern recognition process performed by the CPU 22, as shown in FIG. 8, faces e, f, g, and h of size 1, a back of head i, and musical instruments o, p, q, r, and s are detected in the digital image signal of the captured image D. As a result of the space recognition process, as shown in FIG. 9, a width K7 = 1.5 cm between face h and instrument r, a width K8 = 1.5 cm between face g and instrument q, a width K9 = 1.5 cm between face f and instrument p, and a width of 0 cm between back of head i and instrument s are detected, and the maximum width is determined to be 1.5 cm. From these detection results, the CPU 22 refers to the recognition table of FIG. 14 and determines that the recommended recording scene is the music scene.
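The maximum-width step in the worked examples for captured images A, C, and D amounts to taking the largest of the measured gaps. The short sketch below reproduces that arithmetic with the values given in the text. The function name `max_width_cm` is hypothetical, and returning 0.0 when no gap is measured (for example, a single face with nothing next to it) is an assumption, not something the text states.

```python
from typing import Iterable


def max_width_cm(gaps_cm: Iterable[float]) -> float:
    """Return the largest measured gap in centimetres.

    ASSUMPTION: when no gap was measured, 0.0 is returned.
    """
    return max(gaps_cm, default=0.0)


# Gap values as given for each captured image.
image_a = [4.0, 2.0, 3.0]        # K1, K2, K3
image_c = [4.0, 2.5, 2.0]        # K4, K5, K6
image_d = [1.5, 1.5, 1.5, 0.0]   # K7, K8, K9, and the gap between i and s

for name, gaps in (("A", image_a), ("C", image_c), ("D", image_d)):
    print(name, max_width_cm(gaps))
```

The results (4 cm, 4 cm, and 1.5 cm) match the maximum widths stated for images A, C, and D.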

When the recommended recording scene has been determined, the CPU 22 causes the LCD 28 to display the screen for that recording scene. When the recommended recording scene is the dictation scene, a screen showing the dictation scene is displayed, as shown in FIG. 10. When it is the conference scene, a screen showing the conference scene is displayed, as shown in FIG. 11. When it is the lecture scene, a screen showing the lecture scene is displayed, as shown in FIG. 12. When it is the music scene, a screen showing the music scene is displayed, as shown in FIG. 13.

The CPU 22 then refers to the recording scene table and sets the recording function parameters corresponding to the recommended recording scene. When the recording button 24c is pressed, the CPU 22 starts recording with those parameters as the recording conditions.

To customize the recording function parameters, the user operates the menu key 24a within a predetermined time after the recording scene screen appears on the LCD 28. This opens a parameter change screen, where the cursor key 24d and the set button 24e are used to change the parameters to the desired values. Examples of such changes include switching the ALC function parameter from on to off and changing the compression ratio parameter from 64 kbps to 192 kbps. When the recording button 24c is then pressed, recording starts with the set parameters as the recording conditions.
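The customization described above can be pictured as layering the user's changes over the parameters set for the recommended scene. The sketch below is illustrative only: the parameter names `alc` and `bitrate_kbps`, the function `apply_overrides`, and the starting values in `EXAMPLE_PRESET` are hypothetical and are not taken from the recording scene table; only the two example changes (ALC on to off, 64 kbps to 192 kbps) come from the text.

```python
from typing import Dict, Union

ParamValue = Union[bool, int]

# HYPOTHETICAL starting parameters; not taken from the recording scene table.
EXAMPLE_PRESET: Dict[str, ParamValue] = {"alc": True, "bitrate_kbps": 64}


def apply_overrides(preset: Dict[str, ParamValue],
                    overrides: Dict[str, ParamValue]) -> Dict[str, ParamValue]:
    """Return a copy of the preset with the user's changes applied.

    Unknown parameter names are rejected so that a mistyped key cannot
    silently create a new setting.
    """
    unknown = set(overrides) - set(preset)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    merged = dict(preset)      # leave the stored preset untouched
    merged.update(overrides)
    return merged


# The two example changes named in the text.
recording_conditions = apply_overrides(
    EXAMPLE_PRESET, {"alc": False, "bitrate_kbps": 192})
print(recording_conditions)
```

Copying the preset rather than editing it in place keeps the stored scene parameters available for the next time the same scene is recommended.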

Next, the processing that implements the auto scene select function described above is explained with reference to the flowchart of the auto scene select task shown in FIG. 16. The CPU 22 executes this task based on the respective programs stored in the flash memory 36.

When the power is turned on and the auto scene select button 24g is pressed, the auto scene select task starts. First, in step S101, the CPU 22 performs imaging processing for one frame. In step S103, the CPU 22 executes the face recognition process on the digital image signal obtained by the imaging processing; in step S105 it executes the space recognition process; and in step S107 it executes the pattern recognition process. In step S109, the CPU 22 refers to the recognition table and determines the recommended recording scene based on the results of the face recognition, space recognition, and pattern recognition processes. In step S111, it sets the recording function parameters corresponding to the recommended recording scene and displays the screen for the recommended recording scene on the LCD 28.

In step S113, the CPU 22 determines whether, within a predetermined time after the recommended recording scene screen was displayed, the menu key 24a was operated and a parameter change was made with the cursor key 24d and the set button 24e. If the result of step S113 is YES, the process advances to step S115, the parameters are changed, and the task ends. If the result of step S113 is NO, the task also ends.
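The flow of steps S101 to S115 can be summarized in a short sketch. Every callable passed in (`capture_frame`, `recognize_faces`, and so on) is a hypothetical stand-in for device processing that the text does not detail, and the sketch assumes a scene is always found in step S109.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Params = Dict[str, Any]


def auto_scene_select_task(
    capture_frame: Callable[[], Any],
    recognize_faces: Callable[[Any], Any],
    recognize_space: Callable[[Any], Any],
    recognize_patterns: Callable[[Any], Any],
    decide_scene: Callable[[Any, Any, Any], str],
    presets: Dict[str, Params],
    show_scene: Callable[[str], None],
    wait_for_override: Callable[[], Optional[Params]],
) -> Tuple[str, Params]:
    """Run one pass of the task and return (scene, recording parameters)."""
    frame = capture_frame()                        # S101: image one frame
    faces = recognize_faces(frame)                 # S103: face recognition
    widths = recognize_space(frame)                # S105: space recognition
    patterns = recognize_patterns(frame)           # S107: pattern recognition
    scene = decide_scene(faces, widths, patterns)  # S109: consult the table
    params = dict(presets[scene])                  # S111: set parameters ...
    show_scene(scene)                              # ... and show the screen
    override = wait_for_override()                 # S113: change requested?
    if override:                                   # S115: apply the change
        params.update(override)
    return scene, params                           # task ends either way
```

Passing the processing steps in as callables keeps the sketch independent of any particular recognition method, so it can be exercised with simple stubs.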

As described above, with the IC recorder 10 of this embodiment, the user only has to press the auto scene select button 24g; the IC recorder 10 itself determines the recommended recording scene and sets the recording function parameters corresponding to that scene, so recording can be performed more easily and under suitable recording conditions. In addition, because the recommended recording scene is determined from a captured image, the determination is more accurate.

In the IC recorder 10 of this embodiment, the program for executing the auto scene select function is stored in the flash memory 36 in advance. Alternatively, the IC recorder 10 may be connected to an external device so that the user obtains a program held by the external device via wireless or wired communication.

Although this embodiment describes an example in which the present invention is applied to the IC recorder 10, the invention is also applicable to digital cameras, PDAs, mobile phones, and smartphones. When it is applied to a mobile phone or smartphone, the user may download all or part of the program for executing the auto scene select function via the Internet or a telephone line.

When the present invention is applied to a touch-panel smartphone, the roles of the keys and buttons of the operation unit 24 may be assigned to touch operations such as a tap operation, in which the screen is tapped lightly and the finger is released immediately; a long-press operation, in which a finger stays in contact with the screen; a flick operation, in which a finger is moved as if lightly brushing the screen; and a swipe operation, in which two or more fingers touching the screen are moved apart from or toward each other.

The digital image signal stored in the SDRAM 32 may also be output to the LCD 28 under the control of the CPU 22. The LCD 28 includes an LCD driver (not shown), which converts the Y, U, and V signals into RGB signals so that the LCD 28 can display an image based on the digital image signal. Because displaying an image on the LCD 28 increases power consumption, the image may be displayed only when the user has selected the display mode by operating the menu key 24a, the cursor key 24d, and the set button 24e.
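The text does not say which conversion the LCD driver uses to turn Y, U, and V signals into RGB. As an illustration only, the sketch below assumes 8-bit full-range samples and the ITU-R BT.601 coefficients used by JFIF; the function name `yuv_to_rgb` is hypothetical.

```python
from typing import Tuple


def yuv_to_rgb(y: int, u: int, v: int) -> Tuple[int, int, int]:
    """Convert one 8-bit full-range YUV (YCbCr) sample to 8-bit RGB.

    ASSUMPTION: BT.601 coefficients as used by JFIF; the actual LCD
    driver may use a different matrix or value range.
    """
    def clamp(x: float) -> int:
        # Round to the nearest integer and limit to the 8-bit range.
        return max(0, min(255, int(round(x))))

    d = u - 128  # blue-difference chroma, centred on zero
    e = v - 128  # red-difference chroma, centred on zero
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp(r), clamp(g), clamp(b)
```

With both chroma samples at their neutral value of 128, the output is a grey whose three channels equal the luma value.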

Although the embodiment uses the LCD 28 as the monitor, another display device such as an organic EL display may be used instead.

In the face recognition, pattern recognition, and space recognition processes, the digital image signal for one frame is the recognition target, but 10 frames may be used as the recognition target instead. In that case, recognition accuracy can be expected to improve.
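The text does not describe how results from several frames would be combined. One possible approach, shown here purely as an assumption, is to determine a scene for each frame and take a majority vote; the function name `vote_scene` is hypothetical, and frames for which no scene was found are represented by `None` and ignored.

```python
from collections import Counter
from typing import Iterable, Optional


def vote_scene(per_frame_scenes: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most frequent per-frame scene, or None if none was found.

    ASSUMPTION: majority voting is one way to combine frames; the source
    does not specify the method. Ties go to the scene seen first.
    """
    votes = Counter(s for s in per_frame_scenes if s is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Ten hypothetical per-frame results.
frames = ["conference"] * 7 + ["lecture"] * 2 + [None]
print(vote_scene(frames))
```

A vote of this kind lets a few misrecognized frames be outweighed by the rest, which is one way the expected accuracy improvement could arise.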

In this embodiment, six face size classes are provided for the face recognition process, but the sizes and the number of classes are not limited to this. The more face size classes there are, the more accurately the recommended recording scene can be determined.

In this embodiment, the microphone unit 42 is provided with the omnidirectional microphones 42L and 42R, but a directional microphone may be provided in addition. In that case, on/off settings for the omnidirectional microphones 42L and 42R and for the directional microphone may be added to the recording function parameters set for each recording scene. For example, if the parameters for the dictation scene turn the omnidirectional microphones 42L and 42R off and the directional microphone on, even more suitable recording becomes possible.

Description of symbols
10 ... IC recorder
16 ... Lens group
18 ... CMOS imager unit
20 ... Signal processing circuit
22 ... CPU
24 ... Operation unit
28 ... LCD
30 ... External memory card control circuit
32 ... SDRAM
34 ... External memory card
36 ... Flash memory
40 ... Codec
42 ... Microphone unit
46 ... DSP

Claims

1. A recording apparatus comprising:
imaging means for capturing an optical image of a subject and outputting it as an image signal;
scene discrimination means for discriminating a scene based on the image signal;
recording means for collecting and recording audio; and
setting means for setting a recording condition for recording based on the scene determined by the scene discrimination means.

2. The recording apparatus according to claim 1, further comprising:
recording control means for controlling the recording means to record under the recording condition set by the setting means; and
changing means for changing the recording condition set by the setting means.

3. The recording apparatus according to claim 1, further comprising notification means for notifying a recording condition set by the setting means.

4. The recording apparatus according to claim 1, wherein the scene discriminating unit discriminates a scene by detecting the number of objects and / or the degree of separation between objects from the image signal.

5. The recording apparatus according to claim 1, wherein the scene discrimination unit discriminates a scene by detecting the number of persons and / or the degree of separation between persons based on the image signal.

6. The recording apparatus according to claim 5, wherein the scene discriminating unit discriminates a scene by detecting the number of persons and / or the degree of separation between persons based on a face signal included in the image signal.

7. The recording apparatus according to claim 1, wherein the recording condition set by the setting means based on the scene determined by the scene discrimination means includes a condition for collecting sound using an omnidirectional microphone or a directional microphone.

8. A method of setting conditions for recording in a recording device that collects and records audio, the method comprising the steps of:
capturing an optical image of a subject and outputting it as an image signal;
determining a scene based on the image signal; and
setting recording conditions for recording based on the determined scene.

9. A program for causing a processor of an electronic device, the electronic device including an imaging unit that captures an optical image of a subject and outputs it as an image signal and a recording unit that collects and records sound, to execute the steps of:
capturing an optical image of a subject and outputting it as an image signal;
determining a scene based on the image signal; and
setting recording conditions for recording based on the determined scene.