JP2003348411A

JP2003348411A - Camera for permitting voice input

Info

Publication number: JP2003348411A
Application number: JP2002153017A
Authority: JP
Inventors: Yoji Watanabe; 洋二渡辺
Original assignee: Olympus Optical Co Ltd
Current assignee: Olympus Corp
Priority date: 2002-05-27
Filing date: 2002-05-27
Publication date: 2003-12-05

Abstract

PROBLEM TO BE SOLVED: To provide a voice-input camera that can composite received voice, as characters, with a subject image, so that a user can sense the photographing situation just by viewing the printed photographic image.

SOLUTION: The camera comprises: an imaging optical system 2 for acquiring an electronic image; a microphone 15 for picking up voice information during an imaging operation; a voice recognition circuit 16 for converting the voice information into text data; a character image output circuit 17 for converting the text data into a character image; and a liquid crystal panel 7 for displaying an image in which the electronic image of the subject is composited with the character image. The voice recognition circuit 16 also infers the photographing situation from the voice information, and the character image output circuit 17 revises the character image on the basis of that photographing situation.

COPYRIGHT: (C)2004,JPO

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a camera capable of voice input, and more particularly to a voice-input camera that has a microphone and a function of converting input voice into a character image, compositing that character image with a captured subject image, and displaying the result.

[0002]

[Description of the Related Art] As is well known, various digital cameras that have a microphone and allow voice input have been available. When a captured subject image is played back on the monitor attached to such a camera, the input voice is played back at the same time. However, once the image is printed on paper, the voice information can no longer be reproduced.

[0003] The camera disclosed in Japanese Patent Laid-Open No. 5-184544 therefore not only plays back the input voice together with the image display, but also analyzes the input voice, converts it into a character image in hiragana, katakana, kanji, alphabetic characters, or the like, and composites that character image with the subject image, so that the characters can be printed together with the photographed image.

[0004] With this camera, the conversation at the time of shooting is written onto the printed image, so the printed image conveys the "atmosphere at the time of shooting" and the "sense of presence" that cannot be obtained from a still image alone.

[0005]

[Problems to Be Solved by the Invention] When a person speaks, the intonation, loudness, volume, speed, and so on of the words normally vary greatly with the situation. For example, even when the same words are spoken, the intonation and other qualities differ between chatting with family or friends and speaking at a ceremony such as a funeral or in a public setting.

[0006] However, the voice-input camera disclosed in the above Japanese Patent Laid-Open No. 5-184544 does not take this manner of speaking into account. The characters printed on the image are therefore written uniformly, regardless of the situation at the time of shooting, and it is difficult to sense the atmosphere at the time of shooting from the printed characters.

[0007] The present invention has been made in view of the above circumstances. Its object is to provide a voice-input camera that analyzes input voice, converts it into a character image, and composites it with a subject image, such that the "atmosphere at the time of shooting" and the "sense of presence" can be sensed just by looking at the printed photographic image.

[0008]

[Means for Solving the Problems, and Operation] To achieve the above object, a voice-input camera according to the present invention comprises: imaging means for capturing a subject through a photographing optical system and acquiring an electronic image of the subject; voice input means for capturing voice information during the imaging operation of the imaging means; text data setting means for converting the voice information into text data; character image generating means for converting the text data into a character image; display means capable of displaying an image in which the electronic image of the subject acquired by the imaging means and the character image are composited; and photographing situation analogy means for inferring the photographing situation on the basis of the voice information. The character image generating means changes a characteristic of the character image on the basis of the photographing situation inferred by the photographing situation analogy means. The photographing situation analogy means determines the photographing situation on the basis of the volume and/or intonation of the voice information, and may additionally take the frequency of the voice information into account. Further, the character image generating means changes at least one of the shape, color, and size of the character image according to the photographing situation inferred by the photographing situation analogy means.

[0009]

[Embodiments of the Invention] Before an embodiment of the present invention is described, an outline of the voice-input camera of the present invention is given.

[0010] Like a conventional digital camera, the voice-input camera of the present invention captures a subject with imaging means that include an image sensor such as a CCD, records the captured electronic image on a recording medium such as a memory card, and can read the recorded electronic image from the recording medium and display it on display means such as a liquid crystal display. It also has voice input means, including a microphone, for capturing voice information, so that voice at the time of shooting can be acquired together with the image information. The voice information can further be converted into character image data, and an image in which this data is composited with the electronic image of the subject can be displayed on the liquid crystal monitor screen. An optimal shape for the character image data is selected according to the loudness (volume), intonation (dynamic range), and so on of the input voice, and the photographer can change the character image data at will.

[0011] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the electric circuit of a voice-input camera according to an embodiment of the present invention.

[0012] The voice-input camera of the present invention (hereinafter, the digital camera) 1 incorporates a system controller 13 that performs overall control to carry out operations in response to the photographer's actions. Connected to the system controller 13 are a photographing optical system 2; a photographing processing system (CCD, A/D) 3; an image processing controller 4; a recording system (memory card) 5; an LCD driver 6; a control panel (LCD) 8; a finder observation unit 10 made up of a finder information liquid crystal panel 9 and a finder optical system (OVF) 11; and an input unit 12. A character image output circuit 17 is connected to the image processing controller 4, a voice recognition circuit 16 is connected to the character image output circuit 17, and a microphone 15 that converts voice into an electric signal is connected to the voice recognition circuit 16. A liquid crystal panel 7 is connected to the LCD driver 6.

[0013] In the digital camera 1 configured in this way, the light flux from the subject enters the CCD of the photographing processing system 3 through the photographing optical system 2, which consists of a plurality of photographing lens groups. The photographing processing system 3 consists of a CCD, an imaging circuit, an A/D converter, and so on (none shown). The CCD output is A/D converted and output from the photographing processing system 3; this A/D-converted output is image-processed by the image processing controller 4 and recorded by the recording system 5, which includes a memory card or the like.

[0014] Using an image buffer made up of RAM or the like (not shown), the image processing controller 4 applies known image processing such as white balance processing, color processing, gamma correction, and sharpness adjustment, as well as JPEG compression and decompression, to the electronic image signal that has been A/D converted and output from the photographing processing system 3. It outputs the result to the recording system 5, which consists of a memory card interface and a recording medium such as the memory card itself. The image processing controller 4 also performs image processing of the character image data described later.

[0015] The recording system 5 records the electronic image signals that have undergone the above processing in the image processing controller 4, and the character image data described later. Besides the memory card, the recording medium may be a magnetic disk such as a hard disk or a floppy (registered trademark) disk, a magneto-optical disk such as an MO, or the like.

[0016] Images recorded on the memory card and processed images are displayed on the playback liquid crystal panel 7 via the LCD driver 6. An image composited with the character data described later can also be displayed. In addition to playback images, the playback liquid crystal panel 7 displays a menu screen, from which the user can enter operations on captured images, such as playback, image editing, and deletion, and make various settings such as the time setting.

[0017] The digital camera 1 also has a built-in optical viewfinder, and the image from the finder optical system 11 can be observed in the finder observation unit 10.

[0018] The control panel (LCD) 8 displays various shooting-related information such as the shooting mode, shooting conditions, and number of remaining shots. Part of the shooting-related information can also be displayed on the liquid crystal panel 7 and on the finder information liquid crystal panel 9 provided in the finder observation unit 10.

[0019] In accordance with operation input from the input unit 12, the system controller 13 controls components 2 through 9 to perform shooting/playback mode setting, shooting condition setting, shooting, image recording, playback display, playback image editing, and so on.

[0020] This control includes control related to shooting (hereinafter, shooting-related control), such as AF (autofocus), shutter speed setting, continuous/single shooting setting, AE (automatic exposure), metering mode, exposure compensation, ISO sensitivity, white balance, strobe control, CCD control, image processing instructions, recording control, and image quality setting; control related to displaying and editing images (hereinafter, playback-related control), such as playback image display, display image selection, electronic zoom, index display, panorama compositing, image protection, and image deletion; and control for other settings (hereinafter, other setting control), such as clock setting, beep setting, print reservation setting, and liquid crystal panel brightness adjustment.

[0021] The digital camera 1 of this embodiment thus has the above three types of control (shooting-related control, playback-related control, and other setting control), and the method for entering settings and operations differs by control type.

[0022] Input for shooting-related control is made with a condition setting dial and a condition type button (neither shown); this setting process is called shooting condition setting processing. Input for playback-related control and other setting control is made from the menu screen displayed on the liquid crystal panel 7; this setting process is called playback-related/other processing. The system controller 13 executes each of these setting processes.

[0023] The digital camera 1 of this embodiment is also provided with the microphone 15 as voice input means, which can capture voice information for a predetermined time (for example, 5 seconds) in synchronization with the imaging operation.

[0024] The voice recognition circuit 16 recognizes the voice information input through the microphone 15 and converts it into text data. It also extracts features from the voice data, analyzes the photographing situation on the basis of the result, and classifies it into one of several categories described later (casual conversation, speech in a public setting, and so on). The voice recognition circuit 16 constitutes the text data setting means and the photographing situation analogy means of the present invention.

[0025] The character image output circuit 17 converts the voice information output from the voice recognition circuit 16 into character image data. This character image data is output to the image processing controller 4 and stored on the memory card 5 together with the electronic image of the subject (the image captured when the voice was input). At this point the two sets of image data (the electronic image data and the voice-derived character image data) are associated with each other but not composited. The character image output circuit 17 constitutes the character image generating means of the present invention.

[0026] The image processing controller 4 reads an image recorded on the memory card 5 in response to the photographer's instruction and displays it on the liquid crystal panel 7 via the LCD driver 6. If character image data is associated with the image being read, the controller reads that character image data as well, composites the two, and then displays the result on the liquid crystal panel 7. The liquid crystal panel 7 constitutes the display means of the present invention.

[0027] If the photographer feels that the shape of the composited characters does not suit the photographing situation, the photographer can change to different character data by operating an instruction member (not shown) included in the input unit 12. Specifically, when the photographer operates the instruction member in the input unit 12, the system controller 13 detects the operation signal and outputs it to the image processing controller 4. On receiving the operation signal, the image processing controller 4 instructs the character image output circuit 17 to send the character image data considered next most suitable. In response, the character image output circuit 17 returns new character image data to the image processing controller 4. The image processing controller 4 then updates the character image data on the memory card 5 and displays the composited image again.

[0028] Next, the configuration and operation of the voice recognition circuit 16 and the character image output circuit 17 are described in detail with reference to FIG. 2.

[0029] As shown in FIG. 2, the voice recognition circuit 16 and the character image output circuit 17 are connected as described above.

[0030] The voice recognition circuit 16 contains filter means 30, digital conversion means 31, frequency analysis means 32, matching means 33, an identification pattern table 34, analogy means 35, and setting means 36.

[0031] The output of the microphone 15 is connected to the input of the filter means 30; the output of the filter means 30 is connected to the input of the digital conversion means 31; and the output of the digital conversion means 31 is connected to the inputs of the frequency analysis means 32 and the analogy means 35. The output of the frequency analysis means 32 is connected to the inputs of the matching means 33 and the analogy means 35, and the output of the identification pattern table 34 is connected to the input of the matching means 33. The output of the matching means 33 is connected to the input of the setting means 36. Further, the output of the analogy means 35 is connected to the input of the font selection means 37 of the character image output circuit 17 described later, and the output of the setting means 36 is connected to the input of the character image generating means 38 of the character image output circuit 17 described later.

[0032] The character image output circuit 17 contains font selection means 37, character image generating means 38, and a font table 39.

[0033] The output of the font table 39 is connected to the input of the font selection means 37; the output of the font selection means 37 is connected to the input of the character image generating means 38; and the output of the character image generating means 38 is connected to the input of the image processing controller 4. The output of the image processing controller 4 is also connected to the input of the font selection means 37.

[0034] Voice information picked up by the microphone 15 has its noise removed by the filter means 30 in the voice recognition circuit 16, and is then output to the digital conversion means 31 and converted into digital data.

[0035] The frequency analysis means 32 then extracts frequency features from the converted digital data, and the matching means 33 recognizes the data by pattern matching against the identification patterns stored in the identification pattern table 34, whose output is supplied to the matching means 33. On the basis of the recognition result, the digital data is output to the setting means 36, where character data is assigned. In other words, a series of processes is performed that converts the analog voice signal to digital form and then into text data.
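The matching step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each identification pattern is a short feature vector and that matching picks the nearest pattern by Euclidean distance. The patent does not specify the feature representation or the distance measure, and the table contents and the names `PATTERN_TABLE`, `match_pattern`, and `to_text` are hypothetical.

```python
import math

# Hypothetical identification pattern table: text symbol -> reference feature vector.
# Real feature vectors would come from the frequency analysis step.
PATTERN_TABLE = {
    "a": (0.9, 0.1, 0.2),
    "i": (0.2, 0.8, 0.3),
    "u": (0.3, 0.2, 0.9),
}


def match_pattern(features, table=PATTERN_TABLE):
    """Return the symbol whose reference vector is closest to `features`."""
    return min(table, key=lambda symbol: math.dist(features, table[symbol]))


def to_text(feature_frames, table=PATTERN_TABLE):
    """Assign one symbol per frame of extracted features and join them into text."""
    return "".join(match_pattern(frame, table) for frame in feature_frames)
```

Nearest-neighbour matching is only one possible reading of "pattern matching" here; it was chosen because it keeps the role of the table explicit.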

[0036] The digital data converted by the digital conversion means 31 is also input to the analogy means 35, which analyzes the volume and/or intonation of the voice data to infer the photographing situation.
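As a rough illustration of what analyzing volume and intonation could involve, the following Python sketch computes both from digitized samples. It assumes that volume is the RMS level of the whole clip and that intonation is the spread between the loudest and quietest short frames, following the gloss of intonation as "dynamic range" in paragraph [0010]. The function names and the framing scheme are assumptions, not details from the patent.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def volume_and_intonation(samples, frame_size):
    """Return (overall RMS volume, spread of per-frame RMS levels)."""
    if not samples or frame_size <= 0:
        raise ValueError("need at least one sample and a positive frame size")
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    levels = [rms(frame) for frame in frames]
    return rms(samples), max(levels) - min(levels)
```

A clip whose level never changes yields an intonation of zero, while a clip that moves between quiet and loud passages yields a large value.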

[0037] When the volume is low and there is little intonation, the analogy means 35 judges that the situation is one in which people cannot move around much, such as a ceremony hall; when the volume is high and the intonation is strong, it judges that the situation involves vigorous movement, such as a party or a sports day. From the output of the frequency analysis means 32, it also takes frequency data into account, for example judging that many people are present when the frequency band is wide, or that the speaker is a child when the frequency is high.

[0038] From these inference results, the analogy means 35 sorts the photographing situation into one of several categories, such as "quiet ceremony", "party with many people", "snapshot including children", and "commemorative photo outdoors".
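The categorization described above could be sketched as a small rule table in Python. Everything numeric here is an assumption: the patent gives no thresholds, no rule order, and no rule for the outdoor category, so the sketch uses hypothetical threshold constants and treats "commemorative photo outdoors" as the fallback.

```python
from dataclasses import dataclass


@dataclass
class VoiceFeatures:
    volume: float        # overall loudness, normalized to 0..1
    intonation: float    # loudness variation, normalized to 0..1
    bandwidth_hz: float  # width of the occupied frequency band
    pitch_hz: float      # dominant frequency

# Hypothetical thresholds; the patent does not give numeric values.
QUIET_VOLUME = 0.3
FLAT_INTONATION = 0.2
LOUD_VOLUME = 0.6
LIVELY_INTONATION = 0.5
WIDE_BAND_HZ = 2000.0
CHILD_PITCH_HZ = 300.0


def infer_situation(f):
    """Map voice features to one of the example categories."""
    if f.volume < QUIET_VOLUME and f.intonation < FLAT_INTONATION:
        return "quiet ceremony"
    if f.pitch_hz > CHILD_PITCH_HZ:
        return "snapshot including children"
    if (f.volume > LOUD_VOLUME and f.intonation > LIVELY_INTONATION
            and f.bandwidth_hz > WIDE_BAND_HZ):
        return "party with many people"
    # Assumed fallback: the text does not say how this category is detected.
    return "commemorative photo outdoors"
```

For example, `infer_situation(VoiceFeatures(0.8, 0.7, 3000.0, 200.0))` falls through to the loud, lively, wide-band rule and returns "party with many people".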

[0039] In the character image output circuit 17, the font selection means 37 selects a font for the character image from the font table 39, whose output is supplied to it, according to the inference result output from the analogy means 35, and outputs the font to the character image generating means 38.

[0040] The character image generating means 38 generates a character image from the text data output from the matching means 33 and the selected font, and outputs it to the image processing controller 4.

[0041] As described above, when the image processing controller 4 issues an instruction to change the character image, the font selection means 37 receives that signal, selects the next character candidate, and outputs it to the character image generating means 38.
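One way to picture the "next candidate" behavior is a ranked list that advances each time a change instruction arrives. The Python sketch below assumes the candidates are fonts ordered from most to least suitable and that selection wraps around after the last one; neither the ordering nor the wrap-around is stated in the patent, and the class name `FontSelector` and the font names are hypothetical.

```python
class FontSelector:
    """Steps through ranked font candidates on each change instruction."""

    def __init__(self, candidates):
        if not candidates:
            raise ValueError("at least one candidate is required")
        self._candidates = list(candidates)
        self._index = 0  # start with the candidate judged most suitable

    @property
    def current(self):
        return self._candidates[self._index]

    def next_candidate(self):
        """Advance to the next candidate (wrapping around is an assumption)."""
        self._index = (self._index + 1) % len(self._candidates)
        return self.current
```

With `FontSelector(["mincho", "gothic", "pop"])`, the first change instruction yields "gothic", the second "pop", and the third returns to "mincho".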

[0042] As for font shape, it is advisable to use a font whose feel suits the subject and the photographing situation: for example, characters based on a Mincho typeface for a "quiet ceremony", a pop-style typeface for a "party with many people", and a comic-style typeface for a "snapshot including children".

[0043] Color may also be used for expression, for example warm character colors for a happy situation and cool colors for a sad one.

[0044] Further, emphasis may be applied, for example by rendering a loud voice in large or bold characters.

[0045] As described above, the character image output circuit 17 selectively outputs character image data on the basis of the character data (text data) and the category data output from the voice recognition circuit 16.

[0046] Thus, in the camera of this embodiment, when voice is input to the microphone 15 of the digital camera 1 at the time of shooting, the analogy means 35 in the voice recognition circuit 16 infers the photographing situation from voice data such as the loudness, intonation, and frequency of the input voice; the font selection means 37 in the character image output circuit 17 selects character data suited to the photographing situation; the character image generating means 38 then creates the character image data; and the image processing controller 4 composites the captured image with the character image data. The liquid crystal panel 7 can therefore display the captured image composited with character image data suited to the photographing situation.

[0047]

[Effects of the Invention] As described above, the present invention provides a voice-input camera that lets the viewer sense the "atmosphere at the time of shooting" and the "sense of presence" just by looking at the printed photographic image.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the electric circuit of a voice-input camera according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the voice recognition circuit and the character image output circuit in the voice-input camera according to the embodiment of the present invention.

[Explanation of symbols]

2 ... Photographing optical system (imaging means)
7 ... Liquid crystal panel (display means)
15 ... Microphone (voice input means)
16 ... Voice recognition circuit (text data setting means) (photographing situation analogy means)
17 ... Character image output circuit (character image generating means)

Claims

[Claims]

1. A camera capable of voice input, comprising: imaging means for capturing a subject through a photographing optical system and acquiring an electronic image of the subject; voice input means for capturing voice information during an imaging operation by the imaging means; text data setting means for converting the voice information into text data; character image generating means for converting the text data into a character image; display means capable of displaying an image obtained by compositing the electronic image of the subject acquired by the imaging means with the character image; and photographing situation analogy means for inferring a photographing situation on the basis of the voice information, wherein the character image generating means changes a characteristic of the character image on the basis of the photographing situation inferred by the photographing situation analogy means.

2. The camera capable of voice input according to claim 1, wherein the photographing situation analogy means determines the photographing situation on the basis of a volume and/or intonation of the voice information.

3. The camera capable of voice input according to claim 2, wherein the photographing situation analogy means further takes a frequency of the voice information into account in making the inference.

4. The camera capable of voice input according to claim 1, wherein the character image generating means changes at least one of a shape, a color, and a size of the character image according to the photographing situation inferred by the photographing situation analogy means.