JP3183220B2 - Digital camera image comment input device

Info

Publication number: JP3183220B2
Application number: JP18616897A
Authority: JP
Inventors: 博喜藤野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-07-11
Filing date: 1997-07-11
Publication date: 2001-07-09
Anticipated expiration: 2017-07-11
Also published as: JPH1132239A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a comment input device for digital camera images, and more particularly to a comment input device for digital camera images that uses speech recognition to convert voice data into character string data and that specifies the display position of the converted character string.

[0002]

[Prior Art] A digital camera has functions for recording images, for arbitrarily erasing recorded images, and for storing and playing back recorded images, along with numerous other image-related functions. Miniaturization also accounts for a large share of the technical effort. Digital camera systems with a voice mail function, in which a voice input function is added to the digital camera, already exist. However, the only information that can be entered for display on a digital camera is image information. Voice input exists as a means of user input on portable terminal devices, but recognition and confirmation of that input depend entirely on one-sided recognition by the machine.

[0003] Japanese Unexamined Patent Publication No. H8-287100 describes an information retrieval apparatus with improved operability when a screen is selected by menu operation. In summary, the search word input screen offers two choices, "look up the meaning of a word" and "view the table of contents". When "view the table of contents" is selected, the table of contents screen appears after a predetermined time has elapsed, without the YES key being pressed to confirm. If the NO key is pressed before the predetermined time elapses, the display returns to the original screen. Because the selection can be made without pressing a confirmation key, operability is improved.

[0004] This is described below, with reference to the drawings, as a conventional comment input device. FIG. 18 is a block diagram of the overall configuration of an example apparatus to which the conventional comment input device is applied. FIG. 19 is a plan view used to explain the key layout of the conventional comment input device. FIG. 20 is an explanatory diagram of the conventional comment input device. FIG. 21 is the initial screen used in an example of the conventional comment input device. FIG. 22 is the table of contents screen used in an example of the conventional comment input device. FIG. 23 is a flowchart used to explain an example of the conventional comment input device.

[0005] In FIG. 18, the conventional comment input device comprises an optical disk drive 21, a data processing unit 22, an I/O 23, a CPU 24, a main ROM 25, a main RAM 26, an input keyboard 27, a display controller 28, a VRAM 29, a display 30, and a power supply circuit 31.

[0006] The optical disk drive 21 reproduces the data stored on a CD-ROM. The data processing unit 22 demodulates the CD-ROM data. The I/O 23 serves as the interface between the data processing unit 22 and the CPU 24. The main ROM 25 stores a program for executing initial processing. The main RAM 26 is used to store operands.

[0007] As shown in the plan view of FIG. 19, the input keyboard 27 includes a power key, a shift key, function keys, alphabet keys, a romaji table/character enlargement key, a NO key, a YES key, four direction keys marked with arrows, and an input mode key.

[0008] The display controller 28 is connected to the VRAM 29, which stores the video data to be displayed, and supplies its output to the display 30. The VRAM 29 is a video RAM. The power supply circuit 31 supplies the necessary power to each circuit. The display 30 displays the content.

[0009] Next, the operation of the conventional device is described. Information retrieval apparatuses that use CD-ROMs or similar media are often provided with a menu display screen, because they may be used not only by people with computer knowledge but also by people from a wide range of backgrounds and fields who have none. Because operation is normally based on menu selection, a menu selection is finalized as shown in FIG. 20: after a menu item is selected with the direction keys or the like (step S101), the device checks whether the confirmation key has been pressed (step S102) and, once it has been pressed, displays the selected screen (step S103).

[0010] However, this offers poor operability, so the conventional device described above attempts to improve operability as follows. For example, the search input screen of an English-Japanese dictionary shown in FIG. 21 offers "look up the meaning of a word" and "view the table of contents". When the direction keys are used on this screen to select "view the table of contents", the table of contents screen shown in FIG. 22 appears after a wait of a few seconds, even if the YES key is not pressed. If the table of contents menu was selected by mistake, pressing the NO key during the wait returns the display to the screen shown in FIG. 21. Pressing the NO key while the table of contents screen of FIG. 22 is displayed also returns the display to the screen shown in FIG. 21.

[0011] FIG. 23 is a flowchart of this operation. On the search word input screen, it is determined whether "view the table of contents" has been selected (step S1). If it has, a predetermined waiting time is set (step S2). It is then determined whether the NO key is pressed before the waiting time elapses (step S3); if the NO key is pressed, the display returns to the initial screen. If the waiting time elapses after the selection without the NO key being pressed, the table of contents screen is displayed (step S4). While the table of contents screen is displayed, the NO key is monitored (step S5), and when a press is detected, the display returns to the initial search word input screen.
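The timeout-based selection of FIG. 23 can be illustrated with a minimal Python sketch. It is not taken from the cited publication: the `Screen` enum, the `select_with_timeout` function, and the representation of key presses as `(seconds_after_selection, key_name)` tuples are all assumptions made for illustration.

```python
from enum import Enum


class Screen(Enum):
    SEARCH_INPUT = "search word input screen"
    TABLE_OF_CONTENTS = "table of contents screen"


def select_with_timeout(key_events, wait_time):
    """Return the screen shown after "view the table of contents" is selected.

    key_events: hypothetical list of (seconds_after_selection, key_name) tuples.
    wait_time: the predetermined waiting time in seconds (step S2).
    """
    events = sorted(key_events)
    # Step S3: a NO key press before the waiting time elapses cancels the selection.
    for elapsed, key in events:
        if key == "NO" and elapsed < wait_time:
            return Screen.SEARCH_INPUT
    # Step S4: no cancellation arrived in time, so the table of contents is shown.
    screen = Screen.TABLE_OF_CONTENTS
    # Step S5: a NO key press while the table of contents is shown goes back.
    for elapsed, key in events:
        if key == "NO" and elapsed >= wait_time:
            screen = Screen.SEARCH_INPUT
    return screen
```

With a three-second wait, an empty event list leads to the table of contents screen, while a NO press at one second leaves the user on the search word input screen.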

[0012] Thus, in the conventional device, when a given selection operation for a search is performed, the screen corresponding to that selection is displayed after a predetermined time has elapsed, without the confirmation key being pressed. If a cancel operation is performed before the predetermined time elapses, display of the corresponding screen is aborted. Because the selection is made without pressing the confirmation key, operability is improved.

[0013]

[Problems to Be Solved by the Invention] With the conventional device described above, however, voice information entered by the user cannot be attached as character string information when image information is captured with a digital camera. When the user takes a digital camera image, the image alone therefore cannot accurately express what the user felt.

[0014] In addition, when voice information is to be added to each digital camera image and displayed as character string information, the conventional device has the problem that, after entering by voice the display position of the comment string and the comment itself, the user must always judge whether the string can be displayed by comparing it with the maximum displayable length.

[0015] This is because no technology exists that, when a comment string and its display position are each added by voice during shooting with a digital camera, calculates the display position from the position information spoken for positioning and the comment voice spoken for provisional display of the string, displays the string as an unconfirmed string if it can be displayed, and hides it if it cannot.

[0016] Furthermore, even if the conventional device could display a comment string in the viewfinder as an unconfirmed string, there is no way to respond when the resulting unconfirmed string is not what the user wants. This is because, once the unconfirmed string has been determined, the comment voice cannot be entered again, and the processing that already produced an unconfirmed string from the entered voice cannot be run again.

[0017] The present invention has been made in view of the above. Its object is to provide a comment input device for digital camera images that makes use of the words arising from the atmosphere of the moment in which the user is using the digital camera, and that can immediately create images with character strings for display on other systems such as the Internet or electronic albums.

[0018] Another object of the present invention is to provide a comment input device for digital camera images with which, when words that arise on the spot while the user is using the digital camera are displayed in the image, the display position can easily be fixed on the spot by voice.

[0019] A further object of the present invention is to provide a comment input device for digital camera images with which the comment to be displayed can be confirmed either automatically or at the user's discretion when the user composites a character string onto an image with the digital camera, and which additionally allows a comment voice that has been entered once to be entered again, or a comment voice that has already been entered and converted into an unconfirmed string to be recognized and converted into a character string again.

[0020]

[Means for Solving the Problems] To achieve the above objects, the invention according to claim 1 comprises: a digital camera body that inputs image information; a voice input unit for specifying by voice the position at which a comment is to be displayed on the image information input to the digital camera body, while the user looks through the viewfinder of the digital camera body; a comment voice input unit for entering by voice the comment to be displayed at the position specified via the voice input unit; recognition and character string conversion means that recognizes the comment voice entered via the comment voice input unit and converts it into a character string in an unconfirmed state; a response voice input unit for entering a response voice indicating whether the unconfirmed comment string is to be confirmed or not; a comment creation unit that, referring to the image information input to the digital camera body, judges from the position designation voice entered via the voice input unit and the comment voice entered via the comment voice input unit whether the unconfirmed comment string can be displayed within the viewfinder and, if it can, uses the response voice entered via the response voice input unit to judge whether the unconfirmed comment string is confirmed or remains unconfirmed, and outputs display position information and confirmed character string information; a display data storage unit that stores the display position information and confirmed character string information of the comment string confirmed by the comment creation unit; a display data creation unit that merges the image information input by the digital camera body with the information stored in the display data storage unit; and an image output unit that outputs the information created by the display data creation unit.

[0021] In this invention, the comment and the position at which it is to be displayed, both entered by voice via the comment voice input unit and the voice input unit for the image information input to the digital camera body while the user looks through the viewfinder, can be turned into a comment string by the comment creation unit. Moreover, when a comment string and its display position are each added by voice during shooting with the digital camera, the display position is calculated from the position information spoken for positioning and the comment voice spoken for provisional display of the string; the string is displayed as an unconfirmed string if it can be displayed, and hidden if it cannot.

[0022] In the invention according to claim 2, the comment creation unit judges, from the position designation voice entered via the voice input unit and the comment voice entered via the comment voice input unit, whether the unconfirmed comment string can be displayed within the viewfinder. If it cannot, the comment creation unit makes the same judgment again on the basis of a position designation voice re-entered via the voice input unit and a comment voice re-entered via the comment voice input unit. This invention allows the comment string to be re-entered.

[0023] The comment creation unit of the present invention may also be configured so that, when it uses the response voice entered via the response voice input unit to judge whether the unconfirmed comment string is confirmed or not, it confirms the unconfirmed comment string if the response voice is a confirming voice expressing affirmative intent, and has the comment entered again if the response voice is a non-confirming voice intended to re-enter the unconfirmed string after voice input.

[0024] The comment creation unit of the present invention may also be configured to include means that, when it uses the response voice entered via the response voice input unit to judge whether the unconfirmed comment string is confirmed or not and the response voice is a non-confirming voice intended to reconvert the unconfirmed string, performs recognition and character string conversion of the comment again.

[0025] When confirming an unconfirmed comment string, the comment creation unit of the present invention may store the data of the confirmed comment string in the display data storage unit.

[0026]

[0027] The comment creation unit of the present invention may further include a timer that sets a remaining time and outputs time information, and timer management means that manages the timer. In this configuration, the comment creation unit judges from the position designation voice entered via the voice input unit and the comment voice entered via the comment voice input unit whether the unconfirmed comment string can be displayed within the viewfinder and, if it can, decides on confirmation, re-recognition and reconversion of the string, or re-entry of the string within the time set in the timer.

[0028] Furthermore, the comment creation unit of the present invention may judge whether the unconfirmed comment string can be displayed within the viewfinder and, if it can, automatically confirm the comment string once the time set in the timer has elapsed.

[0029] The present invention may also include means for displaying, until the unconfirmed comment string is confirmed, how much of the fixed time before confirmation has elapsed.
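The timer behavior of paragraphs [0027] to [0029] can be sketched as follows. This is only an illustration under stated assumptions: the `PendingComment` class, the `resolve` and `elapsed_fraction` functions, and the action names `"confirm"`, `"reconvert"`, and `"reenter"` are hypothetical and do not appear in the patent, which does not specify how the timer is implemented.

```python
from dataclasses import dataclass


@dataclass
class PendingComment:
    text: str          # unconfirmed comment string shown in the viewfinder
    shown_at: float    # time (seconds) at which the string was displayed
    time_limit: float  # time set in the timer, in seconds


def resolve(pending, now, response=None):
    """Decide what happens to an unconfirmed comment at time `now`.

    Returns "confirm", "reconvert", "reenter", or "pending".
    """
    remaining = pending.time_limit - (now - pending.shown_at)
    if remaining <= 0:
        # [0028]: once the set time has elapsed, confirm automatically.
        return "confirm"
    if response in ("confirm", "reconvert", "reenter"):
        # [0027]: a decision made within the set time is honored.
        return response
    return "pending"


def elapsed_fraction(pending, now):
    """[0029]: fraction of the set time that has elapsed, clamped to 0..1."""
    fraction = (now - pending.shown_at) / pending.time_limit
    return min(1.0, max(0.0, fraction))
```

A display routine could map the value of `elapsed_fraction` to a progress indicator so that the user sees how long remains before automatic confirmation.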

[0030] Furthermore, the present invention includes a display data creation unit, which is means that merges the image information input from the digital camera body with the comment strings stored in the display data storage unit and also reflects the time information, obtained from the comment creation unit, that changes within the fixed time.

[0031]

[Embodiments of the Invention] Next, embodiments of the present invention are described.

[0032] FIG. 1 is a block diagram of the overall configuration of one embodiment of the present invention. In the figure, the digital camera image comment input device comprises: a digital camera body 1 that inputs image information; a voice input unit 2 for specifying by voice the position at which a comment is to be added to the image information input to the digital camera body 1, while the user looks through the viewfinder; a comment voice input unit 3 for entering by voice the comment to be displayed at the position fixed via the voice input unit 2; recognition and character string conversion means 4 that recognizes the comment voice entered via the comment voice input unit 3 and converts it into a character string in an unconfirmed state; a response voice input unit 5 for forcing a decision to confirm or not confirm the comment string displayed in an unconfirmed state; a management table 6; a comment creation unit 7; a display data storage unit 8; a display data creation unit 9; and an image output unit 10.

[0033] The management table 6 uses ID numbers to manage in a unified manner the comment display position designations and the confirmed comment strings for the image data input by the digital camera body 1. The comment creation unit 7, referring to the image information input to the digital camera body 1, recognizes the position designation voice entered via the voice input unit 2, the comment voice entered via the comment voice input unit 3, and, if necessary, the response voice entered via the response voice input unit 5. It outputs, for each image, the display position information of the comment string, the confirmed character string information, and time information relating to the confirmation time.
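One way to picture the management table 6 is as a mapping from an image ID number to the comment positions and confirmed strings for that image. The sketch below is an assumption for illustration only: the patent does not describe the table's layout, and the names `CommentEntry`, `ManagementTable`, `register`, and `comments_for` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CommentEntry:
    position: tuple  # (x, y) display position designated by voice
    text: str        # confirmed comment string


@dataclass
class ManagementTable:
    # Maps an image ID number to the comment entries recorded for that image.
    entries: dict = field(default_factory=dict)

    def register(self, image_id, position, text):
        """Record a confirmed comment and its position under the image's ID."""
        self.entries.setdefault(image_id, []).append(CommentEntry(position, text))

    def comments_for(self, image_id):
        """Return a copy of the entries for one image (empty if none exist)."""
        return list(self.entries.get(image_id, []))
```

Keying everything on a single ID keeps the position designation and the confirmed string for an image together, which is the unified management the paragraph describes.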

[0034] The display data storage unit 8 stores the display position information and confirmed character string information of confirmed comment strings. The display data creation unit 9 merges the image information input by the digital camera body 1 with the information stored in the display data storage unit 8, and also updates the interface information that reflects the timer information obtained from the comment creation unit 7. The image output unit 10 outputs the information created by the display data creation unit 9. The digital camera body 1, the display data storage unit 8, and the image output unit 10 constitute the main body of the digital camera apparatus.

[0035] Next, the detailed configuration of the comment creation unit 7 is described. The comment creation unit 7 is means by which, for the image information input by the digital camera body 1, a character string to be shown together with the image is entered and recognized by voice while the user is taking the image, and by which the display data is then confirmed and supplied.

[0036] FIG. 2 is a block diagram of part of the first embodiment of the comment creation unit 7 of the present invention. As shown in FIG. 2, it comprises input voice recognition means 7-1, display position designation means 7-2, and storage means 7-4. The input voice recognition means 7-1 recognizes the position designation voice entered via the voice input unit 2, which defines the display position of the comment string relative to the screen information seen through the viewfinder of the digital camera.

[0037] The display position designation means 7-2 either withholds display of the user's spoken comment string as an unconfirmed string at the position the user specified, thereby signaling that the position designation voice should be entered again via the voice input unit 2, or designates the display position by displaying the comment string in an unconfirmed state. The storage means 7-4 stores the result of speech recognition of the comment voice entered from the comment voice input unit 3 via the recognition and character string conversion means 4.

[0038] Next, the operation of one embodiment of the present invention is described with reference to the flowchart of FIG. 3, which covers display position designation and comment voice input (comment creation unit) in this embodiment. First, when a voice input designating the display position of the character string arrives from the voice input unit 2 (step 101), the input voice recognition means 7-1 recognizes the position designation voice (step 102). Next, when a comment voice is entered via the comment voice input unit 3 (step 103), the recognition and character string conversion means 4 recognizes it and converts it into a character string (step 104). The font has already been set at this point as part of the initial settings.

[0039] Next, from the position designation voice recognized by the input voice recognition means 7-1 in step 102 and the comment voice and its character string recognized by the recognition and character string conversion means 4 in step 104, it is judged whether the comment string can be displayed in the viewfinder of the digital camera. If it cannot, the comment string spoken by the user is not displayed as an unconfirmed string at the position the user specified, which signals the user to enter the position designation voice again via the voice input unit 2 (step 105). Processing then returns to step 101 to receive a voice input designating the display position of the character string.
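The displayability judgment can be sketched in a few lines of Python. The patent does not give the calculation, so this is a minimal sketch under assumptions: the comment is drawn on a single line in a fixed-width font (the font being fixed by the initial settings), positions are pixel coordinates of the string's top-left corner, and the function name `fits_in_viewfinder` and its parameters are hypothetical.

```python
def fits_in_viewfinder(text, position, frame_width, frame_height,
                       char_width, char_height):
    """Return True if `text` drawn at `position` stays inside the frame.

    Assumes a single-line, fixed-width rendering; all sizes are in pixels.
    """
    x, y = position
    if x < 0 or y < 0:
        return False
    # The right edge of the string and the bottom edge of the line must
    # both lie within the viewfinder frame.
    right_edge = x + len(text) * char_width
    bottom_edge = y + char_height
    return right_edge <= frame_width and bottom_edge <= frame_height
```

When the check fails, the device of the embodiment leaves the string undisplayed and waits for a new position designation, so a caller would loop back to position entry rather than raise an error.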

[0040] FIG. 4 is a block diagram of the first embodiment of the comment creation unit 7 of the present invention. Components identical to those in FIG. 2 carry the same reference numerals, and their description is omitted. The comment creation unit 7 shown in FIG. 4 further includes response voice recognition means 7-7, which receives the response voice from the response voice input unit 5, and confirmation processing means 7-8. The operation of the comment creation unit of FIG. 4 is described with reference to the flowchart of FIG. 5, which covers comment input and the method of confirming the comment string. In FIG. 5, processing steps identical to those in FIG. 3 carry the same reference numerals.

[0041] A position designation voice is entered via the voice input unit 2 and recognized by the input voice recognition means 7-1, and a comment voice is entered via the comment voice input unit 3 and recognized and converted into a character string by the recognition and character string conversion means 4. From these results it is judged whether the character string can be displayed on the viewfinder screen (step 110). If it cannot, the entered comment string is not displayed, and processing returns to step 101 to restart entry of the position designation voice via the voice input unit 2.

[0042] On the other hand, when it is determined in step 110 that the character string converted from the comment voice can be displayed, the process proceeds to step 111, where the reply voice input unit 5 receives the user's affirmative reply voice for the comment character string displayed as an unconfirmed character string. The reply voice recognition means 7-7 of FIG. 4 recognizes the reply voice and passes it to the confirmation processing means 7-8. The confirmation processing means 7-8 determines whether the reply is a confirmation voice (step 112) and, if so, confirms the comment character string supplied from the storage means 7-4 as the recognized character string (step 113).
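
The confirmation of steps 112 and 113 can be pictured as a state change on the comment. The following Python sketch is illustrative only: the names `State`, `Comment`, and `apply_reply` are hypothetical and do not appear in the specification, and the reply voice is assumed to have already been classified as a confirmation voice or not.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    UNCONFIRMED = "unconfirmed"
    CONFIRMED = "confirmed"


@dataclass(frozen=True)
class Comment:
    text: str
    state: State = State.UNCONFIRMED


def apply_reply(comment: Comment, is_confirmation_voice: bool) -> Comment:
    """Steps 112/113: confirm the comment only on a confirmation voice."""
    if is_confirmation_voice:
        return Comment(comment.text, State.CONFIRMED)
    # Any other reply leaves the comment unconfirmed (assumption: the
    # first embodiment defines no further handling for this case).
    return comment
```

A comment starts in the unconfirmed state and changes state only when `apply_reply` is called with `True`.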

[0043] In step 113, the character string confirmed by the reply voice may be transferred from the storage means 7-4 to the display data storage unit 8 of FIGS. 1 and 6, as shown in FIG. 6, with this storing serving as the confirmation operation. The information stored in step 113 is output by the output means.

[0044] Next, a second embodiment of the present invention is described with reference to the drawings. FIG. 7 is a block diagram of the second embodiment of the comment creation unit, the main part of the present invention. The comment creation unit of the second embodiment differs from that of the first embodiment shown in FIG. 4 in that it additionally includes the comment voice re-input means 7-10 of FIG. 7.

[0045] In the first embodiment, a reply voice is input regarding confirmation of the unconfirmed comment character string, it is determined whether the reply voice is affirmative, and, if it is, the unconfirmed character string is confirmed. In the present embodiment, when the input reply voice is negative, the comment voice re-input means 7-10 additionally causes the comment character string to be input again from the comment voice input unit 3.

[0046] Next, a third embodiment of the present invention is described with reference to the drawings. FIG. 8 is a block diagram of the third embodiment of the comment creation unit, the main part of the present invention. In FIG. 8, components identical to those in FIGS. 4 and 7 are denoted by the same reference numerals. In addition to the configuration of the comment creation unit of the first embodiment shown in FIG. 4, the comment creation unit of the third embodiment includes, as shown in FIG. 8, recognized character string re-conversion means 7-9, which converts the character string recognized and converted by the recognition/character string conversion means 4 into another candidate, and comment voice re-input means 7-10, which replaces the character string input to the recognition/character string conversion means 4 and stored in the storage means 7-4 with one obtained from a new voice input.

[0047] In the first embodiment, a reply voice is input regarding confirmation of the unconfirmed comment character string, it is determined whether the reply voice is affirmative, and, if it is, the unconfirmed character string is confirmed. In the present embodiment, when the input reply voice is negative, the user can choose either to input the comment character string again from the comment voice input unit 3 via the comment voice re-input means 7-10, or to have the already input voice information re-recognized and converted into a character string again by the recognized character string re-conversion means 7-9.

[0048] Next, the operation of this embodiment is described with reference to FIGS. 8 and 9. FIG. 9 is a flowchart for the case where the reply voice is a non-confirmation voice and either the comment voice is input again or the already input voice is re-converted. In FIG. 9, processing steps identical to those in FIG. 5 are denoted by the same reference numerals.

[0049] In FIG. 8, the input voice recognition means 7-1, display position designation means 7-2, storage means 7-4, reply voice recognition means 7-7, confirmation processing means 7-8, voice input unit 2, comment voice input unit 3, recognition/character string conversion means 4, and reply voice input unit 5 perform the same processing as in the first embodiment of the digital camera image comment voice input device, so their description is omitted.

[0050] In the first embodiment, a reply voice is input regarding confirmation of the unconfirmed comment character string, it is determined whether the reply voice is affirmative, and, if it is, the unconfirmed character string is confirmed. In the present embodiment, when the input reply voice is negative, the following processing is performed to select whether to input the comment character string again or to re-recognize the already input voice information and convert it into a character string again.

[0051] Based on the reply voice information from step 111 of FIG. 9, the confirmation processing means 7-8 of FIG. 6 determines whether the reply is a confirmation voice or a non-confirmation voice (step 121). If it is a non-confirmation voice expressing negative intent, it is then determined whether to re-input the comment voice or to recognize the comment voice and convert it into a character string again (step 122). When re-input is chosen, the comment voice re-input means 7-10 is driven so that the comment voice is input again from the comment voice input unit 3 (step 103). When re-recognition of the already recognized comment voice is chosen, the recognized character string re-conversion means 7-9 is driven so that the recognition/character string conversion means 4 recognizes the comment voice and converts it into a character string again (step 104).
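
The branching of steps 121 and 122 amounts to a three-way dispatch on the intent of the reply voice. A minimal Python sketch follows, assuming the reply has already been recognized and classified; the `Reply` enum and `next_step` function are hypothetical names, and sending the confirmation branch to step 113 is an assumption carried over from the first embodiment.

```python
from enum import Enum


class Reply(Enum):
    CONFIRM = "confirm"        # confirmation voice
    REINPUT = "re-input"       # non-confirmation: speak the comment again
    RECONVERT = "re-convert"   # non-confirmation: re-recognize stored voice


def next_step(reply: Reply) -> int:
    """Return the flowchart step number that follows the reply voice."""
    if reply is Reply.CONFIRM:
        return 113  # confirm the unconfirmed character string
    if reply is Reply.REINPUT:
        return 103  # comment voice re-input means 7-10
    return 104      # recognized character string re-conversion means 7-9
```

Returning step numbers keeps the sketch aligned with the flowchart; a real implementation would more likely call the corresponding handler directly.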

[0052] Next, a fourth embodiment of the present invention is described with reference to the drawings. FIG. 10 is a block diagram of the fourth embodiment of the comment creation unit, the main part of the present invention. In FIG. 10, components identical to those in FIGS. 4, 7, and 8 are denoted by the same reference numerals, and their description is omitted. In addition to the configuration of the comment creation unit of the third embodiment shown in FIG. 8, the comment creation unit of the fourth embodiment includes I.D. linking means 7-3, which links the I.D. information in the management table 6 with the recognized character string information. The I.D. linking means 7-3 is required by the management table 6, which associates the image information input from the digital camera body 1 with the confirmed character string information.
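
One way to picture the association kept by the management table 6 is a mapping from an image I.D. to its confirmed character string. The Python sketch below is an assumption for illustration only: the specification does not describe the table layout, and the class and method names are hypothetical.

```python
from typing import Dict, Optional


class ManagementTable:
    """Associates an image I.D. with its confirmed comment string."""

    def __init__(self) -> None:
        self._comments: Dict[str, str] = {}

    def link(self, image_id: str, confirmed_text: str) -> None:
        # Role of the I.D. linking means 7-3: tie the string to the image.
        self._comments[image_id] = confirmed_text

    def comment_for(self, image_id: str) -> Optional[str]:
        # Returns None when no comment has been confirmed for the image.
        return self._comments.get(image_id)
```

For example, after `table.link("img-001", "Bright again today")`, a later lookup with the same I.D. returns the stored comment.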

[0053] Next, a fifth embodiment of the present invention is described with reference to the drawings. FIG. 11 is a block diagram of the fifth embodiment of the comment creation unit, the main part of the present invention. In FIG. 11, components identical to those in FIGS. 4, 7, 8, and 10 are denoted by the same reference numerals, and their description is omitted.

[0054] In addition to the configuration of the comment creation unit of the fourth embodiment shown in FIG. 10, the comment creation unit of the fifth embodiment further includes, as shown in FIG. 11, a timer 7-6 that sets the remaining time and timer management means 7-5 that manages the timer 7-6. It is characterized in that the confirmation processing means 7-8 decides, within the time set by the timer 7-6, whether the unconfirmed character string whose position is fixed by the display position designation means 7-2 is to be confirmed, re-recognized and re-converted, or re-input. The timer 7-6 sends time information to the display data creation unit 9. The storage means 7-4 stores, in the display data storage unit 8, the confirmed character string data that becomes confirmed as time elapses.

[0055] Next, the operation of this embodiment is described with reference to FIGS. 11 and 12. FIG. 12 is a flowchart of the timer processing related to character string confirmation. In FIG. 12, processing steps identical to those in FIG. 9 are denoted by the same reference numerals, and their description is omitted. In FIG. 11, the input voice recognition means 7-1, display position designation means 7-2, storage means 7-4, reply voice recognition means 7-7, confirmation processing means 7-8, recognized character string re-conversion means 7-9, comment voice re-input means 7-10, voice input unit 2, comment voice input unit 3, recognition/character string conversion means 4, and reply voice input unit 5 perform the same processing as in the embodiments of the digital camera image comment voice input device described above, so their description is omitted.

[0056] In this embodiment, a reply voice is input regarding confirmation of the unconfirmed comment character string, and voice recognition determines what intent the reply voice expresses. If it is affirmative, the unconfirmed character string is confirmed. If it is negative, the string is not confirmed, and either the comment voice is input again or the already input comment voice is recognized and converted into a character string again.

[0057] This embodiment has a configuration in which the unconfirmed character string is confirmed according to the timing of the input reply voice and the initial settings, and a configuration that can continuously supply time information to a user-specified interface until the confirmation takes place. The processing is as follows.

[0058] When it is determined in step 110 of FIG. 12 that the character string converted from the comment voice can be displayed, the process proceeds to step 131, where the confirmation processing means 7-8 determines whether the process is still within the fixed time measured by the timer 7-6. While within the fixed time, the user's reply voice concerning the comment character string displayed as an unconfirmed character string is input from the reply voice input unit 5 via the reply voice recognition means 7-7, and it is determined whether it is a confirmation voice or a non-confirmation voice (step 121). If it is a non-confirmation voice expressing negative intent, it is determined whether to re-input the comment voice or to recognize the comment voice and convert it into a character string again (step 122).

[0059] When it is determined in step 121, from a reply voice input from the reply voice input unit 5 of FIG. 11 within the fixed time, that the unconfirmed character string is to be confirmed, or when the recognized character string is confirmed in step 131 because the fixed time has elapsed, the confirmed character string data is stored in the display data storage unit 8 at the time of confirmation. The transition time information obtained from the timer 7-6 and the timer management means 7-5 is continuously supplied to the transition time display of the interface shown in FIG. 17, which is generated by the display data creation unit 9. After the fixed time has elapsed, or when the unconfirmed character string is confirmed, this transition time information is cleared from the display data creation unit 9.

[0060] Step 113 is the processing in which the unconfirmed character string, i.e., the recognized character string, is confirmed automatically after the fixed time has passed, or is confirmed by the input of an affirmative reply voice in steps 131 through 111. The result confirmed in step 113 is stored in the display data storage unit 8 as confirmed character string data, and the display data creation unit 9 combines the image data seen through the fender with the confirmed character string to create display data. The image output unit 10 then displays and outputs the combined result.
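
The timer-driven confirmation can be sketched as a pure function of the elapsed time and an optional reply. This Python sketch is illustrative, not the patented implementation: `remaining_time` and `decide` are hypothetical names, times are assumed to be in seconds, and treating an elapsed time exactly equal to the limit as expired is an assumption.

```python
from typing import Optional


def remaining_time(elapsed_s: float, limit_s: float) -> float:
    """Time left before automatic confirmation, never negative."""
    return max(0.0, limit_s - elapsed_s)


def decide(elapsed_s: float, limit_s: float, reply: Optional[str] = None) -> str:
    """Return "confirm", "re-input", "re-convert", or "wait"."""
    if elapsed_s >= limit_s:
        # Step 131: the fixed time has elapsed, so confirm automatically.
        return "confirm"
    if reply is None:
        # Still within the fixed time and no reply voice yet.
        return "wait"
    # Steps 121/122: follow the intent of the reply voice.
    return reply
```

`remaining_time` corresponds to the value that would be shown on the interface of FIG. 17 until the string is confirmed. Passing the elapsed time in as an argument, rather than reading a clock inside the function, keeps the decision easy to test.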

[0061] The present invention is not limited to the above embodiments. For example, as shown in FIG. 13, the management table 6 and the image output unit 10 of the embodiment of FIG. 1 may both be omitted, or, as shown in FIG. 14, the management table 6 may be provided while the image output unit 10 is omitted.

【００６２】[0062]

[Examples] Next, an example of the present invention is described in detail with reference to the drawings. An example of the comment input device of FIG. 15, which has the comment creation unit 7 of FIG. 11, is described with reference to the flowchart of FIG. 16. Compared with the embodiment of FIG. 1, the example of FIG. 15 differs in that the digital camera device body has result output means.

[0063] First, the position at which the comment character string is to be displayed in the fender of the digital camera is input to the voice input unit 2, which comprises a microphone or the like, using a position designation voice such as "left", "right", "up", or "down" (step 201). The input voice recognition means 7-1 of the comment creation unit 7 then performs voice recognition on the position designation voice input to the voice input unit 2 (step 202).

[0064] Next, when a comment voice such as "Bright again today" is likewise input from the comment voice input unit 3, such as a microphone (step 203), the recognition/character string conversion means 4 recognizes the input comment voice and converts it into a comment character string (step 204). The resulting character string is stored in the storage means 7-4 of the comment creation unit 7.

[0065] Next, the display position designation means 7-2 in the comment creation unit 7 compares the voice recognition result from the input voice recognition means 7-1 with the length of the comment character string stored in the storage means 7-4, and determines whether the comment character string can be displayed in the fender of the digital camera (step 205). If the length of the fender in which the character string is displayed is greater than "designated start position of the comment character string + length of the comment character string", the comment character string can be displayed. In that case, the comment character string information to be displayed is stored in the display data storage unit 8.

[0066] When the character string is not displayed across multiple lines, if the length of the fender, measured from its left edge, is less than "length up to the designated start position of the comment character string + length of the comment character string", the comment character string cannot be displayed and is therefore not shown.
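
The single-line displayability test of step 205 reduces to one comparison. The Python sketch below assumes all lengths are measured in the same unit (for example, character cells) from the left edge of the fender; the function and parameter names are hypothetical. The text leaves the case of exact equality open, so the sketch follows the strict "greater than" wording and treats equality as not displayable.

```python
def fits_in_fender(fender_length: int, start: int, comment_length: int) -> bool:
    """Step 205: True if the comment fits on one line of the fender.

    fender_length  -- usable width of the fender
    start          -- designated start position of the comment string
    comment_length -- length of the comment string
    """
    return fender_length > start + comment_length
```

With a fender 20 units wide, a 10-unit comment starting at position 5 fits (20 > 15), while the same comment starting at position 12 does not (20 < 22).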

[0067] If the comment character string is not displayed as a result of the processing so far, the process returns to the first step 201, and the position designation voice input and the comment voice input are performed again (steps 201 to 204). Conversely, when the comment character string is displayed, its display position has been decided; it is displayed in the unconfirmed state, and it is determined whether the fixed time set by the user has elapsed (step 206). This fixed time is managed by the timer 7-6, which in turn is managed by the timer management means 7-5, such as the system timer of a personal computer.

[0068] The unconfirmed state of the comment character string changes automatically to the confirmed state once the fixed time set by the user has elapsed (steps 206 and 210). To confirm the character string within the fixed time set by the timer, to replace the displayed unconfirmed character string itself within that time, or to have it re-converted within that time even though the recognition result is correct, the user inputs a reply voice to the reply voice input unit 5, which comprises a microphone or the like (step 207). Based on this reply voice input, it is determined whether the comment character string is confirmed or not confirmed (step 208).

[0069] When the reply voice expresses affirmative intent, such as "Yes" or "O.K.", it is judged in step 208 to be a confirmation voice; the confirmed character string data is stored in the display data storage unit 8 (step 210), and the confirmed character string is displayed (step 211). Conversely, when the reply voice expresses negative intent, such as "re-convert" or "re-input", it is judged in step 208 to be a non-confirmation voice, and it is determined whether the recognized character string re-conversion means 7-9 in the comment creation unit 7 should recognize the input comment voice and convert it into a character string again (step 209). If so, the process returns to step 204. If instead the comment voice re-input means 7-10 is to prompt the user to input, from the comment voice input unit 3, a voice entirely different from the current one, the process returns to step 203.
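
The mapping from a recognized reply to the next flowchart step in steps 208 and 209 can be sketched as a small lookup. In this Python sketch the English vocabulary, the function name, and the handling of unknown replies (raising `ValueError`) are all assumptions for illustration; the re-input branch is assumed to return to the comment voice input of step 203.

```python
# Hypothetical reply vocabulary, modelled on the examples in the text.
AFFIRMATIVE = {"yes", "o.k."}
NEGATIVE_STEPS = {
    "re-convert": 204,  # recognize the stored comment voice again
    "re-input": 203,    # speak a new comment voice
}


def step_after_reply(recognized: str) -> int:
    """Steps 208/209: map a recognized reply word to the next step."""
    word = recognized.strip().lower()
    if word in AFFIRMATIVE:
        return 210  # store the confirmed character string
    if word in NEGATIVE_STEPS:
        return NEGATIVE_STEPS[word]
    # The specification does not say what happens to other replies.
    raise ValueError(f"unrecognized reply: {recognized!r}")
```

Normalizing case and surrounding whitespace before the lookup keeps the sketch tolerant of small variations in the recognizer output.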

[0070] In this way, the confirmed character string data is stored in the display data storage unit 8, which comprises a storage medium or the like. The confirmed character string data stored in the display data storage unit 8 is merged by the display data creation unit 9 with the image captured by the digital camera. The merged result is output and displayed by the image output unit 10, which is an output device.

【００７１】[0071]

[Effects of the Invention] As described above, the present invention has a function of converting into a character string the comment voice information that is input while an image is displayed in the fender of the digital camera, when image information is obtained with the digital camera and voice information is simultaneously recorded with a voice recorder. Voice information can therefore be attached, as character string information, to the image information obtained with the digital camera.

[0072] That is, according to the present invention, for image information input to the digital camera body while the user looks at the fender of the body, the comment display position and the comment that are input by voice through the voice input unit and the comment voice input unit can be turned into a comment character string by the comment creation unit. Consequently, when words that arise on the spot while the user is using the digital camera are to be displayed in the image as a comment, the display position and the comment can easily be confirmed on the spot by voice. Moreover, when the user combines a character string with an image using the digital camera, the displayed comment can be confirmed either automatically or at the user's discretion.

[0073] Further, according to the present invention, when a comment character string is added at a designated display position during shooting with a digital camera, both by voice, the display position is calculated from the position information input by voice for positioning and the comment voice input for provisional display of the character string. The string is displayed as an unconfirmed character string if it can be displayed and is not shown if it cannot. Consequently, when voice information is added to each piece of image information of the digital camera and displayed as character string information, the user no longer needs to judge displayability by comparing the voice-designated display position and the comment against the maximum displayable length after inputting them.

[0074] Furthermore, according to the present invention, after the unconfirmed character string has been decided, the user can at will either input the comment voice again or re-run, on the already input voice, the processing that once produced the unconfirmed character string. This makes it possible to deal with cases where the unconfirmed character string displayed in the fender does not match what the user wants.

[0075] As described above, the present invention can provide a system that makes use of words arising from the atmosphere of the place where the user is using the digital camera, and images with character strings for display on other systems, such as the Internet or electronic albums, can be created immediately.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention.

FIG. 2 is a block diagram showing part of the first embodiment of the comment creation unit, the main part of the present invention.

FIG. 3 is a flowchart for explaining the operation of FIGS. 1 and 2.

FIG. 4 is a block diagram of the first embodiment of the comment creation unit, the main part of the present invention.

FIG. 5 is a flowchart for explaining the operation of FIGS. 1 and 4.

FIG. 6 is a diagram showing the first embodiment of the comment creation unit, the main part of the present invention, together with a block diagram of the display data storage unit.

FIG. 7 is a block diagram of the second embodiment of the comment creation unit, the main part of the present invention.

FIG. 8 is a block diagram of the third embodiment of the comment creation unit, the main part of the present invention.

FIG. 9 is a flowchart for explaining the operation of FIGS. 1 and 8.

FIG. 10 is a block diagram of the fourth embodiment of the comment creation unit, the main part of the present invention.

FIG. 11 is a block diagram of the fifth embodiment of the comment creation unit, the main part of the present invention.

FIG. 12 is a flowchart for explaining the operation of FIGS. 1 and 11.

FIG. 13 is a block diagram of another embodiment of the present invention.

FIG. 14 is a block diagram of still another embodiment of the present invention.

FIG. 15 is a block diagram of an example of the present invention.

FIG. 16 is a flowchart for explaining the operation of FIG. 15.

FIG. 17 is an example of an execution screen of the interface section that shows the elapse of the predetermined time in the present invention.

FIG. 18 is a block diagram showing the overall configuration of an example of a conventional device.

FIG. 19 is a plan view used to explain the key arrangement of the conventional device.

FIG. 20 is a flowchart for explaining the operation of the conventional device.

FIG. 21 is an initial screen of an example of the conventional device.

FIG. 22 is a table of contents screen of an example of the conventional device.

FIG. 23 is a flowchart for explaining the operation of the conventional device.

[Explanation of symbols]

1 Digital camera body (image input unit)
2 Voice input unit
3 Comment voice input unit
4 Recognition/character string conversion means
5 Reply voice input unit
6 Management table
7 Comment creation unit
7-1 Input voice recognition means
7-2 Display position designation means
7-3 I.D. linking means
7-4 Storage means
7-5 Timer management means
7-6 Timer
7-7 Reply voice recognition means
7-8 Confirmation processing means
7-9 Recognized character string re-conversion means
7-10 Comment voice re-input means
8 Display data storage unit
9 Display data creation unit
10 Image output unit

Continued from front page: (58) Fields searched (Int. Cl.⁷, DB name): H04N 5/222 - 5/257

Claims

(57) [Claims]

1. A digital camera image comment input device, comprising: a digital camera body that inputs image information; a voice input unit for designating by voice, while the user looks at a fender of the digital camera body, a position at which a comment is to be displayed for the image information input to the digital camera body; a comment voice input unit for inputting by voice the comment to be displayed at the position designated through the voice input unit; recognition/character string conversion means that recognizes the comment voice input to the comment voice input unit and converts it into a character string in an unconfirmed state; a reply voice input unit for inputting a reply voice concerning confirmation or non-confirmation of the comment character string converted into a character string in the unconfirmed state; a comment creation unit that refers to the image information input to the digital camera body, determines, based on the position designation voice input from the voice input unit and the comment voice input from the comment voice input unit, whether the comment character string converted into a character string in the unconfirmed state can be displayed in the fender, determines, when it can be displayed, whether the comment character string in the unconfirmed state is confirmed or unconfirmed using the reply voice input from the reply voice input unit, and outputs confirmed character string information and display position information; a display data storage unit that stores the display position information and the confirmed character string information of the comment character string confirmed by the comment creation unit; a display data creation unit that merges the image information input by the digital camera body with the information stored in the display data storage unit; and output means that outputs the information created by the display data creation unit.

2. The digital camera image comment input device according to claim 1, wherein the comment creation unit determines, based on the position designation voice input from the voice input unit and the comment voice input from the comment voice input unit, whether the comment character string converted into a character string in the undetermined state can be displayed within the viewfinder, and, if it cannot be displayed, determines again, based on a position designation voice input again from the voice input unit and a comment voice input again from the comment voice input unit, whether the undetermined comment character string can be displayed within the viewfinder.

3. The digital camera image comment input device according to claim 1, wherein, in determining whether the undetermined comment character string is confirmed or unconfirmed using the response voice input from the response voice input unit, the comment creation unit confirms the undetermined comment character string if the response voice is a confirming voice expressing affirmative intent.

4. The digital camera image comment input device according to claim 1, wherein, in determining whether the undetermined comment character string is confirmed or unconfirmed using the response voice input from the response voice input unit, the comment creation unit has means for causing the comment to be input again by voice if the response voice is a non-confirming voice intended to re-input the undetermined character string.

5. The digital camera image comment input device according to claim 1, wherein, in determining whether the undetermined comment character string is confirmed or unconfirmed using the response voice input from the response voice input unit, the comment creation unit has means for recognizing the comment voice again and converting it into a character string if the response voice is a non-confirming voice intended to reconvert the undetermined character string.

6. The digital camera image comment input device according to claim 1, wherein, when the undetermined comment character string is confirmed, the comment creation unit stores the data of the confirmed comment character string in the display data storage unit.

7. The digital camera image comment input device according to claim 1, wherein the comment creation unit further includes a timer that sets a remaining time and outputs time information, and timer management means for managing the timer; determines, based on the position designation voice input from the voice input unit and the comment voice input from the comment voice input unit, whether the comment character string converted into a character string in the undetermined state can be displayed within the viewfinder; and, when it can be displayed, uses the response voice input from the response voice input unit within the time set by the timer to decide among confirmation, re-recognition and conversion of the character string, and re-input of the character string.

8. The digital camera image comment input device according to claim 1, wherein the comment creation unit determines whether the comment character string converted into a character string in the undetermined state can be displayed within the viewfinder, and, when it can be displayed, automatically confirms the comment character string after the time set by the timer has elapsed.

9. The digital camera image comment input device according to claim 7, further comprising means for displaying how much of the predetermined time has elapsed before the undetermined comment character string is confirmed.

10. The digital camera image comment input device according to claim 7, further comprising a display data generation unit that merges the image information input from the digital camera main body with the comment character string stored in the display data storage unit, and that also reflects the time information, obtained by the comment creation unit, which changes within the predetermined time.
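
Illustrative note (not part of the claims): the decision flow recited in claims 1 to 8 can be sketched as a small Python function. This is only a sketch under stated assumptions. The claims do not say how displayability is computed, so the character-cell grid, the single-line comment, and every name below (`Viewfinder`, `Reply`, `Action`, `fits`, `decide`) are hypothetical. A missing reply (`None`) stands in for the timer expiring without a response.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Reply(Enum):
    """Recognized response voice (hypothetical labels)."""
    CONFIRM = "confirm"      # affirmative reply (claim 3)
    REINPUT = "reinput"      # user wants to speak the comment again (claim 4)
    RECONVERT = "reconvert"  # user wants the same voice recognized again (claim 5)


class Action(Enum):
    """What the comment creation unit does next (hypothetical labels)."""
    STORE = "store"                          # confirm and store string + position (claim 6)
    REQUEST_NEW_INPUT = "request_new_input"  # ask for position/comment voice again (claims 2, 4)
    RERUN_RECOGNITION = "rerun_recognition"  # recognize and convert again (claim 5)


@dataclass(frozen=True)
class Viewfinder:
    width: int   # assumed unit: character cells
    height: int  # assumed unit: character rows


def fits(view: Viewfinder, col: int, row: int, text: str) -> bool:
    """True if a single-line comment starting at (col, row) stays inside the viewfinder."""
    return 0 <= row < view.height and col >= 0 and col + len(text) <= view.width


def decide(view: Viewfinder, col: int, row: int, text: str,
           reply: Optional[Reply]) -> Action:
    """Pick the next action for an undetermined comment string.

    reply is None when the timer ran out with no response voice (claim 8).
    """
    if not fits(view, col, row, text):
        # Cannot be displayed: position and comment must be input again (claim 2).
        return Action.REQUEST_NEW_INPUT
    if reply is None or reply is Reply.CONFIRM:
        return Action.STORE
    if reply is Reply.RECONVERT:
        return Action.RERUN_RECOGNITION
    return Action.REQUEST_NEW_INPUT
```

For example, with `Viewfinder(20, 4)`, a five-character comment at column 18 does not fit, so `decide` asks for new input whatever the reply is. The same comment at column 0 is stored on an affirmative reply or on timeout.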