JPH08146987A

JPH08146987A - Speech input device and its control method

Info

Publication number: JPH08146987A
Application number: JP6283260A
Authority: JP
Inventors: Hiroki Yamamoto; 寛樹山本; Yasuhiro Komori; 康弘小森; Masaaki Yamada; 雅章山田; Yasunori Ohora; 恭則大洞
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1994-11-17
Filing date: 1994-11-17
Publication date: 1996-06-07

Abstract

PURPOSE: To change a method for speech detection according to the state of speech input so that effective speech information is passed to a lower order processing. CONSTITUTION: As a mode wherein the start point and end point of a speech part inputted from a speech input device 4 are specified, there are three modes, i.e., a mode wherein they are specified automatically, a mode wherein only the start position is specified manually and the end point is specified automatically, and a mode wherein the start point and end point are both specified manually; and one of them is selected through an input device 3.

Description

Detailed Description of the Invention

[0001] [Field of Industrial Application] The present invention relates to a voice input device and a control method therefor, and more particularly to a voice input device that identifies the voice section in input information and passes it to lower-level processing, and to a control method therefor.

[0002] [Prior Art] In general, a voice input interface passes the content of a user's utterance to another application, such as voice recognition. It is therefore necessary to detect, from the input voice data, the voice section that the user actually uttered.

[0003] There are various methods for detecting the voice section, for example the following.
Detection method 1: The user determines the start point and end point of the utterance using an input device such as a mouse or keys.
Detection method 2: The user determines the start point of the utterance and the computer determines the end point.
Detection method 3: The computer determines the start point of the utterance and the user determines the end point.
Detection method 4: The computer determines both the start point and the end point of the utterance.

[0004] The operations required of the user at the time of voice input for each of the above detection methods are, for example, as follows. Here a mouse is assumed as the input device.
Operation method 1: Press the mouse button before speaking, keep it pressed while speaking, and release it after finishing.
Operation method 2: Press the mouse button once before speaking.
Operation method 3: Press the mouse button once after finishing speaking.
Operation method 4: No user operation is required.

[0005] A conventional voice input interface provides only one of the detection methods described above.

[0006] [Problems to Be Solved by the Invention] Each of the detection methods and operation methods described above has advantages and disadvantages.

[0007] For example, in detection method 1 the user supplies both the start and the end of the voice section, so the section is rarely cut out incorrectly. However, as operation method 1 shows, the operation is cumbersome.

[0008] Detection methods 2 and 3 place a somewhat lighter burden on the user than detection method 1. On the other hand, the computer may mistake noise for voice, so that noise is included in the detected voice section.

[0009] Detection method 4 requires no operation from the user at all, which improves usability. However, changes in the voice input environment or noise may still cause the computer to miss an uttered voice or to detect noise as voice.

[0010] Because a conventional voice input interface uses only one of these detection methods, the shortcomings of that method carry over directly as problems of the voice interface.

[0011] [Means for Solving the Problems and Operation] The present invention has been made in view of these problems. It aims to provide a voice input device, and a control method therefor, that allow the detection method to be changed according to the voice input situation and that pass valid voice information to lower-level processing.

[0012] To solve this problem, the voice input device of the present invention has, for example, the following configuration. It is a voice input device that passes voice information input from voice input means to lower-level processing, and it comprises a plurality of modes for specifying the period of significant voice information input from the voice input means, and selecting means for selecting one of the plurality of modes.

[0013] According to a preferred embodiment of the present invention, the modes desirably include a first mode in which the start point and end point of the voice are identified automatically, a second mode in which the start point is indicated manually, and a third mode in which both the start point and the end point are indicated manually. This makes it possible to cover the modes suited to the situations in which the user may be placed.

[0014] It is also desirable that the selecting means select the first mode initially and, each time a predetermined change instruction is given, select the second mode and then the third mode, in that order. As a result, in a normal usage environment the mode that places no burden at all on the user is selected, and when the usage environment changes, the mode is changed accordingly.

[0015] The selecting means may also make the selection manually. This allows the device to be adapted immediately to the environment in which the user is placed.

[0016] [Embodiments] Embodiments of the present invention are described in detail below with reference to the accompanying drawings. FIG. 1 is a block diagram showing the schematic configuration of a first embodiment of a computer, which is an information processing apparatus according to the present invention.

[0017] In the figure, reference numeral 1 denotes a system bus. Connected to the system bus 1 are a display device 2 such as a CRT display, an input device 3 such as a keyboard or mouse, a voice input device 4 such as a microphone, an I/O device 5 that converts the voice signal supplied from the voice input device 4 into data that the computer can process, and a CPU 6 that controls the operation of the entire system. The CPU 6 is assumed to contain a main memory consisting of a ROM, which stores the program that implements the processing in the flowcharts described later, and a RAM used as a work area.

[0018] In the system configured as above, the embodiment assumes that a mouse is used as the input device 3. It also assumes three voice input modes. The operation required of the user in each mode, and the voice section detection method corresponding to each mode, are as follows.

[0019] Voice input modes
Mode 1: No user operation is required.
Mode 2: The user presses the mouse button once before speaking.
Mode 3: The user presses the mouse button before speaking, keeps it pressed while speaking, and releases it after finishing.

[0020] Accordingly, the voice section detection method in each mode is as follows.
Mode 1: The computer detects both the start and the end of the voice section.
Mode 2: The user determines the start of the voice section and the computer detects the end.
Mode 3: The user determines both the start and the end of the voice section.
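The three modes and their division of labor can be written down as a small lookup table. The following is only an illustrative sketch; the names `InputMode` and `BOUNDARY_SOURCE` are assumptions introduced here and do not appear in the publication.

```python
from enum import Enum


class InputMode(Enum):
    """Hypothetical names for the three voice input modes of paragraph [0019]."""
    MODE1 = 1  # no user operation
    MODE2 = 2  # one mouse press before speaking
    MODE3 = 3  # press before speaking, hold, release after speaking


# (who determines the start, who determines the end), per paragraph [0020]
BOUNDARY_SOURCE = {
    InputMode.MODE1: ("computer", "computer"),
    InputMode.MODE2: ("user", "computer"),
    InputMode.MODE3: ("user", "user"),
}

for mode, (start, end) in BOUNDARY_SOURCE.items():
    print(f"{mode.name}: start by {start}, end by {end}")
```

Keeping the mapping in one table makes it easy to see that the modes differ only in which boundary is handed over to the user.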

[0021] The operation of the system is described with reference to the flowchart of FIG. 2.

[0022] First, in step S1, a voice input mode is selected. FIG. 3 shows an example of this voice input mode selection processing.

[0023] In step S11 the selectable voice input modes are displayed, and in step S12 the system waits for the user's selection. The user selects the desired input mode with the mouse (input device 3). When the user has selected one of the voice input modes, step S13 notifies the user of the selected input mode for confirmation and stores the selection in a predetermined storage area (in the embodiment, a predetermined address of the main memory in the CPU 6).

[0024] Once the voice input mode has been selected, the process proceeds to step S2 of FIG. 2. There, time t is set to 0, and the variable St, which indicates the detection status of the start and end of the voice section, is set to NOTYET, meaning that neither the start nor the end has been detected. Setting time t to 0 resets a timer (not shown), and the variable St is allocated at a predetermined address in the main memory of the CPU 6. After this initialization, the process moves on to the voice capture processing of step S3 and the voice data analysis processing of step S5.

[0025] In the voice capture processing (step S3), the voice uttered by the user is converted, using the voice input device 4 such as a microphone and the I/O device 5, into data that the computer can process, and is taken into the computer. Next, in the voice data holding processing (step S4), the voice data captured in step S3 is held in the storage device 7. After the voice data has been held, the process returns to the voice capture processing and repeats the same operation, so that voice capture and voice data holding continue.

[0026] In parallel with steps S3 and S4, the voice data analysis processing of step S5 is performed. From the voice data held in the storage device 7 in step S4, data corresponding to a predetermined time width Δt is read and analyzed. An example of the voice data analysis processing is shown in the flowchart of FIG. 4.

[0027] In FIG. 4, step S51 reads the voice data between time t and time t + Δt from the storage device 7. The following step S52 calculates the mean-square value of the read voice data and takes it as P(t).
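Steps S51 and S52 amount to slicing one frame out of the stored samples and averaging the squared samples (the source term 二乗平均値 denotes the mean of the squares). A minimal sketch follows, assuming the stored voice data is a plain Python list of samples and that t and Δt are expressed as sample counts; the function names are hypothetical.

```python
from typing import Sequence


def mean_square(samples: Sequence[float]) -> float:
    """Mean of the squared samples, used here as the frame power P(t)."""
    if not samples:
        raise ValueError("frame is empty")
    return sum(x * x for x in samples) / len(samples)


def analyze_frame(stored: Sequence[float], t: int, dt: int) -> float:
    """S51: read the samples in [t, t + dt); S52: return P(t)."""
    return mean_square(stored[t:t + dt])


stored = [0.0, 0.0, 3.0, 4.0, 0.0]
print(analyze_frame(stored, 2, 2))  # (9 + 16) / 2 = 12.5
```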

[0028] Returning to FIG. 2, the following step S6 performs the voice section detection that corresponds to the voice input mode set in step S1. An example of voice section detection when the user has selected mode 1 in step S1 is shown in the flowchart of FIG. 5, an example for mode 2 in the flowchart of FIG. 6, and an example for mode 3 in the flowchart of FIG. 7.

[0029] First, the voice section detection processing when mode 1 is selected is described.

[0030] In FIG. 5, if the voice-not-yet-input judgment step S61 finds that St, which indicates the status of voice section detection, is NOTYET (the start of the voice section has not been determined), the process proceeds to step S621 to judge whether the current frame is the start of the voice section. If St is other than NOTYET, the process proceeds to step S64.

[0031] In step S621, the mean-square value P(t) of the voice data from time t to time t + Δt, calculated earlier, is compared with a predetermined threshold Tps for judging the start of voice. If P(t) is larger than Tps, the process proceeds to step S63 and the start is determined: St is changed to IN, indicating that the start of the voice section has been detected (that is, that voice has been input), time t is taken as the start time STime of the voice section, and the voice section detection processing ends. If step S621 judges that P(t) does not exceed Tps, the voice section detection processing ends.

[0032] When the process proceeds to step S64, it is judged whether voice input is in progress. If St is IN (the start of the voice section has been determined), the process proceeds to end detection step S651, which judges the end of the voice section; if St is other than IN, the voice section detection processing ends. The end detection processing of step S651 compares P(t) calculated in step S52 with a predetermined threshold Tpe for judging the end of voice. If P(t) is smaller than Tpe, the process proceeds to step S66; if P(t) is greater than or equal to Tpe, the voice section detection processing ends.

[0033] In step S66, St is changed to END, indicating that the end of the voice section has been detected (that is, that voice input has finished), time t is taken as the end time ETime of the voice section, and the voice section detection processing ends.
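The mode 1 flow of FIG. 5 (steps S61, S621, S63, S64, S651, S66) can be sketched as one function that is called once per analysis frame. This is an illustration under assumptions, not the publication's implementation: the `Detector` class and the function name are hypothetical, and the threshold values in the usage lines are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detector:
    st: str = "NOTYET"           # detection status St
    stime: Optional[int] = None  # start time STime
    etime: Optional[int] = None  # end time ETime


def detect_mode1(d: Detector, t: int, p: float, tps: float, tpe: float) -> None:
    """One pass of mode 1 detection for the frame at time t with power p."""
    if d.st == "NOTYET":              # S61: start not yet determined
        if p > tps:                   # S621: power exceeds start threshold
            d.st, d.stime = "IN", t   # S63: fix the start
    elif d.st == "IN":                # S64: voice input in progress
        if p < tpe:                   # S651: power fell below end threshold
            d.st, d.etime = "END", t  # S66: fix the end


d = Detector()
for t, p in enumerate([0.1, 5.0, 6.0, 0.2]):
    detect_mode1(d, t, p, tps=1.0, tpe=0.5)
print(d)  # start fixed at t=1, end fixed at t=3
```

Using `elif` mirrors the flowchart: a frame that fixes the start is not also tested for the end in the same pass.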

[0034] Next, the voice section detection processing when mode 2 is selected in step S1 of FIG. 2 is described with reference to the flowchart of FIG. 6.

[0035] The difference from mode 1 lies in the processing performed when the voice-not-yet-input judgment step S61 finds that St, the status of voice section detection, is NOTYET. Steps that perform the same processing are given the same reference numerals as in FIG. 5.

[0036] In mode 2, if St is NOTYET in the voice-not-yet-input judgment of step S61, the process proceeds to step S622, the start input detection processing, which judges whether the user has pressed the mouse button to signal the start of the voice section. If the user has pressed the mouse button, the process proceeds to step S63 and the start is determined; if not, the voice section detection processing ends. The other steps perform the same processing as in mode 1.
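Compared with mode 1, only the start test changes: step S622 looks at the mouse instead of the frame power. A hedged sketch, assuming the caller reports through a boolean whether the button was pressed during the current frame (the `mouse_pressed` parameter and the `Detector` class are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detector:
    st: str = "NOTYET"
    stime: Optional[int] = None
    etime: Optional[int] = None


def detect_mode2(d: Detector, t: int, p: float,
                 mouse_pressed: bool, tpe: float) -> None:
    """One pass of mode 2 detection: manual start, automatic end."""
    if d.st == "NOTYET":              # S61
        if mouse_pressed:             # S622: user signalled the start
            d.st, d.stime = "IN", t   # S63
    elif d.st == "IN":                # S64
        if p < tpe:                   # S651: same end test as mode 1
            d.st, d.etime = "END", t  # S66


d = Detector()
frames = [(9.0, False), (0.0, True), (4.0, False), (0.1, False)]
for t, (p, pressed) in enumerate(frames):
    detect_mode2(d, t, p, pressed, tpe=0.5)
print(d)  # the loud frame at t=0 is ignored because the mouse was not pressed
```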

[0037] Next, the voice section detection processing when mode 3 is selected in step S1 of FIG. 2 is described with reference to the flowchart of FIG. 7.

[0038] The difference from mode 2 lies in the processing performed when the voice-input-in-progress judgment of step S64 finds that St, the status of voice section detection, is IN.

[0039] In step S64, if St is IN, the process proceeds to step S652, which judges the end of the voice section; if St is other than IN, the voice section detection processing ends.

[0040] Step S652 checks whether the mouse button has been released from the pressed state. If it has been released, the process proceeds to step S66 and the end is determined; if it has not been released (the user is still holding the mouse button down), the voice section detection processing ends. The other steps perform the same processing as in mode 2.

[0041] Returning again to FIG. 2, after the voice section detection processing has finished, the process proceeds to step S7, which judges whether voice input has ended. If St is END, the process proceeds to step S9 and the voice section is displayed. If St is other than END, the process proceeds to step S8, which updates the analysis time by increasing time t by Δt, and then returns to step S5 to continue the voice data analysis.

[0042] In the voice section display processing of step S9, the voice section detected up to step S66 (from time STime to time ETime) is displayed on the display device 2, and the voice data is then extracted.

[0043] Applying the embodiment described above to a case where the user inputs voice in an environment with sudden noise gives the following.

[0044] First, as soon as the program starts, the voice input mode selection processing of step S1 displays the selectable input modes on the display device 2 (step S11) and waits for the user's selection (step S12). Suppose, for example, that the user selects mode 1. The following step S13 informs the user that the selected mode is mode 1, for example by displaying it on the display device 2. FIG. 8 shows an example of voice input mode selection implemented on the display device 2.

[0045] In that figure, each voice input mode is given a name: mode 1 is Keep Pressing SpeechInput, mode 2 is One Click SpeechInput, and mode 3 is Hand Free SpeechInput. The figure shows that Hand Free SpeechInput, corresponding to mode 3, is selected. FIG. 9 shows an example of the display on the display device 2 that informs the user of the selected mode in step S13. As illustrated, the selected mode is shown at the top of the screen so that the user knows the current mode.

[0046] The process then moves to the initialization of step S2, which initializes time t and the voice detection status St. Next, the voice capture of step S3 and the voice data holding of step S4 start capturing voice and holding the voice data in the storage device 7, and steps S3 and S4 continue to be repeated. At the same time, the voice analysis processing of step S5 analyzes the voice data held in the storage device 7 in units of the predetermined time width Δt, and the process moves to the voice section detection processing of step S6.

[0047] Since this example assumes that mode 1 is selected as the voice input mode, the voice section detection shown in the flowchart of FIG. 5 is performed. After the voice section detection processing of step S6, the voice section detection status St is checked, and the voice data analysis processing (step S5) and the voice section detection processing (step S6) are repeated every time width Δt until the end of the voice section is detected (until the voice section is regarded as finished).

[0048] For the sake of explanation, suppose that after the user has selected the voice input mode and before the user speaks, a sudden noise (a loud sound whose mean-square value momentarily exceeds the threshold Tps) occurs from time tn to time tn2 (tn2 > tn). In this case, in the voice section detection processing (step S6) for the voice data at time tn, the mean-square value of the sudden noise is larger than the threshold Tps, so the start of the noise is detected as the start of the voice section and the process moves to the start determination of step S63. Step S63 changes St to IN and sets the start time STime of the voice section to tn.

[0049] After that, at time tn2, when the sudden noise dies down, the end detection processing (step S651) within the voice section detection processing (step S6) finds that P(t) has fallen below the threshold Tpe, so the end is determined (step S66). That is, St is changed to END and tn2 is taken as the end time ETime of the voice section.

[0050] The computer thus mistakes the sudden noise for voice, detects the period during which the noise occurred (from time tn to tn2) as the voice section, and displays that section in step S9. From this display the user can confirm that a wrong voice section has been detected. FIG. 10 shows an example in which voice input was performed with the voice input mode set to mode 1 and the voice section was erroneously detected because of sudden noise. FIG. 10 is an example of the waveform display of the voice input interface. Of the two white display areas stacked in the center, the upper one shows the original voice waveform and the lower one shows the mean-square value. In the upper display area, region A is the detected voice section.

[0051] In the second run, the user selects mode 2 or the like in the voice input mode selection processing of step S1, because the previous run showed that, in an environment with sudden noise, the start detection of mode 1 produces false detections.

[0052] The subsequent processing is the same as when mode 1 was selected, except for the processing performed in voice section detection. Because mode 2 was selected as the voice input mode in step S12, the voice section detection for mode 2, shown in the flowchart of FIG. 6, is performed.

[0053] Suppose that, as before, a sudden noise occurs again from time tn to time tn2 (tn2 > tn). Unlike the previous run, in the voice section detection processing (step S6) for the voice data at time tn, the start input detection of step S622 means that no noise (or voice), however loud, is treated as the start of the voice section until the user presses the mouse button. In other words, because the moment at which the user presses the mouse button is taken as the start of the voice section, the noise is no longer mistaken for the start of the voice section as it was in the previous run with mode 1. The processing after the start of the voice section has been detected is the same as before.
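The two runs can be compared on one synthetic power sequence: a noise burst followed later by speech. The sketch below is illustrative only; the power values, the thresholds and the `press_at` parameter (the frame index at which the user is assumed to press the mouse button) are invented for the example.

```python
from typing import Optional, Sequence, Tuple


def detect(powers: Sequence[float], tps: float, tpe: float,
           press_at: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Return (STime, ETime) for a sequence of frame powers.

    press_at=None emulates mode 1 (start by threshold Tps);
    an integer emulates mode 2 (start at the user's mouse press).
    """
    st, stime = "NOTYET", None
    for t, p in enumerate(powers):
        if st == "NOTYET":
            started = p > tps if press_at is None else t >= press_at
            if started:
                st, stime = "IN", t
        elif p < tpe:          # end test is the same in both modes
            return stime, t
    return None


# frames 1-2: sudden noise, frames 5-7: the actual utterance
powers = [0.0, 9.0, 9.0, 0.1, 0.0, 4.0, 5.0, 4.0, 0.1]
print(detect(powers, tps=1.0, tpe=0.5))               # mode 1 picks up the noise burst
print(detect(powers, tps=1.0, tpe=0.5, press_at=4))   # mode 2 starts at the press
```

With these made-up numbers, the threshold-based start locks onto the burst, while the manual start skips it and the section ends after the utterance.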

[0054] FIG. 11 shows an example in which, in an environment with sudden noise, voice input mode 2 was set, the user spoke after pressing the mouse button, and the voice section was detected correctly. In FIG. 11, sudden noise was input before the user spoke, but because the user had not pressed the mouse button, it was not detected as a voice section.

[0055] As described above, the case where sudden noise is input could be handled by selecting mode 2 as the voice input mode.

[0056] However, in an environment where noise that constantly exceeds the set threshold is present, the voice section detection of modes 1 and 2 always finds P(t) >= Tpe in the end detection processing (step S651), so the end of the voice section is never detected. Voice input in such an environment can be handled by switching to mode 3, which performs the voice section detection shown in the flowchart of FIG. 7.
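The failure condition can be stated compactly: threshold-based end detection can only fire if some frame after the start has P(t) below Tpe. A small sketch with invented values follows; the function name is hypothetical.

```python
from typing import Sequence


def end_detectable(powers_after_start: Sequence[float], tpe: float) -> bool:
    """True if step S651 could ever fire, i.e. some frame has P(t) < Tpe."""
    return any(p < tpe for p in powers_after_start)


tpe = 0.5
quiet_room = [4.0, 5.0, 4.0, 0.1, 0.1]  # power drops after the utterance
noisy_room = [4.8, 5.8, 4.8, 0.8, 0.8]  # noise floor stays above Tpe

print(end_detectable(quiet_room, tpe))
print(end_detectable(noisy_room, tpe))
```

In the second case the status stays at IN indefinitely in modes 1 and 2, which is why the end has to come from the mouse release in mode 3.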

[0057] In mode 3, the user indicates both the start and the end of the voice section with the mouse, so the voice section can be detected correctly even when noise is present. FIG. 12 is an example in which the voice input mode was set to mode 1 and the voice section was not detected correctly because of steady noise. In contrast, FIG. 13 shows an example in which the voice input mode was set to mode 3 and the voice section was detected in a steady-noise environment.

[0058] As described above, according to this embodiment the user can select the mode that determines the voice input section, so the detection method can be changed to suit the user's situation and the voice section can be detected correctly.

[0059] Incidentally, mode 1 is the simplest for the user to operate, followed by mode 2 and then mode 3. Because the user can select the mode that suits his or her own state or environment, it becomes possible to strike the best balance between operability and correct detection of the voice section.

[0060] [Description of the Second Embodiment] The operation of the second embodiment is described with reference to the flowchart of FIG. 14. The device configuration is assumed to be the same as in FIG. 1.

[0061] In the second embodiment, for a voice input interface having the same three voice input modes as the first embodiment, the computer sets the input mode automatically, performs voice input, and shows the user the result of the voice section detection that corresponds to the set input mode. From the displayed result, the user judges whether the voice section was detected correctly. If the section was not detected correctly, the computer changes the voice input mode; if it was detected correctly, the input mode is not changed.

[0062] Details will be described with reference to the flowcharts of FIGS. 14 and 15.

[0063] The processing that differs from the first embodiment is, in FIG. 14, the voice input mode selection processing in step S1' and the cancel determination processing in step S10, which is performed after the voice section display processing in step S9'. The other steps are the same as those in FIG. 2.

[0064] An example of the voice input mode selection processing of step S1' will be described with reference to the flowchart of FIG. 15.

[0065] In FIG. 15, first, the immediately-after-startup determination in step S101 determines whether the program has just been started. If so, the process proceeds to step S102, where mode 1 is selected as the voice input mode. The process then proceeds to the selected input mode notification step S13' (the same as step S13 in FIG. 3).

[0066] If the program has not just been started, the process proceeds to step S103, where it is determined whether the current mode is mode 1. If the current voice input mode is mode 1, the process proceeds to step S104, where the voice input mode is set to mode 2, and the selected input mode notification processing of step S13' is performed.

[0067] Further, if it is determined that the current mode is not mode 1, the process proceeds from step S103 to step S105, where it is determined whether the current voice input mode is mode 2. If it is, the process proceeds to step S106, where the voice input mode is set to mode 3, and the input mode notification processing of step S13' is performed.

[0068] If the set mode is not mode 2 either, that is, if it is mode 3, the process proceeds to step S11', where the selectable input modes are displayed on the display device 2, and step S12' waits for the user's selection. After the user has set the voice input mode, the process moves to step S13'.
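The branch structure of FIG. 15 described above can be sketched as a small function. This is only an illustrative sketch, not the patent's implementation: the names `Mode`, `select_mode`, and `ask_user` are hypothetical, and "immediately after startup" is modelled here as `current is None`.

```python
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    AUTO = 1          # mode 1: start and end points detected automatically
    MANUAL_START = 2  # mode 2: user marks the start, end detected automatically
    MANUAL_BOTH = 3   # mode 3: user marks both start and end


def select_mode(current: Optional[Mode], ask_user: Callable[[], Mode]) -> Mode:
    """Return the next voice input mode, following the branches of FIG. 15.

    current  -- None right after program start (assumed encoding of step S101)
    ask_user -- hypothetical stand-in for steps S11'/S12' (show choices, wait)
    """
    if current is None:                # S101 -> S102: just started
        return Mode.AUTO
    if current is Mode.AUTO:           # S103 -> S104
        return Mode.MANUAL_START
    if current is Mode.MANUAL_START:   # S105 -> S106
        return Mode.MANUAL_BOTH
    return ask_user()                  # already mode 3: let the user choose
```

Notification of the chosen mode (step S13') is left to the caller, for example by displaying the returned value's name.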

[0069] Once the mode has been determined as described above, the process returns to FIG. 14 and the same processing as in the first embodiment is performed. The section detection result is then displayed in step S9', and the cancel determination is made in step S10. The user does not cancel if the voice section has been detected correctly for the uttered speech, and cancels if noise or the like has been erroneously detected as a voice section. The cancel determination processing of step S10 determines whether the user has canceled; if so, the process moves to step S1', and if not, it proceeds to step S5. Cancellation is performed using the input device 3 by a method decided in advance, for example by double-clicking the mouse or pressing a specific key on the keyboard.
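The cancel-driven loop of FIG. 14 might be outlined as follows. This is a rough sketch under stated assumptions: `run_session`, `detect`, `cancelled`, and the `(start, end)` tuple used for a segment are all hypothetical, and mode selection is passed in as a callable that takes the current mode (or `None` at startup) and returns the next one.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int]  # assumed representation: (start, end) sample indices


def run_session(
    select_mode: Callable[[Optional[int]], int],
    detect: Callable[[int], Segment],
    cancelled: Callable[[int, Segment], bool],
    utterances: int,
) -> List[Tuple[int, Segment]]:
    """Collect `utterances` accepted segments, changing mode on each cancel."""
    accepted: List[Tuple[int, Segment]] = []
    mode = select_mode(None)              # S1' immediately after startup
    while len(accepted) < utterances:
        segment = detect(mode)            # capture, analyse, detect the section
        # S9' would display the segment here; S10 asks whether the user cancels.
        if cancelled(mode, segment):
            mode = select_mode(mode)      # back to S1': pick the next mode
            continue
        accepted.append((mode, segment))  # hand the segment to lower processing
    return accepted
```

In this sketch a cancelled segment is simply discarded and the utterance is captured again in the new mode.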

[0070] A case in which a user actually uses the embodiment described above will now be explained.

[0071] First, as soon as the program is started, the voice input mode is set to mode 1 (that is, the mode that is simplest to operate) in step S102, and the user is notified in step S13' that mode 1 has been selected. After initialization in step S2', voice capture and voice data holding are performed in steps S3' and S4'. In step S5', the voice data from a time t to time t + Δt is analyzed, and in step S6' the voice section detection processing is performed. As long as the voice detection status St does not become END in step S7', the voice data of the next time interval is analyzed, repeating steps S5' to S8', in which voice section detection and the end-of-input determination are performed. When step S7 determines that voice input has ended, the process moves to step S9, which displays the detected voice section. The process then moves to the cancel determination processing of step S10.

[0072] If the voice section has been detected correctly, the user does not cancel in step S10, so the process proceeds to step S2' and voice input continues in the same voice input mode (mode 1). Now suppose that, during voice input, sudden noise occurs before the user speaks and the voice section is detected incorrectly. From the display in step S9', the user can see that noise has been detected by mistake, and therefore cancels in the following step S10. When step S10 detects a cancellation, the process proceeds to step S1, where the voice input mode selection processing shown in the flowchart of FIG. 15 is performed. Since the current voice input mode is mode 1, mode 2 is set in step S104 after passing through steps S101 and S103. Step S13' notifies the user that the mode has been changed to mode 2. Voice input then continues in mode 2.

[0073] Further, if, for example, large steady noise is present and the end of the voice section is not detected correctly even in mode 2, the user cancels in step S10, the process returns to the voice input mode selection processing of step S1', and the mode is changed to mode 3 in step S106 after passing through steps S101, S103, and S105. During input in mode 3 the user sets the voice section, so erroneous detection due to noise and the like no longer occurs.

[0074] If the voice input environment changes during voice input in mode 3, the noise decreases, and the user wishes to switch to an input mode that is easier to operate, the user can cancel and thereby become able to set any input mode. In this case, the process moves from step S10 to the voice input mode selection processing (step S1') and, after steps S101, S103, and S105, the selectable voice input modes are displayed in step S11'. In the following user selection processing (step S12'), the user can select any input mode and continue voice input.

[0075] The above describes one embodiment in which the voice input mode is selected automatically. As described, immediately after program start, voice input is performed in a voice input mode that requires no user operation, and each time the user cancels, the input mode is changed step by step to one in which section detection depends more on the user.

[0076] The present invention is not limited to the illustrated embodiments, and various modifications are possible. Examples include the following.
(1) In the above embodiments, a mouse is used as the input device 3; however, the invention is not limited to this, and keys on a keyboard, a light pen, a touch panel, or the like may be used.
(2) In the above embodiments, the mean-square value is used as the parameter when the computer detects the voice section; however, the invention is not limited to this, and the number of zero crossings, windowed power, or the like may be used, and a plurality of these parameters may be used together.
(3) In the above embodiments, three voice section detection methods are used; however, the invention is not limited to these, and, for example, the following may be used:
(a) a point is taken as the start (end) of the voice section when both the computer and the user have judged it to be the start (end) of the voice section;

[0077] (b) a point is taken as the start (end) when the user performs an operation notifying the input device 3 of the start (end) near the time at which the computer judges the start (end).
(4) In the above embodiments, the selected voice input mode is conveyed to the user by displaying the name of the selected mode on the screen; however, the invention is not limited to this, and the user may be informed of the selected mode by synthesized speech.
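The frame parameters named in modification (2) can be computed as in the following sketch. The patent gives no formulas here, so the definitions are common textbook ones chosen for illustration: the mean-square value (二乗平均値) as the mean of squared samples, a sign-change count for zero crossings, and a Hamming window for the windowed power. The function names, the thresholds, and the rule in `is_speech` for combining two parameters are all assumptions.

```python
import math
from typing import Sequence


def mean_square(frame: Sequence[float]) -> float:
    """Mean of the squared samples of one non-empty analysis frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossings(frame: Sequence[float]) -> int:
    """Number of sign changes between neighbouring samples (0 counts as positive)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def windowed_power(frame: Sequence[float]) -> float:
    """Mean power after applying a Hamming window (window choice is an assumption)."""
    n = len(frame)
    if n < 2:
        return mean_square(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    return sum((w * x) ** 2 for w, x in zip(window, frame)) / n


def is_speech(frame: Sequence[float], power_thr: float, zc_thr: int) -> bool:
    """Hypothetical rule using two parameters together: loud, or many crossings."""
    return mean_square(frame) > power_thr or zero_crossings(frame) > zc_thr
```

A detector in the automatic modes would evaluate such a rule on each frame from time t to t + Δt; suitable thresholds depend on the recording conditions.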

[0078] As described above, according to the embodiments, there are provided voice input mode setting means for setting a voice input mode from among a plurality of voice input modes, voice capture and holding means for capturing and holding the user's utterance, and voice section detection means for performing, from among a plurality of voice section detection methods, the voice section detection corresponding to the voice input mode set by the voice input mode setting means. As a result, a voice input mode can be selected from the plurality of voice input modes according to changes in the voice input environment, noise, and the like, and operability and usability are markedly improved.

[0079] In particular, according to the second embodiment, the mode that is simplest to operate is selected at the initial stage, so that in an environment such as a place with relatively little noise, the mode that places the least burden on the user can be used.

[0080] In the above embodiments, an example applied to a single stand-alone device has been described; however, as is readily inferred from the above description, the present invention may also be applied to a system composed of a plurality of devices. Needless to say, the present invention is also applicable where it is achieved by supplying a program to a system or a device.

【００８１】[0081]

[Effects of the Invention] As described above, according to the present invention, the voice detection method can be changed according to the voice input situation, and valid voice information can be passed to lower-order processing.

【００８２】[0082]

[Brief description of drawings]

FIG. 1 is a block diagram of a first embodiment of an information device according to the present invention.

FIG. 2 is a flowchart showing the main processing of the first embodiment.

FIG. 3 is a flowchart showing the voice input mode selection processing in the first embodiment.

FIG. 4 is a flowchart showing the voice data analysis processing in the first embodiment.

FIG. 5 is a flowchart showing the mode 1 voice section detection processing in the first embodiment.

FIG. 6 is a flowchart showing the mode 2 voice section detection processing in the first embodiment.

FIG. 7 is a flowchart showing the mode 3 voice section detection processing in the first embodiment.

FIG. 8 is an example of the display of selectable voice input modes shown when a voice input mode is selected in the first embodiment.

FIG. 9 is an example of the on-screen display used to inform the user of the selected voice input mode in the first embodiment.

FIG. 10 is an example in which the mode 1 voice section detection processing erroneously detects a voice section because of sudden noise in the first embodiment.

FIG. 11 is an example in which the mode 2 voice section detection processing correctly detects the voice section in the first embodiment.

FIG. 12 is an example in which the mode 1 voice section detection processing fails to detect the voice section because of steady noise in the first embodiment.

FIG. 13 is an example in which the mode 3 voice section detection processing correctly detects the voice section despite the occurrence of sudden noise in the first embodiment.

FIG. 14 is a flowchart showing the main processing of the second embodiment.

FIG. 15 is a flowchart showing the voice input mode selection processing in the second embodiment.

[Explanation of symbols]

1 System bus
2 Display device
3 Input device
4 Voice input device
5 I/O device
6 CPU
7 Storage device

Continuation of the front page (72) Inventor: Yasunori Ohora, c/o Canon Inc., 3-30-2 Shimomaruko, Ota-ku, Tokyo

Claims

[Claims]

1. A voice input device for passing voice information input from voice input means to a lower-order process, comprising: a plurality of modes for specifying a section of significant voice information input from the voice input means; and selection means for selecting one of the plurality of modes.

2. The voice input device according to claim 1, wherein the modes include a first mode for automatically identifying a start point and an end point of a voice, a second mode for manually instructing a start point position, and a third mode for manually instructing a start point position and an end point position.

3. The voice input device according to claim 2, wherein the selection means selects the first mode at an initial stage and, when a predetermined change instruction is issued, selects the second mode and the third mode in that order.

4. The voice input device according to claim 1, wherein the selection means selects manually.

5. A control method for a voice input device that passes voice information input from voice input means to a lower-order process, the voice input device having a plurality of modes for specifying a section of significant voice information input from the voice input means, the method comprising a selection step of selecting one of the plurality of modes.

6. The method for controlling a voice input device according to claim 5, wherein the modes include a first mode for automatically identifying a start point and an end point of a voice, a second mode for manually instructing a start point position, and a third mode for manually instructing a start point position and an end point position.

7. The method for controlling a voice input device according to claim 6, wherein the selection step selects the first mode at an initial stage and, when a predetermined change instruction is issued, selects the second mode and the third mode in that order.

8. The method for controlling a voice input device according to claim 5, wherein the selecting means selects manually.