JPH01112299A

JPH01112299A - Voice recognition equipment

Info

Publication number: JPH01112299A
Application number: JP63176754A
Authority: JP
Inventors: Futai Kimura; 木村　普太
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-07-16
Filing date: 1988-07-15
Publication date: 1989-04-28
Anticipated expiration: 2012-06-04
Also published as: JP2617527B2

Abstract

PURPOSE: To avoid executing recognition processing on unnecessary voice input, provided that the correct voice is input afterwards. CONSTITUTION: A buffer 3 is connected after an input part 2 so that voice is temporarily stored in the buffer 3. When a candidate selection switch SW is pressed, only the syllable, phrase, or sentence closest to the current time among those stored in the buffer 3 is recognized. Consequently, only the necessary syllable, phrase, or sentence is recognized, after the user has checked that the voice input is not noise, a mispronunciation, or unnecessary chatter.

Description

[Detailed description of the invention]

[Table of contents]

Overview
Industrial application field
Conventional technology (Figs. 25 and 26)
Problem to be solved by the invention
Means to solve the problem (Fig. 1)
Operation
Embodiments (Figs. 2 to 24)
(1) First embodiment
(2) Second embodiment
(3) Third embodiment
(4) Fourth embodiment
Effects of the invention

[Overview]

The invention relates to a speech recognition device. Its purpose is to ensure that, even when unnecessary speech has been input, no recognition processing is performed on it as long as the correct speech is input afterwards. In a speech recognition device that analyzes input speech, extracts its features, and recognizes it by comparison with a dictionary, the device is provided with holding means for temporarily storing the input speech and recognition instruction means for instructing recognition of the speech stored in the holding means. When the speech to be recognized has been input and the recognition instruction means is operated, the speech input portion immediately preceding that operation is extracted and recognized.

[Industrial application field]

The present invention relates to a speech recognition device, and in particular to one that, when a document is entered directly by voice, can exclude undesired input such as coughing from the range subject to recognition.

In a speech recognition device that takes speech input divided into single syllables, words, phrases, or sentences, the device must display the single most likely candidate of the recognition result immediately after each break in the speech input, and the user must then select among recognition candidates or homophones. It is also necessary to keep sounds that are irrelevant to document creation, such as conversation with other people, coughing, or ambient noise, out of the microphone. A speech recognition device that can satisfy these requirements is desired.

[Conventional technology]

For example, in the speech recognition device of a voice-input document creation device, a manually operated switch has conventionally been used, as shown in Fig. 25, so that speech can be input divided into single syllables, words, phrases, or sentences.

In Fig. 25, 61 is an input unit, which receives the speech input from a microphone, amplifies it to the required level, and converts it into a digital signal.

62 is a speech section detection unit, which detects single-syllable, word, phrase, or sentence units from the breaks in the speech input.

63 is a recognition unit, which recognizes the speech input signal by referring to a dictionary (not shown). 64 is a candidate selection/homophone selection unit, which selects another candidate when the first recognition result is a homophone and not the intended word.

65 is a display unit, which displays the recognition result of the recognition unit 63 or another candidate selected by the candidate selection/homophone selection unit 64. Switches SW1, SW2, and SW3 are manual switches operated by the operator. SW1 is a voice input mode changeover switch. It switches between a voice input mode, in which speech can be input, and a voice non-input mode, in which it cannot, so that sounds irrelevant to document creation, such as conversation with other people, coughing, or ambient noise, do not enter through the microphone.

SW2 is the candidate selection/homophone selection switch: when the recognized result is not the intended one, pressing this switch displays another candidate. SW3 cancels an undesired input made through a slip of the tongue, a cough, or the like.

Fig. 26 is a flowchart explaining the operation of the conventional voice-input document creation device shown in Fig. 25.

As the figure shows, when the first speech input is made, the speech section detection unit 62 detects the section from its breaks, and the recognition unit 63 recognizes it by comparison with the dictionary. The recognition result is shown on the display unit 65. The user checks this result and, if it is correct, proceeds to the second speech input. The device then treats the previous result as correct and begins recognition processing of the second speech input.

If the recognition result for the first speech input is incorrect, the user presses the candidate selection/homophone selection switch SW2. A new candidate is then displayed, and if it is the desired one, the user proceeds to the next speech input.

[Problem to be solved by the invention]

In this conventional example, however, speech section detection is always running while the device is in voice input mode, so stray chatter and ambient noise cannot be tolerated, which places undue strain on the speaker.

In addition, because the candidate selection switch is pressed after some utterances and not after others, the timing of the utterances is irregular, so operability and feel are poor from a man-machine interface standpoint. It is also very difficult to speak with a clear break after each utterance unit such as a word or phrase. As voice input continues, speech gradually speeds up until two utterance units eventually run together, which can cause misrecognition.

The present invention has been made in view of these points. Its object is to provide a speech recognition device that requires few switch operations and allows input work to proceed without concern for noise, mispronunciation, and the like.

[Means to solve the problem]

Fig. 1 is a diagram showing the principle of the present invention. In the figure, 1 is a microphone, 2 is an input unit, 3 is a buffer, 4 is a speech section detection unit, 10 is a recognition/candidate selection unit, 8 is a display unit, and SW is a candidate selection switch.

The uttered speech is converted into an electrical signal by the microphone 1, analyzed in the input unit 2, and then temporarily stored in the buffer 3. The buffer 3 needs, at a minimum, enough capacity to hold the longest speech that will be input. When the candidate selection switch SW is pressed, the speech section detection unit 4 refers to the data in the buffer and detects the speech section closest to the current time. The recognition/candidate selection unit 10 recognizes that speech section and also selects the correct result from among the recognition candidates. The recognition result is shown on the display unit 8, and the user selects the correct result while viewing the display. The buffer is organized as a ring buffer, so old data is successively replaced by new data.
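The ring-buffer behavior described above can be illustrated with a minimal Python sketch. The class name `RingBuffer`, its methods, and the use of absolute sample indices are assumptions made for illustration only; the patent describes a hardware buffer and does not specify any of these details.

```python
class RingBuffer:
    """Fixed-capacity sample store; the oldest samples are overwritten first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [0] * capacity
        self.total_written = 0  # absolute index of the next sample to write

    def write(self, sample):
        # Wrap around so that new data replaces the oldest data.
        self.data[self.total_written % self.capacity] = sample
        self.total_written += 1

    def oldest_index(self):
        """Absolute index of the oldest sample still held."""
        return max(0, self.total_written - self.capacity)

    def read(self, start, end):
        """Return the samples with absolute indices start .. end-1."""
        if start < self.oldest_index() or end > self.total_written or start > end:
            raise IndexError("requested range is not held in the buffer")
        return [self.data[i % self.capacity] for i in range(start, end)]


buf = RingBuffer(capacity=4)
for sample in [10, 11, 12, 13, 14, 15]:
    buf.write(sample)
latest = buf.read(buf.oldest_index(), buf.total_written)
```

With a capacity of 4 and six samples written, only the four most recent samples remain readable. This is why the capacity must cover at least the longest utterance to be recognized.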

[Operation]

In this invention, a buffer 3 is provided after the input unit 2 so that the voice input is first held in the buffer 3, and when the candidate selection switch SW is pressed, only the syllable, phrase, or sentence closest to the current time among those held in the buffer 3 is recognized. The user can therefore confirm that the voice input was not noise, a mispronunciation, or idle chatter, and then have only the necessary syllable, phrase, or sentence recognized.

[Embodiments]

(1) First embodiment

A first embodiment of the present invention will be described with reference to Figs. 2 to 6.

Fig. 2 is a block diagram of the main parts of a document creation device using the present invention, Fig. 3 is a configuration example of the speech section detection unit, Fig. 4 is a speech power curve, Fig. 5 is a diagram explaining the operation of the first embodiment, and Fig. 6 compares the operations of the conventional example and the first embodiment.

In Fig. 2, parts identical to those in the principle diagram of Fig. 1 carry the same numbers, and their detailed description is omitted. In this embodiment, the recognition unit 6 and the candidate selection/homophone selection unit 7 are provided separately, and the speech section detection unit 4 is composed of a power calculation unit 31, an island detection unit 32, a memory 33, and a determination unit 34, as shown in Fig. 3.

In Fig. 3, the power calculation unit 31 calculates the power of the speech stored in the buffer 3, and its output is a speech power curve over time, as shown in Fig. 4.

The island detection unit 32 detects the regions a, b, and c of the speech power curve in Fig. 4 that are at or above a predetermined threshold. These regions a, b, and c are called island regions and correspond to places where some speech input occurred.

The memory 33 stores the start and end times of each island region detected by the island detection unit 32. For example, it stores the start S1 and end E1 of island region a (abbreviated below as a(S1, E1)), together with b(S2, E2) and c(S3, E3).

The determination unit 34 determines, from the speech input represented by the speech power curve, the speech sections that each form one input unit such as a single syllable, phrase, or sentence, and then, in accordance with the invention, determines the speech section closest to the moment the switch SW is pressed. Suppose, for example, that the switch SW is pressed at time t2, and let l1 be the interval between island regions a and b and l2 the interval between b and c. When l1 and l2 are both larger than a predetermined threshold Thl, the island regions a, b, and c are each judged to be independent speech sections, and only section c, the island region closest to the press time t2, is sent to the recognition unit 6 as the speech section to be recognized. When l1 and l2 are both smaller than the threshold Thl, the combined region (a + b + c) is judged to be a single speech section. Since it is plainly the one closest to the press time t2, this section (a + b + c) is sent to the recognition unit 6 as the speech section to be recognized.

When l2 is smaller than the threshold Thl and l1 is larger than it, the speech sections are judged to be island region a and the combined region (b + c). The region (b + c), being closest to time t2, is judged to be the speech section to be recognized, and its address information is sent to the recognition unit 6. Based on this speech section information, the required region is read out of the buffer 3 and recognized.
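The section-merging rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the circuit of the determination unit 34: the function names, the representation of islands as `(start, end)` time pairs, and the handling of a gap exactly equal to the threshold (treated here as "not merged") are all hypothetical choices.

```python
def merge_islands(islands, gap_threshold):
    """Merge (start, end) islands whose separating gap is below gap_threshold."""
    sections = []
    for start, end in sorted(islands):
        if sections and start - sections[-1][1] < gap_threshold:
            # Small gap: the island belongs to the same speech section.
            prev_start, prev_end = sections[-1]
            sections[-1] = (prev_start, max(prev_end, end))
        else:
            # Large gap (or first island): start an independent section.
            sections.append((start, end))
    return sections


def latest_section(islands, gap_threshold, press_time):
    """Return the merged section that ended most recently before press_time."""
    done = [s for s in merge_islands(islands, gap_threshold) if s[1] <= press_time]
    return max(done, key=lambda s: s[1]) if done else None


# Islands a, b, c with gaps l1 = 20 (a to b) and l2 = 5 (b to c).
islands = [(0, 10), (30, 40), (45, 60)]
all_independent = latest_section(islands, gap_threshold=3, press_time=70)
b_and_c_merged = latest_section(islands, gap_threshold=8, press_time=70)
all_merged = latest_section(islands, gap_threshold=25, press_time=70)
```

The three calls correspond to the three cases in the text: only c is selected, (b + c) is selected, and (a + b + c) is selected.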

Next, the operation of the present invention will be described with reference to the flowchart of Fig. 5.

In this invention, the most likely recognition candidate is not displayed immediately after speech is input from the microphone 1. Instead, the speech input is analyzed by the input unit 2 and first stored in the buffer 3, and the display unit 8 shows only an indication that storage in the buffer 3 is complete. Following the flowchart of Fig. 5: speech is input (step 1), and when input is complete an indication to that effect is displayed (step 2). This indication may be, for example, a blinking asterisk (*) on the display unit 8.

When the switch SW is then pressed, the speech section detection unit 4 detects the latest speech section (step 4). If speech is input again without the switch SW having been pressed, however, processing does not advance to step 4 but returns to step 1. Thus, if the user mispronounces a word or clears their throat, for example, they need only refrain from pressing the switch SW, pause long enough for the speech sections to be told apart, and then input the speech again with the correct pronunciation.

In Fig. 4, if the correct speech is input after the erroneous speech sections a and b, it is stored in the buffer 3 as speech section c. If the switch SW is pressed at this point, the speech section detection unit 4 sends c, the latest speech section at that time, to the recognition unit 6.

The recognition unit 6 uses this speech section information to read the speech section out of the buffer 3 and recognize it. Needless to say, if the switch SW is pressed immediately after only a single utterance has been input, that single utterance is sent to the recognition unit 6 and recognized. When the buffer 3 becomes full, the input speech data is simply overwritten in order, starting from the oldest.

The speech sent to the recognition unit 6 is compared with the dictionary, and candidates are output starting with the highest-ranked one (step 5). The result is shown on the display unit 8 (step 6). If the displayed result is correct, the user makes the next speech input (step 7), and processing returns to step 1. If the result is a homophone of the intended word but not the intended word itself, the user presses the switch SW, and the next candidate is displayed (step 8, step 6).

This operation is repeated until the desired result is obtained. The user then makes the next speech input without pressing the switch SW, whereupon processing returns to step 1 and moves on to handling the next speech input.
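The flow of steps 1 to 8 can be pictured as a small event-driven model. The sketch below is only an illustration: the class `DictationSession`, the `recognize` callback standing in for the dictionary comparison, the toy dictionary, and the wrap-around to the first candidate after the last one are all assumptions that do not appear in the patent.

```python
class DictationSession:
    """Toy model of the Fig. 5 flow with a single switch SW."""

    def __init__(self, recognize):
        self.recognize = recognize  # hypothetical: utterance -> ranked candidates
        self.pending = None         # latest buffered utterance, not yet recognized
        self.candidates = []
        self.index = 0
        self.document = []          # results accepted so far

    def _confirm(self):
        # A displayed candidate is accepted when the user simply keeps speaking.
        if self.candidates:
            self.document.append(self.candidates[self.index])
            self.candidates, self.index = [], 0

    def voice_input(self, utterance):
        # Steps 1, 2 and 7: new speech replaces any earlier unrecognized speech.
        self._confirm()
        self.pending = utterance

    def switch_pressed(self):
        if self.pending is not None:
            # Steps 4 to 6: recognize the latest utterance, show the top candidate.
            self.candidates = self.recognize(self.pending)
            self.index = 0
            self.pending = None
        elif self.candidates:
            # Step 8: show the next candidate (wrap-around is an assumption).
            self.index = (self.index + 1) % len(self.candidates)
        return self.candidates[self.index] if self.candidates else None


toy_dictionary = {
    "kiku": ["kiku (to listen)", "kiku (to be effective)", "kiku (chrysanthemum)"],
    "hana": ["hana (flower)", "hana (nose)"],
}
session = DictationSession(lambda u: toy_dictionary.get(u, []))
session.voice_input("ahem")        # a cough: the switch is never pressed for it
session.voice_input("kiku")
first = session.switch_pressed()   # top candidate
second = session.switch_pressed()  # next homophone
session.voice_input("hana")        # speaking again accepts the shown candidate
third = session.switch_pressed()
```

In this model the cough is never recognized, because a later utterance replaces it before the switch is pressed.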

According to this invention, recognition is not performed and displayed immediately after speech input. Instead, the speech input is first stored in the buffer, and only the latest speech section at the moment the switch is operated is sent to the recognition unit for recognition. After a mistaken utterance, therefore, the user can wait a predetermined time, input the correct speech, and press the switch so that only the correct speech is recognized. Input can thus proceed without concern not only for mistaken utterances but also for noise, throat clearing, and the like.

Furthermore, operating the switch after each utterance unit, such as a word or phrase, gives the input a regular rhythm and makes it easier to speak with a clear break between units. Two utterance units therefore do not run together, and the misrecognition this would cause is eliminated.

In addition, the user can ignore noise and mispronunciations and simply operate the switch immediately after a correct utterance, so operation is easy.

As the comparison of operations in Fig. 6 shows, when utterance 1 and utterance 2 are input with noise, a mistaken utterance, or throat clearing in between, the present invention requires far fewer switch operations. Creating documents with such a speech recognition device therefore allows voice-input documents to be produced accurately.

(2) Second embodiment

A second embodiment of the present invention will be described with reference to Figs. 7 and 8. Fig. 7 is a diagram explaining the principle of the second embodiment, and Fig. 8 shows its configuration. In Figs. 7 and 8, parts identical to those in Figs. 1 and 2 carry the same symbols.

SW1 is a switch for instructing speech section detection, SW2 is a switch for candidate selection, and SW3 is a switch for deletion.

The uttered speech is converted into an electrical signal by the microphone 1, converted to digital form in the input unit 2, and then temporarily stored in the buffer 3. The buffer 3 needs, at a minimum, enough capacity to hold the longest speech that will be input. When the switch SW1 is pressed, the speech section detection unit 4 refers to the data in the buffer and detects the speech section closest to the current time. The recognition/candidate selection unit 10 recognizes that speech section and selects the correct result from among the recognition candidates.

The recognition result is shown on the display unit 8, and the user selects the correct result while viewing the display.

Here, the switch SW1 is used to instruct extraction of the speech section immediately after an utterance, the switch SW2 is used to select among recognition candidates or homophones, and the switch SW3 is used to delete an erroneous recognition result.

(3) Third embodiment

In the third embodiment of the present invention, shown in Fig. 9, the switches SW1 and SW2 of the second embodiment shown in Fig. 8 are combined into a single switch SW1, simplifying what would otherwise be cumbersome switch operation. When the switch SW1 is pressed immediately after a single utterance has been input, the speech is in the buffer 3, so the speech section detection unit 4 detects the speech section and recognition begins. At that moment there are not yet any recognition candidates, so no candidate selection is performed.

Conversely, when no speech has been uttered, recognition candidates already exist, and the switch SW1 is used to select among them, there is no speech in the buffer 3, so the speech section detection unit 4 does not operate and only candidate selection is performed. For these reasons the switches SW1 and SW2 can be combined.

(4) Fourth embodiment

The fourth embodiment of the present invention, shown in Fig. 10, is almost the same as the third embodiment shown in Fig. 9, but differs in that a function is added to temporarily stop the operation of the input unit 2' while candidates are being selected with the switch SW1.

Among the components, the one that differs between embodiments is the input unit. The input unit 2 of the first to third embodiments is the same, and its internal configuration is shown in Fig. 11. The input unit 2' of the fourth embodiment differs from the others, and its internal configuration is shown in Fig. 12.

In Fig. 11, the speech signal from the microphone is fed to an analog filter 20. The analog filter 20 is a low-pass filter whose cutoff frequency is slightly lower than half the sampling frequency of the sample-and-hold 21 in the next stage. The sample-and-hold 21 quantizes the time axis of the speech signal that has passed through the analog filter 20, in accordance with the clock supplied by the AD converter in the following stage. The AD converter 22 quantizes the amplitude of the time-quantized speech signal, outputs the resulting time series Dj of digital speech data to the next stage together with its clock ck1, and also supplies the sample-and-hold 21 with the clock it requires. The clock 23 generates the clock required by the AD converter 22 using a crystal oscillator or the like.

In Fig. 12, components 20, 21, 22, and 23 are the same as in Fig. 11. Here, however, components 24, 25, and 26 stop the clock input to the AD converter 22 for a fixed time in response to the signal from the switch SW1. 24 is a trigger circuit, implemented as a one-shot trigger circuit. 25 is a NOT circuit. 26 is an AND circuit, which acts as a gate that supplies the AD conversion clock to the AD converter 22 only while the output of the NOT circuit 25 is 1.

Fig. 13 shows the signal timing for the circuit of Fig. 12. When the signal x from the switch SW1 is input to the trigger 24, the trigger 24 generates a pulse signal y several seconds wide. The input unit stops operating for these several seconds. The negation z of this pulse is produced by the NOT circuit 25 and supplied to the AND circuit 26, which serves as the gate.

Fig. 13 also shows the relationship between the gated AD conversion clock w and the other signals.
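The gating of the AD conversion clock by the one-shot pulse can be simulated at the level of clock ticks. The following Python sketch is an assumption-laden illustration: the function name, the tick-based time model, the pulse width given in ticks, and the retriggering of the pulse when the switch is pressed again during it are not taken from the patent.

```python
def gated_clock_ticks(num_ticks, press_ticks, pulse_width):
    """Return the tick indices at which the gated clock w reaches the AD converter."""
    passed = []
    pulse_end = 0                  # y is 1 while tick < pulse_end
    presses = set(press_ticks)
    for tick in range(num_ticks):
        if tick in presses:        # x: the switch signal fires the one-shot trigger 24
            pulse_end = tick + pulse_width
        y = tick < pulse_end       # one-shot pulse
        z = not y                  # NOT circuit 25
        if z:                      # AND circuit 26: w = clock AND z
            passed.append(tick)
    return passed


ticks = gated_clock_ticks(num_ticks=10, press_ticks=[3], pulse_width=4)
```

Pressing the switch at tick 3 blocks ticks 3 to 6, so no samples are converted while the pulse is active and conversion resumes at tick 7.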

Fig. 14 is a diagram explaining the configuration of the buffer.

Dj from the input unit is passed on unchanged to the speech section detection unit and is also supplied as write data to the memory unit 301. Likewise, clk from the input unit is passed on unchanged to the speech section detection unit and is also supplied as the count-up clock of the counter 300. The counter 300 counts the write address of the memory unit 301. The address control unit 302, meanwhile, receives iss, iee, and stb2 from the recognition unit. Immediately after the stb2 signal becomes 1, the address control unit 302 generates the addresses from iss through iee in sequence, together with a clock clkd. The generated addresses are used as read addresses for the memory unit 301. The data Dk read from the memory unit 301 and clkd are sent to the recognition unit 6 and used for recognition.

Fig. 15 is a diagram explaining the speech section detection unit 4, which is common to all embodiments. First, the power calculation unit 40 computes the power of the digital speech signal read from the buffer every few milliseconds and temporarily stores the resulting power time series. Following the speech section detection instruction from the switch SW1, the island detection unit 41 reads the power time series from the power calculation unit 40 and detects islands. The determination unit 42 evaluates the intervals between the detected islands and decides the final speech section.

Fig. 16 is a diagram explaining the internal configuration of the power calculation unit 40, which is common to all embodiments.

The power calculation unit accumulates the squares of n digital speech samples read from the buffer and takes the accumulated value as the speech power. The digital speech data Dj, obtained from the input unit 2 or 2' and stored in the buffer 3, is applied to the address input of the squaring ROM (400). Each address of the squaring ROM holds the square of the address value, so the output data of the ROM (400) is the square of the digital speech data. The adder 402 and the selector 403 form an accumulator that accumulates the squared values obtained from the squaring ROM (400). The accumulated value is applied to the address input of the logarithm ROM (404), and the logarithm of the accumulated value is obtained as the data output of the ROM (404). The logarithmic data is stored sequentially in the temporary memory (406). The stored data Pi is read out when the island detection unit 41 specifies address i. The clock ck1 obtained from the buffer 3 is input to the clock divider 401, where its frequency is divided by n.
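In software terms, the power calculation amounts to taking the logarithm of the sum of squares over each block of n samples. The Python sketch below mirrors that data path under a few assumptions that the text does not settle: the function name is hypothetical, the natural logarithm is used because the base is not stated, an all-zero block is mapped to negative infinity, and a trailing partial block is dropped.

```python
import math


def frame_log_power(samples, n):
    """Log of the sum of squares over each consecutive block of n samples."""
    powers = []
    for start in range(0, len(samples) - n + 1, n):
        acc = 0                        # accumulator cleared once per block (ck2)
        for s in samples[start:start + n]:
            acc += s * s               # squaring ROM 400 feeding the adder 402
        # Logarithm ROM 404; log(0) is undefined, so use -inf as a stand-in.
        powers.append(math.log(acc) if acc > 0 else float("-inf"))
    return powers


powers = frame_log_power([1, 2, 2, 0, 0, 0, 3, 4, 0, 9], n=3)
```

Each entry of `powers` plays the role of one value Pi in the temporary memory, with the list index standing in for the address i.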

FIG. 17 shows the relationship between the clock ck1 from buffer 3 and the divided clock ck2. The divided clock ck2 is used first to clear the accumulator: it is supplied as the signal that makes selector 403 select the preset value 0 instead of the accumulated output of adder 402. The clock ck2 also clocks the counter that determines the address of the temporary memory, and serves as the write signal for the temporary memory.

Next, the function and configuration of the island detection unit 41 are described with reference to FIGS. 18, 19, and 20.

FIG. 18 is a diagram explaining the principle of island detection; it shows the contents of the temporary memory 406 in the power calculation unit 40.

In FIG. 18, the horizontal axis represents address i and the vertical axis represents data Pi.

Address i corresponds to the time axis of the audio. The island detection unit 41 detects portions where the data Pi remain continuously large (islands) by the following method. Thresholds Pth1 and Pth2 (< Pth1) are given in advance.

First, the portions exceeding Pth1 (■■■) are taken as provisional islands. This removes portion イ as noise. From each provisional island ■■■, a search is made in both directions up to the point just before Pi falls below Pth2. As a result of the search, portions ア and ウ are obtained as islands. Because the method described above accesses the contents of the temporary memory (Pi) randomly, it is not well suited to hardware.
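The two-threshold search can be illustrated with a short Python sketch. The function name `find_islands` and the inclusive `(start, end)` return format are assumptions made for illustration; the sketch also takes the upper threshold Pth1 as the seed level, consistent with the event definitions that follow in the text.

```python
def find_islands(p, pth1, pth2):
    """Random-access island detection over the power series p.

    Samples above pth1 seed provisional islands; each seed is widened in
    both directions while the neighbouring sample stays above pth2.
    Returns a sorted list of (start, end) index pairs, inclusive.
    """
    assert pth2 < pth1
    islands = []
    i = 0
    while i < len(p):
        if p[i] > pth1:
            lo = i
            while lo > 0 and p[lo - 1] > pth2:
                lo -= 1              # widen toward earlier samples
            hi = i
            while hi + 1 < len(p) and p[hi + 1] > pth2:
                hi += 1              # widen toward later samples
            islands.append((lo, hi))
            i = hi + 1               # continue after this island
        else:
            i += 1
    return islands
```

A bump that rises above Pth2 but never reaches Pth1 produces no seed and is therefore discarded, which is how a noise portion such as イ drops out.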

An equivalent method that accesses the contents of the temporary memory (Pi) sequentially is described next.

First, define the event Pi ≦ Pth2 as α, the event Pth2 < Pi ≦ Pth1 as β, and the event Pth1 < Pi as γ.

Next, as shown in FIG. 19, consider four states S0, S1, S2, and S3. In this method, Pi is accessed sequentially from larger i to smaller i. In FIG. 19, the machine enters state S0 at the start. The index i is decremented step by step, and a state transition is made each time event α, β, or γ occurs for Pi. When processing is attached to a state-transition arc, that processing is performed together with the transition. The state transition diagram is explained below using the power data example of FIG. 18.

Processing proceeds from the point marked * in FIG. 18 toward earlier addresses. In the present invention, switch SW1 is assumed to have been pressed at this point. The state first enters S0. At the point marked *, Pi is below Pth2, so the event is α and the state remains S0.

As i decreases, event β occurs and the state changes from S0 to S1. The value of i at this moment is stored temporarily in the internal variable STMP. The β interval continues for a while, so the state remains S1. Next, event γ occurs and the state changes to S3. At this moment the previously stored content of STMP is copied into the internal register SR. A γ interval then continues for a while (provisional island ■) and the state remains S3. Next, event β occurs and the state changes to S2. Then γ occurs and the state returns to S3 (provisional island ■). After that, event β occurs again and the state changes to S2; then event α occurs and the state returns to S0. Here the value of i is stored in the internal variable ER. At this point SR and ER hold the addresses of the two ends of island ア. Proceeding further, event β occurs, the state changes to S1, and the value of i is stored in STMP (portion イ). However, event α occurs next and the state returns to S0, so portion イ is never identified as an island. Processing continues in the same way for provisional island ■ and island ウ.
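The walk through the state diagram can be followed in a small Python sketch. The function name `scan_islands_backward` and the list-of-pairs output are assumptions. Two arc actions are not spelled out in the walk-through and are inferred here: storing ER on a direct S3 to S0 transition, and storing i in SR on a direct S0 to S3 transition.

```python
def scan_islands_backward(p, pth1, pth2, start):
    """Sequential island detection, scanning from index `start` down to 0.

    Returns (ER, SR) pairs in the order found (latest island first), where
    SR is the index latched from STMP and ER is the index at which the
    state fell back to S0.
    """
    S0, S1, S2, S3 = range(4)
    state, stmp, sr = S0, None, None
    islands = []
    for i in range(start, -1, -1):
        if p[i] <= pth2:
            event = "alpha"
        elif p[i] <= pth1:
            event = "beta"
        else:
            event = "gamma"

        if event == "alpha":
            if state in (S2, S3):        # S3 -> S0 action is inferred
                islands.append((i, sr))  # ER = i
            state = S0
        elif event == "beta":
            if state == S0:
                stmp = i                 # remember a possible island end
                state = S1
            elif state == S3:
                state = S2
        else:  # gamma
            if state == S1:
                sr = stmp                # confirm the remembered end
            elif state == S0:
                sr = i                   # inferred: island with no beta edge
            state = S3
    return islands
```

Because ER is stored when α occurs, it is the index of the first sample at or below Pth2, one position before the first sample of the island. An island still open when the scan reaches index 0 is not reported in this sketch.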

FIG. 20 is a hardware configuration diagram implementing the island detection method described above.

In FIG. 20, 4111 is a clock generator, which starts generating clock pulses the moment switch SW1 is pressed. 4112 is a counter: at the moment SW1 is pressed, it is loaded with the value i' of counter 405 inside the power calculation unit 40 as its initial value, and it then counts down in step with the clock from clock generator 4111. The value of counter 4112 represents i on the horizontal axis of FIG. 18; it starts at the point marked * and decreases step by step. Using this value i, the contents Pi of the temporary memory 406 in the power calculation unit 40 are read out in order and supplied to input B0 of comparator 4100 and input B1 of comparator 4101. Threshold Pth2 is supplied to input A0 of comparator 4100 and compared with Pi.

Threshold Pth1 is supplied to input A1 of comparator 4101 and compared with Pi. The B0 ≦ A0 output of comparator 4100 corresponds to event α. The B1 > A1 output of comparator 4101 corresponds to event γ.

AND circuit 4102 takes the logical AND of the B0 > A0 output of comparator 4100 and the B1 ≦ A1 output of comparator 4101, yielding an output that corresponds to event β.

Note that α, β, and γ are never 1 at the same time.

4103 and 4104 are flip-flops, used to store the states S0 to S3 as shown in Table 1.

Table 1: Relationship between states and flip-flops

Elements 4105, 4106, 4107, 4108, 4109, and 4110 implement the state transitions of FIG. 19.

Flip-flops 4103 and 4104 are first reset when a pulse arrives from switch SW1 (not shown in the figure), which puts the state in S0. According to the state transition diagram, event α always causes a transition to S0 from any state; α is therefore connected to the reset input of 4103 through OR circuit 4108 and also to the reset input of 4104.

According to the state transition diagram, when γ is 1 the state always changes to S3 from any state; γ is therefore connected to the set input of 4104 and, through OR circuit 4107, to the set input of 4103.

When β becomes 1 in state S0, the state changes to S1. AND circuit 4109 first detects that the current state is S0, AND circuit 4105 takes the logical AND of β and the output of 4109, and the output of AND circuit 4105 sets 4103 through OR circuit 4107. This implements the transition from S0 to S1. When β becomes 1 in state S3, the state changes to S2. AND circuit 4110 detects that the current state is S3, AND circuit 4106 takes the logical AND of β and the output of 4110, and the output of AND circuit 4106 resets 4103 through OR circuit 4108. This implements the transition from state S3 to state S2.
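The set and reset wiring described above can be checked with a small Python model of the two flip-flops. The contents of Table 1 are not reproduced in this text, so the state encoding used here, (4103, 4104) = (0, 0) for S0, (1, 0) for S1, (0, 1) for S2, and (1, 1) for S3, is inferred from the wiring description and should be read as an assumption. The function name `next_flipflops` is likewise hypothetical.

```python
def next_flipflops(q4103, q4104, alpha, beta, gamma):
    """One clock step of the two state flip-flops, as set/reset logic."""
    in_s0 = (not q4103) and (not q4104)      # role of AND circuit 4109
    in_s3 = q4103 and q4104                  # role of AND circuit 4110
    set_4103 = gamma or (beta and in_s0)     # OR 4107: gamma, or AND 4105
    reset_4103 = alpha or (beta and in_s3)   # OR 4108: alpha, or AND 4106
    set_4104 = gamma
    reset_4104 = alpha

    if set_4103:
        q4103 = True
    elif reset_4103:
        q4103 = False
    if set_4104:
        q4104 = True
    elif reset_4104:
        q4104 = False
    return q4103, q4104
```

With this encoding, β leaves S1 and S2 unchanged because neither the S0 nor the S3 detector is active, which matches the self-loops in the walk-through of FIG. 19.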

4113, 4114, 4117, 4119, and 4121 are three-input AND circuits, each of which detects one of the transitions ■ to ■ in the state transition diagram of FIG. 19.

AND circuit 4113 detects transition ■, and AND circuit 4114 detects transition ■. OR circuit 4115 detects that either transition ■ or ■ has occurred. When transition ■ or ■ is detected, the value of i is stored in register 4116 (ER).

AND circuit 4117 detects transition ■. When ■ is detected, the value of i is stored in register 4118 (STMP).

AND circuit 4119 detects transition ■. When ■ is detected, selector 4120 selects the contents of register (STMP), which are then stored in register 4123 (SR). AND circuit 4121 detects transition ■. When ■ is detected, selector 4120 selects the value of i, which is then stored in register 4123 (SR).

OR circuit 4122 supplies the output of AND circuit 4119 or 4121 to flip-flop 4123. Flip-flop 4123 is reset by the signal from switch SW1 and set by the output of OR circuit 4122. The output of 4123 is connected to one-shot trigger 4124. By means of 4123 and 4124, only a single output of 4122, the one corresponding to the time immediately before switch SW1 was pressed, becomes the write signal for register 4125.

The signals clk and stb from the respective parts and the register values ER and SR are supplied to the determination unit 42 in the next stage.

FIG. 21 shows the internal configuration of the determination unit. 420 is a counter that is incremented by the clk signal of the island detection unit 41 and cleared by the logical OR of the stb signal and the ie signal. The logical OR of the stb signal and the ie signal is computed by OR circuit 424. Counter 420 thus counts the length from the detection of the end point of one island to the start point of the next island (βφ in FIG. 18). When this length reaches or exceeds THE, the output of comparator 421 becomes 1. Flip-flop 425 and AND circuit 426 are provided so that a spurious output of comparator 421 occurring before the end point of an island has been detected is not passed to the recognition unit 6 as the stb1 signal (strobe signal). Flip-flop 425 is reset by the signal from switch SW1 and set by the ie signal (the island detection signal). In other words, the output of flip-flop 425 indicates that at least one island has been detected. AND circuit 426 gates the output of comparator 421 with the output of flip-flop 425.

Multiplier 422 multiplies the value of SR by n to restore the address before decimation, thereby converting the address in temporary memory 406 into an address in buffer 3, and sends the result to the recognition unit 6 as isr. Similarly, multiplier 423 multiplies the value of ER by n to convert the address in temporary memory 406 into an address in buffer 3, and sends the result to the recognition unit 6 as ier. ier is the start address of the speech in buffer 3, and isr is the end address of the speech in buffer 3.
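The role of the determination unit, closing the voice section once a long enough silent gap is seen and then scaling the addresses back to buffer coordinates, can be sketched in Python as follows. The function name `decide_section`, the `(ER, SR)` pair format, and the exact way the gap is counted are assumptions; the hardware counts clock pulses between islands, and its precise counting boundaries are not stated in the text.

```python
def decide_section(islands, gap_threshold, n):
    """Pick the latest speech section from backward-scan results.

    `islands` holds (ER, SR) pairs in the order found, latest first, in
    temporary-memory addresses. Earlier islands are merged into the section
    while the gap to the next island is shorter than gap_threshold.
    Returns (ier, isr) in buffer addresses, or None if no island exists.
    """
    if not islands:
        return None
    er, sr = islands[0]              # SR is latched once, from the latest island
    for next_er, next_sr in islands[1:]:
        gap = er - next_sr           # assumed gap measure between two islands
        if gap >= gap_threshold:
            break                    # gap long enough: the section is final
        er = next_er                 # gap too short: extend the section
    return er * n, sr * n            # scale by n, as multipliers 423 and 422 do
```

For example, with islands `[(8, 10), (0, 5)]` and n = 4, a threshold of 2 yields `(32, 40)`, while a threshold of 5 merges both islands and yields `(0, 40)`.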

The recognition unit 6 latches isr and ier and starts recognition when stb1 becomes 1.

FIG. 22 is a diagram showing the internal configuration of the recognition unit 6.

The signals isr, ier, and stb1 from the voice section detection unit 4 are passed on unchanged to the buffer unit 3 as iee, iss, and stb2, respectively. The audio data Dk and clock clkd read from the buffer unit 3 in response to the iee, iss, and stb2 signals are transferred to the speech recognition unit 600, which recognizes the audio data Dk. During recognition, the speech recognition unit 600 refers to the speech templates stored in the speech template memory 601. The recognition result from the speech recognition unit 600 is obtained as a list of candidates ranked from first place down to several places. The candidates are transferred to the candidate selection/homophone selection unit 7.

Next, the operation of the second embodiment of the invention is described with reference to the operation flowchart of FIG. 23.

In this invention, the top recognition candidate is not displayed immediately after speech is input from microphone 1. Instead, the speech input is converted to digital form by input unit 2 and stored for the time being in buffer 3, and the display unit 8 indicates only that storage in buffer 3 is complete. That is, referring to the operation flowchart of FIG. 5, speech is input (step 1), and when input is complete an indication to that effect is displayed (step 2). This indication may be, for example, a blinking * mark on the display unit 8.

Next, when switch SW1 is pressed, the voice section detection unit 4 detects the most recent voice section (step 4). If speech is input again without SW1 having been pressed, however, the process does not proceed to step 4 but returns to step 1. Therefore, if the speaker mispronounces a word or coughs, for example, it suffices to leave switch SW1 unpressed, wait for the time THN that is sufficient for voice section determination, and then input the speech with the correct pronunciation. In FIG. 18, if the correct speech is input after the erroneous voice section (ウ), it becomes voice section (ア); when switch SW1 is then pressed, the voice section detection unit 4 sends (ア), the most recent voice section at that moment, to the recognition unit 6 (when βφ > THβ). Needless to say, if switch SW1 is pressed immediately after only a single utterance has been input, that single utterance is sent to the recognition unit 6 and recognized. When buffer 3 and the temporary memory 406 become full, the oldest input audio data can simply be overwritten in order.
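Overwriting the oldest data when the store is full is the behaviour of a ring buffer. A minimal Python sketch follows; the class name `AudioRingBuffer` and its methods are hypothetical, and `collections.deque` with `maxlen` stands in for the address wrap-around of a hardware memory.

```python
from collections import deque


class AudioRingBuffer:
    """Fixed-capacity sample store that overwrites the oldest data when full."""

    def __init__(self, capacity):
        self._samples = deque(maxlen=capacity)

    def write(self, sample):
        self._samples.append(sample)   # a full deque drops its oldest item

    def latest(self, count):
        """Return up to `count` of the most recent samples, oldest first."""
        if count <= 0:
            return []
        return list(self._samples)[-count:]
```

Because only the most recent voice section is ever sent for recognition, discarding the oldest samples first loses nothing the switch operation could still select, provided the capacity exceeds the longest expected utterance.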

The speech sent to the recognition unit 6 is compared with the dictionary, and candidates are output starting with the highest-priority one (step 5). The result is then displayed on the display unit 8 (step 6). If the displayed result is correct, the user makes the next voice input (step 7) and the process returns to step 1. If the result is a homophone with a different meaning and not the desired one, the user presses switch SW2. The next candidate is then displayed (step 8, step 6).

This operation is repeated; once the desired result is obtained, making the next voice input without pressing switch SW2 returns the process to step 1, and processing moves on to the next voice input.
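The candidate stepping of steps 6 and 8 amounts to indexing into a ranked list by the number of SW2 presses. A brief Python sketch, in which the function name `select_candidate` and the behaviour at the end of the list are assumptions not covered by the text:

```python
def select_candidate(candidates, sw2_presses):
    """Return the candidate shown after SW2 has been pressed sw2_presses times.

    candidates is ordered best first. Staying on the last candidate once the
    list is exhausted is an assumption; the text does not cover that case.
    """
    if not candidates:
        return None
    index = min(sw2_presses, len(candidates) - 1)
    return candidates[index]
```

The candidate on display when the next utterance begins is the one accepted, so no separate confirmation key is needed.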

As FIG. 24 shows by comparing the operation of the present invention with that of the conventional example, when utterance 1 and utterance 2 are to be input and noise, a mispronunciation, or a cough occurs in between, the present invention requires very few switch operations. FIG. 24 illustrates a case in which a customer requests a ticket for ABC Airlines Co., Ltd. flight ××× to Osaka and the operator at first mistakes the airline for EFG Airlines Co., Ltd. It shows how, by not operating the switch, the unnecessary phrases 「の」 ("of"), 「発」 ("departing"), 「ゆき」 ("bound for"), 「EFG」, 「失礼しました。」 ("Excuse me."), 「便ですね。」 ("flight, correct?"), 「お客さまのお名前をどうぞ。」 ("May I have your name, please?"), 「様ですね。」 ("Mr./Ms., correct?"), 「しばらくおまちください。」 ("Please wait a moment."), and so on are kept from being recognized.

〔Effect of the invention〕

According to this invention, rather than performing recognition and displaying the result immediately after voice input, the voice input is first stored in a buffer, and only the most recent voice section at the time the switch is operated is sent to the recognition unit for recognition. Consequently, after a mispronunciation or the like, the user can input the correct speech once a predetermined time has elapsed and then press the switch, so that only the correct speech is recognized. Input can therefore proceed without concern not only for mispronunciations but also for noise, coughs, and the like.

In addition, operating the switch for each utterance unit, such as a word or phrase, gives the input a regular timing and makes it easier to speak with each utterance unit clearly separated. As a result, two utterance units do not run together, and the misrecognition this would cause is eliminated.

[Brief description of the drawings]

FIG. 1 is a diagram explaining the principle of the present invention.
FIG. 2 is a configuration diagram of the first embodiment of the present invention.
FIG. 3 is a configuration example of the voice section detection unit in the first embodiment.
FIG. 4 is a curve of speech power.
FIG. 5 is an operation flowchart of the first embodiment.
FIG. 6 is an operation comparison diagram of the conventional example and the present invention.
FIG. 7 is a diagram explaining the principle of the second embodiment of the present invention.
FIG. 8 is a configuration diagram of the second embodiment of the present invention.
FIG. 9 is a configuration diagram of the third embodiment of the present invention.
FIG. 10 is a configuration diagram of the fourth embodiment of the present invention.
FIG. 11 is a configuration example of the input unit of the first to third embodiments.
FIG. 12 is a configuration example of the input unit of the fourth embodiment.
FIG. 13 is a timing diagram of the input unit of the fourth embodiment.
FIG. 14 is a configuration example of the buffer unit.
FIG. 15 is a configuration example of the voice section detection unit.
FIG. 16 is a configuration example of the power calculation unit.
FIG. 17 is a diagram explaining the clocks of the power calculation unit.
FIG. 18 is a diagram explaining the island detection state.
FIG. 19 is a state transition diagram of the island detection unit.
FIG. 20 is a configuration example of the island detection unit.
FIG. 21 is a configuration example of the determination unit.
FIG. 22 is a configuration example of the recognition unit.
FIG. 23 is a diagram explaining the operation of the second embodiment.
FIG. 24 is an operation comparison diagram of the second to fourth embodiments and the conventional example.
FIG. 25 is a configuration diagram of the conventional example.
FIG. 26 is an operation flowchart of the conventional example.

1: microphone; 2: input unit; 3: buffer; 4: voice section detection unit; 6: recognition unit; 7: candidate selection/homophone selection unit; 8: display unit.

Patent applicant: Fujitsu Ltd. Agent: patent attorney 山谷 晧榮

Claims

[Claims]

(1) A speech recognition device that analyzes input speech, extracts characteristic parts, and performs recognition by comparison with a dictionary, characterized in that it comprises a storage means (3) for temporarily storing the input speech and a recognition instruction means (SW) for instructing recognition of the speech stored in the storage means (3), and in that, when the speech to be recognized has been input and the recognition instruction means (SW) is operated, the speech input portion immediately preceding the operation is extracted and recognized.

(2) The speech recognition device according to claim 1, characterized in that a voice section detection means (4) is provided for extracting island regions from the speech stored in the storage means (3), and in that, when the recognition instruction means (SW) is operated, the island region immediately preceding the operation is extracted and recognized.

(3) The speech recognition device described, characterized in that a display means (8) is provided and the recognition result is displayed upon operation of the recognition instruction means (SW), so that recognition of the correctly input speech can be confirmed.

(4) The voice input document creation device according to claim 1 or 3, wherein the document is created based on the result confirmed by the display means (8).

(5) The speech recognition apparatus according to any one of claims 1, 2, and 3, characterized in that the recognition instruction means and the recognition candidate selection means are common.

(6) The speech recognition device according to any one of claims 1, 2, 3, and 5, characterized in that the operation of the input unit is temporarily stopped when a recognition candidate is selected.