JPS6211732B2

Info

Publication number: JPS6211732B2
Application number: JP54077659A
Authority: JP
Inventors: Hidekazu Shiratori; Osamu Terao; Yasuo Sato; Junichi Ichikawa
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1979-06-20
Filing date: 1979-06-20
Publication date: 1987-03-14
Also published as: JPS562041A

Description

[Detailed Description of the Invention]

The present invention relates to a voice input method, and more particularly to a voice input method in which, when input speech is recognized, displayed, and entered into an input data storage section, a plurality of characters likely to correspond to the input speech are displayed in a predetermined priority order, and the character at the highest-priority position is automatically selected and stored when it is the desired one.

At present, data input devices include typewriters, keyboards, punching devices, and touch input devices. To use these devices proficiently, however, an operator needs considerable training, and even a trained operator becomes increasingly fatigued when operating a device continuously for a long time.

Using input devices such as those described above to convert the data to be entered into a written or punched form that a data processing device can recognize thus usually imposes a heavy burden.

It would therefore be very convenient if the data to be entered into a data processing device could be input directly in spoken form, without the above conversion.

Research on such voice input devices has produced considerable results, and several devices that accept input data as words uttered with pauses between them have already been realized.

In conventional devices of this type, however, words whose sound characteristics resemble one another are difficult to tell apart, so the recognition rate drops and the devices could not be called practical. In particular, where nearly every combination of phonemes occurs, as with kana characters, achieving a speech recognition rate close to 100% is extremely difficult.

To address this problem, the applicant previously proposed, in Japanese Unexamined Patent Publication No. 53-77402, a method in which, for kana characters for example, several character candidates are displayed and the correct one is selected from among them.

The present invention takes the invention of the above application a step further. Its object is to provide a voice input method in which, when a plurality of characters likely to correspond to the recognized speech are displayed, they are arranged in priority order, and only the character at the highest-priority position is automatically selected and stored in the data storage section.

The voice input method according to the present invention comprises analysis and extraction means for analyzing a voice input and extracting its features, a dictionary section storing a large amount of information for recognizing the voice input, means for recognizing the voice input on the basis of information from the analysis and extraction means and the dictionary section, and display means for displaying a plurality of recognized candidates corresponding to the voice input from the recognition means. The method is characterized in that priorities are assigned to the display positions of the display means and, when the candidate displayed at the highest-priority display position is the desired one, that candidate is automatically selected and stored when the next voice input is uttered.

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 shows the configuration of an embodiment of the voice input method of the present invention, and FIG. 2 shows an operation flowchart for FIG. 1.

In FIG. 1, reference numeral 1 denotes a microphone through which the user supplies the voice data to be recognized. Reference numeral 2 denotes an operation section provided with operation keys K₁, K₂, K₃, and K₄, by which the character candidates (candidates ... including characters) displayed on the display section described later can be selected automatically or manually. Reference numeral 3 denotes a dictionary section, which stores a large amount of information corresponding to input speech, for example "a, i, u, e, o, ka, ki, ku, ... n, ga, gi, gu, ge, go, za, ji, ... pa, pi, ... po". Reference numeral 4 denotes a frequency analysis section, which analyzes the spectrum of the input speech, and 5 denotes a feature extraction section, which extracts the features of the speech. Reference numeral 6 denotes a comparison/judgment section, which compares the speech information from the feature extraction section 5 with the speech information from the dictionary section 3 and makes a judgment. Reference numeral 7 denotes a control section, which issues commands to the frequency analysis section 4 and the feature extraction section 5 and controls the comparison and judgment operation of the comparison/judgment section 6. Reference numeral 8 denotes a display section, which displays the output of the comparison/judgment section 6 in accordance with commands from the control section 7. Reference numeral 9 denotes an input data storage section, which temporarily stores the data (characters and numerals) corresponding to correctly recognized voice input.
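The roles of the frequency analysis section 4 and the feature extraction section 5 can be illustrated with a minimal Python sketch. The description does not say how the spectrum is computed or which features are extracted, so the discrete Fourier transform, the equal-width band energies, and the names `magnitude_spectrum` and `band_energies` are all assumptions made for illustration only.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Frequency analysis (section 4): DFT magnitudes for bins 0..N/2 (assumed method)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc))
    return spectrum


def band_energies(spectrum, n_bands=4):
    """Feature extraction (section 5): energy per equal-width band (assumed features)."""
    size = max(1, len(spectrum) // n_bands)
    feats = []
    for b in range(n_bands):
        lo = b * size
        # The last band absorbs any leftover bins.
        hi = len(spectrum) if b == n_bands - 1 else lo + size
        feats.append(sum(m * m for m in spectrum[lo:hi]))
    return feats


# A pure tone with 8 cycles over 64 samples stands in for the microphone signal.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
spec = magnitude_spectrum(tone)
features = band_energies(spec)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
```

In this sketch the feature vector `features` is what would be handed to the comparison/judgment section 6; a practical recognizer would of course use short-time frames and a richer feature set.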

Next, the operation of the voice input method shown in FIG. 1 will be described with reference to the operation flowchart of FIG. 2.

As an example, consider the case where the word "Kawasaki" is spoken into the microphone 1 and recognized. First, the user utters "ka". The voice signal from the microphone 1 is supplied to the frequency analysis section 4, where the spectrum of the uttered sound is analyzed electrically. The analyzed output signal is sent to the feature extraction section 5, which extracts the features of the sound "ka" from the analysis result of the frequency analysis section 4 and supplies the resulting output signal to one input of the comparison/judgment section 6. As noted above, the comparison/judgment section 6 is connected to the dictionary section 3, which stores information corresponding to unvoiced, voiced, and semi-voiced sounds such as "a, i, u, ... n, ga, gi, gu, ..., pa, pi, ...". These items of information are read out one after another and compared in turn with the information from the feature extraction section 5. Each time an item that may be the input sound is found, it is displayed via the control section 7 at one of the display positions D₁, D₂, D₃, and D₄ on the display section 8. In the illustrated embodiment, four character candidates that may, with some likelihood, be the input sound are displayed, for example "ka", "ta", "pa", and "ha".
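The comparison against the dictionary section 3 and the selection of four candidates can be sketched as follows. This is only an illustration under stated assumptions: the description does not give the similarity measure, so Euclidean distance is used here, and the feature vectors in `DICTIONARY` and the function name `rank_candidates` are hypothetical.

```python
import math

# Hypothetical dictionary (section 3): syllable -> reference feature vector.
# The numeric values are made up for illustration.
DICTIONARY = {
    "ka": [1.0, 0.2, 0.1],
    "ta": [0.9, 0.3, 0.1],
    "pa": [0.8, 0.4, 0.2],
    "ha": [0.7, 0.2, 0.3],
    "wa": [0.1, 0.9, 0.5],
    "sa": [0.2, 0.1, 0.9],
}


def rank_candidates(features, dictionary=DICTIONARY, n_positions=4):
    """Comparison/judgment (section 6): the n closest entries, most likely first."""
    scored = sorted(dictionary.items(),
                    key=lambda item: math.dist(features, item[1]))
    return [syllable for syllable, _ in scored[:n_positions]]


# Features extracted from an utterance of "ka" (assumed values).
display = rank_candidates([1.0, 0.2, 0.1])
# display[0] goes to D1, display[1] to D2, and so on.
```

Sorting all entries before display is a simplification; the text describes candidates being shown as they are found, with the ordering by likelihood stated for the display positions.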

Here, display position D₁ is given the highest priority among the display positions D₁, D₂, D₃, and D₄ of the display section 8. If the character displayed at this position is the correct answer, the user simply utters the next syllable, "wa". When the utterance of "wa" has ended and the candidates for it are sent from the comparison/judgment section 6, the control section 7 automatically stores the character that was displayed at display position D₁ in the input data storage section 9 as the first item of input data. The candidates sent for the utterance "wa" are then displayed at display positions D₁, D₂, D₃, and D₄ in order of likelihood.

If, however, the character displayed at display position D₁, where the most probable character is supposed to appear, is not the correct answer and the correct answer appears at one of the lower-priority positions D₂ to D₄, the user can store it in the input data storage section 9 by pressing whichever of the keys K₂, K₃, and K₄ corresponds to the position where the correct character is displayed.

The same operations are carried out for "sa" and "ki" to recognize those sounds.

If the correctly recognized character is not displayed at any of D₁ to D₄, the user presses the cancel key K₀ to cancel the display and utters the same sound again, repeating the above operation until it is recognized correctly.
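The three selection behaviors described above (automatic storage of the D₁ candidate when the next sound is uttered, manual selection with a key, and cancellation with K₀) can be sketched as a small state machine. The class and method names are hypothetical, the candidate lists are invented, and mapping K₁ to D₁ for storing the final character is an assumption; the text states only the correspondence of K₂ to K₄ with D₂ to D₄.

```python
class VoiceInputSession:
    """Sketch of the control section (7) selection logic; names are illustrative."""

    # Assumption: K1 selects D1 explicitly (e.g. for the last character).
    KEY_TO_POSITION = {"K1": 0, "K2": 1, "K3": 2, "K4": 3}

    def __init__(self):
        self.displayed = []  # candidates at D1..D4, most likely first
        self.stored = []     # input data storage section (9)

    def on_candidates(self, candidates):
        """Candidates for a newly uttered sound arrive from section 6."""
        if self.displayed:
            # Uttering the next sound accepts whatever is still shown at D1.
            self.stored.append(self.displayed[0])
        self.displayed = list(candidates)

    def on_key(self, key):
        if key == "K0":
            # Cancel: clear the display; the user repeats the same sound.
            self.displayed = []
        elif key in self.KEY_TO_POSITION:
            pos = self.KEY_TO_POSITION[key]
            if pos < len(self.displayed):
                self.stored.append(self.displayed[pos])
                self.displayed = []


session = VoiceInputSession()
session.on_candidates(["ka", "ta", "pa", "ha"])    # "ka" is correct at D1
session.on_candidates(["ba", "wa", "ma", "na"])    # uttering "wa" stores "ka"
session.on_key("K2")                               # "wa" is at D2
session.on_candidates(["ta", "na", "ra", "ya"])    # "sa" not shown anywhere
session.on_key("K0")                               # cancel and repeat
session.on_candidates(["sa", "ta", "ha", "a"])     # "sa" now at D1
session.on_candidates(["ki", "chi", "hi", "shi"])  # uttering "ki" stores "sa"
session.on_key("K1")                               # store the last character (assumed)
```

After this sequence `session.stored` holds the four syllables of "Kawasaki" in order, with only three key presses for four characters.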

The above recognition process can be summarized with the flowchart of FIG. 2. After the device starts operating in step (1), the user utters one character (for example, "ka") into the microphone 1 in step (2). As a result of the recognition operation described above, step (3) determines whether the correct character has been displayed at display position D₁ of the display section 8. If "YES", the process jumps to step (5) and the next sound is input. If "NO", the selection key is operated in step (4) and the above process is repeated.

When all of the input speech has been correctly recognized, the "YES" path is taken at step (5) and the operation ends at step (6).

As described above, in the present invention, when the correct answer is displayed at the display position given the highest priority, uttering the next syllable automatically selects it as the correctly recognized character and temporarily stores it in the data storage section. When the correct answer is displayed at any other display position, it can be selected manually with the keys and stored in the data storage section.

Therefore, when the correct answer is displayed at the highest-priority display position and is selected automatically, the user need only go on uttering one sound after another, so the input operation is greatly lightened and the speed of voice input and recognition can be greatly increased.

[Brief Description of the Drawings]

FIG. 1 shows the configuration of an embodiment of the present invention, and FIG. 2 shows an operation flowchart of the embodiment of FIG. 1. In the figures, 1 denotes a microphone, 2 an operation section, 3 a dictionary section, 4 a frequency analysis section, 5 a feature extraction section, 6 a comparison/judgment section, 7 a control section, 8 a display section, and 9 an input data storage section.

Claims

[Claims]

1. A voice input method comprising: analysis and extraction means for analyzing a voice input and extracting its features; a dictionary section storing a large amount of information for recognizing the voice input; means for recognizing the voice input on the basis of information from the analysis and extraction means and the dictionary section; and display means for displaying a plurality of recognized candidates corresponding to the voice input from the recognition means, characterized in that priorities are assigned to the display positions of the display means, and when the candidate displayed at the highest-priority display position is the desired one, that candidate is automatically selected and stored by uttering the next voice input.