JPS59184940A

JPS59184940A - Voice word processor

Info

Publication number: JPS59184940A
Application number: JP58058874A
Authority: JP
Inventors: Hiroshi Hagane; 羽金　廣
Original assignee: NEC Corp; Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1983-04-04
Filing date: 1983-04-04
Publication date: 1984-10-20

Abstract

PURPOSE: To simplify the operating procedure and improve the recognition rate of a voice word processor by storing standard patterns sorted into four groups (basic monosyllables, voiced sounds, semi-voiced sounds, and contracted sounds) and selecting one group from the pattern memory when a voice is input. CONSTITUTION: The voice input from a microphone 1 undergoes frequency analysis in an analyzing part 2 and is delivered to an input pattern part 3. The unknown input voice pattern in the part 3 is compared, in a comparison processing part 5, with the standard voice patterns stored in a standard voice pattern memory part 4, and the result is sent to a deciding part 6 for the final decision. The part 4 stores the standard voice patterns sorted into four groups: basic monosyllables 40, voiced sounds 41, semi-voiced sounds 42, and contracted sounds 43. The group of standard voice patterns used for the comparison is designated by operating a keyboard (not shown in the diagram). When no key is operated, the patterns of memory 40 are delivered to the comparison processing part 5. When the voiced sound key is pushed, for example, its output signal 400 goes to the "1" level and the voiced sound patterns of memory 41 are delivered to the comparison processing part 5. This eliminates the unnaturalness of the voice input and improves the recognition rate of the voice word processor.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a voice word processor that accepts input by the utterance of monosyllables such as "i", "ro", and "ha".

In recent years, voice input has attracted attention as a new input method to replace keyboard input. As one of its applications it has been studied for use in word processors, and some products have been commercialized, but it has not yet reached full-scale practical use. The reason is that, compared with conventional keyboard input, pen-touch input, and the like, operation becomes more complicated and practical efficiency does not improve. As for the input speech, adding voiced sounds, semi-voiced sounds, contracted sounds, foreign-word sounds ("fi", "fo"), and so on to the 45 basic syllables such as "i", "ro", and "ha" gives more than 100 monosyllables, but a device that can recognize all of these monosyllables has not yet been put to practical use.

Voice word processors currently in practical use recognize only the 45 basic syllables such as "i", "ro", and "ha", excluding voiced sounds, semi-voiced sounds, contracted sounds, and the like, because the recognition rate drops if all monosyllables including those sounds are made recognition targets. Consequently, to input the voiced sound "ga", for example, the operator first utters "ka" and then immediately presses the voiced sound mark (dakuten) key, which converts "ka" into "ga". In another example, the operator likewise first utters "ka" and immediately afterwards utters a control word such as "dakuten", which converts "ka" into "ga". Both are ways of compensating for the fact that the voice word processor cannot recognize voiced sounds, but uttering "ka" in order to input "ga" is unnatural for the operator, so inputting voiced sounds and the like becomes a complicated operation.
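The conventional two-step procedure described above (recognize the basic syllable, then apply a voicing mark) can be illustrated with a minimal sketch. It is not taken from the source: Unicode NFC normalization is used here only as a modern stand-in for the conversion step, and the names `DAKUTEN`, `HANDAKUTEN`, and `apply_mark` are hypothetical.

```python
import unicodedata

# Combining marks used as a stand-in for the dakuten / handakuten key
# (assumption: the 1983 devices did not work this way internally).
DAKUTEN = "\u3099"     # combining voiced sound mark
HANDAKUTEN = "\u309a"  # combining semi-voiced sound mark


def apply_mark(base_kana: str, mark: str) -> str:
    """Convert an already recognized basic kana by applying a mark afterwards."""
    # NFC composes the base kana and the combining mark into one character.
    return unicodedata.normalize("NFC", base_kana + mark)


# The operator utters "ka", then presses the dakuten key to obtain "ga".
print(apply_mark("か", DAKUTEN))
# The same two-step procedure for a semi-voiced sound: "ha" becomes "pa".
print(apply_mark("は", HANDAKUTEN))
```

The sketch makes the drawback visible: the recognizer only ever sees the unvoiced syllable, so the operator must say something other than the syllable actually wanted.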

SUMMARY OF THE INVENTION An object of the present invention is to provide a voice word processor that eliminates the unnatural utterances required by conventional devices and is simpler to operate.

That is, according to the present invention, when "ga" is to be input, for example, the operator presses the voiced sound key immediately beforehand, which causes the voice word processor to perform recognition processing dedicated to voiced sounds, so that the operator can utter the voiced sound "ga" itself. The utterance is therefore far more natural than in the conventional method of uttering "ka". Furthermore, when a key such as the voiced sound key is pressed, the recognition result at that time is limited to the corresponding group, which prevents misrecognition as a syllable outside the group. The present invention thus makes it possible, at the current level of recognition technology, to provide a high-performance voice word processor that takes all of the hundred-odd monosyllables mentioned above as recognition targets.

The present invention will now be described in detail with reference to the drawings. FIG. 1 shows an embodiment of the present invention. In this figure, the audio signal from a microphone 1 undergoes frequency analysis in an analysis section 2 and is output to an input pattern section 3. The unknown input voice pattern in the input pattern section 3 is then compared, in a comparison processing section 5, with the standard voice patterns from a standard voice pattern storage section 4, in which they have been stored in advance. The result is sent to a determination section 6, which finally determines which monosyllable was uttered and outputs the result.
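The comparison and determination steps of FIG. 1 can be sketched as a nearest-template search. This is only an illustration under stated assumptions: the source does not specify the feature representation or the distance measure, so the three-element feature vectors, the Euclidean distance, and the names `distance` and `recognize` are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(input_pattern, standard_patterns):
    """Compare the input pattern with every standard pattern (section 5)
    and return the label of the closest one (section 6)."""
    return min(
        standard_patterns,
        key=lambda label: distance(input_pattern, standard_patterns[label]),
    )


# Made-up standard patterns standing in for storage section 4.
standard_patterns = {
    "i": [0.9, 0.1, 0.2],
    "ro": [0.2, 0.8, 0.3],
    "ha": [0.1, 0.3, 0.9],
}

# Made-up output of analysis section 2 for an unknown utterance.
unknown = [0.15, 0.75, 0.35]
print(recognize(unknown, standard_patterns))
```

Because the search runs over whatever dictionary is passed in, restricting recognition to one group of syllables amounts to passing a smaller set of standard patterns.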

FIG. 2 is an enlarged view of the keyboard section of the voice word processor. Keys corresponding to voiced sounds, semi-voiced sounds, and so on are provided.

Signals 400 to 402 are normally at the "0" level, and the "1" level is output only while the corresponding key is pressed.

FIG. 3 is a detailed view of the standard voice pattern storage section 4 of FIG. 1. Reference numeral 40 denotes the standard voice pattern storage section for the 45 basic syllables such as "i", "ro", and "ha"; 41, 42, and 43 denote the standard voice pattern storage sections for voiced sounds, semi-voiced sounds, and contracted sounds, respectively. Normally, when no key is pressed, signals 400 to 402 from FIG. 2 are all at the "0" level, so the standard voice patterns of the 45 syllables are output.

When the voiced sound key is pressed, signal 400 goes to the "1" level, so the standard patterns for voiced sounds are output. Similarly, when signal 401 or 402 is "1", the standard patterns for semi-voiced sounds or contracted sounds, respectively, are output.
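The selection of a storage section by signals 400 to 402 can be sketched as follows. The section numbers follow FIG. 3, but the syllable labels, the names `BANKS` and `select_bank`, and the handling of more than one active signal are assumptions; the source does not say what happens if several keys are pressed at once.

```python
# Section numbers follow FIG. 3; the syllable labels are illustrative only.
BANKS = {
    40: ["ka", "ha"],  # 45 basic syllables (two shown)
    41: ["ga", "ba"],  # voiced sounds
    42: ["pa"],        # semi-voiced sounds
    43: ["kya"],       # contracted sounds
}


def select_bank(sig_400: int, sig_401: int, sig_402: int) -> int:
    """Return the number of the storage section whose patterns are output."""
    signals = {41: sig_400, 42: sig_401, 43: sig_402}
    active = [bank for bank, level in signals.items() if level == 1]
    if len(active) > 1:
        # Assumption: simultaneous key presses are not described in the source.
        raise ValueError("more than one key signal is at level 1")
    # With all signals at level 0, the 45 basic syllables are used.
    return active[0] if active else 40


# No key pressed: candidates come from section 40.
print(select_bank(0, 0, 0), BANKS[select_bank(0, 0, 0)])
# Voiced sound key pressed: candidates are limited to section 41.
print(select_bank(1, 0, 0), BANKS[select_bank(1, 0, 0)])
```

Limiting the candidates to one section is what keeps a voiced syllable from being confused with syllables outside its group.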

Pattern matching has been described above as one embodiment of the present invention, but the present invention can naturally be applied to any other recognition method as well.

[Brief explanation of drawings]

FIG. 1 shows the recognition section of the voice word processor. FIG. 2 is an enlarged view of the keyboard section of the voice word processor. FIG. 3 is a detailed view of the standard voice pattern storage section. In the figures: 1, microphone; 2, audio signal analysis section; 3, input pattern section; 4, standard voice pattern storage section; 5, comparison processing section; 6, determination section; 40, standard voice (45 syllables) pattern storage section; 41, standard voice (voiced sound) pattern storage section; 42, standard voice (semi-voiced sound) pattern storage section; 43, standard voice (contracted sound) pattern storage section; 400, voiced sound key output signal; 401, semi-voiced sound key output signal; 402, contracted sound key output signal.

Claims

[Claims]

A voice word processor characterized by comprising a plurality of standard speech patterns corresponding to monosyllables, voiced sounds, semi-voiced sounds, and contracted sounds, and by performing recognition by selecting one of the plurality of standard speech patterns and comparing it with an input speech pattern.