JPH0749699A - Audio device responding to voice input - Google Patents

Audio device responding to voice input

Info

Publication number
JPH0749699A
JPH0749699A JP5231081A JP23108193A
Authority
JP
Japan
Prior art keywords
voice
cpu
words
memory
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP5231081A
Other languages
Japanese (ja)
Inventor
Nagamasa Nagano
長正 長野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to JP5231081A priority Critical patent/JPH0749699A/en
Publication of JPH0749699A publication Critical patent/JPH0749699A/en
Pending legal-status Critical Current

Landscapes

  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

PURPOSE: To obtain a device which recognizes words entered by voice and replies to them in spoken words. CONSTITUTION: A voice input is divided into components for each frequency band by plural filter circuits. The signal waves are converted to digital values by A-D converters 7 provided for each frequency band, and from these values the harmonic content peculiar to the voice is discriminated for voice recognition. The recognized voice is compared by a CPU 8 against the voice data of words registered beforehand in a memory 9, and the words of the input voice are identified. The CPU 8 then generates a detection signal, and a voice IC 10 replies aloud with words selected beforehand as a suitable answer to the input voice.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to the development of a device that identifies short utterances, such as everyday greetings and simple spoken instructions, commands, and questions, and that responds by replying in voice or by switching an electric circuit ON and OFF.

[0002]

[Prior Art] Research on human speech recognition has been carried out in a wide range of fields, such as the translation of spoken language by large computers and the development of voiceprint devices for identifying specific individuals by voice. However, research on identifying short phrases of only a few syllables, of the kind used in everyday conversation, with a simple device has lagged behind.

[0003]

[Problems to Be Solved by the Invention] Speech is difficult to identify because its vibration waveform varies in a complicated way within an extremely short time. Furthermore, a subtle change in the waveform can greatly change the meaning of a word, while waveforms that differ from person to person can carry the same meaning. These highly complex factors make recognition difficult.

[0004] The present invention aims to provide a device that identifies speech having such a complicated vibration waveform in a short time, by the method detailed below, and that immediately responds to it.

[0005]

[Means for Solving the Problems] The solution is described below with reference to the drawings. Examining the output voltage waveform of the microphone 1 when a person utters the vowels A, I, U, E, and O at a pitch of 330 Hz shows, as illustrated for A, I, and E in FIG. 1, that each waveform is a composite of harmonics up to roughly the fifth. A and O, and likewise I and U, closely resemble each other and are therefore omitted from the figure, but since the harmonic content of each vowel differs greatly, the vowels can be distinguished reliably. Because everyday conversation spans pitches of about 200 to 1000 Hz, it suffices to analyze the speech signal within the 200 to 5000 Hz frequency band.

[0006] Japanese is considered to be composed of five vowels combined with some consonants. For example, the sound "ka", spelled KA in Roman letters, is perceived as K followed immediately by A. Analyzing its waveform on a millisecond (ms) time axis shows that during the first ten-odd ms of onset the wave is a composite in which the A sound is deformed by the K sound, but most of what follows is the pure A sound. As becomes clear when "ka" is drawn out ("kaa"), the sound is not perceived as "ka" unless the A continues for a certain length of time. It follows that all 48 characters of the Japanese "iroha" syllabary are built from the five vowels, modified by about ten consonants only during an extremely short initial onset. Therefore, for words of a few syllables, such as short phrases like "ohayou" (good morning) or "tadaima" (I'm home), a certain degree of identification is possible merely by analyzing which vowels are contained and how many.
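As a toy illustration of this vowel-counting idea (using romanized spelling as a stand-in for the acoustic analysis, which is an assumption of this sketch, not the patent's method), a short word reduces to a vowel profile:

```python
def vowel_profile(romaji: str) -> dict:
    """Count each of the five Japanese vowels in a romanized word.

    Stands in for the acoustic analysis above: consonant onsets are
    ignored and only the vowel content of the word remains.
    """
    counts = {v: 0 for v in "aiueo"}
    for ch in romaji.lower():
        if ch in counts:
            counts[ch] += 1
    return counts

# Two short greetings from the text have clearly different profiles.
print(vowel_profile("ohayou"))    # vowels o, a, o, u
print(vowel_profile("tadaima"))   # vowels a, a, i, a
```

Even this crude profile already separates the two greetings, which is the "certain degree of identification" the paragraph above refers to.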

[0008] The identified word is compared by the CPU 8 with the words registered in its memory 9. If one of the registered words matches, the CPU 8 uses this output to operate the voice IC 10, which pronounces the word designated by the CPU 8 as a spoken reply to the input voice. This is the basic concept of the present invention.

[0009]

[Function and Embodiments] Next, the configuration of the present invention, an embodiment, and one example of experimental results are described. The voice input is converted into an electric signal (for example, the voice waveform of FIG. 1) by the microphone 1, amplified by the amplifier 2, and distributed as an impedance-matched signal wave to each subsequent element. The input signal wave delivered to each of the filter circuits 3, 4, and 5 is separated into the components of that filter's frequency band and then converted by the following analog-to-digital (A-D) converter 7 into a digital signal, yielding the digital outputs Dl, Db, and Dh. The gate 6 controls each A-D converter 7 so that it operates in synchronism with the input voice.

[0010] Table 1 shows an example of the experimental data. The microphone 1 is a commercially available electromagnetic type, the amplifier 2 is an amplifier with a logarithmic characteristic, and the filter circuits 3, 4, and 5 are active filters with crossover frequencies of 700 Hz and 1500 Hz and an attenuation slope of -18 dB/oct. The A-D converter 7 is realized with three commercially available frequency counters, embodying the circuit configuration of FIG. 2. The pitch of each word is approximately 250 Hz, and the duration of each is within one second.
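A minimal sketch of this counting scheme, assuming for illustration that each band filter passes a single pure component (the 250 Hz fundamental plus its 3rd and 5th harmonics) and modeling a frequency counter as a positive-going zero-crossing counter:

```python
import math

def count_cycles(samples):
    """Model a frequency counter: count positive-going zero crossings."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)

fs = 8000                        # sample rate in Hz, an assumed value
t = [n / fs for n in range(fs)]  # one second of samples

# Hypothetical outputs of the low / mid / high band filters for a word
# pitched at 250 Hz: the fundamental, a 3rd harmonic, and a 5th harmonic.
low  = [math.sin(2 * math.pi * 250 * x) for x in t]
mid  = [0.5 * math.sin(2 * math.pi * 750 * x) for x in t]
high = [0.2 * math.sin(2 * math.pi * 1250 * x) for x in t]

Dl, Db, Dh = count_cycles(low), count_cycles(mid), count_cycles(high)
print(Dl, Db, Dh)   # roughly 250, 750, 1250 counts over one second
```

The counters in the experiment are gated to the actual duration of each word, so the values in Table 1 (all below 200) come out much smaller than these full-second figures.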

[0011]

[Table 1]

[0012] Although the count values for each filter are all below 200, the differences between the words can still be discriminated adequately. However, when the experiment is repeated with several men and women and the values of Dl, Db, and Dh are collected, individual differences gradually grow and discrimination becomes difficult. If the amplifier 2 is given an even stronger logarithmic characteristic to raise the precision in the high-frequency band, and the harmonic component ratios Db/Dl and Dh/Dl are calculated, these values show very little scatter between individuals, and the discrimination accuracy improves markedly.
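The benefit of the ratio features can be illustrated with hypothetical count triples for the same word spoken by two people; the numbers below are invented, with speaker B simply scaled 20 % higher:

```python
# (Dl, Db, Dh) for one word; speaker B's counts are ~20 % higher overall.
speaker_a = (90, 110, 70)
speaker_b = (108, 132, 84)

def harmonic_ratios(dl, db, dh):
    """Component ratios used for speaker-independent matching."""
    return (db / dl, dh / dl, dh / db)

ra = harmonic_ratios(*speaker_a)
rb = harmonic_ratios(*speaker_b)
print(ra, rb)   # the raw counts differ, but the ratio triples coincide
```

A uniform change in loudness or duration scales all three counts together, so it cancels out of the ratios; that is why the ratios vary far less between individuals than the raw counts.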

[0013] The identified word, for example "tadaima" with measured values 90, 110, 70, 1.2, and 0.8, is compared by the CPU 8 against the words registered beforehand in the memory 9. The resulting address output is applied to the voice IC 10, which outputs as its reply the voice preset for that address, for example the audio signal for "okaerinasai" (welcome home). This output is uttered through the loudspeaker 11 at an adequate volume and becomes the reply.
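The lookup-and-reply step can be sketched as a nearest-match search over registered feature tuples. Everything below (the word list, the feature values other than the "tadaima" example, and the tolerance) is an illustrative assumption, not data from the patent:

```python
# Registered words: feature tuple (Dl, Db, Dh, Db/Dl, Dh/Dl) -> reply.
REGISTERED = {
    "tadaima": ((90, 110, 70, 1.2, 0.8), "okaerinasai"),
    "ohayou":  ((120, 95, 40, 0.8, 0.33), "ohayou gozaimasu"),
}

def reply_for(measured, tolerance=0.15):
    """Return the preset reply whose registered features lie closest to
    the measured ones, or None if nothing matches within the tolerance."""
    best_word, best_err = None, float("inf")
    for word, (feats, _reply) in REGISTERED.items():
        # Mean relative error over the feature tuple.
        err = sum(abs(m - f) / f for m, f in zip(measured, feats)) / len(feats)
        if err < best_err:
            best_word, best_err = word, err
    if best_err <= tolerance:
        return REGISTERED[best_word][1]
    return None

print(reply_for((92, 108, 72, 1.17, 0.78)))   # -> okaerinasai
```

In the device itself the match yields a memory address that is applied to the voice IC; the dictionary lookup here plays the role of that address output.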

[0014] Registration of the count data and related values in the memory 9 is performed with the controller 12 as follows. FIG. 3 shows, as an example, a simple 2-bit (four-key) configuration. Reference numeral 12 denotes the case body of the controller; the panel bearing the numbers 12-1 to 12-4 is a keyboard, and each of the four keys is connected in parallel to the CPU 8 and the voice IC 10 by the signal lines L1, L2, and L3. Below each key's number a title such as "ohayou" or "tadaima" is printed; these titles fix in advance the correspondence between the memory 9 and the voice IC 10. When the key 12-1 is pressed, the signal line L1 turns on, and number output 1 sets the voice IC so that, from among the many words held internally, it will output "ohayou gozaimasu", a word well suited as a reply to the title word "ohayou". When the experimenter then speaks "ohayou", the CPU 8 takes the values Dl, Db, and Dh obtained by discriminating this voice, computes Db/Dl, Dh/Dl, Dh/Db, and so on, and stores these values at the designated address 01 of the memory 9. Next, pressing the key 12-2 turns the signal line L2 on and sends number output 2; the voice IC 10 is set so that it will output "okaerinasai" as the reply to this key's title "tadaima". When the experimenter then speaks "tadaima", the CPU 8 stores the count data at the designated address 02 of the memory 9, just as for the key 12-1. The same operation is repeated for keys 12-3, 12-4, and so on for as many keys as there are, registering the data of each word in that person's own voice at the designated addresses of the memory 9.
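The two-phase registration procedure (press a key to fix the reply, then speak the word to store its features) can be sketched as follows; the class and method names are assumptions made for this sketch:

```python
class Controller:
    """Model of the key-driven registration of FIG. 3."""

    # Key number -> (title word, preset reply held by the voice IC).
    KEYS = {1: ("ohayou", "ohayou gozaimasu"),
            2: ("tadaima", "okaerinasai")}

    def __init__(self):
        self.memory = {}     # address (key number) -> feature tuple
        self.replies = {}    # address (key number) -> reply word
        self.pending = None  # key armed and awaiting a spoken sample

    def press_key(self, number):
        """Arm registration and fix the reply for this key's address."""
        _title, reply = self.KEYS[number]
        self.replies[number] = reply
        self.pending = number

    def register_voice(self, dl, db, dh):
        """Store the speaker's count data and ratios at the armed address."""
        assert self.pending is not None, "press a key first"
        self.memory[self.pending] = (dl, db, dh, db / dl, dh / dl, dh / db)
        self.pending = None

c = Controller()
c.press_key(2)                  # key 12-2: title "tadaima"
c.register_voice(90, 110, 70)   # speak the word; features stored at its address
print(c.memory[2], c.replies[2])
```

Repeating `press_key` / `register_voice` for each key corresponds to walking through keys 12-1 to 12-4 as described above.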

[0015] When the device is intended for an unspecified number of users, the memory 9 can be a read-only memory (ROM) holding the values Db/Dl, Dh/Dl, and Dh/Db, which are obtained as data common to all speakers; registration by each user then becomes unnecessary. To calculate component ratios such as Db/Dl and Dh/Dl, the A-D converter 7 needs a count width of 8 bits or more, but once the calculation is finished the values Dl through Dh can be reduced to one tenth, lowering the required processing power of the CPU 8 and the capacity of the memory 9.

[Effects of the Invention] Very many of the words people use daily in conversation, greetings, instructions, and the like consist of at most five or six syllables, and the appropriate responses are in most cases fixed by common convention. It is not at all unnatural to decide in advance that a call of "konnichiwa" (hello) is answered with the same "konnichiwa", or that a person returning home with "tadaima" is answered with "okaerinasai". Therefore, if the device of the present invention stores on the order of sixteen words in a 4-bit address space, it can be applied with great effect to the development of dolls that converse with people and of robots that answer commands and act as instructed.

[Brief Description of the Drawings]

[FIG. 1] Voice signal waveforms of the vowels A, I, and E.

[FIG. 2] A configuration diagram showing the principle of the device that identifies a voice and replies by voice in response.

[FIG. 3] A configuration diagram of the controller for registering voices in advance.

[Explanation of Symbols]

1 Microphone
2 Amplifier
3, 4, 5 Filters for each frequency band
6 Gate circuit
7 A-D converter
8 Microcomputer (CPU)
9 Memory
10 Voice IC
11 Loudspeaker
12 Controller

Claims (4)

[Claims]

[Claim 1] A voice device responding to voice input, characterized in that, in order to identify spoken words, the input voice signal wave is analyzed into signal-wave components for each band by a plurality of filter circuits (3, 4, 5) provided for separate frequency bands; the digital signal of each frequency band is then accumulated, by A-D converters (7) equal in number to the filter circuits, over the duration of the voice; and the voice is identified from the resulting count values Dl, Db, and Dh and from their ratios Db/Dl, Dh/Dl, and Dh/Db.
[Claim 2] A voice device responding to voice input, in which a microcomputer (CPU) (8) compares the count values and ratios of claim 1 for a spoken word (hereinafter, count data) against the count data of words registered beforehand in a memory (9); when a match is determined, the CPU (8) emits the corresponding address number of the memory (9), causing a voice-generating IC (hereinafter, voice IC) (10) to pronounce as a reply the word programmed as the most suitable response, or using this address output to control other, different devices.
[Claim 3] A voice device responding to voice input, provided with a controller (12) for registering the count data in the memory (9) in advance and for pre-programming the correspondence between the words held by the voice IC (10) and the CPU (8) address output described in claim 2.
[Claim 4] A voice device responding to voice input, in which the comparison test is performed using only the per-frequency-band ratios among the count data so as to utter the reply of claim 2, and in which the registration of the memory (9) and the program relating the voice IC (10) to the CPU (8) are completed during the factory production process.
JP5231081A 1993-08-07 1993-08-07 Audio device responding to voice input Pending JPH0749699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5231081A JPH0749699A (en) 1993-08-07 1993-08-07 Audio device responding to voice input

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5231081A JPH0749699A (en) 1993-08-07 1993-08-07 Audio device responding to voice input

Publications (1)

Publication Number Publication Date
JPH0749699A true JPH0749699A (en) 1995-02-21

Family

ID=16917993

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5231081A Pending JPH0749699A (en) 1993-08-07 1993-08-07 Audio device responding to voice input

Country Status (1)

Country Link
JP (1) JPH0749699A (en)

Similar Documents

Publication Publication Date Title
Ainsworth Duration as a cue in the recognition of synthetic vowels
Sroka et al. Human and machine consonant recognition
JP3968133B2 (en) Speech recognition dialogue processing method and speech recognition dialogue apparatus
JP4867804B2 (en) Voice recognition apparatus and conference system
Hollien et al. Speaker identification by long‐term spectra under normal and distorted speech conditions
Arslan et al. Frequency characteristics of foreign accented speech
DE60014583T2 (en) METHOD AND DEVICE FOR INTEGRITY TESTING OF USER INTERFACES OF VOICE CONTROLLED EQUIPMENT
JPS60247697A (en) Voice recognition responder
Davis et al. Tests of the perceptual magnet effect for American English/k/and/g
David Artificial auditory recognition in telephony
JPS60181798A (en) Voice recognition system
JPH0749699A (en) Audio device responding to voice input
JPS59137999A (en) Voice recognition equipment
US5774862A (en) Computer communication system
Arnold et al. The synthesis of English vowels
JPH04273298A (en) Voice recognition device
JPH04324499A (en) Speech recognition device
JP2980382B2 (en) Speaker adaptive speech recognition method and apparatus
JPS645320B2 (en)
JPS59111699A (en) Speaker recognition system
Zhao et al. A study on emotional feature recognition in speech
JPH0556519B2 (en)
Povel Development of a vowel corrector for the deaf
Massey Transients at stop‐consonant releases
Kannenberg et al. Speech intelligibility of two voice output communication aids