JPH0749699A - Audio device responding to voice input - Google Patents

Audio device responding to voice input

Info

Publication number
JPH0749699A
JPH0749699A JP5231081A JP23108193A
Authority
JP
Japan
Prior art keywords
voice
cpu
words
memory
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP5231081A
Other languages
Japanese (ja)
Inventor
Nagamasa Nagano
長正 長野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to JP5231081A priority Critical patent/JPH0749699A/en
Publication of JPH0749699A publication Critical patent/JPH0749699A/en
Pending legal-status Critical Current

Landscapes

  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

PURPOSE: To obtain a device which recognizes words entered by voice and replies to them in spoken words. CONSTITUTION: A voice input is divided into components for each frequency band by plural filter circuits. The signal waves are converted to digital values by A-D converters 7 provided for each frequency band, and from these values the harmonic content peculiar to the voice is discriminated for voice recognition. The recognized voice is compared by a CPU 8 against the voice data of words registered beforehand in a memory 9, and the words of the input voice are identified. The CPU 8 then generates a detection signal, and a voice IC 10 replies aloud with words selected beforehand as a suitable answer to the input voice.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to the development of a device that identifies short utterances, such as everyday greetings and simple spoken instructions, commands, and questions, and that responds by replying in voice or by switching an electric circuit ON and OFF.

[0002]

[Prior Art] Research on human speech recognition has been carried out in a wide range of fields, such as the translation of spoken language by large computers and the development of voiceprint devices for identifying specific individuals by voice. However, research on identifying short phrases of only a few syllables, of the kind used in everyday conversation, with a simple device has lagged behind.

[0003]

[Problems to Be Solved by the Invention] Speech is difficult to identify because its vibration waveform varies in a complicated way within an extremely short time. Furthermore, a subtle change in the waveform can greatly change the meaning of a word, while waveforms that differ from person to person can carry the same meaning. These highly complex factors make recognition difficult.

[0004] The present invention aims to provide a device that identifies speech having such a complicated vibration waveform in a short time, by the method detailed below, and that immediately responds to it.

[0005]

[Means for Solving the Problems] The solution is described below with reference to the drawings. Examining the output voltage waveform of the microphone 1 when a person utters the vowels A, I, U, E, and O at a pitch of 330 Hz shows, as illustrated for A, I, and E in FIG. 1, that each waveform is a composite of harmonics up to roughly the fifth. A and O, and likewise I and U, closely resemble each other and are therefore omitted from the figure, but since the harmonic content of each vowel differs greatly, the vowels can be distinguished reliably. Because everyday conversation spans pitches of about 200 to 1000 Hz, it suffices to analyze the speech signal within the 200 to 5000 Hz frequency band.

[0006] Japanese is considered to be composed of five vowels combined with some consonants. For example, the sound "ka", spelled KA in Roman letters, is perceived as K followed immediately by A. Analyzing its waveform on a millisecond (ms) time axis shows that during the first ten-odd ms of onset the wave is a composite in which the A sound is deformed by the K sound, but most of what follows is the pure A sound. As becomes clear when "ka" is drawn out ("kaa"), the sound is not perceived as "ka" unless the A continues for a certain length of time. It follows that all 48 characters of the Japanese "iroha" syllabary are built from the five vowels, modified by about ten consonants only during an extremely short initial onset. Therefore, for words of a few syllables, such as short phrases like "ohayou" (good morning) or "tadaima" (I'm home), a certain degree of identification is possible merely by analyzing which vowels are contained and how many.
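As a toy illustration of this vowel-counting idea (using romanized spelling as a stand-in for the acoustic analysis, which is an assumption of this sketch, not the patent's method), a short word reduces to a vowel profile:

```python
def vowel_profile(romaji: str) -> dict:
    """Count each of the five Japanese vowels in a romanized word.

    Stands in for the acoustic analysis above: consonant onsets are
    ignored and only the vowel content of the word remains.
    """
    counts = {v: 0 for v in "aiueo"}
    for ch in romaji.lower():
        if ch in counts:
            counts[ch] += 1
    return counts

# Two short greetings from the text have clearly different profiles.
print(vowel_profile("ohayou"))    # vowels o, a, o, u
print(vowel_profile("tadaima"))   # vowels a, a, i, a
```

Even this crude profile already separates the two greetings, which is the "certain degree of identification" the paragraph above refers to.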

[0008] The identified word is compared by the CPU 8 with the words registered in its memory 9. If one of the registered words matches, the CPU 8 uses this output to operate the voice IC 10, which pronounces the word designated by the CPU 8 as a spoken reply to the input voice. This is the basic concept of the present invention.

[0009]

[Function and Embodiments] Next, the configuration of the present invention, an embodiment, and one example of experimental results are described. The voice input is converted into an electric signal (for example, the voice waveform of FIG. 1) by the microphone 1, amplified by the amplifier 2, and distributed as an impedance-matched signal wave to each subsequent element. The input signal wave delivered to each of the filter circuits 3, 4, and 5 is separated into the components of that filter's frequency band and then converted by the following analog-to-digital (A-D) converter 7 into a digital signal, yielding the digital outputs Dl, Db, and Dh. The gate 6 controls each A-D converter 7 so that it operates in synchronism with the input voice.

[0010] Table 1 shows an example of the experimental data. The microphone 1 is a commercially available electromagnetic type, the amplifier 2 is an amplifier with a logarithmic characteristic, and the filter circuits 3, 4, and 5 are active filters with crossover frequencies of 700 Hz and 1500 Hz and an attenuation slope of -18 dB/oct. The A-D converter 7 is realized with three commercially available frequency counters, embodying the circuit configuration of FIG. 2. The pitch of each word is approximately 250 Hz, and the duration of each is within one second.
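A minimal sketch of this counting scheme, assuming for illustration that each band filter passes a single pure component (the 250 Hz fundamental plus its 3rd and 5th harmonics) and modeling a frequency counter as a positive-going zero-crossing counter:

```python
import math

def count_cycles(samples):
    """Model a frequency counter: count positive-going zero crossings."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)

fs = 8000                        # sample rate in Hz, an assumed value
t = [n / fs for n in range(fs)]  # one second of samples

# Hypothetical outputs of the low / mid / high band filters for a word
# pitched at 250 Hz: the fundamental, a 3rd harmonic, and a 5th harmonic.
low  = [math.sin(2 * math.pi * 250 * x) for x in t]
mid  = [0.5 * math.sin(2 * math.pi * 750 * x) for x in t]
high = [0.2 * math.sin(2 * math.pi * 1250 * x) for x in t]

Dl, Db, Dh = count_cycles(low), count_cycles(mid), count_cycles(high)
print(Dl, Db, Dh)   # roughly 250, 750, 1250 counts over one second
```

The counters in the experiment are gated to the actual duration of each word, so the values in Table 1 (all below 200) come out much smaller than these full-second figures.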

[0011]

[Table 1]

[0012] Although the count values for each filter are all below 200, the differences between the words can still be discriminated adequately. However, when the experiment is repeated with several men and women and the values of Dl, Db, and Dh are collected, individual differences gradually grow and discrimination becomes difficult. If the amplifier 2 is given an even stronger logarithmic characteristic to raise the precision in the high-frequency band, and the harmonic component ratios Db/Dl and Dh/Dl are calculated, these values show very little scatter between individuals, and the discrimination accuracy improves markedly.
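The benefit of the ratio features can be illustrated with hypothetical count triples for the same word spoken by two people; the numbers below are invented, with speaker B simply scaled 20 % higher:

```python
# (Dl, Db, Dh) for one word; speaker B's counts are ~20 % higher overall.
speaker_a = (90, 110, 70)
speaker_b = (108, 132, 84)

def harmonic_ratios(dl, db, dh):
    """Component ratios used for speaker-independent matching."""
    return (db / dl, dh / dl, dh / db)

ra = harmonic_ratios(*speaker_a)
rb = harmonic_ratios(*speaker_b)
print(ra, rb)   # the raw counts differ, but the ratio triples coincide
```

A uniform change in loudness or duration scales all three counts together, so it cancels out of the ratios; that is why the ratios vary far less between individuals than the raw counts.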

[0013] The identified word, for example "tadaima" with measured values 90, 110, 70, 1.2, and 0.8, is compared by the CPU 8 against the words registered beforehand in the memory 9. The resulting address output is applied to the voice IC 10, which outputs as its reply the voice preset for that address, for example the audio signal for "okaerinasai" (welcome home). This output is uttered through the loudspeaker 11 at an adequate volume and becomes the reply.
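The lookup-and-reply step can be sketched as a nearest-match search over registered feature tuples. Everything below (the word list, the feature values other than the "tadaima" example, and the tolerance) is an illustrative assumption, not data from the patent:

```python
# Registered words: feature tuple (Dl, Db, Dh, Db/Dl, Dh/Dl) -> reply.
REGISTERED = {
    "tadaima": ((90, 110, 70, 1.2, 0.8), "okaerinasai"),
    "ohayou":  ((120, 95, 40, 0.8, 0.33), "ohayou gozaimasu"),
}

def reply_for(measured, tolerance=0.15):
    """Return the preset reply whose registered features lie closest to
    the measured ones, or None if nothing matches within the tolerance."""
    best_word, best_err = None, float("inf")
    for word, (feats, _reply) in REGISTERED.items():
        # Mean relative error over the feature tuple.
        err = sum(abs(m - f) / f for m, f in zip(measured, feats)) / len(feats)
        if err < best_err:
            best_word, best_err = word, err
    if best_err <= tolerance:
        return REGISTERED[best_word][1]
    return None

print(reply_for((92, 108, 72, 1.17, 0.78)))   # -> okaerinasai
```

In the device itself the match yields a memory address that is applied to the voice IC; the dictionary lookup here plays the role of that address output.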

[0014] Registration of the count data and related values in the memory 9 is performed with the controller 12 as follows. FIG. 3 shows, as an example, a simple 2-bit (four-key) configuration. Reference numeral 12 denotes the case body of the controller; the panel bearing the numbers 12-1 to 12-4 is a keyboard, and each of the four keys is connected in parallel to the CPU 8 and the voice IC 10 by the signal lines L1, L2, and L3. Below each key's number a title such as "ohayou" or "tadaima" is printed; these titles fix in advance the correspondence between the memory 9 and the voice IC 10. When the key 12-1 is pressed, the signal line L1 turns on, and number output 1 sets the voice IC so that, from among the many words held internally, it will output "ohayou gozaimasu", a word well suited as a reply to the title word "ohayou". When the experimenter then speaks "ohayou", the CPU 8 takes the values Dl, Db, and Dh obtained by discriminating this voice, computes Db/Dl, Dh/Dl, Dh/Db, and so on, and stores these values at the designated address 01 of the memory 9. Next, pressing the key 12-2 turns the signal line L2 on and sends number output 2; the voice IC 10 is set so that it will output "okaerinasai" as the reply to this key's title "tadaima". When the experimenter then speaks "tadaima", the CPU 8 stores the count data at the designated address 02 of the memory 9, just as for the key 12-1. The same operation is repeated for keys 12-3, 12-4, and so on for as many keys as there are, registering the data of each word in that person's own voice at the designated addresses of the memory 9.
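The two-phase registration procedure (press a key to fix the reply, then speak the word to store its features) can be sketched as follows; the class and method names are assumptions made for this sketch:

```python
class Controller:
    """Model of the key-driven registration of FIG. 3."""

    # Key number -> (title word, preset reply held by the voice IC).
    KEYS = {1: ("ohayou", "ohayou gozaimasu"),
            2: ("tadaima", "okaerinasai")}

    def __init__(self):
        self.memory = {}     # address (key number) -> feature tuple
        self.replies = {}    # address (key number) -> reply word
        self.pending = None  # key armed and awaiting a spoken sample

    def press_key(self, number):
        """Arm registration and fix the reply for this key's address."""
        _title, reply = self.KEYS[number]
        self.replies[number] = reply
        self.pending = number

    def register_voice(self, dl, db, dh):
        """Store the speaker's count data and ratios at the armed address."""
        assert self.pending is not None, "press a key first"
        self.memory[self.pending] = (dl, db, dh, db / dl, dh / dl, dh / db)
        self.pending = None

c = Controller()
c.press_key(2)                  # key 12-2: title "tadaima"
c.register_voice(90, 110, 70)   # speak the word; features stored at its address
print(c.memory[2], c.replies[2])
```

Repeating `press_key` / `register_voice` for each key corresponds to walking through keys 12-1 to 12-4 as described above.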

[0015] When the device is intended for an unspecified number of users, the memory 9 can be a read-only memory (ROM) holding the values Db/Dl, Dh/Dl, and Dh/Db, which are obtained as data common to all speakers; registration by each user then becomes unnecessary. To calculate component ratios such as Db/Dl and Dh/Dl, the A-D converter 7 needs a count width of 8 bits or more, but once the calculation is finished the values Dl through Dh can be reduced to one tenth, lowering the required processing power of the CPU 8 and the capacity of the memory 9.

[Effects of the Invention] Very many of the words people use daily in conversation, greetings, instructions, and the like consist of at most five or six syllables, and the appropriate responses are in most cases fixed by common convention. It is not at all unnatural to decide in advance that a call of "konnichiwa" (hello) is answered with the same "konnichiwa", or that a person returning home with "tadaima" is answered with "okaerinasai". Therefore, if the device of the present invention stores on the order of sixteen words in a 4-bit address space, it can be applied with great effect to the development of dolls that converse with people and of robots that answer commands and act as instructed.

[Brief Description of the Drawings]

[FIG. 1] Voice signal waveforms of the vowels A, I, and E.

[FIG. 2] A configuration diagram showing the principle of the device that identifies a voice and replies by voice in response.

[FIG. 3] A configuration diagram of the controller for registering voices in advance.

[Explanation of Symbols]

1 Microphone
2 Amplifier
3, 4, 5 Filters for each frequency band
6 Gate circuit
7 A-D converter
8 Microcomputer (CPU)
9 Memory
10 Voice IC
11 Loudspeaker
12 Controller

Claims (4)

[Claims]

[Claim 1] A voice device responding to voice input, characterized in that, in order to identify spoken words, the input voice signal wave is analyzed into signal-wave components for each band by a plurality of filter circuits (3, 4, 5) provided for separate frequency bands; the digital signal of each frequency band is then accumulated, by A-D converters (7) equal in number to the filter circuits, over the duration of the voice; and the voice is identified from the resulting count values Dl, Db, and Dh and from their ratios Db/Dl, Dh/Dl, and Dh/Db.
[Claim 2] A voice device responding to voice input, in which a microcomputer (CPU) (8) compares the count values and ratios of claim 1 for a spoken word (hereinafter, count data) against the count data of words registered beforehand in a memory (9); when a match is determined, the CPU (8) emits the corresponding address number of the memory (9), causing a voice-generating IC (hereinafter, voice IC) (10) to pronounce as a reply the word programmed as the most suitable response, or using this address output to control other, different devices.
[Claim 3] A voice device responding to voice input, provided with a controller (12) for registering the count data in the memory (9) in advance and for pre-programming the correspondence between the words held by the voice IC (10) and the CPU (8) address output described in claim 2.
[Claim 4] A voice device responding to voice input, in which the comparison test is performed using only the per-frequency-band ratios among the count data so as to utter the reply of claim 2, and in which the registration of the memory (9) and the program relating the voice IC (10) to the CPU (8) are completed during the factory production process.
JP5231081A 1993-08-07 1993-08-07 Audio device responding to voice input Pending JPH0749699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5231081A JPH0749699A (en) 1993-08-07 1993-08-07 Audio device responding to voice input

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5231081A JPH0749699A (en) 1993-08-07 1993-08-07 Audio device responding to voice input

Publications (1)

Publication Number Publication Date
JPH0749699A true JPH0749699A (en) 1995-02-21

Family

ID=16917993

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5231081A Pending JPH0749699A (en) 1993-08-07 1993-08-07 Audio device responding to voice input

Country Status (1)

Country Link
JP (1) JPH0749699A (en)

Similar Documents

Publication Publication Date Title
Ainsworth Duration as a cue in the recognition of synthetic vowels
Sroka et al. Human and machine consonant recognition
JP3968133B2 (en) Speech recognition dialogue processing method and speech recognition dialogue apparatus
JP4867804B2 (en) Voice recognition apparatus and conference system
Hollien et al. Speaker identification by long‐term spectra under normal and distorted speech conditions
Arslan et al. Frequency characteristics of foreign accented speech
DE60014583T2 (en) METHOD AND DEVICE FOR INTEGRITY TESTING OF USER INTERFACES OF VOICE CONTROLLED EQUIPMENT
JPS60247697A (en) Voice recognition responder
Davis et al. Tests of the perceptual magnet effect for American English/k/and/g
David Artificial auditory recognition in telephony
JPS60181798A (en) Voice recognition system
JPH0749699A (en) Audio device responding to voice input
JPS59137999A (en) Voice recognition equipment
US5774862A (en) Computer communication system
Arnold et al. The synthesis of English vowels
JPH04273298A (en) Voice recognition device
JPH04324499A (en) Speech recognition device
JP2980382B2 (en) Speaker adaptive speech recognition method and apparatus
JPS645320B2 (en)
JPS59111699A (en) Speaker recognition system
Zhao et al. A study on emotional feature recognition in speech
JPH0556519B2 (en)
Povel Development of a vowel corrector for the deaf
Massey Transients at stop‐consonant releases
Kannenberg et al. Speech intelligibility of two voice output communication aids