JPH0493980A

JPH0493980A - Language learning system

Info

Publication number: JPH0493980A
Application number: JP20978690A
Authority: JP
Inventors: Takeshige Fujitani; 藤谷　武茂
Original assignee: Individual
Current assignee: Individual
Priority date: 1990-08-06
Filing date: 1990-08-06
Publication date: 1992-03-26

Abstract

PURPOSE: To let a learner visually distinguish a standard pronunciation from the learner's own voiced pronunciation, and learn a language from their similarity, by spectrum-analyzing each pronunciation, generating voice formant signals, and displaying them on a monitor or the like, one over the other. CONSTITUTION: When the learner inputs his or her voiced pronunciation through a microphone 24, the signal is first amplified by an amplifier 25 and sent to an LPF switch 7. A spectrum analyzer 8 detects its power spectrum and generates a voiced-pronunciation formant signal, which passes through a learner voiced-pronunciation storage circuit 17 and is mixed by a mixing circuit 21, so that its image is displayed on the monitor superimposed on the character and symbol frame of the video channel and on the standard-pronunciation formant signal. The displayed shape is generated according to the sampling frequency signal sent to a control circuit 14. The learner then judges visually whether the standard-pronunciation formant signal and the voiced-pronunciation formant signal are similar, that is, whether the learner is pronouncing correctly according to the standard pronunciation.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a language learning system. More specifically, it relates to a language learning system in which the standard pronunciation given by a teacher or the like and the pronunciation uttered by a learner are each displayed on a monitor as a visually recognizable shape signal, so that the learner can study a language by visually recognizing whether the two signals match or not (are similar or dissimilar).

[Conventional technology]

Conventionally, a language learning system consists of a text and an auditory device, such as a tape recorder, that plays back a tape or disc on which a teacher or the like has recorded the content of the text as the standard pronunciation. Referring to the text, the learner repeatedly plays back the recorded standard pronunciation and grasps it by ear, then speaks in imitation of it, records and plays back that utterance on the same device, and learns by comparing the standard pronunciation with the played-back recording of his or her own pronunciation.

Such a learning system has the advantage that learners can study in their free time and as needed.

Languages differ from one ethnic group to another, and each group has built up its own spoken language. All spoken languages, however, are a means of conveying human self-expression: when the need arises, the content to be uttered is worked out by logical thinking on the basis of memories recalled to the cerebral center, and takes the form of words as speech. Spoken language is not a mere string of characters; it also requires information such as stress and flexibility. Speech enters the speaker's own ears the moment it is uttered, undergoes the same processing as when listening to someone else's voice, and is checked against the words still held in short-term memory. Through this check the speaker examines, half unconsciously, whether the words just uttered were articulated correctly and controls the movements of the mouth. The speaker also watches the listener's attitude and behavior, notices mistakes, and rephrases, so as to utter the language correctly.

Conversely, recognition of speech uttered by another person begins when the speech enters the ear. The speech vibrates the eardrum as a sound wave; the vibration is observed as electrical pulses of the primary neurons, is sent to the auditory neural network, is processed, and changes form as it travels to the part of the temporal lobe of the cerebrum known as the auditory center. When the speech is a word, the cerebrum compares it as input information with the words in memory, and if a close one is found in memory, that word is taken to have been heard. If no close one is in memory, the input information is broken down, each part is checked against memory, and a word combining the similar parts is taken to have been input. Thus unknown words cannot be recognized, whereas well-known words are recognized easily. Moreover, if the situation leads the listener to expect a certain word, the listener takes the expected word to have been heard even if the sound differs slightly; hearing what one expects is said to be the essence of human speech recognition.

How, then, is language learning carried out in the first place? The way humans learn words can be understood by considering how an infant is taught words for the first time. The mother, acting as the teacher, touches the infant's body and, while repeating a pronunciation herself, prompts the infant to vocalize; the infant senses from the mother's expression and gestures what it is supposed to say, and vocalizes. The mother encourages repeated practice until she judges the infant's pronunciation to be the same as her own, and the infant watches the movement of the mother's mouth and tries to produce the same pronunciation as the speech it hears. When the pronunciations become the same, the mother strongly expresses her delight through words, emotion, and gesture; the infant, seeing the mother's expression, stores this sound firmly in memory and from then on pronounces it with ease. This is regarded as the starting point of how humans learn words.

Applying the above to language learning: language learning begins with the learner imitating a pronunciation taken as the standard. The learner practices the pronunciation repeatedly until it is recognized as similar to the standard pronunciation, and once it is so recognized, that pronunciation is firmly stored in the brain as the standard pronunciation. Ideally a teacher would perform the comparison and recognition of pronunciations, but in general learners choose a method of study that uses teaching materials consisting of the language learning system described above.

[Problem to be solved by the invention]

In such a language learning system, however, the judgment of whether the learner's pronunciation matches (is similar to) the standard pronunciation is left to the learner's own hearing, so the discrimination is usually inadequate and the learner cannot gauge the progress of study. That is, conventional teaching materials provide no means of comparing an utterance with the standard pronunciation, so a learner who lacks the discriminating ability to begin with takes a long time to master the standard pronunciation and is prone to the error of accepting an inaccurate pronunciation as the standard one. Consequently, even if the learner can pronounce an approximation of the word, listening ability does not improve, progress is slow, and study is often abandoned midway.

In view of the above, the present inventor noted that if the learning equipment itself were given the ability to compare an utterance with the standard pronunciation, the learner might be able to store the correct standard pronunciation in the brain and acquire it without strain, just as an infant learns words, and accordingly examined conventional language learning materials.

Conventional teaching materials act only on the learner's hearing, reflecting the notion that "sound" is "something heard with the ears." The inventor found that, as long as it is perceived by hearing, a spoken pronunciation is transient; that the ability to compare and discriminate spoken pronunciations differs from person to person; and that it is therefore unreasonable to require learners to compare and discriminate pronunciations by hearing, which varies between individuals.

The inventor therefore asked whether any of the five human senses other than hearing could be used to discriminate speech, and studied a method that uses vision. Vision, like hearing, varies between individuals, but the ability to discriminate shapes can be said to vary relatively little. It follows that if a spoken pronunciation can be expressed as a shape, both the standard pronunciation and the learner's pronunciation can be expressed as shapes. Once pronunciations are expressed as shapes, the comparison and discrimination that was impracticable by ear becomes, by eye, something any learner can do easily and accurately.

What means, then, is suitable for comparing and discriminating pronunciations visually? First, a vocalized pronunciation has pitch. When a person changes the pitch of an utterance, the fundamental frequency changes, yet within a certain range the utterance is heard as the same sound. This is because, when a sound is uttered, the vocal organs from the glottis to the tip of the lips hardly change even when the pitch is changed. The passage from the glottis to the lips is called the vocal tract. The vocal tract has a shape, and as long as that shape is the same, the speech information heard by the human ear sounds the same whatever sound the source produces. The exchange of speech information by ear starts from speakers making the shapes of their vocal tracts the same. The shape of the vocal tract corresponds to its frequency characteristics, so a vocalized pronunciation that has come arbitrarily close to the standard pronunciation can be said to have the same vocal tract shape, and hence the same vocal tract frequency characteristics.

Accordingly, by detecting the frequency characteristics of the vocal tract, speech information can be compared and speech can be recognized. Speech recognition compares and discriminates speech information, but because it must compare highly individual speech against a standard pronunciation, complete speech recognition has not yet been achieved.

In learning, however, the learner's purpose in repeatedly imitating the standard pronunciation is to make his or her pronunciation similar to it. The inventor therefore found that a vocalized pronunciation that has come arbitrarily close to the standard pronunciation can be compared with the standard pronunciation and identified as a similar pronunciation by detecting the shape of the vocal tract, that is, the frequency characteristics of the vocal tract, and that, in the learning setting alone, complete speech recognition is possible.

The pronunciation is turned into a shape by detecting the frequency characteristics (formants) described above. The standard pronunciation waveform is displayed as a still image on a television monitor screen as the standard pattern; the learner's vocalized pronunciation is processed in the same way, and its waveform is displayed as the utterance pattern superimposed on the standard pattern, so that the learner can visually compare and recognize the similarity between the two patterns on the monitor. The standard pattern is kept on display for as long as desired. Each time the learner speaks, the learner compares the displayed utterance pattern with the standard pattern and repeats the pronunciation until the utterance pattern resembles the standard pattern. The inventor established that the standard pronunciation can be learned by bringing the utterance pattern into agreement with the standard pattern.
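
The overlaid display described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `overlay_patterns`, the character grid standing in for the television monitor, and the normalization of pattern values to the range 0 to 1 are all hypothetical and do not come from the patent.

```python
def overlay_patterns(standard, learner, height=8):
    """Render two equal-length patterns (values in 0..1) on one character grid.

    'S' marks the standard pattern, 'L' the learner's pattern, and '*' a cell
    where both fall in the same row, i.e. where the two shapes coincide.
    """
    if len(standard) != len(learner):
        raise ValueError("patterns must have the same number of points")
    rows = [[" "] * len(standard) for _ in range(height)]
    for x, (s_val, l_val) in enumerate(zip(standard, learner)):
        # Map a 0..1 value to a row index; row 0 is the top of the screen.
        s_row = height - 1 - min(height - 1, int(s_val * height))
        l_row = height - 1 - min(height - 1, int(l_val * height))
        if s_row == l_row:
            rows[s_row][x] = "*"
        else:
            rows[s_row][x] = "S"
            rows[l_row][x] = "L"
    return ["".join(row) for row in rows]


# Hypothetical patterns: the learner matches the first two points only.
for line in overlay_patterns([0.0, 0.5, 1.0], [0.0, 0.5, 0.0], height=4):
    print(repr(line))
```

The more columns show `*`, the closer the learner's pattern is to the standard pattern, which is the visual cue the learner is meant to act on.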

The present invention was devised to address the points described above. Its object is to provide a language learning system in which the standard pronunciation and the pronunciation uttered by the learner are converted into shapes so that the two can be discriminated visually and the language can be learned on the basis of their similarity.

[Means to solve the problem]

As a means of solving the above problem, the language learning system of the present invention comprises: standard pronunciation visual recognition display means, which spectrum-analyzes the standard pronunciation to detect its power spectrum, extracts from the power spectrum feature parameters related to phonemic quality to obtain a standard pronunciation shaping signal, and displays that signal for visual recognition; learner pronunciation visual recognition display means, which spectrum-analyzes the learner's vocalized pronunciation to detect its power spectrum, extracts from the power spectrum feature parameters related to phonemic quality to obtain a learner pronunciation shaping signal, and displays that signal for visual recognition; and signal comparison means for visually comparing the displayed standard pronunciation shaping signal with the displayed learner pronunciation shaping signal. Language learning is carried out by grasping, through the signal comparison means, the degree of similarity between the two signals.
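
The two analysis steps named above, detecting a power spectrum and then extracting feature parameters from it, can be sketched as follows. This is a minimal illustration, not the patented analyzer: it uses a naive discrete Fourier transform and simple peak picking, and the function names `power_spectrum` and `spectral_peaks`, the 8000 Hz sample rate, and the test tones are assumptions. Picking the strongest spectral peaks only approximates formant extraction.

```python
import cmath
import math


def power_spectrum(samples):
    """Return the power |X[k]|^2 of each DFT bin k = 0 .. N/2 (naive O(N^2) DFT)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_peaks(spectrum, sample_rate, n_samples, count=2):
    """Return the frequencies (Hz) of the `count` strongest local maxima, ascending."""
    peaks = [
        (spectrum[k], k)
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] > spectrum[k + 1]
    ]
    strongest = sorted(peaks, reverse=True)[:count]
    return sorted(k * sample_rate / n_samples for _, k in strongest)


# Hypothetical test signal: two tones standing in for two resonances.
SAMPLE_RATE, N = 8000, 80
signal = [
    math.sin(2 * math.pi * 700 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1200 * t / SAMPLE_RATE)
    for t in range(N)
]
print(spectral_peaks(power_spectrum(signal), SAMPLE_RATE, N))
```

Running the same two functions on the standard pronunciation and on the learner's utterance yields two comparable sets of feature values, which is the property the system relies on.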

In the above configuration, the language learning system of the present invention may, as required, use a common spectrum analyzer to spectrum-analyze both the standard pronunciation and the learner's vocalized pronunciation, detecting the power spectrum of each and extracting the feature parameters related to phonemic quality. The standard pronunciation shaping signal and the learner pronunciation shaping signal may also be displayed as images on a monitor and superimposed so that the signals can be compared visually. The result of the signal comparison means comparing the two shaping signals, namely match or mismatch, or similarity or dissimilarity, may be indicated visually by a light-emitting element such as an LED or a lamp. Furthermore, a standard pronunciation recording medium may be created by recording a spoken pronunciation of a word or sentence together with a word-length detection signal that specifies the length of that word or sentence, and this recording medium may be used as the standard pronunciation. The standard pronunciation recording medium may be a video disc: character and symbol frames, such as the spelling and phonetic symbols of the standard pronunciation, are recorded together with video on the video channel of the disc; audio such as explanatory speech and standard pronunciation speech corresponding to the video is recorded on one of the audio channels; and the standard pronunciation for repetition and the word-length detection signal are recorded on the other audio channel.
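
The optional match indicator mentioned above (an LED or lamp showing similarity or dissimilarity) could be driven by a numeric comparison of the two shaping signals. The patent does not say how that comparison is computed, so the following is purely a sketch under assumptions: cosine similarity as the measure, a threshold of 0.9, and the function names `similarity` and `led_state` are all hypothetical.

```python
import math


def similarity(standard, learner):
    """Cosine similarity of two equal-length, non-negative feature patterns."""
    if len(standard) != len(learner):
        raise ValueError("patterns must have the same length")
    dot = sum(s * l for s, l in zip(standard, learner))
    norm = math.sqrt(sum(s * s for s in standard)) * math.sqrt(
        sum(l * l for l in learner)
    )
    # An all-zero pattern carries no shape, so treat it as dissimilar.
    return dot / norm if norm else 0.0


def led_state(standard, learner, threshold=0.9):
    """Return True (LED on, 'similar') when the similarity reaches the threshold."""
    return similarity(standard, learner) >= threshold


# Hypothetical band-power patterns for one utterance.
print(led_state([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same shape, different loudness
print(led_state([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # energy in different bands
```

Cosine similarity ignores overall loudness and responds only to the shape of the pattern, which seems in keeping with the document's emphasis on shape comparison; other measures would serve equally well in a sketch like this.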

[Effect]

With the above configuration, the language learning system of the present invention operates as follows. Using a standard pronunciation recording medium, such as a video disc on which a standard pronunciation prepared by a teacher or the like is recorded, or a standard pronunciation that is input directly, the standard pronunciation visual recognition display means performs spectrum analysis to obtain the frequency characteristics (formants) of the vocal tract and displays this waveform on the monitor. The learner then speaks in imitation of the standard pronunciation, and the learner pronunciation visual recognition display means spectrum-analyzes that utterance in the same way to obtain the frequency characteristics of the vocal tract and displays this waveform on the monitor. The learner then compares the similarity of the two shaping signals by eye using the signal comparison means, and can thus study the language while visually grasping the degree of learning.

As described above, the language learning system of the present invention is characterized in that the standard pronunciation and the learner's vocalized pronunciation are each spectrum-analyzed to detect formants, which are converted into speech shaping signals, displayed on a monitor or the like, and superimposed so that the language can be learned by comparing and discriminating them visually. As a result, the learner can judge whether the standard pronunciation and his or her own pronunciation are similar by visual as well as auditory means, and can learn in the same way that infants do, which is a distinctive effect.

[Example]

Embodiments of the present invention are described below with reference to the drawings.

Figs. 1 to 4 show an apparatus embodying the system of the present invention. Fig. 1 is a schematic system diagram; Fig. 2 is a schematic circuit diagram; Fig. 3 is a schematic diagram of the spectrum analyzer; Fig. 4 is an explanatory view of the monitor screen displaying the standard pronunciation shaping signal and the learner pronunciation shaping signal; Fig. 5(a) is an explanatory view of the shaping signal on the monitor screen for the sound "a" with the number of samplings set to 1; and Fig. 5(b) is an explanatory view of the shaping signal on the monitor screen for the sound "a" with the number of samplings set to 10.

In outline, the language learning system of this embodiment has four means: (1) standard pronunciation creation means, (2) standard pronunciation visual recognition display means, (3) learner pronunciation visual recognition display means, and (4) shaping signal comparison means.

Standard pronunciation creation means

With this means, a teacher or the like either records the standard pronunciation on a recording medium such as a video disc or inputs the standard pronunciation directly without using a recording medium.

Here, as stated above, the standard pronunciation is the pronunciation that serves as the teaching material for language learning. When a teacher or the like records the standard pronunciation on a recording medium such as a video disc, the recording for the words and sentences of the text consists of: (1) character and symbol frames, such as the spelling and phonetic letters and symbols of the standard pronunciation, and video for advancing the lesson; (2) the standard pronunciation speech for repetition, the standard pronunciation control signal, and the sampling signal; and (3) explanatory speech, including the standard pronunciation, corresponding to the video.

A disc (laser disc) provides one video channel and two audio channels. The lesson video and the character and symbol frames of the standard pronunciation are recorded as video on the video channel. The explanatory speech, including the standard pronunciation, corresponding to the video is recorded on one audio channel (channel A). The standard pronunciation control signal, the sampling detection signal, and the standard pronunciation to be repeated during repetition (in other words, while the video is a still image) are recorded on the other audio channel (channel B). This produces the standard pronunciation recording medium.

During playback, the lesson frames are shown on the monitor and the corresponding speech is played from audio channel A in step with them. When playback reaches the end of the lesson frames, the picture becomes a still image and the character and symbol frame is displayed.

The standard pronunciation control signal controls the word length of a word or sentence in the standard pronunciation. It consists of a start signal and a stop signal, and together the two identify the length of one word or sentence, that is, the word length. As a result, while the video channel shows a still image, the standard pronunciation of one word or sentence recorded on audio channel B can be displayed on a single monitor screen regardless of whether its word length is long or short. The sampling count signal is a control signal for giving the shaping signals of the standard pronunciation and the learner's vocalized pronunciation a shape that is easy for the learner to read on the monitor; it allows the number of times the feature parameters are extracted (sampled) for each pronunciation to be changed.
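
The interplay of the start and stop signals with the sampling count can be sketched as follows. This is an illustration under assumptions, not the patented circuit: the utterance is taken to be a span of sample indices delimited by the start and stop signals, the analysis frames are assumed to be of equal width, and the function names `analysis_frames` and `screen_columns` and the 320-column screen are hypothetical.

```python
def analysis_frames(start, stop, sampling_count):
    """Split the span [start, stop) into `sampling_count` contiguous frames.

    Returns (begin, end) sample-index pairs. Frame boundaries are spread
    evenly, so a longer word simply yields longer frames.
    """
    if stop <= start or sampling_count < 1:
        raise ValueError("need stop > start and sampling_count >= 1")
    length = stop - start
    edges = [start + (length * i) // sampling_count for i in range(sampling_count + 1)]
    return list(zip(edges[:-1], edges[1:]))


def screen_columns(sampling_count, screen_width):
    """Map each frame to a (first_column, end_column_exclusive) band on the screen."""
    edges = [(screen_width * i) // sampling_count for i in range(sampling_count + 1)]
    return list(zip(edges[:-1], edges[1:]))


# One feature extraction for the whole utterance, as in Fig. 5(a).
print(analysis_frames(100, 200, 1))
# Ten extractions over a longer utterance, as in Fig. 5(b).
print(analysis_frames(0, 1000, 10)[:3])
print(screen_columns(10, 320)[:3])
```

Because the screen bands depend only on the sampling count and not on the word length, a short word and a long word both fill exactly one screen, which matches the behavior the passage describes.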

Standard pronunciation visual recognition display means

This means takes the standard pronunciation obtained by the standard pronunciation creation means, spectrum-analyzes it to detect the power spectrum, extracts from the power spectrum the feature parameters related to phonemic quality (formants) to obtain the standard pronunciation shaping signal, and displays that signal for visual recognition.

本手段は、−船釣には第１．２図に示す装置を用いて実
施する。この装置について説明すると、モニタテレビ１
と、発音識別装置２、およびレーザーディスクプレーヤ
ー３の三つの装置より構成されている。発音識別装置２
は、具体的には、第２図に示すように、レーザーディス
クプレーヤー３によって標準発音記録体（ここでは、レ
ーザーディスク）を再生した際の音声Ｂチャンネルの標
準発音が音声Ｂチャンネル入力端子４に直列に接続され
たＡＤ変換器５、ＤＡ変換器６およびＬＰＦスインチア
を介してスペクトル分析器８でスペクトル分析され、ホ
ルマントが検出され、標準発音波形が得られ、これがホ
ルマント作成回路９、メモリ書込回路１０を介して標準
発音波形記憶回路１１にメモリされるように接続されて
いる。また、ＡＤ変換器５には標準発音記憶回路（通常
、ＲＡＭ）１２が接続され、ＡＤ変換器５によっＡＤ変
換されたディスク音声における音声Ｂチャン／ネルの標準発音ディジタル信号を標準発音記憶回路１２
にメモリするようにされている。また、標準発音記憶回
路１２にはメモリカウンター３を介して制御回路１４が
接続されている。また、音声Ｂチャンネルの入力端子４
には、音声Ｂチャンネルの標準発音制御信号、サンプリ
ング回数信号は、音声信号検出回路１５を介して制御回
路１３に信号伝達できるように接続され、上記標準発音
制御信号、サンプリング回数信号により、標準発音記憶
回路１２にメモリされている標準発音ディジタル信号を
、制御回路１３、メモリカウンター２により繰り返し、
This method is implemented with the device shown in FIGS. 1 and 2. The device consists of three units: a monitor television 1, a pronunciation identification device 2, and a laser disc player 3. Specifically, as shown in FIG. 2, in the pronunciation identification device 2 the signal passes through the AD converter 5, the DA converter 6, and the LPF switch 7, which are connected in series, and is spectrally analyzed by the spectrum analyzer 8. The formants are detected to obtain a standard pronunciation waveform, which is sent to the formant creation circuit 9 and stored in the standard pronunciation waveform storage circuit 11 via the memory write circuit 10. A standard pronunciation storage circuit (usually RAM) 12 is connected to the AD converter 5, and the standard pronunciation digital signal of the disc's audio B channel, AD-converted by the AD converter 5, is stored in the standard pronunciation storage circuit 12. A control circuit 14 is connected to the standard pronunciation storage circuit 12 via a memory counter 13. The input terminal 4 of the audio B channel is also connected to the control circuit 14 via the audio signal detection circuit 15, so that the standard pronunciation control signal and the sampling count signal of the audio B channel reach the control circuit. The standard pronunciation digital signal stored in the storage circuit 12 is read out repeatedly by the control circuit 14 and the memory counter 13 and can be passed to the LPF switch 7 via the DA converter 6. Here, the standard pronunciation control signal identifies the word length of the standard pronunciation, so that the standard pronunciation can be replayed repeatedly within the range of that word length. The control circuit 14 is also connected to the operation switch 16, the spectrum analyzer 8, the standard pronunciation waveform storage circuit 11, the learner's uttered pronunciation waveform storage circuit 17, and the counters 18 and 19. The standard pronunciation waveform storage circuit 11 and the learner's uttered pronunciation waveform storage circuit 17 are connected, together with the disc video input terminal 20, to the mixing circuit 21, where their signals are mixed and displayed on the monitor 1 as a composite video signal. The LPF switch 7 is further connected to the audio signal input terminal 22 for video progression, which carries the audio recorded on the audio A channel in synchronization with the video of the video channel, to the receiver 23, and, via the amplifier 25, to the microphone 24.
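As an illustration only, the repeated read-out of the stored standard pronunciation within its word length can be sketched in Python as follows. The class name `LoopingMemory`, the sample values, and the representation of the word length as a number of samples are assumptions made for this sketch and do not appear in the specification.

```python
class LoopingMemory:
    """Hypothetical model of storage circuit 12 read out by memory counter 13."""

    def __init__(self, samples, word_length):
        if not 0 < word_length <= len(samples):
            raise ValueError("word_length must lie within the stored samples")
        self.samples = list(samples)
        # Assumed to be derived from the standard pronunciation control signal.
        self.word_length = word_length
        # Plays the role of the memory counter (read address).
        self.counter = 0

    def read(self, n):
        """Return the next n samples, wrapping around at word_length."""
        out = []
        for _ in range(n):
            out.append(self.samples[self.counter])
            self.counter = (self.counter + 1) % self.word_length
        return out


# Six stored samples, of which only the first four belong to the word.
memory = LoopingMemory([10, 20, 30, 40, 0, 0], word_length=4)
looped = memory.read(10)
```

Because the counter wraps at the word length rather than at the end of the memory, the trailing unused samples are never replayed.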

With this circuit, the standard pronunciation is spectrally analyzed by the spectrum analyzer 8 to detect its power spectrum, and the feature parameters related to phonemic quality (formants) are extracted from the power spectrum to obtain the standard pronunciation shaping signal. As directed by the operation switch 16 and the control circuit 14, this signal is displayed as a waveform on the monitor 1 via the storage circuit 11 and the mixing circuit 21.
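One common way to extract formant-like feature parameters from a power spectrum is to pick its local maxima. The sketch below illustrates that idea only; the specification does not state that peak picking is used, and the function name `pick_formants`, the band frequencies, and the power values are all hypothetical.

```python
def pick_formants(band_powers, center_freqs, max_formants=3):
    """Return the center frequencies of the strongest local maxima."""
    peaks = []
    for i in range(1, len(band_powers) - 1):
        # A band is a peak if it exceeds its lower neighbor and is not
        # smaller than its upper neighbor.
        if band_powers[i] > band_powers[i - 1] and band_powers[i] >= band_powers[i + 1]:
            peaks.append((band_powers[i], center_freqs[i]))
    peaks.sort(reverse=True)  # strongest peaks first
    return sorted(freq for _, freq in peaks[:max_formants])


# Hypothetical band powers for ten analysis bands.
freqs = [250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500]
powers = [1, 4, 9, 3, 2, 6, 2, 1, 3, 1]
formants = pick_formants(powers, freqs)
```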

The spectrum analyzer 8 is built as a filter bank, as shown in FIG. 3. It detects the formants and obtains the standard pronunciation shaping signal used for the visual display of the standard pronunciation. Specifically, bandpass filters 26, 26, ... with slightly different center frequencies are arranged in parallel so that no gap is left between the passbands of adjacent filters, and each filter 26 is connected through a squarer 27 to an averaging circuit 28. The output of each bandpass filter 26 is squared by its squarer 27 and averaged by its averaging circuit 28, which gives the power of the speech signal contained in that filter's passband. The outputs of the averaging circuits 28 are then read in order of center frequency to determine how much power each frequency band contains, and taking all of these band powers together yields the power spectrum. The formant creation circuit 9, built for example from an integrating circuit, then produces the formant as a curve joining the peaks of the power spectrum.
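The filter, square, and average chain of the filter bank can be sketched digitally as follows. This is a minimal illustration, not the circuit of FIG. 3: it assumes a second-order (biquad) band-pass section per band with unity gain at its center frequency, and the function names, the Q value, the sampling rate, and the three center frequencies are choices made for the example only.

```python
import math


def bandpass_coefficients(center_hz, q, sample_rate):
    """Biquad band-pass coefficients with unity gain at center_hz."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def band_power(signal, center_hz, q, sample_rate):
    """One channel of the bank: band-pass filter, squarer, averager."""
    (b0, b1, b2), (a1, a2) = bandpass_coefficients(center_hz, q, sample_rate)
    x1 = x2 = y1 = y2 = 0.0
    total = 0.0
    for x0 in signal:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        total += y0 * y0            # squarer
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return total / len(signal)      # averaging circuit


def power_spectrum(signal, centers_hz, q, sample_rate):
    """Read the band powers in order of center frequency."""
    return [band_power(signal, f, q, sample_rate) for f in centers_hz]


sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(800)]
centers = [500.0, 1000.0, 2000.0]
spectrum = power_spectrum(tone, centers, q=4.0, sample_rate=sample_rate)
loudest_band = centers[spectrum.index(max(spectrum))]
```

For a 1000 Hz test tone the band centered at 1000 Hz carries almost all of the power. A practical bank would use many more bands, spaced so that adjacent passbands leave no gap.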

Although the spectrum analyzer has been described here as a filter bank, the analysis may instead be performed by linear prediction.
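The specification gives no details of the linear prediction alternative. As a hedged illustration, the sketch below uses the standard autocorrelation method with the Levinson-Durbin recursion and evaluates the resulting all-pole spectral envelope. The function names, the model order, and the autocorrelation values are assumptions for the example.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... and the prediction error power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_envelope(a, err, n_points):
    """Evaluate err / |A(e^{jw})|^2 at n_points frequencies from 0 to pi."""
    envelope = []
    for i in range(n_points):
        w = math.pi * i / (n_points - 1)
        response = sum(a[k] * cmath.exp(-1j * w * k) for k in range(len(a)))
        envelope.append(err / abs(response) ** 2)
    return envelope


# Hypothetical autocorrelation values of a smoothly decaying signal.
r = [1.0, 0.5, 0.25]
a, err = levinson_durbin(r, order=2)
envelope = lpc_envelope(a, err, n_points=3)
```

The peaks of such an envelope play the same role as the peaks of the filter bank power spectrum.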

- Learner's uttered pronunciation visual recognition display means
This means spectrally analyzes the learner's uttered pronunciation to detect its power spectrum, extracts from the power spectrum the feature parameters related to phonemic quality (formants) to obtain the learner's uttered pronunciation shaping signal, and displays that signal for visual recognition. It does so in the same way as the standard pronunciation visual recognition display means. That is, it uses the same pronunciation identification device that the standard pronunciation visual recognition display means uses.

- Shaping signal comparison means
This means visually compares the standard pronunciation shaping signal and the learner's uttered pronunciation shaping signal displayed by the respective means described above. That is, in the pronunciation identification circuit shown in FIG. 2, the mixing circuit combines the standard pronunciation shaping signal and the learner's uttered pronunciation shaping signal on the monitor (for example, arranged one above the other, or superimposed) so that they can be distinguished visually.
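The two display arrangements mentioned above, one above the other or superimposed, can be mimicked with a small text raster. This is only a sketch of the idea: the real mixing circuit 21 combines video signals, and the function names, the marks `S`, `L`, and `#`, and the curve values are hypothetical.

```python
def render(curve, height, mark):
    """Rasterize a curve with values in 0..1 into a height x len(curve) grid."""
    grid = [[" "] * len(curve) for _ in range(height)]
    for x, value in enumerate(curve):
        row = height - 1 - round(value * (height - 1))  # row 0 is the top
        grid[row][x] = mark
    return grid


def overlay(grid_a, grid_b):
    """Superimpose two grids; cells hit by both curves are shown as '#'."""
    lines = []
    for row_a, row_b in zip(grid_a, grid_b):
        cells = []
        for cell_a, cell_b in zip(row_a, row_b):
            if cell_a != " " and cell_b != " ":
                cells.append("#")
            else:
                cells.append(cell_a if cell_a != " " else cell_b)
        lines.append("".join(cells))
    return lines


def stack(grid_a, grid_b):
    """Place the first grid above the second."""
    return ["".join(row) for row in grid_a] + ["".join(row) for row in grid_b]


standard = render([0.0, 0.5, 1.0, 0.5, 0.0], height=3, mark="S")
learner = render([0.0, 0.5, 0.5, 0.5, 0.0], height=3, mark="L")
superimposed = overlay(standard, learner)
stacked = stack(standard, learner)
```

In the superimposed view the `#` cells show where the learner's curve coincides with the standard one, and the remaining `S` and `L` cells show where the two differ.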

Next, the use of the language learning system of this embodiment is described. A laser disc is used that is recorded as follows. The video channel carries the learning video (video explaining how to practice the standard pronunciation and explaining the practice words and sentences), followed at the end of the explanatory video by a still frame showing the spelling, phonetic characters, and symbols of the standard pronunciation. The audio A channel carries the audio corresponding to (synchronized with) the learning video of the video channel. The audio B channel carries the standard pronunciation, the standard pronunciation control signal, and the sampling count signal.

When this laser disc is loaded in the laser disc player 3 and played by means of the operation switch, the audio A channel signal is input to the audio signal input terminal 22 for video progression of the pronunciation identification device, and the learner can listen to it on the receiver 23. The standard pronunciation signal, the standard pronunciation control signal, and the sampling count signal of the audio B channel are input to the disc audio input terminal, where the audio signal detection circuit 15 separates the standard pronunciation signal from the other signals, and the video channel signal is input to the disc video input terminal 20. The learning video is then played on the monitor from the video channel signal, together with the corresponding audio from the audio A channel signal.

When the explanation of the standard pronunciation in the learning video ends and the character/symbol frame of the standard pronunciation is reached, the laser disc player 3 holds that frame as a still image. Meanwhile, in response to the standard pronunciation control signal recorded on the audio B channel and detected by the audio signal detection circuit 15, the standard pronunciation signal, which was recorded on the audio B channel, input through the disc audio input terminal 4, and stored in the standard pronunciation storage circuit 12, is read out repeatedly by the control circuit 14 and the memory counter 13 and passed through the DA converter 6 to the LPF switch 7. The spectrum analyzer 8 analyzes the signal to detect its power spectrum, and the formants of the standard pronunciation are detected from the power spectrum to obtain the standard pronunciation shaping signal. This signal passes through the standard pronunciation waveform storage circuit 11 to the mixing circuit 21, where it is combined with the character/symbol frame of the video channel and displayed as an image on the monitor 1.

Here, by means of the standard pronunciation control signal, the standard pronunciation shaping signal can be displayed on the monitor screen regardless of whether the standard pronunciation is long or short. That is, by detecting this signal and controlling the time over which the shaping signal is written into the standard pronunciation storage circuit 12, the screen of the monitor 1 can be used effectively. In addition, by means of the sampling count signal recorded on the audio B channel, the number of samplings is set in advance according to the words that make up the standard pronunciation, so that the shaping signal of each word is easy to distinguish on the monitor 1, as shown in FIG. 5. The spectrum analyzer 8 samples that number of times and the result is displayed as an image. The learner can also listen to the standard pronunciation repeatedly on the receiver 23 via the LPF audio switching circuit 7.
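The two display controls described above can be illustrated in software, with the caveat that the specification achieves them by controlling memory write timing and analyzer sampling, not by the resampling shown here. In this sketch, `fit_to_width` stretches or compresses a shaping curve to a fixed number of screen columns, and `sampling_instants` places a given number of analysis points evenly within a word. Both function names and all numeric values are assumptions.

```python
def fit_to_width(values, width):
    """Linearly resample values so that they span exactly width columns."""
    if width < 2 or len(values) < 2:
        raise ValueError("need at least two values and two columns")
    last = len(values) - 1
    out = []
    for col in range(width):
        pos = col * last / (width - 1)   # fractional source position
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out


def sampling_instants(n_samples, count):
    """Indices of count analysis points, each centered in an equal segment."""
    if count < 1 or count > n_samples:
        raise ValueError("count must be between 1 and n_samples")
    return [int((k + 0.5) * n_samples / count) for k in range(count)]


# A short word is stretched and a long word is compressed to the screen width.
short_word = fit_to_width([0.0, 10.0], 5)
long_word = fit_to_width([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 4)

# One sampling versus ten samplings over a 100-sample sound (compare FIG. 5).
one_sampling = sampling_instants(100, 1)
ten_samplings = sampling_instants(100, 10)
```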

Next, when the learner practices along with the standard pronunciation, the learner speaks into the microphone 24. The speech signal is first amplified by the amplifier 25 and passed to the LPF switch 7, and is then spectrally analyzed by the spectrum analyzer 8, in the same way as the standard pronunciation described above, to detect its power spectrum. The formants of the learner's speech are detected from the power spectrum to obtain the learner's pronunciation shaping signal. This signal passes through the learner's uttered pronunciation waveform storage circuit 17 to the mixing circuit 21 and is displayed on the monitor 1, on which the character/symbol frame of the video channel and the standard pronunciation shaping signal are already displayed together (see FIG. 4).

As with the standard pronunciation, the sampling count signal supplied to the control circuit 14 causes the learner's utterance to be shaped in the same way as the standard pronunciation. This is because the learner's utterance is produced in imitation of the standard pronunciation. The learner's utterance is likewise processed in the same way as the standard pronunciation according to the audio control signal (audio B channel). The learner can then see whether the standard pronunciation shaping signal and the learner's pronunciation shaping signal are similar (match or do not match), and can thereby recognize whether he or she is pronouncing correctly in line with the standard pronunciation. Whether the two signals match is judged visually by the learner and can also be confirmed by other visual indicators, not shown in the drawings, such as a meter or light-emitting elements such as LEDs and lamps.
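The specification does not say how a meter or LED would decide that the two signals match. One plausible measure, shown here purely as an assumption, is the cosine similarity between the two shaping curves compared against a threshold. The function names, the 0.9 threshold, and the curve values are hypothetical.

```python
import math


def similarity(standard, learner):
    """Cosine similarity of two equally long shaping curves."""
    if len(standard) != len(learner):
        raise ValueError("curves must have the same length")
    dot = sum(s * l for s, l in zip(standard, learner))
    norm_s = math.sqrt(sum(s * s for s in standard))
    norm_l = math.sqrt(sum(l * l for l in learner))
    if norm_s == 0.0 or norm_l == 0.0:
        return 0.0
    return dot / (norm_s * norm_l)


def indicator(score, threshold=0.9):
    """State of a hypothetical match lamp: lit when the score is high enough."""
    return "match" if score >= threshold else "mismatch"


standard_curve = [1.0, 2.0, 3.0]
louder_copy = [2.0, 4.0, 6.0]      # same shape, different level
different_curve = [3.0, 0.0, -1.0]

same_shape = indicator(similarity(standard_curve, louder_copy))
other_shape = indicator(similarity(standard_curve, different_curve))
```

Cosine similarity ignores overall level, so a learner who speaks more loudly or softly than the recording is still judged on the shape of the curve.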

Once the learner has finished practicing a given standard pronunciation, operating the operation switch causes the laser disc player 3 to resume playback of the laser disc, and the next standard pronunciation can be practiced in the same way.

The present invention is not limited to the embodiment described above and includes modifications that can be made without departing from the gist of the invention. For instance, the embodiment was described using standard pronunciation recorded in advance on a video disc, but the standard pronunciation may of course be spoken directly by a teacher, for example.

[Effect of the invention]

As is clear from the above description, the language learning system of the present invention spectrally analyzes both the standard pronunciation and the learner's uttered pronunciation, detects their formants to form speech shaping signals, displays these on a monitor or the like, and superimposes them so that they can be compared and distinguished visually. The learner can therefore grasp, not only by ear but also by eye, whether the standard pronunciation and the learner's own pronunciation are similar, and can learn in a manner similar to the way infants learn.

[Brief explanation of drawings]

FIGS. 1 to 4 show a device embodying the system of the present invention. FIG. 1 is a schematic system diagram, FIG. 2 is a schematic circuit diagram, FIG. 3 is a schematic diagram of the spectrum analyzer, and FIG. 4 is an explanatory view of the monitor screen displaying the standard pronunciation shaping signal and the learner's uttered pronunciation shaping signal. FIG. 5(a) is an explanatory view of the shaping signal on the monitor screen when the number of samplings for the sound "a" is controlled to 1, and FIG. 5(b) is an explanatory view of the shaping signal on the monitor screen when the number of samplings for the sound "a" is controlled to 10.

1: monitor television
2: pronunciation identification device
3: laser disc player
4: B channel audio input terminal
5: AD converter
6: DA converter
7: LPF switch
8: spectrum analyzer
9: formant creation circuit
10: memory write circuit
11: standard pronunciation waveform storage circuit
12: standard pronunciation storage circuit
13: memory counter
14: control circuit
15: write signal detection circuit
16: operation switch
17: learner's uttered pronunciation waveform storage circuit
18, 19: counters
20: disc video input terminal
21: mixing circuit
22: audio signal input terminal for video progression
23: receiver
24: microphone
25: amplifier
26: bandpass filter
27: squarer
28: averaging circuit

Claims

[Claims]

(1) A language learning system comprising: a standard pronunciation visual recognition display means that spectrally analyzes a standard pronunciation to detect its power spectrum, extracts feature parameters related to phonemic quality from the power spectrum to obtain a standard pronunciation shaping signal, and displays that signal for visual recognition; a learner's uttered pronunciation visual recognition display means that spectrally analyzes the learner's uttered pronunciation to detect its power spectrum, extracts feature parameters related to phonemic quality from the power spectrum to obtain a learner's uttered pronunciation shaping signal, and displays that signal for visual recognition; and a shaping signal comparison means for visually comparing the standard pronunciation shaping signal and the learner's uttered pronunciation shaping signal displayed by the respective means, wherein language learning is carried out by grasping, through the shaping signal comparison means, the degree of similarity between the two signals.

(2) The language learning system, wherein the standard pronunciation and the learner's uttered pronunciation are both spectrally analyzed by a common spectrum analyzer, which detects the power spectrum of each pronunciation and extracts the feature parameters related to phonemic quality.

(3) The language learning system, wherein the standard pronunciation shaping signal and the learner's uttered pronunciation shaping signal are displayed as images on a monitor and are superimposed so that the two signals can be compared visually.

(4) The language learning system according to claim 1, wherein the shaping signal comparison means, which compares the standard pronunciation shaping signal with the learner's uttered pronunciation shaping signal, visually indicates whether the signals match or do not match, and whether they are similar or dissimilar, by means of light-emitting elements such as LEDs and lamps.

(5) The language learning system according to claim 1, wherein a standard pronunciation recording medium is created by recording the audio pronunciation of a word or sentence together with a standard pronunciation control signal that specifies the word length of the word or sentence making up that audio pronunciation, and the medium is used as the standard pronunciation.

(6) The language learning system according to claim 1, wherein the standard pronunciation recording medium is a video disc; the video channel of the disc records a learning video for the standard pronunciation together with character/symbol frames such as the spelling and phonetic symbols of the standard pronunciation; one audio channel records audio corresponding to the learning video of the video channel, such as explanatory audio and standard pronunciation audio; and the other audio channel records the standard pronunciation for repetition and a standard pronunciation control signal for detecting and controlling word length.

(7) The language learning system, wherein the other audio channel records the standard pronunciation for repetition, a standard pronunciation control signal for detecting and controlling word length, and a power spectrum sampling count control signal.