JP7195593B2

JP7195593B2 - Language learning device and language learning program

Info

Publication number: JP7195593B2
Application number: JP2018233475A
Authority: JP
Inventors: 光成木村
Original assignee: 株式会社Ecc
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2022-12-26
Anticipated expiration: 2038-12-13
Also published as: JP2020095176A

Description

The present invention relates to a language learning device and a language learning program for a user whose native language is a first language (e.g., Japanese) to learn a second language (e.g., English).

Mastering conversation in a foreign language requires a great deal of dialogue practice. In recent years, computer-based language learning systems have been developed as an alternative to dialogue practice through conversation with an English conversation teacher.

In such a language learning system, the computer needs to recognize non-native speech. For example, in a language learning system for Japanese speakers learning English, the computer needs to recognize English spoken with Japanese phonemes (so-called Japanese English). In this regard, Patent Document 1 discloses a technology capable of recognizing pronunciations (combinations of phonemes) peculiar to Japanese speakers, based on a "Japanese pronunciation model" built from recordings made by 30 Japanese men and women over eight hours each.

Patent Document 1: JP 2012-215645 A

Through repeated dialogue practice, a learner gradually becomes able to produce utterances that contain phonemes of the target language. For example, the English utterances of a Japanese beginner contain few phonemes of native English speakers (English phonemes) and many Japanese phonemes, but the proportion of English phonemes tends to increase as proficiency improves. However, it is difficult for a learner whose native language is not English to speak entirely in English phonemes. Moreover, some sounds are easier and others harder to acquire with a pronunciation close to the English phoneme, so the learner's speech ends up as a mixture of English phonemes and Japanese phonemes. With the technology described in Patent Document 1, it is difficult for a computer to recognize such a learner's speech accurately.

The present invention has been made to solve the above problem, and an object thereof is to provide a language learning device capable of recognizing the speech of a language learner who is still in the process of acquiring the language.

A language learning device according to the present invention is a language learning device for a user whose native language is a first language to learn a second language, and comprises: a phoneme conversion unit that converts speech uttered by the user into phoneme data; and a character conversion unit that searches dictionary data in which character data of the second language and phoneme data are associated with each other, and converts the phoneme data converted by the phoneme conversion unit into character data of the second language. The phoneme data convertible by the phoneme conversion unit includes first phonemes used by native speakers of the first language and second phonemes used by native speakers of the second language. In the dictionary data, one character data is associated with phoneme data consisting only of the first phonemes, phoneme data consisting only of the second phonemes, and phoneme data including both the first phonemes and the second phonemes.

Preferably, the language learning device according to the present invention further comprises a search range limiting unit that limits the search range of the dictionary data searched by the character conversion unit.

In the language learning device according to the present invention, the search range limiting unit may determine the search range according to the user's proficiency level of the first language.

In the language learning device according to the present invention, the search range limiting unit may determine the search range according to the scene in which the user converses.

In the language learning device according to the present invention, the search range limiting unit may determine the search range according to the user's utterance tendency.

A language learning program according to the present invention causes a computer to function as each unit of any one of the language learning devices described above.

According to the present invention, it is possible to recognize the speech of a language learner who is still in the process of acquiring the language.

FIG. 1 is a block diagram showing the configuration of a language learning device according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining a method of creating the phoneme conversion unit.
FIG. 3 is a diagram showing part of the dictionary data.
FIG. 4 is a diagram showing part of the dictionary data.
FIG. 5 is a diagram showing part of the dictionary data.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. The present invention is not limited to the following embodiments.

(Overall configuration)
FIG. 1 is a block diagram showing the configuration of a language learning device 1 according to one embodiment of the present invention. The language learning device 1 can be configured by a smartphone or a general-purpose personal computer. In this embodiment, the language learning device 1 is assumed to be configured by a smartphone.

The language learning device 1 is used by a user whose native language is a first language to learn a second language. In this embodiment, the first language is Japanese and the second language is English, but the present invention is not limited to this.

As shown in FIG. 1, the language learning device 1 mainly includes a storage 2, a control unit 3, a display unit 4, an input unit 5, a microphone 6, and a speaker 7.

The storage 2 is a component that stores various programs and data used for the arithmetic processing of the language learning device 1, and can be configured by, for example, a flash memory.

The control unit 3 is a functional block realized by the CPU (not shown) of the language learning device 1 reading a language learning program (application) stored in the storage 2 into a main memory (not shown) and executing it. The language learning program may be installed on the language learning device 1 via a network. Alternatively, the language learning program may be installed on the language learning device 1 by having the device read a computer-readable, non-transitory tangible recording medium, such as an SD card, on which the language learning program is recorded.

The control unit 3 mainly includes a phoneme conversion unit 31, a character conversion unit 32, a determination unit 33, a feedback unit 34, and a search range limiting unit 35. The functions of these functional blocks will be described later.

The display unit 4 can be configured by, for example, a liquid crystal display. The input unit 5 is a device that receives operation input from the user, and can be configured by, for example, a touch panel. The microphone 6 and the speaker 7 may be built into the language learning device 1 or may be externally attached.

(Control unit)
Next, the functions of the control unit 3 will be described.

The phoneme conversion unit 31 is a functional block that converts speech uttered by the user into phoneme data. In this embodiment, the speech uttered by the user is converted into an analog audio signal by the microphone 6, further converted into a digital audio signal by an AD converter (not shown), and input to the phoneme conversion unit 31. The phoneme conversion unit 31 is realized by a trained machine learning model; it segments the digital audio signal and converts each segmented audio signal into phoneme data.

Phoneme inventories differ somewhat depending on the language variety and the phonological theory, but in general Japanese has 24 phonemes (5 vowels + 16 consonants + 3 special phonemes) and English has 44 phonemes (20 vowels + 24 consonants). For machine learning, speech data without an English accent is collected from multiple Japanese speakers for the Japanese phonemes (first phonemes), and speech data is collected from multiple native English speakers for the English phonemes (second phonemes). As shown in FIG. 2, a training data set in which each phoneme is associated with a speech waveform is then created. The phoneme conversion unit 31 is created by performing machine learning, such as deep learning, on this training data set.

As a result, the phoneme conversion unit 31 can convert speech uttered by the user into phoneme data that includes Japanese phonemes and English phonemes. That is, the phoneme data convertible by the phoneme conversion unit 31 includes Japanese phonemes and English phonemes, and the phoneme conversion unit 31 converts the user's speech into phoneme data while distinguishing Japanese phonemes from English phonemes. For example, for the "a" in the word "apple", the pronunciation of a native English speaker is converted into the English phoneme corresponding to the relevant phonetic symbol (a phoneme roughly midway between Japanese "a" and "e"), whereas the pronunciation of a native Japanese speaker is converted into the Japanese phoneme corresponding to "a".

In the following description, for convenience, English phonemes are written in uppercase letters and Japanese phonemes in lowercase letters. For example, for the "a" in "apple", the English phoneme roughly midway between Japanese "a" and "e" is written as "A", and the Japanese phoneme is written as "a".

The character conversion unit 32 is a functional block that searches dictionary data D, in which English character data and phoneme data are associated with each other, and converts the phoneme data converted by the phoneme conversion unit 31 into English character data. As shown in FIG. 1, the dictionary data D is stored in the storage 2, but it may instead be stored in another device (such as a server) communicably connected to the language learning device 1. Unlike ordinary dictionary data, in the dictionary data D one character data is associated with phoneme data consisting only of Japanese phonemes (first phonemes), phoneme data consisting only of English phonemes (second phonemes), and phoneme data including both Japanese phonemes and English phonemes (hereinafter referred to as mixed phoneme data).

For example, as shown in FIG. 3, the word "cake" is associated with phoneme data consisting only of English phonemes (KEYK), phoneme data consisting only of Japanese phonemes (ke:ki), and mixed phoneme data (KEYku). Likewise, the word "rice" is associated with phoneme data consisting only of English phonemes (RAIS), phoneme data consisting only of Japanese phonemes (raisu), and mixed phoneme data (raiS).
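The association described above can be sketched as a small data structure. The following Python fragment is only an illustration under stated assumptions: the names `DICTIONARY_D1`, `build_index`, and `to_characters` are hypothetical, exact-match lookup is an assumption, and only the "cake" and "rice" entries come from the description.

```python
# Hypothetical sketch of part of dictionary data D (the entries named for FIG. 3).
# Uppercase letters denote English phonemes, lowercase letters Japanese phonemes.
DICTIONARY_D1 = {
    "cake": {"english": ["KEYK"], "japanese": ["ke:ki"], "mixed": ["KEYku"]},
    "rice": {"english": ["RAIS"], "japanese": ["raisu"], "mixed": ["raiS"]},
}


def build_index(dictionary):
    """Invert the dictionary so phoneme data maps to (character data, variant kind)."""
    index = {}
    for word, variants in dictionary.items():
        for kind, sequences in variants.items():
            for sequence in sequences:
                index[sequence] = (word, kind)
    return index


def to_characters(phoneme_data, index):
    """Return (word, kind) for the phoneme data, or None if nothing matches."""
    return index.get(phoneme_data)


INDEX = build_index(DICTIONARY_D1)
print(to_characters("KEYku", INDEX))
```

Because each word lists its mixed variants explicitly, a learner's partly Japanese, partly English pronunciation resolves to the same character data as either pure variant.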

In the example shown in FIG. 3, each word has only one item of mixed phoneme data, but there may be more than one. In general, the more letters a word has, the more combinations of Japanese and English phonemes are possible. In a language learner's pronunciation, however, some phonemes are easy to pronounce and others difficult, and the rate at which English phonemes appear varies with proficiency and with the speaker's state of awareness. Viewed word by word, the patterns in which Japanese and English phonemes are mixed in a Japanese speaker's English are limited for each word, although proficiency also has an influence. Therefore, depending on the phonemes involved, the mixed phoneme data need not cover every theoretically possible pattern; it suffices to prepare one to several patterns that are likely to be pronounced. This keeps the amount of dictionary data D small, largely independent of the number of letters.

With the dictionary data D configured in this way, even when the phoneme data input from the phoneme conversion unit 31 contains a mixture of Japanese phonemes and English phonemes, the character conversion unit 32 can convert the phoneme data into character data by selecting the matching entry from the mixed phoneme data. Consequently, the device can accurately recognize not only the pronunciation of native English speakers and of beginners in English, but also the speech of learners still in the process of acquisition, in which English phonemes and Japanese phonemes are mixed.

The determination unit 33 is a functional block that determines whether the content and pronunciation of the user's utterance are appropriate. Specifically, the determination unit 33 constructs an English sentence from the character data converted by the character conversion unit 32, and determines from the constructed sentence whether the content of the user's utterance is appropriate. The determination unit 33 also determines the user's pronunciation proficiency based on the search of the dictionary data D performed by the character conversion unit 32. More specifically, for each word uttered by the user, the determination unit 33 determines whether the user's pronunciation is close to English (native speaker), close to Japanese (beginner), or in between (intermediate to advanced), based on whether the character conversion unit 32 selected phoneme data consisting only of Japanese phonemes, phoneme data consisting only of English phonemes, or mixed phoneme data when converting to character data.
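The per-word pronunciation judgment can be illustrated with the uppercase/lowercase notation used in this description. The sketch below is an assumption-laden illustration, not the device's actual logic: `classify_phoneme_data` and `LEVELS` are hypothetical names, and treating non-letter marks such as ":" as neutral is an assumption.

```python
# Hypothetical mapping from the kind of matched phoneme data to a proficiency label.
LEVELS = {
    "english": "close to English (native speaker)",
    "japanese": "close to Japanese (beginner)",
    "mixed": "in between (intermediate to advanced)",
}


def classify_phoneme_data(phoneme_data):
    """Classify matched phoneme data by letter case (uppercase = English phoneme)."""
    letters = [ch for ch in phoneme_data if ch.isalpha()]  # ignore marks such as ":"
    if not letters:
        raise ValueError("phoneme data contains no phoneme letters")
    if all(ch.isupper() for ch in letters):
        return "english"
    if all(ch.islower() for ch in letters):
        return "japanese"
    return "mixed"


for matched in ("KEYK", "ke:ki", "raiS"):
    print(matched, "->", LEVELS[classify_phoneme_data(matched)])
```

In a real system the dictionary entry would presumably record which variant was matched; inferring it from letter case here merely keeps the sketch short.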

The feedback unit 34 is a functional block that, when the user's utterance is determined to be inappropriate, feeds this back to the user. For example, suppose the user is practicing dialogue with a virtual character in a language learning application. When the determination unit 33 determines that the user's utterance contains an incorrect word or sentence, the feedback unit 34 causes the display unit 4 to show the dialogue character behaving as if it did not understand the utterance or took it to mean something different. In addition, when the user's utterance is determined to be inappropriate, the feedback unit 34 deducts points from a proficiency score that indicates the user's proficiency.

The determination unit 33 may also determine whether the user's wording is appropriate for the situation and the conversation partner, and may determine that the utterance is inappropriate even when the wording is not incorrect but is unsuitable. In that case, the feedback unit 34 displays the character behaving as if it received a negative impression, so that the user can understand that the utterance was inappropriate.

For example, the following a) and b) are conceivable as utterances for asking the other person what has happened.
a) What's your problem?
b) What's wrong? / What happened? / What's the problem?
Both a) and b) get the meaning across, but a) frames the matter as a problem with the listener's own attitude and is a rough expression that gives the listener strong displeasure. By contrast, b) is an ordinary exchange that simply confirms the facts. Therefore, when the user utters a), the feedback unit 34 displays the character behaving as if it received a negative impression, and deducts points from the user's proficiency score.

Similarly, the following c) and d) are conceivable as utterances for conveying that the user knows who Prime Minister Abe is.
c) I know Prime Minister Abe.
d) I know of Prime Minister Abe.
Both c) and d) get the meaning across, but c) means that the speaker knows him personally, whereas d) means that the speaker knows about him as a matter of information.

Whether such an utterance is appropriate depends on the relationship with the person concerned. If the user utters c) even though the user is not, for example, a friend of Prime Minister Abe and merely knows of him one-sidedly, the utterance will mislead the listener. The feedback unit 34 therefore displays the character behaving as if it received an unexpected impression, and deducts points from the user's proficiency score.

In the above examples, when the user utters a) or c), the feedback unit 34 may have the character or the like explain why the user's utterance is inappropriate.

Providing the determination unit 33 and the feedback unit 34 with these functions makes it possible to correct utterances that are not grammatically wrong but carry a different nuance or, depending on the situation, make the listener uncomfortable.

The search range limiting unit 35 is a functional block that limits the range of the dictionary data D searched by the character conversion unit 32. Limiting the search range of the dictionary data D reduces the amount of search processing performed by the character conversion unit 32, and thus improves the speed of character conversion.

In this embodiment, the search range limiting unit 35 determines the search range of the dictionary data D according to at least one of the user's proficiency in English, the scene in which the user converses, and the user's utterance tendency. To realize this function, the search range limiting unit 35 includes a proficiency level grasping unit 351, a scene identifying unit 352, and an utterance tendency grasping unit 353.

The proficiency level grasping unit 351 is a functional block that grasps the user's proficiency in English. The proficiency level grasping unit 351 updates the above-mentioned proficiency score according to the determination result of the determination unit 33, and grasps the user's English proficiency based on the proficiency score.

The scene identifying unit 352 is a functional block that identifies the scene serving as the background of the dialogue between the user and the virtual character. In this embodiment, the user can select a desired background scene before starting dialogue practice, and the scene identifying unit 352 identifies the background scene of the dialogue based on the user's selection.

Alternatively, the scene identifying unit 352 may identify the scene as the dialogue progresses. For example, the scene may be identified based on whether the terms used by the user and the character include keywords related to a topic.
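Keyword-based scene identification of this kind might look like the following sketch. Everything here is hypothetical: the scene names, the keyword sets in `SCENE_KEYWORDS`, and the rule of picking the scene with the most keyword hits are assumptions for illustration; the description does not specify how keywords are weighted.

```python
# Hypothetical keyword sets per scene; the description names no concrete keywords.
SCENE_KEYWORDS = {
    "meal": {"cake", "rice", "menu", "delicious"},
    "animals_plants": {"dog", "flower", "lice", "tree"},
}


def identify_scene(words, default="general"):
    """Pick the scene whose keywords appear most often among the words used so far."""
    hits = {
        scene: sum(1 for word in words if word.lower() in keywords)
        for scene, keywords in SCENE_KEYWORDS.items()
    }
    best = max(hits, key=hits.get)
    # Fall back to a scene-independent default when no keyword was seen.
    return best if hits[best] > 0 else default


print(identify_scene(["I", "would", "like", "rice", "and", "cake"]))
```

The `words` argument would hold terms from both the user and the character, matching the description that either side's vocabulary can signal the topic.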

The utterance tendency grasping unit 353 is a functional block that grasps the terms the user uses most frequently. In this embodiment, the utterance tendency grasping unit 353 identifies these frequent terms from the record of the user's past dialogue practice.

The dictionary data D is composed of a plurality of small dictionaries classified by conversation scene (situation). FIGS. 3 to 5 each show an example of a small dictionary. In the small dictionary D1 shown in FIG. 3, terms mainly used in meal scenes are associated with their phonemes. In the small dictionary D2 shown in FIG. 4, terms mainly used in conversations about animals and plants are associated with their phonemes. In the small dictionary D3 shown in FIG. 5, terms used in all kinds of conversation, not limited to a specific scene, are associated with their phonemes (hereinafter also referred to as the general-purpose small dictionary D3).

The method of classifying the small dictionaries in this embodiment is merely an example and is not particularly limited. For example, the small dictionaries may be classified according to the user's learning level or the user's frequency of use.

The search range limiting unit 35 determines the range of the dictionary data D searched by the character conversion unit 32 by selecting, from the plurality of small dictionaries in the dictionary data D, the small dictionaries to be used for the search. For example, when the user's proficiency is low and the conversation takes place in a meal scene, the search range limiting unit 35 selects the small dictionary D1 corresponding to meal scenes and the general-purpose small dictionary D3, and sets only these small dictionaries D1 and D3 as the search range.

As a result, for example, "lice" (the plural of "louse") in the small dictionary D2 shown in FIG. 4 is excluded from the search, which reduces the search processing load of the character conversion unit 32. Even if the phonemes of the user's utterance are closer to "lice" than to "rice", the character conversion unit 32 converts them to "rice"; however, "lice" is unlikely to be used in a meal scene and is an advanced word that beginners hardly ever use. In addition, beginners generally find "r" harder to pronounce than "l". If the search range of the dictionary data D were not limited, a beginner's utterance intended as "rice" would therefore be likely to be converted to "lice". The present embodiment prevents such unintended conversions.
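The effect of limiting the search range can be sketched as follows. This is a minimal illustration under several assumptions: the phoneme data "LAIS" for "lice" and the "YES" entry are invented for the example (the contents of FIGS. 4 and 5 are not reproduced here), the function names are hypothetical, and closest-match lookup with Python's standard `difflib.get_close_matches` stands in for whatever matching the character conversion unit actually performs.

```python
import difflib

# Small dictionaries keyed by scene. Only the "rice" entries come from the
# description; "LAIS" and "YES" are hypothetical placeholder entries.
SMALL_DICTIONARIES = {
    "meal": {"RAIS": "rice", "raisu": "rice", "raiS": "rice"},  # D1
    "animals_plants": {"LAIS": "lice"},                          # D2 (assumed entry)
    "general": {"YES": "yes"},                                   # D3 (assumed entry)
}


def select_search_range(scene, proficiency):
    """Low proficiency: scene dictionary plus general one. High: all dictionaries."""
    if proficiency == "high":
        return list(SMALL_DICTIONARIES)
    return [scene, "general"]


def convert(phoneme_data, search_range):
    """Convert phoneme data to the closest entry within the selected dictionaries."""
    candidates = {}
    for name in search_range:
        candidates.update(SMALL_DICTIONARIES[name])
    matches = difflib.get_close_matches(phoneme_data, list(candidates), n=1, cutoff=0.5)
    return candidates[matches[0]] if matches else None


# A beginner in a meal scene says "rice" with an l-like first sound.
print(convert("LAIS", select_search_range("meal", "low")))
print(convert("LAIS", select_search_range("meal", "high")))
```

With the restricted range, the l-like utterance can only resolve to "rice"; with every small dictionary in range, the same input resolves to "lice", which is the unintended conversion the embodiment aims to avoid for beginners.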

On the other hand, when the user's proficiency is high, the search range limiting unit 35 selects more small dictionaries from the dictionary data D and sets the selected small dictionaries as the search range.

When the scene identifying unit 352 identifies the scene as the dialogue progresses, the search range limiting unit 35 may select the small dictionaries to be searched for each utterance unit. This makes it possible, for example, to avoid misrecognizing the input as an utterance that could occur in the same situation but could not occur as a response to the question at hand, which improves the accuracy of conversion into the intended character data.

When the search range is limited according to the user's utterance tendency, the small dictionaries may be classified according to the user's frequency of use, and a small dictionary consisting of frequently used terms may be searched preferentially.
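One way to picture frequency-based prioritization is sketched below. The practice log format, the `top_n` cut-off, and all function names are hypothetical; the description only states that a small dictionary of frequently used terms may be searched first.

```python
from collections import Counter


def frequent_terms(practice_log, top_n):
    """Return the top_n most frequently used words in past practice sessions."""
    counts = Counter(word for utterance in practice_log for word in utterance)
    return {word for word, _ in counts.most_common(top_n)}


def split_by_frequency(dictionary, frequent):
    """Split a {phoneme data: word} dictionary into frequent and remaining parts."""
    frequent_part = {p: w for p, w in dictionary.items() if w in frequent}
    remaining_part = {p: w for p, w in dictionary.items() if w not in frequent}
    return frequent_part, remaining_part


def convert(phoneme_data, ordered_dictionaries):
    """Search the small dictionaries in priority order; stop at the first match."""
    for small_dictionary in ordered_dictionaries:
        if phoneme_data in small_dictionary:
            return small_dictionary[phoneme_data]
    return None


PRACTICE_LOG = [["rice", "cake"], ["rice"]]  # hypothetical record of past utterances
DICTIONARY = {"RAIS": "rice", "raisu": "rice", "KEYK": "cake", "ke:ki": "cake"}
FREQUENT, OTHER = split_by_frequency(DICTIONARY, frequent_terms(PRACTICE_LOG, top_n=1))
print(convert("raisu", [FREQUENT, OTHER]))
```

Searching the frequent-term dictionary first means the most common inputs are resolved before the larger remainder is consulted.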

The dictionary data D may also include phoneme data that is not to be converted into character data. For example, phoneme data (eetto, nanndakke) corresponding to the user talking to himself or herself or using fillers in the native language (such as "eetto" ("um") and "nandakke?" ("what was it?")) may be included in the dictionary data D, so that the character conversion unit 32 does not convert such phonemes into character data even when they are input. As a result, only utterance content with a high likelihood of being spoken is treated as a recognition target.
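The handling of non-convertible phoneme data can be sketched as a simple filter. The filler entries "eetto" and "nanndakke" and the "rice" entries come from the description; the names `NON_CONVERTIBLE`, `WORDS`, and `convert_utterance`, and the choice to silently skip unknown input, are assumptions.

```python
# Phoneme data registered in the dictionary but never converted to character data.
NON_CONVERTIBLE = {"eetto", "nanndakke"}

# Ordinary entries mapping phoneme data to English character data.
WORDS = {"RAIS": "rice", "raisu": "rice", "raiS": "rice"}


def convert_utterance(phoneme_sequence):
    """Convert each item of phoneme data to a word, dropping registered fillers."""
    words = []
    for phoneme_data in phoneme_sequence:
        if phoneme_data in NON_CONVERTIBLE:
            continue  # recognized as native-language filler; produce no text
        word = WORDS.get(phoneme_data)
        if word is not None:
            words.append(word)
    return words


print(convert_utterance(["eetto", "raisu"]))
```

Registering fillers explicitly, rather than leaving them unmatched, keeps them from being forced onto the nearest English word.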

(Additional notes)
The present invention is not limited to the above embodiment. Various modifications are possible within the scope of the claims, and forms obtained by appropriately combining the technical means disclosed in the embodiment are also included in the technical scope of the present invention.

For example, in the above embodiment, all the functions of the control unit 3 are realized by the CPU of the language learning device 1, but some of the functions of the control unit 3 may be realized by another device such as a server. In this case, the language learning device according to the present invention is provided as a system that operates in cooperation with the server.

In the above embodiment, the "one character data" associated with phoneme data in the dictionary data is, in principle, one word, but the present invention is not limited to this. For example, a phrase consisting of multiple words, such as "have you" in the small dictionary D3 shown in FIG. 5, and a unit that constitutes phonemes within a single word also fall within "one character data" as recited in the claims.

1 Language learning device
2 Storage
3 Control unit
4 Display unit
5 Input unit
6 Microphone
7 Speaker
31 Phoneme conversion unit
32 Character conversion unit
33 Determination unit
34 Feedback unit
35 Search range limiting unit
351 Proficiency level grasping unit
352 Scene identifying unit
353 Utterance tendency grasping unit
D Dictionary data
D1 Small dictionary
D2 Small dictionary
D3 Small dictionary

Claims

1. A language learning device for a user whose native language is a first language to learn a second language, comprising:
a phoneme conversion unit that converts speech uttered by the user into phoneme data; and
a character conversion unit that searches dictionary data in which character data of the second language and phoneme data are associated with each other, and converts the phoneme data converted by the phoneme conversion unit into character data of the second language,
wherein the phoneme data convertible by the phoneme conversion unit includes first phonemes used by native speakers of the first language and second phonemes used by native speakers of the second language, and
in the dictionary data, one character data is associated with phoneme data consisting only of the first phonemes, phoneme data consisting only of the second phonemes, and phoneme data including both the first phonemes and the second phonemes.

2. The language learning device according to claim 1, further comprising a search range limiting unit that limits a search range of said dictionary data searched by said character conversion unit.

3. The language learning device according to claim 2, wherein said search range limiting unit determines said search range according to said user's proficiency level of said first language.

4. The language learning device according to claim 2, wherein said search range limiting unit determines said search range according to a scene in which said user converses.

5. The language learning device according to any one of claims 2 to 4, wherein said search range limiting unit determines said search range according to said user's utterance tendency.

6. A language learning program that causes a computer to function as each unit of the language learning device according to any one of claims 1 to 5.