KR20060073502A

KR20060073502A - Language learning system and voice data providing method for language learning

Info

Publication number: KR20060073502A
Application number: KR1020050128485A
Authority: KR
Inventors: Naohiro Emoto (나오히로 에모또)
Original assignee: Yamaha Corporation (야마하 가부시키가이샤)
Priority date: 2004-12-24
Filing date: 2005-12-23
Publication date: 2006-06-28
Also published as: CN1794315A; JP2006178334A; KR100659212B1; CN100585663C

Abstract

The present invention enables a learner to study using model voices that resemble the learner's own voice. A database stores, for each of a plurality of speakers, feature quantities extracted from that speaker's voice in association with one or more pieces of language-learning voice data uttered by that speaker. The learner's voice is acquired, and feature quantities of the learner's voice are extracted from it. The extracted feature quantities of the learner are compared with the feature quantities of the plurality of speakers recorded in the database, and the voice data of one speaker is selected from the database on the basis of this comparison. Speech for language learning is then reproduced and output according to the selected voice data.

Model voice, voice data, speaker feature quantity, learner feature quantity, user voice

Description

LANGUAGE LEARNING SYSTEM AND VOICE DATA PROVIDING METHOD FOR LANGUAGE LEARNING

FIG. 1 is a block diagram showing the functional configuration of a language learning system 1 according to a first embodiment of the present invention.

FIG. 2 illustrates the contents of a database DB1.

FIG. 3 is a block diagram showing the hardware configuration of the language learning system 1.

FIG. 4 is a flowchart showing the operation of the language learning system 1.

FIG. 5 is a flowchart showing the update operation of the database DB1 in the language learning system 1.

FIG. 6 illustrates the spectral envelopes of a model voice (top) and a user voice (bottom).

<Explanation of reference numerals for the main parts of the drawings>

1: language learning system

2: language learning system

11: storage unit

12: input unit

13: feature extraction unit

14: voice data extraction unit

15: playback unit

16: storage unit

17: comparison unit

18: DB update unit

21: speech rate conversion unit

101: CPU

102: RAM

104: HDD

105: display

106: microphone

107: voice processing unit

108: speaker

109: keyboard

110: bus

111: I/F

[Patent Document 1] Japanese Laid-Open Patent Publication No. 2002-244547

[Patent Document 2] Japanese Laid-Open Patent Publication No. 2004-133409

The present invention relates to a language learning system that supports language learning.

In learning a foreign language or one's native language, and particularly in self-study of pronunciation or speaking, a widely used method is to play back a model voice recorded on a recording medium such as a CD (Compact Disc) and to pronounce or speak in imitation of that model voice. The aim is to acquire correct pronunciation by imitating the model voice. To make such learning more effective, the learner needs to evaluate the difference between the model voice and his or her own voice. However, the model voice recorded on a CD is in most cases the voice of a particular announcer or native speaker. For many learners, the model voice is therefore uttered in a voice whose characteristics are entirely different from their own, which makes it difficult to judge, by comparison with the model voice, how accurate their own pronunciation has become.

Techniques addressing this problem include, for example, those described in Patent Documents 1 and 2. The technique of Patent Document 1 reflects parameters of the user's speech, such as intonation, speech rate, and voice quality, in the model voice, thereby converting the model voice into a voice resembling the user's. The technique of Patent Document 2 allows the learner to choose any one of a plurality of model voices.

However, although the technique of Patent Document 1 can correct intonation, it has difficulty correcting sounds whose pronunciation clearly differs, such as "r" and "l" or "s" and "th" in English. In addition, because it modifies the voice waveform, the processing becomes complicated. The technique of Patent Document 2, for its part, requires the learner to select a model voice personally, which is cumbersome.

The present invention has been made in view of the above circumstances, and its object is to provide a language learning system and method that allow a learner to study with a model voice resembling the learner's own voice, using simpler processing.

To solve the above problem, the present invention provides a language learning system comprising: a database that stores, for each of a plurality of speakers, feature quantities extracted from that speaker's voice in association with one or more pieces of that speaker's voice data; voice acquisition means for acquiring a learner's voice; feature extraction means for extracting feature quantities of the learner's voice from the voice acquired by the voice acquisition means; voice data selection means for comparing the learner's feature quantities extracted by the feature extraction means with the feature quantities of the plurality of speakers recorded in the database and, on the basis of this comparison, selecting the voice data of one speaker from the database; and reproduction means for outputting speech according to the one piece of voice data selected by the voice data selection means.

In a preferred aspect, the voice data selection means includes similarity calculation means for calculating, for each speaker, a similarity index representing the difference between the feature quantities of that speaker recorded in the database and the learner's feature quantities extracted by the feature extraction means, and, on the basis of the similarity indices so calculated, selects and extracts from the database one piece of voice data associated with the feature quantities of one speaker satisfying a predetermined condition. In this case, the predetermined condition may be that the voice data of the one speaker whose similarity index indicates the highest similarity is selected.

In another preferred aspect, the language learning system further comprises speech rate conversion means for converting the speech rate of the voice data selected by the voice data selection means, and the reproduction means may output speech according to the voice data whose speech rate has been converted by the speech rate conversion means.

In yet another preferred aspect, the language learning system may further comprise: storage means for storing a model voice; comparison means for comparing the model voice with the learner's voice acquired by the voice acquisition means and generating information indicating the similarity between the two; and database update means for adding the learner's voice acquired by the voice acquisition means to the database, in association with the feature quantities extracted by the feature extraction means, when the similarity indicated by the information generated by the comparison means satisfies a predetermined condition.

According to the present invention, speech uttered by a speaker whose voice characteristics resemble the learner's can be reproduced as the audio of an example sentence for study. The learner can therefore perceive the pronunciation to be imitated (the target pronunciation) more accurately, which improves learning efficiency.

<Embodiments>

Embodiments of the present invention are described below with reference to the drawings.

<1. Configuration>

FIG. 1 is a block diagram showing the functional configuration of a language learning system 1 according to a first embodiment of the present invention. A storage unit 11 stores a database DB1 in which feature quantities extracted from a speaker's voice are recorded in association with voice data of speech by that speaker. An input unit 12 acquires the voice of the learner (user) and outputs it as user voice data. A feature extraction unit 13 extracts feature quantities from the learner's voice. A voice data extraction (selection) unit 14 compares the feature quantities extracted by the feature extraction unit 13 with the feature quantities recorded in the database DB1, identifies the feature quantities of one speaker that satisfy a predetermined condition, and extracts (selects) from the database DB1 the voice data associated with those feature quantities. A playback unit 15 plays back the voice data extracted (selected) by the voice data extraction (selection) unit 14 and outputs it audibly through a speaker, earphones, or the like.

The contents of the database DB1 are described in detail later. The language learning system 1 also has the following components for updating the database DB1. A storage unit 16 stores a model voice database DB2 in which model voice data serving as a model for language learning is recorded in association with the text data of that model voice. A comparison unit 17 compares the user voice data acquired by the input unit 12 with the model voice data stored in the storage unit 16. If the comparison shows that the user voice satisfies a predetermined condition, a DB update unit 18 adds the user voice data to the database DB1.

FIG. 2 illustrates the contents of the database DB1. The database DB1 records a speaker ID ("ID001" in FIG. 2), which is an identifier specifying a speaker, and the feature quantities extracted from that speaker's voice data. The database DB1 also records, in association with one another, an example sentence ID, which is an identifier specifying an example sentence, the voice data of that example sentence, and the pronunciation level (described later) of that example sentence. The database DB1 holds a plurality of data sets, each consisting of an example sentence ID, voice data, and a pronunciation level, and each data set is recorded in association with the speaker ID assigned to the speaker of its voice data. In other words, the database DB1 holds voice data of a plurality of example sentences by a plurality of speakers, organized per speaker by speaker ID and feature quantities.
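The record layout described for the database DB1 can be pictured with a small in-memory model. This is only an illustrative sketch: the class names `ExampleRecord` and `SpeakerEntry`, the use of a three-value tuple for the feature quantities, the integer pronunciation level, and the sample values are all assumptions, not details given in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ExampleRecord:
    """One data set in DB1: example sentence ID, voice data, pronunciation level."""
    sentence_id: str
    voice_data: bytes          # placeholder for the recorded audio
    pronunciation_level: int   # scale is assumed; the text only names the field


@dataclass
class SpeakerEntry:
    """Everything stored for one speaker, keyed by speaker ID."""
    speaker_id: str
    features: Tuple[float, float, float]  # e.g. three formant frequencies in Hz
    records: List[ExampleRecord] = field(default_factory=list)


# DB1 modelled as a mapping from speaker ID to that speaker's entry.
db1: Dict[str, SpeakerEntry] = {}

entry = SpeakerEntry("ID001", (730.0, 1090.0, 2440.0))  # illustrative values
entry.records.append(ExampleRecord("S001", b"\x00\x01", 3))
db1[entry.speaker_id] = entry
```

Keying the mapping by speaker ID mirrors the statement that every data set is associated with the speaker ID of the person who uttered it.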

FIG. 3 is a block diagram showing the hardware configuration of the language learning system 1. A CPU (Central Processing Unit) 101 reads and executes programs stored in a ROM (Read Only Memory) 103 or an HDD (Hard Disk Drive) 104, using a RAM (Random Access Memory) 102 as a work area. The HDD 104 is a storage device that stores various applications and data; it also stores the database DB1 and the model voice database DB2. A display 105 is a display device, such as a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display), that displays characters and images under the control of the CPU 101. A microphone 106 is a sound pickup device for acquiring the user's voice and outputs a voice signal corresponding to the speech uttered by the user. A voice processing unit 107 has a function of converting the analog voice signal output by the microphone 106 into digital voice data and a function of converting voice data stored in the HDD 104 into a voice signal and outputting it to a speaker 108. The user can enter instructions to the language learning system 1 by operating a keyboard 109. The components described above are connected to one another via a bus 110. The language learning system 1 can also communicate with other devices via an I/F (interface) 111.

<2. Operation>

Next, the operation of the language learning system 1 according to this embodiment is described. The operation of playing back the audio of an example sentence is described first, followed by the operation of updating the contents of the database DB1. In the language learning system 1, the functions shown in FIG. 1 are realized by the CPU 101 executing a language learning program stored in the HDD 104. When starting the language learning program, for example, the learner (user) operates the keyboard 109 to enter a user ID, an identifier that specifies the learner. The CPU 101 stores the entered user ID in the RAM 102 as the user ID of the learner currently using the system.

<2-1. Voice Playback>

FIG. 4 is a flowchart showing the operation of the language learning system 1. When the language learning program is executed, the CPU 101 of the language learning system 1 searches the model voice database DB2 and creates a list of available example sentences. Based on this list, the CPU 101 displays on the display 105 a message prompting the user to select an example sentence. Following the message, the user selects one example sentence from the list. The CPU 101 plays back the audio of the selected example sentence (step S101). Specifically, the CPU 101 reads the model voice data of the example sentence from the model voice database DB2 and outputs it to the voice processing unit 107. The voice processing unit 107 performs digital-to-analog conversion on the input model voice data and outputs the result to the speaker 108 as an analog voice signal. The model voice is thus played back from the speaker 108.

The user listens to the model voice played back from the speaker 108 and utters the example sentence into the microphone 106 in imitation of the model voice. That is, the user voice is input (step S102). Specifically, when playback of the model voice ends, the CPU 101 displays on the display 105 a message prompting the user to utter the example sentence, such as "Now it is your turn. Please pronounce the example sentence." The CPU 101 also displays on the display 105 a message explaining how to input the user voice, such as "Press the space key, then speak, and press the space key again when you have finished." The user operates the keyboard 109 according to the displayed messages to input the user voice: the user presses the space key of the keyboard 109, utters the example sentence into the microphone 106, and presses the space key again when finished.

The microphone 106 converts the user's voice into an electrical signal and outputs it as a user voice signal. The voice processing unit 107 converts the user voice signal into digital voice data, which is recorded on the HDD 104 as user voice data. After playback of the model voice is complete, the CPU 101 starts recording the user voice data when the space key is pressed and stops recording when the space key is pressed again. In other words, the user voice uttered between the first press of the space key and the next press is recorded on the HDD 104.

Next, the CPU 101 performs feature extraction processing on the obtained user voice data (step S103). Specifically, the CPU 101 divides the voice data into segments of a predetermined duration (frames). The CPU 101 Fourier-transforms the waveform represented by the framed model voice data and the waveform represented by the user voice signal, takes the logarithm of the resulting amplitude spectrum, and applies an inverse Fourier transform to obtain a spectral envelope for each frame. From the spectral envelope thus obtained, the CPU 101 extracts the formant frequencies of the first and second formants. In general, a vowel is characterized by the distribution of its first and second formants. Starting from the beginning of the voice data, the CPU 101 matches the formant frequency distribution obtained for each frame against the formant frequency distribution of a predetermined vowel (for example, "a"). If the matching determines that a frame corresponds to the vowel "a", the CPU 101 calculates the formant frequencies of predetermined formants in that frame (for example, the first, second, and third formants). The CPU 101 stores the calculated formant frequencies in the RAM 102 as the feature quantity P of the user's voice.
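The framing and spectral-envelope steps above can be sketched in pure Python with a naive DFT. This is a rough illustration under stated assumptions: the text says only that a Fourier transform, a logarithm, and an inverse Fourier transform are applied per frame, so the low-quefrency cut-off `n_keep`, the final forward transform back to the frequency domain, and the peak-picking helper `formant_candidates` are conventional cepstral-smoothing details added here, not steps confirmed by the source. All function names are hypothetical.

```python
import cmath
import math
from typing import List


def split_frames(samples: List[float], frame_len: int) -> List[List[float]]:
    """Split a signal into consecutive non-overlapping frames (tail dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def dft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def log_spectrum(frame: List[float]) -> List[float]:
    """Logarithm of the amplitude spectrum of one frame."""
    return [math.log(abs(c) + 1e-12) for c in dft([complex(v) for v in frame])]


def spectral_envelope(frame: List[float], n_keep: int) -> List[float]:
    """Smoothed log-amplitude envelope of one frame via the cepstrum."""
    log_amp = log_spectrum(frame)
    cepstrum = dft([complex(v) for v in log_amp], inverse=True)
    n = len(cepstrum)
    # Assumed liftering step: keep low-quefrency terms and their mirror images.
    liftered = [c if (q < n_keep or q > n - n_keep) else 0j
                for q, c in enumerate(cepstrum)]
    return [c.real for c in dft(liftered)]


def formant_candidates(envelope: List[float], sample_rate: float,
                       count: int = 3) -> List[float]:
    """Frequencies (Hz) of the first `count` local maxima below Nyquist."""
    n = len(envelope)
    half = envelope[: n // 2]
    peaks = [k for k in range(1, len(half) - 1)
             if half[k - 1] < half[k] >= half[k + 1]]
    return [k * sample_rate / n for k in peaks[:count]]
```

A smaller `n_keep` gives a smoother envelope; with `n_keep = 1` the envelope collapses to the mean of the log spectrum, and with every coefficient kept it reproduces the log spectrum itself. The vowel-matching step that decides which frame supplies the feature quantity P is not covered by this sketch.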

Next, the CPU 101 extracts (selects) from the database DB1 the voice data associated with feature quantities similar to the feature quantity P of the user's voice (step S104). Specifically, it compares the extracted feature quantity P with the feature quantities recorded in the database DB1 and identifies the one closest to P. In the comparison, for example, the CPU 101 calculates the differences between the first to third formant frequencies of the feature quantity P and those of each feature quantity recorded in the database DB1, and sums the absolute values of the three differences to obtain a similarity index representing how close the two are. The CPU 101 identifies in the database DB1 the feature quantity with the smallest similarity index, that is, the feature quantity closest to P. The CPU 101 then extracts the voice data associated with the identified feature quantity and stores it in the RAM 102.
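The selection rule in step S104 amounts to a nearest-neighbour search under a sum of absolute differences. A minimal sketch follows; the function names, the dictionary layout, and the formant values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def similarity_index(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute differences of the formant frequencies (smaller is closer)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def select_speaker(learner: Sequence[float],
                   speakers: Dict[str, Sequence[float]]) -> str:
    """Return the ID of the speaker whose features give the smallest index."""
    return min(speakers, key=lambda sid: similarity_index(learner, speakers[sid]))


# Illustrative first-to-third formant frequencies in Hz (made-up values).
speakers = {
    "ID001": (730.0, 1090.0, 2440.0),
    "ID002": (660.0, 1720.0, 2410.0),
}
learner_p = (700.0, 1100.0, 2400.0)
best = select_speaker(learner_p, speakers)  # index 80 for ID001, 670 for ID002
```

Because the index is a distance, the "highest similarity" mentioned in the summary corresponds to the smallest index value.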

Next, the CPU 101 plays back the voice data (step S105). Specifically, the CPU 101 outputs the voice data to the voice processing unit 107. The voice processing unit 107 performs digital-to-analog conversion on the input voice data and outputs the result to the speaker 108 as a voice signal. The extracted voice data is thus played back from the speaker 108 as speech. Because the voice data was extracted by matching feature quantities, the reproduced speech has characteristics similar to those of the learner's voice. Consequently, even for an example sentence that was difficult to imitate merely by listening to speech uttered by a speaker with entirely different voice characteristics (an announcer, a native speaker, or the like), hearing it uttered by a speaker whose voice characteristics closely resemble the learner's own lets the learner understand the pronunciation to be imitated more accurately, which improves learning efficiency.

<2-2. Database Update>

Next, the update operation of the database DB1 is described.

FIG. 5 is a flowchart showing the update operation of the database DB1 in the language learning system 1. First, the model voice is played back and the user voice is input through the processing of steps S101 and S102 described above. Next, the CPU 101 compares the model voice with the user voice (step S201). Specifically, the CPU 101 divides the waveform represented by the model voice data into segments of a predetermined duration (frames), and likewise divides the waveform represented by the user voice data into frames. The CPU 101 Fourier-transforms the framed waveforms of the model voice data and the user voice signal, takes the logarithm of the resulting amplitude spectra, and applies an inverse Fourier transform to obtain a spectral envelope for each frame.

FIG. 6 illustrates the spectral envelopes of a model voice (top) and a user voice (bottom). The spectral envelopes shown in FIG. 6 consist of three frames, frame I to frame III. The CPU 101 compares the obtained spectral envelopes frame by frame and quantifies their similarity. The quantification (calculation of a similarity index) is performed, for example, as follows. The CPU 101 may plot the frequency and spectral density of a characteristic formant on a spectral density versus frequency diagram for each voice, and calculate as the similarity index the distance between the two plotted points summed over the entire voice data. Alternatively, it may calculate as the similarity index the difference in spectral density at a specific frequency integrated over the entire voice data. Because the model voice and the user voice usually differ in length (duration), it is preferable to equalize their lengths before performing the above processing.
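The second option above (the difference at a specific frequency accumulated over the whole utterance) can be sketched as follows. The text does not say how the lengths are equalized or whether the difference is signed, so this sketch assumes nearest-frame resampling and an absolute difference; `match_length`, `envelope_distance`, and the frequency-bin parameter `bin_index` are hypothetical names.

```python
from typing import List

Envelope = List[float]  # spectral density per frequency bin for one frame


def match_length(frames: List[Envelope], target_len: int) -> List[Envelope]:
    """Resample a frame sequence to target_len frames by nearest-frame picking."""
    n = len(frames)
    return [frames[min(n - 1, i * n // target_len)] for i in range(target_len)]


def envelope_distance(model: List[Envelope], user: List[Envelope],
                      bin_index: int) -> float:
    """Sum over frames of |model - user| at one frequency bin (smaller is closer)."""
    user = match_length(user, len(model))
    return sum(abs(m[bin_index] - u[bin_index]) for m, u in zip(model, user))


# Three model frames and a user utterance twice as long (made-up values).
model_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
user_frames = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0],
               [3.0, 4.0], [5.0, 6.0], [5.0, 6.0]]
distance = envelope_distance(model_frames, user_frames, bin_index=1)
```

Nearest-frame picking is the simplest possible alignment; a method such as dynamic time warping would tolerate uneven speaking tempo better, but nothing in the text indicates which approach was intended.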

Returning to FIG. 5, the CPU 101 determines, based on the calculated similarity index, whether to update the database DB1 (step S202). Specifically, the HDD 104 stores in advance a condition for additionally registering acquired voice data in the database DB1. The CPU 101 determines whether the similarity index calculated in step S201 satisfies this registration condition. If the registration condition is satisfied (step S202: Yes), the CPU 101 proceeds to step S203, described below. If it is not satisfied (step S202: No), the CPU 101 ends the processing.

If the registration condition is satisfied, the CPU 101 performs database update processing (step S203). Specifically, the CPU 101 assigns to the voice data that satisfied the registration condition the user ID specifying the learner (user) who is the speaker of that voice data. The CPU 101 searches the model voice database DB2 for a user ID identical to this user ID and additionally registers the voice data in the model voice database DB2 in association with that user ID. If the user ID extracted from the update request is not registered in the model voice database DB2, the CPU 101 additionally registers the user ID and registers the voice data in association with it. In this way, the learner's voice data is additionally registered in the database DB1, and the database is updated.
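Steps S202 and S203 can be condensed into one function, sketched below under several assumptions. The text never states the registration condition, so a simple threshold on the index (at or below `threshold`) is assumed. The passage names the model voice database DB2 in its intermediate steps but concludes that the database DB1 is updated; the sketch follows the conclusion and the functional description of the DB update unit 18, and writes to a dictionary standing in for DB1. The function name and record layout are hypothetical.

```python
from typing import Dict, Tuple

Features = Tuple[float, float, float]


def update_database(db1: Dict[str, dict], user_id: str, features: Features,
                    sentence_id: str, voice_data: bytes,
                    index: float, threshold: float) -> bool:
    """Register the learner's recording if it is close enough to the model voice."""
    # Step S202: assumed registration condition, index at or below a threshold.
    if index > threshold:
        return False
    # Step S203: add the user ID on first registration, then append the recording.
    entry = db1.setdefault(user_id, {"features": features, "records": []})
    entry["records"].append({"sentence_id": sentence_id,
                             "voice_data": voice_data})
    return True


db1: Dict[str, dict] = {}
accepted = update_database(db1, "U001", (700.0, 1100.0, 2400.0),
                           "S001", b"\x00\x01", index=5.0, threshold=10.0)
```

Using `setdefault` covers both branches described in the text: an existing user ID receives an additional recording, and an unknown user ID is registered first.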

The database update operation described above may be performed in parallel with the voice reproduction operation described earlier, or after the voice reproduction operation has completed. As learners' voice data is successively added in this way, the database DB1 accumulates voice data from a large number of speakers. Therefore, the more the language learning system 1 is used, the more speakers' voice data is registered in the database DB1, and the more likely it becomes that a new learner using the language learning system 1 will hear a voice whose features resemble their own.

<3. Modifications>

The present invention is not limited to the embodiment described above, and various modifications are possible.

<3-1. Modification 1>

In the embodiment described above, after the voice data extracted in step S104 is stored in the RAM 102, the CPU 101 may perform speech rate conversion processing on the voice data. Specifically, the RAM 102 stores in advance a variable a that specifies the ratio of the speech rate before and after the speech rate conversion processing. The CPU 101 multiplies the duration of the extracted voice data (the time required to reproduce the voice data from beginning to end) by a. When a > 1, the conversion lengthens the voice; that is, the speech rate becomes slower. Conversely, when a < 1, the conversion shortens the voice; that is, the speech rate becomes faster. In this embodiment, a value greater than 1 is set as the initial value of the variable a. Consequently, after the exemplary voice is reproduced and the user's voice is then input, the example sentence reproduced in a voice resembling the user's voice is played back more slowly than the exemplary voice. The learner can therefore recognize more clearly the pronunciation to imitate (the target pronunciation).
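The effect of the variable a on duration can be illustrated with a short sketch. The text does not name a speech rate conversion algorithm, so the sketch below uses naive linear-interpolation resampling purely to show how the sample count scales by a. This naive approach also shifts pitch; a practical system would more likely use a pitch-preserving time-scale modification technique. The function name `stretch_duration` is hypothetical.

```python
def stretch_duration(samples: list[float], a: float) -> list[float]:
    """Scale the duration of a sample sequence by the factor a.

    a > 1 lengthens the audio (slower speech); a < 1 shortens it
    (faster speech). Linear interpolation is an illustrative stand-in,
    not the method described in the source, and it alters pitch.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    n_in = len(samples)
    if n_in == 0:
        return []
    n_out = max(1, round(n_in * a))
    if n_in == 1 or n_out == 1:
        return [samples[0]] * n_out
    out = []
    for i in range(n_out):
        # Map output index i onto a fractional position in the input.
        pos = i * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


original = [0.0, 1.0, 2.0, 3.0]
slower = stretch_duration(original, 1.5)  # a > 1: more samples, longer playback
faster = stretch_duration(original, 0.5)  # a < 1: fewer samples, shorter playback
print(len(original), len(slower), len(faster))
```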

<3-2. Modification 2>

In the embodiment described above, step S104 extracts the voice data associated with the feature amount closest to the feature amount extracted from the learner's (user's) voice; however, the condition for extracting voice data is not limited to being closest to the feature amount of the learner's voice. For example, the database DB1 may record, in association with the voice data of each example sentence, the utterance level of that voice (an index indicating its degree of approximation to the exemplary voice; a higher utterance level indicates a closer approximation to the exemplary voice), and this utterance level may be included in the condition for selecting voice data. As a specific condition, for example, among the voice data whose utterance level is at or above a certain level, the voice data whose feature amount is closest may be extracted. Alternatively, among the voice data whose feature-amount approximation is at or above a certain value, the voice data with the highest utterance level may be extracted. The utterance level may be calculated, for example, in the same manner as the approximation index in step S201.
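The two selection conditions described above can be sketched as follows. The sketch assumes that each database entry holds three formant frequencies and an utterance level, that closeness of feature amounts is measured by Euclidean distance, and that "approximation at or above a certain value" can be expressed as a distance at or below a limit. These choices, and the names `Entry`, `closest_above_level`, and `best_level_within_distance`, are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    speaker_id: str
    features: tuple          # assumed (F1, F2, F3) formant frequencies in Hz
    utterance_level: float   # higher means closer to the exemplary voice


def distance(a, b) -> float:
    """Euclidean distance between two feature tuples (an assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def closest_above_level(entries, learner, min_level):
    """Among entries with a sufficient utterance level, pick the closest features."""
    candidates = [e for e in entries if e.utterance_level >= min_level]
    return min(candidates, key=lambda e: distance(e.features, learner), default=None)


def best_level_within_distance(entries, learner, max_distance):
    """Among entries with sufficiently close features, pick the highest utterance level."""
    candidates = [e for e in entries if distance(e.features, learner) <= max_distance]
    return max(candidates, key=lambda e: e.utterance_level, default=None)


entries = [
    Entry("A", (700, 1200, 2500), 0.90),
    Entry("B", (710, 1210, 2510), 0.40),
    Entry("C", (900, 1500, 2800), 0.95),
]
learner = (712, 1212, 2512)
print(closest_above_level(entries, learner, 0.8).speaker_id)
print(best_level_within_distance(entries, learner, 50).speaker_id)
```

Both functions return `None` when no entry passes the filter, so a caller would need a fallback, for example the closest-feature rule of the original embodiment.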

<3-3. Modification 3>

The system configuration is also not limited to that described in the embodiment above. The language learning system 1 may be connected to a server device via a network, and the server device may take over some of the functions of the language learning system described above.

Furthermore, in the embodiment described above, the functions of the language learning system are realized in software by the CPU 101 executing the language learning program; however, the system may instead be realized in hardware, using electronic circuits or the like corresponding to the functional components shown in FIG. 1.

<3-4. Modification 4>

The embodiment described above uses the formant frequencies of the first to third formants as the feature amount of a speaker's voice, but the feature amount of a voice is not limited to formant frequencies. A feature amount calculated by another speech analysis method, such as a spectrogram, may be used instead.

According to the present invention, it is possible to provide a language learning system and method that allow a learner to study using an exemplary voice similar to the learner's own voice, with simpler processing.

Claims

A language learning system comprising:

a database that stores, for each of a plurality of speakers, a feature amount extracted from the speaker's voice in association with one or more pieces of voice data of that speaker;

voice acquisition means for acquiring a learner's voice;

feature amount extracting means for extracting a feature amount of the learner's voice from the voice acquired by the voice acquisition means;

voice data selecting means for comparing the feature amount of the learner extracted by the feature amount extracting means with the feature amounts of the plurality of speakers recorded in the database and, based on the comparison, selecting voice data of one speaker from the database; and

reproducing means for outputting voice in accordance with the one piece of voice data selected by the voice data selecting means.

The language learning system of claim 1,

wherein the voice data selecting means includes approximation calculation means for calculating, for each speaker, an approximation index indicating the difference between the feature amount of that speaker recorded in the database and the feature amount of the learner extracted by the feature amount extracting means, and selects from the database one piece of voice data associated with the feature amount of one speaker that satisfies a predetermined condition, based on the approximation index calculated by the approximation calculation means.

The language learning system of claim 2,

wherein the predetermined condition is that the voice data of the one speaker corresponding to the approximation index indicating the highest degree of approximation is selected.

The language learning system of claim 1,

further comprising speech rate converting means for converting the speech rate of the voice data selected by the voice data selecting means,

wherein the reproducing means outputs voice in accordance with the voice data whose speech rate has been converted by the speech rate converting means.

The language learning system of any one of claims 1 to 4, further comprising:

storage means for storing an exemplary voice;

comparison means for comparing the exemplary voice with the learner's voice acquired by the voice acquisition means, and generating information indicating the degree of approximation between the two; and

database updating means for adding, when the degree of approximation indicated by the information generated by the comparison means satisfies a predetermined condition, the learner's voice acquired by the voice acquisition means to the database in association with the feature amount extracted by the feature amount extracting means.

A method of providing voice data to a learner using a database that stores, for each of a plurality of speakers, a feature amount extracted from the speaker's voice in association with one or more pieces of voice data for language learning spoken by that speaker, the method comprising:

acquiring a voice spoken by the learner;

extracting a feature amount of the learner's voice from the acquired voice;

comparing the extracted feature amount of the learner with the feature amounts of the plurality of speakers recorded in the database, and selecting voice data of one speaker from the database based on the comparison; and

outputting voice in accordance with the selected piece of voice data.