KR20020007597A

KR20020007597A - Method of speaking test and foreign language pronunciation learning using automatic pronunciation comparison method on internet

Info

Publication number: KR20020007597A
Application number: KR1020000040979A
Authority: KR
Inventors: 이수영; 정소영
Original assignee: 윤덕용; 한국과학기술원
Priority date: 2000-07-18
Filing date: 2000-07-18
Publication date: 2002-01-29
Also published as: JP2002040926A; KR100568167B1

Abstract

PURPOSE: A method for foreign language pronunciation learning and oral testing is provided that objectively tests a learner's foreign language ability by letting the learner compare his or her pronunciation with a native speaker's quickly and accurately through an automatic voice comparison algorithm based on DTW (Dynamic Time Warping). CONSTITUTION: A learner connects to a server computer through the Internet (S400). The learner enters his or her information (S402). The learner selects a pronunciation exercise scenario from several scenarios provided by the server (S404). The learner listens to a native speaker's pronunciation in the selected scenario (S406). The learner records his or her own pronunciation (S408). The difference between the learner's and the native speaker's pronunciation is computed automatically, and the comparison result is displayed on a screen (S410). It is then determined whether the learner will continue practicing (S412). If the learner wants to stop, it is determined whether the learner will practice with another scenario (S414).

Description

Foreign language pronunciation learning and verbal test using automatic phonetic comparison method on the Internet {METHOD OF SPEAKING TEST AND FOREIGN LANGUAGE PRONUNCIATION LEARNING USING AUTOMATIC PRONUNCIATION COMPARISON METHOD ON INTERNET}

The present invention relates to a method of providing a learning service for foreign language pronunciation learning and oral testing using an automatic pronunciation comparison method on the Internet. In particular, it relates to a method of providing a service for interactive foreign language pronunciation learning and oral testing in which an automatic voice comparison network is trained on data consisting of native speaker speech, learner speech, and expert evaluations, and the trained network is then used to compare a learner's utterance with the native speaker's pronunciation and run a test that measures pronunciation accuracy.

In conventional foreign language learning, listening and speaking were practiced by repeatedly listening to a native speaker's pronunciation on cassette or video, imitating it, and judging for oneself how close the imitation was to the native speaker's pronunciation, repeating the process until the pronunciation became accurate.

Because such a method does not allow learners to evaluate their own foreign language pronunciation objectively, there have been efforts to measure the difference between a learner's pronunciation and a native speaker's on an objective scale.

That is, conventional methods mainly compared the native speaker's and the learner's pronunciation by simply comparing speech differences in the time domain, for example differences in the tone of the speech signal and in total utterance duration.

More recently, pronunciation comparison methods based on speech signal processing have been developed. Most of them first recognize the learner's utterance with a Hidden Markov Model (hereinafter HMM) and then compare it with the native speaker's speech.

However, if the learner speaks in an environment with ambient noise, or if the learner's pronunciation is unclear and a recognition error occurs, the computed difference from the native speaker's pronunciation is likely to be meaningless.

In addition, a learner's foreign language listening and speaking ability could be evaluated through professional tests such as the TSE (Test of Speaking English) and the SEPT (Spoken English Proficiency Test), conducted as interviews in which a native-speaking language expert asks questions in person and listens to the answers at a designated time and place.

However, these tests are also constrained in time and place, and the expert's evaluation may be affected by subjective factors such as the expert's fatigue or the surrounding circumstances.

Accordingly, the present invention is intended to solve the above problems. One object of the invention is to provide a foreign language pronunciation learning and oral test method using an automatic pronunciation comparison method on the Internet, which implements a language learning method that can compare the learner's speech with native speaker speech quickly and accurately, without recognizing the learner's speech, through an automatic voice comparison algorithm based on Dynamic Time Warping (hereinafter DTW).

Another object of the invention is to provide a foreign language pronunciation learning and oral test method using an automatic pronunciation comparison method on the Internet that lets a learner practice foreign language pronunciation and take an oral test over the web at whatever time and place the learner chooses.

FIG. 1 is a block diagram of the automatic voice comparison network algorithm for foreign language pronunciation learning and oral testing on the Internet according to the present invention.

FIG. 2 is a partial detailed view of the automatic voice comparison network shown in FIG. 1.

FIG. 3 is a detailed flowchart of the intensity difference comparison in the phonological difference calculation model shown in FIG. 2.

FIG. 4 is a partial detailed view of the time-frequency dynamic warping shown in FIG. 3.

FIGS. 5A to 5C are comparison graphs for the prosody difference calculation model shown in FIG. 2.

FIG. 6 is a schematic diagram of an Internet system for foreign language pronunciation learning and oral testing according to the present invention.

FIG. 7 is a flowchart of data processing between a learner computer and a server computer according to the present invention.

FIG. 8 is a flowchart of the foreign language pronunciation learning process according to an embodiment of the present invention.

FIG. 9 is a flowchart of the foreign language pronunciation oral test process according to an embodiment of the present invention.

<Description of reference numerals for the main parts of the drawings>

10: learner voice signal 20: native speaker voice signal

30: automatic voice comparison network 40: DTW-based difference comparison model

42: phonological difference calculation model 44: prosody difference calculation model

50: error-correction neural network 60: expert evaluation comparison network

70: error calculation network

300: server computer for foreign language pronunciation learning and testing

320: Internet 340: learner computer

As a technical idea for achieving the above objects, the present invention provides,

in a foreign language pronunciation learning method that automatically compares a native speaker's speech with a learner's speech and computes the difference,

a foreign language pronunciation learning and oral test method using an automatic pronunciation comparison method on the Internet, in which the phonological and prosodic difference values between the learner's speech and the native speaker's speech are computed by a DTW-based difference comparison network, the difference from the comparison score given by an expert evaluation comparison network is computed by an error calculation network, and the DTW-based difference comparison network is trained so that this difference decreases.

Hereinafter, the configuration and operation of embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the algorithm of the automatic voice comparison network for foreign language pronunciation learning and oral testing on the Internet according to the present invention.

First, the data given for building the automatic voice comparison network algorithm are pronunciation difference scores assigned by experts who compared each word or sentence as spoken by a native speaker and by a learner. The difference score usually takes one of five discrete values from 1 to 5. The native speaker's speech is assumed to be transcribed data, that is, data labeled with phonetic symbols over time.

The automatic voice comparison network algorithm shown in FIG. 1 consists of a learner voice signal (10), a native speaker voice signal (20), an automatic voice comparison network (30), an expert evaluation comparison network (60), and an error calculation network (70).

The phonological and prosodic differences between the learner's voice signal (10) and the native speaker's voice signal (20) are computed by the automatic voice comparison network (30).

The automatic voice comparison network (30) further includes a DTW-based difference comparison network (40) and an error-correction neural network (50).

The error calculation network (70) computes the difference between the score from the expert evaluation comparison network (60) and the comparison score obtained by the automatic voice comparison network (30), and the error-correction neural network (50) is trained so that this difference decreases.

FIG. 2 is a partial detailed view of the DTW-based difference comparison model shown in FIG. 1.

Looking more closely at the DTW-based difference comparison model (also called the DTW-based difference comparison network) shown in FIG. 2, the phonological difference and the prosodic difference between the learner's voice signal (10) and the native speaker's voice signal (20) are computed by the DTW-based difference comparison model (40).

First, the phonological difference calculation model (42) comprises an intensity difference comparison block (42a), a time difference comparison block (42b), and a frequency difference comparison block (42c). For a given native speaker sentence, it computes the accuracy of the learner's pronunciation word by word, syllable by syllable, and phoneme by phoneme.

The intensity difference comparison block (42a) removes all speaker-dependent features from the speech signals and then computes the difference between the two signals.

In other words, it computes the difference in the linguistic message alone between the learner's utterance and the native speaker's speech.

The time difference comparison block (42b) computes the difference in duration of sentences, words, syllables, phonemes, and so on, and the frequency difference comparison block (42c) computes the difference in formant positions between the learner's utterance and the native speaker's speech.

Next, the prosody difference calculation model (44) determines whether the learner correctly pronounced the transitions between phonemes, between syllables, and between words across the whole sentence.

The prosody difference calculation model (44) comprises a stress difference comparison block (44a), an intonation difference comparison block (44b), and a rhythm difference comparison block (44c).

The output of the prosody difference calculation model (44) is obtained from the difference between the pitch contours of the learner's and the native speaker's speech. The pitch contour over time is extracted from each utterance with a conventional pitch detection method.

The stress difference comparison block (44a) computes the difference in the relative positions of the maximum peaks in the pitch contours.

The intonation difference comparison block (44b) computes its value from the difference in slope between the two pitch contours at the end of the learner's and the native speaker's utterances, and the rhythm difference comparison block (44c) computes its value from the relative positions and sizes of the peaks and valleys that appear between adjacent words or syllables.

The six output values computed by the DTW-based difference comparison model (40) as described above pass through the error-correction neural network (50) before being compared with the expert score from the expert evaluation comparison network (60).

The error-correction neural network (50) combines the six outputs of the phonological and prosodic difference calculation network, that is, the DTW-based difference comparison network (40), nonlinearly so that the automatically computed comparison values approach the expert's scores. A two-layer multi-layer perceptron (MLP) is used as the structure of the error-correction neural network (50).

The error between the value computed by the automatic pronunciation comparison network and the expert score from the expert evaluation comparison network (60) is computed and used to train the synaptic weights of the error-correction neural network (50).

Training uses the conventional mean squared error function and the error back-propagation algorithm.
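
As a rough illustration of the training just described, the following sketch builds a two-layer perceptron that maps six difference values to one score and trains it with squared error and back-propagation. The hidden layer size, tanh activation, learning rate, and the toy data in TOY_DATA are all assumptions made for the example; the patent specifies none of them.

```python
import math
import random

# Hypothetical training pairs: six difference values -> expert score (1..5).
TOY_DATA = [
    ([0.1, 0.2, 0.1, 0.0, 0.1, 0.2], 5.0),
    ([0.4, 0.5, 0.3, 0.4, 0.5, 0.4], 3.0),
    ([0.6, 0.7, 0.6, 0.7, 0.6, 0.6], 2.0),
    ([0.9, 0.8, 0.9, 1.0, 0.8, 0.9], 1.0),
]


def init_mlp(n_in=6, n_hidden=4, seed=0):
    """Small random weights; the last entry of each weight row is a bias."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w1, w2


def forward(w1, w2, x):
    """Return (hidden activations, scalar output) for one input vector."""
    xb = list(x) + [1.0]
    hidden = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w1]
    out = sum(w * v for w, v in zip(w2, hidden + [1.0]))
    return hidden, out


def mean_squared_error(w1, w2, data):
    return sum((forward(w1, w2, x)[1] - t) ** 2 for x, t in data) / len(data)


def train_epoch(w1, w2, data, lr=0.02):
    """One pass of per-sample gradient descent on 0.5 * (output - target)^2."""
    for x, target in data:
        hidden, out = forward(w1, w2, x)
        err = out - target
        # Back-propagate through the output weights before updating them.
        deltas = [err * w2[j] * (1.0 - h * h) for j, h in enumerate(hidden)]
        for j, v in enumerate(hidden + [1.0]):
            w2[j] -= lr * err * v
        xb = list(x) + [1.0]
        for j, delta in enumerate(deltas):
            for i, v in enumerate(xb):
                w1[j][i] -= lr * delta * v
```

Calling train_epoch repeatedly on TOY_DATA should drive mean_squared_error down; in practice the inputs would be the six block outputs and the targets the expert scores.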

FIG. 3 is a detailed flowchart of the intensity difference comparison in the phonological difference calculation model shown in FIG. 2.

As the detailed flowchart of the intensity difference comparison block in FIG. 3 shows, the intensity difference between the native speaker's and the learner's speech is computed by the following algorithm.

First, endpoints are extracted (S102) from the native speaker's voice signal (S100) and likewise (S103) from the learner's voice signal (S101), and both signals pass through an energy normalization block (S104).

After differences in output energy caused by the microphone are removed in this way, the signals are split into frames (S105, S106) and Fourier transformed (S107, S108).

Frame blocking (S105, S106) divides the incoming time-series speech signal into segments of a few tens of milliseconds and applies a Hamming window or a Hanning window to each.
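
The frame blocking step can be sketched as follows. The 20 ms frame length and the 10 ms hop (overlapping frames) are assumptions for illustration; the text only says frames are a few tens of milliseconds long and does not mention overlap.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def frame_signal(samples, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Split samples into windowed frames; frame_ms and hop_ms are assumed values."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

Each returned frame would then be passed to the Fourier transform step.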

The Fourier transform (S107, S108) converts each frame of the two speech signals from a time-domain signal to a frequency-domain signal.

Next, time-frequency dynamic warping (S109) removes speaker-dependent characteristics through a linear frequency transform and DTW, after which the signals undergo frequency warping to the Bark scale (S110, S111) and intensity warping to loudness units (S112, S113).

Time-frequency dynamic warping (S109) is a block for removing the effects of speaker differences between the native speaker's and the learner's speech, namely differences in utterance duration and frequency-domain differences caused by differing vocal tract length. Duration differences are removed by DTW, and differences due to vocal tract length are removed by the linear frequency transform.

Frequency warping to the Bark scale (S110, S111) converts the speech signal from hertz (Hz) to Bark, a psychoacoustic unit of frequency.
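
The patent does not give the Hz-to-Bark formula. One common choice, shown here as an assumption, is the approximation used in PLP analysis, Bark = 6 * asinh(f / 600); the function names are illustrative.

```python
import math


def hz_to_bark(freq_hz):
    """Hz -> Bark using the approximation from PLP analysis (assumed here)."""
    return 6.0 * math.asinh(freq_hz / 600.0)


def bark_to_hz(bark):
    """Inverse of hz_to_bark."""
    return 600.0 * math.sinh(bark / 6.0)
```

The mapping is close to linear at low frequencies and compresses high frequencies, so equal steps in Bark correspond to progressively wider bands in Hz.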

Intensity warping to loudness units (S112, S113) converts the spectral energy produced by the Fourier transform (S107, S108) into loudness, a psychoacoustic unit of intensity.

Next, after an inverse Fourier transform (S114, S115), cepstral feature vectors are extracted by the cepstrum calculation blocks (S116, S117).

Because the loudness-warped signals are real-valued and symmetric, the inverse Fourier transform (S114, S115) is computed as a cosine transform.

The cepstral feature vectors to be compared for the native speaker's voice signal (S100) and the learner's voice signal (S101) are the final output of the cepstrum calculation blocks (S116, S117).

The method described above resembles the conventional PLP (perceptual linear prediction) feature extraction method. It differs in that a time-frequency dynamic warping block (S109) is added to the feature extraction process to remove speaker characteristics from the speech signal.

Next, the frame-by-frame distance is computed (S118) and converted into units of phonological intensity difference (S119).

The frame distance calculation block (S118) computes the distance between the native speaker's and the learner's feature vectors frame by frame. The Euclidean distance is used, and the distances for all frames are summed to give the pronunciation difference between the two utterances.

The phonological intensity difference calculation block (S119) then applies a linear or logistic transform to the pronunciation difference obtained in the frame distance calculation block (S118) so that it ranges from 1 to 5 for comparison with the expert score.
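
A minimal sketch of steps S118 and S119 follows, assuming the two feature sequences have already been time-aligned to the same number of frames. The parameters d_max, midpoint, and scale are hypothetical and would have to be tuned, and treating a larger distance as a larger score is also an assumption, since the text does not say which end of the 1 to 5 range means "close".

```python
import math


def total_frame_distance(native_frames, learner_frames):
    """Sum of per-frame Euclidean distances between aligned feature vectors."""
    if len(native_frames) != len(learner_frames):
        raise ValueError("frame sequences must be aligned to equal length")
    return sum(math.dist(a, b) for a, b in zip(native_frames, learner_frames))


def linear_score(distance, d_max):
    """Linear map of [0, d_max] onto [1, 5], clamped at both ends."""
    ratio = min(max(distance / d_max, 0.0), 1.0)
    return 1.0 + 4.0 * ratio


def logistic_score(distance, midpoint, scale):
    """Logistic map onto the open interval (1, 5); equals 3 at the midpoint."""
    return 1.0 + 4.0 / (1.0 + math.exp(-(distance - midpoint) / scale))
```

The logistic form saturates smoothly for very small or very large distances, whereas the linear form needs explicit clamping.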

FIG. 4 is a partial detailed view of the time-frequency dynamic warping step shown in FIG. 3.

In FIG. 4, the Fourier-transformed native speech (200) and learner speech (201) are warped in time and frequency so that the frame-by-frame distance between their cepstral feature vectors is minimized.

That is, the learner speech (201) passes through a linear frequency warping network (202) to remove differences due to vocal tract length, which is regarded as the main cause of differences between speakers' speech signals, and through nonlinear dynamic time warping (203) to remove the difference in timing relative to the native speech (200).

The signals then enter the cepstral feature extraction blocks (204, 205), where the cepstra are computed, and the frame distance calculation block (206) computes the Euclidean distance between the cepstral vectors. The linear frequency warping (202) and the nonlinear dynamic time warping (203) are carried out so that this error is minimized.
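
The joint minimization can be sketched as a search over a linear frequency scale factor combined with classic DTW. This is a simplified illustration, not the patented procedure: it measures distance directly on the warped spectral frames rather than on cepstra, and the candidate factors in alphas are assumed values.

```python
import math


def dtw_cost(a, b):
    """Cumulative Euclidean cost of the best monotonic alignment of a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def warp_frequency(frame, alpha):
    """Linear frequency warp: output bin k reads input position k * alpha."""
    last = len(frame) - 1
    out = []
    for k in range(len(frame)):
        pos = min(k * alpha, last)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * frame[lo] + frac * frame[hi])
    return out


def best_alignment(native, learner, alphas=(0.8, 0.9, 1.0, 1.1, 1.2)):
    """Return (alpha, cost) for the frequency scale giving the lowest DTW cost."""
    best = None
    for alpha in alphas:
        warped = [warp_frequency(f, alpha) for f in learner]
        cost = dtw_cost(native, warped)
        if best is None or cost < best[1]:
            best = (alpha, cost)
    return best
```

DTW absorbs the duration differences, while the chosen alpha stands in for the vocal-tract-length normalization described above.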

Meanwhile, the output of the time difference comparison block (42b) of FIG. 2 can be computed from the amount of warping over time in the learner's and the native speaker's feature vectors produced by the time-frequency dynamic warping of FIG. 4.

That is, the phoneme-by-phoneme duration differences between the two time-aligned speech signals are summed and divided by the total number of phonemes, and the result is the output of the time difference comparison block (42b).
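
The averaging described above amounts to a mean per-phoneme duration difference. The sketch below assumes each utterance is given as a list of (start, end) times per phoneme in the same order, and that the absolute difference is used; the text does not say whether the differences are signed.

```python
def duration_difference(native_bounds, learner_bounds):
    """Mean absolute per-phoneme duration difference, in the units of the input."""
    if len(native_bounds) != len(learner_bounds) or not native_bounds:
        raise ValueError("both utterances need the same non-zero phoneme count")
    total = 0.0
    for (n_start, n_end), (l_start, l_end) in zip(native_bounds, learner_bounds):
        total += abs((n_end - n_start) - (l_end - l_start))
    return total / len(native_bounds)
```

For example, with phoneme durations of 0.1 s and 0.2 s for the native speaker and 0.2 s and 0.1 s for the learner, the result is about 0.1 s.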

Likewise, the frequency difference comparison block (42c) of FIG. 2 works in a manner similar to the time difference comparison block (42b): after a linear transform along the frequency axis, it computes its value from the position differences of the first formant (F1), second formant (F2), third formant (F3), and so on between the learner's utterance and the native speaker's speech.

FIGS. 5A to 5C are comparison graphs for the prosody difference calculation model shown in FIG. 2.

As shown in FIG. 5, the prosodic difference between the two voice signals is computed from the difference between their pitch contours. Pitch is obtained with conventional methods such as frequency filtering or the cepstrum, and linear regression is used to make the pitch contour continuous across voiced and unvoiced segments.

FIG. 5A is a graph of the pitch contours over time for a sentence as spoken by the native speaker and by the learner.

The stress difference comparison block (44a) of FIG. 2 measures the stress difference between the two utterances by how far apart the syllables or words carrying the maximum pitch peak are. That is, it computes the time difference between the syllable the learner stresses and the syllable the native speaker stresses, and the number of syllables within that interval gives the stress difference.
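
One way to read the stress comparison is as a syllable-index offset between the two maximum pitch peaks. The sketch assumes each pitch contour has already been segmented into per-syllable lists of pitch values and that both utterances contain the same syllables in the same order; neither the segmentation method nor the function names come from the patent.

```python
def stressed_syllable(pitch_by_syllable):
    """Index of the syllable that contains the highest pitch value."""
    peaks = [max(values) for values in pitch_by_syllable]
    return peaks.index(max(peaks))


def stress_difference(native_syllables, learner_syllables):
    """Number of syllables separating the two stressed syllables."""
    return abs(stressed_syllable(native_syllables) - stressed_syllable(learner_syllables))
```

A result of 0 means both speakers placed the main stress on the same syllable.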

FIG. 5B illustrates the intonation difference comparison block (44b) of FIG. 2. The intonation difference is computed from the difference in the slope of the pitch contour at the end of the sentence.

That is, in a declarative sentence the pitch slope at the end of the sentence is negative, whereas in an interrogative sentence it is generally positive. The intonation difference between the two utterances can be obtained from this slope difference.
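
The end-of-sentence slope can be estimated with a least-squares line fit over the tail of each pitch contour. How much of the contour counts as the "end" is not stated; the tail_fraction of 0.2 used here is an assumption.

```python
def slope(times, values):
    """Least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def final_slope(times, pitch, tail_fraction=0.2):
    """Slope over the last tail_fraction of the contour (at least two points)."""
    k = max(2, int(len(times) * tail_fraction))
    return slope(times[-k:], pitch[-k:])


def intonation_difference(native, learner):
    """Absolute slope difference; each argument is a (times, pitch) pair."""
    return abs(final_slope(*native) - final_slope(*learner))
```

A falling native contour compared with a rising learner contour yields a large value, which matches the declarative versus interrogative contrast described above.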

FIG. 5C is a graph illustrating the rhythm difference comparison block (44c) of FIG. 2. In the pitch contours of the two utterances, triangles mark peaks and inverted triangles mark valleys. The rhythm difference between the two utterances can be obtained from the differences in the number and size of these peaks and valleys.
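
A simple way to collect the peaks and valleys is to look for strict local maxima and minima in each pitch contour. The sketch returns the count difference and the size difference separately, because the patent does not say how the two are combined into a single rhythm score; pairing extrema in order of appearance is also an assumption.

```python
def extrema(contour):
    """Return (peaks, valleys) as lists of (index, value) for strict local extrema."""
    peaks, valleys = [], []
    for i in range(1, len(contour) - 1):
        if contour[i - 1] < contour[i] > contour[i + 1]:
            peaks.append((i, contour[i]))
        elif contour[i - 1] > contour[i] < contour[i + 1]:
            valleys.append((i, contour[i]))
    return peaks, valleys


def rhythm_difference(native, learner):
    """Return (count difference, size difference) over peaks and valleys."""
    count_diff = 0
    size_diff = 0.0
    for n_ext, l_ext in zip(extrema(native), extrema(learner)):
        count_diff += abs(len(n_ext) - len(l_ext))
        # Pair extrema in order of appearance; unmatched ones only affect the count.
        size_diff += sum(abs(nv - lv) for (_, nv), (_, lv) in zip(n_ext, l_ext))
    return count_diff, size_diff
```

Real pitch contours are noisy, so some smoothing before extrema detection would likely be needed in practice.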

FIG. 6 is a schematic diagram of an Internet system for foreign language pronunciation learning and oral testing according to the present invention.

As shown in FIG. 6, the system consists of a server computer (300) for foreign language pronunciation learning and oral testing, an Internet (320) network connected to the server computer (300), learner computers (340a, 340b, ...) that connect through the Internet (320), and learners (360a, 360b, ...).

Each learner (360a, 360b, ...) has a microphone and a speaker in order to hear and record voice signals.

FIG. 7 shows the sequence of data processing steps that take place between the learner computer and the server computer.

In FIG. 7A most of the algorithm runs on the learner computer, whereas in FIG. 7B most of it runs on the server computer.

Referring first to FIG. 7A, the learner computer (340) takes the learner's voice signal (342), recorded through a microphone, and the native speaker voice signal (304) to be practiced, fetched from the native speaker voice signal database (302) of the server computer (300). It quantifies the difference between the two through the automatic pronunciation comparison algorithm (344) and shows the result on the learner computer's screen through the pronunciation comparison result display (346) block.

The server computer (300) holds the native speaker voice signal database (302) and sends the native speaker voice signal (304) to the learner computer (340) over the Internet according to the pronunciation practice scenario the learner requested.

Referring next to FIG. 7B, the learner computer (340) sends the learner voice signal (342), captured through the microphone as above, to the server computer (300), where the native speaker voice signal (304) to be practiced is selected from the native speaker voice signal database (302).

The automatic pronunciation comparison algorithm (306) then quantifies the difference between the two utterances, and the result is sent to the pronunciation comparison result display (346) of the learner computer and shown on the screen.

FIG. 8 is a flowchart illustrating the process of foreign language pronunciation learning according to an embodiment of the present invention.

Referring to FIG. 8, the process consists of a step (S400) in which the learner connects from his or her own computer to the server computer through the Internet;

a step (S402) in which the learner enters his or her information;

a step (S404) in which the learner selects a desired scenario from among the several pronunciation practice scenarios provided by the server;

a step (S406) of listening to the native speaker's pronunciation of a sentence in the selected scenario;

a step (S408) of recording the learner's own pronunciation;

a step (S410) of automatically calculating the difference between the two voices and displaying the comparison result on the screen;

a step (S412) of determining whether to continue the sentence pronunciation practice; and

a step (S414) of selecting, when the learner chooses to stop at step S412, whether to practice with another scenario.
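The control flow of steps S404 to S414 can be sketched as two nested loops. This is an illustrative sketch only: the function name, the callback parameters, and the assumption that a scenario is a list of sentences practiced in order (wrapping around) are not taken from the patent, and steps S400 and S402 (connection and learner information) are omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def practice_session(
    scenarios: Dict[str, List[str]],
    choose_scenario: Callable[[List[str]], str],
    listen: Callable[[str], None],
    record: Callable[[str], Sequence[float]],
    compare: Callable[[str, Sequence[float]], float],
    keep_practicing: Callable[[], bool],
    another_scenario: Callable[[], bool],
) -> List[Tuple[str, str, float]]:
    """Run the FIG. 8 loop and return (scenario, sentence, score) records."""
    results: List[Tuple[str, str, float]] = []
    while True:
        name = choose_scenario(sorted(scenarios))          # S404
        sentences = scenarios[name]
        index = 0
        while True:
            # Assumption: sentences are practiced in order, wrapping around.
            sentence = sentences[index % len(sentences)]
            listen(sentence)                               # S406
            recording = record(sentence)                   # S408
            score = compare(sentence, recording)           # S410
            results.append((name, sentence, score))
            index += 1
            if not keep_practicing():                      # S412
                break
        if not another_scenario():                         # S414
            return results
```

Passing the user interactions in as callbacks keeps the loop itself independent of any particular user interface or audio library.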

FIG. 9 is a flowchart illustrating the process of a foreign language pronunciation oral test according to an embodiment of the present invention, showing how a learner connects to the server computer and tests his or her foreign language pronunciation ability.

Referring to FIG. 9, the process consists of a step (S500) in which the learner connects to the server computer;

a step (S502) in which the learner enters his or her information;

a step (S504) in which the learner selects the difficulty level of the words or sentences to be used for the pronunciation test;

a step (S506) of listening to the native speaker's pronunciation of a sentence of the selected difficulty level;

a step (S508) in which the learner records his or her own pronunciation;

a step (S510) of checking whether all the sentences to be tested have been pronounced;

a step (S512) of selecting whether to test again with questions of a different difficulty level; and

a step (S514) of displaying on the screen the final pronunciation comparison results for the voices pronounced by the learner.
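Unlike the practice flow of FIG. 8, the test flow shows a result only once, at the end (S514). The sketch below illustrates steps S504 to S514 under stated assumptions: the function name and callbacks are hypothetical, and reporting the mean score per difficulty level is an assumed way of forming the "final" result, which the text does not specify.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def oral_test(
    items_by_level: Dict[str, List[str]],
    choose_level: Callable[[List[str]], str],
    listen: Callable[[str], None],
    record: Callable[[str], Sequence[float]],
    compare: Callable[[str, Sequence[float]], float],
    retest: Callable[[], bool],
) -> Dict[str, float]:
    """Run the FIG. 9 flow and return a mean comparison score per level."""
    recordings: List[Tuple[str, str, Sequence[float]]] = []
    while True:
        level = choose_level(sorted(items_by_level))       # S504
        # The for loop ends once every sentence has been pronounced (S510).
        for sentence in items_by_level[level]:
            listen(sentence)                               # S506
            recordings.append((level, sentence, record(sentence)))  # S508
        if not retest():                                   # S512
            break

    # S514: compare everything at the end and build the final report.
    scores: Dict[str, List[float]] = {}
    for level, sentence, recording in recordings:
        scores.setdefault(level, []).append(compare(sentence, recording))
    return {level: sum(vals) / len(vals) for level, vals in scores.items()}
```

Deferring the comparison to the end mirrors the flowchart, where the learner gets no per-sentence feedback during the test.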

As described above, the foreign language pronunciation learning and oral test method using the automatic pronunciation comparison method on the Internet according to the present invention has the following advantages.

First, by comparing the voice of a learner studying foreign language pronunciation with the voice of a native speaker, using an automatic voice comparison algorithm based on speech signal processing technology, the learner can obtain an objective numerical evaluation of his or her pronunciation ability.

Second, by providing a foreign language learning service in which pronunciation learning and oral testing with the automatic voice comparison algorithm can be done on the web, that is, on a computer connected to the Internet, the learner can verify his or her foreign language pronunciation ability as if studying with a language expert at his or her side.

Claims

A foreign language pronunciation learning and oral test method using an automatic pronunciation comparison method on the Internet, which automatically compares a native speaker's voice with a learner's voice to find the difference between them, the method comprising:

a first step of calculating phonological and prosodic difference values between the learner's voice and the native speaker's voice with a DTW-based difference comparison network;

a second step of calculating, through an error calculation network, the difference between the value calculated by the DTW-based difference comparison network and the comparison value evaluated by an expert evaluation comparison network; and

a third step of training the DTW-based difference comparison network so that the difference value calculated through the first and second steps is reduced.

The foreign language pronunciation learning and oral test method using an automatic pronunciation comparison method on the Internet, wherein the DTW-based difference comparison network includes a phonological difference calculation model consisting of an intensity difference comparison block that calculates linguistic pronunciation accuracy for each phoneme, syllable, word, and sentence, a time difference comparison block that calculates differences in pronunciation duration, and a frequency difference comparison block that calculates differences between formants.

The method of claim 1, wherein the DTW-based difference comparison network includes a prosodic difference calculation model consisting of an accent difference comparison block that obtains the relative position difference of the maximum peak values in the pitch contours, an intonation difference comparison block that obtains its value from the slope difference between the two pitch contours at the end of the learner's and the native speaker's pronunciation, and a rhythm difference comparison block that calculates its value from the relative positions and sizes of the peaks and valleys between adjacent words or syllables.
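As an illustration of the two pitch-contour comparisons named in this claim (peak position and end-of-utterance slope), the following sketch computes both from contours given as lists of pitch values. It is a sketch under stated assumptions, not the claimed blocks: the function names, the normalization of peak position to the range 0 to 1, and the use of a least-squares slope over the last `tail` frames are all assumptions.

```python
from typing import Sequence


def peak_position(contour: Sequence[float]) -> float:
    """Relative position (0.0 to 1.0) of the maximum value in a pitch contour."""
    if not contour:
        raise ValueError("contour must be non-empty")
    if len(contour) == 1:
        return 0.0
    peak_index = max(range(len(contour)), key=lambda i: contour[i])
    return peak_index / (len(contour) - 1)


def final_slope(contour: Sequence[float], tail: int = 3) -> float:
    """Least-squares slope over the last `tail` frames (assumed window size)."""
    segment = list(contour[-tail:])
    n = len(segment)
    if n < 2:
        raise ValueError("need at least two frames to estimate a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(segment) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(segment))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def peak_position_difference(learner: Sequence[float], native: Sequence[float]) -> float:
    """Difference in where the pitch maximum falls within each utterance."""
    return abs(peak_position(learner) - peak_position(native))


def final_slope_difference(learner: Sequence[float], native: Sequence[float]) -> float:
    """Difference in pitch slope at the end of each utterance."""
    return abs(final_slope(learner) - final_slope(native))
```

A falling learner contour compared against a rising native contour yields a large slope difference, which is the kind of mismatch the end-of-utterance comparison is meant to expose.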

The method according to any one of claims 1 to 4, wherein the automatic pronunciation comparison value is calculated by nonlinearly combining, through an error correction neural network, the six output values obtained from the DTW-based difference comparison network, the neural network being trained so that the combined value approaches the comparison value.
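A nonlinear combination of six values trained toward a target score can be illustrated with a very small neural network. The sketch below is an assumption-laden illustration, not the claimed error correction neural network: the class name, the single tanh hidden layer of four units, the squared-error loss, and the learning rate are all hypothetical choices.

```python
import math
import random
from typing import List, Sequence


class ScoreCombiner:
    """Tiny one-hidden-layer network mapping six difference values to one score."""

    def __init__(self, n_inputs: int = 6, n_hidden: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1: List[List[float]] = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)
        ]
        self.b1: List[float] = [0.0] * n_hidden
        self.w2: List[float] = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2: float = 0.0

    def _hidden(self, x: Sequence[float]) -> List[float]:
        return [
            math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]

    def predict(self, x: Sequence[float]) -> float:
        hidden = self._hidden(x)
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def train_step(self, x: Sequence[float], target: float, lr: float = 0.05) -> float:
        """One gradient-descent step on 0.5 * (prediction - target)^2."""
        hidden = self._hidden(x)
        error = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2 - target
        for k, h in enumerate(hidden):
            # Gradient through tanh, using the output weight before it is updated.
            grad_pre = error * self.w2[k] * (1.0 - h * h)
            self.w2[k] -= lr * error * h
            self.b1[k] -= lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[k][i] -= lr * grad_pre * xi
        self.b2 -= lr * error
        return 0.5 * error * error
```

Calling `train_step` repeatedly with the six difference values and the target comparison value moves the network's output toward that target; `predict` then gives the combined score for new inputs.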

The method of claim 1 or 2, wherein the phonological intensity difference comparison is calculated by extracting time-frequency domain features from the native speaker's and the learner's speech signals, removing speaker-specific characteristics through linear frequency transformation and a DTW-frequency dynamic warping process, and summing the differences between the feature vectors of the two voices as Euclidean distances to obtain the linguistic intensity difference of the phonemes.
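The time-alignment part of this claim can be illustrated with textbook dynamic time warping over sequences of feature vectors, using Euclidean distance as the local cost. This is a minimal sketch of standard DTW only: the function names are hypothetical, and the feature extraction, the linear frequency transformation, and the frequency-axis warping mentioned in the claim are not modeled here.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(learner: Sequence[Frame], native: Sequence[Frame]) -> float:
    """Sum of Euclidean frame distances along the best DTW alignment path."""
    n, m = len(learner), len(native)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    # cost[i][j] is the best accumulated cost aligning the first i learner
    # frames with the first j native frames.
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(learner[i - 1], native[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # learner frame repeated against native
                cost[i][j - 1],      # native frame repeated against learner
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]
```

Because the alignment may repeat frames, a learner who says the same sounds more slowly than the native speaker is not penalized for tempo alone; only the remaining frame-level differences accumulate.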

The method of claim 1, wherein, in carrying out the learner's foreign language pronunciation learning and oral test, the learner records his or her voice on the learner computer through a microphone, the native speaker's voice is retrieved from the native speaker voice signal database of the server computer and played to the learner, and the difference between the two voices is calculated on the learner computer by the automatic pronunciation comparison algorithm.

The method of claim 6, wherein the learner's foreign language pronunciation learning process comprises the steps of:

the learner connecting from his or her own computer to the server computer through the Internet;

the learner entering his or her information;

the learner selecting a desired scenario from among a plurality of pronunciation practice scenarios provided by the server;

listening to the native speaker's pronunciation of a sentence in the selected scenario;

recording the learner's own pronunciation;

automatically calculating the difference between the two voices and displaying the comparison result on the screen;

determining whether to continue the sentence pronunciation practice; and

selecting whether to practice with another scenario.

The method of claim 6, wherein the learner's foreign language pronunciation oral test process comprises the steps of:

the learner connecting to the server computer;

the learner entering his or her information;

the learner selecting the difficulty level of the words or sentences to be tested;

listening to the native speaker's pronunciation of a sentence of the selected difficulty level;

the learner recording his or her own pronunciation;

checking whether all the sentences to be tested have been pronounced;

selecting whether to test again with questions of a different difficulty level; and

displaying on the screen the final pronunciation comparison results for the voices pronounced by the learner.