KR100568167B1

KR100568167B1 - Method of foreign language pronunciation speaking test using automatic pronunciation comparison method

Info

Publication number: KR100568167B1
Application number: KR1020000040979A
Authority: KR
Inventors: 이수영; 정소영
Original assignee: 한국과학기술원
Priority date: 2000-07-18
Filing date: 2000-07-18
Publication date: 2006-04-05
Also published as: KR20020007597A; JP2002040926A

Abstract

The present invention relates to foreign language pronunciation learning and oral testing on the Internet using an automatic pronunciation comparison method. In particular, it relates to a method of providing a service for interactive, real-time foreign language pronunciation learning and speaking-ability testing over the Internet.

The DTW-based voice comparison network of the present invention obtains six comparison values for two voices in terms of phonology and prosody (differences in intensity, time, frequency, stress, intonation, and rhythm) and is trained through a neural network so that its output approaches the comparison scores given by experts.

That is, as a way of implementing the automatic voice comparison algorithm on the web, the learner connects to a web server through the Internet, requests from the server computer the native-speaker pronunciation of the foreign language word or sentence to be practiced, and listens to it. The learner then records his or her own pronunciation through a microphone, and the voice comparison algorithm quantifies how far it differs from the native speaker's pronunciation and displays the result on the learner's computer screen. The learner can therefore practice listening to and speaking a foreign language on an Internet-connected computer and have his or her speaking ability tested against an objective measure.

DTW voice comparison network, foreign language pronunciation test

Description

Foreign language pronunciation test method using the automatic phonetic comparison method {METHOD OF FOREIGN LANGUAGE PRONUNCIATION SPEAKING TEST USING AUTOMATIC PRONUNCIATION COMPARISON METHOD}

FIG. 1 is a diagram of the neural network training process used to obtain the parameters of the error-correction neural network for the foreign language pronunciation test according to the present invention.
FIG. 2 is a diagram of the foreign language pronunciation test process using the parameters of the trained neural network.
FIG. 3 is a partial detailed view of the automatic voice comparison network shown in FIG. 1.
FIG. 4 is a detailed flowchart of the intensity difference comparison in the phonological difference calculation model shown in FIG. 3.
FIG. 5 is a partial detailed view of the time-frequency dynamic warping shown in FIG. 4.


FIGS. 6A to 6C are comparison graphs for the prosodic difference calculation model shown in FIG. 3.


<Description of Symbols for Main Parts of Drawings>

10: learner's voice signal
20: native speaker's voice signal
30: neural network training or test block diagram
40: DTW-based difference comparison network
42: phonological difference calculation model
44: prosodic difference calculation model
50: error-correction neural network
60: automatic voice comparison network
70: expert evaluation comparison network
80: error calculation network


The present invention relates to a method of providing a learning service for foreign language pronunciation learning and oral testing on the Internet using an automatic pronunciation comparison method. In particular, it relates to an interactive method in which an automatic voice comparison network is trained on data consisting of native-speaker voices, learner voices, and expert evaluations, and the trained network is then used to compare a learner's pronunciation with a native speaker's pronunciation and test its accuracy.

In conventional foreign language learning, listening and speaking were practiced by repeatedly listening to a native speaker's pronunciation on cassette or video, imitating it, and judging for oneself how close the imitation was, repeating until the pronunciation became accurate.

Because this approach gives learners no objective evaluation of their own pronunciation, efforts have been made to measure the difference between a learner's pronunciation and a native speaker's on an objective scale.

Earlier methods mainly compared the two pronunciations by simple differences in the time domain, for example the difference in tone of the voice signals and in total pronunciation time.

More recently, pronunciation comparison methods based on speech signal processing have been developed. Most of them first recognize the learner's utterance with a Hidden Markov Model (HMM) and then compare it with the native speaker's voice.

However, if the learner speaks in a noisy environment, or pronounces so unclearly that recognition fails, the measured difference from the native speaker's pronunciation is likely to be meaningless.

In addition, a learner's foreign language listening and speaking ability could be evaluated only through specialized tests such as the TSE (Test of Spoken English) and the SEPT (Spoken English Proficiency Test), held at a designated time and place, in which a native-speaker language expert asks questions and listens to the answers in an interview.

These tests, however, are constrained in time and place, and the expert's evaluation may be influenced by subjective factors such as fatigue or surrounding circumstances.

The present invention is intended to solve these problems. One object is to provide a foreign language pronunciation learning and oral test method, using an automatic pronunciation comparison method on the Internet, in which an automatic voice comparison algorithm based on Dynamic Time Warping (DTW) compares the learner's voice with a native speaker's voice quickly and accurately without performing speech recognition on the learner's voice.

Another object is to provide such a method in a web-based setting on the Internet, so that learners can practice their foreign language pronunciation and take an oral test at whatever time and place they choose.

To achieve the above objects, the present invention provides, as a foreign language pronunciation learning method that automatically compares a native speaker's voice with a learner's voice to find the difference, a foreign language pronunciation learning and oral test method using an automatic pronunciation comparison method on the Internet in which:

the phonological and prosodic difference values between the learner's voice and the native speaker's voice are calculated by a DTW-based difference comparison network;

the difference between these values and the comparison value evaluated by an expert evaluation comparison network is calculated by an error calculation network; and

the DTW-based difference comparison network is trained so that this difference is reduced.

Hereinafter, the configuration and operation of an embodiment of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of the neural network learning algorithm used to obtain the parameters of the error-correction neural network in the automatic voice comparison network for the foreign language pronunciation test according to the present invention.

First, the data used to build the automatic voice comparison network algorithm are pronunciation difference values assigned by experts who compared the native speaker's and the learner's pronunciation of each word or sentence. The difference value usually takes one of five discrete values from 1 to 5. The native speaker's voice is assumed to be transcribed data, that is, data labeled with phonetic symbols along the time axis.

The automatic voice comparison network algorithm shown in FIG. 1 comprises the learner's voice signal (10), the native speaker's voice signal (20), the automatic voice comparison network (60), the expert evaluation comparison network (70), and the error calculation network (80).

The automatic voice comparison network (60) calculates the phonological and prosodic difference values between the learner's voice signal (10) and the native speaker's voice signal (20).

The automatic voice comparison network (60) further includes the DTW-based difference comparison network (40) and the error-correction neural network (50).

The error calculation network (80) finds the difference between the value from the expert evaluation comparison network (70) and the comparison value obtained by the automatic voice comparison network (60), and the error-correction neural network (50) is trained so that this difference is reduced.

FIG. 2 is a detailed diagram of a system that produces a pronunciation evaluation result for a learner's voice signal, using the neural network whose parameters were trained by the process of FIG. 1.

The DTW-based difference comparison network calculates the difference values between the learner's voice signal (10) and the native speaker's voice signal (20) held by the pronunciation test system. These values then pass through the error-correction neural network, which carries the parameters trained as in FIG. 1, and the output is a number indicating pronunciation accuracy.

FIG. 3 is a partial detailed view of the DTW-based difference comparison network shown in FIG. 1.

Looking at FIG. 3 in more detail, the DTW-based difference comparison network (40) calculates the phonological difference and the prosodic difference between the learner's voice signal (10) and the native speaker's voice signal (20).

First, the phonological difference calculation model (42) comprises an intensity difference comparison block (42a), a time difference comparison block (42b), and a frequency difference comparison block (42c). For a given native-speaker sentence, it calculates the accuracy of the learner's pronunciation word by word, syllable by syllable, and phoneme by phoneme.

The intensity difference comparison block (42a) removes all speaker-dependent characteristics from the voice signals and then finds the difference between the two signals.

In other words, it measures the difference in linguistic message alone between the learner's voice and the native speaker's voice.

The time difference comparison block (42b) finds differences in the pronunciation duration of sentences, words, syllables, and phonemes. The frequency difference comparison block (42c) calculates the difference in formant positions between the learner's voice and the native speaker's voice.

Next, the prosodic difference calculation model (44) determines whether the learner pronounced the transitions correctly across the whole sentence: between phonemes, between syllables, and between words.

The prosodic difference calculation model (44) comprises a stress difference comparison block (44a), an intonation difference comparison block (44b), and a rhythm difference comparison block (44c).

The outputs of the prosodic difference calculation model (44) are obtained from the difference between the pitch contours of the learner's and the native speaker's voices. The pitch contour of each voice over time is extracted with a conventional pitch detection method.

The stress difference comparison block (44a) finds the relative position difference of the maximum peaks in the pitch contours.

The intonation difference comparison block (44b) works from the difference in slope between the two pitch contours at the end of the learner's and the native speaker's utterances. The rhythm difference comparison block (44c) works from the relative positions and sizes of the peaks and valleys that appear between adjacent words or syllables.

The six output values calculated by the DTW-based difference comparison network (40) pass through the error-correction neural network (50) before being compared with the expert evaluation value of the expert evaluation comparison network (70).

The error-correction neural network (50) nonlinearly combines the six output values of the phonological and prosodic difference calculation network, that is, the DTW-based difference comparison network (40), so that the automatically calculated comparison value approaches the expert's evaluation value. A two-layer multi-layer perceptron is used as its structure.
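As a rough illustration of such a nonlinear combination, the sketch below runs six difference values through a small two-layer perceptron. The 6-3-1 layer sizes, the linear output unit, and every weight value are assumptions made for the example; the text specifies only a two-layer multi-layer perceptron and, elsewhere, a bipolar sigmoid.

```python
import math

def bipolar_sigmoid(x):
    """Bipolar sigmoid (1 - e^-x) / (1 + e^-x), equal to tanh(x / 2)."""
    return math.tanh(x / 2.0)

def mlp_score(features, w_hidden, b_hidden, w_out, b_out):
    """Two-layer perceptron: six difference values in, one score out."""
    hidden = [
        bipolar_sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Assumed linear output unit, so the score is not confined to (-1, 1).
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical, untrained weights for a 6-3-1 network (illustration only).
W_HIDDEN = [[0.1] * 6, [-0.2] * 6, [0.05] * 6]
B_HIDDEN = [0.0, 0.1, -0.1]
W_OUT = [1.0, -1.0, 0.5]
B_OUT = 3.0

# Order: intensity, time, frequency, stress, intonation, rhythm.
six_differences = [2.0, 1.5, 3.0, 1.0, 2.5, 2.0]
score = mlp_score(six_differences, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
```

In a trained system the weights would come from the training procedure rather than being written by hand.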

The error between the value calculated by the algorithm of the automatic pronunciation comparison network and the expert evaluation value of the expert evaluation comparison network (70) is calculated at the error-correction neural network (50), and the synaptic weights of the neural network are learned from it.

Training uses the conventional mean squared error function and the error back-propagation algorithm. The output-layer error function of the neural network is defined by the following equation.

[equation image not reproduced]

The symbols in this equation are the target and actual output values of the i-th output neuron for the s-th stored pattern, and the number of output neurons. A further expression is then given.

[equation image not reproduced]

It is written in terms of the value of an indexed element of the (l)-th hidden layer before it passes through the nonlinear function, together with the error term that is back-propagated at the (l)-th layer, for which a specific formula is given.

[equation image not reproduced]

At the output layer this term has its own definition.

[equation image not reproduced]

The remaining symbol is the learning coefficient for the weights of the (l)-th layer, and a particular expression is used in the case of the bipolar sigmoid function.
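Because the formula images are not reproduced in this text, the following is only a sketch of the standard squared-error and back-propagation relations that fit the verbal definitions above. The symbols $E$, $t$, $y$, $N$, $w$, $h$, $\delta$, $\eta$, $f$ and the top-layer index $L$ are assumed notation and may differ from the patent's own, as may the factor $\tfrac{1}{2}$.

```latex
% Output-layer error over stored patterns s and output neurons i
E = \frac{1}{2} \sum_{s} \sum_{i=1}^{N} \left( t_i^{s} - y_i^{s} \right)^2

% Weight update for layer l (single pattern, pattern index omitted)
\Delta w_{ij}^{(l)} = -\eta^{(l)} \frac{\partial E}{\partial w_{ij}^{(l)}}
                    = \eta^{(l)} \, \delta_i^{(l)} \, y_j^{(l-1)}

% Back-propagated error term at hidden layer l
\delta_j^{(l)} = f'\!\left( h_j^{(l)} \right) \sum_{k} w_{kj}^{(l+1)} \, \delta_k^{(l+1)}

% Error term at the output layer L
\delta_i^{(L)} = \left( t_i - y_i \right) f'\!\left( h_i^{(L)} \right)

% Bipolar sigmoid and its derivative
f(x) = \frac{1 - e^{-x}}{1 + e^{-x}}, \qquad
f'(x) = \frac{1}{2} \left( 1 - f(x)^2 \right)
```

Here $h_j^{(l)}$ is the value of the $j$-th element of layer $l$ before the nonlinearity, so that $y_j^{(l)} = f(h_j^{(l)})$.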

FIG. 4 is a detailed flowchart of the intensity difference comparison in the phonological difference calculation model shown in FIG. 3.

As the detailed flowchart in FIG. 4 shows, the intensity difference between the native speaker's and the learner's voices is calculated by the following algorithm.

First, endpoints are extracted (S102) from the native speaker's voice signal (S100) and (S103) from the learner's voice signal (S101), and both signals pass through the energy normalization block (S104).

With the differences in output energy caused by the microphone thus removed, the signals are divided into frames (S105, S106) and Fourier transformed (S107, S108).

Frame blocking (S105, S106) divides the incoming time-series voice signal into segments of a few tens of milliseconds and applies a Hamming window or a Hanning window to each.
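Frame blocking of this kind can be sketched as follows. The 16 kHz sampling rate, 20 ms frame length, 10 ms hop, and the function names are assumptions for illustration; the text says only that frames last a few tens of milliseconds.

```python
import math

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames and apply a Hamming window."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# Assumed values: 16 kHz sampling, 20 ms frames, 10 ms hop.
SAMPLE_RATE = 16000
FRAME_LEN = SAMPLE_RATE * 20 // 1000   # 320 samples
HOP = SAMPLE_RATE * 10 // 1000         # 160 samples

# 0.1 s of a 440 Hz tone as a stand-in for recorded speech.
signal = [math.sin(2.0 * math.pi * 440.0 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE // 10)]
frames = frame_signal(signal, FRAME_LEN, HOP)
```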

The Fourier transform (S107, S108) converts each frame of the two voice signals from the time domain to the frequency domain.

Next, time-frequency dynamic warping (S109) removes speaker-dependent characteristics through linear frequency transformation and DTW. The signals then undergo frequency warping to the Bark scale (S110, S111) and intensity warping to loudness units (S112, S113).

Time-frequency dynamic warping (S109) removes the effects of speaker differences between the native speaker's and the learner's voices, namely differences in pronunciation duration and frequency-domain differences caused by differences in vocal tract length. DTW removes the duration differences, and linear frequency transformation removes the differences due to vocal tract length.
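The DTW part of this step can be illustrated with the textbook dynamic-programming formulation. This is a minimal sketch, not the patent's implementation: the step pattern (match, insertion, deletion with equal weight), the function name, and the toy one-dimensional features are all assumptions, and the linear frequency transformation is not shown.

```python
import math

def dtw_distance(ref, test):
    """Return the minimum cumulative Euclidean distance aligning two
    sequences of feature vectors, plus the warping path as index pairs."""
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
        if step == cost[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif step == cost[i - 1][j]:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return cost[n][m], path

native = [[0.0], [1.0], [2.0], [1.0]]
learner = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]  # same shape, spoken more slowly
total, path = dtw_distance(native, learner)
```

Because the learner sequence is only a stretched copy of the native one, the aligned distance is zero even though the two sequences differ in length.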

Frequency warping to the Bark scale (S110, S111) converts the voice signal from hertz (Hz) to Bark, a psychoacoustic unit of frequency.
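The text does not state which Hz-to-Bark formula is used. As one possibility, the sketch below uses the approximation commonly applied in PLP analysis, 6·asinh(f/600); both the choice of formula and the function names are assumptions.

```python
import math

def hz_to_bark(freq_hz):
    """Bark value via the warping commonly used in PLP analysis."""
    return 6.0 * math.asinh(freq_hz / 600.0)

def bark_to_hz(bark):
    """Inverse of hz_to_bark."""
    return 600.0 * math.sinh(bark / 6.0)

# The scale is compressive: equal steps in Hz shrink in Bark at high frequencies.
bark_points = [hz_to_bark(f) for f in (100.0, 1000.0, 4000.0)]
```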

Intensity warping to loudness units (S112, S113) converts the spectral energy produced by the Fourier transform (S107, S108) into loudness, a psychoacoustic unit of intensity.
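The mapping from energy to loudness is not specified here. A common choice in perceptually motivated front ends such as PLP is cube-root compression, which the sketch below assumes; the function name is hypothetical.

```python
def intensity_to_loudness(power_spectrum):
    """Cube-root compression, an assumed stand-in for the loudness mapping."""
    return [p ** (1.0 / 3.0) for p in power_spectrum]

# A thousandfold increase in power gives roughly a tenfold increase in loudness.
loud = intensity_to_loudness([0.0, 1.0, 8.0, 1000.0])
```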

An inverse Fourier transform (S114, S115) follows, and the cepstrum calculation blocks (S116, S117) finally extract the cepstrum feature vectors.

Because the loudness-warped signals are real-valued and symmetric, the inverse Fourier transform (S114, S115) is computed as a cosine transform.
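The equivalence behind this shortcut can be checked numerically. The sketch below compares a plain inverse DFT with a cosine-only sum on a small real, even-symmetric spectrum; the example values and function names are made up for illustration.

```python
import cmath
import math

def inverse_dft(spectrum):
    """Reference inverse DFT using complex exponentials."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def inverse_cosine(spectrum):
    """Cosine-only sum; matches inverse_dft for a real, even-symmetric spectrum."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * math.cos(2.0 * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# Real spectrum with X[k] == X[N - k].
half = [4.0, 3.0, 2.0, 1.0, 0.5]
spectrum = half + half[-2:0:-1]          # length 8, even-symmetric
via_dft = inverse_dft(spectrum)
via_cos = inverse_cosine(spectrum)
```

The sine terms cancel in pairs for such a spectrum, which is why the imaginary parts of `via_dft` vanish up to rounding.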

The cepstrum feature vectors of the native speaker's voice signal (S100) and the learner's voice signal (S101) that are to be compared are finally extracted by the cepstrum calculation blocks (S116, S117).

This procedure resembles conventional PLP (perceptual linear prediction) feature extraction. It differs in the newly added time-frequency dynamic warping block (S109), which applies time-frequency dynamic warping during feature extraction to remove the speaker characteristics of the voice signal.

Next, the frame-by-frame distance is calculated (S118) and converted into units of phonological intensity difference (S119).

The frame-by-frame distance calculation block (S118) computes the distance between the native speaker's and the learner's feature vectors frame by frame. The Euclidean distance is used, and the distances over all frames are summed to give the pronunciation difference between the two voices.
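A minimal sketch of this summation, assuming the two feature sequences have already been time-aligned so that they contain the same number of frames (the function name and the two-dimensional toy vectors are illustrative):

```python
import math

def pronunciation_distance(native_frames, learner_frames):
    """Sum of per-frame Euclidean distances between aligned feature vectors."""
    if len(native_frames) != len(learner_frames):
        raise ValueError("frames must be time-aligned before comparison")
    return sum(math.dist(a, b) for a, b in zip(native_frames, learner_frames))

native_cep = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
learner_cep = [[1.0, 0.0], [0.5, 1.5], [3.0, 5.0]]
distance = pronunciation_distance(native_cep, learner_cep)  # 0 + 1 + 5
```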

The conversion block (S119) then maps the pronunciation difference obtained in block (S118) to a phonological intensity value between 1 and 5, using a linear or logistic transformation, so that it can be compared with the expert evaluation value.
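Both kinds of transformation can be sketched in a few lines. The direction of the scale (1 for the smallest difference, 5 for the largest) and the parameters `max_distance`, `midpoint`, and `steepness` are assumptions; the text does not specify them.

```python
import math

def linear_difference(distance, max_distance):
    """Map distance 0 -> 1 and max_distance (or more) -> 5."""
    clipped = min(max(distance, 0.0), max_distance)
    return 1.0 + 4.0 * clipped / max_distance

def logistic_difference(distance, midpoint, steepness):
    """Smooth map into (1, 5); distance == midpoint gives 3."""
    return 1.0 + 4.0 / (1.0 + math.exp(-steepness * (distance - midpoint)))

linear_value = linear_difference(6.0, 10.0)
logistic_value = logistic_difference(6.0, 5.0, 1.0)
```

The logistic form saturates gently at both ends, whereas the linear form needs explicit clipping.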

FIG. 5 is a partial detailed view of the time-frequency dynamic warping step shown in FIG. 4.

In FIG. 5, the Fourier-transformed native speaker's voice (200) and learner's voice (201) are warped in the time and frequency domains so that the frame-by-frame distances between their cepstrum feature vectors are minimized.

The learner's voice (201) passes through the linear frequency warping network (202) to remove the difference due to vocal tract length, which is regarded as the main cause of differences between speakers' voice signals, and then through nonlinear dynamic time warping (203) to remove the difference in pronunciation time from the native speaker's voice (200).

The signals then enter the cepstrum feature extraction blocks (204, 205), where the cepstra are computed, and the frame-by-frame distance calculation block (206) computes the Euclidean distance between the cepstrum vectors. The linear frequency warping (202) and the nonlinear dynamic time warping (203) are adjusted so that this error is minimized.

The time difference comparison block (42b) of FIG. 3 can be computed from the amount of warping along the time axis between the learner's and the native speaker's feature vectors, as obtained by the time-frequency dynamic warping of FIG. 5.

That is, the output of the time difference comparison block (42b) is the sum of the per-phoneme pronunciation duration differences between the two time-aligned voice signals, divided by the total number of phonemes.
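In code, this average might look as follows. Taking the absolute value of each difference is an assumption (the text says only that the differences are summed), and the durations are invented example values in seconds.

```python
def time_difference(native_durations, learner_durations):
    """Mean absolute duration difference per phoneme, in seconds."""
    if len(native_durations) != len(learner_durations) or not native_durations:
        raise ValueError("need the same, non-zero number of phonemes")
    total = sum(abs(n - l) for n, l in zip(native_durations, learner_durations))
    return total / len(native_durations)

native_dur = [0.08, 0.12, 0.10, 0.20]
learner_dur = [0.10, 0.12, 0.16, 0.28]
value = time_difference(native_dur, learner_dur)  # (0.02 + 0 + 0.06 + 0.08) / 4
```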

Likewise, the frequency difference comparison block (42c) of FIG. 3 applies a linear transformation along the frequency axis, in a manner similar to the time difference comparison block (42b), and then computes its output from the position differences of the first formant (F1), second formant (F2), third formant (F3), and so on between the learner's voice and the native speaker's voice.

As shown in FIG. 6, the prosodic difference between the two voice signals is calculated from the difference between their pitch contours. The pitch is obtained by conventional methods based on frequency filtering or the cepstrum, and linear regression is used to make the pitch contour continuous across voiced and unvoiced sounds.

FIG. 6A is a graph of the pitch contours over time for the native speaker's and the learner's pronunciation of a sentence.

The stress difference comparison block (44a) of FIG. 3 measures the stress difference between the two voices by how far apart the syllables or words carrying the maximum pitch-contour peak are. That is, it computes the time difference between the syllable the learner stresses and the syllable the native speaker stresses, and the number of syllables within that interval is the stress difference.
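One way to read this rule is sketched below, assuming the pitch contour has already been reduced to one peak value per syllable. That per-syllable representation, the function names, and the example values are assumptions.

```python
def stressed_syllable(pitch_peaks):
    """Index of the syllable whose pitch peak is highest."""
    return max(range(len(pitch_peaks)), key=lambda i: pitch_peaks[i])

def stress_difference(native_peaks, learner_peaks):
    """Number of syllables separating the two stressed syllables."""
    return abs(stressed_syllable(native_peaks) - stressed_syllable(learner_peaks))

# Hypothetical per-syllable peak pitch in Hz for a four-syllable word.
native_peaks = [180.0, 240.0, 170.0, 160.0]
learner_peaks = [175.0, 180.0, 230.0, 165.0]
diff = stress_difference(native_peaks, learner_peaks)
```

Here the native speaker stresses the second syllable and the learner the third, so the difference is one syllable.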

FIG. 6B illustrates the intonation difference comparison block (44b) of FIG. 3. The intonation difference is calculated from the difference in slope of the pitch contours at the end of the sentence.

In a declarative sentence the pitch slope at the end of the sentence is negative, whereas in an interrogative sentence it is generally positive. The intonation difference between the two voices is obtained from this slope difference.
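A slope comparison of this kind can be sketched with an ordinary least-squares fit over the final stretch of each contour. The 0.3 s tail length, the use of least squares, and the sample values are assumptions made for the example.

```python
def slope(times, values):
    """Least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

def intonation_difference(native_tail, learner_tail):
    """Absolute difference between the final pitch slopes (Hz per second)."""
    return abs(slope(*native_tail) - slope(*learner_tail))

# Hypothetical final 0.3 s of each contour as (times in s, pitch in Hz).
native_tail = ([0.0, 0.1, 0.2, 0.3], [200.0, 190.0, 180.0, 170.0])   # falling
learner_tail = ([0.0, 0.1, 0.2, 0.3], [180.0, 190.0, 200.0, 210.0])  # rising
diff = intonation_difference(native_tail, learner_tail)
```

A falling native contour against a rising learner contour, as in this example, would indicate a statement spoken with question-like intonation.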

FIG. 6C is a graph illustrating the rhythm difference comparison block (44c) of FIG. 3. On the pitch contours of the two voices, triangles mark peaks and inverted triangles mark valleys. The rhythm difference between the two voices is obtained from the differences in the number and size of these peaks and valleys.
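The peak-and-valley comparison can be sketched as follows. Detecting extrema by comparison with the two neighboring samples, pairing them in order of occurrence, and returning the count and size differences separately are all assumptions; the text does not say how the two quantities are combined.

```python
def peaks_and_valleys(contour):
    """Return (peaks, valleys) as lists of (index, value) local extrema."""
    peaks, valleys = [], []
    for i in range(1, len(contour) - 1):
        prev, cur, nxt = contour[i - 1], contour[i], contour[i + 1]
        if cur > prev and cur > nxt:
            peaks.append((i, cur))
        elif cur < prev and cur < nxt:
            valleys.append((i, cur))
    return peaks, valleys

def rhythm_difference(native_contour, learner_contour):
    """Return (count difference, summed size difference of paired extrema)."""
    count_diff, size_diff = 0, 0.0
    pairs = zip(peaks_and_valleys(native_contour), peaks_and_valleys(learner_contour))
    for nat, lrn in pairs:
        count_diff += abs(len(nat) - len(lrn))
        size_diff += sum(abs(a[1] - b[1]) for a, b in zip(nat, lrn))
    return count_diff, size_diff

# Hypothetical pitch samples in Hz.
native_contour = [100, 150, 110, 160, 120, 140, 100]
learner_contour = [100, 140, 120, 130, 100]
result = rhythm_difference(native_contour, learner_contour)
```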


As described above, the foreign language pronunciation learning and oral test method using an automatic pronunciation comparison method on the Internet according to the present invention has the following advantages.

First, because the learner's voice and the native speaker's voice are compared by an automatic voice comparison algorithm based on speech signal processing, learners obtain an objective numerical evaluation of their pronunciation.

Second, because the automatic voice comparison algorithm is offered on the web as a foreign language learning service for pronunciation practice and oral testing on an Internet-connected computer, learners can check their pronunciation as if a language expert were studying beside them.

Claims

A foreign language pronunciation test method that automatically compares a native speaker's voice with a learner's voice to find the difference, the method comprising:

a first step in which phonological and prosodic difference values between the learner's voice and the native speaker's voice are calculated by a DTW-based difference comparison network;

a second step in which an error calculation network calculates the difference between the value calculated by the DTW-based difference comparison network and a comparison value evaluated by an expert evaluation comparison network; and

a third step of training an error-correction neural network so as to reduce the difference values calculated in the first and second steps.

The method according to claim 1, wherein

the DTW-based difference comparison network has a phonological difference calculation model, and

the phonological difference calculation model comprises an intensity difference comparison block for calculating linguistic pronunciation accuracy for each phoneme, syllable, word, and sentence;

a time difference comparison block for calculating differences in pronunciation duration; and

a frequency difference comparison block for calculating differences between formant positions.

The method according to claim 1, wherein

the DTW-based difference comparison network has a prosodic difference calculation model, and

the prosodic difference calculation model comprises a stress difference comparison block for obtaining the relative position difference of the maximum peak values in the pitch contours;

an intonation difference comparison block whose output is obtained from the slope difference between the two pitch contours at the end of the learner's and the native speaker's pronunciation; and

a rhythm difference comparison block whose output is calculated from the relative positions and sizes of peaks and valleys appearing between adjacent words or syllables.

The method according to any one of claims 1 to 3, wherein

the automatic pronunciation comparison method

calculates an automatic pronunciation comparison value by nonlinearly combining, through the error-correction neural network, the six output values obtained from the DTW-based difference comparison network, the network being trained so that this value approximates the comparison value of the expert evaluation comparison network.

The method according to claim 2, wherein

the intensity difference comparison block calculates the intensity difference by

extracting features from the native speaker's and the learner's voice signals through a time-frequency dynamic warping process that removes speaker-specific characteristics by linear frequency transformation and DTW, computing the Euclidean distance between the feature vectors frame by frame, and summing these distances to obtain the linguistic intensity difference of the phonemes.
