KR102212332B1

KR102212332B1 - Apparatus and method for evaluating pronunciation accuracy for foreign language education

Info

Publication number: KR102212332B1
Application number: KR1020190004175A
Authority: KR
Inventors: 김주현; 오지연
Original assignee: 김주현; 오지연
Priority date: 2019-01-11
Filing date: 2019-01-11
Publication date: 2021-02-04
Also published as: KR20200087623A

Abstract

The present invention relates to an apparatus for evaluating pronunciation accuracy for foreign language education. More specifically, the apparatus comprises: an input module that receives a speech signal pronounced by a learner and video of the learner pronouncing the speech signal; a speech analysis module that extracts a phoneme sequence from the learner's speech signal; an image analysis module that recognizes mouth-shape landmarks in the learner's video and extracts a change sequence of the recognized mouth-shape landmarks; and an evaluation module that evaluates pronunciation accuracy using the phoneme sequence of the speech signal and the change sequence of the mouth-shape landmarks extracted by the speech analysis module and the image analysis module.
The present invention also relates to a method for evaluating pronunciation accuracy for foreign language education. More specifically, the method comprises: (1) receiving a speech signal pronounced by a learner and video of the learner pronouncing the speech signal; (2) extracting a phoneme sequence from the learner's speech signal; (3) recognizing mouth-shape landmarks in the learner's video and extracting a change sequence of the recognized mouth-shape landmarks; and (4) evaluating pronunciation accuracy using the phoneme sequence of the speech signal and the change sequence of the mouth-shape landmarks extracted in steps (2) and (3).
According to the apparatus and method proposed in the present invention, pronunciation accuracy is evaluated using a phoneme sequence that is extracted from the learner's speech signal and annotated with attributes such as rhythm, pitch, and stress, so that detailed pronunciation characteristics are effectively reflected in the evaluation. In addition, a change sequence of mouth-shape landmarks is extracted from video of the learner speaking and compared with the mouth shape of a native speaker, which improves the reliability of the pronunciation accuracy evaluation.
Furthermore, according to the apparatus and method proposed in the present invention, pronunciation accuracy is evaluated by comprehensively reflecting not only the similarity of speech waveforms and the similarity of phonemes between native-speaker and learner pronunciation, but also the attributed phoneme sequence and the change sequence of mouth-shape landmarks. This provides a reliable, comprehensive evaluation result, allows foreign language learners to recognize and correct their problems accurately, and improves learning by presenting guidance on pronunciation accuracy.

Description

Apparatus and method for evaluating pronunciation accuracy for foreign language education{APPARATUS AND METHOD FOR EVALUATING PRONUNCIATION ACCURACY FOR FOREIGN LANGUAGE EDUCATION}

The present invention relates to an apparatus and method for evaluating pronunciation accuracy, and more specifically to an apparatus and method for evaluating pronunciation accuracy for foreign language education.

In recent years, foreign languages have become increasingly important as industry grows more specialized and international. As a result, many people invest considerable time in learning foreign languages, and a wide range of online and offline language courses have been opened in response.

However, pronunciation is generally corrected through one-on-one instruction with a foreign instructor, which makes language learning expensive. Moreover, because instruction takes place at fixed times, participation is extremely limited for people with busy daily lives, such as office workers.

Accordingly, there has been a need for educational programs that let a learner effectively study foreign-language pronunciation alone during idle time and compare it against a native speaker's pronunciation. To meet this need, language-learning devices equipped with various language programs that use speech recognition have been developed and distributed.

Pronunciation evaluation based on such speech recognition is applied in various English speaking-practice programs. Conventional techniques for evaluating pronunciation accuracy use the similarity of speech waveforms between a native speaker's pronunciation and a learner's pronunciation, and the similarity of phonemes, the smallest units of sound. However, speech waveforms differ with each individual's characteristics even when the pronunciation is correct, which makes accurate evaluation difficult; even a fluent speaker cannot match the waveform of a particular native speaker, so reliability is a problem. Phoneme similarity, in turn, compares the native speaker's phonemes with the learner's phonemes as represented by letters or symbols, and therefore cannot reflect detailed pronunciation characteristics, which limits its accuracy.

A learner's foreign-language pronunciation is very important for improving the communication skills needed in an era of internationalization. There is therefore a need for technology that solves the problems of the prior art described above, helps learners imitate a native speaker's pronunciation more accurately, and provides improved pronunciation accuracy.

Meanwhile, as prior art related to the present invention, Laid-open Patent Publication No. 10-2016-0107735 (title of invention: Speech recognition-based pronunciation evaluation method and apparatus; publication date: September 19, 2016) and Laid-open Patent Publication No. 10-2018-0048136 (title of invention: Pronunciation evaluation method and pronunciation evaluation system using the method; publication date: May 10, 2018) have been disclosed.

The present invention has been proposed to solve the above problems of previously proposed methods. Its object is to provide an apparatus and method for evaluating pronunciation accuracy for foreign language education that can reflect detailed pronunciation characteristics by evaluating pronunciation accuracy with a phoneme sequence extracted from the learner's speech signal and annotated with attributes such as rhythm, pitch, and stress, and that can improve the reliability of the evaluation result by extracting a change sequence of mouth-shape landmarks from video of the learner speaking and comparing it with the mouth shape of a native speaker.

Another object of the present invention is to provide an apparatus and method for evaluating pronunciation accuracy for foreign language education that evaluate pronunciation accuracy by comprehensively reflecting not only the similarity of speech waveforms and the similarity of phonemes between native-speaker and learner pronunciation, but also the attributed phoneme sequence and the change sequence of mouth-shape landmarks, thereby providing a reliable, comprehensive evaluation result, allowing foreign language learners to recognize and correct their problems accurately, and improving learning by presenting guidance on pronunciation accuracy.

To achieve the above objects, an apparatus for evaluating pronunciation accuracy for foreign language education according to a feature of the present invention comprises:

an input module that receives a speech signal pronounced by a learner and video of the learner pronouncing the speech signal;

a speech analysis module that extracts a phoneme sequence from the learner's speech signal;

an image analysis module that recognizes mouth-shape landmarks in the learner's video and extracts a change sequence of the recognized mouth-shape landmarks; and

an evaluation module that evaluates pronunciation accuracy using the phoneme sequence of the speech signal and the change sequence of the mouth-shape landmarks extracted by the speech analysis module and the image analysis module.

Preferably, the speech analysis module may include:

a phoneme extraction unit that extracts phonemes from the speech signal; and

a phoneme sequence extraction unit that assigns attributes to the extracted phonemes and extracts an attributed phoneme sequence.

More preferably, the attributes assigned to the phonemes may include the rhythm, stress, and pitch of each phoneme.

More preferably, the evaluation module may evaluate pronunciation accuracy by additionally using the phonemes extracted by the phoneme extraction unit.

Preferably, the image analysis module may use facial recognition technology to recognize mouth-shape landmarks around the lips in video capturing the learner's face.

Preferably, the apparatus further comprises a database module that stores speech waveform, phoneme, and mouth-shape landmark data analyzed from native-speaker pronunciation,

and the evaluation module may evaluate pronunciation accuracy by comparing the native-speaker data stored in the database module with the speech waveform, phoneme sequence, and mouth-shape landmarks of the learner's speech signal.

The evaluation module may compare the native-speaker data stored in the database module with each of the speech waveform, phoneme sequence, and mouth-shape landmarks of the learner's speech signal to calculate a similarity for each, and may combine the calculated similarities to evaluate pronunciation accuracy quantitatively.

To achieve the above objects, a method for evaluating pronunciation accuracy for foreign language education according to a feature of the present invention comprises:

(1) receiving a speech signal pronounced by a learner and video of the learner pronouncing the speech signal;

(2) extracting a phoneme sequence from the learner's speech signal;

(3) recognizing mouth-shape landmarks in the learner's video and extracting a change sequence of the recognized mouth-shape landmarks; and

(4) evaluating pronunciation accuracy using the phoneme sequence of the speech signal and the change sequence of the mouth-shape landmarks extracted in steps (2) and (3).

According to the apparatus and method for evaluating pronunciation accuracy for foreign language education proposed in the present invention, pronunciation accuracy is evaluated using a phoneme sequence that is extracted from the learner's speech signal and annotated with attributes such as rhythm, pitch, and stress, so that detailed pronunciation characteristics are effectively reflected in the evaluation. In addition, a change sequence of mouth-shape landmarks is extracted from video of the learner speaking and compared with the mouth shape of a native speaker, which improves the reliability of the pronunciation accuracy evaluation.

Furthermore, according to the apparatus and method proposed in the present invention, pronunciation accuracy is evaluated by comprehensively reflecting not only the similarity of speech waveforms and the similarity of phonemes between native-speaker and learner pronunciation, but also the attributed phoneme sequence and the change sequence of mouth-shape landmarks. This provides a reliable, comprehensive evaluation result, allows foreign language learners to recognize and correct their problems accurately, and improves learning by presenting guidance on pronunciation accuracy.

FIG. 1 is a diagram showing the configuration of a pronunciation accuracy evaluation apparatus for foreign language education according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the pronunciation accuracy evaluation process of the pronunciation accuracy evaluation apparatus for foreign language education according to an embodiment of the present invention.
FIG. 3 is a diagram showing the detailed configuration of the speech analysis module in the pronunciation accuracy evaluation apparatus for foreign language education according to an embodiment of the present invention.
FIG. 4 is a diagram showing the flow of a pronunciation accuracy evaluation method for foreign language education according to an embodiment of the present invention.

Hereinafter, preferred embodiments are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. In describing the preferred embodiments, detailed descriptions of related known functions or configurations are omitted where they could unnecessarily obscure the subject matter of the present invention. The same reference numerals are used throughout the drawings for parts with similar functions and operations.

In addition, throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another element in between. Likewise, "comprising" a component means that other components may be further included, not excluded, unless specifically stated otherwise.

FIG. 1 shows the configuration of a pronunciation accuracy evaluation apparatus for foreign language education according to an embodiment of the present invention, and FIG. 2 illustrates the pronunciation accuracy evaluation process of that apparatus. As shown in FIGS. 1 and 2, the apparatus may include an input module 100, a speech analysis module 200, an image analysis module 300, and an evaluation module 400, and may further include a database module 500.

That is, the apparatus receives the learner's speech signal and video of the learner pronouncing the speech signal through the input module 100; the speech analysis module 200 and the image analysis module 300 analyze the learner's speech signal and video, respectively; and the evaluation module 400 evaluates pronunciation accuracy using the phoneme sequence and the change sequence of mouth-shape landmarks extracted by the speech analysis module 200 and the image analysis module 300.

In particular, in the present invention the speech analysis module 200 extracts phonemes from the learner's speech signal, extracts a phoneme sequence annotated with attributes such as rhythm, pitch, and stress, and uses it for pronunciation accuracy evaluation, so detailed pronunciation characteristics can be effectively reflected in the evaluation. In addition, the image analysis module 300 extracts a change sequence of mouth-shape landmarks from video of the learner speaking and compares it with the mouth shape of a native speaker, which improves the reliability of the evaluation result.

In this way, the apparatus evaluates pronunciation accuracy by comprehensively reflecting not only the similarity of speech waveforms and the similarity of the phonemes themselves between native-speaker and learner pronunciation, which were conventionally used for pronunciation accuracy evaluation, but also the attributed phoneme sequence and the change sequence of mouth-shape landmarks, and can thereby provide a reliable, comprehensive evaluation result.

The detailed configuration of the pronunciation accuracy evaluation apparatus for foreign language education according to an embodiment of the present invention is described below with reference to FIGS. 1 and 2.

The input module 100 may receive a speech signal pronounced by the learner and video of the learner pronouncing the speech signal. The learner may study a foreign language using various devices such as a computer, a mobile terminal, a tablet PC, or a language-learning device. As shown in FIG. 2, the input module 100 may receive the speech signal through a microphone and the video through a camera, each either built into the learner's device or connected to it.

The speech analysis module 200 may extract a phoneme sequence from the learner's speech signal. That is, the speech analysis module 200 may receive the speech signal pronounced by the learner from the input module 100 and analyze it to extract the phoneme sequence. As shown in FIG. 2, the speech analysis module 200 may also analyze the speech waveform of the learner's speech signal, extract phonemes, and assign phoneme attributes to the extracted phonemes to extract the phoneme sequence. The detailed configuration of the speech analysis module 200 is described below with reference to FIG. 3.

FIG. 3 shows the detailed configuration of the speech analysis module in the pronunciation accuracy evaluation apparatus for foreign language education according to an embodiment of the present invention. As shown in FIG. 3, the speech analysis module 200 may include a phoneme extraction unit 210 and a phoneme sequence extraction unit 220.

The phoneme extraction unit 210 may extract phonemes from the speech signal. A phoneme is the smallest unit of sound and is represented by letters or symbols. By dividing the learner's speech signal into phonemes, the phoneme extraction unit 210 enables a fine-grained comparison with a native speaker's pronunciation, that is, with the phonemes pronounced by the native speaker.

The phoneme sequence extraction unit 220 may assign attributes to the extracted phonemes and extract an attributed phoneme sequence. Here, the attributes assigned to a phoneme may include its rhythm (the length of the individual phoneme), stress (accent), and pitch. That is, the phoneme sequence extraction unit 220 may analyze the learner's speech signal and assign the rhythm, stress, pitch, and the like of each individual phoneme to the phonemes extracted by the phoneme extraction unit 210 to generate the phoneme sequence. With an attributed phoneme sequence, the pronunciation accuracy evaluation can reflect not only the phonemes themselves but also characteristics such as the rhythm, stress, and pitch with which they are pronounced.
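One way to picture an attributed phoneme sequence is as a list of records, one per phoneme, each carrying its rhythm, stress, and pitch. The sketch below is only an illustration under stated assumptions: the name `AttributedPhoneme`, the field units (seconds, a stress value normalised to 0..1, pitch in Hz), and the helper functions are hypothetical and are not specified in the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class AttributedPhoneme:
    """One phoneme with the three attributes named in the description.

    Field names and units are illustrative assumptions.
    """
    symbol: str        # phoneme label (letter or symbol)
    duration_s: float  # rhythm: length of the individual phoneme, in seconds
    stress: float      # stress (accent), assumed normalised to 0..1
    pitch_hz: float    # pitch, assumed to be a mean frequency in Hz


def attach_attributes(
    symbols: Sequence[str],
    durations: Sequence[float],
    stresses: Sequence[float],
    pitches: Sequence[float],
) -> List[AttributedPhoneme]:
    """Combine extracted phonemes with per-phoneme attributes into a sequence."""
    if not (len(symbols) == len(durations) == len(stresses) == len(pitches)):
        raise ValueError("every phoneme needs exactly one value per attribute")
    return [
        AttributedPhoneme(s, d, a, p)
        for s, d, a, p in zip(symbols, durations, stresses, pitches)
    ]


def to_feature_vector(sequence: Sequence[AttributedPhoneme]) -> List[float]:
    """Flatten the numeric attributes so two sequences can be compared."""
    features: List[float] = []
    for phoneme in sequence:
        features.extend([phoneme.duration_s, phoneme.stress, phoneme.pitch_hz])
    return features
```

Keeping the phoneme label separate from the numeric attributes lets an evaluator compare the labels (phoneme similarity) and the attributes (rhythm, stress, pitch) independently, which matches the distinction the description draws between plain phonemes and the attributed sequence.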

In particular, in some foreign languages, words made up of the same phonemes can differ in meaning depending on the length (rhythm), stress, and pitch with which each phoneme is pronounced, and pronunciation can improve considerably when only the rhythm or stress is correct, even if the individual phonemes are less accurate. Accordingly, by using the phoneme sequence to reflect phoneme attributes effectively, the present invention can evaluate the learner's pronunciation accurately and can greatly help the learner improve pronunciation accuracy.

The image analysis module 300 may recognize mouth-shape landmarks in the learner's video and extract a change sequence of the recognized mouth-shape landmarks. That is, the image analysis module 300 may receive from the input module 100 video of the learner pronouncing the speech signal, use facial recognition technology to recognize mouth-shape landmarks around the lips in the video capturing the learner's face, and generate a change sequence of the recognized landmarks. More specifically, at least two characteristic points around the lips during foreign-language pronunciation are set as mouth-shape landmarks, the preset landmarks are recognized on the learner's face in the video, and their change sequence is extracted, which makes it possible to determine whether the learner is pronouncing with the correct mouth shape.
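The patent does not define how the change sequence is encoded. One plausible reading, sketched below purely as an assumption, is the frame-to-frame displacement of each landmark point. The function name and the `(x, y)` coordinate representation are hypothetical, and landmark detection itself (the facial recognition step) is assumed to have already produced the per-frame points.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) position of one mouth-shape landmark
Frame = List[Point]           # all landmarks detected in one video frame


def landmark_change_sequence(frames: List[Frame]) -> List[List[Point]]:
    """Return, for each consecutive frame pair, the displacement of every landmark.

    Assumes each frame lists the same landmarks in the same order.
    """
    if len(frames) < 2:
        return []  # no change can be measured from fewer than two frames
    count = len(frames[0])
    if count < 2:
        # The description calls for at least two landmarks around the lips.
        raise ValueError("at least two mouth-shape landmarks are required")
    changes: List[List[Point]] = []
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != count or len(curr) != count:
            raise ValueError("every frame must contain the same landmarks")
        changes.append(
            [(cx - px, cy - py) for (px, py), (cx, cy) in zip(prev, curr)]
        )
    return changes
```

Working with displacements rather than absolute positions makes the sequence less sensitive to where the face sits in the frame, though a practical system would likely also need to normalise for face size and head movement.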

The correct mouth shape has a large effect on pronunciation accuracy. Even if the sound is partly similar, a sound made without the correct mouth shape is not pronounced accurately, and the learner can hardly be said to know the pronunciation correctly. Accordingly, the present invention tracks changes in mouth shape and reflects them in the pronunciation accuracy evaluation, which increases the reliability of the evaluation and enables accurate assessment.

The evaluation module 400 may evaluate pronunciation accuracy using the phoneme sequence of the speech signal and the change sequence of the mouth-shape landmarks extracted by the speech analysis module 200 and the image analysis module 300. The evaluation module 400 may also evaluate pronunciation accuracy by additionally using the phonemes extracted by the phoneme extraction unit 210, and may analyze the waveform of the learner's speech signal and reflect it in the evaluation.

That is, the evaluation module 400 may evaluate pronunciation accuracy by using the conventional measures, namely the similarity of speech waveforms and the similarity of phonemes between native-speaker and learner pronunciation, while also comprehensively reflecting the attributed phoneme sequence and the change sequence of mouth-shape landmarks that characterize the present invention. By jointly using the learner's speech waveform, phonemes, and phoneme sequence analyzed by the speech analysis module 200 and the change sequence of mouth-shape landmarks analyzed by the image analysis module 300, the evaluation module can provide a reliable, comprehensive evaluation result.

The database module 500 may store voice waveform, phoneme, and mouth-shaped landmark data analyzed for the pronunciation of a native speaker. As shown in FIG. 2, the voice signal and image of the native speaker are analyzed, and the database module 500 stores the resulting voice waveform (voice waveform DB), the phonemes and the attributes assigned to them (phoneme/attribute DB), and the mouth-shaped landmarks (mouth-shaped landmark DB). The evaluation module 400 may then evaluate pronunciation accuracy by comparing the native speaker data stored in the database module 500 with the voice waveform, phonemes, phoneme sequence, and mouth-shaped landmarks obtained from the learner.
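The three stores described above can be pictured with a small in-memory sketch. The class and field names below (AttributedPhoneme, NativeReference, DatabaseModule, utterance_id) and the numeric encoding of rhythm, stress, and pitch are assumptions made for illustration only; the specification does not prescribe a storage layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AttributedPhoneme:
    symbol: str    # phoneme label
    rhythm: float  # assumed numeric encoding of the rhythm attribute
    stress: float  # assumed numeric encoding of the stress attribute
    pitch: float   # assumed numeric encoding of the pitch attribute


@dataclass
class NativeReference:
    waveform: List[float]                             # voice waveform DB entry
    phonemes: List[AttributedPhoneme]                 # phoneme/attribute DB entry
    mouth_landmarks: List[List[Tuple[float, float]]]  # landmark DB entry, per frame


class DatabaseModule:
    """Hypothetical in-memory stand-in for the database module 500."""

    def __init__(self) -> None:
        self._records: Dict[str, NativeReference] = {}

    def store(self, utterance_id: str, reference: NativeReference) -> None:
        self._records[utterance_id] = reference

    def lookup(self, utterance_id: str) -> Optional[NativeReference]:
        # Returns None when no native speaker data exists for the utterance.
        return self._records.get(utterance_id)


db = DatabaseModule()
reference = NativeReference(
    waveform=[0.0, 0.2, -0.1],
    phonemes=[AttributedPhoneme("p1", rhythm=0.5, stress=1.0, pitch=120.0)],
    mouth_landmarks=[[(10.0, 20.0), (30.0, 20.0)]],
)
db.store("sample-utterance", reference)
```

Keying the records by utterance lets the evaluation module fetch all three kinds of native speaker data for the sentence the learner was asked to pronounce in a single lookup.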

More specifically, the evaluation module 400 may calculate similarities by comparing the native speaker data stored in the database module 500 with each of the voice waveform, phoneme sequence, and mouth-shaped landmarks obtained from the learner, and may quantitatively evaluate pronunciation accuracy by combining the calculated similarities. The calculated similarity may be at least one selected from the group including cosine similarity, Euclidean distance, Mahalanobis distance, and Minkowski distance.
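The four measures named above can be sketched in plain Python as follows, assuming that each modality (waveform, phoneme sequence, landmark change sequence) has already been converted into an equal-length numeric feature vector. The specification does not state how the individual similarities are combined, so the conversion 1 / (1 + distance) and the weighted average in overall_score are assumptions for illustration, not the patented method.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def minkowski_distance(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mahalanobis_distance(
    a: Sequence[float], b: Sequence[float], inv_cov: Sequence[Sequence[float]]
) -> float:
    """sqrt(d^T * inv_cov * d); inv_cov is the inverse covariance matrix
    of the feature distribution and must be supplied by the caller."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    quad = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)


def distance_to_similarity(distance: float) -> float:
    # Assumed mapping from a distance in [0, inf) to a similarity in (0, 1].
    return 1.0 / (1.0 + distance)


def overall_score(similarities: Dict[str, float], weights: Dict[str, float]) -> float:
    # Assumed aggregation: weighted average of the per-modality similarities.
    total = sum(weights[name] for name in similarities)
    return sum(similarities[name] * weights[name] for name in similarities) / total
```

Note that cosine similarity grows as two vectors become more alike, whereas the three distances shrink, so a distance has to be mapped to a similarity (or the reverse) before values from different measures are combined.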

FIG. 4 is a diagram illustrating the flow of a method for evaluating pronunciation accuracy for foreign language education according to an embodiment of the present invention. As shown in FIG. 4, the method is performed by the pronunciation accuracy evaluation apparatus and may include receiving a voice signal pronounced by a learner and an image of the learner (S100), extracting a phoneme sequence from the voice signal (S200), recognizing mouth-shaped landmarks in the learner's image and extracting a change sequence of the mouth-shaped landmarks (S300), and evaluating pronunciation accuracy using the phoneme sequence and the change sequence of the mouth-shaped landmarks (S400).

In step S100, a voice signal pronounced by the learner and an image of the learner pronouncing the voice signal may be received. Step S100 may be performed by the input module 100 of the pronunciation accuracy evaluation apparatus.

In step S200, a phoneme sequence may be extracted from the learner's voice signal. Step S200 may be performed by the speech analysis module 200 of the pronunciation accuracy evaluation apparatus.

In step S300, mouth-shaped landmarks may be recognized in the learner's image, and a change sequence of the recognized mouth-shaped landmarks may be extracted. Step S300 may be performed by the image analysis module 300 of the pronunciation accuracy evaluation apparatus.

In step S400, pronunciation accuracy may be evaluated using the phoneme sequence of the voice signal extracted in step S200 and the change sequence of the mouth-shaped landmarks extracted in step S300. Step S400 may be performed by the evaluation module 400 of the pronunciation accuracy evaluation apparatus.
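The order of steps S100 to S400 can be sketched as a single function that wires the modules together. In this hedged example the three analysis stages are passed in as callables, because the specification does not disclose their internals; the function name evaluate_pronunciation, the parameter names, and the toy stand-ins at the bottom are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Displacement = Tuple[float, float]
ChangeSequence = List[List[Displacement]]


def evaluate_pronunciation(
    audio: Sequence[float],
    video_frames: Sequence[object],
    extract_phoneme_sequence: Callable[[Sequence[float]], List[str]],
    extract_landmark_changes: Callable[[Sequence[object]], ChangeSequence],
    evaluate: Callable[[List[str], ChangeSequence], float],
) -> float:
    # S100: the voice signal and the learner's image arrive as arguments
    # (role of the input module 100).
    # S200: speech analysis module 200 extracts the phoneme sequence.
    phoneme_sequence = extract_phoneme_sequence(audio)
    # S300: image analysis module 300 extracts the landmark change sequence.
    landmark_changes = extract_landmark_changes(video_frames)
    # S400: evaluation module 400 turns both sequences into a score.
    return evaluate(phoneme_sequence, landmark_changes)


# Toy stand-ins, used only to show the call shape; they perform no real analysis.
score = evaluate_pronunciation(
    audio=[0.0, 0.1, -0.1],
    video_frames=["frame0", "frame1"],
    extract_phoneme_sequence=lambda audio: ["p1", "p2"],
    extract_landmark_changes=lambda frames: [[(0.0, 0.0)] for _ in frames[1:]],
    evaluate=lambda phonemes, changes: float(len(phonemes) + len(changes)),
)
```

Steps S200 and S300 do not depend on each other, so an implementation could run them in either order or in parallel before step S400.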

Each step of the method for evaluating pronunciation accuracy for foreign language education according to an embodiment of the present invention has already been described in detail in connection with the pronunciation accuracy evaluation apparatus, so the detailed description is omitted here.

According to the apparatus and method for evaluating pronunciation accuracy for foreign language education proposed in the present invention, pronunciation accuracy is evaluated using a phoneme sequence that is extracted from the learner's voice signal and assigned attributes such as rhythm, pitch, and stress, so that detailed pronunciation characteristics can be effectively reflected in the evaluation. In addition, a change sequence of mouth-shaped landmarks is extracted from the image of the learner pronouncing the words and compared with the mouth shape of a native speaker, which improves the reliability of the evaluation result. Furthermore, pronunciation accuracy is evaluated by comprehensively reflecting not only the similarity of voice waveforms and the similarity of phonemes between the native speaker's pronunciation and the learner's pronunciation, but also the attribute-assigned phoneme sequence and the change sequence of the mouth-shaped landmarks. This provides a highly reliable, comprehensive evaluation result, allows foreign language learners to accurately recognize and correct their problems, and improves learning effectiveness by presenting guidance on pronunciation accuracy.

The present invention described above may be modified or applied in various ways by those of ordinary skill in the technical field to which the present invention belongs, and the scope of the technical idea of the present invention should be determined by the following claims.

100: input module
200: speech analysis module
210: phoneme extraction unit
220: phoneme sequence extraction unit
300: image analysis module
400: evaluation module
500: database module
S100: receiving a voice signal pronounced by the learner and an image of the learner
S200: extracting a phoneme sequence from the voice signal
S300: recognizing mouth-shaped landmarks in the learner's image and extracting a change sequence of the mouth-shaped landmarks
S400: evaluating pronunciation accuracy using the phoneme sequence and the change sequence of the mouth-shaped landmarks

Claims

A pronunciation accuracy evaluation apparatus for foreign language education, comprising:
an input module 100 for receiving a voice signal pronounced by a learner and an image of the learner pronouncing the voice signal;
a speech analysis module 200 for extracting a phoneme sequence from the learner's voice signal;
an image analysis module 300 for recognizing mouth-shaped landmarks in the learner's image and extracting a change sequence of the recognized mouth-shaped landmarks;
an evaluation module 400 for evaluating pronunciation accuracy using the phoneme sequence of the voice signal and the change sequence of the mouth-shaped landmarks extracted by the speech analysis module 200 and the image analysis module 300; and
a database module 500 for storing voice waveforms, phonemes and attributes assigned to the phonemes, and mouth-shaped landmark data analyzed for a native speaker's pronunciation,
wherein the image analysis module 300
recognizes mouth-shaped landmarks around the lips in an image of the learner's face using facial recognition technology, with at least two feature points around the lips during pronunciation set in advance as the mouth-shaped landmarks, and extracts the change sequence by recognizing the at least two pre-set mouth-shaped landmarks on the learner's face included in the image,
wherein the evaluation module 400
evaluates pronunciation accuracy by comparing the native speaker data stored in the database module 500 with the voice waveform of the learner's voice signal, the phonemes, the attribute-assigned phoneme sequence, and the change sequence of the mouth-shaped landmarks, calculates similarities by comparing the native speaker data stored in the database module 500 with each of the voice waveform of the learner's voice signal, the attribute-assigned phoneme sequence, and the change sequence of the mouth-shaped landmarks, and quantitatively evaluates pronunciation accuracy by combining the calculated similarities, and
wherein the calculated similarity is
any one of cosine similarity, Euclidean distance, Mahalanobis distance, and Minkowski distance.

The apparatus of claim 1, wherein the speech analysis module 200 comprises:
a phoneme extraction unit 210 for extracting phonemes from the voice signal; and
a phoneme sequence extraction unit 220 for assigning attributes to the extracted phonemes and extracting an attribute-assigned phoneme sequence.

The apparatus of claim 2, wherein the attributes assigned to the phonemes
include the rhythm, stress, and pitch of the phonemes.

The apparatus of claim 2, wherein the evaluation module 400
evaluates pronunciation accuracy by further using the phonemes extracted by the phoneme extraction unit 210.

(Deleted)

A method for evaluating pronunciation accuracy for foreign language education, comprising:
(1) receiving a voice signal pronounced by a learner and an image of the learner pronouncing the voice signal;
(2) extracting a phoneme sequence from the learner's voice signal;
(3) recognizing mouth-shaped landmarks in the learner's image and extracting a change sequence of the recognized mouth-shaped landmarks; and
(4) evaluating pronunciation accuracy using the phoneme sequence of the voice signal and the change sequence of the mouth-shaped landmarks extracted in steps (2) and (3),
wherein, in step (3),
mouth-shaped landmarks around the lips are recognized in an image of the learner's face using facial recognition technology, with at least two feature points around the lips during pronunciation set in advance as the mouth-shaped landmarks, and the change sequence is extracted by recognizing the at least two pre-set mouth-shaped landmarks on the learner's face included in the image,
wherein, in step (4),
pronunciation accuracy is evaluated by comparing native speaker data, stored in a database module 500 that stores voice waveforms, phonemes and attributes assigned to the phonemes, and mouth-shaped landmark data analyzed for a native speaker's pronunciation, with the voice waveform of the learner's voice signal, the phonemes, the attribute-assigned phoneme sequence, and the change sequence of the mouth-shaped landmarks; similarities are calculated by comparing the native speaker data stored in the database module 500 with each of the voice waveform of the learner's voice signal, the attribute-assigned phoneme sequence, and the change sequence of the mouth-shaped landmarks; and pronunciation accuracy is quantitatively evaluated by combining the calculated similarities, and
wherein the calculated similarity is
any one of cosine similarity, Euclidean distance, Mahalanobis distance, and Minkowski distance.