KR20220036239A

KR20220036239A - Pronunciation evaluation system based on deep learning

Info

Publication number: KR20220036239A
Application number: KR1020200118571A
Authority: KR
Inventors: 장현석
Original assignee: 주식회사 퀄슨
Priority date: 2020-09-15
Filing date: 2020-09-15
Publication date: 2022-03-22
Also published as: KR102405547B1

Abstract

The present invention relates to a pronunciation evaluation system.
The pronunciation evaluation system according to the present invention includes: a text providing unit that provides sentences for learning; a spoken voice collection unit that collects the voice uttered by a learner in response to a provided sentence; a text conversion unit that converts the uttered voice into text and obtains the converted text as an utterance; an evaluation unit that compares and analyzes the obtained utterance against the provided sentence to calculate a matching rate for the sentence; a misrecognition word collection unit that uses the evaluation results to collect information on words the learner consistently mispronounces and words that cause recognition errors, and stores the collected words in table form; and a control unit that, when the utterance matches the provided sentence, quantifies the evaluation results and presents them color-coded by pronunciation accuracy and stress agreement, and, when the utterance does not match the provided sentence, transmits a signal requesting re-utterance of the provided sentence.

Description

Pronunciation evaluation system based on deep learning

The present invention relates to a pronunciation evaluation system. More specifically, it relates to a pronunciation evaluation system that uses deep learning and natural language processing (NLP). The system evaluates a user's pronunciation by matching the voice information obtained as the user speaks against the provided sentence.

Recently, a growing number of users have been learning foreign languages such as English with computers. In particular, programs for learning English pronunciation are increasing. When a user utters a specific word or sentence into a microphone, the program analyzes the utterance and provides an evaluation of the user's pronunciation. Speech recognition is used to determine what the user said, and the evaluation is returned as a score or as feedback matched to the evaluation level.

For sentence pronunciation practice, the results shown to the user often give only an overall pronunciation accuracy (overall score) for the entire utterance. For word pronunciation practice, they often indicate only whether the word was pronounced correctly. As a result, when a user utters several words, as in a sentence, the individual words with problematic pronunciation are not pointed out, and the user does not receive precise feedback.

Some systems go further and use the speech recognition results to point out the parts the user mispronounced. In this case, however, the errors Korean speakers commonly make in English pronunciation must first be compiled as a knowledge base. The user is then notified when speech recognition detects one of them. Pronunciation can therefore be corrected only after additional information on correct pronunciation has been built.

Beyond the accuracy of the pronunciation itself, some systems also include suprasegmental evaluation factors to measure the naturalness of speech, especially in sentence utterances. However, these suprasegmental factors are also evaluated per sentence. This makes it difficult to point out the incorrect part of a sentence and explain in detail what went wrong. Suprasegmental elements are features that are not divided into segments, such as sentence intonation, stress, and speaking rate. Segmental elements are separable units such as sentences, phrases, syllables, words, and phonemes.

Therefore, a foreign-language pronunciation evaluation technology is needed that evaluates the pronunciation of a sentence for each predetermined unit (segment), including its suprasegmental elements.

The background art of the present invention is disclosed in Korean Patent Laid-Open Publication No. 10-2019-0068841 (published June 19, 2019).

The technical problem to be solved by the present invention is to provide a pronunciation evaluation system that uses deep learning and NLP (natural language processing) technology. The system evaluates a user's pronunciation by matching the speech information obtained as the user speaks against the provided sentence.

A pronunciation evaluation system according to an embodiment of the present invention for solving this technical problem includes:
- a text providing unit that provides sentences for learning;
- a spoken voice collection unit that collects the voice uttered by a learner in response to a provided sentence;
- a text conversion unit that converts the uttered voice into text and obtains the converted text as an utterance;
- an evaluation unit that compares and analyzes the obtained utterance against the provided sentence to calculate a matching rate for the sentence;
- a misrecognition word collection unit that uses the evaluation results to collect information on words the learner consistently mispronounces and words that cause recognition errors, and stores the collected words in table form; and
- a control unit that, when the utterance matches the provided sentence, quantifies the evaluation results and presents them color-coded by pronunciation accuracy and stress agreement, and, when the utterance does not match the provided sentence, transmits a signal requesting re-utterance of the provided sentence.

The text conversion unit may classify the uttered voice by word. It may then generate an utterance from the classified words, together with a plurality of candidate sentences similar to that utterance, and provide time information for each classified word.

The evaluation unit may compare the provided sentence and the utterance word by word and calculate the matching rate from the number of matching words.

The evaluation unit may split the provided sentence and the utterance into words and compare them word by word. It may then assign weights to the provided sentence or the utterance and calculate the matching rate.

The evaluation unit may assign weight to the utterance when a word is pronounced correctly or pronounced one extra time. It may assign weight to the provided sentence when a word is mispronounced or omitted.

The evaluation unit may split the provided sentence and the utterance into words and set a plurality of cases for the classified words. It may then calculate the matching rate using those cases.

The misrecognition word collection unit may extract numbers, infrequently used words, and words the learner repeatedly mispronounces, and classify them as misrecognized words. It may then store the classified misrecognized words in table form.

The control unit may obtain a waveform from the learner's uttered voice. It may determine the stress of each word by matching the waveform with the time information of the words in the utterance. It may then compare the obtained stress with the stress of the original word and display whether it is correct using color.

According to the present invention, the user's utterance of a sentence is evaluated using deep learning and natural language processing (NLP) technology. The results are presented as indicators, which gives the user a sense of accomplishment. When users input their own English pronunciation, the system points out where the pronunciation is inaccurate at the phoneme or syllable level. It also allows detailed analysis of pronunciation accuracy, stress, intonation, and speed, which improves learning efficiency.

In addition, the recognition rate for the user's pronunciation can be increased by monitoring the words the user repeatedly mispronounces and continuously updating the conversion table. The matching rate between the uttered sentence and the provided sentence can also be increased by classifying words into stressed and unstressed words.

Figure 1 is a configuration diagram for explaining a pronunciation evaluation system according to an embodiment of the present invention.
Figure 2 is a flowchart showing a method of evaluating pronunciation using the pronunciation evaluation system according to an embodiment of the present invention.
Figure 3 is a diagram showing step S210 of Figure 2.
Figure 4 is a diagram for explaining step S220 of Figure 2.
Figure 5 is an example showing the utterance converted and output in step S240 of Figure 2.
Figure 6 is a diagram for explaining how the matching rate is calculated by the third method in step S250 of Figure 2.
Figure 7 is an example showing the output of the results derived in step S270 of Figure 2.
Figure 8 is a diagram for explaining how the stress of a word is determined in step S270 of Figure 2.

Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the thickness of lines and the sizes of components may be exaggerated for clarity and convenience of explanation.

The terms used below are defined in consideration of their functions in the present invention. They may vary according to the intention or practice of users or operators. Therefore, these terms should be defined on the basis of the content of this entire specification.

Hereinafter, the pronunciation evaluation system according to an embodiment of the present invention will be described in more detail.

Figure 1 is a configuration diagram illustrating a deep-learning-based pronunciation evaluation system according to an embodiment of the present invention.

As shown in Figure 1, the pronunciation evaluation system 100 includes the following units:
- a text providing unit 110;
- a spoken voice collection unit 120;
- a text conversion unit 130;
- an evaluation unit 140;
- a misrecognition word collection unit 150; and
- a control unit 160.

First, the text providing unit 110 provides sentences for learning. The provided sentences may be lines from movies or dramas, everyday conversations, or song lyrics.

The spoken voice collection unit 120 collects the voice uttered by the learner in response to the provided sentence. The learner turns on the microphone of the terminal and reads the provided text aloud into it. The learner's terminal then collects the uttered voice.

The text conversion unit 130 converts the collected uttered voice into text.

The evaluation unit 140 obtains a matching rate by comparing the utterance with the sentence provided for learning. To do so, the evaluation unit 140 builds a deep learning model and inputs the utterance and the provided sentence into it. The deep learning model splits the utterance and the provided sentence into words and matches the words against each other to obtain the matching rate.

The misrecognition word collection unit 150 uses the evaluation results to collect information on words the learner consistently mispronounces and words that cause recognition errors. It stores the collected words in table form.

Finally, when the utterance matches the provided sentence, the control unit 160 quantifies the evaluation results and presents them color-coded by pronunciation accuracy and stress agreement. When the utterance does not match the provided sentence, it transmits a signal requesting re-utterance of the provided sentence.
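The control unit's branching can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation. The specification does not define how a "match" is decided, the per-word input, or the color scheme. The 0.8 threshold, the (word, pronounced_ok, stress_ok) input format, and the green/yellow/red colors are therefore all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical cut-off: the specification does not say how a "match" is decided.
MATCH_THRESHOLD = 0.8


@dataclass
class Feedback:
    request_reutterance: bool
    score: Optional[int] = None  # quantified result on a 0-100 scale
    word_colors: List[Tuple[str, str]] = field(default_factory=list)


def control_step(match_rate: float,
                 word_results: List[Tuple[str, bool, bool]]) -> Feedback:
    """word_results holds (word, pronounced_ok, stress_ok) per word (assumed format)."""
    if match_rate < MATCH_THRESHOLD:
        # Mismatch: ask the learner to say the provided sentence again.
        return Feedback(request_reutterance=True)
    colors = []
    for word, pronounced_ok, stress_ok in word_results:
        if not pronounced_ok:
            colors.append((word, "red"))      # mispronounced
        elif not stress_ok:
            colors.append((word, "yellow"))   # right sounds, wrong stress
        else:
            colors.append((word, "green"))    # accurate and correctly stressed
    return Feedback(request_reutterance=False,
                    score=round(match_rate * 100),
                    word_colors=colors)


fb = control_step(1.0, [("I", True, True), ("am", True, False), ("hungry", False, False)])
print(fb.score, fb.word_colors)
```

Separating "pronounced correctly" from "stressed correctly" keeps the two dimensions named in the specification (pronunciation accuracy and stress agreement) visible in the output.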

Hereinafter, a method of evaluating pronunciation using the pronunciation evaluation system according to an embodiment of the present invention will be described in more detail with reference to Figures 2 to 8.

Figure 2 is a flowchart showing a method of evaluating pronunciation using the pronunciation evaluation system according to an embodiment of the present invention. The remaining figures illustrate individual steps of that method:
- Figure 3 is a diagram showing step S210 of Figure 2.
- Figure 4 is a diagram for explaining step S220 of Figure 2.
- Figure 5 is an example showing the utterance converted and output in step S240 of Figure 2.
- Figure 6 is a diagram for explaining how the matching rate is calculated by the third method in step S250 of Figure 2.
- Figure 7 is an example showing the output of the results derived in step S270 of Figure 2.
- Figure 8 is a diagram for explaining how the stress of a word is determined in step S270 of Figure 2.

First, the learner downloads and launches an application for pronunciation evaluation. Then, as shown in Figure 3, the text providing unit 110 provides sentences available for learning (S210).

The learner may select any one of the provided sentences or proceed through them in order, starting from the first sentence.

Once a sentence has been selected, the control unit 160 receives the options chosen by the learner (S220). These are the learner's gender and the national accent against which the learner wants to be evaluated.

As shown in Figure 4, the present invention performs the evaluation according to the accent the learner selected, i.e., British or American English.

Next, the learner records the provided sentence, and the spoken voice collection unit 120 receives the learner's voice captured during the recording (S230).

The text conversion unit 130 converts the received voice into text (S240).

As shown in Figure 5, the converted utterance is immediately displayed through the application.

More specifically, the text conversion unit 130 splits the uttered voice into words and obtains the utterance from the classified words. In addition to the utterance, the text conversion unit 130 also generates a plurality of candidate sentences.

For example, suppose the provided sentence is "I am hungry". The text conversion unit 130 converts the voice uttered for that sentence into "I am hungry". It also generates candidate sentences such as "I was hungry" and "I am angry".
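The sketch below shows one way to hold the text conversion unit's output: the utterance split into words with their time information, plus the candidate sentences. The class and field names are hypothetical. The specification does not name a particular speech recognizer or output format, and the timestamps are made-up illustrative values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordSegment:
    word: str
    start: float  # seconds from the start of the recording
    end: float


@dataclass
class RecognitionResult:
    utterance: List[WordSegment]  # best hypothesis, split into words
    candidates: List[str]         # alternative sentences from the recognizer

    def text(self) -> str:
        return " ".join(seg.word for seg in self.utterance)

    def word_at(self, t: float) -> Optional[str]:
        """Return the word being spoken at time t, if any."""
        for seg in self.utterance:
            if seg.start <= t < seg.end:
                return seg.word
        return None


# Illustrative values only; the timestamps are made up.
result = RecognitionResult(
    utterance=[WordSegment("I", 0.10, 0.25),
               WordSegment("am", 0.25, 0.45),
               WordSegment("hungry", 0.45, 0.95)],
    candidates=["I was hungry", "I am angry"],
)
print(result.text())        # I am hungry
print(result.word_at(0.3))  # am
```

The control unit judges stress by cutting the waveform at word boundaries (Figure 8), so it needs the start and end time of each word. That is why this structure stores them.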

When step S240 is completed, the evaluation unit 140 compares the utterance with the provided sentence to calculate the matching rate (S250).

The evaluation unit 140 calculates the matching rate by one of the following three methods.

In the first method, the evaluation unit 140 compares the provided sentence and the utterance word by word and calculates the matching rate from the number of matching words.

For example, if the provided sentence is "I want to go home" and the utterance is also "I want to go home", the evaluation unit 140 calculates a matching rate of 100% for the utterance.

On the other hand, if the utterance is "I go home", the evaluation unit 140 calculates a matching rate of 20%. Only one of the five words matches: when the words are compared position by position, only "I" lines up with the provided sentence.
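The first method can be sketched as a strict position-by-position comparison, which reproduces both figures in the example. Normalizing case and surrounding punctuation is an assumption that the specification does not state. Extra words at the end of the utterance are ignored, and the denominator is the length of the provided sentence.

```python
import string
from typing import List


def normalize(sentence: str) -> List[str]:
    """Lower-case each word and strip surrounding punctuation."""
    return [w.strip(string.punctuation).lower() for w in sentence.split()]


def positional_match_rate(reference: str, utterance: str) -> float:
    ref = normalize(reference)
    utt = normalize(utterance)
    if not ref:
        return 0.0
    # Compare word i of the utterance with word i of the provided sentence.
    matches = sum(1 for r, u in zip(ref, utt) if r == u)
    return matches / len(ref)


print(positional_match_rate("I want to go home", "I want to go home"))  # 1.0
print(positional_match_rate("I want to go home", "I go home"))          # 0.2
```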

In the second method, the evaluation unit 140 splits the provided sentence and the utterance into words and compares them word by word. It then assigns weights to the provided sentence or the utterance to calculate the matching rate.

For example, assume that the provided sentence "I want to go home" is split into the words "I", "want", "to", "go", and "home". Assume also that the corresponding utterance "I go home" is split into "I", "go", and "home".

The evaluation unit 140 then compares the words one by one. When a word is pronounced correctly or pronounced one extra time, weight is assigned to the utterance. When a word is mispronounced or omitted, weight is assigned to the provided sentence. The matching rate is calculated from these weights.
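The specification does not give a formula for the second method, so the sketch below is one hypothetical reading. The two word sequences are aligned with Python's standard `difflib.SequenceMatcher`, and weights are assigned as follows:
- Correctly pronounced words add weight to the utterance side.
- An extra word that repeats a neighbouring word adds a smaller weight (`REPEAT_WEIGHT`, an assumed value) to the utterance side.
- Mispronounced or omitted words add weight to the provided-sentence side.

The matching rate is then the utterance side's share of the total weight.

```python
from difflib import SequenceMatcher

# Hypothetical weight for a word the learner said one extra time.
REPEAT_WEIGHT = 0.5


def weighted_match_rate(reference: str, utterance: str) -> float:
    ref = reference.lower().split()
    utt = utterance.lower().split()
    utt_weight = 0.0  # credit on the utterance side
    ref_weight = 0.0  # penalty on the provided-sentence side
    # Align the two word sequences so that an omission does not shift later words.
    matcher = SequenceMatcher(a=ref, b=utt, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":                    # words pronounced correctly
            utt_weight += i2 - i1
        elif tag in ("replace", "delete"):    # mispronounced or omitted words
            ref_weight += i2 - i1
        else:                                 # "insert": extra words in the utterance
            for j in range(j1, j2):
                neighbours = utt[max(j - 1, 0):j] + utt[j + 1:j + 2]
                if utt[j] in neighbours:      # treat it as a repeated word
                    utt_weight += REPEAT_WEIGHT
    total = utt_weight + ref_weight
    return utt_weight / total if total else 0.0


print(weighted_match_rate("I want to go home", "I go home"))                   # 0.6
print(round(weighted_match_rate("I want to go home", "I want want to go"), 3))  # 0.818
```

The first method compares words by position, so one omission shifts every later word out of place. Alignment avoids this, which is why "I go home" scores 0.6 here rather than 0.2. `SequenceMatcher` can also fold a repeated word into a `replace` block, in which case this sketch penalizes it rather than crediting it.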

Finally, in the third method, the evaluation unit 140 splits the provided sentence and the utterance into words. Then, as shown in Figure 6, it sets a plurality of cases for the classified words and calculates the matching rate using those cases.
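Figure 6 is not reproduced here, and the specification does not define "a plurality of cases" further. The sketch below takes one hypothetical reading. Each word position keeps several recognition alternatives, for example drawn from the candidate sentences. Every combination of alternatives is treated as a case, and the best-scoring case determines the matching rate. The number of cases is the product of the number of alternatives at each position, so this approach only suits short sentences.

```python
from itertools import product
from typing import List, Optional, Tuple


def best_case_match_rate(reference: str,
                         word_options: List[List[str]]) -> Tuple[float, Optional[List[str]]]:
    """Try every combination of per-word alternatives and keep the best one."""
    ref = reference.lower().split()
    best_rate, best_case = 0.0, None
    for case in product(*word_options):
        # Position-by-position comparison, as in the first method.
        matches = sum(1 for r, w in zip(ref, case) if r == w.lower())
        rate = matches / len(ref) if ref else 0.0
        if best_case is None or rate > best_rate:
            best_rate, best_case = rate, list(case)
    return best_rate, best_case


# Alternatives per position, e.g. taken from "I was hungry" and "I am angry".
options = [["I"], ["was", "am"], ["angry", "hungry"]]
print(best_case_match_rate("I am hungry", options))  # (1.0, ['I', 'am', 'hungry'])
```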

Next, the misrecognition word collection unit 150 extracts, from the evaluated utterance, words that were not recognized or that the learner failed to pronounce. It stores the extracted words in table form (S260).

Numbers, infrequently used words, and words expressing human emotions are hard to recognize. They are therefore likely to be recognized incorrectly and converted to the wrong text. For this reason, the misrecognition word collection unit 150 extracts such misrecognized words and stores them in table form.

The misrecognized words are stored in table form so that an accurate recognition rate can be achieved when an utterance is compared against them and judged correct or incorrect.
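Below is a minimal sketch of such a table. It assumes a position-by-position comparison between the provided sentence and the recognized text. The repeat threshold, the list of common words used to flag infrequent words, and the category labels are hypothetical. The specification only names numbers, infrequently used words, and repeatedly mispronounced words as targets.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


class MisrecognitionTable:
    """Per-learner table of words that were misrecognized or mispronounced."""

    def __init__(self, repeat_threshold: int = 3, common_words: Iterable[str] = ()):
        self.counts: Counter = Counter()
        self.repeat_threshold = repeat_threshold  # hypothetical cut-off
        self.common_words = {w.lower() for w in common_words}

    def record(self, reference: str, recognized: str) -> None:
        """Count every reference word that the recognizer did not reproduce in place."""
        ref = reference.lower().split()
        rec = recognized.lower().split()
        for i, word in enumerate(ref):
            if i >= len(rec) or rec[i] != word:
                self.counts[word] += 1

    def category(self, word: str) -> Optional[str]:
        if any(ch.isdigit() for ch in word):
            return "number"
        if self.counts[word] >= self.repeat_threshold:
            return "repeated error"
        if word not in self.common_words:
            return "infrequent word"
        return None

    def table(self) -> Dict[str, Tuple[int, Optional[str]]]:
        return {w: (n, self.category(w)) for w, n in self.counts.items()}


table = MisrecognitionTable(repeat_threshold=2,
                            common_words=["i", "want", "to", "go", "home", "have", "cats"])
table.record("I want to go home", "I want to go hum")
table.record("I want to go home", "I want to go hum")
table.record("I have 3 cats", "I have tree cats")
print(table.table())  # {'home': (2, 'repeated error'), '3': (1, 'number')}
```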

Finally, the control unit 160 outputs the analysis results for the utterance with a high matching rate (S270).
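Suppose "the utterance with a high matching rate" means the best-scoring sentence among the recognized utterance and its candidate sentences. The selection step can then be sketched as follows. This reading, and the reuse of the first method's position-by-position matching rate, are assumptions.

```python
from typing import List, Tuple


def _match_rate(reference: str, hypothesis: str) -> float:
    """Position-by-position word matching rate (as in the first method)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0
    return sum(1 for r, h in zip(ref, hyp) if r == h) / len(ref)


def select_best_hypothesis(reference: str, hypotheses: List[str]) -> Tuple[str, float]:
    """Return the hypothesis with the highest matching rate; the first one wins ties."""
    best = max(hypotheses, key=lambda h: _match_rate(reference, h))
    return best, _match_rate(reference, best)


print(select_best_hypothesis("I am hungry", ["I was hungry", "I am angry", "I am hungry"]))
# ('I am hungry', 1.0)
```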

As shown in Figure 7, the control unit 160 analyzes and outputs three results: the pronunciation accuracy, the stress, and the degree of similarity to the selected national accent.

As shown in Figure 8, the control unit 160 obtains a waveform from the learner's uttered voice. It then determines the stress of each word by matching the waveform with the time information of the words in the utterance.

Alternatively, the control unit 160 may determine the stress of a word from the loudness of the learner's voice.
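The waveform-based stress check can be sketched in three steps:
1. Measure the root-mean-square (RMS) amplitude of each word between its timestamps.
2. Flag words that are clearly louder than average as stressed.
3. Compare the flagged words with the expected stress positions.

Using loudness alone is a simplification, because English stress is also carried by pitch and duration. The 1.2 ratio, the function names, and the 10-samples-per-second toy signal are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Segment = Tuple[str, float, float]  # (word, start_sec, end_sec)


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, a simple loudness proxy."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def word_energies(samples: Sequence[float], sample_rate: int,
                  segments: List[Segment]) -> List[float]:
    """Cut the waveform at each word's timestamps and measure its RMS."""
    return [rms(samples[int(start * sample_rate):int(end * sample_rate)])
            for _, start, end in segments]


def detect_stress(energies: List[float], ratio: float = 1.2) -> Set[int]:
    """Mark word i as stressed if it is clearly louder than the average word."""
    if not energies:
        return set()
    mean = sum(energies) / len(energies)
    return {i for i, e in enumerate(energies) if e > ratio * mean}


def stress_colors(segments: List[Segment], detected: Set[int],
                  expected: Set[int]) -> List[Tuple[str, str]]:
    """Green when detected stress agrees with the expected stress, red otherwise."""
    return [(word, "green" if (i in detected) == (i in expected) else "red")
            for i, (word, _, _) in enumerate(segments)]


# Toy signal at 10 samples/s: "hungry" is spoken louder than the other words.
sr = 10
samples = [0.2] * 10 + [0.2] * 10 + [1.0] * 10
segments = [("I", 0.0, 1.0), ("am", 1.0, 2.0), ("hungry", 2.0, 3.0)]
detected = detect_stress(word_energies(samples, sr, segments))
print(stress_colors(segments, detected, expected={2}))
# [('I', 'green'), ('am', 'green'), ('hungry', 'green')]
```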

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely illustrative. Those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

100: Pronunciation evaluation system
110: Text providing unit
120: Spoken voice collection unit
130: Text conversion unit
140: Evaluation unit
150: Misrecognition word collection unit
160: Control unit

Claims

A pronunciation evaluation system based on deep learning and NLP (natural language processing) technology, comprising:
a text providing unit that provides sentences for learning;
a spoken voice collection unit that collects the voice uttered by a learner in response to a provided sentence;
a text conversion unit that converts the uttered voice into text and obtains the converted text as an utterance;
an evaluation unit that calculates a matching rate for the sentence by comparing and analyzing the obtained utterance and the provided sentence using a previously constructed deep learning model;
a misrecognition word collection unit that uses the evaluation results to collect information on words the learner consistently mispronounces and words that cause recognition errors, and stores the collected words in table form; and
a control unit that, when the utterance matches the provided sentence, quantifies the evaluation results and presents them color-coded by pronunciation accuracy and stress agreement, and, when the utterance does not match the provided sentence, transmits a signal requesting re-utterance of the provided sentence.

The pronunciation evaluation system according to claim 1, wherein
the text conversion unit
classifies the uttered voice by word, generates an utterance from the classified words together with a plurality of candidate sentences similar to the utterance, and provides corresponding time information for each classified word.

The pronunciation evaluation system according to claim 1, wherein
the evaluation unit
compares the provided sentence and the utterance word by word and calculates the matching rate based on the number of matched words.

The pronunciation evaluation system according to claim 3, wherein
the evaluation unit
classifies the provided sentence and the utterance by word, compares each classified word, assigns weight to the provided sentence or the utterance, and calculates the matching rate.

The pronunciation evaluation system according to claim 4, wherein
the evaluation unit
assigns weight to the utterance when a word is pronounced correctly or pronounced one extra time, and
assigns weight to the provided sentence when a word is mispronounced or omitted.

The pronunciation evaluation system according to claim 4, wherein
the evaluation unit
classifies the provided sentence and the utterance by word, sets a plurality of cases for the classified words, and then calculates the matching rate using the set cases.

The pronunciation evaluation system according to claim 1, wherein
the misrecognition word collection unit
extracts numbers, infrequently used words, and words the learner repeatedly mispronounces, classifies them as misrecognized words, and stores the classified misrecognized words in table form.

The pronunciation evaluation system according to claim 1, wherein
the control unit
obtains a waveform from the learner's uttered voice, and
obtains the stress of a word by matching the obtained waveform with the time information of the word included in the utterance, compares the obtained stress with the stress of the original word, and displays whether it is correct using color.