KR101411039B1

KR101411039B1 - Method for evaluating pronunciation with speech recognition and electronic device using the same

Info

Publication number: KR101411039B1
Application number: KR1020120012421A
Authority: KR
Inventors: 김현수; 김진석; 박용석; 백승한; 이정제; 한병민; 공혜진
Original assignee: 에스케이씨앤씨 주식회사 (SK C&C Co., Ltd.)
Priority date: 2012-02-07
Filing date: 2012-02-07
Publication date: 2014-07-07
Also published as: KR20130091128A

Abstract

A pronunciation evaluation method using speech recognition is provided. In the pronunciation evaluation method according to an embodiment of the present invention, the user's pronunciation is evaluated using a plurality of candidate sentences extracted as the speech recognition result for the user's voice. This allows the user to receive an immediate and objective evaluation of his or her pronunciation.

Description

TECHNICAL FIELD

The present invention relates to a pronunciation evaluation method and, more particularly, to a pronunciation evaluation method that acquires a user's voice and evaluates the accuracy of the pronunciation through an application, and to an electronic device using the method.

English-learning applications on smartphones have made it easy to study English anytime and anywhere. By drawing on the smartphone's many functions, such applications now offer an increasingly wide range of learning programs.

Typically, the smartphone's recording function is used to record the user's English pronunciation and play it back, so that the user can hear and check his or her own pronunciation directly. Some applications go further and use the smartphone's wireless Internet function to post the recording to an online bulletin board, so that users can exchange pronunciation evaluations (star ratings, comments, and so on) with one another.

However, pronunciation evaluations of this kind rest on the subjective judgment of the user or of others and are therefore not objective. Moreover, if no one else actively takes part, it is impossible to receive any evaluation from others at all, which renders the approach useless.

In addition, even when others do evaluate the user's pronunciation, a considerable amount of time passes before they find and listen to the recording on the bulletin board. An immediate evaluation is therefore practically impossible, and the user is left frustrated by the lack of prompt feedback.

SUMMARY OF THE INVENTION

The present invention has been devised to solve the above problems. An object of the present invention is to provide, as a means of more objective and immediate pronunciation evaluation, a method of evaluating pronunciation accuracy using speech recognition, and an electronic device using the method.

To achieve the above object, a pronunciation evaluation method according to an embodiment of the present invention includes: obtaining a user's voice; obtaining a plurality of candidate sentences extracted as the speech recognition result for the obtained voice; and evaluating the user's pronunciation using the candidate sentences.

The pronunciation evaluation method according to the embodiment may further include providing a sentence, in which case the voice obtaining step may obtain the voice of the user reading the sentence provided in the providing step.

Speech recognition of the user's voice may be performed on a sentence-by-sentence basis.

The evaluating step may include: separating the plurality of candidate sentences into words; generating a word group by merging the separated words while removing duplicates; identifying, among the words constituting the provided sentence, those included in the word group; and evaluating the user's pronunciation accuracy based on the number of words identified in the identifying step.
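The four sub-steps of the evaluating step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: words are split on whitespace and compared as exact strings, and the function names (`split_words`, `build_word_group`, `identify_words`, `count_identified`) are hypothetical. Punctuation removal and case-insensitive matching, described later, are left out to keep the core idea visible.

```python
def split_words(sentence: str) -> list[str]:
    """Separate a sentence into words on whitespace (assumed tokenization)."""
    return sentence.split()


def build_word_group(candidates: list[str]) -> list[str]:
    """Merge the words of every candidate sentence, removing duplicates.

    First-seen order is kept so the group reads like the examples in the text.
    """
    group: list[str] = []
    seen: set[str] = set()
    for candidate in candidates:
        for word in split_words(candidate):
            if word not in seen:
                seen.add(word)
                group.append(word)
    return group


def identify_words(provided: str, group: list[str]) -> list[str]:
    """Return the words of the provided sentence that appear in the word group.

    Each word is looked up on its own, so its position in the candidates does not matter.
    """
    members = set(group)
    return [word for word in split_words(provided) if word in members]


def count_identified(provided: str, candidates: list[str]) -> int:
    """Number of provided-sentence words found in the merged word group."""
    return len(identify_words(provided, build_word_group(candidates)))
```

Because each word of the provided sentence is checked against the merged group individually, a word counts as recognized if any candidate contains it, wherever it appears in that candidate.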

The evaluating step may further include removing punctuation marks from the plurality of candidate sentences, in which case the separating step may separate the punctuation-free candidate sentences into words.
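Punctuation removal can be sketched with Python's `str.translate`, assuming the marks to delete are exactly the four named later in the description (`!`, `?`, `,`, `.`). Apostrophes are deliberately left alone so that contractions such as "It's" stay single words. The constant and function names are hypothetical.

```python
# Marks assumed to be removed before word separation; the apostrophe is NOT included,
# so "It's" remains one word.
SENTENCE_MARKS = "!?,."

# Translation table that maps each mark to None, i.e. deletes it.
_STRIP_TABLE = str.maketrans("", "", SENTENCE_MARKS)


def remove_punctuation(sentence: str) -> str:
    """Delete sentence marks from a candidate sentence before it is split into words."""
    return sentence.translate(_STRIP_TABLE)
```

Blind deletion like this would also turn a decimal such as "3.5" into "35", so a production version would likely need a smarter tokenizer.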

The pronunciation evaluation method according to the embodiment may further include changing, in the sentence provided in the providing step, the words identified in the identifying step to a color different from that of the unidentified words.
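Coloring the identified words can be illustrated in a terminal, with ANSI escape codes standing in for the red text that the embodiment draws on the touch screen. The sketch assumes words are compared ignoring letter case and trailing punctuation, as the description specifies for the identifying step; `highlight` is a hypothetical name.

```python
RED = "\033[31m"    # ANSI escape for red foreground (terminal stand-in for the screen colour)
RESET = "\033[0m"   # ANSI escape that restores the default colour
SENTENCE_MARKS = "!?,."


def highlight(provided: str, word_group: list[str]) -> str:
    """Render the provided sentence, colouring the words found in the word group red."""
    members = {word.lower() for word in word_group}
    rendered = []
    for word in provided.split():
        # Compare without trailing marks and letter case, but print the word as written.
        if word.strip(SENTENCE_MARKS).lower() in members:
            rendered.append(f"{RED}{word}{RESET}")
        else:
            rendered.append(word)
    return " ".join(rendered)
```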

The pronunciation accuracy evaluating step may include: calculating the pronunciation accuracy based on the number of words identified in the identifying step and the number of words constituting the provided sentence; and providing the pronunciation accuracy.
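The accuracy calculation reduces to a ratio of two counts. A minimal sketch, with a hypothetical function name, that also guards against an empty sentence and an impossible count:

```python
def pronunciation_accuracy(identified: int, total: int) -> float:
    """Recognition rate in percent: identified words / words in the provided sentence * 100."""
    if total <= 0:
        raise ValueError("the provided sentence must contain at least one word")
    if not 0 <= identified <= total:
        raise ValueError("identified count must be between 0 and the total word count")
    return identified / total * 100
```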

The candidate sentence obtaining step may include: sending the user's voice obtained in the voice obtaining step to a server together with a speech recognition request; and receiving from the server, in response to the request, a plurality of candidate sentences extracted through speech recognition.
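The request/response exchange with the server can be modeled as below. The source does not specify the server's protocol, so `SpeechRecognizer`, `recognize`, `FakeRecognitionServer`, and `obtain_candidates` are all hypothetical; the fake server simply returns canned candidates so the flow can be exercised without a network.

```python
from typing import Protocol


class SpeechRecognizer(Protocol):
    """Hypothetical interface for the speech recognition server."""

    def recognize(self, audio: bytes) -> list[str]:
        """Return candidate sentences for the audio, most similar first."""
        ...


class FakeRecognitionServer:
    """Stand-in server that returns canned candidates, for local testing only."""

    def __init__(self, candidates: list[str]) -> None:
        self._candidates = candidates

    def recognize(self, audio: bytes) -> list[str]:
        if not audio:
            raise ValueError("no voice data to recognise")
        return list(self._candidates)


def obtain_candidates(audio: bytes, server: SpeechRecognizer) -> list[str]:
    """Send the captured voice with a recognition request; return the candidates received."""
    candidates = server.recognize(audio)
    if not candidates:
        raise RuntimeError("recognition server returned no candidate sentences")
    return candidates
```

Hiding the server behind an interface also suits the on-device variant mentioned near the end of the description, where recognition runs on the smartphone itself.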

According to another embodiment of the present invention, an electronic device includes: a microphone that acquires a user's voice; and a processor that evaluates the user's pronunciation using a plurality of candidate sentences extracted as the speech recognition result for the voice acquired through the microphone.

The processor may separate the plurality of candidate sentences into words, generate a word group by merging the separated words while removing duplicates, and then evaluate the user's pronunciation accuracy by identifying which words of the sentence are included in the word group.

According to yet another embodiment of the present invention, a computer-readable recording medium records a program capable of performing a pronunciation evaluation method that includes: obtaining a user's voice; obtaining a plurality of candidate sentences extracted as the speech recognition result for the obtained voice; and evaluating the user's pronunciation using the candidate sentences.

As described above, according to the present invention, pronunciation accuracy can be evaluated using speech recognition, so that the user receives an immediate and objective evaluation of his or her pronunciation.

Further, because pronunciation accuracy is calculated from sentence-level speech recognition combined with a word-level matching ratio, the evaluation is optimized for whole sentences while word-level comparison performance is improved, enabling reliable calculation of pronunciation accuracy.

In addition, because the evaluation result is presented word by word within the sentence, the user can see which words in the sentence were pronounced inaccurately and can concentrate practice on those words.

Moreover, pronunciation accuracy can be calculated fairly precisely regardless of letter case, even for sentences containing punctuation marks such as periods, question marks, and exclamation marks.

FIG. 1 shows the main screen of an English pronunciation evaluation application using speech recognition;
FIG. 2 shows the E-Study menu execution screen;
FIG. 3 shows the word list screen provided when Unit-1 is selected from the units shown in FIG. 2;
FIG. 4 shows the word learning screen provided when "2. to death" is selected from the words shown in FIG. 3;
FIG. 5 shows a sentence learning screen;
FIG. 6 shows the sentence recording screen displayed when the Speak button shown in FIG. 4 is selected;
FIG. 7 shows a sentence learning screen displaying the accuracy of the recorded user's pronunciation of the sentence;
FIG. 8 is a flowchart illustrating a sentence pronunciation evaluation method according to a preferred embodiment of the present invention; and
FIG. 9 is a block diagram of a smartphone according to a preferred embodiment of the present invention.

Hereinafter, the present invention will be described in more detail with reference to the drawings.

FIG. 1 shows the main screen of an English pronunciation evaluation application using speech recognition. This main screen is the user interface screen that first appears on the display when the English pronunciation evaluation application (hereinafter abbreviated as "the application") is executed.

As shown in FIG. 1, the main screen of the application displays four menus: E-Study, 뻔뻔 English, My Note, and Tutorial.

1) The "E-Study" menu is for studying content stored in a DB (database);

2) The "뻔뻔 English" menu is for retrieving and studying e-mails received for English learning;

3) The "My Note" menu is for separately re-studying only the words registered in the "E-Study" menu; and

4) The "Tutorial" menu provides instructions on how to use the application.

FIG. 2 shows the screen displayed when the "E-Study" menu is executed. As shown at the top of FIG. 2, "E-Study" is organized so that the user selects one of three parts.

The parts are bundles of conversations grouped by theme: Part-1 contains conversations about actions, Part-2 conversations about emotions, and Part-3 conversations about everyday life. Each part consists of word learning, sentence learning, and conversation learning.

FIG. 2 shows the "E-Study" menu screen displaying 12 selectable units for word learning in Part-1 (Actions).

FIG. 3 shows the word list screen provided when "Unit-1" is selected from the units shown in FIG. 2, and FIG. 4 shows the word learning screen provided when "2. to death" is selected from the words shown in FIG. 3.

As shown in FIG. 4, the word learning screen displays the word and its meaning in the center, a "Previous" button for calling up the previous word on the left, and a "Next" button for calling up the next word on the right.

In the lower center of the word learning screen are 1) a "Listen" button, 2) a "Speak" button, 3) a "Flash" button, 4) a "My Note" button, 5) a "Goal Setting" button, and 6) an "Interpretation" button.

1) The "Listen" button outputs the pronunciation of the word so that the user can hear it, and 2) the "Speak" button is selected when the user wants to measure the accuracy of his or her own pronunciation of the word.

3) The "Flash" button causes the word and its meaning to be displayed alternately, and 4) the "My Note" button registers the current word for separate re-study later; only words registered in this way can be re-studied through the "My Note" menu described above.

5) The "Goal Setting" button sets the target pronunciation accuracy for "Speak". The target is set by toggling: each press of the button cycles it through 'Not set → 30% → 50% → 70% → 90% → Not set'.
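The toggle behaviour of the Goal Setting button amounts to a small cyclic state machine. In this sketch `None` stands for "Not set"; `GOAL_CYCLE` and `next_goal` are hypothetical names.

```python
from typing import Optional

# Targets in the order the button cycles through them; None means "Not set".
GOAL_CYCLE: list[Optional[int]] = [None, 30, 50, 70, 90]


def next_goal(current: Optional[int]) -> Optional[int]:
    """Return the target accuracy (percent) after one press of the Goal Setting button."""
    index = GOAL_CYCLE.index(current)
    # Wrap from 90% back to "Not set".
    return GOAL_CYCLE[(index + 1) % len(GOAL_CYCLE)]
```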

6) The "Interpretation" button toggles whether the meaning of the word is displayed together with the word.

FIG. 5 shows a sentence learning screen. As shown in FIG. 5, the sentence learning screen displays the sentence and its meaning in the center, with a "Previous" button for calling up the previous sentence and a "Next" button for calling up the next sentence in the lower center.

In the lower center of the sentence learning screen are 1) a "Listen" button, 2) a "Speak" button, 3) a "Record" button, 4) a "Play" button, 5) a "Goal Setting" button, and 6) an "Interpretation" button.

1) The "Listen" button outputs the pronunciation of the sentence so that the user can hear it, and 2) the "Speak" button is selected when the user wants to measure the accuracy of his or her own pronunciation of the sentence.

3) The "Record" button records the user's pronunciation of the sentence, and 4) the "Play" button plays back what was recorded with "Record". When nothing has been recorded, the "Play" button is deactivated, as shown in FIG. 5.

5) The "Goal Setting" button sets the target pronunciation accuracy for "Speak", and 6) the "Interpretation" button toggles whether the meaning of the sentence is displayed together with the sentence.

FIG. 6 shows the sentence recording screen that appears when the "Speak" button shown in FIG. 4 is selected. This screen is displayed while the user's pronunciation of the sentence is being recorded so that its accuracy can be measured.

FIG. 7 shows a sentence learning screen in which the accuracy of the recorded user's pronunciation of the sentence is displayed as the evaluation result. As shown in FIG. 7, some words of the sentence have turned red: these are the words the user pronounced correctly. The words that remain black are those the user did not pronounce correctly.

FIG. 7 also shows "Recognition rate: 58%", a value that indicates the pronunciation accuracy. The recognition rate is calculated as [(number of correctly pronounced words) / (total number of words in the sentence) × 100].

도 7에 도시되지 않았지만, 인식률(58%)이 목표(30%)를 초과한 경우, 문장에 동그라미가 표시된다.Although not shown in FIG. 7, when the recognition rate (58%) exceeds the target (30%), a circle is displayed in the sentence.
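The goal check can be sketched as a single predicate, again using `None` for an unset goal (a hypothetical convention, as is the name `shows_circle`). The description says the rate must exceed the target, so this sketch treats a rate equal to the target as not reaching it; that reading of the boundary case is an assumption.

```python
from typing import Optional


def shows_circle(recognition_rate: float, goal: Optional[int]) -> bool:
    """True when a goal is set and the recognition rate (percent) is strictly above it."""
    return goal is not None and recognition_rate > goal
```

With the figures from FIG. 7, `shows_circle(58, 30)` is true, while the same rate against a 70% goal, or with no goal set, shows no circle.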

The process of evaluating sentence pronunciation is described in detail below with reference to FIG. 8, a flowchart illustrating a sentence pronunciation evaluation method according to a preferred embodiment of the present invention. The flowchart of FIG. 8 can be understood as an algorithm executed by the smartphone application.

As shown in FIG. 8, the application running on the smartphone first displays on the touch screen a sentence learning screen showing the sentence to be studied, thereby providing it to the user (S105).

When the "Speak" button is selected on the sentence learning screen (S110-Y), the application displays a sentence recording screen such as that shown in FIG. 6 on the touch screen and, through the microphone, acquires the voice of the user reading the sentence provided in step S105 (S115).

The application then sends the user's voice acquired in step S115 to the speech recognition server together with a speech recognition request (S120).

The speech recognition server performs sentence-level speech recognition on the user's voice received from the smartphone in step S120 and extracts 10 candidate sentences. The 10 candidate sentences can be extracted in descending order of similarity, and the server returns them to the smartphone as the response to the speech recognition request.
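How the server scores and ranks its hypotheses is not described in the source. Assuming the recognizer produces (sentence, similarity score) pairs, keeping the ten most similar in descending order could look like the sketch below; `top_candidates` and any scores fed to it are hypothetical.

```python
import heapq


def top_candidates(hypotheses: list[tuple[str, float]], n: int = 10) -> list[str]:
    """Keep the n hypotheses with the highest similarity score, most similar first."""
    best = heapq.nlargest(n, hypotheses, key=lambda pair: pair[1])
    return [sentence for sentence, _score in best]
```

`heapq.nlargest` avoids sorting the full hypothesis list when the recognizer returns many more than ten alternatives.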

The smartphone application accordingly receives the 10 candidate sentences extracted by the speech recognition server (S125) and removes the punctuation marks from them (S130). Punctuation marks are the characters other than the words that make up a sentence, such as exclamation marks (!), question marks (?), commas (,), and periods (.).

Next, the application separates the 10 punctuation-free candidate sentences into words (S135) and merges the separated words while removing duplicates, generating a word group (S140).

For example, if the 10 candidate sentences received from the speech recognition server are "It's my life.", "It is my knife.", ..., "It's my lie.", the word group generated in step S140 is [It's, my, life, It, is, knife, lie].
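Applying steps S130 to S140 to the three candidates actually written out in the example (the remaining seven are elided in the source) reproduces the stated word group. This sketch assumes the same four sentence marks listed for S130 and uses `dict.fromkeys` for order-preserving de-duplication; `word_group` is a hypothetical name.

```python
# S130: translation table deleting the listed sentence marks (apostrophes are kept).
STRIP_MARKS = str.maketrans("", "", "!?,.")


def word_group(candidates: list[str]) -> list[str]:
    """S130-S140: strip marks, split into words, merge while removing duplicates."""
    words = (
        word
        for sentence in candidates
        for word in sentence.translate(STRIP_MARKS).split()
    )
    # dict keys keep insertion order (Python 3.7+), so this de-duplicates in first-seen order.
    return list(dict.fromkeys(words))


print(word_group(["It's my life.", "It is my knife.", "It's my lie."]))
# ["It's", 'my', 'life', 'It', 'is', 'knife', 'lie']
```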

Once the word group has been generated, the application identifies, among the words constituting the sentence provided in step S105, those included in the word group generated in step S140 (S145).

Among the words constituting the sentence, those identified in step S145 as included in the word group are classified as correctly pronounced words, and the remaining words are classified as incorrectly pronounced words.

For example, if the sentence provided in step S105 is "He is my knight." and the word group generated in step S140 is [It's, my, life, It, is, knife, lie], then "is" and "my" are classified as correctly pronounced because they are included in the word group, whereas "He" and "knight" are classified as incorrectly pronounced because they are not.

When words are identified in step S145, the distinction between uppercase and lowercase letters is ignored. For example, "It" and "it" are treated as the same word.
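The classification of step S145, including the case-insensitive comparison, can be sketched as follows. Stripping trailing marks from the provided sentence's words (so that "knight." is compared as "knight") is an assumption the source does not state explicitly; `classify` is a hypothetical name.

```python
def classify(provided: str, word_group: list[str]) -> tuple[list[str], list[str]]:
    """S145: split the provided sentence's words into (correct, incorrect), ignoring case."""
    members = {word.lower() for word in word_group}
    correct: list[str] = []
    incorrect: list[str] = []
    for word in provided.split():
        # Assumption: trailing sentence marks on the provided sentence are ignored as well.
        if word.strip("!?,.").lower() in members:
            correct.append(word)
        else:
            incorrect.append(word)
    return correct, incorrect
```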

The application then displays in red the words of the sentence identified in step S145 as included in the word group (S150). The application also calculates the accuracy of the sentence pronunciation and displays it on the touch screen to inform the user (S155).

In step S155, the application can calculate the pronunciation accuracy (recognition rate) as the ratio of the number of words identified in step S145 as included in the word group (the number of correctly pronounced words) to the number of words constituting the sentence (the total number of words in the sentence). The results of steps S150 and S155 are illustrated in FIG. 7.

For example, if the sentence provided in step S105 is "He is my knight." and the word group generated in step S140 is [It's, my, life, It, is, knife, lie], the sentence consists of 4 words, 2 of which are identified as included in the word group, so the pronunciation accuracy is 50%.
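Written out with the recognition-rate formula given for FIG. 7, where N_identified is the number of words found in the word group and N_total is the number of words in the provided sentence, the example works out as:

```latex
\[
\text{accuracy} = \frac{N_{\text{identified}}}{N_{\text{total}}} \times 100\%
                = \frac{2}{4} \times 100\% = 50\%
\]
```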

FIG. 9 is a block diagram of a smartphone according to a preferred embodiment of the present invention, on which an application capable of performing the sentence pronunciation evaluation method of FIG. 8 is installed and executed. As shown in FIG. 9, the smartphone 200 includes a communication unit 210, a touch screen 220, a processor 230, a speaker 240, a microphone 250, and a storage unit 260.

The communication unit 210 establishes and maintains communication connections with base stations, as well as with access points (APs) and peripheral devices, through mobile communication and wireless networking. In particular, the communication unit 210 establishes communication connections with the application server and the speech recognition server.

The touch screen 220 functions both as a display that outputs the application execution screens and as a user input means that receives user commands and passes them to the processor 230.

The speaker 240 is an audio output means that outputs word pronunciations, sentence pronunciations, and the user's recorded pronunciation.

The microphone 250 converts the user's spoken sound into an electrical signal to acquire the user's voice, and passes the acquired voice to the processor 230.

The storage unit 260 is a storage medium that stores the programs and data the smartphone needs; the application described above is installed in it.

The processor 230 controls the overall operation of the smartphone according to user commands input through the touch screen 220. In particular, the processor 230 executes the application installed in the storage unit 260 to perform the sentence pronunciation evaluation shown in FIG. 8.

The sentence pronunciation evaluation method using speech recognition, and a smartphone on which an application capable of performing the method is installed and executed, have been described in detail above.

The above embodiment assumes English pronunciation evaluation, but this is merely an example chosen for convenience of explanation; the present invention can also be applied to evaluating pronunciation in languages other than English.

Likewise, although the speech recognition server was described as extracting 10 candidate sentences, this too is only an example; the number of candidate sentences can be changed as needed and according to the server's DB and specifications.

Speech recognition may also be performed on the smartphone itself rather than on the speech recognition server. In that case the smartphone need not transmit the user's voice to the speech recognition server, but it must be equipped with the algorithm and DB required for speech recognition.

In the above embodiment, a smartphone was given as the device on which the application for sentence pronunciation evaluation using speech recognition is installed and executed, but the smartphone is merely one example of such an electronic device. The technical idea of the present invention is of course also applicable to electronic devices other than smartphones.

The technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiment. The technical idea according to the various embodiments of the present invention may also be implemented as computer-readable code recorded on a computer-readable recording medium. A computer-readable recording medium may be any data storage device that can be read by a computer and can store data, such as a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, or hard disk drive. Computer-readable code or programs stored on such a medium may also be transmitted over a network connecting computers.

While preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or prospect of the present invention.

200: smartphone
210: communication unit
220: touch screen
230: processor
240: speaker
250: microphone
260: storage unit

Claims

A pronunciation evaluation method, comprising:
providing a sentence;
acquiring the voice of a user reading the sentence provided in the providing step;
acquiring a plurality of candidate sentences extracted, in descending order of similarity, through sentence-level speech recognition of the user's voice;
separating each of the plurality of candidate sentences into words;
generating a word group by merging the words separated in the separating step and removing duplicates; and
evaluating the user's pronunciation by comparing the words constituting the provided sentence with the words included in the word group,
wherein the evaluating step comprises:
identifying, individually and regardless of word order, which of the words constituting the sentence are included in the word group; and
evaluating the user's pronunciation accuracy based on the number of words identified in the identifying step,
wherein the plurality of candidate sentences include a first candidate sentence and a second candidate sentence,
wherein the separating step comprises:
a first separating step of separating the first candidate sentence into words; and
a second separating step of separating the second candidate sentence into words, and
wherein, in the generating step, the word group is generated by combining the words separated from the first candidate sentence in the first separating step with the words separated from the second candidate sentence in the second separating step.
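The word-group comparison recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, words are separated on whitespace, and matching is case-insensitive. A Python `set` both merges the words of all candidate sentences and removes duplicates, and a membership test ignores word order.

```python
def split_words(sentence: str) -> list[str]:
    """Separate a sentence into words (assumed: whitespace split, lowercased)."""
    return sentence.lower().split()


def build_word_group(candidates: list[str]) -> set[str]:
    """Merge the words of every candidate sentence; the set removes duplicates."""
    word_group: set[str] = set()
    for candidate in candidates:
        word_group.update(split_words(candidate))
    return word_group


def identify_words(sentence: str, word_group: set[str]) -> list[str]:
    """Return each word of the provided sentence that appears anywhere in the word group."""
    return [word for word in split_words(sentence) if word in word_group]


if __name__ == "__main__":
    candidates = ["I would like a cop of copy", "I wood like a cop of coffee"]
    group = build_word_group(candidates)
    found = identify_words("I would like a cup of coffee", group)
    print(len(found), found)  # prints: 6 ['i', 'would', 'like', 'a', 'of', 'coffee']
```

In this example, "would" is recovered only from the first candidate and "coffee" only from the second, so merging the candidates identifies 6 of the 7 words; only "cup" is missed.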

(Deleted)

The method according to claim 1,
further comprising removing punctuation marks from the plurality of candidate sentences,
wherein the separating step separates the plurality of candidate sentences from which the punctuation marks have been removed into words.
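Claim 3 adds a punctuation-removal step before word separation. The following is a minimal Python sketch. It assumes that "punctuation marks" means the ASCII characters in `string.punctuation` and that words are compared in lowercase; the claim specifies neither detail, and the function names are hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(sentence: str) -> str:
    """Drop punctuation marks so that "coffee." and "coffee" compare equal."""
    return sentence.translate(_PUNCTUATION_TABLE)


def separate_candidates(candidates: list[str]) -> list[list[str]]:
    """Remove punctuation from each candidate sentence, then split it into lowercased words."""
    return [remove_punctuation(candidate).lower().split() for candidate in candidates]


if __name__ == "__main__":
    print(separate_candidates(["I'd like coffee, please.", "Id like coffee please"]))
```

This table also strips apostrophes, so "I'd" becomes "id". The provided sentence would therefore need the same normalization before comparison, which is likewise an assumption of this sketch.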

The method according to claim 1,
further comprising displaying, in the sentence provided in the providing step, the words identified in the identifying step in a color different from that of the unidentified words.
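Claim 4 displays identified words in a different color. The sketch below renders the provided sentence as HTML `<span>` elements. The output format, the function name, and the choice of blue for identified words and red for unidentified words are illustrative assumptions, because the claim only requires that the two colors differ.

```python
import html


def colorize_sentence(
    sentence_words: list[str],
    identified_words: set[str],
    identified_color: str = "blue",
    unidentified_color: str = "red",
) -> str:
    """Wrap each word in a colored <span>; identified_words is assumed to hold lowercased words."""
    spans = []
    for word in sentence_words:
        color = identified_color if word.lower() in identified_words else unidentified_color
        # Escape the word so characters such as "<" cannot break the generated markup.
        spans.append(f'<span style="color:{color}">{html.escape(word)}</span>')
    return " ".join(spans)


if __name__ == "__main__":
    print(colorize_sentence(["I", "like", "cup"], {"i", "like"}))
```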

The method according to claim 1,
wherein the step of evaluating the pronunciation accuracy comprises:
calculating a pronunciation accuracy based on the number of words identified in the identifying step and the number of words constituting the provided sentence; and
providing the calculated pronunciation accuracy.
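Claim 5 computes the accuracy from two counts but does not fix a formula. This sketch assumes one natural choice: the percentage of the provided sentence's words that were identified. The function name is hypothetical.

```python
def pronunciation_accuracy(num_identified: int, num_sentence_words: int) -> float:
    """Percentage of the provided sentence's words identified in the word group (assumed formula)."""
    if num_sentence_words <= 0:
        raise ValueError("the provided sentence must contain at least one word")
    if not 0 <= num_identified <= num_sentence_words:
        raise ValueError("identified words must be between 0 and the sentence length")
    return 100.0 * num_identified / num_sentence_words


if __name__ == "__main__":
    # 6 of 7 words identified, e.g. when only "cup" in "I would like a cup of coffee" was missed.
    print(f"{pronunciation_accuracy(6, 7):.1f}%")  # prints: 85.7%
```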

The method according to claim 1,
wherein the step of acquiring the candidate sentences comprises:
requesting speech recognition by transmitting the user's voice acquired in the voice acquiring step to a server; and
receiving, from the server in response to the speech recognition request, the plurality of candidate sentences extracted through speech recognition.
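Claim 6 moves speech recognition to a server but does not specify a transport or message format. The JSON request and response layout below is entirely hypothetical: base64-encoded audio in the request, and a `candidates` list in the response that is assumed to be sorted by descending similarity. All function names are also hypothetical. The network call is abstracted as a `send` function so that the sketch runs without a real server.

```python
import base64
import json
from typing import Callable


def build_request(voice: bytes, sample_rate_hz: int) -> str:
    """Package the recorded voice as a JSON speech-recognition request (hypothetical format)."""
    return json.dumps({
        "audio": base64.b64encode(voice).decode("ascii"),
        "sample_rate_hz": sample_rate_hz,
    })


def parse_response(body: str) -> list[str]:
    """Extract candidate sentences, assumed to arrive sorted by descending similarity."""
    return [candidate["text"] for candidate in json.loads(body)["candidates"]]


def acquire_candidates(
    voice: bytes, send: Callable[[str], str], sample_rate_hz: int = 16000
) -> list[str]:
    """Send the recognition request and return the server's candidate sentences."""
    return parse_response(send(build_request(voice, sample_rate_hz)))


def fake_server(request_body: str) -> str:
    """Stand-in for the network round trip: checks the request shape and ignores the audio."""
    request = json.loads(request_body)
    if "audio" not in request:
        raise ValueError("request is missing audio")
    return json.dumps({"candidates": [
        {"text": "I would like a cop of copy"},
        {"text": "I wood like a cop of coffee"},
    ]})


if __name__ == "__main__":
    print(acquire_candidates(b"\x00\x01", fake_server))
```

In a real client, `send` would wrap a request to the recognition server. Keeping it injectable lets the request-building and parsing logic be tested offline.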

An electronic device, comprising:
a microphone configured to acquire the voice of a user reading a provided sentence; and
a processor configured to evaluate the user's pronunciation using a plurality of candidate sentences extracted, in descending order of similarity, through sentence-level speech recognition of the user's voice acquired through the microphone,
wherein the processor separates each of the plurality of candidate sentences into words, generates a word group by merging the separated words and removing duplicates, evaluates the user's pronunciation by comparing the words constituting the sentence with the words included in the word group, individually identifies, regardless of word order, which of the words constituting the sentence are included in the word group, and evaluates the user's pronunciation accuracy based on the number of identified words,
wherein the plurality of candidate sentences include a first candidate sentence and a second candidate sentence, and
wherein the processor separates the first candidate sentence into words, separates the second candidate sentence into words, and generates the word group by combining the words separated from the first candidate sentence with the words separated from the second candidate sentence.

(Deleted)

A computer-readable recording medium having recorded thereon a program for performing a pronunciation evaluation method, the method comprising:
providing a sentence;
acquiring the voice of a user reading the sentence provided in the providing step;
acquiring a plurality of candidate sentences extracted, in descending order of similarity, through sentence-level speech recognition of the user's voice;
separating each of the plurality of candidate sentences into words;
generating a word group by merging the words separated in the separating step and removing duplicates; and
evaluating the user's pronunciation by comparing the words constituting the provided sentence with the words included in the word group,
wherein the evaluating step comprises:
identifying, individually and regardless of word order, which of the words constituting the sentence are included in the word group; and
evaluating the user's pronunciation accuracy based on the number of words identified in the identifying step,
wherein the plurality of candidate sentences include a first candidate sentence and a second candidate sentence,
wherein the separating step comprises:
a first separating step of separating the first candidate sentence into words; and
a second separating step of separating the second candidate sentence into words, and
wherein, in the generating step, the word group is generated by combining the words separated from the first candidate sentence in the first separating step with the words separated from the second candidate sentence in the second separating step.