KR20200044285A

KR20200044285A - Apparatus for Learning Service Using Speech Recognition and Driving Method Thereof

Info

Publication number: KR20200044285A
Application number: KR1020180124820A
Authority: KR
Inventors: 신종우
Original assignee: 신한대학교 산학협력단
Priority date: 2018-10-19
Filing date: 2018-10-19
Publication date: 2020-04-29
Also published as: KR102233155B1

Abstract

The present invention relates to a learning service apparatus that uses voice recognition technology to increase learning efficiency, and to a driving method thereof. According to an embodiment of the present invention, the apparatus may comprise: a storage unit that stores evaluation data related to the pronunciation and rhythm of foreign words and of foreign sentences including those words; and a control unit. The control unit compares a voice recognition result for the pronunciation of a user, who reads the foreign words or sentences provided to the user's device, with the previously stored evaluation data and, according to the matching rate of the pronunciation, informs the user device of the inaccurate section of the pronunciation and requests that the reading be repeated. Alternatively, the control unit provides the user device with a sign related to the rhythm of the foreign sentence, compares a voice recognition result for the user's rhythm based on the provided sign with the previously stored evaluation data, and identifies a matching rate of the rhythm.

Description

Apparatus for Learning Service Using Speech Recognition and Driving Method Thereof

The present invention relates to a learning service apparatus using voice recognition technology and a driving method thereof. More specifically, it relates to a learning service apparatus, and a driving method thereof, that uses voice recognition technology to determine the match rate of the pronunciation and rhythm of foreign-language words or sentences and accurately conveys to the user, through information displayed on the screen, whether they match, thereby increasing learning efficiency.

Conventional foreign-language learning generally consisted of repeating the foreign-language pronunciation output from a speaker. With this method alone, however, learners could not have their own pronunciation evaluated. For this reason, there has recently been a trend toward applying voice recognition technology, in which a computer converts an acoustic speech signal obtained through a sound sensor such as a microphone into words or sentences.

In voice recognition technology, how accurately a computer recognizes a human voice is usually treated as the core issue. Behind it, however, lies an equally important question: how accurately the person pronounced the word or sentence when uttering it.

Korean Patent Laid-Open Publication No. 10-2017-0041642 (2017.04.17.)
Korean Patent Laid-Open Publication No. 10-2004-0065593 (2004.07.23.)

An object of embodiments of the present invention is to provide a learning service apparatus using voice recognition technology, and a driving method thereof, that uses voice recognition technology to determine the match rate of the pronunciation and intonation of foreign-language words or sentences and accurately conveys to the user, through information displayed on the screen, whether they match, thereby increasing learning efficiency.

A learning service apparatus using voice recognition technology according to an embodiment of the present invention includes: a storage unit that stores evaluation data related to the pronunciation and rhythm of foreign-language words and of foreign-language sentences containing those words; and a control unit that either (i) compares a voice recognition result for the pronunciation of a user who has read a foreign-language word or sentence provided to the user's user device with the stored evaluation data and, according to the match rate of the pronunciation, notifies the user device of the inaccurate section of the pronunciation and requests that the reading be repeated, or (ii) provides the user device with a rhythm-related marker for the sentence, compares a voice recognition result for the user's rhythm based on the provided marker with the stored evaluation data, and checks the match rate of the rhythm.

The control unit may notify the user device of the inaccurate section when the match rate of the pronunciation is below a reference value (e.g., 100%), and may provide the rhythm-related marker to the user device when the match rate of the pronunciation is equal to or above the reference value.
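
The branching described here can be sketched as follows. This is only an illustrative sketch: the names `Feedback` and `next_step` and the action strings are hypothetical and do not appear in the source, and the 100% threshold is the example value the text itself gives.

```python
from dataclasses import dataclass
from typing import Any

# Reference value for the pronunciation match rate; the source gives 100% as an example.
PRONUNCIATION_THRESHOLD = 100.0


@dataclass
class Feedback:
    """What the control unit sends back to the user device (hypothetical structure)."""
    action: str   # "repeat_reading" or "show_rhythm_marker"
    payload: Any  # inaccurate sections, or the rhythm-related marker


def next_step(match_rate: float,
              inaccurate_sections: list,
              rhythm_marker: Any,
              threshold: float = PRONUNCIATION_THRESHOLD) -> Feedback:
    """Choose between a repeat request and the rhythm stage."""
    if match_rate < threshold:
        # Below the reference value: report the inaccurate sections and ask for a re-read.
        return Feedback("repeat_reading", inaccurate_sections)
    # At or above the reference value: move on to the rhythm-related marker.
    return Feedback("show_rhythm_marker", rhythm_marker)
```

A device-side handler could then switch on `Feedback.action` to decide which screen to render.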

The control unit may provide additional information indicating the match rate to the user device so that it is displayed on the screen.

For the repetition request, the control unit may cause the user device to output a voice, or may request the user device to display the marker of the inaccurate section on the screen adjusted according to the match rate.

The control unit may request the user device to display the rhythm-related marker on the screen in the form of a musical score.

In addition, a driving method of a learning service apparatus using voice recognition technology according to an embodiment of the present invention is a driving method of a learning service apparatus that includes a storage unit and a control unit, the method including: storing, by the storage unit, evaluation data related to the pronunciation and rhythm of foreign-language words and of foreign-language sentences containing those words; and, by the control unit, either (i) comparing a voice recognition result for the pronunciation of a user who has read a foreign-language word or sentence provided to the user's user device with the stored evaluation data and, according to the match rate of the pronunciation, notifying the user device of the inaccurate section of the pronunciation and requesting that the reading be repeated, or (ii) providing the user device with a rhythm-related marker for the sentence, comparing a voice recognition result for the user's rhythm based on the provided marker with the stored evaluation data, and checking the match rate of the rhythm.

The checking step may notify the user device of the inaccurate section when the match rate of the pronunciation is below the reference value, and may provide the rhythm-related marker to the user device when the match rate of the pronunciation is equal to or above the reference value.

The checking step may include providing additional information indicating the match rate to the user device so that it is displayed on the screen.

The checking step may include, for the repetition request, causing the user device to output a voice, or requesting the user device to display the marker of the inaccurate section on the screen adjusted according to the match rate.

The checking step may include requesting the user device to display the rhythm-related marker on the screen in the form of a musical score.

According to embodiments of the present invention, the user can easily check, through the match-related information displayed on the screen of the user device, how well the pronunciation and rhythm of the foreign-language words or sentences presented on the user device were reproduced, and can learn repeatedly while checking this information, so learning efficiency will increase considerably.

FIG. 1 is a diagram showing a learning service system according to an embodiment of the present invention;
FIG. 2 is an exemplary view of a first screen displayed on the user device of FIG. 1;
FIG. 3 is an exemplary view of a second screen displayed on the user device of FIG. 1;
FIG. 4 is a block diagram showing the detailed structure of the learning service device of FIG. 1;
FIG. 5 is a block diagram showing another structure of the learning service device of FIG. 1; and
FIG. 6 is a flowchart showing a learning service process according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram showing a learning service system according to an embodiment of the present invention, FIG. 2 is an exemplary view of a first screen displayed on the user device of FIG. 1, and FIG. 3 is an exemplary view of a second screen displayed on the user device of FIG. 1.

As shown in FIG. 1, the learning service system 90 according to an embodiment of the present invention is a learning service system that performs foreign-language learning using voice recognition technology, and includes some or all of a user device 100, a communication network 110, and a learning service device 120.

Here, "includes some or all of" means, for example, that some components such as the communication network 110 may be omitted so that the user device 100 and the learning service device 120 communicate directly (e.g., P2P), or that the learning service device 120 may be integrated into a network device (e.g., a wireless switching device) within the communication network 110. To aid a full understanding of the invention, the system is described as including all of the components.

The user device 100 includes various devices for using the foreign-language learning service according to an embodiment of the present invention. The user device 100 can access the learning service when the learner clicks a search address in the search window or an icon on the desktop, which runs the application. For example, the user device 100 may include mobile terminal devices such as an MP3 player, a portable multimedia player (PMP), a smartphone, and a laptop computer; a desktop computer; a smart TV; and wearable devices such as the Gal*xy Gear of a domestic company S. The user device 100 also includes a voice acquisition unit, such as a microphone, for acquiring the voice of the user, that is, the learner performing foreign-language learning. Such a voice acquisition unit may also operate by being connected to a connector of the user device 100, for example through a jack, so embodiments of the present invention are not limited to any one form.

The user device 100 according to an embodiment of the present invention may include only a microphone for acquiring the learner's voice; a device combining earphones or headphones with a microphone is also acceptable. In that case, for example, the MP3 player acquires only the learner's voice, while a nearby smartphone, laptop computer, or smart TV receives the foreign-language words, or sentences containing those words, provided by the learning service device 120 and displays them on its screen. The peripheral device may receive the relevant data directly, but if the user device 100 has a mirroring function, the peripheral device may receive the data indirectly through the user device 100. Also, because smartphones and laptop computers are expensive devices that are likely to include a microphone, that microphone may be used instead.

From this point of view, the user device 100 according to an embodiment of the present invention may include, for example, an artificial intelligence (AI) speaker installed near a laptop computer or smart TV. In other words, the foreign-language words or sentences provided by the learning service device 120 are viewed on the screen of a peripheral device such as a laptop computer, while the AI speaker handles acquisition of the user's voice and voice output. Data processing may be handled in the various ways already described above.

As described above, the user device 100 according to an embodiment of the present invention may include various devices and may be used independently or in conjunction with peripheral devices. An expensive device such as a smartphone can be used on its own to access the service, whereas an MP3 player or AI speaker can be used in conjunction with a nearby smart TV or the like. Without departing from the technical idea of the present invention, it is also possible, for example, to operate the built-in camera of a smartphone or smart TV to photograph the shape of the learner's mouth and infer the learner's articulation by analyzing the captured images. Used alongside voice recognition to improve accuracy, this would allow pronunciation, intonation, or rhythm to be trained and corrected more precisely. To distinguish the two terms: intonation simply refers to the rise and fall of pitch in a word or sentence, whereas rhythm can be understood as the prosody that the intonation forms across the sentence.

The user device 100 according to the present invention generates the voice acquired through the microphone in the form of a voice signal; that is, it converts sound into an electrical signal, and it can analyze the generated voice signal to recognize the content of the speech. The voice signal is, for example, an analog signal, and the user device 100 can determine the content of the acquired speech by analyzing the waveform of the signal. The analysis result of the voice signal may be a set of binary bit information. Such a set may represent pronunciation information or intonation information, but it may also represent specified text, that is, foreign-language text. In other words, the text is found from the analysis result, and the pronunciation or intonation information (data) matched to that text is extracted and used. For example, if the pronunciation of "English language" in FIG. 2 has been set to the 4-bit binary information "1111" on the basis of its phonetic symbols, the signal analysis result is likewise compared in 4-bit units, and the pronunciations are judged to match when the analysis yields the 4-bit binary information "1111". This principle is obvious to those skilled in the art, so further description is omitted.
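
The 4-bit comparison in this example can be illustrated with a short sketch. The source does not specify how bit strings are encoded or compared beyond the "1111" example, so the string representation and the function names `split_units` and `units_match` are assumptions made for illustration only.

```python
def split_units(bits: str, unit: int = 4) -> list:
    """Split a bit string into fixed-size units (4 bits in the source's example)."""
    if len(bits) % unit != 0:
        raise ValueError("bit string length must be a multiple of the unit size")
    return [bits[i:i + unit] for i in range(0, len(bits), unit)]


def units_match(reference: str, recognized: str, unit: int = 4) -> bool:
    """Return True when every unit of the recognized result equals the stored reference."""
    ref_units = split_units(reference, unit)
    rec_units = split_units(recognized, unit)
    return len(ref_units) == len(rec_units) and all(
        a == b for a, b in zip(ref_units, rec_units)
    )
```

With the values from the text, `units_match("1111", "1111")` is `True`, while any differing unit makes the comparison fail.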

The recognition operation described above may be performed by a recognition module, which may be implemented as hardware (H/W), software (S/W), or a combination of the two. The recognition module may be installed as hardware or software (e.g., an application) on a user device 100 such as a smartphone, but to increase recognition accuracy it may instead be included in the learning service device 120, which is more expensive equipment. Both devices may also include recognition modules and work together. For example, a user device 100 such as a smartphone acquires the learner's voice and checks (or analyzes) its complexity. If the utterance is a simple word or sentence of low complexity, the device recognizes it with its own recognition module and provides the recognition result to the learning service device 120; if the complexity is high, it asks the learning service device 120 to perform the recognition. Such choices can be changed freely according to the system designer's intent, so the invention is not limited to any one approach.
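
The complexity-based split between the two recognition modules might look like the sketch below. The source does not say how complexity is measured; here it is assumed, purely for illustration, to be the word count of the prompted text, and the function name `choose_recognizer` and the limit of three words are hypothetical.

```python
def choose_recognizer(prompt_text: str, max_local_words: int = 3) -> str:
    """Decide where recognition runs, using word count as a stand-in for complexity.

    Returns "device" for simple prompts (recognized on the user device, with the
    result sent to the learning service device) and "server" for complex ones
    (recognition delegated to the learning service device).
    """
    word_count = len(prompt_text.split())
    return "device" if word_count <= max_local_words else "server"
```

Any other complexity measure could be substituted without changing the routing logic, which matches the text's remark that the choice is left to the system designer.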

In any of the ways described above, the user device 100 accesses the foreign-language service provided by the learning service device 120. The service may take various forms depending on the design of the program's user experience (UX) and user interface (UI), but in embodiments of the present invention its aim is to train the learner to pronounce foreign-language words and sentences accurately and to produce the intonation, or more precisely the rhythm of the sentence, at a native-speaker level. Any UX/UI that satisfies this aim is acceptable. In other words, the embodiments focus not on improving the accuracy of voice recognition itself, as in the prior art, but on training the user through repeated practice to pronounce accurately and to produce native-level intonation.

For example, the user device 100 may request the foreign-language service from the learning service device 120 and display a screen such as that of FIG. 2, for instance as a beginner course. The screen configuration, that is, the layout, may take any form. The learner using the user device 100 can pronounce the foreign-language words or sentences displayed on the screen one after another. For example, when the learner reads "English language", the user device 100 acquires the voice of the learner reading the phrase 200 and provides the voice signal to the learning service device 120. For convenience, the following description assumes that the user device 100 acquires the voice and transmits it to the learning service device 120.

The learning service device 120 recognizes the received voice, compares it with the evaluation data on the pronunciation and intonation of the corresponding word or sentence stored in advance, for example in the DB 120a, measures the match rate, and returns the measurement result to the user device 100. The measurement result, that is, the match rate information 210 representing it, may be displayed in one area of the screen as shown in FIG. 2. The learning service device 120 also compares the pronunciation recognition result with the evaluation data to determine the section in which the pronunciation does not match. Once that section is determined, it provides the relevant information to the user device 100, which can display it near the phrase 200 as a first marker 220, for example as an underline as in FIG. 2. The shape or form of the marker may vary and is not limited to any one form.

FIG. 2 shows the screen of the user device 100 with a match rate of 10% and the first marker 220 displayed, for example as an underline, over the 80% section that does not match. Because the match rate falls short of the reference value, the learning service device 120 requests repeated practice. At the request of the learning service device 120, the user device 100 may play the correct pronunciation through its speaker and then have the user repeat it. The user device 100 may also display the phonetic symbols on the screen. When the match rate changes after the learner has been asked to repeat the played pronunciation, the user device 100 may adjust the span of the first marker 220 at the request of the learning service device 120, for example by shortening it, or by removing the first marker 220 when the match rate reaches 100%.
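
One way to picture how the first marker 220 shrinks as the match rate improves is the sketch below. It assumes, hypothetically, that the comparison yields one match flag per character of the displayed phrase; the source does not define the granularity, and the names `mismatched_spans` and `underline` are illustrative only.

```python
def mismatched_spans(flags: list) -> list:
    """Collapse per-character match flags into (start, end) spans of mismatches.

    `end` is exclusive, so a span covers flags[start:end].
    """
    spans = []
    start = None
    for i, matched in enumerate(flags):
        if not matched and start is None:
            start = i                  # a mismatched run begins
        elif matched and start is not None:
            spans.append((start, i))   # the run ended just before i
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def underline(text: str, spans: list) -> str:
    """Build the underline row that would be drawn beneath `text`."""
    marks = [" "] * len(text)
    for start, end in spans:
        for i in range(start, end):
            marks[i] = "_"
    return "".join(marks).rstrip()
```

When every flag is `True` (a 100% match), `mismatched_spans` returns an empty list and the underline row is empty, which corresponds to the marker disappearing.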

When, through this process, the match rate of the pronunciation satisfies the reference value, for example 100%, the learning service device 120 then tests, via the user device 100, the intonation of the sentence, or more precisely its rhythm. Here, rhythm means the prosody of the sentence formed by its intonation. As shown in FIG. 3, the rhythm may be indicated by dots of different sizes displayed above the foreign-language sentence as a second marker 300. A small dot means read short and a large dot means read long; alternatively, a large dot may mean read long and strong. The second marker 300 may take various forms, including a musical score with notes, in which case a quarter note would be pronounced longer than an eighth note. The second marker 300 may take any form as long as it accurately conveys the intonation of the sentence to the user. In addition, the user device 100 may present a third screen containing the word or phrase presented on the preceding first screen, but for a more accurate measurement of the user's intonation it may also present a sentence containing words or phrases presented at earlier stages.
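
The dot-based second marker 300 and a simple rhythm match rate can be sketched as follows. The per-syllable "short"/"long" labels, the dot glyphs, and the function names are assumptions for illustration; the source does not specify how rhythm is encoded or scored.

```python
SMALL_DOT = "·"  # read short
LARGE_DOT = "●"  # read long (or long and strong)


def rhythm_marker(lengths: list) -> str:
    """Render one dot per syllable: a large dot for "long", a small dot otherwise."""
    return " ".join(LARGE_DOT if length == "long" else SMALL_DOT for length in lengths)


def rhythm_match_rate(expected: list, observed: list) -> float:
    """Percentage of syllables whose observed length label equals the expected one."""
    if len(expected) != len(observed):
        raise ValueError("expected and observed patterns must have the same length")
    if not expected:
        return 0.0
    hits = sum(e == o for e, o in zip(expected, observed))
    return 100.0 * hits / len(expected)
```

A musical-score rendering, which the text also allows, would replace only `rhythm_marker`; the scoring function would stay the same.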

When the learner reads the sentence with reference to the intonation indicated by the second marker 300 displayed on the screen of the user device 100 as in FIG. 3, the user device 100 again acquires the learner's voice and provides it to the learning service device 120. The learning service device 120 analyzes the acquired voice (voice signal) to determine the match rate for the intonation of the sentence and, depending on the result, may ask the user to repeat the exercise. As with the first screen of FIG. 2, the second screen of FIG. 3 may also show additional information about the match rate, and this could be extended further, for example by marking the sections where the intonation does not match. Because such variations are possible, embodiments of the present invention are not limited to any one form.

As described above, the user device 100 according to an embodiment of the present invention first measures the pronunciation of a foreign-language word, phrase, or sentence, and during this process shows the match rate and the non-matching sections on the screen and additionally provides the correct pronunciation so that the user can pronounce accurately. Once the match rate of the pronunciation reaches the reference value, for example 100%, the learner is then trained by measuring intonation. For example, intonation information is presented on the screen in a form such as a musical score; when the learner reads the sentence accordingly, the voice is acquired and compared with the evaluation data stored in advance, for example in the DB 120a, to measure the match rate of the intonation. Through this process, users are trained repeatedly to produce accurate pronunciation and intonation in the foreign language.

The communication network 110 includes both wired and wireless networks. For example, a wired or wireless Internet network may be used as, or linked with, the communication network 110. Here, the wired network includes an Internet network such as a cable network or the public switched telephone network (PSTN), and the wireless network includes CDMA, WCDMA, GSM, Evolved Packet Core (EPC), Long Term Evolution (LTE), WiBro networks, and the like. The communication network 110 is not limited to these and may also be used as the access network of next-generation mobile communication systems to be implemented in the future, for example a cloud computing network in a cloud computing environment or a 5G network. When the communication network 110 is a wired network, a connection may be made to an exchange of a telephone office within the communication network 110; when it is a wireless network, data may be processed by connecting to an SGSN or GGSN (Gateway GPRS Support Node) operated by the carrier, or to various relay stations such as a BTS (Base Transceiver Station), NodeB, or e-NodeB.

The communication network 110 includes an access point (AP). The access point includes small base stations, such as femto or pico base stations, which are commonly installed inside buildings. Here, femto and pico base stations are distinguished, within the classification of small base stations, by the maximum number of user devices 100 and the like that can connect to them. The access point also includes a short-range communication module for performing short-range communication, such as ZigBee and Wi-Fi, with the user device 100. The access point may use TCP/IP or RTSP (Real-Time Streaming Protocol) for wireless communication. Here, in addition to Wi-Fi, short-range communication may be performed according to various standards such as Bluetooth, ZigBee, infrared (IrDA), radio frequency (RF) such as UHF (Ultra High Frequency) and VHF (Very High Frequency), and ultra-wideband (UWB) communication. Accordingly, the access point can extract the location of a data packet, designate the best communication path for the extracted location, and forward the data packet along the designated path to the next device, for example the learning service device 120. The access point can share multiple lines in a typical network environment and may include, for example, a router, a repeater, and a relay.

The learning service device 120 provides a foreign language learning service according to an embodiment of the present invention. To this end, it may first perform an operation of building evaluation data. Here, the evaluation data may be built by generating data on the pronunciation and intonation of the foreign language content to be provided, such as words, phrases, or sentences, and classifying and storing that data in the DB 120a. As mentioned above, such data may be stored in the form of binary bit information, and a fixed-format data structure in 32-bit or 64-bit units may be used. The learning service device 120 may, for example, analyze speech acquired from native speakers of the relevant foreign language (e.g., English, Japanese, Chinese, Spanish, or French) and use the result as reference evaluation data. Alternatively, if standard data provided by a particular country exists, that data may be used, and if third-party providers of foreign language content exist, their data may be used. Because the system already knows the word or sentence presented on the user device 100, the learning service device 120 can identify the comparison data, that is, the phonetic symbol information or intonation information pre-stored in binary bit form for that word or sentence. Accordingly, when the speech of a learner reading the word or sentence is acquired, the match rate can be determined by comparing this comparison data with the speech recognition result, that is, the analysis result, obtained from that speech. For example, if the intonation of the sentence presented in FIG. 3 has been generated as binary 32-bit information, the match rate can be determined by generating the analysis result, which is the speech recognition result of the acquired speech, as 32-bit information and comparing the two.
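One way to read the 32-bit comparison above is as a bit-by-bit comparison of two fixed-width values. The description does not say how the two values are compared, so the bitwise (Hamming-style) measure below is an assumption, and the function name `bit_match_rate` is hypothetical.

```python
def bit_match_rate(reference: int, observed: int, width: int = 32) -> float:
    """Fraction of bit positions (out of `width`) where the two values agree.

    Assumption: the match rate is the share of identical bits; the source
    only states that two 32-bit values are compared.
    """
    mask = (1 << width) - 1
    # XOR leaves a 1 in every position where the two values differ.
    differing = (reference ^ observed) & mask
    matching_bits = width - bin(differing).count("1")
    return matching_bits / width
```

For instance, two identical 32-bit values give a rate of 1.0, while values that differ in half of their bit positions give 0.5.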

When the process of building the evaluation data is complete, the learning service device 120 analyzes the speech signal received in relation to the foreign language word, phrase, or sentence provided to the user device 100 and recognizes the content of what the user read aloud. The result recognized in this process may, of course, be converted to text. Here, conversion to text may mean retrieving text that matches the recognition result from a text dictionary stored in the DB 120a and combining it. For example, if the 4-bit information "1111" corresponds to the phonetic symbols of the English word "baby", phonetic symbols are combined in this manner to form a sentence, and the data on the matching phonetic symbols is compared with the pre-stored evaluation data to determine whether they match. Using speech recognition technology, the learning service device 120 can generally determine in this way whether the user's pronunciation and intonation match. Because the comparison method using the recognition result may vary, the embodiment of the present invention is not limited to any one form.
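The dictionary lookup described above can be sketched as follows. Only the "1111" to "baby" pair comes from the description; the other codes, the `TEXT_DICTIONARY` name, and the helper functions are hypothetical and exist only to make the example runnable.

```python
# Hypothetical text dictionary. Only the "1111" -> "baby" entry is taken from
# the description; the remaining entries are invented for illustration.
TEXT_DICTIONARY = {
    "1111": "baby",
    "0001": "the",
    "0010": "sleeps",
}


def textualize(codes):
    """Look up each recognized code and join the results into a sentence.

    Codes that are not in the dictionary are rendered as "?".
    """
    return " ".join(TEXT_DICTIONARY.get(code, "?") for code in codes)


def matches_reference(codes, reference_sentence):
    """Return True when the textualized result equals the presented sentence."""
    return textualize(codes) == reference_sentence
```

A real system would presumably store phonetic symbols rather than whole words per code; the word-level mapping here simply mirrors the "baby" example.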

In addition, when providing the foreign language content, the learning service device 120 also provides the various items of additional information displayed on the screen of the user device 100, namely, as shown in FIG. 2 or FIG. 3, the match rate information 210 indicating the match rate, the first mark 220 indicating a non-matching section, and the second mark 300 indicating intonation. When the learning service device 120 requests that the corresponding screen be displayed, the user device 100 can be regarded as simply displaying it on the screen in a preset manner. Here, the "preset manner" means, for example, that encoding and decoding may be performed between the learning service device 120 and the user device 100 as in commonly used schemes: when service data from the learning service device 120 is provided, the user device 100 performs decoding after demodulation, separates video, audio, and additional information from the decoded data, performs scaling operations such as resolution conversion, and then, when displaying the video data on the screen, combines and displays the separated additional information, that is, the match rate information described above or the second mark 300 relating to intonation.

In short, the learning service device 120 communicates with the user device 100 to provide a foreign language learning service in which, for example, the pronunciation of the provided foreign language words, phrases, and sentences is checked first and, once the pronunciation has improved through training, intonation is trained. This can be regarded as a form of foreign language speaking training.

Specifically, the learning service device 120 may operate according to the following scenario. First, the learning service device 120 provides a specific sentence to a smart device such as a smartphone. The smart device then plays the pronunciation of the key words used in the sentence through its speaker. Next, the learning service device 120 has the learner pronounce each word into the smart device and then lets the learner hear the pronunciation provided by the learning service device 120 together with the learner's own pronunciation, so that the two can be compared. When word learning is complete, the learning service device 120 may request that the learner pronounce the sentence, and the learner's pronunciation input to the smart device is converted into (foreign language) text using speech recognition technology. The match rate between the converted text and the sentence provided by the learning service device 120, along with the inaccurate section, is then reported to the smart device, and repetition (further practice) is requested. When the match rate between the converted text and the sentence provided by the learning service device 120 reaches 100%, changes in the rise and fall of the pitch frequency are evaluated in order to improve how well the sentence is conveyed. For example, the learning service device 120 converts the pitch of the speech for the provided sentence into a musical score form and presents it together with the sentence; when the learner reads the sentence following the score form, the learning service device 120 evaluates the rise and fall of the sentence as read by the learner and checks the match rate of the intonation.
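The text-comparison step in this scenario (match rate between the recognized text and the presented sentence, plus the inaccurate section) can be illustrated with Python's standard `difflib`. This is a sketch under stated assumptions: the description does not specify a word-level alignment, and the function name `compare_transcript` and the choice of denominator are hypothetical.

```python
from difflib import SequenceMatcher


def compare_transcript(reference: str, recognized: str):
    """Return (match_rate, inaccurate_sections) for a recognized transcript.

    Assumption: both texts are compared word by word, ignoring case.
    `inaccurate_sections` lists (start, end) word-index ranges in the
    reference sentence that were replaced or dropped by the learner.
    """
    ref_words = reference.lower().split()
    rec_words = recognized.lower().split()
    if not ref_words and not rec_words:
        return 0.0, []

    matcher = SequenceMatcher(a=ref_words, b=rec_words, autojunk=False)
    # Total number of words that line up between the two sequences.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # Dividing by the longer sequence also penalizes extra words.
    rate = matched / max(len(ref_words), len(rec_words))

    inaccurate = [
        (i1, i2)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag in ("replace", "delete")
    ]
    return rate, inaccurate
```

The returned index ranges could be used to highlight the corresponding words on the learner's screen, in the spirit of the first mark 220.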

As a result of the above configuration, the learning service system 90 according to an embodiment of the present invention applies speech recognition technology so that a user, for example a foreign language learner, can practice his or her foreign language pronunciation and intonation and have them evaluated, which can help the learner attain native-level pronunciation and intonation.

FIG. 4 is a block diagram showing the detailed structure of the learning service device of FIG. 1.

As shown in FIG. 4, the learning service device 120 according to an embodiment of the present invention includes some or all of a communication interface unit 400 and a foreign language learning processing unit 410, where "includes some or all" has the same meaning as described earlier.

Under the control of the foreign language learning processing unit 410, the communication interface unit 400 communicates with the user device 100 via the communication network 110 of FIG. 1. In this process, the communication interface unit 400 may perform various operations for image processing, such as modulation and demodulation, encoding and decoding, muxing and demuxing, scaling, and mixing; because these are obvious to those skilled in the art, further description is omitted.

In relation to the embodiment of the present invention, when a learner first requests foreign language learning, the communication interface unit 400 may provide the first screen shown in FIG. 2 to the user device 100 according to an instruction from the foreign language learning processing unit 410. When the learner's speech, that is, speech reading a foreign language word, phrase, or sentence, is then acquired, the communication interface unit 400 passes it to the foreign language learning processing unit 410.

In addition, the communication interface unit 400 provides the second screen shown in FIG. 3 to the user device 100 according to an instruction from the foreign language learning processing unit 410. For the second screen of FIG. 3, it may be preferable to present a sentence containing the word or phrase presented on the first screen of FIG. 2; however, to check the learner's actual ability, a sentence containing a word or phrase presented at an earlier stage may be presented instead. Accordingly, in the embodiment of the present invention, experimental data may be obtained through learning experiments and the like and the service may be built in various forms in light of that data, so the embodiment is not limited to any one form.

While presenting the first and second screens, the communication interface unit 400 may also provide the user device 100 with additional information relating to the match rate or the non-matching section, and the user device 100 displays that additional information in a preset area according to a preset manner, that is, according to the UX/UI design.

The foreign language learning processing unit 410 is responsible for the learner's foreign language learning according to an embodiment of the present invention and, above all, carries out foreign language learning using speech recognition technology. On the basis of speech recognition technology, the foreign language learning processing unit 410 proceeds to the intonation practice stage when it determines that pronunciation of the foreign language word or sentence presented to the learner has been practiced sufficiently. In this process, the foreign language learning processing unit 410 has the user device 100 display, as additional information, the match rate information and various marks indicating the non-matching section, thereby promoting familiarity with the user and, through this, enabling accurate foreign language learning.

When the learner reads the foreign language word or sentence presented on his or her user device 100, the foreign language learning processing unit 410 determines through speech recognition whether it matches and, according to the result, presents the correct pronunciation so that the learner can correct his or her own. When the pronunciation has been corrected through this process and the match rate reaches the range of the reference value, the foreign language learning processing unit 410 presents additional information relating to intonation to the learner in order to train intonation and again evaluates the result through speech recognition. The learning effect is thus enhanced through speech recognition and the presentation of additional information.

The operation of the foreign language learning processing unit 410 may vary according to how the program is designed, and the screen composition, that is, the layout, may vary according to how the UX/UI is designed; therefore, the embodiment of the present invention is not limited to any one form.

FIG. 5 is a block diagram showing another structure of the learning service device of FIG. 1.

As shown in FIG. 5, a learning service device 120' according to another embodiment of the present invention includes some or all of a communication interface unit 500, a control unit 510, a foreign language learning execution unit 520, and a storage unit 530.

Here, "includes some or all" means, for example, that the learning service device 120' may be configured with some components, such as the storage unit 530, omitted, or that some components, such as the foreign language learning execution unit 520, may be integrated into other components, such as the control unit 510. To aid a full understanding of the invention, the device is described as including all of them.

Compared with the learning service device 120 of FIG. 4, the learning service device 120' of FIG. 5 differs in that the foreign language learning processing unit 410 of FIG. 4 is separated into the control unit 510 and the foreign language learning execution unit 520 of FIG. 5, the separation being implemented in hardware, software, or a combination thereof. In other words, in the learning service device 120' of FIG. 5, the control unit 510 is responsible for control operations, and the foreign language learning execution unit 520 is responsible only for the learning operations according to an embodiment of the present invention.

For example, when speech for the intonation of a foreign language sentence previously provided to the user device 100 is acquired and supplied through the communication interface unit 500, the control unit 510 may temporarily store it in the storage unit 530. When the foreign language learning execution unit 520 requests it, the control unit 510 may provide the corresponding speech signal stored in the storage unit 530. When the foreign language learning execution unit 520 issues various requests relating to foreign language learning, the control unit 510 controls the communication interface unit 500 so that the requests are transmitted to the user device 100. In this way, the control unit 510 performs overall control of the communication interface unit 500, the foreign language learning execution unit 520, and the storage unit 530 that constitute the learning service device 120'.
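The division of labor described above (the control unit buffers incoming speech in the storage unit, hands it to the execution unit on request, and forwards the execution unit's requests to the user device) can be sketched as follows. All class, method, and parameter names are hypothetical, and the `send` callable merely stands in for the communication interface unit 500.

```python
class StorageUnit:
    """Stand-in for storage unit 530: keeps speech per session (hypothetical keying)."""

    def __init__(self):
        self._audio = {}

    def put(self, session_id, audio):
        self._audio[session_id] = audio

    def get(self, session_id):
        return self._audio.get(session_id)


class ControlUnit:
    """Stand-in for control unit 510, coordinating storage and transmission."""

    def __init__(self, storage, send):
        self.storage = storage
        self.send = send  # callable standing in for communication interface unit 500

    def on_audio_received(self, session_id, audio):
        # Temporarily store speech acquired from the user device.
        self.storage.put(session_id, audio)

    def audio_for(self, session_id):
        # Provide stored speech when the learning execution unit asks for it.
        return self.storage.get(session_id)

    def forward_request(self, session_id, request):
        # Relay a learning-related request toward the user device.
        self.send(session_id, request)
```

Keeping the execution logic outside `ControlUnit` mirrors the separation between the control unit 510 and the foreign language learning execution unit 520.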

The foreign language learning execution unit 520 may include a program for providing the foreign language learning service according to an embodiment of the present invention, and may provide the service by executing that program under the control of the control unit 510. The foreign language learning execution unit 520 can be regarded as handling the training-based operations that train the learner's pronunciation and intonation using speech recognition technology according to an embodiment of the present invention. To this end, the foreign language learning execution unit 520 may provide various forms of additional information to the user device 100 according to a preset manner. Presenting the additional information accurately informs the learner of the current state, and this process can increase familiarity with the learner and thereby improve learning achievement.

Apart from the above, the details of the communication interface unit 500, the control unit 510, the foreign language learning execution unit 520, and the storage unit 530 of FIG. 5 do not differ significantly from those of the learning service device 120 of FIG. 1 and the communication interface unit 400 and foreign language learning processing unit 410 of FIG. 4, so those descriptions apply here as well.

FIG. 6 is a flowchart showing a learning service process according to an embodiment of the present invention.

Referring to FIG. 6 together with FIG. 1 for convenience of description, the learning service device 120 according to an embodiment of the present invention may store evaluation data relating to the pronunciation and rhythm (or intonation) of foreign language words and foreign language sentences containing those words (S600). The evaluation data may be stored in advance through a data construction process, but it may also be received from a particular interworking server when needed and used immediately; therefore, in the embodiment of the present invention, storing covers temporary storage as well as storage in advance.

In addition, the learning service device 120 compares the speech recognition result for the pronunciation of a user reading a foreign language word or sentence provided to the user's user device 100 with the (pre-)stored evaluation data and, according to the match rate of the pronunciation, notifies the user device 100 of the inaccurate section of the pronunciation and requests that the reading be repeated; or it provides a rhythm-related mark for the sentence to the user device 100, compares the speech recognition result for the user's rhythm based on the provided mark with the (pre-)stored evaluation data, and checks the match rate of the intonation (S610).
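The intonation check in step S610 can be illustrated by comparing the rise and fall of two pitch contours. The description does not define how the match rate of intonation is computed, so everything below is an assumption: the pitch values are taken to be already extracted (one value in Hz per syllable), and the function names and the 5 Hz tolerance are hypothetical.

```python
def contour_directions(pitches_hz, tolerance_hz=5.0):
    """Reduce a pitch contour to rise (1), fall (-1), or level (0) per step."""
    directions = []
    for previous, current in zip(pitches_hz, pitches_hz[1:]):
        delta = current - previous
        if delta > tolerance_hz:
            directions.append(1)
        elif delta < -tolerance_hz:
            directions.append(-1)
        else:
            directions.append(0)
    return directions


def intonation_match_rate(reference_hz, learner_hz, tolerance_hz=5.0):
    """Share of steps where the learner's pitch moves the same way as the reference.

    Assumption: both contours have one pitch value per syllable and are
    already aligned, so they must have the same length (at least two values).
    """
    if len(reference_hz) != len(learner_hz) or len(reference_hz) < 2:
        raise ValueError("contours must have equal length of at least 2")
    ref = contour_directions(reference_hz, tolerance_hz)
    got = contour_directions(learner_hz, tolerance_hz)
    same = sum(1 for r, g in zip(ref, got) if r == g)
    return same / len(ref)
```

Comparing directions rather than absolute frequencies is a design choice made here so that speakers with different voice ranges can still match the same score-like contour.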

In addition to the main operations described above, the learning service device 120 may perform various other operations; because these have been fully described earlier, those descriptions apply here as well.

Meanwhile, although all of the components constituting an embodiment of the present invention have been described as being combined into one or operating in combination, the present invention is not necessarily limited to such an embodiment. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated. In addition, although each of the components may be implemented as a single independent piece of hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. The codes and code segments constituting the computer program can be readily inferred by those skilled in the art to which the present invention pertains. Such a computer program may be stored in non-transitory computer-readable media and read and executed by a computer, thereby implementing an embodiment of the present invention.

Here, a non-transitory readable recording medium means a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only for a short moment, such as a register, cache, or memory. Specifically, the programs described above may be stored and provided on a non-transitory readable recording medium such as a CD, DVD, hard disk, Blu-ray disc, USB, memory card, or ROM.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the present invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

100: user device 110: communication network
120, 120': learning service device 400, 500: communication interface unit
410: foreign language learning processing unit 510: control unit
520: foreign language learning execution unit 530: storage unit

Claims

A learning service device using speech recognition technology, comprising:
a storage unit that stores evaluation data relating to the pronunciation and rhythm of a foreign language word and a foreign language sentence containing the word; and
a control unit that compares a speech recognition result for the pronunciation of a user reading the foreign language word or sentence provided to a user device of the user with the stored evaluation data and, according to a match rate of the pronunciation, notifies the user device of an inaccurate section of the pronunciation and requests that the reading be repeated, or that provides a rhythm-related mark for the sentence to the user device, compares a speech recognition result for the rhythm of the user based on the provided mark with the stored evaluation data, and checks a match rate of the rhythm.

The learning service device of claim 1,
wherein the control unit notifies the user device of the inaccurate section when the match rate of the pronunciation is less than a reference value, and provides the rhythm-related mark to the user device when the match rate of the pronunciation is equal to or greater than the reference value.

The learning service device of claim 1,
wherein the control unit provides additional information indicating the match rate to the user device so that it is displayed on a screen.

The learning service device of claim 1,
wherein, according to the match rate, the control unit requests the user device to output a voice for the repetition request or to display the inaccurate section on a screen.

The learning service device of claim 1,
wherein the control unit requests the user device to display the rhythm-related mark on a screen in the form of a musical score.

A method of driving a learning service device using speech recognition technology, the device including a storage unit and a control unit, the method comprising:
storing, by the storage unit, evaluation data relating to the pronunciation and rhythm of a foreign language word and a foreign language sentence containing the word; and
comparing, by the control unit, a speech recognition result for the pronunciation of a user reading the foreign language word or sentence provided to a user device of the user with the stored evaluation data and, according to a match rate of the pronunciation, notifying the user device of an inaccurate section of the pronunciation and requesting that the reading be repeated, or providing a rhythm-related mark for the sentence to the user device, comparing a speech recognition result for the rhythm of the user based on the provided mark with the stored evaluation data, and checking a match rate of the rhythm.