KR20220167575A

KR20220167575A - A foreign language speaking method that provides repeated intonation check feedback

Info

Publication number: KR20220167575A
Application number: KR1020210076711A
Authority: KR
Inventors: 홍미소; 서창백
Original assignee: (주)아큐플라이에이아이
Priority date: 2021-06-14
Filing date: 2021-06-14
Publication date: 2022-12-21

Abstract

The present invention relates to a foreign language speaking learning method for providing repetitive intonation-checking feedback. The method comprises: a provision step of extracting a random learning target word or sentence from a learning language DB storing a plurality of learning target words or sentences with matched text and voice data and outputting the text string of the learning target word or sentence to a user terminal; an input step of receiving voice input for the text string provided in the provision step from the user terminal; a generation step of generating syllable information that separates the voice into syllable units from the voice input in the input step and tone information about a pitch change value between each syllable information; a comparison step in which the syllable information and tone information generated in the generation step are compared with the voice data matched with the text string of the learning target word or sentence extracted in the provision step; and an output step of providing the comparison results of the comparison step to the user terminal. Accordingly, the present invention can increase the efficiency of speaking learning.

Description

Foreign language speaking learning method that provides repeated intonation check feedback {A FOREIGN LANGUAGE SPEAKING METHOD THAT PROVIDES REPEATED INTONATION CHECK FEEDBACK}

The present invention relates to a foreign language speaking learning method that provides repeated intonation check feedback.

In learning to speak a foreign language, the most important element is repetitive practice: reading words or sentences aloud again and again while paying attention to the pronunciation of each word.

In instructor-led speaking practice, the instructor listens directly to the learner's voice and corrects problems in pronunciation or tone. In speaking practice on a PC or smart device, by contrast, the learner generally just listens to a recorded native-speaker voice repeatedly and imitates it. As a result, even when a pronunciation error or tonal anomaly occurs, the user needs considerable effort and time to notice and correct it unaided, which lowers learning efficiency.

As one attempt to solve this problem, Korean Patent Laid-Open Publication No. 10-2014-0004540 (filing date: 2012.07.03., publication date: 2014.01.13., hereinafter the 'prior art') discloses a speaking learning method that presents a plurality of sentences to the user, assigns a score to the user's utterance of each sentence, and provides a total score once the user's voice has been input for all sentences. If the score is at or below a certain level, another set of sentences is presented to the user.

However, the prior art computes feedback on the user's voice only as a score, and reports the learning result for the plurality of sentences only as the sum of the per-sentence scores. It therefore cannot give accurate feedback on which part of a sentence contains a pronunciation or tonal problem.

To solve the above problem, an object of the present invention is to provide a foreign language speaking learning method that gives the user repeated intonation check feedback on the parts where inaccurate pronunciation or abnormal tone occurs.

A foreign language speaking learning method that provides repeated intonation check feedback according to an embodiment of the present invention comprises: a provision step of extracting an arbitrary learning target word or sentence from a learning language DB, which stores a plurality of learning target words or sentences with matched text and voice data, and outputting the text string of that word or sentence to a user terminal; an input step of receiving, from the user terminal, voice input for the text string provided in the provision step; a generation step of generating, from the voice input in the input step, syllable information that separates the voice into syllable units and tone information on the pitch change value between the pieces of syllable information; a comparison step of comparing the syllable information and tone information generated in the generation step with the voice data matched to the text string of the learning target word or sentence extracted in the provision step; and an output step of providing the comparison result of the comparison step to the user terminal.

The output step separately calculates and quantifies (i) the pronunciation match accuracy of each syllable, identified through waveform analysis of the syllable information and the voice data in the time-frequency domain, and (ii) the degree of match between the voice data and the tone information, obtained through spectrum analysis of the tone information and the voice data in the volume-frequency domain. It generates a first graph by connecting, in time-series order, the unit frequency regions of the voice data in which the largest volume is found per unit time, and a second graph by connecting, in time-series order, the unit frequency regions of the tone information in which the largest volume is found per unit time. It then combines the first graph with the second graph to generate graph information from which the tonal changes of the voice data and the tone information can be compared, and outputs the generated graph information and the per-syllable pronunciation match results to the user terminal.

The comparison step adjusts the frequency band and voice input length of the syllable information and tone information generated in the generation step so that they are similar to the frequency band and voice output length of the voice data, and then compares the adjusted syllable information and tone information with the voice data.
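The adjustment described here can be pictured with a minimal Python sketch. It assumes, purely for illustration, that both the user's utterance and the reference voice data have already been reduced to pitch contours (lists of Hz values); the function names `resample` and `align_to_reference`, the linear interpolation, and the mean-shift used to bring the two frequency bands together are all assumptions, not techniques named in the source.

```python
def resample(contour, target_len):
    """Linearly interpolate `contour` so that it has `target_len` points."""
    if not contour or target_len < 1:
        raise ValueError("contour must be non-empty and target_len >= 1")
    if len(contour) == 1 or target_len == 1:
        return [contour[0]] * target_len
    step = (len(contour) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def align_to_reference(user, reference):
    """Match the user's contour to the reference in length and mean pitch."""
    resampled = resample(user, len(reference))
    # Hypothetical band adjustment: shift so both contours share a mean.
    shift = sum(reference) / len(reference) - sum(resampled) / len(resampled)
    return [value + shift for value in resampled]
```

After this step the two contours have the same number of points and sit in a similar frequency range, so a point-by-point comparison is meaningful.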

In addition, only when the comparison result of the comparison step shows that at least one of the pronunciation match accuracy value between the voice data and the syllable information or the match degree value between the voice data and the tone information does not exceed a first reference value, the output step returns to the input step and requests, from the user terminal, re-input of the voice for the text string provided in the provision step.

Here, the method further comprises an estimation step of estimating the age group of the speaker by analyzing the frequency band of the voice input from the user terminal in the input step together with the syllable information and tone information generated in the generation step.

In this case, only when the estimation result of the estimation step falls within a predesignated age group, the output step uses a second reference value, which is lower than the first reference value, and checks whether the pronunciation match accuracy value between the voice data and the syllable information or the match degree value between the voice data and the tone information exceeds the second reference value.

Furthermore, the output step samples the first and second graphs at predetermined time and frequency units to convert them into simplified curve graphs, and generates the graph information by overlaying the converted first graph on the second graph.

Meanwhile, the above-described foreign language speaking learning method that provides repeated intonation check feedback may be provided in the form of a computer-readable recording medium on which a program for executing each step is recorded.

The foreign language speaking learning method that provides repeated intonation check feedback according to a preferred embodiment of the present invention visually conveys, in the output step, whether the pronunciation and tone of the user's input voice match the reference. The user can therefore practice speaking while visually checking the regions of tonal mismatch in the learning target word or sentence, which increases the efficiency of speaking practice.

In addition, the method according to a preferred embodiment of the present invention applies corrections in the comparison step according to the type of target language, the user's age, or the user's areas of inaccurate pronunciation. It can therefore provide relatively accurate results and enable repeated practice even for infants and children whose pronunciation is inaccurate, or for people who cannot pronounce certain sounds accurately because of their oral structure.

FIG. 1 is a flowchart showing each step of the foreign language speaking learning method that provides repeated intonation check feedback according to a preferred embodiment of the present invention.

Hereinafter, a preferred embodiment of the foreign language speaking learning method that provides repeated intonation check feedback according to the present invention will be described in detail with reference to the accompanying drawings.

The same reference numerals in the drawings denote the same members. Specific structural or functional descriptions of the embodiments are given only to explain embodiments according to the present invention. Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and, unless explicitly defined in this specification, should not be interpreted in an idealized or excessively formal sense.

Before each step of the method is described with reference to the accompanying drawings, it should be noted that the foreign language speaking learning method of the present invention may be implemented by a learning server that, when a user (or speaker) connects through a user terminal to a dedicated web page or linked application, checks pronunciation and tone from the voice input by the user. Each step of the method described below should be understood as being performed by the coordinated operation of the devices, components, and applications that make up this learning server.

Referring to FIG. 1, the foreign language speaking learning method according to a preferred embodiment of the present invention communicates with a user terminal such as a desktop, laptop, or smart device, presents a word or sentence of the target language on the user terminal, analyzes the speaker's voice input through the microphone of the user terminal to check pronunciation and tone, and provides feedback. The method may include a provision step (S100), an input step (S200), a generation step (S300), an estimation step (S400), a comparison step (S500), and an output step (S600).

The provision step (S100) extracts an arbitrary learning target word or sentence from the learning language DB, which stores a plurality of learning target words or sentences with matched text and voice data, and outputs its text string to the user terminal. It is performed after the user terminal has connected to a web page provided by the learning server, or to an application linked to the learning server, and the user has selected the foreign language to practice. In this step, a word or sentence stored in advance in the learning language DB is provided to the user terminal as a text string. In some cases the voice data matched to that word or sentence is provided as well, so that the learning target is presented both visually and audibly and the user can confirm the word or sentence to be learned.

If the user account used to connect to the learning server through the application or website stores information on the user's level of speaking proficiency, the provision step (S100) may instead extract from the learning language DB a word or sentence that corresponds to that proficiency level and provide it to the user terminal. Words or sentences may be assigned a difficulty for continuous pronunciation according to how hard their syllables or word segments are to pronounce. When a sentence or word is pronounced in Korean, for example, this may depend on the degree to which sonorant sounds occur (or do not occur) consecutively in the final consonants of the syllables, or the degree to which aspirated consonants occur (or do not occur) consecutively in the initial consonants. For a target language with tonal (pitch) changes, a tonal difficulty may be assigned according to the number of tonal changes in the word or sentence or the kinds of tonal patterns it contains (for example, the four tones of Chinese).
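As an illustration of the provision step, the following Python sketch picks a random item from an in-memory stand-in for the learning language DB and narrows the choice by difficulty when the user's proficiency level is known. The `LearningItem` fields, the integer difficulty scale, and the function name `pick_item` are hypothetical; the source does not specify a schema or a rating scale.

```python
import random
from dataclasses import dataclass


@dataclass
class LearningItem:
    text: str        # text string shown on the user terminal
    audio_path: str  # reference voice data matched to the text
    difficulty: int  # hypothetical rating, e.g. 1 (easy) to 5 (hard)


def pick_item(db, user_level=None, rng=random):
    """Return a random item, restricted to `user_level` when it is known."""
    if user_level is not None:
        candidates = [item for item in db if item.difficulty == user_level]
        if candidates:
            return rng.choice(candidates)
    # No stored level, or nothing at that level: fall back to the whole DB.
    return rng.choice(db)
```

Falling back to the whole DB when no item matches is a design choice of this sketch, not something the source prescribes.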

The input step (S200) receives, from the user terminal, voice input for the text string provided in the provision step (S100). The voice is what the user speaks while reading along the text string provided in the provision step (S100), captured through a microphone built into or separately connected to the user terminal. The voice input in this step is later used in the estimation step (S400) to identify the user's age group or the sounds that the user pronounces poorly because of oral structure.

The generation step (S300) generates, from the voice input in the input step (S200), syllable information that separates the voice into syllable units and tone information on the pitch change value between the pieces of syllable information. The syllable information serves as the basis for judging the per-syllable pronunciation accuracy of the word input by the user in the comparison step (S500) and the output step (S600) described below and, as noted above, is used to pinpoint the section or syllable where inaccurate pronunciation occurs in words or sentences that are hard to pronounce continuously. Depending on speaking proficiency, it is not easy for a user to read a long word or sentence while keeping pronunciation accurate, and it is harder still to check, one by one, each section where pronunciation went wrong while reading. The generation step (S300) therefore generates syllable information by splitting the user's input voice into syllable units, so that the comparison step (S500) and the output step (S600) can clearly determine where in the input voice the inaccurate pronunciation occurred. While generating the syllable information, the generation step (S300) also generates tone information from the pitch change values of the syllables. This tone information can be used as the basis for judging whether the tonal changes in the user's voice input in the input step (S200) were made clearly for the learning target word or sentence provided in the provision step (S100).
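The relationship between syllable information and tone information can be sketched in a few lines of Python. The sketch assumes that syllable segmentation and pitch estimation have already been done upstream and that each syllable is summarized by a single representative pitch in Hz; the function name `tone_information` and this one-value-per-syllable summary are assumptions for illustration only.

```python
def tone_information(syllable_pitches_hz):
    """Return the pitch change (Hz) between each pair of adjacent syllables."""
    return [
        later - earlier
        for earlier, later in zip(syllable_pitches_hz, syllable_pitches_hz[1:])
    ]


# Three syllables: pitch rises by 20 Hz, then falls by 40 Hz.
changes = tone_information([200.0, 220.0, 180.0])
```

A positive value means the pitch rose from one syllable to the next, a negative value means it fell, and an utterance of n syllables yields n - 1 change values.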

The estimation step (S400) estimates the speaker's age group by analyzing the frequency band of the voice input from the user terminal in the input step (S200) together with the syllable information and tone information generated in the generation step (S300). This step first checks the frequency band of the input voice and the degree of pronunciation inaccuracy caused by oral structure and similar factors, so that errors in the analysis arising from the speaker's age and pronunciation inaccuracy can be minimized in the comparison step (S500). Age-related pronunciation inaccuracy appears in infants and children aged 2 to 6, whose motor control around the lips is less developed than that of adults. Taking Korean consonants as the reference, the pronunciation of 'ㄱ, ㄹ, ㅅ, ㅈ' is inaccurate at ages 2 to 3, 'ㅅ, ㄹ' at ages 3 to 5, and 'ㄹ' at ages 5 to 6, and the pronunciation of 'ㅅ' may remain inaccurate even after age 6. Accordingly, the estimation step (S400) checks the main frequency band (or center frequency band) of the voice input in the input step (S200). If it does not fall within the frequency band of an adult male voice, which differs markedly by sex and age, the speaker is recognized as female or a young child. Because the speaker's voice is captured through a microphone rather than input directly, a predetermined tolerance may be applied in view of the microphone's specifications or the recording environment, and the tolerance may further be adjusted according to the target foreign language. For example, tonal changes occur in the 500 to 2,000 Hz band for Korean, 1,000 to 5,000 Hz for American English (2,000 to 12,000 Hz for British English), and 1,000 to 3,000 Hz for Chinese, so a tolerance on the frequency band of the voice input in the input step (S200) may be applied according to the language of the learning target word or sentence provided in the provision step (S100). Afterwards, inter-syllable analysis of words that are difficult to pronounce, using at least the syllable information generated in the generation step (S300), makes it possible to determine whether the speaker is female or a young child.
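The first, coarse part of this estimation can be sketched as a band check with a tolerance. The source does not state the adult male band, so the 85 to 180 Hz range below is an assumed placeholder, as are the function name `coarse_speaker_group` and the group labels; the later refinement by inter-syllable analysis is not modeled.

```python
# Assumed adult male band in Hz; the source gives no concrete numbers.
ADULT_MALE_BAND_HZ = (85.0, 180.0)


def coarse_speaker_group(main_freq_hz, tolerance_hz=0.0,
                         band=ADULT_MALE_BAND_HZ):
    """Classify by main frequency; widen the band by `tolerance_hz`.

    The tolerance stands in for microphone, environment, and
    target-language adjustments mentioned in the description.
    """
    low, high = band
    if low - tolerance_hz <= main_freq_hz <= high + tolerance_hz:
        return "adult_male"
    return "female_or_child"
```

A caller would pass a larger `tolerance_hz` for a noisy recording setup, which widens the band treated as adult male before the syllable-level analysis narrows the result further.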

The comparison step (S500) compares the syllable information and tone information generated in the generation step (S300) with the voice data matched to the text string of the learning target word or sentence extracted and provided in the provision step (S100). In this step, the degree to which the waveforms agree in a time-frequency waveform analysis of the syllable information and the voice data shows whether each syllable was pronounced clearly, and the comparison result may be corrected according to the estimation result of the estimation step (S400). In addition, spectrum analysis of the tone information and the voice data in the volume-frequency domain shows the tone change per frequency band, that is, how well the pitch of the two agrees. As with the syllable information, the frequency band of the tone information may first be converted to a band similar to that of the voice data so that the two can be compared accurately, after which the degree of pitch agreement is checked. In other words, the comparison step (S500) compares the voice data of the learning target word or sentence provided in the provision step (S100) with the syllable information and tone information that the generation step (S300) produced from the user's voice input in the input step (S200), and thereby checks the pronunciation accuracy of each syllable and whether the tonal changes between syllables match.

The output step (S600) provides the comparison result of the comparison step (S500) to the user terminal. To inform the user terminal how accurately the input voice was pronounced, the output step (S600) quantifies the pronunciation match accuracy of each syllable, identified through waveform analysis of the syllable information and the voice data in the time-frequency domain, and provides information on per-syllable pronunciation accuracy. It also compares the time-series tone change graph of the voice data with the time-series tone change graph of the tone information to provide information on the accuracy of the user's tone. The information output to the user terminal in the output step (S600) shows the syllables in the text string whose pronunciation accuracy value exceeds the reference (the first reference value mentioned below) and the syllables at or below the reference in different colors, and presents a graph comparing the tonal change in the voice data with the tonal change in the user's voice. Through this output the user can visually check, on the user terminal, both per-syllable pronunciation accuracy and tonal accuracy.
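The color-coded syllable display can be illustrated with a small Python sketch that tags each syllable according to whether its score exceeds the first reference value. The 0.75 default mirrors the 75% match rate the description gives as an example; the function name `label_syllables` and the `"pass"`/`"retry"` labels, which a user interface could map to two colors, are hypothetical.

```python
def label_syllables(syllables, scores, first_reference=0.75):
    """Pair each syllable with "pass" if its score exceeds the reference."""
    if len(syllables) != len(scores):
        raise ValueError("one score is required per syllable")
    return [
        (syllable, "pass" if score > first_reference else "retry")
        for syllable, score in zip(syllables, scores)
    ]
```

A score exactly equal to the reference is labeled `"retry"`, since the description groups syllables "at or below the reference" together.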

Because tonal accuracy is harder to convey visually than per-syllable pronunciation accuracy, the output step (S600) generates a first volume-frequency graph for the time-series tonal change of the voice data and a second volume-frequency graph for the time-series tonal change of the tone information, so that the user can see the tonal difference between the tone information and the voice data. The first and second volume-frequency graphs show which frequency band is detected at high volume in each unit of time. From the first volume-frequency graph, the output step (S600) connects, in time-series order, the unit frequency regions in which the largest volume is found per unit time to generate a first graph corresponding to the pitch change of the voice data. Likewise, from the second volume-frequency graph, it connects, in time-series order, the unit frequency regions in which the largest volume is found per unit time to generate a second graph corresponding to the pitch change of the tone information. The first and second graphs are then sampled at predetermined time and frequency units and converted into simplified curve graphs in the time-frequency domain, and the converted first graph is overlaid on the second graph to generate graph information, which is output to the user terminal. In other words, through the graph information provided to the user terminal the user can compare the pitch change of the voice data with that of the user's own voice and see with the naked eye where the tone does not match in the learning target word or sentence.
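The graph construction can be sketched in Python under the assumption that each volume-frequency graph is available as a list of time frames, where every frame is a list of volumes per frequency bin of width `bin_hz`. The function names `peak_track`, `simplify`, and `overlay`, and the choice of bin centres and grid rounding, are assumptions made for illustration.

```python
def peak_track(frames, bin_hz):
    """For each time frame, return the centre frequency of the loudest bin."""
    track = []
    for frame in frames:
        loudest = max(range(len(frame)), key=frame.__getitem__)
        track.append((loudest + 0.5) * bin_hz)
    return track


def simplify(track, time_step, freq_step):
    """Keep every `time_step`-th point and snap it to a `freq_step` grid."""
    return [round(value / freq_step) * freq_step for value in track[::time_step]]


def overlay(first, second):
    """Combine reference (first) and user (second) curves point by point."""
    length = min(len(first), len(second))
    return [
        {"t": i, "reference_hz": first[i], "user_hz": second[i]}
        for i in range(length)
    ]
```

In this sketch `peak_track` corresponds to connecting the loudest unit frequency regions in time order, `simplify` to sampling by time and frequency units, and `overlay` to the combined graph information that a front end would draw as two curves on one axis.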

In addition, in the output step (S600), only when at least one of the per-syllable pronunciation accuracy value or the value of the degree of match between the voice data and the tone information from the comparison step (S500) does not exceed the first reference value, and the number of consecutive times the first reference value has been exceeded has not reached a preset reference count, the method may return to the input step (S200) and request, from the user terminal, re-input of the voice for the text string provided in the provision step (S100). In other words, the output step (S600) ends the learning of the current learning target word or sentence only when the number of consecutive times the user's pronunciation and tone match at or above the set reference value (for example, a match rate of 75%) reaches the reference count (for example, three or more consecutive times). The method then returns to the provision step (S100), where a new learning target word or sentence is extracted from the learning language DB and provided to the user terminal, so that the user can continue speaking practice with the new word or sentence.
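The repeat-or-advance decision described above can be illustrated with a short sketch. It is a hypothetical reading, not the disclosed implementation: the function name `next_action` and its parameters are assumptions, the defaults of 75% and three repetitions are taken from the examples in the text, and an attempt is treated as passing when both values are at or above the reference value.

```python
from typing import Tuple


def next_action(pronunciation: float, tone: float, streak: int,
                reference_value: float = 75.0,
                reference_count: int = 3) -> Tuple[str, int]:
    """Decide what follows one attempt.

    Returns ("retry", new_streak) to go back to the input step (S200), or
    ("next_item", 0) to go back to the provision step (S100) for a new item.
    """
    # Assumption: an attempt passes only if BOTH values meet the reference value.
    passed = pronunciation >= reference_value and tone >= reference_value
    # A failed attempt resets the count of consecutive passes.
    streak = streak + 1 if passed else 0
    if streak >= reference_count:
        return "next_item", 0
    return "retry", streak


# Hypothetical session: (pronunciation %, tone %) per attempt.
attempts = [(80.0, 90.0), (80.0, 70.0), (76.0, 75.0), (90.0, 88.0), (95.0, 80.0)]
streak = 0
history = []
for pron, tone in attempts:
    action, streak = next_action(pron, tone, streak)
    history.append(action)
```

In this session the second attempt fails on tone and resets the streak, so the learner advances to a new item only after the fifth attempt.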

Meanwhile, in the output step (S600), only when the estimation result of the estimation step (S400) corresponds to a predesignated age group, a second reference value lower than the first reference value is used to check whether the pronunciation match accuracy value between the voice data and the syllable information, or the value of the degree of match between the voice data and the tone information, exceeds the second reference value. This makes it possible to provide relatively accurate results and repeated learning even for infants and children whose pronunciation is imprecise, or for people who pronounce certain sounds imprecisely because of their oral structure.
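As a rough illustration of the age-dependent reference value, the selection could look like the sketch below. The age-group labels, the function name `select_reference_value`, and the numeric values 75 and 60 are all hypothetical; the specification only states that the second reference value is lower than the first and applies to predesignated age groups.

```python
from typing import AbstractSet


def select_reference_value(estimated_age_group: str,
                           designated_groups: AbstractSet[str],
                           first_reference: float,
                           second_reference: float) -> float:
    """Return the second (lower) reference value when the estimated age group
    is one of the predesignated groups, otherwise the first reference value."""
    if second_reference >= first_reference:
        raise ValueError("second reference value must be lower than the first")
    if estimated_age_group in designated_groups:
        return second_reference
    return first_reference


# Hypothetical configuration.
designated = frozenset({"infant", "child"})
child_threshold = select_reference_value("child", designated, 75.0, 60.0)
adult_threshold = select_reference_value("adult", designated, 75.0, 60.0)
```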

That is, in the foreign language speaking learning method that provides repeated intonation check feedback according to a preferred embodiment of the present invention, the output step visually conveys whether the pronunciation and tone of the user's input voice match. The user can therefore practice speaking while visually identifying the regions of the learning target word or sentence where the tone does not match, which increases the efficiency of speaking learning.

The embodiments of the present invention described above are disclosed for illustrative purposes. Those of ordinary skill in the art will be able to make various modifications, changes, and additions within the spirit and scope of the present invention, and such modifications, changes, and additions should be regarded as falling within the scope of the following claims.

Claims

A foreign language speaking learning method that provides repeated intonation check feedback, the method comprising:
a provision step of extracting an arbitrary learning target word or sentence from a learning language DB that stores a plurality of learning target words or sentences with matched text and voice data, and outputting the text string of the learning target word or sentence to a user terminal;
an input step of receiving, from the user terminal, voice input for the text string provided in the provision step;
a generation step of generating, from the voice input in the input step, syllable information that separates the voice into syllable units and tone information about a pitch change value between each piece of syllable information;
a comparison step of comparing the syllable information and tone information generated in the generation step with the voice data matched with the text string of the learning target word or sentence extracted in the provision step; and
an output step of providing a comparison result of the comparison step to the user terminal,
wherein the output step individually calculates and quantifies the per-syllable pronunciation match accuracy, determined in the comparison step through waveform analysis of the syllable information and the voice data in the time-frequency domain, and the degree of match between the voice data and the tone information, determined through spectrum analysis of the tone information and the voice data in the volume-frequency domain; generates a first graph by connecting, in time-series order, the unit frequency regions in which the greatest volume per unit time is identified in the voice data; generates a second graph by connecting, in time-series order, the unit frequency regions in which the greatest volume per unit time is identified in the tone information; combines the first graph with the second graph to generate graph information in a form that allows the tone changes of the voice data and the tone information to be compared; and provides and outputs the generated graph information and the per-syllable pronunciation match results to the user terminal.

The foreign language speaking learning method that provides repeated intonation check feedback according to claim 1,
wherein the comparison step adjusts the frequency band and voice input length of the syllable information and tone information generated in the generation step to be similar to the frequency band and output length of the voice data, and compares the adjusted syllable information and tone information with the voice data.

The foreign language speaking learning method that provides repeated intonation check feedback according to claim 1,
wherein, in the output step, when, as a result of the comparison in the comparison step, at least one of the pronunciation match accuracy value between the voice data and the syllable information or the value of the degree of match between the voice data and the tone information does not exceed a first reference value, the method returns to the input step and requests, from the user terminal, re-input of the voice for the text string provided in the provision step.

The foreign language speaking learning method that provides repeated intonation check feedback according to claim 3,
further comprising an estimation step of estimating the age group of the speaker by analyzing the frequency band of the voice received from the user terminal in the input step and the syllable information and tone information generated in the generation step.

The foreign language speaking learning method that provides repeated intonation check feedback according to claim 4,
wherein, in the output step, only when the estimation result of the estimation step corresponds to a predesignated age group, it is checked whether the pronunciation match accuracy value between the voice data and the syllable information or the value of the degree of match between the voice data and the tone information exceeds a second reference value.

The foreign language speaking learning method that provides repeated intonation check feedback according to claim 1,
wherein the output step samples the first and second graphs at predetermined time units and frequency units to convert them into simplified curves, and overlays the converted first graph on the second graph to generate the graph information.

A computer-readable recording medium on which is recorded a program for executing each step of the foreign language speaking learning method that provides repeated intonation check feedback according to any one of claims 1 to 6.