KR20210071713A

KR20210071713A - Speech Skill Feedback System

Info

Publication number: KR20210071713A
Application number: KR1020190162180A
Authority: KR
Inventors: 김수진; 안지윤
Original assignee: 사단법인 스마트미디어인재개발원
Priority date: 2019-12-07
Filing date: 2019-12-07
Publication date: 2021-06-16

Abstract

The present invention relates to a speech skill feedback system that analyzes a speaker's speech and shows the speaker's speaking ability. More specifically, it relates to a speech skill feedback system that helps improve speech skill by analyzing the speaker's speech rate, pitch, pronunciation accuracy, verbal habits, and the like, and providing the analysis results as feedback. The speech skill feedback system comprises: a voice input unit; a script file; a pronunciation accuracy analysis unit; a word analysis unit; a pitch analysis unit; an intonation analysis unit; a speed analysis unit; a storage unit; an evaluation unit; and a recommendation unit.

Description

Speech Skill Feedback System

The present invention relates to a speech skill feedback system that analyzes a speaker's speech to show his or her speaking ability. More specifically, it relates to a speech skill feedback system that helps the speaker improve speaking ability by analyzing speech rate, pitch, pronunciation accuracy, verbal habits, and the like, and providing the analysis results as feedback.

The ability to persuade others of one's ideas and to communicate what one knows effectively has become a necessity rather than an option in modern society. No matter how much you know or how certain you are of your views, it is of no use if you cannot convince and persuade others.

Speaking ability can be a competitive asset, a form of human capital, and first impressions and image are formed in a short moment. Ordinary, familiar speech is not remembered, so speech skills, such as knowing what to emphasize, what to say, and whether one is speaking properly, are among the most important skills in modern life.

Recently, new terms such as phone phobia and glossophobia (fear of public speaking) have emerged, showing that speaking is no longer familiar or easy for many people today.

According to Mehrabian's rule, proposed by Albert Mehrabian, a psychologist at the University of California, the importance of each element in communication is as follows: voice 38%, facial expression 30%, attitude 20%, gesture 5%, and the content or vocabulary of what is said 7%. Based on this theory, the service scope of the application to be developed can be narrowed to the vocal elements of the user's voice.

The present invention has been made in view of the needs described above. Its object is to provide a speech skill feedback system that analyzes a speaker's speech in detail to extract features of speech rate, pitch, pronunciation accuracy, and verbal habits, determines the speaker's speech skill level using a deep learning model trained on existing high-quality speech data, and feeds the result back to the speaker.

To achieve the above object, a speech skill feedback system according to an embodiment of the present invention includes: a voice input unit that receives a speaker's speech in real time or loads a recorded voice; a pronunciation accuracy analysis unit that converts the input voice data into text to analyze pronunciation accuracy; a speed analysis unit that analyzes the speaker's speech rate; a pitch analysis unit that analyzes the rise and fall of the speaker's pitch; an intonation analysis unit that analyzes intonation from changes in pitch; a word analysis unit that analyzes the words contained in the speaker's speech; a storage unit that stores the analysis results; an evaluation unit that evaluates the analysis results; and a recommendation unit that gives feedback to, or displays results for, the speaker according to the evaluation result.

The speech skill feedback system according to the embodiment of the present invention, configured as described above, provides the following effects.

First, even people who lack confidence in speaking can use the system to rehearse before important occasions such as presentations, interviews, and speeches.

Second, the system lets users learn a desired style of presentation, conversation, or pronunciation, so it can be used for language study, acting practice, music study, and the like.

Third, by analyzing a conversation partner's speech, the system can estimate the partner's emotional state, communication ability, and the degree of compatibility between the two speakers, so it can be applied to services such as business meetings and social matching.

FIG. 1 shows the configuration of a speech skill feedback system according to an embodiment of the present invention.
FIG. 2 shows an example of an application implemented with the speech skill feedback system according to an embodiment of the present invention.
FIG. 3 illustrates a method of analyzing the similarity between the speaker's speech and the script in the speech skill feedback system according to an embodiment of the present invention.
FIG. 4 illustrates a method of analyzing the words used, in the speech skill feedback system according to an embodiment of the present invention.
FIG. 5 illustrates a method of analyzing volume, speech rate, and intonation in the speech skill feedback system according to an embodiment of the present invention.
FIG. 6 illustrates an evaluation and feedback method based on similarity, words used, volume, speech rate, and intonation data in the speech skill feedback system according to an embodiment of the present invention.
FIG. 7 is

The above and other objects, features, and advantages of the present invention will be readily understood from the following preferred embodiments taken in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments described herein and may be embodied in other forms. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the spirit of the invention to those skilled in the art.

In this specification, singular forms include plural forms unless the context clearly indicates otherwise. As used in the specification, "comprises" and/or "comprising" does not exclude the presence or addition of one or more elements other than those mentioned.

Hereinafter, a speech skill feedback system according to a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings. In describing the specific embodiments below, various specific details are given to describe the invention more concretely and to aid understanding. However, a reader with enough knowledge of this field to understand the present invention will recognize that the invention can be practiced without these specific details. In some cases, parts that are commonly known and not closely related to the invention are omitted to avoid confusion in describing the present invention.

FIG. 1 shows the configuration of a speech skill feedback system according to an embodiment of the present invention. The system includes the following components:
- a voice input unit 110 that receives the speech of a speaker 100 in real time (111) or loads a previously stored voice file (112);
- a script file 113 that stores the script needed for voice input;
- a pronunciation accuracy analysis unit 120 that converts the loaded voice file into text and compares it with the content of the script to check whether it is transcribed as the correct sentence, thereby analyzing whether the speech was pronounced accurately;
- a word analysis unit 130 that analyzes the words extracted by the pronunciation accuracy analysis unit using a word cloud technique;
- a pitch analysis unit 140 that analyzes how the volume of the voice in the voice file changes over time;
- an intonation analysis unit 150 that analyzes the frequency components of the voice file and, in conjunction with the data of the pitch analysis unit, analyzes the speaker's intonation;
- a speed analysis unit 160 that analyzes the voice in the voice file to derive the number of syllables and calculates the number of syllables per unit time to determine the speech rate;
- a storage unit 170 that stores the data analyzed by the analysis units (171) and a learning model (172) built by analyzing and modeling high-quality speech data;
- an evaluation unit 180 that evaluates the speaker's speech level from the stored analysis data; and
- a recommendation unit 190 that, based on the evaluation result, provides recommendation feedback to help the speaker improve his or her speech skill.
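The speed analysis unit computes a speech rate as a syllable count divided by elapsed time. The sketch below makes a simplification that the text does not state: it counts syllables from the STT transcript rather than detecting them in the waveform. For Korean this works well, because each precomposed Hangul syllable block (Unicode range U+AC00 to U+D7A3) is exactly one syllable. The function names are hypothetical.

```python
import re

# Precomposed Hangul syllable blocks occupy U+AC00..U+D7A3; each block is one syllable.
# Assumption: syllables are counted from the STT transcript, not detected in the audio.
HANGUL_SYLLABLE = re.compile("[\uac00-\ud7a3]")


def count_korean_syllables(transcript: str) -> int:
    """Count Hangul syllable blocks in a transcript; other characters are ignored."""
    return len(HANGUL_SYLLABLE.findall(transcript))


def syllables_per_minute(transcript: str, duration_sec: float) -> float:
    """Average speech rate over the whole utterance, in syllables per minute."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    return count_korean_syllables(transcript) * 60.0 / duration_sec
```

For example, "안녕하세요 반갑습니다" contains 10 syllables. Spoken in 2.5 seconds, that gives a rate of 240.0 syllables per minute.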

FIG. 2 shows an example of an application implemented with the speech skill feedback system according to an embodiment of the present invention. When the application starts, it includes the following steps:
- entering the script directly or reading it from a script file (200);
- the speaker reading the script aloud to produce speech (210);
- converting the speech into text and comparing it with the script to analyze pronunciation accuracy (220);
- analyzing the volume, speed, words used, intonation, and so on of the speech (240); and
- displaying the analysis results on the screen as feedback to the speaker (250).

FIG. 3 illustrates a method of analyzing the similarity between the speaker's speech and the script in the speech skill feedback system according to an embodiment of the present invention. The method includes:
- an STT (speech-to-text) step (310) of reading voice data from the speaker's voice or a stored voice file (100) and converting it into text;
- a step (320) of analyzing the words by applying the BERT algorithm;
- a step (340) of calculating the similarity between the script and the analyzed words; and
- a step (350) of storing the calculated similarity.
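The text names BERT for step 320 but does not specify how step 340 computes the similarity. The sketch below uses a lightweight, dependency-free substitute instead of the BERT approach. It compares the script and the STT transcript word by word with Python's `difflib.SequenceMatcher`. It returns a similarity ratio in [0, 1] and the word spans that differ, which could serve as hints about mispronounced words. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def script_similarity(script: str, transcript: str) -> float:
    """Word-level similarity in [0, 1] between the script and the STT transcript.

    A simple stand-in for the BERT-based comparison described in the text.
    """
    a, b = script.split(), transcript.split()
    return SequenceMatcher(None, a, b).ratio()


def mismatched_words(script: str, transcript: str) -> list[tuple[str, str]]:
    """Return (expected, recognized) word spans wherever the two texts differ."""
    a, b = script.split(), transcript.split()
    diffs = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs
```

Consider the script "the quick brown fox jumps" and the transcript "the quick brown box jumps". Four of the five words match, so the ratio is 2 × 4 / 10 = 0.8, and the single mismatch reported is ("fox", "box").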

FIG. 4 illustrates a method of analyzing the words used, in the speech skill feedback system according to an embodiment of the present invention. The method includes:
- a step (400) of extracting only the words from the sentences obtained in the STT step (310);
- a preprocessing step (410) of removing words that are not needed for the analysis;
- a step (420) of counting the extracted words;
- a step (430) of visualizing the counted words as a word cloud;
- a step (440) of storing the count data;
- a step (450) of converting the frequency-based relationships between words into vectors in order to analyze how the extracted words are associated; and
- a step (460) of generating a learning model from the resulting vectors.
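Steps 400 to 450 can be sketched with the standard library alone. Several choices here are assumptions made only for illustration:
- the stopword list,
- the regular-expression tokenizer, and
- using co-occurrence counts within a fixed window as the "relationship vector" of step 450.

Word-cloud rendering (430) and model training (460) are left out.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a language-specific one.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "um", "uh"}


def extract_words(sentence: str) -> list[str]:
    """Step 400: keep only word tokens, lower-cased."""
    return re.findall(r"\w+", sentence.lower())


def remove_stopwords(words: list[str]) -> list[str]:
    """Step 410: drop words that carry no meaning for the analysis."""
    return [w for w in words if w not in STOPWORDS]


def count_words(words: list[str]) -> Counter:
    """Step 420: word frequencies (the input a word-cloud renderer would take)."""
    return Counter(words)


def cooccurrence_vectors(words: list[str], window: int = 2):
    """Step 450 (one possible reading): describe each word by how often every
    vocabulary word appears within `window` positions of it."""
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0] * len(vocab) for w in vocab}
    for i, w in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[w][index[words[j]]] += 1
    return vocab, vectors
```

For the utterance "Um, the model is fast and the model is small", the filtered words are ["model", "fast", "model", "small"]. The word "model" is counted twice. Its co-occurrence vector over the vocabulary ["fast", "model", "small"] is [2, 2, 1].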

FIG. 5 illustrates a method of analyzing volume, speech rate, and intonation in the speech skill feedback system according to an embodiment of the present invention. The method includes:
- a step (500) of reading voice data from the speaker's voice or a stored voice file (100) and measuring the amplitude of the sound;
- a step (510) of converting the measured amplitude into a decibel value;
- a step (520) of storing the converted sound level;
- a step (530) of counting the words by searching for sections in which the loudness changes sharply;
- a step (540) of storing the number of words per reference time unit;
- a step (550) of converting the voice data into the frequency domain;
- a step (560) of counting the voice data for each frequency;
- a step (570) of determining the intonation by analyzing the sound characteristics using the volume, the word count, and the per-frequency distribution data; and
- a step (580) of classifying the speaker based on the analyzed intonation.
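The following sketch illustrates steps 500 to 560 on raw PCM samples (floats in [-1, 1]). It makes these assumptions:
- the signal is split into fixed, non-overlapping frames;
- RMS is used as the amplitude measure;
- a threshold crossing in loudness serves as a crude proxy for "a section in which the loudness changes sharply" (530); and
- a naive DFT reports only the strongest frequency bin in a frame (550 to 560).

The strongest bin is not a reliable pitch (F0) estimate for real speech, where harmonics can dominate, and a production system would use a dedicated pitch tracker. The intonation steps 570 to 580 are not shown.

```python
import cmath
import math


def frame_rms_db(samples, frame_len, ref=1.0, floor_db=-100.0):
    """Steps 500-510: RMS amplitude of each non-overlapping frame, in dB relative to `ref`."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20 * math.log10(rms / ref) if rms > 0 else floor_db)
    return levels


def count_voiced_segments(levels_db, threshold_db=-30.0):
    """Step 530 (simplified): count rises from below to above a loudness threshold,
    treating each run of loud frames as one word-like unit."""
    count, loud = 0, False
    for level in levels_db:
        if level >= threshold_db and not loud:
            count += 1
        loud = level >= threshold_db
    return count


def dominant_frequency(frame, sample_rate):
    """Steps 550-560 (simplified): naive DFT of one frame, returning the frequency
    of the bin with the largest magnitude (the DC bin is skipped)."""
    n = len(frame)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k * sample_rate / n
```

As a test case, take a synthetic 8 kHz signal made of two 0.1 s bursts of a 200 Hz tone separated by 0.1 s of silence, analyzed in 400-sample frames. `count_voiced_segments` finds two segments, and `dominant_frequency` returns 200.0 for a loud frame. Dividing the segment count by the signal duration gives the per-time value stored in step 540.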

FIG. 6 illustrates an evaluation and feedback method based on similarity, words used, volume, speed, and intonation data in the speech skill feedback system according to an embodiment of the present invention. The method includes:
- a step (600) of inputting the similarity (350), words used (450), volume (520), speed (540), and intonation (580) analyzed in the previous steps;
- a step (620) of applying the input data to a learning model (610) to evaluate and output the speaker's speech level; and
- a step (630) of analyzing the evaluation result and feeding it back to the speaker.
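The text applies the stored features to a trained learning model (610) but does not disclose that model. The sketch below is only a placeholder for it:
- each feature is scored by its z-score against hypothetical reference statistics for high-quality speech;
- large deviations are mapped to advice strings, standing in for the feedback step 630;
- only a shortfall in script similarity is penalized; and
- a pitch-range feature substitutes for the intonation result.

All numbers, feature names, and messages are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpeechFeatures:
    similarity: float         # script vs. transcript similarity, 0..1 (step 350)
    loudness_db: float        # mean level in dB (step 520)
    syllables_per_min: float  # speech rate (step 540)
    pitch_range_hz: float     # pitch spread, a stand-in for the intonation result (step 580)


# Hypothetical (mean, std) of each feature in high-quality reference speech.
# Illustrative placeholders only, not values from the patent.
REFERENCE = {
    "similarity": (0.95, 0.03),
    "loudness_db": (-20.0, 4.0),
    "syllables_per_min": (300.0, 40.0),
    "pitch_range_hz": (80.0, 25.0),
}

# Features for which only a shortfall is penalized.
ONE_SIDED_LOW = {"similarity"}

ADVICE = {
    ("similarity", "low"): "Pronounce words more clearly; several did not match the script.",
    ("loudness_db", "low"): "Speak louder.",
    ("loudness_db", "high"): "Lower your volume slightly.",
    ("syllables_per_min", "low"): "Speak a little faster.",
    ("syllables_per_min", "high"): "Slow down.",
    ("pitch_range_hz", "low"): "Vary your intonation more; the delivery sounds flat.",
    ("pitch_range_hz", "high"): "Keep your pitch steadier.",
}


def evaluate(features: SpeechFeatures, z_limit: float = 1.5) -> tuple[float, list[str]]:
    """Return a 0-100 score from capped z-score penalties, plus advice for large deviations."""
    penalties, feedback = 0.0, []
    for name, (mean, std) in REFERENCE.items():
        z = (getattr(features, name) - mean) / std
        if name in ONE_SIDED_LOW:
            z = min(z, 0.0)  # exceeding the reference similarity is never penalized
        penalties += min(abs(z), 3.0)  # cap each feature's penalty at 3 std
        if abs(z) > z_limit:
            msg = ADVICE.get((name, "high" if z > 0 else "low"))
            if msg:
                feedback.append(msg)
    # Map the total penalty (0 .. 3 * number of features) linearly onto 100 .. 0.
    score = 100.0 - penalties * 100.0 / (3.0 * len(REFERENCE))
    return round(score, 1), feedback
```

Under these placeholder statistics, a speaker at 400 syllables per minute whose other features sit exactly at the reference means gets `(79.2, ["Slow down."])`.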

Claims

A speech skill feedback system, comprising:
a voice input unit for receiving the speaker's speech in real time or for loading a previously stored voice file;
a script file storing a script necessary for voice input;
a pronunciation accuracy analysis unit that converts the loaded voice file into text and compares it with the content of the script to determine whether it was transcribed as the correct sentence, thereby analyzing whether the speech was pronounced accurately;
a word analysis unit that analyzes, using a word cloud technique, the words extracted by the pronunciation accuracy analysis unit;
a pitch analysis unit that analyzes the change over time in the volume of the voice included in the voice file;
an intonation analysis unit that analyzes the frequency components included in the voice file and analyzes the speaker's intonation in conjunction with the data of the pitch analysis unit;
a speed analysis unit that analyzes the voice included in the voice file to derive the number of syllables and calculates the number of syllables per unit time to analyze the speech rate;
a storage unit storing the data analyzed by the analysis units and a learning model built by analyzing and modeling high-quality speech data;
an evaluation unit for evaluating the speaker's speech level from the stored analysis data; and
a recommendation unit that, based on the evaluation result, provides recommendation feedback to help the speaker improve his or her speech skill.

A method of operating the pronunciation accuracy analysis unit of claim 1, the method comprising:
an STT step of reading voice data from the speaker's voice or a stored voice file and converting it into text;
analyzing the words by applying the BERT algorithm;
calculating a similarity between the script and the analyzed words; and
storing the calculated similarity.

A method of operating the word analysis unit of claim 1, the method comprising:
extracting only the words from the sentences obtained in the STT step;
a preprocessing step of removing words that are not needed for the analysis;
counting the extracted words;
visualizing the counted words as a word cloud;
storing the count data;
converting the frequency-based relationships between words into vectors in order to analyze how the extracted words are associated; and
generating a learning model using the resulting vector values.

A method of operating the pitch analysis unit, the intonation analysis unit, and the speed analysis unit of claim 1 to analyze volume, speech rate, and intonation, the method comprising:
measuring the amplitude of the sound by reading voice data from the speaker's voice or a stored voice file;
converting the measured amplitude into a decibel value;
storing the converted sound level;
counting the words by searching for sections in which the loudness changes sharply;
storing the number of words per reference time unit;
converting the voice data into the frequency domain;
counting the voice data for each frequency;
determining the intonation by analyzing the sound characteristics using the volume, the number of words, and the per-frequency distribution data; and
classifying the speaker based on the analyzed intonation.

A method of operating the evaluation unit and the recommendation unit of claim 1 to perform evaluation and feedback based on similarity, words used, volume, speed, and intonation data, the method comprising:
inputting the similarity, words used, volume, speed, and intonation analyzed in the previous steps;
evaluating and outputting the speaker's speech level by applying the input data to the learning model; and
analyzing the evaluation result and giving feedback to the speaker.