KR20210147985A

KR20210147985A - Method and system for analyzing fluency using big data

Info

Publication number: KR20210147985A
Application number: KR1020210069571A
Authority: KR
Inventors: 고연정; 윤재민
Original assignee: 보리 주식회사
Priority date: 2020-05-29
Filing date: 2021-05-28
Publication date: 2021-12-07
Also published as: KR102533368B1

Abstract

The present invention relates to a method and system for analyzing fluency using big data and, more specifically, to a method and system for automatically analyzing the disfluencies characteristic of stuttering, such as repetitions, prolongations, blocks, revisions, and interjections. To this end, a client comprises: a video/audio transmission unit that transmits video and audio to a server; an audio transmission unit that transmits audio to the server; and a result display unit that displays a fluency analysis result. The server comprises: a video/audio reception unit; an audio reception unit; an audio analysis unit that separates only the audio from an audio or video stream; a preprocessing unit that removes noise from the audio; a minimum speech unit separation unit that splits the speech into minimum units for detection; a speech feature extraction unit; a primary repetition/prolongation/interjection prediction unit that automatically predicts the disfluency type of one-syllable repetitions, prolongations, and interjections from the audio; a speech recognition unit; a language analysis unit that performs morphological, syntactic, and semantic analysis; a secondary repetition/prolongation/interjection prediction unit that automatically predicts the disfluency type by recognizing repetitions of words, phrases, and clauses of two or more syllables, and prolongations and interjections of two or more syllables; a revision/block prediction unit that automatically predicts the revision and block disfluency types; and a result analysis unit that calculates the fluency analysis result.

Description

Method and system for analyzing fluency using big data

The present invention relates to a method and system for analyzing fluency, and more particularly to a method and system for automatically analyzing phenomena in which the flow of speech is not smooth, such as the repetition, prolongation, and block of speech that characterize stuttering.

Fluency disorders are disorders in which an unnatural flow of speech makes communication difficult. They are broadly divided into stuttering and cluttering.

Stuttering refers to a flow of speech that is interrupted abnormally often, an irregular speech rate, or unnecessary effort when speaking.

Cluttering refers to a phenomenon in which the speech rate gradually increases while speaking, so that articulation breaks down.

The Paradise-Fluency Assessment (P-FA) is currently available for analyzing fluency. With this tool, a person transcribes the speech by hand and analyzes normal disfluency (ND) and abnormal disfluency (AD).

Normal disfluency (ND) comprises hesitation (H), interjection (I), unfinished and/or revision (Ur), and repetition 1 (R1). Abnormal disfluency (AD) comprises abnormal hesitation (Ha), abnormal interjection (Ia), abnormal unfinished and/or revision (URa), abnormal repetition 1 (R1a), repetition 2 (R2), and dysrhythmic phonation (DP).

However, this method takes considerable time and money, because a person must listen to the audio and transcribe it into text before analyzing ND and AD.

As prior art, a speech therapy device for children with developmental disabilities has already been developed. It searches a database of speech therapy histories, extracts the few histories most similar to the current child's symptoms and speech state, and reworks them to create and provide a new individualized speech therapy curriculum (see KR Patent Publication 2019-0031128).

KR Patent Publication No. 2019-0031128 (2019.03.25)

The present invention has been proposed to solve the above problems of the prior art. Its object is to provide a method and system for analyzing fluency using big data that reduce human labor: speech from intervals in which stuttering occurred is collected and tagged with the stuttering type of each interval, features of those intervals are extracted and learned, and stuttered intervals and their types are then automatically extracted and predicted from newly input speech.

To achieve this object, the system for analyzing fluency using big data comprises a client and a server.

Specifically, the client comprises: a video/audio transmission unit that captures video and audio in real time through a camera and a microphone and transmits them to the server as a stream; an audio transmission unit that captures audio in real time through a microphone and transmits it to the server as a stream; and a result display unit that displays the fluency analysis result as a graph.

The server comprises: a video/audio reception unit that receives, in real time, the video and audio transmitted from the client's video/audio transmission unit; an audio reception unit that receives, in real time, the audio transmitted from the audio transmission unit; an audio analysis unit that separates only the audio from the audio or video stream; a preprocessing unit that removes noise from the audio; a minimum speech unit separation unit that splits the speech into minimum units for detection; a speech feature extraction unit that extracts features from the speech; a primary repetition/prolongation/interjection prediction unit that automatically predicts the disfluency type of one-syllable repetitions, prolongations, and interjections from the audio; a speech recognition unit that recognizes the speech; a language analysis unit that performs morphological, syntactic, and semantic analysis on the recognized text; a secondary repetition/prolongation/interjection prediction unit that, based on speech recognition and language analysis, automatically predicts the disfluency type of repetitions of words, phrases, and clauses of two or more syllables and of prolongations and interjections of two or more syllables; a revision/block prediction unit that, based on speech recognition and language analysis, automatically predicts the revision and block disfluency types; and a result analysis unit that calculates the fluency analysis result.

To collect and learn disfluency types and to generate a fluency evaluation model, the system further comprises: a repetition/prolongation/interjection input unit that inputs speech containing repetitions, prolongations, and interjections; a training speech feature extraction unit that extracts the speech features used for training; a machine learning unit that learns repetition, prolongation, and interjection speech by supervised learning; and a fluency evaluation model generation unit that generates a model capable of evaluating fluency. When speech is input, the disfluency type of that speech is predicted automatically.

In addition, fluency is analyzed automatically whether video and audio are input together or audio alone is input. One-syllable repetitions, prolongations, and interjections are predicted from the audio itself, whereas repetitions of words, phrases, and clauses of two or more syllables, and prolongations and interjections of two or more syllables, are predicted from speech recognition and language analysis.

Meanwhile, the method for analyzing fluency using big data according to the present invention comprises the steps of: capturing video and audio in real time through the client's camera and microphone and transmitting them to the server as a stream; receiving, at the server and in real time, the video and audio transmitted from the video/audio transmission unit; capturing audio in real time through the client's microphone and transmitting it to the server as a stream; receiving, at the server and in real time, the audio transmitted from the audio transmission unit; separating only the audio from the audio or video stream; removing noise from the audio; splitting the speech into minimum units for detection; and extracting features from the speech.

The method further comprises the steps of: automatically predicting the disfluency type of one-syllable repetitions, prolongations, and interjections from the audio; recognizing the speech; performing morphological, syntactic, and semantic analysis on the recognized text; automatically predicting, based on speech recognition and language analysis, disfluency types such as repetitions of words, phrases, and clauses of two or more syllables and prolongations and interjections of two or more syllables; automatically predicting, based on speech recognition and language analysis, disfluency types such as revisions and blocks; calculating the fluency analysis result; and displaying the fluency analysis result as a graph on the client.

To collect and learn disfluency types and to generate a fluency evaluation model, the method also comprises the steps of: inputting speech containing repetitions, prolongations, and interjections; extracting the speech features used for training; learning repetition, prolongation, and interjection speech by supervised learning; and generating a fluency evaluation model capable of evaluating fluency. When speech is input, the disfluency type of that speech is predicted automatically.

In addition, the method analyzes fluency automatically whether video and audio are input together or audio alone is input, and comprises the steps of: automatically predicting the disfluency type of one-syllable repetitions, prolongations, and interjections from the audio; and automatically predicting, from speech recognition and language analysis, the disfluency type of repetitions of words, phrases, and clauses of two or more syllables and of prolongations and interjections of two or more syllables.

The method and system for analyzing fluency using big data according to the present invention, configured as described above, provide the following useful effects.

1) The work in which an expert listened to the speech, transcribed it into text, and evaluated fluency is automated, which reduces the cost and time of fluency evaluation.

2) Fluency is analyzed automatically whether video and audio are input together or audio alone is input.

3) Fluency is analyzed by processing video and audio in real time, without separate video or audio recording.

4) Fluency can be analyzed by collecting video and audio in real time from a variety of multimedia devices such as robots, tablets, smartphones, electronic pens, holograms, digital signage, TVs, and automobiles.

FIG. 1 is a block diagram of a system for analyzing fluency using big data according to the present invention;
FIG. 2 is the summary screen of the result display unit of the system for analyzing fluency using big data;
FIG. 3 is the graph screen of the result display unit of the system for analyzing fluency using big data.

Hereinafter, preferred embodiments in which the object of the present invention can be concretely realized are described in detail with reference to the accompanying drawings. In describing the embodiments, the same names are used for the same components, and redundant descriptions are omitted.

In the present invention, repetition of speech means repeating a syllable, part of a word, a word, a phrase, a clause, or the like several times without being able to stop. For example, in "(아) (아) (아) 아빠" ("(a) (a) (a) appa", meaning "Dad"), the syllable "(아)" is repeated.

Prolongation of speech means that a speech sound is sustained for 0.5 seconds or longer. For example, in "(아~~~) 아빠", the syllable "(아~~~)" is drawn out.
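The 0.5-second criterion for prolongation can be illustrated with a minimal Python sketch. It assumes that an earlier stage has already split the speech into timed segments; the `Segment` type, its fields, and `find_prolongations` are hypothetical names introduced only for this illustration and are not part of the described system.

```python
from dataclasses import dataclass
from typing import List

PROLONGATION_MIN_SEC = 0.5  # threshold stated in the description


@dataclass
class Segment:
    """A timed speech unit (hypothetical structure for illustration)."""
    label: str    # syllable label, assumed to be known
    start: float  # start time in seconds
    end: float    # end time in seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_prolongations(segments: List[Segment],
                       min_sec: float = PROLONGATION_MIN_SEC) -> List[Segment]:
    """Return the segments whose sound is sustained for at least min_sec."""
    return [s for s in segments if s.duration >= min_sec]


# "(아~~~) 아빠": the first syllable is drawn out, the rest are short.
segments = [Segment("아", 0.0, 0.9), Segment("아", 0.9, 1.1), Segment("빠", 1.1, 1.3)]
prolonged = find_prolongations(segments)
```

Only the first segment, which lasts 0.9 seconds, is flagged here. A real detector would also have to confirm that the sustained interval is a single continuous sound, which this sketch does not attempt.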

A block means that the speech sound stops and the speaker cannot continue. An example is "아 아 아 아빠 (no voicing for 2 seconds or more before the next utterance) 오세요" ("a a a appa ... oseyo", meaning "Dad, please come").

Revision refers to the utterance "(이제 공룡)" in "(이제 공룡) 이제 아빠가 책 읽어줘" ("(now dinosaur) now, Dad, read me a book").

Interjection refers to the utterance "(어)" in "아빠가 (어) 오늘 (어) 회사에 갔어" ("Dad (uh) went to work (uh) today").

FIG. 1 is a block diagram of a system for analyzing fluency using big data according to the present invention.

As shown in FIG. 1, the system for analyzing fluency using big data according to the present invention comprises a client (100) and a server (200) that analyze fluency by processing, in real time, video input through a camera and audio input through a microphone.

The client (100) comprises an audio transmission unit (101), a video/audio transmission unit (102), and a result display unit (103).

The server (200) comprises an audio reception unit (201), a video/audio reception unit (202), an audio analysis unit (203), a preprocessing unit (204), a minimum speech unit separation unit (205), a speech feature extraction unit (206), a primary repetition/prolongation/interjection prediction unit (207), a speech recognition unit (208), a language analysis unit (209), a secondary repetition/prolongation/interjection prediction unit (210), a revision/block prediction unit (211), a result analysis unit (212), a repetition/prolongation/interjection speech input unit (213), a training speech feature extraction unit (214), a machine learning unit (215), and a fluency evaluation model generation unit (216).

The video/audio transmission unit (102) of the client (100) captures video and audio in real time through the client-side camera and microphone and transmits them to the server (200) as a stream.

The video/audio reception unit (202) of the server (200) receives, in real time, the video and audio transmitted from the video/audio transmission unit and forwards them to the audio analysis unit.

The audio transmission unit (101) of the client (100) captures audio in real time through the client-side microphone and transmits it to the server (200) as a stream.

The audio reception unit (201) receives, in real time, the audio transmitted from the audio transmission unit (101).

The audio analysis unit (203) separates only the audio from the audio or video stream.

The preprocessing unit (204) removes noise from the audio.

The minimum speech unit separation unit (205) splits the speech into minimum units for detection.

The speech feature extraction unit (206) extracts features from the speech.

The primary repetition/prolongation/interjection prediction unit (207) automatically predicts the disfluency type of one-syllable repetitions, prolongations, and interjections from the audio.

The speech recognition unit (208) recognizes the speech.

The language analysis unit (209) performs morphological, syntactic, and semantic analysis on the recognized text.

The secondary repetition/prolongation/interjection prediction unit (210) automatically predicts, based on speech recognition and language analysis, disfluency types such as repetitions of words, phrases, and clauses of two or more syllables and prolongations and interjections of two or more syllables.

The revision/block prediction unit (211) automatically predicts, based on speech recognition and language analysis, disfluency types such as revisions and blocks.

The result analysis unit (212) calculates the fluency analysis result.

The result display unit (103) of the client (100) displays the fluency analysis result as a graph.
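The order in which the server-side units hand data to one another can be sketched as a simple sequential pipeline. This is only an illustration under stated assumptions: the stage names mirror the unit names and reference numerals above, but every stage body is a placeholder that merely records that it ran, and `make_stage`, `SERVER_STAGES`, and `run_pipeline` are hypothetical names. Whether the real units run strictly in sequence is not stated in the description.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only appends its name to a trace."""
    def stage(state: Dict) -> Dict:
        return {**state, "trace": state.get("trace", []) + [name]}
    return stage


# Names follow units 203 to 212 of the description; the bodies are stubs.
SERVER_STAGES: List[Stage] = [make_stage(name) for name in (
    "audio_analysis_203",
    "preprocessing_204",
    "min_unit_separation_205",
    "feature_extraction_206",
    "primary_prediction_207",
    "speech_recognition_208",
    "language_analysis_209",
    "secondary_prediction_210",
    "revision_block_prediction_211",
    "result_analysis_212",
)]


def run_pipeline(stream: Dict, stages: List[Stage] = SERVER_STAGES) -> Dict:
    """Pass the received stream through each stage in order."""
    state = dict(stream)
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline({"source": "audio-only"})
```

Because both the audio-only path (201) and the video/audio path (202) feed the same audio analysis unit, the sketch starts at that unit and treats the input source as plain metadata.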

To collect and learn disfluency types and to generate a fluency evaluation model, the repetition/prolongation/interjection input unit (213) inputs speech containing repetitions, prolongations, and interjections.

The training speech feature extraction unit (214) extracts the speech features used for training.

The machine learning unit (215) learns repetition, prolongation, and interjection speech by supervised learning.

The fluency evaluation model generation unit (216) generates a model capable of evaluating fluency.

As a result, when speech is input, the disfluency type of that speech can be predicted automatically.
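The description does not name the features or the learning algorithm, so the following is only a minimal sketch of supervised learning on labeled disfluency samples. It swaps in a nearest-centroid classifier as a stand-in, and the two-dimensional features (duration in seconds, number of repeated onsets) and all sample values are made up for illustration.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, disfluency label)


def train_centroids(samples: List[Sample]) -> Dict[str, List[float]]:
    """Supervised step: average the feature vectors of each labeled class."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(model: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label whose centroid is closest to the input features."""
    return min(model, key=lambda label: dist(model[label], features))


# Made-up training data: (duration in seconds, repeated onsets) with labels.
training = [
    ((0.2, 3.0), "repetition"),
    ((0.3, 4.0), "repetition"),
    ((0.9, 1.0), "prolongation"),
    ((1.1, 1.0), "prolongation"),
    ((0.3, 1.0), "interjection"),
    ((0.2, 1.0), "interjection"),
]
model = train_centroids(training)
predicted = predict(model, (0.95, 1.0))
```

With these toy values a long, non-repeated sound falls closest to the prolongation centroid. A production model would use real acoustic features and a stronger classifier; the point here is only the train-then-predict split between units 213 to 216 and the prediction units.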

In addition, the system for analyzing fluency using big data according to the present invention analyzes fluency automatically whether video and audio are input together or audio alone is input.

One-syllable repetitions, prolongations, and interjections are predicted from the audio itself, whereas repetitions of words, phrases, and clauses of two or more syllables, and prolongations and interjections of two or more syllables, are predicted from speech recognition and language analysis.
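The split between the two prediction paths depends on the syllable count of the disfluent unit. The sketch below shows one way to express that rule, assuming the unit is already available as Hangul text, which the real system may not have at this point. Both function names and the returned path labels are hypothetical.

```python
def count_hangul_syllables(text: str) -> int:
    """Count precomposed Hangul syllable characters (U+AC00 to U+D7A3)."""
    return sum(1 for ch in text if "\uac00" <= ch <= "\ud7a3")


def choose_analysis_path(unit: str) -> str:
    """One syllable goes to acoustic prediction, longer units to ASR plus language analysis."""
    if count_hangul_syllables(unit) <= 1:
        return "acoustic"
    return "asr_and_language"


paths = {unit: choose_analysis_path(unit) for unit in ("아", "여기", "지~~~~금")}
```

Each precomposed Hangul character is one syllable, so counting characters in that Unicode range is enough for this illustration; prolongation marks such as "~" are ignored by the count.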

The method for analyzing fluency using big data according to the present invention first captures video and audio in real time through the client-side camera and microphone and transmits them to the server as a stream.

The video and audio transmitted from the video/audio transmission unit are received in real time.

Audio is captured in real time through the client-side microphone and transmitted to the server as a stream.

The audio transmitted from the audio transmission unit is received in real time.

Only the audio is separated from the audio or video stream.

Noise is removed from the audio.

The speech is split into minimum units for detection.

Features are extracted from the speech.

The disfluency type of one-syllable repetitions, prolongations, and interjections is predicted automatically from the audio.

The speech is recognized.

Morphological, syntactic, and semantic analysis is performed on the recognized text.

Based on speech recognition and language analysis, disfluency types such as repetitions of words, phrases, and clauses of two or more syllables and prolongations and interjections of two or more syllables are predicted automatically.

Based on speech recognition and language analysis, disfluency types such as revisions and blocks are predicted automatically.

The fluency analysis result is calculated.

Finally, the fluency analysis result is displayed as a graph.

In addition, to collect and learn disfluency types and to generate a fluency evaluation model, the present invention inputs speech containing repetitions, prolongations, and interjections.

The speech features used for training are extracted.

Repetition, prolongation, and interjection speech is learned by supervised learning.

A fluency evaluation model capable of evaluating fluency is generated.

As a result, when speech is input, the disfluency type of that speech is predicted automatically.

The present invention also analyzes fluency automatically whether video and audio are input together or audio alone is input.

One-syllable repetitions, prolongations, and interjections are predicted automatically from the audio.

Repetitions of words, phrases, and clauses of two or more syllables, and prolongations and interjections of two or more syllables, are predicted automatically from speech recognition and language analysis.

Table 1

Category | Example | Analysis method | Analysis details
Repetition (1 syllable) | (아) (아) (아) 아빠 | Acoustic speech analysis | -
Prolongation (1 syllable) | (아~~~) 아빠 | Acoustic speech analysis | -
Interjection (1 syllable) | 아빠가 (어) 오늘 (어) 회사에 갔어 | Acoustic speech analysis | -
Repetition (2 or more syllables) | (여기) (여기) 와 | Speech recognition + language analysis (morphological, syntactic, semantic) | -
Prolongation (2 or more syllables) | (지~~~~금) 아빠 | Speech recognition + language analysis (morphological, syntactic, semantic) | -
Interjection (2 or more syllables) | 아빠 (있잖아요) 장수풍뎅이가 나왔어요. | Speech recognition + language analysis (morphological, syntactic, semantic) | -
Revision | (이제 공룡) 이제 아빠가 책 읽어줘 | Speech recognition + language analysis (morphological, syntactic, semantic) | The part excluded from the semantic relations of the sentence (nominal part, predicate part, modifier part, conversational elements)
Block | (아) (아) (아) 아빠 (no voicing for 2 seconds or more before the next utterance) 오세요 | Speech recognition + language analysis (morphological, syntactic, semantic) | -

상기 표 1에서 알 수 있는 바와 같이, 본 발명에서 "말의 반복"은 음절이나 낱말의 일부, 낱말, 구, 절 등을 스스로 멈추지 못하고 여러 차례 되풀이 하는 것을 말한다. As can be seen from Table 1, in the present invention, "repeating words" refers to repeating a syllable or a part of a word, a word, a phrase, a clause, etc. several times without stopping by itself.

"1음절 기준 말의 반복"은 "(아) (아) (아) 아빠"에서 "(아)" 음절이 반복해서 발생한 경우에 해당한다."Repetition of words based on one syllable" corresponds to a case in which the "(ah)" syllable repeatedly occurs in "(a) (ah) (ah) daddy".

"1음절 기준 말의 연장"은 말소리가 0.5초 이상 길게 이어져 계속되는 상태를 말한다. 즉, "(아~~~) 아빠"에서 "(아~~~)"와 같이 연장되어 발음되는 것을 말한다."Extension of speech based on one syllable" refers to a state in which speech sounds are continued for more than 0.5 seconds. In other words, it means that it is pronounced as "(ah~~~)" from "(ah~~~) daddy".

"1음절 기준 간투사"의 경우, "아빠가 (어) 오늘 (어) 회사에 갔어"에서 "(어)" 음절이 발생한 경우에 해당한다.In the case of “one-syllable-based liver projection”, it corresponds to the case where the “(uh)” syllable occurs in “Dad (uh) went to (uh) today (uh) work”.

"1음절 기준 말의 반복, 말의 연장, 간투사"의 경우, 입력 음성에서 특징을 추출한 뒤에, 학습모델과 비교해서 "말의 반복"인지 "말의 연장"인지 "간투사"인지 예측할 수 있다.In the case of "one syllable-based speech repetition, speech extension, liver projection", after extracting features from the input speech, it is possible to predict whether "speech repetition", "speech extension" or "liver projection" is compared with the learning model.

"말의 반복", "말의 연장", "간투사"가 2음절 이상으로 발생할 때는 음성분석이 아니라, 음성인식결과와 언어분석결과를 통해서 추론할 수 있다.When "repetition of speech", "prolongation of speech", and "liver projection" occur with more than two syllables, it can be inferred through speech recognition results and language analysis results, not speech analysis.

예들들자면, "말의 반복"의 경우, "여기"가 음성인식 후, 형태소분석에 의해 대명사로 분석되기 때문에, 반복으로 판정할 수 있다.For example, in the case of "repetition of speech", since "here" is analyzed as a pronoun by morpheme analysis after voice recognition, it can be determined as repetition.

"말의 연장"인 "지~~~~금"의 경우, 음성인식 후, 합친 단어가 "지금"으로, 형태소분석에 의해 명사로 분석되기 때문에, "말의 연장"으로 판정할 수 있다.In the case of “now~~~~ now”, which is “extension of speech,” it can be determined as “extension of speech” because the combined word is analyzed as “now” and into a noun by morphological analysis after voice recognition.

"간투사"의 경우 "있잖아요"가 형태소분석에 의해 일반동사 "있"으로 분석되고, "있잖아요"가 간투사 사전에 등록되어 있으므로, "간투사"로 판정할 수 있다.In the case of "liver projection", "you are" is analyzed as a general verb "to be" by morphological analysis, and "you are" is registered in the liver projection dictionary, so it can be determined as "liver projection".

In the case of "correction of speech", syntactic analysis and semantic analysis derive from the sentence, for the nominal part (which plays the role of subject or object in the sentence), the relations of agent, experiencer, possessor, companion, beneficiary, object, entity, and quotation/creation, and, for the predicate part, the relations of action, description, state description, entity description, negative description, nominal modification, and predicate modification.

For the modifier part (which modifies nominals, predicates, and other modifiers in the sentence), the relations of place, instrument, negation, time, reason, condition, comparison, recurrence, and concession are derived from the sentence.

In addition, sentence relations are derived for dialogue elements such as attention-getting, asking back/confirming, exclamation, yes/no answers, emphasis, accompanying sounds, greetings, conjunctions, and automatic phrases.

Any part that belongs to none of the nominal part, predicate part, modifier part, or dialogue element part above is judged to be a "correction of speech" candidate.

Therefore, "correction of speech" in Table 1 refers to the utterance "(이제 공룡)" ("now dinosaur") in "(이제 공룡) 이제 아빠가 책 읽어줘" ("(now dinosaur) now Daddy, read me a book").

That is, the sentence means that at a time (이제, "now") an agent (아빠, "Daddy") performs an action (읽다, "read") on an object (책, "book"). The preceding "이제 공룡" is an utterance that does not belong to any of the individual semantic types that make up this sentence.
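The candidate rule can be illustrated with a short sketch. It assumes that semantic analysis has already attached a role label to each token and left `None` where no role applies; the function name `correction_candidates` and the hand-written annotation below are hypothetical and stand in for the output of the language analysis unit.

```python
def correction_candidates(annotated_tokens):
    """Group consecutive tokens that carry no semantic role into candidate spans.

    annotated_tokens: list of (token, role) pairs, where role is None when the
    token belongs to no nominal, predicate, modifier, or dialogue-element type.
    """
    spans = []
    current = []
    for token, role in annotated_tokens:
        if role is None:
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


if __name__ == "__main__":
    # Hand-written annotation of the Table 1 example (assumed, for illustration).
    sentence = [
        ("이제", None),
        ("공룡", None),
        ("이제", "time"),
        ("아빠가", "agent"),
        ("책", "object"),
        ("읽어줘", "action"),
    ]
    print(correction_candidates(sentence))
```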

Blockage of speech is a state in which the speech sound stops and can no longer be continued, for example "아 아 아 아빠 (the next sentence is uttered after 2 seconds or more without vocalization) 오세요" ("D- D- D- Daddy ... come").
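A minimal sketch of the two-second rule follows, assuming the speech recognizer returns a start and end time in seconds for each word. The function name `find_blockages`, the tuple layout, and the timings in the example are assumptions made for illustration; only the 2-second threshold comes from the description.

```python
BLOCK_THRESHOLD_S = 2.0  # silence of 2 seconds or more, as stated in the description


def find_blockages(words, threshold=BLOCK_THRESHOLD_S):
    """Return (previous_word, next_word, gap_seconds) for each long silence.

    words: list of (word, start_s, end_s) tuples in utterance order.
    """
    blockages = []
    for (prev_word, _, prev_end), (next_word, next_start, _) in zip(words, words[1:]):
        gap = next_start - prev_end
        if gap >= threshold:
            blockages.append((prev_word, next_word, gap))
    return blockages


if __name__ == "__main__":
    # Illustrative timings for "아 아 아 아빠 ... 오세요".
    utterance = [
        ("아", 0.0, 0.25),
        ("아", 0.5, 0.75),
        ("아빠", 1.0, 1.5),
        ("오세요", 4.0, 4.5),
    ]
    print(find_blockages(utterance))
```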

FIG. 2 is a summary screen of the result display unit of the system for analyzing fluency using big data.

The summary screen of the result display unit of the present invention can show the levels of language comprehension and language expression in the form of a developmental age.

It can also display the level of repetition of speech, the level of extension of speech, the level of blockage of speech, and the level of interjection.

The index is the average occurrence rate per 100 utterances.

FIG. 3 is a graph screen of the result display unit of the system for analyzing fluency using big data.

The graph screen of the result display unit of the present invention displays graphs of the degree of non-fluency, such as the level of repetition of speech, the level of extension of speech, the level of blockage of speech, and the level of interjection, by age, by period, and by level.

For example, the current treatment subject has a chronological age of 6 years and a language development age of 31 months, and the level of repetition of speech is 50%, meaning that "repetition of speech" occurs about 50 times on average per 100 utterances.

In comparison with peers, it can be seen that "repetition of speech" occurs at an average rate of 8% per 100 utterances in typically developing children and about 20% in children with language development disorders.
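The index arithmetic in this example can be written out as a short sketch. The function names `disfluency_index` and `compare_with_peers` are hypothetical; the peer averages of 8 and 20 are the figures quoted above, used here as default values for illustration only.

```python
def disfluency_index(event_count: int, utterance_count: int) -> float:
    """Occurrences of one non-fluency type per 100 utterances."""
    if utterance_count <= 0:
        raise ValueError("utterance_count must be positive")
    return 100.0 * event_count / utterance_count


def compare_with_peers(index: float, typical_mean: float = 8.0,
                       disorder_mean: float = 20.0) -> dict:
    """Difference between a subject's index and the two peer averages."""
    return {
        "vs_typical": index - typical_mean,
        "vs_language_disorder": index - disorder_mean,
    }


if __name__ == "__main__":
    index = disfluency_index(event_count=50, utterance_count=100)
    print(index, compare_with_peers(index))
```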

The statistics by period show, by day, week, month, and year, how the current degree of non-fluency is improving.

Preferred embodiments of the present invention have been described above, and it is obvious to those of ordinary skill in the art that, in addition to the embodiments described above, the present invention can be embodied in other specific forms without departing from its spirit or scope.

Therefore, the embodiments described above are to be regarded as illustrative rather than restrictive, and accordingly the present invention is not limited to the above description but may be modified within the scope of the appended claims and their equivalents.

100...Client 200...Server

Claims

1. A system for analyzing fluency using big data, the system comprising:
a client comprising:
a video/audio transmission unit that captures video and audio in real time through a camera and a microphone and transmits them to a server as a stream;
a voice transmission unit that captures voice in real time through a microphone and transmits it to the server as a stream; and
a result display unit that displays a fluency analysis result as a graph; and
the server comprising:
a video/audio reception unit that receives, in real time, the video and audio transmitted from the video/audio transmission unit of the client;
a voice reception unit that receives, in real time, the voice transmitted from the voice transmission unit;
a voice analysis unit that separates only the voice from the voice or video stream;
a preprocessing unit that removes noise from the voice;
a minimum voice unit separation unit that separates the voice into minimum units for detection;
a voice feature extraction unit that extracts features from the voice;
a primary repetition, extension, and interjection prediction unit that automatically predicts, based on the voice, the non-fluency type of one-syllable repetition, extension, and interjection;
a voice recognition unit that recognizes the voice;
a language analysis unit that performs morphological analysis, syntactic analysis, and semantic analysis on the speech-recognized text;
a secondary repetition, extension, and interjection prediction unit that automatically predicts, based on speech recognition and language analysis, the non-fluency type of repetition of words, phrases, and clauses of two or more syllables, extension of two or more syllables, and interjection;
a correction and blockage prediction unit that automatically predicts, based on speech recognition and language analysis, the non-fluency types of correction and blockage; and
a result analysis unit that calculates the fluency analysis result.

2. The system of claim 1, further comprising, in order to collect and learn non-fluency types and generate a fluency evaluation model:
a repetition, extension, and interjection input unit for inputting a voice to which repetition, extension, and interjection are applied;
a learning voice feature extraction unit for extracting voice features to be used for learning;
a machine learning unit that learns the repetition, extension, and interjection voices through supervised learning; and
a fluency evaluation model generation unit that generates a fluency evaluation model capable of evaluating fluency,
wherein, when a voice is input, the non-fluency type of that voice is automatically predicted.

3. The system according to claim 1 or 2, wherein:
fluency is analyzed automatically whether video and audio are input together or only audio is input;
the non-fluency type of one-syllable repetition, one-syllable extension, and one-syllable interjection is automatically predicted based on the voice; and
the non-fluency type of repetition of words, phrases, and clauses of two or more syllables, extension of two or more syllables, and interjection is automatically predicted based on speech recognition and language analysis.

4. A method for analyzing fluency using big data, the method comprising:
capturing, by a client, video and audio in real time through a camera and a microphone and transmitting them to a server as a stream;
receiving, by the server, in real time the video and audio transmitted from the video/audio transmission unit;
capturing, by the client, voice in real time through a microphone and transmitting it to the server as a stream;
receiving, by the server, in real time the voice transmitted from the voice transmission unit;
separating only the voice from the voice or video stream;
removing noise from the voice;
separating the voice into minimum units for detection;
extracting features from the voice;
automatically predicting, based on the voice, the non-fluency type of one-syllable repetition, extension, and interjection;
recognizing the voice;
performing morphological analysis, syntactic analysis, and semantic analysis on the speech-recognized text;
automatically predicting, based on speech recognition and language analysis, the non-fluency type of repetition of words, phrases, and clauses of two or more syllables, extension of two or more syllables, and interjection;
automatically predicting, based on speech recognition and language analysis, the non-fluency types of correction and blockage;
calculating a fluency analysis result; and
displaying, by the client, the fluency analysis result as a graph.

5. The method of claim 4, further comprising, in order to collect and learn non-fluency types and generate a fluency evaluation model:
inputting a voice to which repetition, extension, and interjection are applied;
extracting voice features to be used for learning;
learning the repetition, extension, and interjection voices through supervised learning; and
generating a fluency evaluation model capable of evaluating fluency,
wherein, when a voice is input, the non-fluency type of that voice is automatically predicted.

6. The method according to claim 4 or 5, wherein fluency is analyzed automatically whether video and audio are input together or only audio is input, the method comprising:
automatically predicting, based on the voice, the non-fluency type of one-syllable repetition, one-syllable extension, and one-syllable interjection; and
automatically predicting, based on speech recognition and language analysis, the non-fluency type of repetition of words, phrases, and clauses of two or more syllables, extension of two or more syllables, and interjection.