KR20200085041A

KR20200085041A - Language rehabilitation based vocal voice evaluation apparatus and method thereof

Info

Publication number: KR20200085041A
Application number: KR1020190001073A
Authority: KR
Inventors: 최성준; 이건수; 남윤영; 홍경훈
Original assignee: 순천향대학교 산학협력단
Priority date: 2019-01-04
Filing date: 2019-01-04
Publication date: 2020-07-14
Also published as: KR102188264B1

Abstract

The present invention relates to a speech-rehabilitation-based apparatus and method for evaluating uttered voice. More specifically, the apparatus and method extract features for a plurality of attributes of the pronunciation uttered by a person with a speech disorder, whose vocal characteristics differ from those of a typical speaker, measure the distance from the standard features in the attribute space defined by those attributes, and determine the presence and degree of a speech disorder from the average value obtained by applying each attribute's weight to that attribute's distance.

Description

Speech rehabilitation based vocal voice evaluation apparatus and method thereof

The present invention relates to a speech-rehabilitation-based voice evaluation apparatus and method. More specifically, it relates to an apparatus and method that extract features for a plurality of attributes of the pronunciation uttered by a person with a speech disorder, whose vocal characteristics differ from those of a typical speaker, measure the distance from the standard features in the attribute space defined by those attributes, and determine the presence and degree of a speech disorder from the average value obtained by applying each attribute's weight to that attribute's distance.

In general, most existing techniques for processing the human voice have been developed for the following two goals.

The first goal is to listen to a voice and identify the speaker; the second is to recognize what is being said.

The technique for the first goal finds the vocal characteristics unique to each individual and then finds the owner of the discovered vocal pattern. The technique for the second goal removes the individual characteristics and determines which word was pronounced from the remaining vocal information.

Current speech-recognition research is dominated by these techniques. Building on them, techniques that use speech recognition, that is, the second technique, to test for conditions such as speech disorders are also being studied.

However, as described above, conventional speech-disorder testing is based on speech recognition and only detects impairment that arises from a failure to express words accurately. It cannot start from the case where speech recognition fails and evaluate the uttered voice by judging how far the pronunciation itself deviates from standard vocalization.

Korean Registered Patent Publication No. 10-1804389 (announced 2017.12.04.)

Accordingly, an object of the present invention is to provide a speech-rehabilitation-based voice evaluation apparatus and method that extract features for a plurality of attributes of the pronunciation uttered by a person with a speech disorder, whose vocal characteristics differ from those of a typical speaker, measure the distance from the standard features in the attribute space defined by those attributes, and determine the presence and degree of a speech disorder from the average value obtained by applying each attribute's weight to that attribute's distance.

To achieve the above object, a speech-rehabilitation-based voice evaluation apparatus according to the present invention comprises: a storage unit including a standard voice DB that stores per-attribute standard voice feature information for standard voices and an evaluation criterion DB that stores evaluation criterion level values; an audio processing unit that receives the voice uttered by a speaker and outputs voice data; and a control unit that receives the voice data, divides it into a plurality of frames, samples and normalizes it frame by frame, detects features for a plurality of attributes in each normalized frame, calculates the Euclidean distance (similarity) between each detected per-attribute feature and the per-attribute feature information of the corresponding standard voice stored in advance in the standard voice DB, calculates an evaluation value that combines the calculated per-attribute Euclidean distances, and then compares the calculated evaluation value with the evaluation criterion level values of the evaluation criterion DB to evaluate the speaker's uttered voice.

The control unit comprises: a voice signal processing unit that acquires voice data through the audio processing unit and outputs it; a voice feature extraction unit that receives the voice data from the voice signal processing unit, divides it into a plurality of frames, samples and normalizes it frame by frame, and detects features for a plurality of attributes in each normalized frame; a Euclidean distance calculation unit that calculates the distance (similarity) between each detected per-attribute feature and the per-attribute feature of the standard voice stored in advance in the standard voice DB; an evaluation value calculation unit that calculates an evaluation value combining the calculated per-attribute distances; and a voice evaluation unit that compares the calculated evaluation value with the evaluation criterion level values of the evaluation criterion DB to evaluate the speaker's uttered voice.

The voice feature extraction unit comprises: a sampling unit that divides the voice data into a plurality of frames and samples and normalizes them frame by frame; an energy spectrum acquisition unit that performs a short-term Fourier transform (Short Term Fourier Transform) on the normalized frames to obtain a power spectrum of the voice data corresponding to the uttered voice; a per-attribute feature extraction unit that extracts the energy of each attribute in each frequency band of the power spectrum; a per-band log unit that takes the logarithm of the per-band energy of each attribute to calculate log values; and a discrete cosine transform calculation unit that performs, for each attribute, a discrete cosine transform on the curve formed by the consecutive per-band log values and outputs the discrete cosine transform values as feature values.

The per-attribute feature extraction unit comprises: an MFCC feature extraction unit that applies a mel-scale (Mel Scale) filter bank to the power spectrum to calculate the energy of the auditory-based attribute of the uttered voice and sums the calculated energy to extract a first feature; an LPCC feature extraction unit that applies a linear-scale (Linear Scale) filter bank to the power spectrum to calculate the energy of the vocal-tract-based attribute of the uttered voice and sums the calculated energy to extract a second feature; and a RASTA-PLP (Relative Spectral-Perceptual Linear Prediction) feature extraction unit that applies a Bark-scale (Bark Scale) filter bank and a noise filter to the power spectrum to calculate the energy of the uttered voice with background noise removed and sums the calculated energy to extract a third feature.

The standard voice DB stores standard voice feature information for predefined words, and the control unit sequentially outputs one or more of the predefined words through the speaker of the audio processing unit to prompt the speaker to utter each output word.

The words are U-TAP words.

The evaluation value calculation unit calculates the evaluation value by applying the per-attribute weights for MFCC, LPCC, and RASTA-PLP (MFCC->w1, LPCC->w2, RASTA-PLP->w3) and the per-attribute Euclidean distances to Equation 2 below.

[Equation 2]

Here, Feature_similarity_i is the Euclidean distance of each attribute.

The standard voice DB stores standard voice feature information and per-word weights for predefined words. The control unit sequentially outputs one or more of the predefined words through the speaker of the audio processing unit to prompt the speaker to utter each output word, and then applies the weight corresponding to the word of the input voice, according to the word that was prompted.

To achieve the above object, a speech-rehabilitation-based voice evaluation method according to the present invention comprises: a voice acquisition process in which a control unit acquires, through an audio processing unit, voice data for the voice uttered by a speaker; a per-attribute feature detection process in which the control unit divides the voice data into a plurality of frames, samples and normalizes it frame by frame, and detects features for a plurality of attributes in each normalized frame; a Euclidean distance calculation process in which the control unit calculates the Euclidean distance (similarity) between each detected per-attribute feature and the per-attribute feature of the standard voice stored in advance in the standard voice DB; an evaluation value calculation process in which the control unit calculates an evaluation value combining the calculated per-attribute Euclidean distances; and an evaluation process in which the control unit compares the calculated evaluation value with the evaluation criterion level values of the evaluation criterion DB to evaluate the speaker's uttered voice.

The per-attribute feature detection process comprises: a sampling step of dividing the voice data into a plurality of frames and sampling and normalizing them frame by frame; an energy spectrum acquisition step of performing a short-term Fourier transform (Short Term Fourier Transform) on the normalized frames to obtain a power spectrum of the voice data corresponding to the uttered voice; a per-attribute feature extraction step of extracting the feature (energy value) of each attribute in each frequency band of the power spectrum; a per-band log value calculation step of taking the logarithm of the per-band feature of each attribute to calculate log values; and a discrete cosine transform calculation step of performing a discrete cosine transform on the per-band log values of each attribute to calculate discrete cosine transform values.

The per-attribute feature extraction step comprises: an MFCC feature extraction step of applying a mel-scale (Mel Scale) filter bank to the power spectrum to calculate the energy of the auditory-based attribute of the uttered voice and summing the calculated energy to extract a first feature; an LPCC feature extraction step of applying a linear-scale (Linear Scale) filter bank to the power spectrum to calculate the energy of the vocal-tract-based attribute of the uttered voice and summing the calculated energy to extract a second feature; and a RASTA-PLP (Relative Spectral-Perceptual Linear Prediction) feature extraction step of applying a Bark-scale (Bark Scale) filter bank to the power spectrum to calculate the energy of the intonation-based attribute of the uttered voice and summing the calculated energy to extract a third feature.

The method further comprises a word vocalization induction process in which the control unit sequentially outputs one or more of the words predefined in the standard voice DB through the speaker of the audio processing unit to prompt the speaker to utter each output word.

The control unit calculates the evaluation value by applying the per-attribute weights for MFCC, LPCC, and RASTA-PLP (MFCC->w1, LPCC->w2, RASTA-PLP->w3) and the per-attribute Euclidean distances to Equation 2 below.

[Equation 2]

Here, Feature_similarity_i is the Euclidean distance of each attribute.

The present invention measures three attributes of the voice uttered by a speaker, measures the distance from standard vocalization in the attribute space defined by the three measured attributes, and evaluates the speaker's vocalization by applying per-attribute weights. It therefore has the effect of accurately determining whether the speaker has a disorder and, from the difference from typical speakers, the degree of that disorder.

Because the present invention can accurately determine the presence and degree of a disorder, it prevents the quality of rehabilitation training from varying with the rehabilitation therapist and makes it possible to provide high-quality voice rehabilitation services at lower cost.

FIG. 1 is a diagram showing the configuration of a speech-rehabilitation-based voice evaluation apparatus according to the present invention.
FIG. 2 is a flowchart showing a speech-rehabilitation-based voice evaluation method according to the present invention.
FIG. 3 is a flowchart showing the voice feature extraction method within the speech-rehabilitation-based voice evaluation method according to the present invention.
FIG. 4 is a diagram showing voice waveforms and features of a child with a disorder and a child without one, according to an embodiment of the present invention.

Hereinafter, the configuration and operation of the speech-rehabilitation-based voice evaluation apparatus according to the present invention are described with reference to the accompanying drawings, followed by a detailed description of the corresponding voice evaluation method.

FIG. 1 is a diagram showing the configuration of a speech-rehabilitation-based voice evaluation apparatus according to the present invention.

The speech-rehabilitation-based voice evaluation apparatus according to the present invention includes a storage unit (10), a display unit (20), an input unit (30), an audio processing unit (40), and a control unit (50).

The storage unit (10) includes a program area that stores a control program for controlling the operation of the voice evaluation apparatus according to the present invention, a temporary area that temporarily stores data generated while the control program runs, and a data area that semi-permanently stores the data required by the control program and the data generated by it.

According to the present invention, the data area holds a standard voice DB (11), an evaluation criterion DB (12), and an evaluation DB (13).

The standard voice DB (11) defines a number of words and stores, for each defined word, per-attribute standard voice feature information describing the per-attribute features of its standard voice. As shown in Table 1 below, the words are those used in U-TAP (Urimal Test of Articulation and Phonation).

Table 1. U-TAP vocabulary
바지 (pants), 단추 (button), 책상 (desk), 가방 (bag), 사탕 (candy), 연필 (pencil), 자동차 (car), 동물원 (zoo), 엄마 (mom), 뽀뽀 (kiss), 호랑이 (tiger), 코끼리 (elephant), 땅콩 (peanut), 귀 (ear), 그네 (swing), 토끼 (rabbit), 풍선 (balloon), 로봇 (robot), 그림 (picture), 못 (nail), 눈썹 (eyebrow), 괴물 (monster), 싸움 (fight), 참새 (sparrow), 세 마리 (three animals), 짹짹 (tweet-tweet), 나무 (tree), 메뚜기 (grasshopper), 전화 (telephone), 목도리 (scarf)

The per-attribute features of the standard voice are defined as the average of the per-attribute features of the U-TAP words as uttered by speakers without a speech disorder. These features may also be defined separately for each age group. The attributes are MFCC (Mel Frequency Cepstral Coefficient), LPCC (Linear Prediction Cepstrum Coefficient), and RASTA_PLP (Relative Spectral-Perceptual Linear Prediction), which are known as speech recognition methods and are described later. MFCC is an auditory-based attribute, LPCC is a vocal-tract-based attribute, and RASTA_PLP is a pronunciation-based attribute.

In addition, the standard voice DB (11) further stores a per-attribute weight (Weight) for each word.
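The contents of the standard voice DB described above can be pictured as a small keyed record store. The following is a minimal sketch only; the class names `StandardVoiceEntry` and `StandardVoiceDB`, the field names, and the example numbers are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class StandardVoiceEntry:
    """One predefined word with its per-attribute reference data (hypothetical layout)."""
    word: str
    features: dict  # attribute name -> list of reference feature values
    weights: dict   # attribute name -> weight for this word


@dataclass
class StandardVoiceDB:
    """In-memory stand-in for the standard voice DB, keyed by word."""
    entries: dict = field(default_factory=dict)

    def add(self, entry):
        self.entries[entry.word] = entry

    def lookup(self, word):
        return self.entries[word]


# Illustrative values only; real reference features would be averages
# measured from speakers without a speech disorder.
db = StandardVoiceDB()
db.add(StandardVoiceEntry(
    word="바지",
    features={"MFCC": [1.0, 2.0], "LPCC": [0.5, 0.1], "RASTA-PLP": [0.3, 0.9]},
    weights={"MFCC": 0.4, "LPCC": 0.3, "RASTA-PLP": 0.3},
))
```

A per-age-group variant could add the age group to the key, since the text notes that the standard features may be defined by age group.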

The evaluation criterion DB (12) stores the evaluation criterion level values used to define the level of an uttered voice, together with the evaluation information for each evaluation criterion level value.

The evaluation DB (13) stores the per-user evaluation information produced for each evaluated user.

The display unit (20) displays information on the operating state of the speech-rehabilitation-based voice evaluation apparatus and the various information generated during operation as one or more of graphics (including text and icons), still images, and video.

The input unit (30) includes one or more of: a key input device, such as a keyboard with a number of keys, through which the user enters functions and information according to the present invention; a touch pad that is integrated with the screen of the display unit (20) and outputs position information corresponding to the touched position on the screen; and a mouse that moves a cursor on the screen and outputs the movement information of the cursor.

The audio processing unit (40) receives voice (audio) data from the control unit (50) and outputs it as audible sound through a speaker (SPK). It also generates, through a microphone (MIC), a voice signal for the pronunciation uttered by the speaker, converts the voice signal into voice data, and outputs the voice data to the control unit (50).

The control unit (50) includes a vocalization induction unit (101), a voice signal processing unit (110), a voice feature extraction unit (120), a Euclidean distance calculation unit (130), an evaluation value calculation unit (140), and a voice evaluation unit (150), and controls the overall operation of the speech-rehabilitation-based voice evaluation apparatus according to the present invention.

Specifically, the vocalization induction unit (101) monitors whether a pronunciation evaluation event occurs through the input unit (30). When a pronunciation evaluation event occurs, it outputs, through either the display unit (20) or the audio processing unit (40), a word vocalization request message that prompts the speaker to utter the voice corresponding to a word registered in the standard voice DB (11) of the storage unit (10).

The vocalization induction unit (101) provides information on the prompted word to the Euclidean distance calculation unit (130) and the evaluation value calculation unit (140).

The voice signal processing unit (110) monitors whether voice data for the prompted word is input from the audio processing unit (40) and, when voice data is input, outputs it to the voice feature extraction unit (120).

The voice feature extraction unit (120) includes a sampling unit (121), an energy spectrum acquisition unit (122), a per-attribute feature extraction unit (123), a log unit (127), a discrete cosine calculation unit (128), and a discrete cosine coefficient calculation unit (129). It receives the voice data, divides it into a plurality of frames, samples and normalizes it frame by frame, and detects features for a plurality of attributes in each normalized frame.

Specifically, the sampling unit (121) divides the voice data into a plurality of frames, samples and normalizes the divided frames, and outputs them. The number of samples per frame may be, for example, 2000. If the number of samples per frame is too small, the reliability of the frequency analysis falls; if it is too large, many variations within one frame are blended together and important changes may be missed, so choosing an appropriate size is important.
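The framing and normalization step can be sketched as follows. This is an illustration under stated assumptions: the frames are non-overlapping, a trailing partial frame is dropped, and normalization means scaling by the peak absolute amplitude. The patent does not specify any of these choices, and the function names are hypothetical.

```python
def split_into_frames(samples, frame_size=2000):
    """Split a sample sequence into consecutive, non-overlapping frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]


def normalize_frame(frame):
    """Scale a frame so that its largest absolute sample becomes 1.0.

    Peak normalization is assumed here; an all-zero frame is returned as zeros.
    """
    peak = max((abs(s) for s in frame), default=0.0)
    if peak == 0:
        return [0.0 for _ in frame]
    return [s / peak for s in frame]
```

With `frame_size=2000`, the example value from the text, a recording sampled at 16 kHz would give frames of 125 ms each; the sampling rate is not stated in the source.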

The energy spectrum acquisition unit (122) performs a short-term Fourier transform (Short Term Fourier Transform: STFT) on each of the frames to obtain and output a power spectrum (Power Spectrum) of the voice data.
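For a single frame, the power spectrum can be sketched with a direct discrete Fourier transform. This is a deliberately simple O(n²) illustration using only the standard library; it applies no window function and divides by the frame length, both of which are assumptions of the sketch. A practical implementation would normally use an FFT routine.

```python
import cmath


def power_spectrum(frame):
    """Return the one-sided power spectrum |X[k]|^2 / n of a real-valued frame.

    Bins run from k = 0 (DC) to k = n // 2 (Nyquist).
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct evaluation of the DFT sum for bin k.
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum
```

Applying `power_spectrum` to every normalized frame in turn gives the frame-by-frame spectra that the short-term Fourier transform describes.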

The per-attribute feature extraction unit (123) applies a per-attribute filter bank to the power spectrum and outputs the energy of each attribute.

The per-attribute feature extraction unit (123) includes a first feature (MFCC) extraction unit (124) that applies a mel-scale (Mel Scale) filter bank to extract the energy of the auditory-based first attribute, a second feature (LPCC) extraction unit (125) that applies a linear-scale (Linear Scale) filter bank to extract the energy of the vocal-tract-based second attribute, and a third feature (RASTA-PLP) extraction unit (126) that applies a Bark-scale (Bark Scale) filter bank to extract the energy of the intonation-based third attribute.
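The three filter banks differ mainly in how they warp the frequency axis before the spectrum is divided into bands. The sketch below shows that idea with rectangular bands spaced evenly on the warped axis. It is a simplification: conventional filter banks use overlapping triangular or critical-band filters, and the RASTA noise filtering mentioned in the text is omitted. The mel and Bark formulas are common textbook approximations rather than formulas given in the patent, and all function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """Common mel-scale approximation."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def hz_to_bark(f):
    """Traunmueller's Bark-scale approximation."""
    return 26.81 * f / (1960.0 + f) - 0.53


def hz_to_linear(f):
    """Identity warp for the linear-scale filter bank."""
    return f


def band_energies(power, sample_rate, warp, n_bands):
    """Sum power-spectrum bins into n_bands bands spaced evenly on the warped axis.

    `power` is a one-sided spectrum with at least two bins (0 Hz to Nyquist).
    """
    n_bins = len(power)
    nyquist = sample_rate / 2.0
    lo, hi = warp(0.0), warp(nyquist)
    energies = [0.0] * n_bands
    for k, p in enumerate(power):
        freq = k * nyquist / (n_bins - 1)
        pos = (warp(freq) - lo) / (hi - lo)          # position in [0, 1]
        band = min(int(pos * n_bands), n_bands - 1)  # clamp the Nyquist bin
        energies[band] += p
    return energies
```

Calling `band_energies` three times on the same power spectrum, once with each warp function, yields the three per-attribute energy vectors that the later log and discrete cosine transform stages operate on.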

The log unit (127) takes the logarithm of the per-band energies of each attribute output from the per-attribute feature extraction unit (123) and outputs the resulting log values.

The discrete cosine transform (DCT) calculation unit (128) converts the curve formed by the per-band log values of each attribute into an appropriate number of cosine functions with discrete cosine function coefficients, finds the discrete cosine transform coefficient of each of those cosine functions, applies the coefficients found to calculate the discrete cosine transform values for each attribute, and outputs them as the feature (value) of each attribute. The number of cosine functions is preferably eight, but is not limited to this. Calculating discrete cosine transform coefficients and values is itself a known technique, so a detailed description is omitted.
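The log and discrete cosine transform stages can be sketched as below. The sketch assumes an unnormalized DCT-II and a small floor value to avoid taking the logarithm of zero; neither detail is specified in the text, and the function names are hypothetical. The default of eight coefficients follows the preferred number of cosine functions stated above.

```python
import math


def log_energies(energies, floor=1e-10):
    """Natural log of each band energy, floored to avoid log(0)."""
    return [math.log(max(e, floor)) for e in energies]


def dct_ii(values, n_coeffs=8):
    """First n_coeffs coefficients of the unnormalized DCT-II of `values`."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (t + 0.5) / n)
            for t, v in enumerate(values))
        for k in range(min(n_coeffs, n))
    ]
```

For one attribute, the feature vector of a frame would then be `dct_ii(log_energies(band_energy_vector))`, where `band_energy_vector` stands for that attribute's filter-bank output.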

The Euclidean distance calculation unit (130) calculates and outputs the distance between the per-attribute features output from the voice feature extraction unit (120), that is, from the discrete cosine coefficient calculation unit (129), and the per-attribute features in the standard voice DB (11) that correspond to the word of the voice data.

The Euclidean distance of each attribute may be defined as in Equation 1 below.

Here, x_n is the discrete cosine transform value that is the feature of the frames of each attribute, and it is compared with the discrete cosine transform value, stored in advance in the standard voice DB (11), that is the feature of the same attribute for the corresponding word.
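The per-attribute distance can be sketched with the standard Euclidean distance, sqrt(sum over n of (x_n - y_n)^2). The name y_n for the stored reference value is an assumption made here for illustration, as is the function name.

```python
import math


def euclidean_distance(features, reference):
    """Euclidean distance between an extracted feature vector and its reference."""
    if len(features) != len(reference):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(features, reference)))
```

A distance of zero means the extracted features match the stored standard features exactly; larger values mean a greater deviation from the standard voice.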

The evaluation value calculation unit (140) calculates the evaluation value (Quality) by applying the Euclidean distances calculated by the Euclidean distance calculation unit (130) to Equation 2 below.
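Equation 2 itself is not reproduced in this text. The abstract describes the evaluation value as an average of the per-attribute distances with each attribute's weight applied, so the sketch below assumes the form (1/N) * sum of w_i * Feature_similarity_i. This exact form, the function name, and the example numbers are assumptions and may differ from the patent's equation.

```python
def evaluation_value(distances, weights):
    """Average of weight * distance over all attributes (assumed form of Equation 2).

    Both arguments are dicts keyed by attribute name.
    """
    if not distances:
        raise ValueError("at least one attribute distance is required")
    total = sum(weights[attr] * dist for attr, dist in distances.items())
    return total / len(distances)


# Illustrative numbers only.
quality = evaluation_value(
    {"MFCC": 1.0, "LPCC": 2.0, "RASTA-PLP": 3.0},
    {"MFCC": 0.5, "LPCC": 0.3, "RASTA-PLP": 0.2},
)
```

Under this assumed form, a lower value indicates a voice closer to the standard voice, because every term is a non-negative weighted distance.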

The voice evaluation unit (150) compares the evaluation value calculated by the evaluation value calculation unit (140) with the evaluation criterion level values stored in the evaluation criterion DB (12), and provides the evaluation information corresponding to the matching evaluation criterion level value through one or more of the display unit (20) and the audio processing unit (40).
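Comparing an evaluation value with stored level values amounts to a threshold lookup. The sketch below assumes the level values are ascending upper bounds and that a larger evaluation value means a larger deviation from the standard voice. The bounds and labels are hypothetical placeholders, not values from the patent.

```python
import bisect


def evaluation_level(value, level_bounds, level_info):
    """Map an evaluation value to the evaluation information of its level.

    `level_bounds` must be sorted ascending, and `level_info` needs one more
    entry than `level_bounds`.
    """
    if len(level_info) != len(level_bounds) + 1:
        raise ValueError("level_info needs len(level_bounds) + 1 entries")
    return level_info[bisect.bisect_right(level_bounds, value)]


# Hypothetical criteria for illustration.
BOUNDS = [0.5, 1.0, 2.0]
INFO = ["level 0", "level 1", "level 2", "level 3"]
```

For example, `evaluation_level(0.7, BOUNDS, INFO)` selects the second entry, because 0.7 lies between the first and second bounds.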

FIG. 2 is a flowchart showing the speech-rehabilitation-based voice evaluation method according to the present invention, in which the speaker's age is received and the evaluation is performed by age group.

Referring to FIG. 2, the control unit 50 first checks whether a vocalization evaluation event has occurred (S111).

When a vocalization evaluation event occurs, the control unit 50 may request input of the speaker's age through at least one of the display unit 20 and the audio processing unit 40 (S113).

After requesting the age, the control unit 50 checks whether an age has been input through the input unit 30 (S115) and, once it has, performs the evaluation setting for that age (S117). The evaluation setting means setting the per-attribute features and evaluation values that correspond to the age group of the input age.

When the evaluation setting is complete, the control unit 50 requests, through at least one of the display unit 20 and the audio processing unit 40, that the speaker utter a vocabulary item selected in a preset or random order from the vocabulary registered in the standard voice DB 11 (S119).

After requesting the utterance, the control unit 50 monitors whether voice data for the speaker's utterance is acquired through the audio processing unit 40 (S121). When voice data is acquired, it performs the voice feature extraction process and calculates the evaluation value (S123), determines the evaluation level by comparing the calculated evaluation value with the evaluation reference level values stored in the evaluation criteria DB 12, and then outputs the evaluation information corresponding to that level (S125).
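The comparison in step S125 can be sketched as a threshold lookup. This is a minimal illustration only: the boundary values, the level labels, and the function names below are hypothetical, since the text does not list the evaluation reference level values stored in the evaluation criteria DB 12.

```python
from bisect import bisect_right

# Hypothetical level boundaries and labels; the source does not give the
# values stored in the evaluation criteria DB 12.
LEVEL_BOUNDARIES = [0.05, 0.06]
LEVEL_INFO = ["low", "medium", "high"]


def evaluation_level(quality, boundaries=LEVEL_BOUNDARIES):
    """Return the index of the level whose range contains the evaluation value."""
    # bisect_right counts how many boundaries are <= quality.
    return bisect_right(boundaries, quality)


def evaluation_info(quality):
    """Return the (hypothetical) evaluation information for an evaluation value."""
    return LEVEL_INFO[evaluation_level(quality)]
```

With these assumed boundaries, a larger evaluation value maps to a higher level, matching the rule that a larger value is closer to a normal speaker.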

FIG. 3 is a flowchart of the voice feature extraction method within the speech rehabilitation based vocal voice evaluation method according to the present invention, and FIG. 4 shows the voice waveforms and features of disabled children and normal children according to an embodiment of the present invention. The description below refers to FIGS. 3 and 4. In FIG. 4, 401 shows the original voice signal (1), the power spectrum (2), and the per-attribute (RASTA-PLP, LPCC, MFCC) energies (3, 4, 5) for a 6-year-old girl uttering "목도리" ("scarf"); 402 shows the same items for a 12-year-old girl uttering "목도리"; and 403 shows the same items for a normal speaker uttering "목도리".

When voice data is acquired, the control unit 50 performs a voice preprocessing process (S210).

In detail, the control unit 50 divides the voice data into (window) frames (S211), samples each frame to a predefined number of samples (S213), normalizes the result by applying spline interpolation (S215), and performs a short-time Fourier transform (STFT) on the normalized frames to calculate the power spectrum of the entire voice data, as shown in panel 2 of 401 and 402 in FIG. 4 (S217).
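The framing and power spectrum steps (S211, S217) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the frame length, the hop size, the Hamming window, and all function names are assumptions, and the resampling and spline interpolation steps (S213, S215) are omitted. A direct DFT is used so that the example needs only the standard library.

```python
import cmath
import math


def split_into_frames(samples, frame_len, hop):
    """Split a sample sequence into overlapping fixed-length frames."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def hamming(frame_len):
    """Hamming window coefficients (an assumed window choice)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]


def power_spectrum(frame):
    """Power |X[k]|^2 of one frame via a direct DFT, for k = 0 .. N/2."""
    n_samples = len(frame)
    spectrum = []
    for k in range(n_samples // 2 + 1):
        x_k = sum(value * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n, value in enumerate(frame))
        spectrum.append(abs(x_k) ** 2)
    return spectrum


def short_time_power_spectrum(samples, frame_len=8, hop=4):
    """Window each frame and return one power spectrum per frame."""
    window = hamming(frame_len)
    frames = split_into_frames(samples, frame_len, hop)
    return [power_spectrum([s * w for s, w in zip(frame, window)])
            for frame in frames]
```

For real recordings an FFT routine would replace the direct DFT, which costs O(N^2) per frame; the tiny default frame length here is only for demonstration.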

When the voice preprocessing is complete, the control unit 50 performs a voice feature extraction process (S220) that extracts the per-attribute features from the power spectrum.

In detail, once the power spectrum is obtained, the control unit 50 filters it with the filter bank of each attribute to calculate the per-attribute energies shown in panels 3, 4, and 5 of 401 and 402 in FIG. 4 (S221).

Once the per-attribute energies are calculated, the control unit 50 takes the logarithm of the per-attribute energy in each section to calculate log values (S223). Here, a section means a frame section.

After calculating the log values, the control unit 50 performs a discrete cosine transform (DCT) on the log values of each attribute to calculate the per-attribute discrete cosine transform values, and outputs them as the feature values (S227).
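The log and DCT steps (S223, S227) can be sketched for a single attribute as follows. This is a minimal sketch: the unnormalized DCT-II variant, the small floor that guards against log(0), and the function names are assumptions, since the text does not specify which DCT form or scaling is used.

```python
import math


def log_energies(energies, floor=1e-10):
    """Natural log of each energy value, floored to avoid log(0)."""
    return [math.log(max(e, floor)) for e in energies]


def dct_ii(values):
    """Unnormalized DCT-II: X[k] = sum_i v[i] * cos(pi * k * (2i + 1) / (2N))."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n)]


def attribute_features(section_energies):
    """Log followed by DCT, giving the feature values for one attribute."""
    return dct_ii(log_energies(section_energies))
```

A constant energy sequence yields a single non-zero coefficient at index 0, which is a quick sanity check on the transform.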

Once the discrete cosine transform values, which are the per-attribute feature values, are calculated, the control unit 50 applies the per-attribute features and the per-attribute reference features of the standard voice for the corresponding vocabulary item, stored in the standard voice DB 11, to Equation 1 to calculate the Euclidean distance of each attribute (S230).
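Step S230 can be sketched as one distance per attribute. The dictionary layout keyed by attribute name and the function names are assumptions made for illustration; the text only states that a Euclidean distance is calculated for each attribute.

```python
import math


def euclidean_distance(features, reference):
    """Euclidean distance between two equal-length feature vectors."""
    if len(features) != len(reference):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(features, reference)))


def distances_by_attribute(spoken, standard):
    """Distance for each attribute name present in the spoken features."""
    return {name: euclidean_distance(spoken[name], standard[name])
            for name in spoken}
```

For example, calling `distances_by_attribute` with spoken features keyed `"MFCC"`, `"LPCC"`, and `"RASTA-PLP"` returns one distance under each of those keys.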

Once the Euclidean distances are calculated, the control unit 50 applies them, together with the per-attribute weights defined for the vocabulary item in the standard voice DB 11, to Equation 2 to calculate the evaluation value (S240). A larger evaluation value means the utterance is closer to that of a normal speaker, and a smaller value means a speech disorder is more likely.

The Euclidean distances and evaluation value (Quality) for 401 in FIG. 4 are calculated as shown in Table 2 below, and those for 402 in FIG. 4 are calculated as shown in Table 3 below.

Table 2 (6-year-old girl)
LPCC     MFCC      RASTA-PLP   Quality
20.75    1190.4    15.45       0.045

Table 3 (12-year-old girl)
LPCC     MFCC      RASTA-PLP   Quality
13.23    716.5     11.08       0.066

As Tables 2 and 3 show, the 12-year-old girl's evaluation value is 0.066, higher than the 6-year-old girl's 0.045. The 6-year-old girl therefore has a lower evaluation level and, accordingly, a higher likelihood of a speech disorder.

Meanwhile, those of ordinary skill in the art will readily understand that the present invention is not limited to the typical preferred embodiments described above and may be practiced with various improvements, changes, substitutions, or additions without departing from the gist of the present invention. If an implementation with such improvements, changes, substitutions, or additions falls within the scope of the appended claims, its technical idea should also be regarded as belonging to the present invention.

10: storage unit 11: standard voice DB
12: evaluation criteria DB 13: evaluation DB
20: display unit 30: input unit
40: audio processing unit 50: control unit
101: vocalization induction unit 110: voice signal processing unit
120: voice feature extraction unit 121: sampling unit
122: energy spectrum acquisition unit 123: per-attribute feature extraction unit
124: first feature (MFCC) extraction unit 125: second feature (LPCC) extraction unit
126: third feature (RASTA-PLP) extraction unit 127: log unit
128: discrete cosine transform unit
130: Euclidean distance calculation unit 140: evaluation value calculation unit
150: voice evaluation unit

Claims

A storage unit including a standard voice DB that stores per-attribute standard voice feature information for standard voices, and an evaluation criteria DB that stores evaluation reference level values;
An audio processing unit that receives a voice uttered by a speaker and outputs voice data; and
A control unit that receives the voice data, divides it into a plurality of frames, samples and normalizes it frame by frame, detects a plurality of per-attribute features in the normalized frames, calculates the Euclidean distance (similarity) between the detected per-attribute features and the per-attribute feature information, stored in advance in the standard voice DB, of the standard voice corresponding to the uttered voice, calculates an evaluation value that reflects the calculated per-attribute Euclidean distances, and evaluates the speaker's uttered voice by comparing the calculated evaluation value with the evaluation reference level values of the evaluation criteria DB.

According to claim 1,
The control unit comprises:
A voice signal processing unit that acquires voice data through the audio processing unit and outputs it;
A voice feature extraction unit that receives the voice data from the voice signal processing unit, divides it into a plurality of frames, samples and normalizes it frame by frame, and detects a plurality of per-attribute features in the normalized frames;
A Euclidean distance calculation unit that calculates the distance (similarity) between the detected per-attribute features and the per-attribute features of the standard voice stored in advance in the standard voice DB;
An evaluation value calculation unit that calculates an evaluation value reflecting the calculated per-attribute distances in combination; and
A voice evaluation unit that evaluates the speaker's uttered voice by comparing the calculated evaluation value with the evaluation reference level values of the evaluation criteria DB.

According to claim 2,
The speech rehabilitation based vocal voice evaluation apparatus is characterized in that the voice feature extraction unit comprises:
A sampling unit that divides the voice data into a plurality of frames and samples and normalizes it frame by frame;
An energy spectrum acquisition unit that performs a short-time Fourier transform on the normalized frames to obtain a power spectrum of the voice data corresponding to the uttered voice;
A per-attribute feature extraction unit that extracts, from the power spectrum, the energy of each attribute in each frequency section;
A per-section log unit that calculates log values by taking the logarithm of the energy in each section of each attribute; and
A discrete cosine transform calculation unit that performs a discrete cosine transform on the curve formed by the consecutive per-section log values and outputs the discrete cosine transform values as feature values.

According to claim 3,
The speech rehabilitation based vocal voice evaluation apparatus is characterized in that the per-attribute feature extraction unit comprises:
An MFCC feature extraction unit that applies a Mel scale filter bank to the power spectrum to calculate the energy of the auditory-based attribute of the uttered voice, and extracts a first feature by summing the calculated energy;
An LPCC feature extraction unit that applies a linear scale filter bank to the power spectrum to calculate the energy of the voice-based attribute of the uttered voice, and extracts a second feature by summing the calculated energy; and
A RASTA-PLP (Relative Spectral-Perceptual Linear Prediction) feature extraction unit that applies a Bark scale filter bank and a noise filter to the power spectrum to calculate the energy of the uttered voice with background noise removed, and extracts a third feature by summing the calculated energy.

According to claim 1,
The speech rehabilitation based vocal voice evaluation apparatus is characterized in that
The standard voice DB stores standard voice feature information for a predefined vocabulary, and
The control unit sequentially outputs one or more items of the predefined vocabulary through a loudspeaker of the audio processing unit to induce the speaker to utter the output vocabulary item.

According to claim 5,
The speech rehabilitation based vocal voice evaluation apparatus is characterized in that the vocabulary is the U-TAP vocabulary.

According to claim 4,
The speech rehabilitation based vocal voice evaluation apparatus is characterized in that the evaluation value calculation unit calculates the evaluation value by applying the per-attribute weights of MFCC, LPCC, and RASTA-PLP (MFCC->w1, LPCC->w2, RASTA-PLP->w3) and the Euclidean distance of each attribute to Equation 2 below.
[Equation 2]

Here, Feature_similarity_i is the Euclidean distance of each attribute.

According to claim 7,
The speech rehabilitation based vocal voice evaluation apparatus is characterized in that
The standard voice DB stores, for a predefined vocabulary, standard voice feature information and weights for each vocabulary item, and
The control unit sequentially outputs one or more items of the predefined vocabulary through the loudspeaker of the audio processing unit to induce the speaker to utter the output vocabulary item, and then applies the weights corresponding to the vocabulary item of the voice input in response.

A voice acquisition process in which a control unit acquires, through an audio processing unit, voice data for a voice uttered by a speaker;
A per-attribute feature detection process in which the control unit divides the voice data into a plurality of frames, samples and normalizes it frame by frame, and detects a plurality of per-attribute features in the normalized frames;
A Euclidean distance calculation process in which the control unit calculates the Euclidean distance (similarity) between the detected per-attribute features and the per-attribute features of a standard voice stored in advance in a standard voice DB;
An evaluation value calculation process in which the control unit calculates an evaluation value reflecting the calculated per-attribute Euclidean distances in combination; and
An evaluation process in which the control unit evaluates the speaker's uttered voice by comparing the calculated evaluation value with the evaluation reference level values of an evaluation criteria DB.

The method of claim 9,
The speech rehabilitation based vocal voice evaluation method is characterized in that the per-attribute feature detection process comprises:
A sampling step of dividing the voice data into a plurality of frames and sampling and normalizing it frame by frame;
An energy spectrum acquisition step of performing a short-time Fourier transform on the normalized frames to obtain a power spectrum of the voice data corresponding to the uttered voice;
A per-attribute feature extraction step of extracting, from the power spectrum, a feature (energy value) of each attribute in each frequency section;
A per-section log value calculation step of calculating log values by taking the logarithm of the feature in each section of each attribute; and
A discrete cosine transform calculation step of calculating discrete cosine transform values by performing a discrete cosine transform on the per-section log values of each attribute.

The method of claim 10,
The speech rehabilitation based vocal voice evaluation method is characterized in that the per-attribute feature extraction step comprises:
An MFCC feature extraction step of applying a Mel scale filter bank to the power spectrum to calculate the energy of the auditory-based attribute of the uttered voice, and summing the calculated energy to extract a first feature;
An LPCC feature extraction step of applying a linear scale filter bank to the power spectrum to calculate the energy of the voice-based attribute of the uttered voice, and summing the calculated energy to extract a second feature; and
A RASTA-PLP (Relative Spectral-Perceptual Linear Prediction) feature extraction step of applying a Bark scale filter bank to the power spectrum to calculate the energy of the intonation-based attribute of the uttered voice, and summing the calculated energy to extract a third feature.

The method of claim 9,
The speech rehabilitation based vocal voice evaluation method further comprises a vocabulary utterance induction process in which the control unit sequentially outputs one or more items of the vocabulary predefined in the standard voice DB through a loudspeaker of the audio processing unit to induce the speaker to utter the output vocabulary item.

The method of claim 12,
The speech rehabilitation based vocal voice evaluation method is characterized in that the vocabulary is the U-TAP vocabulary.

The method of claim 11,
The speech rehabilitation based vocal voice evaluation method is characterized in that the control unit calculates the evaluation value by applying the per-attribute weights of MFCC, LPCC, and RASTA-PLP (MFCC->w1, LPCC->w2, RASTA-PLP->w3) and the Euclidean distance of each attribute to Equation 2 below.
[Equation 2]

Here, Feature_similarity_i is the Euclidean distance of each attribute.