KR20200000116A

KR20200000116A - VOCAL ABILITY JUDGEMENT METHOD WORKING ON SMALL SoC

Info

Publication number: KR20200000116A
Application number: KR1020180072020A
Authority: KR
Inventors: 신대진
Original assignee: 주식회사 이드웨어
Priority date: 2018-06-22
Filing date: 2018-06-22
Publication date: 2020-01-02

Abstract

Provided is a speech ability determination method used in a voice recognition application for a user who speaks in an inaccurate pronunciation in a small device having a small SoC mounted thereon. The method comprises: a step (S110) of defining a speech error pattern according to speaker types for each word or phrase; a step (S120) of generating an impairment sound acoustic model and a non-impairment sound acoustic model for words or phrases defined in the speech error pattern; a step (S130) of receiving a user voice; a step (S140) of comparing and analyzing the input user voice with the impairment sound acoustic model and non-impairment sound acoustic model; and a step (S150) of determining a speech ability of the user from the comparison and analysis result. The speech ability determination method is used in a voice recognition application comprising: a step (S210) of selecting a voice DB to be used in voice recognition according to a speech ability determination result; and a step (S220) of proceeding user voice recognition by using the selected voice DB.

Description

VOCAL ABILITY JUDGEMENT METHOD WORKING ON SMALL SoC

The present invention relates to a method for determining a speaker's speech ability and, more particularly, to a method for determining a speaker's speech ability that is used to implement a speech recognition algorithm that runs quickly on a small SoC (System on Chip).

Various applications that use speech recognition are becoming widespread on small devices such as smartphones.

The success or failure of such applications depends on the performance of speech recognition.

Speech recognition compares the phoneme-sequence acoustic characteristics extracted from the speaker's voice with those held in a phoneme sequence data pool, finds the best-matching phoneme sequence, and recognizes that sequence as corresponding to the speaker's voice.

The phoneme sequence data pool basically consists of phoneme sequences corresponding to the pronunciation of ordinary speakers. Consequently, the pronunciation of children who are not yet proficient in the language, of foreigners, or of people whose pronunciation is abnormal because of a disability is not recognized properly, and the recognizer becomes useless for them.

Accurately recognizing even the speech of people with abnormal pronunciation requires a vast phoneme sequence data pool, and a larger pool demands correspondingly more computational resources. In other words, such speech recognition cannot be used on a small device equipped with a small SoC whose available computational resources are limited.

The inventors of the present invention found that people with abnormal pronunciation make similar errors with fairly high probability depending on their type. In other words, a speaker's speech errors can be patterned, and observing speakers grouped by impairment type yields quite meaningful speech error patterns.

The present invention aims to propose a speech ability determination method for use in a speech recognition application that people with abnormal pronunciation can use on a small device equipped with a small SoC.

The present invention also aims to propose a speech recognition application using the speech ability determination method.

The present invention further aims to propose a small device on which a speech recognition application using the speech ability determination method is installed.

According to one aspect of the present invention, there is provided a speech ability determination method for use in a speech recognition application that people with abnormal pronunciation can use on a small device equipped with a small SoC.

The method includes a step of defining a speech error pattern according to speaker type for each word or phrase,

a step of generating an impairment sound acoustic model and a non-impairment sound acoustic model for the words or phrases defined in the speech error pattern,

a step of receiving a user's voice,

a step of comparing and analyzing the input user voice with the impairment sound acoustic model and the non-impairment sound acoustic model, respectively, and

a step of determining the user's speech ability from the comparative analysis result.

According to another aspect of the present invention, there is provided a speech recognition application using the above speech ability determination method.

The application includes a step of selecting a voice DB (database) to be used for speech recognition according to the speech ability determination result, and

a step of performing recognition of the user's voice using the selected voice DB.

According to yet another aspect of the present invention, there is provided a small device on which a speech recognition application using the above speech ability determination method is installed.

Here, "speech recognition application" refers broadly to various applications that use speech recognition, such as applications that use speech recognition to help children learn language and applications that use speech recognition to support cognitive rehabilitation learning for people with disabilities.

Here, "small device" refers broadly to various devices, such as smartphones and toys, on which applications using speech recognition run.

By making it possible to select and use the voice DB suited to the user from among various types of impaired voice DBs and the non-impaired voice DB, the speech ability determination method according to the present invention allows a speech recognition application to run smoothly on a small device equipped with a small SoC whose available computational resources are limited.

FIG. 1 is a flowchart illustrating a speech ability determination method according to an embodiment of the present invention, and
FIG. 2 is a flowchart illustrating an exemplary operation of a speech recognition application using the speech ability determination method of FIG. 1.

Hereinafter, a speech ability determination method according to an embodiment of the present invention, and a speech recognition application using it, are described in detail with reference to the accompanying drawings.

FIG. 1 shows a flowchart for explaining a speech ability determination method according to an embodiment of the present invention, and

FIG. 2 shows a flowchart for explaining an exemplary operation of a speech recognition application using the speech ability determination method of FIG. 1.

In the speech ability determination method according to this embodiment, a speech error pattern according to speaker type is first defined for each word or phrase (S110). Speech error patterns are defined empirically by observing speakers grouped by type of speech impairment, and are built from a representative selection of frequently used words or phrases. For example, an application that helps children learn language may frequently use names of animals or objects such as "코끼리" (kokkiri, "elephant"). Children often mispronounce "코끼리" as "오띠 (otti), 도띠 (dotti), 토띠 (totti), 오띠이 (ottii), 코끼이 (kokkii), 코끼디 (kokkidi)" and the like, and sometimes, because of their psychological state or a cognitive error, say something entirely unrelated such as "싫어" ("I don't want to"), "안해" ("I won't"), "뭐야" ("what is this"), or "사슴" ("deer"). Speech error pattern definitions may be constructed empirically from observation, but may also be constructed inferentially from linguistic or psychological considerations.
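As an illustration of step S110, a speech error pattern table could be held as a small lookup keyed by target word and speaker type. The following is only a minimal sketch under stated assumptions: the `ErrorPattern` class, the `PATTERNS` table, the `candidates` helper, and the `"child"` label are hypothetical names that do not appear in the specification. Only the example utterances are taken from the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ErrorPattern:
    """One speech error pattern entry (hypothetical structure)."""
    target: str                        # intended word or phrase
    speaker_type: str                  # assumed label, e.g. "child"
    variants: Tuple[str, ...]          # typical mispronunciations of the target
    off_target: Tuple[str, ...] = ()   # unrelated utterances seen instead of the target


# Example entry built from the utterances quoted in the description.
PATTERNS = {
    ("코끼리", "child"): ErrorPattern(
        target="코끼리",
        speaker_type="child",
        variants=("오띠", "도띠", "토띠", "오띠이", "코끼이", "코끼디"),
        off_target=("싫어", "안해", "뭐야", "사슴"),
    ),
}


def candidates(target, speaker_type, include_off_target=False):
    """Return the erroneous utterances defined for a target and speaker type."""
    pattern = PATTERNS[(target, speaker_type)]
    result = list(pattern.variants)
    if include_off_target:
        result += list(pattern.off_target)
    return result
```

Keeping mispronunciations and unrelated utterances in separate fields mirrors the two alternative lists the description gives for the impairment sound acoustic model.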

Next, an impairment sound acoustic model and a non-impairment sound acoustic model are generated for the words or phrases defined in the speech error pattern (S120). The non-impairment sound acoustic model will contain the normal pronunciation "코끼리", and the impairment sound acoustic model will contain the abnormal pronunciations for the relevant impairment type, such as "오띠, 도띠, 토띠, 오띠이, 코끼이, 코끼디" or "오띠, 도띠, 토띠, 오띠이, 코끼이, 코끼디, 싫어, 안해, 뭐야, 사슴". In generating the acoustic models corresponding to the words or phrases in the defined speech error pattern, G2P (Grapheme to Phoneme), an engine that converts written characters into the phonemes actually pronounced, may be used.
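The idea of step S120 can be sketched with a simplified stand-in for G2P. The code below is not a real G2P engine: it only splits precomposed Hangul syllables into their jamo using the standard Unicode arithmetic and drops the silent initial ㅇ, without applying any pronunciation rules. The names `to_jamo` and `build_models` are hypothetical, and a jamo sequence is used here in place of a trained acoustic model.

```python
# Simplified stand-in for G2P (assumption): Hangul syllable -> jamo sequence.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals


def to_jamo(text):
    """Decompose Hangul syllables into jamo, skipping the silent initial ㅇ."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if not 0 <= code < 19 * 21 * 28:
            continue  # ignore anything that is not a precomposed Hangul syllable
        cho, rest = divmod(code, 21 * 28)
        jung, jong = divmod(rest, 28)
        if CHO[cho] != "ㅇ":
            out.append(CHO[cho])
        out.append(JUNG[jung])
        if JONG[jong]:
            out.append(JONG[jong])
    return out


def build_models(target, variants):
    """Return (non-impairment, impairment) lookups of utterance -> jamo sequence."""
    non_impaired = {target: to_jamo(target)}
    impaired = {v: to_jamo(v) for v in variants}
    return non_impaired, impaired
```

For example, `build_models("코끼리", ["오띠", "코끼디"])` yields one lookup holding the normal pronunciation and one holding the two erroneous pronunciations.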

Once the acoustic DB based on the speech error patterns has been built, the user's voice is received (S130), and the input user voice is compared and analyzed with the impairment sound acoustic model and the non-impairment sound acoustic model, respectively (S140). Viterbi decoding may be used for the comparative analysis.
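The comparison in step S140 can be illustrated with a toy Viterbi scorer. This sketch assumes that each candidate utterance is modelled as a left-to-right HMM whose states are its phonemes, and that the input has already been reduced to one phoneme label per frame. The emission and transition values (`p_match`, `p_stay`) are arbitrary illustrative scores, not trained probabilities, and the function names are hypothetical. A real system would score acoustic feature vectors instead.

```python
import math

NEG_INF = float("-inf")


def viterbi_score(states, frames, p_match=0.8, p_stay=0.5):
    """Best-path log score of `frames` through a left-to-right HMM over `states`."""
    if not states or not frames:
        return NEG_INF
    log_stay, log_next = math.log(p_stay), math.log(1.0 - p_stay)

    def emit(state, obs):
        # Toy emission score: high if the frame label matches the state.
        return math.log(p_match if state == obs else 1.0 - p_match)

    best = [NEG_INF] * len(states)
    best[0] = emit(states[0], frames[0])      # the path must start in the first state
    for obs in frames[1:]:
        prev = best
        best = [NEG_INF] * len(states)
        for j, state in enumerate(states):
            stay = prev[j] + log_stay
            move = prev[j - 1] + log_next if j > 0 else NEG_INF
            best[j] = max(stay, move) + emit(state, obs)
    return best[-1]                           # the path must end in the last state


def compare(frames, non_impaired, impaired):
    """Return the best score under each model set (dicts of name -> phoneme list)."""
    best_non = max(viterbi_score(m, frames) for m in non_impaired.values())
    best_imp = max(viterbi_score(m, frames) for m in impaired.values())
    return best_non, best_imp
```

With frames `list("ㅗㅗㄸㄸㅣㅣ")`, the sequence for "오띠" scores higher than the sequence for "코끼리", so the impairment-side score wins for that utterance.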

The user's speech ability is determined from the comparative analysis result (S150). The determination is based on the scores obtained by comparing and analyzing several words or phrases uttered by the same user, and the step continues until the number of words or phrases is large enough that a sufficiently accurate determination can be expected.
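One possible reading of step S150 is sketched below. The description does not say how the per-word scores are combined or how many words are enough, so the summed score margin, the `min_items` threshold, and the result labels are all assumptions made for illustration.

```python
def judge_speech_ability(score_pairs, min_items=5):
    """Decide from per-utterance (non_impaired_score, impaired_score) pairs.

    Returns None while fewer than `min_items` utterances have been scored,
    which signals that more words or phrases should be collected.
    """
    margin_sum = 0.0
    count = 0
    for non_score, imp_score in score_pairs:
        margin_sum += imp_score - non_score   # positive favours the impairment model
        count += 1
        if count >= min_items:
            break                             # enough evidence, stop consuming input
    if count < min_items:
        return None
    return "impaired" if margin_sum > 0 else "non_impaired"
```

Because the function accepts any iterable, it can be fed a generator that prompts the user for one word at a time and stops being consumed once the threshold is reached.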

The result obtained by the speech ability determination method of FIG. 1 is used in the operation of the speech recognition application of FIG. 2.

In this application, a step (S210) of selecting a voice DB to be used for speech recognition according to the speech ability determination result and a step (S220) of performing recognition of the user's voice using the selected voice DB are carried out.

Because this application performs recognition of the user's voice using only the selected voice DB, it can run sufficiently quickly and smoothly on a small device equipped with a small SoC whose available computational resources are limited.
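Steps S210 and S220 can be sketched as a lookup followed by a search restricted to the selected DB. Everything here is illustrative: the `VOICE_DBS` layout, the DB keys, and the function names are hypothetical, and string similarity from Python's `difflib` stands in for real acoustic matching. The point of the sketch is only that the search touches the entries of one DB rather than all of them.

```python
from difflib import SequenceMatcher

# Hypothetical voice DBs: each entry maps a stored pronunciation to the intended word.
VOICE_DBS = {
    "non_impaired": [("코끼리", "코끼리"), ("사슴", "사슴")],
    "impaired": [("오띠", "코끼리"), ("도띠", "코끼리"),
                 ("토띠", "코끼리"), ("코끼이", "코끼리")],
}


def select_voice_db(judgement, dbs=VOICE_DBS):
    """S210: pick the voice DB that matches the speech ability judgement."""
    return dbs[judgement]


def recognize(utterance, db):
    """S220: return the intended word of the closest entry in the selected DB only."""
    def similarity(entry):
        pronunciation, _word = entry
        return SequenceMatcher(None, pronunciation, utterance).ratio()

    _pronunciation, word = max(db, key=similarity)
    return word
```

For instance, `recognize("도띠", select_voice_db("impaired"))` compares the input against four entries and returns "코끼리" without consulting the non-impaired DB.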

In addition, according to the present invention, there is provided a small device on which a speech recognition application using the above speech ability determination method is installed.

Exemplary embodiments of the present invention are described in detail below.

In a first embodiment, there is provided a speech ability determination method for use in a speech recognition application that people with abnormal pronunciation can use on a small device equipped with a small SoC, the method including

a step of defining a speech error pattern according to speaker type for each word or phrase,

a step of receiving a user's voice, and

a step of determining the user's speech ability from a comparative analysis result.

In a second embodiment, there is provided a speech recognition application using the speech ability determination method according to claim 1, the application including

a step of selecting a voice DB to be used for speech recognition according to the speech ability determination result, and

a step of performing recognition of the user's voice using the selected voice DB.

In a third embodiment, there is provided a small device on which a speech recognition application using the speech ability determination method according to claim 1 is installed.

Although exemplary configurations according to preferred embodiments of the present invention have been described above, the present invention is not limited to those exemplary configurations.

It is apparent that those skilled in the art to which the present invention pertains can readily modify, change, or substitute the exemplary configurations described above without departing from the technical spirit of the present invention.

The appended claims are intended to bring within the scope of protection of the present invention such modifications, changes, or substitutions as do not depart from the scope of its technical spirit.


Claims

A speech ability determination method for use in a speech recognition application that people with abnormal pronunciation can use on a small device equipped with a small SoC, the method comprising:
defining a speech error pattern according to speaker type for each word or phrase;
generating an impairment sound acoustic model and a non-impairment sound acoustic model for the words or phrases defined in the speech error pattern;
receiving a user's voice;
comparing and analyzing the input user voice with the impairment sound acoustic model and the non-impairment sound acoustic model, respectively; and
determining the user's speech ability from the comparative analysis result.

A speech recognition application using the speech ability determination method according to claim 1, comprising:
selecting a voice DB to be used for speech recognition according to the speech ability determination result; and
performing recognition of the user's voice using the selected voice DB.

A small device equipped with a speech recognition application using the method for determining speech ability according to claim 1.