KR20200011176A

KR20200011176A - Korean speech recognition system with pronunciation classification based on vowel pronunciation

Info

Publication number: KR20200011176A
Application number: KR1020180085978A
Authority: KR
Inventors: 권용은
Original assignee: 권용은
Priority date: 2018-07-24
Filing date: 2018-07-24
Publication date: 2020-02-03

Abstract

The present invention relates to a Korean speech recognition and conversion system that uses a pronunciation classification centered on vowel pronunciation. The system recognizes each syllable with the vowel as its reference point and, likewise, converts each character (consisting of at least one consonant and a vowel) with the vowel as its reference point. By classifying vowels in a way that differs from the existing vowel classification and by assigning speech pronunciation features centered on vowel pronunciation, the system can recognize speech with a high recognition rate and convert text to speech with natural pronunciation.

Description

Korean speech recognition system with pronunciation classification based on vowel pronunciation

The present invention relates to a pronunciation classification method centered on vowel pronunciation and to a Korean speech recognition system in which the classification method is employed.

The following provides background art related to the present disclosure; it does not necessarily constitute publicly known prior art.

Speech is the most basic of the many means that humans use to communicate with one another. Speech recognition technology converts human speech into text so that electronic devices such as computers, telephones, and mobile phones can recognize it, and speech conversion (text-to-speech) technology converts text into speech. Speech recognition and speech conversion have been researched extensively for a long time, and both technologies have recently been commercialized in a variety of fields.

In general, speech recognition proceeds as follows. When speech is input through a microphone, the input speech signal is converted into a digital signal. Speech features are extracted from the converted signal, a recognition decision is made by comparing them with a codebook stored in memory, and the result is processed. Here, a codebook is a collection of code words, which are the reference models used for the recognition decision. A speech signal is generally encoded into code words after sampling and quantization.
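The background pipeline above (quantize the signal, then pick the closest code word) can be illustrated with a minimal vector-quantization sketch. It assumes samples normalized to [-1.0, 1.0], uniform quantization, and feature vectors that some front end (not specified here) has already produced; squared Euclidean distance is one common choice for codebook matching, not necessarily the one any particular recognizer uses. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def quantize(samples: Sequence[float], bits: int = 8) -> List[int]:
    """Uniformly quantize samples in [-1.0, 1.0] to signed integer levels.

    Values outside the range are clipped before scaling.
    """
    max_level = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * max_level) for s in samples]


def nearest_code_word(feature: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the code word closest to `feature` (squared Euclidean distance)."""
    def distance(code_word: Vector) -> float:
        return sum((f - c) ** 2 for f, c in zip(feature, code_word))

    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


def encode(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Encode a sequence of feature vectors as a sequence of code-word indices."""
    return [nearest_code_word(f, codebook) for f in features]


if __name__ == "__main__":
    print(quantize([0.0, 1.0, -1.0, 2.0]))  # [0, 127, -127, 127]
    codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    print(encode([(0.1, 0.2), (4.0, 4.5), (0.9, 1.2)], codebook))  # [0, 2, 1]
```

A real recognizer would train the codebook (for example with k-means) and compare sequences of code words against reference models; the sketch stops at the encoding step described in the text.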

In speech conversion, on the other hand, a pronunciation database of phonemes is built, the stored units are concatenated to generate continuous speech, and natural speech is synthesized by adjusting the volume, duration, and pitch of the speech. That is, when text is input, a language processing step analyzes the grammatical structure of the input document and generates, from the analyzed structure, prosody similar to that of a human reader; a waveform synthesis step then collects the basic units of the stored speech DB according to the generated prosody and produces synthesized speech.

Speech recognition and speech conversion achieve a higher recognition rate and more natural output only when the unique speech characteristics of the target language, in particular its pronunciation characteristics, are taken into account. Conventional Korean speech recognition technology, however, was developed by adopting foreign technology as is, without considering the unique characteristics of Korean pronunciation; as a result, its recognition rate is lower and converted speech sounds awkward.

The present invention has been made to solve the above problems. It provides a method for assigning speech pronunciation features according to the pronunciation characteristics unique to Korean, and it uses the speech pronunciation data classified by those assigned features in Korean speech recognition and speech conversion technology.

However, the objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

The present invention provides a Korean speech recognition system employing a pronunciation classification centered on vowel pronunciation, characterized in that, for the Korean language, each syllable is recognized with the vowel as its center.

The pronunciation classification centered on vowel pronunciation is further characterized by including a first classification method that classifies vowels, according to their sound, into three series: first series vowels, second series vowels, and third series vowels.

The first series vowels include ㅏ, ㅓ, ㅗ, ㅜ, ㅡ, ㅣ, ㅐ, and ㅔ; the second series vowels include ㅑ, ㅕ, ㅛ, ㅠ, ㅒ, and ㅖ; and the third series vowels include ㅘ, ㅙ, ㅞ, ㅚ, ㅝ, ㅟ, and ㅢ.

The pronunciation classification centered on vowel pronunciation is further characterized by including a second classification method that assigns each vowel two pronunciation shapes: a first pronunciation shape and a second pronunciation shape.

The pronunciation shapes are assigned using the first series vowels and the second series vowels.

In the second classification method, the two pronunciation shapes of a first series or third series vowel are assigned using first series vowels, and the two pronunciation shapes of a second series vowel are assigned using a first series vowel and a second series vowel.

More specifically, in the second classification method, a first series vowel is assigned the same first series vowel as both its first and second pronunciation shapes; a second series vowel is assigned a second series vowel as its first pronunciation shape and a first series vowel as its second pronunciation shape; and a third series vowel is assigned two different first series vowels as its first and second pronunciation shapes.

The pronunciation classification centered on vowel pronunciation is further characterized by including a third classification method that places stress on one of the two pronunciation shapes (the first or the second) assigned to each vowel.

In the third classification method, stress is placed on the first of the two pronunciation shapes for first series and second series vowels, and on the second pronunciation shape for third series vowels.

The Korean speech recognition system converts input speech into the corresponding characters using a speech recognition database in which the pronunciation classification centered on vowel pronunciation is stored.

The Korean speech recognition system includes: a speech input unit that receives speech; an A/D converter that converts the speech input through the speech input unit into a digital speech signal; a feature extraction unit that extracts pronunciation features from the digital speech signal produced by the A/D converter; and a discrimination unit that compares the pronunciation features extracted by the feature extraction unit with the pronunciation classification stored in the speech recognition database to determine which character they correspond to.

The present invention classifies vowels by a method different from the existing vowel classification, assigns speech pronunciation features centered on vowel pronunciation, and provides a speech pronunciation database organized according to the assigned features so that it can be applied to speech recognition and speech conversion. As a result, speech can be recognized with an excellent recognition rate when converting speech to text, and text can be converted to speech with natural pronunciation.

Before the present invention is described in detail, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the invention, which is limited only by the appended claims. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art.

Throughout this specification and the claims, unless otherwise indicated, the terms "comprise", "comprises", and "comprising" mean the inclusion of the stated article, step, or group of articles or steps, and are not used in a sense that excludes any other article, step, or group of articles or steps.

The various embodiments of the present invention may be combined with any other embodiment unless clearly indicated to the contrary. In particular, any feature indicated as preferred or advantageous may be combined with any other feature or features indicated as preferred or advantageous. Embodiments of the present invention and their effects are described below with reference to the accompanying drawings.

Unlike other languages, English in particular, Korean consonants are not pronounced on their own: a consonant becomes pronounceable only when combined with a vowel, and a consonant used as a final consonant (batchim) is likewise pronounceable only together with a vowel. Although the vowel is thus the basis of Korean syllable pronunciation, conventional speech recognition and speech conversion techniques did not take a Korean-specific vowel classification into account, which led to low recognition rates and awkward synthesized speech.

The present invention classifies vowels by a method different from the existing one and assigns speech pronunciation features centered on vowel pronunciation. A speech pronunciation database built according to these features is then applied to speech recognition and speech conversion, enabling speech recognition with an excellent recognition rate and speech conversion with natural pronunciation.

First, the existing method divides vowels into monophthongs (ㅏ, ㅔ, ㅓ, ㅐ, ㅗ, ㅚ, ㅜ, ㅟ, ㅡ, ㅣ), during whose pronunciation the lips and tongue stay fixed, and diphthongs (ㅑ, ㅒ, ㅕ, ㅖ, ㅘ, ㅙ, ㅛ, ㅝ, ㅞ, ㅠ, ㅢ), during whose pronunciation the lip shape changes or the tongue moves. As shown in Table 1 below, the monophthongs are further classified into front and back vowels according to whether the highest point of the tongue is toward the front or the back; into high (close), mid, and low (open) vowels according to whether the tongue is high, mid, or low; and into rounded and unrounded vowels according to whether the lips are rounded or spread during pronunciation.

Table 1
Tongue height \ Lip shape | Front: unrounded | Front: rounded | Back: unrounded | Back: rounded
High (close)              | ㅣ               | ㅟ             | ㅡ              | ㅜ
Mid                       | ㅔ               | ㅚ             | ㅓ              | ㅗ
Low (open)                | ㅐ               | -              | ㅏ              | -

Although vowels are classified as above, the existing scheme has no classification based on sound, that is, on how the vowels are actually pronounced, so it cannot separate sounds accurately. The existing vowel classification is therefore unsuitable for speech recognition and speech conversion.

In general, a Korean sound is recognized as a syllable, i.e., a combination of consonants and a vowel: one syllable is a unit having at least one consonant and a vowel. In the present invention, however, syllables are recognized with the vowel as their reference point.

Korean sound is vowel-centered: without a vowel, neither an initial consonant nor a final consonant produces sound. Consonants have sound values, but a consonant by itself cannot be pronounced as a single sound. This pronunciation characteristic is unique to Korean and differs greatly from English. For example, the English consonants P and F can be sounded on their own, whereas the Korean consonant ㅍ cannot be sounded without a vowel.

Accordingly, the present invention provides a classification method that, in keeping with this characteristic of Korean pronunciation, uses the vowel as the reference for the syllable. That is, when building the speech pronunciation database used for Korean speech recognition and speech conversion, syllables are delimited around vowels rather than classified as conventional syllables formed by combining consonants and vowels.

Specifically, the present invention first classifies vowels according to their sound into three series: the A series, the J series (the Y-glide series), and the W series. Next, every classified vowel is assigned two pronunciation shapes, and stress is placed on one of the two shapes to give the vowel its pronunciation form. Vowel pronunciations are thus distinguished first by sound, second by their two pronunciation shapes, and third by their pronunciation form, which allows pronunciations to be told apart more clearly and with a higher recognition rate.

The vowel classification method according to the present invention is summarized in Table 2 below.

Table 2
Sound | Series            | First pronunciation shape | Second pronunciation shape | Pronunciation form (stressed shape)
ㅏ    | A (ㅏ[a] series)  | ㅏ | ㅏ | First shape
ㅓ    | A                 | ㅓ | ㅓ | First shape
ㅗ    | A                 | ㅗ | ㅗ | First shape
ㅜ    | A                 | ㅜ | ㅜ | First shape
ㅡ    | A                 | ㅡ | ㅡ | First shape
ㅣ    | A                 | ㅣ | ㅣ | First shape
ㅐ    | A                 | ㅐ | ㅐ | First shape
ㅔ    | A                 | ㅔ | ㅔ | First shape
ㅑ    | J (ㅑ[ja] series) | ㅑ | ㅏ | First shape
ㅕ    | J                 | ㅕ | ㅓ | First shape
ㅛ    | J                 | ㅛ | ㅗ | First shape
ㅠ    | J                 | ㅠ | ㅜ | First shape
ㅒ    | J                 | ㅒ | ㅐ | First shape
ㅖ    | J                 | ㅖ | ㅔ | First shape
ㅘ    | W (ㅘ[wa] series) | ㅗ | ㅏ | Second shape
ㅙ    | W                 | ㅗ | ㅐ | Second shape
ㅞ    | W                 | ㅜ | ㅔ | Second shape
ㅚ    | W                 | ㅗ | ㅔ | Second shape
ㅝ    | W                 | ㅜ | ㅓ | Second shape
ㅟ    | W                 | ㅜ | ㅣ | Second shape
ㅢ    | W                 | ㅡ | ㅣ | Second shape

As shown in Table 2, the vowels are first classified by sound: ㅏ, ㅓ, ㅗ, ㅜ, ㅡ, ㅣ, ㅐ, and ㅔ into the A series; ㅑ, ㅕ, ㅛ, ㅠ, ㅒ, and ㅖ into the J series; and ㅘ, ㅙ, ㅞ, ㅚ, ㅝ, ㅟ, and ㅢ into the W series.

Next, two pronunciation shapes are assigned to every vowel, without distinguishing between the previously classified monophthongs and diphthongs. As shown in Table 2, each vowel receives a first pronunciation shape and a second pronunciation shape. The vowels used as pronunciation shapes are drawn from the A series and the J series; W series vowels are never used as shapes.

Here, the two pronunciation shapes of A series and W series vowels are assigned using A series vowels, and the two pronunciation shapes of J series vowels are assigned using an A series vowel and a J series vowel.

More specifically, an A series vowel is assigned the same A series vowel (itself) as both its first and second pronunciation shapes. A J series vowel is assigned a J series vowel (itself) as its first pronunciation shape and an A series vowel as its second pronunciation shape. A W series vowel is assigned two different A series vowels as its first and second pronunciation shapes.
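As a sketch, Table 2 can be encoded as a lookup table and the assignment rules just described checked against it. The names `SHAPES`, `series_of`, and `rule_violations` are hypothetical; the data are copied from Table 2, and the checks only restate the rules of the second classification method.

```python
from typing import Dict, List, Tuple

# Series membership (first classification method), from Table 2.
A_SERIES = set("ㅏㅓㅗㅜㅡㅣㅐㅔ")
J_SERIES = set("ㅑㅕㅛㅠㅒㅖ")
W_SERIES = set("ㅘㅙㅞㅚㅝㅟㅢ")

# vowel -> (first pronunciation shape, second pronunciation shape), copied from Table 2.
SHAPES: Dict[str, Tuple[str, str]] = {
    "ㅏ": ("ㅏ", "ㅏ"), "ㅓ": ("ㅓ", "ㅓ"), "ㅗ": ("ㅗ", "ㅗ"), "ㅜ": ("ㅜ", "ㅜ"),
    "ㅡ": ("ㅡ", "ㅡ"), "ㅣ": ("ㅣ", "ㅣ"), "ㅐ": ("ㅐ", "ㅐ"), "ㅔ": ("ㅔ", "ㅔ"),
    "ㅑ": ("ㅑ", "ㅏ"), "ㅕ": ("ㅕ", "ㅓ"), "ㅛ": ("ㅛ", "ㅗ"), "ㅠ": ("ㅠ", "ㅜ"),
    "ㅒ": ("ㅒ", "ㅐ"), "ㅖ": ("ㅖ", "ㅔ"),
    "ㅘ": ("ㅗ", "ㅏ"), "ㅙ": ("ㅗ", "ㅐ"), "ㅞ": ("ㅜ", "ㅔ"), "ㅚ": ("ㅗ", "ㅔ"),
    "ㅝ": ("ㅜ", "ㅓ"), "ㅟ": ("ㅜ", "ㅣ"), "ㅢ": ("ㅡ", "ㅣ"),
}


def series_of(vowel: str) -> str:
    """Return 'A', 'J', or 'W' for a vowel listed in Table 2."""
    for name, members in (("A", A_SERIES), ("J", J_SERIES), ("W", W_SERIES)):
        if vowel in members:
            return name
    raise KeyError(f"not a vowel in Table 2: {vowel!r}")


def rule_violations(shapes: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the vowels whose shapes break the second classification method's rules."""
    bad = []
    for vowel, (first, second) in shapes.items():
        series = series_of(vowel)
        if series == "A":    # both shapes are the vowel itself
            ok = first == second == vowel
        elif series == "J":  # first shape is the vowel itself, second is an A series vowel
            ok = first == vowel and second in A_SERIES
        else:                # W: two different A series vowels
            ok = first != second and first in A_SERIES and second in A_SERIES
        if not ok:
            bad.append(vowel)
    return bad


if __name__ == "__main__":
    print(len(SHAPES), rule_violations(SHAPES))  # 21 []
    print(SHAPES["ㅔ"], SHAPES["ㅚ"])  # ('ㅔ', 'ㅔ') ('ㅗ', 'ㅔ')
```

Keeping the series sets separate from the shape table makes the rules checkable: a mistyped row (for example giving ㅘ the shapes ㅏ, ㅏ) is reported by `rule_violations`.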

Under the classification method of the present invention, even vowels previously classified as monophthongs receive two pronunciation shapes. For example, ㅔ, conventionally a monophthong, is classified as [ㅔ, ㅔ], and ㅚ is classified as [ㅗ, ㅔ].

The inventor found that, when a vowel is sounded, it influences both the initial consonant and the final consonant, and thereby recognized that the vowel sound is the reference sound of a Korean syllable. Assigning two pronunciation shapes to each vowel sound raises the recognition probability and makes natural pronunciation possible.

According to the present invention, the pronunciation of a single character can be divided into two sounds. A character consisting of initial consonant + vowel + final consonant is divided into [consonant + vowel (first pronunciation shape)] and [vowel (second pronunciation shape) + final consonant], and a character consisting of consonant + vowel is divided into [consonant + vowel (first pronunciation shape)] and [vowel (second pronunciation shape)].
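The two-sound split can be sketched with the standard Unicode Hangul syllable arithmetic, in which a precomposed syllable is U+AC00 + (initial × 21 + vowel) × 28 + final. The sketch assumes that the second sound is written with the silent initial ㅇ, as the patent does for [운] and [암], and it uses only an excerpt of Table 2; `split_syllable` and `EXCERPT_SHAPES` are illustrative names, not part of the patent.

```python
from typing import Dict, Tuple

# Unicode order of initials (19), vowels (21), and finals (28, index 0 = none).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
BASE = 0xAC00  # '가'

# Excerpt of Table 2: vowel -> (first pronunciation shape, second pronunciation shape).
EXCERPT_SHAPES: Dict[str, Tuple[str, str]] = {
    "ㅏ": ("ㅏ", "ㅏ"), "ㅜ": ("ㅜ", "ㅜ"), "ㅕ": ("ㅕ", "ㅓ"), "ㅘ": ("ㅗ", "ㅏ"),
}


def decompose(syllable: str) -> Tuple[str, str, str]:
    """Split a precomposed Hangul syllable into (initial, vowel, final); final may be ''."""
    index = ord(syllable) - BASE
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return INITIALS[index // 588], VOWELS[(index % 588) // 28], FINALS[index % 28]


def compose(initial: str, vowel: str, final: str = "") -> str:
    """Build a precomposed syllable from its initial, vowel, and optional final."""
    code = (INITIALS.index(initial) * 21 + VOWELS.index(vowel)) * 28 + FINALS.index(final)
    return chr(BASE + code)


def split_syllable(syllable: str, shapes: Dict[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Return [initial + first shape] and [second shape + final] as two written syllables."""
    initial, vowel, final = decompose(syllable)
    first, second = shapes[vowel]
    # The second sound has no initial consonant, so it is written with the silent ㅇ.
    return compose(initial, first), compose("ㅇ", second, final)


if __name__ == "__main__":
    for ch in "눈괌가별":
        print(ch, split_syllable(ch, EXCERPT_SHAPES))
    # 눈 ('누', '운'), 괌 ('고', '암'), 가 ('가', '아'), 별 ('벼', '얼')
```

A character without a final consonant, such as '가', yields a second sound with no final, matching the consonant + vowel case in the text.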

Finally, stress is placed on one of the two assigned pronunciation shapes. That is, the classification specifies which of the two sounds within a character, [consonant + vowel (first pronunciation shape)] or [vowel (second pronunciation shape) + (final consonant)], is pronounced more strongly.

As shown in Table 2, A series and J series vowels are stressed on the first of their two pronunciation shapes, and W series vowels on the second. This distinction further raises the recognition probability and makes natural pronunciation possible.

For convenience of description, the series, the first pronunciation shape, and the second pronunciation shape are written in that order within square brackets, with the stressed shape enclosed in double quotation marks. For example, ㅚ is written as [W, ㅗ, “ㅔ”].
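The stress rule and the bracket notation can be expressed as a small formatting helper. This is a sketch: `stressed_index` and `notation` are hypothetical names, and the rule they encode (first shape stressed for the A and J series, second shape for the W series) is taken from Table 2.

```python
def stressed_index(series: str) -> int:
    """Third classification method: 0 = first shape stressed, 1 = second shape stressed."""
    if series in ("A", "J"):
        return 0
    if series == "W":
        return 1
    raise ValueError(f"unknown series: {series!r}")


def notation(series: str, first: str, second: str) -> str:
    """Format a vowel as [series, first shape, second shape], quoting the stressed shape."""
    shapes = [first, second]
    i = stressed_index(series)
    shapes[i] = f"\u201c{shapes[i]}\u201d"  # curly double quotes, as in the text
    return f"[{series}, {shapes[0]}, {shapes[1]}]"


if __name__ == "__main__":
    print(notation("W", "ㅗ", "ㅔ"))  # [W, ㅗ, “ㅔ”]
    print(notation("J", "ㅑ", "ㅏ"))  # [J, “ㅑ”, ㅏ]
```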

For example, the pronunciation of the character '눈' (nun) can be divided into two sounds around the vowel ㅜ. First, the first pronunciation shape of ㅜ combines with the initial consonant ㄴ and the sound [누] (nu) is pronounced; then the second pronunciation shape of ㅜ combines with the final consonant ㄴ and the sound [운] (un) follows. In other words, the vowel is prolonged (pronounced twice) while being influenced by the initial and final consonants, and the character is pronounced as a whole. Because ㅜ is an A series vowel, the stress falls on the first sound.

As another example, in the character '괌' (gwam, "Guam"), the first pronunciation shape of the vowel ㅘ (ㅗ) combines with the initial consonant ㄱ and the sound [고] (go) is pronounced first; then the second pronunciation shape of ㅘ (ㅏ) combines with the final consonant ㅁ and the sound [암] (am) follows. Because ㅘ is a W series vowel, the stress falls on the second sound.

The classification scheme used for Korean pronunciation has a large effect on the recognition rate of speech recognition and on the pronunciation quality of speech conversion when it is incorporated into those technologies. By providing the vowel-centered Korean speech pronunciation classification scheme described above, the present invention enables speech recognition with a higher recognition rate when recognizing speech and converting it to text, and speech conversion with more natural pronunciation when converting text to speech.

The present invention can provide a speech pronunciation database classified by the above classification method, and a speech recognition system and a speech conversion system that use this database.

In a speech recognition system, for example, when speech is input, it is converted into a digital speech signal, pronunciation features are extracted from the digital speech signal, and the corresponding characters are determined by comparison with the speech pronunciation database according to the present invention.

More specifically, the speech recognition system 10 includes a speech input unit 11, an A/D converter 12, a feature extraction unit 13, a speech recognition database 14, and a discrimination unit 15. The speech recognition database 14 contains the speech pronunciation database according to the present invention.

The speech input unit 11, for example a microphone, receives the user's speech. The A/D converter 12 converts the analog speech signal input through the speech input unit 11 into a digital speech signal. The feature extraction unit 13 extracts pronunciation feature data from the converted speech signal. The discrimination unit 15 compares the pronunciation feature data extracted by the feature extraction unit 13 with the speech pronunciation database 14a stored in the speech recognition database 14 to determine which character it corresponds to.

Here, the discrimination unit 15 can classify the digital speech signal according to the vowel-centered Korean speech pronunciation classification of the present invention and compare it with the database to determine the corresponding character; the classification scheme according to the present invention provides a better recognition rate.
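A minimal end-to-end sketch of units 11 to 15 follows. The patent does not specify the pronunciation features, so this example assumes, purely for illustration, a two-element feature: the mean absolute amplitude of the first and second halves of a syllable, on the idea that the stressed half is louder (the first half for A/J series vowels, the second for W series vowels). The template values are invented toy data and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, float]


def a_d_convert(analog: Sequence[float], bits: int = 16) -> List[int]:
    """A/D converter 12: clip to [-1.0, 1.0] and quantize to signed integers."""
    max_level = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, x)) * max_level) for x in analog]


def extract_features(digital: Sequence[int], bits: int = 16) -> Features:
    """Feature extraction unit 13 (hypothetical): normalized mean |amplitude| of each half."""
    full_scale = 2 ** (bits - 1) - 1
    half = len(digital) // 2
    if half == 0:
        raise ValueError("need at least two samples")
    first = sum(abs(s) for s in digital[:half]) / (half * full_scale)
    second = sum(abs(s) for s in digital[half:2 * half]) / (half * full_scale)
    return first, second


def discriminate(features: Features, database: Dict[str, Features]) -> str:
    """Discrimination unit 15: pick the character whose stored template is nearest."""
    return min(database,
               key=lambda ch: sum((f - t) ** 2 for f, t in zip(features, database[ch])))


def recognize(analog: Sequence[float], database: Dict[str, Features]) -> str:
    """`analog` stands in for speech input unit 11; `database` for database 14."""
    return discriminate(extract_features(a_d_convert(analog)), database)


if __name__ == "__main__":
    # Toy templates: louder first half ('눈', A series ㅜ), louder second half ('괌', W series ㅘ).
    templates = {"눈": (0.8, 0.4), "괌": (0.4, 0.8)}
    loud_then_soft = [0.8, -0.8] * 4 + [0.4, -0.4] * 4
    print(recognize(loud_then_soft, templates))        # 눈
    print(recognize(loud_then_soft[::-1], templates))  # 괌
```

In practice the feature extraction unit would produce spectral features and the database would hold many templates per syllable; the nearest-template rule is only one simple way to realize the comparison performed by unit 15.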

In a speech conversion system, for example, when text is input, the pronunciation corresponding to each input character is retrieved from the speech pronunciation database according to the present invention and speech for that character is generated.

More specifically, the speech conversion system 20 includes a text analysis unit 21, a speech conversion database 22, a speech synthesis unit 23, and a speech output unit 24. The speech conversion database 22 contains the speech pronunciation database according to the present invention.

The speech conversion database 22 contains information defining the speech data corresponding to each item of text data, and information defining feature data for the speech data, such as phonology, pronunciation, and stress; the classification scheme according to the present invention can be applied to this feature data.

The text analysis unit 21 receives text data and, using the speech conversion database 22, performs morphological analysis and tagging, syntactic analysis, semantic analysis, syllable/phoneme conversion, and the like.

The speech synthesis unit 23 receives the text feature data extracted by the text analysis unit, generates synthesized speech using the speech data stored in the speech conversion database 22 for the corresponding text data, and outputs the speech through the speech output unit 24, for example a speaker. The classification scheme according to the present invention allows more natural Korean pronunciation to be output.
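The flow from the text analysis unit 21 to the speech output unit 24 can be sketched as follows. Everything here is an assumption for illustration: each syllable's database entry is taken to hold two sample sequences (its first and second sounds) plus the index of the stressed sound, the stressed sound is simply scaled by a hypothetical `STRESS_GAIN`, and text analysis is reduced to splitting the text into syllables. Morphological, syntactic, and prosodic analysis are omitted.

```python
from typing import Dict, List, Sequence, Tuple

Unit = List[float]
# (samples of the first sound, samples of the second sound, index of the stressed sound)
Entry = Tuple[Unit, Unit, int]

STRESS_GAIN = 1.5  # hypothetical amplitude boost for the stressed sound


def analyze_text(text: str) -> List[str]:
    """Text analysis unit 21, reduced to splitting text into syllables (whitespace dropped)."""
    return [ch for ch in text if not ch.isspace()]


def synthesize(syllables: Sequence[str], database: Dict[str, Entry]) -> List[float]:
    """Speech synthesis unit 23: concatenate each syllable's two sounds, boosting the stressed one."""
    output: List[float] = []
    for syllable in syllables:
        first, second, stressed = database[syllable]
        for i, unit in enumerate((first, second)):
            gain = STRESS_GAIN if i == stressed else 1.0
            output.extend(sample * gain for sample in unit)
    return output


if __name__ == "__main__":
    # Toy speech conversion database 22: '눈' stresses its first sound, '괌' its second.
    db = {"눈": ([0.5, 0.5], [0.25], 0), "괌": ([0.25], [0.5], 1)}
    samples = synthesize(analyze_text("눈 괌"), db)
    print(samples)  # [0.75, 0.75, 0.25, 0.25, 0.75]; speech output unit 24 would play these
```

Storing the stressed index per entry, rather than recomputing it from the vowel series, keeps the synthesizer independent of how the database was built.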

The features, structures, effects, and the like illustrated in each of the embodiments above may be combined or modified for other embodiments by those of ordinary skill in the art to which the embodiments belong. Such combinations and modifications should therefore be construed as falling within the scope of the present invention.

Claims

A Korean speech recognition system employing a pronunciation classification centered on vowel pronunciation, wherein,
for the Korean language, each syllable is recognized with the vowel as its center.

The Korean speech recognition system of claim 1, wherein
the pronunciation classification centered on vowel pronunciation includes a first classification method that classifies vowels, according to their sound, into three series: first series vowels, second series vowels, and third series vowels.

The Korean speech recognition system of claim 2, wherein
the first series vowels include ㅏ, ㅓ, ㅗ, ㅜ, ㅡ, ㅣ, ㅐ, and ㅔ,
the second series vowels include ㅑ, ㅕ, ㅛ, ㅠ, ㅒ, and ㅖ, and
the third series vowels include ㅘ, ㅙ, ㅞ, ㅚ, ㅝ, ㅟ, and ㅢ.

The Korean speech recognition system of claim 3, wherein
the pronunciation classification centered on vowel pronunciation includes a second classification method that assigns each vowel two pronunciation shapes, namely a first pronunciation shape and a second pronunciation shape.

The Korean speech recognition system of claim 4, wherein
the pronunciation shapes are assigned using the first series vowels and the second series vowels.

The Korean speech recognition system of claim 5, wherein
in the second classification method,
for the first series vowels and the third series vowels, the two pronunciation shapes are assigned using first series vowels, and
for the second series vowels, the two pronunciation shapes are assigned using a first series vowel and a second series vowel.

The Korean speech recognition system of claim 5, wherein
in the second classification method,
for a first series vowel, the same first series vowel is assigned as both the first pronunciation shape and the second pronunciation shape,
for a second series vowel, a second series vowel is assigned as the first pronunciation shape and a first series vowel is assigned as the second pronunciation shape, and
for a third series vowel, different first series vowels are assigned as the first pronunciation shape and the second pronunciation shape.

The Korean speech recognition system of claim 4, wherein
the pronunciation classification centered on vowel pronunciation includes a third classification method that places stress on one of the first pronunciation shape and the second pronunciation shape assigned to each vowel.

The Korean speech recognition system of claim 8, wherein
in the third classification method,
for the first series vowels and the second series vowels, stress is placed on the first pronunciation shape of the two pronunciation shapes, and
for the third series vowels, stress is placed on the second pronunciation shape of the two pronunciation shapes.

The Korean speech recognition system of any one of claims 1 to 9, wherein
the Korean speech recognition system converts input speech into the corresponding characters using a speech recognition database in which the pronunciation classification centered on vowel pronunciation is stored.

The Korean speech recognition system of claim 10, wherein
the Korean speech recognition system comprises:
a speech input unit that receives speech;
an A/D converter that converts the speech input through the speech input unit into a digital speech signal;
a feature extraction unit that extracts pronunciation features from the digital speech signal converted by the A/D converter; and
a discrimination unit that compares the pronunciation features extracted by the feature extraction unit with the pronunciation classification stored in the speech recognition database to determine which character they correspond to.