KR20180025559A

KR20180025559A - Apparatus and Method for Learning Pronunciation Dictionary

Info

Publication number: KR20180025559A
Application number: KR1020160112381A
Authority: KR
Inventors: 곽철
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2016-09-01
Filing date: 2016-09-01
Publication date: 2018-03-09
Also published as: KR102615290B1

Abstract

The present invention relates to an apparatus and a method for learning a pronunciation dictionary. When a pronunciation string based on a first language is to be generated for a word or phrase in which the first language and a second language are mixed, a grapheme sequence is generated and converted in consideration of the phonetic symbols and the liaison rule of the first language. Because this does not require the pre-segmentation of training data or the preset context length that are limiting factors in existing LSTM-CTC-based G2P, errors caused by those limiting factors are reduced and a pronunciation dictionary with more accurate pronunciation information can be generated.

Description

Apparatus and Method for Learning Pronunciation Dictionary

The present invention relates to a method and apparatus for learning a pronunciation dictionary for speech recognition. More particularly, it relates to a pronunciation dictionary learning method and apparatus that, when learning a pronunciation dictionary for one language in consideration of the phonetic symbols of another language, learns the dictionary based on the differences between the two language systems so that a pronunciation close to that of the other language can be provided.

The content described in this section merely provides background information on the present embodiment and does not constitute prior art.

G2P (grapheme-to-phoneme) conversion is widely used to generate the pronunciation dictionary that stores the pronunciation information of the words a speech recognition system needs.

Methods of generating a pronunciation dictionary with G2P include rule-based G2P, data-driven G2P, and G2P based on LSTM-CTC (Long Short-Term Memory - Connectionist Temporal Classification).

Among these, rule-based G2P generates pronunciation strings using dictionary information, but it is limited in generating the varied pronunciation strings that different speakers produce.

Data-driven G2P builds a statistical model of pronunciation strings from training data and can use this model to generate various pronunciation strings for a word. However, because it needs label information for each frame, it requires pre-segmented training data, and errors introduced during pre-segmentation degrade G2P performance and produce an inaccurate pronunciation dictionary.

For this reason, pronunciation dictionary learning with G2P now mainly uses the LSTM-CTC-based G2P technique. This method does not require pre-segmented training data and can use a dynamic context length as well as context information that appears after a given point, so it can generate a more accurate pronunciation dictionary.

However, existing LSTM-CTC-based G2P trains its pronunciation model with English graphemes as input, in order to learn a G2P model composed of the phonemes of English phonetic symbols. As a result, when the input word is Korean, it cannot generate a pronunciation string and therefore cannot store a pronunciation dictionary entry for the Korean word.

Korean Patent Laid-Open Publication No. 10-2016-0089210 (Title: Language Model Learning Method and Apparatus, Language Recognition Method and Apparatus, 2016.07.27.)

The present invention has been proposed to solve the problems described above. It aims to generate a pronunciation string in a specific language (Korean) for a word or phrase in which multiple languages, for example Korean (Hangul) and English, are mixed. In particular, it aims to provide a method and apparatus that generate and convert a grapheme sequence in consideration of the Korean phonetic symbols for English and the liaison rule, and that generate a pronunciation string from that grapheme sequence, so that a pronunciation dictionary with accurate pronunciation information can be produced in a variety of situations.

However, the objects of the present invention are not limited to the above, and other objects not mentioned here will be clearly understood from the description below.

To achieve the above object, a pronunciation dictionary learning method according to an embodiment of the present invention may include: generating a grapheme sequence of a second language by dividing a second-language character string, consisting of a word or phrase, into graphemes; converting the second-language grapheme sequence based on a first rule set in consideration of the phonetic symbols of a first language; and generating a pronunciation string corresponding to the word or phrase based on the converted grapheme sequence.

The method may further include, before the generating step, separating a character string that contains both the first language and the second language into a first-language character string and a second-language character string. Before the step of generating the pronunciation string, the method may further include: generating a first-language grapheme sequence by dividing the first-language character string into graphemes; generating an overall grapheme sequence by combining the first-language grapheme sequence with the converted second-language grapheme sequence; and converting the combined overall grapheme sequence based on a second rule set in consideration of the liaison rule. In this case, the step of generating the pronunciation string may generate a pronunciation string for the overall grapheme sequence.

When the first language is Korean (Hangul) and the second language is English, the step of converting the overall grapheme sequence may, where a Hangul vowel frame follows a consonant alphabet frame or a vowel alphabet frame follows a Hangul final-consonant (jongseong) frame within the overall grapheme sequence, replace the consonant alphabet frame or the Hangul final-consonant frame with the corresponding liaison frame. Also in this case, the step of converting the second-language grapheme sequence may insert a predetermined frame representing the Korean sound '으' (eu) between two consecutive consonant alphabet frames, or after a consonant alphabet frame located at the end of the grapheme sequence. The step of generating the pronunciation string corresponding to the word or phrase may generate the pronunciation string based on at least one of an LSTM (Long Short-Term Memory) technique and a CTC (Connectionist Temporal Classification) technique.

The method may further include storing the generated pronunciation string in association with the word or phrase.

Meanwhile, the pronunciation dictionary learning method described above may also be provided as a computer program that is implemented to execute the method and is stored on a computer-readable recording medium.

To achieve the above object, a pronunciation dictionary learning apparatus according to the present invention may include: a grapheme sequence generation module that generates a grapheme sequence of a second language by dividing a second-language character string, consisting of a word or phrase, into graphemes; a grapheme sequence conversion module that converts the second-language grapheme sequence based on a first rule set in consideration of the phonetic symbols of a first language; and a pronunciation string generation module that generates a pronunciation string corresponding to the word or phrase based on the converted grapheme sequence.

The apparatus may further include a language separation module that separates a character string containing both the first language and the second language into a first-language character string and a second-language character string. The grapheme sequence generation module further generates a first-language grapheme sequence by dividing the first-language character string into graphemes. The grapheme sequence conversion module combines the first-language grapheme sequence with the converted second-language grapheme sequence to generate an overall grapheme sequence, and converts the combined overall grapheme sequence based on a second rule set in consideration of the liaison rule. The pronunciation string generation module may then generate a pronunciation string for the converted overall grapheme sequence.

According to the present invention, when a pronunciation string based on a first language is to be generated for a word or phrase in which the first language and a second language are mixed, a grapheme sequence is generated and converted in consideration of the phonetic symbols and the liaison rule of the first language, and the pronunciation string is generated from that grapheme sequence. This does not require the pre-segmentation of training data or the preset context length that are limiting factors in existing LSTM-CTC-based G2P, and it reduces the errors those limiting factors cause, so a pronunciation dictionary with more accurate pronunciation information can be generated.

In addition, various effects other than those described above may be disclosed, directly or implicitly, in the detailed description of the embodiments of the present invention below.

FIG. 1 is a diagram illustrating a system for implementing a pronunciation dictionary learning method according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a pronunciation dictionary learning apparatus according to the present invention.
FIG. 3 is an exemplary diagram for explaining a BLSTM (Bidirectional Long Short-Term Memory) structure.
FIG. 4 is a flowchart illustrating the operation of the pronunciation dictionary learning apparatus according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram for explaining an embodiment according to the present invention.
FIG. 6 is a block diagram for explaining the overall structure of a speech recognition system to which an embodiment of the present invention is applied.

To further clarify the features and advantages of the means for solving the problems of the present invention, the invention is described in more detail below with reference to specific embodiments illustrated in the accompanying drawings.

However, in the following description and the accompanying drawings, detailed descriptions of well-known functions or configurations that could obscure the gist of the present invention are omitted. Note also that the same components are denoted by the same reference numerals wherever possible throughout the drawings.

The terms and words used in the following description and drawings should not be construed as limited to their ordinary or dictionary meanings. Rather, based on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical ideas, so it should be understood that various equivalents and modifications that could replace them may exist at the time of filing.

In addition, embodiments within the scope of the present invention include computer-readable media that carry or have stored on them computer-executable instructions or data structures. Such computer-readable media may be any available media accessible by a general-purpose or special-purpose computer system. By way of example, such computer-readable media may include, but are not limited to, physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store or carry desired program code means in the form of computer-executable instructions, computer-readable instructions, or data structures and that can be accessed by a general-purpose or special-purpose computer system.

In generating a pronunciation dictionary that stores the pronunciation information of words needed for speech recognition, the present invention aims to represent a word or phrase in which multiple languages are mixed as a pronunciation string in a specific language. In the following description and the claims, the word or phrase according to the present invention is assumed to be one in which Hangul and the English alphabet are mixed.

That is, the first language recited in the claims may correspond to Korean (Hangul) in this detailed description, and the second language may correspond to English or the English alphabet.

Likewise, the phonetic symbols of the first language recited in the claims may correspond to the Korean phonetic symbols for English.

However, this is only to aid understanding of the invention. The first and second languages may be any languages, including languages other than Korean and English such as French, Spanish, Chinese, or Japanese, as long as the goal is to generate a pronunciation string for a word or syllable that mixes two languages whose corresponding phonetic symbols differ linguistically.

Hereinafter, for ease of explanation and understanding, the description is based on generating a pronunciation string for a word or phrase in which Korean and English are mixed.

Note, however, that as explained above, the invention is not limited to a mixture of Korean and English: for any two different languages, either one may serve as the first language or the second language.

In the description that follows, the term "frame" refers to each grapheme or phonetic symbol that makes up a grapheme sequence or pronunciation string generated by the pronunciation dictionary learning apparatus.

That is, one frame may contain one grapheme or one phonetic symbol.

A pronunciation dictionary learning method according to an embodiment of the present invention will now be described in detail with reference to the drawings.

FIG. 1 is a diagram showing the configuration of a system for providing a pronunciation dictionary learning method according to an embodiment of the present invention.

Referring to FIG. 1, a system for providing a pronunciation dictionary learning method according to the present embodiment may include a text DB (100), a pronunciation dictionary learning apparatus (200), and a pronunciation dictionary storage device (300).

Each component is described briefly below with reference to FIG. 1.

First, the text DB (100) is a device that stores words or phrases, in which English and Korean are mixed, received from an external device or a user.

The text DB (100) delivers a specific word or phrase in which English and Korean are mixed to the pronunciation dictionary learning apparatus (200).

The pronunciation dictionary learning apparatus (200) is a device that generates a pronunciation string for the specific word or phrase received from the text DB (100). Specifically, it generates a grapheme sequence from the input word or phrase and then generates a pronunciation string from that grapheme sequence. In the present invention, the input word or phrase may contain a mixture of English and Korean.

Accordingly, the pronunciation dictionary learning apparatus (200) separates the Korean and English contained in a specific word or phrase into a first character string consisting of the Hangul portion and a second character string consisting of the English portion. It divides the first character string into graphemes at the jamo level to generate a first grapheme sequence, and divides the second character string into graphemes at the letter level to generate a second grapheme sequence. It then converts the second grapheme sequence based on a first rule set in consideration of the Korean phonetic symbols for English, and combines the converted second grapheme sequence with the first grapheme sequence to generate an overall grapheme sequence. Next, it converts the overall grapheme sequence based on a second rule set in consideration of the Korean liaison rule, generates a pronunciation string corresponding to the word or phrase based on at least one of the LSTM and CTC techniques, and stores the generated pronunciation string in the pronunciation dictionary storage device (300) in association with the word or phrase.

The operation of the pronunciation dictionary learning apparatus (200) is described in detail later.

The pronunciation dictionary storage device (300) is a device that stores the pronunciation strings generated by the pronunciation dictionary learning apparatus (200). It may store each generated pronunciation string in association with the word or phrase to which it corresponds.

As shown in FIG. 1, the pronunciation dictionary storage device (300) may be implemented separately from the pronunciation dictionary learning apparatus (200), but it may also be built into the pronunciation dictionary learning apparatus (200).

FIG. 2 is a block diagram for explaining the configuration of the pronunciation dictionary learning apparatus (200) according to the present invention.

Referring to FIG. 2, the pronunciation dictionary learning apparatus (200) may include an input module (210), a language separation module (220), a grapheme sequence generation module (230), a grapheme sequence conversion module (240), a pronunciation string generation module (250), and an output module (260).

The input module (210) is a device that receives a word or phrase in which at least two languages, such as digits, Hangul, and English, are mixed. The user may enter the mixed word or phrase directly through the input module (210), or the module may receive the word or phrase from the text DB (100).

The word or phrase received by the input module (210) is passed to the language separation module (220).

The language separation module (220) is a device that separates the word or phrase received from the input module (210) by language.

For example, when the received word or phrase is a character string that combines Hangul and English, the module separates the Hangul portion from the English portion and splits the string into a Hangul character string and an English character string.

Likewise, if the received word or phrase combines Chinese and Japanese, the Japanese portion and the Chinese portion are separated into a Japanese character string and a Chinese character string, respectively.

That is, a character string containing both the first language and the second language is separated into a first-language character string and a second-language character string.
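
The separation step described above can be sketched as follows. This is only an illustration, not the implementation of the language separation module (220): the function name `split_by_language`, the `"ko"`/`"en"` tags, and the choice to skip characters that are neither precomposed Hangul syllables nor Latin letters are all assumptions made for the example.

```python
import re

# A run of precomposed Hangul syllables (U+AC00-U+D7A3) or a run of Latin letters.
# Any other character (digits, spaces, punctuation) is skipped in this sketch.
_RUN = re.compile(r"[\uAC00-\uD7A3]+|[A-Za-z]+")


def split_by_language(text):
    """Return (language, run) pairs in the order the runs appear in the text."""
    runs = []
    for match in _RUN.finditer(text):
        run = match.group()
        lang = "ko" if "\uAC00" <= run[0] <= "\uD7A3" else "en"
        runs.append((lang, run))
    return runs


print(split_by_language("SK텔레콤"))  # [('en', 'SK'), ('ko', '텔레콤')]
```

Keeping the runs in their original order makes it possible to recombine the per-language grapheme sequences later without losing the position of each part.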

The grapheme sequence generation module (230) divides the separated first-language character string and second-language character string into graphemes, generating a first-language grapheme sequence and a second-language grapheme sequence.

The grapheme sequences may be generated to suit the characteristics of the first and second languages. For example, when the first language is Korean, the sequence may be generated in jamo units to suit Hangul, and when the second language is English, it may be generated in letter units to suit English.
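
As a minimal sketch of jamo-level and letter-level splitting, the code below decomposes precomposed Hangul syllables using the standard Unicode arithmetic (19 initial consonants, 21 vowels, 28 final-consonant slots including "none") and splits English text one letter per frame. The function names and the use of compatibility jamo as frame symbols are assumptions for illustration; the source does not specify the symbol inventory.

```python
HANGUL_BASE = 0xAC00      # code point of the first precomposed syllable, '가'
SYLLABLE_COUNT = 11172    # 19 initials * 21 vowels * 28 finals

CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
# Index 0 means "no final consonant".
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def hangul_graphemes(text):
    """Split precomposed Hangul syllables into jamo frames."""
    frames = []
    for ch in text:
        index = ord(ch) - HANGUL_BASE
        if not 0 <= index < SYLLABLE_COUNT:
            raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
        lead, rest = divmod(index, 21 * 28)
        vowel, tail = divmod(rest, 28)
        frames.append(CHOSEONG[lead])
        frames.append(JUNGSEONG[vowel])
        if tail:
            frames.append(JONGSEONG[tail])
    return frames


def english_graphemes(text):
    """Split English text into one frame per letter."""
    return list(text)


print(hangul_graphemes("한글"))   # ['ㅎ', 'ㅏ', 'ㄴ', 'ㄱ', 'ㅡ', 'ㄹ']
print(english_graphemes("SK"))   # ['S', 'K']
```

Note that compatibility jamo do not distinguish an initial consonant from a final one; a fuller implementation would probably tag each frame with its position, since the liaison rule depends on it.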

The grapheme sequence conversion module (240) is a device that converts the grapheme sequence generated by the grapheme sequence generation module (230) based on several rules that are preset according to differences from the language system in which the pronunciation string is to be generated.

Representative examples of the rules for converting the generated grapheme sequence are the phonetic symbols and the liaison rule of the language on which generation of the pronunciation string is based.

As an example of conversion based on a language's phonetic symbols: when an English grapheme sequence has been generated and a Korean pronunciation string is to be generated for it, the English grapheme sequence may be converted according to the first rule, which is set in consideration of the Korean phonetic symbols for English.

More specifically, English has no phonetic symbol corresponding to the Korean '으' (eu), so generating a Korean pronunciation string from an English character string requires a way to express the phonetic symbol corresponding to '으'.

Therefore, where the sound '으' can occur in the English pronunciation string, that is, when two English consonant letters appear consecutively or when the last letter of the English grapheme sequence is a consonant, a frame or symbol representing '으' may be inserted between the two consecutive consonant letters or after the final consonant letter.
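
The insertion rule above can be illustrated with a short sketch. The marker `"<eu>"`, the function name `insert_eu_frames`, and the decision to treat only A, E, I, O, and U as vowel letters (so Y and W count as consonants) are assumptions for this example; the source only states that a predetermined frame representing '으' is inserted.

```python
EN_VOWELS = set("AEIOU")  # simplification: Y and W are treated as consonants
EU = "<eu>"               # hypothetical marker for the frame representing '으'


def insert_eu_frames(letters):
    """Insert an '으' frame after a consonant letter that is followed by
    another consonant letter or that ends the sequence."""
    out = []
    for i, letter in enumerate(letters):
        out.append(letter)
        if letter.upper() in EN_VOWELS:
            continue
        is_last = i == len(letters) - 1
        if is_last or letters[i + 1].upper() not in EN_VOWELS:
            out.append(EU)
    return out


print(insert_eu_frames(list("DISK")))  # ['D', 'I', 'S', '<eu>', 'K', '<eu>']
```

For "DISK" the inserted frames fall after S and K, which matches the usual Korean rendering 디스크 (di-seu-keu).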

As an example of conversion based on the liaison rule: the grapheme sequence conversion module (240) can generate an overall grapheme sequence by combining the jamo-level Hangul grapheme sequence with the letter-level English grapheme sequence. After combining them, it can convert the grapheme sequence by applying the second rule, which is set based on the liaison that can occur between Hangul and English letters.

More specifically, liaison can occur when a Hangul vowel follows an English consonant letter, or when an English vowel letter follows a Hangul final consonant (jongseong). To mark that liaison occurs at that position, the consonant alphabet frame or the Hangul final-consonant frame may be replaced with the corresponding liaison frame.
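
A rough sketch of the liaison marking follows. It assumes each frame is a `(kind, symbol)` pair with the hypothetical kinds `"ko_initial"`, `"ko_vowel"`, `"ko_final"`, and `"en"`, and it represents a liaison frame simply by changing the kind to `"liaison"`. The source does not say how liaison frames are encoded or how the silent initial 'ㅇ' of a vowel-initial Hangul syllable is handled, so the sketch only checks directly adjacent frames.

```python
EN_VOWELS = set("AEIOU")  # simplification: Y and W are treated as consonants


def mark_liaison(frames):
    """Replace a frame with a liaison frame when
    (a) an English consonant letter is followed by a Hangul vowel, or
    (b) a Hangul final consonant is followed by an English vowel letter."""
    out = list(frames)
    for i in range(len(frames) - 1):
        kind, symbol = frames[i]
        next_kind, next_symbol = frames[i + 1]
        en_consonant = kind == "en" and symbol.upper() not in EN_VOWELS
        en_vowel_next = next_kind == "en" and next_symbol.upper() in EN_VOWELS
        if (en_consonant and next_kind == "ko_vowel") or (
            kind == "ko_final" and en_vowel_next
        ):
            out[i] = ("liaison", symbol)
    return out


# '팀' followed by the letter 'A': the final consonant 'ㅁ' becomes a liaison frame.
frames = [("ko_initial", "ㅌ"), ("ko_vowel", "ㅣ"), ("ko_final", "ㅁ"), ("en", "A")]
print(mark_liaison(frames))
```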

The pronunciation string generation module (250) is a device that generates a pronunciation string corresponding to the input word or phrase based on the converted grapheme sequence.

The pronunciation string generation module (250) may generate the pronunciation string based on at least one of the LSTM technique and the CTC technique.

In particular, among LSTM techniques, a recurrent neural network with a BLSTM (Bidirectional Long Short-Term Memory) structure can generate accurate pronunciation strings by using a dynamic context length through the forward LSTM and future context through the backward LSTM. FIG. 3 gives a brief example of such a BLSTM recurrent neural network. FIG. 3(a) is an example of the forward LSTM: in the word 'ABLE', the frame 'E' is processed as 'blank' based on the preceding context 'ABL', whereas in the word 'GET', the frame 'E' is processed as the phonetic symbol 'e' using the preceding context 'G'.

FIG. 3(b) is an example of the backward LSTM: in the word 'CARE', the frame 'A' is processed as the phonetic symbol 'e' using the following context 'RE', whereas in the word 'CAR', the frame 'A' is processed as the phonetic symbol 'a' using the following context 'R'.

Meanwhile, in a recurrent neural network using the CTC structure, training data labels are generated by adding a blank at the beginning and end of the word or phrase and inserting a blank between adjacent frames.
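The label preparation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `add_ctc_blanks` and the blank symbol `"_"` are assumptions made for this example.

```python
BLANK = "_"  # hypothetical blank symbol; the specification does not name one


def add_ctc_blanks(frames, blank=BLANK):
    """Return a label sequence with a blank at both ends and between frames."""
    labeled = [blank]
    for frame in frames:
        labeled.append(frame)
        labeled.append(blank)
    return labeled


print(add_ctc_blanks(["m", "i", "n", "i"]))
# ['_', 'm', '_', 'i', '_', 'n', '_', 'i', '_']
```

For a sequence of n frames, the resulting label has 2n + 1 entries.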

In addition, to perform recognition with the CTC structure on a network trained with such data, duplicated frames are removed from the grapheme string output by the trained recurrent neural network, and then the blank frames are removed.
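The decoding step can be illustrated with a short sketch. It assumes the network output has already been reduced to one symbol per frame; the function name `ctc_collapse` and the blank symbol `"_"` are hypothetical.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol


def ctc_collapse(outputs, blank=BLANK):
    """Merge runs of repeated frames, then drop the blank frames."""
    merged = [symbol for symbol, _group in groupby(outputs)]
    return [symbol for symbol in merged if symbol != blank]


print(ctc_collapse(["_", "m", "m", "_", "i", "_", "_", "n", "i", "i"]))
# ['m', 'i', 'n', 'i']
```

Merging duplicates before removing blanks matters: a blank between two identical symbols keeps them as two separate outputs.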

The way the pronunciation string generation module 250 generates pronunciation strings using the LSTM and CTC techniques is similar to the conventional art, so a more detailed description is omitted.

The output module 260 outputs the generated pronunciation string. It receives the pronunciation string generated by the pronunciation string generation module 250 and may output it to the user or store it in the pronunciation dictionary storage device 300.

Here, the word or phrase input through the input module 210 may be matched with the generated pronunciation string and stored together with it.

The pronunciation dictionary learning apparatus 200 according to the present invention has been described above.

The operation of the pronunciation dictionary learning apparatus 200 is described below.

FIG. 4 is a flowchart illustrating the operation of the pronunciation dictionary learning apparatus 200.

The character string input to the pronunciation dictionary learning apparatus 200 may be any string in which two or more different languages are mixed. For convenience of explanation, the following assumes that a string mixing Korean Hangul and the English alphabet is input.

When a string mixing Hangul and alphabet letters is input to the pronunciation dictionary learning apparatus 200 (S101), the apparatus separates the Korean part of the string from the English part (S103).
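One way to sketch the separation in step S103 is with a regular expression over Unicode ranges. The function name `separate_languages`, the language tags `ko` and `en`, and the choice to silently skip any character that is neither a precomposed Hangul syllable nor an ASCII letter are all assumptions of this example.

```python
import re

# Runs of precomposed Hangul syllables (U+AC00..U+D7A3) or ASCII letters.
_RUN = re.compile(r"(?P<ko>[가-힣]+)|(?P<en>[A-Za-z]+)")


def separate_languages(text):
    """Split a mixed string into (language, substring) runs, in input order."""
    return [(match.lastgroup, match.group()) for match in _RUN.finditer(text)]


print(separate_languages("미니STOP에"))
# [('ko', '미니'), ('en', 'STOP'), ('ko', '에')]
```

Keeping the runs in their original order makes it straightforward to recombine the per-language grapheme strings later.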

Then, the Korean part is divided into jamo-level graphemes to generate a Korean grapheme string (S105), and the English part is divided into letter-level graphemes to generate an English grapheme string (S107).

Next, the English grapheme string is converted based on a first rule set in consideration of Korean phonetic symbols for English (S109). For example, English phonetic symbols have no symbol corresponding to '으'. Therefore, a predetermined frame representing '으' may be inserted at positions where the '으' sound should be expressed, such as between consecutive consonant alphabet frames or after a consonant alphabet frame located at the end.

Once the English grapheme string is converted, the generated Hangul grapheme string and the converted English grapheme string are combined into a full grapheme string, and the full grapheme string is converted based on a second rule set in consideration of the liaison rule (S111).

The second rule is set in consideration of the liaison that can occur when English letters and Hangul are pronounced together. Because liaison occurs when a Hangul vowel frame follows a consonant alphabet frame, or when a vowel alphabet frame follows a Hangul final-consonant frame, the consonant alphabet frame or the Hangul final-consonant frame may be changed into the liaison frame corresponding to it.

A forward LSTM and a backward LSTM are applied to the grapheme string converted in step S111 to generate a pronunciation string, and the generated pronunciation string may be output and stored in the pronunciation dictionary storage device 300 (S113 to S117).

A concrete embodiment of the operation described with FIG. 4 is shown in FIG. 5. When the phrase {미니STOP에} is input to the pronunciation dictionary learning apparatus 200 (S201), it is separated into the Hangul substrings {미니} and {에} and the English substring {STOP} (S203 to S205).

Then, the Hangul substrings are divided into jamo-level graphemes to generate two grapheme strings, {ㅁ,ㅣ,ㄴ,ㅣ} and {ㅔ} (S207), and the English substring is divided into letter-level graphemes to generate the grapheme string {S,T,O,P} (S209).
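The jamo-level split in S207 can be sketched with the standard arithmetic decomposition of precomposed Hangul syllables. The function name `to_jamo` is hypothetical. Dropping the silent initial 'ㅇ' is an assumption made only so that {에} yields {ㅔ} as in the example; the specification does not state how that case is handled.

```python
# Compatibility jamo in Unicode syllable-composition order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initials
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 medials
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
             "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
             "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # 28 finals, "" = none


def to_jamo(syllables, drop_silent_initial=True):
    """Decompose precomposed Hangul syllables into jamo frames."""
    frames = []
    for ch in syllables:
        index = ord(ch) - 0xAC00
        if not 0 <= index < 11172:
            raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
        initial, rest = divmod(index, 21 * 28)
        medial, final = divmod(rest, 28)
        # Assumption: the silent initial 'ㅇ' is omitted, as in {에} -> {ㅔ}.
        if not (drop_silent_initial and CHOSEONG[initial] == "ㅇ"):
            frames.append(CHOSEONG[initial])
        frames.append(JUNGSEONG[medial])
        if JONGSEONG[final]:
            frames.append(JONGSEONG[final])
    return frames


print(to_jamo("미니"))  # ['ㅁ', 'ㅣ', 'ㄴ', 'ㅣ']
print(to_jamo("에"))    # ['ㅔ']
```

The English side needs no such arithmetic: `list("STOP")` already gives the letter-level frames.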

Next, the grapheme string {S,T,O,P} is converted so that the English part yields a pronunciation string matching Korean pronunciation. Because the consonant alphabet frames 'S' and 'T' are consecutive and are pronounced in the form 'ㅅ,ㅡ,ㅌ', the predetermined frame 'K_EU', which represents the Hangul sound '으', is inserted between 'S' and 'T', converting {S,T,O,P} into {S,K_EU,T,O,P} (S211).
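A minimal sketch of this first rule follows. The frame name `K_EU` comes from the example; the function names, the simple A/E/I/O/U vowel test, and the `after_final` switch are assumptions. Insertion after a final consonant is off by default because the worked example leaves the final 'P' unchanged.

```python
VOWELS = set("AEIOU")   # assumption: every other ASCII letter counts as a consonant
EU_FRAME = "K_EU"       # frame representing the Hangul sound '으'


def is_consonant(frame):
    """True for a single ASCII letter that is not A, E, I, O, or U."""
    return (len(frame) == 1 and frame.isascii() and frame.isalpha()
            and frame.upper() not in VOWELS)


def insert_eu(frames, after_final=False):
    """Insert K_EU between consecutive consonant frames (optionally at the end)."""
    out = []
    for i, frame in enumerate(frames):
        out.append(frame)
        if not is_consonant(frame):
            continue
        is_last = i + 1 == len(frames)
        if not is_last and is_consonant(frames[i + 1]):
            out.append(EU_FRAME)
        elif is_last and after_final:
            out.append(EU_FRAME)
    return out


print(insert_eu(list("STOP")))
# ['S', 'K_EU', 'T', 'O', 'P']
```

A rule this simple would also split digraphs such as 'SH' or 'TH'. The specification does not say how those are treated, so this sketch leaves them unhandled.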

After that, the Hangul grapheme strings and the converted English grapheme string are combined to generate the full grapheme string {ㅁ,ㅣ,ㄴ,ㅣ,S,K_EU,T,O,P,ㅔ} (S213). Because the Hangul vowel frame 'ㅔ' follows the consonant alphabet frame 'P', liaison occurs and the phrase is pronounced like '미니스타베' in Korean. To mark this liaison, the consonant alphabet frame 'P' is converted into 'P_LK', the liaison symbol for 'P'.

That is, the full grapheme string is converted into {ㅁ,ㅣ,ㄴ,ㅣ,S,K_EU,T,O,P_LK,ㅔ} (S215).
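The second rule can be sketched as a single pass over adjacent frame pairs. The `_LK` suffix follows the example's 'P_LK'; the function names and the A/E/I/O/U vowel test are assumptions. In a flat jamo sequence, a Hangul consonant directly followed by an alphabet vowel can only be a final consonant, so the sketch does not track syllable position separately.

```python
LATIN_VOWELS = set("AEIOU")  # assumption: simple five-vowel test


def is_latin_vowel(frame):
    return len(frame) == 1 and frame.upper() in LATIN_VOWELS


def is_latin_consonant(frame):
    return (len(frame) == 1 and frame.isascii() and frame.isalpha()
            and frame.upper() not in LATIN_VOWELS)


def is_hangul_vowel(frame):
    # Compatibility jamo vowels occupy U+314F..U+3163.
    return len(frame) == 1 and "\u314f" <= frame <= "\u3163"


def is_hangul_consonant(frame):
    # Compatibility jamo consonants occupy U+3131..U+314E.
    return len(frame) == 1 and "\u3131" <= frame <= "\u314e"


def mark_liaison(frames):
    """Replace frames at liaison positions with their '<frame>_LK' variant."""
    out = list(frames)
    for i in range(len(frames) - 1):
        cur, nxt = frames[i], frames[i + 1]
        if (is_latin_consonant(cur) and is_hangul_vowel(nxt)) or \
           (is_hangul_consonant(cur) and is_latin_vowel(nxt)):
            out[i] = cur + "_LK"
    return out


full = ["ㅁ", "ㅣ", "ㄴ", "ㅣ", "S", "K_EU", "T", "O", "P", "ㅔ"]
print(mark_liaison(full))
# ['ㅁ', 'ㅣ', 'ㄴ', 'ㅣ', 'S', 'K_EU', 'T', 'O', 'P_LK', 'ㅔ']
```

Multi-character frames such as `K_EU` fail every length check, so they are never marked and never trigger a mark on a neighbor.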

Then, performing the forward LSTM and the backward LSTM on the converted full grapheme string (S217 to S219) generates a pronunciation string such as {m,i,n,i,s,eu,t,a,b,e}, which may be matched with {미니STOP에} and stored (S221).

The pronunciation dictionary learning apparatus 200, the pronunciation dictionary storage device 300, and the text DB 100 described above may be included as part of the speech recognition system shown in FIG. 6.

In FIG. 6, the area enclosed by the dotted line contains the devices that implement the pronunciation dictionary learning method according to the present invention.

In addition, the speech recognition system may include a language model learning device 420 and a language model storage device 410 that learn and store a language model; an acoustic model storage device 510 and an acoustic model learning device 520 that learn and store a model of human pronunciation; a speech DB 530 that stores various utterances for deriving the acoustic model; an integrated model calculation device 600 that calculates an integrated model combining the pronunciation dictionary, the language model, and the acoustic model; and a speech signal processing device 700 that receives a user's speech signal, processes it using the integrated model calculated by the integrated model calculation device 600, and generates a word string.

That is, the speech signal input by the user can be analyzed based on the pronunciation dictionary, the language model, and the acoustic model to determine and output the sentence or word the user spoke. With the pronunciation dictionary learning method according to the present invention, even words or phrases that mix two or more languages can be analyzed effectively, so that the sentence or word spoken by the user and its meaning can be interpreted.

As described above, this specification contains details of many specific implementations, but these should not be understood as limiting the scope of any invention or of what may be claimed. Rather, they should be understood as descriptions of features that may be specific to particular embodiments of a particular invention.

Likewise, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together into a single software product or packaged into multiple software products.

This description presents the best mode of the present invention and provides examples to explain the invention and to enable a person of ordinary skill in the art to make and use it. The specification as written does not limit the invention to the specific terms presented. Therefore, although the present invention has been described in detail with reference to the above examples, a person of ordinary skill in the art may make alterations, changes, and modifications to these examples without departing from the scope of the present invention.

Accordingly, the scope of the present invention should be determined not by the described embodiments but by the claims.

The present invention relates to a pronunciation dictionary learning method and apparatus, and more particularly, to a method and apparatus that, when learning a pronunciation dictionary for the language of one country in consideration of the phonetic symbols of another country's language, learn the pronunciation dictionary based on the differences between the two language systems so as to provide pronunciations close to those of the other language.

Accordingly, the pronunciation dictionary learning method can contribute to the development of the language learning industry. Moreover, the present invention has industrial applicability, since it has sufficient potential for commercialization or sale and can clearly be implemented in practice.

100: text DB 200: pronunciation dictionary learning apparatus 300: pronunciation dictionary storage device
210: input module 220: language separation module 230: grapheme string generation module
240: grapheme string conversion module 250: pronunciation string generation module
260: output module 410: language model storage device 420: language model learning device
510: acoustic model storage device 520: acoustic model learning device 530: speech DB
600: integrated model calculation device 700: speech signal processing device

Claims

1. A pronunciation dictionary learning method comprising:
generating a grapheme string of a second language by dividing a character string of the second language, which constitutes a word or phrase, into graphemes;
converting the grapheme string of the second language based on a first rule set in consideration of phonetic symbols of a first language; and
generating a pronunciation string corresponding to the word or phrase based on the converted grapheme string.

2. The method of claim 1, further comprising, before the generating of the grapheme string of the second language:
separating a character string containing the first language and the second language into a character string of the first language and a character string of the second language;
and further comprising, before the generating of the pronunciation string:
generating a grapheme string of the first language by dividing the character string of the first language into graphemes;
combining the grapheme string of the first language with the converted grapheme string of the second language to generate a full grapheme string; and
converting the combined full grapheme string based on a second rule set in consideration of a liaison rule,
wherein the generating of the pronunciation string generates a pronunciation string for the converted full grapheme string.

3. The method of claim 2, wherein, when the first language is Hangul and the second language is English,
the converting of the full grapheme string comprises, when a Hangul vowel frame follows a consonant alphabet frame or a vowel alphabet frame follows a Hangul final-consonant frame, changing the consonant alphabet frame or the Hangul final-consonant frame into a liaison frame corresponding to that frame.

4. The method of claim 1, wherein, when the first language is Hangul and the second language is English,
the converting of the grapheme string of the second language comprises inserting a predetermined frame representing the Hangul pronunciation '으' between two consecutive consonant alphabet frames or after a consonant alphabet frame located at the end.

5. The method of claim 1, wherein the generating of the pronunciation string corresponding to the word or phrase
generates the pronunciation string based on at least one of a Long Short Term Memory (LSTM) technique and a Connectionist Temporal Classification (CTC) technique.

6. The method of claim 1, further comprising:
matching the generated pronunciation string with the word or phrase.

7. A computer-readable recording medium on which a program for executing the method according to any one of claims 1 to 6 is recorded.

8. A pronunciation dictionary learning apparatus comprising:
a grapheme string generation module configured to generate a grapheme string of a second language by dividing a character string of the second language, which constitutes a word or phrase, into graphemes;
a grapheme string conversion module configured to convert the grapheme string of the second language based on a first rule set in consideration of phonetic symbols of a first language; and
a pronunciation string generation module configured to generate a pronunciation string corresponding to the word or phrase based on the converted grapheme string.

9. The apparatus of claim 8, further comprising:
a language separation module configured to separate a character string containing the first language and the second language into a character string of the first language and a character string of the second language,
wherein the grapheme string generation module
further generates a grapheme string of the first language by dividing the character string of the first language into graphemes,
a full grapheme string is generated by combining the grapheme string of the first language with the converted grapheme string of the second language, and the combined full grapheme string is converted based on a second rule set in consideration of a liaison rule, and
the pronunciation string generation module generates a pronunciation string for the converted full grapheme string.