KR100288144B1

KR100288144B1 - Method for encoding loanwords written in Hangul and search method using the same

Info

Publication number: KR100288144B1
Application number: KR1019980054212A
Authority: KR
Inventors: 유광일; 이준성; 김명호; 김종수; 신봉근; 정희정
Original assignee: 이계철; 한국전기통신공사
Priority date: 1998-12-10
Filing date: 1998-12-10
Publication date: 2001-05-02
Also published as: KR20000039018A

Abstract

1. Technical field of the invention

The present invention relates to a method for encoding loanwords and a search method using the same.

2. Technical problem to be solved by the invention

Foreign words and loanwords can be written in Hangul in many different ways in information retrieval systems and the like. The present invention separates such words into initial consonants, medial vowels, and final consonants and encodes each according to pronunciation and phonological rules. Its aim is an encoding method, and a search method using it, that matches records written in different forms but denoting the same loanword during information retrieval.

3. Summary of the solution

The present invention comprises three steps. The first step converts a loanword string composed of precomposed (wansung) Hangul into a combinational (johab) Hangul string, from which initial, medial, and final jamo are easily separated. The second step stores a basic encoding rule table and a conversion encoding rule table in a storage means. The third step uses these two tables to separate the converted string into initial, medial, and final jamo and to assign a code value to each phoneme according to pronunciation and phonological rules.

4. Principal use of the invention

The present invention is used in information retrieval systems and the like.

Description

Method for encoding loanwords written in Hangul and search method using the same

The present invention covers three subjects, all concerning foreign words and loanwords that may be written in Hangul in various ways in an information retrieval system or the like. The first is a loanword encoding method that encodes such words so that records written in different forms but denoting the same foreign word (or loanword) are matched during retrieval. The second is a search method using that encoding. The third is a computer-readable recording medium storing a program that implements them.

In general, a large share of the data in search systems for proper names, such as company trade names, consists of loanwords (borrowed words).

Such loanword trade names are hard to bring under a single unified spelling, and search-system users may write them in many ways. That is, the same loanword can be spelled in several different ways.

An information retrieval system provides various ways of searching large volumes of data.

Recently, with the growth of the web, information retrieval systems are typically offered as services in which users type a query directly and the system finds matching records. Search techniques for ordinary English and Korean data in this type of system have been studied extensively, and effective techniques have been proposed.

However, searching for foreign words or loanwords within Hangul data requires a dedicated technique. Foreign words and loanwords are written in Hangul according to how they sound, so a fully standardized spelling is hard to establish. Moreover, when users type queries directly, every spelling a user might enter must be matched against the spelling actually used in the stored data.

One technique related to searching variously spelled words is the SOUNDEX code, which stores a string of English letters as a specific code instead of the letter string itself. SOUNDEX converts each letter of an English surname into a code according to its pronunciation. Words that are spelled differently but pronounced similarly therefore receive the same code and are matched.
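For comparison with the Hangul encoding developed below, here is a minimal sketch of one common SOUNDEX variant, the American Soundex rules. It keeps the first letter and maps consonant groups to the digits 1 to 6. Adjacent duplicate codes are collapsed, with H and W not separating them, and the result is padded to four characters. The sketch shows that SOUNDEX works letter by letter on Latin text and looks only at the immediately preceding code. It is an illustration, not the exact variant any particular system uses.

```python
# American Soundex digit groups; vowels, Y, H and W have no code.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of an English surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:  # skip letters without a code and repeated codes
            result.append(code)
            if len(result) == 4:
                break
        if c not in "HW":  # H and W do not separate equal codes; vowels do
            prev = code
    return "".join(result).ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```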

However, this technique was designed for English alphabetic strings, so it cannot be applied to searching foreign words (or loanwords) written in Hangul. Its encoding is also a simple per-letter conversion that ignores the context between characters. Furthermore, no prior domestic or foreign research on searching foreign words and loanwords written in Hangul has been found.

FIG. 1 is a block diagram of a general search system based on user-entered queries. In the figure, reference numeral 10 denotes a search interfacing unit, 11 a search-word database search unit, and 12 a search-word database.

Referring to FIG. 1, in a general search system based on user-entered queries, the search-word database search unit 11 uses the user's query to search the search-word database 12 managed by the system. If matching data exist, it returns them to the user.

However, a problem arises when both the user's query and the data in the search-word database 12 are foreign words (or loanwords) written in Hangul.

Such foreign words and loanwords are written in Hangul according to their pronunciation, so their spelling varies from writer to writer. Rules for the standardized Hangul spelling of foreign words and loanwords have been established. Even so, there is no guarantee that a user typing a query will follow those rules exactly.

For example, suppose that "Clover Hotel" is written in Hangul and stored in the search-word database 12.

Its Hangul spelling can vary widely, for example "클로버 호텔", "크로버 호탤", or "클로바 호텔". If only one of these spellings is stored in the search-word database 12, a search succeeds only when the user enters exactly the stored spelling. Storing every possible spelling to avoid this, on the other hand, wastes an enormous amount of storage.

Thus, conventional systems could not reliably search foreign words (or loanwords) written in Hangul, because such words admit many spellings.

The present invention was devised to solve these problems. Foreign words and loanwords may be written in Hangul in various ways in information retrieval systems and the like. The invention separates them into initial, medial, and final jamo and encodes each according to pronunciation and phonological rules, so that records written in different forms but denoting the same foreign word (or loanword) are matched during retrieval. Its object is to provide a loanword encoding method that does this, a search method using it, and a computer-readable recording medium storing a program that implements them.

FIG. 1 is a block diagram of a general search system based on user-entered queries.

FIG. 2 illustrates an embodiment of a search method that uses the encoding of foreign words (or loanwords) written in Hangul according to the present invention.

FIG. 3 shows the structure of the basic encoding rule table for foreign words (or loanwords) written in Hangul according to the present invention.

FIG. 4 shows the structure of the conversion encoding rule table for foreign words (or loanwords) written in Hangul according to the present invention.

FIG. 5 is a flowchart of an embodiment of a method for encoding variously spelled foreign words (or loanwords) written in Hangul according to the present invention.

FIG. 6 shows an example of encoding foreign words (or loanwords) written in Hangul with the encoding technique of an embodiment of the present invention.

* Explanation of reference numerals for the main parts of the drawings

20: search interfacing unit    21: Hangul search-word encoding unit

22: coded search-word database search unit    23: coded search-word database

To achieve the above object, the present invention provides a method for encoding loanwords written in Hangul for use in an information retrieval system, comprising three steps. The first step converts a loanword string composed of precomposed Hangul into a combinational Hangul string from which initial, medial, and final jamo are easily separated. The second step stores a basic encoding rule table and a conversion encoding rule table in a storage means. The third step uses these two tables to separate the converted loanword string into initial, medial, and final jamo, and then assigns a code value to each phoneme according to pronunciation and phonological rules.

The present invention further provides a method for searching loanwords written in Hangul in an information retrieval system, comprising two steps. The first step separates a loanword written in Hangul into initial, medial, and final jamo, assigns a code value to each phoneme according to pronunciation and phonological rules, and stores the result in a search-word database. The second step, at retrieval time, compares the encoded form of the user's Hangul query with the code values stored in the search-word database to find matches.

The present invention further provides a computer-readable recording medium storing a program for an apparatus, having a processor, that encodes loanwords written in Hangul. The program implements three functions. The first converts a loanword string composed of precomposed Hangul into a combinational Hangul string from which initial, medial, and final jamo are easily separated. The second stores a basic encoding rule table and a conversion encoding rule table in a storage means. The third uses these two tables to separate the converted string into initial, medial, and final jamo according to pronunciation and phonological rules, and then assigns a code value to each phoneme.

The present invention further provides a computer-readable recording medium storing a program for an apparatus, having a processor, that searches loanwords written in Hangul. The program implements two functions. The first separates a loanword written in Hangul into initial, medial, and final jamo, assigns a code value to each phoneme according to pronunciation and phonological rules, and stores the result in a search-word database. The second, at retrieval time, compares the encoded form of the user's Hangul query with the code values stored in the search-word database to find matches.

Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 illustrates an embodiment of a search method that uses the encoding of foreign words (or loanwords) written in Hangul according to the present invention.

The search method according to the present invention encodes foreign words (or loanwords) that admit various Hangul spellings on the basis of pronunciation and phonological rules, and this encoding enables effective retrieval.

As shown in FIG. 2, the Hangul search-word encoding unit 21 converts the system's data into unique codes. The resulting coded search words are stored in the coded search-word database 23.

At retrieval time, the Hangul query entered by the user goes through the same encoding. The coded search-word database search unit 22 then compares the result with the code values stored in the coded search-word database 23 to check for a match. The encoding technique of the present invention produces the same code for the different spellings of the same foreign word (or loanword). Searches over foreign words (or loanwords) written in Hangul are therefore handled effectively, using only the space needed to store the codes.
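The flow of FIG. 2 can be sketched as a small index keyed by code. Records are encoded once when loaded, queries are encoded the same way at search time, and only codes are compared. This is a minimal sketch under stated assumptions. The `CodedIndex` class is hypothetical, and the encoder passed in is a placeholder that only ignores spacing. A real system would plug in the phonological encoder described with reference to FIGS. 3 to 5.

```python
from collections import defaultdict
from typing import Callable


class CodedIndex:
    """Hypothetical index grouping records by code, in the spirit of database 23."""

    def __init__(self, encode: Callable[[str], str]) -> None:
        self.encode = encode
        self.by_code: defaultdict[str, list[str]] = defaultdict(list)

    def add(self, record: str) -> None:
        # Role of the encoding unit (21): each record is encoded once, at load time.
        self.by_code[self.encode(record)].append(record)

    def search(self, query: str) -> list[str]:
        # Role of the search unit (22): encode the query, then compare codes only.
        return list(self.by_code.get(self.encode(query), []))


# Placeholder encoder (an assumption): it merely ignores spacing.
index = CodedIndex(encode=lambda text: text.replace(" ", ""))
index.add("클로버 호텔")
print(index.search("클로버호텔"))  # ['클로버 호텔']
```

The design choice is that storage grows with the number of records, not with the number of possible spellings. Every spelling that encodes to the same code lands in the same bucket.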

The encoding process for effectively searching foreign words (or loanwords) written in Hangul is now described in more detail.

FIG. 3 shows the structure of the basic encoding rule table for foreign words (or loanwords) written in Hangul according to the present invention.

For a foreign word (or loanword) written in Hangul, the present invention assigns a special character to each initial, medial, and final jamo, and uses "$" as the separator between syllables.

Referring to FIG. 3, when a foreign word (or loanword) written in Hangul is input, the encoding technique by default separates the Hangul string into initial, medial, and final jamo and encodes them according to the basic encoding rule table. When a specific condition associated with a variant spelling is satisfied, encoding instead uses the conversion encoding rule table shown in FIG. 4, described below.
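The default path through the basic table can be sketched as a per-position lookup. Each syllable yields three code characters, and syllable codes are joined with "$". FIG. 3 itself is not reproduced in this text, so the table below is a hypothetical excerpt. Only initial ㅋ → "h" and final ㄹ → "9" come from the worked example later in this description. The other letters, and "1" for a syllable with no final consonant, are assumptions.

```python
# A syllable as (initial, medial, final); final is "" when the syllable has none.
Jamo = tuple[str, str, str]

# Hypothetical excerpt of the basic encoding rule table. From the text:
# initial ㅋ -> "h", final ㄹ -> "9". All other entries are placeholders.
BASIC_INITIAL = {"ㅋ": "h", "ㄹ": "r", "ㅂ": "b", "ㅍ": "p", "ㅅ": "s", "ㅎ": "x", "ㅌ": "t"}
BASIC_MEDIAL = {"ㅡ": "U", "ㅗ": "O", "ㅏ": "a", "ㅓ": "e", "ㅔ": "E"}
BASIC_FINAL = {"": "1", "ㄹ": "9"}


def encode_basic(syllables: list[Jamo]) -> str:
    """One code character per jamo position, three per syllable, "$" between syllables."""
    return "$".join(
        BASIC_INITIAL[initial] + BASIC_MEDIAL[medial] + BASIC_FINAL[final]
        for initial, medial, final in syllables
    )


print(encode_basic([("ㅋ", "ㅡ", "ㄹ"), ("ㄹ", "ㅗ", ""), ("ㅂ", "ㅓ", "")]))  # hU9$rO1$be1
```

Applied on its own, the basic table still keeps spellings such as 클로버 and 크로버 apart. Closing that gap is the job of the conversion rules of FIG. 4.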

FIG. 4 shows the structure of the conversion encoding rule table for foreign words (or loanwords) written in Hangul according to the present invention.

As shown in FIG. 4, the number in parentheses after a jamo indicates the position of that phoneme in the syllable. A jamo marked with the initial-position number denotes that jamo as an initial consonant, and a jamo marked with the final-position number denotes it as a final consonant. Two marked jamo written together denote the case in which the syllable ends in the first jamo and the next syllable begins with the second.

Referring to FIG. 4, Type 1 conversion rules assign the same code value to phonemes that can be pronounced similarly. For example, when the English plural or possessive endings "-s" and "-'s" are written in Hangul, both "~스" and "~즈" are possible.

Accordingly, the initial consonants ㅅ and ㅈ are converted to the same code. Likewise, the vowels ㅏ and ㅓ can both appear when an ending such as "-a" is written in Hangul, so they are also converted to the same code.
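A Type 1 rule can be represented as a lookup keyed by (position, jamo), which mirrors the "jamo(position)" notation of FIG. 4. Interchangeable jamo map to one shared code, and anything not covered falls back to the basic table. This is a minimal sketch. The ㅅ/ㅈ and ㅏ/ㅓ pairs follow the text, but the shared code letters "s" and "A" are hypothetical placeholders.

```python
from typing import Optional

# Hypothetical Type 1 entries keyed by (position, jamo); the code letters are placeholders.
TYPE1_RULES = {
    ("initial", "ㅅ"): "s",
    ("initial", "ㅈ"): "s",
    ("medial", "ㅏ"): "A",
    ("medial", "ㅓ"): "A",
}


def type1_code(position: str, jamo: str) -> Optional[str]:
    """Return the shared code if a Type 1 rule covers this jamo, else None."""
    return TYPE1_RULES.get((position, jamo))


# "~스" and "~즈" (English "-s" / "-'s") now share an initial code.
print(type1_code("initial", "ㅅ") == type1_code("initial", "ㅈ"))  # True
print(type1_code("initial", "ㅋ"))  # None: falls back to the basic table
```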

Type 2 covers cases in which a sound is carried over from a final consonant into the initial consonant of the next syllable. In such cases the final consonant is deleted, which yields a unified code value. For example, "plus" can be written either as "플러스" or as "프러스".
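The Type 2 deletion can be sketched on syllables already split into jamo. When a syllable's final consonant and the next syllable's initial form a listed pair, the final is dropped, so the two spellings of "plus" become identical before encoding. This is a sketch under assumptions. Only the ㄹ + ㄹ pair comes from the text, and the `TYPE2_PAIRS` set and `apply_type2` function are hypothetical names.

```python
from typing import Optional

# A syllable as (initial, medial, final); final is "" when the syllable has none.
Jamo = tuple[str, str, str]

# Hypothetical Type 2 table: (final, next initial) pairs whose final is dropped.
# Only the ㄹ + ㄹ pair is taken from the text (플러스 vs. 프러스).
TYPE2_PAIRS = {("ㄹ", "ㄹ")}


def apply_type2(syllables: list[Jamo]) -> list[Jamo]:
    """Drop a final consonant that is carried over into the next syllable's initial."""
    result = []
    for i, (initial, medial, final) in enumerate(syllables):
        next_initial: Optional[str] = syllables[i + 1][0] if i + 1 < len(syllables) else None
        if (final, next_initial) in TYPE2_PAIRS:
            final = ""  # the document encodes this as code "1" instead of the basic "9"
        result.append((initial, medial, final))
    return result


plus_a = [("ㅍ", "ㅡ", "ㄹ"), ("ㄹ", "ㅓ", ""), ("ㅅ", "ㅡ", "")]  # 플러스
plus_b = [("ㅍ", "ㅡ", ""), ("ㄹ", "ㅓ", ""), ("ㅅ", "ㅡ", "")]   # 프러스
print(apply_type2(plus_a) == apply_type2(plus_b))  # True
```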

Type 3 covers cases in which an aspirated consonant such as "-t" appears at the end of a word. Such cases are unified by spelling out the pronunciation in its lengthened form.

Finally, Types 4 and 5 concern the spelling of vowels and produce unified code values through vowel contraction.

FIG. 5 is a flowchart of an embodiment of the method for encoding variously spelled foreign words (or loanwords) written in Hangul according to the present invention. It shows the procedure for encoding an input Hangul string with reference to the encoding rules of FIGS. 3 and 4.

As shown in FIG. 5, the method first preprocesses the input string to be encoded (501). The proposed technique assigns codes per initial, medial, and final jamo, and the input is assumed to be a loanword string in precomposed Hangul. A preprocessing step is therefore needed to convert the input into a combinational Hangul string from which initial, medial, and final jamo are easily separated.
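Step 501 converts between the precomposed and combinational Hangul encodings in use at the time. The sketch below instead assumes Unicode input, which is a substitution on our part. In Unicode, every precomposed syllable from U+AC00 to U+D7A3 is laid out as (initial × 21 + medial) × 28 + final, so the same split can be done arithmetically. Compatibility jamo letters are used only as readable labels.

```python
# Jamo inventories in Unicode order: 19 initials, 21 medials, 27 finals plus "none".
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = ("",) + tuple("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

HANGUL_BASE = 0xAC00          # first precomposed syllable, 가
HANGUL_COUNT = 19 * 21 * 28   # 11,172 precomposed syllables


def is_syllable(ch: str) -> bool:
    return 0 <= ord(ch) - HANGUL_BASE < HANGUL_COUNT


def decompose(ch: str) -> tuple[str, str, str]:
    """Split one precomposed syllable into (initial, medial, final); final may be ""."""
    if not is_syllable(ch):
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    initial, rest = divmod(ord(ch) - HANGUL_BASE, 21 * 28)
    medial, final = divmod(rest, 28)
    return CHOSEONG[initial], JUNGSEONG[medial], JONGSEONG[final]


def to_jamo(text: str) -> list[tuple[str, str, str]]:
    """Decompose every Hangul syllable; spaces and other characters are skipped."""
    return [decompose(ch) for ch in text if is_syllable(ch)]


print(to_jamo("클로바"))  # [('ㅋ', 'ㅡ', 'ㄹ'), ('ㄹ', 'ㅗ', ''), ('ㅂ', 'ㅏ', '')]
```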

Next, the initialization required for encoding is performed (502). That is, the basic encoding rule table and the conversion encoding rule table shown in FIGS. 3 and 4 are loaded into memory.

Next, syllables are separated from the source string (503). For each syllable of the input string, the procedure obtains the phoneme values of its initial, medial, and final jamo. It also obtains information about the preceding and following syllables, which conversion rules may reference (504). For example, suppose the input string is "클로바" and the loop is processing the syllable "클". The procedure then extracts that the initial is ㅋ, the medial is ㅡ, and the final is ㄹ, and that the initial of the next syllable ("로") is ㄹ.
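Step 504 pairs each syllable with its neighbours so that context-dependent rules can inspect them. A minimal sketch, assuming the syllables are already split into (initial, medial, final) triples; `with_context` is a hypothetical helper name.

```python
from typing import Iterator, Optional

Jamo = tuple[str, str, str]


def with_context(syllables: list[Jamo]) -> Iterator[tuple[Optional[Jamo], Jamo, Optional[Jamo]]]:
    """Yield (previous, current, next) for each syllable; missing neighbours are None."""
    for i, current in enumerate(syllables):
        previous = syllables[i - 1] if i > 0 else None
        following = syllables[i + 1] if i + 1 < len(syllables) else None
        yield previous, current, following


# "클로바" already split into jamo (step 503).
clobar = [("ㅋ", "ㅡ", "ㄹ"), ("ㄹ", "ㅗ", ""), ("ㅂ", "ㅏ", "")]
_, current, following = next(with_context(clobar))
print(current, following[0])  # ('ㅋ', 'ㅡ', 'ㄹ') ㄹ
```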

Then, based on the extracted information, the method checks whether the requirement of a conversion rule is met (505). That is, it checks whether the current initial, medial, and final phoneme information, together with the information about the preceding and following syllables, satisfies the condition of a rule in FIG. 4. It then determines whether the current phoneme satisfies the condition of a conversion encoding rule (506).

If the current phoneme satisfies the condition of a conversion encoding rule, it is converted into the specified code using the conversion rules of FIG. 4 (507). The method then determines whether the end of the string has been reached (509).

If the current phoneme does not satisfy the condition of any conversion encoding rule, it is converted into its basic code according to the basic encoding rule table of FIG. 3 (508). The method then determines whether the end of the string has been reached (509).

For example, in the syllable "클", the initial ㅋ does not meet the condition of any conversion rule, so it is converted into the basic code "h" of FIG. 3. The final ㄹ, however, is followed by a syllable whose initial is ㄹ. This falls under the Type 2 rule of FIG. 4, so the final is converted into the code "1" rather than its basic code "9". Through this process, each syllable is converted into a three-character code, one character each for its initial, medial, and final phonemes.

If the end of the string has not been reached, steps 503 to 508 are repeated for each remaining syllable of the input string converted into the combinational string.
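Steps 501 to 509 can be put together in a compact end-to-end sketch. For each jamo, a conversion rule is tried first (505 to 507), and the basic table is used otherwise (508). The sketch runs on Unicode text rather than the encodings of the original system. The code letters are hypothetical, except ones taken from the text:

- "h" for initial ㅋ
- "9" for final ㄹ
- "1" for final ㄹ before initial ㄹ

Treating "1" as the code for an empty final is our inference, since that is what makes the Type 2 deletion line up. Types 3 to 5 are omitted because the text does not specify their conditions.

```python
# Jamo inventories in Unicode order (used for the step 501/503 split).
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = ("",) + tuple("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Hypothetical rule tables (step 502). Unlisted jamo are kept as their own code.
BASIC = {
    "initial": {"ㅋ": "h", "ㄹ": "r", "ㅂ": "b", "ㅍ": "p"},
    "medial": {"ㅡ": "U", "ㅗ": "O", "ㅏ": "a", "ㅓ": "e"},
    "final": {"": "1", "ㄹ": "9"},
}
TYPE1 = {("initial", "ㅅ"): "s", ("initial", "ㅈ"): "s",
         ("medial", "ㅏ"): "A", ("medial", "ㅓ"): "A"}
TYPE2_PAIRS = {("ㄹ", "ㄹ")}  # (final, next initial): final encoded as if absent


def to_jamo(text):
    """Steps 501/503: split precomposed syllables into (initial, medial, final)."""
    syllables = []
    for ch in text:
        index = ord(ch) - 0xAC00
        if 0 <= index < 11172:  # spaces and non-Hangul characters are skipped
            initial, rest = divmod(index, 588)
            medial, final = divmod(rest, 28)
            syllables.append((CHOSEONG[initial], JUNGSEONG[medial], JONGSEONG[final]))
    return syllables


def encode(text):
    """Steps 504-509: conversion rules first, basic table otherwise, "$" between syllables."""
    syllables = to_jamo(text)
    codes = []
    for i, (initial, medial, final) in enumerate(syllables):
        next_initial = syllables[i + 1][0] if i + 1 < len(syllables) else None  # step 504
        code = ""
        for position, jamo in (("initial", initial), ("medial", medial), ("final", final)):
            if position == "final" and (final, next_initial) in TYPE2_PAIRS:
                code += BASIC["final"][""]  # Type 2: "1" instead of the basic "9"
            elif (position, jamo) in TYPE1:
                code += TYPE1[(position, jamo)]  # Type 1: shared code
            else:
                code += BASIC[position].get(jamo, jamo)  # step 508: basic code
        codes.append(code)
    return "$".join(codes)


for spelling in ("클로버", "크로버", "클로바"):
    print(spelling, encode(spelling))  # each line ends with hU1$rO1$bA1
```

Ordering matters here. Because the rule checks run before the basic lookup, a Type 1 or Type 2 match always overrides the default code, which mirrors the branch at step 506.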

FIG. 6 shows an example of encoding foreign words (or loanwords) written in Hangul with the encoding technique of an embodiment of the present invention. It applies the proposed method to actual Hangul spellings of foreign words (or loanwords).

As shown in FIG. 6, two differently spelled Hangul strings are each separated into initial, medial, and final jamo and encoded with the rules described above. As a result, the two different strings are converted into the same code.

Therefore, even when the data stored in the coded search-word database 23 and the user's query are spelled differently, the search can be performed effectively by comparing their code values.

As described above, the present invention enables effective retrieval of foreign words and loanwords that have the same meaning but are spelled differently in Hangul, which previously could not be handled effectively. Such words admit various spellings, yet a search can succeed merely by encoding the stored data and the entered query, without storing every possible spelling in the database. The approach is therefore considerably more storage-efficient than the existing method.

Accordingly, the present invention can be used in search systems that contain foreign-word (or loanword) data written in Hangul, such as technical-term search or trade-name search systems.

The present invention described above is not limited to the foregoing embodiments and the accompanying drawings. Those of ordinary skill in the art to which the invention pertains can make various substitutions, modifications, and changes without departing from the technical spirit of the invention.

As described above, the present invention encodes the Hangul spellings of loanwords and assigns the same code to their different spellings, which improves search effectiveness for loanwords.

Claims

A method for encoding loanwords written in Hangul for use in an information retrieval system, the method comprising:

A first step of converting a loanword string composed of precomposed Hangul into a combinational Hangul string from which initial, medial, and final jamo are easily separated;

A second step of storing a basic encoding rule table and a conversion encoding rule table in a storage means; and

A third step of using the basic encoding rule table and the conversion encoding rule table to separate the loanword string converted into the combinational Hangul string into initial, medial, and final jamo, and of assigning a code value to each phoneme according to pronunciation and phonological rules.

The method for encoding loanwords written in Hangul comprises the above steps.

The method of claim 1,

The third step comprises:

A fourth step of, for every syllable of the input string converted into the combinational string, separating syllables from the source string to obtain the phoneme values of the initial, medial, and final jamo of each syllable, and extracting information about the preceding and following syllables to be referenced when applying conversion rules;

A fifth step of checking, based on the extracted information, whether the requirement of a conversion encoding rule is met, and thereby analyzing whether the current phoneme satisfies the condition of a conversion encoding rule;

A sixth step of converting the current phoneme into a specific code using the conversion encoding rule table if, as a result of the analysis in the fifth step, the current phoneme satisfies the condition of a conversion encoding rule; and

A seventh step of converting the current phoneme into its basic code according to the basic encoding rule table if, as a result of the analysis in the fifth step, the current phoneme does not satisfy the condition of any conversion encoding rule.

The method for encoding loanwords written in Hangul comprises the above steps.

A method for searching loanwords written in Hangul in an information retrieval system, the method comprising:

A first step of separating a loanword written in Hangul into initial, medial, and final jamo, assigning a code value to each phoneme according to pronunciation and phonological rules, and storing the result in a search-word database; and

A second step of, during information retrieval, comparing the encoded form of the Hangul query entered by the user with the code values stored in the search-word database to search for a match.

The method for searching loanwords written in Hangul comprises the above steps.

The method of claim 3, wherein

The process, in the first step, of assigning a code value to each phoneme according to pronunciation and phonological rules comprises:

A third step of converting a loanword string composed of precomposed Hangul into a combinational Hangul string from which initial, medial, and final jamo are easily separated;

A fourth step of storing the basic encoding rule table and the conversion encoding rule table in storage means; And

A fifth step of using the basic encoding rule table and the conversion encoding rule table to separate the loanword string converted into the combinational Hangul string into initial, medial, and final jamo according to pronunciation and phonological rules, and then assigning a code value to each phoneme.

The method for searching loanwords written in Hangul comprises the above steps.

For an apparatus, having a processor, that encodes loanwords written in Hangul:

A function of converting a loanword string composed of precomposed Hangul into a combinational Hangul string from which initial, medial, and final jamo are easily separated;

A function of storing a basic encoding rule table and a conversion encoding rule table in a storage means; and

A function of using the basic encoding rule table and the conversion encoding rule table to separate the loanword string converted into the combinational Hangul string into initial, medial, and final jamo, and then assigning a code value to each phoneme according to pronunciation and phonological rules.

A computer-readable recording medium having recorded thereon a program for implementing these functions.

For an apparatus, having a processor, that searches loanwords written in Hangul:

A function of separating a loanword written in Hangul into initial, medial, and final jamo, assigning a code value to each phoneme according to pronunciation and phonological rules, and storing the result in a search-word database; and

A function of, during information retrieval, comparing the encoded form of the Hangul query entered by the user with the code values stored in the search-word database to search for a match